- Module 1: Essentials of R programming
- Module 2: Data Manipulation Techniques using R programming
- DATA SCIENCE
- An Introduction to R
- History of R
- Introduction to R
- The R environment
- What is Statistical Programming?
- Why use a command line?
- Your first R session
- Introduction to the R language
- Starting and quitting R
- Recording your work
- Basic features of R
- Calculating with R
- Named storage
- Functions
- Exact or approximate?
- R is case-sensitive
- Listing the objects in the workspace
- Vectors
- Extracting elements from vectors
- Vector arithmetic
- Simple patterned vectors
- Missing values and other special values
- Character vectors
- Factors
- More on extracting elements from vectors
- Matrices and arrays
- Data frames
- Dates and times
- Import and Export data in R
- Importing data into R
- CSV File
- Excel File
- Import data from text table
- SAS and SPSS datasets
- Exporting Data from R
- CSV File
- Text Table
- Excel File
- SAS dataset
- Merge / Join
- Inner Join
- Left Join
- Right Join
- Full Join
- Anti Join
- Semi Join
- Programming statistical graphics
- High-level plots
- Bar charts and dot charts
- Pie charts
- Histograms
- Box plots
- Scatterplots
- QQ plots
- Density Plot
- Choosing a high-level graphic
- Low-level graphics functions
- The plotting region and margins
- Adding to plots
- Setting graphical parameters
- Programming with R
- Flow control
- The for() loop
- The if() statement
- The while() loop
- The repeat loop, and the break and next statements
- Apply
- Sapply
- Lapply
- Managing complexity through functions
- What are functions?
- Scope of variables
- Data in R
- Modes and Classes
- Data Storage in R
- Testing for Modes and Classes
- Structure of R Objects
- Conversion of Objects
- Missing Values
- Working with Missing Values
- Reading and Writing Data
- Reading Vectors and Matrices
- Data Frames: read.table
- Comma- and Tab-Delimited Input Files
- Fixed-Width Input Files
- Extracting Data from R Objects
- Connections
- Reading Large Data Files
- Generating Data
- Sequences
- Random Numbers
- Permutations
- Random Permutations
- Enumerating All Permutations
- Working with Sequences
- Spreadsheets
- The RODBC Package on Windows
- The gdata Package (All Platforms)
- Saving and Loading R Data Objects
- Working with Binary Files
- Writing R Objects to Files in ASCII Format
- The write Function
- The write.table function
- Reading Data from Other Programs
- Dates
- as.Date
- The chron Package
- POSIX Classes
- Working with Dates
- Time Intervals
- Time Sequences
- Current time
- Present date
- Factors
- Using Factors
- Numeric Factors
- Manipulating Factors
- Creating Factors from Continuous Variables
- Subscripting
- Basics of Subscripting
- Numeric Subscripts
- Character Subscripts
- Logical Subscripts
- Subscripting Matrices and Arrays
- Specialized Functions for Matrices
- Lists
- Subscripting Data Frames
- Character Manipulation
- Basics of Character Data
- Displaying and Concatenating Character Strings
- Working with Parts of Character Values
- Regular Expressions in R
- Basics of Regular Expressions
- Breaking Apart Character Values
- Using Regular Expressions in R
- Substitutions and Tagging
- Reshaping Data
- Modifying Data Frame Variables
- Recoding Variables
- The recode Function
- Reshaping Data Frames
- The reshape Package
- Combining Data Frames
- Data Manipulation
- Random Selection of rows and columns
- Summarization
- Sort, Arrange
- Group by
- Filter
- Missing Value and Outlier
- Identify Missing values
- Impute missing values
- Identify Outliers
- Capping outliers
- Introduction to Statistics
- Types of Statistics
- Types of Data
- Descriptive Statistics
- Measures of Central Tendency
- Measures of Central Tendency – Usage Chart
- Measures of Dispersion / Variability
- Measures of Shape
- Application of Variance/Std Deviation
- Hypothesis Testing
- Applications of Hypothesis Testing (t-test and z-test)
- Steps in Hypothesis Testing
- ANOVA (Analysis of Variance)
- What is ANOVA
- ANOVA Steps
- Simple One-Way ANOVA
- Simple Two-Way ANOVA With Multiple Variables
- Chi Square Tests
- What is Chi-Square
- Applications of Chi-Square
- Correlation
- Types of Correlation
- Properties of Correlation
- Methods of Calculating Correlation
- Steps to Calculate Correlation
- Regression Analysis
- What is Regression
- Types of Regression Analysis
- Properties of The Regression Line
- Validating the Model
- Regression Assumptions
- Data Transformation for Regression
- Dummy Variable Analysis
- Variable Selection Procedure for Regression
- Forward Selection Procedure
- Backward Elimination Procedure
- Stepwise Regression Method
- Logistic Regression
- Likelihood Profiling
- Assumption
- Variable Selection Method: Weight of Evidence (WOE) and Information Value (IV)
- Model Validation
- Model Performance
- Prediction
- Cluster Analysis
- What is a cluster
- Application of clustering
- Types of clustering
- K Means
- Dendrogram
- Validation of Cluster
- Decision Tree
- What is a decision tree
- How a decision tree works
- CART
- Pruning
- Overfitting
- Underfitting
- Model validation
- Model performance
- Market Basket Analysis
- What is MBA
- Application of MBA
- Support
- Confidence
- Lift
- Rules
- Random Forest
- What is random forest
- Application of random forest
- Tune parameters
- How to tune parameters
- Model validation
- Model performance
- Support Vector Machine
- What is support vector machine
- Why use SVM
- Hyperplane
- Kernel
- Cost
- Gamma
- Model validation
- Model performance
- Naïve Bayes
- What is Naïve Bayes
- Bayes' theorem
- Conditional probability
- Prior probability
- Posterior probability
- Application of Naïve Bayes
- Model validation
- Model performance
- ARIMA
- What is time series
- What is ARIMA
- Stationarity
- Seasonality
- Trend
- How to find p, d, q
- What are p, d, q
- Find best model
- Forecasting
- GBM