- Introduction to Data Science
- Basic Data Manipulation using R
- Machine Learning Techniques Using R Part-1
- Machine Learning Techniques Using R Part-2
- Machine Learning Techniques Using R Part-3
- Introduction to Hadoop Architecture
- Integrating R with Hadoop
- Mahout Introduction and Algorithm Implementation
- Additional Mahout Algorithms and Parallel Processing using R
1. Introduction to Data Science
- Introduction to Big Data
- Roles played by a Data Scientist
- Analyzing Big Data using Hadoop and R
- Methodologies used for analysis
- Architecture and methodologies used to solve Big Data problems
2. Basic Data Manipulation using R
- Understanding vectors in R
- Reading Data, Combining Data
- Subsetting data
- Sorting data and some basic data generation functions
3. Machine Learning Techniques Using R Part-1
- Machine Learning Overview
- ML Common Use Cases
- Understanding Supervised and Unsupervised Learning Techniques
- Clustering
- Similarity Metrics
- Distance Measure Types: Euclidean and Cosine Measures
- Creating predictive models
4. Machine Learning Techniques Using R Part-2
- Understanding K-Means Clustering
- Understanding TF-IDF and Cosine Similarity and their application to the Vector Space Model
- Implementing Association rule mining in R
5. Machine Learning Techniques Using R Part-3
- Understanding the process flow of Supervised Learning Techniques
- Decision Tree Classifier
- How to build Decision trees
- Random Forest Classifier
- What is a Random Forest
- Features of Random Forest
- Out-of-Bag Error Estimate and Variable Importance
- Naive Bayes Classifier
6. Introduction to Hadoop Architecture
- Hadoop Architecture
- Common Hadoop commands
- MapReduce and data loading techniques (directly in R, and in Hadoop using Sqoop, Flume, and other data loading tools)
- Removing anomalies from the data
7. Integrating R with Hadoop
- Integrating R with Hadoop using RHadoop and the rmr package
- Exploring RHIPE (R and Hadoop Integrated Programming Environment)
- Writing MapReduce Jobs in R and executing them on Hadoop
8. Mahout Introduction and Algorithm Implementation
- Implementing Machine Learning Algorithms on larger Data Sets with Apache Mahout
9. Additional Mahout Algorithms and Parallel Processing using R
- Implementation of different Mahout algorithms
- Random Forest Classifier with a parallel processing library in R