Prerequisites
- A course on Database Management Systems (DBMS).
- Knowledge of probability and statistics.
Course Objectives
The objectives of this course are to:
- Explore Fundamental Concepts: Investigate the core concepts of data analytics.
- Learn Statistical Analysis: Understand the principles and methods of statistical analysis.
- Discover Patterns and Models: Analyze supervised and unsupervised models, identify patterns, and estimate algorithm accuracy.
- Understand Search and Visualization: Learn various search methods and visualization techniques.
Course Outcomes
Upon completion of this course, students will be able to:
- Understand Business Impact: Recognize the role of data analytics in business decisions and strategy.
- Perform Data Analysis: Carry out statistical analysis and interpret results.
- Visualize Data: Implement standard data visualization techniques and formal inference procedures.
- Design Data Architecture: Develop robust data architectures for analysis.
- Understand Data Sources: Identify and manage various data sources such as sensors, signals, and GPS.
Syllabus
UNIT - I: Data Management
- Data Architecture: Designing data architecture for analysis.
- Data Sources: Understanding sources like sensors, signals, GPS, etc.
- Data Quality: Handling noise, outliers, missing values, and duplicate data.
- Data Processing: Techniques for cleaning, transforming, and preparing data for analysis.
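As an illustration of the Data Quality and Data Processing topics above, here is a minimal sketch of one possible cleaning pass. It is not taken from the course material: the function name `clean_readings`, the `(timestamp, value)` record layout, the median fill, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
from statistics import median, quantiles

def clean_readings(records):
    """Deduplicate, impute, and flag outliers in (timestamp, value) records.

    Hypothetical helper for illustration; value may be None when missing.
    """
    # 1. Duplicate data: keep the first occurrence of each identical record.
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)

    # 2. Missing values: fill with the median of the observed values
    #    (an assumed strategy; mean or model-based imputation are alternatives).
    observed = [v for _, v in unique if v is not None]
    fill = median(observed)
    imputed = [(t, fill if v is None else v) for t, v in unique]

    # 3. Outliers: flag values outside 1.5 * IQR of the observed quartiles.
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(t, v, not (low <= v <= high)) for t, v in imputed]

readings = [(1, 10.0), (2, 11.0), (2, 11.0), (3, None),
            (4, 12.0), (5, 95.0), (6, 10.5), (7, 11.5)]
cleaned = clean_readings(readings)
```

Each returned tuple carries a boolean outlier flag rather than dropping the record, so the decision to remove or keep a suspicious reading stays with the analyst.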
UNIT - II: Data Analytics
- Introduction to Analytics: Overview of data analytics and its tools/environments.
- Applications of Modeling in Business: Role of modeling in solving business problems.
- Databases and Types of Data: Understanding structured, semi-structured, and unstructured data.
- Data Modeling Techniques: Missing value imputation and other preprocessing methods.
- Need for Business Modeling: Importance of modeling in business decision-making.
UNIT - III: Regression and Logistic Regression
- Regression: Concepts and assumptions (BLUE: best linear unbiased estimator), least squares estimation, variable rationalization, and model building.
- Logistic Regression: Model theory and fit statistics; model construction and applications in business domains.
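To make the least squares estimation topic in this unit concrete, the following is a small sketch of the closed-form estimates for simple linear regression with one predictor: the slope is the ratio of the sample covariance to the variance of x, and the intercept follows from the means. The function name `fit_least_squares` and the sample data are hypothetical and serve only as an illustration.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals.

    Illustrative helper for the one-predictor case y = intercept + slope * x.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Made-up data lying exactly on the line y = 1 + 2x.
intercept, slope = fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

With real data the points will not lie exactly on a line, and the residuals left over after fitting are what the fit statistics and model-building steps in this unit examine.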
UNIT - IV: Object Segmentation and Time Series Methods
- Object Segmentation: Supervised vs. unsupervised learning; tree-based models (regression, classification, overfitting, pruning, and complexity); multiple decision trees.
- Time Series Methods: ARIMA models; measures of forecast accuracy; STL approach for time series decomposition; feature extraction (e.g., height, average energy) for prediction.
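The syllabus does not say which measures of forecast accuracy are covered, so the sketch below assumes three commonly used ones: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The function name `forecast_accuracy` and the numbers are hypothetical.

```python
from math import sqrt

def forecast_accuracy(actual, forecast):
    """Compute MAE, RMSE, and MAPE (in percent) for paired series.

    Illustrative helper; MAPE assumes no actual value is zero.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}

# Made-up actual values and forecasts for three periods.
scores = forecast_accuracy([100, 200, 400], [110, 190, 400])
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the size of each actual value, which is why the three are often reported together.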
UNIT - V: Data Visualization
- Pixel-Oriented Techniques: Visualizing data using pixel-based representations.
- Geometric Projection Techniques: Using geometric transformations for visualization.
- Icon-Based Techniques: Representing data using icons or symbols.
- Hierarchical Techniques: Visualizing hierarchical relationships in data.
- Complex Data and Relations: Techniques for visualizing complex datasets and relationships.
Textbooks
- “Student’s Handbook for Associate Analytics – II, III”. Publisher: Not specified.
- “Data Mining: Concepts and Techniques”. Authors: Jiawei Han, Micheline Kamber, Jian Pei. Edition: 3rd. Publisher: Morgan Kaufmann Publishers.
Reference Books
- “Introduction to Data Mining”. Authors: Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Publisher: Addison-Wesley, 2006.
- “Data Mining and Analysis: Fundamental Concepts and Algorithms”. Authors: Mohammed J. Zaki, Wagner Meira Jr.
- “Mining of Massive Datasets”. Authors: Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs), Jeffrey D. Ullman (Stanford Univ.).