Prerequisites

  1. A course on Database Management Systems (DBMS).
  2. Knowledge of probability and statistics.

Course Objectives

The objectives of this course are:

  1. Explore Fundamental Concepts: Investigate the core concepts of data analytics.
  2. Learn Statistical Analysis: Understand the principles and methods of statistical analysis.
  3. Discover Patterns and Models: Analyze supervised and unsupervised models, identify patterns, and estimate algorithm accuracy.
  4. Understand Search and Visualization: Learn various search methods and visualization techniques.

Course Outcomes

Upon completion of this course, students will be able to:

  1. Understand Business Impact: Recognize the role of data analytics in business decisions and strategy.
  2. Perform Data Analysis: Carry out statistical analysis and interpret results.
  3. Visualize Data: Implement standard data visualization techniques and formal inference procedures.
  4. Design Data Architecture: Develop robust data architectures for analysis.
  5. Understand Data Sources: Identify and manage data sources such as sensors, signals, and GPS.

Syllabus

UNIT - I: Data Management

  • Data Architecture:
    • Designing data architecture for analysis.
  • Data Sources:
    • Understanding sources such as sensors, signals, and GPS.
  • Data Quality:
    • Handling noise, outliers, missing values, and duplicate data.
  • Data Processing:
    • Techniques for cleaning, transforming, and preparing data for analysis.
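
As an illustration of the Data Quality and Data Processing topics above, the following is a minimal sketch of one possible cleaning pass, not a prescribed course method. The record layout (`sensor_id`, `timestamp`, `value`), the sample readings, and the function name `clean_readings` are all hypothetical. The sketch assumes duplicates share the same sensor and timestamp, missing values are stored as `None`, and outliers are flagged with the common 1.5 × IQR rule.

```python
from statistics import median, quantiles


def clean_readings(readings):
    """Deduplicate, impute missing values, and flag outliers (illustrative only)."""
    # 1. Duplicate data: keep the first record per (sensor_id, timestamp).
    seen = set()
    unique = []
    for rec in readings:
        key = (rec["sensor_id"], rec["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not modified

    # 2. Missing values: impute None with the median of the observed values.
    observed = [r["value"] for r in unique if r["value"] is not None]
    fill = median(observed)
    for r in unique:
        r["imputed"] = r["value"] is None
        if r["imputed"]:
            r["value"] = fill

    # 3. Outliers: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    for r in unique:
        r["outlier"] = not (low <= r["value"] <= high)
    return unique


# Hypothetical sensor readings: one duplicate, one missing value, one spike.
sample = [
    {"sensor_id": "s1", "timestamp": 0, "value": 10.0},
    {"sensor_id": "s1", "timestamp": 0, "value": 10.0},  # duplicate
    {"sensor_id": "s1", "timestamp": 1, "value": None},  # missing
    {"sensor_id": "s1", "timestamp": 2, "value": 11.0},
    {"sensor_id": "s1", "timestamp": 3, "value": 12.0},
    {"sensor_id": "s1", "timestamp": 4, "value": 95.0},  # likely noise
    {"sensor_id": "s1", "timestamp": 5, "value": 10.5},
    {"sensor_id": "s1", "timestamp": 6, "value": 11.5},
]
cleaned = clean_readings(sample)
```

Median imputation and the IQR rule are only two of several reasonable choices; whether a flagged value is dropped, capped, or kept depends on the analysis.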

UNIT - II: Data Analytics

  • Introduction to Analytics:
    • Overview of data analytics and its tools/environments.
  • Applications of Modeling in Business:
    • Role of modeling in solving business problems.
  • Databases and Types of Data:
    • Understanding structured, semi-structured, and unstructured data.
  • Data Modeling Techniques:
    • Missing value imputation and other preprocessing methods.
  • Need for Business Modeling:
    • Importance of modeling in business decision-making.

UNIT - III: Regression and Logistic Regression

  • Regression:
    • Concepts and assumptions (BLUE: Best Linear Unbiased Estimator properties).
    • Least squares estimation.
    • Variable rationalization and model building.
  • Logistic Regression:
    • Model theory and fit statistics.
    • Model construction and applications in business domains.
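
To make the least squares estimation topic listed under Regression concrete, here is a small sketch for the single-predictor case. It uses the standard closed-form estimates, slope = Sxy / Sxx and intercept = ȳ − slope · x̄. The function name `fit_least_squares` and the sample data are hypothetical, chosen only for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx: spread of x around its mean; Sxy: co-variation of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying roughly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_least_squares(xs, ys)
```

With several predictors the same idea is usually expressed in matrix form and solved with a numerical library rather than by hand.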

UNIT - IV: Object Segmentation and Time Series Methods

  • Object Segmentation:
    • Supervised vs. Unsupervised Learning.
    • Tree-based models: regression and classification trees, overfitting, pruning, and complexity.
    • Multiple decision trees.
  • Time Series Methods:
    • ARIMA models.
    • Measures of forecast accuracy.
    • STL approach for time series decomposition.
    • Feature extraction (e.g., height, average energy) for prediction.
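
The "Measures of forecast accuracy" topic above can be illustrated with three widely used error measures: MAE, RMSE, and MAPE. This is a sketch under stated assumptions, not the course's definitive list of measures. The function name `forecast_accuracy` and the sample series are hypothetical, and MAPE as written assumes no actual value is zero.

```python
from math import sqrt


def forecast_accuracy(actual, forecast):
    """Compute MAE, RMSE, and MAPE (in percent) for paired observations."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    rmse = sqrt(sum(e * e for e in errors) / n)  # root mean squared error
    # Mean absolute percentage error; undefined if any actual value is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical actual values and forecasts for four periods.
actual = [100, 110, 120, 130]
forecast = [98, 112, 117, 135]
scores = forecast_accuracy(actual, forecast)
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, so the measures can rank competing forecasts differently.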

UNIT - V: Data Visualization

  • Pixel-Oriented Techniques:
    • Visualizing data using pixel-based representations.
  • Geometric Projection Techniques:
    • Using geometric transformations for visualization.
  • Icon-Based Techniques:
    • Representing data using icons or symbols.
  • Hierarchical Techniques:
    • Visualizing hierarchical relationships in data.
  • Complex Data and Relations:
    • Techniques for visualizing complex datasets and relationships.

Textbooks

  1. “Student’s Handbook for Associate Analytics – II, III”
    • Publisher: Not specified.
  2. “Data Mining: Concepts and Techniques”
    • Authors: Jiawei Han, Micheline Kamber, Jian Pei
    • Edition: 3rd
    • Publisher: Morgan Kaufmann Publishers

Reference Books

  1. “Introduction to Data Mining”
    • Authors: Pang-Ning Tan, Michael Steinbach, Vipin Kumar
    • Publisher: Addison-Wesley, 2006
  2. “Data Mining and Analysis: Fundamental Concepts and Algorithms”
    • Authors: Mohammed J. Zaki, Wagner Meira Jr.
  3. “Mining of Massive Datasets”
    • Authors: Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs), Jeffrey D. Ullman (Stanford Univ.)