Course Objectives
- To explore the foundational principles of data analytics.
- To understand and apply statistical analysis methods.
- To discover patterns in data, analyze supervised and unsupervised models, and evaluate algorithm accuracy.
- To implement search methods and visualization techniques effectively.
Course Outcomes
Upon completion of this course, students will be able to:
- Understand and implement linear regression and logistic regression.
- Analyze and compare the functionality of various classifiers.
- Apply visualization techniques using different types of graphs.
- Use descriptive and predictive analytics for diverse datasets.
List of Experiments
1. Data Preprocessing
Perform the following tasks on a dataset:
- a) Handling Missing Values: Impute or remove missing values.
- b) Noise Detection and Removal: Identify and handle noisy data.
- c) Identifying Data Redundancy and Elimination: Remove duplicate or redundant data.
Example Code Snippet:
import pandas as pd
# Load dataset
data = pd.read_csv("data.csv")
# Handle missing values: fill gaps in numeric columns with the column mean
data.fillna(data.mean(numeric_only=True), inplace=True)
# Remove duplicates
data.drop_duplicates(inplace=True)
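# Noise removal (example: dropping rows with an IQR outlier in any numeric column)
numeric = data.select_dtypes(include="number")
Q1 = numeric.quantile(0.25)
Q3 = numeric.quantile(0.75)
IQR = Q3 - Q1
outliers = ((numeric < (Q1 - 1.5 * IQR)) | (numeric > (Q3 + 1.5 * IQR))).any(axis=1)
data = data[~outliers]
2. Implement Any One Imputation Model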
Use mean, median, or KNN imputation to handle missing values.
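Mean and median imputation can be sketched with scikit-learn's SimpleImputer. The example below is a minimal illustration, not a prescribed solution: the toy DataFrame and its column names (age, income) are assumptions made up for the demonstration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy dataset with gaps (np.nan marks a missing value)
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [50000.0, 60000.0, np.nan, 1000000.0],
})

# Mean imputation: each gap is replaced by the mean of its column
mean_imputer = SimpleImputer(strategy="mean")
df_mean = pd.DataFrame(mean_imputer.fit_transform(df), columns=df.columns)

# Median imputation: less sensitive to extreme values such as the large income
median_imputer = SimpleImputer(strategy="median")
df_median = pd.DataFrame(median_imputer.fit_transform(df), columns=df.columns)

print(df_mean)
print(df_median)
```

The median variant is usually the safer choice when a column is skewed, because a single extreme value pulls the mean but not the median.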
Example Code Snippet:
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=2)
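# KNNImputer needs numeric input and returns a NumPy array, not a DataFrame
data_imputed = imputer.fit_transform(data.select_dtypes(include="number"))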
print(data_imputed)
3. Implement Linear Regression
Build a linear regression model to predict continuous values.
Example Code Snippet:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
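# 'feature1', 'feature2' and 'target' are placeholder column names
X = data[['feature1', 'feature2']]
y = data['target']
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)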
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
4. Implement Logistic Regression
Build a logistic regression model for binary classification.
Example Code Snippet:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Uses a train/test split built as in Experiment 3; here y must hold binary class labels (e.g. 0/1)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
5. Implement Decision Tree Induction for Classification
Build a decision tree classifier and visualize the tree structure.
Example Code Snippet:
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
plt.figure(figsize=(12, 8))
plot_tree(model, filled=True, feature_names=['feature1', 'feature2'], class_names=['Class 0', 'Class 1'])
plt.show()
6. Implement Random Forest Classifier
Build a random forest classifier and evaluate its performance.
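Accuracy from a single train/test split can vary with the split, so one possible way to evaluate the forest is k-fold cross-validation, with a single decision tree as a reference point. The sketch below assumes a synthetic dataset from make_classification in place of real data; the sample counts and parameters are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: stands in for a real dataset)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validated accuracy for each model
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.3f} "
          f"(std = {fold_scores.std():.3f})")
```

Reporting the mean together with the standard deviation across folds gives a rough sense of how stable each classifier is.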
Example Code Snippet:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# 100 trees; random_state is fixed so that results are reproducible
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
7. Implement ARIMA on Time Series Data
Apply ARIMA (AutoRegressive Integrated Moving Average) for time series forecasting.
Example Code Snippet:
from statsmodels.tsa.arima.model import ARIMA
# Fit ARIMA model; order=(p, d, q) = 5 autoregressive lags, first-order differencing, no moving-average terms
model = ARIMA(data['value'], order=(5, 1, 0))
model_fit = model.fit()
# Forecast
forecast = model_fit.forecast(steps=10)
print(forecast)
8. Object Segmentation Using Hierarchical-Based Methods
Perform hierarchical clustering for object segmentation.
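To make the bottom-up merging behind hierarchical clustering visible, the merge tree can be built explicitly with SciPy and then cut at a chosen number of clusters. This is a sketch under stated assumptions: the 2-D points are synthetic, and the three group centers are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D points forming three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Build the merge tree bottom-up with Ward linkage (Euclidean distance)
Z = linkage(points, method="ward")

# Cut the tree so that at most three clusters remain
labels = fcluster(Z, t=3, criterion="maxclust")

# Cluster sizes (fcluster labels start at 1, so index 0 is skipped)
print(np.bincount(labels)[1:])
```

Each row of Z records one merge, so n points always yield n - 1 rows. Passing Z to scipy.cluster.hierarchy.dendrogram draws the tree, which helps when choosing where to cut.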
Example Code Snippet:
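from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
# Ward linkage always uses Euclidean distance, so no distance argument is needed
# (`affinity` was renamed to `metric` and is no longer accepted by recent scikit-learn releases)
features = data.select_dtypes(include="number").to_numpy()
cluster = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = cluster.fit_predict(features)
# Plot the first two numeric features, coloured by cluster label
plt.scatter(features[:, 0], features[:, 1], c=labels, cmap='rainbow')
plt.show()
9. Perform Visualization Techniques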
Create various types of graphs (Bar, Column, Line, Scatter, 3D Cubes).
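A line chart and a 3D bar chart can be sketched with Matplotlib alone. The monthly values below are made-up illustration data, and the output filename charts.png is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly values used only for illustration
months = np.arange(1, 7)
values = np.array([12, 15, 11, 18, 21, 19])

fig = plt.figure(figsize=(10, 4))

# Line chart: trend of a value along an ordered axis
ax_line = fig.add_subplot(1, 2, 1)
ax_line.plot(months, values, marker="o")
ax_line.set_xlabel("month")
ax_line.set_ylabel("value")
ax_line.set_title("Line chart")

# 3D bars: one cuboid per month, anchored at z=0, with height equal to the value
ax_3d = fig.add_subplot(1, 2, 2, projection="3d")
ax_3d.bar3d(months, np.zeros_like(months), np.zeros_like(months), 0.5, 0.5, values)
ax_3d.set_xlabel("month")
ax_3d.set_zlabel("value")
ax_3d.set_title("3D bars")

# Write both panels to an image file
fig.savefig("charts.png")
```

Drawing on explicit Axes objects, rather than through the implicit pyplot state, keeps the two panels independent and makes it easier to add further chart types to the same figure.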
Example Code Snippet:
import matplotlib.pyplot as plt
import seaborn as sns
# Bar Chart
sns.barplot(x='category', y='value', data=data)
plt.show()
# Scatter Plot
plt.scatter(data['feature1'], data['feature2'])
plt.show()
10. Perform Descriptive Analytics on Healthcare Data
Analyze healthcare data using descriptive statistics.
Example Code Snippet:
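import matplotlib.pyplot as plt
import seaborn as sns
# Summary statistics (count, mean, std, quartiles) for the numeric columns
print(data.describe())
sns.boxplot(x='age', y='health_status', data=data)
plt.show()
11. Perform Predictive Analytics on Product Sales Data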
Predict future sales using regression or time series models.
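A regression-based sales forecast might look like the sketch below. It assumes synthetic monthly data with a linear trend plus noise; the column names month_index and units_sold are hypothetical, and a real dataset would likely need seasonality and other features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic monthly sales: linear trend plus noise (assumption, not real data)
rng = np.random.default_rng(42)
n_months = 36
sales = pd.DataFrame({"month_index": np.arange(n_months)})
sales["units_sold"] = 200 + 5 * sales["month_index"] + rng.normal(0, 10, n_months)

# Chronological split: train on the first 30 months, test on the last 6
train, test = sales.iloc[:30], sales.iloc[30:]

model = LinearRegression()
model.fit(train[["month_index"]], train["units_sold"])

# Evaluate on the held-out months
test_pred = model.predict(test[["month_index"]])
mae = mean_absolute_error(test["units_sold"], test_pred)
print(f"MAE on the last 6 months: {mae:.1f}")

# Forecast the 3 months that follow the observed data
future = pd.DataFrame({"month_index": np.arange(n_months, n_months + 3)})
forecast = model.predict(future)
print(forecast)
```

The split is chronological rather than random because a forecast should be tested on months that come after the training period.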
Example Code Snippet:
# Use Linear Regression or ARIMA for prediction
12. Apply Predictive Analytics for Weather Forecasting
Forecast weather conditions using time series or machine learning models.
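One simple machine learning approach is to predict the next day's temperature from the previous few days (lag features). The sketch below swaps in a plain linear regression on lags, not ARIMA or an LSTM, and assumes synthetic daily temperatures with a yearly cycle; all names and parameters are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily temperatures: yearly seasonal cycle plus noise (assumption, not real data)
rng = np.random.default_rng(0)
days = np.arange(730)
temp = 15 + 10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1.5, days.size)
weather = pd.DataFrame({"temp": temp})

# Lag features: the temperatures of the previous 1, 2 and 3 days
for lag in (1, 2, 3):
    weather[f"lag_{lag}"] = weather["temp"].shift(lag)
weather = weather.dropna().reset_index(drop=True)

feature_cols = ["lag_1", "lag_2", "lag_3"]

# Chronological split: the last 60 days are held out for testing
train, test = weather.iloc[:-60], weather.iloc[-60:]

model = LinearRegression()
model.fit(train[feature_cols], train["temp"])

# One-step-ahead predictions for the held-out days
pred = model.predict(test[feature_cols])

# Compare against a naive "tomorrow equals today" baseline
actual = test["temp"].to_numpy()
model_mae = np.mean(np.abs(actual - pred))
naive_mae = np.mean(np.abs(actual - test["lag_1"].to_numpy()))
print(f"Model MAE: {model_mae:.2f}, naive MAE: {naive_mae:.2f}")
```

Checking against the naive baseline is a useful habit: a forecasting model is only worth keeping if it beats the trivial prediction.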
Example Code Snippet:
# Use ARIMA or LSTM for weather forecasting
Textbooks
- Student’s Handbook for Associate Analytics – II, III.
- Data Mining: Concepts and Techniques by Han, Kamber (3rd Edition).
Reference Books
- Introduction to Data Mining by Tan, Steinbach, Kumar.
- Data Mining and Analysis: Fundamental Concepts and Algorithms by M. Zaki and W. Meira.
- Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.