Experiment No: 9

> python exp9.py
First 5 rows of the dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0
Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   species            150 non-null    int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
Training set size: 120 samples
Testing set size: 30 samples
Model Accuracy: 100.00%
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
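
The accuracy, precision, recall and support figures in the report can be checked by hand from the confusion matrix. The sketch below is only an illustration: it assumes the matrix follows the usual convention of rows as true classes and columns as predicted classes, and the helper name `metrics_from_confusion` is hypothetical, not taken from exp9.py.

```python
# Confusion matrix copied from the output above.
# Assumption: rows are true classes, columns are predicted classes.
confusion = [
    [10, 0, 0],
    [0, 9, 0],
    [0, 0, 11],
]
labels = ["setosa", "versicolor", "virginica"]


def metrics_from_confusion(matrix):
    """Return overall accuracy and (precision, recall, support) per class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
    per_class = []
    for i in range(n):
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum = support
        precision = matrix[i][i] / predicted_i if predicted_i else 0.0
        recall = matrix[i][i] / actual_i if actual_i else 0.0
        per_class.append((precision, recall, actual_i))
    return correct / total, per_class


accuracy, per_class = metrics_from_confusion(confusion)
print(f"Model Accuracy: {accuracy:.2%}")
for name, (precision, recall, support) in zip(labels, per_class):
    print(f"{name:>12} precision={precision:.2f} recall={recall:.2f} support={support}")
```

With every off-diagonal entry equal to zero, all 30 test samples are classified correctly, which is why each score in the report is 1.00.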
Experiment No: 10

> python run.py
Data saved to 'play_tennis.csv'
Loaded dataset:
    Outlook  Temperature  Humidity    Wind  PlayTennis
0     Sunny          Hot      High    Weak          No
1     Sunny          Hot      High  Strong          No
2  Overcast          Hot      High    Weak         Yes
3      Rain         Mild      High    Weak         Yes
4      Rain         Cool    Normal    Weak         Yes
------------------------------
Calculating Information Gain for each attribute:
Information Gain for 'Outlook': 0.2467
Information Gain for 'Temperature': 0.0292
Information Gain for 'Humidity': 0.1518
Information Gain for 'Wind': 0.0481
------------------------------
The best attribute to split on is 'Outlook' with an Information Gain of 0.2467.
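
The information gain values above can be reproduced with the standard entropy formula. The following is a minimal sketch, not the contents of run.py: the function names `entropy` and `information_gain` are hypothetical, and the 14 rows are embedded as a string so the example is self-contained.

```python
import csv
import io
from collections import Counter
from math import log2

# The 14 Play Tennis records, embedded so the sketch runs on its own.
CSV_TEXT = """Outlook,Temperature,Humidity,Wind,PlayTennis
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rain,Mild,High,Weak,Yes
Rain,Cool,Normal,Weak,Yes
Rain,Cool,Normal,Strong,No
Overcast,Cool,Normal,Strong,Yes
Sunny,Mild,High,Weak,No
Sunny,Cool,Normal,Weak,Yes
Rain,Mild,Normal,Weak,Yes
Sunny,Mild,Normal,Strong,Yes
Overcast,Mild,High,Strong,Yes
Overcast,Hot,Normal,Weak,Yes
Rain,Mild,High,Strong,No
"""


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target="PlayTennis"):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
gains = {a: information_gain(rows, a) for a in attributes}
for attribute, gain in gains.items():
    print(f"Information Gain for '{attribute}': {gain:.4f}")
best = max(gains, key=gains.get)
print(f"Best attribute: {best}")
```

The dataset has 9 "Yes" and 5 "No" labels, so the starting entropy is about 0.940; splitting on Outlook leaves the Overcast branch pure, which is what gives it the largest gain.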
Contents of 'play_tennis.csv':
Outlook,Temperature,Humidity,Wind,PlayTennis
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rain,Mild,High,Weak,Yes
Rain,Cool,Normal,Weak,Yes
Rain,Cool,Normal,Strong,No
Overcast,Cool,Normal,Strong,Yes
Sunny,Mild,High,Weak,No
Sunny,Cool,Normal,Weak,Yes
Rain,Mild,Normal,Weak,Yes
Sunny,Mild,Normal,Strong,Yes
Overcast,Mild,High,Strong,Yes
Overcast,Hot,Normal,Weak,Yes
Rain,Mild,High,Strong,No
Experiment No: 11

> python exp11.py
First 5 rows of the dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0
Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   species            150 non-null    int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
Training set size: 112 samples
Testing set size: 38 samples
Model Accuracy: 100.00%
Confusion Matrix:
[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      1.00      1.00        11
   virginica       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
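
The 112/38 split reported above is what a 25% test fraction yields on 150 samples when the test count is rounded up. The output does not print the fraction the script actually used, so the 25% figure is an inference. The sketch below also assumes ceiling rounding, which is how scikit-learn's `train_test_split` handles a fractional `test_size`, and the helper `split_sizes` is hypothetical.

```python
from math import ceil


def split_sizes(n_samples, test_percent):
    """Return (n_train, n_test), rounding the test count up.

    Assumption: ceiling rounding of the test set; the percentage is passed
    as an integer to avoid floating-point surprises.
    """
    n_test = ceil(n_samples * test_percent / 100)
    return n_samples - n_test, n_test


print(split_sizes(150, 25))  # 37.5 rounds up to 38 test samples, leaving 112
print(split_sizes(150, 20))  # exactly 30 test samples, leaving 120
```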
Experiment No: 12

> python exp12.py
First 5 rows of the dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0
Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   species            150 non-null    int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
Training set size: 112 samples
Testing set size: 38 samples
Model Accuracy (with k=3): 100.00%
Confusion Matrix:
[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      1.00      1.00        11
   virginica       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
Experiment No: 13

> python exp13.py
First 5 rows of the dataset:
   sepal_length  petal_length
0           5.1           1.4
1           4.9           1.4
2           4.7           1.3
3           4.6           1.5
4           5.0           1.4
------------------------------
Final cluster centroids for k=3:
[[6.83902439 5.67804878]
[5.00784314 1.49215686]
[5.87413793 4.39310345]]
------------------------------
Number of points in each cluster:
cluster
0 41
1 51
2 58
Name: count, dtype: int64
------------------------------
Inertia (Sum of squared distances): 53.81
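
Inertia, as labelled in the output, is the sum of squared distances from each point to its nearest centroid. The sketch below illustrates that definition on a handful of made-up points. The points, centroids and the function name `inertia` are hypothetical, and the sketch does not reproduce the 53.81 figure, which depends on the full dataset.

```python
def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for point in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        )
    return total


# Hypothetical toy data: every point lies exactly 1 unit from its nearest centroid.
toy_points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
toy_centroids = [(1.0, 2.0), (6.0, 5.0)]
print(inertia(toy_points, toy_centroids))  # 4.0
```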
Experiment No: 14

> python exp14.py
Shape of the dataset: (1000, 2)
First 5 data points:
[[-8.55503989 7.06461794]
[-6.13753182 -6.58081701]
[-6.32130028 -6.8041042 ]
[ 4.18051794 1.12332531]
[ 4.38028748 0.47002673]]
------------------------------
Number of clusters found: 4
------------------------------
Number of points assigned to each cluster:
Cluster 0: 250 points
Cluster 1: 250 points
Cluster 2: 250 points
Cluster 3: 250 points
Experiment No: 15

> python exp15.py
Shape of the dataset: (500, 2)
First 5 data points:
[[-5.73035386 -7.58328602]
[ 1.94299219 1.91887482]
[ 6.82968177 1.1648714 ]
[-2.90130578 7.55077118]
[ 5.84109276 1.56509431]]
------------------------------
Number of clusters found: 3
Indices of the medoids: [462 449 93]
------------------------------
Number of points in each cluster:
Cluster 0: 166 points
Cluster 1: 167 points
Cluster 2: 167 points
------------------------------
Total distance to medoids (cost): 613.31
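
The cost reported above is the total distance from every point to its nearest medoid. As a rough sketch of that calculation, the example below uses Euclidean distance, which is an assumption since the output does not state the metric. The toy points and the function name `medoid_cost` are hypothetical.

```python
from math import dist


def medoid_cost(points, medoid_indices):
    """Total distance from each point to its closest medoid.

    Medoids are actual data points, identified by their indices in `points`.
    Assumption: Euclidean distance.
    """
    medoids = [points[i] for i in medoid_indices]
    return sum(min(dist(p, m) for m in medoids) for p in points)


# Hypothetical toy data with medoids at indices 0 and 2.
toy_points = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (10.0, 5.0)]
print(medoid_cost(toy_points, [0, 2]))  # 0 + 5 + 0 + 5 = 10.0
```

Unlike a k-means centroid, each medoid is one of the input points, which is why the output lists medoid indices rather than coordinates.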
Experiment No: 16

> python exp16.py
Shape of the dataset: (200, 2)
First 5 data points:
[[-1.02069027 0.10551754]
[ 0.9058265 0.45785751]
[ 0.61842175 0.75708632]
[ 1.22770701 -0.42518512]
[ 0.32935594 -0.20694568]]
------------------------------
Number of clusters found: 2
Outlier points (noise) found: 0
------------------------------
Number of points in each cluster (including outliers):
Cluster 0: 100
Cluster 1: 100
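
A summary like the one above can be derived from the label array alone. The sketch below assumes a density-based method such as DBSCAN in which noise points carry the label -1, as in scikit-learn. The output does not name the algorithm, and the helper `summarize_labels` is hypothetical.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (number of clusters, number of noise points, counts per label).

    Assumption: noise points are labelled -1 and are not counted as a cluster.
    """
    counts = Counter(labels)
    n_noise = counts.get(-1, 0)
    n_clusters = len(counts) - (1 if -1 in counts else 0)
    return n_clusters, n_noise, dict(sorted(counts.items()))


# Labels consistent with the output above: two clusters of 100 points, no noise.
labels = [0] * 100 + [1] * 100
n_clusters, n_noise, counts = summarize_labels(labels)
print(f"Number of clusters found: {n_clusters}")
print(f"Outlier points (noise) found: {n_noise}")
for label, count in counts.items():
    print(f"Cluster {label}: {count}")
```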