This notebook builds a simple categorical Naive Bayes classifier to predict whether a customer will buy a product based on weather and discount status.
It encodes the categorical features, splits the data into training and test sets, trains CategoricalNB, and makes predictions on the test set.
Finally, it evaluates performance with accuracy and a confusion matrix plot.
Predicting whether a customer will buy a product from weather and discount status
In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
1. Load the dataset
In [2]:
data = {
    'Weather': ['Sunny', 'Rainy', 'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Rainy',
                'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Rainy'],
    'Discount': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes',
                 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes'],
    'Buy': ['Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes',
            'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes']
}
df = pd.DataFrame(data)
df
Out[2]:
| | Weather | Discount | Buy |
|---|---|---|---|
| 0 | Sunny | Yes | Yes |
| 1 | Rainy | No | No |
| 2 | Sunny | No | Yes |
| 3 | Sunny | Yes | Yes |
| 4 | Rainy | Yes | No |
| 5 | Rainy | No | No |
| 6 | Sunny | No | Yes |
| 7 | Rainy | Yes | Yes |
| 8 | Sunny | Yes | Yes |
| 9 | Sunny | No | Yes |
| 10 | Rainy | No | No |
| 11 | Rainy | Yes | No |
| 12 | Sunny | Yes | Yes |
| 13 | Rainy | No | No |
| 14 | Sunny | No | Yes |
| 15 | Rainy | Yes | Yes |
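Categorical Naive Bayes learns only from label counts and from feature-by-label counts. The sketch below is an illustration and was not part of the original notebook. It rebuilds the same 16 rows and tabulates those counts with `pd.crosstab`. The tables show, for example, that every Sunny row in this toy data is labelled Buy = Yes.

```python
import pandas as pd

# Same 16-row toy dataset as the `data` dict above
data = {
    'Weather': ['Sunny', 'Rainy', 'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Rainy',
                'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Rainy'],
    'Discount': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes',
                 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes'],
    'Buy': ['Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes',
            'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes'],
}
df = pd.DataFrame(data)

# Label counts: the class prior P(Buy) is estimated from these
class_counts = df['Buy'].value_counts()
print(class_counts)

# Feature value counts within each label: Naive Bayes estimates
# P(feature value | Buy) from the columns of these tables
weather_by_buy = pd.crosstab(df['Weather'], df['Buy'])
discount_by_buy = pd.crosstab(df['Discount'], df['Buy'])
print(weather_by_buy)
print(discount_by_buy)
```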
Split features (X) and label (y)
In [3]:
X = df[['Weather', 'Discount']]
y = df['Buy']
2. Encode categorical text to integers
In [4]:
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X)
In [9]:
# Compare actual X and X_encoded
print(list(X.loc[0]), "---->", X_encoded[0])
print(list(X.loc[1]), "---->", X_encoded[1])
['Sunny', 'Yes'] ----> [1. 1.]
['Rainy', 'No'] ----> [0. 0.]
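`OrdinalEncoder` assigns codes in the sorted order of each column's categories. That is why Rainy and No become 0, while Sunny and Yes become 1. The small sketch below uses only the first three rows of the data for brevity. It prints the learned mapping from `encoder.categories_` and then round-trips the codes with `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# First three rows of the notebook's features, for brevity
X = pd.DataFrame({
    'Weather': ['Sunny', 'Rainy', 'Sunny'],
    'Discount': ['Yes', 'No', 'No'],
})

encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X)

# categories_ holds one sorted array per column; a value's position in
# that array is the integer code it is encoded as
for column, categories in zip(X.columns, encoder.categories_):
    mapping = {category: code for code, category in enumerate(categories)}
    print(column, mapping)

# inverse_transform maps the integer codes back to the original strings
restored = encoder.inverse_transform(X_encoded)
print(restored)
```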
3. Split the data (test_size=0.3 holds out 5 of the 16 rows, one more than the default of 0.25 would)
In [10]:
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.3, random_state=42)
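When `test_size` is a fraction, scikit-learn rounds the test count up. So 0.3 of 16 rows gives ceil(4.8) = 5 test rows and 11 training rows, which matches the five labels printed in step 6. The sketch below checks this on placeholder data. The feature and label values are arbitrary stand-ins, and only the row count matters.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 16
X = np.arange(n_samples).reshape(-1, 1)  # placeholder features, one row per sample
y = np.array(['Yes'] * 10 + ['No'] * 6)  # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# A fractional test_size is rounded up: ceil(0.3 * 16) = ceil(4.8) = 5
expected_test = math.ceil(0.3 * n_samples)
print(len(X_train), len(X_test), expected_test)
```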
4. Train the Categorical Naive Bayes model
In [11]:
model = CategoricalNB()
model.fit(X_train, y_train)
Out[11]:
CategoricalNB()
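`fit` stores a set of smoothed log-probabilities. For easy hand-checking, the sketch below fits on all 16 rows rather than the 11-row training split, so its numbers differ from the notebook's model. It compares `feature_log_prob_` with the Laplace-smoothed estimate P(Sunny | Yes) = (8 + α) / (10 + 2α), using the default α = 1.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Same 16 rows as the notebook's `data` dict, written compactly
df = pd.DataFrame({
    'Weather': 'Sunny Rainy Sunny Sunny Rainy Rainy Sunny Rainy Sunny Sunny Rainy Rainy Sunny Rainy Sunny Rainy'.split(),
    'Discount': 'Yes No No Yes Yes No No Yes Yes No No Yes Yes No No Yes'.split(),
    'Buy': 'Yes No Yes Yes No No Yes Yes Yes Yes No No Yes No Yes Yes'.split(),
})
X = OrdinalEncoder().fit_transform(df[['Weather', 'Discount']])  # Rainy/No -> 0, Sunny/Yes -> 1
y = df['Buy']

# Fit on all rows (not the training split) so the counts are easy to check by hand
model = CategoricalNB(alpha=1.0)  # alpha=1.0 is the default: Laplace (add-one) smoothing
model.fit(X, y)

# Class priors: plain relative frequencies, in classes_ order (No, Yes)
priors = np.exp(model.class_log_prior_)

# feature_log_prob_[i][c, t] = log P(feature i takes category t | class c)
weather_probs = np.exp(model.feature_log_prob_[0])  # rows: No, Yes; columns: Rainy, Sunny

# The same estimate by hand:
# P(Sunny | Yes) = (count(Sunny and Yes) + alpha) / (count(Yes) + alpha * n_weather_categories)
alpha = 1.0
p_sunny_given_yes = (8 + alpha) / (10 + alpha * 2)

print(model.classes_, priors)
print(weather_probs)
print(p_sunny_given_yes, weather_probs[1, 1])
```

Smoothing is what keeps P(Sunny | No) at 1/8 rather than 0, even though no Sunny row is labelled No.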
5. Make predictions
In [12]:
y_pred = model.predict(X_test)
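`predict` returns the class with the largest posterior, which is proportional to P(class) × Π P(feature | class) and normalised over the classes. The sketch below recomputes that posterior from the fitted `class_log_prior_` and `feature_log_prob_` and compares it with `predict_proba`. The helper `manual_posterior` is a hypothetical name introduced here, and the model is again fit on all 16 rows for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Same 16 rows as the notebook's `data` dict, written compactly
df = pd.DataFrame({
    'Weather': 'Sunny Rainy Sunny Sunny Rainy Rainy Sunny Rainy Sunny Sunny Rainy Rainy Sunny Rainy Sunny Rainy'.split(),
    'Discount': 'Yes No No Yes Yes No No Yes Yes No No Yes Yes No No Yes'.split(),
    'Buy': 'Yes No Yes Yes No No Yes Yes Yes Yes No No Yes No Yes Yes'.split(),
})
X = OrdinalEncoder().fit_transform(df[['Weather', 'Discount']]).astype(int)
y = df['Buy']

# Fit on all rows for illustration (the notebook fits on the training split only)
model = CategoricalNB().fit(X, y)

def manual_posterior(model, x):
    """Hypothetical helper: P(class | x) from a fitted CategoricalNB's log-probabilities."""
    # Start from log P(class), then add log P(x_i | class) for every feature
    log_joint = model.class_log_prior_.copy()
    for i, category in enumerate(x):
        log_joint += model.feature_log_prob_[i][:, category]
    # Shift by the max for numerical stability, exponentiate, then normalise
    log_joint -= log_joint.max()
    probs = np.exp(log_joint)
    return probs / probs.sum()

# A rainy day with a discount: Weather=0 (Rainy), Discount=1 (Yes)
x = np.array([0, 1])
print(model.classes_, manual_posterior(model, x))
print(model.predict_proba(x.reshape(1, -1))[0])
```

For this input the two classes come out close (about 0.57 for No versus 0.43 for Yes), even though the prior favours Yes. Rainy is much more common among No rows than among Yes rows, and the model multiplies the per-feature probabilities independently. It never looks at the Weather and Discount combination directly, and the four Rainy + discount rows are in fact split 2/2.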
6. Evaluate and print results
In [5]:
print(f"True Labels: {y_test.tolist()}")
print(f"Predictions: {y_pred.tolist()}")
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100}%")
True Labels: ['Yes', 'No', 'No', 'Yes', 'No']
Predictions: ['Yes', 'No', 'No', 'Yes', 'No']
Accuracy: 100.0%
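A 100% score on five test rows says little on its own. One common alternative is leave-one-out cross-validation: fit on 15 rows, score the single held-out row, and repeat for all 16. This is sketched below as an assumption, not part of the original workflow. On this data it should get 12 of the 16 rows right, because it misclassifies each of the four Rainy + discount rows, which are split 2/2 between Yes and No.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Same 16 rows as the notebook's `data` dict, written compactly
df = pd.DataFrame({
    'Weather': 'Sunny Rainy Sunny Sunny Rainy Rainy Sunny Rainy Sunny Sunny Rainy Rainy Sunny Rainy Sunny Rainy'.split(),
    'Discount': 'Yes No No Yes Yes No No Yes Yes No No Yes Yes No No Yes'.split(),
    'Buy': 'Yes No Yes Yes No No Yes Yes Yes Yes No No Yes No Yes Yes'.split(),
})
# Encoding does not use the labels, so fitting it once on all rows is harmless here
X = OrdinalEncoder().fit_transform(df[['Weather', 'Discount']])
y = df['Buy']

# Each fold trains on 15 rows and scores (accuracy) the single held-out row,
# so every entry of `scores` is either 0.0 or 1.0
scores = cross_val_score(CategoricalNB(), X, y, cv=LeaveOneOut())
print(f"{len(scores)} folds, mean accuracy: {scores.mean() * 100:.1f}%")
```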
In [13]:
# Calculate the Confusion Matrix
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
print("Confusion Matrix Array:")
print(cm)
# Plot the Confusion Matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_)
disp.plot(cmap=plt.cm.Blues)
# Add a title, then show the plot
plt.title("Confusion Matrix for Buy Predictions")
plt.show()
Confusion Matrix Array:
[[3 0]
 [0 2]]
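In this matrix, rows are the true labels and columns are the predicted labels, both in `model.classes_` order (No, Yes). Correct predictions therefore sit on the diagonal. The sketch below derives accuracy and per-class recall and precision from such a matrix. `summarize_confusion` is a hypothetical helper. The second matrix is made-up input for comparison, not a result from this notebook.

```python
import numpy as np

def summarize_confusion(cm):
    """Hypothetical helper: accuracy plus per-class recall and precision.

    Assumes rows are true labels and columns are predicted labels, as returned
    by sklearn.metrics.confusion_matrix, and that no row or column sums to zero.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)                 # diagonal = correctly classified counts
    accuracy = correct.sum() / cm.sum()
    recall = correct / cm.sum(axis=1)     # per true class (row totals)
    precision = correct / cm.sum(axis=0)  # per predicted class (column totals)
    return accuracy, recall, precision

# The matrix printed above, classes ordered ['No', 'Yes']
notebook_cm = [[3, 0],
               [0, 2]]
print(summarize_confusion(notebook_cm))

# A made-up matrix with one error in each direction, for comparison
example_cm = [[2, 1],
              [1, 3]]
print(summarize_confusion(example_cm))
```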
