Logistic Regression is one of the most popular and beginner-friendly machine learning algorithms. Despite having the word “regression” in its name, it is actually used for classification — meaning it predicts which category something belongs to, not a number.
The most common use case is binary classification: predicting one of two outcomes, such as:
- Yes or No
- Spam or Not Spam
- Pass or Fail
- Buy insurance or Don’t buy insurance
A Real-World Example: Insurance Purchase Prediction
Let’s make this concrete. Imagine you work for an insurance company and want to predict: “Will this person buy insurance based on their age?”
Looking at your customer data, a clear pattern emerges:
- Young people (18–30) rarely buy insurance — they feel healthy and invincible
- Middle-aged people (40–55) start thinking about it
- Older people (60+) almost always buy insurance — they’re more aware of health risks
This is a perfect problem for Logistic Regression. You give it a person’s age, and it tells you: “There’s an 82% chance this person will buy insurance.”
How is Logistic Regression Different from Linear Regression?
This is the most common point of confusion for beginners, so let’s clear it up.
Linear Regression predicts a continuous number — for example, predicting someone’s salary based on years of experience. The output can be any value: 30,000, 75,500, 120,000, and so on.
Logistic Regression predicts a probability between 0 and 1 — for example, the probability that someone buys insurance. The output is always between 0% and 100%.
| | Linear Regression | Logistic Regression |
|---|---|---|
| Output | Any number (e.g. 52,000) | Probability between 0 and 1 |
| Used for | Predicting quantities | Predicting categories |
| Example | Predict house price | Predict if someone buys insurance |
| Decision | The number itself | If probability ≥ 0.5 → Yes |
The Secret Ingredient: The Sigmoid Function
So how does Logistic Regression produce a probability? It uses a mathematical curve called the Sigmoid Function (also called the S-curve).
Here’s the intuition without the heavy math:
- Feed in any number (in our example, a score the model computes from a person’s age)
- The Sigmoid Function squashes that number into a value between 0 and 1
- That value becomes the predicted probability
Visually, the sigmoid curve looks like a stretched S:
- Ages on the left (young people) → curve stays near 0 → unlikely to buy
- Ages on the right (older people) → curve rises toward 1 → likely to buy
- The middle of the S is the decision boundary — the age at which the model is 50/50
This S-shape is exactly what makes Logistic Regression so well-suited to our insurance example. The transition from “probably won’t buy” to “probably will buy” is gradual and realistic — not a sudden cliff.
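The squashing behaviour described above can be sketched in a few lines of Python. This is a minimal illustration rather than any library's implementation; the function name `sigmoid` and the sample inputs are chosen here purely for demonstration.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value strictly between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

# Illustrative inputs: strongly negative, zero, strongly positive
for z in (-6, -2, 0, 2, 6):
    print(f"sigmoid({z:>2}) = {sigmoid(z):.4f}")
```

An input of 0 lands exactly on 0.5, which is the middle of the S; inputs far to either side approach 0 or 1 without ever reaching them.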
How Does the Model Make a Final Decision?
After the Sigmoid Function produces a probability, the model applies a simple rule called a decision threshold — usually set at 0.5:
- If predicted probability ≥ 0.5 → Predict Yes (will buy insurance)
- If predicted probability < 0.5 → Predict No (won’t buy insurance)
So if a 58-year-old customer gets a probability score of 0.79, the model says: Yes, they will likely buy insurance.
You can also adjust this threshold depending on the problem. A hospital screening for a rare disease might lower it to 0.3 to catch more potential cases — accepting more false alarms to avoid missing real ones.
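The threshold rule can be written as a one-line function. The sketch below is illustrative only: the name `classify` and the borderline score of 0.35 are assumptions made for this example, while 0.79 and the lowered threshold of 0.3 come from the text above.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (Yes) if the probability meets the threshold, otherwise 0 (No)."""
    return int(probability >= threshold)

# The 58-year-old from the text, with a predicted probability of 0.79
print(classify(0.79))                  # judged with the default threshold of 0.5

# A hypothetical borderline score judged under two different thresholds
print(classify(0.35))                  # default threshold of 0.5
print(classify(0.35, threshold=0.3))   # lowered threshold, as in the screening example
```

The same probability can produce a different final answer once the threshold moves, which is why the threshold is a decision about the problem and not something the model learns.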
What Does “Training” the Model Mean?
When we train a Logistic Regression model, we show it hundreds or thousands of past examples — people whose ages we know, along with whether they actually bought insurance or not.
The model learns from these examples by adjusting its internal settings (called weights or coefficients) until its predictions match the known outcomes as closely as possible. This learning is carried out by an optimization algorithm, classically Gradient Descent, which iteratively nudges the weights in the direction that reduces prediction error. Libraries often use faster optimizers built on the same idea.
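Once trained, the model has essentially learned: “For every extra year of age, the odds of buying insurance are multiplied by roughly X.” The change in probability itself is not constant: it is largest near the middle of the S-curve and small at the extremes.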
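The weight-nudging loop described above can be sketched with NumPy. This is a bare-bones batch gradient descent under stated assumptions: it uses only the first ten rows of the insurance dataset that appears later in this article, standardizes age so that one fixed step size works, and uses a hypothetical learning rate of 0.1 for 500 iterations with no regularization. scikit-learn's `LogisticRegression` uses a different default solver (`lbfgs`) with L2 regularization, so its coefficients will not match this sketch.

```python
import numpy as np

# First ten rows of the insurance dataset used later in this article
ages = np.array([22, 25, 47, 52, 46, 56, 55, 60, 62, 61], dtype=float)
bought = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1], dtype=float)

# Standardize age so a single fixed learning rate behaves well (assumption of this sketch)
x = (ages - ages.mean()) / ages.std()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b):
    """Average prediction error (cross-entropy) for weight w and intercept b."""
    p = sigmoid(w * x + b)
    return -np.mean(bought * np.log(p) + (1 - bought) * np.log(1 - p))

w, b = 0.0, 0.0          # start with no knowledge
learning_rate = 0.1      # hypothetical step size
initial_loss = log_loss(w, b)

for _ in range(500):
    p = sigmoid(w * x + b)               # current predicted probabilities
    grad_w = np.mean((p - bought) * x)   # gradient of the loss w.r.t. the weight
    grad_b = np.mean(p - bought)         # gradient of the loss w.r.t. the intercept
    w -= learning_rate * grad_w          # step against the gradient
    b -= learning_rate * grad_b

final_loss = log_loss(w, b)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}, weight: {w:.3f}, intercept: {b:.3f}")
```

The loss falls as the loop runs, and the learned weight ends up positive, reflecting that older customers in this sample are more likely to buy.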
Key Concepts Summarized
| Term | Plain English Meaning |
|---|---|
| Logistic Regression | An algorithm that predicts the probability of a yes/no outcome |
| Sigmoid Function | The S-shaped curve that converts any number into a probability (0–1) |
| Decision Threshold | The cutoff (usually 0.5) that turns a probability into a final yes/no answer |
| Weights / Coefficients | Numbers the model learns that describe how much each feature matters |
| Binary Classification | Any problem with exactly two possible outcomes |
| Training | Showing the model past examples so it can learn patterns |
Why Use Logistic Regression?
✅ Simple and fast — trains in seconds even on large datasets
✅ Highly interpretable — you can actually understand why it made a prediction
✅ Outputs probabilities — not just yes/no, but how confident the model is
✅ Works well as a baseline — always a great first model to try before more complex ones
✅ Widely used in industry — from credit scoring to medical diagnosis to marketing
Key Limitations to Know
The main limitation of Logistic Regression is that it assumes a linear relationship between the features and the log-odds of the outcome. In our insurance example, it assumes that each additional year of age shifts the log-odds of buying insurance by the same amount, so the predicted probability can only rise steadily (or only fall steadily) with age.
In reality, relationships can be more complex and non-linear — for example, perhaps people aged 35–45 specifically avoid insurance due to cost pressures, creating a dip in the middle. For those situations, more advanced algorithms like Decision Trees or Neural Networks capture the complexity better.
Logistic Regression: Binary
Can we predict whether a person will buy life insurance from a single variable (age) using logistic regression?
This is a binary logistic regression problem, as there are only two possible outcomes (the person buys insurance or doesn’t).
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
Data collection
data = {
    "age": [
        22, 25, 47, 52, 46, 56, 55, 60, 62, 61,
        18, 28, 27, 29, 49, 55, 25, 58, 19, 18,
        21, 26, 40, 45, 50, 54, 23, 27, 25, 19,
        42, 44, 68, 30, 68
    ],
    "bought_insurance": [
        0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
        0, 0, 0, 0, 1, 1, 1, 1, 0, 0,
        0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
        1, 1, 1, 0, 1
    ]
}
df = pd.DataFrame(data)
print(df)
    age  bought_insurance
0    22                 0
1    25                 0
2    47                 1
3    52                 0
4    46                 1
5    56                 1
6    55                 0
7    60                 1
8    62                 1
9    61                 1
10   18                 0
11   28                 0
12   27                 0
13   29                 0
14   49                 1
15   55                 1
16   25                 1
17   58                 1
18   19                 0
19   18                 0
20   21                 0
21   26                 0
22   40                 1
23   45                 1
24   50                 1
25   54                 1
26   23                 0
27   27                 0
28   25                 0
29   19                 0
30   42                 1
31   44                 1
32   68                 1
33   30                 0
34   68                 1
Exploratory data analysis (EDA)
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   age               35 non-null     int64
 1   bought_insurance  35 non-null     int64
dtypes: int64(2)
memory usage: 692.0 bytes
None
print(df.describe())
             age  bought_insurance
count  35.000000         35.000000
mean   39.828571          0.514286
std    16.243525          0.507093
min    18.000000          0.000000
25%    25.000000          0.000000
50%    42.000000          1.000000
75%    54.500000          1.000000
max    68.000000          1.000000
# Let's see how buying insurance relates to age
plt.scatter(df['age'],
            df['bought_insurance'],
            marker='+',
            color='green'
            )
# Labels and title
plt.xlabel("Age")
plt.ylabel("Bought insurance (1 = yes, 0 = no)")
plt.title("Scatter plot of age versus buying insurance")
plt.show()
Observation: Generally, people over 40 buy insurance and younger people do not.
The shift from mostly 0s to mostly 1s as age increases is the kind of pattern an S-shaped sigmoid curve can model.
Split data into train and test
from sklearn.model_selection import train_test_split
X = df[['age']]  # feature matrix: independent variable
y = df['bought_insurance']  # dependent variable (label / response variable)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)
# I want to see the shape of my training/testing dataset
print(X_train.shape)
print(X_test.shape)
(28, 1)
(7, 1)
So 28 records are used for training and 7 for testing. This is not a large dataset, but it is enough for a proof of concept.
# Now I want to see the testing dataset: feature space
print(X_test)
    age
26   23
13   29
24   50
21   26
15   55
29   19
19   18
Model training
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()  # Create the model with default settings
model.fit(X_train, y_train)  # Train (fit) the model on the training data
LogisticRegression()
Model Evaluation
Let's make a prediction on one data point, then see how the model performs on the test dataset.
# Let's see whether someone aged 25 would buy insurance
age = 25
age_2Darray = np.array([[age]]) # Convert to 2D array
prediction = model.predict_proba(age_2Darray)
print(prediction) # 1st number is prob of not buying, 2nd number is prob of buying
probability = prediction[0, 1] # Probability of buying insurance
print(f"Probability of buying insurance for age {age}: {probability}")
predicted_value = model.predict(age_2Darray)
print(f"Prediction for a {age}-year-old: {predicted_value[0]}")
[[0.84755852 0.15244148]]
Probability of buying insurance for age 25: 0.1524414794385959
Prediction for a 25-year-old: 0
print(X_test)
print("#############")
print(y_test)
    age
26   23
13   29
24   50
21   26
15   55
29   19
19   18
#############
26    0
13    0
24    1
21    0
15    1
29    0
19    0
Name: bought_insurance, dtype: int64
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(f"y predicted: {y_pred}")
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
y predicted: [0 0 1 0 1 0 0]
Accuracy: 1.0
print("Classification Report:\n", classification_report(y_test, y_pred))
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         2

    accuracy                           1.00         7
   macro avg       1.00      1.00      1.00         7
weighted avg       1.00      1.00      1.00         7
# Plot the confusion matrix: large counts on the diagonal mean correct predictions
import seaborn as sns
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
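To make the heatmap's four cells concrete, the counts can be tallied by hand. The sketch below copies the actual and predicted test labels from the outputs shown earlier; the variable names are chosen for this example. The layout `[[TN, FP], [FN, TP]]` follows scikit-learn's convention of actual classes in rows and predicted classes in columns.

```python
# Actual and predicted labels copied from the test-set outputs shown earlier
actual = [0, 0, 1, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 1, 0, 0]

pairs = list(zip(actual, predicted))
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly predicted "did not buy"
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # predicted "buy" but did not
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # predicted "not buy" but did
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly predicted "bought"

matrix = [[tn, fp], [fn, tp]]
accuracy = (tp + tn) / len(pairs)
print(matrix)
print(f"Accuracy: {accuracy}")
```

With only seven test records, a perfect score here says little about how the model would perform on new customers; it mainly confirms that the pipeline works end to end.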
Let's put the actual and predicted values side by side so they are easier to compare.
# Create a DataFrame to compare actual with predicted values
comparison_df = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred,
})
print(comparison_df)
    Actual  Predicted
26       0          0
13       0          0
24       1          1
21       0          0
15       1          1
29       0          0
19       0          0
# Show the predicted probabilities for the test dataset
print(model.predict_proba(X_test))  # 1st value is the probability of class 0, 2nd value is the probability of class 1
[[0.87908048 0.12091952]
 [0.76481019 0.23518981]
 [0.16293194 0.83706806]
 [0.82941586 0.17058414]
 [0.09054462 0.90945538]
 [0.92553864 0.07446136]
 [0.93426826 0.06573174]]
In each row above, the first number is the probability of class 0 (not buying) and the second is the probability of class 1 (buying).
Sigmoid curve
Let's generate the sigmoid curve.
# Generate smooth age values for the sigmoid curve
age_range = np.linspace(df['age'].min(), df['age'].max(), 50).reshape(-1, 1)# create 50 ages
age_range = np.round(age_range,0)
print(f"age_range: {age_range.flatten()}")
# Predict probabilities of above artificially generated ages using the trained model
probabilities = model.predict_proba(age_range)[:, 1] # Probability of class 1, i.e. buying insurance
print(f"probabilities: {probabilities}")
# Plot the scatter plot
plt.scatter(df['age'], df['bought_insurance'], marker='+', color='green', label="Actual Data")
# Plot the sigmoid curve(probability curve)
plt.scatter(age_range, probabilities, color='blue', s=5, label="Sigmoid Curve")
# Labels and title
plt.xlabel("Age")
plt.ylabel("Probability of Buying Insurance")
plt.title("Logistic Regression Sigmoid Curve")
plt.legend()
plt.show()
age_range: [18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68.]
probabilities: [0.06573174 0.07446136 0.08424578 0.09518368 0.10737518 0.12091952 0.13591221 0.15244148 0.17058414 0.19040096 0.21193167 0.23518981 0.26015777 0.28678244 0.31497187 0.34459343 0.37547384 0.4074014 0.44013055 0.47338862 0.50688454 0.54031876 0.57339396 0.60582546 0.63735079 0.69678956 0.72434993 0.75030292 0.77457283 0.79712163 0.81794507 0.83706806 0.85453959 0.87042762 0.88481418 0.89779096 0.90945538 0.91990738 0.92924672 0.93757093 0.94497379 0.95154418 0.95736543 0.96251489 0.96706379 0.97107726 0.9746145 0.97772906 0.98046915 0.98287802]
Let's check whether a 25-year-old person is going to buy insurance.
age = 25
age_2Darray = np.array([[age]]) # Convert to 2D array
prediction = model.predict_proba(age_2Darray)
print(prediction) # 1st number is prob of not buying, 2nd number is prob of buying
probability = prediction[0, 1] # Probability of buying insurance
print(f"Probability of buying insurance for age {age}: {probability}")
print(f"Probability of NOT buying insurance for age {age}: {1 - probability}")
predicted_value = model.predict(age_2Darray)
print(f"Prediction for a {age}-year-old: {predicted_value[0]}")
[[0.84755852 0.15244148]]
Probability of buying insurance for age 25: 0.1524414794385959
Probability of NOT buying insurance for age 25: 0.8475585205614041
Prediction for a 25-year-old: 0
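The model predicts that this person would NOT buy insurance: the probability of buying (about 0.15) is below the 0.5 threshold, and the probability of not buying is about 0.85.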
For a single predictor variable ($X$), the logistic (sigmoid) function relates the probability $p(X)$ to the coefficients $\beta_0$ and $\beta_1$ as follows:
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
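- $p(X)$ is the probability that an observation with predictor value $X$ belongs to the positive class (here, buys insurance)
- $\beta_0$ (the intercept) and $\beta_1$ (the coefficient for age) are the values found by the LogisticRegression fit above.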
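Rearranging the formula shows what the coefficients mean. The short derivation below is standard algebra and not specific to this notebook: dividing $p(X)$ by $1 - p(X)$ gives the odds, and taking the logarithm gives a straight line in $X$.

$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \quad\Longrightarrow\quad \log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

Each one-unit increase in $X$ therefore adds $\beta_1$ to the log-odds, which is the same as multiplying the odds by $e^{\beta_1}$.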
b0 = model.intercept_
b1 = model.coef_
print(b0)
print(b1)
[-5.06773102] [[0.13408608]]
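These two numbers can be read as an odds multiplier. The sketch below hard-codes the coefficients as printed above (rounded to the displayed digits) and compares how the probability and the odds change for one extra year of age; the helper names `prob_buy` and `odds` and the three sample ages are assumptions made for this illustration.

```python
import math

b0 = -5.06773102   # intercept, as printed above
b1 = 0.13408608    # coefficient for age, as printed above

def prob_buy(age: float) -> float:
    """Probability of buying insurance under the fitted coefficients."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)

odds_ratio_per_year = math.exp(b1)
print(f"Odds multiplier per extra year of age: {odds_ratio_per_year:.4f}")

# Compare one-year steps at a young, a middle, and an older age
for age in (25, 38, 60):
    p_now, p_next = prob_buy(age), prob_buy(age + 1)
    print(f"age {age}->{age + 1}: probability change = {p_next - p_now:+.4f}, "
          f"odds ratio = {odds(p_next) / odds(p_now):.4f}")
```

The odds ratio is the same at every age, while the change in probability is largest near the middle of the curve and smaller at both ends.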
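import math

def logistic_function(z):
    """Logistic (sigmoid) function from the formula above."""
    return math.exp(z) / (1 + math.exp(z))

# b0 has shape (1,) and b1 has shape (1, 1), so extract a plain Python float
z = (b0 + b1 * age).item()
p_buy = logistic_function(z)  # named p_buy to avoid overwriting the label series y
print(f"prob of NOT buying: {1-p_buy}")
print(f"prob of buying: {p_buy}")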
prob of NOT buying: 0.8475585205614041
prob of buying: 0.1524414794385959
Good. These values match the ones computed by the model above.
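The same coefficients also give the age at which the model is exactly 50/50. Since $p(X) = 0.5$ when $\beta_0 + \beta_1 X = 0$, the boundary sits at $X = -\beta_0 / \beta_1$. The sketch below hard-codes the coefficients as printed earlier (rounded to the displayed digits); the variable names are chosen for this example.

```python
import math

b0 = -5.06773102   # intercept, as printed earlier
b1 = 0.13408608    # coefficient for age, as printed earlier

# p = 0.5 exactly when b0 + b1 * age = 0, so the boundary is at age = -b0 / b1
boundary_age = -b0 / b1

# Sanity check: plug the boundary age back into the logistic function
p_at_boundary = 1.0 / (1.0 + math.exp(-(b0 + b1 * boundary_age)))

print(f"Age at which the predicted probability is 0.5: {boundary_age:.1f}")
print(f"Probability at that age: {p_at_boundary:.3f}")
```

This agrees with the probabilities printed for the sigmoid curve, where the predicted probability crosses 0.5 between ages 37 and 38.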
