Everyone talks about machine learning, but how do you actually build a model? Not in theory, but in practice, with real code you can run on your machine right now.
In this tutorial, we'll build a complete ML classification model from scratch using Python and scikit-learn. We'll load a real dataset, prepare the data, train a model, evaluate its performance, and understand what's happening at each step.
Prerequisites: Basic Python knowledge. That's it. No math degree required.
Setting Up Your Environment
First, make sure you have Python 3.8+ installed. Then install the libraries we need:
# Create a virtual environment (recommended)
python -m venv ml_env
source ml_env/bin/activate # On Windows: ml_env\Scripts\activate
# Install required packages
pip install scikit-learn pandas matplotlib numpy
The Dataset: Iris Flowers
We'll use the Iris dataset — the "hello world" of machine learning. It contains measurements of 150 iris flowers across 3 species. The goal: given the flower's measurements, predict which species it is.
Each flower has 4 features:
- Sepal length (cm)
- Sepal width (cm)
- Petal length (cm)
- Petal width (cm)
And 3 possible classes: Setosa, Versicolor, and Virginica.
Step 1: Loading the Data
from sklearn.datasets import load_iris
import pandas as pd
# Load the dataset
iris = load_iris()
# Create a DataFrame for easy viewing
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = [iris.target_names[i] for i in iris.target]
# Let's see what we're working with
print(f"Dataset shape: {df.shape}")
print(f"Species: {list(iris.target_names)}")
print(f"\nFirst 5 rows:")
print(df.head())
Output:
Dataset shape: (150, 5)
Species: ['setosa', 'versicolor', 'virginica']
First 5 rows:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
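Before splitting, it can help to confirm how many flowers each species has. The following is a minimal sketch that rebuilds the same DataFrame and counts the labels; the variable name species_counts is only illustrative.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the numeric labels (0, 1, 2) to species names in one indexing step
df["species"] = iris.target_names[iris.target]

# Count how many flowers belong to each species
species_counts = df["species"].value_counts()
print(species_counts)

# Summary statistics (mean, min, max, ...) for the four measurements
print(df.describe())
```

Each species should show up 50 times, which is why the dataset is usually described as balanced.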
Step 2: Splitting the Data
This is one of the most important concepts in ML: you must test your model on data it has never seen during training. Otherwise, you measure how well it memorized the training examples, not how well it generalizes to new ones.
from sklearn.model_selection import train_test_split
# Features (X) and labels (y)
X = iris.data # The 4 measurements
y = iris.target # The species (0, 1, or 2)
# Split: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # 20% for testing
    random_state=42,    # Reproducible results
    stratify=y          # Preserve class proportions in both splits
)
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
stratify=y preserves the class proportions of the full dataset in both sets, so the training and test sets each keep the same 1:1:1 species ratio. Without it, a purely random split can leave one species over- or under-represented in the small test set.
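To see the effect of stratification, you can count the labels in each split. This is a small sketch that runs the split twice, once with stratify=y and once without; the variable names ending in _s and _r are just illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the tutorial, with stratification
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same split without stratification, for comparison
_, _, y_train_r, y_test_r = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# np.bincount counts how many samples fall into each class (0, 1, 2)
print("Stratified test counts:  ", np.bincount(y_test_s, minlength=3))
print("Unstratified test counts:", np.bincount(y_test_r, minlength=3))
```

The stratified test set holds exactly 10 flowers per species. The unstratified counts depend on the random seed and are generally not equal.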
Step 3: Training the Model
We'll use a Random Forest classifier. It's reliable, needs little tuning, and performs well on many tabular problems. Think of it as a committee of decision trees that vote on the answer.
from sklearn.ensemble import RandomForestClassifier
# Create the model
model = RandomForestClassifier(
    n_estimators=100,   # Use 100 decision trees
    random_state=42
)
# Train it on our training data
model.fit(X_train, y_train)
print("Model trained successfully!")
That's it. Two statements to create and train a model. Under the hood, scikit-learn builds 100 decision trees. With the default settings, each tree is trained on a random bootstrap sample of the training data and considers a random subset of the features at each split, learning patterns that distinguish the three species.
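The "committee" idea can be inspected directly. The sketch below retrains the same model and queries each tree through the fitted estimators_ attribute. Note that scikit-learn combines the trees by averaging their predicted class probabilities rather than by counting one hard vote per tree. The names per_tree, averaged, and forest_proba are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# The fitted trees are stored in model.estimators_
print(f"Number of trees: {len(model.estimators_)}")

# Ask every tree for its class probabilities on the first test flower
sample = X_test[:1]
per_tree = np.array(
    [tree.predict_proba(sample)[0] for tree in model.estimators_]
)

# Averaging the per-tree probabilities reproduces the forest's own output
averaged = per_tree.mean(axis=0)
forest_proba = model.predict_proba(sample)[0]
print("Averaged over trees: ", averaged)
print("Forest predict_proba:", forest_proba)
```

The two printed probability vectors should match, which shows that the forest's prediction is simply the average opinion of its trees.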
Step 4: Evaluating Performance
Now the moment of truth — how well does our model predict species it hasn't seen?
from sklearn.metrics import accuracy_score, classification_report
# Make predictions on the test set
y_pred = model.predict(X_test)
# Overall accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")
# Detailed breakdown per species
print("\nDetailed Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output:
Accuracy: 100.00%
Detailed Report:
precision recall f1-score support
setosa 1.00 1.00 1.00 10
versicolor 1.00 1.00 1.00 10
virginica 1.00 1.00 1.00 10
accuracy 1.00 30
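100% accuracy! Before you celebrate: the Iris dataset is relatively simple, and the test set holds only 30 flowers, so a single misclassification would drop the score by more than 3 percentage points. Real-world problems rarely achieve perfect scores, and your exact numbers may differ slightly depending on your scikit-learn version. Still, the result confirms that the pipeline runs end to end.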
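A confusion matrix is another way to read the same evaluation, because it shows which species get mistaken for which. The following sketch repeats the split and training steps so it runs on its own; it uses confusion_matrix from sklearn.metrics, and the variable name cm is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=0.2, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are the true species, columns are the predicted species,
# both in the order setosa, versicolor, virginica
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Correct predictions sit on the diagonal. Any off-diagonal entry is a misclassification, and each row sums to the 10 test flowers of that species.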
Step 5: Understanding the Model
A trained model isn't a black box. We can ask it: which features mattered most?
import numpy as np
# Get feature importance scores
importances = model.feature_importances_
# Sort by importance
indices = np.argsort(importances)[::-1]
print("Feature importance ranking:")
for i, idx in enumerate(indices):
    print(f" {i+1}. {iris.feature_names[idx]}: {importances[idx]:.4f}")
# Output:
# Feature importance ranking:
# 1. petal width (cm): 0.4414
# 2. petal length (cm): 0.4188
# 3. sepal length (cm): 0.1001
# 4. sepal width (cm): 0.0397
Interesting — petal measurements are far more important than sepal measurements for identifying iris species. The model discovered this pattern entirely on its own from the data.
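The feature_importances_ scores above are computed from the training data (they are impurity-based). As a cross-check, one option is permutation importance on the test set: shuffle one feature at a time and measure how much accuracy drops. This is a sketch, not part of the original pipeline; n_repeats=10 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=0.2, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Shuffle each feature 10 times and record the drop in test accuracy
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)

print("Permutation importance (mean drop in accuracy):")
for name, mean, std in zip(
    iris.feature_names, result.importances_mean, result.importances_std
):
    print(f"  {name}: {mean:.4f} +/- {std:.4f}")
```

If both methods rank the petal measurements highest, that is a stronger signal than either ranking alone.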
Step 6: Making Real Predictions
Now let's use our trained model to classify a new flower:
# A new flower we just measured
new_flower = [[5.8, 2.7, 4.1, 1.0]]
# Predict its species
prediction = model.predict(new_flower)
probabilities = model.predict_proba(new_flower)
species_name = iris.target_names[prediction[0]]
print(f"Predicted species: {species_name}")
print(f"Confidence: {max(probabilities[0]):.1%}")
print(f"\nProbabilities for each species:")
for name, prob in zip(iris.target_names, probabilities[0]):
    bar = "█" * int(prob * 30)
    print(f" {name:>12}: {prob:.1%} {bar}")
Output:
Predicted species: versicolor
Confidence: 94.0%
Probabilities for each species:
setosa: 0.0%
versicolor: 94.0% ████████████████████████████
virginica: 6.0% █
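The same calls also work on several flowers at once, since predict and predict_proba accept one row per sample. In the sketch below, the three measurement rows are illustrative inputs, not data taken from this tutorial's results.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=0.2, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Illustrative measurements: one row per flower, four features per row
new_flowers = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [5.8, 2.7, 4.1, 1.0],
    [6.7, 3.0, 5.2, 2.3],
])

predictions = model.predict(new_flowers)          # shape (3,)
probabilities = model.predict_proba(new_flowers)  # shape (3, 3)

for features, pred, proba in zip(new_flowers, predictions, probabilities):
    print(f"{features} -> {iris.target_names[pred]} ({proba.max():.1%})")
```

Each row of probabilities sums to 1, and the predicted class is the one with the highest probability in that row.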
What You Just Built
Let's recap the complete ML pipeline you just implemented:
- Loaded a real dataset with features and labels
- Split it into training and test sets (critical for honest evaluation)
- Trained a Random Forest model on the training data
- Evaluated performance on unseen test data
- Inspected which features the model relies on
- Used the model to make predictions on new data
This exact pipeline — load, split, train, evaluate, predict — applies to virtually every supervised ML problem. The model and dataset change, but the structure stays the same.
Next Steps
- Try different models: SVC, KNeighborsClassifier, GradientBoostingClassifier
- Use a more complex dataset from Kaggle
- Learn about cross-validation for more robust evaluation
- Explore hyperparameter tuning with GridSearchCV
- Move to deep learning with PyTorch when you're ready for neural networks
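As a starting point for the cross-validation suggestion, here is a minimal sketch using cross_val_score. Instead of one 80/20 split, it trains and scores the model five times on different folds. The choice of cv=5 is an assumption, not a recommendation from this tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation: train on 4 folds, score on the 5th, then rotate
scores = cross_val_score(model, X, y, cv=5)

print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.2%} (std {scores.std():.2%})")
```

The spread across folds gives a rough sense of how much a single train/test split can overstate or understate performance.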