Everyone talks about machine learning, but how do you actually build a model? Not in theory, but in practice, with real code you can run on your machine right now.

In this tutorial, we'll build a complete ML classification model end to end using Python and scikit-learn. We'll load a real dataset, prepare the data, train a model, evaluate its performance, and understand what's happening at each step.

Prerequisites: Basic Python knowledge. That's it. No math degree required.

Setting Up Your Environment

First, make sure you have Python 3.8+ installed. Then install the libraries we need:

# Create a virtual environment (recommended)
python -m venv ml_env
source ml_env/bin/activate  # On Windows: ml_env\Scripts\activate

# Install required packages
pip install scikit-learn pandas matplotlib numpy
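Before moving on, it can help to confirm that the packages actually landed in the active environment. The snippet below is a small sketch that uses only the standard library; the names REQUIRED and installed_versions are made up for this example and are not part of any of the libraries.

```python
# Sanity check: report the installed version of each required package.
from importlib.metadata import PackageNotFoundError, version

# The same package names passed to pip above
REQUIRED = ["scikit-learn", "pandas", "matplotlib", "numpy"]


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name:>12}: {ver if ver else 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, the virtual environment is probably not activated in the shell you are using.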

The Dataset: Iris Flowers

We'll use the Iris dataset, the "hello world" of machine learning. It contains measurements of 150 iris flowers, 50 from each of 3 species. The goal: given a flower's measurements, predict which species it is.

Each flower has 4 features:

  • Sepal length (cm)
  • Sepal width (cm)
  • Petal length (cm)
  • Petal width (cm)

And 3 possible classes: Setosa, Versicolor, and Virginica.

Step 1: Loading the Data

from sklearn.datasets import load_iris
import pandas as pd

# Load the dataset
iris = load_iris()

# Create a DataFrame for easy viewing
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = [iris.target_names[i] for i in iris.target]

# Let's see what we're working with
print(f"Dataset shape: {df.shape}")
print(f"Species: {list(iris.target_names)}")
print("\nFirst 5 rows:")
print(df.head())

Output:

Dataset shape: (150, 5)
Species: ['setosa', 'versicolor', 'virginica']

First 5 rows:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) species
0                5.1               3.5                1.4               0.2  setosa
1                4.9               3.0                1.4               0.2  setosa
2                4.7               3.2                1.3               0.2  setosa
3                4.6               3.1                1.5               0.2  setosa
4                5.0               3.6                1.4               0.2  setosa
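Two quick checks are worth running on any new dataset before training: how many examples each class has, and whether any values are missing. A minimal sketch, rebuilding the same DataFrame so it runs on its own (species_counts and missing_values are names chosen for this example):

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = [iris.target_names[i] for i in iris.target]

# Count how many flowers belong to each species
species_counts = df["species"].value_counts()
print(species_counts)

# Confirm there are no missing measurements anywhere in the table
missing_values = int(df.isna().sum().sum())
print(f"Missing values: {missing_values}")
```

Iris is balanced and complete, which is part of why it makes a gentle first dataset. Real datasets usually need more cleanup at this stage.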

Step 2: Splitting the Data

This is one of the most important ideas in ML: you must evaluate your model on data it never saw during training. Otherwise, you're measuring memorization, not the ability to generalize.

from sklearn.model_selection import train_test_split

# Features (X) and labels (y)
X = iris.data  # The 4 measurements
y = iris.target  # The species (0, 1, or 2)

# Split: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2,      # 20% for testing
    random_state=42,     # Reproducible results
    stratify=y           # Preserve class proportions in both sets
)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")

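stratify=y keeps the class proportions of the full dataset in both sets. Iris has 50 flowers per species, so the training set gets 40 of each and the test set gets 10 of each. Without it, a purely random split can leave one species over- or under-represented in the small test set, which skews the evaluation.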
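To see the effect of stratification directly, you can count the labels on each side of the split. This is a small self-contained sketch using the same split settings as above; train_counts and test_counts are names introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split settings as in the tutorial
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# np.bincount counts how often each label (0, 1, 2) occurs
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)

print(f"Training set per class: {train_counts}")
print(f"Test set per class:     {test_counts}")
```

Try removing stratify=y and rerunning: the per-class counts will generally no longer be identical.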

Step 3: Training the Model

We'll use a Random Forest classifier. It's reliable, needs little tuning, and performs well on many tabular datasets. Think of it as a committee of decision trees that vote on the answer.

from sklearn.ensemble import RandomForestClassifier

# Create the model
model = RandomForestClassifier(
    n_estimators=100,    # Use 100 decision trees
    random_state=42
)

# Train it on our training data
model.fit(X_train, y_train)

print("Model trained successfully!")

That's it: one statement to create the model and one to train it. Under the hood, scikit-learn builds 100 decision trees. Each tree is trained on a bootstrap sample of the training rows and considers a random subset of the features at each split, so the trees learn slightly different patterns for distinguishing the three species.
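The "committee" picture can be made concrete by asking each tree for its own answer. The sketch below retrains the same model so it runs standalone, then tallies per-tree predictions for one test flower. One caveat: scikit-learn's forest averages the trees' class probabilities rather than counting hard votes, so the tally is an illustration of the idea, not the exact internal computation.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

sample = X_test[:1]  # one flower, kept 2-D as scikit-learn expects

# Ask every fitted tree in the forest for its own prediction
tree_votes = [int(tree.predict(sample)[0]) for tree in model.estimators_]
vote_counts = Counter(str(iris.target_names[v]) for v in tree_votes)

forest_label = str(iris.target_names[model.predict(sample)[0]])

print(f"Number of trees: {len(model.estimators_)}")
print(f"Votes per species: {dict(vote_counts)}")
print(f"Forest prediction: {forest_label}")
```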

Step 4: Evaluating Performance

Now the moment of truth — how well does our model predict species it hasn't seen?

from sklearn.metrics import accuracy_score, classification_report

# Make predictions on the test set
y_pred = model.predict(X_test)

# Overall accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")

# Detailed breakdown per species
print("\nDetailed Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

Output:

Accuracy: 100.00%

Detailed Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00        10
   virginica       1.00      1.00      1.00        10

    accuracy                           1.00        30

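100% accuracy! Before you celebrate: Iris is a relatively simple dataset, and the test set has only 30 flowers, so a single mistake would move accuracy by about 3.3 percentage points. Your exact numbers may also differ slightly depending on your scikit-learn version. Real-world problems rarely achieve perfect scores, but a result in this range confirms that the pipeline works.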
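Because one 30-flower test set is a thin basis for a performance claim, a common way to get a steadier estimate is k-fold cross-validation, which repeats the train-and-test cycle on several different splits. A minimal sketch, assuming 5 stratified folds (the fold count and the variable names are choices made for this example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5 folds: each flower is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})")
```

The spread across folds gives a rough sense of how much a single split's score can move by chance.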

Step 5: Understanding the Model

A trained model isn't a black box. We can ask it: which features mattered most?

import numpy as np

# Get feature importance scores
importances = model.feature_importances_

# Sort by importance
indices = np.argsort(importances)[::-1]

print("Feature importance ranking:")
for i, idx in enumerate(indices):
    print(f"  {i+1}. {iris.feature_names[idx]}: {importances[idx]:.4f}")

Output:

Feature importance ranking:
  1. petal width (cm): 0.4414
  2. petal length (cm): 0.4188
  3. sepal length (cm): 0.1001
  4. sepal width (cm): 0.0397

Interesting — petal measurements are far more important than sepal measurements for identifying iris species. The model discovered this pattern entirely on its own from the data.
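The scores from feature_importances_ are impurity-based and derived from the training data, so they are best read as a rough ranking. One way to cross-check them is permutation importance: shuffle one feature at a time in the test set and measure how much accuracy drops. The sketch below uses scikit-learn's permutation_importance; n_repeats=10 is an arbitrary choice for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Shuffle each feature 10 times on the held-out data and record the accuracy drop
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)

print("Permutation importance (mean accuracy drop):")
for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"  {iris.feature_names[idx]}: {mean:.4f} +/- {std:.4f}")
```

If both methods rank the petal measurements on top, that is reasonable evidence the pattern is real. With correlated features such as petal length and petal width, individual scores can still shift between the two.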

Step 6: Making Real Predictions

Now let's use our trained model to classify a new flower:

# A new flower we just measured
new_flower = [[5.8, 2.7, 4.1, 1.0]]

# Predict its species
prediction = model.predict(new_flower)
# Class probabilities averaged over the 100 trees (not a calibrated confidence)
probabilities = model.predict_proba(new_flower)

species_name = iris.target_names[prediction[0]]

print(f"Predicted species: {species_name}")
print(f"Confidence: {max(probabilities[0]):.1%}")
print("\nProbabilities for each species:")
for name, prob in zip(iris.target_names, probabilities[0]):
    bar = "█" * int(prob * 30)
    print(f"  {name:>12}: {prob:.1%} {bar}")

Output:

Predicted species: versicolor
Confidence: 94.0%

Probabilities for each species:
        setosa: 0.0% 
    versicolor: 94.0% ████████████████████████████
     virginica: 6.0% █
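In practice you will often classify several flowers at once. The sketch below wraps the prediction logic in a small helper; classify_flowers is a name invented for this example, and the second and third rows are hypothetical measurements, not real observations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)


def classify_flowers(model, measurements, class_names):
    """Return a (species name, top probability) pair for each row."""
    labels = model.predict(measurements)
    probabilities = model.predict_proba(measurements)
    return [
        (str(class_names[label]), float(probs.max()))
        for label, probs in zip(labels, probabilities)
    ]


new_flowers = [
    [5.8, 2.7, 4.1, 1.0],  # the flower from the example above
    [5.0, 3.4, 1.5, 0.2],  # hypothetical measurements
    [6.7, 3.0, 5.2, 2.3],  # hypothetical measurements
]

results = classify_flowers(model, new_flowers, iris.target_names)
for row, (species, prob) in zip(new_flowers, results):
    print(f"{row} -> {species} ({prob:.1%})")
```

Each row must list the four measurements in the same order the model was trained on: sepal length, sepal width, petal length, petal width.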

What You Just Built

Let's recap the complete ML pipeline you just implemented:

  1. Loaded a real dataset with features and labels
  2. Split it into training and test sets (critical for honest evaluation)
  3. Trained a Random Forest model on the training data
  4. Evaluated performance on unseen test data
  5. Inspected which features the model relies on
  6. Used the model to make predictions on new data

This exact pipeline — load, split, train, evaluate, predict — applies to virtually every supervised ML problem. The model and dataset change, but the structure stays the same.
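To illustrate that the structure stays fixed while the model changes, here is a sketch that puts the split, train, and evaluate steps into one function and runs it with two different classifiers. run_pipeline is a hypothetical helper written for this example, and n_neighbors=5 is simply scikit-learn's default made explicit.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def run_pipeline(model, X, y, test_size=0.2, random_state=42):
    """Split the data, train the model, and return (fitted model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


X, y = load_iris(return_X_y=True)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

accuracies = {}
for name, candidate in candidates.items():
    _, accuracies[name] = run_pipeline(candidate, X, y)
    print(f"{name}: {accuracies[name]:.2%}")
```

Swapping in another scikit-learn classifier only requires adding an entry to the candidates dictionary, because every estimator exposes the same fit and predict methods.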

Next Steps

  • Try different models: SVC, KNeighborsClassifier, GradientBoostingClassifier
  • Use a more complex dataset from Kaggle
  • Learn about cross-validation for more robust evaluation
  • Explore hyperparameter tuning with GridSearchCV
  • Move to deep learning with PyTorch when you're ready for neural networks