TP3 - Build a Prediction API with FastAPI

Practical Lab 90 min Intermediate

Objectives

By the end of this lab, you will be able to:

  • Load a serialized ML model from Module 2 into a FastAPI application
  • Define Pydantic schemas for request validation and response serialization
  • Implement a /predict endpoint that serves real-time predictions
  • Implement a /health endpoint for service monitoring
  • Add proper error handling for common failure scenarios
  • Test the API using uvicorn and the auto-generated Swagger UI

Prerequisites

  • Completed TP2 (you should have a serialized model file model_v1.joblib)
  • Python 3.10+ installed
  • Basic understanding of REST APIs (Module 3 concepts)
No model from TP2?

If you haven't completed TP2, run this script to create a sample model:

# create_sample_model.py
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=4,
    n_redundant=1, random_state=42,
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

Path("models").mkdir(exist_ok=True)  # ensure the target directory exists
joblib.dump(model, "models/model_v1.joblib")
print("Model saved to models/model_v1.joblib")
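Whether the model comes from TP2 or the script above, it is worth confirming that the file deserializes cleanly before wiring it into the API. A quick round-trip sketch (using a throwaway temp directory rather than `models/`, and a smaller forest to keep it fast):

```python
# Sanity check: serialize a model with joblib, load it back, and confirm
# the loaded copy produces identical predictions.
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_v1.joblib"
    joblib.dump(model, path)
    loaded = joblib.load(path)
    # A faithful round trip must reproduce the original predictions exactly
    assert (loaded.predict(X) == model.predict(X)).all()
    print("round-trip OK")
```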

Architecture Overview


Step 1 — Project Setup

1.1 Create the Project Structure

mkdir -p fastapi-ml-api/app
mkdir -p fastapi-ml-api/models
cd fastapi-ml-api

1.2 Create a Virtual Environment

python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

1.3 Install Dependencies

pip install fastapi uvicorn pydantic scikit-learn joblib numpy

Create requirements.txt:

fastapi>=0.100.0
uvicorn>=0.23.0
pydantic>=2.0.0
scikit-learn>=1.3.0
joblib>=1.3.0
numpy>=1.24.0

1.4 Copy Your Model

Copy the model file from TP2 into the models/ directory:

cp /path/to/tp2/model_v1.joblib models/model_v1.joblib

Step 2 — Define Pydantic Schemas

Create app/schemas.py:

from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field


class PredictionInput(BaseModel):
    """Input features for the ML model."""

    age: int = Field(
        ...,
        ge=18,
        le=120,
        description="Applicant age in years",
        examples=[35],
    )
    income: float = Field(
        ...,
        gt=0,
        description="Annual income in USD",
        examples=[55000.0],
    )
    credit_score: int = Field(
        ...,
        ge=300,
        le=850,
        description="Credit score (FICO)",
        examples=[720],
    )
    employment_years: float = Field(
        ...,
        ge=0,
        description="Years of employment",
        examples=[8.5],
    )
    loan_amount: float = Field(
        ...,
        gt=0,
        description="Requested loan amount in USD",
        examples=[25000.0],
    )

    class Config:
        json_schema_extra = {
            "example": {
                "age": 35,
                "income": 55000.0,
                "credit_score": 720,
                "employment_years": 8.5,
                "loan_amount": 25000.0,
            }
        }


class PredictionOutput(BaseModel):
    """Prediction result from the ML model."""

    prediction: str = Field(..., description="Predicted class label")
    probability: float = Field(
        ...,
        ge=0,
        le=1,
        description="Prediction confidence (0 to 1)",
    )
    model_version: str = Field(..., description="Version of the model used")
    timestamp: datetime = Field(
        default_factory=datetime.utcnow,
        description="UTC timestamp of the prediction",
    )


class HealthResponse(BaseModel):
    """Health check response."""

    status: str = Field(..., description="Service status")
    model_loaded: bool = Field(..., description="Whether the model is loaded")
    model_version: str = Field(..., description="Current model version")
    timestamp: str = Field(..., description="Current UTC time")


class ErrorResponse(BaseModel):
    """Standard error response."""

    error_code: str = Field(..., description="Machine-readable error code")
    message: str = Field(..., description="Human-readable error message")
    details: Optional[list] = Field(None, description="Additional error details")
Why define schemas?
  1. Validation: FastAPI automatically rejects requests that don't match the schema
  2. Documentation: Swagger UI displays field descriptions, types, and constraints
  3. Serialization: Response data is automatically formatted to match the output schema
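To see point 1 in action outside of FastAPI, you can exercise a trimmed-down version of the lab's `PredictionInput` schema directly (a sketch; only two of the five fields are included):

```python
from pydantic import BaseModel, Field, ValidationError

class PredictionInput(BaseModel):
    age: int = Field(..., ge=18, le=120)
    income: float = Field(..., gt=0)

# Valid data is accepted and coerced to the declared types
ok = PredictionInput(age=35, income=55000.0)
print(ok.age)  # 35

# Out-of-range data raises ValidationError; FastAPI converts this into a 422
try:
    PredictionInput(age=-5, income=55000.0)
except ValidationError as e:
    print(len(e.errors()))  # 1
```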

Step 3 — Create the ML Service

Create app/ml_service.py:

import joblib
import numpy as np
from pathlib import Path


class MLService:
    """Handles model loading and inference."""

    def __init__(self):
        self.model = None
        self.model_version = "unknown"
        self.feature_names = [
            "age", "income", "credit_score",
            "employment_years", "loan_amount",
        ]

    def load_model(self, model_path: str) -> None:
        """Load a serialized model from disk."""
        path = Path(model_path)
        if not path.exists():
            raise FileNotFoundError(
                f"Model file not found: {model_path}"
            )

        self.model = joblib.load(path)
        self.model_version = path.stem
        print(f"[MLService] Model loaded: {self.model_version}")

    def predict(self, features: dict) -> dict:
        """
        Run inference on input features.
        Returns prediction label and probability.
        """
        if self.model is None:
            raise RuntimeError("Model is not loaded")

        # Feature order must match the order used at training time
        feature_array = np.array([[
            features["age"],
            features["income"],
            features["credit_score"],
            features["employment_years"],
            features["loan_amount"],
        ]])

        prediction = self.model.predict(feature_array)[0]
        probabilities = self.model.predict_proba(feature_array)[0]
        confidence = float(max(probabilities))

        label = "approved" if prediction == 1 else "denied"

        return {
            "prediction": label,
            "probability": round(confidence, 4),
            "model_version": self.model_version,
        }

    @property
    def is_ready(self) -> bool:
        return self.model is not None


# Singleton instance shared across the application
ml_service = MLService()

Step 4 — Build the FastAPI Application

Create app/main.py:

from contextlib import asynccontextmanager
from datetime import datetime

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from app.schemas import (
    PredictionInput,
    PredictionOutput,
    HealthResponse,
    ErrorResponse,
)
from app.ml_service import ml_service


# --- Lifespan: load model at startup ---
@asynccontextmanager
async def lifespan(app: FastAPI):
    try:
        ml_service.load_model("models/model_v1.joblib")
    except FileNotFoundError as e:
        print(f"[WARNING] {e}. API will start in degraded mode.")
    yield
    print("[INFO] Shutting down API...")


# --- FastAPI App ---
app = FastAPI(
    title="Loan Prediction API",
    description="ML-powered loan approval prediction service built in TP3",
    version="1.0.0",
    lifespan=lifespan,
    openapi_tags=[
        {
            "name": "Predictions",
            "description": "Submit features and receive ML predictions",
        },
        {
            "name": "System",
            "description": "Health checks and service monitoring",
        },
    ],
)

# --- CORS Middleware ---
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)


# --- Health Check ---
@app.get(
    "/health",
    response_model=HealthResponse,
    tags=["System"],
    summary="Check service health",
)
def health_check():
    """Returns the current health status of the API and model."""
    return HealthResponse(
        status="healthy" if ml_service.is_ready else "degraded",
        model_loaded=ml_service.is_ready,
        model_version=ml_service.model_version,
        timestamp=datetime.utcnow().isoformat(),
    )


# --- Prediction Endpoint ---
@app.post(
    "/api/v1/predict",
    response_model=PredictionOutput,
    responses={
        422: {"model": ErrorResponse, "description": "Validation error"},
        503: {"model": ErrorResponse, "description": "Model not available"},
        500: {"model": ErrorResponse, "description": "Prediction failed"},
    },
    tags=["Predictions"],
    summary="Get a loan approval prediction",
)
def predict(input_data: PredictionInput):
    """
    Submit loan application features and receive a prediction.

    The model returns:
    - **prediction**: "approved" or "denied"
    - **probability**: confidence score between 0 and 1
    - **model_version**: which model version produced the result
    """
    # Check model availability
    if not ml_service.is_ready:
        raise HTTPException(
            status_code=503,
            detail="Model is not loaded. Service is in degraded mode.",
        )

    # Run prediction
    try:
        features = input_data.model_dump()
        result = ml_service.predict(features)

        return PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
            timestamp=datetime.utcnow(),
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Prediction failed: {str(e)}",
        )


# --- Root ---
@app.get("/", tags=["System"])
def root():
    """API root: returns basic service information."""
    return {
        "service": "Loan Prediction API",
        "version": "1.0.0",
        "docs": "/docs",
        "health": "/health",
    }

Step 5 — Run and Test

5.1 Start the Server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

You should see:

[MLService] Model loaded: model_v1
INFO: Uvicorn running on http://0.0.0.0:8000
INFO: Started reloader process

5.2 Access Swagger Documentation

Open your browser and navigate to: http://localhost:8000/docs

You should see the interactive Swagger UI with:

  • Predictions tag → POST /api/v1/predict
  • System tag → GET /health, GET /

5.3 Test the Health Endpoint

curl http://localhost:8000/health

Expected response:

{
  "status": "healthy",
  "model_loaded": true,
  "model_version": "model_v1",
  "timestamp": "2026-02-23T14:30:00.000000"
}

5.4 Test the Prediction Endpoint

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
        "age": 35,
        "income": 55000,
        "credit_score": 720,
        "employment_years": 8.5,
        "loan_amount": 25000
      }'

Expected response:

{
  "prediction": "approved",
  "probability": 0.87,
  "model_version": "model_v1",
  "timestamp": "2026-02-23T14:30:05.123456"
}

5.5 Test Validation Errors

Send invalid data to verify Pydantic validation:

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
        "age": -5,
        "income": 55000,
        "credit_score": 720,
        "employment_years": 8,
        "loan_amount": 25000
      }'

Expected: 422 Unprocessable Entity with details about the age field.

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
        "age": 35,
        "income": 55000
      }'

Expected: 422 with details about missing required fields.


Step 6 — Test with Swagger UI

  1. Open http://localhost:8000/docs in your browser
  2. Click on POST /api/v1/predict
  3. Click Try it out
  4. The example JSON is pre-filled from your schema
  5. Click Execute
  6. Observe the response code, body, and headers
Use Swagger for rapid testing

During development, Swagger UI is faster than writing curl commands. Use it to:

  • Test different inputs quickly
  • See exact request/response formats
  • Verify error responses
  • Share API documentation with teammates

Step 7 — Final Project Structure

Your completed project should look like:

fastapi-ml-api/
├── app/
│ ├── __init__.py # empty
│ ├── main.py # FastAPI application
│ ├── schemas.py # Pydantic models
│ └── ml_service.py # Model loading & inference
├── models/
│ └── model_v1.joblib # Serialized ML model
├── requirements.txt
└── venv/

Verification Checklist

Before marking this lab as complete, verify:

  • uvicorn starts without errors
  • GET /health returns {"status": "healthy", "model_loaded": true}
  • POST /api/v1/predict with valid data returns a prediction
  • Invalid data (negative age, missing fields) returns 422
  • Swagger UI at /docs shows all endpoints with schemas
  • Response includes model_version and timestamp

Bonus Challenges

Challenge 1: Add a batch prediction endpoint

Add a POST /api/v1/predict/batch endpoint that accepts a list of inputs:

from typing import List


class BatchInput(BaseModel):
    inputs: List[PredictionInput] = Field(..., min_length=1, max_length=50)


class BatchOutput(BaseModel):
    predictions: List[PredictionOutput]
    total: int


@app.post("/api/v1/predict/batch", response_model=BatchOutput, tags=["Predictions"])
def predict_batch(batch: BatchInput):
    results = []
    for item in batch.inputs:
        features = item.model_dump()
        result = ml_service.predict(features)
        results.append(PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
        ))
    return BatchOutput(predictions=results, total=len(results))
Challenge 2: Add request timing middleware
import time

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request


class TimingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start = time.perf_counter()
        response = await call_next(request)
        ms = (time.perf_counter() - start) * 1000
        response.headers["X-Response-Time-Ms"] = f"{ms:.2f}"
        return response


app.add_middleware(TimingMiddleware)
Challenge 3: Add API key authentication
from fastapi import Depends, Header, HTTPException

API_KEYS = {"sk_test_abc123", "sk_test_def456"}


async def verify_api_key(x_api_key: str = Header(...)):
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return x_api_key


@app.post("/api/v1/predict", dependencies=[Depends(verify_api_key)])
def predict(input_data: PredictionInput):
    ...

Test with:

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk_test_abc123" \
  -d '{"age": 35, "income": 55000, "credit_score": 720, "employment_years": 8, "loan_amount": 25000}'

Common Issues

| Issue | Solution |
| --- | --- |
| `ModuleNotFoundError: app.schemas` | Make sure `app/__init__.py` exists (it can be empty) |
| `FileNotFoundError: model_v1.joblib` | Check that the model file is in `models/` relative to where you run uvicorn |
| Port 8000 already in use | Use `--port 8001` or kill the existing process |
| Changes not reflected | Ensure the `--reload` flag is set with uvicorn |
| 422 errors on valid-looking data | Check field types: Pydantic is strict (e.g., `"35"` is not an `int`) |
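For the port-conflict case, a small standard-library helper (not part of the lab code) can tell you whether port 8000 is free before you start uvicorn:

```python
# Check whether a TCP port is available, using only the standard library.
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 when something is already listening
        return s.connect_ex((host, port)) != 0

if __name__ == "__main__":
    print("port 8000 free:", port_is_free(8000))
```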