TP3 - Build a Prediction API with FastAPI

Practical Lab 90 min Intermediate

Objectives

By the end of this lab, you will be able to:

  • Load a serialized ML model from Module 2 into a FastAPI application
  • Define Pydantic schemas for request validation and response serialization
  • Implement a /predict endpoint that serves real-time predictions
  • Implement a /health endpoint for service monitoring
  • Add proper error handling for common failure scenarios
  • Test the API using uvicorn and the auto-generated Swagger UI

Prerequisites

  • Completed TP2 (you should have a serialized model file model_v1.joblib)
  • Python 3.10+ installed
  • Basic understanding of REST APIs (Module 3 concepts)
No model from TP2?

If you haven't completed TP2, run this script to create a sample model:

# create_sample_model.py
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=4,
    n_redundant=1, random_state=42,
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

Path("models").mkdir(exist_ok=True)  # ensure the target directory exists
joblib.dump(model, "models/model_v1.joblib")
print("Model saved to models/model_v1.joblib")
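Whether the model comes from TP2 or the script above, it is worth confirming that the file deserializes cleanly before wiring it into the API. A quick round-trip sketch (using a throwaway temp directory rather than `models/`, and a smaller forest to keep it fast):

```python
# Sanity check: serialize a model with joblib, load it back, and confirm
# the loaded copy produces identical predictions.
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_v1.joblib"
    joblib.dump(model, path)
    loaded = joblib.load(path)
    # A faithful round trip must reproduce the original predictions exactly
    assert (loaded.predict(X) == model.predict(X)).all()
    print("round-trip OK")
```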

Architecture Overview


Step 1 — Project Setup

1.1 Create the Project Structure

mkdir -p fastapi-ml-api/app
mkdir -p fastapi-ml-api/models
cd fastapi-ml-api

1.2 Create a Virtual Environment

python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

1.3 Install Dependencies

pip install fastapi uvicorn pydantic scikit-learn joblib numpy

Create requirements.txt:

fastapi>=0.100.0
uvicorn>=0.23.0
pydantic>=2.0.0
scikit-learn>=1.3.0
joblib>=1.3.0
numpy>=1.24.0

1.4 Copy Your Model

Copy the model file from TP2 into the models/ directory:

cp /path/to/tp2/model_v1.joblib models/model_v1.joblib

Step 2 — Define Pydantic Schemas

Create app/schemas.py:

from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field


class PredictionInput(BaseModel):
    """Input features for the ML model."""

    age: int = Field(
        ...,
        ge=18,
        le=120,
        description="Applicant age in years",
        examples=[35],
    )
    income: float = Field(
        ...,
        gt=0,
        description="Annual income in USD",
        examples=[55000.0],
    )
    credit_score: int = Field(
        ...,
        ge=300,
        le=850,
        description="Credit score (FICO)",
        examples=[720],
    )
    employment_years: float = Field(
        ...,
        ge=0,
        description="Years of employment",
        examples=[8.5],
    )
    loan_amount: float = Field(
        ...,
        gt=0,
        description="Requested loan amount in USD",
        examples=[25000.0],
    )

    class Config:
        json_schema_extra = {
            "example": {
                "age": 35,
                "income": 55000.0,
                "credit_score": 720,
                "employment_years": 8.5,
                "loan_amount": 25000.0,
            }
        }


class PredictionOutput(BaseModel):
    """Prediction result from the ML model."""

    prediction: str = Field(..., description="Predicted class label")
    probability: float = Field(
        ...,
        ge=0,
        le=1,
        description="Prediction confidence (0 to 1)",
    )
    model_version: str = Field(..., description="Version of the model used")
    timestamp: datetime = Field(
        default_factory=datetime.utcnow,
        description="UTC timestamp of the prediction",
    )


class HealthResponse(BaseModel):
    """Health check response."""

    status: str = Field(..., description="Service status")
    model_loaded: bool = Field(..., description="Whether the model is loaded")
    model_version: str = Field(..., description="Current model version")
    timestamp: str = Field(..., description="Current UTC time")


class ErrorResponse(BaseModel):
    """Standard error response."""

    error_code: str = Field(..., description="Machine-readable error code")
    message: str = Field(..., description="Human-readable error message")
    details: Optional[list] = Field(None, description="Additional error details")
Why define schemas?
  1. Validation: FastAPI automatically rejects requests that don't match the schema
  2. Documentation: Swagger UI displays field descriptions, types, and constraints
  3. Serialization: Response data is automatically formatted to match the output schema
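To see point 1 in action outside of FastAPI, you can exercise a trimmed-down version of the lab's `PredictionInput` schema directly (a sketch; only two of the five fields are included):

```python
from pydantic import BaseModel, Field, ValidationError

class PredictionInput(BaseModel):
    age: int = Field(..., ge=18, le=120)
    income: float = Field(..., gt=0)

# Valid data is accepted and coerced to the declared types
ok = PredictionInput(age=35, income=55000.0)
print(ok.age)  # 35

# Out-of-range data raises ValidationError; FastAPI converts this into a 422
try:
    PredictionInput(age=-5, income=55000.0)
except ValidationError as e:
    print(len(e.errors()))  # 1
```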

Step 3 — Create the ML Service

Create app/ml_service.py:

import joblib
import numpy as np
from pathlib import Path


class MLService:
    """Handles model loading and inference."""

    def __init__(self):
        self.model = None
        self.model_version = "unknown"
        self.feature_names = [
            "age", "income", "credit_score",
            "employment_years", "loan_amount",
        ]

    def load_model(self, model_path: str) -> None:
        """Load a serialized model from disk."""
        path = Path(model_path)
        if not path.exists():
            raise FileNotFoundError(
                f"Model file not found: {model_path}"
            )

        self.model = joblib.load(path)
        self.model_version = path.stem
        print(f"[MLService] Model loaded: {self.model_version}")

    def predict(self, features: dict) -> dict:
        """
        Run inference on input features.
        Returns prediction label and probability.
        """
        if self.model is None:
            raise RuntimeError("Model is not loaded")

        # Feature order must match the order used at training time
        feature_array = np.array([[
            features["age"],
            features["income"],
            features["credit_score"],
            features["employment_years"],
            features["loan_amount"],
        ]])

        prediction = self.model.predict(feature_array)[0]
        probabilities = self.model.predict_proba(feature_array)[0]
        confidence = float(max(probabilities))

        label = "approved" if prediction == 1 else "denied"

        return {
            "prediction": label,
            "probability": round(confidence, 4),
            "model_version": self.model_version,
        }

    @property
    def is_ready(self) -> bool:
        return self.model is not None


# Singleton instance shared across the application
ml_service = MLService()

Step 4 — Build the FastAPI Application

Create app/main.py:

from contextlib import asynccontextmanager
from datetime import datetime

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from app.schemas import (
    PredictionInput,
    PredictionOutput,
    HealthResponse,
    ErrorResponse,
)
from app.ml_service import ml_service


# --- Lifespan: load model at startup ---
@asynccontextmanager
async def lifespan(app: FastAPI):
    try:
        ml_service.load_model("models/model_v1.joblib")
    except FileNotFoundError as e:
        print(f"[WARNING] {e}. API will start in degraded mode.")
    yield
    print("[INFO] Shutting down API...")


# --- FastAPI App ---
app = FastAPI(
    title="Loan Prediction API",
    description="ML-powered loan approval prediction service built in TP3",
    version="1.0.0",
    lifespan=lifespan,
    openapi_tags=[
        {
            "name": "Predictions",
            "description": "Submit features and receive ML predictions",
        },
        {
            "name": "System",
            "description": "Health checks and service monitoring",
        },
    ],
)

# --- CORS Middleware ---
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)


# --- Health Check ---
@app.get(
    "/health",
    response_model=HealthResponse,
    tags=["System"],
    summary="Check service health",
)
def health_check():
    """Returns the current health status of the API and model."""
    return HealthResponse(
        status="healthy" if ml_service.is_ready else "degraded",
        model_loaded=ml_service.is_ready,
        model_version=ml_service.model_version,
        timestamp=datetime.utcnow().isoformat(),
    )


# --- Prediction Endpoint ---
@app.post(
    "/api/v1/predict",
    response_model=PredictionOutput,
    responses={
        422: {"model": ErrorResponse, "description": "Validation error"},
        503: {"model": ErrorResponse, "description": "Model not available"},
        500: {"model": ErrorResponse, "description": "Prediction failed"},
    },
    tags=["Predictions"],
    summary="Get a loan approval prediction",
)
def predict(input_data: PredictionInput):
    """
    Submit loan application features and receive a prediction.

    The model returns:
    - **prediction**: "approved" or "denied"
    - **probability**: confidence score between 0 and 1
    - **model_version**: which model version produced the result
    """
    # Check model availability
    if not ml_service.is_ready:
        raise HTTPException(
            status_code=503,
            detail="Model is not loaded. Service is in degraded mode.",
        )

    # Run prediction
    try:
        features = input_data.model_dump()
        result = ml_service.predict(features)

        return PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
            timestamp=datetime.utcnow(),
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Prediction failed: {str(e)}",
        )


# --- Root ---
@app.get("/", tags=["System"])
def root():
    """API root: returns basic service information."""
    return {
        "service": "Loan Prediction API",
        "version": "1.0.0",
        "docs": "/docs",
        "health": "/health",
    }

Step 5 — Run and Test

5.1 Start the Server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

You should see:

[MLService] Model loaded: model_v1
INFO: Uvicorn running on http://0.0.0.0:8000
INFO: Started reloader process

5.2 Access Swagger Documentation

Open your browser and navigate to: http://localhost:8000/docs

You should see the interactive Swagger UI with:

  • Predictions tag → POST /api/v1/predict
  • System tag → GET /health, GET /

5.3 Test the Health Endpoint

curl http://localhost:8000/health

Expected response:

{
  "status": "healthy",
  "model_loaded": true,
  "model_version": "model_v1",
  "timestamp": "2026-02-23T14:30:00.000000"
}

5.4 Test the Prediction Endpoint

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
        "age": 35,
        "income": 55000,
        "credit_score": 720,
        "employment_years": 8.5,
        "loan_amount": 25000
      }'

Expected response:

{
  "prediction": "approved",
  "probability": 0.87,
  "model_version": "model_v1",
  "timestamp": "2026-02-23T14:30:05.123456"
}

5.5 Test Validation Errors

Send invalid data to verify Pydantic validation:

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
        "age": -5,
        "income": 55000,
        "credit_score": 720,
        "employment_years": 8,
        "loan_amount": 25000
      }'

Expected: 422 Unprocessable Entity with details about the age field.

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
        "age": 35,
        "income": 55000
      }'

Expected: 422 with details about missing required fields.


Step 6 — Test with Swagger UI

  1. Open http://localhost:8000/docs in your browser
  2. Click on POST /api/v1/predict
  3. Click Try it out
  4. The example JSON is pre-filled from your schema
  5. Click Execute
  6. Observe the response code, body, and headers
Use Swagger for rapid testing

During development, Swagger UI is faster than writing curl commands. Use it to:

  • Test different inputs quickly
  • See exact request/response formats
  • Verify error responses
  • Share API documentation with teammates

Step 7 — Final Project Structure

Your completed project should look like:

fastapi-ml-api/
├── app/
│ ├── __init__.py # empty
│ ├── main.py # FastAPI application
│ ├── schemas.py # Pydantic models
│ └── ml_service.py # Model loading & inference
├── models/
│ └── model_v1.joblib # Serialized ML model
├── requirements.txt
└── venv/

Verification Checklist

Before marking this lab as complete, verify:

  • uvicorn starts without errors
  • GET /health returns {"status": "healthy", "model_loaded": true}
  • POST /api/v1/predict with valid data returns a prediction
  • Invalid data (negative age, missing fields) returns 422
  • Swagger UI at /docs shows all endpoints with schemas
  • Response includes model_version and timestamp

Bonus Challenges

Challenge 1: Add a batch prediction endpoint

Add a POST /api/v1/predict/batch endpoint that accepts a list of inputs:

from typing import List


class BatchInput(BaseModel):
    inputs: List[PredictionInput] = Field(..., min_length=1, max_length=50)


class BatchOutput(BaseModel):
    predictions: List[PredictionOutput]
    total: int


@app.post("/api/v1/predict/batch", response_model=BatchOutput, tags=["Predictions"])
def predict_batch(batch: BatchInput):
    results = []
    for item in batch.inputs:
        features = item.model_dump()
        result = ml_service.predict(features)
        results.append(PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
        ))
    return BatchOutput(predictions=results, total=len(results))
Challenge 2: Add request timing middleware
import time

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request


class TimingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start = time.perf_counter()
        response = await call_next(request)
        ms = (time.perf_counter() - start) * 1000
        response.headers["X-Response-Time-Ms"] = f"{ms:.2f}"
        return response


app.add_middleware(TimingMiddleware)
Challenge 3: Add API key authentication
from fastapi import Depends, Header, HTTPException

API_KEYS = {"sk_test_abc123", "sk_test_def456"}


async def verify_api_key(x_api_key: str = Header(...)):
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return x_api_key


@app.post("/api/v1/predict", dependencies=[Depends(verify_api_key)])
def predict(input_data: PredictionInput):
    ...

Test with:

curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk_test_abc123" \
  -d '{"age": 35, "income": 55000, "credit_score": 720, "employment_years": 8, "loan_amount": 25000}'

Common Issues

| Issue | Solution |
| --- | --- |
| `ModuleNotFoundError: app.schemas` | Make sure `app/__init__.py` exists (it can be empty) |
| `FileNotFoundError: model_v1.joblib` | Check that the model file is in `models/` relative to where you run uvicorn |
| Port 8000 already in use | Use `--port 8001` or kill the existing process |
| Changes not reflected | Ensure the `--reload` flag is set with uvicorn |
| 422 errors on valid-looking data | Check field types: Pydantic is strict (e.g., `"35"` is not an `int`) |
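For the port-conflict case, a small standard-library helper (not part of the lab code) can tell you whether port 8000 is free before you start uvicorn:

```python
# Check whether a TCP port is available, using only the standard library.
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 when something is already listening
        return s.connect_ex((host, port)) != 0

if __name__ == "__main__":
    print("port 8000 free:", port_is_free(8000))
```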