Building APIs with FastAPI
Why FastAPI?
FastAPI is a modern Python web framework designed for building APIs. It was created by Sebastián Ramírez in 2018 and has quickly become a go-to framework for serving ML models in production.
Key advantages
| Feature | Description | Why it matters for ML |
|---|---|---|
| Async support | Built on Starlette, supports async/await | Handle many concurrent prediction requests |
| Type hints | Uses native Python type annotations | Self-documenting code, IDE autocompletion |
| Pydantic validation | Automatic request/response validation | Reject invalid model inputs before inference |
| Auto-generated docs | Swagger UI + ReDoc built in | Clients can explore and test your API instantly |
| High performance | One of the fastest Python frameworks | Low latency for real-time predictions |
| Standards-based | Built on OpenAPI and JSON Schema | Easy integration with any client or tooling |
FastAPI is built on ASGI rather than the older WSGI standard:
- WSGI (Web Server Gateway Interface): synchronous; one request at a time per worker (used by Flask)
- ASGI (Asynchronous Server Gateway Interface): asynchronous; handles many concurrent requests within a single worker (used by FastAPI)
For ML APIs that receive many simultaneous prediction requests, ASGI significantly improves throughput, as the short sketch below illustrates.
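To make the difference concrete, here is a minimal sketch (using FastAPI syntax introduced later in this chapter; the endpoint name and the sleep standing in for real I/O are purely illustrative):
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/io-bound")
async def io_bound():
    # While this request awaits (standing in for a database or HTTP call),
    # the same worker keeps serving other requests.
    await asyncio.sleep(1)
    return {"done": True}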
Installation and setup
Installing dependencies
pip install fastapi uvicorn pydantic
pip install scikit-learn joblib numpy pandas
Project structure
A well-organized ML API project follows this structure:
ml-api/
├── app/
│   ├── __init__.py
│   ├── main.py                # FastAPI application entry point
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py         # Pydantic request/response models
│   ├── routers/
│   │   ├── __init__.py
│   │   └── predictions.py     # Prediction route handlers
│   ├── services/
│   │   ├── __init__.py
│   │   └── ml_service.py      # Model loading and inference logic
│   └── core/
│       ├── __init__.py
│       └── config.py          # Configuration settings
├── models/
│   └── model_v1.joblib        # Serialized ML model
├── requirements.txt
└── README.md
Your first FastAPI application
Minimal example
from fastapi import FastAPI
app = FastAPI(
title="ML Prediction API",
description="API for serving machine learning predictions",
version="1.0.0",
)
@app.get("/")
def root():
return {"message": "ML Prediction API is running"}
@app.get("/health")
def health_check():
return {"status": "healthy"}
Run the application:
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
This gives you:
- API available at http://localhost:8000
- Swagger UI at http://localhost:8000/docs
- ReDoc at http://localhost:8000/redoc
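You can also hit the endpoints from Python with FastAPI's TestClient; a quick smoke test might look like this (it assumes the module path app.main and that httpx, which TestClient relies on, is installed):
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

# Exercise the health endpoint defined above.
response = client.get("/health")
print(response.status_code, response.json())  # 200 {'status': 'healthy'}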
Pydantic models for request/response validation
Pydantic is the foundation of FastAPI's data validation. You define Python classes with type annotations, and Pydantic automatically validates incoming data against them.
Defining input schemas
from pydantic import BaseModel, Field
from typing import Optional, List
from enum import Enum
class LoanPurpose(str, Enum):
home = "home"
car = "car"
education = "education"
personal = "personal"
class PredictionInput(BaseModel):
"""Input features for loan approval prediction."""
age: int = Field(
...,
ge=18,
le=120,
description="Applicant age in years",
example=35,
)
income: float = Field(
...,
gt=0,
description="Annual income in USD",
example=55000.0,
)
credit_score: int = Field(
...,
ge=300,
le=850,
description="Credit score (FICO)",
example=720,
)
employment_years: float = Field(
...,
ge=0,
description="Years of employment",
example=8.5,
)
loan_amount: float = Field(
...,
gt=0,
description="Requested loan amount in USD",
example=25000.0,
)
loan_purpose: LoanPurpose = Field(
...,
description="Purpose of the loan",
example="home",
)
class Config:
json_schema_extra = {
"example": {
"age": 35,
"income": 55000.0,
"credit_score": 720,
"employment_years": 8.5,
"loan_amount": 25000.0,
"loan_purpose": "home",
}
}
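To see the validation in action outside FastAPI, you can instantiate the schema directly; an out-of-range value is rejected before any model code would run (a quick sketch):
from pydantic import ValidationError

try:
    PredictionInput(
        age=15,  # violates the ge=18 constraint
        income=55000.0,
        credit_score=720,
        employment_years=8.5,
        loan_amount=25000.0,
        loan_purpose="home",
    )
except ValidationError as exc:
    # e.g. ('age',) "Input should be greater than or equal to 18"
    print(exc.errors()[0]["loc"], exc.errors()[0]["msg"])
Inside FastAPI, the same violation is returned to the client as a 422 response containing the field location and message.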
Defining output schemas
from datetime import datetime
class PredictionOutput(BaseModel):
"""Prediction result from the ML model."""
prediction: str = Field(..., description="Predicted class label")
probability: float = Field(
..., ge=0, le=1, description="Prediction confidence"
)
model_version: str = Field(..., description="Model version used")
timestamp: datetime = Field(
default_factory=datetime.utcnow,
description="Prediction timestamp",
)
class ErrorResponse(BaseModel):
"""Standard error response."""
error_code: str
message: str
details: Optional[List[str]] = None
| Constraint | Use | Example |
|---|---|---|
| ... (Ellipsis) | Required field | Field(...) |
| default= | Default value | Field(default=0.5) |
| ge=, gt= | Greater than (or equal to) | Field(ge=0) |
| le=, lt= | Less than (or equal to) | Field(le=100) |
| min_length= | Minimum string length | Field(min_length=1) |
| max_length= | Maximum string length | Field(max_length=255) |
| pattern= | Pattern match (regex= in Pydantic v1) | Field(pattern=r"^[a-z]+$") |
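As an illustration only (the CustomerRef schema below is hypothetical and not part of the loan API), several of these constraints can be combined on a single field:
from pydantic import BaseModel, Field

class CustomerRef(BaseModel):
    # Required string, constrained in length and format.
    customer_id: str = Field(..., min_length=1, max_length=64, pattern=r"^[a-z0-9_-]+$")
    # Optional numeric field with a default and bounds.
    risk_weight: float = Field(default=0.5, ge=0, le=1)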
Loading and serving an ML model
The ML service
Create a service class that loads the model once at startup and reuses it for every request:
import joblib
import numpy as np
from pathlib import Path
class MLService:
"""Handles model loading and inference."""
def __init__(self):
self.model = None
self.model_version = "unknown"
self.feature_names = [
"age", "income", "credit_score",
"employment_years", "loan_amount",
]
def load_model(self, model_path: str):
"""Load a serialized model from disk."""
path = Path(model_path)
if not path.exists():
raise FileNotFoundError(f"Model not found: {model_path}")
self.model = joblib.load(path)
self.model_version = path.stem
return self
def predict(self, features: dict) -> dict:
"""Run inference on input features."""
if self.model is None:
raise RuntimeError("Model not loaded")
feature_array = np.array([[
features["age"],
features["income"],
features["credit_score"],
features["employment_years"],
features["loan_amount"],
]])
prediction = self.model.predict(feature_array)[0]
probabilities = self.model.predict_proba(feature_array)[0]
return {
"prediction": "approved" if prediction == 1 else "denied",
"probability": float(max(probabilities)),
"model_version": self.model_version,
}
ml_service = MLService()
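If you just want to exercise the API end to end, a throwaway artifact can be produced as follows; the synthetic data and the LogisticRegression choice are assumptions for illustration, not the real training pipeline, and the models/ directory is assumed to exist:
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = np.column_stack([
    rng.integers(18, 80, 500),            # age
    rng.uniform(20_000, 150_000, 500),    # income
    rng.integers(300, 850, 500),          # credit_score
    rng.uniform(0, 40, 500),              # employment_years
    rng.uniform(1_000, 100_000, 500),     # loan_amount
])
y = (X[:, 2] > 650).astype(int)  # toy target: approve above a credit-score cutoff

joblib.dump(LogisticRegression(max_iter=1000).fit(X, y), "models/model_v1.joblib")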
Integrating with FastAPI using lifespan events
from contextlib import asynccontextmanager
from fastapi import FastAPI
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Load model at startup, clean up at shutdown."""
ml_service.load_model("models/model_v1.joblib")
print(f"Model loaded: {ml_service.model_version}")
yield
print("Shutting down, releasing resources...")
app = FastAPI(
title="ML Prediction API",
version="1.0.0",
lifespan=lifespan,
)
Never load the model inside a request handler. Deserializing a model from disk on every request adds massive latency. Use the lifespan event or a global singleton to load it once at startup.
Creating the prediction endpoint
Complete prediction route
from fastapi import FastAPI, HTTPException
from datetime import datetime
@app.post(
"/api/v1/predict",
response_model=PredictionOutput,
summary="Get a loan approval prediction",
tags=["Predictions"],
)
def predict(input_data: PredictionInput):
"""
Submit loan application features and receive
an approval/denial prediction with confidence score.
"""
try:
features = input_data.model_dump(exclude={"loan_purpose"})
result = ml_service.predict(features)
return PredictionOutput(
prediction=result["prediction"],
probability=result["probability"],
model_version=result["model_version"],
timestamp=datetime.utcnow(),
)
except RuntimeError as e:
raise HTTPException(
status_code=503,
detail=f"Model not available: {str(e)}",
)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Prediction failed: {str(e)}",
)
Health check endpoint
@app.get("/health", tags=["System"])
def health_check():
"""Check if the API and model are ready."""
model_loaded = ml_service.model is not None
return {
"status": "healthy" if model_loaded else "degraded",
"model_loaded": model_loaded,
"model_version": ml_service.model_version,
"timestamp": datetime.utcnow().isoformat(),
}
Dependency injection
FastAPI's dependency injection system lets you share logic between endpoints cleanly. It is useful for authentication, database connections, or model access.
from fastapi import Depends, Header, HTTPException
async def verify_api_key(x_api_key: str = Header(...)):
"""Validate the API key from request headers."""
valid_keys = {"sk_live_abc123", "sk_live_def456"}
if x_api_key not in valid_keys:
raise HTTPException(
status_code=401,
detail="Invalid API key",
)
return x_api_key
@app.post("/api/v1/predict", dependencies=[Depends(verify_api_key)])
def predict(input_data: PredictionInput):
# Only reached if API key is valid
...
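The same mechanism works for model access; in the sketch below, the route path /api/v1/predict-di and the get_ml_service helper are illustrative names, not part of the project layout above:
def get_ml_service() -> MLService:
    """Expose the shared, already-loaded service as a dependency."""
    return ml_service

@app.post("/api/v1/predict-di", response_model=PredictionOutput, tags=["Predictions"])
def predict_with_service(
    input_data: PredictionInput,
    service: MLService = Depends(get_ml_service),
):
    result = service.predict(input_data.model_dump(exclude={"loan_purpose"}))
    return PredictionOutput(**result)
Swapping the dependency (for tests or a different model backend) then requires no change to the endpoint itself.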
Error handling
Custom exception handlers
from fastapi import Request
from fastapi.responses import JSONResponse
class ModelNotLoadedError(Exception):
pass
class PredictionError(Exception):
def __init__(self, detail: str):
self.detail = detail
@app.exception_handler(ModelNotLoadedError)
async def model_not_loaded_handler(request: Request, exc: ModelNotLoadedError):
return JSONResponse(
status_code=503,
content={
"error_code": "MODEL_NOT_LOADED",
"message": "The ML model is not available. Please try again later.",
},
)
@app.exception_handler(PredictionError)
async def prediction_error_handler(request: Request, exc: PredictionError):
return JSONResponse(
status_code=500,
content={
"error_code": "PREDICTION_FAILED",
"message": exc.detail,
},
)
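To connect the handlers to the prediction flow, a route can raise these exceptions instead of HTTPException; a sketch (the /api/v1/predict-safe path is illustrative):
@app.post("/api/v1/predict-safe", response_model=PredictionOutput, tags=["Predictions"])
def predict_safe(input_data: PredictionInput):
    # The custom exceptions are turned into 503/500 JSON responses by the handlers above.
    if ml_service.model is None:
        raise ModelNotLoadedError()
    try:
        result = ml_service.predict(input_data.model_dump(exclude={"loan_purpose"}))
    except Exception as exc:
        raise PredictionError(detail=str(exc))
    return PredictionOutput(**result)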
Middleware
Middleware runs before every request and after every response, which makes it ideal for logging, timing, and adding headers.
Request timing middleware
import time
from starlette.middleware.base import BaseHTTPMiddleware
class TimingMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request: Request, call_next):
start_time = time.perf_counter()
response = await call_next(request)
duration_ms = (time.perf_counter() - start_time) * 1000
response.headers["X-Response-Time-Ms"] = f"{duration_ms:.2f}"
return response
app.add_middleware(TimingMiddleware)
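Logging works the same way; a minimal sketch (the logger name ml_api.access is an arbitrary choice) records the method, path, and status code of every request:
import logging

logger = logging.getLogger("ml_api.access")

class RequestLoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        response = await call_next(request)
        # One line per request, e.g. "POST /api/v1/predict -> 200"
        logger.info("%s %s -> %d", request.method, request.url.path, response.status_code)
        return response

app.add_middleware(RequestLoggingMiddleware)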
CORS middleware
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
CORSMiddleware,
allow_origins=[
"http://localhost:3000",
"https://myapp.example.com",
],
allow_credentials=True,
allow_methods=["GET", "POST"],
allow_headers=["*"],
)
Async vs sync endpoints
FastAPI supports both synchronous and asynchronous endpoints. The right choice depends on what your endpoint does.
| Scenario | Use | Why |
|---|---|---|
| ML inference (CPU-bound) | def (sync) | scikit-learn is not async; FastAPI runs it in a thread pool |
| Database queries (I/O-bound) | async def | Non-blocking I/O, better concurrency |
| File operations | async def with aiofiles | Does not block the event loop |
| External API calls | async def with httpx | Concurrent HTTP requests |
# Sync: FastAPI automatically runs it in a thread pool
@app.post("/api/v1/predict")
def predict_sync(input_data: PredictionInput):
result = ml_service.predict(input_data.model_dump())
return result
# Async: runs on the event loop; do not do CPU-heavy work here
@app.post("/api/v1/predict-async")
async def predict_async(input_data: PredictionInput):
import asyncio
loop = asyncio.get_event_loop()
result = await loop.run_in_executor(
None, ml_service.predict, input_data.model_dump()
)
return result
If you declare an endpoint as async def but then call a blocking function (such as joblib.load() or model.predict()), you will block the event loop and freeze every other request. Use def (sync) for CPU-bound ML inference, or explicitly run it in an executor.
File upload endpoint
For models that process images, audio, or documents, you need file upload support.
from fastapi import UploadFile, File
import io
from PIL import Image
@app.post("/api/v1/predict/image", tags=["Predictions"])
async def predict_image(
file: UploadFile = File(..., description="Image file for classification"),
):
if file.content_type not in ["image/jpeg", "image/png"]:
raise HTTPException(
status_code=400,
detail="Only JPEG and PNG images are supported",
)
contents = await file.read()
image = Image.open(io.BytesIO(contents))
# Preprocess and predict (simplified)
result = image_model.predict(image)
return {
"filename": file.filename,
"prediction": result["label"],
"confidence": result["confidence"],
}
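Calling it from Python could look like this (a sketch using httpx; the file name and local URL are placeholders):
import httpx

with open("sample.jpg", "rb") as f:
    response = httpx.post(
        "http://localhost:8000/api/v1/predict/image",
        files={"file": ("sample.jpg", f, "image/jpeg")},
    )
print(response.json())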
Batch prediction endpoint
For efficiency, let clients submit multiple inputs in a single request.
from typing import List
class BatchInput(BaseModel):
inputs: List[PredictionInput] = Field(
..., min_length=1, max_length=100,
description="List of prediction inputs (max 100)",
)
class BatchOutput(BaseModel):
predictions: List[PredictionOutput]
total: int
processing_time_ms: float
@app.post("/api/v1/predict/batch", response_model=BatchOutput, tags=["Predictions"])
def predict_batch(batch: BatchInput):
start = time.perf_counter()
results = []
for item in batch.inputs:
features = item.model_dump(exclude={"loan_purpose"})
result = ml_service.predict(features)
results.append(PredictionOutput(
prediction=result["prediction"],
probability=result["probability"],
model_version=result["model_version"],
))
duration = (time.perf_counter() - start) * 1000
return BatchOutput(
predictions=results,
total=len(results),
processing_time_ms=round(duration, 2),
)
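The request body for this endpoint is a JSON object with an inputs list, each entry satisfying the PredictionInput schema; for example (a hypothetical call with httpx, values chosen arbitrarily):
import httpx

payload = {
    "inputs": [
        {"age": 35, "income": 55000.0, "credit_score": 720,
         "employment_years": 8.5, "loan_amount": 25000.0, "loan_purpose": "home"},
        {"age": 42, "income": 72000.0, "credit_score": 680,
         "employment_years": 12.0, "loan_amount": 40000.0, "loan_purpose": "car"},
    ]
}
response = httpx.post("http://localhost:8000/api/v1/predict/batch", json=payload)
print(response.json()["total"])  # 2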
Complete application: putting it all together
from contextlib import asynccontextmanager
from datetime import datetime
from fastapi import FastAPI, HTTPException, Depends, Header
from fastapi.middleware.cors import CORSMiddleware
import time
import joblib
import numpy as np
from pydantic import BaseModel, Field
from typing import Optional
# --- Schemas ---
class PredictionInput(BaseModel):
age: int = Field(..., ge=18, le=120)
income: float = Field(..., gt=0)
credit_score: int = Field(..., ge=300, le=850)
employment_years: float = Field(..., ge=0)
loan_amount: float = Field(..., gt=0)
class PredictionOutput(BaseModel):
prediction: str
probability: float
model_version: str
timestamp: datetime = Field(default_factory=datetime.utcnow)
# --- ML Service ---
class MLService:
def __init__(self):
self.model = None
self.version = "unknown"
def load(self, path: str):
self.model = joblib.load(path)
self.version = "v1.0"
def predict(self, features: dict) -> dict:
arr = np.array([[
features["age"], features["income"],
features["credit_score"], features["employment_years"],
features["loan_amount"],
]])
pred = self.model.predict(arr)[0]
proba = self.model.predict_proba(arr)[0]
return {
"prediction": "approved" if pred == 1 else "denied",
"probability": float(max(proba)),
"model_version": self.version,
}
ml = MLService()
# --- Lifespan ---
@asynccontextmanager
async def lifespan(app: FastAPI):
ml.load("models/model_v1.joblib")
yield
# --- App ---
app = FastAPI(title="Loan Prediction API", version="1.0.0", lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["http://localhost:3000"],
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/health", tags=["System"])
def health():
return {"status": "healthy", "model": ml.version}
@app.post("/api/v1/predict", response_model=PredictionOutput, tags=["Predictions"])
def predict(data: PredictionInput):
try:
result = ml.predict(data.model_dump())
return PredictionOutput(**result)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
Run it:
uvicorn app.main:app --reload --port 8000
Test it:
curl -X POST http://localhost:8000/api/v1/predict \
-H "Content-Type: application/json" \
-d '{"age": 35, "income": 55000, "credit_score": 720, "employment_years": 8, "loan_amount": 25000}'
Summary
| Topic | Key point |
|---|---|
| FastAPI | Modern, fast, type-safe Python framework for APIs |
| Pydantic | Automatic input/output validation with clear error messages |
| Lifespan events | Load the model once at startup, not per request |
| Dependencies | Reusable logic for authentication, model access, and more |
| Middleware | Cross-cutting concerns (CORS, timing, logging) |
| Sync vs async | Use def for CPU-bound ML inference |
| File uploads | UploadFile for image/document prediction APIs |
| Batch predictions | Process multiple inputs in a single request |
FastAPI quick reference
| Action | Code |
|---|---|
| Create the app | app = FastAPI(title="...", version="...") |
| GET endpoint | @app.get("/path") |
| POST endpoint | @app.post("/path", response_model=Schema) |
| Run the server | uvicorn app.main:app --reload |
| Open the docs | http://localhost:8000/docs |
| Validate input | Define a BaseModel subclass |
| Add middleware | app.add_middleware(MiddlewareClass, ...) |
| Dependency injection | @app.post("/", dependencies=[Depends(fn)]) |
| Raise an HTTP error | raise HTTPException(status_code=400, detail="...") |