REST API Concepts for AI
What is an API?
An API (Application Programming Interface) is a contract that defines how two software components communicate with each other. In the context of AI deployment, an API is the bridge between your trained model and the outside world — applications, users, and other services that want to consume predictions.
The Restaurant Analogy
The most intuitive way to understand an API is to think of a restaurant:
| Restaurant | API World |
|---|---|
| Customer | Client application (web app, mobile app, another service) |
| Menu | API documentation (available endpoints, expected inputs) |
| Order | HTTP request with input data (JSON payload) |
| Waiter | API server (receives requests, routes them, returns responses) |
| Kitchen | ML model (processes input, generates prediction) |
| Dish served | HTTP response with prediction results |
| Receipt | Response status code (200 OK, 400 Bad Request, etc.) |
Just like a waiter doesn't need to know how to cook, an API doesn't need to expose the internal workings of your model. The client only needs to know what to send and what to expect back.
REST Architecture
REST (Representational State Transfer) is an architectural style for designing networked applications. A REST API follows a set of constraints that make it scalable, stateless, and easy to understand.
REST Principles
| Principle | Description | AI API Example |
|---|---|---|
| Stateless | Each request contains all information needed to process it | Every prediction request includes the full input features |
| Client-Server | Separation between the consumer and the provider | Web app (client) is separate from the model server |
| Uniform Interface | Standard HTTP methods and URI conventions | POST /api/v1/predict for predictions |
| Resource-Based | Everything is a resource identified by a URI | /models, /predictions, /health |
| Cacheable | Responses can be cached when appropriate | Cache repeated predictions for identical inputs |
| Layered System | Client cannot tell if connected directly or via intermediary | Load balancer sits between client and API |
REST API Architecture for AI
HTTP Methods
HTTP methods define the action you want to perform on a resource. For AI APIs, some methods are more common than others.
| Method | Action | Idempotent | Safe | AI API Usage |
|---|---|---|---|---|
| GET | Retrieve data | ✅ Yes | ✅ Yes | Get model info, health check, list available models |
| POST | Create/Submit data | ❌ No | ❌ No | Submit features for prediction, upload training data |
| PUT | Replace entirely | ✅ Yes | ❌ No | Replace a model version |
| PATCH | Partial update | ❌ No | ❌ No | Update model configuration |
| DELETE | Remove resource | ✅ Yes | ❌ No | Remove a deployed model |
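To see why POST is not idempotent while PUT is, consider a toy in-memory "model registry". Every name here (registry, post_model, put_model) is invented for illustration, not part of any real framework:

```python
# Toy in-memory model registry illustrating idempotency.
import itertools

registry = {}                 # model_id -> config
_ids = itertools.count(1)

def post_model(config):
    """POST: creates a NEW resource on every call, so it is not idempotent."""
    model_id = next(_ids)
    registry[model_id] = config
    return model_id

def put_model(model_id, config):
    """PUT: replaces the resource at a known URI, so repeating it is harmless."""
    registry[model_id] = config
    return model_id

# Two identical POSTs create two distinct resources...
a = post_model({"version": "v1"})
b = post_model({"version": "v1"})
assert a != b and len(registry) == 2

# ...but repeating the same PUT leaves the server state unchanged.
put_model(a, {"version": "v2"})
put_model(a, {"version": "v2"})
assert registry[a] == {"version": "v2"} and len(registry) == 2
```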
Common AI API Endpoints
GET /api/v1/health → Check if the service is running
GET /api/v1/models → List available models
GET /api/v1/models/{id} → Get details about a specific model
POST /api/v1/predict → Submit features, receive prediction
POST /api/v1/predict/batch → Submit multiple inputs for batch prediction
GET /api/v1/predict/{id} → Retrieve a past prediction result
DELETE /api/v1/models/{id} → Remove a deployed model
Why POST for Predictions?
Even though a prediction doesn't "create" a resource in the traditional sense, we use POST because:
- Input features can be complex (nested objects, arrays) — too large for URL parameters
- The request has a body (JSON payload)
- Predictions may have side effects (logging, billing)
HTTP Status Codes
Status codes tell the client what happened with their request. They are grouped by category.
Status Code Families
| Range | Category | Meaning |
|---|---|---|
| 1xx | Informational | Request received, processing continues |
| 2xx | Success | Request successfully processed |
| 3xx | Redirection | Further action needed |
| 4xx | Client Error | Problem with the request |
| 5xx | Server Error | Problem on the server |
Essential Status Codes for AI APIs
| Code | Name | When to Use | AI API Example |
|---|---|---|---|
| 200 | OK | Request succeeded | Prediction returned successfully |
| 201 | Created | Resource created | New model uploaded and registered |
| 204 | No Content | Success, no body | Model deleted successfully |
| 400 | Bad Request | Invalid input format | JSON syntax error in request body |
| 401 | Unauthorized | Missing authentication | No API key provided |
| 403 | Forbidden | Insufficient permissions | API key lacks prediction access |
| 404 | Not Found | Resource doesn't exist | Model ID not found |
| 422 | Unprocessable Entity | Validation failed | Feature values out of expected range |
| 429 | Too Many Requests | Rate limit exceeded | Client sent too many prediction requests |
| 500 | Internal Server Error | Unexpected server failure | Model crashed during inference |
| 503 | Service Unavailable | Server not ready | Model still loading at startup |
Two of these codes are easy to confuse:
- 400 Bad Request: the JSON itself is malformed (a syntax error)
- 422 Unprocessable Entity: the JSON parses, but the data fails validation (e.g., a negative age or a missing required field)
FastAPI uses 422 by default for validation errors from Pydantic models.
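The distinction can be expressed in a few lines of stdlib Python. classify_request is an invented name, and the age rule mirrors the 18-120 range used in the validation examples in this chapter:

```python
import json

def classify_request(raw_body: str) -> int:
    """Return the status code a server would pick for this body."""
    try:
        data = json.loads(raw_body)           # syntax check first
    except json.JSONDecodeError:
        return 400                            # malformed JSON -> Bad Request
    age = data.get("age")
    if not isinstance(age, int) or not (18 <= age <= 120):
        return 422                            # valid JSON, invalid data
    return 200

assert classify_request('{"age": 35')  == 400   # broken syntax
assert classify_request('{"age": -5}') == 422   # fails validation
assert classify_request('{"age": 35}') == 200
```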
JSON Request/Response Format
REST APIs communicate using JSON (JavaScript Object Notation). For AI APIs, you need to design clear input/output schemas.
Prediction Request
{
  "features": {
    "age": 35,
    "income": 55000,
    "credit_score": 720,
    "employment_years": 8,
    "loan_amount": 25000
  },
  "options": {
    "explain": true,
    "threshold": 0.5
  }
}
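A payload like the one above maps directly onto Python dictionaries via the stdlib json module. A quick sketch, using a trimmed version of the request:

```python
import json

raw = '{"features": {"age": 35, "income": 55000}, "options": {"explain": true}}'
payload = json.loads(raw)                      # JSON text -> Python dict
assert payload["features"]["age"] == 35
assert payload["options"]["explain"] is True   # JSON true -> Python True
assert json.loads(json.dumps(payload)) == payload  # round-trips cleanly
```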
Prediction Response
{
  "prediction": "approved",
  "probability": 0.87,
  "confidence": "high",
  "model_version": "loan-classifier-v2.1",
  "timestamp": "2026-02-23T14:30:00Z",
  "explanation": {
    "top_features": [
      {"feature": "credit_score", "importance": 0.42},
      {"feature": "income", "importance": 0.31},
      {"feature": "employment_years", "importance": 0.15}
    ]
  }
}
Error Response
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid input features",
    "details": [
      {
        "field": "age",
        "message": "Value must be between 18 and 120",
        "received": -5
      }
    ]
  },
  "timestamp": "2026-02-23T14:31:00Z",
  "request_id": "req_abc123"
}
API Design Best Practices
1. Endpoint Naming Conventions
| Convention | Good | Bad |
|---|---|---|
| Use nouns, not verbs | /api/v1/predictions | /api/v1/makePrediction |
| Use plural nouns | /api/v1/models | /api/v1/model |
| Use kebab-case | /api/v1/model-versions | /api/v1/modelVersions |
| Version your API | /api/v1/predict | /predict |
| Use hierarchy for relations | /api/v1/models/{id}/predictions | /api/v1/model-predictions |
2. Request Validation
Always validate input data before sending it to your model:
from pydantic import BaseModel, Field, field_validator

class PredictionInput(BaseModel):
    age: int = Field(..., ge=18, le=120, description="Customer age")
    income: float = Field(..., gt=0, description="Annual income in USD")
    credit_score: int = Field(..., ge=300, le=850)

    # Pydantic v2 style; on Pydantic v1, use @validator("income") instead
    @field_validator("income")
    @classmethod
    def income_must_be_reasonable(cls, v: float) -> float:
        if v > 10_000_000:
            raise ValueError("Income seems unrealistically high")
        return v
Strict validation like this:
- Prevents your model from receiving nonsensical inputs
- Returns clear error messages to clients
- Avoids silent failures (model returns a prediction for garbage input)
- Protects against injection attacks
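Assuming Pydantic is installed (the code below works on both v1 and v2), here is how validation fails loudly rather than silently. LoanInput is a trimmed, two-field stand-in for the model above:

```python
from pydantic import BaseModel, Field, ValidationError

class LoanInput(BaseModel):
    age: int = Field(..., ge=18, le=120)
    income: float = Field(..., gt=0)

# Invalid input raises immediately, before the model ever sees it.
try:
    LoanInput(age=-5, income=55000)
except ValidationError as exc:
    # The error report names the offending field.
    assert exc.errors()[0]["loc"] == ("age",)

# Valid input passes through with coerced, typed values.
ok = LoanInput(age=35, income=55000)
assert ok.age == 35
```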
3. Consistent Response Format
Always return responses in a consistent envelope:
{
  "status": "success",          // or "error"
  "data": { ... },              // response payload
  "meta": {                     // metadata
    "model_version": "v2.1",
    "response_time_ms": 45,
    "request_id": "req_abc123"
  }
}
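One way to build such an envelope is a small helper function. The field names follow the example above; the helper itself (and its defaults) is illustrative:

```python
import uuid

def envelope(data=None, error=None, model_version="v2.1"):
    """Wrap a payload in a consistent success/error envelope."""
    return {
        "status": "error" if error else "success",
        "data": data,
        "error": error,
        "meta": {
            "model_version": model_version,
            "request_id": f"req_{uuid.uuid4().hex[:8]}",
        },
    }

ok = envelope(data={"prediction": "approved"})
assert ok["status"] == "success"

bad = envelope(error={"code": "VALIDATION_ERROR"})
assert bad["status"] == "error"
```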
Authentication and Security
Protecting your AI API is critical — you don't want unauthorized users running predictions (which consume compute resources and may access sensitive models).
API Keys
API keys are the simplest authentication method: the client includes a secret key in a request header.
GET /api/v1/models HTTP/1.1
Host: api.example.com
X-API-Key: sk_live_abc123def456
| Pros | Cons |
|---|---|
| Simple to implement | No built-in expiration |
| Easy for clients to use | Hard to manage permissions per key |
| Works for server-to-server | Vulnerable if exposed in client-side code |
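Checking the key server-side can be sketched in plain Python. hmac.compare_digest gives a constant-time comparison (so timing doesn't leak information about the key), and the key value simply reuses the example above:

```python
import hmac

VALID_KEY = "sk_live_abc123def456"   # in practice, load from a secret store

def authenticate(headers: dict) -> bool:
    """Return True if the request carries a valid X-API-Key header."""
    supplied = headers.get("X-API-Key", "")
    # compare_digest avoids leaking key content via timing side channels
    return hmac.compare_digest(supplied, VALID_KEY)

assert authenticate({"X-API-Key": "sk_live_abc123def456"})
assert not authenticate({"X-API-Key": "wrong"})
assert not authenticate({})          # missing key -> 401 Unauthorized case
```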
JWT (JSON Web Tokens) — Overview
JWT is a more advanced authentication mechanism: the server issues a signed token that the client includes in subsequent requests.
A JWT has three parts: a header (algorithm), a payload (claims/permissions), and a signature (used to verify the token wasn't tampered with).
When to use which:
- API Keys: Simple internal services, prototyping, server-to-server
- JWT: Multi-user applications, fine-grained permissions, token expiration needed
- OAuth 2.0: Third-party access, delegated authorization
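To make the three-part structure concrete, here is an educational, stdlib-only sketch of building an HS256 token. Production code should use a maintained library such as PyJWT instead; the secret and claims below are made up:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"   # never hard-code a real signing key

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(claims: dict) -> str:
    header  = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

token = make_token({"sub": "user42", "scope": "predict"})
header_b64, payload_b64, sig_b64 = token.split(".")   # the three parts
```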
CORS (Cross-Origin Resource Sharing)
When a web application at https://myapp.com tries to call your API at https://api.myml.com, the browser blocks it by default. CORS headers tell the browser which origins are allowed.
CORS Configuration Example
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://myapp.com", "http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
Never use allow_origins=["*"] in production. This allows any website to call your API, which can lead to abuse and data leaks.
Rate Limiting
Rate limiting controls how many requests a client can make in a given time window. This is essential for AI APIs because each prediction consumes compute resources (CPU/GPU time, memory).
| Strategy | Description | Use Case |
|---|---|---|
| Fixed Window | X requests per minute/hour | Simple API key quotas |
| Sliding Window | Smoothed rate over rolling window | Prevents burst abuse |
| Token Bucket | Allows short bursts up to a limit | APIs with variable traffic |
| Per-Endpoint | Different limits for different endpoints | /predict = 100/min, /health = unlimited |
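As one concrete case, the token-bucket strategy from the table can be sketched in a few lines of plain Python. The capacity and refill rate here are arbitrary:

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, refilling steadily over time."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Top the bucket up based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False          # caller should answer 429 Too Many Requests

bucket = TokenBucket(capacity=3, refill_per_sec=1)
burst = [bucket.allow() for _ in range(5)]
assert burst[:3] == [True, True, True]    # burst up to capacity
assert burst[3] is False                  # then throttled
```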
Rate Limit Response
When a client exceeds the limit, return a 429 Too Many Requests response with helpful headers:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708700000
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded 100 requests per minute. Please retry after 30 seconds."
  }
}
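On the client side, a well-behaved consumer reads these headers and backs off before retrying. A small sketch, assuming the headers arrive as a plain dict and backoff_seconds is an invented helper name:

```python
def backoff_seconds(status: int, headers: dict, default: float = 1.0) -> float:
    """Return how long the client should sleep before retrying."""
    if status != 429:
        return 0.0                            # not throttled, no wait
    retry_after = headers.get("Retry-After")
    try:
        return float(retry_after)             # server told us how long
    except (TypeError, ValueError):
        return default                        # header missing or unparseable

assert backoff_seconds(200, {}) == 0.0
assert backoff_seconds(429, {"Retry-After": "30"}) == 30.0
assert backoff_seconds(429, {}) == 1.0
```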
REST vs GraphQL vs gRPC
When building AI APIs, REST is the most common choice, but it's worth understanding the alternatives.
| Feature | REST | GraphQL | gRPC |
|---|---|---|---|
| Protocol | HTTP/1.1 or HTTP/2 | HTTP/1.1 or HTTP/2 | HTTP/2 |
| Data Format | JSON | JSON | Protocol Buffers (binary) |
| Schema | OpenAPI (optional) | Required (SDL) | Required (.proto) |
| Learning Curve | Low | Medium | High |
| Performance | Good | Good | Excellent |
| Browser Support | Native | Native | Limited (needs proxy) |
| Streaming | Limited | Subscriptions | Bidirectional |
| Use Case | General APIs, web | Flexible queries, mobile | Microservices, low-latency |
| AI Relevance | Most common for ML APIs | Complex multi-model queries | High-throughput inference |
We focus on REST APIs because they are the most widely used, easiest to test, and best supported by tools like Swagger and Postman. If you need extremely low-latency inference between microservices, consider gRPC as a next step.
The Request/Response Lifecycle
Understanding the full lifecycle of an API request helps you debug issues and optimize performance.
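One way to picture the lifecycle: a request flows through layered stages (authentication, validation, inference, serialization), any of which can short-circuit with an error response. Every function name in this sketch is invented for illustration:

```python
def authenticate(req):
    if req.get("api_key") != "sk_live_abc123def456":
        return {"status": 401}       # reject before any work is done
    return None                      # None = continue to the next stage

def validate(req):
    if "features" not in req:
        return {"status": 422}       # valid auth, invalid data
    return None

def infer(req):
    # Stand-in for the actual model call.
    return {"status": 200, "prediction": "approved"}

def handle(req):
    for stage in (authenticate, validate):
        early = stage(req)
        if early:                    # middleware short-circuits the chain
            return early
    return infer(req)                # handler runs last, response serialized

assert handle({})["status"] == 401
assert handle({"api_key": "sk_live_abc123def456"})["status"] == 422
assert handle({"api_key": "sk_live_abc123def456",
               "features": {"age": 35}})["status"] == 200
```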
Summary
| Concept | Key Takeaway |
|---|---|
| REST API | Standard way to expose ML models via HTTP |
| HTTP Methods | POST for predictions, GET for info/health |
| Status Codes | 200 = success, 422 = validation error, 500 = server error |
| JSON | Universal data format for request/response |
| Authentication | API keys (simple) or JWT (advanced) |
| CORS | Required for browser-based clients |
| Rate Limiting | Protects compute resources from abuse |
| REST vs alternatives | REST for most AI APIs, gRPC for internal high-throughput |
What's Next?
Now that you understand REST API concepts, you'll learn to implement them using two Python frameworks:
- FastAPI — Modern, async, auto-documented (next section)
- Flask — Lightweight, flexible, widely used
Vocabulary Quick Reference
| Term | Definition |
|---|---|
| Endpoint | A specific URL path that accepts requests (e.g., /api/v1/predict) |
| Payload | The data sent in the body of a request or response |
| Serialization | Converting data structures to a transferable format (JSON) |
| Idempotent | Repeating the same request leaves the server in the same state as sending it once |
| Stateless | Server doesn't remember previous requests |
| Middleware | Code that runs between receiving a request and returning a response |
| Schema | A formal description of the expected data structure |