[FastAPI Series 18] First Steps with Caching and Performance


API response speed shapes user experience and infrastructure cost. A cache stores work you have already done so it can be reused, cutting database queries and network chatter. In this episode we look at FastAPI-ready caching patterns, query optimization, and uvicorn tuning.

Primer: caching in plain language

  • When the same question comes in repeatedly, taking the answer from yesterday’s notebook instead of calculating it again is caching.
  • When that notebook lives inside the server we call it an “in-memory cache”; when multiple servers need to share it, we move it to a shared store such as Redis.
  • Every cache entry has an expiration (its “sell-by date”). Once it expires you recompute the answer.

Keep that mental model and the code below will click quickly.

Easiest example: serve the same value for 10 seconds

from datetime import datetime, timedelta, timezone

from fastapi import FastAPI

app = FastAPI()
cached_value: dict | None = None
expires_at: datetime | None = None

@app.get("/slow-value")
def slow_value():
    global cached_value, expires_at
    now = datetime.now(timezone.utc)  # utcnow() is deprecated; prefer aware datetimes
    if cached_value is not None and expires_at is not None and now < expires_at:
        return cached_value  # cache hit: reuse the stored answer
    cached_value = {"value": now.isoformat()}
    expires_at = now + timedelta(seconds=10)  # valid for the next 10 seconds
    return cached_value

This small snippet demonstrates that “calculate once, reuse for ten seconds” feeling. The rest of the lesson reshapes it for real FastAPI projects.

Key terms

  1. Cache: A storage layer that keeps precomputed results so you can skip extra DB or external API calls.
  2. lru_cache: A decorator that reuses Python function results in memory—handy for explaining single-process caching.
  3. Redis: An in-memory key–value store that lets multiple servers share the same cache, which is essential when FastAPI runs with several workers.
  4. TTL (Time To Live): How long a cache entry stays valid. Long TTL risks stale data; short TTL weakens the cache hit rate.
  5. uvloop/httptools: Faster event loop and HTTP parser options for uvicorn, covered in the tuning section.

Practice card

  • Estimated time: 55 minutes (core) / +30 minutes (optional)
  • Prereqs: Episode 17 settings/Redis URL, basic SQLModel queries
  • Goal: Add HTTP/Redis caches to feel the hit-rate difference, and sample tuning tools if time remains

Core practice: HTTP + Redis cache

The core session sticks to cache headers for immutable data, a Redis cache for repeated lookups, and a small in-memory cache to understand the flow.

HTTP cache headers

Add cache headers to immutable responses so clients or CDNs can reuse them.

from fastapi import Response

@app.get("/public/config")
async def public_config(response: Response):
    response.headers["Cache-Control"] = "public, max-age=300"
    return {"theme": "light"}

max-age=300 invites caches to reuse the response for five minutes.

Server-side (in-memory) cache

Plain dictionaries or lru_cache only work inside a single uvicorn process, but they teach the core pattern well.

from decimal import Decimal
from functools import lru_cache

@lru_cache(maxsize=128)
def get_exchange_rate(base: str, target: str) -> Decimal:
    # First call for a (base, target) pair hits the provider; later calls reuse the result.
    return fetch_from_provider(base, target)

@app.get("/exchange")
def exchange(base: str, target: str):
    rate = get_exchange_rate(base, target)
    return {"rate": rate}

lru_cache reuses results only inside that one process. Once you scale to multiple workers you need a shared cache such as Redis.
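You can watch the hit/miss behavior directly with lru_cache's built-in counters. In this standalone sketch, a hard-coded stand-in replaces the real provider call so the snippet runs on its own:

```python
# Standalone sketch: a stand-in "provider" lets us count real calls (misses).
from decimal import Decimal
from functools import lru_cache

calls = 0

@lru_cache(maxsize=128)
def get_exchange_rate(base: str, target: str) -> Decimal:
    global calls
    calls += 1  # increments only on a cache miss
    return Decimal("1350.25")  # made-up rate standing in for the provider response

get_exchange_rate("USD", "KRW")  # miss: computes and stores
get_exchange_rate("USD", "KRW")  # hit: served from memory
print(calls)                                 # → 1
print(get_exchange_rate.cache_info().hits)   # → 1
```

cache_info() also reports misses and current size, which is handy when choosing maxsize.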

Redis cache pattern


import json

import redis.asyncio as redis

redis_client = redis.from_url(settings.redis_url, encoding="utf-8", decode_responses=True)

async def cache_get(key: str):
    value = await redis_client.get(key)
    if value is not None:
        return json.loads(value)
    return None  # cache miss

async def cache_set(key: str, data: dict, ttl: int = 60):
    await redis_client.set(key, json.dumps(data), ex=ttl)

from fastapi import Depends, HTTPException
from sqlmodel import Session

# Task, TaskRead, and get_session come from earlier episodes in this series.

@app.get("/tasks/{task_id}")
async def retrieve_task(task_id: int, session: Session = Depends(get_session)):
    cache_key = f"task:{task_id}"
    cached = await cache_get(cache_key)
    if cached:
        return cached  # cache hit: skip the database entirely
    task = session.get(Task, task_id)
    if not task:
        raise HTTPException(404, "Task not found")
    data = TaskRead.model_validate(task).model_dump()
    await cache_set(cache_key, data, ttl=120)  # keep for two minutes
    return data

ttl is the cache lifetime in seconds. Pick a value that protects freshness without flooding the database.
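To make the TTL trade-off concrete, here is a tiny in-memory TTL cache. This is a hypothetical helper, not part of the series code; the clock is injectable so expiry can be simulated without sleeping:

```python
# Hypothetical in-memory TTL cache with an injectable clock for easy testing.
import time

class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value, ttl: float) -> None:
        # Store the value together with its absolute expiry time.
        self._store[key] = (self._clock() + ttl, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None                # miss: never cached
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]       # expired: drop it and report a miss
            return None
        return value                   # hit: still fresh

# Simulated clock: advance time by hand instead of sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("task:1", {"title": "demo"}, ttl=120)
print(cache.get("task:1"))  # → {'title': 'demo'}  (still fresh)
now[0] += 121
print(cache.get("task:1"))  # → None  (TTL elapsed, recompute needed)
```

A long TTL would have kept returning the stale dict; a short one would turn almost every get into a miss, which is exactly the trade-off described above.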

Optional: tuning and benchmarks

Only tackle this section when you have breathing room. The cache work alone delivers noticeable improvements.

Optional: database tuning basics

  • Avoid N+1 queries with SQLModel/SQLAlchemy loaders such as selectinload or joinedload.
  • Select only the columns you need to cut network payloads.

statement = select(Task.id, Task.title).where(Task.owner_id == current_user.id)

Optional: uvicorn + uvloop settings

uvloop is a C-based asyncio loop that boosts throughput. Because results differ per environment, measure before adopting it.

pip install uvloop
uvicorn app.main:app --workers 4 --loop uvloop --http httptools

httptools is a Python binding to the HTTP parser from the Node.js project and is usually faster than the pure-Python h11 parser uvicorn falls back to.

Optional: monitoring and benchmarking

Tools like wrk, hey, or k6 make it easy to apply load.

hey -n 1000 -c 50 https://api.example.com/tasks
  • -n controls total requests; -c controls concurrency.
  • Track average latency, P95/P99, and error rates.

Benchmark numbers vary wildly based on hardware, network, and data size. Use them for before/after comparisons under identical conditions.
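If your load tool only prints raw latencies, P95/P99 are easy to compute yourself. This sketch uses the nearest-rank method (the smallest sample with at least that percentage of the data at or below it); the latency numbers are made up:

```python
# Nearest-rank percentiles from raw latency samples (numbers are made up).
import math

def percentile(samples: list[float], pct: float) -> float:
    # Nearest-rank: smallest sample with at least pct% of the data at or below it.
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [12, 12, 12, 12, 13, 13, 13, 13, 14, 14,
                14, 14, 15, 15, 15, 15, 16, 16, 220, 410]

print(percentile(latencies_ms, 50))  # → 14
print(percentile(latencies_ms, 95))  # → 220
print(percentile(latencies_ms, 99))  # → 410
```

Notice how the average would hide the two slow outliers, while P95/P99 expose them, which is why the tail percentiles matter.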

Validate the outcome

What matters most is the delta before vs. after enabling cache.

:::terminal{title="Sample before/after", showFinalPrompt="false"}

[
  { "cmd": "hey -n 30 -c 5 http://127.0.0.1:8000/slow-value", "output": "Summary:\n  Total:\t1.84 secs\n  Slowest:\t0.412 secs\n  Fastest:\t0.201 secs\n  Average:\t0.298 secs", "delay": 500 },
  { "cmd": "hey -n 30 -c 5 http://127.0.0.1:8000/slow-value?cached=true", "output": "Summary:\n  Total:\t0.51 secs\n  Slowest:\t0.091 secs\n  Fastest:\t0.010 secs\n  Average:\t0.041 secs", "delay": 500 }
]

:::

  • Confirm you tested under identical conditions.
  • Confirm total time and average latency dropped together.
  • Confirm the performance gain does not introduce unmanageable complexity.

D2: cache layers

Client → CDN → FastAPI → Redis Cache → (hit: return cached) / (miss: Database)

A cache “hit” returns data straight from Redis. A “miss” requires a database lookup first. The more traffic you have, the more worthwhile this pattern becomes. Next we will dig into logging, monitoring, and failure handling.

Exercises

  • Follow along: call /public/config with cache headers and /tasks/{id} wired to Redis, then observe hit/miss behavior.
  • Extend: from the optional section, try selectinload or the --loop uvloop flag.
  • Debug: log cache hits/misses and confirm TTL expiry leads to a DB query again.
  • Definition of done: at least one endpoint is cached and you have recorded one tuning or benchmarking command.
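For the debug exercise above, one way to log hits and misses is to wrap the lookup itself. In this runnable sketch a plain dict stands in for Redis so you can try it without a server:

```python
# Sketch for the debug exercise: a plain dict stands in for Redis so the
# hit/miss logging runs locally with no external services.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cache")

fake_redis: dict[str, dict] = {}

def cache_get(key: str):
    value = fake_redis.get(key)
    log.info("cache %s key=%s", "hit" if value is not None else "miss", key)
    return value

cache_get("task:1")                       # logs: cache miss key=task:1
fake_redis["task:1"] = {"title": "demo"}
cache_get("task:1")                       # logs: cache hit key=task:1
```

The same pattern applies to the async Redis helpers from the core practice: log at the point where the miss is decided, then watch the log flip from miss to hit, and back to miss once the TTL expires.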

Wrap-up

Knowing when to apply HTTP, in-memory, or Redis caches lets you match the right tool to the job. Pair every experiment with measurements and logs so you can prove the cache strategy works.
