[FastAPI Series 18] First Steps with Caching and Performance


API response speed shapes user experience and infrastructure cost. A cache stores work you have already done so it can be reused, cutting database queries and network chatter. In this episode we look at FastAPI-ready caching patterns, query optimization, and uvicorn tuning.

Primer: caching in plain language

  • When the same question comes in repeatedly, taking the answer from yesterday’s notebook instead of calculating it again is caching.
  • When that notebook lives inside the server we call it an “in-memory cache”; when multiple servers need to share it, we move it to a shared store such as Redis.
  • Every cache entry has an expiration (its “sell-by date”). Once it expires you recompute the answer.

Keep that mental model and the code below will click quickly.

Easiest example: serve the same value for 10 seconds

from datetime import datetime, timedelta, timezone

from fastapi import FastAPI

app = FastAPI()
cached_value: dict | None = None
expires_at: datetime | None = None

@app.get("/slow-value")
def slow_value():
    global cached_value, expires_at
    now = datetime.now(timezone.utc)  # utcnow() is deprecated; prefer aware datetimes
    if cached_value is not None and expires_at is not None and now < expires_at:
        return cached_value  # cache hit: reuse the stored answer
    cached_value = {"value": now.isoformat()}
    expires_at = now + timedelta(seconds=10)  # valid for the next 10 seconds
    return cached_value

This small snippet demonstrates that “calculate once, reuse for ten seconds” feeling. The rest of the lesson reshapes it for real FastAPI projects.

Key terms

  1. Cache: A storage layer that keeps precomputed results so you can skip extra DB or external API calls.
  2. lru_cache: A decorator that reuses Python function results in memory—handy for explaining single-process caching.
  3. Redis: An in-memory key–value store that lets multiple servers share the same cache, which is essential when FastAPI runs with several workers.
  4. TTL (Time To Live): How long a cache entry stays valid. Long TTL risks stale data; short TTL weakens the cache hit rate.
  5. uvloop/httptools: Faster event loop and HTTP parser options for uvicorn, covered in the tuning section.

Practice card

  • Estimated time: 55 minutes (core) / +30 minutes (optional)
  • Prereqs: Episode 17 settings/Redis URL, basic SQLModel queries
  • Goal: Add HTTP/Redis caches to feel the hit-rate difference, and sample tuning tools if time remains

Core practice: HTTP + Redis cache

The core session sticks to cache headers for immutable data, a Redis cache for repeated lookups, and a small in-memory cache to understand the flow.

HTTP cache headers

Add cache headers to immutable responses so clients or CDNs can reuse them.

from fastapi import Response

@app.get("/public/config")
async def public_config(response: Response):
    response.headers["Cache-Control"] = "public, max-age=300"
    return {"theme": "light"}

max-age=300 invites caches to reuse the response for five minutes.

Server-side (in-memory) cache

Plain dictionaries or lru_cache only work inside a single uvicorn process, but they teach the core pattern well.

from decimal import Decimal
from functools import lru_cache

@lru_cache(maxsize=128)
def get_exchange_rate(base: str, target: str) -> Decimal:
    # First call for a (base, target) pair hits the provider; later calls reuse the result.
    return fetch_from_provider(base, target)

@app.get("/exchange")
def exchange(base: str, target: str):
    rate = get_exchange_rate(base, target)
    return {"rate": rate}

lru_cache reuses results only inside that one process. Once you scale to multiple workers you need a shared cache such as Redis.
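You can watch the hit/miss behavior directly with lru_cache's built-in counters. In this standalone sketch, a hard-coded stand-in replaces the real provider call so the snippet runs on its own:

```python
# Standalone sketch: a stand-in "provider" lets us count real calls (misses).
from decimal import Decimal
from functools import lru_cache

calls = 0

@lru_cache(maxsize=128)
def get_exchange_rate(base: str, target: str) -> Decimal:
    global calls
    calls += 1  # increments only on a cache miss
    return Decimal("1350.25")  # made-up rate standing in for the provider response

get_exchange_rate("USD", "KRW")  # miss: computes and stores
get_exchange_rate("USD", "KRW")  # hit: served from memory
print(calls)                                 # → 1
print(get_exchange_rate.cache_info().hits)   # → 1
```

cache_info() also reports misses and current size, which is handy when choosing maxsize.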

Redis cache pattern


import json

import redis.asyncio as redis

redis_client = redis.from_url(settings.redis_url, encoding="utf-8", decode_responses=True)

async def cache_get(key: str):
    value = await redis_client.get(key)
    if value is not None:
        return json.loads(value)
    return None  # cache miss

async def cache_set(key: str, data: dict, ttl: int = 60):
    await redis_client.set(key, json.dumps(data), ex=ttl)

from fastapi import Depends, HTTPException
from sqlmodel import Session

# Task, TaskRead, and get_session come from earlier episodes in this series.

@app.get("/tasks/{task_id}")
async def retrieve_task(task_id: int, session: Session = Depends(get_session)):
    cache_key = f"task:{task_id}"
    cached = await cache_get(cache_key)
    if cached:
        return cached  # cache hit: skip the database entirely
    task = session.get(Task, task_id)
    if not task:
        raise HTTPException(404, "Task not found")
    data = TaskRead.model_validate(task).model_dump()
    await cache_set(cache_key, data, ttl=120)  # keep for two minutes
    return data

ttl is the cache lifetime in seconds. Pick a value that protects freshness without flooding the database.
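To make the TTL trade-off concrete, here is a tiny in-memory TTL cache. This is a hypothetical helper, not part of the series code; the clock is injectable so expiry can be simulated without sleeping:

```python
# Hypothetical in-memory TTL cache with an injectable clock for easy testing.
import time

class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value, ttl: float) -> None:
        # Store the value together with its absolute expiry time.
        self._store[key] = (self._clock() + ttl, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None                # miss: never cached
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]       # expired: drop it and report a miss
            return None
        return value                   # hit: still fresh

# Simulated clock: advance time by hand instead of sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("task:1", {"title": "demo"}, ttl=120)
print(cache.get("task:1"))  # → {'title': 'demo'}  (still fresh)
now[0] += 121
print(cache.get("task:1"))  # → None  (TTL elapsed, recompute needed)
```

A long TTL would have kept returning the stale dict; a short one would turn almost every get into a miss, which is exactly the trade-off described above.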

Optional: tuning and benchmarks

Only tackle this section when you have breathing room. The cache work alone delivers noticeable improvements.

Optional: database tuning basics

  • Avoid N+1 queries with SQLModel/SQLAlchemy loaders such as selectinload or joinedload.
  • Select only the columns you need to cut network payloads.

statement = select(Task.id, Task.title).where(Task.owner_id == current_user.id)

Optional: uvicorn + uvloop settings

uvloop is a C-based asyncio loop that boosts throughput. Because results differ per environment, measure before adopting it.

pip install uvloop
uvicorn app.main:app --workers 4 --loop uvloop --http httptools

httptools is a Python binding to the HTTP parser from the Node.js project and is usually faster than the pure-Python h11 parser uvicorn falls back to.

Optional: monitoring and benchmarking

Tools like wrk, hey, or k6 make it easy to apply load.

hey -n 1000 -c 50 https://api.example.com/tasks
  • -n controls total requests; -c controls concurrency.
  • Track average latency, P95/P99, and error rates.

Benchmark numbers vary wildly based on hardware, network, and data size. Use them for before/after comparisons under identical conditions.
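If your load tool only prints raw latencies, P95/P99 are easy to compute yourself. This sketch uses the nearest-rank method (the smallest sample with at least that percentage of the data at or below it); the latency numbers are made up:

```python
# Nearest-rank percentiles from raw latency samples (numbers are made up).
import math

def percentile(samples: list[float], pct: float) -> float:
    # Nearest-rank: smallest sample with at least pct% of the data at or below it.
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [12, 12, 12, 12, 13, 13, 13, 13, 14, 14,
                14, 14, 15, 15, 15, 15, 16, 16, 220, 410]

print(percentile(latencies_ms, 50))  # → 14
print(percentile(latencies_ms, 95))  # → 220
print(percentile(latencies_ms, 99))  # → 410
```

Notice how the average would hide the two slow outliers, while P95/P99 expose them, which is why the tail percentiles matter.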

Validate the outcome

What matters most is the delta before vs. after enabling cache.

:::terminal{title="Sample before/after", showFinalPrompt="false"}

[
  { "cmd": "hey -n 30 -c 5 http://127.0.0.1:8000/slow-value", "output": "Summary:\n  Total:\t1.84 secs\n  Slowest:\t0.412 secs\n  Fastest:\t0.201 secs\n  Average:\t0.298 secs", "delay": 500 },
  { "cmd": "hey -n 30 -c 5 http://127.0.0.1:8000/slow-value?cached=true", "output": "Summary:\n  Total:\t0.51 secs\n  Slowest:\t0.091 secs\n  Fastest:\t0.010 secs\n  Average:\t0.041 secs", "delay": 500 }
]

:::

  • Confirm you tested under identical conditions.
  • Confirm total time and average latency dropped together.
  • Confirm the performance gain does not introduce unmanageable complexity.

D2: cache layers

Client → CDN → FastAPI → Redis Cache → (hit: return cached) / (miss: Database)

A cache “hit” returns data straight from Redis. A “miss” requires a database lookup first. The more traffic you have, the more worthwhile this pattern becomes. Next we will dig into logging, monitoring, and failure handling.

Exercises

  • Follow along: call /public/config with cache headers and /tasks/{id} wired to Redis, then observe hit/miss behavior.
  • Extend: from the optional section, try selectinload or the --loop uvloop flag.
  • Debug: log cache hits/misses and confirm TTL expiry leads to a DB query again.
  • Definition of done: at least one endpoint is cached and you have recorded one tuning or benchmarking command.
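For the debug exercise above, one way to log hits and misses is to wrap the lookup itself. In this runnable sketch a plain dict stands in for Redis so you can try it without a server:

```python
# Sketch for the debug exercise: a plain dict stands in for Redis so the
# hit/miss logging runs locally with no external services.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cache")

fake_redis: dict[str, dict] = {}

def cache_get(key: str):
    value = fake_redis.get(key)
    log.info("cache %s key=%s", "hit" if value is not None else "miss", key)
    return value

cache_get("task:1")                       # logs: cache miss key=task:1
fake_redis["task:1"] = {"title": "demo"}
cache_get("task:1")                       # logs: cache hit key=task:1
```

The same pattern applies to the async Redis helpers from the core practice: log at the point where the miss is decided, then watch the log flip from miss to hit, and back to miss once the TTL expires.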

Wrap-up

Knowing when to apply HTTP, in-memory, or Redis caches lets you match the right tool to the job. Pair every experiment with measurements and logs so you can prove the cache strategy works.
