API response speed shapes user experience and infrastructure cost. A cache stores work you already did so you can reuse it, which lowers database queries and network chatter. In this episode we look at FastAPI-ready caching patterns, query optimization, and uvicorn tuning.
Primer: caching in plain language
- When the same question comes in repeatedly, taking the answer from yesterday’s notebook instead of calculating it again is caching.
- When that notebook lives inside the server we call it an “in-memory cache”; when multiple servers share it we call it “Redis cache.”
- Every cache entry has an expiration (its “sell-by date”). Once it expires you recompute the answer.
Keep that mental model and the code below will click quickly.
Easiest example: serve the same value for 10 seconds
```python
from datetime import datetime, timedelta

from fastapi import FastAPI

app = FastAPI()

cached_value: dict | None = None
expires_at: datetime | None = None

@app.get("/slow-value")
def slow_value():
    global cached_value, expires_at
    if cached_value and expires_at and datetime.utcnow() < expires_at:
        return cached_value
    cached_value = {"value": datetime.utcnow().isoformat()}
    expires_at = datetime.utcnow() + timedelta(seconds=10)
    return cached_value
```
This small snippet demonstrates that “calculate once, reuse for ten seconds” feeling. The rest of the lesson reshapes it for real FastAPI projects.
Key terms
- Cache: A storage layer that keeps precomputed results so you can skip extra DB or external API calls.
- lru_cache: A decorator that caches Python function results in memory; handy for explaining single-process caching.
- Redis: An in-memory key–value store that lets multiple servers share the same cache, which is essential when FastAPI runs with several workers.
- TTL (Time To Live): How long a cache entry stays valid. Long TTL risks stale data; short TTL weakens the cache hit rate.
- uvloop/httptools: Faster event loop and HTTP parser options for uvicorn, covered in the tuning section.
Practice card
- Estimated time: 55 minutes (core) / +30 minutes (optional)
- Prereqs: Episode 17 settings/Redis URL, basic SQLModel queries
- Goal: Add HTTP/Redis caches to feel the hit-rate difference, and sample tuning tools if time remains
Core practice: HTTP + Redis cache
The core session sticks to cache headers for immutable data, a Redis cache for repeated lookups, and a small in-memory cache to understand the flow.
HTTP cache headers
Add cache headers to immutable responses so clients or CDNs can reuse them.
```python
from fastapi import Response

@app.get("/public/config")
async def public_config(response: Response):
    response.headers["Cache-Control"] = "public, max-age=300"
    return {"theme": "light"}
```
`max-age=300` invites caches to reuse the response for five minutes.
Server-side (in-memory) cache
Plain dictionaries or lru_cache only work inside a single uvicorn process, but they teach the core pattern well.
```python
from decimal import Decimal
from functools import lru_cache

@lru_cache(maxsize=128)
def get_exchange_rate(base: str, target: str) -> Decimal:
    # fetch_from_provider is your external-API call (not shown here).
    return fetch_from_provider(base, target)

@app.get("/exchange")
def exchange(base: str, target: str):
    rate = get_exchange_rate(base, target)
    return {"rate": rate}
```
lru_cache reuses results only inside that one process. Once you scale to multiple workers you need a shared cache such as Redis.
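Note also that plain `lru_cache` never expires entries; it only evicts when `maxsize` is exceeded. If you want a TTL without reaching for Redis yet, one common trick is to fold the current time window into the cache key. A hedged sketch (the `ttl_lru_cache` helper is illustrative, not part of the standard library):

```python
import time
from functools import lru_cache

def ttl_lru_cache(ttl_seconds: int, maxsize: int = 128):
    """lru_cache with coarse expiry: the hidden `_epoch` argument changes
    every `ttl_seconds`, so older entries simply stop being hit."""
    def decorator(func):
        @lru_cache(maxsize=maxsize)
        def cached(_epoch, *args, **kwargs):
            return func(*args, **kwargs)

        def wrapper(*args, **kwargs):
            epoch = int(time.monotonic() // ttl_seconds)
            return cached(epoch, *args, **kwargs)
        return wrapper
    return decorator

calls = 0

@ttl_lru_cache(ttl_seconds=60)
def double(x: int) -> int:
    global calls
    calls += 1
    return x * 2

assert double(3) == 6
assert double(3) == 6   # served from the cache; the body does not re-run
assert calls == 1
```

The expiry is coarse (all entries created in the same window expire together), but the idea is the same one Redis gives you properly with `ex=ttl`.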
Redis cache pattern
```python
import json

import redis.asyncio as redis
from fastapi import Depends, HTTPException
from sqlmodel import Session

redis_client = redis.from_url(
    settings.redis_url, encoding="utf-8", decode_responses=True
)

async def cache_get(key: str):
    value = await redis_client.get(key)
    if value:
        return json.loads(value)

async def cache_set(key: str, data: dict, ttl: int = 60):
    await redis_client.set(key, json.dumps(data), ex=ttl)

@app.get("/tasks/{task_id}")
async def retrieve_task(task_id: int, session: Session = Depends(get_session)):
    cache_key = f"task:{task_id}"
    cached = await cache_get(cache_key)
    if cached:
        return cached
    task = session.get(Task, task_id)
    if not task:
        raise HTTPException(404, "Task not found")
    data = TaskRead.model_validate(task).model_dump()
    await cache_set(cache_key, data, ttl=120)
    return data
```

`settings`, `get_session`, `Task`, and `TaskRead` come from Episode 17's project setup.
`ttl` is the cache lifetime. Pick a value that protects freshness without flooding the database.
Optional: tuning and benchmarks
Only tackle this section when you have breathing room. The cache work alone delivers noticeable improvements.
Optional: database tuning basics
- Avoid N+1 queries with SQLModel/SQLAlchemy loaders such as `selectinload` or `joinedload`.
- Select only the columns you need to cut network payloads.

```python
statement = select(Task.id, Task.title).where(Task.owner_id == current_user.id)
```
Optional: uvicorn + uvloop settings
uvloop is a drop-in replacement for the default asyncio event loop, built on libuv, and can boost throughput. Because results differ per environment, measure before adopting it.
```shell
pip install uvloop
uvicorn app.main:app --workers 4 --loop uvloop --http httptools
```
httptools is a Python binding for the HTTP parser used by Node.js and tends to be faster than the default pure-Python parser.
Optional: monitoring and benchmarking
Tools like wrk, hey, or k6 make it easy to apply load.
```shell
hey -n 1000 -c 50 https://api.example.com/tasks
```
- `-n` controls total requests; `-c` controls concurrency.
- Track average latency, P95/P99, and error rates.
Benchmark numbers vary wildly based on hardware, network, and data size. Use them for before/after comparisons under identical conditions.
Validate the outcome
What matters most is the delta before vs. after enabling cache.
:::terminal{title="Sample before/after", showFinalPrompt="false"}
[
{ "cmd": "hey -n 30 -c 5 http://127.0.0.1:8000/slow-value", "output": "Summary:\n Total:\t1.84 secs\n Slowest:\t0.412 secs\n Fastest:\t0.201 secs\n Average:\t0.298 secs", "delay": 500 },
{ "cmd": "hey -n 30 -c 5 http://127.0.0.1:8000/slow-value?cached=true", "output": "Summary:\n Total:\t0.51 secs\n Slowest:\t0.091 secs\n Fastest:\t0.010 secs\n Average:\t0.041 secs", "delay": 500 }
]
:::
- Confirm you tested under identical conditions.
- Confirm total time and average latency dropped together.
- Confirm the performance gain does not introduce unmanageable complexity.
Diagram: cache layers
A cache “hit” returns data straight from Redis. A “miss” requires a database lookup first. The more traffic you have, the more worthwhile this pattern becomes. Next we will dig into logging, monitoring, and failure handling.
Exercises
- Follow along: call `/public/config` with cache headers and `/tasks/{id}` wired to Redis, then observe hit/miss behavior.
- Extend: from the optional section, try `selectinload` or the `--loop uvloop` flag.
- Debug: log cache hits/misses and confirm TTL expiry leads to a DB query again.
- Definition of done: at least one endpoint is cached and you have recorded one tuning or benchmarking command.
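For the debug exercise, a minimal sketch of hit/miss logging, again with a dict standing in for Redis so it runs anywhere (in the lesson's code you would put the logging inside the async `cache_get` helper; the `stats` counter is illustrative):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cache")

store: dict[str, str] = {}   # stand-in for Redis in this sketch
stats = {"hit": 0, "miss": 0}

def cache_get(key: str):
    value = store.get(key)
    if value is not None:
        stats["hit"] += 1
        log.info("cache HIT  %s", key)
        return json.loads(value)
    stats["miss"] += 1
    log.info("cache MISS %s", key)
    return None

store["task:1"] = '{"title": "demo"}'
assert cache_get("task:1") == {"title": "demo"}  # hit
assert cache_get("task:2") is None               # miss
assert stats == {"hit": 1, "miss": 1}
```

Once hits and misses are logged, computing the hit rate (`hit / (hit + miss)`) before and after a TTL change gives you a concrete number to put next to your benchmark results.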
Wrap-up
Knowing when to apply HTTP, in-memory, or Redis caches lets you match the right tool to the job. Pair every experiment with measurements and logs so you can prove the cache strategy works.