Compute lazily with iterators and generators
When processing large amounts of data, running out of memory is a common problem. You can solve this by generating values one at a time, exactly when you need them. Three terms show up at once, but they all share the same goal: pull items one at a time. We will start from familiar lists and gradually generalize to iterators and generators.
Key terms
- Iterable: anything you can loop over with `for`, e.g., lists, dicts, strings
- Iterator: an object that implements `__iter__` and `__next__`, returning the next value each call
- Generator: a function or expression that uses `yield` to produce values and automatically becomes an iterator
- Lazy evaluation: a strategy that defers computation until the moment the value is requested
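A quick interpreter session makes the first two terms concrete; this is a minimal sketch using a plain list:

```python
numbers = [3, 2, 1]     # a list is an iterable, but not an iterator
it = iter(numbers)      # iter() asks the iterable for a fresh iterator
print(next(it))         # 3 -- each next() call pulls exactly one value
print(next(it))         # 2
```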
Study notes
- Time: 60 minutes
- Prereqs: loops/comprehensions, defining functions
- Goal: build a custom iterator class, then re-create it with generator functions and expressions
- Feel free to take only the Core path if you are short on time.
Core ideas
- An iterable is any object you can traverse.
- An iterator responds to `next()` and raises `StopIteration` to finish.
- Generators use `yield` to create iterators in one step.
- Lazy evaluation reduces memory pressure by delaying work.
Code walkthrough
Terminology (Core)
- Iterable: works in a `for` loop by exposing `__iter__` or indexed access via `__getitem__`.
- Iterator: exposes both `__iter__` and `__next__`; each `next()` returns the next value.
- Generator: a special function or expression with `yield`; Python turns it into an iterator automatically.
Build an iterator class (Core)
```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for number in Countdown(3):
    print(number)  # 3 2 1
```
When `StopIteration` is raised, the loop exits. Custom iterators give you precise control over state, but they require boilerplate.
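To see exactly where `StopIteration` ends a loop, here is a sketch of the bookkeeping a `for` loop performs for you:

```python
data = ["a", "b"]
it = iter(data)
while True:
    try:
        item = next(it)
    except StopIteration:
        break  # the for loop exits here, silently
    print(item)
```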
Generator functions and yield (Core)
Generator functions remove most of the ceremony. Think of `yield` as "hand over a value and pause."
```python
def countdown(start):
    current = start
    while current > 0:
        yield current
        current -= 1

for number in countdown(3):
    print(number)
```
A function that contains `yield` is a generator function: calling it returns a generator object instead of running the body. Execution pauses at each `yield` and resumes on the next `next()` call, preserving local state automatically.
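Stepping through a generator by hand makes the pause-and-resume behavior visible (the `countdown` function is repeated here so the snippet stands alone):

```python
def countdown(start):
    current = start
    while current > 0:
        yield current
        current -= 1

gen = countdown(2)
print(next(gen))  # 2 -- runs the body up to the first yield
print(next(gen))  # 1 -- resumes right after the yield
# One more next(gen) would raise StopIteration, ending any for loop.
```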
Generator expressions (Core → Plus)
```python
import itertools

squares = (n * n for n in range(1, 1_000_001))
first_ten = list(itertools.islice(squares, 10))
print(first_ten)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```
Parentheses create a generator expression. The expression itself allocates almost nothing; work happens only when a consumer such as `list` or `sum` pulls values from it.
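One way to observe the difference is to compare object sizes with `sys.getsizeof`; the exact numbers vary by Python version, so treat them as illustrative:

```python
import sys

eager = [n * n for n in range(1_000_000)]   # all million values exist now
lazy = (n * n for n in range(1_000_000))    # values are produced on demand

print(sys.getsizeof(eager))  # on the order of megabytes
print(sys.getsizeof(lazy))   # a few hundred bytes, independent of range size
```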
Iterator toolbox (Optional)
`itertools` combines iterators like LEGO bricks.
- `itertools.count(start=0, step=1)`: infinite increasing sequence
- `itertools.cycle(iterable)`: repeat items forever
- `itertools.chain(a, b, ...)`: concatenate multiple iterables
- `itertools.groupby(iterable, key)`: group sorted data by a key function
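A few of these bricks in action; note that `groupby` only groups adjacent items, so the input must be sorted by the same key first:

```python
import itertools

# count + islice: take the first five values of an infinite sequence
evens = itertools.count(start=0, step=2)
print(list(itertools.islice(evens, 5)))     # [0, 2, 4, 6, 8]

# chain: concatenate iterables without building an intermediate list
print(list(itertools.chain("ab", [1, 2])))  # ['a', 'b', 1, 2]

# groupby: group sorted data by a key function
words = sorted(["apple", "bee", "ant"], key=lambda w: w[0])
for letter, group in itertools.groupby(words, key=lambda w: w[0]):
    print(letter, list(group))
```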
Memory and performance strategies (Optional)
- Stream large CSV files line by line with generators instead of loading them at once.
- Wrap slow sources such as network responses in generators to let consumers pace themselves.
- Use `itertools.islice` to grab only the portion you need.
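A streaming reader for the first bullet can be sketched as follows; the file name `data.csv` is a placeholder:

```python
import csv
import itertools

def read_rows(path):
    """Yield one parsed CSV row at a time instead of loading the whole file."""
    with open(path, newline="") as f:
        yield from csv.reader(f)

# Only the first ten rows are ever parsed, no matter how large the file is:
# first_ten = list(itertools.islice(read_rows("data.csv"), 10))
```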
Why it matters
- Iterables can be looped; iterators drive `next()`; generators make iterators effortless.
- Generator functions remember state while producing one value at a time.
- With generator expressions and `itertools`, you build pipelines that avoid wasting memory.
Practice
- Follow along: implement both the `Countdown` class and the `countdown` generator to compare outputs.
- Extend: mimic a large CSV list and process it with a generator expression plus `itertools.islice`.
- Debug: replace `StopIteration` with `return`, see how `None` leaks out, then fix the exception.
- Definition of done: you have a class-based iterator, a `yield` function, and a generator expression running in one notebook and can justify when to pick each.
Wrap-up
Next we will manage resources safely with context managers and the `with` statement.