Understanding Python Generators and the Yield Statement
List comprehensions provide a convenient way to create lists in Python, but they come with a significant drawback: the entire list is stored in memory. For large datasets, this can quickly exhaust available resources. Consider a scenario where a list of one million items is created, but only the first few elements are actually needed. In that case, the memory allocated for the remaining elements is wasted.
Python addresses this challenge through generators. Unlike lists that store all elements upfront, generators produce values on demand, calculating each subsequent element only when requested. This lazy evaluation approach drastically reduces memory consumption.
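The memory difference can be observed directly. The following is a minimal sketch, assuming sys.getsizeof is an acceptable rough measure; it reports only the container's own footprint, and the exact numbers vary by Python version and platform. The size N is an arbitrary value chosen for illustration.

```python
import sys

# Hypothetical size chosen for illustration only.
N = 1_000_000

squares_list = [x ** 2 for x in range(N)]   # all N results exist at once
squares_gen = (x ** 2 for x in range(N))    # nothing has been computed yet

# sys.getsizeof reports the object's own footprint in bytes.
# For the list, that excludes the integer objects it refers to,
# so the real cost of the list is even higher than shown.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(f"list:      {list_bytes} bytes")
print(f"generator: {gen_bytes} bytes")
```

The generator's size stays small and constant regardless of N, because it holds only the state needed to compute the next value.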
Creating Generators with Expressions
The simplest way to construct a generator is by modifying a list comprehension. Replacing the square brackets with parentheses turns the list comprehension into a generator expression, which evaluates to a generator object instead of a list:
>>> squares_list = [x ** 2 for x in range(10)]
>>> squares_list
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> squares_gen = (x ** 2 for x in range(10))
>>> squares_gen
<generator object <genexpr> at 0x7f8b2c1d5630>
Evaluating the generator object at the prompt shows only its type and memory address, not its contents, because no values have been computed yet. To retrieve values, use the built-in next() function:
>>> next(squares_gen)
0
>>> next(squares_gen)
1
>>> next(squares_gen)
4
>>> next(squares_gen)
9
Each invocation of next() computes and returns the next value. After the remaining values (16 through 81) have been retrieved the same way, no elements remain, and one more call raises a StopIteration exception:
>>> next(squares_gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Manually calling next() is cumbersome. A more idiomatic approach uses a for loop, which automatically handles iteration and catches StopIteration:
>>> gen = (x ** 2 for x in range(10))
>>> for num in gen:
...     print(num)
...
0
1
4
9
16
25
36
49
64
81
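To make the role of StopIteration concrete, the loop above can be written out by hand. This is a rough sketch of what a for loop does behind the scenes, not the interpreter's actual implementation; the names iterator and collected are introduced here for illustration.

```python
gen = (x ** 2 for x in range(10))

# Roughly what `for num in gen: ...` does behind the scenes.
iterator = iter(gen)   # for a generator, iter() returns the generator itself
collected = []
while True:
    try:
        num = next(iterator)
    except StopIteration:
        break          # the loop ends quietly; no traceback is shown
    collected.append(num)

print(collected)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```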
Generator Functions with Yield
For more complex generation logic that cannot be expressed as a comprehension, Python offers generator functions. These functions use the yield keyword instead of return, pausing execution and saving state between each value.
The Fibonacci sequence illustrates this concept well. Each number equals the sum of the two preceding numbers:
def generate_fibonacci(limit):
    current, next_val = 0, 1
    count = 0
    while count < limit:
        yield next_val
        current, next_val = next_val, current + next_val
        count += 1
The tuple assignment current, next_val = next_val, current + next_val evaluates the entire right-hand side first and then rebinds both names, so no temporary variable is needed. Because the function body contains yield, calling the function does not run the body; it returns a generator object:
>>> fib_gen = generate_fibonacci(6)
>>> fib_gen
<generator object generate_fibonacci at 0x104feaaa0>
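The tuple assignment used in generate_fibonacci can be checked against a version that spells out the temporary variable. The sketch below uses two hypothetical helper generators, fib_tuple and fib_temp, which are not part of the original example, and confirms that both produce the same sequence.

```python
def fib_tuple(limit):
    """Fibonacci generator using tuple assignment."""
    current, next_val = 0, 1
    for _ in range(limit):
        yield next_val
        # The right-hand side is evaluated before either name is rebound.
        current, next_val = next_val, current + next_val


def fib_temp(limit):
    """Same sequence, with the temporary variable written out."""
    current, next_val = 0, 1
    for _ in range(limit):
        yield next_val
        temp = current + next_val   # compute the new value first
        current = next_val          # then shift the window forward
        next_val = temp


print(list(fib_tuple(6)))  # [1, 1, 2, 3, 5, 8]
print(list(fib_temp(6)))   # [1, 1, 2, 3, 5, 8]
```

Passing a generator to list() drains it completely, which is convenient for small sequences but gives up the memory savings for large ones.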
Understanding Yield Execution Flow
The execution model for generator functions differs from regular functions. Standard functions run from start to finish, returning once. Generator functions, however, pause at each yield statement and resume from that exact point when next() is called again.
Consider this demonstration:
def demo_yield():
    print('Initializing...')
    yield 100
    print('Processing second value')
    yield 200
    print('Finalizing')
    yield 300
>>> d = demo_yield()
>>> next(d)
Initializing...
100
>>> next(d)
Processing second value
200
>>> next(d)
Finalizing
300
>>> next(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
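The same pause-and-resume behavior can be recorded in a list instead of printed, which makes the ordering easy to inspect. This is a small sketch; the traced function and the events list are hypothetical names introduced for illustration. Note that creating the generator runs none of its body.

```python
events = []


def traced():
    events.append("started")    # runs only on the first next() call
    yield 1
    events.append("resumed")    # runs only on the second next() call
    yield 2


g = traced()
events.append("created")        # the function body has not run yet
first = next(g)                 # runs the body up to the first yield
events.append("got first")
second = next(g)                # resumes right after the first yield

print(events)  # ['created', 'started', 'got first', 'resumed']
```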
After the last yield, the next call to next() runs the rest of the function body, and when the body ends Python raises StopIteration. In practice, iterate with a for loop:
>>> for num in generate_fibonacci(6):
...     print(num)
...
1
1
2
3
5
8
Retrieving Return Values
Generator functions can include a return statement with a value. This value is not returned directly to the caller but is attached to the StopIteration exception as its value attribute:
def fibonacci_with_status(limit):
    current, next_val = 0, 1
    count = 0
    while count < limit:
        yield next_val
        current, next_val = next_val, current + next_val
        count += 1
    return 'Sequence completed'
>>> gen = fibonacci_with_status(6)
>>> while True:
...     try:
...         val = next(gen)
...         print(f'Value: {val}')
...     except StopIteration as e:
...         print(f'Final status: {e.value}')
...         break
...
Value: 1
Value: 1
Value: 2
Value: 3
Value: 5
Value: 8
Final status: Sequence completed
This pattern allows generators to signal completion while providing a final result or status message.
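One practical consequence is that a plain for loop never exposes the return value, because the loop absorbs StopIteration internally. The sketch below wraps the try/except pattern in a reusable helper; countdown and drain are hypothetical names introduced here, not part of the standard library.

```python
def countdown(start):
    # Hypothetical generator used only for this illustration.
    while start > 0:
        yield start
        start -= 1
    return 'Liftoff'


def drain(gen):
    """Collect every yielded value plus the generator's return value."""
    values = []
    while True:
        try:
            values.append(next(gen))
        except StopIteration as stop:
            # The return value travels on the exception's value attribute.
            return values, stop.value


# A for loop (or comprehension) swallows StopIteration,
# so the return value is lost here.
looped = [n for n in countdown(3)]

values, status = drain(countdown(3))
print(looped)          # [3, 2, 1]
print(values, status)  # [3, 2, 1] Liftoff
```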