Generators & Lazy Evaluation
Generators are a simple yet incredibly powerful way to create iterators. While a normal function runs to completion and returns a value, a generator yields a series of values over time, pausing its execution between each one.
Generators are the key to handling “Big Data” in Python because they allow you to process information without loading it all into memory at once.
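For instance, a generator function can stream a huge file one record at a time. This is a minimal sketch; the file name "server.log" and the "ERROR" filter are purely illustrative:

```python
# A minimal sketch, assuming a large plain-text log file -- the path and the
# "ERROR" filter are illustrative, not part of the text above.
def error_lines(path):
    with open(path) as f:
        for line in f:              # the file is read one line at a time
            if "ERROR" in line:
                yield line.rstrip()

# Only the current line is held in memory, however large the file is.
for entry in error_lines("server.log"):
    print(entry)
```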
1. The yield Keyword
When a function contains the yield keyword, it is no longer a standard function; it is a Generator Function.
```python
def get_numbers():
    print("Starting...")
    yield 1
    print("Resuming...")
    yield 2

g = get_numbers()   # Returns a generator object; the code has NOT run yet.

print(next(g))      # Runs until the first yield; prints "Starting...", returns 1.
print(next(g))      # Resumes after the first yield; prints "Resuming...", returns 2.
```
The State Preservation
Unlike a function that loses its local variables when it returns, a generator “remembers” everything: the values of its local variables, the instruction pointer, and the state of its internal stack.
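A tiny illustration of that memory (the `running_total` helper below is just an example, not from the text above):

```python
# A minimal sketch: the local variable `total` is preserved between yields.
def running_total():
    total = 0
    while True:
        total += 1
        yield total

r = running_total()
print(next(r))   # 1
print(next(r))   # 2  (total was remembered, not re-initialised)
print(next(r))   # 3
```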
2. Generator Expressions
Just as list comprehensions create lists, Generator Expressions create generators using parentheses ().
```python
# List Comprehension (Immediate, memory-heavy)
sq_list = [x**2 for x in range(1000000)]

# Generator Expression (Lazy, memory-efficient)
sq_gen = (x**2 for x in range(1000000))
```
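A generator expression produces values only when asked for them, and it can be consumed exactly once. A small sketch of that behaviour:

```python
# A minimal sketch: values are computed on demand and the generator is single-use.
sq_gen = (x**2 for x in range(1_000_000))

print(next(sq_gen))   # 0 -- computed only when requested
print(next(sq_gen))   # 1
print(sum(sq_gen))    # consumes the remaining values without building a list

print(sum(sq_gen))    # 0 -- the generator is already exhausted
```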
3. Advanced: yield from (Delegation)
Introduced in Python 3.3, yield from allows a generator to delegate part of its operations to another generator or iterable. This is essential for flattening nested data structures.
```python
def sub_task():
    yield "Step 2.1"
    yield "Step 2.2"

def main_task():
    yield "Step 1"
    yield from sub_task()   # Pulls all values from sub_task
    yield "Step 3"

for step in main_task():
    print(step)
```
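As a concrete sketch of the flattening use case mentioned above (the recursive `flatten` helper is illustrative, not a standard-library function):

```python
# A minimal sketch: yield from delegating into arbitrarily nested lists.
def flatten(items):
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the nested list
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))   # [1, 2, 3, 4, 5]
```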
4. Context: Cooperative Multitasking
Because generators can pause and resume, they can be used for Cooperative Multitasking.
Instead of a heavyweight thread that the operating system pre-empts whenever it chooses, a generator can voluntarily “yield” control back to a central scheduler. This is the fundamental concept behind async/await and asynchronous programming in Python.
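A toy round-robin scheduler makes the idea concrete. This is only a sketch: the `task`/`run` names are illustrative, and real asynchronous code uses the asyncio event loop rather than anything like this.

```python
# A minimal sketch of cooperative multitasking built on plain generators.
from collections import deque

def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                       # voluntarily hand control back to the scheduler

def run(tasks):
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)           # resume the task until its next yield
            queue.append(current)   # not finished yet -- put it back in the queue
        except StopIteration:
            pass                    # task finished; drop it

run([task("A", 2), task("B", 3)])   # interleaves A and B without threads
```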
5. Performance: Memory Comparison
| Dataset Size | List Memory | Generator Memory |
|---|---|---|
| 1,000 items | ~9 KB | ~100 Bytes |
| 1,000,000 items | ~8 MB | ~100 Bytes |
| 1,000,000,000 items | CRASH (Out of RAM) | ~100 Bytes |
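You can get a rough sense of this yourself with sys.getsizeof. Treat the snippet below as a sketch: exact figures vary by Python version and platform, and getsizeof measures only the container object itself.

```python
import sys

# Rough self-check of the table above (numbers vary by interpreter and platform).
big_list = [x for x in range(1_000_000)]
big_gen  = (x for x in range(1_000_000))

print(sys.getsizeof(big_list))   # several megabytes for the list's pointer array
print(sys.getsizeof(big_gen))    # a couple of hundred bytes, regardless of size
```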
6. Summary Table
| Feature | Return | Yield |
|---|---|---|
| Execution | Runs to completion. | Pauses and resumes. |
| Values | One single object. | A stream of objects. |
| Local State | Destroyed on exit. | Preserved between yields. |
| Memory | High (for large collections). | Constant (Near zero). |