Profiling & Performance Optimization
“Premature optimization is the root of all evil.” — Donald Knuth.
Before you spend hours trying to make your code “fast,” you must first prove that it is “slow” and identify exactly where the bottleneck is. This is the domain of Profiling.
1. Micro-benchmarking with timeit
If you want to compare two small snippets of code (e.g., is a list comprehension faster than a loop?), use the timeit module. It runs the snippet many times (1,000,000 by default, or whatever you pass as number) and returns the total elapsed time in seconds, which smooths out the noise of a single run.
import timeit
# Compare list comprehension vs. loop
setup = "nums = range(100)"
stmt1 = "[x**2 for x in nums]"
stmt2 = """res = []
for x in nums:
    res.append(x**2)
"""
print(f"Comp: {timeit.timeit(stmt1, setup, number=100000):.4f}s")
print(f"Loop: {timeit.timeit(stmt2, setup, number=100000):.4f}s")
2. Macro-profiling with cProfile
To find which function in your entire program is taking the most time, use cProfile. This is a “Deterministic Profiler” that records every function call.
Run from the terminal:
python -m cProfile -s cumtime my_script.py
Key columns in the output:
ncalls: How many times the function was called.
tottime: Total time spent in this function (excluding calls to sub-functions).
cumtime: Total time spent in this function and all sub-functions. This is usually the most useful metric.
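You can also drive cProfile from inside a program when you only want to profile one region. The following is a minimal sketch, not a definitive recipe: slow_square_sum and main are hypothetical stand-ins for your own code, and the report is sorted by cumtime to mirror the command above.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Hypothetical workload: sum of squares using a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    """Hypothetical entry point that calls the workload a few times."""
    return [slow_square_sum(50_000) for _ in range(5)]


# Profile only the region between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
results = main()
profiler.disable()

# Send the report to a string buffer so it can be printed or logged.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumtime").print_stats(5)  # show the top 5 entries
report = buffer.getvalue()
print(report)
```

Limiting the report to a handful of entries keeps the output readable; raise the number passed to print_stats if the bottleneck is not in the first few lines.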
3. High-Performance Optimization Tips
Once you’ve found the bottleneck, use these “Pythonic” techniques to speed it up:
A. Avoid Global Lookups
Looking up a global variable is slower than looking up a local one: in CPython, locals are stored in a fixed-size array and accessed by index, while globals and built-ins require dictionary lookups. If you use a global constant or function inside a tight loop, assign it to a local variable first.
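As a rough illustration of the idea, the sketch below binds math.sqrt and the list's append method to local names before the loop. The function names norms_global and norms_local are hypothetical, and the gain may be modest on recent interpreters, so measure it with timeit before committing to the less readable form.

```python
import math


def norms_global(points):
    """Looks up math, then sqrt, on every iteration."""
    out = []
    for x, y in points:
        out.append(math.sqrt(x * x + y * y))
    return out


def norms_local(points):
    """Binds the function and method to local names once, before the loop."""
    sqrt = math.sqrt
    out = []
    append = out.append
    for x, y in points:
        append(sqrt(x * x + y * y))
    return out


points = [(3, 4), (6, 8), (0, 0)]
print(norms_global(points))
print(norms_local(points))
```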
B. Use Built-in Functions
Python’s built-ins (like sum(), max(), map()) are implemented in optimized C in CPython. They are usually faster than an equivalent manual Python loop.
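For example, a hand-written accumulation loop can usually be replaced by a single built-in call. This is only a sketch with a hypothetical helper name (manual_total); both versions compute the same result, and timeit can tell you how large the difference is on your machine.

```python
def manual_total(values):
    """Manual loop: every iteration runs Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(1_000))

loop_result = manual_total(data)
builtin_result = sum(data)  # the loop runs inside the interpreter's C code

print(loop_result, builtin_result)
```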
C. Slots for Memory
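If your memory profile shows you have millions of small objects, use __slots__. Declaring __slots__ removes the per-instance __dict__, which can reduce memory consumption by roughly 40-50% for small objects (the exact saving depends on the class and the Python version) and can speed up attribute access.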
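A minimal sketch of the difference, using two hypothetical classes (PointDict and PointSlots): the slotted class has no per-instance __dict__, which is where the memory saving comes from, and as a side effect it rejects attributes that were not declared.

```python
class PointDict:
    """Regular class: each instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Slotted class: attributes live in fixed slots, no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


regular = PointDict(1, 2)
slotted = PointSlots(1, 2)

print(hasattr(regular, "__dict__"))
print(hasattr(slotted, "__dict__"))

# Undeclared attributes cannot be added to a slotted instance.
try:
    slotted.z = 3
    rejected = False
except AttributeError:
    rejected = True
print(rejected)
```

The trade-off is flexibility: you give up dynamic attributes, so __slots__ fits best on small, fixed-shape data classes.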
D. The Right Data Structure
Checking if an item exists in a List is $O(n)$. Checking in a Set is $O(1)$ on average. Switching to a set for large, repeated lookups can take your code from minutes to milliseconds.
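The sketch below runs the same membership checks against a list and a set. The variable names and sizes are arbitrary assumptions chosen for illustration; the answers are identical, only the cost per lookup differs.

```python
allowed_list = list(range(100_000))
allowed_set = set(allowed_list)  # one-time O(n) conversion

queries = [5, 99_999, 100_000, -1]

# Each `in` on the list scans elements until it finds a match (or the end).
hits_list = [q in allowed_list for q in queries]

# Each `in` on the set is a hash lookup.
hits_set = [q in allowed_set for q in queries]

print(hits_list)
print(hits_set)
```

Building the set costs one pass over the data, so the switch pays off when you perform many lookups against the same collection.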
4. Measuring Memory: tracemalloc
If your program’s RAM usage keeps growing, you may have a memory leak. tracemalloc allows you to take snapshots of memory and compare them.
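Comparing two snapshots might look like the following sketch. The retained list is a hypothetical workload standing in for your own code; compare_to reports, per source line, how much memory was allocated between the two snapshots.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical workload: keep a large list of byte strings alive.
retained = [bytes(1_000) for _ in range(10_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences are grouped by source line, largest change first.
top_diffs = after.compare_to(before, "lineno")
for diff in top_diffs[:5]:
    print(diff)
```

Lines whose size keeps growing across successive comparisons are the first places to look for a leak. The simplest form of the workflow, with a single snapshot, looks like this: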
import tracemalloc
tracemalloc.start()
# ... run your code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
5. Summary Table
| Tool | Usage | Purpose |
|---|---|---|
| timeit | timeit.timeit() | Benchmarking small snippets. |
| cProfile | python -m cProfile | Identifying the slowest function in a program. |
| tracemalloc | tracemalloc.start() | Finding memory leaks and tracking RAM. |
| memory_profiler | @profile decorator | Line-by-line memory usage analysis. |