Multiprocessing & Parallelism
When threading isn’t enough, typically because the work is CPU-bound (heavy mathematical calculations or data processing), you need multiprocessing.
Unlike threading, which runs multiple units of work inside one process, multiprocessing creates entirely new processes. Each process has its own private memory space and, most importantly, its own Python interpreter and its own GIL. This lets Python use every core of your CPU simultaneously.
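A quick way to see this isolation (a minimal sketch, variable names are illustrative): a child process modifies a module-level variable, and the parent’s copy is untouched because each process has its own memory.

```python
import multiprocessing

counter = 0  # module-level variable; each new process gets its own copy

def increment():
    global counter
    counter += 100
    print(f"Child sees counter = {counter}")   # 100 inside the child

if __name__ == "__main__":
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print(f"Parent sees counter = {counter}")  # still 0: memory is not shared
```

This holds under both start methods: with `fork` the child gets a copy-on-write snapshot, and with `spawn` the module is re-imported in the child; either way, writes in the child never reach the parent.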
1. True Parallelism vs. Concurrency
- Concurrency (Threading): switching between tasks very quickly on one core.
- Parallelism (Multiprocessing): Running tasks at the exact same time on different cores.
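The difference shows up with CPU-bound work. A rough benchmark sketch (the workload size and worker counts here are arbitrary choices, and absolute timings are machine-dependent):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # CPU-bound loop: the GIL lets only one thread execute this at a time
    return sum(i * i for i in range(n))

def timed(executor_cls, n_tasks=4, n=2_000_000):
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as ex:
        results = list(ex.map(burn, [n] * n_tasks))
    return time.perf_counter() - start, results

if __name__ == "__main__":
    t_threads, _ = timed(ThreadPoolExecutor)
    t_procs, _ = timed(ProcessPoolExecutor)
    print(f"Threads:   {t_threads:.2f}s")  # roughly serial: GIL-bound
    print(f"Processes: {t_procs:.2f}s")    # can run on multiple cores
```

On a multi-core machine the process pool typically finishes in a fraction of the thread pool’s time, because the threads take turns holding the GIL while the processes each have their own.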
2. Basic Multiprocessing
The API for multiprocessing is designed to be almost identical to threading, making it easy to switch between the two.
```python
import multiprocessing
import time

def heavy_math(name):
    print(f"Process {name} starting...")
    start = time.perf_counter()
    result = sum(i * i for i in range(10**7))
    print(f"Process {name} finished (result={result}) in {time.perf_counter() - start:.2f}s.")

if __name__ == "__main__":
    # Create processes
    p1 = multiprocessing.Process(target=heavy_math, args=("A",))
    p2 = multiprocessing.Process(target=heavy_math, args=("B",))

    # Start both, then wait for both to finish
    p1.start()
    p2.start()

    p1.join()
    p2.join()
```

3. Communication: Queues & Pipes
Because processes don’t share memory, you cannot use global variables to pass data between them. You must serialize the data and send it through a communication channel.
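Besides the queue shown below, `multiprocessing.Pipe` gives you the other channel named in this section’s title: two connected endpoints, lighter-weight than a queue but limited to exactly two ends. A minimal sketch:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send("Data via pipe")  # the object is pickled automatically
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # Output: Data via pipe
    p.join()
```
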
```python
from multiprocessing import Process, Queue

def worker(q):
    q.put("Data from worker")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output: Data from worker
    p.join()
```

4. Under the Hood: Process Overhead
Multiprocessing is powerful, but it comes with a high “Startup Cost.”
- Memory: Each process loads its own copy of the Python interpreter and all imported libraries (~20MB+ per process).
- Serialization (Pickling): To send data to another process, Python must “Pickle” the object into bytes, send it over, and “Unpickle” it on the other side.
Rule of Thumb: Use multiprocessing only if the task runs significantly longer than the time it takes to spawn the process (roughly 50–100 ms).
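The “Pickling” step is ordinary `pickle` serialization. This sketch (the payload is a made-up example) shows what an object goes through on its way to another process, and why some objects, such as lambdas, cannot be sent at all:

```python
import pickle

payload = {"task": "resize", "pixels": list(range(5))}

data = pickle.dumps(payload)           # what multiprocessing sends: raw bytes
print(type(data), len(data), "bytes")  # size depends on the object
restored = pickle.loads(data)          # what the other process reconstructs
print(restored == payload)             # True

try:
    pickle.dumps(lambda x: x)          # lambdas have no importable name...
except (pickle.PicklingError, AttributeError, TypeError) as e:
    print("Cannot pickle:", e)         # ...so they can't cross process boundaries
```

This is also why `Pool.map` requires a function defined at module level: the worker process must be able to import it by name.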
5. The Pool Executor
For most data science or web scraping tasks, you don’t want to manage individual processes. You want a Pool of workers that you can feed tasks to.
```python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)  # [1, 4, 9, 16, 25]
```

6. Summary Table
| Feature | Threading | Multiprocessing |
|---|---|---|
| Best For | I/O-bound (Web, Disk). | CPU-bound (Math, Video, AI). |
| Memory | Shared (Low overhead). | Isolated (High overhead). |
| Bypass GIL? | No. | Yes. |
| Difficulty | High (Race Conditions). | Medium (Data Serialization). |