Multiprocessing & Parallelism
When threading isn’t enough, typically because the work is CPU-bound (heavy mathematical calculations or data processing), you need multiprocessing.
Unlike threading, which runs multiple units of work inside one process, multiprocessing creates entirely new processes. Each process has its own private memory space and, most importantly, its own Python interpreter and its own GIL. This lets Python use every core of your CPU simultaneously.
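A quick way to see this isolation (a minimal sketch, variable names are illustrative): a child process modifies a module-level variable, and the parent’s copy is untouched because each process has its own memory.

```python
import multiprocessing

counter = 0  # module-level variable; each new process gets its own copy

def increment():
    global counter
    counter += 100
    print(f"Child sees counter = {counter}")   # 100 inside the child

if __name__ == "__main__":
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print(f"Parent sees counter = {counter}")  # still 0: memory is not shared
```

This holds under both start methods: with `fork` the child gets a copy-on-write snapshot, and with `spawn` the module is re-imported in the child; either way, writes in the child never reach the parent.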
1. True Parallelism vs. Concurrency
- Concurrency (Threading): switching between tasks very quickly on one core.
- Parallelism (Multiprocessing): Running tasks at the exact same time on different cores.
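The difference shows up with CPU-bound work. A rough benchmark sketch (the workload size and worker counts here are arbitrary choices, and absolute timings are machine-dependent):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # CPU-bound loop: the GIL lets only one thread execute this at a time
    return sum(i * i for i in range(n))

def timed(executor_cls, n_tasks=4, n=2_000_000):
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as ex:
        results = list(ex.map(burn, [n] * n_tasks))
    return time.perf_counter() - start, results

if __name__ == "__main__":
    t_threads, _ = timed(ThreadPoolExecutor)
    t_procs, _ = timed(ProcessPoolExecutor)
    print(f"Threads:   {t_threads:.2f}s")  # roughly serial: GIL-bound
    print(f"Processes: {t_procs:.2f}s")    # can run on multiple cores
```

On a multi-core machine the process pool typically finishes in a fraction of the thread pool’s time, because the threads take turns holding the GIL while the processes each have their own.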
2. Basic Multiprocessing
The API for multiprocessing is designed to be almost identical to threading, making it easy to switch between the two.
```python
import multiprocessing
import time

def heavy_math(name):
    print(f"Process {name} starting...")
    start = time.perf_counter()
    result = sum(i * i for i in range(10**7))
    print(f"Process {name} finished (result={result}) in {time.perf_counter() - start:.2f}s.")

if __name__ == "__main__":
    # Create processes
    p1 = multiprocessing.Process(target=heavy_math, args=("A",))
    p2 = multiprocessing.Process(target=heavy_math, args=("B",))

    # Start both, then wait for both to finish
    p1.start()
    p2.start()

    p1.join()
    p2.join()
```

3. Communication: Queues & Pipes
Because processes don’t share memory, you cannot use global variables to pass data between them. You must serialize the data and send it through a communication channel.
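Besides the queue shown below, `multiprocessing.Pipe` gives you the other channel named in this section’s title: two connected endpoints, lighter-weight than a queue but limited to exactly two ends. A minimal sketch:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send("Data via pipe")  # the object is pickled automatically
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # Output: Data via pipe
    p.join()
```
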
```python
from multiprocessing import Process, Queue

def worker(q):
    q.put("Data from worker")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output: Data from worker
    p.join()
```

4. Under the Hood: Process Overhead
Multiprocessing is powerful, but it comes with a high “Startup Cost.”
- Memory: Each process loads its own copy of the Python interpreter and all imported libraries (~20MB+ per process).
- Serialization (Pickling): To send data to another process, Python must “Pickle” the object into bytes, send it over, and “Unpickle” it on the other side.
Rule of Thumb: Use multiprocessing only if the task runs significantly longer than the time it takes to spawn the process (roughly 50–100 ms).
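The “Pickling” step is ordinary `pickle` serialization. This sketch (the payload is a made-up example) shows what an object goes through on its way to another process, and why some objects, such as lambdas, cannot be sent at all:

```python
import pickle

payload = {"task": "resize", "pixels": list(range(5))}

data = pickle.dumps(payload)           # what multiprocessing sends: raw bytes
print(type(data), len(data), "bytes")  # size depends on the object
restored = pickle.loads(data)          # what the other process reconstructs
print(restored == payload)             # True

try:
    pickle.dumps(lambda x: x)          # lambdas have no importable name...
except (pickle.PicklingError, AttributeError, TypeError) as e:
    print("Cannot pickle:", e)         # ...so they can't cross process boundaries
```

This is also why `Pool.map` requires a function defined at module level: the worker process must be able to import it by name.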
5. The Pool Executor
For most data science or web scraping tasks, you don’t want to manage individual processes. You want a Pool of workers that you can feed tasks to.
```python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)  # [1, 4, 9, 16, 25]
```

6. Summary Table
| Feature | Threading | Multiprocessing |
|---|---|---|
| Best For | I/O-bound (Web, Disk). | CPU-bound (Math, Video, AI). |
| Memory | Shared (Low overhead). | Isolated (High overhead). |
| Bypass GIL? | No. | Yes. |
| Difficulty | High (Race Conditions). | Medium (Data Serialization). |