
Multiprocessing & Parallelism

When threading isn't enough, usually because the work is CPU-bound (heavy mathematical calculations or data processing) rather than I/O-bound, you need multiprocessing.

Unlike threading, which runs multiple threads inside one process, multiprocessing creates entirely new processes. Each process has its own private memory space and, most importantly, its own Python interpreter and its own GIL. This lets Python use every core on your CPU simultaneously.


  • Concurrency (Threading): Switching between tasks really fast on one core.
  • Parallelism (Multiprocessing): Running tasks at the exact same time on different cores.
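How many cores can you actually run on? A minimal sketch to check (the filename is illustrative):

core_count.py
import multiprocessing

if __name__ == "__main__":
    # Number of logical cores the OS reports
    print(f"Logical cores available: {multiprocessing.cpu_count()}")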

The API for multiprocessing is designed to be almost identical to threading, making it easy to switch between the two.

parallel_calc.py
import multiprocessing

def heavy_math(name):
    print(f"Process {name} starting...")
    result = sum(i * i for i in range(10**7))  # pure CPU work; the result itself is discarded
    print(f"Process {name} finished.")

if __name__ == "__main__":
    # Create two processes, each running heavy_math on its own core
    p1 = multiprocessing.Process(target=heavy_math, args=("A",))
    p2 = multiprocessing.Process(target=heavy_math, args=("B",))
    p1.start()
    p2.start()
    p1.join()  # wait for both processes to finish
    p2.join()

Because processes don't share memory, you cannot use global variables to pass data between them. You must serialize the data and send it through a communication channel, such as a Queue.

queue_demo.py
from multiprocessing import Process, Queue

def worker(q):
    q.put("Data from worker")  # the object is pickled and pushed onto the shared queue

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output: Data from worker
    p.join()
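A Queue is the most common channel, but the module also offers Pipe, a direct connection between exactly two endpoints. A minimal sketch (pipe_demo.py is an illustrative name):

pipe_demo.py
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send("Data from worker")  # the object is pickled and written to the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # Output: Data from worker
    p.join()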

Multiprocessing is powerful, but it comes with a high startup cost.

  1. Memory: Each process loads its own copy of the Python interpreter and all imported libraries (roughly 20 MB+ per process).
  2. Serialization (pickling): To send data to another process, Python must pickle the object into bytes, send it across the channel, and unpickle it on the other side, as the sketch below shows.
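You can watch the serialization step in isolation with the standard pickle module, which multiprocessing uses under the hood. A minimal sketch (pickle_demo.py is an illustrative name):

pickle_demo.py
import pickle

data = {"rows": [1, 2, 3], "label": "batch-1"}

payload = pickle.dumps(data)   # serialize to bytes, as multiprocessing does when sending arguments
print(type(payload), len(payload))
restored = pickle.loads(payload)  # deserialize on the receiving side
print(restored)  # {'rows': [1, 2, 3], 'label': 'batch-1'}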

Rule of thumb: use multiprocessing only if the task takes significantly longer than the time it takes to spawn the process (at least 50-100 ms).
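Those 50-100 ms are easy to measure yourself: time a process that does nothing, so any elapsed time is pure overhead. A minimal sketch (numbers vary by OS and start method; spawn_cost.py is an illustrative name):

spawn_cost.py
from multiprocessing import Process
import time

def noop():
    pass  # no work: the elapsed time is pure process overhead

if __name__ == "__main__":
    start = time.perf_counter()
    p = Process(target=noop)
    p.start()  # launches a brand-new Python interpreter
    p.join()   # waits for it to exit
    print(f"Spawn + join took {(time.perf_counter() - start) * 1000:.1f} ms")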


For most data science or web scraping tasks, you don’t want to manage individual processes. You want a Pool of workers that you can feed tasks to.

pool_usage.py
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # distributes the list across the 4 worker processes
        results = pool.map(square, [1, 2, 3, 4, 5])
        print(results)  # [1, 4, 9, 16, 25]
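If your worker function takes more than one argument, Pool.starmap unpacks each tuple in the input list for you. A minimal sketch (starmap_usage.py is an illustrative name):

starmap_usage.py
from multiprocessing import Pool

def power(base, exp):
    return base ** exp

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # each tuple is unpacked into (base, exp)
        results = pool.starmap(power, [(2, 3), (3, 2), (4, 2)])
        print(results)  # [8, 9, 16]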

Feature        | Threading               | Multiprocessing
---------------|-------------------------|-----------------------------
Best for       | I/O-bound (web, disk)   | CPU-bound (math, video, AI)
Memory         | Shared (low overhead)   | Isolated (high overhead)
Bypasses GIL?  | No                      | Yes
Difficulty     | High (race conditions)  | Medium (data serialization)