Q1. Because of Python's GIL (Global Interpreter Lock), adding more threads doesn't parallelize which kind of work?
threading and multiprocessing — Threads, Processes, and the GIL
Walk through threads vs. processes, why the GIL blocks CPU-bound parallelism, ThreadPoolExecutor and multiprocessing.Pool, plus subprocess and how to pick — illustrated.
Python's concurrency and parallelism toolkit — threading (threads), multiprocessing (processes), and subprocess (external commands) — laid out side by side. The browser-based Python sandbox can't spin up real threads or real processes, so this article uses diagrams and read-only code blocks to nail down the concepts.
Code here doesn't actually run
The code blocks in this article are examples meant for a real Python environment. Calls like threading.Thread.start() or multiprocessing.Pool.map() won't work in the browser sandbox because the runtime can't create OS threads or OS processes. Instead, there's a concept-check quiz at the end.
Processes vs. threads
A process is an execution unit the OS hands out — it has its own memory space, and processes don't interfere with each other. A thread is a lightweight execution unit inside one process, and it shares memory with sibling threads. That sharing makes passing data fast, but you have to watch out for race conditions (multiple threads writing to the same variable at once and corrupting the result).
The simple mental model: "process = heavy but independent, thread = light but shared".
Process
- An independent execution unit from the OS perspective
- Independent memory · No GIL constraint
- High startup cost; true parallelism works
Thread
- The unit that actually runs code inside a process
- Shared memory · Subject to the GIL
- Pays off for I/O-bound work
Coroutine (asyncio)
- Runs by switching inside a single thread
- Even lighter; great for many concurrent I/Os
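The race-condition risk that comes with shared thread memory can be sketched as follows. This is a minimal illustration, not a prescribed pattern: the SafeCounter class and the add_many function are hypothetical names introduced here, and threading.Lock is one common way to serialize updates.

```python
import threading


class SafeCounter:
    """A counter shared by several threads; a lock serializes each update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread may update the value at a time
            self.value += 1


def add_many(counter, times):
    """Increment the shared counter `times` times."""
    for _ in range(times):
        counter.increment()


counter = SafeCounter()
threads = [
    threading.Thread(target=add_many, args=(counter, 10_000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

Without the lock, the read-modify-write in `self.value += 1` is not guaranteed to be atomic, so updates from different threads could overwrite each other.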
threading and the GIL — Python's thread constraint
Python (CPython) has a mechanism called the GIL (Global Interpreter Lock) that enforces a constraint: "only one thread can execute Python bytecode at a time". For CPU-heavy work, running it on multiple threads concurrently is effectively no faster than a single thread.
On the flip side, during I/O-bound work (where time is dominated by waiting on external responses: network, files, DB) CPython releases the GIL while a thread is blocked waiting, so other threads can run and threading does speed up I/O-bound work. That said, if the workload is I/O-bound, asyncio usually has less overhead and is easier to write, so prefer asyncio for new code.
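To see the I/O-bound case in numbers, the sketch below times four simulated waits run one after another and then on four threads. The fake_io function is a hypothetical stand-in that uses time.sleep (which releases the GIL) in place of a real network or disk call; exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    """Stand-in for a network or disk wait; time.sleep releases the GIL."""
    time.sleep(seconds)
    return seconds


delays = [0.2, 0.2, 0.2, 0.2]

# Sequential: each wait finishes before the next one starts.
start = time.perf_counter()
for d in delays:
    fake_io(d)
sequential = time.perf_counter() - start

# Threaded: the four waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(fake_io, delays))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential run takes at least the sum of the delays, while the threaded run takes roughly as long as the single longest delay.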
# threading: low-level API
import threading

def worker(name):
    print(f"{name} started")
    # do something
    print(f"{name} done")

t1 = threading.Thread(target=worker, args=("A",))
t2 = threading.Thread(target=worker, args=("B",))
t1.start()
t2.start()
t1.join()  # wait for completion
t2.join()
# concurrent.futures: high-level API (recommended)
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # in real code: requests.get(url) or other I/O-bound work
    return f"fetched: {url}"

with ThreadPoolExecutor(max_workers=4) as executor:
    urls = ["a.com", "b.com", "c.com"]
    results = list(executor.map(fetch, urls))
print(results)
Reach for ThreadPoolExecutor in new code
Using threading.Thread directly makes lifecycle management messy. concurrent.futures.ThreadPoolExecutor cleans up its threads automatically when used in a with statement, and executor.map(func, iterable) processes a whole list through one API. The thread cap (max_workers) is also handy for not flooding a server with connections.
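The effect of the max_workers cap can be observed with a small sketch. ConcurrencyTracker and fake_request are hypothetical helpers written for this illustration; the tracker records how many tasks were in flight at once, and time.sleep stands in for a real request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class ConcurrencyTracker:
    """Context manager that records the peak number of tasks running at once."""

    def __init__(self):
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.active -= 1
        return False  # do not swallow exceptions


def fake_request(tracker, item):
    """Simulated request: counts itself as active while it 'waits'."""
    with tracker:
        time.sleep(0.05)
    return item * 2


tracker = ConcurrencyTracker()
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(partial(fake_request, tracker), range(10)))

print(results)
print("peak concurrency:", tracker.peak)
```

Ten tasks are submitted, but the peak never exceeds 3 because the pool only has three worker threads. executor.map also returns results in input order, regardless of which task finished first.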
When threading is a good fit
threading / ThreadPoolExecutor shines when you want to fill in wait time with other tasks:
- Concurrent processing of multiple Web APIs / DB queries / file I/O
- Parallelizing existing synchronous libraries (with no async support)
- Reading output from multiple subprocess-spawned processes concurrently
- Running background work without blocking a GUI's main loop
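As one illustration of the subprocess item in the list above, the sketch below uses a thread pool to run several child processes and collect their output concurrently. The run_and_capture helper is hypothetical, and sys.executable (the current Python interpreter) is used as the child command only so the example does not depend on any external tool being installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_and_capture(code):
    """Run a short Python snippet in a child interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


snippets = [f"print({n} * {n})" for n in (2, 3, 4)]

# Each worker thread blocks on its own child process, so the waits overlap.
with ThreadPoolExecutor(max_workers=3) as executor:
    outputs = list(executor.map(run_and_capture, snippets))

print(outputs)  # ['4', '9', '16']
```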
multiprocessing — true parallelism
multiprocessing is the module for spawning multiple Python processes and running them in parallel. Each process has its own interpreter and its own GIL, so CPU-bound work runs in true parallel: on a 4-core CPU, image processing or number crunching can approach a 4x speedup, minus the cost of starting processes and passing data between them.
from multiprocessing import Pool

def heavy(n):
    return sum(i * i for i in range(n))  # CPU-bound work

if __name__ == "__main__":  # required form for multiprocessing
    with Pool(processes=4) as pool:
        results = pool.map(heavy, [10**6, 10**6, 10**6, 10**6])
    print("sum:", sum(results))
multiprocessing requires `if __name__ == '__main__':`
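Under the spawn start method (the default on Windows and macOS), each child process re-imports the parent script. A Pool(...).map(...) call at the top level would therefore run again in every child, and Python stops this with a RuntimeError rather than letting processes spawn endlessly. The fork start method does not re-import the script, but the guard is still needed for portable code, so always put your main code inside an if __name__ == "__main__": block.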
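A common way to apply the guard is to keep worker functions at module level and move everything else into a main() function. The sketch below does this with concurrent.futures.ProcessPoolExecutor, the process-based counterpart of ThreadPoolExecutor; sum_of_squares and main are illustrative names, and this is one possible layout rather than the only correct one.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """CPU-bound work; defined at module level so child processes can import it."""
    return sum(i * i for i in range(n))


def main():
    inputs = [10**6] * 4
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(sum_of_squares, inputs))
    print("sum:", sum(results))


# Only the process that was launched as the script runs main();
# child processes that re-import this module skip it.
if __name__ == "__main__":
    main()
```

Because the worker function lives at module level, a child process that imports the module can find it by name, which is what the pool relies on when it sends work to the child.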
subprocess — external commands
subprocess is the module for calling external commands (OS shell commands) from Python — running git status, converting video with ffmpeg, invoking a shell script, and other "start a program that isn't Python" use cases. The name resembles multiprocessing, but it's a completely different tool.
import subprocess

result = subprocess.run(
    ["git", "status", "--short"],
    capture_output=True,
    text=True,
    check=True,  # raises CalledProcessError on failure
)
print(result.stdout)
print("return code:", result.returncode)
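When check=True is set, a failing command surfaces as an exception, so real code usually wraps the call. The sketch below shows one way to handle that, assuming a child command that deliberately exits with code 3; sys.executable is used instead of git so the example runs anywhere, and the 10-second timeout is an arbitrary illustrative value.

```python
import subprocess
import sys

failed_code = 0  # stays 0 if the command succeeds

try:
    subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        capture_output=True,
        text=True,
        check=True,   # non-zero exit raises CalledProcessError
        timeout=10,   # raises TimeoutExpired if the child runs too long
    )
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
    print("command failed with return code", failed_code)
except subprocess.TimeoutExpired:
    failed_code = None
    print("command timed out")
```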
Decision flow — which one do you pick?
Picking among asyncio / threading / multiprocessing / subprocess comes down to two axes: "CPU-bound or I/O-bound?" and "Inside Python or an external command?". The table below removes most of the guesswork.
| Workload | Pick | Why |
|---|---|---|
| Hit 100 Web APIs concurrently | asyncio | I/O-bound; lightweight and easy to write |
| Parallelize an existing synchronous HTTP client | threading (ThreadPoolExecutor) | If the library has no async support, threading |
| Parallelize image processing across 4 cores | multiprocessing | CPU-bound work needs processes to dodge the GIL |
| Run commands like git or ffmpeg | subprocess | Dedicated to calling programs outside Python |
| Millions of simple math operations | NumPy / Cython | Vectorization beats Python-level parallelism |
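For the first row of the table, a minimal asyncio sketch might look like the following. The fake_fetch coroutine is a hypothetical stand-in that uses asyncio.sleep to simulate the wait of an HTTP call; real code would use an async HTTP client instead.

```python
import asyncio


async def fake_fetch(i):
    """Hypothetical stand-in for an HTTP request; asyncio.sleep simulates the wait."""
    await asyncio.sleep(0.1)
    return f"response {i}"


async def main():
    # gather runs all 100 coroutines on one thread and collects results in order
    return await asyncio.gather(*(fake_fetch(i) for i in range(100)))


results = asyncio.run(main())
print(len(results), results[0])  # 100 response 0
```

All 100 waits overlap on a single thread, so the whole batch takes about as long as one simulated request.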
Truly CPU-bound is rarer than you'd think
A lot of Python code that looks like it's stuck on computation can get up to 100x faster from vectorization with NumPy / Pandas / Cython. Before chasing 4x with multiprocessing, try the cheaper options first: NumPy for numerics, Pandas for data, optimized regex for strings.
Knowledge Check
Answer each question one by one.
Q2. Which one do you use to call external commands like git status from Python?
Q3. Which is best suited for truly parallelizing heavy numerical computation across 4 CPU cores?