threading and multiprocessing — Threads, Processes, and the GIL

Walk through threads vs. processes, why the GIL blocks CPU-bound parallelism, ThreadPoolExecutor and multiprocessing.Pool, plus subprocess and how to pick — illustrated.

Python's concurrency and parallelism toolkit — threading (threads), multiprocessing (processes), and subprocess (external commands) — laid out side by side. The browser-based Python sandbox can't spin up real threads or real processes, so this article uses diagrams and read-only code blocks to nail down the concepts.

Code here doesn't actually run

The code blocks in this article are examples meant for a real Python environment. Calls like threading.Thread.start() or multiprocessing.Pool.map() won't work in the browser sandbox because the runtime can't create OS threads or OS processes. Instead, there's a concept-check quiz at the end.

Where threading / multiprocessing / subprocess fit

Use threading for I/O-bound work, multiprocessing for CPU-heavy computation, and subprocess for calling commands outside Python. Set next to asyncio (covered in the previous article, and suited to many concurrent I/O operations), the four roles fall into place.

Processes vs. threads

A process is an execution unit the OS hands out — it has its own memory space, and processes don't interfere with each other. A thread is a lightweight execution unit inside one process, and it shares memory with sibling threads. That sharing makes passing data fast, but you have to watch out for race conditions (multiple threads writing to the same variable at once and corrupting the result).

The simple mental model: "process = heavy but independent, thread = light but shared".
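To make the race-condition warning concrete, here is a minimal sketch of several threads updating one shared value, with a threading.Lock guarding the update. The Counter class, the add_many helper, and the iteration counts are illustrative assumptions, not part of any library.

```python
import threading

class Counter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # value += 1 is a read-modify-write sequence; the lock keeps two
        # threads from interleaving inside it.
        with self._lock:
            self.value += 1

def add_many(counter, times):
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [
    threading.Thread(target=add_many, args=(counter, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

Without the lock, the final value can come out lower than 40000, because two threads may read the same old value before either writes back. Whether you actually observe that depends on the interpreter version and on timing, which is exactly what makes such bugs hard to reproduce.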

Processes, threads, and coroutines nest like this

Process (multiprocessing creates these)

An independent execution unit from the OS perspective
Independent memory · No GIL constraint
High startup cost; true parallelism works

Thread (threading creates these)

The unit that actually runs code inside a process
Shared memory · Subject to the GIL
Pays off for I/O-bound work

Coroutine (asyncio runs these)

Runs by switching inside a single thread
Even lighter; great for many concurrent I/Os

multiprocessing spawns the outermost processes, threading spawns the middle threads, and asyncio runs the innermost coroutines. Three different layers — what you want to parallelize decides which one.

Process vs. thread differences

Process — independent memory, high startup cost in exchange for zero interference. Thread — shared memory, lightweight, but synchronization (Lock and friends) is required. Python's GIL caps thread parallelism on CPU-bound work.

threading and the GIL — Python's thread constraint

Python (CPython) has a mechanism called the GIL (Global Interpreter Lock) that enforces a constraint: "only one thread can execute Python bytecode at a time". For CPU-heavy work, running it on multiple threads concurrently is effectively no faster than a single thread.

GIL — only one thread runs at a time

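The GIL is a single lock that gates the right to execute Python bytecode. Threads queue up to acquire it. A thread releases it when it blocks on an I/O wait, which lets another thread step in during that window, and CPython also asks a running thread to give it up after a short switch interval (5 ms by default) so that CPU-bound threads take turns.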

On the flip side, during I/O-bound work (where time is dominated by waiting on external responses — network, files, DB) Python releases the GIL, so threading does speed up I/O-bound work. That said, if the workload is I/O-bound, asyncio usually has less overhead and is easier to write, so prefer asyncio for new code.

How the GIL affects threads

CPU-bound work stays serial under the GIL no matter how many threads you spin up. I/O-bound work releases the GIL during the wait, so threading actually helps. For true CPU-bound parallelism, reach for multiprocessing.
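A rough way to see this difference is to time both kinds of work on a thread pool. The sketch below uses time.sleep as a stand-in for an I/O wait (it releases the GIL) and a pure-Python loop as the CPU-bound task. The function names and workload sizes are assumptions chosen for illustration, and the exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(seconds):
    # time.sleep releases the GIL, so it stands in for a network or disk wait.
    time.sleep(seconds)
    return seconds

def cpu_task(n):
    # Pure-Python arithmetic holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(func, args, workers):
    """Run func over args on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(func, args))
    return results, time.perf_counter() - start

# Four 0.3 s waits overlap on four threads instead of adding up to 1.2 s.
io_results, io_elapsed = timed(io_task, [0.3] * 4, workers=4)
print(f"I/O-bound, 4 threads: {io_elapsed:.2f}s")

# The same CPU-bound jobs on 1 thread and on 4 threads.
_, cpu_one = timed(cpu_task, [200_000] * 4, workers=1)
_, cpu_four = timed(cpu_task, [200_000] * 4, workers=4)
print(f"CPU-bound, 1 thread: {cpu_one:.2f}s, 4 threads: {cpu_four:.2f}s")
```

On a standard CPython build the two CPU-bound timings are typically close to each other, while the I/O-bound run finishes in roughly the time of a single wait.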

# threading: low-level API
import threading

def worker(name):
    print(f"{name} started")
    # do something
    print(f"{name} done")

t1 = threading.Thread(target=worker, args=("A",))
t2 = threading.Thread(target=worker, args=("B",))
t1.start()
t2.start()
t1.join()  # wait for completion
t2.join()

# concurrent.futures: high-level API (recommended)
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # in real code: requests.get(url) or other I/O-bound work
    return f"fetched: {url}"

with ThreadPoolExecutor(max_workers=4) as executor:
    urls = ["a.com", "b.com", "c.com"]
    results = list(executor.map(fetch, urls))
    print(results)

Reach for ThreadPoolExecutor in new code

Using threading.Thread directly makes lifecycle management messy. concurrent.futures.ThreadPoolExecutor works as a context manager, so a with block waits for the workers and cleans them up for you, and executor.map(func, iterable) processes a whole list through one API. The thread cap (max_workers) is also handy for not flooding a server with connections.
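When individual tasks can fail, executor.submit combined with as_completed gives per-task error handling that executor.map does not make as convenient. The following is a sketch only: fetch is a hypothetical stand-in for real I/O, and the "bad" URL rule exists purely to trigger a failure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Hypothetical stand-in for real I/O such as an HTTP request.
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"fetched: {url}"

urls = ["a.com", "bad.example", "c.com"]
results = {}
errors = {}

with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each Future back to the URL it was submitted for.
    future_to_url = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            # result() re-raises any exception raised inside the worker.
            results[url] = future.result()
        except ValueError as exc:
            errors[url] = str(exc)

print(results)
print(errors)
```

as_completed yields futures in the order they finish, not the order they were submitted, which is why the results are keyed by URL rather than collected into a list.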

When threading is a good fit

threading / ThreadPoolExecutor shines when you want to fill in wait time with other tasks:

- Concurrent processing of multiple Web APIs / DB queries / file I/O

- Parallelizing existing synchronous libraries (with no async support)

- Reading output from multiple subprocess-spawned processes concurrently

- Running background work without blocking a GUI's main loop
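As one illustration of the list above, this sketch waits on several child processes at once by giving each subprocess.run call its own worker thread. The child command (the current interpreter printing a greeting) and the run_child helper are assumptions made so the example is portable.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_child(label):
    # Each call blocks until its child exits; the waiting thread does not
    # hold the GIL during that time, so the waits overlap.
    completed = subprocess.run(
        [sys.executable, "-c", f"print('hello from {label}')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()

labels = ["A", "B", "C"]
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map returns results in the order of the inputs.
    outputs = list(executor.map(run_child, labels))

print(outputs)
```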

multiprocessing — true parallelism

multiprocessing is the module for spawning multiple Python processes and running them in parallel. Each process has its own interpreter and its own GIL, so CPU-bound work can run in true parallel. On a 4-core CPU, image processing or number crunching can approach a 4x speedup, minus the cost of starting the processes and passing data between them.

multiprocessing.Pool — true parallelism on 4 cores

Four Python processes run on four separate CPU cores, each with its own GIL, so you get real parallel execution. CPU-bound work can speed up by up to about 4x.

from multiprocessing import Pool

def heavy(n):
    return sum(i * i for i in range(n))   # CPU-bound work

if __name__ == "__main__":   # required form for multiprocessing
    with Pool(processes=4) as pool:
        results = pool.map(heavy, [10**6, 10**6, 10**6, 10**6])
        print("sum:", sum(results))

multiprocessing requires `if __name__ == '__main__':`

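With the spawn start method, each child process starts a fresh interpreter and imports the parent script, so any top-level code runs again in the child. A Pool(...).map(...) call at the top level would therefore try to start new processes recursively, and Python stops with a RuntimeError. spawn is the default on Windows and macOS. The fork start method, traditionally the default on Linux, copies the parent instead of re-importing the script, but the guard keeps your code portable, so always put your main code inside an if __name__ == "__main__": block.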
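The same guard applies to concurrent.futures.ProcessPoolExecutor, the process-based counterpart of ThreadPoolExecutor. Here is a minimal sketch, assuming the script is run directly as a file. The heavy function mirrors the example above, and the main wrapper is just one common way to organize the guarded code.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def heavy(n):
    # CPU-bound work; defined at module level so child processes can find it.
    return sum(i * i for i in range(n))

def main():
    # Shows which start method this platform uses (for example spawn or fork).
    print("start method:", multiprocessing.get_start_method())
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(heavy, [10**5] * 4))
    print("sum:", sum(results))

if __name__ == "__main__":
    # Under spawn, children import this file with a different __name__,
    # so main() is not called again inside them.
    main()
```

Keeping everything except definitions inside main() means the module is cheap and safe to import, which is exactly what a spawned child does.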

subprocess — external commands

subprocess is the module for running external programs from Python: running git status, converting video with ffmpeg, invoking a shell script, and other "start a program that isn't Python" use cases. By default the command is launched directly, without going through a shell. The name resembles multiprocessing, but it's a completely different tool.

subprocess.run — calling an external command from Python

Python asks the OS to run a command → a separate OS process runs the external command → stdout and the return code come back in a CompletedProcess object. Useful for handing off work that Python alone can't do.

import subprocess

result = subprocess.run(
    ["git", "status", "--short"],
    capture_output=True,
    text=True,
    check=True,    # raises CalledProcessError on failure
)
print(result.stdout)
print("return code:", result.returncode)
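In practice an external command can fail in more than one way: it may exit with a non-zero code, hang, or not exist at all. The sketch below wraps subprocess.run to cover those cases. The run_command helper and its return convention are assumptions for illustration, and it calls the current Python interpreter instead of git so that it works on any machine.

```python
import subprocess
import sys

def run_command(args, timeout=10.0):
    """Return (returncode, text); returncode is None if the command never finished."""
    try:
        completed = subprocess.run(
            args,
            capture_output=True,
            text=True,
            check=True,        # raise CalledProcessError on a non-zero exit
            timeout=timeout,   # raise TimeoutExpired if it runs too long
        )
    except FileNotFoundError:
        return None, f"command not found: {args[0]}"
    except subprocess.TimeoutExpired:
        return None, f"timed out after {timeout}s"
    except subprocess.CalledProcessError as exc:
        return exc.returncode, exc.stderr.strip()
    return completed.returncode, completed.stdout.strip()

ok = run_command([sys.executable, "-c", "print('hello')"])
failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
missing = run_command(["definitely-not-a-real-command-xyz"])

print(ok)
print(failed)
print(missing)
```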

Decision flow — which one do you pick?

Picking among asyncio / threading / multiprocessing / subprocess comes down to two axes: "CPU-bound or I/O-bound?" and "Inside Python or an external command?". The flow chart below removes most of the guesswork.

Concurrency / parallelism decision flow

External command → subprocess, I/O-bound → asyncio (or threading), CPU-bound → multiprocessing. Two questions lead to three outcomes.

Workload	Pick	Why
Hit 100 Web APIs concurrently	asyncio	I/O-bound; lightweight and easy to write
Parallelize an existing synchronous HTTP client	threading (ThreadPoolExecutor)	If the library has no async support, threading
Parallelize image processing across 4 cores	multiprocessing	CPU-bound work needs processes to dodge the GIL
Run commands like `git` or `ffmpeg`	subprocess	Dedicated to calling programs outside Python
Millions of simple math operations	NumPy / Cython	Vectorization beats Python-level parallelism
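The decision flow can be written down as a small function. This is only a sketch of the rules in the table: the pick_tool name and its boolean parameters are hypothetical, and it leaves out the NumPy / Cython row, which is about avoiding Python-level parallelism altogether.

```python
def pick_tool(external_command, cpu_bound, has_async_support=True):
    """Suggest a module for a workload, following the decision flow above."""
    if external_command:
        # Programs outside Python always go through subprocess.
        return "subprocess"
    if cpu_bound:
        # Separate processes sidestep the GIL.
        return "multiprocessing"
    # I/O-bound: asyncio if the libraries support it, threads otherwise.
    return "asyncio" if has_async_support else "threading"

print(pick_tool(external_command=False, cpu_bound=False))  # asyncio
print(pick_tool(external_command=False, cpu_bound=False, has_async_support=False))  # threading
print(pick_tool(external_command=False, cpu_bound=True))   # multiprocessing
print(pick_tool(external_command=True, cpu_bound=False))   # subprocess
```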

Truly CPU-bound is rarer than you'd think

A lot of Python code that looks like it's stuck on computation can get up to 100x faster from vectorization with NumPy or Pandas, or from compiling hot loops with Cython. Before chasing 4x with multiprocessing, check the cheaper options first: NumPy for numerics, Pandas for tabular data, well-optimized regular expressions for strings.

Answer each question one by one.

Q1. Because of Python's GIL (Global Interpreter Lock), adding more threads doesn't parallelize which kind of work?

Q2. Which one do you use to call external commands like git status from Python?

Q3. Which is best suited for truly parallelizing heavy numerical computation across 4 CPU cores?
