threading and multiprocessing — Threads, Processes, and the GIL

An illustrated walk-through of threads vs. processes, why the GIL blocks CPU-bound parallelism, ThreadPoolExecutor and multiprocessing.Pool, subprocess, and how to pick among them.

Python's concurrency and parallelism toolkit — threading (threads), multiprocessing (processes), and subprocess (external commands) — laid out side by side. The browser-based Python sandbox can't spin up real threads or real processes, so this article uses diagrams and read-only code blocks to nail down the concepts.

Code here doesn't actually run

The code blocks in this article are examples meant for a real Python environment. Calls like threading.Thread.start() or multiprocessing.Pool.map() won't work in the browser sandbox because the runtime can't create OS threads or OS processes. Instead, there's a concept-check quiz at the end.

Where threading / multiprocessing / subprocess fit
  • threading: I/O-bound (files / DB / APIs)
  • multiprocessing: CPU-bound (imaging / numerics)
  • subprocess: external commands (git / ffmpeg / shell)
  • asyncio (previous article): many concurrent I/Os (100 Web API calls)
threading for I/O-bound work, multiprocessing for CPU-heavy computation, subprocess for calling commands outside Python. Lined up next to asyncio (covered in the previous article for many concurrent I/Os), the four roles fall into place.

Processes vs. threads

A process is an execution unit the OS hands out — it has its own memory space, and processes don't interfere with each other. A thread is a lightweight execution unit inside one process, and it shares memory with sibling threads. That sharing makes passing data fast, but you have to watch out for race conditions (multiple threads writing to the same variable at once and corrupting the result).

The simple mental model: "process = heavy but independent, thread = light but shared".
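To make the race-condition warning concrete, here is a minimal sketch for a real Python environment. The names counter, lock, and add_many are made up for this illustration. Four threads update one shared variable, and a threading.Lock keeps each read-modify-write step from interleaving with another thread's.

```python
import threading

counter = 0                  # shared by every thread in this process
lock = threading.Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked update at a time."""
    global counter
    for _ in range(times):
        with lock:           # only one thread at a time may run the read-modify-write
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: 4 threads x 10,000 increments, none lost
```

Without the lock, counter += 1 is not guaranteed to be atomic, so some increments may be lost and the total may come out lower. Whether that actually happens on a given run depends on the interpreter version and on timing.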

Processes, threads, and coroutines nest like this
Process (multiprocessing creates these)
  • An independent execution unit from the OS perspective
  • Independent memory · No GIL constraint
  • High startup cost; true parallelism works
Thread (threading creates these)
  • The unit that actually runs code inside a process
  • Shared memory · Subject to the GIL
  • Pays off for I/O-bound work
Coroutine (asyncio runs these)
  • Runs by switching inside a single thread
  • Even lighter; great for many concurrent I/Os
multiprocessing spawns the outermost processes, threading spawns the middle threads, and asyncio runs the innermost coroutines. Three different layers — what you want to parallelize decides which one.
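One way to see the innermost layer, sketched for a real Python environment: several coroutines report which process and which thread they ran on. The names where_am_i, main, and reports are illustrative only.

```python
import asyncio
import os
import threading

async def where_am_i(label):
    await asyncio.sleep(0.01)   # yield to the event loop, like a short I/O wait
    return label, os.getpid(), threading.get_ident()

async def main():
    # Run three coroutines concurrently on one event loop.
    return await asyncio.gather(*(where_am_i(f"coro-{i}") for i in range(3)))

reports = asyncio.run(main())
for label, pid, thread_id in reports:
    print(label, "pid:", pid, "thread:", thread_id)
```

All three lines should show the same process ID and the same thread ID, because the coroutines take turns inside a single thread of a single process.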
Process vs. thread differences
Process (multiprocessing)
  • Independent memory
  • High startup cost
  • No interference
  • True parallelism
Thread (threading)
  • Shared memory
  • Lightweight startup
  • Watch for races
  • GIL constraint
Process: independent memory, with a high startup cost in exchange for zero interference. Thread: shared memory and lightweight, but synchronization (Lock and friends) is required. Python's GIL caps thread parallelism on CPU-bound work.

threading and the GIL — Python's thread constraint

Python (CPython) has a mechanism called the GIL (Global Interpreter Lock) that enforces a constraint: "only one thread can execute Python bytecode at a time". For CPU-heavy work, running it on multiple threads concurrently is effectively no faster than a single thread.
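A rough way to check this on your own machine is to time the same CPU-bound function serially and on four threads. This is a sketch, not a benchmark: count_squares, N, and JOBS are made-up names, and the timings depend on the machine and the Python build.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_squares(n):
    return sum(i * i for i in range(n))   # pure-Python, CPU-bound work

N = 300_000
JOBS = [N] * 4

# Run the four jobs one after another.
start = time.perf_counter()
serial = [count_squares(n) for n in JOBS]
serial_seconds = time.perf_counter() - start

# Run the same four jobs on four threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = list(executor.map(count_squares, JOBS))
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.3f}s")
print(f"threaded: {threaded_seconds:.3f}s")
```

On a standard GIL-enabled CPython build the two timings typically land close together, since the threads take turns on the GIL instead of running side by side.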

GIL — only one thread runs at a time
  • Before: Thread A is computing and holds the GIL; Thread B is waiting
  • Handoff: Thread A enters an I/O wait → releases the GIL
  • After: Thread B acquires the GIL and starts computing
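The GIL is a single lock that gates the right to execute Python bytecode. Threads queue up to acquire it and release it the moment they hit an I/O wait, which lets another thread step in during that window. CPython also makes a running thread give up the GIL periodically (every 5 ms by default), but that only makes CPU-bound threads take turns; it doesn't run them in parallel.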

On the flip side, during I/O-bound work (where time is dominated by waiting on external responses — network, files, DB) Python releases the GIL, so threading does speed up I/O-bound work. That said, if the workload is I/O-bound, asyncio usually has less overhead and is easier to write, so prefer asyncio for new code.
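The I/O-bound case can be sketched the same way, with time.sleep standing in for a network or disk wait (like real blocking I/O, it releases the GIL while waiting). fake_io is a hypothetical placeholder, not a real I/O call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id):
    time.sleep(0.2)   # stand-in for a network or disk wait; the GIL is released here
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    done = list(executor.map(fake_io, range(4)))   # map keeps input order
elapsed = time.perf_counter() - start

print(done)  # [0, 1, 2, 3]
print(f"elapsed: {elapsed:.2f}s")
```

Run serially, the four waits would add up to about 0.8 seconds. With four threads the waits overlap, so the elapsed time should be much closer to a single 0.2-second wait.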

How the GIL affects threads
CPU-bound (computation)
  • threading: queues on the GIL → no parallelism
  • multiprocessing needed
I/O-bound (API / DB / files)
  • threading: GIL released on I/O → runs concurrently
  • (asyncio is even lighter)
CPU-bound work stays serial under the GIL no matter how many threads you spin up. I/O-bound work releases the GIL during the wait, so threading actually helps. For true CPU-bound parallelism, reach for multiprocessing.
# threading: low-level API
import threading

def worker(name):
    print(f"{name} started")
    # do something
    print(f"{name} done")

t1 = threading.Thread(target=worker, args=("A",))
t2 = threading.Thread(target=worker, args=("B",))
t1.start()
t2.start()
t1.join()  # wait for completion
t2.join()

# concurrent.futures: high-level API (recommended)
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # in real code: requests.get(url) or other I/O-bound work
    return f"fetched: {url}"

with ThreadPoolExecutor(max_workers=4) as executor:
    urls = ["a.com", "b.com", "c.com"]
    results = list(executor.map(fetch, urls))
    print(results)

Reach for ThreadPoolExecutor in new code

Using threading.Thread directly makes lifecycle management messy. concurrent.futures.ThreadPoolExecutor can be managed safely with a with block, which waits for the workers and cleans up on exit, and executor.map(func, iterable) lets you process a whole list through one call. The thread cap (max_workers) is also handy for not flooding a server with connections.
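When individual tasks can fail, executor.submit plus as_completed is a common alternative to executor.map. The sketch below assumes a hypothetical fetch that raises ValueError for any URL containing "bad"; real code would call an HTTP client and catch that client's exceptions instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Hypothetical stand-in for real I/O such as an HTTP request.
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"fetched: {url}"

urls = ["a.com", "bad.example", "c.com"]
results, errors = {}, {}

with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each Future back to the URL it was submitted for.
    future_to_url = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(future_to_url):   # yields futures as they finish
        url = future_to_url[future]
        try:
            results[url] = future.result()       # re-raises the worker's exception
        except ValueError as exc:
            errors[url] = str(exc)

print(results)
print(errors)
```

Results arrive in completion order, not submission order, which is why they are stored in dictionaries keyed by URL.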

When threading is a good fit

threading / ThreadPoolExecutor shines when you want to fill in wait time with other tasks:

- Concurrent processing of multiple Web APIs / DB queries / file I/O

- Parallelizing existing synchronous libraries (with no async support)

- Reading output from multiple subprocess-spawned processes concurrently

- Running background work without blocking a GUI's main loop
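As one illustration of the list above, here is a sketch of reading output from several child processes concurrently. To stay portable it launches the current Python interpreter (sys.executable) as the "external command"; run_child and the job labels are made up for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_child(label):
    # Each call blocks on the child process, so running calls in threads overlaps the waits.
    completed = subprocess.run(
        [sys.executable, "-c", f"print('hello from {label}')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()

with ThreadPoolExecutor(max_workers=3) as executor:
    outputs = list(executor.map(run_child, ["job-1", "job-2", "job-3"]))

print(outputs)  # ['hello from job-1', 'hello from job-2', 'hello from job-3']
```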

multiprocessing — true parallelism

multiprocessing is the module for spawning multiple Python processes and running them in parallel. Since processes are free from the GIL constraint, you can run CPU-bound work in true parallel: on a 4-core CPU, image processing or number crunching can get up to roughly 4x faster, minus the cost of starting the processes and passing data between them.

multiprocessing.Pool — true parallelism on 4 cores
Input: [d1, d2, d3, d4]
  • Process 1 on Core 1: heavy(d1)
  • Process 2 on Core 2: heavy(d2)
  • Process 3 on Core 3: heavy(d3)
  • Process 4 on Core 4: heavy(d4)
Result: [r1, r2, r3, r4]
Four Python processes run on four separate CPU cores, so there's no GIL constraint and you get real parallel execution. CPU-bound work can speed up by close to 4x, depending on startup and data-transfer overhead.
from multiprocessing import Pool

def heavy(n):
    return sum(i * i for i in range(n))   # CPU-bound work

if __name__ == "__main__":   # required form for multiprocessing
    with Pool(processes=4) as pool:
        results = pool.map(heavy, [10**6, 10**6, 10**6, 10**6])
        print("sum:", sum(results))

multiprocessing requires `if __name__ == '__main__':`

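With the spawn start method (the default on Windows and macOS), each child process starts a fresh interpreter and re-imports the parent script, so a Pool(...).map(...) call at the top level would run again in every child and try to spawn children of its own; CPython stops this with a RuntimeError instead of letting it recurse. The fork start method doesn't re-import the script, but code that relies on that isn't portable, so always put your main code inside an if __name__ == "__main__": block.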
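One common way to lay this out, sketched under the assumption that the script is run as a file in a real Python environment: keep the work function at module level so child processes can import it, and put pool creation in a main() function behind the guard. The timing code is only there to let you measure the speedup yourself; the job size is an arbitrary choice.

```python
import time
from multiprocessing import Pool

def heavy(n):
    # Module-level function: child processes must be able to import it by name.
    return sum(i * i for i in range(n))

def main():
    jobs = [500_000] * 4

    start = time.perf_counter()
    serial = [heavy(n) for n in jobs]
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(processes=4) as pool:
        parallel = pool.map(heavy, jobs)
    parallel_seconds = time.perf_counter() - start

    assert serial == parallel   # same answers either way
    print(f"serial:   {serial_seconds:.2f}s")
    print(f"parallel: {parallel_seconds:.2f}s")

if __name__ == "__main__":   # only the parent process runs main()
    main()
```

For small jobs the parallel run can be slower than the serial one, because process startup and result transfer dominate. The gap should widen in favor of the pool as each job gets heavier.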

subprocess — external commands

subprocess is the module for calling external commands (other programs installed on the system) from Python: running git status, converting video with ffmpeg, invoking a shell script, and other "start a program that isn't Python" use cases. The name resembles multiprocessing, but it's a completely different tool.

subprocess.run — calling an external command from Python
  • Python: subprocess.run([...])
  • OS: spawns a process
  • External command: git / ffmpeg etc.
  • Returns stdout / returncode to Python
Python asks the OS to run a command → a separate OS process runs the external command → stdout and the return code come back in a CompletedProcess object. Useful for handing off work that Python alone can't do.
import subprocess

result = subprocess.run(
    ["git", "status", "--short"],
    capture_output=True,
    text=True,
    check=True,    # raises CalledProcessError on failure
)
print(result.stdout)
print("return code:", result.returncode)
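With check=True, a failing command raises instead of returning, so real scripts usually wrap the call. The helper below is a sketch: run_command and its (ok, text) return convention are invented for this example, and it uses the current Python interpreter as a portable stand-in for git or ffmpeg.

```python
import subprocess
import sys

def run_command(args, timeout=10):
    """Run a command and return (ok, text) instead of raising."""
    try:
        completed = subprocess.run(
            args, capture_output=True, text=True, check=True, timeout=timeout
        )
    except FileNotFoundError:
        # The program itself could not be found.
        return False, f"command not found: {args[0]}"
    except subprocess.CalledProcessError as exc:
        # The program ran but exited with a non-zero return code.
        return False, f"exit code {exc.returncode}: {exc.stderr.strip()}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    return True, completed.stdout.strip()

print(run_command([sys.executable, "-c", "print('ok')"]))
print(run_command([sys.executable, "-c", "import sys; sys.exit(3)"]))
print(run_command(["definitely-not-a-real-command-xyz"]))
```

The three calls exercise the success path, a non-zero exit code, and a missing program, in that order.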

Decision flow — which one do you pick?

Picking among asyncio / threading / multiprocessing / subprocess comes down to two axes: "CPU-bound or I/O-bound?" and "Inside Python or an external command?". The flow chart below removes most of the guesswork.

Concurrency / parallelism decision flow
  • Calling an external command? Yes → subprocess
  • I/O-bound? (wait time dominates) Yes → asyncio (or threading)
  • CPU-bound? (computation dominates) Yes → multiprocessing
External command → subprocess, I/O-bound → asyncio (or threading), CPU-bound → multiprocessing. Three questions, one tool each.
Workload | Pick | Why
Hit 100 Web APIs concurrently | asyncio | I/O-bound; lightweight and easy to write
Parallelize an existing synchronous HTTP client | threading (ThreadPoolExecutor) | If the library has no async support, threading
Parallelize image processing across 4 cores | multiprocessing | CPU-bound work needs processes to dodge the GIL
Run commands like git or ffmpeg | subprocess | Dedicated to calling programs outside Python
Millions of simple math operations | NumPy / Cython | Vectorization beats Python-level parallelism
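The decision flow can also be written down as a tiny function. This is only a restatement of the chart as code: pick_tool and its parameters are hypothetical names, and the async_capable flag is an assumption added here to capture the "library has no async support" row.

```python
def pick_tool(external_command, io_bound, async_capable=True):
    """Return the module suggested by the decision flow, as a string."""
    if external_command:
        return "subprocess"
    if io_bound:
        # asyncio by default; threading when the library in use is synchronous-only
        return "asyncio" if async_capable else "threading"
    return "multiprocessing"   # CPU-bound work inside Python

print(pick_tool(external_command=True, io_bound=False))                       # subprocess
print(pick_tool(external_command=False, io_bound=True))                       # asyncio
print(pick_tool(external_command=False, io_bound=True, async_capable=False))  # threading
print(pick_tool(external_command=False, io_bound=False))                      # multiprocessing
```

The sketch deliberately leaves out the last table row: whether vectorization would remove the need for parallelism is a judgment about the code itself, not a yes/no flag.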

Truly CPU-bound is rarer than you'd think

A lot of Python code that looks like it's stuck on computation can get far faster, sometimes on the order of 100x, from vectorization with NumPy / Pandas or compilation with Cython. Before chasing 4x with multiprocessing, check what you should do first: NumPy for numerics, Pandas for data, optimized regex for strings.

QUIZ

Knowledge Check

Answer each question one by one.

Q1. Because of Python's GIL (Global Interpreter Lock), adding more threads doesn't parallelize which kind of work?

Q2. Which one do you use to call external commands like git status from Python?

Q3. Which is best suited for truly parallelizing heavy numerical computation across 4 CPU cores?