Q1. Which is the recommended way to build paths that don't break on either Windows or Linux?
os and pathlib — File Paths and Directory Operations
Learn Python's os.path and pathlib modules from the basics. Build OS-agnostic paths, list and recursively walk directories, and break paths into pieces with Path objects — all hands-on.
Two modules deal with file paths and directories — the older os.path and the newer, more readable pathlib. This article walks through OS-agnostic path construction, directory listing and recursive traversal, and breaking paths into pieces with Path objects, in that order.
os.path — Build OS-Agnostic Paths
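Path separators differ by OS. Windows uses \ (backslash), while Linux and macOS use /. A hard-coded `"data/sales/2024.csv"` works on Linux and macOS and is usually accepted by Windows too, but a hard-coded backslash version such as `"data\sales\2024.csv"` does not work on Linux or macOS, and hand-written separators mixed with paths supplied by the OS produce inconsistent strings.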
If you split the parts and pass them to os.path.join("data", "sales", "2024.csv"), Python picks the right separator on the fly based on the OS it's running on.
The separator is / on Linux and macOS and \ on Windows. Don't bake separator characters in by hand; that's the trick to staying portable.

| Function | Meaning | Example |
|---|---|---|
| os.path.join(*parts) | Joins paths with the OS separator | join('data', 'sales') → 'data/sales' (Linux/macOS) |
| os.path.exists(p) | Whether the path exists | True / False |
| os.path.isfile(p) | Whether it's a file (not a directory) | True / False |
| os.path.isdir(p) | Whether it's a directory | True / False |
| os.path.basename(p) | The trailing file or folder name | basename('data/x.csv') → 'x.csv' |
| os.path.dirname(p) | The parent path with the tail removed | dirname('data/x.csv') → 'data' |
| os.path.splitext(p) | Splits off the extension | splitext('x.csv') → ('x', '.csv') |
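The functions in the table can be tried out with a minimal sketch like the one below. It assumes nothing beyond the standard library: `posixpath` and `ntpath` are the Linux/macOS and Windows implementations behind `os.path`, used here only so both results can be seen on one machine, and the temporary folder and the `month,total` file contents are made up for illustration.

```python
import ntpath      # Windows implementation of os.path
import os
import posixpath   # Linux/macOS implementation of os.path
import tempfile

parts = ("data", "sales", "2024.csv")

# The same parts joined under each OS's rules.
posix_path = posixpath.join(*parts)
windows_path = ntpath.join(*parts)
print(posix_path)    # data/sales/2024.csv
print(windows_path)  # data\sales\2024.csv

# os.path.join picks whichever of the two matches the running OS.
native_path = os.path.join(*parts)

# Existence checks need real files, so build a throwaway tree (illustrative only).
with tempfile.TemporaryDirectory() as root:
    sales_dir = os.path.join(root, "data", "sales")
    os.makedirs(sales_dir)
    csv_path = os.path.join(sales_dir, "2024.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("month,total\n")

    checks = (
        os.path.exists(csv_path),
        os.path.isfile(csv_path),
        os.path.isdir(csv_path),
        os.path.isdir(sales_dir),
    )

print(checks)  # (True, True, False, True)
```

In everyday code you would call only `os.path.join`; the two flavour-specific modules are imported here purely for the comparison.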
Practice 2 — Split Name and Extension with basename and splitext
Don't cram everything into one line — assign step by step into intermediate variables and split the file name from its extension. basename pulls the trailing file name from a full path, and splitext splits that name into a (name, extension) tuple.
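One possible way to work through this exercise is sketched below. The sample path is illustrative, and the two extra `splitext` calls at the end are an addition meant to show how names with several dots or a leading dot behave.

```python
import os.path

full_path = "data/sales/2024_q1.csv"

# Step 1: keep only the trailing file name.
file_name = os.path.basename(full_path)

# Step 2: split that name into a (name, extension) tuple.
stem, extension = os.path.splitext(file_name)

print(file_name)  # 2024_q1.csv
print(stem)       # 2024_q1
print(extension)  # .csv

# Only the last dot counts, and a leading dot is not treated as an extension.
print(os.path.splitext("backup.tar.gz"))  # ('backup.tar', '.gz')
print(os.path.splitext(".gitignore"))     # ('.gitignore', '')
```

Keeping `file_name` as its own variable makes each step easy to print and inspect while debugging.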
os.listdir and os.walk — Directory Listing and Recursive Traversal
When you want to pull a folder's contents into Python, use os.listdir for just one level and os.walk to recursively descend into subfolders. os.listdir returns a list of names (both files and subfolders) directly inside the folder you specify, while os.walk walks the whole subtree recursively and yields one (current path, list of subfolder names, list of file names) tuple for each directory it visits.
```python
import os

# One level: names directly under 'data'
print(os.listdir("data"))
# → ['sales', 'inventory']  (order is not guaranteed)

# Recursive: walk everything under 'data'
for dirpath, dirnames, filenames in os.walk("data"):
    print(dirpath, dirnames, filenames)
# → data ['sales', 'inventory'] []
#   data/sales [] ['2024_q1.csv', '2024_q2.csv']
#   data/inventory [] ['items.json']
```
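To run the same idea without an existing data folder, here is a self-contained sketch. The helper name `list_files` and the temporary tree are assumptions made for illustration; the tree mirrors the sample layout used in this section.

```python
import os
import tempfile

def list_files(root):
    """Return every file under root as a path relative to root, in sorted order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the order in which os.walk descends
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return found

with tempfile.TemporaryDirectory() as root:
    # Build a small throwaway tree (illustrative file names).
    for rel in ("sales/2024_q1.csv", "sales/2024_q2.csv", "inventory/items.json"):
        target = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()

    top_level = sorted(os.listdir(root))  # one level only
    all_files = list_files(root)          # whole subtree

print(top_level)  # ['inventory', 'sales']
print(all_files)
```

Joining `dirpath` and each file name is the usual way to turn what os.walk yields into usable paths, since `filenames` holds bare names only.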
glob — Pattern Matching to Collect Files
When you want to grab only the files that match a condition — like only files with the `.csv` extension — the glob module is the shortest path. Write the target as a pattern using wildcards like * (any string) or ** (any depth) and you get back a list of matching paths.
* matches any string within a single level; ** crosses any number of levels (requires recursive=True).

```python
import glob

# CSV files directly under data/sales
print(glob.glob("data/sales/*.csv"))
# → ['data/sales/2024_q1.csv', 'data/sales/2024_q2.csv']

# Recursive search under data (** + recursive=True)
print(glob.glob("data/**/*.csv", recursive=True))
# → ['data/sales/2024_q1.csv', 'data/sales/2024_q2.csv']
```
glob's ** Wildcard Pairs with recursive=True
The double asterisk in glob.glob("data/**/*.csv") is a wildcard that crosses any number of levels. Without recursive=True, however, it behaves like a regular * and matches exactly one directory level, so files in deeper folders (and files directly in data) are missed. Always pass that argument when you want a recursive search.
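The difference is easy to see side by side. The sketch below is a minimal illustration, not a recommended utility: the temporary tree, the file names, and the helper `relative` are all invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # CSV files at three depths (illustrative names).
    for rel in ("top.csv", "sales/q1.csv", "sales/archive/old.csv"):
        target = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()

    pattern = os.path.join(root, "**", "*.csv")

    def relative(paths):
        """Shorten matches to sorted, /-separated paths relative to root."""
        return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)

    flat = relative(glob.glob(pattern))                  # ** acts like *
    deep = relative(glob.glob(pattern, recursive=True))  # ** spans zero or more levels

print(flat)  # ['sales/q1.csv']
print(deep)  # ['sales/archive/old.csv', 'sales/q1.csv', 'top.csv']
```

Note that the recursive form also picks up `top.csv`, because `**` can match zero directories as well as many.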
pathlib.Path — Object-Oriented Path Operations
os.path handles paths as plain strings. pathlib, added to the standard library in Python 3.4, treats paths as objects and is the recommended approach for new code. Build one with Path("data/sales/2024_q1.csv") and you can access each part through attributes: .parent for the parent folder, .name for the trailing piece, .stem for the name without extension, and .suffix for the extension.
With a Path object, there is no need to call os.path.dirname / basename / splitext separately.

```python
from pathlib import Path

p = Path("data") / "sales" / "2024_q1.csv"  # Join with the / operator
print(p)           # data/sales/2024_q1.csv (backslashes on Windows)
print(p.parent)    # data/sales
print(p.name)      # 2024_q1.csv
print(p.stem)      # 2024_q1
print(p.suffix)    # .csv
print(p.exists())  # True if the file is present on disk

# Read contents (opens, reads, and closes the file for you)
print(p.read_text())  # CSV contents

# Recursively find CSV files (comparable to os.walk plus a filter)
for sub in Path("data").rglob("*.csv"):
    print(sub)
```
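For a version that runs anywhere without a prepared data folder, a sketch along these lines may help. `PurePosixPath` and `PureWindowsPath` are pathlib's filesystem-free classes, used here only to show both separator styles on one machine; the temporary folder and the file contents are illustrative assumptions.

```python
import tempfile
from pathlib import Path, PurePosixPath, PureWindowsPath

# Pure paths only manipulate text, so they behave the same on every OS.
pure = PurePosixPath("data") / "sales" / "2024_q1.csv"
pieces = (str(pure.parent), pure.name, pure.stem, pure.suffix)
print(pieces)  # ('data/sales', '2024_q1.csv', '2024_q1', '.csv')

windows_style = str(PureWindowsPath("data") / "sales" / "2024_q1.csv")
print(windows_style)  # data\sales\2024_q1.csv

# Concrete Path objects also touch the filesystem.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "data" / "sales" / "2024_q1.csv"
    csv_file.parent.mkdir(parents=True)            # create data/sales
    csv_file.write_text("month,total\n", encoding="utf-8")

    existed = csv_file.exists()
    text = csv_file.read_text(encoding="utf-8")
    found = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.csv"))

print(existed)  # True
print(found)    # ['data/sales/2024_q1.csv']
```

In ordinary scripts plain `Path` is enough; the pure classes are mainly useful when you need to reason about another platform's paths.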
`os.path` is string-based and `pathlib.Path` is object-based, but they cover largely the same operations. The table below maps each task between them.
| What you want | os.path style | pathlib style |
|---|---|---|
| Join | os.path.join('data', 'x.csv') | Path('data') / 'x.csv' |
| Parent folder | os.path.dirname(p) | p.parent |
| File name | os.path.basename(p) | p.name |
| Name without extension | os.path.splitext(os.path.basename(p))[0] | p.stem |
| Extension | os.path.splitext(p)[1] | p.suffix |
| Existence check | os.path.exists(p) | p.exists() |
| Recursive search | glob.glob('**/*.csv', recursive=True) | Path('.').rglob('*.csv') |
| Read | with open(p) as f: f.read() | p.read_text() |
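A quick way to convince yourself that the two styles agree is to compute each row both ways and compare. The sketch below does that for the purely textual rows of the table; the sample path and the variable names are illustrative.

```python
import os.path
from pathlib import Path

raw = os.path.join("data", "sales", "2024_q1.csv")  # string form
p = Path(raw)                                        # object form

rows = [
    ("join", os.path.join("data", "x.csv"), str(Path("data") / "x.csv")),
    ("parent", os.path.dirname(raw), str(p.parent)),
    ("name", os.path.basename(raw), p.name),
    ("stem", os.path.splitext(os.path.basename(raw))[0], p.stem),
    ("suffix", os.path.splitext(raw)[1], p.suffix),
]

# Print both results per task so they can be compared by eye.
for task, via_os_path, via_pathlib in rows:
    print(f"{task:<7} {via_os_path!r:<28} {via_pathlib!r}")

mismatches = [task for task, a, b in rows if a != b]
print(mismatches)  # []
```

The pathlib column returns Path objects for join and parent, which is why those two rows are wrapped in `str()` before comparing.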
Pick pathlib for New Code
Prefer pathlib for new code. When an older library API requires string paths (some DB drivers, for example), convert with str(p). os.path isn't going anywhere, so knowing both mappings keeps you comfortable reading legacy code too.
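The conversion step can be sketched as follows. `legacy_basename` is a hypothetical stand-in for an older API that only accepts strings, not a real library function; `os.fspath` is the standard-library call that returns the string form of a path-like object.

```python
import os
from pathlib import Path

def legacy_basename(path_string):
    """Hypothetical stand-in for an older API that only accepts str paths."""
    if not isinstance(path_string, str):
        raise TypeError("path must be a str")
    return path_string.replace("\\", "/").rsplit("/", 1)[-1]

db_path = Path("data") / "app.db"

as_text = str(db_path)           # explicit conversion
as_fspath = os.fspath(db_path)   # same result for Path objects
print(as_text == as_fspath)      # True
print(legacy_basename(as_text))  # app.db

# Passing the Path object itself is rejected by the strict stand-in.
try:
    legacy_basename(db_path)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

Most of the standard library, including `open()`, accepts Path objects directly, so the conversion is only needed at the boundary with stricter code.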
Knowledge Check
Answer each question one by one.
Q2. Which of the following is best suited to walking every level of a folder recursively?
Q3. Given p = Path("data/sales/2024_q1.csv"), what is the value of p.stem?