
os and pathlib — File Paths and Directory Operations

Learn Python's os.path and pathlib modules from the basics. Build OS-agnostic paths, list and recursively walk directories, and break paths into pieces with Path objects — all hands-on.

Two modules deal with file paths and directories — the older os.path and the newer, more readable pathlib. This article walks through OS-agnostic path construction, directory listing and recursive traversal, and breaking paths into pieces with Path objects, in that order.

os.path — Build OS-Agnostic Paths

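Path separators differ by OS. Windows uses \ (backslash), while Linux and macOS use /. Hard-coding `"data/sales/2024.csv"` directly in your code works on Linux and macOS, and Windows usually accepts forward slashes as well, but hand-written separators become fragile as soon as they are mixed with paths the OS itself returns (which use \ on Windows) or compared as strings.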

If you split the parts and pass them to os.path.join("data", "sales", "2024.csv"), Python picks the right separator on the fly based on the OS it's running on.

os.path.join Switches Separators per OS
[Figure: os.path.join('data', 'sales', '2024.csv') yields data/sales/2024.csv on Linux / Mac and data\sales\2024.csv on Windows.]
The same Python code expands to / on Linux / Mac and \ on Windows. Don't bake separator characters in by hand — that's the trick to staying portable.
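
To see both results on a single machine, a minimal sketch can call the platform-specific implementations directly. It relies on the standard-library modules posixpath and ntpath, which are what os.path resolves to on Linux / macOS and on Windows respectively. The variable names are illustrative only.

```python
import ntpath      # the Windows implementation behind os.path
import os
import posixpath   # the Linux / macOS implementation behind os.path

parts = ("data", "sales", "2024.csv")

posix_path = posixpath.join(*parts)
windows_path = ntpath.join(*parts)
native_path = os.path.join(*parts)   # whichever of the two matches this OS

print(posix_path)     # data/sales/2024.csv
print(windows_path)   # data\sales\2024.csv
print(native_path == (windows_path if os.name == "nt" else posix_path))  # True
```

In everyday code you would only call os.path.join; the two explicit modules are used here just to make the difference visible.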
Function | Meaning | Example
os.path.join(*parts) | Joins paths with the OS separator | join('data', 'sales') → 'data/sales'
os.path.exists(p) | Whether the path exists | True / False
os.path.isfile(p) | Whether it's a file (not a directory) | True / False
os.path.isdir(p) | Whether it's a directory | True / False
os.path.basename(p) | The trailing file or folder name | basename('data/x.csv') → 'x.csv'
os.path.dirname(p) | The parent path with the tail removed | dirname('data/x.csv') → 'data'
os.path.splitext(p) | Splits off the extension | splitext('x.csv') → ('x', '.csv')
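
The functions in the table can be tried together in a short sketch. It assumes nothing about the lesson's preloaded files: it creates a throwaway folder with tempfile, and the file name x.csv and its contents are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "data")
    os.makedirs(folder)
    csv_path = os.path.join(folder, "x.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("id,amount\n1,100\n")   # placeholder content

    # The checks must run while the temporary folder still exists
    checks = {
        "exists": os.path.exists(csv_path),
        "isfile": os.path.isfile(csv_path),
        "isdir": os.path.isdir(csv_path),
        "folder_isdir": os.path.isdir(folder),
        "missing_exists": os.path.exists(os.path.join(folder, "nope.csv")),
    }
    name = os.path.basename(csv_path)
    parent_name = os.path.basename(os.path.dirname(csv_path))
    stem, ext = os.path.splitext(name)

print(checks)
print(name, parent_name, stem, ext)   # x.csv data x .csv
```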

Build the path to a sales CSV file with os.path and read its contents (you can see data/sales/2024_q1.csv and others under the 📂 Files panel on the left).

① Import os and join the three path parts data / sales / 2024_q1.csv into a single string with the OS separator.

② Print the full path as-is.

③ Open the preloaded data/sales/2024_q1.csv with open(path, "r") and print the first three lines.


Practice 2 — Split Name and Extension with basename and splitext

Assign each step to an intermediate variable instead of cramming everything into one line, and split the file name from its extension. basename pulls the trailing file name from a full path, and splitext splits that name into a (name, extension) tuple.
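
Before trying the steps, it may help to see how basename and splitext treat a few less obvious names. This is a small sketch on a hypothetical path (reports/backup/archive.tar.gz is not one of the lesson files), showing that splitext only splits off the last extension.

```python
import os

path = "reports/backup/archive.tar.gz"   # hypothetical path, not from the lesson files

filename = os.path.basename(path)
stem, suffix = os.path.splitext(filename)
print(filename)        # archive.tar.gz
print(stem, suffix)    # archive.tar .gz  (only the last extension is split off)

print(os.path.splitext(".bashrc"))   # ('.bashrc', '')  a leading dot is not an extension
print(os.path.splitext("README"))    # ('README', '')
print(repr(os.path.basename("data/sales/")))   # ''  a trailing separator leaves no tail
```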

① Import os and set path = "data/sales/2024_q1.csv".

② Pull the trailing file name with os.path.basename(path), store it in filename, and print it.

③ Use os.path.splitext(filename) to get the (name, extension) tuple, store it in parts, and print it.

④ Unpack that tuple into stem and suffix, and print them as stem suffix separated by a space.


os.listdir and os.walk — Directory Listing and Recursive Traversal

When you want to pull a folder's contents into Python, use os.listdir for just one level and os.walk to recursively descend into subfolders. os.listdir returns a list of names (both files and subfolders) directly inside the folder you specify, while os.walk walks the whole subtree recursively and yields (current path, list of subfolder names, list of file names) tuples one level at a time.

Difference Between os.listdir and os.walk
[Figure: os.listdir returns the list of names in the immediate folder; os.walk recursively traverses every level.]
os.listdir gives you only the immediate names; os.walk descends recursively through every level. Pick based on how deep you need to go.
import os

# One level: names directly under 'data' (the order is not guaranteed)
print(os.listdir("data"))
# → ['sales', 'inventory']

# Recursive: walk everything under 'data'
for dirpath, dirnames, filenames in os.walk("data"):
    print(dirpath, dirnames, filenames)
# → data ['sales', 'inventory'] []
#    data/sales [] ['2024_q1.csv', '2024_q2.csv']
#    data/inventory [] ['items.json']
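
A common next step is turning the names that os.walk yields into full paths, which means joining each file name onto dirpath. The sketch below is one way to do that, under the assumption of a throwaway tree created with tempfile that mirrors the layout above; the file contents are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Recreate a small tree shaped like the example above
    for rel in ("data/sales/2024_q1.csv", "data/sales/2024_q2.csv", "data/inventory/items.json"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w", encoding="utf-8") as f:
            f.write("placeholder\n")

    data_dir = os.path.join(root, "data")
    top_level = sorted(os.listdir(data_dir))

    found = []
    for dirpath, dirnames, filenames in os.walk(data_dir):
        dirnames.sort()   # sorting in place fixes the order os.walk descends in
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Store paths relative to root, with "/" so the output is the same on every OS
            found.append(os.path.relpath(full, root).replace(os.sep, "/"))

print(top_level)   # ['inventory', 'sales']
print(found)
# ['data/inventory/items.json', 'data/sales/2024_q1.csv', 'data/sales/2024_q2.csv']
```

Sorting dirnames and filenames is optional; it is done here only so the output is stable across file systems.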

Show what's inside the `data/` directory two ways — first the immediate level, then recursively.

① Import os.

② Use os.listdir("data") to list the names directly under `data/` in sorted order (wrap with sorted to keep the order stable).

③ Use os.walk("data") to walk data/ recursively and print one line per level showing (current path, list of file names) (wrap the file name list with sorted too).


glob — Pattern Matching to Collect Files

When you want to grab only the files that match a condition — like only files with the `.csv` extension — the glob module is the shortest path. Write the target as a pattern using wildcards like * (any string) or ** (any depth) and you get back a list of matching paths.

glob Wildcards
[Figure: * matches any string within the same level; ** crosses any depth (with recursive=True).]
* matches any string within the same level, ** crosses any number of levels (requires recursive=True).
import glob

# CSV files directly under data/sales
print(glob.glob("data/sales/*.csv"))
# → ['data/sales/2024_q1.csv', 'data/sales/2024_q2.csv']

# Recursive search under data (** + recursive=True)
print(glob.glob("data/**/*.csv", recursive=True))
# → ['data/sales/2024_q1.csv', 'data/sales/2024_q2.csv']

glob's ** Wildcard Pairs with recursive=True

The double asterisk in glob.glob("data/**/*.csv") is a wildcard that crosses any number of levels. But without recursive=True, it behaves like a regular * and won't find anything in deeper folders. Always pass that argument when you want recursive search.
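
The effect of the flag is easiest to see by running the same pattern twice. The following sketch assumes a temporary tree with CSV files at three depths (the file names are made up) and compares the two calls.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for rel in ("data/top.csv", "data/sales/q1.csv", "data/sales/archive/old.csv"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()   # empty placeholder file

    # glob.escape protects any special characters in the temp folder name
    pattern = os.path.join(glob.escape(root), "data", "**", "*.csv")

    def names(paths):
        """Return sorted paths relative to root, written with '/'."""
        return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)

    without_flag = names(glob.glob(pattern))
    with_flag = names(glob.glob(pattern, recursive=True))

print(without_flag)   # ['data/sales/q1.csv']
print(with_flag)      # ['data/sales/archive/old.csv', 'data/sales/q1.csv', 'data/top.csv']
```

Without the flag, ** stands for exactly one folder level, so only data/sales/q1.csv matches. With recursive=True it matches zero or more levels, which is why data/top.csv is included as well.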

Pull the CSV files in `data/sales/` in one shot with glob.

① Import glob.

② Use glob.glob("data/sales/*.csv") to grab the CSVs under sales, sort with sorted, and store the result in csv_files.

③ Print each path one at a time.

④ Print the count in the form Found files: ◯.


pathlib.Path — Object-Oriented Path Operations

os.path handles paths as plain strings. pathlib, part of the standard library since Python 3.4, treats a path as an object instead, and it is the approach generally recommended for new code. Build one with Path("data/sales/2024_q1.csv") and you can access each part through attributes: .parent for the parent folder, .name for the trailing piece, .stem for the name without extension, and .suffix for the extension.

Path Object Attributes
[Figure: Path('data/sales/2024_q1.csv') broken into .parent → Path('data/sales'), .name → '2024_q1.csv', .stem → '2024_q1', .suffix → '.csv'.]
From a single Path you can pull each part out with .parent / .name / .stem / .suffix. Easier to read than calling os.path.dirname / basename / splitext separately.
from pathlib import Path

p = Path("data") / "sales" / "2024_q1.csv"   # Join with the / operator
print(p)               # data/sales/2024_q1.csv
print(p.parent)        # data/sales
print(p.name)          # 2024_q1.csv
print(p.stem)          # 2024_q1
print(p.suffix)        # .csv
print(p.exists())      # True

# Read contents (a wrapper around with open)
print(p.read_text())   # CSV contents

# Recursively find CSV files (like glob.glob("data/**/*.csv", recursive=True))
for sub in Path("data").rglob("*.csv"):
    print(sub)
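
The listing above depends on the lesson's preloaded files. As a self-contained variation, the sketch below builds its own small tree in a temporary folder and then uses the same attributes and methods. The file contents and the extra notes.txt file are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    p = root / "data" / "sales" / "2024_q1.csv"
    p.parent.mkdir(parents=True)   # create data/sales in one call
    p.write_text("month,amount\n1,100\n", encoding="utf-8")   # placeholder content
    (root / "data" / "notes.txt").write_text("not a csv\n", encoding="utf-8")

    parts = (p.parent.name, p.name, p.stem, p.suffix)
    text = p.read_text(encoding="utf-8")
    # rglob yields Path objects; as_posix() renders them with "/" on every OS
    csv_files = sorted(f.relative_to(root).as_posix() for f in root.rglob("*.csv"))

print(parts)                  # ('sales', '2024_q1.csv', '2024_q1', '.csv')
print(text.splitlines()[0])   # month,amount
print(csv_files)              # ['data/sales/2024_q1.csv']
```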

`os.path` is string-based and `pathlib.Path` is object-based, but they cover largely the same operations. The table below maps each task between them.

What you want | os.path style | pathlib style
Join | os.path.join('data', 'x.csv') | Path('data') / 'x.csv'
Parent folder | os.path.dirname(p) | p.parent
File name | os.path.basename(p) | p.name
Name without extension | os.path.splitext(os.path.basename(p))[0] | p.stem
Extension | os.path.splitext(p)[1] | p.suffix
Existence check | os.path.exists(p) | p.exists()
Recursive search | glob.glob('**/*.csv', recursive=True) | Path('.').rglob('*.csv')
Read | with open(p) as f: f.read() | p.read_text()
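
The pure string-handling rows of the table can be checked side by side. To keep the output identical on every OS, this sketch uses posixpath and PurePosixPath (the POSIX-flavoured counterparts of os.path and Path) rather than the native versions; that substitution is a choice made for the example, not something the table requires.

```python
import posixpath
from pathlib import PurePosixPath

raw = "data/sales/2024_q1.csv"
p = PurePosixPath(raw)

# task -> (os.path-style result, pathlib-style result)
pairs = {
    "join": (posixpath.join("data", "x.csv"), str(PurePosixPath("data") / "x.csv")),
    "parent": (posixpath.dirname(raw), str(p.parent)),
    "name": (posixpath.basename(raw), p.name),
    "stem": (posixpath.splitext(posixpath.basename(raw))[0], p.stem),
    "suffix": (posixpath.splitext(raw)[1], p.suffix),
}
for task, (old, new) in pairs.items():
    print(f"{task:7} {old!r:25} {new!r:25} {old == new}")

# splitext on the full path keeps the directory part, so it is not the stem
print(posixpath.splitext(raw)[0])   # data/sales/2024_q1
```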

Pick pathlib for New Code

pathlib is the recommended choice for new code. When an older library API requires string paths (some DB drivers, for example), convert with str(p). os.path isn't going anywhere, so knowing both mappings keeps you comfortable reading legacy code too.
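
Converting at the boundary might look like the sketch below. The legacy_loader function is a hypothetical stand-in for an older API that only accepts str, and PurePosixPath is used so the printed paths are the same on every OS.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("data") / "sales" / "2024_q1.csv"

as_str = str(p)
as_fspath = os.fspath(p)   # what open() and os.path functions call on path-like objects

def legacy_loader(path_string):
    """Hypothetical stand-in for an older API that insists on str."""
    if not isinstance(path_string, str):
        raise TypeError("expected a str path")
    return path_string.upper()

print(as_str)                  # data/sales/2024_q1.csv
print(as_fspath == as_str)     # True
print(legacy_loader(str(p)))   # DATA/SALES/2024_Q1.CSV
print(os.path.basename(p))     # 2024_q1.csv  (os.path accepts Path objects directly)
```

Since Python 3.6 most standard-library functions accept Path objects as they are, so str(p) is usually needed only for third-party code that checks for str explicitly.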

Use pathlib.Path to handle a sales CSV. Instead of concatenating strings directly, get the same result through the Path object's operator and attributes.

① Import the Path class from pathlib.

② Build a single Path object from the three parts data / sales / 2024_q1.csv using Path's join operator (the os.path.join replacement).

③ Pull the name without extension and the extension from the Path via attribute access, and print them separated by a space.

④ Use a Path method to read the file's contents as a single string and print it (the way that doesn't require with open(...)).

QUIZ

Knowledge Check

Answer each question one by one.

Q1. Which is the recommended way to build paths that don't break on either Windows or Linux?

Q2. Which of the following is best suited to walking every level of a folder recursively?

Q3. Given p = Path("data/sales/2024_q1.csv"), what is the value of p.stem?