Q1. What's the danger of calling pickle.loads on untrusted pickle data?
pickle and base64 — Object Serialization and Binary-to-Text Conversion
Use pickle.dumps / pickle.loads to serialize dicts and custom class instances to bytes and restore them in full, and base64.b64encode / base64.b64decode to carry binary data over text-only channels.
We'll cover two modules used when moving data through files or across the network. pickle saves Python objects (dicts, class instances, etc.) in a form that can be fully restored later, while base64 converts binary data like images or audio into a string of "alphanumerics plus two symbols" so you can drop it straight into email, JSON, or URLs. Both are standard library modules for data conversion at internal boundaries or external transports.
Sorting out what each module is for
The two modules split naturally by input and output: pickle turns a Python object into bytes, and base64 turns bytes into ASCII text. With that big picture in mind, we'll dig into typical uses and APIs in each section.
pickle — Convert Python Objects Straight to Bytes
pickle is a standard library module that converts Python objects into a form that can be fully restored later. The result is a bytes value (a contiguous sequence of integers from 0 to 255 — a type for handling "sequences of numbers" rather than "sequences of characters" like strings). Bytes are the basic format for situations where machines, not humans, will read the data — saving to files, sending over the network, handing data to another process.
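Unlike json, pickle's strength is being able to save not just dicts / lists / strings / numbers but also custom class instances, which makes it useful for saving ML models or passing objects between Python processes. Classes and module-level functions are stored by reference (their module and name), not by value, so the same definition must be importable when the data is loaded, and lambdas cannot be pickled.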
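As a minimal sketch of that round trip, the example below pickles a dict that contains a custom class instance and restores it. The User class and its fields are hypothetical names chosen for illustration only.

```python
import pickle
from dataclasses import dataclass


@dataclass
class User:  # hypothetical class, for illustration only
    name: str
    age: int


original = {"user": User("Alice", 30), "tags": ["admin", "staff"]}

# Python object -> bytes
blob = pickle.dumps(original)
print(type(blob).__name__)   # bytes

# bytes -> Python object (a new, equal copy)
restored = pickle.loads(blob)
print(restored == original)  # True
print(restored is original)  # False
```

The restored value compares equal to the original but is a separate object, which is what "full restoration" means here.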
Never load pickle data from someone else
pickle.loads can execute arbitrary Python code, so passing in bytes from an untrusted source becomes an arbitrary-code-execution vulnerability. Limit yourself to "I made it, I read it back" scenarios and never use it on data received over the network or from someone else. For external communication, use json instead.
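To make the risk concrete, here is a deliberately harmless sketch. The Payload class is hypothetical; its __reduce__ method tells pickle which callable to invoke at load time. It uses the built-in sorted, but an attacker could name any importable function instead.

```python
import pickle


class Payload:  # hypothetical class, for illustration only
    def __reduce__(self):
        # Instructs pickle: "to rebuild this object, call sorted([3, 1, 2])".
        # The pickled bytes, not the loading program, choose the callable.
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Payload())

# Loading runs the call recorded in the bytes and returns its result.
result = pickle.loads(data)
print(result)  # [1, 2, 3]
```

Note that loads did not return a Payload instance at all: it returned whatever the recorded call produced. That is why bytes from an untrusted source must never reach pickle.loads.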
| Function | Role | Notes |
|---|---|---|
| pickle.dumps(obj) | Python object → bytes | returns bytes (not str) |
| pickle.loads(b) | bytes → Python object | trusted sources only |
| pickle.dump(obj, file) | write to a file | open in binary mode open(..., "wb") |
| pickle.load(file) | read from a file | open in binary mode open(..., "rb") |
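The file-based pair from the table can be sketched as follows. The settings dict and the settings.pkl file name are made up for this example, and a temporary directory is used so nothing is left behind.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical data to persist
settings = {"theme": "dark", "recent_files": ["a.txt", "b.txt"], "volume": 0.8}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "settings.pkl"

    # Write: the file must be opened in binary mode ("wb")
    with open(path, "wb") as f:
        pickle.dump(settings, f)

    # Read: binary mode again ("rb")
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == settings)  # True
```

Opening the file in text mode ("w" or "r") raises a TypeError or corrupts the data, because pickle reads and writes bytes, not str.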
base64 — Move Binary Data Through Text
base64 is a module that converts arbitrary binary data into a string of "alphanumerics plus two symbols". Since 3 bytes become 4 characters, the size grows about 1.33×, but it's used widely in situations where you need to carry binary data through text-only channels, such as email attachments, embedding images in JSON or URLs, or storing bytes in SQL.
The two basic APIs:
| Function | Input | Returns |
|---|---|---|
| base64.b64encode(bytes) | bytes (the source data) | Base64-encoded bytes |
| base64.b64decode(Base64) | bytes / str (Base64 form) | the original bytes |
The encoded output is always a multiple of 4 characters long, with = padding any shortfall.

```python
import base64

# bytes → Base64
binary = b"Python is fun!"
encoded = base64.b64encode(binary)
print(encoded)           # b'UHl0aG9uIGlzIGZ1biE='
print(encoded.decode())  # UHl0aG9uIGlzIGZ1biE= (converted to str)

# Base64 → bytes
decoded = base64.b64decode(encoded)
print(decoded)            # b'Python is fun!'
print(decoded == binary)  # True
```
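The size growth and the JSON use case mentioned earlier can be checked with a short sketch. The 300-byte payload and the "filename" / "data" keys are assumptions made for this example, not a standard format.

```python
import base64
import json

# 300 arbitrary bytes standing in for an image or audio clip
raw = bytes(range(256)) + bytes(44)

# b64encode returns bytes; decode to str so json can carry it
encoded_text = base64.b64encode(raw).decode("ascii")
print(len(raw), len(encoded_text))  # 300 400

# Sender: embed the Base64 text in a JSON document
message = json.dumps({"filename": "sample.bin", "data": encoded_text})

# Receiver: parse the JSON and recover the original bytes
received = json.loads(message)
restored = base64.b64decode(received["data"])
print(restored == raw)  # True
```

Every 3 input bytes become 4 output characters, so 300 bytes grow to exactly 400 characters.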
For URLs, use urlsafe_b64
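Standard b64encode output can contain + and /, both of which have special meaning in URLs, so embedding it directly in a URL requires percent-encoding. (JSON string values need no such escaping.) For URLs and filenames, choose base64.urlsafe_b64encode instead: it swaps + → - and / → _ to give you a URL-safe variant. The = padding is still emitted. Use base64.urlsafe_b64decode to restore.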
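A small sketch of the difference, using three bytes picked so that the standard encoding contains both problem characters. The "token" parameter name is hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlencode

raw = b"\xfb\xff\xfe"  # chosen so the standard alphabet yields + and /

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)
print(standard)  # b'+//+'
print(url_safe)  # b'-__-'

# The standard form has to be percent-encoded in a query string...
print(urlencode({"token": standard.decode("ascii")}))  # token=%2B%2F%2F%2B

# ...while the URL-safe form passes through unchanged.
query = urlencode({"token": url_safe.decode("ascii")})
print(query)  # token=-__-

# Receiver: parse the query string and restore the bytes
token = parse_qs(query)["token"][0]
decoded = base64.urlsafe_b64decode(token)
print(decoded == raw)  # True
```

The two variants are not interchangeable on decode: data produced by urlsafe_b64encode should be read back with urlsafe_b64decode.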
Knowledge Check
Answer each question one by one.
Q2. What's the type of pickle.dumps(obj)'s return value?
Q3. Which function is best when you want to put a base64 string in a URL query parameter?