File I/O & Buffering
Interacting with the file system is a core task for almost any application. Whether you are logging errors, reading configuration, or processing massive datasets, you must understand how Python manages file streams.
In Python, we use the built-in open() function to create a “file object” that acts as a bridge between your code and the physical disk.
1. Opening a File: The Mode System

    file_obj = open("filename", mode)

| Mode | Name | Behavior |
|---|---|---|
| r | Read | Default. Error if file doesn’t exist. |
| w | Write | Creates new file or erases existing content. |
| a | Append | Adds to the end of the file. |
| x | Exclusive | Creates new file; errors if it already exists. |
| b | Binary | For non-text files (images, PDFs). |
| t | Text | Default. For text files (automatic encoding). |
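
To make the mode table concrete, here is a minimal sketch that runs each mode against a throwaway file. The file name demo.txt and the temporary directory are assumptions chosen for illustration; they are not part of the original text.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch never touches real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")  # hypothetical file name

with open(path, "w") as f:   # "w": create the file, or erase it if it exists
    f.write("first\n")

with open(path, "a") as f:   # "a": keep existing content, add to the end
    f.write("second\n")

try:
    open(path, "x")          # "x": refuses to overwrite an existing file
except FileExistsError:
    exclusive_failed = True
else:
    exclusive_failed = False

with open(path, "r") as f:   # "r": read-only; raises FileNotFoundError if missing
    text = f.read()

with open(path, "rb") as f:  # "b" combines with r/w/a/x and yields bytes
    raw = f.read()

print(repr(text))            # → 'first\nsecond\n'
print(exclusive_failed)      # → True
print(type(raw).__name__)    # → bytes
```

In text mode the default encoding can depend on the platform and locale, so passing encoding="utf-8" to open() explicitly is a common precaution.
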
2. Reading Strategies
Depending on the size of your file, you should choose different reading methods.
Method A: .read() (The Whole File)
Loads the entire file into memory as a single string.

    with open("data.txt", "r") as f:
        content = f.read()  # Don't do this for 10GB files!

Method B: .readline() (Line by Line)
Reads only the next line, including its trailing newline. At the end of the file it returns an empty string.

    with open("data.txt", "r") as f:
        line1 = f.readline()
        line2 = f.readline()

Method C: The Iterator (The Professional Way)
You can loop directly over the file object. This is memory efficient because Python reads the file in buffered chunks and hands your code only one line at a time, so the whole file is never held in memory.

    with open("huge_data.txt", "r") as f:
        for line in f:
            process(line.strip())  # process() stands in for your own handler

3. Writing & Appending
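When writing, Python handles the buffer automatically: .write() adds data to the buffer, and the data reaches the file when the buffer fills or the file is closed. Mode a keeps the existing content and adds to the end, whereas w would erase it first.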

    with open("output.log", "a") as f:
        f.write("New event logged\n")

4. Binary Mode (rb and wb)
When working with images, audio, or other binary data, you must use the b flag. In this mode, Python returns bytes objects instead of strings and performs no encoding or newline translation.

    # Read the source image as raw bytes
    with open("input.png", "rb") as source:
        data = source.read()

    # Write the same bytes to a new file
    with open("copy.png", "wb") as dest:
        dest.write(data)

5. Under the Hood: Buffered I/O
Disk operations are orders of magnitude slower than CPU operations. To reduce the number of trips to the disk, Python uses a buffer (a small region of RAM).
- When you call .write(), Python doesn’t go to the disk immediately. It puts the data in the buffer.
- When the buffer is full (usually 4 KB or 8 KB), it “flushes” the whole batch to the disk at once.
- The with statement closes the file when the block ends, even if an exception is raised inside it, and closing flushes the buffer first. A hard crash, such as the process being killed, can still lose data that was never flushed.

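
The buffering behaviour described above can be observed directly. The following is a minimal sketch, assuming CPython with default buffering and a regular file on disk; the file name buffered.log is hypothetical. It checks the file's size on disk before and after an explicit flush.

```python
import os
import tempfile

# Throwaway directory so the sketch never touches real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "buffered.log")  # hypothetical file name

with open(path, "w") as f:
    f.write("New event logged\n")

    # The text is still sitting in Python's buffer, so the file looks empty.
    size_before_flush = os.path.getsize(path)

    # flush() hands the buffered data to the operating system.
    f.flush()
    size_after_flush = os.path.getsize(path)

print(size_before_flush)     # 0 under the assumptions above
print(size_after_flush > 0)  # → True
```

Note that flush() only passes the data to the operating system, which may keep it in its own cache for a while. When the data must reach the physical disk, os.fsync(f.fileno()) can be called after flush().
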
6. Summary Table: Reading Methods
| Method | Returns | Best Use Case |
|---|---|---|
| .read(n) | String | Small files, or reading a specific number of characters (bytes in binary mode). |
| .readline() | String | Parsing specific headers or reading one line at a time. |
| .readlines() | List of strings | Small files where you need random access to lines. |
| for line in f | String | Large files (memory efficient). |
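
As a quick illustration of the table, the sketch below runs all four reading methods against the same small file. The file name sample.txt and its three-line content are assumptions made up for this example.

```python
import os
import tempfile

# Throwaway directory so the sketch never touches real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "sample.txt")  # hypothetical file name

with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "r") as f:
    first_three = f.read(3)      # up to 3 characters in text mode
    rest_of_line = f.readline()  # continues from the current position
    remaining = f.readlines()    # every remaining line, as a list

with open(path, "r") as f:
    # Iterating the file object yields one line at a time.
    stripped = [line.strip() for line in f]

print(first_three)         # → alp
print(repr(rest_of_line))  # → 'ha\n'
print(remaining)           # → ['beta\n', 'gamma\n']
print(stripped)            # → ['alpha', 'beta', 'gamma']
```

Each call picks up where the previous one stopped, which is why .readline() returns only the remainder of the first line here.
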