Reference Counting
In languages like C or C++, developers must manually allocate and deallocate memory. This is error-prone and can lead to memory leaks or crashes. Python (specifically CPython, the reference implementation) handles this automatically through a system called Reference Counting, supplemented by a cyclic garbage collector.
At its simplest, Python keeps a running tally of how many “labels” are attached to every object in memory. A label is usually a variable, but a slot in a container or an attribute of another object counts too. When no labels remain, the object is immediately destroyed.
1. What is a Reference?
Recall from our chapter on Variables that a variable is not a box, but a label pointing to an object. Each label that points to an object is a reference.
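The label idea can be checked directly. The snippet below is a small sketch (the names a, b, and c are only illustrative) showing that assigning one variable to another adds a second label to the same object rather than copying it:

```python
a = [1, 2, 3]   # one list object, one label
b = a           # a second label for the same object, not a copy

b.append(4)     # mutate the object through one label...
print(a)        # ...and the change is visible through the other
print(a is b)   # identity check: both names refer to one object

c = list(a)     # an explicit copy creates a separate object
print(c is a)   # so c is a different object with equal contents
```

Because a and b are two references to one object, the object's reference count is 2 at this point; c refers to a different object with its own count.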
The Mechanics
In CPython, every object (defined by the PyObject C struct) has a hidden field called ob_refcnt. This integer tracks the number of references currently pointing to that object.
2. Increasing and Decreasing Counts
Python manages the reference count automatically as your code executes.
How the Count Increases
- Assignment: a = [1, 2, 3] (Count becomes 1).
- Shared Reference: b = a (Count becomes 2).
- Argument Passing: Passing a into a function (Count increases inside the function’s scope).
- Container Storage: Adding a to a list or dictionary.
How the Count Decreases
- Scope Exit: A local variable goes out of scope when a function returns.
- Reassignment: a = 42 (The name a is moved to a new object; the old object’s count drops).
- Manual Deletion: Using the del statement (e.g., del a).
- Container Destruction: If a list containing an object is deleted, the count of that object drops.
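The four decrease rules can be observed with sys.getrefcount. The following is a sketch rather than a definitive test: all names (obj, alias, other, box, hold) are hypothetical, and it compares counts against a baseline instead of relying on absolute values, which can vary between interpreter versions.

```python
import sys

obj = object()                      # hypothetical object to track
baseline = sys.getrefcount(obj)

# Reassignment: the name `alias` is moved to a different object.
alias = obj
after_alias = sys.getrefcount(obj)
alias = 42
after_reassign = sys.getrefcount(obj)

# Manual deletion: `del` removes the name, which drops one reference.
other = obj
del other
after_del = sys.getrefcount(obj)

# Container destruction: deleting the list releases the references it held.
box = [obj, obj]
in_box = sys.getrefcount(obj)
del box
after_box = sys.getrefcount(obj)

# Scope exit: the function's local names disappear when it returns.
def hold(thing):
    local = thing                   # extra reference that lives only during the call
    return None

hold(obj)
after_call = sys.getrefcount(obj)

print("alias added:     ", after_alias - baseline)
print("after reassign:  ", after_reassign - baseline)
print("after del:       ", after_del - baseline)
print("inside list (x2):", in_box - baseline)
print("list deleted:    ", after_box - baseline)
print("after call:      ", after_call - baseline)
```

Each step that removes a reference brings the count back to the baseline, which is what the list above predicts.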
3. Visualizing Reference Counts
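You can inspect the current reference count of an object using sys.getrefcount() from the sys module. The reported value is one higher than you might expect, because passing the object to sys.getrefcount() creates a temporary reference of its own.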
import sys

# 1. Create a list object
x = [1, 2, 3]
print(f"Initial: {sys.getrefcount(x)}")

# 2. Add a reference
y = x
print(f"Shared: {sys.getrefcount(x)}")

# 3. Add to a container
data = [x]
print(f"In List: {sys.getrefcount(x)}")

Output:

Initial: 2
Shared: 3
In List: 4

4. The Critical Flaw: Reference Cycles
Reference counting is elegant and fast, but it has one fatal weakness: Circular References.
Imagine two objects that point to each other, but no variable in your program points to either of them.
class Node:
    def __init__(self):
        self.partner = None

a = Node()
b = Node()

# Create the cycle
a.partner = b
b.partner = a

# Delete our variables
del a
del b

The Problem
Even after del a and del b, the objects still exist in memory!
- Object A has a reference count of 1 (held by Object B).
- Object B has a reference count of 1 (held by Object A).
Since the counts are not zero, reference counting alone will never delete them. This is why Python includes a secondary system: the Generational Garbage Collector (GC), which specifically hunts for these isolated “islands” of objects.
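One way to watch the collector reclaim such an island is sketched below, using the standard gc and weakref modules. The probe name is hypothetical, and automatic collection is paused with gc.disable() only so that the demonstration is deterministic; this is an illustration, not how production code should manage the collector.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

gc.disable()                 # pause automatic collection for a predictable demo

a = Node()
b = Node()
a.partner = b                # create the cycle
b.partner = a

probe = weakref.ref(a)       # a weak reference does not keep the object alive
del a
del b

# Reference counting alone cannot free the pair: each still holds the other.
alive_before = probe() is not None

found = gc.collect()         # run the cyclic collector explicitly
alive_after = probe() is not None

gc.enable()                  # restore normal behaviour

print("alive after del:       ", alive_before)
print("alive after gc.collect:", alive_after)
print("unreachable objects found:", found)
```

The weak reference still resolves after both del statements and only goes dead once gc.collect() has run, which matches the description above. The exact number returned by gc.collect() depends on the interpreter version, so it is printed but not interpreted here.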
5. Under the Hood: Py_INCREF and Py_DECREF
If you look at the C source code for the Python interpreter, you will see two macros used everywhere: Py_INCREF(op) and Py_DECREF(op).
- Py_INCREF simply increments the ob_refcnt.
- Py_DECREF decrements the count and immediately checks if it has reached zero. If it has, it calls the object’s “destructor” (tp_dealloc) to free the memory.
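The behaviour of the two macros can be modelled in a few lines of Python. This is a toy simulation under stated assumptions, not CPython's actual implementation: ToyObject, toy_incref, toy_decref, and toy_dealloc are hypothetical names invented for illustration, and the real macros operate on C structs.

```python
class ToyObject:
    """Hypothetical stand-in for a PyObject; not CPython's real layout."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1       # a new object starts with one reference: its creator's
        self.freed = False


def toy_dealloc(op):
    """Plays the role of tp_dealloc: release the object's resources."""
    op.freed = True


def toy_incref(op):
    """Plays the role of Py_INCREF: one more reference exists."""
    op.refcnt += 1


def toy_decref(op):
    """Plays the role of Py_DECREF: drop a reference, free on zero."""
    op.refcnt -= 1
    if op.refcnt == 0:
        toy_dealloc(op)


obj = ToyObject("example")
toy_incref(obj)               # a second reference appears
toy_decref(obj)               # one reference goes away
state_with_one_ref = (obj.refcnt, obj.freed)
toy_decref(obj)               # the last reference goes away
state_with_no_refs = (obj.refcnt, obj.freed)

print(state_with_one_ref)
print(state_with_no_refs)
```

The point of the model is the check inside toy_decref: deallocation is triggered by the decrement that reaches zero, with no separate scanning pass.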
6. Context: Performance and Determinism
Why does Python use reference counting instead of relying solely on a tracing garbage collector, as Java does?
- Immediacy (Determinism): Objects are destroyed the instant their last reference disappears (reference cycles aside). This makes memory usage more predictable.
- Efficiency: For the vast majority of objects, no complex “stop-the-world” scanning is required.
- Simplicity: It is a very straightforward way to manage memory in a language where everything is an object.
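The immediacy point can be illustrated with a finalizer. The sketch below assumes CPython; the Resource class and the events list are hypothetical, and other Python implementations may run __del__ later than shown here.

```python
events = []                   # records the order in which things happen


class Resource:
    """Hypothetical resource that logs when it is destroyed."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(f"destroyed {self.name}")


r = Resource("tmp")
events.append("before del")
del r                         # last reference removed: the count reaches zero
events.append("after del")

print(events)
```

On CPython the "destroyed tmp" entry lands between the other two, because the object is torn down as part of the del statement rather than at some later collection pass.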