Understanding Shallow and Deep Copying in Python
When working with mutable objects in Python—such as lists, dictionaries, or NumPy arrays—it's essential to distinguish between shallow and deep copying to prevent unintended side effects.
Consider the following assignment:
self.sd_mx = self.adj_mx.copy()
This uses the .copy() method to create a new object rather than assigning by reference.
Reference Assignment vs. Copying
- Reference assignment:
    self.sd_mx = self.adj_mx
  Here, both variables point to the same underlying object. Any in-place modification to self.sd_mx will also alter self.adj_mx, because they share memory.
- Shallow copy:
    self.sd_mx = self.adj_mx.copy()
  This creates a new container object, but its contents still reference the same elements as the original. For flat structures containing immutable types (e.g., integers in a NumPy array), this effectively isolates modifications.
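The difference between the two cases can be checked directly. The following is a minimal sketch, assuming a small integer array stands in for adj_mx; the variable names are illustrative and not taken from the original class.

```python
import numpy as np

# Hypothetical stand-in for self.adj_mx
adj_mx = np.array([[0, 1], [1, 0]])

alias = adj_mx            # reference assignment: same object
sd_mx = adj_mx.copy()     # copy: new array with its own data buffer

alias[0, 0] = 9           # visible through adj_mx, since alias is adj_mx
sd_mx[1, 1] = 7           # stays local to sd_mx

print(alias is adj_mx)    # True
print(sd_mx is adj_mx)    # False
print(adj_mx.tolist())    # [[9, 1], [1, 0]]
print(sd_mx.tolist())     # [[0, 1], [1, 7]]
```

The write through alias changes adj_mx, while the write to sd_mx leaves adj_mx untouched.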
Practical Example with NumPy
import numpy as np

class MatrixHandler:
    def __init__(self, base_matrix):
        self.original = base_matrix
        self.working_copy = self.original.copy()

    def alter_working(self):
        self.working_copy[0, 0] = -1

# Initialize with a 2x2 array
base = np.array([[5, 6], [7, 8]])
handler = MatrixHandler(base)
handler.alter_working()

print("Original matrix:")
print(handler.original)
print("Working copy:")
print(handler.working_copy)
Output:
Original matrix:
[[5 6]
 [7 8]]
Working copy:
[[-1  6]
 [ 7  8]]
The change to working_copy does not affect original, confirming that .copy() produced an independent array.
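One way to confirm this without reading printed matrices is np.shares_memory. The sketch below repeats the MatrixHandler class so that it runs on its own. Note that self.original = base_matrix is itself a reference assignment, so handler.original and base remain the same object.

```python
import numpy as np

class MatrixHandler:
    def __init__(self, base_matrix):
        self.original = base_matrix                # alias of the caller's array
        self.working_copy = self.original.copy()   # independent data buffer

base = np.array([[5, 6], [7, 8]])
handler = MatrixHandler(base)

print(handler.original is base)                                   # True
print(np.shares_memory(handler.original, handler.working_copy))   # False

# Because original aliases base, a change made through base shows up in original
base[1, 1] = 0
print(handler.original[1, 1])       # 0
print(handler.working_copy[1, 1])   # 8
```

If the caller's array should be protected as well, copying it in __init__ (self.original = base_matrix.copy()) is one option.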
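For nested structures (e.g., a list of lists), a shallow copy still shares the inner objects, so full independence requires copy.deepcopy(). A NumPy array with a numeric dtype stores its values in a single data buffer, and .copy() duplicates that buffer, so the copy is fully independent of the original. The exception is an array with dtype=object: its elements are references to Python objects, .copy() duplicates only those references, and copy.deepcopy() is needed to duplicate the objects themselves.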
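The contrast can be seen in a short sketch. It assumes a small list of lists, a numeric array, and, as a separate case not covered by the example above, a NumPy array with dtype=object whose elements are references to Python lists. All names are illustrative.

```python
import copy
import numpy as np

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)       # new outer list, same inner lists
deep = copy.deepcopy(nested)      # inner lists are duplicated too

shallow[0].append(99)             # mutates an inner list shared with nested
deep[1].append(-1)                # only affects the deep copy

print(nested)                     # [[1, 2, 99], [3, 4]]
print(deep)                       # [[1, 2], [3, 4, -1]]

# Numeric array: .copy() duplicates the data buffer
nums = np.array([[1, 2], [3, 4]])
nums_copy = nums.copy()
nums_copy[0, 0] = 0
print(nums[0, 0])                 # 1

# Object array: .copy() duplicates only the references
objs = np.empty(2, dtype=object)
objs[0] = [1, 2]
objs[1] = [3]
objs_copy = objs.copy()
objs_copy[0].append(99)
print(objs[0])                    # [1, 2, 99]

# deepcopy duplicates the referenced lists as well
objs_deep = copy.deepcopy(objs)
objs_deep[1].append(-1)
print(objs[1])                    # [3]
```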