Understanding Object References, Shallow Copies, and Deep Copies in Python
Core Concepts
In Python, understanding how data is duplicated requires a clear distinction between the container and the elements it holds. There are three primary ways to "copy" data:
- Object Assignment: This creates a new reference to the same memory address. It is essentially an alias.
- Shallow Copy: Utilizing the
copy.copy()function, this creates a new top-level object but populates it with references to the existing objects found in the original. - Deep Copy: Utilizing
copy.deepcopy(), this recursively creates new objects for every element and sub-element, resulting in a fully independent duplicate.
Fundamental Terminology
- Variable: A name or label used to access an object.
- Object: A block of memory allocated to store data (e.g., integers, strings, or lists).
- Reference: The pointer that links a variable name to a specific memory address.
Shallow Copy Mechanism
A shallow copy constructs a new collection object and then populates it with references to the child objects found in the original. If the elements inside the collection are mutable, changes to those elements will be reflected across both the original and the copy.
import copy
# Original nested list
primary_data = [[10, 20], [30, 40]]
# Creating a shallow copy
shallow_replica = copy.copy(primary_data)
# Modifying a mutable child element within the original list
primary_data[0][0] = 999
print(f"Original: {primary_data}, ID: {id(primary_data)}")
print(f"Shallow Copy: {shallow_replica}, ID: {id(shallow_replica)}")
print(f"Child element ID match: {id(primary_data[0]) == id(shallow_replica[0])}")
Analysis:
- The memory adress of
primary_datadiffers fromshallow_replicabecause a new list container was created. - However, because the child elements are nested lists (mutable), modifying
primary_data[0]affectsshallow_replica[0]as they both point to the same internal memory address.
Deep Copy Mechanism
A deep copy is a recursive process. It first creates a new collection object and then recursively inserts copies of the objects found in the original. This ensures that the new object is entirely independent of the original.
import copy
# Original nested structure
base_config = {"values": [1, 2, 3], "active": True}
# Creating a deep copy
independent_copy = copy.deepcopy(base_config)
# Modifying the original nested list
base_config["values"][0] = 500
print(f"Base Config: {base_config}")
print(f"Independent Copy: {independent_copy}")
print(f"Nested list ID match: {id(base_config['values']) == id(independent_copy['values'])}")
Analysis:
copy.deepcopy()creates a new memory address for the dictionary and a new memory address for the list inside the dictionary.- Changes to the nested list in
base_configdo not impactindependent_copybecuase they are physically separate objects in memory.
Summary of Differences
| Feature | Assignment | Shallow Copy | Deep Copy |
|---|---|---|---|
| Memory Address | Same as original | Different from original | Different from original |
| Nested Mutable Objects | Shared | Shared | Duplicated (Independent) |
| Impact of Change | Affects both | Afffects both (for nested data) | Completely isolated |
Technical Implications
- Immutable Objects: For immutable types like integers, strings, or tuples, there is no functional difference between a shallow copy and a deep copy because the data cannot be modified in place.
- Performance: Deep copies are computationally more expensive and consume more memory than shallow copies because they involve recursive duplication of the entire object tree.
- Isolation: Deep copies are essential when you need to transform data for an experiment or temporary state without corrupting the source data structure.