Getting Started with Python NumPy Arrays
Purpose and Performance Advantages
NumPy (Numerical Python) is an open-source library for working with arrays, with additional routines for linear algebra, Fourier transforms, and matrices. Python's built-in lists can hold numbers, but they become inefficient for large-scale numerical processing. NumPy addresses this with its array object, ndarray, which is often cited as being up to 50 times faster than standard Python lists for such workloads; the actual gain depends on the operation and the data size.
This speed boost comes largely from storing elements of a single data type in contiguous memory blocks, which allows rapid access and manipulation. In addition, NumPy's core routines are implemented in compiled code and tuned for modern CPU architectures. Beyond scientific computing, NumPy serves as a versatile container for general multidimensional data and supports custom data types, which helps when integrating with databases.
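As a rough illustration of the points above, the sketch below checks that an array is stored contiguously and times a list comprehension against the equivalent vectorized operation. The array size and repetition count are arbitrary choices for this example, and measured timings depend on the machine, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
values_list = list(range(n))
values_array = np.arange(n)

# A freshly created array occupies one contiguous block of memory.
print(values_array.flags["C_CONTIGUOUS"])  # True
# Every element has the same fixed size, so the total is itemsize * n.
print(values_array.nbytes == values_array.itemsize * n)  # True

# Double every element: Python-level loop versus one vectorized operation.
list_time = timeit.timeit(lambda: [v * 2 for v in values_list], number=20)
array_time = timeit.timeit(lambda: values_array * 2, number=20)
print(f"list: {list_time:.4f}s, array: {array_time:.4f}s")

# Both approaches produce the same numbers.
doubled_list = [v * 2 for v in values_list]
doubled_array = values_array * 2
print(doubled_array.tolist() == doubled_list)  # True
```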
Setting Up the Environment
To begin, ensure Python and pip are installed on your system. Install NumPy using pip:
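pip install numpy
On certain Linux distributions such as Ubuntu 24.04, pip may refuse to install packages into the system-wide Python environment (typically reported as an "externally-managed-environment" error) or fail with permission errors. In such cases, use the system package manager:
sudo apt-get install python3-numpy -y
On Fedora-based systems, the pip method generally works as-is.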
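Whichever installation route is used, a quick way to confirm that it worked is to import the package and print its version. This is a minimal check, and the version string printed depends on what was installed.

```python
import numpy

# Print the installed version to confirm that the import succeeds.
print(numpy.__version__)
```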
Creating Basic Arrays
Import the library and assign it the standard alias np to make it available in your script:
import numpy as np
Construct a one-dimensional array using the array function:
data_points = np.array([10, 20, 30, 40, 50, 60, 70, 80])
print(data_points)
Executing this script will output:
[10 20 30 40 50 60 70 80]
Reshaping and Copying Arrays
When duplicating arrays, direct assignment (new_array = old_array) does not copy any data; it only creates a second reference to the same array. Because both names refer to one object, a change made through either name is visible through the other. To create an independent duplicate, use np.copy().
The np.copy() function takes the target array as its primary argument, with optional order (controlling memory layout) and subok (dictating subclass preservation) parameters.
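The sketch below illustrates the order parameter and confirms that a copy is independent of its source. The array and variable names are made up for this example, and subok is left at its default value.

```python
import numpy as np

original = np.arange(6).reshape(2, 3)  # hypothetical example array

# order="F" requests a column-major (Fortran-style) layout for the copy.
fortran_copy = np.copy(original, order="F")
print(fortran_copy.flags["F_CONTIGUOUS"])  # True

# The copy holds the same values but does not share memory with the original.
print(np.array_equal(original, fortran_copy))    # True
print(np.shares_memory(original, fortran_copy))  # False

# Plain assignment, by contrast, binds a second name to the same object.
alias = original
print(alias is original)  # True
```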
First, generate a two-dimensional array using arange and reshape. The number of elements must equal the product of the target dimensions; for instance, a 3x2 matrix requires exactly 6 elements:
import numpy as np
source_matrix = np.arange(start=5, stop=11).reshape(3, 2)
reference_copy = source_matrix
deep_copy = np.copy(source_matrix)
print("Original Matrix:")
print(source_matrix)
# Change a value in the original
source_matrix[1, 0] = 999
print("\nAfter modifying source_matrix[1, 0]:")
print("Source Matrix:")
print(source_matrix)
print("Reference Copy (affected):")
print(reference_copy)
print("Deep Copy (unaffected):")
print(deep_copy)
Running the above code demonstrates that reference_copy reflects the change made to source_matrix because they point to the same memory location, while deep_copy retains its original state.
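The element-count rule for reshape mentioned before the example can be checked directly. The following sketch uses the same six values as the 3x2 matrix above; the variable name is an assumption for illustration.

```python
import numpy as np

six_values = np.arange(5, 11)  # 6 elements: 5, 6, 7, 8, 9, 10
print(six_values.reshape(3, 2).shape)  # (3, 2)

# Passing -1 lets NumPy infer one dimension from the total element count.
print(six_values.reshape(-1, 3).shape)  # (2, 3)

# 6 elements cannot fill a 4x2 grid (8 slots), so reshape raises ValueError.
try:
    six_values.reshape(4, 2)
except ValueError as error:
    print("reshape failed:", error)
```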