Essential NumPy Operations for Data Analysis
NumPy, short for Numerical Python, is a foundational library for scientific computing in Python. Many data analysis packages, including pandas, are built on top of NumPy.
At its core, NumPy provides the ndarray (N-dimensional array) data structure. An ndarray resembles a Python list, but every element shares a single data type and the values are stored in a compact block of memory that compiled C code operates on, which makes large-scale mathematical operations far more efficient.
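To make the single-data-type point concrete, here is a minimal sketch (the variable names are illustrative and not part of the original text). It shows a list keeping each element's own type while np.array() settles on one common dtype:

```python
import numpy as np

# A Python list can hold values of different types side by side.
mixed_list = [1, 2.5, 3]
list_types = [type(x).__name__ for x in mixed_list]
print(list_types)  # ['int', 'float', 'int']

# np.array() converts every element to one common dtype (here: float64).
mixed_arr = np.array(mixed_list)
print(mixed_arr.dtype)  # float64
```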
Installation and Import
Install NumPy using pip:
pip install numpy
Import the library with the conventional alias:
import numpy as np
Creating Arrays
Manual Array Creation
Use np.array() to create arrays from sequences:
# One-dimensional array
vector = np.array([4, 5, 6])
# Two-dimensional array
matrix = np.array([[1, 2, 3], [7, 8, 9]])
Specialized Array Creation Methods
Zero Arrays:
# 1D zero array
zero_vec = np.zeros(5)
print(zero_vec) # Output: [0. 0. 0. 0. 0.]
# 2D zero array
zero_mat = np.zeros((2, 3))
print(zero_mat)
# Output:
# [[0. 0. 0.]
# [0. 0. 0.]]
One Arrays:
# 1D array of ones
ones_vec = np.ones(4)
print(ones_vec) # Output: [1. 1. 1. 1.]
# 2D array of ones
ones_mat = np.ones((3, 2))
print(ones_mat)
# Output:
# [[1. 1.]
# [1. 1.]
# [1. 1.]]
Empty Arrays:
# Uninitialized 1D array (contents are arbitrary until assigned)
empty_arr = np.empty(3)
# Uninitialized 2D array (contents are arbitrary until assigned)
empty_mat = np.empty((2, 2))
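Because np.empty() skips initialization, whatever happens to be in memory shows up as the array's values. One common pattern, sketched here with an illustrative variable name, is to allocate first and then overwrite every element before reading any of them:

```python
import numpy as np

# np.empty() only reserves memory; the initial contents are arbitrary.
buffer = np.empty(3)

# Overwrite every slot before reading from the array.
for i in range(buffer.size):
    buffer[i] = i * 2.0

print(buffer)  # [0. 2. 4.]
```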
Range Arrays:
# Array from 0 to 9
seq_arr = np.arange(10)
print(seq_arr) # Output: [0 1 2 3 4 5 6 7 8 9]
# Array from 5 to 15 with step 3
step_arr = np.arange(5, 16, 3)
print(step_arr) # Output: [ 5  8 11 14]
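One detail worth checking when using np.arange() is that the stop value is never included, even when the step would land exactly on it. A small sketch with illustrative variable names:

```python
import numpy as np

# The stop value is exclusive: 14 is reachable by the step but is left out.
exclusive = np.arange(5, 14, 3)
print(exclusive.tolist())  # [5, 8, 11]

# A negative step counts down; the stop value is again excluded.
countdown = np.arange(10, 0, -2)
print(countdown.tolist())  # [10, 8, 6, 4, 2]
```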
Random Arrays:
# 2x3 array of samples drawn from the standard normal distribution
rand_mat = np.random.randn(2, 3)
print(rand_mat)
# Output example:
# [[-0.234 1.456 -0.789]
# [ 0.123 -0.456 0.890]]
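Random output differs on every run, which can make results hard to reproduce. The sketch below uses a seeded generator instead. It assumes NumPy 1.17 or newer, where np.random.default_rng() is available, and the seed value 42 is arbitrary:

```python
import numpy as np

# Assumption: NumPy 1.17 or newer, where np.random.default_rng() is available.
# The seed value 42 is arbitrary and chosen only for illustration.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Two generators with the same seed produce the same samples.
sample_a = rng_a.standard_normal((2, 3))
sample_b = rng_b.standard_normal((2, 3))

print(sample_a.shape)                      # (2, 3)
print(np.array_equal(sample_a, sample_b))  # True
```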
Accessing Array Elements
Numeric Indexing
Access elements using zero-based indexing:
arr = np.array([10, 20, 30, 40])
print(arr[2]) # Output: 30
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat[1][2]) # Output: 6
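For multi-dimensional arrays, NumPy also accepts a comma-separated index, which does the lookup in one step rather than indexing a row first. A brief sketch comparing the two forms:

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])

# Chained indexing: mat[1] selects a row first, then [2] indexes into it.
chained = mat[1][2]

# Tuple indexing: a single lookup with (row, column).
direct = mat[1, 2]

print(chained == direct)  # True
print(mat[-1, -1])        # 6 (negative indices count from the end)
```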
Slicing
Extract subarrays with slice notation:
arr = np.array([0, 1, 2, 3, 4, 5])
print(arr[2:5]) # Output: [2 3 4]
mat = np.array([[10, 20, 30], [40, 50, 60]])
print(mat[0:2, 1:3])
# Output:
# [[20 30]
# [50 60]]
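Unlike list slices, a basic NumPy slice is a view onto the original data rather than a copy, so writing to the slice changes the source array. The following sketch (variable names are illustrative) shows the effect and how .copy() avoids it:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5])

# A basic slice is a view: it shares memory with the original array.
view = arr[2:5]
view[0] = 99
print(arr.tolist())  # [0, 1, 99, 3, 4, 5]

# .copy() gives an independent array.
independent = arr[2:5].copy()
independent[1] = -1
print(arr.tolist())  # [0, 1, 99, 3, 4, 5] (unchanged by the copy)
```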
Boolean Indexing
Filter arrays using boolean conditions:
values = np.array([5, 15, 25, 35])
mask = np.array([True, False, True, False])
print(values[mask]) # Output: [ 5 25]
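A boolean mask can also be used on the left-hand side of an assignment, and summing it counts the selected elements. A short sketch reusing the arrays from the example above:

```python
import numpy as np

values = np.array([5, 15, 25, 35])
mask = np.array([True, False, True, False])

# Count the selected elements: True counts as 1 when summed.
print(int(mask.sum()))  # 2

# Assign through the mask to overwrite only the selected positions.
values[mask] = 0
print(values.tolist())  # [0, 15, 0, 35]
```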
Vectorization and Broadcasting
Vectorization allows element-wise operations without explicit loops:
arr = np.array([[1, 2], [3, 4]])
print(arr + 10)
# Output:
# [[11 12]
# [13 14]]
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1 * arr2) # Output: [ 4 10 18]
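To see what vectorization replaces, the sketch below computes the same element-wise product twice, once with an explicit Python loop and once with a single array expression (the names looped and vectorized are illustrative):

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Explicit Python loop over paired elements.
looped = [a * b for a, b in zip(arr1.tolist(), arr2.tolist())]

# Vectorized form: one expression, executed in compiled code.
vectorized = arr1 * arr2

print(looped)               # [4, 10, 18]
print(vectorized.tolist())  # [4, 10, 18]
```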
Broadcasting lets NumPy combine arrays of different but compatible shapes by virtually stretching dimensions of size 1 (or missing leading dimensions) to match:
vec = np.array([1, 2, 3])
mat = np.array([[10, 20, 30], [40, 50, 60]])
print(vec + mat)
# Output:
# [[11 22 33]
# [41 52 63]]
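Broadcasting compares shapes from the rightmost dimension backwards; two dimensions are compatible when they are equal or when one of them is 1. The sketch below, with illustrative names, shows one compatible pairing and one that fails:

```python
import numpy as np

row = np.array([1, 2, 3])     # shape (3,)
col = np.array([[10], [20]])  # shape (2, 1)

# Shapes are compared right to left; a dimension of size 1 is stretched.
# (2, 1) combined with (3,) gives (2, 3).
grid = col + row
print(grid.shape)     # (2, 3)
print(grid.tolist())  # [[11, 12, 13], [21, 22, 23]]

# Incompatible trailing dimensions (3 vs 2) raise a ValueError.
try:
    row + np.array([1, 2])
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```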
Common Array Methods and Properties
Array Attributes
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim) # Dimensions: 2
print(arr.shape) # Shape: (2, 3)
print(arr.size) # Total elements: 6
print(arr.dtype) # Data type: platform-dependent, e.g. int64 (int32 on Windows with NumPy 1.x)
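Since the default integer type depends on the platform, it can help to request a dtype explicitly when the width matters. A minimal sketch, with illustrative variable names:

```python
import numpy as np

# Request a specific integer width instead of relying on the platform default.
small = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(small.dtype)     # int32
print(small.itemsize)  # 4 (bytes per element)

# astype() returns a converted copy; the original is left untouched.
as_float = small.astype(np.float64)
print(as_float.dtype)  # float64
print(small.dtype)     # int32
```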
Statistical Operations
arr = np.array([10, 20, 30, 40])
print(arr.max()) # Maximum: 40
print(arr.min()) # Minimum: 10
print(arr.mean()) # Mean: 25.0
print(arr.sum()) # Sum: 100
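On multi-dimensional arrays, these methods accept an axis argument that selects which dimension to reduce over. A short sketch on a 2x3 array:

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, giving one result per column.
print(mat.sum(axis=0).tolist())   # [5, 7, 9]

# axis=1 collapses the columns, giving one result per row.
print(mat.mean(axis=1).tolist())  # [2.0, 5.0]

# No axis: reduce over every element.
print(int(mat.max()))  # 6
```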
Sorting
NumPy's np.sort() returns a sorted copy and leaves the original array unchanged; Python's built-in sorted() also accepts arrays but returns a list:
arr = np.array([30, 10, 40, 20])
# NumPy sort (returns new array)
sorted_np = np.sort(arr)
print(sorted_np) # Output: [10 20 30 40]
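# Python built-in sorted() returns a plain list, not an array;
# tolist() converts the elements to Python ints first
sorted_py = sorted(arr.tolist(), reverse=True)
print(sorted_py) # Output: [40, 30, 20, 10]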
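For completeness, the sketch below covers two related tools: the in-place ndarray.sort() method and np.argsort(), which returns the ordering rather than the sorted values. It also shows a slice-based way to get descending order while staying in NumPy:

```python
import numpy as np

arr = np.array([30, 10, 40, 20])

# argsort() returns the indices that would sort the array.
order = np.argsort(arr)
print(order.tolist())  # [1, 3, 0, 2]

# ndarray.sort() sorts in place and returns None.
result = arr.sort()
print(result)          # None
print(arr.tolist())    # [10, 20, 30, 40]

# Reverse a sorted array with a slice to get descending order.
print(arr[::-1].tolist())  # [40, 30, 20, 10]
```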
Conditional Filtering
Combine boolean indexing with comparison operators:
arr = np.array([5, 15, 25, 35])
# Single condition
print(arr[arr > 20]) # Output: [25 35]
# Multiple conditions
print(arr[(arr > 10) & (arr < 30)]) # Output: [15 25]
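Conditions can also be combined with | (or) and inverted with ~ (not); the parentheses around each comparison are required because these operators bind more tightly than comparisons. The sketch below adds np.where() as one way to replace values instead of filtering them out:

```python
import numpy as np

arr = np.array([5, 15, 25, 35])

# OR: keep elements outside the 10..30 range.
print(arr[(arr < 10) | (arr > 30)].tolist())  # [5, 35]

# NOT: invert a condition with ~.
print(arr[~(arr > 20)].tolist())  # [5, 15]

# np.where(condition, a, b) picks from a where True, else from b.
capped = np.where(arr > 20, 20, arr)
print(capped.tolist())  # [5, 15, 20, 20]
```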
Transposition
Swap array axes using the .T attribute:
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat.T)
# Output:
# [[1 4]
# [2 5]
# [3 6]]
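Two properties of .T are easy to overlook: it returns a view rather than a copy, and it leaves a one-dimensional array unchanged. A brief sketch with illustrative names:

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])
transposed = mat.T
print(mat.shape, transposed.shape)  # (2, 3) (3, 2)

# .T is a view: writing through it changes the original.
transposed[0, 1] = 40
print(mat.tolist())  # [[1, 2, 3], [40, 5, 6]]

# For a 1D array, .T has no effect.
vec = np.array([1, 2, 3])
print(vec.T.shape)  # (3,)
```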