Techniques for Iterating Over Pandas DataFrames
Here are several common methods for iterating over Pandas DataFrames:
- Iterator Methods (items, iterrows, itertuples): Iterate through all elements row-by-row or column-by-column. Best suited for element-level operations.
- Simple columns and index Traversal: Iterate over each row or column. Best suited for aggregate operations like summation or discretization.
- for Loop with zip: Efficient method best suited for iterating over a specific, small subset of rows or columns.
Iterator Methods: items, iterrows, itertuples
items(): Iterate over columns as (column_name, Series) pairs
Function Signature: DataFrame.items() (the older iteritems() alias was removed in pandas 2.0)
Returns: An iterator that yields tuples of (column_label, column_content_series).
import pandas as pd
# Example: Employee salary data
salary_info = {'Department': ['HR', 'IT', 'Finance'], 'Headcount': [5, 10, 8]}
employee_df = pd.DataFrame(salary_info)
for col_label, col_data in employee_df.items():
    print(f"Label: {col_label}")
    print(f"Data:\n{col_data}\n")
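Output (dtype lines may differ by pandas version and platform):
Label: Department
Data:
0         HR
1         IT
2    Finance
Name: Department, dtype: object

Label: Headcount
Data:
0     5
1    10
2     8
Name: Headcount, dtype: int64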
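Because each pair carries the whole column as a Series, items() lends itself to per-column summaries. The following is a minimal sketch rather than part of the original example: the frame is rebuilt so the block runs on its own, and the unique_counts name is hypothetical.

```python
import pandas as pd

# Rebuild the example frame so this block is self-contained
employee_df = pd.DataFrame(
    {'Department': ['HR', 'IT', 'Finance'], 'Headcount': [5, 10, 8]}
)

# Collect one summary value per column: the number of distinct entries
unique_counts = {}
for col_label, col_data in employee_df.items():
    unique_counts[col_label] = col_data.nunique()

print(unique_counts)
```

Any Series method could stand in for nunique() here; the point is only that col_data is a full Series, not a single value.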
iterrows(): Iterate over rows as (row_index, Series) pairs
Function Signature: DataFrame.iterrows()
Returns: An iterator yielding (row_index, row_data_series).
import pandas as pd
# Example: Temperature readings
readings = {'City': ['New York', 'Los Angeles'], 'Temp_C': [15, 25]}
weather_df = pd.DataFrame(readings)
for idx, row_series in weather_df.iterrows():
    # Logic to process each row
    print(f"Current Index: {idx}")
    print(f"Row Data:\n{row_series}\n")
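Output:
Current Index: 0
Row Data:
City      New York
Temp_C          15
Name: 0, dtype: object

Current Index: 1
Row Data:
City      Los Angeles
Temp_C             25
Name: 1, dtype: object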
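One caveat of iterrows() is worth a short sketch: because each row is packed into a single Series, values from columns with different dtypes are upcast to a common dtype. The example below is illustrative only; mixed_df and its column names are hypothetical, and itertuples() is shown alongside for contrast.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
mixed_df = pd.DataFrame({'units': [1, 2], 'ratio': [0.5, 1.5]})

# iterrows() builds one Series per row, so the integer is upcast to float
_, first_row = next(mixed_df.iterrows())
print(first_row['units'])  # prints 1.0, not 1
print(first_row.dtype)     # prints float64

# itertuples() keeps each column's value as it is; the integer stays an integer
first_tuple = next(mixed_df.itertuples(index=False))
print(first_tuple[0])      # prints 1
```

If the original types matter, itertuples() or column-wise access is usually the safer choice.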
itertuples(): Iterate over rows as named tuples
Function Signature: DataFrame.itertuples(index=True, name='Pandas')
index: If True, the index is included as the first element of the tuple.
name: The class name for the named tuple; if None, a standard tuple is returned.
Returns: An iterator yielding named tuples for each row.
import pandas as pd
# Example: Product prices
product_data = {'Item': ['Apple', 'Banana'], 'Price': [0.5, 0.3]}
price_df = pd.DataFrame(product_data)
### 1. Default behavior
print("--- Default ---")
for record in price_df.itertuples():
    print(record)
''' Output
Pandas(Index=0, Item='Apple', Price=0.5)
Pandas(Index=1, Item='Banana', Price=0.3)
'''
### 2. Exclude index from the tuple
print("\n--- Without Index ---")
for record in price_df.itertuples(index=False):
    print(record)
''' Output
Pandas(Item='Apple', Price=0.5)
Pandas(Item='Banana', Price=0.3)
'''
### 3. Return as standard tuple
print("\n--- Standard Tuple ---")
for record in price_df.itertuples(name=None):
    print(record)
''' Output
(0, 'Apple', 0.5)
(1, 'Banana', 0.3)
'''
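The examples above print whole tuples; in practice the fields of a named tuple are usually read as attributes named after the columns. A minimal sketch, assuming the same two-column frame, with 'Product' as an arbitrary class name and labels as a hypothetical variable:

```python
import pandas as pd

price_df = pd.DataFrame({'Item': ['Apple', 'Banana'], 'Price': [0.5, 0.3]})

# name='Product' is an arbitrary, illustrative class name for the tuples
labels = []
for record in price_df.itertuples(index=False, name='Product'):
    # Fields are read as attributes named after the columns
    labels.append(f"{record.Item}: {record.Price:.2f}")

print(labels)
# ['Apple: 0.50', 'Banana: 0.30']
```

Attribute access only works when name is not None; with name=None the values must be read by position.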
Columns and Index Attributes
columns: Accessing DataFrame column labels
Returns: <class 'pandas.core.indexes.base.Index'>. This provides access to the column labels and to the dtype of the label index itself (per-column data types are available through df.dtypes), but it cannot be used to manipulate the data directly.
To retrieve the labels as a NumPy array, use df.columns.values. To get a standard Python list, use .tolist().
import pandas as pd
# Example: Student grades
grades = {'Math': [90, 85], 'Science': [88, 92], 'History': [78, 80]}
grades_df = pd.DataFrame(grades)
print('Column object: ', grades_df.columns)
'''
Output:
Column object: Index(['Math', 'Science', 'History'], dtype='object')
'''
print('Column array: ', grades_df.columns.values) # Array type
'''
Output:
Column array: ['Math' 'Science' 'History']
'''
print('Column list: ', grades_df.columns.tolist()) # Same as list(grades_df.columns)
print('Column list (alt): ', list(grades_df))
print('Column list (alt 2): ', list(grades_df.columns))
'''
All three print the same list after their labels:
['Math', 'Science', 'History']
'''
index: DataFrame index labels
Similar to columns, this retrieves the row index labels.
When performing operations on every row or column (e.g., summing), these attributes are useful. However, df.apply(function, axis=0) is often preferred: axis=0 (the default) applies the function to each column, and axis=1 applies it to each row.
import pandas as pd
# Example: Inventory count
inventory = {'Laptops': [10, 5], 'Mice': [50, 20], 'Keyboards': [30, 15]}
stock_df = pd.DataFrame(inventory)
# Iterate over column names
print("Column Names:")
for col in stock_df.columns:
    print(col)
''' Output
Column Names:
Laptops
Mice
Keyboards
'''
# Iterate over row indices
print("\nRow Indices:")
for row_idx in stock_df.index:
    print(row_idx)
''' Output
Row Indices:
0
1
'''
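To make the comparison with apply concrete, the sketch below sums the same inventory frame three ways: by looping over columns, with apply along axis=0, and with apply along axis=1. The variable names (manual_totals, column_totals, row_totals) are hypothetical and the lambdas are just one way to pass a function.

```python
import pandas as pd

stock_df = pd.DataFrame(
    {'Laptops': [10, 5], 'Mice': [50, 20], 'Keyboards': [30, 15]}
)

# Manual traversal: one sum per column label
manual_totals = {col: int(stock_df[col].sum()) for col in stock_df.columns}

# apply with axis=0 passes each column to the function
column_totals = stock_df.apply(lambda col: col.sum(), axis=0)

# apply with axis=1 passes each row to the function
row_totals = stock_df.apply(lambda row: row.sum(), axis=1)

print(manual_totals)            # {'Laptops': 15, 'Mice': 70, 'Keyboards': 45}
print(column_totals.to_dict())  # same totals as the manual loop
print(row_totals.tolist())      # [90, 40]
```

For a plain sum, stock_df.sum() or stock_df.sum(axis=1) would be shorter still; apply becomes more useful when the per-column or per-row function is custom.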
for Loop with zip: Iterating specific columns or rows
This method is simple and typically fast, because zip walks the column values directly and does not build a new Series for every row the way iterrows() does. It is best suited when you need to iterate over a specific, small subset of columns or rows. It can become cumbersome and error-prone if used for many columns or rows.
import pandas as pd
# Example: Coordinates
coordinates = {'x': [1, 2], 'y': [4, 5], 'z': [7, 8]}
coord_df = pd.DataFrame(coordinates)
# Iterating over specific columns
for val_x, val_y in zip(coord_df['x'], coord_df['y']):
    print(f'x: {val_x}, y: {val_y}')
''' Output
x: 1, y: 4
x: 2, y: 5
'''
# Iterating over specific rows
# For label-based indexing, use df.loc['RowName'] instead
for row_data_0, row_data_1 in zip(coord_df.iloc[0], coord_df.iloc[1]):
    print(f'Row 0 value: {row_data_0}, Row 1 value: {row_data_1}')
''' Output
Row 0 value: 1, Row 1 value: 2
Row 0 value: 4, Row 1 value: 5
Row 0 value: 7, Row 1 value: 8
'''
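A common use of the zip pattern is to build a derived list from two columns. The sketch below is illustrative, with hypothetical variable names; it also shows the equivalent vectorized expression, which is usually preferable when the operation can be written that way.

```python
import pandas as pd

coord_df = pd.DataFrame({'x': [1, 2], 'y': [4, 5], 'z': [7, 8]})

# zip walks the two columns in lockstep, yielding one (x, y) pair per row
products = [val_x * val_y for val_x, val_y in zip(coord_df['x'], coord_df['y'])]

# The equivalent vectorized expression, shown for comparison
vectorized = (coord_df['x'] * coord_df['y']).tolist()

print(products)    # [4, 10]
print(vectorized)  # [4, 10]
```

The zip loop earns its place when the per-row logic involves branching or calls that have no vectorized counterpart.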