Home > Tech > Content

Mastering Pandas DataFrame Operations

Tech May 17 16

Iterating Over DataFrames

Processing data row by row or column by column is a common task, though vectorization is preferred for performance. When iteration is necessary, Pandas offers several methods.

Row Iteration

The iterrows() method yields each row index as a key and the row data as a Series. This allows access to values via column labels.

for identifier, row_data in dataset.iterrows():
    print(row_data['id'], row_data['value'])

For significantly better performance, itertuples() converts rows into named tuples. This approach is faster and attribute access is cleaner.

for row in dataset.itertuples(index=True, name='Row'):
    print(getattr(row, 'id'), getattr(row, 'value'))

Column Iteration

To iterate through columns, use items() (or iteritems() in older versions). This returns the column label and the content as a Series.

for label, content in dataset.items():
    print(f"Column: {label}")
    print(content.values)

Data Filtering Techniques

Subset Selection with `filter()`

The filter() method is versatile for selecting specific rows or columns based on names, patterns, or regular expressions.

items: Specific list of labels to keep.
like: Keeps labels containing a specific substring.
regex: Keeps labels matching a regular expression pattern.
axis: The target axis (0 for 'index', 1 for 'columns').

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([[10, 20, 30], [40, 50, 60]]),
    index=['row_x', 'row_y'],
    columns=['col_a', 'col_b', 'col_c']
)

# Select specific columns
selected = df.filter(items=['col_a', 'col_c'])

# Select columns via regex (e.g., ending with 'c')
regex_cols = df.filter(regex='c$', axis=1)

# Select rows containing 'y'
filtered_rows = df.filter(like='y', axis=0)

Boolean Indexing

Direct filtering can be achieved by passing a boolean series derived from conditional logic.

# Filter rows where column value is greater than 25
filtered_df = original_df[original_df['quantity'] > 25]

Essential DataFrame Attributes

Understanding the fundamental attributes of a DataFrame is crucial for debugging and data inspection.

Attribute	Description
T	Transposes rows and columns.
axes	Returns a list of row and column axis labels.
dtypes	Returns the data type of each column.
empty	Returns `True` if the DataFrame is entirely empty.
ndim	The number of axes (dimensionality).
shape	A tuple representing the dimensionality (rows, columns).
size	Total number of elements in the DataFrame.
values	Returns the DataFrame data as a NumPy array.
head(n)	Returns the first `n` rows.
tail(n)	Returns the last `n` rows.
shift(periods)	Shifts the index by the desired number of periods.

Converting DataFrames to Dictionaries

The to_dict() function transforms DataFrame data into various Python dictionary formats. This is useful for serialization or API responses.

The structure of the output is controlled by the orient parameter.

'dict' (default): {column_label: {index_label: value}}
'list': {column_label: [values]}
'series': {column_label: Series(values)}
'split': {'index': [indices], 'columns': [cols], 'data': [values]}
'records': [{column_label: value}, ...]
'index': {index_label: {column_label: value}}

Mapping External Values

A common requirement is enriching a DataFrame by mapping values in a column to values from an external dictionary. The map() function is ideal for this operation.

import pandas as pd

status_lookup = {'open': 1, 'closed': 0, 'pending': 2}

df = pd.DataFrame({'order_status': ['open', 'pending', 'closed', 'open'], 'amount': [100, 200, 150, 50]})

# Create a new column by mapping status strings to integers
df['status_code'] = df['order_status'].map(status_lookup)

Tags: pandas Python

Back to List

Prev: Essential Nginx Hardening and Performance Tuning Techniques

Next: Implementing SMS One-Time Password Verification in Java with Redis Integration

Fading Coder

Mastering Pandas DataFrame Operations

Iterating Over DataFrames

Row Iteration

Column Iteration

Data Filtering Techniques

Subset Selection with `filter()`

Boolean Indexing

Essential DataFrame Attributes

Converting DataFrames to Dictionaries

Mapping External Values

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

Mastering Pandas DataFrame Operations

Iterating Over DataFrames

Row Iteration

Column Iteration

Data Filtering Techniques

Subset Selection with filter()

Boolean Indexing

Essential DataFrame Attributes

Converting DataFrames to Dictionaries

Mapping External Values

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Subset Selection with `filter()`

Leave a Comment