Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Mastering Pandas DataFrame Operations

Tech May 17 3

Iterating Over DataFrames

Processing data row by row or column by column is a common task, though vectorization is preferred for performance. When iteration is necessary, Pandas offers several methods.

Row Iteration

The iterrows() method yields each row index as a key and the row data as a Series. This allows access to values via column labels.

for identifier, row_data in dataset.iterrows():
    print(row_data['id'], row_data['value'])

For significantly better performance, itertuples() converts rows into named tuples. This approach is faster and attribute access is cleaner.

for row in dataset.itertuples(index=True, name='Row'):
    print(getattr(row, 'id'), getattr(row, 'value'))

Column Iteration

To iterate through columns, use items() (or iteritems() in older versions). This returns the column label and the content as a Series.

for label, content in dataset.items():
    print(f"Column: {label}")
    print(content.values)

Data Filtering Techniques

Subset Selection with filter()

The filter() method is versatile for selecting specific rows or columns based on names, patterns, or regular expressions.

  • items: Specific list of labels to keep.
  • like: Keeps labels containing a specific substring.
  • regex: Keeps labels matching a regular expression pattern.
  • axis: The target axis (0 for 'index', 1 for 'columns').
import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([[10, 20, 30], [40, 50, 60]]),
    index=['row_x', 'row_y'],
    columns=['col_a', 'col_b', 'col_c']
)

# Select specific columns
selected = df.filter(items=['col_a', 'col_c'])

# Select columns via regex (e.g., ending with 'c')
regex_cols = df.filter(regex='c$', axis=1)

# Select rows containing 'y'
filtered_rows = df.filter(like='y', axis=0)

Boolean Indexing

Direct filtering can be achieved by passing a boolean series derived from conditional logic.

# Filter rows where column value is greater than 25
filtered_df = original_df[original_df['quantity'] > 25]

Essential DataFrame Attributes

Understanding the fundamental attributes of a DataFrame is crucial for debugging and data inspection.

Attribute Description
T Transposes rows and columns.
axes Returns a list of row and column axis labels.
dtypes Returns the data type of each column.
empty Returns True if the DataFrame is entirely empty.
ndim The number of axes (dimensionality).
shape A tuple representing the dimensionality (rows, columns).
size Total number of elements in the DataFrame.
values Returns the DataFrame data as a NumPy array.
head(n) Returns the first n rows.
tail(n) Returns the last n rows.
shift(periods) Shifts the index by the desired number of periods.

Converting DataFrames to Dictionaries

The to_dict() function transforms DataFrame data into various Python dictionary formats. This is useful for serialization or API responses.

The structure of the output is controlled by the orient parameter.

  • 'dict' (default): {column_label: {index_label: value}}
  • 'list': {column_label: [values]}
  • 'series': {column_label: Series(values)}
  • 'split': {'index': [indices], 'columns': [cols], 'data': [values]}
  • 'records': [{column_label: value}, ...]
  • 'index': {index_label: {column_label: value}}

Mapping External Values

A common requirement is enriching a DataFrame by mapping values in a column to values from an external dictionary. The map() function is ideal for this operation.

import pandas as pd

status_lookup = {'open': 1, 'closed': 0, 'pending': 2}

df = pd.DataFrame({'order_status': ['open', 'pending', 'closed', 'open'], 'amount': [100, 200, 150, 50]})

# Create a new column by mapping status strings to integers
df['status_code'] = df['order_status'].map(status_lookup)
Tags: pandasPython

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.