
Conditional Column Creation with Pandas case_when()

Tech May 17 17

The case_when() method in Pandas provides a SQL-like approach to creating new columns based on conditional logic. This method evaluates multiple conditions sequentially and assigns corresponding values, offering a cleaner alternative to nested if-else statements when transforming data.

Method Overview

The case_when() method is available on Pandas Series objects (added in pandas 2.2.0) and processes conditions in order. When a condition evaluates to True for a row, the associated replacement value is assigned to that row. This behavior mirrors SQL's CASE WHEN construct and improves code readability when handling multiple conditional branches.

Syntax and Parameters

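Series.case_when(caselist)

Parameters:

caselist: A list of (condition, replacement) tuples, evaluated in order
condition: A one-dimensional boolean Series or array-like (or a callable that returns one) defining the evaluation criteria
replacement: A scalar, a one-dimensional array-like, or a callable giving the value to assign where the condition is True

The first matching condition determines the output value. There is no default parameter: rows where no condition matches keep their value from the Series that case_when() is called on. To supply a fallback, call case_when() on a Series that already holds the fallback value.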

Practical Examples

Example 1: Categorizing Numeric Ranges

Consider a dataset containing employee performance scores that need conversion into categorical grades:

import pandas as pd

# Employee performance data
performance_data = {
    'employee_id': [1001, 1002, 1003, 1004, 1005],
    'performance_score': [92, 78, 88, 55, 83]
}

df = pd.DataFrame(performance_data)

# Define evaluation criteria
criteria = [
    df['performance_score'] >= 85,
    df['performance_score'] >= 70,
    df['performance_score'] >= 60
]

# Define corresponding grade labels
grade_labels = ['Exceptional', 'Satisfactory', 'Needs Improvement']

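# Apply conditional logic: start from a Series filled with the fallback label,
# then pair each condition with its grade (the first matching pair wins)
df['performance_grade'] = pd.Series('At Risk', index=df.index).case_when(
    caselist=list(zip(criteria, grade_labels))
)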

print(df)
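For comparison, parallel lists of conditions and labels plus an explicit default are the calling convention of numpy.select, a separate function that predates case_when(). The sketch below reuses the data from this example and is offered as an alternative, not as part of the case_when() API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'performance_score': [92, 78, 88, 55, 83]})

criteria = [
    df['performance_score'] >= 85,
    df['performance_score'] >= 70,
    df['performance_score'] >= 60,
]
grade_labels = ['Exceptional', 'Satisfactory', 'Needs Improvement']

# np.select takes the condition list, the choice list, and a default value
df['performance_grade'] = np.select(criteria, grade_labels, default='At Risk')

print(df)
```

With this data, the scores 92 and 88 map to Exceptional, 78 and 83 to Satisfactory, and 55 falls through to At Risk.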

Example 2: Handling Missing Values

Real-world datasets frequently contain null values requiring special treatment:

import pandas as pd
import numpy as np

# Product inventory data with missing quantities
inventory_data = {
    'product_code': ['A001', 'A002', 'A003', 'A004', 'A005'],
    'stock_quantity': [150, np.nan, 85, 200, np.nan]
}

df = pd.DataFrame(inventory_data)

# Define stock level conditions
stock_conditions = [
    df['stock_quantity'].notna() & (df['stock_quantity'] > 150),
    df['stock_quantity'].notna() & (df['stock_quantity'] >= 100),
    df['stock_quantity'].notna()
]

stock_levels = ['High', 'Normal', 'Low']

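# Rows with a missing quantity match no condition and keep the 'Unknown' fallback
df['stock_status'] = pd.Series('Unknown', index=df.index).case_when(
    caselist=list(zip(stock_conditions, stock_levels))
)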

print(df)
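A side note on the notna() guards, sketched under the assumption that the column uses the default float64 dtype with NaN for missing values: comparisons against NaN evaluate to False, so the guard changes the outcome only for the final catch-all condition. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

stock = pd.Series([150, np.nan, 85, 200, np.nan])

# NaN >= 100 is False, so missing rows never satisfy the comparison
above_100 = stock >= 100
guarded = stock.notna() & (stock >= 100)

print(above_100.tolist())         # [True, False, False, True, False]
print(above_100.equals(guarded))  # True
```

Keeping the guards is still reasonable for readability, and nullable dtypes such as Int64 behave differently because their comparisons can produce missing values instead of False.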

Example 3: Combining Multiple Columns

When conditions span multiple columns, build each condition from whole DataFrame columns. The conditions are already vectorized, so no row-wise apply() is needed:

import pandas as pd

# Customer analysis dataset
customer_data = {
    'customer_id': [100, 101, 102, 103, 104],
    'annual_spending': [5000, 2500, 8000, 1200, 4500],
    'purchase_frequency': [12, 4, 15, 2, 8]
}

df = pd.DataFrame(customer_data)

# Define multi-column evaluation criteria on whole columns
segment_conditions = [
    (df['annual_spending'] >= 5000) & (df['purchase_frequency'] >= 10),
    (df['annual_spending'] >= 3000) & (df['purchase_frequency'] >= 5),
    df['annual_spending'] >= 1000
]
segment_labels = ['VIP', 'Regular', 'Occasional']

# Start from the fallback label 'New' and apply the cases in order
df['customer_segment'] = pd.Series('New', index=df.index).case_when(
    caselist=list(zip(segment_conditions, segment_labels))
)

print(df)

Example 4: Complex Business Rules

For intricate transformations involving multiple business rules:

import pandas as pd

# Sales representative performance data
sales_data = {
    'rep_name': ['Johnson', 'Williams', 'Brown', 'Jones', 'Garcia'],
    'quarterly_sales': [125000, 85000, 150000, 45000, 95000],
    'client_retention_rate': [0.92, 0.78, 0.88, 0.55, 0.83]
}

df = pd.DataFrame(sales_data)

# Define complex evaluation criteria
sales_conditions = [
    (df['quarterly_sales'] >= 100000) & (df['client_retention_rate'] >= 0.85),
    (df['quarterly_sales'] >= 75000) & (df['client_retention_rate'] >= 0.70),
    (df['quarterly_sales'] >= 50000)
]

performance_tiers = ['Top Performer', 'Solid Contributor', 'Developing']

# Start from the fallback label and apply the cases in order
df['performance_category'] = pd.Series('Needs Support', index=df.index).case_when(
    caselist=list(zip(sales_conditions, performance_tiers))
)

print(df)

Example 5: Aggregating Multiple Numeric Columns

Combining values from different columns into a single derived metric:

import pandas as pd

# Exam results dataset
exam_data = {
    'student_id': [2001, 2002, 2003, 2004, 2005],
    'midterm_score': [85, 70, 95, 60, 75],
    'final_score': [90, 80, 85, 70, 90]
}

df = pd.DataFrame(exam_data)

# Calculate weighted total with vectorized column arithmetic (40% midterm, 60% final)
df['weighted_total'] = df['midterm_score'] * 0.4 + df['final_score'] * 0.6

print(df)

Example 6: Data Validation and Flagging

Using conditional logic to validate data quality and flag anomalies:

import pandas as pd
import numpy as np

# Transaction log with potential anomalies
transaction_data = {
    'transaction_id': ['T001', 'T002', 'T003', 'T004', 'T005'],
    'transaction_amount': [2500, np.nan, 150, 99999, 800]
}

df = pd.DataFrame(transaction_data)

# Define validation rules
validation_conditions = [
    df['transaction_amount'].notna() & (df['transaction_amount'] > 50000),
    df['transaction_amount'].notna() & (df['transaction_amount'] > 0),
    df['transaction_amount'].isna()
]

status_labels = ['Review Required', 'Valid', 'Missing Data']

# Amounts that are present but not positive match no rule and stay 'Invalid'
df['data_quality_flag'] = pd.Series('Invalid', index=df.index).case_when(
    caselist=list(zip(validation_conditions, status_labels))
)

print(df)

Important Notes

The case_when() method processes conditions strictly in order. The first condition evaluating to True determines the output value, so condition ordering is critical: place the most specific conditions first. Each condition must be a boolean Series or array matching the Series length. Rows that match no condition keep their original value. This method integrates well with Pandas' method chaining patterns, enabling concise and maintainable data transformation pipelines.

Tags: pandas Python data-transformation


Copyright © fadingcoder.top
