Time Series Forecasting for Electricity Demand Using Pandas and LightGBM
Problem Overview
This challenge focuses on forecasting electricity consumption for multiple households using historical time-series data. Given sequences of past power usage labeled by household ID and day index (dt), the objective is to predict future target values — representing actual electricity demand.
The task falls under univariate time-series regression, where temporal dependencies, seasonality, and cross-household heterogeneity must be accounted for in modeling.
Baseline Solution: Static Historical Averaging
A minimal yet interpretable baseline computes the average target value per household over a recent window (days 11–20) and applies it uniformly across all test entries for that household:
import pandas as pd
import numpy as np
# Load datasets
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
# Compute mean target per 'id' over dt ∈ [11, 20]
recent_window = train.query('11 <= dt <= 20').groupby('id')['target'].mean().reset_index(name='pred_mean')
# Attach predictions to test set via left join
submission = test.merge(recent_window, on='id', how='left')
# Export final submission
submission[['id', 'dt', 'pred_mean']].rename(columns={'pred_mean': 'target'}).to_csv('submit.csv', index=False)
Key operations:
- `.query()` filters rows more readably than boolean indexing.
- `.groupby(...).mean().reset_index()` aggregates and restores a flat structure.
- `.merge(..., how='left')` preserves all test rows, filling missing matches with `NaN`.
This approach assumes stationarity within the short horizon and serves as a performance floor.
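As a quick sanity check, the baseline pipeline can be exercised on a tiny synthetic dataset (the household IDs and target values below are made up for illustration):

```python
import pandas as pd

# Toy training data: two households with 30 days each (hypothetical values)
train = pd.DataFrame({
    'id': ['a'] * 30 + ['b'] * 30,
    'dt': list(range(1, 31)) * 2,
    'target': [2.0] * 30 + [5.0] * 30,
})
test = pd.DataFrame({'id': ['a', 'b'], 'dt': [31, 31]})

# Same steps as the baseline: window mean per id, left-joined onto test
recent = (train.query('11 <= dt <= 20')
               .groupby('id')['target'].mean()
               .reset_index(name='pred_mean'))
submission = test.merge(recent, on='id', how='left')
print(submission['pred_mean'].tolist())  # household 'a' -> 2.0, 'b' -> 5.0
```

Each household's prediction is simply its own window mean, so constant series reproduce their constant exactly.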
Advanced Modeling with LightGBM
To capture non-linear patterns and interactions, we adopt LightGBM — a gradient-boosted decision tree framework optimized for speed and memory efficiency.
Data Preparation
We first unify and sort data chronologically per household, then engineer lagged and rolling features:
import pandas as pd
import numpy as np
# Concatenate, then sort each household from oldest to newest
# (dt counts days back from the forecast date, so descending dt = oldest first)
data = pd.concat([train, test], ignore_index=True)
data = data.sort_values(['id', 'dt'], ascending=[True, False]).reset_index(drop=True)
# Generate lag features: target values from 10–29 days prior
for lag in range(10, 30):
    data[f't_{lag}'] = data.groupby('id')['target'].shift(lag)
# Average the three nearest lags (t_10, t_11, t_12) into a short-window feature
data['t_3day_avg'] = data[['t_10', 't_11', 't_12']].mean(axis=1)
# Split back into train/test using presence of 'target'
train_df = data[data['target'].notna()].reset_index(drop=True)
test_df = data[data['target'].isna()].reset_index(drop=True)
# Define feature set (exclude identifiers and target)
feature_cols = [col for col in data.columns if col not in {'id', 'dt', 'type', 'target'}]
Note: Since dt indexes days back in time, sorting it in descending order places each household's oldest rows first, so shift(lag) retrieves the value from lag days earlier within each id group.
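The lag construction can be verified on a toy series. Assuming dt counts days back from the forecast date, descending sort puts the oldest row first, so shift(1) pulls the value from one day earlier (the values below are invented; the full pipeline uses lags 10–29 in the same way):

```python
import pandas as pd

# One household, dt = 3 (oldest) .. 1 (newest), with made-up targets
df = pd.DataFrame({'id': ['a', 'a', 'a'],
                   'dt': [1, 2, 3],
                   'target': [10.0, 20.0, 30.0]})
df = df.sort_values(['id', 'dt'], ascending=[True, False]).reset_index(drop=True)

# shift(1) within each id fetches the previous (older) row's target
df['t_1'] = df.groupby('id')['target'].shift(1)
print(df[['dt', 'target', 't_1']])
# dt=3 has no older row (NaN); dt=2 sees 30.0; dt=1 sees 20.0
```

The NaN in the oldest row shows why early rows lack lag features; LightGBM handles those missing values natively.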
Train-Validation Split Strategy
Instead of a random split, we separate by date:

- Training: samples with `dt >= 31`
- Validation: samples with `dt <= 30`

Because dt counts backward in time, `dt >= 31` selects the older observations and `dt <= 30` the most recent ones. This mimics real-world deployment, where a model fit on past data predicts newer, unseen periods — preserving temporal integrity.
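A quick check confirms the two subsets partition the data cleanly (the dt range below is illustrative):

```python
import pandas as pd

# Illustrative training frame with dt spanning 1-60
train_df = pd.DataFrame({'dt': range(1, 61), 'target': [0.0] * 60})

trn_mask = train_df['dt'] >= 31
val_mask = train_df['dt'] <= 30

# The masks are disjoint and together cover every row
assert not (trn_mask & val_mask).any()
assert (trn_mask | val_mask).all()
print(int(trn_mask.sum()), int(val_mask.sum()))  # 30 training rows, 30 validation rows
```

Keeping the boundary strictly between 30 and 31 guarantees no sample leaks into both sets.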
Model Configuration and Training
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
# Define feature and label subsets
X_trn = train_df.query('dt >= 31')[feature_cols]
y_trn = train_df.query('dt >= 31')['target']
X_val = train_df.query('dt <= 30')[feature_cols]
y_val = train_df.query('dt <= 30')['target']
# Construct LightGBM datasets
dtrain = lgb.Dataset(X_trn, label=y_trn)
dvalid = lgb.Dataset(X_val, label=y_val, reference=dtrain)
# Hyperparameters tuned for stability and generalization
params = {
    'objective': 'regression',
    'metric': 'rmse',
    'num_leaves': 32,
    'learning_rate': 0.045,
    'feature_fraction': 0.75,
    'bagging_fraction': 0.85,
    'bagging_freq': 5,
    'lambda_l2': 8.0,
    'min_data_in_leaf': 20,
    'seed': 42,
    'verbose': -1
}
# Train with early stopping
model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dvalid],
    num_boost_round=10000,
    callbacks=[lgb.early_stopping(stopping_rounds=500, verbose=True)]
)
# Predict and evaluate
y_val_pred = model.predict(X_val)
val_rmse = np.sqrt(mean_squared_error(y_val, y_val_pred))
print(f'Validation RMSE: {val_rmse:.4f}')
# Apply to test set
test_predictions = model.predict(test_df[feature_cols])
test_df['target'] = test_predictions
test_df[['id', 'dt', 'target']].to_csv('submit.csv', index=False)
Critical considerations:
- `rmse` is used as the primary metric during training and evaluation.
- `early_stopping` prevents overfitting by halting optimization if validation loss stagnates for 500 rounds.
- Categorical variables like `type` are omitted here but could be encoded and added to `feature_cols` for richer modeling.
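As a sketch of that encoding step, `type` could be mapped to integer codes with pandas before training (the column values below are hypothetical):

```python
import pandas as pd

# Hypothetical household metadata with a categorical 'type' column
data = pd.DataFrame({'id': [1, 2, 3], 'type': ['urban', 'rural', 'urban']})

# Map each category to a stable integer code
# (categories are sorted alphabetically, so rural -> 0, urban -> 1)
data['type_code'] = data['type'].astype('category').cat.codes
print(data['type_code'].tolist())  # [1, 0, 1]
```

The encoded column could then be appended to `feature_cols`; LightGBM also accepts a `categorical_feature` argument on `lgb.Dataset` so the encoded column is split on as a category rather than as an ordered number.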
Feature importance can be inspected via `model.feature_importance()` to guide iterative feature engineering.
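Pairing the importance array with feature names makes the ranking readable. The sketch below assumes a trained booster; a made-up importance array stands in for `model.feature_importance()` so it runs standalone:

```python
import numpy as np
import pandas as pd

feature_cols = ['t_10', 't_11', 't_12', 't_3day_avg']  # as engineered above
# In practice: importances = model.feature_importance(importance_type='gain')
importances = np.array([120.0, 45.0, 30.0, 210.0])  # made-up values

# Rank features from most to least informative
ranking = (pd.Series(importances, index=feature_cols)
             .sort_values(ascending=False))
print(ranking.index.tolist())  # ['t_3day_avg', 't_10', 't_11', 't_12']
```

Gain-based importance (`importance_type='gain'`) usually separates useful lags from noise more sharply than the default split-count measure.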