A Practical Guide to Deepfake Detection in Audio-Visual Media
The task of this competition is to determine whether a facial image within a video is a Deepfake and output a probability score. Participants must develop and optimize detection models to handle diverse Deepfake generation techniques and complex scenarios, thereby improving the accuracy and robustness of detection.
Dataset Structure
Training and validation sets have been released. The label file train_labels.csv is used for model training, while validation_labels.csv is for model tuning. Each row in these files contains two comma-separated parts: the video filename (with .mp4 extension) and the ground truth label. A target value of 1 indicates a Deepfake video, and 0 indicates a genuine video.
Example of train_labels.csv:
video_name,target
96b04c80704f02cb426076b3f624b69e.mp4,0
16fe4cf5ae8b3928c968a5d11e870360.mp4,1
Example of validation_labels.csv:
video_name,target
f859cb3510c69513d5c57c6934bc9968.mp4,0
50ae26b3f3ea85babb2f9dde840830e2.mp4,1
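These label files can be loaded with Pandas. A minimal sketch, with the CSV content inlined for illustration (in practice you would call pd.read_csv("train_labels.csv") on the released file):

```python
import io
import pandas as pd

# Inlined copy of the train_labels.csv example; in practice, read the real file:
#   train_df = pd.read_csv("train_labels.csv")
csv_text = """video_name,target
96b04c80704f02cb426076b3f624b69e.mp4,0
16fe4cf5ae8b3928c968a5d11e870360.mp4,1
"""
train_df = pd.read_csv(io.StringIO(csv_text))

# target is 1 for a Deepfake video and 0 for a genuine one
print(train_df["target"].value_counts())
```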
For each submission phase, a prediction file submission.csv must be generated. Each row should contain the video filename and the model's predicted probability (score) that the video is a Deepfake.
Example of submission.csv:
video_name,score
658042526e6d0c199adc7bfeb1f7c888.mp4,0.123456
a20cf2d7dea580d0affc4d85c9932479.mp4,0.123456
Evaluation Metric
Primary performance is measured using the Area Under the ROC Curve (AUC). If rankings are tied, the True Positive Rate at a False Positive Rate of 1E-3 (TPR@FPR=1E-3) serves as a secondary metric.
Key Formulas:
- True Positive Rate (TPR): TPR = TP / (TP + FN)
- False Positive Rate (FPR): FPR = FP / (FP + TN)
Where:
- TP: Attack samples correctly identified as attacks.
- TN: Genuine samples correctly identified as genuine.
- FP: Genuine samples incorrectly identified as attacks (False Alarms).
- FN: Attack samples incorrectly identified as genuine (Misses).
Competition Rules & Requirements
- Submission Limits: Validation submissions are limited to 5 per day. Test set submissions are limited to 2 per day.
- Final Ranking: Based on a weighted combination of the public test set score (20%), the score on a hidden test set after code reproduction (60%), and the technical report (20%).
- Model Constraints: Only a single model is allowed, with effective parameters not exceeding 200M (measured using the thop library).
- Data Usage: No external datasets are permitted. Only the provided competition data and ImageNet-1K pre-trained models are allowed for training.
- Code Submission: Finalists must submit their training and inference code in a Docker container, along with a detailed technical report.
Technical Report Evaluation Criteria
Reports are assessed by domain experts on:
- Innovation: Novelty in technology and application, creative solutions.
- Generality: Defense capability against unknown attacks, cross-dataset adaptability, robustness to interference, adversarial attack/defense performance.
- Practicality: Scalability, inference speed, iteration cost.
- Interpretability: Ability to capture, analyze, and provide feedback on attack clues.
Implementation Baseline: Key Steps
- Data Preparation: Use Pandas to load training and validation labels, merging file paths with their corresponding labels.
- Feature Extraction Function: Define a function extract_audio_features to load a video, extract its audio track, and convert it into a Mel-spectrogram image representation.
- Model Functions: Implement model_train, model_evaluate, and model_predict functions for the training, validation, and inference phases.
- Model Setup: Initialize a CNN model (e.g., ResNet-18), define an optimizer (Adam), a loss function (Cross-Entropy), and a learning rate scheduler.
- Training Loop: Iterate through epochs, performing training and validation. Save the model weights when validation performance improves.
- Generate Predictions: Use the trained model to predict scores on the validation set and save them to a CSV file.
- Format Submission: Merge predictions with the required submission template to create the final submission.csv file.
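The final formatting step can be sketched with Pandas. The template DataFrame below is a stand-in for whatever official template the competition provides, and the prediction values are illustrative:

```python
import pandas as pd

# Hypothetical scores produced by model_predict (illustrative values only)
preds = pd.DataFrame({
    "video_name": ["658042526e6d0c199adc7bfeb1f7c888.mp4",
                   "a20cf2d7dea580d0affc4d85c9932479.mp4"],
    "score": [0.123456, 0.123456],
})

# Stand-in for the official submission template listing every expected video
template = pd.DataFrame({"video_name": preds["video_name"]})

# Left-merge so every template row survives; videos without a prediction
# fall back to a neutral score of 0.5
submission = template.merge(preds, on="video_name", how="left")
submission["score"] = submission["score"].fillna(0.5)
submission.to_csv("submission.csv", index=False)
```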
Example code structure for the training loop:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import models

# Model, loss, optimizer, and scheduler initialization
net = models.resnet18(pretrained=True)  # ImageNet-1K weights (permitted by the rules)
net.fc = nn.Linear(net.fc.in_features, 2)  # two classes: genuine (0) vs. Deepfake (1)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)

best_score = 0.0
for epoch in range(num_epochs):
    # Training phase
    net.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch

    # Validation phase: keep the checkpoint with the best validation score
    val_score = model_evaluate(net, val_loader)
    if val_score > best_score:
        best_score = val_score
        torch.save(net.state_dict(), 'best_model.pth')