A Practical Guide to Deepfake Detection in Audio-Visual Media
The task of this competition is to determine whether a facial image within a video is a Deepfake and output a probability score. Participants must develop and optimize detection models to handle diverse Deepfake generation techniques and complex scenarios, thereby improving the accuracy and robustness of detection.
Dataset Structure
Training and validation sets have been released. The label file train_labels.csv is used for model training, while validation_labels.csv is for model tuning. Each row in these files contains two comma-separated parts: the video filename (with .mp4 extension) and the ground truth label. A target value of 1 indicates a Deepfake video, and 0 indicates a genuine video.
Example of train_labels.csv:
video_name,target
96b04c80704f02cb426076b3f624b69e.mp4,0
16fe4cf5ae8b3928c968a5d11e870360.mp4,1
Example of validation_labels.csv:
video_name,target
f859cb3510c69513d5c57c6934bc9968.mp4,0
50ae26b3f3ea85babb2f9dde840830e2.mp4,1
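These label files can be loaded with Pandas. A minimal sketch, with the CSV content inlined for illustration (in practice you would call pd.read_csv("train_labels.csv") on the released file):

```python
import io
import pandas as pd

# Inlined copy of the train_labels.csv example; in practice, read the real file:
#   train_df = pd.read_csv("train_labels.csv")
csv_text = """video_name,target
96b04c80704f02cb426076b3f624b69e.mp4,0
16fe4cf5ae8b3928c968a5d11e870360.mp4,1
"""
train_df = pd.read_csv(io.StringIO(csv_text))

# target is 1 for a Deepfake video and 0 for a genuine one
print(train_df["target"].value_counts())
```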
For each submission phase, a prediction file submission.csv must be generated. Each row should contain the video filename and the model's predicted probability (score) that the video is a Deepfake.
Example of submission.csv:
video_name,score
658042526e6d0c199adc7bfeb1f7c888.mp4,0.123456
a20cf2d7dea580d0affc4d85c9932479.mp4,0.123456
Evaluation Metric
Primary performance is measured using the Area Under the ROC Curve (AUC). If rankings are tied, the True Positive Rate at a False Positive Rate of 1E-3 (TPR@FPR=1E-3) serves as a secondary metric.
Key Formulas:
- True Positive Rate (TPR): TPR = TP / (TP + FN)
- False Positive Rate (FPR): FPR = FP / (FP + TN)
Where:
- TP: Attack samples correctly identified as attacks.
- TN: Genuine samples correctly identified as genuine.
- FP: Genuine samples incorrectly identified as attacks (False Alarms).
- FN: Attack samples incorrectly identified as genuine (Misses).
Competition Rules & Requirements
- Submission Limits: Validation submissions are limited to 5 per day. Test set submissions are limited to 2 per day.
- Final Ranking: Based on a weighted combination of the public test set score (20%), the score on a hidden test set after code reproduction (60%), and the technical report (20%).
- Model Constraints: Only a single model is allowed, with effective parameters not exceeding 200M (measured using the thop library).
- Data Usage: No external datasets are permitted. Only the provided competition data and ImageNet-1K pre-trained models are allowed for training.
- Code Submission: Finalists must submit their training and inference code in a Docker container, along with a detailed technical report.
Technical Report Evaluation Criteria
Reports are assessed by domain experts on:
- Innovation: Novelty in technology and application, creative solutions.
- Generality: Defense capability against unknown attacks, cross-dataset adaptability, robustness to interference, adversarial attack/defense performance.
- Practicality: Scalability, inference speed, iteration cost.
- Interpretability: Ability to capture, analyze, and provide feedback on attack clues.
Implementation Baseline: Key Steps
- Data Preparation: Use Pandas to load training and validation labels, merging file paths with their corresponding labels.
- Feature Extraction Function: Define a function extract_audio_features to load a video, extract its audio track, and convert it into a Mel-spectrogram image representation.
- Model Functions: Implement model_train, model_evaluate, and model_predict functions for the training, validation, and inference phases.
- Model Setup: Initialize a CNN model (e.g., ResNet-18), define an optimizer (Adam), a loss function (Cross-Entropy), and a learning rate scheduler.
- Training Loop: Iterate through epochs, performing training and validation. Save the model weights when validation performance improves.
- Generate Predictions: Use the trained model to predict scores on the validation set and save them to a CSV file.
- Format Submission: Merge predictions with the required submission template to create the final submission.csv file.
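The final formatting step can be sketched with Pandas. The template DataFrame below is a stand-in for whatever official template the competition provides, and the prediction values are illustrative:

```python
import pandas as pd

# Hypothetical scores produced by model_predict (illustrative values only)
preds = pd.DataFrame({
    "video_name": ["658042526e6d0c199adc7bfeb1f7c888.mp4",
                   "a20cf2d7dea580d0affc4d85c9932479.mp4"],
    "score": [0.123456, 0.123456],
})

# Stand-in for the official submission template listing every expected video
template = pd.DataFrame({"video_name": preds["video_name"]})

# Left-merge so every template row survives; videos without a prediction
# fall back to a neutral score of 0.5
submission = template.merge(preds, on="video_name", how="left")
submission["score"] = submission["score"].fillna(0.5)
submission.to_csv("submission.csv", index=False)
```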
Example code structure for the training loop:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import models

# Model, loss, optimizer, and scheduler initialization
net = models.resnet18(pretrained=True)  # ImageNet-1K weights (permitted by the rules)
net.fc = nn.Linear(net.fc.in_features, 2)  # two classes: genuine (0) vs. Deepfake (1)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)

best_score = 0.0
for epoch in range(num_epochs):
    # Training phase
    net.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch

    # Validation phase: keep the checkpoint with the best validation score
    val_score = model_evaluate(net, val_loader)
    if val_score > best_score:
        best_score = val_score
        torch.save(net.state_dict(), 'best_model.pth')