Fading Coder

One Final Commit for the Last Sprint


Extracting H.264 Video Color Formats from SPS using FFmpeg Source Code


This article details how FFmpeg's source code determines the color format of H.264 encoded video streams by analyzing the chroma_format_idc attribute within the Sequence Parameter Set (SPS).

Understanding chroma_format_idc in H.264

The H.264 standard (ITU-T Rec. H.264 / ISO/IEC 14496-10) defines chroma_format_idc as an SPS syntax element. This attribute selects the chroma subsampling scheme:

  • chroma_format_idc = 0: Monochrome
  • chroma_format_idc = 1: YUV 4:2:0
  • chroma_format_idc = 2: YUV 4:2:2
  • chroma_format_idc = 3: YUV 4:4:4

The value of chroma_format_idc is constrained to the range 0 to 3, inclusive. The field is only coded in the SPS when profile_idc belongs to a specific set of profiles (the High-family and related profiles); for all other profiles it is absent from the bitstream, and the standard specifies that it be inferred as 1, i.e. YUV 4:2:0.
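This inference rule can be sketched as a small helper. This is illustrative only, not FFmpeg code; the names chroma_name and present are invented for the example:

```c
#include <assert.h>
#include <string.h>

/* Illustrative helper (not FFmpeg code): map chroma_format_idc to a
 * human-readable subsampling label. `present` models whether the SPS
 * actually carried the field; when it is absent, the spec default of
 * 1 (YUV 4:2:0) applies. */
static const char *chroma_name(int present, int chroma_format_idc)
{
    if (!present)
        chroma_format_idc = 1;          /* inferred default */
    switch (chroma_format_idc) {
    case 0:  return "monochrome";
    case 1:  return "yuv420";
    case 2:  return "yuv422";
    case 3:  return "yuv444";
    default: return "invalid";          /* values > 3 are not allowed */
    }
}
```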

Example of Determining Color Format

Consider a video file analyzed with the Elecard Stream Analyzer. If its profile_idc is 77 (Main profile), which is not among the profiles that explicitly code chroma_format_idc, the value is inferred to be 1, so the video's color format is YUV 4:2:0. Tools like Elecard StreamEye confirm this, displaying the color format as YUV 4:2:0.

FFmpeg Source Code Implementation

FFmpeg's ff_h264_decode_seq_parameter_set function (in libavcodec/h264_ps.c) parses the SPS and extracts its attributes. The snippet below shows how chroma_format_idc is obtained:

int ff_h264_decode_seq_parameter_set(GetBitContext *gb, AVCodecContext *avctx,
                                     H264ParamSets *ps, int ignore_truncation)
{
    // ...
    if (sps->profile_idc == 100 ||  // High profile
        sps->profile_idc == 110 ||  // High10 profile
        sps->profile_idc == 122 ||  // High422 profile
        sps->profile_idc == 244 ||  // High444 Predictive profile
        sps->profile_idc ==  44 ||  // Cavlc444 profile
        sps->profile_idc ==  83 ||  // Scalable Constrained High profile (SVC)
        sps->profile_idc ==  86 ||  // Scalable High Intra profile (SVC)
        sps->profile_idc == 118 ||  // Stereo High profile (MVC)
        sps->profile_idc == 128 ||  // Multiview High profile (MVC)
        sps->profile_idc == 138 ||  // Multiview Depth High profile (MVCD)
        sps->profile_idc == 144) {  // old High444 profile
        sps->chroma_format_idc = get_ue_golomb_31(gb);
    // ...
    } else {
        sps->chroma_format_idc = 1;
    // ...
    }
    // ...
}
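FFmpeg reads the field with get_ue_golomb_31, i.e. as an unsigned Exp-Golomb (ue(v)) code: count leading zero bits, then read that many info bits; the value is 2^n - 1 + info. A minimal, self-contained ue(v) reader might look like this (an illustrative sketch, not FFmpeg's implementation; bitreader, read_bit, and read_ue are invented names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal bit reader over a byte buffer, MSB first. */
struct bitreader {
    const uint8_t *buf;
    size_t pos;                         /* bit position */
};

static unsigned read_bit(struct bitreader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* ue(v): n leading zeros, a stop bit of 1, then n info bits;
 * the decoded value is 2^n - 1 + info. */
static unsigned read_ue(struct bitreader *br)
{
    int zeros = 0;
    while (read_bit(br) == 0)
        zeros++;
    unsigned info = 0;
    for (int i = 0; i < zeros; i++)
        info = (info << 1) | read_bit(br);
    return (1u << zeros) - 1 + info;
}
```

For example, the bit string 00100 decodes to 3, which is why a 4:4:4 stream carries chroma_format_idc as the five bits 00100 in its SPS.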

Subsequently, within the parse_nal_units function (in libavcodec/h264_parser.c), this chroma_format_idc value is used to determine the pixel format, which is then stored in s->format (an AVCodecParserContext member):

static inline int parse_nal_units(AVCodecParserContext *s,
                                  AVCodecContext *avctx,
                                  const uint8_t * const buf, int buf_size)
{
    // ...
    for (;;) {
        switch (nal.type) {
        case H264_NAL_SPS:
            ff_h264_decode_seq_parameter_set(&nal.gb, avctx, &p->ps, 0);
            break;
         
        // ...
 
        case H264_NAL_IDR_SLICE:
        // ...
 
            switch (sps->bit_depth_luma) {
            case 9:
                if (sps->chroma_format_idc == 3)      s->format = AV_PIX_FMT_YUV444P9;
                else if (sps->chroma_format_idc == 2) s->format = AV_PIX_FMT_YUV422P9;
                else                                  s->format = AV_PIX_FMT_YUV420P9;
                break;
            case 10:
                if (sps->chroma_format_idc == 3)      s->format = AV_PIX_FMT_YUV444P10;
                else if (sps->chroma_format_idc == 2) s->format = AV_PIX_FMT_YUV422P10;
                else                                  s->format = AV_PIX_FMT_YUV420P10;
                break;
            case 8:
                if (sps->chroma_format_idc == 3)      s->format = AV_PIX_FMT_YUV444P;
                else if (sps->chroma_format_idc == 2) s->format = AV_PIX_FMT_YUV422P;
                else                                  s->format = AV_PIX_FMT_YUV420P;
                break;
            default:
                s->format = AV_PIX_FMT_NONE;
            }
        // ... 
        }
        // ...
    }
}
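The mapping above is a simple two-dimensional lookup on bit_depth_luma and chroma_format_idc. As a stand-alone sketch (parser_pix_fmt is an invented name, and only the 8/9/10-bit YUV cases shown in the excerpt are covered):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the parser's lookup above: combine
 * bit_depth_luma and chroma_format_idc into a pixel-format name.
 * Note that idc values other than 2 and 3 fall through to 4:2:0,
 * mirroring the excerpt's final `else` branch. Returns a pointer
 * to a static buffer, valid until the next call. */
static const char *parser_pix_fmt(int bit_depth_luma, int chroma_format_idc)
{
    static char name[32];
    const char *sub = chroma_format_idc == 3 ? "444" :
                      chroma_format_idc == 2 ? "422" : "420";
    switch (bit_depth_luma) {
    case 8:
        snprintf(name, sizeof(name), "yuv%sp", sub);
        return name;
    case 9:
    case 10:
        snprintf(name, sizeof(name), "yuv%sp%d", sub, bit_depth_luma);
        return name;
    default:
        return "none";                  /* unhandled depths */
    }
}
```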

Interestingly, the pixel format stored in s->format is not directly used when FFmpeg prints the color format via command-line tools. Instead, the h264_init_ps function re-acquires the color format by calling get_pixel_format. This function, located in libavcodec/h264_slice.c, determines the AVCodecContext.pix_fmt:

static int h264_init_ps(H264Context *h, const H264SliceContext *sl, int first_slice)
{
//...
    if (!h->context_initialized || must_reinit || needs_reinit) {
    //...
        if ((ret = get_pixel_format(h, 1)) < 0)
            return ret;
        h->avctx->pix_fmt = ret;
    //...
    }
//...
}

The get_pixel_format function relies on the macros CHROMA(h), CHROMA422(h), and CHROMA444(h) (defined in libavcodec/h264dec.h) to interpret chroma_format_idc and map it, together with the bit depth, to the appropriate AVPixelFormat values:

static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback)
{
#define HWACCEL_MAX (CONFIG_H264_DXVA2_HWACCEL + \
                     (CONFIG_H264_D3D11VA_HWACCEL * 2) + \
                     CONFIG_H264_D3D12VA_HWACCEL + \
                     CONFIG_H264_NVDEC_HWACCEL + \
                     CONFIG_H264_VAAPI_HWACCEL + \
                     CONFIG_H264_VIDEOTOOLBOX_HWACCEL + \
                     CONFIG_H264_VDPAU_HWACCEL + \
                     CONFIG_H264_VULKAN_HWACCEL)
    enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts;

    switch (h->ps.sps->bit_depth_luma) {
    case 9:
        if (CHROMA444(h)) {
            if (h->avctx->colorspace == AVCOL_SPC_RGB) {
                *fmt++ = AV_PIX_FMT_GBRP9;
            } else
                *fmt++ = AV_PIX_FMT_YUV444P9;
        } else if (CHROMA422(h))
            *fmt++ = AV_PIX_FMT_YUV422P9;
        else
            *fmt++ = AV_PIX_FMT_YUV420P9;
        break;
    case 10:
#if CONFIG_H264_VIDEOTOOLBOX_HWACCEL
        if (h->avctx->colorspace != AVCOL_SPC_RGB)
            *fmt++ = AV_PIX_FMT_VIDEOTOOLBOX;
#endif
#if CONFIG_H264_VULKAN_HWACCEL
        *fmt++ = AV_PIX_FMT_VULKAN;
#endif
        if (CHROMA444(h)) {
            if (h->avctx->colorspace == AVCOL_SPC_RGB) {
                *fmt++ = AV_PIX_FMT_GBRP10;
            } else
                *fmt++ = AV_PIX_FMT_YUV444P10;
        } else if (CHROMA422(h))
            *fmt++ = AV_PIX_FMT_YUV422P10;
        else {
#if CONFIG_H264_VAAPI_HWACCEL
            *fmt++ = AV_PIX_FMT_VAAPI;
#endif
            *fmt++ = AV_PIX_FMT_YUV420P10;
        }
        break;
    case 12:
#if CONFIG_H264_VULKAN_HWACCEL
        *fmt++ = AV_PIX_FMT_VULKAN;
#endif
        if (CHROMA444(h)) {
            if (h->avctx->colorspace == AVCOL_SPC_RGB) {
                *fmt++ = AV_PIX_FMT_GBRP12;
            } else
                *fmt++ = AV_PIX_FMT_YUV444P12;
        } else if (CHROMA422(h))
            *fmt++ = AV_PIX_FMT_YUV422P12;
        else
            *fmt++ = AV_PIX_FMT_YUV420P12;
        break;
    case 14:
        if (CHROMA444(h)) {
            if (h->avctx->colorspace == AVCOL_SPC_RGB) {
                *fmt++ = AV_PIX_FMT_GBRP14;
            } else
                *fmt++ = AV_PIX_FMT_YUV444P14;
        } else if (CHROMA422(h))
            *fmt++ = AV_PIX_FMT_YUV422P14;
        else
            *fmt++ = AV_PIX_FMT_YUV420P14;
        break;
    case 8:
#if CONFIG_H264_VDPAU_HWACCEL
        *fmt++ = AV_PIX_FMT_VDPAU;
#endif
#if CONFIG_H264_VULKAN_HWACCEL
        *fmt++ = AV_PIX_FMT_VULKAN;
#endif
#if CONFIG_H264_NVDEC_HWACCEL
        *fmt++ = AV_PIX_FMT_CUDA;
#endif
#if CONFIG_H264_VIDEOTOOLBOX_HWACCEL
        if (h->avctx->colorspace != AVCOL_SPC_RGB)
            *fmt++ = AV_PIX_FMT_VIDEOTOOLBOX;
#endif
        if (CHROMA444(h)) {
            if (h->avctx->colorspace == AVCOL_SPC_RGB)
                *fmt++ = AV_PIX_FMT_GBRP;
            else if (h->avctx->color_range == AVCOL_RANGE_JPEG)
                *fmt++ = AV_PIX_FMT_YUVJ444P;
            else
                *fmt++ = AV_PIX_FMT_YUV444P;
        } else if (CHROMA422(h)) {
            if (h->avctx->color_range == AVCOL_RANGE_JPEG)
                *fmt++ = AV_PIX_FMT_YUVJ422P;
            else
                *fmt++ = AV_PIX_FMT_YUV422P;
        } else {
#if CONFIG_H264_DXVA2_HWACCEL
            *fmt++ = AV_PIX_FMT_DXVA2_VLD;
#endif
#if CONFIG_H264_D3D11VA_HWACCEL
            *fmt++ = AV_PIX_FMT_D3D11VA_VLD;
            *fmt++ = AV_PIX_FMT_D3D11;
#endif
#if CONFIG_H264_D3D12VA_HWACCEL
            *fmt++ = AV_PIX_FMT_D3D12;
#endif
#if CONFIG_H264_VAAPI_HWACCEL
            *fmt++ = AV_PIX_FMT_VAAPI;
#endif
            if (h->avctx->color_range == AVCOL_RANGE_JPEG)
                *fmt++ = AV_PIX_FMT_YUVJ420P;
            else
                *fmt++ = AV_PIX_FMT_YUV420P;
        }
        break;
    default:
        av_log(h->avctx, AV_LOG_ERROR,
               "Unsupported bit depth %d\n", h->ps.sps->bit_depth_luma);
        return AVERROR_INVALIDDATA;
    }

    *fmt = AV_PIX_FMT_NONE;

    for (int i = 0; pix_fmts[i] != AV_PIX_FMT_NONE; i++)
        if (pix_fmts[i] == h->avctx->pix_fmt && !force_callback)
            return pix_fmts[i];
    return ff_get_format(h->avctx, pix_fmts);
}

#define CHROMA(h)    ((h)->ps.sps->chroma_format_idc)
#define CHROMA422(h) ((h)->ps.sps->chroma_format_idc == 2)
#define CHROMA444(h) ((h)->ps.sps->chroma_format_idc == 3)
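Note how get_pixel_format builds a candidate list: hardware-accelerated formats are appended first, the matching software format last, and AV_PIX_FMT_NONE terminates the list before it is handed to ff_get_format, which invokes the user's get_format callback. The pattern can be sketched as follows (a simplified illustration with plain ints standing in for enum AVPixelFormat; choose_format is an invented name):

```c
#include <assert.h>

/* Stand-ins for pixel-format enum values. */
enum { FMT_NONE = -1, FMT_HW = 100, FMT_SW = 0 };

/* Walk a terminator-ended candidate list, preferring earlier
 * (hardware) entries but skipping any the caller cannot use;
 * the software format at the end acts as the fallback. */
static int choose_format(const int *candidates, int hw_available)
{
    for (int i = 0; candidates[i] != FMT_NONE; i++) {
        if (candidates[i] == FMT_HW && !hw_available)
            continue;                   /* skip unusable hwaccel */
        return candidates[i];           /* first acceptable candidate */
    }
    return FMT_NONE;
}
```

Ordering the list this way is what lets a default build fall back to plain yuv420p when no hardware acceleration is configured.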

Finally, the dump_stream_format function (in libavformat/dump.c) calls avcodec_string to display the determined color format; avcodec_string reads AVCodecContext.pix_fmt and converts it to its string representation:

void avcodec_string(char *buf, int buf_size, AVCodecContext *enc, int encode)
{
//...
    switch (enc->codec_type) {
    case AVMEDIA_TYPE_VIDEO:
    {
            av_bprintf(&bprint, "%s%s", separator,
                       enc->pix_fmt == AV_PIX_FMT_NONE ? "none" :
                       unknown_if_null(av_get_pix_fmt_name(enc->pix_fmt)));
    }
//...
}

Conclusion

FFmpeg determines the color format of H.264 video by inspecting the chroma_format_idc attribute in the SPS. As the source code shows, this determination can happen more than once for the same stream: the parser derives a format while the media is demuxed, and the decoder re-derives it in h264_init_ps. For performance-critical applications requiring faster demuxing, developers might consider patching FFmpeg so the color format is determined only once.

Tags: H.264, FFmpeg
