【webrtc】Add x264 encoder for CEF/Chromium

2022-04-23 19:26:00 【等風來不如迎風去】

This article explains the main ideas and precautions for modifying the H264Encoder module of WebRTC and adding the x264 encoder.
The original author's article (Chinese): 為 CEF/Chromium 添加 x264 編碼器. The original also contains the figures that are missing from this English version.

Article Directory

Preface
Class transformation
Possible problems
Simulcast
The meaning of two QP thresholds
Dynamic bit rate/frame rate
It is very important to feedback the QP of each frame
What is RTPFragmentationHeader
H264 encoded data arrangement format requirements
Streams with pred_weight_table unsupported

Preface

As is well known, projects built on the standard Chromium browser architecture, such as CEF, Electron and even Google Chrome, use Cisco's OpenH264 as the default H.264 software encoder. So what if we want to use a different H.264 encoder, such as x264, or even hardware encoding on a dedicated chip? There are many articles online about the differences between OpenH264 and x264, so I won't go into details here.
The OpenH264 integration is located in the WebRTC project. This article is based on the source code of the CEF 76.3809.132 branch.
Because the Chromium and WebRTC code changes frequently, the code may no longer match before long, so this article sticks as much as possible to general ideas that should carry over to future versions.
That said, judging from the git log, the H264 encoder implementation changes infrequently. The part that changes most often is WebRTC's VideoSendStream.
It contains a lot of code that controls the encoder's behavior, and that part is very interesting. I may write a separate article about it when I have time. But I digress; back to business.

The H264 soft encoder code is located at:

third_party\webrtc\modules\video_coding\codecs\h264\h264_encoder_impl.h/.cc

Class transformation

The inheritance relationship of the H264EncoderImpl class is shown in a figure in the original article (out of laziness, the function parameters and some member variables are omitted from that figure).
The class relationship is clear and relatively simple. H264EncoderImpl implements 5 pure virtual functions of VideoEncoder (InitEncode, Encode, RegisterEncodeCompleteCallback, Release, SetRates) and 1 virtual function (GetEncoderInfo), and adds a number of private methods and member variables.
The virtual functions implemented by H264EncoderImpl are the core, and adding the x264 encoder revolves around them. The remaining parts are essentially OpenH264-specific and not general-purpose, so H264EncoderImpl would more aptly be named OpenH264EncoderImpl.
(For the figure, see the original article: 為 CEF/Chromium 添加 x264 編碼器.)
OK, let's modify H264EncoderImpl. The modified class structure is as follows:
I created a new base class, H264BaseEncoder, which holds the shared methods and variables. The parts that genuinely differ remain virtual functions, implemented by 2 derived classes: OpenH264EncoderImpl and X264EncoderImpl.
After the transformation, all code in the original H264EncoderImpl that applies only to OpenH264 moved to OpenH264EncoderImpl, and the common parts (methods and variables) stayed in H264BaseEncoder. Finally, X264EncoderImpl is the class we care about.
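To make the class split concrete, here is a minimal sketch of the hierarchy described above. It is not the actual WebRTC code: VideoEncoderLike is a simplified stand-in for webrtc::VideoEncoder, and the CreateEncoder/DestroyEncoder hooks, ImplementationName and the CreateH264Encoder factory are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified stand-in for webrtc::VideoEncoder (assumption: the real
// interface takes codec settings, frames and callbacks, not plain numbers).
class VideoEncoderLike {
 public:
  virtual ~VideoEncoderLike() = default;
  virtual int InitEncode(int width, int height, double fps) = 0;
  virtual int Release() = 0;
};

// Shared state and logic; the codec-specific steps stay virtual.
class H264BaseEncoder : public VideoEncoderLike {
 public:
  int InitEncode(int width, int height, double fps) override {
    assert(width > 0 && height > 0 && fps > 0);
    width_ = width;
    height_ = height;
    fps_ = fps;
    return CreateEncoder();
  }
  int Release() override {
    DestroyEncoder();
    return 0;
  }
  virtual std::string ImplementationName() const = 0;

 protected:
  virtual int CreateEncoder() = 0;    // hypothetical hook
  virtual void DestroyEncoder() = 0;  // hypothetical hook
  int width_ = 0;
  int height_ = 0;
  double fps_ = 0;
};

class OpenH264EncoderImpl : public H264BaseEncoder {
 public:
  std::string ImplementationName() const override { return "OpenH264"; }

 protected:
  int CreateEncoder() override { return 0; }  // would create the ISVCEncoder(s)
  void DestroyEncoder() override {}
};

class X264EncoderImpl : public H264BaseEncoder {
 public:
  std::string ImplementationName() const override { return "x264"; }

 protected:
  int CreateEncoder() override { return 0; }  // would open an x264 encoder
  void DestroyEncoder() override {}
};

// Hypothetical factory choosing the implementation at runtime.
std::unique_ptr<H264BaseEncoder> CreateH264Encoder(bool use_x264) {
  if (use_x264) return std::make_unique<X264EncoderImpl>();
  return std::make_unique<OpenH264EncoderImpl>();
}
```

Keeping the codec-specific work behind two small hooks means the shared bookkeeping is written once, and a third encoder (for example a hardware one) would only need another subclass.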

X264EncoderImpl class

OK, let’s talk about the problems that may be encountered in implementing the X264EncoderImpl class.

Possible problems

Simulcast
WebRTC’s OpenH264 supports Simulcast, which can be seen by reading the source code:

std::vector<ISVCEncoder*> encoders_;

The OpenH264 implementation keeps a vector of up to 4 encoders, one for each simulcast layer. Whether your x264 implementation supports simulcast depends on your actual needs. You can implement just a single encoder; in that case, do not enable simulcast when the upper layer initializes the PeerConnection.

The meaning of two QP thresholds

In the OpenH264 encoder implementation, two static constants are defined, kLowH264QpThreshold and kHighH264QpThreshold, with values 24 and 37 respectively.
Note that these two values are not the QP range passed to OpenH264 as encoding parameters. They are the thresholds used to decide whether to dynamically raise or lower the sending resolution or frame rate. WebRTC compares the average encoding QP over a period of time against them: if the average is less than or equal to kLowH264QpThreshold, the resolution (or frame rate) is considered due for an increase; if it is greater than kHighH264QpThreshold, the resolution (or frame rate) is lowered. The average-QP calculation is also very interesting; its source is located in webrtc\modules\video_coding\utility\quality_scaler.cc.

As for whether 24 and 37 suit x264, that depends on the x264 encoding parameters you use and on the actual application. I adjusted them slightly, to 26 and 40.
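The threshold comparison described above can be sketched as follows. This is a simplified illustration, not the logic of quality_scaler.cc, which also handles sampling windows and other conditions; ScaleDecision, QpThresholds and DecideScaling are hypothetical names.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

enum class ScaleDecision { kScaleUp, kKeep, kScaleDown };

// Hypothetical holder for the two thresholds, e.g. {24, 37} or {26, 40}.
struct QpThresholds {
  int low;
  int high;
};

// Compares the average QP of recently encoded frames with the thresholds.
ScaleDecision DecideScaling(const std::vector<int>& recent_qps,
                            QpThresholds thresholds) {
  assert(thresholds.low < thresholds.high);
  if (recent_qps.empty()) return ScaleDecision::kKeep;  // nothing to judge yet
  const double average =
      std::accumulate(recent_qps.begin(), recent_qps.end(), 0.0) /
      static_cast<double>(recent_qps.size());
  if (average > thresholds.high) return ScaleDecision::kScaleDown;
  if (average <= thresholds.low) return ScaleDecision::kScaleUp;
  return ScaleDecision::kKeep;
}
```

With the adjusted thresholds (26, 40), a run of frames averaging around QP 43 would ask for a lower resolution or frame rate, while an average of 22 would allow an increase.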

Dynamic bit rate/frame rate

As you have seen above, one of the important pure virtual functions is SetRates (formerly called SetRateAllocation). Its call stack is roughly as follows:

→ VideoEncoder::SetRates()
↑ VideoStreamEncoder::SetEncoderRates()
↑ VideoStreamEncoder::OnBitrateUpdated()
↑ VideoSendStreamImpl::OnBitrateUpdated()
↑ BitrateAllocator::OnNetworkChanged()

VideoEncoder::SetRates() ends up in the SetRates of the concrete encoder. Its parameters mainly carry the target bit rate and frame rate. SetRates is called very frequently, so the encoder parameters have to be updated continually to follow WebRTC's requests.

x264 does not seem to support dynamic frame rates

One point worth stating specifically: x264 does not seem to support changing the frame rate dynamically. In other words, if we set a frame rate of 60fps in InitEncode and then try to change it to 30fps with x264_encoder_reconfig() in SetRates, it appears to have no effect.
Dynamically adjusting the bit rate does not have this problem.
I'm not sure whether this is caused by the ABR rate-control mode or by something else. If you know, please tell me.
My workaround is this: if the current encoding frame rate differs from the target frame rate by some threshold (such as 5) or more, close the current x264 encoder and create a new one, using the current target frame rate and otherwise the same encoding parameters as before.
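The decision made inside SetRates can be sketched like this. It only models the choice between reconfiguring and recreating; RateAction, EncoderRateState and PlanRateUpdate are hypothetical names, and the comments name the x264 calls that would plausibly follow each outcome rather than showing a tested integration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class RateAction { kNone, kReconfigureBitrate, kRecreateEncoder };

// Hypothetical snapshot of what the running encoder was configured with.
struct EncoderRateState {
  uint32_t bitrate_kbps;
  double fps;
};

// Decides how to react to a SetRates request.
RateAction PlanRateUpdate(const EncoderRateState& current,
                          uint32_t target_bitrate_bps,
                          double target_fps,
                          double fps_threshold = 5.0) {
  assert(fps_threshold > 0);
  // A large frame rate change: close the encoder and open a new one with
  // the target frame rate (x264_encoder_close, then x264_encoder_open).
  if (std::fabs(target_fps - current.fps) >= fps_threshold) {
    return RateAction::kRecreateEncoder;
  }
  // Only the bit rate changed: update the rate-control bit rate (x264
  // expects kbit/s) and apply it with x264_encoder_reconfig().
  if (target_bitrate_bps / 1000 != current.bitrate_kbps) {
    return RateAction::kReconfigureBitrate;
  }
  return RateAction::kNone;
}
```

The threshold avoids recreating the encoder for small frame rate fluctuations, which matters because SetRates is called very frequently.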

It is very important to feedback the QP of each frame

I have to say a few more words about this one.

In fact, I finished writing the x264 encoder quickly, tried several common resolution/frame rate settings with a Logitech camera, and it worked quite well. But when I switched the video source from the camera to a 1080p mp4 file, I ran into a very strange problem:
At first the sending resolution was 1920x1080, but webrtc-internals showed it quickly dropping to 1440x810, then to 960x540, and a few seconds later to 720x405, at which point x264 failed. The reason for the failure: it was given an odd dimension (405).
So what causes the resolution to step down so rapidly in such a short time? After reading the source code, I found the answer: WebRTC decides whether to raise or lower the sending resolution based on the average encoding QP over a period of time. In other words, the resolution kept dropping because the average QP stayed consistently high (above the upper QP threshold mentioned earlier), so WebRTC kept requesting a lower resolution.
Had x264 not failed on the odd dimension, the next step would probably have been 480x270. (Note: each lower resolution is computed by alternately applying 3/4 and 2/3 to the previous one.)
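The step-down sequence observed above can be reproduced with a few lines of arithmetic. This is only a worked example of the alternating 3/4 and 2/3 factors, assuming plain integer division; WebRTC's own rounding and alignment may differ, and ResolutionLadder and HasOddDimension are hypothetical helper names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the starting resolution followed by `steps` reduced resolutions,
// alternately scaling the previous one by 3/4 and 2/3.
std::vector<std::pair<int, int>> ResolutionLadder(int width, int height,
                                                  int steps) {
  assert(width > 0 && height > 0 && steps >= 0);
  std::vector<std::pair<int, int>> ladder{{width, height}};
  for (int i = 0; i < steps; ++i) {
    if (i % 2 == 0) {
      width = width * 3 / 4;
      height = height * 3 / 4;
    } else {
      width = width * 2 / 3;
      height = height * 2 / 3;
    }
    ladder.emplace_back(width, height);
  }
  return ladder;
}

// True if either dimension is odd, which is the input x264 failed on here.
bool HasOddDimension(const std::pair<int, int>& resolution) {
  return resolution.first % 2 != 0 || resolution.second % 2 != 0;
}
```

Starting from 1920x1080, four steps give 1440x810, 960x540, 720x405 and 480x270; the third step is the one with an odd height.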
OK, the cause is found; now for the fix. The root cause was that I had not used the ParseBitstream and GetLastSliceQp methods of webrtc::H264BitstreamParser to compute the QP of each encoded frame and report the correct value. (Calls to these two methods can be found in the code that runs after OpenH264 encoding.)

What is RTPFragmentationHeader

The function of RTPFragmentationHeader is to record the start offset and length of each NALU in the encoded H264 data, excluding the start code. For example, suppose the encoded H264 data looks like this:

00 00 00 01 67 aa aa aa aa aa aa 00 00 00 01 68 bb bb bb 00 00 00 01 65 cc cc cc

Then, RTPFragmentationHeader will store 3 elements:

Element 1: the offset points to 67, and the length is the number of bytes from 67 up to the next start code.
Element 2: the offset points to 68, and the length is the number of bytes from 68 up to the next start code.
Element 3: the offset points to 65, and the length is the number of bytes from 65 up to the next start code (or, as here, the end of the data).

Note that old versions of the WebRTC source also had fragmentationPlType and fragmentationTimeDiff fields; newer versions have removed them.
Finally, the RTPFragmentationHeader and the encoded data are passed out through EncodedImageCallback->OnEncodedImage() for subsequent processing.
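The offset/length bookkeeping described in this section can be sketched as a small Annex B scanner. It does not use the real RTPFragmentationHeader type; NaluFragment and FindNalus are hypothetical names, and the resulting pairs are what would be copied into the header's offset and length arrays. The scanner accepts both 3-byte and 4-byte start codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of the first NALU byte (after the start code) and the NALU length.
struct NaluFragment {
  size_t offset;
  size_t length;
};

// Scans an Annex B buffer for 00 00 01 / 00 00 00 01 start codes.
std::vector<NaluFragment> FindNalus(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  std::vector<NaluFragment> fragments;
  size_t i = 0;
  while (i + 3 <= size) {
    if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
      // A zero byte right before 00 00 01 belongs to a 4-byte start code.
      const size_t start_code_begin = (i > 0 && data[i - 1] == 0) ? i - 1 : i;
      if (!fragments.empty()) {
        fragments.back().length = start_code_begin - fragments.back().offset;
      }
      fragments.push_back({i + 3, 0});
      i += 3;
    } else {
      ++i;
    }
  }
  // The last NALU runs to the end of the buffer.
  if (!fragments.empty()) {
    fragments.back().length = size - fragments.back().offset;
  }
  return fragments;
}
```

For the example buffer above, this yields three fragments with offsets 4, 15 and 23 and lengths 7, 4 and 4. The buffer itself is left untouched, so the start codes remain in place.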

H264 encoded data arrangement format requirements

The data encoded by OpenH264 generally has a fairly regular layout of "4-byte start code + NALU type + encoded data" (illustrated by a figure in the original article).
The beginning of the x264 output looks different (also shown in a figure in the original article): some of the data emitted by the x264 encoder is not needed and has to be skipped.
Also note that x264 emits NALU start codes of both lengths, 4 bytes and 3 bytes.
Pay attention to this when filling in the RTPFragmentationHeader, or the offsets will be wrong.
In addition, some NALU types (such as SEI and AUD) need to be skipped.
An aside: at first, when filling in the RTPFragmentationHeader, I removed the NALU start codes from the data.
I thought that since RTPFragmentationHeader locates data by offset and length anyway, it would not matter whether the start codes were present. But then I found out that I was wrong.
The start codes are not used downstream, but webrtc::H264BitstreamParser needs them when parsing the stream and the QP value.
Without the start codes, the correct QP value cannot be read, so they must not be removed. For details, read the source code of H264BitstreamParser.
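Skipping unwanted NALU types comes down to checking the low 5 bits of the first NALU byte. The sketch below assumes the NALU offsets have already been located; KeepSendableNalus is a hypothetical helper, while the type values 6 (SEI) and 9 (AUD) come from the H.264 specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kNaluTypeSei = 6;  // supplemental enhancement information
constexpr uint8_t kNaluTypeAud = 9;  // access unit delimiter

// nal_unit_type is stored in the low 5 bits of the NALU header byte.
inline uint8_t NaluType(uint8_t header_byte) {
  return header_byte & 0x1F;
}

// Returns the offsets of the NALUs worth describing in the fragmentation
// header; SEI and AUD units are dropped from the list. The buffer is not
// modified, so the start codes stay available to the bitstream parser.
std::vector<size_t> KeepSendableNalus(const uint8_t* data, size_t size,
                                      const std::vector<size_t>& nalu_offsets) {
  std::vector<size_t> kept;
  for (size_t offset : nalu_offsets) {
    assert(offset < size);
    const uint8_t type = NaluType(data[offset]);
    if (type == kNaluTypeSei || type == kNaluTypeAud) continue;
    kept.push_back(offset);
  }
  return kept;
}
```

For a frame that starts with an AUD and an SEI followed by SPS (0x67) and an IDR slice (0x65), only the last two offsets remain.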

Streams with pred_weight_table unsupported

After switching to x264, I found this error repeated many times in the debug-level log. It comes from

third_party\webrtc\common_video\h264\h264_bitstream_parser.cc

My solution is to set the following two x264 parameters:

[x264_param_t].analyse.i_weighted_pred = X264_WEIGHTP_NONE;  // no weighted prediction for P-frames
[x264_param_t].analyse.b_weighted_bipred = 0;  // boolean flag: no implicit weighting for B-frames

To be honest, I don't know what the impact of turning these off is. If you know, please let me know.
OK, that's about it. This article covered the main ideas and precautions for modifying the H264Encoder module of WebRTC and adding the x264 encoder.
There are actually many smaller topics in here that each deserve an article of their own. For example:

WebRTC's video transmission strategy includes frame rate, resolution, and balance modes. What is the difference between them?
Under a given sending strategy, which factors cause the resolution or frame rate to be raised or lowered?
Where does the real-time bit rate come from?
How is the average QP calculated, and what are the limits on upward and downward adjustment?
How does WebRTC detect CPU overload (the detection strategies differ for software and hardware encoding)?
I strongly recommend taking the time to read the WebRTC implementation code; there is a lot worth learning.