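WebRTC implements peer-to-peer audio and video communication, and several of its timing-related issues are worth summarizing. Before reading on, it is recommended to read https://xie.infoq.cn/article/738b8293dce86f7c8748e2629 to understand the critical path of video transmission. WebRTC is an asynchronous system: the two endpoints do not need to synchronize their clocks. This article examines how WebRTC solves two time-related problems: 1. audio/video synchronization, and 2. delay-based bandwidth estimation.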
2. NTP (Network Time Protocol) time: global time information, measured as the time elapsed from 1/1/1900-00:00h to the present.
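These three time coordinates all measure time; they differ only in how they describe it. For example, the absolute time 2020-08-05T06:08:52+00:00 is expressed in each of them as follows.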
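To make the relation between the coordinates concrete, here is a minimal sketch of two of the conversions. The function names are hypothetical, not WebRTC APIs, and the 90 kHz video clock is an assumption taken from the `(u32)ntp_time_ms_ * 90` comment on webrtc::VideoFrame later in this article. The internal timestamp depends on when the system started, so it cannot be derived from the wall-clock time and is left out.

```cpp
#include <cstdint>

// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
constexpr uint64_t kNtpUnixOffsetSeconds = 2208988800ULL;

// Hypothetical helper: Unix time in seconds -> NTP time in seconds.
constexpr uint64_t UnixToNtpSeconds(uint64_t unix_seconds) {
  return unix_seconds + kNtpUnixOffsetSeconds;
}

// Hypothetical helper: NTP time in milliseconds -> 32-bit RTP timestamp on an
// assumed 90 kHz clock. The product is truncated to 32 bits, so it wraps.
constexpr uint32_t NtpMsToRtp90k(uint64_t ntp_ms) {
  return static_cast<uint32_t>(ntp_ms * 90);
}
```

For 2020-08-05T06:08:52+00:00 the Unix time is 1596607732 s, so `UnixToNtpSeconds(1596607732)` yields 3805596532 s in NTP time.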
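When a video frame is captured, the capturer thread records the three timestamps that belong to that frame; the RTP timestamp is the most important of them. In the packet pacer module, a preset offset is added to this rtp_timestamp, and the result is sent as the final RTP timestamp. The offset applies to the whole RTP time coordinate system, so every RTP timestamp that leaves the sender carries it.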
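The effect of such an offset can be sketched as follows. The helper names are hypothetical. The point is that a constant offset shifts every timestamp by the same amount modulo 2^32, so the distance between any two frames is unchanged even when the shifted values wrap around.

```cpp
#include <cstdint>

// Hypothetical helper: shift a capture-side RTP timestamp by a per-stream
// offset. Unsigned 32-bit arithmetic wraps modulo 2^32, as RTP expects.
constexpr uint32_t ApplyRtpOffset(uint32_t capture_rtp, uint32_t offset) {
  return capture_rtp + offset;
}

// Distance in ticks between two RTP timestamps, also modulo 2^32.
constexpr uint32_t RtpDelta(uint32_t later, uint32_t earlier) {
  return later - earlier;
}
```

With an offset of 0xFFFFFF00, capture timestamps 1000 and 4000 both wrap past zero, yet `RtpDelta` of the shifted values is still 3000 ticks.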
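One purpose of the RTCP SR (sender report) is time alignment: it maps the RTP time of a stream to NTP time. Every stream is aligned to the sender's NTP time, which gives the receiver a common time base.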
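The alignment can be illustrated with a minimal sketch that maps an RTP timestamp to sender NTP time using the (NTP, RTP) pair from one sender report. The struct and function names are hypothetical, and the sketch assumes the clock rate is known in advance (for example 90000 for video); a production implementation may instead estimate the rate from several consecutive reports.

```cpp
#include <cstdint>

// The (NTP, RTP) pair carried by one RTCP sender report.
struct SenderReportMapping {
  int64_t ntp_ms;   // sender NTP time of the report, in milliseconds
  uint32_t rtp_ts;  // RTP timestamp that corresponds to ntp_ms
};

// Hypothetical helper: estimate the sender NTP time (ms) of an RTP timestamp.
// Assumes a known clock rate and that rtp_ts lies within 2^31 ticks of the SR.
inline int64_t RtpToSenderNtpMs(const SenderReportMapping& sr, uint32_t rtp_ts,
                                int clock_rate_hz) {
  // The signed difference handles wraparound and packets older than the SR.
  const int32_t diff_ticks = static_cast<int32_t>(rtp_ts - sr.rtp_ts);
  return sr.ntp_ms + static_cast<int64_t>(diff_ticks) * 1000 / clock_rate_hz;
}
```

Applying the same function to an audio stream and a video stream, each with its own report and clock rate, places frames of both streams on the sender's single NTP timeline.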
The RTCP SR format is as follows:
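        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
header |V=2|P|    RC   |   PT=SR=200   |             length            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SSRC of sender                        |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
sender |              NTP timestamp, most significant word             |
info   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             NTP timestamp, least significant word             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         RTP timestamp                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     sender's packet count                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      sender's octet count                     |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_1 (SSRC of first source)                 |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  1    | fraction lost |       cumulative number of packets lost       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           extended highest sequence number received           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      interarrival jitter                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         last SR (LSR)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   delay since last SR (DLSR)                  |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_2 (SSRC of second source)                |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  2    :                               ...                             :
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
       |                  profile-specific extensions                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+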
The sender report packet consists of three sections, possibly
followed by a fourth profile-specific extension section if defined.
The first section, the header, is 8 octets long. The fields have the
following meaning:
version (V): 2 bits
Identifies the version of RTP, which is the same in RTCP packets
as in RTP data packets. The version defined by this specification
is two (2).
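As an illustration of the layout above, the following sketch reads the sender info section out of a raw SR packet. The struct and function names are hypothetical and are not WebRTC's own parser. It assumes the packet starts at the RTCP header and only checks the version and packet type.

```cpp
#include <cstddef>
#include <cstdint>

struct SenderInfo {
  uint32_t sender_ssrc;
  uint32_t ntp_seconds;    // NTP timestamp, most significant word
  uint32_t ntp_fraction;   // NTP timestamp, least significant word (1/2^32 s)
  uint32_t rtp_timestamp;
};

// Reads a big-endian (network order) 32-bit value.
inline uint32_t ReadBE32(const uint8_t* p) {
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
         (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

// Hypothetical parser: extracts the sender info from a raw RTCP SR packet.
// Returns false if the buffer is too short or is not a version-2 SR.
inline bool ParseSenderInfo(const uint8_t* data, size_t size, SenderInfo* out) {
  const size_t kHeaderAndSenderInfoSize = 28;  // 8-octet header + 20 octets
  if (size < kHeaderAndSenderInfoSize) return false;
  if ((data[0] >> 6) != 2 || data[1] != 200) return false;
  out->sender_ssrc = ReadBE32(data + 4);
  out->ntp_seconds = ReadBE32(data + 8);
  out->ntp_fraction = ReadBE32(data + 12);
  out->rtp_timestamp = ReadBE32(data + 16);
  return true;
}

// Converts the 64-bit NTP timestamp to milliseconds since 1900-01-01.
inline uint64_t NtpToMs(uint32_t seconds, uint32_t fraction) {
  return uint64_t{seconds} * 1000 + ((uint64_t{fraction} * 1000) >> 32);
}
```

The NTP milliseconds and the RTP timestamp obtained this way form the pair a receiver needs to place that stream on the sender's NTP timeline.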
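As the figure above shows, frames that reach the receiver after network transmission may have experienced jitter and reordering, for example frames 2/3/4 of stream1. Using RTCP SR and its buffer design, the receiver works in pull mode: it takes rendering as the end point and works backward to derive the delay for fetching each frame from the frame queue. For a single stream, this delay consists of render delay + decode delay + jitter delay. Synchronizing multiple streams additionally requires the relative transmission delay between the streams (see RtpStreamsSynchronizer), which finally yields the frame-fetch delay for each stream.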
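The delay bookkeeping described above can be sketched as follows. This is a simplified illustration with hypothetical names, not the logic of RtpStreamsSynchronizer itself. It assumes capture times have already been mapped to sender NTP time through RTCP SR, and it applies the whole correction in one step, whereas a real implementation would likely smooth and limit the adjustment.

```cpp
#include <cstdint>

// Per-stream delay components, in milliseconds.
struct StreamDelays {
  int64_t jitter_ms;
  int64_t decode_ms;
  int64_t render_ms;
};

// Single stream: delay between receiving a frame and rendering it.
inline int64_t TargetDelayMs(const StreamDelays& d) {
  return d.jitter_ms + d.decode_ms + d.render_ms;
}

// One observation per stream: capture time on the sender's NTP timeline and
// arrival time on the receiver's local clock.
struct Measurement {
  int64_t capture_ntp_ms;
  int64_t receive_ms;
};

// How much later video arrives than audio, relative to their capture times.
// Positive means video suffered more transmission delay than audio.
inline int64_t RelativeDelayMs(const Measurement& audio, const Measurement& video) {
  return (video.receive_ms - audio.receive_ms) -
         (video.capture_ntp_ms - audio.capture_ntp_ms);
}

// Extra delay to add to the audio or the video stream so both play in sync.
inline void ExtraDelaysMs(int64_t relative_delay_ms, int64_t audio_target_ms,
                          int64_t video_target_ms, int64_t* out_audio,
                          int64_t* out_video) {
  // Total lag of video behind audio if nothing is adjusted.
  const int64_t diff = relative_delay_ms + video_target_ms - audio_target_ms;
  *out_audio = diff > 0 ? diff : 0;
  *out_video = diff < 0 ? -diff : 0;
}
```

Only the stream that would otherwise play early is delayed; the other keeps its own single-stream target delay.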
RTP format
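 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|       contributing source (CSRC) identifiers (if mixed)       |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  header extension (optional)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               payload header (format dependent)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         payload data                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+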
The header extension field carries an extension of the following form:
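 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       0xBE    |    0xDE       |           length=1            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | L=1   |transport-wide sequence number | zero padding  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+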
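Tip: note that an RTP packet carries two kinds of sequence number: the per-stream sequence number in the RTP header and the transport-wide sequence number in the header extension.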
When sending, the sender stamps every RTP packet with a transport-wide sequence number (PacketRouter::SendPacket), for example seq=53, 54, 55.
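The extension layout shown above can be written and read back with a few lines of code. The function names below are hypothetical, and the sketch assumes the extension block contains only this one element; `id` stands for whatever 4-bit extension ID the two sides negotiated.

```cpp
#include <cstddef>
#include <cstdint>

const size_t kExtensionBlockSize = 8;  // 4-byte header + one 32-bit word

// Hypothetical writer: fills `out` (8 bytes) with a one-byte-header extension
// block that carries only the transport-wide sequence number.
inline void WriteTransportSeqExtension(uint8_t id, uint16_t seq, uint8_t* out) {
  out[0] = 0xBE;  // one-byte-header marker, first byte
  out[1] = 0xDE;  // one-byte-header marker, second byte
  out[2] = 0x00;  // length in 32-bit words, high byte
  out[3] = 0x01;  // length in 32-bit words, low byte (= 1)
  out[4] = static_cast<uint8_t>((id << 4) | 0x01);  // ID, then L=1 (2 data bytes)
  out[5] = static_cast<uint8_t>(seq >> 8);
  out[6] = static_cast<uint8_t>(seq & 0xFF);
  out[7] = 0x00;  // zero padding up to the 32-bit boundary
}

// Hypothetical reader for the block written above. Returns false if the
// block does not have the expected shape or the ID does not match.
inline bool ReadTransportSeqExtension(const uint8_t* in, uint8_t id, uint16_t* seq) {
  if (in[0] != 0xBE || in[1] != 0xDE || in[2] != 0x00 || in[3] != 0x01) return false;
  if ((in[4] >> 4) != id || (in[4] & 0x0F) != 0x01) return false;
  *seq = static_cast<uint16_t>((in[5] << 8) | in[6]);
  return true;
}
```

Writing seq=53 with ID 5 produces the bytes BE DE 00 01 51 00 35 00.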
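On the receiver, the RemoteEstimatorProxy module is responsible for transport-level feedback: it periodically reports packet arrival times back to the sender. The transport feedback format follows detailed rules, defined at https://tools.ietf.org/id/draft-dt-rmcat-feedback-message-04.html#rfc.section.3.1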
A well-written annotated walkthrough is available at https://blog.jianchihu.net/webrtc-research-transport-cc-rtp-rtcp.html
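 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P| FMT=CCFB|   PT = 205    |            length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SSRC of packet sender                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   SSRC of 1st media source                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           begin_seq           |            end_seq            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|L|ECN|   Arrival time offset   | ...                           .
.                                                               .
.                                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   SSRC of nth media source                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           begin_seq           |            end_seq            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|L|ECN|   Arrival time offset   | ...                           |
.                                                               .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Report Timestamp (32bits)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+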
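Once the sender has matched the reported arrival times with its own send times, it can compute the signal that delay-based estimation relies on. The sketch below uses hypothetical names and works per packet, which is a simplification; a real estimator would likely group packets and filter the result. It shows why no clock synchronization is needed: only differences are used, so a constant offset between the two clocks cancels out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Send time (sender clock) and arrival time (receiver clock) of one packet,
// both in milliseconds. The two clocks do not need to be synchronized.
struct PacketTiming {
  int64_t send_ms;
  int64_t arrival_ms;
};

// Delay variation between consecutive packets:
//   d(i) = (arrival(i) - arrival(i-1)) - (send(i) - send(i-1))
inline std::vector<int64_t> DelayVariationMs(const std::vector<PacketTiming>& p) {
  std::vector<int64_t> d;
  for (size_t i = 1; i < p.size(); ++i) {
    d.push_back((p[i].arrival_ms - p[i - 1].arrival_ms) -
                (p[i].send_ms - p[i - 1].send_ms));
  }
  return d;
}
```

Values that stay positive over time suggest that packets are queuing somewhere on the path, which a delay-based estimator can treat as a sign of overuse.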
RTCP transport feedback is usually the most frequent traffic on the RTCP channel, and WebRTC handles its transmission specially. The following parameters matter:
max_interval = 250ms //maximum feedback period
min_interval = 50ms //minimum feedback period
rtcp_ratio = 5% //share of bandwidth used by feedback
Avg_feedback_size = 68bytes //average size of one feedback packet
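The interval for sending RTCP transport feedback is kept within [50ms, 250ms] and adjusted dynamically inside that range according to the current bandwidth, trying to hold the bandwidth share of RTCP transport feedback at 5%. The bounds follow directly: 68 bytes x 8 / 0.25s = 2176bps and 68 bytes x 8 / 0.05s = 10880bps. Feedback alone therefore consumes [2176bps, 10880bps], which is a considerable overhead.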
class webrtc::VideoFrame {
  uint16_t id_;             // picture id
  uint32_t timestamp_rtp_;  // RTP timestamp, (u32)ntp_time_ms_ * 90
  int64_t ntp_time_ms_;     // NTP timestamp, capture time since 1/1/1900-00:00h
  int64_t timestamp_us_;    // internal timestamp, capture time since system start, wraps around at 49.71 days
};
VideoStreamEncoder::OnFrame         // calculate capture timing
VideoStreamEncoder::OnEncodedImage  // fill capture timing
RtpVideoSender::OnEncodedImage      // timestamp_rtp_ + random value
class webrtc::EncodedImage {
  // RTP Video Timing extension
  struct Timing {
    uint8_t flags = VideoSendTiming::kInvalid;
    int64_t encode_start_ms = 0;          // frame encoding start time, based on ntp_time_ms_
    int64_t encode_finish_ms = 0;         // frame encoding end time, based on ntp_time_ms_
    int64_t packetization_finish_ms = 0;  // encoded frame packetization time, based on ntp_time_ms_
    int64_t pacer_exit_ms = 0;            // packet send time when leaving the pacer, based on ntp_time_ms_
    int64_t network_timestamp_ms = 0;     // reserved for network node
    int64_t network2_timestamp_ms = 0;    // reserved for network node
    int64_t receive_start_ms = 0;
    int64_t receive_finish_ms = 0;
  } timing_;
  uint32_t timestamp_rtp_;   // same as capturer.timestamp_rtp_
  int64_t ntp_time_ms_;      // same as capturer.ntp_time_ms_
  int64_t capture_time_ms_;  // same as capturer.capture_time_ms_
};
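Because the sender-side fields of the Timing struct share one time base, simple subtraction shows where a frame spent its time before leaving the sender. The sketch below is illustrative only: the two structs and the function are hypothetical, and they mirror just the four sender-side fields listed above.

```cpp
#include <cstdint>

// Subset of the sender-side fields of EncodedImage::Timing used here.
struct SendTiming {
  int64_t encode_start_ms;
  int64_t encode_finish_ms;
  int64_t packetization_finish_ms;
  int64_t pacer_exit_ms;
};

// Hypothetical breakdown of the time a frame spent in each sender stage.
struct SendDelayBreakdown {
  int64_t encode_ms;
  int64_t packetize_ms;
  int64_t pacer_ms;
};

inline SendDelayBreakdown BreakDownSendDelay(const SendTiming& t) {
  SendDelayBreakdown b;
  b.encode_ms = t.encode_finish_ms - t.encode_start_ms;
  b.packetize_ms = t.packetization_finish_ms - t.encode_finish_ms;
  b.pacer_ms = t.pacer_exit_ms - t.packetization_finish_ms;
  return b;
}
```

The receive_start_ms and receive_finish_ms fields are on the receiver's clock, so they should only be subtracted from each other, not from the sender-side values.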
class webrtc::RtpPacketToSend {
  // RTP Header.
  bool marker_;               // frame end marker
  uint16_t sequence_number_;  // RTP sequence number, starts at random(1,32767)
  uint32_t timestamp_;        // capturer timestamp_rtp_ + u32.random()
  uint32_t ssrc_;             // Synchronization Source, specifies the media source
  int64_t capture_time_ms_;   // same as capturer.capture_time_ms_
};
Receiver side
class webrtc::RtpPacketReceived {
  NtpTime capture_time_;
  int64_t arrival_time_ms_;   // RTP packet arrival time, local internal timestamp
  // RTP Header.
  bool marker_;               // frame end marker
  uint16_t sequence_number_;  // RTP sequence number, starts at random(1,32767)
  uint32_t timestamp_;        // sender's RTP timestamp maintained by RTPSenderVideo
  uint32_t ssrc_;             // Synchronization Source, specifies the media source
};
RtpVideoStreamReceiver::ReceivePacket / OnReceivedPayloadData
struct webrtc::RTPHeader {
  bool markerBit;
  uint16_t sequenceNumber;       // RTP sequence number, set by the sender per RTP packet
  uint32_t timestamp;            // sender's RTP timestamp
  uint32_t ssrc;
  RTPHeaderExtension extension;  // contains PlayoutDelay and VideoSendTiming if present
};
class webrtc::RtpDepacketizer::ParsedPayload {
  RTPVideoHeader video;
  const uint8_t* payload;
  size_t payload_length;
};
class webrtc::RTPVideoHeader {
  bool is_first_packet_in_frame;
  bool is_last_packet_in_frame;
  PlayoutDelay playout_delay;    // playout delay extension
  VideoSendTiming video_timing;  // Video Timing extension, aligned with the sender's webrtc::EncodedImage::timing
};
class webrtc::VCMPacket {
  uint32_t timestamp;  // sender's RTP timestamp
  int64_t ntp_time_ms_;
  uint16_t seqNum;
  RTPVideoHeader video_header;
  RtpPacketInfo packet_info;
};
class webrtc::RtpPacketInfo {
  uint32_t ssrc_;
  uint32_t rtp_timestamp_;   // sender's RTP timestamp
  absl::optional<AbsoluteCaptureTime> absolute_capture_time_;
  int64_t receive_time_ms_;  // packet receive time, local internal timestamp
};
class webrtc::video_coding::RtpFrameObject : public EncodedImage {
  RTPVideoHeader rtp_video_header_;
  uint16_t first_seq_num_;
  uint16_t last_seq_num_;
  int64_t last_packet_received_time_;
  int64_t _renderTimeMs;
  // inherited from webrtc::EncodedImage
  uint32_t timestamp_rtp_;
  int64_t ntp_time_ms_;
  int64_t capture_time_ms_;
};
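Since first_seq_num_ and last_seq_num_ are 16-bit RTP sequence numbers, any arithmetic on them has to allow for wraparound at 65535. A minimal sketch with hypothetical helper names:

```cpp
#include <cstdint>

// Number of RTP packets in a frame, given the sequence numbers of its first
// and last packet. The cast back to 16 bits handles wraparound at 65535.
inline uint16_t PacketsInFrame(uint16_t first_seq_num, uint16_t last_seq_num) {
  return static_cast<uint16_t>(last_seq_num - first_seq_num + 1);
}

// True if `a` is newer than `b`, treating the 16-bit space as circular.
inline bool IsNewerSeq(uint16_t a, uint16_t b) {
  return a != b && static_cast<uint16_t>(a - b) < 0x8000;
}
```

A frame that spans sequence numbers 65534, 65535, 0, 1 therefore counts as four packets, and 0 is treated as newer than 65535.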
