| draft-ietf-quic-recovery-06.txt | draft-ietf-quic-recovery-07.txt | |||
|---|---|---|---|---|
| QUIC J. Iyengar, Ed. | QUIC J. Iyengar, Ed. | |||
| Internet-Draft I. Swett, Ed. | Internet-Draft I. Swett, Ed. | |||
| Intended status: Standards Track Google | Intended status: Standards Track Google | |||
| Expires: March 26, 2018 September 22, 2017 | Expires: May 18, 2018 November 14, 2017 | |||
| QUIC Loss Detection and Congestion Control | QUIC Loss Detection and Congestion Control | |||
| draft-ietf-quic-recovery-06 | draft-ietf-quic-recovery-07 | |||
| Abstract | Abstract | |||
| This document describes loss detection and congestion control | This document describes loss detection and congestion control | |||
| mechanisms for QUIC. | mechanisms for QUIC. | |||
| Note to Readers | Note to Readers | |||
| Discussion of this draft takes place on the QUIC working group | Discussion of this draft takes place on the QUIC working group | |||
| mailing list (quic@ietf.org), which is archived at | mailing list (quic@ietf.org), which is archived at | |||
| https://mailarchive.ietf.org/arch/search/?email_list=quic . | https://mailarchive.ietf.org/arch/search/?email_list=quic [1]. | |||
| Working Group information can be found at https://github.com/quicwg ; | Working Group information can be found at https://github.com/quicwg | |||
| source code and issues list for this draft can be found at | [2]; source code and issues list for this draft can be found at | |||
| https://github.com/quicwg/base-drafts/labels/recovery . | https://github.com/quicwg/base-drafts/labels/recovery [3]. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on March 26, 2018. | This Internet-Draft will expire on May 18, 2018. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 | 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 | |||
| 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 3 | 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 3 | |||
| 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 | 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 | |||
| 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 | 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 | |||
| 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 5 | 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 | 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 | 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 | |||
| 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 5 | |||
| 3.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 6 | 3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.2.1. Constants of interest . . . . . . . . . . . . . . . . 6 | 3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2.2. Variables of interest . . . . . . . . . . . . . . . . 7 | 3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2.3. Initialization . . . . . . . . . . . . . . . . . . . 8 | 3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2.4. On Sending a Packet . . . . . . . . . . . . . . . . . 8 | 3.3.1. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 9 | 3.3.2. Retransmission Timeout . . . . . . . . . . . . . . . 9 | |||
| 3.2.6. On Packet Acknowledgment . . . . . . . . . . . . . . 9 | 3.3.3. Handshake Timeout . . . . . . . . . . . . . . . . . . 10 | |||
| 3.2.7. Setting the Loss Detection Alarm . . . . . . . . . . 10 | 3.4. Algorithm Details . . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.2.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 12 | 3.4.1. Constants of interest . . . . . . . . . . . . . . . . 10 | |||
| 3.2.9. Detecting Lost Packets . . . . . . . . . . . . . . . 13 | 3.4.2. Variables of interest . . . . . . . . . . . . . . . . 11 | |||
| 3.3. Discussion . . . . . . . . . . . . . . . . . . . . . . . 14 | 3.4.3. Initialization . . . . . . . . . . . . . . . . . . . 12 | |||
| 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 14 | 3.4.4. On Sending a Packet . . . . . . . . . . . . . . . . . 13 | |||
| 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 15 | 3.4.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 13 | |||
| 4.2. Congestion Avoidance . . . . . . . . . . . . . . . . . . 15 | 3.4.6. On Packet Acknowledgment . . . . . . . . . . . . . . 14 | |||
| 4.3. Recovery Period . . . . . . . . . . . . . . . . . . . . . 15 | 3.4.7. Setting the Loss Detection Alarm . . . . . . . . . . 15 | |||
| 4.4. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 15 | 3.4.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 17 | |||
| 4.5. Retransmission Timeout . . . . . . . . . . . . . . . . . 15 | 3.4.9. Detecting Lost Packets . . . . . . . . . . . . . . . 17 | |||
| 4.6. Pacing Rate . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.7. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 16 | 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.7.1. Constants of interest . . . . . . . . . . . . . . . . 16 | 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.7.2. Variables of interest . . . . . . . . . . . . . . . . 16 | 4.2. Congestion Avoidance . . . . . . . . . . . . . . . . . . 19 | |||
| 4.7.3. Initialization . . . . . . . . . . . . . . . . . . . 17 | 4.3. Recovery Period . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.7.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 17 | 4.4. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.7.5. On Packet Acknowledgement . . . . . . . . . . . . . . 17 | 4.5. Retransmission Timeout . . . . . . . . . . . . . . . . . 20 | |||
| 4.7.6. On Packets Lost . . . . . . . . . . . . . . . . . . . 17 | 4.6. Pacing Rate . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.7.7. On Retransmission Timeout Verified . . . . . . . . . 18 | 4.7. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 | 4.7.1. Constants of interest . . . . . . . . . . . . . . . . 20 | |||
| 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 4.7.2. Variables of interest . . . . . . . . . . . . . . . . 20 | |||
| 6.1. Normative References . . . . . . . . . . . . . . . . . . 18 | 4.7.3. Initialization . . . . . . . . . . . . . . . . . . . 21 | |||
| 6.2. Informative References . . . . . . . . . . . . . . . . . 18 | 4.7.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 21 | |||
| Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 19 | 4.7.5. On Packet Acknowledgement . . . . . . . . . . . . . . 21 | |||
| Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 19 | 4.7.6. On Packets Lost . . . . . . . . . . . . . . . . . . . 22 | |||
| B.1. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 19 | 4.7.7. On Retransmission Timeout Verified . . . . . . . . . 22 | |||
| B.2. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 19 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 | |||
| B.3. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 19 | 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| B.4. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 20 | 6.1. Normative References . . . . . . . . . . . . . . . . . . 23 | |||
| B.5. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 20 | 6.2. Informative References . . . . . . . . . . . . . . . . . 24 | |||
| B.6. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 20 | 6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| B.7. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 20 | Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 24 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20 | Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 24 | |||
| B.1. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 24 | ||||
| B.2. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 24 | ||||
| B.3. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 25 | ||||
| B.4. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 25 | ||||
| B.5. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 25 | ||||
| B.6. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 25 | ||||
| B.7. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 25 | ||||
| B.8. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 25 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25 | ||||
| 1. Introduction | 1. Introduction | |||
| QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | |||
| on decades of transport and security experience, and implements | on decades of transport and security experience, and implements | |||
| mechanisms that make it attractive as a modern general-purpose | mechanisms that make it attractive as a modern general-purpose | |||
| transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | |||
| QUIC implements the spirit of known TCP loss recovery mechanisms, | QUIC implements the spirit of known TCP loss recovery mechanisms, | |||
| described in RFCs, various Internet-drafts, and also those prevalent | described in RFCs, various Internet-drafts, and also those prevalent | |||
| skipping to change at page 4, line 13 ¶ | skipping to change at page 4, line 21 ¶ | |||
| below. | below. | |||
| o Retransmittable frames are frames requiring reliable delivery. | o Retransmittable frames are frames requiring reliable delivery. | |||
| The most common are STREAM frames, which typically contain | The most common are STREAM frames, which typically contain | |||
| application data. | application data. | |||
| o Crypto handshake data is sent on stream 0, and uses the | o Crypto handshake data is sent on stream 0, and uses the | |||
| reliability machinery of QUIC underneath. | reliability machinery of QUIC underneath. | |||
| o ACK frames contain acknowledgment information. QUIC uses a SACK- | o ACK frames contain acknowledgment information. QUIC uses a SACK- | |||
| based scheme, where acks express up to 256 ranges. The ACK frame | based scheme, where acks express up to 256 ranges. | |||
| also includes a receive timestamp for each packet newly acked. | ||||
| 2.1. Relevant Differences Between QUIC and TCP | 2.1. Relevant Differences Between QUIC and TCP | |||
| Readers familiar with TCP's loss detection and congestion control | Readers familiar with TCP's loss detection and congestion control | |||
| will find algorithms here that parallel well-known TCP ones. | will find algorithms here that parallel well-known TCP ones. | |||
| Protocol differences between QUIC and TCP, however, contribute to | Protocol differences between QUIC and TCP, however, contribute to | |||
| algorithmic differences. We briefly describe these protocol | algorithmic differences. We briefly describe these protocol | |||
| differences below. | differences below. | |||
| 2.1.1. Monotonically Increasing Packet Numbers | 2.1.1. Monotonically Increasing Packet Numbers | |||
| skipping to change at page 5, line 30 ¶ | skipping to change at page 5, line 37 ¶ | |||
| between when a packet is received and when the corresponding ACK is | between when a packet is received and when the corresponding ACK is | |||
| sent. This allows the receiver of the ACK to adjust for receiver | sent. This allows the receiver of the ACK to adjust for receiver | |||
| delays, specifically the delayed ack timer, when estimating the path | delays, specifically the delayed ack timer, when estimating the path | |||
| RTT. This mechanism also allows a receiver to measure and report the | RTT. This mechanism also allows a receiver to measure and report the | |||
| delay from when a packet was received by the OS kernel, which is | delay from when a packet was received by the OS kernel, which is | |||
| useful in receivers which may incur delays such as context-switch | useful in receivers which may incur delays such as context-switch | |||
| latency before a userspace QUIC receiver processes a received packet. | latency before a userspace QUIC receiver processes a received packet. | |||
| 3. Loss Detection | 3. Loss Detection | |||
| 3.1. Overview | QUIC senders use both ack information and timeouts to detect lost | |||
| packets, and this section provides a description of these algorithms. | ||||
| Estimating the network round-trip time (RTT) is critical to these | ||||
| algorithms and is described first. | ||||
| QUIC uses a combination of ack information and alarms to detect lost | 3.1. Computing the RTT estimate | |||
| packets. An unacknowledged QUIC packet is marked as lost in one of | ||||
| the following ways: | ||||
| o A packet is marked as lost if at least one packet that was sent a | (To be filled) | |||
| threshold number of packets (kReorderingThreshold) after it has | ||||
| been acknowledged. This indicates that the unacknowledged packet | ||||
| is either lost or reordered beyond the specified threshold. This | ||||
| mechanism combines both TCP's FastRetransmit and FACK mechanisms. | ||||
| o If a packet is near the tail, where fewer than | 3.2. Ack-based Detection | |||
| kReorderingThreshold packets are sent after it, the sender cannot | ||||
| expect to detect loss based on the previous mechanism. In this | ||||
| case, a sender uses both ack information and an alarm to detect | ||||
| loss. Specifically, when the last sent packet is acknowledged, | ||||
| the sender waits a short period of time to allow for reordering | ||||
| and then marks any unacknowledged packets as lost. This mechanism | ||||
| is based on the Linux implementation of TCP Early Retransmit. | ||||
| o If a packet is sent at the tail, there are no packets sent after | Ack-based loss detection implements the spirit of TCP's Fast | |||
| it, and the sender cannot use ack information to detect its loss. | Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | |||
| recovery [RFC6675]. This section provides an overview of how these | ||||
| algorithms are implemented in QUIC. | ||||
| The sender therefore relies on an alarm to detect such tail | (TODO: Define unacknowledged packet, ackable packet, outstanding | |||
| losses. This mechanism is based on TCP's Tail Loss Probe. | bytes.) | |||
| o If all else fails, a Retransmission Timeout (RTO) alarm is always | 3.2.1. Fast Retransmit | |||
| set when any retransmittable packet is outstanding. When this | ||||
| alarm fires, all unacknowledged packets are marked as lost. | ||||
| o Instead of a packet threshold to tolerate reordering, a QUIC | An unacknowledged packet is marked as lost when an acknowledgment is | |||
| sender may use a time threshold. This allows for senders to be | received for a packet that was sent a threshold number of packets | |||
| tolerant of short periods of significant reordering. In this | (kReorderingThreshold) after the unacknowledged packet. Receipt of | |||
| mechanism, a QUIC sender marks a packet as lost when a larger | the ack indicates that a later packet was received, while | |||
| packet number is acknowledged and a threshold amount of time has | kReorderingThreshold provides some tolerance for reordering of | |||
| passed since the packet was sent. | packets in the network. | |||
| o Handshake packets, which contain STREAM frames for stream 0, are | The RECOMMENDED initial value for kReorderingThreshold is 3. | |||
| critical to QUIC transport and crypto negotiation, so a separate | ||||
| alarm period is used for them. | ||||
| 3.2. Algorithm Details | We derive this default from recommendations for TCP loss recovery | |||
| [RFC5681] [RFC6675]. It is possible for networks to exhibit higher | ||||
| degrees of reordering, causing a sender to detect spurious losses. | ||||
| Detecting spurious losses leads to unnecessary retransmissions and | ||||
| may result in degraded performance due to the actions of the | ||||
| congestion controller upon detecting loss. Implementers MAY use | ||||
| algorithms developed for TCP, such as TCP-NCR [RFC4653], to improve | ||||
| QUIC's reordering resilience, though care should be taken to map TCP | ||||
| specifics to QUIC correctly. Similarly, using time-based loss | ||||
| detection to deal with reordering, such as in PR-TCP, should be more | ||||
| readily usable in QUIC. Making QUIC deal with such networks is | ||||
| important open research, and implementers are encouraged to explore | ||||
| this space. | ||||
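As an illustration only, the packet-threshold rule of Fast Retransmit can be sketched in Python. The function name, the representation of unacknowledged packets as a set of packet numbers, and the use of ">=" for the comparison are assumptions made for this sketch and are not taken from the draft.

```python
K_REORDERING_THRESHOLD = 3  # RECOMMENDED initial value from the text


def packets_lost_by_reordering(unacked, largest_acked,
                               threshold=K_REORDERING_THRESHOLD):
    """Return the unacknowledged packet numbers considered lost.

    Assumption: a packet is lost once a packet sent at least `threshold`
    packet numbers after it has been acknowledged.
    """
    return sorted(pn for pn in unacked if largest_acked - pn >= threshold)
```

With packets 1 through 4 unacknowledged and packet 5 newly acknowledged, this sketch reports packets 1 and 2 as lost; packets 3 and 4 remain within the reordering tolerance.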
| 3.2.1. Constants of interest | 3.2.2. Early Retransmit | |||
| Unacknowledged packets close to the tail may have fewer than | ||||
| kReorderingThreshold number of ackable packets sent after them. Loss | ||||
| of such packets cannot be detected via Fast Retransmit. To enable | ||||
| ack-based loss detection of such packets, receipt of an | ||||
| acknowledgment for the last outstanding ackable packet triggers the | ||||
| Early Retransmit process, as follows. | ||||
| If there are unacknowledged ackable packets still pending, they ought | ||||
| to be marked as lost. To compensate for the reduced reordering | ||||
| resilience, the sender SHOULD set an alarm for a small period of | ||||
| time. If the unacknowledged ackable packets are not acknowledged | ||||
| during this time, then these packets MUST be marked as lost. | ||||
| An endpoint SHOULD set the alarm such that a packet is marked as lost | ||||
| no earlier than 1.25 * max(SRTT, latest_RTT) after it was sent. | ||||
| Using max(SRTT, latest_RTT) protects against the following two cases: | ||||
| o the latest RTT sample is lower than the SRTT, perhaps due to | ||||
| reordering where the packet whose ack triggered the Early Retransmit | ||||
| process encountered a shorter path; | ||||
| o the latest RTT sample is higher than the SRTT, perhaps due to a | ||||
| sustained increase in the actual RTT, but the smoothed SRTT has | ||||
| not yet caught up. | ||||
| The 1.25 multiplier increases reordering resilience. Implementers | ||||
| MAY experiment with using other multipliers, bearing in mind that a | ||||
| lower multiplier reduces reordering resilience and increases spurious | ||||
| retransmissions, and a higher multiplier increases loss recovery | ||||
| delay. | ||||
| This mechanism is based on Early Retransmit for TCP [RFC5827]. | ||||
| However, [RFC5827] does not include the alarm described above. Early | ||||
| Retransmit is prone to spurious retransmissions due to its reduced | ||||
| reordering resilience without the alarm. This observation led Linux | ||||
| TCP implementers to implement an alarm for TCP as well, and this | ||||
| document incorporates this advancement. | ||||
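The Early Retransmit alarm computation described above can be sketched as follows. This is a minimal Python sketch; the function name, the parameter names, and the use of milliseconds as the time unit are assumptions for illustration.

```python
def early_retransmit_deadline_ms(time_sent_ms, srtt_ms, latest_rtt_ms,
                                 multiplier=1.25):
    """Earliest time at which the packet may be marked as lost.

    Taking max(SRTT, latest_RTT) covers both a latest sample that is
    lower than the SRTT and one that is higher than the SRTT.
    """
    return time_sent_ms + multiplier * max(srtt_ms, latest_rtt_ms)
```

For a packet sent at 1000 ms with an SRTT of 100 ms and a latest RTT sample of 80 ms, the sketch yields a deadline of 1125 ms.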
| 3.3. Timer-based Detection | ||||
| Timer-based loss detection implements the spirit of TCP's Tail Loss | ||||
| Probe and Retransmission Timeout mechanisms. | ||||
| 3.3.1. Tail Loss Probe | ||||
| The algorithm described in this section is an adaptation of the Tail | ||||
| Loss Probe algorithm proposed for TCP [TLP]. | ||||
| A packet sent at the tail is particularly vulnerable to slow loss | ||||
| detection, since acks of subsequent packets are needed to trigger | ||||
| ack-based detection. To ameliorate this weakness of tail packets, | ||||
| the sender schedules an alarm when the last ackable packet before | ||||
| quiescence is transmitted. When this alarm fires, a Tail Loss Probe | ||||
| (TLP) packet is sent to evoke an acknowledgement from the receiver. | ||||
| The alarm duration, or Probe Timeout (PTO), is set based on the | ||||
| following conditions: | ||||
| o If there is exactly one unacknowledged packet, PTO SHOULD be | ||||
| scheduled for max(2*SRTT, 1.5*SRTT+kDelayedAckTimeout). | ||||
| o If there is more than one unacknowledged packet, PTO SHOULD be | ||||
| scheduled for max(2*SRTT, 10ms). | ||||
| o If RTO is earlier, schedule a TLP alarm in its place. That is, | ||||
| PTO SHOULD be scheduled for min(RTO, PTO). | ||||
| kDelayedAckTimeout is the expected delayed ACK timer. When there is | ||||
| exactly one unacknowledged packet, the alarm duration includes time | ||||
| for an acknowledgment to be received, and additionally, a | ||||
| kDelayedAckTimeout period to compensate for the delayed | ||||
| acknowledgment timer at the receiver. | ||||
| The RECOMMENDED value for kDelayedAckTimeout is 25ms. | ||||
| (TODO: Add negotiability of delayed ack timeout.) | ||||
| A PTO value of at least 2*SRTT ensures that the ACK is overdue. | ||||
| Using a PTO of exactly 1*SRTT may generate spurious probes, and | ||||
| 2*SRTT is simply the next integral value of RTT. | ||||
| (TODO: These values of 2 and 1.5 are a bit arbitrary. Reconsider | ||||
| these.) | ||||
| If the Retransmission Timeout (RTO, Section 3.3.2) period is smaller | ||||
| than the computed PTO, then a PTO is scheduled for the smaller RTO | ||||
| period. | ||||
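The Probe Timeout rules above can be combined into one function. This is a hedged Python sketch: the function and constant names are hypothetical, times are assumed to be in milliseconds, and the caller is assumed to supply the current RTO period.

```python
K_DELAYED_ACK_TIMEOUT_MS = 25  # RECOMMENDED value from the text
K_MIN_TLP_TIMEOUT_MS = 10      # the 10ms floor from the text


def probe_timeout_ms(srtt_ms, unacked_count, rto_ms):
    """Compute the PTO for the given number of unacknowledged packets."""
    if unacked_count < 1:
        raise ValueError("a PTO is only scheduled with packets outstanding")
    if unacked_count == 1:
        # Leave room for the peer's delayed ack timer.
        pto = max(2 * srtt_ms, 1.5 * srtt_ms + K_DELAYED_ACK_TIMEOUT_MS)
    else:
        pto = max(2 * srtt_ms, K_MIN_TLP_TIMEOUT_MS)
    # If the RTO is earlier, the probe is scheduled in its place.
    return min(rto_ms, pto)
```

With an SRTT of 40 ms and a single unacknowledged packet, the delayed ack term dominates and the sketch returns 85 ms, provided the RTO period is longer than that.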
| To reduce latency, it is RECOMMENDED that the sender set and allow | ||||
| the TLP alarm to fire twice before setting an RTO alarm. In other | ||||
| words, when the TLP alarm fires the first time, a TLP packet is sent, | ||||
| and it is RECOMMENDED that the TLP alarm be scheduled for a second | ||||
| time. When the TLP alarm fires the second time, a second TLP packet | ||||
| is sent, and an RTO alarm SHOULD be scheduled (Section 3.3.2). | ||||
| A TLP packet SHOULD carry new data when possible. If new data is | ||||
| unavailable or new data cannot be sent due to flow control, a TLP | ||||
| packet MAY retransmit unacknowledged data to potentially reduce | ||||
| recovery time. Since a TLP alarm is used to send a probe into the | ||||
| network prior to establishing any packet loss, prior unacknowledged | ||||
| packets SHOULD NOT be marked as lost when a TLP alarm fires. | ||||
| A TLP packet MUST NOT be blocked by the sender's congestion | ||||
| controller. The sender MUST however count these bytes as additional | ||||
| bytes in flight, since a TLP adds network load without establishing | ||||
| packet loss. | ||||
| A sender will commonly not know that a packet being sent is a tail | ||||
| packet. Consequently, a sender may have to arm or adjust the TLP | ||||
| alarm on every sent ackable packet. | ||||
| 3.3.2. Retransmission Timeout | ||||
| A Retransmission Timeout (RTO) alarm is the final backstop for loss | ||||
| detection. The algorithm used in QUIC is based on the RTO algorithm | ||||
| for TCP [RFC5681] and is additionally resilient to spurious RTO | ||||
| events [RFC5682]. | ||||
| When the last TLP packet is sent, an alarm is scheduled for the RTO | ||||
| period. When this alarm fires, the sender sends two packets, to | ||||
| evoke acknowledgements from the receiver, and restarts the RTO alarm. | ||||
| Similar to TCP [RFC6298], the RTO period is set based on the | ||||
| following conditions: | ||||
| o When the final TLP packet is sent, the RTO period is set to | ||||
| max(SRTT + 4*RTTVAR, minRTO) | ||||
| o When an RTO alarm fires, the RTO period is doubled. | ||||
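The two conditions above amount to a floor on the initial period followed by exponential backoff. In this minimal Python sketch, the function name and the convention that rto_count holds the number of consecutive RTO alarm firings are assumptions; minRTO is taken to be the kMinRTOTimeout default of 200ms.

```python
K_MIN_RTO_TIMEOUT_MS = 200  # kMinRTOTimeout default


def rto_period_ms(srtt_ms, rttvar_ms, rto_count):
    """RTO period after `rto_count` consecutive RTO alarm firings."""
    base = max(srtt_ms + 4 * rttvar_ms, K_MIN_RTO_TIMEOUT_MS)
    # Each firing of the RTO alarm doubles the period.
    return base * 2 ** rto_count
```

For example, an SRTT of 100 ms and an RTTVAR of 50 ms give an initial period of 300 ms, which grows to 1200 ms after two consecutive RTO events.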
| The sender typically has incurred a high latency penalty by the time | ||||
| an RTO alarm fires, and this penalty increases exponentially in | ||||
| subsequent consecutive RTO events. Sending a single packet on an RTO | ||||
| event therefore makes the connection very sensitive to single packet | ||||
| loss. Sending two packets instead of one significantly increases | ||||
| resilience to packet drop in both directions, thus reducing the | ||||
| probability of consecutive RTO events. | ||||
| QUIC's RTO algorithm differs from TCP in that the firing of an RTO | ||||
| alarm is not considered a strong enough signal of packet loss. An | ||||
| RTO alarm fires only when there's a prolonged period of network | ||||
| silence, which could be caused by a change in the underlying network | ||||
| RTT. | ||||
| When an acknowledgment is received for a packet sent on an RTO event, | ||||
| any unacknowledged packets with lower packet numbers than those | ||||
| acknowledged MUST be marked as lost. | ||||
| A packet sent when an RTO alarm fires MAY carry new data if available | ||||
| or unacknowledged data to potentially reduce recovery time. Since | ||||
| this packet is sent as a probe into the network prior to establishing | ||||
| any packet loss, prior unacknowledged packets SHOULD NOT be marked as | ||||
| lost. | ||||
| A packet sent on an RTO alarm MUST NOT be blocked by the sender's | ||||
| congestion controller. A sender MUST however count these bytes as | ||||
| additional bytes in flight, since this packet adds network load | ||||
| without establishing packet loss. | ||||
| 3.3.3. Handshake Timeout | ||||
| Handshake packets, which contain STREAM frames for stream 0, are | ||||
| critical to QUIC transport and crypto negotiation, so a separate | ||||
| alarm is used for them. | ||||
| The handshake timeout SHOULD be set to twice the initial RTT. | ||||
| There are no prior RTT samples within this connection. However, this | ||||
| may be a resumed connection over the same network, in which case, a | ||||
| client SHOULD use the previous connection's final smoothed RTT value | ||||
| as the resumed connection's initial RTT. | ||||
| If no previous RTT is available, or if the network changes, the | ||||
| initial RTT SHOULD be set to 100ms. | ||||
| When the first handshake packet is sent, the sender SHOULD set an | ||||
| alarm for the handshake timeout period. | ||||
| When the alarm fires, the sender MUST retransmit all unacknowledged | ||||
| handshake frames. The sender SHOULD double the handshake timeout and | ||||
| set an alarm for this period. | ||||
| On each consecutive firing of the handshake alarm, the sender SHOULD | ||||
| double the handshake timeout period. | ||||
| When an acknowledgement is received for a handshake packet, the new | ||||
| RTT is computed and the alarm SHOULD be set for twice the newly | ||||
| computed smoothed RTT. | ||||
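One possible reading of the handshake timeout rules is sketched below in Python. The function names, the parameters, and the convention that handshake_count counts consecutive firings of the handshake alarm are assumptions for illustration. Resetting the timeout after an acknowledgement is modelled by passing the newly computed smoothed RTT with a count of zero.

```python
K_DEFAULT_INITIAL_RTT_MS = 100  # kDefaultInitialRtt default


def initial_rtt_ms(previous_srtt_ms=None, network_changed=False):
    """Pick the initial RTT for a new or resumed connection."""
    if previous_srtt_ms is None or network_changed:
        return K_DEFAULT_INITIAL_RTT_MS
    return previous_srtt_ms


def handshake_timeout_ms(rtt_ms, handshake_count=0):
    """Twice the RTT, doubled for each consecutive alarm firing."""
    return 2 * rtt_ms * 2 ** handshake_count
```

With no previous RTT available, the first alarm in this sketch is set for 200 ms, and the period reaches 800 ms after two consecutive firings.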
| Handshake frames may be cancelled by handshake state transitions. In | ||||
| particular, all non-protected frames SHOULD no longer be transmitted | ||||
| once packet protection is available. | ||||
| (TODO: Work this section some more. Add text on client vs. server, | ||||
| and on stateless retry.) | ||||
| 3.4. Algorithm Details | ||||
| 3.4.1. Constants of interest | ||||
| Constants used in loss recovery are based on a combination of RFCs, | Constants used in loss recovery are based on a combination of RFCs, | |||
| papers, and common practice. Some may need to be changed or | papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kMaxTLPs (default 2): Maximum number of tail loss probes before an | kMaxTLPs (default 2): Maximum number of tail loss probes before an | |||
| RTO fires. | RTO fires. | |||
| kReorderingThreshold (default 3): Maximum reordering in packet | kReorderingThreshold (default 3): Maximum reordering in packet | |||
| number space before FACK style loss detection considers a packet | number space before FACK style loss detection considers a packet | |||
| skipping to change at page 7, line 5 ¶ | skipping to change at page 11, line 25 ¶ | |||
| kMinRTOTimeout (default 200ms): Minimum time in the future an RTO | kMinRTOTimeout (default 200ms): Minimum time in the future an RTO | |||
| alarm may be set for. | alarm may be set for. | |||
| kDelayedAckTimeout (default 25ms): The length of the peer's delayed | kDelayedAckTimeout (default 25ms): The length of the peer's delayed | |||
| ack timer. | ack timer. | |||
| kDefaultInitialRtt (default 100ms): The default RTT used before an | kDefaultInitialRtt (default 100ms): The default RTT used before an | |||
| RTT sample is taken. | RTT sample is taken. | |||
| 3.2.2. Variables of interest | 3.4.2. Variables of interest | |||
| Variables required to implement the loss detection mechanisms are | Variables required to implement the loss detection mechanisms are | |||
| described in this section. | described in this section. | |||
| loss_detection_alarm: Multi-modal alarm used for loss detection. | loss_detection_alarm: Multi-modal alarm used for loss detection. | |||
| handshake_count: The number of times the handshake packets have been | handshake_count: The number of times the handshake packets have been | |||
| retransmitted without receiving an ack. | retransmitted without receiving an ack. | |||
| tlp_count: The number of times a tail loss probe has been sent | tlp_count: The number of times a tail loss probe has been sent | |||
| skipping to change at page 8, line 9 ¶ | skipping to change at page 12, line 31 ¶ | |||
| based on early transmit or exceeding the reordering window in | based on early transmit or exceeding the reordering window in | |||
| time. | time. | |||
| sent_packets: An association of packet numbers to information about | sent_packets: An association of packet numbers to information about | |||
| them, including a number field indicating the packet number, a | them, including a number field indicating the packet number, a | |||
| time field indicating the time a packet was sent, and a bytes | time field indicating the time a packet was sent, and a bytes | |||
| field indicating the packet's size. sent_packets is ordered by | field indicating the packet's size. sent_packets is ordered by | |||
| packet number, and packets remain in sent_packets until | packet number, and packets remain in sent_packets until | |||
| acknowledged or lost. | acknowledged or lost. | |||
| 3.2.3. Initialization | 3.4.3. Initialization | |||
| At the beginning of the connection, initialize the loss detection | At the beginning of the connection, initialize the loss detection | |||
| variables as follows: | variables as follows: | |||
| loss_detection_alarm.reset() | loss_detection_alarm.reset() | |||
| handshake_count = 0 | handshake_count = 0 | |||
| tlp_count = 0 | tlp_count = 0 | |||
| rto_count = 0 | rto_count = 0 | |||
| if (UsingTimeLossDetection()) | if (UsingTimeLossDetection()) | |||
| reordering_threshold = infinite | reordering_threshold = infinite | |||
| skipping to change at page 8, line 31 ¶ | skipping to change at page 13, line 5 ¶ | |||
| else: | else: | |||
| reordering_threshold = kReorderingThreshold | reordering_threshold = kReorderingThreshold | |||
| time_reordering_fraction = infinite | time_reordering_fraction = infinite | |||
| loss_time = 0 | loss_time = 0 | |||
| smoothed_rtt = 0 | smoothed_rtt = 0 | |||
| rttvar = 0 | rttvar = 0 | |||
| largest_sent_before_rto = 0 | largest_sent_before_rto = 0 | |||
| time_of_last_sent_packet = 0 | time_of_last_sent_packet = 0 | |||
| largest_sent_packet = 0 | largest_sent_packet = 0 | |||
| 3.2.4. On Sending a Packet | 3.4.4. On Sending a Packet | |||
| After any packet is sent, be it a new transmission or a rebundled | After any packet is sent, be it a new transmission or a rebundled | |||
| transmission, the following OnPacketSent function is called. The | transmission, the following OnPacketSent function is called. The | |||
| parameters to OnPacketSent are as follows: | parameters to OnPacketSent are as follows: | |||
| o packet_number: The packet number of the sent packet. | o packet_number: The packet number of the sent packet. | |||
| o is_ack_only: A boolean that indicates whether a packet only | o is_ack_only: A boolean that indicates whether a packet only | |||
| contains an ACK frame. If true, an ack is still expected for | contains an ACK frame. If true, an ack is still expected for | |||
| this packet, but the packet is not congestion controlled. | this packet, but the packet is not congestion controlled. | |||
| skipping to change at page 9, line 15 ¶ | skipping to change at page 13, line 32 ¶ | |||
| OnPacketSent(packet_number, is_ack_only, sent_bytes): | OnPacketSent(packet_number, is_ack_only, sent_bytes): | |||
| time_of_last_sent_packet = now | time_of_last_sent_packet = now | |||
| largest_sent_packet = packet_number | largest_sent_packet = packet_number | |||
| sent_packets[packet_number].packet_number = packet_number | sent_packets[packet_number].packet_number = packet_number | |||
| sent_packets[packet_number].time = now | sent_packets[packet_number].time = now | |||
| if !is_ack_only: | if !is_ack_only: | |||
| OnPacketSentCC(sent_bytes) | OnPacketSentCC(sent_bytes) | |||
| sent_packets[packet_number].bytes = sent_bytes | sent_packets[packet_number].bytes = sent_bytes | |||
| SetLossDetectionAlarm() | SetLossDetectionAlarm() | |||
| 3.2.5. On Ack Receipt | 3.4.5. On Ack Receipt | |||
| When an ack is received, it may acknowledge 0 or more packets. | When an ack is received, it may acknowledge 0 or more packets. | |||
| Pseudocode for OnAckReceived and UpdateRtt follows: | Pseudocode for OnAckReceived and UpdateRtt follows: | |||
| OnAckReceived(ack): | OnAckReceived(ack): | |||
| largest_acked_packet = ack.largest_acked | largest_acked_packet = ack.largest_acked | |||
| // If the largest acked is newly acked, update the RTT. | // If the largest acked is newly acked, update the RTT. | |||
| if (sent_packets[ack.largest_acked]): | if (sent_packets[ack.largest_acked]): | |||
| latest_rtt = now - sent_packets[ack.largest_acked].time | latest_rtt = now - sent_packets[ack.largest_acked].time | |||
| skipping to change at page 9, line 45 ¶ | skipping to change at page 14, line 29 ¶ | |||
| UpdateRtt(latest_rtt): | UpdateRtt(latest_rtt): | |||
| // Based on {{RFC6298}}. | // Based on {{RFC6298}}. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| smoothed_rtt = latest_rtt | smoothed_rtt = latest_rtt | |||
| rttvar = latest_rtt / 2 | rttvar = latest_rtt / 2 | |||
| else: | else: | |||
| rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt) | rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt) | |||
| smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | |||
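The UpdateRtt pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of either draft: the class name RttEstimator and the use of floating-point seconds are assumptions made for the example.

```python
class RttEstimator:
    """Tracks smoothed_rtt and rttvar, mirroring the UpdateRtt pseudocode.

    Hypothetical helper: the class name and the choice of float seconds
    are assumptions for this sketch, not names taken from the draft.
    """

    def __init__(self):
        # Zero means "no RTT sample yet", as in the Initialization section.
        self.smoothed_rtt = 0.0
        self.rttvar = 0.0

    def update(self, latest_rtt):
        if self.smoothed_rtt == 0:
            # First sample: seed both estimators directly from it.
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
        else:
            # rttvar is updated first, using the previous smoothed_rtt.
            self.rttvar = (0.75 * self.rttvar
                           + 0.25 * abs(self.smoothed_rtt - latest_rtt))
            self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * latest_rtt


est = RttEstimator()
est.update(0.100)  # first sample of 100 ms
est.update(0.200)  # second sample of 200 ms
```

Note that rttvar is computed before smoothed_rtt is overwritten, so the deviation term uses the old smoothed value, which matches the order of the two assignments in the pseudocode.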
| 3.2.6. On Packet Acknowledgment | 3.4.6. On Packet Acknowledgment | |||
| When a packet is acked for the first time, the following | When a packet is acked for the first time, the following | |||
| OnPacketAcked function is called. Note that a single ACK frame may | OnPacketAcked function is called. Note that a single ACK frame may | |||
| newly acknowledge several packets. OnPacketAcked must be called once | newly acknowledge several packets. OnPacketAcked must be called once | |||
| for each of these newly acked packets. | for each of these newly acked packets. | |||
| OnPacketAcked takes one parameter, acked_packet, which is the packet | OnPacketAcked takes one parameter, acked_packet, which is the packet | |||
| number of the newly acked packet, and returns a list of packet | number of the newly acked packet, and returns a list of packet | |||
| numbers that are detected as lost. | numbers that are detected as lost. | |||
| skipping to change at page 10, line 28 ¶ | skipping to change at page 15, line 17 ¶ | |||
| // If a packet sent prior to RTO was acked, then the RTO | // If a packet sent prior to RTO was acked, then the RTO | |||
| // was spurious. Otherwise, inform congestion control. | // was spurious. Otherwise, inform congestion control. | |||
| if (rto_count > 0 && | if (rto_count > 0 && | |||
| acked_packet_number > largest_sent_before_rto) | acked_packet_number > largest_sent_before_rto) | |||
| OnRetransmissionTimeoutVerified() | OnRetransmissionTimeoutVerified() | |||
| handshake_count = 0 | handshake_count = 0 | |||
| tlp_count = 0 | tlp_count = 0 | |||
| rto_count = 0 | rto_count = 0 | |||
| sent_packets.remove(acked_packet_number) | sent_packets.remove(acked_packet_number) | |||
| 3.2.7. Setting the Loss Detection Alarm | 3.4.7. Setting the Loss Detection Alarm | |||
| QUIC loss detection uses a single alarm for all timer-based loss | QUIC loss detection uses a single alarm for all timer-based loss | |||
| detection. The duration of the alarm is based on the alarm's mode, | detection. The duration of the alarm is based on the alarm's mode, | |||
| which is set in the packet and timer events further below. The | which is set in the packet and timer events further below. The | |||
| function SetLossDetectionAlarm defined below shows how the single | function SetLossDetectionAlarm defined below shows how the single | |||
| timer is set based on the alarm mode. | timer is set based on the alarm mode. | |||
| 3.2.7.1. Handshake Packets | 3.4.7.1. Handshake Packets | |||
| The initial flight has no prior RTT sample. A client SHOULD remember | The initial flight has no prior RTT sample. A client SHOULD remember | |||
| the previous RTT it observed when resumption is attempted and use | the previous RTT it observed when resumption is attempted and use | |||
| that for an initial RTT value. If no previous RTT is available, the | that for an initial RTT value. If no previous RTT is available, the | |||
| initial RTT defaults to 100ms. | initial RTT defaults to 100ms. | |||
| Endpoints MUST retransmit handshake frames if not acknowledged within | Endpoints MUST retransmit handshake frames if not acknowledged within | |||
| a time limit. This time limit will start as the largest of twice the | a time limit. This time limit will start as the largest of twice the | |||
| RTT value and kMinTLPTimeout. Each consecutive handshake | RTT value and kMinTLPTimeout. Each consecutive handshake | |||
| retransmission doubles the time limit, until an acknowledgement is | retransmission doubles the time limit, until an acknowledgement is | |||
| skipping to change at page 11, line 13 ¶ | skipping to change at page 16, line 5 ¶ | |||
| transmitted once packet protection is available. | transmitted once packet protection is available. | |||
| When stateless rejects are in use, the connection is considered | When stateless rejects are in use, the connection is considered | |||
| immediately closed once a reject is sent, so no timer is set to | immediately closed once a reject is sent, so no timer is set to | |||
| retransmit the reject. | retransmit the reject. | |||
| Version negotiation packets are always stateless, and MUST be sent | Version negotiation packets are always stateless, and MUST be sent | |||
| once per handshake packet that uses an unsupported QUIC version, and | once per handshake packet that uses an unsupported QUIC version, and | |||
| MAY be sent in response to 0RTT packets. | MAY be sent in response to 0RTT packets. | |||
| 3.2.7.2. Tail Loss Probe and Retransmission Timeout | 3.4.7.2. Tail Loss Probe and Retransmission Timeout | |||
| Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] | Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] | |||
| are an alarm based mechanism to recover from cases when there are | are an alarm based mechanism to recover from cases when there are | |||
| outstanding retransmittable packets, but an acknowledgement has not | outstanding retransmittable packets, but an acknowledgement has not | |||
| been received in a timely manner. | been received in a timely manner. | |||
| 3.2.7.3. Early Retransmit | 3.4.7.3. Early Retransmit | |||
| Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It | Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It | |||
| is part of QUIC's time based loss detection, but is always enabled, | is part of QUIC's time based loss detection, but is always enabled, | |||
| even when only packet reordering loss detection is enabled. | even when only packet reordering loss detection is enabled. | |||
| 3.2.7.4. Pseudocode | 3.4.7.4. Pseudocode | |||
| Pseudocode for SetLossDetectionAlarm follows: | Pseudocode for SetLossDetectionAlarm follows: | |||
| SetLossDetectionAlarm(): | SetLossDetectionAlarm(): | |||
| if (retransmittable packets are not outstanding): | if (retransmittable packets are not outstanding): | |||
| loss_detection_alarm.cancel() | loss_detection_alarm.cancel() | |||
| return | return | |||
| if (handshake packets are outstanding): | if (handshake packets are outstanding): | |||
| // Handshake retransmission alarm. | // Handshake retransmission alarm. | |||
| skipping to change at page 12, line 36 ¶ | skipping to change at page 17, line 5 ¶ | |||
| alarm_duration = kMinTLPTimeout | alarm_duration = kMinTLPTimeout | |||
| alarm_duration = max(alarm_duration, 2 * smoothed_rtt) | alarm_duration = max(alarm_duration, 2 * smoothed_rtt) | |||
| else: | else: | |||
| // RTO alarm | // RTO alarm | |||
| alarm_duration = smoothed_rtt + 4 * rttvar | alarm_duration = smoothed_rtt + 4 * rttvar | |||
| alarm_duration = max(alarm_duration, kMinRTOTimeout) | alarm_duration = max(alarm_duration, kMinRTOTimeout) | |||
| alarm_duration = alarm_duration * (2 ^ rto_count) | alarm_duration = alarm_duration * (2 ^ rto_count) | |||
| loss_detection_alarm.set(now + alarm_duration) | loss_detection_alarm.set(now + alarm_duration) | |||
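The RTO branch of SetLossDetectionAlarm can be sketched in Python as below. This is an illustrative fragment only: the function name rto_alarm_duration and the use of seconds as the unit are assumptions, and the handshake and TLP branches are omitted because this diff elides part of their pseudocode.

```python
K_MIN_RTO_TIMEOUT = 0.200  # kMinRTOTimeout default (200ms), in seconds


def rto_alarm_duration(smoothed_rtt, rttvar, rto_count):
    """Return the RTO alarm duration in seconds (hypothetical helper)."""
    duration = smoothed_rtt + 4 * rttvar
    duration = max(duration, K_MIN_RTO_TIMEOUT)
    # The pseudocode's "2 ^ rto_count" is exponentiation; in Python "^"
    # is bitwise XOR, so "**" is used for the exponential backoff.
    return duration * (2 ** rto_count)


first_rto = rto_alarm_duration(0.100, 0.050, rto_count=0)
backed_off = rto_alarm_duration(0.010, 0.001, rto_count=2)
```

In the second call the computed value (14ms) is below the floor, so the 200ms minimum applies before the backoff multiplier.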
| 3.2.8. On Alarm Firing | 3.4.8. On Alarm Firing | |||
| QUIC uses one loss recovery alarm, which when set, can be in one of | QUIC uses one loss recovery alarm, which when set, can be in one of | |||
| several modes. When the alarm fires, the mode determines the action | several modes. When the alarm fires, the mode determines the action | |||
| to be performed. | to be performed. | |||
| Pseudocode for OnLossDetectionAlarm follows: | Pseudocode for OnLossDetectionAlarm follows: | |||
| OnLossDetectionAlarm(): | OnLossDetectionAlarm(): | |||
| if (handshake packets are outstanding): | if (handshake packets are outstanding): | |||
| // Handshake retransmission alarm. | // Handshake retransmission alarm. | |||
| skipping to change at page 13, line 26 ¶ | skipping to change at page 17, line 34 ¶ | |||
| tlp_count++ | tlp_count++ | |||
| else: | else: | |||
| // RTO. | // RTO. | |||
| if (rto_count == 0) | if (rto_count == 0) | |||
| largest_sent_before_rto = largest_sent_packet | largest_sent_before_rto = largest_sent_packet | |||
| SendTwoPackets() | SendTwoPackets() | |||
| rto_count++ | rto_count++ | |||
| SetLossDetectionAlarm() | SetLossDetectionAlarm() | |||
| 3.2.9. Detecting Lost Packets | 3.4.9. Detecting Lost Packets | |||
| Packets in QUIC are only considered lost once a larger packet number | Packets in QUIC are only considered lost once a larger packet number | |||
| is acknowledged. DetectLostPackets is called every time an ack is | is acknowledged. DetectLostPackets is called every time an ack is | |||
| received. If the loss detection alarm fires and the loss_time is | received. If the loss detection alarm fires and the loss_time is | |||
| set, the previous largest acked packet is supplied. | set, the previous largest acked packet is supplied. | |||
| 3.2.9.1. Handshake Packets | 3.4.9.1. Handshake Packets | |||
| The receiver MUST ignore unprotected packets that ack protected | The receiver MUST close the connection with an error of type | |||
| packets. The receiver MUST trust protected acks for unprotected | OPTIMISTIC_ACK when receiving an unprotected packet that acks | |||
| packets, however. Aside from this, loss detection for handshake | protected packets. The receiver MUST trust protected acks for | |||
| packets when an ack is processed is identical to other packets. | unprotected packets, however. Aside from this, loss detection for | |||
| handshake packets when an ack is processed is identical to other | ||||
| packets. | ||||
| 3.2.9.2. Pseudocode | 3.4.9.2. Pseudocode | |||
| DetectLostPackets takes one parameter, largest_acked, which is the largest | DetectLostPackets takes one parameter, largest_acked, which is the largest | |||
| acked packet. | acked packet. | |||
| Pseudocode for DetectLostPackets follows: | Pseudocode for DetectLostPackets follows: | |||
| DetectLostPackets(largest_acked): | DetectLostPackets(largest_acked): | |||
| loss_time = 0 | loss_time = 0 | |||
| lost_packets = {} | lost_packets = {} | |||
| delay_until_lost = infinite | delay_until_lost = infinite | |||
| if (time_reordering_fraction != infinite): | if (time_reordering_fraction != infinite): | |||
| delay_until_lost = | delay_until_lost = | |||
| (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) | (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) | |||
| else if (largest_acked.packet_number == largest_sent_packet): | else if (largest_acked.packet_number == largest_sent_packet): | |||
| // Early retransmit alarm. | // Early retransmit alarm. | |||
| delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) | delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) | |||
| foreach (unacked < largest_acked.packet_number): | foreach (unacked < largest_acked.packet_number): | |||
| time_since_sent = now() - unacked.time_sent | time_since_sent = now() - unacked.time_sent | |||
| packet_delta = largest_acked.packet_number - unacked.packet_number | delta = largest_acked.packet_number - unacked.packet_number | |||
| if (time_since_sent > delay_until_lost): | if (time_since_sent > delay_until_lost): | |||
| lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
| else if (packet_delta > reordering_threshold) | else if (delta > reordering_threshold) | |||
| lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
| else if (loss_time == 0 && delay_until_lost != infinite): | else if (loss_time == 0 && delay_until_lost != infinite): | |||
| loss_time = now() + delay_until_lost - time_since_sent | loss_time = now() + delay_until_lost - time_since_sent | |||
| // Inform the congestion controller of lost packets and | // Inform the congestion controller of lost packets and | |||
| // let it decide whether to retransmit immediately. | // let it decide whether to retransmit immediately. | |||
| if (!lost_packets.empty()) | if (!lost_packets.empty()) | |||
| OnPacketsLost(lost_packets) | OnPacketsLost(lost_packets) | |||
| foreach (packet in lost_packets) | foreach (packet in lost_packets) | |||
| sent_packets.remove(packet.packet_number) | sent_packets.remove(packet.packet_number) | |||
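A runnable Python sketch of the DetectLostPackets pseudocode follows. Several details are assumptions made for the example rather than anything the draft specifies: state is passed as arguments instead of being held in connection variables, sent_packets is a dict keyed by packet number that holds only unacknowledged packets, and the function returns the lost packets and loss_time instead of calling OnPacketsLost.

```python
import math
from dataclasses import dataclass


@dataclass
class SentPacket:
    packet_number: int
    time_sent: float  # seconds


def detect_lost_packets(sent_packets, largest_acked, largest_sent_packet,
                        now, latest_rtt, smoothed_rtt,
                        reordering_threshold=3,
                        time_reordering_fraction=math.inf):
    """Return (lost_packets, loss_time); removes lost packets from sent_packets."""
    loss_time = 0.0
    lost_packets = []
    delay_until_lost = math.inf
    if time_reordering_fraction != math.inf:
        # Time-based loss detection.
        delay_until_lost = ((1 + time_reordering_fraction)
                            * max(latest_rtt, smoothed_rtt))
    elif largest_acked == largest_sent_packet:
        # Early retransmit alarm.
        delay_until_lost = 9 / 8 * max(latest_rtt, smoothed_rtt)

    # Only packets numbered below the largest acked packet are candidates.
    for number in sorted(n for n in sent_packets if n < largest_acked):
        unacked = sent_packets[number]
        time_since_sent = now - unacked.time_sent
        delta = largest_acked - number
        if time_since_sent > delay_until_lost:
            lost_packets.append(unacked)
        elif delta > reordering_threshold:
            lost_packets.append(unacked)
        elif loss_time == 0 and delay_until_lost != math.inf:
            # Earliest time at which this packet would be declared lost.
            loss_time = now + delay_until_lost - time_since_sent

    for packet in lost_packets:
        del sent_packets[packet.packet_number]
    return lost_packets, loss_time


outstanding = {n: SentPacket(n, 0.0) for n in range(1, 6)}
lost, loss_time = detect_lost_packets(
    outstanding, largest_acked=6, largest_sent_packet=10,
    now=0.05, latest_rtt=0.05, smoothed_rtt=0.05)
```

With the default reordering threshold of 3, packets 1 and 2 trail the largest acked packet by more than three packet numbers and are declared lost, while packets 3 through 5 stay outstanding.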
| 3.3. Discussion | 3.5. Discussion | |||
| The majority of constants were derived from best common practices | The majority of constants were derived from best common practices | |||
| among widely deployed TCP implementations on the internet. | among widely deployed TCP implementations on the internet. | |||
| Exceptions follow. | Exceptions follow. | |||
| A shorter delayed ack time of 25ms was chosen because longer delayed | A shorter delayed ack time of 25ms was chosen because longer delayed | |||
| acks can delay loss recovery and for the small number of connections | acks can delay loss recovery and for the small number of connections | |||
| where less than one packet per 25ms is delivered, acking every packet is | where less than one packet per 25ms is delivered, acking every packet is | |||
| beneficial to congestion control and loss recovery. | beneficial to congestion control and loss recovery. | |||
| skipping to change at page 15, line 24 ¶ | skipping to change at page 19, line 31 ¶ | |||
| Slow start exits to congestion avoidance. Congestion avoidance in | Slow start exits to congestion avoidance. Congestion avoidance in | |||
| NewReno uses an additive increase multiplicative decrease (AIMD) | NewReno uses an additive increase multiplicative decrease (AIMD) | |||
| approach that increases the congestion window by one MSS of bytes per | approach that increases the congestion window by one MSS of bytes per | |||
| congestion window acknowledged. When a loss is detected, NewReno | congestion window acknowledged. When a loss is detected, NewReno | |||
| halves the congestion window and sets the slow start threshold to the | halves the congestion window and sets the slow start threshold to the | |||
| new congestion window. | new congestion window. | |||
| 4.3. Recovery Period | 4.3. Recovery Period | |||
| Recovery is a period of time beginning with detection of a lost | Recovery is a period of time beginning with detection of a lost | |||
| packet. Because QUIC retransmits frames, not packets, it defines the | packet. Because QUIC retransmits stream data and control frames, not | |||
| end of recovery as all packets outstanding at the start of recovery | packets, it defines the end of recovery as a packet sent after the | |||
| being acknowledged or lost. This is slightly different from TCP's | start of recovery being acknowledged. This is slightly different | |||
| definition of recovery ending when the lost packet that started | from TCP's definition of recovery ending when the lost packet that | |||
| recovery is acknowledged. During recovery, the congestion window is | started recovery is acknowledged. | |||
| not increased or decreased. As such, multiple lost packets only | ||||
| decrease the congestion window once as long as they're lost before | During recovery, the congestion window is not increased or decreased. | |||
| exiting recovery. This causes QUIC to decrease the congestion window | As such, multiple lost packets only decrease the congestion window | |||
| multiple times if retransmissions are lost, but limits the reduction | once as long as they're lost before exiting recovery. This causes | |||
| to once per round trip. | QUIC to decrease the congestion window multiple times if | |||
| retransmissions are lost, but limits the reduction to once per round | ||||
| trip. | ||||
| 4.4. Tail Loss Probe | 4.4. Tail Loss Probe | |||
| If recovery sends a tail loss probe, no change is made to the | If recovery sends a tail loss probe, no change is made to the | |||
| congestion window or pacing rate. Acknowledgement or loss of tail | congestion window or pacing rate. Acknowledgement or loss of tail | |||
| loss probes is treated like any other packet. | loss probes is treated like any other packet. | |||
| 4.5. Retransmission Timeout | 4.5. Retransmission Timeout | |||
| When retransmissions are sent due to a retransmission timeout alarm, | When retransmissions are sent due to a retransmission timeout alarm, | |||
| skipping to change at page 16, line 11 ¶ | skipping to change at page 20, line 22 ¶ | |||
| retransmission timeout are acknowledged, the retransmission timeout | retransmission timeout are acknowledged, the retransmission timeout | |||
| has been validated and the congestion window must be reduced to the | has been validated and the congestion window must be reduced to the | |||
| minimum congestion window and slow start is begun. | minimum congestion window and slow start is begun. | |||
| 4.6. Pacing Rate | 4.6. Pacing Rate | |||
| The pacing rate is a function of the mode, the congestion window, and | The pacing rate is a function of the mode, the congestion window, and | |||
| the smoothed rtt. Specifically, the pacing rate is 2 times the | the smoothed rtt. Specifically, the pacing rate is 2 times the | |||
| congestion window divided by the smoothed RTT during slow start and | congestion window divided by the smoothed RTT during slow start and | |||
| 1.25 times the congestion window divided by the smoothed RTT during | 1.25 times the congestion window divided by the smoothed RTT during | |||
| slow start. In order to fairly compete with flows that are not | congestion avoidance. In order to fairly compete with flows that are | |||
| pacing, it is recommended to not pace the first 10 sent packets when | not pacing, it is recommended to not pace the first 10 sent packets | |||
| exiting quiescence. | when exiting quiescence. | |||
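The pacing rate rule in the -07 text can be illustrated with a short Python sketch. The function name pacing_rate, the in_slow_start flag, and the units (bytes and seconds) are assumptions for this example; the recommendation not to pace the first 10 packets after quiescence is not modeled here.

```python
def pacing_rate(congestion_window, smoothed_rtt, in_slow_start):
    """Return a pacing rate in bytes per second (hypothetical helper).

    congestion_window is in bytes and smoothed_rtt in seconds; both
    units are assumptions made for this sketch.
    """
    if smoothed_rtt <= 0:
        raise ValueError("an RTT estimate is required to compute a pacing rate")
    # 2x the window per RTT in slow start, 1.25x in congestion avoidance.
    gain = 2.0 if in_slow_start else 1.25
    return gain * congestion_window / smoothed_rtt


slow_start_rate = pacing_rate(14600, 0.100, in_slow_start=True)
avoidance_rate = pacing_rate(14600, 0.100, in_slow_start=False)
```

Pacing faster than one congestion window per RTT in both modes leaves headroom for the window to grow before the pacer becomes the limiting factor.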
| 4.7. Pseudocode | 4.7. Pseudocode | |||
| 4.7.1. Constants of interest | 4.7.1. Constants of interest | |||
| Constants used in congestion control are based on a combination of | Constants used in congestion control are based on a combination of | |||
| RFCs, papers, and common practice. Some may need to be changed or | RFCs, papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kDefaultMss (default 1460 bytes): The default max packet size used | kDefaultMss (default 1460 bytes): The default max packet size used | |||
| skipping to change at page 18, line 37 ¶ | skipping to change at page 23, line 12 ¶ | |||
| This document has no IANA actions. Yet. | This document has no IANA actions. Yet. | |||
| 6. References | 6. References | |||
| 6.1. Normative References | 6.1. Normative References | |||
| [QUIC-TRANSPORT] | [QUIC-TRANSPORT] | |||
| Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| Multiplexed and Secure Transport", draft-ietf-quic- | Multiplexed and Secure Transport", draft-ietf-quic- | |||
| transport (work in progress), September 2017. | transport-07 (work in progress), November 2017. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, <https://www.rfc- | DOI 10.17487/RFC2119, March 1997, | |||
| editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| 6.2. Informative References | ||||
| [LOSS-PROBE] | [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | |||
| Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | "Improving the Robustness of TCP to Non-Congestion | |||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | |||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | <https://www.rfc-editor.org/info/rfc4653>. | |||
| in progress), February 2013. | ||||
| [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
| Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
| 2003, <https://www.rfc-editor.org/info/rfc3465>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
| [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, | [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, | |||
| "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting | "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting | |||
| Spurious Retransmission Timeouts with TCP", RFC 5682, | Spurious Retransmission Timeouts with TCP", RFC 5682, | |||
| DOI 10.17487/RFC5682, September 2009, <https://www.rfc- | DOI 10.17487/RFC5682, September 2009, | |||
| editor.org/info/rfc5682>. | <https://www.rfc-editor.org/info/rfc5682>. | |||
| [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and | [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and | |||
| P. Hurtig, "Early Retransmit for TCP and Stream Control | P. Hurtig, "Early Retransmit for TCP and Stream Control | |||
| Transmission Protocol (SCTP)", RFC 5827, | Transmission Protocol (SCTP)", RFC 5827, | |||
| DOI 10.17487/RFC5827, May 2010, <https://www.rfc- | DOI 10.17487/RFC5827, May 2010, | |||
| editor.org/info/rfc5827>. | <https://www.rfc-editor.org/info/rfc5827>. | |||
| [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, | [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, | |||
| "Computing TCP's Retransmission Timer", RFC 6298, | "Computing TCP's Retransmission Timer", RFC 6298, | |||
| DOI 10.17487/RFC6298, June 2011, <https://www.rfc- | DOI 10.17487/RFC6298, June 2011, | |||
| editor.org/info/rfc6298>. | <https://www.rfc-editor.org/info/rfc6298>. | |||
| [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., | ||||
| and Y. Nishida, "A Conservative Loss Recovery Algorithm | ||||
| Based on Selective Acknowledgment (SACK) for TCP", | ||||
| RFC 6675, DOI 10.17487/RFC6675, August 2012, | ||||
| <https://www.rfc-editor.org/info/rfc6675>. | ||||
| 6.2. Informative References | ||||
| [LOSS-PROBE] | ||||
| Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | ||||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | ||||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | ||||
| in progress), February 2013. | ||||
| [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | ||||
| Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | ||||
| 2003, <https://www.rfc-editor.org/info/rfc3465>. | ||||
| [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The | [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The | |||
| NewReno Modification to TCP's Fast Recovery Algorithm", | NewReno Modification to TCP's Fast Recovery Algorithm", | |||
| RFC 6582, DOI 10.17487/RFC6582, April 2012, | RFC 6582, DOI 10.17487/RFC6582, April 2012, | |||
| <https://www.rfc-editor.org/info/rfc6582>. | <https://www.rfc-editor.org/info/rfc6582>. | |||
| [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | ||||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | ||||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | ||||
| in progress), February 2013. | ||||
| 6.3. URIs | ||||
| [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | ||||
| [2] https://github.com/quicwg | ||||
| [3] https://github.com/quicwg/base-drafts/labels/recovery | ||||
| Appendix A. Acknowledgments | Appendix A. Acknowledgments | |||
| Appendix B. Change Log | Appendix B. Change Log | |||
| *RFC Editor's Note:* Please remove this section prior to | *RFC Editor's Note:* Please remove this section prior to | |||
| publication of a final version of this document. | publication of a final version of this document. | |||
| B.1. Since draft-ietf-quic-recovery-05 | B.1. Since draft-ietf-quic-recovery-06 | |||
| Nothing yet. | ||||
| B.2. Since draft-ietf-quic-recovery-05 | ||||
| o Add more congestion control text (#776) | o Add more congestion control text (#776) | |||
| B.2. Since draft-ietf-quic-recovery-04 | B.3. Since draft-ietf-quic-recovery-04 | |||
| No significant changes. | No significant changes. | |||
| B.3. Since draft-ietf-quic-recovery-03 | B.4. Since draft-ietf-quic-recovery-03 | |||
| No significant changes. | No significant changes. | |||
| B.4. Since draft-ietf-quic-recovery-02 | B.5. Since draft-ietf-quic-recovery-02 | |||
| o Integrate F-RTO (#544, #409) | o Integrate F-RTO (#544, #409) | |||
| o Add congestion control (#545, #395) | o Add congestion control (#545, #395) | |||
| o Require connection abort if a skipped packet was acknowledged | o Require connection abort if a skipped packet was acknowledged | |||
| (#415) | (#415) | |||
| o Simplify RTO calculations (#142, #417) | o Simplify RTO calculations (#142, #417) | |||
| B.5. Since draft-ietf-quic-recovery-01 | B.6. Since draft-ietf-quic-recovery-01 | |||
| o Overview added to loss detection | o Overview added to loss detection | |||
| o Changes initial default RTT to 100ms | o Changes initial default RTT to 100ms | |||
| o Added time-based loss detection and fixes early retransmit | o Added time-based loss detection and fixes early retransmit | |||
| o Clarified loss recovery for handshake packets | o Clarified loss recovery for handshake packets | |||
| o Fixed references and made TCP references informative | o Fixed references and made TCP references informative | |||
| B.6. Since draft-ietf-quic-recovery-00 | B.7. Since draft-ietf-quic-recovery-00 | |||
| o Improved description of constants and ACK behavior | o Improved description of constants and ACK behavior | |||
| B.7. Since draft-iyengar-quic-loss-recovery-01 | B.8. Since draft-iyengar-quic-loss-recovery-01 | |||
| o Adopted as base for draft-ietf-quic-recovery | o Adopted as base for draft-ietf-quic-recovery | |||
| o Updated authors/editors list | o Updated authors/editors list | |||
| o Added table of contents | o Added table of contents | |||
| Authors' Addresses | Authors' Addresses | |||
| Jana Iyengar (editor) | Jana Iyengar (editor) | |||
| Email: jri@google.com | Email: jri@google.com | |||
| Ian Swett (editor) | Ian Swett (editor) | |||
| Email: ianswett@google.com | Email: ianswett@google.com | |||
| End of changes. 56 change blocks. | ||||
| 171 lines changed or deleted | 415 lines changed or added | |||
This html diff was produced by rfcdiff 1.45. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||