| draft-ietf-quic-recovery-07.txt | draft-ietf-quic-recovery-08.txt | |||
|---|---|---|---|---|
| QUIC J. Iyengar, Ed. | QUIC J. Iyengar, Ed. | |||
| Internet-Draft I. Swett, Ed. | Internet-Draft I. Swett, Ed. | |||
| Intended status: Standards Track Google | Intended status: Standards Track Google | |||
| Expires: May 18, 2018 November 14, 2017 | Expires: June 8, 2018 December 5, 2017 | |||
| QUIC Loss Detection and Congestion Control | QUIC Loss Detection and Congestion Control | |||
| draft-ietf-quic-recovery-07 | draft-ietf-quic-recovery-08 | |||
| Abstract | Abstract | |||
| This document describes loss detection and congestion control | This document describes loss detection and congestion control | |||
| mechanisms for QUIC. | mechanisms for QUIC. | |||
| Note to Readers | Note to Readers | |||
| Discussion of this draft takes place on the QUIC working group | Discussion of this draft takes place on the QUIC working group | |||
| mailing list (quic@ietf.org), which is archived at | mailing list (quic@ietf.org), which is archived at | |||
| https://mailarchive.ietf.org/arch/search/?email_list=quic [1]. | https://mailarchive.ietf.org/arch/search/?email_list=quic [1]. | |||
| Working Group information can be found at https://github.com/quicwg | Working Group information can be found at https://github.com/quicwg | |||
| [2]; source code and issues list for this draft can be found at | [2]; source code and issues list for this draft can be found at | |||
| https://github.com/quicwg/base-drafts/labels/recovery [3]. | https://github.com/quicwg/base-drafts/labels/-recovery [3]. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on May 18, 2018. | This Internet-Draft will expire on June 8, 2018. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 | 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 | |||
| 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 3 | 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | |||
| 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 | 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 | |||
| 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 | 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 | |||
| 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 5 | 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 | 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 | 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 | |||
| 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 5 | 3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 6 | |||
| 3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 5 | 3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 6 | 3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 6 | 3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 7 | |||
| 3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 7 | 3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 8 | |||
| 3.3.1. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 7 | 3.3.1. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.3.2. Retransmission Timeout . . . . . . . . . . . . . . . 9 | 3.3.2. Retransmission Timeout . . . . . . . . . . . . . . . 9 | |||
| 3.3.3. Handshake Timeout . . . . . . . . . . . . . . . . . . 10 | 3.3.3. Handshake Timeout . . . . . . . . . . . . . . . . . . 10 | |||
| 3.4. Algorithm Details . . . . . . . . . . . . . . . . . . . . 10 | 3.4. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 3.4.1. Constants of interest . . . . . . . . . . . . . . . . 10 | 3.4.1. Constants of interest . . . . . . . . . . . . . . . . 11 | |||
| 3.4.2. Variables of interest . . . . . . . . . . . . . . . . 11 | 3.4.2. Variables of interest . . . . . . . . . . . . . . . . 12 | |||
| 3.4.3. Initialization . . . . . . . . . . . . . . . . . . . 12 | 3.4.3. Initialization . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.4.4. On Sending a Packet . . . . . . . . . . . . . . . . . 13 | 3.4.4. On Sending a Packet . . . . . . . . . . . . . . . . . 13 | |||
| 3.4.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 13 | 3.4.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 14 | |||
| 3.4.6. On Packet Acknowledgment . . . . . . . . . . . . . . 14 | 3.4.6. On Packet Acknowledgment . . . . . . . . . . . . . . 15 | |||
| 3.4.7. Setting the Loss Detection Alarm . . . . . . . . . . 15 | 3.4.7. Setting the Loss Detection Alarm . . . . . . . . . . 16 | |||
| 3.4.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 17 | 3.4.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 17 | |||
| 3.4.9. Detecting Lost Packets . . . . . . . . . . . . . . . 17 | 3.4.9. Detecting Lost Packets . . . . . . . . . . . . . . . 18 | |||
| 3.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 18 | 3.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 19 | 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 19 | 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.2. Congestion Avoidance . . . . . . . . . . . . . . . . . . 19 | 4.2. Congestion Avoidance . . . . . . . . . . . . . . . . . . 20 | |||
| 4.3. Recovery Period . . . . . . . . . . . . . . . . . . . . . 19 | 4.3. Recovery Period . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.4. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 19 | 4.4. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.5. Retransmission Timeout . . . . . . . . . . . . . . . . . 20 | 4.5. Retransmission Timeout . . . . . . . . . . . . . . . . . 20 | |||
| 4.6. Pacing Rate . . . . . . . . . . . . . . . . . . . . . . . 20 | 4.6. Pacing Rate . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 4.7. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 20 | 4.7. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 4.7.1. Constants of interest . . . . . . . . . . . . . . . . 20 | 4.7.1. Constants of interest . . . . . . . . . . . . . . . . 21 | |||
| 4.7.2. Variables of interest . . . . . . . . . . . . . . . . 20 | 4.7.2. Variables of interest . . . . . . . . . . . . . . . . 21 | |||
| 4.7.3. Initialization . . . . . . . . . . . . . . . . . . . 21 | 4.7.3. Initialization . . . . . . . . . . . . . . . . . . . 22 | |||
| 4.7.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 21 | 4.7.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 22 | |||
| 4.7.5. On Packet Acknowledgement . . . . . . . . . . . . . . 21 | 4.7.5. On Packet Acknowledgement . . . . . . . . . . . . . . 22 | |||
| 4.7.6. On Packets Lost . . . . . . . . . . . . . . . . . . . 22 | 4.7.6. On Packets Lost . . . . . . . . . . . . . . . . . . . 23 | |||
| 4.7.7. On Retransmission Timeout Verified . . . . . . . . . 22 | 4.7.7. On Retransmission Timeout Verified . . . . . . . . . 23 | |||
| 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 | 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 6.1. Normative References . . . . . . . . . . . . . . . . . . 23 | 6.1. Normative References . . . . . . . . . . . . . . . . . . 23 | |||
| 6.2. Informative References . . . . . . . . . . . . . . . . . 24 | 6.2. Informative References . . . . . . . . . . . . . . . . . 24 | |||
| 6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 24 | Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 25 | |||
| Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 24 | Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 25 | |||
| B.1. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 24 | B.1. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 25 | |||
| B.2. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 24 | B.2. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 25 | |||
| B.3. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 25 | B.3. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 25 | |||
| B.4. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 25 | B.4. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 25 | |||
| B.5. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 25 | B.5. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 25 | |||
| B.6. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 25 | B.6. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 26 | |||
| B.7. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 25 | B.7. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 26 | |||
| B.8. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 25 | B.8. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 26 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 1. Introduction | 1. Introduction | |||
| QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | |||
| on decades of transport and security experience, and implements | on decades of transport and security experience, and implements | |||
| mechanisms that make it attractive as a modern general-purpose | mechanisms that make it attractive as a modern general-purpose | |||
| transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | |||
| QUIC implements the spirit of known TCP loss recovery mechanisms, | QUIC implements the spirit of known TCP loss recovery mechanisms, | |||
| described in RFCs, various Internet-drafts, and also those prevalent | described in RFCs, various Internet-drafts, and also those prevalent | |||
| in the Linux TCP implementation. This document describes QUIC | in the Linux TCP implementation. This document describes QUIC | |||
| congestion control and loss recovery, and where applicable, | congestion control and loss recovery, and where applicable, | |||
| attributes the TCP equivalent in RFCs, Internet-drafts, academic | attributes the TCP equivalent in RFCs, Internet-drafts, academic | |||
| papers, and/or TCP implementations. | papers, and/or TCP implementations. | |||
| 1.1. Notational Conventions | 1.1. Notational Conventions | |||
| The words "MUST", "MUST NOT", "SHOULD", and "MAY" are used in this | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| document. It's not shouting; when they are capitalized, they have | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| the special meaning defined in [RFC2119]. | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | ||||
| capitals, as shown here. | ||||
| 2. Design of the QUIC Transmission Machinery | 2. Design of the QUIC Transmission Machinery | |||
| All transmissions in QUIC are sent with a packet-level header, which | All transmissions in QUIC are sent with a packet-level header, which | |||
| includes a packet sequence number (referred to below as a packet | includes a packet sequence number (referred to below as a packet | |||
| number). These packet numbers never repeat in the lifetime of a | number). These packet numbers never repeat in the lifetime of a | |||
| connection, and are monotonically increasing, which makes duplicate | connection, and are monotonically increasing, which prevents | |||
| detection trivial. This fundamental design decision obviates the | ambiguity. This fundamental design decision obviates the need for | |||
| need for disambiguating between transmissions and retransmissions and | disambiguating between transmissions and retransmissions and | |||
| eliminates significant complexity from QUIC's interpretation of TCP | eliminates significant complexity from QUIC's interpretation of TCP | |||
| loss detection mechanisms. | loss detection mechanisms. | |||
| Every packet may contain several frames. We outline the frames that | Every packet may contain several frames. We outline the frames that | |||
| are important to the loss detection and congestion control machinery | are important to the loss detection and congestion control machinery | |||
| below. | below. | |||
| o Retransmittable frames are frames requiring reliable delivery. | o Retransmittable frames are frames requiring reliable delivery. | |||
| The most common are STREAM frames, which typically contain | The most common are STREAM frames, which typically contain | |||
| application data. | application data. | |||
| o Crypto handshake data is sent on stream 0, and uses the | o Crypto handshake data is sent on stream 0, and uses the | |||
| reliability machinery of QUIC underneath. | reliability machinery of QUIC underneath. | |||
| o ACK frames contain acknowledgment information. QUIC uses a SACK- | o ACK frames contain acknowledgment information. ACK frames contain | |||
| based scheme, where acks express up to 256 ranges. | one or more ranges of acknowledged packets. | |||
| 2.1. Relevant Differences Between QUIC and TCP | 2.1. Relevant Differences Between QUIC and TCP | |||
| Readers familiar with TCP's loss detection and congestion control | Readers familiar with TCP's loss detection and congestion control | |||
| will find algorithms here that parallel well-known TCP ones. | will find algorithms here that parallel well-known TCP ones. | |||
| Protocol differences between QUIC and TCP however contribute to | Protocol differences between QUIC and TCP however contribute to | |||
| algorithmic differences. We briefly describe these protocol | algorithmic differences. We briefly describe these protocol | |||
| differences below. | differences below. | |||
| 2.1.1. Monotonically Increasing Packet Numbers | 2.1.1. Monotonically Increasing Packet Numbers | |||
| TCP conflates transmission sequence number at the sender with | TCP conflates transmission sequence number at the sender with | |||
| delivery sequence number at the receiver, which results in | delivery sequence number at the receiver, which results in | |||
| retransmissions of the same data carrying the same sequence number, | retransmissions of the same data carrying the same sequence number, | |||
| and consequently to problems caused by "retransmission ambiguity". | and consequently to problems caused by "retransmission ambiguity". | |||
| QUIC separates the two: QUIC uses a packet sequence number (referred | QUIC separates the two: QUIC uses a packet number for transmissions, | |||
| to as the "packet number") for transmissions, and any data that is to | and any data that is to be delivered to the receiving application(s) | |||
| be delivered to the receiving application(s) is sent in one or more | is sent in one or more streams, with delivery order determined by | |||
| streams, with stream offsets encoded within STREAM frames inside of | stream offsets encoded within STREAM frames. | |||
| packets that determine delivery order. | ||||
| QUIC's packet number is strictly increasing, and directly encodes | QUIC's packet number is strictly increasing, and directly encodes | |||
| transmission order. A higher QUIC packet number signifies that the | transmission order. A higher QUIC packet number signifies that the | |||
| packet was sent later, and a lower QUIC packet number signifies that | packet was sent later, and a lower QUIC packet number signifies that | |||
| the packet was sent earlier. When a packet containing frames is | the packet was sent earlier. When a packet containing frames is | |||
| deemed lost, QUIC rebundles necessary frames in a new packet with a | deemed lost, QUIC rebundles necessary frames in a new packet with a | |||
| new packet number, removing ambiguity about which packet is | new packet number, removing ambiguity about which packet is | |||
| acknowledged when an ACK is received. Consequently, more accurate | acknowledged when an ACK is received. Consequently, more accurate | |||
| RTT measurements can be made, spurious retransmissions are trivially | RTT measurements can be made, spurious retransmissions are trivially | |||
| detected, and mechanisms such as Fast Retransmit can be applied | detected, and mechanisms such as Fast Retransmit can be applied | |||
| universally, based only on packet number. | universally, based only on packet number. | |||
| This design point significantly simplifies loss detection mechanisms | This design point significantly simplifies loss detection mechanisms | |||
| for QUIC. Most TCP mechanisms implicitly attempt to infer | for QUIC. Most TCP mechanisms implicitly attempt to infer | |||
| transmission ordering based on TCP sequence numbers - a non-trivial | transmission ordering based on TCP sequence numbers - a non-trivial | |||
| task, especially when TCP timestamps are not available. | task, especially when TCP timestamps are not available. | |||
| 2.1.2. No Reneging | 2.1.2. No Reneging | |||
| QUIC ACKs contain information that is equivalent to TCP SACK, but | QUIC ACKs contain information that is similar to TCP SACK, but QUIC | |||
| QUIC does not allow any acked packet to be reneged, greatly | does not allow any acked packet to be reneged, greatly simplifying | |||
| simplifying implementations on both sides and reducing memory | implementations on both sides and reducing memory pressure on the | |||
| pressure on the sender. | sender. | |||
| 2.1.3. More ACK Ranges | 2.1.3. More ACK Ranges | |||
| QUIC supports up to 256 ACK ranges, opposed to TCP's 3 SACK ranges. | QUIC supports many ACK ranges, opposed to TCP's 3 SACK ranges. In | |||
| In high loss environments, this speeds recovery. | high loss environments, this speeds recovery, reduces spurious | |||
| retransmits, and ensures forward progress without relying on | ||||
| timeouts. | ||||
| 2.1.4. Explicit Correction For Delayed Acks | 2.1.4. Explicit Correction For Delayed Acks | |||
| QUIC ACKs explicitly encode the delay incurred at the receiver | QUIC ACKs explicitly encode the delay incurred at the receiver | |||
| between when a packet is received and when the corresponding ACK is | between when a packet is received and when the corresponding ACK is | |||
| sent. This allows the receiver of the ACK to adjust for receiver | sent. This allows the receiver of the ACK to adjust for receiver | |||
| delays, specifically the delayed ack timer, when estimating the path | delays, specifically the delayed ack timer, when estimating the path | |||
| RTT. This mechanism also allows a receiver to measure and report the | RTT. This mechanism also allows a receiver to measure and report the | |||
| delay from when a packet was received by the OS kernel, which is | delay from when a packet was received by the OS kernel, which is | |||
| useful in receivers which may incur delays such as context-switch | useful in receivers which may incur delays such as context-switch | |||
| skipping to change at page 5, line 44 ¶ | skipping to change at page 6, line 7 ¶ | |||
| 3. Loss Detection | 3. Loss Detection | |||
| QUIC senders use both ack information and timeouts to detect lost | QUIC senders use both ack information and timeouts to detect lost | |||
| packets, and this section provides a description of these algorithms. | packets, and this section provides a description of these algorithms. | |||
| Estimating the network round-trip time (RTT) is critical to these | Estimating the network round-trip time (RTT) is critical to these | |||
| algorithms and is described first. | algorithms and is described first. | |||
| 3.1. Computing the RTT estimate | 3.1. Computing the RTT estimate | |||
| (To be filled) | RTT is calculated when an ACK frame arrives by computing the | |||
| difference between the current time and the time the largest newly | ||||
| acked packet was sent. If no packets are newly acknowledged, RTT | ||||
| cannot be calculated. When RTT is calculated, the ack delay field | ||||
| from the ACK frame SHOULD be subtracted from the RTT as long as the | ||||
| result is larger than the Min RTT. If the result is smaller than the | ||||
| min_rtt, the RTT should be used, but the ack delay field should be | ||||
| ignored. | ||||
| Like TCP, QUIC calculates both smoothed RTT and RTT variance as | ||||
| specified in [RFC6298]. | ||||
| Min RTT is the minimum RTT measured over the connection, prior to | ||||
| adjusting by ack delay. Ignoring ack delay for min RTT prevents | ||||
| intentional or unintentional underestimation of min RTT, which in | ||||
| turn prevents underestimating smoothed RTT. | ||||
| 3.2. Ack-based Detection | 3.2. Ack-based Detection | |||
| Ack-based loss detection implements the spirit of TCP's Fast | Ack-based loss detection implements the spirit of TCP's Fast | |||
| Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | |||
| recovery [RFC6675]. This section provides an overview of how these | recovery [RFC6675]. This section provides an overview of how these | |||
| algorithms are implemented in QUIC. | algorithms are implemented in QUIC. | |||
| (TODO: Define unacknowledged packet, ackable packet, outstanding | (TODO: Define unacknowledged packet, ackable packet, outstanding | |||
| bytes.) | bytes.) | |||
| skipping to change at page 7, line 46 ¶ | skipping to change at page 8, line 25 ¶ | |||
| A packet sent at the tail is particularly vulnerable to slow loss | A packet sent at the tail is particularly vulnerable to slow loss | |||
| detection, since acks of subsequent packets are needed to trigger | detection, since acks of subsequent packets are needed to trigger | |||
| ack-based detection. To ameliorate this weakness of tail packets, | ack-based detection. To ameliorate this weakness of tail packets, | |||
| the sender schedules an alarm when the last ackable packet before | the sender schedules an alarm when the last ackable packet before | |||
| quiescence is transmitted. When this alarm fires, a Tail Loss Probe | quiescence is transmitted. When this alarm fires, a Tail Loss Probe | |||
| (TLP) packet is sent to evoke an acknowledgement from the receiver. | (TLP) packet is sent to evoke an acknowledgement from the receiver. | |||
| The alarm duration, or Probe Timeout (PTO), is set based on the | The alarm duration, or Probe Timeout (PTO), is set based on the | |||
| following conditions: | following conditions: | |||
| o If there is exactly one unacknowledged packet, PTO SHOULD be | o PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay, 10ms) | |||
| scheduled for max(2_SRTT, 1.5_SRTT+kDelayedAckTimeout) | ||||
| o If there are more than one unacknowledged packets, PTO SHOULD be | ||||
| scheduled for max(2*SRTT, 10ms). | ||||
| o If RTO is earlier, schedule a TLP alarm in its place. That is, | ||||
| PTO SHOULD be scheduled for min(RTO, PTO). | ||||
| kDelayedAckTimeout is the expected delayed ACK timer. When there is | ||||
| exactly one unacknowledged packet, the alarm duration includes time | ||||
| for an acknowledgment to be received, and additionally, a | ||||
| kDelayedAckTimeout period to compensate for the delayed | ||||
| acknowledgment timer at the receiver. | ||||
| The RECOMMENDED value for kDelayedAckTimeout is 25ms. | ||||
| (TODO: Add negotiability of delayed ack timeout.) | o If RTO (Section 3.3.2) is earlier, schedule a TLP alarm in its | |||
| place. That is, PTO SHOULD be scheduled for min(RTO, PTO). | ||||
| A PTO value of at least 2_SRTT ensures that the ACK is overdue. | MaxAckDelay is the maximum ack delay supplied in an incoming ACK | |||
| Using a PTO of exactly 1_SRTT may generate spurious probes, and | frame. MaxAckDelay excludes ack delays that aren't included in an | |||
| 2*SRTT is simply the next integral value of RTT. | RTT sample because they're too large and excludes those which | |||
| reference an ack-only packet. | ||||
| (TODO: These values of 2 and 1.5 are a bit arbitrary. Reconsider | QUIC diverges from TCP by calculating MaxAckDelay dynamically, | |||
| these.) | instead of assuming a constant delayed ack timeout for all | |||
| connections. QUIC includes this in all probe timeouts, because it | ||||
| assume the ack delay may come into play, regardless of the number of | ||||
| packets outstanding. TCP's TLP assumes if at least 2 packets are | ||||
| outstanding, acks will not be delayed. | ||||
| If the Retransmission Timeout (RTO, Section 3.3.2) period is smaller | A PTO value of at least 1.5*SRTT ensures that the ACK is overdue. | |||
| than the computed PTO, then a PTO is scheduled for the smaller RTO | The 1.5 is based on [LOSS-PROBE], but implementations MAY experiment | |||
| period. | with other constants. | |||
| To reduce latency, it is RECOMMENDED that the sender set and allow | To reduce latency, it is RECOMMENDED that the sender set and allow | |||
| the TLP alarm to fire twice before setting an RTO alarm. In other | the TLP alarm to fire twice before setting an RTO alarm. In other | |||
| words, when the TLP alarm fires the first time, a TLP packet is sent, | words, when the TLP alarm fires the first time, a TLP packet is sent, | |||
| and it is RECOMMENDED that the TLP alarm be scheduled for a second | and it is RECOMMENDED that the TLP alarm be scheduled for a second | |||
| time. When the TLP alarm fires the second time, a second TLP packet | time. When the TLP alarm fires the second time, a second TLP packet | |||
| is sent, and an RTO alarm SHOULD be scheduled Section 3.3.2. | is sent, and an RTO alarm SHOULD be scheduled Section 3.3.2. | |||
| A TLP packet SHOULD carry new data when possible. If new data is | A TLP packet SHOULD carry new data when possible. If new data is | |||
| unavailable or new data cannot be sent due to flow control, a TLP | unavailable or new data cannot be sent due to flow control, a TLP | |||
| packet MAY retransmit unacknowledged data to potentially reduce | packet MAY retransmit unacknowledged data to potentially reduce | |||
| recovery time. Since a TLP alarm is used to send a probe into the | recovery time. Since a TLP alarm is used to send a probe into the | |||
| network prior to establishing any packet loss, prior unacknowledged | network prior to establishing any packet loss, prior unacknowledged | |||
| packets SHOULD NOT be marked as lost when a TLP alarm fires. | packets SHOULD NOT be marked as lost when a TLP alarm fires. | |||
| A TLP packet MUST NOT be blocked by the sender's congestion | A TLP packet MUST NOT be blocked by the sender's congestion | |||
| controller. The sender MUST however count these bytes as additional | controller. The sender MUST however count these bytes as additional | |||
| bytes in flight, since a TLP adds network load without establishing | bytes in flight, since a TLP adds network load without establishing | |||
| packet loss. | packet loss. | |||
| A sender will commonly not know that a packet being sent is a tail | A sender may not know that a packet being sent is a tail packet. | |||
| packet. Consequently, a sender may have to arm or adjust the TLP | Consequently, a sender may have to arm or adjust the TLP alarm on | |||
| alarm on every sent ackable packet. | every sent ackable packet. | |||
| 3.3.2. Retransmission Timeout | 3.3.2. Retransmission Timeout | |||
| A Retransmission Timeout (RTO) alarm is the final backstop for loss | A Retransmission Timeout (RTO) alarm is the final backstop for loss | |||
| detection. The algorithm used in QUIC is based on the RTO algorithm | detection. The algorithm used in QUIC is based on the RTO algorithm | |||
| for TCP [RFC5681] and is additionally resilient to spurious RTO | for TCP [RFC5681] and is additionally resilient to spurious RTO | |||
| events [RFC5682]. | events [RFC5682]. | |||
| When the last TLP packet is sent, an alarm is scheduled for the RTO | When the last TLP packet is sent, an alarm is scheduled for the RTO | |||
| period. When this alarm fires, the sender sends two packets, to | period. When this alarm fires, the sender sends two packets, to | |||
| evoke acknowledgements from the receiver, and restarts the RTO alarm. | evoke acknowledgements from the receiver, and restarts the RTO alarm. | |||
| Similar to TCP [RFC6298], the RTO period is set based on the | Similar to TCP [RFC6298], the RTO period is set based on the | |||
| following conditions: | following conditions: | |||
| o When the final TLP packet is sent, the RTO period is set to | o When the final TLP packet is sent, the RTO period is set to | |||
| max(SRTT + 4*RTTVAR, minRTO) | max(SRTT + 4*RTTVAR + MaxAckDelay, minRTO) | |||
| o When an RTO alarm fires, the RTO period is doubled. | o When an RTO alarm fires, the RTO period is doubled. | |||
| The sender typically has incurred a high latency penalty by the time | The sender typically has incurred a high latency penalty by the time | |||
| an RTO alarm fires, and this penalty increases exponentially in | an RTO alarm fires, and this penalty increases exponentially in | |||
| subsequent consecutive RTO events. Sending a single packet on an RTO | subsequent consecutive RTO events. Sending a single packet on an RTO | |||
| event therefore makes the connection very sensitive to single packet | event therefore makes the connection very sensitive to single packet | |||
| loss. Sending two packets instead of one significantly increases | loss. Sending two packets instead of one significantly increases | |||
| resilience to packet drop in both directions, thus reducing the | resilience to packet drop in both directions, thus reducing the | |||
| probability of consecutive RTO events. | probability of consecutive RTO events. | |||
| QUIC's RTO algorithm differs from TCP in that the firing of an RTO | QUIC's RTO algorithm differs from TCP in that the firing of an RTO | |||
| alarm is not considered a strong enough signal of packet loss. An | alarm is not considered a strong enough signal of packet loss, so | |||
| RTO alarm fires only when there's a prolonged period of network | does not result in an immediate change to congestion window or | |||
| silence, which could be caused by a change in the underlying network | recovery state. An RTO alarm fires only when there's a prolonged | |||
| RTT. | period of network silence, which could be caused by a change in the | |||
| underlying network RTT. | ||||
| QUIC also diverges from TCP by including MaxAckDelay in the RTO | ||||
| period. QUIC is able to explicitly model delay at the receiver via | ||||
| the ack delay field in the ACK frame. Since QUIC corrects for this | ||||
| delay in its SRTT and RTTVAR computations, it is necessary to add | ||||
| this delay explicitly in the TLP and RTO computation. | ||||
| When an acknowledgment is received for a packet sent on an RTO event, | When an acknowledgment is received for a packet sent on an RTO event, | |||
| any unacknowledged packets with lower packet numbers than those | any unacknowledged packets with lower packet numbers than those | |||
| acknowledged MUST be marked as lost. | acknowledged MUST be marked as lost. | |||
| A packet sent when an RTO alarm fires MAY carry new data if available | A packet sent when an RTO alarm fires MAY carry new data if available | |||
| or unacknowledged data to potentially reduce recovery time. Since | or unacknowledged data to potentially reduce recovery time. Since | |||
| this packet is sent as a probe into the network prior to establishing | this packet is sent as a probe into the network prior to establishing | |||
| any packet loss, prior unacknowledged packets SHOULD NOT be marked as | any packet loss, prior unacknowledged packets SHOULD NOT be marked as | |||
| lost. | lost. | |||
| skipping to change at page 10, line 11 ¶ | skipping to change at page 10, line 34 ¶ | |||
| congestion controller. A sender MUST however count these bytes as | congestion controller. A sender MUST however count these bytes as | |||
| additional bytes in flight, since this packet adds network load | additional bytes in flight, since this packet adds network load | |||
| without establishing packet loss. | without establishing packet loss. | |||
| 3.3.3. Handshake Timeout | 3.3.3. Handshake Timeout | |||
| Handshake packets, which contain STREAM frames for stream 0, are | Handshake packets, which contain STREAM frames for stream 0, are | |||
| critical to QUIC transport and crypto negotiation, so a separate | critical to QUIC transport and crypto negotiation, so a separate | |||
| alarm is used for them. | alarm is used for them. | |||
| The handshake timeout SHOULD be set to twice the initial RTT. | The initial handshake timeout SHOULD be set to twice the initial RTT. | |||
| There are no prior RTT samples within this connection. However, this | At the beginning, there are no prior RTT samples within a connection. | |||
| may be a resumed connection over the same network, in which case, a | Resumed connections over the same network SHOULD use the previous | |||
| client SHOULD use the previous connection's final smoothed RTT value | connection's final smoothed RTT value as the resumed connection's | |||
| as the resumed connection's initial RTT. | initial RTT. | |||
| If no previous RTT is available, or if the network changes, the | If no previous RTT is available, or if the network changes, the | |||
| initial RTT SHOULD be set to 100ms. | initial RTT SHOULD be set to 100ms. | |||
| When the first handshake packet is sent, the sender SHOULD set an | When the first handshake packet is sent, the sender SHOULD set an | |||
| alarm for the handshake timeout period. | alarm for the handshake timeout period. | |||
| When the alarm fires, the sender MUST retransmit all unacknowledged | When the alarm fires, the sender MUST retransmit all unacknowledged | |||
| handshake frames. The sender SHOULD double the handshake timeout and | handshake data. On each consecutive firing of the handshake alarm, | |||
| set an alarm for this period. | the sender SHOULD double the handshake timeout and set an alarm for | |||
| this period. | ||||
| On each consecutive firing of the handshake alarm, the sender SHOULD | ||||
| double the handshake timeout period. | ||||
| When an acknowledgement is received for a handshake packet, the new | When an acknowledgement is received for a handshake packet, the new | |||
| RTT is computed and the alarm SHOULD be set for twice the newly | RTT is computed and the alarm SHOULD be set for twice the newly | |||
| computed smoothed RTT. | computed smoothed RTT. | |||
| Handshake frames may be cancelled by handshake state transitions. In | Handshake data may be cancelled by handshake state transitions. In | |||
| particular, all non-protected frames SHOULD no longer be transmitted | particular, all non-protected data SHOULD no longer be transmitted | |||
| once packet protection is available. | once packet protection is available. | |||
| (TODO: Work this section some more. Add text on client vs. server, | (TODO: Work this section some more. Add text on client vs. server, | |||
| and on stateless retry.) | and on stateless retry.) | |||
| 3.4. Algorithm Details | 3.4. Pseudocode | |||
| 3.4.1. Constants of interest | 3.4.1. Constants of interest | |||
| Constants used in loss recovery are based on a combination of RFCs, | Constants used in loss recovery are based on a combination of RFCs, | |||
| papers, and common practice. Some may need to be changed or | papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kMaxTLPs (default 2): Maximum number of tail loss probes before an | kMaxTLPs (default 2): Maximum number of tail loss probes before an | |||
| RTO fires. | RTO fires. | |||
| kReorderingThreshold (default 3): Maximum reordering in packet | kReorderingThreshold (default 3): Maximum reordering in packet | |||
| number space before FACK style loss detection considers a packet | number space before FACK style loss detection considers a packet | |||
| lost. | lost. | |||
| kTimeReorderingFraction (default 1/8): Maximum reordering in time | kTimeReorderingFraction (default 1/8): Maximum reordering in time | |||
| space before time based loss detection considers a packet lost. | space before time based loss detection considers a packet lost. | |||
| In fraction of an RTT. | In fraction of an RTT. | |||
| kUsingTimeLossDetection (default false): Whether time based loss | ||||
| detection is in use. If false, uses FACK style loss detection. | ||||
| kMinTLPTimeout (default 10ms): Minimum time in the future a tail | kMinTLPTimeout (default 10ms): Minimum time in the future a tail | |||
| loss probe alarm may be set for. | loss probe alarm may be set for. | |||
| kMinRTOTimeout (default 200ms): Minimum time in the future an RTO | kMinRTOTimeout (default 200ms): Minimum time in the future an RTO | |||
| alarm may be set for. | alarm may be set for. | |||
| kDelayedAckTimeout (default 25ms): The length of the peer's delayed | kDelayedAckTimeout (default 25ms): The length of the peer's delayed | |||
| ack timer. | ack timer. | |||
| kDefaultInitialRtt (default 100ms): The default RTT used before an | kDefaultInitialRtt (default 100ms): The default RTT used before an | |||
| skipping to change at page 11, line 50 ¶ | skipping to change at page 12, line 30 ¶ | |||
| largest_sent_before_rto: The last packet number sent prior to the | largest_sent_before_rto: The last packet number sent prior to the | |||
| first retransmission timeout. | first retransmission timeout. | |||
| time_of_last_sent_packet: The time the most recent packet was sent. | time_of_last_sent_packet: The time the most recent packet was sent. | |||
| largest_sent_packet: The packet number of the most recently sent | largest_sent_packet: The packet number of the most recently sent | |||
| packet. | packet. | |||
| largest_acked_packet: The largest packet number acknowledged in an | largest_acked_packet: The largest packet number acknowledged in an | |||
| ack frame. | ACK frame. | |||
| latest_rtt: The most recent RTT measurement made when receiving an | latest_rtt: The most recent RTT measurement made when receiving an | |||
| ack for a previously unacked packet. | ack for a previously unacked packet. | |||
| smoothed_rtt: The smoothed RTT of the connection, computed as | smoothed_rtt: The smoothed RTT of the connection, computed as | |||
| described in [RFC6298] | described in [RFC6298] | |||
| rttvar: The RTT variance, computed as described in [RFC6298] | rttvar: The RTT variance, computed as described in [RFC6298] | |||
| min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | ||||
| max_ack_delay: The maximum ack delay in an incoming ACK frame for | ||||
| this connection. Excludes ack delays for ack only packets and | ||||
| those that create an RTT sample less than min_rtt. | ||||
| reordering_threshold: The largest delta between the largest acked | reordering_threshold: The largest delta between the largest acked | |||
| retransmittable packet and a packet containing retransmittable | retransmittable packet and a packet containing retransmittable | |||
| frames before it's declared lost. | frames before it's declared lost. | |||
| time_reordering_fraction: The reordering window as a fraction of | time_reordering_fraction: The reordering window as a fraction of | |||
| max(smoothed_rtt, latest_rtt). | max(smoothed_rtt, latest_rtt). | |||
| loss_time: The time at which the next packet will be considered lost | loss_time: The time at which the next packet will be considered lost | |||
| based on early transmit or exceeding the reordering window in | based on early transmit or exceeding the reordering window in | |||
| time. | time. | |||
| sent_packets: An association of packet numbers to information about | sent_packets: An association of packet numbers to information about | |||
| them, including a number field indicating the packet number, a | them, including a number field indicating the packet number, a | |||
| time field indicating the time a packet was sent, and a bytes | time field indicating the time a packet was sent, a boolean | |||
| field indicating the packet's size. sent_packets is ordered by | indicating whether the packet is ack only, and a bytes field | |||
| packet number, and packets remain in sent_packets until | indicating the packet's size. sent_packets is ordered by packet | |||
| acknowledged or lost. | number, and packets remain in sent_packets until acknowledged or | |||
| lost. | ||||
| 3.4.3. Initialization | 3.4.3. Initialization | |||
| At the beginning of the connection, initialize the loss detection | At the beginning of the connection, initialize the loss detection | |||
| variables as follows: | variables as follows: | |||
| loss_detection_alarm.reset() | loss_detection_alarm.reset() | |||
| handshake_count = 0 | handshake_count = 0 | |||
| tlp_count = 0 | tlp_count = 0 | |||
| rto_count = 0 | rto_count = 0 | |||
| if (UsingTimeLossDetection()) | if (kUsingTimeLossDetection) | |||
| reordering_threshold = infinite | reordering_threshold = infinite | |||
| time_reordering_fraction = kTimeReorderingFraction | time_reordering_fraction = kTimeReorderingFraction | |||
| else: | else: | |||
| reordering_threshold = kReorderingThreshold | reordering_threshold = kReorderingThreshold | |||
| time_reordering_fraction = infinite | time_reordering_fraction = infinite | |||
| loss_time = 0 | loss_time = 0 | |||
| smoothed_rtt = 0 | smoothed_rtt = 0 | |||
| rttvar = 0 | rttvar = 0 | |||
| min_rtt = 0 | ||||
| max_ack_delay = 0 | ||||
| largest_sent_before_rto = 0 | largest_sent_before_rto = 0 | |||
| time_of_last_sent_packet = 0 | time_of_last_sent_packet = 0 | |||
| largest_sent_packet = 0 | largest_sent_packet = 0 | |||
| 3.4.4. On Sending a Packet | 3.4.4. On Sending a Packet | |||
| After any packet is sent, be it a new transmission or a rebundled | After any packet is sent, be it a new transmission or a rebundled | |||
| transmission, the following OnPacketSent function is called. The | transmission, the following OnPacketSent function is called. The | |||
| parameters to OnPacketSent are as follows: | parameters to OnPacketSent are as follows: | |||
| skipping to change at page 13, line 27 ¶ | skipping to change at page 14, line 15 ¶ | |||
| o sent_bytes: The number of bytes sent in the packet, not including | o sent_bytes: The number of bytes sent in the packet, not including | |||
| UDP or IP overhead, but including QUIC framing overhead. | UDP or IP overhead, but including QUIC framing overhead. | |||
| Pseudocode for OnPacketSent follows: | Pseudocode for OnPacketSent follows: | |||
| OnPacketSent(packet_number, is_ack_only, sent_bytes): | OnPacketSent(packet_number, is_ack_only, sent_bytes): | |||
| time_of_last_sent_packet = now | time_of_last_sent_packet = now | |||
| largest_sent_packet = packet_number | largest_sent_packet = packet_number | |||
| sent_packets[packet_number].packet_number = packet_number | sent_packets[packet_number].packet_number = packet_number | |||
| sent_packets[packet_number].time = now | sent_packets[packet_number].time = now | |||
| sent_packets[packet_number].ack_only = is_ack_only | ||||
| if !is_ack_only: | if !is_ack_only: | |||
| OnPacketSentCC(sent_bytes) | OnPacketSentCC(sent_bytes) | |||
| sent_packets[packet_number].bytes = sent_bytes | sent_packets[packet_number].bytes = sent_bytes | |||
| SetLossDetectionAlarm() | SetLossDetectionAlarm() | |||
| 3.4.5. On Ack Receipt | 3.4.5. On Ack Receipt | |||
| When an ack is received, it may acknowledge 0 or more packets. | When an ack is received, it may acknowledge 0 or more packets. | |||
| Pseudocode for OnAckReceived and UpdateRtt follow: | Pseudocode for OnAckReceived and UpdateRtt follow: | |||
| OnAckReceived(ack): | OnAckReceived(ack): | |||
| largest_acked_packet = ack.largest_acked | largest_acked_packet = ack.largest_acked | |||
| // If the largest acked is newly acked, update the RTT. | // If the largest acked is newly acked, update the RTT. | |||
| if (sent_packets[ack.largest_acked]): | if (sent_packets[ack.largest_acked]): | |||
| latest_rtt = now - sent_packets[ack.largest_acked].time | latest_rtt = now - sent_packets[ack.largest_acked].time | |||
| if (latest_rtt > ack.ack_delay): | UpdateRtt(latest_rtt, ack.ack_delay) | |||
| latest_rtt -= ack.delay | ||||
| UpdateRtt(latest_rtt) | ||||
| // Find all newly acked packets. | // Find all newly acked packets. | |||
| for acked_packet in DetermineNewlyAckedPackets(): | for acked_packet in DetermineNewlyAckedPackets(): | |||
| OnPacketAcked(acked_packet.packet_number) | OnPacketAcked(acked_packet.packet_number) | |||
| DetectLostPackets(ack.largest_acked_packet) | DetectLostPackets(ack.largest_acked_packet) | |||
| SetLossDetectionAlarm() | SetLossDetectionAlarm() | |||
| UpdateRtt(latest_rtt): | UpdateRtt(latest_rtt, ack_delay): | |||
| // min_rtt ignores ack delay. | ||||
| min_rtt = min(min_rtt, latest_rtt) | ||||
| // Adjust for ack delay if it's plausible. | ||||
| if (latest_rtt - min_rtt > ack_delay): | ||||
| latest_rtt -= ack_delay | ||||
| // Only save into max ack delay if it's used | ||||
| // for rtt calculation and is not ack only. | ||||
| if (!sent_packets[ack.largest_acked].ack_only) | ||||
| max_ack_delay = max(max_ack_delay, ack_delay) | ||||
| // Based on {{RFC6298}}. | // Based on {{RFC6298}}. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| smoothed_rtt = latest_rtt | smoothed_rtt = latest_rtt | |||
| rttvar = latest_rtt / 2 | rttvar = latest_rtt / 2 | |||
| else: | else: | |||
| rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt) | rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt) | |||
| smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | |||
| 3.4.6. On Packet Acknowledgment | 3.4.6. On Packet Acknowledgment | |||
| skipping to change at page 15, line 25 ¶ | skipping to change at page 16, line 26 ¶ | |||
| sent_packets.remove(acked_packet_number) | sent_packets.remove(acked_packet_number) | |||
| 3.4.7. Setting the Loss Detection Alarm | 3.4.7. Setting the Loss Detection Alarm | |||
| QUIC loss detection uses a single alarm for all timer-based loss | QUIC loss detection uses a single alarm for all timer-based loss | |||
| detection. The duration of the alarm is based on the alarm's mode, | detection. The duration of the alarm is based on the alarm's mode, | |||
| which is set in the packet and timer events further below. The | which is set in the packet and timer events further below. The | |||
| function SetLossDetectionAlarm defined below shows how the single | function SetLossDetectionAlarm defined below shows how the single | |||
| timer is set based on the alarm mode. | timer is set based on the alarm mode. | |||
| 3.4.7.1. Handshake Packets | 3.4.7.1. Handshake Alarm | |||
| The initial flight has no prior RTT sample. A client SHOULD remember | ||||
| the previous RTT it observed when resumption is attempted and use | ||||
| that for an initial RTT value. If no previous RTT is available, the | ||||
| initial RTT defaults to 100ms. | ||||
| Endpoints MUST retransmit handshake frames if not acknowledged within | ||||
| a time limit. This time limit will start as the largest of twice the | ||||
| RTT value and MinTLPTimeout. Each consecutive handshake | ||||
| retransmission doubles the time limit, until an acknowledgement is | ||||
| received. | ||||
| Handshake frames may be cancelled by handshake state transitions. In | When a connection has unacknowledged handshake data, the handshake | |||
| particular, all non-protected frames SHOULD be no longer be | alarm is set and when it expires, all unacknowledgedd handshake data | |||
| transmitted once packet protection is available. | is retransmitted. | |||
| When stateless rejects are in use, the connection is considered | When stateless rejects are in use, the connection is considered | |||
| immediately closed once a reject is sent, so no timer is set to | immediately closed once a reject is sent, so no timer is set to | |||
| retransmit the reject. | retransmit the reject. | |||
| Version negotiation packets are always stateless, and MUST be sent | Version negotiation packets are always stateless, and MUST be sent | |||
| once per handshake packet that uses an unsupported QUIC version, and | once per handshake packet that uses an unsupported QUIC version, and | |||
| MAY be sent in response to 0RTT packets. | MAY be sent in response to 0RTT packets. | |||
| 3.4.7.2. Tail Loss Probe and Retransmission Timeout | 3.4.7.2. Tail Loss Probe and Retransmission Alarm | |||
| Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] | Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] | |||
| are an alarm based mechanism to recover from cases when there are | are an alarm based mechanism to recover from cases when there are | |||
| outstanding retransmittable packets, but an acknowledgement has not | outstanding retransmittable packets, but an acknowledgement has not | |||
| been received in a timely manner. | been received in a timely manner. | |||
| 3.4.7.3. Early Retransmit | The TLP and RTO timers are armed when there is not unacknowledged | |||
| handshake data. The TLP alarm is set until the max number of TLP | ||||
| packets have been sent, and then the RTO tiemr is set. | ||||
| 3.4.7.3. Early Retransmit Alarm | ||||
| Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It | Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It | |||
| is part of QUIC's time based loss detection, but is always enabled, | is part of QUIC's time based loss detection, but is always enabled, | |||
| even when only packet reordering loss detection is enabled. | even when only packet reordering loss detection is enabled. | |||
| 3.4.7.4. Pseudocode | 3.4.7.4. Pseudocode | |||
| Pseudocode for SetLossDetectionAlarm follows: | Pseudocode for SetLossDetectionAlarm follows: | |||
| SetLossDetectionAlarm(): | SetLossDetectionAlarm(): | |||
| if (retransmittable packets are not outstanding): | // Don't arm the alarm if there are no packets with | |||
| // retransmittable data in flight. | ||||
| if (num_retransmittable_packets_outstanding == 0): | ||||
| loss_detection_alarm.cancel() | loss_detection_alarm.cancel() | |||
| return | return | |||
| if (handshake packets are outstanding): | if (handshake packets are outstanding): | |||
| // Handshake retransmission alarm. | // Handshake retransmission alarm. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| alarm_duration = 2 * kDefaultInitialRtt | alarm_duration = 2 * kDefaultInitialRtt | |||
| else: | else: | |||
| alarm_duration = 2 * smoothed_rtt | alarm_duration = 2 * smoothed_rtt | |||
| alarm_duration = max(alarm_duration, kMinTLPTimeout) | alarm_duration = max(alarm_duration, kMinTLPTimeout) | |||
| alarm_duration = alarm_duration * (2 ^ handshake_count) | alarm_duration = alarm_duration * (2 ^ handshake_count) | |||
| else if (loss_time != 0): | else if (loss_time != 0): | |||
| // Early retransmit timer or time loss detection. | // Early retransmit timer or time loss detection. | |||
| alarm_duration = loss_time - now | alarm_duration = loss_time - time_of_last_sent_packet | |||
| else if (tlp_count < kMaxTLPs): | else if (tlp_count < kMaxTLPs): | |||
| // Tail Loss Probe | // Tail Loss Probe | |||
| if (retransmittable_packets_outstanding == 1): | alarm_duration = max(1.5 * smoothed_rtt + max_ack_delay, | |||
| alarm_duration = 1.5 * smoothed_rtt + kDelayedAckTimeout | kMinTLPTimeout) | |||
| else: | ||||
| alarm_duration = kMinTLPTimeout | ||||
| alarm_duration = max(alarm_duration, 2 * smoothed_rtt) | ||||
| else: | else: | |||
| // RTO alarm | // RTO alarm | |||
| alarm_duration = smoothed_rtt + 4 * rttvar | alarm_duration = smoothed_rtt + 4 * rttvar | |||
| alarm_duration = max(alarm_duration, kMinRTOTimeout) | alarm_duration = max(alarm_duration, kMinRTOTimeout) | |||
| alarm_duration = alarm_duration * (2 ^ rto_count) | alarm_duration = alarm_duration * (2 ^ rto_count) | |||
| loss_detection_alarm.set(now + alarm_duration) | loss_detection_alarm.set(time_of_last_sent_packet | |||
| + alarm_duration) | ||||
| 3.4.8. On Alarm Firing | 3.4.8. On Alarm Firing | |||
| QUIC uses one loss recovery alarm, which when set, can be in one of | QUIC uses one loss recovery alarm, which when set, can be in one of | |||
| several modes. When the alarm fires, the mode determines the action | several modes. When the alarm fires, the mode determines the action | |||
| to be performed. | to be performed. | |||
| Pseudocode for OnLossDetectionAlarm follows: | Pseudocode for OnLossDetectionAlarm follows: | |||
| OnLossDetectionAlarm(): | OnLossDetectionAlarm(): | |||
| skipping to change at page 18, line 16 ¶ | skipping to change at page 19, line 9 ¶ | |||
| DetectLostPackets takes one parameter, acked, which is the largest | DetectLostPackets takes one parameter, acked, which is the largest | |||
| acked packet. | acked packet. | |||
| Pseudocode for DetectLostPackets follows: | Pseudocode for DetectLostPackets follows: | |||
| DetectLostPackets(largest_acked): | DetectLostPackets(largest_acked): | |||
| loss_time = 0 | loss_time = 0 | |||
| lost_packets = {} | lost_packets = {} | |||
| delay_until_lost = infinite | delay_until_lost = infinite | |||
| if (time_reordering_fraction != infinite): | if (kUsingTimeLossDetection): | |||
| delay_until_lost = | delay_until_lost = | |||
| (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) | (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) | |||
| else if (largest_acked.packet_number == largest_sent_packet): | else if (largest_acked.packet_number == largest_sent_packet): | |||
| // Early retransmit alarm. | // Early retransmit alarm. | |||
| delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) | delay_until_lost = 5/4 * max(latest_rtt, smoothed_rtt) | |||
| foreach (unacked < largest_acked.packet_number): | foreach (unacked < largest_acked.packet_number): | |||
| time_since_sent = now() - unacked.time_sent | time_since_sent = now() - unacked.time_sent | |||
| delta = largest_acked.packet_number - unacked.packet_number | delta = largest_acked.packet_number - unacked.packet_number | |||
| if (time_since_sent > delay_until_lost): | if (time_since_sent > delay_until_lost): | |||
| lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
| else if (delta > reordering_threshold) | else if (delta > reordering_threshold) | |||
| lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
| else if (loss_time == 0 && delay_until_lost != infinite): | else if (loss_time == 0 && delay_until_lost != infinite): | |||
| loss_time = now() + delay_until_lost - time_since_sent | loss_time = now() + delay_until_lost - time_since_sent | |||
| skipping to change at page 19, line 15 ¶ | skipping to change at page 20, line 8 ¶ | |||
| 4. Congestion Control | 4. Congestion Control | |||
| QUIC's congestion control is based on TCP NewReno[RFC6582] congestion | QUIC's congestion control is based on TCP NewReno[RFC6582] congestion | |||
| control to determine the congestion window and pacing rate. QUIC | control to determine the congestion window and pacing rate. QUIC | |||
| congestion control is specified in bytes due to finer control and the | congestion control is specified in bytes due to finer control and the | |||
| ease of appropriate byte counting[RFC3465]. | ease of appropriate byte counting[RFC3465]. | |||
| 4.1. Slow Start | 4.1. Slow Start | |||
| QUIC begins every connection in slow start and exits slow start upon | QUIC begins every connection in slow start and exits slow start upon | |||
| loss. QUIC re-enters slow start after a retransmission timeout. | loss. QUIC re-enters slow start anytime the congestion window is | |||
| While in slow start, QUIC increases the congestion window by the | less than sshthresh, which typically only occurs after an RTO. While | |||
| number of acknowledged bytes when each ack is processed. | in slow start, QUIC increases the congestion window by the number of | |||
| acknowledged bytes when each ack is processed. | ||||
| 4.2. Congestion Avoidance | 4.2. Congestion Avoidance | |||
| Slow start exits to congestion avoidance. Congestion avoidance in | Slow start exits to congestion avoidance. Congestion avoidance in | |||
| NewReno uses an additive increase multiplicative decrease (AIMD) | NewReno uses an additive increase multiplicative decrease (AIMD) | |||
| approach that increases the congestion window by one MSS of bytes per | approach that increases the congestion window by one MSS of bytes per | |||
| congestion window acknowledged. When a loss is detected, NewReno | congestion window acknowledged. When a loss is detected, NewReno | |||
| halves the congestion window and sets the slow start threshold to the | halves the congestion window and sets the slow start threshold to the | |||
| new congestion window. | new congestion window. | |||
| skipping to change at page 21, line 5 ¶ | skipping to change at page 21, line 45 ¶ | |||
| when a new loss event is detected. | when a new loss event is detected. | |||
| 4.7.2. Variables of interest | 4.7.2. Variables of interest | |||
| Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
| described in this section. | described in this section. | |||
| bytes_in_flight: The sum of the size in bytes of all sent packets | bytes_in_flight: The sum of the size in bytes of all sent packets | |||
| that contain at least one retransmittable or PADDING frame, and | that contain at least one retransmittable or PADDING frame, and | |||
| have not been acked or declared lost. The size does not include | have not been acked or declared lost. The size does not include | |||
| IP or UDP overhead. Packets only containing ack frames do not | IP or UDP overhead. Packets only containing ACK frames do not | |||
| count towards byte_in_flight to ensure congestion control does not | count towards byte_in_flight to ensure congestion control does not | |||
| impede congestion feedback. | impede congestion feedback. | |||
| congestion_window: Maximum number of bytes in flight that may be | congestion_window: Maximum number of bytes in flight that may be | |||
| sent. | sent. | |||
| end_of_recovery: The largest packet number sent when QUIC detects a | end_of_recovery: The largest packet number sent when QUIC detects a | |||
| loss. When a larger packet is acknowledged, QUIC exits recovery. | loss. When a larger packet is acknowledged, QUIC exits recovery. | |||
| ssthresh Slow start threshold in bytes. When the congestion window | ssthresh: Slow start threshold in bytes. When the congestion window | |||
| is below ssthresh, the mode is slow start and the window grows by | is below ssthresh, the mode is slow start and the window grows by | |||
| the number of bytes acknowledged. | the number of bytes acknowledged. | |||
| 4.7.3. Initialization | 4.7.3. Initialization | |||
| At the beginning of the connection, initialize the congestion control | At the beginning of the connection, initialize the congestion control | |||
| variables as follows: | variables as follows: | |||
| congestion_window = kInitialWindow | congestion_window = kInitialWindow | |||
| bytes_in_flight = 0 | bytes_in_flight = 0 | |||
| skipping to change at page 23, line 12 ¶ | skipping to change at page 23, line 42 ¶ | |||
| This document has no IANA actions. Yet. | This document has no IANA actions. Yet. | |||
| 6. References | 6. References | |||
| 6.1. Normative References | 6.1. Normative References | |||
| [QUIC-TRANSPORT] | [QUIC-TRANSPORT] | |||
| Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| Multiplexed and Secure Transport", draft-ietf-quic- | Multiplexed and Secure Transport", draft-ietf-quic- | |||
| transport-07 (work in progress), November 2017. | transport-00 (work in progress), December 2017. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | |||
| "Improving the Robustness of TCP to Non-Congestion | "Improving the Robustness of TCP to Non-Congestion | |||
| Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | |||
| <https://www.rfc-editor.org/info/rfc4653>. | <https://www.rfc-editor.org/info/rfc4653>. | |||
| skipping to change at page 24, line 5 ¶ | skipping to change at page 24, line 32 ¶ | |||
| "Computing TCP's Retransmission Timer", RFC 6298, | "Computing TCP's Retransmission Timer", RFC 6298, | |||
| DOI 10.17487/RFC6298, June 2011, | DOI 10.17487/RFC6298, June 2011, | |||
| <https://www.rfc-editor.org/info/rfc6298>. | <https://www.rfc-editor.org/info/rfc6298>. | |||
| [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., | [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., | |||
| and Y. Nishida, "A Conservative Loss Recovery Algorithm | and Y. Nishida, "A Conservative Loss Recovery Algorithm | |||
| Based on Selective Acknowledgment (SACK) for TCP", | Based on Selective Acknowledgment (SACK) for TCP", | |||
| RFC 6675, DOI 10.17487/RFC6675, August 2012, | RFC 6675, DOI 10.17487/RFC6675, August 2012, | |||
| <https://www.rfc-editor.org/info/rfc6675>. | <https://www.rfc-editor.org/info/rfc6675>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
| 6.2. Informative References | 6.2. Informative References | |||
| [LOSS-PROBE] | [LOSS-PROBE] | |||
| Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | |||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | |||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | |||
| in progress), February 2013. | in progress), February 2013. | |||
| [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
| Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
| skipping to change at page 24, line 33 ¶ | skipping to change at page 25, line 16 ¶ | |||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | |||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | |||
| in progress), February 2013. | in progress), February 2013. | |||
| 6.3. URIs | 6.3. URIs | |||
| [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | |||
| [2] https://github.com/quicwg | [2] https://github.com/quicwg | |||
| [3] https://github.com/quicwg/base-drafts/labels/recovery | [3] https://github.com/quicwg/base-drafts/labels/-recovery | |||
| Appendix A. Acknowledgments | Appendix A. Acknowledgments | |||
| Appendix B. Change Log | Appendix B. Change Log | |||
| *RFC Editor's Note:* Please remove this section prior to | *RFC Editor's Note:* Please remove this section prior to | |||
| publication of a final version of this document. | publication of a final version of this document. | |||
| B.1. Since draft-ietf-quic-recovery-06 | B.1. Since draft-ietf-quic-recovery-06 | |||
| End of changes. 59 change blocks. | ||||
| 152 lines changed or deleted | 184 lines changed or added | |||
This html diff was produced by rfcdiff 1.45. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||