| draft-ietf-quic-recovery-13.txt | draft-ietf-quic-recovery-14.txt | |||
|---|---|---|---|---|
| QUIC J. Iyengar, Ed. | QUIC J. Iyengar, Ed. | |||
| Internet-Draft Fastly | Internet-Draft Fastly | |||
| Intended status: Standards Track I. Swett, Ed. | Intended status: Standards Track I. Swett, Ed. | |||
| Expires: December 30, 2018 Google | Expires: February 16, 2019 Google | |||
| June 28, 2018 | August 15, 2018 | |||
| QUIC Loss Detection and Congestion Control | QUIC Loss Detection and Congestion Control | |||
| draft-ietf-quic-recovery-13 | draft-ietf-quic-recovery-14 | |||
| Abstract | Abstract | |||
| This document describes loss detection and congestion control | This document describes loss detection and congestion control | |||
| mechanisms for QUIC. | mechanisms for QUIC. | |||
| Note to Readers | Note to Readers | |||
| Discussion of this draft takes place on the QUIC working group | Discussion of this draft takes place on the QUIC working group | |||
| mailing list (quic@ietf.org), which is archived at | mailing list (quic@ietf.org), which is archived at | |||
| skipping to change at page 1, line 42 ¶ | skipping to change at page 1, line 42 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on December 30, 2018. | This Internet-Draft will expire on February 16, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 4 | 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 | |||
| 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | |||
| 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 | 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 | |||
| 2.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5 | 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5 | |||
| 2.1.2. Monotonically Increasing Packet Numbers . . . . . . . 5 | 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 | |||
| 2.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 | 3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6 | 3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6 | |||
| 2.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6 | 3.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6 | |||
| 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 6 | 4. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 6 | 4.1. Computing the RTT estimate . . . . . . . . . . . . . . . 7 | |||
| 3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 7 | 4.1.1. Maximum Ack Delay . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 7 | 4.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 7 | 4.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 8 | 4.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 8 | |||
| 3.3.1. Crypto Handshake Timeout . . . . . . . . . . . . . . 8 | 4.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3.2. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 9 | 4.3.1. Crypto Handshake Timeout . . . . . . . . . . . . . . 9 | |||
| 3.3.3. Retransmission Timeout . . . . . . . . . . . . . . . 10 | 4.3.2. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.4. Generating Acknowledgements . . . . . . . . . . . . . . . 12 | 4.3.3. Retransmission Timeout . . . . . . . . . . . . . . . 11 | |||
| 3.4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . 12 | 4.4. Generating Acknowledgements . . . . . . . . . . . . . . . 12 | |||
| 3.4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . 12 | 4.4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . 13 | |||
| 3.4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . 13 | 4.4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.5. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 13 | 4.4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . 13 | |||
| 3.5.1. Constants of interest . . . . . . . . . . . . . . . . 13 | 4.5. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 3.5.2. Variables of interest . . . . . . . . . . . . . . . . 14 | 4.5.1. Constants of interest . . . . . . . . . . . . . . . . 14 | |||
| 3.5.3. Initialization . . . . . . . . . . . . . . . . . . . 15 | 4.5.2. Variables of interest . . . . . . . . . . . . . . . . 14 | |||
| 3.5.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16 | 4.5.3. Initialization . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.5.5. On Receiving an Acknowledgment . . . . . . . . . . . 17 | 4.5.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16 | |||
| 3.5.6. On Packet Acknowledgment . . . . . . . . . . . . . . 18 | 4.5.5. On Receiving an Acknowledgment . . . . . . . . . . . 17 | |||
| 3.5.7. Setting the Loss Detection Alarm . . . . . . . . . . 19 | 4.5.6. On Packet Acknowledgment . . . . . . . . . . . . . . 18 | |||
| 3.5.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 21 | 4.5.7. Setting the Loss Detection Timer . . . . . . . . . . 19 | |||
| 3.5.9. Detecting Lost Packets . . . . . . . . . . . . . . . 22 | 4.5.8. On Timeout . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 3.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 23 | 4.5.9. Detecting Lost Packets . . . . . . . . . . . . . . . 22 | |||
| 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 23 | 4.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 4.1. Explicit Congestion Notification . . . . . . . . . . . . 24 | 5. Congestion Control . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 4.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 24 | 5.1. Explicit Congestion Notification . . . . . . . . . . . . 24 | |||
| 4.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 24 | 5.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 24 | 5.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 24 | |||
| 4.5. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 25 | 5.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.6. Retransmission Timeout . . . . . . . . . . . . . . . . . 25 | 5.5. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.7. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 5.6. Retransmission Timeout . . . . . . . . . . . . . . . . . 25 | |||
| 4.8. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 26 | 5.7. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.8.1. Constants of interest . . . . . . . . . . . . . . . . 26 | 5.8. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.8.2. Variables of interest . . . . . . . . . . . . . . . . 26 | 5.8.1. Constants of interest . . . . . . . . . . . . . . . . 26 | |||
| 4.8.3. Initialization . . . . . . . . . . . . . . . . . . . 27 | 5.8.2. Variables of interest . . . . . . . . . . . . . . . . 26 | |||
| 4.8.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 27 | 5.8.3. Initialization . . . . . . . . . . . . . . . . . . . 27 | |||
| 4.8.5. On Packet Acknowledgement . . . . . . . . . . . . . . 27 | 5.8.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 27 | |||
| 4.8.6. On New Congestion Event . . . . . . . . . . . . . . . 27 | 5.8.5. On Packet Acknowledgement . . . . . . . . . . . . . . 27 | |||
| 4.8.7. Process ECN Information . . . . . . . . . . . . . . . 28 | 5.8.6. On New Congestion Event . . . . . . . . . . . . . . . 28 | |||
| 4.8.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 28 | 5.8.7. Process ECN Information . . . . . . . . . . . . . . . 28 | |||
| 4.8.9. On Retransmission Timeout Verified . . . . . . . . . 28 | 5.8.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 28 | |||
| 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 | 5.8.9. On Retransmission Timeout Verified . . . . . . . . . 28 | |||
| 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 29 | |||
| 6.1. Normative References . . . . . . . . . . . . . . . . . . 29 | 6.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 29 | |||
| 6.2. Informative References . . . . . . . . . . . . . . . . . 29 | 6.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 29 | |||
| 6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 6.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 29 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 30 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 30 | |||
| A.1. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 30 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
| A.2. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 31 | 8.1. Normative References . . . . . . . . . . . . . . . . . . 30 | |||
| A.3. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 31 | 8.2. Informative References . . . . . . . . . . . . . . . . . 30 | |||
| A.4. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 31 | 8.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 31 | |||
| A.5. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 31 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 31 | |||
| A.6. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 31 | A.1. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 32 | |||
| A.7. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 31 | A.2. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 32 | |||
| A.8. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 31 | A.3. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 32 | |||
| A.9. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 32 | A.4. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 32 | |||
| A.10. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 32 | A.5. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 32 | |||
| A.11. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 32 | A.6. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 33 | |||
| A.12. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 32 | A.7. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 33 | |||
| A.13. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 32 | A.8. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 33 | |||
| A.14. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 32 | A.9. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 33 | |||
| Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 32 | A.10. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 33 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | A.11. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 33 | |||
| A.12. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 33 | ||||
| A.13. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 33 | ||||
| A.14. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 34 | ||||
| A.15. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 34 | ||||
| Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 34 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 34 | ||||
| 1. Introduction | 1. Introduction | |||
| QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | |||
| on decades of transport and security experience, and implements | on decades of transport and security experience, and implements | |||
| mechanisms that make it attractive as a modern general-purpose | mechanisms that make it attractive as a modern general-purpose | |||
| transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | |||
| QUIC implements the spirit of known TCP loss recovery mechanisms, | QUIC implements the spirit of known TCP loss recovery mechanisms, | |||
| described in RFCs, various Internet-drafts, and also those prevalent | described in RFCs, various Internet-drafts, and also those prevalent | |||
| in the Linux TCP implementation. This document describes QUIC | in the Linux TCP implementation. This document describes QUIC | |||
| congestion control and loss recovery, and where applicable, | congestion control and loss recovery, and where applicable, | |||
| attributes the TCP equivalent in RFCs, Internet-drafts, academic | attributes the TCP equivalent in RFCs, Internet-drafts, academic | |||
| papers, and/or TCP implementations. | papers, and/or TCP implementations. | |||
| 1.1. Notational Conventions | 2. Conventions and Definitions | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 2. Design of the QUIC Transmission Machinery | Definitions of terms that are used in this document: | |||
| ACK frames: ACK frames refer to both ACK and ACK_ECN frames in this | ||||
| document. | ||||
| ACK-only: Any packet containing only an ACK or ACK_ECN frame. | ||||
| In-flight: Packets are considered in-flight when they have been sent | ||||
| and neither acknowledged nor declared lost, and they are not ACK- | ||||
| only. | ||||
| Retransmittable Frames: All frames besides ACK, ACK_ECN, or PADDING | ||||
| are considered retransmittable. | ||||
| Retransmittable Packets: Packets that contain retransmittable frames | ||||
| elicit an ACK from the receiver and are called retransmittable | ||||
| packets. | ||||
| 3. Design of the QUIC Transmission Machinery | ||||
| All transmissions in QUIC are sent with a packet-level header, which | All transmissions in QUIC are sent with a packet-level header, which | |||
| indicates the encryption level and includes a packet sequence number | indicates the encryption level and includes a packet sequence number | |||
| (referred to below as a packet number). The encryption level | (referred to below as a packet number). The encryption level | |||
| indicates the packet number space, as described in [QUIC-TRANSPORT]. | indicates the packet number space, as described in [QUIC-TRANSPORT]. | |||
| Packet numbers never repeat within a packet number space for the | Packet numbers never repeat within a packet number space for the | |||
| lifetime of a connection. Packet numbers monotonically increase | lifetime of a connection. Packet numbers monotonically increase | |||
| within a space, preventing ambiguity. | within a space, preventing ambiguity. | |||
| This design obviates the need for disambiguating between | This design obviates the need for disambiguating between | |||
| transmissions and retransmissions and eliminates significant | transmissions and retransmissions and eliminates significant | |||
| complexity from QUIC's interpretation of TCP loss detection | complexity from QUIC's interpretation of TCP loss detection | |||
| mechanisms. | mechanisms. | |||
| Every packet may contain several frames. We outline the frames that | QUIC packets can contain multiple frames of different types. The | |||
| are important to the loss detection and congestion control machinery | recovery mechanisms ensure that data and frames that need reliable | |||
| below. | delivery are acknowledged or declared lost and sent in new packets as | |||
| necessary. The types of frames contained in a packet affect recovery | ||||
| o Retransmittable frames are those that count towards bytes in | and congestion control logic: | |||
| flight and need acknowledgement. The most common are STREAM | ||||
| frames, which typically contain application data. | ||||
| o Retransmittable packets are those that contain at least one | o All packets are acknowledged, though packets that contain only | |||
| retransmittable frame. | ACK, ACK_ECN, and PADDING frames are not acknowledged immediately. | |||
| o Cryptographic handshake data is sent in CRYPTO frames, and uses | o Long header packets that contain CRYPTO frames are critical to the | |||
| the reliability machinery of QUIC underneath. | performance of the QUIC handshake and use shorter timers for | |||
| acknowledgement and retransmission. | ||||
| o ACK and ACK_ECN frames contain acknowledgment information. | o Packets that contain only ACK and ACK_ECN frames do not count | |||
| ACK_ECN frames additionally contain information about ECN | toward congestion control limits and are not considered in-flight. | |||
| codepoints seen by the peer. (The rest of this document uses ACK | Note that this means PADDING frames cause packets to contribute | |||
| frames to refer to both ACK and ACK_ECN frames.) | toward bytes in flight without directly causing an acknowledgment | |||
| to be sent. | ||||
| 2.1. Relevant Differences Between QUIC and TCP | 3.1. Relevant Differences Between QUIC and TCP | |||
| Readers familiar with TCP's loss detection and congestion control | Readers familiar with TCP's loss detection and congestion control | |||
| will find algorithms here that parallel well-known TCP ones. | will find algorithms here that parallel well-known TCP ones. | |||
| Protocol differences between QUIC and TCP however contribute to | Protocol differences between QUIC and TCP however contribute to | |||
| algorithmic differences. We briefly describe these protocol | algorithmic differences. We briefly describe these protocol | |||
| differences below. | differences below. | |||
| 2.1.1. Separate Packet Number Spaces | 3.1.1. Separate Packet Number Spaces | |||
| QUIC uses separate packet number spaces for each encryption level, | QUIC uses separate packet number spaces for each encryption level, | |||
| except 0-RTT and all generations of 1-RTT keys use the same packet | except 0-RTT and all generations of 1-RTT keys use the same packet | |||
| number space. Separate packet number spaces ensures acknowledgement | number space. Separate packet number spaces ensures acknowledgement | |||
| of packets sent with one level of encryption will not cause spurious | of packets sent with one level of encryption will not cause spurious | |||
| retransmission of packets sent with a different encryption level. | retransmission of packets sent with a different encryption level. | |||
| Congestion control and RTT measurement are unified across packet | Congestion control and RTT measurement are unified across packet | |||
| number spaces. | number spaces. | |||
| 2.1.2. Monotonically Increasing Packet Numbers | 3.1.2. Monotonically Increasing Packet Numbers | |||
| TCP conflates transmission sequence number at the sender with | TCP conflates transmission sequence number at the sender with | |||
| delivery sequence number at the receiver, which results in | delivery sequence number at the receiver, which results in | |||
| retransmissions of the same data carrying the same sequence number, | retransmissions of the same data carrying the same sequence number, | |||
| and consequently to problems caused by "retransmission ambiguity". | and consequently to problems caused by "retransmission ambiguity". | |||
| QUIC separates the two: QUIC uses a packet number for transmissions, | QUIC separates the two: QUIC uses a packet number for transmissions, | |||
| and any data that is to be delivered to the receiving application(s) | and any application data is sent in one or more streams, with | |||
| is sent in one or more streams, with delivery order determined by | delivery order determined by stream offsets encoded within STREAM | |||
| stream offsets encoded within STREAM frames. | frames. | |||
| QUIC's packet number is strictly increasing, and directly encodes | QUIC's packet number is strictly increasing, and directly encodes | |||
| transmission order. A higher QUIC packet number signifies that the | transmission order. A higher QUIC packet number signifies that the | |||
| packet was sent later, and a lower QUIC packet number signifies that | packet was sent later, and a lower QUIC packet number signifies that | |||
| the packet was sent earlier. When a packet containing frames is | the packet was sent earlier. When a packet containing frames is | |||
| deemed lost, QUIC rebundles necessary frames in a new packet with a | deemed lost, QUIC rebundles necessary frames in a new packet with a | |||
| new packet number, removing ambiguity about which packet is | new packet number, removing ambiguity about which packet is | |||
| acknowledged when an ACK is received. Consequently, more accurate | acknowledged when an ACK is received. Consequently, more accurate | |||
| RTT measurements can be made, spurious retransmissions are trivially | RTT measurements can be made, spurious retransmissions are trivially | |||
| detected, and mechanisms such as Fast Retransmit can be applied | detected, and mechanisms such as Fast Retransmit can be applied | |||
| universally, based only on packet number. | universally, based only on packet number. | |||
| This design point significantly simplifies loss detection mechanisms | This design point significantly simplifies loss detection mechanisms | |||
| for QUIC. Most TCP mechanisms implicitly attempt to infer | for QUIC. Most TCP mechanisms implicitly attempt to infer | |||
| transmission ordering based on TCP sequence numbers - a non-trivial | transmission ordering based on TCP sequence numbers - a non-trivial | |||
| task, especially when TCP timestamps are not available. | task, especially when TCP timestamps are not available. | |||
| 2.1.3. No Reneging | 3.1.3. No Reneging | |||
| QUIC ACKs contain information that is similar to TCP SACK, but QUIC | QUIC ACKs contain information that is similar to TCP SACK, but QUIC | |||
| does not allow any acked packet to be reneged, greatly simplifying | does not allow any acked packet to be reneged, greatly simplifying | |||
| implementations on both sides and reducing memory pressure on the | implementations on both sides and reducing memory pressure on the | |||
| sender. | sender. | |||
| 2.1.4. More ACK Ranges | 3.1.4. More ACK Ranges | |||
| QUIC supports many ACK ranges, opposed to TCP's 3 SACK ranges. In | QUIC supports many ACK ranges, opposed to TCP's 3 SACK ranges. In | |||
| high loss environments, this speeds recovery, reduces spurious | high loss environments, this speeds recovery, reduces spurious | |||
| retransmits, and ensures forward progress without relying on | retransmits, and ensures forward progress without relying on | |||
| timeouts. | timeouts. | |||
| 2.1.5. Explicit Correction For Delayed ACKs | 3.1.5. Explicit Correction For Delayed ACKs | |||
| QUIC ACKs explicitly encode the delay incurred at the receiver | QUIC ACKs explicitly encode the delay incurred at the receiver | |||
| between when a packet is received and when the corresponding ACK is | between when a packet is received and when the corresponding ACK is | |||
| sent. This allows the receiver of the ACK to adjust for receiver | sent. This allows the receiver of the ACK to adjust for receiver | |||
| delays, specifically the delayed ack timer, when estimating the path | delays, specifically the delayed ack timer, when estimating the path | |||
| RTT. This mechanism also allows a receiver to measure and report the | RTT. This mechanism also allows a receiver to measure and report the | |||
| delay from when a packet was received by the OS kernel, which is | delay from when a packet was received by the OS kernel, which is | |||
| useful in receivers which may incur delays such as context-switch | useful in receivers which may incur delays such as context-switch | |||
| latency before a userspace QUIC receiver processes a received packet. | latency before a userspace QUIC receiver processes a received packet. | |||
| 3. Loss Detection | 4. Loss Detection | |||
| QUIC senders use both ack information and timeouts to detect lost | QUIC senders use both ack information and timeouts to detect lost | |||
| packets, and this section provides a description of these algorithms. | packets, and this section provides a description of these algorithms. | |||
| Estimating the network round-trip time (RTT) is critical to these | Estimating the network round-trip time (RTT) is critical to these | |||
| algorithms and is described first. | algorithms and is described first. | |||
| 3.1. Computing the RTT estimate | 4.1. Computing the RTT estimate | |||
| RTT is calculated when an ACK frame arrives by computing the | RTT is calculated when an ACK frame arrives by computing the | |||
| difference between the current time and the time the largest newly | difference between the current time and the time the largest newly | |||
| acked packet was sent. If no packets are newly acknowledged, RTT | acked packet was sent. If no packets are newly acknowledged, RTT | |||
| cannot be calculated. When RTT is calculated, the ack delay field | cannot be calculated. When RTT is calculated, the ack delay field | |||
| from the ACK frame SHOULD be subtracted from the RTT as long as the | from the ACK frame SHOULD be subtracted from the RTT as long as the | |||
| result is larger than the Min RTT. If the result is smaller than the | result is larger than the Min RTT. If the result is smaller than the | |||
| min_rtt, the RTT should be used, but the ack delay field should be | min_rtt, the RTT should be used, but the ack delay field should be | |||
| ignored. | ignored. | |||
| Like TCP, QUIC calculates both smoothed RTT and RTT variance similar | Like TCP, QUIC calculates both smoothed RTT and RTT variance similar | |||
| to those specified in [RFC6298]. | to those specified in [RFC6298]. | |||
| Min RTT is the minimum RTT measured over the connection, prior to | Min RTT is the minimum RTT measured over the connection, prior to | |||
| adjusting by ack delay. Ignoring ack delay for min RTT prevents | adjusting by ack delay. Ignoring ack delay for min RTT prevents | |||
| intentional or unintentional underestimation of min RTT, which in | intentional or unintentional underestimation of min RTT, which in | |||
| turn prevents underestimating smoothed RTT. | turn prevents underestimating smoothed RTT. | |||
| 3.2. Ack-based Detection | 4.1.1. Maximum Ack Delay | |||
| QUIC is able to explicitly model delay at the receiver via the ack | ||||
| delay field in the ACK frame. Therefore, QUIC diverges from TCP by | ||||
| calculating a MaxAckDelay dynamically, instead of assuming a constant | ||||
| delayed ack timeout for all connections. | ||||
| MaxAckDelay is the maximum ack delay supplied in an all incoming ACK | ||||
| frames. MaxAckDelay excludes ack delays that aren't included in an | ||||
| RTT sample because they're too large or the largest acknowledged has | ||||
| already been acknowledged. MaxAckDelay also excludes ack delays | ||||
| where the largest acknowledged references an ACK-only packet. | ||||
| 4.2. Ack-based Detection | ||||
| Ack-based loss detection implements the spirit of TCP's Fast | Ack-based loss detection implements the spirit of TCP's Fast | |||
| Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | |||
| recovery [RFC6675]. This section provides an overview of how these | recovery [RFC6675]. This section provides an overview of how these | |||
| algorithms are implemented in QUIC. | algorithms are implemented in QUIC. | |||
| 3.2.1. Fast Retransmit | 4.2.1. Fast Retransmit | |||
| An unacknowledged packet is marked as lost when an acknowledgment is | An unacknowledged packet is marked as lost when an acknowledgment is | |||
| received for a packet that was sent a threshold number of packets | received for a packet that was sent a threshold number of packets | |||
| (kReorderingThreshold) after the unacknowledged packet. Receipt of | (kReorderingThreshold) and/or a threshold amount of time after the | |||
| the ack indicates that a later packet was received, while | unacknowledged packet. Receipt of the acknowledgement indicates that | |||
| kReorderingThreshold provides some tolerance for reordering of | a later packet was received, while the reordering threshold provides | |||
| packets in the network. | some tolerance for reordering of packets in the network. | |||
| The RECOMMENDED initial value for kReorderingThreshold is 3. | The RECOMMENDED initial value for kReorderingThreshold is 3, based on | |||
| TCP loss recovery [RFC5681] [RFC6675]. Some networks may exhibit | ||||
| higher degrees of reordering, causing a sender to detect spurious | ||||
| losses. Spuriously declaring packets lost leads to unnecessary | ||||
| retransmissions and may result in degraded performance due to the | ||||
| actions of the congestion controller upon detecting loss. | ||||
| Implementers MAY use algorithms developed for TCP, such as TCP-NCR | ||||
| [RFC4653], to improve QUIC's reordering resilience. | ||||
| We derive this recommendation from TCP loss recovery [RFC5681] | QUIC implementations can use time-based loss detection to handle | |||
| [RFC6675]. It is possible for networks to exhibit higher degrees of | reordering based on time elapsed since the packet was sent. This may | |||
| reordering, causing a sender to detect spurious losses. Detecting | be used either as a replacement for a packet reordering threshold or | |||
| spurious losses leads to unnecessary retransmissions and may result | in addition to it. The RECOMMENDED time threshold, expressed as a | |||
| in degraded performance due to the actions of the congestion | fraction of the round-trip time (kTimeReorderingFraction), is 1/8. | |||
| controller upon detecting loss. Implementers MAY use algorithms | ||||
| developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's | ||||
| reordering resilience, though care should be taken to map TCP | ||||
| specifics to QUIC correctly. Similarly, using time-based loss | ||||
| detection to deal with reordering, such as in PR-TCP, should be more | ||||
| readily usable in QUIC. Making QUIC deal with such networks is | ||||
| important open research, and implementers are encouraged to explore | ||||
| this space. | ||||
| 3.2.2. Early Retransmit | 4.2.2. Early Retransmit | |||
| Unacknowledged packets close to the tail may have fewer than | Unacknowledged packets close to the tail may have fewer than | |||
| kReorderingThreshold retransmittable packets sent after them. Loss | kReorderingThreshold retransmittable packets sent after them. Loss | |||
| of such packets cannot be detected via Fast Retransmit. To enable | of such packets cannot be detected via Fast Retransmit. To enable | |||
| ack-based loss detection of such packets, receipt of an | ack-based loss detection of such packets, receipt of an | |||
| acknowledgment for the last outstanding retransmittable packet | acknowledgment for the last outstanding retransmittable packet | |||
| triggers the Early Retransmit process, as follows. | triggers the Early Retransmit process, as follows. | |||
| If there are unacknowledged retransmittable packets still pending, | If there are unacknowledged in-flight packets still pending, they | |||
| they should be marked as lost. To compensate for the reduced | should be marked as lost. To compensate for the reduced reordering | |||
| reordering resilience, the sender SHOULD set an alarm for a small | resilience, the sender SHOULD set a timer for a small period of time. | |||
| period of time. If the unacknowledged retransmittable packets are | If the unacknowledged in-flight packets are not acknowledged during | |||
| not acknowledged during this time, then these packets MUST be marked | this time, then these packets MUST be marked as lost. | |||
| as lost. | ||||
| An endpoint SHOULD set the alarm such that a packet is marked as lost | An endpoint SHOULD set the timer such that a packet is marked as lost | |||
| no earlier than 1.25 * max(SRTT, latest_RTT) since when it was sent. | no earlier than 1.125 * max(SRTT, latest_RTT) since when it was sent. | |||
| Using max(SRTT, latest_RTT) protects from the two following cases: | Using max(SRTT, latest_RTT) protects from the two following cases: | |||
| o the latest RTT sample is lower than the SRTT, perhaps due to | o the latest RTT sample is lower than the SRTT, perhaps due to | |||
| reordering where packet whose ack triggered the Early Retransit | reordering where packet whose ack triggered the Early Retransit | |||
| process encountered a shorter path; | process encountered a shorter path; | |||
| o the latest RTT sample is higher than the SRTT, perhaps due to a | o the latest RTT sample is higher than the SRTT, perhaps due to a | |||
| sustained increase in the actual RTT, but the smoothed SRTT has | sustained increase in the actual RTT, but the smoothed SRTT has | |||
| not yet caught up. | not yet caught up. | |||
| The 1.25 multiplier increases reordering resilience. Implementers | The 1.125 multiplier increases reordering resilience. Implementers | |||
| MAY experiment with using other multipliers, bearing in mind that a | MAY experiment with using other multipliers, bearing in mind that a | |||
| lower multiplier reduces reordering resilience and increases spurious | lower multiplier reduces reordering resilience and increases spurious | |||
| retransmissions, and a higher multipler increases loss recovery | retransmissions, and a higher multipler increases loss recovery | |||
| delay. | delay. | |||
| This mechanism is based on Early Retransmit for TCP [RFC5827]. | This mechanism is based on Early Retransmit for TCP [RFC5827]. | |||
| However, [RFC5827] does not include the alarm described above. Early | However, [RFC5827] does not include the timer described above. Early | |||
| Retransmit is prone to spurious retransmissions due to its reduced | Retransmit is prone to spurious retransmissions due to its reduced | |||
| reordering resilence without the alarm. This observation led Linux | reordering resilence without the timer. This observation led Linux | |||
| TCP implementers to implement an alarm for TCP as well, and this | TCP implementers to implement a timer for TCP as well, and this | |||
| document incorporates this advancement. | document incorporates this advancement. | |||
| 3.3. Timer-based Detection | 4.3. Timer-based Detection | |||
| Timer-based loss detection implements a handshake retransmission | Timer-based loss detection recovers from losses that cannot be | |||
| timer that is optimized for QUIC as well as the spirit of TCP's Tail | handled by ack-based loss detection. It uses a single timer which | |||
| Loss Probe and Retransmission Timeout mechanisms. | switches between a handshake retransmission timer, a Tail Loss Probe | |||
| timer and Retransmission Timeout mechanisms. | ||||
| 3.3.1. Crypto Handshake Timeout | 4.3.1. Crypto Handshake Timeout | |||
| Data in CRYPTO frames is critical to QUIC transport and crypto | Data in CRYPTO frames is critical to QUIC transport and crypto | |||
| negotiation, so a more aggressive timeout is used to retransmit it. | negotiation, so a more aggressive timeout is used to retransmit it. | |||
| Below, the term "handshake packet" is used to refer to packets | Below, the term "handshake packet" is used to refer to packets | |||
| containing CRYPTO frames, not packets with the specific long header | containing CRYPTO frames, not packets with the specific long header | |||
| packet type Handshake. | packet type Handshake. | |||
| The initial handshake timeout SHOULD be set to twice the initial RTT. | The initial handshake timeout SHOULD be set to twice the initial RTT. | |||
| At the beginning, there are no prior RTT samples within a connection. | At the beginning, there are no prior RTT samples within a connection. | |||
| Resumed connections over the same network SHOULD use the previous | Resumed connections over the same network SHOULD use the previous | |||
| connection's final smoothed RTT value as the resumed connection's | connection's final smoothed RTT value as the resumed connection's | |||
| initial RTT. | initial RTT. | |||
| If no previous RTT is available, or if the network changes, the | If no previous RTT is available, or if the network changes, the | |||
| initial RTT SHOULD be set to 100ms. | initial RTT SHOULD be set to 100ms. | |||
| When CRYPTO frames are sent, the sender SHOULD set an alarm for the | When CRYPTO frames are sent, the sender SHOULD set a timer for the | |||
| handshake timeout period. When the alarm fires, the sender MUST | handshake timeout period. Upon timeout, the sender MUST retransmit | |||
| retransmit all unacknowledged CRYPTO data by calling | all unacknowledged CRYPTO data by calling | |||
| RetransmitAllUnackedHandshakeData(). On each consecutive firing of | RetransmitAllUnackedHandshakeData(). On each consecutive expiration | |||
| the handshake alarm without receiving an acknowledgement for a new | of the handshake timer without receiving an acknowledgement for a new | |||
| packet, the sender SHOULD double the handshake timeout and set an | packet, the sender SHOULD double the handshake timeout and set a | |||
| alarm for this period. | timer for this period. | |||
| When CRYPTO frames are outstanding, the TLP and RTO timers are not | When CRYPTO frames are outstanding, the TLP and RTO timers are not | |||
| active unless the CRYPTO frames were sent at 1RTT encryption. | active unless the CRYPTO frames were sent at 1RTT encryption. | |||
| When an acknowledgement is received for a handshake packet, the new | When an acknowledgement is received for a handshake packet, the new | |||
| RTT is computed and the alarm SHOULD be set for twice the newly | RTT is computed and the timer SHOULD be set for twice the newly | |||
| computed smoothed RTT. | computed smoothed RTT. | |||
| 3.3.1.1. Retry | 4.3.1.1. Retry and Version Negotiation | |||
| A Retry packet causes the content of the client's Initial packet to | A Retry or Version Negotiation packet causes a client to send another | |||
| be immediately retransmitted along with the token present in the | Initial packet, effectively restarting the connection process. | |||
| Retry. | ||||
| The Retry indicates that the Initial was received but not processed. | Either packet indicates that the Initial was received but not | |||
| It MUST NOT be treated as an acknowledgment for the Initial, but it | processed. Neither packet can be treated as an acknowledgment for | |||
| MAY be used for an RTT measurement. | the Initial, but they MAY be used to improve the RTT estimate. | |||
| 3.3.2. Tail Loss Probe | 4.3.2. Tail Loss Probe | |||
| The algorithm described in this section is an adaptation of the Tail | The algorithm described in this section is an adaptation of the Tail | |||
| Loss Probe algorithm proposed for TCP [TLP]. | Loss Probe algorithm proposed for TCP [TLP]. | |||
| A packet sent at the tail is particularly vulnerable to slow loss | A packet sent at the tail is particularly vulnerable to slow loss | |||
| detection, since acks of subsequent packets are needed to trigger | detection, since acks of subsequent packets are needed to trigger | |||
| ack-based detection. To ameliorate this weakness of tail packets, | ack-based detection. To ameliorate this weakness of tail packets, | |||
| the sender schedules an alarm when the last retransmittable packet | the sender schedules a timer when the last retransmittable packet | |||
| before quiescence is transmitted. When this alarm fires, a Tail Loss | before quiescence is transmitted. Upon timeout, a Tail Loss Probe | |||
| Probe (TLP) packet is sent to evoke an acknowledgement from the | (TLP) packet is sent to evoke an acknowledgement from the receiver. | |||
| receiver. | ||||
| The alarm duration, or Probe Timeout (PTO), is set based on the | The timer duration, or Probe Timeout (PTO), is set based on the | |||
| following conditions: | following conditions: | |||
| o PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay, | o PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay, | |||
| kMinTLPTimeout) | kMinTLPTimeout) | |||
| o If RTO (Section 3.3.3) is earlier, schedule a TLP alarm in its | o If RTO (Section 4.3.3) is earlier, schedule a TLP in its place. | |||
| place. That is, PTO SHOULD be scheduled for min(RTO, PTO). | That is, PTO SHOULD be scheduled for min(RTO, PTO). | |||
| MaxAckDelay is the maximum ack delay supplied in an incoming ACK | ||||
| frame. MaxAckDelay excludes ack delays that aren't included in an | ||||
| RTT sample because they're too large and excludes those which | ||||
| reference an ack-only packet. | ||||
| QUIC diverges from TCP by calculating MaxAckDelay dynamically, | QUIC includes MaxAckDelay in all probe timeouts, because it assumes | |||
| instead of assuming a constant delayed ack timeout for all | the ack delay may come into play, regardless of the number of packets | |||
| connections. QUIC includes this in all probe timeouts, because it | outstanding. TCP's TLP assumes if at least 2 packets are | |||
| assume the ack delay may come into play, regardless of the number of | ||||
| packets outstanding. TCP's TLP assumes if at least 2 packets are | ||||
| outstanding, acks will not be delayed. | outstanding, acks will not be delayed. | |||
| A PTO value of at least 1.5*SRTT ensures that the ACK is overdue. | A PTO value of at least 1.5*SRTT ensures that the ACK is overdue. | |||
| The 1.5 is based on [TLP], but implementations MAY experiment with | The 1.5 is based on [TLP], but implementations MAY experiment with | |||
| other constants. | other constants. | |||
| To reduce latency, it is RECOMMENDED that the sender set and allow | To reduce latency, it is RECOMMENDED that the sender set and allow | |||
| the TLP alarm to fire twice before setting an RTO alarm. In other | the TLP timer to fire twice before setting an RTO timer. In other | |||
| words, when the TLP alarm fires the first time, a TLP packet is sent, | words, when the TLP timer expires the first time, a TLP packet is | |||
| and it is RECOMMENDED that the TLP alarm be scheduled for a second | sent, and it is RECOMMENDED that the TLP timer be scheduled for a | |||
| time. When the TLP alarm fires the second time, a second TLP packet | second time. When the TLP timer expires the second time, a second | |||
| is sent, and an RTO alarm SHOULD be scheduled Section 3.3.3. | TLP packet is sent, and an RTO timer SHOULD be scheduled | |||
| Section 4.3.3. | ||||
| A TLP packet SHOULD carry new data when possible. If new data is | A TLP packet SHOULD carry new data when possible. If new data is | |||
| unavailable or new data cannot be sent due to flow control, a TLP | unavailable or new data cannot be sent due to flow control, a TLP | |||
| packet MAY retransmit unacknowledged data to potentially reduce | packet MAY retransmit unacknowledged data to potentially reduce | |||
| recovery time. Since a TLP alarm is used to send a probe into the | recovery time. Since a TLP timer is used to send a probe into the | |||
| network prior to establishing any packet loss, prior unacknowledged | network prior to establishing any packet loss, prior unacknowledged | |||
| packets SHOULD NOT be marked as lost when a TLP alarm fires. | packets SHOULD NOT be marked as lost when a TLP timer expires. | |||
| A sender may not know that a packet being sent is a tail packet. | A sender may not know that a packet being sent is a tail packet. | |||
| Consequently, a sender may have to arm or adjust the TLP alarm on | Consequently, a sender may have to arm or adjust the TLP timer on | |||
| every sent retransmittable packet. | every sent retransmittable packet. | |||
| 3.3.3. Retransmission Timeout | 4.3.3. Retransmission Timeout | |||
| A Retransmission Timeout (RTO) alarm is the final backstop for loss | A Retransmission Timeout (RTO) timer is the final backstop for loss | |||
| detection. The algorithm used in QUIC is based on the RTO algorithm | detection. The algorithm used in QUIC is based on the RTO algorithm | |||
| for TCP [RFC5681] and is additionally resilient to spurious RTO | for TCP [RFC5681] and is additionally resilient to spurious RTO | |||
| events [RFC5682]. | events [RFC5682]. | |||
| When the last TLP packet is sent, an alarm is scheduled for the RTO | When the last TLP packet is sent, a timer is set for the RTO period. | |||
| period. When this alarm fires, the sender sends two packets, to | When this timer expires, the sender sends two packets, to evoke | |||
| evoke acknowledgements from the receiver, and restarts the RTO alarm. | acknowledgements from the receiver, and restarts the RTO timer. | |||
| Similar to TCP [RFC6298], the RTO period is set based on the | Similar to TCP [RFC6298], the RTO period is set based on the | |||
| following conditions: | following conditions: | |||
| o When the final TLP packet is sent, the RTO period is set to | o When the final TLP packet is sent, the RTO period is set to | |||
| max(SRTT + 4*RTTVAR + MaxAckDelay, kMinRTOTimeout) | max(SRTT + 4*RTTVAR + MaxAckDelay, kMinRTOTimeout) | |||
| o When an RTO alarm fires, the RTO period is doubled. | o When an RTO timer expires, the RTO period is doubled. | |||
| The sender typically has incurred a high latency penalty by the time | The sender typically has incurred a high latency penalty by the time | |||
| an RTO alarm fires, and this penalty increases exponentially in | an RTO timer expires, and this penalty increases exponentially in | |||
| subsequent consecutive RTO events. Sending a single packet on an RTO | subsequent consecutive RTO events. Sending a single packet on an RTO | |||
| event therefore makes the connection very sensitive to single packet | event therefore makes the connection very sensitive to single packet | |||
| loss. Sending two packets instead of one significantly increases | loss. Sending two packets instead of one significantly increases | |||
| resilience to packet drop in both directions, thus reducing the | resilience to packet drop in both directions, thus reducing the | |||
| probability of consecutive RTO events. | probability of consecutive RTO events. | |||
| QUIC's RTO algorithm differs from TCP in that the firing of an RTO | QUIC's RTO algorithm differs from TCP in that the firing of an RTO | |||
| alarm is not considered a strong enough signal of packet loss, so | timer is not considered a strong enough signal of packet loss, so | |||
| does not result in an immediate change to congestion window or | does not result in an immediate change to congestion window or | |||
| recovery state. An RTO alarm fires only when there's a prolonged | recovery state. An RTO timer expires only when there's a prolonged | |||
| period of network silence, which could be caused by a change in the | period of network silence, which could be caused by a change in the | |||
| underlying network RTT. | underlying network RTT. | |||
| QUIC also diverges from TCP by including MaxAckDelay in the RTO | QUIC also diverges from TCP by including MaxAckDelay in the RTO | |||
| period. QUIC is able to explicitly model delay at the receiver via | period. Since QUIC corrects for this delay in its SRTT and RTTVAR | |||
| the ack delay field in the ACK frame. Since QUIC corrects for this | computations, it is necessary to add this delay explicitly in the TLP | |||
| delay in its SRTT and RTTVAR computations, it is necessary to add | and RTO computation. | |||
| this delay explicitly in the TLP and RTO computation. | ||||
| When an acknowledgment is received for a packet sent on an RTO event, | When an acknowledgment is received for a packet sent on an RTO event, | |||
| any unacknowledged packets with lower packet numbers than those | any unacknowledged packets with lower packet numbers than those | |||
| acknowledged MUST be marked as lost. | acknowledged MUST be marked as lost. | |||
| A packet sent when an RTO alarm fires MAY carry new data if available | A packet sent when an RTO timer expires MAY carry new data if | |||
| or unacknowledged data to potentially reduce recovery time. Since | available or unacknowledged data to potentially reduce recovery time. | |||
| this packet is sent as a probe into the network prior to establishing | Since this packet is sent as a probe into the network prior to | |||
| any packet loss, prior unacknowledged packets SHOULD NOT be marked as | establishing any packet loss, prior unacknowledged packets SHOULD NOT | |||
| lost. | be marked as lost. | |||
| A packet sent on an RTO alarm MUST NOT be blocked by the sender's | A packet sent on an RTO timer MUST NOT be blocked by the sender's | |||
| congestion controller. A sender MUST however count these bytes as | congestion controller. A sender MUST however count these bytes as | |||
| additional bytes in flight, since this packet adds network load | additional bytes in flight, since this packet adds network load | |||
| without establishing packet loss. | without establishing packet loss. | |||
| 3.4. Generating Acknowledgements | 4.4. Generating Acknowledgements | |||
| QUIC SHOULD delay sending acknowledgements in response to packets, | QUIC SHOULD delay sending acknowledgements in response to packets, | |||
| but MUST NOT excessively delay acknowledgements of packets containing | but MUST NOT excessively delay acknowledgements of packets containing | |||
| frames other than ACK or ACN_ECN. Specifically, implementaions MUST | frames other than ACK or ACN_ECN. Specifically, implementaions MUST | |||
| attempt to enforce a maximum ack delay to avoid causing the peer | attempt to enforce a maximum ack delay to avoid causing the peer | |||
| spurious timeouts. The RECOMMENDED maximum ack delay in QUIC is | spurious timeouts. The RECOMMENDED maximum ack delay in QUIC is | |||
| 25ms. | 25ms. | |||
| An acknowledgement MAY be sent for every second full-sized packet, as | An acknowledgement MAY be sent for every second full-sized packet, as | |||
| TCP does [RFC5681], or may be sent less frequently, as long as the | TCP does [RFC5681], or may be sent less frequently, as long as the | |||
| skipping to change at page 12, line 34 ¶ | skipping to change at page 13, line 16 ¶ | |||
| Similarly, packets marked with the ECN Congestion Experienced (CE) | Similarly, packets marked with the ECN Congestion Experienced (CE) | |||
| codepoint in the IP header SHOULD be acknowledged immediately, to | codepoint in the IP header SHOULD be acknowledged immediately, to | |||
| reduce the peer's response time to congestion events. | reduce the peer's response time to congestion events. | |||
| As an optimization, a receiver MAY process multiple packets before | As an optimization, a receiver MAY process multiple packets before | |||
| sending any ACK frames in response. In this case they can determine | sending any ACK frames in response. In this case they can determine | |||
| whether an immediate or delayed acknowledgement should be generated | whether an immediate or delayed acknowledgement should be generated | |||
| after processing incoming packets. | after processing incoming packets. | |||
| 3.4.1. Crypto Handshake Data | 4.4.1. Crypto Handshake Data | |||
| In order to quickly complete the handshake and avoid spurious | In order to quickly complete the handshake and avoid spurious | |||
| retransmissions due to handshake alarm timeouts, handshake packets | retransmissions due to handshake timeouts, handshake packets SHOULD | |||
| SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be | use a very short ack delay, such as 1ms. ACK frames MAY be sent | |||
| sent immediately when the crypto stack indicates all data for that | immediately when the crypto stack indicates all data for that | |||
| encryption level has been received. | encryption level has been received. | |||
| 3.4.2. ACK Ranges | 4.4.2. ACK Ranges | |||
| When an ACK frame is sent, one or more ranges of acknowledged packets | When an ACK frame is sent, one or more ranges of acknowledged packets | |||
| are included. Including older packets reduces the chance of spurious | are included. Including older packets reduces the chance of spurious | |||
| retransmits caused by losing previously sent ACK frames, at the cost | retransmits caused by losing previously sent ACK frames, at the cost | |||
| of larger ACK frames. | of larger ACK frames. | |||
| ACK frames SHOULD always acknowledge the most recently received | ACK frames SHOULD always acknowledge the most recently received | |||
| packets, and the more out-of-order the packets are, the more | packets, and the more out-of-order the packets are, the more | |||
| important it is to send an updated ACK frame quickly, to prevent the | important it is to send an updated ACK frame quickly, to prevent the | |||
| peer from declaring a packet as lost and spuriusly retransmitting the | peer from declaring a packet as lost and spuriusly retransmitting the | |||
| frames it contains. | frames it contains. | |||
| Below is one recommended approach for determining what packets to | Below is one recommended approach for determining what packets to | |||
| include in an ACK frame. | include in an ACK frame. | |||
| 3.4.3. Receiver Tracking of ACK Frames | 4.4.3. Receiver Tracking of ACK Frames | |||
| When a packet containing an ACK frame is sent, the largest | When a packet containing an ACK frame is sent, the largest | |||
| acknowledged in that frame may be saved. When a packet containing an | acknowledged in that frame may be saved. When a packet containing an | |||
| ACK frame is acknowledged, the receiver can stop acknowledging | ACK frame is acknowledged, the receiver can stop acknowledging | |||
| packets less than or equal to the largest acknowledged in the sent | packets less than or equal to the largest acknowledged in the sent | |||
| ACK frame. | ACK frame. | |||
| In cases without ACK frame loss, this algorithm allows for a minimum | In cases without ACK frame loss, this algorithm allows for a minimum | |||
| of 1 RTT of reordering. In cases with ACK frame loss, this approach | of 1 RTT of reordering. In cases with ACK frame loss, this approach | |||
| does not guarantee that every acknowledgement is seen by the sender | does not guarantee that every acknowledgement is seen by the sender | |||
| before it is no longer included in the ACK frame. Packets could be | before it is no longer included in the ACK frame. Packets could be | |||
| received out of order and all subsequent ACK frames containing them | received out of order and all subsequent ACK frames containing them | |||
| could be lost. In this case, the loss recovery algorithm may cause | could be lost. In this case, the loss recovery algorithm may cause | |||
| spurious retransmits, but the sender will continue making forward | spurious retransmits, but the sender will continue making forward | |||
| progress. | progress. | |||
| 3.5. Pseudocode | 4.5. Pseudocode | |||
| 3.5.1. Constants of interest | 4.5.1. Constants of interest | |||
| Constants used in loss recovery are based on a combination of RFCs, | Constants used in loss recovery are based on a combination of RFCs, | |||
| papers, and common practice. Some may need to be changed or | papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kMaxTLPs (RECOMMENDED 2): Maximum number of tail loss probes before | kMaxTLPs (RECOMMENDED 2): Maximum number of tail loss probes before | |||
| an RTO fires. | an RTO expires. | |||
| kReorderingThreshold (RECOMMENDED 3): Maximum reordering in packet | kReorderingThreshold (RECOMMENDED 3): Maximum reordering in packet | |||
| number space before FACK style loss detection considers a packet | number space before FACK style loss detection considers a packet | |||
| lost. | lost. | |||
| kTimeReorderingFraction (RECOMMENDED 1/8): Maximum reordering in | kTimeReorderingFraction (RECOMMENDED 1/8): Maximum reordering in | |||
| time space before time based loss detection considers a packet | time space before time based loss detection considers a packet | |||
| lost. In fraction of an RTT. | lost. In fraction of an RTT. | |||
| kUsingTimeLossDetection (RECOMMENDED false): Whether time based loss | kUsingTimeLossDetection (RECOMMENDED false): Whether time based loss | |||
| detection is in use. If false, uses FACK style loss detection. | detection is in use. If false, uses FACK style loss detection. | |||
| kMinTLPTimeout (RECOMMENDED 10ms): Minimum time in the future a tail | kMinTLPTimeout (RECOMMENDED 10ms): Minimum time in the future a tail | |||
| loss probe alarm may be set for. | loss probe timer may be set for. | |||
| kMinRTOTimeout (RECOMMENDED 200ms): Minimum time in the future an | kMinRTOTimeout (RECOMMENDED 200ms): Minimum time in the future an | |||
| RTO alarm may be set for. | RTO timer may be set for. | |||
| kDelayedAckTimeout (RECOMMENDED 25ms): The length of the peer's | kDelayedAckTimeout (RECOMMENDED 25ms): The length of the peer's | |||
| delayed ack timer. | delayed ack timer. | |||
| kInitialRtt (RECOMMENDED 100ms): The RTT used before an RTT sample | kInitialRtt (RECOMMENDED 100ms): The RTT used before an RTT sample | |||
| is taken. | is taken. | |||
| 3.5.2. Variables of interest | 4.5.2. Variables of interest | |||
| Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
| described in this section. | described in this section. | |||
| loss_detection_alarm: Multi-modal alarm used for loss detection. | loss_detection_timer: Multi-modal timer used for loss detection. | |||
| handshake_count: The number of times all unacknowledged handshake | handshake_count: The number of times all unacknowledged handshake | |||
| data has been retransmitted without receiving an ack. | data has been retransmitted without receiving an ack. | |||
| tlp_count: The number of times a tail loss probe has been sent | tlp_count: The number of times a tail loss probe has been sent | |||
| without receiving an ack. | without receiving an ack. | |||
| rto_count: The number of times an rto has been sent without | rto_count: The number of times an rto has been sent without | |||
| receiving an ack. | receiving an ack. | |||
| skipping to change at page 15, line 4 ¶ | skipping to change at page 15, line 33 ¶ | |||
| largest_acked_packet: The largest packet number acknowledged in an | largest_acked_packet: The largest packet number acknowledged in an | |||
| ACK frame. | ACK frame. | |||
| latest_rtt: The most recent RTT measurement made when receiving an | latest_rtt: The most recent RTT measurement made when receiving an | |||
| ack for a previously unacked packet. | ack for a previously unacked packet. | |||
| smoothed_rtt: The smoothed RTT of the connection, computed as | smoothed_rtt: The smoothed RTT of the connection, computed as | |||
| described in [RFC6298] | described in [RFC6298] | |||
| rttvar: The RTT variance, computed as described in [RFC6298] | rttvar: The RTT variance, computed as described in [RFC6298] | |||
| min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | |||
| max_ack_delay: The maximum ack delay in an incoming ACK frame for | max_ack_delay: The maximum ack delay in an incoming ACK frame for | |||
| this connection. Excludes ack delays for ack only packets and | this connection. Excludes ack delays for non-retransmittable | |||
| those that create an RTT sample less than min_rtt. | packets and those that create an RTT sample less than min_rtt. | |||
| reordering_threshold: The largest packet number gap between the | reordering_threshold: The largest packet number gap between the | |||
| largest acked retransmittable packet and an unacknowledged | largest acknowledged retransmittable packet and an unacknowledged | |||
| retransmittable packet before it is declared lost. | retransmittable packet before it is declared lost. | |||
| time_reordering_fraction: The reordering window as a fraction of | time_reordering_fraction: The reordering window as a fraction of | |||
| max(smoothed_rtt, latest_rtt). | max(smoothed_rtt, latest_rtt). | |||
| loss_time: The time at which the next packet will be considered lost | loss_time: The time at which the next packet will be considered lost | |||
| based on early transmit or exceeding the reordering window in | based on early transmit or exceeding the reordering window in | |||
| time. | time. | |||
| sent_packets: An association of packet numbers to information about | sent_packets: An association of packet numbers to information about | |||
| them, including a number field indicating the packet number, a | them, including a number field indicating the packet number, a | |||
| time field indicating the time a packet was sent, a boolean | time field indicating the time a packet was sent, a boolean | |||
| indicating whether the packet is ack only, and a bytes field | indicating whether the packet is ack-only, a boolean indicating | |||
| whether it counts towards bytes in flight, and a bytes field | ||||
| indicating the packet's size. sent_packets is ordered by packet | indicating the packet's size. sent_packets is ordered by packet | |||
| number, and packets remain in sent_packets until acknowledged or | number, and packets remain in sent_packets until acknowledged or | |||
| lost. A sent_packets data structure is maintained per packet | lost. A sent_packets data structure is maintained per packet | |||
| number space, and ACK processing only applies to a single space. | number space, and ACK processing only applies to a single space. | |||
| 3.5.3. Initialization | 4.5.3. Initialization | |||
| At the beginning of the connection, initialize the loss detection | At the beginning of the connection, initialize the loss detection | |||
| variables as follows: | variables as follows: | |||
| loss_detection_alarm.reset() | loss_detection_timer.reset() | |||
| handshake_count = 0 | handshake_count = 0 | |||
| tlp_count = 0 | tlp_count = 0 | |||
| rto_count = 0 | rto_count = 0 | |||
| if (kUsingTimeLossDetection) | if (kUsingTimeLossDetection) | |||
| reordering_threshold = infinite | reordering_threshold = infinite | |||
| time_reordering_fraction = kTimeReorderingFraction | time_reordering_fraction = kTimeReorderingFraction | |||
| else: | else: | |||
| reordering_threshold = kReorderingThreshold | reordering_threshold = kReorderingThreshold | |||
| time_reordering_fraction = infinite | time_reordering_fraction = infinite | |||
| loss_time = 0 | loss_time = 0 | |||
| smoothed_rtt = 0 | smoothed_rtt = 0 | |||
| rttvar = 0 | rttvar = 0 | |||
| min_rtt = infinite | min_rtt = infinite | |||
| max_ack_delay = 0 | max_ack_delay = 0 | |||
| largest_sent_before_rto = 0 | largest_sent_before_rto = 0 | |||
| time_of_last_sent_retransmittable_packet = 0 | time_of_last_sent_retransmittable_packet = 0 | |||
| time_of_last_sent_handshake_packet = 0 | time_of_last_sent_handshake_packet = 0 | |||
| largest_sent_packet = 0 | largest_sent_packet = 0 | |||
| 3.5.4. On Sending a Packet | 4.5.4. On Sending a Packet | |||
| After any packet is sent, be it a new transmission or a rebundled | After any packet is sent, be it a new transmission or a rebundled | |||
| transmission, the following OnPacketSent function is called. The | transmission, the following OnPacketSent function is called. The | |||
| parameters to OnPacketSent are as follows: | parameters to OnPacketSent are as follows: | |||
| o packet_number: The packet number of the sent packet. | o packet_number: The packet number of the sent packet. | |||
| o is_ack_only: A boolean that indicates whether a packet only | o ack_only: A boolean that indicates whether a packet contains only | |||
| contains an ACK frame. If true, it is still expected an ack will | ACK or PADDING frame(s). If true, it is still expected an ack | |||
| be received for this packet, but it is not retransmittable. | will be received for this packet, but it is not retransmittable. | |||
| o is_handshake_packet: A boolean that indicates whether a packet | o in_flight: A boolean that indicates whether the packet counts | |||
| contains handshake data. | towards bytes in flight. | |||
| o is_handshake_packet: A boolean that indicates whether the packet | ||||
| contains cryptographic handshake messages critical to the | ||||
| completion of the QUIC handshake. In this version of QUIC, this | ||||
| includes any packet with the long header that includes a CRYPTO | ||||
| frame. | ||||
| o sent_bytes: The number of bytes sent in the packet, not including | o sent_bytes: The number of bytes sent in the packet, not including | |||
| UDP or IP overhead, but including QUIC framing overhead. | UDP or IP overhead, but including QUIC framing overhead. | |||
| Pseudocode for OnPacketSent follows: | Pseudocode for OnPacketSent follows: | |||
| OnPacketSent(packet_number, is_ack_only, is_handshake_packet, | OnPacketSent(packet_number, ack_only, in_flight, | |||
| sent_bytes): | is_handshake_packet, sent_bytes): | |||
| largest_sent_packet = packet_number | largest_sent_packet = packet_number | |||
| sent_packets[packet_number].packet_number = packet_number | sent_packets[packet_number].packet_number = packet_number | |||
| sent_packets[packet_number].time = now | sent_packets[packet_number].time = now | |||
| sent_packets[packet_number].ack_only = is_ack_only | sent_packets[packet_number].ack_only = ack_only | |||
| if !is_ack_only: | sent_packets[packet_number].in_flight = in_flight | |||
| if !ack_only: | ||||
| if is_handshake_packet: | if is_handshake_packet: | |||
| time_of_last_sent_handshake_packet = now | time_of_last_sent_handshake_packet = now | |||
| time_of_last_sent_retransmittable_packet = now | time_of_last_sent_retransmittable_packet = now | |||
| OnPacketSentCC(sent_bytes) | OnPacketSentCC(sent_bytes) | |||
| sent_packets[packet_number].bytes = sent_bytes | sent_packets[packet_number].bytes = sent_bytes | |||
| SetLossDetectionAlarm() | SetLossDetectionTimer() | |||
| 3.5.5. On Receiving an Acknowledgment | 4.5.5. On Receiving an Acknowledgment | |||
| When an ACK frame is received, it may acknowledge 0 or more packets. | When an ACK frame is received, it may newly acknowledge any number of | |||
| packets. | ||||
| Pseudocode for OnAckReceived and UpdateRtt follow: | Pseudocode for OnAckReceived and UpdateRtt follow: | |||
| OnAckReceived(ack): | OnAckReceived(ack): | |||
| largest_acked_packet = ack.largest_acked | largest_acked_packet = ack.largest_acked | |||
| // If the largest acked is newly acked, update the RTT. | // If the largest acknowledged is newly acked, | |||
| // update the RTT. | ||||
| if (sent_packets[ack.largest_acked]): | if (sent_packets[ack.largest_acked]): | |||
| latest_rtt = now - sent_packets[ack.largest_acked].time | latest_rtt = now - sent_packets[ack.largest_acked].time | |||
| UpdateRtt(latest_rtt, ack.ack_delay) | UpdateRtt(latest_rtt, ack.ack_delay) | |||
| // Find all newly acked packets. | // Find all newly acked packets. | |||
| for acked_packet in DetermineNewlyAckedPackets(): | for acked_packet in DetermineNewlyAckedPackets(): | |||
| OnPacketAcked(acked_packet.packet_number) | OnPacketAcked(acked_packet.packet_number) | |||
| DetectLostPackets(ack.largest_acked_packet) | DetectLostPackets(ack.largest_acked_packet) | |||
| SetLossDetectionAlarm() | SetLossDetectionTimer() | |||
| // Process ECN information if present. | // Process ECN information if present. | |||
| if (ACK frame contains ECN information): | if (ACK frame contains ECN information): | |||
| ProcessECN(ack) | ProcessECN(ack) | |||
| UpdateRtt(latest_rtt, ack_delay): | UpdateRtt(latest_rtt, ack_delay): | |||
| // min_rtt ignores ack delay. | // min_rtt ignores ack delay. | |||
| min_rtt = min(min_rtt, latest_rtt) | min_rtt = min(min_rtt, latest_rtt) | |||
| // Adjust for ack delay if it's plausible. | // Adjust for ack delay if it's plausible. | |||
| if (latest_rtt - min_rtt > ack_delay): | if (latest_rtt - min_rtt > ack_delay): | |||
| latest_rtt -= ack_delay | latest_rtt -= ack_delay | |||
| // Only save into max ack delay if it's used | // Only save into max ack delay if it's used | |||
| // for rtt calculation and is not ack only. | // for rtt calculation and is not ack-only. | |||
| if (!sent_packets[ack.largest_acked].ack_only) | if (!sent_packets[ack.largest_acked].ack_only) | |||
| max_ack_delay = max(max_ack_delay, ack_delay) | max_ack_delay = max(max_ack_delay, ack_delay) | |||
| // Based on {{RFC6298}}. | // Based on {{RFC6298}}. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| smoothed_rtt = latest_rtt | smoothed_rtt = latest_rtt | |||
| rttvar = latest_rtt / 2 | rttvar = latest_rtt / 2 | |||
| else: | else: | |||
| rttvar_sample = abs(smoothed_rtt - latest_rtt) | rttvar_sample = abs(smoothed_rtt - latest_rtt) | |||
| rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | |||
| smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | |||
| 3.5.6. On Packet Acknowledgment | 4.5.6. On Packet Acknowledgment | |||
| When a packet is acked for the first time, the following | When a packet is acked for the first time, the following | |||
| OnPacketAcked function is called. Note that a single ACK frame may | OnPacketAcked function is called. Note that a single ACK frame may | |||
| newly acknowledge several packets. OnPacketAcked must be called once | newly acknowledge several packets. OnPacketAcked must be called once | |||
| for each of these newly acked packets. | for each of these newly acked packets. | |||
| OnPacketAcked takes one parameter, acked_packet, which is the struct | OnPacketAcked takes one parameter, acked_packet, which is the struct | |||
| of the newly acked packet. | of the newly acked packet. | |||
| If this is the first acknowledgement following RTO, check if the | If this is the first acknowledgement following RTO, check if the | |||
| skipping to change at page 19, line 18 ¶ | skipping to change at page 19, line 18 ¶ | |||
| [RFC5682]. | [RFC5682]. | |||
| Pseudocode for OnPacketAcked follows: | Pseudocode for OnPacketAcked follows: | |||
| OnPacketAcked(acked_packet): | OnPacketAcked(acked_packet): | |||
| if (!acked_packet.is_ack_only): | if (!acked_packet.is_ack_only): | |||
| OnPacketAckedCC(acked_packet) | OnPacketAckedCC(acked_packet) | |||
| // If a packet sent prior to RTO was acked, then the RTO | // If a packet sent prior to RTO was acked, then the RTO | |||
| // was spurious. Otherwise, inform congestion control. | // was spurious. Otherwise, inform congestion control. | |||
| if (rto_count > 0 && | if (rto_count > 0 && | |||
| acked_packet.packet_number > largest_sent_before_rto) | acked_packet.packet_number > largest_sent_before_rto): | |||
| OnRetransmissionTimeoutVerified() | OnRetransmissionTimeoutVerified( | |||
| acket_packet.packet_number) | ||||
| handshake_count = 0 | handshake_count = 0 | |||
| tlp_count = 0 | tlp_count = 0 | |||
| rto_count = 0 | rto_count = 0 | |||
| sent_packets.remove(acked_packet.packet_number) | sent_packets.remove(acked_packet.packet_number) | |||
| 3.5.7. Setting the Loss Detection Alarm | 4.5.7. Setting the Loss Detection Timer | |||
| QUIC loss detection uses a single alarm for all timer-based loss | QUIC loss detection uses a single timer for all timer-based loss | |||
| detection. The duration of the alarm is based on the alarm's mode, | detection. The duration of the timer is based on the timer's mode, | |||
| which is set in the packet and timer events further below. The | which is set in the packet and timer events further below. The | |||
| function SetLossDetectionAlarm defined below shows how the single | function SetLossDetectionTimer defined below shows how the single | |||
| timer is set based on the alarm mode. | timer is set. | |||
| 3.5.7.1. Handshake Alarm | 4.5.7.1. Handshake Timer | |||
| When a connection has unacknowledged handshake data, the handshake | When a connection has unacknowledged handshake data, the handshake | |||
| alarm is set and when it expires, all unacknowledgedd handshake data | timer is set and when it expires, all unacknowledgedd handshake data | |||
| is retransmitted. | is retransmitted. | |||
| When stateless rejects are in use, the connection is considered | When stateless rejects are in use, the connection is considered | |||
| immediately closed once a reject is sent, so no timer is set to | immediately closed once a reject is sent, so no timer is set to | |||
| retransmit the reject. | retransmit the reject. | |||
| Version negotiation packets are always stateless, and MUST be sent | Version negotiation packets are always stateless, and MUST be sent | |||
| once per handshake packet that uses an unsupported QUIC version, and | once per handshake packet that uses an unsupported QUIC version, and | |||
| MAY be sent in response to 0-RTT packets. | MAY be sent in response to 0-RTT packets. | |||
| 3.5.7.2. Tail Loss Probe and Retransmission Alarm | 4.5.7.2. Tail Loss Probe and Retransmission Timer | |||
| Tail loss probes [TLP] and retransmission timeouts [RFC6298] are an | Tail loss probes [TLP] and retransmission timeouts [RFC6298] are a | |||
| alarm based mechanism to recover from cases when there are | timer based mechanism to recover from cases when there are | |||
| outstanding retransmittable packets, but an acknowledgement has not | outstanding retransmittable packets, but an acknowledgement has not | |||
| been received in a timely manner. | been received in a timely manner. | |||
| The TLP and RTO timers are armed when there is not unacknowledged | The TLP and RTO timers are armed when there is not unacknowledged | |||
| handshake data. The TLP alarm is set until the max number of TLP | handshake data. The TLP timer is set until the max number of TLP | |||
| packets have been sent, and then the RTO timer is set. | packets have been sent, and then the RTO timer is set. | |||
| 3.5.7.3. Early Retransmit Alarm | 4.5.7.3. Early Retransmit Timer | |||
| Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It | Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It | |||
| is part of QUIC's time based loss detection, but is always enabled, | is part of QUIC's time based loss detection, but is always enabled, | |||
| even when only packet reordering loss detection is enabled. | even when only packet reordering loss detection is enabled. | |||
| 3.5.7.4. Pseudocode | 4.5.7.4. Pseudocode | |||
| Pseudocode for SetLossDetectionAlarm follows: | Pseudocode for SetLossDetectionTimer follows: | |||
| SetLossDetectionAlarm(): | SetLossDetectionTimer(): | |||
| // Don't arm the alarm if there are no packets with | // Don't arm timer if there are no retransmittable packets | |||
| // retransmittable data in flight. | // in flight. | |||
| if (bytes_in_flight == 0): | if (bytes_in_flight == 0): | |||
| loss_detection_alarm.cancel() | loss_detection_timer.cancel() | |||
| return | return | |||
| if (handshake packets are outstanding): | if (handshake packets are outstanding): | |||
| // Handshake retransmission alarm. | // Handshake retransmission timer. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| alarm_duration = 2 * kInitialRtt | timeout = 2 * kInitialRtt | |||
| else: | else: | |||
| alarm_duration = 2 * smoothed_rtt | timeout = 2 * smoothed_rtt | |||
| alarm_duration = max(alarm_duration + max_ack_delay, | timeout = max(timeout + max_ack_delay, kMinTLPTimeout) | |||
| kMinTLPTimeout) | timeout = timeout * (2 ^ handshake_count) | |||
| alarm_duration = alarm_duration * (2 ^ handshake_count) | loss_detection_timer.set( | |||
| loss_detection_alarm.set( | time_of_last_sent_handshake_packet + timeout) | |||
| time_of_last_sent_handshake_packet + alarm_duration) | ||||
| return; | return; | |||
| else if (loss_time != 0): | else if (loss_time != 0): | |||
| // Early retransmit timer or time loss detection. | // Early retransmit timer or time loss detection. | |||
| alarm_duration = loss_time - | timeout = loss_time - | |||
| time_of_last_sent_retransmittable_packet | time_of_last_sent_retransmittable_packet | |||
| else: | else: | |||
| // RTO or TLP alarm | // RTO or TLP timer | |||
| // Calculate RTO duration | // Calculate RTO duration | |||
| alarm_duration = | timeout = | |||
| smoothed_rtt + 4 * rttvar + max_ack_delay | smoothed_rtt + 4 * rttvar + max_ack_delay | |||
| alarm_duration = max(alarm_duration, kMinRTOTimeout) | timeout = max(timeout, kMinRTOTimeout) | |||
| alarm_duration = alarm_duration * (2 ^ rto_count) | timeout = timeout * (2 ^ rto_count) | |||
| if (tlp_count < kMaxTLPs): | if (tlp_count < kMaxTLPs): | |||
| // Tail Loss Probe | // Tail Loss Probe | |||
| tlp_alarm_duration = max(1.5 * smoothed_rtt | tlp_timeout = max(1.5 * smoothed_rtt | |||
| + max_ack_delay, kMinTLPTimeout) | + max_ack_delay, kMinTLPTimeout) | |||
| alarm_duration = min(tlp_alarm_duration, alarm_duration) | timeout = min(tlp_timeout, timeout) | |||
| loss_detection_alarm.set( | loss_detection_timer.set( | |||
| time_of_last_sent_retransmittable_packet + alarm_duration) | time_of_last_sent_retransmittable_packet + timeout) | |||
| 3.5.8. On Alarm Firing | 4.5.8. On Timeout | |||
| QUIC uses one loss recovery alarm, which when set, can be in one of | QUIC uses one loss recovery timer, which when set, can be in one of | |||
| several modes. When the alarm fires, the mode determines the action | several modes. When the timer expires, the mode determines the | |||
| to be performed. | action to be performed. | |||
| Pseudocode for OnLossDetectionAlarm follows: | Pseudocode for OnLossDetectionTimeout follows: | |||
| OnLossDetectionAlarm(): | OnLossDetectionTimeout(): | |||
| if (handshake packets are outstanding): | if (handshake packets are outstanding): | |||
| // Handshake retransmission alarm. | // Handshake timeout. | |||
| RetransmitAllUnackedHandshakeData() | RetransmitAllUnackedHandshakeData() | |||
| handshake_count++ | handshake_count++ | |||
| else if (loss_time != 0): | else if (loss_time != 0): | |||
| // Early retransmit or Time Loss Detection | // Early retransmit or Time Loss Detection | |||
| DetectLostPackets(largest_acked_packet) | DetectLostPackets(largest_acked_packet) | |||
| else if (tlp_count < kMaxTLPs): | else if (tlp_count < kMaxTLPs): | |||
| // Tail Loss Probe. | // Tail Loss Probe. | |||
| SendOnePacket() | SendOnePacket() | |||
| tlp_count++ | tlp_count++ | |||
| else: | else: | |||
| // RTO. | // RTO. | |||
| if (rto_count == 0) | if (rto_count == 0) | |||
| largest_sent_before_rto = largest_sent_packet | largest_sent_before_rto = largest_sent_packet | |||
| SendTwoPackets() | SendTwoPackets() | |||
| rto_count++ | rto_count++ | |||
| SetLossDetectionAlarm() | SetLossDetectionTimer() | |||
| 3.5.9. Detecting Lost Packets | 4.5.9. Detecting Lost Packets | |||
| Packets in QUIC are only considered lost once a larger packet number | Packets in QUIC are only considered lost once a larger packet number | |||
| in the same packet number space is acknowledged. DetectLostPackets | in the same packet number space is acknowledged. DetectLostPackets | |||
| is called every time an ack is received and operates on the | is called every time an ack is received and operates on the | |||
| sent_packets for that packet number space. If the loss detection | sent_packets for that packet number space. If the loss detection | |||
| alarm fires and the loss_time is set, the previous largest acked | timer expires and the loss_time is set, the previous largest acked | |||
| packet is supplied. | packet is supplied. | |||
| 3.5.9.1. Pseudocode | 4.5.9.1. Pseudocode | |||
| DetectLostPackets takes one parameter, acked, which is the largest | DetectLostPackets takes one parameter, acked, which is the largest | |||
| acked packet. | acked packet. | |||
| Pseudocode for DetectLostPackets follows: | Pseudocode for DetectLostPackets follows: | |||
| DetectLostPackets(largest_acked): | DetectLostPackets(largest_acked): | |||
| loss_time = 0 | loss_time = 0 | |||
| lost_packets = {} | lost_packets = {} | |||
| delay_until_lost = infinite | delay_until_lost = infinite | |||
| if (kUsingTimeLossDetection): | if (kUsingTimeLossDetection): | |||
| delay_until_lost = | delay_until_lost = | |||
| (1 + time_reordering_fraction) * | (1 + time_reordering_fraction) * | |||
| max(latest_rtt, smoothed_rtt) | max(latest_rtt, smoothed_rtt) | |||
| else if (largest_acked.packet_number == largest_sent_packet): | else if (largest_acked.packet_number == largest_sent_packet): | |||
| // Early retransmit alarm. | // Early retransmit timer. | |||
| delay_until_lost = 5/4 * max(latest_rtt, smoothed_rtt) | delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) | |||
| foreach (unacked < largest_acked.packet_number): | foreach (unacked < largest_acked.packet_number): | |||
| time_since_sent = now() - unacked.time_sent | time_since_sent = now() - unacked.time_sent | |||
| delta = largest_acked.packet_number - unacked.packet_number | delta = largest_acked.packet_number - unacked.packet_number | |||
| if (time_since_sent > delay_until_lost || | if (time_since_sent > delay_until_lost || | |||
| delta > reordering_threshold): | delta > reordering_threshold): | |||
| sent_packets.remove(unacked.packet_number) | sent_packets.remove(unacked.packet_number) | |||
| if (!unacked.is_ack_only): | if (!unacked.is_ack_only): | |||
| lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
| else if (loss_time == 0 && delay_until_lost != infinite): | else if (loss_time == 0 && delay_until_lost != infinite): | |||
| loss_time = now() + delay_until_lost - time_since_sent | loss_time = now() + delay_until_lost - time_since_sent | |||
| // Inform the congestion controller of lost packets and | // Inform the congestion controller of lost packets and | |||
| // lets it decide whether to retransmit immediately. | // lets it decide whether to retransmit immediately. | |||
| if (!lost_packets.empty()): | if (!lost_packets.empty()): | |||
| OnPacketsLost(lost_packets) | OnPacketsLost(lost_packets) | |||
| 3.6. Discussion | 4.6. Discussion | |||
| The majority of constants were derived from best common practices | The majority of constants were derived from best common practices | |||
| among widely deployed TCP implementations on the internet. | among widely deployed TCP implementations on the internet. | |||
| Exceptions follow. | Exceptions follow. | |||
| A shorter delayed ack time of 25ms was chosen because longer delayed | A shorter delayed ack time of 25ms was chosen because longer delayed | |||
| acks can delay loss recovery and for the small number of connections | acks can delay loss recovery and for the small number of connections | |||
| where less than packet per 25ms is delivered, acking every packet is | where less than packet per 25ms is delivered, acking every packet is | |||
| beneficial to congestion control and loss recovery. | beneficial to congestion control and loss recovery. | |||
| The default initial RTT of 100ms was chosen because it is slightly | The default initial RTT of 100ms was chosen because it is slightly | |||
| higher than both the median and mean min_rtt typically observed on | higher than both the median and mean min_rtt typically observed on | |||
| the public internet. | the public internet. | |||
| 4. Congestion Control | 5. Congestion Control | |||
| QUIC's congestion control is based on TCP NewReno [RFC6582] | QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno | |||
| congestion control to determine the congestion window. QUIC | is a congestion window based congestion control. QUIC specifies the | |||
| congestion control is specified in bytes due to finer control and the | congestion window in bytes rather than packets due to finer control | |||
| ease of appropriate byte counting [RFC3465]. | and the ease of appropriate byte counting [RFC3465]. | |||
| QUIC hosts MUST NOT send packets if they would increase | QUIC hosts MUST NOT send packets if they would increase | |||
| bytes_in_flight (defined in Section 4.8.2) beyond the available | bytes_in_flight (defined in Section 5.8.2) beyond the available | |||
| congestion window, unless the packet is a probe packet sent after the | congestion window, unless the packet is a probe packet sent after the | |||
| TLP or RTO alarm fires, as described in Section 3.3.2 and | TLP or RTO timer expires, as described in Section 4.3.2 and | |||
| Section 3.3.3. | Section 4.3.3. | |||
| 4.1. Explicit Congestion Notification | Implementations MAY use other congestion control algorithms, and | |||
| endpoints MAY use different algorithms from one another. The signals | ||||
| QUIC provides for congestion control are generic and are designed to | ||||
| support different algorithms. | ||||
| 5.1. Explicit Congestion Notification | ||||
| If a path has been verified to support ECN, QUIC treats a Congestion | If a path has been verified to support ECN, QUIC treats a Congestion | |||
| Experienced codepoint in the IP header as a signal of congestion. | Experienced codepoint in the IP header as a signal of congestion. | |||
| This document specifies an endpoint's response when its peer receives | This document specifies an endpoint's response when its peer receives | |||
| packets with the Congestion Experienced codepoint. As discussed in | packets with the Congestion Experienced codepoint. As discussed in | |||
| [RFC8311], endpoints are permitted to experiment with other response | [RFC8311], endpoints are permitted to experiment with other response | |||
| functions. | functions. | |||
| 4.2. Slow Start | 5.2. Slow Start | |||
| QUIC begins every connection in slow start and exits slow start upon | QUIC begins every connection in slow start and exits slow start upon | |||
| loss or upon increase in the ECN-CE counter. QUIC re-enters slow | loss or upon increase in the ECN-CE counter. QUIC re-enters slow | |||
| start anytime the congestion window is less than sshthresh, which | start anytime the congestion window is less than ssthresh, which | |||
| typically only occurs after an RTO. While in slow start, QUIC | typically only occurs after an RTO. While in slow start, QUIC | |||
| increases the congestion window by the number of bytes acknowledged | increases the congestion window by the number of bytes acknowledged | |||
| when each ack is processed. | when each ack is processed. | |||
| 4.3. Congestion Avoidance | 5.3. Congestion Avoidance | |||
| Slow start exits to congestion avoidance. Congestion avoidance in | Slow start exits to congestion avoidance. Congestion avoidance in | |||
| NewReno uses an additive increase multiplicative decrease (AIMD) | NewReno uses an additive increase multiplicative decrease (AIMD) | |||
| approach that increases the congestion window by one MSS of bytes per | approach that increases the congestion window by one maximum packet | |||
| congestion window acknowledged. When a loss is detected, NewReno | size per congestion window acknowledged. When a loss is detected, | |||
| halves the congestion window and sets the slow start threshold to the | NewReno halves the congestion window and sets the slow start | |||
| new congestion window. | threshold to the new congestion window. | |||
| 4.4. Recovery Period | 5.4. Recovery Period | |||
| Recovery is a period of time beginning with detection of a lost | Recovery is a period of time beginning with detection of a lost | |||
| packet or an increase in the ECN-CE counter. Because QUIC | packet or an increase in the ECN-CE counter. Because QUIC | |||
| retransmits stream data and control frames, not packets, it defines | retransmits stream data and control frames, not packets, it defines | |||
| the end of recovery as a packet sent after the start of recovery | the end of recovery as a packet sent after the start of recovery | |||
| being acknowledged. This is slightly different from TCP's definition | being acknowledged. This is slightly different from TCP's definition | |||
| of recovery, which ends when the lost packet that started recovery is | of recovery, which ends when the lost packet that started recovery is | |||
| acknowledged. | acknowledged. | |||
| The recovery period limits congestion window reduction to once per | The recovery period limits congestion window reduction to once per | |||
| round trip. During recovery, the congestion window remains unchanged | round trip. During recovery, the congestion window remains unchanged | |||
| irrespective of new losses or increases in the ECN-CE counter. | irrespective of new losses or increases in the ECN-CE counter. | |||
| 4.5. Tail Loss Probe | 5.5. Tail Loss Probe | |||
| A TLP packet MUST NOT be blocked by the sender's congestion | A TLP packet MUST NOT be blocked by the sender's congestion | |||
| controller. The sender MUST however count these bytes as additional | controller. The sender MUST however count these bytes as additional | |||
| bytes-in-flight, since a TLP adds network load without establishing | bytes-in-flight, since a TLP adds network load without establishing | |||
| packet loss. | packet loss. | |||
| Acknowledgement or loss of tail loss probes are treated like any | Acknowledgement or loss of tail loss probes are treated like any | |||
| other packet. | other packet. | |||
| 4.6. Retransmission Timeout | 5.6. Retransmission Timeout | |||
| When retransmissions are sent due to a retransmission timeout alarm, | When retransmissions are sent due to a retransmission timeout timer, | |||
| no change is made to the congestion window until the next | no change is made to the congestion window until the next | |||
| acknowledgement arrives. The retransmission timeout is considered | acknowledgement arrives. The retransmission timeout is considered | |||
| spurious when this acknowledgement acknowledges packets sent prior to | spurious when this acknowledgement acknowledges packets sent prior to | |||
| the first retransmission timeout. The retransmission timeout is | the first retransmission timeout. The retransmission timeout is | |||
| considered valid when this acknowledgement acknowledges no packets | considered valid when this acknowledgement acknowledges no packets | |||
| sent prior to the first retransmission timeout. In this case, the | sent prior to the first retransmission timeout. In this case, the | |||
| congestion window MUST be reduced to the minimum congestion window | congestion window MUST be reduced to the minimum congestion window | |||
| and slow start is re-entered. | and slow start is re-entered. | |||
| 4.7. Pacing | 5.7. Pacing | |||
| This document does not specify a pacer, but it is RECOMMENDED that a | This document does not specify a pacer, but it is RECOMMENDED that a | |||
| sender pace sending of all retransmittable packets based on input | sender pace sending of all in-flight packets based on input from the | |||
| from the congestion controller. For example, a pacer might | congestion controller. For example, a pacer might distribute the | |||
| distribute the congestion window over the SRTT when used with a | congestion window over the SRTT when used with a window-based | |||
| window-based controller, and a pacer might use the rate estimate of a | controller, and a pacer might use the rate estimate of a rate-based | |||
| rate-based controller. | controller. | |||
| An implementation should take care to architect its congestion | An implementation should take care to architect its congestion | |||
| controller to work well with a pacer. For instance, a pacer might | controller to work well with a pacer. For instance, a pacer might | |||
| wrap the congestion controller and control the availability of the | wrap the congestion controller and control the availability of the | |||
| congestion window, or a pacer might pace out packets handed to it by | congestion window, or a pacer might pace out packets handed to it by | |||
| the congestion controller. Timely delivery of ACK frames is | the congestion controller. Timely delivery of ACK frames is | |||
| important for efficient loss recovery. Packets containing only ACK | important for efficient loss recovery. Packets containing only ACK | |||
| frames should therefore not be paced, to avoid delaying their | frames should therefore not be paced, to avoid delaying their | |||
| delivery to the peer. | delivery to the peer. | |||
| As an example of a well-known and publicly available implementation | As an example of a well-known and publicly available implementation | |||
| of a flow pacer, implementers are referred to the Fair Queue packet | of a flow pacer, implementers are referred to the Fair Queue packet | |||
| scheduler (fq qdisc) in Linux (3.11 onwards). | scheduler (fq qdisc) in Linux (3.11 onwards). | |||
| 4.8. Pseudocode | 5.8. Pseudocode | |||
| 4.8.1. Constants of interest | 5.8.1. Constants of interest | |||
| Constants used in congestion control are based on a combination of | Constants used in congestion control are based on a combination of | |||
| RFCs, papers, and common practice. Some may need to be changed or | RFCs, papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kInitialMss (RECOMMENDED 1460 bytes): The max packet size is used | kMaxDatagramSize (RECOMMENDED 1200 bytes): The sender's maximum | |||
| for calculating initial and minimum congestion windows. | payload size. Does not include UDP or IP overhead. The max | |||
| packet size is used for calculating initial and minimum congestion | ||||
| windows. | ||||
| kInitialWindow (RECOMMENDED 10 * kInitialMss): Limit on the initial | kInitialWindow (RECOMMENDED min(10 * kMaxDatagramSize, | |||
| amount of outstanding data in bytes. | ||||
| kMinimumWindow (RECOMMENDED 2 * kInitialMss): Minimum congestion | max(2* kMaxDatagramSize, 14600))): | |||
| window in bytes. | Default limit on the initial amount of outstanding data in bytes. | |||
| Taken from [RFC6928]. | ||||
| kMinimumWindow (RECOMMENDED 2 * kMaxDatagramSize): Minimum | ||||
| congestion window in bytes. | ||||
| kLossReductionFactor (RECOMMENDED 0.5): Reduction in congestion | kLossReductionFactor (RECOMMENDED 0.5): Reduction in congestion | |||
| window when a new loss event is detected. | window when a new loss event is detected. | |||
| 4.8.2. Variables of interest | 5.8.2. Variables of interest | |||
| Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
| described in this section. | described in this section. | |||
| ecn_ce_counter: The highest value reported for the ECN-CE counter by | ecn_ce_counter: The highest value reported for the ECN-CE counter by | |||
| the peer in an ACK_ECN frame. This variable is used to detect | the peer in an ACK_ECN frame. This variable is used to detect | |||
| increases in the reported ECN-CE counter. | increases in the reported ECN-CE counter. | |||
| bytes_in_flight: The sum of the size in bytes of all sent packets | bytes_in_flight: The sum of the size in bytes of all sent packets | |||
| that contain at least one retransmittable frame, and have not been | that contain at least one retransmittable or PADDING frame, and | |||
| acked or declared lost. The size does not include IP or UDP | have not been acked or declared lost. The size does not include | |||
| IP or UDP overhead, but does include the QUIC header and AEAD | ||||
| overhead. Packets only containing ACK frames do not count towards | overhead. Packets only containing ACK frames do not count towards | |||
| bytes_in_flight to ensure congestion control does not impede | bytes_in_flight to ensure congestion control does not impede | |||
| congestion feedback. | congestion feedback. | |||
| congestion_window: Maximum number of bytes-in-flight that may be | congestion_window: Maximum number of bytes-in-flight that may be | |||
| sent. | sent. | |||
| end_of_recovery: The largest packet number sent when QUIC detects a | end_of_recovery: The largest packet number sent when QUIC detects a | |||
| loss. When a larger packet is acknowledged, QUIC exits recovery. | loss. When a larger packet is acknowledged, QUIC exits recovery. | |||
| ssthresh: Slow start threshold in bytes. When the congestion window | ssthresh: Slow start threshold in bytes. When the congestion window | |||
| is below ssthresh, the mode is slow start and the window grows by | is below ssthresh, the mode is slow start and the window grows by | |||
| the number of bytes acknowledged. | the number of bytes acknowledged. | |||
| 4.8.3. Initialization | 5.8.3. Initialization | |||
| At the beginning of the connection, initialize the congestion control | At the beginning of the connection, initialize the congestion control | |||
| variables as follows: | variables as follows: | |||
| congestion_window = kInitialWindow | congestion_window = kInitialWindow | |||
| bytes_in_flight = 0 | bytes_in_flight = 0 | |||
| end_of_recovery = 0 | end_of_recovery = 0 | |||
| ssthresh = infinite | ssthresh = infinite | |||
| ecn_ce_counter = 0 | ecn_ce_counter = 0 | |||
| 4.8.4. On Packet Sent | 5.8.4. On Packet Sent | |||
| Whenever a packet is sent, and it contains non-ACK frames, the packet | Whenever a packet is sent, and it contains non-ACK frames, the packet | |||
| increases bytes_in_flight. | increases bytes_in_flight. | |||
| OnPacketSentCC(bytes_sent): | OnPacketSentCC(bytes_sent): | |||
| bytes_in_flight += bytes_sent | bytes_in_flight += bytes_sent | |||
| 4.8.5. On Packet Acknowledgement | 5.8.5. On Packet Acknowledgement | |||
| Invoked from loss detection's OnPacketAcked and is supplied with | Invoked from loss detection's OnPacketAcked and is supplied with | |||
| acked_packet from sent_packets. | acked_packet from sent_packets. | |||
| InRecovery(packet_number): | InRecovery(packet_number): | |||
| return packet_number <= end_of_recovery | return packet_number <= end_of_recovery | |||
| OnPacketAckedCC(acked_packet): | OnPacketAckedCC(acked_packet): | |||
| // Remove from bytes_in_flight. | // Remove from bytes_in_flight. | |||
| bytes_in_flight -= acked_packet.bytes | bytes_in_flight -= acked_packet.bytes | |||
| if (InRecovery(acked_packet.packet_number)): | if (InRecovery(acked_packet.packet_number)): | |||
| // Do not increase congestion window in recovery period. | // Do not increase congestion window in recovery period. | |||
| return | return | |||
| if (congestion_window < ssthresh): | if (congestion_window < ssthresh): | |||
| // Slow start. | // Slow start. | |||
| congestion_window += acked_packet.bytes | congestion_window += acked_packet.bytes | |||
| else: | else: | |||
| // Congestion avoidance. | // Congestion avoidance. | |||
| congestion_window += | congestion_window += kMaxDatagramSize * acked_packet.bytes | |||
| kInitialMss * acked_packet.bytes / congestion_window | / congestion_window | |||
| 4.8.6. On New Congestion Event | 5.8.6. On New Congestion Event | |||
| Invoked from ProcessECN and OnPacketLost when a new congestion event | Invoked from ProcessECN and OnPacketsLost when a new congestion event | |||
| is detected. Starts a new recovery period and reduces the congestion | is detected. Starts a new recovery period and reduces the congestion | |||
| window. | window. | |||
| CongestionEvent(packet_number): | CongestionEvent(packet_number): | |||
| // Start a new congestion event if packet_number | // Start a new congestion event if packet_number | |||
| // is larger than the end of the previous recovery epoch. | // is larger than the end of the previous recovery epoch. | |||
| if (!InRecovery(packet_number)): | if (!InRecovery(packet_number)): | |||
| end_of_recovery = largest_sent_packet | end_of_recovery = largest_sent_packet | |||
| congestion_window *= kMarkReductionFactor | congestion_window *= kLossReductionFactor | |||
| congestion_window = max(congestion_window, kMinimumWindow) | congestion_window = max(congestion_window, kMinimumWindow) | |||
| ssthresh = congestion_window | ||||
| 4.8.7. Process ECN Information | 5.8.7. Process ECN Information | |||
| Invoked when an ACK_ECN frame is received from the peer. | Invoked when an ACK_ECN frame is received from the peer. | |||
| ProcessECN(ack): | ProcessECN(ack): | |||
| // If the ECN-CE counter reported by the peer has increased, | // If the ECN-CE counter reported by the peer has increased, | |||
| // this could be a new congestion event. | // this could be a new congestion event. | |||
| if (ack.ce_counter > ecn_ce_counter): | if (ack.ce_counter > ecn_ce_counter): | |||
| ecn_ce_counter = ack.ce_counter | ecn_ce_counter = ack.ce_counter | |||
| // Start a new congestion event if the last acknowledged | // Start a new congestion event if the last acknowledged | |||
| // packet is past the end of the previous recovery epoch. | // packet is past the end of the previous recovery epoch. | |||
| CongestionEvent(ack.largest_acked_packet) | CongestionEvent(ack.largest_acked_packet) | |||
| 4.8.8. On Packets Lost | 5.8.8. On Packets Lost | |||
| Invoked by loss detection from DetectLostPackets when new packets are | Invoked by loss detection from DetectLostPackets when new packets are | |||
| detected lost. | detected lost. | |||
| OnPacketsLost(lost_packets): | OnPacketsLost(lost_packets): | |||
| // Remove lost packets from bytes_in_flight. | // Remove lost packets from bytes_in_flight. | |||
| for (lost_packet : lost_packets): | for (lost_packet : lost_packets): | |||
| bytes_in_flight -= lost_packet.bytes | bytes_in_flight -= lost_packet.bytes | |||
| largest_lost_packet = lost_packets.last() | largest_lost_packet = lost_packets.last() | |||
| // Start a new congestion epoch if the last lost packet | // Start a new congestion epoch if the last lost packet | |||
| // is past the end of the previous recovery epoch. | // is past the end of the previous recovery epoch. | |||
| CongestionEvent(largest_lost_packet.packet_number) | CongestionEvent(largest_lost_packet.packet_number) | |||
| 4.8.9. On Retransmission Timeout Verified | 5.8.9. On Retransmission Timeout Verified | |||
| QUIC decreases the congestion window to the minimum value once the | QUIC decreases the congestion window to the minimum value once the | |||
| retransmission timeout has been verified. | retransmission timeout has been verified and removes any packets sent | |||
| before the newly acknowledged RTO packet. | ||||
| OnRetransmissionTimeoutVerified() | OnRetransmissionTimeoutVerified(packet_number) | |||
| congestion_window = kMinimumWindow | congestion_window = kMinimumWindow | |||
| // Declare all packets prior to packet_number lost. | ||||
| for (sent_packet: sent_packets): | ||||
| if (sent_packet.packet_number < packet_number): | ||||
| bytes_in_flight -= lost_packet.bytes | ||||
| sent_packets.remove(sent_packet.packet_number) | ||||
| 5. IANA Considerations | 6. Security Considerations | |||
| 6.1. Congestion Signals | ||||
| Congestion control fundamentally involves the consumption of signals | ||||
| - both loss and ECN codepoints - from unauthenticated entities. On- | ||||
| path attackers can spoof or alter these signals. An attacker can | ||||
| cause endpoints to reduce their sending rate by dropping packets, or | ||||
| alter send rate by changing ECN codepoints. | ||||
| 6.2. Traffic Analysis | ||||
| Packets that carry only ACK frames can be heuristically identified by | ||||
| observing packet size. Acknowledgement patterns may expose | ||||
| information about link characteristics or application behavior. | ||||
| Endpoints can use PADDING frames or bundle acknowledgments with other | ||||
| frames to reduce leaked information. | ||||
| 6.3. Misreporting ECN Markings | ||||
| A receiver can misreport ECN markings to alter the congestion | ||||
| response of a sender. Suppressing reports of ECN-CE markings could | ||||
| cause a sender to increase their send rate. This increase could | ||||
| result in congestion and loss. | ||||
| A sender MAY attempt to detect suppression of reports by marking | ||||
| occasional packets that they send with ECN-CE. If a packet marked | ||||
| with ECN-CE is not reported as having been marked when the packet is | ||||
| acknowledged, the sender SHOULD then disable ECN for that path. | ||||
| Reporting additional ECN-CE markings will cause a sender to reduce | ||||
| their sending rate, which is similar in effect to advertising reduced | ||||
| connection flow control limits and so no advantage is gained by doing | ||||
| so. | ||||
| Endpoints choose the congestion controller that they use. Though | ||||
| congestion controllers generally treat reports of ECN-CE markings as | ||||
| equivalent to loss [RFC8311], the exact response for each controller | ||||
| could be different. Failure to correctly respond to information | ||||
| about ECN markings is therefore difficult to detect. | ||||
| 7. IANA Considerations | ||||
| This document has no IANA actions. Yet. | This document has no IANA actions. Yet. | |||
| 6. References | 8. References | |||
| 6.1. Normative References | 8.1. Normative References | |||
| [QUIC-TRANSPORT] | [QUIC-TRANSPORT] | |||
| Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| Multiplexed and Secure Transport", draft-ietf-quic- | Multiplexed and Secure Transport", draft-ietf-quic- | |||
| transport-13 (work in progress), June 2018. | transport-14 (work in progress), August 2018. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
| Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
| DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
| 6.2. Informative References | 8.2. Informative References | |||
| [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
| Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
| 2003, <https://www.rfc-editor.org/info/rfc3465>. | 2003, <https://www.rfc-editor.org/info/rfc3465>. | |||
| [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | |||
| "Improving the Robustness of TCP to Non-Congestion | "Improving the Robustness of TCP to Non-Congestion | |||
| Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | |||
| <https://www.rfc-editor.org/info/rfc4653>. | <https://www.rfc-editor.org/info/rfc4653>. | |||
| skipping to change at page 30, line 27 ¶ | skipping to change at page 31, line 27 ¶ | |||
| NewReno Modification to TCP's Fast Recovery Algorithm", | NewReno Modification to TCP's Fast Recovery Algorithm", | |||
| RFC 6582, DOI 10.17487/RFC6582, April 2012, | RFC 6582, DOI 10.17487/RFC6582, April 2012, | |||
| <https://www.rfc-editor.org/info/rfc6582>. | <https://www.rfc-editor.org/info/rfc6582>. | |||
| [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., | [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., | |||
| and Y. Nishida, "A Conservative Loss Recovery Algorithm | and Y. Nishida, "A Conservative Loss Recovery Algorithm | |||
| Based on Selective Acknowledgment (SACK) for TCP", | Based on Selective Acknowledgment (SACK) for TCP", | |||
| RFC 6675, DOI 10.17487/RFC6675, August 2012, | RFC 6675, DOI 10.17487/RFC6675, August 2012, | |||
| <https://www.rfc-editor.org/info/rfc6675>. | <https://www.rfc-editor.org/info/rfc6675>. | |||
| [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | ||||
| "Increasing TCP's Initial Window", RFC 6928, | ||||
| DOI 10.17487/RFC6928, April 2013, | ||||
| <https://www.rfc-editor.org/info/rfc6928>. | ||||
| [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | |||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | |||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | |||
| in progress), February 2013. | in progress), February 2013. | |||
| 6.3. URIs | 8.3. URIs | |||
| [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | |||
| [2] https://github.com/quicwg | [2] https://github.com/quicwg | |||
| [3] https://github.com/quicwg/base-drafts/labels/-recovery | [3] https://github.com/quicwg/base-drafts/labels/-recovery | |||
| Appendix A. Change Log | Appendix A. Change Log | |||
| *RFC Editor's Note:* Please remove this section prior to | *RFC Editor's Note:* Please remove this section prior to | |||
| publication of a final version of this document. | publication of a final version of this document. | |||
| A.1. Since draft-ietf-quic-recovery-12 | A.1. Since draft-ietf-quic-recovery-13 | |||
| o Corrected the lack of ssthresh reduction in CongestionEvent | ||||
| pseudocode (#1598) | ||||
| o Considerations for ECN spoofing (#1426, #1626) | ||||
| o Clarifications for PADDING and congestion control (#837, #838, | ||||
| #1517, #1531, #1540) | ||||
| o Reduce early retransmission timer to RTT/8 (#945, #1581) | ||||
| o Packets are declared lost after an RTO is verified (#935, #1582) | ||||
| A.2. Since draft-ietf-quic-recovery-12 | ||||
| o Changes to manage separate packet number spaces and encryption | o Changes to manage separate packet number spaces and encryption | |||
| levels (#1190, #1242, #1413, #1450) | levels (#1190, #1242, #1413, #1450) | |||
| o Added ECN feedback mechanisms and handling; new ACK_ECN frame | o Added ECN feedback mechanisms and handling; new ACK_ECN frame | |||
| (#804, #805, #1372) | (#804, #805, #1372) | |||
| A.2. Since draft-ietf-quic-recovery-11 | A.3. Since draft-ietf-quic-recovery-11 | |||
| No significant changes. | No significant changes. | |||
| A.3. Since draft-ietf-quic-recovery-10 | A.4. Since draft-ietf-quic-recovery-10 | |||
| o Improved text on ack generation (#1139, #1159) | o Improved text on ack generation (#1139, #1159) | |||
| o Make references to TCP recovery mechanisms informational (#1195) | o Make references to TCP recovery mechanisms informational (#1195) | |||
| o Define time_of_last_sent_handshake_packet (#1171) | o Define time_of_last_sent_handshake_packet (#1171) | |||
| o Added signal from TLS the data it includes needs to be sent in a | o Added signal from TLS the data it includes needs to be sent in a | |||
| Retry packet (#1061, #1199) | Retry packet (#1061, #1199) | |||
| o Minimum RTT (min_rtt) is initialized with an infinite value | o Minimum RTT (min_rtt) is initialized with an infinite value | |||
| (#1169) | (#1169) | |||
| A.4. Since draft-ietf-quic-recovery-09 | A.5. Since draft-ietf-quic-recovery-09 | |||
| No significant changes. | No significant changes. | |||
| A.5. Since draft-ietf-quic-recovery-08 | A.6. Since draft-ietf-quic-recovery-08 | |||
| o Clarified pacing and RTO (#967, #977) | o Clarified pacing and RTO (#967, #977) | |||
| A.6. Since draft-ietf-quic-recovery-07 | A.7. Since draft-ietf-quic-recovery-07 | |||
| o Include Ack Delay in RTO(and TLP) computations (#981) | o Include Ack Delay in RTO(and TLP) computations (#981) | |||
| o Ack Delay in SRTT computation (#961) | o Ack Delay in SRTT computation (#961) | |||
| o Default RTT and Slow Start (#590) | o Default RTT and Slow Start (#590) | |||
| o Many editorial fixes. | o Many editorial fixes. | |||
| A.7. Since draft-ietf-quic-recovery-06 | A.8. Since draft-ietf-quic-recovery-06 | |||
| No significant changes. | No significant changes. | |||
| A.8. Since draft-ietf-quic-recovery-05 | A.9. Since draft-ietf-quic-recovery-05 | |||
| o Add more congestion control text (#776) | o Add more congestion control text (#776) | |||
| A.9. Since draft-ietf-quic-recovery-04 | A.10. Since draft-ietf-quic-recovery-04 | |||
| No significant changes. | No significant changes. | |||
| A.10. Since draft-ietf-quic-recovery-03 | A.11. Since draft-ietf-quic-recovery-03 | |||
| No significant changes. | No significant changes. | |||
| A.11. Since draft-ietf-quic-recovery-02 | A.12. Since draft-ietf-quic-recovery-02 | |||
| o Integrate F-RTO (#544, #409) | o Integrate F-RTO (#544, #409) | |||
| o Add congestion control (#545, #395) | o Add congestion control (#545, #395) | |||
| o Require connection abort if a skipped packet was acknowledged | o Require connection abort if a skipped packet was acknowledged | |||
| (#415) | (#415) | |||
| o Simplify RTO calculations (#142, #417) | o Simplify RTO calculations (#142, #417) | |||
| A.12. Since draft-ietf-quic-recovery-01 | A.13. Since draft-ietf-quic-recovery-01 | |||
| o Overview added to loss detection | o Overview added to loss detection | |||
| o Changes initial default RTT to 100ms | o Changes initial default RTT to 100ms | |||
| o Added time-based loss detection and fixes early retransmit | o Added time-based loss detection and fixes early retransmit | |||
| o Clarified loss recovery for handshake packets | o Clarified loss recovery for handshake packets | |||
| o Fixed references and made TCP references informative | o Fixed references and made TCP references informative | |||
| A.13. Since draft-ietf-quic-recovery-00 | A.14. Since draft-ietf-quic-recovery-00 | |||
| o Improved description of constants and ACK behavior | o Improved description of constants and ACK behavior | |||
| A.14. Since draft-iyengar-quic-loss-recovery-01 | A.15. Since draft-iyengar-quic-loss-recovery-01 | |||
| o Adopted as base for draft-ietf-quic-recovery | o Adopted as base for draft-ietf-quic-recovery | |||
| o Updated authors/editors list | o Updated authors/editors list | |||
| o Added table of contents | o Added table of contents | |||
| Acknowledgments | Acknowledgments | |||
| Authors' Addresses | Authors' Addresses | |||
| Jana Iyengar (editor) | Jana Iyengar (editor) | |||
| Fastly | Fastly | |||
| Email: jri.ietf@gmail.com | Email: jri.ietf@gmail.com | |||
| Ian Swett (editor) | Ian Swett (editor) | |||
| End of changes. 181 change blocks. | ||||
| 362 lines changed or deleted | 478 lines changed or added | |||
This html diff was produced by rfcdiff 1.45. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||