| draft-ietf-quic-recovery-16.txt | draft-ietf-quic-recovery-17.txt | |||
|---|---|---|---|---|
| QUIC J. Iyengar, Ed. | QUIC J. Iyengar, Ed. | |||
| Internet-Draft Fastly | Internet-Draft Fastly | |||
| Intended status: Standards Track I. Swett, Ed. | Intended status: Standards Track I. Swett, Ed. | |||
| Expires: April 26, 2019 Google | Expires: June 21, 2019 Google | |||
| October 23, 2018 | December 18, 2018 | |||
| QUIC Loss Detection and Congestion Control | QUIC Loss Detection and Congestion Control | |||
| draft-ietf-quic-recovery-16 | draft-ietf-quic-recovery-17 | |||
| Abstract | Abstract | |||
| This document describes loss detection and congestion control | This document describes loss detection and congestion control | |||
| mechanisms for QUIC. | mechanisms for QUIC. | |||
| Note to Readers | Note to Readers | |||
| Discussion of this draft takes place on the QUIC working group | Discussion of this draft takes place on the QUIC working group | |||
| mailing list (quic@ietf.org), which is archived at | mailing list (quic@ietf.org), which is archived at | |||
| skipping to change at page 1, line 42 | skipping to change at page 1, line 42 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on April 26, 2019. | This Internet-Draft will expire on June 21, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 26 | skipping to change at page 2, line 26 | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 | 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 | |||
| 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | |||
| 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 | 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 | |||
| 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5 | 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5 | |||
| 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 | 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 | |||
| 3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 | 3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6 | 3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6 | 3.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6 | |||
| 4. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 7 | 4. Generating Acknowledgements . . . . . . . . . . . . . . . . . 7 | |||
| 4.1. Computing the RTT estimate . . . . . . . . . . . . . . . 7 | 4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . . . 7 | |||
| 4.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 7 | 4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 4.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 7 | 4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . . . 8 | |||
| 4.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 8 | 5. Computing the RTT estimate . . . . . . . . . . . . . . . . . 8 | |||
| 4.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 9 | 6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 4.3.1. Crypto Retransmission Timeout . . . . . . . . . . . . 9 | 6.1. Acknowledgement-based Detection . . . . . . . . . . . . . 9 | |||
| 4.3.2. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 10 | 6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 9 | |||
| 4.3.3. Retransmission Timeout . . . . . . . . . . . . . . . 11 | 6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 10 | |||
| 4.4. Generating Acknowledgements . . . . . . . . . . . . . . . 12 | 6.2. Timeout Loss Detection . . . . . . . . . . . . . . . . . 10 | |||
| 4.4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . 13 | 6.2.1. Crypto Retransmission Timeout . . . . . . . . . . . . 10 | |||
| 4.4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . 13 | 6.2.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . 12 | |||
| 4.4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . 13 | 6.3. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 13 | |||
| 4.5. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.3.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 14 | |||
| 4.5.1. Constants of interest . . . . . . . . . . . . . . . . 14 | 6.4. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 4.5.2. Variables of interest . . . . . . . . . . . . . . . . 14 | 6.4.1. Constants of interest . . . . . . . . . . . . . . . . 14 | |||
| 4.5.3. Initialization . . . . . . . . . . . . . . . . . . . 16 | 6.4.2. Variables of interest . . . . . . . . . . . . . . . . 15 | |||
| 4.5.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16 | 6.4.3. Initialization . . . . . . . . . . . . . . . . . . . 16 | |||
| 4.5.5. On Receiving an Acknowledgment . . . . . . . . . . . 17 | 6.4.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16 | |||
| 4.5.6. On Packet Acknowledgment . . . . . . . . . . . . . . 19 | 6.4.5. On Receiving an Acknowledgment . . . . . . . . . . . 16 | |||
| 4.5.7. Setting the Loss Detection Timer . . . . . . . . . . 19 | 6.4.6. On Packet Acknowledgment . . . . . . . . . . . . . . 18 | |||
| 4.5.8. On Timeout . . . . . . . . . . . . . . . . . . . . . 20 | 6.4.7. Setting the Loss Detection Timer . . . . . . . . . . 18 | |||
| 4.5.9. Detecting Lost Packets . . . . . . . . . . . . . . . 21 | 6.4.8. On Timeout . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 22 | 6.4.9. Detecting Lost Packets . . . . . . . . . . . . . . . 20 | |||
| 5. Congestion Control . . . . . . . . . . . . . . . . . . . . . 22 | 6.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 5.1. Explicit Congestion Notification . . . . . . . . . . . . 23 | 7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 5.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 23 | 7.1. Explicit Congestion Notification . . . . . . . . . . . . 22 | |||
| 5.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 23 | 7.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 5.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 23 | 7.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 22 | |||
| 5.5. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 24 | 7.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 5.6. Retransmission Timeout . . . . . . . . . . . . . . . . . 24 | 7.5. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 5.7. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 7.6. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 5.8. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 25 | 7.7. Sending data after an idle period . . . . . . . . . . . . 24 | |||
| 5.8.1. Constants of interest . . . . . . . . . . . . . . . . 25 | 7.8. Discarding Packet Number Space State . . . . . . . . . . 24 | |||
| 5.8.2. Variables of interest . . . . . . . . . . . . . . . . 25 | 7.9. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 5.8.3. Initialization . . . . . . . . . . . . . . . . . . . 26 | 7.9.1. Constants of interest . . . . . . . . . . . . . . . . 24 | |||
| 5.8.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 26 | 7.9.2. Variables of interest . . . . . . . . . . . . . . . . 25 | |||
| 5.8.5. On Packet Acknowledgement . . . . . . . . . . . . . . 26 | 7.9.3. Initialization . . . . . . . . . . . . . . . . . . . 26 | |||
| 5.8.6. On New Congestion Event . . . . . . . . . . . . . . . 27 | 7.9.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 26 | |||
| 5.8.7. Process ECN Information . . . . . . . . . . . . . . . 27 | 7.9.5. On Packet Acknowledgement . . . . . . . . . . . . . . 26 | |||
| 5.8.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 27 | 7.9.6. On New Congestion Event . . . . . . . . . . . . . . . 26 | |||
| 5.8.9. On Retransmission Timeout Verified . . . . . . . . . 28 | 7.9.7. Process ECN Information . . . . . . . . . . . . . . . 27 | |||
| 6. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | 7.9.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 27 | |||
| 6.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 28 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 27 | |||
| 6.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 28 | 8.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 28 | |||
| 6.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 28 | 8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 28 | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 | 8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 28 | |||
| 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 8.1. Normative References . . . . . . . . . . . . . . . . . . 29 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 8.2. Informative References . . . . . . . . . . . . . . . . . 29 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 29 | |||
| 8.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 10.2. Informative References . . . . . . . . . . . . . . . . . 29 | |||
| | 10.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 31 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 31 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 31 | |||
| A.1. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 31 | A.1. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 31 | |||
| A.2. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 31 | A.2. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 32 | |||
| A.3. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 31 | A.3. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 32 | |||
| A.4. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 31 | A.4. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 32 | |||
| A.5. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 31 | A.5. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 32 | |||
| A.6. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 32 | A.6. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 32 | |||
| A.7. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 32 | A.7. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 33 | |||
| A.8. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 32 | A.8. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 33 | |||
| A.9. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 32 | A.9. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 33 | |||
| A.10. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 32 | A.10. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 33 | |||
| A.11. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 32 | A.11. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 33 | |||
| A.12. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 32 | A.12. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 33 | |||
| A.13. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 32 | A.13. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 33 | |||
| A.14. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 33 | A.14. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 33 | |||
| A.15. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 33 | A.15. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 34 | |||
| A.16. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 33 | A.16. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 34 | |||
| Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 33 | A.17. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 34 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 1. Introduction | 1. Introduction | |||
| QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | |||
| on decades of transport and security experience, and implements | on decades of transport and security experience, and implements | |||
| mechanisms that make it attractive as a modern general-purpose | mechanisms that make it attractive as a modern general-purpose | |||
| transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | |||
| QUIC implements the spirit of known TCP loss recovery mechanisms, | QUIC implements the spirit of known TCP loss recovery mechanisms, | |||
| described in RFCs, various Internet-drafts, and also those prevalent | described in RFCs, various Internet-drafts, and also those prevalent | |||
| skipping to change at page 4, line 29 | skipping to change at page 4, line 29 | |||
| 2. Conventions and Definitions | 2. Conventions and Definitions | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| Definitions of terms that are used in this document: | Definitions of terms that are used in this document: | |||
| ACK-only: Any packet containing only an ACK frame. | ACK-only: Any packet containing only one or more ACK frame(s). | |||
| In-flight: Packets are considered in-flight when they have been sent | In-flight: Packets are considered in-flight when they have been sent | |||
| and neither acknowledged nor declared lost, and they are not ACK- | and neither acknowledged nor declared lost, and they are not ACK- | |||
| only. | only. | |||
| Retransmittable Frames: All frames besides ACK or PADDING are | Ack-eliciting Frames: All frames besides ACK or PADDING are | |||
| considered retransmittable. | considered ack-eliciting. | |||
| Retransmittable Packets: Packets that contain retransmittable frames | Ack-eliciting Packets: Packets that contain ack-eliciting frames | |||
| elicit an ACK from the receiver and are called retransmittable | elicit an ACK from the receiver within the maximum ack delay and | |||
| packets. | are called ack-eliciting packets. | |||
| Crypto Packets: Packets containing CRYPTO data sent in Initial or | Crypto Packets: Packets containing CRYPTO data sent in Initial or | |||
| Handshake packets. | Handshake packets. | |||
| 3. Design of the QUIC Transmission Machinery | 3. Design of the QUIC Transmission Machinery | |||
| All transmissions in QUIC are sent with a packet-level header, which | All transmissions in QUIC are sent with a packet-level header, which | |||
| indicates the encryption level and includes a packet sequence number | indicates the encryption level and includes a packet sequence number | |||
| (referred to below as a packet number). The encryption level | (referred to below as a packet number). The encryption level | |||
| indicates the packet number space, as described in [QUIC-TRANSPORT]. | indicates the packet number space, as described in [QUIC-TRANSPORT]. | |||
| skipping to change at page 5, line 18 | skipping to change at page 5, line 18 | |||
| transmissions and retransmissions and eliminates significant | transmissions and retransmissions and eliminates significant | |||
| complexity from QUIC's interpretation of TCP loss detection | complexity from QUIC's interpretation of TCP loss detection | |||
| mechanisms. | mechanisms. | |||
| QUIC packets can contain multiple frames of different types. The | QUIC packets can contain multiple frames of different types. The | |||
| recovery mechanisms ensure that data and frames that need reliable | recovery mechanisms ensure that data and frames that need reliable | |||
| delivery are acknowledged or declared lost and sent in new packets as | delivery are acknowledged or declared lost and sent in new packets as | |||
| necessary. The types of frames contained in a packet affect recovery | necessary. The types of frames contained in a packet affect recovery | |||
| and congestion control logic: | and congestion control logic: | |||
| o All packets are acknowledged, though packets that contain only ACK | o All packets are acknowledged, though packets that contain no ack- | |||
| and PADDING frames are not acknowledged immediately. | eliciting frames are only acknowledged along with ack-eliciting | |||
| | packets. | |||
| o Long header packets that contain CRYPTO frames are critical to the | o Long header packets that contain CRYPTO frames are critical to the | |||
| performance of the QUIC handshake and use shorter timers for | performance of the QUIC handshake and use shorter timers for | |||
| acknowledgement and retransmission. | acknowledgement and retransmission. | |||
| o Packets that contain only ACK frames do not count toward | o Packets that contain only ACK frames do not count toward | |||
| congestion control limits and are not considered in-flight. Note | congestion control limits and are not considered in-flight. Note | |||
| that this means PADDING frames cause packets to contribute toward | that this means PADDING frames cause packets to contribute toward | |||
| bytes in flight without directly causing an acknowledgment to be | bytes in flight without directly causing an acknowledgment to be | |||
| sent. | sent. | |||
| skipping to change at page 6, line 7 | skipping to change at page 6, line 7 | |||
| QUIC uses separate packet number spaces for each encryption level, | QUIC uses separate packet number spaces for each encryption level, | |||
| except 0-RTT and all generations of 1-RTT keys use the same packet | except 0-RTT and all generations of 1-RTT keys use the same packet | |||
| number space. Separate packet number spaces ensure acknowledgement | number space. Separate packet number spaces ensure acknowledgement | |||
| of packets sent with one level of encryption will not cause spurious | of packets sent with one level of encryption will not cause spurious | |||
| retransmission of packets sent with a different encryption level. | retransmission of packets sent with a different encryption level. | |||
| Congestion control and RTT measurement are unified across packet | Congestion control and RTT measurement are unified across packet | |||
| number spaces. | number spaces. | |||
| 3.1.2. Monotonically Increasing Packet Numbers | 3.1.2. Monotonically Increasing Packet Numbers | |||
| TCP conflates transmission sequence number at the sender with | TCP conflates transmission order at the sender with delivery order at | |||
| delivery sequence number at the receiver, which results in | the receiver, which results in retransmissions of the same data | |||
| retransmissions of the same data carrying the same sequence number, | carrying the same sequence number, and consequently leads to | |||
| and consequently to problems caused by "retransmission ambiguity". | "retransmission ambiguity". QUIC separates the two: QUIC uses a | |||
| QUIC separates the two: QUIC uses a packet number for transmissions, | packet number to indicate transmission order, and any application | |||
| and any application data is sent in one or more streams, with | data is sent in one or more streams, with delivery order determined | |||
| delivery order determined by stream offsets encoded within STREAM | by stream offsets encoded within STREAM frames. | |||
| frames. | ||||
| QUIC's packet number is strictly increasing, and directly encodes | QUIC's packet number is strictly increasing within a packet number | |||
| transmission order. A higher QUIC packet number signifies that the | space, and directly encodes transmission order. A higher packet | |||
| packet was sent later, and a lower QUIC packet number signifies that | number signifies that the packet was sent later, and a lower packet | |||
| the packet was sent earlier. When a packet containing frames is | number signifies that the packet was sent earlier. When a packet | |||
| deemed lost, QUIC rebundles necessary frames in a new packet with a | containing ack-eliciting frames is detected lost, QUIC rebundles | |||
| new packet number, removing ambiguity about which packet is | necessary frames in a new packet with a new packet number, removing | |||
| acknowledged when an ACK is received. Consequently, more accurate | ambiguity about which packet is acknowledged when an ACK is received. | |||
| RTT measurements can be made, spurious retransmissions are trivially | Consequently, more accurate RTT measurements can be made, spurious | |||
| detected, and mechanisms such as Fast Retransmit can be applied | retransmissions are trivially detected, and mechanisms such as Fast | |||
| universally, based only on packet number. | Retransmit can be applied universally, based only on packet number. | |||
| This design point significantly simplifies loss detection mechanisms | This design point significantly simplifies loss detection mechanisms | |||
| for QUIC. Most TCP mechanisms implicitly attempt to infer | for QUIC. Most TCP mechanisms implicitly attempt to infer | |||
| transmission ordering based on TCP sequence numbers - a non-trivial | transmission ordering based on TCP sequence numbers - a non-trivial | |||
| task, especially when TCP timestamps are not available. | task, especially when TCP timestamps are not available. | |||
| 3.1.3. No Reneging | 3.1.3. No Reneging | |||
| QUIC ACKs contain information that is similar to TCP SACK, but QUIC | QUIC ACKs contain information that is similar to TCP SACK, but QUIC | |||
| does not allow any acked packet to be reneged, greatly simplifying | does not allow any acked packet to be reneged, greatly simplifying | |||
| skipping to change at page 7, line 8 | skipping to change at page 7, line 7 | |||
| QUIC ACKs explicitly encode the delay incurred at the receiver | QUIC ACKs explicitly encode the delay incurred at the receiver | |||
| between when a packet is received and when the corresponding ACK is | between when a packet is received and when the corresponding ACK is | |||
| sent. This allows the receiver of the ACK to adjust for receiver | sent. This allows the receiver of the ACK to adjust for receiver | |||
| delays, specifically the delayed ack timer, when estimating the path | delays, specifically the delayed ack timer, when estimating the path | |||
| RTT. This mechanism also allows a receiver to measure and report the | RTT. This mechanism also allows a receiver to measure and report the | |||
| delay from when a packet was received by the OS kernel, which is | delay from when a packet was received by the OS kernel, which is | |||
| useful in receivers which may incur delays such as context-switch | useful in receivers which may incur delays such as context-switch | |||
| latency before a userspace QUIC receiver processes a received packet. | latency before a userspace QUIC receiver processes a received packet. | |||
| 4. Loss Detection | 4. Generating Acknowledgements | |||
| QUIC senders use both ack information and timeouts to detect lost | QUIC SHOULD delay sending acknowledgements in response to packets, | |||
| packets, and this section provides a description of these algorithms. | but MUST NOT excessively delay acknowledgements of ack-eliciting | |||
| Estimating the network round-trip time (RTT) is critical to these | packets. Specifically, implementations MUST attempt to enforce a | |||
| algorithms and is described first. | maximum ack delay to avoid causing the peer spurious timeouts. The | |||
| | maximum ack delay is communicated in the "max_ack_delay" transport | |||
| | parameter and the default value is 25ms. | |||
| 4.1. Computing the RTT estimate | An acknowledgement SHOULD be sent immediately upon receipt of a | |||
| | second packet but the delay SHOULD NOT exceed the maximum ack delay. | |||
| | QUIC recovery algorithms do not assume the peer generates an | |||
| | acknowledgement immediately when receiving a second full-packet. | |||
| | Out-of-order packets SHOULD be acknowledged more quickly, in order to | |||
| | accelerate loss recovery. The receiver SHOULD send an immediate ACK | |||
| | when it receives a new packet which is not one greater than the | |||
| | largest received packet number. | |||
| | Similarly, packets marked with the ECN Congestion Experienced (CE) | |||
| | codepoint in the IP header SHOULD be acknowledged immediately, to | |||
| | reduce the peer's response time to congestion events. | |||
| | As an optimization, a receiver MAY process multiple packets before | |||
| | sending any ACK frames in response. In this case they can determine | |||
| | whether an immediate or delayed acknowledgement should be generated | |||
| | after processing incoming packets. | |||
| | 4.1. Crypto Handshake Data | |||
| | In order to quickly complete the handshake and avoid spurious | |||
| | retransmissions due to crypto retransmission timeouts, crypto packets | |||
| | SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be | |||
| | sent immediately when the crypto stack indicates all data for that | |||
| | packet number space has been received. | |||
| | 4.2. ACK Ranges | |||
| | When an ACK frame is sent, one or more ranges of acknowledged packets | |||
| | are included. Including older packets reduces the chance of spurious | |||
| | retransmits caused by losing previously sent ACK frames, at the cost | |||
| | of larger ACK frames. | |||
| | ACK frames SHOULD always acknowledge the most recently received | |||
| | packets, and the more out-of-order the packets are, the more | |||
| | important it is to send an updated ACK frame quickly, to prevent the | |||
| | peer from declaring a packet as lost and spuriously retransmitting | |||
| | the frames it contains. | |||
| | Below is one recommended approach for determining what packets to | |||
| | include in an ACK frame. | |||
| | 4.3. Receiver Tracking of ACK Frames | |||
| | When a packet containing an ACK frame is sent, the largest | |||
| | acknowledged in that frame may be saved. When a packet containing an | |||
| | ACK frame is acknowledged, the receiver can stop acknowledging | |||
| | packets less than or equal to the largest acknowledged in the sent | |||
| | ACK frame. | |||
| | In cases without ACK frame loss, this algorithm allows for a minimum | |||
| | of 1 RTT of reordering. In cases with ACK frame loss and reordering, | |||
| | this approach does not guarantee that every acknowledgement is seen | |||
| | by the sender before it is no longer included in the ACK frame. | |||
| | Packets could be received out of order and all subsequent ACK frames | |||
| | containing them could be lost. In this case, the loss recovery | |||
| | algorithm may cause spurious retransmits, but the sender will | |||
| | continue making forward progress. | |||
| | 5. Computing the RTT estimate | |||
| RTT is calculated when an ACK frame arrives by computing the | RTT is calculated when an ACK frame arrives by computing the | |||
| difference between the current time and the time the largest newly | difference between the current time and the time the largest acked | |||
| acked packet was sent. If no packets are newly acknowledged, RTT | packet was sent. An RTT sample MUST NOT be taken for a packet that | |||
| cannot be calculated. When RTT is calculated, the ack delay field | is not newly acknowledged or not ack-eliciting. | |||
| from the ACK frame SHOULD be subtracted from the RTT as long as the | ||||
| result is larger than the Min RTT. If the result is smaller than the | When RTT is calculated, the ack delay field from the ACK frame SHOULD | |||
| min_rtt, the RTT should be used, but the ack delay field should be | be limited to the max_ack_delay specified by the peer. Limiting | |||
| ignored. | ack_delay to max_ack_delay ensures a peer specifying an extremely | |||
| | small max_ack_delay doesn't cause more spurious timeouts than a peer | |||
| | that correctly specifies max_ack_delay. It SHOULD be subtracted from | |||
| | the RTT as long as the result is larger than the min_rtt. If the | |||
| | result is smaller than the min_rtt, the RTT should be used, but the | |||
| | ack delay field should be ignored. | |||
| Like TCP, QUIC calculates both smoothed RTT and RTT variance similar | Like TCP, QUIC calculates both smoothed RTT and RTT variance similar | |||
| to those specified in [RFC6298]. | to those specified in [RFC6298]. | |||
| Min RTT is the minimum RTT measured over the connection, prior to | min_rtt is the minimum RTT measured over the connection, prior to | |||
| adjusting by ack delay. Ignoring ack delay for min RTT prevents | adjusting by ack delay. Ignoring ack delay for min RTT prevents | |||
| intentional or unintentional underestimation of min RTT, which in | intentional or unintentional underestimation of min RTT, which in | |||
| turn prevents underestimating smoothed RTT. | turn prevents underestimating smoothed RTT. | |||
| 4.2. Ack-based Detection | 6. Loss Detection | |||
| Ack-based loss detection implements the spirit of TCP's Fast | QUIC senders use both ack information and timeouts to detect lost | |||
| Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | packets, and this section provides a description of these algorithms. | |||
| recovery [RFC6675]. This section provides an overview of how these | Estimating the network round-trip time (RTT) is critical to these | |||
| algorithms are implemented in QUIC. | algorithms and is described first. | |||
| 4.2.1. Fast Retransmit | If a packet is lost, the QUIC transport needs to recover from that | |||
| | loss, such as by retransmitting the data, sending an updated frame, | |||
| | or abandoning the frame. For more information, see Section 13.2 of | |||
| | [QUIC-TRANSPORT]. | |||
| An unacknowledged packet is marked as lost when an acknowledgment is | 6.1. Acknowledgement-based Detection | |||
| received for a packet that was sent a threshold number of packets | ||||
| (kReorderingThreshold) and/or a threshold amount of time after the | ||||
| unacknowledged packet. Receipt of the acknowledgement indicates that | ||||
| a later packet was received, while the reordering threshold provides | ||||
| some tolerance for reordering of packets in the network. | ||||
| The RECOMMENDED initial value for kReorderingThreshold is 3, based on | Acknowledgement-based loss detection implements the spirit of TCP's | |||
| TCP loss recovery [RFC5681] [RFC6675]. Some networks may exhibit | Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK], | |||
| higher degrees of reordering, causing a sender to detect spurious | SACK loss recovery [RFC6675], and RACK [RACK]. This section provides | |||
| losses. Spuriously declaring packets lost leads to unnecessary | an overview of how these algorithms are implemented in QUIC. | |||
| | A packet is declared lost if it meets all the following conditions: | |||
| | o The packet is unacknowledged, in-flight, and was sent prior to an | |||
| | acknowledged packet. | |||
| | o Either its packet number is kPacketThreshold smaller than an | |||
| | acknowledged packet (Section 6.1.1), or it was sent long enough in | |||
| | the past (Section 6.1.2). | |||
| | The acknowledgement indicates that a packet sent later was delivered, | |||
| | while the packet and time thresholds provide some tolerance for | |||
| | packet reordering. | |||
| | Spuriously declaring packets as lost leads to unnecessary | |||
| retransmissions and may result in degraded performance due to the | retransmissions and may result in degraded performance due to the | |||
| actions of the congestion controller upon detecting loss. | actions of the congestion controller upon detecting loss. | |||
| Implementers MAY use algorithms developed for TCP, such as TCP-NCR | Implementations that detect spurious retransmissions and increase the | |||
| [RFC4653], to improve QUIC's reordering resilience. | reordering threshold in packets or time MAY choose to start with | |||
| | smaller initial reordering thresholds to minimize recovery latency. | |||
| QUIC implementations can use time-based loss detection to handle | 6.1.1. Packet Threshold | |||
| reordering based on time elapsed since the packet was sent. This may | ||||
| be used either as a replacement for a packet reordering threshold or | ||||
| in addition to it. The RECOMMENDED time threshold, expressed as a | ||||
| fraction of the round-trip time (kTimeReorderingFraction), is 1/8. | ||||
| 4.2.2. Early Retransmit | The RECOMMENDED initial value for the packet reordering threshold | |||
| | (kPacketThreshold) is 3, based on best practices for TCP loss | |||
| | detection [RFC5681] [RFC6675]. | |||
| Unacknowledged packets close to the tail may have fewer than | Some networks may exhibit higher degrees of reordering, causing a | |||
| kReorderingThreshold retransmittable packets sent after them. Loss | sender to detect spurious losses. Implementers MAY use algorithms | |||
| of such packets cannot be detected via Fast Retransmit. To enable | developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's | |||
| ack-based loss detection of such packets, receipt of an | reordering resilience. | |||
| acknowledgment for the last outstanding retransmittable packet | ||||
| triggers the Early Retransmit process, as follows. | ||||
| If there are unacknowledged in-flight packets still pending, they | 6.1.2. Time Threshold | |||
| should be marked as lost. To compensate for the reduced reordering | ||||
| resilience, the sender SHOULD set a timer for a small period of time. | ||||
| If the unacknowledged in-flight packets are not acknowledged during | ||||
| this time, then these packets MUST be marked as lost. | ||||
| An endpoint SHOULD set the timer such that a packet is marked as lost | Once a later packet has been acknowledged, an endpoint SHOULD declare | |||
| no earlier than 1.125 * max(SRTT, latest_RTT) since when it was sent. | an earlier packet lost if it was sent a threshold amount of time in | |||
| | the past. The time threshold is computed as kTimeThreshold * | |||
| | max(SRTT, latest_RTT). If packets sent prior to the largest | |||
| | acknowledged packet cannot yet be declared lost, then a timer SHOULD | |||
| | be set for the remaining time. | |||
| | The RECOMMENDED time threshold (kTimeThreshold), expressed as a | |||
| | round-trip time multiplier, is 9/8. | |||
| Using max(SRTT, latest_RTT) protects from the two following cases: | Using max(SRTT, latest_RTT) protects from the two following cases: | |||
| o the latest RTT sample is lower than the SRTT, perhaps due to | o the latest RTT sample is lower than the SRTT, perhaps due to | |||
| reordering where packet whose ack triggered the Early Retransit | reordering where packet whose ack triggered the Early Retransmit | |||
| process encountered a shorter path; | process encountered a shorter path; | |||
| o the latest RTT sample is higher than the SRTT, perhaps due to a | o the latest RTT sample is higher than the SRTT, perhaps due to a | |||
| sustained increase in the actual RTT, but the smoothed SRTT has | sustained increase in the actual RTT, but the smoothed SRTT has | |||
| not yet caught up. | not yet caught up. | |||
| The 1.125 multiplier increases reordering resilience. Implementers | Implementers MAY experiment with using other reordering thresholds, | |||
| MAY experiment with using other multipliers, bearing in mind that a | including absolute thresholds, bearing in mind that a lower | |||
| lower multiplier reduces reordering resilience and increases spurious | multiplier reduces reordering resilience and increases spurious | |||
| retransmissions, and a higher multiplier increases loss recovery | retransmissions, and a higher multiplier increases loss detection | |||
| delay. | delay. | |||
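The draft-17 packet threshold and time threshold rules can be combined into one pass over the unacknowledged packets. The sketch below is illustrative only: the function name `detect_lost`, the `sent` mapping of packet number to send time, and the return shape are assumptions made here, while kPacketThreshold = 3 and kTimeThreshold = 9/8 are the RECOMMENDED values from the text.

```python
K_PACKET_THRESHOLD = 3      # RECOMMENDED kPacketThreshold
K_TIME_THRESHOLD = 9 / 8    # RECOMMENDED kTimeThreshold (RTT multiplier)


def detect_lost(sent, largest_acked, now, smoothed_rtt, latest_rtt):
    """Return (lost_packet_numbers, loss_time).

    sent: dict mapping packet number -> time_sent for packets that are
          still unacknowledged and in flight (hypothetical structure).
    loss_time: earliest time at which a not-yet-lost earlier packet would
          cross the time threshold, or None if no timer is needed.
    """
    loss_delay = K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt)
    lost = []
    loss_time = None
    for pn in sorted(sent):
        if pn >= largest_acked:
            continue  # only packets sent before an acknowledged packet qualify
        time_sent = sent[pn]
        past_packet_threshold = largest_acked - pn >= K_PACKET_THRESHOLD
        past_time_threshold = now - time_sent >= loss_delay
        if past_packet_threshold or past_time_threshold:
            lost.append(pn)
        else:
            # Not lost yet: a timer should be set for the remaining time.
            candidate = time_sent + loss_delay
            loss_time = candidate if loss_time is None else min(loss_time, candidate)
    return lost, loss_time
```

For example, with times in milliseconds, `detect_lost({1: 0, 5: 890, 7: 950, 9: 990}, 8, 1000, 80, 100)` declares packets 1 and 5 lost by the packet threshold and returns a loss_time of 1062.5 for packet 7.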
| This mechanism is based on Early Retransmit for TCP [RFC5827]. | 6.2. Timeout Loss Detection | |||
| However, [RFC5827] does not include the timer described above. Early | ||||
| Retransmit is prone to spurious retransmissions due to its reduced | ||||
| reordering resilence without the timer. This observation led Linux | ||||
| TCP implementers to implement a timer for TCP as well, and this | ||||
| document incorporates this advancement. | ||||
| 4.3. Timer-based Detection | ||||
| Timer-based loss detection recovers from losses that cannot be | Timeout loss detection recovers from losses that cannot be handled by | |||
| handled by ack-based loss detection. It uses a single timer which | acknowledgement-based loss detection. It uses a single timer which | |||
| switches between a crypto retransmission timer, a Tail Loss Probe | switches between a crypto retransmission timer and a probe timer. | |||
| timer and Retransmission Timeout mechanisms. | ||||
| 4.3.1. Crypto Retransmission Timeout | 6.2.1. Crypto Retransmission Timeout | |||
| Data in CRYPTO frames is critical to QUIC transport and crypto | Data in CRYPTO frames is critical to QUIC transport and crypto | |||
| negotiation, so a more aggressive timeout is used to retransmit it. | negotiation, so a more aggressive timeout is used to retransmit it. | |||
| The initial crypto retransmission timeout SHOULD be set to twice the | The initial crypto retransmission timeout SHOULD be set to twice the | |||
| initial RTT. | initial RTT. | |||
| At the beginning, there are no prior RTT samples within a connection. | At the beginning, there are no prior RTT samples within a connection. | |||
| Resumed connections over the same network SHOULD use the previous | Resumed connections over the same network SHOULD use the previous | |||
| connection's final smoothed RTT value as the resumed connection's | connection's final smoothed RTT value as the resumed connection's | |||
| initial RTT. If no previous RTT is available, or if the network | initial RTT. If no previous RTT is available, or if the network | |||
| changes, the initial RTT SHOULD be set to 100ms. When an | changes, the initial RTT SHOULD be set to 100ms. When an | |||
| acknowledgement is received, a new RTT is computed and the timer | acknowledgement is received, a new RTT is computed and the timer | |||
| SHOULD be set for twice the newly computed smoothed RTT. | SHOULD be set for twice the newly computed smoothed RTT. | |||
| When crypto packets are sent, the sender MUST set a timer for the | When crypto packets are sent, the sender MUST set a timer for the | |||
| crypto timeout period. Upon timeout, the sender MUST retransmit all | crypto timeout period. Upon timeout, the sender MUST retransmit all | |||
| unacknowledged CRYPTO data if possible. | unacknowledged CRYPTO data if possible. | |||
| Until the server has validated the client's address on the path, the | Until the server has validated the client's address on the path, the | |||
| number of bytes it can send is limited, as specified in | amount of data it can send is limited, as specified in | |||
| [QUIC-TRANSPORT]. If not all unacknowledged CRYPTO data can be sent, | [QUIC-TRANSPORT]. If not all unacknowledged CRYPTO data can be sent, | |||
| then all unacknowledged CRYPTO data sent in Initial packets should be | then all unacknowledged CRYPTO data sent in Initial packets should be | |||
| retransmitted. If no bytes can be sent, then no alarm should be | retransmitted. If no data can be sent, then no alarm should be armed | |||
| armed until bytes have been received from the client. | until data has been received from the client. | |||
| Because the server could be blocked until more packets are received, | Because the server could be blocked until more packets are received, | |||
| the client MUST start the crypto retransmission timer even if there | the client MUST start the crypto retransmission timer even if there | |||
| is no unacknowledged CRYPTO data. If the timer expires and the | is no unacknowledged CRYPTO data. If the timer expires and the | |||
| client has no CRYPTO data to retransmit and does not have Handshake | client has no CRYPTO data to retransmit and does not have Handshake | |||
| keys, it SHOULD send an Initial packet in a UDP datagram of at least | keys, it SHOULD send an Initial packet in a UDP datagram of at least | |||
| 1200 octets. If the client has Handshake keys, it SHOULD send a | 1200 bytes. If the client has Handshake keys, it SHOULD send a | |||
| Handshake packet. | Handshake packet. | |||
| On each consecutive expiration of the crypto timer without receiving | On each consecutive expiration of the crypto timer without receiving | |||
| an acknowledgement for a new packet, the sender SHOULD double the | an acknowledgement for a new packet, the sender SHOULD double the | |||
| crypto retransmission timeout and set a timer for this period. | crypto retransmission timeout and set a timer for this period. | |||
| When crypto packets are outstanding, the TLP and RTO timers are not | When crypto packets are in flight, the probe timer (Section 6.2.2) is | |||
| active. | not active. | |||
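The crypto retransmission timeout described above (twice the initial or smoothed RTT, doubled on each consecutive expiration) can be sketched as below. The function name `crypto_timeout_ms` and the convention that `None` means "no RTT sample yet" are assumptions of this sketch. The 100ms default initial RTT comes from the text.

```python
K_INITIAL_RTT_MS = 100  # default initial RTT when no previous RTT is available


def crypto_timeout_ms(smoothed_rtt_ms, crypto_count):
    """Crypto retransmission timeout in milliseconds.

    smoothed_rtt_ms: current smoothed RTT, or None before the first sample
                     (hypothetical convention for this sketch).
    crypto_count:    consecutive crypto timer expirations without an
                     acknowledgement for a new packet.
    """
    rtt = K_INITIAL_RTT_MS if smoothed_rtt_ms is None else smoothed_rtt_ms
    # Twice the RTT, doubled again for every consecutive expiration.
    return 2 * rtt * (2 ** crypto_count)
```

With no RTT sample the first timeout is 200ms, and after two consecutive expirations it grows to 800ms.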
| 4.3.1.1. Retry and Version Negotiation | 6.2.1.1. Retry and Version Negotiation | |||
| A Retry or Version Negotiation packet causes a client to send another | A Retry or Version Negotiation packet causes a client to send another | |||
| Initial packet, effectively restarting the connection process. | Initial packet, effectively restarting the connection process and | |||
| | resetting congestion control and loss recovery state, including | |||
| Either packet indicates that the Initial was received but not | resetting any pending timers. Either packet indicates that the | |||
| processed. Neither packet can be treated as an acknowledgment for | Initial was received but not processed. Neither packet can be | |||
| the Initial, but they MAY be used to improve the RTT estimate. | treated as an acknowledgment for the Initial. | |||
| 4.3.2. Tail Loss Probe | ||||
| The algorithm described in this section is an adaptation of the Tail | ||||
| Loss Probe algorithm proposed for TCP [TLP]. | ||||
| A packet sent at the tail is particularly vulnerable to slow loss | ||||
| detection, since acks of subsequent packets are needed to trigger | ||||
| ack-based detection. To ameliorate this weakness of tail packets, | ||||
| the sender schedules a timer when the last retransmittable packet | ||||
| before quiescence is transmitted. Upon timeout, a Tail Loss Probe | ||||
| (TLP) packet is sent to evoke an acknowledgement from the receiver. | ||||
| The timer duration, or Probe Timeout (PTO), is set based on the | ||||
| following conditions: | ||||
| o PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay, | ||||
| kMinTLPTimeout) | ||||
| o If RTO (Section 4.3.3) is earlier, schedule a TLP in its place. | ||||
| That is, PTO SHOULD be scheduled for min(RTO, PTO). | ||||
| QUIC includes MaxAckDelay in all probe timeouts, because it assumes | 6.2.1.2. Discarding Initial State | |||
| the ack delay may come into play, regardless of the number of packets | ||||
| outstanding. TCP's TLP assumes if at least 2 packets are | ||||
| outstanding, acks will not be delayed. | ||||
| A PTO value of at least 1.5*SRTT ensures that the ACK is overdue. | As described in Section 17.5.1 of [QUIC-TRANSPORT], endpoints stop | |||
| The 1.5 is based on [TLP], but implementations MAY experiment with | sending and receiving Initial packets once they start exchanging | |||
| other constants. | Handshake packets. At this point, all loss recovery state for the | |||
| | Initial packet number space is also discarded. Packets that are in | |||
| | flight for the packet number space are not declared as either | |||
| | acknowledged or lost. After discarding state, new Initial packets | |||
| | will not be sent. | |||
| To reduce latency, it is RECOMMENDED that the sender set and allow | The client MAY however compute an RTT estimate to the server as the | |||
| the TLP timer to fire twice before setting an RTO timer. In other | time period from when the first Initial was sent to when a Retry or a | |||
| words, when the TLP timer expires the first time, a TLP packet is | Version Negotiation packet is received. The client MAY use this | |||
| sent, and it is RECOMMENDED that the TLP timer be scheduled for a | value to seed the RTT estimator for a subsequent connection attempt | |||
| second time. When the TLP timer expires the second time, a second | to the server. | |||
| TLP packet is sent, and an RTO timer SHOULD be scheduled | ||||
| Section 4.3.3. | ||||
| A TLP packet SHOULD carry new data when possible. If new data is | 6.2.2. Probe Timeout | |||
| unavailable or new data cannot be sent due to flow control, a TLP | ||||
| packet MAY retransmit unacknowledged data to potentially reduce | ||||
| recovery time. Since a TLP timer is used to send a probe into the | ||||
| network prior to establishing any packet loss, prior unacknowledged | ||||
| packets SHOULD NOT be marked as lost when a TLP timer expires. | ||||
| A sender may not know that a packet being sent is a tail packet. | A Probe Timeout (PTO) triggers a probe packet when ack-eliciting data | |||
| Consequently, a sender may have to arm or adjust the TLP timer on | is in flight but an acknowledgement is not received within the | |||
| every sent retransmittable packet. | expected period of time. A PTO enables a connection to recover from | |||
| | loss of tail packets or acks. The PTO algorithm used in QUIC | |||
| | implements the reliability functions of Tail Loss Probe [TLP] [RACK], | |||
| | RTO [RFC5681] and F-RTO algorithms for TCP [RFC5682], and the timeout | |||
| | computation is based on TCP's retransmission timeout period | |||
| | [RFC6298]. | |||
| 4.3.3. Retransmission Timeout | 6.2.2.1. Computing PTO | |||
| A Retransmission Timeout (RTO) timer is the final backstop for loss | When an ack-eliciting packet is transmitted, the sender schedules a | |||
| detection. The algorithm used in QUIC is based on the RTO algorithm | timer for the PTO period as follows: | |||
| for TCP [RFC5681] and is additionally resilient to spurious RTO | ||||
| events [RFC5682]. | ||||
| When the last TLP packet is sent, a timer is set for the RTO period. | PTO = max(smoothed_rtt + 4*rttvar + max_ack_delay, kGranularity) | |||
| When this timer expires, the sender sends two packets, to evoke | ||||
| acknowledgements from the receiver, and restarts the RTO timer. | ||||
| Similar to TCP [RFC6298], the RTO period is set based on the | kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in | |||
| following conditions: | Section 6.4.1 and Section 6.4.2. | |||
| o When the final TLP packet is sent, the RTO period is set to | The PTO period is the amount of time that a sender ought to wait for | |||
| max(SRTT + 4*RTTVAR + MaxAckDelay, kMinRTOTimeout) | an acknowledgement of a sent packet. This time period includes the | |||
| | estimated network roundtrip-time (smoothed_rtt), the variance in the | |||
| | estimate (4*rttvar), and max_ack_delay, to account for the maximum | |||
| | time by which a receiver might delay sending an acknowledgement. | |||
| o When an RTO timer expires, the RTO period is doubled. | The PTO value MUST be set to at least kGranularity, to avoid the | |||
| | timer expiring immediately. | |||
| The sender typically has incurred a high latency penalty by the time | When a PTO timer expires, the PTO period MUST be set to twice its | |||
| an RTO timer expires, and this penalty increases exponentially in | current value. This exponential reduction in the sender's rate is | |||
| subsequent consecutive RTO events. Sending a single packet on an RTO | important because the PTOs might be caused by loss of packets or | |||
| event therefore makes the connection very sensitive to single packet | acknowledgements due to severe congestion. | |||
| loss. Sending two packets instead of one significantly increases | ||||
| resilience to packet drop in both directions, thus reducing the | ||||
| probability of consecutive RTO events. | ||||
| QUIC's RTO algorithm differs from TCP in that the firing of an RTO | A sender computes its PTO timer every time an ack-eliciting packet is | |||
| timer is not considered a strong enough signal of packet loss, so | sent. A sender might choose to optimize this by setting the timer | |||
| does not result in an immediate change to congestion window or | fewer times if it knows that more ack-eliciting packets will be sent | |||
| recovery state. An RTO timer expires only when there's a prolonged | within a short period of time. | |||
| period of network silence, which could be caused by a change in the | ||||
| underlying network RTT. | ||||
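The draft-17 PTO computation and its doubling on expiration can be sketched as below. The formula is the one given in the text. The function name `pto_period_ms`, the millisecond unit, the choice of 1ms for kGranularity (the smallest value the draft suggests), and the use of a `pto_count` exponent to express "twice its current value" are assumptions of this sketch.

```python
K_GRANULARITY_MS = 1  # assumed timer granularity; the draft suggests no smaller than 1ms


def pto_period_ms(smoothed_rtt, rttvar, max_ack_delay, pto_count=0):
    """Probe timeout period in milliseconds.

    PTO = max(smoothed_rtt + 4*rttvar + max_ack_delay, kGranularity),
    doubled once for each consecutive PTO expiration (pto_count).
    """
    pto = max(smoothed_rtt + 4 * rttvar + max_ack_delay, K_GRANULARITY_MS)
    return pto * (2 ** pto_count)
```

With smoothed_rtt = 100ms, rttvar = 10ms, and max_ack_delay = 25ms, the first PTO is 165ms and the third consecutive one is 660ms.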
| QUIC also diverges from TCP by including MaxAckDelay in the RTO | 6.2.2.2. Sending Probe Packets | |||
| period. Since QUIC corrects for this delay in its SRTT and RTTVAR | ||||
| computations, it is necessary to add this delay explicitly in the TLP | ||||
| and RTO computation. | ||||
| When an acknowledgment is received for a packet sent on an RTO event, | When a PTO timer expires, the sender MUST send one ack-eliciting | |||
| any unacknowledged packets with lower packet numbers than those | packet as a probe. A sender MAY send up to two ack-eliciting | |||
| acknowledged MUST be marked as lost. If an acknowledgement for a | packets, to avoid an expensive consecutive PTO expiration due to a | |||
| packet sent on an RTO is received at the same time packets sent prior | single packet loss. | |||
| to the first RTO are acknowledged, the RTO is considered spurious and | ||||
| standard loss detection rules apply. | ||||
| A packet sent when an RTO timer expires MAY carry new data if | Consecutive PTO periods increase exponentially, and as a result, | |||
| available or unacknowledged data to potentially reduce recovery time. | connection recovery latency increases exponentially as packets | |||
| Since this packet is sent as a probe into the network prior to | continue to be dropped in the network. Sending two packets on PTO | |||
| establishing any packet loss, prior unacknowledged packets SHOULD NOT | expiration increases resilience to packet drops, thus reducing the | |||
| be marked as lost. | probability of consecutive PTO events. | |||
| A packet sent on an RTO timer MUST NOT be blocked by the sender's | Probe packets sent on a PTO MUST be ack-eliciting. A probe packet | |||
| congestion controller. A sender MUST however count these bytes as | SHOULD carry new data when possible. A probe packet MAY carry | |||
| additional bytes in flight, since this packet adds network load | retransmitted unacknowledged data when new data is unavailable, when | |||
| without establishing packet loss. | flow control does not permit new data to be sent, or to | |||
| | opportunistically reduce loss recovery delay. Implementations MAY | |||
| | use alternate strategies for determining the content of probe | |||
| | packets, including sending new or retransmitted data based on the | |||
| | application's priorities. | |||
| 4.4. Generating Acknowledgements | 6.2.2.3. Loss Detection | |||
| QUIC SHOULD delay sending acknowledgements in response to packets, | Delivery or loss of packets in flight is established when an ACK | |||
| but MUST NOT excessively delay acknowledgements of packets containing | frame is received that newly acknowledges one or more packets. | |||
| frames other than ACK. Specifically, implementations MUST attempt to | ||||
| enforce a maximum ack delay to avoid causing the peer spurious | ||||
| timeouts. The maximum ack delay is communicated in the | ||||
| "max_ack_delay" transport parameter and the default value is 25ms. | ||||
| An acknowledgement SHOULD be sent immediately upon receipt of a | A PTO timer expiration event does not indicate packet loss and MUST | |||
| second packet but the delay SHOULD NOT exceed the maximum ack delay. | NOT cause prior unacknowledged packets to be marked as lost. After a | |||
| QUIC recovery algorithms do not assume the peer generates an | PTO timer has expired, an endpoint uses the following rules to mark | |||
| acknowledgement immediately when receiving a second full-packet. | packets as lost when an acknowledgement is received that newly | |||
| | acknowledges packets. | |||
| Out-of-order packets SHOULD be acknowledged more quickly, in order to | When an acknowledgement is received that newly acknowledges packets, | |||
| accelerate loss recovery. The receiver SHOULD send an immediate ACK | loss detection proceeds as dictated by packet and time threshold | |||
| when it receives a new packet which is not one greater than the | mechanisms, see Section 6.1. | |||
| largest received packet number. | ||||
| Similarly, packets marked with the ECN Congestion Experienced (CE) | 6.3. Tracking Sent Packets | |||
| codepoint in the IP header SHOULD be acknowledged immediately, to | ||||
| reduce the peer's response time to congestion events. | ||||
| As an optimization, a receiver MAY process multiple packets before | To correctly implement congestion control, a QUIC sender tracks every | |||
| sending any ACK frames in response. In this case they can determine | ack-eliciting packet until the packet is acknowledged or lost. It is | |||
| whether an immediate or delayed acknowledgement should be generated | expected that implementations will be able to access this information | |||
| after processing incoming packets. | by packet number and crypto context and store the per-packet fields | |||
| | (Section 6.3.1) for loss recovery and congestion control. | |||
| 4.4.1. Crypto Handshake Data | After a packet is declared lost, it SHOULD be tracked for an amount | |||
| | of time comparable to the maximum expected packet reordering, such as | |||
| | 1 RTT. This allows for detection of spurious retransmissions. | |||
| In order to quickly complete the handshake and avoid spurious | Sent packets are tracked for each packet number space, and ACK | |||
| retransmissions due to crypto retransmission timeouts, crypto packets | processing only applies to a single space. | |||
| SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be | ||||
| sent immediately when the crypto stack indicates all data for that | ||||
| encryption level has been received. | ||||
| 4.4.2. ACK Ranges | 6.3.1. Sent Packet Fields | |||
| When an ACK frame is sent, one or more ranges of acknowledged packets | packet_number: The packet number of the sent packet. | |||
| are included. Including older packets reduces the chance of spurious | ||||
| retransmits caused by losing previously sent ACK frames, at the cost | ||||
| of larger ACK frames. | ||||
| ACK frames SHOULD always acknowledge the most recently received | ack_eliciting: A boolean that indicates whether a packet is ack- | |||
| packets, and the more out-of-order the packets are, the more | eliciting. If true, it is expected that an acknowledgement will | |||
| important it is to send an updated ACK frame quickly, to prevent the | be received, though the peer could delay sending the ACK frame | |||
| peer from declaring a packet as lost and spuriously retransmitting | containing it by up to the MaxAckDelay. | |||
| the frames it contains. | ||||
| Below is one recommended approach for determining what packets to | in_flight: A boolean that indicates whether the packet counts | |||
| include in an ACK frame. | towards bytes in flight. | |||
| 4.4.3. Receiver Tracking of ACK Frames | is_crypto_packet: A boolean that indicates whether the packet | |||
| | contains cryptographic handshake messages critical to the | |||
| | completion of the QUIC handshake. In this version of QUIC, this | |||
| | includes any packet with the long header that includes a CRYPTO | |||
| | frame. | |||
| When a packet containing an ACK frame is sent, the largest | sent_bytes: The number of bytes sent in the packet, not including | |||
| acknowledged in that frame may be saved. When a packet containing an | UDP or IP overhead, but including QUIC framing overhead. | |||
| ACK frame is acknowledged, the receiver can stop acknowledging | ||||
| packets less than or equal to the largest acknowledged in the sent | ||||
| ACK frame. | ||||
| In cases without ACK frame loss, this algorithm allows for a minimum | time_sent: The time the packet was sent. | |||
| of 1 RTT of reordering. In cases with ACK frame loss, this approach | ||||
| does not guarantee that every acknowledgement is seen by the sender | ||||
| before it is no longer included in the ACK frame. Packets could be | ||||
| received out of order and all subsequent ACK frames containing them | ||||
| could be lost. In this case, the loss recovery algorithm may cause | ||||
| spurious retransmits, but the sender will continue making forward | ||||
| progress. | ||||
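The per-packet fields listed in the draft-17 column (Section 6.3.1) map naturally onto a small record kept per packet number space. The sketch below is one possible layout, not something the draft prescribes: the class names `SentPacket` and `PacketNumberSpace` and their methods are hypothetical, while the field names are the draft's.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SentPacket:
    packet_number: int
    ack_eliciting: bool      # an acknowledgement is expected for this packet
    in_flight: bool          # counts towards bytes in flight
    is_crypto_packet: bool   # long-header packet carrying a CRYPTO frame
    sent_bytes: int          # excludes UDP/IP overhead, includes QUIC framing
    time_sent: float


@dataclass
class PacketNumberSpace:
    """Hypothetical tracker; sent packets are kept per packet number space."""
    sent_packets: Dict[int, SentPacket] = field(default_factory=dict)

    def on_packet_sent(self, pkt: SentPacket) -> None:
        self.sent_packets[pkt.packet_number] = pkt

    def on_ack(self, acked_numbers: Iterable[int]) -> List[SentPacket]:
        # Return only newly acknowledged packets and stop tracking them.
        newly_acked = []
        for pn in acked_numbers:
            pkt = self.sent_packets.pop(pn, None)
            if pkt is not None:
                newly_acked.append(pkt)
        return newly_acked

    def bytes_in_flight(self) -> int:
        return sum(p.sent_bytes for p in self.sent_packets.values() if p.in_flight)
```

This sketch drops a packet as soon as it is acknowledged. An implementation that wants to detect spurious retransmissions would keep lost packets around for roughly one RTT, as the text recommends.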
| 4.5. Pseudocode | 6.4. Pseudocode | |||
| 4.5.1. Constants of interest | 6.4.1. Constants of interest | |||
| Constants used in loss recovery are based on a combination of RFCs, | Constants used in loss recovery are based on a combination of RFCs, | |||
| papers, and common practice. Some may need to be changed or | papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kMaxTLPs: Maximum number of tail loss probes before an RTO expires. | kPacketThreshold: Maximum reordering in packets before packet | |||
| The RECOMMENDED value is 2. | threshold loss detection considers a packet lost. The RECOMMENDED | |||
| | value is 3. | |||
| kReorderingThreshold: Maximum reordering in packet number space | ||||
| before FACK style loss detection considers a packet lost. The | ||||
| RECOMMENDED value is 3. | ||||
| kTimeReorderingFraction: Maximum reordering in time space before | ||||
| time based loss detection considers a packet lost. In fraction of | ||||
| an RTT. The RECOMMENDED value is 1/8. | ||||
| kUsingTimeLossDetection: Whether time based loss detection is in | ||||
| use. If false, uses FACK style loss detection. The RECOMMENDED | ||||
| value is false. | ||||
| kMinTLPTimeout: Minimum time in the future a tail loss probe timer | ||||
| may be set for. The RECOMMENDED value is 10ms. | ||||
| kMinRTOTimeout: Minimum time in the future an RTO timer may be set | kTimeThreshold: Maximum reordering in time before time threshold | |||
| for. The RECOMMENDED value is 200ms. | loss detection considers a packet lost. Specified as an RTT | |||
| | multiplier. The RECOMMENDED value is 9/8. | |||
| kDelayedAckTimeout: The length of the peer's delayed ack timer. The | kGranularity: Timer granularity. This is a system-dependent value. | |||
| RECOMMENDED value is 25ms. | However, implementations SHOULD use a value no smaller than 1ms. | |||
| kInitialRtt: The RTT used before an RTT sample is taken. The | kInitialRtt: The RTT used before an RTT sample is taken. The | |||
| RECOMMENDED value is 100ms. | RECOMMENDED value is 100ms. | |||
| 4.5.2. Variables of interest | 6.4.2. Variables of interest | |||
| Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
| described in this section. | described in this section. | |||
| loss_detection_timer: Multi-modal timer used for loss detection. | loss_detection_timer: Multi-modal timer used for loss detection. | |||
| crypto_count: The number of times all unacknowledged CRYPTO data has | crypto_count: The number of times all unacknowledged CRYPTO data has | |||
| been retransmitted without receiving an ack. | been retransmitted without receiving an ack. | |||
| tlp_count: The number of times a tail loss probe has been sent | pto_count: The number of times a PTO has been sent without receiving | |||
| without receiving an ack. | an ack. | |||
| rto_count: The number of times an RTO has been sent without | ||||
| receiving an ack. | ||||
| largest_sent_before_rto: The last packet number sent prior to the | ||||
| first retransmission timeout. | ||||
| time_of_last_sent_retransmittable_packet: The time the most recent | time_of_last_sent_ack_eliciting_packet: The time the most recent | |||
| retransmittable packet was sent. | ack-eliciting packet was sent. | |||
| time_of_last_sent_crypto_packet: The time the most recent crypto | time_of_last_sent_crypto_packet: The time the most recent crypto | |||
| packet was sent. | packet was sent. | |||
| largest_sent_packet: The packet number of the most recently sent | largest_sent_packet: The packet number of the most recently sent | |||
| packet. | packet. | |||
| largest_acked_packet: The largest packet number acknowledged in an | largest_acked_packet: The largest packet number acknowledged in the | |||
| ACK frame. | packet number space so far. | |||
| latest_rtt: The most recent RTT measurement made when receiving an | latest_rtt: The most recent RTT measurement made when receiving an | |||
| ack for a previously unacked packet. | ack for a previously unacked packet. | |||
| smoothed_rtt: The smoothed RTT of the connection, computed as | smoothed_rtt: The smoothed RTT of the connection, computed as | |||
| described in [RFC6298]. | described in [RFC6298]. | |||
| rttvar: The RTT variance, computed as described in [RFC6298]. | rttvar: The RTT variance, computed as described in [RFC6298]. | |||
| min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | |||
| max_ack_delay: The maximum amount of time by which the receiver | max_ack_delay: The maximum amount of time by which the receiver | |||
| intends to delay acknowledgments, in milliseconds. The actual | intends to delay acknowledgments, in milliseconds. The actual | |||
| ack_delay in a received ACK frame may be larger due to late | ack_delay in a received ACK frame may be larger due to late | |||
| timers, reordering, or lost ACKs. | timers, reordering, or lost ACKs. | |||
| reordering_threshold: The largest packet number gap between the | ||||
| largest acknowledged retransmittable packet and an unacknowledged | ||||
| retransmittable packet before it is declared lost. | ||||
| time_reordering_fraction: The reordering window as a fraction of | ||||
| max(smoothed_rtt, latest_rtt). | ||||
| loss_time: The time at which the next packet will be considered lost | loss_time: The time at which the next packet will be considered lost | |||
| based on early retransmit or exceeding the reordering window in | based on early retransmit or exceeding the reordering window in | |||
| time. | time. | |||
| sent_packets: An association of packet numbers to information about | sent_packets: An association of packet numbers to information about | |||
| them, including a number field indicating the packet number, a | them. Described in detail above in Section 6.3. | |||
| time field indicating the time a packet was sent, a boolean | ||||
| indicating whether the packet is ack-only, a boolean indicating | ||||
| whether it counts towards bytes in flight, and a bytes field | ||||
| indicating the packet's size. sent_packets is ordered by packet | ||||
| number, and packets remain in sent_packets until acknowledged or | ||||
| lost. A sent_packets data structure is maintained per packet | ||||
| number space, and ACK processing only applies to a single space. | ||||
| 4.5.3. Initialization | 6.4.3. Initialization | |||
| At the beginning of the connection, initialize the loss detection | At the beginning of the connection, initialize the loss detection | |||
| variables as follows: | variables as follows: | |||
| loss_detection_timer.reset() | loss_detection_timer.reset() | |||
| crypto_count = 0 | crypto_count = 0 | |||
| tlp_count = 0 | pto_count = 0 | |||
| rto_count = 0 | ||||
| if (kUsingTimeLossDetection): | ||||
| reordering_threshold = infinite | ||||
| time_reordering_fraction = kTimeReorderingFraction | ||||
| else: | ||||
| reordering_threshold = kReorderingThreshold | ||||
| time_reordering_fraction = infinite | ||||
| loss_time = 0 | loss_time = 0 | |||
| smoothed_rtt = 0 | smoothed_rtt = 0 | |||
| rttvar = 0 | rttvar = 0 | |||
| min_rtt = infinite | min_rtt = infinite | |||
| largest_sent_before_rto = 0 | time_of_last_sent_ack_eliciting_packet = 0 | |||
| time_of_last_sent_retransmittable_packet = 0 | ||||
| time_of_last_sent_crypto_packet = 0 | time_of_last_sent_crypto_packet = 0 | |||
| largest_sent_packet = 0 | largest_sent_packet = 0 | |||
| largest_acked_packet = 0 | ||||
| 4.5.4. On Sending a Packet | 6.4.4. On Sending a Packet | |||
| After any packet is sent, be it a new transmission or a rebundled | ||||
| transmission, the following OnPacketSent function is called. The | ||||
| parameters to OnPacketSent are as follows: | ||||
| o packet_number: The packet number of the sent packet. | ||||
| o ack_only: A boolean that indicates whether a packet contains only | ||||
| ACK or PADDING frame(s). If true, it is still expected an ack | ||||
| will be received for this packet, but it is not retransmittable. | ||||
| o in_flight: A boolean that indicates whether the packet counts | ||||
| towards bytes in flight. | ||||
| o is_crypto_packet: A boolean that indicates whether the packet | ||||
| contains cryptographic handshake messages critical to the | ||||
| completion of the QUIC handshake. In this version of QUIC, this | ||||
| includes any packet with the long header that includes a CRYPTO | ||||
| frame. | ||||
| o sent_bytes: The number of bytes sent in the packet, not including | After a packet is sent, information about the packet is stored. The | |||
| UDP or IP overhead, but including QUIC framing overhead. | parameters to OnPacketSent are described in detail above in | |||
| Section 6.3.1. | ||||
| Pseudocode for OnPacketSent follows: | Pseudocode for OnPacketSent follows: | |||
| OnPacketSent(packet_number, ack_only, in_flight, | OnPacketSent(packet_number, ack_eliciting, in_flight, | |||
| is_crypto_packet, sent_bytes): | is_crypto_packet, sent_bytes): | |||
| largest_sent_packet = packet_number | largest_sent_packet = packet_number | |||
| sent_packets[packet_number].packet_number = packet_number | sent_packets[packet_number].packet_number = packet_number | |||
| sent_packets[packet_number].time = now | sent_packets[packet_number].time_sent = now | |||
| sent_packets[packet_number].ack_only = ack_only | sent_packets[packet_number].ack_eliciting = ack_eliciting | |||
| sent_packets[packet_number].in_flight = in_flight | sent_packets[packet_number].in_flight = in_flight | |||
| if !ack_only: | if (ack_eliciting): | |||
| if is_crypto_packet: | if (is_crypto_packet): | |||
| time_of_last_sent_crypto_packet = now | time_of_last_sent_crypto_packet = now | |||
| time_of_last_sent_retransmittable_packet = now | time_of_last_sent_ack_eliciting_packet = now | |||
| OnPacketSentCC(sent_bytes) | OnPacketSentCC(sent_bytes) | |||
| sent_packets[packet_number].bytes = sent_bytes | sent_packets[packet_number].size = sent_bytes | |||
| SetLossDetectionTimer() | SetLossDetectionTimer() | |||
| 4.5.5. On Receiving an Acknowledgment | 6.4.5. On Receiving an Acknowledgment | |||
| When an ACK frame is received, it may newly acknowledge any number of | When an ACK frame is received, it may newly acknowledge any number of | |||
| packets. | packets. | |||
| Pseudocode for OnAckReceived and UpdateRtt follows: | Pseudocode for OnAckReceived and UpdateRtt follows: | |||
| OnAckReceived(ack): | OnAckReceived(ack): | |||
| largest_acked_packet = ack.largest_acked | largest_acked_packet = max(largest_acked_packet, | |||
| // If the largest acknowledged is newly acked, | ack.largest_acked) | |||
| // update the RTT. | ||||
| if (sent_packets[ack.largest_acked]): | // If the largest acknowledged is newly acked and | |||
| latest_rtt = now - sent_packets[ack.largest_acked].time | // ack-eliciting, update the RTT. | |||
| if (sent_packets[ack.largest_acked] && | ||||
| sent_packets[ack.largest_acked].ack_eliciting): | ||||
| latest_rtt = | ||||
| now - sent_packets[ack.largest_acked].time_sent | ||||
| UpdateRtt(latest_rtt, ack.ack_delay) | UpdateRtt(latest_rtt, ack.ack_delay) | |||
| // Process ECN information if present. | ||||
| if (ACK frame contains ECN information): | ||||
| ProcessECN(ack) | ||||
| // Find all newly acked packets in this ACK frame | // Find all newly acked packets in this ACK frame | |||
| newly_acked_packets = DetermineNewlyAckedPackets(ack) | newly_acked_packets = DetermineNewlyAckedPackets(ack) | |||
| if (newly_acked_packets.empty()): | ||||
| return | ||||
| for acked_packet in newly_acked_packets: | for acked_packet in newly_acked_packets: | |||
| OnPacketAcked(acked_packet.packet_number) | OnPacketAcked(acked_packet.packet_number) | |||
| if !newly_acked_packets.empty(): | crypto_count = 0 | |||
| // Find the smallest newly acknowledged packet | pto_count = 0 | |||
| smallest_newly_acked = | ||||
| FindSmallestNewlyAcked(newly_acked_packets) | ||||
| // If any packets sent prior to RTO were acked, then the | ||||
| // RTO was spurious. Otherwise, inform congestion control. | ||||
| if (rto_count > 0 && | ||||
| smallest_newly_acked > largest_sent_before_rto): | ||||
| OnRetransmissionTimeoutVerified(smallest_newly_acked) | ||||
| crypto_count = 0 | ||||
| tlp_count = 0 | ||||
| rto_count = 0 | ||||
| DetectLostPackets(ack.largest_acked_packet) | DetectLostPackets() | |||
| SetLossDetectionTimer() | SetLossDetectionTimer() | |||
| // Process ECN information if present. | ||||
| if (ACK frame contains ECN information): | ||||
| ProcessECN(ack) | ||||
| UpdateRtt(latest_rtt, ack_delay): | UpdateRtt(latest_rtt, ack_delay): | |||
| // min_rtt ignores ack delay. | // min_rtt ignores ack delay. | |||
| min_rtt = min(min_rtt, latest_rtt) | min_rtt = min(min_rtt, latest_rtt) | |||
| // Limit ack_delay by max_ack_delay | ||||
| ack_delay = min(ack_delay, max_ack_delay) | ||||
| // Adjust for ack delay if it's plausible. | // Adjust for ack delay if it's plausible. | |||
| if (latest_rtt - min_rtt > ack_delay): | if (latest_rtt - min_rtt > ack_delay): | |||
| latest_rtt -= ack_delay | latest_rtt -= ack_delay | |||
| // Based on [RFC6298]. | // Based on [RFC6298]. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| smoothed_rtt = latest_rtt | smoothed_rtt = latest_rtt | |||
| rttvar = latest_rtt / 2 | rttvar = latest_rtt / 2 | |||
| else: | else: | |||
| rttvar_sample = abs(smoothed_rtt - latest_rtt) | rttvar_sample = abs(smoothed_rtt - latest_rtt) | |||
| rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | |||
| smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | |||
| 4.5.6. On Packet Acknowledgment | 6.4.6. On Packet Acknowledgment | |||
| When a packet is acked for the first time, the following | When a packet is acknowledged for the first time, the following | |||
| OnPacketAcked function is called. Note that a single ACK frame may | OnPacketAcked function is called. Note that a single ACK frame may | |||
| newly acknowledge several packets. OnPacketAcked must be called once | newly acknowledge several packets. OnPacketAcked must be called once | |||
| for each of these newly acked packets. | for each of these newly acknowledged packets. | |||
| OnPacketAcked takes one parameter, acked_packet, which is the struct | OnPacketAcked takes one parameter, acked_packet, which is the struct | |||
| of the newly acked packet. | detailed in Section 6.3.1. | |||
| If this is the first acknowledgement following RTO, check if the | ||||
| smallest newly acknowledged packet is one sent by the RTO, and if so, | ||||
| inform congestion control of a verified RTO, similar to F-RTO | ||||
| [RFC5682]. | ||||
| Pseudocode for OnPacketAcked follows: | Pseudocode for OnPacketAcked follows: | |||
| OnPacketAcked(acked_packet): | OnPacketAcked(acked_packet): | |||
| if (!acked_packet.is_ack_only): | if (acked_packet.ack_eliciting): | |||
| OnPacketAckedCC(acked_packet) | OnPacketAckedCC(acked_packet) | |||
| sent_packets.remove(acked_packet.packet_number) | sent_packets.remove(acked_packet.packet_number) | |||
| 4.5.7. Setting the Loss Detection Timer | 6.4.7. Setting the Loss Detection Timer | |||
| QUIC loss detection uses a single timer for all timer-based loss | QUIC loss detection uses a single timer for all timeout loss | |||
| detection. The duration of the timer is based on the timer's mode, | detection. The duration of the timer is based on the timer's mode, | |||
| which is set in the packet and timer events further below. The | which is set in the packet and timer events further below. The | |||
| function SetLossDetectionTimer defined below shows how the single | function SetLossDetectionTimer defined below shows how the single | |||
| timer is set. | timer is set. | |||
| This algorithm may result in the timer being set in the past, | ||||
| particularly if timers wake up late. Timers set in the past SHOULD | ||||
| fire immediately. | ||||
| Pseudocode for SetLossDetectionTimer follows: | Pseudocode for SetLossDetectionTimer follows: | |||
| SetLossDetectionTimer(): | SetLossDetectionTimer(): | |||
| // Don't arm timer if there are no retransmittable packets | // Don't arm timer if there are no ack-eliciting packets | |||
| // in flight. | // in flight. | |||
| if (bytes_in_flight == 0): | if (no ack-eliciting packets in flight): | |||
| loss_detection_timer.cancel() | loss_detection_timer.cancel() | |||
| return | return | |||
| if (crypto packets are outstanding): | if (crypto packets are in flight): | |||
| // Crypto retransmission timer. | // Crypto retransmission timer. | |||
| if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
| timeout = 2 * kInitialRtt | timeout = 2 * kInitialRtt | |||
| else: | else: | |||
| timeout = 2 * smoothed_rtt | timeout = 2 * smoothed_rtt | |||
| timeout = max(timeout, kMinTLPTimeout) | timeout = max(timeout, kGranularity) | |||
| timeout = timeout * (2 ^ crypto_count) | timeout = timeout * (2 ^ crypto_count) | |||
| loss_detection_timer.set( | loss_detection_timer.set( | |||
| time_of_last_sent_crypto_packet + timeout) | time_of_last_sent_crypto_packet + timeout) | |||
| return | return | |||
| if (loss_time != 0): | if (loss_time != 0): | |||
| // Early retransmit timer or time loss detection. | // Time threshold loss detection. | |||
| timeout = loss_time - | loss_detection_timer.set(loss_time) | |||
| time_of_last_sent_retransmittable_packet | return | |||
| else: | ||||
| // RTO or TLP timer | // Calculate PTO duration | |||
| // Calculate RTO duration | timeout = | |||
| timeout = | smoothed_rtt + 4 * rttvar + max_ack_delay | |||
| smoothed_rtt + 4 * rttvar + max_ack_delay | timeout = max(timeout, kGranularity) | |||
| timeout = max(timeout, kMinRTOTimeout) | timeout = timeout * (2 ^ pto_count) | |||
| timeout = timeout * (2 ^ rto_count) | ||||
| if (tlp_count < kMaxTLPs): | ||||
| // Tail Loss Probe | ||||
| tlp_timeout = max(1.5 * smoothed_rtt | ||||
| + max_ack_delay, kMinTLPTimeout) | ||||
| timeout = min(tlp_timeout, timeout) | ||||
| loss_detection_timer.set( | loss_detection_timer.set( | |||
| time_of_last_sent_retransmittable_packet + timeout) | time_of_last_sent_ack_eliciting_packet + timeout) | |||
| 4.5.8. On Timeout | 6.4.8. On Timeout | |||
| When the loss detection timer expires, the timer's mode determines | When the loss detection timer expires, the timer's mode determines | |||
| the action to be performed. | the action to be performed. | |||
| Pseudocode for OnLossDetectionTimeout follows: | Pseudocode for OnLossDetectionTimeout follows: | |||
| OnLossDetectionTimeout(): | OnLossDetectionTimeout(): | |||
| if (crypto packets are outstanding): | if (crypto packets are in flight): | |||
| // Crypto retransmission timeout. | // Crypto retransmission timeout. | |||
| RetransmitUnackedCryptoData() | RetransmitUnackedCryptoData() | |||
| crypto_count++ | crypto_count++ | |||
| else if (loss_time != 0): | else if (loss_time != 0): | |||
| // Early retransmit or Time Loss Detection | // Time threshold loss detection | |||
| DetectLostPackets(largest_acked_packet) | DetectLostPackets() | |||
| else if (tlp_count < kMaxTLPs): | ||||
| // Tail Loss Probe. | ||||
| SendOnePacket() | ||||
| tlp_count++ | ||||
| else: | else: | |||
| // RTO. | // PTO | |||
| if (rto_count == 0): | ||||
| largest_sent_before_rto = largest_sent_packet | ||||
| SendTwoPackets() | SendTwoPackets() | |||
| rto_count++ | pto_count++ | |||
| SetLossDetectionTimer() | SetLossDetectionTimer() | |||
| 4.5.9. Detecting Lost Packets | 6.4.9. Detecting Lost Packets | |||
| Packets in QUIC are only considered lost once a larger packet number | ||||
| in the same packet number space is acknowledged. DetectLostPackets | ||||
| is called every time an ack is received and operates on the | ||||
| sent_packets for that packet number space. If the loss detection | ||||
| timer expires and the loss_time is set, the previous largest acked | ||||
| packet is supplied. | ||||
| 4.5.9.1. Pseudocode | ||||
| DetectLostPackets takes one parameter, acked, which is the largest | DetectLostPackets is called every time an ACK is received and | |||
| acked packet. | operates on the sent_packets for that packet number space. If the | |||
| loss detection timer expires and the loss_time is set, the previous | ||||
| largest acknowledged packet (largest_acked_packet) is used. | ||||
| Pseudocode for DetectLostPackets follows: | Pseudocode for DetectLostPackets follows: | |||
| DetectLostPackets(largest_acked): | DetectLostPackets(): | |||
| loss_time = 0 | loss_time = 0 | |||
| lost_packets = {} | lost_packets = {} | |||
| delay_until_lost = infinite | loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) | |||
| if (kUsingTimeLossDetection): | ||||
| delay_until_lost = | // Packets sent before this time are deemed lost. | |||
| (1 + time_reordering_fraction) * | lost_send_time = now() - loss_delay | |||
| max(latest_rtt, smoothed_rtt) | ||||
| else if (largest_acked.packet_number == largest_sent_packet): | // Packets with packet numbers before this are deemed lost. | |||
| // Early retransmit timer. | lost_pn = largest_acked_packet - kPacketThreshold | |||
| delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) | ||||
| foreach (unacked < largest_acked.packet_number): | foreach unacked in sent_packets: | |||
| time_since_sent = now() - unacked.time_sent | if (unacked.packet_number > largest_acked_packet): | |||
| delta = largest_acked.packet_number - unacked.packet_number | continue | |||
| if (time_since_sent > delay_until_lost || | ||||
| delta > reordering_threshold): | // Mark packet as lost, or set time when it should be marked. | |||
| if (unacked.time_sent <= lost_send_time || | ||||
| unacked.packet_number <= lost_pn): | ||||
| sent_packets.remove(unacked.packet_number) | sent_packets.remove(unacked.packet_number) | |||
| if (!unacked.is_ack_only): | if (unacked.in_flight): | |||
| lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
| else if (loss_time == 0 && delay_until_lost != infinite): | else if (loss_time == 0): | |||
| loss_time = now() + delay_until_lost - time_since_sent | loss_time = unacked.time_sent + loss_delay | |||
| else: | ||||
| loss_time = min(loss_time, unacked.time_sent + loss_delay) | ||||
| // Inform the congestion controller of lost packets and | // Inform the congestion controller of lost packets and | |||
| // lets it decide whether to retransmit immediately. | // let it decide whether to retransmit immediately. | |||
| if (!lost_packets.empty()): | if (!lost_packets.empty()): | |||
| OnPacketsLost(lost_packets) | OnPacketsLost(lost_packets) | |||
| 4.6. Discussion | 6.5. Discussion | |||
| The majority of constants were derived from best common practices | The majority of constants were derived from best common practices | |||
| among widely deployed TCP implementations on the internet. | among widely deployed TCP implementations on the internet. | |||
| Exceptions follow. | Exceptions follow. | |||
| A shorter delayed ack time of 25ms was chosen because longer delayed | A shorter delayed ack time of 25ms was chosen because longer delayed | |||
| acks can delay loss recovery and for the small number of connections | acks can delay loss recovery and for the small number of connections | |||
| where less than one packet per 25ms is delivered, acking every packet is | where less than one packet per 25ms is delivered, acking every packet is | |||
| beneficial to congestion control and loss recovery. | beneficial to congestion control and loss recovery. | |||
| The default initial RTT of 100ms was chosen because it is slightly | The default initial RTT of 100ms was chosen because it is slightly | |||
| higher than both the median and mean min_rtt typically observed on | higher than both the median and mean min_rtt typically observed on | |||
| the public internet. | the public internet. | |||
| 5. Congestion Control | 7. Congestion Control | |||
| QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno | QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno | |||
| is a congestion window based congestion control. QUIC specifies the | is a congestion window based congestion control. QUIC specifies the | |||
| congestion window in bytes rather than packets due to finer control | congestion window in bytes rather than packets due to finer control | |||
| and the ease of appropriate byte counting [RFC3465]. | and the ease of appropriate byte counting [RFC3465]. | |||
| QUIC hosts MUST NOT send packets if they would increase | QUIC hosts MUST NOT send packets if they would increase | |||
| bytes_in_flight (defined in Section 5.8.2) beyond the available | bytes_in_flight (defined in Section 7.9.2) beyond the available | |||
| congestion window, unless the packet is a probe packet sent after the | congestion window, unless the packet is a probe packet sent after a | |||
| TLP or RTO timer expires, as described in Section 4.3.2 and | PTO timer expires, as described in Section 6.2.2. | |||
| Section 4.3.3. | ||||
| Implementations MAY use other congestion control algorithms, and | Implementations MAY use other congestion control algorithms, such as | |||
| endpoints MAY use different algorithms from one another. The signals | Cubic [RFC8312], and endpoints MAY use different algorithms from one | |||
| QUIC provides for congestion control are generic and are designed to | another. The signals QUIC provides for congestion control are | |||
| support different algorithms. | generic and are designed to support different algorithms. | |||
| 5.1. Explicit Congestion Notification | 7.1. Explicit Congestion Notification | |||
| If a path has been verified to support ECN, QUIC treats a Congestion | If a path has been verified to support ECN, QUIC treats a Congestion | |||
| Experienced codepoint in the IP header as a signal of congestion. | Experienced codepoint in the IP header as a signal of congestion. | |||
| This document specifies an endpoint's response when its peer receives | This document specifies an endpoint's response when its peer receives | |||
| packets with the Congestion Experienced codepoint. As discussed in | packets with the Congestion Experienced codepoint. As discussed in | |||
| [RFC8311], endpoints are permitted to experiment with other response | [RFC8311], endpoints are permitted to experiment with other response | |||
| functions. | functions. | |||
| 5.2. Slow Start | 7.2. Slow Start | |||
| QUIC begins every connection in slow start and exits slow start upon | QUIC begins every connection in slow start and exits slow start upon | |||
| loss or upon increase in the ECN-CE counter. QUIC re-enters slow | loss or upon increase in the ECN-CE counter. QUIC re-enters slow | |||
| start anytime the congestion window is less than ssthresh, which | start anytime the congestion window is less than ssthresh, which | |||
| typically only occurs after an RTO. While in slow start, QUIC | typically only occurs after a PTO. While in slow start, QUIC | |||
| increases the congestion window by the number of bytes acknowledged | increases the congestion window by the number of bytes acknowledged | |||
| when each ack is processed. | when each acknowledgment is processed. | |||
| 5.3. Congestion Avoidance | 7.3. Congestion Avoidance | |||
| Slow start exits to congestion avoidance. Congestion avoidance in | Slow start exits to congestion avoidance. Congestion avoidance in | |||
| NewReno uses an additive increase multiplicative decrease (AIMD) | NewReno uses an additive increase multiplicative decrease (AIMD) | |||
| approach that increases the congestion window by one maximum packet | approach that increases the congestion window by one maximum packet | |||
| size per congestion window acknowledged. When a loss is detected, | size per congestion window acknowledged. When a loss is detected, | |||
| NewReno halves the congestion window and sets the slow start | NewReno halves the congestion window and sets the slow start | |||
| threshold to the new congestion window. | threshold to the new congestion window. | |||
| 5.4. Recovery Period | 7.4. Recovery Period | |||
| Recovery is a period of time beginning with detection of a lost | Recovery is a period of time beginning with detection of a lost | |||
| packet or an increase in the ECN-CE counter. Because QUIC | packet or an increase in the ECN-CE counter. Because QUIC does not | |||
| retransmits stream data and control frames, not packets, it defines | retransmit packets, it defines the end of recovery as a packet sent | |||
| the end of recovery as a packet sent after the start of recovery | after the start of recovery being acknowledged. This is slightly | |||
| being acknowledged. This is slightly different from TCP's definition | different from TCP's definition of recovery, which ends when the lost | |||
| of recovery, which ends when the lost packet that started recovery is | packet that started recovery is acknowledged. | |||
| acknowledged. | ||||
| The recovery period limits congestion window reduction to once per | The recovery period limits congestion window reduction to once per | |||
| round trip. During recovery, the congestion window remains unchanged | round trip. During recovery, the congestion window remains unchanged | |||
| irrespective of new losses or increases in the ECN-CE counter. | irrespective of new losses or increases in the ECN-CE counter. | |||
| 5.5. Tail Loss Probe | 7.5. Probe Timeout | |||
| A TLP packet MUST NOT be blocked by the sender's congestion | ||||
| controller. The sender MUST however count these bytes as additional | ||||
| bytes-in-flight, since a TLP adds network load without establishing | ||||
| packet loss. | ||||
| Acknowledgement or loss of tail loss probes are treated like any | ||||
| other packet. | ||||
| 5.6. Retransmission Timeout | Probe packets MUST NOT be blocked by the congestion controller. A | |||
| sender MUST however count these packets as being additionally in | ||||
| flight, since these packets add network load without establishing | ||||
| packet loss. Note that sending probe packets might cause the | ||||
| sender's bytes in flight to exceed the congestion window until an | ||||
| acknowledgement is received that establishes loss or delivery of | ||||
| packets. | ||||
| When retransmissions are sent due to a retransmission timeout timer, | If a threshold number of consecutive PTOs have occurred (pto_count is | |||
| no change is made to the congestion window until the next | more than kPersistentCongestionThreshold, see Section 7.9.1), the | |||
| acknowledgement arrives. The retransmission timeout is considered | network is considered to be experiencing persistent congestion, and | |||
| spurious when this acknowledgement acknowledges packets sent prior to | the sender's congestion window MUST be reduced to the minimum | |||
| the first retransmission timeout. The retransmission timeout is | congestion window. | |||
| considered valid when this acknowledgement acknowledges no packets | ||||
| sent prior to the first retransmission timeout. In this case, the | ||||
| congestion window MUST be reduced to the minimum congestion window | ||||
| and slow start is re-entered. | ||||
| 5.7. Pacing | 7.6. Pacing | |||
| This document does not specify a pacer, but it is RECOMMENDED that a | This document does not specify a pacer, but it is RECOMMENDED that a | |||
| sender pace sending of all in-flight packets based on input from the | sender pace sending of all in-flight packets based on input from the | |||
| congestion controller. For example, a pacer might distribute the | congestion controller. For example, a pacer might distribute the | |||
| congestion window over the SRTT when used with a window-based | congestion window over the SRTT when used with a window-based | |||
| controller, and a pacer might use the rate estimate of a rate-based | controller, and a pacer might use the rate estimate of a rate-based | |||
| controller. | controller. | |||
| An implementation should take care to architect its congestion | An implementation should take care to architect its congestion | |||
| controller to work well with a pacer. For instance, a pacer might | controller to work well with a pacer. For instance, a pacer might | |||
| skipping to change at page 25, line 5 ¶ | skipping to change at page 24, line 9 ¶ | |||
| congestion window, or a pacer might pace out packets handed to it by | congestion window, or a pacer might pace out packets handed to it by | |||
| the congestion controller. Timely delivery of ACK frames is | the congestion controller. Timely delivery of ACK frames is | |||
| important for efficient loss recovery. Packets containing only ACK | important for efficient loss recovery. Packets containing only ACK | |||
| frames should therefore not be paced, to avoid delaying their | frames should therefore not be paced, to avoid delaying their | |||
| delivery to the peer. | delivery to the peer. | |||
| As an example of a well-known and publicly available implementation | As an example of a well-known and publicly available implementation | |||
| of a flow pacer, implementers are referred to the Fair Queue packet | of a flow pacer, implementers are referred to the Fair Queue packet | |||
| scheduler (fq qdisc) in Linux (3.11 onwards). | scheduler (fq qdisc) in Linux (3.11 onwards). | |||
| 5.8. Pseudocode | 7.7. Sending data after an idle period | |||
| 5.8.1. Constants of interest | A sender becomes idle if it ceases to send data and has no bytes in | |||
| flight. A sender's congestion window MUST NOT increase while it is | ||||
| idle. | ||||
| When sending data after becoming idle, a sender MUST reset its | ||||
| congestion window to the initial congestion window (see Section 4.1 | ||||
| of [RFC5681]), unless it paces the sending of packets. A sender MAY | ||||
| retain its congestion window if it paces the sending of any packets | ||||
| in excess of the initial congestion window. | ||||
| A sender MAY implement alternate mechanisms to update its congestion | ||||
| window after idle periods, such as those proposed for TCP in | ||||
| [RFC7661]. | ||||
| 7.8. Discarding Packet Number Space State | ||||
| When keys for a packet number space are discarded, any packets sent | ||||
| with those keys are removed from the count of bytes in flight. No | ||||
| loss events will occur for any in-flight packets from that space, as a | ||||
| result of discarding loss recovery state (see Section 6.2.1.2). Note | ||||
| that it is expected that keys are discarded after those packets would | ||||
| be declared lost, but Initial secrets are destroyed earlier. | ||||
| 7.9. Pseudocode | ||||
| 7.9.1. Constants of interest | ||||
| Constants used in congestion control are based on a combination of | Constants used in congestion control are based on a combination of | |||
| RFCs, papers, and common practice. Some may need to be changed or | RFCs, papers, and common practice. Some may need to be changed or | |||
| negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
| kMaxDatagramSize: The sender's maximum payload size. Does not | kMaxDatagramSize: The sender's maximum payload size. Does not | |||
| include UDP or IP overhead. The max packet size is used for | include UDP or IP overhead. The max packet size is used for | |||
| calculating initial and minimum congestion windows. The | calculating initial and minimum congestion windows. The | |||
| RECOMMENDED value is 1200 bytes. | RECOMMENDED value is 1200 bytes. | |||
| kInitialWindow: Default limit on the initial amount of outstanding | kInitialWindow: Default limit on the initial amount of data in | |||
| data in bytes. Taken from [RFC6928]. The RECOMMENDED value is | flight, in bytes. Taken from [RFC6928]. The RECOMMENDED value is | |||
| the minimum of 10 * kMaxDatagramSize and max(2 * kMaxDatagramSize, | the minimum of 10 * kMaxDatagramSize and max(2 * kMaxDatagramSize, | |||
| 14600). | 14600). | |||
| kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED | kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED | |||
| value is 2 * kMaxDatagramSize. | value is 2 * kMaxDatagramSize. | |||
| kLossReductionFactor: Reduction in congestion window when a new loss | kLossReductionFactor: Reduction in congestion window when a new loss | |||
| event is detected. The RECOMMENDED value is 0.5. | event is detected. The RECOMMENDED value is 0.5. | |||
| 5.8.2. Variables of interest | kPersistentCongestionThreshold: Number of consecutive PTOs after | |||
| which the network is considered to be experiencing persistent | ||||
| congestion. The rationale for this threshold is to enable a | ||||
| sender to use initial PTOs for aggressive probing, similar to Tail | ||||
| Loss Probe (TLP) in TCP [TLP] [RACK]. Once the number of | ||||
| consecutive PTOs reaches this threshold - that is, persistent | ||||
| congestion is established - the sender responds by collapsing its | ||||
| congestion window to kMinimumWindow, similar to a Retransmission | ||||
| Timeout (RTO) in TCP [RFC5681]. The RECOMMENDED value for | ||||
| kPersistentCongestionThreshold is 2, which is equivalent to having | ||||
| two TLPs before an RTO in TCP. | ||||
| 7.9.2. Variables of interest | ||||
| Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
| described in this section. | described in this section. | |||
| ecn_ce_counter: The highest value reported for the ECN-CE counter by | ecn_ce_counter: The highest value reported for the ECN-CE counter by | |||
| the peer in an ACK frame. This variable is used to detect | the peer in an ACK frame. This variable is used to detect | |||
| increases in the reported ECN-CE counter. | increases in the reported ECN-CE counter. | |||
| bytes_in_flight: The sum of the size in bytes of all sent packets | bytes_in_flight: The sum of the size in bytes of all sent packets | |||
| that contain at least one retransmittable or PADDING frame, and | that contain at least one ack-eliciting or PADDING frame, and have | |||
| have not been acked or declared lost. The size does not include | not been acked or declared lost. The size does not include IP or | |||
| IP or UDP overhead, but does include the QUIC header and AEAD | UDP overhead, but does include the QUIC header and AEAD overhead. | |||
| overhead. Packets only containing ACK frames do not count towards | Packets only containing ACK frames do not count towards | |||
| bytes_in_flight to ensure congestion control does not impede | bytes_in_flight to ensure congestion control does not impede | |||
| congestion feedback. | congestion feedback. | |||
| congestion_window: Maximum number of bytes-in-flight that may be | congestion_window: Maximum number of bytes-in-flight that may be | |||
| sent. | sent. | |||
| end_of_recovery: The largest packet number sent when QUIC detects a | recovery_start_time: The time when QUIC first detects a loss, | |||
| loss. When a larger packet is acknowledged, QUIC exits recovery. | causing it to enter recovery. When a packet sent after this time | |||
| is acknowledged, QUIC exits recovery. | ||||
| ssthresh: Slow start threshold in bytes. When the congestion window | ssthresh: Slow start threshold in bytes. When the congestion window | |||
| is below ssthresh, the mode is slow start and the window grows by | is below ssthresh, the mode is slow start and the window grows by | |||
| the number of bytes acknowledged. | the number of bytes acknowledged. | |||
| 5.8.3. Initialization | 7.9.3. Initialization | |||
| At the beginning of the connection, initialize the congestion control | At the beginning of the connection, initialize the congestion control | |||
| variables as follows: | variables as follows: | |||
| congestion_window = kInitialWindow | congestion_window = kInitialWindow | |||
| bytes_in_flight = 0 | bytes_in_flight = 0 | |||
| end_of_recovery = 0 | recovery_start_time = 0 | |||
| ssthresh = infinite | ssthresh = infinite | |||
| ecn_ce_counter = 0 | ecn_ce_counter = 0 | |||
| 5.8.4. On Packet Sent | 7.9.4. On Packet Sent | |||
| Whenever a packet is sent, and it contains non-ACK frames, the packet | Whenever a packet is sent, and it contains non-ACK frames, the packet | |||
| increases bytes_in_flight. | increases bytes_in_flight. | |||
| OnPacketSentCC(bytes_sent): | OnPacketSentCC(bytes_sent): | |||
| bytes_in_flight += bytes_sent | bytes_in_flight += bytes_sent | |||
| 5.8.5. On Packet Acknowledgement | 7.9.5. On Packet Acknowledgement | |||
| Invoked from loss detection's OnPacketAcked and is supplied with | Invoked from loss detection's OnPacketAcked and is supplied with the | |||
| acked_packet from sent_packets. | acked_packet from sent_packets. | |||
| InRecovery(packet_number): | InRecovery(sent_time): | |||
| return packet_number <= end_of_recovery | return sent_time <= recovery_start_time | |||
| OnPacketAckedCC(acked_packet): | OnPacketAckedCC(acked_packet): | |||
| // Remove from bytes_in_flight. | // Remove from bytes_in_flight. | |||
| bytes_in_flight -= acked_packet.bytes | bytes_in_flight -= acked_packet.size | |||
| if (InRecovery(acked_packet.packet_number)): | if (InRecovery(acked_packet.time_sent)): | |||
| // Do not increase congestion window in recovery period. | // Do not increase congestion window in recovery period. | |||
| return | return | |||
| if (congestion_window < ssthresh): | if (congestion_window < ssthresh): | |||
| // Slow start. | // Slow start. | |||
| congestion_window += acked_packet.bytes | congestion_window += acked_packet.size | |||
| else: | else: | |||
| // Congestion avoidance. | // Congestion avoidance. | |||
| congestion_window += kMaxDatagramSize * acked_packet.bytes | congestion_window += kMaxDatagramSize * acked_packet.size | |||
| / congestion_window | / congestion_window | |||
| 5.8.6. On New Congestion Event | 7.9.6. On New Congestion Event | |||
| Invoked from ProcessECN and OnPacketsLost when a new congestion event | Invoked from ProcessECN and OnPacketsLost when a new congestion event | |||
| is detected. Starts a new recovery period and reduces the congestion | is detected. May start a new recovery period and reduces the | |||
| window. | congestion window. | |||
| CongestionEvent(packet_number): | CongestionEvent(sent_time): | |||
| // Start a new congestion event if packet_number | // Start a new congestion event if the sent time is larger | |||
| // is larger than the end of the previous recovery epoch. | // than the start time of the previous recovery epoch. | |||
| if (!InRecovery(packet_number)): | if (!InRecovery(sent_time)): | |||
| end_of_recovery = largest_sent_packet | recovery_start_time = Now() | |||
| congestion_window *= kLossReductionFactor | congestion_window *= kLossReductionFactor | |||
| congestion_window = max(congestion_window, kMinimumWindow) | congestion_window = max(congestion_window, kMinimumWindow) | |||
| ssthresh = congestion_window | ssthresh = congestion_window | |||
| // Collapse congestion window if persistent congestion | ||||
| if (pto_count > kPersistentCongestionThreshold): | ||||
| congestion_window = kMinimumWindow | ||||
| 5.8.7. Process ECN Information | 7.9.7. Process ECN Information | |||
| Invoked when an ACK frame with an ECN section is received from the | Invoked when an ACK frame with an ECN section is received from the | |||
| peer. | peer. | |||
| ProcessECN(ack): | ProcessECN(ack): | |||
| // If the ECN-CE counter reported by the peer has increased, | // If the ECN-CE counter reported by the peer has increased, | |||
| // this could be a new congestion event. | // this could be a new congestion event. | |||
| if (ack.ce_counter > ecn_ce_counter): | if (ack.ce_counter > ecn_ce_counter): | |||
| ecn_ce_counter = ack.ce_counter | ecn_ce_counter = ack.ce_counter | |||
| // Start a new congestion event if the last acknowledged | // Start a new congestion event if the last acknowledged | |||
| // packet is past the end of the previous recovery epoch. | // packet was sent after the start of the previous | |||
| CongestionEvent(ack.largest_acked_packet) | // recovery epoch. | |||
| CongestionEvent(sent_packets[ack.largest_acked].time_sent) | ||||
| 5.8.8. On Packets Lost | 7.9.8. On Packets Lost | |||
| Invoked by loss detection from DetectLostPackets when new packets are | Invoked by loss detection from DetectLostPackets when new packets are | |||
| detected lost. | detected lost. | |||
| OnPacketsLost(lost_packets): | OnPacketsLost(lost_packets): | |||
| // Remove lost packets from bytes_in_flight. | // Remove lost packets from bytes_in_flight. | |||
| for (lost_packet : lost_packets): | for (lost_packet : lost_packets): | |||
| bytes_in_flight -= lost_packet.bytes | bytes_in_flight -= lost_packet.size | |||
| largest_lost_packet = lost_packets.last() | largest_lost_packet = lost_packets.last() | |||
| // Start a new congestion epoch if the last lost packet | // Start a new congestion epoch if the last lost packet | |||
| // is past the end of the previous recovery epoch. | // is past the end of the previous recovery epoch. | |||
| CongestionEvent(largest_lost_packet.packet_number) | CongestionEvent(largest_lost_packet.time_sent) | |||
| 5.8.9. On Retransmission Timeout Verified | ||||
| QUIC decreases the congestion window to the minimum value once the | ||||
| retransmission timeout has been verified and removes any packets sent | ||||
| before the newly acknowledged RTO packet. | ||||
| OnRetransmissionTimeoutVerified(packet_number) | ||||
| congestion_window = kMinimumWindow | ||||
| // Declare all packets prior to packet_number lost. | ||||
| for (sent_packet: sent_packets): | ||||
| if (sent_packet.packet_number < packet_number): | ||||
| bytes_in_flight -= sent_packet.bytes | ||||
| sent_packets.remove(sent_packet.packet_number) | ||||
| 6. Security Considerations | ||||
| 6.1. Congestion Signals | 8. Security Considerations | |||
| 8.1. Congestion Signals | ||||
| Congestion control fundamentally involves the consumption of signals | Congestion control fundamentally involves the consumption of signals | |||
| - both loss and ECN codepoints - from unauthenticated entities. On- | - both loss and ECN codepoints - from unauthenticated entities. On- | |||
| path attackers can spoof or alter these signals. An attacker can | path attackers can spoof or alter these signals. An attacker can | |||
| cause endpoints to reduce their sending rate by dropping packets, or | cause endpoints to reduce their sending rate by dropping packets, or | |||
| alter send rate by changing ECN codepoints. | alter send rate by changing ECN codepoints. | |||
| 6.2. Traffic Analysis | 8.2. Traffic Analysis | |||
| Packets that carry only ACK frames can be heuristically identified by | Packets that carry only ACK frames can be heuristically identified by | |||
| observing packet size. Acknowledgement patterns may expose | observing packet size. Acknowledgement patterns may expose | |||
| information about link characteristics or application behavior. | information about link characteristics or application behavior. | |||
| Endpoints can use PADDING frames or bundle acknowledgments with other | Endpoints can use PADDING frames or bundle acknowledgments with other | |||
| frames to reduce leaked information. | frames to reduce leaked information. | |||
| 6.3. Misreporting ECN Markings | 8.3. Misreporting ECN Markings | |||
| A receiver can misreport ECN markings to alter the congestion | A receiver can misreport ECN markings to alter the congestion | |||
| response of a sender. Suppressing reports of ECN-CE markings could | response of a sender. Suppressing reports of ECN-CE markings could | |||
| cause a sender to increase their send rate. This increase could | cause a sender to increase their send rate. This increase could | |||
| result in congestion and loss. | result in congestion and loss. | |||
| A sender MAY attempt to detect suppression of reports by marking | A sender MAY attempt to detect suppression of reports by marking | |||
| occasional packets that they send with ECN-CE. If a packet marked | occasional packets that they send with ECN-CE. If a packet marked | |||
| with ECN-CE is not reported as having been marked when the packet is | with ECN-CE is not reported as having been marked when the packet is | |||
| acknowledged, the sender SHOULD then disable ECN for that path. | acknowledged, the sender SHOULD then disable ECN for that path. | |||
| skipping to change at page 29, line 11 ¶ | skipping to change at page 28, line 43 ¶ | |||
| their sending rate, which is similar in effect to advertising reduced | their sending rate, which is similar in effect to advertising reduced | |||
| connection flow control limits and so no advantage is gained by doing | connection flow control limits and so no advantage is gained by doing | |||
| so. | so. | |||
| Endpoints choose the congestion controller that they use. Though | Endpoints choose the congestion controller that they use. Though | |||
| congestion controllers generally treat reports of ECN-CE markings as | congestion controllers generally treat reports of ECN-CE markings as | |||
| equivalent to loss [RFC8311], the exact response for each controller | equivalent to loss [RFC8311], the exact response for each controller | |||
| could be different. Failure to correctly respond to information | could be different. Failure to correctly respond to information | |||
| about ECN markings is therefore difficult to detect. | about ECN markings is therefore difficult to detect. | |||
| 7. IANA Considerations | 9. IANA Considerations | |||
| This document has no IANA actions. Yet. | This document has no IANA actions. Yet. | |||
| 8. References | 10. References | |||
| 10.1. Normative References | ||||
| 8.1. Normative References | ||||
| [QUIC-TRANSPORT] | [QUIC-TRANSPORT] | |||
| Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| Multiplexed and Secure Transport", draft-ietf-quic- | Multiplexed and Secure Transport", draft-ietf-quic- | |||
| transport-16 (work in progress), October 2018. | transport-17 (work in progress), December 2018. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
| Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
| DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
| 8.2. Informative References | 10.2. Informative References | |||
| [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: | ||||
| Refining TCP Congestion Control", ACM SIGCOMM , August | ||||
| 1996. | ||||
| [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: | ||||
| a time-based fast loss detection algorithm for TCP", | ||||
| draft-ietf-tcpm-rack-04 (work in progress), July 2018. | ||||
| [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
| Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
| 2003, <https://www.rfc-editor.org/info/rfc3465>. | 2003, <https://www.rfc-editor.org/info/rfc3465>. | |||
| [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | |||
| "Improving the Robustness of TCP to Non-Congestion | "Improving the Robustness of TCP to Non-Congestion | |||
| Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | |||
| <https://www.rfc-editor.org/info/rfc4653>. | <https://www.rfc-editor.org/info/rfc4653>. | |||
| skipping to change at page 30, line 38 ¶ | skipping to change at page 30, line 38 ¶ | |||
| and Y. Nishida, "A Conservative Loss Recovery Algorithm | and Y. Nishida, "A Conservative Loss Recovery Algorithm | |||
| Based on Selective Acknowledgment (SACK) for TCP", | Based on Selective Acknowledgment (SACK) for TCP", | |||
| RFC 6675, DOI 10.17487/RFC6675, August 2012, | RFC 6675, DOI 10.17487/RFC6675, August 2012, | |||
| <https://www.rfc-editor.org/info/rfc6675>. | <https://www.rfc-editor.org/info/rfc6675>. | |||
| [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | |||
| "Increasing TCP's Initial Window", RFC 6928, | "Increasing TCP's Initial Window", RFC 6928, | |||
| DOI 10.17487/RFC6928, April 2013, | DOI 10.17487/RFC6928, April 2013, | |||
| <https://www.rfc-editor.org/info/rfc6928>. | <https://www.rfc-editor.org/info/rfc6928>. | |||
| [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating | ||||
| TCP to Support Rate-Limited Traffic", RFC 7661, | ||||
| DOI 10.17487/RFC7661, October 2015, | ||||
| <https://www.rfc-editor.org/info/rfc7661>. | ||||
| [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | ||||
| R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | ||||
| RFC 8312, DOI 10.17487/RFC8312, February 2018, | ||||
| <https://www.rfc-editor.org/info/rfc8312>. | ||||
| [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | |||
| "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | |||
| Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | |||
| in progress), February 2013. | in progress), February 2013. | |||
| 8.3. URIs | 10.3. URIs | |||
| [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | |||
| [2] https://github.com/quicwg | [2] https://github.com/quicwg | |||
| [3] https://github.com/quicwg/base-drafts/labels/-recovery | [3] https://github.com/quicwg/base-drafts/labels/-recovery | |||
| Appendix A. Change Log | Appendix A. Change Log | |||
| *RFC Editor's Note:* Please remove this section prior to | *RFC Editor's Note:* Please remove this section prior to | |||
| publication of a final version of this document. | publication of a final version of this document. | |||
| A.1. Since draft-ietf-quic-recovery-14 | Issue and pull request numbers are listed with a leading octothorp. | |||
| A.1. Since draft-ietf-quic-recovery-16 | ||||
| o Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP | ||||
| and min crypto timeouts; eliminate timeout validation (#2114, | ||||
| #2166, #2168, #1017) | ||||
| o Redefine congestion avoidance in terms of when the period | ||||
| starts (#1928, #1930) | ||||
| o Document what needs to be tracked for packets that are in flight | ||||
| (#765, #1724, #1939) | ||||
| o Integrate both time and packet thresholds into loss detection | ||||
| (#1969, #1212, #934, #1974) | ||||
| o Reduce congestion window after idle, unless pacing is used (#2007, | ||||
| #2023) | ||||
| o Disable RTT calculation for packets that don't elicit | ||||
| acknowledgment (#2060, #2078) | ||||
| o Limit ack_delay by max_ack_delay (#2060, #2099) | ||||
| o Initial keys are discarded once Handshake keys are available (#1951, | ||||
| #2045) | ||||
| o Reorder ECN and loss detection in pseudocode (#2142) | ||||
| o Only cancel loss detection timer if ack-eliciting packets are in | ||||
| flight (#2093, #2117) | ||||
| A.2. Since draft-ietf-quic-recovery-14 | ||||
| o Used max_ack_delay from transport params (#1796, #1782) | o Used max_ack_delay from transport params (#1796, #1782) | |||
| o Merge ACK and ACK_ECN (#1783) | o Merge ACK and ACK_ECN (#1783) | |||
| A.2. Since draft-ietf-quic-recovery-13 | A.3. Since draft-ietf-quic-recovery-13 | |||
| o Corrected the lack of ssthresh reduction in CongestionEvent | o Corrected the lack of ssthresh reduction in CongestionEvent | |||
| pseudocode (#1598) | pseudocode (#1598) | |||
| o Considerations for ECN spoofing (#1426, #1626) | o Considerations for ECN spoofing (#1426, #1626) | |||
| o Clarifications for PADDING and congestion control (#837, #838, | o Clarifications for PADDING and congestion control (#837, #838, | |||
| #1517, #1531, #1540) | #1517, #1531, #1540) | |||
| o Reduce early retransmission timer to RTT/8 (#945, #1581) | o Reduce early retransmission timer to RTT/8 (#945, #1581) | |||
| o Packets are declared lost after an RTO is verified (#935, #1582) | o Packets are declared lost after an RTO is verified (#935, #1582) | |||
| A.3. Since draft-ietf-quic-recovery-12 | A.4. Since draft-ietf-quic-recovery-12 | |||
| o Changes to manage separate packet number spaces and encryption | o Changes to manage separate packet number spaces and encryption | |||
| levels (#1190, #1242, #1413, #1450) | levels (#1190, #1242, #1413, #1450) | |||
| o Added ECN feedback mechanisms and handling; new ACK_ECN frame | o Added ECN feedback mechanisms and handling; new ACK_ECN frame | |||
| (#804, #805, #1372) | (#804, #805, #1372) | |||
| A.4. Since draft-ietf-quic-recovery-11 | A.5. Since draft-ietf-quic-recovery-11 | |||
| No significant changes. | No significant changes. | |||
| A.5. Since draft-ietf-quic-recovery-10 | A.6. Since draft-ietf-quic-recovery-10 | |||
| o Improved text on ack generation (#1139, #1159) | o Improved text on ack generation (#1139, #1159) | |||
| o Make references to TCP recovery mechanisms informational (#1195) | o Make references to TCP recovery mechanisms informational (#1195) | |||
| o Define time_of_last_sent_handshake_packet (#1171) | o Define time_of_last_sent_handshake_packet (#1171) | |||
| o Added signal from TLS the data it includes needs to be sent in a | o Added signal from TLS the data it includes needs to be sent in a | |||
| Retry packet (#1061, #1199) | Retry packet (#1061, #1199) | |||
| o Minimum RTT (min_rtt) is initialized with an infinite value | o Minimum RTT (min_rtt) is initialized with an infinite value | |||
| (#1169) | (#1169) | |||
| A.6. Since draft-ietf-quic-recovery-09 | A.7. Since draft-ietf-quic-recovery-09 | |||
| No significant changes. | No significant changes. | |||
| A.7. Since draft-ietf-quic-recovery-08 | A.8. Since draft-ietf-quic-recovery-08 | |||
| o Clarified pacing and RTO (#967, #977) | o Clarified pacing and RTO (#967, #977) | |||
| A.8. Since draft-ietf-quic-recovery-07 | A.9. Since draft-ietf-quic-recovery-07 | |||
| o Include Ack Delay in RTO(and TLP) computations (#981) | o Include Ack Delay in RTO(and TLP) computations (#981) | |||
| o Ack Delay in SRTT computation (#961) | o Ack Delay in SRTT computation (#961) | |||
| o Default RTT and Slow Start (#590) | o Default RTT and Slow Start (#590) | |||
| o Many editorial fixes. | o Many editorial fixes. | |||
| A.9. Since draft-ietf-quic-recovery-06 | A.10. Since draft-ietf-quic-recovery-06 | |||
| No significant changes. | No significant changes. | |||
| A.10. Since draft-ietf-quic-recovery-05 | A.11. Since draft-ietf-quic-recovery-05 | |||
| o Add more congestion control text (#776) | o Add more congestion control text (#776) | |||
| A.11. Since draft-ietf-quic-recovery-04 | A.12. Since draft-ietf-quic-recovery-04 | |||
| No significant changes. | No significant changes. | |||
| A.12. Since draft-ietf-quic-recovery-03 | A.13. Since draft-ietf-quic-recovery-03 | |||
| No significant changes. | No significant changes. | |||
| A.13. Since draft-ietf-quic-recovery-02 | A.14. Since draft-ietf-quic-recovery-02 | |||
| o Integrate F-RTO (#544, #409) | o Integrate F-RTO (#544, #409) | |||
| o Add congestion control (#545, #395) | o Add congestion control (#545, #395) | |||
| o Require connection abort if a skipped packet was acknowledged | o Require connection abort if a skipped packet was acknowledged | |||
| (#415) | (#415) | |||
| o Simplify RTO calculations (#142, #417) | o Simplify RTO calculations (#142, #417) | |||
| A.14. Since draft-ietf-quic-recovery-01 | A.15. Since draft-ietf-quic-recovery-01 | |||
| o Overview added to loss detection | o Overview added to loss detection | |||
| o Changes initial default RTT to 100ms | o Changes initial default RTT to 100ms | |||
| o Added time-based loss detection and fixes early retransmit | o Added time-based loss detection and fixes early retransmit | |||
| o Clarified loss recovery for handshake packets | o Clarified loss recovery for handshake packets | |||
| o Fixed references and made TCP references informative | o Fixed references and made TCP references informative | |||
| A.15. Since draft-ietf-quic-recovery-00 | A.16. Since draft-ietf-quic-recovery-00 | |||
| o Improved description of constants and ACK behavior | o Improved description of constants and ACK behavior | |||
| A.16. Since draft-iyengar-quic-loss-recovery-01 | A.17. Since draft-iyengar-quic-loss-recovery-01 | |||
| o Adopted as base for draft-ietf-quic-recovery | o Adopted as base for draft-ietf-quic-recovery | |||
| o Updated authors/editors list | o Updated authors/editors list | |||
| o Added table of contents | o Added table of contents | |||
| Acknowledgments | Acknowledgments | |||
| Authors' Addresses | Authors' Addresses | |||
| End of changes. 190 change blocks. | ||||
| 631 lines changed or deleted | 655 lines changed or added | |||
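The congestion control pseudocode in Section 7.9 of the newer draft (the right-hand column) can be read as the following Python sketch. It is an illustration under stated assumptions, not a reference implementation: the class and method names are hypothetical, the current time is passed in as `now` rather than read from a clock, window arithmetic is truncated to whole bytes, and `pto_count` is assumed to be maintained by loss detection elsewhere. The persistent congestion check is placed inside the new-recovery branch, which is one reading of the flattened pseudocode.

```python
import math
from dataclasses import dataclass

# Constants, using the RECOMMENDED values from the "Constants of interest" section.
K_MAX_DATAGRAM_SIZE = 1200
K_INITIAL_WINDOW = min(10 * K_MAX_DATAGRAM_SIZE,
                       max(2 * K_MAX_DATAGRAM_SIZE, 14600))
K_MINIMUM_WINDOW = 2 * K_MAX_DATAGRAM_SIZE
K_LOSS_REDUCTION_FACTOR = 0.5
K_PERSISTENT_CONGESTION_THRESHOLD = 2


@dataclass
class SentPacket:
    """Hypothetical record of a sent packet (only the fields used here)."""
    time_sent: float
    size: int


class CongestionController:
    """Sketch of the time-based recovery logic; names are illustrative."""

    def __init__(self):
        self.congestion_window = K_INITIAL_WINDOW
        self.bytes_in_flight = 0
        self.recovery_start_time = 0.0
        self.ssthresh = math.inf
        # Assumption: updated by loss detection, which is not modelled here.
        self.pto_count = 0

    def in_recovery(self, sent_time):
        # A packet sent at or before the start of recovery belongs to it.
        return sent_time <= self.recovery_start_time

    def on_packet_sent(self, bytes_sent):
        self.bytes_in_flight += bytes_sent

    def on_packet_acked(self, acked_packet):
        self.bytes_in_flight -= acked_packet.size
        if self.in_recovery(acked_packet.time_sent):
            # Do not increase the congestion window in the recovery period.
            return
        if self.congestion_window < self.ssthresh:
            # Slow start: grow by the number of bytes acknowledged.
            self.congestion_window += acked_packet.size
        else:
            # Congestion avoidance (integer division is an assumption).
            self.congestion_window += (
                K_MAX_DATAGRAM_SIZE * acked_packet.size
                // self.congestion_window)

    def congestion_event(self, sent_time, now):
        # Start a new recovery period only if the packet was sent after
        # the start of the previous one.
        if not self.in_recovery(sent_time):
            self.recovery_start_time = now
            reduced = int(self.congestion_window * K_LOSS_REDUCTION_FACTOR)
            self.congestion_window = max(reduced, K_MINIMUM_WINDOW)
            self.ssthresh = self.congestion_window
            # Collapse the window on persistent congestion.
            if self.pto_count > K_PERSISTENT_CONGESTION_THRESHOLD:
                self.congestion_window = K_MINIMUM_WINDOW

    def on_packets_lost(self, lost_packets, now):
        for lost_packet in lost_packets:
            self.bytes_in_flight -= lost_packet.size
        # The last lost packet decides whether a new recovery period begins.
        self.congestion_event(lost_packets[-1].time_sent, now)
```

With the recommended constants, a loss detected at time `now` halves the window once. Acknowledgements for packets sent at or before `now` then leave the window unchanged, and growth resumes only when a packet sent after `now` is acknowledged.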