2 RTC-Web E. Rescorla 3 Internet-Draft RTFM, Inc. 4 Intended status: Standards Track September 21, 2011 5 Expires: March 24, 2012 7 Security Considerations for RTC-Web 8 draft-ietf-rtcweb-security-00 10 Abstract 12 The Real-Time Communications on the Web (RTC-Web) working group is 13 tasked with standardizing protocols for real-time communications 14 between Web browsers. The major use cases for RTC-Web technology are 15 real-time audio and/or video calls, Web conferencing, and direct data 16 transfer. Unlike most conventional real-time systems (e.g., SIP-based 17 soft phones), RTC-Web communications are directly controlled by 18 some Web server, which poses new security challenges. For instance, 19 a Web browser might expose a JavaScript API which allows a server to 20 place a video call. Unrestricted access to such an API would allow 21 any site which a user visited to "bug" a user's computer, capturing 22 any activity which passed in front of their camera. This document 23 defines the RTC-Web threat model and describes an architecture which 24 provides security within that threat model.
26 Legal 28 THIS DOCUMENT AND THE INFORMATION CONTAINED THEREIN ARE PROVIDED ON 29 AN "AS IS" BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 30 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 31 IETF TRUST, AND THE INTERNET ENGINEERING TASK FORCE, DISCLAIM ALL 32 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 33 WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE 34 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 35 FOR A PARTICULAR PURPOSE. 37 Status of this Memo 39 This Internet-Draft is submitted in full conformance with the 40 provisions of BCP 78 and BCP 79. 42 Internet-Drafts are working documents of the Internet Engineering 43 Task Force (IETF). Note that other groups may also distribute 44 working documents as Internet-Drafts. The list of current Internet- 45 Drafts is at http://datatracker.ietf.org/drafts/current/. 47 Internet-Drafts are draft documents valid for a maximum of six months 48 and may be updated, replaced, or obsoleted by other documents at any 49 time. It is inappropriate to use Internet-Drafts as reference 50 material or to cite them other than as "work in progress." 52 This Internet-Draft will expire on March 24, 2012. 54 Copyright Notice 56 Copyright (c) 2011 IETF Trust and the persons identified as the 57 document authors. All rights reserved. 59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (http://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document. Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License. 
69 This document may contain material from IETF Documents or IETF 70 Contributions published or made publicly available before November 71 10, 2008. The person(s) controlling the copyright in some of this 72 material may not have granted the IETF Trust the right to allow 73 modifications of such material outside the IETF Standards Process. 74 Without obtaining an adequate license from the person(s) controlling 75 the copyright in such materials, this document may not be modified 76 outside the IETF Standards Process, and derivative works of it may 77 not be created outside the IETF Standards Process, except to format 78 it for publication as an RFC or to translate it into languages other 79 than English. 81 Table of Contents 83 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 84 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 85 3. The Browser Threat Model . . . . . . . . . . . . . . . . . . . 5 86 3.1. Access to Local Resources . . . . . . . . . . . . . . . . 6 87 3.2. Same Origin Policy . . . . . . . . . . . . . . . . . . . . 6 88 3.3. Bypassing SOP: CORS, WebSockets, and consent to 89 communicate . . . . . . . . . . . . . . . . . . . . . . . 7 90 4. Security for RTC-Web Applications . . . . . . . . . . . . . . 7 91 4.1. Access to Local Devices . . . . . . . . . . . . . . . . . 7 92 4.1.1. Calling Scenarios and User Expectations . . . . . . . 8 93 4.1.1.1. Dedicated Calling Services . . . . . . . . . . . . 8 94 4.1.1.2. Calling the Site You're On . . . . . . . . . . . . 8 95 4.1.1.3. Calling to an Ad Target . . . . . . . . . . . . . 9 96 4.1.2. Origin-Based Security . . . . . . . . . . . . . . . . 9 97 4.1.3. Security Properties of the Calling Page . . . . . . . 11 98 4.2. Communications Consent Verification . . . . . . . . . . . 12 99 4.2.1. ICE . . . . . . . . . . . . . . . . . . . . . . . . . 12 100 4.2.2. Masking . . . . . . . . . . . . . . . . . . . . . . . 12 101 4.2.3. Backward Compatibility . . . . . . . . . . . . . . 
. . 13 102 4.3. Communications Security . . . . . . . . . . . . . . . . . 14 103 4.3.1. Protecting Against Retrospective Compromise . . . . . 15 104 4.3.2. Protecting Against During-Call Attack . . . . . . . . 15 105 4.3.2.1. Key Continuity . . . . . . . . . . . . . . . . . . 16 106 4.3.2.2. Short Authentication Strings . . . . . . . . . . . 16 107 4.3.2.3. Recommendations . . . . . . . . . . . . . . . . . 17 108 5. Security Considerations . . . . . . . . . . . . . . . . . . . 17 109 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 110 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 111 7.1. Normative References . . . . . . . . . . . . . . . . . . . 18 112 7.2. Informative References . . . . . . . . . . . . . . . . . . 18 113 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 20 115 1. Introduction 117 The Real-Time Communications on the Web (RTC-Web) working group is 118 tasked with standardizing protocols for real-time communications 119 between Web browsers. The major use cases for RTC-Web technology are 120 real-time audio and/or video calls, Web conferencing, and direct data 121 transfer. Unlike most conventional real-time systems (e.g., SIP-based 122 [RFC3261] soft phones), RTC-Web communications are directly 123 controlled by some Web server. A simple case is shown below. 125 +----------------+ 126 | | 127 | Web Server | 128 | | 129 +----------------+ 130 ^ ^ 131 / \ 132 HTTP / \ HTTP 133 / \ 134 / \ 135 v v 136 JS API JS API 137 +-----------+ +-----------+ 138 | | Media | | 139 | Browser |<---------->| Browser | 140 | | | | 141 +-----------+ +-----------+ 143 Figure 1: A simple RTC-Web system 145 In the system shown in Figure 1, Alice and Bob both have RTC-Web 146 enabled browsers and they visit some Web server which operates a 147 calling service. Each of their browsers exposes standardized 148 JavaScript calling APIs which are used by the Web server to set up a 149 call between Alice and Bob.
While this system is topologically 150 similar to a conventional SIP-based system (with the Web server 151 acting as the signaling service and browsers acting as softphones), 152 control has moved to the central Web server; the browser simply 153 provides API points that are used by the calling service. As with 154 any Web application, the Web server can move logic between the server 155 and JavaScript in the browser, but regardless of where the code is 156 executing, it is ultimately under control of the server. 158 It should be immediately apparent that this type of system poses new 159 security challenges beyond those of a conventional VoIP system. In 160 particular, it needs to contend with malicious calling services. For 161 example, if the calling service can cause the browser to make a call 162 at any time to any callee of its choice, then this facility can be 163 used to bug a user's computer without their knowledge, simply by 164 placing a call to some recording service. More subtly, if the 165 exposed APIs allow the server to instruct the browser to send 166 arbitrary content, then they can be used to bypass firewalls or mount 167 denial of service attacks. Any successful system will need to be 168 resistant to this and other attacks. 170 2. Terminology 172 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 173 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 174 document are to be interpreted as described in RFC 2119 [RFC2119]. 176 3. The Browser Threat Model 178 The security requirements for RTC-Web follow directly from the 179 requirement that the browser's job is to protect the user. Huang et 180 al. [huang-w2sp] summarize the core browser security guarantee as: 182 Users can safely visit arbitrary web sites and execute scripts 183 provided by those sites. 185 It is important to realize that this includes sites hosting arbitrary 186 malicious scripts. 
The motivation for this requirement is simple: 187 it is trivial for attackers to divert users to sites of their choice. 188 For instance, an attacker can purchase display advertisements which 189 direct the user (either automatically or via user clicking) to their 190 site, at which point the browser will execute the attacker's scripts. 191 Thus, it is important that it be safe to view arbitrarily malicious 192 pages. Of course, browsers inevitably have bugs which cause them to 193 fall short of this goal, but any new RTC-Web functionality must be 194 designed with the intent to meet this standard. The remainder of 195 this section provides more background on the existing Web security 196 model. 198 In this model, then, the browser acts as a TRUSTED COMPUTING BASE 199 (TCB) both from the user's perspective and to some extent from the 200 server's. While HTML and JS provided by the server can cause the 201 browser to execute a variety of actions, those scripts operate in a 202 sandbox that isolates them both from the user's computer and from 203 each other, as detailed below. 205 Conventionally, we refer to two classes of attackers: WEB ATTACKERS, who are able to 206 induce you to visit their sites but do not control the network, and 207 NETWORK ATTACKERS, who are able to control your network. Network 208 attackers correspond to the [RFC3552] "Internet Threat Model". In 209 general, it is desirable to build a system which is secure against 210 both kinds of attackers, but realistically many sites do not run 211 HTTPS [RFC2818] and so our ability to defend against network 212 attackers is necessarily somewhat limited. Most of the rest of this 213 section is devoted to web attackers, with the assumption that 214 protection against network attackers is provided by running HTTPS. 216 3.1.
Access to Local Resources 218 While the browser has access to local resources such as keying 219 material, files, the camera and the microphone, it strictly limits or 220 forbids web servers from accessing those same resources. For 221 instance, while it is possible to produce an HTML form which will 222 allow file upload, a script cannot do so without user consent and in 223 fact cannot even suggest a specific file (e.g., /etc/passwd); the 224 user must explicitly select the file and consent to its upload. 225 [Note: in many cases browsers are explicitly designed to avoid 226 dialogs with the semantics of "click here to screw yourself", as 227 extensive research shows that users are prone to consent under such 228 circumstances.] 230 Similarly, while Flash SWFs can access the camera and microphone, 231 they explicitly require that the user consent to that access. In 232 addition, some resources simply cannot be accessed from the browser 233 at all. For instance, there is no real way to run specific 234 executables directly from a script (though the user can of course be 235 induced to download executable files and run them). 237 3.2. Same Origin Policy 239 Many other resources are accessible but isolated. For instance, 240 while scripts are allowed to make HTTP requests via the 241 XMLHttpRequest() API, those requests are not allowed to be made to any 242 server, but rather solely to the same ORIGIN from whence the script 243 came [I-D.abarth-origin] (although CORS [CORS] and WebSockets 244 [I-D.ietf-hybi-thewebsocketprotocol] provide an escape hatch from 245 this restriction, as described below). This SAME ORIGIN POLICY (SOP) 246 prevents server A from mounting attacks on server B via the user's 247 browser, which protects both the user (e.g., from misuse of his 248 credentials) and the server (e.g., from DoS attack). 250 More generally, SOP forces scripts from each site to run in their 251 own, isolated, sandboxes.
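To make the origin notion concrete: an origin is conventionally the (scheme, host, port) triple of the URL from which a script was loaded, and two URLs are "same origin" only if all three components match. The following Python sketch is purely illustrative (the function names are invented for this example and are not part of any browser API):

```python
# Toy illustration of the Same Origin Policy's origin comparison.
# An origin is the (scheme, host, port) triple; two URLs are "same
# origin" only when all three components are equal. Informal sketch,
# not a normative algorithm.
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    # An explicit default port (":80" for http) is the same origin as
    # no port at all.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)
```

Note that even a different subdomain (e.g., sub.example.com vs. example.com) is a different origin, which is why iframes from different origins on the same page are isolated from one another.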
While there are techniques to allow them 252 to interact, those interactions generally must be mutually consensual 253 (by each site) and are limited to certain channels. For instance, 254 multiple pages/browser panes from the same origin can read each 255 other's JS variables, but pages from different origins--or even 256 iframes from different origins on the same page--cannot. 258 3.3. Bypassing SOP: CORS, WebSockets, and consent to communicate 260 While SOP serves an important security function, it also makes it 261 inconvenient to write certain classes of applications. In 262 particular, mash-ups, in which a script from origin A uses resources 263 from origin B, can only be achieved via a certain amount of hackery. 264 The W3C Cross-Origin Resource Sharing (CORS) spec [CORS] is a 265 response to this demand. In CORS, when a script from origin A 266 executes what would otherwise be a forbidden cross-origin request, 267 the browser instead contacts the target server to determine whether 268 it is willing to allow cross-origin requests from A. If it is so 269 willing, the browser then allows the request. This consent 270 verification process is designed to safely allow cross-origin 271 requests. 273 While CORS is designed to allow cross-origin HTTP requests, 274 WebSockets [I-D.ietf-hybi-thewebsocketprotocol] allows cross-origin 275 establishment of transparent channels. Once a WebSockets connection 276 has been established from a script to a site, the script can exchange 277 any traffic it likes without being required to frame it as a series 278 of HTTP request/response transactions. As with CORS, a WebSockets 279 transaction starts with a consent verification stage to avoid 280 allowing scripts to simply send arbitrary data to another origin. 282 While consent verification is conceptually simple--just do a 283 handshake before you start exchanging the real data--experience has 284 shown that designing a correct consent verification system is 285 difficult.
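The essential shape of such a handshake is that the browser generates a secret value, withholds it from the controlling script, and refuses to send application traffic until the peer has echoed it back. The sketch below uses invented names to illustrate the pattern only; it is not the CORS, WebSockets, or ICE wire format:

```python
# Sketch of a consent-verification handshake: the sender issues a
# random nonce that the receiver must echo before any real data is
# sent. A script controlled by the attacker cannot forge the reply
# because it never sees the nonce. Hypothetical API, not any
# specific protocol.
import hmac
import secrets

class ConsentChecker:
    def __init__(self):
        # Generated by the "browser"; never exposed to the script.
        self.nonce = secrets.token_hex(16)
        self.verified = False

    def challenge(self):
        """The value sent to the candidate traffic recipient."""
        return self.nonce

    def verify(self, reply):
        # Constant-time comparison avoids timing side channels.
        self.verified = hmac.compare_digest(reply, self.nonce)
        return self.verified

    def send(self, data):
        if not self.verified:
            raise PermissionError("consent not verified; refusing to send")
        return data
```

A correct design must also ensure the echoed value cannot be predicted or replayed, which is exactly where the schemes studied by Huang et al. went wrong.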
In particular, Huang et al. [huang-w2sp] have shown 286 vulnerabilities in the existing Java and Flash consent verification 287 techniques and in a simplified version of the WebSockets handshake. 288 Specifically, it is important to be wary of CROSS-PROTOCOL attacks 289 in which the attacking script generates traffic which is acceptable 290 to some non-Web protocol state machine. In order to resist this form 291 of attack, WebSockets incorporates a masking technique intended to 292 randomize the bits on the wire, thus making it more difficult to 293 generate traffic which resembles a given protocol. 295 4. Security for RTC-Web Applications 297 4.1. Access to Local Devices 299 As discussed in Section 1, allowing arbitrary sites to initiate calls 300 violates the core Web security guarantee; without some access 301 restrictions on local devices, any malicious site could simply bug a 302 user. At minimum, then, it MUST NOT be possible for arbitrary sites 303 to initiate calls to arbitrary locations without user consent. This 304 immediately raises the question, however, of what should be the scope 305 of user consent. 307 For the rest of this discussion we assume that the user is somehow 308 going to grant consent to some entity (e.g., a social networking 309 site) to initiate a call on his behalf. This consent may be limited 310 to a single call or may be a general consent. In order for the user 311 to make an intelligent decision about whether to allow a call (and 312 hence his camera and microphone input to be routed somewhere), he 313 must understand either who is requesting access, where the media is 314 going, or both. So, for instance, one might imagine that at the time 315 access to camera and microphone is requested, the user is shown a 316 dialog that says "site X has requested access to camera and 317 microphone, yes or no" (though note that this type of in-flow 318 interface violates one of the guidelines in Section 3).
The 319 user's decision will of course be based on his opinion of Site X. 320 However, as discussed below, this is a complicated concept. 322 4.1.1. Calling Scenarios and User Expectations 324 While a large number of calling scenarios are possible, the 325 scenarios discussed in this section illustrate many of the 326 difficulties of identifying the relevant scope of consent. 328 4.1.1.1. Dedicated Calling Services 330 The first scenario we consider is a dedicated calling service. In 331 this case, the user has a relationship with a calling site and 332 repeatedly makes calls on it. It is likely that, rather than having 333 to give permission for each call, the user will want to give the 334 calling service long-term access to the camera and microphone. This 335 is a natural fit for a long-term consent mechanism (e.g., installing 336 an app store "application" to indicate permission for the calling 337 service.) A variant of the dedicated calling service is a gaming 338 site (e.g., a poker site) which hosts a dedicated calling service to 339 allow players to call each other. 341 With any kind of service where the user may use the same service to 342 talk to many different people, there is a question about whether the 343 user can know who they are talking to. In general, this is difficult 344 as most of the user interface is presented by the calling site. 345 However, communications security mechanisms can be used to give some 346 assurance, as described in Section 4.3.2. 348 4.1.1.2. Calling the Site You're On 350 Another simple scenario is calling the site you're actually visiting. 351 The paradigmatic case here is the "click here to talk to a 352 representative" windows that appear on many shopping sites. In this 353 case, the user's expectation is that they are calling the site 354 they're actually visiting.
However, it is unlikely that they want to 355 provide a general consent to such a site; just because I want some 356 information on a car doesn't mean that I want the car manufacturer to 357 be able to activate my microphone whenever they please. Thus, this 358 suggests the need for a second consent mechanism where I only grant 359 consent for the duration of a given call. As described in 360 Section 3.1, great care must be taken in the design of this interface 361 to avoid the users just clicking through. 363 4.1.1.3. Calling to an Ad Target 365 In both of the previous cases, the user has a direct relationship 366 (though perhaps a transient one) with the target of the call. 367 Moreover, in both cases he is actually visiting the site of the 368 person he is being asked to trust. However, this is not always so. 369 Consider the case where a user is visiting a content site which 370 hosts an advertisement with an invitation to call for more 371 information. When the user clicks the ad, they are connected with 372 the advertiser or their agent. 374 The relationships here are far more complicated: the site the user 375 is actually visiting has no direct relationship with the advertiser; 376 they are just hosting ads from an ad network. The user has no 377 relationship with the ad network, but desires one with the 378 advertiser, at least for long enough to learn about their products. 379 At minimum, then, whatever consent dialog is shown needs to allow the 380 user to have some idea of the organization that they are actually 381 calling. 383 However, because the user also has some relationship with the hosting 384 site, it is also arguable that the hosting site should be allowed to 385 express an opinion (e.g., to be able to allow or forbid a call) since 386 a bad experience with an advertiser reflects negatively on the hosting 387 site [this idea was suggested by Adam Barth].
However, this 388 obviously presents a privacy challenge, as sites which host 389 advertisements often learn very little about whether individual users 390 clicked through to the ads, or even which ads were presented. 392 4.1.2. Origin-Based Security 394 As discussed in Section 3.2, the basic unit of Web sandboxing is the 395 origin, and so it is natural to scope consent to origin. 396 Specifically, a script from origin A MUST only be allowed to initiate 397 communications (and hence to access camera and microphone) if the 398 user has specifically authorized access for that origin. It is of 399 course technically possible to have coarser-scoped permissions, but 400 because the Web model is scoped to origin, this creates a difficult 401 mismatch. 403 Arguably, origin is not fine-grained enough. Consider the situation 404 where Alice visits a site and authorizes it to make a single call. 405 If consent is expressed solely in terms of origin, then at any future 406 visit to that site (including one induced via mash-up or ad network), 407 the site can bug Alice's computer, use the computer to place bogus 408 calls, etc. While in principle Alice could grant and then revoke the 409 privilege, in practice privileges accumulate; if we are concerned 410 about this attack, something else is needed. There are a number of 411 potential countermeasures to this sort of issue. 413 Individual Consent 414 Ask the user for permission for each call. 416 Callee-oriented Consent 417 Only allow calls to a given user. 419 Cryptographic Consent 420 Only allow calls to a given set of peer keying material or to a 421 cryptographically established identity. 423 Unfortunately, none of these approaches is satisfactory for all 424 cases. As discussed above, individual consent puts the user's 425 approval in the UI flow for every call. Not only does this quickly 426 become annoying but it can train the user to simply click "OK", at 427 which point the consent becomes useless. 
Thus, while it may be 428 necessary to have individual consent in some cases, this is not a 429 suitable solution for (for instance) the calling service case. Where 430 necessary, in-flow user interfaces must be carefully designed to 431 avoid the risk of the user blindly clicking through. 433 The other two options are designed to restrict calls to a given 434 target. Unfortunately, Callee-oriented consent does not work well 435 because a malicious site can claim that the user is calling any user 436 of his choice. One fix for this is to tie calls to a 437 cryptographically established identity. While not suitable for all 438 cases, this approach may be useful for some. If we consider the 439 advertising case described in Section 4.1.1.3, it's not particularly 440 convenient to require the advertiser to instantiate an iframe on the 441 hosting site just to get permission; a more convenient approach is to 442 cryptographically tie the advertiser's certificate to the 443 communication directly. We're still tying permissions to origin 444 here, but to the media origin (and/or destination) rather than to the 445 Web origin. 447 Another case where media-level cryptographic identity makes sense is 448 when a user really does not trust the calling site. For instance, I 449 might be worried that the calling service will attempt to bug my 450 computer, but I also want to be able to conveniently call my friends. 451 If consent is tied to particular communications endpoints, then my 452 risk is limited. However, this is also not that convenient an 453 interface, since managing individual user permissions can be painful. 455 While this is primarily a question not for the IETF, it should be clear 456 that there is no really good answer. In general, if you cannot trust 457 the site which you have authorized for calling not to bug you then 458 your security situation is not really ideal.
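The cryptographic-consent idea above might be sketched as follows: permissions are keyed to a fingerprint (hash) of the peer's keying material rather than to the Web origin of the calling site, so a malicious site can claim any callee *name* but cannot fake possession of the peer's key. All names below are invented for illustration:

```python
# Sketch of cryptographic consent: the consent store is keyed to a
# fingerprint of the peer's keying material, not to the calling
# site's Web origin. Illustrative only; names are hypothetical.
import hashlib

class CryptoConsentStore:
    def __init__(self):
        self.allowed = set()

    @staticmethod
    def fingerprint(peer_key: bytes) -> str:
        """Stable identifier derived from the peer's public key."""
        return hashlib.sha256(peer_key).hexdigest()

    def grant(self, peer_key: bytes):
        self.allowed.add(self.fingerprint(peer_key))

    def may_call(self, peer_key: bytes) -> bool:
        # The calling site can assert any callee name it likes, but a
        # call only proceeds if the media-level key matches a grant.
        return self.fingerprint(peer_key) in self.allowed
```

This is what it means to tie permissions to the media origin rather than the Web origin: the grant survives only as long as the peer's keying material does, and a compromised calling site cannot redirect consented media to itself.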
It is RECOMMENDED that 459 browsers have explicit (and obvious) indicators that they are in a 460 call in order to mitigate this risk. 462 4.1.3. Security Properties of the Calling Page 464 Origin-based security is intended to provide security against web attackers. 465 However, we must also consider the case of network attackers. 466 Consider the case where I have granted permission to a calling 467 service by an origin that has the HTTP scheme, e.g., 468 http://calling-service.example.com. If I ever use my computer on an 469 unsecured network (e.g., a hotspot or if my own home wireless network 470 is insecure), and browse any HTTP site, then an attacker can bug my 471 computer. The attack proceeds like this: 473 1. I connect to http://anything.example.org/. Note that this site 474 is unaffiliated with the calling service. 475 2. The attacker modifies my HTTP connection to inject an IFRAME (or 476 a redirect) to http://calling-service.example.com 477 3. The attacker forges the response, apparently from 478 http://calling-service.example.com/, to inject JS to initiate a 479 call to himself. 481 Note that this attack does not depend on the media being insecure. 482 Because the call is to the attacker, it is also encrypted to him. 483 Moreover, it need not be executed immediately; the attacker can 484 "infect" the origin semi-permanently (e.g., with a web worker or a 485 popunder) and thus be able to bug me long after I have left the 486 infected network. This risk is created by allowing calls at all from 487 a page fetched over HTTP. 489 Even if calls are only possible from HTTPS sites, the site is still 490 subject to attack if it embeds active content (e.g., JavaScript) that 491 is fetched over HTTP or from an untrusted site, because that 492 JavaScript is executed in the security context of the page 493 [finer-grained]. Thus, it is also dangerous to allow RTC-Web 494 functionality from HTTPS origins that embed mixed content. Note: 495 this issue is not restricted to PAGES which contain mixed content.
If a page from a given origin ever 496 loads mixed content then it is possible for a network attacker to 497 infect the browser's notion of that origin semi-permanently. 499 [[ OPEN ISSUE: What recommendation should IETF make about (a) 500 whether RTCWeb long-term consent should be available over HTTP pages 501 and (b) How to handle origins where the consent is to an HTTPS URL 502 but the page contains active mixed content? ]] 504 4.2. Communications Consent Verification 506 As discussed in Section 3.3, allowing web applications unrestricted 507 network access via the browser introduces the risk of using 508 the browser as an attack platform against machines which would not 509 otherwise be accessible to the malicious site, for instance because 510 they are topologically restricted (e.g., behind a firewall or NAT). 511 In order to prevent this form of attack as well as cross-protocol 512 attacks it is important to require that the target of traffic 513 explicitly consent to receiving the traffic in question. Until that 514 consent has been verified for a given endpoint, traffic other than 515 the consent handshake MUST NOT be sent to that endpoint. 517 4.2.1. ICE 519 Verifying receiver consent requires some sort of explicit handshake, 520 but conveniently we already need one in order to do NAT hole- 521 punching. ICE [RFC5245] includes a handshake designed to verify that 522 the receiving element wishes to receive traffic from the sender. It 523 is important to remember here that the site initiating ICE is 524 presumed malicious; in order for the handshake to be secure the 525 receiving element MUST demonstrate receipt/knowledge of some value 526 not available to the site (thus preventing it from forging 527 responses). In order to achieve this objective with ICE, the STUN 528 transaction IDs must be generated by the browser and MUST NOT be made 529 available to the initiating script, even via a diagnostic interface. 531 4.2.2.
Masking 533 Once consent is verified, there still is some concern about 534 misinterpretation attacks as described by Huang et al. [huang-w2sp]. 535 As long as communication is limited to UDP, then this risk is 536 probably limited, thus masking is not required for UDP. I.e., once 537 communications consent has been verified, it is most likely safe to 538 allow the implementation to send arbitrary UDP traffic to the chosen 539 destination, provided that the STUN keepalives continue to succeed. 540 However, with TCP the risk of transparent proxies becomes much more 541 severe. If TCP is to be used, then WebSockets style masking MUST be 542 employed. 544 4.2.3. Backward Compatibility 546 A requirement to use ICE limits compatibility with legacy non-ICE 547 clients. It seems unsafe to completely remove the requirement for 548 some check. All proposed checks have the common feature that the 549 browser sends some message to the candidate traffic recipient and 550 refuses to send other traffic until that message has been replied to. 551 The message/reply pair must be generated in such a way that an 552 attacker who controls the Web application cannot forge them, 553 generally by having the message contain some secret value that must 554 be incorporated (e.g., echoed, hashed into, etc.). Non-ICE 555 candidates for this role (in cases where the legacy endpoint has a 556 public address) include: 558 o STUN checks without using ICE (i.e., the non-RTC-web endpoint sets 559 up a STUN responder.) 560 o Use of RTCP as an implicit reachability check. 562 In the RTCP approach, the RTC-Web endpoint is allowed to send a 563 limited number of RTP packets prior to receiving consent. This 564 allows a short window of attack. In addition, some legacy endpoints 565 do not support RTCP, so this is a much more expensive solution for 566 such endpoints, for which it would likely be easier to implement ICE.
   For these two reasons, an RTCP-based approach does not seem to
   address the security issue satisfactorily.

   In the STUN approach, the RTC-Web endpoint is able to verify that
   the recipient is running some kind of STUN endpoint, but unless
   the STUN responder is integrated with the ICE username/password
   establishment system, the RTC-Web endpoint cannot verify that the
   recipient consents to this particular call.  This may be an issue
   if existing STUN servers are operated at addresses that are not
   able to handle bandwidth-based attacks.  Thus, this approach does
   not seem satisfactory either.

   If the systems are tightly integrated (i.e., the STUN endpoint
   responds with responses authenticated with ICE credentials) then
   this issue does not exist.  However, such a design is very close
   to an ICE-Lite implementation (indeed, arguably is one).  An
   intermediate approach would be to have a STUN extension that
   indicated that one was responding to RTC-Web checks but not
   computing integrity checks based on the ICE credentials.  This
   would allow the use of standalone STUN servers without the risk of
   confusing them with legacy STUN servers.  If a non-ICE legacy
   solution is needed, then this is probably the best choice.

   [TODO: Write something about consent freshness and RTCP].

   [[ OPEN ISSUE: Exactly what should be the requirements here?
   Proposals include requiring ICE at all times, or requiring ICE but
   allowing one of these non-ICE mechanisms for legacy endpoints. ]]

4.3.  Communications Security

   Finally, we consider a problem familiar from the SIP world:
   communications security.  For obvious reasons, it MUST be possible
   for the communicating parties to establish a channel which is
   secure against both message recovery and message modification.
   (See [RFC5479] for more details.)  This service must be provided
   for both data and voice/video.
   Ideally the same security mechanisms would be used for both types
   of content.  Technology for providing this service (for instance,
   DTLS [RFC4347] and DTLS-SRTP [RFC5763]) is well understood.
   However, we must examine how this technology fits into the RTC-Web
   context, where the threat model is somewhat different.

   In general, it is important to understand that unlike a
   conventional SIP proxy, the calling service (i.e., the Web server)
   controls not only the channel between the communicating endpoints
   but also the application running on the user's browser.  While in
   principle it is possible for the browser to cut the calling
   service out of the loop and directly present trusted information
   (and perhaps get consent), practice in modern browsers is to avoid
   this whenever possible.  "In-flow" modal dialogs which require the
   user to consent to specific actions are particularly disfavored,
   as human factors research indicates that unless they are made
   extremely invasive, users simply agree to them without actually
   consciously giving consent [abarth-rtcweb].  Thus, nearly all the
   UI will necessarily be rendered by the browser but under control
   of the calling service.  This likely includes the peer's identity
   information, which, after all, is only meaningful in the context
   of some calling service.

   This limitation does not mean that preventing attack by the
   calling service is completely hopeless.  However, we need to
   distinguish between two classes of attack:

   Retrospective compromise of calling service.
      The calling service is non-malicious during a call but
      subsequently is compromised and wishes to attack an older call.

   During-call attack by calling service.
      The calling service is compromised during the call it wishes to
      attack.

   Providing security against the former type of attack is practical
   using the techniques discussed in Section 4.3.1.
   However, it is extremely difficult to prevent a trusted but
   malicious calling service from actively attacking a user's calls,
   either by mounting a MITM attack or by diverting them entirely.
   (Note that this attack applies equally to a network attacker if
   communications to the calling service are not secured.)  We
   discuss some potential approaches and why they are likely to be
   impractical in Section 4.3.2.

4.3.1.  Protecting Against Retrospective Compromise

   In a retrospective attack, the calling service was uncompromised
   during the call, but an attacker subsequently wants to recover the
   content of the call.  We assume that the attacker has access to
   the protected media stream as well as having full control of the
   calling service.

   If the calling service has access to the traffic keying material
   (as in SDES [RFC4568]), then retrospective attack is trivial.
   This form of attack is particularly serious in the Web context
   because it is standard practice in Web services to run extensive
   logging and monitoring.  Thus, it is highly likely that if the
   traffic key is part of any HTTP request it will be logged
   somewhere and thus subject to subsequent compromise.  It is this
   consideration that makes an automatic, public key-based key
   exchange mechanism imperative for RTC-Web (this is a good idea for
   any communications security system), and this mechanism SHOULD
   provide perfect forward secrecy (PFS).  The signaling channel/
   calling service can be used to authenticate this mechanism.

   In addition, the system MUST NOT provide any APIs either to
   extract long-term keying material or to directly access any stored
   traffic keys.  Otherwise, an attacker who subsequently compromised
   the calling service might be able to use those APIs to recover the
   traffic keys and thus compromise the traffic.
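   The two requirements above (ephemeral, per-call keying and no key
   extraction API) can be illustrated with a small Python sketch.
   This is emphatically not DTLS-SRTP: the class name CallKeys and
   all of its methods are invented for this example, and a real
   implementation would derive the secret from an ephemeral (EC)DH
   exchange rather than a local RNG.  The point is only the shape of
   the API: keys are opaque, derived fresh per call, and destroyed
   when the call ends.

   ```python
   import hashlib
   import hmac
   import os

   def hkdf_sha256(secret, info, length=32):
       """Minimal HKDF (RFC 5869) with an all-zero salt."""
       prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()
       okm = b""
       block = b""
       counter = 1
       while len(okm) < length:
           block = hmac.new(prk, block + info + bytes([counter]),
                            hashlib.sha256).digest()
           okm += block
           counter += 1
       return okm[:length]

   class CallKeys:
       """Opaque per-call keying material.

       A fresh ephemeral secret is drawn for every call and erased
       when the call ends; there is deliberately no getter, modeling
       the rule that no API may extract stored traffic keys."""

       def __init__(self):
           # Stands in for an ephemeral (EC)DH shared secret.
           self._secret = os.urandom(32)
           self._traffic_key = hkdf_sha256(self._secret,
                                           b"rtcweb traffic")

       def protect(self, payload):
           # Illustrative integrity tag only; real media protection
           # would use SRTP keyed via DTLS-SRTP.
           tag = hmac.new(self._traffic_key, payload,
                          hashlib.sha256).digest()
           return payload + tag

       def destroy(self):
           # After this, even full compromise of the calling service
           # cannot recover this call's keys (the PFS property).
           self._secret = None
           self._traffic_key = None
   ```

   Because each call draws its own secret and nothing crosses the
   signaling channel, a log of HTTP traffic at the calling service
   contains nothing that helps decrypt a recorded call.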
4.3.2.  Protecting Against During-Call Attack

   Protecting against attacks during a call is a more difficult
   proposition.  Even if the calling service cannot directly access
   keying material (as recommended in the previous section), it can
   simply mount a man-in-the-middle attack on the connection, telling
   Alice that she is calling Bob and Bob that he is calling Alice,
   while in fact the calling service is acting as a calling bridge
   and capturing all the traffic.  While in theory it is possible to
   construct techniques which protect against this form of attack, in
   practice these techniques all require far too much user
   intervention to be practical, given the user interface constraints
   described in [abarth-rtcweb].

4.3.2.1.  Key Continuity

   One natural approach is to use "key continuity".  While a
   malicious calling service can present any identity it chooses to
   the user, it cannot produce a private key that maps to a given
   public key.  Thus, it is possible for the browser to note a given
   user's public key and generate an alarm whenever that user's key
   changes.  SSH [RFC4251] uses a similar technique.  (Note that the
   need to avoid explicit user consent on every call precludes the
   browser requiring an immediate manual check of the peer's key).

   Unfortunately, this sort of key continuity mechanism is far less
   useful in the RTC-Web context.  First, much of the virtue of
   RTC-Web (and any Web application) is that it is not bound to a
   particular piece of client software.  Thus, it will be not only
   possible but routine for a user to use multiple browsers on
   different computers, which will of course have different keying
   material (SACRED [RFC3760] notwithstanding.)  Thus, users will
   frequently be alerted to key mismatches which are in fact
   completely legitimate, with the result that they are trained to
   simply click through them.
   As it is known that users routinely will click through far more
   dire warnings [cranor-wolf], it seems extremely unlikely that any
   key continuity mechanism will be effective rather than simply
   annoying.

   Moreover, it is trivial to bypass even this kind of mechanism.
   Recall that unlike the case of SSH, the browser never directly
   gets the peer's identity from the user.  Rather, it is provided by
   the calling service.  Even enabling a mechanism of this type would
   require an API to allow the calling service to tell the browser
   "this is a call to user X".  All the calling service needs to do
   to avoid triggering a key continuity warning is to tell the
   browser that "this is a call to user Y" where Y is close to X.
   Even if the user actually checks the other side's name (which all
   available evidence indicates is unlikely), this would require (a)
   the browser to have trusted UI to provide the name and (b) the
   user to not be fooled by similar-appearing names.

4.3.2.2.  Short Authentication Strings

   ZRTP [RFC6189] uses a "short authentication string" (SAS) which is
   derived from the key agreement protocol.  This SAS is designed to
   be read over the voice channel and, if confirmed by both sides,
   precludes MITM attack.  The intention is that the SAS is used once
   and then key continuity (though with a different mechanism from
   that discussed above) is used thereafter.

   Unfortunately, the SAS does not offer a practical solution to the
   problem of a compromised calling service.  "Voice conversion"
   systems, which modify voice from one speaker to make it sound like
   another, are an active area of research.  These systems are
   already good enough to fool both automatic recognition systems
   [farus-conversion] and humans [kain-conversion] in many cases, and
   are of course likely to improve in future, especially in an
   environment where the user just wants to get on with the phone
   call.
   Thus, even if SAS is effective today, it is likely not to be so
   for much longer.  Moreover, it is possible for an attacker who
   controls the browser to allow the SAS to succeed and then simulate
   call failure and reconnect, trusting that the user will not notice
   that the "no SAS" indicator has been set (which seems likely).

   Even were SAS secure if used, it seems exceedingly unlikely that
   users will actually use it.  As discussed above, the browser UI
   constraints preclude requiring the SAS exchange prior to
   completing the call and so it must be voluntary; at most the
   browser will provide some UI indicator that the SAS has not yet
   been checked.  However, it is well-known that when faced with
   optional mechanisms such as fingerprints, users simply do not
   check them [whitten-johnny].  Thus, it is highly unlikely that
   users will ever perform the SAS exchange.

   Once users have checked the SAS once, key continuity is required
   to avoid them needing to check it on every call.  However, this is
   problematic for reasons indicated in Section 4.3.2.1.  In
   principle it is of course possible to render a different UI
   element to indicate that calls are using an unauthenticated set of
   keying material (recall that the attacker can just present a
   slightly different name so that the attack shows the same UI as a
   call to a new device or to someone you haven't called before),
   but as a practical matter, users simply ignore such indicators
   even in the rather more dire case of mixed content warnings.

4.3.2.3.  Recommendations

   [[ OPEN ISSUE: What are the best UI recommendations to make?
   Proposal: take the text from [I-D.kaufman-rtcweb-security-ui]
   Section 2]]

   [[ OPEN ISSUE: Exactly what combination of media security
   primitives should be specified and/or mandatory to implement?  In
   particular, should we allow DTLS-SRTP only, or both DTLS-SRTP and
   SDES?
   Should we allow RTP for backward compatibility? ]]

5.  Security Considerations

   This entire document is about security.

6.  Acknowledgements

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2.  Informative References

   [CORS]     van Kesteren, A., "Cross-Origin Resource Sharing".

   [I-D.abarth-origin]
              Barth, A., "The Web Origin Concept",
              draft-abarth-origin-09 (work in progress), November
              2010.

   [I-D.ietf-hybi-thewebsocketprotocol]
              Fette, I. and A. Melnikov, "The WebSocket protocol",
              draft-ietf-hybi-thewebsocketprotocol-15 (work in
              progress), September 2011.

   [I-D.kaufman-rtcweb-security-ui]
              Kaufman, M., "Client Security User Interface
              Requirements for RTCWEB",
              draft-kaufman-rtcweb-security-ui-00 (work in progress),
              June 2011.

   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G.,
              Johnston, A., Peterson, J., Sparks, R., Handley, M.,
              and E. Schooler, "SIP: Session Initiation Protocol",
              RFC 3261, June 2002.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

   [RFC3760]  Gustafson, D., Just, M., and M. Nystrom, "Securely
              Available Credentials (SACRED) - Credential Server
              Framework", RFC 3760, April 2004.

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

   [RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer
              Security", RFC 4347, April 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for
              Media Streams", RFC 4568, July 2006.
   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC5479]  Wing, D., Fries, S., Tschofenig, H., and F. Audet,
              "Requirements and Analysis of Media Security Management
              Protocols", RFC 5479, April 2009.

   [RFC5763]  Fischl, J., Tschofenig, H., and E. Rescorla, "Framework
              for Establishing a Secure Real-time Transport Protocol
              (SRTP) Security Context Using Datagram Transport Layer
              Security (DTLS)", RFC 5763, May 2010.

   [RFC6189]  Zimmermann, P., Johnston, A., and J. Callas, "ZRTP:
              Media Path Key Agreement for Unicast Secure RTP",
              RFC 6189, April 2011.

   [abarth-rtcweb]
              Barth, A., "Prompting the user is security failure",
              RTC-Web Workshop.

   [cranor-wolf]
              Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N.,
              and L. Cranor, "Crying Wolf: An Empirical Study of SSL
              Warning Effectiveness", Proceedings of the 18th USENIX
              Security Symposium, 2009.

   [farus-conversion]
              Farrus, M., Erro, D., and J. Hernando, "Speaker
              Recognition Robustness to Voice Conversion".

   [finer-grained]
              Barth, A. and C. Jackson, "Beware of Finer-Grained
              Origins", W2SP, 2008.

   [huang-w2sp]
              Huang, L-S., Chen, E., Barth, A., Rescorla, E., and C.
              Jackson, "Talking to Yourself for Fun and Profit",
              W2SP, 2011.

   [kain-conversion]
              Kain, A. and M. Macon, "Design and Evaluation of a
              Voice Conversion Algorithm based on Spectral Envelope
              Mapping and Residual Prediction", Proceedings of
              ICASSP, May 2001.

   [whitten-johnny]
              Whitten, A. and J. Tygar, "Why Johnny Can't Encrypt: A
              Usability Evaluation of PGP 5.0", Proceedings of the
              8th USENIX Security Symposium, 1999.

Author's Address

   Eric Rescorla
   RTFM, Inc.
   2064 Edgewood Drive
   Palo Alto, CA 94303
   USA

   Phone: +1 650 678 2350
   Email: ekr@rtfm.com