idnits 2.17.1 draft-ietf-uri-url-07.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2023-02-06) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand corner of the first page ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. 
** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 1080 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 3 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** There are 2 instances of lines with control characters in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document date (September 7, 1994) is 10379 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '1' is defined on line 966, but no explicit reference was found in the text == Unused Reference: '3' is defined on line 978, but no explicit reference was found in the text == Unused Reference: '4' is defined on line 983, but no explicit reference was found in the text == Unused Reference: '5' is defined on line 987, but no explicit reference was found in the text == Unused Reference: '7' is defined on line 997, but no explicit reference was found in the text == Unused Reference: '8' is defined on line 1001, but no explicit reference was found in the text == Unused Reference: '9' is defined on line 1005, but no explicit reference was found in the text == Unused Reference: '11' is defined on line 1012, but no explicit reference was found in the text == Unused Reference: '12' is defined on line 1017, but no explicit reference was found in the text == Unused Reference: '13' is defined on line 1022, but no explicit reference was found in the text == Unused Reference: '15' is defined on line 1030, but no explicit reference was found in the text == Unused Reference: '16' is defined on line 1034, but no explicit reference was found in the text == Unused Reference: '19' is defined on line 1047, but no explicit reference was found in the text ** Downref: Normative reference to an Informational RFC: RFC 1436 (ref. '1') -- Possible downref: Non-RFC (?) normative reference: ref. '2' ** Downref: Normative reference to an Informational RFC: RFC 1630 (ref. '3') -- Possible downref: Non-RFC (?) normative reference: ref. '4' ** Obsolete normative reference: RFC 822 (ref. '5') (Obsoleted by RFC 2822) -- Possible downref: Non-RFC (?) normative reference: ref. 
'6' ** Downref: Normative reference to an Informational RFC: RFC 1635 (ref. '7') ** Obsolete normative reference: RFC 1036 (ref. '8') (Obsoleted by RFC 5536, RFC 5537) -- Possible downref: Non-RFC (?) normative reference: ref. '9' -- Possible downref: Non-RFC (?) normative reference: ref. '10' ** Obsolete normative reference: RFC 977 (ref. '11') (Obsoleted by RFC 3977) -- Possible downref: Non-RFC (?) normative reference: ref. '12' -- Possible downref: Non-RFC (?) normative reference: ref. '14' -- Possible downref: Non-RFC (?) normative reference: ref. '16' ** Downref: Normative reference to an Informational RFC: RFC 1625 (ref. '17') -- Possible downref: Non-RFC (?) normative reference: ref. '18' -- Possible downref: Non-RFC (?) normative reference: ref. '19' Summary: 20 errors (**), 0 flaws (~~), 15 warnings (==), 11 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Uniform Resource Locators T. Berners-Lee 2 draft-ietf-uri-url-07.txt L. Masinter 3 Expires March 13, 1995 M. McCahill 4 Editors 5 September 7, 1994 7 Uniform Resource Locators (URL) 9 Status of this memo 11 This document is an Internet-Draft. Internet-Drafts are 12 working documents of the Internet Engineering Task Force 13 (IETF), its areas, and its working groups. Note that other 14 groups may also distribute working documents as 15 Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six 18 months. Internet-Drafts may be updated, replaced, or obsoleted 19 by other documents at any time. 
It is not appropriate to use 20 Internet-Drafts as reference material or to cite them other 21 than as a ``working draft'' or ``work in progress.'' 23 To learn the current status of any Internet-Draft, please check 24 the 1id-abstracts.txt listing contained in the Internet-Drafts 25 Shadow Directories on ds.internic.net, nic.nordu.net, 26 ftp.isi.edu, or munnari.oz.au. 28 This Internet Draft expires April 7, 1995. 30 0. Abstract 32 This document specifies a Uniform Resource Locator (URL), the 33 syntax and semantics of formalized information for location and 34 access of resources via the Internet. 36 1. Introduction 38 The specification of a Uniform Resource Locator, defined in this 39 document, is derived from concepts introduced by the World-Wide Web 40 global information initiative, whose use of such objects dates from 41 1990 and is described in "Universal Resource Identifiers in WWW", 42 RFC 1630. The specification is based on the "Functional 43 Requirements for Internet Resource Locators"[12]. 45 This document was written by the URI working group of the Internet 46 Engineering Task Force. Comments may be addressed to the editor, 47 Tim Berners-Lee , or to the URI-WG 48 . Discussions of the group are archived at 49 51 2. Definitions 53 This document describes the syntax for "Uniform Resource Locators" 54 (URLs): a compact representation of the location and access method 55 for a resource available via the Internet. Just as there are many 56 different methods of access to resources, there are several 57 _schemes_ for describing the location of such resources. 59 The generic syntax provides a framework for new URL schemes to be 60 established using as yet undefined protocols. 62 URLs are used to `locate' resources, by providing an abstract 63 identification of the resource location. 
Having located a 64 resource, a system may perform a variety of operations on the 65 resource, as might be characterized by such words as `access', 66 `update', `replace', `find attributes'. In general, only the 67 `access' method needs to be specified for any URL scheme. 69 2.1. URL SYNTAX 71 URLs are written as follows: 73 : 75 A URL contains the name of the scheme being used () 76 followed by a colon and then a string (the ) 77 whose interpretation depends on the scheme. 79 Scheme names consist of lower case letters "a"--"z", digits, and 80 the characters plus ("+"), period ("."), and hyphen ("-"). For 81 resiliency, programs interpreting URLs should treat upper case 82 letters as equivalent to lower case in scheme names (e.g., allow 83 "HTTP" as well as "http"). 85 A BNF description of the URL syntax is given in Section 5. 87 2.2. Encoding of reserved and unsafe characters 89 URLs are represented as a sequence of characters taken from the NVT 90 ASCII character set. 92 Characters and other 8-bit bytes are _encoded_ by representing them 93 as a percent sign "%" followed by two hexadecimal digits (0-9, 94 A-F). 96 In any circumstance, only printable ASCII characters are allowed in 97 URLs: URLs may not contain space or other non-printable characters. 98 If it is necessary to designate a byte in a component of a URL that 99 would otherwise be represented by a space or a non-printable 100 character, it is necessary to represent that byte encoded. 102 There are a number of printable ASCII characters whose use in URLs 103 is _unsafe_; characters can be unsafe for a number of reasons. The 104 characters "<" and ">" are unsafe because they are used as the 105 delimiters around URLs in free text; the quote mark (""") is used 106 to delimit URLs in some systems. The character "#" is unsafe and 107 should always be encoded because it is used in World Wide Web and 108 in other systems to delimit a URL from a fragment/anchor identifier 109 that might follow it. 
The character "%" is unsafe because it is 110 used for encodings of other characters. Other characters are 111 unsafe because gateways and other transport agents are sometimes 112 known to modify such characters. 114 All unsafe characters should always be encoded within a URL. For 115 example, the character "#" should be encoded within URLs even in 116 systems that do not normally deal with fragment or anchor 117 identifiers, so that if the URL is copied into another system that 118 does use them, it will not be necessary to change the URL encoding. 120 In general, only alphanumerics, reserved characters used for their 121 reserved purposes, "$", "-", "_", ".", and "+" may be used 122 unencoded. 124 On the other hand, even safe characters such as alphanumerics _may_ 125 be encoded, as long as they are not being used for a reserved 126 purpose. 128 Many URL schemes reserve certain characters for a special meaning; 129 their appearance in the scheme-specific part of the URL has a 130 designated semantics. If it is necessary to designate a byte in a 131 component of a URL that would otherwise be represented by a 132 reserved character, it is necessary to represent that byte encoded. 133 The characters ";", "/", "?", ":", "@", "=" and "&" are the 134 characters which may be reserved for special meaning within a 135 scheme. No other characters may be reserved within a scheme. 137 Usually, a URL has the same interpretation when a byte is 138 represented by a character and when it is represented by its hex 139 encoding. However, this is not true for reserved characters: 140 encoding a reserved character for a particular scheme may change 141 the semantics of a URL. 143 2.3 Hierarchical schemes and relative links 145 In some cases, URLs are used to locate resources that contain 146 pointers to _other_ resources. 
In some cases, those pointers are 147 represented as _relative links_ where the expression of the 148 location of the second resource is in terms of "in the same place 149 as this one except with the following relative path". Relative 150 links are not described in this document. However, the use of 151 relative links depends on the original URL containing a 152 hierarchical structure against which the relative link is based. 154 Some URL schemes (such as the ftp, http, and file schemes) contain 155 names that can be considered hierarchical; the components of the 156 hierarchy are separated by "/". 158 3. Specific Schemes 160 The mapping for some existing standard and experimental protocols 161 is outlined in the BNF syntax definition. Notes on particular 162 protocols follow. The schemes covered are: 164 ftp File Transfer protocol 165 http Hypertext Transfer Protocol 166 gopher The Gopher protocol 167 mailto Electronic mail address 168 news USENET news 169 nntp USENET news using NNTP access 170 telnet Reference to interactive sessions 171 wais Wide Area Information Servers 172 file Host-specific file names 173 prospero Prospero Directory Service 175 Other schemes may be specified by future specifications. Section 4 176 of this document describes how new schemes may be registered, and 177 lists some scheme names that are under development. 179 3.1. Common Internet Scheme Syntax 181 While the syntax for the rest of the URL may vary depending on the 182 particular scheme selected, URL schemes that involve the direct use 183 of an IP-based protocol to a specified host on the Internet use a 184 common syntax for the initial part of the scheme-specific data: 186 //:@: 187 or 188 //:@:/ 190 This initial part starts with a double slash "//" to indicate its 191 presence, and continues until the following slash "/", if any. 192 Within this section are: 194 user 195 An optional user name. Some schemes (e.g., ftp) allow the 196 specification of a user name. 
198 password 199 An optional password. If present, it follows the user 200 name separated from it by a colon. 202 The user name (and password), if present, are followed by a 203 commercial at-sign "@". Within the user and password field, any 204 ":", "@", or "/" must be encoded. 206 Note that an empty user name or password is different than no user 207 name or password; there is no way to specify a password without 208 specifying a user name. E.g., has an empty 209 user name and no password, has no user name, 210 while has a user name of "foo" and an 211 empty password. 213 host 214 The fully qualified domain name of a network host, or its IP 215 address as a set of four decimal numbers separated by periods. 216 Fully qualified domain names take the form as described in 217 Section 3.5 of RFC 1034: a sequence of parts separated by 218 periods. 220 port 221 The port number to connect to. Most schemes designate 222 protocols that have a default port number. Another port number 223 may optionally be supplied, in decimal, separated from the 224 host by a colon. If the port is omitted, the colon is as well. 226 url-path 227 The rest of the locator consists of data specific to the 228 scheme, and is known as the "url-path". It supplies the 229 details of how the specified resource can be accessed. Note 230 that the "/" between the host (or port) and the url-path is 231 NOT part of the url-path. 233 The url-path syntax depends on the scheme being used, as does the 234 manner in which it is interpreted. 236 3.2. FTP 238 The FTP URL scheme is used to designate files and directories on 239 Internet hosts accessible using the FTP protocol (RFC 959). 241 A FTP URL follows the syntax described in Section 3.1. If : 242 is omitted, the port defaults to 21. 244 3.2.1. FTP Name and Password 246 A user name and password may be supplied; they are used in the ftp 247 "USER" and "PASS" commands after first making the connection to the 248 FTP server.
If no user name or password is supplied and one is 249 requested by the FTP server, the conventions for "anonymous" FTP 250 are to be used, as follows: 252 The user name "anonymous" is supplied. 254 The password is supplied as the Internet e-mail address 255 of the end user accessing the resource. 257 If the URL supplies a user name but no password, and the remote 258 server requests a password, the program interpreting the FTP URL 259 should request one from the user. 261 3.2.2. FTP url-path 263 The url-path of a FTP URL has the following syntax: 265 //...//;type= 267 Where through and are (possibly encoded) 268 strings and is one of the characters "a", "i", or "d". 270 The url-path is interpreted as a series of FTP commands as follows: 272 Each of the elements is to be supplied, sequentially, as 273 the argument to a CWD (change working directory) command. 275 If the typecode is "d", perform a NLST (name list) command with 276 as the argument, and interpret the results as a file 277 directory listing. 279 Otherwise, perform a TYPE command with as the 280 argument, and then access the file whose name is (for 281 example, using the RETR command.) 283 Within a name or CWD component, the characters "/" and ";" are 284 reserved and must be encoded. The components are decoded prior to 285 their use in the FTP protocol. In particular, if the appropriate 286 FTP sequence to access a particular file requires supplying a 287 string containing a "/" as an argument to a CWD or RETR command, it 288 is necessary to encode each "/" as %2F. 290 For example, the URL is 291 interpreted by FTP-ing to "host.dom", logging in as "myname" 292 (prompting for a password if it is asked for), and then executing 293 "CWD /etc" and then "RETR motd". This has a different meaning from 294 which would "CWD etc" and then 295 "RETR motd"; the initial "CWD" might be executed relative to the 296 default directory for "myname". 
On the other hand, 297 , would "CWD " with a null 298 argument, then "CWD etc", and then "RETR motd". 300 FTP URLs may also be used for other operations; for example, it is 301 possible to update a file on a remote file server, or infer 302 information about it from the directory listings. The mechanism for 303 doing so is not spelled out here. 305 3.2.3. FTP Typecode is Optional 307 The entire ;type= part of a FTP URL is optional. If it is 308 omitted, the client program interpreting the URL must guess the 309 appropriate mode to use. In general, the data content type of a 310 file can only be guessed from the name, e.g., from the suffix of 311 the name; the appropriate type code to be used for transfer of the 312 file can then be deduced from the data content of the file. 314 3.2.4 Hierarchy 316 For some file systems, the "/" used to denote the hierarchical 317 structure of the URL corresponds to the delimiter used to construct 318 a file name hierarchy, and thus, the filename will look similar to 319 the URL path. This does NOT mean that the URL is a Unix filename. 321 3.2.5. Optimization 323 Clients accessing resources via FTP may employ additional 324 heuristics to optimize the interaction. For some FTP servers, for 325 example, it may be reasonable to keep the control connection open 326 while accessing multiple URLs from the same server. However, there 327 is no common hierarchical model to the FTP protocol, so if a 328 directory change command has been given, it is impossible in 329 general to deduce what sequence should be given to navigate to 330 another directory for a second retrieval, if the paths are 331 different. The only reliable algorithm is to disconnect and 332 reestablish the control connection. 334 3.3. HTTP 336 The HTTP URL scheme is used to designate Internet resources 337 accessible using HTTP (HyperText Transfer Protocol). 339 The HTTP protocol is specified elsewhere. This specification only 340 describes the syntax of HTTP URLs. 
342 An HTTP URL takes the form: 344 http://:/? 346 where and are as described in Section 3.1. If : 347 is omitted, the port defaults to 80. No user name or password is 348 allowed. is an HTTP selector, and is a query 349 string. The is optional, as is the and its 350 preceding "?". If neither nor is present, the 351 "/" may also be omitted. 353 Within the and components, "/", ";", and "?" are 354 reserved. The "/" character may be used within HTTP to designate a 355 hierarchical structure. 357 3.4. GOPHER 359 The Gopher URL scheme is used to designate Internet resources 360 accessible using the Gopher protocol. 362 The base Gopher protocol is described in RFC 1436 and supports 363 items and collections of items (directories). The Gopher+ protocol 364 is a set of upward-compatible extensions to the base Gopher 365 protocol and is described in [2]. Gopher+ supports associating 366 arbitrary sets of attributes and alternate data representations 367 with Gopher items. Gopher URLs accommodate both Gopher and Gopher+ 368 items and item attributes. 370 3.4.1. Gopher URL syntax 372 A Gopher URL takes the form: 374 gopher://:/ 376 where is one of 378 379 %09 380 %09 381 %09%09 383 If : is omitted, the port defaults to 70. is a 384 single-character field to denote the Gopher type of the resource to 385 which the URL refers. The entire may also be empty, 386 in which case the delimiting "/" is also optional and the 387 defaults to "1". 389 is the Gopher selector string. In the Gopher protocol, 390 Gopher selector strings are a sequence of 8-bit bytes which may 391 contain any characters other than tab, return, or linefeed. Gopher 392 clients specify which item to retrieve by sending the Gopher 393 selector string to a Gopher server. 395 Within the , no additional characters have a reserved 396 interpretation. 398 Note that some Gopher strings begin with a copy of the 399 character, in which case that character will occur 400 twice consecutively.
The Gopher selector string may be an empty 401 string; this is how Gopher clients refer to the top-level directory 402 on a Gopher server. 404 3.4.2 Specifying URLs for Gopher Search Engines 406 If the URL refers to a search to be submitted to a Gopher search 407 engine, the selector is followed by an encoded tab (%09) and the 408 search string. To submit a search to a Gopher search engine, the 409 Gopher client sends the string (after decoding), a tab, 410 and the search string to the Gopher server. 412 3.4.3 URL syntax for Gopher+ items 414 URLs for Gopher+ items have a second encoded tab (%09) and a 415 Gopher+ string. Note that in this case, the %09 string must 416 be supplied, although the element may be the empty string. 418 The is used to represent information required for 419 retrieval of the Gopher+ item. Gopher+ items may have alternate 420 views, arbitrary sets of attributes, and may have electronic forms 421 associated with them. 423 To retrieve the data associated with a Gopher+ URL, a client will 424 connect to the server and send the Gopher selector, followed 425 optionally by a tab and the search string (if the element 426 is not empty), followed by a tab and the Gopher+ commands. 428 3.4.4 Default Gopher+ data representation 430 When a Gopher server returns a directory listing to a client, the 431 Gopher+ items are tagged with either a "+" (denoting Gopher+ items) 432 or a "?" (denoting Gopher+ items which have a +ASK form associated 433 with them). A Gopher URL with a Gopher+ string consisting of only 434 a "+" refers to the default view (data representation) of the item, 435 while a Gopher+ string containing only a "?" refers to an item with 436 a Gopher electronic form associated with it. 438 3.4.5 Gopher+ items with electronic forms 440 Gopher+ items which have a +ASK associated with them (i.e.
Gopher+ 441 items tagged with a "?") require the client to fetch the item's 442 +ASK attribute to get the form definition, and then ask the user to 443 fill out the form and return the user's responses along with the 444 selector string to retrieve the item. Gopher+ clients know how to 445 do this but depend on the "?" tag in the Gopher+ item description 446 to know when to handle this case. The "?" is used in the Gopher+ 447 string to be consistent with the Gopher+ protocol's use of this symbol. 449 3.4.6 Gopher+ item attribute collections 451 To refer to the Gopher+ attributes of an item, the Gopher URL's 452 Gopher+ string consists of "!" or "$". "!" refers to all of a 453 Gopher+ item's attributes. "$" refers to all the item attributes for 454 all items in a Gopher directory. 456 3.4.7 Referring to specific Gopher+ attributes 458 To refer to specific attributes, the URL's gopher+_string is 459 "!attribute_name" or "$attribute_name". For example, to refer to 460 the attribute containing the abstract of an item, the 461 gopher+_string would be "!+ABSTRACT". 463 To refer to several attributes, the gopher+_string consists of 464 the attribute names separated by coded spaces. For example, 465 "!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes 466 of an item. 468 3.4.8 URL syntax for Gopher+ alternate views 470 Gopher+ allows for optional alternate data representations 471 (alternate views) of items. To retrieve a Gopher+ alternate view, 472 a Gopher+ client sends the appropriate view and language 473 identifier (found in the item's +VIEW attribute). To refer to a 474 specific Gopher+ alternate view, the URL's Gopher+ string would 475 be in the form: 477 +view_name%20language_name 479 For example, a Gopher+ string of "+application/postscript%20Es_ES" 480 refers to the Spanish-language PostScript alternate view of a 481 Gopher+ item.
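As an illustration of Sections 3.4.2 through 3.4.8, the Gopher+ strings and the tab-separated layout of a Gopher URL can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the specification: the names encode, attribute_string, view_string and gopher_url, and the host gopher.example.org, are hypothetical, and leaving "/" unencoded follows the draft's own examples rather than the list of safe characters in Section 2.2.

```python
import string

# Characters Section 2.2 lists as usable unencoded (besides reserved ones).
SAFE = set(string.ascii_letters + string.digits + "$-_.+")


def encode(text, keep=""):
    """Percent-encode every 8-bit byte that is not in SAFE or `keep`."""
    allowed = SAFE | set(keep)
    return "".join(
        chr(b) if chr(b) in allowed else f"%{b:02X}"
        for b in text.encode("latin-1")
    )


def attribute_string(names, whole_directory=False):
    """Gopher+ string for attributes: "!" for one item, "$" for a directory."""
    prefix = "$" if whole_directory else "!"
    # Attribute names are separated by coded spaces (%20).
    return prefix + encode(" ".join(names), keep="/")


def view_string(view_name, language_name):
    """Gopher+ string of the form +view_name%20language_name."""
    return "+" + encode(f"{view_name} {language_name}", keep="/")


def gopher_url(host, gophertype, selector, search=None, gopher_plus=None, port=70):
    """Assemble a Gopher URL; `gopher_plus` must already be encoded."""
    netloc = host if port == 70 else f"{host}:{port}"
    url = f"gopher://{netloc}/{gophertype}{encode(selector, keep='/')}"
    # The first %09 is required whenever a Gopher+ string follows,
    # even if the search string itself is empty.
    if search is not None or gopher_plus is not None:
        url += "%09" + encode(search or "")
    if gopher_plus is not None:
        url += "%09" + gopher_plus
    return url
```

With these definitions, attribute_string(["+ABSTRACT", "+SMELL"]) gives "!+ABSTRACT%20+SMELL" and view_string("application/postscript", "Es_ES") gives "+application/postscript%20Es_ES", matching the examples above.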
483 3.4.9 URL syntax for Gopher+ electronic forms 485 The gopher+_string for a URL that refers to an item referenced by 486 a Gopher+ electronic form (an ASK block) filled out with specific 487 values is a coded version of what the client sends to the server. 488 The gopher+_string is of the form: 490 +%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A 492 To retrieve this item, the Gopher client sends: 494 a_gopher_selector+1 495 +-1 496 ask_item1_value 497 ask_item2_value 498 . 500 to the Gopher server. 502 3.5. MAILTO 504 The mailto URL scheme is used to designate the Internet mailing 505 address of an individual or service. No additional information 506 other than an Internet mailing address is present or implied. 508 A mailto URL takes the form: 510 mailto: 512 where is (the encoding of an) addr-spec, as 513 specified in RFC 822. Within mailto URLs, no additional characters 514 are reserved within the component. 516 Note that the percent sign ("%") is commonly used within RFC 822 517 addresses and must be URL-encoded. 519 Unlike many URLs, the mailto scheme does not represent a data 520 object to be accessed directly; there is no sense in which it 521 designates an object. It has a different use than the 522 message/external-body type in MIME. 524 3.6. NEWS 526 The news URL scheme is used to refer to either news groups or 527 individual articles of USENET news, as specified in RFC 1036. 529 A news URL takes one of two forms: 531 news: 532 news: 534 A is a period-delimited hierarchical name, such as 535 "comp.infosystems.www.misc". A corresponds to the 536 Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<" 537 and ">"; it takes the form @. A message 538 identifier may be distinguished from a news group name by the 539 presence of the commercial at "@" character. No additional 540 characters are reserved within the components of a news URL. 542 If is "*" (as in ), it is used to 543 refer to "all available news groups". 
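The rule for telling the two forms of news URL apart can be sketched in Python. This is only an illustrative sketch: the function name classify_news_url and the returned labels are hypothetical, and the case-insensitive scheme match follows the recommendation in Section 2.1.

```python
def classify_news_url(url):
    """Return (kind, value) for a news URL, where kind is one of
    "all-groups", "article" or "group" (labels chosen for this sketch)."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "news":
        raise ValueError("not a news URL")
    if rest == "*":
        # "*" refers to all available news groups.
        return ("all-groups", rest)
    if "@" in rest:
        # The commercial at sign marks a message identifier.
        return ("article", rest)
    return ("group", rest)
```

For example, classify_news_url("news:comp.infosystems.www.misc") is classified as a group, while any value containing "@" is treated as an article.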
545 News URLs are unusual in that, by themselves, they do not 546 contain sufficient information to locate a single resource, but, 547 rather, are location-independent. 549 3.7. NNTP 551 The nntp URL scheme is an alternative method of referencing news 552 articles, useful for specifying news articles from NNTP servers 553 (RFC 977). 555 A nntp URL takes the form: 557 nntp://:// 559 where and are as described in Section 3.1. If : 560 is omitted, the port defaults to 119. 562 The is the name of the group, while the 563 is the numeric id of the article within that 564 newsgroup. 566 Note that while nntp: URLs specify a unique location for the 567 article resource, most NNTP servers currently on the Internet 568 are configured only to allow access from local clients, and thus 569 nntp URLs do not designate globally accessible resources. Thus, the 570 news: form of URL is preferred as a way of identifying news 571 articles. 573 3.8. TELNET 575 The Telnet URL scheme is used to designate interactive services 576 that may be accessed by the Telnet protocol. 578 A telnet URL takes the form: 580 telnet://:@: [ / ] 582 as specified in Section 3.1. The port defaults to 23; the 583 and segments are completely optional (a 584 requires a element.) 586 This URL does not designate a data object, but rather an 587 interactive service. Remote interactive services vary widely in the 588 means by which they allow remote logins; in practice, the 589 and supplied are advisory only: clients accessing a 590 telnet URL merely advise the user of the suggested username and 591 password. 593 3.9. WAIS 595 The WAIS URL scheme is used to designate WAIS databases, searches, 596 or individual documents available from a WAIS database. WAIS is 597 described in [6]; the WAIS protocol is described in RFC 1625 [17]. 599 A WAIS URL takes one of the following forms: 601 wais://:/ 602 wais://:/? 603 wais://:/// 605 where and are as described in Section 3.1.
If : 606 is omitted, the port defaults to 210. The first form designates a 607 WAIS database that is available for searching. The second form 608 designates a particular search. is the name of the WAIS 609 database being queried. 611 The third form designates a particular document within a WAIS 612 database to be retrieved. In this form is the WAIS 613 designation of the type of the object. Many WAIS implementations 614 require that a client know the "type" of an object prior to 615 retrieval, the type being returned along with the internal object 616 identifier in the search response. The is included in the 617 URL in order to allow the client interpreting the URL adequate 618 information to actually retrieve the document. 620 The of a WAIS URL consists of the WAIS document-id, encoded 621 as necessary using the method described in Section 2.2. The WAIS 622 document-id should be treated opaquely; it may only be decomposed 623 by the server that issued it. 625 3.10 FILES 627 The file URL scheme is used to designate files accessible on 628 a particular host computer. This scheme, unlike most other 629 URL schemes, does not designate a resource that is universally 630 accessible over the Internet. 632 A file URL takes the form: 634 file:/// 636 where is the fully qualified domain name of the system on 637 which the is accessible, and is a hierarchical 638 directory path of the form //.../. 640 For example, a VMS file 642 DISK$USER:[MY.NOTES]NOTE123456.TXT 644 might become 646 648 As a special case, can be the string "localhost" or the 649 empty string; this is interpreted as `the machine from which the 650 URL is being interpreted'. 652 The file URL scheme is unusual in that it does not specify an 653 Internet protocol or access method for such files; as such, its 654 utility in network protocols between hosts is limited. 656 3.11 PROSPERO 658 The Prospero URL scheme is used to designate resources that are 659 accessed via the Prospero Directory Service. 
   The Prospero protocol is described elsewhere [14].

   A prospero URL takes the form:

      prospero://<host>:<port>/<hsoname>;<field>=<value>

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 1525. No username or password is
   allowed.

   The <hsoname> is the host-specific object name in the Prospero
   protocol, suitably encoded. This name is opaque and interpreted by
   the Prospero server. The semicolon ";" is reserved and may not
   appear without quoting in the <hsoname>.

   Prospero URLs are interpreted by contacting a Prospero directory
   server on the specified host and port to determine appropriate
   access methods for a resource, which might themselves be
   represented as different URLs. External Prospero links are
   represented as URLs of the underlying access method and are not
   represented as Prospero URLs.

   Note that a slash "/" may appear in the <hsoname> without quoting
   and no significance may be assumed by the application. Though
   slashes may indicate hierarchical structure on the server, such
   structure is not guaranteed. Note that many <hsoname>s begin with a
   slash, in which case the host or port will be followed by a double
   slash: the slash from the URL syntax, followed by the initial slash
   from the <hsoname>. (E.g., designates a <hsoname> of "/pros/name".)

   In addition, after the <hsoname>, optional fields and values
   associated with a Prospero link may be specified as part of the
   URL. When present, each field/value pair is separated from each
   other and from the rest of the URL by a ";" (semicolon). The name
   of the field and its value are separated by a "=" (equal sign). If
   present, these fields serve to identify the target of the URL. For
   example, the OBJECT-VERSION field can be specified to identify a
   specific version of an object.

4. REGISTRATION OF NEW SCHEMES

   A new scheme may be introduced by defining a mapping onto a
   conforming URL syntax, using a new prefix.
   Experimental prefixes may be used by mutual agreement between
   parties. Scheme names starting with the characters "x-" are
   reserved for experimental purposes.

   The Internet Assigned Numbers Authority (IANA) will maintain a
   registry of URL schemes. Any submission of a new URL scheme must
   include a definition of an algorithm for accessing resources
   within that scheme and the syntax for representing such a scheme.

   URL schemes must have demonstrable utility and operability. One
   way to provide such a demonstration is via a gateway which provides
   objects in the new scheme for clients using an existing protocol.
   If the new scheme does not locate resources that are data objects,
   the properties of names in the new space must be clearly defined.

   New schemes should try to follow the same syntactic conventions as
   existing schemes, where appropriate. It is likewise recommended
   that, where a protocol allows for retrieval by URL, the client
   software have provision for being configured to use specific
   gateway locators for indirect access through new naming schemes.

   The following schemes have been proposed at various times, but this
   document does not define their syntax or use at this time. It is
   suggested that IANA reserve their scheme names for future
   definition:

      afs              Andrew File System global file names.
      mid              Message identifiers for electronic mail.
      cid              Content identifiers for MIME body parts.
      nfs              Network File System (NFS) file names.
      tn3270           Interactive 3270 emulation sessions.
      mailserver       Access to data available from mail servers.
      z39.50           Access to ANSI Z39.50 services.

5. BNF for specific URL schemes

   This is a BNF-like description of the Uniform Resource Locator
   syntax, using the conventions of RFC822, except that "|" is used to
   designate alternatives, and brackets [] are used around optional or
   repeated elements.
   Briefly, literals are quoted with "", optional elements are
   enclosed in [brackets], and elements may be preceded with <n>* to
   designate n or more repetitions of the following element; n
   defaults to 0.

   ; The generic form of a URL is:

   genericurl     = scheme ":" schemepart

   ; Specific predefined schemes are defined here; new schemes
   ; may be registered with IANA

   url            = httpurl | ftpurl | newsurl |
                    nntpurl | telneturl | gopherurl |
                    waisurl | mailtourl | fileurl |
                    prosperourl | otherurl

   ; new schemes follow the general syntax
   otherurl       = genericurl

   ; the scheme is in lower case; interpreters should use case-ignore
   scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]
   schemepart     = *xchar | ip-schemepart

   ; URL schemeparts for ip based protocols:

   ip-schemepart  = "//" login [ "/" urlpath ]

   login          = [ user [ ":" password ] "@" ] hostport
   hostport       = host [ ":" port ]
   host           = hostname | hostnumber
   hostname       = alpha *uchar
   hostnumber     = digits "." digits "." digits "." digits
   port           = digits
   user           = *[ uchar | ";" | "?" | "&" | "=" ]
   password       = *[ uchar | ";" | "?" | "&" | "=" ]
   urlpath        = *xchar    ; depends on protocol see section 3.1

   ; The predefined schemes:

   ; FTP (see also RFC959)

   ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
   fpath          = fsegment *[ "/" fsegment ]
   fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
   ftptype        = "A" | "I" | "D" | "a" | "i" | "d"

   ; FILE

   fileurl        = "file://" host [ "/" fpath ]

   ; HTTP

   httpurl        = "http://" hostport [ "/" hpath [ "?"
                    search ]]
   hpath          = hsegment *[ "/" hsegment ]
   hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
   search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

   ; GOPHER (see also RFC1436)

   gopherurl      = "gopher://" hostport [ "/" [ gtype [ selector
                    [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
   gtype          = xchar
   selector       = *xchar
   gopher+_string = *xchar

   ; MAILTO (see also RFC822)

   mailtourl      = "mailto:" encoded822addr
   encoded822addr = 1*xchar               ; further defined in RFC822

   ; NEWS (see also RFC1036)

   newsurl        = "news:" grouppart
   grouppart      = "*" | group | article
   group          = alpha *[ alpha | digit | "-" | "." ]
   article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ]
                    "@" host

   ; NNTP (see also RFC977)

   nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

   ; TELNET

   telneturl      = "telnet://" login [ "/" ]

   ; WAIS (see also RFC1625)

   waisurl        = waisdatabase | waisindex | waisdoc
   waisdatabase   = "wais://" hostport "/" database
   waisindex      = "wais://" hostport "/" database "?" search
   waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath
   database       = *uchar
   wtype          = *uchar
   wpath          = *uchar

   ; PROSPERO

   prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]
   ppath          = psegment *[ "/" psegment ]
   psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
   fieldspec      = ";" fieldname "=" fieldvalue
   fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]
   fieldvalue     = *[ uchar | "?"
                    | ":" | "@" | "&" ]

   ; Miscellaneous definitions

   lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                    "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                    "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                    "y" | "z"
   hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                    "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                    "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
   alpha          = lowalpha | hialpha
   digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                    "8" | "9"
   safe           = "$" | "-" | "_" | "." | "+"
   extra          = "!" | "*" | "'" | "(" | ")" | "," | "="
   national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]"
   punctuation    = "<" | ">" | """ | "#"
   reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
   hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                    "a" | "b" | "c" | "d" | "e" | "f"
   escape         = "%" hex hex

   unreserved     = alpha | digit | safe | extra | national
   uchar          = unreserved | escape
   xchar          = unreserved | reserved | escape
   digits         = 1*digit

6. Security considerations

   The URL scheme does not in itself pose a security threat. Users
   should beware that there is no general guarantee that a URL which
   at one time points to a given object continues to do so, and does
   not even at some later time point to a different object due to the
   movement of objects on servers.

   A URL-related security threat is that it is sometimes possible to
   construct a URL such that an attempt to perform a harmless
   idempotent operation such as the retrieval of the object will in
   fact cause a possibly damaging remote operation to occur. The
   unsafe URL is typically constructed by specifying a port number
   other than that reserved for the network protocol in question. The
   client unwittingly contacts a server which is in fact running a
   different protocol.
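
   As a rough illustration of the port mismatch described above, a
   client might compare the port in a URL against the default for its
   scheme before connecting. The Python sketch below is not part of
   this specification: the names DEFAULT_PORTS and port_warning are
   hypothetical, and the table is deliberately partial, listing only
   the defaults given in Sections 3.7 through 3.11.

```python
# Illustrative only: default ports as stated in Sections 3.7-3.11 of
# this document. DEFAULT_PORTS and port_warning are hypothetical names.
DEFAULT_PORTS = {
    "nntp": 119,
    "telnet": 23,
    "wais": 210,
    "prospero": 1525,
}


def port_warning(scheme, port):
    """Return a warning string if `port` differs from the scheme default.

    `port` is None when the URL omits the port, which implies the default.
    Returns None when there is nothing to warn about.
    """
    default = DEFAULT_PORTS.get(scheme.lower())
    if default is None:
        return "unknown scheme %r: no default port to compare" % scheme
    if port is None or port == default:
        return None
    return "%s URL uses port %d, not the default %d" % (scheme, port, default)
```

   For instance, port_warning("wais", 25) returns a warning string,
   while port_warning("telnet", None) returns None because an omitted
   port implies the default.
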
   The content of the URL contains instructions
   which when interpreted according to this other protocol cause an
   unexpected operation. An example has been the use of gopher URLs
   to cause a rude message to be sent via a SMTP server. Caution
   should be used when using any URL which specifies a port number
   other than the default for the protocol, especially when it is a
   number within the reserved space.

   Care should be taken when URLs contain embedded encoded delimiters
   for a given protocol (for example, CR and LF characters for telnet
   protocols) that these are not unencoded before transmission. This
   would violate the protocol but could be used to simulate an extra
   operation or parameter, again causing an unexpected and possibly
   harmful remote operation to be performed.

   The use of URLs containing passwords that should be secret is
   clearly unwise.

7. Acknowledgements

   This paper builds on the basic WWW design (RFC 1630) and much
   discussion of these issues by many people on the network. The
   discussion was particularly stimulated by articles by Clifford
   Lynch, Brewster Kahle [10] and Wengyik Yeong [18]. Contributions
   from John Curran, Clifford Neuman, Ed Vielmetti and later the IETF
   URL BOF and URI working group were incorporated.

   Most recently, careful readings and comments by Dan Connolly, Ned
   Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos,
   John Kunze, and many others have helped refine the current draft.

APPENDIX: Recommendations for URLs in Context

   URIs, including URLs, are intended to be transmitted through
   protocols which provide a context for their interpretation.

   In some cases, it will be necessary to distinguish URLs from other
   possible data structures in a syntactic structure. In this case, it
   is recommended that URLs be preceded with a prefix consisting of
   the characters "URL:".
   For example, this prefix may be used to distinguish URLs from other
   kinds of URIs.

   In addition, there are many occasions when URLs are included in
   other kinds of text; examples include electronic mail, USENET news
   messages, and text printed on paper. In such cases, it is
   convenient to have a separate syntactic wrapper that delimits the
   URL and separates it from the rest of the text. For this purpose,
   it is recommended that angle brackets ("<" and ">"), along with the
   prefix "URL:", be used to delimit the boundaries of the URL. This
   wrapper does not form part of the URL and should not be used in
   contexts in which delimiters are already specified.

   In the case where a fragment/anchor identifier is associated with a
   URL (following a "#"), the identifier would be placed within the
   brackets as well.

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.)
   may need to be added to break long URLs across lines. The
   whitespace should be ignored when extracting the URL.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of
   a URL containing a line break immediately after a hyphen should
   ignore all unencoded whitespace around the line break, and should
   be aware that the hyphen may or may not actually be part of the
   URL.

   Examples:

      Yes, Jim, I found it under but you can probably pick it up from .
      Note the warning in .

REFERENCES

   [1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,
       Torrey, D., and Alberti, B., "The Internet Gopher Protocol:
       A distributed document search and retrieval protocol",
       RFC 1436, March 1993.

   [2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
       Johnson, D., and Alberti, B., "Gopher+: Upward compatible
       enhancements to the Internet Gopher protocol",
       University of Minnesota, July 1993.

   [3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
       Unifying Syntax for the Expression of Names and Addresses of
       Objects on the Network as used in the World-Wide Web",
       RFC 1630, June 1994.

   [4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",
       CERN, November 1993.

   [5] Crocker, D. H., "Standard for the Format of ARPA Internet Text
       Messages", RFC 822, April 1982.

   [6] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
       Sui, J., and Grinbaum, M., "WAIS Interface Protocol Prototype
       Functional Specification", (v1.5), Thinking Machines
       Corporation, April 1990.

   [7] Deutsch, P., Emtage, A., and Marine, A., "How to Use Anonymous
       FTP", RFC 1635, May 1994.

   [8] Horton, M. and Adams, R., "Standard For Interchange of USENET
       messages", RFC 1036, December 1987.

   [9] Huitema, C., "Naming: strategies and techniques", Computer
       Networks and ISDN Systems 23 (1991) 107-110.

   [10] Kahle, B., "Document Identifiers, or International Standard
        Book Numbers for the Electronic Age", 1991.

   [11] Kantor, B. and Lapsley, P., "Network News Transfer Protocol:
        A Proposed Standard for the Stream-Based Transmission of News",
        RFC 977, February 1986.

   [12] Kunze, J., "Functional Requirements for Internet Resource
        Locators", Internet-Draft (work in progress), July 1994.

   [13] Mockapetris, P., "Domain Names - Concepts and Facilities",
        RFC 1034, USC-ISI, November 1987.

   [14] Neuman, B.C., and Augart, S., "The Prospero Protocol", USC
        Information Sciences Institute, June 1993.

   [15] Postel, J. and Reynolds, J.K., "File Transfer Protocol (FTP)",
        RFC 959, October 1985.

   [16] Sollins, K. and Masinter, L., "Requirements for Uniform
        Resource Names", Internet-Draft (work in progress).

   [17] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
        Kunze, J., Morris, H., and Schiettecatte, F., "WAIS over
        Z39.50-1988", RFC 1625, June 1994.

   [18] Yeong, W., "Towards Networked Information Retrieval", Technical
        report 91-06-25-01, Performance Systems International, Inc.,
        June 1991.

   [19] Yeong, W., "Representing Public Archives in the Directory",
        Internet Draft, November 1991, now expired.

EDITORS' ADDRESSES

   Tim Berners-Lee
   World-Wide Web project
   CERN,
   1211 Geneva 23,
   Switzerland
   Tel: +41 (22)767 3755
   Fax: +41 (22)767 7155
   Email: timbl@info.cern.ch

   Larry Masinter
   Xerox PARC
   3333 Coyote Hill Road
   Palo Alto, CA 94034
   Tel: (415) 812-4365
   Fax: (415) 812-4333
   Email: masinter@parc.xerox.com

   Mark McCahill
   Computer and Information Services,
   University of Minnesota
   Room 152 Shepherd Labs
   100 Union Street SE
   Minneapolis, MN 55455
   Tel: (612) 625 1300
   Email: mpm@boombox.micro.umn.edu
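
ILLUSTRATIVE NOTE: Extracting wrapped URLs

   The wrapper convention recommended in the Appendix (angle brackets
   together with the prefix "URL:") lends itself to mechanical
   extraction. The Python sketch below is one possible reading of that
   recommendation, not part of this specification. The function name
   extract_wrapped_urls is hypothetical. The sketch assumes the prefix
   is written exactly as "URL:", removes all whitespace inside the
   wrapper, and does not attempt to resolve the end-of-line hyphen
   ambiguity described in the Appendix.

```python
import re

# Matches the "<URL:" ... ">" wrapper recommended in the Appendix.
# Unencoded "<" and ">" cannot occur inside a URL, so they bound the match.
_WRAPPED = re.compile(r"<URL:([^<>]*)>")


def extract_wrapped_urls(text):
    """Return URLs found inside <URL:...> wrappers, with whitespace removed."""
    urls = []
    for match in _WRAPPED.finditer(text):
        # Whitespace added to break a long URL across lines is ignored.
        urls.append("".join(match.group(1).split()))
    return urls
```

   Given text containing a wrapped URL that was broken across two
   lines, the function returns the URL rejoined without the
   intervening whitespace.
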