Network Working Group                                          M. Koster
Internet-Draft                                Stalworthy Computing, Ltd.
Intended status: Informational                                 G. Illyes
Expires: June 5, 2021                                          H. Zeller
                                                               L. Harvey
                                                                  Google
                                                       December 08, 2020
Robots Exclusion Protocol
draft-koster-rep-04
Abstract
This document standardizes and extends the "Robots Exclusion
Protocol" <http://www.robotstxt.org/> method originally defined by
Martijn Koster in 1996 for service owners to control how content
served by their services may be accessed, if at all, by automatic
clients known as crawlers.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This document may not be modified, and derivative works of it may not
be created, except to format it for publication as an RFC or to
translate it into languages other than English.
This Internet-Draft will expire on June 5, 2021.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1.  Introduction
  1.1.  Terminology
2.  Specification
  2.1.  Protocol definition
  2.2.  Formal syntax
    2.2.1.  The user-agent line
    2.2.2.  The Allow and Disallow lines
    2.2.3.  Special characters
    2.2.4.  Other records
  2.3.  Access method
    2.3.1.  Access results
  2.4.  Caching
  2.5.  Limits
  2.6.  Security Considerations
  2.7.  IANA Considerations
3.  Examples
  3.1.  Simple example
  3.2.  Longest Match
4.  References
  4.1.  Normative References
  4.2.  URIs
Authors' Addresses
1. Introduction
This document applies to services that provide resources that clients
can access through URIs as defined in RFC3986 [1]. For example, in
the context of HTTP, a browser is a client that displays the content
of a web page.
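As a rough illustration of the URI components defined by RFC3986, the
generic syntax can be decomposed with Python's standard library (the
URL below is a placeholder, not taken from this document):

```python
from urllib.parse import urlsplit

# Split a URI into the generic components defined by RFC 3986:
# scheme, authority, path, query, and fragment.
parts = urlsplit("https://example.com/page?x=1#top")
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /page
print(parts.query)     # x=1
print(parts.fragment)  # top
```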
Crawlers are automated clients. Search engines, for instance, use
crawlers to recursively traverse links for indexing, as defined in
RFC8288 [2].
It may be inconvenient for service owners if crawlers visit the
entirety of their URI space. This document specifies the rules that
crawlers MUST obey when accessing URIs.
These rules are not a form of access authorization.
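As a minimal sketch of the crawler behavior described above, Python's
standard `urllib.robotparser` module can evaluate rules of the kind
this document specifies. Note that this module predates this draft and
applies rules in first-match order rather than the longest-match rule
defined in Section 2.2.2; the robots.txt content and crawler name below
are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, used here only for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler consults the parsed rules before fetching a URI.
print(parser.can_fetch("ExampleBot", "https://example.com/private/data"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))    # True
```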
1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119.