Robots Exclusion Protocol
draft-koster-rep-04

Document type: Active Internet-Draft (individual in art area)
Authors: Martijn Koster, Gary Illyes, Henner Zeller, Lizzi Harvey
Last updated: 2020-12-13 (latest revision 2020-12-08)
Replaces: draft-rep-wg-topic
Stream: IETF
Intended RFC status: Informational
WG state: Submitted to IESG for Publication
Document shepherd: Ted Hardie
Shepherd write-up: last changed 2020-12-08
IESG state: AD Evaluation::Revised I-D Needed
Consensus boilerplate: Unknown
Responsible AD: Murray Kucherawy
Send notices to: Ted Hardie <ted.ietf@gmail.com>
Network Working Group                                          M. Koster
Internet-Draft                                Stalworthy Computing, Ltd.
Intended status: Informational                                 G. Illyes
Expires: June 5, 2021                                          H. Zeller
                                                               L. Harvey
                                                                  Google
                                                       December 08, 2020

                       Robots Exclusion Protocol
                         draft-koster-rep-04

Abstract

   This document standardizes and extends the "Robots Exclusion
   Protocol" <http://www.robotstxt.org/> method originally defined by
   Martijn Koster in 1994 for service owners to control how content
   served by their services may be accessed, if at all, by automatic
   clients known as crawlers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This document may not be modified, and derivative works of it may not
   be created, except to format it for publication as an RFC or to
   translate it into languages other than English.

   This Internet-Draft will expire on June 5, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Koster, et al.           Expires June 5, 2021                [Page 1]
Internet-Draft                     I-D                         July 2019

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Specification . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Protocol definition . . . . . . . . . . . . . . . . . . .   3
     2.2.  Formal syntax . . . . . . . . . . . . . . . . . . . . . .   3
       2.2.1.  The user-agent line . . . . . . . . . . . . . . . . .   4
       2.2.2.  The Allow and Disallow lines  . . . . . . . . . . . .   4
       2.2.3.  Special characters  . . . . . . . . . . . . . . . . .   5
       2.2.4.  Other records . . . . . . . . . . . . . . . . . . . .   6
     2.3.  Access method . . . . . . . . . . . . . . . . . . . . . .   6
       2.3.1.  Access results  . . . . . . . . . . . . . . . . . . .   7
     2.4.  Caching . . . . . . . . . . . . . . . . . . . . . . . . .   8
     2.5.  Limits  . . . . . . . . . . . . . . . . . . . . . . . . .   8
     2.6.  Security Considerations . . . . . . . . . . . . . . . . .   8
     2.7.  IANA Considerations . . . . . . . . . . . . . . . . . . .   8
   3.  Examples  . . . . . . . . . . . . . . . . . . . . . . . . . .   8
     3.1.  Simple example  . . . . . . . . . . . . . . . . . . . . .   8
     3.2.  Longest Match . . . . . . . . . . . . . . . . . . . . . .   9
   4.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     4.1.  Normative References  . . . . . . . . . . . . . . . . . .   9
     4.2.  URIs  . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   This document applies to services that provide resources that clients
   can access through URIs as defined in RFC3986 [1].  For example, in
   the context of HTTP, a browser is a client that displays the content
   of a web page.

   Crawlers are automated clients.  Search engines, for instance, use
   crawlers to recursively traverse links, as defined in RFC8288 [2],
   for indexing.

   It may be inconvenient for service owners if crawlers visit the
   entirety of their URI space.  This document specifies the rules that
   crawlers MUST obey when accessing URIs.

   These rules are not a form of access authorization.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",