id: inkdroid-org-477
author:
title: 25 Years of robots.txt
date:
pages:
extension: .html
mime: text/html
words: 838
sentences: 58
flesch: 71
summary: Much web archiving crawling software has options for observing robots.txt, or explicitly ignoring it. But ethics are best decided by people, not machines, even though some think the behavior of crawling bots can be measured and evaluated (Giles, Sun, & Councill, 2010; Thelwall & Stuart, 2006). Web archives use robots.txt in another significant way too. Archivists should provide a 'self-service' approach site owners can use to remove their materials based on the use of the robots.txt standard. This convention allows web publishers to use their robots.txt to tell the Internet Archive (and potentially other web archives) not to provide access to archived content from their website. Perhaps the collective wisdom now is that the use of robots.txt to control playback in web archives is fundamentally flawed and shouldn't be written down in a standard.
cache: ./cache/inkdroid-org-477.html
txt: ./txt/inkdroid-org-477.txt
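The robots.txt convention described in the summary can be illustrated with a minimal sketch using Python's standard `urllib.robotparser` module. Everything specific here is an assumption made for illustration: the robots.txt content, the `example.org` URLs, the `examplebot` agent name, the use of `ia_archiver` as the user-agent token addressed to the Internet Archive, and the helper name `may_provide_access`. It is not a description of how any particular archive implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the (assumed) Internet Archive agent
# everywhere, and blocks all other agents from /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_provide_access(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "https://example.org/post.html"),
        ("examplebot", "https://example.org/post.html"),
        ("examplebot", "https://example.org/private/notes.html"),
    ]:
        print(agent, url, may_provide_access(ROBOTS_TXT, agent, url))
```

A crawler with an "observe robots.txt" option would run a check like this before fetching a page; a web archive following the playback convention would run it before serving archived content. Whether it should do so is the policy question the summary raises, and the code does not settle it.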