
Have the IA ever discussed why they retroactively apply robots.txt? I can see the rationale (though I don't necessarily think it is the best idea given the IA's goals) for respecting it at crawl time, but applying it retroactively has always felt unnecessary to me.
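For context, respecting robots.txt at crawl time just means checking each URL against the site's published rules before fetching it. A minimal sketch using Python's standard urllib.robotparser (the robots.txt content, user agent, and URLs below are hypothetical examples, not the IA's actual configuration):

```python
# Sketch: how a crawler might honor robots.txt at crawl time.
# The rules and URLs here are made up for illustration.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Allowed: not under a Disallow path
print(rp.can_fetch("archiver-bot", "https://example.com/public/page"))
# Disallowed: matches the /private/ rule
print(rp.can_fetch("archiver-bot", "https://example.com/private/page"))
```

The retroactive question is separate: the check above only gates new fetches, whereas the IA's policy also hides already-archived pages when a site's current robots.txt disallows them.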


It seems pretty obvious: copyright restricts distribution, so they hide pages that the apparent copyright holder seems not to want distributed.




