
Have the IA ever discussed why they retroactively apply robots.txt? I can see the rationale (though I don't necessarily think it is the best idea given the IA's goals) for respecting it at crawl time, but applying it retroactively has always felt unnecessary to me.
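For context, respecting robots.txt at crawl time just means checking each URL against the site's published rules before fetching it. A minimal sketch using Python's standard urllib.robotparser (the robots.txt content, user agent, and URLs below are hypothetical examples, not the IA's actual configuration):

```python
# Sketch: how a crawler might honor robots.txt at crawl time.
# The rules and URLs here are made up for illustration.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Allowed: not under a Disallow path
print(rp.can_fetch("archiver-bot", "https://example.com/public/page"))
# Disallowed: matches the /private/ rule
print(rp.can_fetch("archiver-bot", "https://example.com/private/page"))
```

The retroactive question is separate: the check above only gates new fetches, whereas the IA's policy also hides already-archived pages when a site's current robots.txt disallows them.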


It seems pretty obvious: copyright restricts distribution, so they hide pages that the apparent copyright holder seems not to want distributed.




