I'm a bit torn on the robots header. On one hand, it allows really fine-grained control on a per-page basis. On the other hand, you have to make a request to the page to find out whether you're allowed to keep the data, which feels like a waste of bandwidth.
I mean, you could do a HEAD request to find out, but then you might end up with two HTTP requests just to get content in an "allowed" scenario.
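For what it's worth, that HEAD-first approach might look something like this sketch (the function names are mine; `X-Robots-Tag` is the actual header a crawler would check):

```python
from urllib.request import Request, urlopen

def robots_directives(header_value):
    """Parse an X-Robots-Tag value like 'noindex, nofollow' into a set."""
    return {d.strip().lower() for d in header_value.split(",") if d.strip()}

def may_index(url):
    # The HEAD round-trip: fetch only the headers first, so we can bail
    # out before downloading a body we're not allowed to keep. In the
    # "allowed" case this is the extra request mentioned above.
    req = Request(url, method="HEAD")
    with urlopen(req) as resp:
        value = resp.headers.get("X-Robots-Tag", "")
    return "noindex" not in robots_directives(value)
```

The alternative is a single GET that reads the headers and then discards the body if `noindex` is present, trading the extra round-trip for wasted bandwidth on disallowed pages.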
That said, I do see value in the header. I'm actually building my own web crawler (which I will do another post about in the future) and I want to add support for the header.