Bot/scrapers - stop them at the server level?
Posted: Thu Jul 24, 2008 12:51 pm
I've been doing some digging on this lately. Looking at my logs and at feedback from other folks who monitor this type of thing, I'm starting to suspect that anywhere from 25% to 50% of the web traffic on my server (and probably yours) comes from bots or scrapers - not human visitors, and not even Google/Yahoo/MSN. Yes, that much. Up to half my freakin' traffic.
I've done some reading on how to stop this at the website level. You can set up honeypots like:
- list a file in robots.txt that isn't linked from anywhere else. Some bots will go straight there - so anything requesting that file is a bot and can be banned.
- include a 1x1 pixel link as a honeypot. Human visitors won't see it or click it, so anything following it is a bot.
- check the speed of page requests, and whether the visitor is requesting CSS/image files and other stuff bots don't care about.
- check usage at the top and bottom of every page. If, by the time the bottom of a page is reached, the same visitor has already requested another page, then they're requesting multiple pages at the same instant - another bot to be banned.
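A few of the checks above (honeypot hits, request speed, no CSS/image requests) can also be run offline against the access log. Here's a rough Python sketch, assuming Apache's common/combined log format. The honeypot path `/trap/`, the thresholds, and the function names are all made up for illustration, so tune them to your own site before trusting the results.

```python
import re
from collections import defaultdict
from datetime import datetime

# All of these values are illustrative assumptions, not standards.
HONEYPOT_PATH = "/trap/"        # hypothetical path listed in robots.txt, linked nowhere
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
MAX_PAGES_PER_SECOND = 3        # more page hits than this in one second looks automated
MIN_PAGES_FOR_ASSET_CHECK = 5   # only judge "no assets" after this many page hits

# Matches the start of an Apache common/combined log line.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" \d{3}'
)


def parse_line(line):
    """Return (ip, timestamp, path) or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path")


def suspect_ips(lines):
    """Map each suspicious IP to the sorted list of reasons it was flagged."""
    pages = defaultdict(list)    # ip -> timestamps of non-asset requests
    assets = defaultdict(int)    # ip -> count of asset requests
    reasons = defaultdict(set)   # ip -> reasons for suspicion

    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ip, ts, path = parsed
        clean = path.split("?", 1)[0].lower()  # drop the query string
        if clean.startswith(HONEYPOT_PATH):
            reasons[ip].add("honeypot")
        if clean.endswith(ASSET_EXTENSIONS):
            assets[ip] += 1
        else:
            pages[ip].append(ts)

    for ip, stamps in pages.items():
        # Count page requests per one-second bucket (log resolution is 1 s).
        per_second = defaultdict(int)
        for ts in stamps:
            per_second[ts] += 1
        if max(per_second.values()) > MAX_PAGES_PER_SECOND:
            reasons[ip].add("too-fast")
        if len(stamps) >= MIN_PAGES_FOR_ASSET_CHECK and assets[ip] == 0:
            reasons[ip].add("no-assets")

    return {ip: sorted(found) for ip, found in reasons.items()}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:
        for ip, found in sorted(suspect_ips(log).items()):
            print(ip, ",".join(found))
```

Run it with the path to an access log as the only argument. The IPs it prints are candidates only: a visitor with a warm browser cache can look like "no-assets", so eyeball the list before feeding it to a firewall or a server-level deny rule.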
However, I'd like to take some action at the server level instead of the website level, maybe via Apache or something. Is anyone doing anything like this at the server level?
TIA