Block Bad Bots and Spiders
I recently noticed some unusual activity coming from unknown bots. After some digging, I found that these bots were not trustworthy.
Here's what you need to add to your .htaccess file (or, in your VirtualHost configuration, inside a Directory block) if you don't want bad bots to crawl your site.
SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase User-Agent "^Custo" bad_bot
SetEnvIfNoCase User-Agent "^DISCo" bad_bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bad_bot
SetEnvIfNoCase User-Agent "^eCatch" bad_bot
SetEnvIfNoCase User-Agent "^EirGrabber" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bad_bot
SetEnvIfNoCase User-Agent "^FlashGet" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^GetWeb!" bad_bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bad_bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase User-Agent "^GrabNet" bad_bot
SetEnvIfNoCase User-Agent "^Grafula" bad_bot
SetEnvIfNoCase User-Agent "^HMView" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bad_bot
SetEnvIfNoCase User-Agent "Indy\ Library" bad_bot
SetEnvIfNoCase User-Agent "^InterGET" bad_bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot
SetEnvIfNoCase User-Agent "^JOC\ Web\ Spider" bad_bot
SetEnvIfNoCase User-Agent "^larbin" bad_bot
SetEnvIfNoCase User-Agent "^LeechFTP" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bad_bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bad_bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bad_bot
SetEnvIfNoCase User-Agent "^Navroad" bad_bot
SetEnvIfNoCase User-Agent "^NearSite" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^NetSpider" bad_bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bad_bot
SetEnvIfNoCase User-Agent "^NetZIP" bad_bot
SetEnvIfNoCase User-Agent "^Octopus" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase User-Agent "^PageGrabber" bad_bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bad_bot
SetEnvIfNoCase User-Agent "^pavuk" bad_bot
SetEnvIfNoCase User-Agent "^pcBrowser" bad_bot
SetEnvIfNoCase User-Agent "^RealDownload" bad_bot
SetEnvIfNoCase User-Agent "^ReGet" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bad_bot
SetEnvIfNoCase User-Agent "^Surfbot" bad_bot
SetEnvIfNoCase User-Agent "^tAkeOut" bad_bot
SetEnvIfNoCase User-Agent "^Teleport\ Pro" bad_bot
SetEnvIfNoCase User-Agent "^Titan" bad_bot
SetEnvIfNoCase User-Agent "^VoidEYE" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase User-Agent "^WebLeacher" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster\ Pro" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "^Widow" bad_bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bad_bot
SetEnvIfNoCase User-Agent "^Xaldon\ WebSpider" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot

<FilesMatch "(.*)">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</FilesMatch>
A request from one of these blocked clients, such as Wget, will now get:
HTTP request sent, awaiting response... 403 Forbidden
2011-09-10 11:15:48 ERROR 403: Forbidden.
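You can verify the block yourself by sending a request with one of the blacklisted user agents, for example with curl (example.com stands in for your own domain, and "Wget/1.12" is just one string that matches the ^Wget rule):

# a normal request goes through
curl -I http://example.com/

# the same request with a blacklisted User-Agent is refused with a 403
curl -I -A "Wget/1.12" http://example.com/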
This might not be a bulletproof solution (a bot can simply lie about its User-Agent), but it definitely stops these bots while letting the good ones, such as Googlebot, crawl your site.