
Block Bad Bots and Spiders

I recently noticed some unusual activity coming from unknown bots. After some digging, I found out that these bots were not trustworthy.

Here's what you need to add to your .htaccess file (or inside a Directory block in your VirtualHost configuration) if you don't want bad bots to crawl your site.

SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
SetEnvIfNoCase user-agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot
SetEnvIfNoCase user-agent "^ChinaClaw" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase user-agent "^Custo" bad_bot
SetEnvIfNoCase user-agent "^DISCo" bad_bot
SetEnvIfNoCase user-agent "^Download\ Demon" bad_bot
SetEnvIfNoCase user-agent "^eCatch" bad_bot
SetEnvIfNoCase user-agent "^EirGrabber" bad_bot
SetEnvIfNoCase user-agent "^EmailSiphon" bad_bot
SetEnvIfNoCase user-agent "^EmailWolf" bad_bot
SetEnvIfNoCase user-agent "^Express\ WebPictures" bad_bot
SetEnvIfNoCase user-agent "^ExtractorPro" bad_bot
SetEnvIfNoCase user-agent "^EyeNetIE" bad_bot
SetEnvIfNoCase user-agent "^FlashGet" bad_bot
SetEnvIfNoCase user-agent "^GetRight" bad_bot
SetEnvIfNoCase user-agent "^GetWeb!" bad_bot
SetEnvIfNoCase user-agent "^Go!Zilla" bad_bot
SetEnvIfNoCase user-agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase user-agent "^GrabNet" bad_bot
SetEnvIfNoCase user-agent "^Grafula" bad_bot
SetEnvIfNoCase user-agent "^HMView" bad_bot
SetEnvIfNoCase user-agent "HTTrack" bad_bot
SetEnvIfNoCase user-agent "^Image\ Stripper" bad_bot
SetEnvIfNoCase user-agent "Indy\ Library" [NC,OR]
SetEnvIfNoCase user-agent "^InterGET" bad_bot
SetEnvIfNoCase user-agent "^Internet\ Ninja" bad_bot
SetEnvIfNoCase user-agent "^JetCar" bad_bot
SetEnvIfNoCase user-agent "^JOC\ Web\ Spider" bad_bot
SetEnvIfNoCase user-agent "^larbin" bad_bot
SetEnvIfNoCase user-agent "^LeechFTP" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase user-agent "^Mass\ Downloader" bad_bot
SetEnvIfNoCase user-agent "^MIDown\ tool" bad_bot
SetEnvIfNoCase user-agent "^Mister\ PiX" bad_bot
SetEnvIfNoCase user-agent "^Navroad" bad_bot
SetEnvIfNoCase user-agent "^NearSite" bad_bot
SetEnvIfNoCase user-agent "^NetAnts" bad_bot
SetEnvIfNoCase user-agent "^NetSpider" bad_bot
SetEnvIfNoCase user-agent "^Net\ Vampire" bad_bot
SetEnvIfNoCase user-agent "^NetZIP" bad_bot
SetEnvIfNoCase user-agent "^Octopus" bad_bot
SetEnvIfNoCase user-agent "^Offline\ Explorer" bad_bot
SetEnvIfNoCase user-agent "^Offline\ Navigator" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase user-agent "^PageGrabber" bad_bot
SetEnvIfNoCase user-agent "^Papa\ Foto" bad_bot
SetEnvIfNoCase user-agent "^pavuk" bad_bot
SetEnvIfNoCase user-agent "^pcBrowser" bad_bot
SetEnvIfNoCase user-agent "^RealDownload" bad_bot
SetEnvIfNoCase user-agent "^ReGet" bad_bot
SetEnvIfNoCase user-agent "^SiteSnagger" bad_bot
SetEnvIfNoCase user-agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase user-agent "^SuperBot" bad_bot
SetEnvIfNoCase user-agent "^SuperHTTP" bad_bot
SetEnvIfNoCase user-agent "^Surfbot" bad_bot
SetEnvIfNoCase user-agent "^tAkeOut" bad_bot
SetEnvIfNoCase user-agent "^Teleport\ Pro" bad_bot
SetEnvIfNoCase User-Agent "^Titan" bad_bot
SetEnvIfNoCase user-agent "^VoidEYE" bad_bot
SetEnvIfNoCase user-agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase user-agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase user-agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase user-agent "^WebCopier" bad_bot
SetEnvIfNoCase user-agent "^WebFetch" bad_bot
SetEnvIfNoCase user-agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase user-agent "^WebLeacher" bad_bot
SetEnvIfNoCase user-agent "^WebReaper" bad_bot
SetEnvIfNoCase user-agent "^WebSauger" bad_bot
SetEnvIfNoCase user-agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase user-agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster Pro" bad_bot
SetEnvIfNoCase user-agent "^WebStripper" bad_bot
SetEnvIfNoCase user-agent "^WebWhacker" bad_bot
SetEnvIfNoCase user-agent "^WebZIP" bad_bot
SetEnvIfNoCase user-agent "^Wget" bad_bot
SetEnvIfNoCase user-agent "^Widow" bad_bot
SetEnvIfNoCase user-agent "^WWWOFFLE" bad_bot
SetEnvIfNoCase user-agent "^Xaldon\ WebSpider" bad_bot
SetEnvIfNoCase user-agent "^Zeus" bad_bot
<FilesMatch "(.*)">
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</FilesMatch>

A blocked client will then get a 403 Forbidden response. For example, wget, whose default User-Agent starts with "Wget" and therefore matches the ^Wget rule above, reports:

HTTP request sent, awaiting response... 403 Forbidden
2011-09-10 11:15:48 ERROR 403: Forbidden.
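
You can also verify any individual rule by spoofing the matching User-Agent yourself. For example, with curl (using example.com as a placeholder for your own domain), a HEAD request sent as WebZIP should come back with a 403:

# Pretend to be WebZIP; expect "HTTP/1.1 403 Forbidden" in the response
curl -I -A "WebZIP" http://example.com/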

This might not be a bulletproof solution, since a bot can always lie about its User-Agent, but it definitely stops these bots while letting the good ones, such as Googlebot, crawl your site.