htaccess file to deny Bots

Post by steven

Before adding this file, note what Apache states: "Access control by User-Agent is an unreliable technique, since the User-Agent header can be set to anything at all, at the whim of the end user." That said, if you have a pesky bot, you can also enter its IP address or host name in the code.
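As a minimal sketch of that idea for Apache 2.4, a single troublesome visitor can be refused by address or by host name. The address 203.0.113.45 and the host name bad-crawler.example are placeholders I made up for illustration; substitute the values you see in your own access log.

```apache
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except this one address (placeholder from a reserved documentation range)
    Require not ip 203.0.113.45
    # ...and clients whose DNS name ends in this domain (placeholder)
    Require not host bad-crawler.example
</RequireAll>
```

Blocking by host name makes Apache perform DNS lookups on the visitor's address, so blocking by IP is usually the cheaper of the two. If your file already contains a <RequireAll> block, it is probably safer to add these lines to it than to create a second block.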

Although I still use htaccess to limit access, the best option I've found so far is installing the Rip Prevention mod written by Brian McFadyen and Brent Hemphil. Bots can easily change names and IP addresses, rendering the code below useless. The Rip Prevention mod checks whether a visitor's accesses are rapid and repeated. If they are, a warning is issued; if the rapid, repeated accesses continue, the visitor is temporarily banned and an explanation page is displayed. Warnings and bans are disabled for administrators and logged-in users.

The mod creates a check_access.php file that can be edited manually to add or remove bots. You can optionally install the Rip Challenge Mod. This mod works with the Rip Prevention Mod by adding a CAPTCHA challenge after a configurable number of accesses (default 30) for non-registered users. Using these two mods, along with Bot-Trap, has dramatically reduced unwanted access.

This example file has a bot section and a section to block by IP address or host extension. The code is written for Apache 2.4, with the Apache 2.2 commands commented out; to use it on Apache 2.2, make the changes described in the code. The code and download file were updated because the original code did not work on a Synology server. The new code is generic and should work on different systems running Apache. You can add or remove bots and delete sections you will not use, but be careful: an incorrect entry will generate a 500 Internal Server Error and disable your website.
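One way to avoid commenting and uncommenting sections when moving between versions is to wrap each version's commands in an <IfModule> test. This is only a sketch and is not what the download files contain; it assumes mod_authz_core is present on Apache 2.4 and absent on Apache 2.2, and that mod_setenvif is loaded. The single HTTrack entry stands in for the full bot list.

```apache
# Flag one sample bot; the full list would go here instead
SetEnvIfNoCase User-Agent "HTTrack" BadBot

# Apache 2.4: mod_authz_core provides Require and <RequireAll>
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not env BadBot
    </RequireAll>
</IfModule>

# Apache 2.2: mod_authz_core does not exist, so this block is used instead
<IfModule !mod_authz_core.c>
    Order allow,deny
    Allow from all
    Deny from env=BadBot
</IfModule>
```

Test a file like this on a non-production folder first, since a typo here causes the same 500 error as anywhere else in .htaccess.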

Create the .htaccess file by pasting the code into a plain text editor such as Komodo IDE or Notepad++, and save the file in the TNG folder. Check that you are not overwriting an existing file that may contain commands necessary to run your website. If you already have an .htaccess file in your TNG folder, add the contents to your existing .htaccess file.

Blocking whole countries is more complicated and can require entering thousands of IP addresses. This is not what .htaccess was intended for, and it is easier to block a country (region) by creating a firewall rule with a Synology router. If you want to use .htaccess to block a country, get their IP addresses here.
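If you do go the .htaccess route, address ranges are normally written in CIDR notation, and several can share one line. A sketch for Apache 2.4 follows; the three ranges are reserved documentation networks used as placeholders, not any real country's allocation.

```apache
<RequireAll>
    Require all granted
    # Each entry is network/prefix-length; several ranges may share a line
    Require not ip 192.0.2.0/24 198.51.100.0/24
    Require not ip 203.0.113.0/24
</RequireAll>
```

A real country list would repeat the Require not ip line many hundreds of times, which is why a firewall rule is the more practical choice.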

You can download htaccess files by selecting one of the download links below. The newest download link contains additional bots and IP addresses but only works with Apache 2.4. Apache 2.2 is no longer maintained, according to the Apache website.

Code: Select all

## This file was written for Apache Version 2.4
## To use Apache Version 2.2 make the changes indicated below

# Bad Bot List
# Each pattern is a case-insensitive regular expression tested against the
# User-Agent request header. A pattern without ^ matches anywhere in the
# header, so no trailing wildcard is needed. Literal parentheses and the dots
# in version numbers and domain names are escaped with a backslash.
SetEnvIfNoCase User-Agent "^abot" BadBot
SetEnvIfNoCase User-Agent "^aipbot" BadBot
SetEnvIfNoCase User-Agent "^asterias" BadBot
SetEnvIfNoCase User-Agent "^EI" BadBot
SetEnvIfNoCase User-Agent "^libwww-perl" BadBot
SetEnvIfNoCase User-Agent "^LWP" BadBot
SetEnvIfNoCase User-Agent "^lwp" BadBot
SetEnvIfNoCase User-Agent "^MSIECrawler" BadBot
SetEnvIfNoCase User-Agent "^nameprotect" BadBot
SetEnvIfNoCase User-Agent "^PlantyNet_WebRobot" BadBot
SetEnvIfNoCase User-Agent "^UCmore" BadBot
SetEnvIfNoCase User-Agent "Alligator" BadBot
SetEnvIfNoCase User-Agent "AllSubmitter" BadBot
SetEnvIfNoCase User-Agent "Anonymous" BadBot
SetEnvIfNoCase User-Agent "AhrefsBot" BadBot
SetEnvIfNoCase User-Agent "Asterias" BadBot
SetEnvIfNoCase User-Agent "autoemailspider" BadBot
SetEnvIfNoCase User-Agent "Badass" BadBot
SetEnvIfNoCase User-Agent "BaiDuSpider" BadBot
SetEnvIfNoCase User-Agent "BecomeBot" BadBot
SetEnvIfNoCase User-Agent "Bitacle" BadBot
SetEnvIfNoCase User-Agent "bladder\ fusion" BadBot
SetEnvIfNoCase User-Agent "Blogshares\ Spiders" BadBot
SetEnvIfNoCase User-Agent "Board\ Bot" BadBot
SetEnvIfNoCase User-Agent "Cityreview" BadBot
SetEnvIfNoCase User-Agent "Choopa" BadBot
SetEnvIfNoCase User-Agent "Convera" BadBot
SetEnvIfNoCase User-Agent "ConveraMultiMediaCrawler" BadBot
SetEnvIfNoCase User-Agent "crawl" BadBot
SetEnvIfNoCase User-Agent "c-spider" BadBot
SetEnvIfNoCase User-Agent "DA" BadBot
SetEnvIfNoCase User-Agent "DnloadMage" BadBot
SetEnvIfNoCase User-Agent "Dotbot" BadBot
SetEnvIfNoCase User-Agent "Download\ Demon" BadBot
SetEnvIfNoCase User-Agent "Download\ Express" BadBot
SetEnvIfNoCase User-Agent "Download\ Wonder" BadBot
SetEnvIfNoCase User-Agent "dragonfly" BadBot
SetEnvIfNoCase User-Agent "DreamPassport" BadBot
SetEnvIfNoCase User-Agent "DSurf" BadBot
SetEnvIfNoCase User-Agent "DTS Agent" BadBot
SetEnvIfNoCase User-Agent "EBrowse" BadBot
SetEnvIfNoCase User-Agent "eCatch" BadBot
SetEnvIfNoCase User-Agent "edgeio" BadBot
SetEnvIfNoCase User-Agent "Email\ Extractor" BadBot
SetEnvIfNoCase User-Agent "EmailSiphon" BadBot
SetEnvIfNoCase User-Agent "EmailWolf" BadBot
SetEnvIfNoCase User-Agent "EmeraldShield" BadBot
SetEnvIfNoCase User-Agent "ESurf" BadBot
SetEnvIfNoCase User-Agent "Exabot" BadBot
SetEnvIfNoCase User-Agent "ExtractorPro" BadBot
SetEnvIfNoCase User-Agent "FileHeap!\ file downloader" BadBot
SetEnvIfNoCase User-Agent "FileHound" BadBot
SetEnvIfNoCase User-Agent "Forex" BadBot
SetEnvIfNoCase User-Agent "Franklin\ Locator" BadBot
SetEnvIfNoCase User-Agent "FreshDownload" BadBot
SetEnvIfNoCase User-Agent "FrontPage" BadBot
SetEnvIfNoCase User-Agent "FSurf" BadBot
SetEnvIfNoCase User-Agent "Gaisbot" BadBot
SetEnvIfNoCase User-Agent "Gamespy_Arcade" BadBot
SetEnvIfNoCase User-Agent "genieBot" BadBot
SetEnvIfNoCase User-Agent "GetBot" BadBot
SetEnvIfNoCase User-Agent "GetRight" BadBot
SetEnvIfNoCase User-Agent "Gigabot" BadBot
SetEnvIfNoCase User-Agent "Go!Zilla" BadBot
SetEnvIfNoCase User-Agent "Go-Ahead-Got-It" BadBot
SetEnvIfNoCase User-Agent "GOFORITBOT" BadBot
SetEnvIfNoCase User-Agent "heritrix" BadBot
SetEnvIfNoCase User-Agent "HLoader" BadBot
SetEnvIfNoCase User-Agent "HooWWWer" BadBot
SetEnvIfNoCase User-Agent "HTTrack" BadBot
SetEnvIfNoCase User-Agent "iCCrawler" BadBot
SetEnvIfNoCase User-Agent "ichiro" BadBot
SetEnvIfNoCase User-Agent "iGetter" BadBot
SetEnvIfNoCase User-Agent "imds_monitor" BadBot
SetEnvIfNoCase User-Agent "Industry\ Program" BadBot
SetEnvIfNoCase User-Agent "Indy\ Library" BadBot
SetEnvIfNoCase User-Agent "InetURL" BadBot
SetEnvIfNoCase User-Agent "InstallShield\ DigitalWizard" BadBot
SetEnvIfNoCase User-Agent "IRLbot" BadBot
SetEnvIfNoCase User-Agent "IUPUI\ Research\ Bot" BadBot
SetEnvIfNoCase User-Agent "Java" BadBot
SetEnvIfNoCase User-Agent "jeteye" BadBot
SetEnvIfNoCase User-Agent "jeteyebot" BadBot
SetEnvIfNoCase User-Agent "JoBo" BadBot
SetEnvIfNoCase User-Agent "JOC\ Web\ Spider" BadBot
SetEnvIfNoCase User-Agent "Kapere" BadBot
SetEnvIfNoCase User-Agent "Larbin" BadBot
SetEnvIfNoCase User-Agent "LeechGet" BadBot
SetEnvIfNoCase User-Agent "LightningDownload" BadBot
SetEnvIfNoCase User-Agent "Linkie" BadBot
SetEnvIfNoCase User-Agent "Mac\ Finder" BadBot
SetEnvIfNoCase User-Agent "Mail\ Sweeper" BadBot
SetEnvIfNoCase User-Agent "Mass\ Downloader" BadBot
SetEnvIfNoCase User-Agent "MetaProducts\ Download\ Express" BadBot
SetEnvIfNoCase User-Agent "Microsoft\ Data\ Access" BadBot
SetEnvIfNoCase User-Agent "Microsoft\ URL\ Control" BadBot
SetEnvIfNoCase User-Agent "Missauga\ Locate" BadBot
SetEnvIfNoCase User-Agent "Missauga\ Locator" BadBot
SetEnvIfNoCase User-Agent "Missigua Locator" BadBot
SetEnvIfNoCase User-Agent "Missouri\ College\ Browse" BadBot
SetEnvIfNoCase User-Agent "Mister\ PiX" BadBot
SetEnvIfNoCase User-Agent "MJ12bot" BadBot
SetEnvIfNoCase User-Agent "MovableType" BadBot
SetEnvIfNoCase User-Agent "Mozi!" BadBot
SetEnvIfNoCase User-Agent "Mozilla/3\.0 \(compatible\)" BadBot
SetEnvIfNoCase User-Agent "Mozilla/5\.0 \(compatible; MSIE 5\.0\)" BadBot
SetEnvIfNoCase User-Agent "MSIE_6.0" BadBot
SetEnvIfNoCase User-Agent "MSIECrawler" BadBot
SetEnvIfNoCase User-Agent "MVAClient" BadBot
SetEnvIfNoCase User-Agent "MyFamilyBot" BadBot
SetEnvIfNoCase User-Agent "MyGetRight" BadBot
SetEnvIfNoCase User-Agent "NASA\ Search" BadBot
SetEnvIfNoCase User-Agent "Naver" BadBot
SetEnvIfNoCase User-Agent "NaverBot" BadBot
SetEnvIfNoCase User-Agent "NetAnts" BadBot
SetEnvIfNoCase User-Agent "NetResearchServer" BadBot
SetEnvIfNoCase User-Agent "NEWT\ ActiveX" BadBot
SetEnvIfNoCase User-Agent "Nextopia" BadBot
SetEnvIfNoCase User-Agent "NG\ 1.x \(Exalead\)" BadBot
SetEnvIfNoCase User-Agent "NICErsPRO" BadBot
SetEnvIfNoCase User-Agent "NimbleCrawler" BadBot
SetEnvIfNoCase User-Agent "Nitro\ Downloader" BadBot
SetEnvIfNoCase User-Agent "Nutch" BadBot
SetEnvIfNoCase User-Agent "Offline\ Explorer" BadBot
SetEnvIfNoCase User-Agent "OmniExplorer" BadBot
SetEnvIfNoCase User-Agent "OutfoxBot" BadBot
SetEnvIfNoCase User-Agent "P3P" BadBot
SetEnvIfNoCase User-Agent "PagmIEDownload" BadBot
SetEnvIfNoCase User-Agent "pavuk" BadBot
SetEnvIfNoCase User-Agent "PHP\ version" BadBot
SetEnvIfNoCase User-Agent "playstarmusic" BadBot
SetEnvIfNoCase User-Agent "Program\ Shareware" BadBot
SetEnvIfNoCase User-Agent "Progressive Download" BadBot
SetEnvIfNoCase User-Agent "psycheclone" BadBot
SetEnvIfNoCase User-Agent "puf" BadBot
SetEnvIfNoCase User-Agent "PussyCat" BadBot
SetEnvIfNoCase User-Agent "PuxaRapido" BadBot
SetEnvIfNoCase User-Agent "Python-urllib" BadBot
SetEnvIfNoCase User-Agent "RealDownload" BadBot
SetEnvIfNoCase User-Agent "RedKernel" BadBot
SetEnvIfNoCase User-Agent "relevantnoise" BadBot
SetEnvIfNoCase User-Agent "RepoMonkey\ Bait\ &\ Tackle" BadBot
SetEnvIfNoCase User-Agent "RTG30" BadBot
SetEnvIfNoCase User-Agent "SBIder" BadBot
SetEnvIfNoCase User-Agent "script" BadBot
SetEnvIfNoCase User-Agent "Seekbot" BadBot
SetEnvIfNoCase User-Agent "SiteSnagger" BadBot
SetEnvIfNoCase User-Agent "SmartDownload" BadBot
SetEnvIfNoCase User-Agent "sna-" BadBot
SetEnvIfNoCase User-Agent "Snap\ bot" BadBot
SetEnvIfNoCase User-Agent "Sogou" BadBot
SetEnvIfNoCase User-Agent "Sosospider" BadBot
SetEnvIfNoCase User-Agent "SpeedDownload" BadBot
SetEnvIfNoCase User-Agent "Sphere" BadBot
SetEnvIfNoCase User-Agent "Spider" BadBot
SetEnvIfNoCase User-Agent "sproose" BadBot
SetEnvIfNoCase User-Agent "SQ\ Webscanner" BadBot
SetEnvIfNoCase User-Agent "Stamina" BadBot
SetEnvIfNoCase User-Agent "Star\ Downloader" BadBot
SetEnvIfNoCase User-Agent "Teleport" BadBot
SetEnvIfNoCase User-Agent "TurnitinBot" BadBot
SetEnvIfNoCase User-Agent "Twiceler" BadBot
SetEnvIfNoCase User-Agent "UdmSearch" BadBot
SetEnvIfNoCase User-Agent "URLGetFile" BadBot
SetEnvIfNoCase User-Agent "USER_AGENT" BadBot
SetEnvIfNoCase User-Agent "UtilMind\ HTTPGet" BadBot
SetEnvIfNoCase User-Agent "Vultr\.com" BadBot
SetEnvIfNoCase User-Agent "WebAuto" BadBot
SetEnvIfNoCase User-Agent "WebCapture" BadBot
SetEnvIfNoCase User-Agent "webcollage" BadBot
SetEnvIfNoCase User-Agent "WebCopier" BadBot
SetEnvIfNoCase User-Agent "WebFilter" BadBot
SetEnvIfNoCase User-Agent "WebReaper" BadBot
SetEnvIfNoCase User-Agent "Website\ eXtractor" BadBot
SetEnvIfNoCase User-Agent "WebStripper" BadBot
SetEnvIfNoCase User-Agent "WebZIP" BadBot
SetEnvIfNoCase User-Agent "Wells\ Search" BadBot
SetEnvIfNoCase User-Agent "WEP\ Search\ 00" BadBot
SetEnvIfNoCase User-Agent "Wget" BadBot
SetEnvIfNoCase User-Agent "Wildsoft\ Surfer" BadBot
SetEnvIfNoCase User-Agent "WinHttpRequest" BadBot
SetEnvIfNoCase User-Agent "WWWOFFLE" BadBot
SetEnvIfNoCase User-Agent "Xaldon\ WebSpider" BadBot
SetEnvIfNoCase User-Agent "Y!TunnelPro" BadBot
SetEnvIfNoCase User-Agent "YahooYSMcm" BadBot
SetEnvIfNoCase User-Agent "YandexBot" BadBot
SetEnvIfNoCase User-Agent "Yandex" BadBot
SetEnvIfNoCase User-Agent "Zade" BadBot
SetEnvIfNoCase User-Agent "ZBot" BadBot
SetEnvIfNoCase User-Agent "zerxbot" BadBot

## <--- Start Apache Version 2.4
<RequireAll>
    Require all granted
    Require not env BadBot
    Require not ip 27.72.60
    Require not ip 46.229.168
    Require not ip 50.62.208.94
    Require not ip 51.255.208
    Require not ip 112.64.33.92
    Require not ip 104.131.0
    Require not ip 104.131.68
    Require not ip 104.131.169.194
    Require not ip 159.65.242.236
    Require not ip 180.249.254.198
    Require not ip 216.244.66.241
##  Uncomment the line below to add hosts you wish to block
#    Require not host example-1.com example-2.com example-3.net
##  Uncomment the line below to add extensions you wish to block and replace xx yy and zz with the two letter code
#    Require not host .xx .yy .zz
</RequireAll>
## End Apache Version 2.4 --->

## <--- To use Apache Version 2.2 comment out the version 2.4 section above and uncomment the section below --->
## Allow from all is required: with Order allow,deny every visitor is refused unless explicitly allowed
## Apache 2.2 takes addresses and hosts directly after "Deny from", without an ip or host keyword
#Order allow,deny
#Allow from all
#Deny from env=BadBot
#Deny from 45.55.55.20 46.229.168 50.62.208.94 51.255.208
#Deny from 112.64.33.92 104.131.0 104.131.68
#Deny from 104.236.221 105.100.69
#Deny from example-1.com example-2.com example-3.net
#Deny from .xx .yy .zz
htaccess_for_Bots_180321.zip
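If the list becomes hard to maintain, several names can be combined into one pattern with regular-expression alternation. The sketch below is my own shorthand rather than what the download file contains; the bot names are taken from the list above.

```apache
# One directive covers four downloaders; | separates the alternatives
SetEnvIfNoCase User-Agent "(HTTrack|WebCopier|WebZIP|Wget)" BadBot

# ^ anchors a pattern to the start of the header, avoiding accidental matches
SetEnvIfNoCase User-Agent "^(libwww-perl|LWP)" BadBot
```

Because unanchored patterns match anywhere in the header, very short entries can catch legitimate browsers whose User-Agent happens to contain those letters, so it is worth checking your access log after adding one.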


If you want to use Bot-trap with the .htaccess file below, ensure it is compatible with the Apache 2.4 container directives. Using an incompatible version of Bot-trap will cause a 500 Internal Server Error.

I created a modified version of Bot-trap to work with Apache 2.4 containers, which you can download from this post.

htaccess-2.4_for_Bots_180825.zip


Last edited by steven on Thu Mar 22, 2018 8:18 am, edited 4 times in total.


Last bumped by steven on Thu Mar 22, 2018 8:44 am.