Issue
We receive many requests about adding a specific user-agent to Imunify360 rules to block bot requests. Below, you can find detailed instructions on how to do it on a temporary basis (until the next rules update) and on a permanent basis.
Applies to
- Imunify360
- ModSecurity
Cause
This is how we process connections from search bots. We use the ModSecurity rule ID 33311, which checks the following conditions:
- The user-agent of the incoming request is in the crawlers-ualist.data list (i.e. the request claims to come from a crawler)
- The sender's address is not in the crawlers-iplist.data file (i.e. the sender is NOT a known good bot)
- The sender's address is not in the RBL good-bots zone (another check for known good bots)
If all of these statements are true, the incoming request will be blocked.
An exception: if the request sender's address is the localhost, such a request will not be blocked.
In other words, if we receive a request whose user-agent claims to be a bot, and we do not know that this bot is a "good" bot, the request will be blocked. All known good bots pass the checks above (crawlers-iplist.data and the RBL good-bots zone), and requests from them are not blocked.
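For illustration only, the logic above roughly corresponds to a chained ModSecurity rule like the sketch below. The operators (@pmFromFile, @ipMatch, @ipMatchFromFile, @rbl) are standard ModSecurity 2 operators, but the rule ID here is a placeholder and the actual rule 33311 shipped by Imunify360 may differ:
# Illustrative sketch only, NOT the actual Imunify360 rule 33311:
# deny if the UA is a listed crawler AND the sender is not localhost,
# is not a known good-bot IP, and is not listed in the good-bots RBL zone
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile crawlers-ualist.data" \
    "id:1000001,phase:2,t:none,deny,status:406,msg:'Fake crawler',chain"
    SecRule REMOTE_ADDR "!@ipMatch 127.0.0.1" "chain"
    SecRule REMOTE_ADDR "!@ipMatchFromFile crawlers-iplist.data" "chain"
    SecRule REMOTE_ADDR "!@rbl good-bots.example.zone"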
Resolution
Temporary solution
Here are the instructions for adding a specific user-agent to the configuration.
1. Locate the crawlers-ualist.data file:
find /etc/ -name crawlers-ualist.data
/etc/httpd/conf/modsecurity.d/rules/custom/crawlers-ualist.data
In DirectAdmin:
find /usr/local/directadmin/ -name crawlers-ualist.data
/usr/local/directadmin/custombuild/custom/modsecurity/conf/crawlers-ualist.data
For Debian-based servers:
/etc/apache2/modsecurity.d/rules/custom
2. Add a new user-agent:
Mozilla/5.0 (compatible; AspiegelBot)
ModSecurity does not allow regular expressions with @pmFromFile (the bot list syntax); spaces must be written exactly as they appear in the bot's User-Agent, without escaping.
Please note that a web server reload is required so that the updated configuration is loaded into memory.
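For example, on cPanel with the path found above, the whole step could look like the following sketch (adjust the path for your panel):
# echo 'Mozilla/5.0 (compatible; AspiegelBot)' >> /etc/httpd/conf/modsecurity.d/rules/custom/crawlers-ualist.data
# httpd -t && systemctl reload httpd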
Attention! All changes will be overwritten as soon as the rules are updated via the Imunify360 agent.
Permanent solution
Below, you can find a description of the permanent solution using custom ModSecurity rules.
1. Check which custom ModSecurity files can be used for adding custom rules. For example, you can use the DUMP_INCLUDES command for the web server to see the current configuration:
# httpd -t -D DUMP_INCLUDES |grep modsec
The output example:
# httpd -t -D DUMP_INCLUDES |grep modsec
(265) /etc/apache2/conf.d/modsec2.conf
(32) /etc/apache2/conf.d/modsec/modsec2.user.conf
(33) /etc/apache2/conf.d/modsec/modsec2.cpanel.conf
(28) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/000_i360_0.conf
(29) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/001_i360_1_generic.conf
(30) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/002_i360_1_infectors.conf
(31) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/003_i360_1_vectors.conf
(32) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/004_i360_2_bruteforce.conf
(33) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/005_i360_3_webshells.conf
(34) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/006_i360_4_custom.conf
(35) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/007_i360_4_wordpress.conf
(36) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/008_i360_5_joomla.conf
(37) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/009_i360_6_drupal.conf
(38) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/010_i360_7_otherapps.conf
(39) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/011_i360_8_spam.conf
(40) /etc/apache2/conf.d/modsec_vendor_configs/imunify360-full-apache/012_Disable_WP_Redirect.conf
(40) /etc/apache2/conf.d/modsec2.i360_disabled_rules.conf
(40) /etc/apache2/conf.d/modsec2.imunify.conf
(584) /etc/apache2/conf.d/userdata/std/2_4/masterit/7masterim360.com.ua/i360_modsec_disable.conf
(939) /etc/apache2/conf.d/userdata/ssl/2_4/masterit/7masterim360.com.ua/i360_modsec_disable.conf
As we can see, in this case it is possible to use the modsec2.user.conf file for custom settings (cPanel). On other panels and setups, you must take the rule order into account: if the custom rule is applied after the Imunify360 rules, it may not work. Place it before the other rules in the web server includes, for example by naming the file:
000_0_custom.conf
Make sure that this file is included in the web server configuration before the other Imunify360 rules.
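You can check the include order with the same dump; assuming the naming above, the custom file should be listed before the imunify360 vendor configs:
# httpd -t -D DUMP_INCLUDES | grep -E 'custom|i360'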
2. Open this file and add the following lines:
<IfModule mod_security2.c>
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile custom-crawlers-bl.data" "id:77999907,phase:2,t:none,auditlog,pass,severity:2,msg:'Custom WAF: Found blacklisted crawler||User-Agent:%{REQUEST_HEADERS.User-Agent}||'"
</IfModule>
Depending on how exactly you'd like to handle requests from the search bots you need to block, you can set either the deny or the pass action for this rule.
Set it to deny if you'd like to block such requests with the WAF alone – requests will be blocked immediately.
Please note that in this case all requests will be blocked by the WAF only and the senders will not be graylisted; such bots can create additional load on the system because their IPs will not be blocked by the firewall.
Please refer to step 4.2 below for how to block them using the Gray List and Captcha.
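If you choose deny, a possible variant of the rule above is the following sketch; status 406 matches the response code shown in the test below, but you can choose another status:
<IfModule mod_security2.c>
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile custom-crawlers-bl.data" "id:77999907,phase:2,t:none,auditlog,deny,status:406,severity:2,msg:'Custom WAF: Found blacklisted crawler||User-Agent:%{REQUEST_HEADERS.User-Agent}||'"
</IfModule>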
3. Create the custom-crawlers-bl.data file in the same directory as your custom ModSec file:
# cd /etc/apache2/conf.d/modsec/     (use your own path here)
# touch ./custom-crawlers-bl.data
# ll ./custom-crawlers-bl.data
-rw-r--r-- 1 root root 22 Jul 16 12:40 ./custom-crawlers-bl.data
Open it using any text editor (nano or vim) and add the bot entries.
For example (since @pmFromFile performs a literal phrase match, it is enough to add just the bot's name):
# cat ./custom-crawlers-bl.data
Semirush/2.1
Testrush
In this example, the restriction is added for the bots Semirush and Testrush. Whether certain characters need to be escaped can depend on the ModSecurity version, the specific rule logic, and the regular expression (regex) engine support.
With a standard ModSecurity 2 build, the @pmFromFile operator matches against a list of User-Agent strings, and there is no need to escape whitespace or other special characters: each line in the file is treated as a literal string to match against, rather than a regular expression.
That means that if you have a User-Agent string in your custom-crawlers-bl.data file like "Rogue\ Crawler", ModSecurity will look for an exact match to that string, including the backslash.
4.1 If you choose deny, no further configuration is needed; just test the web server configuration:
# httpd -t
Syntax OK
And restart the service:
/scripts/restartsrv_apache
or
systemctl restart httpd
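On Debian-based servers the Apache service is typically named apache2:
# apache2ctl -t && systemctl restart apache2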
Let's check how it works now.
You can use a terminal or SSH console on any remote server to check it. Please use your own domain name for this check.
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://YOUR TEST DOMAIN NAME/
406
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
406
Check the ModSecurity logs. You can use the tail tool, for example:
# tail -2000 /var/log/apache2/modsec_audit.log |grep "Testrush" -C20
--b6843209-A--
[19/Jul/2021:08:10:43 +0000] YPUzg_nafsckdcDl1sBUNwAAAAo 192.168.x.x 44716 192.168.245.59 80
--b6843209-B--
GET / HTTP/1.1
Host: testiim360.com.ua
X-Real-IP: 192.168.x.x
X-Remote-IP: 192.168.x.x
User-Agent: Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)
Accept: */*
--b6843209-F--
HTTP/1.1 406 Not Acceptable
Content-Length: 373
Content-Type: text/html; charset=iso-8859-1
--b6843209-H--
Message: Access denied with code 406 (phase 2). Matched phrase "Testrush" at REQUEST_HEADERS:User-Agent. [file "/etc/apache2/conf.d/modsec/modsec2.user.conf"] [line "2"] [id "77999907"] [msg "Custom WAF: Found blacklisted crawler||User-Agent:Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)||"] [severity "CRITICAL"]
Apache-Error: [file "apache2_util.c"] [line 271] [level 3] [client 192.168.x.x] ModSecurity: Access denied with code 406 (phase 2). Matched phrase "Testrush" at REQUEST_HEADERS:User-Agent. [file "/etc/apache2/conf.d/modsec/modsec2.user.conf"] [line "2"] [id "77999907"] [msg "Custom WAF: Found blacklisted crawler||User-Agent:Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)||"] [severity "CRITICAL"] [hostname "testiim360.com.ua"] [uri "/"] [unique_id "YPUzg_nafsckdcDl1sBUNwAAAAo"]
Action: Intercepted (phase 2)
Stopwatch: 1626682243243010 8933 (- - -)
Stopwatch2: 1626682243243010 8933; combined=461, p1=402, p2=9, p3=0, p4=0, p5=50, sr=119, sw=0, l=0, gc=0
Producer: ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/).
Server: Apache
Engine-Mode: "ENABLED"
--b6843209-Z--
4.2 Here you can find an example of how to configure the blocking of bots with the Imunify360 agent.
- If you set the ModSec rule to pass, requests will not be rejected by the WAF but will be blocked by the Gray List and Captcha.
- If you leave it as is in deny mode, requests will first be blocked by the WAF and the sender's IP will then be blocked by the Gray List.
You should adjust the Imunify360 config file and add the custom ModSec rule ID to the Imunify360 configuration. Let's run the following command to apply the changes:
# imunify360-agent config update '{"MOD_SEC_BLOCK_BY_CUSTOM_RULE": {"77999907": {"max_incidents": 3, "check_period": 60}}}'
MOD_SEC_BLOCK_BY_CUSTOM_RULE:
33332:
check_period: 120
max_incidents: 10
33339:
check_period: 120
max_incidents: 10
77999907:
check_period: 60
max_incidents: 3
You can set the max_incidents and check_period variables to the required values.
Please note that the custom ModSec rule ID should be set from the allowed range so that the Imunify360 agent can handle it properly. The range is 779999XX; otherwise, the rule will be skipped.
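To verify that the option was applied, you can print the current agent configuration; the exact config subcommand may depend on the agent version, so treat this as an assumption and check imunify360-agent --help:
# imunify360-agent config show | grep -A 8 MOD_SEC_BLOCK_BY_CUSTOM_RULE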
Let's check and send a couple of requests to the target host:
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
000 # curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
000 # curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -s -o /dev/null -w "%{http_code}" -A 'Mozilla/5.0 (compatible; Testrush/2.1; +http://www.google.com/bot.html)' http://testim360.com.ua/
# curl -v --head http://7masterim360.com.ua
* Rebuilt URL to: http://testim360.com.ua/
* Trying 192.168....
* TCP_NODELAY set
* Connected to 7masterim360.com.ua (192.168.x.x) port 80 (#0)
> HEAD / HTTP/1.1
> Host: 7masterim360.com.ua
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Mon, 19 Jul 2021 08:32:43 GMT
Date: Mon, 19 Jul 2021 08:32:43 GMT
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
< Connection: close
Connection: close
< Link: <http://testim360.com.ua/index.php?rest_route=/>; rel="https://api.w.org/"
Link: <http://testim360.com.ua/index.php?rest_route=/>; rel="https://api.w.org/"
< Server: imunify360-webshield/1.17
Server: imunify360-webshield/1.17
<
* Closing connection 0
As we can see, after the 3rd request from this bot, its IP was graylisted, and all traffic was redirected to WebShield (curl prints 000 for %{http_code} when it does not receive a valid HTTP response). If the Captcha is not passed after 101 attempts, such an IP will be moved to the Black List.
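To confirm the graylisting from the server side, you can query the agent; the exact subcommand depends on the agent version (this is an assumption, check imunify360-agent --help on your system):
# imunify360-agent graylist ip list
or, on recent versions:
# imunify360-agent ip-list local list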
Please also refer to this article: How to add Custom ModSec rules on different Panels and IM360 Stand Alone
Additionally, we have task IFR-761, which includes a feature request to enable the above actions to be performed through the UI.