The issue is that quite a few bots generate excessive traffic and consume server resources, which affects the server's stability and performance. Such bots need to be blocked using a list.
- Imunify360
- ModSecurity
A custom ModSecurity rule that reads a list of bots from an external file can be used to block them. The steps below describe this in more detail:
1. Create a file that lists one bot per line and place it among your web server configuration files or includes. For example:
cat /etc/apache2/conf.d/modsec/
Rogue bot
Ugly crawler
2. Create a custom rule whose ID starts with the 77 prefix and that refers to the list:
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/apache2/conf.d/modsec/" "id:77999901,phase:2,t:none,auditlog,deny,status:403,severity:2,msg:'Custom WAF: Found blacklisted crawler||User-Agent:%{REQUEST_HEADERS.User-Agent}||'"
3. Set a threshold for this custom rule, so that requests are not only denied but Imunify360 also starts blocking the offending IPs:
imunify360-agent config update '{"MOD_SEC_BLOCK_BY_CUSTOM_RULE": {"77999901": {"max_incidents": 1, "check_period": 600}}}'
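Before relying on the list from step 1, it can help to dry-run it against a few User-Agent strings. The sketch below approximates what the @pmFromFile operator in step 2 does: a case-insensitive substring match against one phrase per line, ignoring empty lines and lines starting with #. The helper name ua_is_listed and the sample User-Agent strings are hypothetical, and the demo uses a temporary file rather than your real list; pass the path of your own list file as the first argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of Imunify360 or ModSecurity): succeeds when the
# given User-Agent contains any phrase from the list file.
ua_is_listed() {
    # $1 = path to the phrase list, $2 = User-Agent string to check
    # Trim each line, then drop empty lines and '#' comments, as @pmFromFile does.
    phrases=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
                  -e '/^$/d' -e '/^#/d' "$1")
    # An empty pattern would match everything, so treat an empty list as "no match".
    [ -n "$phrases" ] || return 1
    # -F: fixed strings (no regex), -i: case-insensitive, -q: exit status only.
    printf '%s\n' "$2" | grep -q -i -F -e "$phrases"
}

# Demo with a throwaway copy of the example list from step 1.
list=$(mktemp)
printf 'Rogue bot\nUgly crawler\n' > "$list"

for ua in 'Mozilla/5.0 (compatible; rogue BOT/2.1)' \
          'Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0'; do
    if ua_is_listed "$list" "$ua"; then
        echo "would be denied: $ua"
    else
        echo "would pass:      $ua"
    fi
done
rm -f "$list"
```

Because the match is a plain substring match, a short or generic phrase in the list can also catch legitimate clients, so it is worth checking a browser User-Agent or two the same way before enabling the threshold in step 3.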
This works as follows:
- A bot with a blocked User-Agent hits the server.
- The custom rule is triggered.
- The request is rejected by the deny action in this rule.
- The bot's IP is graylisted by Imunify360 because it reached the threshold set for the custom rule.
- Further connections are redirected to the Captcha (less traffic and less CPU usage), so this bot's IP does not reach your web server again.
- If the bot keeps coming back (and does not comply with the no-cache response), it will hit the Captcha 100 times and then be blocked at the firewall level or with a SplashScreen; both options consume the least traffic.
The response can be tightened further with the MOD_SEC_BLOCK_BY_CUSTOM_RULE and CAPTCHA_DOS settings. Please let us know if we can be of further assistance.
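As a rough sketch of such tightening, the number of Captcha hits allowed before an IP is blocked could be lowered in the same way the custom rule threshold was set. This assumes that CAPTCHA_DOS has a max_count field controlling that limit; the value 20 is purely illustrative, so check the Imunify360 configuration reference for your version before applying it.

```shell
# Assumption: CAPTCHA_DOS exposes a max_count field (Captcha attempts allowed
# before the IP is blocked); 20 is an illustrative value, not a recommendation.
imunify360-agent config update '{"CAPTCHA_DOS": {"max_count": 20}}'
```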
The goal of Imunify360 is to protect your servers against known vulnerabilities and malicious activity; there is no built-in functionality to limit bandwidth for a specific bot or IP. Such a task is expected to be handled through customizations within the tool, and even then only to a limited degree. This could be a feature request, which I encourage you to share on our feedback portal, or you can describe here how you would expect such a feature to work.
Useful links
How to change the time limits for the xml-rpc requests?
How to limit a known crawler bot?
How to block a specific user-agent (with apache configuration)