Safeguarding against bots
I host this blog on a Pi and try to make sure the server doesn’t choke serving up pages. I have taken great care to keep the pages as light as possible. This is a personal blog and I don’t expect many visitors either. Bots, on the other hand, love servers that are exposed to the internet and regularly probe them for weaknesses. Last week and again early this week, my blog went down with an HTTP 500 error. The first time it happened, I rebooted the Pi and everything was fine. This time around, it took some cursing and cajoling. Checking the logs confirmed my worst suspicion: the server was being probed for WordPress, PHP and various login paths, among many other hits. Every one of those requests ended up on a 404 page, and pretty soon the Pi was overwhelmed.
The Pi was effectively being DDoS’d by bot requests. I suspect this kind of traffic is common for servers hosted on a VPS or in the cloud, and that those servers are robust enough for it to be trivial. I can’t quantify the traffic though, nor do I know how much is too much.
Here is a breakdown of my nginx access log by HTTP status code. 404 is by far the most common.
cat access.log | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -rn
32752 404
15520 200
7547 301
4067 304
906 400
72 437
43 405
10 403
9 206
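One rough way to put a number on the traffic is to count requests per minute and look at the busiest minutes. The function below is only a sketch: the name busiest_minutes is made up for this post, and it assumes nginx’s default “combined” log format, where the timestamp is the fourth whitespace-separated field.

```shell
# Print the busiest minutes in an nginx access log.
# Assumes the default "combined" format, where field 4 looks like
# [12/Feb/2021:06:25:14 (that date is only an illustration).
busiest_minutes() {
    log_file=$1          # path to the access log
    rows=${2:-10}        # how many minutes to list (default 10)
    # Characters 2-18 of field 4 are "dd/Mon/yyyy:HH:MM", i.e. one-minute buckets.
    awk '{ print substr($4, 2, 17) }' "$log_file" \
        | sort | uniq -c | sort -rn | head -n "$rows"
}

# Usage: busiest_minutes /var/log/nginx/access.log 5
```

Each output line is a request count followed by the minute it belongs to, so a spike of hundreds of requests in a single minute stands out quickly.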
And the most requested paths that returned a 404 are:
awk '($9 ~ /404/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn
12924 /api/collections/blog/inbox
7638 /favicon.ico
1701 /feed/
1576 /api/v1/instance
1526 /.well-known/nodeinfo
1278 /api/statusnet/config.json
282 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
254 /robots.txt
198 /.well-known/host-meta
171 /page2/favicon.ico
136 /api/jsonws/invoke
135 /index.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21
...
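And the hacking attempts. As the awk warning says, the \. in the quoted pattern is read as a plain dot, so the filter matches any GET path containing “php”, which is why /phpmyadmin/ shows up too: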
awk '($9 ~ /404/)' access.log | awk -F\" '($2 ~ "^GET .*\.php")' | awk '{print $7}' | sort | uniq -c | sort -r | head -n 20
awk: cmd. line:1: warning: escape sequence `\.' treated as plain `.'
208 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
135 /index.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21
49 /wp-login.php
12 /system_api.php
12 /streaming/clients_live.php
5 /phpmyadmin/
5 /config.bak.php
5 //xmlrpc.php?rsd
4 //vendor/phpunit/phpunit/phpunit.xsd
3 /wp-admin/admin-ajax.php?action=revslider_show_image&img=../wp-config.php
3 /pmd/index.php
3 /phpmyadmin/index.php
2 /vendor/phpunit/phpunit/src/Util/PHP/XsamXadoo_Bot_.php
2 /vendor/phpunit/phpunit/src/Util/PHP/XsamXadoo_Bot.php
2 /phpMyAdmin4/index.php?lang=en
2 /phpMyAdmin3/index.php?lang=en
2 /phpMyAdmin2/index.php?lang=en
2 /phpMyAdmin-2.6.3/
2 /phpMyAdmin-2.6.2-rc1/
2 /index.php?m=admin&c=index&a=login&dosubmit=1
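The same log can also show who is behind these probes. The sketch below counts, per client IP, the 404 responses for paths containing “.php”. The function name php_probe_ips is hypothetical, and it assumes the default “combined” log format. Unlike the command above, it does not restrict itself to GET requests, and it matches the literal string “.php”.

```shell
# List client IPs by how many 404 responses they got for paths containing ".php".
# Assumes nginx's default "combined" log format:
#   $1 = client IP, $7 = request path, $9 = status code.
php_probe_ips() {
    log_file=$1          # path to the access log
    rows=${2:-10}        # how many IPs to list (default 10)
    # index() does a literal substring match, so no regex escaping is needed.
    awk '$9 == 404 && index($7, ".php") { print $1 }' "$log_file" \
        | sort | uniq -c | sort -rn | head -n "$rows"
}

# Usage: php_probe_ips /var/log/nginx/access.log 20
```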
The most immediate step I took was to rate limit the incoming requests. Thanks to the copious documentation available for nginx, I did that with these two lines.
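# In the http context: one shared 10 MB zone keyed by client IP, 10 requests per second each
limit_req_zone $binary_remote_addr zone=speedbump:10m rate=10r/s;
# HTTPS server
#
server {
    # Allow a burst of 5 extra requests without delay; anything beyond that is rejected
    limit_req zone=speedbump burst=5 nodelay;
    # ... rest of the server block unchanged ...
}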
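To see whether the limit is actually biting, the nginx error log helps: when limit_req rejects a request, nginx writes a line containing “limiting requests” along with the client address. The sketch below tallies those lines per client. The function name rate_limited_clients is made up for illustration, and the log path in the usage comment is an assumption that depends on the error_log directive in your setup.

```shell
# Count rejected requests per client from nginx's error log.
# A rejected request is logged as a line containing
#   limiting requests, excess: ... by zone "speedbump", client: <ip>, ...
rate_limited_clients() {
    error_log=$1         # path to the nginx error log
    grep 'limiting requests' "$error_log" \
        | sed -n 's/.*client: \([^,]*\),.*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption):
#   rate_limited_clients /var/log/nginx/error.log
```

If this prints nothing after a burst of bot traffic, the rate or burst values are probably too generous for what the Pi can handle.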
The next step was to catch these rogue IP addresses and ban them with fail2ban. I already had the SSH jail set up, so I went ahead and added these jails for nginx requests:
[nginx-noscript]
port = http,https
filter = nginx-noscript
logpath = /var/log/nginx/access.log
maxretry = 6
[nginx-badbots]
port = http,https
filter = nginx-badbots
logpath = /var/log/nginx/access.log
maxretry = 2
[nginx-noproxy]
port = http,https
filter = nginx-noproxy
logpath = /var/log/nginx/access.log
maxretry = 2
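# Jail intended for manual bans only: maxretry is so high that it never triggers by itself
[man-ban]
enabled = true
filter = nginx-limit-req
action = iptables-multiport[name=ReqLimit, port="http,https", protocol=tcp]
logpath = /var/log/nginx/access.log
findtime = 1
# 2678400 seconds = 31 days
bantime = 2678400
maxretry = 99999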
Then I manually banned 9 of the top 10 IP addresses by request count in the nginx log. I skipped one address because it is a legitimate NextCloud News reader that fetches my feed every 15 minutes. The rest were bots triggering 404s.
sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -n 10
5684 116.203.27.118
3792 116.203.17.124
1658 95.216.4.252
1412 95.111.255.178
1122 173.231.59.213
979 103.210.236.47
962 132.232.70.247
890 91.241.19.84
867 198.98.50.189
847 45.155.205.108
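Before banning an address, it can help to look at what it has been asking for and how it identifies itself, since a feed reader and a scanner look very different. The function below is a sketch with a made-up name, inspect_ip, and it assumes the default “combined” log format, where the user agent is the last quoted field on each line.

```shell
# Summarise one client's requests: user agents first, then most requested paths.
# Assumes the "combined" log format. Split on double quotes, field 6 is the
# user agent; split on whitespace, field 7 is the request path.
inspect_ip() {
    log_file=$1          # path to the access log
    ip=$2                # client address to look up
    echo "User agents:"
    awk -v ip="$ip" -F'"' '{ split($1, parts, " "); if (parts[1] == ip) print $6 }' "$log_file" \
        | sort | uniq -c | sort -rn | head -n 5
    echo "Paths:"
    awk -v ip="$ip" '$1 == ip { print $7 }' "$log_file" \
        | sort | uniq -c | sort -rn | head -n 5
}

# Usage: sudo sh -c '. ./inspect_ip.sh; inspect_ip /var/log/nginx/access.log 132.232.70.247'
# (inspect_ip.sh is a hypothetical file holding the function above)
```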
sudo fail2ban-client set man-ban banip 132.232.70.247
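Typing that command nine times gets tedious, so here is a sketch that prints a ban command for each of the busiest addresses while skipping anything on an allowlist. It only prints the commands and does not run them. The function name ban_candidates and the allowlist file (one IP per line) are conventions invented for this example; the man-ban jail name is the one from my config.

```shell
# Print (not run) a ban command for each of the busiest client IPs,
# skipping any address listed in an allowlist file (one IP per line).
ban_candidates() {
    log_file=$1          # path to the access log
    allowlist=$2         # file with addresses that must never be banned
    rows=${3:-10}        # how many top addresses to consider (default 10)
    awk '{ print $1 }' "$log_file" | sort | uniq -c | sort -rn | head -n "$rows" \
        | awk '{ print $2 }' \
        | grep -v -x -F -f "$allowlist" \
        | sed 's/^/sudo fail2ban-client set man-ban banip /'
}

# Usage: ban_candidates /var/log/nginx/access.log allowlist.txt 10
```

Keeping it a dry run means I can read the list first and only then paste the lines I agree with into the terminal.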
I have to be careful when banning IP addresses manually, because an address could belong to an innocent feed reader or a search bot trying to index the pages. I wouldn’t have to worry about any of this if the blog were hosted on a VPS or a blogging platform, but then how would I have learned this cool fact about my server?
Resources:
- Secure your Django API from DDoS attacks with NGINX and fail2ban
- How To Protect an Nginx Server with Fail2Ban on Ubuntu 14.04
- nginx log parsing
- Count IP addresses in nginx access logs
Day 70 - Join Me in #100DaysToOffload