Safeguarding against bots

I host this blog on a Pi and try to make sure the server doesn’t choke while serving pages. I have taken great care to keep the pages as light as possible. This is a personal blog and I don’t expect many visitors either. Bots, on the other hand, love servers that are exposed to the internet and probe them regularly for weaknesses. Last week and again early this week, my blog went down with an HTTP 500 error. The first time it happened, I rebooted the Pi and everything was fine. This time around, it took some cursing and cajoling. Checking the logs confirmed my worst suspicion: the server was being probed for WordPress, PHP and various login paths, among many other hits. Everything was going to a 404 page and pretty soon the Pi was overwhelmed.

The Pi was being DDoS’d by bot requests. I suspect this kind of traffic is common for servers hosted on a VPS or in the cloud, where the machines are robust enough that it is trivial for them. I cannot quantify the traffic though, nor do I know how much is too much.

Here is a breakdown of my nginx access log by HTTP status code for all the requests hitting it. 404 is by far the most common.

cut -d '"' -f3 access.log | cut -d ' ' -f2 | sort | uniq -c | sort -rn

  32752 404
  15520 200
   7547 301
   4067 304
    906 400
     72 437
     43 405
     10 403
      9 206
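
The `cut` pipeline above depends on the position of the quote characters in each line. As a rough alternative, here is a minimal Python sketch that does the same count with a regular expression. It assumes the default nginx "combined" log format; the sample lines are made up for illustration and use documentation-only IP addresses.

```python
import re
from collections import Counter

# Matches the default nginx "combined" format (an assumption);
# a custom log_format would need a different pattern.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_statuses(lines):
    """Return a Counter of HTTP status codes found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not fit the expected format are skipped
            counts[match.group("status")] += 1
    return counts


# Made-up sample lines, not taken from the real log.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2021:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/7.68.0"',
    '203.0.113.5 - - [01/Jan/2021:00:00:02 +0000] "GET /feed/ HTTP/1.1" 404 153 "-" "curl/7.68.0"',
    '198.51.100.7 - - [01/Jan/2021:00:00:03 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for status, hits in count_statuses(SAMPLE).most_common():
    print(f"{hits:7d} {status}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_statuses(open("access.log", errors="replace"))`.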

And the most requested paths that returned a 404 are:

awk '($9 ~ /404/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn

  12924 /api/collections/blog/inbox
   7638 /favicon.ico
   1701 /feed/
   1576 /api/v1/instance
   1526 /.well-known/nodeinfo
   1278 /api/statusnet/config.json
    282 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
    254 /robots.txt
    198 /.well-known/host-meta
    171 /page2/favicon.ico
    136 /api/jsonws/invoke
    135 /index.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21
...

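And the hacking attempts, filtered to GET requests that mention PHP. awk warns that `\.` inside a string pattern is treated as a plain `.`, which is why paths such as /phpmyadmin/ also show up: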

awk '($9 ~ /404/)' access.log | awk -F\" '($2 ~ "^GET .*\.php")' | awk '{print $7}' | sort | uniq -c | sort -r | head -n 20

awk: cmd. line:1: warning: escape sequence `\.' treated as plain `.'

    208 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
    135 /index.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21
     49 /wp-login.php
     12 /system_api.php
     12 /streaming/clients_live.php
      5 /phpmyadmin/
      5 /config.bak.php
      5 //xmlrpc.php?rsd
      4 //vendor/phpunit/phpunit/phpunit.xsd
      3 /wp-admin/admin-ajax.php?action=revslider_show_image&img=../wp-config.php
      3 /pmd/index.php
      3 /phpmyadmin/index.php
      2 /vendor/phpunit/phpunit/src/Util/PHP/XsamXadoo_Bot_.php
      2 /vendor/phpunit/phpunit/src/Util/PHP/XsamXadoo_Bot.php
      2 /phpMyAdmin4/index.php?lang=en
      2 /phpMyAdmin3/index.php?lang=en
      2 /phpMyAdmin2/index.php?lang=en
      2 /phpMyAdmin-2.6.3/
      2 /phpMyAdmin-2.6.2-rc1/
      2 /index.php?m=admin&c=index&a=login&dosubmit=1
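
For a stricter filter that only counts paths really ending in `.php`, something like the following Python sketch could work. The function name `php_probes` and the sample lines are hypothetical, and the pattern assumes the default "combined" log format.

```python
import re
from collections import Counter

# Pulls the method, request target and status out of a "combined" format line.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) ')


def php_probes(lines):
    """Count GET requests that got a 404 and whose path ends in .php."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue
        if match.group("method") != "GET" or match.group("status") != "404":
            continue
        # Ignore the query string when checking the file extension.
        path = match.group("target").split("?", 1)[0]
        if path.endswith(".php"):
            counts[match.group("target")] += 1
    return counts


# Made-up sample lines for illustration.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2021:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.5 - - [01/Jan/2021:00:00:02 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.5 - - [01/Jan/2021:00:00:03 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.5 - - [01/Jan/2021:00:00:04 +0000] "GET /index.php?s=x HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.5 - - [01/Jan/2021:00:00:05 +0000] "POST /xmlrpc.php HTTP/1.1" 404 153 "-" "-"',
    '198.51.100.7 - - [01/Jan/2021:00:00:06 +0000] "GET /about.php HTTP/1.1" 200 512 "-" "-"',
]

for target, hits in php_probes(SAMPLE).most_common():
    print(f"{hits:7d} {target}")
```

Unlike the awk pattern, this skips /phpmyadmin/ because the path itself does not end in `.php`. Whether that is desirable depends on what you want to count as a probe.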

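The most immediate step I took was to rate limit the incoming requests. Thanks to the copious documentation available for nginx, I did that with these two lines: a limit_req_zone that tracks each client IP at 10 requests per second, and a limit_req that applies that zone to the server with a burst of 5.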

	# in the http block
	limit_req_zone $binary_remote_addr zone=speedbump:10m rate=10r/s;

	# HTTPS server
	#
	server {
		limit_req zone=speedbump burst=5 nodelay;
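
To get a feel for what `rate=10r/s burst=5 nodelay` means for a single client, here is a simplified Python model of leaky-bucket accounting. It is a sketch under assumptions, not nginx's actual code: real nginx tracks time in milliseconds and keeps per-key state in shared memory. The class and function names are made up.

```python
class LeakyBucket:
    """Simplified per-client model of limit_req with the nodelay option."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.burst = burst
        self.excess = 0.0  # requests currently counted above the steady rate
        self.last = None   # time of the last accepted request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        if self.last is None:
            self.last = now
            return True
        # The bucket drains at `rate` requests per second since the last hit.
        drained = self.rate * (now - self.last)
        excess = max(0.0, self.excess - drained + 1.0)
        if excess > self.burst:
            return False  # nginx would reject here (503 by default)
        self.excess = excess
        self.last = now
        return True


def accepted(bucket, arrival_times):
    """Count how many of the requests at the given times get through."""
    return sum(bucket.allow(t) for t in arrival_times)


flood = LeakyBucket(rate_per_second=10, burst=5)
print(accepted(flood, [0.0] * 20))  # 20 requests arriving at the same instant
print(flood.allow(1.0))             # one more request a second later
```

In this model a client that fires 20 requests at once gets 6 through (the first plus a burst of 5), and a request one second later is accepted again because the bucket has drained.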

The next step was to catch these rogue IP addresses and ban them with fail2ban. I already had the SSH jail set up, so I went ahead and added these jails for nginx requests:

[nginx-noscript]
port     = http,https
filter   = nginx-noscript
logpath  = /var/log/nginx/access.log
maxretry = 6

[nginx-badbots]
port     = http,https
filter   = nginx-badbots
logpath  = /var/log/nginx/access.log
maxretry = 2

[nginx-noproxy]
port     = http,https
filter   = nginx-noproxy
logpath  = /var/log/nginx/access.log
maxretry = 2

[man-ban]
enabled = true
filter = nginx-limit-req
action = iptables-multiport[name=ReqLimit, port="http,https", protocol=tcp]
logpath = /var/log/nginx/access.log
findtime = 1
bantime = 2678400
maxretry = 99999
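
The way `maxretry`, `findtime` and `bantime` interact can be sketched with a small Python model. This is a simplification of fail2ban's behaviour, not its implementation: the `Jail` class is hypothetical, and the 600 second `findtime` and `bantime` used for the first jail are assumed defaults, since the snippets above do not set them.

```python
from collections import defaultdict, deque


class Jail:
    """Toy model: ban an IP once it reaches maxretry failures within findtime."""

    def __init__(self, maxretry, findtime, bantime):
        self.maxretry = maxretry
        self.findtime = findtime  # seconds
        self.bantime = bantime    # seconds
        self.failures = defaultdict(deque)
        self.banned_until = {}

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, float("-inf")) > now

    def record_failure(self, ip, now):
        """Register a matching log line; return True if it triggers a ban."""
        if self.is_banned(ip, now):
            return False
        window = self.failures[ip]
        window.append(now)
        # Forget failures that fell out of the findtime window.
        while window and now - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False


# Like nginx-noscript above: six matches inside the window trigger a ban.
noscript = Jail(maxretry=6, findtime=600, bantime=600)
print([noscript.record_failure("203.0.113.5", t) for t in range(6)])

# Like man-ban above: 99999 matches within 1 second is practically unreachable.
man_ban = Jail(maxretry=99999, findtime=1, bantime=2678400)
print(any(man_ban.record_failure("203.0.113.5", i / 1000) for i in range(1000)))
print(2678400 / 86400)  # bantime expressed in days
```

Read this way, the man-ban jail appears to be a holding pen: its thresholds should never trigger on their own, so it only contains addresses added by hand, and each stays banned for 31 days.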

Then I manually banned 9 IP addresses out of the top 10 requesters in the nginx log. I skipped one of the addresses because it is a legitimate NextCloud News reader that fetches my feed every 15 minutes. The rest were bots triggering 404s.

sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr

   5684 116.203.27.118
   3792 116.203.17.124
   1658 95.216.4.252
   1412 95.111.255.178
   1122 173.231.59.213
    979 103.210.236.47
    962 132.232.70.247
    890 91.241.19.84
    867 198.98.50.189
    847 45.155.205.108

sudo fail2ban-client set man-ban banip 132.232.70.247
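
Typing the ban command nine times gets tedious, so here is a hedged Python sketch that prints the commands for the busiest addresses while skipping an allowlist. The function name, the allowlist and the sample lines are hypothetical; it only prints the commands and leaves running them to you after review.

```python
from collections import Counter


def ban_commands(lines, allowlist, top_n=10, jail="man-ban"):
    """Build fail2ban-client commands for the top requesters not in allowlist."""
    # The client address is the first space-separated field of each log line.
    hits = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    return [
        f"sudo fail2ban-client set {jail} banip {ip}"
        for ip, _ in hits.most_common(top_n)
        if ip not in allowlist
    ]


# Made-up sample: 203.0.113.5 is noisy, 198.51.100.7 stands in for a feed reader.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2021:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.5 - - [01/Jan/2021:00:00:02 +0000] "GET /pmd/index.php HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.5 - - [01/Jan/2021:00:00:03 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 153 "-" "-"',
    '198.51.100.7 - - [01/Jan/2021:00:00:04 +0000] "GET /index.xml HTTP/1.1" 200 900 "-" "-"',
    '198.51.100.7 - - [01/Jan/2021:00:15:04 +0000] "GET /index.xml HTTP/1.1" 304 0 "-" "-"',
    '192.0.2.9 - - [01/Jan/2021:00:20:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "-"',
]

for command in ban_commands(SAMPLE, allowlist={"198.51.100.7"}, top_n=2):
    print(command)
```

Request volume alone does not prove an address is a bot, so the output is best treated as a list of candidates to check by hand.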

I have to be careful when banning IP addresses manually because one of them could be an innocent feed reader or a search bot trying to index the pages. I wouldn’t have to worry about any of this if the blog were hosted on a VPS or a blogging site, but then how would I get to know this cool fact about my server?

Resources:

  1. Secure your Django API from DDoS attacks with NGINX and fail2ban
  2. How To Protect an Nginx Server with Fail2Ban on Ubuntu 14.04
  3. nginx log parsing
  4. Count IP addresses in nginx access logs

Day 70 - Join Me in #100DaysToOffload