The Beta Cluster websites are protected against various types of abuse by configuration in our Content Delivery Network (CDN) edge. This page describes the procedures for finding abusive traffic sources, blocking based on IP network, and unblocking based on IP network.
The Grafana load graph for deployment-mediawiki14 is usually helpful for determining if the server is being overloaded when folks are reporting unusually slow responses. Sustained load over 3-4 is a common indication of unusually heavy traffic.
There are a number of helper scripts in deployment-mediawiki14.deployment-prep.eqiad1.wikimedia.cloud:/root that can be used to identify networks that are sending the largest share of traffic. These scripts output YAML lists suitable for pasting into the Hiera data in Horizon.
```
user@laptop:~$ ssh deployment-mediawiki14.deployment-prep.eqiad1.wikimedia.cloud
user@deployment-mediawiki14:~$ sudo -i
root@deployment-mediawiki14:~# ./big-ban-hammer.sh
- 57.0.0.0/8
- 74.0.0.0/8
- 91.0.0.0/8
- 94.0.0.0/8
- 101.0.0.0/8
- 110.0.0.0/8
- 111.0.0.0/8
- 119.0.0.0/8
- 124.0.0.0/8
- 159.0.0.0/8
- 166.0.0.0/8
- 172.0.0.0/8
- 217.0.0.0/8
```
Ideally we will block smaller networks rather than giant class A and class B blocks, but when attackers are using residential networks as their attack base we can end up needing to use larger blocks. T392534 was an example of blocking a large number of class A networks. A number of these have subsequently been split into smaller blocks to allow networks carrying friendly traffic to be unblocked.
To scan more than just the last 50k requests, you can use something like the following instead:

```
sudo grep -oP '"X-Client-IP": "\d+\.\d+\.\d+\.\d+' /var/log/apache2/other_vhosts_access-json.log | sort | uniq -c | sort -nr | head -n10
```
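The same log-scanning idea can be sketched in Python, this time aggregating client IPs per /24 so that heavy networks stand out rather than single addresses. The sample log lines below are made up for illustration; on the real host you would read lines from the access log instead:

```python
import collections
import ipaddress
import re

# Hypothetical sample of JSON-formatted access log lines. On the real host
# you would iterate over /var/log/apache2/other_vhosts_access-json.log.
log_lines = [
    '{"url": "/wiki/A", "X-Client-IP": "69.92.197.73"}',
    '{"url": "/wiki/B", "X-Client-IP": "69.92.197.74"}',
    '{"url": "/wiki/C", "X-Client-IP": "57.1.2.3"}',
]

# Same pattern the grep one-liner uses, plus a closing quote capture.
pattern = re.compile(r'"X-Client-IP": "(\d+\.\d+\.\d+\.\d+)"')

counts = collections.Counter()
for line in log_lines:
    match = pattern.search(line)
    if match:
        # Round each client IP down to its containing /24 before counting,
        # so a network sending traffic from many hosts surfaces as one entry.
        net = ipaddress.ip_network(f"{match.group(1)}/24", strict=False)
        counts[net] += 1

# Equivalent of "sort | uniq -c | sort -nr | head".
for net, hits in counts.most_common(10):
    print(f"{hits:6d} {net}")
```

Aggregating by /24 (or /16) is often what you want here, since the Hiera block list takes CIDR networks rather than individual addresses.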
Blocking an abusing IP can be done by adding the IP, or more commonly a CIDR network containing the IP, to the abuse_networks:blocked_nets:networks Hiera configuration at https://horizon.wikimedia.org/project/puppet/.
To ensure the Hiera change is picked up quickly, on the varnish server deployment-cache-text*:

1. Check /etc/haproxy/ipblocks.d/all.map to see whether your changes are already present
2. Run sudo run-puppet-agent to apply the Hiera change if it has not been picked up yet
3. Run sudo service haproxy reload

Repeat the forced Puppet run and HAProxy reload on deployment-cache-upload* to fully block the new ranges.
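Before pasting a hand-edited network list into Hiera, it can be worth sanity-checking it for typos and redundant entries. This is a minimal sketch using Python's standard-library ipaddress module; the candidate list here is hypothetical:

```python
import ipaddress

# Hypothetical candidate list in the blocked_nets format, with a deliberate
# mistake: the /24 is already covered by the /16 above it.
candidate = [
    "57.0.0.0/8",
    "69.92.0.0/16",
    "69.92.197.0/24",
]

# ip_network() raises ValueError on malformed CIDRs or host bits set,
# catching typos like "69.92.1.0/16" before they reach Hiera.
nets = [ipaddress.ip_network(entry) for entry in candidate]

# Flag any pair of entries where one contains (or equals) the other.
overlaps = [
    (a, b)
    for i, a in enumerate(nets)
    for b in nets[i + 1:]
    if a.overlaps(b)
]
for a, b in overlaps:
    print(f"redundant/overlapping entries: {a} and {b}")
```

Overlapping entries are harmless for blocking but make later unblocking confusing, since a network can appear to be removed while a wider entry still covers it.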
Unblocking works the same way as blocking. We will adjust the abuse_networks:blocked_nets:networks Hiera configuration at https://horizon.wikimedia.org/project/puppet/, run Puppet on deployment-cache-text*, and reload config for the haproxy service.
First we need to identify the specific IP or network CIDR to unblock. Typically a user will report a single IP. We can use that IP to identify the containing network via whois lookup. One convenient way to do that lookup is by using the whois-dev tool. For example, if the IP requesting to be unblocked is 69.92.197.73 then whois will tell us that the address is part of the 69.92.197.0/24 class C network. There is a command line tool in deployment-mediawiki14.deployment-prep.eqiad1.wikimedia.cloud:/root/whoisit.sh that can be used as well:
```
$ ssh deployment-mediawiki14.deployment-prep.eqiad1.wikimedia.cloud
$ sudo -i
$ ./whoisit.sh 69.92.197.73
[
  "ok",
  "69.92.197.73",
  "69.92.197.0/24"
]
```
Once we have the network to unblock we need to find the currently blocked network that contains it. To do this we need to review the abuse_networks:blocked_nets:networks data in the Hiera configuration at https://horizon.wikimedia.org/project/puppet/ or the cloud/instance-puppet.git:deployment-prep/_.yaml file that tracks the Beta Cluster global Hiera data. For the sake of this example let's assume that the list currently contains an entry for the 69.92.0.0/16 class B network.
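Finding the containing entry by eye gets tedious when the list is long. A quick way to do the containment check with Python's ipaddress module (the blocked_nets list below is a made-up snapshot for illustration):

```python
import ipaddress

# The IP the user reported as blocked.
ip = ipaddress.ip_address("69.92.197.73")

# Hypothetical snapshot of abuse_networks:blocked_nets:networks from Hiera.
blocked_nets = ["57.0.0.0/8", "69.92.0.0/16", "91.0.0.0/8"]

# An ip_address is "in" an ip_network when the network contains it.
containing = [
    net for net in map(ipaddress.ip_network, blocked_nets) if ip in net
]
print(containing)  # [IPv4Network('69.92.0.0/16')]
```

More than one entry can match if the list contains overlapping networks; in that case every matching entry needs to be split or removed for the unblock to take effect.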
Now we need to figure out what smaller network blocks should remain blocked after splitting 69.92.0.0/16 to remove 69.92.197.0/24. The subtractNetworks.py script can do that for you. We keep a copy on deployment-mediawiki14.deployment-prep.eqiad1.wikimedia.cloud with the scripts we use to find ranges to block:
```
user@laptop:~$ ssh deployment-mediawiki14.deployment-prep.eqiad1.wikimedia.cloud
user@deployment-mediawiki14:~$ sudo -i
root@deployment-mediawiki14:~# ./subtractNetworks.py 69.92.0.0/16 69.92.197.0/24
abuse_networks:
  blocked_nets:
    networks:
      - 69.92.0.0/17
      - 69.92.128.0/18
      - 69.92.192.0/22
      - 69.92.196.0/24
      - 69.92.198.0/23
      - 69.92.200.0/21
      - 69.92.208.0/20
      - 69.92.224.0/19
```
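The implementation of subtractNetworks.py is not shown here, but Python's standard-library ipaddress module can perform the same subtraction directly via address_exclude, which yields the subnets of a network that do not contain the excluded range. This sketch reproduces the list above:

```python
import ipaddress

blocked = ipaddress.ip_network("69.92.0.0/16")
unblock = ipaddress.ip_network("69.92.197.0/24")

# address_exclude() returns an iterator over the minimal set of subnets of
# `blocked` that together cover everything except `unblock`.
remaining = sorted(blocked.address_exclude(unblock))

for net in remaining:
    print(f"- {net}")
```

The output is eight networks, from 69.92.0.0/17 down to 69.92.196.0/24, covering every address in the original /16 except the /24 being unblocked.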
Now that we have the list of networks to remain blocked, we paste it into the abuse_networks:blocked_nets:networks Hiera configuration at https://horizon.wikimedia.org/project/puppet/ in place of the prior 69.92.0.0/16 network.
Finally we need to run Puppet and reload haproxy on the relevant hosts to pick up the changes:
```
user@laptop:~$ for host in {deployment-cache-text08,deployment-cache-upload08}.deployment-prep.eqiad1.wikimedia.cloud; do echo Processing $host; ssh $host 'sudo run-puppet-agent && sudo systemctl reload haproxy'; done
Processing deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(5327bea896) gitpuppet - puppetserver: Generalize git-rebase fix to work for labs/private'
Notice: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]: Unscheduling refresh on Service[prometheus-varnishkafka-exporter]
Notice: Applied catalog in 16.00 seconds
Processing deployment-cache-upload08.deployment-prep.eqiad1.wikimedia.cloud
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-cache-upload08.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(5327bea896) gitpuppet - puppetserver: Generalize git-rebase fix to work for labs/private'
Notice: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]: Unscheduling refresh on Service[prometheus-varnishkafka-exporter]
Notice: Applied catalog in 13.88 seconds
```