Contact me for feedback or questions! I reply to everyone.

Mitigating malicious websites

- Introduction -
- Mitigating requests -
- Mitigating Cloudflare -
- A better way? -

Introduction

The Web part of the Internet is broken, and websites hate you. Your average site loads several unneeded third party resources, and for mainstream sites, the figure is 15+. There are several reasons why this is undesirable, the chief one being that those resources can then spy on you in sometimes deep ways (such as collecting mouse movements or browsing history) and send that data wherever (such as to Facebook or Google). The unmitigated websites can also show you stuff you might not want to see (such as advertisements or porn) and the additional resources significantly increase the loading times and waste bandwidth. Then, 20+% of websites are hidden behind Cloudflare, an evil MitM (Man in the Middle, a type of attack where a third party intercepts a connection between you and the intended recipient) that collects everything about you, blocks you for arbitrary reasons and can serve modified sites. Even though this is an extremely big issue, almost everyone involved in these topics has either completely ignored it, or even made excuses for it (if a regular hacker MitMed a single big website, it would be front page news; CF does that for 20%? No problem). In this article we will learn how to mitigate both, and also a long term plan for a better future of the Web.

Mitigating requests

Install uMatrix (FF / Chrome) or ηMatrix (Pale Moon). An icon like this: should appear in the top right corner of the browser. Go to, for example, https://raypeat.com (but any site works), then click the icon, and you'll be greeted with this view:

Grid screen of uMatrix with default settings

The grid tells you which requests your browser has loaded (the ones with a green background will be allowed, the red ones will be blocked). The default settings of uMatrix load all first party stuff by default (including scripts, where most of the malice resides), as well as some third party. In this case, 2 CSS files from Google have been loaded, as well as 3 scripts from Ray Peat itself, that are not needed to read the site at all. To change the situation, click the extension title (e.g eMatrix 5.0.0). You will enter the settings menu. Click the My rules tab, then Edit. Select everything like this:

uMatrix rules menu

And with all the rules selected, press the Delete key on your keyboard. Click Save, then Import from file and import these rules. Save and Commit. The final view should look like this:

uMatrix grid screen allowing only first party CSS and images

What you just did is globally deny sites the ability to load anything except first party CSS and images, which are unlikely to negatively affect you. I call it the mild mode. However, by removing the last two rules * 1st-party css allow and * 1st-party image allow you can deny everything by default (hardcore mode). In this guide we will stick to the mild mode, though, as the hardcore mode will require you enable images and CSS on almost every site. NOTE: the matrix-off: behind-the-scene true rule is essential in ηMatrix, or Pale Moon's downloading of images won't work.

The whole point of uMatrix is to deny resources by default and allow them as needed to enable the website functionality you want. With the current setup, you deny almost everything that's possibly undesirable. And yet, most sites will still work perfectly fine. What if you find one that doesn't? Consider Euractiv. Though it loads 15 first party CSS files, it still looks clearly broken:

Showing the broken CSS of Euractiv site

The uMatrix grid of Euractiv looks like this:

uMatrix grid screen of Euractiv site

There are more requests down there, too (and you won't see all of them until you allow scripts, since scripts can load their own requests - making the real situation much worse than seen in the above picture). But we are only interested in the maxcdn.bootstrapcdn.com CSS one (actually 3). Click the top half of the tile with the 3 CSS files, like this:

Showing how to fix Euractiv in uMatrix

This is how the Euractiv grid should look like after:

Showing the fixed Euractiv grid

Remember - upper half is enabling, bottom half is disabling. Red is disabled, green is enabled. Only the requests with green tiles get loaded. Oh, and you might be wondering why the bootstrap CSS you've just allowed is a darker shade of green. The darker shade appears for the tiles you have specifically allowed (such as the bootstrap CSS). The 15 first party CSS files and the 57 images are all allowed because of inherited global rules - which is why they are light. The same applies to the red shades. Dark is local, light is global. Functionally, it doesn't matter - all green is allowed, all red is banned. The only difference is in the information provided to the person sitting in front of the screen. Anyway, here is how the fixed site looks like:

Showing how a fixed Euractiv site looks like

Much better ^_^. If you are satisfied with the results, click the padlock to make them permanent for the site you're on. And now, you've learned how to unbreak sites in uMatrix. Wasn't that hard, was it? With only one domain enabled over the default settings, you were able to make a site look properly and still block all undesirable stuff. What would a "minimal" browser be able to do (or any without uMatrix installed)? Either allow everything - including the 15 useless scripts (it is really more, since as I've stated above, they don't all show up unless the others - that are now blocked by uMatrix - are allowed to load them) - or deal with a broken website. Anyway, there is no magic rule that determines which requests need to be allowed to fix a site. That knowledge comes from experience. It just so happens that the bootstrap CSS is a pretty common third party resource that is also necessary. But sometimes you will need to allow scripts, and even refresh several times so that new requests get loaded, that you will then also need to allow.

However, it is mythology that uMatrix is a chore to use. Again, most sites are readable out of the box, so you don't need to do anything at all - but you still enjoy the protection from all the undesirable stuff. If a site displays badly, you most often only need to allow one or two domains with CSS, images, or a cookie to keep logins. This will become second nature while you become experienced with uMatrix. Sometimes more tinkering is necessary (especially to make interactive stuff like searches work), but this is rare - and you can permanently save rules, anyway. So, the next time you come back to a site, it will work the way it was when you left it. Using uMatrix gets more effortless the longer you do it - so much that, after you "cover" your most common sites, you almost don't notice it. All in all, uMatrix is the way to gain almost full control of what requests your browser is sending to the websites you visit - with only a little effort (and much less than writing your own adblocker lists would take). By the way, you do not want to use an adblocker, since they cannot do even close to what is necessary to mitigate the Web. Why that is, I have explained more deeply here.

Let's explore the other settings in the menu. You can Collapse placeholder of blocked elements - this will prevent showing a big ugly square for e.g a youtube video that didn't load. Probably should Spoof <noscript> tags when 1st-party scripts are blocked, since otherwise, every website that uses those tags will assume you're running scripts and fail to display properly. Spoof HTTP referrer string of third-party requests. can be turned off and replaced by your browser's settings (e.g Spoof referer to target URL in Pale Moon's Advanced Preferences - this needs Pale Moon Commander addon), which will also work for first party referers, not just third party (these can also track you). Turn on Block all hyperlink auditing attempts., since the only point of those is privacy invasion. Also make sure to enable Resolve CNAME records, since some trackers are now pretending to be first party to avoid extensions like uMatrix. By default, uMatrix lets websites store cookies on your disk - even if they're blocked. This still prevents the cookie-based tracking, because they're not sent to the website (if blocked); but - if you don't want them sticking around - you can enable the Delete blocked cookies option. You can go to the Hosts files tab and disable them all. With the grid set up according to this guide, they are unnecessary and just bring additional load. In the My rules tab, you can export your uMatrix settings in order to import them to another device later. This prevents having to redo your website fixes.

You might be wondering why I even wrote this guide. The addon is old and surely it's been covered over and over, right? Well, as usual, the other guides do not satisfy my standards. This one, for example, focuses on some other stuff instead of the beautiful grid. It takes them until the end of the page to mention the stuff that actually matters. And this is a very old version of the addon, which does not even allow entering the crucial global mode. Another guide just talks and talks, and also tries to pull people away from the essential global mode. By doing so, they glorify allowing all the trash to load by default - they pretty much admit it later The good news is that, as a beginner, you can ignore all the settings positioned to the right of "all.”. It is pointless to use uMatrix in anything other than mild or hardcore mode - such as allowing all scripts or images by default. What makes uMatrix so powerful is being able to situationally decide which categories of requests get loaded, and when. Allowing whole categories by default (like the other guide recommends) destroys this advantage, and makes uMatrix work more like a worse uBlock Origin. That site also requires JS and XHR just to see images, ironically making it a good boot camp for uMatrix usage. It is also Cloudflared.

Mitigating Cloudflare

Use kill-9's guide to distrust Cloudflare certs (archive) (note: there might be some more CF certs now - burn them all). This will prevent you from visiting any website that uses Cloudflare's SSL certificate, but not the ones that have their own. See, Cloudflare has several options for SSL handling - and only the Off and Flexible reveal their cert - which allows the sites to be blocked by the distrust method. The site can choose to pay 200$ per month for Cloudflare's Business plan, which allows them to upload their own SSL certificate. EDIT: turns out getting a Let's Encrypt cert for 10$ / month is possible with Cloudflare's Advanced Certificate Manager. Sites using either of these will not be blocked by the distrust method. Most sites do still use the Cloudflare certs, though, and so the distrust method is very effective (you will be surprised at what kind - and how many - of your favorite sites are behind Cloudflare).

To get those sites that use the Full SSL (an example would be Bitpay, if you want to test), or the Advanced Certificate Manager ones, you need to block the Cloudflare IP addresses (archive) on the firewall, router, etc. Here is a way to do it with iptables:

sudo iptables --append OUTPUT --destination 173.245.48.0/20 --jump DROP
sudo iptables --append OUTPUT --destination 103.21.244.0/22 --jump DROP
sudo iptables --append OUTPUT --destination 103.22.200.0/22 --jump DROP
sudo iptables --append OUTPUT --destination 103.31.4.0/22 --jump DROP
sudo iptables --append OUTPUT --destination 141.101.64.0/18 --jump DROP
sudo iptables --append OUTPUT --destination 108.162.192.0/18 --jump DROP
sudo iptables --append OUTPUT --destination 190.93.240.0/20 --jump DROP
sudo iptables --append OUTPUT --destination 188.114.96.0/20 --jump DROP
sudo iptables --append OUTPUT --destination 197.234.240.0/22 --jump DROP
sudo iptables --append OUTPUT --destination 198.41.128.0/17 --jump DROP
sudo iptables --append OUTPUT --destination 162.158.0.0/15 --jump DROP
sudo iptables --append OUTPUT --destination 104.16.0.0/13 --jump DROP
sudo iptables --append OUTPUT --destination 104.24.0.0/14 --jump DROP
sudo iptables --append OUTPUT --destination 131.0.72.0/22 --jump DROP

And for IPv6:

sudo ip6tables --append OUTPUT --destination 2400:cb00::/32 --jump DROP
sudo ip6tables --append OUTPUT --destination 2606:4700::/32 --jump DROP
sudo ip6tables --append OUTPUT --destination 2405:b500::/32 --jump DROP
sudo ip6tables --append OUTPUT --destination 2405:8100::/32 --jump DROP
sudo ip6tables --append OUTPUT --destination 2a06:98c0::/29 --jump DROP
sudo ip6tables --append OUTPUT --destination 2c0f:f248::/32 --jump DROP

After that, save with sudo iptables-save > /etc/iptables/block-cloudflare.conf and sudo iptables-save > /etc/iptables/block-cloudflare-ipv6.conf. And load the config files on startup by putting iptables-restore /etc/iptables/block-cloudflare.conf& and iptables-restore /etc/iptables/block-cloudflare-ipv6.conf& in /etc/rc.d/rc.local. There is no certainty that Cloudflare reveals all of their IPs - and they might add new ones at any moment, requiring the blocklists to be updated. This will also only give you the regular "could not connect" message when trying to visit a CF site, instead of telling you specifically that it was a failing cert that prevented the connection. But, it is still a lot more reliable to do it this way, since a significant amount of CF sites use their own certs.

There is only one perfect way (AFAIK) to check whether a site is behind Cloudflare. First, type the command dig [website_address.com] into the terminal. As in dig naturalnews.com. A part of the reply will look like this:

;; ANSWER SECTION: naturalnews.com. 300 IN A 104.16.135.70

Now type the command whois 104.16.135.70. If the result contains things such as Cloudflare, Inc. anywhere - then the site is behind it. If it doesn't, it's not.

The only way to actually visit websites behind Cloudflare safely is by using the Wayback Machine (Block Cloudflare MitM Attack extension can do that automatically, but it works only in Firefox). However, not everything is archived (you can archive the page on your own though, if the site allows it anyway) and the interactive features on the sites won't work. And you have to archive all pages separately if the archive bot didn't already do it on its own. Edit: morty proxy also works, if you can find a few instances. You can then spread your profiling onto a few random people, instead of Wayback Machine only. Edit 2: someone notified me that Ghostarchive is no longer Cloudflared, so you can use that instead or in addition to the Wayback Machine. Using a separate browser for Cloudflared sites could also be considered a mitigation, but it is like visiting your abusive husband in a mask, and hoping he doesn't detect you. You're still in his house, under his watchful eye, still affected by the things he does (e.g the captchas you're shown on Cloudflared pages), and hoping he doesn't eventually expose you.

A better way?

The current situation cannot go on forever. Sites will be getting more complicated and more malicious - and eventually, it will be impossible to mitigate them. This happens mainly because of two diseases: soydevism and capitalism (though many other minor reasons exist). Soydevism happens when a webdev didn't learn the basics (HTML, CSS, etc) properly and instead relied on frameworks to create their website. Some programmers have said that these days, a junior dev doesn't even learn the basics, but only the high-level stuff. Soydevs are also known for making their websites dumping grounds for trash like reCaptcha, third party fonts, social media buttons, etc (probably because they don't know any better). Capitalism happens when people stop making sites for enjoyment or passion and instead focus primarily on money (this will necessarily happen in a world where money decides your level of power and is also required just to live). That is when people figure out they can throw in a bunch of ads, tracking scripts, sponsorships etc. to make their site profitable. It also explains filling their websites with 20 clickbait articles per day (more impressions for ads and data collected by scripts). Ars Technica is a great example of a site that combines soydevism and capitalism. Of course, big corporations have contributed to the problem, by creating increasingly complicated standards that only care about working in Chrome (and then people - like the Mullvad team - stop supporting any other browsers). The more these behaviors are normalized, the more of a junkyard the Web ends up being.

The better way is to stop supporting the bad practices. Move away from visiting places that hate you towards those that respect you. This means websites with no ads, exploitative captchas, analytics, or CDNs. Sites that work in all browsers, including 20 years old ones - instead of only the big corpo abominations. Sites that - even if they decide to include modern functionality - practice graceful degradation when doing so, so that they still work properly in older or mitigated browsers. Sites that put their hearts and souls into everything that's on them, instead of dumping it there just to have it. Then, if you have your own site, do link to the other good ones. Though uMatrix, I don't care about cookies, URL Rewriter, BCMA, etc. are great tools, I really wish we could realize that they are mostly bandages on a Web that is broken at the fundamental level - then we can decide to just give them up. Let's create an internet full of love by default :D

Back to the front page