Scraping ransomware leak sites which don't want to be

Bypassing RansomHub’s anti-scraping code

Aug 05, 2024

Introduction

Several ransomware threat actors have tried to make it difficult for security researchers to scrape information from their servers via Tor. Some have developed basic alternatives to Cloudflare’s service for Tor, aiming to block or mitigate scrapers overwhelming their servers. I’ve spoken about this previously, when LockBit was more operational.[1] One of ransomware threat actors’ biggest weaknesses is that they need to make leaked files available for others to download for the threat to the ransomed company to be credible. Some threat actors have opted to use online file storage providers to host leaks, which are easy to report and take down. Others maintain their own file storage site but incur bandwidth and storage costs. The only threat actor I’ve analysed that was capable of sufficiently maintaining infrastructure for ransomware leaks was LockBit, which was disrupted earlier this year and hasn’t recovered.[2]

Recently, a slew of new leaks from RansomHub grabbed my attention. When I viewed their file listings, the initial page showed logic indicating attempts to stop scraping and DDoS attacks. I investigated further with the aim of finding a bypass.

Analysis

The initial page, shown in Figure 1, is simple in appearance and only lasts a few seconds. I’m aware that many of these pages can be bypassed with scriptable headless browsers; however, a simple Python requests script requires far fewer resources than launching a browser in a scriptable manner. It’s also a fun exercise :-)

Figure 1 - Loading page

The code behind this page uses JavaScript obfuscation to perform a calculation, as shown in Figure 2. I’ve seen this obfuscation before: advanced persistent threat actors used it in some of their phishing pages in 2023.[3] It’s called JSFuck, and there are many solvers out there (most of them still written for Python 2.7). To make my life easier, I use a JavaScript interpreter to evaluate the code instead, as shown in the solution section of this post.
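
To see why these obfuscated expressions collapse to plain integers, here is a toy evaluator for the small integer-building subset of JSFuck (only the characters + ! [ ] ( )). It is a hand-rolled sketch for illustration, not the js2py approach used in this post and not a general JSFuck solver: it assumes single-element arrays and integer-only strings, and the expressions on the real challenge page may use more of the JSFuck alphabet. All function names are hypothetical.

```python
def _to_primitive(value):
    # JS arrays become the comma-joined string of their elements ([] -> "", [0] -> "0").
    if isinstance(value, list):
        return ",".join(_to_string(item) for item in value)
    return value


def _to_string(value):
    value = _to_primitive(value)
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    return str(value)


def _to_number(value):
    value = _to_primitive(value)
    if isinstance(value, str):
        # Toy limitation: only "" and integer strings; real JS would give NaN otherwise.
        return int(value) if value.strip() else 0
    return int(value)  # true -> 1, false -> 0


def _truthy(value):
    # Every array, even [], is truthy in JS; 0 and "" are falsy.
    return True if isinstance(value, list) else bool(value)


class _Parser:
    """Recursive descent over: expr := unary ('+' unary)*
    unary := ('+' | '!') unary | primary ; primary := '[' expr? ']' | '(' expr ')'"""

    def __init__(self, src):
        self.src = "".join(src.split())  # drop whitespace
        self.pos = 0

    def peek(self):
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        left = self.unary()
        while self.peek() == "+":  # binary plus, left-associative
            self.pos += 1
            a, b = _to_primitive(left), _to_primitive(self.unary())
            if isinstance(a, str) or isinstance(b, str):
                left = _to_string(a) + _to_string(b)  # string concatenation
            else:
                left = _to_number(a) + _to_number(b)  # numeric addition
        return left

    def unary(self):
        if self.peek() == "+":
            self.pos += 1
            return _to_number(self.unary())
        if self.peek() == "!":
            self.pos += 1
            return not _truthy(self.unary())
        return self.primary()

    def primary(self):
        if self.peek() == "[":
            self.pos += 1
            items = [] if self.peek() == "]" else [self.expr()]
            self.expect("]")
            return items
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")


def evaluate(src):
    """Evaluate a JSFuck integer expression (toy subset only)."""
    parser = _Parser(src)
    value = parser.expr()
    if parser.pos != len(parser.src):
        raise SyntaxError("trailing input")
    return value
```

For example, !+[] is !0, i.e. true, so !+[]+!+[] is true + true, which JavaScript coerces to 2. Likewise, +(+!+[]+[+[]]) builds the string "10" through number-plus-array concatenation and then converts it back to the number 10.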

Figure 2 - The code served when first visiting RansomHub, a ransomware leak page

The code in Figure 2 has two variables containing JSFuck obfuscation. Each obfuscated expression evaluates to an integer. The two values are added together, and the sum is set as a cookie called _k2. The initial page response also sets a cookie called _k1, which changes every time you load the initial page without solving the challenge. The initial response to a request made without any stored cookies is shown below:

HTTP/1.1 401 Unauthorized
Content-Type: text/html; charset=utf-8
Content-Length: 1310
X-Xss-Protection: 0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Origin-Agent-Cluster: ?1
Referrer-Policy: no-referrer
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Set-Cookie: _k1=<randomised string value>; path=/; SameSite=Lax
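
Because _k1 is regenerated on every unsolved page load, the follow-up request must present the same _k1 it was just issued, alongside the computed _k2. Below is a minimal standard-library sketch of that pairing. It assumes the two obfuscated values have already been evaluated to integers. The function name build_cookie_header is hypothetical.

```python
from http.cookies import SimpleCookie


def build_cookie_header(set_cookie: str, first: int, second: int) -> str:
    """Return a Cookie header that replays _k1 and adds the solved _k2.

    set_cookie: raw Set-Cookie value from the initial 401 response.
    first, second: integers the two obfuscated JavaScript variables evaluate to.
    """
    jar = SimpleCookie()
    jar.load(set_cookie)       # parses "_k1=<value>; path=/; SameSite=Lax"
    k1 = jar["_k1"].value      # must be the value issued by *this* response
    k2 = first + second        # the challenge answer is the plain sum
    return f"_k1={k1}; _k2={k2}"
```

In practice, a requests.Session stores _k1 automatically, so only _k2 has to be added by hand. The sketch just makes the cookie pairing explicit.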

As well as this initial response page, the Tor Browser also sends a second request for the onion site’s favicon.ico. This is expected behaviour: the Tor Browser is built on Firefox and relies on it to do the heavy lifting for a modern browser experience. Some consider this a privacy issue, and it remains an ongoing debate;[4][5] in this instance, however, the ransomware leak administrator has used this behaviour to identify scrapers. Following the initial request, the favicon.ico request looks as follows:

GET /favicon.ico
Host: fpwwt67hm3mkt6hdavkfyqi42oo3vkaggvjj4kxdr2ivsbzyka5yr2qd.onion
User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0
Accept: image/avif,image/webp,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Cookie: _k1=<randomised_string>;

If your scraper does not send this request, the leak site will not return the correct response, even if you’ve solved the cookie calculation! This shows that the administrators have some understanding of browser fingerprinting and of how to distinguish scrapers from real browsers.
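
A scraper can mimic this by issuing the favicon request itself, with headers copied from the capture above. The sketch below only builds the request (URL and headers) so it can be inspected offline. Sending it, for example with a requests.Session that already holds _k1, is left to the caller. The function name, the http:// scheme and the placeholder host are assumptions. It is also an assumption whether the site checks these specific header values or only that the request arrives.

```python
from __future__ import annotations


def favicon_request(host: str, k1: str) -> tuple[str, dict[str, str]]:
    """Build a favicon.ico request mirroring the captured Tor Browser request."""
    url = f"http://{host}/favicon.ico"  # assumption: plain HTTP onion service
    headers = {
        "Host": host,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Accept": "image/avif,image/webp,*/*",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        # In the capture, only _k1 is present on this request.
        "Cookie": f"_k1={k1}",
    }
    return url, headers
```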

Solution

So, with our understanding of how the leak site works, we need to:

Resolve the two obfuscated JavaScript variables back to integers and add them together;
Set the resulting sum as the value of a cookie, specifically _k2;
Carry the _k1 cookie from the first response over into the subsequent requests; and,
Send a request that simulates the browser’s favicon.ico request.
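
The four steps could be wired together roughly as follows. This is a sketch, not the author’s published script. The regex assumes each obfuscated value is assigned as var <name> = <JSFuck>; (Figure 2’s exact layout is not reproduced here), and BASE_URL is a placeholder. It also assumes that requests and js2py are installed and that the script runs under torsocks. The third-party imports sit inside fetch_listing so that the parsing helper can be tried offline.

```python
import re
from typing import Callable

# HYPOTHETICAL pattern: assumes `var <name> = <JSFuck>;`. JSFuck only uses []()!+
CHALLENGE_RE = re.compile(r"var\s+\w+\s*=\s*([\[\]()!+\s]+?)\s*;")
BASE_URL = "http://<leak-site-address>.onion"  # placeholder, not a real address


def compute_k2(html: str, evaluate: Callable[[str], object]) -> str:
    """Evaluate both obfuscated variables and return their sum as a cookie value."""
    expressions = CHALLENGE_RE.findall(html)
    if len(expressions) != 2:
        raise ValueError(f"expected 2 obfuscated variables, found {len(expressions)}")
    # int() also covers interpreters that hand back floats such as 3.0
    return str(sum(int(evaluate(expr)) for expr in expressions))


def fetch_listing(base_url: str = BASE_URL) -> str:
    import js2py     # third-party JavaScript interpreter
    import requests  # third-party HTTP client

    session = requests.Session()
    session.headers["User-Agent"] = (
        "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"
    )
    # Initial 401 page: serves the challenge and stores _k1 in session.cookies
    challenge = session.get(f"{base_url}/")
    # Mimic the browser's favicon fetch while only _k1 is set
    session.get(f"{base_url}/favicon.ico", headers={"Accept": "image/avif,image/webp,*/*"})
    # Add the solved sum as _k2; the session resends _k1 automatically
    session.cookies.set("_k2", compute_k2(challenge.text, js2py.eval_js))
    return session.get(f"{base_url}/").text
```

With a real address substituted for BASE_URL, a call such as print(fetch_listing()) would be run as torsocks python3 script.py. A requests.Session is used so that _k1 is carried between requests without manual cookie handling.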

To achieve this we’ll use Python, importing three libraries: re, requests and js2py. We use re to define a regex that extracts the two obfuscated variables from the first response, and js2py to evaluate that code for us rather than deobfuscating it ourselves. A JavaScript interpreter library is a game changer for researchers who follow ransomware leak sites, making these types of challenges much easier to resolve. After completing the script, I run it through torsocks to reach the Tor site and get the following response:

root@server:~/crawler# torsocks python3 script.py
<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta name="viewport" content="width=device-width"><style type="text/css">body,html {background:#fff;font-family:"Bitstream Vera Sans","Lucida Grande","Lucida Sans Unicode",Lucidux,Verdana,Lucida,sans-serif;}tr:nth-child(even) {background:#f4f4f4;}th,td {padding:0.1em 0.5em;}th {text-align:left;font-weight:bold;background:#eee;border-bottom:1px solid #aaa;}#list {border:1px solid #aaa;width:100%;}a {color:#a33;}a:hover {color:#e33;}</style>
<link rel="stylesheet" href="/fancyindex/fancyindex-bolt.css" type="text/css"/>

<title>Index of /</title>
</head><body><h1>Index of /</h1>
...

Success! The code can be found here. The ransomware leak site administrator will likely change the challenge slightly, but the underlying problem will remain for them. The Tor Browser is deliberately limited in some ways for privacy purposes; these limitations impose constraints on leak site administrators that are hard to overcome, giving security researchers an advantage.

[1] https://www.youtube[.]com/watch?v=fRuRn7QDJ-I

[2] https://www.nationalcrimeagency.gov.uk/news/lockbit-leader-unmasked-and-sanctioned

[3] https://www.pwc.com/gx/en/issues/cybersecurity/cyber-threat-intelligence/blue-callisto-orbits-around-us.html

[4] https://forum.torproject.org/t/disable-site-favicons/2120/6

[5] https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/42021

Jack’s Substack
