Understanding the Mirai Botnet
Antonakakis, Manos, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, et al.
USENIX Security 17
Mirai is an interesting botnet for many reasons.
The source code of the botnet is available: it was first posted on
hackforums.net[2], but you can find it on
GitHub. As Krebs points out,
this is not the author being charitable; it is a way to gain plausible deniability.
Regardless of the author’s intentions, the source code and accompanying forum
are worth a skim. If you can get past the sneers, they reveal how the botnet
That’s enough context; let’s jump into the paper.
The functioning of the botnet can be broken into seven main steps.
1. Each Mirai bot starts in a rapid scanning phase, in which it pseudorandomly probes the Telnet ports TCP/23 and TCP/2323. If a host responds, the bot tries to log in with 10 credentials picked at random from a hardcoded list of 62, which includes the following six combinations:
   Username   Password
   root       admin
   user       user
   admin      (none)
   666666     666666
   mother     fucker
   admin      7ujMko0admin
2. If the bot manages to log in, it reports the victim’s IP address and the working credentials to the server listening for scan results (the reporting server). The reporting server’s IP is known to the bot, and the port is fixed at 48101.
3. The reporting server, the Go program scanListen.go, listens on port 48101 and dispatches the results to the loader.
   Why do we have a server in between? Why not let the bot talk to the loading server directly? The idea is that you’ll have many loading servers, typically high-performance dedicated machines, while the server in the middle is a low-end server running a simple program (in Mirai’s case, scanListen.go). So the middle server acts as a kind of load balancer. Also, for a global botnet it might make sense to distribute the loading servers geographically; since loading is kinda bandwidth-intensive, this helps.
4. The loader logs in, checks the device’s architecture (MIPS, ARM, …), and loads an architecture-specific binary. After loading, the malware file is deleted. This makes the infection non-persistent: if the device restarts, the malware is gone. The binary also removes other infections, such as qbot.
5. The attacker sends a command to the Command and Control (C2) server.
6. The C2 server relays the command to the bots.
7. The bots perform the attack against the target. (The attack types are described in a bit.)
Who owns a bot? Its original owner, of course. But in a botnet, the bot is owned and commanded by the botnet’s C2. How does the bot know which C2 that is? The C2 location is hardcoded into the infection, so once a bot is loaded, it listens to that C2 for instructions. The authors of the paper recovered 62 C2 domains from the infection binaries.
Now, to get a botnet started, you need to have a first infection which infects more devices, which infects more devices, and so on. The authors of the paper call this bootstrapping.
The first scan came from a DataWagon IP address (DataWagon is a well-known bulletproof host[3]; see Krebs’s post for more). Within 20 hours, 64,500 devices were infected. To put this in perspective, the attack on KrebsOnSecurity used only 24,000 devices!
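To make the bootstrapping numbers a bit more concrete, here is the back-of-the-envelope arithmetic in Python. It only uses the figures quoted above and computes a flat average; the real growth was not uniform, since every newly infected bot also starts scanning.

```python
# Figures quoted in the text: 64,500 infections within 20 hours of the first
# scan, and roughly 24,000 devices used in the KrebsOnSecurity attack.
INFECTED_AFTER_20H = 64_500
HOURS = 20
KREBS_ATTACK_BOTS = 24_000

# A flat average only: the real curve was not linear, because every newly
# infected bot also starts scanning and recruiting more bots.
avg_per_hour = INFECTED_AFTER_20H / HOURS
ratio_vs_krebs = INFECTED_AFTER_20H / KREBS_ATTACK_BOTS

print(f"average new infections per hour: {avg_per_hour:,.0f}")        # 3,225
print(f"20-hour botnet vs. Krebs attack fleet: {ratio_vs_krebs:.1f}x")  # 2.7x
```

In other words, less than a day in, the botnet was already well over twice the size of the fleet that later took down KrebsOnSecurity.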
Eventually the botnet reached a steady state of about 300k devices. After that, the source code was released, and a variant that used a CWMP exploit blew the count up to a peak of 600k devices.
A disproportionate number of the infected devices were located in South America and Southeast Asia. However, they were not clustered in a single AS[4]: scans by the authors revealed that the top 10 ASes had 44.3% of the infections and the top 100 ASes had 78.6%.
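The top-k statistic above is easy to compute from per-AS infection counts. The function below is a hypothetical sketch (top_k_share is my name, not something from the paper), and the counts are made-up toy values using documentation AS numbers, not the paper’s data.

```python
from collections import Counter


def top_k_share(infections_per_as: Counter, k: int) -> float:
    """Fraction of all infections that fall in the k ASes with the most infections."""
    total = sum(infections_per_as.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in infections_per_as.most_common(k))
    return top / total


# Hypothetical toy counts (NOT the paper's data): AS number -> infected devices.
# AS64496-AS64511 are reserved for documentation.
toy = Counter({64496: 40, 64497: 25, 64498: 15, 64499: 10, 64500: 5, 64501: 5})

print(f"top-2 share: {top_k_share(toy, 2):.1%}")  # 65.0%
```

Run over the real per-AS counts, top_k_share(counts, 10) and top_k_share(counts, 100) should correspond to the 44.3% and 78.6% figures.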
From the description of the attack above, it is no surprise that Mirai was not
very successful against enterprise web servers. Indeed, the authors of Mirai knew
this and aimed their malware at IoT devices by including hardcoded passwords
from known IoT vendors, such as
7ujMko0admin in the list above, which was commonly
hardcoded in Dahua IP cameras. But intended targets don’t mean shit; what
matters is which devices were actually affected. The authors studied device
banners to identify the devices infected by this malware and were able to conclude
that most of the Mirai infections were security cameras, DVRs, and consumer
routers.
IoT devices typically don’t have much bandwidth. The authors’ network telescope observed that most devices were scanning at a rate of about 250 bytes per second. Further, the authors note that there was no rate-limiting code in the infection, so that low rate reflects the devices’ own limits rather than a deliberate choice by the malware authors. This further supports the hypothesis that Mirai consisted mostly of low-power IoT devices.
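For a sense of scale, 250 bytes per second is tiny. A quick conversion in Python (the per-day figure simply assumes a device scans at that rate around the clock):

```python
SCAN_RATE_BYTES_PER_S = 250  # typical per-device scan rate quoted above
SECONDS_PER_DAY = 86_400

bits_per_s = SCAN_RATE_BYTES_PER_S * 8
mb_per_day = SCAN_RATE_BYTES_PER_S * SECONDS_PER_DAY / 1_000_000  # decimal megabytes

print(f"{bits_per_s / 1000:.0f} kbit/s")               # 2 kbit/s
print(f"{mb_per_day:.1f} MB of scan traffic per day")  # 21.6 MB
```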
The authors’ studies revealed that 39.8% of the attacks were TCP state exhaustion attacks, 34.5% were application layer attacks, and 32.8% were volumetric attacks. (These shares add up to more than 100%, so some attacks must have fallen into more than one category.) This is in stark contrast to other DDoS-for-hire services, which primarily use amplification attacks: data scraped from VDO (a major booter) by Karami, Park, and McCoy showed that over 72% of the attacks used amplification.
I have thrown a lot of terms at you, so let’s define them one by one.
A booter is a DDoS-for-hire service. Booters are also called stressors, and to seem legitimate they are advertised to network admins as services for stress-testing their own infrastructure. (Of course, there are legitimate uses for a stressor, but many of these services use botnets, and no legitimate company can use them without crossing ethical and legal boundaries.)
A TCP state exhaustion attack is something like a SYN flood, where the attacker floods the victim’s server with SYNs. Recall that the TCP handshake is SYN, SYN-ACK, ACK: the server responds to a SYN with a SYN-ACK and stores state (a half-open connection) so it knows it has already received a SYN from this IP. At scale, this attack can exhaust the server’s “state”; in practice, this can be something like the number of half-open connections the server is willing to track or the amount of memory available (suppose a server makes allocations on a SYN packet, assuming that the client will complete the connection). Mirai also did variations on this attack, like the ACK flood (stateless network devices like firewalls process all packets, and this could exhaust their memory) and the ACK-STOMP flood.
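Here is a toy model of SYN backlog exhaustion, to make the “state” argument concrete. It is a simplification of my own, not how any real kernel works: a fixed-size table of half-open connections with no timeouts or other defenses. The class and method names are hypothetical, and the addresses come from the documentation ranges.

```python
class ToyListener:
    """Toy TCP listener that tracks half-open connections in a fixed-size table."""

    def __init__(self, backlog: int) -> None:
        self.backlog = backlog
        self.half_open: set[tuple[str, int]] = set()  # (source IP, source port)

    def on_syn(self, src: tuple[str, int]) -> bool:
        """Handle a SYN. Returns True if a SYN-ACK would be sent, False if dropped."""
        if src in self.half_open:  # retransmitted SYN, already tracked
            return True
        if len(self.half_open) >= self.backlog:  # table full: no room for new state
            return False
        self.half_open.add(src)  # remember that we are waiting for this client's ACK
        return True

    def on_ack(self, src: tuple[str, int]) -> bool:
        """Handle the final ACK: the handshake completes and the slot is freed."""
        if src in self.half_open:
            self.half_open.remove(src)
            return True
        return False


listener = ToyListener(backlog=128)

# Flood: SYNs from 128 distinct (address, port) pairs that never send the final ACK.
for port in range(40_000, 40_128):
    listener.on_syn(("203.0.113.7", port))

# A legitimate client now tries to connect and is turned away.
print(listener.on_syn(("198.51.100.20", 50_000)))  # False
```

The point is that the attacker never has to finish a single handshake: a small, cheap stream of SYNs keeps the table full, and every legitimate client is refused until slots free up.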
An application layer attack is something like an HTTP flood, where the attacker floods the victim server with GET requests (typically for expensive assets like images, which require a lot of work from the victim server) or POST requests (these typically require a lot of work as well, because most servers validate POST data server-side). Mirai also did other variations, like the DNS flood.
A volumetric attack is something like a UDP flood, where the attacker floods the victim with UDP packets. The intuition is that for every packet the server has to check whether any application is listening on the destination port and, if not, reply with an ICMP Destination Unreachable (port unreachable) message; flood the server with enough packets and it is overwhelmed. Mirai’s GRE floods also fall into this category.
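A toy version of the per-packet work in a UDP flood. handle_datagrams is a hypothetical helper that only counts outcomes; it ignores the bandwidth the flood itself eats up, and the fact that real network stacks typically rate-limit ICMP error messages.

```python
import random


def handle_datagrams(dst_ports: list[int], listening: set[int]) -> tuple[int, int]:
    """Toy model of a host receiving UDP datagrams.

    For each datagram, look up the destination port. If an application is bound
    to it, the datagram is delivered; otherwise the host would answer with an
    ICMP Destination Unreachable (port unreachable) message.
    Returns (delivered, icmp_replies).
    """
    delivered = icmp_replies = 0
    for port in dst_ports:
        if port in listening:
            delivered += 1
        else:
            icmp_replies += 1
    return delivered, icmp_replies


# A flood of 10,000 datagrams to random ports on a host that only runs DNS (53)
# and NTP (123): almost every datagram costs a lookup plus an ICMP reply.
random.seed(0)
flood = [random.randint(1, 65_535) for _ in range(10_000)]
delivered, icmp_replies = handle_datagrams(flood, {53, 123})
print(f"delivered={delivered} icmp_replies={icmp_replies}")
```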
The authors’ studies showed that Mirai targeted around 5k victims, of which
4,730 were individual IPs, 196 were subnets, and 120 were domains. The subnets included a
/0 subnet, i.e., everyone (makes for good lolz, I guess). And, as
expected, most of the targets were located in the United States. Looking at the
port numbers for TCP attacks, the authors noticed that most attacks were on
ports 80 (HTTP), 53 (DNS), 25565 (Minecraft), 443 (HTTPS), 20000 (DNP3), and
43594 (RuneScape). Unsurprisingly, the targets also included competing Mirai C2 servers
(this was after the source code was released).
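Two quick sanity checks on those numbers with Python’s standard ipaddress module: the three target categories do add up to roughly 5k, and a /0 really is everyone, all 2^32 IPv4 addresses.

```python
import ipaddress

# Target counts quoted above.
individual_ips, subnets, domains = 4730, 196, 120
total_targets = individual_ips + subnets + domains
print("total targets:", total_targets)  # 5046, i.e. "around 5k"

# A /0 prefix fixes no network bits, so it covers the whole IPv4 space.
everyone = ipaddress.ip_network("0.0.0.0/0")
print("addresses in a /0:", everyone.num_addresses)  # 4294967296 == 2**32

# For comparison, a /24 (here a documentation range) holds 256 addresses.
print("addresses in a /24:", ipaddress.ip_network("198.51.100.0/24").num_addresses)  # 256
```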
The attack on KrebsOnSecurity mentioned above, which clocked in at ~620 Gbps[1], used only about 24k devices. This is frightening because it sheds light on the potential of IoT attacks. Imagine the impact if the attack had used all of the bots at Mirai’s peak, 600k devices, and more sophisticated attack methods!
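The per-device arithmetic behind that worry, taking the ~620 Gbps and ~24k figures at face value. The extrapolation to the 600k peak is a naive linear one that assumes every bot could contribute the same average bandwidth; that is a big, unverified assumption, not a result from the paper.

```python
ATTACK_GBPS = 620      # reported peak of the KrebsOnSecurity attack
ATTACK_BOTS = 24_000   # approximate number of devices involved
PEAK_BOTS = 600_000    # Mirai's peak size

per_bot_mbps = ATTACK_GBPS * 1000 / ATTACK_BOTS
print(f"average per device: {per_bot_mbps:.1f} Mbps")  # 25.8 Mbps

# Naive linear scaling: assumes every bot sustains the same average rate.
naive_tbps = per_bot_mbps * PEAK_BOTS / 1_000_000
print(f"naive extrapolation at 600k bots: {naive_tbps:.1f} Tbps")  # 15.5 Tbps
```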
While the attack on KrebsOnSecurity was bad, regular people didn’t really notice
it. The attack on Dyn is what caught the world’s attention: it affected Twitter,
Reddit, PayPal, GitHub, and the PlayStation Network. If the last item looks a
little out of place, you’ll be even more surprised to learn that it (the
PlayStation Network) was the only intended target! Reverse DNS queries and more
sleuthing by the authors revealed that the target was probably
ns<05-06>.playstation.net, which was managed by Dyn. The title of the
corresponding Forbes article illustrates the absurdity quite well: Angry Gamer
Blamed For Most Devastating DDoS Of Our Time.
Finally, there was the attack on Liberia’s Lonestar Cell, which some outlets, including The Guardian, claimed had taken Liberia offline; as Krebs points out, with tonnes of evidence, that seems unlikely.
Read the paper; I have skipped over many parts, including the entire methodology, which is fascinating.
Support USENIX; they made this paper open-access and do a lot more awesome things!
This article is for informational purposes only and cannot be interpreted as advice, nor is it to be relied on in making a decision. In particular, please don't do illegal things.
[1] I am using networking units: Gbps is gigabits per second.
[2] hackforums.net is a weird place. It seems to have a lot of computer security beginners and people selling exploits. Before Mirai, there apparently used to be a DDoS-for-hire board. It also seems to have a lot of people I would characterize as script kiddies (people who run exploits written by other people), which makes it a great place to dump the source code for a botnet.
[3] A bulletproof host is one that does not act on abuse reports. For example, if you port-scan from a big host like AWS, they will probably receive an abuse complaint from someone and promptly kick you off. Bulletproof hosts will not even read the abuse reports! If you want to do things that will lead to abuse reports (like scanning for Telnet), you’d better use a bulletproof host (THIS IS NOT ADVICE, this is my conversational tone). Typically, these hosts are located offshore, in countries with lax restrictions on stuff like this. If you wanna do this, you probably also want a bulletproof domain registrar (if you are using domains instead of raw IPs), because a domain registrar could also kick you by neutering your DNS.
[4] An autonomous system (AS) is a group of routers (usually under a single operator) with a clear routing policy. The Internet is a network of ASes, with intra-AS routing inside each AS and inter-AS routing between them. In practice, ISPs (e.g., Sprint) and large organizational networks (e.g., large universities like MIT) are ASes.