Like many other security research firms, SpiderLabs Research has been actively investigating the Flame (a.k.a. sKyWIper) malware that was revealed earlier this week. For those unaware of what Flame is, I'll provide a very brief summary. Essentially, Flame is a modular, extremely large piece of malware that was discovered in Iran. The malware was found to be quite complex, providing the attackers with a wealth of information about, and control over, the infected system(s). Many of the components included in Flame use some form of encryption and/or obfuscation, which is what I will be discussing today. Specifically, I'm going to talk about the string obfuscation encountered in the advnetcfg.ocx component, and how I was able to defeat it using IDAPython.
Before I jump straight into IDAPython, however, I'd like to take a step back and describe how I was able to identify the string obfuscation, and ultimately recover the plain text of the strings in the sample.
Upon analyzing advnetcfg.ocx, I discovered a number of pieces of data that were being supplied as arguments to the same function.
If we look above, we can see that this new function is supplied two arguments: the 'obfuscated_string' variable with an offset of +20, and the eighteenth byte of 'obfuscated_string'. We can see how these arguments are used below:
0xA7 – 0x82 = 0x25 ("%")
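The subtraction above is easy to check in Python. This is only a minimal sketch of that single step: the helper name `decode_byte` is hypothetical, and the masking to 8 bits is my assumption about how a byte-wise subtraction would wrap, not something taken from the disassembly.

```python
def decode_byte(obfuscated: int, key: int) -> int:
    """Subtract a key byte from an obfuscated byte.

    The `& 0xFF` keeps the result in byte range if the subtraction
    goes negative (assumed wrap-around behaviour).
    """
    return (obfuscated - key) & 0xFF


plain = decode_byte(0xA7, 0x82)
print(hex(plain), chr(plain))  # 0x25 %
```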
So that brings us to the third, and last, function in this de-obfuscation routine. This function, unlike the previous two, is quite small:
The above function translates to the following in Ruby:
At this point we've finished recreating the de-obfuscation routine that is used in advnetcfg.ocx. So you're probably sitting in your comfy La-Z-Boy asking yourself, "Wait a minute, I thought you said you were going to defeat it with IDAPython!? What's all this Ruby nonsense?" Well, it's true; I made a slight blunder when I originally dove into this code.
I originally wrote everything in Ruby, only to realize that writing it in Python might have been a better option (you'll see why in a minute). No, it's not because Python is a superior scripting language; it's because IDA Pro (http://www.hex-rays.com/) has graciously provided a plugin that integrates the Python language, allowing us to run scripts inside of IDA Pro. Remember how I said earlier that the de-obfuscation routine was referenced 179 times? I don't know about you, but I sure don't want to manually copy each obfuscated string into a script and manually comment the resulting value into my IDB file. As such, I looked into using IDAPython.
Now I'll be the first to admit it: I'm probably as far as you can get from being a "Python expert", but I did manage to get through it without too many bumps in the road. The first step was to convert the work I'd done earlier into Python. Overall it didn't prove to be terribly difficult, as Ruby and Python have a lot in common.
The second step was to find all of the obfuscated strings that get supplied to the de-obfuscation routine. If we look back, we recall that before each call to the de-obfuscation function, there was a 'push offset' instruction, like so:
I've included the hex representation of each assembly instruction, as this information will be important later on. If we look closely at the hex of the 'push offset unk_1008FBB8', we can see that the 'push offset' portion is represented by the opcode 0x68. The remaining 4 bytes represent the location of the obfuscated string, in reverse order (because x86 stores the address in little-endian byte order).
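To make the byte layout concrete, here is a small sketch in plain Python (no IDA required) that decodes a 5-byte 'push offset' instruction. The helper name `decode_push_offset` is hypothetical, and the example bytes are my assumption, derived from the label unk_1008FBB8 on the basis that IDA's default names encode the address.

```python
import struct

PUSH_IMM32 = 0x68  # x86 opcode for `push imm32`


def decode_push_offset(instr: bytes) -> int:
    """Return the 32-bit address pushed by a 5-byte `push offset` instruction."""
    if len(instr) != 5 or instr[0] != PUSH_IMM32:
        raise ValueError("not a 5-byte push imm32 instruction")
    # The operand is stored little-endian, so "<I" reverses the bytes for us.
    (target,) = struct.unpack("<I", instr[1:])
    return target


# Assumed encoding of `push offset unk_1008FBB8`
raw = bytes([0x68, 0xB8, 0xFB, 0x08, 0x10])
print(hex(decode_push_offset(raw)))  # 0x1008fbb8
```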
Knowing this, we can use the CodeRefsTo() function to determine all of the XREFs to the de-obfuscation function. Because the 'push offset' instruction sits directly before each call, subtracting 4 from each call address lands on the push's 4-byte operand, which holds the location of the obfuscated string. Using the information we gathered earlier regarding the size of the obfuscated string being located at the eighteenth byte, along with knowing that the actual obfuscated segment starts at the twentieth byte, we can obtain the information needed, like so:
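The lookup described above can be approximated outside of IDA as a rough sketch. Everything here is illustrative rather than the script used for the analysis: `FakeImage` is a hypothetical stand-in for the loaded binary, the demo bytes are made up, and I am assuming the size lives at zero-based offset 18 and the data at offset 20. Inside IDA, the call addresses would come from CodeRefsTo() instead of being hard-coded.

```python
import struct

SIZE_OFFSET = 18  # assumed zero-based offset of the length byte
DATA_OFFSET = 20  # assumed zero-based offset of the obfuscated bytes


class FakeImage:
    """Hypothetical stand-in for a loaded binary: base address plus raw bytes."""

    def __init__(self, base: int, data: bytes):
        self.base = base
        self.data = data

    def read(self, addr: int, size: int) -> bytes:
        off = addr - self.base
        if off < 0 or off + size > len(self.data):
            raise ValueError("address out of range")
        return self.data[off:off + size]


def obfuscated_bytes_for_call(image: FakeImage, call_addr: int):
    """Follow the `push offset` before a call and return (blob address, bytes)."""
    if image.read(call_addr - 5, 1)[0] != 0x68:
        raise ValueError("call is not preceded by push imm32")
    # call_addr - 4 is the little-endian operand of the push instruction.
    (blob_addr,) = struct.unpack("<I", image.read(call_addr - 4, 4))
    size = image.read(blob_addr + SIZE_OFFSET, 1)[0]
    return blob_addr, image.read(blob_addr + DATA_OFFSET, size)


# Made-up demo image: a push/call pair followed by one 3-byte blob.
base = 0x1000
blob_addr = base + 0x10
code = bytes([0x68]) + struct.pack("<I", blob_addr) + b"\xE8\x00\x00\x00\x00"
code = code.ljust(0x10, b"\x90")
blob = bytes(18) + bytes([3, 0]) + b"\xA7\xA8\xA9"
image = FakeImage(base, code + blob)

addr, data = obfuscated_bytes_for_call(image, base + 5)
print(hex(addr), data.hex())  # 0x1010 a7a8a9
```

The returned bytes would then be fed to the Python port of the de-obfuscation routine, and the result attached to the database as a comment.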
The complete code can be found here.
An example of what it looks like when it is run can be seen below:
In addition to the above functionality, John Miller (@ethackal) provided this wonderful one-liner that will also rename the data blocks to their de-obfuscated string representation. The following code accomplishes this: