Analyzing EMOTET Network Traffic and Analyzing the Malware Sample
Context:
I was asked as part of a hiring challenge to analyze a network traffic capture (.pcap) that was retrieved as part of an ongoing investigation. The goal was to identify the affected hosts, how the attack happened, as well as the malware family if any. Also, I was asked to retrieve IoCs and any other pertinent intelligence.
Executive Summary:
A user ‘gregory.simmons’ downloaded and executed a malicious Microsoft Word document on 2020-07-31 00:25:37.09 UTC. The Word document contained malicious VBA macros which executed an embedded, Base64-encoded and obfuscated PowerShell script, which in turn retrieved the next stage of the malware.
The second-stage executable carried, in its resources section, an RC4-encrypted blob made up of an unpacking shellcode stub followed by another executable.
Once the RC4-encrypted executable is decrypted and executed, it checks its current filename and working directory. If it is running from a system-related directory, it continues with the main payload. Otherwise, it copies itself into a system directory under the name of a legitimate executable, and then creates a service to execute the copy in order to achieve persistence.
Once the service is executed, the malware gathers host information such as the process list, hostname, and working directory, encrypts it with AES128, appends a SHA1 hash to the output, and sends the result to the C2 server. It then retrieves encrypted data from the C2 and decides on its next steps.
The malware family at hand is Emotet.
IoCs:
NBI:
http://jambino.us/tv/DYsPb/
http://www.kappetijn.eu/wp-admin/t5Uujywz88/
http://killingworthlabs.com/wp-admin/n3tq5u168132549/
http://kevinley.com/logon/LXkUb/
http://movewithketty.com/cgi-bin/HISOotVOG/
201.235.10.215
HBI:
SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
SHA256: 0a3aaa398a6abe7a4ba256812b8b6632fa4595b4ac5c47b459d5a6a911c2d202
SHA256: 537ceaaf4b76967b916c857bf8113e6b6ccc65dca06df2d300b66b8a61d9eedc
Technical Analysis:
The artifact given for analysis is a PCAP file titled ‘traffic.pcap1’. Opening this file in Wireshark shows that the capture contains 4593 packets. To gain better insight into the network traffic, we use Wireshark’s built-in statistics tooling to determine the hosts present in our PCAP file: Statistics -> Conversations -> IPv4:
Based on this information, we can determine that we’re dealing with the traffic capture of a certain host with the IP: 10.7.31.101. This host contacts two other hosts within its local network (lines 1 and 2 in the image above), some broadcast and generic addresses (line 3 and the last 4 lines), as well as some remote addresses on the internet. Some of these remote addresses communicated with our host more than others (such as IP: 201.235.10.215, which exchanged around 2MB of data with our host).
Jumping back to Wireshark’s main window, the first thing that catches the eye is some Kerberos traffic at the start of the capture. Initially, I suspected that this could be indicative of an Active Directory exploitation attempt, so I decided to investigate further:
The first Kerberos packet is packet 4 (after the 3 TCP handshake-related packets). Examining the packet, we can see that it did not supply Pre-Auth information, which made me suspect that I was dealing with an AS-REP Roasting attempt. However, after being notified by the presumed AS in packet 5 that Pre-Auth is required, our host repeated its request (in packet 12), this time with the Pre-Auth information included:
Examining more of the AS-REQ’s body, we can see that our host is attempting to log in as ‘gregory.simmons’ on the machine ‘DESKTOP-DPHW305’, both of which are part of the domain ‘TECSOLUTIONS’:
The login request of our user ‘gregory.simmons’ is successful, and we can see that after that he proceeds to request a TGS ticket for the host ‘desktop-dphw305.tecsolutions.info’ on packet 23. We can also see that that TGS ticket request is granted in packet 26:
All of this indicates that everything appears to be ordinary so far regarding the Active Directory activity. However, we observe that the host in question later requests a TGS ticket for the domain controller ‘TECSOLUTIONS-DC’, and then uses that ticket to obtain access to the DRSUAPI. This made me suspect a potential DCSync or DCShadow attack:
Following my suspicion, I applied a packet filter to observe only DRSUAPI packets in the file. With that done, we can see that our host only uses regular methods from DRSUAPI, and not ones related to DCSync or DCShadow:
The remaining Active Directory traffic in the file seems benign, such as the host proceeding to fetch the Group Policy Objects applicable to the relevant user and machines of the domain.
After the AD setup is completed, we can see that the host immediately attempts to resolve the address for the hostname ‘e-dsm.com.br’. The DNS server replies with an A record giving the hostname’s IP address:
After this, we can see that the host initiates a TCP handshake with ‘e-dsm.com.br’ on port 80, and then makes an HTTP GET request for the path ‘/www/ZdJCAB’:
Following that, we can see that the server replies with many TCP segments which, upon complete transmission and reassembly, make up an HTTP response with code 200. What stands out is that the body of the HTTP response is a Microsoft Word file that is around 176 kB in size:
This raises suspicion given the seemingly randomized URI of the GET request, as well as the response being solely an MS Word file. So, we decide to dump it by doing: File -> Export Objects and selecting the MS Word file at hand:
After moving the MS Word document to my analysis VM, I began analyzing it for malware indicators.
First things first, I ran ‘oleid’ from the ‘oletools’ suite which gave me the following output:
From the results of ‘oleid’, and the context of our analysis, we can determine that the MS Word document at hand likely has some embedded VBA Macros within that execute malicious code. So, we then proceed to using ‘olevba’ from the ‘oletools’ suite to extract these VBA Macros.
After running ‘olevba’ with the MS Word document at hand, it extracts two VBA scripts as well as some form variables. The first VBA script defines the standard Document_open() function which is called after the MS Word document has loaded, and inside it, it runs the malicious ‘IJCNLqeundiwzfea’ function from the second VBA script:
As for the second VBA script, it is made up of 4 obfuscated functions. One of these functions is responsible for string decryption:
This function is referenced twice in the code, and it operates by removing a standard string “838h27…2n23d” from the passed string and returning the resulting value. As a result, it should be quite easy to deobfuscate the strings that are used with this function. Simply taking out all occurrences of the mentioned substring should suffice.
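The deobfuscation described above amounts to a single substring removal. A minimal Python sketch follows; the marker value and the sample input in it are made-up placeholders, since the real marker is the one embedded in the macro and is not reproduced here:

```python
def strip_marker(obfuscated: str, marker: str) -> str:
    """Remove every occurrence of the junk marker from an obfuscated string."""
    return obfuscated.replace(marker, "")


# Hypothetical marker and input, for illustration only.
PLACEHOLDER_MARKER = "x9x"
sample = "pox9xwersx9xhell"

print(strip_marker(sample, PLACEHOLDER_MARKER))
```

The same helper can be applied to any string value dumped by ‘olevba’, provided the marker passed in is the one actually used by the macro.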
Another function is responsible for fetching another string form variable, decrypting it using the previously mentioned function, and returning it as the function’s output.
The third function creates an ActiveX object out of the passed parameter:
As for the final and largest function, which is the one that gets called once the document is opened, it decrypts an embedded string which denotes the ActiveX object to be created. The string at hand is stored in an encrypted format, and it can be decrypted using the aforementioned string decryption function:
We can see that the ActiveX object to be created is the ‘winmgmt:win32_process’. At this point, we can be sure that the Macro is going to create a new process.
Once this ActiveX object is called, a number of parameters are passed to it. The most notable of these is the command line string, which is retrieved using the second discussed function and then decrypted using the string decryption function. We can get the value of the form variable using ‘olevba’, which returns a large encrypted string. Upon decryption we get the following string:
Once the MS Word document is loaded and the Macro is executed, it attempts to execute the aforementioned PowerShell command. The command is Base64-encoded (given the ‘-e’ option), and its decoded content (using GCHQ’s CyberChef) is as follows:
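The same decoding can be reproduced without CyberChef in a few lines of Python. PowerShell's encoded-command option expects Base64 over UTF-16LE text, so the sketch below decodes in that order. The function names and the sample command are illustrative and are not taken from the sample:

```python
import base64


def decode_powershell_command(encoded: str) -> str:
    """Decode the argument of PowerShell's -e / -EncodedCommand option."""
    raw = base64.b64decode(encoded)
    # PowerShell encodes the script text as UTF-16LE before Base64.
    return raw.decode("utf-16-le")


def encode_powershell_command(script: str) -> str:
    """Inverse operation, useful for checking a decoder against known input."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


# Illustrative round trip with a harmless command.
encoded = encode_powershell_command("Write-Output 'hello'")
print(decode_powershell_command(encoded))
```

Decoding the bytes as plain ASCII or UTF-8 instead would leave a NUL byte after every character, which is a common source of confusion when handling these commands manually.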
With some simple substitutions, we get the following Powershell script:
The script iterates over a list of URLs and attempts to download an executable from each of them. Once a download is successful (i.e., the script encounters a working URL), the script launches the executable in a new process and quits the PowerShell process. The downloaded executable is saved into the following path: ‘%USERPROFILE%/701.exe’.
Looking back at the Wireshark traffic, we can see that the host in question indeed made a DNS request to resolve the first hostname mentioned in the list above ‘jambino.us’, and then proceeded to download an executable from it. We can dump this executable in the same way we did with the MS Word document, and then begin Malware analysis on it:
Loading the file in IDA, we can see that it contains a large number of functions (4010 functions in total). The functions call graph for the program is as follows:
Zooming in more at one section, we can get a clearer idea of the number of functions that exist in the program, and the relations that exist between them:
In order to analyze this program, we will be relying on the tool ‘capa’, which should tell us the locations of interesting functions. ‘capa’ relies on an extensive and granular set of rules, so if the malware attempts to access content at an offset from the fs register for instance in order to read content from the PEB/TEB, then the tool would detect that. Once we get a list of discovered capabilities, we can rerun the tool with the ‘-vv’ option to display the locations of the discovered malware capabilities, thereby automating the process of sifting through the large set of functions and program instructions.
Running ‘capa’ on the file we get the following detected capabilities:
The capabilities that are most interesting to us are the RC4 encryption/decryption, the parsing of the PE header, and the dynamic resolution of library functions. Running capa again with the ‘-vv’ option, we get the address of the function that is performing RC4 encryption/decryption:
Looking at the function at hand, we can see that it begins with what appears to be the initialization of the S array to the identity permutation:
Followed by the rest of the key-scheduling algorithm that mixes up the S array:
And finally a PRGA section of code whose output gets XORed with the input (switched to IDA decompilation since it was better than Binary Ninja’s):
Based on this, we can deduce the function’s signature to be the following:
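For comparison, textbook RC4 has the same three stages seen in the disassembly: identity initialization of S, key scheduling, and a PRGA loop whose keystream is XORed with the input. The following Python sketch is a reference implementation of the standard algorithm, not a transcription of the sample's function; the name `rc4` and its parameters are illustrative:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: the same call both encrypts and decrypts."""
    # Stage 1: initialize S to the identity permutation.
    s = list(range(256))

    # Stage 2: key-scheduling algorithm (KSA) mixes S using the key.
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Stage 3: pseudo-random generation algorithm (PRGA), XORed with the input.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Widely published RC4 test vector.
print(rc4(b"Key", b"Plaintext").hex())
```

Because encryption and decryption are the same operation in RC4, one routine of this shape is enough to decrypt a dumped blob once the key has been recovered.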
After this, we check to see where this function is referenced, and luckily for us, it is referenced in only one location:
Looking at the function where the rc4 decryption function is called, we can see that it is being used to decrypt the content of a PE resource, and then jump to the start of it:
In the code above, we can see that the variable ‘v18’ contains a function pointer. This variable is set in the following code which is located before the code block listed above:
We can see that the variable ‘v18’ is set to the value returned by the function ‘sub_4012BC’, and that the latter function is called several times in the code with the first parameter being a handle to the kernel32 DLL, and second being what looks like an api hash value.
If we go back to the ‘capa’ results, we can see that that function performs PE exports table parsing, and as a consequence, is likely an api resolution function:
Looking into the function, we can see that it first locates the PE header:
And then proceeds to execute the following code, which goes through the APIs present in the passed module’s export table and tries to find the one whose hashed name matches the provided API hash:
(The offsets were manually annotated using Aldeid’s PE reference: PE-Portable-executable - aldeid)
Looking into the function responsible for computing the numeric api hash values, we can see that its logic is simple and can be easily replicated:
To reverse these hashes, I wrote a Python script that takes a list of the API names that kernel32.dll exports, as well as the desired hash, and outputs the corresponding API name in plain text. The output of the script is as follows:
Unfortunately, we were able to find the api corresponding to only one hash value, but not the rest. I am unsure of the reason as to why this happened, but I was able to determine which api calls were being resolved based on the context and the signature, so I decided to move on.
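The general shape of such a lookup is sketched below. Note that the hash routine used here (`ror13_hash`, a rotate-right-by-13 additive hash often seen in shellcode) is only a stand-in and an assumption on my part: it is not claimed to be the algorithm used by this sample, and the export list is a short illustrative subset rather than the full kernel32.dll export table.

```python
from typing import Callable, Iterable, Optional


def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Placeholder hash: NOT taken from the sample, swap in the real routine."""
    h = 0
    for ch in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + ch) & 0xFFFFFFFF
    return h


def resolve_hash(target: int,
                 export_names: Iterable[str],
                 hash_fn: Callable[[str], int]) -> Optional[str]:
    """Return the first export whose hash equals target, or None."""
    for name in export_names:
        if hash_fn(name) == target:
            return name
    return None


# Illustrative subset of export names.
exports = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "FindResourceA"]
wanted = ror13_hash("VirtualAlloc")
print(resolve_hash(wanted, exports, ror13_hash))
```

Keeping the hash function as a parameter makes it easy to re-run the lookup with a corrected reimplementation when some hashes fail to resolve.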
The source code for the program I used above can be seen in the following image:
The method signature for the ‘v20’ variable looks very similar to that of the FindResource method:
While its output is then used as the second argument of the method pointed to by the ‘Src’ variable whose signature looks similar to that of LoadResource:
At this point, I made the educated guess that the program was retrieving the resource with the ID 19296, decrypting it with a hardcoded key, and jumping to the start of it. So, I went ahead and dumped the content of the resource with that ID using ‘CFF Explorer’:
Next, I retrieved the key used for decryption by going up the function’s xrefs:
Next, I wrote a C program that loads the encrypted contents of the resource and decrypts it using the provided key, and then saves it to a file:
Then, I loaded the shellcode into Binary Ninja, and after examining it, I could discern 3 functions in it. The first of which was merely calling into the second function with some params:
The second seemed to be performing some extended set of operations. And the third seemed to be doing some PE header parsing judging by the offsets it was using:
After the end of the final function, I could see an MZ header:
Therefore, I made the educated guess that the shellcode stub at the start was likely just performing manual loading and mapping of the PE executable into the current memory space. So, I decided to carve out the PE file at the tail of the program and take a look at it.
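Carving the embedded executable can also be scripted. The sketch below searches a blob for an ‘MZ’ signature and accepts it only if the DOS header's e_lfanew field (at offset 0x3C) points to a ‘PE\0\0’ signature, which filters out stray ‘MZ’ byte pairs inside the shellcode. The function name and the file names in the usage comment are hypothetical:

```python
import struct


def find_embedded_pe(blob: bytes, start: int = 0) -> int:
    """Return the offset of the first plausible PE image in blob, or -1."""
    pos = blob.find(b"MZ", start)
    while pos != -1:
        # A DOS header is 0x40 bytes; e_lfanew lives at offset 0x3C.
        if pos + 0x40 <= len(blob):
            (e_lfanew,) = struct.unpack_from("<I", blob, pos + 0x3C)
            sig = pos + e_lfanew
            if blob[sig:sig + 4] == b"PE\x00\x00":
                return pos
        pos = blob.find(b"MZ", pos + 1)
    return -1


# Usage sketch (hypothetical file names):
# data = open("decrypted_resource.bin", "rb").read()
# offset = find_embedded_pe(data)
# if offset != -1:
#     open("carved.exe", "wb").write(data[offset:])
```

This writes everything from the header to the end of the blob, which is sufficient when the PE sits at the tail of the decrypted resource as it does here.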
At the entry point, we can see that there are two calls that are made:
The first function that’s called is obfuscated with Control Flow Flattening (CFF):
As for the second function, we can see that internally it relies on two other functions:
The ‘resolve_module’ function (which I renamed after analysis) goes through the list of loaded modules found in the PEB.Ldr struct, and tries to find one of them based on a hash passed as an argument:
As for ‘resolve_api_by_hash’, it manually parses the export table of the module returned by ‘resolve_module’ to find an export whose name hash matches the one passed as an argument:
Additionally, I found some strings in the ‘.data’ section that seemed to be encrypted, and which were referenced in the code by being passed to a function that appeared to decrypt them (judging by its logic: a loop with bitwise operations).
So, I figured that there were 3 ways I could go from here:
- Write a Binary Ninja script that undoes the control flow flattening.
- Write a Binary Ninja script that automatically resolves api names and encrypted strings.
- Run the sample in a debugger or a sandbox.
Of the three options above, I opted to try the second one first.
The logic used in this sample seemed too intricate for me to replicate in a Python script, so I figured that using something like ‘Dumpulator’ would be more efficient. I therefore loaded the sample into x32dbg, executed it until it reached the entry point of the program, and dumped the process memory into a minidump.
After that, I wrote some code that interfaces with the Binary Ninja Python API to retrieve all references to the API resolution function (‘gen_call_table’), fetch its arguments, and pass them to an emulator running the real ‘gen_call_table’ function from within ‘Dumpulator’. The main loop of the program is shown in the code below:
The ‘for’ loop iterates through all code references to the ‘gen_call_table’, which are retrieved using the Binary Ninja Python API:
Then for each reference, the loop tries to fetch the arguments for the call from either the High Level Intermediate Language (HLIL) or Medium Level Intermediate Language (MLIL) that Binary Ninja offers. Once retrieved, the code tries to retrieve the function name associated with the given arguments (module and api hashes) by emulating the ‘gen_call_table’ function from within ‘Dumpulator’ using the following function:
If a valid function name is returned, then a comment is added. In the case that the result is saved into a variable, then the variable’s name is set to the returned function’s name instead. In both cases, the resolved api call is saved into the ‘api_calls’ dictionary alongside the address where it was referenced. Running the code within Binary Ninja labels some addresses as shown in the image below. However, it seems that only kernel32 apis were referenced, and that not all calls to ‘gen_call_table’ were resolved.
Moving away from the idea of automating API resolution, I briefly considered the first idea, which is writing some code to perform control flow unflattening. However, given the many issues I encountered while trying to get Binary Ninja to work properly (such as its built-in console not recognizing Python packages installed system-wide), I decided to forgo this idea as well and instead run the code in a debugger. So, I loaded the sample in x32dbg, rebased my IDA disassembly view to the base address indicated by x32dbg, and put a breakpoint at ‘gen_call_table’, since all calls to the Windows API must pass through it.
Additionally, given that I know that the sample performs network operations, I decided to leave ‘Fakenet-ng’ running in the background as well, so that I could see any data being transferred to a remote host and by which process.
After setting a breakpoint at ‘gen_call_table’, and hitting it for the first time, I pressed ‘Run until return’, and set a breakpoint there instead, and deleted the one at the start of the function. By doing that, I could inspect the contents of the ‘eax’ register which houses an address for the function being returned by the api resolution function.
Initially, the process seemed to load some libraries, as well as retrieving the path of the executable using the ‘GetModuleFileName’ method. After that, the program resolved the address for the ‘OpenSCManagerW’ function, followed by a call to it. This could be indicative of the program creating a new service:
After that, I could see that the program created an executable file named ‘cmpbk32.exe’ in the ‘C:\Windows\SysWOW64\uxlib\’ directory, which had an altered ‘Date Modified’ time (presumably as a result of timestomping), and then proceeded to call the ‘CreateServiceW’ WinAPI function:
After the service (named cmpbk32) was successfully created, the sample made another call to ‘OpenService’, followed by a call to ‘ChangeServiceConfig2W’ with the following parameters:
Following that, the sample creates a new process with the following arguments:
Based on the behavior described above, the sample seems to copy itself into some legitimate system directory, and then create a service with a seemingly legitimate name and description in order to avoid suspicion, and then launches the copy of itself using the ‘CreateProcessW’ api call.
After that, the process exits and control is transferred to the new process. We attach to the new process and set the same breakpoint as we did with the previous process (since it’s the same executable file) and resume execution.
Then, we encounter some calls to ‘GetComputerNameA’, as well as ‘CreateToolhelp32Snapshot’ and multiple calls to ‘Process32FirstW’ and ‘Process32NextW’. This seems to be collecting some information about the system.
Then, we see a call to ‘advapi32.CryptEncrypt’ with the following parameters:
Following the last parameter in x32dbg dump (which is the address of the buffer to be encrypted), we can see that it contains some system information:
Prior to the encryption, we could observe the CSP being set up with an ALG_ID that corresponded to AES128 for the encryption key, as well as a hash object being created with an ALG_ID that corresponded to SHA1.
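When reading such arguments off the stack in a debugger, it helps to map the raw ALG_ID values back to their names. The sketch below uses the constant values and bit layout (class, type, and algorithm sub-ID) defined in wincrypt.h; the helper name `describe_alg_id` and the selection of table entries are illustrative:

```python
# Bit layout of a CryptoAPI ALG_ID: class (3 bits) | type (4 bits) | sid (9 bits)
ALG_CLASS_MASK = 7 << 13
ALG_SID_MASK = 511

ALG_CLASSES = {
    1 << 13: "SIGNATURE",
    2 << 13: "MSG_ENCRYPT",
    3 << 13: "DATA_ENCRYPT",
    4 << 13: "HASH",
    5 << 13: "KEY_EXCHANGE",
}

# A few well-known ALG_ID values from wincrypt.h.
KNOWN_ALG_IDS = {
    0x660E: "CALG_AES_128",
    0x6610: "CALG_AES_256",
    0x6801: "CALG_RC4",
    0x8003: "CALG_MD5",
    0x8004: "CALG_SHA1",
    0xA400: "CALG_RSA_KEYX",
}


def describe_alg_id(alg_id: int) -> str:
    """Render an ALG_ID as 'NAME (class=..., sid=...)' for quick triage."""
    name = KNOWN_ALG_IDS.get(alg_id, "unknown")
    alg_class = ALG_CLASSES.get(alg_id & ALG_CLASS_MASK, "other")
    sid = alg_id & ALG_SID_MASK
    return f"{name} (class={alg_class}, sid={sid})"


for value in (0x660E, 0x8004):
    print(hex(value), describe_alg_id(value))
```

Even for values missing from the table, the class bits alone indicate whether a given call is setting up an encryption key, a hash, or a key exchange.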
Once the encrypted text is generated, a hash is also computed, and then appended to the text. After that, we could see multiple calls to ‘ntdll.RtlRandomEx’ being used to generate what seems to be a randomized string (which would presumably become the URI for HTTP requests to the C2 server).
Following that, we also see some HTTP-related format strings appearing in some of the registers and on the stack, presumably for the construction of the HTTP request’s body:
The process then makes a call to the ‘ObtainUserAgentString’ function, followed by a call to ‘InternetOpenW’, and another to ‘InternetConnectW’ with the following parameters:
We recognize the C2’s IP address from our Wireshark PCAP file. The process then makes calls to ‘HttpOpenRequestW’ and ‘HttpSendRequestW’:
The arguments for the latter of the two are as follows:
Once we let the ‘HttpSendRequestW’ call execute, we immediately see the traffic in our ‘FakeNet-ng’ window:
This traffic looks very similar to what can be observed in Wireshark.
Continuing from this, we can see that the sample reads some input, checks for the HTTP 200 OK code, and then decrypts the content using ‘advapi32.CryptDecrypt’.
At this point in the analysis, I made two (Windows minidump) memory dumps, and was considering using ‘Dumpulator’ to decrypt the other C2 traffic on Wireshark. However, I was not sure whether the AES128 and RSA keys used to encrypt and decrypt traffic were the same, and whether there is an inner function in the sample that generates the corresponding keys for that. So, I decided to not sink more time into that given that I had already obtained enough information about the sample as well as many IoCs.
The next step was to determine the malware family, and in order to do this, I began by searching for the hardcoded C2 server’s IP address on Google along with the keyword: “IoC”. One of the top results I received was a document from Bangladesh’s Computer Incident Response Team (CIRT) that contained many IoCs for the ‘Emotet’ malware family: Emotet Malware IOC -1.xlsx
Among the results was the C2 IP address we encountered.
Based on this information, I started reading more about ‘Emotet’, and this outline of the pre-2021 version of Emotet from VMRay seemed to match up well against what I observed during my analysis: Malware Analysis Spotlight: Emotet’s Use of Cryptography - VMRay.