In the ever-evolving landscape of malware threats, threat actors are continually creating new techniques to bypass detection. A recent discovery by JPCERT/CC sheds light on a new technique that involves embedding a malicious Word document within a seemingly benign PDF file using a .doc file extension.
In the file, researchers found that the malicious Word document began with the magic header signature `%PDF-1.7`, which is normally associated with PDF files. Following the fake PDF structure, they found a MIME Encapsulation of Aggregate HTML Documents (MHTML Web Archive) file containing an embedded Base64-encoded ActiveMime object. ActiveMime is an undocumented Microsoft file format that contains ZLIB-compressed data and is often used to store VBA macros.
Figure 1. Fake PDF header complete with PDF tags structures
Figure 2. A Word document in MHT format is embedded in the file after the fake PDF object.
As you may be aware, the delivery of malicious documents via MHT files is not new. We have observed this type of malware since as early as 2015. One noteworthy, but unfortunate, aspect of Microsoft Office is its wide array of file formats that can house malicious macros, allowing for unconventional file containers.
Figure 3. A document file can be saved with various formats including MHT or MHTML.
When the file’s extension is changed to ‘.doc’, the Microsoft Office application can open the embedded MHT file and execute its malicious macros if the user enables them.
There are several obfuscation techniques employed in these new samples, all designed to evade detection based on signatures.
Figure 5. The ActiveMime object uses a non-conventional MIME type in the MIME header, and the Base64-encoded string is fragmented. ActiveMime comprises zlib-compressed data starting at offset 0x32, which decompresses to a standard OLE file housing a VBA macro project.
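The ActiveMime layout described in Figure 5 can be sketched in Python. This is a minimal illustration, not a full parser: the function names are hypothetical, the 0x32 offset is taken from the description above, and the OLE check relies on the well-known Compound File signature. Real samples may differ in header length or use other tricks.

```python
import base64
import re
import zlib

# Signature of an OLE / Compound File Binary container.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
ACTIVEMIME_MAGIC = b"ActiveMime"
# Offset of the zlib stream, as described in the text (assumed fixed here).
ZLIB_OFFSET = 0x32


def decode_activemime(b64_text: str) -> bytes:
    """Return the decompressed payload of a Base64-encoded ActiveMime blob."""
    # The Base64 string may be fragmented, so drop all whitespace first.
    compact = re.sub(r"\s+", "", b64_text)
    blob = base64.b64decode(compact)
    if not blob.startswith(ACTIVEMIME_MAGIC):
        raise ValueError("not an ActiveMime object")
    return zlib.decompress(blob[ZLIB_OFFSET:])


def looks_like_ole(payload: bytes) -> bool:
    """Check whether the decompressed payload starts with the OLE signature."""
    return payload.startswith(OLE_MAGIC)
```

If `looks_like_ole` returns true for the decoded payload, the result could then be handed to an OLE or VBA analysis tool for macro extraction.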
Figure 6. The document conceals an MSO link to the ActiveMime Object by obfuscating it through URL percent-encoded characters.
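Percent-encoding of this kind can be undone before a link is inspected. The sketch below uses Python's standard `urllib.parse.unquote`; the helper name `normalize_link`, the round limit, and the sample link in the usage note are illustrative assumptions, not values taken from the analysed sample.

```python
from urllib.parse import unquote


def normalize_link(href: str, max_rounds: int = 3) -> str:
    """Percent-decode a link target, repeating to handle nested encoding."""
    current = href
    for _ in range(max_rounds):
        decoded = unquote(current)
        if decoded == current:
            # Nothing left to decode.
            break
        current = decoded
    return current
```

For example, a hypothetical link written as `%65%64%69%74%64%61%74%61.mso` normalizes to `editdata.mso`, after which a plain string match on the `.mso` reference works again.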
We experimented with this file, and what's particularly concerning is that the embedded MHT document doesn't actually require a PDF header. Any text preceding the MHT content will still allow MS Word to open the document and execute the malicious macro if macros are enabled. This was not clear in the early reports of this flaw.
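Because the leading bytes are arbitrary, a check that anchors on the PDF magic alone can miss the embedded document. The following is a rough triage sketch, assuming the MHT part still carries recognizable `MIME-Version` and multipart `Content-Type` headers; the function name and the patterns are illustrative, and obfuscated samples may not match them.

```python
import re

PDF_MAGIC = b"%PDF-"
# MHT markers searched at any offset, not only at the start of the file.
MIME_VERSION = re.compile(rb"mime-version\s*:", re.IGNORECASE)
MULTIPART = re.compile(
    rb"content-type\s*:\s*multipart/(related|mixed)", re.IGNORECASE
)


def triage(data: bytes) -> dict:
    """Report the PDF header state and where MHT content appears, if at all."""
    mime = MIME_VERSION.search(data)
    return {
        "pdf_header": data.startswith(PDF_MAGIC),
        "mht_offset": mime.start() if mime else None,
        "multipart": MULTIPART.search(data) is not None,
    }
```

A file where `mht_offset` is greater than zero has unrelated content in front of the MHT part, whether or not that content resembles a PDF.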
Figure 7. Removing the PDF header doesn’t hinder the MS Office application from opening the malicious MHT document.
This manoeuvre can evade signature-based detection systems that specifically scan for a PDF header. This is highlighted in the VirusTotal results displayed below: the DOC file with a PDF header is detected by 35 out of 59 vendors, whereas without the PDF header, only six out of 59 vendors detect the sample.
Figure 8. Comparing the VirusTotal detection results between a sample with a fake PDF header and one with the PDF header removed
To wrap up, the range of techniques used in this attack, from non-compliant file headers and MIME types to fragmented Base64-encoded strings, highlights a clever approach to evading the traditional detection mechanisms used by most anti-virus engines. Equally concerning is the fact that MHT document files can be concealed within a plain text file, allowing Microsoft Word to open them seamlessly.
Here is some key mitigation advice to protect users:
Also, here’s a YARA rule, based on JPCERT’s YARA suggestions, to identify potential malicious macros embedded in files without conducting PDF header checking:
    rule suspect_malware_mht_activemime {
        meta:
            desc = "MHT document with ActiveMime Object"
            reference = "https://blogs.jpcert.or.jp/en/2023/08/maldocinpdf.html"
        strings:
            $mhtfile1 = "mso" nocase ascii
            $mhtfile2 = "nextpart" nocase ascii
            $mhtfile3 = "mime" nocase ascii
            $mhtfile4 = "content-location:" nocase ascii
            $mhtfile5 = "content-type:" nocase ascii
            $mhtfile6 = /multipart\/(related|mixed)/ nocase ascii
            $mhtfile7 = "base64" nocase ascii
            $mhtfile8 = /\s-Version:/ nocase ascii
            $wordfile = "<w:WordDocument>" nocase ascii
            $excelfile = "<x:ExcelWorkbook>" nocase ascii
            $activemime1 = /\nQ\s{0,999}?W\s{0,999}?N\s{0,999}?0\s{0,999}?a/ // Base64 encoded 'ActiveMime' with spaces
            $activemime2 = "ActiveMime" base64
        condition:
            all of ($mhtfile*) and ($wordfile or $excelfile) and ($activemime1 or $activemime2)
    }
Figure 9. JPCERT YARA rule suggestions to identify potential malicious macros embedded in files without conducting PDF header checking