Emotet malspam campaign exploits reliance on magic for file type detection

Emotet is a Trojan designed to steal banking information. It is frequently spread through phishing emails sent to governments, banks, healthcare organizations, and schools. The emails often claim to be an invoice, carry a malicious Microsoft Word document as an attachment or link, and may appear to come from a trusted supplier. Once the document is opened, the target is prompted to click “Enable content”, which allows the dropper to install Emotet.

Screenshot of an Emotet dropper document open in Microsoft Word 2016.
The document claims that the user must click “Enable content” to view it, but doing so would actually install malware.

I recently encountered two Emotet dropper samples (0b9ccb04553ba5f1ce784630ef9b2c478ed13a96e89c65dcd9c94205c235ea12 and eff6619aee017ee5d04c539ff12c63a199a1e489660f7156b95e562667393d3c) that would not run correctly in my malware sandbox. I soon found the cause of the problem: the file type had been detected as a generic XML file, rather than what it really is: a Microsoft Word document.

Modern Microsoft Office files (.docx, .xlsx, and .pptx) are XML documents inside a ZIP archive. The OS knows to open these files as Office files instead of ZIPs based on the detected file type. On Windows systems, the file type is determined by the file extension. On UNIX, Linux, and Mac, file type detection is based on magic, literally. Magic strings are signatures: specific sequences of bytes or characters that can be used to identify a file. A common software library for file type detection is libmagic, which powers the file command. You can see it in action by using the file command on a Linux system. For example, the file command generates this output when run against a .docx file:

$ file hello-world.docx
hello-world.docx: Microsoft Word 2007+
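
To make the idea of magic concrete, here is a minimal Python sketch of signature-based detection. The function name sniff_type and the three-entry signature table are my own illustration, not how libmagic is implemented; the real magic database is far larger and looks much deeper into a file (which is how it can tell a .docx from an ordinary ZIP).

```python
# Tiny, illustrative signature table. Each entry is (leading bytes, label).
# This is an assumption-laden simplification of what libmagic does.
SIGNATURES = [
    (b"PK\x03\x04", "ZIP archive"),                               # container used by .docx/.xlsx/.pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),  # legacy binary .doc/.xls
    (b"<?xml", "XML document"),
]

UTF8_BOM = b"\xef\xbb\xbf"


def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading bytes only, ignoring the filename."""
    # Skip a UTF-8 byte order mark if one precedes the content.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(sniff_type(b"PK\x03\x04rest-of-archive"))
    print(sniff_type(b'<?xml version="1.0"?><w:wordDocument/>'))
```

Note that the filename and extension never enter into the decision, which is exactly the property the attackers take advantage of.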

But the attackers figured out that if you save the Office document as an Office 2003 XML file and rename it with a Microsoft Word file extension, such as .doc, it will still open as a Word document on Windows systems. Unix systems using file magic, on the other hand, consider it to be a plain XML file, because its plain-text content starts with <?xml.

Update: in0d3 pointed out that these are actually Office 2003 XML files, not extracted OOXML Office 2007+ files like I initially thought.

$ file Untitled_attachment_20190123.doc
Untitled_attachment_20190123.doc: XML 1.0 document, ASCII text, with very long lines, with CRLF line terminators

This gives the dropper an advantage: many email gateways and security appliances (including my sandbox) will treat the file as a plain XML file, and not treat it with much suspicion, while the Windows system will happily open it in Word.
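
One way a scanner could avoid being fooled is to look a little past the XML declaration. The sketch below assumes that Office 2003 XML files carry an mso-application processing instruction (for example progid="Word.Document") near the top of the file. That detail is my assumption about the format, not something shown in the file output above, and the function name office_xml_progid is hypothetical.

```python
import re

# Assumption: Office 2003 XML files declare their host application with a
# processing instruction such as <?mso-application progid="Word.Document"?>.
MSO_PI = re.compile(
    rb'<\?mso-application\s+progid\s*=\s*"([^"]+)"\s*\?>',
    re.IGNORECASE,
)


def office_xml_progid(data: bytes, window: int = 4096):
    """Return the declared Office progid for an XML file, or None.

    Only the first `window` bytes are inspected, on the assumption that the
    processing instruction sits right after the XML declaration.
    """
    # Drop a UTF-8 BOM and leading whitespace before checking the declaration.
    head = data[:window].lstrip(b"\xef\xbb\xbf \t\r\n")
    if not head.startswith(b"<?xml"):
        return None
    match = MSO_PI.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace")


if __name__ == "__main__":
    sample = (
        b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n'
        b'<?mso-application progid="Word.Document"?>\r\n'
        b"<w:wordDocument/>"
    )
    print(office_xml_progid(sample))
```

A gateway that gets a non-None result here could route the attachment to its Office analysis path instead of treating it as generic XML.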

Fortunately, a Microsoft Office document stored as raw XML that also contains a macro is not a normal occurrence, so it is very easy to detect with Yara.

Let’s take a look at the content of one of these files:

A screenshot of a malicious Microsoft Word document as a raw XML file

This file is not easy on the eyes

Here’s what the same file looks like after running an XML/HTML beautifier, which uses whitespace to make it more human readable:

A screenshot of a malicious Microsoft Word document as a raw XML file after being run through an XML/HTML beautifier
Much better

From here, we can see the <?xml declaration and the macrosPresent="yes" flag, both of which will come in handy when writing a Yara rule.

Scrolling down past the Word document boilerplate, we can see a chunk of data encoded in base64, enclosed in binData tags. That is the obfuscated macro content.

A screenshot of base64 encoded binary content in a Microsoft Word XML document
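
For triage, the base64 blobs can be pulled out and decoded with a few lines of Python. This is a rough sketch: the function name extract_bin_data is hypothetical, the regular expression assumes the tag may carry a namespace prefix such as w:, and it only recovers the raw decoded bytes. It does not unpack or deobfuscate the macro itself.

```python
import base64
import binascii
import re

# Match <binData ...>...</binData>, with or without a namespace prefix
# (the optional prefix is an assumption about how the tag is written).
BIN_DATA = re.compile(
    rb"<(?:\w+:)?binData\b[^>]*>(.*?)</(?:\w+:)?binData>",
    re.DOTALL | re.IGNORECASE,
)


def extract_bin_data(xml_bytes: bytes) -> list:
    """Return the decoded payload of every binData element that is valid base64."""
    blobs = []
    for match in BIN_DATA.finditer(xml_bytes):
        # Base64 in these files is wrapped across lines; drop all whitespace.
        payload = b"".join(match.group(1).split())
        try:
            blobs.append(base64.b64decode(payload, validate=True))
        except (binascii.Error, ValueError):
            # Skip anything that is not well-formed base64.
            continue
    return blobs


if __name__ == "__main__":
    encoded = base64.b64encode(b"hello macro")
    sample = b'<w:binData w:name="example">' + encoded + b"</w:binData>"
    print(extract_bin_data(sample))
```

The decoded blobs can then be hashed or handed to whatever macro analysis tooling you already use.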

Here’s a Yara rule that looks for:

  1. <?xml (it’s an XML file)
  2. macrosPresent="yes" (the flag required to have macros in a Microsoft Office XML document)
  3. binData (the XML tag that encloses arbitrary base64 encoded data)

rule obfuscated_office_macro_xml : TLPWHITE
{
    meta:
        date = "2019-01-25"
        author = "Sean Whalen - @SeanTheGeek"
        description = "Detects obfuscated macros in uncompressed Microsoft Office documents, as seen in a January 2019 Emotet dropper campaign"
        sample_sha256 = "0b9ccb04553ba5f1ce784630ef9b2c478ed13a96e89c65dcd9c94205c235ea12 eff6619aee017ee5d04c539ff12c63a199a1e489660f7156b95e562667393d3c"
        reference = "https://seanthegeek.net/598/emotet-malspam-campaign-exploits-reliance-on-magic-for-file-type-detection/"

    strings:
        $xml = "<?xml" ascii wide fullword nocase
        $macros_flag = "macrosPresent=\"yes\"" ascii wide fullword nocase
        $binData = "binData" ascii wide fullword nocase

    condition:
        all of them
}
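
If Yara is not available where you need the check (inside a mail filter script, for example), the same three conditions can be approximated in plain Python. This is only a sketch with a hypothetical function name: it mirrors the ascii, wide, and nocase modifiers but does not reproduce fullword matching, so it is slightly looser than the rule.

```python
# The three strings the rule looks for.
NEEDLES = (b"<?xml", b'macrosPresent="yes"', b"binData")


def _variants(needle: bytes):
    """Return the lowercase 'ascii' and 'wide' forms of a needle.

    The wide form interleaves a NUL byte after each character, which is how
    ASCII text appears when encoded as UTF-16LE.
    """
    wide = b"".join(bytes((byte, 0)) for byte in needle)
    return needle.lower(), wide.lower()


def looks_like_macro_xml(data: bytes) -> bool:
    """True if every needle appears in either form, ignoring case."""
    haystack = data.lower()  # bytes.lower() only touches ASCII letters
    return all(
        any(variant in haystack for variant in _variants(needle))
        for needle in NEEDLES
    )


if __name__ == "__main__":
    sample = (
        b'<?xml version="1.0"?><w:wordDocument w:macrosPresent="yes">'
        b"<w:binData>AAAA</w:binData></w:wordDocument>"
    )
    print(looks_like_macro_xml(sample))
    print(looks_like_macro_xml(b'<?xml version="1.0"?><root/>'))
```

For anything beyond a quick pre-filter, running the actual rule with Yara is the better choice, since it handles large files and the fullword semantics properly.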

As a member of the Yara Exchange, I get access to VirusTotal Enterprise, which would otherwise be a cost-prohibitive subscription, in exchange for sharing Yara rules with the Exchange members. VT Enterprise includes a feature called Retrohunt, which lets you run your Yara rules against all samples uploaded to VirusTotal in the last six months.

A screenshot of Retrohunt results
4583 matches!

Here’s the full list of Emotet dropper SHA256 hashes that matched my Yara rule on retrohunt:

Emotet mitigations
