Reverse-Engineering checklist

This page is a work in progress

The idea of this page is to provide a decision tree of steps to follow when reverse-engineering a new malware sample. I won’t go into details about each step, but I’ll try to provide some useful links, references, and the high-level approaches that are possible.

I update this page once in a while, usually when I analyse a sample.

Anyway, let’s start. First, →check the file type.


Check file type

You need to know the type of the file you’re dealing with. This can be done manually, or automated with yara. The best, and simplest, way to do this is to run the file command from the command line. Then inspect the result:

  • PE32 executable (GUI) Intel 80386, for MS Windows, Nullsoft Installer self-extracting archive, 5 sections - →NSIS

Otherwise check →other filetypes
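If file is not at hand, the manual check can be roughly sketched in Python. This is only a heuristic: the byte markers below (the NSIS "NullsoftInst" header string, the Pyinstaller archive cookie, the Go build info magic, and the mscoree.dll import name) are my assumptions about what is worth looking for. The function name guess_section is made up for this example, and a ZIP or OLE2 container is not necessarily an Office file.

```python
from pathlib import Path

# Heuristic byte markers, checked in order. They are assumptions for this
# sketch, not a replacement for `file` or a proper yara rule.
MARKERS = [
    ("NSIS", b"NullsoftInst"),                    # NSIS first-header signature text
    ("Pyinstaller", b"MEI\x0c\x0b\x0a\x0b\x0e"),  # Pyinstaller archive cookie
    ("golang", b"\xff Go buildinf:"),             # Go build info magic
    ("Dotnet", b"mscoree.dll"),                   # import seen in many .NET executables
]

ZIP_MAGIC = b"PK\x03\x04"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def guess_section(data: bytes) -> str:
    """Return the checklist section that most likely applies to the raw bytes."""
    if data.startswith(ZIP_MAGIC) or data.startswith(OLE2_MAGIC):
        # OOXML documents are ZIP files, legacy documents are OLE2 containers.
        return "Office file"
    for section, marker in MARKERS:
        if marker in data:
            return section
    return "Other filetypes"


def guess_section_for_path(path: str) -> str:
    """Convenience wrapper that reads the whole file from disk."""
    return guess_section(Path(path).read_bytes())
```

Calling guess_section_for_path("malware.exe") returns the name of the section to jump to, or "Other filetypes" when none of the markers match.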


Other filetypes

(TODO: migrate most of these to file)

For other types, you can try exe_kind.yar, which detects some of them (later I will write more ideas about how to detect the file type).

yara exe_kind.yar .

Depending on the file type:


NSIS

NSIS means Nullsoft Scriptable Install System. It supports self-extracting archives. The archives contain embedded files and an NSIS script. Unfortunately I don’t know how to extract the embedded script on Linux. Sometimes you can get away with just extracting the files:

7z x malware.exe

But this won’t give you a decompiled script. If you need it, there are several Windows-only options, see https://nsis.sourceforge.io/Can_I_decompile_an_existing_installer%3F. I use 7z-build-nsis, in the version included in FlareVM. To extract the archive, just open it in 7z-nsis (the regular build won’t work) and extract the files somewhere.

There is also a Ghidra NSIS extension, but I never got it to work on a malware sample.


Pyinstaller

In general, reverse-engineering of Python programs consists of two steps:

  • dumping the bytecode
  • analysing the bytecode

In the case of Pyinstaller, the situation is usually simple because there are high-quality off-the-shelf tools. I recommend pyinstxtractor-ng; an older alternative is pyinstxtractor.

python3 ~/opt/pyinstxtractor.py malware.exe

If you managed to get the bytecode, continue from →Python Bytecode.

Otherwise, your best bet is dynamic analysis: the plain bytecode eventually lands in memory, and all binary files are saved to a temporary location too.


Python Bytecode

You need to either disassemble or decompile the code. Try, in order:

  • pycdc - the current state of the art in Python decompilers (doesn’t always work).
  • pyc disassembly - disassemble the bytecode and analyse manually

pycdc

The best current decompiler is pycdc. If you’re lucky it’s packaged in your distro, otherwise you need to clone the repository and compile it yourself.

nix-shell -p pycdc
pycdc _extracted_malware/malware.pyc > malware.py

If it worked, you now have a decompiled Python script that should be a breeze to analyse: →success.

If it didn’t work, you’ll have to read →pyc disassembly.


pyc disassembly

In the worst case, you can always read the disassembled bytecode directly. Your options are:

  • dis - built-in Python bytecode disassembler
  • xdis - pure Python library for disassembling bytecode

dis

Surprisingly, even though Python includes a dis module, I’m not aware of any built-in script to actually disassemble Python bytecode. So you need to save this somewhere:

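import dis, sys, marshal

# Usage: python3 view_pyc_file.py malware.pyc
path = sys.argv[1]
with open(path, "rb") as f:
    # Skip the 16-byte .pyc header (Python 3.7+; older versions use 8 or 12 bytes).
    f.seek(16)
    # Unmarshal the top-level code object and disassemble it.
    dis.dis(marshal.load(f))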

Run it like this:

nix-shell -p python313  # You need a correct python version!
python3 view_pyc_file.py malware.pyc > malware.pybc

The caveat is that this depends on the Python version, so you need to disassemble with the same Python version that the file was compiled with.
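A version mismatch can be spotted before running the disassembler. The sketch below compares the first four bytes of the .pyc file with the magic number of the running interpreter (importlib.util.MAGIC_NUMBER). The function names are made up for this example, and it only tells you whether the versions match, not which Python you need.

```python
import importlib.util
import struct


def pyc_magic(header: bytes) -> int:
    """Return the numeric magic stored in the first two bytes of a .pyc header."""
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError("this does not look like a .pyc header")
    return struct.unpack("<H", header[:2])[0]


def matches_running_python(header: bytes) -> bool:
    """True if the header carries the same magic as the running interpreter."""
    return header[:4] == importlib.util.MAGIC_NUMBER


def check_file(path: str) -> bool:
    """Read the header from disk and report whether dis can be trusted here."""
    with open(path, "rb") as f:
        header = f.read(4)
    print(f"magic in file: {pyc_magic(header)}, "
          f"interpreter: {pyc_magic(importlib.util.MAGIC_NUMBER)}")
    return matches_running_python(header)
```

If check_file("malware.pyc") returns False, switch to a different Python version before trusting the output of dis.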

If you’re lucky and it worked, go →read the bytecode. Otherwise, try →xdis.


xdis

You may also consider xdis, which is a pure Python library for disassembling Python bytecode. It is independent of the interpreter version, so you don’t have to worry about installing a correct Python.

TODO example

This should work. If it did, go →read the bytecode. Otherwise, it’s possible you’re dealing with obfuscated bytecode, or even worse - a custom CPython build. I don’t have anything written up about that yet: →bad end.


Read the bytecode

Any text editor will do. I just want to mention vscode-python-bytecode-highlight, my vscode extension for python bytecode syntax highlighting. It colours the bytecode nicely and makes some links clickable - but that’s it. You can use any other editor.

That’s hopefully enough to →finish.


Delphi

First, load the binary to IDR (https://github.com/crypto2011/IDR) in a Windows VM. This may take a long time.

Then export as a .IDC script.

Then use https://github.com/huettenhain/dhrake and DhrakeInit.

That’s all I have →for now.


Dotnet

This means that the executable is a .NET assembly. Now, depending on how obfuscated the sample is:

  • Use →ilspycmd for quick analysis and triage.
  • Use →dnSpy for serious reverse-engineering.
  • Use →dnLib for automating the deobfuscation process.
  • Use →dotnet sdk to quickly unpack malware by reusing the unpacker’s code.
  • Use →Visual Studio if dotnet-sdk doesn’t work.
  • Check out →dbglib for obfuscated samples that elude dnSpy debugger.

ilspycmd

For lightweight analysis, I recommend ilspycmd.

sudo docker run -v .:/docker --rm -ti berdav/ilspycmd -c "cd /docker; /home/ilspy/.dotnet/tools/ilspycmd -p malware.exe -o out_dir"

This will decompile malware.exe to out_dir. After that you can easily open the decompiled code in your favorite editor, or use standard Linux commands (like grep) to analyse it.

The downside is that this analysis is fully static: it’s not possible to debug the code, and there is no easy way to deobfuscate anything. In some cases ilspycmd will flat out refuse to decompile some of the code. In that case, you may have to turn to →dnSpy.
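Once the decompiled sources are on disk, the grep-style triage can also be scripted. Here is a minimal sketch that collects URL-looking strings from the .cs files in out_dir. The regular expression is deliberately naive and the function names are made up for this example.

```python
import re
from pathlib import Path

# Naive pattern: anything that starts with http:// or https:// and runs
# until whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_urls(text: str) -> list[str]:
    """Return the unique URL-looking strings in a piece of source code, sorted."""
    return sorted(set(URL_RE.findall(text)))


def scan_dir(out_dir: str) -> dict[str, list[str]]:
    """Map each decompiled .cs file under out_dir to the URLs found in it."""
    hits = {}
    for path in Path(out_dir).rglob("*.cs"):
        urls = find_urls(path.read_text(errors="replace"))
        if urls:
            hits[str(path)] = urls
    return hits
```

Running scan_dir("out_dir") gives a quick first list of candidate C2 addresses, provided the strings are not obfuscated.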


dnSpy

For heavyweight analysis, I recommend https://github.com/dnSpyEx/dnSpy. It’s a GUI, but it’s very powerful and contains a built-in debugger. Unfortunately, it’s Windows-only, and I prefer to stay in Linux as much as possible for reverse-engineering.


dnLib

dnSpy is based on dnLib, which is a .NET library for when you really need to get your hands dirty and start scripting the analysis - see for example this XWorm analysis (the idea is clear, unfortunately the full framework was not open-sourced).

Since I mostly script in Python, I use pythonnet, a crazy library that brings .NET to Python. This allows me to use dnlib like this:

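import clr  # pythonnet; assumes dnlib.dll can be found, e.g. in the working directory
clr.AddReference("dnlib")
from dnlib.DotNet.Emit import OpCodes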

def get_string_default_values(typeobj):
    """Get all variables initialised to a string as a dict.
    Ignore other initialisation code.
    typeobj is TypeDefMD from dnlib.DotNet."""
    static_ctor = typeobj.FindStaticConstructor()
    if not static_ctor:
        return {}

    result = {}
    code = static_ctor.Body.Instructions
    for i in range(len(code) - 1):
        if code[i].OpCode == OpCodes.Ldstr:
            if code[i+1].OpCode == OpCodes.Stsfld:
                fieldname = code[i+1].Operand.Name.String
                fieldvalue = code[i].Operand
                result[fieldname] = fieldvalue
    return result

This code snippet extracts all string initialisations from a static constructor. It’s an extremely useful piece of code when writing automatic extractors for .NET stealers (of which there are plenty).


dotnet sdk

A powerful unpacking method is code reuse. For example, let’s say the decompiled malware contains this line:

GCP gCP = (GCP)Marshal.GetDelegateForFunctionPointer(GetProcAddress(hModule, Reverse(Decipher("zzljvyWaulyybJalN", 7))), typeof(GCP));

Instead of reverse-engineering Decipher, you can just reuse the code in your own small tool:

nix-shell -p dotnet-sdk
dotnet new console
vim Program.cs

And put this code in Program.cs:

internal class Program
{
	// Reverse and Decipher methods, copied from decompiled source code.

	private static void Main(string[] args)
	{
		Console.WriteLine(Reverse(Decipher("jvssHzsM", 7)));
	}
}

And just run it:

dotnet run

It goes without saying that you should be careful when running malware code - I recommend doing this in a Docker container or a Virtual Machine.
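When the reused routine turns out to be trivial, porting it to Python is a safe alternative, because no malware code runs at all. The real Decipher body is not shown on this page, so the sketch below is a guess: it assumes Decipher is a Caesar shift that moves every letter back by the key, and that Reverse simply reverses the string. Under that assumption the first example string decodes to a plausible WinAPI name.

```python
import string


def decipher(text: str, key: int) -> str:
    """Hypothetical port: shift every ASCII letter back by `key` positions."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") - key) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - key) % 26 + ord("A")))
        else:
            out.append(ch)  # leave digits and punctuation untouched
    return "".join(out)


def reverse(text: str) -> str:
    """Counterpart of the Reverse helper: plain string reversal."""
    return text[::-1]


# Under the Caesar-shift assumption these print GetCurrentProcess and FlsAlloc.
print(reverse(decipher("zzljvyWaulyybJalN", 7)))
print(reverse(decipher("jvssHzsM", 7)))
```

If the output of a port like this looks like garbage, the guess about the algorithm was wrong, and reusing the original C# code is the more reliable route.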

Visual Studio

dbglib

Honourable mention goes to dbglib, which is my library for native low-level .NET debugging. See this DotRunPeX analysis for a tutorial on how to use it.


golang

Well, you’re in for an adventure. Golang is not the easiest language to reverse-engineer. By default the decompiled binary looks like trash, but with the help of a few tools it’ll get slightly better.

My main decompiler/disassembler is Ghidra. Most of the hints should transfer to other tools, but this guide is opinionated and I’m not going to cover them.

First, load the executable to Ghidra. Ghidra recently gained basic Golang support, so remember to select the x86:LE:32:default:golang language instead of the default choice.

The support is not great right now, though. To get the symbols, we will use GoReSym:

nix-shell -p goresym
GoReSym -t -d -p malware.exe > goresym.json
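Before loading the symbols anywhere, it can help to take a quick look at goresym.json. The sketch below assumes the JSON contains UserFunctions and StdFunctions lists whose entries have Start and FullName keys, which is what I expect GoReSym output to look like. Verify this against your own file, because the layout may differ between releases. The function name load_functions is made up for this example.

```python
import json


def load_functions(goresym_json: str) -> list[tuple[int, str]]:
    """Return (start address, full name) pairs from GoReSym JSON, sorted by address.

    Assumes the "UserFunctions"/"StdFunctions" layout described in the text above.
    """
    data = json.loads(goresym_json)
    funcs = []
    for key in ("UserFunctions", "StdFunctions"):
        # A list may be missing or null, so fall back to an empty one.
        for fn in data.get(key) or []:
            funcs.append((fn["Start"], fn["FullName"]))
    return sorted(funcs)


def format_functions(funcs: list[tuple[int, str]]) -> str:
    """Render the pairs as 'address name' lines, handy for grep or diff."""
    return "\n".join(f"{start:#x} {name}" for start, name in funcs)
```

For example, print(format_functions(load_functions(open("goresym.json").read()))) prints one function per line.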

Then, load the symbols to Ghidra using a helper script (goresym.py) (TODO: modified cerberus, upload and link).

For malware, most likely the binary is obfuscated and you will still see junk function names.

But that’s still a step up compared to what you get by default.

After that, you can try ghostrings to recover strings from the binary.

Install the extension, and run several of the included scripts:

  • GoDynamicStrings.java
  • GoStaticStrings.java
  • GoKnownStrings.java
  • GoFuncCallStrings.java

Some of the scripts will take a while. Go make a coffee (you will need it).

After that, you should have at least strings defined, and the methods named. Reverse engineering will still be a pain, but at least you have some context.

That’s all I have →for now.


Office file

Office files include:

  • files with .docx, .xlsx, .pptx extensions
  • files with .docm, .xlsm, .pptm extensions (with macros)

A good way to start analysis is to use oletools. Use oleid to check if there are any macros:

nix-shell -p python310Packages.oletools
oleid sample.xlsx

See also this writeup.

If you found macros, you can proceed with oletools - for example by using olevba.

nix-shell -p python310Packages.oletools
olevba -c sample.xlsx > source.vba

(This will put some additional headers in the extracted snippet, because there may be multiple VBA snippets in the file. Remove the header and footer manually.)

If you didn’t find any macros, you can start a more low-level analysis. First, extract the file with 7zip:

7z x malware.xlsx

This will extract the embedded files. Usually the interesting files are macros, but in this case the payload is in embeddings.

In this case, you can use oledir to dump the OLE container.

In this sample, there is a stream called OLe10nATive, and oledir even tells us that it’s related to a CVE.

You can extract this file with oletools, or use 7z again.
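Since .docx, .xlsx and .pptx files are ZIP archives, the step of looking for macros and embeddings can also be scripted with the standard zipfile module. This is a minimal sketch: the two checks (a member named vbaProject.bin, or anything under an embeddings directory) are my shortlist of where payloads tend to hide, not an exhaustive rule, and the function name is made up for this example.

```python
import io
import zipfile


def interesting_members(blob: bytes) -> list[str]:
    """List ZIP members that usually deserve a closer look in an OOXML file."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        names = zf.namelist()
    return [
        name
        for name in names
        # VBA macros live in vbaProject.bin, OLE objects under .../embeddings/.
        if name.lower().endswith("vbaproject.bin") or "/embeddings/" in name.lower()
    ]


def interesting_members_for_path(path: str) -> list[str]:
    """Convenience wrapper that reads the document from disk."""
    with open(path, "rb") as f:
        return interesting_members(f.read())
```

Calling interesting_members_for_path("malware.xlsx") returns the member names to extract and pass on to the OLE tools.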


Other executable type

TODO


Other unsorted notes

Check compiler (nauz)

git clone https://github.com/horsicq/Nauz-File-Detector.git
sudo docker build . -t nauz
sudo docker run -v /home/msm/data:/home/msm/data nauz nfdc /home/msm/data/2024-12-18_ov8865sys/9c52d750eba2f72bdd38bcaf950da7f1128d5235223091d98dad2cc7146716fa

Gx64Sync

ImHex

floss -j binary.exe --only stack tight decoded > floss.json


Dumps

malduck fixpe

Themida

Entrypoint starts with a call:

e8 82 01 00 00    CALL       FUN_141fdd237
41 52             PUSH       R10
49 89 e2          MOV        R10,RSP
41 52             PUSH       R10
49 8b 72 10       MOV        RSI,qword ptr [R10 + local_res8]
49 8b 7a 20       MOV        RDI,qword ptr [R10 + local_res18]
fc                CLD
b2 80             MOV        DL,0x80

And later there is a tree of “if” statements:

  if (bVar12) {
    bVar11 = CARRY1(bVar7,bVar7);
    bVar7 = bVar7 * '\x02';
    bVar12 = bVar11;
    if (bVar7 == 0) {
      bVar7 = *local_res8;
      local_res8 = local_res8 + 1;
      bVar12 = CARRY1(bVar7,bVar7) || CARRY1(bVar7 * '\x02',bVar11);
      bVar7 = bVar7 * '\x02' + bVar11;
    }
    if (bVar12) {
      bVar11 = CARRY1(bVar7,bVar7);
      bVar7 = bVar7 * '\x02';
      bVar12 = bVar11;
      if (bVar7 == 0) {
        bVar7 = *local_res8;
        local_res8 = local_res8 + 1;
        bVar12 = CARRY1(bVar7,bVar7) || CARRY1(bVar7 * '\x02',bVar11);
        bVar7 = bVar7 * '\x02' + bVar11;
      }

The called function looks like this in Ghidra:

void FUN_141fdd237(void) {
  puVar2 = (undefined8 *)&stack0xffffffffffffffd8;
  pcVar1 = unaff_retaddr + -0x1de20b5;
  pcStack_30 = unaff_retaddr;
  if (*(int *)(unaff_retaddr + -0xf9f66e) == 0) {
    uStack_38 = 0;
    uStack_48 = 0;
    pcStack_40 = pcVar1;
    pcStack_30 = pcVar1;
    (*unaff_retaddr)();
    puVar2 = &uStack_48;
    pcVar1 = unaff_retaddr + 0x1cf;
  }
  (*(pcVar1 + 0xe42a47))(*(undefined8 *)((longlong)puVar2 + 0x20),*(undefined8 *)((longlong)puVar2 + 0x18));
  return;
}

This is the first layer of the packer. To unpack dynamically, add a breakpoint at the ret 0x20 instruction in the entrypoint and then run the binary.

The second stage is much more interesting, but I don’t have a writeup for it yet.


Powershell

  • PSDecode

Success

Hopefully you arrived here by following the checklist and found a reverse-engineering method that works for you.

…unless there’s a second stage, then start from the beginning.


Bad end

I’m sorry, but it looks like this checklist is not able to help you for now.