msm's home

RULECOMPILE - Undocumented Ghidra decompiler rule language

Mon, 30 Dec 2024 00:00:00 +0000

My laptop only has one USB port.

I understand that the times are changing. Wireless devices are more and more popular, and Wi-Fi and Bluetooth are taking over from those pesky cables and bulky physical connectors. We’re entering a new cable-less era.

This doesn’t change the fact that my mouse and my keyboard connect over USB. That’s two devices. And I want both.

So of course I’ve decided to write a Bluetooth keyboard proxy using my Raspberry Pi as a gateway. I couldn’t find any working tutorials, and after a night of cursing at the screen, I’ve decided to share my findings with future generations.

The code should work with any Bluetooth-enabled Linux device. I’ve tested it on a Raspberry Pi and on Ubuntu; non-Debian distributions may require slightly different steps.

D-Bus, how does it work?

D-Bus is an IPC framework designed to facilitate communication between multiple processes in a composable and extensible way. It’s a shared channel used by applications to communicate. For example, I can ask Spotify to skip to the next song using dbus-send:

$ dbus-send --print-reply --dest=org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Next

What just happened is that I’ve called the org.mpris.MediaPlayer2.Player.Next “method” on the /org/mpris/MediaPlayer2 object, using the org.mpris.MediaPlayer2.spotify connection.
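To make the three parts of that call explicit, here is a minimal Python sketch that only assembles the dbus-send command line. The helper names split_method and dbus_send_command are made up for this illustration; the flags are the ones used in the command above, plus --system for the system bus.

```python
import shlex


def split_method(full_name):
    """Split 'interface.Member' into (interface, member)."""
    interface, _, member = full_name.rpartition(".")
    return interface, member


def dbus_send_command(destination, object_path, method, system_bus=False):
    """Build the argv for a dbus-send method call (hypothetical helper)."""
    argv = ["dbus-send"]
    if system_bus:
        argv.append("--system")  # the session bus is the default
    argv += ["--print-reply", f"--dest={destination}", object_path, method]
    return argv


command = dbus_send_command(
    "org.mpris.MediaPlayer2.spotify",       # connection (bus name)
    "/org/mpris/MediaPlayer2",              # object path
    "org.mpris.MediaPlayer2.Player.Next",   # interface + method
)
print(shlex.join(command))
print(split_method("org.mpris.MediaPlayer2.Player.Next"))
```

The list can be passed straight to subprocess.run if you want to actually send the message.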

BlueZ is the Linux implementation of Bluetooth, and it has a D-Bus API ¹. It implements the org.freedesktop.DBus.Introspectable interface, so we can take a look at it via the CLI:

pi@tmicro:~ $ sudo gdbus introspect --system --dest org.bluez --object-path /org/bluez
node /org/bluez {
  interface org.freedesktop.DBus.Introspectable {
    methods:
      Introspect(out s xml);
    signals:
    properties:
  };
  interface org.bluez.AgentManager1 {
    methods:
      RegisterAgent(in  o agent,
                    in  s capability);
      UnregisterAgent(in  o agent);
      RequestDefaultAgent(in  o agent);
    signals:
    properties:
  };
  interface org.bluez.ProfileManager1 {
    methods:
      RegisterProfile(in  o profile,
                      in  s UUID,
                      in  a{sv} options);
      UnregisterProfile(in  o profile);
    signals:
    properties:
  };
  interface org.bluez.HealthManager1 {
    methods:
      CreateApplication(in  a{sv} config,
                        out o application);
      DestroyApplication(in  o application);
    signals:
    properties:
  };
  node hci0 {
  };
};

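We see four interfaces (the generic Introspectable one plus three BlueZ-specific ones) and one child node, hci0.

We can call methods on the interfaces, and the hci0 node is an object that we can introspect further with: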

sudo gdbus introspect --system --dest org.bluez --object-path /org/bluez/hci0
# (...)

Other useful debugging commands are:

sudo btmon (monitor Bluetooth activity on the device)
sudo busctl monitor [service] (monitor D-Bus messages, optionally only those of the selected service)

Keyboard service

To serve as a keyboard we only need to do two things: register our service and wait for connection.

Registration is done by calling the org.bluez.ProfileManager1.RegisterProfile method with appropriate parameters.

import dbus
from pathlib import Path

# UUID for HID service (1124)
# https://www.bluetooth.com/specifications/assigned-numbers/service-discovery
UUID = "00001124-0000-1000-8000-00805f9b34fb"
PROFILE_DBUS_PATH = "/bluez/msm/bluekeyboard"

# BlueZ lives on the system bus
bus = dbus.SystemBus()

print("Registering the profile...")
opts = {
    "Role": "server",
    "RequireAuthentication": False,
    "RequireAuthorization": False,
    "AutoConnect": True,
    # The SDP record that describes our "keyboard" to remote devices
    "ServiceRecord": (Path(__file__).parent / "service.xml").read_text(),
}
bluez = bus.get_object("org.bluez", "/org/bluez")
manager = dbus.Interface(bluez, "org.bluez.ProfileManager1")
manager.RegisterProfile(PROFILE_DBUS_PATH, UUID, opts)

Looks pretty straightforward, right? Wait, what is service.xml? Glad you asked (service.xml):

<?xml version="1.0" encoding="UTF-8" ?>

<record>
    <attribute id="0x0001">
        <sequence>
            <uuid value="0x1124" />
        </sequence>
    </attribute>
    <attribute id="0x0004">
        <sequence>
            <sequence>
                <uuid value="0x0100" />
                <uint16 value="0x0011" />
            </sequence>
            <sequence>
                <uuid value="0x0011" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x0005">
        <sequence>
            <uuid value="0x1002" />
        </sequence>
    </attribute>
    <attribute id="0x0006">
        <sequence>
            <uint16 value="0x656e" />
            <uint16 value="0x006a" />
            <uint16 value="0x0100" />
        </sequence>
    </attribute>
    <attribute id="0x0009">
        <sequence>
            <sequence>
                <uuid value="0x1124" />
                <uint16 value="0x0100" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x000d">
        <sequence>
            <sequence>
                <sequence>
                    <uuid value="0x0100" />
                    <uint16 value="0x0013" />
                </sequence>
                <sequence>
                    <uuid value="0x0011" />
                </sequence>
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x0100">
        <text value="Raspberry Pi Virtual Keyboard" />
    </attribute>
    <attribute id="0x0101">
        <text value="USB > BT Keyboard" />
    </attribute>
    <attribute id="0x0102">
        <text value="Raspberry Pi" />
    </attribute>
    <attribute id="0x0200">
        <uint16 value="0x0100" />
    </attribute>
    <attribute id="0x0201">
        <uint16 value="0x0111" />
    </attribute>
    <attribute id="0x0202">
        <uint8 value="0x40" />
    </attribute>
    <attribute id="0x0203">
        <uint8 value="0x00" />
    </attribute>
    <attribute id="0x0204">
        <boolean value="false" />
    </attribute>
    <attribute id="0x0205">
        <boolean value="false" />
    </attribute>
    <attribute id="0x0206">
        <sequence>
            <sequence>
                <uint8 value="0x22" />
                <text encoding="hex" value="05010906a101850175019508050719e029e715002501810295017508810395057501050819012905910295017503910395067508150026ff000507190029ff8100c0050c0901a1018503150025017501950b0a23020a21020ab10109b809b609cd09b509e209ea09e9093081029501750d8103c0" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x0207">
        <sequence>
            <sequence>
                <uint16 value="0x0409" />
                <uint16 value="0x0100" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x020b">
        <uint16 value="0x0100" />
    </attribute>
    <attribute id="0x020c">
        <uint16 value="0x0c80" />
    </attribute>
    <attribute id="0x020d">
        <boolean value="true" />
    </attribute>
    <attribute id="0x020e">
        <boolean value="false" />
    </attribute>
    <attribute id="0x020f">
        <uint16 value="0x0640" />
    </attribute>
    <attribute id="0x0210">
        <uint16 value="0x0320" />
    </attribute>
</record>

Ok, what is that? It’s an SDP record. SDP (Service Discovery Protocol) records describe characteristics of the device that may be used by remote devices. I won’t explain how it works: partly because it’s out of scope of this post, and partly because I have no idea myself. But it works.

The second part of the puzzle is waiting for a connection. Fortunately, Python 3 natively supports Bluetooth sockets, so no external dependencies are required:

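import socket

# L2CAP PSMs for HID, the same values that appear in service.xml
P_CTRL = 0x11  # HID control channel
P_INTR = 0x13  # HID interrupt channel

# `address` is the Bluetooth address of the local adapter to listen on
scontrol = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_SEQPACKET, socket.BTPROTO_L2CAP)
scontrol.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
scontrol.bind((address, P_CTRL))
scontrol.listen(1)

sinterrupt = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_SEQPACKET, socket.BTPROTO_L2CAP)
sinterrupt.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sinterrupt.bind((address, P_INTR))
sinterrupt.listen(1)

print("Waiting for connections...")

# accept() returns a new socket for the connected client
ccontrol, ctrl_info = scontrol.accept()
print(f"Connected on the control socket {ctrl_info[0]}")

cinterrupt, cinfo = sinterrupt.accept()
print(f"Connected on the interrupt channel {cinfo[0]}")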
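The two channel numbers the sockets listen on are not arbitrary: they have to agree with what the SDP record advertises. As a small sanity check, the sketch below pulls them out of the relevant part of service.xml (attributes 0x0004 and 0x000d, copied from the record above) using only the standard library. The function name l2cap_psms is made up for this example, and treating UUID 0x0100 as "L2CAP" is my reading of the record rather than something the record spells out.

```python
import xml.etree.ElementTree as ET

# Fragment of service.xml: only the two protocol descriptor lists
RECORD_FRAGMENT = """
<record>
    <attribute id="0x0004">
        <sequence>
            <sequence>
                <uuid value="0x0100" />
                <uint16 value="0x0011" />
            </sequence>
            <sequence>
                <uuid value="0x0011" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x000d">
        <sequence>
            <sequence>
                <sequence>
                    <uuid value="0x0100" />
                    <uint16 value="0x0013" />
                </sequence>
                <sequence>
                    <uuid value="0x0011" />
                </sequence>
            </sequence>
        </sequence>
    </attribute>
</record>
"""


def l2cap_psms(record_xml):
    """Map attribute id -> channel number found next to UUID 0x0100."""
    root = ET.fromstring(record_xml)
    psms = {}
    for attribute in root.findall("attribute"):
        for seq in attribute.iter("sequence"):
            uuid = seq.find("uuid")      # direct children only
            psm = seq.find("uint16")
            if uuid is not None and psm is not None and uuid.get("value") == "0x0100":
                psms[attribute.get("id")] = int(psm.get("value"), 16)
    return psms


print(l2cap_psms(RECORD_FRAGMENT))
```

If you ever change the ports in the server, this is the part of the record that has to change with them.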

Scan codes

Only one thing left to code - we want to send keystrokes to the connected device. This is done by sending a command packet to the socket. The format is:

[0xA1, 0x01, modifier, 0, key0, key1, key2, key3, key4, key5]

But remember that you always have to notify the remote that the keys were released, by zeroing out the keys in the next packet. So let’s implement that:

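import time

def send_char(char, cinterrupt):
    keycode, shift = char_to_keycode(char)
    # Modifier byte: bits 0-7 map to usages 0xE0-0xE7, so bit 1 is Left Shift
    modkey = (1 << 1) if shift else 0
    # Key press
    cinterrupt.send(bytes([0xA1, 1, modkey, 0, keycode, 0, 0, 0, 0, 0]))
    time.sleep(0.01)
    # Key release: same report with everything zeroed out
    cinterrupt.send(bytes([0xA1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
    time.sleep(0.01)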
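The packet layout is easy to get wrong by one byte, so here is a small sketch of a helper that builds it. build_report is a name I made up for this example; the layout is exactly the one listed above (0xA1, report ID 1, modifier byte, a zero byte, then six key slots).

```python
def build_report(keycodes=(), modifiers=0):
    """Build a 10-byte keyboard input report for the interrupt channel."""
    if len(keycodes) > 6:
        raise ValueError("a report has only six key slots")
    # Pad unused key slots with zeros
    keys = list(keycodes) + [0] * (6 - len(keycodes))
    return bytes([0xA1, 0x01, modifiers, 0x00, *keys])


press = build_report([30])   # key with code 30 held down
release = build_report()     # all keys released
print(press.hex(), release.hex())
```

Sending press followed by release is equivalent to the two send calls in send_char.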

The trick is that we’re sending keycodes, not chars. You can observe keycodes with many utilities, for example, xev (but note that the values below are HID usage IDs, which are numbered differently from the X11 keycodes that xev prints). I couldn’t find an easy way to convert a char to a keycode in Python, so I went the easy way and just hardcoded the ones I needed (keycodes.py):

def char_to_keycode(char):
    keymap = {
        "1": (30, False),
        "2": (31, False),
        "3": (32, False),
        # ...
        "!": (30, True),
    }
    return keymap[char]
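Typing the whole table by hand gets boring quickly. Letters and digits occupy contiguous ranges in the HID keyboard usage table, so most of the map can be generated. This is a sketch under the assumption of a US layout on the receiving side; build_keymap is a hypothetical helper, not part of the original keycodes.py.

```python
import string


def build_keymap():
    """Generate (keycode, shift) pairs for letters, digits and a few extras."""
    keymap = {}
    # 'a'..'z' are usage IDs 4..29; uppercase is the same key plus shift
    for offset, letter in enumerate(string.ascii_lowercase):
        keymap[letter] = (4 + offset, False)
        keymap[letter.upper()] = (4 + offset, True)
    # '1'..'9' are 30..38, and '0' comes after them at 39
    for offset, digit in enumerate("123456789"):
        keymap[digit] = (30 + offset, False)
    keymap["0"] = (39, False)
    keymap["\n"] = (40, False)  # Enter
    keymap[" "] = (44, False)   # Space
    keymap["!"] = (30, True)    # shift + '1' on a US layout
    return keymap


KEYMAP = build_keymap()
print(KEYMAP["a"], KEYMAP["A"], KEYMAP["1"])
```

Punctuation still has to be added by hand, because it depends on the keyboard layout configured on the remote machine.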

For the demo we’ll just read user input in a loop and send it char by char to the remote:

while True:
    text = input()
    for c in text + "\n":
        send_char(c, cinterrupt)

That’s all! Put all the pieces together in bluetooth_server.py.

Connect the victim

We’re almost done! Now we must disable the input plugin in bluetoothd, otherwise the keyboard code will not work:

sudo vim /etc/systemd/system/bluetooth.target.wants/bluetooth.service

Add the -P input parameter:

9c9
< ExecStart=/usr/lib/bluetooth/bluetoothd
---
> ExecStart=/usr/lib/bluetooth/bluetoothd -P input

And restart the service:

sudo systemctl daemon-reload
sudo systemctl restart bluetooth

Make sure that the service runs on your machine and has the expected parameter:

$ ps aux | grep bluetoothd
root  230  0.0  0.0  8628  5108 ? Ss   02:08   0:00 /usr/lib/bluetooth/bluetoothd -P

Time to start our program:

$ sudo python3 bluetooth_server.py
Registering the profile...
Waiting for connections...

Now let’s connect the victim to your new “keyboard”. Start the agent with bluetoothctl:

$ sudo bluetoothctl
Agent registered
[bluetooth]# power on
Changing power on succeeded
[bluetooth]# discoverable on
Changing discoverable on succeeded
[CHG] Controller 00:21:5C:B0:89:56 Discoverable: yes
[bluetooth]# default-agent 
Default agent request successful

Your machine should now be discoverable. You’ll need to confirm the PIN in the terminal (and optionally authorize some services):

[NEW] Device 23:1D:C1:F4:10:1D Z2K21E1
Request confirmation
[agent] Confirm passkey 216956 (yes/no): yes

If everything went right, the remote machine should now connect to your “keyboard” and you can send your keystrokes. And without any USB cables. The future is now.

Closing thoughts

The (second) real reason for this post was that I’ve wanted to play with dbus and bluetooth for a long time. Now I had a good reason to do both.

My day job and main interest is security, so obviously I immediately thought how to abuse this. And I’m not impressed - I can easily masquerade my keyboard-wannabe Raspberry Pi as an audio player device. It’s easy to imagine an attack scenario where someone pairs with innocent looking speakers, only to be hacked by injected keystrokes. Nevertheless, I’m glad the USB flaws are not going away, and we can port Rubber Ducky to Bluetooth.

All the code for this post is on GitHub: https://github.com/msm-code/RandomCodes/tree/master/bluetooth-keyboard.

¹ In fact, the API changed completely a few years ago, and that’s the reason why old code doesn’t work anymore.

RULECOMPILE - Undocumented Ghidra decompiler rule language

Mon, 30 Dec 2024 00:00:00 +0000

Or “How I got annoyed by a poor decompilation so I unearthed a hidden Ghidra feature”

TLDR: there is an (undocumented and disabled by default) feature in the Ghidra decompiler that lets you create your own decompiler passes, using a custom DSL. I leverage it to write a deobfuscation rule for a simple obfuscation technique.

Story Setup - introduction and problem statement
Decompiler 101 - building and using Ghidra decompiler directly
RULECOMPILE - a curious #define flag from the decompiler source
A forgotten language of dragons - reverse-engineering a forgotten code pattern matching DSL
How to train your dragon - how to write a rule that is actually useful
Conclusion - parting thoughts

Story setup

It all started with this one missed deobfuscation:

Do you know what this function does? Take a few seconds to think about it, if you want.

…

Yes, this is just a simple increment. Consider the lowest bit of param_1:

If param_1 & 1 is 0, then (param_1 & 1) * 2 is zero too, and param_1 ^ 1 is just param_1 + 1.
If param_1 & 1 is 1, then (param_1 & 1) * 2 equals 2, but param_1 ^ 1 is now param_1 - 1, so the result is param_1 + 1 again.
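If you don’t trust the case analysis, it’s easy to check by brute force. The sketch below models 32-bit unsigned arithmetic with a mask (Python integers don’t overflow on their own) and compares the obfuscated expression with a plain increment on a range of values, including the wrap-around cases. The function names are made up for this example.

```python
MASK = 0xFFFFFFFF  # emulate uint4 (32-bit) wrap-around


def obfuscated_increment(x):
    """The expression from the decompilation, truncated to 32 bits."""
    return ((x & 1) * 2 + (x ^ 1)) & MASK


def plain_increment(x):
    return (x + 1) & MASK


samples = list(range(1024)) + [0x7FFFFFFF, 0x80000000, MASK - 1, MASK]
all_equal = all(obfuscated_increment(x) == plain_increment(x) for x in samples)
print(all_equal)
```

This is a sampled check, not a proof; the proof is the two-case argument above.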

What can we do to clean this up? There are several possible courses of action. Let’s consider them one by one:

Ignore the problem

We can ignore the problem entirely and live with it - the pattern is easy to spot, and we already recognise it. This is probably the sanest option. As you may guess, I didn’t pursue it, though maybe I should.

Fix it with a script

Maybe we can write a Ghidra script that will fix the decompilation?¹ Well, no. Unfortunately, Ghidra is not very flexible when one wants to influence the decompiler. The decompiler is almost a black box that we can just nudge in the right direction. Almost all of the analysis happens in there:

Raw bytes are disassembled and translated to Pcode using SLEIGH
Pcode is optimized and lifted to high Pcode²
High-level programming language structures - loops, conditionals, control flow - are recovered³
A tokenized version of decompiled source code is generated and sent back to Ghidra.

With few exceptions, none of these steps can be influenced by a script/extension/program annotations.

Patch the assembly code

This one is actually doable - we can patch the binary’s assembly code and replace that whole section with a simple increment. But that has a lot of downsides:

Our code is now architecture-dependent - even though the rule is very generic, and Ghidra works with dozens of architectures thanks to Pcode.
It may be very hard to match the obfuscation pattern - especially if the code we see is already a simplified version of the original, much more obfuscated code.
We have to patch the binary, which is invasive and may make the binary unrunnable.

In practice this is what I do when I have to, but I wish there was a better way.

Improve the decompiler and submit a pull request

In principle, this is the best option. Ghidra is open-source, so we can grab the source code, make our changes, and maybe submit a pull request.

The problem is that the decompiler is a complex piece of software, so that’s a non-trivial task. A large upside is that we can submit our changes to the official Ghidra repository as a pull request, so that everyone can benefit from them. Unfortunately, there aren’t many people who can review such a pull request, so they tend to wait a very long time to be merged. So, unless you’re very dedicated, there is no practical way to share your improvements with the community.

What if there was an easier way?

Decompiler 101

Let’s take a closer look at the decompiler. I haven’t gotten around to posting about it yet⁴, but long story short: there is an internal debugging tool, not built by default, that you can use to peek into the decompiler internals. It’s called decomp_dbg, and the best public sources of information about it right now are this nccgroup blog post and this GitHub issue.

I’ll update this section with a link to a more detailed post when/if I write it, but for now the point is that you can go to Ghidra/Features/Decompiler/src/decompile/cpp, run make decomp_dbg, and get a decomp_dbg binary. For this blog post, I will use the Ghidra 11.2.1 release (Ghidra_11.2.1_build tag in the git repository).

Let’s try it. First, download this obfuscated binary. You can check it in Ghidra - the main function is just return argc, but obfuscated with the technique I mentioned above. There is an option to “debug function decompilation” in Ghidra, which we can use to analyse the decompilation process:

But we won’t actually use it in this blog post⁵. So let’s run decomp_dbg on the binary directly:

$ set -x SLEIGHHOME ~/opt/ghidra  # export SLEIGHHOME=... for bash users
$ ./decomp_dbg
[decomp]> load file /home/you/xyz/obfuscated
[decomp]> load addr 0x0101129
Low-level ERROR: Unable to load 512 bytes at r0x00101129
Unable to proceed with function: func_0x00101129

Wait, what? This worked for me before. After a quick look at the source code and a lucky guess, another try:

$ set -x SLEIGHHOME ~/opt/ghidra  # export SLEIGHHOME=... for bash users
$ ./decomp_dbg
[decomp]> load file /home/you/xyz/obfuscated
[decomp]> adjust vma 0x100000
[decomp]> load addr 0x0101129
Function func_0x00101129: 0x00101129

Now we are free to decompile to our heart’s content:

[decomp]> decompile
Decompiling func_0x00101129
Decompilation complete
[decomp]> print C

int4 func_0x00101129(uint4 param_1)

{
  return (param_1 & 1) * 2 + (param_1 ^ 1);
}
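Typing the same four commands after every rebuild gets old fast, so here is a sketch of how the session could be scripted from Python. It assumes that decomp_dbg reads its commands from standard input and exits at end of input, which I haven’t verified for every build; build_session and run_session are names invented for this example, and the paths are placeholders.

```python
import os
import subprocess


def build_session(binary_path, func_addr, vma=None):
    """Return the decomp_dbg commands used above as one script."""
    commands = [f"load file {binary_path}"]
    if vma is not None:
        commands.append(f"adjust vma {vma:#x}")
    commands += [f"load addr {func_addr:#x}", "decompile", "print C"]
    return "\n".join(commands) + "\n"


def run_session(decomp_dbg, sleigh_home, script):
    """Feed the script to decomp_dbg on stdin (assumed to be supported)."""
    env = dict(os.environ, SLEIGHHOME=sleigh_home)
    result = subprocess.run(
        [decomp_dbg], input=script, env=env, capture_output=True, text=True
    )
    return result.stdout


script = build_session("/home/you/xyz/obfuscated", 0x101129, vma=0x100000)
print(script)
# print(run_session("./decomp_dbg", os.path.expanduser("~/opt/ghidra"), script))
```

The run_session call is commented out because it needs a locally built decomp_dbg.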

RULECOMPILE

This begs the question⁶, what other features are possible? The list is in the ifacedecomp.cc file. But wait, what is this?

  status->registerCom(new IfcLoadTestFile(), "load","test","file");
  status->registerCom(new IfcListTestCommands(), "list","test","commands");
  status->registerCom(new IfcExecuteTestCommand(), "execute","test","command");
#ifdef CPUI_RULECOMPILE
  status->registerCom(new IfcParseRule(),"parse","rule");
  status->registerCom(new IfcExperimentalRules(),"experimental","rules");
#endif
  status->registerCom(new IfcContinue(),"continue");

Two commands are fenced behind a feature flag - undocumented and not enabled by default. As of today, googling CPUI_RULECOMPILE returns only three results: two of them are source code from the official GitHub repository, and the last one is from a source code mirror.⁷

Let’s try to enable it! Just add the flag to the makefile and build:

$ git diff
diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/Makefile b/Ghidra/Features/Decompiler/src/decompile/cpp/Makefile
index ead17e0..3946e17 100755
--- a/Ghidra/Features/Decompiler/src/decompile/cpp/Makefile
+++ b/Ghidra/Features/Decompiler/src/decompile/cpp/Makefile
@@ -38,7 +38,7 @@ endif
 CXX=g++ -std=c++11

 # Debug flags
-DBG_CXXFLAGS=-g -Wall -Wno-sign-compare
+DBG_CXXFLAGS=-g -Wall -Wno-sign-compare -DCPUI_RULECOMPILE
 #DBG_CXXFLAGS=-g -pg -Wall -Wno-sign-compare
 #DBG_CXXFLAGS=-g -fprofile-arcs -ftest-coverage -Wall -Wno-sign-compare
$ make decomp_dbg -j 8

Annnnd it doesn’t work - we get tons of compiler errors:

architecture.cc: In member function ‘void ghidra::Architecture::decodeDynamicRule(ghidra::Decoder&)’:
architecture.cc:729:57: error: ‘el’ was not declared in this scope
  729 |   Rule *dynrule = RuleGeneric::build(rulename,groupname,el->getContent());
In copy constructor ‘ghidra::Address::Address(const ghidra::Address&)’,
    inlined from ‘ghidra::rangemap<ghidra::ScopeMapper>::AddrRange::AddrRange(ghidra::rangemap<ghidra::ScopeMapper>::AddrRange&&)’ at rangemap.hh:76:9,
    inlined from ‘void std::__new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = ghidra::rangemap<ghidra::ScopeMapper>::AddrRange; _Args = {ghidra::rangemap<ghidra::ScopeMapper>::AddrRange}; _Tp = std::_Rb_tree_node<ghidra::rangemap<ghidra::ScopeMapper>::AddrRange>]’ at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/new_allocator.h:191:4,
    inlined from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(allocator_type&, _Up*, _Args&& ...) [with _Up = ghidra::rangemap<ghidra::ScopeMapper>::AddrRange; _Args = {ghidra::rangemap<ghidra::ScopeMapper>::AddrRange}; _Tp = std::_Rb_tree_node<ghidra::rangemap<ghidra::ScopeMapper>::AddrRange>]’ at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/alloc_traits.h:538:17,

Apparently nobody tried to compile with this feature enabled in a long time. Let’s try to fix it.

First of all, we don’t have ruleparse.cc nor ruleparse.hh files, but we have ruleparse.y. For those of you who attended a compiler course, this is a YACC file and we can build it with bison:

$ make ruleparse.cc ruleparse.hh
bison -p ruleparse -d -o ruleparse.cc ruleparse.y

Then let’s hunt down the compilation errors one by one. I’ll spare you the boring details, and just show my ugly patch:

diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc b/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc
index 494d160..8ac2725 100755
--- a/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc
+++ b/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc
@@ -726,7 +726,7 @@ void Architecture::decodeDynamicRule(Decoder &decoder)
     throw LowlevelError("Dynamic rule has no group");
   if (!enabled) return;
 #ifdef CPUI_RULECOMPILE
-  Rule *dynrule = RuleGeneric::build(rulename,groupname,el->getContent());
+  Rule *dynrule = RuleGeneric::build(rulename,groupname, (reinterpret_cast<XmlDecode*>(&decoder))->getCurrentXmlElement()->getContent());
   extra_pool_rules.push_back(dynrule);
 #else
   throw LowlevelError("Dynamic rules have not been enabled for this decompiler");
diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/rulecompile.cc b/Ghidra/Features/Decompiler/src/decompile/cpp/rulecompile.cc
index fe8a413..f346ce9 100755
--- a/Ghidra/Features/Decompiler/src/decompile/cpp/rulecompile.cc
+++ b/Ghidra/Features/Decompiler/src/decompile/cpp/rulecompile.cc
@@ -14,14 +14,19 @@
  * limitations under the License.
  */
 #ifdef CPUI_RULECOMPILE
-#include "rulecompile.hh"
-#include "ruleparse.hh"
+
+#include "types.h"
+#include <string>
+using std::string;
+using namespace ghidra;
+
+
+int4 ruleparsedebug;
+extern int4 ruleparseparse(void);

 namespace ghidra {

 RuleCompile *rulecompile;
-extern int4 ruleparsedebug;
-extern int4 ruleparseparse(void);

 class MyLoadImage : public LoadImage { // Dummy loadimage
 public:
diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/ruleparse.y b/Ghidra/Features/Decompiler/src/decompile/cpp/ruleparse.y
index 3d3ced6..32f42ff 100755
--- a/Ghidra/Features/Decompiler/src/decompile/cpp/ruleparse.y
+++ b/Ghidra/Features/Decompiler/src/decompile/cpp/ruleparse.y
@@ -15,11 +15,19 @@
  */
 %{
 #ifdef CPUI_RULECOMPILE
+
+#include "types.h"
+#include <string>
+using std::string;
+
 #include "rulecompile.hh"

 #define YYERROR_VERBOSE

+using namespace ghidra;
+namespace ghidra {
 extern RuleCompile *rulecompile;
+}
 extern int ruleparselex(void);
 extern int ruleparseerror(const char *str);

NB: these are just hacks to make it compile, not a proper fix.

Anyway, with these fixes we can compile decomp_dbg and run it:

$ set -x SLEIGHHOME ~/opt/ghidra  # export SLEIGHHOME=... for bash users
$ ./decomp_dbg
[decomp]> experimental rules
Command parsing error: Missing name of file containing experimental rules

Ok… now what?

Into the XML hell

Since there’s no documentation, we have to figure out how to use this by reading the source code. I’ll focus on the experimental rules command (it is used to load and enable the decompiler rules). If we try to load a random file, we get a syntax error:

[decomp]> experimental rules /etc/passwd
Successfully registered experimental file /etc/passwd
[decomp]> [decomp]> load file /home/you/xyz/obfuscated
ERROR: Invalid command
[decomp]> load file /home/you/xyz/obfuscated
Trying to parse /etc/passwd for experimental rules
syntax error
Skipping experimental rules
/home/you/xyz/obfuscated successfully loaded: Intel/AMD 64-bit x86

Let’s dig into the source code:

*status->optr << "Trying to parse " << dcp->experimental_file << " for experimental rules" << endl;
try {
   Element *root = store.openDocument(dcp->experimental_file)->getRoot();
   if (root->getName() == "experimental_rules") store.registerTag(root);

OK, so we need XML. By digging further, we deduce that the file should look like this:

<experimental_rules>
    <rule name="rule_name" group="group_name" enable="true">
      ???
    </rule>
</experimental_rules>
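Since I ended up regenerating this file many times, here is a sketch of a tiny generator for it. The only subtlety is that the rule body is embedded in XML, so any < in it has to be escaped (or wrapped in CDATA) to keep the file well-formed. experimental_rules_xml is a name I made up, and the example rule body is just a placeholder.

```python
from xml.sax.saxutils import escape, quoteattr


def experimental_rules_xml(name, group, body, enable=True):
    """Wrap a rule body in the <experimental_rules> boilerplate."""
    enabled = "true" if enable else "false"
    return (
        "<experimental_rules>\n"
        f"    <rule name={quoteattr(name)} group={quoteattr(group)} "
        f"enable={quoteattr(enabled)}>\n"
        f"{escape(body)}\n"   # turns '<' into '&lt;' and so on
        "    </rule>\n"
        "</experimental_rules>\n"
    )


document = experimental_rules_xml("rule_name", "analysis", "{ o1 <- v1; -- }")
print(document)
```

Writing the result to a file gives something that can be passed to the experimental rules command.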

Rule name is arbitrary, and group may be analysis for example (this determines when our rule gets to execute). But what do we put inside? We have to read the YACC grammar to understand the syntax. The grammar, untouched since the initial Ghidra release, is here. It should look familiar if you ever wrote a BNF parser. For example,

fullrule: '{' statementlist actionlist '}'

means that the full rule consists of a literal { followed by a statementlist followed by an actionlist followed by a literal }. Similarly, we can investigate statementlist and actionlist, and so on.

There is one thing that can help us - a comment left in rulecompile.hh:

/*
  Definition of the language

  Identifiers start with 'o' for named pcodeops
                         'v' for named varnodes
                         '#' for named constants

  A "statement" is a sequence of "steps", ending in a semicolon
  Steps are sequential, proceeding left to right.  Each step is either a
  building step (which defines a new entity in terms of an existing entity), or a
  constraint (which forces a condition to be true)

  Building steps:

  o -> v                v is the output of o
  o1 -> o2              o2 is a (named) copy of o1
  o <- v                v is ANY input of o
  o <-(0) v             v is input 0 of o
  o <-(1) #c            input 1 to o is a constant (now named c)
  o <-(1) #0            input 1 to o is a constant with value 0

  v <- o                o is the defining op of v
  v -> o                o is ANY of the ops taking v as an input (may be inefficient)
  v ->! o               o is the one and only op taking v as input
  v1 -> v2              v2 is a (named) copy of v1

  Constraints:

  o(+)                  o must have an opcode equal '+'
  o1(== o2)             o1 and o2 must be the same pcode op
  o1(!= o2)             o1 and o2 must not be the same pcode op
  v1(== v2)             v1 and v2 must be the same varnode
  v1(!= v2)             v1 and v2 must not be the same varnode

  Statements can be grouped (into "statementlist") with parentheses '(' and ')'
  There is an OR operator

  '['   statementlist
      | statementlist
      ...
  ']'

 */

That comment is not wrong, but it’s also incomplete. How do we actually create a complete rule? Let’s dig in deeper.

A forgotten language of dragons

First things first. We already know that

fullrule: '{' statementlist actionlist '}'

An abridged version of other important parts of the grammar (with C++ snippets removed) is:

statement: opnode ';' { ... }
   | varnode ';' { ... }
   | deadnode ';' { ... }
   | '[' orgroupmid ']' { ... }
   | '(' statementlist ')' { ... }

opnode: op_ident { ... }
   | opnode '(' op_list ')' { ... }
   ...;

varnode: var_ident { ... }
   | opnode LEFT_ARROW '(' INTB ')' var_ident { ... }
   | opnode LEFT_ARROW var_ident { ... }
   | opnode RIGHT_ARROW var_ident { ... }
   | varnode '(' OP_INT_EQUAL var_ident ')' { ... }
   ...;

deadnode: opnode LEFT_ARROW '(' INTB ')' rhs_const { ... }
   | opnode  '=' op_ident  { ... }
   ...;

actionlist: ACTION_TICK { ... }
   | actionlist action { ... };

action: opnewnode ';' { ... }
   | varnewnode ';' { ... }
   | deadnewnode ';' { ... };

varnewnode: opnewnode DOUBLE_LEFT_ARROW '(' rhs_const ')' var_ident { ... }
   ...;

deadnewnode: opnewnode DOUBLE_LEFT_ARROW '(' rhs_const ')' rhs_const var_size { ... }
   ...;

So we have statements followed by ACTION_TICK followed by actions, and they both consist of “opnodes”, “varnodes” and “deadnodes”. The high-level structure of the experimental rule file is therefore:

{
   statements
   --
   actions
}

And the possible statements are (among many others)

Opnodes: o1, o1(+), …
Varnodes: o1 <- v1, o1 <-(1) v1, o1 -> v1, v1(== v2)
Deadnodes: o1 <-(1) 123, o1 = o2, …

And for the actions (among many others):

Opnewnodes: o1
Varnewnodes: o1 <--(1) v1
Deadnewnodes: o1 <--(1) 123 4

Now that still doesn’t explain how to use it, but at least we can make a file that we can parse:

<experimental_rules>
   <rule name="rule_name" group="group_name" enable="true">
   {
      o1(+) -> v1;
      o2;
      o3;
      --
      o2 &lt; v1;
   }
   </rule>
</experimental_rules>

And it does parse:⁸

[decomp]> experimental rules /home/you/xyz/test.xml
Successfully registered experimental file /home/you/xyz/test.xml
[decomp]> load file /home/you/xyz/obfuscated
Trying to parse /home/you/xyz/test.xml for experimental rules
Unable to parse dynamic rule: rule_name
Could not create architecture
[decomp]> adjust vma 0x100000
Execution error: No load image present
[decomp]> load addr 0x0101129
fish: Process 914812, './decomp_dbg' from job 1, './decomp_dbg' terminated by signal SIGSEGV (Address boundary error)

Well, it’s not perfect yet, but we’re getting there.

How to train your dragon

Now the fun part. After reading the code and debugging segfaults (lots of segfaults⁹) with gdb, I figured out the rules:

Statements describe what we want to match
Actions describe how we want to transform the matched AST.
Opnodes are the (pcode) operations we want to optimize
Varnodes are, well, varnodes - the operands pcode operations take
Deadnodes are not actually dead operations, they perform operations on defined varnodes and opnodes.

So in the statement section, we can write for example:

o1 - match any operation (and name it o1).
o1(+) - match any addition operation (and name it o1).
v1 - match any varnode (value) (and name it v1).
v1(==v2) - match any varnode v1, as long as it’s equal to v2.
o1 <- v1 - match any operation o1, with v1 being any of its operands.
o1 <-(0) v1 - match any operation o1, with v1 being the first operand.
o1(+) <- v1 - match any addition operation o1, with v1 being any of the operands.
o1 <-(1) 123 - match any operation o1, with 123 being the second operand.
o1 <- v2 <- o3 - match any operation o1, with v2 being any of its operands, and o3 being the operation that defines v2.

And so on. In the action section we define how we want to transform the AST, so for example:

o1 <--(0) v1 - make v1 the first operand of o1.
o1 <--(1) 42 - make 42 the second operand of o1.

There’s more, for example we can match on more complex conditions or create new nodes in the action section, but we won’t need that for this blog post. So a very simple rule that is not a NOP is:

{
   o1(+);
   --
   o1 <-- (0) 0;
   o1 <-- (1) 0;
}

Literally: “match any addition operation o1, and replace both operands with 0”. Let’s try it on our program:

$ decomp_dbg
...
[decomp]> decompile
Decompiling func_0x00101129
Decompilation complete
[decomp]> print C
xunknown8 func_0x00101129(void)
{
  xRam0000000000000000 = 0;
  return 0;
}

Great! As expected, the code simplified greatly (since we just removed all additions from our program). Getting to that point was tough, but now that we understand what’s going on it’s getting much easier.

Let’s go back to the original obfuscation and try to match the whole operation:

int4 main(uint4 param_1) {
  return (param_1 & 1) * 2 + (param_1 ^ 1);
}

We have several constraints:

The root of the AST that we want to match is a + operation.
One of the + operands should be the result of ^
The other operand of + must be the result of *
And one of the * operands must be the result of &

Let’s try to model this using our grammar knowledge:

{
   o_plus(+) <- v1 <- o_mul(*) <- v2 <- o_and(&);
   o_plus(+) <- v4 <- o_xor(^);
   --
   o_plus <-- (0) 0;
   o_plus <-- (1) 0;
}

For the action I still use the “zero everything rule”, to make sure the rule still matches. And it does:

$ decomp_dbg
...
[decomp]> print C
xunknown8 func_0x00101129(void)
{
  xRam0000000000000000 = 0;
  return 0;
}

We’re not done yet - we don’t check the constants or variables anywhere, so our rule will also match (a & 123) * 13 + (b ^ 123) for example. That’s not what we want. Let’s fix it. There are probably more elegant ways to achieve this, but I did this in the simplest way I could think of:

{
   o_plus(+) <- v1 <- o_mul(*) <- v2 <- o_and(&);
   o_plus(+) <- v4 <- o_xor(^);
   [ o_xor <-(0) 1; o_xor <-(1) vin; | o_xor <-(1) 1; o_xor <-(0) vin; ]
   [ o_and <-(0) 1; o_and <-(1) vin; | o_and <-(1) 1; o_and <-(0) vin; ]
   [ o_mul <-(0) 2; | o_mul <-(1) 2; ]
   --
   o_plus <-- (0) 0;
   o_plus <-- (1) 0;
}

This uses the “or” syntax that I didn’t mention before - [ ... | ... ]. This means that either the first or the second group of statements must match. This rule checks our constraints case by case. For example, [ o_xor <-(0) 1; o_xor <-(1) vin; | o_xor <-(1) 1; o_xor <-(0) vin; ] means that either the first parameter to o_xor is 1 and the second parameter is vin, or the first parameter is vin and the second is 1.

After verifying that this still matches our code, we can replace the “zero everything” action. We want to change the operation to an increment, i.e. x + 1. So we want a + operation with one operand equal to 1, and the other equal to the matched varnode. Fortunately our top-level operation is already an addition, so we just need to replace the operands:

{
   o_plus(+) <- v1 <- o_mul(*) <- v2 <- o_and(&);
   o_plus(+) <- v4 <- o_xor(^);
   [ o_xor <-(0) 1; o_xor <-(1) vin; | o_xor <-(1) 1; o_xor <-(0) vin; ]
   [ o_and <-(0) 1; o_and <-(1) vin; | o_and <-(1) 1; o_and <-(0) vin; ]
   [ o_mul <-(0) 2; | o_mul <-(1) 2; ]
   --
   o_plus <-- (0) vin;
   o_plus <-- (1) 1;
}
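
To make the matching logic easier to follow outside of Ghidra, here is a rough Python model of what the rule does. It is only a sketch: expressions are plain tuples like ("+", lhs, rhs), and the helper names (split_const, simplify) are made up for this illustration. The two-way checks mirror the [ ... | ... ] alternatives, and comparing vin_and with vin_xor mirrors using the same vin in both constraints.

```python
def split_const(node, op, const):
    """If node is (op, a, b) with one operand equal to const, return the other one."""
    if isinstance(node, tuple) and len(node) == 3 and node[0] == op:
        _, a, b = node
        if a == const:
            return b
        if b == const:
            return a
    return None


def simplify(node):
    """Rewrite (vin & 1) * 2 + (vin ^ 1) into vin + 1, in any operand order."""
    if not isinstance(node, tuple):
        return node  # a variable name or a constant
    op, lhs, rhs = node[0], simplify(node[1]), simplify(node[2])
    if op == "+":
        # Try both operand orders of the top-level addition.
        for mul, xor in ((lhs, rhs), (rhs, lhs)):
            and_node = split_const(mul, "*", 2)
            vin_and = split_const(and_node, "&", 1)
            vin_xor = split_const(xor, "^", 1)
            if vin_and is not None and vin_and == vin_xor:
                return ("+", vin_and, 1)
    return (op, lhs, rhs)


obfuscated = ("+", ("*", ("&", "param_1", 1), 2), ("^", "param_1", 1))
print(simplify(obfuscated))
```

The real rule engine works on the decompiler's data-flow graph rather than on tuples, so treat this purely as a mental model.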

A hand-painted artisanal version of the final rule:

And… that’s it! We can verify that our rule works correctly now:

[decomp]> print C
int4 func_0x00101129(int4 param_1) {
  return (int4)(param_1 + 1);
}
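
Why is the rewrite valid in the first place? x ^ 1 flips the lowest bit: for even x it equals x + 1 and (x & 1) * 2 is 0, for odd x it equals x - 1 and (x & 1) * 2 is 2. Either way the sum is x + 1. A quick check in Python, assuming 32-bit wraparound (MASK32 and the function names are mine, not anything from Ghidra):

```python
MASK32 = 0xFFFFFFFF


def obfuscated(x):
    # The expression from the obfuscated binary, truncated to 32 bits.
    return ((x & 1) * 2 + (x ^ 1)) & MASK32


def simplified(x):
    # What the rule rewrites it into.
    return (x + 1) & MASK32


samples = [0, 1, 2, 3, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
assert all(obfuscated(x) == simplified(x) for x in samples)
print("identity holds for all samples")
```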

The full rule, including the XML boilerplate, is¹⁰:

<experimental_rules>
    <rule name="obfuscated_increment" group="analysis" enable="true"><![CDATA[
    {
        o_plus(+) <- v1 <- o_mul(*) <- v2 <- o_and(&);
        o_plus(+) <- v4 <- o_xor(^) <- vin0;
        [ o_xor <-(0) 1; o_xor <-(1) vin(==vin0); | o_xor <-(1) 1; o_xor <-(0) vin(==vin0); ]
        [ o_and <-(0) 1; o_and <-(1) vin(==vin0); | o_and <-(1) 1; o_and <-(0) vin(==vin0); ]
        [ o_mul <-(0) 2; | o_mul <-(1) 2; ]
        --
        o_plus <-- (0) vin;
        o_plus <-- (1) 1;
    }
    ]]></rule>
</experimental_rules>
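
If you end up with more than one rule, writing the boilerplate by hand gets old quickly. A small sketch of a generator follows; wrap_rule is a hypothetical helper of mine, it only reproduces the attributes visible in the file above, and it does not escape the rule name. Parsing the result back is just a well-formedness check.

```python
import xml.etree.ElementTree as ET


def wrap_rule(name, body, group="analysis"):
    """Wrap a rule body in the <experimental_rules> boilerplate shown above."""
    if "]]>" in body:
        raise ValueError("rule body must not contain the CDATA terminator")
    return (
        "<experimental_rules>\n"
        f'    <rule name="{name}" group="{group}" enable="true"><![CDATA[\n'
        f"{body}\n"
        "    ]]></rule>\n"
        "</experimental_rules>\n"
    )


# The "zero everything" rule from earlier in the post.
RULE = """{
    o1(+);
    --
    o1 <-- (0) 0;
    o1 <-- (1) 0;
}"""

xml_text = wrap_rule("zero_additions", RULE)
rule = ET.fromstring(xml_text).find("rule")  # raises if the XML is malformed
print(rule.get("name"), rule.get("group"))
```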

And the commands to use it:

experimental rules /home/you/xyz/rules.xml
load file /home/you/xyz/obfuscated
adjust vma 0x100000
load addr 0x0101129
decompile
print C

The procedure for applying our rules in the Ghidra UI is slightly different (we need to patch ghidra_process.cc and build.gradle instead of consolemain.cc and Makefile), but the idea is the same. You can get the patch which makes the decompiler use /etc/ghidra-rules.xml here. Happy hacking.

Conclusion

So that’s it, we created a simple deobfuscation rule for Ghidra decompiler. Since it’s an independent file, you can easily share it with your friends and family - just send them an XML file and they can use it.

As long as they also use your modified version of Ghidra with rules compiled in, of course.

What are the next steps? Frankly, I don’t think there are any. Clearly a lot of work was put into this rule engine, including a custom DSL and AST matcher. But this feature has been sitting in its current state for at least 5 years, and I don’t think Ghidra devs would agree to enable it by default - even if I submitted a PR. It was disabled for a reason.

That would be nice of course - I work with obfuscated code often, and I would love an easier way to extend Ghidra decompiler¹¹. But nowadays I think I would just let plugins register their own hooks and do arbitrary transformations on PCode with Java or Python code. I’m not sure if that’s doable, but one can dream.

Btw: if you use Ghidra, check out my related open-source projects: ghidralib, a Pythonic standard library for Ghidra, and CtrlP, a quick search and command palette plugin.

Or can we? If there is a way, please let me know. It doesn’t invalidate the journey I’ve described in the rest of the post. ↩︎
This is not the official term, but it’s a good name for transformed Pcode. The semantics and amount of available information change drastically. ↩︎
One day I will figure out how to force Ghidra to generate proper switches. ↩︎
I plan to document more obscure/obscurish Ghidra features in the future, though. ↩︎
But if you want to load the exported XML, the commands are restore /path/to/file.xml and load function yourfunction. ↩︎
Someone told me this usage is incorrect, but cambridge dictionary disagrees. ↩︎
To be fair, I recall that one user on Github mentioned in a discussion that this feature exists, and that they managed to compile it but had no success with it. That still doesn’t count as official documentation. ↩︎
I’ve wasted SO MUCH time on that <. The decompiler is not very talkative, so I was just getting random syntax errors on a few varnode types. ↩︎
I wonder if the instability is the reason why this feature was never enabled or officially documented. ↩︎
You may notice there is a small difference to the previous version - <- vin0 and (==vin0). I decided to play it safe, because I’m not 100% sure how term unification works in this language. ↩︎
Also a low-level p-code in SSA form link link. And a C AST exposed to scripts link. And a pony. ↩︎

Ghidra configuration

Sun, 29 Dec 2024 00:00:00 +0000

This is a list of things that you can do to significantly improve your Ghidra experience. I’ll try to put “uncontroversial” things here, and then I’ll write a separate blog post with more invasive changes I inflicted on my decompiler.

1. Dark theme

Save your eyes, change the default theme. In the main tool select Edit->Theme->Switch and select Flat Dark Theme (or pick another one you like).

Before:

After:

2. Dock the windows

Dock the windows in a reasonable way. Pick a layout you like (you can see on the screen that mine is quite minimalistic). No matter what you do, remember that Ghidra keeps this setting - at any point you can do File->Save Tool and your window layout will be safely stored on the disk.

No, really, I mean it. I dock all the windows. By default most windows jump at me and frighten me (script manager, bundle manager, xref search results, probably more). I always dock them to one of the panels so they stay in their lane and don’t show up in random places. But I use a tiling window manager, so that may influence my choice.

I personally close everything and just keep two windows next to each other. This looks roughly like this (the set of open tabs varies, but the layout does not):

Before I released ghidralib and started writing ad-hoc scripts in the Jython interpreter, I just had two screens next to each other, now I also have a small console open.

This is pretty extreme minimalism (but it makes sense if you have keybindings configured and memorised). It’s OK if you keep a few more windows open, but nevertheless consider reducing the visual clutter that Ghidra has by default.

3.1 Change the font

This is not strictly required. The default font is OK, but I wouldn’t call it beautiful. I personally use Fira Code (13pt), but pick one you like. To do this, go to Edit->Tool Options and type font into the filter:

This is a generic technique that you should use any time you want to configure something. Then just visit every row left after filtering and set the font there (there are many fonts configurable separately, but I just use the same font everywhere).

3.2 Cursor text highlight

While you’re in tool options, let’s configure several more things.

First, go to Listing Fields->Cursor Text Highlight and change the Mouse Button To Activate option to LEFT. This will highlight the element pointed to by the mouse, which is extremely useful (by default this happens on middle click).

3.3 Markup variable references

Ghidra will replace register names in the listing with the inferred parameter names. YMMV but I personally rarely find this feature useful, and often annoying:

Go to Listing Fields->Operands Field and disable Markup Inferred Variable References and Markup Register Variable References. Register names are back:

3.4 Maximum lines to display

In the same view, set Maximum Lines To Display to 200. This is useful for showing long data that would otherwise be truncated.

3.5 Maximum numbers of xrefs to display

In Listing Fields->XREFs Field consider changing Maximum Number of XREFs to Display to 50. That’s a lot, but the XREF field is one of the most useful things Ghidra can show - I don’t want to truncate them except in extreme situations.

3.6 Plate comments and labels

By default Ghidra shows a large (and largely useless) plate in front of every function.

You can change that in Listing Fields->Format Code. Untick Show Function Plates. I personally like to turn ON Flag Function Exits - sometimes Ghidra truncates functions at surprising places, and it’s good to know when this happens. Also I want to see function exits immediately at a glance.

3.7 Decompiler analysis options

We’re not done with Tool Options yet. In Decompiler->Analysis:

Decide if you want to see unreachable code. I personally want to, because I often work with very obfuscated code and don’t want to risk missing something. But in most cases ticking this option will simplify the code you see in the decompiler.
Turn on Use inplace assignment operators. This rarely works, but sometimes it does - and it’s almost always better to have myData[something] += uVar1 instead of myData[something] = myData[something] + uVar1.

Before:

After:

3.8 Decompiler display options

In Decompiler->Display:

I personally prefer to remove the (useless) empty line after a function definition. You can do it via Brace format for function blocks.
I prefer C++-style comments to the default C-style (// instead of /* ... */)
I recommend enabling printing NULL for null pointers.
You can configure shorter or longer lines, depending on your screen size and preference. I like the default 120 in this case.

3.9 Comment quick-entry

Now this one is really minor, but I often add short comments to my code. There are two options:

[ctrl]+[enter] accepts the comment, [enter] inserts a newline
[enter] accepts the comment, [shift]+[enter] inserts a newline

I prefer the latter, slightly, but pick whatever you like. I’m just making you aware of that option.

3.10 Initial analysis options

By default, every time when you open a new project, Ghidra will do the analysis, and after a minute, when you’re already reversing, it’ll ask “do you want to go to the entry point”?

No, I don’t want to go there. I was there a minute ago. I don’t want to go there again. You can disable this question in the initial analysis options.

3.11 Auto analysis options

If you’re particularly lazy, you can enable automatic auto-analysis to save you a click every time you import a new file:

This will run auto-analysis with default options every time you open a new file. I’m a bit undecided on this, since there are legitimate reasons when you might not want to run auto-analysis. They are rare, but they happen. For that reason I don’t enable this personally.

4.1 Listing fields

Do you know that you can configure the fields you see in the listing? Just click that tiny white-orange button:

And you get to configure every aspect of this view:

I think I personally only changed the operands field to be a bit wider. Oh and by the way, you can enable showing PCode here if you need it to debug something. Just right click on PCode and select enable field.

Obviously not recommended for everyday use.

4.2 Program overview

While we’re talking about the listing, don’t you miss that bar on top of IDA that shows which section of program memory is function, code, data, etc?

Turns out Ghidra can do that too. Click the right-most icon (I don’t know what it represents) and enable Show Entropy and Show Overview (or just one of them if you prefer).

By the way, this is an extension point and plugins can add their own bars. I’m not aware of any plugin that did, but they can.

5. Bytes View

Bytes view is ugly and I hate it. I think about integrating Ghidra with Imhex someday. Anyway, by default even ASCII field is hidden, even though it is extremely useful.

Open the Bytes window (Window->Bytes) and click the wrench icon. Turn on Ascii column.

6. Tool reuse

There are also some things you can change in the main tool (the window with a list of files in the project). Go there, and select Edit->Tool options.

One thing I like to do is change Default Tool Launch Mode to Reuse acceptable running tool - this option will open new files in the same window, in listing window tabs, instead of opening a new window.

You can also add your own keybindings there if you want, of course.

Next steps

Actually I think that’s all about the basic configuration I have to say.

Remember to save your tool (File->Save Tool). Actually, export the tool (File->Export->Export Tool As) and save it somewhere so you won’t lose your changes randomly. As another bonus, you can take this file with you when you switch machines.

I plan to write about reasonable keybindings and unusual Ghidra settings later.

For more ideas, see the Ghidra table of contents page.

$ challenges

Thu, 19 Jan 2023 00:00:00 +0000

I like teaching people, and I strongly believe in learning by solving challenges. That’s why I’ve prepared quite a few CTF-like tasks teaching security basics - like cryptography.

They are meant to be used as a part of lecture, but have enough context to be solved without a formal introduction.

Cryptography

Pseudo-randomness

Your task is to predict the next number generated by this RNG.

LCG challenge

Symmetric Ciphers

They say that “encryption is not authentication”. Prove them right by circumventing the simulated “session” mechanism implemented using various encryption methods.

Final Assignments

(unsolvable without source code, I’ll add it here someday)

Crypto final assignment 2022: nc mimcrypto2022.var.tailcall.net 30006
Crypto final assignment 2023: nc mimcrypto2023.var.tailcall.net 30007

Reverse Engineering

RE final assignment 2024

More challenges

I actually created dozens (really) more CTF challenges, but most of them are not hosted anywhere anymore. A great collection of tasks - including mine - that I recommend is hack.cert.pl.

Collision attack for StreamHash5

Thu, 27 Jan 2022 00:00:00 +0000

StreamHash is an alternative family of cryptographic hash functions, designed for maximum speed, while being reasonably secure at the same time.

As far as I know, it’s not used in any production software, but it’s not completely fringe either - it was submitted to the SHA-3 competition and prompted a few academic publications.

StreamHash5 is the newest member of the family. It was created as an improved version of StreamHash4, after I implemented a practical collision attack for it ¹.

In this blog post, I will describe my impractical collision attack on StreamHash5 (and a practical partial attack). I stress that to manage expectations, but it’s enough to consider StreamHash5 broken cryptographically:

Hash size is 512 bits. Hence, the generic birthday attack complexity is 2**256.
My attack complexity is roughly 2**128 * 2**38 = 2**166.
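
The arithmetic behind these two numbers, spelled out in Python (working with exponents, since the values themselves are unwieldy; the variable names are just for this example):

```python
hash_bits = 512

# Generic birthday attack: about 2**(n/2) work for an n-bit hash.
birthday_log2 = hash_bits // 2

# The attack described here: 2**38 per attempt, repeated 2**128 times.
attack_log2 = 128 + 38

print(f"birthday bound: 2**{birthday_log2}")
print(f"this attack:    2**{attack_log2}")
print(f"improvement:    2**{birthday_log2 - attack_log2}")
```

So the attack is a factor of 2**90 better than the generic bound, while still being far outside of what anyone can actually compute.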

Meanwhile, the project’s readme clearly states that “There are currently no known collision attacks easier than the generic birthday attack” ².

I’m also confident the attack complexity can be improved. But this research spent the last 2 years in my drawer and I think it’s time to finally publish it.

Partial collision example (line breaks added for clarity):

$ cat data.{0,1}.hex
626974636820646f20317420616c6f6ec0402c500c6570577b3c09fb4e966d2215ea0bb4a024bab440e907c9cc7e3bd0
696620796f752077616e6e61206372794b6dfb042aed03d14e8dbd4936536c3500000000000000000000000000000000

$ streamhash5sum data.0.bin
65bf759586d6b23e3ba7b36b9150920b  # collision
6f1daa35f658c4181fc25960d2b0ac61  # collision
71df8e636e38beecbf1d36ae7f9c26e4  # collision
e4121fda76468f48ae542cb70a31875e

$ streamhash5sum data.1.bin
65bf759586d6b23e3ba7b36b9150920b  # collision
6f1daa35f658c4181fc25960d2b0ac61  # collision
71df8e636e38beecbf1d36ae7f9c26e4  # collision
82ef6e3e544e910fc99b69c77ca58ceb

StreamHash5

Like one wise man with anger issues once said, “talk is cheap, show me the code”. StreamHash5 processes data in 16-byte blocks. The internal state has 4*16 bytes. After every round, the state is xor-ed with the number of bits processed so far. Simplified version (works when len(data) is divisible by 16) below:

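import struct


def sh5(data):
    offset = 0
    state = list(consts)

    while offset + 16 <= len(data):
        offset += 16
        # 32-bit little-endian bit count, padded to 8 bytes (a bytes literal,
        # so this also runs on Python 3), then doubled to the 16-byte width
        magic = struct.pack("<I", offset * 8).rjust(8, b"\x00")
        state[3] = xor(state[3], magic * 2)
        block(data[offset-16:offset], state)

    return state[0] + state[1] + state[2] + state[3]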

Of course, the real magic is in the block compression function. StreamHash can be described as four “almost-AES-CBC” encryptions going on in parallel.

The state is divided into four 16 byte chunks. First, every chunk is xored with a new block of hashed data. After that, every block is encrypted with two rounds of AES (every block uses a different but constant key). In Python:

def xorstate(data, state):
    state[0] = xor(data, state[0])
    state[1] = xor(data, state[1])
    state[2] = xor(data, state[2])
    state[3] = xor(data, state[3])


def halfblock(state):
    state[0] = aes_round(state[0], consts[0])
    state[1] = aes_round(state[1], consts[1])
    state[2] = aes_round(state[2], consts[2])
    state[3] = aes_round(state[3], consts[3])


def block(data, state):
    xorstate(data, state)
    halfblock(state)
    halfblock(state)
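
The snippets above use an xor helper that isn’t shown. A minimal Python 3 sketch of it could look like the code below; this is my assumption about its behaviour (byte-wise xor of equal-length byte strings), not a copy of the original helper. length_magic is a made-up name for the bit-counter expression from sh5, pulled out so it can be inspected on its own.

```python
import struct


def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("xor operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def length_magic(offset):
    """The 16-byte value that sh5 xors into state[3] after `offset` bytes."""
    return struct.pack("<I", offset * 8).rjust(8, b"\x00") * 2


print(xor(b"\x0f\xf0", b"\xff\xff").hex())
print(length_magic(16).hex())
```

aes_round is left out on purpose: a correct single AES round needs the S-box and MixColumns tables, which would not fit in a short example.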

In StreamHash5, the halfblock function is called twice per block - this is the improvement over StreamHash4 introduced in the new version.

This can also be described with the following ASCII-art:

     data                                     bits
       |                                        |
       v                                        |
S0 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> S0
S1 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> S1
S2 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> S2
S3 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> S3

Attack description

The attack itself is just a smart brute-force with a lot of precomputing.

The problem we’re trying to solve can be stated as follows:

AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)
AES(AES(A03 ^ A1D, C3), C3) = AES(AES(B03 ^ B1D, C3), C3)

Excuse my notation. C0, C1, C2 and C3 are algorithm constants. AES is a single round of AES encryption. AnD and BnD describe nth data blocks (in message A and B respectively). And finally, Anm/Bnm describe mth state after processing nth block (remember, there are four states).

In other words, the relationship between the state variables is:

# state 0 in message A
A10 = AES(A00 ^ A1D, C0)
A20 = AES(A10 ^ A2D, C0)
A30 = AES(A20 ^ A3D, C0)
...

# state 1 in message B
B11 = AES(B01 ^ B1D, C1)
B21 = AES(B11 ^ B2D, C1)
B31 = AES(B21 ^ B3D, C1)
...

# etc

Finally, an example with ASCII art again:

       A1D                                     bits
        |                                        |
        v                                        |
A00 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> A10
A01 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> A11
A02 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> A12
A03 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> A13

So, back to the problem again. I won’t try to solve all four equations. We’ll create a collision for three equations in 2**38 and observe that we can repeat this 2**128 times to get a collision for the fourth block ³. So we start with:

AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)

We expand AES round into suboperations - SubBytes, AddRoundKey, MixColumns and ShiftRows, and simplify:

SB(AR(MX(SR(SB(A00 ^ A1D))), C0)) ^ A2D = SB(AR(MX(SR(SB(B00 ^ B1D))), C0)) ^ B2D
SB(AR(MX(SR(SB(A01 ^ A1D))), C1)) ^ A2D = SB(AR(MX(SR(SB(B01 ^ B1D))), C1)) ^ B2D
SB(AR(MX(SR(SB(A02 ^ A1D))), C2)) ^ A2D = SB(AR(MX(SR(SB(B02 ^ B1D))), C2)) ^ B2D

Observe that SR(SB(A00 ^ A1D)) doesn’t mix bytes in any way. In fact, it’s equivalent to SR(SB(A00)) ^ SR(SB(A1D)). Use this trick to simplify the equation again (let A00sr = SR(SB(A00)) etc):

SB(AR(MX(A00sr ^ A1Dsr), C0)) ^ A2D = SB(AR(MX(B00sr ^ B1Dsr), C0)) ^ B2D
SB(AR(MX(A01sr ^ A1Dsr), C1)) ^ A2D = SB(AR(MX(B01sr ^ B1Dsr), C1)) ^ B2D
SB(AR(MX(A02sr ^ A1Dsr), C2)) ^ A2D = SB(AR(MX(B02sr ^ B1Dsr), C2)) ^ B2D

Move variables around, and let d = A2D ^ B2D

d ^ SB(AR(MX(A00sr ^ A1Dsr), C0)) = SB(AR(MX(B00sr ^ B1Dsr), C0))
d ^ SB(AR(MX(A01sr ^ A1Dsr), C1)) = SB(AR(MX(B01sr ^ B1Dsr), C1))
d ^ SB(AR(MX(A02sr ^ A1Dsr), C2)) = SB(AR(MX(B02sr ^ B1Dsr), C2))

Now, introduce a helper function for each constant: let SSB_C0(x) = SB(AR(MX(x), C0)), and similarly for C1 and C2:

d ^ SSB_C0(A00sr ^ A1Dsr) = SSB_C0(B00sr ^ B1Dsr)
d ^ SSB_C1(A01sr ^ A1Dsr) = SSB_C1(B01sr ^ B1Dsr)
d ^ SSB_C2(A02sr ^ A1Dsr) = SSB_C2(B02sr ^ B1Dsr)

Why? Because SB(AR(MX(x), C0)) is in a sweet spot, where it’s already hard to analyse mathematically, but there is still only a single MixColumns step - so we can operate on message dwords (32bit chunks) separately. This means we can precompute inverse functions like:

SSB_C0(x) = y
<=>
x ∈ INVSSB_C0(y)  # many inputs may give the same result

This will come in handy.
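
Precomputing such an inverse is nothing more than building a lookup table from outputs to all inputs that produce them. A tiny sketch with a toy 8-bit function standing in for SSB_C0 (toy and build_inverse are names I made up; the real table is built per 32-bit dword and is accordingly much bigger):

```python
from collections import defaultdict


def build_inverse(func, domain):
    """Map every output y to the list of inputs x with func(x) == y."""
    inverse = defaultdict(list)
    for x in domain:
        inverse[func(x)].append(x)
    return inverse


def toy(x):
    # Deliberately not injective: x and 256 - x have the same square mod 256.
    return (x * x + 5) & 0xFF


INV = build_inverse(toy, range(256))
print(INV[toy(3)])  # every input that collides with 3, including 3 itself
```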

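Now we can just brute-force A1Dsr quickly. After plugging in a specific value for A1D (remember, the first data block), the left-hand sides are known up to d. Writing dn = d ^ SSB_Cn(A0nsr ^ A1Dsr), and SB_Cn as a shorthand for SSB_Cn, we get: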

d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)

This means that:

d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)

Observe that B00sr and B01sr don’t depend on A1D (or even B1D). This means that we can define a function:

SB_C0xC1(x) = SB_C0(B00sr ^ x) ^ SB_C1(B01sr ^ x)

And precompute its inverse:

SB_C0xC1(x) = y
<=>
x ∈ INVSB_C0xC1(y)

Going back to our equation, we can find solutions immediately. Pseudo-code:

# solve d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)
for B1Dsr in INVSB_C0xC1(d0 ^ d1):
    ...  # do more crypto magic

This takes care of the first two equations. But remember, we need to solve three of them at once:

d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)

Sadly, this is as far as clever precomputing gets us. But the good thing is that we can work on 32bit input fragments independently. By picking a random A1D we have a 1 / 2**32 chance of satisfying the third equation by pure chance. And when we do, it’s over - the attack succeeded.

So the idea is to split a message block into four 32bit parts, find a collision for each part separately, and then combine them into a 128bit block.

The only bad news is that sometimes we won’t find any A1D that works, even after exhaustive checking of all 2**32 uint32 values. In this case, we have to pick other initial states and redo the attack.
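
To make the whole procedure concrete, here is a scaled-down sketch of the search for a single chunk. Everything is shrunk to 8 bits so it finishes instantly, and toy_ssb is an arbitrary nonlinear function I picked as a stand-in for SSB_Cn, so this shows the structure of the attack and not the attack on the real hash. The names (toy_ssb, holds, solve_chunk) are made up for this example.

```python
from collections import defaultdict


def toy_ssb(x, c):
    """Toy 8-bit stand-in for SSB_Cn(x); this is NOT the real AES-based function."""
    y = (x * 29 + c) & 0xFF
    return ((y * y) ^ (y >> 3) ^ c) & 0xFF


def holds(a_state, b_state, consts, a1d, b1d, d):
    """Check d ^ SSB_Cn(A0n ^ A1D) == SSB_Cn(B0n ^ B1D) for n = 0, 1, 2."""
    return all(
        d ^ toy_ssb(a ^ a1d, c) == toy_ssb(b ^ b1d, c)
        for a, b, c in zip(a_state, b_state, consts)
    )


def solve_chunk(a_state, b_state, consts):
    (a0, a1, a2), (b0, b1, b2), (c0, c1, c2) = a_state, b_state, consts
    # Precompute the inverse of x -> SSB_C0(B00 ^ x) ^ SSB_C1(B01 ^ x).
    inv01 = defaultdict(list)
    for x in range(256):
        inv01[toy_ssb(b0 ^ x, c0) ^ toy_ssb(b1 ^ x, c1)].append(x)
    for a1d in range(256):  # brute-force the first data block of message A
        t0 = toy_ssb(a0 ^ a1d, c0)
        t1 = toy_ssb(a1 ^ a1d, c1)
        t2 = toy_ssb(a2 ^ a1d, c2)
        # d cancels out in t0 ^ t1, so one lookup solves equations 0 and 1.
        for b1d in inv01.get(t0 ^ t1, ()):
            d = t0 ^ toy_ssb(b0 ^ b1d, c0)
            if d ^ t2 == toy_ssb(b2 ^ b1d, c2):  # third equation holds by luck
                return a1d, b1d, d
    return None  # no luck: pick other initial states and retry


A_STATE, B_STATE, CONSTS = (0x11, 0x22, 0x33), (0x44, 0x55, 0x66), (0x07, 0x0B, 0x0D)
solution = solve_chunk(A_STATE, B_STATE, CONSTS)
print(solution)
```

A None result corresponds to the “bad news” case above: for some initial states no value works, and the only option is to start again from different states.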

That’s really it. I’ve glossed over a few details, but the gist of the attack and the most interesting part is just that - a big precomputed inverse for xor of two AES supersboxes.

Conclusions

Even though I didn’t present a full practical collision, I’m confident StreamHash5 shouldn’t be used in its current form. I see a few ways to improve my attack, and certainly focused work by experts and three-letter agencies would improve it even more.

The usual takeaway is: cryptography is hard - even when you’re a professional cryptographer, rolling your own crypto is risky.

You can read my slides or watch the video. Both are PL-only. ↩︎
https://github.com/mtrojnar/StreamHash5 ↩︎
Can we use the birthday paradox to bring it down to 2**64 somehow? I don’t know. Maybe? But we don’t control the first three blocks, so it’s not obvious how. ↩︎

A simple bluetooth keyboard with Raspberry PI

Fri, 19 Nov 2021 00:00:00 +0000

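My laptop only has one USB port.

I understand that the times are changing. Wireless devices are more and more popular, and Wi-Fi and Bluetooth are taking over from those pesky cables and bulky physical connectors. We’re entering a new cable-less era.

This doesn’t change the fact that my mouse and my keyboard connect over USB. That’s two devices. And I want both.

So of course I’ve decided to write a Bluetooth keyboard proxy using my Raspberry Pi as a gateway. I couldn’t find any working tutorials, and after a night of cursing at the screen, I’ve decided to share my findings with future generations.

The code should work with any Bluetooth-enabled Linux device. I’ve tested the code on Raspberry Pi and Ubuntu; non-Debian distributions may require slightly different steps.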

D-Bus, how does it work?

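D-Bus is an IPC framework designed to facilitate communication between multiple processes in a composable and extensible way. It’s a shared channel used by applications to communicate. For example, I can ask Spotify to change the song using dbus-send:

$ dbus-send --print-reply --dest=org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Next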

What just happened is I’ve called the org.mpris.MediaPlayer2.Player.Next “method” on the /org/mpris/MediaPlayer2 object using the org.mpris.MediaPlayer2.spotify connection.
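
The three parts of that call (connection, object path, interface plus method) are easier to see when the command is assembled from them. A small sketch that only builds the argument list and never touches the bus; dbus_send_argv and split_method are hypothetical helper names, and only the two dbus-send flags used above are assumed:

```python
def split_method(method):
    """Split 'some.interface.Name.Member' into (interface, member)."""
    interface, _, member = method.rpartition(".")
    return interface, member


def dbus_send_argv(dest, object_path, method, print_reply=True):
    """Build the dbus-send command line for a single method call."""
    argv = ["dbus-send"]
    if print_reply:
        argv.append("--print-reply")
    argv += [f"--dest={dest}", object_path, method]
    return argv


argv = dbus_send_argv(
    "org.mpris.MediaPlayer2.spotify",       # connection (bus name)
    "/org/mpris/MediaPlayer2",              # object path
    "org.mpris.MediaPlayer2.Player.Next",   # interface + method
)
print(" ".join(argv))
print(split_method("org.mpris.MediaPlayer2.Player.Next"))
```

The resulting list can be handed to subprocess.run as-is if you want to actually send the call.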

BlueZ is the Linux implementation of Bluetooth, and it has a D-Bus API ¹. It implements the org.freedesktop.DBus.Introspectable interface, so we can take a look at it via the CLI:

pi@tmicro:~ $ sudo gdbus introspect --system --dest org.bluez --object-path /org/bluez
node /org/bluez {
  interface org.freedesktop.DBus.Introspectable {
    methods:
      Introspect(out s xml);
    signals:
    properties:
  };
  interface org.bluez.AgentManager1 {
    methods:
      RegisterAgent(in  o agent,
                    in  s capability);
      UnregisterAgent(in  o agent);
      RequestDefaultAgent(in  o agent);
    signals:
    properties:
  };
  interface org.bluez.ProfileManager1 {
    methods:
      RegisterProfile(in  o profile,
                      in  s UUID,
                      in  a{sv} options);
      UnregisterProfile(in  o profile);
    signals:
    properties:
  };
  interface org.bluez.HealthManager1 {
    methods:
      CreateApplication(in  a{sv} config,
                        out o application);
      DestroyApplication(in  o application);
    signals:
    properties:
  };
  node hci0 {
  };
};

We see four interfaces (three of them BlueZ-specific) and one node.

We can call methods on the interfaces, and the node (hci0) is a child object, which we can introspect further with:

sudo gdbus introspect --system --dest org.bluez --object-path /org/bluez/hci0
# (...)

Other useful debugging commands are:

sudo btmon (monitor Bluetooth activity on the device)
sudo busctl monitor [service] (introspect selected bus)

Keyboard service

To serve as a keyboard we only need to do two things: register our service and wait for connection.

Registration is done by calling the org.bluez.ProfileManager1.RegisterProfile method with appropriate parameters.

from pathlib import Path

import dbus

# UUID for HID service (1124)
# https://www.bluetooth.com/specifications/assigned-numbers/service-discovery
UUID = "00001124-0000-1000-8000-00805f9b34fb"
PROFILE_DBUS_PATH = "/bluez/msm/bluekeyboard"

# BlueZ lives on the system bus (compare `gdbus introspect --system` above)
bus = dbus.SystemBus()

print("Registering the profile...")
opts = {
    "Role": "server",
    "RequireAuthentication": False,
    "RequireAuthorization": False,
    "AutoConnect": True,
    "ServiceRecord": (Path(__file__).parent / "service.xml").read_text(),
}
bluez = bus.get_object("org.bluez", "/org/bluez")
manager = dbus.Interface(bluez, "org.bluez.ProfileManager1")
manager.RegisterProfile(PROFILE_DBUS_PATH, UUID, opts)

Looks pretty straightforward, right? Wait, what is service.xml? Glad you’ve asked (service.xml):

<?xml version="1.0" encoding="UTF-8" ?>

<record>
    <attribute id="0x0001">
        <sequence>
            <uuid value="0x1124" />
        </sequence>
    </attribute>
    <attribute id="0x0004">
        <sequence>
            <sequence>
                <uuid value="0x0100" />
                <uint16 value="0x0011" />
            </sequence>
            <sequence>
                <uuid value="0x0011" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x0005">
        <sequence>
            <uuid value="0x1002" />
        </sequence>
    </attribute>
    <attribute id="0x0006">
        <sequence>
            <uint16 value="0x656e" />
            <uint16 value="0x006a" />
            <uint16 value="0x0100" />
        </sequence>
    </attribute>
    <attribute id="0x0009">
        <sequence>
            <sequence>
                <uuid value="0x1124" />
                <uint16 value="0x0100" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x000d">
        <sequence>
            <sequence>
                <sequence>
                    <uuid value="0x0100" />
                    <uint16 value="0x0013" />
                </sequence>
                <sequence>
                    <uuid value="0x0011" />
                </sequence>
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x0100">
        <text value="Raspberry Pi Virtual Keyboard" />
    </attribute>
    <attribute id="0x0101">
        <text value="USB > BT Keyboard" />
    </attribute>
    <attribute id="0x0102">
        <text value="Raspberry Pi" />
    </attribute>
    <attribute id="0x0200">
        <uint16 value="0x0100" />
    </attribute>
    <attribute id="0x0201">
        <uint16 value="0x0111" />
    </attribute>
    <attribute id="0x0202">
        <uint8 value="0x40" />
    </attribute>
    <attribute id="0x0203">
        <uint8 value="0x00" />
    </attribute>
    <attribute id="0x0204">
        <boolean value="false" />
    </attribute>
    <attribute id="0x0205">
        <boolean value="false" />
    </attribute>
    <attribute id="0x0206">
        <sequence>
            <sequence>
                <uint8 value="0x22" />
                <text encoding="hex" value="05010906a101850175019508050719e029e715002501810295017508810395057501050819012905910295017503910395067508150026ff000507190029ff8100c0050c0901a1018503150025017501950b0a23020a21020ab10109b809b609cd09b509e209ea09e9093081029501750d8103c0" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x0207">
        <sequence>
            <sequence>
                <uint16 value="0x0409" />
                <uint16 value="0x0100" />
            </sequence>
        </sequence>
    </attribute>
    <attribute id="0x020b">
        <uint16 value="0x0100" />
    </attribute>
    <attribute id="0x020c">
        <uint16 value="0x0c80" />
    </attribute>
    <attribute id="0x020d">
        <boolean value="true" />
    </attribute>
    <attribute id="0x020e">
        <boolean value="false" />
    </attribute>
    <attribute id="0x020f">
        <uint16 value="0x0640" />
    </attribute>
    <attribute id="0x0210">
        <uint16 value="0x0320" />
    </attribute>
</record>
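
The long hex string in attribute 0x0206 is the HID report descriptor, and it is less opaque than it looks. As a sketch of how to read it, the code below walks the descriptor item by item (in the HID short item format the low two bits of each prefix byte encode the payload size) and collects the Report IDs it declares. The chunking and the comments are my own reading of the bytes, and report_ids is a made-up helper name.

```python
HID_DESCRIPTOR = bytes.fromhex(
    "05010906a1018501"                      # keyboard collection, Report ID 1
    "75019508050719e029e7150025018102"      # 8 one-bit modifier flags (input)
    "950175088103"                          # 1 constant byte (input)
    "950575010508190129059102950175039103"  # 5 LED bits + 3 padding bits (output)
    "95067508150026ff000507190029ff8100c0"  # 6 one-byte key slots (input)
    "050c0901a1018503150025017501950b"      # consumer control collection, Report ID 3
    "0a23020a21020ab10109b809b609cd09b509e209ea09e90930"
    "81029501750d8103c0"
)


def report_ids(descriptor):
    """Return every Report ID declared in a HID report descriptor (short items only)."""
    ids = []
    i = 0
    while i < len(descriptor):
        prefix = descriptor[i]
        size = (0, 1, 2, 4)[prefix & 0x03]  # payload size in bytes
        value = int.from_bytes(descriptor[i + 1:i + 1 + size], "little")
        if (prefix & 0xFC) == 0x84:  # global item "Report ID"
            ids.append(value)
        i += 1 + size
    return ids


print(report_ids(HID_DESCRIPTOR))
```

Report 1 consists of one byte of modifier flags, one constant byte and six key slots, which appears to be exactly the payload layout used when sending keystrokes later in this post.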

The second part of the puzzle is waiting for a connection. Fortunately Python3 natively supports Bluetooth sockets, so no external dependencies are required:

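import socket

# L2CAP PSMs for HID; they match the 0x0011 and 0x0013 values in service.xml
P_CTRL = 0x11  # control channel
P_INTR = 0x13  # interrupt channel

# `address` is the Bluetooth MAC of the local adapter, defined elsewhere
scontrol = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_SEQPACKET, socket.BTPROTO_L2CAP)
scontrol.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
scontrol.bind((address, P_CTRL))
scontrol.listen(1)

sinterrupt = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_SEQPACKET, socket.BTPROTO_L2CAP)
sinterrupt.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sinterrupt.bind((address, P_INTR))
sinterrupt.listen(1)

# Keep the listening sockets and the accepted connections in separate names
ccontrol, cinfo = scontrol.accept()
print(f"Connected on the control channel {cinfo[0]}")

cinterrupt, cinfo = sinterrupt.accept()
print(f"Connected on the interrupt channel {cinfo[0]}")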

Scan codes

Only one thing left to code - we want to send keystrokes to the connected device. This is done by sending a command packet to the socket. The format is:

[0xA1, 0x01, modifier, 0, key0, key1, key2, key3, key4, key5]

But remember that you always have to notify the remote that the keys were released, by zeroing out the keys in the next packet. So let’s implement that:

import time


def send_char(char, cinterrupt):
    keycode, shift = char_to_keycode(char)
    # Left Shift is bit 1 (0x02) of the HID modifier byte; bit 6 would be Right Alt
    modkey = (1 << 1) if shift else 0
    cinterrupt.send(bytes([0xA1, 1, modkey, 0, keycode, 0, 0, 0, 0, 0]))
    time.sleep(0.01)
    cinterrupt.send(bytes([0xA1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
    time.sleep(0.01)

def char_to_keycode(char):
    keymap = {
        "1": (30, False),
        "2": (31, False),
        "3": (32, False),
        # ...
        "!": (30, True),
    }
    return keymap[char]
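
The packet layout above can also be wrapped in a small helper. This is only a sketch and not part of the original code: build_report is a hypothetical name, and it assumes the 10-byte format shown above (0xA1 header, report ID 1, modifier byte, reserved byte, six key slots).

```python
def build_report(modifier, keycodes):
    """Build the 10-byte keyboard packet described above (hypothetical helper)."""
    if len(keycodes) > 6:
        raise ValueError("at most six keys can be reported at once")
    # Pad the key list with zeros so the packet always has six key slots.
    keys = list(keycodes) + [0] * (6 - len(keycodes))
    return bytes([0xA1, 0x01, modifier, 0x00] + keys)


# An empty key list gives the "all keys released" packet.
RELEASE_ALL = build_report(0, [])
```

With a helper like this, a key press is build_report(modifier, [keycode]) followed by RELEASE_ALL.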

For the demo we’ll just read user input in a loop and send it char by char to the remote:

while True:
    text = input()
    for c in text + "\n":
        send_char(c, cinterrupt)

That’s all! Put all the pieces together in bluetooth_server.py.

Connect the victim

We’re almost done! Now we must disable the input plugin of the Bluetooth daemon, otherwise the keyboard code will not work:

sudo vim /etc/systemd/system/bluetooth.target.wants/bluetooth.service

Add the -P input parameter:

9c9
< ExecStart=/usr/lib/bluetooth/bluetoothd
---
> ExecStart=/usr/lib/bluetooth/bluetoothd -P input

And restart the service:

sudo systemctl daemon-reload
sudo systemctl restart bluetooth

Make sure that the service runs on your machine and has the expected parameter:

$ ps aux | grep bluetoothd
root  230  0.0  0.0  8628  5108 ? Ss   02:08   0:00 /usr/lib/bluetooth/bluetoothd -P input

Time to start our program:

$ sudo python3 bluetooth_server.py
Registering the profile...
Waiting for connections...

Now let’s connect the victim to your new “keyboard”. Start the agent with bluetoothctl:

$ sudo bluetoothctl
Agent registered
[bluetooth]# power on
Changing power on succeeded
[bluetooth]# discoverable on
Changing discoverable on succeeded
[CHG] Controller 00:21:5C:B0:89:56 Discoverable: yes
[bluetooth]# default-agent 
Default agent request successful

Your machine should now be discoverable. You’ll need to confirm the PIN in the terminal (and optionally authorize some services):

[NEW] Device 23:1D:C1:F4:10:1D Z2K21E1
Request confirmation
[agent] Confirm passkey 216956 (yes/no): yes

If everything went right, the remote machine should now connect to your “keyboard” and you can send your keystrokes. And without any USB cables. The future is now.

Closing thoughts

The (second) real reason for this post was that I’ve wanted to play with dbus and bluetooth for a long time. Now I had a good reason to do both.

All the code for this post is on GitHub: https://github.com/msm-code/RandomCodes/tree/master/bluetooth-keyboard.

¹ In fact, the API changed completely a few years ago, and that’s the reason why old code doesn’t work anymore.

$ publications

Wed, 01 Apr 2020 00:00:00 +0000

Talks

2016-10 - Virus Bulletin 2016 - Nymaim - the Untold Story 🇬🇧 [video] (with mak)
2016-10 - Security BSides Warsaw 2016 - How to Capture a Flag? 🇵🇱 [video] (with Mateusz Szymaniec)
2017-03 - Warszawskie Dni Informatyki 2017 - From hacker’s e-sport to job in IT security 🇵🇱 (with Mateusz Szymaniec)
2017-10 - Virus Bulletin 2017 - Peering into Spam Botnets 🇬🇧 (with mak)
2017-10 - Security BSides Warsaw 2017 - Practical Cryptography [video]
2017-12 - BotConf 2017 - Tracking Botnets With Bots 🇬🇧 (with psrok1)
2018-11 - Secure 2018 - mquery, or how to find malware in a sea of samples 🇵🇱
2019-06 - Let’s Play Częstochowa 2019 - IT Security vs computer games 🇵🇱
2020-01 - No Such Meetup 2020 - My Kernel is My Castle 🇵🇱 (pdf)
2020-06 - Secure EarlyBirds 2019 - Automated decompilation and correlation of malicious software 🇵🇱
2020-06 - Secure EarlyBirds 2020 - Evil Data For Good Cause 🇵🇱
2020-06 - CSIRT Network 2020 - Malware Hunting With Yara 🇬🇧
2020-12 - Oh My Hack 2020 - How to setup your kubernetes cluster (not) 🇵🇱
2022-11 - Secure EarlyBirds 2022 - Decrypt Ransomware or Die Trying 🇵🇱
2022-12 - Oh My Hack 2022 - APT as a Reverse Engineer 🇵🇱
2023-12 - Oh My Hack 2023 - Talking with stealers 🇵🇱

Publications

“Programista” Magazine (PL only) 🇵🇱

2015-05 - PHP Core (with Mateusz Szymaniec)
2015-09 - Rhinoxorus (with Mateusz Szymaniec)
2015-12 - Rsabin (with Stanislaw Podgorski)
2016-05 - People’s Square (with Stanislaw Podgorski)
2016-07 - Blackbox (with akrasuski1)
2016-10 - PWNing 2016 CTF writeups (with multiple members of p4 team)
2017-01 - (Still) Broken Box (with Stanislaw Podgorski)
2017-06 - User authentication in web applications using public key infrastructure (with Michał Leszczyński)
2017-07 - WCTF 2017 - p4 challenges (with Stanislaw Podgorski)
2017-09 - Practical Cryptography: Cryptographic Hashes and Signatures (with Michał Leszczyński)
2017-10 - Practical Cryptography: Block Ciphers
2018-04 - Capture the Data Thief
2018-06 - Midnight Sun 2018 - Badchair
2018-08 - Find a needle in a data haystack
2018-12 - Threat models in practice
2019-01 - CONFidence 2019 Teaser - Watchmen
2019-04 - CONFidence 2019 Finals - Gothic
2020-01 - DragonCTF 2019 - Arcane Sector
2021-05 - Malware analysis - Decrypt the undecryptable
2024-03 - The art of malware emulation - talking with a botnet

Projects

2015+ - p4-team/ctf: (a lot of) writeups from CTF challenges
2016 - nymaim-tools: open sourced nymaim dissector
2018+ - ursadb: A fast trigram database
2018+ - mquery: Yara query accelerator
2024 - GhidraCtrlP: Ctrl+P plugin for Ghidra: quick search and command palette.
2024 - ghidralib: A Pythonic Ghidra standard library.

Workshops

2016+ - Multiple commercial malware analysis trainings
2017 - (lighthearted) Fast Track to Reverse Engineering 🇵🇱
2019+ - Multiple commercial Kubernetes security trainings
2022+ - Threat information pipelines (often with Paweł Pawliński)
- This training was conducted by me for international CERT community in Uganda, Malawi, Dominican Republic, Chile, Cyprus and Albania during FIRST and ITU events.
2024+ - Introduction to malware analysis for CERTs (co-prepared with Paweł Pawliński)
- This training was conducted by me for international CERT community in Peru and Bulgaria during FIRST and ITU events.

Blog posts elsewhere

cert.pl (🇬🇧 version)

2017-01 - Technical analysis of CryptoMix/CryptFile2 ransomware
2017-01 - Evil: A poor man’s ransomware in JavaScript
2017-01 - Nymaim revisited
2017-02 - Sage 2.0 analysis
2017-05 - Mole ransomware: analysis and decryptor
2017-10 - A deeper look at Tofsee modules
2018-01 - Mtracker - our take on malware tracking
2020-12 - Set up your own malware analysis pipeline with Karton
2021-04 - Karton Gems 1: Getting Started
2021-04 - Karton Gems 2: Your first karton
2021-05 - Karton Gems 3: Malware extraction with malduck
2023-02 - A tale of Phobos - how we almost cracked a ransomware using CUDA (with nazywam)
2023-09 - Unpacking what’s packed: DotRunPeX analysis
2023-10 - Deworming the XWorm

cert.pl (🇵🇱 version)

2017-01 - Analiza techniczna rodziny CryptoMix/CryptFile2
2017-01 - Evil: prosty ransomware, napisany w języku JavaScript
2017-01 - Nymaim atakuje ponownie
2017-02 - Analiza Sage 2.0
2017-05 - Mole ransomware - analiza i dekryptor
2017-10 - Głębsze spojrzenie na moduły Tofsee
2018-01 - Mtracker - nasz sposób na śledzenie złośliwego oprogramowania

symantec-enterprise-blogs.security.com

(Important: NOT written by me. All posts are a collaboration with people from my team. These are just the ones where I contributed, usually by reverse-engineering samples.)

Others

A series of articles on 4programmers.net: Raytracing step by step 🇵🇱

2012-06 - 1. First steps (PL)
2012-06 - 2. Better camera (PL)
2012-07 - 3. Planes (PL)
2012-07 - 4. Light (PL)
2012-08 - 5. Shadow (PL)
2012-08 - 6. Phong’s model (PL)
2012-09 - 7. Mirror reflection (PL)
2012-09 - 8. Sampling and Antialiasing (PL)
2012-10 - 9. Depth of field (PL)
2012-10 - 10. Soft Shading (PL)
2012-11 - 11. Transparency (PL)

University of Warsaw, guest Lectures about RE and Cryptography 🇵🇱

2017-03 - 6. Cryptography 3: Block Ciphers (with Adam Iwaniuk)
2017-04 - 7. Cryptography 4: Randomness and Pseudo- (with Adam Iwaniuk)
2017-05 - 10. Reverse Engineering 3: Debugging and Anti- (with psrok1)

Politechnika Warszawska, guest Lectures about Cryptography 🇵🇱

2017-10 - 5. Cryptography 1: Block Ciphers (with Adam Iwaniuk)
2017-11 - 6. Cryptography 2: Square Attacks and PRNG (with Adam Iwaniuk)
2017-11 - 7. Cryptography 3: RSA (with Adam Iwaniuk)

Politechnika Warszawska, guest Lectures about Malware 🇵🇱

2019-11 - 4. Attacks making use of malicious software (with psrok1)
2019-11 - 5. Introduction to malware reverse engineering (with psrok1)

$ whoami

Wed, 01 Apr 2020 00:00:00 +0000

I like sysadm, high-level, software engineering, low-level, reverse engineering, cryptography, algorithms, math, death metal and cats. I play CTFs with p4.

Used to be a programmer, now I work as a malware researcher.

@msm0 at BlueSky. @MsmCode at Twitter. msm-code at Github. @msm at infosec.exchange.

Contact me at msm@tailcall.net. Contract me at itsec.re.

         *                  *
             __                *
          ,db'    *     *
         ,d8/       *        *    *
         888
         `db\       *     *
           `o`_                    **
      *               *   *    _      *
            *                 / )
         *    (\__/) *       ( (  *
       ,-.,-.,)    (.,-.,-.,-.) ).,-.,-.
      | @|  ={      }= | @|  / / | @|o |
     _j__j__j_)     `-------/ /__j__j__j_
     ________(               /___________
      |  | @| \              || o|O | @|
      |o |  |,'\       ,   ,'"|  |  |  |  hjw
     vV\|/vV|`-'\  ,---\   | \Vv\hjwVv\//v
                _) )    `. \ /
               (__/       ) )
                         (_/

Cracking RNGs: Linear Congruential Generators

Mon, 10 Jul 2017 00:00:00 +0000

Random numbers are often useful during programming - they can be used for rendering pretty animations, generating interesting content in computer games, load balancing, executing a randomized algorithm, etc. Unfortunately, CPUs are deterministic machines, and (the controversial RDRAND instruction aside) cannot just generate random numbers out of thin air. This left programmers and computer designers with a few options:

Invest in additional devices (Hardware Random Number Generators).
Use existing hardware in an unintended way (for example, by collecting lowest bits of audio input from a microphone, measuring hard disk seek times or timing keystrokes).
Fake it - generate numbers that “look” random, but aren’t.

All these options were explored, but dedicated devices never went mainstream, and other ways of gathering entropy have too small throughput to be used exclusively. This left programmers with the third option - numbers that look random but are in fact generated by a completely deterministic algorithm. These algorithms are called “Pseudo Random Number Generators”, or PRNGs in short.

PRNGs are usually really good at generating statistically random numbers. The quality of a generator can be measured with one of a few standardized test suites, like TestU01 or DIEHARD - and good PRNGs are often as good as true random number generators (TRNGs).

Unfortunately, there is one problem with PRNGs that cannot be fixed - they are still deterministic at heart, and knowing the full internal state of a PRNG allows an attacker to predict all future (and, usually, previous) values. This usually isn’t a problem, unless PRNGs are used for security-sensitive things - like generating certificates, encryption keys, secrets, etc ¹. In this post, I’ll show how easily PRNGs can be cracked (cracking a PRNG means recovering its internal state and predicting future values).

This time I’ll focus on one specific kind of PRNG - Linear Congruential Generators. They are defined by three integers, “multiplier”, “increment” and “modulus”, and can be implemented in a few lines of Python code:

class prng_lcg:
    m = 672257317069504227  # the "multiplier"
    c = 7382843889490547368  # the "increment"
    n = 9223372036854775783  # the "modulus"

    def __init__(self, seed):
        self.state = seed  # the "seed"

    def next(self):
        self.state = (self.state * self.m + self.c) % self.n
        return self.state


def test():
    gen = prng_lcg(123)  # seed = 123
    print(gen.next())  # generate first value
    print(gen.next())  # generate second value
    print(gen.next())  # generate third value

LCGs are one of the most popular pseudo-random number generators. There are reasons for that: they’re mathematically elegant, very easy to understand/implement and very fast, especially when a modulus is a power of two (because slow modular division can be replaced with binary AND in this case). Unfortunately, they’re not perfect at being statistically random (depending on chosen constants, resulting bits often have varying level of “randomness”) and, as we’ll see soon, are dramatically weak at being cryptographically secure.
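
The power-of-two shortcut mentioned above is easy to check. The sketch below uses illustrative constants that are not from this post (a common textbook 32-bit LCG), and the function names are made up for the example:

```python
# Illustrative constants (not from this post): a 32-bit LCG with modulus 2**32.
M = 1664525
C = 1013904223
BITS = 32
MASK = (1 << BITS) - 1


def next_mod(state):
    # Straightforward version: reduce with the modulo operator.
    return (state * M + C) % (1 << BITS)


def next_and(state):
    # Same result for non-negative integers: keep only the low 32 bits.
    return (state * M + C) & MASK


print(all(next_mod(s) == next_and(s) for s in range(10000)))
```

This only works because the modulus is a power of two. For a modulus like the one used in this post, the % operator is still needed.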

Challenge 0: everything known

After a short theoretical introduction, let’s focus on the attacks. This is not actually a challenge yet, just explanation of what we’re trying to achieve. Let’s say that we are observing one LCG, and it generated three consecutive values:

s0 = 2300417199649672133
s1 = 2071270403368304644
s2 = 5907618127072939765

And we want to learn the next value that will be generated, without actually calling PRNG again. In this case, we have all necessary information (state, m, c, n) so the problem is trivial - we just plug the values into the formula. Let’s check:

In [929]: m = 672257317069504227   # the "multiplier"
     ...: c = 7382843889490547368  # the "increment"
     ...: n = 9223372036854775783  # the "modulus"
     ...: s0 = 2300417199649672133 # seed

In [930]: s1 = (s0*m + c) % n

In [931]: s2 = (s1*m + c) % n

In [932]: s3 = (s2*m + c) % n

In [933]: s4 = (s3*m + c) % n

In [934]: s1
Out[934]: 2071270403368304644L # correct

In [935]: s2
Out[935]: 5907618127072939765L # correct

In [936]: s3
Out[936]: 5457707446309988294L # predicted!

We know a full internal state of our LCG, and we can easily generate all future values. In fact, we can even go back and get all previously generated values, which is a security problem too.
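
Going backwards is just the same formula solved for the previous state. A minimal sketch, assuming Python 3.8+ (for the three-argument pow with a negative exponent) and a multiplier that is invertible modulo n. The names lcg_next and lcg_prev are made up for this example:

```python
M = 672257317069504227   # the "multiplier"
C = 7382843889490547368  # the "increment"
N = 9223372036854775783  # the "modulus"


def lcg_next(state):
    return (state * M + C) % N


def lcg_prev(state):
    # s_next = s*M + C (mod N)  =>  s = (s_next - C) * M^-1 (mod N)
    m_inv = pow(M, -1, N)  # modular inverse, Python 3.8+
    return (state - C) * m_inv % N


seed = 2300417199649672133
print(lcg_prev(lcg_next(seed)) == seed)
```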

Challenge 1: unknown increment

Ok, let’s move to the first challenge. What if we don’t know “increment”? i.e:

m = 81853448938945944
c = # unknown
n = 9223372036854775783

And let’s say that we know two consecutive values generated by this LCG:

s0 = 4501678582054734753
s1 = 4371244338968431602

Can we still attack this? As before, all values are 64-bit, which is too much to brute-force. Let’s use some basic math instead.

s1 = s0*m + c   (mod n)

c  = s1 - s0*m  (mod n)

Easy. Now we can implement our attack in Python and plug concrete values in:

def crack_unknown_increment(states, modulus, multiplier):
    increment = (states[1] - states[0]*multiplier) % modulus
    return modulus, multiplier, increment

print(crack_unknown_increment([4501678582054734753, 4371244338968431602], 9223372036854775783, 81853448938945944))

That’s it - challenge solved.

Challenge 2: unknown increment and multiplier

Previous two levels were rather trivial, time for something more interesting. Now we know neither multiplier nor increment:

m = # unknown
c = # unknown
n = 9223372036854775783

At least we get to know three consecutive values from the LCG:

s0 = 6473702802409947663
s1 = 6562621845583276653
s2 = 4483807506768649573

This looks much harder, but really isn’t - we still have two linear equations, and two unknowns, so everything should go smoothly:

s1 = s0*m + c  (mod n)
s2 = s1*m + c  (mod n)

s2 - s1 = s1*m - s0*m  (mod n)
s2 - s1 = m*(s1 - s0)  (mod n)
m = (s2 - s1)/(s1 - s0)  (mod n)

And when we know the multiplier, the problem is reduced to the one we already solved in challenge 1. Let’s implement this in Python:

def crack_unknown_multiplier(states, modulus):
    multiplier = (states[2] - states[1]) * modinv(states[1] - states[0], modulus) % modulus
    return crack_unknown_increment(states, modulus, multiplier)

print(crack_unknown_multiplier([6473702802409947663, 6562621845583276653, 4483807506768649573], 9223372036854775783))

This algorithm uses modular division, so we’ll need a modular inverse too. We can use this one.

def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, x, y = egcd(b % a, a)
        return (g, y - (b // a) * x, x)

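def modinv(b, n):
    # Reduce b first, so that negative differences are handled correctly.
    g, x, _ = egcd(b % n, n)
    if g != 1:
        raise ValueError("b is not invertible modulo n")
    return x % n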

Challenge 3: unknown increment, multiplier and modulus

Now a list of values that we know doesn’t look very interesting:

m = # unknown
c = # unknown
n = # unknown

But we have a lot of generated integers:

s0 = 2818206783446335158
s1 = 3026581076925130250
s2 = 136214319011561377
s3 = 359019108775045580
s4 = 2386075359657550866
s5 = 1705259547463444505
s6 = 2102452637059633432

Unfortunately, this time we can’t solve this with simple linear equations - we don’t know the modulus, so every equation we form will introduce a new unknown:

s1 = s0*m + c  (mod n)
s2 = s1*m + c  (mod n)
s3 = s2*m + c  (mod n)

This doesn’t look bad - three equations and three unknowns. At least until we remember that, by definition, this is really equivalent to:

s1 - (s0*m + c) = k_1 * n
s2 - (s1*m + c) = k_2 * n
s3 - (s2*m + c) = k_3 * n

Six unknowns and three equations. And it’s clear that no number of equations will help us, because every new equation introduces a new unknown. Fortunately, there is a mathematical trick that usually comes in handy in situations like this. Namely, an interesting number theory fact: if we have a few random multiples of n, then with high probability their gcd will be equal to n. For example:

In [944]: n = 123456789

In [945]: reduce(gcd, [randint(1, 1000000)*n, randint(1, 1000000)*n, randint(1, 1000000)*n])
Out[945]: 123456789

Why is this useful? Because if we can think of some modular operations that will give something congruent to zero, for example:

X = 0 (mod n)

Then, by definition, this is equivalent to:

X = k*n

This is only interesting if X != 0, but X = 0 (mod n). We just need to take a gcd from few such values, and we can recover n. This is a really generic method, that can often be used when modulus that we’re using is unknown.

Ok, now how can we generate something like this for the above LCG? We can introduce a sequence of differences T(i) = S(i+1) - S(i) (the index is called i here, because n is already taken by the modulus):

t0 = s1 - s0
t1 = s2 - s1 = (s1*m + c) - (s0*m + c) = m*(s1 - s0) = m*t0 (mod n)
t2 = s3 - s2 = (s2*m + c) - (s1*m + c) = m*(s2 - s1) = m*t1 (mod n)
t3 = s4 - s3 = (s3*m + c) - (s2*m + c) = m*(s3 - s2) = m*t2 (mod n)

And now we can use this trick to generate our desired operation:

t2*t0 - t1*t1 = (m*m*t0 * t0) - (m*t0 * m*t0) = 0 (mod n)
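
To see this identity at work, here is a small self-contained check on a toy LCG. The parameters are arbitrary values picked for illustration (they are not the ones from the challenge), and lcg_states is a helper invented for this sketch:

```python
from functools import reduce
from math import gcd

# Toy parameters, chosen only for illustration.
M, C, N = 48271, 12345, 2147483647


def lcg_states(seed, count):
    """Generate `count` consecutive outputs of the toy LCG."""
    states, state = [], seed
    for _ in range(count):
        state = (state * M + C) % N
        states.append(state)
    return states


states = lcg_states(42, 8)
# Differences are computed over plain integers, without reducing modulo N.
diffs = [b - a for a, b in zip(states, states[1:])]
zeroes = [t2 * t0 - t1 * t1 for t0, t1, t2 in zip(diffs, diffs[1:], diffs[2:])]

# Every value is a multiple of N, so their gcd is N or a multiple of N.
candidate = reduce(gcd, zeroes)
print(all(z % N == 0 for z in zeroes), candidate % N == 0)
```

The code never uses N to build the zeroes list; N appears only in the final check.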

Using this method we can generate a few values congruent to 0, and crack the LCG with the gcd “trick” mentioned earlier. Attack in Python, again:

from functools import reduce
from math import gcd

def crack_unknown_modulus(states):
    diffs = [s1 - s0 for s0, s1 in zip(states, states[1:])]
    zeroes = [t2*t0 - t1*t1 for t0, t1, t2 in zip(diffs, diffs[1:], diffs[2:])]
    modulus = abs(reduce(gcd, zeroes))
    return crack_unknown_multiplier(states, modulus)

print(crack_unknown_modulus([2818206783446335158, 3026581076925130250,
    136214319011561377, 359019108775045580, 2386075359657550866, 1705259547463444505]))

Looks like it works!

The end?

So we have just cracked LCG, without bruteforcing, with basically zero knowledge - only by observing its output. So can we crack every LCG in the world now? Unfortunately, no. The reason we were able to mount the attacks so easily is that all the operations are linear, and we get a complete state of LCG every time. This is basically a cryptologist’s dream.

Unfortunately, in the real world, things aren’t always that easy. Most importantly, all these attacks can be disrupted by truncating the output - for example, using 64-bit integers for computation, but returning state % 2**32. This method is often used in practice (not for improved security, but because it improves the number distribution). Alternatively, introducing nonlinear operations somewhere will complicate attacks greatly (for example xoring the state with something).

Of course, this doesn’t mean that simple truncation will ruin our chances of cracking an LCG - we’ll just have to use more complicated math, like the all-powerful LLL algorithm. I’ll come back to these attacks sooner or later - but this blog post is getting long anyway, and I’ve heard that nobody reads more than a few pages. For now, the main takeaway from this article is:

If you need random numbers for anything crypto related, use secure random number generator, not just any PRNG.
LCG is not a secure RNG.
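
In Python, for example, the standard library already ships a suitable generator in the secrets module. A minimal sketch (the sizes below are arbitrary examples, not recommendations from this post):

```python
import secrets

key = secrets.token_bytes(32)      # 32 random bytes, e.g. for a symmetric key
token = secrets.token_hex(16)      # 32 hex characters, e.g. for a session token
number = secrets.randbelow(2**64)  # integer in the range [0, 2**64)

print(len(key), len(token), number < 2**64)
```

Unlike the random module, secrets draws from the operating system's cryptographically secure source.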

¹ Although CSPRNGs (Cryptographically Secure Pseudo Random Number Generators) can be used in such cases.

Mon, 01 Jan 0001 00:00:00 +0000

Reverse-Engineering checklist

Not all those who wander are lost, but how did you end up here?

This page is a work in progress

The idea of this page is to provide a decision tree of steps to follow when reverse-engineering a new malware sample. I won’t go into details about each step, but I’ll try to provide some useful links, references, and the high-level approaches that are possible.

I plan to update this page with steps I do when analysing a sample, once in a while. Anyway, let’s start.

First, check the kind of file you’re dealing with. This can be done manually, or automated with yara. For your convenience, you can use exe_kind.yar:

yara exe_kind.yar .

Then follow a hyperlink depending on what you got:

Pyinstaller

In general, Python reverse-engineering consists of two steps: dumping the bytecode, and analysing it.

In case of Pyinstaller, the situation is usually simple because off-the-shelf tools work. I recommend pyinstxtractor-ng; another, older option is pyinstxtractor.

python3 ~/opt/pyinstxtractor.py malware.exe

If you managed to get the bytecode, continue from →Python Bytecode.

Python Bytecode

Approaches to python bytecode analysis you should try are, in order:

→pycdc - current state of the art of python decompilation
→pyc disassembly - disassemble the bytecode and analyse manually

pycdc

Currently pycdc gives you the biggest chance of success. If you’re lucky it’s packaged in your distro, otherwise you need to clone the repository and compile it yourself.

nix-shell -p pycdc
pycdc _extracted_malware/malware.pyc > malware.py

If it didn’t work, you’ll have to read →pyc disassembly.

pyc disassembly

Worst case, you can always read the disassembled bytecode directly. Your options are:

→dis - built-in Python bytecode disassembler
→xdis - pure Python library for disassembling bytecode

dis

Surprisingly, even though Python includes a dis module, I’m not aware of any built-in way to disassemble a compiled .pyc file directly. So you will need this tiny script (saved here as view_pyc_file.py):

import dis, sys, marshal

path = sys.argv[1]
with open(path, "rb") as f:
    f.seek(16)  # skip the .pyc header (16 bytes since Python 3.7)
    dis.dis(marshal.load(f))

Run it like this:

nix-shell -p python313  # pick a correct version
python3 view_pyc_file.py malware.pyc > malware.pybc

The caveat is that this depends on the Python version, so you need to use the same Python version for disassembly as the Python it was compiled with.
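
One quick way to check for a version mismatch is to compare the first four bytes of the file (the magic number) with the magic number of the running interpreter. This is only a sketch: pyc_matches_interpreter is a hypothetical helper name, and a match only means that the bytecode format is the same, not that the exact patch release is.

```python
import importlib.util


def pyc_matches_interpreter(path):
    """Return True if the .pyc at `path` uses the running Python's bytecode magic."""
    with open(path, "rb") as f:
        magic = f.read(4)
    # MAGIC_NUMBER is the 4-byte magic of the interpreter running this script.
    return magic == importlib.util.MAGIC_NUMBER
```

If this returns False, the disassembly script above will most likely fail or print garbage, and it’s time to switch to a different Python version.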

If you’re lucky and it worked, go →read the bytecode. Otherwise, try →xdis.

xdis

That’s why you may also consider xdis, which is a pure Python library for disassembling Python bytecode independent of the original interpreter version.

TODO example

This should work. If it did, go →read the bytecode. Otherwise, it’s possible you’re dealing with an obfuscated bytecode, or even worse - a custom CPython build. I don’t have anything about them yet, so that’s EOF for now.

Read the bytecode

Any text editor will do. I just want to mention vscode-python-bytecode-highlight, my vscode extension for python bytecode syntax highlighting. It colours the bytecode nicely and makes some links clickable - but that’s it. You can use any other editor.

Delphi

First, load to IDR (https://github.com/crypto2011/IDR) in a Windows VM. This may take a long time.

Then export .IDC script.

Then use https://github.com/huettenhain/dhrake and DhrakeInit

Dotnet

This means that the executable is a .NET assembly. Now, depending on how obfuscated the sample is:

Use →ilspycmd for quick analysis and triage.
Use →dnSpy for serious reverse-engineering.
Use →dnLib for automating the deobfuscation process.
Use →dotnet sdk to quickly unpack malware by reusing the unpacker’s code.
Use →Visual Studio if dotnet-sdk doesn’t work.
Check out →dbglib for obfuscated samples that elude dnSpy debugger.

ilspycmd

For lightweight analysis, I recommend ilspycmd.

sudo docker run -v .:/docker --rm -ti berdav/ilspycmd -c "cd /docker; /home/ilspy/.dotnet/tools/ilspycmd -p malware.exe -o out_dir"

This will decompile malware.exe to out_dir. After that you can easily open the decompiled code in your favorite editor, or use standard Linux commands (like grep) to analyse it.

The downside is that this code is fully static, it’s not possible to debug it, and there is no easy way to deobfuscate anything. In some cases ilspycmd will flat out refuse to decompile some of the code. In this case, you may have to turn to →dnSpy.

dnSpy

For heavyweight analysis, I recommend https://github.com/dnSpyEx/dnSpy. It’s a GUI, but it’s very powerful and contains a built-in debugger. Unfortunately, it’s Windows-only, and I prefer to stay in Linux as much as possible for reverse-engineering.

dnLib

DnSpy is based on dnLib, which is a .NET library for when you really need to get your hands dirty and start scripting the analysis - see for example this XWorm analysis (the idea is clear, unfortunately full framework was not open-sourced).

Since I mostly script in Python, I use pythonnet, a crazy library that brings .NET to Python. This allows me to use dnlib like this:

from dnlib.DotNet.Emit import OpCodes

def get_string_default_values(typeobj):
    """Get all variables initialised to a string as a dict.
    Ignore other initialisation code.
    typeobj is TypeDefMD from dnlib.DotNet."""
    static_ctor = typeobj.FindStaticConstructor()
    if not static_ctor:
        return {}

    result = {}
    code = static_ctor.Body.Instructions
    for i in range(len(code) - 1):
        if code[i].OpCode == OpCodes.Ldstr:
            if code[i+1].OpCode == OpCodes.Stsfld:
                fieldname = code[i+1].Operand.Name.String
                fieldvalue = code[i].Operand
                result[fieldname] = fieldvalue
    return result

This snippet extracts all string initialisations from a static constructor. It’s an extremely useful piece of code when writing automatic extractors for .NET stealers (of which there are plenty).

dotnet sdk

A powerful unpacking method is code reuse. For example, let’s say the decompiled malware contains this line:

GCP gCP = (GCP)Marshal.GetDelegateForFunctionPointer(GetProcAddress(hModule, Reverse(Decipher("zzljvyWaulyybJalN", 7))), typeof(GCP));

Instead of reverse-engineering Decipher, you can just reuse the code in your own small tool:

nix-shell -p dotnet-sdk
dotnet new console
vim Program.cs

And put this code in Program.cs:

internal class Program
{
	// Reverse and Decipher methods, copied from decompiled source code.

	private static void Main(string[] args)
	{
		Console.WriteLine(Reverse(Decipher("jvssHzsM", 7)));
	}
}

And just run it:

dotnet run

It goes without saying that you should be careful when running malware code - I recommend doing this in a Docker container or a Virtual Machine.

Visual Studio

dbglib

Honourable mention goes to dbglib, which is my library for native low-level .NET debugging. See this DotRunPeX analysis for a tutorial how to use it.

golang

Well, you’re in for an adventure. Golang is not the easiest language to reverse-engineer. By default the decompiled binary looks like trash, but with help of a few tools it’ll get slightly better.

My main decompiler/disassembler is Ghidra. Most of the hints should transfer to other tools, but this guide is opinionated and I’m not going to cover them.

First, load the executable into Ghidra. Recent Ghidra versions have basic Golang support, so remember to select the x86:LE:32:default:golang language instead of the default choice.

The support is not great right now, though. To get the symbols, we will use GoReSym:

nix-shell -p goresym
GoReSym -t -d -p malware.exe > goresym.json

Then, load the symbols to Ghidra using a helper script (goresym.py) (TODO: modified cerberus, upload and link).

For malware, most likely the binary is obfuscated and you will still see junk function names:

But that’s still a step up compared to what you get by default.

After that, you can try ghostrings to recover strings from the binary:

Install the extension, and run several of the included scripts:

GoDynamicStrings.java
GoStaticStrings.java
GoKnownStrings.java
GoFuncCallStrings.java

Some of the scripts will take a while. Go make a coffee (you will need it).

After that, you will have transformed this complete mess:

Into something almost readable:

That’s all I have for now - EOF.

office_excel_xml

Office files include:

files with .docx, .xlsx, .pptx extensions
files with .docm, .xlsm, .pptm extensions (with macros)

A good way to start analysis is to use oletools. Use oleid to check if there are any macros:

nix-shell -p python310Packages.oletools
oleid sample.xlsx
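
Since these formats are ZIP archives, a rough first check is also possible with nothing but the Python standard library. This is a sketch and not a replacement for oletools: has_vba_project is a hypothetical helper, and it assumes that macros are stored in the usual vbaProject.bin part.

```python
import zipfile


def has_vba_project(path):
    """Return True if the OOXML file contains a vbaProject.bin part."""
    with zipfile.ZipFile(path) as archive:
        # The part usually lives at word/, xl/ or ppt/ depending on the file type.
        return any(name.lower().endswith("vbaproject.bin") for name in archive.namelist())
```

A negative result does not prove the file is clean, so oleid is still worth running.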

Other executable type

TODO

Other notes

Check compiler (nauz)

git clone https://github.com/horsicq/Nauz-File-Detector.git
cd Nauz-File-Detector
sudo docker build . -t nauz
sudo docker run -v /home/msm/data:/home/msm/data nauz nfdc /home/msm/data/2024-12-18_ov8865sys/9c52d750eba2f72bdd38bcaf950da7f1128d5235223091d98dad2cc7146716fa

Gx64Sync

ImHex

Dumps

malduck fixpe

Themida

Entrypoint starts with a call:

e8 82 01 00 00    CALL       FUN_141fdd237
41 52             PUSH       R10
49 89 e2          MOV        R10,RSP
41 52             PUSH       R10
49 8b 72 10       MOV        RSI,qword ptr [R10 + local_res8]
49 8b 7a 20       MOV        RDI,qword ptr [R10 + local_res18]
fc                CLD
b2 80             MOV        DL,0x80

And later there is a tree of “if” statements:

  if (bVar12) {
    bVar11 = CARRY1(bVar7,bVar7);
    bVar7 = bVar7 * '\x02';
    bVar12 = bVar11;
    if (bVar7 == 0) {
      bVar7 = *local_res8;
      local_res8 = local_res8 + 1;
      bVar12 = CARRY1(bVar7,bVar7) || CARRY1(bVar7 * '\x02',bVar11);
      bVar7 = bVar7 * '\x02' + bVar11;
    }
    if (bVar12) {
      bVar11 = CARRY1(bVar7,bVar7);
      bVar7 = bVar7 * '\x02';
      bVar12 = bVar11;
      if (bVar7 == 0) {
        bVar7 = *local_res8;
        local_res8 = local_res8 + 1;
        bVar12 = CARRY1(bVar7,bVar7) || CARRY1(bVar7 * '\x02',bVar11);
        bVar7 = bVar7 * '\x02' + bVar11;
      }

The called function looks like this in Ghidra:

void FUN_141fdd237(void) {
  puVar2 = (undefined8 *)&stack0xffffffffffffffd8;
  pcVar1 = unaff_retaddr + -0x1de20b5;
  pcStack_30 = unaff_retaddr;
  if (*(int *)(unaff_retaddr + -0xf9f66e) == 0) {
    uStack_38 = 0;
    uStack_48 = 0;
    pcStack_40 = pcVar1;
    pcStack_30 = pcVar1;
    (*unaff_retaddr)();
    puVar2 = &uStack_48;
    pcVar1 = unaff_retaddr + 0x1cf;
  }
  (*(pcVar1 + 0xe42a47))(*(undefined8 *)((longlong)puVar2 + 0x20),*(undefined8 *)((longlong)puVar2 + 0x18));
  return;
}

This is the first layer of the packer. To unpack dynamically, add a breakpoint at the ret 0x20 instruction in the entrypoint and then run the binary.

The second stage is much more interesting, but I don’t have a writeup for it yet.

Powershell

PSDecode