When reverse engineering a binary application, at its lowest practical layer, the reverse engineer is looking at CPU-specific assembly language. In order to fully understand the application, the reverse engineer would need to understand those lower layers, instruction by instruction. This process is time consuming. Therefore, any shortcuts the reverse engineer can take to obtain valuable insight help to reduce the time and effort in discovering vulnerabilities: locating strings is one such shortcut.

Strings are typically human-readable character sequences that contain one or more words in a spoken language. More specifically, these string literals are statements directly coded into the application by a developer. Once they end up in the binary, they give the reverse engineer a peek at what the application is and what the application potentially does. Some examples of a string the reverse engineer may look for:

  • “error: invalid argument – expected an integer”
  • “entered function CryptoImpl::encryptWithDefaultKey()”
  • “password1234”

Go stores string literals encoded as UTF-8, but a Go string is technically an arbitrary sequence of bytes. In our scenario, we didn’t need to worry about characters outside of the 7-bit ASCII range.

But the strings command only gave me one long string

strings is the default utility for quickly pulling strings out of a binary. It scans for runs of likely human-readable characters of a minimum length (four by default); in practice, these tend to be the null-terminated strings that C-style programs embed.
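To make that behavior concrete, here is a minimal Python sketch of a strings-like scan. It is not the real utility: the function name ascii_strings, the sample bytes, and the choice to accept tab characters are assumptions made for illustration.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # 0x20-0x7e is the printable ASCII range; tab is accepted too (an assumption of this sketch).
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# C-style data: strings are separated by NUL (or other non-printable) bytes.
c_like = b"\x00\x01hello\x00ab\x00world, again\xff"
c_found = list(ascii_strings(c_like))
print(c_found)  # [(2, 'hello'), (11, 'world, again')]

# Go-style data: strings sit back to back, so the scan sees a single run.
go_like = b"hello worldbad pointerok"
go_found = list(ascii_strings(go_like))
print(go_found)  # [(0, 'hello worldbad pointerok')]
```

The second result is the problem in miniature: with no separators between strings, a scan like this cannot tell where one string ends and the next begins.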

When we put a Go binary through strings, the output consisted of one or more extremely long strings! This was very confusing, because we could not tell where one string ended and the next began.

$ cat <<SOURCE > helloworld.go
package main

func main() {
    print("hello world, how are you today?")
}
SOURCE

$ go build helloworld.go

$ strings helloworld|grep hello
SIGSEGV: segmentation violationbad write barrier buffer boundscall from within the Go runtimecasgstatus: bad incoming valuescheckmark found unmarked objectentersyscallblock inconsistent hello world, how are you today?inserting span already in treapinternal error - misuse of itabnon in-use span in unswept listpacer: sweep done at heap size resetspinning: not a spinning mruntime: cannot allocate memoryruntime: split stack overflow:  (types from different packages)SIGFPE: floating-point exceptionSIGTTOU: background write to tty" not supported for cpu option "end outside usable address spacenon-Go code disabled sigaltstackpanic while printing panic valueruntime: mcall function 
...
...
... 

Reason: Go does not store null-terminated strings in the compiled binary. A string in Go consists of a sequence of bytes and a separately-maintained length value. It appeared that all strings were concatenated together as one long string, and it was now the job of the executable code to know the length of a string.
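The difference can be illustrated with a small Python sketch. The blob contents and the (offset, length) pairs below are made up; they stand in for the bytes in the binary and for the pointer-plus-length values that the executable code carries around.

```python
# Hypothetical blob: three Go-style strings stored back to back, no terminators.
blob = b"hello worldbad pointerok"

# Each string is described by an (offset, length) pair kept outside the blob,
# standing in for Go's (data pointer, length) string value.
headers = [(0, 11), (11, 11), (22, 2)]


def read_string(data: bytes, offset: int, length: int) -> str:
    """Slice one string out of the blob using its separately stored length."""
    return data[offset:offset + length].decode("utf-8")


recovered = [read_string(blob, off, n) for off, n in headers]
print(recovered)  # ['hello world', 'bad pointer', 'ok']

# A C-style view that splits on NUL bytes finds no boundaries at all.
c_view = blob.split(b"\x00")
print(len(c_view))  # 1
```

Without the headers, the only things left in the binary are the blob itself and the code that knows each offset and length.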

To mimic strings for a Go binary, we therefore needed to identify:

  1. The start (address) of a string, and
  2. The length of the string

For these tasks, we turned to Radare2, an (LGPL) open-source reverse-engineering framework available for Linux, macOS, and Windows. r2, as it is commonly called, supports analyzing a multitude of CPU architectures and file formats.

Intro to Radare2

r2 is a command-line utility with an extensive command set that gives a reverse engineer powerful capabilities. Once you invoke r2 with the binary you wish to analyze, its commands let you disassemble code, print data and other expressions, and much more.

$ r2 helloworld
 -- r2 talks to you. tries to make you feel well.
[0x0104a030]> ?
Usage: [.][times][cmd][~grep][@[@iter]addr!size][|>pipe] ; ...
Append '?' to any char command to get detailed help
Prefix with number to repeat command N times (f.ex: 3x)
| %var=value              alias for 'env' command
| *[?] off[=[0x]value]    pointer read/write data/values (see ?v, wx, wv)
| (macro arg0 arg1)       manage scripting macros
| .[?] [-|(m)|f|!sh|cmd]  Define macro or load r2, cparse or rlang file
| =[?] [cmd]              send/listen for remote commands (rap://, http://, <fd>)
| <[...]                  push escaped string into the RCons.readChar buffer
| /[?]                    search for bytes, regexps, patterns, ..
| ![?] [cmd]              run given command as in system(3)
| #[?] !lang [..]         Hashbang to run an rlang script
| a[?]                    analysis commands
| b[?]                    display or change the block size
| c[?] [arg]              compare block with given data
| C[?]                    code metadata (comments, format, hints, ..)
| d[?]                    debugger commands
| e[?] [a[=b]]            list/get/set config evaluable vars
| f[?] [name][sz][at]     add flag at current address
| g[?] [arg]              generate shellcodes with r_egg
| i[?] [file]             get info about opened file from r_bin
| k[?] [sdb-query]        run sdb-query. see k? for help, 'k *', 'k **' ...
| L[?] [-] [plugin]       list, unload load r2 plugins
| m[?]                    mountpoints commands
| o[?] [file] ([offset])  open file at optional address
| p[?] [len]              print current block with format and length
| P[?]                    project management utilities
| q[?] [ret]              quit program with a return value
| r[?] [len]              resize file
| s[?] [addr]             seek to address (also for '0x', '0x1' == 's 0x1')
| S[?]                    io section manipulation information
| t[?]                    types, noreturn, signatures, C parser and more
| T[?] [-] [num|msg]      Text log utility
| u[?]                    uname/undo seek/write
| V                       visual mode (V! = panels, VV = fcngraph, VVV = callgraph)
| w[?] [str]              multiple write operations
| x[?] [len]              alias for 'px' (print hexadecimal)
| y[?] [len] [[[@]addr    Yank/paste bytes from/to memory
| z[?]                    zignatures management
| ?[??][expr]             Help or evaluate math expression
| ?$?                     show available '$' variables and aliases
| ?@?                     misc help for '@' (seek), '~' (grep) (see ~??)
| ?>?                     output redirection
[0x0104a030]> 

For instructions on how to use a command, simply add a ? after the command, or type ? to get an overview of all commands as shown above.

r2 provides language bindings so that we can drive it from our programming language of choice, using those very same commands. We chose Python to script the workflow to mimic strings for a Go binary. We simply import r2pipe to obtain a pipe into r2, then use the cmd() method to execute an r2 command and get its text output. cmdj() executes an r2 command that emits JSON and returns the result already parsed into Python dictionaries and lists.

First, we’ll get some high-level metadata of the binary in question:

import r2pipe

class GoStringsR2(object):

    def load(self, _file):
        self.r2 = r2pipe.open(_file)
        self.data = {}
        self.data["symbols"] = self.r2.cmdj("isj")
        self.data["sections"] = self.r2.cmdj("iSj")
        self.data["info"] = self.r2.cmdj("ij")

        self.arch = self.data["info"]["bin"]["arch"]
        self.bintype = self.data["info"]["bin"]["bintype"]
        self.bits = self.data["info"]["bin"]["bits"]
        self.binos = self.data["info"]["bin"]["os"]

Radare2 helps slice up that long string

Our first real task is locating that extremely long string in the binary, which we’ll refer to as the “strings blob”. After that, we want Radare2 to find all references to the strings blob from executable code. Radare2 has the ability to look at all instructions in the executable code and determine if it references other data areas. If the data area referenced is within the strings blob, we’ll mark that as a string hit. Later on we’ll have to figure out the length of the string.

So where is that long string in a Go binary?

As mentioned, the strings blob is an extremely long “string” within the binary, but is not necessarily null-terminated. In theory, we could search the read-only data sections of the binary and look for the longest null-terminated string as a guess as to where the strings blob is. This could work, but a Go string may also contain null bytes (0x00), which would prematurely cut short the strings blob.

However, if the Go binary has not been stripped of its symbols, we noted that two symbols helped us to locate the strings blob and its extent. The go.string.* symbol marked the beginning of the strings blob, whereas go.func.* marked its end. We could ask Radare2 for all symbols and look for these two to locate the strings blob. If symbols had been stripped, we instead looked for two NULL bytes to locate the strings blob. This may not be foolproof, but it worked for the binaries we looked at: your mileage may vary.

Retrieving the strings blob from an unstripped binary:

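    def get_string_table(self):
        # Locate the strings blob inside the read-only data section.
        rodata = self.get_rodata_section()
        stab_sym = self.get_string_table_symbols(rodata)
        if stab_sym is None:
            # The marker symbols were not found (for example, a stripped binary).
            return None

        strtab_start = stab_sym["vaddr"]
        strtab_end = strtab_start + stab_sym["tabsize"]
        strtab = {
            "startaddr": strtab_start,
            "endaddr": strtab_end,
            "data": stab_sym["table"],
        }
        return strtab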

    def get_string_table_symbols(self, rdata):
        g_str = self.find_symbol("go.string.*")
        g_func = self.find_symbol("go.func.*")
        if g_str is not None and g_func is not None:
            g_str["tabsize"] = g_func["vaddr"] - g_str["vaddr"]
            startaddr = g_str["vaddr"] - rdata["vaddr"]
            endaddr = startaddr + g_str["tabsize"]
            g_str["table"] = rdata["data"][startaddr:endaddr]
            return g_str
        return None
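The find_symbol helper is not shown in this excerpt. As a rough, standalone sketch of the lookup and the address arithmetic, assuming each symbol is a dictionary with "name" and "vaddr" keys (the function slice_blob and all sample values here are hypothetical):

```python
def find_symbol(symbols, name):
    """Return the first symbol dict whose "name" matches exactly, else None."""
    for sym in symbols:
        if sym.get("name") == name:
            return sym
    return None


def slice_blob(symbols, rodata_vaddr, rodata_bytes):
    """Return (blob_start_vaddr, blob_bytes), or None if a marker symbol is missing."""
    start = find_symbol(symbols, "go.string.*")
    end = find_symbol(symbols, "go.func.*")
    if start is None or end is None:
        return None  # e.g. a stripped binary; another heuristic would be needed
    # Convert the virtual address into an offset within the section's bytes.
    begin = start["vaddr"] - rodata_vaddr
    size = end["vaddr"] - start["vaddr"]
    return start["vaddr"], rodata_bytes[begin:begin + size]


# Made-up values for illustration only.
symbols = [
    {"name": "main.main", "vaddr": 0x1000},
    {"name": "go.string.*", "vaddr": 0x2008},
    {"name": "go.func.*", "vaddr": 0x2014},
]
rodata_vaddr = 0x2000
rodata_bytes = b"\x00" * 8 + b"hello world!" + b"\xaa" * 4

result = slice_blob(symbols, rodata_vaddr, rodata_bytes)
print(result)  # (8200, b'hello world!')
```

The key step is subtracting the section's base address, since the symbols carry virtual addresses while the section data is indexed from zero.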

Let’s chop it up

To start off, we use the /ra command of Radare2 to analyze all instructions and determine where one part of the code is accessing some other part of the code, or even data. For our Go strings extraction, we specifically care about the locations where executable code is accessing the strings blob. Once Radare2 has found all references, we can use the ax Radare2 command to review the references and search for those which refer to the strings blob.

1. Use: /ra to locate references

  • We found that /ra works well for 32-bit and 64-bit x86 architectures
  • We found that aae works well for 32-bit and 64-bit ARM architectures

2. Use: axq to show the references in simple source -> target format.

  • axj may also be used to show references in JSON format

    def get_cross_refs(self):
        xrefs = None
        if self.arch == "x86":
            xrefs = self.get_cross_refs_x86()
        elif self.arch == "arm":
            xrefs = self.get_cross_refs_arm()
        return xrefs

    def get_cross_refs_x86(self):
        self.r2.cmd("/ra")
        return self.r2.cmd("axq")

    def get_cross_refs_arm(self):
        self.r2.cmd("aae")
        return self.r2.cmd("axq")

Now, filter those references to ensure they belong to the strings blob by creating a dictionary where key equals the address of the start of the string, and value is a simple counter of how many occurrences were found:

    def process_xrefs(self, xrefs, strtab_start, strtab_end):
        str_refs = {}

        code_section = self.get_code_section()

        # Sample xrefs string: "0x01640839 -> 0x016408a9  CALL"
        for line in xrefs.split(b"\n"):
            lparts = line.split(b" ")
            # 0 = src, 1= arrow, 2 = dst, 3=empty, 4=type
            if len(lparts) == 5:
                r_src = int(lparts[0].decode("ascii"), 16)
                r_dst = int(lparts[2].decode("ascii"), 16)
                if self.is_a_string_ref(
                    r_src, r_dst, strtab_start, strtab_end, code_section
                ):
                    # Count how many times this string address is referenced.
                    str_refs[r_dst] = str_refs.get(r_dst, 0) + 1

        return str_refs
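The is_a_string_ref check is not shown in this excerpt. One plausible reading is that the source must lie in the code section and the target must lie inside the strings blob. The sketch below follows that assumption and is standalone: it takes the reference listing as ordinary text and the section bounds as plain integers, and every address in the sample is made up.

```python
def is_a_string_ref(src, dst, strtab_start, strtab_end, code_start, code_end):
    """True when code (src) refers to an address inside the strings blob (dst)."""
    in_code = code_start <= src < code_end
    in_blob = strtab_start <= dst < strtab_end
    return in_code and in_blob


def count_string_refs(xrefs_text, strtab_start, strtab_end, code_start, code_end):
    """Map each referenced blob address to the number of references found."""
    counts = {}
    for line in xrefs_text.splitlines():
        parts = line.split()
        # Expected shape: "<src> -> <dst> <type>"; anything else is skipped.
        if len(parts) < 3 or parts[1] != "->":
            continue
        src, dst = int(parts[0], 16), int(parts[2], 16)
        if is_a_string_ref(src, dst, strtab_start, strtab_end, code_start, code_end):
            counts[dst] = counts.get(dst, 0) + 1
    return counts


sample = "\n".join([
    "0x00401010 -> 0x004a2000  DATA",
    "0x00401020 -> 0x004a200b  DATA",
    "0x00401030 -> 0x004a2000  DATA",
    "0x00401040 -> 0x00401000  CALL",  # target is code, not the blob
    "0x004b0000 -> 0x004a2005  DATA",  # source is outside the code section
])
counts = count_string_refs(sample, 0x4A2000, 0x4A3000, 0x401000, 0x402000)
print({hex(k): v for k, v in counts.items()})  # {'0x4a2000': 2, '0x4a200b': 1}
```

Only the keys matter for the next step; the counts are a by-product that can help judge how widely a string is used.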

We found string references, what about the length though?

The second task after learning the start addresses of strings within the strings blob is to figure out the length of each string. When a reference to a string was located by Radare2 within executable code, it typically involved loading the virtual address of that string into a CPU register or onto the stack. Near that same code were instructions that loaded the length of that string into another CPU register. We decided not to try and detect string lengths by analyzing more code, and instead focused on a simpler approach.

If we assume that the following statements are true:

  1. All strings in the strings blob are referenced in code, and
  2. We can find all of those references

…then we don’t really care about each string’s length. We could simply walk backwards from the highest-address string reference in the strings blob and print it until we hit the end of the strings blob. Then, take the previous reference and print until we hit the start of the next string that we just printed, and so on. Of course, neither of those assumptions held true 100% for every Go binary we fed to Radare2, but, for the most part, it met our needs.

Sample strings blob:

   /--- [START]
  |       /--- ref2
  |      |       /--- ref3 
  |      |      |       /--- ref4
  v      v      v      v
  string1string2string3string4[END]

1. Print from ref4 to [END] = "string4"
2. Print from ref3 to ref4 = "string3"
3. Print from ref2 to ref3 = "string2"
4. Print from [START] to ref2 = "string1"

So let’s chop up the strings into an array of discrete strings!

    def find_strings(self, refs, tablebase, tabledata):
        # refs.keys() = address, refs.values() = count
        refs_addrs = sorted(refs.keys(), reverse=True)

        all_strings = []
        for r in refs_addrs:
            r_offset = r - tablebase
            if len(all_strings) > 0:
                last_ref = all_strings[-1][0] - tablebase
                r_end_offset = last_ref
            else:
                r_end_offset = len(tabledata)

            r_str = tabledata[r_offset:r_end_offset]
            all_strings.append([tablebase + r_offset, r_end_offset - r_offset, r_str])

        return all_strings
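To see the slicing on the sample blob from the diagram, here is a standalone sketch of the same backwards walk. The function name split_blob, the base address 0x1000, and the reference addresses are made up for illustration, and the results are returned in ascending address order for readability.

```python
def split_blob(ref_addrs, tablebase, tabledata):
    """Return (address, length, bytes) tuples in ascending address order."""
    results = []
    end = len(tabledata)  # the highest reference runs to the end of the blob
    for addr in sorted(set(ref_addrs), reverse=True):
        start = addr - tablebase
        results.append((addr, end - start, tabledata[start:end]))
        end = start  # the next (lower) string stops where this one begins
    results.reverse()
    return results


blob = b"string1string2string3string4"
base = 0x1000

# All four references found: every string is recovered exactly.
complete = split_blob([0x1000, 0x1007, 0x100E, 0x1015], base, blob)
print([s for _, _, s in complete])
# [b'string1', b'string2', b'string3', b'string4']

# The reference to string3 (0x100E) was missed: string2 absorbs it.
partial = split_blob([0x1000, 0x1007, 0x1015], base, blob)
print([s for _, _, s in partial])
# [b'string1', b'string2string3', b'string4']
```

The second run shows the cost of a missed reference: no data is lost, but two neighboring strings are reported as one.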

The above uses the simple approach and is an educated guess. What if we did want to find each string’s exact length? That would generally mean we need introspection into register or stack values at the time the string reference is used. That typically means code emulation as well as making guesses as to which register (or stack-local variable) contains the length parameter. The effort these approaches require simply outweighed the benefit given the time we had allotted.

Putting it all together with Python: gostringsr2

In the above, we’ve been defining the GoStringsR2 class. We can take this class definition and invoke it with some simple user interface code to mimic strings for a Go binary. This script worked for the x86, x86-64, and ARM (AArch32/AArch64) binaries we needed to look at. Thanks to Radare2’s wide architecture and file format support, we made it work whether the binary was compiled for Linux (ELF), macOS (Mach-O), or even Windows (PE)!

Review of the full workflow, once again:

  1. Open the binary application with r2pipe
  2. Locate the go.string.* and go.func.* symbols to identify the strings blob
  3. Read the strings blob from the “read-only” data section of the binary
  4. Ask r2 to find all references
  5. Retrieve, parse, and filter the references to locate those referencing the strings blob
  6. Loop through each strings blob reference, starting at the highest address and working our way backwards, and extract the string, stopping once we hit the next reference.
  7. Print the strings using an ASCII encoding.
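The last step can be sketched in a few lines. This is an illustration only: the function name render_strings is hypothetical, the entries mirror the [address, length, bytes] triples built by find_strings, and showing non-ASCII bytes as escapes is one possible choice rather than the tool’s confirmed behavior.

```python
def render_strings(entries):
    """Return one printable line per [address, length, raw_bytes] entry, lowest address first."""
    lines = []
    for _addr, _length, raw in sorted(entries, key=lambda entry: entry[0]):
        # Bytes outside 7-bit ASCII become \xNN escapes instead of raising an error.
        lines.append(raw.decode("ascii", errors="backslashreplace"))
    return lines


# Made-up entries, deliberately out of order.
entries = [
    [0x1007, 7, b"string2"],
    [0x1000, 7, b"string1"],
    [0x100E, 5, b"caf\xc3\xa9"],
]
output = render_strings(entries)
for line in output:
    print(line)
```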

The full code is posted below.

Sample invocation looking for our previous “hello world, how are you today” string in helloworld.go:

$ gostringsr2 helloworld|grep -A5 -B5 hello

bad write barrier buffer bounds
call from within the Go runtime
casgstatus: bad incoming values
checkmark found unmarked object
entersyscallblock inconsistent 
hello world, how are you today?
inserting span already in treap
internal error - misuse of itab
non in-use span in unswept list
pacer: sweep done at heap size 

Going further

We demonstrated a simple reverse engineering task that Radare2 helped us solve. We should also mention that the Go binaries we looked at were compiled with Go version 1.11.x. As mentioned before, this workflow assumes all (or at least most) references are found by Radare2; strings that are referenced indirectly in other ways will not be recovered by this approach.

Our goal was not to make a foolproof solution that stands the test of time or Go revisions: we wanted a quick implementation to get us further than we were before. However, these techniques are not limited to Radare2 or the Go language; that was simply our task at hand. In the process, we noticed some interesting patterns with Go strings that could be explored further to refine our techniques, such as the alphabetical and length-based sorting of the strings blob. The Go language is also open-source, so we could have just reviewed the source code…but reverse engineering is more fun!

The ability to script reverse engineering techniques using various tools, whether Radare2, Binary Ninja, IDA Pro, or Ghidra, is a valuable asset in any reverse engineer’s arsenal. We hope this post demonstrates a quick, simple use case.

Source Code

The code for this project is available over at our GitHub: gostringsr2