Extracting Type Information from a Go Binary
Introduction
Go is a strongly typed language. This means that you can’t, for example, concatenate a string with an integer without first converting the integer to a string. For this to be enforced, the runtime needs a way to track all the different types. In Go, every type has a definition that is included in the binary. By parsing these type definitions, it is possible to reconstruct all the types inside the binary, which can aid the analysis of a suspicious application or malware. This post walks through where this data is located and how to extract and parse it so that the type definitions can be reconstructed.
It all starts with moduledata
As described in a previous blog post, the moduledata structure holds pointers
to some very important data structures in the Go binary. For recovering
type-information, we are mainly interested in two of them: types and
typelinks. Below is the moduledata structure as of this writing.
type moduledata struct {
	pclntable             []byte
	ftab                  []functab
	filetab               []uint32
	findfunctab           uintptr
	minpc, maxpc          uintptr
	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr
	textsectmap           []textsect
	typelinks             []int32 // offsets from types
	itablinks             []*itab
	ptab                  []ptabEntry
	pluginpath            string
	pkghashes             []modulehash
	modulename            string
	modulehashes          []modulehash
	hasmain               uint8 // 1 if module contains the main function, 0 otherwise
	gcdatamask, gcbssmask bitvector
	typemap               map[typeOff]*_type // offset to *_rtype in previous module
	bad                   bool // module failed to load and should be ignored
	next                  *moduledata
}
The moduledata structure has been relatively stable over the last few releases
of the Go compiler. Version 1.8 added the textsectmap field, which means the
offset of the typelinks slice differs between 1.7 and 1.8+. The fields that
precede typelinks are otherwise unchanged. The moduledata structure for 1.7 is
shown below.
type moduledata struct {
	pclntable             []byte
	ftab                  []functab
	filetab               []uint32
	findfunctab           uintptr
	minpc, maxpc          uintptr
	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr
	typelinks             []int32 // offsets from types
	itablinks             []*itab
	modulename            string
	modulehashes          []modulehash
	gcdatamask, gcbssmask bitvector
	typemap               map[typeOff]*_type // offset to *_rtype in previous module
	next                  *moduledata
}
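The difference between the two layouts can be worked out by counting machine words in the definitions above. The sketch below does that arithmetic; the function name fieldOffsets is made up for illustration, and the result only holds if the binary's moduledata matches one of the two definitions shown here.

```go
package main

import "fmt"

// fieldOffsets returns the byte offsets of the types, etypes and typelinks
// fields inside moduledata, derived by counting machine words in the two
// structure definitions shown in this post. ptrSize is 4 or 8 and
// hasTextsectmap is true for the 1.8+ layout. (Hypothetical helper.)
func fieldOffsets(ptrSize int, hasTextsectmap bool) (types, etypes, typelinks int) {
	// Before types: three slice headers (3 words each), findfunctab
	// (1 word), six start/end pairs (12 words) and end/gcdata/gcbss
	// (3 words).
	words := 3*3 + 1 + 12 + 3
	types = words * ptrSize
	etypes = types + ptrSize
	typelinks = etypes + ptrSize
	if hasTextsectmap {
		typelinks += 3 * ptrSize // textsectmap is a slice header
	}
	return types, etypes, typelinks
}

func main() {
	for _, newer := range []bool{false, true} {
		t, e, l := fieldOffsets(8, newer)
		fmt.Printf("textsectmap=%v types=%d etypes=%d typelinks=%d\n", newer, t, e, l)
	}
}
```

On a 64-bit target this places typelinks at offset 216 for the 1.7 layout and 240 for the 1.8+ layout.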
All the type-information is located in the types data, which also holds other
data about the types. To find the type-information, the typelinks slice is
needed. This slice holds offsets from the beginning of the types data to where
the information for a type is stored. Unfortunately, not every type has an
offset in this slice, but it is still possible to find all types by using the
slice as a starting point.
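As a rough illustration of how the offsets are resolved, the sketch below turns raw typelinks data into addresses by adding each int32 offset to the value of the types field. It assumes a little-endian binary; the helper name typeAddrs and the sample base address are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// typeAddrs decodes raw typelinks data (a sequence of little-endian int32
// offsets) and returns the address of each type descriptor, computed as
// typesBase + offset. (Hypothetical helper for illustration.)
func typeAddrs(typesBase uint64, raw []byte) []uint64 {
	addrs := make([]uint64, 0, len(raw)/4)
	for i := 0; i+4 <= len(raw); i += 4 {
		off := int32(binary.LittleEndian.Uint32(raw[i:]))
		addrs = append(addrs, typesBase+uint64(int64(off)))
	}
	return addrs
}

func main() {
	// Two made-up offsets: 0x20 and 0x180.
	raw := []byte{0x20, 0x00, 0x00, 0x00, 0x80, 0x01, 0x00, 0x00}
	for _, a := range typeAddrs(0x4a0000, raw) {
		fmt.Printf("type descriptor at %#x\n", a)
	}
}
```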
Parsing the type-information
The offsets in the typelinks slice point to a data structure that describes
the type. The data structure is used by Go to track all the different types
within the binary. The structure is defined in three places: the compiler, the
reflect package, and the runtime package. In the runtime package, the name of
the structure is _type and in the reflect package it is called rtype. The
definition of the rtype structure is shown below.
type rtype struct {
	size       uintptr
	ptrdata    uintptr  // number of bytes in the type that can contain pointers
	hash       uint32   // hash of type; avoids computation in hash tables
	tflag      tflag    // extra type-information flags
	align      uint8    // alignment of variable with this type
	fieldAlign uint8    // alignment of struct field with this type
	kind       uint8    // enumeration for C
	alg        *typeAlg // algorithm table
	gcdata     *byte    // garbage collection data
	str        nameOff  // string form
	ptrToThis  typeOff  // type for pointer to this type, may be zero
}
As said earlier, all types in the binary have a corresponding _type/rtype
structure. This includes all the primitive types and user-defined types. The
kind field is an enum value corresponding to the underlying primitive type. All
the possible options are shown below.
const (
	Invalid Kind = iota
	Bool
	Int
	Int8
	Int16
	Int32
	Int64
	Uint
	Uint8
	Uint16
	Uint32
	Uint64
	Uintptr
	Float32
	Float64
	Complex64
	Complex128
	Array
	Chan
	Func
	Interface
	Map
	Ptr
	Slice
	String
	Struct
	UnsafePointer
)
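To make the layout concrete, here is a minimal sketch that decodes an rtype from raw bytes and maps its kind to a name. It assumes a 64-bit little-endian binary that uses the rtype definition shown above (48 bytes). The names rtype64, parseRtype and kindName are invented for this example. The kind byte is masked to its low five bits because the upper bits of that byte carry flags.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// kindMask keeps the low five bits of the kind byte; the upper bits are
// used as flags.
const kindMask = (1 << 5) - 1

// kindNames is indexed by the Kind enumeration listed above.
var kindNames = []string{
	"invalid", "bool", "int", "int8", "int16", "int32", "int64",
	"uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
	"float32", "float64", "complex64", "complex128",
	"array", "chan", "func", "interface", "map", "ptr",
	"slice", "string", "struct", "unsafe.Pointer",
}

// rtype64 mirrors the rtype layout on a 64-bit target (assumption).
// Pointers are read as plain uint64 addresses.
type rtype64 struct {
	Size       uint64
	PtrData    uint64
	Hash       uint32
	Tflag      uint8
	Align      uint8
	FieldAlign uint8
	Kind       uint8
	Alg        uint64
	GCData     uint64
	Str        int32
	PtrToThis  int32
}

// parseRtype decodes an rtype from the start of b.
func parseRtype(b []byte) (rtype64, error) {
	var rt rtype64
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &rt)
	return rt, err
}

// kindName returns the name of the type's underlying kind.
func (rt rtype64) kindName() string {
	k := int(rt.Kind & kindMask)
	if k >= len(kindNames) {
		return "unknown"
	}
	return kindNames[k]
}

func main() {
	raw := make([]byte, 48)
	raw[0] = 8     // size = 8
	raw[23] = 0x82 // kind = Int (2) with a flag bit set
	raw[40] = 0x10 // str offset = 0x10
	rt, err := parseRtype(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(rt.Size, rt.kindName(), rt.Str)
}
```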
Another interesting field is str. This value is an offset from the beginning
of the types data to a packed byte structure that holds the type’s name and
other string information. For example, the primitive type Int also has the
name int, but derived types are different. Say you have defined a type
superInt as below. Its name would be superInt while the kind enum is Int.
type superInt int
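A minimal sketch of reading such a name record follows. It assumes the layout used by the Go releases discussed in this post: one flag byte (bit 0 marks an exported name), a two-byte big-endian length, then the name bytes. Other releases may encode the length differently. The function name decodeName and the sample bytes are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeName reads the packed name record that starts at offset off in the
// types data. Assumed layout: flag byte, 2-byte big-endian length, name.
func decodeName(types []byte, off int32) (name string, exported bool, err error) {
	i := int(off)
	if i < 0 || i+3 > len(types) {
		return "", false, errors.New("name offset out of range")
	}
	n := int(types[i+1])<<8 | int(types[i+2])
	if i+3+n > len(types) {
		return "", false, errors.New("name length out of range")
	}
	return string(types[i+3 : i+3+n]), types[i]&1 != 0, nil
}

func main() {
	// One filler byte followed by a record for the unexported name "superInt".
	types := append([]byte{0xff, 0x00, 0x00, 0x08}, "superInt"...)
	name, exported, err := decodeName(types, 1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("name=%s exported=%v\n", name, exported)
}
```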
The tflag field is a bitmask that signals whether additional data follows the
structure, as described in the source code snippet shown below.
// tflag is used by an rtype to signal what extra type-information is
// available in the memory directly following the rtype value.
//
// tflag values must be kept in sync with copies in:
//	cmd/compile/internal/gc/reflect.go
//	cmd/link/internal/ld/decodesym.go
//	runtime/type.go
type tflag uint8

const (
	// tflagUncommon means that there is a pointer, *uncommonType,
	// just beyond the outer type structure.
	//
	// For example, if t.Kind() == Struct and t.tflag&tflagUncommon != 0,
	// then t has uncommonType data and it can be accessed as:
	//
	//	type tUncommon struct {
	//		structType
	//		u uncommonType
	//	}
	//	u := &(*tUncommon)(unsafe.Pointer(t)).u
	tflagUncommon tflag = 1 << 0

	// tflagExtraStar means the name in the str field has an
	// extraneous '*' prefix. This is because for most types T in
	// a program, the type *T also exists and reusing the str data
	// saves binary size.
	tflagExtraStar tflag = 1 << 1

	// tflagNamed means the type has a name.
	tflagNamed tflag = 1 << 2
)
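When parsing, the flags can be applied along these lines. This is only a sketch: the helpers typeName and describeFlags are hypothetical, and the constants are copied from the snippet above.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	tflagUncommon  = 1 << 0
	tflagExtraStar = 1 << 1
	tflagNamed     = 1 << 2
)

// typeName takes the raw string read via the str field and drops the
// extraneous '*' prefix when tflagExtraStar is set.
func typeName(raw string, tflag uint8) string {
	if tflag&tflagExtraStar != 0 {
		return strings.TrimPrefix(raw, "*")
	}
	return raw
}

// describeFlags lists the flags that are set, for debugging output.
func describeFlags(tflag uint8) []string {
	var out []string
	if tflag&tflagUncommon != 0 {
		out = append(out, "uncommon")
	}
	if tflag&tflagExtraStar != 0 {
		out = append(out, "extraStar")
	}
	if tflag&tflagNamed != 0 {
		out = append(out, "named")
	}
	return out
}

func main() {
	fmt.Println(typeName("*main.superInt", 0x07), describeFlags(0x07))
	fmt.Println(typeName("*main.superInt", 0x01), describeFlags(0x01))
}
```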
An uncommonType
As mentioned in the previous section, some types can be uncommon types. So what are uncommon types? It turns out that they are more common than you might think. In Go, any named type can have methods associated with it, as in the example shown below.
type T struct{}

func (t T) myMethod() {}
In the code snippet, myMethod is a method on the type T. This makes T an
uncommon type. In other words, uncommon types are types with methods.
Information about the type’s methods is stored in the uncommonType structure.
As described in the section above, this structure is located right after the
type structure. The layout of the uncommonType structure is shown below. It
holds the import path, the number of methods (total and exported), and an
offset from this structure to an array of method data structures. This is the
definition of the structure as of the release of Go 1.13beta1, and its general
shape has been the same since the first release of Go 1.7. Versions before 1.7
look very different.
type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	xcount  uint16  // number of exported methods
	moff    uint32  // offset from this uncommontype to [mcount]method
	_       uint32  // unused
}
Go 1.7beta1 was the first release with the new design of this structure. Its
uncommonType is shown below. It is much smaller than the current one, but it
holds essentially the same information. This structure definition is unique to
that release and does not exist in binaries produced by any other version of
the Go compiler.
type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	moff    uint16  // offset from this uncommontype to [mcount]method
}
The general shape of the structure was introduced with the release of Go
1.7beta2. It is the same size as the current structure, but the space now
occupied by the xcount field is unused. For extracting the methods, this has
no noticeable effect.
type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	_       uint16  // unused
	moff    uint32  // offset from this uncommontype to [mcount]method
	_       uint32  // unused
}
One of the fields in the structure, moff, points to an array of method
structures. The definition of this structure is shown below.
// Method on non-interface type
type method struct {
	name nameOff // name of method
	mtyp typeOff // method type (without receiver)
	ifn  textOff // fn used in interface call (one-word receiver)
	tfn  textOff // fn used for normal method call
}
The mtyp field is an offset to the function type for the method. It is a
_type/rtype structure with the kind value of Func. More on this type later.
Both the ifn and tfn fields are offsets into the text section of the binary.
This is where the function code is located.
When analyzing real binaries, it turns out that some methods do not have a
method type or an offset in the text section. Below is an analysis of a binary.
In the snippet, the method array for *strconv.decimal is walked and the values
are printed. It can be seen that most of them do not have a method type and
some of the functions do not have offsets to function code.
*strconv.decimal has 9 methods
Method 1 name: Assign
Function at 0x58930 and 0x58930
Method 2 name: Round
Function at 0x59170 and 0x59170
Method 3 name: RoundDown
Function at 0x592d0 and 0x592d0
Method 4 name: RoundUp
Function at 0x59320 and 0x59320
Method 5 name: RoundedInteger
Function at 0x0 and 0x0
Method 6 name: Shift
Function at 0x590a0 and 0x590a0
Method 7 name: String
Method type: func() string
Function at 0x58310 and 0x58310
Method 8 name: floatBits
Function at 0x0 and 0x0
Method 9 name: set
Method type: func(string) bool
Function at 0x0 and 0x0
The symbols in the binary, shown below, also confirm that some functions are missing.
0x00458720 sym.strconv.__decimal_.String
0x00458bf0 sym.strconv.__decimal_.Assign
0x00459130 sym.strconv.__decimal_.Shift
0x00459200 sym.strconv.__decimal_.Round
0x004592d0 sym.strconv.__decimal_.RoundUp
0x00459710 sym.strconv.__extFloat_.FixedDecimal
0x00459c10 sym.strconv.__extFloat_.ShortestDecimal
0x0045e210 sym.type..hash.strconv.decimal
0x0045e270 sym.type..eq.strconv.decimal
It turns out that the Go toolchain prunes methods that are not used. While not all information is always present, the name of the method is still available, which can be used for further analysis.
Some of Go’s Types
Each primitive type has a corresponding data type in the runtime. All of these
data types are structures, and the _type/rtype is the first field. It is an
anonymous field and hence embedded. This means that, when parsing the type
data, the extra data for the specific type is usually located right after the
_type/rtype data. The kind field can be used to figure out which type it is
and what data follows the _type/rtype structure.
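One way to use the kind value is a small lookup that says how many bytes of kind-specific data follow the rtype, which in turn gives the position of a possible uncommonType. The sketch below assumes a 64-bit target and the structure definitions shown in this post (48-byte rtype, 8-byte pointers, 24-byte slice headers). The helper names are hypothetical.

```go
package main

import "fmt"

// Kind values from the enumeration shown earlier.
const (
	kindArray     = 17
	kindChan      = 18
	kindFunc      = 19
	kindInterface = 20
	kindMap       = 21
	kindPtr       = 22
	kindSlice     = 23
	kindStruct    = 25
)

const rtypeSize = 48 // size of rtype on a 64-bit target (assumption)

// extraSize returns the number of bytes of kind-specific data that follow
// the rtype, based on the structure definitions in this post.
func extraSize(kind uint8) int {
	switch kind {
	case kindArray:
		return 24 // elem, slice, len
	case kindChan:
		return 16 // elem, dir
	case kindFunc:
		return 8 // inCount, outCount, plus padding to 8-byte alignment
	case kindInterface, kindStruct:
		return 32 // pkgPath pointer + slice header
	case kindMap:
		return 32 // key, elem, bucket + sizes and flags
	case kindPtr, kindSlice:
		return 8 // elem
	}
	return 0
}

// uncommonOffset returns where an uncommonType would start, relative to the
// beginning of the type descriptor, when tflagUncommon is set.
func uncommonOffset(kind uint8) int {
	return rtypeSize + extraSize(kind)
}

func main() {
	for _, k := range []uint8{2, kindPtr, kindFunc, kindStruct} {
		fmt.Printf("kind %d: uncommonType at +%d\n", k, uncommonOffset(k))
	}
}
```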
Struct type
The structType data type, shown below, is used to store information about each
type derived from the primitive struct type. It has two extra fields, pkgPath
and fields. The pkgPath field is the import path of the package, while fields
is a slice of structField, also shown below, which stores information about
the fields. The structField structure has three fields. The first one is the
name of the field, the second is a pointer to a _type/rtype structure that can
be used to determine the type of the field, and the last is an integer that
encodes the field’s byte offset and whether the field is embedded/anonymous.
// structType represents a struct type.
type structType struct {
	rtype
	pkgPath name
	fields  []structField // sorted by offset
}

// Struct field
type structField struct {
	name        name    // name is always non-empty
	typ         *rtype  // type of field
	offsetEmbed uintptr // byte offset of field<<1 | isEmbedded
}

func (f *structField) offset() uintptr {
	return f.offsetEmbed >> 1
}

func (f *structField) embedded() bool {
	return f.offsetEmbed&1 != 0
}
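As a small worked example of the packing, the sketch below applies the same shift and mask to a few sample values. The function name decodeOffsetEmbed and the values are made up for illustration.

```go
package main

import "fmt"

// decodeOffsetEmbed splits the packed value into the byte offset and the
// embedded flag, mirroring the two accessor methods above.
func decodeOffsetEmbed(v uint64) (offset uint64, embedded bool) {
	return v >> 1, v&1 != 0
}

func main() {
	for _, v := range []uint64{0x0, 0x11, 0x30} {
		off, emb := decodeOffsetEmbed(v)
		fmt.Printf("offsetEmbed=%#x offset=%d embedded=%v\n", v, off, emb)
	}
}
```

A stored value of 0x11, for instance, decodes to an embedded field at byte offset 8.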
If the struct type has some methods attached to it, it is an uncommon type.
In this scenario, the uncommon data structure is right after the structType
data as shown below.
type structTypeUncommon struct {
	structType
	u uncommonType
}
Pointer type
Pointers to types have their own type called ptrType, which is shown in the
code block below. It essentially just adds a pointer to a _type/rtype for the
type it points to. This means, for example, that *int and *uint are two
different types and have their own ptrType structure stored in the binary.
// ptrType represents a pointer type.
type ptrType struct {
	rtype
	elem *rtype // pointer element (pointed at) type
}
One note when it comes to methods: if a pointer receiver is used when defining
a method, as seen in the example below, the methods will be attached to
*myThing and not myThing.
type myThing struct{}

func (m *myThing) DoSomething() {}
Interface type
The data structure for interfaces is simple and is shown below. It has
essentially two additional fields: one for the import path and one that is a
slice of imethod. The imethod structure, also shown below, provides
information about the functions that need to be implemented to satisfy the
interface. The first field in the imethod structure is the function name. The
second field is the offset to a _type/rtype structure. This structure is of
the kind Func and hence provides information about the function definition,
i.e., the types of the function arguments and return values.
// interfaceType represents an interface type.
type interfaceType struct {
	rtype
	pkgPath name      // import path
	methods []imethod // sorted by hash
}

// imethod represents a method on an interface type
type imethod struct {
	name nameOff // name of method
	typ  typeOff // .(*FuncType) underneath
}
Map type
The map type is probably the most complex structure of all the types. It is
shown below. It has information about a number of sizes that are used under
the hood. Luckily, these are created by the compiler and the programmer has no
control over them, so they can be ignored. The fields of interest are key and
elem. By parsing these values, it is possible to reconstruct the source code
representation of the type definition. The fields are pointers to two
_type/rtype structures and essentially correspond to map[key]elem.
// mapType represents a map type.
type mapType struct {
	rtype
	key        *rtype // map key type
	elem       *rtype // map element (value) type
	bucket     *rtype // internal bucket structure
	keysize    uint8  // size of key slot
	valuesize  uint8  // size of value slot
	bucketsize uint16 // size of bucket
	flags      uint32
}
Slice and array type
The slice and array types are very similar; both are shown below. In both
cases, the element type is recorded in the elem field. For arrays, the length
is additionally stored in the len field.
// sliceType represents a slice type.
type sliceType struct {
	rtype
	elem *rtype // slice element type
}

// arrayType represents a fixed array type.
type arrayType struct {
	rtype
	elem  *rtype // array element type
	slice *rtype // slice type
	len   uintptr
}
Channel type
Similar to the array, slice, and map types, the chanType also has a field
called elem to track what type is sent over the channel. It also has an enum,
dir, to indicate whether the channel can only receive, only send, or both send
and receive.
// chanType represents a channel type.
type chanType struct {
	rtype
	elem *rtype  // channel element type
	dir  uintptr // channel direction (ChanDir)
}
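Once the elem, key, len and dir values have been parsed, the source representation of these composite types can be rebuilt recursively. The sketch below works on a simplified, already-parsed model. The typeInfo struct and render function are hypothetical and not part of the runtime. The direction values follow reflect.ChanDir (1 for receive, 2 for send, 3 for both).

```go
package main

import "fmt"

// Kind values from the enumeration shown earlier.
const (
	kindArray = 17
	kindChan  = 18
	kindMap   = 21
	kindPtr   = 22
	kindSlice = 23
)

// typeInfo is a hypothetical, already-parsed view of a type descriptor.
type typeInfo struct {
	Kind uint8
	Name string // set for primitive and named types
	Key  *typeInfo
	Elem *typeInfo
	Len  uint64
	Dir  uint64 // 1 = receive only, 2 = send only, 3 = both
}

// render rebuilds the Go source form of a type from its parsed pieces.
func render(ti *typeInfo) string {
	if ti.Name != "" {
		return ti.Name
	}
	switch ti.Kind {
	case kindPtr:
		return "*" + render(ti.Elem)
	case kindSlice:
		return "[]" + render(ti.Elem)
	case kindArray:
		return fmt.Sprintf("[%d]%s", ti.Len, render(ti.Elem))
	case kindMap:
		return "map[" + render(ti.Key) + "]" + render(ti.Elem)
	case kindChan:
		switch ti.Dir {
		case 1:
			return "<-chan " + render(ti.Elem)
		case 2:
			return "chan<- " + render(ti.Elem)
		}
		return "chan " + render(ti.Elem)
	}
	return "?"
}

func main() {
	intT := &typeInfo{Name: "int"}
	strT := &typeInfo{Name: "string"}
	ptr := &typeInfo{Kind: kindPtr, Elem: intT}
	slice := &typeInfo{Kind: kindSlice, Elem: ptr}
	fmt.Println(render(&typeInfo{Kind: kindMap, Key: strT, Elem: slice}))
	fmt.Println(render(&typeInfo{Kind: kindArray, Len: 4, Elem: intT}))
	fmt.Println(render(&typeInfo{Kind: kindChan, Dir: 1, Elem: strT}))
}
```

The sketch ignores corner cases such as channels of channels, where parentheses would be needed.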
Function type
Since functions in Go are first-class citizens, there is also a type
definition for function types. The following code snippet, taken from the
standard library, describes the type. Since all types can have methods, making
them an uncommonType, function types can also have methods. When this happens,
the comment in the snippet describes how the data is stored in the binary. The
funcType has just two additional fields after the rtype/_type structure: a
uint16 for the number of function arguments and a uint16 for the number of
function return values. The type-information for the function arguments and
return values is stored in an array right after the funcType data structure
(and its uncommonType, if present).
// funcType represents a function type.
//
// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
//	struct {
//		funcType
//		uncommonType
//		[2]*rtype    // [0] is in, [1] is out
//	}
type funcType struct {
	rtype
	inCount  uint16
	outCount uint16 // top bit is set if last input parameter is ...
}
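A sketch of how these pieces might be combined when rebuilding a function signature is shown below. It assumes a 64-bit target, where the funcType is padded to 56 bytes and the uncommonType takes 16 bytes. The helper names splitOutCount, paramArrayOffset and signature are invented for illustration, and the parameter type names are expected to have been resolved already.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOutCount separates the variadic flag (top bit) from the number of
// return values stored in outCount.
func splitOutCount(raw uint16) (n uint16, variadic bool) {
	return raw &^ (1 << 15), raw&(1<<15) != 0
}

// paramArrayOffset returns the offset of the *rtype array relative to the
// start of the funcType. Assumes 64-bit sizes: 56 bytes for the padded
// funcType and 16 bytes for an uncommonType.
func paramArrayOffset(hasUncommon bool) int {
	off := 56
	if hasUncommon {
		off += 16
	}
	return off
}

// signature renders a function type from resolved parameter type names.
func signature(in, out []string, variadic bool) string {
	args := append([]string(nil), in...)
	if variadic && len(args) > 0 {
		// The last parameter is stored as a slice; show it as "...T".
		last := args[len(args)-1]
		args[len(args)-1] = "..." + strings.TrimPrefix(last, "[]")
	}
	s := "func(" + strings.Join(args, ", ") + ")"
	switch len(out) {
	case 0:
	case 1:
		s += " " + out[0]
	default:
		s += " (" + strings.Join(out, ", ") + ")"
	}
	return s
}

func main() {
	n, variadic := splitOutCount(0x8002)
	fmt.Println(n, variadic, paramArrayOffset(true))
	fmt.Println(signature([]string{"string", "[]interface {}"}, []string{"int", "error"}, variadic))
	fmt.Println(signature([]string{"string"}, []string{"bool"}, false))
}
```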
Conclusion
All the types used by a Go application are stored within the types data inside the binary. By parsing this data, it is possible to recover the type definitions, including the signatures of their methods. This includes private types and fields.