Advanced Type Annotations

IDA extends standard C/C++ type declarations with specialized annotations that provide control over data interpretation and display in disassembly and decompiled code.

For a complete list of all type system keywords, see Type System Keywords.

Shifted Pointers

Sometimes in binary code we can encounter a pointer to the middle of a structure. Such pointers usually do not exist in the source code but an optimizing compiler may introduce them to make the code shorter or faster.

Such pointers can be described using shifted pointers. A shifted pointer is a regular pointer with additional information about the name of the parent structure and the offset from its beginning. For example:

        struct mystruct
        {
          char buf[16];
          int dummy;
          int value;            // <- myptr points here
          double fval;
        };
        int *__shifted(mystruct,20) myptr;

The above declaration means that myptr is a pointer to ‘int’ and if we decrement it by 20 bytes, we will end up at the beginning of ‘mystruct’.

Please note that IDA does not limit parents of shifted pointers to structures. A shifted pointer after the adjustment may point to any type except ‘void’.

Negative offsets are also supported. They mean that the pointer points to memory before the structure.

When a shifted pointer is used with an adjustment, it is displayed with the ‘ADJ’ helper function. For example, a reference to the memory 4 bytes past myptr (offset 24 in mystruct, the fval member) can be represented like this:

        ADJ(myptr)->fval

Shifted pointers are an improvement compared to the CONTAINING_RECORD macro because expressions with them are shorter and easier to read.
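
The arithmetic behind a shifted pointer can be reproduced outside IDA. The following is a minimal sketch using Python's ctypes, assuming a 4-byte int and natural alignment. The class MyStruct and the helper adj are hypothetical names chosen for this illustration. They are not part of IDA and only mimic what ADJ expresses in the decompiler output.

```python
import ctypes


class MyStruct(ctypes.Structure):
    """ctypes mirror of 'mystruct' from the declaration above."""
    _fields_ = [
        ("buf", ctypes.c_char * 16),
        ("dummy", ctypes.c_int),
        ("value", ctypes.c_int),
        ("fval", ctypes.c_double),
    ]


# The shift in __shifted(mystruct,20) is the offset of 'value' in the parent.
SHIFT = MyStruct.value.offset


def adj(shifted_address, parent=MyStruct, shift=SHIFT):
    """Hypothetical stand-in for ADJ(): step back to the parent structure."""
    return parent.from_address(shifted_address - shift)


instance = MyStruct(b"hello", 1, 42, 3.5)
myptr = ctypes.addressof(instance) + SHIFT  # address of instance.value

print(SHIFT)             # offset of 'value' inside the parent
print(adj(myptr).value)  # the int that myptr points to
print(adj(myptr).fval)   # the double located 4 bytes past myptr
```

Subtracting the shift from the pointer gives the start of the parent, after which any member can be reached by name. This is the same computation that CONTAINING_RECORD spells out explicitly.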

Scattered Argument Locations

Modern compilers may pass structure arguments across multiple registers or in a mix of register and stack locations. Scattered argument locations describe these complex calling conventions. Consider the following structure:

  00000000 struc_1         struc ; (sizeof=0xC)
  00000000 c1              db ?
  00000001                 db ? ; undefined
  00000002 s2              dw ?
  00000004 c3              db ?
  00000005                 db ? ; undefined
  00000006                 db ? ; undefined
  00000007                 db ? ; undefined
  00000008 i4              dd ?
  0000000C struc_1         ends

If we have this function prototype:

  void myfunc(struc_1 s);

the 64-bit GNU compiler will pass the structure like this:

  RDI: c1, s2, and c3
  RSI: i4

Since compilers can use such complex calling conventions, IDA needs some mechanism to describe them. Scattered argument locations are used for that. The above calling convention can be described like this:

  void __usercall myfunc(struc_1 s@<0:rdi.1, 2:rdi^2.2, 4:rdi^4.1, 8:rsi.4>);

It reads:

  • 1 byte at offset 0 of the argument is passed in byte 0 of RDI
  • 2 bytes at offset 2 of the argument are passed in bytes 2–3 of RDI
  • 1 byte at offset 4 of the argument is passed in byte 4 of RDI
  • 4 bytes at offset 8 of the argument are passed starting from byte 0 of RSI

In other words, the following syntax is used:

  argoff:register^regoff.size

where:

  • argoff - offset within the argument
  • register - register name used to pass part of the argument
  • regoff - offset within the register
  • size - number of bytes

The regoff and size fields can be omitted if there is no ambiguity.

If the register is not specified, the expression describes a stack location:

  argoff:^stkoff.size

where:

  • argoff - offset within the argument
  • stkoff - offset in the stack frame (the first stack argument is at offset 0)
  • size - number of bytes
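
To make the notation concrete, here is a small sketch that parses a scattered location list and uses it to rebuild the argument bytes from register values. It is not IDA's parser. The names Part, parse_scattered and assemble are hypothetical. The sketch assumes decimal numbers, little-endian byte order and 8-byte registers, and it reads an omitted regoff as 0.

```python
import re
import struct
from typing import NamedTuple, Optional


class Part(NamedTuple):
    argoff: int              # offset within the argument
    register: Optional[str]  # None means a stack location
    locoff: int              # offset within the register or the stack frame
    size: Optional[int]      # number of bytes, None if omitted


# argoff:register^regoff.size, or argoff:^stkoff.size for the stack.
# Assumption: all numbers are written in decimal.
_PART_RE = re.compile(r"(\d+):([A-Za-z_]\w*)?(?:\^(\d+))?(?:\.(\d+))?")


def parse_scattered(spec):
    """Split the text between '@<' and '>' into Part tuples."""
    parts = []
    for item in spec.split(","):
        match = _PART_RE.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"bad location specifier: {item!r}")
        argoff, register, locoff, size = match.groups()
        parts.append(Part(int(argoff), register, int(locoff or 0),
                          int(size) if size else None))
    return parts


def assemble(parts, registers, stack=b"", total_size=0):
    """Rebuild the argument bytes (little-endian, 8-byte registers assumed)."""
    argument = bytearray(total_size)
    for part in parts:
        if part.size is None:
            raise ValueError("this sketch needs an explicit size")
        if part.register is None:
            source = stack
        else:
            source = registers[part.register].to_bytes(8, "little")
        chunk = source[part.locoff:part.locoff + part.size]
        argument[part.argoff:part.argoff + part.size] = chunk
    return bytes(argument)


parts = parse_scattered("0:rdi.1, 2:rdi^2.2, 4:rdi^4.1, 8:rsi.4")

# struc_1 with c1=0x11, s2=0x2233, c3=0x44, i4=0x55667788 and zeroed padding.
raw = struct.pack("<BxHB3xI", 0x11, 0x2233, 0x44, 0x55667788)
rdi = int.from_bytes(raw[:8], "little")
rsi = int.from_bytes(raw[8:], "little")
rebuilt = assemble(parts, {"rdi": rdi, "rsi": rsi}, total_size=len(raw))
print(rebuilt == raw)
```

A stack part such as 0:^8.4 parses to a Part whose register is None, in which case assemble reads the bytes from the stack buffer instead of a register.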

Please note that while IDA checks the argument location specifiers for soundness, it cannot perform all checks, and some wrong locations may be accepted. In particular, IDA generally does not know the register sizes, so it accepts any offset within a register and any size.

See also the Set type… (action SetType) command.

Data Representation Annotations

Data representation: enum member

Syntax:

  __enum(enum_name)

Instead of a plain number, a symbolic constant from the specified enum will be used. The enum can be a regular enum or a bitmask enum. For bitmask enums, a bitwise combination of symbolic constants will be printed. If the value to print cannot be represented using the specified enum, it will be displayed in red.

Example:

   enum myenum { A=0, B=1, C=3 };
   short var __enum(myenum);

If var is equal to 1, it will be represented as “B”.

Another example:

   enum mybits __bitmask { INITED=1, STARTED=2, DONE=4 };
   short var __enum(mybits);

If var is equal to 3, it will be represented as “INITED|STARTED”.

This annotation is useful if the enum size is not equal to the variable size. Otherwise using the enum type for the declaration is better:

   myenum var;  // is 4 bytes, not 2 as above
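
The rendering rule described in this section can be approximated with a few lines of Python. This is a simplified sketch, not IDA's algorithm: render_enum is a hypothetical helper, each enum is modeled as a plain dictionary, and None stands for a value that cannot be represented (which IDA displays in red).

```python
def render_enum(value, members, bitmask=False):
    """Return the symbolic text for value, or None if it is not representable."""
    # An exact match wins for both regular and bitmask enums.
    for name, constant in members.items():
        if constant == value:
            return name
    if not bitmask:
        return None
    # For a bitmask enum, peel off every constant fully contained in the value.
    names, remaining = [], value
    for name, constant in members.items():
        if constant and (remaining & constant) == constant:
            names.append(name)
            remaining &= ~constant
    if remaining or not names:
        return None  # some bits are not covered by any constant
    return "|".join(names)


myenum = {"A": 0, "B": 1, "C": 3}
mybits = {"INITED": 1, "STARTED": 2, "DONE": 4}

print(render_enum(1, myenum))
print(render_enum(3, mybits, bitmask=True))
print(render_enum(8, mybits, bitmask=True))
```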

Data representation: offset expression

Syntax:

  __offset(type, base, tdelta, target)
  __offset(type, base, tdelta)
  __offset(type, base)
  __offset(type|AUTO, tdelta)
  __offset(type)
  __off

where type is one of:

  Type     Description
  OFF8     8-bit full offset
  OFF16    16-bit full offset
  OFF32    32-bit full offset
  OFF64    64-bit full offset
  LOW8     low 8 bits of 16-bit offset
  LOW16    low 16 bits of 32-bit offset
  HIGH8    high 8 bits of 16-bit offset
  HIGH16   high 16 bits of 32-bit offset

The type can also be the name of a custom refinfo.

It can be combined with the following keywords:

  Keyword    Description
  RVAOFF     based reference (rva)
  PASTEND    reference past an item; it may point to a nonexistent address
  NOBASE     forbid the base xref creation; implies that the base can be any value
             (note: base xrefs are created only if the offset base points to the middle of a segment)
  SUBTRACT   the reference value is subtracted from the base value instead of (as usual) being added to it
  SIGNEDOP   the operand value is sign-extended (only supported for REF_OFF8/16/32/64)
  NO_ZEROS   an opval of 0 will be considered invalid
  NO_ONES    an opval of ~0 will be considered invalid
  SELFREF    the self-based reference

The base, target delta, and the target can be omitted. If the base is BADADDR, it can be omitted by combining the type with AUTO:

  __offset(type|AUTO, tdelta)

For zero-based offsets without any additional attributes whose size corresponds to the current application target (e.g. REF_OFF32 for a 32-bit application), the short __off form can be used.

Examples:

  • A 64-bit offset based on the image base:
      int var __offset(OFF64|RVAOFF);
  • A 32-bit offset based on 0 that may point to a nonexistent address:
      int var __offset(OFF32|PASTEND|AUTO);
  • A 32-bit offset based on 0x400000:
      int var __offset(OFF32, 0x400000);
  • A simple zero-based offset that matches the current application bitness:
      int var __off;

This annotation is useful when the type of the pointed object is unknown, or the variable size is different from the usual pointer size. Otherwise, it is better to use a pointer:

  type *var;
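
As a rough illustration of how some of the keywords above affect the referenced address, the sketch below combines an operand value with a base. It only models what the keyword table states for SUBTRACT, SIGNEDOP, NO_ZEROS and NO_ONES. The function resolve_offset and its parameters are hypothetical, the target delta and explicit target are ignored, and the choice to compare ~0 at the operand width is an assumption.

```python
def resolve_offset(opval, base=0, bits=32, subtract=False, signed=False,
                   no_zeros=False, no_ones=False):
    """Return the referenced address, or None if the operand value is invalid."""
    mask = (1 << bits) - 1
    opval &= mask
    if no_zeros and opval == 0:
        return None                    # NO_ZEROS: 0 is not a valid offset
    if no_ones and opval == mask:
        return None                    # NO_ONES: ~0 is not a valid offset
    if signed and opval >> (bits - 1):
        opval -= 1 << bits             # SIGNEDOP: sign-extend the operand
    # SUBTRACT: the value is subtracted from the base instead of being added.
    return base - opval if subtract else base + opval


print(hex(resolve_offset(0x1234, base=0x400000)))
print(hex(resolve_offset(0xFFFFFFFC, base=0x1000, signed=True)))
print(hex(resolve_offset(0x10, base=0x1000, subtract=True)))
```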

Data representation: string

Syntax:

  __strlit(strtype, "encoding")
  __strlit(strtype, char1, char2, "encoding")
  __strlit(strtype)

where strtype is one of:

  Type        Description
  C           Zero-terminated string, 8 bits per symbol
  C_16        Zero-terminated string, 16 bits per symbol
  C_32        Zero-terminated string, 32 bits per symbol
  PASCAL      Pascal string: 1-byte length prefix, 8 bits per symbol
  PASCAL_16   Pascal string: 1-byte length prefix, 16 bits per symbol
  LEN2        Wide Pascal string: 2-byte length prefix, 8 bits per symbol
  LEN2_16     Wide Pascal string: 2-byte length prefix, 16 bits per symbol
  LEN4        Delphi string: 4-byte length prefix, 8 bits per symbol
  LEN4_16     Delphi string: 4-byte length prefix, 16 bits per symbol

The string type may be followed by two optional string termination characters (only for C). Finally, the string encoding may be specified, either as the encoding name or as “no_conversion” if the string encoding was not explicitly specified.

Examples:

  • A zero-terminated string in windows-1252 encoding:
      char array[10] __strlit(C,"windows-1252");
  • A zero-terminated string in UTF-8 encoding:
      char array[10] __strlit(C,"UTF-8");
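
The difference between the string layouts can be shown with a short decoder. This is only a sketch of the layouts listed in the table, not IDA code. The helper decode_strlit is hypothetical, the length prefix is assumed to be little-endian and to count symbols rather than bytes, and custom termination characters are not handled.

```python
# strtype -> (length prefix size in bytes, symbol width in bytes)
LAYOUTS = {
    "C": (0, 1), "C_16": (0, 2), "C_32": (0, 4),
    "PASCAL": (1, 1), "PASCAL_16": (1, 2),
    "LEN2": (2, 1), "LEN2_16": (2, 2),
    "LEN4": (4, 1), "LEN4_16": (4, 2),
}


def decode_strlit(data, strtype="C", encoding="utf-8"):
    """Decode one string literal from raw bytes according to its layout."""
    prefix, width = LAYOUTS[strtype]
    if prefix:
        # Assumption: the prefix is little-endian and counts symbols.
        count = int.from_bytes(data[:prefix], "little")
        body = data[prefix:prefix + count * width]
    else:
        # Zero-terminated: stop at the first all-zero symbol.
        end = 0
        while end + width <= len(data) and any(data[end:end + width]):
            end += width
        body = data[:end]
    return body.decode(encoding)


print(decode_strlit(b"caf\xe9\x00junk", "C", "windows-1252"))
print(decode_strlit(b"\x03abcXYZ", "PASCAL"))
print(decode_strlit(b"h\x00i\x00\x00\x00", "C_16", "utf-16-le"))
```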

Data representation: structure offset

Syntax:

  __stroff(structname)
  __stroff(structname, delta)

Instead of a plain number, the name of a struct or union member will be used. If delta is present, the value is displayed as a member name minus delta: the member is the one at offset value + delta, so the expression still evaluates to the original value.

Example: An integer variable named var that holds an offset from the beginning of the mystruct structure:

  int var __stroff(mystruct);

If mystruct is defined like this:

  struct mystruct
  {
    char a;
    char b;
    char c;
    char d;
  };

The value 2 will be represented as mystruct.c

Another example: A structure offset with a delta:

  int var __stroff(mystruct, 1);

The value 2 will be represented as mystruct.d-1 (member d is at offset 3, and 3 - 1 = 2).
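
For the simple case without a delta, the conversion from a number to a member name is a lookup by offset. The sketch below mirrors mystruct with ctypes. The helper render_stroff is hypothetical, it matches only exact member offsets, and delta handling is deliberately left out.

```python
import ctypes


class mystruct(ctypes.Structure):
    """ctypes mirror of the four-member structure shown above."""
    _fields_ = [
        ("a", ctypes.c_char),
        ("b", ctypes.c_char),
        ("c", ctypes.c_char),
        ("d", ctypes.c_char),
    ]


def render_stroff(value, struct_type):
    """Return 'struct.member' for a value equal to a member offset."""
    for field in struct_type._fields_:
        name = field[0]
        if getattr(struct_type, name).offset == value:
            return f"{struct_type.__name__}.{name}"
    return str(value)  # no member starts at this offset: keep the plain number


print(render_stroff(2, mystruct))
```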

Data representation: custom data type and format

Syntax:

 __custom(dtid, fid)

where dtid is the name of a custom data type and fid is the name of a custom data format. The custom type and format must be registered by a plugin beforehand, when the database is opened. Otherwise, custom data type and format ids will be displayed instead of names.

Data representation: tabular form

Syntax:

  __tabform(flags)
  __tabform(flags,lineitems)
  __tabform(flags,lineitems,alignment)
  __tabform(,lineitems,alignment)
  __tabform(,,alignment)

This keyword is used to format arrays. The following flags are accepted:

  Flag     Description
  NODUPS   do not use the dup keyword
  HEX      use hexadecimal numbers to show array indexes
  OCT      use octal numbers to show array indexes
  BIN      use binary numbers to show array indexes
  DEC      use decimal numbers to show array indexes

It is possible to combine NODUPS with the index radix: NODUPS|HEX

The `lineitems` and `alignment` attributes have the meaning described for the Array… (action MakeArray) command.

Example:

Display the array in tabular form, 4 decimal numbers on a line, each number taking 8 positions. Display indexes as comments in hexadecimal:

  int array[16] __tabform(HEX,4,8) __dec;

The resulting array may look like this:

  dd   50462976, 117835012, 185207048, 252579084; 0
  dd  319951120, 387323156, 454695192, 522067228; 4
  dd  589439264, 656811300, 724183336, 791555372; 8
  dd  858927408, 926299444, 993671480,1061043516; 0Ch

Without this annotation, the `dup` keyword is permitted, and the number of items on a line and the alignment are not defined.
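
The tabular layout itself is easy to emulate. The following sketch formats a list of values with a fixed number of items per line and a right-aligned field width, then appends the index of the first item as a comment. The helpers ida_hex and format_table are hypothetical, the hexadecimal style is inferred from the sample listing above, and only the HEX and DEC radixes are covered. A width of 10 is used here because that is the column width visible in the sample. How IDA treats numbers wider than the requested alignment is not modeled.

```python
def ida_hex(number):
    """Hexadecimal in the style of the sample listing (for example 0Ch)."""
    if number < 10:
        return str(number)
    digits = format(number, "X")
    if digits[0].isalpha():
        digits = "0" + digits  # keep the literal from starting with a letter
    return digits + "h"


def format_table(values, lineitems, alignment, radix="HEX", directive="dd"):
    """Lay out values in rows of 'lineitems' cells, each 'alignment' wide."""
    lines = []
    for start in range(0, len(values), lineitems):
        row = values[start:start + lineitems]
        cells = ",".join(str(value).rjust(alignment) for value in row)
        index = ida_hex(start) if radix == "HEX" else str(start)
        lines.append(f"{directive} {cells}; {index}")
    return lines


# Sixteen 32-bit values built from the byte sequence 0, 1, 2, ..., 63.
values = [int.from_bytes(bytes(range(4 * i, 4 * i + 4)), "little")
          for i in range(16)]

for line in format_table(values, lineitems=4, alignment=10):
    print(line)
```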