Partitioner::IPDParser Class Reference

#include <Partitioner.h>

Detailed Description

This is the parser for the instruction partitioning data (IPD) files.

These files are text-based descriptions of the functions and basic blocks used by the partitioner. They allow the user to seed the partitioner with additional information that it could not otherwise obtain.

For instance, the analyst may know that a function begins at a certain virtual address, but the partitioner fails to discover this address in its normal mode of operation. The analyst can create an IPD file that describes the function so that the partitioner finds it.

An IPD file is able to:

The language non-terminals are:

     File := Declaration+
     Declaration := FuncDecl | BlockDecl

     FuncDecl := 'function' Address [Name] [FuncBody]
     FuncBody := '{' FuncStmtList '}'
     FuncStmtList := FuncStmt [';' FuncStmtList]
     FuncStmt := ( Empty | BlockDecl | ReturnSpec )
     ReturnSpec := 'return' | 'returns' | 'noreturn'

     BlockDecl := 'block' Address Integer [BlockBody]
     BlockBody := '{' BlockStmtList '}'
     BlockStmtList := BlockStmt [';' BlockStmtList]
     BlockStmt := ( Empty | Alias | Successors ) ';'
     Alias := 'alias' Address
     Successors := ('successor' | 'successors') [SuccessorAddrList|AssemblyCode]
     SuccessorAddrList := AddressList | AddressList '...' | '...'

     AddressList := Address ( ',' AddressList )*
     Address: Integer
     Integer: DECIMAL_INTEGER | OCTAL_INTEGER | HEXADECIMAL_INTEGER
     Name: STRING
     AssemblyCode: 'asm' '{' ASSEMBLY '}'
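
The SuccessorAddrList rule can be illustrated with a small stand-alone parser. This is only a sketch and not the ROSE implementation: the names SuccessorAddrList (as a struct), has_ellipsis, and parse_successor_addr_list are hypothetical, and the sketch assumes that std::strtoull with base 0 is an acceptable stand-in for the "as in C" integer terminals. It records whether a trailing '...' was present without assigning it any meaning.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type; not part of ROSE.
struct SuccessorAddrList {
    std::vector<std::uint64_t> addresses;
    bool has_ellipsis = false;   // true if the list ended with '...'
};

static bool at_digit(const char *s) {
    return std::isdigit(static_cast<unsigned char>(*s)) != 0;
}

static const char *skip_space(const char *s) {
    while (std::isspace(static_cast<unsigned char>(*s)))
        ++s;
    return s;
}

// SuccessorAddrList := AddressList | AddressList '...' | '...'
SuccessorAddrList parse_successor_addr_list(const std::string &text) {
    SuccessorAddrList result;
    const char *s = skip_space(text.c_str());
    if (at_digit(s)) {
        for (;;) {
            char *end = nullptr;
            // Base 0 accepts 0x... (hex), 0... (octal), and decimal.
            result.addresses.push_back(std::strtoull(s, &end, 0));
            s = skip_space(end);
            if (*s != ',')
                break;
            s = skip_space(s + 1);
            if (!at_digit(s))
                throw std::runtime_error("expected an address after ','");
        }
    }
    if (std::strncmp(s, "...", 3) == 0) {
        result.has_ellipsis = true;
        s = skip_space(s + 3);
    }
    if (*s != '\0')
        throw std::runtime_error("unexpected text in successor list");
    return result;
}
```

For example, parse_successor_addr_list("0x10, 020, 30 ...") yields three addresses and sets has_ellipsis.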

Language terminals:

     HEXADECIMAL_INTEGER: as in C, for example: 0x08045fe2
     OCTAL_INTEGER: as in C, for example, 0775
     DECIMAL_INTEGER: as in C, for example, 1234
     STRING: double quoted. Use backslash to escape embedded double quotes
     ASSEMBLY: x86 assembly instructions (must contain balanced curly braces, if any)

Comments begin with a hash ('#') and continue to the end of the line. The hash character is not treated specially inside quoted strings. Comments within an ASSEMBLY terminal must conform to the syntax accepted by the Netwide Assembler (nasm), namely semicolon in place of a hash.
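
The comment rule for the non-assembly part of a file can be sketched as a single pass over the text. The helper below is hypothetical (strip_hash_comments is not a ROSE function), and it deliberately ignores ASSEMBLY terminals, where nasm's semicolon comments apply instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ROSE): drop '#' comments from IPD text,
// leaving '#' characters inside double-quoted strings untouched.
std::string strip_hash_comments(const std::string &text) {
    std::string out;
    bool in_string = false;    // inside a double-quoted STRING
    bool in_comment = false;   // between '#' and the end of the line
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (in_comment) {
            if (c == '\n') {           // the newline itself is kept
                in_comment = false;
                out += c;
            }
            continue;
        }
        if (in_string) {
            out += c;
            if (c == '\\' && i + 1 < text.size())
                out += text[++i];      // keep the escaped character verbatim
            else if (c == '"')
                in_string = false;
            continue;
        }
        if (c == '#') {
            in_comment = true;
            continue;
        }
        if (c == '"')
            in_string = true;
        out += c;
    }
    return out;
}
```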

Semantics

A block declaration specifies the virtual memory address of the block's first instruction. The integer after the address specifies the number of instructions in the block. If the specified length is less than the number of instructions that ROSE would otherwise place in the block at that address, then ROSE will create a block of exactly the specified size. Likewise, if the specified address is midway into a block that ROSE would otherwise create, ROSE will create a block at the specified address anyway, causing the previous instructions to be in a separate block (or blocks). If the specified block size is larger than what ROSE would otherwise place in the block, the block will be created with fewer instructions but the BlockBody will be ignored.
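
The size rules above can be summarized in a few lines. This is an illustrative sketch only: BlockPlan and plan_block are hypothetical names, and "natural" stands for the number of instructions ROSE would place in the block without any IPD input.

```cpp
#include <cstddef>

// Hypothetical summary of the outcome of a block declaration.
struct BlockPlan {
    std::size_t ninsns;   // number of instructions in the created block
    bool body_applied;    // false when the BlockBody is ignored
};

BlockPlan plan_block(std::size_t declared, std::size_t natural) {
    if (declared > natural)
        return BlockPlan{natural, false};   // too large: smaller block, body ignored
    return BlockPlan{declared, true};       // block of exactly the declared size
}
```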

A function declaration specifies the virtual memory address of the entry point of a function. The body may specify whether the function returns. As of this writing [2010-05-13] a function declared as non-returning will be marked as returning if ROSE discovers that a basic block of the function returns.

If a block declaration appears inside a function declaration, then ROSE will assign the block to the function.

The block 'alias' attribute is used to indicate that two basic blocks perform the exact same operation. The specified address is the address of the basic block to use instead of this basic block. All control-flow edges pointing to this block will be rewritten to point to the specified address instead.
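
The edge rewriting described above can be pictured on a toy control-flow graph. The sketch below is not how ROSE stores its graph: Addr, Cfg, and apply_aliases are hypothetical names, and chained aliases (an alias target that is itself aliased) are not followed.

```cpp
#include <cstdint>
#include <map>
#include <set>

using Addr = std::uint64_t;
using Cfg = std::map<Addr, std::set<Addr>>;   // block address -> successor addresses

// Redirect every edge that targets an aliased block to the alias target.
// alias_of maps the address of an aliased block to the block used instead.
Cfg apply_aliases(const Cfg &cfg, const std::map<Addr, Addr> &alias_of) {
    Cfg result;
    for (const auto &entry : cfg) {
        std::set<Addr> &succs = result[entry.first];
        for (Addr target : entry.second) {
            auto found = alias_of.find(target);
            succs.insert(found == alias_of.end() ? target : found->second);
        }
    }
    return result;
}
```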

Example file:

     function 0x805116 "func11" {             # declare a new function named "func11"
         returns;                             # this function returns to callers
         block 0x805116 {                     # block at 0x805116 is part of func11
             alias 0x8052116, 0x8052126       # use block 0x805116 in place of 0x8052116 and 0x8052126
         }
     }

Basic Block Successors

A block declaration can specify control-flow successors in two ways: as a list of addresses, or as an x86 assembly language program that is interpreted by ROSE. The benefit of using a program to determine the successors is that the program can directly extract information, such as jump tables, from the specimen executable.

The assembly source code is fed to the Netwide Assembler, nasm (http://www.nasm.us/), which assembles it into i386 machine code. When ROSE needs to figure out the successors for a basic block it will interpret the basic block, then load the successor program and interpret it, then extract the successor list from the program's return value. ROSE interprets the program rather than running it directly so that the program can operate on unknown, symbolic data values rather than actual 32-bit numbers.

The successor program is interpreted in a context that makes it appear to have been called (via CALL instruction) from the end of the basic block being analyzed. These arguments are passed to the program:

The successor program may either fall off the end or execute a RET instruction.

For instance, if the 5-instruction block at virtual address 0x00c01115 ends with an indirect jump through a 256-element jump table beginning at 0x00c037fa, then a program to compute the successors might look like this:

    block 0x00c01115 5 {
      successors asm {
          push ebp
          mov ebp, esp
          ; ecx is the base address of the successors return vector,
          ; the first element of which is the vector size.
          mov ecx, [ebp+8]
          add ecx, 4
          ; loop over the entries in the jump table, copying each
          ; address from the jump table to the svec return value
          xor eax, eax
        copy_next:
          cmp eax, 256
          je done
          mov ebx, [0x00c037fa+eax*4]
          mov [ecx+eax*4], ebx
          inc eax
          jmp copy_next
        done:
          ; set the number of entries in the svec
          mov ecx, [ebp+8]
          mov DWORD [ecx], 256
          mov esp, ebp
          pop ebp
          ret
      }
    }
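
For readers less familiar with x86 assembly, the data layout produced by the program above can be mirrored in C++. This is only an analogue for illustration: build_svec is a hypothetical name, and ROSE interprets the assembly itself rather than calling code like this.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical C++ analogue of the assembly successor program: copy every
// jump-table entry into a return vector whose first element is the count.
std::vector<std::uint32_t> build_svec(const std::uint32_t *table, std::size_t n) {
    std::vector<std::uint32_t> svec(n + 1);
    for (std::size_t i = 0; i < n; ++i)
        svec[i + 1] = table[i];                 // mov [ecx+eax*4], ebx (ecx skips the size slot)
    svec[0] = static_cast<std::uint32_t>(n);    // mov DWORD [ecx], 256
    return svec;
}
```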

Example Programmatic Usage

The easiest way to parse an IPD file is to read it into memory and then call the parse() method. The following code demonstrates the use of mmap to read the file into memory, parse it, and release it from memory. For simplicity, we do not check for errors in this example.

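    #include <fcntl.h>      // open
    #include <sys/mman.h>   // mmap, munmap
    #include <sys/stat.h>   // fstat
    #include <unistd.h>     // close

    Partitioner p;
    int fd = open("test.ipd", O_RDONLY);
    struct stat sb;
    fstat(fd, &sb);
    const char *content = (const char*)mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                  // the mapping stays valid after close
    Partitioner::IPDParser(&p, content, sb.st_size).parse();
    munmap((void*)content, sb.st_size);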
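
Where mmap is unavailable or error reporting matters, the file can instead be read into a std::string. The helper below is a hypothetical sketch (read_file is not part of ROSE); the commented call assumes the constructor signature listed on this page.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ROSE): read an entire file into memory,
// throwing std::runtime_error if the file cannot be opened.
std::string read_file(const std::string &path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Intended use, assuming the IPDParser constructor documented on this page:
//   Partitioner p;
//   std::string text = read_file("test.ipd");
//   Partitioner::IPDParser(&p, text.data(), text.size(), "test.ipd").parse();
```

The string must outlive the parser, since the parser only holds a pointer to the input.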


Public Member Functions

 IPDParser (Partitioner *p, const char *input, size_t len, const std::string &input_name="")
void parse ()
 Top-level parsing function.

Private Member Functions

void skip_space ()
bool is_terminal (const char *to_match)
bool is_symbol (const char *to_match)
bool is_string ()
bool is_number ()
void match_terminal (const char *to_match)
void match_symbol (const char *to_match)
std::string match_symbol ()
std::string match_string ()
rose_addr_t match_number ()
std::string match_asm ()
bool parse_File ()
bool parse_Declaration ()
bool parse_FuncDecl ()
bool parse_FuncBody ()
bool parse_FuncStmtList ()
bool parse_FuncStmt ()
bool parse_ReturnSpec ()
bool parse_BlockDecl ()
bool parse_BlockBody ()
bool parse_BlockStmtList ()
bool parse_BlockStmt ()
bool parse_Alias ()
bool parse_Successors ()

Private Attributes

Partitioner * partitioner
 Partitioner to be initialized.
const char * input
 Input to be parsed.
size_t len
 Length of input, not counting NUL termination (if any).
std::string input_name
 Optional name of input (usually a file name).
size_t at
 Current parse position w.r.t. "input".
Function * cur_func
 Non-null when inside a FuncBody nonterminal.
BlockConfig * cur_block
 Non-null when inside a BlockBody nonterminal.

Classes

class  Exception


Constructor & Destructor Documentation

Partitioner::IPDParser::IPDParser ( Partitioner *  p,
const char *  input,
size_t  len,
const std::string &  input_name = "" 
) [inline]


Member Function Documentation

void Partitioner::IPDParser::parse (  ) 

Top-level parsing function.

void Partitioner::IPDParser::skip_space (  )  [private]

bool Partitioner::IPDParser::is_terminal ( const char *  to_match  )  [private]

bool Partitioner::IPDParser::is_symbol ( const char *  to_match  )  [private]

bool Partitioner::IPDParser::is_string (  )  [private]

bool Partitioner::IPDParser::is_number (  )  [private]

void Partitioner::IPDParser::match_terminal ( const char *  to_match  )  [private]

void Partitioner::IPDParser::match_symbol ( const char *  to_match  )  [private]

std::string Partitioner::IPDParser::match_symbol (  )  [private]

std::string Partitioner::IPDParser::match_string (  )  [private]

rose_addr_t Partitioner::IPDParser::match_number (  )  [private]

std::string Partitioner::IPDParser::match_asm (  )  [private]

bool Partitioner::IPDParser::parse_File (  )  [private]

bool Partitioner::IPDParser::parse_Declaration (  )  [private]

bool Partitioner::IPDParser::parse_FuncDecl (  )  [private]

bool Partitioner::IPDParser::parse_FuncBody (  )  [private]

bool Partitioner::IPDParser::parse_FuncStmtList (  )  [private]

bool Partitioner::IPDParser::parse_FuncStmt (  )  [private]

bool Partitioner::IPDParser::parse_ReturnSpec (  )  [private]

bool Partitioner::IPDParser::parse_BlockDecl (  )  [private]

bool Partitioner::IPDParser::parse_BlockBody (  )  [private]

bool Partitioner::IPDParser::parse_BlockStmtList (  )  [private]

bool Partitioner::IPDParser::parse_BlockStmt (  )  [private]

bool Partitioner::IPDParser::parse_Alias (  )  [private]

bool Partitioner::IPDParser::parse_Successors (  )  [private]


Member Data Documentation

Partitioner* Partitioner::IPDParser::partitioner [private]

Partitioner to be initialized.

const char* Partitioner::IPDParser::input [private]

Input to be parsed.

size_t Partitioner::IPDParser::len [private]

Length of input, not counting NUL termination (if any).

std::string Partitioner::IPDParser::input_name [private]

Optional name of input (usually a file name).

size_t Partitioner::IPDParser::at [private]

Current parse position w.r.t. "input".

Function* Partitioner::IPDParser::cur_func [private]

Non-null when inside a FuncBody nonterminal.

BlockConfig* Partitioner::IPDParser::cur_block [private]

Non-null when inside a BlockBody nonterminal.


The documentation for this class was generated from the following file:
Generated on Tue Jan 31 05:34:23 2012 for ROSE by  doxygen 1.4.7