#include <Partitioner.h>
IPD files are text-based descriptions of the functions and basic blocks used by the partitioner. They allow the user to seed the partitioner with additional information that it cannot otherwise obtain.
For instance, the analyst may know that a function begins at a certain virtual address, but for some reason the partitioner does not discover this address in its normal mode of operation. The analyst can create an IPD file that describes the function so that the partitioning process finds it.
An IPD file is able to:
The language non-terminals are:
File := Declaration+
Declaration := FuncDecl | BlockDecl
FuncDecl := 'function' Address [Name] [FuncBody]
FuncBody := '{' FuncStmtList '}'
FuncStmtList := FuncStmt [';' FuncStmtList]
FuncStmt := ( Empty | BlockDecl | ReturnSpec )
ReturnSpec := 'return' | 'returns' | 'noreturn'
BlockDecl := 'block' Address Integer [BlockBody]
BlockBody := '{' BlockStmtList '}'
BlockStmtList := BlockStmt [';' BlockStmtList]
BlockStmt := ( Empty | Alias | Successors ) ';'
Alias := 'alias' Address
Successors := ('successor' | 'successors') [SuccessorAddrList|AssemblyCode]
SuccessorAddrList := AddressList | AddressList '...' | '...'
AddressList := Address ( ',' AddressList )*
Address: Integer
Integer: DECIMAL_INTEGER | OCTAL_INTEGER | HEXADECIMAL_INTEGER
Name: STRING
AssemblyCode: asm '{' ASSEMBLY '}'
Language terminals:
HEXADECIMAL_INTEGER: as in C, for example, 0x08045fe2
OCTAL_INTEGER: as in C, for example, 0775
DECIMAL_INTEGER: as in C, for example, 1234
STRING: double quoted. Use backslash to escape embedded double quotes
ASSEMBLY: x86 assembly instructions (must contain balanced curly braces, if any)
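Because the three integer terminals follow C conventions, a single conversion with automatic radix detection covers all of them. The following is a minimal sketch, not ROSE code; the helper name parse_ipd_integer is hypothetical and performs no error checking.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ROSE): converts the text of an IPD
// integer terminal to a number. Passing base 0 makes strtoull choose the
// radix from the prefix, as C does: "0x" is hexadecimal, a leading "0" is
// octal, and anything else is decimal.
std::uint64_t parse_ipd_integer(const std::string &text) {
    return std::strtoull(text.c_str(), nullptr, 0);
}
```

Applied to the examples above, parse_ipd_integer("0775") yields 509 and parse_ipd_integer("0x08045fe2") yields the same value as the C literal 0x08045fe2.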
Comments begin with a hash ('#') and continue to the end of the line. The hash character is not treated specially inside quoted strings. Comments within an ASSEMBLY terminal must conform to the syntax accepted by the Netwide Assembler (nasm), namely semicolon in place of a hash.
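The comment rule for text outside ASSEMBLY terminals can be sketched as a single scan that tracks whether the current position is inside a quoted string. This is an illustration under stated assumptions, not the parser's actual implementation: the helper name strip_ipd_comment is hypothetical, it handles one line at a time, and it does not treat ASSEMBLY terminals specially.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ROSE): returns `line` with a trailing
// '#' comment removed. A hash inside a double-quoted string is kept, and a
// backslash inside a string escapes the next character, so an escaped
// double quote does not end the string.
std::string strip_ipd_comment(const std::string &line) {
    bool in_string = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_string) {
            if (c == '\\' && i + 1 < line.size()) {
                ++i;                      // skip the escaped character
            } else if (c == '"') {
                in_string = false;        // closing quote
            }
        } else if (c == '"') {
            in_string = true;             // opening quote
        } else if (c == '#') {
            return line.substr(0, i);     // comment runs to end of line
        }
    }
    return line;                          // no comment found
}
```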
A block declaration specifies the virtual memory address of the block's first instruction. The integer after the address specifies the number of instructions in the block. If the specified length is less than the number of instructions that ROSE would otherwise place in the block at that address, then ROSE will create a block of exactly the specified size. Likewise, if the specified address is midway into a block that ROSE would otherwise create, ROSE will create a block at the specified address anyway, causing the previous instructions to be in a separate block (or blocks). If the specified block size is larger than what ROSE would otherwise place in the block, the block will be created with fewer instructions but the BlockBody will be ignored.
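The size rules above reduce to a small decision, sketched here as a model of the described behavior rather than ROSE's code. The names BlockPlan and plan_block are hypothetical, and "natural" stands for the number of instructions ROSE would otherwise place in the block at that address.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical model (not part of ROSE) of how a declared block size
// interacts with the size ROSE would choose on its own.
struct BlockPlan {
    std::size_t ninsns;   // number of instructions in the created block
    bool body_applied;    // whether the BlockBody is honored
};

BlockPlan plan_block(std::size_t requested, std::size_t natural) {
    BlockPlan plan;
    // A smaller request truncates the block; a larger request cannot
    // extend it, so the block keeps its natural size.
    plan.ninsns = std::min(requested, natural);
    // The BlockBody is ignored when the request exceeds the natural size.
    plan.body_applied = requested <= natural;
    return plan;
}
```

For example, requesting 3 instructions where ROSE would place 5 gives a 3-instruction block whose body is applied, while requesting 8 gives a 5-instruction block whose body is ignored.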
A function declaration specifies the virtual memory address of the entry point of a function. The body may specify whether the function returns. As of this writing [2010-05-13] a function declared as non-returning will be marked as returning if ROSE discovers that a basic block of the function returns.
If a block declaration appears inside a function declaration, then ROSE will assign the block to the function.
The block 'alias' attribute is used to indicate that two basic blocks perform the exact same operation. The specified address is the address of the basic block to use instead of this basic block. All control-flow edges pointing to this block will be rewritten to point to the specified address instead.
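The edge rewriting described above can be pictured with a small stand-alone model. This is a sketch under assumptions, not the partitioner's data structures: the control-flow graph is reduced to a map from block address to successor addresses, the names Addr, SuccessorMap, and apply_aliases are hypothetical, and chains of aliases are not followed.

```cpp
#include <cstdint>
#include <map>
#include <set>

typedef std::uint64_t Addr;
// Block address -> addresses of its control-flow successors.
typedef std::map<Addr, std::set<Addr> > SuccessorMap;

// Hypothetical helper (not part of ROSE): for every edge that points to an
// aliased block, redirect it to the address named by that block's alias.
// `aliases` maps the address of the aliased block to the replacement block.
void apply_aliases(SuccessorMap &cfg, const std::map<Addr, Addr> &aliases) {
    for (SuccessorMap::iterator bi = cfg.begin(); bi != cfg.end(); ++bi) {
        std::set<Addr> rewritten;
        for (Addr succ : bi->second) {
            std::map<Addr, Addr>::const_iterator found = aliases.find(succ);
            rewritten.insert(found == aliases.end() ? succ : found->second);
        }
        bi->second.swap(rewritten);
    }
}
```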
Example file:
function 0x805116 "func11" {          # declare a new function named "func11"
    returns;                          # this function returns to callers
    block 0x805116 {                  # block at 0x805116 is part of func11
        alias 0x8052116, 0x8052126    # use block 0x805116 in place of 0x8052116 and 0x8052126
    }
}
A block declaration can specify control-flow successors in two ways: as a list of addresses, or as an x86 assembly language program that is interpreted by ROSE. The benefit of using a program to determine the successors is that the program can directly extract information, such as jump tables, from the specimen executable.
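For the address-list form, the SuccessorAddrList production from the grammar can be recognized with a short hand-written routine. The sketch below is illustrative only and is not the code behind parse_Successors(); the names SuccessorList and parse_successor_addr_list are hypothetical, the input is assumed to contain just the list text, and the routine records whether a trailing '...' was present without assigning it any meaning.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical result type (not part of ROSE).
struct SuccessorList {
    std::vector<std::uint64_t> addrs;  // addresses in the order written
    bool has_ellipsis = false;         // true if the list ends with "..."
};

static bool at_digit(const std::string &s, std::size_t at) {
    return at < s.size() && std::isdigit(static_cast<unsigned char>(s[at]));
}

static void skip_space(const std::string &s, std::size_t &at) {
    while (at < s.size() && std::isspace(static_cast<unsigned char>(s[at])))
        ++at;
}

// Recognizes: AddressList | AddressList '...' | '...'
// Returns false if the text does not match or has trailing garbage.
bool parse_successor_addr_list(const std::string &s, SuccessorList &out) {
    std::size_t at = 0;
    skip_space(s, at);
    while (at_digit(s, at)) {
        char *end = nullptr;
        out.addrs.push_back(std::strtoull(s.c_str() + at, &end, 0));
        at = static_cast<std::size_t>(end - s.c_str());
        skip_space(s, at);
        if (at >= s.size() || s[at] != ',')
            break;                       // no more comma-separated addresses
        ++at;
        skip_space(s, at);
        if (!at_digit(s, at))
            return false;                // a comma must be followed by an address
    }
    if (s.compare(at, 3, "...") == 0) {
        out.has_ellipsis = true;
        at += 3;
        skip_space(s, at);
    }
    return at == s.size() && (!out.addrs.empty() || out.has_ellipsis);
}
```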
The assembly source code is fed to the Netwide Assembler, nasm (http://www.nasm.us/), which assembles it into i386 machine code. When ROSE needs to figure out the successors for a basic block it will interpret the basic block, then load the successor program and interpret it, then extract the successor list from the program's return value. ROSE interprets the program rather than running it directly so that the program can operate on unknown, symbolic data values rather than actual 32-bit numbers.
The successor program is interpreted in a context that makes it appear to have been called (via CALL instruction) from the end of the basic block being analyzed. These arguments are passed to the program:
The successor program may either fall off the end or execute a RET instruction.
For instance, if the 5-instruction block at virtual address 0x00c01115 ends with an indirect jump through a 256-element jump table beginning at 0x00c037fa, then a program to compute the successors might look like this:
block 0x00c01115 5 {
successors asm {
push ebp
mov ebp, esp
; ecx is the base address of the successors return vector,
; the first element of which is the vector size.
mov ecx, [ebp+8]
add ecx, 4
; loop over the entries in the jump table, copying each
; address from the jump table to the svec return value
xor eax, eax
loop:
cmp eax, 256
je done
mov ebx, [0x00c037fa+eax*4]
mov [ecx+eax*4], ebx
inc eax
jmp loop
done:
; set the number of entries in the svec
mov ecx, [ebp+8]
mov DWORD [ecx], 256
mov esp, ebp
pop ebp
ret
}
}
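For readers less familiar with x86 assembly, the effect of the program above on the successor vector can be modeled in C++. This is only a sketch of the data layout implied by the assembly comments (element 0 holds the count, followed by the addresses); the function name fill_successors is hypothetical, and plain arrays stand in for the specimen's memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not part of ROSE) of the assembly program: copy each
// jump-table entry into the successor vector starting at element 1, then
// store the number of entries in element 0.
void fill_successors(const std::uint32_t *jump_table, std::size_t nentries,
                     std::vector<std::uint32_t> &svec) {
    svec.assign(nentries + 1, 0);
    for (std::size_t i = 0; i < nentries; ++i)
        svec[i + 1] = jump_table[i];     // mov [ecx+eax*4], ebx
    svec[0] = static_cast<std::uint32_t>(nentries);  // mov DWORD [ecx], 256
}
```

In the assembly version the table lives at 0x00c037fa and has 256 entries, so the resulting vector would hold 257 elements.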
The easiest way to parse an IPD file is to read it into memory and then call the parse() method. The following code demonstrates the use of mmap to read the file into memory, parse it, and release it from memory. For simplicity, we do not check for errors in this example.
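#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

Partitioner p;
int fd = open("test.ipd", O_RDONLY);
struct stat sb;
fstat(fd, &sb);
// Not const: munmap() takes a void* argument.
char *content = (char*)mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Partitioner::IPDParser(&p, content, sb.st_size).parse(); // the constructor takes a Partitioner pointer
munmap(content, sb.st_size);
close(fd);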
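Where mmap is unavailable or error reporting matters, the file can instead be read into a string with the standard library. The sketch below is an alternative to the mmap approach, not something taken from ROSE; the helper name read_ipd_file is hypothetical.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ROSE): reads the whole file at `path`
// into `out`. Returns false if the file cannot be opened or read.
bool read_ipd_file(const std::string &path, std::string &out) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;                // open failed
    std::ostringstream buffer;
    buffer << in.rdbuf();            // copy the entire stream
    if (in.bad())
        return false;                // read error
    out = buffer.str();
    return true;
}
```

Given the constructor signature listed below, the buffer could then be passed to the parser as Partitioner::IPDParser(&p, text.data(), text.size(), path).parse(), where text is the string filled in by the helper.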
Public Member Functions
    IPDParser (Partitioner *p, const char *input, size_t len, const std::string &input_name="")
    void parse ()
        Top-level parsing function.

Private Member Functions
    void skip_space ()
    bool is_terminal (const char *to_match)
    bool is_symbol (const char *to_match)
    bool is_string ()
    bool is_number ()
    void match_terminal (const char *to_match)
    void match_symbol (const char *to_match)
    std::string match_symbol ()
    std::string match_string ()
    rose_addr_t match_number ()
    std::string match_asm ()
    bool parse_File ()
    bool parse_Declaration ()
    bool parse_FuncDecl ()
    bool parse_FuncBody ()
    bool parse_FuncStmtList ()
    bool parse_FuncStmt ()
    bool parse_ReturnSpec ()
    bool parse_BlockDecl ()
    bool parse_BlockBody ()
    bool parse_BlockStmtList ()
    bool parse_BlockStmt ()
    bool parse_Alias ()
    bool parse_Successors ()

Private Attributes
    Partitioner * partitioner
        Partitioner to be initialized.
    const char * input
        Input to be parsed.
    size_t len
        Length of input, not counting NUL termination (if any).
    std::string input_name
        Optional name of input (usually a file name).
    size_t at
        Current parse position w.r.t. "input".
    Function * cur_func
        Non-null when inside a FuncBody nonterminal.
    BlockConfig * cur_block
        Non-null when inside a BlockBody nonterminal.

Classes
    class Exception

Partitioner::IPDParser::IPDParser (Partitioner *p, const char *input, size_t len, const std::string &input_name = "") [inline]
void Partitioner::IPDParser::parse ()
    Top-level parsing function.
void Partitioner::IPDParser::skip_space () [private]
bool Partitioner::IPDParser::is_terminal (const char *to_match) [private]
bool Partitioner::IPDParser::is_symbol (const char *to_match) [private]
bool Partitioner::IPDParser::is_string () [private]
bool Partitioner::IPDParser::is_number () [private]
void Partitioner::IPDParser::match_terminal (const char *to_match) [private]
void Partitioner::IPDParser::match_symbol (const char *to_match) [private]
std::string Partitioner::IPDParser::match_symbol () [private]
std::string Partitioner::IPDParser::match_string () [private]
rose_addr_t Partitioner::IPDParser::match_number () [private]
std::string Partitioner::IPDParser::match_asm () [private]
bool Partitioner::IPDParser::parse_File () [private]
bool Partitioner::IPDParser::parse_Declaration () [private]
bool Partitioner::IPDParser::parse_FuncDecl () [private]
bool Partitioner::IPDParser::parse_FuncBody () [private]
bool Partitioner::IPDParser::parse_FuncStmtList () [private]
bool Partitioner::IPDParser::parse_FuncStmt () [private]
bool Partitioner::IPDParser::parse_ReturnSpec () [private]
bool Partitioner::IPDParser::parse_BlockDecl () [private]
bool Partitioner::IPDParser::parse_BlockBody () [private]
bool Partitioner::IPDParser::parse_BlockStmtList () [private]
bool Partitioner::IPDParser::parse_BlockStmt () [private]
bool Partitioner::IPDParser::parse_Alias () [private]
bool Partitioner::IPDParser::parse_Successors () [private]

Partitioner* Partitioner::IPDParser::partitioner [private]
    Partitioner to be initialized.
const char* Partitioner::IPDParser::input [private]
    Input to be parsed.
size_t Partitioner::IPDParser::len [private]
    Length of input, not counting NUL termination (if any).
std::string Partitioner::IPDParser::input_name [private]
    Optional name of input (usually a file name).
size_t Partitioner::IPDParser::at [private]
    Current parse position w.r.t. "input".
Function* Partitioner::IPDParser::cur_func [private]
    Non-null when inside a FuncBody nonterminal.
BlockConfig* Partitioner::IPDParser::cur_block [private]
    Non-null when inside a BlockBody nonterminal.