ROSE  0.11.56.0
AstMatching.docs
Introduction
============

The AstMatching mechanism allows specifying arbitrarily large patterns
to be matched on any subtree in the AST. The patterns are specified as
strings, and the type names of the AST nodes are used to describe the
AST patterns. Additionally, variables and some operators are available
for specifying complex patterns. Subtrees can be ignored in the
matching by using '_'. The binary operator '|' combines different
matching subexpressions into one expression. Variables name the
places where pointers to matched subtrees are stored in the matching
result for further processing by the user.

In the following example we match assignments with variables on both
sides, such as x=y, and bind each matched assignment to the variable $R.

 AstMatching m;
 MatchResult res=m.performMatching("$R=SgAssignOp(SgVarRefExp,SgVarRefExp)",astRoot);

where 'astRoot' is a pointer to some node in the AST.

In the above example all subtrees representing an assign operation
with two variables as operands are matched. The dollar sign denotes
a variable; the pointers to the matched subtrees are bound to the
variable $R. The result with all matched assignments is stored in the
variable res of type MatchResult. The matching result is a list of
maps, where each map represents the result of one successful match and
holds pairs of a variable name and a pointer to the respective AST
subtree.

Ignoring subtrees (wildcards '_')
---------------------------------

Subtrees can be ignored in the matching by using '_' in the match
expression. For example, SgAssignOp(_,_) matches all assignment nodes
in the AST while ignoring the structure of the ASTs representing the
lhs and rhs.

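The following is a minimal C++ sketch of how a '_' wildcard can be handled in a recursive structural match. It is a toy model, not ROSE's implementation: the types Node and Pattern and the function matches are hypothetical stand-ins, patterns are built directly as data instead of being parsed from strings, and the sketch assumes that a type name requires the node's number of children to equal the number of argument patterns.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not ROSE's SgNode): a type name
// plus child pointers, where a null pointer models a null child.
struct Node {
  std::string type;
  std::vector<Node*> children;
};

// Hypothetical pattern term: a type name with argument patterns, or "_",
// which matches any subtree (including a null child) without inspecting it.
struct Pattern {
  std::string type;
  std::vector<Pattern> args;
};

// Returns true if the subtree rooted at n matches pattern p.
bool matches(const Pattern& p, const Node* n) {
  if (p.type == "_") return true;  // wildcard: ignore this subtree entirely
  if (n == nullptr || n->type != p.type) return false;
  if (n->children.size() != p.args.size()) return false;  // arity must agree
  for (std::size_t i = 0; i < p.args.size(); ++i)
    if (!matches(p.args[i], n->children[i])) return false;
  return true;
}
```

In this model SgAssignOp(_,_) corresponds to Pattern{"SgAssignOp", {{"_", {}}, {"_", {}}}} and accepts every two-child SgAssignOp node, whatever its operands look like.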
Variables
=========

Variables are used to specify that pointers to matched subtrees are
stored in the matching result. An arbitrary number of variables can be
used, and two forms of use are supported. A variable is denoted by a
leading dollar sign followed by an arbitrary number of letters and
underscores (a single underscore on its own is the wildcard). The
variable assignment notation binds the pointer to the root of a
matched pattern to a variable. For example, $R=SgAssignOp(SgVarRefExp,_)
matches all assignments that have a variable on the left-hand side and
some expression on the right-hand side. Alternatively, we can use
$R=SgAssignOp($X=SgVarRefExp,$Y=_); in this case we also store a pointer
to the matched variable node and a pointer to the expression on the
rhs in the match result. The expression $Y=_ can be abbreviated to
$Y, so $R=SgAssignOp($X=SgVarRefExp,$Y) can be used instead. Assigning
variables to variables, such as $Z=$Y, is not allowed.

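A minimal C++ sketch of variable binding in a structural match follows. It is a toy model under stated assumptions, not the AstMatching implementation: Node, Pattern, Bindings and matchAndBind are hypothetical names, a pattern term carries an optional variable name, and the term {"$Y", "_", {}} plays the role of $Y=_ (or its shorthand $Y).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not ROSE's SgNode).
struct Node {
  std::string type;
  std::vector<Node*> children;
};

// Variable name -> pointer to the matched subtree, like one map of a match result.
using Bindings = std::map<std::string, Node*>;

// Hypothetical pattern term with an optional variable: {"$X", "SgVarRefExp", {}}
// models $X=SgVarRefExp, and {"$Y", "_", {}} models $Y=_ (shorthand: $Y).
struct Pattern {
  std::string var;   // empty if no variable is attached
  std::string type;  // node type name, or "_" for a wildcard
  std::vector<Pattern> args;
};

// Matches p against the subtree rooted at n and records a pointer for every
// variable in p. On failure, b may hold partial bindings, so callers should
// use a fresh map per attempt and discard it if the match fails.
bool matchAndBind(const Pattern& p, Node* n, Bindings& b) {
  if (p.type != "_") {  // '_' accepts the subtree without looking inside
    if (n == nullptr || n->type != p.type || n->children.size() != p.args.size())
      return false;
    for (std::size_t i = 0; i < p.args.size(); ++i)
      if (!matchAndBind(p.args[i], n->children[i], b)) return false;
  }
  if (!p.var.empty()) b[p.var] = n;  // store pointer to the matched subtree
  return true;
}
```

For $R=SgAssignOp($X=SgVarRefExp,$Y) applied to an assignment x=y+1, the resulting map holds three entries: $R points to the assignment, $X to the variable reference and $Y to the add expression.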
Null-values
===========

Null values can be matched explicitly by using "null" in a match
expression. For example, $X=SgForStatement(_,_,_,_,null) matches
all SgForStatement terms whose 5th argument is a null pointer.

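The sketch below models "null" in a toy C++ matcher and uses it for the kind of consistency check mentioned in the examples. It is hypothetical and not the AstMatching code: a null child is modelled as a null pointer in Node::children, "null" matches only such a child, "_" without arguments ignores a subtree, and "_" with arguments is assumed to accept any node type while still checking the children, as in _(null).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; a null entry in children models a null child.
struct Node {
  std::string type;
  std::vector<Node*> children;
};

// Hypothetical pattern term. "null" matches only a null child; "_" without
// arguments ignores the subtree; "_" with arguments accepts any node type
// but still checks the children (as in _(null)).
struct Pattern {
  std::string type;
  std::vector<Pattern> args;
};

bool matches(const Pattern& p, const Node* n) {
  if (p.type == "null") return n == nullptr;             // explicit null match
  if (p.type == "_" && p.args.empty()) return true;      // wildcard subtree
  if (n == nullptr) return false;
  if (p.type != "_" && n->type != p.type) return false;  // '_(...)': any type
  if (n->children.size() != p.args.size()) return false;
  for (std::size_t i = 0; i < p.args.size(); ++i)
    if (!matches(p.args[i], n->children[i])) return false;
  return true;
}

// Collects every node of the tree (pre-order) that matches p, e.g. to find
// SgInitializedName nodes without an initializer during a consistency check.
void findAll(const Pattern& p, Node* n, std::vector<Node*>& out) {
  if (n == nullptr) return;
  if (matches(p, n)) out.push_back(n);
  for (Node* c : n->children) findAll(p, c, out);
}
```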
Operator '#'
============
Placing the operator '#' in front of an argument in a match expression excludes the corresponding subtree from subsequent matching, i.e. the marked subtrees are not traversed. For example, if we only want to match the outermost for-statements, but no nested for-statements, we can use:

 $FOR=SgForStatement(_,_,_,#_)

This matches only the outer for-statements, because the body (4th argument) is excluded from the matching. Without '#' the inner loops would be matched as well.

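To illustrate the effect of '#', here is a toy C++ model of a pre-order search in which a successful match records the '#'-marked children and the traversal never enters them afterwards. The names Node, Pattern, matches and findAll, and the excluded flag standing in for '#', are hypothetical; the sketch does not reproduce how AstMatching tracks excluded subtrees internally.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not ROSE's SgNode).
struct Node {
  std::string type;
  std::vector<Node*> children;
};

// Hypothetical pattern term; `excluded` models a leading '#' on an argument.
struct Pattern {
  std::string type;  // node type name, or "_" for a wildcard
  std::vector<Pattern> args;
  bool excluded = false;
};

// Structural match; on success, skip receives the children marked with '#'.
bool matches(const Pattern& p, const Node* n, std::vector<const Node*>& skip) {
  if (p.type == "_" && p.args.empty()) return true;
  if (n == nullptr || n->type != p.type || n->children.size() != p.args.size())
    return false;
  for (std::size_t i = 0; i < p.args.size(); ++i) {
    if (!matches(p.args[i], n->children[i], skip)) return false;
    if (p.args[i].excluded) skip.push_back(n->children[i]);
  }
  return true;
}

// Pre-order search for all matches of p. Subtrees excluded by a successful
// match are recorded in `excluded` and are never traversed afterwards.
void findAll(const Pattern& p, const Node* n, std::set<const Node*>& excluded,
             std::vector<const Node*>& out) {
  if (n == nullptr || excluded.count(n) != 0) return;
  std::vector<const Node*> skip;  // local, so a failed match excludes nothing
  if (matches(p, n, skip)) {
    out.push_back(n);
    excluded.insert(skip.begin(), skip.end());
  }
  for (const Node* c : n->children) findAll(p, c, excluded, out);
}
```

Applied to two nested for-loops, the model reports one match for SgForStatement(_,_,_,#_) and two for SgForStatement(_,_,_,_), mirroring the behaviour described above.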
=============
Operator '..'
=============
This operator can be used in match expressions to specify that an arbitrary
number of arguments can follow. For example, SgBasicBlock($First,..)
binds the very first statement of an SgBasicBlock to $First. Since an
SgBasicBlock can have an arbitrary number of children, this is quite
useful. The operator '..' can be used at most once in the argument list
of a single node, but any number of times within a match pattern:
e.g. SgBasicBlock(SgForStatement(_,$Cond,..),..) is OK, but
SgBasicBlock(_,..,_,..) is not.

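The arity check behind '..' can be sketched in C++ as follows. This is a toy model with hypothetical names (Node, Pattern, matches, and the flag openEnded standing in for '..'); it only models '..' as the last argument, which is the form used in the examples above, and makes no claim about other positions AstMatching may accept.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not ROSE's SgNode).
struct Node {
  std::string type;
  std::vector<Node*> children;
};

// Hypothetical pattern term; `openEnded` models a trailing '..' after the
// explicit arguments, e.g. SgBasicBlock(_,..).
struct Pattern {
  std::string type;  // node type name, or "_" for a wildcard
  std::vector<Pattern> args;
  bool openEnded = false;
};

bool matches(const Pattern& p, const Node* n) {
  if (p.type == "_") return true;
  if (n == nullptr || n->type != p.type) return false;
  // Without '..' the arity must agree exactly; with '..' the node only needs
  // at least as many children as there are explicit argument patterns.
  const std::size_t want = p.args.size(), have = n->children.size();
  if (p.openEnded ? have < want : have != want) return false;
  for (std::size_t i = 0; i < want; ++i)  // remaining children are not inspected
    if (!matches(p.args[i], n->children[i])) return false;
  return true;
}
```

A block with two statements therefore matches SgBasicBlock(_,..) but not SgBasicBlock(_), and an empty block matches neither.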
==============================================================================
Examples:
==============================================================================
* match("$R=SgAssignOp(_,_)",astRoot);
 Match all assignment operators in an AST.

* match("$R=SgAssignOp(SgVarRefExp,SgIntVal)",astRoot);
 Match all assignment operators with a variable on the lhs and an integer value on the rhs.

* match("$FORROOT=SgForStatement(_,_,_,#_)",astRoot);
 Match all outermost for-loops, but no nested for-loops. The operator '#' ensures that the match expression is not applied to the AST representing the body of the for-statement (4th argument). The pointer to the root of the AST representing the for-loop is bound to $FORROOT.

* match("$N=_(null)",astRoot);
 Match all nodes with arity 1 whose single child is a null value. Such match expressions are mainly useful for performing consistency checks on the AST.

* match("$N=SgInitializedName(null)",astRoot); // many of those exist in a default ROSE AST
 Specifically match all SgInitializedName nodes whose single child (the initializer) is a null pointer.

* match("SgForStatement(_,$LoopCond,_,_)|SgWhileStmt($LoopCond,_)|SgDoWhileStmt(_,$LoopCond)",astRoot);
 Match different loop constructs and bind variable $LoopCond to the respective loop condition.

* match("SgAssignOp(SgVarRefExp,SgAddOp($X,$Y))",astRoot);
 Match assignments with a variable on the lhs and an add-operator at the root of the rhs. The pointers to the sub-ASTs representing the lhs and rhs of the add-operator are bound to the variables $X and $Y for each match in the AST.

* match("$Func=SgFunctionCallExp($FuncRef,$Params)",astRoot);
 Match all function calls and bind variable $Func to the root of each such expression, $FuncRef to the SgFunctionRefExp (which can be used to obtain the name), and $Params to the AST representing the parameters.

Accessing matching results
==============================================================================

The results are collected in a std::list of std::maps. Each map
represents one successful match at one location in the AST and contains
all the bound variables. The variables can be accessed by name using
the map's subscript operator. The number of elements (maps) in the
list corresponds to the number of matched patterns in the AST.

The pointers to matched patterns in the AST can be accessed as follows,
e.g.:
 /* 1 */ AstMatching m;
 /* 2 */ MatchResult res=m.performMatching("$R=SgInitializedName($X)",root);
 /* 3 */ SgNode* myRvariable=res.front()["$R"];
 /* 4 */ SgNode* myXvariable=res.front()["$X"];

In line 1 the AstMatching object is created. In line 2 the match
expression and the root node of the AST are provided to the matching
mechanism and the results are computed. In line 3 the variable $R (and
in line 4 the variable $X) of the first match is accessed; res must not
be empty at this point. Each pointer value refers to the AST node that
was bound to that variable in the match. The matching can be performed
on any AST subtree of interest by letting 'root' point to the
respective AST subtree when the match operation is started.

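Because each match is a std::map, indexing it with operator[] silently inserts a null entry when a variable name is misspelled or was not bound, and operator[] cannot be used on a const map. The following sketch shows a read-only alternative based on std::map::find. It relies only on standard-library behaviour; SgNode here is a hypothetical placeholder struct rather than ROSE's class, the two typedefs mirror the list-of-maps structure described above, and lookup and allBindings are hypothetical helper names.

```cpp
#include <list>
#include <map>
#include <string>
#include <vector>

// Stand-ins mirroring the result types described above (a std::list of
// std::maps from variable names to node pointers). SgNode here is a
// hypothetical placeholder, not ROSE's class.
struct SgNode {
  std::string type;
};
using SingleMatchVarBindings = std::map<std::string, SgNode*>;
using MatchResult = std::list<SingleMatchVarBindings>;

// Returns the node bound to var in one match, or nullptr if var is not bound.
// Unlike operator[], find() never inserts an entry and works on a const map.
SgNode* lookup(const SingleMatchVarBindings& match, const std::string& var) {
  auto it = match.find(var);
  return it == match.end() ? nullptr : it->second;
}

// Collects the (non-null) node bound to var from every match in the result.
std::vector<SgNode*> allBindings(const MatchResult& result, const std::string& var) {
  std::vector<SgNode*> nodes;
  for (const SingleMatchVarBindings& match : result)
    if (SgNode* n = lookup(match, var)) nodes.push_back(n);
  return nodes;
}
```

Note that lookup cannot distinguish an unbound variable from a variable bound to a null child; if that distinction matters, test match.count(var) first.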
Here is a more elaborate code example that performs one match operation
on the entire ROSE AST and prints all match results using iterators:

 // Fragment from the matcher_demo program
 AstMatching m;
 MatchResult r=m.performMatching("$R=SgInitializedName(_)",root);
 // print result in readable form for demo purposes
 std::cout << "Number of matched patterns: " << r.size() << std::endl;
 for(MatchResult::iterator i=r.begin();i!=r.end();++i) {
   std::cout << "MATCH: \n";
   for(SingleMatchVarBindings::iterator vars_iter=(*i).begin();vars_iter!=(*i).end();++vars_iter) {
     SgNode* matchedTerm=(*vars_iter).second;
     std::cout << " VAR: " << (*vars_iter).first << "=" << generateAstTerm(matchedTerm) << " @" << matchedTerm << std::endl;
   }
   std::cout << std::endl;
 }

The variable matchedTerm is assigned the pointer to the respective
ROSE AST node that is bound to a variable. (*vars_iter).first is the
name of the variable as used in the match expression when calling
performMatching; in this example the only variable is $R. The
function generateAstTerm is an auxiliary function used to print an
AST in readable form on stdout. It is implemented using the same
Ast::iterator_with_null that is also used by the matching mechanism.

Example-output:
 MATCH:
 VAR: $R=SgInitializedName(null) @0x7f1f8914da00

 MATCH:
 VAR: $R=SgInitializedName(null) @0x7f1f8914db28

 MATCH:
 VAR: $R=SgInitializedName(SgAssignInitializer(SgIntVal)) @0x7f1f8914dc50

 MATCH:
 VAR: $R=SgInitializedName(null) @0x7f1f8914dd78

 MATCH:
 VAR: $R=SgInitializedName(null) @0x7f1f8914dea0

 MATCH:
 VAR: $R=SgInitializedName(SgAssignInitializer(SgIntVal)) @0x7f1f8914dfc8

Operator "|"
============

This operator combines multiple match expressions. For example,
"SgAddOp($L,$R)|SgSubtractOp($L,$R)" matches an SgAddOp and binds
pointers to its two children to $L and $R; if the node is not an
SgAddOp, it tries to match an SgSubtractOp in the same way. The
operator '|' performs short-circuit evaluation: the alternatives are
tried from left to right and matching stops as soon as one of them
matches successfully.
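The left-to-right, short-circuit behaviour of '|' can be sketched in C++ as follows. This is a toy model and not the AstMatching code: Node, Pattern, Bindings, matchAndBind and matchAlternatives are hypothetical names, each alternative is tried with a fresh binding map, and the bindings of the first successful alternative are kept.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not ROSE's SgNode).
struct Node {
  std::string type;
  std::vector<Node*> children;
};

using Bindings = std::map<std::string, Node*>;

// Hypothetical pattern term with an optional variable ("$L") and a type
// name or "_" wildcard.
struct Pattern {
  std::string var;
  std::string type;
  std::vector<Pattern> args;
};

bool matchAndBind(const Pattern& p, Node* n, Bindings& b) {
  if (p.type != "_") {
    if (n == nullptr || n->type != p.type || n->children.size() != p.args.size())
      return false;
    for (std::size_t i = 0; i < p.args.size(); ++i)
      if (!matchAndBind(p.args[i], n->children[i], b)) return false;
  }
  if (!p.var.empty()) b[p.var] = n;
  return true;
}

// Models A|B|...: tries the alternatives from left to right and stops at the
// first one that matches. Returns its index (and its bindings in b), or -1.
int matchAlternatives(const std::vector<Pattern>& alts, Node* n, Bindings& b) {
  for (std::size_t i = 0; i < alts.size(); ++i) {
    Bindings attempt;  // fresh map, so a failed alternative leaves no bindings
    if (matchAndBind(alts[i], n, attempt)) {
      b = attempt;
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

In this model, ordering matters: an alternative such as $A=_ placed first matches every node, so the alternatives after it are never tried.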