| //===----------------------------------------------------------------------===// | 
 | // Clang Static Analyzer | 
 | //===----------------------------------------------------------------------===// | 
 |  | 
 | = Library Structure = | 
 |  | 
 | The analyzer library has two layers: a (low-level) static analysis | 
 | engine (GRExprEngine.cpp and friends), and some static checkers | 
 | (*Checker.cpp).  The latter are built on top of the former via the | 
 | Checker and CheckerVisitor interfaces (Checker.h and | 
 | CheckerVisitor.h).  The Checker interface is designed to be minimal | 
 | and simple for checker writers, and attempts to isolate them from much | 
 | of the gore of the internal analysis engine. | 
 |  | 
 | = How It Works = | 
 |  | 
 | The analyzer is inspired by several foundational research papers ([1], | 
 | [2]).  (FIXME: kremenek to add more links) | 
 |  | 
 | In a nutshell, the analyzer is basically a source code simulator that | 
 | traces out possible paths of execution.  The state of the program | 
 | (values of variables and expressions) is encapsulated by the state | 
 | (ProgramState).  A location in the program is called a program point | 
 | (ProgramPoint), and the combination of state and program point is a | 
 | node in an exploded graph (ExplodedGraph).  The term "exploded" comes | 
 | from exploding the control-flow edges in the control-flow graph (CFG). | 
 |  | 
 | Conceptually the analyzer does a reachability analysis through the | 
 | ExplodedGraph.  We start at a root node, which has the entry program | 
 | point and initial state, and then simulate transitions by analyzing | 
 | individual expressions.  The analysis of an expression can cause the | 
 | state to change, resulting in a new node in the ExplodedGraph with an | 
 | updated program point and an updated state.  A bug is found by hitting | 
 | a node that satisfies some "bug condition" (basically a violation of a | 
 | checking invariant). | 
 |  | 
 | The analyzer traces out multiple paths by reasoning about branches and | 
 | then bifurcating the state: on the true branch the conditions of the | 
 | branch are assumed to be true and on the false branch the conditions | 
 | of the branch are assumed to be false.  Such "assumptions" create | 
 | constraints on the values of the program, and those constraints are | 
 | recorded in the ProgramState object (and are manipulated by the | 
 | ConstraintManager).  If assuming the conditions of a branch would | 
 | cause the constraints to be unsatisfiable, the branch is considered | 
 | infeasible and that path is not taken.  This is how we get | 
 | path-sensitivity.  We reduce exponential blow-up by caching nodes.  If | 
 | a new node with the same state and program point as an existing node | 
 | would get generated, the path "caches out" and we simply reuse the | 
 | existing node.  Thus the ExplodedGraph is not a DAG; it can contain | 
 | cycles as paths loop back onto each other and cache out. | 
 |  | 
 | ProgramState and ExplodedNodes are basically immutable once created.  Once | 
 | one creates a ProgramState, you need to create a new one to get a new | 
 | ProgramState.  This immutability is key since the ExplodedGraph represents | 
 | the behavior of the analyzed program from the entry point.  To | 
 | represent these efficiently, we use functional data structures (e.g., | 
 | ImmutableMaps) which share data between instances. | 
 |  | 
 | Finally, individual Checkers work by also manipulating the analysis | 
 | state.  The analyzer engine talks to them via a visitor interface. | 
 | For example, the PreVisitCallExpr() method is called by GRExprEngine | 
 | to tell the Checker that we are about to analyze a CallExpr, and the | 
 | checker is asked to check for any preconditions that might not be | 
 | satisfied.  The checker can do nothing, or it can generate a new | 
 | ProgramState and ExplodedNode which contains updated checker state.  If it | 
 | finds a bug, it can tell the BugReporter object about the bug, | 
 | providing it an ExplodedNode which is the last node in the path that | 
 | triggered the problem. | 
 |  | 
 | = Notes about C++ = | 
 |  | 
 | Since now constructors are seen before the variable that is constructed  | 
 | in the CFG, we create a temporary object as the destination region that  | 
 | is constructed into. See ExprEngine::VisitCXXConstructExpr(). | 
 |  | 
 | In ExprEngine::processCallExit(), we always bind the object region to the | 
 | evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the | 
 | corresponding lazy compound value if the variable is not a reference, and | 
 | bind the variable region to the lazy compound value. If the variable | 
 | is a reference, just use the object region as the initilizer value. | 
 |  | 
 | Before entering a C++ method (or ctor/dtor), the 'this' region is bound | 
 | to the object region. In ctors, we synthesize 'this' region with   | 
 | CXXRecordDecl*, which means we do not use type qualifiers. In methods, we | 
 | synthesize 'this' region with CXXMethodDecl*, which has getThisType()  | 
 | taking type qualifiers into account. It does not matter we use qualified | 
 | 'this' region in one method and unqualified 'this' region in another | 
 | method, because we only need to ensure the 'this' region is consistent  | 
 | when we synthesize it and create it directly from CXXThisExpr in a single | 
 | method call. | 
 |  | 
 | = Working on the Analyzer = | 
 |  | 
 | If you are interested in bringing up support for C++ expressions, the | 
 | best place to look is the visitation logic in GRExprEngine, which | 
 | handles the simulation of individual expressions.  There are plenty of | 
 | examples there of how other expressions are handled. | 
 |  | 
 | If you are interested in writing checkers, look at the Checker and | 
 | CheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look | 
 | at the files named *Checker.cpp for examples on how you can implement | 
 | these interfaces. | 
 |  | 
 | = Debugging the Analyzer = | 
 |  | 
 | There are some useful command-line options for debugging.  For example: | 
 |  | 
 | $ clang -cc1 -help | grep analyze | 
 |  -analyze-function <value> | 
 |  -analyzer-display-progress | 
 |  -analyzer-viz-egraph-graphviz | 
 |  ... | 
 |  | 
 | The first allows you to specify only analyzing a specific function. | 
 | The second prints to the console what function is being analyzed.  The | 
 | third generates a graphviz dot file of the ExplodedGraph.  This is | 
 | extremely useful when debugging the analyzer and viewing the | 
 | simulation results. | 
 |  | 
 | Of course, viewing the CFG (Control-Flow Graph) is also useful: | 
 |  | 
 | $ clang -cc1 -help | grep cfg | 
 |  -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses | 
 |  -cfg-add-initializers   Add C++ initializers to CFGs for all analyses | 
 |  -cfg-dump               Display Control-Flow Graphs | 
 |  -cfg-view               View Control-Flow Graphs using GraphViz | 
 |  -unoptimized-cfg        Generate unoptimized CFGs for all analyses | 
 |  | 
 | -cfg-dump dumps a textual representation of the CFG to the console, | 
 | and -cfg-view creates a GraphViz representation. | 
 |  | 
 | = References = | 
 |  | 
 | [1] Precise interprocedural dataflow analysis via graph reachability, | 
 |     T Reps, S Horwitz, and M Sagiv, POPL '95, | 
 |     http://portal.acm.org/citation.cfm?id=199462 | 
 |  | 
 | [2] A memory model for static analysis of C programs, Z Xu, T | 
 |     Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf |