Copyright © 2004-2005 Coverity, Inc.
Table of Contents
current_function_get_mangled_name
current_function_get_name
current_function_get_signature
current_function_get_class_name
current_function_get_method_name
current_function_is_ctor
current_function_is_dtor
current_file_get_name
current_file_lineno
Coverity Prevent™ checkers analyze code by traversing the statements and subexpressions of that code, matching each tree for properties the checker cares about, and acting when it finds those matches.
Coverity Extend is a language for defining Coverity Prevent checkers. The interesting parts of a checker are the rules for matching, and the actions to take on a match. The rest of the checker are infrastructure and boilerplate; the goal of Coverity Extend is to take care of those details for you and to make the task of writing match rules and actions as simple as possible.
Accordingly, the Coverity Prevent back end takes care of all the details up to and including traversing the code. It presents an Coverity Extend checker with the trees as it encounters them, and all a checker needs to do is define what to do when it sees those trees.
Coverity Extend is currently implemented as a set of C++ base classes for checkers, C++ support classes and functions for use inside the checker callbacks, and macros which abstract as much as possible the implementation details of checkers. Therefore in the current implementation, Coverity Extend checkers can include any valid C++ code before and after the checker definition, as well as within each callback function. This is done to minimize the learning curve on Coverity Extend -- since you are writing checkers for C++ code, you probably already know C++. However, checkers should not require very complicated C++ code, and for the most part once you understand patterns (see Chapter 3, Patterns) it will be very straightforward to write most checkers.
Coverity Extend's primary design goal is to be as easy to use as possible. It must be easy to express the rules we wish to check; if checkers are complicated they become error prone, which diminishes their value. Furthermore, Coverity Extend users typically are experts about the code they wish to check, but not necessarily experts on static source code analysis. The mechanics of writing a checker ought to be intuitive to allow the user to apply their expertise about the domain being checked without being obscured by the details of source code analysis.
This ease of use does conflict at times with performance and expressive power. Thus, Coverity Extend checkers are often slightly slower than the built in Coverity Prevent checkers, and do not have access to some of the more complicated and esoteric capabilities of the Coverity Prevent analysis engine. The last bit of speed and power/precision would greatly increase the complexity of Coverity Extend and make it unwieldy to use.
The following is a list of rules which can be checked with Coverity Extend. This is only a small list, not nearly an exhaustive one, meant to inspire ideas about what kinds of checkers might be helpful for a particular user.
A call to get_resource( res )
must be followed by a
call to release_resource( res )
, or: A call to release_resource( res )
must not be followed by a call to
use_resource( res )
.
Functions which may throw an exception must follow a specific naming convention.
Objects must use the correct pool when obtaining resources (e.g., memory).
Negative constants must not be assigned to unsigned variables.
Pointers must point to blocks which are multiples of the pointed to size
(e.g., an int *
cannot point to a block of size
10).
Global and static (file scope) variables may not be used.
If a function pushes an object onto a stack, it must also pop it before exiting.
There are two things you need in order to use Coverity Extend: the Coverity
Extend compiler, and the extend
directory. Each is
simply a compressed directory that can be uncompressed to the appropriate place
and used as-is. Both directories can be placed anywhere, independent of the
Coverity Prevent installation.
The Coverity Extend compiler is a regular C++ compiler with all the standard headers needed to compile a checker. We package this to ensure that checkers are compiled with the correct version of these headers, making installation and use of Coverity Extend as simple as possible.
The extend
directory consists of the following
subdirectories:
docs
: Coverity Extend documentation (this
file).
headers
: Coverity Prevent specific C++ header files
that are needed to compile an Coverity Extend checker. Coverity Extend users
do not need to look at most of these files. The files which might be helpful
to look at are include/extend-lang.h
, accessors/extend-patterns.h
, and lang/extend_checker_types.h
. Other headers contain details
which are not important to understanding Coverity Extend.
libs
: Binary files which are needed to link an
Coverity Extend checker.
samples
: Sample checkers which illustrate different
Coverity Extend features. See the README in this directory for details on what
features the different samples show.
The extend
directory also contains a script, build-checker
(build-checker.bat
for
Windows), which will compile a single Coverity Extend checker. Its usage is
described in Section 1.4, “Creating a Coverity Extend™ checker”.
The easiest way to create an Coverity Extend checker is by copying an
existing sample checker and modifying it to suit your needs. You can then
compile the checker using the build-checker
script
(build-checker.bat
on Windows) found in the top level
extend
directory. It expects one argument: the name of
the checker source file without the extension (and the script will assume the
file has a .c
extension).
Your checkers may be located anywhere, as long as the build script is run from the directory where the checker source file is.
The following enviroment variables affect how the build script works, and are required by the script.
EXTEND_COMP_DIR
: This specifies the directory where
the Coverity Extend compiler was installed.
EXTEND_ROOT
: This specifies where the extend
directory is installed.
The script will compile the checker and leave the executable in the current directory. This checker works the same way the main PREVENT integrated checker, cov-analyze[-cxx], works, and should be added to the PREVENT usage pipeline either instead of or immediately following running the integrated checker. Please see the PREVENT documentation for more details.
Below is a simple checker that reports every time a parameter is dereferenced in a program:
#include "extend-lang.h" START_EXTEND_CHECKER( param_deref, simple ); ANALYZE_TREE() { Parm p; if( MATCH( *p ) ) { OUTPUT_ERROR( "dereferenced parameter" << p ); } } END_EXTEND_CHECKER(); MAKE_MAIN( param_deref )
This checker includes all the essential elements of a Coverity Extend program:
Include statements: extend-lang.h
contains all
the definitions and declarations that make up the Coverity Extend language,
and therefore it is the only necessary header for an Coverity Extend
checker.
Checker start/end: The checker definition appears between the START_EXTEND_CHECKER
and END_EXTEND_CHECKER
commands. The first argument to START_EXTEND_CHECKER
is the name of the checker, and the
second is the type of checker (as described in Chapter 6, Coverity
Extend™ Checker Types).
Analysis functions: The analysis callback functions appear between the
start and end commands. The checker above only uses the ANALYZE_TREE
function, which is the most commonly used one,
but a checker may use any of the functions described in Section 2.3,
“Analysis Functions”. The body of the function makes use of a pattern
(Parm
) as well as some other macros. These are
described in Chapter 3, Patterns and Chapter 5, Analysis Actions,
respectively.
Main program functions: This checker uses MAKE_MAIN
to define a main function which runs the checker. This is the simplest means
of creating the checker, and is usually adequate for most checkers. Other
options for this section are described in Section 2.4, “Main Program
Functions”.
When writing Coverity Extend checkers, be sure to use the commands as described in this section. Do not expand any macros, and do not use any calls except as described. Doing so will decrease compatibility with future versions of Coverity Extend, which may change how the commands are implemented.
All pieces of the checker definition must appear between START_EXTEND_CHECKER
and END_EXTEND_CHECKER
. Typically this consists only of the
checker handler functions. You can also define functions which will be callable
by the handler functions, as well as variables whose values will persist between
callbacks into the checker. However, the use of the values of such variables is
discouraged: due to how the traversal works, these values may not be what you
would expect after branches and such. Any data you need to persist between
checker callbacks should be placed in the store (see Chapter 4,
The Store for more details).
START_EXTEND_CHECKER
takes two arguments: the name of
the checker and its type. Both strings are unquoted. The name should be unique
among all checkers which might run at the same time. The type should be one of
the types described in Chapter 6, Coverity
Extend™ Checker Types. Different types give different levels of
functionality and complexity. The more complex checker types can detect a
broader range of errors because they perform a more sophisticated analysis, but
they are also harder to use and take more time to run. It's generally best to
use the simplest checker type which includes all the functionality that your
checker needs.
END_EXTEND_CHECKER
takes no arguments.
Both the start and end statements must end with a semicolon.
These functions are the heart of the checker. The system will use these callbacks to handle various events during the traversal. Your checker will want to find events that are significant to the condition being checked, and take appropriate actions to deal with those events.
What exactly constitutes "appropriate action" depends on which callback occurs, which checker type you are using, and what your checker is looking for. This section describes the various callbacks, while later sections will discuss patterns (which are vital to analyzing each tree), the store (which allows the checker to track information between expressions), and other things you can do in the analysis functions.
The first three functions are the main analysis functions. The five functions
after are used for initializing and finalizing the analysis. In general most of
the interesting work will happen in the analysis functions, particularly ANALYZE_TREE
, and many checkers don't need to use the
initializer and finalizer functions.
ANALYZE_TREE
is the main analysis function. It is
called for every tree (roughly speaking, each subexpression and statement is a
tree). Many checkers can do all their useful work in this function, and won't
need to use the other two functions described below.
ANALYZE_TREE
usually consists of just pattern
declarations and matching rules that use those patterns. The rules enumerate all
code patterns that are significant to this checker, and what to do when it sees
such code that matches that pattern. This usually takes the form of cascading if
- then - else if statements. If the currently being analyzed tree isn't
"interesting" to this checker, typically it will fall through and no action will
be taken.
For more details about patterns, see Chapter 3, Patterns. For more information about what you can do once you've matched a pattern, see Chapter 5, Analysis Actions.
Checkers have an additional callback that the traversal calls when it follows an edge in a conditional statement. This permits a checker to take special actions when it sees conditions, and this callback has additional information about whether the true or the false branch was taken.
Note that ANALYZE_TREE
will be called for the
condition's tree before ANALYZE_CONDITION
is called, so
it is usually wrong to match the condition in both functions. Also, ANALYZE_CONDITION
will be called more than once for each
condition, since it analyzes condition and polarity. In general, if you care
just about the condition itself and will not care about the polarity, use ANALYZE_TREE
. You only need to use ANALYZE_CONDITION
if the polarity will influence your analysis
(and, in such cases, you should use MATCH_COND
rather
than MATCH
).
ANALYZE_END_OF_PATH
is called every time the
traversal reaches the end of a path, i.e. a return statement or end of function.
This lets you do things like search for the absence of something. For example,
if you are writing a checker to enforce an "x must be followed by y" ordering
rule, you will need this function to be able to tell that you've reached the end
of the function and hence have no more opportunities to see y.
This is a one time initialization function that is called before the checker sees any code at all.
Similar to CHECKER_INIT
, this function is called only
once during the lifetime of the checker. It is called after all code has been
analyzed.
FUNCTION_INIT
is called before the analysis of each
function. This can be useful, for example, to initialize a counter that is used
to track how often something happens per function.
This function is called after all paths in a function have been analyzed. It is typically used for final checks before the analysis moves on to the next function.
It is possible to add command line options to your checker by defining this function.
To pass command line options to a checker, you should use --checker_option
(or -co
), as
described in the main Coverity Prevent documentation. The value of this option
is of the form checker_name:option_name:option_value
with option_value being optional.
For every option thus specified where the checker name matches the current
checker's name, HANDLE_OPTION
is called. You can then
use the CHECK_OPTION
macro to match the option's name,
and OPTION_VALUE
to access the value. After the option
supplied is handled, OPTION_HANDLED()
should be called.
At the end of the HANDLE_OPTION
function, if the option
remains unhandled, OPTION_NOT_HANDLED()
should be
called.
Generally options are used to set global or member variables that modify how
the checker behaves. Note that HANDLE_OPTION
is actually
called before CHECKER_INIT
, so default values for the
option variables cannot be set in CHECKER_INIT
. You must
supply the INIT_OPTIONS
function to initialize these
values.
The example below will help clarify.
// assume that int mode and bool use_mangled are member vars // This function will recognize the following options: // - mode:<mode_value> // - use_mangled // The analysis functions will look at the member variables to // alter behavior during the analysis. INIT_OPTIONS() { mode = 0; use_mangled = false; } HANDLE_OPTION() { CHECK_OPTION( "mode" ) { mode = atoi( OPTION_VALUE ); OPTION_HANDLED(); } CHECK_OPTION( "use_mangled" ) { use_mangled = true; OPTION_HANDLED(); } OPTION_NOT_HANDLED(); }
In addition to the checker definition, an Coverity Extend program needs a
main function which sets up the system and runs the analysis using the Coverity
Extend checker. For the majority of checkers, MAKE_MAIN
is adequate for doing this: it creates a main function which will properly run
the checker. However, some Coverity Extend programs may have slightly different
needs, so other variants are provided for such programs.
MAKE_MAIN
takes one argument: the name of the
checker, which is the same name used in START_EXTEND_CHECKER
. Use this when you only want to run one
checker in this program.
Pattern matching is the core operation of a Coverity Extend checker. The bulk
of a checker will be rules that specify an action to take when the input tree
matches a certain pattern. Patterns denote types or properties of trees, and a
tree matches a pattern if the tree is of that type or has that property. In the
example above, Parm
denotes parameters; that pattern
type matches all trees that are uses of a parameter.
Note that Parm
is a type of pattern, not an actual
pattern itself. You must create pattern objects to do the matching. In the
example above, Parm p;
declares p as a Pattern
object of type Parm
and the
matching is actually done with the object, not the type.
After a successful match, the pattern object will have additional information
about the tree that was matched, which can be accessed through various accessor
methods of the pattern classes. This is what makes it possible, for example, to
pass p
into the stream operator: after matching, p
will have information about the parameter it matched, and
patterns will print out the name of the thing matched when shifted onto a
stream.
The most powerful feature of patterns is that they can be composed in a very
natural way to build more complicated patterns. The syntax for building patterns
mirrors C++ syntax. The example above didn't match parameters, but rather
dereferences of a parameter. In C++, the unary operator * is the derefence
operator, so in Coverity Extend, the unary operator * builds a new pattern which
matches the derefence of something that matches the operand pattern. Therefore,
*p
denotes a pattern which matches the dereference of
something that matches p
, or in other words the
dereference of a parameter.
All C++ operators can be used to build new patterns with the appropriate
meaning. There are also pattern building functions that mirror many keywords,
which differ only in capitalization (e.g., to build a pattern that matches an if
statement, you can use the function If
). There are also
functions which build patterns that match more general concepts, such as whether
a tree contains (or is within) another tree that matches some other pattern. For
example, you can use this to see if a parameter is used within a function call,
or if an assignment expression contained a global variable anywhere in its right
hand side. As you can see, it is possible to get extremely precise patterns in
this way.
After a successful match, patterns will know which tree they matched. This allows you to use matched patterns in places that expect trees; patterns will automatically convert themselves to a tree as needed. This also will allow you to use a pattern's accessor functions to get information about the matched tree. These functions should not be used before a match. Finally, a matched pattern may be written to an output stream. This will result in the text of the matched tree being appended to the stream.
You should use patterns with one of the following operations to perform the match. All of the following operations are boolean expressions (true when the match is successful, false if not).
MATCH(pat)
matches the pattern against the tree
currently being analyzed in ANALYZE_TREE
. This is the
most commonly used match operation. It is also usable in ANALYZE_CONDITION
, but in such cases you should consider
whether or not MATCH_COND
would be more appropriate
(and, if not, you may want to consider moving this match to ANALYZE_TREE
instead).
MATCH_TREE(pat,tree)
matches pattern pat
against tree tree
. This is useful
when you have a broader pattern match, and then want to further match that
pattern to get more specific information. This works because when patterns are
used in a tree context, they return the tree that they last matched. For
example, if you match a pattern like a = b
, where a
and b
are both patterns of type
Any
, you might want to do further pattern matching on
a
to determine more information about its
type.
MATCH_COND(pat)
may only be used in ANALYZE_CONDITION
. It matches the pattern against the current
condition and polarity. If we are currently analyzing the true branch, then the
pattern needs to match the condition exactly. If we are currently analyzing the
false branch, then the pattern needs to match the negation of the condition.
For example, suppose we have the following condition in the code being
analyzed: n == 0
. When ANALYZE_CONDITION
is considering the true case, then MATCH_COND( var == 0 )
will evaluate as true. However, when we
are considering the false case, then MATCH_COND( var != 0
)
will evaluate as true instead.
Here are descriptions of the most common pattern types, with some information
on how they can be used. Full information about patterns can be found in headers/accessors/extend-patterns.h
.
The Any
pattern matches any tree. This is primarily
useful as part of a composite pattern. For example, if you wanted to match the
assignment through a pointer parameter, you might do something like the
following:
Parm parm; Pointer ptr; DECL_PAT( pp, And( ptr, parm ) ); Any any; if( MATCH( *pp = any ) ) { // ... }
In the above example, the important thing is that we're assigning through a
pointer parameter; the right hand side of the assignment is not important, so we
use the Any
pattern to match an arbitrary tree.
There is a built-in pattern object of type Any
, _
, which can be used when you truly don't care about the tree
that is matched. You should not use _
in the body of the
match action because it is used by some pattern types during matching, so its
value is not stable. If you need to know what tree is matched by an Any
pattern, you will need to declare your own object. In the
above example, if the match action does not try to access the right hand side of
the assignment, we could replace any
with _
.
The Var
pattern matches any use of a variable as an
expression. It does not match a variable declaration. If you need to match the
declaration, use the Decl
pattern. If a Var
is created with a string parameter, it will only match
uses of variables with the appropriate name.
LocalVar
, TempVar
(for
compiler generated temporary variables), StaticVar
(for
file and function scope variables), GlobalVar
, and Parm
are related patterns which match the appropriate kind of
variable. These patterns cannot be created with a name parameter.
There are a few more in this family of patterns. AnyField(pat)
matches the same thing that pat
matches, as well as any number of levels of field access
off of that. That is to say, if pat
matches a
, then AnyField(pat)
will match a.b
, a.b.c
, a.b.c.d
, and so on. FunLocal
matches
storage that is local to a function; in other words, it matches parameters,
local variables, and any fields off of them. FunLocal
is
equivalent to AnyField( Or( Parm, LocalVar ) )
. AnySubpart(pat)
is similar to AnyField
but matches arrays as well. It also takes an optional second argument which
specifies whether to allow pointer dereferences in the match (e.g., whether
a.b.c[10]->d.e
should be matched or rejected).
All of these patterns have some common accessors. get_type()
returns the type of the thing matched. get_size()
returns its size. is_const()
returns whether or not the type of the expression
matched is read-only. is_unsigned()
returns whether or
not the type of the expression matches is unsigned.
The Decl
pattern matches a declaration. It can be
created with a pattern parameter; in this case, it will only match declarations
of variables that would match the supplied pattern. After a match, there are two
additional methods of this pattern which are handy: init()
which returns the initializer (null pointer if no
initializer), and var()
which returns the variable being
declared. Both return values are trees and therefore suitable for use with MATCH_TREE
.
Const_int
and Const_string
match constant values of the appropriate type. They can be supplied with a
parameter which specify the value to match; if this is done, the pattern will
only match the value supplied. Both pattern types have a val()
method which return the value matched. Const_string
also has a len()
method.
As with variable patterns, there is a family of patterns that match different kinds of function calls.
Fun
matches all function calls with particular
arguments. It can be constructed with a name argument, which specifies that the
pattern will only match calls to the function with that named. By default this
argument is interpreted as the mangled name; if Fun
is
constructed with the value Fun::UNMANGLE_NAME
as a
second argument, it will treat the first argument as the unmangled name.
A freshly declared Fun
pattern matches calls with
zero arguments. In order to build a pattern which matches a call with particular
arguments, you can use Fun::operator()
. You can specify
up to twelve patterns to this operator, and it returns a new pattern which will
only match calls where the arguments to that call match the patterns passed to
the operator. An example will make this much clearer:
Fun fn; Const_int ci; if( MATCH( fn( _, _ ) ) ) { // matches any function call with two arguments of any sort } if( MATCH( fn( ci ) ) ) { // matches any function call with a single argument which is a constant integer } if( MATCH( fn ) ) { // matches any function call with zero arguments }
Note that for calls to member functions, this
is
considered the zeroth argument. Therefore the first pattern match in the example
above would also match any member function call of the form obj->fn(arg)
(that is, with one argument). The MemberFun
pattern takes care of this in a more intuitive way
by automatically handling the this
argument. MemberFun'
s constructor requires at least one argument which
is a pattern that must match the object whose member function is being called.
(MemberFun
can take up to two additional arguments which
are the same as Fun
's arguments.)
Anyfun
matches any function call with any arguments.
Similarly, Member
matches any member function call with
any arguments. Constructor
and Destructor
match calls to C++ constructors and destructors,
respectively.
All of these patterns have the following accessors: get_mangled_name()
which returns the function's mangled name;
get_name()
which returns the unmangled name (with no
information about its parameters), and get_signature()
which returns the unmangled name with full parameter information. They also have
nargs()
which returns the number of arguments for the
call, and get_arg(n)
which returns the nth
argument.
A number of patterns match expressions of certain types. The patterns in this
group are Scalar
, Pointer
, Reference
, FunctionPointer
, Struct
, and Union
. Pointer
and Reference
can be created
with an argument which is a Pattern that specifies the pointed-to type. Note
that these match trees of expressions with the appropriate type, and not the
type itself. Therefore the tree matched here cannot be passed to accessor
functions which expect a type tree; you should use get_type_of_tree()
if you need the type.
Cast
matches a cast expression. Type
matches type trees. StmtPat
matches statement trees. ExprStmt
matches expressions
that are used as statements. This
matches a pointer
called "this". Arg
matches an expression that is being
passed into a function call. CondPattern
matches
expressions of the form c?a:b
.
Arg
has three useful accessors. pos()
returns the position of the matched argument (zero
indexed; a return of -1 indicates that the pattern did not match correctly).
get_call()
returns the tree for the function call to
which the argument was passed. get_formal_type()
gets
the type of the formal parameter that this argument matches.
CondPattern
can be constructed with patterns for its
three operands (default is _
for all); those patterns
must match the operands for the CondPattern
to match. If
you supply different patterns for all three, you can use the eval()
accessor for this pattern, which returns the value that
has been returned for this path at this point in the analysis.
Except for DECL_PAT
, the following are functions
which all return patterns, and so these functions can be conveniently used
within a match operation.
DECL_PAT( name, pattern )
creates a new pattern name
which matches pattern
. This is
mainly for creating an alias for some complex pattern.
And(pat1, pat2, ...)
can take up to 6 patterns, and
returns a pattern which matches when all the argument patterns match. Since all
the subpatterns must match in order for the And
to
match, any of the subpatterns may be used after a successful match.
Or(pat1, pat2, ...)
can take up to 6 patterns, and
returns a pattern which matches when one or more of the argument patterns match.
Unlike And
, the subpatterns may not be safely used after
a match, since some of them may have not matched. You can use DECL_PAT( name, Or( ... ) )
to create a pattern from the Or
, and then you can use name
in the
match action.
Within(pat)
matches if the tree being matched is a
subtree of something that matches pat
. After a match
pat
is usable and has information about the parent
tree.
Contains(pat)
matches if the tree being matched
contains a subtree that matches pat
. After a match pat
is usable and has information about the subtree that
matched.
Const(pat)
matches if the tree being matched matches
pat
, and is also read-only (i.e., if it is a variable
declared const).
Return(pat)
, Switch(pat)
,
If(pat)
, For(initpat, condpat,
updatepat)
, and While(pat)
build patterns that
match statements of the corresponding types. The arguments to these functions
are patterns which must match the subexpressions of the statement. All arguments
have default values of _
.
Almost all overloadable operators are overloaded to create new patterns that
match trees that consist of expressions which are of the appropriate type. For
example, operator+
is overloaded so that the expression
(pat1 + pat2)
creates a pattern which matches an
addition expression. The only operators which are not overloaded are the
assignment operators except for =
(e.g., +=
, <<=
). However, these are
converted during the analysis, so that when the analysis reaches a statement of
the form a += b
, it actually sees the statement a = a + b
. A pattern matching the latter statement can be
created with the pattern building operators.
Patterns provide much of the power behind Coverity Extend, but they have a
major limitation: they can only recognize properties within a single expression.
In order to find properties which span more than one line of code, a checker
needs to maintain state between calls to the ANALYZE_*
functions.
There are three levels of state tracking that make sense in checkers: global
to the entire analysis, per-function, and per-path. Each checker will only
analyze one function at a time, so global and per-function state can be
implemented with checker member variables and using CHECKER_INIT
/CHECKER_FINAL
and FUNCTION_INIT
/FUNCTION_FINAL
to
initalize/finalize this state. However, this doesn't work well for per-path
state, because while there are many paths being analyzed at once and therefore
many sets of per-path state, there is only one checker. Per-path state tends to
be the most commonly used and most useful of the three kinds of state, so we
need a convenient and efficient means for tracking it.
We do this with the store. The store is essentially a map which allows a
checker to associate state information with any variable or expression. Each
ANALYZE_*
function has access to the store and can query
it for the state of a tree as well as add or modify state for a tree. The
traversal infrastructure handles all the details of tracking the different
per-path sets of state, such as managing which state corresponds to which
path.
Suppose we wanted to detect the use of a pointer after it has been freed. Since the free and the use will most likely occur in different expressions, we need the store to write such a checker. Below is a version of how you might start writing this checker:
#include "extend-lang.h" START_EXTEND_CHECKER( basic_use_after_free, int_store ); ANALYZE_TREE() { Fun freecall( "free" ); Pointer p; if( MATCH( freecall( p ) ) ) { SET_STATE( p, 1 ); ADD_EVENT( p, "free", "freed pointer " << p ); } else if( MATCH( *p ) ) { if( MATCH_STATE( p, 1 ) ) { COMMIT_ERROR( p, "deref", "dereferenced free pointer " << p ); } } else if( MATCH( p = _ ) ) { CLEAR_STATE( p ); } } END_EXTEND_CHECKER(); MAKE_MAIN( basic_use_after_free )
Note that the above checker, as its name implies, catches only basic instances of use after free. It doesn't handle aliasing, interprocedural analysis, or violations other than an explicit dereference. While it would be possible to handle all those details in a Coverity Extend checker, doing so would obscure the illustrative nature of this example.
The rest of this section will offer a brief explanation of this checker. Section 4.3, “Store Operations” will describe in detail all the operations used by this checker.
The first thing to notice is the type of this checker: int_store
. This specifies a checker which uses a store whose
state information for each tree consists of an integer. In other words, for
every variable or expression we encounter during a match, we can track a single
integer for it, which we will be able to access in a later match. While it may
seem limited, in practice a single integer is sufficient for most checkers. You
can often enumerate the various states a variable can be in, and encode that
information accordingly. For this checker, we will use the value 1 to mean that
the pointer is known to be a freed pointer. Any other value, or absence of state
for that pointer, signifies that the pointer is not necessarily free.
A use after free violation is defined as a free of a pointer followed by a use of that pointer. In the above checker, the only use we will worry about is a direct dereference. Accordingly, the first two patterns the checker is concerned with are calls to free, and pointer dereferences.
Whenever the checker sees a call to free on a pointer, it remembers that the
pointer was freed by setting the state of the pointer to 1, using the SET_STATE
command. Additionally, it annotates the line of code
where the free happened with a comment that the pointer was freed, using ADD_EVENT
. If an error is later generated relating to this
pointer, all such annotations will be part of the error report.
On the other end, whenever the checker sees a pointer dereference, it needs
to ask the store whether the state for the pointer has previously been set to 1,
using MATCH_STATE
. If so, then we know that this pointer
has been freed, so we generate an error report (with an additional annotation
for the location where the error occurs) using COMMIT_ERROR
.
However, there's one very important case to consider: what if, between the
free and the dereference, the pointer is reassigned? Then the dereference is not
necessarily of a freed pointer, yet with only the two match actions described
above, we will still generate an error. We need a match action that will unmark
a pointer since it is no longer freed. The third match action does this using
CLEAR_STATE
. We could also "unmark" the pointer by
setting its state to something other than 1. However, CLEAR_STATE
both removes the state of the pointer altogether,
and also removes all annotations for that pointer. In other words, CLEAR_STATE
truly resets the state of the expression to be as
if the checker had never matched that expression before.
This section provides details for commands which interact with the store.
SET_STATE(tree, value)
assigns a state of value
to the tree tree
. If tree
already had a state, it is overwritten. This operation
does not affect any annotations that may exist for tree
.
GET_STATE(tree, value)
queries the state for tree
and, if it exists, stores it in value
, which is a reference parameter. GET_STATE
returns whether or not there was a state value for
tree
in the store. If no value was found for tree
, value
will be
unchanged.
CLEAR_STATE(tree)
removes any state information that
may exist about tree
from the store, and removes all
annotations for tree
.
MATCH_STATE(tree, value)
checks whether the state for
tree
matches value
. If the state
matches, then this returns true. if the state does not match, or there is no
state for tree
, then this returns false.
COPY_STATE(dst_tree, src_tree)
copies the state of
src_tree
(including annotations) and associates that
state with dst_tree
. This is typically useful when
tracking properties about the values of variables. If you have state for one
variable, and then see a copy of that variable's value, you can use COPY_STATE
to mirror the effects of the code.
FOREACH_IN_STORE(tree, value)
starts a loop which
will iterate over all store elements. Each time through the loop, tree
and value
will be set equal to a
pair in the store. tree and value
must be variables
which are declared before the loop. The following checker is the beginnings of a
basic memory leak detector that shows how to use this macro, and in what
circumstances it might be necessary:
#include "extend-lang.h" START_EXTEND_CHECKER( basic_leak, int_store ); ANALYZE_TREE() { Fun malloc( "malloc" ); Fun free( "free" ); LocalVar lv; AnySubpart lv_sp( lv ); DECL_PAT( non_local, Not( lv_sp ) ); if( MATCH( lv_sp = malloc( _ ) ) ) { SET_STATE( lv_sp, 1 ); ADD_EVENT( lv_sp, "alloc", "Allocation assigned to " << lv_sp ); } else if( MATCH( free( lv_sp ) ) ) { CLEAR_STATE( lv_sp ); } } ANALYZE_END_OF_PATH() { tree t; int v; FOREACH_IN_STORE( t, v ) { COMMIT_ERROR( t, "leaked", t << " was not freed before the end of path" ); } } END_EXTEND_CHECKER(); MAKE_MAIN( basic_leak )
This checker does not handle a host of things (such as pointer aliasing,
interprocedural analysis, and so on), and will have some false positives that
would need to be tuned out using more specific matches. It simply enforces that
if a pointer is allocated to a local variable and that local variable is not
freed within the same function, an error report will be generated. Making this
checker more precise would make it harder to see the main point, which is using
the FOREACH_IN_STORE
macro to find all things in the
store at the end of a path.
ADD_EVENT(tree, tag, text)
adds an annotation to
tree
, associated with the line number of the tree being
currently analyzed. Annotations contain a tag, which is a short descriptive
string of the event, and some text, which is any expression which may be shifted
onto an output stream. The text will be printed in the error report as the
explanation for the event, and the tag is used as the link name when other
events reference this event.
Note that expressions with no state value cannot have annotations; this means
that ADD_EVENT
will do nothing if you try to add an
event for an expression that has no state. This is also why CLEAR_STATE
will remove all annotations. You don't need to
explicitly SET_STATE
before every ADD_EVENT
: In places where you know that some state exists for
the expression (e.g., you have a true return from MATCH_STATE
), you can always use ADD_EVENT
.
Similarly, it is not necessary to ADD_EVENT
every
time you SET_STATE
. However, much of the value of the
error reporting in Coverity Prevent and therefore in Coverity Extend is the
precision and detail of the error reports. It's usually a good idea to add
events whenever you change an expression's state, so that the error report will
contain a record of all the events that influenced the checker's decision to
report an error.
COMMIT_ERROR(tree, tag, text)
generates an error
report, using the event history for tree
with one final
event specified by tag
and text
.
Note that this event is only added to the error report, and not to tree
's event history. This is done so that future errors
reported on tree
will not contain events specific to
previous error reports. If you do not wish to add an error-specific event, you
may pass the empty string ("") for tag
and text
.
The analysis engine guarantees that every time ANALYZE_TREE
or ANALYZE_COND
is
called, the store will look as if the analysis has examined the entire path
currently being analyzed up to the current tree from the start in approximate
execution order. In other words, if a checker respects the store abstraction and
doesn't use global, static, or class variables to hold state information about
the analysis, then the checker may behave as if the analysis were traversing all
paths completely and in order, which is the conceptually easy and intuitive
thing to do for checker writers. Basically, match actions should be written as
if every preceding tree in the function has already been analyzed and the store
has been modified accordingly.
Under the hood, it isn't quite that simple. If the analysis engine were to
actually traverse every path in its entirety, it would be performing a lot of
redundant analysis (for blocks of code which are common to all paths) and run
significantly slower. Instead, the engine identifies common blocks and analyzes
them only once, and generally only analyzes trees multiple times if the analysis
will result in different state each time. This means that statements and
subexpressions which are adjacent in the code will not necessarily be analyzed
in consecutive calls to ANALYZE_TREE
. However, the store
hides all this complexity; this is the main reason the store is necessary.
We described the guarantee as looking like "approximate execution order". What this means is that, from the store's point of view, subexpressions are handled before the statement itself is handled, and these are done in almost the same order as they would happen in the compiled program. There are two main reasons why it might not be exactly the same. In cases where the language does not specify the execution order (for example, order of evaluation of function arguments) the traversal order may not match a particular compiler's execution order. Additionally, in some situations the analysis artificially inserts and reorders some trees that aren't explicitly obvious in the code (for example, variables declared in loop conditions are transformed to declarations before the loop). These transformations will slightly alter how the code looks but results in trees which are functionally equivalent.
These functions return information about the function being currently analyzed. They may be used in the three analysis functions.
This returns the unmangled function name, with the entire namespace and class information prefixed.
This returns the full unmangled function name with parameter information.
This returns everything preceding the final "::" in the full unmangled function name. It returns the empty string for functions in the global namespace. Strictly speaking this isn't just the class, since it includes all namespace information as well.
This returns everything following the final "::" in the full unmangled function name. It returns the full function name for functions in the global namespace. Strictly speaking this isn't just the "method" name since it can be used on functions which are not members of classes.
This returns the name of the file in which the current function is defined.
The following functions get information about a particular tree. There are three ways to get a tree:
A pattern, after a successful match, may be used where a tree is expected; it will give you the tree that the pattern matched.
Many pattern types also have accessors which return trees which may then be passed into these functions.
CURRENT_TREE
is the tree currently being analyzed;
this is usable in ANALYZE_TREE
and ANALYZE_CONDITION
.
If the current tree is inside a macro, this function will return the macro's name. Otherwise, it returns NULL.
This returns a tree that represents the type of the tree passed in. If the argument has no type, this function returns NULL.
OUTPUT_ERROR(text)
generates an error report with
text as the details of the report. The error will be associated with the line
number of the tree being inspected. text
may be any
expression which may appear on the right hand side of a stream's operator<<
.
This error generating function is suitable for errors that occur entirely
within one tree. For errors that span multiple lines (for example, detecting for
double free), you will need to use ADD_EVENT
and COMMIT_ERROR
, described below. Note also that multi-line
errors only make sense with checker types that use some sort of store.
For use with store checker types; see Section 4.3.7, “ADD_EVENT
”.
For use with store checker types; see Section 4.3.8, “COMMIT_ERROR
”.
The text passed to OUTPUT_ERROR
and ADD_EVENT
will be written into an xml file. Any text that
contains characters which interfere with the xml (such as '<'), needs to be
escaped with this function. For example, suppose the function name could contain
'<' (e.g., the function is an overloaded operator). Then you would need to do
the following:
OUTPUT_ERROR( "The function " << escape_xml( function_name ) << " ..." );
Note that you cannot escape the entire contents of the macro, because escape_xml
takes a string.
When defining a Coverity Extend checker, you must specify its type as the
second argument to START_EXTEND_CHECKER
. Different types
provide different levels of complexity and access to different functionality.
This is done for two reasons: it allows Coverity Extend users to only choose a
type best suited for the current checker, so that the user need only deal with
as much complexity as necessary; and it also allows Coverity to introduce new
checker types as we develop them without affecting already written Coverity
Extend checkers.
The types are listed roughly in order of complexity and running time. The ones higher in the list are easier to use and run faster, but are more limited in the sorts of analyses they can do. When writing a checker, select the simplest type that is adequate for your analysis.
This is a very simple checker type. It maintains no state information between analysis callbacks, does not track values for variables, and does no interprocedural analysis. Basically, simple checkers will see every tree once, with no context about what's going on in the function when that tree is encountered. These checkers can be used to perform search operations: for example, searching for all parameter dereferences as seen above, searching for all assignments to the integer constant 0, or all local variable declarations which are of types which exceed a certain size.
This is the simplest checker type which uses a store (see Chapter 4,
The Store). The store in this case allows the checker to associate an
integer with a particular tree. This checker type does not perform
interprocedural analysis. int_store
checkers can be used
to find ordering properties within a single function (e.g., calls to function A
must be followed by calls to function B) as well as properties involving integer
values that span multiple lines (e.g., pointers may not be assigned to
dynamically allocated blocks of a size which is not a multiple of the size
pointed to).