From: Andy Chou [andy@coverity.com]
Sent: Tuesday, March 22, 2005 8:20 AM
To: Terry Staycer
Subject: Extend docs

Attachments: ATT00011.txt

Coverity Extend™ Language Reference

This document constitutes Confidential Information as defined in that certain License Agreement between Coverity, Inc. and your company and is being delivered to you subject to all of the terms and provisions of such License applicable to Confidential Information.


Table of Contents

1. Overview
1.1. Introduction
1.2. Coverity Extend™ capabilities
1.3. Installing Coverity Extend
1.4. Creating a Coverity Extend™ checker
2. Structure of a Coverity Extend™ Checker
2.1. A Sample Checker
2.2. Start/End Statements
2.3. Analysis Functions
2.3.1. ANALYZE_TREE
2.3.2. ANALYZE_CONDITION
2.3.3. ANALYZE_END_OF_PATH
2.3.4. CHECKER_INIT
2.3.5. CHECKER_FINAL
2.3.6. FUNCTION_INIT
2.3.7. FUNCTION_FINAL
2.3.8. INIT_OPTIONS and HANDLE_OPTION
2.4. Main Program Functions
2.4.1. MAKE_MAIN
2.4.2. MAKE_MAIN_MULTI
3. Patterns
3.1. Overview
3.2. Matching Operations
3.2.1. MATCH
3.2.2. MATCH_TREE
3.2.3. MATCH_COND
3.3. Common Pattern Types
3.3.1. Any
3.3.2. Variable patterns
3.3.3. Decl
3.3.4. Constant patterns
3.3.5. Function call patterns
3.3.6. Type patterns
3.3.7. Miscellany
3.3.8. Pattern building functions
3.3.8.1. DECL_PAT
3.3.8.2. And
3.3.8.3. Or
3.3.8.4. Not
3.3.8.5. Within
3.3.8.6. Contains
3.3.8.7. Const
3.3.8.8. Statement builders
3.3.9. Overloaded operators
4. The Store
4.1. Motivation
4.2. Example: Detecting use after free
4.3. Store Operations
4.3.1. SET_STATE
4.3.2. GET_STATE
4.3.3. CLEAR_STATE
4.3.4. MATCH_STATE
4.3.5. COPY_STATE
4.3.6. FOREACH_IN_STORE
4.3.7. ADD_EVENT
4.3.8. COMMIT_ERROR
4.4. Store Mechanics
5. Analysis Actions
5.1. Current Function Information Accessors
5.1.1. current_function_get_mangled_name
5.1.2. current_function_get_name
5.1.3. current_function_get_signature
5.1.4. current_function_get_class_name
5.1.5. current_function_get_method_name
5.1.6. current_function_is_ctor
5.1.7. current_function_is_dtor
5.1.8. current_file_get_name
5.1.9. current_file_lineno
5.2. Tree information accessors
5.2.1. is_tree_in_macro
5.2.2. get_type_of_tree
5.2.3. get_size_of_type
5.3. Error Reporting
5.3.1. OUTPUT_ERROR
5.3.2. ADD_EVENT
5.3.3. COMMIT_ERROR
5.3.4. escape_xml
6. Coverity Extend™ Checker Types
6.1. Overview
6.2. simple
6.3. int_store

Chapter 1. Overview

1.1. Introduction

Coverity Prevent™ checkers analyze code by traversing the statements and subexpressions of that code, matching each tree for properties the checker cares about, and acting when it finds those matches.

Coverity Extend is a language for defining Coverity Prevent checkers. The interesting parts of a checker are the rules for matching, and the actions to take on a match. The rest of the checker are infrastructure and boilerplate; the goal of Coverity Extend is to take care of those details for you and to make the task of writing match rules and actions as simple as possible.

Accordingly, the Coverity Prevent back end takes care of all the details up to and including traversing the code. It presents an Coverity Extend checker with the trees as it encounters them, and all a checker needs to do is define what to do when it sees those trees.

Coverity Extend is currently implemented as a set of C++ base classes for checkers, C++ support classes and functions for use inside the checker callbacks, and macros which abstract as much as possible the implementation details of checkers. Therefore in the current implementation, Coverity Extend checkers can include any valid C++ code before and after the checker definition, as well as within each callback function. This is done to minimize the learning curve on Coverity Extend -- since you are writing checkers for C++ code, you probably already know C++. However, checkers should not require very complicated C++ code, and for the most part once you understand patterns (see Chapter 3, Patterns) it will be very straightforward to write most checkers.

1.2. Coverity Extend™ capabilities

Coverity Extend's primary design goal is to be as easy to use as possible. It must be easy to express the rules we wish to check; if checkers are complicated they become error prone, which diminishes their value. Furthermore, Coverity Extend users typically are experts about the code they wish to check, but not necessarily experts on static source code analysis. The mechanics of writing a checker ought to be intuitive to allow the user to apply their expertise about the domain being checked without being obscured by the details of source code analysis.

This ease of use does conflict at times with performance and expressive power. Thus, Coverity Extend checkers are often slightly slower than the built in Coverity Prevent checkers, and do not have access to some of the more complicated and esoteric capabilities of the Coverity Prevent analysis engine. The last bit of speed and power/precision would greatly increase the complexity of Coverity Extend and make it unwieldy to use.

The following is a list of rules which can be checked with Coverity Extend. This is only a small list, not nearly an exhaustive one, meant to inspire ideas about what kinds of checkers might be helpful for a particular user.

  • A call to get_resource( res ) must be followed by a call to release_resource( res ), or: A call to release_resource( res ) must not be followed by a call to use_resource( res ).

  • Functions which may throw an exception must follow a specific naming convention.

  • Objects must use the correct pool when obtaining resources (e.g., memory).

  • Negative constants must not be assigned to unsigned variables.

  • Pointers must point to blocks which are multiples of the pointed to size (e.g., an int * cannot point to a block of size 10).

  • Global and static (file scope) variables may not be used.

  • If a function pushes an object onto a stack, it must also pop it before exiting.

1.3. Installing Coverity Extend

There are two things you need in order to use Coverity Extend: the Coverity Extend compiler, and the extend directory. Each is simply a compressed directory that can be uncompressed to the appropriate place and used as-is. Both directories can be placed anywhere, independent of the Coverity Prevent installation.

The Coverity Extend compiler is a regular C++ compiler with all the standard headers needed to compile a checker. We package this to ensure that checkers are compiled with the correct version of these headers, making installation and use of Coverity Extend as simple as possible.

The extend directory consists of the following subdirectories:

  • docs: Coverity Extend documentation (this file).

  • headers: Coverity Prevent specific C++ header files that are needed to compile an Coverity Extend checker. Coverity Extend users do not need to look at most of these files. The files which might be helpful to look at are include/extend-lang.h, accessors/extend-patterns.h, and lang/extend_checker_types.h. Other headers contain details which are not important to understanding Coverity Extend.

  • libs: Binary files which are needed to link an Coverity Extend checker.

  • samples: Sample checkers which illustrate different Coverity Extend features. See the README in this directory for details on what features the different samples show.

The extend directory also contains a script, build-checker (build-checker.bat for Windows), which will compile a single Coverity Extend checker. Its usage is described in Section 1.4, “Creating a Coverity Extend™ checker”.

1.4. Creating a Coverity Extend™ checker

The easiest way to create an Coverity Extend checker is by copying an existing sample checker and modifying it to suit your needs. You can then compile the checker using the build-checker script (build-checker.bat on Windows) found in the top level extend directory. It expects one argument: the name of the checker source file without the extension (and the script will assume the file has a .c extension).

Your checkers may be located anywhere, as long as the build script is run from the directory where the checker source file is.

The following enviroment variables affect how the build script works, and are required by the script.

  • EXTEND_COMP_DIR: This specifies the directory where the Coverity Extend compiler was installed.

  • EXTEND_ROOT: This specifies where the extend directory is installed.

The script will compile the checker and leave the executable in the current directory. This checker works the same way the main PREVENT integrated checker, cov-analyze[-cxx], works, and should be added to the PREVENT usage pipeline either instead of or immediately following running the integrated checker. Please see the PREVENT documentation for more details.

Chapter 2. Structure of a Coverity Extend™ Checker

2.1. A Sample Checker

Below is a simple checker that reports every time a parameter is dereferenced in a program:

#include "extend-lang.h"

START_EXTEND_CHECKER( param_deref, simple );

ANALYZE_TREE()
{
    Parm p;

    if( MATCH( *p ) ) {
        OUTPUT_ERROR( "dereferenced parameter" << p );
    }
}

END_EXTEND_CHECKER();

MAKE_MAIN( param_deref )

This checker includes all the essential elements of a Coverity Extend program:

  1. Include statements: extend-lang.h contains all the definitions and declarations that make up the Coverity Extend language, and therefore it is the only necessary header for an Coverity Extend checker.

  2. Checker start/end: The checker definition appears between the START_EXTEND_CHECKER and END_EXTEND_CHECKER commands. The first argument to START_EXTEND_CHECKER is the name of the checker, and the second is the type of checker (as described in Chapter 6, Coverity Extend™ Checker Types).

  3. Analysis functions: The analysis callback functions appear between the start and end commands. The checker above only uses the ANALYZE_TREE function, which is the most commonly used one, but a checker may use any of the functions described in Section 2.3, “Analysis Functions”. The body of the function makes use of a pattern (Parm) as well as some other macros. These are described in Chapter 3, Patterns and Chapter 5, Analysis Actions, respectively.

  4. Main program functions: This checker uses MAKE_MAIN to define a main function which runs the checker. This is the simplest means of creating the checker, and is usually adequate for most checkers. Other options for this section are described in Section 2.4, “Main Program Functions”.

When writing Coverity Extend checkers, be sure to use the commands as described in this section. Do not expand any macros, and do not use any calls except as described. Doing so will decrease compatibility with future versions of Coverity Extend, which may change how the commands are implemented.

2.2. Start/End Statements

All pieces of the checker definition must appear between START_EXTEND_CHECKER and END_EXTEND_CHECKER. Typically this consists only of the checker handler functions. You can also define functions which will be callable by the handler functions, as well as variables whose values will persist between callbacks into the checker. However, the use of the values of such variables is discouraged: due to how the traversal works, these values may not be what you would expect after branches and such. Any data you need to persist between checker callbacks should be placed in the store (see Chapter 4, The Store for more details).

START_EXTEND_CHECKER takes two arguments: the name of the checker and its type. Both strings are unquoted. The name should be unique among all checkers which might run at the same time. The type should be one of the types described in Chapter 6, Coverity Extend™ Checker Types. Different types give different levels of functionality and complexity. The more complex checker types can detect a broader range of errors because they perform a more sophisticated analysis, but they are also harder to use and take more time to run. It's generally best to use the simplest checker type which includes all the functionality that your checker needs.

END_EXTEND_CHECKER takes no arguments.

Both the start and end statements must end with a semicolon.

2.3. Analysis Functions

These functions are the heart of the checker. The system will use these callbacks to handle various events during the traversal. Your checker will want to find events that are significant to the condition being checked, and take appropriate actions to deal with those events.

What exactly constitutes "appropriate action" depends on which callback occurs, which checker type you are using, and what your checker is looking for. This section describes the various callbacks, while later sections will discuss patterns (which are vital to analyzing each tree), the store (which allows the checker to track information between expressions), and other things you can do in the analysis functions.

The first three functions are the main analysis functions. The five functions after are used for initializing and finalizing the analysis. In general most of the interesting work will happen in the analysis functions, particularly ANALYZE_TREE, and many checkers don't need to use the initializer and finalizer functions.

2.3.1. ANALYZE_TREE

ANALYZE_TREE is the main analysis function. It is called for every tree (roughly speaking, each subexpression and statement is a tree). Many checkers can do all their useful work in this function, and won't need to use the other two functions described below.

ANALYZE_TREE usually consists of just pattern declarations and matching rules that use those patterns. The rules enumerate all code patterns that are significant to this checker, and what to do when it sees such code that matches that pattern. This usually takes the form of cascading if - then - else if statements. If the currently being analyzed tree isn't "interesting" to this checker, typically it will fall through and no action will be taken.

For more details about patterns, see Chapter 3, Patterns. For more information about what you can do once you've matched a pattern, see Chapter 5, Analysis Actions.

2.3.2. ANALYZE_CONDITION

Checkers have an additional callback that the traversal calls when it follows an edge in a conditional statement. This permits a checker to take special actions when it sees conditions, and this callback has additional information about whether the true or the false branch was taken.

Note that ANALYZE_TREE will be called for the condition's tree before ANALYZE_CONDITION is called, so it is usually wrong to match the condition in both functions. Also, ANALYZE_CONDITION will be called more than once for each condition, since it analyzes condition and polarity. In general, if you care just about the condition itself and will not care about the polarity, use ANALYZE_TREE. You only need to use ANALYZE_CONDITION if the polarity will influence your analysis (and, in such cases, you should use MATCH_COND rather than MATCH).

2.3.3. ANALYZE_END_OF_PATH

ANALYZE_END_OF_PATH is called every time the traversal reaches the end of a path, i.e. a return statement or end of function. This lets you do things like search for the absence of something. For example, if you are writing a checker to enforce an "x must be followed by y" ordering rule, you will need this function to be able to tell that you've reached the end of the function and hence have no more opportunities to see y.

2.3.4. CHECKER_INIT

This is a one time initialization function that is called before the checker sees any code at all.

2.3.5. CHECKER_FINAL

Similar to CHECKER_INIT, this function is called only once during the lifetime of the checker. It is called after all code has been analyzed.

2.3.6. FUNCTION_INIT

FUNCTION_INIT is called before the analysis of each function. This can be useful, for example, to initialize a counter that is used to track how often something happens per function.

2.3.7. FUNCTION_FINAL

This function is called after all paths in a function have been analyzed. It is typically used for final checks before the analysis moves on to the next function.

2.3.8. INIT_OPTIONS and HANDLE_OPTION

It is possible to add command line options to your checker by defining this function.

To pass command line options to a checker, you should use --checker_option (or -co), as described in the main Coverity Prevent documentation. The value of this option is of the form checker_name:option_name:option_value with option_value being optional.

For every option thus specified where the checker name matches the current checker's name, HANDLE_OPTION is called. You can then use the CHECK_OPTION macro to match the option's name, and OPTION_VALUE to access the value. After the option supplied is handled, OPTION_HANDLED() should be called. At the end of the HANDLE_OPTION function, if the option remains unhandled, OPTION_NOT_HANDLED() should be called.

Generally options are used to set global or member variables that modify how the checker behaves. Note that HANDLE_OPTION is actually called before CHECKER_INIT, so default values for the option variables cannot be set in CHECKER_INIT. You must supply the INIT_OPTIONS function to initialize these values.

The example below will help clarify.

// assume that int mode and bool use_mangled are member vars
// This function will recognize the following options:
// - mode:<mode_value>
// - use_mangled
// The analysis functions will look at the member variables to
// alter behavior during the analysis.

INIT_OPTIONS()
{
    mode = 0;
    use_mangled = false;
}

HANDLE_OPTION()
{
    CHECK_OPTION( "mode" ) {
        mode = atoi( OPTION_VALUE );
        OPTION_HANDLED();
    }
    CHECK_OPTION( "use_mangled" ) {
        use_mangled = true;
        OPTION_HANDLED();
    }
    OPTION_NOT_HANDLED();
}

2.4. Main Program Functions

In addition to the checker definition, an Coverity Extend program needs a main function which sets up the system and runs the analysis using the Coverity Extend checker. For the majority of checkers, MAKE_MAIN is adequate for doing this: it creates a main function which will properly run the checker. However, some Coverity Extend programs may have slightly different needs, so other variants are provided for such programs.

2.4.1. MAKE_MAIN

MAKE_MAIN takes one argument: the name of the checker, which is the same name used in START_EXTEND_CHECKER. Use this when you only want to run one checker in this program.

2.4.2. MAKE_MAIN_MULTI

MAKE_MAIN_MULTI is used when running more than one checker in this program. Not yet implemented.

Chapter 3. Patterns

3.1. Overview

Pattern matching is the core operation of a Coverity Extend checker. The bulk of a checker will be rules that specify an action to take when the input tree matches a certain pattern. Patterns denote types or properties of trees, and a tree matches a pattern if the tree is of that type or has that property. In the example above, Parm denotes parameters; that pattern type matches all trees that are uses of a parameter.

Note that Parm is a type of pattern, not an actual pattern itself. You must create pattern objects to do the matching. In the example above, Parm p; declares p as a Pattern object of type Parm and the matching is actually done with the object, not the type.

After a successful match, the pattern object will have additional information about the tree that was matched, which can be accessed through various accessor methods of the pattern classes. This is what makes it possible, for example, to pass p into the stream operator: after matching, p will have information about the parameter it matched, and patterns will print out the name of the thing matched when shifted onto a stream.

The most powerful feature of patterns is that they can be composed in a very natural way to build more complicated patterns. The syntax for building patterns mirrors C++ syntax. The example above didn't match parameters, but rather dereferences of a parameter. In C++, the unary operator * is the derefence operator, so in Coverity Extend, the unary operator * builds a new pattern which matches the derefence of something that matches the operand pattern. Therefore, *p denotes a pattern which matches the dereference of something that matches p, or in other words the dereference of a parameter.

All C++ operators can be used to build new patterns with the appropriate meaning. There are also pattern building functions that mirror many keywords, which differ only in capitalization (e.g., to build a pattern that matches an if statement, you can use the function If). There are also functions which build patterns that match more general concepts, such as whether a tree contains (or is within) another tree that matches some other pattern. For example, you can use this to see if a parameter is used within a function call, or if an assignment expression contained a global variable anywhere in its right hand side. As you can see, it is possible to get extremely precise patterns in this way.

After a successful match, patterns will know which tree they matched. This allows you to use matched patterns in places that expect trees; patterns will automatically convert themselves to a tree as needed. This also will allow you to use a pattern's accessor functions to get information about the matched tree. These functions should not be used before a match. Finally, a matched pattern may be written to an output stream. This will result in the text of the matched tree being appended to the stream.

3.2. Matching Operations

You should use patterns with one of the following operations to perform the match. All of the following operations are boolean expressions (true when the match is successful, false if not).

3.2.1. MATCH

MATCH(pat) matches the pattern against the tree currently being analyzed in ANALYZE_TREE. This is the most commonly used match operation. It is also usable in ANALYZE_CONDITION, but in such cases you should consider whether or not MATCH_COND would be more appropriate (and, if not, you may want to consider moving this match to ANALYZE_TREE instead).

3.2.2. MATCH_TREE

MATCH_TREE(pat,tree) matches pattern pat against tree tree. This is useful when you have a broader pattern match, and then want to further match that pattern to get more specific information. This works because when patterns are used in a tree context, they return the tree that they last matched. For example, if you match a pattern like a = b, where a and b are both patterns of type Any, you might want to do further pattern matching on a to determine more information about its type.

3.2.3. MATCH_COND

MATCH_COND(pat) may only be used in ANALYZE_CONDITION. It matches the pattern against the current condition and polarity. If we are currently analyzing the true branch, then the pattern needs to match the condition exactly. If we are currently analyzing the false branch, then the pattern needs to match the negation of the condition.

For example, suppose we have the following condition in the code being analyzed: n == 0. When ANALYZE_CONDITION is considering the true case, then MATCH_COND( var == 0 ) will evaluate as true. However, when we are considering the false case, then MATCH_COND( var != 0 ) will evaluate as true instead.

3.3. Common Pattern Types

Here are descriptions of the most common pattern types, with some information on how they can be used. Full information about patterns can be found in headers/accessors/extend-patterns.h.

3.3.1. Any

The Any pattern matches any tree. This is primarily useful as part of a composite pattern. For example, if you wanted to match the assignment through a pointer parameter, you might do something like the following:

Parm parm;
Pointer ptr;
DECL_PAT( pp, And( ptr, parm ) );
Any any;

if( MATCH( *pp = any ) ) {
    // ...
}

In the above example, the important thing is that we're assigning through a pointer parameter; the right hand side of the assignment is not important, so we use the Any pattern to match an arbitrary tree.

There is a built-in pattern object of type Any, _, which can be used when you truly don't care about the tree that is matched. You should not use _ in the body of the match action because it is used by some pattern types during matching, so its value is not stable. If you need to know what tree is matched by an Any pattern, you will need to declare your own object. In the above example, if the match action does not try to access the right hand side of the assignment, we could replace any with _.

3.3.2. Variable patterns

The Var pattern matches any use of a variable as an expression. It does not match a variable declaration. If you need to match the declaration, use the Decl pattern. If a Var is created with a string parameter, it will only match uses of variables with the appropriate name.

LocalVar, TempVar (for compiler generated temporary variables), StaticVar (for file and function scope variables), GlobalVar, and Parm are related patterns which match the appropriate kind of variable. These patterns cannot be created with a name parameter.

There are a few more in this family of patterns. AnyField(pat) matches the same thing that pat matches, as well as any number of levels of field access off of that. That is to say, if pat matches a, then AnyField(pat) will match a.b, a.b.c, a.b.c.d, and so on. FunLocal matches storage that is local to a function; in other words, it matches parameters, local variables, and any fields off of them. FunLocal is equivalent to AnyField( Or( Parm, LocalVar ) ). AnySubpart(pat) is similar to AnyField but matches arrays as well. It also takes an optional second argument which specifies whether to allow pointer dereferences in the match (e.g., whether a.b.c[10]->d.e should be matched or rejected).

All of these patterns have some common accessors. get_type() returns the type of the thing matched. get_size() returns its size. is_const() returns whether or not the type of the expression matched is read-only. is_unsigned() returns whether or not the type of the expression matches is unsigned.

3.3.3. Decl

The Decl pattern matches a declaration. It can be created with a pattern parameter; in this case, it will only match declarations of variables that would match the supplied pattern. After a match, there are two additional methods of this pattern which are handy: init() which returns the initializer (null pointer if no initializer), and var() which returns the variable being declared. Both return values are trees and therefore suitable for use with MATCH_TREE.

3.3.4. Constant patterns

Const_int and Const_string match constant values of the appropriate type. They can be supplied with a parameter which specify the value to match; if this is done, the pattern will only match the value supplied. Both pattern types have a val() method which return the value matched. Const_string also has a len() method.

3.3.5. Function call patterns

As with variable patterns, there is a family of patterns that match different kinds of function calls.

Fun matches all function calls with particular arguments. It can be constructed with a name argument, which specifies that the pattern will only match calls to the function with that named. By default this argument is interpreted as the mangled name; if Fun is constructed with the value Fun::UNMANGLE_NAME as a second argument, it will treat the first argument as the unmangled name.

A freshly declared Fun pattern matches calls with zero arguments. In order to build a pattern which matches a call with particular arguments, you can use Fun::operator(). You can specify up to twelve patterns to this operator, and it returns a new pattern which will only match calls where the arguments to that call match the patterns passed to the operator. An example will make this much clearer:

Fun fn;
Const_int ci;

if( MATCH( fn( _, _ ) ) ) {
    // matches any function call with two arguments of any sort
}

if( MATCH( fn( ci ) ) ) {
    // matches any function call with a single argument which is a constant integer
}

if( MATCH( fn ) ) {
    // matches any function call with zero arguments
}

Note that for calls to member functions, this is considered the zeroth argument. Therefore the first pattern match in the example above would also match any member function call of the form obj->fn(arg) (that is, with one argument). The MemberFun pattern takes care of this in a more intuitive way by automatically handling the this argument. MemberFun's constructor requires at least one argument which is a pattern that must match the object whose member function is being called. (MemberFun can take up to two additional arguments which are the same as Fun's arguments.)

Anyfun matches any function call with any arguments. Similarly, Member matches any member function call with any arguments. Constructor and Destructor match calls to C++ constructors and destructors, respectively.

All of these patterns have the following accessors: get_mangled_name() which returns the function's mangled name; get_name() which returns the unmangled name (with no information about its parameters), and get_signature() which returns the unmangled name with full parameter information. They also have nargs() which returns the number of arguments for the call, and get_arg(n) which returns the nth argument.

3.3.6. Type patterns

A number of patterns match expressions of certain types. The patterns in this group are Scalar, Pointer, Reference, FunctionPointer, Struct, and Union. Pointer and Reference can be created with an argument which is a Pattern that specifies the pointed-to type. Note that these match trees of expressions with the appropriate type, and not the type itself. Therefore the tree matched here cannot be passed to accessor functions which expect a type tree; you should use get_type_of_tree() if you need the type.

3.3.7. Miscellany

Cast matches a cast expression. Type matches type trees. StmtPat matches statement trees. ExprStmt matches expressions that are used as statements. This matches a pointer called "this". Arg matches an expression that is being passed into a function call. CondPattern matches expressions of the form c?a:b.

Arg has three useful accessors. pos() returns the position of the matched argument (zero indexed; a return of -1 indicates that the pattern did not match correctly). get_call() returns the tree for the function call to which the argument was passed. get_formal_type() gets the type of the formal parameter that this argument matches.

CondPattern can be constructed with patterns for its three operands (default is _ for all); those patterns must match the operands for the CondPattern to match. If you supply different patterns for all three, you can use the eval() accessor for this pattern, which returns the value that has been returned for this path at this point in the analysis.

3.3.8. Pattern building functions

Except for DECL_PAT, the following are functions which all return patterns, and so these functions can be conveniently used within a match operation.

3.3.8.1. DECL_PAT

DECL_PAT( name, pattern ) creates a new pattern name which matches pattern. This is mainly for creating an alias for some complex pattern.

3.3.8.2. And

And(pat1, pat2, ...) can take up to 6 patterns, and returns a pattern which matches when all the argument patterns match. Since all the subpatterns must match in order for the And to match, any of the subpatterns may be used after a successful match.

3.3.8.3. Or

Or(pat1, pat2, ...) can take up to 6 patterns, and returns a pattern which matches when one or more of the argument patterns match. Unlike And, the subpatterns may not be safely used after a match, since some of them may have not matched. You can use DECL_PAT( name, Or( ... ) ) to create a pattern from the Or, and then you can use name in the match action.

3.3.8.4. Not

Not(pat) matches if pat does not match. Clearly pat may not be used if this matches.

3.3.8.5. Within

Within(pat) matches if the tree being matched is a subtree of something that matches pat. After a match pat is usable and has information about the parent tree.

3.3.8.6. Contains

Contains(pat) matches if the tree being matched contains a subtree that matches pat. After a match pat is usable and has information about the subtree that matched.

3.3.8.7. Const

Const(pat) matches if the tree being matched matches pat, and is also read-only (i.e., if it is a variable declared const).

3.3.8.8. Statement builders

Return(pat), Switch(pat), If(pat), For(initpat, condpat, updatepat), and While(pat) build patterns that match statements of the corresponding types. The arguments to these functions are patterns which must match the subexpressions of the statement. All arguments have default values of _.

3.3.9. Overloaded operators

Almost all overloadable operators are overloaded to create new patterns that match trees that consist of expressions which are of the appropriate type. For example, operator+ is overloaded so that the expression (pat1 + pat2) creates a pattern which matches an addition expression. The only operators which are not overloaded are the assignment operators except for = (e.g., +=, <<=). However, these are converted during the analysis, so that when the analysis reaches a statement of the form a += b, it actually sees the statement a = a + b. A pattern matching the latter statement can be created with the pattern building operators.

Chapter 4. The Store

4.1. Motivation

Patterns provide much of the power behind Coverity Extend, but they have a major limitation: they can only recognize properties within a single expression. In order to find properties which span more than one line of code, a checker needs to maintain state between calls to the ANALYZE_* functions.

There are three levels of state tracking that make sense in checkers: global to the entire analysis, per-function, and per-path. Each checker will only analyze one function at a time, so global and per-function state can be implemented with checker member variables and using CHECKER_INIT/CHECKER_FINAL and FUNCTION_INIT/FUNCTION_FINAL to initalize/finalize this state. However, this doesn't work well for per-path state, because while there are many paths being analyzed at once and therefore many sets of per-path state, there is only one checker. Per-path state tends to be the most commonly used and most useful of the three kinds of state, so we need a convenient and efficient means for tracking it.

We do this with the store. The store is essentially a map which allows a checker to associate state information with any variable or expression. Each ANALYZE_* function has access to the store and can query it for the state of a tree as well as add or modify state for a tree. The traversal infrastructure handles all the details of tracking the different per-path sets of state, such as managing which state corresponds to which path.

4.2. Example: Detecting use after free

Suppose we wanted to detect the use of a pointer after it has been freed. Since the free and the use will most likely occur in different expressions, we need the store to write such a checker. Below is a version of how you might start writing this checker:

#include "extend-lang.h"

START_EXTEND_CHECKER( basic_use_after_free, int_store );

ANALYZE_TREE()
{
    Fun freecall( "free" );
    Pointer p;

    if( MATCH( freecall( p ) ) ) {
        SET_STATE( p, 1 );
        ADD_EVENT( p, "free", "freed pointer " << p );
    }
    else if( MATCH( *p ) ) {
        if( MATCH_STATE( p, 1 ) ) {
            COMMIT_ERROR( p, "deref", "dereferenced free pointer " << p );
        }
    }
    else if( MATCH( p = _ ) ) {
        CLEAR_STATE( p );
    }
}

END_EXTEND_CHECKER();

MAKE_MAIN( basic_use_after_free )

Note that the above checker, as its name implies, catches only basic instances of use after free. It doesn't handle aliasing, interprocedural analysis, or violations other than an explicit dereference. While it would be possible to handle all those details in a Coverity Extend checker, doing so would obscure the illustrative nature of this example.

The rest of this section will offer a brief explanation of this checker. Section 4.3, “Store Operations” will describe in detail all the operations used by this checker.

The first thing to notice is the type of this checker: int_store. This specifies a checker which uses a store whose state information for each tree consists of an integer. In other words, for every variable or expression we encounter during a match, we can track a single integer for it, which we will be able to access in a later match. While it may seem limited, in practice a single integer is sufficient for most checkers. You can often enumerate the various states a variable can be in, and encode that information accordingly. For this checker, we will use the value 1 to mean that the pointer is known to be a freed pointer. Any other value, or absence of state for that pointer, signifies that the pointer is not necessarily free.

A use after free violation is defined as a free of a pointer followed by a use of that pointer. In the above checker, the only use we will worry about is a direct dereference. Accordingly, the first two patterns the checker is concerned with are calls to free, and pointer dereferences.

Whenever the checker sees a call to free on a pointer, it remembers that the pointer was freed by setting the state of the pointer to 1, using the SET_STATE command. Additionally, it annotates the line of code where the free happened with a comment that the pointer was freed, using ADD_EVENT. If an error is later generated relating to this pointer, all such annotations will be part of the error report.

On the other end, whenever the checker sees a pointer dereference, it needs to ask the store whether the state for the pointer has previously been set to 1, using MATCH_STATE. If so, then we know that this pointer has been freed, so we generate an error report (with an additional annotation for the location where the error occurs) using COMMIT_ERROR.

However, there's one very important case to consider: what if, between the free and the dereference, the pointer is reassigned? Then the dereference is not necessarily of a freed pointer, yet with only the two match actions described above, we will still generate an error. We need a match action that will unmark a pointer since it is no longer freed. The third match action does this using CLEAR_STATE. We could also "unmark" the pointer by setting its state to something other than 1. However, CLEAR_STATE both removes the state of the pointer altogether, and also removes all annotations for that pointer. In other words, CLEAR_STATE truly resets the state of the expression to be as if the checker had never matched that expression before.

4.3. Store Operations

This section provides details for commands which interact with the store.

4.3.1. SET_STATE

SET_STATE(tree, value) assigns a state of value to the tree tree. If tree already had a state, it is overwritten. This operation does not affect any annotations that may exist for tree.

4.3.2. GET_STATE

GET_STATE(tree, value) queries the state for tree and, if it exists, stores it in value, which is a reference parameter. GET_STATE returns whether or not there was a state value for tree in the store. If no value was found for tree, value will be unchanged.

4.3.3. CLEAR_STATE

CLEAR_STATE(tree) removes any state information that may exist about tree from the store, and removes all annotations for tree.

4.3.4. MATCH_STATE

MATCH_STATE(tree, value) checks whether the state for tree matches value. If the state matches, then this returns true. if the state does not match, or there is no state for tree, then this returns false.

4.3.5. COPY_STATE

COPY_STATE(dst_tree, src_tree) copies the state of src_tree (including annotations) and associates that state with dst_tree. This is typically useful when tracking properties about the values of variables. If you have state for one variable, and then see a copy of that variable's value, you can use COPY_STATE to mirror the effects of the code.

4.3.6. FOREACH_IN_STORE

FOREACH_IN_STORE(tree, value) starts a loop which will iterate over all store elements. Each time through the loop, tree and value will be set equal to a pair in the store. tree and value must be variables which are declared before the loop. The following checker is the beginnings of a basic memory leak detector that shows how to use this macro, and in what circumstances it might be necessary:

#include "extend-lang.h"

START_EXTEND_CHECKER( basic_leak, int_store );

ANALYZE_TREE()
{
    Fun malloc( "malloc" );
    Fun free( "free" );
    LocalVar lv;
    AnySubpart lv_sp( lv );
    DECL_PAT( non_local, Not( lv_sp ) );
    
    if( MATCH( lv_sp = malloc( _ ) ) ) {
        SET_STATE( lv_sp, 1 );
        ADD_EVENT( lv_sp, "alloc", "Allocation assigned to " << lv_sp );
    }

    else if( MATCH( free( lv_sp ) ) ) {
        CLEAR_STATE( lv_sp );
    }
}

ANALYZE_END_OF_PATH()
{
    tree t;
    int v;
    FOREACH_IN_STORE( t, v ) {
        COMMIT_ERROR( t, "leaked", t << " was not freed before the end of path" );
    }
}

END_EXTEND_CHECKER();

MAKE_MAIN( basic_leak )

This checker does not handle a host of things (such as pointer aliasing, interprocedural analysis, and so on), and will have some false positives that would need to be tuned out using more specific matches. It simply enforces that if a pointer is allocated to a local variable and that local variable is not freed within the same function, an error report will be generated. Making this checker more precise would make it harder to see the main point, which is using the FOREACH_IN_STORE macro to find all things in the store at the end of a path.

4.3.7. ADD_EVENT

ADD_EVENT(tree, tag, text) adds an annotation to tree, associated with the line number of the tree being currently analyzed. Annotations contain a tag, which is a short descriptive string of the event, and some text, which is any expression which may be shifted onto an output stream. The text will be printed in the error report as the explanation for the event, and the tag is used as the link name when other events reference this event.

Note that expressions with no state value cannot have annotations; this means that ADD_EVENT will do nothing if you try to add an event for an expression that has no state. This is also why CLEAR_STATE will remove all annotations. You don't need to explicitly SET_STATE before every ADD_EVENT: In places where you know that some state exists for the expression (e.g., you have a true return from MATCH_STATE), you can always use ADD_EVENT.

Similarly, it is not necessary to ADD_EVENT every time you SET_STATE. However, much of the value of the error reporting in Coverity Prevent and therefore in Coverity Extend is the precision and detail of the error reports. It's usually a good idea to add events whenever you change an expression's state, so that the error report will contain a record of all the events that influenced the checker's decision to report an error.

4.3.8. COMMIT_ERROR

COMMIT_ERROR(tree, tag, text) generates an error report, using the event history for tree with one final event specified by tag and text. Note that this event is only added to the error report, and not to tree's event history. This is done so that future errors reported on tree will not contain events specific to previous error reports. If you do not wish to add an error-specific event, you may pass the empty string ("") for tag and text.

4.4. Store Mechanics

The analysis engine guarantees that every time ANALYZE_TREE or ANALYZE_COND is called, the store will look as if the analysis has examined the entire path currently being analyzed up to the current tree from the start in approximate execution order. In other words, if a checker respects the store abstraction and doesn't use global, static, or class variables to hold state information about the analysis, then the checker may behave as if the analysis were traversing all paths completely and in order, which is the conceptually easy and intuitive thing to do for checker writers. Basically, match actions should be written as if every preceding tree in the function has already been analyzed and the store has been modified accordingly.

Under the hood, it isn't quite that simple. If the analysis engine were to actually traverse every path in its entirety, it would be performing a lot of redundant analysis (for blocks of code which are common to all paths) and run significantly slower. Instead, the engine identifies common blocks and analyzes them only once, and generally only analyzes trees multiple times if the analysis will result in different state each time. This means that statements and subexpressions which are adjacent in the code will not necessarily be analyzed in consecutive calls to ANALYZE_TREE. However, the store hides all this complexity; this is the main reason the store is necessary.

We described the guarantee as looking like "approximate execution order". What this means is that, from the store's point of view, subexpressions are handled before the statement itself is handled, and these are done in almost the same order as they would happen in the compiled program. There are two main reasons why it might not be exactly the same. In cases where the language does not specify the execution order (for example, order of evaluation of function arguments) the traversal order may not match a particular compiler's execution order. Additionally, in some situations the analysis artificially inserts and reorders some trees that aren't explicitly obvious in the code (for example, variables declared in loop conditions are transformed to declarations before the loop). These transformations will slightly alter how the code looks but results in trees which are functionally equivalent.

Chapter 5. Analysis Actions

5.1. Current Function Information Accessors

These functions return information about the function being currently analyzed. They may be used in the three analysis functions.

5.1.1. current_function_get_mangled_name

This returns the entire mangled name.

5.1.2. current_function_get_name

This returns the unmangled function name, with the entire namespace and class information prefixed.

5.1.3. current_function_get_signature

This returns the full unmangled function name with parameter information.

5.1.4. current_function_get_class_name

This returns everything preceding the final "::" in the full unmangled function name. It returns the empty string for functions in the global namespace. Strictly speaking this isn't just the class, since it includes all namespace information as well.

5.1.5. current_function_get_method_name

This returns everything following the final "::" in the full unmangled function name. It returns the full function name for functions in the global namespace. Strictly speaking this isn't just the "method" name since it can be used on functions which are not members of classes.

5.1.6. current_function_is_ctor

This returns whether the current function is a constructor.

5.1.7. current_function_is_dtor

This returns whether the current function is a destructor.

5.1.8. current_file_get_name

This returns the name of the file in which the current function is defined.

5.1.9. current_file_lineno

This returns the line number for the expression currently being analyzed.

5.2. Tree information accessors

The following functions get information about a particular tree. There are three ways to get a tree:

  • A pattern, after a successful match, may be used where a tree is expected; it will give you the tree that the pattern matched.

  • Many pattern types also have accessors which return trees which may then be passed into these functions.

  • CURRENT_TREE is the tree currently being analyzed; this is usable in ANALYZE_TREE and ANALYZE_CONDITION.

5.2.1. is_tree_in_macro

If the current tree is inside a macro, this function will return the macro's name. Otherwise, it returns NULL.

5.2.2. get_type_of_tree

This returns a tree that represents the type of the tree passed in. If the argument has no type, this function returns NULL.

5.2.3. get_size_of_type

This returns the size of the type that is passed in. If the argument is NULL, then this returns -1. If the argument is not NULL but is not a type, this function returns 0.

5.3. Error Reporting

5.3.1. OUTPUT_ERROR

OUTPUT_ERROR(text) generates an error report with text as the details of the report. The error will be associated with the line number of the tree being inspected. text may be any expression which may appear on the right hand side of a stream's operator<<.

This error generating function is suitable for errors that occur entirely within one tree. For errors that span multiple lines (for example, detecting for double free), you will need to use ADD_EVENT and COMMIT_ERROR, described below. Note also that multi-line errors only make sense with checker types that use some sort of store.

5.3.2. ADD_EVENT

For use with store checker types; see Section 4.3.7, “ADD_EVENT.

5.3.3. COMMIT_ERROR

For use with store checker types; see Section 4.3.8, “COMMIT_ERROR.

5.3.4. escape_xml

The text passed to OUTPUT_ERROR and ADD_EVENT will be written into an xml file. Any text that contains characters which interfere with the xml (such as '<'), needs to be escaped with this function. For example, suppose the function name could contain '<' (e.g., the function is an overloaded operator). Then you would need to do the following:

OUTPUT_ERROR( "The function " << escape_xml( function_name ) << " ..." );

Note that you cannot escape the entire contents of the macro, because escape_xml takes a string.

Chapter 6. Coverity Extend™ Checker Types

6.1. Overview

When defining a Coverity Extend checker, you must specify its type as the second argument to START_EXTEND_CHECKER. Different types provide different levels of complexity and access to different functionality. This is done for two reasons: it allows Coverity Extend users to only choose a type best suited for the current checker, so that the user need only deal with as much complexity as necessary; and it also allows Coverity to introduce new checker types as we develop them without affecting already written Coverity Extend checkers.

The types are listed roughly in order of complexity and running time. The ones higher in the list are easier to use and run faster, but are more limited in the sorts of analyses they can do. When writing a checker, select the simplest type that is adequate for your analysis.

6.2. simple

This is a very simple checker type. It maintains no state information between analysis callbacks, does not track values for variables, and does no interprocedural analysis. Basically, simple checkers will see every tree once, with no context about what's going on in the function when that tree is encountered. These checkers can be used to perform search operations: for example, searching for all parameter dereferences as seen above, searching for all assignments to the integer constant 0, or all local variable declarations which are of types which exceed a certain size.

6.3. int_store

This is the simplest checker type which uses a store (see Chapter 4, The Store). The store in this case allows the checker to associate an integer with a particular tree. This checker type does not perform interprocedural analysis. int_store checkers can be used to find ordering properties within a single function (e.g., calls to function A must be followed by calls to function B) as well as properties involving integer values that span multiple lines (e.g., pointers may not be assigned to dynamically allocated blocks of a size which is not a multiple of the size pointed to).