
Workbook for BaBar Offline Users - Debugging

The aim of this workbook section is to provide some examples of use of the BaBar debuggers. See also HOWTO-Basic-Debugging.
NOTE: The debugging option was not included in analysis-30, so this page has not been updated for analysis-30 and still describes analysis-26. Some of the methods described on this page will nevertheless still work in analysis-30.


Introduction

Compile your code in debug mode and use the debugger to trace problems in a program at the source code level.

A debugger enables you to control a program's execution, symbolically monitoring program control flow, variables, and memory locations. You can also use the debugger to trace the logic and flow of control to acquaint yourself with a program written by someone else.

Different machines have different debuggers. The debugger for Linux machines like yakut and noric is called gdb. The debugger for Sun machines like shire is called dbx. This section will give a brief overview of both debuggers.

This section also includes examples of how to use gdb and dbx. They assume that you already have checked out analysis-26 and edited your code to include the PExample module, as described in the Editing Code section of the Workbook.


Compiling your Code for Debugging

Generally, when you compile and link your code for an analysis job, the code is optimized to run faster. The BaBar packages in your release that you use, but don't check out and edit, have been compiled optimized as well. When you debug, the bugs you need to find will typically be in your own code, since most BaBar releases (particularly analysis releases) have been tested by experts. The best approach is to compile and link your code with the -noOptimize-Debug option. This will generally enable the debuggers to pinpoint the exact line in your code where the problem occurred. There are two ways to compile and link in debug mode:
  1. When you issue the srtpath command, select the -noOptimize-Debug option instead of the default. For example, if you are logged into yakut and have checked out the release analysis-26, srtpath will give you four options:
    Select/enter BFARCH (CR=1):
    1) Linux24SL3_i386_gcc323                     [prod][test][active][default]
    2) Linux24SL3_i386_gcc323-Optimize-Profile    [prod]
    3) Linux24SL3_i386_gcc323-noOptimize-Debug    [prod]
    4) Linux24RHEL3_i386_gcc323                   [default2]
    
    The default option is option 1, but if you select 3, all of your gmake commands will run in -noOptimize-Debug mode.
  2. Alternatively, you can select just the Linux24SL3_i386_gcc323 architecture. Then, when you wish to debug your code, issue the gmake commands with ROPT=-noOptimize-Debug, as follows:
       ana26> bsub -q bldrecoq -o all.log gmake all ROPT=-noOptimize-Debug
    
Note that if you have been compiling and linking in optimized mode and want to recompile for debugging without having changed any code in between, you should first issue a gmake clean or gmake cleanarch command to flush out the optimized library and binary files.

HOWTO-Basic-Debugging

Up-to-date information on debugging BaBar analysis jobs is available in the HOWTO file HOWTO-Basic-Debugging. This very useful HOWTO is written for beginners, and contains information about:
  • How to report problems to get help from others
  • Descriptions of common types of problems
  • Summary of how to use the debuggers
  • Other useful tips, tricks and sources of information

Command-line debuggers

Command-line debuggers are generally the best choice for tracking down basic coding errors. They are fast, so they work well even when you are logged in remotely. Graphical debuggers are slower, but can be more useful for complex problems.

Debugging on Linux: gdb

The debugger for Linux machines is called gdb. gdb allows you to see what is going on inside a program while it executes, or what the program was doing at the moment it crashed. gdb can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
  • Start your program, specifying anything that might affect its behavior.
  • Make your program stop on specified conditions.
  • Examine what has happened, when your program has stopped.
  • Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.
The basic syntax for gdb is one of the following:
  • gdb program - To debug program.
  • gdb program core - To debug using the core file, produced when program was core dumped.
  • gdb program PID - To debug a running process with process ID number PID.
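The third form, attaching to a running process, requires the process ID. As a minimal sketch, assuming a POSIX system, a long-running program can report its own PID at start-up so that it is easy to attach to later. The function name describeProcess is hypothetical and is not part of any BaBar package.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>   // getpid (POSIX)

// Hypothetical helper: build a one-line hint telling the user how to attach
// gdb to this process. 'programPath' is the path used to start the program.
std::string describeProcess(const std::string& programPath) {
    assert(!programPath.empty());
    const pid_t pid = getpid();   // ID of the calling process
    std::ostringstream msg;
    msg << "To attach: gdb " << programPath << " " << pid;
    return msg.str();
}
```

Printing the returned string once at job start gives you the exact command line to type in a second terminal.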
For more information about gdb, you can look at the man page:
man gdb
or the info page
info gdb
The info page in particular contains a lot of information and even a sample gdb session. The info page looks like a text document, but in fact it has links that you can follow to other pages. To navigate the info page, put the cursor on the menu item that you are interested in, and press enter. To exit, press q for quit.

gdb commands

Here are some of the most frequently needed gdb commands:
Command                  Description
print [x]                Print the object x.
break [file:]function    Set a breakpoint at function (in file).
run [arglist]            Start your program (with arglist, if specified).
bt                       Backtrace: display the program stack.
print expr               Display the value of an expression.
c                        Continue running your program (after stopping, e.g. at a breakpoint).
next                     Execute the next program line (after stopping); step over any function calls in the line.
edit [file:]function     Look at the program line where it is presently stopped.
list [file:]function     Type the text of the program in the vicinity of where it is presently stopped.
step                     Execute the next program line (after stopping); step into any function calls in the line.
help [name]              Show information about gdb command name, or general information about using gdb.
quit                     Exit from gdb.

Running a quick debug session on yakut to find a segmentation violation

Here is a very quick example of standard use of the debugger on a Linux machine demonstrating the minimal procedure that you are likely to use frequently. The particular responses in this section are from running on analysis-26 on a yakut machine.

To begin, you will deliberately introduce an error into your code. Open the PExample.cc file that you used in the last WorkBook section, and comment out the line where the momentum histogram is initialised:

//  _pHisto = manager->histogram("Momentum",  25,  0.,  1. ); 
Now try to recompile and link (with the Debug flag set to maximise the information we can get when things go wrong):
ana26> gmake cleanarch
ana26> bsub -q bldrecoq -o all-Linux.log gmake all ROPT=-noOptimize-Debug
Since _pHisto is declared in the header file, the code will compile and link with no problems. However, when you try to run as you did in the Compile, Link and Run section of the WorkBook, something goes wrong:
workdir> BetaMiniApp snippet.tcl
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
The job doesn't even get past the first event! It dies with a message like:
2005-07-18 16:35:18 1288 Err : TUnixSystem::DispatchSignals - segmentation violation
Abort (core dumped)

The program crashed and was core dumped. Now there will be a huge file called core.XXXX (where XXXX = some number) in your workdir directory. Normally you would want to delete this right away, so that it does not use up all of your disk space. However, in this case you will leave it because you are going to use it for debugging in the next section.

A segmentation violation generally means that the program tried to access memory that isn't there. In this case we deliberately created a common problem: a new histogram is declared in the header and filled in the event() function of the implementation file, but it is never instantiated. You have to make the histogram before you can fill it.
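The pattern behind this crash can be sketched in isolation. The following is a minimal illustration, not BaBar code: ToyHistogram and ToyModule are hypothetical stand-ins for HepHistogram and PExample, and bookHistograms() plays the role of the commented-out booking line. It also shows one defensive option, checking the pointer before use, so that a missing booking is reported instead of crashing the job.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a histogram class; NOT the real HepHistogram interface.
class ToyHistogram {
public:
    void accumulate(double x) { _values.push_back(x); }
    std::size_t entries() const { return _values.size(); }
private:
    std::vector<double> _values;
};

// Stand-in for an analysis module such as PExample.
class ToyModule {
public:
    ToyModule() : _pHisto(nullptr) {}   // declared, but nothing booked yet
    ~ToyModule() { delete _pHisto; }
    ToyModule(const ToyModule&) = delete;
    ToyModule& operator=(const ToyModule&) = delete;

    // Booking step. Skipping this call reproduces the situation in the text.
    void bookHistograms() {
        if (_pHisto == nullptr) _pHisto = new ToyHistogram;
    }

    // Returns false instead of dereferencing a null pointer.
    bool event(double momentum) {
        if (_pHisto == nullptr) return false;
        _pHisto->accumulate(momentum);
        return true;
    }

    std::size_t entries() const {
        return _pHisto ? _pHisto->entries() : 0;
    }

private:
    ToyHistogram* _pHisto;   // only a pointer; the object must be created
};
```

Without the check in event(), calling accumulate() through the null pointer is undefined behaviour, which on Linux typically shows up as the segmentation violation described above.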

However, in most cases you do not put in an error deliberately, and you will have made many small changes to the code before testing it. So the message "segmentation violation" by itself isn't particularly useful for determining which of these small changes is the source of the error.

Therefore, we rerun the executable under the gdb debugger to find out exactly where it crashes (note that $BFARCH is set up for us when we type srtpath at the start of a session):

workdir> gdb bin/$BFARCH/BetaMiniApp
(gdb) run snippet.tcl

These first two lines run gdb on the job "BetaMiniApp snippet.tcl".

At the framework prompt, input your collection as usual:

> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
Again we get a crash, but this time with a helpful pointer to where it went wrong with the output:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218624864 (LWP 17178)]
0x08094710 in PExample::event (this=0x11545ac8, anEvent=0x12c369a0)
    at /afs/slac.stanford.edu/u/br/penguin/ana26/BetaMiniUser/PExample.cc:133
133         _pHisto->accumulate( trk->p() );
Current language:  auto; currently c++
So the segmentation fault occurred at line 133 of PExample.cc. This is what you would expect (even if you had not put in the error yourself), since this is one of the files that you added, rather than part of the standard BaBar code that you haven't touched.

To get a bit more information, you can ask the debugger for the call stack at the moment of the crash, that is, the chain of function calls that were active when the crash occurred:

(gdb) where
Predictably, one of the frames in the stack is:
#0  0x08094710 in PExample::event (this=0x11545ac8, anEvent=0x12c369a0)
    at /afs/slac.stanford.edu/u/br/penguin/ana26/BetaMiniUser/PExample.cc:133
This confirms where the error occurred. To look more closely, you can enter:
(gdb) frame 0
to look at the particular region where it went wrong:
#0  0x08094710 in PExample::event (this=0x11545ac8, anEvent=0x12c369a0)
    at /afs/slac.stanford.edu/u/br/penguin/ana26/BetaMiniUser/PExample.cc:133
133         _pHisto->accumulate( trk->p() );
You still don't know for sure that it was the non-instantiation of the _pHisto histogram that caused the problem, but you have really narrowed down the suspects. In this frame, you can also try to interrogate the objects listed to see if you can get a few more hints:
(gdb) print trk
gives output:
$1 = (class BtaCandidate *) 0x14054a40
This says that the object trk is a pointer to a BtaCandidate and has a sensible memory address, which is good. You can also use the command,
(gdb) print trk->p()
to print the magnitude of the 3-momentum of the track:
$2 = 0.73818585099287926
But that doesn't help much in this case. So finally we have a look at our histogram:
(gdb) print _pHisto
Which tells us what's wrong:
$3 = (struct HepHistogram *) 0x0
The code knows that _pHisto is a pointer to a HepHistogram object, but it has a null memory address: it does not point to any object.

So now we know where it went wrong, and the task of fixing things is made much simpler.

Now that you know what is wrong, you can quit:

(gdb) quit
The system responds,
The program is running.  Exit anyway? (y or n)

Answer "y", and you're out.

Debugging with a core file

A core file is produced when a program exits abnormally and produces a core dump. In the above example the job dumped core, so you now have a file called core.XXXX in your workdir directory. This core file contains a very detailed record of your job, up to the point where it crashed.
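Whether a core file is actually written depends on the core-size resource limit of the process: if the soft limit is zero, a crash normally leaves no core file behind. The following is a minimal sketch, assuming a POSIX system, of how a program can check that limit for itself. The helper names limitAllowsCore and coreDumpsEnabled are hypothetical and are not part of any BaBar package.

```cpp
#include <cassert>
#include <sys/resource.h>   // getrlimit, RLIMIT_CORE, rlim_t (POSIX)

// Hypothetical helper: a soft limit of zero means no core file is written.
bool limitAllowsCore(rlim_t softLimit) {
    return softLimit != 0;
}

// Hypothetical helper: query the current soft core-size limit of this process.
bool coreDumpsEnabled() {
    struct rlimit lim;
    const int rc = getrlimit(RLIMIT_CORE, &lim);
    assert(rc == 0 && "getrlimit(RLIMIT_CORE) failed");
    (void)rc;   // avoids an unused-variable warning when NDEBUG is defined
    return limitAllowsCore(lim.rlim_cur);
}
```

From the shell, the same limit is shown by ulimit -c (bash) or limit coredumpsize (tcsh).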

gdb can debug a core file instead of a running job. For example, let's rerun the above debugging session, but this time using the core file. My core file is called core.1288 (your number is probably different), so I enter:

gdb BetaMiniApp core.1288
Then you can use (almost) all the same commands you used before. To find out where the error occurred:
(gdb) where
Again, you find the error in PExample, although this time it is frame 7 instead of frame 0:
#7  0x08094710 in PExample::event (this=0x10226990, anEvent=0x11917b88)
    at /afs/slac.stanford.edu/u/br/penguin/z24/BetaMiniUser/PExample.cc:133
So you look at frame 7:
(gdb) frame 7

#7  0x08094710 in PExample::event (this=0x10226990, anEvent=0x11917b88)
    at /afs/slac.stanford.edu/u/br/penguin/z24/BetaMiniUser/PExample.cc:133
133         _pHisto->accumulate( trk->p() );
Current language:  auto; currently c++
As before, you investigate the object "trk":
(gdb) print trk
$1 = (class BtaCandidate *) 0x12d35c20
But this time you can't print the track's momentum, because the job is not running:
(gdb) print trk->p()
You can't do that without a process to debug.
Finally, you check the histogram and find your problem, as before:
(gdb) print _pHisto
$2 = (struct HepHistogram *) 0x0
Then exit gdb:
(gdb) quit

Debugging on Sun: dbx

Setting up

The debugger for Sun machines is called dbx.

If you have been following the Workbook, then you have probably done all of your work so far on yakut, which is a Scientific Linux machine. So before you can debug on Sun, you will need to compile and link on Sun. Log in to shire (a Sun machine), and then do:

ana26> srtpath
ana26> gmake cleanarch
ana26> bsub -q bldrecoq -o all-Sun.log gmake all ROPT=-noOptimize-Debug

The gmake cleanarch command removes older Sun libraries and binaries, if there are any. Note, however, that it does not clean out any Linux files. It cleans out only files for the current architecture (as set by srtpath).

dbx commands

The commands and syntax for dbx are similar, but not identical, to those used for gdb. Here are some of the most common (and platform-independent) commands:
Command            Description
help               Display general help (uses more).
help [command]     Display help for command command.
run [args]         Start the program with argument list args.
pathmap [path]     Add path to the list of paths in which dbx will look for code.
file [filename]    Tell the debugger to look in file filename for code.
list               List lines of source code.
print [x]          Print the object x.
stop in [foo]      Set a breakpoint at the beginning of function foo.
stop at [line]     Set a breakpoint at line line.
assign [x]=[y]     Set variable x to be y (another variable or a number).
next               Step to the next line (stepping over function calls).
step               Step to the next line (stepping into functions).
cont               Continue to the next stop (e.g. a breakpoint).
where              Print the current activation levels of a program.
quit               Quit the debugging session.
For more information, use "man dbx". (Sadly, there does not appear to be an info page for dbx.)

Running with dbx

The syntax of dbx is:
   > dbx [object_file [corefile]]
The object_file is the name of the executable object file that you want to debug. It provides the code that dbx executes.

The following is an example of how to start a debugging session, set a couple of breakpoints and print the value of a variable. For this, we will use BetaMiniApp, which is the program running our PExample.

   workdir> dbx ../bin/$BFARCH/BetaMiniApp
This gives you an information screen, which you get rid of by typing "q":
   :q 
More information scrolls by, until finally you get your dbx prompt:
(/opt/SUNWspro/bin/../WS6U1/bin/sparcv9/dbx)
From now on I will abbreviate the prompt to (dbx). Now you are ready to start:
   (dbx) pathmap ../BetaMiniUser/
   (dbx) file PExample.cc
   (dbx) stop in PExample::event
   (2) stop in PExample::event(AbsEvent*)
This sets a break point at the function event of PExample. Now when you run the program with dbx, it will stop at the break point.

To run the program:

   (dbx) run snippet.tcl
Then, after some initial output, you get your usual framework prompt, and talk to KanEventInput:
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
After some more output, the job stops with:
t@1 (l@1) stopped in PExample::event at line 123 in file "PExample.cc"
  123     HepAList<BtaCandidate>* trkList  =
Now let's explore:
(dbx) list +15
  123     HepAList<BtaCandidate>* trkList  =
  124       Ifd< HepAList<BtaCandidate> >::get(anEvent, _btaChargedList.value());
  125
  126     //histogram number of tracks in event
  127     _numTrkHisto->accumulate( trkList->length() );
  128
  129     // Loop over track candidates to plot momentum
  130     HepAListIterator<BtaCandidate> iterTrk(*trkList);
  131     BtaCandidate* trk;
  132     while ( 0 != ( trk = iterTrk()) ) {
  133       _pHisto->accumulate( trk->p() );
  134     }
  135
  136     // done
  137     return AppResult::OK;


(dbx) next

t@1 (l@1) stopped in PExample::event at line 127 in file "PExample.cc"
  127     _numTrkHisto->accumulate( trkList->length() );

(dbx) print trkList
trkList = 0x15bf1290

(dbx) print *trkList
*trkList = {
/* try using "print -r" to see any inherited members */
 }

(dbx) print -r *trkList
*trkList = {
    HepAList<BtaCandidate>::HepAListBase::p = 0x15bf7258
    HepAList<BtaCandidate>::HepAListBase::n = 10
    HepAList<BtaCandidate>::HepAListBase::s = 14
}

(dbx) stop at 132
(3) stop at "PExample.cc":132

(dbx) cont
t@1 (l@1) stopped in PExample::event at line 132 in file "PExample.cc"
  132     while ( 0 != ( trk = iterTrk()) ) {

(dbx) status
 (2) stop in PExample::event(AbsEvent*)
*(3) stop at "PExample.cc":132

(dbx) next
t@1 (l@1) stopped in PExample::event at line 133 in file "PExample.cc"
  133       _pHisto->accumulate( trk->p() );

(dbx) print trk->p()
trk->p() = 0.73818585099288

(dbx) delete 2

(dbx) status
(3) stop at "PExample.cc":132

(dbx) quit

GUI Debuggers

GUI debuggers have different interfaces on each platform. They are generally a wrapper on the debugger in use. They let you see the code in context as you step through it.

GUI debuggers are nice because they provide a graphical user interface to confusing debugging programs. On the other hand, they can be CPU-intensive, so they can be very slow to run, especially if you are not at SLAC.

The examples and images below were written for the pre-CM2 Workbook, run with an early analysis release. However, they should work with the following substitutions:

BetaUser -> BetaMiniUser
BetaApp -> BetaMiniApp
../BetaUser/bdbReco.tcl -> snippet.tcl
source ../BetaUser/myData.tcl -> 
                     source $BFROOT/www/doc/workbook/NewExamples/FrmwkTCL/MyKanEventInput.tcl

ddd

ddd is a nice graphical debugger which is available on Linux. To use it, follow essentially the same steps as described for workshop below.

workshop

On Sun Solaris there is a nice GUI debugger called workshop.

To access Workshop and all of its tools, at least on the SLAC machines, add

       /afs/slac/package/sunworkshop/test/SUNWspro/bin/ 
to your PATH.
> addpath PATH /afs/slac/package/sunworkshop/test/SUNWspro/bin/
To invoke it, type from workdir:
> workshop &
An application toolbar will appear. The debugger is the button with a bug icon, crossed out in red.
toolbar.gif

Select Debug on the toolbar either by pushing it or pressing 'Alt D'. Among the options select New Program.

The Debug New Program window will appear. In the Name box type the name of the program you want to debug:

   ../bin/$BFARCH/BetaApp
and in the Arguments box specify the tcl file you want to use (in this example, as before, we will use the bdbMicro.tcl file)
newname.gif

Press OK when ready.

The debugger will start loading the program:

loading.gif

and a new window will appear:

debugger.gif

The loading is complete when you read Program loaded in the lower left corner of this window.

An additional editor window will appear from which you can access the code.

Let's set the breakpoint in event in PExample.cc.

Push Execute and choose Set Breakpoints .

Select Stop as the Action, In Function as the Event item and type in the corresponding box PExample::event.

Push Add to add this breakpoint.

breakpoint.gif

Note: Occasionally, when I run workshop, I find that the breakpoint window doesn't come up. Sometimes, starting over helps. Or, if you stretch the main debugger window, you will uncover a pane labeled "Dbx Commands:". You can issue any dbx command in this window, such as:
   (dbx) stop in PExample::event
   (dbx) status
to set and examine breakpoints.

To run the program, push the Start button (startbutton.gif). To interrupt the program at any time, push the Stop button (stopbutton.gif).

The Input/Output window will now appear. To set the collection and run the first event, type:

   > source ../BetaUser/myData.tcl
   > ev beg -nev 1
in_out_window.gif.

The program will then stop at the breakpoint in PExample and in the Debugger Window you will see the status of the program:

programstatus.gif.
By clicking on any of the blue program names in the lower window you will load the corresponding source code in the Editor Window.

To print the value of anEvent type anEvent in the Expression box and press the Evaluate button. Do the same with *anEvent.

evaluate.gif.

We want now to set a breakpoint at line 148 of PExample.cc. In the Breakpoints Window select At Location for the Event item, type 148 in the corresponding box and press the Add button.

breakpoint2.gif.

Press the Continue button (continue.gif), NOT the Start button (startbutton.gif), to continue the execution. The program should stop at line 148 in PExample.cc, and in the Editor Window the cursor should be on the corresponding line of PExample.cc.

You can now move to the next statement by pushing the Next button (next.gif). The editor window shows you the breakpoints in red and your current line in green.

last.gif.

Print the value of the track momentum by typing trk->p() in the Expression box and pushing the Evaluate button as before.

trk_p.gif

To remove a breakpoint select it in the Breakpoint Window and push the Delete button. To Exit workshop press the Workshop button on the Workshop Window and choose Exit Workshop.



Author: Massimiliano Turri
Contributors:
Art Snyder, James Weatherall
Joseph Perl
Jenny Williams

Last modification: 25 July 2005
Last significant update: 12 February 2003