Workbook for BaBar Offline Users - Debugging
The aim of this workbook section is to provide some examples of use of
the BaBar debuggers. See also HOWTO-Basic-Debugging.
=== NOTE =====
Unfortunately, someone forgot to include the
debugging option in analysis-30.
So this page has not been updated to analysis-30.
It is still the analysis-26 version.
However, some of the methods described
on this page will still work in analysis-30.
Compile your code in debug mode and use the debugger to trace problems
in a program at the source code level.
A debugger enables you to control a program's execution,
symbolically monitoring program control flow, variables, and memory
locations. You can also use the debugger to trace the logic and flow
of control to acquaint yourself with a program written by someone
else.
Different machines have different debuggers. The debugger
for Linux machines like yakut and noric is called gdb. The
debugger for Sun machines like shire is called dbx. This section
will give a brief overview of both debuggers.
This section also includes examples of how to use gdb and dbx.
They assume that you already have checked out analysis-26 and edited your
code to include the PExample module, as described in the
Editing Code section of the Workbook.
Generally when you compile and link your code for running an analysis
job, the code is optimized to run faster. The BaBar packages
in your release that you use, but don't check out and edit, will have
been compiled optimized as well. When you want to debug code it will
typically be your own code that will have bugs that you need to
find, as most BaBar releases (particularly analysis releases) have
been tested by experts. The best way to do this is to compile and link
your code with the flags -noOptimize-Debug. This will generally
enable the debuggers to pinpoint the exact line in your code where the problem
occurred.
There are two ways to compile and link in debug mode:
- When you issue the
srtpath command, select
the -noOptimize-Debug option instead of the default. For
example, if you are logged into yakut and have checked out
the release analysis-26, srtpath will give you four options:
Select/enter BFARCH (CR=1):
1) Linux24SL3_i386_gcc323 [prod][test][active][default]
2) Linux24SL3_i386_gcc323-Optimize-Profile [prod]
3) Linux24SL3_i386_gcc323-noOptimize-Debug [prod]
4) Linux24RHEL3_i386_gcc323 [default2]
The default option is option 1, but if you select 3, all of your gmake
commands will run in -noOptimize-Debug mode.
- Alternatively, you can select just
Linux24SL3_i386_gcc323
architecture. Then when you wish to debug your code, you would just issue the
gmake commands with ROPT=-noOptimize-Debug, as follows:
ana26> bsub -q bldrecoq -o all.log gmake all ROPT=-noOptimize-Debug
Note that if you have already compiled and linked in Optimized mode and
want to recompile for debugging without having changed any code in
between, you should first issue a gmake clean or gmake cleanarch
command to flush out the Optimized library and binary files.
Up-to-date information on debugging BaBar analysis jobs is available
in the HOWTO file HOWTO-Basic-Debugging.
This very useful HOWTO is written for beginners, and contains information
about:
- How to report problems to get help from others
- Descriptions of common types of problems
- Summary of how to use the debuggers
- Other useful tips, tricks and sources of information
Command-line debuggers are generally the best choice for tracking down
basic coding errors. They are fast, which makes them easy to use when
logged in remotely. Graphical debuggers are slower, but can be more
useful for complex problems.
The debugger for Linux machines is called gdb. gdb allows you to see what is
going on inside a program while it executes -- or what the program was doing
at the moment it crashed. gdb can do four main kinds of things
(plus other things in support of these) to help you catch bugs in the act:
- Start your program, specifying anything that might affect its behavior.
- Make your program stop on specified conditions.
- Examine what has happened, when your program has stopped.
- Change things in your program, so you can experiment with
correcting the effects of one bug and go on to learn about another.
The basic syntax for gdb is one of the following:
- gdb program - To debug program.
- gdb program core - To debug using the core file produced when program
dumped core.
- gdb program PID - To debug a running process with process ID number PID.
For more information about gdb, you can look at the man page:
man gdb
or the info page
info gdb
The info page in particular contains a lot of information and
even a sample gdb session. The info page looks like a text document, but in
fact it has links that you can follow to other pages. To navigate
the info page, put the cursor on the menu item that you are interested
in, and press enter. To exit, press q for quit.
gdb commands
Here are some of the most frequently needed gdb commands:
| Command | Description |
| run [arglist] | Start your program (with arglist, if specified). |
| break [file:]function | Set a breakpoint at function (in file). |
| bt | Backtrace: display the program stack. |
| print expr | Display the value of the expression expr (for example, an object x). |
| c | Continue running your program (after stopping, e.g. at a breakpoint). |
| next | Execute the next program line (after stopping); step over any function calls in the line. |
| step | Execute the next program line (after stopping); step into any function calls in the line. |
| list [file:]function | Display the text of the program in the vicinity of function, or of where the program is presently stopped. |
| edit [file:]function | Open an editor on function, or on the program line where the program is presently stopped. |
| help [name] | Show information about gdb command name, or general information about using gdb. |
| quit | Exit from gdb. |
Here is a very quick example of standard use of the debugger on
a Linux machine demonstrating the minimal procedure that you are
likely to use frequently. The particular responses in this section are
from running on analysis-26 on a yakut machine.
To begin, you will deliberately introduce an error into your code.
Open the PExample.cc file that you used in the last
WorkBook section, and comment out the line where the momentum
histogram is initialised:
// _pHisto = manager->histogram("Momentum", 25, 0., 1. );
Now try to recompile and link (with the Debug flag set to maximise the
information we can get when things go wrong):
ana26> gmake cleanarch
ana26> bsub -q bldrecoq -o all-Linux.log gmake all ROPT=-noOptimize-Debug
Since _pHisto is declared
in the header file, the code will compile and link with no
problems. However, trying to run as you did in the Compile, Link and Run section of the
WorkBook, something goes wrong:
workdir> BetaMiniApp snippet.tcl
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
The job doesn't even get past the first event! It dies with a message
like:
2005-07-18 16:35:18 1288 Err : TUnixSystem::DispatchSignals - segmentation violation
Abort (core dumped)
The program crashed and dumped core. Now there will be a huge file
called core.XXXX (where XXXX = some number) in your workdir directory.
Normally you would want to delete this right away, so that it does not
use up all of your disk space. However, in this case you will leave
it because you are going to use it for debugging in the next section.
A segmentation violation generally means that the program tried to access
memory that it is not allowed to access, for example through a null or
invalid pointer. In this case we deliberately created a common problem:
a new histogram is put into the code, declared in the header, and filled
in the event() function of the implementation file, but never actually
instantiated. You have to make the histogram before you can fill it.
However, in most cases you do not put in an error deliberately,
and have made many small changes to code before checking it. So
the message "segmentation fault" isn't particularly useful
for determining which of these small changes is the source of the error.
Therefore, we rerun the executable with debugger gdb to find
out exactly where it crashes (note that $BFARCH is set up for us when we
type srtpath at the start of a session):
workdir> gdb bin/$BFARCH/BetaMiniApp
> run snippet.tcl
These first two lines run gdb on the job "BetaMiniApp snippet.tcl".
At the framework prompt, input your collection as usual:
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
Again we get a crash, but this time with a helpful pointer to where it
went wrong with the output:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218624864 (LWP 17178)]
0x08094710 in PExample::event (this=0x11545ac8, anEvent=0x12c369a0)
at /afs/slac.stanford.edu/u/br/penguin/ana26/BetaMiniUser/PExample.cc:133
133 _pHisto->accumulate( trk->p() );
Current language: auto; currently c++
So the segmentation fault occurred at line 133 of PExample.cc - as
you would expect (even if you had not put in the error yourself) since
this is one of the files that you added, rather than part of the
standard BaBar code that you haven't touched.
To try to get a bit more information, you can ask the debugger
for the call stack, that is, the chain of function calls that was
active when the crash occurred:
(gdb) where
Predictably, the innermost frame (frame 0) is:
#0 0x08094710 in PExample::event (this=0x11545ac8, anEvent=0x12c369a0)
at /afs/slac.stanford.edu/u/br/penguin/ana26/BetaMiniUser/PExample.cc:133
This confirms where the error occurred. To look more
closely, you can enter:
(gdb) frame 0
to look at the particular region where it went wrong:
#0 0x08094710 in PExample::event (this=0x11545ac8, anEvent=0x12c369a0)
at /afs/slac.stanford.edu/u/br/penguin/ana26/BetaMiniUser/PExample.cc:133
133 _pHisto->accumulate( trk->p() );
You still don't know for sure that it was the non-instantiation of the
_pHisto histogram that caused the problem, but you have really
narrowed down the suspects. In this frame, you can also try to
interrogate the objects listed to see if you can get a few more hints:
(gdb) print trk
gives output:
$1 = (class BtaCandidate *) 0x14054a40
Which says the object trk is a pointer to a BtaCandidate
and has a sensible memory location - this is good.
You can also use the command,
(gdb) print trk->p()
to print the magnitude of the 3-momentum of the track:
$2 = 0.73818585099287926
But that doesn't help much in this case.
So finally we have a look at our histogram:
(gdb) print _pHisto
Which tells us what's wrong:
$3 = (struct HepHistogram *) 0x0
The code knows that _pHisto is a pointer to a HepHistogram object, but
it has a null address (0x0), so it does not point to any histogram.
So now we know where it went wrong, and the task of fixing things
is made much simpler.
Now that you know what is wrong, you can quit:
(gdb) quit
The system responds,
The program is running. Exit anyway? (y or n)
Answer "y", and you're out.
A core file is produced when a program exits abnormally
and dumps core. In the above example, your job dumped core,
so now you have a file called core.XXXX in your workdir
directory. This core file contains a very detailed record of the state
of your job at the point where it crashed.
gdb can debug a core file instead of a running job. For
example, let's rerun the above debugging session, but this time
using the core file. My core file is called core.1288 (your number
is probably different), so I enter:
gdb BetaMiniApp core.1288
Then you can use (almost) all the same commands you used before.
To find out where the error occurred:
(gdb) where
Again, you find the error in PExample, although this time it is
frame 7 instead of frame 0:
#7 0x08094710 in PExample::event (this=0x10226990, anEvent=0x11917b88)
at /afs/slac.stanford.edu/u/br/penguin/z24/BetaMiniUser/PExample.cc:133
So you look at frame 7:
(gdb) frame 7
#7 0x08094710 in PExample::event (this=0x10226990, anEvent=0x11917b88)
at /afs/slac.stanford.edu/u/br/penguin/z24/BetaMiniUser/PExample.cc:133
133 _pHisto->accumulate( trk->p() );
Current language: auto; currently c++
As before, you investigate the object "trk":
(gdb) print trk
$1 = (class BtaCandidate *) 0x12d35c20
But this time you can't print the track's momentum, because the
job is not running:
(gdb) print trk->p()
You can't do that without a process to debug.
Finally, you check the histogram and find your problem, as before:
(gdb) print _pHisto
$2 = (struct HepHistogram *) 0x0
Then exit gdb:
(gdb) quit
The debugger for Sun machines is called dbx.
If you have been following the Workbook, then you have probably
done all of your work so far on yakut, which is a Scientific Linux
machine.
So before you can debug in Sun, you will need to compile
and link in Sun. Login to shire (a Sun machine), and then do:
ana26> srtpath
ana26> gmake cleanarch
ana26> bsub -q bldrecoq -o all-Sun.log gmake all ROPT=-noOptimize-Debug
The cleanarch command removes older Sun libraries and binaries, if
there are any. Note, however, that it does not clean out any Linux
files. It cleans out only files for the current architecture (as set
by srtpath).
The commands and syntax for dbx are similar, but not identical, to those
used for gdb. Here are some of the most common (and platform-independent)
commands:
| Command | Description |
| help | Display general help (uses more) |
| help [command] | Display help for command command |
| run [args] | Start the program with argument list args |
| pathmap [path] | Add path to the list of paths in which dbx will look for code |
| file [filename] | Tells the debugger to look in file filename for code |
| list | List lines of source code |
| print [x] | Print the object x |
| stop in [foo] | Set a break point at the beginning of function foo |
| stop at [line] | Set a break point at line line |
| assign [x]=[y] | Set variable x to be y (another variable or a number) |
| next | Step to the next line (stepping over function calls) |
| step | Step to the next line (stepping into functions) |
| cont | Continue to the next stop (e.g. a break-point) |
| where | Print the current activation levels of a program |
| quit | Quit debugging session |
For more information, use "man dbx". (Sadly, there does not
appear to be an info page for dbx.)
The syntax of dbx is:
> dbx [object_file [corefile]]
The object_file is the name of the executable object file
that you want to debug. It provides the code that dbx
executes.
The following is an example of how to start a debugging session, set a
couple of breakpoints and print the value of a variable. For this, we
will use BetaMiniApp, which is the program running our
PExample.
workdir> dbx ../bin/$BFARCH/BetaMiniApp
This gives you an information screen, which you get rid
of by typing "q":
:q
More information scrolls by, until finally you get your dbx prompt:
(/opt/SUNWspro/bin/../WS6U1/bin/sparcv9/dbx)
From now on I will abbreviate the prompt to (dbx).
Now you are ready to start:
(dbx) pathmap ../BetaMiniUser/
(dbx) file PExample.cc
(dbx) stop in PExample::event
(2) stop in PExample::event(AbsEvent*)
This sets a break point at the function event of
PExample. Now when you run the program with dbx, it will stop
at the break point.
To run the program:
(dbx) run snippet.tcl
Then, after some initial output, you get your usual framework prompt,
and talk to KanEventInput:
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
After some more output, the job stops with:
t@1 (l@1) stopped in PExample::event at line 123 in file "PExample.cc"
123 HepAList<BtaCandidate>* trkList =
Now let's explore:
(dbx) list +15
123 HepAList<BtaCandidate>* trkList =
124 Ifd< HepAList<BtaCandidate> >::get(anEvent, _btaChargedList.value());
125
126 //histogram number of tracks in event
127 _numTrkHisto->accumulate( trkList->length() );
128
129 // Loop over track candidates to plot momentum
130 HepAListIterator<BtaCandidate> iterTrk(*trkList);
131 BtaCandidate* trk;
132 while ( 0 != ( trk = iterTrk()) ) {
133 _pHisto->accumulate( trk->p() );
134 }
135
136 // done
137 return AppResult::OK;
(dbx) next
t@1 (l@1) stopped in PExample::event at line 127 in file "PExample.cc"
127 _numTrkHisto->accumulate( trkList->length() );
(dbx) print trkList
trkList = 0x15bf1290
(dbx) print *trkList
*trkList = {
/* try using "print -r" to see any inherited members */
}
(dbx) print -r *trkList
*trkList = {
HepAList<BtaCandidate>::HepAListBase::p = 0x15bf7258
HepAList<BtaCandidate>::HepAListBase::n = 10
HepAList<BtaCandidate>::HepAListBase::s = 14
}
(dbx) stop at 132
(3) stop at "PExample.cc":132
(dbx) cont
t@1 (l@1) stopped in PExample::event at line 132 in file "PExample.cc"
132 while ( 0 != ( trk = iterTrk()) ) {
(dbx) status
(2) stop in PExample::event(AbsEvent*)
*(3) stop at "PExample.cc":132
(dbx) next
t@1 (l@1) stopped in PExample::event at line 133 in file "PExample.cc"
133 _pHisto->accumulate( trk->p() );
(dbx) print trk->p()
trk->p() = 0.73818585099288
(dbx) delete 2
(dbx) status
(3) stop at "PExample.cc":132
(dbx) quit
GUI debuggers have different interfaces on each platform. They are
generally a wrapper on the debugger in use. They let you see the code
in context as you step through it.
GUI debuggers are nice because they provide a graphical user
interface to confusing debugging programs. On the other hand, they
can be CPU-intensive, so they can be very slow to run, especially
if you are not at SLAC.
The examples and images below were written for the pre-CM2 Workbook,
run with an early analysis release. However, they should work with the following
substitutions:
BetaUser -> BetaMiniUser
BetaApp -> BetaMiniApp
../BetaUser/bdbReco.tcl -> snippet.tcl
source ../BetaUser/myData.tcl ->
source $BFROOT/www/doc/workbook/NewExamples/FrmwkTCL/MyKanEventInput.tcl
ddd
ddd is a nice graphical debugger which is available on
Linux. To use it, basically follow the commands for workshop
below.
workshop
On Sun Solaris there is a nice GUI debugger called
workshop.
To access Workshop and all of its tools, at least on the SLAC
machines, add
/afs/slac/package/sunworkshop/test/SUNWspro/bin/
to your PATH.
> addpath PATH /afs/slac/package/sunworkshop/test/SUNWspro/bin/
To invoke it type from workdir:
> workshop &
An application toolbar will appear. The debugger is the button with a
bug icon, crossed out in red.
Select Debug on the toolbar either by pushing it or pressing
'Alt D'. Among the options select New Program.
The Debug New Program window will appear. In the
Name box type the name of the program you want to debug:
../bin/$BFARCH/BetaApp
and in the Arguments box specify the tcl file you want to use
(in this example, as before, we will use the bdbMicro.tcl file)
Press OK when ready.
The debugger will start loading the program, and a new window will appear.
The loading is complete when you read Program loaded in the lower
left corner of this window.
An additional editor window will appear from which you can access the
code.
Let's set the breakpoint in event in PExample.cc.
Push Execute and choose Set Breakpoints.
Select Stop as the Action, In Function as
the Event item and type in the corresponding box
PExample::event.
Push Add to add this breakpoint.
Note: Occasionally, when I run workshop, I find that the
breakpoint window doesn't come up. Sometimes, starting over helps.
Or, if you stretch the main debugger window, you will uncover a pane
labeled "Dbx Commands:". You can issue any dbx
command in this window, such as:
(dbx) stop in PExample::event
(dbx) status
to set and examine breakpoints.
To run the program push the Start button.
To interrupt the program at any time, push the interrupt button.
The Input/Output window will now appear. To set the
collection and run the first event, type:
> source ../BetaUser/myData.tcl
> ev beg -nev 1
The program will then stop at the breakpoint in PExample and in the
Debugger Window you will see the status of the program:
By clicking on any of the blue program names in the lower window you
will load the corresponding source code in the Editor
Window.
To print the value of anEvent type anEvent in the
Expression box and press the Evaluate button. Do
the same with *anEvent.
We want now to set a breakpoint at line 148 of
PExample.cc. In the Breakpoints Window select At
Location for the Event item, type 148 in the
corresponding box and press the Add button.
Press the continue button (NOT the Start button!) to continue the execution. The program should stop
at line 148 in PExample.cc and on the Editor Window the
cursor should be on the corresponding line of PExample.cc.
You can now move to the next statement by pushing the next button.
The editor window shows you the breakpoints in red and your current
line in green.
Print the value of the track momentum by typing trk->p()
in the Expression box and pushing the Evaluate
button as before.
To remove a breakpoint select it in the Breakpoint Window and
push the Delete button. To Exit workshop press the
Workshop button on the Workshop Window and choose
Exit Workshop.
Author:
Massimiliano Turri
Contributors:
Art Snyder, James Weatherall
Joseph Perl
Jenny Williams
Last modification: 25 July 2005
Last significant update: 12 February 2003