
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
 <link href="style.css" rel="stylesheet" type="text/css" />
 <title>LLDB Example - Python Scripting to Debug a Problem</title>
 </head>

 <body>
     <div class="www_title">
       Example - Using Scripting and Python to Debug in LLDB
     </div>

 <div id="container">
 	<div id="content">
          <!--#include virtual="sidebar.incl"-->
 		<div id="middle">
 			<div class="post">
 				<h1 class ="postheader">Introduction</h1>
 				<div class="postcontent">

                    <p>LLDB has been structured from the beginning to be scriptable in two ways:
                    a Python session started from a Unix shell can drive a debug session
                    non-interactively through LLDB, and, within the LLDB debugger tool, Python
                    scripts can help with many tasks, including inspecting program data, iterating
                    over containers, and deciding whether a breakpoint should stop execution or
                    continue.  This document shows how to do some of these things by working through
                    an example: using Python scripting to find a bug in a program that searches for
                    text in a large binary tree.</p>

 				</div>
 				<div class="postfooter"></div>

 			<div class="post">
 				<h1 class ="postheader">The Test Program and Input</h1>
 				<div class="postcontent">

                     <p>We have a simple C program (dictionary.c) that reads in a text file, and
                     stores all the words from the file in a Binary Search Tree, sorted
                     alphabetically.  It then enters a loop prompting the user for a word, searching
                     for the word in the tree (using Binary Search), and reporting to the user
                     whether or not it found the word in the tree.</p>

                     <p>The input text file we are using to test our program contains the text for
                     William Shakespeare's famous tragedy "Romeo and Juliet".</p>

 				</div>
 				<div class="postfooter"></div>

     			<div class="post">
     				<h1 class ="postheader">The Bug</h1>
     				<div class="postcontent">

 		   <p>When we try running our program, we find there is a problem.  While it
                    successfully finds some of the words we would expect to find, such as "love"
                    or "sun", it fails to find the word "Romeo", which MUST be in the input text
                    file:</p>

                    <code color=#ff0000>
                    % ./dictionary Romeo-and-Juliet.txt<br>
                    Dictionary loaded.<br>
                    Enter search word: love<br>
                    Yes!<br>
                    Enter search word: sun<br>
                    Yes!<br>
                    Enter search word: Romeo<br>
                    No!<br>
                    Enter search word: ^D<br>
                    %<br>
                    </code>

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
     				<h1 class ="postheader">Is the word in our tree: Using Depth First Search</h1>
     				<div class="postcontent">

                   <p>Our first job is to determine whether the word "Romeo" actually got inserted
                   into the tree.  Since "Romeo and Juliet" has thousands of words, trying to
                   examine our binary search tree by hand is completely impractical.  Therefore we
                   will write a Python script to search the tree for us.  We will write a recursive
                   function, which we call DFS (Depth First Search), that descends the tree from
                   the root, choosing a branch at each node by comparing the search word with the
                   node's word, while keeping track of the path from the root of the tree to the
                   current node.  If it finds the word in the tree, it returns the path from the
                   root to the node containing the word.  This is what our DFS function in Python
                   looks like, with line numbers added for easy reference in later
                   explanations:</p>

                    <code>
 <pre><tt>
  1: def DFS (root, word, cur_path):
  2:     root_word_ptr = root.GetChildMemberWithName ("word")
  3:     left_child_ptr = root.GetChildMemberWithName ("left")
  4:     right_child_ptr = root.GetChildMemberWithName ("right")
  5:     root_word = root_word_ptr.GetSummary()
  6:     end = len (root_word) - 1
  7:     if root_word[0] == '"' and root_word[end] == '"':
  8:         root_word = root_word[1:end]
  9:     end = len (root_word) - 1
 10:     if root_word[0] == '\'' and root_word[end] == '\'':
 11:         root_word = root_word[1:end]
 12:     if root_word == word:
 13:         return cur_path
 14:     elif word < root_word:
 15:         if left_child_ptr.GetValue() is None:
 16:             return ""
 17:         else:
 18:             cur_path = cur_path + "L"
 19:             return DFS (left_child_ptr, word, cur_path)
 20:     else:
 21:         if right_child_ptr.GetValue() is None:
 22:             return ""
 23:         else:
 24:             cur_path = cur_path + "R"
 25:             return DFS (right_child_ptr, word, cur_path)
 </tt></pre>
                    </code>

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
    				<h1 class ="postheader"><a name="accessing-variables">Accessing &amp; Manipulating <strong>Program</strong> Variables in Python</a>
 </h1>
     				<div class="postcontent">

                    <p>Before we can call any Python function on any of our program's variables, we
                    need to get the variable into a form that Python can access.  To show you how to
                    do this we will look at the parameters for the DFS function.  The first
                    parameter is going to be a node in our binary search tree, put into a Python
                    variable.  The second parameter is the word we are searching for (a string), and
                    the third parameter is a string representing the path from the root of the tree
                    to our current node.</p>

                   <p>The most interesting parameter is the first one, the Python variable that
                   needs to contain a node in our search tree.  How can we take a variable out of
                   our program and put it into a Python variable?  What kind of Python variable
                   will it be?  The answer is to use the LLDB API functions, provided as part of
                   the LLDB Python module.  When we run Python from inside LLDB, LLDB
                   automatically gives us our current frame object as a Python variable,
                   "lldb.frame".  This variable has the type "SBFrame" (see the LLDB API for
                   more information about SBFrame objects).  One of the things we can do with a
                   frame object is ask it to find and return one of its local variables.  We will
                   call the API function "FindVariable" on the lldb.frame object to give us our
                   dictionary variable as a Python variable:</p>

                    <code>
                       root = lldb.frame.FindVariable ("dictionary")
                    </code>

                    <p>The line above, executed in the Python script interpreter in LLDB, asks the
                    current frame to find the variable named "dictionary" and return it.  We then
                    store the returned value in the Python variable named "root".  This answers the
                    question of HOW to get the variable, but it still doesn't explain WHAT actually
                    gets put into "root".  If you examine the LLDB API, you will find that the
                    SBFrame method "FindVariable" returns an object of type SBValue. SBValue
                    objects are used, among other things, to wrap up program variables and values.
                   The SBValue class defines many useful methods for getting information
                   or child values out of an SBValue.  For complete information, see
                    the header file <a href="http://llvm.org/svn/llvm-project/lldb/trunk/include/lldb/API/SBValue.h">SBValue.h</a>.  The
                    SBValue methods that we use in our DFS function are
                    <code>GetChildMemberWithName()</code>,
                    <code>GetSummary()</code>, and <code>GetValue()</code>.</p>
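                   <p>To our knowledge, <code>FindVariable</code> does not raise an error when it
                   cannot resolve a name; it returns an SBValue whose <code>IsValid()</code> method
                   reports false.  The following is a minimal sketch of a guard around the lookup.
                   The helper name <code>find_variable</code> and the classes <code>FakeFrame</code>
                   and <code>FakeValue</code> are hypothetical; the fakes exist only so the sketch
                   runs outside LLDB, where you would pass <code>lldb.frame</code> instead.</p>

```python
def find_variable(frame, name):
    """Return the value for `name` in `frame`, or raise if it cannot be found."""
    value = frame.FindVariable(name)
    if not value.IsValid():
        raise LookupError("no variable named %r in this frame" % name)
    return value


# Hypothetical stand-ins for SBFrame and SBValue; they model only what is used above.
class FakeValue:
    def __init__(self, valid):
        self.valid = valid

    def IsValid(self):
        return self.valid


class FakeFrame:
    def __init__(self, names):
        self.names = set(names)

    def FindVariable(self, name):
        return FakeValue(name in self.names)


frame = FakeFrame(["dictionary", "word"])
root = find_variable(frame, "dictionary")   # inside LLDB: find_variable(lldb.frame, "dictionary")
print(root.IsValid())
```

                   <p>Failing early on a misspelled variable name is usually easier to diagnose than
                   an error raised later, deep inside a recursive function.</p>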

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
     				<h1 class ="postheader">Explaining Depth First Search Script in Detail</h1>
     				<div class="postcontent">

                   <p><strong>"DFS" Overview.</strong>  Before diving into the details of this
                   code, here is a high-level overview of what it does.  The nodes
                   in our binary search tree have type <code>tree_node *</code>, where
                   <code>tree_node</code> is defined as:</p>

                    <code>
 <pre><tt>typedef struct tree_node
 {
   const char *word;
   struct tree_node *left;
   struct tree_node *right;
 } tree_node;</tt></pre></code>

                    <p>Lines 2-11 of DFS are getting data out of the current tree node and getting
                    ready to do the actual search; lines 12-25 are the actual depth-first search.
                    Lines 2-4 of our DFS function get the <code>word</code>, <code>left</code> and
                    <code>right</code> fields out of the current node and store them in Python
                    variables.  Since <code>root_word_ptr</code> is a pointer to our word, and we
                    want the actual word, line 5 calls <code>GetSummary()</code> to get a string
                    containing the value out of the pointer.  Since <code>GetSummary()</code> adds
                    quotes around its result, lines 6-11 strip surrounding quotes off the word.</p>
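                   <p>The quote stripping on lines 6-11 can also be written as a small standalone
                   helper.  This is a sketch rather than the code from tree_utils.py: the name
                   <code>strip_quotes</code> is our own, it removes only one matching pair of quotes,
                   and it assumes that a missing summary arrives as <code>None</code>.</p>

```python
def strip_quotes(summary):
    """Return `summary` without one matching pair of surrounding quotes."""
    if summary is None:            # assumption: no summary available is reported as None
        return ""
    for quote in ('"', "'"):
        # Require at least two characters so a lone quote is left untouched.
        if len(summary) >= 2 and summary[0] == quote and summary[-1] == quote:
            return summary[1:-1]
    return summary


print(strip_quotes('"Romeo"'))
```

                   <p>Checking the length first avoids indexing into an empty string, which the
                   numbered listing would do if a summary were ever empty.</p>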

                    <p>Line 12 checks to see if the word in the current node is the one we are
                    searching for.  If so, we are done, and line 13 returns the current path.
                    Otherwise, line 14 checks to see if we should go left (search word comes before
                   the current word).  If we decide to go left, line 15 checks to see if the left
                   child pointer is NULL ("None" is the Python equivalent of NULL). If the left
                    pointer is NULL, then the word is not in this tree and we return an empty path
                    (line 16).   Otherwise, we add an "L" to the end of our current path string, to
                    indicate we are going left (line 18), and then recurse on the left child (line
                    19).  Lines 20-25 are the same as lines 14-19, except for going right rather
                    than going left.</p>
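                   <p>To experiment with this logic without a debugger, the walk can be run against
                   a stand-in for SBValue.  Everything below is a sketch under stated assumptions:
                   <code>FakeValue</code> and <code>node</code> are hypothetical, the fake models only
                   the three methods DFS uses, and it follows this document's description that
                   <code>GetValue()</code> yields <code>None</code> for a NULL pointer.  The DFS body
                   is a condensed version of the numbered listing, and the sample words are made
                   up.</p>

```python
class FakeValue:
    """Hypothetical stand-in for SBValue: wraps a word, a node dict, or None (NULL)."""

    def __init__(self, payload):
        self.payload = payload

    def GetChildMemberWithName(self, name):
        return FakeValue(self.payload[name])

    def GetSummary(self):
        return '"%s"' % self.payload        # C strings are summarized in double quotes

    def GetValue(self):
        # Assumption taken from the text above: a NULL pointer is reported as None.
        return None if self.payload is None else "0x1"


def node(word, left=None, right=None):
    return {"word": word, "left": left, "right": right}


def DFS(root, word, cur_path):
    """Condensed version of the numbered listing above."""
    root_word = root.GetChildMemberWithName("word").GetSummary()
    if len(root_word) >= 2 and root_word[0] == '"' and root_word[-1] == '"':
        root_word = root_word[1:-1]
    if root_word == word:
        return cur_path
    side, step = ("left", "L") if word < root_word else ("right", "R")
    child = root.GetChildMemberWithName(side)
    if child.GetValue() is None:
        return ""                           # fell off the tree: word not present
    return DFS(child, word, cur_path + step)


tree = FakeValue(node("juliet",
                      node("capulet", node("Romeo"), node("dramatis")),
                      node("love", None, node("sun"))))
print(DFS(tree, "Romeo", ""))
```

                   <p>Note that an empty string comes back both when the word is at the root and
                   when it is absent, so a caller that needs to tell these cases apart has to
                   check the root word separately.</p>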

                    <p>One other note:  Typing something as long as our DFS function directly into
                    the interpreter can be difficult, as making a single typing mistake means having
                    to start all over.  Therefore we recommend doing as we have done:  Writing your
                    longer, more complicated script functions in a separate file (in this case
                    tree_utils.py) and then importing it into your LLDB Python interpreter.</p>
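                   <p>The import only succeeds if Python can find the file.  One possible approach,
                   sketched below, is to add the file's directory to <code>sys.path</code> before
                   importing.  The helper name <code>import_script</code> is hypothetical and the path
                   you pass depends on where you saved the file; LLDB's
                   <code>command script import</code> command is another way to load a script
                   file.</p>

```python
import importlib
import os
import sys


def import_script(path):
    """Import the Python file at `path` as a module and return the module."""
    directory, filename = os.path.split(os.path.abspath(path))
    module_name = os.path.splitext(filename)[0]    # drop the ".py" extension
    if directory not in sys.path:
        sys.path.insert(0, directory)              # let Python search this directory
    return importlib.import_module(module_name)
```

                   <p>Inside the LLDB interpreter this could be used as
                   <code>tree_utils = import_script("/path/to/tree_utils.py")</code>, where the path
                   is a placeholder.</p>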

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
     				<h1 class ="postheader">Seeing the DFS Script in Action</h1>
     				<div class="postcontent">


                    <p>At this point we are ready to use the DFS function to see if the word "Romeo"
                    is in our tree or not.  To actually use it in LLDB on our dictionary program,
                    you would do something like this:</p>

                    <code>
                      % <strong>lldb</strong><br>
                      (lldb) <strong>process attach -n "dictionary"</strong><br>
                      Architecture set to: x86_64.<br>
                      Process 521 stopped<br>
                      * thread #1: tid = 0x2c03, 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8, stop reason = signal SIGSTOP<br>
                      frame #0: 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8<br>
                      (lldb) <strong>breakpoint set -n find_word</strong><br>
                      Breakpoint created: 1: name = 'find_word', locations = 1, resolved = 1<br>
                      (lldb) <strong>continue</strong><br>
                      Process 521 resuming<br>
                      Process 521 stopped<br>
                      * thread #1: tid = 0x2c03, 0x0000000100001830 dictionary`find_word + 16 <br>
                      at dictionary.c:105, stop reason = breakpoint 1.1<br>
                      frame #0: 0x0000000100001830 dictionary`find_word + 16 at dictionary.c:105<br>
                      102 int<br>
                      103 find_word (tree_node *dictionary, char *word)<br>
                      104 {<br>
                      -> 105   if (!word || !dictionary)<br>
                      106     return 0;<br>
                      107 <br>
                      108   int compare_value = strcmp (word, dictionary->word);<br>
                      (lldb) <strong>script</strong><br>
                      Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.<br>
                      >>> <strong>import tree_utils</strong><br>
                      >>> <strong>root = lldb.frame.FindVariable ("dictionary")</strong><br>
                      >>> <strong>current_path = ""</strong><br>
                      >>> <strong>path = tree_utils.DFS (root, "Romeo", current_path)</strong><br>
                     >>> <strong>print(path)</strong><br>
                      LLRRL<br>
                      >>> <strong>^D</strong><br>
                      (lldb) <br>
                    </code>

                    <p>The first bit of code above shows starting lldb, attaching to the dictionary
                    program, and getting to the find_word function in LLDB.  The interesting part
                    (as far as this example is concerned) begins when we enter the
                    <code>script</code> command and drop into the embedded interactive Python
                    interpreter.  We will go over this Python code line by line.  The first line</p>

                    <code>
                      import tree_utils
                    </code>

                    <p>imports the file where we wrote our DFS function, tree_utils.py, into Python.
                    Notice that to import the file we leave off the ".py" extension.  We can now
                    call any function in that file, giving it the prefix "tree_utils.", so that
                    Python knows where to look for the function. The line</p>

                    <code>
                      root = lldb.frame.FindVariable ("dictionary")
                    </code>

                    <p>gets our program variable "dictionary" (which contains the binary search
                    tree) and puts it into the Python variable "root".  See
                   <a href="#accessing-variables">Accessing &amp; Manipulating Program Variables in Python</a>
                    above for more details about how this works. The next line is</p>

                    <code>
                      current_path = ""
                    </code>

                   <p>This line initializes <code>current_path</code>, the path from the root of the
                   tree to our current node.  Since we are starting at the root of the tree, our current path
                    starts as an empty string.  As we go right and left through the tree, the DFS
                    function will append an 'R' or an 'L' to the current path, as appropriate. The
                    line</p>

                    <code>
                      path = tree_utils.DFS (root, "Romeo", current_path)
                    </code>

                    <p>calls our DFS function (prefixing it with the module name so that Python can
                    find it).  We pass in our binary tree stored in the variable <code>root</code>,
                    the word we are searching for, and our current path.  We assign whatever path
                    the DFS function returns to the Python variable <code>path</code>.</p>


                    <p>Finally, we want to see if the word was found or not, and if so we want to
                    see the path through the tree to the word. So we do</p>

                    <code>
                     print(path)
                    </code>

                    <p>From this we can see that the word "Romeo" was indeed found in the tree, and
                    the path from the root of the tree to the node containing "Romeo" is
                    left-left-right-right-left.</p>

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
     				<h1 class ="postheader">What next?  Using Breakpoint Command Scripts...</h1>
     				<div class="postcontent">

                    <p>We are halfway to figuring out what the problem is.  We know the word we are
                    looking for is in the binary tree, and we know exactly where it is in the binary
                    tree.  Now we need to figure out why our binary search algorithm is not finding
                    the word.  We will do this using breakpoint command scripts.</p>


                   <p>The idea is as follows.  The binary search algorithm has two main decision
                   points: the decision to follow the right branch, and the decision to follow
                   the left branch.  We will set a breakpoint at each of these decision points, and
                   attach a Python breakpoint command script to each breakpoint.  The breakpoint
                   commands will use the global <code>path</code> Python variable that we got from
                   our DFS function.  Each time one of these decision breakpoints is hit, the script
                   will compare the actual decision with the decision that the first character of
                   the <code>path</code> variable says should be made.  If the actual decision and
                   the path agree, then the first character is stripped off the path, and execution
                   is resumed.  In this case the user never even sees the breakpoint being hit.  But
                   if the decision differs from what the path says it should be, then the script
                   prints out a message and does NOT resume execution, leaving the user sitting at
                   the first point where a wrong decision is being made.</p>

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
     				<h1 class ="postheader">Side Note: Python Breakpoint Command Scripts are NOT What They Seem</h1>
     				<div class="postcontent">

 				</div>
 				<div class="postfooter"></div>

                    <p>What do we mean by that?  When you enter a Python breakpoint command in LLDB,
                    it appears that you are entering one or more plain lines of Python. BUT LLDB
                    then takes what you entered and wraps it into a Python FUNCTION (just like using
                    the "def" Python command).   It automatically gives the function an obscure,
                    unique, hard-to-stumble-across function name, and gives it two parameters:
                    <code>frame</code> and <code>bp_loc</code>.  When the breakpoint gets hit, LLDB
                    wraps up the frame object where the breakpoint was hit, and the breakpoint
                    location object for the breakpoint that was hit, and puts them into Python
                    variables for you.  It then calls the Python function that was created for the
                    breakpoint command, and passes in the frame and breakpoint location objects.</p>

                    <p>So, being practical, what does this mean for you when you write your Python
                    breakpoint commands?  It means that there are two things you need to keep in
                    mind: 1. If you want to access any Python variables created outside your script,
                    <strong>you must declare such variables to be global</strong>.  If you do not
                    declare them as global, then the Python function will treat them as local
                    variables, and you will get unexpected behavior.  2. <strong>All Python
                    breakpoint command scripts automatically have a <code>frame</code> and a
                    <code>bp_loc</code> variable.</strong>  The variables are pre-loaded by LLDB
                    with the correct context for the breakpoint.  You do not have to use these
                    variables, but they are there if you want them.</p>
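                   <p>The effect of the wrapping on variable scope can be reproduced in plain
                   Python.  The sketch below is a rough imitation, not LLDB's actual implementation:
                   <code>wrap_and_run</code>, <code>callback</code> and <code>session</code> are
                   hypothetical names, and the <code>session</code> dictionary stands in for the
                   interpreter's global variables.</p>

```python
def wrap_and_run(script, session):
    """Wrap `script` in a function, as described above, and call it once."""
    body = "".join("    " + line + "\n" for line in script.splitlines())
    source = "def callback(frame, bp_loc):\n" + body
    exec(source, session)                    # defines callback in the session namespace
    return session["callback"](None, None)   # LLDB would pass real frame and bp_loc objects


session = {"path": "LLRRL"}                  # a variable created outside the script

try:
    # Without 'global', the assignment makes path local to the wrapper function.
    wrap_and_run("path = path[1:]", session)
except UnboundLocalError:
    print("without 'global': path is treated as a local variable")

# With 'global', the script reads and updates the variable in the session.
wrap_and_run("global path\npath = path[1:]", session)
print(session["path"])
```

                   <p>The first call fails because the wrapper function sees an assignment to
                   <code>path</code> and therefore treats every use of it as local; the second call
                   updates the shared variable as intended.</p>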

 				</div>
 				<div class="postfooter"></div>


     			<div class="post">
     				<h1 class ="postheader">The Decision Point Breakpoint Commands</h1>
     				<div class="postcontent">

                   <p>This is what the Python breakpoint command script would look like for the
                   decision to go right:</p>

 <code><pre><tt>
 global path
 if path[0] == 'R':
     path = path[1:]
     thread = frame.GetThread()
     process = thread.GetProcess()
     process.Continue()
 else:
     print("Here is the problem; going right, should go left!")
 </tt></pre></code>

                    <p>Just as a reminder, LLDB is going to take this script and wrap it up in a
                    function, like this:</p>

 <code><pre><tt>
 def some_unique_and_obscure_function_name (frame, bp_loc):
     global path
     if path[0] == 'R':
         path = path[1:]
         thread = frame.GetThread()
         process = thread.GetProcess()
         process.Continue()
     else:
         print("Here is the problem; going right, should go left!")
 </tt></pre></code>

                    <p>LLDB will call the function, passing in the correct frame and breakpoint
                    location whenever the breakpoint gets hit.  There are several things to notice
                    about this function.  The first one is that we are accessing and updating a
                    piece of state (the <code>path</code> variable), and actually conditioning our
                    behavior based upon this variable.  Since the variable was defined outside of
                    our script (and therefore outside of the corresponding function) we need to tell
                    Python that we are accessing a global variable. That is what the first line of
                    the script does.  Next we check where the path says we should go and compare it to
                    our decision (recall that we are at the breakpoint for the decision to go
                    right). If the path agrees with our decision, then  we strip the first character
                    off of the path.</p>

                    <p>Since the decision matched the path, we want to resume execution.  To do this
                    we make use of the <code>frame</code> parameter that LLDB guarantees will be
                    there for us.  We use LLDB API functions to get the current thread from the
                    current frame, and then to get the process from the thread.  Once we have the
                    process, we tell it to resume execution (using the <code>Continue()</code> API
                    function).</p>

                    <p>If the decision to go right does not agree with the path, then we do not
                    resume execution.  We allow the breakpoint to remain stopped (by doing nothing),
                    and we print an informational message telling the user we have found the
                    problem, and what the problem is.</p>
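                   <p>The scripts for the two decision points differ only in the letter they
                   expect, so the comparison itself can be factored into one function.  This is a
                   sketch, not part of the original example: <code>check_decision</code> is a
                   hypothetical helper that could live in tree_utils.py.  It also treats an
                   exhausted path as a mismatch, whereas <code>path[0]</code> on an empty string
                   would raise an IndexError.</p>

```python
def check_decision(path, actual):
    """Compare the branch about to be taken with the next step in `path`.

    `actual` is 'L' or 'R'.  Returns (remaining_path, should_continue).
    """
    if path and path[0] == actual:
        return path[1:], True      # decision agrees: consume one step and keep running
    return path, False             # decision differs (or path is used up): stay stopped


# Replay a sequence of decisions against the path found by DFS.
path = "LLRRL"
for decision in "LLR":
    path, keep_going = check_decision(path, decision)
    print(decision, keep_going, path)
```

                   <p>A breakpoint command would then assign the returned path back to the global
                   <code>path</code> variable and call <code>Continue()</code> on the process only
                   when the second element of the result is true.</p>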

 				</div>
 				<div class="postfooter"></div>

     			<div class="post">
     				<h1 class ="postheader">Actually Using the Breakpoint Commands</h1>
     				<div class="postcontent">

                    <p>Now we will look at what happens when we actually use these breakpoint
                    commands on our program.  Doing a <code>source list -n find_word</code> shows
                    us the function containing our two decision points.  Looking at the code below,
                    we see that we want to set our breakpoints on lines 113 and 115:</p>

 <code><pre><tt>
 (lldb) source list -n find_word
 File: /Volumes/Data/HD2/carolinetice/Desktop/LLDB-Web-Examples/dictionary.c.
 101
 102 int
 103 find_word (tree_node *dictionary, char *word)
 104 {
 105   if (!word || !dictionary)
 106     return 0;
 107
 108   int compare_value = strcmp (word, dictionary->word);
 109
 110   if (compare_value == 0)
 111     return 1;
 112   else if (compare_value < 0)
 113     return find_word (dictionary->left, word);
 114   else
 115     return find_word (dictionary->right, word);
 116 }
 117
 </tt></pre></code>

                   <p>So, we set our breakpoints, enter our breakpoint command scripts, and see
                   what happens:</p>

 <code><pre><tt>
 (lldb) breakpoint set -l 113
 Breakpoint created: 2: file ='dictionary.c', line = 113, locations = 1, resolved = 1
 (lldb) breakpoint set -l 115
 Breakpoint created: 3: file ='dictionary.c', line = 115, locations = 1, resolved = 1
 (lldb) breakpoint command add -s python 2
 Enter your Python command(s). Type 'DONE' to end.
 > global path
 > if (path[0] == 'L'):
 >     path = path[1:]
 >     thread = frame.GetThread()
 >     process = thread.GetProcess()
 >     process.Continue()
 > else:
 >     print("Here is the problem. Going left, should go right!")
 > DONE
 (lldb) breakpoint command add -s python 3
 Enter your Python command(s). Type 'DONE' to end.
 > global path
 > if (path[0] == 'R'):
 >     path = path[1:]
 >     thread = frame.GetThread()
 >     process = thread.GetProcess()
 >     process.Continue()
 > else:
 >     print("Here is the problem. Going right, should go left!")
 > DONE
 (lldb) continue
 Process 696 resuming
 Here is the problem. Going right, should go left!
 Process 696 stopped
 * thread #1: tid = 0x2d03, 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115, stop reason = breakpoint 3.1
   frame #0: 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115
     112   else if (compare_value < 0)
     113     return find_word (dictionary->left, word);
     114   else
  -> 115     return find_word (dictionary->right, word);
     116 }
     117
     118 void
 (lldb)
 </tt></pre></code>


                    <p>After setting our breakpoints, adding our breakpoint commands and continuing,
                    we run for a little bit and then hit one of our breakpoints, printing out the
                    error message from the breakpoint command.  Apparently at this point in the
                    tree, our search algorithm decided to go right, but our path says the node we
                    want is to the left. Examining the word at the node where we stopped, and our
                    search word, we see:</p>

                    <code>
                      (lldb) expr dictionary->word<br>
                      (const char *) $1 = 0x0000000100100080 "dramatis"<br>
                      (lldb) expr word<br>
                      (char *) $2 = 0x00007fff5fbff108 "romeo"<br>
                    </code>

                    <p>So the word at our current node is "dramatis", and the word we are searching
                    for is "romeo".  "romeo" comes after "dramatis" alphabetically, so it seems like
                    going right would be the correct decision.  Let's ask Python what it thinks the
                    path from the current node to our word is:</p>

                    <code>
                     (lldb) script print(path)<br>
                      LLRRL<br>
                    </code>

                    <p>According to Python we need to go left-left-right-right-left from our current
                    node to find the word we are looking for.  Let's double check our tree, and see
                    what word it has at that node:</p>

                    <code>
                      (lldb) expr dictionary->left->left->right->right->left->word<br>
                      (const char *) $4 = 0x0000000100100880 "Romeo"<br>
                    </code>
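                   <p>Typing such a chain of pointer dereferences by hand is error-prone for longer
                   paths.  As a small sketch, the expression text can be generated from the path
                   string; <code>path_to_expression</code> is a hypothetical helper name, and the
                   defaults match the variable and field names used in this example.</p>

```python
def path_to_expression(path, root="dictionary", field="word"):
    """Turn a path such as "LLRRL" into a C expression for LLDB's expr command."""
    steps = {"L": "left", "R": "right"}
    parts = [root] + [steps[step] for step in path] + [field]
    return "->".join(parts)


print(path_to_expression("LLRRL"))
# prints: dictionary->left->left->right->right->left->word
```

                   <p>The resulting string can be pasted after <code>expr</code> at the LLDB
                   prompt.</p>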

                    <p>So the word we are searching for is "romeo" and the word at our DFS location
                    is "Romeo".  Aha!  One is uppercase and the other is lowercase:  We seem to have
                    a case conversion problem somewhere in our program (we do).</p>
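                   <p>The case difference also explains why the two searches disagree about the
                   direction.  In ASCII ordering, which both <code>strcmp</code> and Python's string
                   comparison follow for these words, uppercase letters sort before lowercase ones.
                   The short sketch below illustrates this; <code>side_of</code> is a hypothetical
                   helper written only for this illustration.</p>

```python
def side_of(word, node_word):
    """Say which way a binary search goes from a node holding `node_word`."""
    if word == node_word:
        return "here"
    return "left" if word < node_word else "right"


for word in ("Romeo", "romeo"):
    print(word, "->", side_of(word, "dramatis"))
```

                   <p>"Romeo" sorts before "dramatis" because 'R' precedes every lowercase letter,
                   so it was stored to the left; the lowercased search word "romeo" sorts after
                   "dramatis", which sends the search to the right.</p>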

                    <p>This is the end of our example on how you might use Python scripting in LLDB
                    to help you find bugs in your program.</p>

 				</div>
 				<div class="postfooter"></div>

     			<div class="post">
     				<h1 class ="postheader">Source Files for The Example</h1>
     				<div class="postcontent">


                 </div>
           	    <div class="postfooter"></div>

                  <p>The complete code for the dictionary program (with the case-conversion bug),
                  along with the DFS function and other Python script examples (tree_utils.py) used
                  in this example, is available via the following links:</p>

 <a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/scripting/tree_utils.py">tree_utils.py</a>  -  Example Python functions using LLDB's API, including DFS<br>
 <a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/scripting/dictionary.c">dictionary.c</a>  -  Sample dictionary program, with bug<br>

                    <p>The text for "Romeo and Juliet" can be obtained from Project Gutenberg
                    (http://www.gutenberg.org).</p>
             </div>
       	</div>
 	</div>
 </div>
 </body>
 </html>