Monday, September 10, 2012

Book review: The Art of Debugging with GDB, DDD and Eclipse (Chapter 4).

When a program crashes

Crashes are often caused by the mishandling or misuse of pointers, and that is what this chapter is about.

Background Material: Memory Management

By far the most common cause of a crash is for a program to attempt to access a memory location without having the permission to do so. In such a case, the OS will announce that the program has caused a segmentation fault. On Windows, the corresponding term is a general protection fault. The role played by virtual memory and how virtual memory issues relate to segfaults is the focus of this chapter.

On Unix platforms, a program's set of allocated virtual addresses is typically laid out something like:

  • Text section (.text): the machine instructions produced by the compiler from your program's source code.
  • Data section: all the program variables that are allocated at compile time (global and static variables). It consists of two subsections: .data for initialized variables, and .bss for uninitialized data.
  • Heap: area for additional memory requested at run time, for example via calls to malloc() in C or new in C++.
  • Stack: space for the data that belongs to function calls (arguments, local variables and return addresses). The stack grows each time a function is called and shrinks each time one returns.

If w is the word size of your machine in bits, then the virtual address space has a range from 0 to 2^w-1. Your program typically uses only a tiny fraction of this, and the OS may also reserve part for its own work.

The virtual address space is viewed as organized into chunks called pages. Physical memory (RAM, ROM) is also viewed as divided into pages. Some of the pages of a program are stored in pages of physical memory by the OS (these pages are resident, the rest is stored on disk). During program execution, program pages must be brought into and out of memory. To manage this, the OS maintains a page table for each process. Each of the process' virtual pages has an entry in the table with the following information:

  • The current physical location of this page in memory or on disk
  • Permissions: read, write, execute, for this page
Note that the OS will not allocate partial pages to a program.

When we run a program, the OS creates a page table that it uses to manage the virtual memory of the process that executes the program code. As the program executes, it will continually access its various sections and the page table will be consulted. Permissions from the page table are important here. The addresses a program generates are virtual; each one is converted to a virtual page number v, and the page table entry for v is checked to see whether the requested operation (read, write or execute) is permitted on that page.

Important remark for debugging: from the absence of a seg fault, we can't conclude that a memory operation is correct. Seg faults do not always occur in situations where you might expect them to. Permissions are checked per page, so an out-of-bounds access (for example, writing a few elements past the end of an array) goes undetected as long as the address still falls on a page that the program is allowed to access in that way.

Signals indicate exceptional conditions and are reported during program execution to allow the OS (or your own code) to react to a variety of events. Each signal has its own default signal handler, which is a function that is called when that particular signal is raised on a process. You can also write your own handlers, but these may cause complications when using GDB/DDD/Eclipse.

When a program commits a memory-access violation, a SIGSEGV signal is raised on the process. The default handler terminates the process and writes a core file to disk, unless the writing of core files has been suppressed.

To tell GDB not to stop when certain signals occur, use the handle command, for example: handle SIGUSR1 nostop noprint.

Other sources of crashes besides segmentation faults:

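  • Floating-point exceptions (SIGFPE): despite the name, this signal is also raised for integer errors such as division by zero.
  • Bus errors (SIGBUS): accessing a physical address that does not exist or, on some architectures, accessing a misaligned address. Pointer errors are the usual cause.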

Core Files

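If a core file is created during a run of your program, you can open your debugger on it (for GDB: gdb followed by the name of the executable and the name of the core file) and then proceed with your usual GDB operations, such as backtrace.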

Core files contain a detailed description of the program's state when it died. The Unix command file helpfully tells you the name of the executable that dumped a particular core file.

The writing of core files may be suppressed by your shell. In bash, you can control the creation of core files with the ulimit command:

  ulimit -c n
where n is the maximum size for a core file, in kilobytes. A value of 0 suppresses core files altogether, and ulimit -c unlimited removes the limit.

Extended example

The chapter closes with an extended debugging example; I refer to the book for the details. Most importantly, this example encompasses many aspects of debugging:
  • The Principle of Confirmation
  • Using core files for post-mortem analysis of a process that crashed
  • Correcting, compiling, and re-running a program without ever leaving GDB
  • The inadequacy of printf()-style debugging
  • Using good old fashioned brain power
