DEV Community

Yuval

Python - PDB usage and reproducing program execution

Imagine you have a Python program, and you want to inspect some variables when an error occurs.

There are many possible ways to do that.
I'd like to talk about a basic one, which involves a debugger. Just as C/C++ has GDB, Python has PDB.

PDB is a command-line debugger. It can launch a script under its control (python -m pdb script.py) or be started from within the running process.

Just add the line import pdb; pdb.set_trace() and you get an interactive shell inside the process, stopped at that point.

Needless to say, this works well only for CLI programs. Others, such as servers, need other solutions (Rookout, the PyCharm remote debugger, etc.).

Let's say we run a program that calls some_erroneous_function, and we want to know the value of a variable inside that function.
main() -> foo() -> some_erroneous_function()

How can we know the value inside some_erroneous_function()?
Simple: add the following line inside it:

import pdb; pdb.set_trace()

Without the breakpoint, we can't see the value of a:
[Screenshot: function raising exception]

With the breakpoint, we do manage to see the value of a:
[Screenshot: adding pdb set_trace to get shell into program execution]
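
As a minimal sketch of the call chain above: the function bodies, the variable a, and the debug flag are made up for illustration, since the original program is only shown as a screenshot. The breakpoint sits just before the failing line.

```python
import pdb


def some_erroneous_function(debug=False):
    a = 41 + 1  # the value we want to inspect
    if debug:
        # Opens the (Pdb) prompt here; try "p a", "w" (stack) or "c" (continue).
        pdb.set_trace()
    return a / 0  # raises ZeroDivisionError


def foo(debug=False):
    return some_erroneous_function(debug)


def main(debug=False):
    try:
        foo(debug)
    except ZeroDivisionError as exc:
        return "error: %s" % exc


if __name__ == "__main__":
    # Pass debug=True when running from a terminal to land in the debugger.
    print(main(debug=False))
```

On Python 3.7 and later, the built-in breakpoint() does the same thing as pdb.set_trace() by default.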

What happens when program A runs program B?

When we have
main() -> bar() -> cli_app_bar.py -> some_erroneous_function(),

the import pdb; pdb.set_trace() trick simply doesn't work.
We get a stuck process instead. The pdb prompt opens in the child process, but the parent process is waiting for the child to finish (and often captures the child's output), so nobody can type into the debugger and we're stuck.

In this case, we should run the child process ourselves.

What parts are required to run a child process ourselves?

Two parts are required; one is obvious, the other is often forgotten:

  • Program name + command line arguments
  • Environment variables!
  • (There's a third part, IPC messages, but such behavior is very hard to mimic.)

Let's see how we capture these:

  • Modify program to save CLI arguments and env vars
  • Run using CLI and env vars
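
The second step, re-running the child by hand, can also be done from Python. This is only a sketch: run_child is a hypothetical helper, and the command and the EXTRA value stand in for whatever was captured from the real program.

```python
import os
import subprocess
import sys


def run_child(argv, extra_env):
    """Run argv with extra_env layered on top of the current environment."""
    env = dict(os.environ)
    env.update(extra_env)
    # No stdin/stdout/stderr redirection: the child inherits the terminal,
    # so a pdb prompt opened inside it can actually be used.
    return subprocess.run(argv, env=env).returncode


if __name__ == "__main__":
    # Stand-in child that just prints the variable it received.
    child = [sys.executable, "-c", "import os; print(os.environ['EXTRA'])"]
    run_child(child, {"EXTRA": "1"})
```

The point of leaving stdin and stdout untouched is that the debugger in the child talks directly to your terminal, which is exactly what was missing when the parent program launched it.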

Getting cmd + env vars

There are several methods. To get the env vars of a running process on Linux, you could use cat /proc/46/environ | tr '\0' '\n' (replace 46 with the process id). Note that this file shows the environment the process was started with.
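
The same /proc file can be read from Python. A rough, Linux-only sketch follows; parse_environ and read_process_environ are names invented here, not part of any library.

```python
import os


def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the final NUL
        # Split on the first "=" only; values may contain "=" themselves.
        key, _, value = entry.partition(b"=")
        env[os.fsdecode(key)] = os.fsdecode(value)
    return env


def read_process_environ(pid):
    """Read the start-up environment of a process (needs permission to read it)."""
    with open("/proc/%d/environ" % pid, "rb") as fin:
        return parse_environ(fin.read())
```

For example, read_process_environ(46) would return the variables of process 46 as a dictionary, provided you are allowed to read that process.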

From within the Python process, we want to print the env vars in a "ready to go" format, e.g. with the export prefix:

import os

with open('/tmp/params.txt', 'w') as fout:
    # write every env var as a shell "export" line
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k, v))
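
The snippet above records the env vars only. A possible variant that also records the command line, and quotes values so that spaces, quotes, and $ survive a round trip through the shell, might look like the sketch below. format_invocation, save_invocation, and /tmp/replay.sh are hypothetical names, and the sketch assumes every variable name is a valid shell identifier.

```python
import os
import shlex
import sys


def format_invocation(argv, environ, executable=sys.executable):
    """Return shell text that re-creates the given environment and command line."""
    lines = ["export %s=%s" % (key, shlex.quote(value))
             for key, value in sorted(environ.items())]
    # Last line: the command itself, with every argument quoted for the shell.
    lines.append(" ".join(shlex.quote(part) for part in [executable] + list(argv)))
    return "\n".join(lines) + "\n"


def save_invocation(path="/tmp/replay.sh"):
    """Call this early in the child program to record how it was started."""
    with open(path, "w") as fout:
        fout.write(format_invocation(sys.argv, os.environ))
```

Running the saved file with bash would then start the child with the same arguments and environment, this time attached to your own terminal.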

Then diff against the env vars of the current shell:

echo "creating bar before"
cat <<EOF > create_before.py
#!/usr/bin/python3
import os
with open('/tmp/params.before.txt', 'w') as fout:
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k,v))
EOF

python3 create_before.py

echo "print some stats"
wc -l /tmp/params.txt /tmp/params.before.txt

echo "get keys"
cat /tmp/params.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.keys.txt
cat /tmp/params.before.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.before.keys.txt
wc -l /tmp/params.keys.txt /tmp/params.before.keys.txt
diff /tmp/params.keys.txt /tmp/params.before.keys.txt

And we get the newly added env var key, EXTRA:

YuvShell $ diff /tmp/params.keys.txt /tmp/params.before.keys.txt
1d0
< export "EXTRA"
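
If both environments are already available as dictionaries, the same key comparison can be sketched in Python with set differences. diff_env_keys is an illustrative helper, not something from the original scripts.

```python
def diff_env_keys(child_env, shell_env):
    """Return (keys only in the child, keys only in the shell), both sorted."""
    child_keys, shell_keys = set(child_env), set(shell_env)
    return sorted(child_keys - shell_keys), sorted(shell_keys - child_keys)


print(diff_env_keys({"HOME": "/root", "EXTRA": "1"}, {"HOME": "/root"}))
# -> (['EXTRA'], [])
```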

Questions

Q: What is "YuvShell"?
A: It's just me editing ~/.bashrc and changing the PS1 (prompt string) variable.

[Screenshot: changing bash shell prompt string]

Q: What is the difference between cat some_file.txt | wc -l and wc -l some_file.txt?
A: With cat + wc, an extra cat process reads the file and a pipe transfers the data from the cat output to the wc input; with wc alone, wc reads the file directly and no pipe is involved.

Let's create a big file from urandom and compare the time output of both options:

cat /dev/urandom | base64 | head -c 1GB > /tmp/random_1GB_file.txt

time cat /tmp/random_1GB_file.txt | wc -l
time wc -l /tmp/random_1GB_file.txt

[Screenshot: performance results of wc with and without pipe]

Source Code
