Python - PDB usage and reproducing program execution

Imagine you have a Python program and you want to inspect some values when an error occurs.

There are many possible ways to do that;
I'd like to talk about a basic one, which involves a debugger. Just like GDB for C/C++, Python has PDB.

PDB is a command-line debugger, which can be attached to a process or started from within the process.

Just add the line import pdb; pdb.set_trace() and you will get a shell where you can interact with the process.
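To make this concrete, here is a minimal sketch. So that it can run without a terminal, it feeds the debugger its commands from a string instead of the keyboard; in everyday use you would simply write pdb.set_trace() and type the commands at the (Pdb) prompt yourself. The function name compute and the value 4242 are made up for illustration.

```python
import io
import pdb


def compute(a):
    """Hypothetical function we want to inspect from inside the debugger."""
    captured = io.StringIO()
    # Script the session: print `a`, then resume execution.
    # Interactively you would type these two commands at the (Pdb) prompt.
    commands = io.StringIO("p a\ncontinue\n")
    debugger = pdb.Pdb(stdin=commands, stdout=captured, readrc=False)
    debugger.set_trace()  # same idea as pdb.set_trace(), but scripted
    doubled = a * 2
    return doubled, captured.getvalue()


result, session = compute(4242)
print(session)  # the debugger transcript, including the printed value of a
```

The two commands used here, p (print an expression) and continue, are usually enough for a quick look at a value.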

Needless to say, this is good only for CLI programs. Others, like servers, need other solutions (Rookout, the PyCharm remote debugger, etc.).

Let's say we run a program which calls some_erroneous_function, and we want to know some value from inside this function.
main() -> foo() -> some_erroneous_function()

How can we know the value inside some_erroneous_function()?
Simple: add the following line:

import pdb; pdb.set_trace()

Without the debugger, we can't see the value of a:

With the debugger, we do manage to see the value of a:

What happens when program A runs program B?

When we have
main() -> bar() -> cli_app_bar.py -> some_erroneous_function(),

the import pdb; pdb.set_trace() trick simply doesn't work;
we get a stuck process instead. This is because pdb opens in the child process, while the parent process is waiting for the child process to finish, so we're stuck.
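One way to see why the prompt never shows up: when the parent captures the child's output through a pipe, the child's stdout is no longer the terminal, so anything pdb prints ends up in the pipe instead of on the screen. The sketch below only checks that fact; the one-line child program is a stand-in invented for this example.

```python
import subprocess
import sys

# Stand-in child: it only reports whether its stdout is attached to a terminal.
child_code = "import sys; print(sys.stdout.isatty())"

# Same pattern as a parent that captures the child's output through a pipe.
completed = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
child_stdout_is_tty = completed.stdout.strip()
print("child stdout is a terminal:", child_stdout_is_tty)  # prints False
```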

In this case, we should run the child process ourselves.

What parts are required to run a child process ourselves?

Two parts are required; one is obvious, the other is often forgotten!
The two parts are:

program name + command-line arguments
environment variables!
(there's a third part, IPC messages, but it's very hard to mimic such behavior...)

Let's see how we capture this:

Modify the program to save its CLI arguments and env vars
Run it using the saved CLI arguments and env vars
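As a rough sketch of the first step, the helper below turns a command line and an environment into shell text that can be pasted back into a terminal. The name format_invocation is hypothetical, and it uses shlex.quote for quoting, which is an assumption of this sketch rather than something the rest of this post relies on.

```python
import shlex


def format_invocation(argv, environ):
    """Return shell text that re-creates an invocation: exports, then the command."""
    lines = [
        "export %s=%s" % (key, shlex.quote(value))
        for key, value in sorted(environ.items())
    ]
    # shlex.quote makes each argument safe to paste back into a POSIX shell.
    lines.append(" ".join(shlex.quote(arg) for arg in argv))
    return "\n".join(lines) + "\n"


# Small illustrative input; the values here are examples only.
print(format_invocation(["python", "cli_app.py", "--run"], {"EXTRA": "True"}))
```

Inside the real program you would pass [sys.executable] + sys.argv and os.environ, and write the result to a file.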

Getting cmd + env vars

There are several methods. To get the env vars of a running process, you could use cat /proc/46/environ | tr '\0' '\n' (replace 46 with the process ID).
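The same file can be read from Python. This is a sketch that assumes Linux, where /proc/<pid>/environ holds NUL-separated KEY=VALUE entries describing the environment the process was started with; both function names are made up for this example.

```python
def parse_environ_blob(blob):
    """Parse NUL-separated KEY=VALUE bytes, the layout of /proc/<pid>/environ."""
    env = {}
    for entry in blob.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid):
    """Linux only: read the environment a process was started with."""
    with open("/proc/%d/environ" % pid, "rb") as fin:
        return parse_environ_blob(fin.read())


# Illustrative bytes in the same layout, so the sketch runs anywhere.
sample = b"HOME=/root\0EXTRA=True\0OPTS=a=b\0"
print(parse_environ_blob(sample))
```

Reading another user's process this way normally requires the matching permissions.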

From within the Python process, we want to print the env vars in a "ready to go" format, e.g. with the export prefix:

import os

with open('/tmp/params.txt', 'w') as fout:
    # print all env vars
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k, v))
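For the replay step, the saved file can be sourced in a shell, or loaded back into a dict from Python. Below is a minimal sketch of the second option, assuming every line has the simple export "KEY"="VALUE" shape written above; values that contain double quotes or newlines would need more care. The name load_exports is hypothetical.

```python
import shlex


def load_exports(text):
    """Turn lines such as: export "KEY"="VALUE"  into a dict."""
    env = {}
    for line in text.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a line this sketch can handle
        if len(tokens) != 2 or tokens[0] != "export":
            continue  # ignore anything that is not a simple export line
        key, _, value = tokens[1].partition("=")
        env[key] = value
    return env


# Illustrative content in the same format as /tmp/params.txt.
saved = 'export "EXTRA"="True"\nexport "HOME"="/home/user"\n'
env = load_exports(saved)
print(env)
```

The resulting dict can be passed as env= to subprocess.run without any stdout or stderr pipes, so the child's pdb prompt reaches the terminal.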

And then diff with current env vars:

echo "creating bar before"
cat <<EOF > create_before.py
#!/usr/bin/python3
import os
with open('/tmp/params.before.txt', 'w') as fout:
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k,v))
EOF

python create_before.py

echo "print some stats"
wc -l /tmp/params.txt /tmp/params.before.txt

echo "get keys"
cat /tmp/params.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.keys.txt
cat /tmp/params.before.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.before.keys.txt
wc -l /tmp/params.keys.txt /tmp/params.before.keys.txt
diff /tmp/params.keys.txt /tmp/params.before.keys.txt

And we get the newly added env var key, EXTRA:

YuvShell $ diff /tmp/params.keys.txt /tmp/params.before.keys.txt
1d0
< export "EXTRA"
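The same key comparison can be sketched in Python with a set difference. The helper name added_keys and the sample dictionaries are made up for illustration.

```python
def added_keys(before, after):
    """Return the variable names present in `after` but missing from `before`."""
    return sorted(set(after) - set(before))


# Illustrative environments: the second one has the extra variable.
before = {"HOME": "/home/user", "PATH": "/usr/bin"}
after = dict(before, EXTRA="True")
print(added_keys(before, after))  # prints ['EXTRA']
```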

Questions

Q: What is "YuvShell"?
A: It's just me editing ~/.bashrc and changing the PS1 (prompt string) variable.

Q: What is the difference between cat some_file.txt | wc -l and wc -l some_file.txt?
A: With cat + wc we use a pipe to transfer data from the cat output to the wc input; with wc only, we don't use a pipe.

Let's create a big file from urandom, and compare the time output of both options:

cat /dev/urandom | base64 | head -c 1GB > /tmp/random_1GB_file.txt

time cat /tmp/random_1GB_file.txt | wc -l
time wc -l /tmp/random_1GB_file.txt

Source Code

	#!/usr/bin/python3
	import argparse
	import subprocess
	import os
	from err_module import some_erroneous_function

	def foo():
	    some_erroneous_function()

	def bar():
	    my_env = os.environ.copy()
	    my_env["EXTRA"] = 'True'
	    cmd = "python cli_app.py --run"
	    cmds = cmd.split(' ')

	    stdout, stderr = subprocess.Popen(cmds, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=my_env).communicate()
	    print("stdout [%s], stderr [%s]" % (stdout, stderr))

	def do_main():
	    parser = argparse.ArgumentParser()
	    parser.add_argument('--foo', dest='foo', default=False, action='store_true')
	    parser.add_argument('--bar', dest='bar', default=False, action='store_true')
	    args = parser.parse_args()

	    if args.foo:
	        foo()
	    if args.bar:
	        bar()

	if __name__ == '__main__':
	    do_main()

	#!/usr/bin/python3
	import random

	def some_erroneous_function():
	    print("some_erroneous_function:: enter")
	    a = random.randint(1,100)
	    import pdb; pdb.set_trace() # adding PDB to open debugger
	    raise Exception("some error")
	    # never reached: the exception above is raised first
	    print("value of a: %d" % (a))


	#!/usr/bin/python3
	import sys, os
	from err_module import some_erroneous_function

	# code to save program execution parameters
	with open('/tmp/params.txt', 'w') as fout:
	    # print all env vars
	    for k, v in os.environ.items():
	        fout.write('export "%s"="%s"\n' % (k, v))

	if __name__ == '__main__':
	    if sys.argv[-1] == '--run' and os.environ.get('EXTRA') == 'True':
	        some_erroneous_function()