Imagine you have a Python program and you want to inspect some values when an error occurs.
There are many possible ways to do that;
I'd like to talk about a basic one, which involves a debugger. Just like GDB for C/C++, Python has PDB.
PDB is a command-line debugger that can be launched around a script (python -m pdb script.py) or started from within the process.
Just add the line import pdb; pdb.set_trace() and you get an interactive prompt where you can inspect the running process.
Needless to say, this works only for programs attached to a terminal, such as CLI programs. Others, like servers, need other solutions (Rookout, the PyCharm remote debugger, etc.).
Let's say we run a program that calls some_erroneous_function, and we want to know some value from inside this function:
main() -> foo() -> some_erroneous_function()
How can we know the value inside some_erroneous_function()?
Simple: add the following line inside it:
import pdb; pdb.set_trace()
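As a minimal sketch of where that line goes: only the call chain comes from the example above, while the function bodies, the values argument and the debug flag are made up for illustration.

```python
import pdb


def some_erroneous_function(values, debug=False):
    total = sum(values)
    if debug:
        # Execution pauses here. At the (Pdb) prompt try: p total, p values, w, c
        pdb.set_trace()
    # Hypothetical bug: raises ZeroDivisionError when `values` is empty.
    return total / len(values)


def foo(values, debug=False):
    return some_erroneous_function(values, debug=debug)


def main(debug=False):
    return foo([1, 2, 3], debug=debug)


if __name__ == "__main__":
    # Change to main(debug=True) when running in a terminal to land in the debugger.
    print(main())
```

The debug flag is only there so the program can still run unattended; in a quick debugging session the bare import pdb; pdb.set_trace() line is enough.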
What happens when program A runs program B?
When we have
main() -> bar() -> cli_app_bar.py -> some_erroneous_function()
the import pdb; pdb.set_trace() trick simply doesn't work.
We get a stuck process instead. This is because pdb opens its prompt in the child process, while the parent process is waiting for the child process to finish, so we're stuck.
In this case, we should run the child process ourselves.
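One way to reproduce the hang, as a rough sketch. It assumes the parent launches the child with piped stdin/stdout (the article does not say how bar() starts cli_app_bar.py), and it uses input() as a stand-in for the pdb prompt. The timeout exists only so the demo terminates.

```python
import subprocess
import sys

# Stand-in for a child that hits pdb.set_trace(): it waits for a command on stdin.
CHILD_CODE = "command = input('(Pdb) ')"


def run_child_like_parent(timeout_s=1.0):
    """Start the child with piped stdio and wait for it, as a parent might."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,   # nobody ever writes to this pipe
        stdout=subprocess.PIPE,  # the prompt goes here, not to our terminal
        stderr=subprocess.PIPE,
    )
    try:
        proc.wait(timeout=timeout_s)
        return "finished"
    except subprocess.TimeoutExpired:
        # Without the timeout, the parent would block here forever.
        proc.kill()
        proc.communicate()  # reap the child and close the pipes
        return "stuck"


if __name__ == "__main__":
    print(run_child_like_parent())
```

The child is waiting for input that never arrives, and the parent is waiting for the child to exit, which is the same deadlock described above.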
What parts are required to run a child process ourselves?
Two parts are required; one is obvious, the other is often forgotten!
The 2 parts are:
- Program name + command-line arguments
- Environment variables!
- (There's a 3rd part, IPC messages, but such behavior is very hard to mimic.)
Let's see how we capture this:
- Modify the program to save its CLI arguments and env vars
- Run it ourselves using those CLI arguments and env vars
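The second step can be sketched in Python as follows. The rerun_child helper and the tiny inline child program are hypothetical; the point is that the child is started without any stdin/stdout redirection, so a pdb prompt inside it would reach the terminal.

```python
import os
import subprocess
import sys


def rerun_child(argv, extra_env):
    """Run argv with the current environment plus extra_env; return its exit code."""
    env = dict(os.environ)
    env.update(extra_env)
    # No stdin/stdout/stderr arguments: the child inherits our terminal,
    # so a debugger prompt inside it can be used interactively.
    completed = subprocess.run(argv, env=env)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical child: exits with 0 only if it sees EXTRA=1.
    child = [
        sys.executable,
        "-c",
        "import os, sys; sys.exit(0 if os.environ.get('EXTRA') == '1' else 3)",
    ]
    print(rerun_child(child, {"EXTRA": "1"}))
```

In a real session, argv would be the captured command line of cli_app_bar.py and extra_env the captured variables.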
Getting cmd + env vars
There are several methods. To get the env vars of a running process on Linux, you could use
cat /proc/46/environ | tr '\0' '\n'
(replace 46 with the process ID).
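The same information can be read from Python. This is a Linux-only sketch with hypothetical helper names; it also reads /proc/<pid>/cmdline, which holds the command line in the same NUL-separated layout.

```python
import os


def parse_nul_separated(raw):
    """Split a NUL-separated bytes blob into a list of strings."""
    if not raw:
        return []
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminating NUL, keep empty arguments
    return [part.decode("utf-8", "replace") for part in raw.split(b"\0")]


def parse_environ(raw):
    """Turn the contents of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in parse_nul_separated(raw):
        key, _, value = entry.partition("=")  # values may contain '='
        env[key] = value
    return env


def read_process_launch_info(pid):
    """Linux only: return (argv, env) for a process we are allowed to inspect."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        argv = parse_nul_separated(f.read())
    with open(f"/proc/{pid}/environ", "rb") as f:
        env = parse_environ(f.read())
    return argv, env


if __name__ == "__main__" and os.path.exists("/proc/self/environ"):
    argv, env = read_process_launch_info(os.getpid())
    print(argv, len(env))
```

Reading another user's process usually requires elevated permissions.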
From within the Python process, we want to print the env vars in a "ready to go" format, e.g. with the export prefix:
import os

with open('/tmp/params.txt', 'w') as fout:
    # print all env vars
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k, v))
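The snippet above saves only the env vars. A possible variation that also records the command line is sketched below; format_launch_script is a hypothetical helper, and it swaps the article's double-quote format for shlex.quote so that values containing spaces or quotes survive a round trip through the shell.

```python
import os
import shlex
import sys


def format_launch_script(argv, environ):
    """Return shell text: one export line per variable, then the command line."""
    lines = [
        f"export {key}={shlex.quote(value)}"
        for key, value in sorted(environ.items())
    ]
    lines.append(shlex.join(argv))  # shlex.join needs Python 3.8+
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # In the real program this text would be written to a file such as /tmp/params.txt.
    print(format_launch_script([sys.executable] + sys.argv, os.environ), end="")
```

Variable names that are not valid shell identifiers are not handled here and would need filtering.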
And then diff against the current env vars:
echo "creating bar before"
cat <<EOF > create_before.py
#!/usr/bin/python3
import os
with open('/tmp/params.before.txt', 'w') as fout:
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k, v))
EOF
python3 create_before.py
echo "print some stats"
wc -l /tmp/params.txt /tmp/params.before.txt
echo "get keys"
cat /tmp/params.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.keys.txt
cat /tmp/params.before.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.before.keys.txt
wc -l /tmp/params.keys.txt /tmp/params.before.keys.txt
diff /tmp/params.keys.txt /tmp/params.before.keys.txt
And we get the newly added env var key, EXTRA:
YuvShell $ diff /tmp/params.keys.txt /tmp/params.before.keys.txt
1d0
< export "EXTRA"
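The same comparison can be done without awk and diff. The sketch below assumes both files use the export "KEY"="VALUE" line format shown earlier; the helper names are made up, and values that span several lines are not handled (the awk version has the same limitation).

```python
def exported_keys(text):
    """Collect the KEY part of every export "KEY"="VALUE" line."""
    prefix = 'export "'
    keys = set()
    for line in text.splitlines():
        if line.startswith(prefix):
            # Same idea as awk -F '=' '{ print $1 }', minus the prefix and quotes.
            keys.add(line.split("=", 1)[0][len(prefix):].rstrip('"'))
    return keys


def added_keys(after_text, before_text):
    """Keys present in the captured environment but not in the current one."""
    return sorted(exported_keys(after_text) - exported_keys(before_text))


if __name__ == "__main__":
    captured = 'export "EXTRA"="1"\nexport "HOME"="/root"\n'
    current = 'export "HOME"="/root"\n'
    print(added_keys(captured, current))
```

With real data, the two strings would be the contents of /tmp/params.txt and /tmp/params.before.txt.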
Questions
Q: What is "YuvShell"?
A: It's just me editing ~/.bashrc and changing the PS1 (prompt string) variable.
Q: What is the difference between cat some_file.txt | wc -l and wc -l some_file.txt?
A: With cat + wc, we use a pipe to transfer data from the cat output to the wc input; with wc only, wc reads the file directly and no pipe is involved.
Let's create a big file from urandom and compare the time output of both options:
cat /dev/urandom | base64 | head -c 1GB > /tmp/random_1GB_file.txt
time cat /tmp/random_1GB_file.txt | wc -l
time wc -l /tmp/random_1GB_file.txt
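For readers who prefer to measure from Python, here is a rough equivalent of the two commands. It assumes cat and wc are on the PATH; the function names are hypothetical, and no claim is made about which variant wins on a given machine.

```python
import subprocess
import time


def count_lines_direct(path):
    """Equivalent of: wc -l path"""
    result = subprocess.run(
        ["wc", "-l", path], capture_output=True, text=True, check=True
    )
    return int(result.stdout.split()[0])


def count_lines_piped(path):
    """Equivalent of: cat path | wc -l"""
    cat = subprocess.Popen(["cat", path], stdout=subprocess.PIPE)
    wc = subprocess.Popen(
        ["wc", "-l"], stdin=cat.stdout, stdout=subprocess.PIPE, text=True
    )
    cat.stdout.close()  # let cat receive SIGPIPE if wc exits first
    out, _ = wc.communicate()
    cat.wait()
    return int(out.split()[0])


def timed(func, path):
    """Return (line_count, elapsed_seconds) for one of the functions above."""
    start = time.perf_counter()
    count = func(path)
    return count, time.perf_counter() - start
```

Calling timed(count_lines_piped, path) and timed(count_lines_direct, path) on the same file gives two numbers to compare; repeat the runs a few times, since the first read may include disk cache effects.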