In my previous post, I had a bit of a rant about people not learning the idiosyncrasies of the language that is bash, and more generally those of shell languages as a whole, leading to a lot of frankly horrible scripting out in the wild.
I've written so much shell script - and put so much emphasis on clean, reusable code in shell - all for the sake of a handful of key operations that simply must be commands, when I would rather have been managing much nicer code.
So today I turn that right around: rather than try to apply clean code to shell scripts (and crash against the rocky shores of other devs' "but it's just a shell script"), I'm going to bring the best part of shells to Python: ShellPipe.py
The fact of the matter is, a lot of shell scripting is used to glue other tools together, and that's certainly where it excels. Python by contrast, like most other languages, requires some minor passing and tracking of outputs and inputs to achieve the same effect and, whilst generally a more comfortable language, isn't quite as eloquent at the task of unifying disparate, uninterfaceable tools.
For this reason, I have continued to write bash scripts as glue, rather than try to do that passing around. For that reason in turn, I have written extensive amounts of supporting bash that really should have either been written in another language, or dispensed with entirely were it not for the idiosyncrasies.
On the last post, I got a comment from @xtofl indicating that they'd had a quick go at re-purposing the bitwise OR operator in Python into a pipe-like operator. They expanded on that technique in a later post with their proposition for chaining functions, pipe-style, which, whilst interesting, does not meet my more basic sysadminy needs.
I remembered their little comment yesterday and decided to have a go at it myself.
I'm quite proud of myself. Though maybe I should feel gravely ashamed. I can now do this in a Python script:
from shellpipe import sh
# Run a command
sh() | 'git clone https://github.com/taikedz/shellpipe'
# Chain commands, use strings or lists, embed quoted strings, print to console:
sh() | "find shellpipe/shellpipe" | 'grep -vE ".*\\.pyc"' | ['du', '-sh'] > 1
# Predefine commands, use them in chains
DOCKER_PSA = "docker ps -a"
GREP = "grep {}"
def look_for(subname):
    found = sh() | DOCKER_PSA | GREP.format(subname)
    return str(found).split("\n")
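For example, the helper might then be used like this (the search string here is purely illustrative):

# e.g. find any container listing lines that mention "redis"
for line in look_for("redis"):
    print(line)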
I would have ideally wanted to do something like this:
mysql_result = sh(f'mysql {user} -p{password} db') < """
CREATE TABLE ...
"""
which unfortunately is not possible whilst also ensuring each step is run immediately - the comparison operator needs to evaluate the left hand statement (LHS) entirely first, before the right hand (RHS) is checked. My current implementation runs the pipe step on-creation, which means the command itself is run before the "redirect" can be processed.
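To illustrate the constraint, here is a tiny toy class of my own (nothing to do with shellpipe) showing that Python fully evaluates an operand before the comparison operator's method ever runs:

class Probe:
    def __init__(self, name):
        # construction happens while the operands are being evaluated,
        # before any operator method is invoked
        print(f"built {name}")
        self.name = name

    def __lt__(self, other):
        # only runs once both sides already exist
        print(f"__lt__ called on {self.name} afterwards")
        return True

Probe("LHS") < "some input"
# prints:
#   built LHS
#   __lt__ called on LHS afterwards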
If I defer the execution until after the redirection is done (this was actually how the first implementation worked), I would have to do something like this:
mysql_result = (sh(f'mysql {user} -p{password} db') < """
CREATE TABLE ...
""").run()
Which is much less elegant. Also, keeping the external script in an actual file is better practice in most setups, so what I actually need to do with the current implementation is
with open("script.mysql", "r") as fh:
    mysql_result = sh(f'mysql {user} -p{password} db', stdin=fh)
which is generally more reasonable, anyway. Don't hard-code other scripts in your program, store them neatly (he said, stuffing shell commands into a Python program).
What is this sorcery??
I have hijacked bitwise OR-ing. Or at least, I have for the purpose of my custom class, ShellPipe (which is simply provided through sleight of assignment as sh = ShellPipe).
What ShellPipe does is define its own __or__() function, which is called any time it is placed in an x | y operation in Python. Similar things exist for __and__ (the & bitwise AND operator implementor) and __lt__ (the less-than operator implementor), the latter so as to be able to use custom, complex classes as sortable items.
this.__or__(that) normally should simply return an object of the same type as this and that, but we can abuse this a little by not requiring the one side to be of the same type as the other. Conceivably, we could return whatever we want.
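As a minimal standalone illustration (a toy class of my own, not shellpipe code), any class can opt into the | operator like this:

class Bag:
    def __init__(self, items):
        self.items = items

    def __or__(self, other):
        # called for `Bag(...) | other`; we decide what comes back
        if isinstance(other, Bag):
            return Bag(self.items + other.items)
        # nothing forces the result to be a Bag at all
        return self.items + [other]

print((Bag([1, 2]) | Bag([3])).items)  # [1, 2, 3]
print(Bag([1, 2]) | "extra")           # [1, 2, 'extra']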
When invoking x | y, only the __or__() of the object on the left hand side of the statement gets executed, and that then usually returns a new object that is the union of the two.
By invoking ShellPipe() | "a string", I capitalize on this by allowing ShellPipe's function to see that on the other side of the operation there is a string, and so it wraps that string in a ShellPipe(...) of its own - and the result is that the string has become a runnable piece of code, in a way.
It looks like this (rather, it is exactly this):
def __or__(self, other):
    our_out = None
    if self.process:
        our_out = self.process.stdout
    if type(other) in (str, list, tuple):
        other = ShellPipe(command_list=other, stdin=our_out)
    return other
So what is happening when I invoke ShellPipe() | "cmd1" | "cmd2" ?

- In this case, the first LHS (an empty instance) doesn't do anything, as it was not built with a command (it could have been, twelve and two sixes as we say here), and it turns the RHS into a ShellPipe("cmd1") and returns it
- cmd1 immediately executes as a result of being defined
- cmd1 is now the new LHS, and it keeps a hold of its output stream, passing it into the construction of the now-new RHS, ShellPipe("cmd2", stdin=cmd1_stdout)
And so on and so forth. Quite simple, really. Once the end of the chain is reached, the last item that was executed is also returned, and so in

mypipe = sh() | "cmd1" | "cmd2" | "cmd3"

mypipe is in fact the ShellPipe("cmd3") object created by cmd2. It is the output of this last command that we can then inspect with mypipe.get_stdout()
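Putting it together, the end of a chain can then be consumed like this - a small sketch, leaning on the str() conversion already used in look_for() above and the get_stdout() accessor just mentioned:

from shellpipe import sh

mypipe = sh() | "ls -1" | "sort"
lines = str(mypipe).split("\n")  # as in look_for() earlier
raw = mypipe.get_stdout()        # the last command's output, directly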
But why??
Is this useful, and better than using subprocess.Popen() directly? It is certainly mostly syntactic sugar, and importing features from one language into another is not always the best answer, but my use cases have veered more towards "I want to use Python for most things, but there's that ONE tool that can only be used as a command." String and stream manipulation is easier in Python (once you need to manage context beyond a single line), and the rich typing experience - which allowed the __or__() overloading in the first place - is better there than in shell scripts.
The downside of my implementation is that it runs each command entirely before passing on to the next one - if a command produces a large amount of output, that output is held in a file descriptor (and thus likely in RAM) before being passed to the next command. Also, if several commands each take a significant amount of time to run, this is not going to work well either.
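For comparison, this is roughly what a genuinely parallel pipeline looks like with subprocess directly - the standard pattern, streaming between processes via an OS pipe rather than buffering each step:

import subprocess

# `find shellpipe -name "*.py" | sort`, with both processes running at once
p1 = subprocess.Popen(["find", "shellpipe", "-name", "*.py"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["sort"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
output, _ = p2.communicate()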
But there are just those times when that one tool, which is only available as a command and which nobody has python-packaged, is easier to just... use as a command.
My next programming session is likely to be spent converting the scripts for which I once wrote heaps of bash and supporting bash into Python scripts, using shellpipe to cover the command corner cases. I'll probably post a report here after I've done a couple, to remark on the difference it has made - if any.
But if I consider . all the . bash code . I've written . where most of it . was just managing variables . for the sake of . a handful of . piped shell commands and clean code ...
... I feel vindicated. This is a good abomination 😌
Top comments (12)
This is neat, but I wonder if allowing bare strings for commands is an attractive nuisance, especially with the convenience of f-strings. If filenames may have spaces in them, f'cat {path}' is going to need the same kind of quoting or escaping as cat $path in bash, whereas ['cat', path] avoids that issue.

I'm trying to think about how you could make it a proper pipeline, i.e. running commands in parallel, without sacrificing convenience for simple cases.
If you do any sort of shell scripting, it is simply bad practice not to quote a variable. Unfortunately, I've seen plenty of native shell scripts where variables have been left unquoted - invariably because the author didn't quite know how shell variable substitution works.
That said, I didn't point it out in the write-up here, but this is also valid (there's just the one example with ['du', '-sh'] in my post, which I didn't call out....)

As for the parallelization, I did start looking at threading at one point, along with generators, but I never really got much momentum on it, as other concerns drew my attention away after the first working implementation...
I think the tricky bit for running them in parallel is: when do you stop and wait for the processes to complete? If you have sh() | a | b, the operator before a doesn't know if that's the end of the pipeline (so it should wait for a), or whether there's going to be another pipe (so it shouldn't).

I think this is the same sort of thing xtofl was talking about below - if the pipeline is lazy, you need some kind of marker in the code to tell it to run everything to completion and check for errors.
In principle, if you have mypipe = sh() | "A" | "B", A's __or__() gets called, telling it explicitly to build a new pipe with incoming B. B then gets created, but its own __or__() is not called. From its point of view, it just exists, and is stored in mypipe.

Implementing a wait() on that, which would probably accumulate until end of stream, could be an idea. Actually, now that I think about it, it does seem a little more complex than that.... hmmm.....

Me looking at this thinking it's pseudocode:
OP: What ShellPipe does is define its own __or__() function
Me: oh hell yes
Oh wait you're another Edinburgh dev! waves from a social distance.
👋😷
Lovely!
Maybe the pipeline processes could be constructed by the or-chain, and 'ignited' by a sentinel.
sh() | "ls" | "sort" | go()
Indeed, but if the goal is to "just run" that one command, it becomes something like

sh() | "docker-compose down" | go()

Which works, but is a bit meh... and like I said, getting < was low priority given that you could in fact just use an actual file handle.

One thing that is annoying is that Popen() expects a proper file descriptor in stdin - passing in an io.StringIO object fails with io.UnsupportedOperation on fileno()
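For what it's worth, the usual workaround when there's no real file descriptor is to let Popen make the pipe and feed the string in through communicate() - a sketch (the command is illustrative):

import subprocess

# feed an in-memory string to a command without touching the filesystem
proc = subprocess.Popen(["mysql", "mydb"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
out, _ = proc.communicate(input="CREATE TABLE example (id INT);\n")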
I've overloaded OR here, and I've overloaded __gt__ and __ge__ to write the process's stdout and stderr respectively to the console, or even to a file... it might be time for me to stop... ;-)

social distancing intensifies
I know of a very similar project in Python called Plumbum. Maybe you can draw inspiration from how some of these issues were solved by its author(s).