DEV Community

xtofl


I want my Bash Pipe

TL;DR

Pipelines as we know them from shell scripts are a powerful
mechanism to compose functionality.

This same convenience can be achieved with most programming
languages. In this post, I'll show how you can introduce
pipes into your favourite programming environment.

What?

Lots of platforms come with a way to build programs out of small blocks. In many contexts, there is a convenient way to do this functional composition. We can call these 'pipelines'.

This post came into being after a discussion on pipes in bash (a slightly controversial analysis of why your bash scripts are rubbish), but the domain is much larger than that.

So this is what we're talking about: a bash script composed of small commands connected through pipes.

cat urls.txt | grep "\.com$" | sed -e "s-http://--" -e "s/\\.com$//g"

The pipe concept exists in other domains, too. Sound processing is a nice example (image from the gstreamer website):

gstreamer sound processing pipeline

Composition is the main driving goal of most programming languages. We chase some value through a function, and use its return value as input for another function.

std::vector<std::string> report;
for(const auto &url: read_lines("urls.txt")){
    const auto [protocol, domain, path] = explode(url);
    if (domain.ends_with(".com")){
        report.push_back(domain);
    }
}

Clearly, the bash pipeline syntax is way easier to both write and read.

What if...

We could use the simplicity of the pipe syntax...

In functional programming, and in math, this concept is known as point-free function composition. You effectively define a pipeline of functions.

read_lines("urls.txt") \
    | explode_url \
    | tuple_get(1) \
    | filter(ends_with(".com"))

But... how?

I'll be using Python for this demo, just for its succinctness. I may add C++ examples later - C++ allows us to add just a little more syntactic sugar with function overload resolution.

I'll start by explaining what fluent interfaces are. Adding operator overloading to the mix, we'll end up with a nice pipeline construction syntax.

Fluent Interfaces

In most programming languages, you have a way to build fluent interfaces, allowing you to chain operations to an object together:

ServerBuilder()\
  .on_address("localhost")\
  .on_port(1050)\
  .with_database(that_db)\
  .that_handles("rq1", handle_rq1)\
  .that_quits_on("quit")\
  .build()

This works by creating methods that return builders again (spot the recursion).
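To make the "methods return builders" idea concrete, here is a minimal sketch of such a builder. The class body is an assumption for illustration: it only collects settings in a dictionary instead of building a real server, and it implements just two of the methods from the example above.

```python
class ServerBuilder:
    """Hypothetical builder; it only collects settings to illustrate chaining."""

    def __init__(self):
        self.settings = {}

    def on_address(self, address):
        self.settings["address"] = address
        return self  # returning the builder is what makes chaining possible

    def on_port(self, port):
        self.settings["port"] = port
        return self

    def build(self):
        # A real builder would construct a server here; this sketch
        # just hands back a copy of the collected settings.
        return dict(self.settings)


config = ServerBuilder().on_address("localhost").on_port(1050).build()
```

Every configuration method ends with return self, so the next call in the chain operates on the same builder.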

Operators

Now if your language allows you to overload operators as well, you're golden! You can create a class Pipeline and a pipe operator that extends a pipeline instance with an extra function.

import functools


class Pipeline:
    def __init__(self, functions=tuple()):
        self.functions = functions

    def __or__(self, f):
        # extend the pipeline with one more function; returns a new Pipeline
        return Pipeline(self.functions + (f,))

    def __call__(self, arg):
        # feed arg through each function in order
        return functools.reduce(
            lambda r, f: f(r),  # function
            self.functions,  # iterable
            arg)  # initializer

# pipeline starting element
ID = Pipeline()

Testing it

And that seems to be all.

It can be demonstrated easily with e.g. pytest. For the sake of this article, let's assume that the inc and double functions increment and double a value, respectively.

def test_everything_starts_with_the_identity_function():
    assert all(ID(x) == x for x in (1, 2, "abcd", None))

def test_pipeline_steps_are_applied_in_order():
    pipeline = ID | inc | double
    assert pipeline(0) == (0+1) * 2
    assert pipeline(3) == (3+1) * 2
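The tests rely on inc and double, which are only described in words. Here is a minimal sketch of what they could look like (their bodies are my assumption), with the Pipeline class repeated in compact form so the snippet runs on its own:

```python
import functools


class Pipeline:
    def __init__(self, functions=()):
        self.functions = functions

    def __or__(self, f):
        return Pipeline(self.functions + (f,))

    def __call__(self, arg):
        return functools.reduce(lambda r, f: f(r), self.functions, arg)


ID = Pipeline()


def inc(x):
    """Assumed helper: increase a value by one."""
    return x + 1


def double(x):
    """Assumed helper: double a value."""
    return x * 2


pipeline = ID | inc | double
result = pipeline(3)  # inc runs first, then double
```

Swapping the steps (ID | double | inc) gives a different result, which is what the second test checks implicitly.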

But still... how???

Now let's explain this step by step.

Building the Pipeline

The Pipeline class is our container of functions to be composed. It does so by storing them in a tuple (self.functions). (As an aside, I prefer tuple rather than list for its immutability.)

The module also adds a very first object we can use as a starting point for our construction: a pipeline that does nothing but return the element it receives. It is called ID, just like in the functional programming world.

Now our class has this special member __or__(self, f). Its sole purpose is to provide the 'pipe' syntax we know from shell scripting: p | inc | double; and in Python, this is achieved through operator overloading.

We could have created a custom name ('and_then') to achieve the same functionality:

    def and_then(self, f):
      return Pipeline(self.functions + (f,))

...
ID.and_then(double).and_then(inc)

But choosing __or__ as a member name tells Python we want this to be used when a Pipeline object is or-ed/piped to a function.

Calling the Pipeline

Again, another special member: the __call__ function. You probably guessed it, but this is what makes objects behave like a function.

I have implemented it using functools.reduce, but you could just as well hand-code a loop feeding the first argument to the first function, the return value to the next function, and so on.
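For comparison, the hand-coded loop could look roughly like this. LoopPipeline is a name made up for this sketch; it is meant to behave the same as the reduce-based Pipeline.

```python
class LoopPipeline:
    """Same idea as Pipeline, with __call__ written as an explicit loop."""

    def __init__(self, functions=()):
        self.functions = functions

    def __or__(self, f):
        return LoopPipeline(self.functions + (f,))

    def __call__(self, arg):
        result = arg
        for f in self.functions:
            # the output of one step becomes the input of the next
            result = f(result)
        return result


steps = LoopPipeline() | (lambda x: x + 1) | (lambda x: x * 2)
```

With no functions at all, the loop body never runs and the argument comes back unchanged, which is exactly the identity behaviour of ID.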

Here, too, we could have called it something else, like invoke_with. A non-special-member pipeline would have looked like this:

ID.and_then(inc).and_then(double).invoke_with(10)

But choosing __call__ tells Python to choose this method when the parentheses are used:

(ID | inc | double)(10)

Injecting the first argument

What I would really want to write is something like this:

twentytwo = echo(10) | inc | double
fourty_eight = echo("0x18") >> from_hex | double

So we need another trick, a helper class that reverses things:

class WithArg:
    def __init__(self, value):
        self.value = value
    def __call__(self, p: Pipeline):
        return p(self.value)

Now we can write

WithArg(10)(ID | inc | double) == (10+1) * 2

If we're willing to give up the | operator, we can drop the parentheses, too. This is due to operator precedence:

assert 1 + 2*3 == 7
assert 1 + 2 * 3 == 1 + (2*3)
assert True | False&True == True
assert True | False&True == True | (False&True)

So we can use e.g. the multiplication operator for composition, and e.g. right shift for argument injection:

class Pipeline:
    ...
    def __mul__(self, f):
        return self | f

class WithArg:
    ...
    def __rshift__(self, p: Pipeline):
        return p(self.value)

assert WithArg(10) >> ID * inc * double == (10+1) * 2
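Filling in the elided parts, the whole thing might be put together as follows. This is a sketch under assumptions: from_hex and double are helper functions I made up to mirror the "0x18" wish from earlier, and the two classes are repeated in full so the snippet runs on its own.

```python
import functools


class Pipeline:
    def __init__(self, functions=()):
        self.functions = functions

    def __or__(self, f):
        return Pipeline(self.functions + (f,))

    def __mul__(self, f):
        # '*' binds tighter than '>>', so the pipeline is composed first
        return self | f

    def __call__(self, arg):
        return functools.reduce(lambda r, f: f(r), self.functions, arg)


class WithArg:
    def __init__(self, value):
        self.value = value

    def __rshift__(self, p):
        # inject the stored value into the fully composed pipeline
        return p(self.value)


ID = Pipeline()


def from_hex(text):
    """Assumed helper: parse a hexadecimal string such as "0x18"."""
    return int(text, 16)


def double(x):
    """Assumed helper: double a value."""
    return x * 2


forty_eight = WithArg("0x18") >> ID * from_hex * double
```

Because == binds more loosely than >>, a comparison such as WithArg(10) >> ID * double == 20 also works without extra parentheses.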

Limitations

The pipes you can build with the code in this article are nice, but they still lack one important aspect of the ones you use in your shell. It has to do with breaking structure.

Take a look at this pipeline:

function top5 {
   grep "page" | awk '{print $5 " " $3 " " $9}' | sort -n | tail -5
}

What we see here is a line-based grep and awk. But then comes sort. How is this different? Well, it will swallow all of its input before generating output.

The Python pipeline will allow us to compose functions that accept and return data in lock step. ID | grep("page") | print_elements(5, 3, 9) is going to process a single argument to produce a single value. How are we going to break free from that? Parts of the pipe need to be able to 'buffer' their input to produce one output (or a stream?) when the input stops.

Indeed, text based processing has two kinds of events: new-line, and end-of-file. As a matter of fact, all of these command line stream-processing tools are composed of a buffering/chunking part and a chunk-processing part. We just may use this knowledge to make our pipeline smarter. But not in this post.
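To give a rough idea of the difference between the two kinds of stages, here is one possible sketch in which every stage takes an iterable of lines and returns an iterable. All names (grep, sort_numeric, tail, run) are assumptions made up for this illustration, and it is not the smarter pipeline itself: it only shows which stages can work line by line and which ones have to wait for the end of the input.

```python
def grep(pattern):
    """Line-based stage: passes matching lines on one at a time."""
    def stage(lines):
        return (line for line in lines if pattern in line)
    return stage


def sort_numeric(lines):
    """Buffering stage: sorted() consumes all input before producing output."""
    return iter(sorted(lines, key=lambda line: int(line.split()[0])))


def tail(n):
    """Buffering stage: only the end of input reveals the last n lines."""
    def stage(lines):
        return iter(list(lines)[-n:])
    return stage


def run(lines, *stages):
    """Feed the stream through each stage in order and collect the result."""
    for stage in stages:
        lines = stage(lines)
    return list(lines)


log = ["3 page a", "10 page b", "7 other c", "1 page d"]
top2 = run(log, grep("page"), sort_numeric, tail(2))
```

Since each stage is still a one-argument function, such stages could in principle be chained with the Pipeline class as well, as long as the value flowing through is the whole stream rather than a single line.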

Conclusion

Languages allow you to build your own set of constructs these days. We can make use of this to mimic notations from a known domain.

No language is perfect, but we can get close enough to be useful.

Top comments (2)

Cliff

Nice coverage of how to pull this off.

I will note that it turns out that this is non-trivial to do in C++ and maintain good compatibility and expected behavior. Jonathan Boccara has actually implemented a pretty robust pipes library. It's header-only (yay!), but it does require C++14:
github.com/joboccara/pipes

You will note that he chose to implement '>>=' as his pipe operator instead of '|', because it is highly discouraged to overload '|' in C++. I think this would be more accepted in Python, but even then there are valid concerns about order of operator execution.

If you want to nerd out about how Jonathan implemented his solution, he's got a series of articles about it on his blog, Fluent {C++}. Unlike some of his article series, it's not collected with links between them, as these are more development blog entries than concept entries, but here's part one about building composite reusable pipes:
fluentcpp.com/2019/09/17/composite...

xtofl

Thanks! Cross references are always a help. I get his newsletter, but somehow I forgot he had blogged about pipes. The brief implementation I showed in Python is also very limited; in order to allow fan-out/fan-in it'll need more code.

Indeed, operator| should do 'the expected'. In C++, >> has been around since the iostreams library, so we are more used to that. (if you dig deeper into the origin of this operator, you end up with bit shifting, too...).

The >>= refers nicely to Haskell's composition :).

I do notice a lot more enthusiasm when showing people "this is bash; look how similar that functional programming concept is", than "this is haskell; look how you do this in your world". That was my reason to do the wrong thing.