DEV Community

drmikecrowe
drmikecrowe

Posted on • Originally published at pinnsg.com on

Databricks Multiple Filters using a Python Lambda statement

Image description

Simplifying Multiple Null Checks in Databricks

Recently, I ran into a case where I needed to check if 11 different fields were null. Yes, I could have used Copilot (or my preferred Codeium) to generate it for me, but I knew it had to be easier. There had to be an easier way…

The Scenario

You have a list of conditions in a python list (in my case, I pasted it as separate lines and did the following. Doesn’t matter how you specify your conditions, just that it’s a list somehow

conditions = """
condition 1
condition2
...".split("\n")

You have a way of associating this list with a set of Databricks columns

Using reduce from functools

Here’s what I found:

  • There’s a python function reduce in functools that takes a list a reduces it down to a result
  • You specify how this list is “reduced”, such as “and each element of this list together”

Example Implementation

Here’s how it works. first, I have an array of join_columns which are the common columns between two datasets. In my setup, each column of join_columns maps to the corresponding filter:

conditions = [F.col(join_columns[i]) == filter_cols[i] for i in range(2)]
Enter fullscreen mode Exit fullscreen mode

Now, we use the reduce function to combine these into a condition for filtering:

condition = functools.reduce(lambda a, b: a & b, conditions)
Enter fullscreen mode Exit fullscreen mode

Combining Conditions

So, in this case, we are saying all these conditions should be and’d together – or rather all of them should match. Naturally, had I wanted any of them to match, this would have have been:

condition = functools.reduce(lambda a, b: a | b, conditions)
Enter fullscreen mode Exit fullscreen mode

(the & is now a |)

Final Condition for Filtering

That’s it–I now have a condition that I can use:

res_df = input_df.filter(condition)
Enter fullscreen mode Exit fullscreen mode

Image of Datadog

Create and maintain end-to-end frontend tests

Learn best practices on creating frontend tests, testing on-premise apps, integrating tests into your CI/CD pipeline, and using Datadog’s testing tunnel.

Download The Guide

Top comments (0)

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay