When we use ADF to call Databricks we can pass parameters, nice. When the Databricks notebook finishes running we often want to return something back to ADF so ADF can do something with it. Imagine Databricks creates a file with 100 rows in it (actually, big data, 1,000 rows) and we then want to move that file or write a log entry to say that 1,000 rows have been written.
So, in the Notebook we can exit using dbutils.notebook.exit('plain boring old string')
and in ADF we can retrieve that string using @activity('RunNotebookActivityName').output.runOutput
- that is, runOutput, in this case, will be "plain boring old string".
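To show the whole round trip, here is a minimal notebook sketch - the parameter name input_path is just a made-up example and has to match whatever base parameter you configure on the ADF Notebook activity:

# read a parameter passed in from the ADF activity's base parameters
input_path = dbutils.widgets.get('input_path')

# ... do whatever work the notebook is actually for ...

# hand a plain string back to ADF
dbutils.notebook.exit('finished processing ' + input_path)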
Now, sometimes we want to do more exciting things like return a few pieces of information, say a file name and a row count. You could do something yucky like: dbutils.notebook.exit('plain boring old string, some other value, yuck, yuck, and yuck')
and then do a string split somewhere else down the line. We can do better than this - there is a quirk of ADF, I won't say feature because if it was intentional then it is the only feature of ADF that isn't completely underwhelming. If we return JSON data in our string, ADF treats it as an object that we can query:
dbutils.notebook.exit('{"an_object": {"name": {"value": "exciting"}}}')
In ADF we can retrieve "exciting" using:
@activity('Run Notebook - JSON Response').output.runOutput.an_object.name.value
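So for the file name and row count example from earlier, a sketch (the keys file_name and row_count and their values are just made up for illustration) could look like:

import json

# made-up details about the file the notebook just wrote
result = {'file_name': '/mnt/output/part-0000.parquet', 'row_count': 1000}

# exit with a JSON string so ADF can treat runOutput as an object
dbutils.notebook.exit(json.dumps(result))

and then in ADF:

@activity('Run Notebook - JSON Response').output.runOutput.row_count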
Now that is exciting. Imagine if we had a dataset we wanted to return - we could use:
dbutils.notebook.exit(spark.sql('select id from range(100)').toJSON().collect())
** NOTE there is a 2 MB limit here so don't go over that **
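If you are worried about that limit, one rough way to guard against it before exiting - just a sketch, using the 2 MB figure from the note above:

import json

rows = spark.sql('select id from range(100)').toJSON().collect()
payload = json.dumps(rows)

# fall back to a small error message rather than blowing the 2 MB limit
if len(payload.encode('utf-8')) > 2 * 1024 * 1024:
    dbutils.notebook.exit('{"error": "result too large, write it to storage instead"}')
else:
    dbutils.notebook.exit(payload)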
This got me wondering what else we can do - how about returning a list?
dbutils.notebook.exit(['first', 'second', 3])
Wouldn't you believe that works like a charm?
@activity('Run Notebook - String List Response').output.runOutput[2]
I then wondered what would happen with a dict - at this point I was beyond excited:
dbutils.notebook.exit({'a_value': 'yo yo'})
However, ADF didn't like that, not one bit, so returning a dict failed. All is not lost though - if you need to return a dict then you can use json.dumps(),
and we already know we can access a JSON string as an object in the output of a notebook.
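So something like this (again just a sketch):

import json

# a plain dict fails, but serialising it to JSON first works fine
dbutils.notebook.exit(json.dumps({'a_value': 'yo yo'}))

and then in ADF, using whatever your activity is actually called:

@activity('Run Notebook - Dict Response').output.runOutput.a_value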
Hope it helps - the .output.runOutput part
alone took me a little while to figure out, so hopefully no one else has to waste their time fiddling with ADF and notebooks.