Rasa 2.x gives us a lot of new features as conversational designers / developers. The coolest feature that isn't quite apparent is the ability to group and organize your training data any way you please.
At my company we build "abilities" for our bot - think of them as similar to Alexa Skills. Essentially you have some domain information, NLU training data, stories, and sometimes actions or forms to round out the ability. The more abilities the bot has, the longer and more complex your files get, which means it takes longer to grok what you're looking at if you drop in to tweak something, let alone onboard a new developer.
The old days
Rasa 1.x allowed you to split up your NLU and stories files simply by creating a data/nlu and a data/core directory in your project and putting the individual files there. You can group your data into separate files, which makes it easier to find something if/when you need to change it. For example, if you needed to add new chit-chat training data, you could jump into data/nlu/chit-chat.md and add new data. Running the rasa train command uses the files in data/nlu and data/core in combination with domain.yml in the root of your project to train your model.
This was great, but not ideal for me. I wanted to split my domain files in a similar way, creating a data/domain directory and putting my files there. Rasa, however, didn't recognize that directory, so I wrote a script to merge these files into a single domain.yml file and drop it in the root. This allowed the rasa train command to utilize my separate domain-related files.
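For the curious, a merge along those lines can be sketched roughly as follows. This is not my original script, just a minimal illustration that assumes each domain file has already been parsed into a dictionary (for example with a YAML library). The merge_domains name and the sample fragments are made up for the example.

```python
from typing import Any, Dict, Iterable


def merge_domains(parts: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine several parsed domain fragments into one dictionary.

    List sections (e.g. intents) are concatenated without duplicates,
    mapping sections (e.g. responses, slots) are merged key by key,
    and scalar values are overwritten by the later fragment.
    """
    merged: Dict[str, Any] = {}
    for part in parts:
        for key, value in part.items():
            if isinstance(value, list):
                target = merged.setdefault(key, [])
                for item in value:
                    if item not in target:
                        target.append(item)
            elif isinstance(value, dict):
                merged.setdefault(key, {}).update(value)
            else:
                merged[key] = value
    return merged


# Hypothetical fragments standing in for two parsed files in data/domain.
chitchat = {
    "intents": ["greet", "goodbye"],
    "responses": {"utter_greet": [{"text": "Hi!"}]},
}
weather = {
    "intents": ["ask_weather"],
    "slots": {"city": {"type": "text"}},
    "responses": {"utter_weather": [{"text": "Sunny."}]},
}

combined = merge_domains([chitchat, weather])
print(combined["intents"])  # ['greet', 'goodbye', 'ask_weather']
```

A real script would still need to read the fragments from disk and write the merged result to domain.yml in the project root.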
A New Organizational Paradigm
Rasa 2.x gives us the ability to split up our domain files, and the benefit is clear: smaller files with more focused data. I also don't have to use my custom script anymore!
Why is this cool? To expand on my explanation above: if your bot can handle chit-chat, weather, restaurant search, and directions, you would have a single long domain.yml file in the root of your project with ALL of your intents, slots, entities, responses, action calls, and form config. Your topical data is interlaced, and it makes it hard to find things. Being able to split this into different files just makes more sense. (Thank you Rasa!)
Your new data directory structure can now change to -
data/core
data/domain
data/nlu
And each of these can contain multiple files that make up your bot's data. You can even do this with your action files.
Kick It Up A Notch!
Here's something that is an amazing side effect / undocumented feature of the way Rasa deals with training data in 2.x. You can create directories under core, domain, and nlu, and Rasa will recurse down through them looking for files during the training process.
I know you're asking - why is this awesome? In our case, as I said we build abilities, which are mostly isolated functions and conversational scenarios. In v1 we adopted a filename convention to differentiate between abilities. In v2, by exploiting this new directory structure we can have individual developers work on a single ability without stepping on the toes of other developers.
They can create a new ability directory - let's say they're working on a book recommendation ability. Our dev creates data/book-recommendation and in that directory creates a domain.yml, nlu.yml, stories.yml, and rules.yml, and works solely from that directory. Fun fact: the filename doesn't matter. Each .yml file is keyed - intents:, nlu:, even rules: - so it doesn't matter how many files you have, or what the names are, it all works!
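To make the "keyed" idea concrete, here is a small sketch that lists the top-level keys of every .yml file in an ability directory. It is only an illustration, not how Rasa itself reads training data: it uses a simple regular expression instead of a YAML parser, and the helper names and file names (anything.yml, whatever.yml) are made up.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List

# A top-level key is an unindented "name:" at the start of a line.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):")


def top_level_keys(yaml_file: Path) -> List[str]:
    """Return the unindented keys found in a YAML file, in file order."""
    keys = []
    for line in yaml_file.read_text(encoding="utf-8").splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    return keys


def index_ability(ability_dir: Path) -> Dict[str, List[str]]:
    """Map each .yml file under ability_dir to its top-level keys."""
    return {
        path.relative_to(ability_dir).as_posix(): top_level_keys(path)
        for path in sorted(ability_dir.rglob("*.yml"))
    }


# Build a throwaway ability directory with deliberately arbitrary filenames.
with tempfile.TemporaryDirectory() as tmp:
    ability = Path(tmp) / "book-recommendation"
    ability.mkdir()
    (ability / "anything.yml").write_text(
        'version: "2.0"\nintents:\n  - recommend_book\n', encoding="utf-8"
    )
    (ability / "whatever.yml").write_text(
        'version: "2.0"\nnlu:\n- intent: recommend_book\n'
        "  examples: |\n    - suggest a book\n",
        encoding="utf-8",
    )
    index = index_ability(ability)

print(index)
```

The keys, not the filenames, tell you what kind of data each file holds, which is why the names are free for you to choose.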
If you decide to do this, you'll need to run rasa train with the --domain parameter so it will find your domain files:
rasa train --domain data
If you leave off the --domain parameter, Rasa will look for domain.yml in the directory you're running it from, so be sure to delete domain.yml in the root of your project, or you may be quite confused about why your latest changes aren't getting pulled in.
Don't Leave Actions Out In The Cold
You can also do this with your actions.py file, albeit in a different location and with an extra file. We create an ability directory under actions/, drop in an empty __init__.py file (making Python treat it as a package), then add an actions.py file (or whatever filename you want).
In our book recommendation example we would have something like this:
actions/__init__.py
actions/book-recommendation
actions/book-recommendation/__init__.py
actions/book-recommendation/actions.py
Doing all of this directory organization centralizes the code and lets your developer spin up a local rasa init project and work on that ability from beginning to end, creating a very focused bot complete with tests. One caveat is if the ability being worked on is integrated with another ability in some way. Depending on the level of that reliance, you may want to consider whether the new code is actually a separate/new function as opposed to an extension of a current ability - but that's getting away from the main topic here.
In practice
We plan to have a special repo of abilities, so when our devs are done they can just move their directories over and issue a PR, and that new ability will be available for everyone else on the team to pull down and add to their bot if needed.
But wait, there's more
Up until now, if your ability had any Python-related action code, you'd have two directories to manage. What if you could create a truly self-contained ability in one directory? Literally add one directory of files, retrain, and have a new ability in your bot?
You can.
To achieve this, we'll move the action files into our data/book-recommendation directory. There's some setup to do first, however.
Remember the __init__.py files we've been dropping all over? Python uses those to detect if a directory is loadable (a package).
To get our all-in-one setup we'll need to drop an __init__.py into data and then move our __init__.py and the specific actions.py file from the ability folder under actions into the ability directory under data. This way everything is 100% self-contained in one single directory like this:
data/__init__.py
data/book-recommendation
data/book-recommendation/__init__.py
data/book-recommendation/actions.py
data/book-recommendation/stories.yml
data/book-recommendation/domain.yml
data/book-recommendation/nlu.yml
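If you create abilities often, the layout above is easy to scaffold. The following is a minimal sketch using only the standard library; scaffold_ability is a hypothetical helper, and it creates empty placeholder files that you would still need to fill in before training.

```python
import tempfile
from pathlib import Path


def scaffold_ability(project_root: Path, name: str) -> Path:
    """Create the self-contained ability layout under <project_root>/data."""
    data_dir = project_root / "data"
    ability_dir = data_dir / name
    ability_dir.mkdir(parents=True, exist_ok=True)

    # Both data/ and the ability directory need an __init__.py so that
    # Python treats them as packages.
    for init_file in (data_dir / "__init__.py", ability_dir / "__init__.py"):
        init_file.touch(exist_ok=True)

    # Empty placeholders for the action code and the training data.
    for filename in ("actions.py", "stories.yml", "domain.yml", "nlu.yml"):
        (ability_dir / filename).touch(exist_ok=True)

    return ability_dir


# Try it out in a throwaway directory instead of a real project.
with tempfile.TemporaryDirectory() as tmp:
    scaffold_ability(Path(tmp), "book-recommendation")
    layout = sorted(
        path.relative_to(tmp).as_posix()
        for path in Path(tmp).rglob("*")
        if path.is_file()
    )

for entry in layout:
    print(entry)
```

Calling the helper a second time is harmless, since existing files and directories are left as they are.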
The trick here is to run your action server with the --actions parameter like this:
rasa run actions --actions data
This tells Rasa to load the action files from your data directory, and it will recurse down and load any Python files it finds in directories that contain an __init__.py.
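To see why the __init__.py files matter for this kind of recursive loading, here is a small standard-library sketch of package discovery. It is an illustration of the general Python mechanism and only an assumption about what the action server does internally. The demo_data, book_recommendation, and no_init names are made up, and the sketch uses an underscore in the directory name simply to keep the example conservative.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path
from typing import List


def discover_modules(package_name: str) -> List[str]:
    """List every module and sub-package reachable from package_name."""
    package = importlib.import_module(package_name)
    found = []
    # walk_packages only descends into directories that are packages,
    # i.e. directories that contain an __init__.py file.
    for info in pkgutil.walk_packages(
        package.__path__, prefix=package.__name__ + "."
    ):
        found.append(info.name)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "demo_data"
    (pkg / "book_recommendation").mkdir(parents=True)
    (pkg / "no_init").mkdir()

    (pkg / "__init__.py").touch()
    (pkg / "book_recommendation" / "__init__.py").touch()
    (pkg / "book_recommendation" / "actions.py").write_text("LOADED = True\n")
    # This directory has no __init__.py, so its actions.py stays invisible.
    (pkg / "no_init" / "actions.py").write_text("LOADED = True\n")

    sys.path.insert(0, str(root))
    try:
        modules = discover_modules("demo_data")
    finally:
        sys.path.remove(str(root))

print(modules)
```

The actions.py inside no_init never shows up in the result, which is the same symptom you would see if you forgot to add an __init__.py to one of your ability directories.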
As noted above, you'll also need to run rasa train with the --domain parameter like this:
rasa train --domain data
That will tell Rasa where your domain files are.
Parting Thoughts
I think this is a pretty cool advancement in the ability (no pun intended) to organize our data, streamline our development process and allow a very interesting approach to developing different independent functions.
I'm not sure I like the Python being intermingled in the same directory as my .yml files. It feels a little gross, but I suppose I could also create a data/book-recommendation/actions directory to move out all the Python, other than the __init__.py file of course. Or maybe even go crazy with
/data/book-recommendation/actions
/data/book-recommendation/data
OR even rename our data directory and create something like this:
/abilities/book-recommendation/actions
/abilities/book-recommendation/data
If you do something crazy like this, be sure to alter your --data, --domain, and --actions parameters when firing up Rasa!
Those both feel a little over the top, but the point is the possibilities are endless and you have the ability to organize your files however makes sense to you.
I'll continue iterating on this approach. I'm interested in knowing what others are doing to organize their data. The single file system works for smaller / simpler bots, but anything with some robustness will quickly outgrow that model (pun intended).
Let me know what you think in the comments!