Unknownerror-404

Posted on Jan 3

Understanding YAML

#llm #chatbot #yaml #rasa

Following up on Understanding RASA, which discussed featurizers, classifiers and pipelines, this one covers YAML, intents, entities and slots.

Contents of this blog:

YAML
Intents
Entities
Slots

This blog will introduce readers to the essential building blocks of YAML and of RASA itself.

YAML

YAML (files end in .yml or .yaml) is a data-serialization language. Unlike HTML or XML, it is not a markup language and is not used for website design; when using RASA, it acts as the structural foundation in which the training data and the domain are written. Before we move on, let's begin with understanding what YAML is and how it works.

YAML is not a programming language; it describes data, using indentation to form nested blocks. In the blocks used here, a type can be considered a parent label that groups all the entries belonging to it.

The most common structure of a block is as provided below:

- type: name
  examples: |
    - e.g. 1
    - e.g. 2
    - e.g. 3

This structure is most commonly used when providing examples for intents, with entities annotated inside those examples. Slots, as shown later, are defined without examples.
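To make the block structure concrete, here is a minimal Python sketch of what happens to the text under examples: |. The | tells YAML to keep the indented lines as one multi-line string, so the leading "- " on each line is plain text that still has to be split. The function name parse_examples is hypothetical and is not part of RASA.

```python
def parse_examples(block: str) -> list[str]:
    """Split the multi-line string stored under `examples: |` into examples."""
    examples = []
    for line in block.splitlines():
        line = line.strip()
        if line.startswith("- "):
            # Drop the "- " prefix and keep the example text itself.
            examples.append(line[2:].strip())
    return examples


# The string a YAML parser would hand back for the block shown above.
block = "- e.g. 1\n- e.g. 2\n- e.g. 3\n"
print(parse_examples(block))
```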

Intents

An intent is essentially what a user means to say with their message. Even though RASA uses ML, that ML is only put to work through the pipeline and policies you configure. (If you don't know what those are, head over to Understanding RASA, the first blog, where they are explained.)

An intent is effectively a mapping that links all messages with a similar meaning to a single label, which conveys the broader response pattern.

E.g:

I think I have a stomachache,
My stomach hurts,
I might be having abdominal pain.

Are all linked to the intent: intent_symptom_stomach_ache

So an intent answers:
What is the user trying to do or express?
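As a rough illustration of the "many phrasings, one label" idea, the sketch below picks the intent whose example shares the most words with the message. This is a toy stand-in and not how RASA classifies intents (RASA trains an ML model); TRAINING, tokens and guess_intent are hypothetical names used only here.

```python
TRAINING = {
    "intent_symptom_stomach_ache": [
        "I think I have a stomachache",
        "My stomach hurts",
        "I might be having abdominal pain",
    ],
    "greet": ["hello", "hi", "good morning"],
}


def tokens(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().replace(",", " ").split())


def guess_intent(message: str) -> str:
    """Return the intent whose examples share the most words with the message."""
    msg = tokens(message)
    best_intent, best_score = "", -1.0
    for intent, examples in TRAINING.items():
        for example in examples:
            ex = tokens(example)
            score = len(msg & ex) / len(msg | ex)  # Jaccard overlap
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent


print(guess_intent("my stomach really hurts"))
```

Word overlap breaks down quickly (it cannot tell that "abdominal pain" and "stomachache" mean the same thing), which is exactly the gap the trained model is meant to close.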

Within RASA itself, intents are the core unit of NLU (Natural Language Understanding). When considering a pipeline, we operate on text as:

                         [Some input query]
                                  |
                                  V
            [RASA NLU model predicts intents and entities]
                                  |
                                  V
           [Dialogue manager manages which O/P to provide]
                                  |
                                  V
                           [Bot response]



*Pipeline representation using ASCII art.

Internally, the result for such a query is represented in JSON like:

{
  "intent": {
    "name": "report_symptom",
    "confidence": 0.92
  },
  "entities": [
    {"entity": "symptom", "value": "cough"},
    {"entity": "duration", "value": "two days"}
  ]
}

Here the intent field gives the type of query the user provided, and confidence gives how confident the model is in this prediction.

In practice, when building chatbots, we would provide the intents and their examples in nlu.yml as:
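A small Python sketch of how such a result could be read: it loads the JSON, falls back to a placeholder label when the confidence is low, and flattens the entities into a dictionary. The THRESHOLD value and the "fallback" label are assumptions made for illustration, not RASA defaults.

```python
import json

raw = """
{
  "intent": {"name": "report_symptom", "confidence": 0.92},
  "entities": [
    {"entity": "symptom", "value": "cough"},
    {"entity": "duration", "value": "two days"}
  ]
}
"""

THRESHOLD = 0.6  # assumed cut-off, chosen only for illustration


def summarise(result: dict) -> tuple[str, dict]:
    """Return the intent name (or a fallback label) and the entities as a dict."""
    intent = result["intent"]
    name = intent["name"] if intent["confidence"] >= THRESHOLD else "fallback"
    entities = {e["entity"]: e["value"] for e in result["entities"]}
    return name, entities


print(summarise(json.loads(raw)))
```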

version: "3.1"

nlu:
- intent: greet
  examples: |
    - hello
    - hi
    - good morning

- intent: report_symptom
  examples: |
    - I have a headache
    - My head hurts
    - I've been coughing for two days

Here, intent is the group label the examples belong to. The intents are then also listed in a file called domain.yml, which acts as the initialisation of the intent/group for user queries.

intents:
- greet
- goodbye
- affirm
- deny
- mood_great
- mood_unhappy
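Because intents appear in two files, it is easy for them to drift apart. The sketch below compares the intent names from the two snippets above with plain Python sets; the names are copied in by hand here, and undeclared is a hypothetical helper, not a RASA function.

```python
# Intent names taken from the nlu.yml and domain.yml snippets above.
nlu_intents = {"greet", "report_symptom"}
domain_intents = {"greet", "goodbye", "affirm", "deny", "mood_great", "mood_unhappy"}


def undeclared(nlu: set[str], domain: set[str]) -> set[str]:
    """Intents that have training examples but are missing from the domain."""
    return nlu - domain


print(undeclared(nlu_intents, domain_intents))
```

With these two snippets, report_symptom has examples but is not declared in the domain list, so it would need to be added there.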

Entities

Where the intent captures the intention behind the sentence, an entity captures the specific value or piece of information the user mentioned. Without the specificity of entities, the message cannot be used for in-depth responses. Entities make the conversation dynamic by letting the bot branch on their values.

Intent -> Symptom
Entities -> 
  symptom = fever.
  duration = No. of days.

Without these entities the bot cannot respond as intelligently. In RASA, entities are extracted by NLU and passed to the dialogue manager. They are represented in JSON in a similar manner to intents.

"entities": [
    {"entity": "symptom", "value": "fever"},
    {"entity": "duration", "value": "three days"}
  ]

Entities are annotated within the same file where we define the examples of intents, i.e. nlu.yml, by marking them inline as [value](entity). However, the actual initialisation occurs within domain.yml, in a similar manner to intents.

- intent: report_symptom
  examples: |
    - I have a [fever](symptom)
    - I've been coughing for [two days](duration)
    - My [head](body_part) hurts


*within nlu.yml

Entities come in multiple types: free-text words, categorical values, numerical values, and entities matched through lookup tables or regexes. They are often used together to take in as much information as possible.
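The [value](entity) annotation used in nlu.yml can be unpacked with a short regular expression. This sketch only handles that simple form and is meant to show what the annotation encodes, not how RASA parses training data; ANNOTATION and extract are hypothetical names.

```python
import re

# Matches "[some text](entity_name)".
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def extract(example: str) -> tuple[str, list[dict]]:
    """Return the plain text and the entities annotated in a training example."""
    entities = [
        {"entity": m.group(2), "value": m.group(1)}
        for m in ANNOTATION.finditer(example)
    ]
    # Replace each annotation with just the text inside the square brackets.
    plain = ANNOTATION.sub(r"\1", example)
    return plain, entities


print(extract("I've been coughing for [two days](duration)"))
```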

Slots

If intents answer what the user wants, and entities answer which specific information they provided, then slots answer what the assistant remembers.

In simple terms, slots act as RASA’s memory system.

While entities are extracted from a single user message, slots persist across multiple turns of conversation. This allows the chatbot to reason contextually instead of treating every user message as an isolated input.

Why slots are needed
Consider the following interaction:

User: I have a fever.
Bot: How long have you had it?
User: Three days.

Here, the bot must remember that:
“it” refers to fever
the symptom has already been mentioned

This continuity is made possible by slots. Without slots, the dialogue manager would not retain previous information, and the conversation would feel repetitive or incoherent.

Slot representation:
Extending the earlier pipeline representation, slots fit in as follows.

                         [User input]
                               |
                               V
                [Intent & Entity extraction]
                               |
                               V
                 [Slot filling / slot update]
                               |
                               V
               [Dialogue manager (policies)]
                               |
                               V
                        [Bot response]


*extension of the pipeline from intents.

Slots sit between NLU and dialogue management, acting as state variables that influence which action or response is selected next.
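The fever conversation from earlier can be sketched in a few lines of Python to show slots acting as state between turns. Both functions are hypothetical stand-ins: update_slots mimics slot filling from entities, and next_response is a hand-written rule in place of RASA's dialogue manager and policies.

```python
def update_slots(slots: dict, entities: list[dict]) -> dict:
    """Copy each extracted entity into the slot of the same name."""
    updated = dict(slots)
    for e in entities:
        if e["entity"] in updated:
            updated[e["entity"]] = e["value"]
    return updated


def next_response(slots: dict) -> str:
    """Hand-written stand-in for the dialogue manager."""
    if slots["symptom"] is None:
        return "What symptom do you have?"
    if slots["duration"] is None:
        return "How long have you had it?"
    return f"Noted: {slots['symptom']} for {slots['duration']}."


slots = {"symptom": None, "duration": None}

# Turn 1: "I have a fever."
slots = update_slots(slots, [{"entity": "symptom", "value": "fever"}])
print(next_response(slots))

# Turn 2: "Three days."
slots = update_slots(slots, [{"entity": "duration", "value": "three days"}])
print(next_response(slots))
```

The second turn carries no symptom at all, yet the final response still mentions fever because the slot kept it from the first turn.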

Defining slots
Slots are initialised inside domain.yml, similar to intents and entities.

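slots:
  symptom:
    type: text
    influence_conversation: true
    # RASA 3.x: fill this slot from the entity of the same name
    mappings:
    - type: from_entity
      entity: symptom
  duration:
    type: text
    influence_conversation: true
    mappings:
    - type: from_entity
      entity: duration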

Here:
The type defines how the data is stored, and influence_conversation determines whether the slot's value is taken into account when the dialogue manager predicts the next action.

When an entity is extracted, RASA can map it to a corresponding slot. In RASA 3.x this is declared explicitly with a from_entity mapping on the slot.

Slots in JSON representation
Once filled, slots are stored internally as part of the conversation state.

"slots": {
  "symptom": "fever",
  "duration": "three days"
}

Slots can store different kinds of information depending on the use case. The most common types are:

Text slots : store raw strings (e.g., symptoms, names)
Categorical slots : restrict values to a predefined set (e.g., mild / moderate / severe)
Boolean slots : true/false flags (e.g., emergency_present)
Float slots : numerical values such as age or dosage (RASA has no separate integer slot type)
List slots : store multiple values (e.g., multiple symptoms)
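To get a feel for what each slot type constrains, here is a toy Python sketch. It is not how RASA implements slots internally; the function store and its allowed parameter are hypothetical and exist only for this illustration.

```python
def store(slot_type: str, value, allowed=None):
    """Return the value as a slot of the given type would hold it (toy version)."""
    if slot_type == "text":
        return str(value)
    if slot_type == "bool":
        if not isinstance(value, bool):
            raise ValueError("bool slots hold True or False")
        return value
    if slot_type == "float":
        return float(value)
    if slot_type == "list":
        # Wrap a single value so the slot always holds a list.
        return value if isinstance(value, list) else [value]
    if slot_type == "categorical":
        if value not in (allowed or []):
            raise ValueError(f"{value!r} is not one of {allowed}")
        return value
    raise ValueError(f"unknown slot type: {slot_type}")


print(store("float", "37.5"))
print(store("list", "fever"))
print(store("categorical", "mild", allowed=["mild", "moderate", "severe"]))
```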

Now that the most basic definitions have been established, we'll look into how this behaviour is handled by the predefined pipeline.

The next blog: To be released