DEV Community

Bamimore-Tomi

Building a Natural Language Processing API with FastAPI.

Image source: https://fastapi.tiangolo.com/

Brief History of the Framework ...

FastAPI was built by Sebastián Ramírez. It draws inspiration from Flask, a micro-framework, so it is easy to mix and match other tools and libraries with FastAPI. Specific information about the architecture and design of FastAPI can be found on FastAPI's documentation website.

Back to Business ...

We will be building an API with some Natural Language Processing (NLP) features. The structure of the project looks like this:
File structure for the project
The content of each file will be explained in detail.

To get started, run the following commands in a terminal:

  • pip install virtualenv, virtualenv <folder-name>, cd <folder-name>\scripts, activate (the path shown is the Windows layout). This set of commands is optional but recommended. Navigate back to the main <folder-name> directory before running the remaining commands.
  • git clone https://github.com/Bamimore-Tomi/inteligencia.git
  • pip install -r requirements.txt. This command will install fastapi, uvicorn, nltk, spacy.
  • python -c "import nltk;nltk.download('wordnet')". This command downloads WordNet, a lexical database of English words needed for the NLP features we will be implementing. The other NLP dependencies will already have been installed by the pip install -r requirements.txt command.

The main.py file is where the main FastAPI application lives. We will be spending more time there. I won't dwell on how the NLP functionalities are implemented; this will be treated in subsequent posts.

🚀🚀🚀🚀🚀
We start with a couple of imports in the main.py file

#main.py file
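from typing import Optional  # used by the TextToSpeech model
from fastapi import FastAPI
from fastapi.responses import FileResponse  # used by the tospeech endpoint
from pydantic import BaseModel
from utils import *  # NLP helpers such as similarity_ and tokenize_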

FastAPI is the class used to create the main application object. FastAPI uses the pydantic module for type validation on the inputs and outputs of the API. This strict validation reduces the chances of bugs 🐞 and generally helps to make your API secure 🔐🔐.

#main.py file
tags_metadata=[
    {
        'name':'similarity',
        'description':'Finds the similarity between 2 sentences using their word vectors.'
    },
    {
        'name':'tokenize',
        'description':'Takes in words, sentences, etc. and returns lexical information about each word, e.g. nouns, abstract nouns, coordinating conjunctions.'
    },
    {
        'name':'synonyms',
        'description':'Takes in a word or a group of words separated by commas and returns a list of English language synonyms for the words.'
    },
    {
        'name':'antonyms',
        'description':'Takes in a word or a group of words separated by commas and returns a list of English language antonyms for the words.'
    },
    {
        'name':'tospeech',
        'description':'Takes in a string and returns an audio file of the text.'
    }
]

The main objective of this variable is to document what the different URL paths in this FastAPI application do (we will see these paths later on). It is not necessary and can be omitted, although it is nice to have when building an API for public consumption. A major perk of FastAPI is how it supports documentation out of the box: the interactive docs are served at /docs by default.

#main.py file
app = FastAPI(title='Tageit',
              description='This is a hobby project for people interested in using NLP. Email tomibami2020@gmail.com for new functionality you want to be added.',
              openapi_tags=tags_metadata)

This is where the FastAPI application is initialized. Some extra keyword arguments are passed in for documentation. Again, this is nice to have. The application can simply be initialized with app = FastAPI().

Next, some pydantic models that will be used for type validation later on in the file are declared.

#main.py file
class SimilarityIn(BaseModel):
    text_1 : str
    text_2 : str
class SimilarityOut(BaseModel):
    score : float 
class TokenizeIn(BaseModel):
    text : str
class SynonymIn(BaseModel):
    text : str
class AntonymsIn(BaseModel):
    text : str
class TextToSpeech(BaseModel):
    text : str
    language : Optional[str] = 'en'

When working on a large and complex project, it is bad practice to declare your type validation models right in the main application file. This happens to be a small project 😄😄, so it doesn't hurt to declare the models in the same file. There are many ways to declare your models; the method used here is the one recommended by FastAPI. Inheriting from BaseModel, a pydantic class, is what makes your type validation classes work the way FastAPI needs them to. The reason behind each type will be explained as we continue, because the types become more intuitive once we see the functionality we want to achieve, the inputs needed, and the required output(s). It will all become clear 💡💡💡.

The API endpoints ...

home :
#main.py file
@app.get('/')
def home():
    return 'Welcome here' 

This is not so important, but mon ami, you have successfully created your first FastAPI application, which you can now run with uvicorn main:app --reload. This command starts a uvicorn server on your localhost. The pattern of the command is uvicorn <file where your main FastAPI application is, without the .py extension>:<name of the FastAPI application variable in the file>. The --reload flag makes the server restart when you make changes to the application file, which is good for development. Uvicorn serves on port 8000 by default, so you can turn off all your Django applications 😄😄 because a new sheriff is in town 👮🏻👮🏻👮🏻. If this goes well, you should see something like:

$ uvicorn main:app --reload

INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [1720]
INFO: Started server process [6822]
INFO: Waiting for application startup.
INFO: Application startup complete.

similarity: This endpoint receives a POST request containing two texts and returns the cosine similarity between them.
#main.py file
@app.post('/similarity' , response_model=SimilarityOut,tags=['similarity'])
def similarity(text : SimilarityIn):
    score = similarity_(text.text_1, text.text_2)
    return {'score':score}

This response_model=SimilarityOut argument tells FastAPI what the route is expected to return. The class SimilarityOut was declared earlier on as

#main.py file
class SimilarityOut(BaseModel):
    score : float

You will notice that the return value of the function reads return {'score':score}. Take a close look at score : float in the class above and the score key in the return value of the function. To put it simply, I have told FastAPI that this route will return a dictionary whose only key is score, and that the value of score will be a float. You will also notice that the similarity function takes an argument text : SimilarityIn. The SimilarityIn class was defined earlier as

#main.py file
class SimilarityIn(BaseModel):
    text_1 : str
    text_2 : str 

This tells FastAPI that I expect the body of the POST request to have two keys, text_1 and text_2, and that both of them must be of type str.
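To see that validation on its own, here is a minimal sketch that exercises the SimilarityIn model directly with pydantic, outside of any running server. The sample sentences are made up for illustration. When the same failure happens inside a route, FastAPI turns it into a 422 response for the client.

```python
from pydantic import BaseModel, ValidationError


class SimilarityIn(BaseModel):
    text_1: str
    text_2: str


# A complete payload is accepted and its keys become attributes.
ok = SimilarityIn(text_1="I like tea", text_2="I enjoy tea")
print(ok.text_1, "|", ok.text_2)

# A payload that is missing text_2 is rejected before any handler code runs.
try:
    SimilarityIn(text_1="I like tea")
    missing_fields = []
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    missing_fields = [err["loc"][0] for err in exc.errors()]

print(missing_fields)
```

This is why the route function can use text.text_1 and text.text_2 without checking that they exist first.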

Phew 😪😪😪, that was quite verbose but you should get the whole FastAPI input and output validation now. This may look stressful and unnecessary; amigo, I promise you'll begin to see the importance as you begin to build larger APIs.
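As an aside on the score itself: the similarity_ helper lives in utils and is not shown in this post, so the following is only a sketch of what a cosine similarity between two vectors looks like in plain Python. The function name and the toy vectors are assumptions for illustration; real word vectors have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the vectors of two texts.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 and unrelated (orthogonal) vectors score 0, which is why the endpoint's response model declares score as a float.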

tokenize:

This API endpoint returns lexical information about a text posted to the URL path.

#main.py file
@app.post('/tokenize', response_model=dict,tags=['tokenize'])
def tokenize(text : TokenizeIn):
    tokens = tokenize_(text.text)
    return tokens

You will notice that the response_model for this endpoint is dict. This means that the endpoint just returns a Python dict, unlike the previous endpoint, which specifies the content of the response. This is quite flexible and is useful when you know the type but not the exact content of your response. It also shows that your response model can be a built-in Python data type like int, float, or str. On the line tokens = tokenize_(text.text), the text in the POST request is accessed through text.text. This works because of the text : TokenizeIn annotation on the tokenize function: FastAPI reads the JSON body of the request, validates it, and passes it in as a TokenizeIn instance.
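To try an endpoint like this from Python, a client has to send the text as a JSON body. The sketch below builds such a request with the standard library only. BASE_URL assumes the server is running locally on uvicorn's default address, and build_post_request and post_json are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

# Assumption: the API is being served locally by uvicorn on its default port.
BASE_URL = "http://127.0.0.1:8000"


def build_post_request(path, payload):
    """Build a POST request whose body is the payload encoded as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_post_request(path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The key "text" matches the single field of the TokenizeIn model.
request = build_post_request("/tokenize", {"text": "FastAPI is fast"})
print(request.get_method(), request.full_url)
```

With the server running, post_json('/tokenize', {'text': 'FastAPI is fast'}) would return whatever dict the endpoint produces.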

I will not walk through the next two endpoints declared in the file in detail.

#main.py file
@app.post('/synonyms', response_model=dict, tags=['synonyms'])
def synonyms(text : SynonymIn ):
    words = text.text.replace(' ','').split(',')
    response = {}
    for i in words:
        syns = synonyms_(i.strip())
        response[i]=syns

    return response

@app.post('/antonyms', response_model=dict, tags=['antonyms'])
def antonyms(text : AntonymsIn ):
    words = text.text.replace(' ','').split(',')
    response = {}
    for i in words:
        syns = antonyms_(i.strip())
        response[i]=syns       
    return response

They all follow the same logic as the previous endpoints that have been explained already. The last endpoint has some peculiarities that need explanation.
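Before that, one detail of the synonyms and antonyms handlers is worth trying in isolation: how the comma-separated input is split into words. The first function below mirrors the expression used in the handlers; the second is a hypothetical alternative, not taken from the project, that keeps spaces inside phrases and drops empty entries.

```python
def split_as_in_handler(text):
    # Same expression as in the handlers: remove every space, then split on commas.
    return text.replace(' ', '').split(',')


def split_keep_phrases(text):
    # Hypothetical variant: trim each piece and skip pieces that are empty.
    return [part.strip() for part in text.split(',') if part.strip()]


sample = "happy, ice cream,,sad"
print(split_as_in_handler(sample))  # ['happy', 'icecream', '', 'sad']
print(split_keep_phrases(sample))   # ['happy', 'ice cream', 'sad']
```

Because the handler version removes all spaces first, the later i.strip() call has nothing left to trim, and a stray double comma produces an empty key in the response.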

tospeech:

This endpoint receives a text from the POST request and returns an .mp3 file for the text.

#main.py file
@app.post('/tospeech' ,tags=['tospeech'])
def text_to_speech(text : TextToSpeech ):
    language = text.language
    if len(language)>2:
        language=language[:2].lower()
    elif len(language)<2:
        language='en'
    audio_object = text_to_speech_(text.text,language=language)
    audio_object.save('aud.mp3')
    return FileResponse('aud.mp3')

This is where I make use of the FileResponse class imported at the beginning of the file. This class abstracts all the dirty work needed to return a file. The text_to_speech function takes one argument, text : TextToSpeech. The TextToSpeech class was also declared earlier in the file. One piece of syntax in the class worth noting is language : Optional[str] = 'en'. Optional[str] means the value may be a string or None, and the default of 'en' is what allows the language key to be left out of the request. If it is not present in the POST request, it defaults to the string 'en'. You will also notice that response_model is not specified; FileResponse handles the response for us behind the scenes.
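The language handling at the top of the handler can be pulled out and tried on its own. The sketch below mirrors those if/elif branches in a standalone function; normalize_language is a hypothetical name used only for this illustration.

```python
def normalize_language(language="en"):
    """Mirror of the length checks in the tospeech handler."""
    if len(language) > 2:
        # Longer values are cut down to their first two letters, lowercased.
        return language[:2].lower()
    if len(language) < 2:
        # Empty or one-letter values fall back to English.
        return "en"
    # Two-letter values pass through unchanged, including their case.
    return language


for value in ["English", "fr", "FR", "", "German"]:
    print(repr(value), "->", normalize_language(value))
```

Truncating a language name does not always give its two-letter code: 'German' becomes 'ge' rather than 'de', so clients are better off sending the two-letter code directly.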

Conclusion

We have seen how FastAPI works with documentation, type validation, and file responses. The deployed application can be viewed on Heroku, and the source code is in the GitHub repository cloned earlier. We have also seen how easy it is to get things up and running with FastAPI. I hope you enjoy this terrific FastAPI journey you are about to go on.
