Try My ~3.5 Million Parameter Model Instead for ✨😉

😊 Introduction

I decided to build a machine learning model that generates emojis for my writing.

When I'm typing, I often feel a sentence needs a relevant emoji; it adds expressiveness, right?

Also, right now I pick emojis by searching the web (using sites like Emoji Finder, which match on keywords only).

So I decided to develop an LSTM model that automatically generates an emoji for each of my sentences.

πŸŽ₯ First Demo

Demo Video

πŸ§‘β€πŸ« Explanation

When the user types a paragraph, the model predicts an appropriate emoji for each sentence in it.

For example, if a paragraph has 5 sentences, the model generates 5 emojis.
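
Here's a minimal sketch of that flow; `split_sentences` and `predict_emoji` are hypothetical stand-ins for the project's actual inference code, which isn't shown in this post:

```python
# A minimal sketch of per-sentence emoji prediction.
# predict_emoji is a hypothetical stand-in for the trained model's
# inference function; it is not code from this project.
import re

def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break on ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def emojis_for_paragraph(paragraph: str, predict_emoji) -> list[str]:
    # One emoji per sentence, e.g. 5 sentences -> 5 emojis.
    return [predict_emoji(s) for s in split_sentences(paragraph)]
```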

βš™οΈ Technical Details

The model currently covers 20 emojis in total: ❤, 😍, 😂, 💕, 🔥, 😊, 😎, ✨, 💙, 😘, 📷, 🇺🇸, ☀, 💜, 😉, 💯, 😁, 🎄, 📸, 😜

I used a dataset from Hugging Face and trained on Google Colab.
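
The 20 labels above happen to match the `emoji` subset of the `tweet_eval` dataset on Hugging Face, so I'm assuming that's the dataset here (the post doesn't name it). Loading it would look like this:

```python
# Assumption: the dataset is Hugging Face's tweet_eval "emoji" subset,
# whose 20 classes match the emoji list above; the post does not name it.
from datasets import load_dataset

ds = load_dataset("tweet_eval", "emoji")
print(ds["train"][0])                                   # {'text': '...', 'label': <0-19>}
num_classes = ds["train"].features["label"].num_classes  # 20
```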

🎯 Model Selection

I decided to use a BiLSTM (bidirectional LSTM, i.e., bidirectional Long Short-Term Memory).

We could have used a plain unidirectional LSTM rather than a BiLSTM, since this is not a seq2seq task; we are not translating English to emojis, just classifying sentences.

But in my experiments the BiLSTM improved performance by at least 10%.

So I decided to use the BiLSTM.

Note:

We could also have used a BiLSTM with an attention mechanism, or even fine-tuned a pre-trained model like BERT.

But I decided to stick with the plain BiLSTM.

πŸ—οΈ Architecture

1. First, an embedding layer
2. Then a BiLSTM with 2 layers
3. Dropout for regularization
4. Finally, a linear layer with 20 output units, one per emoji
   - Note: the BiLSTM produces hidden states from both directions, so we must concatenate the two final hidden states before passing them to the linear layer (see the sketch below)
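
Here's a minimal PyTorch sketch of that stack. The framework, the dropout rate, and the exact layer arguments are my assumptions; the post doesn't include the actual code:

```python
# A minimal PyTorch sketch of the described architecture; the dropout rate
# and layer arguments are assumptions, not the author's published code.
import torch
import torch.nn as nn

class EmojiBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_layers=2, num_classes=20, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True,
                            dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        # Both directions' final hidden states get concatenated: 2 * hidden_dim
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                    # x: (batch, seq_len) token ids
        embedded = self.embedding(x)          # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)     # h_n: (num_layers * 2, batch, hidden_dim)
        # Last layer's forward (h_n[-2]) and backward (h_n[-1]) hidden states
        h = torch.cat((h_n[-2], h_n[-1]), dim=1)   # (batch, 2 * hidden_dim)
        return self.fc(self.dropout(h))       # (batch, 20) logits
```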

🎲 Hyperparameters

1. Maximum sequence length: 50
2. Embedding dimension: 128
3. Hidden dimension for the LSTM states: 128
4. Number of layers: 2, as stated before
5. Learning rate α: 0.0005
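
Wiring those numbers into the sketch above (the vocabulary size, optimizer, and loss function are my assumptions to make this runnable; the post only lists the five values above):

```python
# Hyperparameters as stated in the post; VOCAB_SIZE, Adam, and the loss
# are my assumptions, not from the post.
import torch
import torch.nn as nn

MAX_LEN = 50          # maximum sequence length
VOCAB_SIZE = 20_000   # hypothetical; depends on the tokenizer/vocabulary

model = EmojiBiLSTM(vocab_size=VOCAB_SIZE, embed_dim=128,
                    hidden_dim=128, num_layers=2, num_classes=20)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
criterion = nn.CrossEntropyLoss()
```

As a rough back-of-the-envelope check: with these dimensions the embedding table dominates the parameter count, so the ~3.5M total in the title would imply a vocabulary of roughly 20k+ tokens.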

🚨 Accuracy and Important Study

Regarding accuracy: it is not especially high even with the BiLSTM, and it is lower still with a unidirectional LSTM.

I kept tweaking things to improve performance, and only later realized the data has a class-imbalance problem.

About 21% of the data is labeled "❤", which dominates, followed by about 10% labeled "😍", and so on down to "😜", which is barely present.
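
A quick way to see the imbalance (assuming the `tweet_eval` split loaded earlier; this check is mine, not from the post):

```python
# Print the label distribution to expose the class imbalance.
from collections import Counter

labels = ds["train"]["label"]
counts = Counter(labels)
for label_id, n in counts.most_common():
    print(f"label {label_id}: {n / len(labels):.1%}")
```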

⚠️ Important action

πŸ₯‡ First Rule : Data in Training Set, Development Set and Test Set should follow the same Distribution

After that, we could augment the data synthetically here, or maybe try another architecture, and so on.

Yes, we can also increase accuracy with the same data, but for now I've saved that for later.
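
One example of such a same-data lever (my suggestion, not something the post implements) is weighting the loss by inverse class frequency, so rare emojis like 😜 aren't drowned out by ❤:

```python
# Class-weighted loss: down-weight frequent classes like "❤",
# up-weight rare ones like "😜". My suggestion, not the post's method.
import torch
import torch.nn as nn

counts = torch.bincount(torch.tensor(labels), minlength=20).float()
weights = counts.sum() / (20.0 * counts)   # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)
```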

πŸ“œ Little History

I first used a regular LSTM and saw that the model was overfitting.

I tried various hyperparameters (it was still overfitting); after a few more tweaks, switching to the BiLSTM showed a little improvement, so I continued with it, and it seems to work fine.

Originally posted: https://ajithraghavan.github.io/blog/emoji-generator/

Thanks for reading
