<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ankita Sahoo</title>
    <description>The latest articles on DEV Community by Ankita Sahoo (@ank1tas).</description>
    <link>https://dev.to/ank1tas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F727417%2F63f39f4b-61b9-420c-9632-6d9bcea2dc97.jpeg</url>
      <title>DEV Community: Ankita Sahoo</title>
      <link>https://dev.to/ank1tas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ank1tas"/>
    <language>en</language>
    <item>
      <title>Want to prevent Model Overfitting?</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Sat, 09 Mar 2024 18:32:47 +0000</pubDate>
      <link>https://dev.to/ank1tas/want-to-prevent-model-overfitting-2edb</link>
      <guid>https://dev.to/ank1tas/want-to-prevent-model-overfitting-2edb</guid>
      <description>&lt;p&gt;Five different techniques to prevent overfitting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Early Stopping: In this method, we track the loss on the validation set during the training phase and use it to determine when to stop training such that the model is accurate but not overfitting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Image Augmentation: Artificially boosting the number of images in our training set by applying random image transformations to the existing images in the training set.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dropout: Removing a random selection of a fixed number of neurons in a neural network during training.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increase dataset size: The more training data you feed, the less likely it is to overfit. The reason is that, as you add more data, the model cannot overfit all the samples, and is forced to generalize to make progress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Regularization: Regularization is a method to constrain the model so that it fits our data accurately without overfitting. It can also be thought of as penalizing unnecessary complexity in our model.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
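&lt;p&gt;As a rough illustration (not from the original post), patience-based early stopping can be sketched in a few lines of plain Python; the &lt;code&gt;early_stopping_epoch&lt;/code&gt; helper and the loss values below are hypothetical:&lt;/p&gt;

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch (index) whose weights we would keep: training stops
    once validation loss has not improved for `patience` epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # stop training; validation loss keeps rising (overfitting)
    return best_epoch

# Validation loss falls, then rises as the model starts to overfit:
losses = [0.9, 0.7, 0.6, 0.55, 0.58, 0.62, 0.70]
print(early_stopping_epoch(losses))  # best epoch is index 3
```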

&lt;p&gt;Did I miss anything?🤔 Let me know in the comments. Happy Learning.😊&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>datascience</category>
      <category>learning</category>
    </item>
    <item>
      <title>Important terminology that you need to know to learn CNN.</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Sun, 03 Mar 2024 12:01:37 +0000</pubDate>
      <link>https://dev.to/ank1tas/important-terminology-that-you-need-to-know-to-learn-cnn-3mn</link>
      <guid>https://dev.to/ank1tas/important-terminology-that-you-need-to-know-to-learn-cnn-3mn</guid>
      <description>&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CNN&lt;/strong&gt;: Convolutional neural network. That is a network that has at least one convolutional layer. A typical CNN also includes other types of layers, such as pooling and dense layers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Convolution&lt;/strong&gt;: The process of applying a kernel (filter) to an image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kernel/filter&lt;/strong&gt;: A matrix that is smaller than the input, used to transform the input in chunks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Padding&lt;/strong&gt;: Adding pixels of some value, usually 0, around the input image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pooling&lt;/strong&gt;:  The process of reducing the size of an image through downsampling. There are several types of pooling layers. For example, average pooling converts many values into one by taking the average. However, max-pooling is the most common.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Max pooling&lt;/strong&gt;: A process in which many values are converted into a single value by taking the maximum value from among them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stride&lt;/strong&gt;: The number of pixels by which the kernel (filter) slides across the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Downsampling&lt;/strong&gt;: The act of reducing the size of an image.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
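&lt;p&gt;The convolution, padding, stride, and pooling terms above can be sketched with plain NumPy (a minimal illustration, not from the original post; the function names are made up):&lt;/p&gt;

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    # Pad the input with zeros, then slide the kernel across it by `stride` pixels.
    img = np.pad(image, pad)
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()  # elementwise product, then sum
    return out

def max_pool(image, size=2):
    # Downsample by keeping the maximum of each size x size window.
    h, w = image.shape[0] // size, image.shape[1] // size
    return image[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
k = np.ones((2, 2))                       # a trivial 2x2 kernel
feat = conv2d(img, k, stride=1, pad=0)    # shape (3, 3); feat[0, 0] == 10.0
pooled = max_pool(img, size=2)            # shape (2, 2)
```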

&lt;p&gt;Let me know your feedback in the comments🙂. &lt;br&gt;
Happy Learning.😊&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>beginners</category>
      <category>cnn</category>
    </item>
    <item>
      <title>NLP Application (Real-world implementation of Transformer model)</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Sat, 30 Sep 2023 16:57:40 +0000</pubDate>
      <link>https://dev.to/ank1tas/nlp-application-real-world-implementation-of-transformer-model-2g86</link>
      <guid>https://dev.to/ank1tas/nlp-application-real-world-implementation-of-transformer-model-2g86</guid>
<description>&lt;p&gt;Natural language processing (NLP) is a field of artificial intelligence that deals with the interaction between computers and humans using natural language. The importance of NLP lies in its ability to transform how humans and computers interact, enabling more intuitive and human-like communication between them. This has numerous practical applications in information retrieval, sentiment analysis, machine translation, and question-answering, among others. NLP has the potential to revolutionize many industries, such as healthcare, education, and customer service, by enabling more effective and efficient communication and information management. As such, NLP has become an important area of research and development, with significant investment being made in its advancement.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;The Transformer&lt;/u&gt;:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/1706.03762"&gt;&lt;strong&gt;Attention Is All You Need&lt;/strong&gt;&lt;/a&gt; is the landmark paper that introduced the Transformer architecture to NLP. Its fundamental unit, the transformer block, consists of two main components: &lt;strong&gt;a multi-head self-attention mechanism and a fully connected feedforward network&lt;/strong&gt;. The multi-head self-attention mechanism allows the model to focus on different parts of the input sequence at each layer and to weigh the importance of each part when making a prediction. This is accomplished by computing &lt;strong&gt;attention scores&lt;/strong&gt; between each element in the input sequence and all other elements, which are then used to weigh the contribution of each token to the final representation. Using multiple attention heads lets the model learn different attention patterns for different tasks and input sequences, making it more versatile and effective.&lt;/p&gt;
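&lt;p&gt;A minimal NumPy sketch of the attention-score computation described above (single-head scaled dot-product attention, not the full multi-head implementation; all names and dimensions are illustrative):&lt;/p&gt;

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention scores: similarity of each query with every key,
    # scaled by sqrt(d_k) to keep the softmax well-behaved.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                     # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 tokens, d_k = 4
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
# Each row of w sums to 1: every token distributes its attention over all tokens.
```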

&lt;p&gt;The feedforward network is essentially a multi-layer perceptron (MLP) that takes the self-attention-generated representation as input, applies linear transformations with activation functions, and outputs the final representation. This final representation is then passed to the next Transformer block or used for making predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  BERT:
&lt;/h3&gt;

&lt;p&gt;One of the most significant later developments built on the Transformer was the introduction of &lt;strong&gt;BERT (Bidirectional Encoder Representations from Transformers)&lt;/strong&gt;. BERT is a pre-trained transformer model that can be fine-tuned for a wide range of NLP tasks, such as &lt;strong&gt;sentiment analysis, named entity recognition, and question answering&lt;/strong&gt;. BERT was trained with a masked language modeling objective: it was asked to predict missing (masked) tokens in a sentence given the context of the surrounding tokens. This approach allowed BERT to learn rich contextual representations of words, making it highly effective for a wide range of NLP tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT:
&lt;/h3&gt;

&lt;p&gt;Another development in the field of transformers was the introduction of &lt;strong&gt;GPT (Generative Pretrained Transformer)&lt;/strong&gt;. GPT is a generative model trained on a large corpus of text with the goal of &lt;strong&gt;predicting the next token in a sequence&lt;/strong&gt; given the context of the preceding tokens. GPT has been shown to be highly effective for tasks such as text &lt;strong&gt;generation, language modeling, and question-answering&lt;/strong&gt;. Unlike BERT, which was trained to fill in masked tokens, GPT was trained on a purely generative task, allowing it to learn a more diverse and complete representation of language.&lt;/p&gt;

&lt;p&gt;Transformers have not only revolutionized the field of NLP; they are growing beyond it and finding applications in other areas. For example, transformers have been used in &lt;strong&gt;computer vision tasks such as image captioning&lt;/strong&gt;, generating captions for images based on their content, and in &lt;strong&gt;speech recognition&lt;/strong&gt;, transcribing speech into text.&lt;/p&gt;

&lt;p&gt;Another trend in the use of transformers is the development of &lt;strong&gt;multimodal models&lt;/strong&gt;, which allow for the unified modeling of text along with other modalities, such as images and audio. These models can learn the relationships between different modalities and use this understanding to perform a wide range of tasks, such as image-to-text generation, text-to-image generation, and audio-to-text generation. Indeed, transformers are growing beyond the field of NLP and are being used in a wide range of tasks and applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  Categorization:
&lt;/h4&gt;

&lt;p&gt;The kind of transformer architecture used in NLP applications plays a crucial role in determining the overall performance of the system. &lt;/p&gt;

&lt;p&gt;Encoder–decoder based:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encoder-only&lt;/strong&gt; transformers are used for discriminative tasks such as sentiment analysis and named entity recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decoder-only&lt;/strong&gt; transformers are used for tasks such as text generation and summarization. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encoder–decoder&lt;/strong&gt; transformers are used for tasks such as machine translation and image captioning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modality based:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modality&lt;/strong&gt; refers to the different modes or types of data and information that can be processed and generated beyond text.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unimodal NLP applications&lt;/strong&gt; deal with a single modality, such as text or speech&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal NLP applications&lt;/strong&gt; deal with multiple modalities, such as text, speech, and images. Text often serves as the primary interface in multimodal applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-world Applications:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Unimodal Applications:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Unimodal applications refer to AI-based systems that primarily focus on processing and analyzing text as their main modality.&lt;/p&gt;

&lt;h4&gt;
  
  
  i: &lt;em&gt;Language Modeling&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Language modeling is a fundamental task in NLP that involves predicting the next word in a sequence of text based on the preceding words.&lt;br&gt;
Language modeling typically follows the decoder-only architecture.&lt;br&gt;
The goal of language modeling is to estimate the probability distribution of sequences of words in a given language and is used as a building block for many NLP tasks such as machine translation, speech recognition, and text generation. Language modeling can be easily extended to more complex NLP tasks such as sentence-pair modeling, cross-document language modeling, and definition modeling.&lt;/p&gt;
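&lt;p&gt;As a toy illustration of next-word prediction (a bigram count model, far simpler than any transformer; the helper names and the corpus are made up):&lt;/p&gt;

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count how often each word follows each other word.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, word):
    # The most frequent successor is the most probable next token.
    return counts[word].most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat sat down", "the dog ran"]
model = train_bigram(corpus)
print(predict_next(model, "cat"))  # "sat"
```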

&lt;h4&gt;
  
  
  ii: &lt;em&gt;Question Answering&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Question Answering is an NLP application that involves automatically answering questions posed in natural language. The goal of question answering is to extract the relevant information from a given text corpus and present it as an answer to a user’s question.&lt;br&gt;
Question-answering systems can operate over a wide range of text types, including news articles, Wikipedia pages, and others, and can be designed to answer a wide range of questions, including fact-based questions, opinion questions, and others. &lt;/p&gt;

&lt;p&gt;There are several subtasks within QA, as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-Domain Question Answering (ODQA)&lt;/strong&gt;: This task involves finding an answer to a question from an open domain, such as the entire internet or a large corpus of text. The goal is to find the most relevant information to answer the question, even if it requires synthesizing information from multiple sources. &lt;strong&gt;Reformer&lt;/strong&gt; is a deep learning model for ODQA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conversational Question Answering (CQA)&lt;/strong&gt;: This task involves answering questions in a conversational setting, where the model must understand the context of the conversation and generate an answer that is relevant and appropriate for the current conversational context. &lt;strong&gt;SDNet&lt;/strong&gt; is a deep-learning model for conversational question answering (CQA)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Answer Selection&lt;/strong&gt;: This task involves ranking a set of candidate answers for a given question, where the goal is to select the most accurate answer from the candidate set.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Machine Reading Comprehension (MRC)&lt;/strong&gt;: This task involves understanding and answering questions about a given passage of text. The model must be able to comprehend the text, extract relevant information, and generate an answer that is accurate and relevant to the question. &lt;strong&gt;XLNet&lt;/strong&gt; is a deep-learning model for MRC.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  iii: &lt;em&gt;Machine Translation&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Machine translation (MT) is the task of automatically converting a source text in one language into a target text in another language. The goal is to produce a fluent and accurate translation that conveys the meaning of the source text in the target language. MT models often follow an encoder–decoder architecture, capturing context with a bidirectional encoder and generating text of arbitrary length, following the original formulation of the transformer architecture. There are several subtasks within MT:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transliteration&lt;/strong&gt;: It involves converting text from one script to another, such as between the Latin and Cyrillic scripts. It preserves the pronunciation of words rather than translating their meaning into another language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unsupervised Machine Translation (UMT)&lt;/strong&gt;: It involves translating between two languages without any parallel training data, meaning that there is no corresponding text in the target language for the source language text. UMT models are typically trained on monolingual data in each language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bilingual Lexicon Induction (BLI)&lt;/strong&gt;: It involves inducing a bilingual lexicon, automatically discovering word translation pairs or mappings between two languages without the need for explicit bilingual dictionaries or parallel corpora.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  iv: &lt;em&gt;Text classification&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Text classification is the task of categorizing a text into one or more predefined categories based on its content. The goal is to automatically assign a label to a given text, allowing it to be organized and categorized for easier analysis and management. These models are trained on annotated text data to learn the relationship between the text content and its label, and can then be used to classify new, unseen text. As a discriminative task, text classification typically follows an encoder-only architecture. Its subcategories are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document Classification&lt;/strong&gt;: This task involves assigning a label or category to a full document, such as a news article, blog post, or scientific paper. Document classification is typically accomplished by first representing the document as a numerical vector and then using a machine-learning model to make a prediction based on that representation. &lt;strong&gt;LinkBERT&lt;/strong&gt; extends the pre-training objective of BERT to incorporate links between documents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cause and Effect Classification&lt;/strong&gt;: This task involves identifying the cause and effect relationship between two events described in a sentence or paragraph.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  v: &lt;em&gt;Text Generation&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Text Generation is a task in NLP in which the objective is to produce new text automatically, typically starting from a given prompt or input. The output can be a single word, phrase, sentence, or full-length piece of text, and is used for chatbots, content creation, and more. The generated text should reflect an understanding of the input and the language being generated, and the quality and coherence of the generated text can vary depending on the approach used. Text generation typically follows a decoder-only architecture; however, recent issues with &lt;strong&gt;prompt-injection&lt;/strong&gt; attacks have shifted part of the focus towards encoder–decoder models that have been instruction-tuned, such as T5. Text generation subtasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dialogue Generation&lt;/strong&gt;: It focuses on generating text in the form of a conversation between two or more agents. Dialogue generation systems are used in various applications, such as chatbots, virtual assistants, and conversational AI systems. These systems use dialogue history, user input, and context to generate appropriate and coherent responses. P2-BOT is a transmitter–receiver-based framework that aims to explicitly model understanding in chat dialogue systems through mutual persona perception.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code Generation&lt;/strong&gt;: It focuses on generating code based on a given input, such as a natural language description of a software problem. Code generation systems are used in software development to automate repetitive tasks, improve productivity, and reduce errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data-to-Text Generation&lt;/strong&gt;: It focuses on generating natural language text from structured data such as tables, databases, or graphs. Data-to-text generation systems are used in various applications, such as news reporting, data visualization, and technical writing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  vi: &lt;em&gt;Text Summarization&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Text Summarization is a task in NLP where the goal is to condense a given text into a shorter, more concise version while preserving its essential information. This is typically accomplished by identifying and extracting the most important information, sentences, or phrases from the original text. Text summarization is used in a variety of applications, such as news aggregation and document summarization. It typically requires an encoder–decoder architecture to completely capture the source information.&lt;/p&gt;

&lt;p&gt;Its subcategories are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extractive Summarization&lt;/strong&gt;: It extracts the most important sentences or phrases from a document and presents them as a summary. Extractive summarization methods typically use a combination of information retrieval and natural language processing techniques to identify the most informative sentences or phrases in a document.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Abstractive Summarization&lt;/strong&gt;: It generates a summary by synthesizing new information based on the input document. &lt;br&gt;
These models are trained on large amounts of data and can generate summaries that are more concise and coherent than extractive summaries. &lt;strong&gt;mBart&lt;/strong&gt; is a sequence-to-sequence transformer trained on multiple large-scale monolingual corpora with the objective of denoising.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Document Summarization&lt;/strong&gt;: It summarizes multiple related documents into a single summary. Multi-document summarization methods typically use information retrieval techniques to identify the most important documents and natural language processing techniques to generate a summary from the selected documents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query-Focused Summarization&lt;/strong&gt;: It summarizes a document based on a specific query or topic. Query-focused summarization methods use information retrieval techniques to identify the most relevant sentences or phrases in a document and present them as a summary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sentence Compression&lt;/strong&gt;: It focuses on reducing the length of a sentence while preserving its meaning. Sentence compression methods typically use natural language processing techniques to identify redundant or unnecessary words or phrases in a sentence and remove them to create a more concise sentence. &lt;strong&gt;DistilRoBERTa&lt;/strong&gt; has been combined with reinforcement learning to train a binary classifier that keeps or discards words to reduce sentence length.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
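&lt;p&gt;A minimal sketch of the extractive approach described above, scoring sentences by word frequency (illustrative only, not from the original post; real systems are far more sophisticated):&lt;/p&gt;

```python
from collections import Counter

def extractive_summary(text, n_sentences=1):
    # Score each sentence by the total frequency of its words in the document,
    # then keep the top-scoring sentences in their original order.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w.lower() for s in sentences for w in s.split())
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w.lower()] for w in s.split()),
                    reverse=True)
    keep = set(scored[:n_sentences])
    return ". ".join(s for s in sentences if s in keep) + "."

doc = "Attention networks changed NLP. Attention is powerful. Cats sleep."
print(extractive_summary(doc, 1))  # "Attention networks changed NLP."
```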

&lt;h4&gt;
  
  
  vii: &lt;em&gt;Sentiment Analysis&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Sentiment Analysis is a task in NLP with the goal of determining the sentiment expressed in a given text. This is typically accomplished by assigning a sentiment label such as positive, negative, or neutral to the text based on its contents. &lt;br&gt;
The sentiment can be expressed in different forms, such as opinions, emotions, or evaluations, and can be expressed at various levels of granularity, such as at the document, sentence, or aspect level. Sentiment Analysis is used in a variety of applications, such as customer service, marketing, and opinion mining. &lt;br&gt;
The quality of the sentiment analysis results can be influenced by factors such as the subjectivity of the text, the tone, and the context in which the sentiment is expressed.&lt;/p&gt;

&lt;h4&gt;
  
  
  viii: &lt;em&gt;Named Entity Recognition&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Named Entity Recognition (NER) is a task in NLP with the goal of identifying and categorizing named entities in a given text into predefined categories such as person names, organizations, locations, dates, and more. NER is used as an intermediate step in various applications such as question-answering, event extraction, and information retrieval. It typically utilizes an encoder-only architecture. While fine-tuning a pre-trained model with a classification head added on top works well in practice for NER, Automated Concatenation of Embeddings (ACE) has shown improved results using an ensemble of several pre-trained models, training only a simple classifier on top with reinforcement learning.&lt;/p&gt;

&lt;h4&gt;
  
  
  ix: &lt;em&gt;Information Retrieval&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Information Retrieval (IR) is a task in NLP with the goal of retrieving relevant information from a large collection of documents in response to a user query. This is typically accomplished by matching the query terms against the document content and ranking the documents by their relevance to the query. IR systems can be used for various applications, such as web search, document search, and question answering. The quality of the retrieval results can be influenced by factors such as the relevance of the documents, the effectiveness of the ranking algorithm, and the representation of the documents and queries. IR systems are typically classified further by level of granularity, such as document, paragraph, or sentence. Typical retrieval methods use a pre-trained model such as &lt;strong&gt;RoBERTa&lt;/strong&gt; in a Siamese fashion to compute the similarity between two embeddings.&lt;/p&gt;
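&lt;p&gt;The Siamese-style retrieval mentioned above ultimately reduces to comparing embedding vectors; here is a tiny sketch with hand-made vectors (illustrative only, not real RoBERTa embeddings; the names are made up):&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity between two embedding vectors, as used to rank documents
    # against a query embedding in dense retrieval.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.0, 1.0])
docs = {"doc_a": np.array([1.0, 0.0, 1.0]),   # same direction as the query
        "doc_b": np.array([0.0, 1.0, 0.0])}   # orthogonal to the query
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```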

&lt;ol start="2"&gt;
&lt;li&gt;Multimodal Applications:
Multimodal applications are software or systems that process and integrate information from multiple modalities or types of data, such as text, images, speech, and more, to provide a richer and more comprehensive user experience. These applications leverage multiple sources of data to enhance understanding, enable interaction, and solve complex problems.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  i: &lt;em&gt;Generative Control&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Generative Control is a task in multimodal NLP in which text is used as an interface to generate another modality, such as images or speech. The goal is to generate a target modality that corresponds to a given text description or instruction. For example, given a textual description of an object, such as "a red sports car," the task of Generative Control would be to generate an image of a red sports car. Generative Control combines the strengths of NLP and computer graphics or speech synthesis to produce high-quality, semantically meaningful outputs in the target modality. It has applications in areas such as computer vision, robotics, and human–computer interaction.&lt;/p&gt;

&lt;h4&gt;
  
  
  ii: &lt;em&gt;Description Generation&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Description generation, a subset of natural language processing (NLP), involves automatically creating human-like text descriptions based on structured data, prompts, or other information. For example, given an image of a scene, Description Generation would produce a textual description of the objects, actions, and attributes present in the scene. It aims to generate coherent and contextually relevant text for applications such as product descriptions, data visualization, virtual assistants, and content generation. This process can be rule-based, template-based, or driven by machine learning models that capture complex linguistic patterns and context.&lt;/p&gt;

&lt;h4&gt;
  
  
  iii: &lt;em&gt;Multimodal Question Answering&lt;/em&gt;:
&lt;/h4&gt;

&lt;p&gt;Multimodal Question Answering (QA) is a task with the goal of answering questions about a given multimodal input, such as an image or a video, using information from multiple modalities. The task involves combining information from text, images, audio, and other modalities to accurately answer questions about the content of the input. For example, given an image of a scene and a question about the scene, such as “What is the color of the car?”, the task of Multimodal QA would be to identify the car in the image and answer the question with the correct color. Multimodal QA requires the integration of NLP, computer vision, and other relevant modalities. &lt;strong&gt;BEiT&lt;/strong&gt; performs masked modeling on images, texts, and image-text pairs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource: &lt;a href="https://www.mdpi.com/2078-2489/14/4/242"&gt;Transformers in the Real World: A Survey on NLP Applications&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Day-20 of Machine Learning</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Sat, 10 Dec 2022 16:42:14 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-20-of-machine-learning-4fa7</link>
      <guid>https://dev.to/ank1tas/day-20-of-machine-learning-4fa7</guid>
      <description>&lt;p&gt;Day-20 of Machine Learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learnt about &lt;strong&gt;cross-validation&lt;/strong&gt;:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cross-validation is a technique for evaluating a machine learning model and testing its performance.&lt;br&gt;
The cross-validation set is an extra data set used to check the validity, or accuracy, of the model.&lt;br&gt;
For example, suppose an ML model is trained over the training set and its accuracy is 98%. Wow!!! But we still can't say the model has trained very well, because it is performing well on data it already knows; if we expose the model to completely new, unseen data, it might not predict with the same accuracy and might fail to generalize. That is the case of &lt;strong&gt;overfitting&lt;/strong&gt; or &lt;strong&gt;high variance&lt;/strong&gt;. Similarly, if the model performs poorly on both the training set and the cross-validation set, it is &lt;strong&gt;underfitting / high bias&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwl6heyusp5vhx9qyyay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwl6heyusp5vhx9qyyay.png" alt="Image description" width="602" height="184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using the training set we fit the parameters [W, b], then choose the parameter d (degree of polynomial) using the cross-validation set. Since the test set has not been used to fit any of these parameters, Jtest will be a fair estimate of the &lt;strong&gt;generalization&lt;/strong&gt; error of the model.&lt;/p&gt;
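&lt;p&gt;The train/validation split idea above can be sketched as a simple k-fold index generator (a minimal illustration, not from the original post and not a library implementation; it assumes the sample count divides evenly by k):&lt;/p&gt;

```python
def k_fold_indices(n_samples, k):
    # Split indices into k folds; each fold serves once as the validation set
    # while the remaining folds form the training set.
    idx = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        val = idx[i*fold_size:(i+1)*fold_size]
        train = idx[:i*fold_size] + idx[(i+1)*fold_size:]
        folds.append((train, val))
    return folds

for train, val in k_fold_indices(6, 3):
    print(val)   # [0, 1] then [2, 3] then [4, 5]
```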

</description>
      <category>deepseek</category>
    </item>
    <item>
      <title>Day-19 of Machine Learning:</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Tue, 25 Oct 2022 18:09:18 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-19-of-machine-learning-1cfc</link>
      <guid>https://dev.to/ank1tas/day-19-of-machine-learning-1cfc</guid>
      <description>&lt;p&gt;Day-19 of Machine Learning:&lt;br&gt;
I. Basic template of TensorFlow implementation:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. construct the network
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;               
        &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,)),&lt;/span&gt;    &lt;span class="c1"&gt;#specify input size
&lt;/span&gt;
        &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
        &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
        &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  


    &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt;                            

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A Keras Sequential model built from Dense layers with sigmoid activations.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. loss function
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;losses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BinaryCrossentropy&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimizers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Adam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, for &lt;strong&gt;binary classification&lt;/strong&gt;, BinaryCrossentropy() is used. We can use MeanSquaredError() instead for linear regression.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. gradient descent to fit the weights of the model to the training data
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;II. Got to know about different activation functions:&lt;/p&gt;

&lt;h5&gt;
  
  
  - Linear Activation:
&lt;/h5&gt;

&lt;p&gt;Activation &lt;strong&gt;a = g(Z) = Z&lt;/strong&gt; &lt;br&gt;
          where Z = W.X + b &lt;br&gt;
&lt;strong&gt;Output y&lt;/strong&gt; can be any &lt;strong&gt;real number (positive or negative)&lt;/strong&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  - Sigmoid Activation:
&lt;/h5&gt;

&lt;p&gt;Activation &lt;strong&gt;a = g(Z) = 1 / (1 + e ^ (-Z))&lt;/strong&gt;. &lt;br&gt;
&lt;strong&gt;Output y&lt;/strong&gt; is &lt;strong&gt;0 or 1, i.e. binary classification&lt;/strong&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  - ReLU Activation (Rectified Linear Activation):
&lt;/h5&gt;

&lt;p&gt;Activation &lt;strong&gt;a = g(Z) = max (0, Z)&lt;/strong&gt;.&lt;br&gt;
&lt;strong&gt;Output y&lt;/strong&gt; can be any &lt;strong&gt;non-negative number&lt;/strong&gt;&lt;/p&gt;
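&lt;p&gt;The three activations above, written out in NumPy (a minimal sketch):&lt;/p&gt;

```python
import numpy as np

def linear(z):
    return z                         # a = g(Z) = Z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # a = g(Z) = 1 / (1 + e^(-Z))

def relu(z):
    return np.maximum(0.0, z)        # a = g(Z) = max(0, Z)

z = np.array([-2.0, 0.0, 3.0])
print(linear(z))    # unchanged: any real value passes through
print(sigmoid(z))   # squashed into the interval (0, 1)
print(relu(z))      # negative values clipped to 0
```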




&lt;p&gt;III. How to choose Activation?&lt;br&gt;
We can choose a different activation for each layer of a neural network, according to the requirements and goal of the network. However, some recommendations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A neural network with many layers but no activation function is not effective: a neural network with &lt;strong&gt;only linear activations&lt;/strong&gt; is equivalent to &lt;strong&gt;no activation function&lt;/strong&gt; at all.&lt;/li&gt;
&lt;li&gt;ReLU is used more often than sigmoid activation. Firstly, &lt;strong&gt;ReLU is a bit faster, as it does less computation (a max of 0 and Z)&lt;/strong&gt; than sigmoid, which involves an exponential and a division. Secondly, gradient descent slows down where the activation is flat, and ReLU is flat in only one region (negative Z) whereas sigmoid flattens out in two places (both tails).&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;ReLU&lt;/strong&gt; instead of Linear Activation in &lt;strong&gt;Hidden layers&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
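&lt;p&gt;The first recommendation can be verified directly: stacking Dense layers that all use linear activation collapses into a single linear map. A small NumPy sketch (shapes and values are illustrative):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))          # batch of 4 inputs with 3 features

# Two "Dense" layers with linear activation: a = g(Z) = Z = x W + b
W1, b1 = rng.standard_normal((3, 5)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((5, 2)), rng.standard_normal(2)
two_layers = (X @ W1 + b1) @ W2 + b2

# The same computation collapses into ONE linear layer:
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b

print(np.allclose(two_layers, one_layer))  # True: extra linear layers add nothing
```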

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>tensorflow</category>
      <category>neuralnetwork</category>
    </item>
    <item>
      <title>Day-18 of Machine Learning:</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Fri, 21 Oct 2022 20:10:53 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-18-of-machine-learning-2dd4</link>
      <guid>https://dev.to/ank1tas/day-18-of-machine-learning-2dd4</guid>
      <description>&lt;p&gt;Day-18 of Machine Learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Got a brief intuition on &lt;strong&gt;AGI (artificial general intelligence)&lt;/strong&gt; and realised that &lt;strong&gt;ANI (artificial narrow intelligence)&lt;/strong&gt; has made tremendous progress and delivers most of the value in &lt;strong&gt;AI&lt;/strong&gt; today as compared to AGI, for lots of valid reasons. &lt;/li&gt;
&lt;li&gt;Learnt how to implement &lt;strong&gt;Forward propagation&lt;/strong&gt; using &lt;strong&gt;NumPy&lt;/strong&gt; instead of TensorFlow and it gave me a basic idea about how &lt;strong&gt;dense function&lt;/strong&gt; is working under the hood in TensorFlow.&lt;/li&gt;
&lt;li&gt;Learnt that TensorFlow uses a &lt;strong&gt;matrix&lt;/strong&gt; instead of a vector/list to store layer parameters, for efficient computation: the &lt;strong&gt;weight parameters of the i-th&lt;/strong&gt; unit/neuron are arranged in the &lt;strong&gt;i-th column&lt;/strong&gt; of the matrix, so each &lt;strong&gt;row lists the values of one parameter across units/neurons&lt;/strong&gt;, and TensorFlow uses a &lt;strong&gt;1-D&lt;/strong&gt; representation to store the &lt;strong&gt;bias parameters&lt;/strong&gt;. Also, inside the dense function, the matrix multiplication and the sigmoid function are applied.&lt;/li&gt;
&lt;/ul&gt;
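&lt;p&gt;The last point can be sketched in NumPy (illustrative only, not TensorFlow's actual implementation): the weights of the i-th unit sit in column i of the matrix, the bias is a 1-D array, and the dense function is a matrix product followed by the sigmoid:&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    """Forward propagation through one fully connected layer.

    W[:, i] holds the weights of the i-th unit; b is a 1-D bias vector."""
    return sigmoid(a_in @ W + b)

a_in = np.array([0.5, -1.0])            # 2 input features
W = np.array([[1.0, -3.0, 5.0],         # column i -> weights of neuron i
              [2.0,  4.0, -6.0]])
b = np.array([-1.0, 1.0, 2.0])          # 1-D bias, one entry per unit
a_out = dense(a_in, W, b)
print(a_out.shape)   # (3,): one activation per unit
```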

&lt;p&gt;Also,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Practiced some basic DSA questions on LinkedList.&lt;/li&gt;
&lt;li&gt;Created a simplistic animation effect (gif) on python using &lt;a href="https://pillow.readthedocs.io/en/stable/"&gt;Pillow Library&lt;/a&gt;, inspired by CS50P &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>dsa</category>
      <category>tensorflow</category>
    </item>
    <item>
      <title>Day-17 of Machine Learning:</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Wed, 19 Oct 2022 18:36:10 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-17-of-machine-learning-4hmj</link>
      <guid>https://dev.to/ank1tas/day-17-of-machine-learning-4hmj</guid>
      <description>&lt;p&gt;Day-17 of Machine Learning:&lt;br&gt;
Got a brief intuition on &lt;strong&gt;TensorFlow&lt;/strong&gt; where I learnt about &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how a &lt;strong&gt;tensor&lt;/strong&gt; stores data as a matrix, about the &lt;strong&gt;dense&lt;/strong&gt; function, which is used to create fully connected &lt;strong&gt;layers&lt;/strong&gt;, and about the &lt;strong&gt;Sequential model&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;how &lt;strong&gt;neural networks handle complex decisions&lt;/strong&gt; by dividing the decisions between &lt;strong&gt;hidden layers&lt;/strong&gt;, and each layer again into &lt;strong&gt;multiple units or neurons&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Implemented a small neural network in TensorFlow: a binary classification model to recognise &lt;strong&gt;handwritten digits&lt;/strong&gt;, which works for only 1 and 0.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>tensorflow</category>
      <category>neuralnetwork</category>
    </item>
    <item>
      <title>Day-16 of Machine Learning:</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Tue, 18 Oct 2022 06:19:04 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-16-of-machine-learning-24i</link>
      <guid>https://dev.to/ank1tas/day-16-of-machine-learning-24i</guid>
      <description>&lt;p&gt;Day-16 of Machine Learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Started course 2 - &lt;strong&gt;Advanced Learning Algorithms&lt;/strong&gt;, where I got a brief intuition on &lt;strong&gt;neural networks, their layers, and neurons&lt;/strong&gt;, and a simple visualization of how things work in the hidden layer.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>neuralnetwork</category>
    </item>
    <item>
      <title>Day-15 of Machine Learning</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Thu, 29 Sep 2022 12:00:21 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-15-of-machine-learning-39e4</link>
      <guid>https://dev.to/ank1tas/day-15-of-machine-learning-39e4</guid>
      <description>&lt;p&gt;Day-15 of Machine Learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implemented the &lt;strong&gt;sigmoid function, cost function, and gradient descent for logistic regression&lt;/strong&gt;, plus a prediction function with a 0.5 threshold, to build a logistic regression model that predicts whether a student gets admitted into a university.&lt;/li&gt;
&lt;li&gt;Also implemented regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA), where I implemented the cost function and gradient descent for &lt;strong&gt;Regularized Logistic Regression&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
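&lt;p&gt;The prediction step with the 0.5 threshold can be sketched as follows (the weights and exam scores are made up for illustration):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    """Predict 1 (admitted) when the model's probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Toy exam scores; weights are hypothetical, for illustration only
X = np.array([[45.0, 85.0], [95.0, 90.0], [20.0, 30.0]])
w = np.array([0.2, 0.2])
b = -25.0
print(predict(X, w, b))  # [1 1 0]: first two students admitted
```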

&lt;p&gt;With this I completed &lt;strong&gt;Course-1: Supervised Machine Learning: Regression and Classification&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>coursera</category>
      <category>supervisedmachinelearning</category>
    </item>
    <item>
      <title>Day-14 of Machine Learning</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Sun, 25 Sep 2022 17:59:58 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-14-of-machine-learning-21bd</link>
      <guid>https://dev.to/ank1tas/day-14-of-machine-learning-21bd</guid>
      <description>&lt;p&gt;Day-14 of Machine Learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Learnt about &lt;strong&gt;Underfitting&lt;/strong&gt;, &lt;strong&gt;Generalization&lt;/strong&gt; and &lt;strong&gt;Overfitting&lt;/strong&gt;,&lt;br&gt;
and how to address overfitting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collect more data / training sets.&lt;/li&gt;
&lt;li&gt;Feature selection.&lt;/li&gt;
&lt;li&gt;Regularization.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Learnt how to apply &lt;strong&gt;regularization&lt;/strong&gt; to &lt;strong&gt;linear regression&lt;/strong&gt; and &lt;strong&gt;logistic regression&lt;/strong&gt;, which &lt;strong&gt;penalises or shrinks feature weights&lt;/strong&gt; and helps build a simpler model.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
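&lt;p&gt;A minimal sketch of the regularized cost for linear regression, where an L2 penalty on the weights (not the bias) does the shrinking (data and lambda values are illustrative):&lt;/p&gt;

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_):
    """Squared-error cost plus an L2 penalty (lambda/2m) * sum(w_j^2)."""
    m = X.shape[0]
    err = X @ w + b - y
    return (err @ err) / (2 * m) + (lambda_ / (2 * m)) * (w @ w)

X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([5.0, 4.0, 7.0])
w = np.array([1.5, 0.5])
b = 1.0
# For the same nonzero weights, a larger lambda means a larger cost,
# so gradient descent is pushed toward smaller weights
print(regularized_cost(X, y, w, b, 10.0) > regularized_cost(X, y, w, b, 0.0))  # True
```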

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Day-13 of Machine Learning</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Wed, 21 Sep 2022 05:30:58 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-13-of-machine-learning-14bd</link>
      <guid>https://dev.to/ank1tas/day-13-of-machine-learning-14bd</guid>
      <description>&lt;p&gt;Day-13 of Machine Learning:&lt;br&gt;
Learnt to calculate the &lt;strong&gt;gradient descent&lt;/strong&gt; updates for &lt;strong&gt;logistic regression&lt;/strong&gt; with a single-featured data set as well as a two-featured data set.&lt;/p&gt;
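&lt;p&gt;A minimal NumPy sketch of that gradient computation, vectorized so the same code works for one or more features (the toy data set is illustrative):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient(X, y, w, b):
    """dJ/dw and dJ/db for logistic regression, averaged over m examples."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y          # f(x) - y for every example
    return (X.T @ err) / m, np.mean(err)

# Two-featured toy data set with m = 3 examples
X = np.array([[0.5, 1.5], [1.0, 1.0], [2.0, 0.5]])
y = np.array([0.0, 0.0, 1.0])
dj_dw, dj_db = compute_gradient(X, y, w=np.zeros(2), b=0.0)
print(dj_dw.shape)   # (2,): one partial derivative per feature
```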

</description>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Day-12 of Machine Learning</title>
      <dc:creator>Ankita Sahoo</dc:creator>
      <pubDate>Tue, 20 Sep 2022 07:05:15 +0000</pubDate>
      <link>https://dev.to/ank1tas/day-12-of-machine-learning-3jp6</link>
      <guid>https://dev.to/ank1tas/day-12-of-machine-learning-3jp6</guid>
      <description>&lt;p&gt;Day-12 of Machine Learning:&lt;br&gt;
Learnt about &lt;strong&gt;loss function&lt;/strong&gt; and &lt;strong&gt;cost function&lt;/strong&gt; of &lt;strong&gt;logistic regression&lt;/strong&gt;.&lt;/p&gt;
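&lt;p&gt;A minimal sketch: the loss for one example is -y log(f) - (1-y) log(1-f), and the cost averages the loss over the training set (toy data for illustration):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    """Mean binary cross-entropy loss over all m training examples."""
    f = sigmoid(X @ w + b)                          # model output in (0, 1)
    loss = -y * np.log(f) - (1 - y) * np.log(1 - f) # per-example loss
    return np.mean(loss)

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0])
# An uninformative model (w = 0, b = 0) predicts f = 0.5 everywhere,
# giving a cost of log(2) for every example
print(cost(X, y, w=np.array([0.0]), b=0.0))  # 0.6931... = log(2)
```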

</description>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
