DEV Community

Eloy Gonzalez


How Does an AI Model Work?

#ai

Stage 1: Detokenization

What it is: The first layer of the model acts as an extension of tokenization, an additional step in converting words into numerical representations the model can work with.
How it works: In models like Pythia, this initial layer operates solely on the current token (the piece of input the model is processing at that position).
Importance: This stage takes individual tokens and groups them into more meaningful entities, such as a full name. Removing the first layer confuses the rest of the model, because it loses this immediate context.
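To make "operates solely on the current token" concrete, here is a minimal sketch in NumPy. It is a toy, not Pythia's actual code: the sizes, the random weights, and the function name first_layer are all assumptions made for illustration. The layer embeds each token and applies a per-position MLP, so the output at a position does not change when the neighbouring tokens change.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE, D_MODEL, D_HIDDEN = 10, 4, 8  # toy sizes, chosen only for illustration

# Hypothetical parameters: an embedding table and one per-token MLP.
embedding = rng.normal(size=(VOCAB_SIZE, D_MODEL))
w_in = rng.normal(size=(D_MODEL, D_HIDDEN))
w_out = rng.normal(size=(D_HIDDEN, D_MODEL))


def first_layer(token_ids):
    """Embed each token, then apply a position-wise MLP with a residual connection.

    Nothing is shared between positions, so row i of the result
    depends only on token_ids[i].
    """
    x = embedding[np.asarray(token_ids)]   # shape: (seq_len, D_MODEL)
    hidden = np.maximum(x @ w_in, 0.0)     # ReLU non-linearity
    return x + hidden @ w_out


# Two sequences that share only their middle token (id 1).
a = first_layer([3, 1, 4])
b = first_layer([7, 1, 2])

# The middle row is the same in both, because the neighbours are never consulted.
same_middle = bool(np.allclose(a[1], b[1]))
print(same_middle)
```

Grouping several tokens into one entity, such as a full name, needs information to move between positions, which this toy layer deliberately leaves out.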

Stage 2: Feature Engineering

What it is: This stage builds useful features from the token representations produced so far.
How it works: The early and middle layers of the model recall facts and improve the identification of spatial and temporal features.
Importance: Features become more complex from layer to layer, progressing from understanding sentence structure to capturing its deeper meaning.

Stage 3: Prediction Ensemble

What it is: This stage converts semantic features into concrete predictions.
How it works: Past the midpoint of the model, a form of "voting" among multiple sub-networks determines the next token (word) to predict.
Importance: Prediction neurons increase the probability of certain tokens, while suppression neurons decrease the probability of others, making the final prediction more accurate.
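The effect of prediction and suppression neurons can be sketched with a few lines of NumPy. This is a toy under stated assumptions: the three-word vocabulary, the starting scores, and the two neuron contributions are invented for illustration and do not come from any real model. Each neuron adds its own "vote" to the token scores (logits), and softmax turns the summed scores into probabilities.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


vocab = ["cat", "dog", "car"]              # hypothetical three-token vocabulary
base_logits = np.array([1.0, 1.0, 1.0])    # no preference before the votes

# Hypothetical contributions that each neuron writes into the logits.
prediction_neuron = np.array([2.0, 0.0, 0.0])     # votes for "cat"
suppression_neuron = np.array([0.0, 0.0, -2.0])   # votes against "car"

before = softmax(base_logits)
after = softmax(base_logits + prediction_neuron + suppression_neuron)

for token, p_before, p_after in zip(vocab, before, after):
    print(f"{token}: {p_before:.3f} -> {p_after:.3f}")
```

Because probabilities must sum to 1, "dog" also loses some probability even though no neuron targets it directly.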

Stage 4: Residual Refinement

What it is: The final stage fine-tunes and refines the predictions.
How it works: The last layers of the model contain many suppression neurons, which eliminate unnecessary features and adjust the confidence of the final prediction.
Importance: These neurons refine the prediction distribution so that it is neither overconfident nor too uncertain. The ordering of the layers at the beginning and end of the model is crucial, because changing them can alter the output significantly.
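As a rough illustration of confidence adjustment, the sketch below lowers the score of an overconfident top token and measures the result with entropy (higher entropy means a less certain distribution). The logits and the size of the suppression are made-up numbers, and a real model spreads this effect over many neurons rather than a single subtraction.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def entropy(probs):
    """Shannon entropy in nats; larger values mean a less certain prediction."""
    return float(-np.sum(probs * np.log(probs)))


# Hypothetical logits in which the first token is heavily favoured.
logits = np.array([6.0, 1.0, 0.0])

# Hypothetical suppression: pull down the top token's score without changing the ranking.
suppression = np.array([-2.0, 0.0, 0.0])

before = softmax(logits)
after = softmax(logits + suppression)

print("top probability:", round(float(before.max()), 3), "->", round(float(after.max()), 3))
print("entropy:", round(entropy(before), 3), "->", round(entropy(after), 3))
```

The predicted token stays the same; only the confidence attached to it changes.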
