Generally writing this for my own benefit as I'm diving into Natural Language Processing (NLP) for a current project. In the era of AI, folks throw around the term "model" and my mind (even as a certified math person™) replaces that with <vague mathy, computer-sciencey magic thingamajig>.
But I wanted to understand it a little more and did a little digging. My current understanding can be narrowed down to:
- a set of training data (examples for the model to learn from)
- a set of features (things about the data - like "is capitalized")
- a set of weights (numbers between 0 and 1) for each feature
- a loop where the program makes a guess, changes the weights, and tries again - millions of times until it gets it right enough that it's worthwhile to keep around
Concretely, if you were implementing NLP you might have categories that label a word as a person, an organization, or a location.
So you'd get some basic features like the ones below:
```js
word_features = {
  "is_capitalized": true,
  "previous_word": "new",
  "next_word": "announced",
  "is_followed_by_Inc": true,
}
```
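Where do those features come from? Some code just looks at the word and its neighbors and writes down what it sees. Here's a rough sketch of what that could look like (the function name and the exact checks are things I made up for illustration; real NLP tooling uses many more, and much cleverer, features):

```js
// Sketch only: a made-up feature extractor for the word at position i.
function extractWordFeatures(words, i) {
  const next = words[i + 1];
  return {
    is_capitalized: /^[A-Z]/.test(words[i]),
    previous_word: i > 0 ? words[i - 1].toLowerCase() : "<start>",
    next_word: next ? next.toLowerCase() : "<end>",
    is_followed_by_Inc: next === "Inc" || next === "Inc.",
  };
}

// e.g. extractWordFeatures(["Acme", "Inc.", "just", "announced", "layoffs"], 0)
// -> { is_capitalized: true, previous_word: "<start>",
//      next_word: "inc.", is_followed_by_Inc: true }
```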
And you might start off with random weights and then, through the loops (this is the "training" part of creating a model), they'd eventually settle into something like this:
```js
weights = {
  "is_capitalized": {
    "ORG": 0.8,    // High, most organizations are capitalized
    "PERSON": 0.7, // ...same for person names
    "LOC": 0.6     // Somewhat high for locations, since some are capitalized and some aren't ("school" vs. "Fred Meyer")
  },
  "previous_word": {
    "new": {
      "ORG": 0.5,  // etc... for the rest of the categories and features
    },
  },
}
```
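For the curious, here's a very rough sketch of what that guess-and-nudge loop could look like. It's one classic style of update (perceptron-ish), and every name, number, and training example below is invented just to show the shape of the idea, not how a real NLP library does it:

```js
// Sketch only: a tiny, perceptron-style training loop.
// To keep the weights table flat, each feature is a string like
// "is_capitalized=true", and weights[feature][category] is a number.
// (Unlike the 0-to-1 weights above, these aren't clamped; real setups vary.)

const CATEGORIES = ["ORG", "PERSON", "LOC"];

// Made-up labeled examples: the features a word had, plus the right answer.
const trainingData = [
  { features: ["is_capitalized=true", "is_followed_by_Inc=true"], label: "ORG" },
  { features: ["is_capitalized=true", "previous_word=mr"], label: "PERSON" },
  { features: ["is_capitalized=false", "previous_word=the"], label: "LOC" },
];

const weights = {}; // start empty; random starting values would also work

// Add up the weights of the features that are present, for one category.
function score(features, category) {
  let total = 0;
  for (const f of features) {
    total += (weights[f] && weights[f][category]) || 0;
  }
  return total;
}

// The model's "guess": whichever category scores highest right now.
function guess(features) {
  return CATEGORIES.reduce((best, c) =>
    score(features, c) > score(features, best) ? c : best
  );
}

const LEARNING_RATE = 0.1;

// The loop: guess, compare to the right answer, nudge the weights, repeat.
for (let pass = 0; pass < 1000; pass++) {
  for (const { features, label } of trainingData) {
    const predicted = guess(features);
    if (predicted === label) continue; // already right, leave the weights alone
    for (const f of features) {
      weights[f] = weights[f] || { ORG: 0, PERSON: 0, LOC: 0 };
      weights[f][label] += LEARNING_RATE;     // reward the correct category
      weights[f][predicted] -= LEARNING_RATE; // punish the wrong guess
    }
  }
}
```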
Then, of course, there's some probability mathy mathness in there that looks at all the weights across all the features and decides which category is most probable.
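From what I can tell, that's often a "softmax": add up the weights for each category, squash the totals into probabilities that sum to 1, and pick the biggest. A toy version, with totals I made up, might look like this:

```js
// Sketch only: turn summed weight totals into probabilities with a softmax.
const totals = { ORG: 1.3, PERSON: 0.7, LOC: 0.6 }; // made-up summed scores

function softmax(scores) {
  const exps = Object.fromEntries(
    Object.entries(scores).map(([cat, s]) => [cat, Math.exp(s)])
  );
  const sum = Object.values(exps).reduce((a, b) => a + b, 0);
  return Object.fromEntries(
    Object.entries(exps).map(([cat, e]) => [cat, e / sum])
  );
}

const probabilities = softmax(totals);
// -> roughly { ORG: 0.49, PERSON: 0.27, LOC: 0.24 }

const winner = Object.entries(probabilities)
  .sort((a, b) => b[1] - a[1])[0][0]; // "ORG"
```

Different kinds of models (logistic regression, neural networks, etc.) handle this step differently, but the flavor is the same: scores in, probabilities out, highest one wins.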
Makes me think of those personality tests I took in middle school: "if you answered mostly C, you're a sporty tortoise"! Though I suspect it's more complicated than that.
Happy to hear corrections, as long as they're kind and not jargon-y.