Depending on the number of dimensions, you can try an embedding (word2vec, GloVe, fastText, ...) or a dimensionality reduction technique (PCA, Isomap, ...). I have had mostly positive experiences with PCA + kNN. When working on NLP problems, word2vec and kNN did well too.
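To make the PCA + kNN idea concrete, here is a minimal sketch in pure Python so it runs without dependencies. It is not the commenter's actual setup. The helper names (`fit_pca`, `project`, `knn_predict`), the toy data, and the use of power iteration to find the principal components are all illustrative assumptions. In practice you would more likely use a library, e.g. scikit-learn's `sklearn.decomposition.PCA` and `sklearn.neighbors.KNeighborsClassifier`.

```python
import math
from collections import Counter


def mean_center(rows):
    """Subtract the per-column mean; return the centered rows and the mean vector."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows], mean


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=500):
    """Top-k eigenvectors of a symmetric matrix via power iteration + deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for c in range(k):
        # Deterministic, non-uniform start vector (an assumption for this sketch).
        v = [1.0] * d
        v[c % d] += 1.0
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; subtract it out (deflation).
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def fit_pca(rows, k):
    """Return (mean, components) for projecting rows onto k principal axes."""
    centered, mean = mean_center(rows)
    return mean, top_components(covariance(centered), k)


def project(row, mean, comps):
    """Coordinates of one row in the reduced k-dimensional space."""
    return [sum((row[j] - mean[j]) * c[j] for j in range(len(row))) for c in comps]


def knn_predict(train_z, labels, query_z, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(z, query_z), lab) for z, lab in zip(train_z, labels))
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters in 3-D.
X = [[1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1],
     [5.0, 5.2, 4.9], [5.1, 4.8, 5.0], [4.9, 5.0, 5.1]]
y = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    mean, comps = fit_pca(X, 1)  # reduce 3-D -> 1-D
    Z = [project(r, mean, comps) for r in X]
    print(knn_predict(Z, y, project([1.1, 1.0, 1.0], mean, comps), k=3))
```

The reduction is fit once on the training rows, and every query is projected with the same mean and components before the neighbor search. For word embeddings such as word2vec vectors, cosine similarity is often used instead of the Euclidean distance shown here.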
BTW: Thrilled for your VC dimension post. This topic can be confusing.
FYI, I wrote up VC dimensions. It is confusing indeed, but I learned a lot from the examples. dev.to/swyx/supervised-learning-vc...
Thanks! I'm very keen on NLP-type problems, and unfortunately they don't seem to be covered in any detail in this course. So after I finish this series I may do a further deep dive on this. Or, if you want to write up some things I can explore (a pseudo-syllabus?), I will happily do that.