How can I generate 3D facial animation from text using ML?

Hello, everybody! I am new to machine learning and have been given the task of generating vertices for a 3D facial animation of a "talking head" from text. The 3D animation is created from a sequence of OBJ meshes played back at 20 fps.
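
To make the data layout concrete, here is a minimal sketch of how such a mesh sequence might be read into a per-frame list of vertex coordinates. It assumes every frame has the same vertex count and order, and it reads only the `v x y z` lines of each OBJ file. The function names and the two tiny example frames are hypothetical, not taken from the real dataset.

```python
FPS = 20  # frame rate stated in the question


def parse_obj_vertices(obj_text):
    """Return a list of (x, y, z) tuples from the `v` lines of OBJ text."""
    vertices = []
    for line in obj_text.splitlines():
        parts = line.split()
        # Only geometric vertices; skip faces (f), normals (vn), etc.
        if parts and parts[0] == "v":
            x, y, z = (float(p) for p in parts[1:4])
            vertices.append((x, y, z))
    return vertices


def stack_frames(obj_texts):
    """Turn a list of OBJ strings into frames[frame][vertex] = (x, y, z)."""
    frames = [parse_obj_vertices(text) for text in obj_texts]
    vertex_counts = {len(frame) for frame in frames}
    if len(vertex_counts) > 1:
        raise ValueError("all frames must share the same vertex count")
    return frames


# Two hypothetical frames of a single triangle; one vertex moves.
frame_a = "v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"
frame_b = "v 0 0 0\nv 1 0 0\nv 0 1.5 0\nf 1 2 3\n"

animation = stack_frames([frame_a, frame_b])
duration_seconds = len(animation) / FPS
```

With this layout, a model's regression target for one text sample would be a variable-length sequence of frames, each holding `n_vertices * 3` numbers.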

We currently have a dataset (10,000+ rows) containing text and the corresponding 3D vertex coordinates for the facial animation. Which kind of model is most suitable for this task? Does anybody know of an existing model that solves a similar problem?

