Data preprocessing for extractive QA

#nlp #ai #beginners #machinelearning

I want to start a new extractive QA project based on a text corpus that is hundreds of pages long, but I don't know how to preprocess the data. I was planning on training BERT on the text corpus, which looks like this:

How can I turn this into something that BERT can learn from? If you need me to clarify anything, just ask. All help is appreciated.
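To make the question more concrete, here is a rough sketch of the shape I think the data would need to end up in. It assumes a SQuAD-style layout (a context passage, a question, and the character offset where the answer starts), since extractive QA fine-tuning needs question/answer pairs and not just raw text. The function names `split_into_passages` and `make_example` and the window sizes are hypothetical choices for illustration, not part of any library, and the questions and answers would still have to come from annotation.

```python
def split_into_passages(text, max_words=200, stride=100):
    """Split a long text into overlapping word windows.

    max_words and stride are illustrative defaults (an assumption),
    chosen so each passage stays well under BERT's 512-token limit.
    """
    words = text.split()
    passages = []
    start = 0
    while start < len(words):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once the current window reaches the end of the text.
        if start + max_words >= len(words):
            break
        start += stride
    return passages


def make_example(context, question, answer_text):
    """Build one SQuAD-style record, or None if the answer is not in the context."""
    answer_start = context.find(answer_text)
    if answer_start == -1:
        return None
    return {
        "context": context,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [answer_start]},
    }


if __name__ == "__main__":
    passage = "BERT was released by Google in 2018."
    print(make_example(passage, "Who released BERT?", "Google"))
```

The overlap between windows is there so that an answer falling on a passage boundary still appears whole in at least one passage, provided the answer is no longer than the overlap (`max_words - stride` words).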