DEV Community

RyanSmoak


Data preprocessing for extractive QA

I want to start a new project doing extractive QA over a text corpus that is hundreds of pages long, but I don't know how to preprocess the data. I was planning on training BERT on the corpus, which looks like this:
Image description
How can I turn this into something that BERT can learn from? If you need me to clarify anything, just ask. All help is appreciated.
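
For reference, here is a minimal sketch of the kind of preprocessing I think is involved, not a definitive recipe. It assumes the corpus is already plain text, that passages have to be cut into overlapping windows because standard BERT accepts at most 512 tokens per input, and that fine-tuning needs SQuAD-style records (a context, a question, and the character offset where the answer starts). The function names `chunk_text` and `make_example` are hypothetical, and counting words is only a rough stand-in for counting tokens. The question/answer pairs would still have to be written or labeled separately, since the raw corpus does not contain them.

```python
def chunk_text(text, max_words=200, stride=100):
    """Split text into overlapping word windows (word count is a rough proxy for tokens)."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
        start += stride
    return chunks


def make_example(context, question, answer_text):
    """Build a SQuAD-style record; return None if the answer is not a literal span of the context."""
    answer_start = context.find(answer_text)
    if answer_start == -1:
        return None
    return {
        "context": context,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [answer_start]},
    }


if __name__ == "__main__":
    passage = "BERT was released in 2018."
    print(chunk_text("a b c d e f g h i j", max_words=4, stride=2))
    print(make_example(passage, "When was BERT released?", "2018"))
```

The overlap between windows is there so that an answer sitting on a window boundary still appears whole in at least one chunk. Extractive QA can only return spans that occur verbatim in the context, which is why `make_example` rejects answers it cannot find.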

