DEV Community

RyanSmoak


Data preprocessing for extractive QA

I want to start a new project doing extractive QA over a text corpus that is hundreds of pages long, but I don't know how to preprocess the data. I was planning on training BERT on the corpus, which looks like this:
[Image: sample of the text corpus]
How can I turn this into something that BERT can learn from? If you need me to clarify anything, just ask. All help is appreciated.
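For context on what "something BERT can learn from" usually means: extractive QA fine-tuning generally expects SQuAD-style records (a context passage, a question, and the answer text with its character offset in the context), so raw pages alone are not enough and question/answer pairs still have to be written or sourced. The sketch below is only an illustration under assumptions: the helper names `split_into_passages` and `make_squad_example`, the window sizes, and the sample sentence are all hypothetical, and the actual corpus layout is not known from the post.

```python
from typing import Dict, List, Optional


def split_into_passages(text: str, max_words: int = 200, stride: int = 50) -> List[str]:
    """Split a long document into overlapping word windows.

    `stride` is the number of words shared between consecutive windows,
    so an answer cut off at one window's edge can appear whole in the next.
    """
    if stride >= max_words:
        raise ValueError("stride must be smaller than max_words")
    words = text.split()
    passages = []
    step = max_words - stride
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return passages


def make_squad_example(context: str, question: str, answer: str) -> Optional[Dict]:
    """Build one SQuAD-style record, or None if the answer is not in the context."""
    start = context.find(answer)
    if start == -1:
        return None
    return {
        "context": context,
        "question": question,
        # Character offset of the answer inside the context.
        "answers": {"text": [answer], "answer_start": [start]},
    }


# Hypothetical usage with a made-up sentence standing in for one passage.
example = make_squad_example(
    context="BERT was released in 2018.",
    question="When was BERT released?",
    answer="2018",
)
```

Each passage from `split_into_passages` would serve as a `context`; pairing it with hand-written questions and answers gives records that standard extractive QA training scripts can typically consume after tokenization.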

