TensorFlow to filter PDF files

Hello everyone,

I got the idea to filter resumes using TensorFlow, but I don't know how to do it o:).. So far I have only found tutorials about training a model to recognize whether an image shows a dog or a cat.

So what do I want to do? Train the model with some keywords (let's say some kind of dictionary), then filter PDF files depending on those keywords and store them in different folders.

Any help?

I would use pdf-to-text and then feed the data to github.com/vi3k6i5/flashtext with an annotated dictionary of keywords. Elasticsearch seems like too complex a solution, since you only want some basic filtering, such as grouping resumes into categories like QA / DevOps / Java dev / Python dev.
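To make the dictionary idea concrete, here is a minimal sketch. It uses plain regex matching from the Python standard library instead of flashtext, so it runs without extra installs; flashtext's `KeywordProcessor` would fill the same role on larger dictionaries. The categories, keywords, and function names are made up for illustration, and it assumes the resume text has already been extracted from the PDF.

```python
import re
import shutil
from collections import Counter
from pathlib import Path

# Hypothetical keyword dictionary: category -> keywords. Adjust to your needs.
CATEGORY_KEYWORDS = {
    "qa": ["selenium", "test automation", "qa engineer"],
    "devops": ["docker", "kubernetes", "jenkins", "terraform"],
    "java": ["java", "spring", "hibernate"],
    "python": ["python", "django", "flask"],
}


def _compile(keywords):
    # The lookarounds act as word boundaries, so "java" does not match
    # inside "javascript". Longer keywords are tried first.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


PATTERNS = {cat: _compile(kws) for cat, kws in CATEGORY_KEYWORDS.items()}


def score_resume(text):
    """Count keyword hits per category in the extracted resume text."""
    return Counter({cat: len(p.findall(text)) for cat, p in PATTERNS.items()})


def categorize(text, default="uncategorized"):
    """Return the category with the most hits, or `default` if nothing matches."""
    scores = score_resume(text)
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return best if hits > 0 else default


def file_resume(pdf_path, text, out_dir):
    """Copy the PDF into out_dir/<category>/ and return the new path."""
    target_dir = Path(out_dir) / categorize(text)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(pdf_path, target_dir))
```

Picking the category with the most hits is the simplest possible rule; ties go to whichever category comes first in the dictionary, so a real setup might want weights or a minimum hit count.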

If you still want to play with TensorFlow, I suggest you think about what kind of features you can extract from the image and what kind of output you expect. That's 90% of the success; the remaining 10% is just coding up the TensorFlow model.
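As a rough illustration of "features in, output out", here is a sketch for resumes that have already been converted to text. It turns each resume into a keyword-count vector and each category into a one-hot label, which is the kind of numeric input and target a small classifier could train on. The vocabulary, the categories, and the function names are hypothetical, and no TensorFlow code is shown.

```python
import re

# Hypothetical vocabulary and categories, chosen only for illustration.
VOCABULARY = ["python", "java", "docker", "selenium"]
CATEGORIES = ["python", "java", "devops", "qa"]


def to_features(text):
    """Map resume text to a fixed-length vector of keyword counts."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [float(tokens.count(word)) for word in VOCABULARY]


def to_label(category):
    """Map a category name to a one-hot target vector."""
    return [1.0 if c == category else 0.0 for c in CATEGORIES]
```

For example, `to_features("Python, Docker and more Python")` gives `[2.0, 0.0, 1.0, 0.0]`, and the model's job would be to map such vectors to the matching one-hot label.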

Hey Alex, thanks for the answer, it makes sense to me!

I have a question: so what you are saying is to create a dictionary with the keywords, then extract the PDF to text and filter the CVs accordingly, right?

How would you do the PDF -> text conversion?

Why don't you just do OCR to extract the PDF to txt?
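One possible way to do the conversion, as a sketch: PDFs that contain a real text layer can be read directly, and OCR is only needed for scanned documents. The example below assumes the Poppler `pdftotext` command-line tool is installed and on the PATH; the helper names and the 50-character threshold are arbitrary choices for illustration.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path, timeout=60):
    """Extract the text layer of a PDF using the pdftotext CLI (Poppler)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install Poppler first")
    # "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.decode("utf-8", errors="replace")


def needs_ocr(text, min_chars=50):
    """Guess that a PDF is a scan if almost no text came out of it."""
    return len(text.strip()) < min_chars
```

If `needs_ocr(pdf_to_text(path))` is true, the file is probably a scanned image and would have to go through an OCR tool instead.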

I think stripping the text out of the PDF first is a great idea.
I was thinking about how to employ Elasticsearch for this some time ago.
There are easy ways to index PDF docs, and then you can use similarity or scoring to search them.

I'm also interested in the topic of tensorflow :))

Vasil Vasilev
Member since Nov 26, 2017