spiritupbro

Posted on Jun 21, 2020 • Edited on Nov 25, 2020

Create your resume automatically from mongodb

#python #blogspiritbro1english

Introduction

In this guide you will learn how to create a resume from data you have in your MongoDB database. The resume will be a PDF. We will also shorten every link in the resume with bit.ly and the bit.ly API, so we can see how many visitors come through our resume. We will use a docx template to generate the document. When you're finished, you will be able to generate your resume automatically, even if it lists a lot of projects, so you don't need to type them one by one in docx.

Prerequisites

Python 3.6, installed for your OS. Install it from the official Python website here
docx template. You can use whatever design you want; for this article I use the resume template from the Universitas Ciputra Apple Developer Academy, which you can download here
A virtual environment: run python3 -m venv tutorial-env and activate it with source tutorial-env/bin/activate (venv ships with Python 3, so pip install virtualenv is only needed if you prefer the virtualenv tool)
bit.ly account
mLab or MongoDB Atlas account. I use mLab, but as of today you can't register for mLab anymore because it has been merged into MongoDB Atlas, so if you're new to MongoDB, register for MongoDB Atlas instead
Linux. You can use whatever OS you want, but most of this article uses Linux commands; for other OSes, Google is your friend
LibreOffice, used to convert the docx to PDF after we create it

Step 1 — Install the Python dependencies that we need

First of all, we need to install the dependencies that we will use later. Don't forget to activate the virtualenv before every install so the packages won't be installed globally. To install the dependencies, use the command below:

pip install requests Pillow docxtpl pymongo wget python-decouple

Or you can install them later from the requirements.txt in the GitHub repo linked at the end of this article, like this:

pip install -r requirements.txt

Let's break down one by one why we need each of these dependencies:

First we install requests, a library for HTTP requests. We need it to call the bit.ly API; I will explain later how to get an access token for personal use.
Then we install Pillow, a library for manipulating images. We use it to compress the images. All my images are hosted on Gyazo; you can use any other image host, such as Imgur, or put your images in the same folder as your main Python file. Sometimes an image file is as big as 3.5 MB, which makes the docx file big too. We don't want to upload a file that large, and a reviewer won't want to download it either, so let's make everyone's life easier by compressing the images with Pillow.
After that we install docxtpl. This is one of the most important libraries here, as it is the one that fills in the docx template from Python.
pymongo is for communicating with the MongoDB host. You can use any database you want for the data, but for this article we will use MongoDB.
wget is for downloading the images into our folder. As all my images are hosted on Gyazo, I need a tool to fetch them so we can use them later as the project images in the docx template.
decouple (installed as python-decouple) is for reading the variables from the .env file.

Step 2 - Let's break down the docx template

Why do we use a docx template? First of all, you can design your own docx template if you don't like the one I listed in the prerequisites. That one is just an example of how to build a docx template that Python can fill with our data easily. So let's break down the docx template.

As you can see in the template, there is syntax like

{% for i in projects %}

That is jinja2 syntax, a templating syntax modeled on the template language of the Python Django framework. docxtpl uses it inside the docx so that Python knows where to put our data. In the syntax above we loop through a Python list called projects and insert its items one by one. Let's break down what each piece of jinja syntax is doing here.

{% for i in projects %}

This syntax loops through the list called projects. The loop is closed further down in the template with {% endfor %}.

{{loop.index}}

On the right side of the image I wrote loop.index. What is this for? Inside a for loop, jinja provides a special variable called loop: loop.index is the position of the current item (counting from 1) and loop.length is the total number of items.

{{i.title}}

Each item i in the projects list is a dictionary with the keys title, image, year, role, link, description and page_break, so {{i.title}} prints the title of the current project.

{{name}}

The context also needs top-level keys such as name, from and contact, so the same name and contact details can be filled in dynamically on every project page inside the loop.

{%if loop.index != loop.length%}{{r i.page_break}} {% endif %}

Using loop in the condition, we tell the template to skip the page break when the current project is the last item in the list.
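
To make the loop and the page-break condition easier to picture, here is a minimal plain-Python sketch of the same logic. It does not use docxtpl or jinja2; the sample titles are made up, render_outline is a hypothetical helper, and enumerate(..., start=1) simply stands in for jinja's 1-based loop.index.

```python
# Plain-Python imitation of the template loop (illustration only, not docxtpl).
projects = [
    {"title": "Project A"},  # made-up sample data
    {"title": "Project B"},
    {"title": "Project C"},
]


def render_outline(projects):
    """Return one line per project, marking where a page break would go."""
    lines = []
    length = len(projects)  # plays the role of loop.length
    for index, project in enumerate(projects, start=1):  # index ~ loop.index
        line = f"{index}. {project['title']}"
        if index != length:  # same test as {% if loop.index != loop.length %}
            line += " [page break]"
        lines.append(line)
    return lines


for line in render_outline(projects):
    print(line)
```

Only the last project is left without a page break, which is what keeps the generated resume from ending with an empty page.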

Step 3 — Let's code

First of all, as I said earlier, you need a token to access the bit.ly API with the requests library. After you register with bitly, you can get an access token from the dashboard: click on the top right corner of your dashboard and you will get a menu like the one below

Click on Profile Settings

Click on Generic Access Token

Type your password and click Generate Token; you will get your token for accessing the bitly API

First, let's create a file named .env with the environment variables below

MONGO_HOST=
BITLY_TOKEN=
NAME=
FROM=
CONTACT=

Fill in all the environment variables, for example like below

MONGO_HOST=mongodb://asdas:asdsa@ds14112.mlab.com:24122/resume
BITLY_TOKEN=12321321311112bubu2ubjjjjjjjjjj
NAME=Nori Roin
FROM=Public (Non Universitas Ciputra)
CONTACT=+6281336226985 noriroin@gmail.com
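
Before running anything, it can help to check that no variable was left blank. The sketch below is only an illustration of how KEY=VALUE lines can be read; parse_env is a hypothetical helper written for this example and is not how python-decouple works internally, which does more than this.

```python
# Minimal sketch of reading KEY=VALUE pairs (not python-decouple's own code).
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


# Sample content in the same shape as the .env file above
sample = """\
NAME=Nori Roin
FROM=Public (Non Universitas Ciputra)
BITLY_TOKEN=
"""

settings = parse_env(sample)
missing = [key for key, value in settings.items() if not value]
print(settings["NAME"])
print("still empty:", missing)
```

In the real project you would read the text from the .env file instead of the sample string.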

Now let's create a file called shorten.py with the code below. We will use this file later to shorten our links.

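import requests
from decouple import config

def shorten(link, title):
    # bitly v4 expects the token as "Bearer <token>" in the Authorization header
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + config('BITLY_TOKEN'),
    }
    payload = {'long_url': link, 'title': title}
    r = requests.post("https://api-ssl.bitly.com/v4/bitlinks", json=payload, headers=headers)
    # Fail loudly on an HTTP error instead of a confusing KeyError below
    r.raise_for_status()
    return r.json()["link"]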
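
If you want to look at what gets sent to bitly without spending a real API call, here is a small sketch that only builds the request and never sends it. It uses the standard library's urllib instead of requests so it runs with no extra install; build_bitly_request is a hypothetical helper, the token is a dummy value, and the "Bearer <token>" header form is what the bitly v4 API expects as far as I know.

```python
import json
import urllib.request

BITLY_ENDPOINT = "https://api-ssl.bitly.com/v4/bitlinks"  # same URL as shorten.py


def build_bitly_request(link, title, token):
    """Build (but do not send) the POST request for the bitly API."""
    payload = json.dumps({"long_url": link, "title": title}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # assumed header form for bitly v4
    }
    return urllib.request.Request(
        BITLY_ENDPOINT, data=payload, headers=headers, method="POST"
    )


# Dummy values for illustration; nothing is sent over the network here
request = build_bitly_request("https://example.com/demo", "Demo project", "dummy-token")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
print(request.data.decode("utf-8"))
```

Printing the request like this is a quick way to spot an empty token or a missing link before the script loops over all your projects.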

Now let's create a new file called pdf.py to convert the docx to PDF

import sys
import subprocess
import re


def convert_to(folder, source, timeout=None):
    args = [libreoffice_exec(), '--headless', '--convert-to', 'pdf', '--outdir', folder, source]

    process = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout)
    filename = re.search('-> (.*?) using filter', process.stdout.decode())

    if filename is None:
        raise LibreOfficeError(process.stdout.decode())
    else:
        return filename.group(1)


def libreoffice_exec():
    # TODO: Provide support for more platforms
    if sys.platform == 'darwin':
        return '/Applications/LibreOffice.app/Contents/MacOS/soffice'
    return 'libreoffice'


class LibreOfficeError(Exception):
    def __init__(self, output):
        self.output = output
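
The regular expression in convert_to is the part that is easiest to misread, so here is a small sketch of what it pulls out. The sample line is only my assumption of what LibreOffice prints after a successful conversion (the path is made up), and parse_output_path is a hypothetical helper that mirrors the re.search call above.

```python
import re

# Assumed example of LibreOffice's message on success; the path is made up.
sample_stdout = (
    "convert /home/user/generated_resume.docx -> "
    "/home/user/generated_resume.pdf using filter : writer_pdf_Export\n"
)


def parse_output_path(stdout_text):
    """Return the converted file path, or None when the pattern is absent."""
    match = re.search(r"-> (.*?) using filter", stdout_text)
    return match.group(1) if match else None


print(parse_output_path(sample_stdout))
print(parse_output_path("Error: source file could not be loaded"))
```

When the pattern is not found, pdf.py raises LibreOfficeError with the raw output, so you can read what LibreOffice actually said.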

Now let's create a new file called index.py with the code below. This is the main script that we will run later to generate the resume PDF.

from shorten import shorten
import wget
import pprint
from pymongo import MongoClient
from docxtpl import DocxTemplate,R, InlineImage
from docx.shared import Mm,Inches
import os
from PIL import Image
from decouple import config
from pdf import convert_to
client = MongoClient(config('MONGO_HOST'))
context = {"name":config('NAME'),"from":config('FROM'),"contact":config('CONTACT'),"projects":[]}
db = client['resume']
projects=db.resumes
images=[]
doc = DocxTemplate("resume.docx")
for post in projects.find().sort("_id", -1):
    # Default to a local placeholder image (a.jpg) when the project has no image
    imajin = InlineImage(doc, "a.jpg", height=Inches(3.97), width=Inches(6.25))
    if post['i']:
        # Download the image, then save a compressed JPEG copy
        image_filename = wget.download(post['i'])
        foo = Image.open(image_filename)
        im = foo.convert('RGB')
        im.save(image_filename + ".jpg", "JPEG", optimize=True, quality=65)
        # Remember both files so they can be deleted later
        images.append(image_filename)
        images.append(image_filename + ".jpg")
        imajin = InlineImage(doc, image_filename + ".jpg", height=Inches(3.97), width=Inches(6.25))
    # Prefer the preview link, fall back to the GitHub link, else '-'
    link = (shorten(post['p'], post['t']) if post['p']
            else shorten(post['g'], post['t']) if post['g'] and post['t'] else '-')
    context['projects'].append({
        "page_break": R('\f'), 'description': post['d'], 'title': post['t'],
        'year': '2019', 'role': 'programmer', 'link': link, 'image': imajin,
    })

doc.render(context)
doc.save("generated_resume.docx")
for image in images:
    os.remove(image)
convert_to("./","generated_resume.docx")
os.remove("generated_resume.docx")

Okay, let's break down one by one what index.py is doing

from shorten import shorten
import wget
import pprint
from pymongo import MongoClient
from docxtpl import DocxTemplate,R, InlineImage
from docx.shared import Mm,Inches
import os
from PIL import Image
from decouple import config
from pdf import convert_to

These lines import the libraries we need. Earlier we created shorten.py; what is that for? It shortens the preview links that showcase our work, so that when somebody clicks a link we can see in bit.ly how many people visited it. The last line, from pdf import convert_to, is for converting our docx to PDF.

client = MongoClient(config('MONGO_HOST'))

This line connects Python to our MongoDB database. config('MONGO_HOST') reads the environment variable from the .env file we created earlier.

context = {"name":config('NAME'),"from":config('FROM'),"contact":config('CONTACT'),"projects":[]}

Here we create a dictionary with the data that we will later put into our docx template.

db = client['resume']
projects=db.resumes

Here we get the database from MongoDB. In my case I named the database resume and the collection resumes, so the variable projects points to the resumes collection. For the data inside it, take a look at the image of my MongoDB collection below.

As you can see, my example data looks like this, so let me break down why I store it this way. The fields are t, g, p, d and i, which stand for title, github, preview, description and image. I keep the names this short because, if you later run a production MongoDB database, it minimizes the data you store: most database hosts, mLab for example, charge based on data size, so the smaller the data, the lower the cost. You may also notice that the i field points to Gyazo, where I host my images. Gyazo is a screenshot tool for Linux (and, I think, other OSes) that captures part of your screen, and it's free for public images.
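
If the one-letter keys get hard to read while you are debugging, a small mapping can translate them back. This is only a sketch: FIELD_NAMES and expand_fields are hypothetical names that are not part of index.py, and the stored document is made up.

```python
# Map the one-letter keys stored in MongoDB to readable names (illustration only).
FIELD_NAMES = {
    "t": "title",
    "g": "github",
    "p": "preview",
    "d": "description",
    "i": "image",
}


def expand_fields(document):
    """Return a copy with long field names; unknown keys such as _id are kept."""
    return {FIELD_NAMES.get(key, key): value for key, value in document.items()}


# Made-up document in the same shape as the collection described above
stored = {
    "t": "Resume generator",
    "g": "https://github.com/example/repo",
    "p": "",
    "d": "Builds a PDF resume from MongoDB",
    "i": "",
}
print(expand_fields(stored))
```

The database still stores the short keys, so you keep the size benefit and only pay for the readable names in memory.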

images=[]

This list keeps track of the image files. Because I host my images on Gyazo, I need to download every image and compress it so that the document stays small. Once the conversion to PDF is finished, I use this list to remove all the images and free up some space on our storage.

doc = DocxTemplate("resume.docx")

This line loads the docx template. I named the docx resume.docx; you can name it whatever you want, but you should put it in the same folder as index.py.

for post in projects.find().sort("_id", -1):
    # Default to a local placeholder image (a.jpg) when the project has no image
    imajin = InlineImage(doc, "a.jpg", height=Inches(3.97), width=Inches(6.25))
    if post['i']:
        # Download the image, then save a compressed JPEG copy
        image_filename = wget.download(post['i'])
        foo = Image.open(image_filename)
        im = foo.convert('RGB')
        im.save(image_filename + ".jpg", "JPEG", optimize=True, quality=65)
        # Remember both files so they can be deleted later
        images.append(image_filename)
        images.append(image_filename + ".jpg")
        imajin = InlineImage(doc, image_filename + ".jpg", height=Inches(3.97), width=Inches(6.25))
    # Prefer the preview link, fall back to the GitHub link, else '-'
    link = (shorten(post['p'], post['t']) if post['p']
            else shorten(post['g'], post['t']) if post['g'] and post['t'] else '-')
    context['projects'].append({
        "page_break": R('\f'), 'description': post['d'], 'title': post['t'],
        'year': '2019', 'role': 'programmer', 'link': link, 'image': imajin,
    })

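This is the part where we loop through the projects, newest first, compress each image with Pillow, and append a dictionary for each project to the projects list in context. a.jpg is the placeholder used when a project has no image, so keep a file with that name in the same folder as index.py. For the link we use the shortened preview link if there is one, otherwise the shortened GitHub link, otherwise just '-'.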
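
The link fallback is the densest part of that loop, so here is the same decision written as a small function you can try on its own. choose_link and fake_shorten are hypothetical names for this sketch; fake_shorten stands in for shorten.py so nothing calls bitly, and unlike index.py I use .get() so a missing key does not raise an error.

```python
def choose_link(post, shorten):
    """Pick the preview link, then the GitHub link, else '-'."""
    if post.get("p"):
        return shorten(post["p"], post["t"])
    if post.get("g") and post.get("t"):
        return shorten(post["g"], post["t"])
    return "-"


def fake_shorten(link, title):
    # Stand-in for shorten.py so the example runs without calling bitly
    return "short:" + link


# Made-up documents using the same one-letter keys as the collection
with_preview = {"t": "A", "p": "https://a.example", "g": "https://github.com/x/a"}
github_only = {"t": "B", "p": "", "g": "https://github.com/x/b"}
no_links = {"t": "C", "p": "", "g": ""}

for post in (with_preview, github_only, no_links):
    print(post["t"], choose_link(post, fake_shorten))
```

Passing the shortener in as an argument is just a convenience for testing; in index.py the real shorten function would be used.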

doc.render(context)
doc.save("generated_resume.docx")

After rendering the template with our data, we save the docx in the same folder as index.py.

for image in images:
    os.remove(image)
convert_to("./","generated_resume.docx")
os.remove("generated_resume.docx")

We remove the images, convert the docx to PDF, then remove the generated docx.
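
One thing to watch: os.remove raises an error if a file is already gone, for example when two projects share the same image. Below is a sketch of a more forgiving cleanup. remove_files is a hypothetical helper and the demo files are throwaway files in a temporary directory, not real downloads.

```python
import os
import tempfile


def remove_files(paths):
    """Delete each path that still exists and return the ones actually removed."""
    removed = []
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed


# Demo with throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as workdir:
    images = []
    for name in ("shot1.png", "shot1.png.jpg"):
        path = os.path.join(workdir, name)
        with open(path, "wb") as handle:
            handle.write(b"fake image bytes")
        images.append(path)
    images.append(os.path.join(workdir, "missing.png"))  # never created
    removed = remove_files(images)
    print(len(removed), "files removed")
```

In index.py you could call a helper like this in place of the plain for loop over images.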

python index.py

Run the script with the command above and voila! You will get a file named generated_resume.pdf

Conclusion

In this article I've shown you how to automatically create a resume PDF with Python from data in MongoDB. You can try editing my code to use another database, for example PostgreSQL or any database you want. Below is the GitHub link for the source code of the article.