Hello there!
In this post (my first one!) I am going to explain how I created a simple Python package and published it on PyPI. I started it as part of my final project for the CS50x course and decided to share the steps I followed to get there.
The repository on GitHub with all the files discussed is at https://github.com/Diegoslourenco/skoobpy and the package is published on PyPI as skoobpy.
📘 🐍 Skoobpy
Some context is important to begin. Skoob is a social network focused on books that is very popular in Brazil, and it is similar to Goodreads. There it is possible to save books on different bookshelves, such as read, currently reading, desired, and so on.
As a user, I always wanted a way to get this book data and use it for my own purposes. For example, when I am watching book sales I have to browse through many pages on the site to see all the books that I saved on my desired bookshelf, as the site does not have an API.
In this context, the package exports the data (such as the title, author, publisher, and number of pages) of a specific user's desired books to a CSV file.
How it works
skoobpy can be run from the command line followed by a user_id. The data will be stored in a CSV file named books_user_id.csv.
$ python skoobpy <user_id>
Or it can be imported into a Python file to use the data in other ways.
import skoobpy
from skoobpy import *
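As a rough sketch of "using the data in other ways", the exported file can also be read back with the standard csv module. The helper name load_books is hypothetical (it is not part of skoobpy), and the delimiter and quote character are the ones used by the export_csv function shown later in this post.

```python
import csv


def load_books(path):
    """Read a CSV file exported by skoobpy into a list of dictionaries."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        # Same delimiter and quote character that export_csv writes with.
        reader = csv.DictReader(csvfile, delimiter=';', quotechar='|')
        return list(reader)


# Example call (hypothetical file name, use your own user_id):
# for book in load_books('books_1380619.csv'):
#     print(book['Title'], '-', book['Author'])
```

Each row comes back as a dictionary keyed by the header row, so columns can be accessed by name instead of by position.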
Building the package
Creating a Virtual Environment
To avoid dependency problems down the road, I created a virtual environment for the project. For instance, if I rely on some version of the package requests and a future update changes something, part of my code that works just fine today could simply stop working. Also, if I am collaborating with someone else on the project, it is a great idea to be sure that everyone is working in the same environment.
First, I ran the command to install virtualenv:
$ pip install virtualenv
Inside a folder called skoobpy I ran the command below. This creates a folder called venv.
$ virtualenv venv
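Now it is necessary to activate the environment. The command depends on which operating system you are using.
- On Windows, if you are using WSL (Windows Subsystem for Linux) you should run the first command below (and if you are a beginner as I am, read this). Note that the Scripts folder only exists when the environment was created by a Windows Python; an environment created by a Python installed inside WSL uses bin/activate, as in the Linux case. If you are not using WSL, run the second one: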
$ source ./venv/Scripts/activate
(venv) $
$ \pathto\venv\Scripts\activate
(venv) $
- On Linux you should run:
$ source ./venv/bin/activate
(venv) $
After this, the prompt will be prefixed with the name of the environment (venv), as shown above. This indicates that venv is currently active and that the python executable will only use this environment's packages. To deactivate an environment, simply run deactivate.
(venv) $ deactivate
$
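If you are ever unsure whether the environment is really active, Python itself can tell you. This is a small sketch and not part of skoobpy; the function name in_virtual_environment is my own. It relies on the fact that inside a virtual environment sys.prefix points to the environment folder while the base prefix points to the original installation.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, 'real_prefix', None) or getattr(sys, 'base_prefix', sys.prefix)
    return sys.prefix != base_prefix


print('virtual environment active:', in_virtual_environment())
```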
Finally, I installed all the dependencies that are needed to build the package: wheel, setuptools, twine, requests and, to run some tests, pytest. I put all the names that I need in a file called requirements.txt to install everything at once, and then my environment was ready for work.
$ pip install -r requirements.txt
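To double-check that the install worked, the standard library can report which of these packages are present in the active environment. This is only a sketch: the function name installed_versions and the REQUIRED list are mine, built from the package names mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version

# The dependencies named in this post.
REQUIRED = ['wheel', 'setuptools', 'twine', 'requests', 'pytest']


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f'{name}: {ver or "not installed"}')
```

importlib.metadata is available from Python 3.8 onwards.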
Looking at the source code
Now that I have presented the idea, I am going to show how I did it. To begin, let's take a look at the directory structure of skoobpy:
skoobpy/
│
├── skoobpy/
│   ├── __init__.py
│   ├── __main__.py
│   └── skoobpy.py
│
├── tests/
│   └── test_skoobpy.py
│
├── venv/
│
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
In this section, I will show the details of the code file by file. All the files can be seen in the GitHub repository.
📂 skoobpy/
Besides setup.py, the root folder also holds the LICENSE file, for which I took the default MIT license for open-source projects, and the README.md file that documents the package.
The setup.py file contains all the information that is important to PyPI. Here we define every aspect of the package. Let's see some of them:
- name defines the name that is used when installing the package.
- In packages you can define what is going to be included in or excluded from your package. I included only skoobpy to leave out the tests folder.
- version shows the current version of your package. A good source to understand the semantics of the version number is this.
- description presents a short description of what the package does.
- In long_description it is possible to give a fuller description of the functionality of the package. Here I simply used the content of README.md.
- long_description_content_type makes it possible to use a markdown file as the long description.
- author and author_email are important if you want to let people contact you about the package.
- url presents where to find more information about it, usually the repository.
- install_requires lists which other packages are mandatory to use this one. It is not necessary to list packages that are part of the Python standard library.
- classifiers are important to make it easy to find the package on the PyPI site.
from setuptools import find_packages, setup

# __version__ is defined in skoobpy/__init__.py, which has no third-party imports
from skoobpy import __version__

with open('README.md', 'r', encoding='utf-8') as file:
    long_description = file.read()

setup(
    name='skoobpy',
    packages=find_packages(include=['skoobpy']),
    version=__version__,
    description='extracts user\'s desired books from Skoob.com.br',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Diego Lourenço',
    author_email='diego.lourenco15@gmail.com',
    license='MIT',
    url='https://github.com/Diegoslourenco/skoobpy',
    platforms=['Any'],
    py_modules=['skoobpy'],
    # skoobpy.py imports requests at runtime, so it has to be listed here
    install_requires=['requests'],
    classifiers=[
        'Development Status :: 3 - Alpha',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
    ],
)
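Since setup.py passes __version__ to setup(), the value defined in skoobpy/__init__.py has to be made available there somehow. One option that avoids importing the package at build time is to read the file as plain text. This is a minimal sketch of that idea; read_version is a hypothetical helper of mine, not something that exists in skoobpy or setuptools.

```python
import re


def read_version(init_source):
    """Extract the __version__ string from the text of an __init__.py file."""
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", init_source, re.MULTILINE)
    if match is None:
        raise RuntimeError('__version__ not found')
    return match.group(1)


# In setup.py this would be fed with the file contents, for example:
# with open('skoobpy/__init__.py', encoding='utf-8') as file:
#     version = read_version(file.read())
print(read_version("# __init__.py\n__version__ = '0.1.3'\n"))  # prints 0.1.3
```

The advantage is that the version number still lives in a single place, and building the package does not depend on the package being importable.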
📂 skoobpy/skoobpy/
__init__.py
This file represents the root of the package. It could be left empty, but I put the variable __version__ inside it to track the version in the future.
# __init__.py
__version__ = '0.1.3'
__main__.py
Briefly, this is the entry point of the program and it is responsible for calling the other parts as needed. There are two imports here.
First, we import argv from sys, as the program takes the second argument (argv[1]) from the command line as the user_id.
With the other import, we take all the content of the file skoobpy, which we are going to see in detail soon.
# __main__.py
from skoobpy import *

def main():
    from sys import argv
    user_id = argv[1]
    books_json = get_all_books(user_id)
    books_desired = filter_desired_books(books_json)
    export_csv(books_desired, user_id)

if __name__ == "__main__":
    main()
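Reading argv[1] directly raises an IndexError when the user_id is forgotten. As an alternative sketch (not what skoobpy does), the standard argparse module can validate the argument and print a usage message instead. The function name parse_args and the help texts are my own.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        prog='skoobpy',
        description="Export a Skoob user's desired books to a CSV file.",
    )
    parser.add_argument('user_id', help='numeric id of the Skoob user')
    return parser.parse_args(argv)


args = parse_args(['1380619'])
print(args.user_id)  # prints 1380619
```

Inside main(), args.user_id would then take the place of argv[1].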
skoobpy.py
This is the file that does all the work. It imports requests to make the request to the site, json to get the data from the site in a format that is easy to work with, and csv to export what we want.
There are three functions defined here: get_all_books, filter_desired_books and export_csv.
- get_all_books composes a url using the url_base skoob.com.br and the user_id number. Depending on the number of books saved by the user, the results span many pages on the site. For this reason, it is necessary to first get total, which represents the total number of books. total_books represents the final URL to request. Finally, a request to total_books is made, the result is parsed as a JSON object and saved in the variable books_json, and that is what the function returns. Now we have all the book data of a Skoob user.
- filter_desired_books receives the data as JSON and, to take only the desired books, checks whether the book field desejado (desired in Portuguese) is equal to 1. If so, it saves the data of the book in a list. If the value is equal to zero, it means that this book is not desired. It returns the list books populated with the desired ones.
- export_csv defines in header the first row of the CSV file. After this, using header and books_list, it opens a CSV file named books_{user_id}.csv and writes each element of the list as a row.
# skoobpy.py
import requests
import json
import csv

url_base = 'https://www.skoob.com.br'

def get_all_books(user_id):
    url = f'{url_base}/v1/bookcase/books/{user_id}'
    print(f'request to {url}')
    user = requests.get(url)
    total = user.json().get('paging').get('total')
    total_books = f'{url}/shelf_id:0/page:1/limit:{total}'
    books_json = requests.get(total_books).json().get('response')
    return books_json

def filter_desired_books(books_json):
    books = []
    for book in books_json:
        if book['desejado'] == 1:
            ed = book['edicao']
            # if there is a subtitle, it must be concatenated to the title
            if ed['subtitulo'] != '':
                book_title = str(ed['titulo']) + ' - ' + str(ed['subtitulo'])
            else:
                book_title = ed['titulo']
            book_url = url_base + ed['url']
            book_data = [book_title, ed['autor'], ed['ano'], ed['paginas'], ed['editora'], book_url]
            books.append(book_data)
    return books

def export_csv(books_list, user_id):
    header = ['Title', 'Author', 'Published Year', 'Pages', 'Publisher', 'Skoob\'s Page']
    with open(f'books_{user_id}.csv', 'w', encoding='utf-8', newline='') as csvfile:
        data = csv.writer(csvfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        data.writerow(header)
        for book in books_list:
            data.writerow(book)
    return
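To see the filtering and export steps without touching the network, here is a small offline sketch. The sample records are made up and only mimic the fields the code above reads (desejado, edicao, titulo, and so on); the function names to_rows and rows_to_csv_text are mine, and the CSV is written to memory instead of a file.

```python
import csv
import io

# Made-up sample shaped like the fields read from the Skoob response.
sample_books = [
    {'desejado': 1, 'edicao': {'titulo': 'Book A', 'subtitulo': 'Part One',
                               'autor': 'Author A', 'ano': 2001, 'paginas': 100,
                               'editora': 'Publisher A', 'url': '/livro/1'}},
    {'desejado': 0, 'edicao': {'titulo': 'Book B', 'subtitulo': '',
                               'autor': 'Author B', 'ano': 2002, 'paginas': 200,
                               'editora': 'Publisher B', 'url': '/livro/2'}},
]


def to_rows(books_json, url_base='https://www.skoob.com.br'):
    """Keep only the desired books and flatten each one into a CSV row."""
    rows = []
    for book in books_json:
        if book['desejado'] != 1:
            continue
        ed = book['edicao']
        title = ed['titulo']
        if ed['subtitulo']:
            title = f"{title} - {ed['subtitulo']}"
        rows.append([title, ed['autor'], ed['ano'], ed['paginas'],
                     ed['editora'], url_base + ed['url']])
    return rows


def rows_to_csv_text(rows):
    """Write the header and rows with the same CSV dialect as export_csv."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['Title', 'Author', 'Published Year', 'Pages', 'Publisher', "Skoob's Page"])
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv_text(to_rows(sample_books)))
```

Only Book A ends up in the output, because Book B has desejado equal to zero.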
📂 skoobpy/tests/
test_skoobpy.py
There are a couple of tests here to verify that the functions retrieve the correct data for a specific user (my own user in this case) before it is exported to the CSV file.
# test_skoobpy.py
# Tests for the skoobpy module

# standard import
import csv

# third party import
import pytest

# skoobpy import
from skoobpy import *

@pytest.fixture
def total_books():
    user_id = 1380619
    return get_all_books(user_id)

@pytest.fixture
def total_desired_books():
    user_id = 1380619
    all_books = get_all_books(user_id)
    return filter_desired_books(all_books)

# Tests
def test_total_books(total_books):
    assert len(total_books) == 619

def test_total_desired_books(total_desired_books):
    assert len(total_desired_books) == 466
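These tests call the real site, so they need a network connection and will start failing as soon as I add a book to my shelves. One possible way around this, sketched below, is to pass the HTTP getter in as a parameter so a test can hand in a fake. Note that this get_all_books is a hypothetical variant written for the example, with an extra http_get parameter that the real function does not have; FakeResponse and fake_get are also made up.

```python
class FakeResponse:
    """Stands in for the object returned by requests.get."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def get_all_books(user_id, http_get, url_base='https://www.skoob.com.br'):
    """Hypothetical variant of get_all_books with an injectable HTTP getter."""
    url = f'{url_base}/v1/bookcase/books/{user_id}'
    total = http_get(url).json().get('paging').get('total')
    total_books = f'{url}/shelf_id:0/page:1/limit:{total}'
    return http_get(total_books).json().get('response')


def fake_get(url):
    """Return canned payloads instead of calling the network."""
    books = [{'desejado': 1}, {'desejado': 0}]
    if 'limit:' in url:
        return FakeResponse({'response': books})
    return FakeResponse({'paging': {'total': len(books)}})


def test_get_all_books_offline():
    assert len(get_all_books(1, fake_get)) == 2
```

In production code, requests.get would be passed as http_get, so the behaviour stays the same while the test becomes repeatable.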
Building the library
Once all the content is ready and everything is working well, it is time to build the package by running:
$ python setup.py sdist bdist_wheel
This will create a new folder dist with two files.
- sdist creates the source distribution (skoobpy-0.1.3.tar.gz).
- bdist_wheel creates the wheel file to install the package (skoobpy-0.1.3-py3-none-any.whl).
skoobpy/
│
└── dist/
    ├── skoobpy-0.1.3-py3-none-any.whl
    └── skoobpy-0.1.3.tar.gz
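The wheel file name is not random: it follows the pattern name-version-python tag-ABI tag-platform tag. Here py3 means any Python 3, none means no specific ABI is required, and any means it works on any platform, which is what you expect from a pure Python package. A small sketch that splits the name apart (parse_wheel_filename is my own helper, and it assumes there is no optional build tag in the name):

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components (no build tag assumed)."""
    if not filename.endswith('.whl'):
        raise ValueError('not a wheel file name')
    parts = filename[:-len('.whl')].split('-')
    if len(parts) != 5:
        raise ValueError('expected name-version-python-abi-platform')
    keys = ['name', 'version', 'python', 'abi', 'platform']
    return dict(zip(keys, parts))


print(parse_wheel_filename('skoobpy-0.1.3-py3-none-any.whl'))
```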
Checking for errors
The first step is to look inside skoobpy-0.1.3.tar.gz and check that everything is there by running the command below. The new files are created based on the information provided in setup.py.
$ tar tzf ./dist/skoobpy-0.1.3.tar.gz
skoobpy-0.1.3/
skoobpy-0.1.3/PKG-INFO
skoobpy-0.1.3/README.md
skoobpy-0.1.3/setup.cfg
skoobpy-0.1.3/setup.py
skoobpy-0.1.3/skoobpy/
skoobpy-0.1.3/skoobpy/__init__.py
skoobpy-0.1.3/skoobpy/__main__.py
skoobpy-0.1.3/skoobpy/skoobpy.py
skoobpy-0.1.3/skoobpy.egg-info/
skoobpy-0.1.3/skoobpy.egg-info/PKG-INFO
skoobpy-0.1.3/skoobpy.egg-info/SOURCES.txt
skoobpy-0.1.3/skoobpy.egg-info/dependency_links.txt
skoobpy-0.1.3/skoobpy.egg-info/top_level.txt
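If the tar command is not available (on plain Windows, for example), the same listing can be obtained with the tarfile module from the standard library. A minimal sketch, where list_sdist is a name I made up:

```python
import tarfile


def list_sdist(path):
    """Return the member names of a .tar.gz source distribution."""
    with tarfile.open(path, 'r:gz') as archive:
        return archive.getnames()


# Example call, assuming the build step above has been run:
# for name in list_sdist('dist/skoobpy-0.1.3.tar.gz'):
#     print(name)
```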
Using twine to check whether the distribution will render correctly on PyPI is another way to verify that everything is going as planned.
$ twine check dist/*
Checking dist/skoobpy-0.1.3-py3-none-any.whl: PASSED
Checking dist/skoobpy-0.1.3.tar.gz: PASSED
The final check can be performed by uploading the package to TestPyPI. This confirms that the package shows its information on the site and runs as it should. It is mandatory to have an account, as twine will ask for a username and password. After the upload, it is possible to go to TestPyPI, see the package there, and install it to test.
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Uploading the package
The final step of the journey is to upload it to PyPI. Once more it is mandatory to have an account, and it is not the same as the TestPyPI one: you have to register separately on each site. The final command to run is:
$ twine upload dist/*
After following all the steps, just install the package using pip and use it!
$ pip install skoobpy
Conclusion
To summarise, in this post I showed:
- The idea of skoobpy and how to use it
- How I prepared a virtual environment
- How I built the package
- How I ran some tests
- Some ways to check if the package is going to show as expected
- How to upload the package
After some (much!) research to understand and solve many unexpected and unknown errors, I accomplished the goal. I hope it can be helpful to someone out there.
Thank you for reading!
Diego