
Nityesh Agarwal


Advance your Python skills by Building a Whatsapp Chat Analyser: a Guided Project

Finding ways to apply your knowledge after the learning process essentially means that the learning happened without much sense of a destination. All we were trying to do was amass all the knowledge we could, in the hope that it would come of use in some distant, mystical future.

Doesn't that feel like procrastination?

I believe in an approach to learning that gives primary importance to doing projects.

When you try to make something, you discover a hundred things that you don’t know. You discover things that you thought you knew but don’t really know. You trip over things that seemed so simple that you didn’t even pay attention to them. You fill the gaps in your learning.

Also, it is super fun and adventurous!

You can get all that only if you do a project. So, I think that it’s worth it to center your entire learning around completing a project.

If you want to dive into building something interesting and learn useful Python/programming skills along the way, this guide is for you.

With this guide, I aim to walk you through building something interesting, let you experience hard-to-grasp programming intuitions as you build it, and help you go from a basic Pythonista to an advanced one.

Most importantly, I want to give you motivation and the incentive for you to teach yourself.

What you will learn

Here are some textbook skills that you will pick up:

  • File handling
  • String operations in Python
  • Functions
  • Modules
  • pip and using 3rd party packages
  • Regular Expressions (RegEx) in Python

But this is not a textbook. So along with them, you will also develop intuitions about good programming practices like:

  • The importance of readability of your code and coding style
  • When and how to break your code into functions
  • How to go about debugging your code (when you want to bang your head against the wall, instead)
  • How to look things up on the Internet - use Google, use StackOverflow, read documentation etc.
  • Understand the need for different data structures and when to use what

Let's get to it then!

Q: "Alright, what am I building?" 😃

Okay so here's the idea:

When chatting with a close friend, have you ever wanted to know -

  • the number of messages sent by each of you
  • the average length of your messages
  • who texts first and the first text in each conversation
  • your chatting time patterns - hourly, daily and monthly
  • most shared website links
  • most common words that each of you use

Wouldn't it be cool if you wrote a program that would just calculate all this stuff for you?!?

Q: "But how cool is it, really?" 😒

Reddit says that it is "14k points"-cool!

Whatsapp Chat Analyser on Reddit

Your program is going to find similar results and print them for you without those graphs and visuals.

Q: "Cool! But am I ready?" 😳

“Every great developer you know got there by solving problems they were unqualified to solve until they actually did it.”

— Patrick McKenzie

Thinking along these lines, I believe that:

  • If you know the basics of the following in Python - variables, lists, dictionaries, loops, conditions, functions - you are ready.
  • Otherwise, if you are new to Python but know the basics in some other language - go through this quick Python tutorial and I think you'll be ready.

Just dive into the 1st "hello world"-equivalent exercise below. If you can complete it, you are ready!

Q: "And how will I build it?" 😕

Whatsapp allows you to export any chat into a text file that looks something like this - 

Whatsapp chat with Piyush screenshot

So you can write a program that will read this chat file, parse it, analyse it and give you the results.

But that's not enough help, right?

  • That is why I have written this short guide for you to follow like a roadmap. I have divided the task of building into 10 milestones (MS) and have written small pieces of advice on what you need to learn to cover each milestone. Treat it like an apprenticeship.

"Okay, let's do it then!" 😃

A Roadmap for building a Whatsapp Chat Analyser

MS0: Set up your environment

When you are starting out, you don't want to spend hours setting up your environment. Half your motivation gets killed right there! Right? is the way out of the setup-frustration.

It is a website that provides an online IDE for almost every language, which you can access for free with just a few clicks. It is great for small projects like the one we are building.


MS1: An Assurance That Things Work (the "Hello World!" equivalent)

Every programming book/tutorial ever starts out with a "Hello World!" program. Why is it so?

Apart from being welcoming to newcomers, this program does the job of reassuring the learner that her environment is set up and that things work. So, if she does it right, her program will work too!

With these goals in mind, here is your Hello World-equivalent program:

Print "I love you 3000".. 3000 times! (Any Marvel fans out there? :p)

I love you 3000 Marvel

This is a good opportunity to go deep and:

  • See if you are ready to dive deeper into the project

If not, then it's time to learn the basics of Python. Don't worry, it isn't too difficult.

MS2: Read your chat file using your Python program

From here onwards, you will build a piece of the project with each chapter.

There are 2 files that you will need for the project - 

  1. Your Whatsapp chat file (ending in .txt)
  2. A Python code file (ending in .py)

Once you have them, this first chapter requires you to open the chat file using your Python program and print all of its contents.

This is a good opportunity to go deep and:

  • Understand how to handle files with Python

File handling in Python - from Zero to Hero:

You know that any editor that you use to open a text file on your computer (Notepad, VS Code, Vim, etc.) is a program, right?

You know what? - you can make your own Python program do that. Almost easily!

Go through this excellent tutorial by Real Python to learn the concepts of file handling in Python.
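To make the milestone concrete, here is a minimal sketch of reading a chat file, not the only way to do it. The file name chat.txt and the helper name read_chat_lines are assumptions made up for this illustration, and the sketch assumes your export is UTF-8 encoded.

```python
from pathlib import Path


def read_chat_lines(path):
    """Return the lines of the chat file without their trailing newlines."""
    # 'with' closes the file for us, even if an error occurs while reading.
    with open(path, encoding="utf-8") as chat_file:
        return [line.rstrip("\n") for line in chat_file]


chat_path = Path("chat.txt")  # hypothetical name; use your own export here
if chat_path.exists():
    for line in read_chat_lines(chat_path):
        print(line)
else:
    print(f"Put your exported chat at {chat_path} first.")
```

Returning the lines from a function (instead of printing inside it) means later milestones can reuse the same helper.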

MS3: Features #1 and #2 - Count the total no. of messages and total no. of words

Count the number of messages that you and your friend have exchanged. Then, count each person's individual share - both by number of messages and by number of words. Print the results.

This is a good opportunity to go deep and:

  • Understand Strings in Python

Important things to remember about strings in Python:

  • Strings are sequences of characters, much like lists. So you can search for a piece of text inside a string like this:
if "- Paridhi:" in chat_line:
  • Python strings are famous (as compared to the ones in other languages) because Python powers them with a rich library of built-in methods that you can use to perform operations on them. I suggest that you use this tutorial by W3Schools as your reference material for those methods.

  • Python's ability to slice and negative index strings can be really handy at times!
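The three points above can be tried out in a few lines. The chat line below is made up for illustration; the exact layout of your own export (date format, separators) may differ.

```python
# A made-up line in the style of an exported chat.
chat_line = "12/31/19, 9:15 PM - Paridhi: Happy new year!!"

# 'in' checks whether one string occurs inside another.
sent_by_paridhi = "- Paridhi:" in chat_line

# split() with maxsplit=1 cuts only at the first ": " separator.
header, message = chat_line.split(": ", 1)

# String methods return new strings; the original stays unchanged.
lowercase_message = message.lower()
word_count = len(message.split())

# Slicing takes a range of characters; negative indices count from the end.
date_part = chat_line[:8]
last_two_characters = message[-2:]

print(sent_by_paridhi, message, word_count, date_part, last_two_characters)
```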

Caution: From now onwards, you will feel your program grow in size and complexity. As it does so, you should start paying attention to your coding style and keep the readability of your code in mind.

Coding style and readability of code:

Brian W. Kernighan says in his book - The Practice of Programming:

"The purpose of style is to make the code easy to read for yourself and others, and good style is crucial to good programming."


Personally, whenever I try to take decisions about readability of my code, this line from the Zen of Python plays in my brain:

Explicit is better than implicit

Here are 3 simple, actionable rules that you can keep in mind to develop a good coding style:

1. Put some thought into choosing your variables' names

I find Brian W. Kernighan's advice really helpful here:

  • Global functions, classes, and structures should have descriptive names that suggest their role in a program.
  • By contrast, shorter names suffice for local variables; within a function, n may be sufficient, npoints is fine, and numberOfPoints is overkill.
  • Local variables used in conventional ways can have very short names. The use of i and j for loop indices, p and q for pointers, and s and t for strings is so frequent that there is little profit and perhaps some loss in longer names.

2. Use functions wherever necessary

  • Break long pieces of code into functions
  • Don't Repeat Yourself (DRY) - use functions to remove duplicate pieces of code

More on functions in the next chapter.

3. Write helpful comments

  • Comments are meant to help the reader of a program. They do not help by saying things the code already plainly says, or by contradicting the code, or by distracting the reader with elaborate typographical displays.
  • As much as possible, write code that is easy to understand; the better you do this, the fewer comments you need. Good code needs fewer comments than bad code. Comments are, at best, a necessary evil.
  • Don't contradict the code. Most comments agree with the code when they are written, but as bugs are fixed and the program evolves, the comments are often left in their original form, resulting in disagreement with the code.

In the end, remember that the principles of programming style are based on common sense guided by experience, not on arbitrary rules and prescriptions.

MS4: Feature #3 - Calculate the average length of messages sent by each party

Now that you have calculated your individual shares using 2 metrics - message count and word count - you can use them to calculate the average length of each person's messages. Print the results.

This is a good opportunity to go deep and:

Understand functions as a means to:

  • reduce repetition
  • make code more readable

Deep dive into using functions - motivation and style:

Duplication may be the root of all evil in software. Functions were one of the first techniques developed to control this evil.

It is easy to understand the syntax of writing functions but it takes practice and some sense of design to learn when to break the code into functions. One goal is to design functions such that they can be reused when extending your program to new cases.

What's more, making such design choices is what makes programming fun!

Here are 3 heuristics from Bob Martin's book Clean Code that will guide you while making such choices:

  • Functions should be small; how small? No more than a screenful or 20 lines
  • Functions should have descriptive names. The smaller and more focused a function is, the easier it is to choose a descriptive name. Don’t be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment.
  • Functions should do only one thing and have no "side effects" - a function's intent should be clear from its name

When you first write a function it will probably come out long and complicated and not follow any of the above rules. And that's ok. You can refine and reformat your code later. I don't think anyone could start with writing functions that follow all the rules mentioned above.

Remember they are function-building goals that you need to strive towards. Don't let them paralyse you.
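As one possible illustration of these goals, here is a sketch of two small functions for the average-length feature. The function names and the sample data are invented for this example; "length" is taken to mean word count here, which is only one way to read it.

```python
def count_words(message):
    """Return the number of whitespace-separated words in a message."""
    return len(message.split())


def average_message_length(messages):
    """Return the mean word count per message, or 0.0 if there are none."""
    if not messages:
        return 0.0
    return sum(count_words(message) for message in messages) / len(messages)


# Hypothetical data: each sender mapped to the messages they sent.
messages_by_sender = {
    "Paridhi": ["Happy new year!!", "Where are you right now?"],
    "Nityesh": ["On my way"],
}

for sender, messages in messages_by_sender.items():
    print(sender, average_message_length(messages))
```

Each function does one thing and returns a value instead of printing it, so the same code works for two people or for twenty.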

MS5: Feature #4 - Count no. of first texts and show them

Do you want to resolve the issue of "who texts first" once and for all? 😜
After this milestone, you will. You will know exactly how many conversations each of you has initiated and have a list of those first texts. Print all that out.

This is a good opportunity to go deep and:

  • Understand modules  - you'll need Python's time module here
  • Learn how to look things up and read the documentation

Caution: Don't be intimidated by the docs. They're your friends.

What are modules?

Every file of Python source code whose name ends in a .py extension is a module.

A Python installation comes with a standard library that contains many such modules out-of-the-box. These are useful pieces of code that you don't have to write!
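Here is a rough sketch of putting a standard-library module to work for this milestone. It follows the hint from the comments at the end of this post (compare consecutive message times against a threshold), but it uses the datetime module rather than time, purely as a swapped-in choice for this illustration. The timestamp layout, the 6-hour gap and the function names are all assumptions; adjust them to your own export.

```python
from datetime import datetime, timedelta

# Assumed layout such as "12/31/19, 9:15 PM"; exports differ by phone and locale.
TIMESTAMP_FORMAT = "%m/%d/%y, %I:%M %p"


def parse_timestamp(chat_line):
    """Return the datetime at the start of a chat line, or None if absent."""
    stamp, separator, _ = chat_line.partition(" - ")
    if not separator:
        return None
    try:
        return datetime.strptime(stamp, TIMESTAMP_FORMAT)
    except ValueError:
        return None  # e.g. the continuation of a multi-line message


def find_conversation_starts(chat_lines, gap=timedelta(hours=6)):
    """Return the lines that follow a silence longer than `gap`."""
    starts = []
    previous = None
    for line in chat_lines:
        current = parse_timestamp(line)
        if current is None:
            continue
        if previous is None or current - previous > gap:
            starts.append(line)
        previous = current
    return starts
```

What counts as a "new conversation" is a design decision; try a few values for the gap and see which one matches your own sense of it.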

MS6: Feature #5 - Chatting time patterns - hourly, daily and monthly

Now, it's time to find out your usual chatting patterns.

  • What hour of the day do you chat the most? What about the rest of the hours?
  • Which day of the week do you usually chat the most? What about the rest of the days?
  • Which month have you chatted the most? What about the rest?

Print the results.

This is a good opportunity to go deep and:

  • Understand the need for different data structures for storing all this data and think about how to design a data structure to suit your needs

Note: You will need the time module again here. It's important for you to know that it's okay if you don't remember it; you are allowed to use Google and check the documentation as many times as you need.
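One possible data structure for these patterns is a counter per time unit, keyed by hour, weekday or month. The sketch below uses collections.Counter and datetime objects (an assumption of this example, not the only option), and the message times are made up.

```python
from collections import Counter
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Hypothetical, already-parsed message times.
message_times = [
    datetime(2019, 12, 31, 21, 15),
    datetime(2019, 12, 31, 21, 16),
    datetime(2020, 1, 1, 10, 0),
]

# Each Counter maps a key (hour, weekday name, or year-month pair) to a count.
by_hour = Counter(moment.hour for moment in message_times)
by_weekday = Counter(WEEKDAY_NAMES[moment.weekday()] for moment in message_times)
by_month = Counter((moment.year, moment.month) for moment in message_times)

print(by_hour.most_common(1))
print(by_weekday.most_common())
print(by_month.most_common())
```

A Counter is a dictionary underneath, so hours with no messages are simply missing; decide whether you want to show them as zero when you print.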

Caution: Implementing this can be quite tricky. You are likely to spend a majority of your coding time banging your head over broken code.
Remember: "It's not the computer but your code that is at fault." :)

How to debug your code:

  • Explain the code to a friend or use the "Rubber duck technique"
  1. Pick a friend (or a rubber duck)
  2. Open the problematic code and explain it to him (/her/it), line by line, slowly and patiently
  3. Find the problem staring at you, in your face, without any help of your friend (or the duck), as if by magic!
  • Add print statements
Print statement debugging meme
You know.. these print statements! 😂

Although adding such print statements isn't the correct way to debug, I find them incredibly effective at times, especially when I'm working in a text editor like Vim and not in a full-fledged IDE that has a debugger (or when you are too lazy to learn how to use a debugger :p).

But I have to say, once you learn how to use an IDE debugger, there is no going back..

  • Use an IDE debugger

As of writing, doesn't fully support a debugger yet. My favourite IDEs for Python that do support it are PyCharm or VS Code.

A debugger can be so useful that I recommend you make the switch and learn how to use the debugger in it. Trust me, it is totally worth the pain! (Especially now that your code is of considerable complexity.)

Personal advice: I say "IDE debugger" because Python also provides a debugger in its standard library - the pdb module - and I suggest that you don't get into using it just yet.

MS7: Feature #6 - Most shared websites

This is a good opportunity to go deep and:

  • Learn RegEx
  • Understand Python dictionaries as traditional hashtables: mapping from website name to the number of occurrences

Quick intro to RegEx

A regular expression is a special text string for describing a search pattern.

You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. You can think of regular expressions as wildcards on steroids.
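For the shared-websites feature, a pattern could capture whatever follows http:// or https:// up to the next slash or space, and a plain dictionary could map each site to the number of times it appears. The sketch below is one reading of that idea; the pattern, the function name and the sample messages are assumptions for illustration.

```python
import re

# Capture the text after http:// or https:// up to the next "/" or whitespace.
SITE_PATTERN = re.compile(r"https?://([^/\s]+)")


def count_shared_sites(messages):
    """Return a dict mapping each site name to how often it was shared."""
    site_counts = {}
    for message in messages:
        for site in SITE_PATTERN.findall(message):
            # get() returns 0 the first time a site is seen.
            site_counts[site] = site_counts.get(site, 0) + 1
    return site_counts


# Made-up messages.
sample_messages = [
    "see https://dev.to/some/post and http://example.com",
    "https://dev.to/another-post",
]
site_counts = count_shared_sites(sample_messages)
ranked_sites = sorted(site_counts.items(), key=lambda item: item[1], reverse=True)
print(ranked_sites)
```

This simple pattern will also swallow punctuation that directly follows a link without a slash (for example a trailing comma), so expect to refine it against your real chat.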

"I want every string that is between "http://" or "https://" and the second / after that, if present. Or else, the first /."

Here are a few favourite resources to learn Regex:

MS8: Feature #7 - Most common words

I'll let you figure this one out on your own!

MS9: Print all of the above in pretty, neat tables

You must be using some print statements to print the results of each milestone. Now, it's time to focus on the presentation of those results. Print all the above results in pretty, neat tables.

To do this, you might need to restructure a large portion of the code in order to decouple the print statements from the function definitions (assuming you haven't already been doing it).

This is a good opportunity to go deep and:

  • Realize what it means when people advise - "functions should do just one thing"
  • Learn to search, install and use 3rd party modules that Python's awesome, vibrant community provides through pip
  • Give a personal touch to the project with the way you design the tables!

Quick primer on Python's rich ecosystem of open-source, 3rd-party packages

Python's ecosystem has contributors ranging from individual developers to megacorps like Facebook and Google (rich ecosystem, eh? :p). They offer modules and libraries of code to aid in website construction, numeric programming, game development, data science, machine learning, deep learning and well, printing pretty tables.

Now, that's a whole lot of code you don't have to write!

PyPI is the home to all these 3rd-party Python packages. You can find a page on every open-source, 3rd party package here.

Here are a few things that will get you up to speed with PyPI:

  • You can install any package using a simple terminal command - pip. You can find exactly what you need to type on a package's page on PyPI.
pip install tabulate
  • Any good package also has a How To Use guide (or documentation) on its page on PyPI
  • Even newbies can publish their experimental packages to PyPI as well. You should be careful before using them; they may be incomplete or unmaintained. You can check out a package's Release History or its GitHub statistics to determine its credibility.
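As a sketch of what decoupling results from presentation might look like, the example below keeps the calculation and the table rendering in separate functions. It uses the tabulate package if it is installed and falls back to plain columns otherwise; the function names and the message counts are invented for this illustration.

```python
def message_count_rows(counts):
    """Turn {sender: count} into rows sorted by count, highest first."""
    return sorted(counts.items(), key=lambda row: row[1], reverse=True)


def render_table(headers, rows):
    """Return the rows as a table string; printing is left to the caller."""
    try:
        from tabulate import tabulate  # third-party: pip install tabulate
    except ImportError:
        # Plain fallback so this sketch still runs without the package.
        lines = ["  ".join(str(cell) for cell in row) for row in [headers, *rows]]
        return "\n".join(lines)
    return tabulate(rows, headers=headers, tablefmt="grid")


counts = {"Nityesh": 95, "Paridhi": 120}  # hypothetical results
table = render_table(["Sender", "Messages"], message_count_rows(counts))
print(table)
```

Because render_table only returns a string, you can change how the tables look without touching any of the functions that compute the numbers.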

MS10: Make all of this work for group chat files

With this milestone, you will be extending your program to a new case - group chats. Up until this point, you have been working with a direct message chat file with one friend. Now, you will modify your program so that it works with Whatsapp group chat files as well.

This is a good opportunity to go deep and:

  • Evaluate your functions. Are you able to reuse at least some of them?
  • Feel the benefits of a good coding style and good programming practices
  • See the importance of a version control system and learn Git

Good software

It will do you well to remember what Brian W. Kernighan says about good software in his book The Practice of Programming:

The basic principles that form the bedrock of good software are simplicity, which keeps programs short and manageable; clarity, which makes sure they are easy to understand, for people as well as machines; generality, which means they work well in a broad range of situations and adapt well as new situations arise; and automation, which lets the machine do the work for us, freeing us from mundane tasks.

Alright, I hope this has been useful for you. You will gain a true understanding of all the mini-lessons in this guide, once you actually dive into doing the project yourself.

Here's some code to give your start a boost:

Don't be afraid of starting out because things will get difficult when you get stuck. That is the adventure; it will feel super cool every time you dig yourself out.

Also, you can ask me or your fellow learners your doubts in the Build To Learn Slack group.

As an ending note, I would like you to remember the words of Jen Simmons as you work on this project (or any other programming project for that matter):

Whatsapp Chat Analyser is one of the 20 cool programming projects that I mentioned in the last post in the series - Build To Learn. If you want me to do a similar guide for any of the others, feel free to comment below or reach out to me directly!

Subscribe to the Build To Learn newsletter to get an email when I do new guides and articles.

You can reach out to me on both Twitter and LinkedIn.

Top comments (9)

Kostas Ntine

Wow! A couple of months ago I analyzed the chat history I had with my now ex-girlfriend. I wanted to investigate the timeseries of certain parameters e.g. number of messages, specific emojis etc. in order to check at which point our distance relationship was declining. That was so fun even though we broke up😂😭

Nityesh Agarwal • Edited

Hahha! That is so geeky, I love it! 😜 Awesome (and sad) stuff man!

Brian Ketelboeter

I don't have any Whatsapp chats to use. Suggestions?

Nityesh Agarwal

You can follow the rest of the guide with any other chat file as well. Can you get your chat archive from some other chat service that you use?

Damian Emerah

Hello, please how can i separate the time and the dates in the text file from the main chat?

Nityesh Agarwal

Hey @demigod, you can use the time module for that. Sorry for the late response. I would request you to join the above mentioned Slack group to get faster help from me or your peers as well! :)

bharzzy

Hello Nityesh, you've got an awesome project here. How do i join the slack group? is there a link or something... pls post below

hamzamateen

Hello! I love your project, but can you give me a hint on how to Code MS5: Feature 4, btw thanks for guided project.

Nityesh Agarwal

Hey Hamza, you could try taking the difference between every 2 consecutive messages and see if it is more than a certain threshold. You'll probably need to use Python's time module here.

Sorry for the late response. I would request you to join the above mentioned Slack group to get faster help from me or your peers as well! :)