DEV Community: Vadim Beskrovnov

How to Boost Your Productivity as a Developer

Vadim Beskrovnov — Sun, 04 Dec 2022 19:20:44 +0000

Are you struggling to stay focused and productive as a developer? You're not alone. In today's fast-paced world, it can be difficult to stay on top of your work and achieve your goals. But fear not! In this article, I'll share some simple tips and tricks to help you boost your productivity and get more done in less time.

Create a to-do list. It sounds simple, but having a clear and concise to-do list can make a big difference in your productivity. Start each day by writing down the tasks you need to complete, and prioritize them based on importance and deadline. This will help you stay focused and avoid getting overwhelmed by a long list of tasks.
Set aside dedicated time for focused work. It's easy to get distracted by emails, social media, or other interruptions when you're working. To avoid these distractions, set aside dedicated time for focused work. This could be in the form of a daily time block, where you turn off all distractions and focus on your most important tasks.
Take regular breaks. It may seem counterintuitive, but taking regular breaks can actually help boost your productivity. When you work for long periods of time without a break, your brain becomes fatigued and you lose focus. By taking regular breaks, you give your brain a chance to recharge and come back to your work with renewed energy and focus.
Use productivity tools and apps. There are many productivity tools and apps available that can help you stay organized and on track. Some popular tools include Trello for project management, Evernote for note-taking, and Todoist for task management. Experiment with different tools to find the ones that work best for you.
Surround yourself with a supportive community. As a developer, it can be easy to feel isolated and alone. But having a supportive community of like-minded individuals can make a big difference in your productivity. Consider joining a local developer meetup group, or participating in online forums and communities where you can share ideas and learn from others.

By following these tips, you can boost your productivity as a developer and achieve your goals more efficiently. Give them a try and see how they can help you stay focused and productive in your work.

What are your favorite productivity tips for developers?

Product design interview. My experience

Vadim Beskrovnov — Sat, 10 Sep 2022 16:10:41 +0000

Intro

I would like to share my experience of passing a 45-minute product design interview (NOT a system design interview) at a tech company. Everything described below is my own experience; it is NOT a benchmark answer and does not claim to be one.

Task

You need to design an API to implement a news feed like the one in the following picture.

Solution

Task understanding

As usual, the task itself is very high level and unclear, and our first goal is to ask as many important questions as possible to understand the context of the task.
Let's ask the first portion of questions:

Are there restrictions on the API type? The answer settles the first decision, which API style to use (REST, WebSocket, GraphQL, etc.), or at least shows the interviewer that we care about it.
How many users do we have, and what is the expected load? This is one of the main factors that influence all decisions during the interview, and it is better to understand this requirement right away to save time.
How many client types (iPhone, Android, PC, etc.) are expected? Will the design vary between them? If there is only one client type, or the design is the same for all clients, we can make a single version of the API; otherwise we will have to solve an additional problem.

There are definitely many more questions we could ask, but we should remember that time is limited, and we only need the most critical information to make decisions.

First steps

Based on the answers above, we can start designing the required API. Let's imagine that we have to use a REST API. We need at least one endpoint to get a list of recent posts:

Endpoint

GET /posts

Request

{
    "count": 2
}

Response

{
  "posts": [
    {
      "picture_url": "",
      "text": "",
      "comments_ammount": 42,
      "likes_amount": 146
    },   
    {
      "picture_url": "",
      "text": "",
      "comments_ammount": 22,
      "likes_amount": 246
    }
  ]
}

This endpoint allows us to render the exact layout we need, but it has multiple problems:

How are we going to load the next posts during scrolling? If we make another request with count: 2, we can get the same posts again.
How are we going to sort posts? What if the returned posts are the oldest ones, but we need the newest?
How can we get the actual comments or the list of people who liked a post?

This is not an exhaustive list of issues, but in my opinion they are the most critical, so let's try to fix them.

Improvements

So how can we solve issue #1?
I can see two ways here:

Simple but limited. Load a maximum number of posts, let's say 1000, and allow users to scroll the feed only up to that number.
Use pagination and get the posts in batches.

It's hard to even call it a trade-off, as the choice is obvious: pagination is the better option, because it lets us fetch posts in portions.

Request

{
    "offset": 0,
    "limit": 2
}

Response

{
  "offset": 0,
  "limit": 2,
  "posts": [
    {
      "picture_url": "",
      "text": "",
      "comments_ammount": 42,
      "likes_amount": 146
    },   
    {
      "picture_url": "",
      "text": "",
      "comments_ammount": 22,
      "likes_amount": 246
    }
  ]
}

Now we can get posts batch by batch, which helps us avoid rendering the same post twice.
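To make the offset/limit semantics concrete, here is a minimal sketch of how a server could slice its list of posts. The class and method names are hypothetical, and an in-memory list stands in for whatever storage the real service would use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the interview answer itself: it slices an
// in-memory list the way a server could interpret "offset" and "limit".
class OffsetPagination {

    // Returns at most `limit` items, starting at index `offset`.
    // An offset past the end of the list yields an empty page.
    static <T> List<T> page(List<T> all, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        int from = Math.min(offset, all.size());
        int to = (int) Math.min((long) from + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }
}
```

The client asks for the next batch by increasing offset by limit after each response, and an empty page means the feed has ended.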

To solve problem #2, we need to add more fields so that we can sort our posts. We have at least two options here:

More flexible. We can accept two fields: what to sort by and in which order. For example, sort in ascending order by the comments_ammount field. This makes our solution flexible but increases complexity.
Simpler. In a news feed, users in most cases want to sort posts by date only. So we can have just one field: the sort order. For example, ascending order returns the oldest posts first.

I would choose the simpler option for now; it can always be improved later.

Request

{
    "offset": 0,
    "limit": 2,
    "sort": "ASCENDING"
}

Response

{
  "offset": 0,
  "limit": 2,
  "posts": [
    {
      "picture_url": "",
      "text": "",
      "comments_ammount": 42,
      "likes_amount": 146,
      "date": "2022-09-08T09:45:55Z"
    },   
    {
      "picture_url": "",
      "text": "",
      "comments_ammount": 22,
      "likes_amount": 246,
      "date": "2022-09-10T09:45:55Z"
    }
  ]
}
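As a rough illustration of the single sort field, the sketch below orders posts by date before they are paginated. The Post class is a stand-in with only the fields needed here, and the "DESCENDING" value is my assumption; the request above only shows "ASCENDING".

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FeedSorting {

    // Minimal stand-in for a post; only the fields needed for sorting.
    static final class Post {
        final long id;
        final Instant date;

        Post(long id, String isoDate) {
            this.id = id;
            this.date = Instant.parse(isoDate); // e.g. "2022-09-08T09:45:55Z"
        }
    }

    // Mirrors the "sort" request field: "ASCENDING" returns the oldest posts
    // first, "DESCENDING" (an assumed value) returns the newest first.
    static List<Post> sortByDate(List<Post> posts, String sort) {
        Comparator<Post> byDate = Comparator.comparing((Post p) -> p.date);
        if ("DESCENDING".equals(sort)) {
            byDate = byDate.reversed();
        } else if (!"ASCENDING".equals(sort)) {
            throw new IllegalArgumentException("Unknown sort value: " + sort);
        }
        List<Post> sorted = new ArrayList<>(posts); // leave the input untouched
        sorted.sort(byDate);
        return sorted;
    }
}
```

Rejecting unknown values instead of silently falling back to a default keeps client mistakes visible.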

Let's move on to problem #3: how do we see comments and likes? Again, there are multiple ways to solve it:

Implement a separate endpoint to get all comments/likes for a post by post_id. This approach gives better feed-loading performance, as we transfer less data, but it makes users wait when they want to see comments.
Include comments/likes in the feed response. This approach is the opposite of the previous one: it slows down feed loading but speeds up viewing comments/likes.
Hybrid approach. Include the top N comments/likes in the feed response and implement separate endpoints to get the full data. This one is the most complex, but it offers the best balance.

The hybrid approach looks like the most appropriate here, and it doesn't require a lot of effort, so let's use it.
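A minimal sketch of the hybrid idea: the feed item carries the total count plus only the first N comments, and the rest stays behind the separate endpoint. The class name is hypothetical, comments are reduced to plain strings, and the input list is assumed to be already ordered by whatever "top" means for the product.

```java
import java.util.ArrayList;
import java.util.List;

class HybridFeedItem {
    final int commentsAmount;       // total number of comments on the post
    final List<String> topComments; // only the first N, embedded in the feed

    HybridFeedItem(int commentsAmount, List<String> topComments) {
        this.commentsAmount = commentsAmount;
        this.topComments = topComments;
    }

    // Assumes `rankedComments` is already sorted by the ranking the product
    // wants; the ranking itself is out of scope for this sketch.
    static HybridFeedItem fromComments(List<String> rankedComments, int n) {
        int end = Math.min(Math.max(n, 0), rankedComments.size());
        return new HybridFeedItem(
            rankedComments.size(),
            new ArrayList<>(rankedComments.subList(0, end)));
    }
}
```

The client can compare commentsAmount with topComments.size() to decide whether to show a "view all comments" link.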

Endpoint

GET /posts

Request

{
    "offset": 0,
    "limit": 2,
    "sort": "ASCENDING"
}

Response

{
  "offset": 0,
  "limit": 2,
  "posts": [
    {
      "id": 12345,
      "picture_url": "",
      "text": "",
      "comments_ammount": 42,
      "likes_amount": 146,
      "date": "2022-09-08T09:45:55Z",
      "top_comments": [
        {
          "id": 1,
          "author_id": 2,
          "text": ""
        },
        {
          "id": 2,
          "author_id": 4,
          "text": ""
        }
      ],
      "top_likes": [
        {
          "id": 1,
          "author_id": 5
        },
        {
          "id": 2,
          "author_id": 8
        }
      ]
    },   
    {
      "id": 12346,
      "picture_url": "",
      "text": "",
      "comments_ammount": 22,
      "likes_amount": 246,
      "date": "2022-09-10T09:45:55Z",
      "top_comments": [
        {
          "id": 4,
          "author_id": 10,
          "text": ""
        },
        {
          "id": 5,
          "author_id": 11,
          "text": ""
        }
      ],
      "top_likes": [
        {
          "id": 6,
          "author_id": 12
        },
        {
          "id": 7,
          "author_id": 16
        }
      ]
    }
  ]
}

Endpoint

GET /posts/{id}/comments

Request

{
    "offset": 0,
    "limit": 4,
    "sort": "ASCENDING"
}

Response

{
  "offset": 0,
  "limit": 4,
  "comments": [
    {
      "id": 4,
      "author_id": 10,
      "text": ""
    },
    {
      "id": 5,
      "author_id": 11,
      "text": ""
    },
    {
      "id": 6,
      "author_id": 10,
      "text": ""
    },
    {
      "id": 7,
      "author_id": 11,
      "text": ""
    }
  ]
}

The endpoint for likes looks the same, so we omit it here.

Time ended

It is likely that by this point the 45-minute interview will have come to an end, and it is worth finalizing the decision. We have certainly prepared a working API, but there are still a lot of problems, and it is worth going over them quickly.

What if new posts are created during scrolling? The offset will shift, and the next request will return a duplicated post.
What about picture loading speed? Should we provide multiple picture URLs: a preview (small but low quality) and a full-size version (large with good quality)?
How do we deal with loading new posts? Should we use long polling or a publish/subscribe model?
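The first question in this list can be reproduced in a few lines. The sketch below shows the duplicate that appears with offsets on a newest-first feed, and one common alternative that I did not cover in the interview: asking for posts older than the last one already shown. It assumes post IDs grow with creation time, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CursorPagination {

    // Offset-based page over a newest-first list of post IDs.
    static List<Long> pageByOffset(List<Long> newestFirst, int offset, int limit) {
        int from = Math.min(offset, newestFirst.size());
        int to = Math.min(from + limit, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(from, to));
    }

    // Cursor-based page: up to `limit` posts strictly older than `beforeId`.
    // Assumes that a larger ID always means a newer post.
    static List<Long> pageBefore(List<Long> newestFirst, long beforeId, int limit) {
        List<Long> page = new ArrayList<>();
        for (long id : newestFirst) {
            if (id < beforeId && page.size() < limit) {
                page.add(id);
            }
        }
        return page;
    }
}
```

With the feed [5, 4, 3, 2, 1], the first page (offset 0, limit 2) is [5, 4]. If post 6 is published before the next request, offset 2 returns [4, 3], so post 4 is shown twice, while pageBefore with beforeId 4 returns [3, 2].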

Conclusion

Note that there are several options for solving each issue (trade-offs). Your job as a candidate is to show that you see them and to justify your choice of the best one. And don't worry if, after the interview, you realize that you missed something. The goal of the interview is to see how you think, not to produce the real design of the product.

How I parsed estate marketplace to build price graph stats

Vadim Beskrovnov — Tue, 09 Aug 2022 21:53:27 +0000

My goal

I was looking for a flat to buy, and I wanted to find something cheaper than the market price. The real estate market is quite efficient, so it is almost impossible to find a cheap property manually; that's why I decided to automate the process.

Solution

Idea

To understand whether a specific property is “cheap” or not, we need historical data on previous deals for this property, or at least for similar properties in the same location. There is no public service that provides deal data, but there are a lot of property websites that let you find sale offers. I decided to write an application that parses one such site and saves all properties into a database. Then I could run queries and analyse the price trend in a specific area.

Tools

Based on my experience, I chose Java to implement it. I started with a command-line application using Spring Boot and Spring Batch. The high-level data flow looked as follows.

Architecture

Let's go through these components one by one.

Properties website

This is the website to parse. At that time I was interested in property in Russia, so I used a local portal: https://www.avito.ru. There are multiple categories, including flats. The structure is as follows: each category has a list of ads split across multiple pages, with 50 items per page. Each item contains the information I needed about a specific property.

Page Parser

This is the first component of my application. It receives a category URL as input: /moskva/kvartiry/prodam-ASgBAgICAUSSA8YQ?cd=1&p={page}. As you can see, there are two parameters: the first one, cd, is always the same, and the second one, p, is the page number. Then, using the jsoup library, I read each page in a loop and collected the item URLs.

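// Each ad on a category page is an element with the "item" class.
Elements items = document.select(".item");
for (Element item : items) {
    // The title link carries the relative URL of the ad's own page.
    Elements itemElement = item.select(".item-description-title-link");
    String relativeItemReference = itemElement.attr("href");
    urls.add(relativeItemReference);
}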

After reading each page, I sent the list of URLs to the next component.
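The page loop itself only needs the {page} placeholder to be filled in. Here is a small sketch of that step; the helper name is hypothetical, and the page range is passed in by the caller because how the last page was detected is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

class CategoryPages {

    // Expands a template such as ".../prodam-...?cd=1&p={page}" into one URL
    // per page number, from firstPage to lastPage inclusive.
    static List<String> pageUrls(String template, int firstPage, int lastPage) {
        List<String> urls = new ArrayList<>();
        for (int page = firstPage; page <= lastPage; page++) {
            urls.add(template.replace("{page}", Integer.toString(page)));
        }
        return urls;
    }
}
```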

ID exists filter

Each item has a URL like the following: /moskva/kvartiry/2-k._kvartira_548m_911et._2338886814, which contains an identifier at the end (2338886814). This is the unique ID of the ad. I used it as a cache key to avoid parsing the same item twice.

But some items could still be parsed twice, because the cache was written at a later stage, so multiple ads with the same ID could pass this gate in the meantime.
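One way to close that gap is to record the ID at the moment it passes the filter, so that checking and writing happen in a single step. The sketch below is an assumption about how this could look, not what my application did: it uses an in-memory concurrent set instead of the real cache, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeenAdFilter {
    // The ad ID is the run of digits after the last underscore in the URL.
    private static final Pattern TRAILING_ID = Pattern.compile("_(\\d+)$");

    // Thread-safe set, so several parser threads can share one filter.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    static Optional<String> extractId(String itemUrl) {
        Matcher matcher = TRAILING_ID.matcher(itemUrl);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    // Returns true only the first time an ID is offered. Set.add checks and
    // records in one atomic step, so two threads cannot both get true.
    // URLs without a trailing ID are rejected.
    boolean firstTimeSeen(String itemUrl) {
        return extractId(itemUrl).map(seen::add).orElse(false);
    }
}
```

The trade-off is that an ID stays marked even if parsing the item fails later, so a retry mechanism would have to remove it from the set again.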

Item Parser

After the filter, all unique IDs went to the next component, the Item Parser. It uses the ID to open the item page and read all the data from it.

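Elements attributes = doc.select(".item-params-list-item");
// Each attribute is rendered as "name: value". Split on the first colon only and
// skip entries without a colon, so that reading the value part cannot fail.
Map<String, String> attrs = attributes.stream()
    .map(attr -> attr.text().split(":", 2))
    .filter(parts -> parts.length == 2)
    .collect(Collectors.toMap(
        parts -> parts[0].trim(),
        parts -> parts[1].trim(),
        (first, second) -> first)); // on a repeated name, keep the first value

// "Общая площадь" is the total area. If the attribute is missing, the default ""
// reaches Double.parseDouble, which throws NumberFormatException.
estate
    .setTotalSpace(Double.parseDouble(attrs.getOrDefault("Общая площадь", "").split(" ")[0]));

// "Жилая площадь" is the living area; the same caveat applies.
estate
    .setLiveSpace(Double.parseDouble(attrs.getOrDefault("Жилая площадь", "").split(" ")[0]));

...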

As a result, an object with all the property info is built and passed on to the Saver component.

Saver

This component is the last one in my pipeline. It receives items from the Item Parser, converts them to JSON, and saves them to Elasticsearch in batches to improve performance.
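The batching part can be sketched without any Elasticsearch code: the items are split into fixed-size groups, and each group would then be sent in one bulk request. The helper below is hypothetical, and the batch size is a parameter because I do not state the one I used.

```java
import java.util.ArrayList;
import java.util.List;

class Batches {

    // Splits `items` into consecutive batches of at most `batchSize` elements.
    // The last batch may be smaller than the others.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```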

As a result, I was able to build multiple Kibana dashboards with price and popularity metrics. One of the most useful components is the interactive map, which renders data by coordinates (I got the coordinates from the ad's description). It helps me find a perfect property in a good area at a good price.

Problems

During this experiment, I faced some problems and tried different solutions, which I want to share.

IP address blocking

As you can guess, nobody wants their data to be parsed, so this site has several layers of protection. During development everything worked fine, because I made only a small number of requests. But as soon as I started testing, I ran into a huge number of 403 errors.

Firstly, I tried to use multiple headers and cookies to simulate a real user with a browser.

Document imageDoc = Jsoup
    .connect(url)
    .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
    .header("referer", "https://www.avito.ru" + relativeItemReference)
    .header("accept", "*/*")
    .ignoreContentType(true)
    .get();

It didn’t help me. I think that they have much more intelligent checks than just verifying userAgent.

So my next attempt was to find the smallest delay between requests that would let me avoid blocking. To find it, I used free VPN services so that I could quickly change IP addresses. Experimentally, I arrived at a minimum delay of 25 seconds. But this means that I can parse only ~3500 items per day, which is definitely not enough.
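The daily figure follows directly from the delay: a day has 86,400 seconds, and 86,400 / 25 = 3,456 requests. The sketch below does the same arithmetic and adds a hypothetical parallelWorkers parameter, under the assumption that each worker is limited independently to one request per delay.

```java
class ParsingThroughput {

    // Upper bound on items parsed per day when every worker waits
    // `delaySeconds` between its own requests.
    static long itemsPerDay(int delaySeconds, int parallelWorkers) {
        if (delaySeconds <= 0 || parallelWorkers <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        long secondsPerDay = 24L * 60 * 60; // 86,400
        return secondsPerDay / delaySeconds * parallelWorkers;
    }
}
```

With one worker and a 25-second delay this gives 3,456 items per day; ten independent workers would give 34,560.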

To increase parsing speed, I decided to parallelize my algorithm and use a separate proxy for each thread.

doc = Jsoup.connect(url)
    .proxy(proxy.getHost(), proxy.getPort())
    .userAgent("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2")
    .header("Content-Language", "en-US")
    .timeout(timeout)
    .get();

So my only limitation now was the number of available proxies. I didn't want to pay for them, so I had to use free public proxies, some of which were slow and some unstable.

My next improvement was to choose the best proxies from my list for each request. I made a scheduled job that checked each proxy every N minutes and saved useful metadata such as connection speed and number of errors.

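// fixedDelay is in milliseconds: the next check starts 100 ms after the
// previous one finishes, so proxies are re-checked one at a time.
@Scheduled(fixedDelay = 100, initialDelay = 1)
@Transactional
public void checkProxy() {
    ProxyEntity foundProxy = getProxyWithOldestUpdate()
        .orElseThrow(() -> new RuntimeException("Proxy not found"));

    // Try up to retryCount times before marking the proxy as down.
    int retries = 0;
    while (retries < retryCount) {
        log.debug("Attempt {}", retries + 1);
        if (checkProxy(foundProxy)) {
            log.debug("Proxy [{}] UP", foundProxy.getHost());
            foundProxy.setActive(true);
            break;
        }
        RequestUtils.wait(1000);
        retries++;
    }
    // The loop ended without a successful check.
    if (retries == retryCount) {
        foundProxy.setActive(false);
        log.debug("Proxy [{}] DOWN", foundProxy.getHost());
    }
    // Record the check time so that another proxy becomes the oldest one.
    foundProxy.setCheckDate(LocalDateTime.now());
    proxyRepository.save(foundProxy);
}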
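Once the metadata is collected, picking a proxy for a request becomes a small ranking problem. The sketch below is an assumption about how such a choice could be made: ProxyStats and its fields are hypothetical stand-ins for the saved metadata, and the ranking rule (fewest errors first, then the fastest response) is only one possible choice.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ProxyPicker {

    // Hypothetical snapshot of the metadata saved by the health check.
    static final class ProxyStats {
        final String host;
        final boolean active;
        final int errorCount;
        final long responseMillis;

        ProxyStats(String host, boolean active, int errorCount, long responseMillis) {
            this.host = host;
            this.active = active;
            this.errorCount = errorCount;
            this.responseMillis = responseMillis;
        }
    }

    // Picks the active proxy with the fewest errors; ties are broken by the
    // lower response time. Returns empty if no proxy is active.
    static Optional<ProxyStats> pickBest(List<ProxyStats> proxies) {
        return proxies.stream()
            .filter(p -> p.active)
            .min(Comparator.comparingInt((ProxyStats p) -> p.errorCount)
                .thenComparingLong(p -> p.responseMillis));
    }
}
```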

Items duplicating

Sometimes people create a new ad for the same property, so the system sees two different ads with different IDs. This is fine for a property website, but not for statistics and data analysis.

To get rid of duplicated items in my database, I simply compared titles and descriptions as strings. Sometimes this gives a false positive, so I removed not a real duplicate but a different ad with the same text. This is the opposite situation: it is totally fine for data analysis but would be critical for a property website. Anyway, it solved my problem.
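A minimal sketch of that check, assuming the ads are already loaded into memory: two ads count as duplicates when both title and description match exactly, and the first one seen is kept. The Ad class and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AdDeduplicator {

    // Hypothetical minimal view of a stored ad.
    static final class Ad {
        final String id;
        final String title;
        final String description;

        Ad(String id, String title, String description) {
            this.id = id;
            this.title = title;
            this.description = description;
        }
    }

    // Keeps the first ad for every distinct (title, description) pair.
    // The comparison is exact, so ads with different IDs but identical
    // text are treated as one property.
    static List<Ad> dropDuplicates(List<Ad> ads) {
        Set<String> seenTexts = new HashSet<>();
        List<Ad> unique = new ArrayList<>();
        for (Ad ad : ads) {
            // A NUL separator keeps "ab" + "c" distinct from "a" + "bc".
            String key = ad.title + '\u0000' + ad.description;
            if (seenTexts.add(key)) {
                unique.add(ad);
            }
        }
        return unique;
    }
}
```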