<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tomas Laurinavicius</title>
    <description>The latest articles on DEV Community by Tomas Laurinavicius (@tomaslau).</description>
    <link>https://dev.to/tomaslau</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1033215%2F132ff903-3fe3-49ef-bfc1-8237fc3abf71.png</url>
      <title>DEV Community: Tomas Laurinavicius</title>
      <link>https://dev.to/tomaslau</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tomaslau"/>
    <language>en</language>
    <item>
      <title>Digital Warm Up</title>
      <dc:creator>Tomas Laurinavicius</dc:creator>
      <pubDate>Tue, 14 Jan 2025 18:25:14 +0000</pubDate>
      <link>https://dev.to/tomaslau/digital-warm-up-3ppn</link>
      <guid>https://dev.to/tomaslau/digital-warm-up-3ppn</guid>
      <description>&lt;p&gt;I've realized that even when working on projects I'm passionate about, I need a digital warm-up period. This discovery came from my recent increased focus on coding.&lt;/p&gt;

&lt;p&gt;Sometimes, grasping the big picture requires significant mental resources, and this is when my brain tends to seek diversions.&lt;/p&gt;

&lt;p&gt;However, I've noticed an interesting pattern in my work habits.&lt;/p&gt;

&lt;p&gt;Instead of completely avoiding work, I find myself gravitating toward smaller but still valuable tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organizing project files and documentation&lt;/li&gt;
&lt;li&gt;Cleaning up code and removing redundancies&lt;/li&gt;
&lt;li&gt;Completing small, manageable tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These activities create a sense of accomplishment and progress, effectively tricking my brain into a productive mindset. After this warm-up period, I naturally transition into the main project, often achieving a flow state that lasts for 2-3 hours of focused work.&lt;/p&gt;

&lt;p&gt;What I've come to understand is that the actual coding isn't the challenging part.&lt;/p&gt;

&lt;p&gt;The real complexity lies in the mental compilation process—organizing thoughts and concepts before translating them into prompts or actual code.&lt;/p&gt;

</description>
      <category>coding</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Boost SEO: A Comprehensive Guide to Crawl Budget Optimization (2024)</title>
      <dc:creator>Tomas Laurinavicius</dc:creator>
      <pubDate>Wed, 03 Jan 2024 16:33:55 +0000</pubDate>
      <link>https://dev.to/tomaslau/boost-seo-a-comprehensive-guide-to-crawl-budget-optimization-2024-2f3n</link>
      <guid>https://dev.to/tomaslau/boost-seo-a-comprehensive-guide-to-crawl-budget-optimization-2024-2f3n</guid>
      <description>&lt;p&gt;Crawl Budget is an SEO term.&lt;/p&gt;

&lt;p&gt;It’s the number of pages a search engine like Google will &lt;a href="https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget"&gt;visit and index&lt;/a&gt; on your website over a certain period. It depends on two things: crawl limit and crawl demand.&lt;/p&gt;

&lt;p&gt;For more on optimizing your crawl budget to boost SEO, check out this free guide: &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;A Technical SEO’s Guide to Crawl Budget Optimization&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;Why Crawl Budget Matters for SEO&lt;/h2&gt;

&lt;p&gt;Crawl budget is critical for SEO because it affects how search engines find and index your site’s pages.&lt;/p&gt;

&lt;p&gt;If Google doesn’t index a page, it isn’t in Google’s database at all, so it can’t rank in search results.&lt;/p&gt;

&lt;p&gt;If your site has more pages than your crawl budget covers, some pages won’t be indexed. These pages can still be accessed directly but won’t attract search engine traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XH2EYhTb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o7q7ia40b8vr770cf42m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XH2EYhTb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o7q7ia40b8vr770cf42m.png" alt="Increased traffic" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image source: &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;Prerender&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most sites don’t need to worry about crawl budget, as Google is efficient at finding and indexing pages.&lt;/p&gt;

&lt;p&gt;However, it’s important in these situations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Large sites:&lt;/strong&gt; If your site (like an e-commerce site) has 10k+ pages, Google might only find some of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New pages:&lt;/strong&gt; If you’ve added a new section with hundreds of pages, ensure your crawl budget can accommodate quick indexing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redirects:&lt;/strong&gt; Numerous redirects and redirect chains can consume your crawl budget.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Understanding Crawl Budget and Crawl Limit&lt;/h3&gt;

&lt;p&gt;The crawl limit (also called crawl capacity) is the maximum amount of crawling your site can handle without being overwhelmed. It fluctuates based on several factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Crawl health&lt;/strong&gt;: If your site responds quickly, the limit increases, allowing more connections for crawling. If your site slows down or returns server errors, the limit decreases, and &lt;a href="https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget"&gt;Googlebot crawls less&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit set in Search Console:&lt;/strong&gt; You can choose to reduce Googlebot’s crawling of your site.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google’s crawling capacity:&lt;/strong&gt; Google has many resources, but they are not unlimited.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;What Determines Crawl Budget?&lt;/h3&gt;

&lt;p&gt;Google decides the crawl budget. It considers website size, page speed, crawl limit in Search Console, and crawl errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IMOWp-xw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fddsolmelz7p6q3t96z0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IMOWp-xw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fddsolmelz7p6q3t96z0.png" alt="Formula" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image source: &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;Prerender&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Website structure, duplicate content, soft 404 errors, low-value pages, website speed, and security issues also affect the crawl budget.&lt;/p&gt;

&lt;h3&gt;Crawl Budget and Crawl Rate&lt;/h3&gt;

&lt;p&gt;Crawl budget refers to the number of pages a search engine will crawl over a specific time. Crawl rate, by contrast, describes how quickly those pages are fetched.&lt;/p&gt;

&lt;p&gt;Simply put, crawl rate is how often a search engine visits a page or directory within a specific time frame.&lt;/p&gt;

&lt;h2&gt;How Crawl Budget Impacts SEO Factors&lt;/h2&gt;

&lt;p&gt;Here’s how crawl budget impacts SEO factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;HTTPS Migration:&lt;/strong&gt; When a site migrates, Google increases crawl demand to update its index with new URLs quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL Parameters:&lt;/strong&gt; Too many URL parameters can create duplicate content, draining the crawl budget and reducing the chances of indexing important pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML Sitemaps:&lt;/strong&gt; A well-structured, updated XML sitemap helps Google find new pages faster, potentially increasing the crawl budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate Content:&lt;/strong&gt; Sites with lots of duplicate content may get a lower crawl budget, as Google might see these pages as less important.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile-First Indexing:&lt;/strong&gt; Google now crawls and indexes pages using a smartphone user-agent. This doesn’t directly affect rankings, but it can influence how many pages are crawled and indexed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robots.txt:&lt;/strong&gt; Disallowed URLs in your robots.txt file don’t affect your crawl budget, but robots.txt can help steer Google’s bots toward the pages you want indexed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server Response Time:&lt;/strong&gt; Quick server responses to Google’s crawl requests can lead to more pages being crawled on your site.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Site Architecture:&lt;/strong&gt; A well-structured site helps Googlebot find and index new pages more efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Site Speed:&lt;/strong&gt; Faster pages can lead to Googlebot crawling more of your site’s URLs. Slow pages consume valuable Googlebot time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Managing Crawl Budget&lt;/h2&gt;

&lt;p&gt;Effective crawl budget management helps your essential pages get crawled and indexed, boosting their search engine visibility.&lt;/p&gt;

&lt;h3&gt;Crawl Budget Management and Optimization&lt;/h3&gt;

&lt;p&gt;Here are some &lt;a href="https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget"&gt;strategies&lt;/a&gt; to manage and optimize your crawl budget effectively:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Monitor crawl stats:&lt;/strong&gt; Regularly check your site’s crawl stats in Google Search Console to understand Googlebot’s crawling pattern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improve site speed:&lt;/strong&gt; Enhancing site speed promotes efficient crawling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamline site structure:&lt;/strong&gt; A well-organized site aids Googlebot in finding and indexing new pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimize redirects:&lt;/strong&gt; Excessive redirects can deplete your crawl budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage URL parameters:&lt;/strong&gt; Avoid creating duplicate URLs for the same content with too many URL parameter combinations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eliminate 404 and 410 error pages:&lt;/strong&gt; These error pages can unnecessarily consume your crawl budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prioritize key pages:&lt;/strong&gt; Make sure Googlebot can easily access your most important pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update your XML sitemap regularly:&lt;/strong&gt; This helps Google discover new pages faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increase page popularity:&lt;/strong&gt; Pages with more visits are crawled more frequently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Utilize canonical tags:&lt;/strong&gt; These tags help prevent duplicate content issues.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ENGWIboQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/70zjuxjf1he1xieovebh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ENGWIboQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/70zjuxjf1he1xieovebh.png" alt="Crawl stats" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image source: &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;Prerender&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To further optimize your crawl budget, follow &lt;a href="https://yoast.com/crawl-budget-optimization/"&gt;these steps&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to “Settings” -&amp;gt; “Crawl stats” and note the average pages crawled per day.&lt;/li&gt;
&lt;li&gt;Divide your total page count by this number.&lt;/li&gt;
&lt;li&gt;If the result exceeds ~10 (indicating you have 10x more pages than what’s crawled daily), consider optimizing your crawl budget.&lt;/li&gt;
&lt;/ol&gt;
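
&lt;p&gt;The steps above boil down to a quick ratio check. Here’s a minimal sketch (the function name and the ~10x threshold are illustrative, not part of any official tool):&lt;/p&gt;

```python
def needs_crawl_budget_optimization(total_pages, avg_pages_crawled_per_day, threshold=10):
    """Heuristic from the steps above: flag sites with roughly
    `threshold` times more pages than Google crawls per day."""
    if avg_pages_crawled_per_day == 0:
        return True  # nothing is being crawled at all
    return total_pages / avg_pages_crawled_per_day > threshold
```

&lt;p&gt;For example, a 50,000-page site crawled at 2,000 pages per day yields a ratio of 25, so crawl budget optimization is worth considering.&lt;/p&gt;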

&lt;h3&gt;Noindex and Crawl Budget&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Noindex&lt;/code&gt; is a directive that tells search engines not to index a particular page.&lt;/p&gt;

&lt;p&gt;This can be a powerful tool for optimizing your crawl budget. Here’s how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Better crawl budget allocation:&lt;/strong&gt; By using &lt;code&gt;noindex&lt;/code&gt; on less important or low-value pages, you can effectively guide search engine bots to focus their efforts on crawling and indexing your main, high-value content. This ensures that your crawl budget is spent where it matters most.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid duplicate content:&lt;/strong&gt; Duplicate content can drain your crawl budget as search engines might crawl the same content multiple times. Using &lt;code&gt;noindex&lt;/code&gt; on duplicate pages can prevent this, preserving your crawl budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High ‘noindex’ to indexable URL ratio:&lt;/strong&gt; While a high ratio of ‘noindex’ to indexable URLs doesn’t usually affect how Google crawls your site, it could become a problem if many noindexed pages need to be crawled to reach a few indexable ones. In such cases, &lt;code&gt;noindex&lt;/code&gt; can help ensure that crawl budget is not wasted on pages that won’t be indexed.&lt;/li&gt;
&lt;/ol&gt;
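
&lt;p&gt;The &lt;code&gt;noindex&lt;/code&gt; directive can be applied either as a robots &lt;code&gt;meta&lt;/code&gt; tag in the page head or as an &lt;code&gt;X-Robots-Tag&lt;/code&gt; HTTP response header. A minimal sketch of the header approach (the function and header-dict shape are illustrative, not tied to any framework):&lt;/p&gt;

```python
def apply_noindex(headers, should_noindex):
    """Attach the X-Robots-Tag header so crawlers skip indexing this response."""
    if should_noindex:
        headers["X-Robots-Tag"] = "noindex"
    return headers
```

&lt;p&gt;The header approach is handy for non-HTML resources like PDFs, where a meta tag isn’t an option.&lt;/p&gt;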

&lt;h2&gt;JavaScript and SEO&lt;/h2&gt;

&lt;p&gt;JavaScript enables dynamic web content, but it can complicate traditional web crawling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cupmrr31--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zquktrif9f7gv7ohbudw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cupmrr31--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zquktrif9f7gv7ohbudw.png" alt="JavaScript indexing challenges" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image source: &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;Prerender&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If JavaScript alters or loads content, crawlers may struggle to access or extract this data, leading to incomplete or incorrect data retrieval.&lt;/p&gt;

&lt;h3&gt;Optimizing JavaScript for SEO&lt;/h3&gt;

&lt;p&gt;Optimizing JavaScript for SEO ensures search engines can crawl, render, and index JavaScript-generated content. That’s particularly important for websites and Single Page Applications (SPAs) built with JavaScript frameworks like React, Angular, and Vue.&lt;/p&gt;

&lt;p&gt;Here are some &lt;a href="https://moz.com/blog/javascript-seo-guide"&gt;JavaScript SEO tips&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assign unique titles and snippets to your pages.&lt;/li&gt;
&lt;li&gt;Write search engine-friendly code.&lt;/li&gt;
&lt;li&gt;Use appropriate HTTP status codes.&lt;/li&gt;
&lt;li&gt;Prevent soft 404 errors in SPAs.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;JavaScript Frameworks and SEO&lt;/h3&gt;

&lt;p&gt;JavaScript frameworks like React, Angular, and Vue.js help build complex web applications. They improve user experience and create interactive web pages.&lt;/p&gt;

&lt;p&gt;These frameworks can also improve performance by optimizing how pages are rendered.&lt;/p&gt;

&lt;p&gt;Using server-side rendering (SSR) or prerendering, developers can ensure search engine bots can easily access and index the content.&lt;/p&gt;

&lt;h2&gt;Other Ways to Index JavaScript Sites&lt;/h2&gt;

&lt;p&gt;There are two main ways to crawl data from websites: the traditional way and the JavaScript-enabled way.&lt;/p&gt;

&lt;p&gt;The traditional way parses the HTML structure of web pages to get the information we want.&lt;/p&gt;

&lt;p&gt;But, it can struggle with JavaScript-heavy websites.&lt;/p&gt;

&lt;p&gt;JavaScript-enabled crawling solutions fix this.&lt;/p&gt;

&lt;p&gt;They act like humans by rendering JavaScript elements, which lets them access content loaded dynamically.&lt;/p&gt;

&lt;p&gt;These solutions can reach more content, especially on websites that rely heavily on JavaScript.&lt;/p&gt;

&lt;h2&gt;Dynamic Rendering&lt;/h2&gt;

&lt;p&gt;Dynamic rendering is a method that provides different versions of a webpage to users and search engine bots.&lt;/p&gt;

&lt;p&gt;When a bot visits your site, it receives a prerendered, static HTML version of the page.&lt;/p&gt;

&lt;p&gt;This version is simpler for the bot to crawl and index, enhancing your site’s SEO.&lt;/p&gt;

&lt;h3&gt;Dynamic Rendering and SEO&lt;/h3&gt;

&lt;p&gt;Dynamic rendering boosts your site’s SEO.&lt;/p&gt;

&lt;p&gt;It enhances the crawlability and indexability of your site, quickens page load times, and improves mobile-friendliness.&lt;/p&gt;

&lt;p&gt;It’s especially useful for JavaScript-heavy websites, as it ensures all content is reachable by search engine bots.&lt;/p&gt;

&lt;h3&gt;Prerendering: A Solution&lt;/h3&gt;

&lt;p&gt;Prerendering is a form of dynamic rendering. It preloads all page elements for a web crawler.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ho9lzOuk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gzvc5xtcx279h9yyptew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ho9lzOuk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gzvc5xtcx279h9yyptew.png" alt="First-Time crawler request" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image source: &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;Prerender&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When a bot visits, the prerender service provides a cached, fully rendered version of your site.&lt;/p&gt;

&lt;p&gt;This method makes bot interactions faster and more reliable.&lt;/p&gt;

&lt;h3&gt;Why Use Prerendering?&lt;/h3&gt;

&lt;p&gt;Prerendering helps SEO in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Improves crawl budget and indexing:&lt;/strong&gt; Prerendering loads all links and content together, making it easier for search engines to find every page quickly. This boosts crawl speed and efficiency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speeds up indexing:&lt;/strong&gt; Prerendering serves your pages to search engines in less than a second, improving speed and Core Web Vitals (CWV).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensures no content is missed:&lt;/strong&gt; Prerendering shows a snapshot of your content to Google’s crawlers as static content. This ensures all your text, links, and images are crawled and indexed correctly, enhancing content performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Dynamic Rendering vs Server-Side Rendering&lt;/h3&gt;

&lt;p&gt;Server-side rendering (SSR) and dynamic rendering are two methods used to present web content to users and search engines.&lt;/p&gt;

&lt;p&gt;SSR involves rendering the entire page on the server before sending it to the browser.&lt;br&gt;
This means all JavaScript is run on the server-side, and the user receives a fully rendered page.&lt;br&gt;
It can improve performance and SEO, but it can also put a heavier load on your server.&lt;/p&gt;

&lt;p&gt;On the other hand, dynamic rendering provides a static HTML version of the page to search engines and a regular (client-side rendered) version to users.&lt;/p&gt;

&lt;p&gt;This means that when a search engine bot visits your site, it receives a prerendered, static HTML version of the page, which is easier for the bot to crawl and index.&lt;/p&gt;

&lt;p&gt;Meanwhile, users receive a version of the page that’s rendered in their browser, which can provide a more interactive experience.&lt;/p&gt;

&lt;p&gt;Both methods have benefits.&lt;/p&gt;

&lt;p&gt;The best choice depends on your specific needs and circumstances.&lt;/p&gt;

&lt;h3&gt;How to Implement Prerendering&lt;/h3&gt;

&lt;p&gt;To &lt;a href="https://prerender.io/blog/a-guide-to-prerender-process-and-benefits/"&gt;set up prerendering&lt;/a&gt;, you need to add suitable middleware to your backend, CDN, or web server.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The middleware identifies a bot asking for your page and sends a request to the prerender service.&lt;/li&gt;
&lt;li&gt;If it’s the first request, the prerender service gets resources from your server and renders the page on its server.&lt;/li&gt;
&lt;li&gt;After that, the prerender service gives the cached version when it identifies a bot user-agent.&lt;/li&gt;
&lt;/ol&gt;
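
&lt;p&gt;The flow above can be sketched as bot detection plus a render cache. This is a minimal illustration, not Prerender’s actual middleware; the bot token list and function names are assumptions:&lt;/p&gt;

```python
BOT_UA_TOKENS = ("googlebot", "bingbot", "duckduckbot", "baiduspider", "yandex")

def is_crawler(user_agent):
    """Step 1: identify bots by their user-agent string."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_UA_TOKENS)

def handle_request(user_agent, path, cache, render_on_prerender_server, serve_client_side):
    """Route bots to a prerendered snapshot and humans to the normal page."""
    if not is_crawler(user_agent):
        return serve_client_side(path)
    if path not in cache:
        # Step 2: the first bot request triggers a full render on the prerender server.
        cache[path] = render_on_prerender_server(path)
    # Step 3: subsequent bot requests get the cached snapshot.
    return cache[path]
```

&lt;p&gt;Typically the cache lives on the prerender service and is refreshed periodically, so bots never wait for a full JavaScript render.&lt;/p&gt;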

&lt;h2&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;We’ve looked at crawl budget optimization and its effect on SEO.&lt;/p&gt;

&lt;p&gt;We’ve discussed SEO challenges for JavaScript sites, best practices for JavaScript SEO, and how JavaScript frameworks affect SEO. We’ve also examined other ways to index JavaScript sites, focusing on dynamic rendering and prerendering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kuH6rUuk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zkzjvcc0r6m9lterulll.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kuH6rUuk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zkzjvcc0r6m9lterulll.png" alt="A Technical SEO’s Guide to Crawl Budget Optimization" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To learn more about crawl budget optimization and how it can help your SEO, download Prerender’s free guide, &lt;a href="https://prerender.io/resources/free-downloads/white-papers/crawl-budget-guide/"&gt;A Technical SEO’s Guide to Crawl Budget Optimization&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: I’m a growth consultant at Prerender.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>seo</category>
      <category>google</category>
      <category>crawling</category>
      <category>technicalseo</category>
    </item>
    <item>
      <title>40+ Black Friday &amp; Cyber Monday Software Deals</title>
      <dc:creator>Tomas Laurinavicius</dc:creator>
      <pubDate>Mon, 27 Nov 2023 12:51:37 +0000</pubDate>
      <link>https://dev.to/tomaslau/40-black-friday-cyber-monday-software-deals-3cnc</link>
      <guid>https://dev.to/tomaslau/40-black-friday-cyber-monday-software-deals-3cnc</guid>
      <description>&lt;p&gt;If you're building something, check out these exciting SaaS deals from 40+ top brands, featuring 40+ deals of marketing, SEO, sales, CRM, dev, product, and business software.&lt;/p&gt;

&lt;p&gt;Deals range from extended trials and extra credits to discounts and lifetime plans. Ending soon. ⏳&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.git-tower.com/"&gt;Tower&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Git client for Mac &amp;amp; Windows&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;50% discount&lt;/strong&gt; on any Tower plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - December 1&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.scraperapi.com/"&gt;&lt;strong&gt;ScraperAPI&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data collection tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;20% discount&lt;/strong&gt; on annual plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use coupon code:&lt;/strong&gt; BLCK20&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.juicer.io/blackfriday"&gt;Juicer&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Social media aggregator&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;40% off&lt;/strong&gt; Medium &amp;amp; Large monthly plans&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;50% off&lt;/strong&gt; Medium &amp;amp; Large yearly plans&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 8&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://pipelinecrm.com/"&gt;Pipeline CRM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CRM for salespeople&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;50% off&lt;/strong&gt; your first 3 months&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use coupon code:&lt;/strong&gt; BlackFriday23&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://keyword.com/"&gt;Keyword.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword position &amp;amp; SERP tracker&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;40% off&lt;/strong&gt; for 12 months on Agency package and above + &lt;strong&gt;400 free keywords&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - November 27&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://sniply.io"&gt;Sniply&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Link shortener &amp;amp; tracker&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;30% discount&lt;/strong&gt; on first 12 months of any Sniply subscription&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coupon code:&lt;/strong&gt; BF2023&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.beekast.com/pricing/"&gt;Beekast&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaborative tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;50% off&lt;/strong&gt; on Beekast's Starter, Pro and Business plan&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coupon:&lt;/strong&gt; &lt;strong&gt;BF23&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://usersnap.com/"&gt;Usersnap&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User feedback platform&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;20% off&lt;/strong&gt; the first 3 months or 30% off the first 3 months with a yearly subscription&lt;/p&gt;

&lt;p&gt;Monthly offer coupon: &lt;strong&gt;FRIDAY23M&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yearly offer coupon: &lt;strong&gt;FRIDAY23Y&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 10&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dashthis.com/saas-blackfriday/"&gt;DashThis&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marketing reporting tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Get &lt;strong&gt;1 free month&lt;/strong&gt; on any DashThis plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://deploybot.com/"&gt;DeployBot&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code deployment tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Up to 80% discount&lt;/strong&gt; in your first month!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coupon:&lt;/strong&gt; &lt;strong&gt;DEPLOYFRIDAY80&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.rewardful.com/"&gt;Rewardful&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affiliate program software&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;30% off&lt;/strong&gt; for the first 3 months on all monthly plans&lt;/p&gt;

&lt;p&gt;Coupon code: &lt;strong&gt;BLACKFRIDAY30&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.seobility.net/en/lp/blackfriday23"&gt;Seobility&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All-in-one SEO software&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;60-day free trial&lt;/strong&gt; and &lt;strong&gt;15% discount&lt;/strong&gt; on the monthly price of Seobility Premium&lt;/p&gt;

&lt;p&gt;Voucher code: &lt;strong&gt;blackfriday23&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - December 3&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://myworks.software/pricing?promocode=BLACKFRIDAY2023"&gt;MyWorks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;eCommerce accounting automation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;30% LIFETIME discount&lt;/strong&gt; on annual Grow plan for $27/month (billed annually, originally $39/month)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 20 - November 27&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.snitcher.com/"&gt;Snitcher&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B2B visitor identification platform&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;20% off for 6 months&lt;/strong&gt;, email &lt;a href="mailto:leon@snitcher.com?subject=saas.blackfriday"&gt;leon@snitcher.com&lt;/a&gt; mentioning saas.blackfriday to claim the deal&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.snitcher.com/?utm_source=website&amp;amp;utm_medium=referral&amp;amp;utm_campaign=saas-black-friday-2023"&gt;Get the Deal Now&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://syften.com/"&gt;Syften&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3 months&lt;/strong&gt; for free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.guideflow.com/"&gt;Guideflow&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interactive demo software&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;30% off&lt;/strong&gt; the first year on our Solo, Pro or Growth plans with the code &lt;strong&gt;BF30&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offer validity:&lt;/strong&gt; November 24 - December 1&lt;/p&gt;




&lt;p&gt;For the full list of deals, terms and instructions, go to &lt;a href="https://saas.blackfriday"&gt;saas.blackfriday&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>saas</category>
      <category>software</category>
      <category>deals</category>
      <category>development</category>
    </item>
    <item>
      <title>The Secret to Rapid Scaling: How Scraping Helped These Startups Go From Zero to $1.2+ Trillion</title>
      <dc:creator>Tomas Laurinavicius</dc:creator>
      <pubDate>Tue, 28 Mar 2023 12:10:02 +0000</pubDate>
      <link>https://dev.to/tomaslau/the-secret-to-rapid-scaling-how-scraping-helped-these-startups-go-from-zero-to-12-trillion-3ebp</link>
      <guid>https://dev.to/tomaslau/the-secret-to-rapid-scaling-how-scraping-helped-these-startups-go-from-zero-to-12-trillion-3ebp</guid>
      <description>&lt;p&gt;As of today, March 28, 2023, &lt;a href="https://www.google.com/finance/quote/ABNB:NASDAQ?sa=X&amp;amp;ved=2ahUKEwj2kdvsjfz9AhVPT6QEHXZcCLUQ3ecFegQIORAZ"&gt;Airbnb&lt;/a&gt;, &lt;a href="https://www.google.com/finance/quote/AMZN:NASDAQ?sa=X&amp;amp;ved=2ahUKEwiu_Muslvz9AhXHU6QEHUnGAc4Q3ecFegQIPhAZ"&gt;Amazon&lt;/a&gt;, and &lt;a href="https://www.google.com/finance/quote/NFLX:NASDAQ"&gt;Netflix&lt;/a&gt; have a &lt;strong&gt;cumulative market cap of $1.2+ trillion&lt;/strong&gt; thanks to &lt;strong&gt;this one unsexy strategy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To put that in perspective, if their worth was a country's GDP, it would rank &lt;a href="https://www.worldometers.info/gdp/gdp-by-country/"&gt;15th in the world&lt;/a&gt; (right below Spain).&lt;/p&gt;

&lt;p&gt;What’s their secret to rapid growth and market dominance?&lt;/p&gt;

&lt;p&gt;It’s data extraction at scale (also known as web scraping). It’s been used by the most explosive startups to acquire users and grow. &lt;/p&gt;

&lt;p&gt;Read on to find out what web scraping is and how you can benefit from using publicly available data in your business.&lt;/p&gt;

&lt;h2&gt;Web Scraping: The Secret to Scalable Growth&lt;/h2&gt;

&lt;p&gt;In today’s digital economy, data is the new differentiator.&lt;/p&gt;

&lt;p&gt;Having reliable data at your disposal can give your business a competitive edge.&lt;/p&gt;

&lt;h3&gt;Amazon (Market Cap: $1.01T)&lt;/h3&gt;

&lt;p&gt;Amazon leverages big data collected from the internet, along with its customers’ behavior, to update product pricing approximately &lt;a href="https://www.businessinsider.com/amazon-price-changes-2018-8"&gt;every ten minutes&lt;/a&gt;. Pricing is set according to general market trends, users’ shopping patterns, and business goals, among other factors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Fo-BerMt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v9h250kidfjgv7w7kjfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Fo-BerMt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v9h250kidfjgv7w7kjfk.png" alt="Amazon sales" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By capturing big data, Amazon can smartly offer discounts on best-selling items while still earning large profits on less popular products. This data-driven strategy has proven fruitful, as they &lt;a href="https://www.wsj.com/market-data/quotes/AMZN/financials/annual/income-statement"&gt;doubled their annual&lt;/a&gt; sales from 2018 to 2021.&lt;/p&gt;

&lt;h3&gt;Netflix (Market Cap: $148.45B)&lt;/h3&gt;

&lt;p&gt;Netflix experienced similar success. They used web data acquisition to learn the preferences of their viewers and potential subscribers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4A-Z5usx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/psms0tnsktp1snoa7whd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4A-Z5usx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/psms0tnsktp1snoa7whd.png" alt="Netflix churn rate" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unsurprisingly, many of the Netflix Original shows are a hit, helping them &lt;a href="https://medium.com/antennaanalytics/antenna-q1-2021-growth-report-6af970a33975"&gt;maintain a low churn rate of 2.4%&lt;/a&gt; from 2019 to 2021.&lt;/p&gt;

&lt;h3&gt;Airbnb (Market Cap: $74.50B)&lt;/h3&gt;

&lt;p&gt;In the early days of Airbnb, &lt;a href="https://www.businessinsider.com/airbnb-harvested-craigslist-to-grow-its-listings-says-competitor-2011-5"&gt;the company used Craigslist as a source of listings&lt;/a&gt; and scraped data from the site to populate its own platform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--u4nW3TNI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fu13gkkhhn8ypdsp3d1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--u4nW3TNI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fu13gkkhhn8ypdsp3d1m.png" alt="Airbnb email" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This helped Airbnb rapidly acquire many listings and users.&lt;/p&gt;




&lt;p&gt;These examples show that data harvesting helps businesses of every industry, type, and size.&lt;/p&gt;

&lt;p&gt;Every organization that strives to scale should leverage publicly available data and use it to its advantage.&lt;/p&gt;

&lt;p&gt;But how? How can organizations collect web data at a large scale, automatically, and within minutes?&lt;/p&gt;

&lt;p&gt;The answer is web scraping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three major benefits of data harvesting:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Insight into market conditions&lt;/li&gt;
&lt;li&gt;Close observation of competitors&lt;/li&gt;
&lt;li&gt;A deep understanding of consumer behavior&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;What is Web Scraping?&lt;/h2&gt;

&lt;p&gt;Web scraping is a method for extracting large amounts of data from the internet. This automated approach gathers everything from prices to product specifications, property listings, and other publicly available data.&lt;/p&gt;

&lt;p&gt;The results can be exported in structured file formats such as XML or JSON.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Put simply, web scraping can be compared to “copy-pasting” content from websites, but it differs in the process and the tools needed to perform the action.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As you can imagine, data scraping requires a web scraper and a few lines of code to function. Common Python libraries for the job include &lt;a href="https://beautiful-soup-4.readthedocs.io/en/latest/"&gt;BeautifulSoup&lt;/a&gt; and &lt;a href="https://scrapy.org/"&gt;Scrapy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Furthermore, unlike manual copy-pasting, a web scraper can harvest information from thousands of URLs by queuing requests in bulk.&lt;/p&gt;

&lt;p&gt;This scalable solution eliminates any human intervention during the scraping process, saving you time and manual labor.&lt;/p&gt;
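&lt;p&gt;To illustrate the bulk-queuing idea, here is a minimal sketch using only Python’s standard library. The URLs and the fetch stub are made up; a real scraper would issue HTTP requests (for example, with the requests library) instead:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for a real HTTP fetch; returns fake page
# content so the sketch runs without network access.
def fetch(url):
    return "page content for " + url

# Queue thousands of URLs up front instead of visiting them one by one.
urls = ["https://example.com/item/" + str(i) for i in range(100)]

# A worker pool drains the queue concurrently; results come back
# in the same order the URLs were submitted.
with ThreadPoolExecutor(max_workers=10) as pool:
    pages = list(pool.map(fetch, urls))

print(len(pages))  # one result per queued URL
```

Swapping the stub for a real HTTP call is the only change needed to turn this into a working bulk fetcher.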

&lt;h2&gt;But Is Web Scraping Legal?&lt;/h2&gt;

&lt;p&gt;One general concern around web scraping is whether or not it’s legal.&lt;/p&gt;

&lt;p&gt;No government has passed laws explicitly permitting or prohibiting web scraping thus far (2023). Therefore, we can only make strong assumptions based on case law about web scraping activity (e.g., hiQ v. LinkedIn) and other data-related regulations.&lt;/p&gt;

&lt;p&gt;We know that web scraping itself is legal—but it can be illegal depending on what type of data you scrape and how you scrape it. In general, you can legally scrape the internet as long as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data is publicly available&lt;/li&gt;
&lt;li&gt;You don’t scrape private information&lt;/li&gt;
&lt;li&gt;You don’t scrape copyrighted data&lt;/li&gt;
&lt;li&gt;You don’t need to create an account and log in to access the website, OR you have read and fully understood the Terms and Conditions (T&amp;amp;Cs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Disclosure:&lt;/strong&gt; I’m no expert, and the information given is provided for informational purposes only. Please seek legal advice if you’re in doubt about your web scraping project to ensure you’re not scraping the web illegally.&lt;/p&gt;

&lt;h2&gt;The Standard Sync Web Scraping Process&lt;/h2&gt;

&lt;p&gt;There are two primary components of a web scraper: the web crawler and the scraper itself.&lt;/p&gt;

&lt;h3&gt;Web crawlers&lt;/h3&gt;

&lt;p&gt;The web crawler works similarly to a search engine bot. It crawls a list of URLs and catalogs the information. Then, it visits all the links it can find within the current and subsequent pages until it hits a specified limit or there are no more links to follow.&lt;/p&gt;
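&lt;p&gt;The crawl loop described above can be sketched in a few lines of Python. The link graph here is an in-memory stand-in for fetching and parsing real pages:&lt;/p&gt;

```python
from collections import deque

# A tiny in-memory "web": each URL maps to the links found on that page.
site = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c"],
    "/c": [],
}

def crawl(start, limit):
    """Visit pages breadth-first until the limit is hit or no links remain."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        if len(visited) == limit:  # stop at the specified page limit
            break
        url = queue.popleft()
        visited.append(url)        # catalog the page
        for link in site[url]:     # follow every link found on the page
            if link not in seen:   # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("/", limit=10))
```

A production crawler replaces the dictionary lookup with a fetch-and-parse step, but the frontier queue, seen set, and page limit work exactly the same way.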

&lt;h3&gt;Web scrapers&lt;/h3&gt;

&lt;p&gt;After the web crawler visits the targeted web pages, the web scraper collects the data. Data locators, an integral element of a web scraper, find, select, and extract the targeted data from a website’s HTML at scale without being blocked.&lt;/p&gt;

&lt;p&gt;In simple terms, this is how web crawling feeds into sync scraping: once data is crawled, it can be harvested. When the first scraping request is complete, you can begin the next task.&lt;/p&gt;

&lt;p&gt;Of course, your scraping needs will always determine the type of scraper and methods you use. Depending on your timeline and the volume of data you need to collect, you may face challenges when you try to use a standard sync scraper to complete multiple tasks. Why? Because you’re bound by response limits (timeouts) and the need to re-submit tasks.&lt;/p&gt;

&lt;p&gt;Using an asynchronous scraper service, you can scrape at scale without these problems. It requires less coding and less infrastructure to build or maintain on your side. This speedy, modern method allows you to submit a large batch of requests simultaneously while still achieving the highest reachable success rate.&lt;/p&gt;

&lt;p&gt;Once the job is done, you’ll be notified.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ARa0D8Qv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1grcykn0gdd4z07dj0uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ARa0D8Qv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1grcykn0gdd4z07dj0uw.png" alt="Web scraping process" width="800" height="676"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.scraperapi.com/resources/white-paper-web-scraping-basics/"&gt;ScraperAPI white paper.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Web scraping process&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The web crawlers visit the given URLs.&lt;/li&gt;
&lt;li&gt;The web scrapers request each page’s HTML file, parsing the response to generate a node tree. Most web scrapers only parse the HTML code on the page, but more advanced ones also fully render the page’s CSS and JavaScript.&lt;/li&gt;
&lt;li&gt;The scraper bots extract the data based on pre-set criteria (name, address, price, etc.) by targeting elements using HTML tags or CSS/XPath selectors.&lt;/li&gt;
&lt;li&gt;After the information is harvested, the scraper bots export the data into a database, spreadsheet, JSON file, or any other structured format, ready to be repurposed.&lt;/li&gt;
&lt;/ol&gt;
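&lt;p&gt;As a rough illustration of those four steps, here is a self-contained Python sketch using only the standard library (a real project would typically reach for BeautifulSoup or Scrapy, and the sample page and field names are invented). The sample page stands in for crawled HTML, stored entity-escaped so the sketch needs no network access:&lt;/p&gt;

```python
import html
import json
from html.parser import HTMLParser

# Step 1 stand-in: the "crawled" page (entity-escaped here;
# html.unescape restores the actual markup).
PAGE = html.unescape(
    "&lt;ul&gt;"
    "&lt;li class='item'&gt;&lt;span class='name'&gt;Lamp&lt;/span&gt;"
    "&lt;span class='price'&gt;19.99&lt;/span&gt;&lt;/li&gt;"
    "&lt;li class='item'&gt;&lt;span class='name'&gt;Desk&lt;/span&gt;"
    "&lt;span class='price'&gt;120.00&lt;/span&gt;&lt;/li&gt;"
    "&lt;/ul&gt;"
)

# Step 2: parse the HTML, tracking which span element we are inside.
class FieldCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current = None
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None

    def handle_data(self, data):
        if self.current:
            # Step 3: extract data matching the pre-set criteria.
            self.fields.append((self.current, data))

collector = FieldCollector()
collector.feed(PAGE)

names = [value for key, value in collector.fields if key == "name"]
prices = [value for key, value in collector.fields if key == "price"]
records = [{"name": n, "price": p} for n, p in zip(names, prices)]

# Step 4: export the structured result.
print(json.dumps(records, indent=2))
```

The class names used as selectors here play the role of the "data locators" described earlier; in BeautifulSoup the same extraction would be a one-line `select()` call.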




&lt;h2&gt;Learn Web Scraping: The Next Step&lt;/h2&gt;

&lt;p&gt;If you want to learn more about web scraping, I suggest starting with the basics and familiarizing yourself with the jargon. This will allow you to quickly search Google and find answers to any specific questions for your use case.&lt;/p&gt;

&lt;p&gt;If you don’t know what “parallel requests,” “custom headers,” or “honeypots” are, you’ll have a hard time figuring out how to make things work.&lt;/p&gt;

&lt;p&gt;If you’re interested, download this &lt;a href="https://www.scraperapi.com/resources/white-paper-web-scraping-basics/"&gt;web scraping white paper&lt;/a&gt; (it’s free) to learn about:&lt;/p&gt;

&lt;p&gt;🤖 Web scraping &lt;strong&gt;benefits&lt;/strong&gt; and processes&lt;/p&gt;

&lt;p&gt;💽 &lt;strong&gt;Types of data collection&lt;/strong&gt; and web scrapers&lt;/p&gt;

&lt;p&gt;😾 Common &lt;strong&gt;challenges&lt;/strong&gt; (and how to overcome them)&lt;/p&gt;

&lt;p&gt;✈️ &lt;strong&gt;Industries&lt;/strong&gt; that use scrapers in their day-to-day tasks&lt;/p&gt;

&lt;p&gt;🪄 &lt;strong&gt;Tips&lt;/strong&gt; for using a web scraping API more effectively&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sew66R24--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d8k1kd1tkdf40571p6bq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sew66R24--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d8k1kd1tkdf40571p6bq.png" alt="Web Scraping: The Basics Explained" width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://www.scraperapi.com/resources/white-paper-web-scraping-basics/"&gt;Web Scraping: The Basics Explained&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: I’m a growth consultant at ScraperAPI.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Featured image credit: &lt;a href="https://www.visualcapitalist.com/which-companies-belong-to-the-elite-trillion-dollar-club/"&gt;Visual Capitalist&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>startup</category>
      <category>bigdata</category>
      <category>scraping</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
