<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CrawlNow</title>
    <description>The latest articles on DEV Community by CrawlNow (@crawlnow).</description>
    <link>https://dev.to/crawlnow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7334%2F7603f83b-ac76-49cf-8787-3369af3c222e.png</url>
      <title>DEV Community: CrawlNow</title>
      <link>https://dev.to/crawlnow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/crawlnow"/>
    <language>en</language>
    <item>
      <title>The Why And How Of Scraping Online Job Postings Data</title>
      <dc:creator>Hina Batool</dc:creator>
      <pubDate>Wed, 27 Oct 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/crawlnow/the-why-and-how-of-scraping-online-job-postings-data-36nn</link>
      <guid>https://dev.to/crawlnow/the-why-and-how-of-scraping-online-job-postings-data-36nn</guid>
      <description>&lt;p&gt;&lt;em&gt;As more hiring moves online, there is more job data than ever to harness. That information has many innovative uses, and many businesses and individuals can benefit from scraping job sites. Whether you want to market a new ed-tech course, build a niche job aggregator, or find talent for your company, job board scraping can help. Let’s explore how job data scraping works, the opportunities it offers to different people, and its challenges.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Can you guess the most popular mobile job-seeking location for Americans? According to 2016 research by &lt;a href="https://www.statista.com/topics/2727/online-recruiting/"&gt;Statista&lt;/a&gt;, most Americans look for jobs in “&lt;strong&gt;bed&lt;/strong&gt;”. Do you scroll through job openings on LinkedIn in bed before dozing off? &lt;/p&gt;

&lt;p&gt;The US has &lt;a href="https://www.apollotechnical.com/job-search-statistics/#6--where-job-seekers-find-employers-"&gt;a 65% audience reach on LinkedIn&lt;/a&gt;. Indeed.com and Glassdoor are also very popular with both recruiters and job seekers. Since the pandemic, online recruitment has become the new normal: from posting and searching for jobs to interviewing and hiring, everything can happen in the browser. All of this gives job aggregators more opportunity than ever to profit from job data.&lt;/p&gt;

&lt;p&gt;While the entire business of job aggregators rests on job data scraping, it’s also a handy tool for recruiters, job seekers, staffing agencies, ed-tech platforms, and many others. Let’s dive straight into job scraping: how it’s done and the opportunities it opens up. You may well find something that fits right into your business and drives growth. &lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Job Scraping?
&lt;/h2&gt;

&lt;p&gt;Job scraping is the automated collection of job postings data from websites, saved in a structured format (e.g., CSV, JSON) suitable for spreadsheets, databases, or other software applications.&lt;/p&gt;

&lt;p&gt;Automated programs, called crawlers (or bots), can scrape job sites to discover hundreds of thousands, even millions, of jobs posted online. Below are some of the fields that can be extracted from these job postings: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Job Title&lt;/li&gt;
&lt;li&gt;Company&lt;/li&gt;
&lt;li&gt;Job Location&lt;/li&gt;
&lt;li&gt;Job Posting Date&lt;/li&gt;
&lt;li&gt;Job Description&lt;/li&gt;
&lt;li&gt;Job Type (full-time or part-time)&lt;/li&gt;
&lt;li&gt;Salary Range&lt;/li&gt;
&lt;/ul&gt;
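
&lt;p&gt;To make the list above concrete, here is a minimal Python sketch of how one scraped posting might be represented and exported to CSV or JSON. The &lt;code&gt;JobPosting&lt;/code&gt; class, its snake_case field names, and the sample record are our own illustrative assumptions, not a standard schema.&lt;/p&gt;

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class JobPosting:
    """One scraped job posting (illustrative field names)."""
    title: str
    company: str
    location: str
    posted_date: str                     # ISO 8601 date, e.g. "2021-10-27"
    description: str
    job_type: str                        # e.g. "full-time" or "part-time"
    salary_range: Optional[str] = None   # frequently missing from postings


def to_csv(postings: list[JobPosting]) -> str:
    """Serialize postings as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(JobPosting)])
    writer.writeheader()
    for posting in postings:
        writer.writerow(asdict(posting))  # None is written as an empty cell
    return buffer.getvalue()


def to_json(postings: list[JobPosting]) -> str:
    """Serialize postings as a JSON array of objects."""
    return json.dumps([asdict(p) for p in postings], indent=2)


# Hypothetical sample record.
jobs = [
    JobPosting("Data Engineer", "Example Corp", "Austin, TX", "2021-10-27",
               "Build and maintain data pipelines.", "full-time",
               "$120,000 - $140,000 a year"),
]
```

&lt;p&gt;Keeping one typed record per posting means every source, whatever its layout, ends up in the same columns before it reaches your spreadsheet or database.&lt;/p&gt;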

&lt;p&gt;Scraping job sites is an effective way to collect data for various use cases, including aggregating jobs for niche job boards, lead generation, tracking competitors, and market research. &lt;/p&gt;

&lt;h3&gt;
  
  
  Sources Of Job Postings Data
&lt;/h3&gt;

&lt;p&gt;There are three main sources of online job postings data:&lt;/p&gt;

&lt;h4&gt;
  
  
  Career Pages
&lt;/h4&gt;

&lt;p&gt;Most companies have a career page where they list all of their open jobs. For example, here is &lt;a href="https://careers.microsoft.com/us/en"&gt;Microsoft’s career page&lt;/a&gt;. Career pages are typically the original, and most up-to-date, source of a company’s job postings.&lt;/p&gt;
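
&lt;p&gt;Some career pages also embed their listings as schema.org &lt;code&gt;JobPosting&lt;/code&gt; structured data (JSON-LD), the markup Google reads for its job search features, which can be easier to parse than the visible page. The sketch below assumes the JSON-LD text has already been pulled out of the page; the output field names and the sample payload are our own, and real pages may nest or omit properties differently.&lt;/p&gt;

```python
import json
from typing import Any


def _location(place: Any) -> str:
    """Flatten a schema.org Place (or a list of Places) into 'City, Region'."""
    if isinstance(place, list):
        place = place[0] if place else None
    address = (place or {}).get("address") or {}
    if isinstance(address, str):
        return address
    parts = [address.get("addressLocality"), address.get("addressRegion")]
    return ", ".join(part for part in parts if part)


def parse_job_postings(jsonld_text: str) -> list[dict]:
    """Extract JobPosting objects from a JSON-LD payload (an object or a list)."""
    data = json.loads(jsonld_text)
    items = data if isinstance(data, list) else [data]
    postings = []
    for item in items:
        if not isinstance(item, dict) or item.get("@type") != "JobPosting":
            continue
        org = item.get("hiringOrganization") or {}
        postings.append({
            "title": item.get("title", ""),
            "company": org.get("name", "") if isinstance(org, dict) else str(org),
            "location": _location(item.get("jobLocation")),
            "posted_date": item.get("datePosted", ""),
            "job_type": item.get("employmentType", ""),
            "description": item.get("description", ""),
        })
    return postings


# Hypothetical payload, as it might appear on a career page.
SAMPLE = """{
  "@context": "https://schema.org",
  "@type": "JobPosting",
  "title": "Software Engineer",
  "datePosted": "2021-10-20",
  "employmentType": "FULL_TIME",
  "hiringOrganization": {"@type": "Organization", "name": "Example Corp"},
  "jobLocation": {"@type": "Place",
                  "address": {"@type": "PostalAddress",
                              "addressLocality": "Redmond", "addressRegion": "WA"}},
  "description": "Build cloud services."
}"""
```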

&lt;h4&gt;
  
  
  Job Boards
&lt;/h4&gt;

&lt;p&gt;Indeed, LinkedIn, Glassdoor, Monster, and CareerBuilder are some of the popular job boards in the US. Job boards typically aggregate job postings from tens of thousands of companies, so they are a great place to source jobs across many companies, locations, and professions. However, job boards usually deploy anti-scraping technology, which makes them harder to scrape than company career pages. &lt;/p&gt;

&lt;p&gt;There are also many small niche job boards targeting specific industries and professions. &lt;/p&gt;

&lt;h4&gt;
  
  
  Applicant Tracking Systems (ATS)
&lt;/h4&gt;

&lt;p&gt;Some companies post their jobs through applicant tracking systems such as Greenhouse, Lever, and Jobvite. Because each ATS hosts job listings for many companies, it can be a good place to source jobs across multiple companies without scraping each company’s website separately.&lt;/p&gt;
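
&lt;p&gt;The advantage of an ATS is that every company on it shares the same format, so one parser can cover them all. As an illustration, Greenhouse offers a public job board API; the URL pattern and response fields used below reflect our understanding of it and are assumptions to verify against Greenhouse’s documentation. The board token you pass in is company-specific.&lt;/p&gt;

```python
import json
import urllib.request

# Assumed URL pattern for Greenhouse's public job board API; verify before use.
GREENHOUSE_JOBS_URL = "https://boards-api.greenhouse.io/v1/boards/{token}/jobs"


def board_url(board_token: str) -> str:
    """Build the jobs endpoint URL for one company's board."""
    return GREENHOUSE_JOBS_URL.format(token=board_token)


def parse_board(company: str, payload: dict) -> list[dict]:
    """Map one board's JSON response onto common posting fields.

    The response shape ({"jobs": [{"title", "location": {"name"}, ...}]})
    is an assumption to check against the live API.
    """
    return [
        {
            "company": company,
            "title": job.get("title", ""),
            "location": (job.get("location") or {}).get("name", ""),
            "url": job.get("absolute_url", ""),
            "updated_at": job.get("updated_at", ""),
        }
        for job in payload.get("jobs", [])
    ]


def fetch_board(board_token: str, timeout: float = 30.0) -> dict:
    """Download one board's job list (performs a network request)."""
    with urllib.request.urlopen(board_url(board_token), timeout=timeout) as resp:
        return json.load(resp)
```

&lt;p&gt;Looping over a list of board tokens and calling &lt;code&gt;parse_board&lt;/code&gt; on each &lt;code&gt;fetch_board&lt;/code&gt; result yields one combined list, instead of one custom scraper per company.&lt;/p&gt;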

&lt;p&gt;Sometimes, the same jobs may be posted on multiple websites. If you are working with CrawlNow as your &lt;a href="https://www.crawlnow.com/products/data-extraction-services"&gt;data scraping partner&lt;/a&gt;, we will work with you to pick a scraping strategy that works best for your use case and budget.&lt;/p&gt;
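
&lt;p&gt;When the same job shows up on a career page, an ATS, and a job board, you will usually want to keep only one copy. A simple heuristic, sketched below under our own assumptions, is to match postings on a normalized (title, company, location) key and keep the first one seen. It will miss postings whose titles were reworded, so treat it as a starting point.&lt;/p&gt;

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe_postings(postings: list[dict]) -> list[dict]:
    """Keep the first posting seen for each normalized (title, company, location).

    Order the input so the preferred source (e.g. the career page) comes first.
    """
    seen = set()
    unique = []
    for posting in postings:
        key = tuple(normalize(posting.get(field, ""))
                    for field in ("title", "company", "location"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(posting)
    return unique
```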

&lt;h2&gt;
  
  
  Why Is Job Postings Data So Valuable?
&lt;/h2&gt;

&lt;p&gt;Did you know that Indeed.com boasts over &lt;a href="https://www.indeed.com/lead/indeed-200-million-unique-visitors"&gt;200 million unique users per month&lt;/a&gt;? Job seekers keep an eye out for career breakthroughs, while recruiters don’t want to miss out on the best talent. No wonder scraping Indeed is one of the most common use cases of web scraping. In short, the web is brimming with job data, offering many opportunities to monetize it.  &lt;/p&gt;

&lt;p&gt;Here are some popular uses of job scraping:&lt;/p&gt;

&lt;h3&gt;
  
  
  Build/Populate Niche Job Boards
&lt;/h3&gt;

&lt;p&gt;The normalization of online recruiting means a larger customer base for job boards and job aggregators than ever before. The volume, freshness, and authenticity of job postings are key to success in the aggregation business, and a good job scraping solution helps you deliver all three. With job board scraping, you can turn your website into a reliable, fast &lt;strong&gt;one-stop job repository&lt;/strong&gt; for job seekers in any niche. &lt;/p&gt;

&lt;h3&gt;
  
  
  Salary Analysis
&lt;/h3&gt;

&lt;p&gt;As a recruiter, are you ready to offer salaries above industry standards and increase your costs? Or are you willing to offer salaries below your competitors’ and lose top talent as a result? Employers who want to strike the right balance between costs and attracting top talent often rely on job data scraping to &lt;strong&gt;compare salaries&lt;/strong&gt; before deciding on their offers.  &lt;/p&gt;
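
&lt;p&gt;Salary comparisons usually start by turning free-text salary ranges into numbers. The sketch below parses common US-style formats such as “$80,000 - $100,000 a year” or “$85k-$95k” and takes the median of the range midpoints. The accepted formats are assumptions, and the code does not convert hourly rates to annual figures, so group postings by pay period before comparing.&lt;/p&gt;

```python
import re
from statistics import median
from typing import Optional

# Matches amounts like "$85,000", "$85k", or "85000" (assumed formats).
AMOUNT = re.compile(r"\$?(\d[\d,]*(?:\.\d+)?)\s*([kK])?")


def parse_salary_range(text: str) -> Optional[tuple[float, float]]:
    """Turn '$80,000 - $100,000 a year' into (80000.0, 100000.0).

    Returns None when no amount is found; a single amount becomes (x, x).
    """
    amounts = []
    for number, thousands in AMOUNT.findall(text):
        value = float(number.replace(",", ""))
        if thousands:
            value *= 1000
        amounts.append(value)
    if not amounts:
        return None
    return min(amounts), max(amounts)


def median_midpoint(salary_texts: list[str]) -> Optional[float]:
    """Median of range midpoints across postings, skipping unparseable ones."""
    midpoints = []
    for text in salary_texts:
        parsed = parse_salary_range(text)
        if parsed:
            low, high = parsed
            midpoints.append((low + high) / 2)
    return median(midpoints) if midpoints else None
```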

&lt;h3&gt;
  
  
  Lead Generation
&lt;/h3&gt;

&lt;p&gt;Generating a list of companies hiring talent in your niche is another use case where job data can up your game. Whether you offer ed-tech courses or services in any niche, scraping Indeed helps you find companies &lt;strong&gt;hiring in your field&lt;/strong&gt;. Reach the right companies to make winning pitches.  &lt;/p&gt;
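
&lt;p&gt;For lead generation, a simple starting point is to count, per company, how many scraped postings mention the skills you teach or serve. In this sketch the posting dictionaries and keywords are hypothetical, and matching is a plain case-insensitive substring test, which is crude but easy to audit.&lt;/p&gt;

```python
from collections import Counter


def companies_hiring_for(postings: list[dict], keywords: list[str],
                         top_n: int = 10) -> list[tuple[str, int]]:
    """Rank companies by how many postings mention any of the keywords.

    Matching is a case-insensitive substring test on title plus description.
    """
    wanted = [k.lower() for k in keywords]
    counts: Counter = Counter()
    for posting in postings:
        text = f"{posting.get('title', '')} {posting.get('description', '')}".lower()
        if any(k in text for k in wanted):
            counts[posting.get("company", "unknown")] += 1
    return counts.most_common(top_n)
```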

&lt;h3&gt;
  
  
  Competitive Analysis
&lt;/h3&gt;

&lt;p&gt;Keeping an eye on the competition is an important aspect of every business. How can you track your &lt;strong&gt;competitor’s next move&lt;/strong&gt; when they make every effort to keep it secret? The one place where they let their guard down is the career page. You can use a job crawler to identify the technologies they’re hiring for and use that to infer their plans for the future. &lt;/p&gt;
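
&lt;p&gt;To see which technologies a competitor is hiring for, you can count how many of its job descriptions mention each item on a watch list. The list below is a hypothetical example. Matching on whole tokens keeps “rust” from matching inside “trust”, but it also means multi-word or dotted names (such as “node.js”) would need extra handling.&lt;/p&gt;

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical watch list; tailor it to your market.
WATCH_LIST = ("kubernetes", "rust", "react", "snowflake", "pytorch")


def technology_mentions(descriptions: Iterable[str],
                        technologies: Iterable[str] = WATCH_LIST) -> Counter:
    """Count how many job descriptions mention each technology (once per posting)."""
    wanted = [t.lower() for t in technologies]
    counts: Counter = Counter()
    for description in descriptions:
        # Split into lowercase tokens so "rust" does not match inside "trust".
        tokens = set(re.findall(r"[a-z0-9+#]+", description.lower()))
        for tech in wanted:
            if tech in tokens:
                counts[tech] += 1
    return counts
```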

&lt;h3&gt;
  
  
  Analyzing The Labor Market
&lt;/h3&gt;

&lt;p&gt;Ed-tech platforms, market researchers, real estate and consulting firms, and many others need deep insight into the labor market to optimize their content and decisions. Scraping large job boards such as Indeed is one of the most efficient ways to collect data for analyzing labor trends. &lt;/p&gt;
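
&lt;p&gt;A basic labor-trend signal is the number of new postings per month for a given role. The sketch below assumes each scraped posting carries an ISO-formatted &lt;code&gt;posted_date&lt;/code&gt;; keep in mind that changes in how many sites you crawl will show up in these counts too.&lt;/p&gt;

```python
from collections import Counter
from datetime import date


def monthly_posting_counts(postings: list[dict], title_keyword: str) -> dict[str, int]:
    """Count postings per calendar month whose title contains the keyword.

    Assumes 'posted_date' is an ISO date string such as '2021-10-27'.
    """
    keyword = title_keyword.lower()
    counts: Counter = Counter()
    for posting in postings:
        if keyword in posting.get("title", "").lower():
            posted = date.fromisoformat(posting["posted_date"])
            counts[posted.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))  # chronological order
```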

&lt;h2&gt;
  
  
  Challenges Of Scraping Job Postings
&lt;/h2&gt;

&lt;p&gt;Job scraping is a smart way to get an edge in the job market. However, it is by no means simple. Once you start scraping, you will face many challenges:&lt;/p&gt;

&lt;h3&gt;
  
  
  Diverse Sources To Scrape
&lt;/h3&gt;

&lt;p&gt;Job data is available on companies’ career pages and on job aggregation sites. It’s not hard to write a program that scrapes a few pages from a single website. However, websites are structured differently, and a web scraper built for one website’s layout won’t work on websites that are designed differently. &lt;/p&gt;

&lt;p&gt;When you want to extract data from 50 different sources, each with its own layout, you may well have to write 50 different programs! And what if a website’s layout changes? The job crawler must be updated whenever the website’s design changes to keep functioning. &lt;/p&gt;
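
&lt;p&gt;One way to keep many differently structured sources manageable is to map each source’s raw fields onto one shared schema and check every record for required fields, so a redesign shows up as an error instead of silently empty data. The source names and field mappings below are hypothetical, and the per-site extraction that produces the raw records is not shown.&lt;/p&gt;

```python
# Hypothetical sources and their native field names, mapped to one shared schema.
SOURCE_FIELD_MAPS = {
    "site_a": {"jobTitle": "title", "employer": "company", "city": "location"},
    "site_b": {"position": "title", "org": "company", "where": "location"},
}
REQUIRED_FIELDS = ("title", "company", "location")


class LayoutChangedError(Exception):
    """A source stopped yielding required fields, e.g. after a redesign."""


def normalize_record(source: str, raw: dict) -> dict:
    """Rename a raw record's fields using its source's mapping, then validate."""
    mapping = SOURCE_FIELD_MAPS[source]
    record = {common: raw.get(native, "") for native, common in mapping.items()}
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    if missing:
        raise LayoutChangedError(f"{source}: missing {', '.join(missing)}")
    return record
```

&lt;p&gt;With this design, adding a source means adding a mapping plus its extraction code rather than a whole new pipeline, and alerting on &lt;code&gt;LayoutChangedError&lt;/code&gt; tells you which crawler needs attention.&lt;/p&gt;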

&lt;h3&gt;
  
  
  Battling New Anti-Scraping Measures Each Time
&lt;/h3&gt;

&lt;p&gt;A job crawler has many foes: CAPTCHAs, IP blocks, honeypot traps, sign-in requirements, and legal complications, to name a few. Websites can deploy plenty of anti-scraping techniques to control visits. Working around these obstacles is time-consuming, expensive, and, above all, frustrating!&lt;/p&gt;
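
&lt;p&gt;Not every obstacle should be worked around, but rate limiting can be handled politely: when a server answers HTTP 429 (Too Many Requests) or 503, wait and retry with an increasing delay. In this sketch &lt;code&gt;fetch&lt;/code&gt; is a stand-in for whatever HTTP client you use, only numeric &lt;code&gt;Retry-After&lt;/code&gt; values are honored, and nothing here defeats CAPTCHAs or IP blocks.&lt;/p&gt;

```python
import random
import time
from typing import Callable

RETRYABLE_STATUSES = {429, 503}


def fetch_with_backoff(
    fetch: Callable[[str], tuple[int, dict, bytes]],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url), backing off after 429/503 responses.

    fetch must return (status, headers, body). A numeric Retry-After header
    wins; otherwise the wait doubles each attempt, plus up to 1s of jitter.
    """
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES:
            raise RuntimeError(f"{url} returned HTTP {status}")
        if attempt + 1 == max_attempts:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt + random.uniform(0, 1)
        sleep(delay)
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")
```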

&lt;h3&gt;
  
  
  High Engineering Cost
&lt;/h3&gt;

&lt;p&gt;Stale job postings can be a business nightmare for job boards. A job seeker who calls a company about a position found on a job board, only to discover it has already been filled, is unlikely to use that job board again. To protect your business’s credibility, you need to maintain your crawlers so they keep up with new obstacles and regularly scrape &lt;strong&gt;up-to-date&lt;/strong&gt; information. Besides the upfront cost of deploying a job scraper, you’ll also need a developer to handle the ongoing maintenance of your in-house scraping system. &lt;/p&gt;
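
&lt;p&gt;To keep stale postings off your board, a crawler can compare each crawl with the listings it already holds and retire jobs that have disappeared from the source. The sketch below assumes each posting has a stable ID and waits for a configurable number of consecutive misses before expiring a job, so one failed or partial crawl does not wipe out live listings.&lt;/p&gt;

```python
def update_listings(
    open_jobs: dict[str, int], seen_ids: set[str], grace_crawls: int = 2
) -> tuple[dict[str, int], set[str]]:
    """Reconcile the latest crawl with the jobs currently listed.

    open_jobs maps each listed posting ID to how many consecutive crawls
    it has been missing from. Returns (updated open_jobs, expired IDs).
    """
    updated: dict[str, int] = {}
    expired: set[str] = set()
    for job_id, misses in open_jobs.items():
        if job_id in seen_ids:
            updated[job_id] = 0               # still live at the source
        elif misses + 1 >= grace_crawls:
            expired.add(job_id)               # gone long enough: take it down
        else:
            updated[job_id] = misses + 1      # missing, but give it another crawl
    for job_id in seen_ids.difference(open_jobs):
        updated[job_id] = 0                   # newly discovered posting
    return updated, expired
```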

&lt;h2&gt;
  
  
  Alternatives For Job Data Scraping
&lt;/h2&gt;

&lt;p&gt;If you want to scrape job data off the web, there are a few alternatives to choose from. &lt;/p&gt;

&lt;h3&gt;
  
  
  Partnering With A Web Scraping Service
&lt;/h3&gt;

&lt;p&gt;The simplest option is to outsource your job scraping requirements to a web scraping service. A web scraping service has a team of professional experts who can deliver up-to-date information efficiently. Since the data is delivered directly to you, there’s no learning curve, maintenance cost, or obstacle for you to worry about. Partnering with a professional web scraping service means you can &lt;strong&gt;focus on utilizing the data&lt;/strong&gt; rather than extracting it. &lt;/p&gt;

&lt;h3&gt;
  
  
  Using A Self-Service Scraping Tool
&lt;/h3&gt;

&lt;p&gt;There are many self-service web scraping tools available on the market. You can use them for small, one-time job scraping tasks. However, they often have a steep learning curve and may require HTML know-how to unlock advanced features.&lt;/p&gt;

&lt;p&gt;While it might be a budget-friendly option, self-service scraping tools are difficult to customize and are often only suitable for small-scale scraping.&lt;/p&gt;

&lt;h3&gt;
  
  
  In-House Web Scraping Setup
&lt;/h3&gt;

&lt;p&gt;Building an in-house solution for commercial-scale job data scraping can be expensive. It might make sense when most of the following conditions are true:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your company already has resources to develop and maintain sophisticated software systems.&lt;/li&gt;
&lt;li&gt;You have a highly specialized use case and want tight control of the operations.&lt;/li&gt;
&lt;li&gt;You understand the costs of operating and maintaining web crawlers. Many people underestimate those hidden costs until they hit issues like CAPTCHA challenges and IP blocking.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Benefits Of Partnering With A Web Scraping Service
&lt;/h2&gt;

&lt;p&gt;The easiest way to avoid the challenges of web scraping while still getting the full benefits of scraping job sites is to partner with a web scraping service. A professional web scraping service, like &lt;a href="https://www.crawlnow.com/products/data-extraction-services"&gt;CrawlNow&lt;/a&gt;, offers several benefits. Here’s what you can expect:&lt;/p&gt;

&lt;h3&gt;
  
  
  Access The Best Resources
&lt;/h3&gt;

&lt;p&gt;A dedicated web scraping service already has all the resources in place to efficiently source data for you. You can leverage the best scraping infrastructure without the hassle of maintaining an IT team and resources in-house. &lt;/p&gt;

&lt;h3&gt;
  
  
  Scalable Scraping Solutions
&lt;/h3&gt;

&lt;p&gt;Job postings often need to be scraped from the career pages of many different websites. Every website has a different layout and behavior, which would require developing a separate crawler for every website. That approach does not scale, and it quickly becomes a maintenance nightmare if you have to crawl more than a few websites, because website layouts can change frequently, breaking the crawlers and interrupting data delivery.&lt;/p&gt;

&lt;p&gt;Also, if you try to scrape more than a few pages from most websites, you will be met with challenges like IP blacklisting, captcha challenges, etc. &lt;/p&gt;

&lt;h3&gt;
  
  
  Uninterrupted Data Delivery
&lt;/h3&gt;

&lt;p&gt;The primary purpose of a web scraping service is to deliver data to its clients. Whether you want a one-off delivery or daily, weekly, or monthly data feeds, you can arrange for deliveries at a frequency of your choice. With an on-time, uninterrupted flow of data, you can focus on the company’s bigger goals. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With the explosive growth in online recruiting since the COVID-19 outbreak, the volume of online job data keeps expanding. More and more companies are exploring innovative ways to harness job data and expand their business. While there are countless ways to use job data, the fastest and most profitable way to extract it is often to work with a web scraping service. &lt;a href="https://www.crawlnow.com/contact"&gt;Contact CrawlNow&lt;/a&gt; to learn how scraping job sites can give your business a boost.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Related Readings
&lt;/h2&gt;

&lt;p&gt;If you’re interested in learning the prospects of job data scraping for your business, here are some resources you may find helpful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.crawlnow.com/industries/jobs-staffing"&gt;Data Solutions For Jobs &amp;amp; Staffing Agencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.crawlnow.com/blog/is-web-scraping-legal"&gt;Is Web Scraping Legal? The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.techradar.com/best/us-job-sites"&gt;Best job sites of 2021 by TechRadar&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webscraping</category>
      <category>scraping</category>
      <category>digitalmarketing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Is Web Scraping Legal? The Definitive Guide</title>
      <dc:creator>Hafiz Hamid</dc:creator>
      <pubDate>Sun, 19 Sep 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/crawlnow/is-web-scraping-legal-all-you-need-to-know-4ale</link>
      <guid>https://dev.to/crawlnow/is-web-scraping-legal-all-you-need-to-know-4ale</guid>
      <description>&lt;p&gt;Originally published at &lt;a href="https://www.crawlnow.com/blog/is-web-scraping-legal"&gt;CrawlNow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It is a data-driven world. Sourcing and consuming external data is a necessity for many businesses. For many of them, leveraging publicly available data is essential to survive and stay ahead of the competition. While web scraping is the key to unlocking access to web data, there is a lot of confusion, and many myths, around its legality and ethics. This article aims to address those and bring clarity to the topic. It also covers the best practices you should follow, and the legal and ethical boundaries you should respect, to get the most out of web scraping while keeping it safe and legal.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Web scraping is a great way for data-driven businesses around the globe to source useful external data. However, there is a lot of confusion about its legality. If you type the question “Is web scraping legal?” into Google, you’ll find opposing views, depending on &lt;strong&gt;who&lt;/strong&gt; is answering it. While data scraping companies will try to paint an optimistic picture to win more business, anti-scraping service providers will equate it with data theft to sell their solutions.&lt;/p&gt;

&lt;p&gt;The truth is that almost all big companies use web scraping, one way or another, to collect data about their competitors and markets. They do not see it as unethical for their own use. However, it may irk them to find others scraping their own websites.&lt;/p&gt;

&lt;p&gt;In this blog, I will try to take an unbiased view. Things may not always be black and white, and may be open to interpretation in some situations, though. So, I would recommend seeking legal advice when in doubt. This article does not intend to provide legal advice.&lt;/p&gt;

&lt;p&gt;Before we look into whether web scraping is legal or illegal, let’s understand what it is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Web Scraping?
&lt;/h2&gt;

&lt;p&gt;Web scraping is the use of computers and automation to visit pages on one or more websites and extract information from their HTML and JavaScript source code into a format that software applications, e.g. spreadsheets or databases, can read.&lt;/p&gt;

&lt;p&gt;Humans could perform the same operation, but much more slowly. An example is downloading product attributes for a few thousand items on Amazon.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, Is Web Scraping Legal?
&lt;/h2&gt;

&lt;p&gt;So, is it legal to scrape a website, then?&lt;/p&gt;

&lt;p&gt;There is no law in the US, or elsewhere, that says web scraping is illegal. So does that mean web scraping is legal? It depends on what data you are scraping and how you are using it. &lt;/p&gt;

&lt;p&gt;Web scraping is simply a tool to automate what humans can otherwise do manually. A tool itself is neither legal nor illegal; it’s the use of the tool that can be legal or illegal.  &lt;/p&gt;

&lt;p&gt;Data scraping has been in use for a long time. Search engines use bots to discover and index web pages. Price comparison websites use scraping to inform their consumers before they make purchases. You could even scrape your own website for analytics. At the same time, bad actors may use scraping to conduct fraudulent activities such as data theft or DDoS attacks.&lt;/p&gt;

&lt;p&gt;Though web scraping is not illegal, it’s a technology you should use with care. There are boundaries you should respect to make sure you don’t get into legal trouble. If you scrape smartly and follow ethical web scraping practices, it is highly unlikely to be held against you, even if the websites you are scraping do not like it.&lt;/p&gt;

&lt;p&gt;It comes down to three things that decide legality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What you scrape&lt;/li&gt;
&lt;li&gt;How you scrape it&lt;/li&gt;
&lt;li&gt;How you use the data you scrape&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following section will help you evaluate your use case and determine whether it lies in the safe zone. &lt;/p&gt;

&lt;h2&gt;
  
  
  Questions To Ask Yourself Before You Scrape
&lt;/h2&gt;

&lt;p&gt;Asking yourself the following 6 questions, pertaining to the generally accepted web scraping ethics, will help you stay compliant. &lt;/p&gt;

&lt;h3&gt;
  
  
  Are You Scraping Personal Data?
&lt;/h3&gt;

&lt;p&gt;Personal data scraping could be an unsafe area where you need to be extra cautious. Different jurisdictions have different laws governing access and use of personal data. While it might be okay to scrape personal data in some US states, you may get into trouble for doing the same in California. Wherever you are, check your local regulations before you scrape personal data. &lt;/p&gt;

&lt;p&gt;These laws can also reach across borders. Even if you are based somewhere that permits scraping personal data, if you scrape the data of a person located in the EU, for example, EU law may apply to you. The EU is very protective of people’s privacy, so you may want to review the &lt;a href="https://ec.europa.eu/info/law/law-topic/data-protection/data-protection-eu_en"&gt;General Data Protection Regulation&lt;/a&gt; (GDPR) before scraping their information. &lt;/p&gt;

&lt;p&gt;Next, you may ask, &lt;strong&gt;what is personal data&lt;/strong&gt;? &lt;/p&gt;

&lt;p&gt;According to the &lt;a href="https://oag.ca.gov/privacy/ccpa"&gt;California Consumer Privacy Act&lt;/a&gt; (CCPA), personal information is the data that can identify or be linked to an individual or household. It includes, but is not limited to, a person’s name, birthday, contact details, IP address, and audio and video recordings. &lt;/p&gt;

&lt;p&gt;On the bright side, you won’t typically need to worry about personal data when scraping for price intelligence or competitive analysis. &lt;/p&gt;

&lt;p&gt;However, when scraping reviews and social media data, personal data is often a consideration. Usernames, names, and profile pictures, among other things, can count as personal data in this case. In such scenarios, there are multiple ways to avoid web crawling legal issues. For example, you can anonymize the data by omitting fields like usernames and emails. &lt;/p&gt;
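
&lt;p&gt;As a minimal sketch of that idea, a scraped review record can be copied without the fields that identify its author. The field names are hypothetical and depend on the source. Free-text fields such as the review body can still contain personal data, so this is a starting point rather than a compliance guarantee.&lt;/p&gt;

```python
# Hypothetical names of fields that identify a reviewer; adjust per source.
PERSONAL_FIELDS = frozenset({"username", "name", "email", "profile_picture", "profile_url"})


def strip_personal_fields(record: dict, personal_fields: frozenset = PERSONAL_FIELDS) -> dict:
    """Return a copy of a scraped record without fields that identify a person."""
    return {key: value for key, value in record.items() if key not in personal_fields}


# Hypothetical scraped review.
review = {"username": "jdoe42", "email": "jdoe@example.com", "rating": 4,
          "text": "Fast delivery.", "product": "Desk Lamp"}
clean = strip_personal_fields(review)
```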

&lt;p&gt;When you’re working with &lt;a href="https://www.crawlnow.com/"&gt;CrawlNow&lt;/a&gt;, we carefully review your specific use case and work hand in hand with you to make sure you comply with laws related to personal data, including GDPR, CCPA and your local jurisdictions. &lt;/p&gt;

&lt;h3&gt;
  
  
  Are You Scraping Non-Public Data?
&lt;/h3&gt;

&lt;p&gt;Before scraping a website, you should know what is public data and what is not. Websites generally keep certain data available to the public. As long as you are scraping only the publicly available content, you should generally be safe. However, there are a few other things to keep in mind that are discussed in the following sections. &lt;/p&gt;

&lt;p&gt;Non-public data is data that is not accessible to everyone on the web. You will typically need to log in to view it. If data is only available after you log in, it is, by definition, not publicly accessible. If you scrape non-public content, you may be inviting trouble, but it depends on the context. &lt;/p&gt;

&lt;p&gt;Facebook, for instance, may allow you to scrape data in certain conditions, but only after “&lt;a href="https://www.facebook.com/apps/site_scraping_tos_terms.php"&gt;Facebook’s express written permission&lt;/a&gt;”. &lt;/p&gt;

&lt;h3&gt;
  
  
  Are You Scraping Copyrighted Data?
&lt;/h3&gt;

&lt;p&gt;A lot of the content available on the internet is protected by copyright. Scraping and using copyrighted material irresponsibly may amount to &lt;a href="https://www.copyright.gov/title17/"&gt;copyright infringement&lt;/a&gt;. Music, news, blogs, research papers, movies, images, databases, and logos are examples of potentially copyrighted content. Even without an explicit copyright notice, original works are automatically protected by copyright under the &lt;a href="https://www.lawfirms.com/resources/technology-law/technology-and-intellectual-property/copyright-internet.htm"&gt;Berne Convention&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, not all information on the internet is protected by copyright. Some of it is plain fact, and consequently a safer resource for web scrapers. Product names, price data, and numbers of sales or views, the core inputs of price intelligence and competitive analysis, are examples of plain facts. &lt;/p&gt;

&lt;p&gt;Images, videos and databases are some of the content types that may come up in web scraping projects. In such cases, it’s important to look at the use case, since you may be able to scrape copyrighted data in certain situations, depending on how you use it. &lt;/p&gt;

&lt;p&gt;Aggregators, for example, typically use snippets from different sources and attach a link that directs the viewer to the original source, i.e. the copyright holder. You may also want to scrape copyrighted data purely for analysis. In many jurisdictions, such uses may be considered ethical web scraping. However, scraping copyrighted data and publishing it as your own is undoubtedly illegal. &lt;/p&gt;

&lt;h3&gt;
  
  
  Is The Crawling Rate Tolerable?
&lt;/h3&gt;

&lt;p&gt;Web scrapers are preferred over manual data extraction because they can fetch data in mere seconds. But efficient as they are, you should not hit a website’s server with too many requests in a short interval. &lt;/p&gt;

&lt;p&gt;Scraping a website aggressively can overload its server and may even crash it if the website has no rate limiting in place. In that case, you damage the website’s functionality and may be held liable under “&lt;a href="https://ilt.eff.org/Trespass_to_Chattels.html"&gt;Trespass to Chattels&lt;/a&gt;” law (more on this later). &lt;/p&gt;

&lt;p&gt;Some websites specify a “Crawl-delay” directive in their robots.txt file (more on this later, too). “Crawl-delay: 10” means that a bot should wait at least 10 seconds between two consecutive requests. &lt;/p&gt;

&lt;p&gt;If the website doesn’t specify a crawl-delay directive, 1 request every 10 to 15 seconds is a reasonable crawl rate in most scenarios. As long as you stay within a reasonable crawl rate, you are unlikely to run into web crawling legal issues. &lt;/p&gt;
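
&lt;p&gt;A simple way to stay within such a crawl rate is to enforce a minimum interval between requests to the same site. The 10-second default below follows the suggestion above; if robots.txt specifies a crawl delay, pass that value instead. The clock and sleep functions are injectable only so the sketch is easy to test.&lt;/p&gt;

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests to one site."""

    def __init__(self, min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough that min_interval has passed since the last call."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now
```

&lt;p&gt;In practice you would keep one &lt;code&gt;Throttle&lt;/code&gt; per domain and call &lt;code&gt;wait()&lt;/code&gt; right before each request to that domain.&lt;/p&gt;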

&lt;h3&gt;
  
  
  Are You Abiding By The Terms Of Service?
&lt;/h3&gt;

&lt;p&gt;Websites can attempt to discourage scraping activities by laying down the conditions in their ToS (“Terms of Service” or “Terms of Use”). While websites can put whatever they want in their ToS, the conditions are not always enforceable. The terms may or may not be contractually binding on web scrapers, depending on how they appear on the website. &lt;/p&gt;

&lt;p&gt;Agreements can be either &lt;em&gt;browsewrap&lt;/em&gt; or &lt;em&gt;clickwrap&lt;/em&gt;. Browsewrap agreements are concluded upon visiting the website. However, in many cases, they either appear inconspicuously at the bottom of the screen or within a drop-down menu. In such cases, they are generally not binding by law. However, if the agreement appears as a pop-up window or the website provides a link to the ToS at a noticeable position, they may be enforceable. You’ll better understand the legal theory behind browsewrap agreements by looking at a &lt;a href="https://en.wikipedia.org/wiki/Browse_wrap"&gt;summary of related court cases&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In contrast, clickwrap agreements require the user to tick a checkbox or click a button, usually accompanied by text along the lines of “By clicking, you agree to our Terms and Conditions.” Once you take the required action, the Terms and Conditions are generally binding on you, and a court may enforce them. &lt;/p&gt;

&lt;h3&gt;
  
  
  Are You Complying With robots.txt File?
&lt;/h3&gt;

&lt;p&gt;If you want to use web scraping tools, you should know about &lt;a href="http://www.robotstxt.org/"&gt;robots.txt&lt;/a&gt;. Think of it as a set of instructions that the website publishes for bots. &lt;/p&gt;

&lt;p&gt;The “Disallow” directive tells robots which paths the website owner does not want them to visit; for example, “Disallow: /” asks them to stay away from the entire site. The minimum delay between successive requests may also be specified with the “Crawl-delay” directive. &lt;/p&gt;

&lt;p&gt;It is generally a good idea to check a website’s robots.txt file before scraping it and to respect the directives it lays down.  &lt;/p&gt;
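
&lt;p&gt;Python’s standard library includes &lt;code&gt;urllib.robotparser&lt;/code&gt;, which can answer both questions: whether a URL may be fetched and what crawl delay applies. The sketch below parses robots.txt text you have already downloaded; the user-agent string and the sample file are hypothetical, and the 10-second fallback mirrors the rate suggested earlier.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

DEFAULT_DELAY = 10.0  # fallback when robots.txt gives no Crawl-delay


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # ensure a check time is set; can_fetch() returns False while none is
    return parser


def crawl_policy(parser: RobotFileParser, user_agent: str, url: str) -> tuple[bool, float]:
    """Return (allowed, seconds to wait between requests) for this user agent."""
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)
    return allowed, (float(delay) if delay is not None else DEFAULT_DELAY)


# Hypothetical robots.txt content.
ROBOTS_TXT = """User-agent: *
Crawl-delay: 5
Disallow: /private/
"""
rules = load_rules(ROBOTS_TXT)
```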

&lt;h2&gt;
  
  
  Legal Precedent
&lt;/h2&gt;

&lt;p&gt;Let’s look at some important laws governing web scraping and some high-profile judgments that shape the present and future of the data collection world. &lt;/p&gt;

&lt;h3&gt;
  
  
  HiQ vs. LinkedIn
&lt;/h3&gt;

&lt;p&gt;Very recently, the &lt;a href="https://law.justia.com/cases/federal/appellate-courts/ca9/17-16783/17-16783-2019-09-09.html"&gt;HiQ vs. LinkedIn&lt;/a&gt; case emerged as a landmark for web scrapers. LinkedIn came into dispute with a small data analytics company, HiQ Labs, by sending it an official letter demanding that it cease all scraping activities on LinkedIn. The letter also stated that LinkedIn had blocked HiQ Labs from accessing public profiles.&lt;/p&gt;

&lt;p&gt;Did HiQ back out?   &lt;/p&gt;

&lt;p&gt;No. HiQ Labs took the case to court, arguing that scraping publicly available data is not illegal and that blocking it gives big companies like LinkedIn the unfair advantage of hoarding public information. &lt;/p&gt;

&lt;p&gt;In September 2019, the US Ninth Circuit ruled in favor of HiQ, stating that collecting publicly available data was likely not a violation of the CFAA. In June 2021, the Supreme Court granted LinkedIn’s petition for a writ of certiorari, vacated the ruling, and sent the case back to the Ninth Circuit for further consideration in light of its decision in Van Buren v. United States. Though the case is still pending, a decision in favor of HiQ could mean a groundbreaking victory for ethical web scraping. &lt;/p&gt;

&lt;h3&gt;
  
  
  Facebook vs. Power Ventures
&lt;/h3&gt;

&lt;p&gt;“&lt;a href="https://en.wikipedia.org/wiki/Facebook,_Inc._v._Power_Ventures,_Inc."&gt;Facebook vs. Power Ventures&lt;/a&gt;” is another well-known dispute in the web scraping community. It began in 2009, when Facebook took legal action against Power Ventures for extracting Facebook’s user information and displaying it on Power’s own website. Facebook alleged violations of the &lt;a href="https://en.wikipedia.org/wiki/CAN-SPAM"&gt;CAN-SPAM Act&lt;/a&gt;, the &lt;a href="https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act"&gt;CFAA&lt;/a&gt;, the &lt;a href="https://en.wikipedia.org/wiki/Digital_Millennium_Copyright_Act"&gt;DMCA&lt;/a&gt;, and the &lt;a href="https://en.wikipedia.org/wiki/Unfair_competition"&gt;UCL&lt;/a&gt;, as well as copyright infringement.&lt;/p&gt;

&lt;p&gt;What happened next?&lt;/p&gt;

&lt;p&gt;The court dismissed the other claims, but three claims (under the CAN-SPAM Act, the CFAA, and the California Penal Code) proceeded to a final decision. In the end, the court ruled in favor of Facebook and ordered Power to pay Facebook a hefty sum of $79,640.50. &lt;/p&gt;

&lt;p&gt;Comparing the two cases, “HiQ vs. LinkedIn” and “Facebook vs. Power Ventures”, makes it easier to understand where data scraping may or may not be legal. Facebook controls access to its data with a login and password, so when you scrape its user profiles, you are scraping behind the login. Is data scraping legal in this case? Power Ventures was sued for doing exactly that, and lost.&lt;/p&gt;

&lt;p&gt;In contrast, LinkedIn’s public profiles are accessible directly through the browser; you don’t need to log in to view them. Is scraping legal here? Judging by how the case has gone in court so far, there’s a good chance it is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Computer Fraud and Abuse Act (CFAA)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.nacdl.org/Landing/ComputerFraudandAbuseAct"&gt;CFAA&lt;/a&gt; is another important law to consider when assessing the legality of your scraping activity. Under the act, intentionally accessing a computer without authorization, or in excess of authorized access, may be subject to legal action.&lt;/p&gt;

&lt;p&gt;So what does that mean to web scrapers? &lt;/p&gt;

&lt;p&gt;Though the HiQ vs. LinkedIn case has been sent back to the Ninth Circuit for reconsideration, the court’s earlier decision suggests that when a server’s data is publicly available, accessing it may not violate the CFAA. But we’ll have to wait for the final decision in the case to know for sure.&lt;/p&gt;

&lt;p&gt;Regardless of how the HiQ vs. LinkedIn case turns out, the CFAA may still apply to web scraping that involves non-public data. Websites that keep certain information behind a login may hold you liable under the CFAA for scraping it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trespass To Chattels
&lt;/h3&gt;

&lt;p&gt;Everyone knows that trespassing on someone’s property is illegal, and digital trespass can be just as unlawful. A website is the property of its owner, and &lt;a href="https://ilt.eff.org/Trespass_to_Chattels.html"&gt;Trespass To Chattels&lt;/a&gt; is the legal doctrine that governs wrongful interference with someone’s property, including digital property such as a website’s servers.&lt;/p&gt;

&lt;p&gt;When you visit a website, which is the digital property of its owner, you should behave responsibly. If your use of a website damages its condition, quality or value, you may be held liable under Trespass To Chattels. For instance, if a high crawl rate crashes the website’s server, the owner may file a lawsuit under “Trespass To Chattels”.&lt;/p&gt;

&lt;p&gt;That being said, as long as you scrape a website responsibly and make sure your activity inflicts no damage, you shouldn’t have to worry about violating Trespass To Chattels.&lt;/p&gt;
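&lt;p&gt;One practical way to keep your crawl rate reasonable is to enforce a minimum pause between requests to the same site. The Python sketch below is a minimal illustration, not a complete crawler. &lt;code&gt;PoliteThrottle&lt;/code&gt; and its &lt;code&gt;min_interval&lt;/code&gt; parameter are hypothetical names, and the right interval for any given website is an assumption you would need to tune.&lt;/p&gt;

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforce a minimum pause between consecutive requests to one site."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            remaining = self.min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record when this request is allowed to go out.
        self._last = self._clock()
        return slept
```

&lt;p&gt;In a crawl loop, you would create one throttle per website and call &lt;code&gt;throttle.wait()&lt;/code&gt; right before each request. The clock and sleep functions are injectable, so the pacing logic can be tested without real delays.&lt;/p&gt;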

&lt;h3&gt;
  
  
  Fair Use in the United States
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.copyright.gov/fair-use/more-info.html"&gt;Fair Use&lt;/a&gt; is a legal doctrine in the United States that permits the unlicensed use of copyrighted material, including scraped content, in certain situations. Uses such as criticism, research, teaching and news reporting may be considered “fair use”.&lt;/p&gt;

&lt;p&gt;Courts weigh four factors to decide whether a use qualifies as fair use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Purpose and character of the use: “transformative” uses, in which the user adds something new that extends the purpose of the original content, are typically considered fair use. Aggregators that generate lists for competitive purposes may fall into this category.&lt;/li&gt;
&lt;li&gt;Nature of the copyrighted work: scraping factual material, such as news articles and technical writing, is more likely to support a fair use claim than scraping creative work, such as movies or novels.&lt;/li&gt;
&lt;li&gt;Amount used: scraping a small portion of the copyrighted material is more likely to be considered “fair use” than using a substantial portion of it.&lt;/li&gt;
&lt;li&gt;Effect on the market: the court also weighs the extent to which, if at all, the use damages the market for the original work.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So what does it come down to? Is web scraping legal or not? We firmly believe it is. It is nothing more than the automation of work that humans would otherwise do by hand.&lt;/p&gt;

&lt;p&gt;You just have to respect certain legal boundaries and best practices: respect robots.txt, don’t swamp websites with unreasonably high crawl rates, and be extra cautious with copyrighted content and personal data. Seek professional legal advice whenever in doubt.&lt;/p&gt;

&lt;p&gt;Generally, partnering with a professional &lt;a href="https://www.crawlnow.com/products/data-extraction-services"&gt;web scraping service&lt;/a&gt; makes it easier to follow these principles.&lt;/p&gt;
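&lt;p&gt;Respecting robots.txt is the easiest of these principles to automate, and Python’s standard library includes &lt;code&gt;urllib.robotparser&lt;/code&gt; for the purpose. The sketch below parses an illustrative robots.txt string instead of fetching a live file, so it runs offline. The rules, the &lt;code&gt;example.com&lt;/code&gt; URLs and the &lt;code&gt;my-scraper&lt;/code&gt; user agent are all made up for the example.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content. Against a live site you would instead call
# parser.set_url("https://example.com/robots.txt") followed by parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "my-scraper"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for url in ("https://example.com/jobs/123", "https://example.com/private/report"):
    # can_fetch() applies the Allow/Disallow rules that match our user agent.
    verdict = "allowed" if parser.can_fetch(USER_AGENT, url) else "disallowed"
    print(f"{url}: {verdict}")

# crawl_delay() returns the Crawl-delay for our user agent, or None if unset.
print("Crawl-delay:", parser.crawl_delay(USER_AGENT))
```

&lt;p&gt;When a site publishes a Crawl-delay, it is a sensible lower bound for the pause between your requests to that site.&lt;/p&gt;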

&lt;p&gt;When conducted in a responsible manner, web scraping is a powerful technology for gathering information, and even creating new information, on the internet. From content aggregation and competitive research to creating datasets for training machine learning models, the use cases for web scraping are endless.    &lt;/p&gt;

&lt;p&gt;Speak to a &lt;a href="https://www.crawlnow.com/contact"&gt;CrawlNow&lt;/a&gt; data expert today to explore new opportunities for using data to fuel growth for your business.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;p&gt;If you would like to dig deeper into certain topics, here’s a list of some enlightening texts you can read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@tjwaterman99/web-scraping-is-now-legal-6bf0e5730a78"&gt;Web scraping is now legal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/us-supreme-court-revives-linkedin-bid-shield-personal-data-2021-06-14/"&gt;U.S. Supreme Court revives LinkedIn bid to shield personal data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The latest on HiQ Labs vs. LinkedIn:  &lt;a href="https://www.natlawreview.com/article/supreme-court-vacates-linkedin-hiq-scraping-decision-remands-to-ninth-circuit"&gt;Supreme Court Vacates LinkedIn-HiQ Scraping Decision, Remands to Ninth Circuit for Another Look&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theverge.com/2019/9/10/20859399/linkedin-hiq-data-scraping-cfaa-lawsuit-ninth-circuit-ruling"&gt;Scraping public data from a website probably isn’t hacking, says court&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://cyberlaw.stanford.edu/blog/2016/02/digital-trespass-what-it-and-why-you-should-care?source=post_page---------------------------"&gt;Digital Trespass - What Is It And Why You Should Care&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webscraping</category>
      <category>scraping</category>
      <category>automation</category>
      <category>bot</category>
    </item>
    <item>
      <title>7 Ways Web Scraping Helps Your E-Commerce Business</title>
      <dc:creator>Hafiz Hamid</dc:creator>
      <pubDate>Wed, 08 Sep 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/crawlnow/7-ways-web-scraping-helps-your-e-commerce-business-pc1</link>
      <guid>https://dev.to/crawlnow/7-ways-web-scraping-helps-your-e-commerce-business-pc1</guid>
      <description>&lt;p&gt;Originally published at &lt;a href="https://www.crawlnow.com/blog/7-ways-web-scraping-helps-your-ecommerce-business"&gt;CrawlNow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;With a large percentage of retail activities taking place online, web scraping offers massive potential for e-commerce businesses to differentiate and grow. Web data extraction can enable online retailers to optimize their strategies by tracking competitors, understanding customers’ needs, and staying on top of market trends. This post highlights a few key ways in which you can leverage publicly available data to beat your competition in the e-commerce space.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With the recent boom in the e-commerce industry, having access to the right data is the key to making your mark among competitors. There are about 7.9 million online retailers globally, of which 2.1 million operate in the US. The total e-commerce revenue in the United States amounted to &lt;a href="https://www.statista.com/topics/2443/us-ecommerce/"&gt;431.6 billion USD in the year 2020&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;While the e-commerce opportunity is huge, the competition is also fierce, and data-driven organizations stand a better chance of surviving and thriving. Much of the most important data about your competitors, customers and market is out there on the web, and web scraping is the tool that can make this data accessible to you.&lt;/p&gt;

&lt;p&gt;In the following sections, I will start with a brief introduction to web scraping, and then we will explore a few powerful ways both established and new e-commerce businesses can leverage web data extraction to undercut the competition.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Web Scraping?
&lt;/h2&gt;

&lt;p&gt;Web scraping is the process of using automated software to navigate websites, then parse and collect useful data from a large number of web pages. The data is transformed into structured formats so it can be imported into spreadsheets or databases for human or machine consumption. The technique is useful to many industries, including e-commerce.&lt;/p&gt;
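&lt;p&gt;As a small illustration of that last step, the Python sketch below turns a few already-parsed product records into CSV text that any spreadsheet can open. The records and field names are invented for the example; a real scraper would produce them from the pages it parses.&lt;/p&gt;

```python
import csv
import io

# Hypothetical records, shaped the way a scraper might emit them after parsing.
records = [
    {"name": "Wireless Mouse", "price": 24.99, "in_stock": True},
    {"name": "USB-C Hub", "price": 39.5, "in_stock": False},
]


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(records, ["name", "price", "in_stock"]))
```

&lt;p&gt;Swapping the &lt;code&gt;io.StringIO&lt;/code&gt; buffer for a file opened with &lt;code&gt;newline=""&lt;/code&gt; writes the same rows to disk.&lt;/p&gt;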

&lt;p&gt;Let’s discuss how web scraping can be used to drive value and competitive advantage for e-commerce businesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Can Data Scraping Benefit Your E-Commerce Business?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Price Intelligence: Price It Right
&lt;/h3&gt;

&lt;p&gt;Research by the e-tailing group reveals that &lt;a href="http://www.e-tailing.com/content/wp-content/uploads/2009/12/winbuyer_102209_brief.pdf"&gt;94% of online shoppers compare prices before making a purchase&lt;/a&gt;. With so many online stores out there, products and services are no longer unique. It only makes sense to research thoroughly and price your offerings correctly, so you can maximize conversions without leaving margin on the table.&lt;/p&gt;

&lt;p&gt;Regularly feeding competitors’ prices for the same products into dynamic pricing software has become a mainstay at e-commerce companies. Automated price tracking minimizes the time retailers spend comparing the prices offered by other stores and adjusting their own prices based on the analyzed data.&lt;/p&gt;
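&lt;p&gt;The Python sketch below shows one possible repricing rule, assuming you have already scraped competitors’ current prices for a product. It prices one cent below the cheapest competitor, but never below a floor you set, such as cost plus a minimum margin. The rule, the &lt;code&gt;suggest_price&lt;/code&gt; name and the one-cent step are illustrative assumptions; commercial dynamic pricing software uses far richer logic.&lt;/p&gt;

```python
from decimal import Decimal
from typing import Iterable, Optional


def suggest_price(
    competitor_prices: Iterable[Decimal],
    floor: Decimal,
    undercut: Decimal = Decimal("0.01"),
) -> Optional[Decimal]:
    """Suggest a price just below the cheapest competitor, never below floor.

    Returns None when no competitor prices are available.
    """
    prices = list(competitor_prices)
    if not prices:
        return None
    cheapest = min(prices)
    # Undercut the cheapest rival, but never drop below our own floor price.
    return max(floor, cheapest - undercut)


scraped = [Decimal("19.99"), Decimal("21.50"), Decimal("20.49")]
print(suggest_price(scraped, floor=Decimal("15.00")))  # 19.98
print(suggest_price(scraped, floor=Decimal("20.00")))  # 20.00
```

&lt;p&gt;&lt;code&gt;Decimal&lt;/code&gt; is used instead of floats so that currency arithmetic does not pick up binary rounding errors.&lt;/p&gt;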

&lt;p&gt;Scraping Amazon product listings alone can make a huge difference, given Amazon’s expansive product selection and broad reach. According to a &lt;a href="https://chainstoreage.com/study-most-product-searches-begin-amazon"&gt;study&lt;/a&gt;, 74% of U.S. consumers begin their product searches on Amazon.com and use Amazon’s prices as a baseline. Shoppers also turn to Amazon for product reviews and ratings to guide their buying decisions. Monitoring Amazon’s pricing can therefore inform your own pricing and help you develop a more competitive marketing strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enrich Product Listings
&lt;/h3&gt;

&lt;p&gt;In brick-and-mortar stores, you can see a product before buying it, but online shoppers have to rely on the product details that appear on the store’s website. It goes without saying that shoppers will turn away from your site if you don’t offer rich enough content on your product listing pages.&lt;/p&gt;

&lt;p&gt;Conventionally, product enrichment meant employing people to spend hours each day copy-pasting up-to-date product details from manufacturer sites or popular e-commerce platforms like Amazon and Walmart.&lt;/p&gt;

&lt;p&gt;To speed up the process, a growing number of online retailers use web scraping to extract rich content, including images, color and size variations, descriptions, product specifications and reviews, to enrich their product listing pages. Rich product pages not only educate shoppers about your product’s features and benefits, leading to better conversion rates, but also help you rank better on search engine results pages, increasing organic search traffic to your website.&lt;/p&gt;
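&lt;p&gt;The merge step itself can be simple. The Python sketch below assumes you have the right to reuse the scraped content. It fills only the fields your own listing is missing and never overwrites content you already have. The &lt;code&gt;enrich_listing&lt;/code&gt; helper and the field names are hypothetical.&lt;/p&gt;

```python
from typing import Any, Dict

EMPTY = (None, "", [], {})  # values we treat as "missing"


def enrich_listing(listing: Dict[str, Any], scraped: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of listing with missing fields filled from scraped data."""
    enriched = dict(listing)
    for field, value in scraped.items():
        # Keep our own content; only fill fields that are absent or empty.
        if enriched.get(field) in EMPTY:
            enriched[field] = value
    return enriched


ours = {"sku": "MUG-01", "title": "Ceramic Mug", "description": "", "colors": []}
scraped = {
    "title": "Stoneware Coffee Mug",
    "description": "350 ml stoneware mug, dishwasher safe",
    "colors": ["white", "blue"],
    "images": ["mug-front.jpg", "mug-side.jpg"],
}
print(enrich_listing(ours, scraped))
```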

&lt;h3&gt;
  
  
  Improve Marketing Strategies
&lt;/h3&gt;

&lt;p&gt;Given its large customer base, the online market is an extensive repository of supply and demand data. Businesses use historical and current market data to conduct predictive analysis and identify market trends. Based on the findings, you can keep your marketing strategies aligned with the changing preferences of your target audience.&lt;/p&gt;

&lt;p&gt;Marketing managers regularly use web data to make important decisions that can drive attention to their brand and improve conversion rates. Improving product design, enhancing customer experience, and choosing the correct platforms for advertising campaigns are some important domains that can benefit from these datasets. &lt;/p&gt;

&lt;p&gt;Structured data feeds acquired through web scraping can be fed directly into marketing automation tools, such as Marketo or Eloqua, to simplify and streamline everyday marketing processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitor Product Performance
&lt;/h3&gt;

&lt;p&gt;Analyzing and improving product performance is yet another area that can benefit from web data. By scraping product listings, retail data and customer reviews, you can assess a product’s standing in the online market. Comparing your product’s data with that of similar products sold by competitors yields valuable insights, such as the product’s value in the market and how it stacks up against the competition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Increase Visibility Through SEO Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.nchannel.com/blog/retail-data-ecommerce-statistics/"&gt;44% of online shoppers begin their purchasing journey by typing a query into a search engine&lt;/a&gt;. It’s further observed that more than &lt;a href="https://serpwatch.io/blog/seo-statistics/"&gt;two-thirds of all clicks on search engines go to the top 5 results&lt;/a&gt;. These numbers underline the importance of improving your brand’s organic SEO strategy to bring it into the top search results on Google.&lt;/p&gt;

&lt;p&gt;From your competitors’ positions on the SERP to the keywords they use to rank highly in e-commerce, all the data is out there on the web. Web scraping is a valuable technique for collecting and organizing all this information. Using the insights gathered from the scraped data, you can target keywords that improve your chances of ranking high in search results.&lt;/p&gt;

&lt;p&gt;Additionally, analyzing SEO-related data helps you optimize keyword density in your product descriptions and blog posts, and discover and adopt the best strategies competitors are using in your niche.&lt;/p&gt;
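&lt;p&gt;Keyword density is simple arithmetic: the share of a page’s words taken up by a keyword. Definitions vary between SEO tools. The Python sketch below assumes one common variant, (occurrences × words in the keyword) ÷ total words × 100, and uses a deliberately naive tokenizer. The example description is invented.&lt;/p&gt;

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, keyword: str) -> float:
    """Percent of words in text covered by occurrences of keyword."""
    words = tokenize(text)
    kw = tokenize(keyword)
    if not words or not kw:
        return 0.0
    n = len(kw)
    # Count every position where the keyword's word sequence appears.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw)
    return 100.0 * hits * n / len(words)


description = "Buy shoes online. Buy shoes for running, hiking and more."
print(keyword_density(description, "shoes"))      # 20.0
print(keyword_density(description, "buy shoes"))  # 40.0
```

&lt;p&gt;Counting multi-word keywords by their full word length keeps phrases and single words on the same percentage scale, though some tools count each phrase occurrence only once.&lt;/p&gt;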

&lt;h3&gt;
  
  
  Monitor Competitors
&lt;/h3&gt;

&lt;p&gt;While there’s massive competition in the online marketplace, the opportunity to conduct competitor analysis and stay ahead of the game is just as huge. Besides tracking competitors’ product prices, you can also analyze their product lines, categories, ratings and more to capitalize on their gaps and weaknesses. Through these audits, you can pinpoint the specific features, styles or ideas that are trending for them. You may discover that certain product bundles spike sales, or that flash sales on certain days or at certain times attract the most customers.&lt;/p&gt;

&lt;p&gt;Competitors’ data can also give you a good overview of their marketing strategies, inventory availability, marketing spend and more. Though you may not be able to find exact figures for their budget, you can analyze their PPC spend, for instance by scraping competitor data from &lt;a href="https://www.spyfu.com/"&gt;SpyFu&lt;/a&gt;. By comparing what they spend on each keyword with the number of clicks those keywords drive, you can make intelligent decisions about which keywords to pay for and which to avoid.&lt;/p&gt;
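&lt;p&gt;That comparison boils down to cost per click (spend ÷ clicks) for each keyword. The Python sketch below ranks keywords by that figure. The keyword names and numbers are invented placeholders, and the sketch assumes nothing about the export format of SpyFu or any other tool. Keep in mind that third-party spend and click figures for competitors are typically estimates.&lt;/p&gt;

```python
from typing import Dict, List, Optional, Tuple


def cost_per_click(spend: float, clicks: int) -> Optional[float]:
    """Spend divided by clicks; None when there were no clicks."""
    return spend / clicks if clicks else None


def rank_by_cpc(stats: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Return (keyword, CPC) pairs sorted from cheapest to most expensive."""
    ranked = []
    for keyword, (spend, clicks) in stats.items():
        cpc = cost_per_click(spend, clicks)
        if cpc is not None:  # skip keywords with no recorded clicks
            ranked.append((keyword, cpc))
    return sorted(ranked, key=lambda pair: pair[1])


# Hypothetical monthly figures: keyword -> (estimated spend in USD, estimated clicks)
competitor_stats = {
    "running shoes": (1200.0, 800),
    "trail running shoes": (300.0, 100),
    "shoe sale": (50.0, 0),
}
for keyword, cpc in rank_by_cpc(competitor_stats):
    print(f"{keyword}: ${cpc:.2f} per click")
```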

&lt;h3&gt;
  
  
  Consumer Sentiment Analysis
&lt;/h3&gt;

&lt;p&gt;Every retailer wants to know how customers perceive their products. Once you know what the customers like and dislike about your product, you can fine-tune the design, description and advertisements to match your customers’ preferences. &lt;/p&gt;

&lt;p&gt;Scraping customers’ feedback from review sites and social media pages lets you perform extensive sentiment analysis and explore what customers are saying about your product online. Scraped consumer reviews can be used directly to train machine learning models that identify emotions in text and pinpoint the flaws customers are experiencing with your product.&lt;/p&gt;
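&lt;p&gt;A trained model is the usual route. However, the basic idea of scoring scraped reviews can be sketched with a much simpler technique than machine learning: a word-list (lexicon) approach. In the Python example below, the tiny positive and negative word lists and the sample reviews are illustrative assumptions. A real project would use a trained classifier or an established sentiment lexicon.&lt;/p&gt;

```python
import re
from collections import Counter
from typing import Iterable

# Tiny, hand-picked word lists for illustration only.
POSITIVE = {"love", "great", "comfortable", "sturdy", "fast"}
NEGATIVE = {"broke", "flimsy", "slow", "uncomfortable", "refund"}


def score_review(text: str) -> int:
    """Count positive words minus negative words in a review."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map a review's score to a coarse sentiment label."""
    score = score_review(text)
    if score > 0:
        return "positive"
    if score == 0:
        return "neutral"
    return "negative"


def summarize(reviews: Iterable[str]) -> Counter:
    """Tally sentiment labels across a batch of scraped reviews."""
    return Counter(label(r) for r in reviews)


reviews = [
    "Love it, very comfortable and sturdy.",
    "It broke after a week. I want a refund.",
    "Arrived on Tuesday.",
]
print(summarize(reviews))
```

&lt;p&gt;A word-list scorer misses negation (“not great”) and sarcasm. That limitation is exactly why trained models are preferred once you have enough labeled reviews.&lt;/p&gt;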

&lt;p&gt;By detecting the tone in the customers’ opinion about your brand, you’ll be able to better understand what they’re already experiencing with you and how to improve these experiences.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Efficient data scraping requires technical know-how and presents several challenges, such as dealing with the anti-scraping protections many websites employ today. Many online retailers avoid the hassle by partnering with a professional web scraping company to source all the required data for them.&lt;/p&gt;

&lt;p&gt;CrawlNow offers &lt;a href="https://www.crawlnow.com/products/data-extraction-services"&gt;web scraping services&lt;/a&gt; customized to your specific needs. Get in touch to speak with one of our data experts. They will be happy to provide personalized advice on how publicly available web data can be leveraged as a key differentiator for your business and what would be the most cost-effective way to put the right data into your hands.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>ecommerce</category>
      <category>scraping</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
