<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NexGenData</title>
    <description>The latest articles on DEV Community by NexGenData (@nexgendata).</description>
    <link>https://dev.to/nexgendata</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3856502%2Fe35e3ca7-6327-4c88-b6dd-c50cc4c21464.png</url>
      <title>DEV Community: NexGenData</title>
      <link>https://dev.to/nexgendata</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nexgendata"/>
    <language>en</language>
    <item>
      <title>Public Real Estate Data Scrapers for Regional Market Research</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Fri, 26 Jun 2026 05:45:21 +0000</pubDate>
      <link>https://dev.to/nexgendata/public-real-estate-data-scrapers-for-regional-market-research-p7d</link>
      <guid>https://dev.to/nexgendata/public-real-estate-data-scrapers-for-regional-market-research-p7d</guid>
      <description>&lt;p&gt;Real estate research used to mean buying a Bloomberg terminal seat and hoping the vendor licensed the right region. In 2026 the data is mostly public — Zillow, Redfin, Rightmove, Singapore's Urban Redevelopment Authority (URA), Hong Kong's Centaline Property Index, India's MagicBricks — but it sits behind five different websites, four different languages, and zero unified APIs. The result: most analyst comps stop at the US border, REIT screens lean heavily on one country's listings, and PropTech VCs evaluating a Jakarta or Mumbai thesis end up paying a regional aggregator $40K/year for data the source publishes for free.&lt;/p&gt;

&lt;p&gt;This guide is a playbook for the alternative: &lt;strong&gt;region-specific public real estate data scrapers&lt;/strong&gt;, run on demand, paid by usage, exported to CSV or piped into a BI stack. We will walk through the actors that cover the US, UK, Singapore, Hong Kong, Denmark, and India; how to combine them into market-entry briefs and cap-rate comparisons; and where the regulatory and methodological landmines sit. If you run a REIT desk, a relocation practice, an urban-planning model, or a PropTech competitive intelligence function, this is the new stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Real Estate Data Is Hyper-Regional and Hyper-Fragmented
&lt;/h2&gt;

&lt;p&gt;Property markets are local in a way that, say, equities are not. A NASDAQ ticker trades the same way from Tokyo or Toronto. A two-bedroom condo in Tanjong Pagar trades against entirely different rules, taxes, leasehold structures, and buyer pools than a two-bedroom in Brooklyn or Battersea. The data infrastructure reflects that fragmentation. &lt;a href="https://apify.com/nexgendata/zillow-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=zillow-scraper" rel="noopener noreferrer"&gt;Zillow&lt;/a&gt; and &lt;a href="https://apify.com/nexgendata/redfin-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=redfin-real-estate-scraper" rel="noopener noreferrer"&gt;Redfin&lt;/a&gt; dominate consumer-facing US listings, which are largely syndicated from Multiple Listing Service (MLS) feeds. &lt;a href="https://apify.com/nexgendata/rightmove-uk-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=rightmove-uk-real-estate-scraper" rel="noopener noreferrer"&gt;Rightmove&lt;/a&gt; owns roughly 80% of UK listing traffic. Singapore splits its disclosures across the Housing &amp;amp; Development Board (HDB) for public housing and the URA for private and commercial transactions. Hong Kong's Centaline Property Index is the de facto Case-Shiller of the territory. India's listings live primarily on MagicBricks and 99acres.&lt;/p&gt;

&lt;p&gt;The structural problem: &lt;strong&gt;no single vendor covers all of these well&lt;/strong&gt;. Western real estate data platforms barely touch Asia. The Asian government feeds rarely structure data in a form Western analysts can plug into a model. Aggregators that claim global coverage are typically thin everywhere except their home market. And legacy enterprise vendors charge five- or six-figure annual contracts for what is, at the source, freely published. The pragmatic answer is per-region scrapers — small, cheap, focused tools that pull each market's authoritative source and hand you a tidy CSV. That is the playbook below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Structured Real Estate Data Matters Right Now
&lt;/h2&gt;

&lt;p&gt;Several research workflows are quietly being rebuilt on this kind of bottom-up listing data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PropTech VC research.&lt;/strong&gt; When a Series A pitch claims a $2.8B TAM for short-term rentals in Southeast Asia, the diligence team needs to back-check against actual listing counts, median rents per sqm, and absorption rates. Public scrapers give you that without a vendor procurement cycle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REIT analysis.&lt;/strong&gt; Listed REITs trade on net asset value (NAV) and funds from operations (FFO), but the underlying property comps that justify NAV come from listings. Independent scrapers let buy-side analysts build their own comp set rather than trust the manager's reported cap rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relocation pricing.&lt;/strong&gt; Corporate relocation desks need credible 90th-percentile rent estimates for executive housing in 20+ cities. Aggregating Zillow plus Rightmove plus Singapore HDB plus Boliga gives you a defensible policy benchmark.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Urban-planning research.&lt;/strong&gt; City planners and academic researchers use listing data to model gentrification, affordability gaps, and the price-to-income ratio shift over five-year windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mortgage and lending models.&lt;/strong&gt; Lenders running loan-to-value (LTV) and default models need fresh comparable sales by ZIP/postcode. Listing data is a leading indicator before recorded deed data closes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional property arbitrage.&lt;/strong&gt; Family offices comparing Lisbon vs. Athens vs. Kuala Lumpur for yield need consistent price-per-area numbers (per sqft or per sqm) across markets. Scrapers normalize the inputs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the Actors Extract: Source Coverage at a Glance
&lt;/h2&gt;

&lt;p&gt;Here is a quick coverage map of the actors we will use in the rest of this guide. All linked actors below are public on Apify; the affiliate parameter helps support this site.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Region&lt;/th&gt;
&lt;th&gt;Coverage&lt;/th&gt;
&lt;th&gt;Key Fields&lt;/th&gt;
&lt;th&gt;Update Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/zillow-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=zillow-scraper" rel="noopener noreferrer"&gt;Zillow Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;US&lt;/td&gt;
&lt;td&gt;Residential for-sale &amp;amp; for-rent&lt;/td&gt;
&lt;td&gt;Zestimate, list price, beds/baths, lot size, days on market, price history&lt;/td&gt;
&lt;td&gt;Daily-fresh listings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/redfin-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=redfin-real-estate-scraper" rel="noopener noreferrer"&gt;Redfin Real Estate Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;US&lt;/td&gt;
&lt;td&gt;Residential MLS-grade comps&lt;/td&gt;
&lt;td&gt;Sold price, $/sqft, school score, hotness, last-sold date&lt;/td&gt;
&lt;td&gt;Near real-time on MLS push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/apartments-com-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=apartments-com-scraper" rel="noopener noreferrer"&gt;Apartments.com Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;US&lt;/td&gt;
&lt;td&gt;Multi-family &amp;amp; SFR rental&lt;/td&gt;
&lt;td&gt;Asking rent, sqft, amenities, availability, concessions&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/rightmove-uk-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=rightmove-uk-real-estate-scraper" rel="noopener noreferrer"&gt;Rightmove UK Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;UK&lt;/td&gt;
&lt;td&gt;Residential sales &amp;amp; lettings&lt;/td&gt;
&lt;td&gt;Asking price, EPC rating, tenure (freehold/leasehold), postcode&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/singapore-hdb-resale-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=singapore-hdb-resale-tracker" rel="noopener noreferrer"&gt;Singapore HDB Resale Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Singapore&lt;/td&gt;
&lt;td&gt;Public housing resale transactions&lt;/td&gt;
&lt;td&gt;Block, flat type, floor area sqm, resale price, lease commence date&lt;/td&gt;
&lt;td&gt;Monthly (gov publish cycle)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/store?search=singapore+ura+property&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;Singapore URA Private (catalog)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Singapore&lt;/td&gt;
&lt;td&gt;Private residential transactions&lt;/td&gt;
&lt;td&gt;Project, district, tenure, $/psf, sale date&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/store?search=singapore+commercial+property&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;Singapore URA Commercial (catalog)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Singapore&lt;/td&gt;
&lt;td&gt;Office &amp;amp; retail transactions&lt;/td&gt;
&lt;td&gt;Building, $/psf, transaction date, use class&lt;/td&gt;
&lt;td&gt;Quarterly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/store?search=hong+kong+property&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;Hong Kong Centaline Index (catalog)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Hong Kong&lt;/td&gt;
&lt;td&gt;Residential index &amp;amp; transactions&lt;/td&gt;
&lt;td&gt;CCL index, district, sqft, $/sqft, transaction direction&lt;/td&gt;
&lt;td&gt;Weekly index, daily txn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/store?search=magicbricks+india+real+estate&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;India MagicBricks (catalog)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;India&lt;/td&gt;
&lt;td&gt;Residential sales &amp;amp; rent listings&lt;/td&gt;
&lt;td&gt;Locality, BHK, super built-up area, asking price, possession status&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/boliga-denmark-real-estate?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=boliga-denmark-real-estate" rel="noopener noreferrer"&gt;Boliga Denmark Real Estate&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Denmark&lt;/td&gt;
&lt;td&gt;Residential listings &amp;amp; sales history&lt;/td&gt;
&lt;td&gt;Address, kvm (m²), asking price, days on market, price changes&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A note on the "catalog" links: a handful of Asia-specific actors are either in private beta or being rebuilt. The catalog links above route you to an Apify Store search for current public actors that cover those data sources; substitute the equivalent vendor as needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Workflow: Building a Singapore PropTech Competitor Brief
&lt;/h2&gt;

&lt;p&gt;Let us run through a concrete brief — the kind of deliverable a Series B PropTech founder or a regional REIT analyst might commission. Goal: a one-page market brief on Singapore residential plus a Hong Kong comparison, suitable for an investment committee memo.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pull HDB resale transactions (public housing).&lt;/strong&gt; Run the &lt;a href="https://apify.com/nexgendata/singapore-hdb-resale-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=singapore-hdb-resale-tracker" rel="noopener noreferrer"&gt;Singapore HDB Resale Tracker&lt;/a&gt; for the last 12 months across all towns. Output: ~25,000 rows with block, flat type, floor area sqm, lease balance, and resale price. Compute median $/sqm by town and 12-month price CAGR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer URA private residential transactions.&lt;/strong&gt; Run the &lt;a href="https://apify.com/store?search=singapore+ura+property&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;URA private transactions&lt;/a&gt; actor (or catalog equivalent) for the same period. This gives you the private condo side: project, district, tenure (freehold vs 99-year leasehold), $/psf. Critical for any thesis touching the private market, which trades at a 2–4× multiple to HDB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add URA commercial.&lt;/strong&gt; Use the &lt;a href="https://apify.com/store?search=singapore+commercial+property&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;URA commercial transactions&lt;/a&gt; actor for Grade-A office and retail. This unlocks cap-rate analysis — divide net operating income estimates by sale price to ballpark commercial yields by district.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-reference Hong Kong with Centaline.&lt;/strong&gt; Run the &lt;a href="https://apify.com/store?search=hong+kong+property&amp;amp;fpr=2ayu9b" rel="noopener noreferrer"&gt;Centaline&lt;/a&gt; actor for the Centa-City Leading (CCL) Index plus district-level transactions. This lets you anchor Singapore numbers against HK's larger but more volatile market — useful for "Asia luxury residential" pitches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalize and export.&lt;/strong&gt; Pipe each actor's dataset to CSV (Apify supports this natively), or push directly to BigQuery / Snowflake via the Apify integrations. Normalize sqft vs sqm (1 sqm = 10.764 sqft) and convert prices to USD using a snapshot FX rate so cross-market comps are apples-to-apples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build the BI dashboard.&lt;/strong&gt; A simple Metabase or Looker Studio dashboard with three views: median $/sqm by district, 12-month CAGR by segment, and gross rental yield estimates (monthly asking rent × 12 ÷ asking price). That is your competitor brief.&lt;/li&gt;
&lt;/ol&gt;
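
&lt;p&gt;The normalization and yield steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the actors' own logic: the function names, the row keys, and the FX rate you pass in are all assumptions, and real actor output will use its own field names.&lt;/p&gt;

```python
from statistics import median

SQFT_PER_SQM = 10.764  # conversion factor quoted in the workflow above


def price_per_sqm_usd(price_local, area, area_unit, fx_to_usd):
    """Return the price per square metre in USD for one transaction.

    area_unit is "sqm" or "sqft"; fx_to_usd is a snapshot rate
    (USD per one unit of local currency) that you supply yourself.
    """
    if area_unit == "sqft":
        area_sqm = area / SQFT_PER_SQM
    elif area_unit == "sqm":
        area_sqm = area
    else:
        raise ValueError(f"unknown area unit: {area_unit}")
    return price_local * fx_to_usd / area_sqm


def gross_rental_yield(monthly_rent, asking_price):
    """Annualised asking rent divided by asking price (gross, before costs)."""
    return monthly_rent * 12 / asking_price


def median_by_group(rows, key, value):
    """Median of row[value] for each distinct row[key] (e.g. by town or district)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {group: median(values) for group, values in groups.items()}
```

&lt;p&gt;Run every transaction through &lt;code&gt;price_per_sqm_usd&lt;/code&gt; first, then feed the results to &lt;code&gt;median_by_group&lt;/code&gt;, so the medians for each town or district are computed on one unit and one currency.&lt;/p&gt;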

&lt;p&gt;Total cost on Apify pay-per-event pricing for one full refresh: typically under $20 for all four datasets across the two markets. Compare with a $35K/year regional aggregator subscription and the build-vs.-buy math is uncomfortable for the incumbents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cases: Who Actually Uses This Data
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;REIT research desks&lt;/strong&gt; build independent NAV models with first-party comp sets rather than trusting manager-reported cap rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market-entry consultants&lt;/strong&gt; compile city-by-city affordability and yield reports for institutional clients evaluating new geographies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buy-side comp pulls&lt;/strong&gt; for private real estate funds doing diligence on a portfolio acquisition — independent verification of seller-supplied comparables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-estate journalism&lt;/strong&gt; uses the same data to fact-check developer claims about "record-breaking" prices and to chart 5-year affordability shifts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mortgage pricing teams&lt;/strong&gt; calibrate LTV cutoffs and default-probability models with fresh, granular listing data by postcode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relocation consulting&lt;/strong&gt; firms produce defensible executive housing benchmarks for global mobility programs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PropTech competitor intel&lt;/strong&gt; — track which markets a rival listings platform is adding inventory in, what their median price is, and which features (virtual tours, EPC ratings) they are standardizing on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Urban-planning research&lt;/strong&gt; models gentrification, displacement risk, and the housing affordability gap with primary-source data instead of decennial census snapshots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mortgage default research&lt;/strong&gt; at academic and policy institutions backtests stress scenarios against actual listing-derived price trajectories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Family-office property arbitrage&lt;/strong&gt; compares yield, tax treatment, and currency risk across 8–10 candidate markets on a quarterly cadence.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Run It Yourself: Start with the Redfin Real Estate Scraper
&lt;/h2&gt;

&lt;p&gt;If you want a single starting point that demonstrates the workflow end-to-end — fresh comps, $/sqft, ARV (after-repair value) calculations, neighborhood-level filters — the &lt;a href="https://apify.com/nexgendata/redfin-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=redfin-real-estate-scraper" rel="noopener noreferrer"&gt;&lt;strong&gt;Redfin Real Estate Scraper on Apify&lt;/strong&gt;&lt;/a&gt; is the cleanest first run. Drop in a city or ZIP, pick sold or for-sale, hit run, and have a CSV of MLS-grade comps in under five minutes. Use it to validate the workflow on a market you know cold, then layer in the regional actors above for the geographies you do not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/nexgendata/redfin-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=redfin-real-estate-scraper" rel="noopener noreferrer"&gt;Run the Redfin Real Estate Scraper on Apify&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Actors and Internal Reading
&lt;/h2&gt;

&lt;p&gt;Cross-link these actors when you build a multi-market view:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/zillow-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=zillow-scraper" rel="noopener noreferrer"&gt;Zillow Scraper&lt;/a&gt; — US for-sale and for-rent inventory with Zestimate and price history.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/rightmove-uk-real-estate-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=rightmove-uk-real-estate-scraper" rel="noopener noreferrer"&gt;Rightmove UK Real Estate Scraper&lt;/a&gt; — the canonical UK listings dataset, including EPC and tenure fields.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/apartments-com-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=apartments-com-scraper" rel="noopener noreferrer"&gt;Apartments.com Scraper&lt;/a&gt; — US multi-family rentals with concession and amenity data.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/singapore-hdb-resale-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=singapore-hdb-resale-tracker" rel="noopener noreferrer"&gt;Singapore HDB Resale Price Tracker&lt;/a&gt; — public-housing transactions, the backbone of any Singapore residential model.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/boliga-denmark-real-estate?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=boliga-denmark-real-estate" rel="noopener noreferrer"&gt;Boliga Denmark Real Estate&lt;/a&gt; — Nordic coverage for cross-EU comparative analysis.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/real-estate-mcp-server?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=real-estate-mcp-server" rel="noopener noreferrer"&gt;Real Estate MCP Server&lt;/a&gt; — connect any of these actors to Claude or Cursor as an MCP tool so AI agents can query property data on demand.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/redfin-mcp-server?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=redfin-mcp-server" rel="noopener noreferrer"&gt;Redfin MCP Server&lt;/a&gt; — drop-in MCP wrapper for the Redfin actor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For deeper reading on related workflows, see &lt;a href="https://thenextgennexus.com/2026/05/19/29-redfin-neighborhood-comparison/" rel="noopener noreferrer"&gt;Neighborhood-by-Neighborhood: Comparing Real Estate Markets with Redfin Data&lt;/a&gt;, &lt;a href="https://thenextgennexus.com/2026/05/16/24-redfin-vs-zillow-data/" rel="noopener noreferrer"&gt;Redfin vs Zillow Data: Which Is Better for Real Estate Market Research?&lt;/a&gt;, &lt;a href="https://thenextgennexus.com/2026/05/16/19-redfin-price-per-sqft/" rel="noopener noreferrer"&gt;How to Find Undervalued Properties Using Redfin Data and Price-Per-Square-Foot Analysis&lt;/a&gt;, and the broader category page &lt;a href="https://thenextgennexus.com/real-estate-data-tools/" rel="noopener noreferrer"&gt;Real Estate Data Tools&lt;/a&gt;. For Asia-specific data context beyond property, see &lt;a href="https://thenextgennexus.com/2026/05/24/asian-market-data-scrapers-for-public-business-research/" rel="noopener noreferrer"&gt;Asian Market Data Scrapers for Public Business Research&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is real estate listing data public?
&lt;/h3&gt;

&lt;p&gt;Listings displayed on public-facing real estate portals are generally accessible to anyone with a browser, and government registries (URA, HDB, UK Land Registry, US county recorders) explicitly publish transaction records. Terms of service vary by site, so review each portal's terms and applicable local law before bulk collection, and prefer government sources where they exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I bulk-export results to CSV or a data warehouse?
&lt;/h3&gt;

&lt;p&gt;Yes. Every actor on Apify writes its results to a dataset that exports natively to CSV, JSON, Excel, or RSS. Native integrations let you push directly to BigQuery, Snowflake, S3, Google Sheets, Airtable, or a webhook — no glue code required.&lt;/p&gt;
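
&lt;p&gt;For a scripted export, a sketch along these lines should work against the Apify dataset items endpoint, which accepts a &lt;code&gt;format&lt;/code&gt; query parameter. The helper names and the destination path are illustrative assumptions; the dataset ID comes from your own run.&lt;/p&gt;

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="csv"):
    """Build the export URL for a dataset in the requested format."""
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def download_dataset(dataset_id, token, dest_path, fmt="csv"):
    """Download a dataset export to a local file and return the path.

    The token is sent as a Bearer header so it does not appear in the URL.
    """
    request = Request(
        dataset_items_url(dataset_id, fmt),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
    return dest_path
```

&lt;p&gt;This reads the whole export into memory, which is fine for a few thousand rows; for larger datasets, copy the response to disk in chunks or use one of the native integrations.&lt;/p&gt;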

&lt;h3&gt;
  
  
  How fresh is the data?
&lt;/h3&gt;

&lt;p&gt;Most listing actors re-fetch on demand, so freshness is determined by when you run them. Daily-scheduled runs are typical for active analyst workflows; government feeds (HDB, URA, Centaline index) refresh weekly or monthly, matching the source publishing cadence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do you cover commercial real estate?
&lt;/h3&gt;

&lt;p&gt;Yes — URA commercial transactions, Centaline commercial subsets, and the commercial filters on Rightmove and Zillow cover office, retail, and industrial assets. For pure-play commercial data (CoStar-style), expect to combine these with broker-published quarterly reports to back out cap rates and absorption.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Asian markets beyond Singapore and Hong Kong?
&lt;/h3&gt;

&lt;p&gt;India is best served via MagicBricks and 99acres scrapers; Japan typically via SUUMO or LIFULL HOME'S; Malaysia via PropertyGuru and iProperty. The general pattern: identify the dominant national portal, run a per-portal actor, and normalize fields downstream. Catalog search on Apify is the fastest way to find current public actors for any country.&lt;/p&gt;
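
&lt;p&gt;The "normalize fields downstream" step is mostly a renaming exercise. The sketch below shows one way to do it; every raw field name, unit, and currency in the mapping tables is a hypothetical placeholder, so check each actor's actual output schema before relying on it.&lt;/p&gt;

```python
# Hypothetical raw field names per portal; verify against each actor's real output.
FIELD_MAPS = {
    "magicbricks": {
        "locality": "location",
        "superBuiltUpArea": "floor_area",
        "askingPrice": "price",
    },
    "boliga": {
        "address": "location",
        "kvm": "floor_area",
        "askingPrice": "price",
    },
}

# Area unit and currency each portal is assumed to report in (illustrative only).
PORTAL_DEFAULTS = {
    "magicbricks": {"area_unit": "sqft", "currency": "INR"},
    "boliga": {"area_unit": "sqm", "currency": "DKK"},
}


def normalize_record(portal, raw):
    """Rename portal-specific keys to a shared schema and tag unit and currency."""
    mapping = FIELD_MAPS[portal]
    record = {common: raw.get(source) for source, common in mapping.items()}
    record.update(PORTAL_DEFAULTS[portal])
    record["source"] = portal
    return record
```

&lt;p&gt;Keeping the mappings as data rather than code means adding a new country is a matter of adding one dictionary entry per portal.&lt;/p&gt;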

&lt;h3&gt;
  
  
  Can I track price changes over time on the same listing?
&lt;/h3&gt;

&lt;p&gt;Yes. The Redfin, Zillow, Rightmove, and Boliga actors all surface price-change history on a listing. Schedule a daily run and store snapshots in your warehouse to build your own price-change time series — useful for spotting motivated sellers (multiple price drops) or detecting cap-rate compression in a submarket.&lt;/p&gt;
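
&lt;p&gt;Once daily snapshots are stored, counting price drops per listing is straightforward. The following is a rough sketch that assumes each snapshot is a &lt;code&gt;(listing_id, iso_date, price)&lt;/code&gt; tuple; that shape and the function names are illustrative, not something the actors emit directly.&lt;/p&gt;

```python
from collections import defaultdict
from operator import ge, lt


def count_price_drops(snapshots):
    """Count how many times each listing's price fell between consecutive snapshots.

    snapshots: iterable of (listing_id, iso_date, price) tuples.
    ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    """
    history = defaultdict(list)
    for listing_id, _date, price in sorted(snapshots, key=lambda s: (s[0], s[1])):
        history[listing_id].append(price)
    return {
        listing_id: sum(1 for prev, cur in zip(prices, prices[1:]) if lt(cur, prev))
        for listing_id, prices in history.items()
    }


def motivated_sellers(snapshots, min_drops=2):
    """Return listing IDs with at least min_drops price reductions, sorted."""
    drops = count_price_drops(snapshots)
    return sorted(lid for lid, n in drops.items() if ge(n, min_drops))
```

&lt;p&gt;The default threshold of two drops is an arbitrary starting point; tune it to how often sellers in your market reprice.&lt;/p&gt;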

&lt;h3&gt;
  
  
  How does this compare to paid vendors like CoStar or REIS?
&lt;/h3&gt;

&lt;p&gt;Enterprise vendors offer richer commercial datasets, valuation models, and analyst support, and they are appropriate when those features pay for themselves. Public-data scrapers excel at coverage breadth (any country with a major portal), cost (pay-per-run instead of seat licenses), and customization. Most sophisticated teams run both: a paid vendor for their primary market and scrapers for the long tail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are these actors suitable for AI agents and LLM pipelines?
&lt;/h3&gt;

&lt;p&gt;Yes — the Real Estate MCP Server and Redfin MCP Server wrap these actors as Model Context Protocol tools, so Claude, Cursor, or any MCP-compatible agent can query property data directly. This is increasingly how analyst desks expose internal datasets to AI assistants.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Using Apify Actors for Business Intelligence Workflows</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 19:05:21 +0000</pubDate>
      <link>https://dev.to/nexgendata/using-apify-actors-for-business-intelligence-workflows-96o</link>
      <guid>https://dev.to/nexgendata/using-apify-actors-for-business-intelligence-workflows-96o</guid>
      <description>&lt;p&gt;BI teams that rely only on internal data — CRM, billing, product analytics — are working with half a map. Your dashboards tell you what &lt;em&gt;you&lt;/em&gt; are doing, not what the market is doing around you. Competitor price changes, new 13F filings, fresh Y Combinator batches, Shopify app rankings, press releases — all of it is public, all of it is structured, and all of it can land in your warehouse on a nightly schedule if you wire it up correctly.&lt;/p&gt;

&lt;p&gt;This tutorial walks through exactly that wiring. By the end you'll have a working pattern for: scheduling an Apify actor to run nightly, exporting the dataset to S3 (or GCS, or directly to Snowflake), loading it into a warehouse table, building a dashboard view on top, and firing a Slack alert when a metric moves. We'll use real BI tooling (Snowflake, BigQuery, dbt, Airflow, Google Sheets, Looker) and real actors from the &lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;NexGenData catalog&lt;/a&gt;. Code samples are copy-pasteable. If you maintain a competitive-intel dashboard, a lead-gen pipeline, or a market-monitoring report, you should be able to ship something useful from this post in an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of an Apify actor in a BI context
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;actor&lt;/strong&gt; is a serverless scraper. You pass it JSON input (URLs, search terms, filters), it runs in Apify's cloud, and it writes results to a &lt;strong&gt;dataset&lt;/strong&gt;: append-only, table-like storage (distinct from Apify's key-value store) that exposes the rows as JSON, CSV, XLSX, JSONL, RSS, or HTML via a stable URL.&lt;/p&gt;

&lt;p&gt;For BI purposes, three properties matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output is structured.&lt;/strong&gt; Each actor publishes a schema. Records come back as flat JSON objects, easy to &lt;code&gt;COPY INTO&lt;/code&gt; a warehouse table or &lt;code&gt;pd.read_json&lt;/code&gt; in a notebook.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing is pay-per-event (PPE).&lt;/strong&gt; Most NexGenData actors charge $0.05–$0.50 per result, not per minute. Your cost scales with rows ingested, which makes monthly forecasting trivial: &lt;code&gt;rows_per_run × runs_per_month × $/row&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live-running beats batch downloads.&lt;/strong&gt; You can hit the &lt;code&gt;run-sync-get-dataset-items&lt;/code&gt; endpoint, block until the actor finishes, and stream the dataset back in one HTTP call — ideal for ad-hoc analyst queries. Or you schedule a batch run and have Apify webhook the dataset URL to your ingest layer.&lt;/li&gt;
&lt;/ul&gt;
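
&lt;p&gt;The forecasting formula above is simple enough to keep as a helper next to your scheduling config. This sketch uses &lt;code&gt;Decimal&lt;/code&gt; so currency amounts do not pick up floating-point noise; the function name and the example figures are illustrative only.&lt;/p&gt;

```python
from decimal import Decimal


def monthly_cost(rows_per_run, runs_per_month, price_per_row):
    """rows_per_run x runs_per_month x $/row, as an exact Decimal.

    Pass price_per_row as a string (e.g. "0.05") to avoid float rounding.
    """
    return Decimal(rows_per_run) * Decimal(runs_per_month) * Decimal(str(price_per_row))


# Example: 200 rows per nightly run, 30 runs a month, $0.05 per result.
estimate = monthly_cost(200, 30, "0.05")
print(f"Estimated monthly spend: ${estimate}")
```

&lt;p&gt;At the top of the quoted range ($0.50 per result) the same schedule costs ten times as much, so check each actor's pricing page before setting a nightly cadence.&lt;/p&gt;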

&lt;h2&gt;
  
  
  Three patterns for ingesting actor data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern A: Direct API call from your ETL
&lt;/h3&gt;

&lt;p&gt;Simplest pattern. Your Airbyte custom source, Fivetran function, or hand-rolled Python job calls the actor and writes results to the warehouse. Good for low-frequency runs (daily/weekly) and small-to-medium volumes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
    curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="s2"&gt;"https://api.apify.com/v2/acts/nexgendata~saas-pricing-tracker/run-sync-get-dataset-items?token=&lt;/span&gt;&lt;span class="nv"&gt;$APIFY_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
        "vendors": ["notion.so", "airtable.com", "monday.com"],
        "plans": ["all"]
      }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; pricing_&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From Python, the equivalent using the official SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apify_api_xxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nexgendata/saas-pricing-tracker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notion.so&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;airtable.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# warehouse_engine: a SQLAlchemy engine for your warehouse, created elsewhere&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stg_competitor_pricing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;warehouse_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern B: Scheduled run + webhook to object storage
&lt;/h3&gt;

&lt;p&gt;This is the production pattern. Apify's scheduler triggers the actor; on &lt;code&gt;ACTOR.RUN.SUCCEEDED&lt;/code&gt;, a webhook posts a payload carrying the run's default dataset ID to your endpoint. Your endpoint (Lambda, Cloud Function, or a small Flask container) streams the dataset to S3/GCS, and Snowpipe/BigQuery auto-ingest picks it up.&lt;/p&gt;

&lt;p&gt;A webhook payload looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-24T02:00:00.000Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"eventType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ACTOR.RUN.SUCCEEDED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"eventData"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"actorId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nexgendata~serp-rank-tracker-lite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"actorRunId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"xYzRunId789"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"actorTaskId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nightlyRankCheck"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"xYzRunId789"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SUCCEEDED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"defaultDatasetId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dsId456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"stats"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"computeUnits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"datasetItemCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your receiver fetches &lt;code&gt;https://api.apify.com/v2/datasets/dsId456/items?format=json&amp;amp;clean=true&lt;/code&gt; and writes the body to &lt;code&gt;s3://bi-raw/apify/serp/2026-05-24.json&lt;/code&gt;. A Snowpipe definition pointed at that prefix loads it automatically.&lt;/p&gt;
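
&lt;p&gt;As a rough sketch of that receiver step, the helper below turns a webhook payload into the dataset URL and the object key. The function name &lt;code&gt;plan_export&lt;/code&gt; and its &lt;code&gt;prefix&lt;/code&gt; argument are illustrative assumptions, not part of the Apify API, and taking the date from the payload's &lt;code&gt;createdAt&lt;/code&gt; field is one possible choice.&lt;/p&gt;

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def plan_export(payload, prefix="apify/serp"):
    """Return (items_url, object_key) for a run-succeeded webhook payload."""
    resource = payload["resource"]
    if resource.get("status") != "SUCCEEDED":
        raise ValueError("run did not succeed; nothing to export")
    query = urlencode({"format": "json", "clean": "true"})
    items_url = f"{API_BASE}/datasets/{resource['defaultDatasetId']}/items?{query}"
    # createdAt is ISO 8601, so the first ten characters are the calendar date.
    day = payload["createdAt"][:10]
    return items_url, f"{prefix}/{day}.json"


# Trimmed-down version of the payload shown above.
example = {
    "createdAt": "2026-05-24T02:00:00.000Z",
    "resource": {"status": "SUCCEEDED", "defaultDatasetId": "dsId456"},
}
print(plan_export(example))
```

&lt;p&gt;Keeping the URL and key logic in a pure function like this makes it easy to unit-test before wiring it to a real HTTP client and bucket.&lt;/p&gt;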

&lt;h3&gt;
  
  
  Pattern C: Google Sheets via IMPORTDATA
&lt;/h3&gt;

&lt;p&gt;For analysts who live in Sheets, you don't need a warehouse at all. Apify exposes dataset items as a CSV URL that &lt;code&gt;IMPORTDATA&lt;/code&gt; can consume. Drop this in cell A1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
    =IMPORTDATA("https://api.apify.com/v2/acts/nexgendata~yc-companies-directory-scraper/run-sync-get-dataset-items?token=APIFY_TOKEN&amp;amp;format=csv&amp;amp;clean=true")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sheets refreshes the formula on edit and on a one-hour interval. Wrap it in &lt;code&gt;QUERY()&lt;/code&gt; to filter, or feed it into a Looker Studio data source for a free dashboard layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Worked example: a daily competitor pricing BI dashboard
&lt;/h2&gt;

&lt;p&gt;Goal: a Snowflake-backed Looker dashboard that, every morning at 7am, shows yesterday's pricing changes across your top 10 SaaS competitors, their SERP rank movement on five priority keywords, and, if you sell music gear, new pricing trends in that market. We'll use three actors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/saas-pricing-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=saas-pricing-tracker" rel="noopener noreferrer"&gt;saas-pricing-tracker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/serp-rank-tracker-lite?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=serp-rank-tracker-lite" rel="noopener noreferrer"&gt;serp-rank-tracker-lite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/reverb-musical-instrument-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=reverb-musical-instrument-scraper" rel="noopener noreferrer"&gt;reverb-musical-instrument-scraper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: schedule actor runs nightly
&lt;/h3&gt;

&lt;p&gt;In the Apify Console, go to &lt;strong&gt;Schedules -&amp;gt; Create new&lt;/strong&gt;. Set cron to &lt;code&gt;0 2 * * *&lt;/code&gt; (2am UTC). Add three "actions", one per actor, each pointing at a saved task with its input pre-filled. Save. You're done with scheduling.&lt;/p&gt;
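
&lt;p&gt;To sanity-check when the nightly run fires relative to your own morning, a small standard-library sketch can help. It handles only the fixed daily case of &lt;code&gt;0 2 * * *&lt;/code&gt; in UTC, not general cron syntax, and the function name &lt;code&gt;next_daily_run&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now, hour=2, minute=0):
    """Next firing strictly after `now` of a daily `minute hour * * *` UTC schedule."""
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), time(hour, minute), tzinfo=timezone.utc)
    if now >= candidate:
        # Today's slot has already passed (or is right now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2026, 5, 24, 1, 30, tzinfo=timezone.utc)
print(next_daily_run(now).isoformat())
```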

&lt;h3&gt;
  
  
  Step 2: configure dataset export via webhook
&lt;/h3&gt;

&lt;p&gt;On each actor task, open &lt;strong&gt;Integrations -&amp;gt; Webhooks&lt;/strong&gt;. Add a webhook for &lt;code&gt;ACTOR.RUN.SUCCEEDED&lt;/code&gt; pointing at your ingest endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
    https://ingest.yourcompany.com/apify?source={{eventData.actorId}}&amp;amp;run={{eventData.actorRunId}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal Python receiver (FastAPI on Cloud Run):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
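    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;

    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/apify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;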
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;dataset_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.apify.com/v2/datasets/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataset_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/items?format=jsonl&amp;amp;clean=true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apify/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;~&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bi-raw&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rows&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datasetItemCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: load to data warehouse
&lt;/h3&gt;

&lt;p&gt;Snowflake DDL for the raw and curated layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
    &lt;span class="c1"&gt;-- Raw landing table, schemaless&lt;/span&gt;
    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apify_pricing&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;load_ts&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMP_NTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;source_file&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;VARIANT&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;-- Snowpipe pointed at s3://bi-raw/apify/nexgendata/saas-pricing-tracker/&lt;/span&gt;
    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;PIPE&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipe_apify_pricing&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
    &lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apify_pricing&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;METADATA&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;FILENAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;bi_raw_stage&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;apify&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;nexgendata&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;saas&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pricing&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;FILE_FORMAT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;-- Curated view, one row per vendor-plan-day&lt;/span&gt;
    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;competitor_pricing&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;plan_name&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;plan_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;price_usd&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;FLOAT&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;price_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;billing_period&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;billing_period&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;scraped_at&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;DATE&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;observation_date&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apify_pricing&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BigQuery equivalent (for teams on GCP): create an external table on the GCS prefix, then a scheduled query that materializes the curated view nightly at 3am.&lt;/p&gt;
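
&lt;p&gt;Before committing to warehouse DDL, it can help to check that the landed JSONL actually contains the fields the curated view expects. The sketch below applies the same projection in plain Python. The field names come from the view above; the function name &lt;code&gt;curate&lt;/code&gt; and the sample record are made up for illustration.&lt;/p&gt;

```python
import json
from datetime import date


def curate(jsonl_text):
    """Project raw JSONL records onto the curated column set."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the export
        rec = json.loads(line)
        price = rec.get("price_usd")
        rows.append({
            "vendor": rec.get("vendor"),
            "plan_name": rec.get("plan_name"),
            "price_usd": float(price) if price is not None else None,
            "billing_period": rec.get("billing_period"),
            # Keep only the calendar date, like the ::DATE cast in the view.
            "observation_date": date.fromisoformat(rec["scraped_at"][:10]),
        })
    return rows


# Hypothetical record; real output depends on the actor.
sample = (
    '{"vendor": "notion.so", "plan_name": "Plus", "price_usd": 10, '
    '"billing_period": "monthly", "scraped_at": "2026-05-24T02:03:11Z"}\n'
)
print(curate(sample))
```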

&lt;h3&gt;
  
  
  Step 4: build the dashboard view
&lt;/h3&gt;

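&lt;p&gt;The query the dashboard sits on top of computes day-over-day price deltas plus a flag for movers. Save it as a view; Step 5 queries it as &lt;code&gt;analytics.competitor_pricing_changes&lt;/code&gt;:&lt;br&gt;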
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plan_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observation_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plan_name&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;observation_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prev_price&lt;/span&gt;
      &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;competitor_pricing&lt;/span&gt;
      &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;observation_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plan_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observation_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;prev_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;price_usd&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev_price&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;delta_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;price_usd&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;delta_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;ABS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_usd&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;price_changed&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;prev_price&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;observation_date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;ABS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta_pct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point Looker, Mode, Hex, Tableau, or Power BI at that view. A "Price Movers" tile filtering on &lt;code&gt;price_changed = TRUE AND observation_date = CURRENT_DATE - 1&lt;/code&gt; gives you the morning briefing.&lt;/p&gt;
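
&lt;p&gt;If you want to test the delta logic without a warehouse, the same &lt;code&gt;LAG()&lt;/code&gt;-style calculation can be sketched in plain Python. This is an approximation of the SQL above, not a replacement: it skips the 30-day filter, and the function name &lt;code&gt;price_changes&lt;/code&gt; and the sample prices are invented for illustration.&lt;/p&gt;

```python
from itertools import groupby
from operator import itemgetter


def price_changes(rows):
    """Day-over-day deltas per (vendor, plan_name), mirroring LAG()."""
    series_key = itemgetter("vendor", "plan_name")
    ordered = sorted(rows, key=lambda r: (series_key(r), r["observation_date"]))
    changes = []
    for (vendor, plan_name), series in groupby(ordered, key=series_key):
        prev_price = None
        for row in series:
            price = row["price_usd"]
            if prev_price is not None:  # first observation has no previous price
                delta = price - prev_price
                changes.append({
                    "vendor": vendor,
                    "plan_name": plan_name,
                    "observation_date": row["observation_date"],
                    "prev_price": prev_price,
                    "price_usd": price,
                    "delta_usd": delta,
                    # Like NULLIF(prev_price, 0): no percentage when the base is zero.
                    "delta_pct": round(delta / prev_price * 100, 2) if prev_price else None,
                    "price_changed": delta != 0,
                })
            prev_price = price
    return changes


# Made-up observations, dates as ISO strings so they sort chronologically.
sample = [
    {"vendor": "notion.so", "plan_name": "Plus", "observation_date": "2026-05-23", "price_usd": 12.0},
    {"vendor": "notion.so", "plan_name": "Plus", "observation_date": "2026-05-22", "price_usd": 10.0},
    {"vendor": "airtable.com", "plan_name": "Team", "observation_date": "2026-05-22", "price_usd": 20.0},
    {"vendor": "airtable.com", "plan_name": "Team", "observation_date": "2026-05-23", "price_usd": 20.0},
]
for change in price_changes(sample):
    print(change)
```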

&lt;h3&gt;
  
  
  Step 5: alert on changes via Slack
&lt;/h3&gt;

&lt;p&gt;A short cron job turns the same view into a Slack ping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;snowflake.connector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;snowflake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      SELECT vendor, plan_name, prev_price, price_usd, delta_pct
      FROM analytics.competitor_pricing_changes
      WHERE observation_date = CURRENT_DATE - 1 AND ABS(delta_pct) &amp;gt;= 5
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*Competitor price moves (&amp;gt;=5%) overnight:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;• &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SLACK_WEBHOOK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schedule it with Airflow, Prefect, GitHub Actions, or a plain crontab. You now have a closed loop: scrape -&amp;gt; load -&amp;gt; transform -&amp;gt; alert.&lt;/p&gt;
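
&lt;p&gt;One way to make the alert easier to test is to separate message formatting from the Snowflake and Slack I/O. The sketch below assumes rows arrive as &lt;code&gt;(vendor, plan_name, prev_price, price_usd, delta_pct)&lt;/code&gt; tuples, matching the column order of the query above; the function name &lt;code&gt;format_alert&lt;/code&gt; and the sample rows are hypothetical.&lt;/p&gt;

```python
def format_alert(rows, threshold_pct=5.0):
    """Build the Slack text for movers at or above the threshold, or None."""
    movers = [
        r for r in rows
        if r[4] is not None and abs(r[4]) >= threshold_pct
    ]
    if not movers:
        return None  # nothing to post today
    header = f"*Competitor price moves (>={threshold_pct:g}%) overnight:*"
    lines = [
        f"• {vendor} {plan}: ${prev:.2f} -> ${cur:.2f} ({pct:+.1f}%)"
        for vendor, plan, prev, cur, pct in movers
    ]
    return header + "\n" + "\n".join(lines)


# Made-up rows for illustration.
rows = [
    ("notion.so", "Plus", 10.0, 12.0, 20.0),
    ("airtable.com", "Team", 20.0, 20.5, 2.5),
]
print(format_alert(rows))
```

&lt;p&gt;The cron job then only needs to fetch rows, call the formatter, and post the result when it is not &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;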

&lt;h2&gt;
  
  
  Cost calibration
&lt;/h2&gt;

&lt;p&gt;NexGenData actors price between $0.05 and $0.50 per result. A realistic monthly bill for the pipeline above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;saas-pricing-tracker:&lt;/strong&gt; 10 vendors × 4 plans each = 40 rows/night × 30 nights = 1,200 rows/month × $0.10 = &lt;strong&gt;$120/mo&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;serp-rank-tracker-lite:&lt;/strong&gt; 5 keywords × 10 SERP positions = 50 rows/night × 30 = 1,500 × $0.05 = &lt;strong&gt;$75/mo&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;reverb-musical-instrument-scraper:&lt;/strong&gt; 500 listings/night × 30 × $0.05 = &lt;strong&gt;$750/mo&lt;/strong&gt; (only worth it if music gear is core to your business; otherwise drop it)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the lean pricing-and-SERP pipeline lands around $200/month ($120 + $75 = $195), cheaper than a single seat of most competitive-intel SaaS tools, and you own the data. If you sign up for Apify through the &lt;a href="https://apify.com/?fpr=2ayu9b" rel="noopener noreferrer"&gt;NexGenData referral link&lt;/a&gt;, 30% of your usage flows back as affiliate credit, which effectively rebates the cost.&lt;/p&gt;

&lt;p&gt;Rule of thumb for estimating before you commit: prototype with a 1-day run, multiply &lt;code&gt;datasetItemCount&lt;/code&gt; by 30, multiply by the actor's per-result price (visible on the actor page), and that's your monthly floor.&lt;/p&gt;
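
&lt;p&gt;That rule of thumb is simple enough to script. The sketch below uses &lt;code&gt;Decimal&lt;/code&gt; to avoid float rounding on money; the function name &lt;code&gt;monthly_floor&lt;/code&gt; is an illustrative choice, and the inputs are the per-night row counts and per-result prices from the list above.&lt;/p&gt;

```python
from decimal import Decimal


def monthly_floor(items_per_run, price_per_result, runs_per_month=30):
    """Estimated monthly cost: items per run x runs per month x price per result."""
    return Decimal(items_per_run) * runs_per_month * Decimal(price_per_result)


# Pass prices as strings so Decimal sees the exact value.
pricing = monthly_floor(40, "0.10")   # saas-pricing-tracker
serp = monthly_floor(50, "0.05")      # serp-rank-tracker-lite
reverb = monthly_floor(500, "0.05")   # reverb-musical-instrument-scraper

print("lean pipeline:", pricing + serp)
print("with reverb:", pricing + serp + reverb)
```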

&lt;h2&gt;
  
  
  Best actors for BI use cases
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;th&gt;Typical run cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Market intel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/pr-newswire-press-releases-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=pr-newswire-press-releases-scraper" rel="noopener noreferrer"&gt;pr-newswire-press-releases-scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Press release feed by company/topic&lt;/td&gt;
&lt;td&gt;$0.05–$0.10/release&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market intel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/eastmoney-china-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=eastmoney-china-stock-screener" rel="noopener noreferrer"&gt;eastmoney-china-stock-screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;A-share screener data, fundamentals&lt;/td&gt;
&lt;td&gt;$0.05/ticker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market intel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/sec-form-13f-tracker-pro?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=sec-form-13f-tracker-pro" rel="noopener noreferrer"&gt;sec-form-13f-tracker-pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Institutional holdings from 13F filings&lt;/td&gt;
&lt;td&gt;$0.10/position&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitor intel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/saas-pricing-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=saas-pricing-tracker" rel="noopener noreferrer"&gt;saas-pricing-tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Vendor pricing tiers, plan features&lt;/td&gt;
&lt;td&gt;$0.10/plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitor intel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/serp-rank-tracker-lite?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=serp-rank-tracker-lite" rel="noopener noreferrer"&gt;serp-rank-tracker-lite&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Keyword SERP positions&lt;/td&gt;
&lt;td&gt;$0.05/result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitor intel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/shopify-app-store-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=shopify-app-store-scraper" rel="noopener noreferrer"&gt;shopify-app-store-scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;App listings, ratings, install counts&lt;/td&gt;
&lt;td&gt;$0.05/app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lead-gen feeds&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/contact-info-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=contact-info-scraper" rel="noopener noreferrer"&gt;contact-info-scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Emails, phones, social handles from any URL&lt;/td&gt;
&lt;td&gt;$0.05/contact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lead-gen feeds&lt;/td&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/yc-companies-directory-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=yc-companies-directory-scraper" rel="noopener noreferrer"&gt;yc-companies-directory-scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;YC batch directory: companies, founders, status&lt;/td&gt;
&lt;td&gt;$0.05/company&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For deeper dives on specific verticals, see our sister posts on &lt;a href="https://thenextgennexus.com/?p=1398" rel="noopener noreferrer"&gt;YC data for VC workflows&lt;/a&gt;, &lt;a href="https://thenextgennexus.com/?p=1416" rel="noopener noreferrer"&gt;Eastmoney for China equity research&lt;/a&gt;, and &lt;a href="https://thenextgennexus.com/?p=1409" rel="noopener noreferrer"&gt;sanctions/compliance scraping&lt;/a&gt;. Browse the full category collections at &lt;a href="https://thenextgennexus.com/market-intelligence-tools/" rel="noopener noreferrer"&gt;/market-intelligence-tools/&lt;/a&gt;, &lt;a href="https://thenextgennexus.com/lead-generation-data-tools/" rel="noopener noreferrer"&gt;/lead-generation-data-tools/&lt;/a&gt;, and &lt;a href="https://thenextgennexus.com/financial-data-tools/" rel="noopener noreferrer"&gt;/financial-data-tools/&lt;/a&gt;, or hit the full &lt;a href="https://thenextgennexus.com/resources/" rel="noopener noreferrer"&gt;resources hub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and when NOT to use actors
&lt;/h2&gt;

&lt;p&gt;Actors are powerful, but they're not always the right tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If a vendor has a real API, use the API.&lt;/strong&gt; Salesforce, HubSpot, Stripe, Segment — these have first-party connectors in Fivetran/Airbyte. Don't scrape what you can query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If a SaaS already aggregates the dataset, weigh the math.&lt;/strong&gt; SimilarWeb, Ahrefs, Sensor Tower, and PitchBook all charge $1k–$10k/month but give you historical depth that's expensive to backfill via scraping. Use actors when you need a specific narrow slice (10 competitors, 5 keywords) at 1/10th the price, or when the SaaS doesn't cover your niche.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Respect ToS and robots.txt.&lt;/strong&gt; Public data is fair game in most jurisdictions, but a target site's terms may prohibit automated access. LinkedIn, Glassdoor, and a few others are litigious. Check the actor's documentation for compliance notes, and never scrape gated/authenticated content you don't have rights to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you need sub-minute freshness, scraping isn't the architecture.&lt;/strong&gt; Actors run on a schedule. For real-time price changes or fraud signals, you need streaming sources, not nightly batches.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can I run actors from a Jupyter notebook?&lt;/strong&gt; Yes — install &lt;code&gt;apify-client&lt;/code&gt;, call &lt;code&gt;client.actor("name").call(run_input={...})&lt;/code&gt;, iterate the dataset with &lt;code&gt;client.dataset(run["defaultDatasetId"]).iterate_items()&lt;/code&gt;. Results come back as Python dicts, ready for pandas.&lt;/p&gt;
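
&lt;p&gt;As a rough sketch of that notebook flow: the helper below wraps the two client calls named above. &lt;code&gt;fetch_rows&lt;/code&gt; is a hypothetical name, and the &lt;code&gt;vendors&lt;/code&gt; input in the usage comment is an assumed input shape, so check the actor's input schema before relying on it.&lt;/p&gt;

```python
def fetch_rows(client, actor_id, run_input):
    """Run an actor, wait for it to finish, and return its dataset rows as dicts."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        # Defensive guard in case the client returns no run object.
        raise RuntimeError(f"actor {actor_id} did not return a run")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

# In a notebook (needs `pip install apify-client pandas` and a real token):
#   from apify_client import ApifyClient
#   import pandas as pd
#   rows = fetch_rows(ApifyClient("YOUR_APIFY_TOKEN"),
#                     "nexgendata/saas-pricing-tracker",
#                     {"vendors": ["vendor-a"]})  # assumed input shape
#   df = pd.DataFrame(rows)
```

&lt;p&gt;Passing the client in as an argument keeps the helper easy to exercise with a stub before spending credits on a real run.&lt;/p&gt;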

&lt;p&gt;&lt;strong&gt;Does Apify support webhooks?&lt;/strong&gt; Yes. Every actor run can fire webhooks on run events: &lt;code&gt;ACTOR.RUN.CREATED&lt;/code&gt;, &lt;code&gt;ACTOR.RUN.SUCCEEDED&lt;/code&gt;, &lt;code&gt;ACTOR.RUN.FAILED&lt;/code&gt;, &lt;code&gt;ACTOR.RUN.TIMED_OUT&lt;/code&gt;, or &lt;code&gt;ACTOR.RUN.ABORTED&lt;/code&gt;. The default payload includes the run object, and with it the default dataset ID, so your receiver can pull results immediately.&lt;/p&gt;
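
&lt;p&gt;A receiver might extract the dataset ID along these lines. This is a sketch that assumes the default payload template, with an &lt;code&gt;eventType&lt;/code&gt; field (full event name &lt;code&gt;ACTOR.RUN.SUCCEEDED&lt;/code&gt;) and a &lt;code&gt;resource&lt;/code&gt; object holding the run; a customized payload template would need different keys. &lt;code&gt;dataset_id_from_webhook&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def dataset_id_from_webhook(payload):
    """Return the default dataset ID for a succeeded run, else None."""
    # Assumed payload shape: {"eventType": ..., "resource": {run object}}.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return payload.get("resource", {}).get("defaultDatasetId")

# Illustrative payload, trimmed to the fields this helper reads.
sample = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run-123", "defaultDatasetId": "ds-456"},
}
print(dataset_id_from_webhook(sample))
```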

&lt;p&gt;&lt;strong&gt;How do I refresh nightly?&lt;/strong&gt; Apify Console -&amp;gt; Schedules -&amp;gt; Create new, cron syntax. Or trigger from Airflow's &lt;code&gt;SimpleHttpOperator&lt;/code&gt; against the run-sync endpoint if you want orchestration in one place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about long-running actors (&amp;gt;1h)?&lt;/strong&gt; Don't use run-sync. Use the async &lt;code&gt;POST /acts/{id}/runs&lt;/code&gt; endpoint, then either poll &lt;code&gt;GET /actor-runs/{id}&lt;/code&gt; or wait for the SUCCEEDED webhook. Apify runs can go up to 168 hours with the right memory configuration.&lt;/p&gt;
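
&lt;p&gt;The polling option can be sketched as a small loop. It is deliberately transport-agnostic: &lt;code&gt;get_status&lt;/code&gt; is a hypothetical callable you would implement around &lt;code&gt;GET /actor-runs/{id}&lt;/code&gt;, and the terminal status strings below are assumptions about what the run object reports, so verify them against a real response.&lt;/p&gt;

```python
import time

# Assumed terminal run statuses; confirm against an actual run object.
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}

def wait_for_run(get_status, poll_seconds=30, max_polls=1000, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        sleep(poll_seconds)  # back off between polls
    raise TimeoutError("run did not reach a terminal status")
```

&lt;p&gt;The &lt;code&gt;sleep&lt;/code&gt; parameter exists only so the loop can be exercised without real waiting; in production the default is fine.&lt;/p&gt;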

&lt;p&gt;&lt;strong&gt;Can I pipe data to dbt/Airflow?&lt;/strong&gt; Yes. Land actor results in a raw warehouse table (Pattern A or B), then dbt models the curated layer. For Airflow, a DAG with a PythonOperator that calls the Apify SDK, followed by a SnowflakeOperator that runs the dbt build, is the canonical shape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the JSON format stable?&lt;/strong&gt; Each actor publishes a JSON Schema on its detail page and versions it. NexGenData actors include a &lt;code&gt;schema_version&lt;/code&gt; field on every row so your downstream models can branch on changes. Subscribe to actor update notifications in the console to catch breaking changes before they hit production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational tips from production deployments
&lt;/h2&gt;

&lt;p&gt;A few hard-won lessons from teams that have run this pattern in production for 6+ months:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency keys matter.&lt;/strong&gt; Set &lt;code&gt;?clean=true&amp;amp;fields=id,vendor,plan_name,price_usd,scraped_at&lt;/code&gt; on dataset pulls and dedupe in your raw layer on the actor's natural key. Webhooks can fire twice during Apify platform incidents; without dedup you'll double-count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Land raw, model later.&lt;/strong&gt; Resist the urge to flatten in the ingest layer. Drop the full payload as VARIANT/JSON and let dbt or your transformation layer pick fields. When the actor schema evolves (it will), you won't have to backfill — the raw rows are intact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tag every run with a build_id.&lt;/strong&gt; Pass &lt;code&gt;{"build": "1.2"}&lt;/code&gt; in &lt;code&gt;run_input.metadata&lt;/code&gt; so you can join scraped rows back to the actor version that produced them. Useful when investigating why a metric jumped one Tuesday.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor actor health, not just data freshness.&lt;/strong&gt; Apify exposes &lt;code&gt;GET /v2/users/me/usage/monthly&lt;/code&gt; — alert when daily compute units drift more than 30% from baseline. A silently broken actor returns zero rows, which looks like "no price changes" on the dashboard. Set a freshness SLA in dbt with &lt;code&gt;dbt source freshness&lt;/code&gt; and page on stale data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backfill on first deploy.&lt;/strong&gt; Run the actor 7–14 times with date inputs (where supported) before going live, so day-over-day deltas have a baseline.&lt;/li&gt;
&lt;/ul&gt;
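
&lt;p&gt;The dedupe step from the first tip might look like this in the ingest layer. It is a minimal sketch: treating &lt;code&gt;(id, scraped_at)&lt;/code&gt; as the natural key is an assumption based on the field list above, and &lt;code&gt;dedupe&lt;/code&gt; is a hypothetical helper, not part of any SDK.&lt;/p&gt;

```python
def dedupe(rows, key_fields=("id", "scraped_at")):
    """Keep the first row seen for each natural key; drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            continue  # e.g. the same dataset delivered by a double-fired webhook
        seen.add(key)
        unique.append(row)
    return unique

rows = [
    {"id": "a", "scraped_at": "2026-01-01", "price_usd": 10},
    {"id": "a", "scraped_at": "2026-01-01", "price_usd": 10},  # duplicate delivery
    {"id": "a", "scraped_at": "2026-01-02", "price_usd": 12},
]
print(len(dedupe(rows)))
```

&lt;p&gt;In a warehouse the same idea is usually expressed as a merge or a row-number filter on the key columns; the Python version is only meant to show the logic.&lt;/p&gt;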

&lt;h3&gt;
  
  
  Airflow DAG sketch
&lt;/h3&gt;

&lt;p&gt;The canonical orchestration shape, for teams already running Airflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Variable&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.python&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PythonOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.providers.snowflake.operators.snowflake&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SnowflakeOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="c1"&gt;# Placeholder vendor list; replace with the competitors you track.&lt;/span&gt;
&lt;span class="n"&gt;COMPETITORS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor-a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor-b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Run the actor, then hand the dataset ID to downstream tasks via XCom.&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Variable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;COMPETITORS&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ti&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;xcom_push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;competitor_pricing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 2 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2026&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;scrape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PythonOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scrape&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;python_callable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_actor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;op_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nexgendata/saas-pricing-tracker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;load&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SnowflakeOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;load&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CALL sp_load_apify_pricing(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{{ ti.xcom_pull(task_ids=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;scrape&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;, key=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;dataset_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;) }}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SnowflakeOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CALL run_dbt_models(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;competitor_pricing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scrape&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;load&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tasks, one DAG, runs in under a minute end-to-end for the volumes in our worked example. Plug the same shape into Prefect, Dagster, or Mage if Airflow isn't your stack.&lt;/p&gt;

</description>
      <category>finance</category>
      <category>marketing</category>
      <category>ecommerce</category>
      <category>automation</category>
    </item>
    <item>
      <title>How to Build a Startup Lead List from YC Company Data</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 17:05:15 +0000</pubDate>
      <link>https://dev.to/nexgendata/how-to-build-a-startup-lead-list-from-yc-company-data-9b8</link>
      <guid>https://dev.to/nexgendata/how-to-build-a-startup-lead-list-from-yc-company-data-9b8</guid>
      <description>&lt;p&gt;If you sell to startups, recruit engineers, or build a book of business off freshly funded teams, the Y Combinator company directory is one of the highest-signal lead sources on the public internet. Every company in YC has been vetted, funded, and pushed through a structured program — they have budget, urgency, and a habit of buying tools quickly. The catch is that the directory was built for browsing, not outbound. Pulling a clean, segmented lead list out of it is where most GTM teams stall. This post is a tactical playbook for turning YC into a working pipeline asset.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: YC's Directory Is Built for Browsing, Not Pipeline
&lt;/h2&gt;

&lt;p&gt;Anyone who has tried to source startups manually from &lt;a href="https://www.ycombinator.com/companies" rel="noopener noreferrer"&gt;ycombinator.com/companies&lt;/a&gt; knows the pain. The UI lets you filter by batch, industry, region, and a handful of tags, but you cannot export. You cannot save a segment. You cannot stack a "founded in 2024, hiring engineers, based in SF, NOT in stealth" filter and walk away with a CSV. Copy-paste works for ten rows. By row fifty, you have lost an afternoon and your data is already stale because S25 just dropped.&lt;/p&gt;

&lt;p&gt;The other failure mode is static lists. People download a one-off CSV, load it into Apollo or HubSpot, and run sequences against it for six months. The YC directory is a living dataset: companies change status (Active, Acquired, Public, Inactive), team sizes move, new batches launch twice a year. A list clean in March is full of zombies by September, and your reply rates quietly tank.&lt;/p&gt;

&lt;p&gt;The third gap is segmentation depth. The native filters are coarse. You cannot easily ask "show me every Series A-stage YC company in fintech, US-based, 11-50 employees, with an open engineering role, excluding stealth." That is a textbook ICP query for an SDR team, and you cannot run it in the UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why YC Company Data Matters for Outbound
&lt;/h2&gt;

&lt;p&gt;YC companies are unusually attractive buyers for B2B sellers. They are well-capitalized — even a fresh-batch company typically closes on $500K from YC plus a SAFE round shortly after Demo Day. They move fast on tooling because the founders are usually the buyers, with no procurement gauntlet. And they have a multi-year window where the stack is still being chosen. If you land a YC company in months 0-12, you are usually in before they pick an incumbent, which means you ride the expansion curve from 5 to 50 to 500 employees.&lt;/p&gt;

&lt;p&gt;For recruiters, the math is even better. YC companies hire aggressively in years 1-3 and pay above market for senior engineering, product, and GTM talent. A founder posting their first three engineering roles is a warm intro waiting to happen. For VCs and corp dev, the directory is a real-time map of who is shipping in any vertical. For agencies, journalists, and BD professionals, it is the cleanest public list of which startups are alive, who runs them, and what they do. The value of a YC lead decays over time — the earlier you reach a company, the higher the conversion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the YC Companies Directory Scraper Extracts
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://apify.com/nexgendata/yc-companies-directory-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=yc-companies-directory-scraper" rel="noopener noreferrer"&gt;YC Companies Directory Scraper&lt;/a&gt; on Apify pulls a structured row for every company in the directory, from batch S05 forward through the latest cohort. For each company you get roughly thirty fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity:&lt;/strong&gt; company name, slug, YC profile URL, primary website, logo URL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch metadata:&lt;/strong&gt; batch code (W25, S25, W24, etc.), batch year, top-company badge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; one-line tagline, long description, primary industry, sub-industries, regions, locations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational signals:&lt;/strong&gt; current status (Active, Acquired, Public, Inactive), team size, hiring flag, careers/jobs URL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;People:&lt;/strong&gt; founder names and titles where listed on the YC profile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geo:&lt;/strong&gt; primary city, country, remote flag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is enough surface area to build a tight ICP filter. Stack conditions like "batch in (W25, S25) AND status = Active AND team_size 5-50 AND hiring = true AND region = US" and you end up with a few hundred companies that are exactly your target — not a few thousand junk rows to triage.&lt;/p&gt;
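
&lt;p&gt;Expressed as code, that stacked filter could look like the sketch below. The field names (&lt;code&gt;batch&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;team_size&lt;/code&gt;, &lt;code&gt;hiring&lt;/code&gt;, &lt;code&gt;region&lt;/code&gt;) are assumptions for illustration; check them against the actor's actual output schema. The sample rows are made up.&lt;/p&gt;

```python
def matches_icp(row):
    """True when a directory row satisfies the example ICP conditions."""
    return (
        row.get("batch") in {"W25", "S25"}
        and row.get("status") == "Active"
        and row.get("team_size") in range(5, 51)  # 5-50 inclusive
        and row.get("hiring") is True
        and row.get("region") == "US"
    )

# Made-up rows, using the assumed field names.
rows = [
    {"name": "Alpha", "batch": "W25", "status": "Active", "team_size": 12, "hiring": True, "region": "US"},
    {"name": "Beta", "batch": "W24", "status": "Active", "team_size": 12, "hiring": True, "region": "US"},
    {"name": "Gamma", "batch": "S25", "status": "Inactive", "team_size": 8, "hiring": True, "region": "US"},
    {"name": "Delta", "batch": "S25", "status": "Active", "team_size": 3, "hiring": True, "region": "US"},
]
qualified = [row["name"] for row in rows if matches_icp(row)]
print(qualified)
```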

&lt;h2&gt;
  
  
  Example Workflow: From Directory Dump to Live Sequence
&lt;/h2&gt;

&lt;p&gt;Here is the concrete five-step playbook GTM teams run on this data. End-to-end it takes about an hour the first time, roughly ten minutes per refresh after that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Pull the batch slice.&lt;/strong&gt; Run the &lt;a href="https://apify.com/nexgendata/yc-companies-directory-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=yc-companies-directory-scraper" rel="noopener noreferrer"&gt;YC scraper&lt;/a&gt; with the batches you care about (typically the two most recent, plus the prior year for longer cycles). Export to CSV or push to Google Sheets. For most SDR teams the right slice is the last 18 months, which gives you 600-1,200 companies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Apply your ICP filter.&lt;/strong&gt; Filter by status (drop Inactive and Acquired unless those fit your persona), team size, industry, and hiring flag. A typical SDR ICP cuts the raw list by 60-80%. You are now sitting on the qualified subset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Enrich for contacts.&lt;/strong&gt; The YC profile gives you founder names but not direct emails. Pipe the qualified list through &lt;a href="https://apify.com/nexgendata/contact-info-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=contact-info-scraper" rel="noopener noreferrer"&gt;contact-info-scraper&lt;/a&gt; against each company website to surface published emails, phones, and socials. For deeper coverage on founder and exec emails, run &lt;a href="https://apify.com/nexgendata/lead-list-enricher?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=lead-list-enricher" rel="noopener noreferrer"&gt;lead-list-enricher&lt;/a&gt; to append titles, LinkedIn URLs, and verified work emails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 — Personalize on the YC tagline.&lt;/strong&gt; The one-liner and long description fields are gold for first-touch personalization. Use them as the merge variable: "Saw [Company] is building [one-liner]. We help YC-stage teams in [industry] solve [pain]." This is the single highest-leverage step for reply rate. The tagline already tells you what the founder cares about this quarter.&lt;/p&gt;
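
&lt;p&gt;The merge in Step 4 can be sketched as a plain string template. The row keys (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;one_liner&lt;/code&gt;, &lt;code&gt;industry&lt;/code&gt;) are assumed field names, the sample company is invented, and &lt;code&gt;first_touch&lt;/code&gt; is a hypothetical helper; most sequencing tools offer their own merge syntax that does the same job.&lt;/p&gt;

```python
TEMPLATE = (
    "Saw {company} is building {one_liner}. "
    "We help YC-stage teams in {industry} solve {pain}."
)

def first_touch(row, pain):
    """Fill the first-touch opener from one directory row."""
    return TEMPLATE.format(
        company=row["name"],
        one_liner=row["one_liner"].strip().rstrip("."),  # avoid a double period
        industry=row["industry"],
        pain=pain,
    )

# Invented sample row, using the assumed field names.
sample = {
    "name": "Acme",
    "one_liner": "AI copilots for freight brokers.",
    "industry": "logistics",
}
print(first_touch(sample, pain="manual quoting"))
```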

&lt;p&gt;&lt;strong&gt;Step 5 — Push to outbound and set cadence.&lt;/strong&gt; Upload the enriched list to Apollo, Outreach, Salesloft, or HubSpot. Build a YC-specific sequence with a three-touch cadence over ten days (email, LinkedIn connect, email). Tag by batch so you can measure reply rate by cohort — fresh batches almost always outperform older ones. Re-run the scraper monthly to catch new batches and status changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cases Across GTM, Recruiting, and Capital
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SDR / BDR outbound:&lt;/strong&gt; batch-specific sequences for the freshest cohort each quarter, with founder-direct email as the primary channel. Highest reply rates land in months 2-6 post-batch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account executive territory planning:&lt;/strong&gt; assign new YC accounts by industry or region so AEs own a clean named-account book with intent signals built in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RevOps list hygiene:&lt;/strong&gt; use the refreshed scrape as source of truth to auto-deactivate stale CRM records. Inactive and Acquired companies should not be in active sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recruiter pipeline sourcing:&lt;/strong&gt; filter for hiring = true, cross-reference founder LinkedIn for warm intros. YC companies hire engineers at a multiple of the broader market rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VC competitive intel:&lt;/strong&gt; map every new YC company in your thesis areas the day each batch is announced. Decide who to chase before the herd notices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agency lead generation:&lt;/strong&gt; design, dev, and growth agencies use YC as a primary ICP. Filter by batch and team size to find the sweet spot where companies have budget but no in-house function yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Founder peer outreach:&lt;/strong&gt; founders selling to founders use the directory to find peers for partnership, beta testing, and design-partner conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Journalist and analyst sources:&lt;/strong&gt; reporters use the scrape to surface trends and find founders to interview before they are saturated with press requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Corp dev and BD targeting:&lt;/strong&gt; enterprises identify M&amp;amp;A and partnership targets filtered by stage and team size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investor relations:&lt;/strong&gt; VCs map portfolio peers, comps, and follow-on opportunities in the YC ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get the Data: Run the YC Companies Directory Scraper
&lt;/h2&gt;

&lt;p&gt;The fastest way to turn this playbook into pipeline is to run the actor and pull your first batch. The scraper is pay-per-result, so you only pay for the rows you actually use, and the output drops straight into CSV, JSON, Excel, or Google Sheets. Set it on a monthly schedule and you have a self-refreshing lead source that beats any static list you can buy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://apify.com/nexgendata/yc-companies-directory-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=yc-companies-directory-scraper" rel="noopener noreferrer"&gt;Run the YC Companies Directory Scraper on Apify →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Actors for Building a Complete Lead Stack
&lt;/h2&gt;

&lt;p&gt;The YC scraper is the seed list. To turn it into a contactable, segmented outbound dataset, pair it with these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/contact-info-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=contact-info-scraper" rel="noopener noreferrer"&gt;Contact Info Scraper&lt;/a&gt; — pulls emails, phones, and social profiles from any company website.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/company-enrichment-tool?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=company-enrichment-tool" rel="noopener noreferrer"&gt;Company Enrichment Tool&lt;/a&gt; — appends firmographic details (industry, size, tech stack hints, social presence) to the YC base record.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/lead-list-enricher?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=lead-list-enricher" rel="noopener noreferrer"&gt;Lead List Enricher&lt;/a&gt; — converts company rows into person-level records with titles, LinkedIn URLs, and verified work emails.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/b2b-leads-finder?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=b2b-leads-finder" rel="noopener noreferrer"&gt;B2B Leads Finder&lt;/a&gt; — discovery layer for decision-makers when YC profiles list only founders and you need VPs or directors.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/website-email-extractor?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=website-email-extractor" rel="noopener noreferrer"&gt;Website Email Extractor&lt;/a&gt; — lightweight bulk email harvester for fast sweeps across YC website URLs.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Indie Hackers product trackers&lt;/a&gt; — complementary feed for bootstrapped founders outside the YC universe.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/founders-fund-portfolio-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=founders-fund-portfolio-scraper" rel="noopener noreferrer"&gt;Founders Fund Portfolio Scraper&lt;/a&gt; — parallel feed for sellers targeting tier-one VC-backed startups beyond YC.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/lightspeed-portfolio-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=lightspeed-portfolio-scraper" rel="noopener noreferrer"&gt;Lightspeed Portfolio Scraper&lt;/a&gt; — same pattern for the Lightspeed portfolio, useful for layering multiple VC feeds into one startup ICP.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For deeper workflows see our guide on &lt;a href="https://thenextgennexus.com/2026/05/24/export-yc-company-directory-data-vc-sourcing/" rel="noopener noreferrer"&gt;exporting YC company directory data for VC sourcing&lt;/a&gt;, the &lt;a href="https://thenextgennexus.com/2026/05/24/startup-funding-data-investors-recruiters-sales/" rel="noopener noreferrer"&gt;startup funding data playbook&lt;/a&gt;, and our walkthrough on &lt;a href="https://thenextgennexus.com/2026/05/24/how-to-extract-contact-information-for-lead-generation-workflows/" rel="noopener noreferrer"&gt;extracting contact information for lead-gen workflows&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is YC company data public?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. The directory is published on ycombinator.com/companies and is publicly browsable. Scraping the same fields visible in the UI for research, sales, and recruiting is standard practice. Respect rate limits, do not republish the raw dataset, and use the data to inform outreach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How fresh is the data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The scraper pulls live from the directory each run. A monthly schedule catches new batches within weeks of launch and picks up status changes as companies get acquired, go public, or wind down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I filter by batch?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes — every row includes batch code (W25, S25, etc.) and batch year. Most outbound teams slice the two most recent batches plus the prior year to focus on the high-conversion window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about Demo Day-only listings?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Companies that present at Demo Day but stay off the public directory will not appear in this scrape, since the actor mirrors what is published. For Demo Day-specific intel you typically need an investor login or press partnership.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I get founder emails directly from the YC profile?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The YC profile lists founder names but not direct emails. Pipe the company list through &lt;a href="https://apify.com/nexgendata/contact-info-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=contact-info-scraper" rel="noopener noreferrer"&gt;contact-info-scraper&lt;/a&gt; or use &lt;a href="https://apify.com/nexgendata/lead-list-enricher?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=lead-list-enricher" rel="noopener noreferrer"&gt;lead-list-enricher&lt;/a&gt; to append verified work emails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this against YC's terms of service?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Scraping publicly available pages for research is generally permitted under standard web norms, and the YC directory is a public marketing asset. The actor accesses only what is rendered to any anonymous visitor. Consult your own legal counsel for your specific use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does this compare to Crunchbase Pro or Apollo?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Crunchbase and Apollo carry YC tags, but their batch-level metadata often lags the official directory by weeks or months. Scraping the source gives you the canonical, freshest version of every field at a fraction of the per-record cost. Most teams use the YC scrape as the authoritative base and layer Apollo or Crunchbase for person-level enrichment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How often should I refresh the list?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Monthly is the sweet spot for most outbound teams. Weekly is overkill unless you run a high-velocity SDR motion or VC scout program. Quarterly is too slow and silently degrades reply rate.&lt;/p&gt;

</description>
      <category>marketing</category>
      <category>api</category>
      <category>webscraping</category>
      <category>opensource</category>
    </item>
    <item>
      <title>New: Japan BoJ Macro Rates &amp; JGB Auctions — BoJ policy rate, JGB yield curve, MoF auction calendar &amp; policy statements</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:18:13 +0000</pubDate>
      <link>https://dev.to/nexgendata/new-japan-boj-macro-rates-jgb-auctions-boj-policy-rate-jgb-yield-curve-mof-auction-calendar-3h6j</link>
      <guid>https://dev.to/nexgendata/new-japan-boj-macro-rates-jgb-auctions-boj-policy-rate-jgb-yield-curve-mof-auction-calendar-3h6j</guid>
      <description>&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;This actor is a JPY source-of-truth feed for Japan rates — Bank of Japan macro rates, the JGB yield curve, the MoF JGB auction calendar, money supply, FX reserves, and BoJ Policy Board monetary policy statements. It consolidates several official Japanese sources into one structured output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;Japan-rates desks, macro PMs, JGB and swaps traders, and AI-agent integrations that need consolidated BoJ/MoF data without scraping multiple Japanese-language sites.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample fields / output
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;indicator / series&lt;/li&gt;
&lt;li&gt;date / period&lt;/li&gt;
&lt;li&gt;value&lt;/li&gt;
&lt;li&gt;JGB tenor&lt;/li&gt;
&lt;li&gt;yield&lt;/li&gt;
&lt;li&gt;auction date&lt;/li&gt;
&lt;li&gt;bid-to-cover&lt;/li&gt;
&lt;li&gt;money supply&lt;/li&gt;
&lt;li&gt;FX reserves&lt;/li&gt;
&lt;li&gt;policy statement text&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Track the JGB yield curve and BoJ policy rate together for a Japan-rates dashboard.&lt;/li&gt;
&lt;li&gt;Pull the MoF JGB auction calendar and bid-to-cover history for supply analysis.&lt;/li&gt;
&lt;li&gt;Feed BoJ Policy Board statements into an NLP pipeline for tone/stance scoring.&lt;/li&gt;
&lt;/ul&gt;
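&lt;p&gt;As a rough sketch of the first use case, the snippet below computes a 2s10s curve slope per date from rows shaped like the sample fields above. The key names (&lt;code&gt;date&lt;/code&gt;, &lt;code&gt;tenor&lt;/code&gt;, &lt;code&gt;yield&lt;/code&gt;) and every value are assumptions made for illustration, not the actor's documented schema or real market data.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical records: key names and all yields are illustrative assumptions,
# not the actor's documented schema and not real JGB data.
SAMPLE_RECORDS = [
    {"date": "2026-06-23", "tenor": "2Y", "yield": 1.00},
    {"date": "2026-06-23", "tenor": "10Y", "yield": 1.75},
    {"date": "2026-06-24", "tenor": "2Y", "yield": 1.25},
    {"date": "2026-06-24", "tenor": "10Y", "yield": 1.75},
]


def curve_slopes(records, short_tenor="2Y", long_tenor="10Y"):
    """Return {date: long-tenor yield minus short-tenor yield} in basis points."""
    by_date = defaultdict(dict)
    for row in records:
        by_date[row["date"]][row["tenor"]] = row["yield"]

    slopes = {}
    for date, curve in sorted(by_date.items()):
        # Skip dates where either tenor is missing instead of guessing a value.
        if short_tenor in curve and long_tenor in curve:
            slopes[date] = round((curve[long_tenor] - curve[short_tenor]) * 100, 1)
    return slopes


if __name__ == "__main__":
    for date, slope_bp in curve_slopes(SAMPLE_RECORDS).items():
        print(date, slope_bp, "bp")
```

&lt;p&gt;Grouping by date first keeps the calculation independent of row order, which is useful if the feed interleaves tenors and other indicators.&lt;/p&gt;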

&lt;p&gt;&lt;a href="https://apify.com/nexgendata/japan-boj-macro-rates-jgb-auctions?fpr=2ayu9b" rel="noopener noreferrer"&gt;&lt;strong&gt;▶ Try the Japan BoJ Macro Rates &amp;amp; JGB Auctions actor on Apify&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related actors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/treasury-yields-bonds?fpr=2ayu9b" rel="noopener noreferrer"&gt;Treasury Yields &amp;amp; Bonds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/japan-tdnet-timely-disclosures?fpr=2ayu9b" rel="noopener noreferrer"&gt;Japan TDnet Timely Disclosures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/japan-jpx-short-selling-balances?fpr=2ayu9b" rel="noopener noreferrer"&gt;Japan JPX Short-Selling Balances&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What sources does this consolidate?
&lt;/h3&gt;

&lt;p&gt;Bank of Japan macro rates and policy statements plus MoF JGB auction and yield-curve data, in one structured feed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it include monetary policy statements?
&lt;/h3&gt;

&lt;p&gt;Yes — BoJ Policy Board monetary policy statements are included alongside the rates data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is auction data covered?
&lt;/h3&gt;

&lt;p&gt;Yes — the MoF JGB auction calendar with details such as bid-to-cover is included.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>New: Japan MoF FX Intervention Tracker — every MoF yen FX intervention back to 1991 with JPY + USD estimates</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:17:34 +0000</pubDate>
      <link>https://dev.to/nexgendata/new-japan-mof-fx-intervention-tracker-every-mof-yen-fx-intervention-back-to-1991-with-jpy-usd-3l3f</link>
      <guid>https://dev.to/nexgendata/new-japan-mof-fx-intervention-tracker-every-mof-yen-fx-intervention-back-to-1991-with-jpy-usd-3l3f</guid>
      <description>&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;This actor delivers a structured feed of Japan's Ministry of Finance (MoF) yen FX intervention disclosures: every USD/JPY, EUR/JPY, and GBP/JPY operation going back to 1991, including the 2022 and 2024 yen-defense episodes. Amounts are normalized to JPY with USD estimates alongside, and each record carries a bilingual (Japanese/English) description.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;FX desks, macro PMs, and options-vol quants who need a clean, historical record of MoF intervention rather than reconstructing it from press releases and monthly PDFs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample fields / output
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;operation date&lt;/li&gt;
&lt;li&gt;currency pair&lt;/li&gt;
&lt;li&gt;direction (yen buy / sell)&lt;/li&gt;
&lt;li&gt;amount (JPY, normalized)&lt;/li&gt;
&lt;li&gt;USD estimate&lt;/li&gt;
&lt;li&gt;reporting period&lt;/li&gt;
&lt;li&gt;bilingual description (JP / EN)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Backtest USD/JPY behavior around historical MoF intervention dates.&lt;/li&gt;
&lt;li&gt;Build an alert for new intervention disclosures during yen-defense episodes.&lt;/li&gt;
&lt;li&gt;Quantify cumulative intervention size across the 2022 and 2024 episodes for a macro note.&lt;/li&gt;
&lt;/ul&gt;
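&lt;p&gt;The cumulative-size use case could look roughly like the sketch below. The key names (&lt;code&gt;operation_date&lt;/code&gt;, &lt;code&gt;direction&lt;/code&gt;, &lt;code&gt;amount_jpy_bn&lt;/code&gt;) are hypothetical, and the dates and amounts are placeholders rather than actual MoF figures.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical records: key names, dates, and amounts are placeholders for
# illustration, not real MoF disclosures or the actor's documented schema.
SAMPLE_OPERATIONS = [
    {"operation_date": "2022-09-01", "direction": "yen_buy", "amount_jpy_bn": 100.0},
    {"operation_date": "2022-10-01", "direction": "yen_buy", "amount_jpy_bn": 200.0},
    {"operation_date": "2024-04-01", "direction": "yen_buy", "amount_jpy_bn": 300.0},
    {"operation_date": "2024-05-01", "direction": "yen_sell", "amount_jpy_bn": 50.0},
]


def cumulative_by_year(operations, direction="yen_buy"):
    """Sum intervention amounts per calendar year for one direction."""
    totals = defaultdict(float)
    for op in operations:
        if op["direction"] != direction:
            continue
        year = int(op["operation_date"][:4])  # ISO dates start with the year
        totals[year] += op["amount_jpy_bn"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(cumulative_by_year(SAMPLE_OPERATIONS))
```

&lt;p&gt;Filtering on direction before summing avoids netting yen purchases against yen sales, which would understate the size of a defense episode.&lt;/p&gt;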

&lt;p&gt;&lt;a href="https://apify.com/nexgendata/japan-mof-fx-intervention-tracker?fpr=2ayu9b" rel="noopener noreferrer"&gt;&lt;strong&gt;▶ Try the Japan MoF FX Intervention Tracker actor on Apify&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related actors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/fx-rates-tracker?fpr=2ayu9b" rel="noopener noreferrer"&gt;FX Rates Tracker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/japan-jpx-short-selling-balances?fpr=2ayu9b" rel="noopener noreferrer"&gt;Japan JPX Short-Selling Balances&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/japan-tdnet-timely-disclosures?fpr=2ayu9b" rel="noopener noreferrer"&gt;Japan TDnet Timely Disclosures&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How far back does the intervention data go?
&lt;/h3&gt;

&lt;p&gt;Back to 1991, covering every disclosed MoF operation including the 2022 and 2024 yen-defense episodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are amounts available in USD?
&lt;/h3&gt;

&lt;p&gt;Yes — operations are normalized in JPY with USD estimates alongside.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which currency pairs are included?
&lt;/h3&gt;

&lt;p&gt;USD/JPY, EUR/JPY, and GBP/JPY operations as disclosed by Japan's Ministry of Finance.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>New: Korea BoK ECOS — Base Rate &amp; Macro — Bank of Korea base rate, M1/M2, GDP, CPI &amp; FX reserves as structured macro records</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:16:34 +0000</pubDate>
      <link>https://dev.to/nexgendata/new-korea-bok-ecos-base-rate-macro-bank-of-korea-base-rate-m1m2-gdp-cpi-fx-reserves-as-5bf8</link>
      <guid>https://dev.to/nexgendata/new-korea-bok-ecos-base-rate-macro-bank-of-korea-base-rate-m1m2-gdp-cpi-fx-reserves-as-5bf8</guid>
      <description>&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;This actor wraps the Bank of Korea (BoK) ECOS Open API into clean, structured macro records — the Korea base rate, monetary aggregates (M1/M2), GDP, CPI, FX reserves, and trade balance. It turns ECOS series into consistent JSON you can feed straight into a model or dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;USD/KRW FX desks, EM-rates PMs, Korean-equity strategists, macro researchers, and AI-agent integrations that need Bank of Korea series without hand-managing ECOS series codes and pagination.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample fields / output
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;series / statistic code&lt;/li&gt;
&lt;li&gt;indicator name&lt;/li&gt;
&lt;li&gt;period (date)&lt;/li&gt;
&lt;li&gt;value&lt;/li&gt;
&lt;li&gt;unit&lt;/li&gt;
&lt;li&gt;frequency&lt;/li&gt;
&lt;li&gt;base rate %&lt;/li&gt;
&lt;li&gt;monetary aggregate (M1 / M2)&lt;/li&gt;
&lt;li&gt;CPI / GDP / FX-reserves value&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Track the BoK base rate alongside CPI to model the Korean rate path for an FX or rates book.&lt;/li&gt;
&lt;li&gt;Pull M1/M2 and FX reserves into a macro dashboard for KRW positioning.&lt;/li&gt;
&lt;li&gt;Give an AI agent grounded Bank of Korea macro data instead of stale or hallucinated figures.&lt;/li&gt;
&lt;/ul&gt;
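&lt;p&gt;One way to put the base rate and CPI side by side is to pivot the long-format records by period and take the difference, a simple ex-post real policy rate. This is only a sketch: the indicator labels (&lt;code&gt;base_rate&lt;/code&gt;, &lt;code&gt;cpi_yoy&lt;/code&gt;) and all values are assumptions, and it presumes CPI arrives as a year-over-year percentage. If the feed returns an index level, compute the year-over-year change first.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical long-format records: indicator labels and values are
# illustrative assumptions, not real Bank of Korea data.
SAMPLE_RECORDS = [
    {"indicator": "base_rate", "period": "2026-04", "value": 2.5},
    {"indicator": "cpi_yoy", "period": "2026-04", "value": 2.0},
    {"indicator": "base_rate", "period": "2026-05", "value": 2.5},
    {"indicator": "cpi_yoy", "period": "2026-05", "value": 2.25},
    {"indicator": "base_rate", "period": "2026-06", "value": 2.5},
]


def real_policy_rate(records):
    """Return {period: base rate minus CPI inflation} for periods with both."""
    wide = defaultdict(dict)
    for row in records:
        wide[row["period"]][row["indicator"]] = row["value"]

    result = {}
    for period, values in sorted(wide.items()):
        # Only report periods where both series have been published.
        if "base_rate" in values and "cpi_yoy" in values:
            result[period] = round(values["base_rate"] - values["cpi_yoy"], 2)
    return result


if __name__ == "__main__":
    print(real_policy_rate(SAMPLE_RECORDS))
```

&lt;p&gt;Periods with only one of the two series are dropped rather than filled, since CPI is typically published later than the policy rate.&lt;/p&gt;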

&lt;p&gt;&lt;a href="https://apify.com/nexgendata/korea-bok-monetary-policy-ecos?fpr=2ayu9b" rel="noopener noreferrer"&gt;&lt;strong&gt;▶ Try the Korea BoK ECOS — Base Rate &amp;amp; Macro actor on Apify&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related actors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/india-rbi-monetary-policy-statements?fpr=2ayu9b" rel="noopener noreferrer"&gt;India RBI Monetary Policy Statements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/kospi-stock-screener?fpr=2ayu9b" rel="noopener noreferrer"&gt;KOSPI Stock Screener&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/korea-dart-opendart-filings?fpr=2ayu9b" rel="noopener noreferrer"&gt;Korea DART OpenDART Filings&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Which Bank of Korea series are covered?
&lt;/h3&gt;

&lt;p&gt;Core macro series including the base rate, M1/M2, GDP, CPI, FX reserves, and trade balance from the ECOS Open API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is the data official?
&lt;/h3&gt;

&lt;p&gt;Yes — it comes directly from the Bank of Korea's ECOS Open API, normalized into structured records.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this for AI agents?
&lt;/h3&gt;

&lt;p&gt;Yes — the structured output is designed to ground FX/rates agents and macro models in real BoK data.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>New: Korea KRX Market Stats — daily KOSPI/KOSDAQ/KONEX prices, foreign-ownership ratios &amp; market-cap rankings</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:15:33 +0000</pubDate>
      <link>https://dev.to/nexgendata/new-korea-krx-market-stats-daily-kospikosdaqkonex-prices-foreign-ownership-ratios--22l9</link>
      <guid>https://dev.to/nexgendata/new-korea-krx-market-stats-daily-kospikosdaqkonex-prices-foreign-ownership-ratios--22l9</guid>
      <description>&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;This actor returns daily market statistics from KRX (Korea Exchange) across KOSPI, KOSDAQ, and KONEX — settlement prices, foreign-ownership ratios, trading volumes, and market-cap rankings. Query by market or by a single ticker over a date range and get structured Korean-equity reference data back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;Korean-equity strategists, EM and Asia-focused PMs, quant teams modeling foreign-investor flows, and AI agents that need clean KRX reference data without parsing the exchange's Korean-language portal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample fields / output
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;ticker code&lt;/li&gt;
&lt;li&gt;issuer name&lt;/li&gt;
&lt;li&gt;market (KOSPI / KOSDAQ / KONEX)&lt;/li&gt;
&lt;li&gt;settlement price (KRW)&lt;/li&gt;
&lt;li&gt;foreign-ownership ratio %&lt;/li&gt;
&lt;li&gt;trading volume&lt;/li&gt;
&lt;li&gt;market capitalization&lt;/li&gt;
&lt;li&gt;market-cap rank&lt;/li&gt;
&lt;li&gt;trade date&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Track foreign-ownership ratios over time to model foreign-investor positioning in Korean names.&lt;/li&gt;
&lt;li&gt;Build a daily KOSPI/KOSDAQ reference table for a screener or factor model.&lt;/li&gt;
&lt;li&gt;Pull a single ticker's price and market-cap history across a date range for backtests.&lt;/li&gt;
&lt;/ul&gt;
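&lt;p&gt;A minimal sketch of the foreign-ownership use case: given daily rows per ticker, compute how far the ratio moved between the first and last date in the window. The key names (&lt;code&gt;ticker&lt;/code&gt;, &lt;code&gt;trade_date&lt;/code&gt;, &lt;code&gt;foreign_ownership_pct&lt;/code&gt;), the ticker codes, and the values are all hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical rows: key names, ticker codes, and ratios are placeholders,
# not real KRX statistics or the actor's documented schema.
SAMPLE_ROWS = [
    {"ticker": "000001", "trade_date": "2026-06-24", "foreign_ownership_pct": 51.5},
    {"ticker": "000001", "trade_date": "2026-06-22", "foreign_ownership_pct": 50.0},
    {"ticker": "000002", "trade_date": "2026-06-22", "foreign_ownership_pct": 30.0},
    {"ticker": "000002", "trade_date": "2026-06-24", "foreign_ownership_pct": 29.0},
]


def ownership_change(rows):
    """Return {ticker: last ratio minus first ratio} in percentage points."""
    history = defaultdict(list)
    for row in rows:
        history[row["ticker"]].append((row["trade_date"], row["foreign_ownership_pct"]))

    changes = {}
    for ticker, points in history.items():
        points.sort()  # ISO date strings sort chronologically
        changes[ticker] = round(points[-1][1] - points[0][1], 2)
    return changes


if __name__ == "__main__":
    print(ownership_change(SAMPLE_ROWS))
```

&lt;p&gt;A positive value suggests net foreign accumulation over the window and a negative value suggests net selling, though the ratio also moves when the share count changes.&lt;/p&gt;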

&lt;p&gt;&lt;a href="https://apify.com/nexgendata/korea-krx-market-statistics?fpr=2ayu9b" rel="noopener noreferrer"&gt;&lt;strong&gt;▶ Try the Korea KRX Market Stats actor on Apify&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related actors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/kospi-stock-screener?fpr=2ayu9b" rel="noopener noreferrer"&gt;KOSPI Stock Screener&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/korea-dart-opendart-filings?fpr=2ayu9b" rel="noopener noreferrer"&gt;Korea DART OpenDART Filings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/korea-ipo-pipeline-tracker?fpr=2ayu9b" rel="noopener noreferrer"&gt;Korea IPO Pipeline Tracker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is this real-time market data?
&lt;/h3&gt;

&lt;p&gt;No — it's daily reference/statistics data (settlement prices, ownership ratios, volumes), not a live tick feed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I query a single ticker over a date range?
&lt;/h3&gt;

&lt;p&gt;Yes — you can request by market or by an individual ticker across a date range.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does the foreign-ownership ratio represent?
&lt;/h3&gt;

&lt;p&gt;It's the share of a stock held by foreign investors as published in KRX statistics — useful for tracking foreign flow.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>New: Sri Lanka CSE — Colombo Stock Exchange Data — live Colombo Stock Exchange prices, gainers/losers &amp; company announcements</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:15:28 +0000</pubDate>
      <link>https://dev.to/nexgendata/new-sri-lanka-cse-colombo-stock-exchange-data-live-colombo-stock-exchange-prices-5b1g</link>
      <guid>https://dev.to/nexgendata/new-sri-lanka-cse-colombo-stock-exchange-data-live-colombo-stock-exchange-prices-5b1g</guid>
      <description>&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;This actor pulls structured market data from the Colombo Stock Exchange (CSE) in Sri Lanka — today's share prices, the day's top gainers and losers, and company announcements. Instead of scraping the exchange site by hand or stitching together PDFs, you get clean JSON ready to drop into a model, dashboard, or index pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;Frontier- and emerging-market fund managers, index providers, quant researchers, and fintech apps that need reliable Sri Lanka equity coverage where mainstream data vendors are thin or expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample fields / output
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;ticker / symbol&lt;/li&gt;
&lt;li&gt;company name&lt;/li&gt;
&lt;li&gt;last price (LKR)&lt;/li&gt;
&lt;li&gt;change &amp;amp; change %&lt;/li&gt;
&lt;li&gt;trading volume&lt;/li&gt;
&lt;li&gt;turnover&lt;/li&gt;
&lt;li&gt;gainer / loser rank&lt;/li&gt;
&lt;li&gt;announcement title&lt;/li&gt;
&lt;li&gt;announcement date&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Maintain a daily CSE price snapshot for an emerging-markets index or factor model.&lt;/li&gt;
&lt;li&gt;Surface the day's biggest movers on the Colombo exchange for a research desk or newsletter.&lt;/li&gt;
&lt;li&gt;Feed company announcements into an alerting workflow for Sri Lanka-listed names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://apify.com/nexgendata/sri-lanka-cse-market-data?fpr=2ayu9b" rel="noopener noreferrer"&gt;&lt;strong&gt;▶ Try the Sri Lanka CSE — Colombo Stock Exchange Data actor on Apify&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related actors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/nse-india-stock-screener?fpr=2ayu9b" rel="noopener noreferrer"&gt;NSE India Stock Screener&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/bse-india-stock-screener?fpr=2ayu9b" rel="noopener noreferrer"&gt;BSE India Stock Screener&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/nexgendata/sgx-singapore-stock-screener?fpr=2ayu9b" rel="noopener noreferrer"&gt;SGX Singapore Stock Screener&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is the data real-time?
&lt;/h3&gt;

&lt;p&gt;No — it provides end-of-day / reference market data suitable for research, indexing, and screening rather than live trading.&lt;/p&gt;

&lt;h3&gt;
  
  
  What currency are prices in?
&lt;/h3&gt;

&lt;p&gt;Prices and turnover are reported in Sri Lankan rupees (LKR), as published by the Colombo Stock Exchange.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I get the day's top gainers and losers directly?
&lt;/h3&gt;

&lt;p&gt;Yes — gainers and losers are returned with rank, price, and change %, so you don't have to compute them yourself.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Research Pipeline: From Google Scholar Search to Citation Network Analysis</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 15:09:38 +0000</pubDate>
      <link>https://dev.to/nexgendata/building-a-research-pipeline-from-google-scholar-search-to-citation-network-analysis-327g</link>
      <guid>https://dev.to/nexgendata/building-a-research-pipeline-from-google-scholar-search-to-citation-network-analysis-327g</guid>
      <description>&lt;h1&gt;
  
  
  Building a Research Pipeline: From Google Scholar Search to Citation Network Analysis
&lt;/h1&gt;

&lt;p&gt;If you've ever tried to stay current in a fast-moving research field, you know the problem: there's too much being published to read everything, but missing key papers means missing critical context. You end up doing what researchers have always done: manually searching Google Scholar, reading abstracts, following citation trails, and hoping you find the important work before the next breakthrough supersedes it.&lt;/p&gt;

&lt;p&gt;What if you automated that entire workflow? What if you could systematically extract papers, analyze their citation networks, identify the most influential authors and venues, and automatically classify emerging vs. established research?&lt;/p&gt;

&lt;p&gt;That's the power of a research pipeline. Let's build one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Research Pipeline Architecture
&lt;/h2&gt;

&lt;p&gt;A complete research system has five stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Query and Extraction&lt;/strong&gt;&lt;br&gt;
Search for papers matching your research interest. Collect metadata: title, authors, publication date, abstract, citation count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Citation Network Construction&lt;/strong&gt;&lt;br&gt;
Get the full citation details for each paper and extract the references it cites. Build a directed citation graph that records both directions for every paper: the works it cites and the works that cite it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3: Classification&lt;/strong&gt;&lt;br&gt;
Categorize papers by research area, methodology, or stage of maturity (foundational vs. incremental vs. applied).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 4: Network Analysis&lt;/strong&gt;&lt;br&gt;
Identify key papers (high citation in-degree), influential authors (frequently cited across papers), and core venues (the conferences and journals where key work is published).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 5: Trend Detection&lt;/strong&gt;&lt;br&gt;
Compare recent papers vs. older papers. Which topics are accelerating? Which are becoming established? Which are declining?&lt;/p&gt;

&lt;p&gt;Let's work through each stage with practical code.&lt;/p&gt;
&lt;h2&gt;
  
  
  Stage 1: Query and Extraction
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://apify.com/nexgendata/google-scholar-scraper?fpr=2ayu9b" rel="noopener noreferrer"&gt;Google Scholar Scraper&lt;/a&gt; is your foundation. Configure it to search for your research topic and capture all papers meeting your criteria.&lt;/p&gt;

&lt;p&gt;Sample extracted data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"papers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Attention Is All You Need"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"authors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Ashish Vaswani"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Noam Shazeer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Parmar N."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Uszkoreit J."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publication_year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2017&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"venue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NIPS 2017"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"abstract"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"citation_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;84320&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pdf_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://arxiv.org/pdf/1706.03762.pdf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"google_scholar_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scholar.google.com/scholar?q=Attention+Is+All+You+Need"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Language Models are Unsupervised Multitask Learners"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"authors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Tom B. Brown"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Benjamin Mann"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Nick Reiley"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publication_year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"venue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI Blog"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"abstract"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Recent work has demonstrated that transfer learning can greatly improve performance on natural language tasks. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of Internet text..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"citation_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34290&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pdf_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"authors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Jacob Devlin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ming-Wei Chang"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Kenton Lee"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Kristina Toutanova"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publication_year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2018&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"venue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NAACL 2019"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"abstract"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"We introduce BERT, a new method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"citation_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;67450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pdf_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://arxiv.org/pdf/1810.04805.pdf"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"search_query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"transformer language models"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1247&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"results_extracted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"extraction_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-05T10:30:00Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
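
&lt;p&gt;Before moving on, a quick sanity check on the export can save a failed run later. The following is a minimal sketch, assuming the export is an object with a &lt;code&gt;papers&lt;/code&gt; list and a &lt;code&gt;metadata&lt;/code&gt; object shaped like the sample above. The function name &lt;code&gt;check_export&lt;/code&gt; and the list of required fields are illustrative assumptions, not part of the scraper's output contract.&lt;/p&gt;

```python
import json

# Fields assumed to be present on every record (illustrative, not an official schema).
REQUIRED_FIELDS = ("title", "authors", "publication_year", "venue")


def check_export(export):
    """Return a list of human-readable problems found in a parsed export."""
    problems = []
    papers = export.get("papers", [])

    # Flag records that lack any of the assumed required fields.
    for index, paper in enumerate(papers):
        missing = [field for field in REQUIRED_FIELDS if field not in paper]
        if missing:
            problems.append(f"paper {index}: missing {', '.join(missing)}")

    # Compare the reported result count with the number of records present.
    expected = export.get("metadata", {}).get("results_extracted")
    if expected is not None and expected != len(papers):
        problems.append(
            f"metadata reports {expected} results but {len(papers)} papers are present"
        )
    return problems


# Hypothetical sample data, deliberately incomplete to trigger both checks.
sample = json.loads("""
{
  "papers": [
    {"title": "Example Paper", "authors": ["A. Author"],
     "publication_year": 2018, "venue": "Example Venue"},
    {"title": "Incomplete Record", "authors": []}
  ],
  "metadata": {"results_extracted": 3}
}
""")

for problem in check_export(sample):
    print(problem)
```

&lt;p&gt;On this sample the check flags the second record's missing fields and the mismatch between the reported and actual record counts. Against a real export, an empty list means every record carries the fields assumed here.&lt;/p&gt;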



&lt;h2&gt;
  
  
  Stage 2: Citation Network Construction
&lt;/h2&gt;

&lt;p&gt;Here's the critical part. For each paper, you need to extract both what it cites and what cites it. Together these form a directed citation graph: nodes are papers, and an edge from paper A to paper B means A cites B.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CitationNetworkBuilder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# paper_id -&amp;gt; paper_data
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# paper_id -&amp;gt; list of cited paper_ids
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cited_by&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# paper_id -&amp;gt; list of papers citing it
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_paper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Store paper metadata&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_paper_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;publication_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;venue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;venue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citation_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citation_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;abstract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;abstract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_citation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citing_paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cited_paper_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Record that paper A cites paper B&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;citing_paper_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cited_paper_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cited_by&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cited_paper_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citing_paper_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_influential_papers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find papers cited by many others in the network&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;influential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citing_papers&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cited_by&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;in_degree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citing_papers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

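            &lt;span class="c1"&gt;# Skip IDs that were cited but never registered via add_paper
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;in_degree&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;min_citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;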
                &lt;span class="n"&gt;influential&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paper_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;in_degree&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;in_degree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cited_by&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;citing_papers&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# First 5 citers
&lt;/span&gt;                &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;influential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;in_degree&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_influential_authors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find authors whose work is most cited in the network&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;author_citation_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citing_papers&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cited_by&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;authors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citing_papers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# How many papers in our network cite this
&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;authors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;author_citation_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;

        &lt;span class="n"&gt;top_authors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;author_citation_score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;network_citation_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_authors&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_key_venues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Identify conferences/journals where influential papers are published&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;venue_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citing_papers&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cited_by&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;citing_papers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;venue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;venue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;venue_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;venue&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citing_papers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;venue_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_paper_maturity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Classify paper as foundational, core, or emerging&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;paper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;in_degree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cited_by&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;
        &lt;span class="n"&gt;years_published&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Heuristic: old papers with high citations are foundational
&lt;/span&gt;        &lt;span class="c1"&gt;# New papers with any citations are emerging
&lt;/span&gt;        &lt;span class="c1"&gt;# Middle ground are core
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;years_published&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;in_degree&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Foundational&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;years_published&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;in_degree&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Emerging&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;in_degree&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Core&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Peripheral&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_create_paper_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a deterministic ID from the title: lowercased, spaces to underscores, first 50 chars (punctuation is kept)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Usage example
&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CitationNetworkBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Add papers from your Google Scholar export
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;google_scholar_papers.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;papers_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;papers_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;papers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_paper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

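&lt;span class="c1"&gt;# Simulate citation relationships (in reality, you'd extract these from papers)
# This would come from parsing paper PDFs or Google Scholar citation links
# Argument order is (citing, cited). IDs are derived from titles so they match add_paper;
# the titles below are assumed to match the export exactly
&lt;/span&gt;&lt;span class="n"&gt;attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_paper_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Attention Is All You Need&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bert&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_paper_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;gpt2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_paper_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Language Models are Unsupervised Multitask Learners&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_citation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bert&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_citation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpt2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_citation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpt2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bert&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;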

&lt;span class="c1"&gt;# Analyze the network
&lt;/span&gt;&lt;span class="n"&gt;influential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_influential_papers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top_authors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_influential_authors&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;key_venues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_key_venues&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Most Influential Papers in Your Research Area:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;influential&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) - cited &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;in_degree&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; times&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Most Influential Authors:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_authors&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Key Venues:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;venue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;key_venues&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;venue&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Stage 3: Paper Classification
&lt;/h2&gt;

&lt;p&gt;Not every paper matters equally for understanding a field. Some are seminal, foundational work; some are recent applications; others are incremental extensions.&lt;/p&gt;

&lt;p&gt;Classify papers automatically from three signals: keywords in the title and abstract, publication age, and citation count:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
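&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_paper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;network_context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Classify a paper by topic type, maturity, and citation impact (network_context is currently unused)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;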
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;abstract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;abstract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# tolerate a missing or null abstract&lt;/span&gt;
    &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;publication_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;paper_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citation_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Topic classification: the first matching keyword group wins.
&lt;/span&gt;    &lt;span class="c1"&gt;# Join with a space so a term cannot match across the title/abstract boundary.
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;abstract&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;survey&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;overview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;topic_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Survey&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;benchmark&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dataset&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;corpus&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;topic_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Resource&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;implementation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;case study&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;topic_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Application&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;novel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;algorithm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;topic_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;topic_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;General&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# Maturity classification
&lt;/span&gt;    &lt;span class="n"&gt;years_old&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;years_old&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;maturity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Established&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;years_old&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;maturity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Emerging&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;maturity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Established-Recent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# Impact classification (based on citations)
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;impact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Landmark&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;impact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;High-Impact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;impact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Moderate-Impact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;impact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Low-Impact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;topic_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;maturity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;maturity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;impact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;impact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Classify your papers
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;google_scholar_papers.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;papers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;classifications&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_paper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;classifications&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Analyze distribution
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="n"&gt;maturity_dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;maturity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;classifications&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Paper Maturity Distribution:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maturity_dist&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;impact_dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;impact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;classifications&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Paper Impact Distribution:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;impact_dist&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
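
&lt;p&gt;One way to put these labels to work is to turn them into a reading order. The sketch below is an illustration rather than part of the pipeline above: the sample records, the &lt;code&gt;IMPACT_RANK&lt;/code&gt; table, and the &lt;code&gt;reading_order&lt;/code&gt; helper are all hypothetical, and the surveys-first ordering is an assumption you may want to change.&lt;/p&gt;

```python
# Hypothetical records shaped like the entries appended to `classifications` above.
sample_classifications = [
    {"title": "Survey of X", "classification": {
        "topic_type": "Survey", "impact": "High-Impact", "citations": 350}},
    {"title": "Foundational Method", "classification": {
        "topic_type": "Method", "impact": "Landmark", "citations": 4200}},
    {"title": "Recent Application", "classification": {
        "topic_type": "Application", "impact": "Low-Impact", "citations": 5}},
    {"title": "Benchmark Suite", "classification": {
        "topic_type": "Resource", "impact": "High-Impact", "citations": 800}},
    {"title": "Older Survey", "classification": {
        "topic_type": "Survey", "impact": "Landmark", "citations": 1500}},
]

# Lower rank means "read sooner". The tiers mirror the impact labels used above.
IMPACT_RANK = {
    "Landmark": 0,
    "High-Impact": 1,
    "Moderate-Impact": 2,
    "Low-Impact": 3,
}


def reading_order(records):
    """Sort surveys first, then by impact tier, then by citations (descending)."""
    def sort_key(record):
        c = record["classification"]
        survey_first = 0 if c["topic_type"] == "Survey" else 1
        return (survey_first, IMPACT_RANK[c["impact"]], -c["citations"])

    return sorted(records, key=sort_key)


for record in reading_order(sample_classifications):
    c = record["classification"]
    print(f"{record['title']}: {c['topic_type']}, {c['impact']}, {c['citations']} citations")
```

&lt;p&gt;Putting surveys at the top is a judgment call: they tend to give the fastest orientation in an unfamiliar area. If you care more about recency, add the &lt;code&gt;maturity&lt;/code&gt; label to the sort key instead.&lt;/p&gt;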



&lt;h2&gt;
  
  
  Stage 4 &amp;amp; 5: Trend Analysis
&lt;/h2&gt;

&lt;p&gt;With papers classified, the next step is to see how the field is moving. Count papers per year, compute growth between years, and track average citations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
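&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_trends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;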
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Identify what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s accelerating, what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s established, what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s declining&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;by_year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;papers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]})&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;publication_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citation_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;papers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert the running citation totals into per-year averages
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Sort by year
&lt;/span&gt;    &lt;span class="n"&gt;sorted_years&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by_year&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

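    &lt;span class="c1"&gt;# Calculate growth between consecutive entries. This is true year-over-year
&lt;/span&gt;    &lt;span class="c1"&gt;# growth only when no year is missing from the data.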
&lt;/span&gt;    &lt;span class="n"&gt;trends&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sorted_years&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;prev_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prev_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sorted_years&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;curr_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;curr_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sorted_years&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;growth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;curr_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;prev_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
                  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prev_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;trends&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;curr_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;papers_published&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;curr_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yoy_growth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;growth&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations_per_paper&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;curr_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;trends&lt;/span&gt;

&lt;span class="n"&gt;trends&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_trends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research Trend Analysis:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;trend&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;trends&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;  &lt;span class="c1"&gt;# Last 5 years
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trend&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trend&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;papers_published&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; papers &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
          &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(YoY growth: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trend&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yoy_growth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
          &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg citations: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trend&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_citations_per_paper&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Identify emerging topics
&lt;/span&gt;&lt;span class="n"&gt;recent_papers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;publication_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;recent_keywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_keywords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_papers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;older_keywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_keywords&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;publication_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;emerging&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_keywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;older_keywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;declining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;older_keywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_keywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Emerging Topics:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emerging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Declining Topics:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;declining&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Here's your complete workflow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1&lt;/strong&gt;: Set up the Google Scholar Scraper to monitor your research area&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure search queries for your field&lt;/li&gt;
&lt;li&gt;Extract all papers matching your criteria&lt;/li&gt;
&lt;li&gt;Store results as JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Weeks 2-3&lt;/strong&gt;: Build the citation network&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each paper, extract references (manual parsing or use citation APIs)&lt;/li&gt;
&lt;li&gt;Build the citation graph&lt;/li&gt;
&lt;li&gt;Identify influential papers and authors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 4&lt;/strong&gt;: Classify and analyze&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classify papers by type, maturity, impact&lt;/li&gt;
&lt;li&gt;Analyze trends&lt;/li&gt;
&lt;li&gt;Identify emerging topics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ongoing&lt;/strong&gt;: Run weekly or monthly&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Re-run the scraper for new papers&lt;/li&gt;
&lt;li&gt;Update citation counts&lt;/li&gt;
&lt;li&gt;Track trend changes&lt;/li&gt;
&lt;/ul&gt;
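
&lt;p&gt;One way to handle the recurring step is to diff each new scrape against the previous snapshot. The sketch below is illustrative only: the &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;citations&lt;/code&gt; field names and the &lt;code&gt;diff_snapshots&lt;/code&gt; helper are assumptions, not the scraper's documented output, so adjust them to whatever your runs actually return.&lt;/p&gt;

```python
def diff_snapshots(previous, current, key="title", count_field="citations"):
    """Compare two scrape snapshots (lists of dicts).

    The field names are assumed for illustration; map them to the
    real output of your scraper run.
    """
    old_counts = {p[key]: p.get(count_field, 0) for p in previous}
    new_papers = []
    citation_changes = {}
    for paper in current:
        title = paper[key]
        count = paper.get(count_field, 0)
        if title not in old_counts:
            new_papers.append(title)
        elif count != old_counts[title]:
            citation_changes[title] = count - old_counts[title]
    return new_papers, citation_changes


# Synthetic placeholder snapshots, not real scraper output
last_month = [
    {"title": "Paper A", "citations": 10},
    {"title": "Paper B", "citations": 3},
]
this_month = [
    {"title": "Paper A", "citations": 14},
    {"title": "Paper B", "citations": 3},
    {"title": "Paper C", "citations": 0},
]

new_papers, citation_changes = diff_snapshots(last_month, this_month)
print("New papers:", new_papers)
print("Citation changes:", citation_changes)
```

&lt;p&gt;Matching on title is a simplification; a stable identifier such as a DOI, if your data has one, is likely to be more robust.&lt;/p&gt;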

&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;p&gt;Once your pipeline is built, you can:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stay Ahead of Your Field&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set alerts for when a new influential paper is published&lt;/li&gt;
&lt;li&gt;Know when your domain shifts before competitors do&lt;/li&gt;
&lt;li&gt;Identify which authors to follow for emerging trends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Inform Product Development&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product team wants to know what's technically feasible? Check if there's recent published work.&lt;/li&gt;
&lt;li&gt;Are you solving a solved problem? The citation network tells you.&lt;/li&gt;
&lt;li&gt;What's actually novel in your approach? Compare against foundational and recent work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Build Competitive Intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which research venues are competitors focused on?&lt;/li&gt;
&lt;li&gt;What problems is the academic community solving that might become products?&lt;/li&gt;
&lt;li&gt;Which authors are most influential in your space?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Research Direction&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where are the gaps in published work? (Low citation count despite relevance)&lt;/li&gt;
&lt;li&gt;Which methods are becoming standard vs. exploratory?&lt;/li&gt;
&lt;li&gt;What adjacent fields should you be monitoring?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Use the &lt;a href="https://apify.com/nexgendata/google-scholar-scraper?fpr=2ayu9b" rel="noopener noreferrer"&gt;Google Scholar Scraper&lt;/a&gt; to extract papers systematically. Set it up to monitor your research area continuously. Then build the analysis layers on top—citation networks, classifications, trend analysis.&lt;/p&gt;

&lt;p&gt;The advantage of automating this process is that it surfaces patterns that are easy to miss when reading papers one at a time. After running the pipeline for 3-6 months, you should have a more systematic, data-backed view of your research domain than manual tracking typically provides.&lt;/p&gt;

&lt;p&gt;Start with your core research area. Run the pipeline. Let the data patterns guide your next steps.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Scraping China A-Share Stock Data from Eastmoney</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 15:04:54 +0000</pubDate>
      <link>https://dev.to/nexgendata/scraping-china-a-share-stock-data-from-eastmoney-3oaf</link>
      <guid>https://dev.to/nexgendata/scraping-china-a-share-stock-data-from-eastmoney-3oaf</guid>
      <description>&lt;p&gt;If you cover Chinese equities for a living, you already know the data problem. Bloomberg's mainland coverage is exhaustive but priced for the buy-side. Refinitiv and FactSet are similar. Wind and Choice are excellent if you can read Chinese, negotiate a domestic license, and route payment onshore. And the free Western feeds -- Yahoo Finance, Google Finance, Alpha Vantage -- have spotty Shenzhen coverage and almost no Beijing Stock Exchange tickers at all. For a quant team trying to build a multi-factor model across the full A-share universe, that gap is a real research blocker.&lt;/p&gt;

&lt;p&gt;This post walks through the &lt;a href="https://apify.com/nexgendata/eastmoney-china-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=eastmoney-china-stock-screener" rel="noopener noreferrer"&gt;Eastmoney China A-Shares Screener actor&lt;/a&gt; on Apify -- what it pulls, how to wire it into a screening workflow, and how to combine it with adjacent China and Hong Kong data sources to build a coherent EM research stack without paying terminal-class fees.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The problem: China A-share data coverage is broken outside China
&lt;/h2&gt;

&lt;p&gt;The structural issue is simple. Eastmoney (东方财富, &lt;code&gt;eastmoney.com&lt;/code&gt;) is the de facto retail and semi-professional data portal for Chinese equities. It aggregates exchange data from the Shanghai Stock Exchange (SSE), Shenzhen Stock Exchange (SZSE), and the Beijing Stock Exchange (BSE), plus derived analytics, sector classifications, northbound flow snapshots, and consensus estimates. If you ask a Chinese sell-side analyst where they check a quote intraday, the answer is usually Eastmoney or Tonghuashun.&lt;/p&gt;

&lt;p&gt;The catch: Eastmoney's UI is Chinese-only, there is no documented public REST API, and the underlying JSON endpoints rotate, paginate inconsistently, and gzip-encode with non-standard headers. Western vendors either skip the full A-share universe or charge enterprise prices. Bloomberg with mainland entitlements runs $24,000+ per seat-year. Free vendors like Yahoo Finance and Alpha Vantage carry the SSE Composite and a curated slice of large caps but miss the Shenzhen Main Board long tail, almost all of ChiNext, most of the STAR Market, and effectively none of the Beijing Stock Exchange.&lt;/p&gt;

&lt;p&gt;MSCI's progressive A-share inclusion has dragged passive flow into mainland names since 2018, but coverage of the long tail -- Shenzhen mid-caps, ChiNext growth names, STAR semis, and the smaller BSE listings -- remains expensive or unavailable through Western channels. Scraping Eastmoney directly is the pragmatic workaround, and an Apify actor is the cleanest way to do it without maintaining your own headless-browser pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Why this data matters for global allocators
&lt;/h2&gt;

&lt;p&gt;China's A-share market is the world's second-largest equity market by capitalization, behind only the United States. Combined market cap across SSE, SZSE, and BSE sits in the USD 10-12 trillion range depending on the day, with roughly 5,300 listed companies as of mid-2026. For comparison, that is roughly twice the number of companies listed in Hong Kong and well above the listed count of Japan.&lt;/p&gt;

&lt;p&gt;Several investor archetypes need clean A-share data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EM and global allocators&lt;/strong&gt; rebalancing against MSCI EM, MSCI ACWI, and FTSE Global All Cap need security-master and fundamental data for every A-share constituent. MSCI's A-share inclusion factor sits at 20% for large caps; further upweights will mechanically move billions of passive flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stock Connect strategists&lt;/strong&gt; running northbound and southbound flows need eligibility flags and daily quota awareness. Northbound daily quota is RMB 52 billion per channel; southbound RMB 42 billion. Quota utilization is a tracked positioning signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QFII and RQFII funds&lt;/strong&gt; holding A-shares directly need fundamentals normalized against international accounting concepts (IFRS / US GAAP) rather than Chinese Accounting Standards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;China-focused hedge funds&lt;/strong&gt; running long-short books need the full universe, not the curated index slice. Alpha in China concentrates outside the largest 300 names where sell-side coverage is thinnest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quant researchers&lt;/strong&gt; backtesting factor strategies (value, quality, momentum, low-vol) need historical fundamentals across the full listed history, including delisted tickers to avoid survivorship bias.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In every case, missing the bottom half of the cap distribution -- where alpha is most likely to live -- is a non-starter.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. What the Eastmoney actor extracts
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://apify.com/nexgendata/eastmoney-china-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=eastmoney-china-stock-screener" rel="noopener noreferrer"&gt;Eastmoney China A-Shares Screener&lt;/a&gt; returns a normalized JSON record per ticker with both Chinese and English-language fields. Coverage spans all three mainland exchanges (Shanghai, Shenzhen including Main Board and ChiNext, and Beijing) plus the STAR Market segment of Shanghai. The current field set includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ticker&lt;/code&gt; -- six-digit code (e.g. &lt;code&gt;600519&lt;/code&gt; Kweichow Moutai, &lt;code&gt;000858&lt;/code&gt; Wuliangye, &lt;code&gt;300750&lt;/code&gt; CATL, &lt;code&gt;688981&lt;/code&gt; SMIC)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;name_cn&lt;/code&gt; / &lt;code&gt;name_en&lt;/code&gt; -- Chinese and best-effort English company name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;exchange&lt;/code&gt; -- &lt;code&gt;SH&lt;/code&gt; (Shanghai), &lt;code&gt;SZ&lt;/code&gt; (Shenzhen), &lt;code&gt;BJ&lt;/code&gt; (Beijing)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;board&lt;/code&gt; -- &lt;code&gt;Main&lt;/code&gt;, &lt;code&gt;ChiNext&lt;/code&gt; (创业板), &lt;code&gt;STAR&lt;/code&gt; (科创板), &lt;code&gt;BSE&lt;/code&gt; (北交所)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sector&lt;/code&gt; / &lt;code&gt;industry&lt;/code&gt; -- CSRC top-level sector and Eastmoney sub-industry classification&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;market_cap_rmb&lt;/code&gt; -- total market cap in CNY&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;free_float_market_cap_rmb&lt;/code&gt; -- float-adjusted cap, relevant for index-weighted exposure&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pe_ttm&lt;/code&gt;, &lt;code&gt;pb&lt;/code&gt;, &lt;code&gt;ps_ttm&lt;/code&gt; -- valuation ratios&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dividend_yield&lt;/code&gt; -- trailing twelve-month dividend yield&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;roe_ttm&lt;/code&gt;, &lt;code&gt;net_margin_ttm&lt;/code&gt; -- quality factors&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;revenue_growth_yoy&lt;/code&gt;, &lt;code&gt;eps_growth_yoy&lt;/code&gt; -- growth factors&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eps_ttm&lt;/code&gt;, &lt;code&gt;bvps&lt;/code&gt; -- per-share metrics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;price&lt;/code&gt;, &lt;code&gt;change_pct&lt;/code&gt;, &lt;code&gt;volume&lt;/code&gt;, &lt;code&gt;turnover_rmb&lt;/code&gt; -- live quote fields&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;week_52_high&lt;/code&gt;, &lt;code&gt;week_52_low&lt;/code&gt; -- trailing range&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stock_connect_eligible&lt;/code&gt; -- boolean for northbound (HK -&amp;gt; mainland) eligibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A sample output record looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ticker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"600519"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name_cn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"贵州茅台"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name_en"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Kweichow Moutai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"exchange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"board"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Main"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"sector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Consumer Staples"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"industry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Baijiu"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"market_cap_rmb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1932000000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"free_float_market_cap_rmb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1816000000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pe_ttm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;22.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;7.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ps_ttm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;11.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"dividend_yield"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0341&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"roe_ttm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.342&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"net_margin_ttm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.521&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"revenue_growth_yoy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.151&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"eps_growth_yoy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.139&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"eps_ttm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;68.42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"bvps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;197.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1538.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"change_pct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.0042&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"volume"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1820000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"turnover_rmb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2802000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"week_52_high"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1812.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"week_52_low"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1428.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stock_connect_eligible"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
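
&lt;p&gt;A quick way to sanity-check a record is to confirm that the reported ratios roughly agree with the per-share fields. The sketch below uses a trimmed copy of the sample record above. It assumes P/E is price divided by &lt;code&gt;eps_ttm&lt;/code&gt; and P/B is price divided by &lt;code&gt;bvps&lt;/code&gt;; vendor definitions can differ, and the 2% tolerance and the &lt;code&gt;ratio_gap&lt;/code&gt; helper are illustrative choices rather than part of the actor.&lt;/p&gt;

```python
import json

# Trimmed copy of the sample record shown above
raw = """
{"ticker": "600519", "price": 1538.20, "eps_ttm": 68.42,
 "bvps": 197.30, "pe_ttm": 22.4, "pb": 7.8}
"""
record = json.loads(raw)


def ratio_gap(reported, implied):
    """Relative gap between a reported ratio and the one implied by per-share data."""
    return abs(reported - implied) / implied


# Assumed definitions: P/E = price / EPS, P/B = price / book value per share
implied_pe = record["price"] / record["eps_ttm"]
implied_pb = record["price"] / record["bvps"]

pe_gap = ratio_gap(record["pe_ttm"], implied_pe)
pb_gap = ratio_gap(record["pb"], implied_pb)

TOLERANCE = 0.02  # illustrative threshold, not a documented guarantee
consistent = TOLERANCE > max(pe_gap, pb_gap)

print(f"implied P/E {implied_pe:.2f} vs reported {record['pe_ttm']}")
print(f"implied P/B {implied_pb:.2f} vs reported {record['pb']}")
print("consistent:", consistent)
```

&lt;p&gt;Records that fail a check like this are worth inspecting before they reach a factor model, since a stale quote or a restated EPS figure can distort a screen without raising any error.&lt;/p&gt;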



&lt;p&gt;The actor is priced at &lt;strong&gt;$0.10 per stock record&lt;/strong&gt; under Apify's Pay-Per-Event model, effective May 26, 2026. A full A-share universe pull of roughly 5,300 names costs about $530 per run -- a small fraction of any single seat of a Western terminal that covers the same universe, and orders of magnitude below the multi-year contracts demanded by Wind or Choice for offshore institutions.&lt;/p&gt;
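
&lt;p&gt;Because billing is per record, run cost scales linearly with the number of tickers pulled. The arithmetic can be sketched as follows; the 300-name watchlist and the 250 trading days per year are illustrative assumptions, not figures from the actor's documentation.&lt;/p&gt;

```python
PRICE_PER_RECORD_CENTS = 10  # $0.10 per stock record, from the pricing above


def run_cost_usd(record_count):
    """Cost of one actor run in USD, multiplied in integer cents first."""
    return record_count * PRICE_PER_RECORD_CENTS / 100


full_universe = run_cost_usd(5300)  # roughly the full A-share universe
watchlist = run_cost_usd(300)       # hypothetical 300-name watchlist
TRADING_DAYS = 250                  # assumed trading days per year

print(f"One full-universe pull: ${full_universe:,.2f}")
print(f"One watchlist pull: ${watchlist:,.2f}")
print(f"Daily watchlist pulls for a year: ${watchlist * TRADING_DAYS:,.2f}")
```

&lt;p&gt;Running the numbers this way before scheduling anything makes it easier to decide which pulls need the full universe and which can be limited to a ticker list.&lt;/p&gt;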

&lt;h2&gt;
  
  
  4. Example workflow: a deep-value screen across the A-share universe
&lt;/h2&gt;

&lt;p&gt;A canonical use case for this dataset is a classical Graham-style value screen. Most Western screeners can't even express this query against the full A-share universe because half the tickers are missing.&lt;/p&gt;

&lt;p&gt;The recipe:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the Eastmoney actor with no ticker filter to pull the full universe.&lt;/li&gt;
&lt;li&gt;Filter for &lt;code&gt;pe_ttm &amp;lt; 15&lt;/code&gt;, &lt;code&gt;pb &amp;lt; 1.5&lt;/code&gt;, and &lt;code&gt;dividend_yield &amp;gt; 0.03&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Drop names with negative trailing EPS, a missing or non-positive free-float market cap, or a total market cap below RMB 5 billion (a basic liquidity screen).&lt;/li&gt;
&lt;li&gt;Tag each survivor with its board (Main / ChiNext / STAR / BSE) using the &lt;code&gt;board&lt;/code&gt; field, and segment results by board to see where value is concentrated.&lt;/li&gt;
&lt;li&gt;Optionally enrich with the &lt;a href="https://apify.com/nexgendata/china-etf-flow-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-etf-flow-tracker" rel="noopener noreferrer"&gt;China ETF Flow Tracker&lt;/a&gt; to see whether sector ETFs covering survivors are seeing accumulation or redemption -- a useful confirmation signal for sector-level value rotation.&lt;/li&gt;
&lt;li&gt;Cross-reference survivors against the &lt;a href="https://apify.com/nexgendata/china-ashare-insider-trades?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-ashare-insider-trades" rel="noopener noreferrer"&gt;China A-Share Insider Trades&lt;/a&gt; actor to flag names where the executive cohort is buying (高管增持). Insider buys are a high-information signal in the mainland market because mandatory disclosure thresholds are tighter than in many comparable jurisdictions.&lt;/li&gt;
&lt;li&gt;For each survivor with a Hong Kong dual listing, join the H-share equivalent and compute the AH Premium spread -- useful both as a relative-value entry signal and as a hedge mechanism.&lt;/li&gt;
&lt;li&gt;Export the joined dataset to CSV and load into your portfolio tooling -- Portfolio Visualizer, a custom risk model, or a Jupyter notebook running &lt;code&gt;pandas&lt;/code&gt;, &lt;code&gt;numpy&lt;/code&gt;, and &lt;code&gt;statsmodels&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
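
&lt;p&gt;Steps 2 through 4 of the recipe can be sketched in plain Python as below. The field names come from the record format described earlier, but the five records are synthetic placeholders with made-up tickers, and the sketch assumes every field is populated; real output may contain nulls that need handling first.&lt;/p&gt;

```python
from collections import Counter

MAX_PE = 15
MAX_PB = 1.5
MIN_YIELD = 0.03
MIN_CAP_RMB = 5_000_000_000


def passes_value_screen(r):
    """Apply the valuation filters (step 2) and the exclusions (step 3)."""
    return (
        r["eps_ttm"] > 0
        and r["free_float_market_cap_rmb"] > 0
        and r["market_cap_rmb"] >= MIN_CAP_RMB
        and MAX_PE > r["pe_ttm"]
        and MAX_PB > r["pb"]
        and r["dividend_yield"] > MIN_YIELD
    )


def make(ticker, board, eps, float_cap, cap, pe, pb, dy):
    """Build one synthetic record using the actor's field names."""
    return {
        "ticker": ticker, "board": board, "eps_ttm": eps,
        "free_float_market_cap_rmb": float_cap, "market_cap_rmb": cap,
        "pe_ttm": pe, "pb": pb, "dividend_yield": dy,
    }


# Synthetic placeholder universe, not market data
universe = [
    make("TEST-A", "Main", 1.2, 40e9, 60e9, 6.5, 0.7, 0.052),
    make("TEST-B", "ChiNext", 2.0, 30e9, 80e9, 45.0, 6.0, 0.004),
    make("TEST-C", "Main", -0.3, 8e9, 10e9, -12.0, 0.9, 0.040),
    make("TEST-D", "BSE", 0.8, 1e9, 2e9, 9.0, 1.1, 0.035),
    make("TEST-E", "Main", 0.9, 9e9, 12e9, 11.0, 1.2, 0.041),
]

survivors = [r for r in universe if passes_value_screen(r)]
by_board = Counter(r["board"] for r in survivors)  # step 4: segment by board

print("Survivors:", [r["ticker"] for r in survivors])
print("By board:", dict(by_board))
```

&lt;p&gt;The explicit EPS check matters because a negative P/E would otherwise slip under the P/E ceiling. The same filter translates directly to a &lt;code&gt;pandas&lt;/code&gt; boolean mask once the dataset is loaded into a DataFrame.&lt;/p&gt;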

&lt;p&gt;A typical screen against May 2026 data returns 180-220 names. Banks, steel, coal, highways, and selected consumer staples dominate. ChiNext and STAR names are largely absent -- growth-oriented boards rarely trade below book. Invert the screen -- &lt;code&gt;revenue_growth_yoy &amp;gt; 0.25&lt;/code&gt;, &lt;code&gt;roe_ttm &amp;gt; 0.15&lt;/code&gt;, &lt;code&gt;pe_ttm &amp;lt; 30&lt;/code&gt;, restricted to ChiNext and STAR -- and you get a very different list: semi equipment, EV supply chain, biotech, industrial robotics. Both screens take a single actor run plus a few lines of &lt;code&gt;pandas&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Use cases across the EM research stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;China A-share quant research:&lt;/strong&gt; build value, quality, momentum, and low-vol factor portfolios against the full mainland universe with monthly rebalances and CSRC sector neutralization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EM portfolio rebalancing:&lt;/strong&gt; reconcile your benchmark weights when MSCI updates A-share inclusion factors or when index providers reclassify names between the Main, ChiNext, and STAR boards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stock Connect through-channel analysis:&lt;/strong&gt; filter on &lt;code&gt;stock_connect_eligible&lt;/code&gt; to isolate the Hong Kong-accessible subset, then overlay HKEX disclosed northbound positioning to track foreign flow concentration by name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sector rotation backtests:&lt;/strong&gt; group by CSRC sector and build long-only or long-short sector momentum strategies; the &lt;code&gt;industry&lt;/code&gt; field gives finer granularity for sub-sector trades like baijiu within consumer staples or rare-earth processors within materials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QFII and RQFII strategy backtests:&lt;/strong&gt; foreign institutional investors operating under the QFII/RQFII regime need fundamental data normalized for international comparison, particularly around accounting differences between CAS and IFRS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;China-focused factor models:&lt;/strong&gt; construct Fama-French-style factor portfolios (SMB, HML, RMW, CMA) calibrated to A-share data rather than re-using developed-market loadings, which empirically misprice the value premium in China.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-listing arbitrage with H-shares:&lt;/strong&gt; pair the A-share series with the corresponding H-share Hong Kong listing and trade the Hang Seng AH Premium spread mechanically, with the A-share fundamentals as your anchor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentiment overlays:&lt;/strong&gt; combine fundamentals with retail sentiment from the &lt;a href="https://apify.com/nexgendata/china-trends-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-trends-tracker" rel="noopener noreferrer"&gt;China Trends Tracker&lt;/a&gt; (Weibo, Baidu, Douyin) to detect retail-driven momentum on small-cap names before it shows up in price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insider-flow event studies:&lt;/strong&gt; use the insider-trades actor to test whether executive buying predicts forward returns on A-share names, conditioning on board and market-cap quintile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investigative journalism:&lt;/strong&gt; reporters covering Chinese listed companies -- particularly around accounting concerns, related-party transactions, or sanctioned entities -- need a quick way to pull fundamentals and ownership color without a Bloomberg seat.&lt;/li&gt;
&lt;/ul&gt;
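
&lt;p&gt;For the cross-listing use case, a single-name A/H premium can be computed by converting the H-share price into CNY and comparing it with the A-share price. This is a minimal sketch: the &lt;code&gt;ah_premium&lt;/code&gt; helper is hypothetical, the inputs are placeholders rather than market data, and it gives a per-name figure, not the weighted Hang Seng AH Premium Index.&lt;/p&gt;

```python
def ah_premium(a_price_cny, h_price_hkd, hkd_to_cny):
    """Premium of the A-share over its H-share, both expressed in CNY.

    A positive value means the A-share trades rich to the H-share.
    """
    h_price_cny = h_price_hkd * hkd_to_cny
    return a_price_cny / h_price_cny - 1.0


# Placeholder inputs for illustration only, not market data
premium = ah_premium(a_price_cny=12.0, h_price_hkd=10.0, hkd_to_cny=0.92)
print(f"AH premium: {premium:.1%}")
```

&lt;p&gt;In practice the FX rate and both quotes should share a timestamp, since a stale exchange rate can look like a spread move.&lt;/p&gt;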

&lt;h2&gt;
  
  
  6. Run it on Apify
&lt;/h2&gt;

&lt;p&gt;The fastest way to start is to run the actor directly against the Shanghai or Shenzhen ticker space and inspect the output before wiring it into a pipeline. The interface is straightforward: paste a ticker list (or leave blank to pull the full universe), pick the fields you want, hit run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://apify.com/nexgendata/eastmoney-china-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=eastmoney-china-stock-screener" rel="noopener noreferrer"&gt;Run the Eastmoney China A-Shares Screener on Apify-&amp;gt;&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The actor runs on Apify's standard infrastructure -- no proxy setup, no headless-browser maintenance, no Chinese-language UI to navigate. Pull a single ticker for a sanity check, then scale to the full universe. Results land in Apify's dataset format and can export to JSON, CSV, Excel, S3, Google Sheets, BigQuery, Snowflake, or a webhook. Schedule end-of-day for daily snapshots and you have a self-maintaining A-share data archive.&lt;/p&gt;
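
&lt;p&gt;For a scripted run, one option is Apify's synchronous run endpoint, which starts an actor and returns its dataset items in a single HTTP call. The sketch below uses only the standard library. The &lt;code&gt;tickers&lt;/code&gt; input key is a guess at the actor's input schema, so check the actor's input tab before relying on it, and &lt;code&gt;APIFY_TOKEN&lt;/code&gt; is an environment variable name chosen for this example.&lt;/p&gt;

```python
import json
import os
import urllib.request

# Actor ID in the "username~actor-name" form used by the Apify API
ACTOR_ID = "nexgendata~eastmoney-china-stock-screener"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_request(actor_input, token):
    """Prepare a POST that runs the actor and returns its dataset items."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


token = os.environ.get("APIFY_TOKEN", "")
# "tickers" is an assumed input field, not confirmed against the actor's schema
request = build_request({"tickers": ["600519"]}, token)

if token:
    # Only touches the network when a token is configured
    with urllib.request.urlopen(request, timeout=300) as response:
        records = json.load(response)
    print(f"Fetched {len(records)} record(s)")
else:
    print("Set APIFY_TOKEN to send the request to", request.full_url)
```

&lt;p&gt;The synchronous endpoint suits short sanity-check runs. A full-universe pull may run longer than a single HTTP call should wait, in which case starting the run asynchronously and reading the dataset afterwards is the safer pattern.&lt;/p&gt;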

&lt;p&gt;For broader context see our &lt;a href="https://thenextgennexus.com/2026/05/17/free-china-a-share-data-scraping-apify-guide-zh/" rel="noopener noreferrer"&gt;Chinese-language deep dive 免费抓取中国A股数据&lt;/a&gt;, the &lt;a href="https://thenextgennexus.com/2026/05/15/10-best-free-stock-market-apis-2026/" rel="noopener noreferrer"&gt;Best Free Stock Market APIs guide&lt;/a&gt;, and the &lt;a href="https://thenextgennexus.com/2026/05/22/36-real-time-fx-dashboard-apify-google-sheets/" rel="noopener noreferrer"&gt;FX Dashboard with Apify and Google Sheets&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Related actors for an integrated China and Asia stack
&lt;/h2&gt;

&lt;p&gt;The Eastmoney actor is the foundation, but a serious China research workflow usually combines several feeds. The following are all publicly available on Apify under the same publisher and stitch together cleanly via shared ticker keys (mainland six-digit codes for A-shares, four-digit codes for HKEX, exchange-specific codes for KRX and NSE).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/china-etf-flow-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-etf-flow-tracker" rel="noopener noreferrer"&gt;China ETF Flow Tracker (东方财富ETF资金流向)&lt;/a&gt; -- daily ETF subscription and redemption flows. Useful as a sector-level demand signal that can lead price by hours to days, particularly for thematic ETFs covering semis, EV, and biotech.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/china-ashare-insider-trades?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-ashare-insider-trades" rel="noopener noreferrer"&gt;China A-Share Insider Trades (高管增减持)&lt;/a&gt; -- executive and major-shareholder buying and selling disclosed to CSRC. The canonical insider-flow dataset for the mainland market.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/hkex-insider-short-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hkex-insider-short-tracker" rel="noopener noreferrer"&gt;HKEX Insider Trades and Short Interest Tracker&lt;/a&gt; -- pairs naturally with the A-share insider feed when you are trading the A/H cross-listing spread or hedging A-share exposure via HKEX shorts.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/hkex-ipo-calendar?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hkex-ipo-calendar" rel="noopener noreferrer"&gt;HKEX IPO Calendar&lt;/a&gt; -- Hong Kong new listings, including secondary listings of mainland names returning home via H-share dual listings and the increasing flow of US-listed Chinese ADRs converting to Hong Kong primary listings.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/kospi-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=kospi-stock-screener" rel="noopener noreferrer"&gt;KOSPI Stock Screener&lt;/a&gt; -- Korea is the obvious adjacency for a North Asia equity strategy; structure is similar to the Eastmoney actor and Korean semis are often correlated with Chinese supply chain names.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/nse-india-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=nse-india-stock-screener" rel="noopener noreferrer"&gt;NSE India Stock Indices Screener&lt;/a&gt; -- the other big EM Asia market; useful for cross-region factor work and for any allocator running a barbelled EM Asia book.&lt;/li&gt;
&lt;/ul&gt;
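
&lt;p&gt;To make the shared-key idea concrete, here is a minimal sketch of stitching two feeds on a zero-padded code. The venue labels, the &lt;code&gt;code&lt;/code&gt;, &lt;code&gt;pe&lt;/code&gt;, and &lt;code&gt;net_shares&lt;/code&gt; field names, and the sample rows are all made up for illustration; check each actor's real output schema before relying on them.&lt;/p&gt;

```python
# Width each venue's numeric code is padded to, following the text above.
# The venue labels ("CN_A", "HKEX") are invented for this sketch.
CODE_WIDTH = {"CN_A": 6, "HKEX": 4}


def join_key(venue, code):
    """Return a (venue, zero-padded code) tuple usable as a join key."""
    text = str(code).strip()
    width = CODE_WIDTH.get(venue)
    if width is not None and text.isdigit():
        text = text.zfill(width)
    return (venue, text)


def stitch(primary, secondary, venue):
    """Attach fields from a second feed to the first where the keys match."""
    lookup = {join_key(venue, row["code"]): row for row in secondary}
    merged = []
    for row in primary:
        extra = lookup.get(join_key(venue, row["code"]), {})
        # Values from the primary feed win when both feeds share a field.
        merged.append({**extra, **row})
    return merged


# Illustrative rows only: one feed stores the code as text, the other as a number.
screener = [{"code": "000123", "pe": 25.1}]
insider = [{"code": 123, "net_shares": -12000}]
print(stitch(screener, insider, "CN_A"))
```

&lt;p&gt;Padding before joining matters because spreadsheets and some JSON pipelines silently drop leading zeros from numeric-looking codes.&lt;/p&gt;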

&lt;h2&gt;
  
  
  8. FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How fresh is the data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Eastmoney refreshes intraday quotes in near real time during mainland trading hours (09:30-11:30 and 13:00-15:00 China Standard Time, UTC+8). The actor pulls live values at run time. Ratios refresh daily; the underlying financial statements refresh quarterly, in line with CSRC filing deadlines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this include Shenzhen ChiNext and the Beijing Stock Exchange?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. The actor covers all three mainland exchanges: Shanghai (SH), including the STAR Market; Shenzhen (SZ), including the Main Board and ChiNext; and the Beijing Stock Exchange (BSE, exchange code BJ). BSE coverage is the main differentiator versus most Western vendors, which have historically skipped the exchange entirely. BSE launched in 2021 for innovation-oriented SMEs and remains effectively invisible to most non-Chinese feeds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I get historical price data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The current actor focuses on snapshot screener fields. For historical OHLCV, schedule the actor end-of-day and build a time series in object storage or your warehouse. A dedicated historical-price actor with adjusted prices and corporate-action handling is on the roadmap.&lt;/p&gt;
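
&lt;p&gt;One way to build that time series is to append each end-of-day pull to a dated archive. The sketch below uses a local CSV file for simplicity; the helper name and the &lt;code&gt;ticker&lt;/code&gt;, &lt;code&gt;close&lt;/code&gt;, and &lt;code&gt;volume&lt;/code&gt; columns are assumptions, so substitute whichever fields the actor actually returns.&lt;/p&gt;

```python
import csv
import os
from datetime import date


def append_snapshot(path, records, as_of, fields=("ticker", "close", "volume")):
    """Append one day's records to a CSV archive, stamping each row with the date."""
    is_new = not os.path.exists(path)
    columns = ("date",) + tuple(fields)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        # Fields outside the chosen columns are dropped rather than raising.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        if is_new:
            writer.writeheader()
        for record in records:
            writer.writerow({"date": as_of.isoformat(), **record})
    return len(records)
```

&lt;p&gt;Calling &lt;code&gt;append_snapshot("ashare_eod.csv", rows, date.today())&lt;/code&gt; after each scheduled run grows the file by one dated block per day. The same pattern carries over to object storage or a warehouse table partitioned by date.&lt;/p&gt;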

&lt;p&gt;&lt;strong&gt;Are foreign-language names included?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Each record includes &lt;code&gt;name_cn&lt;/code&gt; (official Chinese name) and &lt;code&gt;name_en&lt;/code&gt; (best-effort English name). English names are not always present for the smallest BSE listings, in which case fall back to a transliteration or the company's own annual-report English name.&lt;/p&gt;
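
&lt;p&gt;A small helper can make the fallback explicit. In this sketch &lt;code&gt;name_cn&lt;/code&gt; and &lt;code&gt;name_en&lt;/code&gt; come from the record as described above, while the &lt;code&gt;ticker&lt;/code&gt; key and the &lt;code&gt;overrides&lt;/code&gt; mapping of hand-curated English names are assumptions introduced for the example.&lt;/p&gt;

```python
def display_name(record, overrides=None):
    """Pick an English label, falling back to a curated name, then the Chinese name."""
    overrides = overrides or {}
    name_en = (record.get("name_en") or "").strip()
    if name_en:
        return name_en
    # Curated names (for example from annual reports) are keyed by ticker.
    curated = overrides.get(record.get("ticker"))
    if curated:
        return curated
    return (record.get("name_cn") or "").strip()
```

&lt;p&gt;Returning the Chinese name as the last resort keeps every row labelled, which is usually preferable to an empty cell in a report.&lt;/p&gt;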

&lt;p&gt;&lt;strong&gt;How does pricing compare to Bloomberg or Wind?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
At $0.10 per stock record, a full A-share universe pull (about 5,300 records) is roughly $530. Bloomberg seats covering mainland China run $24,000+ per year. Wind pricing typically lands in the high four to low five figures per year in RMB and requires onshore payment routing. A single full pull therefore costs roughly 45 times less than a Bloomberg seat-year, though the gap narrows the more often you re-pull the full universe.&lt;/p&gt;
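
&lt;p&gt;The arithmetic behind these figures is easy to check. The sketch below takes the $0.10 per-record price and the $24,000 seat figure from the answer above; the 5,300-record universe size is inferred from the $530 total rather than taken from an official listing count.&lt;/p&gt;

```python
PRICE_CENTS_PER_RECORD = 10   # $0.10 per stock record
BLOOMBERG_SEAT_USD = 24_000   # lower bound quoted in the answer above


def pull_cost_usd(records):
    """Cost of one pull in dollars, computed from integer cents."""
    return records * PRICE_CENTS_PER_RECORD / 100


def pulls_within_budget(records, budget_usd):
    """Whole number of pulls of this size that fit inside a dollar budget."""
    return (budget_usd * 100) // (records * PRICE_CENTS_PER_RECORD)


print(pull_cost_usd(5_300))                            # prints 530.0
print(pulls_within_budget(5_300, BLOOMBERG_SEAT_USD))  # prints 45
```

&lt;p&gt;In other words, about 45 full-universe pulls fit inside one seat-year at these prices, so screening a filtered subset on most days and pulling the full universe only occasionally keeps the cost advantage intact.&lt;/p&gt;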

&lt;p&gt;&lt;strong&gt;Can I screen by Shanghai-Hong Kong Stock Connect eligibility?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. The &lt;code&gt;stock_connect_eligible&lt;/code&gt; boolean flags whether a name is northbound-accessible across both Shanghai and Shenzhen Connect channels. Re-pull after each quarterly Stock Connect list revision. Current northbound daily quota is RMB 52 billion per channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the actor handle Chinese character encoding correctly?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
All Chinese fields are returned as UTF-8 strings. Python &lt;code&gt;pandas&lt;/code&gt;, JavaScript, Excel (with UTF-8 BOM), and BigQuery handle the output natively. Use Excel's Data -&amp;gt; From Text/CSV wizard with explicit UTF-8 to avoid character mangling.&lt;/p&gt;
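
&lt;p&gt;If you post-process records yourself before handing a file to Excel users, writing the byte order mark explicitly avoids the import wizard. This is a minimal sketch using Python's standard &lt;code&gt;csv&lt;/code&gt; module and the &lt;code&gt;utf-8-sig&lt;/code&gt; codec; the helper name is hypothetical.&lt;/p&gt;

```python
import csv


def write_excel_friendly_csv(path, records, fields):
    """Write records as UTF-8 with a BOM so Excel detects the encoding on open."""
    # The "utf-8-sig" codec prepends the byte order mark automatically.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
```

&lt;p&gt;Read the file back in Python with the same &lt;code&gt;utf-8-sig&lt;/code&gt; encoding so the mark is not treated as part of the first column name.&lt;/p&gt;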

&lt;p&gt;&lt;strong&gt;Can I use this dataset for production trading?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, with caveats. The dataset is best suited to research, backtesting, screening, and reporting. Where regulatory requirements apply, reconcile it against your prime broker's security master and the official exchange feeds (SSE Datafeed, SZSE Datafeed).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;See also: &lt;a href="https://thenextgennexus.com/2026/06/10/new-dividend-aristocrats-tracker-kings-achievers-and-25-year-dividend-streaks-via-api/" rel="noopener noreferrer"&gt;New -- Dividend Aristocrats Tracker&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;See also: &lt;a href="https://thenextgennexus.com/2026/06/10/new-short-interest-tracker-days-to-cover-and-squeeze-scores-from-finra-data/" rel="noopener noreferrer"&gt;New -- Short Interest Tracker&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;See also: New -- &lt;a href="https://apify.com/nexgendata/kuaishou-trending-tracker?fpr=2ayu9b" rel="noopener noreferrer"&gt;Kuaishou Trending Tracker&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Asian Market Data Scrapers for Public Business Research</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 14:43:52 +0000</pubDate>
      <link>https://dev.to/nexgendata/asian-market-data-scrapers-for-public-business-research-5bf3</link>
      <guid>https://dev.to/nexgendata/asian-market-data-scrapers-for-public-business-research-5bf3</guid>
      <description>&lt;p&gt;Ask any Western fund manager what their best source is for Bombay Stock Exchange equities, Hong Kong insider trades, or Singapore HDB resale prices, and the answer is usually a wince. Bloomberg covers headline indices well, Refinitiv has decent A-share fundamentals if your firm pays the seven-figure subscription, and Crunchbase's APAC startup coverage thins the moment you cross south of Tokyo or west of Mumbai. For everyone else — emerging-markets equity analysts, market-entry consultants, OSINT researchers, regional VC associates — Asia-Pacific public data is a patchwork of native-language portals, government registries, and exchange microsites that no single vendor stitches together affordably.&lt;/p&gt;

&lt;p&gt;This post catalogues the Asian market data scrapers we build at NexGenData. They pull from East Money, the HK Companies Registry, MCA21, ACRA BizFile+, EDINET, the Korea Exchange, IDX, SET, PSE, Bursa Malaysia, HOSE, and the major regional social platforms — turning fragmented APAC sources into clean JSON or CSV. Whether you're tracking northbound Stock Connect flows, building a Vietnam consumer-electronics market map, or watching for SEBI enforcement, the goal is structured, refresh-on-demand data that doesn't require a Mandarin-speaking analyst and a Selenium farm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why APAC public data stays underserved
&lt;/h2&gt;

&lt;p&gt;The reasons Western vendors under-index on Asia are structural. East Money, the de facto retail terminal for mainland Chinese investors, publishes mostly in Simplified Chinese behind aggressive bot mitigation. The HK Companies Registry charges per-document fees and gates structured exports. India's MCA21 requires CAPTCHA solves for bulk lookups. ACRA BizFile+ is priced for one-off lookups, not analyst workflows. EDINET returns XBRL bundles needing parsing, and Korea's DART lives behind a Korean-language UI. Add regional CDNs that block non-APAC IPs and the operational cost of running headless browsers across a dozen jurisdictions, and the math stops working for generalist vendors. The result: APAC equity coverage on the major Western platforms is a mile wide and an inch deep — Nikkei 225, Hang Seng, Sensex, KOSPI 200, and not much else with the same depth as the S&amp;amp;P 500 dataset.&lt;/p&gt;

&lt;p&gt;That gap is where targeted scrapers shine. Each actor below focuses on one source, handles the local-language fields, the anti-bot dance, and the API or HTML quirks specific to that exchange or registry, and returns a normalised payload.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this data matters now
&lt;/h2&gt;

&lt;p&gt;APAC weight in global equity benchmarks keeps climbing, MSCI inclusion factors for China A-shares have ratcheted up, and India's total listed market capitalisation crossed $5T in 2024. Practical use cases driving demand for structured APAC data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;China A-share research&lt;/strong&gt; — northbound flow analysis, A/H premium tracking, CSI 300 vs ChiNext vs STAR Market rotation, SOE insider activity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;India market-entry diligence&lt;/strong&gt; — MCA company master data on local competitors, director-network mapping, SEBI filings, MagicBricks for physical footprint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Singapore property and family-office research&lt;/strong&gt; — URA transactions, HDB resale, MAS-licensed institutions, ACRA UEN counterparty lookups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asian e-commerce and sourcing intel&lt;/strong&gt; — JD.com, Made-in-China, Alibaba B2B catalogues for supplier discovery and category share estimates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional VC sourcing&lt;/strong&gt; — IPO calendars across HKEX, SGX, KOSPI plus the APAC sweep for pre-listing diligence and comp sets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OSINT and forensic research&lt;/strong&gt; — HK Companies Registry shell-tracing, Land Registry asset attribution, SFC enforcement, India MCA director-DIN graphs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M&amp;amp;A targeting&lt;/strong&gt; — public filings, ownership disclosures, insider trade clusters, CNIPA patent grants for IP-adjacent deals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Macro and policy tracking&lt;/strong&gt; — RBI MPC statements, MAS and SFC enforcement, exchange-level IPO pipelines as leading indicators.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's covered, grouped by sub-region
&lt;/h2&gt;

&lt;p&gt;The roster below is organised by geography. Public actors link straight through; specialised regional actors not on the public marketplace are reachable from the &lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;NexGenData Apify catalog&lt;/a&gt; — message us or trigger a custom run.&lt;/p&gt;

&lt;h3&gt;
  
  
  China — financial markets
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/eastmoney-china-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=eastmoney-china-stock-screener" rel="noopener noreferrer"&gt;Eastmoney A-Shares Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;East Money / 东方财富&lt;/td&gt;
&lt;td&gt;Full Shanghai + Shenzhen A-share universe with quotes, PE, PB, market cap, sector tags.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/china-etf-flow-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-etf-flow-tracker" rel="noopener noreferrer"&gt;China ETF Flow Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;East Money&lt;/td&gt;
&lt;td&gt;Daily inflows, outflows, AUM, premium/discount across mainland-listed ETFs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/china-ashare-insider-trades?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-ashare-insider-trades" rel="noopener noreferrer"&gt;China A-Share Insider Trades&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SSE / SZSE disclosures&lt;/td&gt;
&lt;td&gt;Executive 增减持 records — buyer, role, share count, price, transaction window.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;HKEX Hang Seng Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HKEX&lt;/td&gt;
&lt;td&gt;Hang Seng constituents, sector breakdown, fundamentals, A/H pair flagging.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;STAR Market Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SSE STAR / 科创板&lt;/td&gt;
&lt;td&gt;Innovation-board listings with R&amp;amp;D intensity, lockup status, sponsor banks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;ChiNext Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SZSE ChiNext / 创业板&lt;/td&gt;
&lt;td&gt;Growth-board names with fundamentals, IPO date, suspension flags.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Chinese ADRs Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;NYSE / NASDAQ&lt;/td&gt;
&lt;td&gt;US-listed China names with ADR ratio, sector, regulator status, delisting risk.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  China — social and e-commerce signal
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/china-trends-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-trends-tracker" rel="noopener noreferrer"&gt;China Trends Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Weibo + Baidu + Douyin&lt;/td&gt;
&lt;td&gt;Unified daily trending topics across the three biggest discovery surfaces.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/bilibili-video-search?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=bilibili-video-search" rel="noopener noreferrer"&gt;Bilibili Video Search&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Bilibili / B站&lt;/td&gt;
&lt;td&gt;Keyword search results — title, uploader, views, likes, danmaku count.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/rednote-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=rednote-scraper" rel="noopener noreferrer"&gt;RedNote (Xiaohongshu) Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Xiaohongshu / 小红书&lt;/td&gt;
&lt;td&gt;Posts, engagement, hashtags from the dominant Gen-Z product-discovery platform.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Weibo Hot Search Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Weibo / 微博&lt;/td&gt;
&lt;td&gt;Hourly hot-search board snapshots with rank, topic, view count.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Douyin Trending Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Douyin / 抖音&lt;/td&gt;
&lt;td&gt;Trending video board with creator handle, view count, hashtag set.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Zhihu Q&amp;amp;A Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Zhihu / 知乎&lt;/td&gt;
&lt;td&gt;Hot questions and answer engagement — useful for B2B and pro-consumer themes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/made-in-china-b2b-suppliers?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=made-in-china-b2b-suppliers" rel="noopener noreferrer"&gt;Made-in-China B2B Suppliers&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Made-in-China.com&lt;/td&gt;
&lt;td&gt;Supplier directory with category, location, certifications, capacity notes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Alibaba B2B Supplier Finder&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Alibaba.com&lt;/td&gt;
&lt;td&gt;Wholesale supplier intel — Gold member status, transaction volume, MOQ.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Hong Kong &amp;amp; Taiwan
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/hkex-ipo-calendar?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hkex-ipo-calendar" rel="noopener noreferrer"&gt;HKEX IPO Calendar&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HKEX&lt;/td&gt;
&lt;td&gt;Upcoming Hong Kong listings — pricing range, sponsor, lockup terms, debut date.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/hkex-insider-short-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hkex-insider-short-tracker" rel="noopener noreferrer"&gt;HKEX Insider &amp;amp; Short Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HKEX CCASS + disclosures&lt;/td&gt;
&lt;td&gt;Substantial-shareholder filings, director dealings, short-interest aggregates.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/hk-sfc-enforcement-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hk-sfc-enforcement-tracker" rel="noopener noreferrer"&gt;HK SFC Enforcement Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Securities &amp;amp; Futures Commission&lt;/td&gt;
&lt;td&gt;Enforcement notices, sanctions, license suspensions across HK financial sector.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/hk-land-registry-transactions?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hk-land-registry-transactions" rel="noopener noreferrer"&gt;HK Land Registry Transactions&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HK Land Registry&lt;/td&gt;
&lt;td&gt;Property transaction records with consideration, parties, date, address.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;HK Companies Registry&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HK Companies Registry&lt;/td&gt;
&lt;td&gt;CR number, directors, officers, registered address, dissolution status.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;HK Centaline Property Index&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Centaline CCL&lt;/td&gt;
&lt;td&gt;Weekly residential index — citywide and by sub-market for HK property cycle.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;HK Trademark Search&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HK IP Department&lt;/td&gt;
&lt;td&gt;Trademark registry lookups — applicant, class, status, conflict checks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Taiwan TWSE Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Taiwan Stock Exchange&lt;/td&gt;
&lt;td&gt;TWSE-listed universe with fundamentals, dividend yield, foreign holdings.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  India
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/nse-india-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=nse-india-stock-screener" rel="noopener noreferrer"&gt;NSE India Indices Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;National Stock Exchange&lt;/td&gt;
&lt;td&gt;Nifty 50, Nifty Next 50, sector indices — constituents and live fundamentals.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;BSE India Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Bombay Stock Exchange&lt;/td&gt;
&lt;td&gt;Sensex and full BSE universe with quotes, fundamentals, group classification.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/ogd-india-companies-registry?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=ogd-india-companies-registry" rel="noopener noreferrer"&gt;OGD India Companies Lookup&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;data.gov.in MCA master&lt;/td&gt;
&lt;td&gt;Company master records — CIN, status, category, paid-up capital, address.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;India MCA Companies&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MCA21&lt;/td&gt;
&lt;td&gt;CIN-keyed director lookups, charges, recent filings, signatory history.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;India MCA INC-22 / INC-32 Filings&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MCA21&lt;/td&gt;
&lt;td&gt;Registered-office and incorporation filings — useful for fresh-funded leads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/india-sebi-filings-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=india-sebi-filings-tracker" rel="noopener noreferrer"&gt;India SEBI Filings Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SEBI&lt;/td&gt;
&lt;td&gt;Listed-company filings, takeover-code disclosures, enforcement orders.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/india-rbi-monetary-policy-statements?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=india-rbi-monetary-policy-statements" rel="noopener noreferrer"&gt;India RBI Monetary Policy&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Reserve Bank of India&lt;/td&gt;
&lt;td&gt;MPC statements, repo decisions, governor speeches with date and text payload.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;India MagicBricks Real Estate&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MagicBricks&lt;/td&gt;
&lt;td&gt;Listings — price, sqft, locality, builder, possession date across metros.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Singapore
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Singapore ACRA Companies&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;ACRA BizFile+&lt;/td&gt;
&lt;td&gt;UEN lookup, directors, registered address, entity type, status.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;SG HDB Resale Prices&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;data.gov.sg HDB&lt;/td&gt;
&lt;td&gt;Flat resale transactions — block, sqm, lease, price, town, transaction month.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;SG URA Property Transactions&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;URA REALIS&lt;/td&gt;
&lt;td&gt;Private property caveats — project, type, area, price, tenure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;SG MAS Financial Institutions&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MAS register&lt;/td&gt;
&lt;td&gt;Licensed banks, capital-markets services holders, insurance, payment firms.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;SG Rental Market Tracker&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HDB + URA rental&lt;/td&gt;
&lt;td&gt;HDB and private rental contracts with median rent by district and unit type.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;SG SGX Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SGX&lt;/td&gt;
&lt;td&gt;STI constituents, sector splits, REIT yield, mainboard vs Catalist.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/singapore-mas-enforcement?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=singapore-mas-enforcement" rel="noopener noreferrer"&gt;SG MAS Enforcement&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MAS enforcement&lt;/td&gt;
&lt;td&gt;Fines, prohibition orders, and regulator notices with date and party.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/singapore-mycareersfuture-jobs?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=singapore-mycareersfuture-jobs" rel="noopener noreferrer"&gt;SG MyCareersFuture Jobs&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MyCareersFuture.gov.sg&lt;/td&gt;
&lt;td&gt;Job postings — employer, title, salary band, EP eligibility flag.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Japan &amp;amp; Korea
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;TSE Japan Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Tokyo Stock Exchange&lt;/td&gt;
&lt;td&gt;Nikkei 225 + Prime constituents, fundamentals, foreign ownership ratio.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/japan-edinet-insider-filings?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=japan-edinet-insider-filings" rel="noopener noreferrer"&gt;Japan EDINET Insider Filings&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;EDINET&lt;/td&gt;
&lt;td&gt;Insider trading disclosures, large-shareholder reports, parsed XBRL fields.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/kospi-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=kospi-stock-screener" rel="noopener noreferrer"&gt;KOSPI Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Korea Exchange&lt;/td&gt;
&lt;td&gt;KOSPI listings — market cap, quotes, foreign holding %, sector classification.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Southeast Asia
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;HOSE Vietnam Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Ho Chi Minh Exchange&lt;/td&gt;
&lt;td&gt;VN30 + full HOSE universe with quotes, foreign room, sector tags.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;IDX Indonesia Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Indonesia Stock Exchange&lt;/td&gt;
&lt;td&gt;LQ45 and full IDX list, fundamentals, ownership data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;SET Thailand Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Stock Exchange of Thailand&lt;/td&gt;
&lt;td&gt;SET50 constituents, ETF list, dividend yield, sector breakdown.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;PSE Philippines Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Philippine Stock Exchange&lt;/td&gt;
&lt;td&gt;PSEi constituents, fundamentals, board lots, sector classification.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;Bursa Malaysia Stock Screener&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Bursa Malaysia&lt;/td&gt;
&lt;td&gt;KLCI universe — quotes, syariah flag, sector splits, dividends.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/nexgendata/apac-ipo-calendar-sweep?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=apac-ipo-calendar-sweep" rel="noopener noreferrer"&gt;APAC IPO Calendar Sweep&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;HKEX + SGX + KOSPI + others&lt;/td&gt;
&lt;td&gt;Pan-Asia upcoming listings consolidated in one table with venue, sponsor, date.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Example workflow: building an India market-entry brief
&lt;/h2&gt;

&lt;p&gt;Suppose you're a strategy consultant scoping India entry for a European industrial-automation client. Crunchbase is thin past the funded startups, LinkedIn is noisy, and the official MCA portal won't let you bulk-query. Here's a 90-minute workflow that produces a defensible diligence pack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — define the competitor set.&lt;/strong&gt; Run the &lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;BSE India Stock Screener&lt;/a&gt; filtered for Capital Goods / Industrial Manufacturing. Cross-reference with the &lt;a href="https://apify.com/nexgendata/nse-india-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=nse-india-stock-screener" rel="noopener noreferrer"&gt;NSE India Indices Screener&lt;/a&gt;. You now have a ranked universe of listed competitors with market cap, sector, and ticker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — pull company master data.&lt;/strong&gt; For every listed competitor plus unlisted private rivals, run the &lt;a href="https://apify.com/nexgendata/ogd-india-companies-registry?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=ogd-india-companies-registry" rel="noopener noreferrer"&gt;OGD India Companies Lookup&lt;/a&gt; or &lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;India MCA Companies&lt;/a&gt; to get CIN, registered address, paid-up capital, directors, charges, and last filing date. This master data is the spine of the diligence pack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — overlay regulatory signal.&lt;/strong&gt; Pipe the same CIN list through the &lt;a href="https://apify.com/nexgendata/india-sebi-filings-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=india-sebi-filings-tracker" rel="noopener noreferrer"&gt;India SEBI Filings Tracker&lt;/a&gt; to flag takeover-code disclosures, insider trades, and enforcement orders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 — physical-footprint check.&lt;/strong&gt; Use the &lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;India MagicBricks Real Estate&lt;/a&gt; actor to sweep commercial listings in competitor HQ cities — useful for facility-size and rental benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — macro overlay.&lt;/strong&gt; Append the latest &lt;a href="https://apify.com/nexgendata/india-rbi-monetary-policy-statements?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=india-rbi-monetary-policy-statements" rel="noopener noreferrer"&gt;RBI Monetary Policy&lt;/a&gt; statement for rate-environment context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6 — export.&lt;/strong&gt; Every actor returns CSV / JSON. Drop outputs into BigQuery, Snowflake, or a single workbook. The pipeline runs unattended on a weekly schedule, so the brief stays current through the deal cycle.&lt;/p&gt;
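
&lt;p&gt;As a rough illustration of how the outputs from Steps 1 to 3 might be stitched together, the sketch below joins screener rows to registry master data on CIN and attaches any regulatory flags. The field names (&lt;code&gt;cin&lt;/code&gt;, &lt;code&gt;market_cap_cr&lt;/code&gt;, &lt;code&gt;last_filing&lt;/code&gt;) and the sample rows are assumptions made for illustration, not the actors' documented schemas.&lt;/p&gt;

```python
# Hypothetical rows; real actor outputs will have their own field names.
screener_rows = [
    {"ticker": "ABCIND", "cin": "L00000MH2001PLC000001", "market_cap_cr": 5200},
    {"ticker": "XYZAUTO", "cin": "L00000KA1998PLC000002", "market_cap_cr": 1800},
]
registry_rows = [
    {"cin": "L00000MH2001PLC000001", "paid_up_capital": 250000000, "last_filing": "2026-03-31"},
]
sebi_hits = [
    {"cin": "L00000KA1998PLC000002", "type": "enforcement_order"},
]


def build_brief(screener, registry, filings):
    """Left-join the listed universe to registry data and regulatory hits, keyed on CIN."""
    registry_by_cin = {row["cin"]: row for row in registry}
    hits_by_cin = {}
    for hit in filings:
        hits_by_cin.setdefault(hit["cin"], []).append(hit["type"])
    brief = []
    for row in screener:
        merged = dict(row)
        # Missing registry data is kept as None so gaps stay visible in the pack.
        merged["registry"] = registry_by_cin.get(row["cin"])
        merged["sebi_flags"] = hits_by_cin.get(row["cin"], [])
        brief.append(merged)
    return brief


brief = build_brief(screener_rows, registry_rows, sebi_hits)
```

&lt;p&gt;Keeping unmatched companies in the output with an empty registry slot, rather than dropping them, makes coverage gaps obvious when the brief is reviewed.&lt;/p&gt;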

&lt;p&gt;The same shape works for a China A-share thesis (East Money + A-share insider trades + Trends tracker), a Singapore property pitch (URA + HDB + Centaline), or a Vietnam consumer scan (HOSE + B2B suppliers + social signal).&lt;/p&gt;

&lt;h2&gt;
  
  
  Who uses these
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EM equity analysts&lt;/strong&gt; building China A-share, India, or ASEAN coverage without a Bloomberg APAC seat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fund managers&lt;/strong&gt; tracking northbound Stock Connect flows, A/H premium, and KOSPI foreign ownership shifts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market-entry consultants&lt;/strong&gt; producing diligence packs for European or US clients scoping APAC expansion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Family offices and wealth platforms&lt;/strong&gt; in Singapore and Hong Kong needing licensed-counterparty checks via MAS and SFC registers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VC associates&lt;/strong&gt; sourcing pre-IPO and recently-listed APAC names from HKEX, SGX, KOSPI calendars and the consolidated APAC IPO sweep.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OSINT and due-diligence firms&lt;/strong&gt; running shell-company traces through HK Companies Registry, Land Registry, and India MCA director graphs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trade-finance and supply-chain teams&lt;/strong&gt; validating Chinese suppliers through Made-in-China, Alibaba B2B, and CNIPA patent records.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Journalists covering APAC business&lt;/strong&gt; needing primary-source structured data instead of vendor screenshots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recruiters and talent intelligence&lt;/strong&gt; mapping the Singapore tech market via MyCareersFuture and EP-flagged listings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Macro and policy desks&lt;/strong&gt; tracking RBI MPC decisions, MAS and SFC enforcement, and regional IPO pipelines as leading indicators.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Start here
&lt;/h2&gt;

&lt;p&gt;Browse the full set of Asian market actors on the &lt;a href="https://apify.com/nexgendata?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=storefront" rel="noopener noreferrer"&gt;NexGenData Apify catalog&lt;/a&gt;, or jump straight into the workhorse — the &lt;a href="https://apify.com/nexgendata/eastmoney-china-stock-screener?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=eastmoney-china-stock-screener" rel="noopener noreferrer"&gt;Eastmoney A-Shares Screener&lt;/a&gt; — to see the data shape before committing to a workflow. Most actors run on per-event pricing, so you can pull a few hundred records to validate fit before scaling up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related actors worth a look
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/japan-edinet-insider-filings?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=japan-edinet-insider-filings" rel="noopener noreferrer"&gt;Japan EDINET Insider Filings&lt;/a&gt; — for Japan-focused activist or M&amp;amp;A workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/hk-land-registry-transactions?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=hk-land-registry-transactions" rel="noopener noreferrer"&gt;HK Land Registry Transactions&lt;/a&gt; — asset attribution and HK property research.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/rednote-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=rednote-scraper" rel="noopener noreferrer"&gt;RedNote (Xiaohongshu) Scraper&lt;/a&gt; — Gen-Z product-discovery signal for consumer brands targeting mainland China.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/apac-ipo-calendar-sweep?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=apac-ipo-calendar-sweep" rel="noopener noreferrer"&gt;APAC IPO Calendar Sweep&lt;/a&gt; — single pan-regional IPO feed for syndicate desks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/made-in-china-b2b-suppliers?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=made-in-china-b2b-suppliers" rel="noopener noreferrer"&gt;Made-in-China B2B Suppliers&lt;/a&gt; — sourcing and supplier-risk research.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/china-trends-tracker?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=china-trends-tracker" rel="noopener noreferrer"&gt;China Trends Tracker&lt;/a&gt; — combined Weibo / Baidu / Douyin signal for consumer thesis work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Related reading: &lt;a href="https://thenextgennexus.com/2026/05/17/free-china-a-share-data-scraping-apify-guide-zh/" rel="noopener noreferrer"&gt;Free A-Share data scraping (Chinese-language guide)&lt;/a&gt; and &lt;a href="https://thenextgennexus.com/2026/05/22/36-real-time-fx-dashboard-apify-google-sheets/" rel="noopener noreferrer"&gt;Building a real-time FX dashboard&lt;/a&gt; for the currency overlay any APAC workflow needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why do Western data vendors miss so much APAC data?
&lt;/h3&gt;

&lt;p&gt;Three structural reasons. Language — most APAC primary sources publish in CJK, Bahasa, Thai, or Vietnamese, and parsing requires per-source engineering. Anti-bot — mainland and Indian government sites block non-local IPs and rotate CAPTCHA. Economics — bespoke pipelines for ACRA, MCA21, EDINET, DART, IDX don't pencil out for generalist vendors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do these handle Chinese, Japanese, and Korean characters cleanly?
&lt;/h3&gt;

&lt;p&gt;Yes. Every actor that targets a CJK source emits UTF-8 with the original character set preserved alongside any Romanised or English fields the source provides. East Money, EDINET, KOSPI, and HKEX actors have been hardened against the usual encoding pitfalls (mojibake, half-width vs full-width punctuation, traditional vs simplified mixing).&lt;/p&gt;
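
&lt;p&gt;If you want a consumer-side safety net, one option is to apply Unicode NFKC normalisation before matching or deduplicating names. The minimal sketch below uses Python's standard &lt;code&gt;unicodedata&lt;/code&gt; module. It is not a description of how the actors work internally, and it does not convert between traditional and simplified characters.&lt;/p&gt;

```python
import unicodedata


def normalise_width(text):
    """Fold full-width Latin letters, digits, and punctuation to their half-width forms."""
    # NFKC compatibility normalisation maps e.g. U+FF21 (full-width A) to "A".
    # CJK ideographs such as U+682A are left unchanged.
    return unicodedata.normalize("NFKC", text)


sample = "ＡＢＣ１２３（株）"
cleaned = normalise_width(sample)
```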

&lt;h3&gt;
  
  
  Can I monitor multiple regions in one workflow?
&lt;/h3&gt;

&lt;p&gt;Yes — the most common pattern is a scheduled Apify task that fires three or four region-specific actors in sequence and writes a combined output to a single dataset, Google Sheet, or webhook. The APAC IPO Calendar Sweep is the in-house example of that pattern, pulling HKEX, SGX, KOSPI, and adjacent venues into one stream.&lt;/p&gt;
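
&lt;p&gt;A minimal sketch of that combining step, assuming each region-specific run has already been loaded as a list of rows. The venue names come from the text above; the issuers, dates, and field names are made up for illustration.&lt;/p&gt;

```python
def combine_regions(outputs_by_venue):
    """Flatten per-venue result lists into one stream, tagging each row with its venue."""
    combined = []
    for venue, rows in outputs_by_venue.items():
        for row in rows:
            combined.append({"venue": venue, **row})
    # ISO dates sort correctly as strings, so the earliest listing comes first.
    return sorted(combined, key=lambda row: (row["date"], row["venue"]))


# Hypothetical outputs from three separate region-specific runs.
runs = {
    "HKEX": [{"issuer": "Alpha Ltd", "date": "2026-07-10"}],
    "SGX": [{"issuer": "Beta Pte", "date": "2026-07-02"}],
    "KOSPI": [{"issuer": "Gamma Corp", "date": "2026-07-10"}],
}
calendar = combine_regions(runs)
```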

&lt;h3&gt;
  
  
  How fresh is the China A-share data?
&lt;/h3&gt;

&lt;p&gt;The Eastmoney A-Shares Screener refreshes within seconds of East Money's own page render — effectively real-time during mainland trading hours, with end-of-day snapshots persisted to your dataset. ETF flows and insider trades follow the underlying exchange disclosure cadence (daily for most fields, intra-day for flow tracking).&lt;/p&gt;

&lt;h3&gt;
  
  
  Are these compliant with PRC and APAC data laws?
&lt;/h3&gt;

&lt;p&gt;The actors target only publicly disclosed data (exchange filings, government registries, public social platforms) and do not bypass login walls or paywalls. PRC PIPL, India DPDPA, Singapore PDPA, and similar regimes principally regulate personal data; structured corporate, listing, and market data is generally outside those scopes. That said, downstream use is your responsibility: if you're republishing or selling derived datasets, get local counsel involved.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you handle anti-bot defences on Chinese sites?
&lt;/h3&gt;

&lt;p&gt;A mix of rotating residential proxy pools sized for each region, fingerprint randomisation, request pacing tuned per target, and per-actor fall-through logic that retries with a different network path on classification failure. The infrastructure is shared across actors, so reliability improvements on one Chinese-source actor lift the rest.&lt;/p&gt;

&lt;h3&gt;
  
  
  What if I need a custom field, region, or source not listed?
&lt;/h3&gt;

&lt;p&gt;Most actors expose configuration that already covers common variations (ticker lists, date ranges, region filters). If you need a genuinely new source — say a specific provincial registry or a niche exchange — message via the Apify console or the NexGenData catalog page and we'll spec it. Many of the listed actors started life as custom builds for a single customer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I export to BI tools or warehouses?
&lt;/h3&gt;

&lt;p&gt;Yes — every Apify run writes to a dataset you can pull as CSV, JSON, JSONL, or Excel, or push via webhook into BigQuery, Snowflake, Postgres, Google Sheets, Airtable, or any system that accepts a POST. The dashboard guides on this site walk through the FX and sentiment-pipeline patterns that work just as well for APAC data.&lt;/p&gt;
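
&lt;p&gt;For pulling a dataset directly, a small helper like the one below can build the download URL. It assumes the Apify API's dataset items endpoint with its &lt;code&gt;format&lt;/code&gt; and &lt;code&gt;limit&lt;/code&gt; query parameters. The dataset ID is a placeholder, and private datasets will also need an API token, so check the current Apify documentation before relying on it.&lt;/p&gt;

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ALLOWED_FORMATS = {"csv", "json", "jsonl", "xlsx"}


def dataset_items_url(dataset_id, fmt="csv", limit=None):
    """Build the download URL for a dataset's items in the requested format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("unsupported format: " + fmt)
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return API_BASE + "/datasets/" + dataset_id + "/items?" + urlencode(params)


# "MY_DATASET_ID" is a placeholder; use the ID shown for your own run.
url = dataset_items_url("MY_DATASET_ID", fmt="jsonl", limit=200)
```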

&lt;p&gt;See also: the newly added &lt;a href="https://apify.com/nexgendata/pse-edge-disclosures?fpr=2ayu9b" rel="noopener noreferrer"&gt;PSE Edge Disclosures&lt;/a&gt; actor.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>finance</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>Company Registry Data Tools for Business Intelligence</title>
      <dc:creator>NexGenData</dc:creator>
      <pubDate>Thu, 25 Jun 2026 07:04:56 +0000</pubDate>
      <link>https://dev.to/nexgendata/company-registry-data-tools-for-business-intelligence-3p2</link>
      <guid>https://dev.to/nexgendata/company-registry-data-tools-for-business-intelligence-3p2</guid>
      <description>&lt;p&gt;&lt;strong&gt;Company registry data is the backbone of every credible KYC, KYB, and M&amp;amp;A workflow, yet it lives in 200+ fragmented government portals, each with its own login flow, CAPTCHA, and download cap.&lt;/strong&gt; This guide walks BD teams, due-diligence analysts, sanctions investigators, and corporate-credit underwriters through the structured company-registry data tools we publish on Apify, which jurisdictions they cover, and how to assemble them into a production-grade entity-resolution pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Problem: Company Registries Are Fragmented Across 200+ Jurisdictions
&lt;/h2&gt;

&lt;p&gt;Every country runs its own corporate registry. The UK has Companies House. France has Pappers (a private aggregator over INPI/RCS). India has the Ministry of Corporate Affairs (MCA). Singapore has ACRA's BizFile+. Hong Kong has the Companies Registry's ICRIS portal. Australia has ASIC Connect. The United States has a fifty-state patchwork of Secretary-of-State filings, plus Delaware as the de facto incorporation hub. Cayman, BVI, Jersey, and Guernsey each run their own opaque, often-paywalled systems.&lt;/p&gt;

&lt;p&gt;For a KYB analyst trying to onboard a single multinational supplier, this means logging into six portals, solving three CAPTCHAs, paying a £3 fee for one PDF in Hong Kong, and copy-pasting director names into a spreadsheet because nothing exports cleanly. For a sanctions investigator mapping the ultimate beneficial owner (UBO) of a shell-company chain across three jurisdictions, it means days of manual cross-referencing. For a sales operations team trying to enrich 50,000 inbound leads with corporate registration numbers (CRN, CIN, UEN, ABN), it means the project never ships.&lt;/p&gt;

&lt;p&gt;The fragmentation isn't going away. Even where APIs exist (UK Companies House publishes one of the best), rate limits, schema drift, and bulk-download caps make them unworkable for production. The result: KYC/KYB workflows that can't scale, EDD that takes weeks instead of hours, and procurement teams approving vendors with no UBO visibility because lookup cost is too high. The fix is a thin, consistent layer of registry scrapers — one per jurisdiction, normalized, billed per result, runnable from CLI, n8n, Zapier, or any HTTP client.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Why This Data Matters: KYC, KYB, BD, M&amp;amp;A, Sanctions, Credit, OSINT
&lt;/h2&gt;

&lt;p&gt;Structured company registry data sits underneath nearly every regulated and unregulated B2B workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;KYC onboarding&lt;/strong&gt; — verify legal entity name, registration number, registered address, and incorporation date before opening a corporate account. A bank or fintech that skips this fails its AML program audit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KYB vendor due diligence&lt;/strong&gt; — procurement teams confirm a supplier is a real, active company with disclosed directors before signing a master services agreement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BD account enrichment&lt;/strong&gt; — sales teams append CRN, industry SIC codes, employee count, and director names to inbound leads to route them correctly and personalize outbound.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M&amp;amp;A target screening&lt;/strong&gt; — corporate development teams build target lists by filtering registries on jurisdiction, SIC code, incorporation year, share-capital range, and director overlap with existing portfolio companies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sanctions enrichment and UBO mapping&lt;/strong&gt; — investigators chain registry data with sanctions watchlists to surface entities whose beneficial owners appear on OFAC SDN, UK HMT, EU Consolidated, or UN sanctions lists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credit underwriting&lt;/strong&gt; — trade credit insurers and B2B lenders pull filed accounts, charges, and director histories to score default risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OSINT and investigative journalism&lt;/strong&gt; — reporters trace shell-company networks across the UK, France, Delaware, and offshore jurisdictions to expose money laundering, tax evasion, or political corruption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory horizon scanning&lt;/strong&gt; — compliance teams monitor enforcement registers (FCA, ASIC, MAS, SFC) for early warning that a counterparty or competitor is under investigation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread: every one of these workflows needs the data in JSON, in bulk, on a schedule, with predictable latency and cost. None of them are well-served by manual portal lookups.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. What the Actors Extract — Registry Coverage Matrix
&lt;/h2&gt;

&lt;p&gt;Below is the coverage map across the public NexGenData actor fleet. Identifiers are listed in the format each registry actually uses, because no two jurisdictions agree on what to call a company number.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Country&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Identifier&lt;/th&gt;
&lt;th&gt;Key fields extracted&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;United Kingdom&lt;/td&gt;
&lt;td&gt;Companies House&lt;/td&gt;
&lt;td&gt;CRN (8-character)&lt;/td&gt;
&lt;td&gt;Officers, appointments, dates of birth (partial), nationality, occupation, resigned/active status&lt;/td&gt;
&lt;td&gt;KYB, director-overlap mapping, EDD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;United Kingdom&lt;/td&gt;
&lt;td&gt;Companies House PSC register&lt;/td&gt;
&lt;td&gt;CRN&lt;/td&gt;
&lt;td&gt;People with Significant Control (PSC), nature of control, share %, voting %, corporate PSCs&lt;/td&gt;
&lt;td&gt;UBO mapping, sanctions enrichment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;France&lt;/td&gt;
&lt;td&gt;Pappers (RCS / INPI)&lt;/td&gt;
&lt;td&gt;SIREN (9-digit) / SIRET (14)&lt;/td&gt;
&lt;td&gt;Officers (dirigeants), share capital, NAF/APE code, registered address, RCS filings&lt;/td&gt;
&lt;td&gt;French KYB, M&amp;amp;A screening in EU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;India&lt;/td&gt;
&lt;td&gt;MCA / OGD&lt;/td&gt;
&lt;td&gt;CIN (21-char)&lt;/td&gt;
&lt;td&gt;Master data, registered address, paid-up capital, ROC, listing status, directors (DIN), date of incorporation&lt;/td&gt;
&lt;td&gt;India KYB, supplier vetting, group-structure mapping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;India&lt;/td&gt;
&lt;td&gt;MCA filings (INC-22/32)&lt;/td&gt;
&lt;td&gt;CIN / SRN&lt;/td&gt;
&lt;td&gt;Registered-office changes (INC-22), director appointments/changes (INC-32/DIR-12), filing dates&lt;/td&gt;
&lt;td&gt;Change detection, EDD trigger events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Singapore&lt;/td&gt;
&lt;td&gt;ACRA BizFile+&lt;/td&gt;
&lt;td&gt;UEN (9-10 char)&lt;/td&gt;
&lt;td&gt;Entity name, status, address, directors, shareholders, business activities (SSIC), paid-up capital&lt;/td&gt;
&lt;td&gt;Singapore KYB, regional HQ verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hong Kong&lt;/td&gt;
&lt;td&gt;Companies Registry (ICRIS)&lt;/td&gt;
&lt;td&gt;CR number (7-digit)&lt;/td&gt;
&lt;td&gt;Name, status, directors, secretary, registered office, charges, annual return dates&lt;/td&gt;
&lt;td&gt;HK KYB, offshore-structure investigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Australia&lt;/td&gt;
&lt;td&gt;ASIC Connect&lt;/td&gt;
&lt;td&gt;ACN (9-digit) / ABN (11)&lt;/td&gt;
&lt;td&gt;Entity status, registration date, type, jurisdiction, address, EX/AX flags&lt;/td&gt;
&lt;td&gt;AU KYB, ASX-adjacent due diligence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;United States&lt;/td&gt;
&lt;td&gt;Multi-state Secretary of State&lt;/td&gt;
&lt;td&gt;State filing number / EIN&lt;/td&gt;
&lt;td&gt;Entity name, status, registered agent, formation date, jurisdiction, principal address&lt;/td&gt;
&lt;td&gt;US KYB across all 50 states&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;United States&lt;/td&gt;
&lt;td&gt;Delaware Division of Corporations&lt;/td&gt;
&lt;td&gt;File number&lt;/td&gt;
&lt;td&gt;Entity name, file number, incorporation date, status, registered agent&lt;/td&gt;
&lt;td&gt;Delaware-domiciled entity verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;China&lt;/td&gt;
&lt;td&gt;CNIPA&lt;/td&gt;
&lt;td&gt;Patent number / applicant name&lt;/td&gt;
&lt;td&gt;Applicant entity, address, IPC classification, grant/publication dates&lt;/td&gt;
&lt;td&gt;China entity discovery via IP filings (adjacent)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Output schemas are normalized across actors where possible: &lt;code&gt;entity_name&lt;/code&gt;, &lt;code&gt;jurisdiction&lt;/code&gt;, &lt;code&gt;identifier&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;incorporation_date&lt;/code&gt;, &lt;code&gt;registered_address&lt;/code&gt;, &lt;code&gt;officers[]&lt;/code&gt;, &lt;code&gt;beneficial_owners[]&lt;/code&gt;. Source-specific fields (PSC nature of control, NAF code, SIC code, SSIC, etc.) are preserved in a &lt;code&gt;raw&lt;/code&gt; object so downstream parsers don't lose fidelity.&lt;/p&gt;
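
&lt;p&gt;To make the shape concrete, here is one way a consumer might model that normalized record in Python. The field names are the ones listed above; the types, defaults, and sample values are assumptions for illustration rather than a published schema.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class RegistryRecord:
    """One company in the normalized shape; source-specific fields stay in raw."""
    entity_name: str
    jurisdiction: str
    identifier: str
    status: str
    incorporation_date: str  # assumed to be an ISO 8601 date string
    registered_address: str
    officers: list = field(default_factory=list)
    beneficial_owners: list = field(default_factory=list)
    raw: dict = field(default_factory=dict)


# Illustrative values only, not a real registry entry.
record = RegistryRecord(
    entity_name="Example Widgets Ltd",
    jurisdiction="GB",
    identifier="01234567",
    status="active",
    incorporation_date="2015-04-01",
    registered_address="1 Example Street, London",
    raw={"sic_codes": ["62012"]},
)
```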

&lt;h2&gt;
  
  
  4. Example Workflow — Building a UK Supplier KYB Pipeline
&lt;/h2&gt;

&lt;p&gt;Imagine you're the head of procurement compliance at a mid-market UK SaaS company. You've inherited a vendor master list of 4,200 active suppliers. Internal audit wants UBO disclosure on every supplier by end of quarter, plus an exception report for anyone with adverse regulatory history. Here's the pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Normalize identifiers.&lt;/strong&gt; Your ERP exports company names and VAT numbers. For UK suppliers, resolve to a CRN by name+postcode lookup or VAT-to-CRN cross-reference. Aim for &amp;gt;95% match rate; flag the rest for manual review.&lt;/p&gt;
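
&lt;p&gt;A minimal sketch of the name+postcode route, assuming you already hold a lookup index from normalized name and postcode to CRN. The suffix list, field names, and sample data are hypothetical; a real resolver would need more normalization rules than this.&lt;/p&gt;

```python
import re

# Common UK legal suffixes to drop before comparing names (not an exhaustive list).
SUFFIXES = {"LTD", "LIMITED", "PLC", "LLP"}


def match_key(name, postcode):
    """Build a comparison key from a company name and postcode."""
    words = re.sub(r"[^A-Z0-9 ]", "", name.upper()).split()
    core = " ".join(word for word in words if word not in SUFFIXES)
    return (core, postcode.upper().replace(" ", ""))


def resolve(suppliers, registry_index):
    """Return (matched, unmatched, match_rate) for a list of ERP supplier rows."""
    matched, unmatched = [], []
    for row in suppliers:
        crn = registry_index.get(match_key(row["name"], row["postcode"]))
        if crn is None:
            unmatched.append(row)  # flag for manual review
        else:
            matched.append({**row, "crn": crn})
    rate = len(matched) / len(suppliers) if suppliers else 0.0
    return matched, unmatched, rate


# Hypothetical lookup index and ERP export.
index = {match_key("Example Widgets Limited", "EC1A 1BB"): "01234567"}
erp = [
    {"name": "Example Widgets Ltd.", "postcode": "ec1a1bb"},
    {"name": "Unknown Trading Co", "postcode": "M1 1AA"},
]
matched, unmatched, rate = resolve(erp, index)
```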

&lt;p&gt;&lt;strong&gt;Step 2 — Enrich officers via Companies House.&lt;/strong&gt; Run the &lt;a href="https://apify.com/nexgendata/business-registration-lookup?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=business-registration-lookup" rel="noopener noreferrer"&gt;Business Registration Lookup actor&lt;/a&gt; (UK Companies House Officers actor available on request) over all 4,200 CRNs in batches of 500. You get a director-level dataset: name, role, appointed date, resigned date, date of birth (month/year), nationality, occupation. Materialize this into a &lt;code&gt;suppliers_officers&lt;/code&gt; table.&lt;/p&gt;
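
&lt;p&gt;The batching itself is simple to sketch. The helper below splits a CRN list into slices of 500; the CRNs are generated placeholders, and how each batch is submitted to an actor is left out.&lt;/p&gt;

```python
def batches(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 4,200 placeholder CRNs standing in for the real vendor list.
crns = [str(n).zfill(8) for n in range(1, 4201)]
chunks = list(batches(crns))
sizes = [len(chunk) for chunk in chunks]
```

&lt;p&gt;With 4,200 suppliers this produces eight full batches and a final batch of 200.&lt;/p&gt;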

&lt;p&gt;&lt;strong&gt;Step 3 — Pull PSC / beneficial ownership.&lt;/strong&gt; Run a UK PSC register actor over the same CRNs, and the &lt;a href="https://apify.com/nexgendata/ogd-india-companies-registry?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=ogd-india-companies-registry" rel="noopener noreferrer"&gt;OGD India Companies Master Data Lookup&lt;/a&gt; for any India-registered suppliers. PSC records give you the legally-disclosed beneficial owners with &amp;gt;25% ownership/voting/control, plus corporate PSCs (where a holding company is itself the PSC). For corporate PSCs, recurse: pull &lt;em&gt;their&lt;/em&gt; PSC register, and again, until you bottom out at a natural person or hit a non-UK jurisdiction (Cayman, BVI, Jersey; flag these for manual EDD).&lt;/p&gt;
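
&lt;p&gt;The recursion can be sketched against an in-memory register. The record layout here (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;jurisdiction&lt;/code&gt;, &lt;code&gt;crn&lt;/code&gt;) is an assumption for illustration, and a real run would fetch each PSC register on demand instead of reading a dictionary.&lt;/p&gt;

```python
def trace_ubos(crn, psc_register, seen=None):
    """Walk corporate PSCs until reaching natural persons or a non-UK holder."""
    seen = set() if seen is None else seen
    if crn in seen:  # guard against circular ownership
        return [], []
    seen.add(crn)
    people, manual_edd = [], []
    for psc in psc_register.get(crn, []):
        if psc["kind"] == "individual":
            people.append(psc["name"])
        elif psc["jurisdiction"] != "GB":
            manual_edd.append(psc["name"])  # offshore holder: flag for manual EDD
        else:
            sub_people, sub_edd = trace_ubos(psc["crn"], psc_register, seen)
            people.extend(sub_people)
            manual_edd.extend(sub_edd)
    return people, manual_edd


# Hypothetical register: a supplier owned by a UK holdco, which in turn
# has one natural-person PSC and one BVI corporate PSC.
register = {
    "00000001": [{"kind": "corporate", "jurisdiction": "GB", "crn": "00000002", "name": "Holdco Ltd"}],
    "00000002": [
        {"kind": "individual", "name": "Jane Example"},
        {"kind": "corporate", "jurisdiction": "VG", "name": "Offshore Holdings Inc"},
    ],
}
owners, flagged = trace_ubos("00000001", register)
```

&lt;p&gt;The &lt;code&gt;seen&lt;/code&gt; set matters in practice: circular holdings do occur, and without the guard the walk would never terminate.&lt;/p&gt;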

&lt;p&gt;&lt;strong&gt;Step 4 — Cross-check against regulatory enforcement.&lt;/strong&gt; Run the resolved officer and beneficial-owner names against the &lt;a href="https://apify.com/nexgendata/australia-asic-enforcement?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=australia-asic-enforcement" rel="noopener noreferrer"&gt;Australia ASIC Enforcement Tracker&lt;/a&gt; (and the equivalent UK FCA Enforcement Tracker) for financial-services regulatory history. For US-incorporated counterparties, check the &lt;a href="https://apify.com/nexgendata/delaware-corporations-search-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=delaware-corporations-search-scraper" rel="noopener noreferrer"&gt;Delaware Corporations Search&lt;/a&gt;, and use the OFAC SDN Watchlist scraper from the catalog for US sanctions exposure. Fuzzy-match on name + DOB to avoid false positives on common names.&lt;/p&gt;
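
&lt;p&gt;One simple way to approximate the name + DOB match uses &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; from the Python standard library. The 0.9 similarity cut-off, the record layout, and the sample names are assumptions; production screening usually relies on dedicated name-matching tooling.&lt;/p&gt;

```python
from difflib import SequenceMatcher
from operator import ge


def is_probable_match(person, listed, min_ratio=0.9):
    """Require an identical month/year of birth and a close name match."""
    if person["dob"] != listed["dob"]:
        return False
    ratio = SequenceMatcher(None, person["name"].lower(), listed["name"].lower()).ratio()
    return ge(ratio, min_ratio)  # True when ratio is at least min_ratio


# Hypothetical records; dob is "YYYY-MM" because the officer data carries month and year only.
officer = {"name": "Jonathan A. Smith", "dob": "1975-03"}
watchlist = [
    {"name": "Jonathan A Smith", "dob": "1975-03"},
    {"name": "Jonathan A. Smith", "dob": "1981-11"},
]
hits = [entry for entry in watchlist if is_probable_match(officer, entry)]
```

&lt;p&gt;Checking the date of birth first means a shared common name alone never produces a hit.&lt;/p&gt;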

&lt;p&gt;&lt;strong&gt;Step 5 — Score risk.&lt;/strong&gt; Compute a per-supplier risk score: base score from entity status (dissolved/in-liquidation = high), director-overlap with known-bad entities, PSC concealment (corporate PSCs in opaque jurisdictions), and any enforcement hit. Anything above a threshold goes to enhanced due diligence (EDD).&lt;/p&gt;
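
&lt;p&gt;As a sketch of the scoring step, the function below adds up a weight for each signal named above. The weights, the threshold of 50, and the field names are illustrative assumptions and would need calibrating against your own risk policy.&lt;/p&gt;

```python
from operator import gt

# Illustrative weights and threshold; tune these against your own risk policy.
WEIGHTS = {"bad_status": 40, "director_overlap": 25, "opaque_psc": 20, "enforcement_hit": 30}
BAD_STATUSES = {"dissolved", "in-liquidation"}
EDD_THRESHOLD = 50


def risk_score(supplier):
    """Sum the weights of every risk signal present on a supplier record."""
    score = 0
    if supplier["status"] in BAD_STATUSES:
        score += WEIGHTS["bad_status"]
    if supplier["director_overlap"]:
        score += WEIGHTS["director_overlap"]
    if supplier["opaque_psc"]:
        score += WEIGHTS["opaque_psc"]
    if supplier["enforcement_hit"]:
        score += WEIGHTS["enforcement_hit"]
    return score


def needs_edd(supplier):
    """True when the score is above the enhanced-due-diligence threshold."""
    return gt(risk_score(supplier), EDD_THRESHOLD)


supplier = {"status": "active", "director_overlap": True, "opaque_psc": True, "enforcement_hit": False}
escalated = {**supplier, "enforcement_hit": True}
```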

&lt;p&gt;&lt;strong&gt;Step 6 — Feed procurement tooling.&lt;/strong&gt; Push the enriched dataset into your procurement system (Coupa, Ariba, Ivalua) as supplier attributes, and set up a weekly delta job: re-run steps 2–4 only for suppliers whose Companies House &lt;code&gt;last_updated&lt;/code&gt; timestamp has changed, plus any new suppliers added in the past 7 days.&lt;/p&gt;
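
&lt;p&gt;The delta selection can be sketched as a comparison between the timestamps seen on the previous run and the current ones. The &lt;code&gt;added_on&lt;/code&gt; field and the stored-timestamp dictionary are hypothetical; only &lt;code&gt;last_updated&lt;/code&gt; is named in the step above.&lt;/p&gt;

```python
from datetime import date


def suppliers_to_refresh(suppliers, previous_stamps, today, window_days=7):
    """Pick suppliers whose registry timestamp changed or that were added recently."""
    due = []
    for row in suppliers:
        changed = previous_stamps.get(row["crn"]) != row["last_updated"]
        age_days = (today - date.fromisoformat(row["added_on"])).days
        recently_added = age_days in range(window_days + 1)  # 0 to 7 days old
        if changed or recently_added:
            due.append(row["crn"])
    return due


# Hypothetical state: timestamps seen on the previous run, keyed by CRN.
previous = {"00000001": "2026-06-01", "00000002": "2026-06-01"}
current = [
    {"crn": "00000001", "last_updated": "2026-06-01", "added_on": "2025-01-10"},  # unchanged
    {"crn": "00000002", "last_updated": "2026-06-20", "added_on": "2025-02-15"},  # changed
    {"crn": "00000003", "last_updated": "2026-06-22", "added_on": "2026-06-22"},  # new
]
due = suppliers_to_refresh(current, previous, today=date(2026, 6, 25))
```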

&lt;p&gt;End-to-end this is a 1–2 day build, versus 4–6 weeks of manual analyst time. Marginal cost per supplier is well under $0.10 including all enrichment and screening calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Use Cases at a Glance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;KYC onboarding for fintechs and banks&lt;/strong&gt; — verify corporate customers at account opening, populate AML/CDD records, log evidence for regulator audit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KYB vendor due diligence&lt;/strong&gt; — procurement and TPRM (third-party risk management) teams baseline every new vendor before contracting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sales operations account enrichment&lt;/strong&gt; — append registration data to inbound leads for routing, scoring, and outbound personalization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M&amp;amp;A target screening&lt;/strong&gt; — corp dev teams filter on jurisdiction, SIC code, age, and director patterns to build long lists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Beneficial-owner mapping&lt;/strong&gt; — investigators trace ownership chains across PSC, UEN, CIN, and offshore jurisdictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sanctions exposure screening&lt;/strong&gt; — chain UBO data with OFAC, HMT, EU, UN watchlists to surface indirect sanctions risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OSINT and investigative journalism&lt;/strong&gt; — map shell-company networks, expose hidden directorships, surface political exposure (PEP-adjacent).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trade credit underwriting&lt;/strong&gt; — pull filed accounts, charges, and director histories to predict default risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory horizon scanning&lt;/strong&gt; — monitor FCA, ASIC, MAS, SFC, and SEBI enforcement actions tied to counterparty entities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Director-overlap analytics&lt;/strong&gt; — surface directors who sit on the boards of competitors, suppliers, or distressed entities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Run It Yourself — Business Registration Lookup
&lt;/h2&gt;

&lt;p&gt;The fastest way to feel the difference between portal-by-portal lookups and a normalized API is to actually run one. Start with the highest-volume use case: enriching a list of US-incorporated companies with their state filing records and registered-agent data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://apify.com/nexgendata/business-registration-lookup?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=business-registration-lookup" rel="noopener noreferrer"&gt;Run the Business Registration Lookup on Apify →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Paste a CSV of entity names or filing numbers, pick output format (JSON / CSV / Excel), and run. Results stream into your Apify dataset and can be pulled via API, webhook, n8n, Zapier, or downloaded as a single file. Pricing is pay-per-result with no monthly minimum, so you can validate the workflow on 50 companies before committing to 50,000.&lt;/p&gt;

&lt;p&gt;For Delaware-domiciled holdcos, chain it with the &lt;a href="https://apify.com/nexgendata/delaware-corporations-search-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=delaware-corporations-search-scraper" rel="noopener noreferrer"&gt;Delaware Corporations Search actor&lt;/a&gt; — the schemas are designed to join on &lt;code&gt;entity_name&lt;/code&gt; + &lt;code&gt;state&lt;/code&gt; with no munging.&lt;/p&gt;
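
&lt;p&gt;A minimal sketch of that join, assuming both outputs have been loaded as lists of rows. &lt;code&gt;entity_name&lt;/code&gt; and &lt;code&gt;state&lt;/code&gt; are the join fields named above; the other fields and all sample values are placeholders.&lt;/p&gt;

```python
def join_key(row):
    """Key used to line up the two outputs: (entity_name, state)."""
    return (row["entity_name"], row["state"])


def join_on_name_and_state(multi_state_rows, delaware_rows):
    """Attach the Delaware record, if any, to each multi-state lookup row."""
    delaware_by_key = {join_key(row): row for row in delaware_rows}
    return [{**row, "delaware": delaware_by_key.get(join_key(row))} for row in multi_state_rows]


# Hypothetical rows; file_number values are placeholders.
multi_state = [
    {"entity_name": "Example Holdings Inc", "state": "DE", "status": "active"},
    {"entity_name": "Example Holdings Inc", "state": "CA", "status": "active"},
]
delaware = [{"entity_name": "Example Holdings Inc", "state": "DE", "file_number": "0000000"}]
joined = join_on_name_and_state(multi_state, delaware)
```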

&lt;h2&gt;
  
  
  7. Related Actors for Cross-Jurisdiction and Risk Enrichment
&lt;/h2&gt;

&lt;p&gt;Real KYB/KYC programs span jurisdictions and data domains. These actors slot into the same pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/singapore-acra-company-lookup?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=singapore-acra-company-lookup" rel="noopener noreferrer"&gt;Singapore ACRA / BizFile+ Company Lookup&lt;/a&gt; — UEN-based entity verification for Singapore-registered suppliers, regional HQs, and APAC fund structures.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/ogd-india-companies-registry?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=ogd-india-companies-registry" rel="noopener noreferrer"&gt;OGD India Companies Master Data Lookup&lt;/a&gt; — CIN lookup over India's Open Government Data master file, ideal for India KYB and group-structure mapping.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/business-registration-lookup?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=business-registration-lookup" rel="noopener noreferrer"&gt;Business Registration Lookup (US multi-state)&lt;/a&gt; — Secretary-of-State filing records across the US, including Delaware, California, Texas, New York, and Florida.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/delaware-corporations-search-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=delaware-corporations-search-scraper" rel="noopener noreferrer"&gt;Delaware Corporations Search&lt;/a&gt; — dedicated Delaware Division of Corporations lookup for the world's most common incorporation jurisdiction.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/australia-asic-enforcement?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=australia-asic-enforcement" rel="noopener noreferrer"&gt;Australia ASIC Enforcement Tracker&lt;/a&gt; — regulatory-action cross-reference for Australian entities and directors.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/nexgendata/yc-companies-directory-scraper?fpr=2ayu9b&amp;amp;utm_source=thenextgennexus&amp;amp;utm_medium=page&amp;amp;utm_campaign=yc-companies-directory-scraper" rel="noopener noreferrer"&gt;YC Companies Directory&lt;/a&gt; — private-market cross-reference for early-stage US-incorporated entities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Are company registries public data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Most national corporate registries are public by statute. UK Companies House, France's INPI/RCS (surfaced through Pappers), Singapore ACRA, India MCA, Australia ASIC, and US state Secretary-of-State systems all publish entity and officer data because incorporation law requires it. Some registries charge for individual document downloads (the HK Companies Registry charges per filing; Cayman and BVI are largely paywalled), but the existence and basic details of a registered company are public almost everywhere. That the data is public does not by itself make scraping it lawful, so review each registry's terms of service and the applicable law in each jurisdiction before automating at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's UBO and how is it captured?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
UBO stands for Ultimate Beneficial Owner: the natural person who ultimately owns or controls a legal entity, typically defined as holding more than 25% of the shares or voting rights, or exercising control by other means. The UK captures this through the PSC (People with Significant Control) register, mandated since 2016. Singapore captures it via the ACRA Register of Registrable Controllers, France via the RBE (Registre des Bénéficiaires Effectifs), and Hong Kong via the SCR (Significant Controllers Register). Coverage and accessibility vary widely; the UK PSC register is the most open and machine-readable, which is why it's the starting point for most cross-jurisdiction UBO investigations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I bulk-screen 10,000 suppliers?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. The actors are designed for bulk inputs. For 10,000 UK CRNs, batch into runs of 500–1,000 and stream results via webhook into your data warehouse. End-to-end runtime for a 10K CRN officers-and-PSC pull is typically a few hours, with marginal cost in the low tens of dollars depending on the actor. For ongoing monitoring, set up a weekly delta job keyed on &lt;code&gt;last_updated&lt;/code&gt;.&lt;/p&gt;
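
&lt;p&gt;The batching and delta steps can be sketched as follows. This is an illustration under stated assumptions, not the actors' own code: the function names are hypothetical, and &lt;code&gt;last_updated&lt;/code&gt; is assumed to be an ISO 8601 string with an explicit UTC offset.&lt;/p&gt;

```python
from datetime import datetime, timezone


def batch(items, size=500):
    # Yield consecutive slices of at most `size` items (one slice per run).
    for start in range(0, len(items), size):
        yield items[start:start + size]


def changed_since(records, cutoff):
    # Keep only records whose last_updated is later than the previous run.
    # Assumes timezone-aware ISO 8601 strings, e.g. "2026-05-02T09:00:00+00:00".
    return [r for r in records if datetime.fromisoformat(r["last_updated"]) > cutoff]


# Hypothetical input: 10,000 eight-digit CRNs split into runs of 500.
crns = [f"{i:08d}" for i in range(10000)]
runs = list(batch(crns, 500))

# Hypothetical delta check against last week's cutoff.
cutoff = datetime(2026, 5, 1, tzinfo=timezone.utc)
sample = [
    {"crn": "00000001", "last_updated": "2026-04-30T12:00:00+00:00"},
    {"crn": "00000002", "last_updated": "2026-05-02T09:00:00+00:00"},
]
delta = changed_since(sample, cutoff)
```

&lt;p&gt;Each element of &lt;code&gt;runs&lt;/code&gt; would be submitted as one actor input; storing the cutoff alongside each completed run keeps the weekly delta job idempotent.&lt;/p&gt;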

&lt;p&gt;&lt;strong&gt;Do you cover the Cayman Islands or BVI?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Not yet for the offshore jurisdictions (Cayman, BVI, Jersey, Guernsey, Bermuda) — these registries are paywalled per-document and largely don't expose bulk lookup. The pragmatic workflow for offshore exposure: identify the offshore entity via PSC chains in covered jurisdictions (UK PSC will name a Cayman holding company as the corporate PSC), then escalate to a paid lookup service (Companies House International, OpenCorporates Premium, or a regulated EDD provider) for the offshore leg. Coverage of offshore registries is on the roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How fresh is the data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Each actor pulls live from the source registry at run time, so freshness equals registry freshness. UK Companies House updates within minutes of a filing. Singapore ACRA and Australia ASIC update within hours to a day. India MCA can lag 24–72 hours. France Pappers refreshes daily from INPI. The actors don't cache stale data — each run is a fresh query against the source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I export to my CRM or KYC platform?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Apify exposes datasets via REST API, webhooks, and direct integrations with n8n, Zapier, Make, and dozens of warehouse connectors. Common patterns: webhook to a Lambda/Cloud Function that upserts into Salesforce, HubSpot, or your KYC platform (Onfido, Sumsub, ComplyAdvantage); scheduled run with output to S3 or Google Cloud Storage for warehouse ingestion (Snowflake, BigQuery, Databricks); direct CSV download for ad-hoc analyst use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you handle fuzzy name matching across jurisdictions?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Entity names are messy: "Acme Holdings Limited" in the UK might be "Acme Holdings (HK) Ltd" in Hong Kong and "Acme Holdings Pte Ltd" in Singapore. The actors return canonical names from each registry; cross-jurisdiction resolution is the caller's responsibility. We recommend a two-stage match: (1) an exact match on registration number where one is available (LEI, CRN, UEN, CIN), then (2) a fuzzy match on the normalized name (lowercased, with legal suffixes and punctuation stripped), using address and director overlap as tiebreakers.&lt;/p&gt;
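
&lt;p&gt;A rough sketch of the two-stage match using only the Python standard library. The record shape (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;reg_id&lt;/code&gt;), the suffix list, and the 0.9 threshold are all assumptions made for illustration, and the address and director-overlap tiebreakers are left out for brevity.&lt;/p&gt;

```python
import re
from difflib import SequenceMatcher

# Illustrative suffix list only; extend it for the jurisdictions you cover.
SUFFIXES = {"limited", "ltd", "pte", "llc", "inc", "plc", "hk"}


def normalize(name):
    # Lowercase, replace punctuation with spaces, drop legal-form tokens.
    # ASCII-only for simplicity; accented names need extra handling.
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def match(a, b, threshold=0.9):
    # Stage 1: exact match on a shared registration number, if both have one.
    if a.get("reg_id") and a.get("reg_id") == b.get("reg_id"):
        return "exact"
    # Stage 2: fuzzy match on the normalized names.
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return "fuzzy" if score >= threshold else None


uk = {"name": "Acme Holdings Limited"}
sg = {"name": "Acme Holdings Pte Ltd"}
result = match(uk, sg)
```

&lt;p&gt;Stage 1 only makes sense for identifiers that share a scheme across registries, such as an LEI; a CRN and a UEN for the same group will never be equal, so those pairs fall through to the name comparison.&lt;/p&gt;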

&lt;p&gt;&lt;strong&gt;Is this compliant with GDPR?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Officer and PSC data published by national registries is public by statute, and processing it for KYC, KYB, AML, sanctions screening, and due diligence can generally rest on a legitimate-interest or legal-obligation basis under GDPR Article 6. You remain the controller for onward processing, so document your lawful basis, honor data-subject rights, and consult your DPO before launch.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Published as part of the NexGenData public registry data tools series. Explore the full &lt;a href="https://thenextgennexus.com/category/public-registry-data-tools/" rel="noopener noreferrer"&gt;Public Registry Data Tools category&lt;/a&gt; for jurisdiction-specific deep dives, or browse related coverage on &lt;a href="https://thenextgennexus.com/2026/05/24/sanctions-data-tools-for-due-diligence-and-risk-research/" rel="noopener noreferrer"&gt;sanctions data tools&lt;/a&gt;, &lt;a href="https://thenextgennexus.com/2026/05/24/track-asic-enforcement-actions-structured-data/" rel="noopener noreferrer"&gt;ASIC enforcement tracking&lt;/a&gt;, and &lt;a href="https://thenextgennexus.com/2026/05/14/court-records-research-for-legal-due-diligence-workflows-2026/" rel="noopener noreferrer"&gt;court records due diligence&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>finance</category>
      <category>api</category>
    </item>
  </channel>
</rss>
