<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simon Mak</title>
    <description>The latest articles on DEV Community by Simon Mak (@simonplmakcloud).</description>
    <link>https://dev.to/simonplmakcloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3772685%2Ff76b8949-5316-4b38-b4d1-63992d239ee1.jpeg</url>
      <title>DEV Community: Simon Mak</title>
      <link>https://dev.to/simonplmakcloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simonplmakcloud"/>
    <language>en</language>
    <item>
      <title>Why AI Agents Need Accessibility Skills: Building WCAG AAA Compliance Into AI Code Generation</title>
      <dc:creator>Simon Mak</dc:creator>
      <pubDate>Sun, 15 Feb 2026 13:35:53 +0000</pubDate>
      <link>https://dev.to/simonplmakcloud/why-ai-agents-need-accessibility-skills-building-wcag-aaa-compliance-into-ai-code-generation-4mdl</link>
      <guid>https://dev.to/simonplmakcloud/why-ai-agents-need-accessibility-skills-building-wcag-aaa-compliance-into-ai-code-generation-4mdl</guid>
<description>&lt;p&gt;I have open-sourced a toolkit that is both a traditional design system and an &lt;strong&gt;AI agent skill&lt;/strong&gt; for building WCAG 2.2 AAA-compliant web applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/simonplmak-cloud/wcag-aaa-web-design" rel="noopener noreferrer"&gt;simonplmak-cloud/wcag-aaa-web-design&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My goal was to solve two problems at once:&lt;/p&gt;

&lt;h2&gt;The ESG Problem&lt;/h2&gt;

&lt;p&gt;Companies need a reliable way to meet digital accessibility requirements (such as the European Accessibility Act, effective June 2025) for their ESG social responsibility goals. Digital inclusion is a key part of the "S" in ESG. This toolkit provides a production-ready, token-based design system for building enterprise web applications that meet the highest accessibility standard.&lt;/p&gt;

&lt;h2&gt;The AI Problem&lt;/h2&gt;

&lt;p&gt;AI coding agents are powerful, but they often generate inaccessible code. This creates a future where the automated web is unusable for people with disabilities. This project is structured as an &lt;strong&gt;AI agent skill&lt;/strong&gt;, meaning an AI assistant can use it to autonomously build a fully compliant website by enforcing accessibility at the component and template level.&lt;/p&gt;

&lt;h2&gt;Why These Two Problems Are Connected&lt;/h2&gt;

&lt;p&gt;AI agents navigate the web using the same &lt;strong&gt;Accessibility Tree&lt;/strong&gt; as screen readers. Research shows agents are significantly more effective on accessible sites (~85% task success vs. ~50% on inaccessible ones). By making accessibility a core part of the AI development process, we ensure the agent-driven web is inclusive by default.&lt;/p&gt;

&lt;h2&gt;What the Toolkit Includes&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A full, token-based corporate design system (CSS custom properties, no hardcoded values)&lt;/li&gt;
&lt;li&gt;Secure, responsive, accessible HTML/CSS/JS templates (header, footer, data tables, sidebar navigation, empty states)&lt;/li&gt;
&lt;li&gt;In-depth reference guides on WCAG 2.2 AAA compliance, ARIA patterns, enterprise UX, security, and error handling&lt;/li&gt;
&lt;li&gt;Automated validation scripts (contrast checking, pa11y auditing)&lt;/li&gt;
&lt;li&gt;Framework-agnostic: works with any tech stack&lt;/li&gt;
&lt;/ul&gt;
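
&lt;p&gt;To make the contrast-checking idea concrete, here is a minimal standalone sketch of the kind of check such a validation script can run. This is illustrative rather than the toolkit's actual implementation (the function names are mine); the formulas are the standard WCAG relative-luminance and contrast-ratio definitions, with 7:1 as the AAA threshold for normal text.&lt;/p&gt;

```python
def _linear(channel):
    """Convert one sRGB channel (0-255) to a linear-light value, per WCAG 2.x."""
    c = channel / 255.0
    if c > 0.03928:
        return ((c + 0.055) / 1.055) ** 2.4
    return c / 12.92

def relative_luminance(rgb):
    """WCAG relative luminance of an (r, g, b) tuple."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

def meets_aaa(fg, bg, large_text=False):
    """AAA requires 7:1 for normal text and 4.5:1 for large text."""
    required = 4.5 if large_text else 7.0
    return contrast_ratio(fg, bg) >= required
```

&lt;p&gt;Black on white, for instance, scores the maximum 21:1, while a mid-grey like &lt;code&gt;#777777&lt;/code&gt; on white falls below the AAA bar.&lt;/p&gt;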

&lt;h2&gt;The Bigger Picture&lt;/h2&gt;

&lt;p&gt;This is an attempt at responsible innovation. As AI agents increasingly write our code and navigate our websites, accessibility is no longer just about compliance. It is the shared interface between humans and machines. Building for accessibility is building for the AI agent economy.&lt;/p&gt;

&lt;p&gt;I would welcome any feedback on this approach, or contributions to the project.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repository&lt;/em&gt;: &lt;a href="https://github.com/simonplmak-cloud/wcag-aaa-web-design" rel="noopener noreferrer"&gt;github.com/simonplmak-cloud/wcag-aaa-web-design&lt;/a&gt;&lt;br&gt;
&lt;em&gt;License&lt;/em&gt;: MIT&lt;/p&gt;

</description>
      <category>a11y</category>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building a Financial Data Pipeline: How I Scraped 25 Years of Stock Market Filings with Python and a Graph Database</title>
      <dc:creator>Simon Mak</dc:creator>
      <pubDate>Sat, 14 Feb 2026 13:30:22 +0000</pubDate>
      <link>https://dev.to/simonplmakcloud/building-a-financial-data-pipeline-how-i-scraped-25-years-of-stock-market-filings-with-python-and-3e4f</link>
      <guid>https://dev.to/simonplmakcloud/building-a-financial-data-pipeline-how-i-scraped-25-years-of-stock-market-filings-with-python-and-3e4f</guid>
<description>&lt;h2&gt;The Discovery: An Undocumented JSON API&lt;/h2&gt;

&lt;p&gt;The official HKEx website is a maze of JavaScript and session-based navigation. Scraping it directly with tools like Selenium would be slow, brittle, and a constant maintenance headache. I knew there had to be a better way.&lt;/p&gt;

&lt;p&gt;After some digging in my browser's network tab while using the official search portal, I found a hidden gem: an undocumented JSON API. The website's frontend was making calls to a &lt;code&gt;titleSearchServlet.do&lt;/code&gt; endpoint that returned clean, structured JSON data. This was the key. By mimicking these API calls, I could bypass the browser entirely and get the data directly from the source.&lt;/p&gt;

&lt;h2&gt;The Stack: Python, Requests, and SurrealDB&lt;/h2&gt;

&lt;p&gt;With the API discovered, I chose a simple but powerful stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Python&lt;/strong&gt;: For its rich data processing ecosystem and ease of use.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Requests&lt;/strong&gt;: A straightforward library for making the necessary HTTP calls to the HKEx API.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;SurrealDB&lt;/strong&gt;: A multi-model database that was a perfect fit for this project. I could store the filing metadata as structured documents and, more importantly, create graph relationships between companies and filings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architecture: A Two-Phase Pipeline&lt;/h2&gt;

&lt;p&gt;The process is broken down into two main phases: scraping the metadata and then enriching it with the full document content.&lt;/p&gt;

&lt;h3&gt;Phase 1: Scraping Filing Metadata&lt;/h3&gt;

&lt;p&gt;The first step is to fetch the metadata for every filing. Since the HKEx API limits searches to one-month intervals when no stock code is specified, I had to generate monthly date chunks and iterate through them.&lt;/p&gt;

&lt;p&gt;Here's how the &lt;code&gt;generate_monthly_chunks&lt;/code&gt; function works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_monthly_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate (chunk_from, chunk_to) pairs in 1-month increments (newest first).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_from&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_from&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;chunk_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_from&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;monthrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;chunk_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_day&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;date_to&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;chunk_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_end&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each chunk, the &lt;code&gt;fetch_chunk_via_api&lt;/code&gt; function first sends a POST request to the search page to set the date range in the server's session, then makes paginated GET requests to the JSON API endpoint to retrieve all the records.&lt;/p&gt;

&lt;p&gt;The raw JSON from the API looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"FILE_INFO"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"53KB"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"NEWS_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"12022263"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"STOCK_NAME"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ZHONGTAIFUTURES"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"STOCK_CODE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"01461"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"TITLE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Articles of Association"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"FILE_TYPE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PDF"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"DATE_TIME"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"11/02/2026 19:10"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"FILE_LINK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/listedco/listconews/sehk/2026/0211/2026021100854.pdf"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This data is parsed, cleaned, and stored in a &lt;code&gt;SCHEMAFULL&lt;/code&gt; table in SurrealDB called &lt;code&gt;exchange_filing&lt;/code&gt;.&lt;/p&gt;
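
&lt;p&gt;As an illustration of that parsing step, a cleaned record might be produced like this. This is a hedged sketch rather than the project's actual code: the &lt;code&gt;normalize_record&lt;/code&gt; helper and the &lt;code&gt;BASE_URL&lt;/code&gt; value are my assumptions, and I assume &lt;code&gt;DATE_TIME&lt;/code&gt; uses day/month order (consistent with the date folder in the sample &lt;code&gt;FILE_LINK&lt;/code&gt;):&lt;/p&gt;

```python
from datetime import datetime

# Assumed host for the relative FILE_LINK paths; verify against the live site.
BASE_URL = "https://www1.hkexnews.hk"

def normalize_record(raw):
    """Turn one raw API record into a clean dict ready for SurrealDB."""
    return {
        "news_id": raw["NEWS_ID"],
        "stock_code": raw["STOCK_CODE"],
        "title": raw["TITLE"].strip(),
        "file_type": raw["FILE_TYPE"],
        # "11/02/2026 19:10" is assumed to be DD/MM/YYYY HH:MM.
        "date_time": datetime.strptime(raw["DATE_TIME"], "%d/%m/%Y %H:%M"),
        "file_url": BASE_URL + raw["FILE_LINK"],
    }
```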

&lt;h3&gt;Phase 2: Downloading and Extracting Content&lt;/h3&gt;

&lt;p&gt;With the metadata in place, the next step is to download the actual filing documents (PDF, HTML, or Excel) and extract their content. This is done in parallel using a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; for efficiency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_download_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# ... (implementation to download the document)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
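
&lt;p&gt;The surrounding fan-out logic can be sketched like this. It is a simplified stand-in for the real pipeline: &lt;code&gt;enrich_filings&lt;/code&gt; is a hypothetical helper name, and the injected &lt;code&gt;download&lt;/code&gt; callable stands in for &lt;code&gt;_download_document&lt;/code&gt;:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def enrich_filings(filings, download, max_workers=8):
    """Download filing documents in parallel.

    `filings` is an iterable of dicts with "id" and "url" keys;
    `download` is any callable taking (url, filing_id).
    Returns {filing_id: content}, with None recorded on failure.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(download, f["url"], f["id"]): f["id"] for f in filings
        }
        for fut in as_completed(futures):
            fid = futures[fut]
            try:
                results[fid] = fut.result()
            except Exception:
                results[fid] = None  # record the failure and keep going
    return results
```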



&lt;p&gt;Once downloaded, the text and any structured tables are extracted using &lt;code&gt;PyMuPDF&lt;/code&gt; for PDFs and &lt;code&gt;BeautifulSoup&lt;/code&gt; for HTML. This extracted content is then saved back to the corresponding record in the &lt;code&gt;exchange_filing&lt;/code&gt; table.&lt;/p&gt;
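
&lt;p&gt;For the HTML case, the idea can be demonstrated with nothing but the standard library. The project itself uses &lt;code&gt;BeautifulSoup&lt;/code&gt;; the dependency-free stand-in below is mine, for illustration only:&lt;/p&gt;

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```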

&lt;h2&gt;The Graph Model: Connecting the Dots&lt;/h2&gt;

&lt;p&gt;This is where SurrealDB's multi-model capabilities shine. I wanted to not only store the filings but also understand the relationships between them. I defined two types of graph edges using &lt;code&gt;TYPE RELATION&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;(company)-[has_filing]-&amp;gt;(filing)&lt;/code&gt;: This links a company to the filings it has released.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;(filing)-[references_filing]-&amp;gt;(company)&lt;/code&gt;: This links a filing to other companies mentioned in its title.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This simple graph model allows for powerful queries, such as "find all filings from company X that mention company Y," which would be complex and slow to execute in a traditional relational database.&lt;/p&gt;
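
&lt;p&gt;That cross-company question maps onto a single graph traversal. The SurrealQL below is a sketch under my assumptions (the record ids &lt;code&gt;company:x&lt;/code&gt; and &lt;code&gt;company:y&lt;/code&gt; are placeholders), not code taken from the project:&lt;/p&gt;

```sql
-- Filings released by company:x that also reference company:y
SELECT ->has_filing->(exchange_filing
    WHERE ->references_filing->company CONTAINS company:y).* AS filings
FROM company:x;
```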

&lt;p&gt;Here's a snippet of the code that creates the &lt;code&gt;has_filing&lt;/code&gt; edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;link_filings_to_companies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker_set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Linking filings to companies via graph edges...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="n"&gt;update_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;COMPANY_TABLE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; SET filings += &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filing_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RELATE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&amp;gt;has_filing-&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filing_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; CONTENT {{ at: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filing_date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; }};&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;From a Single Script to an Open-Source Project&lt;/h2&gt;

&lt;p&gt;The initial version of this tool was a single, 1500-line Python script. While functional, it was difficult to maintain and not very user-friendly. I decided to refactor it into a proper, modular open-source project.&lt;/p&gt;

&lt;p&gt;This involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Decoupling Dependencies&lt;/strong&gt;: Removing hardcoded dependencies on my private company data table, making the graph linking feature optional and configurable.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Modularization&lt;/strong&gt;: Breaking the monolithic script into logical modules (&lt;code&gt;api.py&lt;/code&gt;, &lt;code&gt;db.py&lt;/code&gt;, &lt;code&gt;extractor.py&lt;/code&gt;, etc.).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Packaging&lt;/strong&gt;: Creating a &lt;code&gt;pyproject.toml&lt;/code&gt; file to make the project installable via &lt;code&gt;pip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CLI&lt;/strong&gt;: Building a user-friendly command-line interface with &lt;code&gt;argparse&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Documentation&lt;/strong&gt;: Writing a comprehensive &lt;code&gt;README.md&lt;/code&gt; with installation instructions, configuration details, and usage examples.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion and Next Steps&lt;/h2&gt;

&lt;p&gt;The result is &lt;code&gt;hkex-filing-scraper&lt;/code&gt;, a robust and easy-to-use tool for building a comprehensive database of HKEx filings. It's now available on GitHub and installable via PyPI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/simonplmak-cloud/hkex-filing-scraper" rel="noopener noreferrer"&gt;https://github.com/simonplmak-cloud/hkex-filing-scraper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This project was a fun journey into reverse engineering, data pipeline design, and the power of multi-model databases. Future plans could include adding support for other exchanges like the SEC EDGAR database or building a web interface to explore the data.&lt;/p&gt;

&lt;p&gt;I encourage you to check out the repository, try it out for your own financial analysis projects, and contribute if you find it useful. Feedback is always welcome!&lt;/p&gt;

</description>
      <category>python</category>
      <category>opensource</category>
      <category>database</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
