<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jian</title>
    <description>The latest articles on DEV Community by Jian (@sukoshizutsu).</description>
    <link>https://dev.to/sukoshizutsu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2476108%2F4dce8821-f380-48a7-b3c3-eaba4d77842c.png</url>
      <title>DEV Community: Jian</title>
      <link>https://dev.to/sukoshizutsu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sukoshizutsu"/>
    <language>en</language>
    <item>
      <title>#014 | Standardising a multi-currency portfolio: Discovery</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Fri, 21 Feb 2025 05:21:15 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/014-standardising-a-multi-currency-portfolio-discovery-4j0o</link>
      <guid>https://dev.to/sukoshizutsu/014-standardising-a-multi-currency-portfolio-discovery-4j0o</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;Having built and tested the &lt;strong&gt;Extract, Transforms &amp;amp; Load&lt;/strong&gt; ("ETL") logic to process the &lt;strong&gt;Custodian Statement PDF&lt;/strong&gt; and &lt;strong&gt;yfinance&lt;/strong&gt; datasets, the building blocks to standardise a multi-currency portfolio to a common Base Currency are now in place.&lt;/p&gt;

&lt;p&gt;To recap, the two datasets serve the following purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custodian Statement PDFs&lt;/strong&gt;: A portfolio's month-end &lt;strong&gt;Securities&lt;/strong&gt;, &lt;strong&gt;Funds&lt;/strong&gt;, &lt;strong&gt;Cash&lt;/strong&gt; and &lt;strong&gt;Miscellaneous&lt;/strong&gt; ("Security Types") holdings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;yfinance&lt;/strong&gt;: Month-end &lt;strong&gt;foreign exchange rates&lt;/strong&gt; ("FX rates") for selected Target Currencies relative to one Base Currency&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The following flowchart visualises the end goal of this part of the MVP:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwndn6ty3wrgvk9egavp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwndn6ty3wrgvk9egavp.png" alt="Endgoal" width="800" height="781"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Portfolio Currency Standardisation Logic
&lt;/h3&gt;

&lt;p&gt;The following logic is implemented to each Security Type:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Aggregate and sum month-end values by currency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply a mathematical formula to convert the summed data to a Base Currency. For example, US$1,000 / 0.25 (USD/MYR rate) = MYR4,000&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The values from (2) are then summed to get the Portfolio NAV, which is now standardised to a common Base Currency. &lt;/p&gt;

&lt;p&gt;This standardised Portfolio NAV paves the way for other MVP requirements to be fulfilled, such as calculating the Management Fee &amp;amp; Performance Fee.&lt;/p&gt;

&lt;p&gt;The currency standardisation logic can be built via the Xano user interface. It helps that all the data is already sitting in the same platform.&lt;/p&gt;

&lt;p&gt;As there is some repetitive logic, Xano's &lt;a href="https://docs.xano.com/the-function-stack/functions/custom-functions" rel="noopener noreferrer"&gt;&lt;strong&gt;Custom Functions&lt;/strong&gt;&lt;/a&gt; &lt;br&gt;
can be leveraged to make it (the logic) reusable, much like a Python function.&lt;/p&gt;

&lt;p&gt;Building a Custom Function is fairly straightforward. The challenge comes from ensuring the standardisation logic is correctly applied to the right data.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>mvp</category>
      <category>sukoshizutsu</category>
      <category>xano</category>
    </item>
    <item>
      <title>#013 | Extract, Transform &amp; Load FX rates: User Acceptance Testing</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Wed, 12 Feb 2025 10:21:31 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/013-extract-transform-load-fx-rates-user-acceptance-testing-8fn</link>
      <guid>https://dev.to/sukoshizutsu/013-extract-transform-load-fx-rates-user-acceptance-testing-8fn</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;I built a basic &lt;strong&gt;Extract, Transform &amp;amp; Load&lt;/strong&gt; ("ETL") process to store historical &lt;strong&gt;foreign currency rate&lt;/strong&gt; ("FX rate") data for multiple &lt;strong&gt;Target Currencies&lt;/strong&gt; relative to one &lt;strong&gt;Base Currency&lt;/strong&gt;, the Malaysian Ringgit.&lt;/p&gt;

&lt;p&gt;To recap, an FX rate tells you how much of a Target Currency can be exchanged for one Base Currency.&lt;/p&gt;

&lt;p&gt;To test the ETL process, I did &lt;strong&gt;User Acceptance Testing&lt;/strong&gt; ("UAT").&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Parameters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Period&lt;/strong&gt;: Nov 1, 2024 to Dec 31, 2024&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interval&lt;/strong&gt;: Daily&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target Currencies&lt;/strong&gt;: Singapore Dollar ("SGD") &amp;amp; US Dollar ("US Dollar")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base Currency&lt;/strong&gt;: Malaysian Ringgit ("MYR")&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  UAT Results: Pre-Fix
&lt;/h3&gt;

&lt;p&gt;The first round of UAT failed, as I made a critical mistake of inputting the wrong currency order in &lt;strong&gt;yfinance's&lt;/strong&gt; &lt;em&gt;download()&lt;/em&gt; function. This returned FX rates with the Malaysian Ringgit as the Target Currency (instead of the Base Currency).&lt;/p&gt;

&lt;p&gt;For example, instead of "MYRUSD=X" (how many US Dollars one Malaysian Ringgit buys), I input "USDMYR=X" (how many Malaysian Ringgit one US Dollar buys).&lt;/p&gt;

&lt;p&gt;This potentially fatal mistake was overlooked during the Build phase, which I covered in the &lt;a href="https://dev.to/sukoshizutsu/012-extract-transform-load-fx-rates-build-2k95"&gt;the previous post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thankfully doing UAT meant the mistake could be picked up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fixing the Code
&lt;/h3&gt;

&lt;p&gt;I fixed the currency order to retrieve the desired FX rate. &lt;/p&gt;

&lt;p&gt;Unfortunately, yfinance returned an error saying the "MYRSGD" currency pair/ticker was invalid. The "SGDMYR" pair, however, is valid.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww0rcucuxsah8dfczuzh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww0rcucuxsah8dfczuzh.png" alt="error" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It appears yfinance does not support Cross Currency Triangulation on every currency permutation. This means the MVP has to handle this transformation.&lt;/p&gt;

&lt;p&gt;I reverted back to the original currency order input, but added a line to calculate the inverse of the returned FX rate. This transforms the Malaysian Ringgit into a Base Currency (from a Target Currency).&lt;/p&gt;

&lt;h3&gt;
  
  
  UAT Results: Post-Fix
&lt;/h3&gt;

&lt;p&gt;The UAT was successful after applying a fix, and the data is now correctly transformed and loaded into Xano.&lt;/p&gt;

&lt;p&gt;Before fixing the error:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb341dxn2712np0gj6u17.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb341dxn2712np0gj6u17.png" alt="before" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After fixing the error:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2prdqwtp0yjomk20u2td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2prdqwtp0yjomk20u2td.png" alt="after" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>mvp</category>
      <category>sukoshizutsu</category>
      <category>python</category>
      <category>xano</category>
    </item>
    <item>
      <title>#012 | Extract, Transform &amp; Load FX rates: Build</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Mon, 03 Feb 2025 07:30:57 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/012-extract-transform-load-fx-rates-build-2k95</link>
      <guid>https://dev.to/sukoshizutsu/012-extract-transform-load-fx-rates-build-2k95</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;I wrote a script in Flask Python that runs a simple &lt;strong&gt;Extract, Transform &amp;amp; Load&lt;/strong&gt; ("ETL") process on foreign exchange rate ("FX rate") data from &lt;strong&gt;yfinance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To recap, yfinance is a Python library that sources FX rates from &lt;strong&gt;Yahoo Finance&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background to FX rates
&lt;/h3&gt;

&lt;p&gt;We first understand how FX rates are quoted and their relationships to each other. This influences the data transformation and storage logics.&lt;/p&gt;

&lt;p&gt;An FX rate is conventionally quoted in the following format: [Base Currency]/[Target Currency]. This shows how much of the Target Currency &lt;strong&gt;one unit&lt;/strong&gt; of the Base Currency buys.&lt;/p&gt;

&lt;p&gt;Using the theory of &lt;strong&gt;Cross Currency Triangulation&lt;/strong&gt;, we don't need to download and store every currency permutation. All we need are the FX rates for one Base Currency from yfinance, and other currency permutations can be indirectly calculated (if needed).&lt;/p&gt;

&lt;p&gt;For example, assume we are given these quotes where the Malaysian Ringgit is the base currency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MYR/USD=0.23 means one Malaysian Ringgit buys 0.23 US Dollars&lt;/li&gt;
&lt;li&gt;MYR/SGD=0.30 means one Malaysian Ringgit buys 0.30 Singapore Dollars&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can indirectly calculate the USD/SGD rate (or how many Singapore Dollars one US Dollar buys) using the previously mentioned rates. If we are also given the MYR/HKD rate, we can calculate USD/HKD or SGD/HKD rates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a0iurfhu0ddnwbk7saz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a0iurfhu0ddnwbk7saz.png" alt="cross currency rates" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Table Setup
&lt;/h3&gt;

&lt;p&gt;I set up two tables called &lt;code&gt;reference_currencies&lt;/code&gt; and &lt;code&gt;yfinance_fx_rates&lt;/code&gt; in Xano.&lt;/p&gt;

&lt;p&gt;The former stores &lt;a href="https://www.iban.com/currency-codes" rel="noopener noreferrer"&gt;IBAN&lt;/a&gt; currency codes for 179 global currencies. The latter stores yfinance-sourced historical month-end FX rates for a list of selected Target Currencies relative to one Base Currency, the Malaysian Ringgit.&lt;/p&gt;

&lt;h3&gt;
  
  
  ETL Process: Extract
&lt;/h3&gt;

&lt;p&gt;I used yfinance's &lt;code&gt;download()&lt;/code&gt; function to extract the data, specifying the following parameters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpp4ahzpfqbvpezfinc2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpp4ahzpfqbvpezfinc2k.png" alt="parameters" width="800" height="114"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7aiws9ln716g7oullc7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7aiws9ln716g7oullc7p.png" alt="function" width="800" height="56"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The raw, extracted dataframe of the MYR/USD and MYR/SGD quotes from November and December 2024 look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61vaog18rwpxvp3ug7sj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61vaog18rwpxvp3ug7sj.png" alt="raw_data" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some characteristics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Date&lt;/strong&gt; is returned as a dataframe Index, not column&lt;/li&gt;
&lt;li&gt;The dataframe's columns are "Adj Close", "Close", "High", "Low", "Open" and "Volume"&lt;/li&gt;
&lt;li&gt;yfinance's FX rate convention is [Target Currency][Base Currency].&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We transform the raw data to get the MVP's desired columns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;date&lt;/strong&gt;: In "yyyymmdd" format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;base_currency&lt;/strong&gt;: 3-digit IBAN currency code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;target_currency&lt;/strong&gt;: 3-digit IBAN currency code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fx_rate&lt;/strong&gt;: Extracted from yfinance&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ETL Process: Transform
&lt;/h3&gt;

&lt;p&gt;I first dropped irrelevant columns and filtered for the last FX rate quote of each month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3rtfspiuuwrx8cqjmtv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3rtfspiuuwrx8cqjmtv.png" alt="drop" width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then used pandas' &lt;code&gt;melt()&lt;/code&gt; function to rename and reshape the dataframe. I also converted the Date index into a column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopws5mxizsm83vjmm3oj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopws5mxizsm83vjmm3oj.png" alt="reshape" width="800" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I split each currency pair into two columns: target_currency &amp;amp; base_currency. I anticipate this makes the data more flexible for data manipulation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuci84mx1nfno0q1su3am.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuci84mx1nfno0q1su3am.png" alt="split" width="800" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the final dataframe, after re-ordering and dropping columns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkomr3vxnxk13l73t77dc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkomr3vxnxk13l73t77dc.png" alt="final" width="800" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ETL Process: Load
&lt;/h3&gt;

&lt;p&gt;To load the data into Xano, I created an API endpoint that accepts a JSON of the final dataframe. This process is quite straightforward and closely follow the steps in &lt;a href="https://dev.to/sukoshizutsu/008-backend-database-build-part-2-54bc"&gt;my earlier post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The API endpoint has the following business logic: Based on the JSON input, retrieve the Primary Key of the Target Currency &amp;amp; Base Currency (from &lt;code&gt;reference_currencies&lt;/code&gt;). If found, store the results in the &lt;code&gt;yfinance_fx_rates&lt;/code&gt; table. I added a check that skips duplicate records.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffny7cizehmhwqaqkfgk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffny7cizehmhwqaqkfgk6.png" alt="xano_reference" width="771" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwdt8nvyrpqbrokj0dcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwdt8nvyrpqbrokj0dcp.png" alt="xano_fx" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;We do basic &lt;strong&gt;User Acceptance Testing&lt;/strong&gt; ("UAT") to evaluate the success of the ETL process, before proceeding to the next phase of the MVP&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>etl</category>
      <category>sukoshizutsu</category>
      <category>mvp</category>
    </item>
    <item>
      <title>#011 | Extract, Transform &amp; Load FX rates : Discover</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Fri, 24 Jan 2025 08:35:53 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/011-retrieve-store-historical-fx-rates-discover-45c4</link>
      <guid>https://dev.to/sukoshizutsu/011-retrieve-store-historical-fx-rates-discover-45c4</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;The requirements for the MVP's &lt;strong&gt;foreign exchange rate&lt;/strong&gt; ("FX rate") data provider are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Free access&lt;/li&gt;
&lt;li&gt;Monthly historical data from at least 2018 onwards&lt;/li&gt;
&lt;li&gt;FX rates for these currency pairs: USDMYR, SGDMYR, HKDMYR, GBPMYR and JPYMYR&lt;/li&gt;
&lt;li&gt;Data can be easily retrieved&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Given the plethora of financial market data providers, I asked ChatGPT for some options based on the MVP's requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Selecting an FX rate provider
&lt;/h3&gt;

&lt;p&gt;ChatGPT's suggestions were as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcahdqqw4ahq8ztijmet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcahdqqw4ahq8ztijmet.png" alt="Option" width="728" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I decided to prioritise trying &lt;strong&gt;&lt;a href="https://www.alphavantage.co/#about" rel="noopener noreferrer"&gt;Alpha Vantage&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://ranaroussi.github.io/yfinance/index.html" rel="noopener noreferrer"&gt;Yahoo Finance (via yfinance)&lt;/a&gt;&lt;/strong&gt;, as I have heard of these platforms before.&lt;/p&gt;

&lt;p&gt;Alpha Vantage is backed by &lt;strong&gt;Y Combinator&lt;/strong&gt; and delivers real-time and historical financial market data via &lt;strong&gt;Application Programming Interfaces&lt;/strong&gt; ("API"). &lt;/p&gt;

&lt;p&gt;yfinance is a free Python library that can be installed via &lt;code&gt;pip&lt;/code&gt;. The library scrapes data from Yahoo Finance's extensive, publicly-available financial database and returns it as a dataframe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Alpha Vantage
&lt;/h3&gt;

&lt;p&gt;This platform offers a lifetime, &lt;a href="https://www.alphavantage.co/support/#api-key" rel="noopener noreferrer"&gt;free usage tier&lt;/a&gt; where API consumers can make up to 25 API calls per day. This membership tier is good enough for building MVPs.&lt;/p&gt;

&lt;p&gt;The API lets users specify whether to return the results in &lt;strong&gt;Comma Separated Value&lt;/strong&gt; ("CSV") or &lt;strong&gt;JSON&lt;/strong&gt; format. I used Postman to a test API call:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvbz1665eiykietb5rvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvbz1665eiykietb5rvy.png" alt="alpha vantage api" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also did further tests that confirmed the currency pairs relevant to the MVP's needs could be met.&lt;/p&gt;

&lt;p&gt;However, one thing I noted is the API does not support customised start and end dates. This potentially makes the API response quite large, and requires additional data transformation handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: yfinance
&lt;/h3&gt;

&lt;p&gt;I used yfinance’s &lt;code&gt;download()&lt;/code&gt; function to retrieve a sample of historical month-end FX rates for the USDMYR. The data is returned as a dataframe, making it quite easy to apply data transformation logic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleetmqmz01nw1yc7ga1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleetmqmz01nw1yc7ga1n.png" alt="script" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;download()&lt;/code&gt; accepts many optional parameters that make it quite easy for users to filter for FX rate data relevant to their needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o3yp1m2plx7w74f9n1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o3yp1m2plx7w74f9n1f.png" alt="yfinance parameters" width="751" height="938"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, the start and end dates can be customised using the &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; parameters. This flexibility is not supported by the Alpha Vantage API.&lt;/p&gt;

&lt;p&gt;I did further tests that confirmed the currency pairs relevant to the MVP's needs could be met.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;I decided to get FX rate data from yfinance as it meets the MVP's requirements. Although I am not using Alpha Vantage, it can be a good backup in the event any issues arise with yfinance.&lt;/p&gt;

&lt;p&gt;The next step is to write a script in the MVP's Flask Python backend that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extracts FX rate data from yfinance&lt;/li&gt;
&lt;li&gt;Transforms the data&lt;/li&gt;
&lt;li&gt;Loads the data in a Xano database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>mvp</category>
      <category>sukoshizutsu</category>
      <category>yfinance</category>
    </item>
    <item>
      <title>#010 | Why FX rate data is important to the MVP</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Thu, 23 Jan 2025 07:43:45 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/010-why-fx-rate-data-is-important-to-the-mvp-11p</link>
      <guid>https://dev.to/sukoshizutsu/010-why-fx-rate-data-is-important-to-the-mvp-11p</guid>
      <description>&lt;p&gt;The next phase of the project is to automate the retrieval &amp;amp; storage of foreign exchange rate ("FX rate") data. This is used to standardise a portfolio's value to a common currency called the &lt;strong&gt;Base Currency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Before diving head first into how we are going to do this, let's first understand why a Base Currency is needed and how this impact's the MVP's requirements.&lt;/p&gt;

&lt;p&gt;The Base Currency serves as a reference point for measuring a portfolio's value when it comprises assets denominated in many currencies. And a portfolio's value is the building block to calculating its returns.&lt;/p&gt;

&lt;p&gt;Take for example the &lt;strong&gt;&lt;a href="https://secure.fundsupermart.com/fsmone/funds/factsheet/BGF015/Blackrock-Global-Equity-Income-A6-USD" rel="noopener noreferrer"&gt;BlackRock Global Equity Income Fund&lt;/a&gt; ("BlackRock Global")&lt;/strong&gt;. From the snapshot below (of the Fund's portfolio report), we can see the Fund invests in companies from Asia, Canada, Europe, Taiwan and the United States. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhd6vbjhnw1x7w365u24q.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhd6vbjhnw1x7w365u24q.jpg" alt="blackrock" width="800" height="1131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To calculate the portfolio's total value in US Dollar terms, the Fund's assets are standardised to the US Dollar from their respective currencies. The totals are then summed up. &lt;/p&gt;

&lt;p&gt;To replicate this capability, the MVP needs FX rate data and functions to do these calculations.&lt;/p&gt;

&lt;p&gt;Another point to make is that real-time FX rate data is not needed, as the MVP serves user personas who only need to tabulate month-end valuations. &lt;/p&gt;

&lt;p&gt;This makes a difference to the MVP's requirements as real-time market data is often expensive to procure. And I also don't want to overbuild this MVP.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>sukoshizutsu</category>
      <category>mvp</category>
    </item>
    <item>
      <title>#009 | Backend Database: User Acceptance Testing</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Tue, 21 Jan 2025 09:31:26 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/009-backend-database-user-acceptance-testing-23mh</link>
      <guid>https://dev.to/sukoshizutsu/009-backend-database-user-acceptance-testing-23mh</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;The backend database &lt;strong&gt;User Acceptance Testing&lt;/strong&gt; ("UAT") tests the four &lt;strong&gt;Application Programming Interface&lt;/strong&gt; ("API") endpoints that add valid records to a Xano database. &lt;/p&gt;

&lt;p&gt;To recap, the APIs have the following functionality:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Authenticate an API client&lt;/li&gt;
&lt;li&gt;Accept a JSON input&lt;/li&gt;
&lt;li&gt;Apply basic data checks to skip invalid records &amp;amp; store valid records&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I used two external clients to do the UAT: the MVP's Flask Python backend and &lt;strong&gt;Postman&lt;/strong&gt;. I wanted to try Postman as it's been a while since I've used it, and there's no harm refreshing my knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Scenario Coverage
&lt;/h3&gt;

&lt;p&gt;I came up with test scenarios that cover the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authentication&lt;/strong&gt;: Is each endpoint secure?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data validation&lt;/strong&gt;: Does each endpoint correctly handle duplicate records and invalid entities?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data storage&lt;/strong&gt;: Does each endpoint store valid records in the correct table?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Sample Data &amp;amp; UAT Results
&lt;/h3&gt;

&lt;p&gt;I re-used the same 10 Custodian Statement PDF statements (covering 71 pages) &lt;a href="https://dev.to/sukoshizutsu/005-automate-pdf-data-extraction-user-acceptance-testing-2of7"&gt;from the previous UAT&lt;/a&gt; as the sample dataset.&lt;/p&gt;

&lt;p&gt;This approach allowed me to easily validate my results by comparing the current UAT output to the previous UAT output (in Excel) as Xano data can be easily exported to &lt;strong&gt;Comma Separate Value&lt;/strong&gt; ("CSV") format.&lt;/p&gt;

&lt;p&gt;The alternative - using a fresh set of sample data - would have been to compare the current output to the Custodian Statement PDFs. This is a really tedious task with no obvious benefit.&lt;/p&gt;

&lt;p&gt;I executed &lt;strong&gt;20 UAT test case scenarios&lt;/strong&gt; in total, and there were fortunately no failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  External API Client: Flask Python Backend
&lt;/h3&gt;

&lt;p&gt;To call the Xano API endpoint using the Flask Python backend, I used Python's &lt;code&gt;requests&lt;/code&gt; library. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8e7fc1rr4qh017tihf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8e7fc1rr4qh017tihf.png" alt="vs_code" width="609" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;json_data&lt;/code&gt; variable contains the data extracted from the Custodian Statement PDF. The snapshot below is the results summary in my browser after a successful API call is made.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrpblz5w9v7ejdmagpuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrpblz5w9v7ejdmagpuo.png" alt="flask_python" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  External API Client: Postman
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Example 1: Invalid Auth Token
&lt;/h4&gt;

&lt;p&gt;Entering an &lt;strong&gt;invalid&lt;/strong&gt; &lt;code&gt;AuthToken&lt;/code&gt; returns a &lt;strong&gt;401 Unauthorized&lt;/strong&gt; response. This is expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8lyedr6ugk454x5a8pm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8lyedr6ugk454x5a8pm.png" alt="example_1" width="800" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 2: Skip Duplicate Records
&lt;/h4&gt;

&lt;p&gt;To simulate this test, I first added &lt;strong&gt;234&lt;/strong&gt; valid records to the &lt;code&gt;custodian_securities&lt;/code&gt; table using the API endpoint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhps0wnvufplfk0gl14mp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhps0wnvufplfk0gl14mp.png" alt="example_1_input" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then called the API endpoint with the same JSON input. As expected, none of these records were stored as they were all duplicates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favwt812hnoo282okmfoq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favwt812hnoo282okmfoq.png" alt="example_1_duplicate" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 3: Skip records with invalid entities
&lt;/h4&gt;

&lt;p&gt;I created a JSON input with a fictitious entity called "Mickey Mouse". I then called the &lt;code&gt;add_custodian_securities&lt;/code&gt; API endpoint.&lt;/p&gt;

&lt;p&gt;As expected, a &lt;strong&gt;200 Response Code&lt;/strong&gt; was returned, with the skipped record listed in the response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hw8ivvdochqsmf13pxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hw8ivvdochqsmf13pxq.png" alt="example_1_invalid" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;I previously did a flowchart to lay out the MVP's envisioned scope, and have updated it to illustrate what has been completed ("Phase 1") and what I plan to do next ("Phase 2").&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw51ox7wnxy4gq2aoy9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw51ox7wnxy4gq2aoy9i.png" alt="updated_flow" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>mvp</category>
      <category>xano</category>
      <category>postman</category>
      <category>sukoshizutsu</category>
    </item>
    <item>
      <title>#008 | Backend Database: Build (Part 2)</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Thu, 16 Jan 2025 01:43:46 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/008-backend-database-build-part-2-54bc</link>
      <guid>https://dev.to/sukoshizutsu/008-backend-database-build-part-2-54bc</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;I previously setup a Xano database - refer to &lt;a href="https://dev.to/sukoshizutsu/007-backend-database-build-part-1-110a"&gt;#007&lt;/a&gt; - to store data extracted from the Custodian Statement PDFs.&lt;/p&gt;

&lt;p&gt;I have now built &lt;strong&gt;Application Programming Interface&lt;/strong&gt; ("API") endpoints in Xano that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Authenticate an API client&lt;/li&gt;
&lt;li&gt;Accept a JSON input&lt;/li&gt;
&lt;li&gt;Apply basic data checks to skip invalid records&lt;/li&gt;
&lt;li&gt;Store valid records in the MVP's Xano database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqf1aqctx2sm03t196cz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqf1aqctx2sm03t196cz.png" alt="summary endpoints" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The workflow below recaps what I am currently building:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b61ilba3fp29qruq0vk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b61ilba3fp29qruq0vk.png" alt="Workflow" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These API endpoints can be consumed by any external client, such as my Flask Python backend, so long as they have a valid &lt;strong&gt;Auth Token&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making a Xano API call
&lt;/h3&gt;

&lt;p&gt;I called the &lt;em&gt;/add_custodian_securities&lt;/em&gt; endpoint, which adds data to the &lt;em&gt;custodian_securities&lt;/em&gt; table, with a JSON payload of hypothetical data extracted using my Flask Python backend. This table stores each client's month-end &lt;strong&gt;Securities&lt;/strong&gt; holdings.&lt;/p&gt;

&lt;p&gt;The JSON payload has three records, of which the second record is a duplicate of the first record. A duplicate record exists if there is more than one record with the same entity, account number and date combination.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "new_records": [
    {
      "name": "AJI",
      "exchange": "KLSE",
      "quantity": "12800",
      "last_price": "15.40",
      "current_value": 197120,
      "account_number": "A0000000",
      "date": "20241130"
    },
    {
      "name": "AJI",
      "exchange": "KLSE",
      "quantity": "12800",
      "last_price": "15.40",
      "current_value": 197120,
      "account_number": "A0000000",
      "date": "20241130"
    },
    {
      "name": "HLIND",
      "exchange": "KLSE",
      "quantity": "39000",
      "last_price": "15.22",
      "current_value": 593580,
      "account_number": "B0000000",
      "date": "20241130"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Xano API Builder &amp;amp; &lt;em&gt;custodian_securities&lt;/em&gt; table &lt;strong&gt;before&lt;/strong&gt; the API Call&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5rnm81g6jgq35kh2pjw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5rnm81g6jgq35kh2pjw.png" alt="before" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r7nzyp12ijf8a6hw7l4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r7nzyp12ijf8a6hw7l4.png" alt="before" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Xano API Builder &amp;amp; &lt;em&gt;custodian_securities&lt;/em&gt; table &lt;strong&gt;after&lt;/strong&gt; a successful API Call&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbiluaj84pcgsst3zq1t5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbiluaj84pcgsst3zq1t5.png" alt="after" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1tlbqtl1yvtqmivanxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1tlbqtl1yvtqmivanxz.png" alt="after" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The endpoint's authentication mechanism also works, as an error was returned when an expired &lt;strong&gt;Auth Token&lt;/strong&gt; was entered. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrko0fuvfp4x9ggs24qm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrko0fuvfp4x9ggs24qm.png" alt="auth token" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why add data checks?
&lt;/h3&gt;

&lt;p&gt;The quality of the MVP's data output is directly impacted by the quality of the data being ingested. &lt;/p&gt;

&lt;p&gt;It is therefore important to have basic data checks to skip over invalid records before data ingestion. The endpoint should also return variables that inform the user of the data processing results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;I now have the building blocks to extract data from a list of PDFs and make an API call to store valid records in their respective Xano tables.&lt;/p&gt;

&lt;p&gt;But before I proceed further, I will run some tests to ensure the endpoints are working as expected.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>mvp</category>
      <category>xano</category>
      <category>projectsukoshizutsu</category>
    </item>
    <item>
      <title>#007 | Backend Database: Build (Part 1)</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Fri, 10 Jan 2025 06:35:10 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/007-backend-database-build-part-1-110a</link>
      <guid>https://dev.to/sukoshizutsu/007-backend-database-build-part-1-110a</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;I built a database in Xano to store data extracted from the Custodian Statement PDFs. This database replaces the earlier &lt;a href="https://dev.to/sukoshizutsu/004-automate-pdf-data-extraction-build-2p8c"&gt;Comma Separate Values ("CSV") file storage format.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then connected my Flask Python backend to the Xano database using API endpoints, also built in Xano.&lt;/p&gt;

&lt;p&gt;I found the Xano learning curve quite steep as A) I was new to it and B) I have very little experience with databases. That said, I managed to achieve my objective with a little trial &amp;amp; error, ChatGPT and help from Xano's tutorial videos.&lt;/p&gt;

&lt;p&gt;This article will be split into two parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 1&lt;/strong&gt;: Creating database tables in Xano&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 2&lt;/strong&gt;: Building Xano API endpoints to execute database queries&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2662o082tkysx7u5s0bq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2662o082tkysx7u5s0bq.png" alt="recap" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Tables to store entity attributes ("Reference Tables")
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxlzpu7tjduv0i9u2bs0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxlzpu7tjduv0i9u2bs0.png" alt="Summary" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I created &lt;strong&gt;Reference Tables&lt;/strong&gt; to store attributes of clients, securities, funds and miscellaneous financial instruments. This process creates a unique Primary Key for each entity, which can be referenced as Foreign Keys in other tables.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;reference_securities&lt;/strong&gt; table stores a security's &lt;strong&gt;full name&lt;/strong&gt;, &lt;strong&gt;short name&lt;/strong&gt;, &lt;strong&gt;stock code&lt;/strong&gt;, &lt;strong&gt;trading currency&lt;/strong&gt; and &lt;strong&gt;stock exchange&lt;/strong&gt;. I populated &lt;strong&gt;reference_securities&lt;/strong&gt; with data downloaded (legally) from the &lt;strong&gt;Singapore Exchange&lt;/strong&gt; and &lt;strong&gt;Bursa Malaysia&lt;/strong&gt; websites.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hgjr4dewe9uqoczfwxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hgjr4dewe9uqoczfwxg.png" alt="securities" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;reference_funds&lt;/strong&gt; stores each fund's &lt;strong&gt;full name&lt;/strong&gt;, &lt;strong&gt;country of origin&lt;/strong&gt;, &lt;strong&gt;short name&lt;/strong&gt; and &lt;strong&gt;trading currency&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhrqkqjiwczp2qfx1emq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhrqkqjiwczp2qfx1emq.png" alt="reference funds" width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;reference_clients&lt;/strong&gt; stores each client's &lt;strong&gt;account number&lt;/strong&gt; and &lt;strong&gt;name&lt;/strong&gt;. Additional attributes can be added at a later stage, if there is a need.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk36jbotybhmgb0v67tz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk36jbotybhmgb0v67tz.png" alt="clients" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Tables to store Custodian Statements data
&lt;/h3&gt;

&lt;p&gt;I created four tables to store data extracted from the Custodian Statement PDFs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjhv7b7g07vdlcrcvv02.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjhv7b7g07vdlcrcvv02.png" alt="custodian data" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These tables are linked to one another using Foreign Keys, which are the Primary Keys of fields like a client's account number, security or fund. &lt;/p&gt;

&lt;p&gt;The Xano snapshot below summarises the inter-table relationships, along with each table's fields.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F811e27aag8j01w67i5ia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F811e27aag8j01w67i5ia.png" alt="Relationships" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;With the database tables setup, I now need API endpoints to populate these tables with extracted Custodian Statement data.&lt;/p&gt;

&lt;p&gt;The steps I took to build the API endpoints will be outlined in Part 2.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
    </item>
    <item>
      <title>#006 | Backend Database : Discovery</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Tue, 17 Dec 2024 09:10:09 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/006-data-storage-discovery-491m</link>
      <guid>https://dev.to/sukoshizutsu/006-data-storage-discovery-491m</guid>
      <description>&lt;h3&gt;
  
  
  Table Of Contents
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    * Overview&lt;br&gt;
    * Xano: An Easy-to-use Database System&lt;br&gt;
    * Building a Table&lt;br&gt;
    * Building an API&lt;br&gt;
    * Next Steps&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  Overview &lt;a&gt;&lt;/a&gt;&lt;br&gt;
&lt;/h3&gt;

&lt;p&gt;I built a Python code to &lt;a href="https://dev.to/sukoshizutsu/004-automate-pdf-data-extraction-build-2p8c"&gt;extract data from Custodian Statement PDFs and output the data into three CSV files&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Comma Separated Value ("CSV") files are a great tool for non-technical users who need to store and perform basic data analysis.&lt;/p&gt;

&lt;p&gt;However, a more scalable storage solution is needed to handle larger datasets and complex data queries. Another drawback of a CSV file is that it is a single-user storage format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Xano: An Easy-To-Use Backend System  &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;I have been experimenting with &lt;strong&gt;&lt;a href="https://www.xano.com/" rel="noopener noreferrer"&gt;Xano&lt;/a&gt;&lt;/strong&gt;, a cloud-based low-code backend platform, since coming to know about it from this company's &lt;a href="https://rarelydecaf.com/blog/learn-weweb-and-xano" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Xano is great for non-technical users because it offers a user-friendly, low-code user interface ("UI") on top of PostgreSQL. The latter is a popular open-source object-relational database system. &lt;/p&gt;

&lt;p&gt;Xano also has a low-code API builder UI that helps you design and build APIs to connect your database to other systems.&lt;/p&gt;

&lt;p&gt;After replacing the CSVs as a storage medium with Xano, the MVP's workflow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c4sbs06vddly70iv428.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c4sbs06vddly70iv428.png" alt="Workflow" width="800" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To familiarise myself with Xano, I did the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Built a database table&lt;/li&gt;
&lt;li&gt;Built a user signup API&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Building A Table &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Creating a table &amp;amp; adding fields
&lt;/h4&gt;

&lt;p&gt;Xano's UI  makes table creation a simple process. Users have the option to either import their own data (via CSV or &lt;strong&gt;Airtable&lt;/strong&gt;) or enter data manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmx4jtnazi4o6q91ho90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmx4jtnazi4o6q91ho90.png" alt="Create Table" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjeouj8v7wn6u6tah8pr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjeouj8v7wn6u6tah8pr5.png" alt="Create Table" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adding a table field and specifying its type is also straightforward. It is also easy to make changes later on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitcuz5zirbvx9pzed5zs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitcuz5zirbvx9pzed5zs.png" alt="Add fields" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Generating random data for testing
&lt;/h4&gt;

&lt;p&gt;This is very helpful when you want dummy data to test your application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavju3l1ok888mjduens.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavju3l1ok888mjduens.png" alt="Generate Data" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz420lrr3dorezsdyytvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz420lrr3dorezsdyytvb.png" alt="Generate Data" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Exporting data
&lt;/h4&gt;

&lt;p&gt;Should you wish to migrate your data, Xano lets you export your table data to CSV.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y6njihi1sef4keoxkby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y6njihi1sef4keoxkby.png" alt="Export Data" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Building an API &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;An Application Programming Interface ("API") is needed to send the data extracted from Custodian Statement PDFs to Xano for storage purposes. Vice-versa, the data stored in Xano can also be retrieved later on via an API.&lt;/p&gt;

&lt;p&gt;Xano has a pretty nifty low-code API builder that makes it easy to setup an API and add functions to determine the endpoints functionality.&lt;/p&gt;

&lt;p&gt;For example, I created a user signup API that lets the Back Office Team create accounts. I added functions to limit the number of created accounts to 3 (to block unauthorised spam accounts from being created) and prevent a signup if an existing username exists.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building the API's functionality&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8iyac0dy4lsklg941xw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8iyac0dy4lsklg941xw.jpg" alt="Functionality" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Successful user account creation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8gfwcq10emefesdcge5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8gfwcq10emefesdcge5.png" alt="API creation" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Precondition 1: Block creation of more than three user accounts&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5a2ms2p055us2moh1pb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5a2ms2p055us2moh1pb.png" alt="API precondition" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Precondition 2: Block creation of duplicate user accounts&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb9ggbqp5gxla139vw2d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb9ggbqp5gxla139vw2d.png" alt="API precondition" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;I think I now have the basic knowledge to create APIs in Xano to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create user accounts&lt;/li&gt;
&lt;li&gt;Login to a user account and return an Authentication Token&lt;/li&gt;
&lt;li&gt;Store Securities Holdings data in &lt;em&gt;securities_pdf&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Store Fund Holdings data in &lt;em&gt;funds_pdf&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Store Cash Holdings data in &lt;em&gt;cash_pdf&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I also need to write code in my Python backend to send corresponding requests to these Xano APIs.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>projectsukoshizutsu</category>
      <category>mvp</category>
      <category>xano</category>
    </item>
    <item>
      <title>#005 | Automate PDF data extraction: User Acceptance Testing</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Thu, 12 Dec 2024 08:20:28 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/005-automate-pdf-data-extraction-user-acceptance-testing-2of7</link>
      <guid>https://dev.to/sukoshizutsu/005-automate-pdf-data-extraction-user-acceptance-testing-2of7</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;Prior to each feature release, I do &lt;strong&gt;User Acceptance Testing&lt;/strong&gt; ("UAT") to surface bugs and ensure the business logic is correctly translated to code. &lt;/p&gt;

&lt;p&gt;I only clear a feature for release after UAT is 100% successful.&lt;/p&gt;

&lt;p&gt;My reasoning is simple: you only get one chance to make a good first impression to your end user, and a poor release makes it doubly hard to do so.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7v07pqvdavp6z2pkxl2x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7v07pqvdavp6z2pkxl2x.png" alt="Meme" width="500" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although this is an MVP feature that isn't meant for production release, I thought it'd be good to do some UAT to keep my skills fresh.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Of the &lt;strong&gt;19 UAT scenarios&lt;/strong&gt; I came up with, one failed because of a change in the &lt;strong&gt;Custodian Statement&lt;/strong&gt; PDF template. &lt;/p&gt;

&lt;p&gt;I anticipated this risk during &lt;a href="https://dev.to/sukoshizutsu/003-automate-pdf-data-extraction-discovery-59d7"&gt;Discovery&lt;/a&gt;, but truth be told, I did not expect the issue to crop up so soon. &lt;/p&gt;

&lt;p&gt;I will go into the bug fix details later in the article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Methodology
&lt;/h3&gt;

&lt;p&gt;My UAT process involves using the business logic or feature requirements as a reference to create test scenarios and expected outcomes.&lt;/p&gt;

&lt;p&gt;Test scenarios don't need to be complicated. They can be as simple as : "The feature generates a CSV file within 30 seconds".&lt;/p&gt;

&lt;p&gt;For the UAT, I processed &lt;strong&gt;71 pages&lt;/strong&gt; of documents from &lt;strong&gt;10 Custodian Statement PDFs&lt;/strong&gt;. This should be a sufficiently large enough sample set.&lt;/p&gt;

&lt;p&gt;The expected output is three CSV files containing specific datapoints from the &lt;strong&gt;Fund Holdings&lt;/strong&gt;, &lt;strong&gt;Securities Holdings&lt;/strong&gt; and &lt;strong&gt;Cash Holdings&lt;/strong&gt; sections of the Custodian Statement PDF.&lt;/p&gt;

&lt;p&gt;I came up with the following test cases:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CSV 1: Fund Holdings&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzwp7n8nwm6oripis5fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzwp7n8nwm6oripis5fk.png" alt="Fund Holdings CSV" width="791" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CSV 2: Securities Holdings&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdc3nuc9188wrv5c0o787.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdc3nuc9188wrv5c0o787.png" alt="Securities Holdings CSV" width="791" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CSV 3: Cash Holdings&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2af7l8xqp1lhihsxg00m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2af7l8xqp1lhihsxg00m.png" alt="Cash Holdings" width="790" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug Fixing
&lt;/h3&gt;

&lt;p&gt;The one failed test was because the Custodian Statement PDF's template changed slightly in November. More specifically, the values in the "Current Value# 1. Foreign Currency 2. RM Equivalent" column of a Fund Holdings table now has an extra "-\n" prefix. &lt;/p&gt;

&lt;p&gt;For example, instead of reading "USD 10,000" in previous PDFs, the value now reads "- USD10,000".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu5qy781oz7s5ja84pj9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu5qy781oz7s5ja84pj9.jpg" alt="Change in PDF template" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This small change resulted in the following issue:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg7cyd3gzl3t5b0cvpsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg7cyd3gzl3t5b0cvpsj.png" alt="CSV output error" width="569" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I consulted ChatGPT on a fix, and it recommended the following scrubbing logic be added to remove the incorrect "-/n" prefix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scrub error prefix
df['Currency'] = df['Currency'].str.replace('[-\n]', '', regex=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scrubbing did the trick and the Fund Holdings CSV output now comes out as expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Next?
&lt;/h3&gt;

&lt;p&gt;I'm now comfortable that the code to extract PDF data is functional. That said, I don't think a CSV file is the best place to store all this data.&lt;/p&gt;

&lt;p&gt;While CSV is user friendly (to me), storing data in a database makes it much easier to retrieve and manipulate data as per the end user's requirements.&lt;/p&gt;

&lt;p&gt;I have very limited experience in databases. So what I'll do next is Discovery on a database application that I can onboard quickly.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>projectsukoshizutsu</category>
      <category>mvp</category>
      <category>python</category>
    </item>
    <item>
      <title>#004 | Automate PDF data extraction: Build</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Thu, 05 Dec 2024 07:45:00 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/004-automate-pdf-data-extraction-build-2p8c</link>
      <guid>https://dev.to/sukoshizutsu/004-automate-pdf-data-extraction-build-2p8c</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;I wrote a Python script that translates the &lt;a href="https://dev.to/sukoshizutsu/003-automate-pdf-data-extraction-discovery-59d7"&gt;PDF data extraction business logic&lt;/a&gt; into working code.&lt;/p&gt;

&lt;p&gt;The script was tested on &lt;strong&gt;71 pages&lt;/strong&gt; of Custodian Statement PDFs covering a 10 month period (Jan to Oct 2024). Processing the PDFs took about &lt;strong&gt;4 seconds&lt;/strong&gt; to complete - significantly quicker than doing it manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9jcw238l9v4o8ck83tr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9jcw238l9v4o8ck83tr.png" alt="Image description" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From what I see, the output looks correct and the code did not run into any errors.&lt;/p&gt;

&lt;p&gt;Snapshots of the three CSV outputs are shown below. Note that sensitive data has been greyed out.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Snapshot 1: Stock Holdings&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjdearctxltpxbdphasc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjdearctxltpxbdphasc.png" alt="Image description" width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Snapshot 2: Fund Holdings&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzh5e732lg1skvtr52pv2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzh5e732lg1skvtr52pv2.png" alt="Image description" width="800" height="840"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Snapshot 3: Cash Holdings&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2wcxdtjvxi5sezi25l3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2wcxdtjvxi5sezi25l3.png" alt="Image description" width="800" height="847"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This workflow shows the broad steps I took to generate the CSV files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqc8sm60ww133dc3v5rcq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqc8sm60ww133dc3v5rcq.jpg" alt="Image description" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, I will elaborate in more detail how I translated the business logic to code in Python.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Read PDF documents
&lt;/h3&gt;

&lt;p&gt;I used &lt;code&gt;pdfplumber&lt;/code&gt;'s &lt;code&gt;open()&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Open the PDF file
with pdfplumber.open(file_path) as pdf:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;file_path&lt;/code&gt; is a declared variable that tells &lt;code&gt;pdfplumber&lt;/code&gt; which file to open.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2.0: Extract &amp;amp; filter tables from each page
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;extract_tables()&lt;/code&gt; function does the hard work of extracting all tables from each page. &lt;/p&gt;

&lt;p&gt;Though I am not really familiar with the underlying logic, I think the function did a pretty good job. For example, the two snapshots below show the extracted table vs. the original (from the PDF)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Snapshot A: Output from VS Code Terminal&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lx2bz8bdt32uw1tirk8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lx2bz8bdt32uw1tirk8.png" alt="Image description" width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Snapshot B: Table in PDF&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47wqzaymffrgq974cvv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47wqzaymffrgq974cvv7.png" alt="Image description" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then needed to uniquely label each table, so that I could "pick and choose" data from specific tables later on.&lt;/p&gt;

&lt;p&gt;The ideal option was to use each table's title. However, determining the title coordinates were beyond my capabilities.&lt;/p&gt;

&lt;p&gt;As a workaround, I identified each table by concatenating the headers of the first three columns. For example, the &lt;strong&gt;Stock Holdings&lt;/strong&gt; table in &lt;strong&gt;Snapshot B&lt;/strong&gt; is labeled &lt;code&gt;Stocks/ETFs\nNameExchangeQuantity&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;⚠️This approach has a serious drawback - the first three header names do not make all tables sufficiently unique. Fortunately, this only impacts irrelevant tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2.1: Extract, filter &amp;amp; transform non-table text
&lt;/h3&gt;

&lt;p&gt;The specific values I needed - Account Number and Statement Date - were sub-strings in Page 1 of each PDF.&lt;/p&gt;

&lt;p&gt;For example, "Account Number M1234567" contains account number "M1234567".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7vmahwnxgj0qg3d53s3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7vmahwnxgj0qg3d53s3.png" alt="Image description" width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used Python's &lt;code&gt;re&lt;/code&gt; library and got ChatGPT to suggest suitable regular expressions ("regex"). The regex breaks up each string into two groups, with the desired data in the second group.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Regex for Statement Date and Account Number strings&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;regex_date=r'Statement for \b([A-Za-z]{3}-\d{4})\b'
regex_acc_no=r'Account Number ([A-Za-z]\d{7})'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I next transformed the Statement Date into "yyyymmdd" format. This makes it easier to query and sort data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; if match_date:
    # Convert string to a mmm-yyyy date
    date_obj=datetime.strptime(match_date.group(1),"%b-%Y")
    # Get last day of the month
    last_day=calendar.monthrange(date_obj.year,date_obj.month[1]
    # Replace day with last day of month
    last_day_of_month=date_obj.replace(day=last_day)
    statement_date=last_day_of_month.strftime("%Y%m%d")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;match_date&lt;/code&gt; is a variable declared when a string matching the regex is found.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Create tabular data
&lt;/h3&gt;

&lt;p&gt;The hard yards - extracting the relevant datapoints - were pretty much done at this point.&lt;/p&gt;

&lt;p&gt;Next, I used pandas' &lt;code&gt;DataFrame()&lt;/code&gt; function to create tabular data based on the output in &lt;strong&gt;Step 2&lt;/strong&gt; and &lt;strong&gt;Step 3&lt;/strong&gt;. I also used this function to drop unnecessary columns and rows.&lt;/p&gt;

&lt;p&gt;The end result can then be easily written to a CSV or stored in a database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Write data to CSV file
&lt;/h3&gt;

&lt;p&gt;I used Python's &lt;code&gt;write_to_csv()&lt;/code&gt; function to write each dataframe to a CSV file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;write_to_csv(df_cash_selected,file_cash_holdings)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;df_cash_selected&lt;/code&gt; is the Cash Holdings dataframe while &lt;code&gt;file_cash_holdings&lt;/code&gt; is the file name of the Cash Holdings CSV.&lt;/p&gt;

&lt;p&gt;➡️ I will write the data to a proper database once I have acquired some database know-how.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;A working script is now in place to extract table and text data from the Custodian Statement PDF.&lt;/p&gt;

&lt;p&gt;Before I proceed further, I will run some tests to see if the script is working as expected.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>mvp</category>
      <category>projectsukoshizutsu</category>
      <category>python</category>
    </item>
    <item>
      <title>#003 | Automate PDF data extraction: Discovery</title>
      <dc:creator>Jian</dc:creator>
      <pubDate>Mon, 02 Dec 2024 06:16:53 +0000</pubDate>
      <link>https://dev.to/sukoshizutsu/003-automate-pdf-data-extraction-discovery-59d7</link>
      <guid>https://dev.to/sukoshizutsu/003-automate-pdf-data-extraction-discovery-59d7</guid>
      <description>&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;I first analysed the Custodian Statement PDF to understand its structure and identify datapoints relevant to the Project.&lt;/p&gt;

&lt;p&gt;This helps me think of the business logic that needs to be translated to code. &lt;/p&gt;

&lt;p&gt;On the coding aspect, I will use ChatGPT to guide me as I have no prior experience writing code to extract PDF data. Taking time to understand ChatGPT's suggestions is as important as getting the right answer, as I also want to improve my coding skills. &lt;/p&gt;

&lt;p&gt;I'd also like to see how different the experience of sourcing solutions from ChatGPT is to platforms like Stackoverflow.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 1: Analyse PDF structure
&lt;/h3&gt;

&lt;p&gt;The Custodian Statements have three major sections, each containing one or more tables.&lt;/p&gt;

&lt;p&gt;In the snapshots below, I annotated the tables (blue), table headers (green) and non-table text (green) relevant to the Project requirements. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Section 1: Unit Trust&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd7smfs9cpk01dvx5gn2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd7smfs9cpk01dvx5gn2.jpg" alt="Image description" width="800" height="1132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Section 2: Stock Holdings&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkk138mxt3d8p25sm7hst.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkk138mxt3d8p25sm7hst.jpg" alt="Image description" width="800" height="1132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Section 3: Cash Holdings&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpjzurt4rb92hq7mjpdr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpjzurt4rb92hq7mjpdr.jpg" alt="Image description" width="800" height="1132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➡️ The business logic below needs to be translated to code&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ID&lt;/th&gt;
&lt;th&gt;Business Logic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BL-1.0&lt;/td&gt;
&lt;td&gt;Extract &lt;strong&gt;Fund Name&lt;/strong&gt;, &lt;strong&gt;Units&lt;/strong&gt; and &lt;strong&gt;Current Value&lt;/strong&gt; values from "Unit trust holdings as at &lt;code&gt;Statement Date&lt;/code&gt;" table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BL-2.0&lt;/td&gt;
&lt;td&gt;Extract &lt;strong&gt;Stocks/ETFs Name&lt;/strong&gt;, &lt;strong&gt;Exchange&lt;/strong&gt;, &lt;strong&gt;Quantity&lt;/strong&gt;, &lt;strong&gt;Price&lt;/strong&gt; and &lt;strong&gt;Current Value&lt;/strong&gt; from "Stock Holdings as at &lt;code&gt;Statement Date&lt;/code&gt;" table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BL-3.0&lt;/td&gt;
&lt;td&gt;Extract &lt;strong&gt;Opening Balance&lt;/strong&gt; &amp;amp; &lt;strong&gt;Closing Balance&lt;/strong&gt; values from "Cash Account Transactions in the month" table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BL-4.0&lt;/td&gt;
&lt;td&gt;Extract the account number "M1234567" from "Account Number M1234567" string.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BL-5.0&lt;/td&gt;
&lt;td&gt;Extract the date "Jan-2024" from "Statement for Jan-2024".&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Step 2: Identify a suitable python library
&lt;/h3&gt;

&lt;p&gt;As you can see from the snapshots, the Custodian Statement contains a mixture of tables and text. &lt;/p&gt;

&lt;p&gt;I asked ChatGPT a general question on python libraries suited to PDF data extraction.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;What python library can I use to extract PDF data?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PyPDF2&lt;/td&gt;
&lt;td&gt;PyPDF2 is a pure-Python library that can extract text from a PDF, but it is relatively simple and doesn't handle more complex PDFs (like those with embedded images or non-standard fonts) as well as other libraries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pdfplumber&lt;/td&gt;
&lt;td&gt;pdfplumber is an excellent library for extracting both text and tables from PDFs. It works well for PDFs with complex layouts and tables.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyMuPDF&lt;/td&gt;
&lt;td&gt;PyMuPDF (also known as fitz) is another robust library that can extract text and images from PDFs. It's often more efficient and accurate than PyPDF2 when handling complex PDFs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pdfminer.six&lt;/td&gt;
&lt;td&gt;pdfminer.six is a Python 3-compatible version of pdfminer, a library for extracting information from PDFs. It is particularly useful if you need to extract detailed information from complex PDFs or perform fine-grained analysis.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;➡️ Build the first iteration with &lt;code&gt;pdfplumber&lt;/code&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 3: Decide where extracted PDF data is written to
&lt;/h3&gt;

&lt;p&gt;I am more familiar with using Excel and CSV files to analyse data. So while researching and figuring out how to set up a database, I will write the PDF data to a CSV file.&lt;/p&gt;

&lt;p&gt;There should be three CSVs:&lt;/p&gt;

&lt;p&gt;CSV 1: Fund Holdings&lt;br&gt;
CSV 2: Securities Holdings&lt;br&gt;
CSV 3: Cash Holdings&lt;/p&gt;

&lt;p&gt;Writing to a CSV file means I can easily view and analyse the PDF-extracted data as I build the script. &lt;/p&gt;

&lt;p&gt;Once I am confident the script does what it does, I can customise it to write the data to either a local database or a cloud-based database.&lt;/p&gt;

&lt;p&gt;A CSV could be also be a storage solution. However, I think CSVs are less efficient at handling larger datasets and doing complex computations. Security could also be an issue, as the files can be easily stolen.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 4: Anticipate future challenges
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;How "change-proof" is the PDF data extraction solution?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's prudent to assume the Custodian Statement structure will change down the line. The PDF data extraction logic will need to be updated when this happens&lt;/p&gt;

&lt;p&gt;This is not an immediate concern to be bogged down with, as I notice at least a years' worth of PDFs have the same format&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does the date in &lt;code&gt;BL-5.0&lt;/code&gt; need to be transformed?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The current date format in &lt;code&gt;BL-5.0&lt;/code&gt; is in "mmm-yyyy" format.&lt;/p&gt;

&lt;p&gt;Based on my experience using dates in Excel, it's always better if dates are a numerical string. This makes the data easier to sort and query. &lt;/p&gt;

&lt;p&gt;I think this logic also applies when the data is stored in a database. Something to consider during the build phase.&lt;/p&gt;

&lt;p&gt;--Ends&lt;/p&gt;

</description>
      <category>projectsukoshizutsu</category>
      <category>mvp</category>
    </item>
  </channel>
</rss>
