In my last post I wrote about using Google Sheets to webscrape dividend data from dividendhistory.org.
Today I'm going to share with you my adventures to improve efficiency and how to store data using sheets
FLATTEN() functions and search using
- The improved scrape
- Packing the data
- Searching the data
The Improved Scrape
Before, to scrape the dividend history table I was using the
IMPORTHTML() function but also noted that some times I may make up to two calls because the web page may only have one table instead of two causing essentially an index out of range error.
After reviewing the page source I found out the table has an
<table id='dividend_table' class='table table-striped table-bordered'> <thead> <tr> <th>Ex-Dividend Date</th> <th>Payout Date</th> <th>Cash Amount</th> <th>% Change</th> </tr> </thead> <tbody> <tr> <td><i>2022-10-29</i></td> <td><i>2022-11-26</i></td> <td><i>$1.33**</i></td> <td><i>unconfirmed/estimated</i></td> </tr> <tr> <td><i>2022-07-30</i></td> <td><i>2022-08-26</i></td> <td><i>$1.33**</i></td> <td><i>unconfirmed/estimated</i></td> </tr> <tr> <td>2022-04-29</td> <td>2022-05-26</td> <td>$1.33</td> <td></td> </tr> ... ... </table>
I can use
IMPORTXML() and just query for this table directly!
And using that query looks like this in Sheets:
You can continue to use the
INDEX(range, row, col) function to grab the exact cells you want but for myself I wanted to do a few fun things.
Packing the data
Now, I only really need some rows of data and I don't really care for the
% Change column, so we use
ARRAY_CONSTRAIN(range, rows, cols)
=ARRAY_CONSTRAIN(IMPORTXML("https://dividendhistory.org/payout/TSX/BMO", "//table[@id='dividend_table']/*/*"), 6, 3)
I want to further pack the data so that it occupies it's own row or column which can be done using
FLATTEN() and you'll see that order will matter:
We want to apply
TRANSPOSE() first then
=FLATTEN(TRANSPOSE(ARRAY_CONSTRAIN(IMPORTXML("https://dividendhistory.org/payout/TSX/BMO", "//table[@id='dividend_table']/*/*"), 6, 3)))
Which makes it look like the columns are combined to a stack (hehe stack):
Searching the Data
If you ever had to take a course on algorithms and data structures you might remember doing a lot of stuff related to pointers and offsets and this is why I stored the data to fit one column!
When am I going to get paid next?
I can make use of the
MATCH(key, range, [search_type]) formula to search for a relative offset.
search_type is useful because I want to find the next payout date. To find the date that is equal to or greater than today I can set the
-1 and luckily for us the dates are already sorted in descending order:
Sample of what we're going to do with the data:
As of writing,
2022-04-08 and we can see that the relative offset of the next payout date (
2022-05-26) when searching from
3 from the top, so located in
=MATCH(TODAY(), B14:B18, -1)
We can also get the exact row where
Payout Date starts at using
ROW() (useful if you drag things here and there instead of hardcoding):
So now we know that the next payout date is located in cell
B(13 + 3 but how can we get the information from that cell address?
We can use the
INDIRECT() function to reference a cell using an address that's in
Basically we want a generic way to write:
The cool thing is we can use
INDIRECT() also accepts address in
RyCx format if we set the optional
is_A1_notation flag to
FALSE. Again I like using
& to build strings in sheets:
=INDIRECT("R" & SUM(B1 + B3) & "C" & COLUMN(), FALSE)
And we can do the same for getting the next Cash Amount by using a different starting offset but same relative offset:
=INDIRECT("R" & SUM(B2 + B3) & "C" & COLUMN(), FALSE)
What about the other data?
In my last post I also scraped the
yield. I also did this and with a bit of data massaging also tacked it onto the column as you can see here in my raw data spreadsheet example:
So ultimately I end up making only two calls to dividend history per ticker symbol which is much better than up-to two calls for the dividend amount, and one call for the yield. We also get all the useful data from the page.
Here is an example of what you can do with the raw data and only taking the relevant data:
There's all sorts of stuff you can do for example, taking the frequency and calculating how much dividend you could expect from a single share over a year. You can use
IF() to decide if the frequency would be 12 (monthly) or 4 (quarterly).
And here is a link to the example if you would like to make a copy and check out the stuff for yourself. It's been pre-populated with Canadian banks and some American companies that pay dividends.
Top comments (0)