From Functional to Class: a look at SOLID coding

So now we have a set of unit-tested functions that extract data from the website, transform it into usable numbers, and load it to a JSON file, but the functions aren't very scalable. What if we established a FundScraper class to use not just on the one website, but on other, similar fund websites?

A class in Python isn't hard to create. We could create one just by declaring class FundScraper and passing in something like the URL when we instantiate it.

class FundScraper:
    def __init__(self, url):
        self.url = url

Notice we don't put the args up in the class definition like this: class FundScraper(url). That's because the parentheses would imply it inherits from another class called url, which is not our intention.
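To make that concrete, here's a minimal sketch; the Scraper base class is hypothetical, just to show what the parentheses actually mean:

class Scraper:                  # a hypothetical base class
    pass


class FundScraper(Scraper):     # parentheses mean inheritance: FundScraper *is a* Scraper
    def __init__(self, url):
        self.url = url          # data like the URL comes in through __init__ instead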

We could bring in the rest of our functions from the functional scraper article and be pretty much finished. All we would need to do is replace the runner that we put after if __name__ == '__main__':.

Again, this is very easy to do. We would just create another method like this:

    def run(self):
        # Extract: fetch and parse the page
        soup = self.web_call()
        data = {}
        # Transform: pull the values and strip '$US' and ',' so they're usable numbers
        data['shareprice'] = self.get_fund_values(
            soup, 4, 'fundHeader_value', ['$US', ','])
        data['u3o8_stock'] = self.get_fund_values(
            soup, 6, 'fundHeader_value', ['$US', ','])
        # Load: write the results to JSON
        self.write_json(data)
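For context, here's a plausible sketch of the helper methods run relies on, adapted from the functional version; details like the parser and the output filename are assumptions here:

import json

import requests
from bs4 import BeautifulSoup


class FundScraper:
    def __init__(self, url):
        self.url = url

    def web_call(self):
        # Fetch the page and parse it into a BeautifulSoup tree
        response = requests.get(self.url)
        response.raise_for_status()
        return BeautifulSoup(response.text, 'html.parser')

    def get_fund_values(self, soup, index, class_name, strip_strings):
        # Take the nth element with the given class, strip the unwanted
        # substrings (like '$US' and ','), and return a usable number
        value = soup.find_all(class_=class_name)[index].text
        for s in strip_strings:
            value = value.replace(s, '')
        return float(value)

    def write_json(self, data, filename='data.json'):
        # Load step: dump the scraped values to a JSON file
        with open(filename, 'w') as f:
            json.dump(data, f)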

You can find the full scraper class here. You would run it by importing FundScraper, passing it the URL, and calling the run method like this:

from scrapper_class import FundScraper

scraper = FundScraper(url='https://sprott.com/investment-strategies/physical-commodity-funds/uranium/')
scraper.run()

However, if you wanted to run this on another website, it wouldn't really work unless that site were set up exactly the same way and you were extracting the exact same information.

That's because our class isn't aligned with SOLID principles.

S – Single Responsibility Principle (SRP)
This doesn't mean a class can only have a single method or function, but rather that the class should have only one responsibility. FundScraper is responsible for extracting, transforming, and loading, which is three responsibilities.
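One way to split those responsibilities apart; the class names here are just illustrative, not the article's final design:

class Extractor:
    def extract(self, url):
        ...  # fetch and parse the page


class Transformer:
    def transform(self, soup):
        ...  # pull the values out and clean them into numbers


class Loader:
    def load(self, data):
        ...  # write the result to a JSON file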

O – Open-Closed Principle (OCP)
This means that classes should be "open for extension, but closed for modification." Remember how I said that putting an arg in the class definition, like class FundScraper(url), meant it inherited another class? Inheritance is one way you could extend it. If we had a simple, generic Scraper class that only scraped/extracted the webpage, we could inherit from it; that would be an example of extending a class. Modifications to a base class, even simple ones, can have a large blast radius, so we try to extend instead of modify.
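A sketch of that idea, assuming a generic Scraper base class like the one described above:

import requests
from bs4 import BeautifulSoup


class Scraper:
    # Generic base class: only fetches and parses a page
    def __init__(self, url):
        self.url = url

    def web_call(self):
        response = requests.get(self.url)
        return BeautifulSoup(response.text, 'html.parser')


class FundScraper(Scraper):
    # Extends Scraper with fund-specific behavior; the base class is never modified
    def run(self):
        soup = self.web_call()
        ...  # fund-specific transform and load steps go here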

L – Liskov Substitution Principle (LSP)
Our FundScraper class relies on bs4, requests, and json, and assumes they will all work properly. It would be better to abstract these away behind abstract classes or interfaces, so that any substitute honoring the same contract could stand in, making FundScraper more robust.
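In practice, LSP means any subclass of a generic Scraper should be usable wherever a Scraper is expected. A self-contained sketch, with assumed names:

class Scraper:
    def web_call(self):
        raise NotImplementedError


class FundScraper(Scraper):
    def web_call(self):
        return '<parsed page>'  # stand-in for a real BeautifulSoup tree


def run_any(scraper: Scraper):
    # Written against the base class; per LSP it must work unchanged
    # with FundScraper or any other well-behaved subclass
    return scraper.web_call()


print(run_any(FundScraper()))  # the caller never needs to know the concrete type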

I – Interface Segregation Principle (ISP)
This says that no class should be forced to implement interface methods that it does not use. FundScraper doesn't violate this one, since all of its methods are used.
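For contrast, this is the kind of narrow interface ISP encourages; a caller depends only on the methods it actually uses (a sketch using typing.Protocol):

from typing import Protocol


class SupportsRun(Protocol):
    def run(self) -> None: ...


def execute(job: SupportsRun) -> None:
    # Depends only on run(), not on everything a scraper class implements
    job.run()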

D – Dependency Inversion Principle (DIP)
This last principle says that high-level classes should not depend on low-level classes; rather, both should depend on abstractions. FundScraper is tightly coupled to BeautifulSoup, requests, and json, which makes it difficult to test in isolation.
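One common fix is to inject those dependencies rather than hard-coding them inside the class; a test can then pass in fakes. A sketch, not the article's final design:

import json

import requests
from bs4 import BeautifulSoup


class FundScraper:
    def __init__(self, url, fetcher=requests.get, parser=BeautifulSoup,
                 writer=json.dump):
        # The concrete libraries are injected as defaults, so a test can
        # substitute fakes without touching the class itself
        self.url = url
        self.fetcher = fetcher
        self.parser = parser
        self.writer = writer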

Next time, we'll build out the scraper with abstract classes and discuss some principles of data engineering before I do a high-level intro to Airflow and finally convert the abstract classes into custom operators and a Directed Acyclic Graph (DAG).
