Yuan Gao

Posted on Jan 20, 2021

LGTM Devlog 22: Modularization

#devjournal #python

So, the codebase got to a point where all the functionality was coming together, but in a way that was a mess. There was duplicate code strewn about, functions that triggered other cloud functions unnecessarily when it could have just called the code directly rather than be de-coupled via duplicate code. So I took the time to refactor everything. Something that takes a lot of time, but I feel was necessary.

The end result is significantly simpler and easier to read, even if it has more complete functionality. The test-suite makes more sense, and so the end result was good. I don't think I could have arrived at this without having gone through building it up in a messy way first, so I don't regret havign to do major refactoring now.

Let me share what I've ended up with. The code can be found at commit 415ce36.

Structure

Instead of having effectively three different functions projects (github_webhook_lister, github_auth_flow, and create_new_game), I have combined them into one project: game_core. Multiple cloud functions deploy from it, but it's the same codebase and same set of shared python modules, reducing duplicate code and the need to manually ensure things work together correctly.

Instead, everything lives under the game_core/app directory as a set of python modules that each handle their own concerns, and a single main.py where things are tied together under functionality relating to the endpoints.

This separation of concerns is a lot more straightforward to understand, I'll explain them each in detail:

Firebase Utils

This is the smallest package, it consists of just initialization for firebase_admin package. The reason for this is because the Firebase app is stateful, and needs to be initialized, but will complain if initialized twice, unless giving them different names. While we could initialize a new instance of firebase everywhere we need it, this is not a good use of resources, so we need a singleton instead. While singletons are usually considered a bad idea, but the exception here is that we do in fact want to have only one instance of firebase because under the hood it is could be maintaining a single set of connections to the database (though actually we don't know what it's doing under there, but it stands to reason that this instance of firebase represents a database resource).

There's two ways we can do this, and I've accidentally used both without thinking. The first way is with a module (which is what I've done here). Python modules are singletons by nature - they will only be imported once, and therefore initializing firebase in here means it will be the same object that is being imported elsewhere in the app. I have mixed feelings about this approach. It's certainly neat, and has become a de-facto method in Python that many devs are familiar with, but it does somewhat rely on implicit knowledge of how Python imports work. Ironically, firebase's own library relies on module singletons to work, so it's not really unprecedented.

The second way, which I've also implemented, is an undocumented and unofficial method of testing to see if firebase has already been initialized (since it knows, because firebase library under the hood is also a module singleton). So in the end the module contains this:

import firebase_admin  # type:  ignore
from firebase_admin import firestore  # type:  ignore

if firebase_admin._apps:
    app = firebase_admin.get_app()
else:
    app = firebase_admin.initialize_app()
db = firestore.client()

In reality that first if statement will never evaluate true because this module will never be executed twice even though it's imported by different other modules.

Weird huh. But I'm leaving it.

Game

The game package contains a single class Game which represents our game entity. It's got a few class methods to it: Game.new(user, fork_url) which will create a new game in firestore, and assign it to the user identified by the user argument, and Game.find_by_user(user) which performs a search in the database to find an existing game for a user. This search is necessary to link up existing games that have already been started but a user has just signed up on the website and their account isn't yet linked to the game. This is made convenient because in theory there should only be a 1:1 relationship between user accounts and games. Players shouldn't be able to start more than one game (at least not for now).

An instance of Game() itself actually only contains a couple of keys that reference an entity in the database. Any methods then act on the database using these keys. So the expectation is that instances of Game() are returned by the class functions Game.new() or Game.find_by_user() rather than be instantiated directly. Any calls to instance methods will act upon the database.

The only instance method right now is assign_to_uid() which allows us to assign games to a specific user's UID, to "link" it with a user account.

The structure looks like this:

class Game:
    @classmethod
    def new(cls, user: User, fork_url: str) -> Game:
        """ Create a new game if doesn't exist, return reference to it """
        ...

    @classmethod
    def find_by_user(cls, user: User) -> Optional[Game]:
        """ Find a game by user_key and return ref object for it, or None """
        ...

    user: Optional[User] = None

    @property
    def key(self) -> str:
        ...

    def assign_to_uid(self, uid: str) -> None:
        ...

    def __repr__(self):
        ...

A test suite exists to exercise this Game class, with functions that run each of the creation. The tests also have fixtures to create User entities in order to correctly test the game, as well as code to clean up the tests created, for example, the testing_game() fixture is below:

@pytest.fixture(scope="package")
def testing_game(testing_user):
    """ A random game for testing, cleans up afterwards """
    fork_url = "url_" + "".join(
        [random.choice(string.ascii_letters) for _ in range(10)]
    )

    game = Game.new(testing_user, fork_url)
    yield game

    # cleanup
    db.collection("game").document(game.key).delete()
    db.collection("system").document("stats").update({"games": firestore.Increment(-1)})

    # cleanup auto-created quest too
    QuestClass = Quest.get_first()
    quest = QuestClass()
    quest.game = game
    db.collection("quest").document(quest.key).delete()

In the future, I will use Firebase's emulation suite, which should remove the need to clean up tests data afterwards. But for now, these tests happen on the production database (which is dangerous, but...we'll survive).

Github Utils

I've moved github-related functions to it's own package, this just contains the pydantic models and webhook secrets validation code that I put together previously

Quest

The Quest system is very similar to what I described previously, with it's quest importer. The main difference is I've now provided a Quest class that works similarly to the Game object above. It has the following structure:


class Quest(ABC):
    @classmethod
    def get_by_name(cls, name: str) -> Type[Quest]:
        ...

    @classmethod
    def get_first(cls) -> Type[Quest]:
        ...

    @classmethod
    def new(cls, game: Game) -> Quest:
        ...

    @property
    @abstractmethod
    def version(cls) -> VersionInfo:
        return NotImplemented

    @property
    @abstractmethod
    def difficulty(cls) -> Difficulty:
        return NotImplemented

    @property
    @abstractmethod
    def description(cls) -> str:
        return NotImplemented

    default_data: ClassVar[Dict[str, Any]] = {}
    quest_data: Dict[str, Any] = {}
    VERSION_KEY = "_version"
    NAME_KEY = "_name"
    game: Optional[Game] = None

    def __init_subclass__(self):
        ...

    @property
    def key(self) -> str:
        ...

    def load(self, save_data: Dict[str, Any]) -> None:
        ...

    def get_save_data(self) -> Dict[str, Any]:
        ...

    def __repr__(self):
        ...

The get_by_name() class method is the same get_quest_by_name() function defined previously, while get_first() now takes on the duties of returning the pre-defined "first quest".

The new() class method works similarly to the Game.new() function, in that it creates entities in the database and returns to us a quest object that represents that entity in thedatabase should we need to do further things to it. In fact, this class method is called during Game.new() since a new first quest should be started at the start of the game.

The rest of the properties remain the same as last time as the Quest object is also the abstract base class for the other quests.

As with Game, there's a test-suite for Quest that exercises it, including all the loading we had before, but this time also tests for quest creation in the database.

User

This module contains the User class, again, structured very similarly to Game, with a new class method that creates user entities in the database. It also has a reference method that returns a User instance without doing the creation, which is necessary in some cases because games can be started without users having done a signup, so we still need a "dummy" User object that contains the right references to a user, without creating the user in the database. I'm still considering whether I should allow this way of using the User class or not. things will get clearer when I begin implementing features, I may change my mind.

The structure is:

class User:
    @classmethod
    def reference(cls, source: Source, user_id: str) -> User:
        ...

    @classmethod
    def new(cls, uid: str, source: Source, user_data: UserData) -> User:
        ...

    @classmethod
    def find_by_source_id(cls, source: Source, user_id: str) -> Optional[User]:
        ...

    def __init__(self, source: Source, user_id: str):
        ...

    @property
    def key(self) -> str:
        ...

    def __repr__(self) -> str:
        ...

Again, tests exists to exercise the User clas.

main.py

Finally, all there functions have been combined into a single main.py file. The complexity of these functions are now much reduced, since the mechanics of user and game creation have been abstracted away. In fact, it's been abstracted enough that we can eliminate the separate create_new_game endpoint, since it's now a one-liner.

github_webhook_listener

The webhook listener in main.py now looks like this:

def github_webhook_listener(request: Request):
    """ A listener for github webhooks """

    # verify
    if not verify_signature(request):
        logger.error("Invalid signature")
        return jsonify(error="Invalid signature"), 403

    # decode
    try:
        hook_fork = GitHubHookFork.parse_raw(request.data)
    except ValidationError as err:
        logger.error("Validation error", err=err)
        return jsonify(error="Validation error"), 400
    user_id = str(hook_fork.forkee.owner.id)

    # check repo is ours
    if hook_fork.repository.full_name != OUR_REPO:
        logger.error("Not our repo!", repo=hook_fork.repository.full_name)
        return jsonify(error="Invalid repo"), 400

    logger.info("Got fork", data=hook_fork.dict())

    # create a user reference, and then create new game
    user = User.find_by_source_id(Source.GITHUB, user_id)
    if not user:
        user = User.reference(Source.GITHUB, user_id)

    game = Game.new(user, hook_fork.forkee.url)
    logger.info("Created new game for user", game=game, user=user)

    return jsonify(status="ok", user_id=user_id)

The verification and decoding steps are still the same, as they are very endpoint-related (not a function of any of the other modules - Game, User, Quest), but at the bottom, the core functionality of finding a user if it exists, and creating a new game has been reduced to those few lines.

Much of the tests defined before remain unchanged.

github_auth_flow

The auth flow function in main.py now looks like this:


def github_auth_flow(request: Request):
    """ Validates a user from github and creates user """

    # CORS headers
    if request.method == "OPTIONS":
        return ("", 204, CORS_HEADERS)

    # authenticate user
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    try:
        decoded_token = verify_id_token(token)
    except (
        ValueError,
        InvalidIdTokenError,
        ExpiredIdTokenError,
        RevokedIdTokenError,
    ) as err:
        logger.warn("Authentication error", err=err)
        return jsonify(error="Authentication error"), 403
    uid = decoded_token["uid"]
    logger.info("Got authenticated user", decoded_token=decoded_token, uid=uid)

    # decode
    try:
        user_data = UserData.parse_raw(request.data)
    except ValidationError as err:
        logger.warn("Validation error", err=err)
        return jsonify(error="Validation error"), 400
    logger.info("Got user data", user_data=user_data)

    # authenticate GitHub
    github = Github(user_data.accessToken)

    try:
        user_id = github.get_user().id
    except BadCredentialsException as err:
        logger.warn("Bad Github credential", err=err)
        return jsonify(error="Bad GitHub credential"), 400

    if user_id != int(user_data.id):
        return jsonify(error="ID mismatch"), 400
    logger.info("Got github ID", user_id=user_id)

    # create new user, and find existing game to assign uid to
    user = User.new(uid=uid, source=Source.GITHUB, user_data=user_data)
    game = Game.find_by_user(user)
    if game:
        game.assign_to_uid(uid)
    logger.info("Results creating new user and finding game", game=game, user=user)

    return {"ok": True}, 200, CORS_HEADERS

Again, a lot of this function relates specifically to tasks that the endpoint must do to validate the data coming in, so they haven't been broken out into separate functions. Where before, we triggered a user creation function using pubsub, here we can just call the (now-shared) User class to do this for us. This is much cleaner, and there was no benefit to decoupling it for now.

Testes from before are largely unchanged

The result of all this cleanup means we drop from three github workflows to a single one, and code is cleaner and more modular. Test coverage is also good for now. It's not a guarantee we don't have bugs, but it does exercise the main features, and will help with future development work

Test coverage results

The missing line 5 in firebase_utils corresponds to that unused if statement because the modules are singleton, and so it never has an opportunity to re-use itself. I can probably remove that if statement.

The missing lines in app/main.py relate to a error handling when something that's not our repo is reported in the webohok, some CORS-related code, and error handling for a couple of edge-cases that I need to write an additional test for but also generate a valid test payload in order to reach, something I'll do later.

So, with all this in place, we now have the framework for quest and user storage! Though quests have no functionality at all, and there is no game loop, that is for next sprint.

DEV Community

LGTM Devlog 22: Modularization

Structure

Firebase Utils

Game

Github Utils

Quest

User

main.py

github_webhook_listener

github_auth_flow

Top comments (0)

Read next

Unlocking Text from Embedded-Font PDFs: A pytesseract OCR Tutorial

Python Development in VSCode Using Devcontainer

Building 5 AI Agents with phidata and Ollama

Creating a Simple a Virus using Python