LGTM Devlog 18: Python Serverless functions using GitHub API to validate users

#devjournal #python #serverless #github

Now that we have the website with working GitHub OAuth, and have worked out how to validate a user account even though their credentials are passed from the frontend and could therefore be tampered with, the next step is a serverless function whose job is to validate a user's ID and insert their user data into Firestore. The code for this post matches commit 08417eb.

New Firebase function

I duplicate the previous github_webhook_listener and start writing a single cloud function whose job is to receive the data provided by the website's OAuth flow: a logged-in Firebase user, their GitHub profile data, and their access token.

The code is relatively short:

def github_auth_flow(request: Request):
    """ Validates a user from github and creates user """

    # CORS headers
    if request.method == 'OPTIONS':
        return ('', 204, CORS_HEADERS)

    # authenticate user
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    try:
        decoded_token = verify_id_token(token)
    except (ValueError, InvalidIdTokenError, ExpiredIdTokenError, RevokedIdTokenError) as err:
        logger.warning("Authentication error", err=err)
        # error responses also carry the CORS headers so the browser can read them
        return jsonify(error="Authentication error"), 403, CORS_HEADERS
    logger.info("Got authenticated user", decoded_token=decoded_token)

    # decode
    try:
        user_data = UserData.parse_raw(request.data)
    except ValidationError as err:
        logger.warning("Validation error", err=err)
        return jsonify(error="Validation error"), 400, CORS_HEADERS
    logger.info("Got user data", user_data=user_data)

    # authenticate GitHub
    github = Github(user_data.accessToken)

    try:
        gh_id = github.get_user().id
    except BadCredentialsException as err:
        logger.warning("Bad Github credential", err=err)
        return jsonify(error="Bad GitHub credential"), 400, CORS_HEADERS

    if gh_id != int(user_data.id):
        return jsonify(error="ID mismatch"), 400, CORS_HEADERS
    logger.info("Got github ID", gh_id=gh_id)

    # write user data
    db.collection("users").document(decoded_token["uid"]).set({
        **user_data.dict(),
        "joined": firestore.SERVER_TIMESTAMP
    })

    # stats
    db.collection("system").document("stats").update({
        "players": firestore.Increment(1)
    })

    # TODO: find game to join

    return {"ok": True}, 200, CORS_HEADERS

First, it authenticates the user based on their Firebase ID token; the verification itself is all provided by the Firebase library. Here we also get to use the lovely Python 3.9 removeprefix() method to strip the "Bearer " prefix from the Authorization header.
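
As a quick illustration of why removeprefix() is convenient here, the sketch below pulls the token out of a headers mapping. The helper name extract_bearer_token is hypothetical and not part of the project, and a plain dict stands in for request.headers.

```python
def extract_bearer_token(headers: dict) -> str:
    """Return the token from an Authorization header, or "" if absent.

    Hypothetical helper; a plain dict stands in for request.headers.
    Requires Python 3.9+ for str.removeprefix().
    """
    # removeprefix() only strips an exact leading match and otherwise
    # returns the string unchanged, so a header without "Bearer " is
    # passed through as-is and is left for token verification to reject.
    return headers.get("Authorization", "").removeprefix("Bearer ")


print(extract_bearer_token({"Authorization": "Bearer abc.def.ghi"}))
print(repr(extract_bearer_token({})))
```

A missing header yields an empty string rather than an exception at this point, which is presumably why ValueError sits alongside the Firebase token errors in the except clause.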

Then, it decodes the user data using pydantic. This model is a bit of duplicated code (and therefore brittle), as it has to match the Vuex auth interface defined in TypeScript. This is one of the downsides of a polyglot project like this, but there is tooling that would help, and in future I will need to think carefully about it. In other projects we already get around this by having the API generate an OpenAPI spec from which TypeScript data models are automatically generated. However, in this case using Firebase throws a spanner in the works, and that tooling wouldn't work without significant modification - a downside of using serverless today.

Then, once the data is decoded, we authenticate with the GitHub API (using PyGithub) with the user's access token, retrieve their user ID, and compare it with the ID we received. If they match, we can insert the user data into the database. If they don't, the sign-up fails and we don't write anything.
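
The comparison step can be sketched in isolation. In this minimal version, fetch_id is a hypothetical stand-in for the `github.get_user().id` call, and github_id_matches is a made-up name for illustration, so the logic can be run without a real token.

```python
from typing import Callable


def github_id_matches(claimed_id: str, fetch_id: Callable[[], int]) -> bool:
    """Compare the ID the frontend claims with the ID GitHub reports.

    fetch_id is a hypothetical stand-in for `github.get_user().id`.
    """
    try:
        claimed = int(claimed_id)
    except ValueError:
        # A non-numeric claimed ID can never match a GitHub user ID.
        return False
    return fetch_id() == claimed


# The token belongs to user 1234, so only a claim of "1234" passes.
print(github_id_matches("1234", lambda: 1234))  # True
print(github_id_matches("9999", lambda: 1234))  # False
```

The point of the check is that the ID comes from GitHub itself, keyed by the access token, so a tampered ID in the request body cannot pass unless the sender also holds a valid token for that account.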

Finally, although the code hasn't been added yet, this is the point where we will later query the database to see if their user ID has an ongoing game and, if so, associate that game with their account.

Tests

This is again where I feel serverless functions fall short (at least with Python). Because this endpoint integrates with three other services (Firebase Auth, Firestore, and GitHub), and given the lack of tools (and examples!) for simulating Python serverless functions, it's been hard to write tests for this endpoint.

In the end, I've gone with the slightly gnarly approach of actually hitting the production environment while testing. I should really move that to a test environment instead, but doing it this way means I don't have to figure out how to stub out the various bits of code. Perhaps one day I'll figure out best practices for doing this with Firebase functions. For now, the endpoint is testable, with the downside of generating junk data. The tests also call the GitHub API with a real user token. This isn't ideal either, and I need to stub it, which I'll work out later as well.
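
For the GitHub part at least, one possible direction is the standard library's unittest.mock. The following is only a sketch under assumptions: fetch_github_id and FakeBadCredentials are hypothetical names introduced here, with the latter standing in for PyGithub's BadCredentialsException so the example runs without PyGithub installed.

```python
from unittest.mock import MagicMock


class FakeBadCredentials(Exception):
    """Hypothetical stand-in for PyGithub's BadCredentialsException."""


def fetch_github_id(client) -> int:
    """Mirror of the `github.get_user().id` call in the function under test."""
    return client.get_user().id


# A fake client: get_user() returns an object whose .id is 1234.
fake_github = MagicMock()
fake_github.get_user.return_value.id = 1234

assert fetch_github_id(fake_github) == 1234
fake_github.get_user.assert_called_once_with()

# A second fake that simulates GitHub rejecting the token.
bad_github = MagicMock()
bad_github.get_user.side_effect = FakeBadCredentials("401")

try:
    fetch_github_id(bad_github)
except FakeBadCredentials:
    print("bad credentials path exercised")
```

In a real test, the fake would need to replace the Github name inside the function's own module (for example with unittest.mock.patch); the exact module path depends on the project layout, which is not shown here.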

This user setup lives in the conftest.py code:

@pytest.fixture(scope="package")
def test_user(firebase_app, firestore_client):
    uid = "test_user_"+"".join([random.choice(string.ascii_letters) for _ in range(10)])

    # create user
    yield auth.create_user(
        uid=uid,
        display_name="test_user",
        email="test_user@example.com",
        app=firebase_app)

    # cleanup
    auth.delete_user(uid, app=firebase_app)
    firestore_client.collection("users").document(uid).delete()
    firestore_client.collection("system").document("stats").update({
        "players": firestore.Increment(-1)
    })


@pytest.fixture(scope="package")
def test_user_token(firebase_app, firestore_client, test_user):
    """ Create a new test user and get a login key """

    # create custom token
    token = auth.create_custom_token(test_user.uid, app=firebase_app).decode()

    # get ID token
    url = f"https://identitytoolkit.googleapis.com/v1/accounts:signInWithCustomToken?key={WEB_API_KEY}"
    res = requests.post(url, json={"token": token, "returnSecureToken": True})
    res.raise_for_status()

    return res.json()["idToken"]

These pytest fixtures are responsible for creating (and then tearing down) a new user for the purpose of the tests. The good thing is that in Firebase Auth, these users show up as "anonymous user" with UIDs that start with "test_user_", and so are easy to identify and remove (or programmatically block from doing anything useful) in the event that the junk data is left behind accidentally.
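
Identifying these users programmatically could be as simple as a prefix check. The sketch below is an assumption about how that might look, not code from the project: make_test_uid mirrors the UID shape used by the fixture, and is_test_uid is a hypothetical helper.

```python
import random
import string

TEST_UID_PREFIX = "test_user_"


def make_test_uid(length: int = 10) -> str:
    """Generate a UID in the same shape as the test_user fixture."""
    suffix = "".join(random.choices(string.ascii_letters, k=length))
    return TEST_UID_PREFIX + suffix


def is_test_uid(uid: str) -> bool:
    """True for UIDs created by the test fixtures (hypothetical check)."""
    return uid.startswith(TEST_UID_PREFIX)


uid = make_test_uid()
print(uid, is_test_uid(uid))
```

A cleanup script or a guard in the function could use the same check to skip or delete leftover test accounts.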

The test_user_token() fixture is needed because, without a frontend doing the login, the tests have to obtain an ID token themselves. It creates a custom token for the test user, then exchanges it for an ID token with an HTTP request to the Identity Toolkit signInWithCustomToken endpoint.

With these fixtures in place, one of the tests for a good flow simply looks like this:

def test_good_flow(client, firebase_app, test_user_token, test_user_data, test_user, firestore_client):
    """ Test a successful flow """

    res = client.post("/", headers={"Authorization": "Bearer "+test_user_token}, json=test_user_data)
    assert res.status_code == 200

    # check firestore
    doc = firestore_client.collection("users").document(test_user.uid).get()
    assert doc.exists
    assert doc.get("id") == test_user_data["id"]

We hit the endpoint with Bearer authorization (the same auth the frontend would be using) and the test user data (also defined as a fixture), which contains a valid GitHub user ID and token. Then we check that the user's Firestore document was created. Other test cases cover the various error conditions and bad data.

Permissions

With a Firebase function doing the user account creation, we can now ban the frontend from being able to modify user accounts, so the Firestore rules have been adjusted accordingly. The function itself is unaffected, since server-side client libraries bypass security rules.

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    function isSignedIn() {
      return request.auth != null && request.auth.uid != null && request.auth.uid != "";
    }

    function isUser(uid) {
      return isSignedIn() && request.auth.uid == uid;
    }

    match /users/{uid} {
      allow read: if isUser(uid);
      allow write: if false;
    }

    match /system/stats {
      allow read: if true;
      allow write: if false;
    }

    match /{document=**} {
      allow read, write: if false;
    }
  }
}

There's also a new rule to allow public reading of /system/stats, which can be used for some basic website stats such as how many players the game has (and later, the run status of the game core loop).
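
To make the effect of these rules concrete, here is a rough Python model of the decisions they encode. It is only an illustration of the logic, not how Firestore actually evaluates rules, and the function names can_read and can_write are made up for this sketch.

```python
from typing import Optional


def is_signed_in(auth_uid: Optional[str]) -> bool:
    """Mirrors isSignedIn(): a non-null, non-empty UID."""
    return auth_uid is not None and auth_uid != ""


def can_read(path: str, auth_uid: Optional[str]) -> bool:
    """Model of the read rules for a document path such as "/users/abc"."""
    parts = path.strip("/").split("/")
    if parts == ["system", "stats"]:
        return True  # public stats document
    if len(parts) == 2 and parts[0] == "users":
        # a user may only read their own document
        return is_signed_in(auth_uid) and auth_uid == parts[1]
    return False  # everything else falls through to the deny-all rule


def can_write(path: str, auth_uid: Optional[str]) -> bool:
    """No client write is allowed anywhere under these rules."""
    return False


print(can_read("/system/stats", None))   # True
print(can_read("/users/abc", "abc"))     # True
print(can_read("/users/abc", "xyz"))     # False
print(can_write("/users/abc", "abc"))    # False
```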

That concludes this little section of the game's authentication logic. With this in place, we now know how user accounts will look in the database, and therefore how to associate game data with user accounts in a way that keeps us flexible about creating game data without a user account existing, or creating user accounts when no game data exists.

Next, we can finally start defining the actual core of the game!