Hello again. It's been a while (in relative terms).
Having our own Matrix rooms feels amazing indeed, but it leaves us rather isolated, at least until we manage to bring people in. I know I warned in the beginning about making sacrifices, but it doesn't have to be that hard. In life, some metaphorical burned bridges are better left unrepaired, but here let's try reconnecting to our old Internet life in an interesting way, right?
Notes on Infrastructure
I waited a bit to resume publishing because two things were bothering me about my setup: power reliability and storage coupling.
I don't live near a big urban center, and power fluctuations or outages are pretty common here. Having those cause frequent downtime on my server would be quite annoying, so I went after a UPS (a "nobreak", as we call them here) to give my devices more stable power.
With that out of the way: as I said in the previous article, ideally your K8s cluster would have an external, dedicated storage provider. I certainly wasn't satisfied with mixing K8s node and NFS server roles on the same machines, but the ones I was using were the only units with high-bandwidth USB 3.0 ports, desirable for external drives. So I decided to invest a bit more and acquire an extra Raspberry Pi 5 for my cluster, leaving the RPi4 for data management.
Not only does the RPi5 give me more performance, it's also closer in capability to the Odroid N2+. Since it comes with a beefier power supply (5V/5A), intended for video applications that I'm not doing, I tried using that supply on the RPi4, which is connected to the current-hungry hard drives. However, its circuitry is simply not made to draw all that juice, so as the last piece of the puzzle I had to get an externally powered USB 3.0 hub:
(The resulting Frankenstein is not a pretty sight)
If that mess of wiring is too hard to understand (I bet it is), here's a quick diagram of how it works:
At first I was worried that using a single USB port for two disks would be a bottleneck, but even in the best case, sequential reads, HDDs cannot saturate the 5 Gbps bandwidth the RPi4 can handle on that interface (two drives at roughly 200 MB/s each add up to around 3.2 Gbps), and the 1 Gbps Ethernet link (~125 MB/s) will be a tighter constraint anyway.
Keeping the State, Orchestrated Style
One thing you notice when deploying all sorts of applications is how much they rely on databases, relational or non-relational, to delegate internal data management and remain stateless. That's true to the point that we can hardly progress here without having some DBMS running first.
It's common to have databases running as off-node services, just like NFS, but then the ability to scale them horizontally is lost. Here we'll resort to Kubernetes' stateful features to keep data management inside the cluster. For that, instead of a Deployment we need a StatefulSet:
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: choppa
  labels:
    app: postgres
data:
  POSTGRES_DB: choppadb # Default database
  POSTGRES_USER: admin  # Default user
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: choppa
data:
  POSTGRES_PASSWORD: Z290eW91 # Default user's password (base64-encoded)
---
apiVersion: apps/v1
kind: StatefulSet # Component type
metadata:
  name: postgres-state
  namespace: choppa
spec:
  serviceName: postgres-service # Governing Service that gives each replica its own stable hostname
  replicas: 2
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16.4 # Current stable version is 17.0, but let's be a bit conservative
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 5432
              name: postgres-port
          envFrom:
            - configMapRef:
                name: postgres-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: POSTGRES_PASSWORD
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgres-db
  volumeClaimTemplates: # Template of the volume claim created for each replica
    - metadata:
        name: postgres-db
      spec:
        storageClassName: nfs-small
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: choppa
spec:
  type: LoadBalancer # Let it be accessible inside the local network
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432
      targetPort: postgres-port
(Here we're using the PostgreSQL image from the official Docker Hub repository, but different databases such as MySQL/MariaDB or MongoDB follow a similar pattern)
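Two small practical notes: the Secret value has to be base64-encoded (encoded, not encrypted), and the whole thing is applied like any other manifest. Roughly (the file name is just my choice):

# Encode the password for the Secret's data field
echo -n '<your password>' | base64
# Apply the ConfigMap, Secret, StatefulSet and Service at once
kubectl apply -f postgres.yaml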
The biggest difference from previous deployments is that we don't explicitly define a Persistent Volume Claim to be referenced by all replicas; instead, we give the StatefulSet a template for requesting a distinct storage area for each Pod. Moreover, the serviceName configuration defines how stateful pods get their specific hostnames from the internal DNS, in the format <pod name>.<service name>:<container port number>.
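As a quick sanity check (a sketch, using the names from the manifest above), you can see both effects from the command line:

# Replicas get stable, ordinal-suffixed names instead of random hashes
kubectl get pods --namespace=choppa -l app=postgres
# e.g. postgres-state-0, postgres-state-1

# And each replica gets its own claim, named <template name>-<pod name>
kubectl get pvc --namespace=choppa
# e.g. postgres-db-postgres-state-0, postgres-db-postgres-state-1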
(I have tried using a stateful configuration for Conduit as well, but wasn't able to increase the number of replicas there either, due to some authentication issues)
Notice how those pods follow a different naming scheme. Replicas with persistent data have to be created, deleted and updated/synchronized in a very particular order, so they cannot be as ephemeral as stateless ones, with their asynchronously managed and randomly named containers. Those constraints are the reason we should avoid StatefulSets wherever possible.
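For instance (a quick sketch reusing the names above), scaling the set down removes pods from the highest ordinal to the lowest, one at a time, while their claims stick around for when you scale back up:

# Pods are terminated in reverse ordinal order, one at a time
kubectl scale statefulset postgres-state --namespace=choppa --replicas=1
# The claim of the removed replica is kept, so its data survives
kubectl get pvc --namespace=choppa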
A Telegram from an Old Friend
As promised, today we're learning how to keep in touch with people in the centralized world from our new Matrix home. And that's achieved through the power of Mautrix bridges. As I'm a big user of Telegram, that's what I'll be using as an example, but the process is mostly the same for other services, like Discord or WhatsApp.
Long story short, the bridge needs to be configured on two ends: how to interact with your third-party service account, and how to register with your Matrix server. The templates for those settings can be generated automatically using the same container image we'll be pulling into our K8s cluster:
# Create a directory for configuration files and enter it
$ docker run --rm -v `pwd`:/data:z dock.mau.dev/mautrix/telegram:latest
Didn't find a config file.
Copied default config file to /data/config.yaml
Modify that config file to your liking.
Start the container again after that to generate the registration file.
# Change the configuration file to match your requirements and preferences
$ docker run --rm -v `pwd`:/data:z dock.mau.dev/mautrix/telegram:latest
Registration generated and saved to /data/registration.yaml
Didn't find a registration file.
Generated one for you.
See https://docs.mau.fi/bridges/general/registering-appservices.html on how to use it.
Now you either manually add the contents of config.yaml and registration.yaml to a ConfigMap manifest file or automatically generate/print it with:
sudo kubectl create configmap telegram-config --from-file=./config.yaml --from-file=./registration.yaml --dry-run=client -o yaml
(Running as super user is required because the container changes the ownership of those files. --dry-run prevents the configuration from being applied directly, so that you may fix possible mistakes, pipe the result to a YAML file, or add it to a bigger manifest)
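For instance (the output file name is just my choice), you can pipe the generated manifest to a file to edit or merge into a bigger one later:

sudo kubectl create configmap telegram-config \
  --from-file=./config.yaml --from-file=./registration.yaml \
  --dry-run=client -o yaml > telegram-configmap.yaml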
Bridge configuration is quite long, so I'll reduce it below to only the most relevant parts. For information on all options, consult the official documentation:
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegram-config
  labels:
    app: telegram
data:
  config.yaml: |
    # Homeserver details
    homeserver:
        # The address that this appservice can use to connect to the homeserver.
        address: http://conduit-service:8448
        # The domain of the homeserver (for MXIDs, etc).
        domain: choppa.xyz
        # ...
    # Application service host/registration related details
    # Changing these values requires regeneration of the registration.
    appservice:
        # The address that the homeserver can use to connect to this appservice.
        address: http://telegram-service:29317
        # When using https:// the TLS certificate and key files for the address.
        tls_cert: false
        tls_key: false
        # The hostname and port where this appservice should listen.
        hostname: 0.0.0.0
        port: 29317
        # ...
        # The full URI to the database. SQLite and Postgres are supported.
        # Format examples:
        #   SQLite: sqlite:filename.db
        #   Postgres: postgres://username:password@hostname/dbname
        database: postgres://telegram:mautrix@postgres-service/matrix_telegram
        # ...
        # Authentication tokens for AS <-> HS communication. Autogenerated; do not modify.
        as_token: <same as token from registration.yaml>
        hs_token: <same hs token from registration.yaml>
    # ...
    # Bridge config
    bridge:
        # ...
        # Shared secrets for https://github.com/devture/matrix-synapse-shared-secret-auth
        #
        # If set, custom puppets will be enabled automatically for local users
        # instead of users having to find an access token and run `login-matrix`
        # manually.
        # If using this for other servers than the bridge's server,
        # you must also set the URL in the double_puppet_server_map.
        login_shared_secret_map:
            choppa.xyz: <your access token>
        # ...
    # ...
    # Telegram config
    telegram:
        # Get your own API keys at https://my.telegram.org/apps
        api_id: <id you have generated>
        api_hash: <hash you have generated>
        # (Optional) Create your own bot at https://t.me/BotFather
        bot_token: disabled
        # ...
  registration.yaml: |
    id: telegram
    as_token: <same as token from config.yaml>
    hs_token: <same hs token from config.yaml>
    namespaces:
        users:
        - exclusive: true
          regex: '@telegram_.*:choppa\.xyz'
        - exclusive: true
          regex: '@telegrambot:choppa\.xyz'
        aliases:
        - exclusive: true
          regex: \#telegram_.*:choppa\.xyz
    url: http://telegram-service:29317
    sender_localpart: HyMPlWT_552RAGkFtnuy_ZNNpkKrSSaHDndS_nmb9VIZ4RiJLH0uiSas3fi_IV_x
    rate_limited: false
    de.sorunome.msc2409.push_ephemeral: true
    push_ephemeral: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telegram-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: telegram
  template:
    metadata:
      labels:
        app: telegram
    spec:
      containers:
        - name: telegram
          image: dock.mau.dev/mautrix/telegram:latest
          imagePullPolicy: "IfNotPresent"
          # Custom container initialization command (overrides the one defined in the image)
          command: [ "python3", "-m", "mautrix_telegram", "-c", "/data/config.yaml", "-r", "/data/registration.yaml", "--no-update" ]
          ports:
            - containerPort: 29317 # Has to match the appservice configuration
              name: telegram-port
          volumeMounts:
            - name: telegram-volume
              mountPath: /data/config.yaml
              subPath: config.yaml
            - name: telegram-volume
              mountPath: /data/registration.yaml
              subPath: registration.yaml
      volumes:
        - name: telegram-volume
          configMap:
            name: telegram-config
---
apiVersion: v1
kind: Service
metadata:
  name: telegram-service
spec:
  publishNotReadyAddresses: true
  selector:
    app: telegram
  ports:
    - protocol: TCP
      port: 29317 # Has to match the registration url
      targetPort: telegram-port
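To roll everything out, apply the manifest into the same namespace as the rest (the file name is my own) and keep an eye on the pod:

kubectl apply -f telegram.yaml --namespace=choppa
kubectl get pods --namespace=choppa -l app=telegram --watch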
After applying the changes, the bridge pod is deployed, but it keeps failing and restarting... Why?
Looking at the logs gives us a clearer picture of what's happening:
$ kubectl logs telegram-deploy-84489fb64d-srmrc --namespace=choppa
[2024-09-29 15:28:34,871] [INFO@mau.init] Initializing mautrix-telegram 0.15.2
[2024-09-29 15:28:34,879] [INFO@mau.init] Initialization complete in 0.19 seconds
[2024-09-29 15:28:34,879] [DEBUG@mau.init] Running startup actions...
[2024-09-29 15:28:34,879] [DEBUG@mau.init] Starting database...
[2024-09-29 15:28:34,879] [DEBUG@mau.db] Connecting to postgres://telegram:password-redacted@postgres-service/matrix_telegram
[2024-09-29 15:28:34,946] [CRITICAL@mau.init] Failed to initialize database
Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/mautrix/bridge/bridge.py", line 216, in start_db
await self.db.start()
File "/usr/lib/python3.11/site-packages/mautrix/util/async_db/asyncpg.py", line 71, in start
self._pool = await asyncpg.create_pool(str(self.url), **self._db_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 403, in _async__init__
await self._initialize()
File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 430, in _initialize
await first_ch.connect()
File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 128, in connect
self._con = await self._pool._get_new_connection()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 502, in _get_new_connection
con = await connection.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/asyncpg/connection.py", line 2329, in connect
return await connect_utils._connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 991, in _connect
conn = await _connect_addr(
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 828, in _connect_addr
return await __connect_addr(params, True, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 876, in __connect_addr
await connected
asyncpg.exceptions.InvalidPasswordError: password authentication failed for user "telegram"
[2024-09-29 15:28:34,950] [DEBUG@mau.init] Stopping database due to SystemExit
[2024-09-29 15:28:34,950] [DEBUG@mau.init] Database stopped
If you pay attention to the configuration shown, there's a database option where you may configure either SQLite (a local file, let's avoid that) or PostgreSQL, which the bridge requires. I wouldn't have gone on a tangent about databases for nothing, right?
The problem is that the connection string we defined, which follows the format postgres://<user>:<password>@<server hostname>/<database name>, references a user and a database that don't exist yet (I wouldn't use the administrator account for a simple bot). So let's fix that by logging into the DBMS as the main user with a PostgreSQL-compatible client of your choice. Here I'm using SQLectron:
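(If you'd rather stay in the terminal, an alternative sketch, reusing the names from the PostgreSQL manifest, is to run psql directly inside one of the replicas; it may ask for the admin password stored in the Secret:)

kubectl exec -it postgres-state-0 --namespace=choppa -- psql -U admin -d choppadb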
From there, use this sequence of SQL commands that are pretty much self-explanatory:
CREATE DATABASE matrix_telegram;
CREATE USER telegram WITH PASSWORD 'mautrix';
GRANT CONNECT ON DATABASE matrix_telegram TO telegram;
-- Personal note: ENTER THE DATABASE (connect to matrix_telegram) before running the grants below!!
GRANT ALL ON SCHEMA public TO telegram;
GRANT ALL ON ALL TABLES IN SCHEMA public TO telegram;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO telegram;
-- Give privileges for all newly created tables as well
ALTER DEFAULT PRIVILEGES FOR USER telegram IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO telegram;
(A bit overkill but, hey, works for me... If some commands fail you may need to wait a bit for the pods to synchronize or even reduce the number of replicas to 1 while making changes)
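To confirm that the exact connection string from the bridge config now works, you can test it from inside the cluster (again a sketch, reusing the pod name from before):

kubectl exec -it postgres-state-0 --namespace=choppa -- \
  psql "postgres://telegram:mautrix@postgres-service/matrix_telegram" -c '\conninfo'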
Errata: once again we cannot use stateful features to the fullest. Replicating data across pods is not done automatically by K8s; it depends on the application. It took me a while to figure that out, and I'll have to study how to do it properly for each case. For now we'll keep it at a single instance and address proper replication later.
If you still get some log errors about an unknown token, go into the admin room of your Conduit server and send this message:
Finally, start a private chat with @telegrambot:<your server name> (the default name, which may be changed in the config) and follow the login procedure:
Invites for your Telegram groups and private chats should start appearing, and now you can be happy again!
Thanks for reading. See you next time