At Setter we have a 2nd generation API server that handles:
- API requests coming in from internal software
- API requests from our customer-facing iOS and Android applications
- Webhook requests from 3rd party services
This is all quite typical stuff.
Communicating With Systems Outside Of Our Control
In the process of handling some of these requests, we have to communicate with 3rd party services.
One example is customer order approvals, where we have to send the customer an email confirming the order, providing a sort of feedback loop to the user.
So in this case, the flow looks like:
- Receive API request from mobile app
- Process API request (which involves making some DB inserts / updates)
- Dispatch API request to 3rd party email provider (in our case we use Postmark and we highly recommend it)
By making API requests directly from our system, we've now reduced the certainty of success, and introduced incomplete states. For example, Postmark (the email service provider we use) could be down for routine maintenance, and hence a request to their service could fail at unpredictable times. This introduces an incomplete state in that the customer will never receive an email to let them know that their order was indeed processed and acknowledged.
This sort of thing has happened a few times at our company.
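The coupled flow above can be sketched in a few lines. This is a hypothetical illustration, not our actual code: `saveOrder` and `sendConfirmationEmail` stand in for our real DB layer and Postmark client.

```typescript
// Hypothetical sketch of the tightly-coupled flow described above.
// `saveOrder` and `sendConfirmationEmail` are illustrative stand-ins
// for a real DB layer and email-provider client.
type Order = { id: number; customerEmail: string };

async function approveOrder(
  order: Order,
  saveOrder: (o: Order) => Promise<void>,
  sendConfirmationEmail: (to: string) => Promise<void>
): Promise<void> {
  await saveOrder(order); // step 2: DB inserts / updates succeed
  // Step 3: if the email provider is down, this throws *after* the DB
  // write, leaving the incomplete state described above -- the order is
  // processed but the customer never hears about it.
  await sendConfirmationEmail(order.customerEmail);
}
```

The problem is structural: the DB write and the 3rd-party call share one request lifecycle, so a failure in the second half cannot be retried independently of the first.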
Eliminating The Dependency on 3rd Party Services
Currently we're undergoing an internal RFC process to decide how we're going to decouple 3rd party services from the core of our system.
I took the lead on this particular RFC (although I've had lots of guidance from my colleagues while writing it), and in this post I discuss the bulk of it.
What I'm proposing at our company is that we leverage the technologies we already have (PostgreSQL & NodeJS) in order to not increase system complexity - as opposed to using a tool such as RabbitMQ (not to say that RabbitMQ is bad).
By using PostgreSQL's LISTEN / NOTIFY features, you have everything you need for a high-performance, fault-tolerant pub / sub system.
I went ahead and created an example app that implements this system - feedback welcome!
https://github.com/gDelgado14/pg-pubsub
Here are the relevant parts of the example project (as found in the README.md):
Implementing the "Pub" in Pub / Sub
The migrations
folder contains the schemas / triggers / and SQL functions necessary to implement the publishing aspect of the system.
More specifically, a db_events table is created which stores messages sent into the pub / sub system. There is also a trigger that executes a SQL function on any insertion into db_events.
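The actual schema lives in the repo's migrations folder; a minimal sketch of what such a migration could contain is below. The column names and trigger body here are assumptions for illustration, not the repo's exact schema.

```typescript
// Illustrative migration SQL (assumed schema -- see the repo's
// migrations folder for the real one). An AFTER INSERT trigger
// forwards every new row to pg_notify on the "pub_sub" channel,
// which is what the subscriber's LISTEN picks up.
export const createPubSubMigration = `
  CREATE TABLE IF NOT EXISTS db_events (
    id         SERIAL PRIMARY KEY,
    channel    TEXT NOT NULL,
    payload    JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
  );

  CREATE OR REPLACE FUNCTION notify_db_event() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_notify('pub_sub', row_to_json(NEW)::text);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER db_events_notify
    AFTER INSERT ON db_events
    FOR EACH ROW EXECUTE PROCEDURE notify_db_event();
`;
```

A nice property of routing the NOTIFY through a trigger is that publishing is just an ordinary INSERT, so it participates in the publisher's transaction: if the transaction rolls back, the notification is never sent.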
Implementing the "Sub" in Pub / Sub
Inside src/services/client.ts, I use the pg module to:
- Connect to the db
- Listen to "pub_sub" events being invoked from within postgres (which I've defined in the migrations)
- Invoke any asynchronous functions associated with the various events that can occur in the system.
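The subscriber side boils down to routing each NOTIFY payload to the right async handler. Here is a dependency-free sketch of that dispatch logic; in the real client, the pg module emits a `notification` event after `LISTEN pub_sub` is issued, and the names below are illustrative rather than the repo's actual API.

```typescript
// Map of event name -> async handler. In a real app the handlers
// would send emails, SMS, etc.
type Handler = (payload: unknown) => Promise<void>;

const handlers = new Map<string, Handler>();

export function subscribe(event: string, handler: Handler): void {
  handlers.set(event, handler);
}

// Called for every NOTIFY on the "pub_sub" channel. With the `pg`
// module this would be wired up roughly as:
//   await client.query('LISTEN pub_sub');
//   client.on('notification', msg => dispatch(msg.payload ?? ''));
export async function dispatch(rawPayload: string): Promise<boolean> {
  const { channel, payload } = JSON.parse(rawPayload);
  const handler = handlers.get(channel);
  if (!handler) return false; // no subscriber registered for this event
  await handler(payload);
  return true;
}
```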
Now you can subscribe to any event you want, and you define the events in your code. It really helps to use a statically-typed language (which is why I implemented the example in TypeScript) so that a message's payload is always consistent with the channel it's associated with.
You can see some example channel / payload combinations inside src/services/client.ts. For example, if you publish an sms message, the payload going in and out of the pub / sub system will always be { msg: 'some string' }.
Another awesome aspect of this system is that you can choose to run your subscription logic on a different machine / runtime / language. This is because it's postgres that's sending messages into the runtime. In my example I kept it simple and had both the publishing and subscribing happening in the same app, but it doesn't have to be that way if you don't want it to be!
Handling failure gracefully
One thing I haven't thought enough about is how to handle message processing failures.
Example: Say I publish an email message into the pub / sub system, and a subscriber tries to process the message by sending an API request to Postmark, but Postmark is down. How should I best manage this?
I think implementing an exponential back-off retry might be the right approach.
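A minimal sketch of such a retry wrapper follows; the delays and attempt counts are arbitrary, and a real subscriber would also want a dead-letter path for messages that exhaust their retries.

```typescript
// Retry an async action with exponential backoff: wait baseDelayMs
// after the 1st failure, then 2x, 4x, ... up to maxAttempts total.
async function withBackoff<T>(
  action: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Exhausted retries -- a candidate for a dead-letter table so the
  // message isn't silently lost.
  throw lastError;
}
```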
Would love to hear your thoughts on this!
Top comments (13)
I looked into postgres and loved the concept, but in practice I was using pgbouncer as a connection pool with aggressive connection sharing, and I don't think pgbouncer supported the NOTIFY / LISTEN commands. Even if it did, I think keeping transactions open was required, which meant all the waiters would need separate real connections to the db, which ate up resources and created more problems. That's not to say I couldn't have overcome that barrier, but at the time I considered it too risky to try or research further.
Like you, I prefer being able to debug and fix my own applications vs worry about configuration / the learning curve of other software, particularly when my needs are specific. I just rolled my own notifications, of 2 types:
Some things to keep in mind:
Wow, first of all don't do it.
Second, a nice way to decouple the 3 steps you described is with serverless functions; it fits your example very well (async events that trigger 2 separate functions).
First point again, a few notes:
PS: I may be wrong, I'm not deeply familiar with PostgreSQL
PS2: there is no such thing as zero dependencies. If you build this system in house, you are creating a new project (within the project) that requires your attention, tests, time, and resources. The system also depends on your business logic, infrastructure, and database (things that can be mitigated by using a PaaS / cloud pub / sub).
I like this idea quite a bit. We actually have a postgres based setup at work plus a rabbitmq and it would be nice to try to eliminate a moving piece (rabbit) if I can. I love rabbit but I love minimalism even more. The only disadvantage I can think of is giving all the nodes access to the database which I may not want to due to security considerations.
Agreed, opening up access to all nodes is a serious drawback to this approach but I am sure that there are ways to go around this.
How are you and your colleagues liking RabbitMQ? We too are big fans of minimalism. Have there been any unexpected downsides to introducing Rabbit (besides having an additional moving piece)?
I'm a big fan of rabbit. Used it in production on several occasions for rather large (hundreds of messages per second) deployments and it worked really well for me. It's very easy to setup and it just sort of works.
Minimalist would be a simple Redis, like others have said. You also get persistent storage (from time to time), performance, and scaling (if needed).
Slack devs are also reticent about new dependencies and avoid adding new tech to their stack; you can read about their Job Queue system here, which is exactly what you need too (more or less). The old system is on Redis (the previous version is probably enough for your needs).
Why don't you use a Redis task queue to see how it goes?
You can create a dedicated database user that has access to certain tables.
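For instance, the publishing nodes could be limited to inserting events and nothing else. The role and table names below are illustrative, not taken from the post's example repo.

```typescript
// Illustrative: a role that can publish events but cannot read or
// modify anything else. Role / table names are hypothetical.
export const createPublisherRole = `
  CREATE ROLE event_publisher LOGIN PASSWORD 'change-me';
  REVOKE ALL ON ALL TABLES IN SCHEMA public FROM event_publisher;
  GRANT INSERT ON db_events TO event_publisher;
  GRANT USAGE ON SEQUENCE db_events_id_seq TO event_publisher;
`;
```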
Classical Pub/Sub might not be the best fit; some of the reasons Adrian outlined pretty nicely for you in his comment. May I propose considering the Event Sourcing pattern? You don't really remove or modify any records -- you just append them. Usually this pattern is used together with the CQRS pattern -- it brings really nice scaling advantages later on in your project.
I think the requests and the results (process and email) are ephemeral, so they do not need to be kept indefinitely. Event sourcing solves the problem when you need to keep all the previous states of the entities (kind of), and I don't think this is the case. The requests are unique and do not change, nor do they need to be kept for a longer period of time.
Hey Gio, great post!
Just my two cents on this topic.
Apparently, your problem is one I was facing in my previous company: how to deal with scale.
It also appears that the first step to decouple your application was taken: boundaries have been defined for third party systems and completely removed from your core business logic.
There are some contenders out there to PostgreSQL that are minimal and perform quite well.
I'm a big fan of Nats. I mean, dead simple, blazing fast, and Just Works™ out of the box. It's also under the CNCF, so it shows a lot of promise.
Having that well defined boundary around this kind of dependency should make it trivial to use. It needs some special care, e.g. defining your own request/response mechanism in order to achieve at-least-once semantics, but in my opinion this extra step is well worth it.
thanks, nice example of postgres pub sub
If you are on AWS I would recommend going with SNS/SQS for pub sub. As another user mentioned, Event Sourcing might be a good alternative. In that case I would look at AWS Kinesis / DynamoDB