Discussion on: Use cases for persistent logs with NATS Streaming

Thanks for the article, but there are some important things missing.

One major missing point is handling reconnection across NATS/STAN restarts, which isn't trivial to implement.

Another is that the producer may need to store outgoing events in its own persistent outgoing queue before trying to send them to STAN, to avoid losing events on producer restarts.

For this to be true after restarts, the client would need to persist the lastProcessed value somewhere and load it on start. But this could be as simple as a local file that contains the ID of the message that was last processed.

This won't work unless updating this file can be done atomically with handling the message. That is impossible in the general case; it is usually only possible in cases such as when the message is handled by updating data in an SQL database and the ID is updated in the same database, in the same transaction.

Byron Ruth • Jul 13 '18

Both the NATS and STAN clients reconnect automatically. One configurable setting is how many reconnect attempts should take place, and you can set this to unlimited.

Another is that the producer may need to store outgoing events in its own persistent outgoing queue before trying to send them to STAN, to avoid losing events on producer restarts.

On a publish, the client will return an error if it cannot contact the server or if the server fails to ack. If you do this asynchronously (not in the request hot path, for example), then you are correct that you will need to handle this locally by storing the events and retrying perpetually. However, if this is part of a request transaction, then presumably any effects could be compensated for and an error would be returned to the caller.
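A minimal sketch of the store-and-retry idea, under some loud assumptions: `Outbox` is a hypothetical type, it keeps events in memory purely for illustration (a real producer would need durable storage to survive restarts), and `publish` stands in for whatever call sends an event to STAN and waits for the ack.

```go
package main

import "sync"

// Outbox is a hypothetical outgoing queue. It is in-memory only to keep the
// sketch short; surviving producer restarts requires durable storage.
type Outbox struct {
	mu      sync.Mutex
	pending [][]byte
}

// Enqueue records an event before any attempt is made to send it.
func (o *Outbox) Enqueue(event []byte) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.pending = append(o.pending, event)
}

// Flush publishes pending events in order. It stops at the first failure and
// keeps that event and everything after it queued for the next attempt.
// The lock is held while publishing, so Enqueue blocks during a flush.
func (o *Outbox) Flush(publish func([]byte) error) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	for len(o.pending) > 0 {
		if err := publish(o.pending[0]); err != nil {
			return err
		}
		// Remove the event only after the publish was acknowledged.
		o.pending = o.pending[1:]
	}
	return nil
}

// Len reports how many events are still waiting to be sent.
func (o *Outbox) Len() int {
	o.mu.Lock()
	defer o.mu.Unlock()
	return len(o.pending)
}
```

Calling `Flush` on a timer gives the perpetual retry. Because an event is removed only after a successful publish, a crash between the two steps can resend it, so consumers should tolerate duplicates.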

This won't work unless updating this file can be done atomically with handling the message.

Yes absolutely, the devil is in the details. The general case is that if you are doing at least two kinds of I/O in a transaction, you will need some kind of two-phase commit or consensus + quorum since any of the I/O can fail.

The intent of the examples here is to demonstrate patterns, not the failure cases (which are very interesting in their own right!). But thanks for pointing out a couple of (very serious) failure cases!

Alex Efros • Jul 13 '18

Both the NATS and STAN clients reconnect automatically.

NATS - maybe, but STAN doesn't reconnect.

Byron Ruth • Jul 13 '18

There is a MaxReconnects option for NATS; setting it to -1 will cause the client to retry indefinitely. For the STAN client, you can achieve the same thing by passing in the NATS connection using this option. So:

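// Retry the NATS connection indefinitely, then hand it to the STAN client (errors ignored for brevity).
nc, _ := nats.Connect("nats://localhost:4222", nats.MaxReconnects(-1))
sc, _ := stan.Connect("test", "test", stan.NatsConn(nc))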

The STAN client just builds on a NATS connection, so any NATS connection options will be respected in the STAN client.

Alex Efros • Jul 13 '18

STAN doesn't reconnect automatically; just test it yourself. Also, if no NATS connection is provided to STAN, it creates a default connection with the same -1 setting.

Byron Ruth • Jul 13 '18

From the README, I read that the intent is for reconnection to happen transparently, but there are cases where this fails (there appear to have been quite a few changes and improvements since I wrote this article). It describes the case where the server doesn't respond to PINGs and the client decides to close the connection. Likewise, if there is a network partition or something similar, the server may disconnect, but the client (in theory) could re-establish the connection after the interruption.

I am not arguing that it can't happen, but I am simply stating that the intent is for the client to reconnect. If you are observing otherwise, then that may be an issue to bring up to the NATS team.

Alex Efros • Jul 13 '18 • Edited

The team is aware of this. I agree the intent is to support reconnects, but for now you have to implement it manually, and it's unclear when (if ever) this will be fixed in the library. The main issue with reconnects is the need to re-subscribe, and how this should be done depends on the application, so it's not easy to provide a general way to restore subscriptions automatically. This is why it's not implemented yet and is unlikely to be implemented soon.

Byron Ruth • Jul 13 '18

Good to know. Agreed, in the case of consumers, how subscriptions need to or should be re-established can vary based on the application.