Let’s start with what a graceful shutdown is. Generally speaking, it means shutting down an application in a controlled and orderly manner: ongoing operations are completed, resources are properly released, and data integrity is preserved before the process terminates.
If you use Go's standard net/http package for your server, you have probably seen the Shutdown method of http.Server. It already handles everything required to terminate the HTTP server gracefully. More specifically, as the documentation says, “Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down.”
Okay, you might think, so why do you need to read this post? Let’s look at an example.
P.S. This is not a "best practice" code example; it exists only for conceptual purposes. Things such as application layers, interface abstractions, etc. are omitted for obvious reasons.
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"os/signal"
	"syscall"

	"github.com/gorilla/mux"
)

func main() {
	userService := UserService{
		db:   db{},
		amqp: amqp{},
	}

	r := mux.NewRouter()
	r.HandleFunc("/user", func(rw http.ResponseWriter, req *http.Request) {
		name := "some name" // let's imagine we got it from the request
		if err := userService.RegisterUser(req.Context(), name); err != nil {
			rw.WriteHeader(http.StatusInternalServerError)
			return
		}
		rw.WriteHeader(http.StatusOK)
	}).Methods(http.MethodPost)

	srv := http.Server{}
	srv.Addr = ":8080"
	srv.Handler = r

	// ctx is canceled when the process receives SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("listen and serve returned err: %v", err)
		}
	}()

	<-ctx.Done()
	log.Println("got interruption signal")
	if err := srv.Shutdown(context.TODO()); err != nil {
		log.Printf("server shutdown returned an err: %v\n", err)
	}
	log.Println("final")
}

type UserService struct {
	amqp amqp
	db   db
}

func (s *UserService) RegisterUser(ctx context.Context, name string) error {
	log.Println("start user registration")
	userID, err := s.db.InsertUser(ctx, name)
	if err != nil {
		return fmt.Errorf("db insertion failed: %w", err)
	}
	// The publish runs in its own goroutine, outside the handler.
	go s.amqp.PublishUserInserted(ctx, userID)
	return nil
}

type db struct{}

func (d db) InsertUser(ctx context.Context, name string) (int, error) {
	log.Println("user insert")
	return 1, nil
}

type amqp struct{}

func (a amqp) PublishUserInserted(ctx context.Context, id int) {
	log.Println("message publish")
}
Here we have a simple server that imitates user registration. The HTTP server shuts down gracefully and everything seems fine. Let's make a call and then interrupt the process.
2023/07/09 22:35:50 start user registration
2023/07/09 22:35:50 user insert
2023/07/09 22:35:50 message publish
^C
2023/07/09 22:35:52 got interruption signal
2023/07/09 22:35:52 final
Both db and amqp were called, and then we triggered the shutdown. We can see that signal handling works and the HTTP server shutdown was handled. Does our code have a problem? Let's alter a few lines.
func (d db) InsertUser(ctx context.Context, name string) (int, error) {
	time.Sleep(time.Second * 10)
	log.Println("user insert")
	return 1, nil
}

func (a amqp) PublishUserInserted(ctx context.Context, id int) {
	time.Sleep(time.Second * 20)
	log.Println("message publish")
}
Here we imitate a db insertion and an amqp publication that take a "significant" amount of time. What do we expect to see in the logs?
2023/07/09 22:43:22 start user registration
^C
2023/07/09 22:43:25 got interruption signal
2023/07/09 22:43:32 user insert
2023/07/09 22:43:32 final
As you can see, the shutdown waited only for the db insertion, and the amqp publish was lost. Why? The amqp publish runs in its own goroutine, so the HTTP server's Shutdown does not track it: the server waits only for the handler to return. Once main returns, the process exits and the goroutine is killed mid-flight, so handling it becomes our responsibility.
There are several ways to do this; let's look at one of them. First, we have to start controlling goroutine execution ourselves. Let's add a WaitGroup to our service structure.
type UserService struct {
	amqp   amqp
	db     db
	doneWG sync.WaitGroup
}
We will use it when we trigger the publish from the register endpoint. Let's move the publish into a separate method. First, it uses the newly added WaitGroup to track running goroutines. Second, it adds a recover call so that an unexpected panic inside the goroutine does not crash the whole process.
func (s *UserService) publishUserInserted(ctx context.Context, userID int) {
	s.doneWG.Add(1)
	go func() {
		defer s.doneWG.Done()
		defer func() {
			if err := recover(); err != nil {
				log.Printf("publishUserInserted recovered panic: %v\n", err)
			}
		}()
		s.amqp.PublishUserInserted(ctx, userID)
	}()
}
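One caveat about the ctx handed to the goroutine: it is the request context, and net/http cancels it as soon as the handler returns. The stub publisher here ignores ctx, so nothing goes wrong, but a real publisher that honors cancellation could abort right after the response is sent. The sketch below illustrates the difference under stated assumptions: publish and publishAfterHandler are hypothetical helpers, not part of the article's service, and context.WithoutCancel requires Go 1.21 or newer.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// publish is a hypothetical stand-in for a publisher that honors cancellation:
// it returns ctx.Err() if ctx is done before the simulated work completes.
func publish(ctx context.Context, work time.Duration) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(work):
		return nil
	}
}

// publishAfterHandler simulates a handler that returns (which cancels the
// request context) while a background publish is still running. With detach
// set, the publish receives a context that keeps the request's values but is
// no longer canceled together with the request.
func publishAfterHandler(detach bool) error {
	reqCtx, handlerReturned := context.WithCancel(context.Background())

	pubCtx := reqCtx
	if detach {
		pubCtx = context.WithoutCancel(reqCtx) // Go 1.21+
	}

	result := make(chan error, 1)
	go func() { result <- publish(pubCtx, 50*time.Millisecond) }()

	handlerReturned() // stands in for net/http canceling the request context
	return <-result
}

func main() {
	fmt.Println("request ctx: ", publishAfterHandler(false)) // context canceled
	fmt.Println("detached ctx:", publishAfterHandler(true))  // <nil>
}
```

If the publisher should still be bounded in time, the detached context can be wrapped with context.WithTimeout before it is passed on.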
Last but not least, we need a Stop method that waits for the goroutines to finish, and we need to call it from main. As a bonus, it also watches the context's Done channel so that it stops waiting when a deadline expires or the context is canceled.
func (s *UserService) Stop(ctx context.Context) {
	log.Println("waiting for user service to finish")
	doneChan := make(chan struct{})
	go func() {
		s.doneWG.Wait()
		close(doneChan)
	}()
	select {
	case <-ctx.Done():
		log.Println("context done earlier than user service has stopped")
	case <-doneChan:
		log.Println("user service finished")
	}
}
The overall code would look like this:
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"os/signal"
	"sync"
	"syscall"
	"time"

	"github.com/gorilla/mux"
)

func main() {
	userService := UserService{
		db:   db{},
		amqp: amqp{},
	}

	r := mux.NewRouter()
	r.HandleFunc("/user", func(rw http.ResponseWriter, req *http.Request) {
		name := "some name" // let's imagine we got it from the request
		if err := userService.RegisterUser(req.Context(), name); err != nil {
			rw.WriteHeader(http.StatusInternalServerError)
			return
		}
		rw.WriteHeader(http.StatusOK)
	}).Methods(http.MethodPost)

	srv := http.Server{}
	srv.Addr = ":8080"
	srv.Handler = r

	// ctx is canceled when the process receives SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("listen and serve returned err: %v", err)
		}
	}()

	<-ctx.Done()
	log.Println("got interruption signal")
	if err := srv.Shutdown(context.TODO()); err != nil { // use a context with the required timeout here
		log.Printf("server shutdown returned an err: %v\n", err)
	}
	userService.Stop(context.TODO()) // use a context with the required timeout here
	log.Println("final")
}

type UserService struct {
	amqp   amqp
	db     db
	doneWG sync.WaitGroup
}

// Stop waits for background goroutines to finish or for ctx to be done,
// whichever happens first.
func (s *UserService) Stop(ctx context.Context) {
	log.Println("waiting for user service to finish")
	doneChan := make(chan struct{})
	go func() {
		s.doneWG.Wait()
		close(doneChan)
	}()
	select {
	case <-ctx.Done():
		log.Println("context done earlier than user service has stopped")
	case <-doneChan:
		log.Println("user service finished")
	}
}

func (s *UserService) RegisterUser(ctx context.Context, name string) error {
	log.Println("start user registration")
	userID, err := s.db.InsertUser(ctx, name)
	if err != nil {
		return fmt.Errorf("db insertion failed: %w", err)
	}
	s.publishUserInserted(ctx, userID)
	return nil
}

// publishUserInserted publishes in a goroutine tracked by doneWG.
func (s *UserService) publishUserInserted(ctx context.Context, userID int) {
	s.doneWG.Add(1)
	go func() {
		defer s.doneWG.Done()
		defer func() {
			if err := recover(); err != nil {
				log.Printf("publishUserInserted recovered panic: %v\n", err)
			}
		}()
		s.amqp.PublishUserInserted(ctx, userID)
	}()
}

type db struct{}

func (d db) InsertUser(ctx context.Context, name string) (int, error) {
	time.Sleep(time.Second * 10)
	log.Println("user insert")
	return 1, nil
}

type amqp struct{}

func (a amqp) PublishUserInserted(ctx context.Context, id int) {
	time.Sleep(time.Second * 20)
	log.Println("message publish")
}
Let's run it and check whether our problem is solved.
2023/07/09 22:51:48 start user registration
^C
2023/07/09 22:51:49 got interruption signal
2023/07/09 22:51:58 user insert
2023/07/09 22:51:58 waiting for user service to finish
2023/07/09 22:52:18 message publish
2023/07/09 22:52:18 user service finished
2023/07/09 22:52:18 final
Indeed, as the logs show, we now wait until publishing has finished. With this approach you can extend the stop logic as you need, e.g. limit how long you are willing to wait for shutdown by passing a context with the corresponding timeout. Keep in mind that ignoring graceful shutdown can lead to inconsistent state and data loss, so it is important to pay attention to this subject.
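To make the timeout idea concrete, here is a minimal sketch of waiting on a WaitGroup with a deadline. It follows the same pattern as Stop, but returns a bool so the outcome is easy to check. The names waitOrTimeout and stopWithin and the durations are assumptions made for illustration; they are not part of the code above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waitOrTimeout follows the same idea as UserService.Stop: it reports true if
// wg finished before ctx was done, and false otherwise.
func waitOrTimeout(ctx context.Context, wg *sync.WaitGroup) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-ctx.Done():
		return false
	case <-done:
		return true
	}
}

// stopWithin (hypothetical helper) starts one background task that takes
// workMillis and waits for it for at most timeoutMillis.
func stopWithin(workMillis, timeoutMillis int) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(time.Duration(workMillis) * time.Millisecond)
	}()

	timeout := time.Duration(timeoutMillis) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitOrTimeout(ctx, &wg)
}

func main() {
	fmt.Println("fast task finished in time:", stopWithin(10, 500)) // true
	fmt.Println("slow task finished in time:", stopWithin(500, 10)) // false
}
```

In main, one context created with context.WithTimeout could replace both context.TODO() calls, so that srv.Shutdown and userService.Stop share a single overall deadline. How long that deadline should be depends on how much time your environment gives the process before it is killed.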