I've been an on-and-off Go user for several years, but I've recently started spending a lot more time using it for microservices. One service (micro or otherwise) that always seems to show up is a 'watchdog': something that tracks the health of a system, process, platform, etc., and does something when it detects a problem. I've worked on these at many of the places I've been employed, though usually not as a microservice. I recently wrote an extremely simple one in Go that makes a good foundation for a project that could grow into something much more fully featured. At its heart, the watchdog is nothing more than a very simple HTTP handler:


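var (
	mu       sync.Mutex                   // guards services: handler and watcher run concurrently
	services = make(map[string]time.Time) // last heartbeat time per service ID
)

// handler records a heartbeat for the service named in the "id" query parameter.
func handler(w http.ResponseWriter, r *http.Request) {
	service := r.URL.Query().Get("id")
	if service == "" {
		http.Error(w, "invalid data", http.StatusBadRequest)
		return
	}
	log.Printf("Got data from service %s", service)
	mu.Lock()
	services[service] = time.Now()
	mu.Unlock()
}

It keeps track of heartbeats from other services on the network: each time one arrives, it stores the current timestamp for that service. Because net/http handles requests concurrently and the watcher reads the same map from another goroutine, every access to the map goes through the mutex. The watcher periodically checks the map and takes action on any timestamp older than a threshold:


// watcher wakes on every tick and restarts any service whose last
// heartbeat is more than 10 seconds old.
func watcher(ticker *time.Ticker) {
	for range ticker.C {
		mu.Lock()
		for name, timestamp := range services {
			if time.Since(timestamp) > 10*time.Second {
				// Deleting during range is safe in Go. The service is added
				// back when its next heartbeat arrives.
				delete(services, name)
				log.Printf("Service %s died - restarting", name)
				go restart(name)
			}
		}
		mu.Unlock()
	}
}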

The watcher runs off a ticker that fires every 5 seconds:


ticker := time.NewTicker(5 * time.Second)
go watcher(ticker)

Every 5 seconds the ticker sends the current time on its channel, ticker.C, which unblocks the watcher's otherwise infinite loop for another pass. If any service in the map hasn't sent the watchdog a heartbeat within the last 10 seconds, restart is called for it. Because the check only runs every 5 seconds, a dead service is detected somewhere between 10 and 15 seconds after its last heartbeat. I've chosen to simply restart the dead service, but any action could be taken (message a Slack channel, email someone, restart selectively, etc.), and those are natural areas to improve and expand the code.


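// restart launches the named service as a new process. The service ID
// doubles as the executable name, which exec.Command looks up on PATH.
func restart(name string) {
	cmd := exec.Command(name)
	if err := cmd.Start(); err != nil {
		log.Printf("Error starting service %s: %v", name, err)
		return
	}
	// Wait reaps the child when it exits so it doesn't linger as a zombie.
	if err := cmd.Wait(); err != nil {
		log.Printf("Service %s exited with error: %v", name, err)
	}
}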

I've set up a small test program that sends the watchdog one heartbeat and then exits, so the watchdog restarts it every 10-15 seconds.
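That test program isn't shown here, so the following is a hedged sketch of what a one-shot heartbeat client could look like. It assumes the watchdog listens at http://localhost:8080/ with handler mounted at "/", and that the program is installed on PATH as "pinger" (a hypothetical name) so the watchdog's exec.Command can find it when restarting. heartbeatURL and sendHeartbeat are illustrative helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// heartbeatURL adds id as the "id" query parameter that the watchdog reads.
func heartbeatURL(base, id string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("id", id)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// sendHeartbeat makes one GET request and treats any non-200 reply as failure.
func sendHeartbeat(client *http.Client, base, id string) error {
	target, err := heartbeatURL(base, id)
	if err != nil {
		return err
	}
	resp, err := client.Get(target)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("watchdog returned %s", resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	// The ID must match this program's executable name for restarts to work.
	if err := sendHeartbeat(client, "http://localhost:8080/", "pinger"); err != nil {
		fmt.Println("heartbeat failed:", err)
	}
}
```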



The full project/code is here