I've implemented this pattern twice recently, so I thought I'd share it.
First, some background. I wrote a web crawler for reasons. The crawler would find links in a webpage, start a goroutine to follow each link and download whatever was at that endpoint, and recurse. This quickly caused the system running the crawler to run out of open file handles. Goroutines are so powerful that we had the opposite problem from a serial downloader. Rather than going too slowly by fetching one thing at a time, we did so much at once that we ran out of free file handles. I recently wrote another crawler-type program, which downloads links discovered via a JSON API, and it needed the same solution.
Browsers set a good example here: they don't open very many connections to any one remote host. There is a great Stack Overflow post, https://stackoverflow.com/questions/985431/max-parallel-http-connections-in-a-browser, which describes how web browsers limit connections. It is easy to make my Go programs behave in a similar way.
func main() {
	...
	for {
		go httpSave(currentURL, currentFileName)
	}
	...
}
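// Needs imports: io, log, net/http, os.

// httpSave saves u to file named name.
func httpSave(u, name string) {
	resp, err := http.Get(u)
	if err != nil {
		// Queue errors?
		log.Print(err)
		return
	}
	defer resp.Body.Close() // release the connection when done
	f, err := os.OpenFile(name, os.O_RDWR|os.O_CREATE, 0666)
	if err != nil {
		// Queue errors?
		log.Print(err)
		return // f is nil here, so stop before using it
	}
	defer f.Close()
	n, err := io.Copy(f, resp.Body)
	if err != nil {
		log.Print(err)
		return
	}
	log.Printf("wrote %d bytes to %s", n, name)
}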
The httpSave function above is real code; main is abbreviated to show how it might be called.
There are many ways to limit the number of connections. Rather than using locks and a counter, I find it simplest to use a buffered channel of capacity N, where N is the number of operations allowed to run concurrently. Sending to the channel takes a slot and blocks when all N are taken; receiving from it frees a slot. I still use a lock, but only to synchronize access to the map from each host to its channel.
I'm using globals here, but in a larger app I would put the map and lock in the struct that holds the app state for this operation.
var domainLimit map[string]chan struct{}
var domainListMutex sync.Mutex

func main() {
	domainLimit = make(map[string]chan struct{})
	...
}
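// getDomainToken blocks until a connection slot for u's host is free.
// It returns a function that releases the slot.
func getDomainToken(u string) func() {
	u2, err := url.Parse(u)
	if err != nil {
		log.Print(err)
		return func() {}
	}
	domainListMutex.Lock()
	f, ok := domainLimit[u2.Host]
	if !ok {
		f = make(chan struct{}, 6) // Six connections per host.
		domainLimit[u2.Host] = f
	}
	// Unlock before the send below, which may block; otherwise one busy
	// host would stall token requests for every other host.
	domainListMutex.Unlock()
	f <- struct{}{}
	return func() {
		<-f
	}
}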
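To use this, I add one line to the top of the httpSave function. The call getDomainToken(u) runs immediately and blocks until a slot is free; the function it returns is what gets deferred, so the slot is released when httpSave returns.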
func httpSave(u, name string) {
	defer getDomainToken(u)()
	...
If I had other functions doing HTTP operations, I'd need to add that same line to the top of each of them. In my case, I don't.
I like it because it is simple and it works.
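For a larger program, the map and lock could live on a struct instead of in globals, as mentioned earlier. What follows is only a sketch of how that might look: the names hostLimiter, newHostLimiter, and acquire are hypothetical, and the per-host limit becomes a constructor argument rather than the hard-coded six.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"sync"
)

// hostLimiter (hypothetical type) keeps one buffered channel per host.
type hostLimiter struct {
	mu      sync.Mutex               // guards slots
	perHost int                      // max concurrent operations per host
	slots   map[string]chan struct{} // host -> semaphore channel
}

// newHostLimiter returns a limiter allowing perHost operations per host.
func newHostLimiter(perHost int) *hostLimiter {
	return &hostLimiter{perHost: perHost, slots: make(map[string]chan struct{})}
}

// acquire blocks until a slot for u's host is free and returns a function
// that releases it. On a parse error it logs and returns a no-op, so it
// can still be used as `defer lim.acquire(u)()`.
func (l *hostLimiter) acquire(u string) func() {
	parsed, err := url.Parse(u)
	if err != nil {
		log.Print(err)
		return func() {}
	}
	l.mu.Lock()
	ch, ok := l.slots[parsed.Host]
	if !ok {
		ch = make(chan struct{}, l.perHost)
		l.slots[parsed.Host] = ch
	}
	l.mu.Unlock() // unlock before the send, which may block

	ch <- struct{}{}
	return func() { <-ch }
}

func main() {
	lim := newHostLimiter(6)
	release := lim.acquire("https://example.com/page")
	fmt.Println("slots in use:", len(lim.slots["example.com"]))
	release()
}
```

A function like httpSave would then take the limiter as a field or parameter and start with `defer lim.acquire(u)()`.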