
Chinonso Amadi


Improving Compiler Performance with Profile Guided Optimization

Since compilers first appeared, building software has been an ever-evolving journey. Developers have to balance writing clean, maintainable code against delivering high-performance applications. This tension has given rise to many performance-tuning techniques, one of which is Profile-Guided Optimization (PGO), also known as Feedback-Directed Optimization (FDO).

In this article we will look at what PGO is, the benefits it offers, and the steps for implementing it.

What is PGO?

Profile-Guided Optimization is a compiler optimization technique that feeds profile data, collected from representative runs of a program, back into the compiler. The compiler then uses that data to produce a final binary with better runtime performance.

At its core, this technique goes beyond the traditional confines of static analysis. Rather than relying solely on predefined rules to enhance code performance, it introduces an element of adaptability by leveraging insights from actual program execution. It's akin to having a seasoned guide direct you through a maze, showing you the most frequented paths and shortcuts.
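
To make this more concrete, here is a minimal sketch of the kind of decision a profile can inform. The `Shape` interface and the `Square` and `Circle` types are hypothetical and exist only for illustration. If a profile showed that almost every call in `totalArea` lands on `Square`, the compiler could guard the call with a type check and call the concrete method directly, which also makes it a candidate for inlining. `totalAreaSpecialized` writes that transformation out by hand; it is a picture of the idea, not the compiler's actual output.

```go
package main

import (
	"fmt"
	"math"
)

// Shape, Square and Circle are hypothetical types used only for illustration.
type Shape interface {
	Area() float64
}

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

type Circle struct{ Radius float64 }

func (c Circle) Area() float64 { return math.Pi * c.Radius * c.Radius }

// totalArea calls Area through the interface on every iteration.
// Static analysis alone cannot tell which concrete type dominates at run time.
func totalArea(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

// totalAreaSpecialized is a hand-written version of what a profile makes
// possible: a cheap type check in front of a direct call for the hot type,
// with the interface call kept as the fallback for everything else.
func totalAreaSpecialized(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		if sq, ok := s.(Square); ok {
			sum += sq.Area() // direct call on the concrete type
		} else {
			sum += s.Area() // rare path: dynamic dispatch
		}
	}
	return sum
}

func main() {
	shapes := []Shape{Square{Side: 2}, Square{Side: 3}, Circle{Radius: 1}, Square{Side: 4}}
	fmt.Println(totalArea(shapes), totalAreaSpecialized(shapes))
}
```

Both functions compute the same sum; only the way the hot call is dispatched differs.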

Benefits of Implementing PGO

PGO’s strength lies in its ability to optimize code based on genuine usage patterns. As software architects, developers, and engineers, we write programs with expectations of how they will be used; PGO replaces those expectations with measurements by observing the runtime behavior of our applications. In Go, for example, the compiler uses the profile to inline hot functions more aggressively and to devirtualize frequently executed interface calls.

Consider this analogy: if writing code is akin to composing a symphony, then PGO is the dynamic conductor adjusting the tempo based on the audience's response. It's not merely about creating an optimized performance; it's about crafting an experience tailored to the nuances of how users interact with our software.

How does it Work?

Implementing PGO involves a series of well-defined steps, each contributing to the creation of highly optimized, performance-centric binaries as shown below:

Figure: Steps PGO undergoes

  1. Instrumentation:
    The first thing we do is to instrument our code. Think of this as attaching a profiler to our application, allowing it to collect data on which parts of the code are traversed most frequently.

  2. Profile Execution:
    At this point we run the instrumented binary on representative input datasets, allowing the program to collect profiling data that mirrors its actual runtime behavior.

  3. Recompile with Profile Data:
    Armed with the collected profile data, the code is recompiled. The compiler utilizes this invaluable information to guide its optimization decisions.

  4. Generate Optimized Binary:
    The compiler, now armed with runtime insights, applies targeted optimizations based on the collected profile data. The result is a finely tuned binary designed to excel in the specific scenarios observed during profiling.

Actual Code Implementation

At this point we will implement profile-guided optimization using the Go language. PGO is supported by the standard toolchain from Go 1.21 onward (it was available as a preview in Go 1.20). The compiler takes a pprof CPU profile as input, and such a profile can be collected with the runtime/pprof and net/http/pprof packages, which we use to instrument our code.

The standard approach is to store a pprof CPU profile with the filename default.pgo in the main package directory of the profiled binary. The go build command detects this profile automatically and enables PGO optimizations during the build.
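
The server in this article collects its profile over HTTP with net/http/pprof. For a program that is not an HTTP server, the same kind of profile can be written with runtime/pprof. The following is a minimal sketch under that assumption: `writeCPUProfile` and `busyWork` are hypothetical helpers, and `busyWork` only stands in for a real, representative workload.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// sink keeps the result of busyWork alive so the loop is not dead code.
var sink int

// busyWork is a hypothetical stand-in for the program's real workload.
func busyWork() {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i % 7
	}
	sink = sum
}

// writeCPUProfile records a CPU profile of work() into the file at path.
func writeCPUProfile(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close()
		return err
	}

	work()

	// StopCPUProfile flushes the remaining samples before the file is closed.
	pprof.StopCPUProfile()
	return f.Close()
}

func main() {
	// default.pgo is the file name that go build looks for in the main package directory.
	if err := writeCPUProfile("default.pgo", busyWork); err != nil {
		log.Fatal(err)
	}
}
```

A profile is only as useful as the workload behind it, so in a real program the profiled section should exercise the same paths that production traffic does.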

Let’s build a simple application and see how it works:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
    "os"

    jsoniter "github.com/json-iterator/go"
)

type Data struct {
    Quiz struct {
        Sport struct {
            Q1 struct {
                Question string   `json:"question"`
                Options  []string `json:"options"`
                Answer   string   `json:"answer"`
            } `json:"q1"`
        } `json:"sport"`

        Maths struct {
            Q1 struct {
                Question string   `json:"question"`
                Options  []string `json:"options"`
                Answer   string   `json:"answer"`
            } `json:"q1"`

            Q2 struct {
                Question string   `json:"question"`
                Options  []string `json:"options"`
                Answer   string   `json:"answer"`
            } `json:"q2"`
        } `json:"maths"`
    } `json:"quiz"`
}

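func main() {
    http.HandleFunc("/", handler)

    // The blank import of net/http/pprof registers /debug/pprof/* on the default mux.
    log.Fatal(http.ListenAndServe(":8080", nil))
}

// handler reads data.json, decodes it into Data, encodes it again, and writes the result.
func handler(w http.ResponseWriter, r *http.Request) {
    var json = jsoniter.ConfigCompatibleWithStandardLibrary

    file, err := os.ReadFile("./data.json")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    data := Data{}
    if err := json.Unmarshal(file, &data); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    d, err := json.Marshal(data)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Write(d)
}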


It’s a simple endpoint that reads a .json file, unmarshals it, marshals it again, and writes the result to the response. The code has also been instrumented with the net/http/pprof package to prepare it for PGO.

Next, we build the instrumented binary:

go build -o nonPGOBinary

Before we proceed, we want to measure the performance of this binary. For this I will use github.com/tsliwowicz/go-wrk to generate traffic that sends requests to the endpoint.
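
If go-wrk is not at hand, a rough stand-in can be sketched in Go. This is only an illustration of what a load generator does, not a replacement for a proper benchmarking tool, and its numbers will not be comparable with go-wrk's. The `countRequests` helper and the choice of 10 workers are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// countRequests sends GET requests to url from `workers` concurrent goroutines
// for duration d and returns how many of them completed with a 200 status.
func countRequests(url string, workers int, d time.Duration) int64 {
	var ok int64
	deadline := time.Now().Add(d)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				resp, err := http.Get(url)
				if err != nil {
					continue // server not reachable; keep trying until the deadline
				}
				io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					atomic.AddInt64(&ok, 1)
				}
			}
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	// Same target and duration as the go-wrk command used in this article.
	n := countRequests("http://localhost:8080", 10, 20*time.Second)
	fmt.Printf("%d successful requests in 20s\n", n)
}
```

Whichever tool is used, the same tool and the same settings should be used for both the baseline and the PGO build so that the two runs are comparable.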

Start the server

./nonPGOBinary

Open another terminal and run:

 go-wrk -d 20 http://localhost:8080

The results are as follows:

Instrument Binary Result

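Take note of how many requests the server is able to handle without optimization.

Next, we gather profile data from the running server. The CPU profile only contains useful samples while the server is under load, so run this command while go-wrk is still sending traffic: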

curl -o default.pgo "http://localhost:8080/debug/pprof/profile?seconds=10"

Now let’s recompile our code with the profile data that we have and then run the benchmark again and compare the results.

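Stop the current server and build again. Because default.pgo now sits in the main package directory, the build picks it up. From Go 1.21 onward -pgo=auto is already the default, so the flag is spelled out here only for clarity, and -gcflags -m makes the compiler print its optimization decisions, such as which calls it inlines: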

go build -pgo=auto -gcflags -m -o PGOBinary main.go

Run the optimized binary:

./PGOBinary

Then, in another terminal, generate the same traffic as before:

go-wrk -d 20 http://localhost:8080

The result:

PGO Optimized Result

Notice that the PGO build handles 5,000 more requests than the unoptimized build did under the same load.

Conclusion

In this example, we collected a profile and then rebuilt our server with it. In a real-world project, however, development never stops: the code keeps changing after the profile has been taken.

In practice, you can collect a profile from a production environment running last week's code and use it to build today's source code. Go's PGO is designed to tolerate this kind of drift between the profiled code and the current code, so its runtime insights remain useful even in a codebase that keeps evolving.

Happy coding! 🚀
