<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oleg Sydorov</title>
    <description>The latest articles on DEV Community by Oleg Sydorov (@oleg_sydorov).</description>
    <link>https://dev.to/oleg_sydorov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1263245%2F80d26e4c-fafb-4f13-94d8-7d6590650e3f.png</url>
      <title>DEV Community: Oleg Sydorov</title>
      <link>https://dev.to/oleg_sydorov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oleg_sydorov"/>
    <language>en</language>
    <item>
      <title>The Subtleties of Vulnerability Scanning in Go Projects</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Mon, 14 Jul 2025 23:01:07 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/the-subtleties-of-vulnerability-scanning-in-go-projects-26fm</link>
      <guid>https://dev.to/oleg_sydorov/the-subtleties-of-vulnerability-scanning-in-go-projects-26fm</guid>
      <description>&lt;p&gt;Today, I want to talk about addressing vulnerabilities in our Go projects. As you may know, the standard tool for vulnerability checking is &lt;strong&gt;govulncheck&lt;/strong&gt;, an official utility from the Go development team that leverages the Go Vulnerability Database. In addition to being used as a standalone tool, this utility is integrated into GoLand IDE starting from version 2023.1, enabling real-time scanning of the go.mod file. Of course, you can always run &lt;strong&gt;govulncheck&lt;/strong&gt; manually as well.&lt;/p&gt;

&lt;p&gt;At first glance, it seems straightforward — what could possibly go wrong? Imagine you finish working on your code, run &lt;strong&gt;govulncheck&lt;/strong&gt;, and confidently commit your changes. Suddenly, your CI/CD pipeline fails to build or, worse, your IT security team escalates an issue. You’re puzzled: how is this possible?&lt;/p&gt;

&lt;p&gt;The nuance here lies in the fact that &lt;strong&gt;govulncheck&lt;/strong&gt; only checks and considers &lt;em&gt;effective dependencies&lt;/em&gt;, i.e., those actually used by the code. In contrast, many scanners commonly integrated into pipelines — such as &lt;strong&gt;Trivy&lt;/strong&gt;, &lt;strong&gt;Grype&lt;/strong&gt;, or platforms like &lt;strong&gt;Sonatype IQ&lt;/strong&gt; (Nexus Lifecycle) — often operate in a "paranoid" mode, scanning the entire dependency graph, including unused or transitive packages. This frequently leads to situations where seemingly clean code triggers multiple vulnerability alerts.&lt;/p&gt;
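&lt;p&gt;The difference is easy to see even with a toy script. The sketch below (hypothetical module names, plain string matching rather than a real go.mod parser) counts how many modules in a go.mod excerpt are marked &lt;em&gt;// indirect&lt;/em&gt;: these are exactly the entries a "paranoid" full-graph scanner will still inspect even though your code may never execute them.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "strings"
)

// countRequirements is a toy illustration, not a real scanner: it tallies
// direct vs "// indirect" entries in a go.mod excerpt. Full-graph scanners
// inspect all of them; govulncheck only flags code your binary reaches.
func countRequirements(gomod string) (direct, indirect int) {
    for _, line := range strings.Split(gomod, "\n") {
        line = strings.TrimSpace(line)
        if !strings.HasPrefix(line, "github.com/") && !strings.HasPrefix(line, "golang.org/") {
            continue
        }
        if strings.HasSuffix(line, "// indirect") {
            indirect++
        } else {
            direct++
        }
    }
    return direct, indirect
}

func main() {
    gomod := `require (
    github.com/aws/aws-lambda-go v1.49.0
)

require (
    github.com/fatih/color v1.18.0 // indirect
    golang.org/x/sys v0.34.0 // indirect
)`
    d, i := countRequirements(gomod)
    fmt.Printf("direct: %d, indirect: %d\n", d, i) // direct: 1, indirect: 2
}
```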

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49z70k3ximrsz66b0pb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49z70k3ximrsz66b0pb5.png" alt="My code is clean!" width="717" height="161"&gt;&lt;/a&gt;&lt;br&gt;
My code is clean!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bjhrlwlu6fiyuz9no51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bjhrlwlu6fiyuz9no51.png" alt="How is it possible?" width="800" height="61"&gt;&lt;/a&gt;&lt;br&gt;
How is it possible?&lt;/p&gt;

&lt;p&gt;Let's look at the detailed report (a Sonatype OSS Index example):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11608phc9xxc6ctey4us.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11608phc9xxc6ctey4us.png" alt="Sonatype detailed report" width="800" height="422"&gt;&lt;/a&gt;&lt;br&gt;
Let's turn to our go.mod file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module my-awesome-project

go 1.23.0

toolchain go1.23.9

require (
    github.com/aws/aws-cdk-go/awscdk/v2 v2.204.0
    github.com/aws/aws-cdk-go/awscdklambdagoalpha/v2 v2.204.0-alpha.0
    github.com/aws/aws-lambda-go v1.49.0
    github.com/aws/constructs-go/constructs/v10 v10.4.2
    github.com/aws/jsii-runtime-go v1.112.0
)

require (
    github.com/Masterminds/semver/v3 v3.4.0 // indirect
    github.com/cdklabs/awscdk-asset-awscli-go/awscliv1/v2 v2.2.244 // indirect
    github.com/cdklabs/awscdk-asset-node-proxy-agent-go/nodeproxyagentv6/v2 v2.1.0 // indirect
    github.com/cdklabs/cloud-assembly-schema-go/awscdkcloudassemblyschema/v44 v44.9.0 // indirect
    github.com/cdklabs/cloud-assembly-schema-go/awscdkcloudassemblyschema/v45 v45.2.0 // indirect
    github.com/fatih/color v1.18.0 // indirect
    github.com/mattn/go-colorable v0.1.14 // indirect
    github.com/mattn/go-isatty v0.0.20 // indirect
    github.com/yuin/goldmark v1.7.12 // indirect
    golang.org/x/lint v0.0.0-20241112194109-818c5a804067 // indirect
    golang.org/x/mod v0.26.0 // indirect
    golang.org/x/sync v0.16.0 // indirect
    golang.org/x/sys v0.34.0 // indirect
    golang.org/x/tools v0.35.0 // indirect
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What do we see? Our project does not have the problems reported by the scanner! However, as soon as you run the &lt;strong&gt;go mod graph | grep some_bad_package&lt;/strong&gt; command, the situation immediately changes: the mystery is solved. &lt;/p&gt;
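&lt;p&gt;To make the investigation concrete, here is an illustrative shell session. The &lt;em&gt;go mod graph&lt;/em&gt; output is simulated with a captured excerpt (the parent modules and versions are hypothetical), since the point is only how grep exposes which dependency drags in the flagged package:&lt;/p&gt;

```shell
# Simulated excerpt of `go mod graph` output (module names/versions are
# illustrative); each line is "parent child".
excerpt='github.com/aws/jsii-runtime-go@v1.112.0 github.com/cdklabs/awscdk-asset-awscli-go/awscliv1/v2@v2.2.240
github.com/aws/aws-cdk-go/awscdk/v2@v2.204.0 github.com/Masterminds/semver/v3@v3.4.0'

# Find which parent pulls in the flagged package:
printf '%s\n' "$excerpt" | grep awscli
```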

&lt;p&gt;Let's return to the detailed report and take a closer look at the particular vulnerability that was discovered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy3et2egpj7gasccf17w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy3et2egpj7gasccf17w.png" alt=" " width="800" height="276"&gt;&lt;/a&gt;&lt;br&gt;
The scanner complains about version 2.2.240 and recommends replacing it with a version no lower than 2.2.242&lt;/p&gt;

&lt;p&gt;Well, now let's turn to the go.sum file. Maybe the answer lies here? &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1m661a3zvbdja14s7dv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1m661a3zvbdja14s7dv.png" alt="go.mod (unfixed)" width="800" height="231"&gt;&lt;/a&gt;&lt;br&gt;
go.mod (unfixed)&lt;/p&gt;

&lt;p&gt;Excellent! So, the crime is solved.&lt;/p&gt;

&lt;p&gt;But how can this be addressed? The solution is to use the replace directive to override vulnerable packages with safe versions directly in your go.mod file.&lt;/p&gt;

&lt;p&gt;In our case (see the full list of the detected vulnerabilities above), we will apply the following fix to the go.mod file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;replace (
    github.com/cdklabs/awscdk-asset-awscli-go/awscliv1/v2 =&amp;gt; github.com/cdklabs/awscdk-asset-awscli-go/awscliv1/v2 v2.2.244
    github.com/golang-jwt/jwt/v5 =&amp;gt; github.com/golang-jwt/jwt/v5 v5.2.2
    golang.org/x/crypto =&amp;gt; golang.org/x/crypto v0.38.0
    golang.org/x/net =&amp;gt; golang.org/x/net v0.40.0
    golang.org/x/text =&amp;gt; golang.org/x/text v0.25.0
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I recommend exercising caution:&lt;/p&gt;

&lt;p&gt;Ensure compatibility and correct versioning with your Go environment&lt;br&gt;
Verify that your code still builds successfully after the change&lt;/p&gt;

&lt;p&gt;After applying the fix, run &lt;strong&gt;go mod tidy&lt;/strong&gt; and, if needed, &lt;strong&gt;go mod vendor&lt;/strong&gt; to clean up and update your module dependencies accordingly.&lt;/p&gt;

&lt;p&gt;The results were not long in coming:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3egq4ouglh7p5h2qn5q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3egq4ouglh7p5h2qn5q.png" alt="go.mod (fixed)" width="800" height="224"&gt;&lt;/a&gt;&lt;br&gt;
go.mod (fixed)&lt;/p&gt;

&lt;p&gt;Likewise, the remaining packages, such as crypto, text, net, etc., have also been fixed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmgj3vnquo06lxr1pwut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmgj3vnquo06lxr1pwut.png" alt="Sonatype report - no critical vulnerabilities found" width="498" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thus, as we’ve seen, the problem runs deeper and its resolution cannot be fully achieved using only standard methods and tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wishing you successful development and clean code!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>devsecops</category>
      <category>vulnerabilities</category>
      <category>programming</category>
    </item>
    <item>
      <title>GoCV + openCV: an approach to capture video</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Mon, 23 Jun 2025 23:58:36 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/gocv-opencv-an-approach-to-capture-video-329o</link>
      <guid>https://dev.to/oleg_sydorov/gocv-opencv-an-approach-to-capture-video-329o</guid>
      <description>&lt;p&gt;When considering the issue of capturing images from external devices, in general, such devices can be divided into 3 categories: web cameras, IP (network) cameras, video capture cards. The subject of this guide is web cameras.&lt;/p&gt;

&lt;p&gt;After analyzing the available ways of interacting with cameras, we settle on the open-source project OpenCV. Its libraries allow us both to work natively in C/C++ and to write wrapper programs in higher-level languages, Go for example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hello Video&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The code of the wrapper program can be surprisingly simple. Here is a didactic example (using gocv wrapper):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "gocv.io/x/gocv"
)

func main() {
    webcam, _ := gocv.VideoCaptureDevice(0)
    window := gocv.NewWindow("Hello")
    img := gocv.NewMat()

    for {
        webcam.Read(&amp;amp;img)
        window.IMShow(img)
        window.WaitKey(1)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The price of this ease of writing code is the assembly and linking of the OpenCV library. The matter is complicated by the fact that some equipment, ATMs for example, still runs on MS Windows =) Let's go through this path in detail from start to finish.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;    Download and install MSYS2:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://www.msys2.org/" rel="noopener noreferrer"&gt;https://www.msys2.org/&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;    Launch MSYS2 MSYS (via Start)
&lt;code&gt;pacman -Syu&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Restart MSYS2 and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pacman -Su

pacman -S mingw-w64-x86_64-gcc

$env:PATH += ";C:\msys64\mingw64\bin"

gcc --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a step-by-step guide on how to build OpenCV with "contrib" modules (including Aruco) under Windows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;    Download opencv sources&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's assume that the directory &lt;strong&gt;C:\sources&lt;/strong&gt; is created&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd C:\sources

git clone https://github.com/opencv/opencv.git

// create the opencv directory

git clone https://github.com/opencv/opencv_contrib.git

// create the opencv_contrib directory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Install CMake Download the installer from the official website:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://cmake.org/download/" rel="noopener noreferrer"&gt;https://cmake.org/download/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During installation, check “Add CMake to system PATH” — so you can call cmake from the command line.&lt;/p&gt;

&lt;p&gt;NB! It is possible to simply install the “Desktop development with C++” workload from the MS Visual Studio 2022 package, which will provide CMake&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;    In MSYS2 (since you are already using C:\msys64\mingw64\bin) the mingw32-make utility is included in the mingw-w64-x86_64-make package. To install it:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Run the MSYS2 MinGW 64-bit shell (not the usual "MSYS2 MSYS"!).&lt;/p&gt;

&lt;p&gt;Update the package database and the system itself:&lt;br&gt;
&lt;code&gt;pacman -Syu&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;(if asked, restart the shell and repeat pacman -Syu again)&lt;/p&gt;

&lt;p&gt;Install make:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pacman -S mingw-w64-x86_64-make&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Check that it is now available in PATH:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;where mingw32-make&lt;/code&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;    Run CMake. Let's say we installed CMake as a component of Visual Studio. To run it, open the x64 Native Tools Command Prompt for VS 2022&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Download and install Python &amp;gt;= 3.6 &lt;a href="https://www.python.org/downloads/windows/" rel="noopener noreferrer"&gt;https://www.python.org/downloads/windows/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install NumPy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;py -m pip install --upgrade pip
py -m pip install numpy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, optionally, use the CMake flag &lt;strong&gt;-D WITH_PYTHON=OFF&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;set "PATH=C:\msys64\mingw64\bin;%PATH%" //set mingw instead of cl.exe from MS VS&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NB!&lt;/strong&gt; MS VS compatible building is deprecated and needs other flags and components. MS VS and minGW libraries are mutually exclusive as well!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cmake -S C:/sources/opencv -B C:/build -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Relecmake -S C:/sources/opencv -B C:/build -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -DOPENCV_EXTRA_MODULES_PATH=C:/sources/opencv_contrib/modules -DCMAKE_INSTALL_PREFIX=C:/build/install -D BUILD_opencv_world=ON -DBUILD_SHARED_LIBS=ON -DBUILD_EXAMPLES=OFF 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The flag &lt;strong&gt;-D BUILD_opencv_world=ON&lt;/strong&gt; bundles all modules into a single library file (optional, but recommended).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;    Make &amp;amp;&amp;amp; make install. Assume we build in the &lt;strong&gt;C:\build&lt;/strong&gt; directory
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;c:\build&amp;gt; mingw32-make -j8
c:\build&amp;gt; mingw32-make install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;    Build the GoCV wrapper executable&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;PowerShell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$env:CGO_ENABLED = "1"  
$env:CGO_CFLAGS = "-IC:/build/install/include -IC:/build/install/include/opencv2"
$env:CGO_CXXFLAGS = "-IC:/build/install/include -IC:/build/install/include/opencv2"
$env:CGO_LDFLAGS = "-LC:/build/install/x64/mingw/lib -lopencv_world4110"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;go build -x -tags customenv -ldflags="-H=windows" -o capture.exe main.go&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The newest GoCV version expects OpenCV 4.11.0, but you can build with the newer 4.12.0 release by simply renaming the library libopencv_world4120 =&amp;gt; libopencv_world4110 (4.12.0 compiles better).&lt;/p&gt;

&lt;p&gt;The newer library builds more smoothly because some inaccuracies in the header declarations, which caused problems when building under Windows, have been fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enjoy!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>opencv</category>
      <category>compiling</category>
      <category>programming</category>
    </item>
    <item>
      <title>External Merge Problem - Complete Guide for Gophers</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Sat, 11 Jan 2025 21:54:16 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/external-merge-problem-complete-guide-for-gophers-4ck8</link>
      <guid>https://dev.to/oleg_sydorov/external-merge-problem-complete-guide-for-gophers-4ck8</guid>
      <description>&lt;p&gt;The external sorting problem is a well-known topic in computer science courses and is often used as a teaching tool. However, it's rare to meet someone who has actually implemented a solution to this problem in code for a specific technical scenario, let alone tackled the required optimizations. Encountering this challenge during a hackathon inspired me to write this article.&lt;/p&gt;

&lt;p&gt;So, here is the hackathon task:&lt;/p&gt;

&lt;p&gt;You have a simple text file with IPv4 addresses. One line is one address, line by line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;145.67.23.4
8.34.5.23
89.54.3.124
89.54.3.124
3.45.71.5
... 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The file is unlimited in size and can occupy tens and hundreds of gigabytes.&lt;/p&gt;

&lt;p&gt;You should calculate the number of unique addresses in this file using as little memory and time as possible. There is a "naive" algorithm for solving this problem (read line by line, put lines into HashSet). It's better if your implementation is more complicated and faster than this naive algorithm.&lt;/p&gt;

&lt;p&gt;A 120GB file with 8 billion lines was submitted for parsing. &lt;/p&gt;

&lt;p&gt;There were no specific requirements regarding the speed of program execution. However, after quickly reviewing available information on the topic online, I concluded that an acceptable execution time for standard hardware (such as a home PC) would be approximately one hour or less.&lt;/p&gt;

&lt;p&gt;For obvious reasons, the file cannot be read and processed in its entirety unless the system has at least 128GB of memory available. But is working with chunks and merging inevitable?&lt;/p&gt;

&lt;p&gt;If you are not comfortable implementing an external merge, I suggest you first familiarize yourself with an alternative solution that is acceptable, although far from optimal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Idea
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a 2^32 bit bitmap. This is a uint64 array, since uint64 contains 64 bits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each IP:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Parse the string address into four octets: A.B.C.D.&lt;/li&gt;
&lt;li&gt;Translate it into a number ipNum = (A &amp;lt;&amp;lt; 24) | (B &amp;lt;&amp;lt; 16) | (C &amp;lt;&amp;lt; 8) | D.&lt;/li&gt;
&lt;li&gt;Set the corresponding bit in the bitmap.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;After reading all the addresses, run through the bitmap and count the number of set bits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;
Very fast uniqueness detection: setting the bit O(1), no need to check, just set it.&lt;/p&gt;

&lt;p&gt;No overhead for hashing, sorting, etc.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;
Huge memory consumption (512 MB for the full IPv4 space, without taking into account overhead).&lt;/p&gt;

&lt;p&gt;If the file is huge, but smaller than the full IPv4 space, this can still be advantageous in terms of time, but not always reasonable in terms of memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
    "math/bits"
)

//  Parse IP address "A.B.C.D"  =&amp;gt; uint32 number
func ipToUint32(ipStr string) (uint32, error) {
    parts := strings.Split(ipStr, ".")
    if len(parts) != 4 {
        return 0, fmt.Errorf("invalid IP format")
    }

    var ipNum uint32
    for i := 0; i &amp;lt; 4; i++ {
        val, err := strconv.Atoi(parts[i])
        if err != nil || val &amp;lt; 0 || val &amp;gt; 255 {
            return 0, fmt.Errorf("invalid IP octet: %v", parts[i])
        }
        ipNum = (ipNum &amp;lt;&amp;lt; 8) | uint32(val)
    }

    return ipNum, nil
}



func main() {
    filePath := "ips.txt"

    file, err := os.Open(filePath)
    if err != nil {
        fmt.Printf("Error opening file: %v\n", err)
        return
    }
    defer file.Close()

    // IPv4 space size: 2^32 = 4,294,967,296
    // We need 2^32 bits, that is (2^32)/64 64-bit words
    totalBits := uint64(1) &amp;lt;&amp;lt; 32       // 2^32
    arraySize := totalBits / 64        //how many uint64 do we need
    bitset := make([]uint64, arraySize)

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        ipStr := scanner.Text()
        ipNum, err := ipToUint32(ipStr)
        if err != nil {
            fmt.Printf("Incorrect IP: %s\n", ipStr)
            continue
        }

        idx := ipNum / 64
        bit := ipNum % 64
        mask := uint64(1) &amp;lt;&amp;lt; bit
        // Setting the bit
        bitset[idx] |= mask
    }

    if err := scanner.Err(); err != nil {
        fmt.Printf("Error reading file: %v\n", err)
        return
    }

    count := 0
    for _, val := range bitset {
        count += bits.OnesCount64(val)
    }

    fmt.Printf("Number of unique IP addresses: %d\n", count)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach is straightforward and reliable, making it a viable option when no alternatives are available. However, in a production environment—especially when aiming to achieve optimal performance—it's essential to develop a more efficient solution.&lt;/p&gt;

&lt;p&gt;Thus, our approach involves chunking, in-memory sorting of the chunks, external merging, and deduplication.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Principle of Parallelization in External Sorting
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Reading and transforming chunks:&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The file is split into relatively small parts (chunks), say a few hundred megabytes or a few gigabytes. For each chunk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A goroutine (or a pool of goroutines) is launched, which reads the chunk, parses the IP addresses into numbers and stores them in a temporary array in memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then this array is sorted (for example, with the standard sort.Slice), and the result, after removing duplicates, is written to a temporary file.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since each part can be processed independently, you can run several such handlers in parallel, if you have several CPU cores and sufficient disk bandwidth. This will allow you to use resources as efficiently as possible.&lt;/p&gt;
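&lt;p&gt;A minimal in-memory sketch of this chunk stage (slices stand in for the big file and the temporary files, which keeps the example self-contained; the real program reads each chunk from disk and writes the sorted, deduplicated result back out):&lt;/p&gt;

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// processChunks sorts and deduplicates each chunk in its own goroutine,
// mimicking the chunk stage of external sorting.
func processChunks(chunks [][]uint32) [][]uint32 {
    out := make([][]uint32, len(chunks))
    var wg sync.WaitGroup
    for i, chunk := range chunks {
        wg.Add(1)
        go func(i int, ips []uint32) {
            defer wg.Done()
            sort.Slice(ips, func(a, b int) bool { return ips[a] < ips[b] })
            // In-place deduplication of the sorted slice.
            dedup := ips[:0]
            var prev uint32
            for j, ip := range ips {
                if j == 0 || ip != prev {
                    dedup = append(dedup, ip)
                }
                prev = ip
            }
            out[i] = dedup
        }(i, chunk)
    }
    wg.Wait()
    return out
}

func main() {
    chunks := [][]uint32{{5, 3, 3, 1}, {9, 9, 2}}
    fmt.Println(processChunks(chunks)) // [[1 3 5] [2 9]]
}
```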

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Merge sorted chunks (merge step):&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once all chunks are sorted and written to temporary files, you need to merge these sorted lists into a single sorted stream, removing duplicates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Similar to the external sorting process, you can parallelize the merge by dividing multiple temporary files into groups, merging them in parallel and gradually reducing the number of files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This leaves one large sorted and deduplicated output stream, from which you can calculate the total number of unique IPs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
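&lt;p&gt;The merge step itself is the classic k-way merge, which Go's container/heap expresses compactly. In this sketch the sorted chunks are in-memory slices rather than temporary files, and we only count the unique values:&lt;/p&gt;

```go
package main

import (
    "container/heap"
    "fmt"
)

// cursor tracks the current value, source chunk and position of one input.
type cursor struct {
    val        uint32
    chunk, pos int
}

type mergeHeap []cursor

func (h mergeHeap) Len() int            { return len(h) }
func (h mergeHeap) Less(i, j int) bool  { return h[i].val < h[j].val }
func (h mergeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) { *h = append(*h, x.(cursor)) }
func (h *mergeHeap) Pop() interface{} {
    old := *h
    x := old[len(old)-1]
    *h = old[:len(old)-1]
    return x
}

// countUnique k-way merges already-sorted chunks and counts unique values
// across all of them (the real program streams the merged output to disk).
func countUnique(chunks [][]uint32) int {
    h := &mergeHeap{}
    for i, c := range chunks {
        if len(c) > 0 {
            heap.Push(h, cursor{c[0], i, 0})
        }
    }
    count := 0
    var last uint32
    first := true
    for h.Len() > 0 {
        cur := heap.Pop(h).(cursor)
        if first || cur.val != last {
            count++
            last, first = cur.val, false
        }
        if cur.pos+1 < len(chunks[cur.chunk]) {
            heap.Push(h, cursor{chunks[cur.chunk][cur.pos+1], cur.chunk, cur.pos + 1})
        }
    }
    return count
}

func main() {
    fmt.Println(countUnique([][]uint32{{1, 3, 5}, {2, 3, 9}})) // 5
}
```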

&lt;p&gt;&lt;strong&gt;Advantages of parallelization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use of multiple CPU cores:&lt;br&gt;
Single-threaded sorting of a very large array can be slow, but if you have a multi-core processor, you can sort multiple chunks in parallel, speeding up the process several times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load balancing:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the chunk sizes are chosen wisely, each chunk can be processed in approximately the same amount of time. If some chunks are larger/smaller or more complex, you can dynamically distribute their processing across different goroutines.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IO optimization:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parallelization allows one chunk to be read while another is being sorted or written, reducing idle time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;External sorting naturally lends itself to parallelization through file chunking. This approach enables the efficient use of multi-core processors and minimizes IO bottlenecks, resulting in significantly faster sorting and deduplication compared to a single-threaded approach. By distributing the workload effectively, you can achieve high performance even when dealing with massive datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Important consideration:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While reading the file line by line, we can also count the total number of lines. During the process, we perform deduplication in two stages: first during chunking and then during merging. As a result, there’s no need to count the lines in the final output file. Instead, the total number of unique lines can be calculated as:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;finalCount := totalLines - (DeletedInChunks + DeletedInMerge)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This approach avoids redundant operations and makes the computation more efficient by keeping track of deletions during each stage of deduplication. This saves us several minutes.&lt;/p&gt;
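&lt;p&gt;A toy check of this identity (pure stdlib, with made-up values): duplicates are dropped in two stages, and the unique total falls out of the bookkeeping without re-reading anything:&lt;/p&gt;

```go
package main

import (
    "fmt"
    "sort"
)

// dedupSorted removes duplicates from a sorted slice and reports how many
// entries were dropped.
func dedupSorted(xs []uint32) ([]uint32, int) {
    if len(xs) == 0 {
        return xs, 0
    }
    out := xs[:1]
    for _, x := range xs[1:] {
        if x != out[len(out)-1] {
            out = append(out, x)
        }
    }
    return out, len(xs) - len(out)
}

func main() {
    chunk1 := []uint32{5, 1, 5, 7}
    chunk2 := []uint32{7, 2, 2, 5}
    totalLines := len(chunk1) + len(chunk2) // 8

    sort.Slice(chunk1, func(i, j int) bool { return chunk1[i] < chunk1[j] })
    sort.Slice(chunk2, func(i, j int) bool { return chunk2[i] < chunk2[j] })
    c1, d1 := dedupSorted(chunk1) // 1 duplicate dropped in chunk 1
    c2, d2 := dedupSorted(chunk2) // 1 duplicate dropped in chunk 2

    merged := append(append([]uint32{}, c1...), c2...)
    sort.Slice(merged, func(i, j int) bool { return merged[i] < merged[j] })
    _, deletedInMerge := dedupSorted(merged) // 5 and 7 appear in both chunks

    finalCount := totalLines - (d1 + d2 + deletedInMerge)
    fmt.Println(finalCount) // 4 unique values: 1 2 5 7
}
```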

&lt;p&gt;&lt;strong&gt;One more thing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since any small performance gain matters on huge amounts of data, I suggest using a self-written, accelerated analogue of strings.Split()&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func fastSplit(s string) []string {
    n := 1
    c := DelimiterByte

    for i := 0; i &amp;lt; len(s); i++ {
        if s[i] == c {
            n++
        }
    }

    out := make([]string, n)
    count := 0
    begin := 0
    length := len(s) - 1

    for i := 0; i &amp;lt;= length; i++ {
        if s[i] == c {
            out[count] = s[begin:i]
            count++
            begin = i + 1
        }
    }
    out[count] = s[begin : length+1]

    return out
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, a worker template was adopted to manage parallel processing, with the number of threads being configurable. By default, the number of threads is set to &lt;strong&gt;&lt;em&gt;runtime.NumCPU()&lt;/em&gt;&lt;/strong&gt;, allowing the program to utilize all available CPU cores efficiently. This approach ensures optimal resource usage while also providing flexibility to adjust the number of threads based on the specific requirements or limitations of the environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; When using multithreading, it is crucial to protect shared data to prevent race conditions and ensure the correctness of the program. This can be achieved by using synchronization mechanisms such as mutexes, channels (in Go), or other concurrency-safe techniques, depending on the specific requirements of your implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary so far
&lt;/h2&gt;

&lt;p&gt;The implementation of these ideas resulted in code that, when executed on a Ryzen 7700 processor paired with an M.2 SSD, completed the task in approximately 40 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Considering compression
&lt;/h2&gt;

&lt;p&gt;The next consideration, based on the volume of data and hence the presence of significant disk operations, was the use of compression. The Brotli algorithm was chosen for compression. Its high compression ratio and efficient decompression make it a suitable choice for reducing disk IO overhead while maintaining good performance during intermediate storage and processing. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here is the example of chunking with Brotli:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "fmt"
    "github.com/andybalholm/brotli"
    "os"
    "sort"
)

// processChunk sorts and deduplicates one chunk, then writes it Brotli-compressed.
// (path, chunkFiles, deduplicate and chkClose are defined elsewhere in the program.)
func processChunk(ips []uint32, chunkIndex int) (string, error) {
    sort.Slice(ips, func(i, j int) bool { return ips[i] &amp;lt; ips[j] })
    ips = deduplicate(ips)

    outFileName := fmt.Sprintf("chunk_%d.tmp", chunkIndex)
    f, err := os.Create(path + outFileName)
    if err != nil {
        return "", err
    }
    defer chkClose(f)

    compressor := brotli.NewWriterLevel(f, 5)
    defer chkClose(compressor)

    for _, ip := range ips {
        fmt.Fprintf(compressor, "%d\n", ip)
    }

    chunkFiles = append(chunkFiles, outFileName)
    return outFileName, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results of Using Compression
&lt;/h2&gt;

&lt;p&gt;The effectiveness of compression is debatable and highly dependent on the conditions under which the solution is used. High compression reduces disk space usage but proportionally increases overall execution time. On slow HDDs, compression can provide a significant speed boost, as disk I/O becomes the bottleneck. Conversely, on fast SSDs, compression may lead to slower execution times.&lt;/p&gt;

&lt;p&gt;In tests conducted on a system with M.2 SSDs, compression showed no performance improvement. As a result, I ultimately decided to forgo it. However, if you're willing to risk adding complexity to your code and potentially reducing its readability, you could implement compression as an optional feature, controlled by a configurable flag.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do next
&lt;/h2&gt;

&lt;p&gt;In pursuit of further optimization, we turn our attention to a binary representation. Once the text-based IP addresses are converted into 32-bit integers, all subsequent operations can be performed in binary format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func ipToUint32(ipStr string) (uint32, error) {
    parts := fastSplit(ipStr)
    if len(parts) != 4 {
        return 0, fmt.Errorf("invalid IP format")
    }

    var ipNum uint32
    for i := 0; i &amp;lt; 4; i++ {
        val, err := strconv.Atoi(parts[i])
        if err != nil || val &amp;lt; 0 || val &amp;gt; 255 {
            return 0, fmt.Errorf("invalid IP octet: %v", parts[i])
        }
        ipNum = (ipNum &amp;lt;&amp;lt; 8) | uint32(val)
    }

    return ipNum, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advantages of the Binary Format&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compactness:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each number occupies a fixed size (e.g., uint32 = 4 bytes).&lt;br&gt;
For 1 million IP addresses, the file size will be only ~4 MB.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast Processing:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's no need to parse strings, which speeds up reading and writing operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-Platform Compatibility:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By using a consistent byte order (either LittleEndian or BigEndian), files can be read across different platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Storing data in binary format is a more efficient method for writing and reading numbers. For complete optimization, convert both the data writing and reading processes to binary format. Use binary.Write for writing and binary.Read for reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's what the processChunk function might look like to work with binary format:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "encoding/binary"
    "fmt"
    "os"
    "sort"
)

func processChunk(ips []uint32, chunkIndex int) (string, error) {
    sort.Slice(ips, func(i, j int) bool { return ips[i] &amp;lt; ips[j] })

    ips = deduplicate(ips)

    outFileName := fmt.Sprintf("chunk_%d.tmp", chunkIndex)

    f, err := os.Create(path + outFileName)
    if err != nil {
        return "", err
    }
    defer f.Close()

    for _, ip := range ips {
        err := binary.Write(f, binary.LittleEndian, ip)
        if err != nil {
            return "", fmt.Errorf("failed to write binary data: %w", err)
        }
    }

    chunkFiles = append(chunkFiles, outFileName)

    return outFileName, nil
} 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  WTF?! It became much slower!!
&lt;/h2&gt;

&lt;p&gt;The binary version turned out to be slower: a file with 100 million lines (IP addresses) is processed in 4.5 minutes in binary form, versus 25 seconds in text form, with the same chunk size and number of workers. Why?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working with Binary Format May Be Slower than Text Format&lt;/strong&gt;&lt;br&gt;
Using binary format can sometimes be slower than text format due to the specifics of how binary.Read and binary.Write operate, as well as potential inefficiencies in their implementation. Here are the main reasons why this might happen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I/O Operations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text Format:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Works with larger data blocks using bufio.Scanner, which is optimized for reading lines.&lt;br&gt;
Reads entire lines and parses them, which can be more efficient for small conversion operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binary Format:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;binary.Read reads 4 bytes at a time, resulting in more frequent small I/O operations.&lt;br&gt;
Frequent calls to binary.Read increase overhead from switching between user and system space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use a buffer to read multiple numbers at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func processChunk(ips []uint32, chunkIndex int) (string, error) {
    sort.Slice(ips, func(i, j int) bool { return ips[i] &amp;lt; ips[j] })

    ips = deduplicate(ips)

    outFileName := fmt.Sprintf("chunk_%d.tmp", chunkIndex)

    f, err := os.Create(path + outFileName)
    if err != nil {
        return "", err
    }
    defer f.Close()

    bw := bufio.NewWriter(f)

    for _, ip := range ips {
        err := binary.Write(bw, binary.LittleEndian, ip)
        if err != nil {
            return "", fmt.Errorf("failed to write binary data: %w", err)
        }
    }

    err = bw.Flush()
    if err != nil {
        return "", fmt.Errorf("failed to flush buffer: %w", err)
    }

    chunkFiles = append(chunkFiles, outFileName)

    return outFileName, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Does Buffering Improve Performance?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fewer I/O Operations:&lt;br&gt;
Instead of writing each number directly to disk, data is accumulated in a buffer and written in larger blocks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reduced Overhead:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each disk write operation incurs overhead due to context switching between the process and the operating system. Buffering reduces the number of such calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We also present the code for binary multiphase merge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func mergeTwoFiles(fileA, fileB, outFile string) error {
    defer cleanUpChunk(fileA, fileB)

    fa, err := os.Open(fileA)
    if err != nil {
        return err
    }
    defer fa.Close()

    fb, err := os.Open(fileB)
    if err != nil {
        return err
    }
    defer fb.Close()

    fOut, err := os.Create(outFile)
    if err != nil {
        return err
    }
    defer fOut.Close()

    const batchSize = 1024
    bufferA := make([]uint32, batchSize)
    bufferB := make([]uint32, batchSize)

    bw := bufio.NewWriter(fOut)
    defer bw.Flush()

    var indexA, sizeA, indexB, sizeB int
    var lastWritten uint32
    var hasLast bool

    readNextBatch := func(file *os.File, buffer []uint32) (int, error) {
        tempBuffer := make([]byte, len(buffer)*4)
        n, err := file.Read(tempBuffer)
        if err != nil &amp;amp;&amp;amp; err != io.EOF {
            return 0, err
        }
        count := n / 4
        for i := 0; i &amp;lt; count; i++ {
            buffer[i] = binary.LittleEndian.Uint32(tempBuffer[i*4 : (i+1)*4])
        }
        return count, nil
    }

    sizeA, err = readNextBatch(fa, bufferA)
    if err != nil {
        return err
    }
    sizeB, err = readNextBatch(fb, bufferB)
    if err != nil {
        return err
    }

    for indexA &amp;lt; sizeA || indexB &amp;lt; sizeB {
        var valA, valB uint32
        hasA := indexA &amp;lt; sizeA
        hasB := indexB &amp;lt; sizeB

        if hasA {
            valA = bufferA[indexA]
        }
        if hasB {
            valB = bufferB[indexB]
        }

        if hasA &amp;amp;&amp;amp; (!hasB || valA &amp;lt; valB) {
            if !hasLast || valA != lastWritten {
                binary.Write(bw, binary.LittleEndian, valA)
                lastWritten = valA
                hasLast = true
            }
            indexA++
            if indexA == sizeA {
                sizeA, err = readNextBatch(fa, bufferA)
                if err != nil &amp;amp;&amp;amp; err != io.EOF {
                    return err
                }
                indexA = 0
            }
        } else if hasB &amp;amp;&amp;amp; (!hasA || valB &amp;lt; valA) {
            if !hasLast || valB != lastWritten {
                binary.Write(bw, binary.LittleEndian, valB)
                lastWritten = valB
                hasLast = true
            }
            indexB++
            if indexB == sizeB { 
                sizeB, err = readNextBatch(fb, bufferB)
                if err != nil &amp;amp;&amp;amp; err != io.EOF {
                    return err
                }
                indexB = 0
            }
        } else if hasA &amp;amp;&amp;amp; hasB &amp;amp;&amp;amp; valA == valB {
            if !hasLast || valA != lastWritten {
                binary.Write(bw, binary.LittleEndian, valA)
                lastWritten = valA
                hasLast = true
                atomic.AddUint64(&amp;amp;DeletedInMerge, 1)
            } else {
                atomic.AddUint64(&amp;amp;DeletedInMerge, 1)
            }
            indexA++
            indexB++
            if indexA == sizeA {
                sizeA, err = readNextBatch(fa, bufferA)
                if err != nil &amp;amp;&amp;amp; err != io.EOF {
                    return err
                }
                indexA = 0
            }
            if indexB == sizeB {
                sizeB, err = readNextBatch(fb, bufferB)
                if err != nil &amp;amp;&amp;amp; err != io.EOF {
                    return err
                }
                indexB = 0
            }
        }
    }

    return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The result is fantastic: 14 minutes for a 110 GB file with 8 billion lines!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqxilk74my6ocrmnyodw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqxilk74my6ocrmnyodw.png" alt="Image description" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's an outstanding result! Processing a 110 GB file with 8 billion lines in 14 minutes is indeed impressive. It demonstrates the power of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Buffered I/O:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By processing large chunks of data in memory instead of line-by-line or value-by-value, you drastically reduce the number of I/O operations, which are often the bottleneck.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimized Binary Processing:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Switching to binary reading and writing minimizes parsing overhead, reduces the size of intermediate data, and improves memory efficiency.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Efficient Deduplication:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using memory-efficient algorithms for deduplication and sorting ensures that CPU cycles are utilized effectively.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parallelism:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leveraging goroutines and channels to parallelize the workload across workers balances CPU and disk utilization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Finally, here is the complete code for the final solution. Feel free to use it and adapt it to your needs!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/OlegSydorovCa/IP-Addr-Counter" rel="noopener noreferrer"&gt;External merge solution for Gophers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Good luck!&lt;/p&gt;

</description>
      <category>go</category>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>programming</category>
    </item>
    <item>
      <title>AWS Application load balancer logging: a true serverless approach with AWS Athena</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Sat, 07 Dec 2024 17:50:17 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/aws-application-load-balancer-logging-a-true-serverless-approach-with-aws-athena-552c</link>
      <guid>https://dev.to/oleg_sydorov/aws-application-load-balancer-logging-a-true-serverless-approach-with-aws-athena-552c</guid>
      <description>&lt;p&gt;Let's say we have an AWS Lambda stack + ALB + %something_else_useful% implemented. Very often, at this stage, we may encounter some unexpected Lambda or ALB errors. Also, we always have the general need to store access logs, etc. In this aspect, bearing in mind the concept of true-serverless, it seems most convenient to use the AWS Athena tool. How do I configure it?&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS S3
&lt;/h2&gt;

&lt;p&gt;We go to AWS S3 and create two new buckets. The first will store the raw ALB logs (compressed text archives), and the second provides physical storage for the AWS Athena database. You could work with the archived ALB logs directly as plain text, but that hardly sounds convenient.&lt;br&gt;
Be sure to choose the AWS region corresponding to the region in which the ALB is located. The bucket name must be unique within the global namespace!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp3jxrk04w1cl44z9vay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp3jxrk04w1cl44z9vay.png" alt="Create a bucket" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other settings can be left &lt;strong&gt;&lt;em&gt;default&lt;/em&gt;&lt;/strong&gt;, but you need to make sure that &lt;em&gt;&lt;strong&gt;encryption type&lt;/strong&gt;&lt;/em&gt; = Server-side encryption with Amazon S3 managed keys(SSE-S3)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81yvl0hjzw1p027lqj3w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81yvl0hjzw1p027lqj3w.png" alt="Care about the correct encryption type" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Set &lt;strong&gt;&lt;em&gt;Tags&lt;/em&gt;&lt;/strong&gt; according to the policy of your organization.&lt;/p&gt;
&lt;h2&gt;
  
  
  Bucket permissions
&lt;/h2&gt;

&lt;p&gt;At this stage, it is necessary to configure the bucket in such a way as to give ALB the right to write logs, because "out of the box" it does not work. To do this, find an account that corresponds to your ALB, decide whether the bucket will use a prefix in the path on which the logs will be collected, and define the ARN of your bucket.&lt;br&gt;
For example, let the ALB account be &lt;strong&gt;&lt;em&gt;123456789012&lt;/em&gt;&lt;/strong&gt;, the prefix for storing logs be 'access'. We are looking for ARN s3 bucket:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcmj30xpwmvblrxg72ce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcmj30xpwmvblrxg72ce.png" alt="Looking for the ARN" width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's construct the full value of the resource: if ARN = arn:aws:s3:::alb-websrvatms1, path = access, and account = 777777777777, the resulting resource will be arn:aws:s3:::alb-websrvatms1/access/AWSLogs/777777777777/*&lt;br&gt;
The formula is: arn:aws:s3:::{my-bucket-name}/{prefix}/AWSLogs/{accountId}/*&lt;br&gt;
As a result, we get the following policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Id": "Policy1708481615785",
  "Statement": [
    {
      "Sid": "Stmt1708481607341",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::alb-websrvatms1/access/AWSLogs/777777777777/*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
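&lt;p&gt;If you generate such policies from code, the formula above is a one-liner (the helper name is illustrative, not from any AWS SDK):&lt;/p&gt;

```go
package main

import "fmt"

// albLogResourceARN builds the Resource value for the bucket policy:
// arn:aws:s3:::{bucket}/{prefix}/AWSLogs/{accountID}/*
func albLogResourceARN(bucket, prefix, accountID string) string {
	return fmt.Sprintf("arn:aws:s3:::%s/%s/AWSLogs/%s/*", bucket, prefix, accountID)
}

func main() {
	fmt.Println(albLogResourceARN("alb-websrvatms1", "access", "777777777777"))
	// arn:aws:s3:::alb-websrvatms1/access/AWSLogs/777777777777/*
}
```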



&lt;p&gt;Let's apply the policy in &lt;strong&gt;&lt;em&gt;AWS S3 → buckets → my-bucket → Permissions&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Also, you can use the auxiliary tool: &lt;a href="https://awspolicygen.s3.amazonaws.com/policygen.html" rel="noopener noreferrer"&gt;AWS Policy Generator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, there is another variant of the &lt;strong&gt;&lt;em&gt;Principal&lt;/em&gt;&lt;/strong&gt; like "Principal": { "Service": "logdelivery.elasticloadbalancing.amazonaws.com" }.&lt;/p&gt;

&lt;p&gt;Do not hesitate to make your own experiments!&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting up the ALB
&lt;/h2&gt;

&lt;p&gt;Now that we've dealt with the storage, we need to configure the balancer. Go to: &lt;strong&gt;&lt;em&gt;EC2 → Load balancers → my-alb → Attributes&lt;/em&gt;&lt;/strong&gt;. Turn on the necessary logs and set an optional prefix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6wy81pwmduhga1nvxx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6wy81pwmduhga1nvxx0.png" alt="Setting up the ALB" width="800" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A closing slash is not allowed.&lt;/p&gt;
&lt;h2&gt;
  
  
  AWS Athena
&lt;/h2&gt;

&lt;p&gt;Let's deal with Athena. First, we need to create a database: go to &lt;strong&gt;&lt;em&gt;Amazon Athena → Query editor&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
and execute a request to create a DB using the newly created S3 bucket:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE DATABASE IF NOT EXISTS testdb&lt;br&gt;
  COMMENT 'test DB'&lt;br&gt;
  LOCATION 's3://aws-athena1/DB/'&lt;br&gt;
  WITH DBPROPERTIES ('creator'='Oleg Sydorov');&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then, let's execute the request to create the &lt;strong&gt;&lt;em&gt;alb_logs&lt;/em&gt;&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
LOCATION 's3://alb-websrvatms1/access/AWSLogs/777777777777/elasticloadbalancing/eu-central-1/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The name of the table &lt;strong&gt;&lt;em&gt;alb_logs&lt;/em&gt;&lt;/strong&gt; as well as the location &lt;strong&gt;&lt;em&gt;LOCATION 's3://alb-websrvatms1/access/AWSLogs/777777777777/elasticloadbalancing/eu-central-1/'&lt;/em&gt;&lt;/strong&gt; must be adapted to current paths and names.&lt;/p&gt;

&lt;p&gt;No additional configuration of rights is required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Now that everything is configured, you can get the necessary data using a simple SELECT (SQL-like) syntax:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT * FROM alb_logs&lt;br&gt;
 WHERE time &amp;gt; '2024-02-22'&lt;br&gt;
 ORDER BY request_creation_time DESC&lt;br&gt;
 LIMIT 10;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Congratulations, it works! Be creative and feel free to perform your own investigations.&lt;/p&gt;

&lt;p&gt;Good luck!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AWS CDK and security group configuration (Golang)</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Wed, 24 Jul 2024 02:28:39 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/aws-cdk-and-security-group-configuration-golang-21nl</link>
      <guid>https://dev.to/oleg_sydorov/aws-cdk-and-security-group-configuration-golang-21nl</guid>
      <description>&lt;p&gt;It's obvious that all cloud software is primarily network software. Therefore, when working with software deployment in the AWS environment, including AWS Lambda, we deal with security groups. If we do not explicitly define and configure a security group in the CDK code, one will be implicitly created, for example, when creating an ALB or AWS API Gateway. Such a security group, created implicitly by the CDK tool, will be configured as "any to any," which is often unacceptable in a production environment. Staying within the concept of CICD, the solution is to explicitly configure the security group in the code.&lt;/p&gt;

&lt;p&gt;Let's consider this action with specific examples.&lt;/p&gt;

&lt;p&gt;Assume we have a VPC that is either created or imported. We will explicitly create a security group:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sg := awsec2.NewSecurityGroup(stack, jsii.String("MyAwsLambdaSG"), &amp;amp;awsec2.SecurityGroupProps{
        Vpc:               vpc,
        AllowAllOutbound:  jsii.Bool(false),
        SecurityGroupName: jsii.String("MyAwsLambdaSG"),
    })

// Allow incoming HTTPS to ALB
sg.AddIngressRule(awsec2.Peer_AnyIpv4(), awsec2.Port_Tcp(jsii.Number(443)), jsii.String("Allow HTTPS traffic"), jsii.Bool(false))

// Allow outbound needed traffic
sg.AddEgressRule(awsec2.Peer_Ipv4(jsii.String("x.x.x.x/x")), awsec2.Port_AllTcp(), jsii.String("Allow outbound TCP traffic"), jsii.Bool(false))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, add the created security group to the Lambda configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lambdaFunction := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("MyAWSLambda"), &amp;amp;awscdklambdagoalpha.GoFunctionProps{
        Runtime:                      awslambda.Runtime_PROVIDED_AL2(),
        Entry:                        jsii.String("./src"),
        Architecture:                 awslambda.Architecture_ARM_64(),
        ReservedConcurrentExecutions: jsii.Number(10),
        Bundling: &amp;amp;awscdklambdagoalpha.BundlingOptions{
            GoBuildFlags: jsii.Strings(`-ldflags "-s -w" -mod=vendor`),
        },
        Environment:          nil,
        FunctionName:         jsii.String("MyAWSLambda"),
        Description:          jsii.String("MyAWSLambda backend Lambda function"),
        Role:                 applicationRole,
        Vpc:                  vpc,
        SecurityGroups:       &amp;amp;[]awsec2.ISecurityGroup{sg}, // NB!
        MemorySize:           jsii.Number(128),
        EphemeralStorageSize: awscdk.Size_Mebibytes(jsii.Number(512)),
    })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The permissions set in a security group work within the context of the given Lambda and do not affect the configuration of the ALB, endpoints, or API Gateway. For instance, ANY to TCP 443 does not mean any HTTPS access from outside if the access point is configured as PRIVATE, etc. This is a separate area of responsibility and configuration.&lt;/p&gt;

&lt;p&gt;Some difficulties may arise when configuring access to cloud resources such as AWS DynamoDB. For such resources, it is convenient to use the Prefix List ID. The screenshots below show how to find it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwnvrk3sm4elukpg4dwa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwnvrk3sm4elukpg4dwa.png" alt="Find Managed prefix lists category" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3puxztp4e4i8co1ngdh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3puxztp4e4i8co1ngdh.png" alt="Search by resource type" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8gzwk43hw3x2ik5n45f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8gzwk43hw3x2ik5n45f.png" alt="Preview the details" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The console can be used as well:&lt;br&gt;
&lt;code&gt;aws ec2 describe-managed-prefix-lists --query "PrefixLists[?PrefixListName=='com.amazonaws.&amp;lt;region&amp;gt;.dynamodb'].PrefixListId"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, let's move on to the code.&lt;/p&gt;

&lt;p&gt;Let's configure the endpoint to access dynamoDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vpc.AddGatewayEndpoint(jsii.String("DynamoDbEndpoint"), &amp;amp;awsec2.GatewayVpcEndpointOptions{
        Service: awsec2.GatewayVpcEndpointAwsService_DYNAMODB(),
        Subnets: &amp;amp;[]*awsec2.SubnetSelection{
            {SubnetType: awsec2.SubnetType_PRIVATE_WITH_EGRESS},
        },
    })

sg := awsec2.NewSecurityGroup(stack, jsii.String("MyAWSLambdaSG"), &amp;amp;awsec2.SecurityGroupProps{
        Vpc:               vpc,
        AllowAllOutbound:  jsii.Bool(false),
        SecurityGroupName: jsii.String("MyAWSLambdaSG"),
    })

// Add a rule for dynamoDB

dynamoDbPrefixListID := "pl-12345xyz"
sg.AddEgressRule(awsec2.Peer_PrefixList(jsii.String(dynamoDbPrefixListID)), awsec2.Port_Tcp(jsii.Number(443)), jsii.String("Allow outbound dynamoDB"), jsii.Bool(false))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that a 0.0.0.0/0 443 rule would also work, but it is hardly the best choice.&lt;/p&gt;

&lt;p&gt;When finished, here is the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fan0v9sliysfaeawvpw7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fan0v9sliysfaeawvpw7t.png" alt="Inbound rule" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6bjrrt06h0glrf4kpk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6bjrrt06h0glrf4kpk9.png" alt="Outbound rule" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works... Good luck!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>go</category>
      <category>cdk</category>
      <category>devops</category>
    </item>
    <item>
      <title>CICD deployment in conditions of limited access (Go language)</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Thu, 15 Feb 2024 21:40:28 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/cicd-deployment-in-conditions-of-limited-access-go-language-1jfa</link>
      <guid>https://dev.to/oleg_sydorov/cicd-deployment-in-conditions-of-limited-access-go-language-1jfa</guid>
      <description>&lt;p&gt;In an environment with increased requirements for information security, a situation is possible when the CICD runner does not have access to external resources: the Docker image repository (for example, public.ecr.aws), external dependency packages, etc. In such conditions, I suggest the following actions (examples of the Go language):&lt;/p&gt;

&lt;h2&gt;
  
  
  Image problem
&lt;/h2&gt;

&lt;p&gt;We create a Docker image that meets our requirements (let it be minimal, golang 1.21 + AWS CDK based on Alpine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM golang:1.21-alpine

RUN apk update &amp;amp;&amp;amp; apk add --update --no-cache \
git \
bash \
nodejs \
npm
RUN npm update -g
# Install AWS CDK
RUN npm install -g aws-cdk
RUN cdk --version
ENTRYPOINT []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build your image:&lt;br&gt;
&lt;code&gt;docker build --network host -t go-cdk-image .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let's look at the result, and get the ID of the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker images
REPOSITORY     TAG      IMAGE ID       CREATED       SIZE
go-cdk-image   latest   f93a28e7b39a   2 hours ago   324MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding a tag:&lt;br&gt;
&lt;code&gt;docker tag f93a28e7b39a 1234567890.dkr.ecr.eu-central-1.amazonaws.com/golang-cdk-image:latest&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Add credentials, if not created yet (file: &lt;strong&gt;&lt;em&gt;~/.aws/config&lt;/em&gt;&lt;/strong&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[profile mydevprofile]
sso_start_url = https://my-auth.awsapps.com/start
sso_region = eu-west-1
sso_account_id = 1234567890
sso_role_name = infosec-full
region = eu-central-1
output = json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Login using SSO:&lt;br&gt;
&lt;code&gt;aws sso login --profile mydevprofile&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Logging in to the AWS elastic container registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ecr get-login-password --region eu-central-1 --profile mydevprofile | docker login --username AWS --passwordstdin
1234567890.dkr.ecr.eu-central-1.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pushing the image:&lt;br&gt;
&lt;code&gt;docker push 1234567890.dkr.ecr.eu-central-1.amazonaws.com/golang-cdk-image:latest&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;At this point, the Docker image is available within the corporate AWS even when internet access is blocked. To use it in the pipeline, add to &lt;strong&gt;&lt;em&gt;.gitlab-ci.yml&lt;/em&gt;:&lt;br&gt;
&lt;code&gt;image: 1234567890.dkr.ecr.eu-central-1.amazonaws.com/golang-cdk-image:latest&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For cross-account use, you need to configure additional access rights to the downloaded image.&lt;/p&gt;
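&lt;p&gt;As an illustration, a repository policy along these lines can grant pull access to another account (the account ID 111122223333 is a placeholder; adjust the action list to your security requirements):&lt;br&gt;
&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CrossAccountPull",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}
```

&lt;p&gt;Apply it in the ECR console under the repository's Permissions tab, or via &lt;code&gt;aws ecr set-repository-policy&lt;/code&gt;.&lt;/p&gt;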
&lt;h2&gt;
  
  
  Working with dependencies
&lt;/h2&gt;

&lt;p&gt;To work with dependencies, we will use the Golang module vendoring mechanism.&lt;br&gt;
In the CI/CD config .gitlab-ci.yml, we add (in the script section):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;script:
 - export GO111MODULE=on

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also, we can add here the &lt;strong&gt;GOFLAGS=-mod=vendor&lt;/strong&gt; option, but we can do this in the code as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lambdaFunction := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("BackendLambda"),
        &amp;amp;awscdklambdagoalpha.GoFunctionProps{
            // Fill all the necessary attributes here...
            Bundling: &amp;amp;awscdklambdagoalpha.BundlingOptions{
                GoBuildFlags: jsii.Strings(`-ldflags "-s -w" -mod=vendor`), //see here
            },
        })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the root folder of the project we should execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;go mod init my-progect-name
go mod tidy
go mod vendor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The go.mod and go.sum files will be created, as well as the vendor directory with all dependencies. These artifacts should also be included in the git commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Nota bene!
&lt;/h2&gt;

&lt;p&gt;It seems that everything is ready, but there is one important point related to the options of the AWS CDK Golang itself. If we look at the code of the &lt;strong&gt;cdk.json&lt;/strong&gt; file, which the CDK created automatically, we see the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "app": "go mod download &amp;amp;&amp;amp; go run backend-lambda2.go", // see here!
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "go.mod",
      "go.sum",
      "**/*test.go"
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command &lt;strong&gt;go mod download &amp;amp;&amp;amp; go run backend-lambda2.go&lt;/strong&gt; breaks our offline scheme, because &lt;strong&gt;go mod download&lt;/strong&gt; requires internet access.&lt;br&gt;
So, our last action is to change this entry to &lt;strong&gt;go run backend-lambda2.go&lt;/strong&gt;, removing &lt;strong&gt;go mod download&lt;/strong&gt;.&lt;br&gt;
Now, on every git push, our pipeline will work even in conditions of limited access.&lt;/p&gt;
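&lt;p&gt;For clarity, the corrected &lt;strong&gt;cdk.json&lt;/strong&gt; app entry would look like this (the watch section stays unchanged):&lt;br&gt;
&lt;/p&gt;

```json
{
  "app": "go run backend-lambda2.go"
}
```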

&lt;p&gt;Congrats, it works!&lt;/p&gt;

</description>
      <category>go</category>
      <category>cicd</category>
      <category>aws</category>
      <category>backenddevelopment</category>
    </item>
    <item>
      <title>AWS Lambda, CDK, and CICD for Gophers</title>
      <dc:creator>Oleg Sydorov</dc:creator>
      <pubDate>Mon, 22 Jan 2024 15:26:13 +0000</pubDate>
      <link>https://dev.to/oleg_sydorov/aws-lambda-cdk-and-cicd-for-gofers-128d</link>
      <guid>https://dev.to/oleg_sydorov/aws-lambda-cdk-and-cicd-for-gofers-128d</guid>
      <description>&lt;p&gt;I’d like to talk about AWS Lambdas and the best ways for Gophers to deal with the topic.&lt;br&gt;
Firstly, let's define the stack: an AWS Lambda with an Application Load Balancer (ALB) as a trigger and some DynamoDB resources as a database. Why ALB and not API Gateway? The answer is simple: the AWS API Gateway is not always appropriate. For example, if you need to limit access to your application to the enterprise’s internal network, it is better to set up an ALB. Besides, setting up a gateway is simpler and would take the intrigue out of the narrative. To begin, let’s create a simple «Hello, world» template for our AWS Lambda.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "net/http"
)

func router(req events.ALBTargetGroupRequest) (events.ALBTargetGroupResponse, error) {
    switch req.HTTPMethod {
    case "GET":
        return show(req)
    case "POST":
        return show(req)
    default:
        return clientError(http.StatusMethodNotAllowed)
    }
}

func show(req events.ALBTargetGroupRequest) (events.ALBTargetGroupResponse, error) {
    m := make(map[string]string)
    m["content-type"] = "text/plain"
    m["x-lambda-response"] = "true"

    return events.ALBTargetGroupResponse{
        StatusCode:      http.StatusOK,
        Body:            "Hello, world!",
        IsBase64Encoded: false,
        Headers:         m,
    }, nil
}

func clientError(status int) (events.ALBTargetGroupResponse, error) {
    return events.ALBTargetGroupResponse{
        StatusCode: status,
        Body:       http.StatusText(status),
    }, nil
}

func main() {
    lambda.Start(router)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
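&lt;p&gt;Before wiring this into AWS, the routing logic can be sanity-checked locally. The sketch below uses simplified stand-in types instead of the real events.ALBTargetGroupRequest and events.ALBTargetGroupResponse, so it runs without the aws-lambda-go dependency; it only illustrates the idea:&lt;br&gt;
&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/http"
)

// albRequest and albResponse are simplified stand-ins for
// events.ALBTargetGroupRequest and events.ALBTargetGroupResponse.
type albRequest struct {
	HTTPMethod string
}

type albResponse struct {
	StatusCode int
	Body       string
}

// route mirrors the router above: GET and POST are served,
// every other method is rejected with 405.
func route(req albRequest) albResponse {
	switch req.HTTPMethod {
	case "GET", "POST":
		return albResponse{StatusCode: http.StatusOK, Body: "Hello, world!"}
	default:
		return albResponse{
			StatusCode: http.StatusMethodNotAllowed,
			Body:       http.StatusText(http.StatusMethodNotAllowed),
		}
	}
}

func main() {
	fmt.Println(route(albRequest{HTTPMethod: "GET"}).StatusCode)    // 200
	fmt.Println(route(albRequest{HTTPMethod: "DELETE"}).StatusCode) // 405
}
```

&lt;p&gt;Running it prints 200 for the allowed method and 405 for the disallowed one, matching what the deployed handler will return via the ALB.&lt;/p&gt;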



&lt;p&gt;Well done! Now we can build our executable and upload it to our AWS account using the web interface.&lt;br&gt;
I suggest creating the Arm64 build for cost-cutting reasons.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GOARCH=arm64 GOOS=linux go build -tags lambda.norpc -o bootstrap -ldflags "-s -w" main.go &amp;amp;&amp;amp; zip -5 bootstrap.zip bootstrap&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsxu13ju3eq7v1vsu0kd.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsxu13ju3eq7v1vsu0kd.jpeg" alt="Image description" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that we use the modern Amazon Linux 2023 runtime instead of the deprecated Go 1.x runtime. Also, be aware that the name of your executable must be «bootstrap».&lt;/p&gt;

&lt;p&gt;All this sounds good, but our purpose is to automate the deployment! Thus, let’s take a look at the AWS Cloud Development Kit (CDK). Luckily, it supports Golang. &lt;br&gt;
Firstly, install awscli and aws-cdk with their auxiliary dependencies. Do this in a way that is convenient for you, according to the operating system. Let's imagine that we are using Ubuntu.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install awscli
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh
source ~/.bashrc
nvm list-remote
nvm install lts/hydrogen //or your preferred version
node - -version
npm install -g aws-cdk
cdk version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, go to the folder with your project and run &lt;code&gt;cdk init app --language go&lt;/code&gt;. This creates all the files needed for your deployment, including two *.go files: the first is for your CDK code, and the other (named *test.go) is for tests, so the tool assumes you might want to add a test stage to your pipeline. The particular file names depend on your directory’s name. Let’s assume your Go code lives in a dedicated src/ subfolder.&lt;br&gt;
Now we can start with the Golang CDK. Let the code, with some appropriate comments, speak for itself!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "github.com/aws/aws-cdk-go/awscdk/v2"
    "github.com/aws/aws-cdk-go/awscdk/v2/awsdynamodb"
    "github.com/aws/aws-cdk-go/awscdk/v2/awsec2"
    "github.com/aws/aws-cdk-go/awscdk/v2/awselasticloadbalancingv2"
    "github.com/aws/aws-cdk-go/awscdk/v2/awselasticloadbalancingv2targets"
    "github.com/aws/aws-cdk-go/awscdk/v2/awsiam"
    "github.com/aws/aws-cdk-go/awscdk/v2/awslambda"
    "github.com/aws/aws-cdk-go/awscdk/v2/awslogs"
    "github.com/aws/aws-cdk-go/awscdklambdagoalpha/v2"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/awserr"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
    "github.com/aws/constructs-go/constructs/v10"
    "github.com/aws/jsii-runtime-go"
    "log"
    "os"
)

type LambdaDynamodbAlbStackProps struct {
    awscdk.StackProps
}

func NewLambdaDynamodbAlbStack(scope constructs.Construct, id string, props *LambdaDynamodbAlbStackProps) awscdk.Stack {

    tags := make(map[string]string)
    tags["stackType"] = "Backend Lambda stack"
    tags["CMDBENV"] = "PROD"
    // all your custom tags here

    awstTag := aws.StringMap(tags)

    awsRegion := os.Getenv("AWS_REGION")
    awsAccount := os.Getenv("AWS_ACCOUNT")
    // do not hardcode your settings. Get it from your CICD pipeline
    // otherwise, if you do not use CICD, define your region and account explicitly

    sprops := &amp;amp;awscdk.StackProps{Description: jsii.String("Backend Lambda stack"),
        Tags: &amp;amp;awstTag,
        Env: &amp;amp;awscdk.Environment{
            Region:  jsii.String(awsRegion),
            Account: jsii.String(awsAccount),
        },
        CrossRegionReferences: jsii.Bool(true),
    }

    stack := awscdk.NewStack(scope, &amp;amp;id, sprops)
    // declare a stack

    // create AmazonDynamoDBFullAccess role
    applicationRole := awsiam.NewRole(stack, aws.String("BLDynamoDBFullAccessRole"), &amp;amp;awsiam.RoleProps{
        AssumedBy: awsiam.NewServicePrincipal(aws.String("lambda.amazonaws.com"), &amp;amp;awsiam.ServicePrincipalOpts{}),
        RoleName:  jsii.String("BLDynamoDBFullAccessRole"),
        ManagedPolicies: &amp;amp;[]awsiam.IManagedPolicy{
            awsiam.ManagedPolicy_FromManagedPolicyArn(stack, aws.String("AmazonDynamoDBFullAccess"), aws.String("arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess")),
        },
    })

    // add policies
    applicationRole.AddToPolicy(awsiam.NewPolicyStatement(&amp;amp;awsiam.PolicyStatementProps{Actions: jsii.Strings("ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface",                                       // needed for VPC creation
        "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", // needed for logs
    ),
        Resources: jsii.Strings("*"),
    }))

    vpcID := "vpc-000000000000001"
    subnets := []*string{jsii.String("subnet-0000000000001"), jsii.String("subnet-0000000000002"), jsii.String("subnet-0000000000003")}
    sub0 := awsec2.Subnet_FromSubnetId(stack, jsii.String("subnet0"), subnets[0])
    sub1 := awsec2.Subnet_FromSubnetId(stack, jsii.String("subnet1"), subnets[1])
    sub2 := awsec2.Subnet_FromSubnetId(stack, jsii.String("subnet2"), subnets[2])
    zones := []*string{jsii.String("ca-central-1a"), jsii.String("ca-central-1b"), jsii.String("ca-central-1c")}
    // you need to work with a VPC. Let’s think that your organization has several already configured VPCs. We will attach one to the project.
    // also, you need to describe zones. This information you can see via web interface in the EC2 configuration section.

    vpc := awsec2.Vpc_FromVpcAttributes(stack, jsii.String(vpcID), &amp;amp;awsec2.VpcAttributes{
        AvailabilityZones: &amp;amp;zones,
        VpcId:             jsii.String(vpcID),
        PrivateSubnetIds:  &amp;amp;subnets,
    })
    // Nota bene! You can use awsec2.Vpc_FromLookup() instead, but this will fail if your organization’s VPC has more than 16 attached subnets, even if you use less in your CDK script! You can see the certain amount via web interface.

    // create lambda function with previously created AmazonDynamoDBFullAccess role
    lambdaFunction := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("BackendLambda"), &amp;amp;awscdklambdagoalpha.GoFunctionProps{
        Runtime:      awslambda.Runtime_PROVIDED_AL2(),
        Entry:        jsii.String("./src"),
        Architecture: awslambda.Architecture_ARM_64(),
        Bundling: &amp;amp;awscdklambdagoalpha.BundlingOptions{
            GoBuildFlags: jsii.Strings(`-ldflags "-s -w"`),
        },
        FunctionName:         jsii.String("BackendLambda"),
        Description:          jsii.String("Backend Lambda function"),
        Role:                 applicationRole,
        Vpc:                  vpc,
        MemorySize:           jsii.Number(128),
        EphemeralStorageSize: awscdk.Size_Mebibytes(jsii.Number(512)),
    })

    var table awsdynamodb.Table

    dynamodbTableName := "BackendDynamoDB"
    tableEx, err := tableExists(dynamodbTableName)
    if err != nil {
        log.Fatalln("CDK script error:", err)
    }

    // once created, your dynamodb table cannot be re-created via CDK. So, you may get the «table already exists» error. You have to drop table manually when needed, or use this check:

    if !tableEx {
        // create DynamoDB table
        table = awsdynamodb.NewTable(stack, jsii.String(dynamodbTableName), &amp;amp;awsdynamodb.TableProps{
            BillingMode: awsdynamodb.BillingMode_PROVISIONED,
            TableName:   jsii.String(dynamodbTableName),

            PartitionKey: &amp;amp;awsdynamodb.Attribute{
                Name: aws.String("LockID"),
                Type: awsdynamodb.AttributeType_STRING,
            },
            SortKey:             &amp;amp;awsdynamodb.Attribute{Name: aws.String("MyAttribute"), Type: awsdynamodb.AttributeType_STRING},
            TimeToLiveAttribute: jsii.String("TTL"),
        })

        table.GrantReadWriteData(lambdaFunction)
    } else {
        log.Println("Dynamodb table exists")
    }

    awslogs.NewLogGroup(stack, jsii.String("BackendLambdaLogGroup"), &amp;amp;awslogs.LogGroupProps{
        Retention:    awslogs.RetentionDays_TWO_WEEKS, // Adjust retention as needed
        LogGroupName: jsii.String("/aws/lambda/" + *lambdaFunction.FunctionName()),
    })

    // work with the target group
    targetGroup := awselasticloadbalancingv2.NewApplicationTargetGroup(stack, jsii.String("BackendTargetGroup"), &amp;amp;awselasticloadbalancingv2.ApplicationTargetGroupProps{
        TargetGroupName: jsii.String("BLTargetGroup"),
        TargetType:      awselasticloadbalancingv2.TargetType_LAMBDA,
        Vpc:             vpc,
        Targets:         &amp;amp;[]awselasticloadbalancingv2.IApplicationLoadBalancerTarget{awselasticloadbalancingv2targets.NewLambdaTarget(lambdaFunction)},
    })

    // Create the Application Load Balancer
    alb := awselasticloadbalancingv2.NewApplicationLoadBalancer(stack, jsii.String("BackendALB"), &amp;amp;awselasticloadbalancingv2.ApplicationLoadBalancerProps{
        Vpc:            vpc,
        InternetFacing: jsii.Bool(false),
        VpcSubnets: &amp;amp;awsec2.SubnetSelection{
            Subnets:    &amp;amp;[]awsec2.ISubnet{sub0, sub1, sub2},
            SubnetType: "",
        },
    })

    // attach some existent SSL certificate
    certArn := "arn:aws:acm:ca-central-1:0000000000:certificate/00000000-0000-0000-0000-00000000"
    // listen HTTPS 443
    alb.AddListener(jsii.String("BackendHttpsListener"), &amp;amp;awselasticloadbalancingv2.BaseApplicationListenerProps{
        Certificates:        &amp;amp;[]awselasticloadbalancingv2.IListenerCertificate{awselasticloadbalancingv2.NewListenerCertificate(jsii.String(certArn))},
        DefaultTargetGroups: &amp;amp;[]awselasticloadbalancingv2.IApplicationTargetGroup{targetGroup},
        Port:                jsii.Number(443),
        Protocol:            awselasticloadbalancingv2.ApplicationProtocol_HTTPS,
    })

    targetGroupArn := targetGroup.TargetGroupArn()

    awscdk.NewCfnOutput(stack, jsii.String("Target group ARN"), &amp;amp;awscdk.CfnOutputProps{
        Value:       targetGroupArn,
        Description: jsii.String("Target group ARN"),
    })

    servicePrincipalOpts := awsiam.ServicePrincipalOpts{
        Conditions: &amp;amp;map[string]interface{}{
            "ArnLike": map[string]*string{
                "aws:SourceArn": targetGroupArn,
            },
        },
    }
    principal := awsiam.NewServicePrincipal(jsii.String("elasticloadbalancing.amazonaws.com"), &amp;amp;servicePrincipalOpts)
    lambdaFunction.AddPermission(jsii.String("LambdaInvoke"), &amp;amp;awslambda.Permission{
        Principal: principal,
    })

    // log lambda function ARN
    awscdk.NewCfnOutput(stack, jsii.String("lambdaFunctionArn"), &amp;amp;awscdk.CfnOutputProps{
        Value:       lambdaFunction.FunctionArn(),
        Description: jsii.String("Lambda function ARN"),
    })

    // log ALB  DNS name
    awscdk.NewCfnOutput(stack, jsii.String("ALB DNS Name"), &amp;amp;awscdk.CfnOutputProps{
        Value:       alb.LoadBalancerDnsName(),
        Description: jsii.String("ALB DNS Name"),
    })

    return stack
}

func main() {
    app := awscdk.NewApp(nil)

    NewLambdaDynamodbAlbStack(app, "LambdaDynamodbAlbStack", &amp;amp;LambdaDynamodbAlbStackProps{
        awscdk.StackProps{
            Env: env(),
        },
    })

    app.Synth(nil)
}

func env() *awscdk.Environment {
    return nil
}

func tableExists(tableName string) (bool, error) {
    sess := session.Must(session.NewSessionWithOptions(session.Options{
        SharedConfigState: session.SharedConfigEnable,
    }))

    svc := dynamodb.New(sess)
    input := &amp;amp;dynamodb.DescribeTableInput{
        TableName: aws.String(tableName),
    }

    _, err := svc.DescribeTable(input)
    if err != nil {
        if awsErr, ok := err.(awserr.Error); ok {
            if awsErr.Code() == dynamodb.ErrCodeResourceNotFoundException {
                // Table doesn't exist
                return false, nil
            }
        }
        // Other error occurred
        return false, err
    }

    // Table exists
    return true, nil
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congrats! You’ve done it! Now your project is ready to be published using the &lt;code&gt;cdk deploy&lt;/code&gt; command. But one more thing: your AWS account, of course. The credentials and config files are updated when you run &lt;code&gt;aws configure&lt;/code&gt;. The credentials file is located at ~/.aws/credentials on Linux or macOS, or at C:\Users\USERNAME\.aws\credentials on Windows. Surely, you can edit the config file manually. Also, you might want (or your organization may force you) to use the AWS SSO login approach. For this, run &lt;code&gt;aws configure sso&lt;/code&gt; or edit your config file. Take a look at the example config (SSO):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[profile myuser-it-account-dev]
 sso_start_url = https://yourcompany-auth.awsapps.com/start
 sso_region = ca-central-1
 sso_account_id = 777777777777
 sso_role_name = it-developer
 region = ca-central-1
 output = json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit this for your purposes and add it to your ~/.aws/config.&lt;br&gt;
When ready, run &lt;code&gt;cdk deploy&lt;/code&gt;. If you use SSO, authorize yourself first: &lt;code&gt;aws sso login --profile myuser-it-account-dev&lt;/code&gt;. This redirects you to a page for authorization. Then, in the same terminal session, run &lt;code&gt;cdk deploy --profile myuser-it-account-dev&lt;/code&gt;. Read the official manuals about AWS SSO and its pros and cons for organizations.&lt;br&gt;
Well done! But it is time to remember our CI/CD plans! We will use GitLab in our example. The main precondition is that a runner is installed for the project. GitLab Runner is an application that works with GitLab CI/CD to run jobs in a pipeline. Configuring the runner is beyond the scope of this article: the topic is quite broad on its own, and it belongs more to DevOps than to pure development. One important thing on the developer’s side is to configure the CI/CD token for the runner. To do this, go to Settings -&amp;gt; CI/CD -&amp;gt; Runners. Here, you can work with your registration information.&lt;br&gt;
Now, it’s time to create the file named &lt;strong&gt;.gitlab-ci.yml&lt;/strong&gt;. This is the conventional name for the YAML file that describes the pipeline.&lt;br&gt;
Let the code speak for itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stages:
  - preparation
  - deploy

retrieve_temp_credentials:
  stage: preparation
  script:
    - &amp;gt;
      printf "AWS_ACCESS_KEY_ID=%s\nAWS_SECRET_ACCESS_KEY=%s\nAWS_SESSION_TOKEN=%s"
      $(aws sts assume-role
      --role-arn ${AWS_ROLE_PROD}
      --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text) &amp;gt;&amp;gt; assume_role.env

  tags:
    - PROD
    - it-dep
    - backend-eks-runner

  artifacts:
    reports:
      dotenv: assume_role.env
  image: public.ecr.aws/aws-cli/aws-cli:latest

deploy_using_cdk:
  stage: deploy
  script:
    - echo $AWS_ACCESS_KEY_ID
    - go mod download
    - ls -la
    - npm update -g
    - cdk deploy --require-approval ${CDK_APPROVAL_LEVEL}
  tags:
    - PROD
    - it-dep
    - backend-eks-runner
  needs:
    - job: retrieve_temp_credentials
      artifacts: true

  image: public.ecr.aws/evergen-co/cdk-go-pipeline:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me briefly comment on this script. It uses the assume-role approach. Why? The assume-role method lets us control privileges using IAM, without worrying about storing or rotating user credentials in our CI/CD settings. To use this approach, go to the IAM section of the AWS console and create a dedicated role for your pipeline. Copy its ARN and save it as a variable (let it be AWS_ROLE_PROD for the production environment) in the GitLab CI/CD settings: Settings -&amp;gt; CI/CD -&amp;gt; Variables. The pipeline will use this ARN for the assume-role process. Our YAML script consists of two sequential jobs: preparation (retrieve_temp_credentials) and deployment (deploy_using_cdk). In retrieve_temp_credentials, we call the assume-role procedure and retrieve the temporary credentials. The public.ecr.aws/aws-cli/aws-cli:latest Docker image is used to ensure the latest AWS CLI software is on board. The information is received as plain text and then parsed into three variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN. The variables are then saved into a physical file, assume_role.env. Do not try to avoid this! Variables are not shared between stages unless they are stored physically. You can also work with JSON format if that seems more convenient to you. Here is some example code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export TEMP_CREDENTIALS=$(aws sts assume-role
      --role-arn ${AWS_ROLE}
      --role-session-name "GitLabCI"
      --duration-seconds 900
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output json)
    - export AWS_ACCESS_KEY_ID=$(echo $TEMP_CREDENTIALS | jq -r .[0])
    - export AWS_SECRET_ACCESS_KEY=$(echo $TEMP_CREDENTIALS | jq -r .[1])
    - export AWS_SESSION_TOKEN=$(echo $TEMP_CREDENTIALS | jq -r .[2])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Surely, in this case, you have to take care of installing the jq package in your Docker image.&lt;br&gt;
Now, let’s move on. The deploy_using_cdk stage uses the public.ecr.aws/evergen-co/cdk-go-pipeline:latest Docker image, which contains both the Golang and CDK packages needed for our purpose. The go mod download command fetches the necessary dependencies. Alternatively, you can avoid storing your go mod files in the repo and run go mod init and go mod tidy within the pipeline instead. npm update -g ensures that the CDK package is up to date. The command cdk deploy --require-approval ${CDK_APPROVAL_LEVEL} needs some comments. The CDK process requires manual approval to continue when it detects a significant change. When deploying manually in the terminal, you can press Y or N yourself; with automation, this behavior is unwanted. So, I recommend creating the variable CDK_APPROVAL_LEVEL in the CI/CD settings and defining the needed behavior. For «no questions» behavior, set the value «never». You can change this later for debugging purposes.&lt;br&gt;
My congratulations! Well done!&lt;br&gt;
Finally, we have to talk about the AWS IAM role. We assume that all we need is Lambda, DynamoDB, and ALB. Also, we use an existing VPC and SSL certificate from your organization (if not, ask DevOps to configure them, or just do it yourself).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxyvyx8mv7lg64rqnn4kn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxyvyx8mv7lg64rqnn4kn.png" alt="Image description" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the needed permissions. All the policies except AWS_CDK_CloudFormation_Lambda are AWS built-in presets, so simply add them to your role. The AWS_CDK_CloudFormation_Lambda policy needs some explanation. Go to the Policies section of the IAM settings panel and create a new policy. You can use the name AWS_CDK_CloudFormation_Lambda or any other you prefer. Add the following permissions to the policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCDK0",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::777777777777:role/*"
    },
    {
      "Sid": "AWSCDK1",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpnGateways"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AWSCDK2",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::cdk-xxxxxxxx-assets-777777777777-ca-central-1",
        "arn:aws:s3:::cdk-xxxxxxxx-assets-777777777777-ca-central-1/*"
      ]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What is arn:aws:iam::777777777777:role/* ? This is NOT the role used for assume-role, but the current role you are logged in with, that is, the AWS role in your CDK environment. See here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Env: &amp;amp;awscdk.Environment{
            Region:  jsii.String(awsRegion),
            Account: jsii.String(awsAccount),
        },
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I recommend passing the account and region from the CI/CD settings (variables section). In my case, I created the AWS_ACCOUNT and AWS_REGION variables accordingly. This allows me to make changes without editing the code itself.&lt;br&gt;
What is arn:aws:s3:::cdk-xxxxxxxx-assets-777777777777-ca-central-1? When DevOps engineers configure a runner, they have to create an appropriate AWS CDK toolkit role and an associated S3 bucket for asset-building purposes. That bucket is what I am referring to above. In your case, you have to investigate or ask your DevOps team for the appropriate ARN. Note that all these aspects are unnecessary if you deploy manually from the terminal with the cdk deploy command. Now, when all the errors are fixed and everything is done as needed, your AWS Lambda function will be deployed on every commit push.&lt;br&gt;
OK, we have almost finished. The last thing I would like to mention is that the CI/CD script can be improved by adding handling for different cases depending on test or production branches, etc. Also, I recommend going to the Route 53 section of the AWS console and attaching a domain name according to your certificate. Route 53 allows you to attach the domain directly to your Lambda by its name. Feel free to experiment and improve.&lt;/p&gt;

&lt;p&gt;Good luck!&lt;/p&gt;

</description>
      <category>go</category>
      <category>aws</category>
      <category>lambda</category>
      <category>cdk</category>
    </item>
  </channel>
</rss>
