
Updating your programs for S3 Express One Zone

At last year's annual re:Invent conference, AWS launched a new S3 storage class called S3 Express One Zone. With this storage class you can, for the first time, specify which availability zone your objects are stored in, in order to maximize performance. But there's more to it than that, a lot more.

I have a couple of programs dedicated to performing specific S3 tasks, so let's see what it takes to upgrade them to support this new storage class, and document some of the challenges involved. But first, let's spend some time understanding what's new.

While reading the launch blog post, you are slowly introduced to another concept: the "directory bucket". This is a new bucket type that is required in order to use S3 Express One Zone. You can't just upload files into a regular S3 bucket and specify the new storage class, that won't work. The directory bucket brings with it new restrictions and limitations in order to maximize performance (read How is S3 Express One Zone different?). The more you read about it, the more you realize it is a way bigger change than you expected at first. It may be the biggest change to Amazon S3 since it was launched in 2006.

Normally, revisions like these are hard to make to an existing service, but directory buckets are basically a new version of S3, running on new endpoints that serve only buckets created specifically for S3 Express One Zone, so it is possible for AWS to make these changes without breaking existing users. You cannot point an older, existing program at a directory bucket and expect it to work, because it will not. The authorization scheme is completely different, again in order to maximize performance.

You need to upgrade the AWS SDK version that your program is using in order to support S3 Express One Zone. In many cases, simply upgrading the SDK may be enough, depending on your program and how it uses S3. It's magic.

To explain the magic part, we have to look at the bucket name. Directory buckets have a special naming scheme, ending with --azid--x-s3. The SDK uses this information to automatically direct the request to the correct endpoint and perform the necessary authorization. There are no new parameters, all the necessary information is packed into the bucket name. It feels a bit unconventional coming from AWS to do it this way, but I think they correctly assumed that it would be the simplest way to roll this out. It makes me wonder how long it took for them to settle on this and find a naming scheme that didn't interfere with any existing bucket names.
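
To make the naming scheme concrete, here is a rough sketch of how a program could recognize a directory bucket purely from its name. The pattern is a simplification of the real naming rules and is not the SDK's internal logic:

package main

import (
	"fmt"
	"regexp"
)

// Directory bucket names look like "<base-name>--<az-id>--x-s3", for example
// "my-test-bucket--usw2-az1--x-s3". This pattern is a simplification of the
// real naming rules, not the SDK's internal logic.
var directoryBucketRe = regexp.MustCompile(`^[a-z0-9.-]+--[a-z0-9]+-az\d+--x-s3$`)

func isDirectoryBucket(bucket string) bool {
	return directoryBucketRe.MatchString(bucket)
}

func main() {
	fmt.Println(isDirectoryBucket("my-test-bucket--usw2-az1--x-s3")) // true
	fmt.Println(isDirectoryBucket("my-regular-bucket"))              // false
}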

  1. Upgrade AWS SDK.
  2. Use S3 Express One Zone.
  3. ???
  4. Profit!

Since there are new restrictions, there is a good chance that you actually do have to make changes to your program. As mentioned before, if you are lucky then you only need to upgrade the AWS SDK. Be sure to test your program extensively, though, as there are a lot of small changes that might bite you.

To take a closer look at the changes you may have to make, I went ahead and upgraded three of my programs and tested them with directory buckets and S3 Express One Zone. The programs are:

  • shrimp
  • s3sha256sum
  • s3verify

I'll break down the changes required for each program below.

shrimp

shrimp is a program that can upload large files to S3 buckets, and it would be great if it could also upload to S3 Express One Zone.

I have created a new directory bucket with the base name my-test-bucket and put it in us-west-2, which makes the full bucket name my-test-bucket--usw2-az1--x-s3. I will use this bucket in my testing.

Let's see how shrimp behaves when attempting to use it with a directory bucket, before the SDK is upgraded:

$ ./shrimp LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

operation error S3: GetBucketLocation, https response error StatusCode: 404, RequestID: V8WHW968Y69RFXWE, HostID: OgLXISIHFFqh3hLYK9KwKI47zItxJIxPuOHCcOlarsEgiZaw4BeQ5vXioaWRrRUfwuVE8qNtxDI=, api error NoSuchBucket: The specified bucket does not exist

It complains that the bucket does not exist, which makes sense because, as I explained previously, directory buckets are basically their own separate S3 service and the normal S3 service is not aware of them. It would be useful if the error message hinted that the user has to upgrade their program to make it compatible with S3 Express One Zone, as AWS should be able to make this determination on the server side: compatible clients never contact the normal S3 service using a directory bucket name.

Let's upgrade the AWS SDK. All three of my programs are written in Go, so I simply have to run go get -u to upgrade my dependencies.

After the upgrade I attempt to build the program using go build, but unexpectedly there are many errors:

$ go get -u
$ go build
# github.com/stefansundin/shrimp
./main.go:186:30: cannot use bucketKeyEnabled (variable of type bool) as *bool value in struct literal
./main.go:555:5: invalid operation: offset += part.Size (mismatched types int64 and *int64)
./main.go:566:21: invalid operation: part.Size < 5 * MiB (mismatched types *int64 and untyped int)
./main.go:567:203: cannot use part.Size (variable of type *int64) as int64 value in argument to formatFilesize
./main.go:579:25: cannot convert part.PartNumber (variable of type *int32) to type int
./main.go:760:27: cannot use partNumber (variable of type int32) as *int32 value in struct literal
./main.go:915:21: cannot use partNumber (variable of type int32) as *int32 value in struct literal

It turns out that the AWS SDK for Go, aws-sdk-go-v2, released some breaking changes that are unrelated to S3 Express One Zone. You can read more about these changes here.

These breaking changes are fairly easy to fix, luckily. You can take a look at the commit to fix them here: https://github.com/stefansundin/shrimp/commit/7273da630388462378417d0fdf502c5f003e202e
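
To give a sense of what these fixes look like: the SDK moved many primitive struct fields from value types to pointer types, so code that previously assigned or read plain ints and bools now goes through the helper functions in the aws package. A minimal sketch with illustrative variable names, not shrimp's actual code:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// example shows the flavor of change required after the SDK switched many
// primitive struct fields to pointer types. The variables are illustrative.
func example(part types.Part, partNumber int32, bucketKeyEnabled bool) {
	// Before: PartNumber: partNumber
	// After: the field is *int32, so wrap the value with aws.Int32.
	uploadInput := s3.UploadPartInput{
		PartNumber: aws.Int32(partNumber),
	}
	_ = uploadInput

	// Before: offset += part.Size
	// After: part.Size is *int64, so dereference it safely with aws.ToInt64.
	var offset int64
	offset += aws.ToInt64(part.Size)
	fmt.Println("offset:", offset)

	// Same idea for booleans: aws.Bool to wrap, aws.ToBool to unwrap.
	createInput := s3.CreateMultipartUploadInput{
		BucketKeyEnabled: aws.Bool(bucketKeyEnabled),
	}
	_ = createInput
}

func main() {
	example(types.Part{}, 1, false)
}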

Let's try it again:

$ ./shrimp LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

operation error S3: GetBucketLocation, exceeded maximum number of attempts, 3, https response error StatusCode: 500, RequestID: 0033EADA6B01018D5E4BDEA60400F050543742C2, HostID: bDDLqJM45g7RP6hu, api error InternalError: We encountered an internal error. Please try again.

Hmm, this seems like a bug on AWS's side: it doesn't look like you can currently use GetBucketLocation with directory buckets. As a workaround we can add --region us-west-2 so that shrimp won't try to look up the bucket region. I will report this issue as a bug. In the worst case we may have to start parsing the bucket names ourselves.
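
If GetBucketLocation remains unusable with directory buckets, the fallback would be to extract the availability zone ID from the bucket name and map it to a region ourselves. A rough sketch of that idea, where the az-id-to-region table is an illustrative assumption that would need to be maintained:

package main

import (
	"fmt"
	"strings"
)

// Illustrative mapping from the zone ID prefix embedded in the bucket name
// to an AWS region. A real implementation would need a complete, maintained table.
var azPrefixToRegion = map[string]string{
	"usw2": "us-west-2",
	"use1": "us-east-1",
}

// regionFromDirectoryBucket extracts the az-id (e.g. "usw2-az1") from a name
// like "my-test-bucket--usw2-az1--x-s3" and maps its prefix to a region.
func regionFromDirectoryBucket(bucket string) (string, bool) {
	name, ok := strings.CutSuffix(bucket, "--x-s3")
	if !ok {
		return "", false
	}
	i := strings.LastIndex(name, "--")
	if i < 0 {
		return "", false
	}
	azID := name[i+2:]                       // e.g. "usw2-az1"
	prefix, _, _ := strings.Cut(azID, "-az") // e.g. "usw2"
	region, ok := azPrefixToRegion[prefix]
	return region, ok
}

func main() {
	fmt.Println(regionFromDirectoryBucket("my-test-bucket--usw2-az1--x-s3"))
}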

Let's add --region us-west-2 and try again:

$ ./shrimp --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

operation error S3: HeadObject, resolve auth scheme: resolve endpoint: endpoint rule error, S3Express does not support Dual-stack.

Uh oh. I have programmed shrimp to automatically opt in to use the S3 dual-stack endpoints, and unfortunately S3 Express does not support this yet. AWS has been slow in launching IPv6 support for their service endpoints (see their progress here).

I'll just remove this automatic opt in for now and revisit it in the future. You can take a look at the commit to fix this here: https://github.com/stefansundin/shrimp/commit/3ce042d4f50068a6deb74f548e026e55710afddb
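
For reference, this is roughly how such a conditional opt-in could look in aws-sdk-go-v2: only enable the dual-stack endpoint when the target is not a directory bucket. This is a sketch of the idea, not shrimp's actual code, and the suffix check is a simplified way to detect a directory bucket name:

package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// newS3Client opts in to the dual-stack endpoints only for regular buckets,
// since S3 Express One Zone does not support them yet.
func newS3Client(ctx context.Context, bucket string) (*s3.Client, error) {
	var opts []func(*config.LoadOptions) error
	if !strings.HasSuffix(bucket, "--x-s3") {
		opts = append(opts, config.WithUseDualStackEndpoint(aws.DualStackEndpointStateEnabled))
	}
	cfg, err := config.LoadDefaultConfig(ctx, opts...)
	if err != nil {
		return nil, err
	}
	return s3.NewFromConfig(cfg), nil
}

func main() {
	client, err := newS3Client(context.Background(), "my-test-bucket--usw2-az1--x-s3")
	if err != nil {
		log.Fatal(err)
	}
	_ = client
}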

Let's try it again:

$ ./shrimp --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

Checking if this upload is already in progress.
operation error S3: ListMultipartUploads, https response error StatusCode: 400, RequestID: 0033eada6b00018d5e22210505096148538cae34, HostID: GZkFlD, api error InvalidRequest: This bucket does not support a prefix that does not end in a delimiter. Specify a prefix path ending with a delimiter and try again.

Ah, we've finally hit a more interesting problem. Both the blog post and the documentation note that S3 Express One Zone requires / to be used as the delimiter for ListObjectsV2. It seems this limitation also applies to other listing operations, such as ListMultipartUploads.

I can just remove the Prefix parameter from this request, as shrimp already paginates that response. The request would just take a little longer if the user has many multipart uploads in progress (which is unlikely). You can take a look at the commit to fix this here: https://github.com/stefansundin/shrimp/commit/576e4bde577d43eab01edbf1ffc5ca50fe65b804
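
In other words, instead of asking S3 to filter on the exact key as a prefix, the program now lists the in-progress uploads and filters on the client side. A hedged sketch of that pattern (simplified, not shrimp's actual code):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// findMultipartUpload lists every in-progress multipart upload in the bucket
// (no Prefix parameter, since directory buckets only accept prefixes ending in
// a delimiter) and filters for the wanted key on the client side.
func findMultipartUpload(ctx context.Context, client *s3.Client, bucket, key string) (*types.MultipartUpload, error) {
	input := s3.ListMultipartUploadsInput{Bucket: aws.String(bucket)}
	for {
		resp, err := client.ListMultipartUploads(ctx, &input)
		if err != nil {
			return nil, err
		}
		for _, upload := range resp.Uploads {
			if aws.ToString(upload.Key) == key {
				return &upload, nil
			}
		}
		if !aws.ToBool(resp.IsTruncated) {
			return nil, nil
		}
		// Continue from where the previous page left off.
		input.KeyMarker = resp.NextKeyMarker
		input.UploadIdMarker = resp.NextUploadIdMarker
	}
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)
	upload, err := findMultipartUpload(ctx, client, "my-test-bucket--usw2-az1--x-s3", "LICENSE")
	if err != nil {
		log.Fatal(err)
	}
	if upload != nil {
		fmt.Println("Found existing upload:", aws.ToString(upload.UploadId))
	}
}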

Let's try it again:

$ ./shrimp --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

Checking if this upload is already in progress.
Creating multipart upload.
Upload id: ARI4K--9oqLMoUXfNiN8_otdAAAAAAAAAAEMAAAAAAAAADAyNzUyNDM0NjYxMBYAAAAAAAAAAA0AAAAAAAAAAAFoAWgAAAAAAAAEA8CSLF6NAQAAAAAfE13UiZ5xFkcxWcIBaZXKrciwyAIMYerYmp_35dHLZQ

Tip: Press ? to see the available keyboard controls.
Uploaded part 1 in 538ms (67.6 kB/s). (total: 100.000%, 0s remaining)
Completing the multipart upload.
All done!

{
  "Bucket": "my-test-bucket--usw2-az1--x-s3",
  "BucketKeyEnabled": null,
  "ChecksumCRC32": null,
  "ChecksumCRC32C": null,
  "ChecksumSHA1": null,
  "ChecksumSHA256": null,
  "ETag": "\"a849da80ceed4363a0d47eb0f0b8b18e-1\"",
  "Expiration": null,
  "Key": "LICENSE",
  "Location": "https://my-test-bucket--usw2-az1--x-s3.s3express-usw2-az1.us-west-2.amazonaws.com/LICENSE",
  "RequestCharged": "",
  "ResultMetadata": {},
  "SSEKMSKeyId": null,
  "ServerSideEncryption": "AES256",
  "VersionId": null
}

Yay, it finally worked. 🥳

There may be other minor problems but it seems like the main functionality is working. I will perform more extensive testing before releasing a new version. Let's move on to the next program.

s3sha256sum

s3sha256sum is a program that calculates SHA-256 checksums of S3 objects. I wrote this program before AWS launched their own feature to support checksums (which you should definitely be using as it makes your upload faster!).
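
The core of what s3sha256sum does can be sketched in a few lines: stream the object body through a SHA-256 hash instead of buffering the whole object in memory. This is a simplified illustration, not the program's actual code:

package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Fetch the object and stream its body through the hash,
	// without holding the whole object in memory.
	resp, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String("my-test-bucket--usw2-az1--x-s3"),
		Key:    aws.String("LICENSE"),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	h := sha256.New()
	if _, err := io.Copy(h, resp.Body); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s  s3://my-test-bucket--usw2-az1--x-s3/LICENSE\n", hex.EncodeToString(h.Sum(nil)))
}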

Let's move directly to upgrading the AWS SDK:

$ go get -u
$ go build
# github.com/stefansundin/s3sha256sum
./main.go:318:33: cannot convert obj.ContentLength (variable of type *int64) to type uint64
./main.go:360:37: invalid operation: obj.TagCount > 0 (mismatched types *int32 and untyped int)

There are far fewer errors for this program. You can take a look at the commit to fix them here: https://github.com/stefansundin/s3sha256sum/commit/2dc22699acc7c183ed01de95d04b588b0bd183e9

Let's try it again:

$ ./s3sha256sum --region us-west-2 s3://my-test-bucket--usw2-az1--x-s3/LICENSE
8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903  s3://my-test-bucket--usw2-az1--x-s3/LICENSE

Metadata 'sha256sum' not present. Populate this metadata (or tag) to enable automatic comparison.

It worked, yay. 🥳 Looks like upgrading s3sha256sum was very simple indeed. Let's move on to the next program.

s3verify

s3verify is essentially the sequel to s3sha256sum; it uses the new checksum feature in S3.
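
s3verify is built around the GetObjectAttributes API, which returns the object's composite checksum along with the checksum of each individual part. A condensed sketch of that call (simplified, not the program's exact code):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Ask for the checksum, object size, and per-part details in one request.
	objAttrs, err := client.GetObjectAttributes(ctx, &s3.GetObjectAttributesInput{
		Bucket:   aws.String("my-test-bucket--usw2-az1--x-s3"),
		Key:      aws.String("LICENSE"),
		MaxParts: aws.Int32(1000),
		ObjectAttributes: []types.ObjectAttributes{
			types.ObjectAttributesChecksum,
			types.ObjectAttributesObjectSize,
			types.ObjectAttributesObjectParts,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("Object size:", aws.ToInt64(objAttrs.ObjectSize))
	if objAttrs.Checksum != nil {
		fmt.Println("S3 object checksum:", aws.ToString(objAttrs.Checksum.ChecksumSHA256))
	}
	if objAttrs.ObjectParts != nil {
		for _, part := range objAttrs.ObjectParts.Parts {
			fmt.Printf("Part %d: %s\n", aws.ToInt32(part.PartNumber), aws.ToString(part.ChecksumSHA256))
		}
	}
}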

I think you know the drill by now, let's start by upgrading the SDK:

$ go get -u
$ go build
# github.com/stefansundin/s3verify
./main.go:231:13: cannot use 100000 (untyped int constant) as *int32 value in struct literal
./main.go:266:29: invalid operation: objAttrs.ObjectSize != fileSize (mismatched types *int64 and int64)
./main.go:310:18: cannot convert objAttrs.ObjectParts.TotalPartsCount (variable of type *int32) to type int
./main.go:325:20: invalid operation: partNumber != part.PartNumber (mismatched types int32 and *int32)
./main.go:335:48: cannot use part.Size (variable of type *int64) as int64 value in argument to io.LimitReader
./main.go:351:96: invalid operation: offset + part.Size (mismatched types int64 and *int64)
./main.go:356:3: invalid operation: offset += part.Size (mismatched types int64 and *int64)

You can take a look at the commit to fix the errors here: https://github.com/stefansundin/s3verify/commit/47401dba869c464b9fee6bd63e654a2935a5c500

Let's see if it works:

$ ./s3verify --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
Fetching S3 object information...
S3 object checksum: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1
Object consists of 1 part.

Part 1: jOtLnuWt7d5Hsx6XXB2QxzrSe2sWWh3NgMfFRetluQM=  OK

Checksum of checksums: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=

Checksum MISMATCH! File and S3 object are NOT identical!

This looks interesting: the checksums appear to be the same, but on closer inspection you can see that the S3 checksum has -1 appended to the end. The AWS Management Console has long presented multipart checksums this way, but the API has so far not appended the number of parts to the checksum value. This error only occurs with multipart uploads; the program successfully verifies single-part uploads.

The fix for this is fairly simple: we need to account for both cases, since regular S3 still computes the checksum the original way. You can take a look at the commit to fix this here: https://github.com/stefansundin/s3verify/commit/5c80b6f7cc76b601e2788ec50142765071915ace
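
Conceptually, the comparison just needs to tolerate an optional "-<number of parts>" suffix on the value returned by a directory bucket. One way to express that (a sketch of the idea, not the actual commit):

package main

import (
	"fmt"
	"strings"
)

// normalizeChecksum strips the optional "-<number of parts>" suffix that
// directory buckets append to a multipart checksum-of-checksums. The base
// value is standard base64, so it never contains a "-" itself.
func normalizeChecksum(value string) string {
	base, _, _ := strings.Cut(value, "-")
	return base
}

func main() {
	local := "+dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY="
	remote := "+dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1"
	fmt.Println(normalizeChecksum(local) == normalizeChecksum(remote)) // true
}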

It might be challenging to keep your program compatible with both versions of S3, especially if they diverge further as time goes on. You will have to remember to routinely test with both going forward, especially when making major changes.

Let's try it again:

$ ./s3verify --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
Fetching S3 object information...
S3 object checksum: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1
Object consists of 1 part.

Part 1: jOtLnuWt7d5Hsx6XXB2QxzrSe2sWWh3NgMfFRetluQM=  OK

Checksum of checksums: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1

Checksum matches! File and S3 object are identical.

Yay, it works. 🥳

Summary

This concludes the blog post. I hope it was informative for you. If I overlooked something or if you have any questions, please write a comment.
