<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stefan Sundin</title>
    <description>The latest articles on DEV Community by Stefan Sundin (@stefansundin).</description>
    <link>https://dev.to/stefansundin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F377051%2F98ca939b-69da-4ce8-9f0b-1d4680ecdd31.jpeg</url>
      <title>DEV Community: Stefan Sundin</title>
      <link>https://dev.to/stefansundin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stefansundin"/>
    <language>en</language>
    <item>
      <title>How to use Amazon ECR Public over IPv6</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Sun, 28 Dec 2025 18:30:07 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-use-amazon-ecr-public-over-ipv6-ki9</link>
      <guid>https://dev.to/aws-builders/how-to-use-amazon-ecr-public-over-ipv6-ki9</guid>
      <description>&lt;p&gt;So you want to pull your Docker images from Amazon ECR Public over IPv6?&lt;/p&gt;

&lt;p&gt;You are probably looking to remove your public IPv4 addresses in order to save on your AWS bill. But the fact that ECR Public doesn't seem to support IPv6 may be holding you back. Fortunately, the fix is easy.&lt;/p&gt;

&lt;p&gt;Simply use &lt;code&gt;ecr-public.aws.com&lt;/code&gt; instead of &lt;code&gt;public.ecr.aws&lt;/code&gt;. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use this when pulling over IPv4 or IPv6:
ecr-public.aws.com/ubuntu/ubuntu:latest

# The old URIs only work with IPv4:
public.ecr.aws/ubuntu/ubuntu:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
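&lt;p&gt;The only change is the registry hostname; the repository path stays the same. As a sketch, rewriting an old-style image URI could look like this (the &lt;code&gt;to_ipv6_uri&lt;/code&gt; helper is my own, hypothetical name):&lt;/p&gt;

```shell
# Hypothetical helper: rewrite an old-style ECR Public image URI to the
# IPv6-capable domain by swapping only the registry hostname.
to_ipv6_uri() {
  printf '%s\n' "$1" | sed 's/^public\.ecr\.aws/ecr-public.aws.com/'
}

to_ipv6_uri "public.ecr.aws/ubuntu/ubuntu:latest"
# prints: ecr-public.aws.com/ubuntu/ubuntu:latest
```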



&lt;p&gt;I can't think of any reason why you should continue to use &lt;code&gt;public.ecr.aws&lt;/code&gt;. Please correct me in the comments if I am wrong.&lt;/p&gt;

&lt;p&gt;This new domain is official, although it is, so far, poorly documented. You can find references to it in &lt;a href="https://docs.aws.amazon.com/pdfs/AmazonECR/latest/public/ecr-public-ug.pdf" rel="noopener noreferrer"&gt;the user guide PDF&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Why did they introduce a new domain name instead of simply adding &lt;code&gt;AAAA&lt;/code&gt; records to the old one? I don't know, but it is such an Amazon thing to do. They do this all the time. My guess is that they want to avoid breaking customers who have badly configured networks. While that's very considerate of them, in this case I think having two ways to refer to the same Docker image in ECR Public is confusing and annoying. Isn't this why we have the &lt;a href="https://en.wikipedia.org/wiki/Happy_Eyeballs" rel="noopener noreferrer"&gt;Happy Eyeballs&lt;/a&gt; algorithm?&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://gallery.ecr.aws/ubuntu/ubuntu" rel="noopener noreferrer"&gt;web gallery&lt;/a&gt; still only gives you image URIs using &lt;code&gt;public.ecr.aws&lt;/code&gt;. Let's hope they improve that in the future.&lt;/p&gt;

&lt;p&gt;Hope this helps!&lt;/p&gt;

</description>
      <category>docker</category>
      <category>ipv6</category>
      <category>aws</category>
    </item>
    <item>
      <title>How to configure a split-traffic VPN to access private S3 buckets</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Sat, 14 Sep 2024 21:45:08 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-configure-a-split-traffic-vpn-to-access-private-s3-buckets-420i</link>
      <guid>https://dev.to/aws-builders/how-to-configure-a-split-traffic-vpn-to-access-private-s3-buckets-420i</guid>
      <description>&lt;p&gt;If you maintain a VPN that allows users to access private VPC resources, then you might also want to use it to allow access to private S3 buckets. This post will help you configure your VPN accordingly.&lt;/p&gt;

&lt;p&gt;This is only relevant if your VPN configuration splits the traffic so that only part of it goes through the VPN. If you route all of the user's network traffic through the VPN then this post doesn't apply to your use case.&lt;/p&gt;

&lt;p&gt;If you have a VPC Interface Endpoint set up (with private DNS enabled), then likewise this should not be an issue. However, since interface endpoints cost $0.01 per AZ per hour, I prefer using Gateway Endpoints since they are free and can even &lt;strong&gt;lower&lt;/strong&gt; your AWS bill.&lt;/p&gt;

&lt;p&gt;Now that we have defined the specific use case where this applies, please join me in figuring out which IP ranges need to be sent through the VPN.&lt;/p&gt;

&lt;p&gt;Your bucket policy should look something like this (&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies-vpc-endpoint.html" rel="noopener noreferrer"&gt;taken from this documentation page&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Policy1415115909152"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Access-to-specific-VPCE-only"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3:*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::awsexamplebucket1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::awsexamplebucket1/*"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringNotEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SourceVpce"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vpce-1a2b3c4d"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As mentioned in my other post, &lt;a href="https://dev.to/aws-builders/three-surprising-ipv6-gotchas-with-amazon-s3-498p"&gt;&lt;em&gt;Four surprising IPv6 gotchas with Amazon S3&lt;/em&gt;&lt;/a&gt;, you may also want to add a &lt;code&gt;NotIpAddress&lt;/code&gt; condition to allow access over IPv6. Here's what that would look like (replace the IPv6 CIDR range with the CIDR range for your own VPC):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Policy1415115909152"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Access-to-specific-VPCE-only"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3:*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::awsexamplebucket1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::awsexamplebucket1/*"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringNotEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SourceVpce"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vpce-1a2b3c4d"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"NotIpAddress"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SourceIp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2600:1234:abcd:800::/56"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this in place, you must route your traffic through the VPN in order to access this S3 bucket.&lt;/p&gt;

&lt;p&gt;In this example I am configuring a WireGuard VPN client using the &lt;code&gt;AllowedIPs&lt;/code&gt; setting. Sending both IPv4 and IPv6 traffic through the VPN might look like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;AllowedIPs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;10.0.0.0/8, 2600:1234:abcd:800::/56&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, since we're using a VPC Gateway Endpoint, we need to manually configure the IP ranges that Amazon S3 uses here as well. Amazon publishes an &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/aws-ip-ranges.html" rel="noopener noreferrer"&gt;ip-ranges.json&lt;/a&gt; file that makes it easy to automate some of this work.&lt;/p&gt;

&lt;p&gt;Start by downloading the &lt;code&gt;ip-ranges.json&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-O&lt;/span&gt; https://ip-ranges.amazonaws.com/ip-ranges.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My S3 bucket and VPC are both located in us-west-2, so I can use these &lt;code&gt;jq&lt;/code&gt; commands to retrieve the IP ranges that are appropriate for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;ip-ranges.json | jq &lt;span class="nt"&gt;-Mr&lt;/span&gt; &lt;span class="s1"&gt;'.prefixes[] | select((.service == "S3") and (.region == "us-west-2")) | .ip_prefix'&lt;/span&gt; | &lt;span class="nb"&gt;sort
&lt;/span&gt;18.34.244.0/22
18.34.48.0/20
3.5.76.0/22
3.5.80.0/21
35.80.36.208/28
35.80.36.224/28
52.218.128.0/17
52.92.128.0/17

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;ip-ranges.json | jq &lt;span class="nt"&gt;-Mr&lt;/span&gt; &lt;span class="s1"&gt;'.ipv6_prefixes[] | select((.service == "S3") and (.region == "us-west-2")) | .ipv6_prefix'&lt;/span&gt; | &lt;span class="nb"&gt;sort
&lt;/span&gt;2600:1f68:4000::/39
2600:1fa0:4000::/39
2600:1ff0:4000::/39
2600:1ff8:4000::/40
2600:1ff9:4000::/40
2600:1ffa:4000::/40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
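&lt;p&gt;If you want to automate this step, one way (a sketch, assuming the &lt;code&gt;ip-ranges.json&lt;/code&gt; structure shown above) is to combine both queries into a single comma-separated list with &lt;code&gt;jq&lt;/code&gt;. A tiny inline sample file is used here so the command is self-contained; run it against the real download instead:&lt;/p&gt;

```shell
# Sketch: build a comma-separated list of the S3 prefixes for one region.
# A minimal inline sample stands in for the real ip-ranges.json file.
printf '%s' '{"prefixes":[{"ip_prefix":"3.5.76.0/22","region":"us-west-2","service":"S3"},{"ip_prefix":"1.2.3.0/24","region":"us-east-1","service":"S3"}],"ipv6_prefixes":[{"ipv6_prefix":"2600:1f68:4000::/39","region":"us-west-2","service":"S3"}]}' > /tmp/ip-ranges-sample.json

# Merge the IPv4 and IPv6 entries, filter by service and region, and join.
jq -r '[.prefixes[], .ipv6_prefixes[]]
  | map(select(.service == "S3" and .region == "us-west-2") | (.ip_prefix // .ipv6_prefix))
  | join(", ")' /tmp/ip-ranges-sample.json
# prints: 3.5.76.0/22, 2600:1f68:4000::/39
```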



&lt;p&gt;All you need to do is add all of these IP ranges to your &lt;code&gt;AllowedIPs&lt;/code&gt; list. Here is what it looks like for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;AllowedIPs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;10.0.0.0/8, 2600:1234:abcd:800::/56, 18.34.244.0/22, 18.34.48.0/20, 3.5.76.0/22, 3.5.80.0/21, 35.80.36.208/28, 35.80.36.224/28, 52.218.128.0/17, 52.92.128.0/17, 2600:1f68:4000::/39, 2600:1fa0:4000::/39, 2600:1ff0:4000::/39, 2600:1ff8:4000::/40, 2600:1ff9:4000::/40, 2600:1ffa:4000::/40&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these changes, my VPN configuration allows me to access both my private VPC resources and my private S3 buckets. It's a little bit messy, but it works!&lt;/p&gt;

&lt;p&gt;If you have experience with how this is simplified by using a VPC Interface Endpoint, please let us know in the comments. It would be an interesting read.&lt;/p&gt;

&lt;p&gt;Please let me know if this helped you, or if anything needs clarification. :)&lt;/p&gt;

</description>
      <category>s3</category>
      <category>vpc</category>
      <category>networking</category>
      <category>wireguard</category>
    </item>
    <item>
      <title>Four surprising IPv6 gotchas with Amazon S3</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Sat, 14 Sep 2024 20:55:26 +0000</pubDate>
      <link>https://dev.to/aws-builders/three-surprising-ipv6-gotchas-with-amazon-s3-498p</link>
      <guid>https://dev.to/aws-builders/three-surprising-ipv6-gotchas-with-amazon-s3-498p</guid>
      <description>&lt;p&gt;You have been able to access Amazon S3 over IPv6 &lt;a href="https://aws.amazon.com/blogs/aws/now-available-ipv6-support-for-amazon-s3/" rel="noopener noreferrer"&gt;since 2016&lt;/a&gt;. In this post I'll describe a few reasons that I have found for why you might want to continue using IPv4 for S3, at least until the issues below are addressed by AWS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are reading this a year from now (or more), then hopefully some of these gotchas are no longer relevant, so please double check each point using the provided references.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You are very likely accessing S3 over IPv4 today, since in order to use IPv6 you need to access it over the "dual-stack" endpoint, which is not used by default. If you don't see "dualstack" in the S3 URL then you're using good old IPv4. If you do see "dualstack" in the S3 URL then there's still a chance that you're not using IPv6; &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/ipv6-access.html#ipv6-access-test-compatabilty" rel="noopener noreferrer"&gt;see the documentation&lt;/a&gt; for how to verify that your computer and network can connect to Amazon S3 using IPv6.&lt;/p&gt;
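&lt;p&gt;To make the endpoint difference concrete, here is a sketch of the two hostname forms for virtual-hosted-style requests (the bucket name is hypothetical):&lt;/p&gt;

```shell
# Sketch: default vs dual-stack S3 endpoint hostnames (bucket name is hypothetical).
BUCKET=my-bucket
REGION=us-west-2
echo "https://${BUCKET}.s3.${REGION}.amazonaws.com"            # default endpoint: IPv4 only
echo "https://${BUCKET}.s3.dualstack.${REGION}.amazonaws.com"  # dual-stack endpoint: A and AAAA records
```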

&lt;p&gt;Here are the gotchas that I promised:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;VPC Gateway Endpoint prefix lists for S3 do not work with IPv6. 🙈

&lt;ul&gt;
&lt;li&gt;The main downside of this is that you pay for the data transfer between your EC2 instances and S3, which may be substantial if you transfer a lot of data.&lt;/li&gt;
&lt;li&gt;Go to &lt;a href="https://us-west-2.console.aws.amazon.com/vpcconsole/home?region=us-west-2#ManagedPrefixLists:" rel="noopener noreferrer"&gt;the VPC console&lt;/a&gt; to see if the &lt;code&gt;s3&lt;/code&gt; prefix list has any IPv6 prefixes. It is likely that when AWS decides to publish IPv6 prefixes for Amazon S3, they will do so in a separate prefix list, since we already have two prefix lists for &lt;code&gt;vpc-lattice&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because of reason 1, if you use a bucket policy to restrict incoming traffic using the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies-vpc-endpoint.html#example-bucket-policies-restrict-accesss-vpc-endpoint" rel="noopener noreferrer"&gt;&lt;code&gt;aws:SourceVpce&lt;/code&gt; condition&lt;/a&gt;, this isn't compatible with the dualstack endpoint. 🙉&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can work around this issue by also using a &lt;code&gt;NotIpAddress&lt;/code&gt; condition, example (replace the IPv6 CIDR range with the CIDR range for your own VPC):
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Effect": "Deny",
"Condition": {
  "StringNotEquals": {
    "aws:SourceVpce": "vpce-01234567890abcdef"
  },
  "NotIpAddress": {
    "aws:SourceIp": "2600:1234:abcd:800::/56"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;S3 Express does not support IPv6 right now. 🙊&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You may receive the error &lt;code&gt;"S3Express does not support Dual-stack"&lt;/code&gt; if you try to access S3 Express over IPv6.&lt;/li&gt;
&lt;li&gt;My program &lt;a href="https://github.com/stefansundin/shrimp" rel="noopener noreferrer"&gt;shrimp&lt;/a&gt; used to default to using the dual-stack endpoint, &lt;a href="https://github.com/stefansundin/shrimp/commit/3ce042d4f50068a6deb74f548e026e55710afddb" rel="noopener noreferrer"&gt;but I removed that once I found out about this issue&lt;/a&gt;. So for now you should have users explicitly opt in to IPv6.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Static website hosting is not supported when accessing an S3 bucket over IPv6. 🙊&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This limitation is documented &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/ipv6-access.html#ipv6-not-supported" rel="noopener noreferrer"&gt;on this documentation page&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So that's the quick rundown. Have you found any other gotchas? Let me know in the comments.&lt;/p&gt;

</description>
      <category>s3</category>
      <category>aws</category>
      <category>networking</category>
    </item>
    <item>
      <title>Updating your programs for S3 Express One Zone</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Wed, 31 Jan 2024 21:33:38 +0000</pubDate>
      <link>https://dev.to/aws-builders/updating-your-programs-for-s3-express-one-zone-4j65</link>
      <guid>https://dev.to/aws-builders/updating-your-programs-for-s3-express-one-zone-4j65</guid>
      <description>&lt;p&gt;At last year's annual re:Invent conference, &lt;a href="https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-zone-high-performance-storage-class/" rel="noopener noreferrer"&gt;AWS launched a new S3 storage class called S3 Express One Zone&lt;/a&gt;. With the launch of this storage class you are now able to, for the first time, specify which availability zone your objects are stored in, in order to maximize performance. But there's more to it than that, a lot more.&lt;/p&gt;

&lt;p&gt;I have a couple of programs dedicated to performing specific S3 tasks, so let's see what it takes to upgrade them to support this new storage class, and document some of the challenges involved. But first, let's spend some time understanding what's new.&lt;/p&gt;

&lt;p&gt;While reading the launch blog post, you are slowly introduced to another concept: the "directory bucket". This is a new bucket type that is required in order to use S3 Express One Zone. You can't just upload files into a regular S3 bucket and specify the new storage class, that won't work. The directory bucket brings with it new restrictions and limitations in order to maximize performance (read &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-differences.html" rel="noopener noreferrer"&gt;How is S3 Express One Zone different?&lt;/a&gt;). The more you read about it, the more you realize it is a way bigger change than you expected at first. It may be the biggest change to Amazon S3 since it was launched in 2006.&lt;/p&gt;

&lt;p&gt;Normally, these kinds of revisions are hard to make to a service, but since directory buckets are basically a new version of S3, running on new endpoints and serving new buckets created specifically for S3 Express One Zone, it is possible for AWS to make these changes without breaking existing users. You cannot point an older existing program at a directory bucket and expect it to work, because it will not. The authorization scheme is completely different, again in order to maximize performance.&lt;/p&gt;

&lt;p&gt;You need to upgrade the AWS SDK version that your program is using in order to support S3 Express One Zone. In many cases, simply upgrading the SDK &lt;em&gt;may&lt;/em&gt; be enough, depending on your program and how it uses S3. It's magic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.tenor.co%2Fimages%2F7f6402cf7df54cdf24505ee7895326aa%2Fraw" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.tenor.co%2Fimages%2F7f6402cf7df54cdf24505ee7895326aa%2Fraw"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To explain the magic part, we have to look at the bucket name. Directory buckets have &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-bucket-naming-rules.html" rel="noopener noreferrer"&gt;a special naming scheme&lt;/a&gt;, ending with &lt;code&gt;--azid--x-s3&lt;/code&gt;. The SDK uses this information to automatically direct the request to the correct endpoint and perform the necessary authorization. There are no new parameters, all the necessary information is packed into the bucket name. It feels a bit unconventional coming from AWS to do it this way, but I think they correctly assumed that it would be the simplest way to roll this out. It makes me wonder how long it took for them to settle on this and find a naming scheme that didn't interfere with any existing bucket names.&lt;/p&gt;
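&lt;p&gt;Since all of the routing information is packed into the bucket name, a client can tell the two bucket types apart with a simple suffix check. A minimal sketch (the helper name is my own):&lt;/p&gt;

```shell
# Sketch: detect a directory bucket purely from its name suffix.
# Directory bucket names end with --azid--x-s3, e.g. my-test-bucket--usw2-az1--x-s3.
is_directory_bucket() {
  case "$1" in
    *--x-s3) return 0 ;;
    *) return 1 ;;
  esac
}

if is_directory_bucket "my-test-bucket--usw2-az1--x-s3"; then
  echo "directory bucket"
fi
```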

&lt;ol&gt;
&lt;li&gt;Upgrade AWS SDK.&lt;/li&gt;
&lt;li&gt;Use S3 Express One Zone.&lt;/li&gt;
&lt;li&gt;???&lt;/li&gt;
&lt;li&gt;Profit!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Since there are new restrictions, there is a good chance that you actually do have to make changes to your program. As mentioned before, if you are lucky then you only need to upgrade the AWS SDK. Be sure to test your program extensively, though, as there are a lot of small changes that might bite you.&lt;/p&gt;

&lt;p&gt;To take a closer look at the changes you may have to make, I went ahead and upgraded three of my programs and tested them with directory buckets and S3 Express One Zone. The programs are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shrimp&lt;/li&gt;
&lt;li&gt;s3sha256sum&lt;/li&gt;
&lt;li&gt;s3verify&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll break down the changes required for each program below.&lt;/p&gt;

&lt;h2&gt;
  
  
  shrimp
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/stefansundin/shrimp" rel="noopener noreferrer"&gt;shrimp&lt;/a&gt; is a program that can upload large files to S3 buckets, and it would be great if it could also upload to S3 Express One Zone.&lt;/p&gt;

&lt;p&gt;I have created a new directory bucket with the prefix &lt;code&gt;my-test-bucket&lt;/code&gt; and put it in us-west-2, which makes the full bucket name &lt;code&gt;my-test-bucket--usw2-az1--x-s3&lt;/code&gt;. I will use this bucket in my testing.&lt;/p&gt;

&lt;p&gt;Let's see how shrimp behaves when attempting to use it with a directory bucket, before the SDK is upgraded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./shrimp LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

operation error S3: GetBucketLocation, https response error StatusCode: 404, RequestID: V8WHW968Y69RFXWE, HostID: OgLXISIHFFqh3hLYK9KwKI47zItxJIxPuOHCcOlarsEgiZaw4BeQ5vXioaWRrRUfwuVE8qNtxDI=, api error NoSuchBucket: The specified bucket does not exist
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It complains that the bucket does not exist, which makes sense because, as I explained previously, directory buckets are basically their own separate S3 service and the normal S3 service is not aware of them. It would be useful if the error message hinted that the user has to upgrade their program to make it compatible with S3 Express One Zone, as they should be able to make this determination on the server side. Compatible clients should never contact the normal S3 service using this bucket name.&lt;/p&gt;

&lt;p&gt;Let's upgrade the AWS SDK. All three of my programs are written in Go, so I simply have to run &lt;code&gt;go get -u&lt;/code&gt; to upgrade my dependencies.&lt;/p&gt;

&lt;p&gt;After the upgrade I attempt to build the program using &lt;code&gt;go build&lt;/code&gt;, but unexpectedly there are many errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go get -u
$ go build
# github.com/stefansundin/shrimp
./main.go:186:30: cannot use bucketKeyEnabled (variable of type bool) as *bool value in struct literal
./main.go:555:5: invalid operation: offset += part.Size (mismatched types int64 and *int64)
./main.go:566:21: invalid operation: part.Size &amp;lt; 5 * MiB (mismatched types *int64 and untyped int)
./main.go:567:203: cannot use part.Size (variable of type *int64) as int64 value in argument to formatFilesize
./main.go:579:25: cannot convert part.PartNumber (variable of type *int32) to type int
./main.go:760:27: cannot use partNumber (variable of type int32) as *int32 value in struct literal
./main.go:915:21: cannot use partNumber (variable of type int32) as *int32 value in struct literal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It turns out that the AWS SDK for Go, aws-sdk-go-v2, released some breaking changes that are unrelated to S3 Express One Zone. You can read more about these changes &lt;a href="https://github.com/aws/aws-sdk-go-v2/issues/2001" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These breaking changes are fairly easy to fix, luckily. You can take a look at the commit to fix them here: &lt;a href="https://github.com/stefansundin/shrimp/commit/7273da630388462378417d0fdf502c5f003e202e" rel="noopener noreferrer"&gt;https://github.com/stefansundin/shrimp/commit/7273da630388462378417d0fdf502c5f003e202e&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's try it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./shrimp LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

operation error S3: GetBucketLocation, exceeded maximum number of attempts, 3, https response error StatusCode: 500, RequestID: 0033EADA6B01018D5E4BDEA60400F050543742C2, HostID: bDDLqJM45g7RP6hu, api error InternalError: We encountered an internal error. Please try again.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hmm, this seems like a bug on AWS's side. It doesn't seem like you can currently use &lt;code&gt;GetBucketLocation&lt;/code&gt; with directory buckets. As a workaround we can add &lt;code&gt;--region us-west-2&lt;/code&gt; and shrimp won't try to look up the bucket region. I will report this issue as a bug. In the worst case we may have to start parsing the bucket names ourselves.&lt;/p&gt;
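&lt;p&gt;If it comes to that, extracting the availability zone id from the bucket name is straightforward. A sketch (the &lt;code&gt;usw2&lt;/code&gt; prefix presumably corresponds to us-west-2, matching my test bucket, but I have not seen that mapping documented):&lt;/p&gt;

```shell
# Sketch: extract the availability zone id from a directory bucket name,
# which follows the pattern PREFIX--azid--x-s3.
bucket="my-test-bucket--usw2-az1--x-s3"
azid=$(printf '%s\n' "$bucket" | sed -E 's/^.*--([a-z0-9-]+)--x-s3$/\1/')
echo "$azid"
# prints: usw2-az1
```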

&lt;p&gt;Let's add &lt;code&gt;--region us-west-2&lt;/code&gt; and try again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./shrimp --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

operation error S3: HeadObject, resolve auth scheme: resolve endpoint: endpoint rule error, S3Express does not support Dual-stack.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uh oh. I have programmed shrimp to automatically opt in to use the S3 dual-stack endpoints, and unfortunately S3 Express does not support this yet. AWS has been slow in launching IPv6 support for their service endpoints (&lt;a href="https://awsipv6.neveragain.de/" rel="noopener noreferrer"&gt;see their progress here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;I'll just remove this automatic opt in for now and revisit it in the future. You can take a look at the commit to fix this here: &lt;a href="https://github.com/stefansundin/shrimp/commit/3ce042d4f50068a6deb74f548e026e55710afddb" rel="noopener noreferrer"&gt;https://github.com/stefansundin/shrimp/commit/3ce042d4f50068a6deb74f548e026e55710afddb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's try it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./shrimp --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

Checking if this upload is already in progress.
operation error S3: ListMultipartUploads, https response error StatusCode: 400, RequestID: 0033eada6b00018d5e22210505096148538cae34, HostID: GZkFlD, api error InvalidRequest: This bucket does not support a prefix that does not end in a delimiter. Specify a prefix path ending with a delimiter and try again.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ah, we've finally hit a more interesting problem. Both &lt;a href="https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-zone-high-performance-storage-class/" rel="noopener noreferrer"&gt;the blog post&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-differences.html" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt; note that S3 Express One Zone requires that &lt;code&gt;/&lt;/code&gt; is used as a delimiter for &lt;code&gt;ListObjectsV2&lt;/code&gt;. It seems like this limitation also applies to other listing operations such as &lt;code&gt;ListMultipartUploads&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I can just remove the &lt;code&gt;Prefix&lt;/code&gt; parameter in this request, as shrimp already paginates that response. The request would just take a little bit longer if the user has many multi-part uploads in progress (which is unlikely). You can take a look at the commit to fix this here: &lt;a href="https://github.com/stefansundin/shrimp/commit/576e4bde577d43eab01edbf1ffc5ca50fe65b804" rel="noopener noreferrer"&gt;https://github.com/stefansundin/shrimp/commit/576e4bde577d43eab01edbf1ffc5ca50fe65b804&lt;/a&gt;&lt;/p&gt;
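In other words, without the `Prefix` parameter the listing returns every in-progress upload, and the matching moves to the client side while paginating. A simplified sketch of that matching (with a hypothetical upload type standing in for the SDK's, not shrimp's actual code):

```go
package main

import "fmt"

// upload is a hypothetical stand-in for the SDK's multipart upload type.
type upload struct {
	Key      string
	UploadID string
}

// findUpload scans already-fetched pages of ListMultipartUploads results
// for an in-progress upload of the given key. Since we no longer send a
// Prefix, every in-progress upload comes back and we filter ourselves.
func findUpload(pages [][]upload, key string) (string, bool) {
	for _, page := range pages {
		for _, u := range page {
			if u.Key == key {
				return u.UploadID, true
			}
		}
	}
	return "", false
}

func main() {
	pages := [][]upload{
		{{Key: "other.bin", UploadID: "abc"}},
		{{Key: "LICENSE", UploadID: "ARI4K"}},
	}
	id, ok := findUpload(pages, "LICENSE")
	fmt.Println(id, ok) // ARI4K true
}
```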

&lt;p&gt;Let's try it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./shrimp --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
File size: 34.3 kiB (35147 bytes)
Part size: 8.0 MiB (8388608 bytes)
The upload will consist of 1 parts.

Checking if this upload is already in progress.
Creating multipart upload.
Upload id: ARI4K--9oqLMoUXfNiN8_otdAAAAAAAAAAEMAAAAAAAAADAyNzUyNDM0NjYxMBYAAAAAAAAAAA0AAAAAAAAAAAFoAWgAAAAAAAAEA8CSLF6NAQAAAAAfE13UiZ5xFkcxWcIBaZXKrciwyAIMYerYmp_35dHLZQ

Tip: Press ? to see the available keyboard controls.
Uploaded part 1 in 538ms (67.6 kB/s). (total: 100.000%, 0s remaining)
Completing the multipart upload.
All done!

{
  "Bucket": "my-test-bucket--usw2-az1--x-s3",
  "BucketKeyEnabled": null,
  "ChecksumCRC32": null,
  "ChecksumCRC32C": null,
  "ChecksumSHA1": null,
  "ChecksumSHA256": null,
  "ETag": "\"a849da80ceed4363a0d47eb0f0b8b18e-1\"",
  "Expiration": null,
  "Key": "LICENSE",
  "Location": "https://my-test-bucket--usw2-az1--x-s3.s3express-usw2-az1.us-west-2.amazonaws.com/LICENSE",
  "RequestCharged": "",
  "ResultMetadata": {},
  "SSEKMSKeyId": null,
  "ServerSideEncryption": "AES256",
  "VersionId": null
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yay, it finally worked. 🥳&lt;/p&gt;

&lt;p&gt;There may be other minor problems but it seems like the main functionality is working. I will perform more extensive testing before releasing a new version. Let's move on to the next program.&lt;/p&gt;

&lt;h2&gt;
  
  
  s3sha256sum
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/stefansundin/s3sha256sum" rel="noopener noreferrer"&gt;s3sha256sum&lt;/a&gt; is a program that calculates SHA-256 checksums of S3 objects. I wrote this program before AWS launched their own feature to support checksums (which you should definitely be using as it makes your upload faster!).&lt;/p&gt;

&lt;p&gt;Let's move directly to upgrading the AWS SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go get -u
$ go build
# github.com/stefansundin/s3sha256sum
./main.go:318:33: cannot convert obj.ContentLength (variable of type *int64) to type uint64
./main.go:360:37: invalid operation: obj.TagCount &amp;gt; 0 (mismatched types *int32 and untyped int)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are far fewer errors for this program. You can take a look at the commit to fix them here: &lt;a href="https://github.com/stefansundin/s3sha256sum/commit/2dc22699acc7c183ed01de95d04b588b0bd183e9" rel="noopener noreferrer"&gt;https://github.com/stefansundin/s3sha256sum/commit/2dc22699acc7c183ed01de95d04b588b0bd183e9&lt;/a&gt;&lt;/p&gt;
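The breaking change behind these errors is that the SDK moved scalar response fields like `ContentLength` from value types to pointers, so they now have to be dereferenced before use (the actual fix uses the SDK's `aws.ToInt64`/`aws.ToInt32` helpers; below is a self-contained sketch of the same pattern with a hypothetical response struct and a local helper):

```go
package main

import "fmt"

// headObjectOutput mimics the shape of the SDK's HeadObject response
// after the breaking change: scalar fields are now pointers.
type headObjectOutput struct {
	ContentLength *int64
	TagCount      *int32
}

// deref mirrors what aws.ToInt64/aws.ToInt32 do: return the pointed-to
// value, or the zero value when the pointer is nil.
func deref[T any](p *T) T {
	if p == nil {
		var zero T
		return zero
	}
	return *p
}

func main() {
	size := int64(35147)
	tags := int32(0)
	obj := headObjectOutput{ContentLength: &size, TagCount: &tags}

	// Before: uint64(obj.ContentLength) no longer compiles.
	// After: dereference first, then convert.
	fmt.Println(uint64(deref(obj.ContentLength))) // 35147
	fmt.Println(deref(obj.TagCount) != 0)         // false
}
```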

&lt;p&gt;Let's try it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./s3sha256sum --region us-west-2 s3://my-test-bucket--usw2-az1--x-s3/LICENSE
8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903  s3://my-test-bucket--usw2-az1--x-s3/LICENSE

Metadata 'sha256sum' not present. Populate this metadata (or tag) to enable automatic comparison.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It worked, yay. 🥳 Looks like upgrading s3sha256sum was very simple indeed. Let's move on to the next program.&lt;/p&gt;

&lt;h2&gt;
  
  
  s3verify
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/stefansundin/s3verify" rel="noopener noreferrer"&gt;s3verify&lt;/a&gt; is essentially the sequel to s3sha256sum which uses the new checksum feature in S3.&lt;/p&gt;

&lt;p&gt;I think you know the drill by now, let's start by upgrading the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go get -u
$ go build
# github.com/stefansundin/s3verify
./main.go:231:13: cannot use 100000 (untyped int constant) as *int32 value in struct literal
./main.go:266:29: invalid operation: objAttrs.ObjectSize != fileSize (mismatched types *int64 and int64)
./main.go:310:18: cannot convert objAttrs.ObjectParts.TotalPartsCount (variable of type *int32) to type int
./main.go:325:20: invalid operation: partNumber != part.PartNumber (mismatched types int32 and *int32)
./main.go:335:48: cannot use part.Size (variable of type *int64) as int64 value in argument to io.LimitReader
./main.go:351:96: invalid operation: offset + part.Size (mismatched types int64 and *int64)
./main.go:356:3: invalid operation: offset += part.Size (mismatched types int64 and *int64)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can take a look at the commit to fix the errors here: &lt;a href="https://github.com/stefansundin/s3verify/commit/47401dba869c464b9fee6bd63e654a2935a5c500" rel="noopener noreferrer"&gt;https://github.com/stefansundin/s3verify/commit/47401dba869c464b9fee6bd63e654a2935a5c500&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's see if it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./s3verify --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
Fetching S3 object information...
S3 object checksum: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1
Object consists of 1 part.

Part 1: jOtLnuWt7d5Hsx6XXB2QxzrSe2sWWh3NgMfFRetluQM=  OK

Checksum of checksums: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=

Checksum MISMATCH! File and S3 object are NOT identical!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks interesting: the checksums appear to be the same, but on closer inspection you can see that the S3 checksum has &lt;code&gt;-1&lt;/code&gt; appended to the end. The AWS Management Console has long presented multi-part checksums this way, but the API has so far not appended the number of parts to the checksum value. This error only occurs with multi-part uploads; the program successfully verifies single-part uploads.&lt;/p&gt;

&lt;p&gt;The fix for this is fairly simple: we need to account for both cases, since regular S3 still computes the checksum the original way. You can take a look at the commit to fix this here: &lt;a href="https://github.com/stefansundin/s3verify/commit/5c80b6f7cc76b601e2788ec50142765071915ace" rel="noopener noreferrer"&gt;https://github.com/stefansundin/s3verify/commit/5c80b6f7cc76b601e2788ec50142765071915ace&lt;/a&gt;&lt;/p&gt;
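The comparison essentially has to accept the checksum both with and without the new part-count suffix. A minimal sketch of that logic (my own illustration, not s3verify's exact code):

```go
package main

import "fmt"

// checksumsMatch reports whether a locally computed multi-part checksum
// matches the checksum reported by S3. S3 Express One Zone appends the
// part count (e.g. "-1") to the value, while regular S3 has historically
// reported the bare value, so both forms are accepted.
func checksumsMatch(computed string, reported string, parts int) bool {
	withSuffix := fmt.Sprintf("%s-%d", computed, parts)
	return reported == computed || reported == withSuffix
}

func main() {
	computed := "+dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY="
	fmt.Println(checksumsMatch(computed, computed, 1))      // regular S3 form
	fmt.Println(checksumsMatch(computed, computed+"-1", 1)) // S3 Express form
}
```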

&lt;p&gt;It might be challenging to keep your program compatible with both versions of S3, especially if they diverge further as time goes on. You will have to remember to routinely test against both going forward, especially when making major changes.&lt;/p&gt;

&lt;p&gt;Let's try it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./s3verify --region us-west-2 LICENSE s3://my-test-bucket--usw2-az1--x-s3/LICENSE
Fetching S3 object information...
S3 object checksum: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1
Object consists of 1 part.

Part 1: jOtLnuWt7d5Hsx6XXB2QxzrSe2sWWh3NgMfFRetluQM=  OK

Checksum of checksums: +dj/ivnBnbf9whNrCdoipOOwqa4Vwv/7HK7mcsxsSnY=-1

Checksum matches! File and S3 object are identical.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yay, it works. 🥳&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This concludes the blog post. I hope it was informative for you. If I overlooked something or if you have any questions, please write a comment.&lt;/p&gt;

</description>
      <category>go</category>
      <category>s3</category>
      <category>s3express</category>
      <category>programming</category>
    </item>
    <item>
      <title>Save the cost of a load balancer with route53-update</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Mon, 24 Apr 2023 03:01:20 +0000</pubDate>
      <link>https://dev.to/aws-builders/save-the-cost-of-a-load-balancer-with-route53-update-5e0d</link>
      <guid>https://dev.to/aws-builders/save-the-cost-of-a-load-balancer-with-route53-update-5e0d</guid>
      <description>&lt;p&gt;AWS is really great when it comes to scale. You can spin up hundreds of EC2 instances and very easily scale up to handle traffic spikes, and you can do this almost anywhere in the world.&lt;/p&gt;

&lt;p&gt;An essential component in this setup is a load balancer, acting as the entry point in a region and seamlessly balancing the traffic across the compute instances that run your app. When an instance is replaced, the load balancer automatically stops sending traffic to the old instance, and when the replacement comes online it starts sending traffic to that one.&lt;/p&gt;

&lt;p&gt;But what about when you only need a single instance? You might not need or want auto-scaling to occur. The service might be private or you might prefer it to fail rather than to incur more costs. If the instance fails and a new one comes up, how can we ensure that traffic is routed to the new instance? A load balancer costs &lt;a href="https://aws.amazon.com/elasticloadbalancing/pricing/"&gt;about $17 per month&lt;/a&gt;, which in many cases is more than the compute required for the app itself.&lt;/p&gt;

&lt;p&gt;A common solution to this problem is to allocate an Elastic IP and then associate the EIP to the new instance in the userdata script.&lt;/p&gt;

&lt;p&gt;But in some cases an EIP is not appropriate (e.g. when using private IP addresses inside of the VPC), and it is just simpler to rely on DNS. It is easy to write a script that does this for you (you can find several options on GitHub).&lt;/p&gt;

&lt;p&gt;I recently had this problem with Amazon ECS and I couldn't find a solution that perfectly solved my use cases. Sometimes I need to update the DNS to a private IP address, and sometimes to a public IP address. Sometimes I run the ECS task on EC2 and sometimes I run it on Fargate. To add to this, I want the program to be as small as possible to reduce the time and resources it takes to run (&lt;a href="https://hub.docker.com/r/amazon/aws-cli"&gt;the official aws-cli docker image&lt;/a&gt; is more than 100 MB).&lt;/p&gt;

&lt;p&gt;So I decided to write my own tool: &lt;a href="https://github.com/stefansundin/route53-update"&gt;route53-update&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The program is still under development but so far there is a beta docker image published that can update DNS based on an ECS task's public or private IP address. You can also specify an argument to update the DNS to a value from a URL (e.g. &lt;a href="https://checkip.amazonaws.com/"&gt;https://checkip.amazonaws.com/&lt;/a&gt;). Once the program has matured a bit more, I will start publishing binaries to make it easier to install. A bonus for writing my own tool is that it gives me another opportunity to use Rust in a new project. The AWS SDK for Rust is slowly maturing so I am using Rust more and more in new AWS projects (the SDK is not stable yet but in my opinion it is completely fine to use in new hobby projects).&lt;/p&gt;

&lt;p&gt;The GitHub repository &lt;a href="https://github.com/stefansundin/route53-update/blob/9f4959a564612fbbed7095aed144290430c46954/examples/ecs/rssbox.json"&gt;contains an example&lt;/a&gt; showing how to plug it into your Amazon ECS Task Definition. I will add more examples in the future, like how to download and run it in a userdata script.&lt;/p&gt;

&lt;p&gt;One incentive for writing route53-update is &lt;a href="https://github.com/stefansundin/rssbox"&gt;RSS Box&lt;/a&gt;. I am working to make it simpler to run on AWS. I want people to be able to run it without requiring a costly load balancer, and route53-update is one small piece of the solution.&lt;/p&gt;

&lt;p&gt;That's all for today. The program is very early in its development, but my hope is that it will be very versatile and eventually support many different deployment configurations.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>route53</category>
      <category>cost</category>
    </item>
    <item>
      <title>Comparing shrimp to the AWS CLI</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Sun, 23 Apr 2023 02:44:09 +0000</pubDate>
      <link>https://dev.to/aws-builders/comparing-shrimp-to-the-aws-cli-1il9</link>
      <guid>https://dev.to/aws-builders/comparing-shrimp-to-the-aws-cli-1il9</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/stefansundin/shrimp"&gt;shrimp&lt;/a&gt; is an Amazon S3 uploader for humans, with powerful interactive features.&lt;/p&gt;

&lt;p&gt;I want to expand on the blog post that I wrote when I announced shrimp: &lt;a href="https://dev.to/aws-builders/introducing-shrimp-and-s3sha256sum-5d2h"&gt;Introducing shrimp and s3sha256sum&lt;/a&gt;. While shrimp is a great tool to have in your toolbox, there are cases where it is not the best tool for the job. Let's compare it to the AWS CLI, and also dive deeper into the key challenges that shrimp overcomes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AWS CLI&lt;/th&gt;
&lt;th&gt;shrimp&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Upload files to Amazon S3&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Download files from Amazon S3&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;I've been considering implementing downloads in shrimp. Leave a comment below if this is something you need.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Synchronize directory&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;This might be the biggest blocker with regards to uploading a lot of files with shrimp. Definitely want this in a future version.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pause upload&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Just hit the space key when uploading with shrimp to pause the upload.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resume interrupted upload&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;This is the primary reason why I couldn't use the AWS CLI to upload my backups to S3. Even worse, &lt;a href="https://github.com/stefansundin/shrimp/discussions/1"&gt;the AWS CLI will abort your multi-part upload when you interrupt it&lt;/a&gt;, destroying any chance of resuming it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry in case of errors&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;If you experience network issues during an upload, the AWS CLI will terminate with an error. You'll have to wrap it in a script to handle the error if you want it to recover automatically. shrimp automatically retries and resumes an upload when any error occurs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bandwidth limiter&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;The AWS CLI supports it through the configuration file (no command line argument).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adjust bandwidth limiter during upload&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;The AWS CLI cannot adjust the bandwidth limiter during the upload. And since you can't resume the upload there isn't any workaround. &lt;a href="https://github.com/stefansundin/shrimp/discussions/4"&gt;shrimp can even automatically adjust the bandwidth limiter based on the current day and time&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supports alternative checksum algorithms&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;For some inexplicable reason, the AWS CLI still hasn't been updated to upload files using the alternative checksum feature launched in February 2022. Using alternative checksums also &lt;em&gt;speeds up&lt;/em&gt; the upload since the checksum does not need to be computed before the upload starts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smart single/multi-part selection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;shrimp always uses a multi-part upload to do its job. Ideally it should just use a single request for small files. I want to implement this even though it might increase the complexity of the code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upload multiple parts in parallel&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;For simplicity, I chose not to attempt concurrent part uploads in shrimp. You might think you need this when you actually don't: in most cases I don't think it would significantly increase throughput. I have used shrimp on a very fast connection and easily reached 50 MB/s.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advanced MFA management&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;If your credentials require MFA and you use the AWS CLI to upload a very large file, and the upload doesn't finish in the validity time of the MFA token, then your upload will not complete. This can make it very hard to upload huge files without sacrificing security. shrimp will prompt you again for another MFA code when needed. It also supports &lt;a href="https://github.com/stefansundin/shrimp/discussions/3"&gt;an advanced &lt;code&gt;--mfa-secret&lt;/code&gt; option&lt;/a&gt; that can be used to make shrimp automatically generate the codes.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The lack of some of these features in the AWS CLI was key in convincing me that I had to write shrimp, specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resume interrupted upload&lt;/strong&gt; - This is paramount when you need to upload hundreds of gigabytes to Glacier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjust bandwidth limiter during upload&lt;/strong&gt; - This is essential when you need to upload large files over a residential internet connection. You want to let it rip overnight and then go slower when it would otherwise ruin your regular internet activities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the AWS CLI had supported these features then perhaps I would never have written shrimp.&lt;/p&gt;

&lt;p&gt;I want to implement some of the missing features, like uploading multiple files similar to &lt;code&gt;aws s3 sync&lt;/code&gt;. Please comment below if you need a specific feature; I'd be very interested to hear about your use case.&lt;/p&gt;

&lt;p&gt;Hopefully shrimp can close the gap and become an even more valuable tool in everyone's toolbox. I would also love to see the AWS CLI learn some new tricks, like support for alternative checksum algorithms.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog post was also posted to GitHub: &lt;a href="https://github.com/stefansundin/shrimp/discussions/5"&gt;https://github.com/stefansundin/shrimp/discussions/5&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing s3verify: verify that a local file is identical to an S3 object without having to download the object data</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Mon, 22 Aug 2022 01:00:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/introducing-s3verify-verify-that-a-local-file-is-identical-to-an-s3-object-without-having-to-download-the-object-data-1n73</link>
      <guid>https://dev.to/aws-builders/introducing-s3verify-verify-that-a-local-file-is-identical-to-an-s3-object-without-having-to-download-the-object-data-1n73</guid>
      <description>&lt;p&gt;In February 2022, Amazon S3 released a new checksum feature that allows for integrity checking without having to download the object data (&lt;a href="https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/"&gt;blog post&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html"&gt;documentation&lt;/a&gt;). Today, I'm happy to announce &lt;a href="https://github.com/stefansundin/s3verify"&gt;s3verify&lt;/a&gt;, a new program that I've developed related to this feature. Before I talk about my program, I want to explain what the new S3 feature is and why it is so useful.&lt;/p&gt;

&lt;p&gt;Previously, the only built-in way to attempt any kind of standardized verification like this was by using the object ETag. However, the ETag is only usable for this purpose if the object is unencrypted, which is not acceptable for most users these days. For encrypted objects, the ETag is most likely a checksum of the ciphertext, which is probably all that the S3 error-checking process requires in order to verify that the data hasn't been corrupted. This has been a bit of a sorry state of affairs for a long time, forcing S3 users to come up with their own verification schemes. I am guessing that many big customers of Amazon have been asking them to address this, and earlier this year change finally arrived!&lt;/p&gt;

&lt;p&gt;When you upload a file to S3, you can now specify a checksum algorithm (SHA-1, SHA-256, CRC-32, or CRC-32C). Your client computes the checksum while the upload is taking place and submits it at the end using an &lt;em&gt;HTTP trailer&lt;/em&gt;. While S3 is receiving the data, it performs the same checksum computation on its side, and at the conclusion of the upload it rejects the upload if the two checksums do not match. This checksum is then immutably stored in the object metadata for the lifetime of the object, making it available afterwards without the need to download the object data. It is impossible to modify or accidentally remove the checksum from the object. Having this checksum easily accessible is especially useful for objects on Glacier, which can be very costly and take days to retrieve.&lt;/p&gt;

&lt;p&gt;If you have existing objects that were not uploaded with a checksum algorithm then you need to either make a copy of the object (using &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html"&gt;CopyObject&lt;/a&gt; with the &lt;code&gt;x-amz-checksum-algorithm&lt;/code&gt; header) or upload the object from scratch with a checksum algorithm selected. This procedure might be a good subject for a future blog post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once you have an S3 object with a checksum, you may ask yourself: now how do I verify it?&lt;/strong&gt; 🤔&lt;/p&gt;

&lt;p&gt;Unfortunately, Amazon hasn't released any tool of their own to perform this verification, even though it has been 6 months since the introduction of the feature. I expected an aws cli subcommand to eventually appear, but it hasn't happened. They did release some Java reference code that uses the AWS SDK on &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html"&gt;this documentation page&lt;/a&gt;, but that is very hard for most people to use.&lt;/p&gt;

&lt;p&gt;I decided to fill this gap by building &lt;a href="https://github.com/stefansundin/s3verify"&gt;s3verify&lt;/a&gt;. It allows you to very easily verify that a local file is identical to an S3 object, without the need to download the object data. It only works on objects that were uploaded using this new checksum feature.&lt;/p&gt;

&lt;p&gt;The program is very simple: invoke it with a local file and an S3 object, and it will tell you if they are identical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ s3verify important-backup-2021.zip s3://mybucketname/important-backup-2021.zip
Fetching S3 object information...
S3 object checksum: x5AZd/g+YQp7l0kwcN8Hw+qqXZj2ekjAHG0yztmkWXg=
Object consists of 21 parts.

Part  1: fiP2aEgcQGHHJgmce4C3e/a3m50y/UJHsYFojMS3Oy8=  OK
Part  2: /lRdaagPhvRL9PxpQZOTKLxr1+xX64bYN6hknuy9y3k=  OK
Part  3: nS/vLGZ13Cq7cGWlwW3QnLkJaDTRrY8PUgsCGs9abKU=  OK
Part  4: HJWCIDAo8MY0nk5m4uBvUJ5R0aZzPAWJPE9F9WheEAk=  OK
Part  5: JExPU8KHhBJ1K+fh/p0gNT50ueRi6BxOL3XXSvHVUgQ=  OK
Part  6: gyp/OaxJqKz1mYWAZadtNhBgqEXpDUvMVuIZybwD1ic=  OK
Part  7: 1RcmmE8STey0mE33MXrzFAXbWrjawaVbnXeX5GB/F/Y=  OK
Part  8: XdcyPdbc2OYgF0NE/c9Q5vBgI8BXlv8tLZB3g6ETvlI=  OK
Part  9: pOKv/u4hlfGEpaBE5YTKA3IlVQDY+hMlySbdh9dfqsI=  OK
Part 10: W4WKSjF+blMinRdP9EcJ9mSDQMMyAUn0KfFgCWv8ZxI=  OK
Part 11: nP35yqHA+Pgum8yWeeXRZU/jPGF/ntnAR+rqOcwlhqk=  OK
Part 12: aoEWVZnc/8ualswzKmMXWZaQg/Bg/4zFs1MGQQTpHV0=  OK
Part 13: LVMnzhFxBPzFfVRFzilrfNCPX8zJhu1jNSNn7cZYmew=  OK
Part 14: OrcQx1cNqtatD6WGf4kA2R/ld7rVzQTkzbL9rAtYLDY=  OK
Part 15: 1+1AxALVTubSsdBW1qXs2toyCLDpq81I+ivFKPAzogs=  OK
Part 16: 3kPLbv0PCSlATrTOdzin03KbAezfi165l1Tq09gAN0Q=  OK
Part 17: IPTEvMXa/ZZe8IabeFDNWAF8hBV7dwNsu3wXJrBHwRE=  OK
Part 18: IOhxLxcmmqWvRi+y6ITVaPyFLzjo4wAB4f7e7I6CFYc=  OK
Part 19: tGCw1J2c2dYlZdxlxvLX+w4r6Cp9S5WhN7hJeRXJMUo=  OK
Part 20: sMH7Jh9qH/nUOue0/oBaaPYJXf8S81j6p7LoMub+7H8=  OK
Part 21: q5W9UMl7As4VVuEJcdvQC1ENyAVM2AlLc9utiEF4v4E=  OK

Checksum of checksums: x5AZd/g+YQp7l0kwcN8Hw+qqXZj2ekjAHG0yztmkWXg=

Checksum matches! File and S3 object are identical.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the checksums do not match then you will see the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Checksum MISMATCH! File and S3 object are NOT identical!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the file size and S3 object size do not match then you will see a similar error (in this case hashing will not be attempted).&lt;/p&gt;

&lt;p&gt;I hope that s3verify will be useful to you. Please file an issue in &lt;a href="https://github.com/stefansundin/s3verify"&gt;the GitHub repository&lt;/a&gt; if you have any problems using it. It is a perfect companion to my earlier S3 programs, &lt;a href="https://github.com/stefansundin/shrimp"&gt;shrimp&lt;/a&gt; and &lt;a href="https://github.com/stefansundin/s3sha256sum"&gt;s3sha256sum&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;P.S.&lt;/strong&gt; Unfortunately, &lt;code&gt;aws s3 cp&lt;/code&gt; doesn't yet have a &lt;code&gt;--checksum-algorithm&lt;/code&gt; argument. It is very strange that they haven't added this yet. However, you can use shrimp in the meantime as it fully supports uploading objects with this new checksum feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;P.P.S.&lt;/strong&gt; There is another legacy project called s3verify that is currently ranked higher on most search engines. It is unrelated to checking object integrity. Hopefully my project will overtake it in search rankings soon.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stable versions of shrimp and s3sha256sum</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Sun, 21 Aug 2022 05:48:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/stable-versions-of-shrimp-and-s3sha256sum-4oi9</link>
      <guid>https://dev.to/aws-builders/stable-versions-of-shrimp-and-s3sha256sum-4oi9</guid>
      <description>&lt;p&gt;It has been &lt;a href="https://dev.to/aws-builders/introducing-shrimp-and-s3sha256sum-5d2h"&gt;almost a year&lt;/a&gt; since I introduced &lt;a href="https://github.com/stefansundin/shrimp"&gt;shrimp&lt;/a&gt; and &lt;a href="https://github.com/stefansundin/s3sha256sum"&gt;s3sha256sum&lt;/a&gt;, and I think it is time to announce the changes that have been made since then.&lt;/p&gt;

&lt;p&gt;First a quick recap of what these tools do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shrimp is an interactive multipart uploader that excels at uploading very large files to S3. What differentiates shrimp from the aws cli is that it allows the user to dynamically change the bandwidth limit during the upload, and it lets the user pause the upload and resume it later. It can easily resume an upload that was interrupted for any reason.&lt;/li&gt;
&lt;li&gt;s3sha256sum is basically sha256sum for S3 objects. I created it to help me validate that shrimp is uploading files correctly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since releasing the source code last year, I have been incrementally adding features to both programs. shrimp should now support pretty much all of the features in the aws cli, and the command line syntax is now much closer to the aws cli. In addition to feature parity, shrimp has also gained features that set it apart from the aws cli, such as &lt;a href="https://github.com/stefansundin/shrimp/discussions/4"&gt;a scheduler&lt;/a&gt; that can automatically adjust the bandwidth limiter based on the day and time, and an MFA feature that can &lt;a href="https://github.com/stefansundin/shrimp/discussions/3"&gt;automatically generate TOTP codes&lt;/a&gt; which is useful in cases where your upload takes longer than your allowed session duration (12 hours at maximum).&lt;/p&gt;

&lt;p&gt;In addition to new features, I have also been uploading a lot of large files to S3 using shrimp, all of them completing successfully without any consistency problems. I now consider shrimp to be battle tested and I can more confidently vouch for its stability.&lt;/p&gt;

&lt;p&gt;With these improvements I thought it was time to start releasing binaries and versioning updates. You no longer have to compile the programs from source. Please visit the releases sections on GitHub to download: &lt;a href="https://github.com/stefansundin/shrimp/releases/latest"&gt;shrimp&lt;/a&gt; and &lt;a href="https://github.com/stefansundin/s3sha256sum/releases/latest"&gt;s3sha256sum&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In my last blog post I wrote that shrimp is for "slow internet connections". I have since used shrimp on very fast internet connections and it is indeed capable of uploading very quickly as well (&amp;gt; 50 MB/s). The limiting factor is that it is uploading a single part at a time, and greater speed could potentially be gained by parallelizing this process. However, I do not think this would help most people and it would make the code much more complicated which would make it a lot harder to verify that shrimp is error-free. If you need parallel part uploading then I recommend that you build a custom solution that is implemented to your own specification.&lt;/p&gt;

&lt;p&gt;In my next blog post I will announce a new tool that complements these two very nicely. Stay tuned! (edit: &lt;a href="https://dev.to/aws-builders/introducing-s3verify-verify-that-a-local-file-is-identical-to-an-s3-object-without-having-to-download-the-object-data-1n73"&gt;here's the blog post about s3verify&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing shrimp and s3sha256sum</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Fri, 29 Oct 2021 05:55:28 +0000</pubDate>
      <link>https://dev.to/aws-builders/introducing-shrimp-and-s3sha256sum-5d2h</link>
      <guid>https://dev.to/aws-builders/introducing-shrimp-and-s3sha256sum-5d2h</guid>
      <description>&lt;p&gt;Today I am releasing two open source programs that will help you manage your data on Amazon S3: &lt;a href="https://github.com/stefansundin/shrimp"&gt;shrimp&lt;/a&gt; and &lt;a href="https://github.com/stefansundin/s3sha256sum"&gt;s3sha256sum&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first program, &lt;a href="https://github.com/stefansundin/shrimp"&gt;shrimp&lt;/a&gt;, is an interactive multipart uploader built specifically for uploading files over a slow internet connection. It doesn't matter if the upload takes days or weeks, or if you have to stop it and restart it at another time; shrimp will always be able to resume where it left off. Unlike the aws cli, shrimp never aborts the multipart upload (please set up a lifecycle policy to clean up abandoned multipart uploads, &lt;a href="https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/"&gt;as described here&lt;/a&gt;).&lt;/p&gt;
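&lt;p&gt;Such a lifecycle policy can be applied with the aws cli. A minimal sketch, assuming a placeholder bucket name of &lt;code&gt;my-bucket&lt;/code&gt; and a 7-day grace period (adjust both to your needs):&lt;/p&gt;

```shell
# Abort (and stop paying for) multipart uploads that were started but
# never completed within 7 days. Since shrimp deliberately never aborts
# uploads itself, this rule does the cleanup for you.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "abort-incomplete-multipart-uploads",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
      }
    ]
  }'
```

&lt;p&gt;Note that &lt;code&gt;DaysAfterInitiation&lt;/code&gt; should be longer than your longest expected upload, or the rule will abort uploads that are still in progress.&lt;/p&gt;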

&lt;p&gt;While shrimp is uploading, you can use your keyboard to adjust the bandwidth throttle. Press &lt;code&gt;?&lt;/code&gt; to bring up a list of the available controls. If you want shrimp to throttle your upload during the day, then simply set your desired limit (use &lt;code&gt;a&lt;/code&gt; &lt;code&gt;s&lt;/code&gt; &lt;code&gt;d&lt;/code&gt; &lt;code&gt;f&lt;/code&gt; to increase the limit in different increments, and &lt;code&gt;z&lt;/code&gt; &lt;code&gt;x&lt;/code&gt; &lt;code&gt;c&lt;/code&gt; &lt;code&gt;v&lt;/code&gt; to decrease it). Then in the evening, you can remove the throttle with &lt;code&gt;u&lt;/code&gt;. This feature lets you keep the upload going without slowing down your internet connection too much. And if you want to move to another location that has a faster upload speed (e.g. at work), simply pause the upload with &lt;code&gt;p&lt;/code&gt;, move to the new location, and unpause by pressing &lt;code&gt;p&lt;/code&gt; a second time.&lt;/p&gt;

&lt;p&gt;shrimp makes it a worry-free process to upload very large files to Amazon S3. I'm planning to use it to upload terabytes of data to Glacier for backup purposes. Please let me know about your experience using it, and what improvements you can think of.&lt;/p&gt;

&lt;p&gt;The next program was created to give you peace of mind that your files were in fact uploaded correctly. I wanted to verify that shrimp was doing the right thing, and not uploading the bytes incorrectly or reassembling the parts in the wrong order. Things can go wrong, so what can we do to verify that the correct behavior is taking place?&lt;/p&gt;

&lt;p&gt;The program is called &lt;a href="https://github.com/stefansundin/s3sha256sum"&gt;s3sha256sum&lt;/a&gt;, and the name should be familiar to many of you. As the name implies, it calculates SHA256 checksums of objects on Amazon S3. It uses a normal GetObject request and streams the object contents to the SHA256 hashing function. This way there is no need to download the entire object to your hard drive. You can verify very large objects without worrying about running out of local storage.&lt;/p&gt;

&lt;p&gt;To save on costs when you verify your objects, you should know that it may be cheaper to spin up an EC2 instance in the same region as the S3 bucket, and run s3sha256sum from that instance. This is because data transfer from S3 to EC2 is free, as &lt;a href="https://aws.amazon.com/s3/pricing/"&gt;the S3 pricing page&lt;/a&gt; clarifies:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You pay for all bandwidth into and out of Amazon S3, except for the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data transferred from an Amazon S3 bucket to any AWS service(s) within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you attach the expected checksum to the object (either as metadata or a tag), then s3sha256sum can automatically compare the checksum that it just computed with the checksum that you stored on the object. It will print &lt;code&gt;OK&lt;/code&gt; or &lt;code&gt;FAILED&lt;/code&gt; depending on the outcome. For an example, see &lt;a href="https://github.com/stefansundin/s3sha256sum/discussions/1"&gt;this discussion&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;s3sha256sum has one more trick up its sleeve. Suppose you're running it on a 1 TB object, but for some reason you have to abort it (with Ctrl-C) before it finishes. When the program is interrupted, it captures the internal state of the hash function and prints a command that lets you resume hashing from that position. For an example, see &lt;a href="https://github.com/stefansundin/s3sha256sum/discussions/2"&gt;this discussion&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I think that about wraps it up. Please give shrimp and s3sha256sum a try and let me know if you find any bugs or have ideas for improvements. Thank you for reading!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>go</category>
      <category>opensource</category>
    </item>
    <item>
      <title>CVE-2020-10187</title>
      <dc:creator>Stefan Sundin</dc:creator>
      <pubDate>Sun, 03 May 2020 20:03:34 +0000</pubDate>
      <link>https://dev.to/stefansundin/cve-2020-10187-2f7</link>
      <guid>https://dev.to/stefansundin/cve-2020-10187-2f7</guid>
      <description>&lt;p&gt;About two months ago, I found a CVE in &lt;a href="https://github.com/doorkeeper-gem/doorkeeper"&gt;a Ruby gem called Doorkeeper&lt;/a&gt;, and today the details were finally made public.&lt;/p&gt;

&lt;p&gt;I found a couple of companies that were vulnerable in the wild, so it took some time to contact them and wait for them to patch their websites before the vulnerability was made public. I also worked with the gem maintainer to release a patch.&lt;/p&gt;

&lt;p&gt;It's my first ever CVE, so I'm pretty proud of it.&lt;/p&gt;

&lt;p&gt;Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/doorkeeper-gem/doorkeeper/security/advisories/GHSA-j7vx-8mqj-cqp9"&gt;https://github.com/doorkeeper-gem/doorkeeper/security/advisories/GHSA-j7vx-8mqj-cqp9&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://about.gitlab.com/releases/2020/04/30/security-release-12-10-2-released/"&gt;https://about.gitlab.com/releases/2020/04/30/security-release-12-10-2-released/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10187"&gt;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10187&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>ruby</category>
    </item>
  </channel>
</rss>
