DEV Community

SageMaker Studio - Getting Started with Data Wrangler

When you look at the console, it's quite difficult to tell where to get started with these new services. There are some steps you need to complete in SageMaker Studio before you can use, or even access, the Data Wrangler tool. The first step is to provision SageMaker Studio if you haven't done this already.

If you need to provision SageMaker Studio for the first time, I'll show you how to do that. Otherwise, skip this post. So let's walk through setting up SageMaker Studio.

To do this, there are two options. You can use the Quick start, or you can set up the account to be run as a team account. If you're just starting out, it's best to use the Quick start. Open the SageMaker console and choose SageMaker Studio at the top left-hand side of the page.

On the Studio setup page, under Get started, choose Quick start.

You can create a name for your Studio. You can keep the default name if you want or make up your own. The name can have up to 63 characters, using letters, numbers, and hyphens.
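
If you want to check a name before typing it into the console, the rule above can be sketched in a few lines of Python. This is only a minimal sketch of the rule as stated here (up to 63 characters, letters, numbers, and hyphens). The function name is made up for illustration, and the service itself may apply stricter rules that this does not model.

```python
import re

# Rule as described on the setup page: 1 to 63 characters,
# letters, digits, and hyphens only. Any stricter service-side
# rule is NOT modeled here (assumption).
_NAME_RE = re.compile(r"[A-Za-z0-9-]{1,63}")


def is_valid_studio_name(name: str) -> bool:
    """Return True if `name` fits the length and character rule above."""
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wrangler-demo-1", "wrangler_demo", "a" * 64):
        print(candidate[:20], is_valid_studio_name(candidate))
```

The underscore and the 64-character name are rejected, while the hyphenated name passes.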

Next, we need to choose an execution role for SageMaker to use.

For the execution role, you can either choose one from the role selector or enter a custom IAM role ARN. If you choose Create a new role, the Create an IAM role dialog appears.

The role must have the AmazonSageMakerFullAccess policy attached to it. From here, we can set what we want the role to be able to access.

You might find that when you first try to create this, it errors out. If so, go back in and do it again. You'll notice that the SageMaker full access policy has been created if you didn't already have it. The next step is to specify the S3 buckets that we're going to use. If you don't want to add access to any more buckets, just choose None.
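
If you would rather script the role than click through the dialog, here is a rough sketch of what an equivalent role could look like. It is not what the console does internally. It assumes you pass in a boto3 IAM client (for example `boto3.client("iam")`), the helper names are made up for illustration, and it leaves out the S3 bucket access step entirely.

```python
import json

# AWS managed policy that the execution role must have attached.
SAGEMAKER_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def sagemaker_trust_policy() -> dict:
    """Trust policy that lets the SageMaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_execution_role(iam, role_name: str) -> str:
    """Create the role and attach the managed policy.

    `iam` is expected to be a boto3 IAM client (assumption);
    `role_name` is whatever name you choose.
    Returns the ARN of the new role.
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=SAGEMAKER_FULL_ACCESS_ARN,
    )
    return response["Role"]["Arn"]
```

The returned ARN is the value you would paste into the console if you chose to enter a custom role instead of creating one there.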

Okay, so now we create the role. As I mentioned, there are two options for the setup.

We can do this quick setup, or we can use the team setup, which is basically for projects. The standard setup gives you a little more control over how you provision the Studio, and you can use either AWS SSO or IAM for authentication. If you use the standard setup with SSO, each member gets a unique sign-in URL that directs them to the Studio, and they sign in with their SSO credentials.

One FYI: if you're using SSO, the organization account needs to be in the same AWS Region as the Studio account, so keep that in mind if you are planning on using SSO. The standard setup gives you more granular control over usage than the Quick start. One of the options with the standard setup is that we can select the VPC we want to run it in.

You set the VPC. You can also set the subnets you want to use and limit the network access for Studio, whether it's public internet only or VPC only.

We can set security groups. We also have the option to set encryption using one of our KMS keys if we have one. And we can add tags. Once these are all set up, we hit Submit and then prepare to wait for a little while. Eventually it will get there.
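
To make those standard setup options a bit more concrete, here is a sketch of how they could map onto the parameters of the SageMaker `create_domain` API call. The helper name is made up, every ID you would pass in is your own placeholder, and this only builds the request dictionary. It does not call AWS.

```python
from typing import Dict, List, Optional


def build_domain_request(
    domain_name: str,
    auth_mode: str,
    execution_role_arn: str,
    vpc_id: str,
    subnet_ids: List[str],
    vpc_only: bool = False,
    security_group_ids: Optional[List[str]] = None,
    kms_key_id: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Collect the standard setup choices into create_domain keyword arguments."""
    if auth_mode not in ("SSO", "IAM"):
        raise ValueError("auth_mode must be 'SSO' or 'IAM'")

    user_settings = {"ExecutionRole": execution_role_arn}
    if security_group_ids:
        user_settings["SecurityGroups"] = list(security_group_ids)

    request = {
        "DomainName": domain_name,
        "AuthMode": auth_mode,
        "DefaultUserSettings": user_settings,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Network access: public internet only, or VPC only.
        "AppNetworkAccessType": "VpcOnly" if vpc_only else "PublicInternetOnly",
    }
    if kms_key_id:
        request["KmsKeyId"] = kms_key_id  # optional encryption key
    if tags:
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request
```

With a boto3 SageMaker client you would then pass the result along as `client.create_domain(**request)`, which is roughly the scripted equivalent of hitting Submit.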

Don't worry about the wait, it's well worth it. When it's done, you'll see that the status is Ready, the execution role is created, and the authentication method is set. We can see the settings we've chosen in here, and this is where we can enable projects if we want them.
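
If you don't want to sit and watch the console, the wait can be scripted. Here is a minimal polling sketch, assuming a boto3 SageMaker client is passed in. As far as I know the API reports a finished domain as `InService` rather than the console's Ready label, so treat that value as an assumption and check it against the API reference. The function name and the default timings are made up for illustration.

```python
import time


def wait_for_domain(
    sagemaker,
    domain_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 900.0,
    sleep=time.sleep,
) -> str:
    """Poll describe_domain until the domain is usable.

    `sagemaker` is expected to be a boto3 SageMaker client (assumption).
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    waited = 0.0
    while True:
        status = sagemaker.describe_domain(DomainId=domain_id)["Status"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Domain {domain_id} failed to provision")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Domain {domain_id} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The timeout is there so a script does not hang forever if provisioning gets stuck.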

We can access the Studio now that it is provisioned, but remember, it can be quite a while before it actually starts up. So don't be alarmed if you end up waiting for five minutes while the Studio starts. Now, just a quick word: if you wish to go back from the Quick start and use SSO or the standard setup, you actually need to delete your original SageMaker Studio. To do that, you have to remove all of the applications and all the instances. The Studio itself is basically listed as a user.

Once you've removed all of the applications, you can delete the user for the Studio. When you've done that, you get the option to delete the Studio, which removes it completely. Then you can go through and set it up using the standard setup, choosing either SSO or IAM.
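
The same teardown order (applications, then the user, then the Studio) can be sketched against the SageMaker API. This is a rough outline only, assuming a boto3 SageMaker client is passed in. The function name is made up, only the first page of list results is read, and the delete calls are asynchronous, so a real script would need to wait for each app to finish deleting before removing the user.

```python
def delete_studio(sagemaker, domain_id: str, retain_efs: bool = False) -> list:
    """Delete apps, then user profiles, then the domain, in that order.

    `sagemaker` is expected to be a boto3 SageMaker client (assumption).
    Returns the list of steps issued, for logging.
    NOTE: no waiting between steps is done here; deletes are asynchronous.
    """
    steps = []
    profiles = sagemaker.list_user_profiles(DomainIdEquals=domain_id)["UserProfiles"]
    for profile in profiles:
        user = profile["UserProfileName"]
        apps = sagemaker.list_apps(
            DomainIdEquals=domain_id, UserProfileNameEquals=user
        )["Apps"]
        for app in apps:
            if app["Status"] in ("Deleted", "Deleting"):
                continue  # already gone or on its way out
            sagemaker.delete_app(
                DomainId=domain_id,
                UserProfileName=user,
                AppType=app["AppType"],
                AppName=app["AppName"],
            )
            steps.append(("delete_app", user, app["AppName"]))
        sagemaker.delete_user_profile(DomainId=domain_id, UserProfileName=user)
        steps.append(("delete_user_profile", user))

    # Choose whether the Studio's EFS home volume is kept or removed.
    sagemaker.delete_domain(
        DomainId=domain_id,
        RetentionPolicy={"HomeEfsFileSystem": "Retain" if retain_efs else "Delete"},
    )
    steps.append(("delete_domain", domain_id))
    return steps
```

Setting `retain_efs=True` keeps the files stored in the Studio's home volume. The default here deletes them along with the Studio.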


