Getting Started with AWS Storage Tools: Amazon S3

#tutorial #aws #data #cloud

Are you a data analyst just getting started with cloud computing on Amazon Web Services (AWS) and not sure which cloud services to explore? This article is for you.

As a data analyst, you need a basic knowledge of cloud services and how to use them for the projects you will be working on. Just like you, I recently started exploring and working with the storage (S3) and analytical tools (Amazon Athena and Amazon QuickSight) available in AWS.

In this article, we will focus on storage services, specifically Amazon S3, which lets us store the data files that our applications and analyses use. Unlike relational databases such as SQL Server, Oracle, and MySQL, S3 stores data as objects (files) rather than as tables. My previous article outlined the steps needed to ingest data from Excel (.xlsx) into Azure Server Database; here I will guide you through ingesting data into one of the AWS cloud storage services, S3.

Before I go on, it is important that you have credit in your AWS account and a dataset available.

For this walkthrough, we will use structured data stored in .csv format. Here are some links to free datasets:

AWS Datasets: https://registry.opendata.aws/

Data World: https://data.world/

Kaggle: https://www.kaggle.com/datasets

Data.gov: https://data.gov/

Datahub.io: https://datahub.io/search
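
Before uploading anything, it can help to confirm that the file you downloaded really is a readable CSV. Here is a minimal sketch using only the Python standard library; the function name preview_csv and the sample columns are my own illustration, not part of any of the datasets above.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def preview_csv(path, max_rows=5):
    """Return the header row and up to max_rows data rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, max_rows))
    return header, rows


if __name__ == "__main__":
    # Hypothetical sample data, written to a temp folder so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "sample.csv"
        sample.write_text(
            "customer_id,country,amount\n1,NG,120.50\n2,GB,80.00\n",
            encoding="utf-8",
        )
        header, rows = preview_csv(sample)
        print("Columns:", header)
        print("First rows:", rows)
```

Point preview_csv at your own file path to check the column names before you upload it.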

Here is the step-by-step process:
Step 1: Log in to your AWS account and search for S3 (Amazon Simple Storage Service) from the console home. S3 is a fully managed, object-based storage service that is highly durable and cost-effective. It is used for storing files such as videos, images, static websites, and backup archives.

Step 2: Create a bucket. Think of a bucket as a container that holds your data as objects.

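By default, AWS limits how many buckets you can create in an account. The default was 100 for many years and has since been raised, so check Service Quotas for the current limit on your account.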

Click the Create bucket button.
Next, enter a bucket name (note that bucket names must be globally unique, not just unique within your account).
Choose the AWS Region you want.
For the other options, such as Object Ownership, I usually just use the recommended default.
Review the Block Public Access settings for your bucket. Leaving them enabled protects the bucket and its objects from public access granted through policies or ACLs.
Enable Bucket Versioning. I recommend this so that you keep a history of the changes made to the objects in your bucket. It helps you recover objects that you accidentally overwrite or delete (think of it like version control on GitHub).
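
The console rejects invalid bucket names, but you can check a name yourself first. The sketch below covers only a subset of the documented S3 naming rules (length, allowed characters, no adjacent dots, not shaped like an IP address), and it cannot tell you whether the name is already taken. The function name bucket_name_problems is hypothetical.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name):
    """Return a list of problems found in a proposed bucket name (empty if none)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3 to 63 characters long")
    if not _NAME_PATTERN.fullmatch(name):
        problems.append(
            "use only lowercase letters, digits, dots and hyphens, "
            "starting and ending with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent dots")
    if _IP_PATTERN.fullmatch(name):
        problems.append("must not look like an IP address")
    return problems


if __name__ == "__main__":
    for candidate in ["rfmdataworld", "Rfm_Data", "192.168.1.1"]:
        print(candidate, "->", bucket_name_problems(candidate) or "looks fine")
```

An empty list only means the name passed these basic checks; AWS still has the final say when you create the bucket.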

Go on and click the Create bucket button. Once the bucket has been created, you will see a success message, e.g. “Successfully created bucket "rfmdataworld"”.
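
If you would rather script Step 2 than click through the console, the same settings can be applied with boto3, the AWS SDK for Python. This walkthrough itself only uses the console, so treat the following as a rough sketch: it assumes boto3 is installed and your AWS credentials are configured locally, and the function names and the Region value are placeholders of my own.

```python
def create_analysis_bucket(s3_client, bucket_name, region):
    """Create a bucket, block public access, and turn on versioning."""
    params = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**params)

    # Same effect as leaving "Block all public access" ticked in the console.
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # Same effect as enabling Bucket Versioning in the console.
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )


def main():
    import boto3  # third-party package: pip install boto3

    region = "eu-west-1"  # assumed Region; use the one you picked
    bucket_name = "rfmdataworld"  # replace with your own globally unique name
    client = boto3.client("s3", region_name=region)
    create_analysis_bucket(client, bucket_name, region)
    print(f"Created bucket {bucket_name} in {region}")
```

Call main() once your credentials are set up. Passing the client in as an argument keeps the function easy to try out with a stand-in object before you touch a real account.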

Step 3: Select the newly created bucket “rfmdataworld”

Step 4: Upload your data into the bucket

In the Objects tab, click the “Upload” button. You will then be taken to the Upload page.

Select “Add files”. The files in this case are any datasets you already have stored on your computer.
Go ahead and click Upload.
Once the file has been uploaded, a success status such as “Your file has been successfully uploaded” will be shown.
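
The upload in Step 4 can also be done from Python. As before, this is only a sketch under my own assumptions: boto3 installed, credentials configured, and a hypothetical local file name. By default the object key (the file's name inside the bucket) is taken from the local file name.

```python
from pathlib import Path
import tempfile


def upload_dataset(s3_client, local_path, bucket_name, key=None):
    """Upload one local file to a bucket and return the object key used."""
    local_path = Path(local_path)
    if not local_path.is_file():
        raise FileNotFoundError(f"No such file: {local_path}")
    key = key or local_path.name  # default key is the file name
    s3_client.upload_file(str(local_path), bucket_name, key)
    return key


def list_keys(s3_client, bucket_name):
    """Return the object keys in a bucket (first page only, up to 1,000 keys)."""
    response = s3_client.list_objects_v2(Bucket=bucket_name)
    return [item["Key"] for item in response.get("Contents", [])]


def main():
    import boto3  # third-party package: pip install boto3

    client = boto3.client("s3")
    bucket_name = "rfmdataworld"  # replace with your own bucket
    key = upload_dataset(client, "my_dataset.csv", bucket_name)  # hypothetical file
    print(f"Uploaded as {key}")
    print("Objects now in bucket:", list_keys(client, bucket_name))
```

Call main() to run the upload. Listing the keys afterwards plays the same role as the success status in the console: it confirms the file actually landed in the bucket.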