<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abdul Raheem</title>
    <description>The latest articles on DEV Community by Abdul Raheem (@xfarooqi).</description>
    <link>https://dev.to/xfarooqi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F966621%2F78f02e18-93d3-41c2-b657-f9e567790bae.png</url>
      <title>DEV Community: Abdul Raheem</title>
      <link>https://dev.to/xfarooqi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xfarooqi"/>
    <language>en</language>
    <item>
      <title>Extractly - Turn PDFs into Data</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Mon, 15 Sep 2025 06:50:48 +0000</pubDate>
      <link>https://dev.to/xfarooqi/extractly-turn-pdfs-into-data-1gh8</link>
      <guid>https://dev.to/xfarooqi/extractly-turn-pdfs-into-data-1gh8</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Extractly&lt;/strong&gt; is an AI-powered PDF extraction platform that accurately extracts text, tables, and charts from PDFs, preserving their exact format and wording.&lt;/p&gt;

&lt;p&gt;Most open-source libraries break down when faced with complex PDFs, especially SEC filings, financial reports, and compliance documents filled with dense tables and tricky formatting. Cells merge, numbers misalign, and the meaning of entire sections can be lost. Extractly fixes this problem.&lt;/p&gt;

&lt;p&gt;Extractly excels at maintaining table structures and formatting integrity. This matters because even a small misalignment in financial tables can completely change their meaning, which in turn compromises any downstream applications (especially RAG systems).&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;Extractly&lt;/strong&gt;, organizations can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliably extract complex tables and structured data without losing fidelity.&lt;/li&gt;
&lt;li&gt;Ensure clean, LLM-ready data for training or RAG pipelines.&lt;/li&gt;
&lt;li&gt;Build production-grade AI systems that understand documents as they were intended, not as garbled text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By bridging the gap between raw PDFs and accurate, structured data, Extractly enables a new level of trust, precision, and usability in working with critical documents.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Original PDF Content&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Extractly Content&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zsith6mbkg5dufxt9td.png" alt="Original PDF Content"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkn80ngesb4teplkn32f.png" alt="Extractly Content"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Real World Impact
&lt;/h3&gt;

&lt;p&gt;By transforming messy, unstructured PDFs into clean, structured, and reliable data, Extractly unlocks new levels of automation and insight across industries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Finance &amp;amp; Compliance&lt;/strong&gt; → Accurate SEC filing extractions reduce hours of manual review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal &amp;amp; Contracts&lt;/strong&gt; → Precise table preservation ensures no meaning is lost in negotiations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare &amp;amp; Research&lt;/strong&gt; → Extracts lab results and trial data from complex forms with high accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI &amp;amp; RAG Pipelines&lt;/strong&gt; → Produces clean, reliable data that boosts retrieval accuracy and downstream analytics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Live App: &lt;a href="https://extractly-505581424280.us-west1.run.app/" rel="noopener noreferrer"&gt;https://extractly-505581424280.us-west1.run.app/&lt;/a&gt;&lt;br&gt;
Video Recording: &lt;a href="https://youtu.be/rFwgBlbzGXg" rel="noopener noreferrer"&gt;https://youtu.be/rFwgBlbzGXg&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;I used Google AI Studio to quickly turn my backend into a functional app. With the app builder and Gemini 2.5 Pro code assistant, I connected my backend, generated the UI, and set up the necessary connectors in minutes.&lt;/p&gt;

&lt;p&gt;I applied prompt engineering techniques to guide the code assistant in optimizing the user experience. This included refining the UI into a more interactive design, adding components such as file download options, and rendering extracted results directly within the application.&lt;/p&gt;

&lt;p&gt;Google AI Studio saved me a lot of time I would have otherwise spent building the frontend from scratch, while still giving me the freedom to shape the app’s flow and design the way I wanted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Extractly&lt;/strong&gt; uses Gemini 2.5 Pro’s multimodal capabilities to process PDFs that contain a mix of text, tables, and images. Instead of treating a PDF as flat text, Gemini analyzes both the content and the layout, which allows Extractly to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accurately capture tables and complex structures without losing formatting or merging cells.&lt;/li&gt;
&lt;li&gt;Preserve original document fidelity, so financial and legal documents retain their meaning.&lt;/li&gt;
&lt;li&gt;Extract multiple modalities together (text, structured data, and visuals) for a richer, more usable output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By treating PDFs as multimodal objects (text + layout + structure), Extractly ensures that users don’t lose meaning or context when working with complex documents. For users, this means they can trust the extracted data to be LLM-ready, consistent, and production-grade without spending time on manual cleanup.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Deploy Your Static Web App on AWS S3 in just 10 Minutes</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Sun, 15 Sep 2024 08:47:58 +0000</pubDate>
      <link>https://dev.to/aws-builders/deploy-your-static-web-app-on-aws-s3-in-just-10-minutes-23l5</link>
      <guid>https://dev.to/aws-builders/deploy-your-static-web-app-on-aws-s3-in-just-10-minutes-23l5</guid>
      <description>&lt;p&gt;I recently found myself staring at a folder full of HTML, CSS, and JavaScript files—my latest online project was about to go live. But then came the scary question: How do I launch this static web application without getting bogged down in server maintenance and complicated configurations?&lt;/p&gt;

&lt;p&gt;I wanted a solution that was easy, affordable, and scalable. I'd heard of several hosting providers, but most seemed like overkill for a simple static website. After some research, I came across AWS serverless technologies, notably Amazon S3. Skeptical yet intrigued, I decided to give it a try, and I'm glad I did: it turned out to be the ideal option.&lt;/p&gt;

&lt;p&gt;This blog will demonstrate how to solve this problem using AWS Serverless technology, specifically by leveraging Amazon S3 (Simple Storage Service) to host a static website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding AWS Serverless
&lt;/h2&gt;

&lt;p&gt;AWS Serverless is a cloud-native development model that allows developers to build and run applications without thinking about servers. While servers still exist, AWS handles all the server settings, maintenance, and scaling. You simply focus on your website code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Amazon S3 Buckets
&lt;/h2&gt;

&lt;p&gt;An Amazon S3 bucket is a cloud storage resource in AWS's Simple Storage Service (S3), an object storage service offering scalability, data availability, security, and performance. It also provides static website hosting, so you can serve your application serverlessly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Your site's homepage file must be named &lt;code&gt;index.html&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why Use S3 Buckets for Static Websites?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perfect for hosting HTML, CSS, JavaScript, images, and other static files.&lt;/li&gt;
&lt;li&gt;Designed for high durability.&lt;/li&gt;
&lt;li&gt;Pay only for the storage you use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-Step Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Setting Up AWS Account
&lt;/h3&gt;

&lt;p&gt;You'll need an AWS account. If you don't have one, you can create a &lt;a href="https://aws.amazon.com/free/" rel="noopener noreferrer"&gt;Free AWS Account.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Visit AWS Management Console
&lt;/h3&gt;

&lt;p&gt;Go to the &lt;a href="http://console.aws.amazon.com/" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt; and search for S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3nyrq7c5ohss9t0wsckl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3nyrq7c5ohss9t0wsckl.png" alt=" " width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Create a new Bucket
&lt;/h3&gt;

&lt;p&gt;On the S3 homepage, you will see an option to create a new bucket for your website. Click on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9qzbztdii7vsris57ok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9qzbztdii7vsris57ok.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create the bucket with the default settings, but choose its name carefully: a bucket's name cannot be changed later.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Upload your website to Bucket
&lt;/h3&gt;

&lt;p&gt;After creating the bucket, click on it to access the upload option. Click the &lt;code&gt;Upload&lt;/code&gt; button, add your website folder to the bucket, and upload it. Again, make sure your homepage is named &lt;code&gt;index.html&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddz6frhhphor13okyup6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddz6frhhphor13okyup6.png" alt=" " width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05oaqn7u3rwgo6ajt0ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05oaqn7u3rwgo6ajt0ph.png" alt=" " width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Click on the Website Link
&lt;/h3&gt;

&lt;p&gt;After uploading, open your website folder in the bucket. You'll see all your files, including &lt;code&gt;index.html&lt;/code&gt;. Click on &lt;code&gt;index.html&lt;/code&gt; and you'll find a webpage link labeled &lt;code&gt;Object URL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5giwlx1o80pf2jhi70h8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5giwlx1o80pf2jhi70h8.png" alt=" " width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But clicking on it will result in the following error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This XML file does not appear to have any style information associated with it. The document tree is shown below.
&amp;lt;Error&amp;gt;
&amp;lt;Code&amp;gt;AccessDenied&amp;lt;/Code&amp;gt;
&amp;lt;Message&amp;gt;Access Denied&amp;lt;/Message&amp;gt;
&amp;lt;RequestId&amp;gt;04JFN7P4G3PAKWRX&amp;lt;/RequestId&amp;gt;
&amp;lt;HostId&amp;gt;Kdv84AFKxknddRTFla2/aeGypn/pqig7MRHi9QtVk1PD2IMoVAxG59+gI4+R8Svlv8fDQ7l3x6k=&amp;lt;/HostId&amp;gt;
&amp;lt;/Error&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why so?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When we created the bucket, its permissions were set to block public read access, preventing users from accessing your files.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Update Bucket Settings
&lt;/h3&gt;

&lt;p&gt;To make our website publicly accessible, we need to do the following two things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a. Allow Public Access&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For this, visit your bucket, open the &lt;code&gt;Permissions&lt;/code&gt; tab, edit &lt;code&gt;Block public access (bucket settings)&lt;/code&gt;, and uncheck &lt;code&gt;Block all public access&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flth2no4jk6056v6ofajv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flth2no4jk6056v6ofajv.png" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0oy4ei6cky79y1bb5vyl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0oy4ei6cky79y1bb5vyl.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b. Update Bucket Policy&lt;/strong&gt;&lt;br&gt;
In the same Permissions tab, you'll find a section for the bucket policy. Add the following policy there and save the changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note: Replace &lt;code&gt;your-bucket-name&lt;/code&gt; in the &lt;code&gt;Resource&lt;/code&gt; ARN above with your actual bucket name&lt;/strong&gt;&lt;/p&gt;
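&lt;p&gt;If you'd rather not hand-edit the JSON, the policy can also be generated programmatically. Here's a minimal Python sketch (the &lt;code&gt;public_read_policy&lt;/code&gt; helper is illustrative, not part of any AWS SDK) that fills in your bucket name and prints the policy ready to paste:&lt;/p&gt;

```python
import json

def public_read_policy(bucket_name):
    """Build a public-read bucket policy for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The /* suffix applies the rule to every object in the bucket
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }

print(json.dumps(public_read_policy("my-static-site"), indent=2))
```

&lt;p&gt;Paste the printed JSON into the bucket policy editor and save your changes.&lt;/p&gt;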

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgpu0vf2f1vong0fv672.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgpu0vf2f1vong0fv672.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Check index.html
&lt;/h3&gt;

&lt;p&gt;After completing all the steps, navigate to your website folder and click on the &lt;code&gt;index.html&lt;/code&gt; file. &lt;/p&gt;

&lt;p&gt;You'll find the previous link there—click on it, and this time, you'll see your application running on a cloud serverless platform with minimal effort.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91x35swdinfm1oy3fwmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91x35swdinfm1oy3fwmt.png" alt=" " width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hosting a static website with AWS S3 is an easy and quick solution. By creating a bucket, uploading your files, and adjusting relevant settings, you can quickly deploy your site without worrying about server management. It’s a simple and efficient way to launch a static website using AWS Serverless technologies.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>staticwebapps</category>
      <category>serverless</category>
      <category>s3</category>
    </item>
    <item>
      <title>Deploy Your LLM on AWS EC2</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Sat, 14 Sep 2024 20:31:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/deploy-your-llm-on-aws-ec2-2ig3</link>
      <guid>https://dev.to/aws-builders/deploy-your-llm-on-aws-ec2-2ig3</guid>
      <description>&lt;p&gt;Ever been excited to deploy your own Large Language Model(LLM) but hit a wall because your laptop isn't up to the task? I know exactly how that feels. LLMs are powerful, but they demand a lot of computing power—something most of us just don’t have on hand.&lt;/p&gt;

&lt;p&gt;I ran into this problem myself. I wanted to create a cool application using LLMs that my friends could use, but the idea of buying an expensive GPU just to make it work was out of the question. So, I started looking for a workaround. That's when I stumbled upon AWS.&lt;/p&gt;

&lt;p&gt;Using AWS turned out to be a lifesaver. I didn’t need to invest in any fancy hardware; instead, AWS lets you pay only for what you use. Now I have three different LLM applications running in the cloud, and I didn’t have to spend a ton of money on equipment.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk you through how you can set up your own LLMs on the cloud step-by-step and will share some tips to save your costs as well. If I can do it, you can too!&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Before deploying on AWS, we need to understand how LLM compute requirements work and which instance type is a good fit for us.&lt;/p&gt;

&lt;p&gt;I will be using an LLM-based RAG application built with Streamlit, using Meta's Llama-2-7B model.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Understanding the Right Resources
&lt;/h3&gt;

&lt;p&gt;Since each Large Language Model (LLM) has a different number of parameters and may use different numerical precisions, they require varying GPU capabilities for inference and fine-tuning.&lt;/p&gt;

&lt;p&gt;A simple rule of thumb to estimate how much GPU memory is needed to store the model parameters during inference of any open-source LLM is as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;GPU Memory (in bytes) = Number of model parameters × Bits per parameter ÷ 8 (bits per byte)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, for a 7-billion-parameter model using 32-bit floating-point numbers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPU Memory&lt;/strong&gt; = 7,000,000,000 parameters × 32 bits per parameter ÷ 8 bits per byte&lt;br&gt;
&lt;strong&gt;GPU Memory&lt;/strong&gt; = 28,000,000,000 bytes&lt;br&gt;
&lt;strong&gt;GPU Memory&lt;/strong&gt; = 28 GB&lt;/p&gt;

&lt;p&gt;However, this requirement can be quite high. To reduce the memory footprint, we can use quantization techniques. For instance, using 4-bit quantization:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPU Memory&lt;/strong&gt; = 7,000,000,000 parameters × 4 bits per parameter ÷ 8 bits per byte&lt;br&gt;
&lt;strong&gt;GPU Memory&lt;/strong&gt; = 3,500,000,000 bytes&lt;br&gt;
&lt;strong&gt;GPU Memory&lt;/strong&gt; = 3.5 GB&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Therefore, quantizing the model to 4 bits reduces the GPU memory requirement to approximately 3.5 GB.&lt;/strong&gt;&lt;/p&gt;
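&lt;p&gt;The rule of thumb above is easy to sanity-check with a few lines of Python (the &lt;code&gt;gpu_memory_gb&lt;/code&gt; helper is illustrative):&lt;/p&gt;

```python
def gpu_memory_gb(num_params, bits_per_param):
    """Estimate the GPU memory (in GB) needed to hold the model weights."""
    bytes_needed = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_needed / 1_000_000_000             # bytes to GB

# Llama-2-7B at full 32-bit precision
print(gpu_memory_gb(7_000_000_000, 32))  # 28.0
# The same model with 4-bit quantization
print(gpu_memory_gb(7_000_000_000, 4))   # 3.5
```

&lt;p&gt;Keep in mind this covers the weights only; inference also needs extra memory for activations and the KV cache, so leave some headroom.&lt;/p&gt;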
&lt;h3&gt;
  
  
  2. Which Compute To Choose?
&lt;/h3&gt;

&lt;p&gt;AWS GPU-based instance families such as g4, g5, p3, and p4 deliver some of the highest performance in Amazon EC2 for deep learning and high-performance computing (HPC).&lt;/p&gt;

&lt;p&gt;Some examples of compute and cost are as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Type&lt;/th&gt;
&lt;th&gt;GPU Type&lt;/th&gt;
&lt;th&gt;GPU Memory (GB)&lt;/th&gt;
&lt;th&gt;vCPUs&lt;/th&gt;
&lt;th&gt;On-Demand Price (per hour)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;g4dn.xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA T4&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;$0.526&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA A10G&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;$1.006&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p3.2xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA V100&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;$3.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g4dn.12xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA T4&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;$4.344&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.8xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA A10G&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;$8.288&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p3.8xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA V100&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;$12.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p4d.24xlarge&lt;/td&gt;
&lt;td&gt;NVIDIA A100&lt;/td&gt;
&lt;td&gt;320&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;$32.77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
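&lt;p&gt;Given a memory estimate, picking the cheapest instance that fits is straightforward. Here's a small Python sketch (the &lt;code&gt;cheapest_instance&lt;/code&gt; helper and the hard-coded rows are illustrative, drawn from a subset of the table above):&lt;/p&gt;

```python
# (instance type, GPU memory in GB, on-demand $/hour), a few rows from the table above
INSTANCES = [
    ("g4dn.xlarge", 16, 0.526),
    ("g5.xlarge", 24, 1.006),
    ("p3.2xlarge", 16, 3.06),
    ("p3.8xlarge", 64, 12.24),
]

def cheapest_instance(required_gb, instances=INSTANCES):
    """Return the lowest-cost instance with enough GPU memory, or None."""
    fitting = [inst for inst in instances if inst[1] >= required_gb]
    return min(fitting, key=lambda inst: inst[2]) if fitting else None

# A 4-bit quantized 7B model needs about 3.5 GB
print(cheapest_instance(3.5))   # ('g4dn.xlarge', 16, 0.526)
# The full-precision model needs about 28 GB
print(cheapest_instance(28.0))  # ('p3.8xlarge', 64, 12.24)
```

&lt;p&gt;This is exactly the trade-off the quantization math captures: shrinking the weights from 32-bit to 4-bit lets you run on an instance that costs a fraction of the price per hour.&lt;/p&gt;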
&lt;h2&gt;
  
  
  Step-by-Step Guide
&lt;/h2&gt;

&lt;p&gt;Now that we've covered the math and know the resources we need, let's deploy the LLM application on AWS.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step - 1
&lt;/h2&gt;

&lt;p&gt;Search EC2 on your AWS Console. You will see a similar page for EC2.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcd0w6rvyd6bjj9q202c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcd0w6rvyd6bjj9q202c.png" alt="AWS EC2" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step - 2
&lt;/h2&gt;

&lt;p&gt;Click on Instances in the sidebar, then click Launch instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ayomkwurjfqqbimdyhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ayomkwurjfqqbimdyhh.png" alt="EC2 Instance" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step - 3
&lt;/h2&gt;

&lt;p&gt;Configure the EC2 instance using the following setting and launch a new instance. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Your-Application-Name&lt;br&gt;
&lt;strong&gt;Amazon Machine Image (AMI):&lt;/strong&gt; Ubuntu-Latest&lt;br&gt;
&lt;strong&gt;Instance Type:&lt;/strong&gt; g4dn.xlarge&lt;br&gt;
&lt;strong&gt;Key Pair:&lt;/strong&gt; Create a New Key Pair and use that&lt;br&gt;
&lt;strong&gt;Network Setting:&lt;/strong&gt; Use default with the addition of &lt;code&gt;"Allow HTTPS traffic from the internet"&lt;/code&gt; &amp;amp; &lt;code&gt;"Allow HTTP traffic from the internet"&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; 16 GB&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum1u3idwz9ta7dvs2zll.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum1u3idwz9ta7dvs2zll.png" alt="Configure-EC2-1" width="800" height="935"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs413llg1st1wwd6ibzac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs413llg1st1wwd6ibzac.png" alt="Configure-EC2-2" width="782" height="1230"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step - 4 (Only For Streamlit Application)
&lt;/h2&gt;

&lt;p&gt;Go to the recently launched instance and define an inbound rule. Since Streamlit applications run on port &lt;strong&gt;8501&lt;/strong&gt;, we have to allow that port.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmehlxft6i29ue6plgni9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmehlxft6i29ue6plgni9.png" alt="Port Changing" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, we click on "Edit inbound rules", add the following Custom TCP rule for port &lt;strong&gt;8501&lt;/strong&gt;, and click &lt;strong&gt;Save rules&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7zvsm37sz2y2mw8km4r6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7zvsm37sz2y2mw8km4r6.png" alt="Port Changing 2" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step - 5
&lt;/h2&gt;

&lt;p&gt;Go back to the EC2 instances page, click the Connect button, and connect to your newly created instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft39sflxxjzxcogi9myda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft39sflxxjzxcogi9myda.png" alt="EC2 Conf" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2h9ylz65udr3mpbt14l5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2h9ylz65udr3mpbt14l5.png" alt="EC2 Conf" width="800" height="810"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step - 6
&lt;/h2&gt;

&lt;p&gt;Once connected, you will see a terminal. Run the following commands to install updates and the required dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commands:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt-get update
sudo apt upgrade -y
sudo apt install git curl unzip tar make sudo vim wget -y
sudo apt install python3-pip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ljv9fk9k4pcjjqi6wr7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ljv9fk9k4pcjjqi6wr7.png" alt="Dependencies" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step - 7
&lt;/h2&gt;

&lt;p&gt;Once all the dependencies are installed, you need to clone (download) your Streamlit application and its requirements from your GitHub repo.&lt;/p&gt;

&lt;p&gt;You can clone your repo using the following command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git clone "Your-repository"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Once cloned, you can enter your repo using the following command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd "Your-repository"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwx39n32rc0xcuyur08f6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwx39n32rc0xcuyur08f6.png" alt="GitHub Repo Clone" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step - 8
&lt;/h2&gt;

&lt;p&gt;Next, set up a virtual environment so that all the dependencies for the Streamlit application are installed in isolation. This helps avoid conflicts with system-wide Python packages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install python3-venv
python3 -m venv venv
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the virtual environment is activated, we need to install the requirements for the Streamlit application. For this, we will use &lt;code&gt;pip&lt;/code&gt; to read the &lt;code&gt;requirements.txt&lt;/code&gt; file and install all the libraries listed in it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install -r requirements.txt&lt;/code&gt;&lt;/p&gt;
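
&lt;p&gt;For reference, a minimal &lt;code&gt;requirements.txt&lt;/code&gt; for a Streamlit-based LLM app might look like the following. The exact packages and versions depend on your own application; these entries are purely illustrative:&lt;/p&gt;

```text
streamlit
openai
python-dotenv
```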

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhr2ugzfykjdpiw1xz6ot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhr2ugzfykjdpiw1xz6ot.png" alt="Requirements" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step - 9
&lt;/h2&gt;

&lt;p&gt;Once everything is installed and ready, it's time to run your LLM-based application.&lt;/p&gt;

&lt;p&gt;For this, you will run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -m streamlit run &amp;lt;your-application-name&amp;gt;
python3 -m streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0e5xgmkopm34w2ajb4ok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0e5xgmkopm34w2ajb4ok.png" alt="Running LLM" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you click on the External URL, you will be redirected to your Streamlit application. This is the public link that you can share with your friends so they can enjoy your application as well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voila!!!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgj50m5to6hfx4dzcewl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgj50m5to6hfx4dzcewl5.png" alt="Streamlit Application" width="800" height="679"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above command keeps the application running only while your terminal session is connected. Once the terminal is closed, the application stops as well.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To make sure the application keeps running after the terminal is disconnected, run the following command instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;nohup python3 -m streamlit run app.py &amp;amp;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;nohup&lt;/code&gt; command ignores the hangup signal, so the application keeps running even if you log out or lose your terminal session; by default, its output is appended to &lt;code&gt;nohup.out&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Deploying LLMs doesn’t have to be limited by your hardware. AWS offers a flexible, cost-effective way to harness the power of LLMs without investing in expensive GPUs. With the right instance and some optimization techniques like quantization, you can run sophisticated models in the cloud efficiently. By following this guide, you’re now equipped to deploy your own LLM-based application, making it accessible and scalable. So, dive into cloud deployment, share your innovations effortlessly, and let your application reach its full potential!&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/ec2/instance-types/" rel="noopener noreferrer"&gt;https://aws.amazon.com/ec2/instance-types/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/ec2/index.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/ec2/index.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.streamlit.io/" rel="noopener noreferrer"&gt;https://docs.streamlit.io/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>ec2</category>
      <category>aws</category>
    </item>
    <item>
      <title>Deequ: Your Data's BFF</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Fri, 23 Aug 2024 12:32:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/deequ-your-datas-bff-372f</link>
      <guid>https://dev.to/aws-builders/deequ-your-datas-bff-372f</guid>
      <description>&lt;p&gt;Data quality is crucial for reliable applications, whether you’re training accurate machine learning models or ensuring that your insights and decisions are based on trustworthy and accurate data. Issues like missing values, data distribution shifts, and incorrect data can lead to malfunctions, inaccurate machine-learning models, and bad business decisions. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Deequ and Why It’s Important?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Deequ is a library built on Apache Spark that allows you to create "unit tests for data," helping you check and measure data quality in large datasets. Deequ is used internally at Amazon to ensure the quality of various large production datasets. &lt;/p&gt;

&lt;p&gt;At AWS, the relevant team members define data quality constraints, and the system regularly computes the metrics and enforces the rules, pushing datasets on to ML models only when the checks succeed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dskhb0oh9pdcfqi7pgc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dskhb0oh9pdcfqi7pgc.png" alt="Image Source: AWS" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best part&lt;/strong&gt; is that if we own a large number of datasets, or if our dataset has many columns, it may be challenging to manually define appropriate constraints. Deequ can automatically generate useful constraints by analyzing the data distribution: it begins with data profiling and then applies a series of rules to the results. We will see this in detail in the practical part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moreover,&lt;/strong&gt; Deequ leverages Spark to compute and provide direct access to data quality metrics like completeness and correlation through an optimized set of aggregation queries. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Purpose of Using Deequ?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Deequ's purpose is to "unit-test" data to find errors early, before the data is fed to consuming systems or machine learning algorithms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some of the benefits of using Deequ are as follows:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Early data error detection&lt;/li&gt;
&lt;li&gt;Improve data reliability and trustworthiness&lt;/li&gt;
&lt;li&gt;Automated data validation&lt;/li&gt;
&lt;li&gt;Improved data integrity&lt;/li&gt;
&lt;li&gt;Streamlined data profiling&lt;/li&gt;
&lt;li&gt;Integration with Spark (scalability + efficiency)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Ideal Datasets for Deequ?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Deequ is useful for datasets that are meant to be consumed by machines or used in data analysis; in simple words, we can use Deequ for any dataset that fits into a Spark DataFrame.&lt;/p&gt;

&lt;p&gt;This includes data stored in tables, such as spreadsheets, or databases with a well-defined schema. Deequ confirms data quality by applying pre-defined or automatically suggested constraints to ensure consistency, accuracy, and completeness. In short, it is designed for structured data with clearly defined attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Deequ's strength lies in handling massive datasets efficiently. Its distributed processing power will not be fully utilized on a dataset of only a few thousand rows, while setting up and managing Spark clusters adds overhead that can slow down the overall processing pipeline. The advantages of automated checks and scalability aren't relevant at such a small size, because similar results can be achieved with lower computational cost and minimal manual effort.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Deequ excels at ensuring data quality in batch data processing rather than streaming data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Pros &amp;amp; Cons of Deequ
&lt;/h2&gt;

&lt;p&gt;Before diving into Deequ, it’s important to weigh its pros and cons to understand how it fits your data quality needs and what challenges you might face.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Declarative API:&lt;/strong&gt; Easy to use: we declare what we expect the data to look like or how it should behave, rather than writing complex validation checks manually.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt; Instead of writing complex code to check if a column has missing values, we can simply say "This column should not have missing values" in Deequ's declarative language. This makes it easier for our team to understand and maintain our data validation checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Metrics and Constraints:&lt;/strong&gt; Provides various data quality metrics and allows defining constraints based on those metrics for comprehensive data analysis.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt; We can define constraints on the number of missing values allowed in a column (e.g., "no more than 5% missing values"). Deequ will calculate the actual percentage of missing values and compare it to our constraint, highlighting any violations. Additionally, we can define constraints for data distribution (e.g., "ensure the age column has a normal distribution"), allowing for comprehensive data analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Leverages Apache Spark for distributed processing, making it efficient for big data (billions of rows).&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt; Imagine a dataset with billions of customer records. Validating this data locally on a single machine would be slow and impractical. Deequ utilizes Apache Spark, which distributes the data and validation tasks across multiple machines in a cluster. This allows Deequ to handle massive datasets efficiently, analyzing each record in parallel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automation:&lt;/strong&gt; Integrates with ML pipelines for automatic data validation, catching issues early and preventing downstream problems.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt; We can integrate Deequ into our machine-learning workflow. Before training our model, Deequ automatically validates the data, catching issues like missing values or unexpected data formats. This helps prevent us from training a model on bad data, potentially leading to inaccurate results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-Source:&lt;/strong&gt; Freely available and customizable to specific needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
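
&lt;p&gt;The "metrics and constraints" idea from point 2 can be pictured as a predicate over a computed metric. This plain-Python sketch (again not the Deequ API; the sample data is made up) shows the shape of such a constraint:&lt;/p&gt;

```python
def missing_ratio(values):
    """Metric: fraction of missing (None) entries in a column."""
    return sum(v is None for v in values) / len(values)

def check_max_missing(values, threshold=0.05):
    """Constraint: the column may contain at most `threshold` missing values."""
    ratio = missing_ratio(values)
    status = "Success" if ratio <= threshold else "Failure"
    return status, ratio

ages = [25, 31, None, 47, 52, 29, 38, 44, 36, 41]  # 1 of 10 values missing
print(check_max_missing(ages))                      # ('Failure', 0.1)
print(check_max_missing([25, 31, 47]))              # ('Success', 0.0)
```

&lt;p&gt;Deequ evaluates exactly this kind of predicate, except the metric is computed by Spark across the cluster and the result is reported alongside all the other checks.&lt;/p&gt;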

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Learning Curve:&lt;/strong&gt; Requires some understanding of Apache Spark and data quality concepts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Out-of-the-Box Rules:&lt;/strong&gt; While Deequ offers a good set of metrics, we might need to write custom rules for complex validation needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overhead for Small Datasets:&lt;/strong&gt; Setting up Deequ involves some initial configuration and code, plus cloud compute costs if applicable. For very small datasets (say, 2,000 rows), the time and effort spent setting up Deequ might outweigh the benefits of automated data validation.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Practical Work
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxy7w5104s9t3ievgc4k.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxy7w5104s9t3ievgc4k.gif" alt="Funny Cat GIF From https://giphy.com/" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this practical work, we will use PyDeequ, an open-source Python wrapper over Deequ (an open-source tool developed and used at Amazon). While Deequ is written in Scala, PyDeequ allows us to use its data quality and testing capabilities from Python and PySpark, the language of choice of many data scientists.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts1ko1j9s2v2qtgq0hu6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts1ko1j9s2v2qtgq0hu6.png" alt="Image Source: AWS" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can call each Deequ function using Python syntax. The wrappers translate the commands to the underlying Deequ calls and return their response.&lt;/p&gt;
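
&lt;p&gt;As a taste of that Python syntax, here is a sketch of a PyDeequ verification run along the lines of the checks used later in this post. It assumes a working Spark installation plus the &lt;code&gt;pydeequ&lt;/code&gt; package, and the data path is hypothetical, so treat it as illustration rather than copy-paste:&lt;/p&gt;

```python
from pyspark.sql import SparkSession

import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

# Spark session wired up with the Deequ jar (requires a JVM + Spark install).
spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

df = spark.read.parquet("yellow_tripdata.parquet")  # hypothetical path

check = Check(spark, CheckLevel.Error, "NYC TLC review")
result = (VerificationSuite(spark)
          .onData(df)
          .addCheck(check
                    .hasSize(lambda n: n >= 2_000_000)  # record count
                    .isComplete("VendorID")             # completeness
                    .isUnique("VendorID")               # uniqueness
                    .isNonNegative("trip_distance"))    # non-negativity
          .run())

VerificationResult.checkResultsAsDataFrame(spark, result).show()
```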

&lt;h2&gt;
  
  
  Step-by-Step Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Setting Up Pyspark:
&lt;/h3&gt;

&lt;p&gt;I used the &lt;a href="https://github.com/XFarooqi/deequ" rel="noopener noreferrer"&gt;PyDeequ Notebook&lt;/a&gt;, ran it in AWS SageMaker, and set up the environment as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install necessary Python packages: pydeequ, sagemaker-pyspark, and pyspark.&lt;/li&gt;
&lt;li&gt;Download and set up Java JDK (OpenJDK 11).&lt;/li&gt;
&lt;li&gt;Set the JAVA_HOME environment variable and update the PATH.&lt;/li&gt;
&lt;li&gt;Verify Java installation.&lt;/li&gt;
&lt;li&gt;Initialize a Spark session with PyDeequ configurations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8nbb98vdpsa82dzu0lm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8nbb98vdpsa82dzu0lm.png" alt="Setting Up" width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqx6fqqr6n329uw58lczd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqx6fqqr6n329uw58lczd.png" alt="Setting Up" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Loading Dataset &amp;amp; Visualizing Schema:
&lt;/h3&gt;

&lt;p&gt;For the dataset, I used an open-source dataset of &lt;strong&gt;NYC TLC Trip.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cwr1kzewrktnmx6k54b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cwr1kzewrktnmx6k54b.png" alt="Load Dataset" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Data Analysis:
&lt;/h3&gt;

&lt;p&gt;Before we define checks on the dataset, we want to calculate some statistics on the dataset; we call them metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favfru0n6cs7hb2t184kg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favfru0n6cs7hb2t184kg.png" alt="Data Analysis" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output table shows the results of various data features, such as its size, completeness, distinct counts, mean values, compliance checks, and correlations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From the above data metrics, we learned the following:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Observation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance of Long Trips&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only 6.53% of trips are classified as long trips.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mean Trip Distance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The average trip distance is approximately 5.37 miles.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataset Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The dataset contains approximately 2,463,931 records.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VendorID Completeness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The VendorID column is 100% complete, with no missing values.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Correlation between Total Amount and Fare Amount&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;There is a very high correlation (0.9999) between total_amount and fare_amount, indicating almost perfect correlation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Correlation between Fare Amount and Trip Distance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;There is a very low correlation (0.0004) between fare_amount and trip_distance, indicating little to no linear relationship.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approximate Count of Distinct VendorIDs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;There are approximately 4 distinct values in the VendorID column.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  4. Define and Run Tests on Data:
&lt;/h3&gt;

&lt;p&gt;After analyzing the data, it's important to make sure the same properties hold in future datasets. By adding some data quality checks to the pipeline, we can ensure every dataset is reliable for any application that uses it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97ygupnzh7b6cwkm5ee0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97ygupnzh7b6cwkm5ee0.png" alt="Run Test" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkjuh28qt1kxiq9zarvw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkjuh28qt1kxiq9zarvw.png" alt="Tests Output" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Check&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Outcome&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Record Count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Passed, with over 2,000,000 entries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Completeness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All columns (VendorID, payment_type, etc.) passed the completeness check.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uniqueness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Failed for VendorID, indicating duplicate values exist.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Value Range&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Passed for VendorID and payment_type (with 96% of payment_type values within ["1", "2"]).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Non-Negativity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Passed for DOLocationID, PULocationID, and trip_distance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; VendorID failed the uniqueness check, showing duplicate values. All other checks were successful.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  5. Automated Constraints Generation:
&lt;/h3&gt;

&lt;p&gt;Remember, as we discussed earlier, we can automatically generate certain data quality checks for the dataset. Let's see how it works in action.&lt;/p&gt;
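
&lt;p&gt;In PyDeequ this comes from the suggestions module. A sketch of the call (assuming a live Spark session &lt;code&gt;spark&lt;/code&gt; and DataFrame &lt;code&gt;df&lt;/code&gt; as set up earlier, so not runnable on its own):&lt;/p&gt;

```python
from pydeequ.suggestions import ConstraintSuggestionRunner, DEFAULT

suggestions = (ConstraintSuggestionRunner(spark)
               .onData(df)
               .addConstraintRule(DEFAULT())  # apply the full default rule set
               .run())

# Each suggestion pairs a profiled column with ready-to-use check code.
for s in suggestions["constraint_suggestions"]:
    print(s["column_name"], "->", s["code_for_constraint"])
```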

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv85cq3npyc9vbg8srguz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv85cq3npyc9vbg8srguz.png" alt="Constraints" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, you can apply some of these constraints to your data to ensure it meets the quality standards and performs well under these checks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion:
&lt;/h2&gt;

&lt;p&gt;Deequ is a powerful tool for automating data quality checks at scale, ensuring reliable and accurate datasets for better decision-making.&lt;/p&gt;

&lt;p&gt;That’s it for today’s Deequ blog! I hope you found it insightful and learned something new. For more information, detailed documentation, and the original code, feel free to explore the following pages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/" rel="noopener noreferrer"&gt;Deequ&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/big-data/testing-data-quality-at-scale-with-pydeequ/" rel="noopener noreferrer"&gt;PyDeequ&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/awslabs/deequ" rel="noopener noreferrer"&gt;Deequ GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/awslabs/python-deequ" rel="noopener noreferrer"&gt;Python-Deequ GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>deeque</category>
      <category>aws</category>
      <category>data</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Mastering Amazon ECS: Key Building Blocks Explained</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Thu, 13 Jul 2023 08:10:01 +0000</pubDate>
      <link>https://dev.to/xfarooqi/mastering-amazon-ecs-key-building-blocks-explained-538l</link>
      <guid>https://dev.to/xfarooqi/mastering-amazon-ecs-key-building-blocks-explained-538l</guid>
      <description>&lt;p&gt;Amazon Elastic Container Service (ECS) is a comprehensive managed service offered by AWS, specifically designed to facilitate the seamless execution of containers in the cloud environment. ECS relieves developers from the burden of configuring intricate infrastructure settings, enabling them to focus solely on their application code. Whether it involves deploying a straightforward website or orchestrating elaborate distributed microservices with an extensive container fleet, ECS streamlines the entire process effortlessly.&lt;/p&gt;

&lt;p&gt;Getting started with ECS is straightforward. To fully understand how it works and how you can use it, it helps to understand the basic building blocks of ECS and how they fit together!&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon EC2
&lt;/h2&gt;

&lt;p&gt;First, we need to understand EC2 building blocks because when it comes to launching your containerized applications with Amazon ECS, you have the flexibility to choose between two launch types: EC2 and Fargate. While both options offer powerful capabilities, Fargate takes container management to the next level by abstracting away the complexities of Amazon EC2 instances. With Fargate, you can shift your focus towards managing tasks, rather than worrying about the underlying infrastructure components. In this blog post, we'll delve into the key features and benefits of both launch types, highlighting the advantages of Fargate's simplified approach to containerization within ECS.&lt;/p&gt;
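
&lt;p&gt;To make the Fargate idea concrete: ECS learns what to run from a task definition. A minimal Fargate-compatible task definition might look like the following (the family name, container image, and CPU/memory sizes here are illustrative, not a recommendation):&lt;/p&gt;

```json
{
  "family": "my-web-app",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80 }]
    }
  ]
}
```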

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhisffmxxpajbjq16ll7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhisffmxxpajbjq16ll7.png" alt=" " width="800" height="649"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Instance
&lt;/h3&gt;

&lt;p&gt;EC2 instances serve as reliable virtual machines (VMs) that offer a wide range of benefits. Notably, you can seamlessly connect to these instances using SSH, ensuring efficient management and control over your containerized applications. The beauty of EC2 lies in its ability to accommodate diverse customer requirements, encompassing memory, storage, and computing power. With numerous instance types available, you can easily find the perfect fit for your specific needs. If you're looking to run a small application or explore a free trial, the t2.micro instance type is an excellent choice. On the other hand, for memory-intensive workloads, options like R3 and X1 instances provide optimal performance. Furthermore, there is a rich assortment of additional instance types tailored to cater to a wide array of use cases.&lt;/p&gt;

&lt;p&gt;With this multitude of EC2 instance types at your disposal, Amazon ECS empowers you to select the ideal configuration that aligns with your application's requirements. Whether you seek lightweight experimentation or robust, memory-optimized operations, EC2 instances offer the flexibility and scalability needed to fulfill your computing needs effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. AMI
&lt;/h3&gt;

&lt;p&gt;AMI stands for Amazon Machine Image. In simple terms, an AMI acts as a vital source of information required to launch an instance. It encompasses critical aspects such as the root volume, launch permissions, and volume-attachment specifications. When it comes to AMI selection, you have multiple options at your disposal. &lt;strong&gt;AWS offers a variety of Linux and Windows AMIs that you can readily leverage.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Alternatively, you can explore AMIs created and shared by the vibrant user community or browse through the extensive offerings in the AWS Marketplace. For instance, you might consider the Amazon ECS-Optimized AMI specifically designed to enhance ECS deployments. Additionally, if none of the existing options meet your requirements, you can even create your own custom AMI. By carefully choosing the appropriate AMI, you lay a solid foundation for successful instance creation within Amazon ECS. This step ensures that your instances possess the necessary configurations, permissions, and specifications to support your desired workload effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Region
&lt;/h3&gt;

&lt;p&gt;The expansive world of AWS is divided into regions, encompassing distinct geographic areas across the globe. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These regions include us-east-1 (N. Virginia), us-west-2 (Oregon), eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), and many more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each AWS region is meticulously designed to ensure complete isolation from one another. They consist of multiple independent data centers, forming an intricate network that creates a substantial "blast radius" in the event of failure. This means that even if an entire region experiences an outage, the others remain unaffected, safeguarding your operations from widespread disruptions. By strategically choosing the appropriate AWS region, you will have a solid foundation for your ECS operations, ensuring reliable and efficient container management within a geographically optimized environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Availability Zone
&lt;/h3&gt;

&lt;p&gt;AWS regions are further divided into distinct units called Availability Zones. Each region comprises a minimum of two zones, and in some cases, multiple zones are available. These zones are physically isolated from one another and encompass different data centers within the region. However, they are interconnected via high-speed, low-latency fiber-optic networks and share certain underlying facilities.&lt;/p&gt;

&lt;p&gt;The design of EC2 emphasizes mitigating the impact of common failures, ensuring that they are contained within a single zone and do not result in region-wide outages. By distributing your services across multiple zones and distributing workloads across hosts, you can achieve a high level of availability within a region.&lt;/p&gt;

&lt;p&gt;This architecture provides a robust and fault-tolerant infrastructure, enabling you to design resilient applications that can withstand failures at the zone level. By strategically leveraging multiple Availability Zones, you can enhance the availability and reliability of your AWS services, ultimately delivering a seamless experience to your users.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Virtual Private Cloud (VPC)
&lt;/h3&gt;

&lt;p&gt;The Amazon EC2-VPC (Elastic Compute Cloud - Virtual Private Cloud) capability allows you to construct a logically isolated virtual network environment within the AWS cloud. It gives you complete control over your virtual networking resources and allows you to customize your network settings to meet your specific needs.&lt;/p&gt;

&lt;p&gt;Here are some key aspects and benefits of EC2-VPC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation and Security:&lt;/strong&gt; EC2-VPC enables logical isolation from other networks, enhancing security and privacy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subnets:&lt;/strong&gt; Divide your virtual network into subnets for IP address allocation and network segmentation, improving security and resource management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing:&lt;/strong&gt; Manage routing tables to control traffic flow between subnets and the internet, allowing for complex network architectures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internet Gateway:&lt;/strong&gt; Facilitate outbound internet access and inbound traffic from the internet with an internet gateway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Groups:&lt;/strong&gt; Define virtual firewalls to control inbound and outbound traffic at the instance level, ensuring network security.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network ACLs:&lt;/strong&gt; Implement stateless packet filters at the subnet level to add an extra layer of network security.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Amazon ECS building blocks
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Containers
&lt;/h3&gt;

&lt;p&gt;The most important building block of ECS is the container. A common misconception is that containers are virtual machines; they are not. While virtual machines virtualize the hardware, containers take virtualization a step further by virtualizing the operating system. Containers are simply processes running on the host system, isolated through kernel constructs such as namespaces and cgroups. The detailed inner workings of containers are beyond the scope of this post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdslovhpzo9lyxw0hd7b8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdslovhpzo9lyxw0hd7b8.png" alt=" " width="508" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Do We Need Containers?
&lt;/h3&gt;

&lt;p&gt;Containers provide a game-changing capability of building, shipping, and running your code effortlessly across diverse environments.&lt;/p&gt;

&lt;p&gt;In the pre-cloud era, self-hosting necessitated the purchase of physical machines, setting up and configuring the operating system (OS), and finally executing your code. However, with the advent of virtualization in the cloud, the process became streamlined by eliminating the hardware concerns and allowing direct focus on OS setup and code execution. Containers take this convenience a step further by simplifying the process to just running your code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Advantages
&lt;/h4&gt;

&lt;p&gt;A key advantage of containers is their ability to package all dependencies along with the code in what is known as an image. This self-contained package enables containers to be deployed on any host machine seamlessly. From an external perspective, hosts appear as holders of multiple containers, all sharing a generic nature that allows them to be deployed on any host.&lt;/p&gt;

&lt;p&gt;Within the realm of ECS, you can effortlessly run your containerized code and applications across a managed cluster of EC2 instances. This powerful capability empowers you to leverage the scalability and flexibility of AWS infrastructure while seamlessly managing and orchestrating your containerized workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Container instance
&lt;/h3&gt;

&lt;p&gt;An ECS container instance possesses distinct characteristics, including a precisely defined IAM policy and role, tailored to facilitate seamless integration with the ECS service. Additionally, these container instances are registered into your ECS cluster, forming an essential part of the overall infrastructure.&lt;/p&gt;

&lt;p&gt;As you may have anticipated, within these instances, containers come into play. It is within the ECS container instances that you execute and manage your containerized workloads, leveraging the flexibility and scalability provided by the underlying EC2 infrastructure. By understanding the unique composition of ECS container instances and their pivotal role in the ECS ecosystem, you can harness the power of containerization to effectively deploy and orchestrate your applications with ease and precision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster
&lt;/h3&gt;

&lt;p&gt;In ECS, a cluster is a grouping of container instances (or tasks, in the case of Fargate) within a specific AWS region. Clusters can span multiple Availability Zones, offering redundancy and resilience.&lt;/p&gt;

&lt;p&gt;When launching an instance (or tasks in the case of Fargate) within ECS, it automatically registers with the default cluster named "default" unless specified otherwise. If the default cluster doesn't exist, ECS creates it on the fly. Additionally, you have the flexibility to scale and delete your clusters based on your requirements. ECS clusters provide a streamlined approach to container management, enabling efficient organization and control over your container instances or tasks. By leveraging clusters, you can enhance the reliability and scalability of your containerized applications within the ECS environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtzrqxl2waein8444qg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtzrqxl2waein8444qg1.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent
&lt;/h3&gt;

&lt;p&gt;The Amazon ECS container agent, a Go program, operates within its own container on each EC2 instance used with ECS. It serves as the intermediary component, facilitating communication between the scheduler and your instances. Running the agent on your instance is necessary for registering it into a cluster, which provides both a logical boundary and a resource pool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task &amp;amp; Task Definition
&lt;/h3&gt;

&lt;p&gt;In ECS, containers are executed as part of a task, so a task must exist before any container can run. A task is a logical grouping of 1 to N containers that run together on the same instance, where N can be up to 10.&lt;/p&gt;

&lt;p&gt;However, you can't create a task directly. Instead, you create a task definition, which specifies the composition of the task; think of it as an architectural plan for a city. A task definition declares which containers are part of the task and includes details about container interaction, CPU and memory constraints, and task permissions via IAM roles.&lt;/p&gt;
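&lt;p&gt;A minimal task definition can be sketched as the JSON document you would register with ECS. The family name, container image, and resource limits below are illustrative placeholders, not values from a real deployment:&lt;/p&gt;

```python
# A minimal ECS task definition, expressed as the JSON document you would
# pass to the RegisterTaskDefinition API. Family name and image are
# illustrative placeholders.
import json

task_definition = {
    "family": "blog-engine",            # logical name; revisions share a family
    "containerDefinitions": [
        {
            "name": "web",              # container name, referenced elsewhere
            "image": "nginx:latest",    # any image the instance can pull
            "cpu": 256,                 # CPU units (1024 = one vCPU)
            "memory": 512,              # hard memory limit in MiB
            "essential": True,          # the task stops if this container stops
            "portMappings": [{"containerPort": 80}],
        }
    ],
}

print(json.dumps(task_definition, indent=2))
```

&lt;p&gt;Registering a document like this (for example with &lt;code&gt;aws ecs register-task-definition --cli-input-json file://taskdef.json&lt;/code&gt;) creates a new revision of the family that services and the scheduler can launch.&lt;/p&gt;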

&lt;p&gt;Once you have a task definition, you can instruct ECS to start a task using that specific definition. While it may seem like extra planning initially, as you encounter scenarios involving multiple tasks, scaling, upgrades, and other real-life situations, the value of task definitions becomes evident. They provide a systematic approach to manage and track tasks within ECS, ensuring efficient container execution and orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scheduler
&lt;/h3&gt;

&lt;p&gt;The ECS scheduler is a vital component of the hosted orchestration layer provided by ECS. In simple terms, hosted means that ECS takes care of managing the scheduler on your behalf, saving you from the hassle of handling it yourself. While your applications run in containers on your instances, ECS manages the tasks for you, alleviating your concerns.&lt;/p&gt;

&lt;p&gt;The scheduler's role is to decide which containers run on which instances, based on specific constraints. For instance, if you need to scale a custom blog engine for high availability, you can create a service that automatically distributes tasks across all zones in your chosen region. By using the &lt;code&gt;distinctInstance&lt;/code&gt; task placement constraint, you can ensure that each task runs on a different instance. ECS not only handles these assignments but also takes care of automatically restarting failed tasks.&lt;/p&gt;

&lt;p&gt;With the ECS scheduler, you can focus on your applications, knowing that the task assignment and management are being efficiently handled. This simplifies the process of scaling, distribution, and ensuring high availability for your containerized workloads. Let ECS do the heavy lifting, while you enjoy the benefits of a simplified and streamlined container orchestration experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Service
&lt;/h3&gt;

&lt;p&gt;A service in ECS is a unique concept that allows you to specify the desired number of tasks to be running at any given time, based on a specific task definition. If you set N=1, it means "ensure that this task is running and restart it if necessary!" This ensures that your task remains operational, with ECS automatically monitoring and restarting it if needed. On the other hand, with N &amp;gt; 1, you can effectively scale your application by running multiple instances of the task, while still ensuring that each task remains running.&lt;/p&gt;
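&lt;p&gt;Conceptually, a service behaves like a reconciliation loop: compare the number of running tasks with the desired count N, then start or stop tasks to close the gap. A toy sketch of that idea (not ECS's actual implementation):&lt;/p&gt;

```python
def reconcile(running_tasks: int, desired_count: int) -> int:
    """Return how many tasks to start (positive) or stop (negative)
    so that running_tasks matches desired_count."""
    return desired_count - running_tasks

# N = 1: a crashed task gets replaced
assert reconcile(running_tasks=0, desired_count=1) == 1
# N > 1: scaling out from 2 running tasks to 5
assert reconcile(running_tasks=2, desired_count=5) == 3
```

&lt;p&gt;ECS runs this comparison continuously on your behalf, which is why a service with N=1 keeps a single task alive and a service with N &amp;gt; 1 maintains a scaled-out fleet.&lt;/p&gt;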

&lt;p&gt;By leveraging ECS services, you can simplify the management and scaling of your application tasks. ECS takes care of the underlying orchestration, enabling you to focus on developing and deploying your applications with confidence. Whether you need a single task or multiple instances running, ECS services provide a convenient mechanism for achieving seamless task management and scaling capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load Balancer
&lt;/h3&gt;

&lt;p&gt;Amazon ECS distributes incoming traffic among multiple containers or tasks by integrating with Elastic Load Balancing (ELB). This integration lets you expose your containerized applications to the internet or to internal networks, improving availability and scalability. ECS supports load-balancing mechanisms tailored to your application's demands, whether you choose Application Load Balancers (ALBs) for advanced routing of HTTP/HTTPS traffic or Network Load Balancers (NLBs) for high-performance load balancing at the transport layer. The ECS/ELB combination ensures your application can handle increased traffic, optimizes resource utilization, and provides fault tolerance for a resilient, responsive user experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In this blog post, we explored the key concepts and components of Amazon Elastic Container Service (ECS). We learned that ECS provides a managed environment for running containers in the cloud, eliminating the complexities of infrastructure configuration. By leveraging ECS, you can easily deploy and manage containers, whether it's for hosting a simple website or running complex distributed microservices.&lt;/p&gt;

&lt;p&gt;We discussed the importance of understanding ECS building blocks such as clusters, instances, tasks, and services. Clusters act as logical groupings of container instances, while tasks represent a grouping of containers running together on the same instance. Services simplify task management and scaling, ensuring the desired number of tasks are running at all times.&lt;/p&gt;

&lt;p&gt;Additionally, we touched upon the role of the ECS scheduler, which automates the allocation of containers to instances based on defined constraints. The scheduler plays a crucial part in achieving high availability, scalability, and efficient resource utilization.&lt;/p&gt;

&lt;p&gt;You can explore more on ECS from following:&lt;/p&gt;

&lt;p&gt;Amazon ECS Documentation: &lt;a href="https://docs.aws.amazon.com/ecs" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/ecs&lt;/a&gt;&lt;br&gt;
AWS Blog On Containers: &lt;a href="https://aws.amazon.com/blogs/containers" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/containers&lt;/a&gt;&lt;br&gt;
AWS ECS GitHub Repository: &lt;a href="https://github.com/aws/amazon-ecs-agent" rel="noopener noreferrer"&gt;https://github.com/aws/amazon-ecs-agent&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Machine Learning 101: A Comprehensive Guide for Beginners</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Mon, 23 Jan 2023 16:07:18 +0000</pubDate>
      <link>https://dev.to/xfarooqi/machine-learning-101-a-comprehensive-guide-for-beginners-4lbp</link>
      <guid>https://dev.to/xfarooqi/machine-learning-101-a-comprehensive-guide-for-beginners-4lbp</guid>
      <description>&lt;p&gt;Machine learning is a method of teaching computers to learn from data, without being explicitly programmed. It is a subset of artificial intelligence that focuses on the development of algorithms and statistical models that can enable machines to improve their performance on a specific task by learning from data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application:&lt;/strong&gt;&lt;br&gt;
In the real world, machine learning is used in a wide range of applications. Some examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-driving cars, which use supervised learning to learn how to drive safely and efficiently.&lt;/li&gt;
&lt;li&gt;Image and speech recognition, which use supervised learning to identify objects and transcribe speech.&lt;/li&gt;
&lt;li&gt;Fraud detection, which uses supervised learning to identify suspicious transactions.&lt;/li&gt;
&lt;li&gt;Recommendation systems, which use unsupervised learning to suggest products or content to users.&lt;/li&gt;
&lt;li&gt;Robotics, which use reinforcement learning to train robots to perform tasks such as grasping objects or walking.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Machine Learning Works
&lt;/h2&gt;

&lt;p&gt;The process of machine learning typically involves the following steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1 - Collecting and preparing the data:&lt;/strong&gt; This step involves acquiring the data, cleaning it, and transforming it into a format that can be used by the machine learning algorithm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2 - Choosing a model:&lt;/strong&gt; This step involves selecting a machine learning algorithm that is appropriate for the task at hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3 - Training the model:&lt;/strong&gt; This step involves using the labeled data to train the model, so that it can make accurate predictions on new data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4 - Evaluating the model:&lt;/strong&gt; This step involves using a separate dataset to evaluate the model's performance, and making adjustments as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5 - Deploying the model:&lt;/strong&gt; This step involves using the trained model to make predictions on new data, or using it to control a system in the real world.&lt;/p&gt;
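&lt;p&gt;The five steps above can be sketched end to end with a deliberately tiny, dependency-free example: a nearest-class-mean classifier on a single feature. Both the data and the model are invented purely for illustration:&lt;/p&gt;

```python
# Steps 1-5 on a toy problem: classify a number as "small" or "large".

# 1. Collect and prepare labeled data as (feature, label) pairs
data = [(1, "small"), (2, "small"), (3, "small"),
        (8, "large"), (9, "large"), (10, "large")]
train, test = data[:2] + data[3:5], [data[2], data[5]]

# 2-3. Choose and train a model: predict the class whose mean is nearest
def fit(samples):
    means = {}
    for label in {lbl for _, lbl in samples}:
        values = [x for x, lbl in samples if lbl == label]
        means[label] = sum(values) / len(values)
    return means

def predict(means, x):
    return min(means, key=lambda label: abs(x - means[label]))

model = fit(train)

# 4. Evaluate on held-out data the model never saw during training
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)

# 5. "Deploy": use the trained model on new data
print(predict(model, 7), accuracy)
```

&lt;p&gt;Real projects swap in richer data and models, but the shape of the workflow stays exactly this.&lt;/p&gt;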




&lt;h2&gt;
  
  
  Main Types of Machine Learning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Supervised
&lt;/h3&gt;

&lt;p&gt;Supervised learning is a type of machine learning where the computer is given a labeled dataset, and the goal is to train a model to make predictions on new, unseen data. The model learns to make predictions about unseen data by finding patterns in the training data. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
One of the most common applications of a supervised model is classifying objects. For example, in the figure below there is labeled data for squares, triangles, and hexagons, and the trained model predicts whether a new object is a square, a triangle, or a hexagon based on what it learned from that dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi6a2afb2xreovoym911.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi6a2afb2xreovoym911.png" alt="Himanshu Singh @ Medium" width="600" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some other examples of supervised learning tasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image classification&lt;/li&gt;
&lt;li&gt;Spam detection&lt;/li&gt;
&lt;li&gt;Predicting stock prices&lt;/li&gt;
&lt;li&gt;Predict whether a person is likely to develop a certain medical condition based on their age, sex, and other factors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Supervised Machine Learning Models:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are several types of models used in supervised machine learning, including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear Regression:&lt;/strong&gt; Linear regression is a statistical model that is used for predicting a continuous value. It works by finding the best-fitting line through the data points. Real-world examples include: Sales forecasting, Inventory prediction&lt;/p&gt;
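&lt;p&gt;Single-feature linear regression has a closed-form least-squares solution, which can be sketched in a few lines; the sales figures below are made up for illustration:&lt;/p&gt;

```python
# Fit y = slope * x + intercept by ordinary least squares (one feature).
def linear_fit(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# e.g. month number vs. units sold (invented data)
months = [1, 2, 3, 4, 5]
sales = [12, 14, 15, 17, 19]
slope, intercept = linear_fit(months, sales)
forecast = slope * 6 + intercept  # sales forecast for month 6
```

&lt;p&gt;Library implementations generalize this to many features, but for one feature they compute exactly this line.&lt;/p&gt;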

&lt;p&gt;&lt;strong&gt;Logistic Regression:&lt;/strong&gt; Logistic regression is a statistical model that is used for binary classification tasks, such as determining if an email is spam or not. Real-world examples include: Medical diagnosis, Credit scoring&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision Trees:&lt;/strong&gt; A decision tree is a tree-like model that makes decisions based on a series of conditions. It is commonly used for classification and regression tasks. Real-world examples include: Credit scoring, Medical diagnosis&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Random Forest:&lt;/strong&gt; A random forest is an ensemble of decision trees, where each tree is trained on a different subset of the data. The final output is the average of the outputs of all the decision trees. Random forest is commonly used for classification and regression tasks. Real-world examples include: Stock market analysis, Quality control&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support Vector Machines (SVMs):&lt;/strong&gt; SVMs are a type of model that can be used for both classification and regression tasks. They work by finding a line (or hyperplane) that maximally separates the different classes in the data. Real-world examples include: Handwriting recognition, Gene expression classification&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neural Networks:&lt;/strong&gt; Neural networks are a type of model that are inspired by the structure and function of the human brain. They are commonly used for image and speech recognition, natural language processing, and other complex tasks. Real-world examples include: Image recognition, Speech recognition, Natural Language Processing&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; In supervised machine learning, different models have different use cases and are good at different types of problems. It is important to understand the characteristics and assumptions of each model and choose the appropriate one for the specific problem at hand.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  2. Unsupervised
&lt;/h3&gt;

&lt;p&gt;Unsupervised learning is a type of machine learning where the model is not given labeled data; the goal is to find patterns or relationships in the input data on its own. It is used to discover hidden structure or features in data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
In a fruit-grouping example, we give the model no labels at all. The model examines the characteristics of the objects in the dataset and, based on their similarity, separates the dataset into two groups, as shown in the figure. Given a new fruit, the model does not say whether it is an apple or an orange; it simply places the object into one of the groups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foccjg3piggeqdrxdds0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foccjg3piggeqdrxdds0i.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some other examples of unsupervised learning tasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clustering satellite images to identify distinct features in landscapes&lt;/li&gt;
&lt;li&gt;Anomaly detection in manufacturing process to identify faulty equipment&lt;/li&gt;
&lt;li&gt;Identifying patterns in genetic data for disease diagnosis&lt;/li&gt;
&lt;li&gt;Grouping customers by purchase history for targeted marketing&lt;/li&gt;
&lt;li&gt;Identifying patterns in network traffic for cybersecurity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Unsupervised Machine Learning Models:&lt;/strong&gt;&lt;br&gt;
There are several types of models used in unsupervised machine learning, including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;K-Means:&lt;/strong&gt; K-means is a clustering algorithm that groups similar data points together. It is used for tasks such as market segmentation and image segmentation. Real-world examples include: Customer segmentation, Image segmentation&lt;/p&gt;
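&lt;p&gt;The core loop of k-means, alternating between assigning points to their nearest centroid and recomputing each centroid as its cluster's mean, can be sketched on one-dimensional data (the points and starting centroids are invented for illustration):&lt;/p&gt;

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate the assignment and centroid-update steps of k-means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:  # assignment step: nearest centroid wins
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# two obvious groups: values near 2 and values near 10
points = [1, 2, 3, 9, 10, 11]
centroids = kmeans_1d(points, centroids=[0.0, 5.0])
```

&lt;p&gt;Real uses run in many dimensions with a distance metric instead of &lt;code&gt;abs&lt;/code&gt;, but the two alternating steps are the whole algorithm.&lt;/p&gt;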

&lt;p&gt;&lt;strong&gt;Hierarchical Clustering:&lt;/strong&gt; Hierarchical Clustering is a method of clustering which builds a hierarchy of clusters. It is used for tasks such as image segmentation and gene expression analysis. Real-world examples include: Image segmentation, Gene expression analysis&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principal Component Analysis (PCA):&lt;/strong&gt; PCA is a technique used for dimensionality reduction. It works by finding the principal components of the data, which are the directions of maximum variance. Real-world examples include: Face recognition, Handwriting recognition&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autoencoder:&lt;/strong&gt; Autoencoder is a neural network used for dimensionality reduction and feature learning. It works by learning a compressed representation of the input data. Real-world examples include: anomaly detection, speech recognition&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-Organizing Maps (SOMs):&lt;/strong&gt; SOMs are a type of neural network that is used for visualization and dimensionality reduction. It projects high-dimensional data onto a 2-dimensional grid, preserving the topological structure of the data. Real-world examples include: Fraud detection, Quality control&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generative Adversarial Networks (GANs):&lt;/strong&gt; GANs are a type of model composed of two neural networks: a generator and a discriminator. The generator creates new data samples that are similar to the training data, while the discriminator attempts to distinguish the generated data from the real data. Real-world examples include: Image synthesis, Text-to-speech&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;br&gt;
Unsupervised machine learning models are good at finding hidden structure and patterns in data, and can be used in a wide range of applications. But it's important to understand the assumptions and limitations of each algorithm and choose the appropriate one for the specific problem at hand.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  3. Reinforcement
&lt;/h3&gt;

&lt;p&gt;Reinforcement learning is a type of machine learning where the computer learns through trial and error. The goal is to train an agent to make decisions that will maximize a reward signal. For example, a reinforcement learning algorithm could be used to train a robot to navigate a maze by receiving a reward for reaching the end and a penalty for hitting a wall.&lt;/p&gt;

&lt;p&gt;Some examples of reinforcement learning tasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training autonomous vehicles to navigate in complex environments&lt;/li&gt;
&lt;li&gt;Improving energy efficiency in data centers&lt;/li&gt;
&lt;li&gt;Training robots to perform tasks such as grasping objects or walking&lt;/li&gt;
&lt;li&gt;Improving the performance of recommendation systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement Machine Learning Models:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are several types of models used in reinforcement learning, including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q-Learning:&lt;/strong&gt; Q-learning is a model-free algorithm that learns the optimal policy by estimating the value of each state-action pair. It's used for tasks such as game playing and robotics. Real-world examples include: Game playing, Robotics&lt;/p&gt;
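&lt;p&gt;A minimal sketch of tabular Q-learning on a toy corridor of four states, where the agent moves left or right and is rewarded only for reaching the right end (the hyperparameters and the environment are illustrative):&lt;/p&gt;

```python
import random

random.seed(0)
n_states, actions = 4, [-1, +1]  # corridor 0..3; move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:  # episode ends at the right end
        if random.random() < epsilon:
            a = random.choice(actions)  # explore
        else:
            a = max(actions, key=lambda act: Q[(s, act)])  # exploit
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s,a) toward reward + gamma * max_a' Q(s',a')
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# the learned greedy policy: best action in each non-terminal state
policy = {s: max(actions, key=lambda act: Q[(s, act)])
          for s in range(n_states - 1)}
```

&lt;p&gt;After training, the greedy policy moves right in every state, which is exactly the trial-and-error learning described above: the reward at the end propagates backward through the Q-values.&lt;/p&gt;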

&lt;p&gt;&lt;strong&gt;SARSA:&lt;/strong&gt; SARSA is a model-free algorithm that learns the optimal policy by estimating the value of each state-action pair. It's used for tasks such as game playing and robotics. Real-world examples include: Game playing, Robotics&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy Gradient Methods:&lt;/strong&gt; Policy gradient methods are a class of algorithms that optimize the policy directly, by gradient descent. It's used for tasks such as robotics and game playing. Real-world examples include: Robotics, Game playing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep Deterministic Policy Gradient (DDPG):&lt;/strong&gt; DDPG is a variant of policy gradient methods that use a deep neural network as a function approximator. It's used for tasks such as robotics and game playing. Real-world examples include: Robotics, Game playing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proximal Policy Optimization (PPO):&lt;/strong&gt; PPO is an optimization algorithm that uses a trust region update to improve the policy. It's used for tasks such as robotics and game playing. Real-world examples include: Robotics, Game playing&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;br&gt;
Reinforcement learning has been used to train agents to play games at a superhuman level, to control robots to perform complex tasks, and to drive cars in simulated environments. But it's important to understand that the success of RL depends on the quality of the reward function, which can be difficult to design.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Machine learning is a rapidly growing field with many potential applications, and it has the potential to revolutionize many industries. With the increasing amount of data being generated, machine learning has the ability to analyze, understand and make predictions that can have significant impact in many areas such as healthcare, finance, education, transportation and many more. The future of machine learning is bright and it is expected to continue to evolve and play an increasingly important role in shaping the way we live and work.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Follow Me For More Such Content | &lt;a href="https://linktr.ee/abdulraheem01" rel="noopener noreferrer"&gt;My Social Links&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SQL Queries That Every Data Scientist Should Know!</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Mon, 23 Jan 2023 09:37:12 +0000</pubDate>
      <link>https://dev.to/xfarooqi/sql-queries-that-every-data-scientist-should-know-39a2</link>
      <guid>https://dev.to/xfarooqi/sql-queries-that-every-data-scientist-should-know-39a2</guid>
      <description>&lt;h2&gt;
  
  
  What is SQL
&lt;/h2&gt;

&lt;p&gt;SQL (Structured Query Language) is a programming language used to manage and manipulate relational databases. It is used to insert, update, and retrieve data from databases, as well as to create, modify, and manage database structures. In simple terms, &lt;strong&gt;SQL&lt;/strong&gt; is a way to talk to databases and get information from them or put information into them. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In data science, we use SQL to read data and manipulate it to get our desired result.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt; &lt;br&gt;
Let's say you have a database of all the students in your school, and each student has a name, an age, and a grade level. You could use SQL to ask the database for a list of all the names of the students in your grade. Or you could use SQL to change a student's grade level if they got promoted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why should we use SQL?&lt;/strong&gt;&lt;br&gt;
SQL is the most common and straightforward way to access data in databases. In data science, we use SQL to read data and manipulate it to get our desired result. Here we will focus on querying and manipulating data rather than creating and deleting it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some Key Features of SQL&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Easy to understand:&lt;/strong&gt; SQL is a simple and intuitive language to learn and use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct data access:&lt;/strong&gt; Relational databases allow users to easily access and retrieve specific data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data audit and replication:&lt;/strong&gt; It is easy to audit and replicate data in relational databases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-table analysis:&lt;/strong&gt; SQL is powerful for analyzing data across multiple tables at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex analysis:&lt;/strong&gt; SQL lets users answer more complex questions than dashboard tools such as Google Analytics.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  SQL Queries
&lt;/h2&gt;

&lt;p&gt;To understand and execute queries first we need to configure &lt;a href="https://dev.to/xfarooqi/how-to-configure-pgadmin-to-run-sql-queries-1k24"&gt;Parch and Posy Database&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database Schema:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wvcnqi5b3utqhgq2hhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wvcnqi5b3utqhgq2hhy.png" alt=" " width="434" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. SELECT &amp;amp; FROM&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This statement is used to query a database and retrieve specific data from one or more tables. It is one of the most commonly used SQL commands. The basic syntax of this statement is as follows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT column1, column2 FROM table_name;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this query, the &lt;strong&gt;SELECT&lt;/strong&gt; keyword is used to specify that you want to retrieve data from the database. The column1, column2, ... are the names of the columns that you want to retrieve data from. You can also use the * wildcard to select all columns. The &lt;strong&gt;FROM&lt;/strong&gt; keyword is used to specify the table that you want to retrieve data from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT * FROM orders;&lt;/code&gt;&lt;br&gt;
This statement retrieves all the &lt;strong&gt;columns&lt;/strong&gt; in the &lt;strong&gt;orders&lt;/strong&gt; table, as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6kuf1q12camp9osmgc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6kuf1q12camp9osmgc9.png" alt=" " width="800" height="707"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. LIMIT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LIMIT clause restricts a query to a specific number of rows, which is useful when you only need a quick preview of the data. Because less data is returned, the query also loads faster.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The LIMIT clause always goes at the very end of the query.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT * FROM orders LIMIT 10;&lt;/code&gt;&lt;br&gt;
This query returns only the first 10 rows of the table, with all columns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8asrjvma4oxv9e3e9y9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8asrjvma4oxv9e3e9y9d.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;
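&lt;p&gt;If you want to try SELECT and LIMIT outside pgAdmin, here is a minimal runnable sketch using Python's built-in sqlite3 module with a toy orders table (the column names mirror the article, but the data here is made up):&lt;/p&gt;

```python
import sqlite3

# Toy stand-in for the article's "orders" table; the real Parch and Posey
# database runs on PostgreSQL, but SQLite is enough to try the syntax.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, account_id INTEGER, total_amt_usd REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1001, 500.0), (2, 4251, 120.5), (3, 2000, 75.0), (4, 1001, 310.2)],
)

# SELECT specific columns...
pairs = con.execute("SELECT id, account_id FROM orders").fetchall()

# ...or all columns, but only the first 2 rows thanks to LIMIT.
first_two = con.execute("SELECT * FROM orders LIMIT 2").fetchall()
print(len(first_two))  # 2
```

The same two queries run unchanged in the pgAdmin Query Tool.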

&lt;p&gt;&lt;strong&gt;3. ORDER BY&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;ORDER BY&lt;/strong&gt; statement in SQL sorts query results by the values in one or more columns. The sorting applies only to that query's result set; unlike sorting in spreadsheet software, it does not permanently reorder the underlying data.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;DESC&lt;/code&gt; can be added after a column in your &lt;strong&gt;ORDER BY&lt;/strong&gt; statement to sort that column in descending order; by default, &lt;strong&gt;ORDER BY&lt;/strong&gt; sorts in ascending order.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT id, sales_rep_id&lt;br&gt;
FROM accounts&lt;br&gt;
ORDER BY id&lt;br&gt;
LIMIT 10;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this statement, the rows are sorted in ascending order by the &lt;strong&gt;id&lt;/strong&gt; column, and only the first 10 are returned.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frphed1mc7vdw9z5viykq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frphed1mc7vdw9z5viykq.png" alt=" " width="800" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can even sort by multiple columns using &lt;code&gt;ORDER BY&lt;/code&gt; by providing a list of columns. The sorting process first uses the leftmost column in the list, then the next column and so on. Additionally, it is still possible to reverse the order using the &lt;code&gt;DESC&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT  account_id, total_amt_usd&lt;br&gt;
FROM orders&lt;br&gt;
ORDER BY account_id, total_amt_usd DESC;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here &lt;code&gt;account_id&lt;/code&gt; will be sorted in ascending order and, within each account, &lt;code&gt;total_amt_usd&lt;/code&gt; will be sorted in descending order, as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fektei9b8purz6p6md6s3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fektei9b8purz6p6md6s3.png" alt=" " width="800" height="705"&gt;&lt;/a&gt;&lt;/p&gt;
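&lt;p&gt;A small runnable check of multi-column sorting, again using sqlite3 with invented data (the schema here is an assumption modeled on the article):&lt;/p&gt;

```python
import sqlite3

# Toy data to show multi-column sorting.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (account_id INTEGER, total_amt_usd REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1001, 100.0), (4251, 300.0), (1001, 250.0), (4251, 50.0)])

# account_id ascending first, then total_amt_usd descending within each account.
rows = con.execute(
    "SELECT account_id, total_amt_usd FROM orders "
    "ORDER BY account_id, total_amt_usd DESC"
).fetchall()
print(rows)  # [(1001, 250.0), (1001, 100.0), (4251, 300.0), (4251, 50.0)]
```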

&lt;p&gt;&lt;strong&gt;4. WHERE&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The WHERE statement in SQL is used to filter the results of a query. It allows you to specify certain conditions that the data in your query must meet in order to be included in the final output. &lt;/p&gt;

&lt;p&gt;Common symbols used in WHERE statements include:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;&amp;gt;&lt;/code&gt; (greater than) &lt;br&gt;
&lt;code&gt;&amp;lt;&lt;/code&gt; (less than)&lt;br&gt;
&lt;code&gt;&amp;gt;=&lt;/code&gt;(greater than or equal to)&lt;br&gt;
&lt;code&gt;&amp;lt;=&lt;/code&gt;(less than or equal to)&lt;br&gt;
&lt;code&gt;=&lt;/code&gt;(equal to)&lt;br&gt;
&lt;code&gt;!=&lt;/code&gt;(not equal to)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM orders&lt;br&gt;
WHERE account_id = 4251&lt;br&gt;
ORDER BY occurred_at&lt;br&gt;
LIMIT 1000;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this statement, only the rows where account_id equals 4251 are retrieved, sorted by occurred_at.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04yr1ev05x339p5qtpcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04yr1ev05x339p5qtpcl.png" alt=" " width="800" height="710"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;WHERE&lt;/code&gt; statement works with non-numeric data as well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM accounts&lt;br&gt;
WHERE name = 'Walmart';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtub0s1m3isaqm0tgzdf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtub0s1m3isaqm0tgzdf.png" alt=" " width="800" height="710"&gt;&lt;/a&gt;&lt;/p&gt;
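&lt;p&gt;Note that the text literal goes inside single quotes and the semicolon ends the statement outside them. A runnable sketch with a toy accounts table (rows invented):&lt;/p&gt;

```python
import sqlite3

# Toy "accounts" table to show WHERE on a text column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER, name TEXT)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [(1001, 'Walmart'), (1011, 'Exxon Mobil'), (1021, 'Apple')])

# The string literal is quoted; only the matching row comes back.
rows = con.execute("SELECT * FROM accounts WHERE name = 'Walmart'").fetchall()
print(rows)  # [(1001, 'Walmart')]
```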

&lt;h2&gt;
  
  
  5. Arithmetic Operators
&lt;/h2&gt;

&lt;p&gt;In SQL, arithmetic operators are used to perform mathematical operations on values in a query. The most common arithmetic operators in SQL are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;*&lt;/code&gt; (Multiplication)&lt;br&gt;
&lt;code&gt;+&lt;/code&gt; (Addition)&lt;br&gt;
&lt;code&gt;-&lt;/code&gt; (Subtraction)&lt;br&gt;
&lt;code&gt;/&lt;/code&gt; (Division)&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Arithmetic operations follow the standard order of operations (PEMDAS).&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT id, (standard_amt_usd/total_amt_usd)*100 &lt;br&gt;
FROM orders&lt;br&gt;
LIMIT 10;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the output, you will notice that the calculation produces a new column with no meaningful name. This can be solved using a derived column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1xrm1arsn901ewrfyss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1xrm1arsn901ewrfyss.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Derived Column:&lt;/strong&gt;&lt;br&gt;
A derived column, also known as a calculated or computed column, is a new column created by combining existing columns in a table. This new column can be given a name, known as an alias, using the &lt;code&gt;AS&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT id, (standard_amt_usd/total_amt_usd)*100 AS std_percent&lt;br&gt;
FROM orders&lt;br&gt;
LIMIT 10;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, we are dividing the dollar amount of standard paper by the total order amount to calculate the percentage of standard paper used in the order. We named this new column "std_percent" using the &lt;code&gt;AS&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwlvvyu775uutiys2dlb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwlvvyu775uutiys2dlb.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;
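&lt;p&gt;You can see the alias become the column name in this small sqlite3 sketch (invented data, schema assumed from the article):&lt;/p&gt;

```python
import sqlite3

# Toy orders table to show how AS names a derived column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, standard_amt_usd REAL, total_amt_usd REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 50.0, 200.0), (2, 30.0, 60.0)])

cur = con.execute(
    "SELECT id, (standard_amt_usd / total_amt_usd) * 100 AS std_percent FROM orders"
)
names = [d[0] for d in cur.description]  # the alias becomes the column name
rows = cur.fetchall()
print(names)  # ['id', 'std_percent']
print(rows)   # [(1, 25.0), (2, 50.0)]
```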

&lt;h2&gt;
  
  
  6. Logical Operators
&lt;/h2&gt;

&lt;p&gt;In SQL, logical operators are used to combine multiple conditions in a query. Some of the commonly used logical operators are:&lt;/p&gt;

&lt;h3&gt;
  
  
  (i) Like
&lt;/h3&gt;

&lt;p&gt;The LIKE operator in SQL matches a pattern in a column. It lets you filter rows much like WHERE with &lt;code&gt;=&lt;/code&gt;, but for cases where you don't know the exact value.&lt;/p&gt;

&lt;p&gt;It's particularly useful for working with text data. LIKE is frequently used with the &lt;code&gt;%&lt;/code&gt; wildcard, which matches any sequence of characters, so a pattern like &lt;code&gt;%text%&lt;/code&gt; finds the text no matter what comes before or after it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In PostgreSQL, LIKE is case-sensitive, so searching for 'T' is different from searching for 't' (use ILIKE for case-insensitive matching).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM accounts&lt;br&gt;
WHERE website LIKE '%google%';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This query selects every row whose website contains 'google', no matter what comes before or after it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F686hvsoua1wfunsuaaos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F686hvsoua1wfunsuaaos.png" alt=" " width="800" height="712"&gt;&lt;/a&gt;&lt;/p&gt;
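&lt;p&gt;Here is the same idea as a runnable sqlite3 sketch with invented rows (with one caveat noted in the comments, since the article's database is PostgreSQL):&lt;/p&gt;

```python
import sqlite3

# Toy accounts table to show the % wildcard in LIKE.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT, website TEXT)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [('Alphabet', 'www.google.com'), ('Walmart', 'www.walmart.com')])

# '%google%' matches 'google' anywhere inside the value.
# Caveat: unlike PostgreSQL, SQLite's LIKE is case-insensitive for ASCII.
rows = con.execute("SELECT name FROM accounts WHERE website LIKE '%google%'").fetchall()
print(rows)  # [('Alphabet',)]
```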

&lt;h3&gt;
  
  
  (ii) IN
&lt;/h3&gt;

&lt;p&gt;The IN operator lets you filter on multiple values of a column, both numeric and text, within a single condition. It works like &lt;code&gt;=&lt;/code&gt; but for a list of values, and it is a cleaner way of writing what would otherwise be a chain of OR conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM orders&lt;br&gt;
WHERE account_id IN (1001,4251);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This query retrieves only the rows whose account_id is exactly 1001 or 4251, not the range of values between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkko19y6jcam4s0egfklk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkko19y6jcam4s0egfklk.png" alt=" " width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;
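&lt;p&gt;This distinction is easy to verify with a small sqlite3 sketch (invented rows): IN matches the listed values only, not the range between them.&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, account_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 1001), (2, 2500), (3, 4251), (4, 1001)])

# account_id 2500 lies between 1001 and 4251 but is excluded,
# because IN checks only for exact membership in the list.
rows = con.execute(
    "SELECT id, account_id FROM orders WHERE account_id IN (1001, 4251)"
).fetchall()
print(sorted(rows))  # [(1, 1001), (3, 4251), (4, 1001)]
```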

&lt;h3&gt;
  
  
  (iii) NOT
&lt;/h3&gt;

&lt;p&gt;The NOT operator in SQL is used to negate a condition, it can be used in combination with other operators like IN and LIKE to retrieve rows that do not match specific criteria. &lt;/p&gt;

&lt;p&gt;For example, the NOT IN operator can be used to retrieve rows that do not have a specific value in a column and the NOT LIKE operator can be used to retrieve rows that do not match a specific pattern in a column. It's useful for filtering and retrieving data that does not meet specific conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM orders&lt;br&gt;
WHERE account_id NOT IN (1001,4251);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this query, every row whose account_id is neither 1001 nor 4251 will be retrieved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq9btyiyl4b8q91eja5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq9btyiyl4b8q91eja5v.png" alt=" " width="800" height="710"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  (iv) AND
&lt;/h3&gt;

&lt;p&gt;In SQL, the AND operator is used to combine multiple conditions in a query. The AND operator is used to match rows where all conditions are true.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM orders&lt;br&gt;
WHERE account_id NOT IN (1001,4251) AND id IN(17,26);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this query, both conditions are checked; only rows that satisfy both are retrieved, as shown in the figure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ttq0j086pfar4x294os.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ttq0j086pfar4x294os.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  (v) BETWEEN
&lt;/h3&gt;

&lt;p&gt;In SQL, the BETWEEN operator is used to match a range of values within a column. The BETWEEN operator is used to match rows where a column value is between two specified values, inclusive of the specified values. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM orders&lt;br&gt;
WHERE account_id BETWEEN 1001 AND 4251;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this query, every row whose account_id falls between 1001 and 4251, inclusive, will be retrieved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmgaxlzifixb1c1kerz6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmgaxlzifixb1c1kerz6.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;
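&lt;p&gt;The inclusivity of the endpoints can be checked with a quick sqlite3 sketch (invented rows):&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, account_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 1000), (2, 1001), (3, 2500), (4, 4251), (5, 4300)])

# BETWEEN keeps both endpoints 1001 and 4251; 1000 and 4300 fall outside.
rows = con.execute(
    "SELECT account_id FROM orders WHERE account_id BETWEEN 1001 AND 4251"
).fetchall()
print(sorted(rows))  # [(1001,), (2500,), (4251,)]
```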

&lt;h3&gt;
  
  
  (vi) OR
&lt;/h3&gt;

&lt;p&gt;The OR operator in SQL combines multiple conditions and matches rows where at least one of the conditions is true.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;SELECT standard_qty, gloss_qty&lt;br&gt;
FROM orders&lt;br&gt;
WHERE (standard_qty = 0 OR gloss_qty = 0);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This query selects all rows where either the standard quantity or the gloss quantity is zero.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid31s65s7tjm4uly7p5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid31s65s7tjm4uly7p5x.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;
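&lt;p&gt;A runnable sqlite3 sketch of the OR filter (invented rows), showing that a row survives when either condition holds:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (standard_qty INTEGER, gloss_qty INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(0, 10), (5, 0), (3, 7), (0, 0)])

# A row is kept when at least one quantity is zero; (3, 7) is filtered out.
rows = con.execute(
    "SELECT standard_qty, gloss_qty FROM orders "
    "WHERE (standard_qty = 0 OR gloss_qty = 0)"
).fetchall()
print(sorted(rows))  # [(0, 0), (0, 10), (5, 0)]
```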

&lt;blockquote&gt;
&lt;p&gt;These operators can be combined with other operators, such as the arithmetic operators (+, *, -, /).&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Best Practice For Formatting Query
&lt;/h2&gt;

&lt;p&gt;The following best practices help keep SQL queries readable and maintainable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; Capitalize SQL commands like SELECT and FROM, and keep everything else in lower case to make the query more readable.&lt;br&gt;
&lt;code&gt;SELECT account_id&lt;br&gt;
FROM orders;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Use underscores instead of spaces in column and table names.&lt;br&gt;
&lt;code&gt;SELECT account_id, standard_qty&lt;br&gt;
FROM orders;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Always include a semicolon at the end of each statement, as it may be required in some SQL environments.&lt;br&gt;
&lt;code&gt;SELECT account_id&lt;br&gt;
FROM orders;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.&lt;/strong&gt; Use double quotes or square brackets to reference tables and columns that have spaces in their names.&lt;br&gt;
&lt;code&gt;SELECT "full name", "age" &lt;br&gt;
FROM "employee information";&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5.&lt;/strong&gt; Use comments in your query for better understanding of your code.&lt;br&gt;
&lt;code&gt;-- This query selects all information from 'customers' table&lt;br&gt;
SELECT * FROM customers;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6.&lt;/strong&gt; Be consistent in formatting throughout your queries and scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7.&lt;/strong&gt; Use white space in queries to make them more readable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8.&lt;/strong&gt; Use indentation to make the query structure more clear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9.&lt;/strong&gt; Use meaningful names for columns, tables, and aliases to make queries easier to understand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10.&lt;/strong&gt; Avoid unnecessary subqueries and joins; use them only when they are needed.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How To Configure PgAdmin To Run SQL Queries</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Mon, 23 Jan 2023 07:02:53 +0000</pubDate>
      <link>https://dev.to/xfarooqi/how-to-configure-pgadmin-to-run-sql-queries-1k24</link>
      <guid>https://dev.to/xfarooqi/how-to-configure-pgadmin-to-run-sql-queries-1k24</guid>
      <description>&lt;p&gt;Here we will be using &lt;strong&gt;Parch And Posey Database&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download and install pgAdmin on your computer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open pgAdmin and connect to your PostgreSQL server by providing the necessary credentials (e.g. host, port, username, and password).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once connected, you should see the server listed under the "Servers" tree in the sidebar.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a new database in pgAdmin by right-clicking on the server, selecting "New" and then "Database".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open the pgAdmin Query Tool by right-clicking on the server and selecting "Query Tool".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy and paste the SQL statements that create the Parch and Posey tables into the editor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the database is created, you can expand it and see the tables, views, and other objects within it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To run an SQL query, right-click on the database and select "Query Tool".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Type your query in the editor and press the execute button or use the shortcut key (F5) to run the query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The results of the query will be displayed in the tab below the editor.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>api</category>
      <category>coding</category>
      <category>developer</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Generative AI: Shaping the Future of Music Industry using AWS DeepComposer</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Sun, 22 Jan 2023 19:17:17 +0000</pubDate>
      <link>https://dev.to/xfarooqi/generative-ai-shaping-the-future-of-music-industry-using-aws-deepcomposer-4noi</link>
      <guid>https://dev.to/xfarooqi/generative-ai-shaping-the-future-of-music-industry-using-aws-deepcomposer-4noi</guid>
      <description>&lt;h2&gt;
  
  
  What is Generative AI
&lt;/h2&gt;

&lt;p&gt;Generative AI, also known as generative models, is a subfield of artificial intelligence that focuses on creating new and unique outputs, such as text, images, and music, based on a set of inputs or training data. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
Suppose we train a model on images of cats; a generative model can then use the patterns it learned to create a new, artificial cat. Generative models are usually trained in an unsupervised way, learning the patterns in the training data and using them to generate new data. This is in contrast to discriminative models, which focus on classifying or identifying inputs based on previously learned information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Well-known platforms that use Generative AI:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ChatGPT&lt;/li&gt;
&lt;li&gt;DALL·E 2&lt;/li&gt;
&lt;li&gt;Amper Music&lt;/li&gt;
&lt;li&gt;AIVA&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Types of Generative AI
&lt;/h2&gt;

&lt;p&gt;There are several types of generative AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generative Adversarial Networks (GANs)-&lt;/strong&gt; These consist of two neural networks, a generator and a discriminator, that work together to generate new data that is similar to a given dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variational Autoencoders (VAEs)-&lt;/strong&gt; These consist of an encoder network that maps input data to a latent space, and a decoder network that maps the latent space back to the original data space. They are used to generate new data by sampling from the latent space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autoregressive Models -&lt;/strong&gt; These models predict the next value in a sequence based on the previous values. Examples include Autoregressive Integrated Moving Average (ARIMA) and Recurrent Neural Networks (RNNs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transformer-based Models -&lt;/strong&gt; These models are commonly used for natural language processing tasks and are known for their ability to handle sequential data with long-term dependencies. Examples include BERT and GPT-2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep Convolutional Generative Adversarial Networks (DCGANs) -&lt;/strong&gt; These models use convolutional layers in both generator and discriminator networks, and are used for generating images.&lt;/p&gt;




&lt;h1&gt;
  
  
  Generative AI with AWS DeepComposer
&lt;/h1&gt;

&lt;p&gt;AWS DeepComposer is an Amazon Web Services (AWS) tool that allows users to generate and compose music using generative artificial intelligence (AI) models. &lt;/p&gt;

&lt;p&gt;It consists of the following parts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. USB Keyboard -&lt;/strong&gt; Connects to your computer to input the melody.&lt;br&gt;
&lt;strong&gt;2. Accompanying Console -&lt;/strong&gt; Includes AWS DeepComposer Music studio to generate music.&lt;br&gt;
&lt;strong&gt;3. Chartbusters -&lt;/strong&gt; A competition where you can showcase your machine-learning skills.&lt;/p&gt;

&lt;p&gt;However, the physical keyboard is not required all the time: you can import your own MIDI file, use one of the provided sample melodies, or use the virtual keyboard in the AWS DeepComposer Music Studio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working&lt;/strong&gt;&lt;br&gt;
The AWS DeepComposer Music Studio offers the ability to generate music using three different generative AI techniques: GANs, AR-CNNs, and transformers. The GAN technique can be used to generate accompaniment tracks, the AR-CNN technique can be used to make changes to notes in an input track, and the transformer technique can be used to extend an input track by up to 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  GANs on AWS DeepComposer
&lt;/h2&gt;

&lt;p&gt;Generative Adversarial Networks (GANs) are a unique type of machine learning model that utilizes two neural networks to generate new content. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt;  A generator is the first neural network that learns to create new data resembling the source data on which it was trained.&lt;br&gt;
&lt;strong&gt;2.&lt;/strong&gt; A discriminator is a second neural network trained to assess how closely the generator's data resembles the training data set.&lt;/p&gt;

&lt;p&gt;The generator and discriminator work in a back-and-forth process where the generator improves in creating realistic data and the discriminator becomes more adept at distinguishing between real and generated data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example To Understand:&lt;/strong&gt;&lt;br&gt;
A GAN can be compared to a cooking competition: a chef creates new recipes, and a judge evaluates the quality of the dishes. The chef, like the generator network in a GAN, produces new dishes, and the judge, like the discriminator network, evaluates them and provides feedback on how to improve. As the chef incorporates the feedback, the dishes become more and more refined, just as the generator network in a GAN improves over time. AWS DeepComposer uses GANs in a similar way to create unique and distinctive music compositions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljxb6a60bfid9nsb54lw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljxb6a60bfid9nsb54lw.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
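&lt;p&gt;To make the generator/discriminator loop concrete, here is a deliberately tiny sketch in plain Python. It is purely illustrative and nothing like DeepComposer's real music models: the "real" data are just numbers near 4, the generator is a line G(z) = a*z + c, the discriminator is a single logistic unit D(x), and each side takes one small hand-derived gradient step per round.&lt;/p&gt;

```python
import math
import random
import statistics

random.seed(0)

w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + b)
a, c = 1.0, 0.0   # generator G(z) = a*z + c, starts far from the real data
lr = 0.01

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for _ in range(5000):
    real = random.gauss(4.0, 0.5)   # "real" data: numbers near 4
    z = random.gauss(0.0, 1.0)
    fake = a * z + c

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    # (gradients of the usual cross-entropy loss, worked out by hand).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w -= lr * ((d_real - 1.0) * real + d_fake * fake)
    b -= lr * ((d_real - 1.0) + d_fake)

    # Generator step: adjust a and c so that D(fake) moves toward 1.
    d_fake = sigmoid(w * fake + b)
    a += lr * (1.0 - d_fake) * w * z
    c += lr * (1.0 - d_fake) * w

fake_mean = statistics.mean(a * random.gauss(0.0, 1.0) + c for _ in range(500))
print(round(fake_mean, 1))  # drifts toward the real mean of 4
```

The back-and-forth is the point: the discriminator's feedback (its gradient) is what steers the generator toward producing data that looks real.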




&lt;h2&gt;
  
  
  Let's Generate a New Melody
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt;&lt;br&gt;
Create an account on AWS &lt;a href="https://aws.amazon.com/deepcomposer/" rel="noopener noreferrer"&gt;DeepComposer&lt;/a&gt; website.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt;&lt;br&gt;
Click on AWS DeepComposer Music Studio, or search for AWS DeepComposer in the search bar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk0j6az3yhya45nxkuj1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk0j6az3yhya45nxkuj1.png" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt;&lt;br&gt;
Click on Start composition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6p3eo06q14o0l43vngy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6p3eo06q14o0l43vngy.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt;&lt;br&gt;
You will land on a page with several options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update the Name of Your Music File&lt;/li&gt;
&lt;li&gt;Import Music or Select Music&lt;/li&gt;
&lt;li&gt;Play Or Stop Music&lt;/li&gt;
&lt;li&gt;Create your own melody&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After selecting a music track, click Continue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhaa20u33wa10b1j37xy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhaa20u33wa10b1j37xy4.png" alt=" " width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the Machine learning phase where you have to select a model according to your requirements. For example, you can choose any of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AR-CNN can make your music sound more like Bach by analyzing the patterns and characteristics of the training data, such as harmony, melody, rhythm, and structure, and using this knowledge to generate new music similar to the original.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GANs can be used to enhance music tracks by working in a collaborative cycle between the generator and discriminator. The generator will take in a single-track piano roll as input and output a multi-track piano roll with added accompaniments. The discriminator will then evaluate the output and provide feedback to the generator to improve the realism of the music generated. This process will continue to iterate until it creates more realistic and enjoyable music.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transformers can extend your music by up to 30 seconds. They use an autoregressive architecture that predicts what the next note, rhythm, or chord should be based on the pattern of the music that has already been played.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb43ndhphfbrmnskyw83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb43ndhphfbrmnskyw83.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6&lt;/strong&gt;&lt;br&gt;
After updating the AR-CNN parameters, click Continue. You can understand these parameters as follows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maximum Input Notes To Remove:&lt;/strong&gt; Controls the percentage of the input melody that can be removed during inference. Increasing this value allows the model to use less of the input melody as a reference. You can set it to 60 (optimal) or another value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maximum Notes to Add:&lt;/strong&gt; Controls the number of notes that can be added to the input melody. Increasing the value might introduce some notes that feel out of place. I will set it to 80.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sampling Iteration:&lt;/strong&gt; Controls the number of times the input melody is passed through the model. I will set it to 90.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creative Risk:&lt;/strong&gt; Controls how much the model can deviate from the music it was trained on. A low value makes the model stick to high-probability notes; a high value lets it take more risks. I will set it to 0.8.&lt;/p&gt;
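&lt;p&gt;Creative Risk behaves like a sampling temperature. Here is a minimal sketch, assuming (hypothetically) that the model scores each candidate note with a logit: dividing the logits by the temperature before applying softmax means a low value concentrates probability on the top-scoring note, while a high value flattens the distribution so riskier notes get sampled more often.&lt;/p&gt;

```python
import math

def note_probs(logits, creative_risk):
    """Softmax over logits scaled by a temperature (the 'creative risk')."""
    scaled = [l / creative_risk for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Same logits, different risk: low risk sharpens, high risk flattens.
logits = [2.0, 1.0, 0.1]
print(note_probs(logits, 0.2))  # nearly all mass on the top note
print(note_probs(logits, 2.0))  # much closer to uniform
```

&lt;p&gt;With risk 0.2 the top note gets over 99% of the probability mass; with risk 2.0 the three notes are sampled far more evenly.&lt;/p&gt;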

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgydue044a3by6twpkyrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgydue044a3by6twpkyrh.png" alt=" " width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7&lt;/strong&gt;&lt;br&gt;
In this step you will get the output melody, generated using generative AI in AWS DeepComposer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrjzt3468scgav9hhn6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrjzt3468scgav9hhn6w.png" alt=" " width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 8&lt;/strong&gt;&lt;br&gt;
If you want, you can edit the melody, for example by removing some of the notes, by clicking Edit Melody.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2jx86sfsfxedgi6okmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2jx86sfsfxedgi6okmu.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 9&lt;/strong&gt;&lt;br&gt;
Once you are satisfied with the new melody, click the Continue button. You will be directed to the Share Composition page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzbv3sir5ecc7tkqow6s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzbv3sir5ecc7tkqow6s.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 10&lt;/strong&gt;&lt;br&gt;
That's it: your generative AI melody is ready. If you want to make additional changes, some of the options in AWS DeepComposer are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate accompaniment track using GANs&lt;/li&gt;
&lt;li&gt;Extend input track with Transformers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Afterward, you can also enter your melody in the AWS DeepComposer Chartbusters challenge.&lt;/p&gt;




</description>
      <category>aws</category>
      <category>ai</category>
      <category>generativeai</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>All About AWS AI &amp; ML Scholarship Program</title>
      <dc:creator>Abdul Raheem</dc:creator>
      <pubDate>Sun, 22 Jan 2023 10:21:01 +0000</pubDate>
      <link>https://dev.to/xfarooqi/all-about-aws-ai-ml-scholarship-program-o5h</link>
      <guid>https://dev.to/xfarooqi/all-about-aws-ai-ml-scholarship-program-o5h</guid>
      <description>&lt;p&gt;Ever wanted a well-structured course where you can learn about Machine Learning and Artificial Intelligence from scratch to an advance level?&lt;/p&gt;

&lt;p&gt;Your search is over, because AWS is providing such an opportunity for students to learn the foundations of machine learning. After successfully completing the foundation course on the AWS DeepRacer website, you can get access to the Udacity Nanodegree "AI Programming with Python", where you can learn the following concepts in detail:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python &lt;/li&gt;
&lt;li&gt;NumPy&lt;/li&gt;
&lt;li&gt;Pandas&lt;/li&gt;
&lt;li&gt;Matplotlib&lt;/li&gt;
&lt;li&gt;PyTorch&lt;/li&gt;
&lt;li&gt;Linear Algebra&lt;/li&gt;
&lt;li&gt;Calculus Essentials &lt;/li&gt;
&lt;li&gt;Foundations for building your own neural network&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How To Avail This Opportunity?
&lt;/h2&gt;

&lt;p&gt;It consists of the following steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; &lt;br&gt;
Sign up for &lt;a href="https://aws.amazon.com/deepracer/student/" rel="noopener noreferrer"&gt;AWS DeepRacer Student&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; &lt;br&gt;
Opt into the AWS AI &amp;amp; ML Scholarship program by starting to learn. You will find two learning resources, on Introduction to Machine Learning and on Reinforcement Learning. You have to complete both.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9ju8qupz0q81b9aysn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9ju8qupz0q81b9aysn4.png" alt=" " width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; &lt;br&gt;
After completing the 20 hours of free learning, take the quiz. You need to obtain a score of 80% or higher on all AWS DeepRacer Student quizzes, which cover the basic concepts of machine learning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6ao36mipfhtoje6b6qc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6ao36mipfhtoje6b6qc.png" alt=" " width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; &lt;br&gt;
Once you pass the quiz, you will get a one-time code to apply for the &lt;strong&gt;&lt;a href="https://www.udacity.com/course/ai-programming-python-nanodegree--nd089" rel="noopener noreferrer"&gt;AI Programming With Python&lt;/a&gt;&lt;/strong&gt; Nanodegree.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xl1ddix1qrvib79focb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xl1ddix1qrvib79focb.png" alt=" " width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt; &lt;br&gt;
Submit your application on Udacity, starting February 1, 2023, using the unique code you receive once you complete the prerequisites.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Once you have completed all the steps, you will receive notification of the selection decision via email on June 12, 2023. Scholarship recipients will get full access to their Nanodegree content on June 14, 2023.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What If I Get a Scholarship?&lt;/strong&gt;&lt;br&gt;
If you are one of the lucky folks who get the scholarship, it is time to prepare for onboarding. All the necessary information will be shared with you in the email.&lt;/p&gt;

&lt;p&gt;You will have access to a dedicated Slack channel, where you will find yourself among mentors and like-minded students who are always there to help and support you. There will also be a connect session every week, where an instructor teaches the Nanodegree concepts live, so if you are confused about any content or topic you can ask the instructor directly in the virtual session.&lt;/p&gt;




</description>
      <category>aws</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
