AWS Glue custom connectors are the way to connect AWS Glue to data sources that are not natively supported by the AWS Glue connection types. They give you a wide range of connectivity options, either by letting you develop your own connectors or by reusing connectors from the Glue connectors marketplace.
To create custom connections, you first need to define a Glue custom connector and then create connection instances from this connector. This pattern describes how to deploy both the AWS Glue custom connector and its connection instances using the AWS Cloud Development Kit (CDK).
As an example, we will deploy a custom connector pointing to the SQL Server JDBC driver version 11.2 (the latest at the time of writing), and then create a connection instance from this connector. We will use AWS Secrets Manager to store the connection instance properties to make the pattern generic.
Target architecture
- The configurations of the Glue custom connector and connection are coded in Python in a CDK stack. Running cdk deploy synthesizes a CloudFormation template from these configurations.
- CDK deploys the CloudFormation stack to create the AWS Glue custom connector, the connections, and the connection secret.
- The AWS Glue connection uses the driver JAR from the Amazon S3 bucket and the connection secret from AWS Secrets Manager.
Tools
AWS Glue: AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.
Amazon Simple Storage Service: Amazon Simple Storage Service (Amazon S3) is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.
AWS CloudFormation: AWS CloudFormation enables you to create and provision AWS infrastructure deployments predictably and repeatedly. AWS CloudFormation enables you to use a template file to create and delete a collection of resources together as a single unit (a stack).
AWS Cloud Development Kit (CDK): CDK accelerates cloud development using common programming languages to model your applications.
AWS Secrets Manager: AWS Secrets Manager helps you manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycle.
Code
The code in this section is a CDK sample for creating an AWS Glue connector. The main connection input property that instructs CDK to create a connector rather than a connection is match_criteria, which should be set to "template-connection" as follows:
match_criteria=["template-connection"]
Additionally, the connection_type property should be set to "CUSTOM":
connection_type="CUSTOM"
The catalog_id connection property should point to the AWS account ID where the stack will be deployed.
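The complete stack in this article reads the account ID from the AWS_ACCOUNT_ID environment variable. As a minimal sketch, assuming you want to fail early when that variable is missing or malformed, a small check could run before the stack is built. The helper name require_account_id is hypothetical and not part of CDK.

```python
import os
import re


def require_account_id(env_var: str = "AWS_ACCOUNT_ID") -> str:
    """Return the AWS account ID stored in env_var, or raise if it is missing or malformed."""
    value = os.environ.get(env_var, "").strip()
    # AWS account IDs are 12-digit numbers.
    if not re.fullmatch(r"\d{12}", value):
        raise ValueError(
            f"{env_var} must be set to a 12-digit AWS account ID, got {value!r}"
        )
    return value
```

The returned value could then be passed as catalog_id. Inside a CDK stack, self.account is another way to obtain the account the stack targets.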
Note that you need to set the name property, because the connector name is used to tell CDK how to link custom connections to this connector. In this example, we set it as follows:
name="SynapseConnector"
In this code example, we parameterized the JDBC connection settings, such as host_url, database, user, password and authentication, so that the connector is generic and can be used to connect to any SQL Server or Synapse database. These parameters can be stored in a secret in AWS Secrets Manager and used when creating connection instances from this connector.
# The connector is itself a CfnConnection, marked as a template via match_criteria.
cfn_connector = glue.CfnConnection(self, "SynapseConnector",
    catalog_id="<account-id>",
    connection_input=glue.CfnConnection.ConnectionInputProperty(
        connection_type="CUSTOM",
        connection_properties={
            "CONNECTOR_CLASS_NAME": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "CONNECTOR_TYPE": "Jdbc",
            "CONNECTOR_URL": "s3://<bucket-prefix>/mssql-jdbc-11.2.0.jre8.jar",
            "JDBC_CONNECTION_URL": "[[\"default=jdbc:sqlserver://${host_url};database=${database};user=${user};password=${password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;authentication=${authentication}\"],\",\"]"
        },
        description="Synapse Connector",
        match_criteria=["template-connection"],
        name="SynapseConnector",
    )
)
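The JDBC_CONNECTION_URL value above is a JSON-style nested list written as a hand-escaped string, which is easy to get wrong. As a minimal sketch, assuming you prefer to generate that string rather than escape it by hand, the standard json module can produce the same text. The helper name build_jdbc_url_template is hypothetical.

```python
import json


def build_jdbc_url_template(jdbc_url: str) -> str:
    """Wrap a JDBC URL in the nested-list layout used for JDBC_CONNECTION_URL in this article."""
    # Layout: [["default=<url>"], ","], serialized without extra whitespace.
    return json.dumps([[f"default={jdbc_url}"], ","], separators=(",", ":"))


# The ${...} placeholders are kept literally; these are plain strings, not f-strings.
jdbc_url = (
    "jdbc:sqlserver://${host_url};database=${database};user=${user};"
    "password=${password};encrypt=true;trustServerCertificate=false;"
    "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;"
    "authentication=${authentication}"
)
jdbc_connection_url = build_jdbc_url_template(jdbc_url)
```

The resulting jdbc_connection_url can be used as the "JDBC_CONNECTION_URL" entry in connection_properties, so the connector and every connection built from it share one definition.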
Optionally, you can create a secret within the same CDK stack to hold the connection information.
The advantage of creating the secret within the same stack is that you can directly link it to the created connection. The code block below shows sample code for creating the secret within the CDK stack:
connection_secret = secretsmanager.Secret(self, "CustomconnectionSM",
    secret_object_value={
        "host_url": SecretValue.unsafe_plain_text("database.sql.azuresynapse.net"),
        "database": SecretValue.unsafe_plain_text("database"),
        "user": SecretValue.unsafe_plain_text("username"),
        "password": SecretValue.unsafe_plain_text("dummy password"),
        "authentication": SecretValue.unsafe_plain_text("ActiveDirectoryPassword")
    }
)
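The secret keys above mirror the ${...} placeholders in the connector's JDBC URL. As a minimal sketch, assuming the placeholder names must match the secret keys exactly (as they do in this example), a small check can catch a missing or misspelled key before deployment. The helper name missing_secret_keys is hypothetical, and the check only compares names; it does not reproduce how AWS Glue resolves the values.

```python
import re

# Matches placeholders of the form ${name}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def missing_secret_keys(url_template: str, secret_keys) -> set:
    """Return placeholder names in url_template that have no matching secret key."""
    return set(PLACEHOLDER.findall(url_template)) - set(secret_keys)


url_template = (
    "jdbc:sqlserver://${host_url};database=${database};user=${user};"
    "password=${password};authentication=${authentication}"
)
secret_keys = ["host_url", "database", "user", "password", "authentication"]

missing = missing_secret_keys(url_template, secret_keys)
if missing:
    raise ValueError(f"Secret is missing keys for placeholders: {sorted(missing)}")
```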
The following is a CDK code sample for creating a connection from the connector created above. You use the same configuration as the connector; however, to instruct CDK to link this connection to the connector, you need to set match_criteria to "Connection" followed by the connector name, as follows:
match_criteria=["Connection", cfn_connector.connection_input.name]
Additionally, to link the connection to the created secret, you need to set the SECRET_ID connection property as follows:
"SECRET_ID" : connection_secret.secret_name
For the complete Python code for the CDK stack, refer to the Additional information section.
Best practices
- Parameterize the custom connector connection string as much as possible to make it reusable among connections of the same type. You can store the parameters specific to each connection in a secret within AWS Secrets Manager.
- As a security best practice, don't include the real connection password in the secret creation code. Instead, create the secret with a dummy password and then define an automated or manual mechanism for updating the password within AWS Secrets Manager.
- It is better to let CDK generate the secret name. Deleting a secret from Secrets Manager does not happen immediately, but only after a recovery window of 7 to 30 days. During that window, it is not possible to create another secret with the same name.
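The second practice above leaves the mechanism for updating the password open. As a minimal sketch, assuming the secret stores a JSON object with a "password" key as in this article, the function below replaces only that key and keeps the others. The function name update_secret_password is hypothetical; the client argument is assumed to be a boto3 Secrets Manager client, whose get_secret_value and put_secret_value calls are used here.

```python
import json


def update_secret_password(client, secret_id: str, new_password: str) -> None:
    """Replace only the "password" key of a JSON secret, keeping the other keys."""
    # Read the current secret payload (a JSON string) and parse it.
    current = json.loads(client.get_secret_value(SecretId=secret_id)["SecretString"])
    current["password"] = new_password
    # Store the updated payload as a new version of the same secret.
    client.put_secret_value(SecretId=secret_id, SecretString=json.dumps(current))


# Example usage (requires boto3 and AWS credentials; the secret name is a placeholder):
#   import boto3
#   update_secret_password(boto3.client("secretsmanager"), "<secret-name>", "<real-password>")
```

Passing the client in as an argument keeps the function easy to test with a stub and avoids hard-coding a Region or credentials.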
Additional information
The complete Python code for the CDK stack:
import os

from aws_cdk import (
    Stack,
    aws_glue as glue,
    aws_secretsmanager as secretsmanager,
    SecretValue,
)
from constructs import Construct


class CdkCustomConnectionStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Account that owns the Glue Data Catalog, read from the environment.
        account_id = os.environ.get("AWS_ACCOUNT_ID")
        driver_jar_path = "s3://<s3-bucket-prefix>/mssql-jdbc-11.2.0.jre8.jar"

        # Connector: a CUSTOM connection marked as a template.
        cfn_connector = glue.CfnConnection(self, "SynapseConnector",
            catalog_id=account_id,
            connection_input=glue.CfnConnection.ConnectionInputProperty(
                connection_type="CUSTOM",
                connection_properties={
                    "CONNECTOR_CLASS_NAME": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                    "CONNECTOR_TYPE": "Jdbc",
                    "CONNECTOR_URL": driver_jar_path,
                    "JDBC_CONNECTION_URL": "[[\"default=jdbc:sqlserver://${host_url};database=${database};user=${user};password=${password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;authentication=${authentication}\"],\",\"]"
                },
                description="description",
                match_criteria=["template-connection"],
                name="SynapseConnector",
            )
        )

        # Secret holding the values for the ${...} placeholders in the JDBC URL.
        # The password is a dummy value, to be replaced after deployment.
        connection_secret = secretsmanager.Secret(self, "CustomconnectionSM",
            secret_object_value={
                "host_url": SecretValue.unsafe_plain_text("database.sql.azuresynapse.net"),
                "database": SecretValue.unsafe_plain_text("database"),
                "user": SecretValue.unsafe_plain_text("username"),
                "password": SecretValue.unsafe_plain_text("dummy password"),
                "authentication": SecretValue.unsafe_plain_text("ActiveDirectoryPassword")
            }
        )

        # Connection instance: linked to the connector by name and to the secret by SECRET_ID.
        cfn_connection = glue.CfnConnection(self, "SynapseConnection",
            catalog_id=account_id,
            connection_input=glue.CfnConnection.ConnectionInputProperty(
                connection_type="CUSTOM",
                connection_properties={
                    "CONNECTOR_CLASS_NAME": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                    "CONNECTOR_TYPE": "Jdbc",
                    "CONNECTOR_URL": driver_jar_path,
                    "JDBC_CONNECTION_URL": "[[\"default=jdbc:sqlserver://${host_url};database=${database};user=${user};password=${password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;authentication=${authentication}\"],\",\"]",
                    "SECRET_ID": connection_secret.secret_name
                },
                description="description",
                match_criteria=["Connection", cfn_connector.connection_input.name],
                name="SynapseConnection",
            )
        )