Sarma

Vector-Database: Qdrant-cluster on ECS-Fargate

When do you need this?

  • When SaaS is not viable, whether for compliance reasons or for network performance.
  • When the Vector-DB should be in the same AWS account as the rest of the solution.
  • When Kubernetes is not an option.

Questions? See article # 2, which explains in detail why certain things HAVE to be that way! This article (# 1) focuses only on CDK snippets to get started quickly (for those already very familiar with CDK).

Is all of this (CDK, ECS & Clusters) too much to handle?
Seek professional services from Qdrant.tech or, failing that, from me or my current employer (an AWS partner).

Short Summary

  • Run a 4- or 6-node Qdrant-cluster on ECS, across 2 or 3 AZs.
  • Keep it very simple, by making AWS take care of almost everything re: availability, uptime, AZ-balancing.
  • No matter how many failed nodes are replaced, use a single "endpoint" (to connect to the Qdrant-cluster).
  • No EBS. Rely on EFS for "native" Snapshots, for protecting data.
  • Secure access to the Qdrant-Dashboard via ALB + Qdrant-native API-Key.

How do I .. ?

  1. Article # 2 has the FULL DETAILS on the Critical-Design & Key-requirements that influenced/constrained/forced the final implementation.
  2. Article # 3 covers Snapshots.
  3. A separate GitLab-repo contains the full CDK-Construct.
  4. Assumption: you are OK with CUSTOM-building the Qdrant Container-IMAGE (using a custom Dockerfile) from Qdrant's GitHub repo. See article # 4 for a sensible/defensible Dockerfile.

Below are the key "lego-blocks" to get started, for CDK-experts.

Key variables

bashScriptFromArticle4 = .. local-path-inside-docker__to_bash-script-from-article-4 ..;

ecsClusterName = .. .. + '-' + cpuArchStr;
fargateContainerName = .. + '-' + cpuArchStr;
fargateServiceName = 'ECSSvc-'+ fargateContainerName;
taskDefContainerName = 'cont-'+ fargateContainerName;

domainName = ecsClusterName + ".local";
qdrantFQDN = 'qdrant-cluster.'+ domainName;

nativeApiKeySecret = new aws_secretsmanager.Secret( .. );
// excludeCharacters = "~`@#$%^&*+={}[]()|\\:;'\"”<>,/? ";
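
How these names hang together can be sketched as a small helper. The two prefixes are elided in the snippet above, so they are plain parameters here. `deriveQdrantNames`, `buildCollectionsRequest`, `clusterPrefix` and `containerPrefix` are hypothetical names, the example values are placeholders, and plain HTTP on port 6333 inside the VPC is an assumption. The `api-key` header is the one Qdrant checks once `QDRANT__SERVICE__API_KEY` is set.

```typescript
// Hypothetical helper: mirrors the string concatenations in "Key variables".
interface QdrantNames {
    ecsClusterName: string;
    fargateContainerName: string;
    fargateServiceName: string;
    taskDefContainerName: string;
    domainName: string;
    qdrantFQDN: string;
}

function deriveQdrantNames(clusterPrefix: string, containerPrefix: string, cpuArchStr: string): QdrantNames {
    const ecsClusterName = clusterPrefix + '-' + cpuArchStr;
    const fargateContainerName = containerPrefix + '-' + cpuArchStr;
    const domainName = ecsClusterName + '.local';
    return {
        ecsClusterName,
        fargateContainerName,
        fargateServiceName: 'ECSSvc-' + fargateContainerName,
        taskDefContainerName: 'cont-' + fargateContainerName,
        domainName,
        qdrantFQDN: 'qdrant-cluster.' + domainName,
    };
}

// Assumption: clients inside the VPC reach the REST port over plain HTTP.
function buildCollectionsRequest(qdrantFQDN: string, apiKey: string): { url: string; headers: Record<string, string> } {
    return {
        url: 'http://' + qdrantFQDN + ':6333/collections',
        headers: { 'api-key': apiKey },
    };
}

// Placeholder values, not the ones used in the GitLab-repo.
const exampleNames = deriveQdrantNames('myproj-qdrant', 'qdrant', 'arm64');
const exampleRequest = buildCollectionsRequest(exampleNames.qdrantFQDN, 'example-key');
console.log(exampleNames.fargateServiceName); // ECSSvc-qdrant-arm64
console.log(exampleRequest.url); // http://qdrant-cluster.myproj-qdrant-arm64.local:6333/collections
```

Whatever replaces a failed node, clients keep using the one `qdrantFQDN`; only the records behind it change.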

Fargate Task Definition

new aws_ecs.FargateTaskDefinition( .. .. {
    family: fargateContainerName, //// Equivalent to FargateTaskName/taskname
    cpu: ECS_FARGATE_CPU_SIZING + 512,
    memoryLimitMiB: ECS_FARGATE_MEMORY_SIZING_GB * 1024 + ( 1 * 1024 ),
    enableFaultInjection: (basicProps.tier === 'dev' ),
    ephemeralStorageGiB: 22, // ValidationError: Ephemeral storage size must be between 21GiB and 200GiB
    runtimePlatform: { cpuArchitecture: ecsCpuArchitecture, operatingSystemFamily: aws_ecs.OperatingSystemFamily.LINUX },
    volumes: [{
        name: fargateContainerName,  //// This name MUST match the sourceVolume used in the container's mount point.
        efsVolumeConfiguration: {
            fileSystemId: efsFileSystem.fileSystemId,
            rootDirectory: "/",
            transitEncryption: "ENABLED",
            authorizationConfig: { accessPointId: efsAccessPoint.accessPointId, iam: "ENABLED" }
        },
    }],
});
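
The task-level `cpu` and `memoryLimitMiB` above add headroom (512 CPU units and 1 GiB) on top of the container's own sizing, and Fargate only accepts certain CPU/memory pairs. A minimal sketch for checking the sums before deploying follows. `taskSizing` and `FARGATE_MEMORY_BY_CPU` are hypothetical names, and the table reflects the Linux task sizes as I understand them, so verify it against the current AWS documentation.

```typescript
// Build a list of memory sizes (in MiB) from a GiB range and step.
function rangeMiB(minGiB: number, maxGiB: number, stepGiB: number): number[] {
    const out: number[] = [];
    for (let g = minGiB; g <= maxGiB; g += stepGiB) {
        out.push(g * 1024);
    }
    return out;
}

// Assumption: Fargate (Linux) task-level CPU values and the memory sizes allowed for each.
const FARGATE_MEMORY_BY_CPU: Record<number, number[]> = {
    256: [512, 1024, 2048],
    512: rangeMiB(1, 4, 1),
    1024: rangeMiB(2, 8, 1),
    2048: rangeMiB(4, 16, 1),
    4096: rangeMiB(8, 30, 1),
    8192: rangeMiB(16, 60, 4),
    16384: rangeMiB(32, 120, 8),
};

// Same arithmetic as the task definition above: container sizing plus headroom.
function taskSizing(containerCpu: number, containerMemoryGB: number): { cpu: number; memoryLimitMiB: number; valid: boolean } {
    const cpu = containerCpu + 512;
    const memoryLimitMiB = containerMemoryGB * 1024 + 1 * 1024;
    const allowed = FARGATE_MEMORY_BY_CPU[cpu];
    const valid = allowed !== undefined && allowed.indexOf(memoryLimitMiB) !== -1;
    return { cpu, memoryLimitMiB, valid };
}

console.log(taskSizing(512, 3));  // { cpu: 1024, memoryLimitMiB: 4096, valid: true }
console.log(taskSizing(1024, 4)); // { cpu: 1536, memoryLimitMiB: 5120, valid: false }
```

In other words, a container sizing of 1024 CPU units would yield a task-level 1536, which is not a Fargate CPU value; 512 or 1536 at the container level lands on a valid one.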

Fargate Service

new aws_ecs.FargateService( .. .., {
    cluster: myECSCluster,
    serviceName: fargateServiceName +'-'+ label,
    taskDefinition: qdrantTaskDefinition,
    desiredCount: desiredCount,
    minHealthyPercent: 0,  // Allow running tasks to be stopped before their replacements are healthy
    maxHealthyPercent: 200, // Allow up to 2x desiredCount tasks to run while a deployment is in progress
    vpcSubnets: { onePerAz: true, subnetType: aws_ec2.SubnetType.PRIVATE_WITH_EGRESS, availabilityZones: vpc.availabilityZones },
    securityGroups: [fargateSvcSecurityGroup],
    assignPublicIp: false,
    enableExecuteCommand: true,         // Enable ECS Exec for debugging
    propagateTags: aws_ecs.PropagatedTagSource.SERVICE,
    availabilityZoneRebalancing: aws_ecs.AvailabilityZoneRebalancing.ENABLED,
    circuitBreaker: { enable: true, rollback: true },  // Enable deployment circuit breaker
});
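
What `minHealthyPercent` and `maxHealthyPercent` mean for a 4- or 6-node service can be worked out with a little arithmetic. This is a sketch, assuming ECS rounds the lower bound up and the upper bound down (worth confirming in the ECS deployment-configuration docs); `deploymentBounds` is a hypothetical name.

```typescript
// Number of tasks ECS may keep running during a rolling deployment.
// Assumption: lower bound is rounded up, upper bound is rounded down.
function deploymentBounds(desiredCount: number, minHealthyPercent: number, maxHealthyPercent: number): { minRunning: number; maxRunning: number } {
    return {
        minRunning: Math.ceil((desiredCount * minHealthyPercent) / 100),
        maxRunning: Math.floor((desiredCount * maxHealthyPercent) / 100),
    };
}

console.log(deploymentBounds(4, 0, 200)); // { minRunning: 0, maxRunning: 8 }
console.log(deploymentBounds(6, 0, 200)); // { minRunning: 0, maxRunning: 12 }
```

With a lower bound of 0, ECS is allowed to stop every node at once during a deployment, so the subnets need enough free IP addresses for up to twice `desiredCount` tasks.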

Container for Fargate-Task

Refer to article # 4 for how to build the Docker image.

const containerImage = aws_ecs.ContainerImage.fromEcrRepository(ecrRepo, qdrantImageTag);

//// Add Container to Task Definition
const dockerLabels = {
    'app.cluster': ecsClusterName,
    'app.service': fargateServiceName,
    'app.task-definition': fargateContainerName, // must match family's name of task-definition
    'app.component': 'vector-database',
    'app.version': cdkContextBuildQdrantImageTag,
    'app.environment': basicProps.tier,
};

const portMappings = [
    { containerPort: 6333, protocol: aws_ecs.Protocol.TCP, name: 'qdrant-rest', appProtocol: aws_ecs.AppProtocol.http },
    { containerPort: 6334, protocol: aws_ecs.Protocol.TCP, name: 'qdrant-grpc', appProtocol: aws_ecs.AppProtocol.grpc },
    { containerPort: 6335, protocol: aws_ecs.Protocol.TCP, name: 'qdrant-cluster', appProtocol: aws_ecs.AppProtocol.http },
];

const containerEnvironmentVariables = {
    // runtime environment-variables used by Qdrant-VectorDB docker-container
    QDRANT__CLUSTER__ENABLED: "true",
    QDRANT__SERVICE__API_KEY: nativeApiKeySecret.secretValue.unsafeUnwrap(),
};

qdrantPrimaryNodeTaskDefinition.addContainer( 'QdrantContainer', {
    containerName: taskDefContainerName,
    image: containerImage,
    user: "1000:1000",
    command: [ '/bin/sh', '-c', bashScriptFromArticle4 ],
    cpu: ECS_FARGATE_CPU_SIZING,
    memoryLimitMiB: ECS_FARGATE_MEMORY_SIZING_GB * 1024, // in MiB
    environment: containerEnvironmentVariables,
    dockerLabels: { ...dockerLabels, 'app.instance': 'primary-node' },
    portMappings,
    essential: true,
    logging: containerLogs,
    healthCheck: commonInsideContainerHealthCheck,
});
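
The `environment` block above passes the API key through `unsafeUnwrap()`, which, as far as I can tell, leaves the resolved key readable in the ECS task definition. If nothing in article # 2 requires that, one alternative is the container's `secrets` option. This is a minimal sketch, assuming aws-cdk-lib v2; `addQdrantContainerWithSecret` and its parameter names are hypothetical, and the remaining container options from the snippet above are left out for brevity.

```typescript
import { aws_ecs, aws_secretsmanager } from 'aws-cdk-lib';

// Hypothetical helper: `taskDefinition` and `apiKeySecret` stand in for
// qdrantPrimaryNodeTaskDefinition and nativeApiKeySecret from the snippets above.
export function addQdrantContainerWithSecret(
    taskDefinition: aws_ecs.FargateTaskDefinition,
    image: aws_ecs.ContainerImage,
    apiKeySecret: aws_secretsmanager.ISecret,
): aws_ecs.ContainerDefinition {
    return taskDefinition.addContainer('QdrantContainer', {
        image,
        // Non-sensitive settings stay as plain environment variables.
        environment: { QDRANT__CLUSTER__ENABLED: 'true' },
        // ECS injects the value at task start; the task definition only holds a reference to the secret.
        secrets: { QDRANT__SERVICE__API_KEY: aws_ecs.Secret.fromSecretsManager(apiKeySecret) },
    });
}
```

Qdrant still sees `QDRANT__SERVICE__API_KEY` as an ordinary environment variable inside the container, so nothing changes on the Qdrant side.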

SecurityGroup

const fargateSvcSecurityGroup = new aws_ec2.SecurityGroup( cdkScope, 'QdrantSecurityGroupForFargate', {
    vpc: vpc,
    securityGroupName: ecsClusterName + '-SvcInbound-' + fargateContainerName,
    description: `For FARGATE-service - inbound 6333 (VPC) and 6334,6335 (intra-cluster); outbound 443, NFS and intra-cluster only. Cluster: ${ecsClusterName}, Container: ${fargateContainerName}`,
    allowAllOutbound: false, // Deny all outbound traffic by default; explicit egress rules are added below
});

//// Add inbound rules for Qdrant ports
fargateSvcSecurityGroup.addIngressRule(aws_ec2.Peer.ipv4(vpc.vpcCidrBlock), aws_ec2.Port.tcp(6333),
    'Allow inbound traffic on port 6333 (Qdrant REST API)'
);
//// Add outbound HTTPS for ECR image pulling
fargateSvcSecurityGroup.addEgressRule(aws_ec2.Peer.anyIpv4(), aws_ec2.Port.tcp(443),
    'Allow outbound HTTPS for ECR image pulling, whether or NOT using VPC Endpoints'
);
//// Cluster-replication traffic; REF: https://qdrant.tech/documentation/guides/distributed_deployment/#enabling-distributed-mode-in-self-hosted-qdrant
fargateSvcSecurityGroup.addIngressRule(fargateSvcSecurityGroup, aws_ec2.Port.tcp(6334),
    'Allow INTRA-CLUSTER Replication traffic on port 6334 (Qdrant gRPC API)'
);
fargateSvcSecurityGroup.addEgressRule(fargateSvcSecurityGroup, aws_ec2.Port.tcp(6334),
    'Allow INTRA-CLUSTER Replication traffic on port 6334 (Qdrant gRPC API)'
);
fargateSvcSecurityGroup.addIngressRule(fargateSvcSecurityGroup, aws_ec2.Port.tcp(6335),
    'Allow INTRA-CLUSTER Replication traffic on port 6335 (Qdrant Cluster-Replication traffic)'
);
fargateSvcSecurityGroup.addEgressRule(fargateSvcSecurityGroup, aws_ec2.Port.tcp(6335),
    'Allow INTRA-CLUSTER Replication traffic on port 6335 (Qdrant Cluster-Replication traffic)'
);

//// Allow Fargate tasks to access EFS
for ( const efsSecurityGroup of efsSecurityGroups ) {
    efsSecurityGroup.addIngressRule(fargateSvcSecurityGroup, aws_ec2.Port.tcp(2049), 'Allow NFS access from Fargate tasks');
    fargateSvcSecurityGroup.addEgressRule(efsSecurityGroup, aws_ec2.Port.tcp(2049), 'Allow NFS access to EFS');
}
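
Because `allowAllOutbound` is `false`, every connection the nodes open themselves needs an egress rule, and node-to-node ports need a rule in both directions. The rules above are restated here as plain data with a small consistency check; `SgRule`, `QDRANT_SG_RULES` and `selfPortsMissingADirection` are hypothetical names used for illustration only, not part of the CDK construct.

```typescript
type Direction = 'ingress' | 'egress';

interface SgRule {
    direction: Direction;
    peer: string;     // 'self' means the Fargate security group itself
    port: number;
    purpose: string;
}

// The rule set from the snippet above, restated as data for review.
const QDRANT_SG_RULES: SgRule[] = [
    { direction: 'ingress', peer: 'VPC CIDR', port: 6333, purpose: 'Qdrant REST API' },
    { direction: 'egress', peer: 'any IPv4', port: 443, purpose: 'HTTPS for ECR image pulling' },
    { direction: 'ingress', peer: 'self', port: 6334, purpose: 'intra-cluster gRPC' },
    { direction: 'egress', peer: 'self', port: 6334, purpose: 'intra-cluster gRPC' },
    { direction: 'ingress', peer: 'self', port: 6335, purpose: 'cluster replication' },
    { direction: 'egress', peer: 'self', port: 6335, purpose: 'cluster replication' },
    { direction: 'egress', peer: 'EFS security groups', port: 2049, purpose: 'NFS to EFS' },
];

// Returns the node-to-node ports that have a rule in only one direction.
function selfPortsMissingADirection(rules: SgRule[]): number[] {
    const selfRules = rules.filter((r) => r.peer === 'self');
    const missing: number[] = [];
    for (const rule of selfRules) {
        const opposite: Direction = rule.direction === 'ingress' ? 'egress' : 'ingress';
        const paired = selfRules.some((r) => r.port === rule.port && r.direction === opposite);
        if (!paired && missing.indexOf(rule.port) === -1) {
            missing.push(rule.port);
        }
    }
    return missing;
}

console.log(selfPortsMissingADirection(QDRANT_SG_RULES)); // []
```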

Other related articles

  1. Get-Started article # 1: this one.
  2. Article # 2 has the FULL DETAILS on the Critical-Design & Key-requirements that influenced/constrained/forced the final implementation.
  3. Article # 3 covers Snapshots.
  4. A separate GitLab-repo contains the full CDK-Construct.
  5. Assumption: you are OK with CUSTOM-building the Qdrant Container-IMAGE (using a custom Dockerfile) from Qdrant's GitHub repo. See article # 4 for a sensible/defensible Dockerfile.

End.
