M Quamer Nasim

Posted on • Originally published at quamernasim.Medium

Enhancing Data Security with Role-Based Access Control of Qdrant Vector Database

Data security has emerged as a major concern with the growing need for Retrieval Augmented Generation (RAG)-powered Generative AI applications in large companies. At the heart of RAG applications lies the vector database, which stores all the company’s proprietary data. This database is used by large language models (LLMs) to perform similarity searches and retrieve relevant content.

In large organizations, there are multiple levels, various departments, and different roles, each with access to different levels of sensitive information. For example, financial and company roadmap-related documents may only be accessible to top officials and are not required by developers. Therefore, it’s essential to restrict database or collection access based on defined roles. This approach not only helps to maintain security but also ensures that LLMs retrieve only the content a given user’s role permits, keeping responses accurate and relevant to that user.

To address these needs, new Role-Based Access Control (RBAC) options have been introduced via JSON Web Tokens (JWT) in the Qdrant Vector Database in their latest 1.9 release. API keys previously supported basic read and write operations. However, recognizing the evolving needs of users, particularly large organizations, additional options for finer control over data access within internal environments have been implemented.

Qdrant 1.9: Introducing Role-Based Access Control and JWT Tokens

With the release of Qdrant version 1.9, significant advancements have been made in enhancing data security through the introduction of RBAC and JWT tokens. These new access control options offer a more granular and secure way to manage data access within large organizations.

In earlier versions of Qdrant, access control was managed using API keys, which supported basic read and write operations. Version 1.9 adds finer-grained access control options based on JSON Web Tokens (JWT).

JWT allows a user to have limited access to specific data or collections in the database. By using JWT-based authentication, tokens with restricted access can be issued, which will help in the implementation of RBAC. This basically means administrators can define permissions for users, restricting access to sensitive endpoints and ensuring that only authorized individuals can access particular data segments.

The use of RBAC will help administrators assign specific roles and privileges to users based on their positions and responsibilities within the organization. This will be very useful in environments where different departments and roles require varying levels of access to the vector database. For instance, while developers might need access to certain datasets, financial information can be restricted to top-level executives.

Role-Based Access Control (RBAC)

RBAC in Qdrant allows administrators to define roles and assign specific privileges to users based on their roles within the organization. This ensures that users only have access to the data and actions necessary for their role, enhancing security and operational efficiency. Administrators can use the table below that outlines the actions allowed or denied based on the access level.

Actions allowed for different Roles (Symbols: ✅ Allowed | ❌ Denied | 🟡 Allowed, but filtered)

By using JWT tokens and RBAC, Qdrant ensures that each user has the appropriate level of access to perform their tasks efficiently while maintaining strict security protocols. This system provides a scalable and secure approach to managing user permissions, making it ideal for enterprises of all sizes.

Qdrant emerges as the best choice for organizations seeking fine-grained user access control and enhanced security measures. Unlike other databases such as Pinecone, Milvus, Chroma, and Weaviate, Qdrant offers a much higher level of granularity in access control, which sets it apart. With its JWT-based RBAC approach, Qdrant allows users to define permissions and restrict access to specific data parts, ensuring sensitive endpoints remain protected. This fine-grained control is coupled with Qdrant’s ability to integrate seamlessly with hybrid cloud environments and Kubernetes clusters, providing organizations with scalability and enhanced security.

Guide to Use JWT Auth for Role-Based Access

Starting from version 1.9.0, Qdrant supports granular access control using JSON Web Tokens (JWT). This means you can create tokens that grant specific permissions to access different parts of your data. With JWT, you can set up RBAC, defining what each user can and cannot do.

Enabling JWT-Based Authentication

To enable JWT-based authentication in Qdrant, we need to configure it by setting an api_key and enabling the jwt_rbac feature. There are two ways to do this: using a configuration file or environment variables.

  • Using Configuration File: We will open our Qdrant configuration file and add the following lines to the configuration:
service:
  api_key: your_secret_api_key_here
  jwt_rbac: true
  • Using Environment Variables: We can also set the following environment variables:
export QDRANT__SERVICE__API_KEY=your_secret_api_key_here
export QDRANT__SERVICE__JWT_RBAC=true

Make sure to replace your_secret_api_key_here with your actual secret key. This api_key is crucial because it is the secret used to sign and verify the JWTs, so it needs to be kept secure.
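Since the api_key doubles as the JWT signing secret, it helps if it is long and random. The snippet below is a minimal sketch of one way to generate such a value with Python's standard library. The helper name new_api_key and the default of 32 random bytes are assumptions made for illustration, not a Qdrant requirement.

```python
import secrets


def new_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of randomness.

    Hypothetical helper: Qdrant does not mandate a key format, this is
    just one reasonable way to produce a hard-to-guess secret.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    key = new_api_key()
    # Print only the length so the secret itself does not end up in logs.
    print(f"Generated an API key with {len(key)} characters")
```

The generated value can then be placed in the configuration file or in the QDRANT__SERVICE__API_KEY environment variable shown above.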

Generating JSON Web Tokens

JWTs can be generated with any standard JWT library, and we don’t need access to the Qdrant instance to generate them. For example, we can use PyJWT (Python), jsonwebtoken (JavaScript), or jsonwebtoken (Rust).

JWT Structure

Let’s briefly understand the structure of the JWT token used to set up the RBAC. A JWT consists of three parts: the header, the payload, and the signature.

  • Header: Specifies the algorithm used to sign the token. Qdrant uses the HS256 algorithm.
{
"alg": "HS256",
"typ": "JWT"
}
  • Payload: Contains the claims or data you want to include in the token. Here are some common claims you might use:
{
"exp": 1640995200, // Expiration time (Unix timestamp)
"value_exists": { /* See explanation below */ },
"access": "r" // Access level
}
  • Signature: The token is signed with your api_key to ensure its validity. Qdrant can verify this signature using the same api_key.
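To make the three parts concrete, here is a minimal sketch that assembles and checks an HS256 token using only the Python standard library. It is meant to illustrate the header.payload.signature layout, not to replace a maintained library such as PyJWT. The function names b64url, sign_hs256, and verify_hs256 are made up for this example.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode bytes and strip the '=' padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature for the given claims (illustrative only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature with the same secret and compare in constant time."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header_b64, payload_b64, signature_b64 = parts
    signing_input = f"{header_b64}.{payload_b64}".encode("utf-8")
    expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected).encode("utf-8"), signature_b64.encode("utf-8"))


if __name__ == "__main__":
    token = sign_hs256({"access": "r", "exp": 1640995200}, "your_secret_api_key_here")
    print(token.count("."))  # two dots separate the three segments
    print(verify_hs256(token, "your_secret_api_key_here"))
```

Because the signature depends on both the payload and the secret, changing a claim or signing with a different key makes verification fail, which is what lets the server trust the claims in a token.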

Using JWT in Requests

Once JWT-based authentication is enabled, we now need to include the JWT in our requests to Qdrant. This can be done in two ways:

Authorization Header: Add the JWT as a bearer token in the Authorization header of the request.
Authorization: Bearer <JWT>
Api-Key Header: Alternatively, we can also include the JWT as a key in the Api-Key header.

Api-Key: <JWT>

Here’s an example using the Qdrant client in Python:

from qdrant_client import QdrantClient

qdrant_client = QdrantClient(
    "your_qdrant_instance_url",
    api_key="<JWT>",
)
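For plain HTTP calls made without the client library, the two header styles described above can be sketched as follows. The helper auth_headers is a hypothetical name, and the request object is only constructed here, not sent. The /collections path is Qdrant's REST endpoint for listing collections.

```python
import urllib.request


def auth_headers(token: str, use_bearer: bool = True) -> dict:
    """Return the HTTP header carrying the JWT in one of the two accepted styles."""
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"Api-Key": token}


if __name__ == "__main__":
    # Build (but do not send) a request to a local instance, assuming the default port.
    request = urllib.request.Request(
        "http://localhost:6333/collections",
        headers=auth_headers("<JWT>"),
    )
    print(request.full_url)
    print(auth_headers("<JWT>", use_bearer=False))
```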

Generating JWT Tokens from Web UI

Qdrant provides a convenient JWT generation tool within the Web UI. This tool is accessible under the 🔑 tab. For a local instance, it can be found at http://localhost:6333/dashboard#/jwt.

Here’s a quick guide on how to generate JWT tokens from the Web UI:

  1. Access the JWT Tool: Navigate to the 🔑 tab in the Qdrant Web UI.
  2. Provide API Key: When prompted for the API key on the JWT dashboard, enter your API key.
  3. Generate Token: Follow the on-screen instructions to generate a JWT token. This token will encapsulate the user’s permissions and access levels.
  4. Use the Token: Include this token in the header of your API requests to authenticate and authorize the actions performed by the user.


Step-by-Step Tutorial to Set Up RBAC on Local Qdrant Instance

In this blog post, I will show you how to implement RBAC (Role-Based Access Control) with the help of JWT in the Qdrant Vector Database. I will be using the following data structure to create multiple collections.

├── data
│   ├── financial
│   │   └── Sample-Accounting-Income-Statement-PDF-File.pdf
│   └── general
│       ├── avengers-endgame-script-pdf.pdf
│       └── security_policy.pdf

The idea is to create two collections, one for financial data and the other for general data. General data will have multiple files, and financial data will have only one file. Then we will see RBAC in action and see how we can restrict access of the user based on the role assigned to them.

To install Qdrant, we will be using Docker. Run the following commands to install Docker and pull the Qdrant image.

sudo apt-get update
sudo apt install docker.io
docker pull qdrant/qdrant

Once this is done, we need to create a config.yaml file so that we can enable RBAC in Qdrant. Copy the following settings into your config.yaml file.

service:
 api_key: {your_API_key}
 jwt_rbac: true

After creating the config.yaml file let’s now run the Qdrant container so that we can begin the RBAC tutorial.

docker run -p 6333:6333 -v /home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/:/qdrant/storage -v /home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/config.yaml:/qdrant/config/config.yaml qdrant/qdrant

Now, we can either open the dashboard or get started with Python. In this blog, we don’t really care much about the dashboard; we will get everything done in Python. So, let’s dive in.

Let’s start by connecting to Qdrant.

# Qdrant related Parameters
api = 'jhvfegfeboihf313fekfgejbv' # 'your_api_key'
host = 'localhost'
port = 6333
url = f'http://{host}:{port}'

I will be keeping the loading of the dataset, generating embeddings, and creating collections at a very high level. If you want to know more about these topics, you can refer to one of my previous blog posts where I explained how to build a chatbot using Qdrant, Llama3, Ollama, and LangChain. In this blog post, I will be focusing on RBAC.

Now, let’s start with different security scenarios.

Without Any Token, with RBAC Enabled

We have enabled RBAC in Qdrant, but we have not created any tokens yet. Let’s see how it behaves in this case.

from qdrant_client import QdrantClient

client = QdrantClient(url=url)
client.get_collections()
# Output of the above code
UnexpectedResponse: Unexpected Response: 401 (Unauthorized)
Raw response content:
b'Must provide an API key or an Authorization bearer token'

As expected, Qdrant does not allow any operation without a token. Now, in the next section, let’s create a JWT token and try to access the Qdrant API.

Global Read-Only Access

Let’s first create a function that we can reuse to generate JWT tokens.

import jwt
def generate_jwt(api, payload):
   '''
   This function generates a JWT token using the payload and the API key

   Args:
   api: API key
   payload: Payload to be encoded in the JWT token. It contains the access rights

   Returns:
   encoded_jwt: JWT token
   '''
   encoded_jwt = jwt.encode(payload, api, algorithm='HS256')
   return encoded_jwt

Here, let’s first create a token with global read-only access. With global read-only access, the user can only read the resources in the cluster. They cannot create, update, or delete resources. This essentially means that the user can read all the collections available, so be careful when granting this permission.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has global read only access to all the collections.
# It also specifies that this token will expire in 1 hour.
payload = {
 "access": "r",
 "exp": current_time + 3600, # 1 hour
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)
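When debugging, it can be handy to look at the claims inside a token and see how long it remains valid. The following is a small sketch that decodes the payload segment without verifying the signature, so it is suitable for inspection only and never for authorization decisions. The helper names read_claims and seconds_until_expiry are assumptions for this example.

```python
import base64
import json
import time


def read_claims(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT. Does NOT verify the signature."""
    payload_b64 = token.split(".")[1]
    # Restore the '=' padding that the JWT encoding strips.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now=None) -> float:
    """Return the seconds left before the 'exp' claim is reached (negative if expired)."""
    current = time.time() if now is None else now
    return read_claims(token)["exp"] - current
```

For the token generated above, seconds_until_expiry should return a value close to 3600 right after creation.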

Currently, we have no collections in the Qdrant Vector DB. Let’s see how the API behaves in this case.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
client.get_collections()
# Output of the above code
CollectionsResponse(collections=[])

Great, it returned an empty list of collections. Now let’s try to delete a collection with the same token. Note that this should fail, as the token has read-only access and deleting a collection is a manage operation.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
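# collection_name is only defined later in the tutorial, so we set it here for illustration
collection_name = 'general'
# Try to delete the collection (this requires manage access)
client.delete_collection(collection_name=collection_name)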
# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Global manage access is required"},"time":0.000023168}'

As we can see, the API returned a 403 Forbidden error saying ‘Global manage access is required’ to delete a collection.

Now, in the next section, let’s create a token with global manage access and try to create a collection.

Global Manage Access

Now let’s create a token with global manage access. With Global Manage Access, the user can read, create, update, and delete collections in the cluster. This essentially means that the user can perform all the operations on all the collections available, so be extremely careful when granting this permission. You should only grant this permission to Admins.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has global manage access to all the collections.
# It also specifies that this token will expire in 1 hour.
# You should only generate this token for admin users.
payload = {
 "access": "m",
 "exp": current_time + 3600, # 1 hour
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)


Let’s again try to list the collections using this new token.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
client.get_collections()
# Output of the above code
CollectionsResponse(collections=[])

Since no collections are available, it returned an empty list. Next, let’s try to delete a collection using this token. Remember, this operation failed using the previous read-only token. Even though we don’t have any collections, let’s try to delete a collection and see what happens.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
# collection_name is only defined later in the tutorial, so we set it here for illustration
collection_name = 'general'
# Delete the collection if it exists
client.delete_collection(collection_name=collection_name)
# Output of the above code
False

As we can see, it ran successfully and returned False as there was no collection to delete. Now let’s try to create two collections, one for financial data and the other for general data. Then we will try to explore the RBAC in more detail.

Before creating the collections, let’s first load the embeddings model to generate embeddings for the documents. Here we will keep the indexing phase at a high level. If you’re interested in knowing more about it, you can refer to one of my previous blog posts. For this blog post, I will be focusing on RBAC.

import fasttext as ft
# download the fasttext model from https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz and then unzip it
embedding_model_path = '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/embedding_model/cc.en.300.bin'
# Load the embedding model
embed_model = ft.load_model(embedding_model_path)

Let’s define some constants for the creation of collections.

# data related parameters
chunk_size = 500
chunk_overlap = 50
batch_size = 4000

# vector related parameters
vector_size = 300
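To see how chunk_size and chunk_overlap interact, here is a deliberately simplified sliding-window chunker. It is not the RecursiveCharacterTextSplitter used later in this tutorial, which also tries to split on separators such as paragraphs and sentences. The function chunk_text is a hypothetical stand-in that only shows the window and overlap arithmetic.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "x" * 1200
    print([len(chunk) for chunk in chunk_text(sample)])
```

With the values above, each new chunk starts 450 characters after the previous one, so consecutive chunks share 50 characters of context.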

Let’s create the chunks from the general data. But before we move on to this step, let’s first create two essential utility functions that will help us create the chunks and their respective embeddings.

import pandas as pd
from tqdm.notebook import tqdm

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, Batch

def generate_embeddings_from_fastext_model(docs, embed_model):
    '''
    Generate embeddings for the documents using the FastText model

    Args:
    docs: List of documents
    embed_model: FastText model

    Returns:
    df: Dataframe with the documents, embeddings, metadata and payload
    '''

    # convert the documents to a dataframe
    # This dataframe will be used to create the embeddings
    # And later will be used to update the Qdrant Vector Database
    data = []
    for doc in tqdm(docs):
        # Get the page content and metadata for each chunk
        # Meta data contains chunk source or file name
        row_data = {
            "page_content": doc.page_content,
            "metadata": doc.metadata
        }
        data.append(row_data)

    df = pd.DataFrame(data)

    # Replace the new line characters with space
    df['page_content'] = df['page_content'].replace('\\n', ' ', regex=True)

    # Create a unique id for each document.
    # This id will be used to update the Qdrant Vector Database
    df['id'] = range(1, len(df) + 1)

    # Create a payload column in the dataframe
    # This payload column includes the page content and metadata
    # This payload will be used when LLM needs to answer a query
    df['payload'] = df[['page_content', 'metadata']].to_dict(orient='records')

    # Create embeddings for each chunk
    # This embeddings will be used when doing a similarity search with the user query
    df['embeddings'] = df['page_content'].apply(lambda x: (embed_model.get_sentence_vector(x)).tolist())

    return df


def create_new_collection(url, jwt, collection_name, df, vector_size, batch_size, delete_prev = False, create_from_scratch = False):

    '''
    This function creates a new collection in Qdrant Vector Database
    and updates the collection with the embeddings

    It starts by connecting to the Qdrant Vector Database running in Docker
    Then, if delete_prev is True, it deletes the collection if it already exists
    Then, if create_from_scratch is True, it creates a new collection with the specified name and vector size
    Then it upserts the first batch_size rows of embeddings into the collection
    Finally, it closes the connection to the Qdrant Vector Database

    Args:
    url: URL of the Qdrant Vector Database
    jwt: JWT token
    collection_name: Name of the collection
    df: Dataframe with the documents, embeddings, metadata and payload
    vector_size: Dimension of the embedding vectors
    batch_size: Maximum number of rows to upsert
    delete_prev: If True, delete the existing collection first
    create_from_scratch: If True, create the collection before upserting

    Returns:
    None
    '''

    # Create a QdrantClient object
    # client = QdrantClient('https://localhost:6333')
    client = QdrantClient(url=url, api_key = jwt)

    # delete the collection if it already exists
    # remove or comment this line if you want to keep the existing collection
    # and want to use the existing collection to update new points
    if delete_prev:
        client.delete_collection(collection_name=collection_name)

    # Create a fresh collection in Qdrant
    # remove or comment this line if you do not want to create a new collection
    if create_from_scratch:
        client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
        )

    # Update the Qdrant Vector Database with the embeddings
    # We are updating the embeddings in batches
    # Since the data is large, we will only update the first batch of size 4000
    client.upsert(
    collection_name=collection_name,
    points=Batch(
        ids=df['id'].to_list()[:batch_size],
        payloads=df['payload'].to_list()[:batch_size],
        vectors=df['embeddings'].to_list()[:batch_size],
    ),
    )

    # Close the QdrantClient
    client.close()

    print(f"Collection {collection_name} created and updated with the embeddings")
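The function above uploads only the first batch_size rows. If a dataset is larger than one batch, one option is to loop over consecutive slices and upsert each slice in turn. The generator below is a minimal sketch of that slicing step. The name iter_batches is hypothetical, and the actual upsert call is left out so that the example runs without a Qdrant instance.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    ids = list(range(1, 11))
    for batch in iter_batches(ids, 4):
        # In the tutorial's setting, each batch of ids (with the matching
        # payloads and vectors) would be passed to one upsert call.
        print(batch)
```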

Great! Now let’s go ahead and start creating the chunks and their respective embeddings.

from os.path import join as pjoin  # pjoin is used below to build the data path

from langchain_community.document_loaders import DirectoryLoader
# from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

collection_type = 'general'
root = '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data'
data_path = pjoin(root, collection_type)
collection_name = collection_type

# Load the documents from the directory
loader = DirectoryLoader(data_path, loader_cls=PyPDFLoader)

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,
   is_separator_regex=False,
)
docs = loader.load_and_split(text_splitter=text_splitter)

from utils import generate_embeddings_from_fastext_model
# Generate the embeddings for the data
df = generate_embeddings_from_fastext_model(docs, embed_model)

from utils import create_new_collection
# Create a new collection with manage access
create_new_collection(url, jwt, collection_name, df, vector_size, batch_size, delete_prev = True, create_from_scratch = True)


We do the same for the financial data.

from os.path import join as pjoin  # pjoin is used below to build the data path

from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

collection_type = 'financial'
root = '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data'
data_path = pjoin(root, collection_type)
collection_name = collection_type

# Load the documents from the directory
loader = DirectoryLoader(data_path, loader_cls=PyPDFLoader)

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,
   is_separator_regex=False,
)
docs = loader.load_and_split(text_splitter=text_splitter)

from utils import generate_embeddings_from_fastext_model
# Generate the embeddings for the data
df = generate_embeddings_from_fastext_model(docs, embed_model)

from utils import create_new_collection
# Create a new collection with manage access
create_new_collection(url, jwt, collection_name, df, vector_size, batch_size, delete_prev = True, create_from_scratch = True)

Great! Now we have created two collections, ‘general’ and ‘financial’ collections. Let’s see if we can read these collections with different sets of tokens having different permissions.

Collection Specific Access

With this access, we can limit the access of the user to a specific collection only. This is the most secure way to grant access to the user. We can also limit the access to the types of documents in that collection or pages as well. Let’s see how we can do this.

In our Qdrant Vector Database, we now have two collections, ‘general’ and ‘financial’. As can be understood from the names, the ‘general’ collection contains general data, and the ‘financial’ collection contains financial data. Due to the nature of the data, we want to restrict access of each user to specific collections as per their roles in the organization.
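One way to connect organizational roles to tokens is to keep a small table that maps each role to its collection-level access entries, and to build the JWT payload from that table. The sketch below assumes three made-up roles (developer, editor, finance) and uses the same payload shape as the collection-specific tokens generated in the following subsections. The names ROLE_ACCESS and payload_for_role are hypothetical.

```python
import time

# Hypothetical role table: each role lists the collections it may use
# and the access level for each ("r" = read-only, "rw" = read-write).
ROLE_ACCESS = {
    "developer": [{"collection": "general", "access": "r"}],
    "editor": [{"collection": "general", "access": "rw"}],
    "finance": [
        {"collection": "general", "access": "r"},
        {"collection": "financial", "access": "r"},
    ],
}


def payload_for_role(role: str, ttl_seconds: int = 3600, now=None) -> dict:
    """Build a JWT payload with an expiry time and the role's collection access list."""
    if role not in ROLE_ACCESS:
        raise KeyError(f"Unknown role: {role}")
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "exp": issued_at + ttl_seconds,
        # Copy the entries so callers cannot mutate the shared role table.
        "access": [dict(entry) for entry in ROLE_ACCESS[role]],
    }


if __name__ == "__main__":
    print(payload_for_role("finance", now=0))
```

The resulting dictionary can be passed to a signing helper such as generate_jwt to obtain the token for that role.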

Read-Only Access

Here, in this section, we will create a token with read-only access to a specific collection only. Let’s see how it behaves.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has read access to the general collection only.
# You can give access to multiple collections by adding multiple dictionaries in the access list.
# For now, we are only giving access to the general collection.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general", # collection name - Change this to the collection you want to give access to, like financial
     "access": "r"
   },
 ]
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
client.get_collections()
# Output of the above code
CollectionsResponse(collections=[CollectionDescription(name='general')])

This is great. We can see that the user can only access the ‘general’ collection and not the ‘financial’ collection. Now let’s try to verify if the user has read-only access to the ‘general’ collection.

import numpy as np

# We are generating a random query vector of size vector_size
query_vector = np.random.rand(vector_size)

# We are searching for the closest points to the query vector in the general collection
# Since we have the read access to the general collection, we can search in it.
hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits
# Output of the above code
[ScoredPoint(id=11, version=2, score=0.07114598, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'is built using the Rust language - a static multi-paradigm, memory-efficient, low-level programming language focused on speed, security, and performance. The intention is to build Qdrant with as few moving parts as possible, thereby keeping the attack vector as low as possible. Email security Qdrant supports TLS encryption on all inbound and outbound emails. Qdrant uses Gmail to provide email and communication services. For an explanation of how email encryption works, take a look at this'}, vector=None, shard_key=None),
 ScoredPoint(id=10, version=2, score=0.045524757, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer, and the stateless access control firewall rules before then being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language - a static'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.043969806, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=12, version=2, score=0.03491432, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'how email encryption works, take a look at this overview from Google. Data residency On Qdrant Cloud, the location of data can be specified. Locations may include London, Ireland, Belgium, Germany, Switzerland, North America, South America, Australia, Canada, Tokyo, or Singapore. Data will not be moved or replicated outside of a specified location. Data in transit All data is encrypted when it is being transmitted between client devices and Qdrant Cloud. SSL/TLS certificates shield data using'}, vector=None, shard_key=None),
 ScoredPoint(id=13, version=2, score=0.027367812, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'Cloud. SSL/TLS certificates shield data using 256-byte signatures and either 2048-bit or 4096-bit keys. All connections to Content Delivery Network (CDN) servers and the database layer are'}, vector=None, shard_key=None)]

As we can see, the user can read the ‘general’ collection. Now let’s try to update the ‘general’ collection with the same read-only token. Let’s hope that it fails.

import numpy as np
from qdrant_client.models import PointStruct

# We are generating 100 random vectors of size vector_size
vectors = np.random.rand(100, vector_size)

# We are upserting these vectors in the general collection
# Since we have read-only access to the general collection, we won't be able to insert the vectors.
client.upsert(
  collection_name="general",
  points=[
     PointStruct(
           id=idx,
           vector=vector.tolist(),
           payload={"color": "red", "rand_number": idx % 10}
     )
     for idx, vector in enumerate(vectors)
  ]
)
# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Write access to collection general is required"},"time":0.000079842}'

As we’d guessed, the API returned a 403 Forbidden error saying, ‘Write access to collection general is required’. This is great. Now let’s see if we can read the financial collection with the same read-only token for the general collection. Just a heads up — this should fail.

# We are generating a random query vector of size vector_size
query_vector = np.random.rand(vector_size)

# We are searching for the closest points to the query vector in the financial collection
# Since we have read-only access to the general collection only, we won't be able to search in the financial collection.
hits = client.search(
  collection_name="financial",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits
# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Access to collection financial is required"},"time":7.61e-6}'

Great. Once again, the API returned a 403 Forbidden error saying, ‘Access to collection financial is required’. In the next section, let’s test with read-write access to a specific collection. We will also see how we can grant access to multiple collections to a user.
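The behaviour observed so far can be summarized in a small decision function. This is a simplified mental model of the access levels used in this tutorial (global "r" and "m", collection-level "r" and "rw"), written as an assumption-laden sketch. It is not Qdrant's actual authorization code, and the operation names "read", "write", and "manage" are labels invented for this example.

```python
def is_allowed(claims: dict, collection: str, operation: str) -> bool:
    """Return True if the token claims permit the operation on the collection.

    operation is one of "read", "write", or "manage" (illustrative labels).
    """
    if operation not in ("read", "write", "manage"):
        raise ValueError(f"Unknown operation: {operation}")

    access = claims.get("access")

    # Global access: a single string applies to every collection.
    if isinstance(access, str):
        if access == "m":
            return True  # manage access covers everything
        if access == "r":
            return operation == "read"
        return False

    # Collection-level access: a list of per-collection entries.
    for entry in access or []:
        if entry.get("collection") == collection:
            level = entry.get("access")
            if operation == "read":
                return level in ("r", "rw")
            if operation == "write":
                return level == "rw"
            return False  # assumed: manage operations need global manage access

    # Collections that are not listed are not accessible at all.
    return False
```

For example, with the read-only token for the general collection, this model allows a read on general and rejects both a write on general and any access to financial, which matches the responses shown above.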

Read-Write Access

Here we will grant the user read-write access to the ‘general’ collection and, in the same token, read-only access to the ‘financial’ collection.
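
The cell below relies on the `generate_jwt` helper from `utils`. For readers who want to see what such a helper might do under the hood, here is a standard-library-only sketch of HS256 signing, which is the algorithm Qdrant's documentation describes for its JWT-based access control, with the API key as the secret. The names `encode_hs256`, `decode_hs256`, and `demo_api_key` are hypothetical stand-ins and not the article's `utils` implementation. In practice, a maintained library such as PyJWT is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(secret: str, signing_input: str) -> str:
    """HMAC-SHA256 over 'header.payload', returned as base64url text."""
    digest = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return _b64url(digest)


def encode_hs256(secret: str, claims: dict) -> str:
    """Build 'header.payload.signature' for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(secret, signing_input)}"


def decode_hs256(secret: str, token: str) -> dict:
    """Verify the signature and return the claims; raise ValueError on mismatch."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(secret, signing_input)):
        raise ValueError("signature mismatch")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


demo_api_key = "my-demo-api-key"  # hypothetical secret, not a real key
demo_claims = {
    "exp": int(time.time()) + 3600,  # expires in 1 hour
    "access": [
        {"collection": "general", "access": "rw"},
        {"collection": "financial", "access": "r"},
    ],
}

demo_token = encode_hs256(demo_api_key, demo_claims)
print(demo_token)
print(decode_hs256(demo_api_key, demo_token) == demo_claims)
```

The round trip through `decode_hs256` shows why the token can be trusted by the server: anyone can read the claims, but only a holder of the secret can produce a signature that verifies.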

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload, along with the API key, is used to generate the JWT.
# The token grants the user access to two collections: general and financial.
# Access to the general collection is read-write and access to the financial collection is read-only.
# The token expires in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general",
     "access": "rw"
   },
   {
     "collection": 'financial',
     "access": "r"
   }
 ]
}

# Generate the JWT token.
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient
client = QdrantClient(url=url, api_key=jwt)

collection = client.get_collections()
collection
# Output of the above code
CollectionsResponse(collections=[CollectionDescription(name='general'), CollectionDescription(name='financial')])

Nice! The user has access to both the collections, ‘general’ and ‘financial’. Now let’s try to update the ‘general’ collection with the same token. Since the token has read-write access to the ‘general’ collection, it should work.

import numpy as np
from qdrant_client.models import PointStruct

# We are generating 100 random vectors of size vector_size
vectors = np.random.rand(100, vector_size)

# We are inserting these vectors in the general collection.
# Since we have read-write access to the general collection, we can insert the vectors.
client.upsert(
  collection_name="general",
  points=[
     PointStruct(
           id=idx,
           vector=vector.tolist(),
           payload={"color": "red", "rand_number": idx % 10}
     )
     for idx, vector in enumerate(vectors)
  ]
)
# Output of the above code
UpdateResult(operation_id=1, status=<UpdateStatus.COMPLETED: 'completed'>)

Looks good. We can see that the user can update the ‘general’ collection. Now let’s try to update the ‘financial’ collection with the same token. This should fail as the token has only read-only access to the ‘financial’ collection.

import numpy as np
from qdrant_client.models import PointStruct

vectors = np.random.rand(100, vector_size)
client.upsert(
  collection_name="financial",
  points=[
     PointStruct(
           id=idx,
           vector=vector.tolist(),
           payload={"color": "red", "rand_number": idx % 10}
     )
     for idx, vector in enumerate(vectors)
  ]
)
# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Write access to collection financial is required"},"time":0.000062905}'

As expected, the API returned a 403 Forbidden error saying ‘Write access to collection financial is required’. Now let’s try to do a final check for this token. Let’s see if the user can read the ‘financial’ collection.

# We are generating a query vector from a string that appears in one of the documents in the financial collection
x = "based on the equation: assets = liabilities + owners' equity."
query_vector = embed_model.get_sentence_vector(x).tolist()

# We are searching for the closest points to the query vector in the financial collection.
# Since we have read-only access to the financial collection, we can search in it.
hits = client.search(
  collection_name="financial",
  query_vector=query_vector,
  limit=20  # Return the 20 closest points
)
hits
# Output of the above code
[ScoredPoint(id=3, version=0, score=0.7493207, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "Balance Sheet The balance sheet is based on the equation: assets = liabilities + owners' equity . It indicates everything the company owns (assets), everything the company owes to creditors (liabilities) and the value of the ownership stake in the company (shareholders' equity, or capital). The balance sheet date is the ending date of the period or year and is a continuation of the amounts recorded since the"}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=0, score=0.7304637, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'period. Sources of cash listed on the statement include revenues, long-term financing, sales of non-current assets, an increase in any current liability account or a decrease in any current asset account. Uses of cash include operating losses, debt repayment, equipment purchases and increases in current asset accounts.'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=0, score=0.72308195, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'inception of the company or organization. The balance sheet is a "snapshot" of the financial position of the company at the balance sheet date and shows the accumulated balance of the accounts. Assets and liabilities are separated between current  and long-term , where current items are those items which will be realized or paid within one year of the balance sheet date. Typical current assets are cash, prepaid expenses, accounts receivable and inventory. Income Statement'}, vector=None, shard_key=None),
 ScoredPoint(id=26, version=0, score=0.7186612, payload={'metadata': {'page': 6, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Bank term loan bearing interest at prime plus 2%, repayable in monthly principal instalments of $2,100.00plus interest to November 2007, secured by a general security agreement on the assets of the company and a personal guaranteefrom the shareholder. 2002-2001 $ 111,300 $ Less current portion: 25,200; $ 86,100; approximate principal repayments are as follows: 2004 $ 25,2002005 25,2002006 25,2002007 10,500 $ 86,100 5. STATED CAPITAL Authorized: Unlimited number of Common shares'}, vector=None, shard_key=None),
 ScoredPoint(id=23, version=0, score=0.7100816, payload={'metadata': {'page': 5, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Significant Accounting Policies INVENTORY The inventory is valued at the lower of cost or market, with cost being  determined on a first-in, first-out basis. PROPERTY, PLANT AND EQUIPMENT Property, plant and equipment are stated at cost less accumulated amortization. Amortization is recorded at rates designed to amortize the cost of capital assets overtheir estimated useful lives. Amortization rates used are as follows: Furniture and equipment 20% declining balance'}, vector=None, shard_key=None),
 ScoredPoint(id=16, version=0, score=0.70792955, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Deposits and prepaid expenses (254)           688                Inventory (2,487)        (904)               Accounts payable and accrued liabilities (9,290)        34,543           Long-term debt, current portion: 25,200; income tax payable: 14,387       2,206          Cash flows from operating activities: 115,402; 85,966        CASH FLOWS FROM INVESTING ACTIVITIE S Acquisition of property, plant and equipment (1,426)        (10,342)'}, vector=None, shard_key=None),
 ScoredPoint(id=18, version=0, score=0.70589364, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'CASH (DEFICIENCY) RESOURCES: Beginning of Year (69,474)      17,789        CASH RESOURCES (DEFICIENCY) - End of Yea r $ 11,552    $ (69,474)     Cash resources (deficiency) is comprised of: Cash: 11,552 $; bank overdraft: 9,474 Bank loan: (60,000) $ 11,552 $ (69,474) The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=14, version=0, score=0.7057368, payload={'metadata': {'page': 2, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'DIVIDENDS -- (16,000)       RETAINED EARNINGS (DEFICIT) - End of Yea r $ 17,166 $ (61,350) The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=15, version=0, score=0.7047419, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'XYZ COMPANY LIMITE D STATEMENT OF CASH FLO W FOR THE YEAR ENDE D JUNE 30, 2002 UNAUDITED - See "Notice to Reader" 2002–2001. CASH FLOWS FROM OPERATING ACTIVITIE S Net income for the year was $78,516; $8,810 Adjustment for:   Amortization 17,854 16,856   Loss on disposal of property, plant and equipment: 387   Gain on disposal of investment (16,149) Cash derived from operations: 80,221 and 26,053 Decrease (increase) in working capital items    Accounts receivable 7,625         23,380'}, vector=None, shard_key=None),
 ScoredPoint(id=17, version=0, score=0.6888898, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Proceeds from disposal of property, plant and equipment -- 3,113          Proceeds from disposal of investment: 61,150       Dividends: 16,000  Cash flows from investing activities: 59,724       (23,229)       CASH FLOWS FROM FINANCING ACTIVITIE S Advances from (repayments to) shareholder (180,200) and (150,000)     Acquisition of (repayment of) long-term debt 86,100       -- (94,100)     (150,000)     NET INCREASE (DECREASE) IN CASH RESOURCES 81,026      (87,263)'}, vector=None, shard_key=None),
 ScoredPoint(id=1, version=0, score=0.68225944, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Understanding Basic Financial Statements During the accounting cycle, the accounting system is used to track, organize and record the financial transactions of an organization. At the close of each period, the information is used to prepare the financial statements, which are usually composed of a balance sheet (statement of financial position); income statement (statement of income and expenses); statement of retained earnings (owners’ equity); and a statement of cash flow.'}, vector=None, shard_key=None),
 ScoredPoint(id=7, version=0, score=0.6741429, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "income or loss is added to the opening amount of retained earnings to arrive at the closing retained earnings. Retained earnings can be decreased by such items as dividends paid to shareholders. On the sample financial statements shown below, the statement of retained earnings is combined with the income statement presentation. Statement of Cash Flow The statement of cash flow shows all sources and uses of a company's cash during the accounting period."}, vector=None, shard_key=None),
 ScoredPoint(id=11, version=0, score=0.67381775, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': '17,167 (61,349) $ 276,498 $ 331,259 APPROVED The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=21, version=0, score=0.6718547, payload={'metadata': {'page': 4, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': '$ 286,817 $ 339,905 The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=25, version=0, score=0.6679283, payload={'metadata': {'page': 6, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'XYZ COMPANY LIMITED NOTES TO THE FINANCIAL STATEMENTS FOR THE YEAR ENDED JUNE 30, 2002 UNAUDITED - See "Notice to Reader." 3. DUE TO SHAREHOLDER The amount due to the shareholder bears interest at a rate determined annually and has no fixed terms of repayment.Interest paid for 2002 was $1,823 (2001 - $6,831) 4. LONG - TERM DEBT Bank term loan bearing interest at prime plus 2%,'}, vector=None, shard_key=None),
 ScoredPoint(id=13, version=0, score=0.66429466, payload={'metadata': {'page': 2, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'INCOME FROM OPERATIONS 77,855      8,860          OTHER INCOME (EXPENSES) Loss on disposal of property, plant and equipment (387)            Gain on sale of investment: 16,149       -- Miscellaneous (1,101)        337             15,048      (50)              NET INCOME BEFORE TA X 92,903      8,810          INCOME TAX  EXPENSE 14,387      -- NET INCOME 78,516      8,810          (DEFICIT) Beginning of Yea r (61,350)     (54,160)       DIVIDENDS -- (16,000)'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=0, score=0.6599545, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "Income Statement An income statement is a type of summary flow report that lists and categorizes the various revenues and expenses that result from operations during a given period—a year, a quarter or a month. The difference between revenues and expenses represents a company's net income or net loss. The amounts shown in the income statement are the amounts recorded for the given period (a year, a"}, vector=None, shard_key=None),
 ScoredPoint(id=6, version=0, score=0.65201813, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'quarter or a month . The next period’s income statement will start over with all amounts reset to zero. While the balance sheet shows accumulated balances since inception, the income statement only shows the amounts earned or expensed during the period in question. Statement of Retained Earnings The statement of retained earnings shows the amount of accumulated earnings that have been retained within the company since its inception. At the end of each fiscal year-end, the amount of net'}, vector=None, shard_key=None),
 ScoredPoint(id=22, version=0, score=0.6324039, payload={'metadata': {'page': 5, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'XYZ COMPANY LIMITED NOTES TO THE FINANCIAL STATEMENTS FOR THE YEAR ENDED JUNE 30, 2002 UNAUDITED - See "Notice to Reader" 1. SIGNIFICANT ACCOUNTING POLICIES AND GENERAL INFORMATION Nature of Business The company is a Canadian-controlled private corporation subject to the Business Corporations Act, 1982 (Ontario), was incorporated in May 1995 and operates as a manufacturer of widgets in Anytown, Ontario. Significant Accounting Policies INVENTORY'}, vector=None, shard_key=None),
 ScoredPoint(id=10, version=0, score=0.62833935, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "$ 276,498 $ 331,259 LIABILITIES CURRENT Bank overdraft $ -- $ 9,474Bank loan: $60,000 Accounts payable and accrued liabilities: 82,053; 91,343Long-term debt: current portion 25,200 --income tax payable 14,387 -- 121,640 -- 160,817 DUE TO SHAREHOLDER (Note 3) 51,591 231,791LONG-TERM DEBT (Note 4): 86,100 -- 259,331 -- 392,608 SHAREHOLDER'S EQUIT AND STATED CAPITAL (Note 5) 1 1 RETAINED EARNINGS (DEFICIT) 17,166 (61,350) 17,167 (61,349) $ 276,498 $ 331,259 APPROVED"}, vector=None, shard_key=None)]

Great! The user can read the ‘financial’ collection. In the next section, let’s go ahead and see how we can limit user access within a single collection. We basically want to limit access to specific types of documents in the collection.
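
The behaviour observed in this section can be summarised as a small decision function. The sketch below is only a conceptual model of what the responses above suggest, not Qdrant's server-side code. The function name `check_access` and the extra collection name `hr` are hypothetical, and the error strings simply mirror the 403 messages shown earlier.

```python
from typing import Optional


def check_access(access_claim: list, collection: str, write: bool) -> Optional[str]:
    """Return None when the operation is allowed, otherwise a 403-style message."""
    for rule in access_claim:
        if rule["collection"] == collection:
            # A matching rule exists; writes additionally need "rw".
            if write and rule["access"] != "rw":
                return f"Forbidden: Write access to collection {collection} is required"
            return None
    # No rule mentions this collection at all.
    return f"Forbidden: Access to collection {collection} is required"


access_claim = [
    {"collection": "general", "access": "rw"},
    {"collection": "financial", "access": "r"},
]

checks = [("general", True), ("financial", True), ("financial", False), ("hr", False)]
for collection, write in checks:
    outcome = check_access(access_claim, collection, write)
    print(collection, "write" if write else "read", "->", outcome or "allowed")
```

Reading the loop output next to the notebook results makes the pattern easy to see: a collection that is absent from the token is rejected outright, while a listed collection is rejected only for writes when its access is `r`.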

Document-Specific Access

In this last section, we will limit the user's access to specific types of documents in the collection. This is one of the most granular, and therefore most restrictive, ways to grant access to the user. Assume a scenario where you have a ‘general’ collection that holds multiple types of documents. You can limit the user's access to specific types of documents only. Let’s see how we can do this.

Here we create a token that allows the user to access only the ‘general’ collection and, within it, only the chunks that come from the document ‘security_policy.pdf’.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload, along with the API key, is used to generate the JWT.
# The token grants the user access to the general collection only.
# Access to the general collection is read-write.
# The token restricts that access to points from the document security_policy.pdf in the general collection.
# The token expires in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general",
     "access": "rw",
     "payload": {
       "metadata.source": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf",
     }
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)

# We are generating a query vector from a string that appears in the document security_policy.pdf in the general collection
x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

# We are searching for the closest points to the query vector in the general collection
hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits
# Output of the above code
[ScoredPoint(id=10, version=2, score=0.8121032, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer and the stateless access control firewall rules before being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language (a static'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=2, score=0.7998578, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'into account the impact of company threats and vulnerabilities; must design and implement a comprehensive suite of information security controls and other forms of risk management to address company and architecture security risks; and adopt an overarching management process to ensure that the information security controls meet the information security needs on an ongoing basis. In addition, all hosting providers are certified at PCI DSS Level 1, which means that the application is run on the'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=2, score=0.77687603, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'centers are staffed 24x7x365 by security guards, and access is authorized strictly on a least privileged basis. The cloud hosting providers are certified with the ISO 9001:2008, ISO 27001:2013, ISO 27017:2015, and ISO 27018:2014 security standards—global standards that outline the requirements for information security management systems. This requires that the hosting provider systematically evaluate its information security risks, taking into account the impact of company threats and'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.76660895, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=3, version=2, score=0.7539886, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'will work with you to assess and understand the scope of the issue and fully address any concerns. Any emails are immediately sent to our engineering staff to ensure that issues are addressed rapidly. Any security emails are treated with the highest priority, as the safety and security of our service are our primary concerns. Physical security Qdrant Cloud services are hosted on Google Cloud Computing, Amazon Web Services, and Azure. The data centers are staffed 24x7x365 by security guards. '}, vector=None, shard_key=None)]

As we can see, the search query returned only chunks of the document ‘security_policy.pdf’. It did not return anything from other documents. Next, let’s go even further and limit access to a specific page of the document.

On top of all the previous restrictions, the token below also limits access to the second page of the document ‘security_policy.pdf’. Let’s see if the user can access any page other than the second one.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload, along with the API key, is used to generate the JWT.
# The token grants the user access to the general collection only.
# Access to the general collection is read-write.
# The token restricts that access to points from the document security_policy.pdf in the general collection.
# It further restricts access to the second page of the document (page indices start at 0).
# The token expires in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general",
     "access": "rw",
     "payload": {
       "metadata.source": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf",
       "metadata.page": 1
     }
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)

x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=20  # Return up to 20 closest points
)
hits
# Output of the above code
[ScoredPoint(id=10, version=2, score=0.8121032, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer, and the stateless access control firewall rules before then being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language - a static'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.76660895, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=11, version=2, score=0.74504614, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'is built using the Rust language - a static multi-paradigm, memory-efficient, low-level programming language focused on speed, security, and performance. The intention is to build Qdrant with as few moving parts as possible, thereby keeping the attack vector as low as possible. Email security Qdrant supports TLS encryption on all inbound and outbound emails. Qdrant uses Gmail to provide email and communication services. For an explanation of how email encryption works, take a look at this'}, vector=None, shard_key=None),
 ScoredPoint(id=9, version=2, score=0.71566045, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'DSS Level 4 requirements for security management, policies, procedures, network architecture, software design, and other critical protective measures. Network security The system is designed with scalability and redundancy in mind. Web load balancers and database servers are distributed globally across geographically dispersed data centers in different operating regions. Each database server has its own firewall configuration based on its role within the Qdrant Cloud platform, and only the'}, vector=None, shard_key=None),
 ScoredPoint(id=12, version=2, score=0.70694244, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'how email encryption works, take a look at this overview from Google. Data residency On Qdrant Cloud, the location of data can be specified. Locations may include London, Ireland, Belgium, Germany, Switzerland, North America, South America, Australia, Canada, Tokyo, or Singapore. Data will not be moved or replicated outside of a specified location. Data in transit All data is encrypted when it is being transmitted between client devices and Qdrant Cloud. SSL/TLS certificates shield data using'}, vector=None, shard_key=None),
 ScoredPoint(id=13, version=2, score=0.63619953, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'Cloud. SSL/TLS certificates shield data using 256-byte signatures and either 2048-bit or 4096-bit keys. All connections to Content Delivery Network (CDN) servers and the database layer are'}, vector=None, shard_key=None)]

As we can see, the same search query as in the previous example returned only chunks from the second page of the document ‘security_policy.pdf’, as expected.
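
To make the effect of the `payload` restriction concrete, here is a minimal sketch that models it as an equality filter over dotted keys. This is a conceptual illustration only, not how Qdrant implements the restriction. The helper names (`get_nested`, `is_visible`, `visible_ids`) and the three demo points with shortened source names are hypothetical.

```python
_MISSING = object()  # sentinel for keys that are absent from a payload


def get_nested(payload: dict, dotted_key: str):
    """Follow a dotted key such as 'metadata.page' through nested dicts."""
    current = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def is_visible(payload: dict, restrictions: dict) -> bool:
    """A point is visible only if every restricted key has the required value."""
    return all(get_nested(payload, key) == value for key, value in restrictions.items())


def visible_ids(points: list, restrictions: dict) -> list:
    return [p["id"] for p in points if is_visible(p["payload"], restrictions)]


# Hypothetical points; real payloads carry full file paths as shown above.
demo_points = [
    {"id": 3, "payload": {"metadata": {"source": "security_policy.pdf", "page": 0}}},
    {"id": 10, "payload": {"metadata": {"source": "security_policy.pdf", "page": 1}}},
    {"id": 42, "payload": {"metadata": {"source": "other_document.pdf", "page": 1}}},
]

document_only = {"metadata.source": "security_policy.pdf"}
document_and_page = {"metadata.source": "security_policy.pdf", "metadata.page": 1}

print(visible_ids(demo_points, document_only))      # [3, 10]
print(visible_ids(demo_points, document_and_page))  # [10]
```

Adding a second key narrows the visible set, which matches what the two searches above show: the document-level token returned pages 0 and 1, while the page-level token returned page 1 only.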

Before I end this tutorial, let me show you one more way to limit the user's access: the ‘value_exists’ claim. With it, the token is accepted only if a point with the specified key-value pair exists in the given collection. This can be extended to several use cases, such as checking a user_id or a user_role, but for the ease of this tutorial, let’s just use ‘value_exists’ to check for the presence of a document. Only if the document exists can the user access the collection.
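
As a rough mental model, the check can be sketched as a lookup over stored payloads. The snippet below is an illustration under stated assumptions, not Qdrant's implementation: it assumes that all key-value pairs in `matches` must hold on the same record, and the function name `value_exists`, the helper `get_nested`, and the demo records with shortened source names are hypothetical.

```python
_MISSING = object()  # sentinel for keys that are absent from a record


def get_nested(record: dict, dotted_key: str):
    """Follow a dotted key such as 'metadata.source' through nested dicts."""
    current = record
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def value_exists(records: list, matches: list) -> bool:
    """True if at least one record satisfies every key/value pair in matches."""
    return any(
        all(get_nested(record, m["key"]) == m["value"] for m in matches)
        for record in records
    )


# Hypothetical stored payloads; real ones carry full file paths.
demo_records = [
    {"metadata": {"source": "security_policy.pdf", "page": 0}},
    {"metadata": {"source": "avengers-endgame-script-pdf.pdf", "page": 12}},
]

missing = [{"key": "metadata.source", "value": "blah blah blah.pdf"}]
present = [{"key": "metadata.source", "value": "avengers-endgame-script-pdf.pdf"}]

print(value_exists(demo_records, missing))  # False
print(value_exists(demo_records, present))  # True
```

Because the check depends on data stored in the database rather than on the token alone, deleting the matching record is one way to invalidate a token before its `exp` time.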

Let’s first see what happens if the document does not exist in the collection.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload, along with the API key, is used to generate the JWT.
# The token grants the user access to the general collection only.
# Access to the general collection is read-write.
# On top of that, the token asks Qdrant to check whether a point exists in the general collection
# whose metadata.source key matches the value "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/blah blah blah.pdf".
# If such a point exists, the user will have read-write access to the general collection.
# If it doesn't exist, the user won't have any access to the general collection.
# The token expires in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "value_exists": {
   "collection": "general",
   "matches": [
     {
         "key": "metadata.source",
         "value": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/blah blah blah.pdf"
     }
   ]
 },
 "access": [
   {
     "collection": "general",
     "access": "rw",
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)

# We are generating a query vector from a string that appears in the document security_policy.pdf in the general collection
x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits
# Output of the above code
UnexpectedResponse: Unexpected Response: 401 (Unauthorized)
Raw response content:
b'Invalid JWT, stateful validation failed'

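Woah! The API returned a 401 Unauthorized error. The message says that stateful validation failed, which is exactly what we want: no point in the collection matches the path in the token, so the token is rejected. Let’s see what happens when the path points to a document that does exist in the collection.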
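
When a token is rejected like this, it can help to look at the claims it actually carries. The sketch below decodes the payload segment of a JWT with the standard library only. It does not verify the signature, so treat it as a debugging aid and nothing more. The helper name `decode_jwt_claims`, the hand-built demo token, and the placeholder path are assumptions made for this example; they are not part of the tutorial's `utils` module.

```python
import base64
import json


def decode_jwt_claims(token):
    """Return the claims of a JWT WITHOUT verifying its signature."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload_b64 = segments[1]
    # JWT segments drop base64 padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data):
    """Encode a dict as an unpadded base64url JSON segment (demo only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hand-built, unsigned demo token; a real one would come from generate_jwt.
demo_claims = {
    "exp": 1_700_003_600,
    "value_exists": {
        "collection": "general",
        "matches": [{"key": "metadata.source", "value": "/path/to/missing.pdf"}],
    },
    "access": [{"collection": "general", "access": "rw"}],
}
demo_token = ".".join(
    [_b64url({"alg": "HS256", "typ": "JWT"}), _b64url(demo_claims), "signature-placeholder"]
)

decoded = decode_jwt_claims(demo_token)
print(decoded["value_exists"]["matches"][0]["value"])
```

Comparing the decoded path with the `metadata.source` values stored in the collection is a quick way to spot a typo or a wrong directory.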

import time

current_time = int(time.time())

# Same setup as before: the token grants read-write access to the general collection
# and expires in 1 hour.
# This time the value_exists claim points to avengers-endgame-script-pdf.pdf,
# a path that matches the metadata.source of a point in the general collection,
# so the check should pass and the token should be accepted.
payload = {
 "exp": current_time + 3600, # 1 hour
 "value_exists": {
   "collection": "general",
   "matches": [
     {
         "key": "metadata.source",
         "value": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/avengers-endgame-script-pdf.pdf"
     }
   ]
 },
 "access": [
   {
     "collection": "general",
     "access": "rw",
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

client = QdrantClient(url=url, api_key=jwt)

x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits
# Output of the above code
[ScoredPoint(id=10, version=2, score=0.8121032, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer, and the stateless access control firewall rules before then being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language - a static'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=2, score=0.7998578, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'into account the impact of company threats and vulnerabilities; must design and implement a comprehensive suite of information security controls and other forms of risk management to address company and architecture security risks; and adopt an overarching management process to ensure that the information security controls meet the information security needs on an ongoing basis. In addition, all hosting providers are certified at PCI DSS Level 1, which means that the application is run on the'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=2, score=0.77687603, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'centers are staffed 24x7x365 by security guards, and access is authorized strictly on a least privileged basis. The cloud hosting providers are certified with the ISO 9001:2008, ISO 27001:2013, ISO 27017:2015, and ISO 27018:2014 security standards - global standards that outline the requirements for information security management systems. This requires that the hosting provider must systematically evaluate its information security risks, taking into account the impact of company threats and'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.76660895, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=3, version=2, score=0.7539886, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'will work with you to assess and understand the scope of the issue and fully address any concerns. Any emails are immediately sent to our engineering staff to ensure that issues are addressed rapidly. Any security emails are treated with the highest priority, as the safety and security of our service are our primary concerns. Physical security Qdrant Cloud services are hosted on Google Cloud Computing and Amazon Web Services, and Azure. The data centers are staffed 24x7x365 by security guards,'}, vector=None, shard_key=None)]

Nice! Qdrant found a point whose metadata.source matches the path in the token, so the validation succeeded. Note that the search results come from security_policy.pdf, not from the file named in the token: ‘value_exists’ only decides whether the token is valid; it does not filter the results.
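
To make the two runs above easier to reason about, here is a small conceptual model in plain Python. It is not Qdrant's implementation: the toy points, the short paths, and the function names are all made up for illustration, and it assumes that every key/value pair in `matches` must hold on the same point. It only shows the behaviour observed above, where the check gates the token and leaves the search results unfiltered.

```python
def get_nested(payload, dotted_key):
    """Follow a dotted key such as 'metadata.source' through nested dicts."""
    current = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def token_is_valid(points, matches):
    """True if some point satisfies every key/value pair (assumed semantics)."""
    return any(
        all(get_nested(point, m["key"]) == m["value"] for m in matches)
        for point in points
    )


def gated_search(points, matches):
    """Reject the request if the check fails, otherwise return ALL points."""
    if not token_is_valid(points, matches):
        raise PermissionError("stateful validation failed")
    return list(points)  # no filtering by the matched value


# Toy stand-ins for the points stored in the general collection.
points = [
    {"metadata": {"source": "general/security_policy.pdf"}},
    {"metadata": {"source": "general/avengers-endgame-script-pdf.pdf"}},
]
good = [{"key": "metadata.source", "value": "general/avengers-endgame-script-pdf.pdf"}]
bad = [{"key": "metadata.source", "value": "general/blah blah blah.pdf"}]

print(token_is_valid(points, good), token_is_valid(points, bad))
print(len(gated_search(points, good)))
```

In this model the passing token still sees both toy points, including the one from security_policy.pdf, which matches what the real search returned.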

To sum up, we have seen how to use the Qdrant Vector Database with RBAC enabled and how to grant access to users based on their roles: global read-only access, global manage access, collection-specific access, and document-specific access. We have also seen how to gate a token with the ‘value_exists’ claim. This is a powerful feature of Qdrant that applies to many use cases.

Conclusion

Role-Based Access Control is essential for keeping data safe and making sure the right people have the right access. Combining RBAC with a Hybrid Cloud setup gives us more flexibility in how we store and manage our data. Qdrant shines here because it lets us control access at a fine-grained level using JWT and Hybrid Cloud. In my view, Qdrant stands out among vector databases such as Pinecone, Milvus, Chroma, and Weaviate for its security and privacy features. In this blog, I showed how to get Qdrant up and running in a Hybrid Cloud setup and how to set up JWT for RBAC, and how easy and effective it can be to manage access in today’s data environments.

GitHub Repo

The code for this blog can be found at https://github.com/quamernasim/Role-Based-Access-Control-of-Qdrant-Vector-Database

References

https://qdrant.tech/documentation/guides/security/
https://qdrant.tech/blog/qdrant-1.9.x/
https://quamernasim.medium.com/hindi-language-ai-chatbot-for-enterprises-using-llama-3-qdrant-ollama-langchain-and-mlflow-9b69391d3348

This article was originally published on: https://quamernasim.medium.com/enhancing-data-security-with-role-based-access-control-of-qdrant-vector-database-3878769bec83
