Franck Pachot for YugabyteDB

Posted on Oct 7, 2024

YugabyteDB as a Graph database with PuppyGraph

#yugabytedb #graph #postgres #database

PostgreSQL can handle any workload without streaming data to a purpose-built database. This includes graph queries with PuppyGraph, which maps relational tables to a graph model.
When resilience, scalability, and elasticity are necessary, YugabyteDB provides a distributed alternative to PostgreSQL.

This article follows PuppyGraph's step-by-step tutorial for querying data in PostgreSQL, using YugabyteDB in place of PostgreSQL.

Deployment

I create a file docker-compose.yaml with the following content:

services:
  puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      yugabytedb_net:
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
  yugabytedb:
    image: yugabytedb/yugabyte:2024.1.2.0-b77
    command: yugabyted start --enable_pg_parity_tech_preview --background=false
    environment:
      - YSQL_USER=puppygraph
      - YSQL_PASSWORD=puppygraph123
    networks:
      yugabytedb_net:
    ports:
      - "5433:5433"
    volumes:
      - ./yugabyte-data:/root/var/data
networks:
  yugabytedb_net:
    name: puppy-yugabytedb

I run the following command to start YugabyteDB and PuppyGraph:

docker compose up -d

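I can connect to PuppyGraph on http://localhost:8081/ (the port published in docker-compose.yaml) and log in with the username and password set in the environment section.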

Data Preparation

I run the following command to start ysqlsh, the YugabyteDB equivalent of the PostgreSQL shell psql, to access the database:


docker compose run -it -e PGPASSWORD=puppygraph123 -e PGUSER=puppygraph -e PGHOST=yugabytedb -e PGPORT=5433 -e PGDATABASE=yugabyte yugabytedb ysqlsh

I execute the following SQL statements in the shell to create tables and insert data:

create schema modern;
create table modern.person (id text, name text, age integer);
insert into modern.person values
                              ('v1', 'marko', 29),
                              ('v2', 'vadas', 27),
                              ('v4', 'josh', 32),
                              ('v6', 'peter', 35);

create table modern.software (id text, name text, lang text);
insert into modern.software values
                                ('v3', 'lop', 'java'),
                                ('v5', 'ripple', 'java');

create table modern.created (id text, from_id text, to_id text, weight double precision);
insert into modern.created values
                               ('e9', 'v1', 'v3', 0.4),
                               ('e10', 'v4', 'v5', 1.0),
                               ('e11', 'v4', 'v3', 0.4),
                               ('e12', 'v6', 'v3', 0.2);

create table modern.knows (id text, from_id text, to_id text, weight double precision);
insert into modern.knows values
                             ('e7', 'v1', 'v2', 0.5),
                             ('e8', 'v1', 'v4', 1.0);

analyze;

Modeling a Graph

I use the PuppyGraph schema file schema.json with the following content:

{
  "catalogs": [
    {
      "name": "postgres_data",
      "type": "postgresql",
      "jdbc": {
        "username": "puppygraph",
        "password": "puppygraph123",
        "jdbcUri": "jdbc:postgresql://yugabytedb:5433/yugabyte",
        "driverClass": "org.postgresql.Driver"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "person",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "Int",
          "name": "age"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    },
    {
      "label": "software",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "software",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "String",
          "name": "lang"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "knows",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    },
    {
      "label": "created",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "created",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "software",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    }
  ]
}
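Each edge in this schema names its endpoints by vertex label ("from" and "to"), so a typo in a label is an easy mistake to make before uploading. The sketch below is a hypothetical pre-upload check, not part of PuppyGraph: it uses a trimmed copy of the schema above (only the fields the check reads) and lists any edge endpoint that does not match a declared vertex label.

```python
# Trimmed copy of the schema above: only the fields this check reads.
schema = {
    "vertices": [{"label": "person"}, {"label": "software"}],
    "edges": [
        {"label": "knows", "from": "person", "to": "person"},
        {"label": "created", "from": "person", "to": "software"},
    ],
}


def dangling_edge_endpoints(schema):
    """Return (edge label, endpoint key, vertex label) for every edge
    endpoint that does not match a declared vertex label."""
    known = {vertex["label"] for vertex in schema["vertices"]}
    return [
        (edge["label"], key, edge[key])
        for edge in schema["edges"]
        for key in ("from", "to")
        if edge[key] not in known
    ]


problems = dangling_edge_endpoints(schema)
print(problems)  # an empty list means every endpoint is a declared vertex
```

To run the same check on the real file, load it with json.load(open("schema.json")) and pass the result to dangling_edge_endpoints.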

I upload the schema by selecting the file schema.json in the

Querying the graph via CLI

I execute the following command to access the PuppyGraph Gremlin Console:

docker compose exec -it puppygraph ./bin/console

I run the following queries in the console to query the graph:

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.359s, rows: 1
==>map[age:29 name:marko]
puppy-gremlin> g.V().has("name", "marko").out("knows").out("created").valueMap()
Done! Elapsed time: 0.108s, rows: 2
==>map[lang:java name:lop]
==>map[lang:java name:ripple]
puppy-gremlin>
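To relate the second traversal to the underlying tables, here is a sketch of the same question written as a relational join. It assumes an in-memory SQLite database as a stand-in for YugabyteDB so that it runs anywhere, with the same rows as the modern schema. The function name is made up for this example, and the SQL that PuppyGraph actually generates may look different.

```python
import sqlite3

# In-memory SQLite stands in for YugabyteDB here (an assumption for
# portability); tables and rows mirror the modern.* tables created earlier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table person   (id text, name text, age integer);
create table software (id text, name text, lang text);
create table knows    (id text, from_id text, to_id text, weight real);
create table created  (id text, from_id text, to_id text, weight real);
insert into person values
  ('v1','marko',29), ('v2','vadas',27), ('v4','josh',32), ('v6','peter',35);
insert into software values ('v3','lop','java'), ('v5','ripple','java');
insert into knows values ('e7','v1','v2',0.5), ('e8','v1','v4',1.0);
insert into created values
  ('e9','v1','v3',0.4), ('e10','v4','v5',1.0),
  ('e11','v4','v3',0.4), ('e12','v6','v3',0.2);
""")


def software_created_by_friends(conn, name):
    """Relational form of
    g.V().has("name", name).out("knows").out("created"),
    restricted to person vertices as the starting point."""
    return conn.execute(
        """
        select s.name, s.lang
        from person p
        join knows k    on k.from_id = p.id     -- out("knows")
        join created c  on c.from_id = k.to_id  -- out("created")
        join software s on s.id = c.to_id
        where p.name = ?
        order by s.name
        """,
        (name,),
    ).fetchall()


result = software_created_by_friends(conn, "marko")
print(result)  # [('lop', 'java'), ('ripple', 'java')]
```

Each out() step becomes one join on an edge table, which is why the two-hop traversal returns the same two software rows as the Gremlin console.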

SQL Queries

I refresh the local cache, then check in pg_stat_statements that the queries are fast:

yugabyte=# select total_time/calls as time, calls, substr(query,1,80)
 from pg_stat_statements order by 1
;

Scale-Out YugabyteDB

To add more nodes to the YugabyteDB cluster, you can add a service that starts YugabyteDB and joins the first node. Add the following to the services section of docker-compose.yaml:

  more-nodes:
    image: yugabytedb/yugabyte:2024.1.2.0-b77
    command: yugabyted start --join yugabytedb --enable_pg_parity_tech_preview --background=false
    networks:
      yugabytedb_net:
    deploy:
     replicas: 0

Start a second node with:

docker compose up -d --scale more-nodes=1

Start a third node with:

docker compose up -d --scale more-nodes=2

With three nodes, the cluster is automatically set to Replication Factor 3 and becomes resilient to the failure of one node.
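The reason three replicas tolerate exactly one failure is the majority quorum used by Raft-based replication: a write needs acknowledgment from more than half of the replicas. The short sketch below works through that arithmetic. The function name is made up for this illustration and is not a YugabyteDB API.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be lost while a majority remains."""
    majority = replication_factor // 2 + 1  # smallest quorum: more than half
    return replication_factor - majority


for rf in (1, 3, 5):
    print(f"RF {rf}: tolerates {tolerated_failures(rf)} node failure(s)")
# RF 1: tolerates 0 node failure(s)
# RF 3: tolerates 1 node failure(s)
# RF 5: tolerates 2 node failure(s)
```

This is also why a two-node cluster is no more resilient than a single node: with two replicas the majority is two, so no failure can be tolerated.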

Conclusion

This tutorial demonstrates using PuppyGraph to query YugabyteDB as a graph database, relying on YugabyteDB's PostgreSQL compatibility on top of its distributed architecture. It follows the simple deployment steps from the PuppyGraph documentation.

With PuppyGraph, you can directly query your existing data stores without needing ETL pipelines. PuppyGraph and YugabyteDB together provide a powerful solution for modern graph-based analytics.
