Graph representation in relational tables

#apacheage #graphdatabase #database #postgres

Quick overview about making graphs in relational tables representation

We will have 2 tables Vertices and edges tables.

Vertices tables:

CREATE TABLE vertices (
vertex_id integer PRIMARY KEY,
properties json
);

Edges table

CREATE TABLE edges (
edge_id integer PRIMARY KEY,
from_vertex integer REFERENCES vertices (vertex_id),
to_vertex integer REFERENCES vertices (vertex_id),
label text,
properties json
);

Index the from-to vertices ids in the edges table (for faster access)

CREATE INDEX edges_from ON edges (from_vertex);
CREATE INDEX edges_to ON edges (to_vertex);

Json operators in PostgreSQL

Some important aspects of this model are:

Any vertex can have an edge connecting it with any other vertex. There is no schema that restricts which kinds of things can or cannot be associated.
Given any vertex, you can efficiently find both its incoming and its outgoing edges, and thus traverse the graph—i.e., follow a path through a chain of vertices — both forward and backward.
By using different labels for different kinds of relationships, you can store several different kinds of information in a single graph, while still maintaining a clean data model.

Querying vertices table:

SELECT * FROM vertices WHERE properties->>'json_filed' = 'Peru';

Querying edges table

SELECT * FROM edges WHERE from_vertex = (**SELECT QUERY**) and to_vertex = (**SELECT QUERY**)

So, what are the hardest parts is it like a piece of cake? The hardest parts comes when trying to make complex query like which has multiple edges in between with specific filtration
So, keeping on mind that query in OpenCypher

MATCH
(person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),
(person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
RETURN person.name

Is equivalent to that in SQL

WITH RECURSIVE
-- in_usa is the set of vertex IDs of all locations within the United States
in_usa(vertex_id) AS (
SELECT vertex_id FROM vertices WHERE properties->>'name' = 'United States'
UNION
SELECT edges.tail_vertex FROM edges
JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
WHERE edges.label = 'within'
),
-- in_europe is the set of vertex IDs of all locations within Europe
in_europe(vertex_id) AS (
SELECT vertex_id FROM vertices WHERE properties->>'name' = 'Europe'
UNION
SELECT edges.tail_vertex FROM edges
JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
WHERE edges.label = 'within'
),
-- born_in_usa is the set of vertex IDs of all people born in the US
born_in_usa(vertex_id) AS (
SELECT edges.tail_vertex FROM edges
JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
WHERE edges.label = 'born_in'
),
-- lives_in_europe is the set of vertex IDs of all people living in Europe
lives_in_europe(vertex_id) AS (
SELECT edges.tail_vertex FROM edges
JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
WHERE edges.label = 'lives_in'
)
SELECT vertices.properties->>'name'
FROM vertices
-- join to find those people who were both born in the US *and* live in Europe
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;