At present, Apache AGE does not support multiple labels on its vertices and edges. Furthermore, it uses the label's unique identifier (ID) to build the ID of each vertex and edge.
My proposal here is to present an alternative solution to that problem and discuss the details. I'm open to feedback on it.
Read about Apache AGE here: https://age.apache.org/
GitHub here: https://github.com/apache/age
Solution #1: Index for each label and a junction table
My solution will use the power of indexing to fetch data quickly. The table vertex will store a unique identifier (a simple sequential ID) and its properties as they exist today. The table edge will have the same fields, plus start_vertex and end_vertex. This will look like the image provided here:
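As a rough sketch of these two tables (not AGE's actual storage layout), the snippet below builds them in an in-memory SQLite database through Python's sqlite3 module. SQLite standing in for PostgreSQL, the properties column holding JSON text, and the sample vertices are all assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE vertex (
    id         INTEGER PRIMARY KEY,          -- simple sequential ID
    properties TEXT NOT NULL DEFAULT '{}'    -- assumption: properties kept as JSON text
);
CREATE TABLE edge (
    id           INTEGER PRIMARY KEY,
    start_vertex INTEGER NOT NULL REFERENCES vertex(id),
    end_vertex   INTEGER NOT NULL REFERENCES vertex(id),
    properties   TEXT NOT NULL DEFAULT '{}'
);
""")

def add_vertex(conn, properties):
    """Insert a vertex and return its sequential ID."""
    cur = conn.execute(
        "INSERT INTO vertex (properties) VALUES (?)", (json.dumps(properties),)
    )
    return cur.lastrowid

def add_edge(conn, start_vertex, end_vertex, properties):
    """Insert an edge between two existing vertices and return its ID."""
    cur = conn.execute(
        "INSERT INTO edge (start_vertex, end_vertex, properties) VALUES (?, ?, ?)",
        (start_vertex, end_vertex, json.dumps(properties)),
    )
    return cur.lastrowid

# Hypothetical sample data
alice = add_vertex(conn, {"name": "Alice"})
bob = add_vertex(conn, {"name": "Bob"})
knows = add_edge(conn, alice, bob, {"since": 2020})
```

Note that neither table carries any label information yet; the IDs are plain sequence values, independent of labels.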
Going forward, I will use the terms 'node' and 'vertex' interchangeably. To store the labels of a node or edge, I've considered using an array to store all labels or creating a column for each label. However, both of these solutions have their drawbacks. If we use an array to store the label references within the node and edge data, it will be difficult to efficiently query and update data. If we create a separate column in the node and edge table for each label, we may end up with a large number of columns, which can become difficult to manage and maintain as the number of labels grows. Additionally, adding or removing a label would require modifying the node or edge table schema, which can be disruptive.
Therefore, I propose another solution. We should create a table label that stores the label_id and label_name, and a junction table vertex_labels that links the table vertex and the table label. This junction table will have a composite primary key on the vertex_id and label_id columns, ensuring that it has no duplicates. Both columns are also foreign keys: vertex_id references the vertex table and label_id references the label table. This allows a single node to have multiple label references, and the same solution applies to edges.
Here is an example of how the tables would look:
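One possible way to write the label table and the junction table down, again as a sketch in SQLite through Python's sqlite3 rather than a definitive schema: the UNIQUE constraint on label_name, the add_label helper, and the sample labels are assumptions that are not part of the proposal itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE vertex (
    id         INTEGER PRIMARY KEY,
    properties TEXT NOT NULL DEFAULT '{}'
);
CREATE TABLE label (
    label_id   INTEGER PRIMARY KEY,
    label_name TEXT NOT NULL UNIQUE           -- assumption: one row per label name
);
CREATE TABLE vertex_labels (
    vertex_id INTEGER NOT NULL REFERENCES vertex(id),
    label_id  INTEGER NOT NULL REFERENCES label(label_id),
    PRIMARY KEY (vertex_id, label_id)         -- composite key: no duplicate pairs
);
""")

def add_label(conn, vertex_id, label_name):
    """Attach a label to a vertex, creating the label row if it does not exist."""
    conn.execute("INSERT OR IGNORE INTO label (label_name) VALUES (?)", (label_name,))
    (label_id,) = conn.execute(
        "SELECT label_id FROM label WHERE label_name = ?", (label_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO vertex_labels (vertex_id, label_id) VALUES (?, ?)",
        (vertex_id, label_id),
    )

def labels_of(conn, vertex_id):
    """Return the label names attached to a vertex, sorted alphabetically."""
    rows = conn.execute(
        "SELECT l.label_name FROM vertex_labels vl "
        "JOIN label l ON l.label_id = vl.label_id "
        "WHERE vl.vertex_id = ? ORDER BY l.label_name",
        (vertex_id,),
    )
    return [name for (name,) in rows]

# Hypothetical example: one vertex carrying two labels
v = conn.execute("INSERT INTO vertex DEFAULT VALUES").lastrowid
add_label(conn, v, "Person")
add_label(conn, v, "Author")
```

Adding or removing a label is then a row insert or delete in vertex_labels, with no schema change. An edge_labels table would follow the same pattern for edges.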
As I mentioned earlier, we will be using indexes, so we can create two indexes on the vertex_labels junction table: one on the label_id column, and another on the composite key of vertex_id and label_id. The first index allows us to quickly locate all nodes with a specific label, while the second index allows us to quickly locate all labels associated with a specific node.
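The two lookups could be sketched as follows, once more using SQLite through Python's sqlite3 as a stand-in. The junction table is named vertex_labels here, and the index name idx_vertex_labels_label, the helper functions, and the sample rows are hypothetical. The composite primary key already creates an index on (vertex_id, label_id), in SQLite as in PostgreSQL, so only the label_id index has to be created explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex_labels (
    vertex_id INTEGER NOT NULL,
    label_id  INTEGER NOT NULL,
    PRIMARY KEY (vertex_id, label_id)   -- backs the "labels of a vertex" lookup
);
-- backs the "vertices with a label" lookup (index name is hypothetical)
CREATE INDEX idx_vertex_labels_label ON vertex_labels (label_id);

-- hypothetical sample rows: (vertex_id, label_id)
INSERT INTO vertex_labels (vertex_id, label_id) VALUES
    (1, 10), (1, 20), (2, 10), (3, 30);
""")

def vertices_with_label(conn, label_id):
    """All vertex IDs carrying the given label."""
    rows = conn.execute(
        "SELECT vertex_id FROM vertex_labels WHERE label_id = ? ORDER BY vertex_id",
        (label_id,),
    )
    return [vertex_id for (vertex_id,) in rows]

def labels_of_vertex(conn, vertex_id):
    """All label IDs attached to the given vertex."""
    rows = conn.execute(
        "SELECT label_id FROM vertex_labels WHERE vertex_id = ? ORDER BY label_id",
        (vertex_id,),
    )
    return [label_id for (label_id,) in rows]
```

With the sample rows above, vertices_with_label(conn, 10) returns [1, 2] and labels_of_vertex(conn, 1) returns [10, 20].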
I think this is a simple solution, so it's feasible, and it solves both problems particularly well.
Top comments (1)
I think that's a great idea to start with! I've been working with label inheritance on AGE, and I can see this as a possibility for AGE supporting both label inheritance and composition for the vertices (possibly). For label inheritance, the vertices and edges are stored in their own label tables, and we figured out a way to make one table inherit from another so that querying the parent label table also shows the edges or vertices in its child labels. One thing that I think you can do is create tables for each label, where these tables can inherit from one another and contain the data for the vertices and edges, while still preserving the idea you proposed of having a table for all labels and the vertex_labels table showing which vertices are composed of other labels.
Let's say that you have a Person label that could have all the people from the graph and also a Book label. We could use both inheritance and composition design patterns in this graph. The content of the books might include FictionBook, RomanceBook, ComicBook, and an Author can also be a Person, applying the inheritance principle. But composition can be added to this since a Book has an Author.
For example, we want to add a comic book and its authors to the graph. So we have the labels Book, Comic, Person, and Author in the Labels table. Comic inherits from Book and Author inherits from Person. Now, you could rename vertex_labels to HierarchyLabels and then store the vertex id with the Comic id and also the vertex id with the Author id.
But all of this is just an idea to add both inheritance and composition designs with AGE. Overall, I found your idea pretty nice! :D