Wendel Fernandes de Lana

Project: Label with a junction table and indexes

At present, Apache AGE does not support multiple labels on its vertices and edges. Furthermore, it uses the label's unique identifier (ID) to build the unique identifier (ID) of each vertex and edge.
My proposal here is to present an alternative solution to that problem and discuss the details. I'm open to feedback about it.
Read about Apache AGE here: https://age.apache.org/
GitHub here: https://github.com/apache/age

Solution #1: Index for each label and a junction table

My solution will use the power of indexing to fetch data quickly. The table vertex will store a unique identifier (a simple sequential ID) and its properties as they exist today. The edge will have the same fields, in addition to their start_vertex and end_vertex. This will look like the image provided here:

Image of the tables vertex and edge
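To make the two tables concrete, here is a minimal sketch of them. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the column types, the JSON text used for properties, and the helper functions add_vertex and add_edge are all assumptions made for illustration, not how AGE stores data.

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL here; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE vertex (
    id         INTEGER PRIMARY KEY,   -- simple sequential ID
    properties TEXT NOT NULL          -- JSON text as a stand-in for AGE's property type
);
CREATE TABLE edge (
    id           INTEGER PRIMARY KEY,
    start_vertex INTEGER NOT NULL REFERENCES vertex(id),
    end_vertex   INTEGER NOT NULL REFERENCES vertex(id),
    properties   TEXT NOT NULL
);
""")

def add_vertex(props):
    """Insert a vertex and return its sequential ID (hypothetical helper)."""
    cur = conn.execute("INSERT INTO vertex (properties) VALUES (?)",
                       (json.dumps(props),))
    return cur.lastrowid

def add_edge(start, end, props):
    """Insert an edge between two existing vertices (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO edge (start_vertex, end_vertex, properties) VALUES (?, ?, ?)",
        (start, end, json.dumps(props)),
    )
    return cur.lastrowid

# Made-up sample data: no label is involved in generating any ID.
movie = add_vertex({"title": "Example Movie"})
actor = add_vertex({"name": "Example Actor"})
acted_in = add_edge(actor, movie, {"role": "Lead"})
```

Note that neither table mentions labels at all, which is what lets the IDs stay simple sequence values.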

Going forward, I will use the terms 'node' and 'vertex' interchangeably. To store the labels of a node or edge, I considered either using an array to hold all labels or creating a column for each label. However, both approaches have drawbacks. If we use an array to store the label references within the node and edge data, it will be difficult to query and update the data efficiently. If we create a separate column in the node and edge tables for each label, we may end up with a large number of columns, which becomes difficult to manage and maintain as the number of labels grows. Additionally, adding or removing a label would require modifying the node or edge table schema, which can be disruptive.

Therefore, I propose another solution. We should create a table label, which stores the label_id and label_name, and a junction table vertex_labels that links the table vertex to the table label. This junction table will have two foreign key columns, vertex_id referencing the vertex table and label_id referencing the label table, with a composite primary key on both columns so that no vertex-label pair can be stored twice. This allows a single node to have multiple label references, and the same solution applies to edges.
Here is an example of what the tables would look like:
Image: table label with 3 rows (Movie, Actor and American), and table vertex_labels linking vertex 1 with the labels Movie and American, and vertex 2 with the labels Actor and American
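The same example can be written out as a small runnable sketch. Again this uses sqlite3 as a stand-in for PostgreSQL; the numeric label IDs (1, 2, 3) and the UNIQUE constraint on label_name are my assumptions, since the picture only shows the names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE vertex (id INTEGER PRIMARY KEY, properties TEXT NOT NULL DEFAULT '{}');
CREATE TABLE label  (label_id INTEGER PRIMARY KEY, label_name TEXT NOT NULL UNIQUE);
CREATE TABLE vertex_labels (
    vertex_id INTEGER NOT NULL REFERENCES vertex(id),
    label_id  INTEGER NOT NULL REFERENCES label(label_id),
    PRIMARY KEY (vertex_id, label_id)   -- rules out duplicate (vertex, label) pairs
);
INSERT INTO vertex (id) VALUES (1), (2);
-- Label IDs are assumed values for illustration.
INSERT INTO label (label_id, label_name) VALUES (1, 'Movie'), (2, 'Actor'), (3, 'American');
-- Vertex 1 is Movie + American, vertex 2 is Actor + American.
INSERT INTO vertex_labels VALUES (1, 1), (1, 3), (2, 2), (2, 3);
""")

# Linking the same pair twice is rejected by the composite primary key.
try:
    conn.execute("INSERT INTO vertex_labels VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Linking to a label that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO vertex_labels VALUES (1, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM vertex_labels").fetchone()[0]
```

Adding or removing a label on a vertex is then a single-row insert or delete in vertex_labels, with no schema change.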

As I mentioned earlier, we will be using indexes, so we can create two indexes on the vertex_labels table: one on the label_id column, and another on the composite key of vertex_id and label_id. The first index allows us to quickly locate all vertices with a specific label, while the second allows us to quickly locate all labels associated with a specific vertex. In PostgreSQL, which AGE runs on, declaring the composite primary key already creates the second index, so only the label_id index has to be added explicitly.
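As a rough sketch of the two lookups these indexes are meant to serve, the snippet below builds the junction table with sqlite3 (standing in for PostgreSQL) and runs both queries. The index name vertex_labels_label_id_idx, the label IDs, and the two helper functions are hypothetical names chosen for this example; the vertex table is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE label (label_id INTEGER PRIMARY KEY, label_name TEXT NOT NULL UNIQUE);
CREATE TABLE vertex_labels (
    vertex_id INTEGER NOT NULL,                 -- vertex table omitted for brevity
    label_id  INTEGER NOT NULL REFERENCES label(label_id),
    PRIMARY KEY (vertex_id, label_id)           -- composite index: labels of a vertex
);
-- Single-column index: vertices of a label.
CREATE INDEX vertex_labels_label_id_idx ON vertex_labels (label_id);
INSERT INTO label VALUES (1, 'Movie'), (2, 'Actor'), (3, 'American');
INSERT INTO vertex_labels VALUES (1, 1), (1, 3), (2, 2), (2, 3);
""")

def vertices_with_label(name):
    """All vertex IDs that carry the given label name."""
    rows = conn.execute(
        """SELECT vl.vertex_id FROM vertex_labels AS vl
           JOIN label AS l ON l.label_id = vl.label_id
           WHERE l.label_name = ? ORDER BY vl.vertex_id""", (name,))
    return [r[0] for r in rows]

def labels_of_vertex(vertex_id):
    """All label names attached to the given vertex."""
    rows = conn.execute(
        """SELECT l.label_name FROM vertex_labels AS vl
           JOIN label AS l ON l.label_id = vl.label_id
           WHERE vl.vertex_id = ? ORDER BY l.label_name""", (vertex_id,))
    return [r[0] for r in rows]

print(vertices_with_label("American"))  # [1, 2]
print(labels_of_vertex(1))              # ['American', 'Movie']
```

Whether the planner actually picks each index depends on the database and on table statistics, so this only shows the query shapes, not a performance result.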

I think this is a simple solution, so it's feasible, and it addresses both problems described at the start: a vertex or edge can carry multiple labels, and its ID no longer depends on a label ID.

Top comments (1)

Matheus Farias de Oliveira Matsumoto • Edited

I think that's a great idea to start with! I've been working on label inheritance in AGE, and I can see this as a possible way for AGE to support both label inheritance and composition for vertices. With label inheritance, the vertices and edges are stored in their own label tables, and we figured out a way to make one table inherit from another so that querying the parent label table also returns the vertices or edges stored in its child labels. One thing I think you could do is create a table for each label, where these tables can inherit from one another and contain the data for the vertices and edges, while still preserving your idea of a table for all labels and a vertex_labels table showing which vertices are composed of which labels.

Let's say that you have a Person label that covers all the people in the graph, and also a Book label. We could use both the inheritance and composition design patterns in this graph. Book could have child labels such as FictionBook, RomanceBook and ComicBook, and an Author is also a Person; both are applications of the inheritance principle. Composition comes in because a Book has an Author.

For example, say we want to add a comic book and its authors to the graph. We have the labels Book, Comic, Person and Author in the label table. Comic inherits from Book and Author inherits from Person. You could then rename vertex_labels to HierarchyLabels and store the comic's vertex ID with the Comic label ID and the author's vertex ID with the Author label ID.
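A rough sketch of how that lookup could behave is below. To keep it self-contained it does not use PostgreSQL table inheritance as described above; instead it assumes a hypothetical parent_id column on the label table and walks it with a recursive query in sqlite3. The label IDs, vertex IDs, and the original vertex_labels name are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE label (
    label_id   INTEGER PRIMARY KEY,
    label_name TEXT NOT NULL UNIQUE,
    parent_id  INTEGER REFERENCES label(label_id)  -- hypothetical: NULL for root labels
);
CREATE TABLE vertex_labels (
    vertex_id INTEGER NOT NULL,
    label_id  INTEGER NOT NULL REFERENCES label(label_id),
    PRIMARY KEY (vertex_id, label_id)
);
-- Comic inherits from Book, Author inherits from Person.
INSERT INTO label VALUES (1, 'Book', NULL), (2, 'Comic', 1),
                         (3, 'Person', NULL), (4, 'Author', 3);
-- Vertex 1 is a comic book, vertex 2 is its author.
INSERT INTO vertex_labels VALUES (1, 2), (2, 4);
""")

def vertices_with_label(name):
    """Vertices tagged with `name` or with any label descending from it."""
    rows = conn.execute(
        """WITH RECURSIVE subtree(label_id) AS (
               SELECT label_id FROM label WHERE label_name = ?
               UNION ALL
               SELECT l.label_id FROM label AS l
               JOIN subtree AS s ON l.parent_id = s.label_id
           )
           SELECT DISTINCT vl.vertex_id FROM vertex_labels AS vl
           JOIN subtree AS s ON s.label_id = vl.label_id
           ORDER BY vl.vertex_id""", (name,))
    return [r[0] for r in rows]

print(vertices_with_label("Book"))    # [1]
print(vertices_with_label("Person"))  # [2]
print(vertices_with_label("Author"))  # [2]
```

Querying the parent label returns the vertices stored under its child labels, which is the behaviour the inheritance idea is after; the "Book has an Author" composition would still be expressed as an edge between the two vertices.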

But all of this is just an idea to add both inheritance and composition designs with AGE. Overall, I found your idea pretty nice! :D