Graphs are data model abstractions that are becoming pervasive in several real-life applications and practical use cases. In these settings, users primarily focus on entities and their relationships, further enhanced with multiple labels and properties to form the so-called property graphs. The processing of property graphs relies on graph queries that are used to extract and manipulate entities and relationships.
Databases are widely used to manage structured and complex data, including strings, streams, video, images, trees, and graphs. Graph data has a particularly general and complex structure, and it is widely used to model protein interactions and compare protein structures, relationship networks, drug design, social networks, road networks, video indexing, web information, computer vision, pattern detection, and chemical/biological informatics. Searching a graph to extract the required information is one of the main fields of graph mining. In most cases, the value and applicability of a graph data application depends on the performance of its graph queries, which makes query performance one of the key issues in graph mining. Early graph query techniques retrieved every isomorphic graph or supergraph for a query by scanning the whole database. Clearly, this approach is not efficient: like the isomorphism test itself, a sequential scan over every graph in the database demands substantial processing and storage resources. To accelerate graph query processing, therefore, we need to index the database, for example with a filter-and-verify scheme as sketched below.
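As a rough illustration of why indexing helps (a toy feature-based filter written for this post, not any particular published index), each stored graph can be summarized by a cheap feature such as its sorted degree sequence, and the expensive isomorphism test then runs only on graphs that pass the filter:

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Toy dataset of graphs (illustrative only).
dataset = {
    "g1": nx.Graph([("a", "b"), ("b", "c"), ("c", "a")]),   # triangle
    "g2": nx.Graph([("x", "y"), ("y", "z")]),                # short path
}

def degree_signature(g):
    """Sorted (descending) degree sequence: a cheap, index-friendly feature."""
    return sorted((d for _, d in g.degree()), reverse=True)

# The "index": precomputed signatures for every stored graph.
index = {name: degree_signature(g) for name, g in dataset.items()}

def candidates(query):
    """Filter step: a graph can only contain the query if it has at least as
    many nodes and its i-th largest degree is no smaller than the query's."""
    qsig = degree_signature(query)
    for name, sig in index.items():
        if len(sig) >= len(qsig) and all(s >= q for s, q in zip(sig, qsig)):
            yield name

query = nx.Graph([("u", "v"), ("v", "w"), ("w", "u")])       # triangle query

# Verify step: run the costly subgraph isomorphism test only on candidates.
for name in candidates(query):
    ok = isomorphism.GraphMatcher(dataset[name], query).subgraph_is_isomorphic()
    print(name, ok)    # only g1 reaches this test, and it matches
```

The filter never discards a true match (any embedding forces the stored graph's degrees to dominate the query's), so it only saves work; the verification step still decides the answer.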
Central to graph analytics is the need to locate patterns in dataset graphs. Informally, given a query graph, the system is asked to identify which of the stored graphs in its dataset contain it (subgraph matching) or are contained in it (supergraph matching). This is a very costly operation, as it entails the NP-complete problem of subgraph isomorphism, and even its most popular solutions are computationally very expensive. The problem is exacerbated when dealing with datasets storing large numbers of graphs, as the number of required subgraph isomorphism tests grows. Furthermore, performance deteriorates significantly with increasing graph sizes.
It is important to understand how a database stores data at the storage layer. Native graph stores such as Neo4j and TigerGraph keep both node and relationship data in fixed-size record structures. This is called index-free adjacency: each node maintains direct references to its adjacent nodes, so following a relationship costs O(1). In contrast, non-native graph stores normally use a NoSQL database or an RDBMS as the underlying datastore, with the adjacency list kept as JSON name/value pairs. To speed up graph traversal they rely on global indexes to link nodes together, which raises the cost of each hop to O(log n). As long as the dataset is small, the underlying graph technology does not matter much, but when the number of nodes is very high (millions to billions), native graph data stores will outperform non-native ones in real-time storing, querying, and traversing of the graph.
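A minimal sketch of the two access patterns (plain Python, not any particular engine; the names are invented for illustration):

```python
from bisect import bisect_left

# Index-free adjacency: each node holds direct references to its neighbours,
# so one traversal hop is a constant-time pointer chase.
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []            # direct references to adjacent Node objects

    def connect(self, other):
        self.neighbors.append(other)

# Non-native layout: adjacency kept as name/value pairs, and a sorted global
# index (standing in for a B-tree) is consulted on every hop.
adjacency = {"alice": ["bob"], "bob": ["carol"], "carol": []}
global_index = sorted(adjacency)

def neighbors_via_index(node_id):
    i = bisect_left(global_index, node_id)     # O(log n) lookup per hop
    if i < len(global_index) and global_index[i] == node_id:
        return adjacency[node_id]
    return []

# Same one-hop traversal, two very different costs per hop.
alice, bob = Node("alice"), Node("bob")
alice.connect(bob)
print([n.name for n in alice.neighbors])   # ['bob']  -- pointer chase, O(1)
print(neighbors_via_index("alice"))        # ['bob']  -- index lookup, O(log n)
```

The per-hop difference looks tiny in isolation, but multi-hop traversals over millions of nodes multiply it, which is where native stores pull ahead.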
Graph computing and graph databases are not new concepts; they have been around for the last 30 years or so. But the last few years have seen exponential growth in interest around the topic, especially after it became clear that graphs are a highly effective way to solve some real-life problems of the interconnected world, viz. community detection, anomaly detection, fraud analytics, recommendations, etc. Today, more than 7 of the top 10 retailers use graph databases to drive recommendations and promotions and to streamline logistics. Some of the largest insurers use graph databases to fight fraud and manage risk. Companies like Google and Facebook have built their businesses on top of graph algorithms. These algorithms will keep improving, and faster graph query processing may be anticipated.
Apache AGE is a modern PostgreSQL extension that adds graph database features and openCypher querying on top of PostgreSQL. A small usage sketch follows the links below. Take a look at:
Apache AGE
Apache AGE Github
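As a minimal sketch of what querying looks like with AGE (assuming the extension is installed; the connection string, graph name, and data below are placeholders), Cypher can be issued from Python through psycopg2:

```python
import psycopg2

# Placeholder connection details; adjust to your own PostgreSQL setup.
conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True

with conn.cursor() as cur:
    # Make AGE available in this session and expose its catalog.
    cur.execute("CREATE EXTENSION IF NOT EXISTS age;")
    cur.execute("LOAD 'age';")
    cur.execute('SET search_path = ag_catalog, "$user", public;')

    # Create a graph (fails if it already exists; drop it first in that case).
    cur.execute("SELECT create_graph('demo_graph');")

    # Create two vertices and a relationship with openCypher.
    cur.execute("""
        SELECT * FROM cypher('demo_graph', $$
            CREATE (:Person {name: 'Alice'})-[:KNOWS]->(:Person {name: 'Bob'})
        $$) AS (result agtype);
    """)

    # Query the graph back through the same cypher() function.
    cur.execute("""
        SELECT * FROM cypher('demo_graph', $$
            MATCH (a:Person)-[:KNOWS]->(b:Person)
            RETURN a.name, b.name
        $$) AS (a agtype, b agtype);
    """)
    print(cur.fetchall())
```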