<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Memgraph</title>
    <description>The latest articles on DEV Community by Memgraph (@memgraph).</description>
    <link>https://dev.to/memgraph</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4973%2F37388b33-7d30-458b-a939-9a6e26f8f21b.gif</url>
      <title>DEV Community: Memgraph</title>
      <link>https://dev.to/memgraph</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/memgraph"/>
    <language>en</language>
    <item>
      <title>When Should You Use Text2Cypher in a GraphRAG Pipeline</title>
      <dc:creator>Sabika Tasneem</dc:creator>
      <pubDate>Fri, 22 May 2026 10:53:33 +0000</pubDate>
      <link>https://dev.to/memgraph/when-should-you-use-text2cypher-in-a-graphrag-pipeline-3imf</link>
      <guid>https://dev.to/memgraph/when-should-you-use-text2cypher-in-a-graphrag-pipeline-3imf</guid>
      <description>&lt;p&gt;Not every GraphRAG question needs the same retrieval pattern.&lt;/p&gt;

&lt;p&gt;Some questions need the neighborhood around an entity. Some need a summary across a large part of the graph. Some just need an exact answer from structured data. That last group is where Text2Cypher fits.&lt;/p&gt;

&lt;p&gt;It turns a natural language question into a Cypher query, so the system can return a precise graph result instead of a broad summary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Text2Cypher?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/docs/ai-ecosystem/graph-rag/atomic-pipelines/text2cypher" rel="noopener noreferrer"&gt;Text2Cypher&lt;/a&gt; is the graph version of a broader pattern developers already know from &lt;a href="https://arxiv.org/abs/2406.08426" rel="noopener noreferrer"&gt;text-to-SQL systems&lt;/a&gt; where you take a natural language question and generate a database query that can answer it.&lt;/p&gt;

&lt;p&gt;The difference is the target query language.&lt;/p&gt;

&lt;p&gt;Instead of generating SQL for tables, Text2Cypher generates Cypher for graph data. &lt;a href="https://opencypher.org/" rel="noopener noreferrer"&gt;Cypher&lt;/a&gt; is a declarative query language for property graphs, where data is modeled as nodes, relationships, labels, and properties.&lt;/p&gt;

&lt;p&gt;The LLM’s job is not to invent the answer. Its job is to generate the right query so the system can run it and return the result. That distinction matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Text2Cypher Does in GraphRAG
&lt;/h2&gt;

&lt;p&gt;In a GraphRAG pipeline, Text2Cypher is useful when the user’s question maps cleanly to the graph schema.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does user &lt;code&gt;31254&lt;/code&gt; exist in this dataset?&lt;/li&gt;
&lt;li&gt;Which suppliers provide components used in Product A?&lt;/li&gt;
&lt;li&gt;How many orders are delayed by more than 7 days?&lt;/li&gt;
&lt;li&gt;Which customers have more than 3 unresolved support tickets?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions are not asking the model to read a pile of text and summarize it. They are asking for a structured answer from structured data.&lt;/p&gt;
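
&lt;p&gt;As a rough sketch, two of the questions above could translate into Cypher like this. The labels, relationship types, and properties used here (&lt;code&gt;User&lt;/code&gt;, &lt;code&gt;Customer&lt;/code&gt;, &lt;code&gt;Ticket&lt;/code&gt;, &lt;code&gt;OPENED&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;) are assumptions for illustration, not a real schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Does user 31254 exist in this dataset?
// (assumes a User label with an id property)
MATCH (u:User {id: 31254})
RETURN count(u) &amp;gt; 0 AS user_exists;

// Which customers have more than 3 unresolved support tickets?
// (assumes (:Customer)-[:OPENED]-&amp;gt;(:Ticket) and a status property)
MATCH (c:Customer)-[:OPENED]-&amp;gt;(t:Ticket {status: 'unresolved'})
WITH c, count(t) AS unresolved_tickets
WHERE unresolved_tickets &amp;gt; 3
RETURN c.name AS customer, unresolved_tickets
ORDER BY unresolved_tickets DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In both cases the answer is a value or a small table that comes straight from the database, not from generated prose.&lt;/p&gt;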

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/b_iWAxtQ7UE"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;A practical Text2Cypher flow usually looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inspect the graph schema.&lt;/li&gt;
&lt;li&gt;Pass the relevant schema context to the LLM.&lt;/li&gt;
&lt;li&gt;Generate the Cypher query.&lt;/li&gt;
&lt;li&gt;Run the query.&lt;/li&gt;
&lt;li&gt;Return the result.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Schema is the part people underestimate.&lt;/p&gt;

&lt;p&gt;If the LLM does not know what labels, relationship types, and properties exist, it can generate a query that looks reasonable but does not match the actual graph. For example, it may generate &lt;code&gt;(:Customer)-[:PURCHASED]-&amp;gt;(:Product)&lt;/code&gt; when the real graph uses &lt;code&gt;(:User)-[:BOUGHT]-&amp;gt;(:Item)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That query is syntactically fine. It is just wrong for your data.&lt;/p&gt;
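
&lt;p&gt;A quick way to see this failure mode, using the two patterns from the example: in Cypher, matching on a label or relationship type that does not exist in the graph is not an error. The query runs and simply matches nothing, which is easy to misread as "there is no such data". The &lt;code&gt;purchases&lt;/code&gt; alias is only an illustrative name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Plausible guess, wrong for this graph: runs without error, counts 0
MATCH (:Customer)-[:PURCHASED]-&amp;gt;(:Product)
RETURN count(*) AS purchases;

// Matches the labels and relationship type the graph actually uses
MATCH (:User)-[:BOUGHT]-&amp;gt;(:Item)
RETURN count(*) AS purchases;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;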

&lt;p&gt;In Memgraph, &lt;a href="https://memgraph.com/docs/querying/schema" rel="noopener noreferrer"&gt;&lt;code&gt;SHOW SCHEMA INFO&lt;/code&gt;&lt;/a&gt; can expose labels, relationship types, and properties, giving the model real schema context before it generates the query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Text2Cypher Is the Best Fit for Analytical GraphRAG Questions
&lt;/h2&gt;

&lt;p&gt;Analytical GraphRAG questions ask for something concrete.&lt;/p&gt;

&lt;p&gt;Usually, the answer is one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A count&lt;/li&gt;
&lt;li&gt;A boolean answer&lt;/li&gt;
&lt;li&gt;A list of matching nodes&lt;/li&gt;
&lt;li&gt;A filtered table&lt;/li&gt;
&lt;li&gt;A grouped result&lt;/li&gt;
&lt;li&gt;A ranked result based on a property or aggregate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, in a &lt;a href="https://memgraph.com/webinars/meet-atomic-graphrag" rel="noopener noreferrer"&gt;GitHub Issues knowledge graph&lt;/a&gt;, a user might ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How many feature requests does Memgraph have?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question does not need the model to retrieve five chunks about issue tracking and reason from prose.&lt;/p&gt;

&lt;p&gt;It needs a query over the graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="n"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;i:&lt;/span&gt;&lt;span class="n"&gt;Issue&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;i.issue_type&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;issue_type&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
       &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That answer comes back as a table-shaped result.&lt;/p&gt;

&lt;p&gt;No long context window. No vague summary. No pretending that a generative answer is better than a database result.&lt;/p&gt;

&lt;p&gt;That is why Text2Cypher is a strong fit for analytical GraphRAG. The question has a query-shaped answer.&lt;/p&gt;
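
&lt;p&gt;If only the single number is needed, the same question could also be answered with a filtered count. This is a sketch: the &lt;code&gt;issue_type&lt;/code&gt; property comes from the query above, but the value &lt;code&gt;'feature request'&lt;/code&gt; is an assumption about how the data is labeled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;MATCH (i:Issue)
// Hypothetical value; check the grouped result for the real spelling
WHERE i.issue_type = 'feature request'
RETURN count(i) AS feature_requests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;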

&lt;h2&gt;
  
  
  When Text2Cypher Is the Wrong Tool
&lt;/h2&gt;

&lt;p&gt;Text2Cypher gets weaker when the question is open-ended, exploratory, or depends on broader context that does not live in a single clean query result.&lt;/p&gt;

&lt;p&gt;Bad fits include questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why are users unhappy with this product?&lt;/li&gt;
&lt;li&gt;What themes appear across negative reviews?&lt;/li&gt;
&lt;li&gt;Which related issues should an engineer investigate first?&lt;/li&gt;
&lt;li&gt;What is missing from this research corpus?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions need more than a count or table.&lt;/p&gt;

&lt;p&gt;They may need local graph search, where the system starts from a relevant node and expands into its surrounding neighborhood. Or they may need query-focused summarization, where the system synthesizes patterns across a larger part of the graph.&lt;/p&gt;

&lt;p&gt;Trying to force Text2Cypher onto those questions gives you shallow answers.&lt;/p&gt;

&lt;p&gt;A query can return rows. It does not automatically explain themes, tradeoffs, causes, or missing context.&lt;/p&gt;

&lt;p&gt;A useful rule is simple:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If the Answer Should Look Like...&lt;/th&gt;
&lt;th&gt;Use...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A number, table, filtered list, or direct lookup&lt;/td&gt;
&lt;td&gt;&lt;a href="https://memgraph.com/blog/text-to-cypher-graphrag-analytical-questions" rel="noopener noreferrer"&gt;Text2Cypher&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connected context around one entity&lt;/td&gt;
&lt;td&gt;&lt;a href="https://memgraph.com/blog/local-graph-search-graphrag-pipeline-type" rel="noopener noreferrer"&gt;Local graph search&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Themes or patterns across a corpus&lt;/td&gt;
&lt;td&gt;&lt;a href="https://memgraph.com/blog/global-atomic-graphrag-pipeline-query-focused-summarization" rel="noopener noreferrer"&gt;Query-focused summarization&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The retrieval path should match the question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep the Pipeline Inspectable
&lt;/h2&gt;

&lt;p&gt;Text2Cypher has one major advantage for developers: &lt;strong&gt;you can inspect it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can read the generated query and you can run it again. That matters in GraphRAG because retrieval bugs are easy to hide behind fluent language.&lt;/p&gt;

&lt;p&gt;If the answer is wrong, you need to know where the failure happened. Was the schema context incomplete? Did the model generate the wrong query? Did the graph lack the right data? Did the final LLM response overstate what the query returned?&lt;/p&gt;

&lt;p&gt;For analytical retrieval, the cleanest pipeline is often the most boring one: inspect the schema, generate the query, execute it, and return the result.&lt;/p&gt;

&lt;p&gt;That is also what makes Text2Cypher easier to evaluate than a retrieval flow hidden behind several prompts and orchestration steps. The generated query gives you something concrete to inspect before the final answer reaches the user.&lt;/p&gt;

&lt;p&gt;For a deeper walkthrough of this pattern, Memgraph has a full guide on &lt;a href="https://memgraph.com/blog/text-to-cypher-graphrag-analytical-questions" rel="noopener noreferrer"&gt;Text2Cypher for GraphRAG analytical questions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Text2Cypher is not the whole GraphRAG story. It is the pattern you use when the question has a query-shaped answer.&lt;/p&gt;

</description>
      <category>text2cypher</category>
      <category>ai</category>
      <category>analytics</category>
      <category>rag</category>
    </item>
    <item>
      <title>When Should You Use GraphRAG Instead of RAG?</title>
      <dc:creator>Sabika Tasneem</dc:creator>
      <pubDate>Thu, 21 May 2026 10:36:08 +0000</pubDate>
      <link>https://dev.to/memgraph/when-should-you-use-graphrag-instead-of-rag-4fja</link>
      <guid>https://dev.to/memgraph/when-should-you-use-graphrag-instead-of-rag-4fja</guid>
      <description>&lt;p&gt;Most teams building LLM applications start with RAG for a good reason. It is practical, easy to understand, and usually good enough for a simple AI use case.&lt;/p&gt;

&lt;p&gt;But once users stop asking simple lookup questions and start asking relationship-heavy questions, standard RAG can get shallow fast.&lt;/p&gt;

&lt;p&gt;The issue is not that RAG is bad. The issue is that many real questions are not just about finding a relevant paragraph. They are about following connections across people, products, systems, documents, events, or dependencies.&lt;/p&gt;

&lt;p&gt;That is the gap GraphRAG tries to fill.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG vs GraphRAG
&lt;/h2&gt;

&lt;p&gt;RAG made LLM applications more useful because it gave models access to external information.&lt;/p&gt;

&lt;p&gt;Instead of asking a model to answer from training data alone, a RAG pipeline retrieves relevant content from your docs, tickets, wikis, PDFs, or databases, adds that content to the prompt, and asks the model to answer from it.&lt;/p&gt;

&lt;p&gt;That works well for a lot of use cases.&lt;/p&gt;

&lt;p&gt;If the question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What is our refund policy for annual subscriptions?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A standard RAG pipeline can search the documentation, find the right policy section, and give the model the relevant text.&lt;/p&gt;

&lt;p&gt;The problem starts when the question is not just about finding the right text. It starts when the answer depends on relationships.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which suppliers could be causing delivery delays for products affected by a specific component shortage?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question is not just asking for a matching paragraph. It needs the system to connect suppliers, components, products, shipments, delays, and dependencies.&lt;/p&gt;

&lt;p&gt;This is where GraphRAG becomes useful.&lt;/p&gt;

&lt;p&gt;RAG is good at finding text that sounds relevant. GraphRAG is better when the answer depends on how things are connected.&lt;/p&gt;

&lt;h2&gt;
  
  
  What RAG Does Well
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2005.11401" rel="noopener noreferrer"&gt;Retrieval augmented generation&lt;/a&gt;, usually shortened to RAG, combines a language model with an external retrieval system. The original paper described this as combining a parametric model (the LLM itself) with non-parametric memory (external knowledge), usually retrieved from an external corpus.&lt;/p&gt;

&lt;p&gt;In most modern implementations, that retrieval step uses embeddings. The basic flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Split documents into chunks.&lt;/li&gt;
&lt;li&gt;Convert each chunk into an embedding.&lt;/li&gt;
&lt;li&gt;Store those embeddings in a vector index.&lt;/li&gt;
&lt;li&gt;Convert the user question into an embedding.&lt;/li&gt;
&lt;li&gt;Retrieve the most similar chunks.&lt;/li&gt;
&lt;li&gt;Add those chunks to the LLM prompt.&lt;/li&gt;
&lt;li&gt;Generate the answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0f2yj49j3i7wgs5mv8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0f2yj49j3i7wgs5mv8y.png" alt="RAG vector search workflow" width="800" height="673"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is useful when the answer is likely to be contained in one or a few text chunks. Good RAG use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation search&lt;/li&gt;
&lt;li&gt;FAQ assistants&lt;/li&gt;
&lt;li&gt;Internal knowledge base search&lt;/li&gt;
&lt;li&gt;Customer support answer generation&lt;/li&gt;
&lt;li&gt;Summarization over a small set of relevant documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many teams, this is the right starting point. It is simpler than building a knowledge graph, and it can deliver useful results quickly.&lt;/p&gt;

&lt;p&gt;The issue is that similarity is not the same as understanding.&lt;/p&gt;

&lt;p&gt;A vector search system can find chunks that sound close to the query. It does not automatically know whether one entity owns another, depends on another, contradicts another, or affects another through a multi-step chain.&lt;/p&gt;

&lt;p&gt;That difference matters once your questions become relational.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where RAG Gets Shallow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog/what-is-rag" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; usually retrieves isolated text chunks. That creates a few common problems.&lt;/p&gt;

&lt;p&gt;First, chunking can break context. A policy, customer, transaction, or technical decision might make sense only when you see how it connects to other facts. Splitting documents into chunks can hide that structure.&lt;/p&gt;

&lt;p&gt;Second, semantic similarity can over-retrieve. A chunk may sound relevant without being useful for the actual answer.&lt;/p&gt;

&lt;p&gt;Third, RAG does not inherently reason across relationships. It may retrieve text about a supplier, text about a product, and text about a shipment delay, but it does not automatically know how those things connect.&lt;/p&gt;

&lt;p&gt;Think about this question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which customers are affected by the delayed shipment from Supplier A?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A standard RAG pipeline might retrieve documents that mention Supplier A, delayed shipments, and customers. That is helpful, but still incomplete.&lt;/p&gt;

&lt;p&gt;The actual answer may require a path like this: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Supplier A -&amp;gt; supplies -&amp;gt; Component X -&amp;gt; used in -&amp;gt; Product Y -&amp;gt; included in -&amp;gt; Shipment Z -&amp;gt; assigned to -&amp;gt; Customer C&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That path is not just text similarity. It is structure.&lt;/p&gt;

&lt;p&gt;If your application needs to answer questions like this, treating your knowledge base as flat chunks is a weak model of the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What GraphRAG Adds
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog/what-is-graphrag" rel="noopener noreferrer"&gt;GraphRAG&lt;/a&gt; keeps the useful part of RAG: &lt;strong&gt;retrieval&lt;/strong&gt;. But it adds a graph layer, where information is represented as entities and relationships. Microsoft’s paper on &lt;a href="https://arxiv.org/abs/2404.16130" rel="noopener noreferrer"&gt;GraphRAG for query focused summarization&lt;/a&gt; helped popularize this pattern for using graph structure to answer questions that need broader connected context.&lt;/p&gt;

&lt;p&gt;Instead of only storing chunks like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Supplier A provides Component X. Component X is used in Product Y. Product Y is part of Shipment Z.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A graph represents the structure directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(Supplier A)-[:SUPPLIES]-&amp;gt;(Component X)
(Component X)-[:USED_IN]-&amp;gt;(Product Y)
(Product Y)-[:INCLUDED_IN]-&amp;gt;(Shipment Z)
(Shipment Z)-[:ASSIGNED_TO]-&amp;gt;(Customer C)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the system can retrieve context by following relationships, not just by matching similar text.&lt;/p&gt;
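
&lt;p&gt;For instance, the earlier question about customers affected by a delayed shipment from Supplier A could be sketched as one Cypher pattern over this structure. The relationship types come from the example above; the node labels and the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt; properties are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Follow the chain supplier -&amp;gt; component -&amp;gt; product -&amp;gt; shipment -&amp;gt; customer
MATCH (s:Supplier {name: 'Supplier A'})-[:SUPPLIES]-&amp;gt;(:Component)
      -[:USED_IN]-&amp;gt;(:Product)
      -[:INCLUDED_IN]-&amp;gt;(sh:Shipment {status: 'delayed'})
      -[:ASSIGNED_TO]-&amp;gt;(c:Customer)
// One customer can be reached through several components or products
RETURN DISTINCT c.name AS affected_customer;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;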

&lt;p&gt;A GraphRAG pipeline might work like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use semantic search, keyword search, or another method to find a starting point.&lt;/li&gt;
&lt;li&gt;Identify the relevant node or set of nodes in the graph.&lt;/li&gt;
&lt;li&gt;Traverse connected relationships.&lt;/li&gt;
&lt;li&gt;Rank, filter, and compress the connected context.&lt;/li&gt;
&lt;li&gt;Send the final context to the LLM.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhapl9s36cy4vvdg5ebdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhapl9s36cy4vvdg5ebdg.png" alt="Memgraph GraphRAG workflow" width="799" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key difference is that search finds where to start, while graph traversal finds what is connected.&lt;/p&gt;
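
&lt;p&gt;The pipeline steps listed earlier could be sketched in a single Cypher query. This is a minimal illustration, not Memgraph's Atomic GraphRAG implementation: a plain property lookup stands in for the search step, and the &lt;code&gt;Product&lt;/code&gt; label, the &lt;code&gt;name&lt;/code&gt; property, the &lt;code&gt;$product_name&lt;/code&gt; parameter, the two-hop limit, and the path-count ranking are all assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Steps 1-2: find the starting node (a property lookup stands in for search)
MATCH (start:Product {name: $product_name})
// Step 3: traverse relationships up to two hops out, in either direction
MATCH (start)-[*1..2]-(neighbor)
// Step 4: rank neighbors by how many paths reach them, then trim the result
WITH neighbor, count(*) AS paths
ORDER BY paths DESC
LIMIT 25
// Step 5: return compact context to place in the LLM prompt
RETURN labels(neighbor) AS labels, properties(neighbor) AS props, paths;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Capping the hop count and the number of returned nodes keeps the context small enough to inspect before it reaches the model.&lt;/p&gt;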

&lt;p&gt;That is why GraphRAG is useful for relationship-heavy use cases, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supply chain analysis where the system needs to trace products, components, suppliers, and delayed shipments&lt;/li&gt;
&lt;li&gt;Fraud detection where suspicious behavior appears across shared accounts, devices, transactions, or addresses&lt;/li&gt;
&lt;li&gt;Cybersecurity investigation where alerts need to be connected to users, assets, permissions, and attack paths&lt;/li&gt;
&lt;li&gt;Healthcare or life sciences research where answers depend on relationships between diseases, genes, drugs, and clinical evidence&lt;/li&gt;
&lt;li&gt;Customer 360 applications where support tickets, purchases, product usage, and account history need to be connected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not just document lookup problems. They are relationship problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG and GraphRAG Are Not Enemies
&lt;/h2&gt;

&lt;p&gt;The lazy version of this topic is: RAG bad, GraphRAG good.&lt;/p&gt;

&lt;p&gt;That is wrong. &lt;strong&gt;RAG&lt;/strong&gt; is still useful. If your data is mostly unstructured text and your questions are direct, a standard RAG pipeline may be enough. &lt;strong&gt;GraphRAG&lt;/strong&gt; becomes useful when the shape of the answer depends on connected facts. A better way to think about it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use RAG When&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use GraphRAG When&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;The answer is likely inside a small number of text chunks.&lt;/td&gt;
&lt;td&gt;The answer depends on relationships across entities.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You need fast document Q&amp;amp;A.&lt;/td&gt;
&lt;td&gt;You need multi-hop reasoning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your data does not have strong entity relationships.&lt;/td&gt;
&lt;td&gt;Your data has dependencies, hierarchies, ownership, or causality.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You are building a first version quickly.&lt;/td&gt;
&lt;td&gt;You need more explainable and structured retrieval.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In practice, many good systems use both. &lt;strong&gt;Vector search&lt;/strong&gt; can find semantically relevant entry points. &lt;strong&gt;Graph traversal&lt;/strong&gt; can expand from those entry points into connected context.&lt;/p&gt;

&lt;p&gt;That combination is often more useful than either approach alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep the Retrieval Logic Close to the Data
&lt;/h2&gt;

&lt;p&gt;GraphRAG gets harder to maintain when every retrieval step lives in a different place.&lt;/p&gt;

&lt;p&gt;One service finds similar chunks. Another stores the graph. Another expands relationships. Another ranks results. Another builds the final prompt.&lt;/p&gt;

&lt;p&gt;That can work, but it gives you more moving parts to debug when the answer is wrong.&lt;/p&gt;

&lt;p&gt;A cleaner pattern is to keep as much of the retrieval logic as possible close to the graph itself. Search can find the starting point. Traversal can expand the context. Ranking and filtering can reduce the result before it ever reaches the prompt.&lt;/p&gt;

&lt;p&gt;That is the idea behind &lt;a href="https://memgraph.com/docs/ai-ecosystem/graph-rag/atomic-pipelines" rel="noopener noreferrer"&gt;Atomic GraphRAG in Memgraph&lt;/a&gt;. It expresses the retrieval path as a single execution layer where possible, instead of spreading it across a pile of orchestration code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i4bgi00xo9kfsoh53ts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i4bgi00xo9kfsoh53ts.png" alt="atomic graphrag real example" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The broader lesson is not tool-specific. If your GraphRAG pipeline is hard to inspect, it will be hard to trust. The retrieval path should be visible, testable, and easy to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Rule
&lt;/h2&gt;

&lt;p&gt;Use RAG when you need to retrieve relevant text. Use GraphRAG when you need to retrieve connected context. That is the real distinction. &lt;/p&gt;

&lt;p&gt;If your question can be answered by finding the right paragraph, RAG is probably enough. If your question requires following relationships between people, products, systems, documents, events, risks, or dependencies, you are no longer just doing text retrieval. You are doing graph retrieval.&lt;/p&gt;

&lt;p&gt;The point is not to bolt GraphRAG on as an extra layer, but to use it where it is the right retrieval model for the problem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>graphrag</category>
      <category>llm</category>
    </item>
    <item>
      <title>MCP for Agents: The Security Gap Most Teams Miss</title>
      <dc:creator>Sabika Tasneem</dc:creator>
      <pubDate>Mon, 16 Feb 2026 12:31:41 +0000</pubDate>
      <link>https://dev.to/memgraph/mcp-for-agents-the-security-gap-most-teams-miss-3bnl</link>
      <guid>https://dev.to/memgraph/mcp-for-agents-the-security-gap-most-teams-miss-3bnl</guid>
      <description>&lt;p&gt;MCP is exciting because it turns an LLM into something that can execute actions through tool calls. One protocol, many tools. Your agent can pull data, update tickets, call APIs, and trigger workflows. That is exactly why teams are rushing to ship MCP based agents.&lt;/p&gt;

&lt;p&gt;That speed comes with a tradeoff. Once an LLM can touch live systems, mistakes stop being “bad answers” and start becoming real actions. The point of this post is not to criticize MCP. It is to help you ship agents that stay useful without unintentionally expanding your blast radius.&lt;/p&gt;

&lt;p&gt;Let’s dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Gives You (And What It Does Not)
&lt;/h2&gt;

&lt;p&gt;MCP standardizes how tools and context are exposed to a model, which is great for developer velocity. What it does not do is decide what is safe or appropriate in your environment. You still own boundaries and behavior.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/nmAVOTVi7yE"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;In production, the gaps show up fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tool should be used for this request&lt;/li&gt;
&lt;li&gt;What data is allowed for this user or team&lt;/li&gt;
&lt;li&gt;Which actions should be blocked or require approval&lt;/li&gt;
&lt;li&gt;How you can audit tool use after something goes wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want the spec-level overview, start with &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Anthropic’s MCP introduction&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Agents with MCP: 3 Problems You Will Hit First
&lt;/h2&gt;

&lt;p&gt;The first failure is rarely a headline breach. It usually looks like a normal product bug, except now the bug can trigger emails, update records, or touch production data. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Agent Does the “Helpful” Thing You Did Not Ask For&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A user asks, “Can you check which customers are impacted?” The agent decides that notifying customers is helpful and drafts a mass email. Nothing was hacked. The model was just optimizing for task completion, and you gave it a tool that made the wrong idea easy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Demo Tool Becomes a Production Hazard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most teams start with a broad tool set because it makes the demo work. Later, the agent gets a slightly different question and reaches for the most powerful tool available. If that tool can write, delete, or trigger workflows, you now have an outsized failure scope. That is the blast radius.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Agent Guesses and Guesses Wrong&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your agent can query a database, it will try. If it does not have the right context about what is allowed and what the data means, it will guess. Sometimes the guess is harmless. Sometimes it pulls data it should not have pulled, or it produces results that look right but are based on the wrong assumptions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Prompt Rules Are Not Enforcement
&lt;/h2&gt;

&lt;p&gt;The common response is to add more instructions: “Read-only,” “confirm before sending,” “never delete.” Those rules help, but they do not enforce anything.&lt;/p&gt;

&lt;p&gt;There is a simple reason. Prompts influence the model’s behavior. They do not change the system’s capabilities. If a write tool is exposed, the model can still call it, even if you told it not to. If a broad SQL tool is exposed, the model can still retrieve more data than you intended, even if you asked it to be careful.&lt;/p&gt;

&lt;p&gt;This is why prompt-only safety tends to decay over time. As you add tools, edge cases, and new workflows, the instruction layer becomes a long list of exceptions. The agent still has the same tool surface, but now it is operating under a growing set of text rules that are easy to miss or misapply and that can conflict with each other.&lt;/p&gt;

&lt;p&gt;The fix is capability control. Reduce what the agent can do, scope what it can see, and require explicit approvals for actions that have a real blast radius.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Fix: Shrink the Tool Surface at Runtime
&lt;/h2&gt;

&lt;p&gt;Do not rely on the model to always choose correctly. Make wrong choices harder. The simplest way to do that is to reduce what the agent can do by default, then expand capabilities only when you have a clear reason.&lt;/p&gt;

&lt;p&gt;Start with these guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expose fewer tools by default&lt;/li&gt;
&lt;li&gt;Only expose tools that match the current task&lt;/li&gt;
&lt;li&gt;Separate read tools from write tools&lt;/li&gt;
&lt;li&gt;Require approvals for irreversible actions&lt;/li&gt;
&lt;li&gt;Log tool calls so you can trace what happened&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is least-privilege design applied to agent tool access, enforced at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where GraphRAG Fits in an MCP Tooling Stack
&lt;/h2&gt;

&lt;p&gt;Most RAG stacks start with vectors. Vectors are great at finding semantically similar text, but they are not built to represent relationships like who owns which data, which rule is current, or which tool is allowed for this workflow.&lt;/p&gt;

&lt;p&gt;Graphs are good at that because they model relationships directly. When you add a graph-based context layer, you can give the model a smaller, cleaner slice of context tied to the user and the task.&lt;/p&gt;
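
&lt;p&gt;As an illustration of that kind of scoped context, tool permissions themselves could be stored as relationships and looked up per user and workflow before any tools are exposed to the model. Everything in this sketch (the &lt;code&gt;User&lt;/code&gt;, &lt;code&gt;Workflow&lt;/code&gt;, and &lt;code&gt;Tool&lt;/code&gt; labels, the &lt;code&gt;CAN_RUN&lt;/code&gt; and &lt;code&gt;ALLOWS&lt;/code&gt; relationship types, the &lt;code&gt;mode&lt;/code&gt; property, and the query parameters) is a hypothetical schema, not part of MCP or Memgraph.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Return only the read-only tools this user may use in this workflow
MATCH (u:User {id: $user_id})-[:CAN_RUN]-&amp;gt;(w:Workflow {name: $workflow})
      -[:ALLOWS]-&amp;gt;(t:Tool)
WHERE t.mode = 'read'
RETURN t.name AS tool
ORDER BY tool;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The application would then expose only the returned tools for that request, so the restriction lives in the system rather than in the prompt.&lt;/p&gt;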

&lt;p&gt;For example, you can use &lt;a href="https://memgraph.com/docs/database-management/authentication-and-authorization/role-based-access-control#label-based-access-control" rel="noopener noreferrer"&gt;label-based access controls&lt;/a&gt; that determine which node labels and relationship types a given user or workflow can touch. That reduces context overload and lowers the chance your agent reaches for the wrong tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Checklist You Can Actually Use
&lt;/h2&gt;

&lt;p&gt;If you are shipping MCP-powered agents, do not treat guardrails as a final polish step. Treat them as part of the build. The fastest way to end up in trouble is to bolt safety on after you have already exposed a wide tool surface to an LLM.&lt;/p&gt;

&lt;p&gt;Start with a simple baseline and improve it as you learn. The point is not to predict every edge case up front. The point is to make tool behavior observable, reversible where possible, and scoped to what the agent should be doing right now.&lt;/p&gt;

&lt;p&gt;A practical baseline looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;List your tools and label them read or write&lt;/li&gt;
&lt;li&gt;Turn off anything irreversible by default&lt;/li&gt;
&lt;li&gt;Add a human approval step for high-impact actions&lt;/li&gt;
&lt;li&gt;Keep tool descriptions short and specific&lt;/li&gt;
&lt;li&gt;Log every tool call with who requested it and what tool ran&lt;/li&gt;
&lt;li&gt;Review misfires weekly and treat them as product bugs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This checklist is not about paranoia. It is about making MCP workflows predictable enough to ship. If your plan is “we will fix it in the prompt,” you are in for some trouble.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Memgraph Adds to an MCP Agent Stack
&lt;/h2&gt;

&lt;p&gt;At some point in production, most enterprise teams realize they need a real context layer. Memgraph is an in-memory graph database used as a real-time context engine, which makes it a good fit when your agent needs fast traversal, connected context, and governance that changes as your systems change.&lt;/p&gt;

&lt;p&gt;In practice, you can use Memgraph to store and query the relationships your agent depends on, then apply GraphRAG patterns to retrieve a connected context slice instead of stuffing everything into a prompt.&lt;/p&gt;

&lt;p&gt;This is also where Memgraph’s &lt;strong&gt;&lt;a href="https://www.crowdcast.io/c/meet-atomic-graphrag-a-single-unified-execution-layer" rel="noopener noreferrer"&gt;Atomic GraphRAG&lt;/a&gt;&lt;/strong&gt; comes in. Instead of stitching together multiple retrieval steps in your application code, Atomic GraphRAG aims to generate context in a single query so it is simpler, faster, and easier to review and tweak. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv528cggc13phtxnqq4qm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv528cggc13phtxnqq4qm.png" alt="memgraph-atomic-graphrag-single-query-execution"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For you, that means fewer moving parts, clearer failure modes, and a smaller surface area for accidental tool misuse.&lt;/p&gt;

&lt;p&gt;If you are exploring MCP specifically, Memgraph provides an &lt;a href="https://memgraph.com/blog/pushing-mcp-forward" rel="noopener noreferrer"&gt;MCP Server&lt;/a&gt; to expose graph context to agents, and an &lt;a href="https://modelcontextprotocol.io/clients#:~:text=and%20Agent%20API-,Memgraph%20Lab,-%23" rel="noopener noreferrer"&gt;MCP Client&lt;/a&gt; inside &lt;a href="https://memgraph.com/lab" rel="noopener noreferrer"&gt;Memgraph Lab&lt;/a&gt; to compose workflows across MCP servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;MCP is a doorway to useful agents. It also makes mistakes expensive. If you want to ship responsibly, focus on runtime guardrails: shrink the tool surface, keep context clean, and log everything.&lt;/p&gt;

&lt;p&gt;If you want to explore a graph-based context layer for MCP, &lt;a href="https://memgraph.com/blog/mcp-client-memgraph-lab-interoperability" rel="noopener noreferrer"&gt;start here&lt;/a&gt;. And remember, tool access is part of your attack surface, so review it alongside your production code.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>agents</category>
      <category>graphrag</category>
      <category>accesscontrols</category>
    </item>
    <item>
      <title>Innovation Graph Analytics Powered by Embeddings and LLM’s</title>
      <dc:creator>André Vermeij</dc:creator>
      <pubDate>Wed, 27 Nov 2024 10:52:44 +0000</pubDate>
      <link>https://dev.to/memgraph/innovation-graph-analytics-powered-by-embeddings-and-llms-4hb9</link>
      <guid>https://dev.to/memgraph/innovation-graph-analytics-powered-by-embeddings-and-llms-4hb9</guid>
      <description>&lt;p&gt;&lt;strong&gt;Guest Author:&lt;/strong&gt; &lt;a href="https://dev.to/andrevermeij"&gt;André Vermeij&lt;/a&gt;, Founder of Kenedict Innovation Analytics &amp;amp; Developer of Kenelyze&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro &amp;amp; Recap: Innovation Graphs
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/memgraph/innovation-as-a-graph-improved-insight-into-technology-clusters-collaboration-and-knowledge-networks-1746"&gt;first post in our series&lt;/a&gt; on &lt;strong&gt;Innovation Graphs&lt;/strong&gt; introduced the use of graphs in the analysis of innovation and its output, such as patents, scientific publications and research grants.&lt;/p&gt;

&lt;p&gt;Innovation graphs focus on mapping the connections between technologies, organisations and people and can provide new insights into the actual underpinnings of innovative activity within topics or organisations of interest. They can be constructed based on all kinds of metadata and often focus on visually mapping three complementary perspectives: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Graphs of documents to gain deeper insight into technology/topic clusters.&lt;/li&gt;
&lt;li&gt;Graphs of organisations and institutions to focus on sector-wide collaboration patterns.&lt;/li&gt;
&lt;li&gt;Graphs of people/experts to get a better understanding of team-level collaboration and key players in a field of expertise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this second post on Innovation Graphs, we’ll focus on the creation and LLM-powered analysis of the first type of graph mentioned above—graphs detailing clusters of technologies and topics within a specific sector of interest. Specifically, we’ll dive into how we can use text embeddings to construct document similarity graphs, and how we can automatically analyse the content and label the graph’s clusters using locally running Large Language Models. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa90h0xixzkux147sza9r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa90h0xixzkux147sza9r.png" alt="How to use text embeddings to construct document similarity graphs" width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Text Embeddings &amp;amp; Graph Creation
&lt;/h1&gt;

&lt;p&gt;Mapping clusters of technology and the connections between them is a key part of most innovation analytics projects. A common way to create the related document similarity graphs is to collect the unstructured text related to documents in a dataset (for example, abstracts for scientific publications or summaries of R&amp;amp;D project reports), convert the text into vectors/embeddings, and then calculate pairwise similarities to get similarity scores for each pair of documents to construct the final graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Classical Way: TF-IDF
&lt;/h2&gt;

&lt;p&gt;Converting unstructured text into ready-to-analyse vectors can be done in various ways. A classical way to approach this is to use a variant of Term Frequency-Inverse Document Frequency (&lt;a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf" rel="noopener noreferrer"&gt;TF-IDF&lt;/a&gt;). Here, all unstructured text is initially pre-processed using common techniques in Natural Language Processing (tokenization, lemmatization, stop-word removal, etc.), after which each token in a document is assigned a TF-IDF score. This score is based on how often the token appears in the document itself (TF) and on the inverse of how often it appears across all documents in the dataset (IDF). For each document, a vector with a length equalling the total number of unique tokens across all documents is then created, holding the TF-IDF scores for all tokens in the document and zeroes for any tokens that do not occur in the document.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcajie7ezbmsnmy6c33iu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcajie7ezbmsnmy6c33iu.png" alt="Simplified version of TF-IDF vector construction" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;
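
&lt;p&gt;The vector construction described above can be sketched in a few lines of plain Python. This is a simplified illustration that assumes the documents are already tokenized and pre-processed; it uses raw term counts for TF and &lt;code&gt;log(N / df)&lt;/code&gt; for IDF, whereas libraries often apply smoothed or normalised variants. The function name and the toy documents are made up for the example.&lt;/p&gt;

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Build one TF-IDF vector per document over the shared vocabulary.

    Uses raw term counts for TF and log(N / df) for IDF; libraries often
    apply smoothed or normalised variants, so exact scores will differ.
    """
    vocabulary = sorted({token for doc in tokenized_docs for token in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: the number of documents containing each token.
    doc_freq = Counter(token for doc in tokenized_docs for token in set(doc))
    idf = {token: math.log(n_docs / doc_freq[token]) for token in vocabulary}

    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        # Tokens absent from the document get a zero score.
        vectors.append([counts[token] * idf[token] for token in vocabulary])
    return vocabulary, vectors


# Toy, already pre-processed documents (illustrative data only).
docs = [
    ["battery", "cell", "battery"],
    ["battery", "sensor"],
    ["sensor", "lidar"],
]
vocabulary, vectors = tfidf_vectors(docs)
```

&lt;p&gt;Each vector has one element per unique token in the corpus, which is why real TF-IDF vectors quickly grow to thousands of mostly zero elements.&lt;/p&gt;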

&lt;p&gt;Although this is a pretty intuitive way of converting text into vectors, it comes with several challenges. The main drawback is that semantic similarity is mostly overlooked in this approach, since the scores are simply based on term counts within and across documents. Also, the vectors resulting from TF-IDF are generally very sparse and can easily consist of thousands of elements per vector, depending on the size of the overall text corpus.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Modern Way: Embedding Models
&lt;/h2&gt;

&lt;p&gt;The rise of Large Language Models and Generative Artificial Intelligence has also resulted in the availability of a wide variety of embedding models and APIs to convert unstructured text into fixed-length vectors. For example, &lt;a href="https://docs.nomic.ai/reference/api/embed-text-v-1-embedding-text-post" rel="noopener noreferrer"&gt;Nomic&lt;/a&gt;, &lt;a href="https://www.mixedbread.ai/docs/embeddings/overview" rel="noopener noreferrer"&gt;Mixedbread&lt;/a&gt;, &lt;a href="https://jina.ai/embeddings/" rel="noopener noreferrer"&gt;Jina&lt;/a&gt;, and &lt;a href="https://platform.openai.com/docs/guides/embeddings" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; all offer APIs to get embeddings based on input of unstructured text of your choice. Some key use cases for these embedding models are query and document embedding for Retrieval Augmented Generation, but they also serve as an excellent basis for the large-scale embedding of datasets to create document similarity graphs.&lt;/p&gt;

&lt;p&gt;The main benefits of these embedding models are that they also consider semantic similarity between concepts and that they produce dense vectors of a fixed length (commonly 768 or 1024 elements, a number referred to as the dimensionality). A challenge is that users need to carefully pick the parameters when using these models, since these can significantly impact the overall outcome when converting the vectors into document similarity graphs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph Creation Based on Embeddings
&lt;/h2&gt;

&lt;p&gt;We can construct a document similarity graph based on all pairwise similarities between the document vectors as soon as embeddings are generated for all documents in our dataset. The nodes in the graph are simply the original documents from our dataset, with weighted links drawn between nodes when they have a certain degree of similarity. A commonly used similarity metric is &lt;a href="https://en.wikipedia.org/wiki/Cosine_similarity" rel="noopener noreferrer"&gt;cosine similarity&lt;/a&gt;. In general its scores range from -1 to 1, and for text embeddings they typically fall between 0 and 1, where 1 denotes vectors pointing in the same direction (as for identical texts). Links between nodes can be determined by setting a threshold similarity value.&lt;/p&gt;

&lt;p&gt;The exact value used here can have a significant impact on the readability of the graph: setting the threshold too low will often lead to a hairball/spaghetti bowl visualization (too many links between nodes), while setting it too high will show many disparate clusters with no connections between them. When constructing a graph, it is therefore important to give this some thought and also relate it to the actual size of the text fragments you are dealing with – shorter strings (titles) usually go well with higher threshold values, while longer strings (abstracts, summaries) usually combine well with lower threshold values.&lt;/p&gt;
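
&lt;p&gt;A minimal sketch of this step, assuming the embeddings are already available as plain lists of floats keyed by document id. The function names, the toy vectors, and the threshold of 0.8 are illustrative choices, not recommended values.&lt;/p&gt;

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_edges(embeddings, threshold):
    """Return weighted links (doc_a, doc_b, score) for pairs at or above the threshold."""
    edges = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            edges.append((id_a, id_b, round(score, 3)))
    return edges


# Toy 3-dimensional vectors standing in for real 768- or 1024-dimensional embeddings.
embeddings = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}
edges = similarity_edges(embeddings, threshold=0.8)
```

&lt;p&gt;Comparing every pair of documents grows quadratically with the size of the dataset, so this direct approach is best suited to small and medium-sized collections.&lt;/p&gt;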

&lt;h2&gt;
  
  
  Community Detection for Technology Cluster Identification
&lt;/h2&gt;

&lt;p&gt;As soon as the nodes and links in the graph have been constructed based on the embedding similarities and the threshold set, we can start analysing the graph of documents to uncover clusters of related content. In practice, this is a very important step to make the graph more readable and understandable. In innovation analytics, gaining insight into which technology clusters are present in a dataset and how they connect and evolve is often key to a project’s success.&lt;/p&gt;

&lt;p&gt;An excellent way to uncover these clusters is by using the Leiden community detection algorithm, now &lt;a href="https://memgraph.com/docs/advanced-algorithms/available-algorithms/leiden_community_detection" rel="noopener noreferrer"&gt;available in Memgraph&lt;/a&gt;. Based on the structure of the graph, this algorithm detects densely connected subsets of nodes and iteratively assigns them to the same communities. In the end, when colouring nodes based on the communities they are assigned, we have an excellent basis to start labelling and annotating the graph to make sense of its contents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdcxvzo5elrc7wk82pib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdcxvzo5elrc7wk82pib.png" alt="Uncoloured graph vs the same graph coloured based on communities" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  LLM-Powered Innovation Cluster Labelling
&lt;/h1&gt;

&lt;p&gt;In the analysis of technology and topic graphs, clear labelling and annotation of the resulting graph visualizations is key to helping stakeholders in an innovation analytics project gain insights. Annotated visuals are often used to provide initial high-level overviews of a graph’s contents in presentations, and often serve as a basis for further deep dives into specific clusters of interest.&lt;/p&gt;

&lt;p&gt;A classical approach to initial cluster labelling is to treat each cluster's contents as a separate corpus of documents and then run a version of TF-IDF to extract the top-5 highest-scoring tokens or phrases for each cluster. The resulting labels often provide a decent first indication of a cluster’s contents, but they do require subsequent manual analysis and editing to improve their readability.&lt;/p&gt;

&lt;p&gt;An exciting alternative way to label clusters is to use a Large Language Model to summarize cluster contents. In our case, we utilize locally running models such as &lt;a href="https://www.llama.com/llama3_1/" rel="noopener noreferrer"&gt;Llama 3.1&lt;/a&gt; in &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; or &lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt; based on the following high-level process:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcj93k8phc0x10lf1ht87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcj93k8phc0x10lf1ht87.png" alt="Alternative way to label clusters is to use a Large Language Model to summarize cluster contents" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For each detected community, we first gather relevant unstructured text from the attributes of the nodes in the cluster. In most cases, we have found that sending over collections of document titles per cluster works very well for cluster labelling. This collection of texts is then added to a prompt that specifies exactly how the LLM should respond in its summarization: based on the texts provided, return a short summary/label consisting of a maximum of 5 words with an indication of the high-level topic. Many LLMs are prone to adding a lot of introductory (“Absolutely! Here is a summary of…”) and concluding text to answers, so the prompt also specifies that the model should never do this and should focus purely on returning the labels.&lt;/p&gt;

&lt;p&gt;As soon as the LLM finishes providing the labels for all communities, we replace the nodes’ initial community attribute with the newly created label. Of course, these labels do require manual checks to see whether they make sense and sometimes require slight adjustments because they are too high-level. The quality of the labels is also dependent on the LLM itself: we’ve found that larger models such as Llama 3.1 (8B parameters) generally provide better labels than smaller models such as Llama 3.2 (3B parameters).&lt;/p&gt;
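
&lt;p&gt;The labelling loop can be sketched as follows. This is an illustration under stated assumptions, not the exact prompt or code used in practice: the prompt wording is a paraphrase of the instructions described above, and &lt;code&gt;generate&lt;/code&gt; is a hypothetical callable that sends a prompt to whichever local model you run and returns its text response. A stand-in function is used here so the sketch runs without a model.&lt;/p&gt;

```python
def build_label_prompt(titles, max_words=5):
    """Assemble a labelling prompt from the document titles of one community."""
    title_lines = "\n".join(f"- {title}" for title in titles)
    return (
        f"Based on the document titles below, return a label of at most "
        f"{max_words} words indicating the high-level topic.\n"
        "Return only the label, with no introductory or concluding text.\n\n"
        f"{title_lines}"
    )


def label_communities(titles_by_community, generate):
    """Map each community id to an LLM-generated label.

    `generate` is any callable that takes a prompt string and returns the
    model's text response, for example a thin wrapper around a local model.
    """
    labels = {}
    for community_id, titles in titles_by_community.items():
        response = generate(build_label_prompt(titles))
        # Light clean-up only; labels still need a manual check afterwards.
        labels[community_id] = response.strip().strip('"')
    return labels


def fake_generate(prompt):
    """Stand-in for a real model call so the sketch runs offline."""
    return ' "Placeholder Label" '


labels = label_communities(
    {0: ["Solid-state battery electrolytes", "Fast charging of lithium cells"]},
    generate=fake_generate,
)
```

&lt;p&gt;Keeping the model call behind a single callable makes it straightforward to swap models and compare the labels they produce for the same communities.&lt;/p&gt;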

&lt;h1&gt;
  
  
  Cluster Summarization Using LLMs
&lt;/h1&gt;

&lt;p&gt;Another valuable way to use LLMs in document similarity graph analysis is to further enhance users’ understanding of clusters by providing longer, point-and-click summaries of what the documents in a cluster are about. The approach here is similar to the LLM-based labelling described above, with the prompt sent to the model focusing on providing an overall summary consisting of 3 to 5 phrases instead.&lt;/p&gt;

&lt;p&gt;Practically speaking, users of a graph visualization select nodes of their interest using a free-form selection tool and point out which unstructured text attribute should be used for the analysis, after which the LLM returns a summary based on the collection of texts sent to it. The summary is then printed in a window right on top of the visual, as in the example below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewarnjbuago4z34okjob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewarnjbuago4z34okjob.png" alt="AI Summary of Node label" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Up Next: Step-By-Step Real-Life Examples and Visuals
&lt;/h1&gt;

&lt;p&gt;This post provided an overview of how text embeddings can be used to construct document similarity graphs for innovation analysis, and how Large Language Models can aid in the labelling and summarization of the resulting graphs. Our next post in this series will show examples of this in practice using &lt;a href="https://www.kenelyze.com/" rel="noopener noreferrer"&gt;Kenelyze&lt;/a&gt;, based on a real-life dataset of the innovation output of a major high-tech company. It will also discuss the importance of local LLMs when working with sensitive data, and highlight some technical considerations when picking and configuring a local LLM.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>technologyclusters</category>
      <category>innovationnetworkanalysis</category>
      <category>graphdatabase</category>
    </item>
    <item>
      <title>Innovation as a Graph: Improved Insight into Technology Clusters, Collaboration and Knowledge Networks</title>
      <dc:creator>André Vermeij</dc:creator>
      <pubDate>Wed, 18 Sep 2024 12:42:47 +0000</pubDate>
      <link>https://dev.to/memgraph/innovation-as-a-graph-improved-insight-into-technology-clusters-collaboration-and-knowledge-networks-1746</link>
      <guid>https://dev.to/memgraph/innovation-as-a-graph-improved-insight-into-technology-clusters-collaboration-and-knowledge-networks-1746</guid>
      <description>&lt;p&gt;&lt;strong&gt;Guest Author:&lt;/strong&gt; André Vermeij, Founder of &lt;a href="https://www.kenedict.com" rel="noopener noreferrer"&gt;Kenedict Innovation Analytics&lt;/a&gt; &amp;amp; Developer of &lt;a href="https://www.kenelyze.com" rel="noopener noreferrer"&gt;Kenelyze&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Organisations focused on innovation come in many forms, including corporations with large Research &amp;amp; Development (R&amp;amp;D) departments, universities, research institutions active in advancing science, and startups working on the potentially next big thing. Innovation-related data has become increasingly important for each of these organisations to inform decision-making and stay ahead of market developments. For example, an R&amp;amp;D-intensive corporation could use data to benchmark its own technology portfolio with its direct competitors, while a startup might be analysing data to assess previous activity and potential market entry in a sector of interest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional Innovation Analysis
&lt;/h2&gt;

&lt;p&gt;The traditional way to look at innovation-related data is to report on output within a topic or organisation of interest based on counts and sums of variables of interest. When analysing its competition, a business may for example gather information on a competitor’s recent output and report on the number of documents in each technology domain, produce a list of the companies the competitor has worked with, or generate an overview of the most active inventors or researchers in a field of interest. Although all these analyses can be valuable in their own right, they’re missing out on a key aspect of an innovation ecosystem: the connections between technologies, organisations, and people.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Graph of Innovation
&lt;/h2&gt;

&lt;p&gt;Viewing innovation and its output as a graph of interconnected data points allows us to get a much deeper understanding of the technology and knowledge structures in a context of interest. Using the metadata in a wide array of innovation-related data sources, which are discussed in more detail further on, it is possible to create graphs of connected documents, organisations and people and gain new insights into the actual underpinnings of innovative activity.&lt;/p&gt;

&lt;p&gt;For example, innovation graphs allow us to answer questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which clusters of activity can we distinguish within a topic or organisation of interest, and how has this evolved?&lt;/li&gt;
&lt;li&gt;What do the organisational collaboration networks in an area of interest look like, and who are the key players in network connectivity?&lt;/li&gt;
&lt;li&gt;How are teams of individual experts in a specific field composed, and who are the leading experts in a given topic?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open Data Sources for Innovation Analytics
&lt;/h2&gt;

&lt;p&gt;Until just a few years ago, quality innovation data was quite hard to come by without a subscription to an expensive database hosting patent information or scientific publications. Luckily, in recent years, there has been a move towards more openly available data, which can serve as an excellent basis for setting up a wide variety of innovation graphs.&lt;/p&gt;

&lt;p&gt;Here’s a quick overview of common data sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patents:&lt;/strong&gt; organisations apply for patents to protect their inventions against commercialisation by third parties. Patent applications and grants are published online by national patent offices around the world, with databases gathering data from all jurisdictions and providing a wide array of metadata. Great open data sources are the European Patent Office's &lt;a href="https://www.epo.org/en/searching-for-patents/data/web-services/ops" rel="noopener noreferrer"&gt;Open Patent Services&lt;/a&gt; (OPS) API and the EPO's search platform &lt;a href="https://worldwide.espacenet.com/" rel="noopener noreferrer"&gt;Espacenet&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scientific publications:&lt;/strong&gt; journal publications, conference proceedings, book chapters and various other types of scientific output are gathered in databases which bring together output from many sources. Paid databases such as Scopus are still used often by large organisations – great open alternatives include &lt;a href="https://openalex.org/" rel="noopener noreferrer"&gt;OpenAlex&lt;/a&gt; and &lt;a href="https://www.semanticscholar.org/" rel="noopener noreferrer"&gt;Semantic Scholar&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subsidies &amp;amp; funding programmes:&lt;/strong&gt; governmental subsidies to stimulate innovation and R&amp;amp;D in specific areas are often structured in openly available data sources. A good example is the European Union’s &lt;a href="https://data.europa.eu/data/datasets/cordis-eu-research-projects-under-horizon-europe-2021-2027?locale=en" rel="noopener noreferrer"&gt;CORDIS data&lt;/a&gt; for the Horizon Europe programme. Many national enterprise agencies also publish their granted subsidies and projects online.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal data:&lt;/strong&gt; the above data sources are often augmented with internal, unpublished data (e.g., internal project reports, unfiled patent applications, scientific output in the review stage) to get a view on very recent activity within an organisation. This is especially valuable when creating knowledge graphs within organisations or carrying out an innovation portfolio analysis for a specific client.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a typical Innovation Analytics project, combining data from several of the above data sources is often key to gaining the best insights. For example, organisations applying for patents often also have scientific output related to the same theme and may also apply for governmental funding. To get a picture of innovative activity that is as complete as possible, it is therefore important to look at activity from multiple data sources and graph perspectives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graphs of Documents: Insight into Technology and Knowledge Clusters
&lt;/h2&gt;

&lt;p&gt;The analysis and visualisation of innovation graphs often starts with looking at the relationships between documents based on a shared characteristic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtvqe8014t8q1bsy0qbl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtvqe8014t8q1bsy0qbl.png" alt="Graphs of Documents: Text Similarity" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Depending on the goals of the analysis, there are various ways to link documents together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text similarity:&lt;/strong&gt; unstructured text data in the form of document titles, abstracts and summaries can be used to connect documents when there is a high similarity between their contents. This relies on vectorisation of the text of interest and subsequent calculation of pairwise cosine similarities, where a link is then drawn between documents based on a minimum similarity score.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge flows / shared authors:&lt;/strong&gt; another way to generate clusters of connected documents is to link them when the same people have worked on them. The authorship data on documents can be used to accomplish this. The key assumption here is that documents are part of the same “knowledge cluster” when persons with specific expertise have (co-)written them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citations:&lt;/strong&gt; numerous citations to other documents can be found in both scientific publications and patent applications. We can use these citations to create various types of graphs:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shared references:&lt;/strong&gt; connect documents when they cite the same sources, often with a minimum number of shared citations set as the weight for the links. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared citing documents:&lt;/strong&gt; connect documents when they have been cited by the same other documents, again often with a minimum weight set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct citations:&lt;/strong&gt; creation of citation graphs where links are drawn between documents when they cite each other.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Technology classifications:&lt;/strong&gt; patent documents are categorised using classification codes designating the technology areas which they fall into. These can be used to connect documents when they share one or multiple codes, essentially creating clusters of documents based on technological overlap.&lt;/li&gt;

&lt;/ul&gt;
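
&lt;p&gt;To make one of these linking approaches concrete, here is a minimal sketch of the shared references variant. It assumes each document is already mapped to the identifiers of the sources it cites; the function name, the sample identifiers, and the minimum of two shared references are illustrative.&lt;/p&gt;

```python
from itertools import combinations


def shared_reference_edges(references_by_doc, min_shared=2):
    """Link documents citing at least `min_shared` of the same sources.

    The link weight is the number of shared references.
    """
    edges = []
    for (doc_a, refs_a), (doc_b, refs_b) in combinations(references_by_doc.items(), 2):
        shared = len(set(refs_a).intersection(refs_b))
        if shared >= min_shared:
            edges.append((doc_a, doc_b, shared))
    return edges


# Illustrative data: each document mapped to the identifiers of the sources it cites.
references_by_doc = {
    "paper_a": {"ref1", "ref2", "ref3"},
    "paper_b": {"ref2", "ref3", "ref4"},
    "paper_c": {"ref4", "ref5"},
}
edges = shared_reference_edges(references_by_doc, min_shared=2)
```

&lt;p&gt;The shared citing documents variant follows the same pattern: map each document to the set of documents that cite it and count the overlap instead.&lt;/p&gt;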

&lt;p&gt;The following graph is an example of a text similarity approach, where scientific publications in the area of autonomous vehicles are connected when they share significant textual content. Colors depict clusters of activity based on the outcomes of a community detection algorithm, and nodes are sized based on the number of times they were cited by other papers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sfmbvz220qcnd4hgd5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sfmbvz220qcnd4hgd5i.png" alt="A graph showing scientific publications on autonomous vehicles, where node size reflects citation count, edges show text similarity, and colors represent clusters from a community detection algorithm." width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Graph of scientific publications linked based on text similarity approach&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Graphs of Organisations: Insight into Collaboration Ecosystems
&lt;/h2&gt;

&lt;p&gt;Another graph perspective, which is very common in innovation analysis, focuses on mapping the connections between organisations (businesses, universities, research institutions, public bodies, hospitals, etc.). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6940g5nx6nvjqimyuwvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6940g5nx6nvjqimyuwvq.png" alt="Graphs of Organisations: Mapping the connections between organisations" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Many of the data sources above hold extensive metadata on the organisations responsible for the documents—scientific authors are affiliated with their employers, patents are applied for by the parties seeking protection of their invention and governmental subsidies are often received by consortia of collaborating organisations.&lt;/p&gt;

&lt;p&gt;It is common to attach weights to the links based on the number of collaborations between two organisations. Using these weights, it is then possible to filter the graph to focus only on the strongest / most frequently occurring collaborations.&lt;/p&gt;
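
&lt;p&gt;As a minimal sketch of the weighting and filtering described above, the Python snippet below counts how often two organisations appear on the same document and keeps only the links at or above a chosen weight. The sample documents, the organisation names, and the &lt;code&gt;min_weight&lt;/code&gt; parameter are illustrative assumptions, not data from the analysis:&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each document lists the organisations involved in it.
documents = [
    {"Uni A", "Hospital B", "Company C"},
    {"Uni A", "Hospital B"},
    {"Uni A", "Company C"},
    {"Hospital B", "Institute D"},
]


def collaboration_weights(docs):
    """Count, for every pair of organisations, the documents they share."""
    weights = Counter()
    for orgs in docs:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(orgs), 2):
            weights[pair] += 1
    return weights


def strongest_links(weights, min_weight):
    """Keep only links with at least `min_weight` collaborations."""
    return {pair: w for pair, w in weights.items() if w >= min_weight}


weights = collaboration_weights(documents)
print(strongest_links(weights, min_weight=2))
```

&lt;p&gt;With this sample input, only the two pairs that collaborated twice survive the filter; raising or lowering the threshold trades detail for readability of the resulting graph.&lt;/p&gt;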

&lt;p&gt;The graph below shows an example of collaboration in radiotherapy innovation, where colors are based on the type of organisation (e.g. blue = universities, green = hospital and medical centers) and node sizes based on their betweenness centrality scores:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fm8tokb4wtgd5asjo8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fm8tokb4wtgd5asjo8a.png" alt="Collaboration in radiotherapy" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2: Collaboration in radiotherapy&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Graphs of People: Insight into Expertise and Knowledge Networks
&lt;/h2&gt;

&lt;p&gt;This is a graph perspective that often follows after mapping organizational collaboration networks, focusing on the actual person-to-person collaborations taking place to produce the analysed output. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firwev36fljk3eoyb7hcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firwev36fljk3eoyb7hcn.png" alt="Graphs of People: Focused on the actual person-to-person collaborations taking place to produce the analysed output. " width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using the author/inventor metadata on documents, we draw links between people when they have co-authored a document. Similar to the organisational networks, we can also attach weights to the links, corresponding to the number of documents that two authors have worked on jointly. This perspective can provide a deep understanding of the actual team structures and knowledge networks within and outside of organisations.&lt;/p&gt;

&lt;p&gt;Here’s an example of the (relatively large!) network of inventors who have worked on Apple patents. Nodes are sized based on their betweenness centralities, and colors are based on clusters detected by a community detection algorithm:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n98a3uxox1ph32b33ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n98a3uxox1ph32b33ss.png" alt="Apple's inventor network" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3: Apple's inventor network&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph Metrics &amp;amp; Innovation Insights
&lt;/h2&gt;

&lt;p&gt;The above examples show various ways to convert innovation data into actionable graph visualisations. In the actual analysis and interpretation of these graphs, it is important to make good use of the many metrics available in graph analytics. These metrics can help us understand which clusters are present in a network, and can aid in determining the importance of nodes based on centrality measures.&lt;/p&gt;

&lt;p&gt;The following metrics are valuable for analysing the overall graph structure in innovation analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Component analysis:&lt;/strong&gt; determining the components (interconnected subsets of nodes) in the graph to be able to see how far the graph is interconnected (how many nodes can reach each other directly or indirectly) and to determine the impact of the largest connected components versus smaller components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;K-Cores:&lt;/strong&gt; to determine highly connected subsets of nodes in graphs, k-cores can be used to highlight subgraphs in which every node has a degree of at least k within that subgraph. This can be used to focus quickly on densely connected groups of nodes (a looser notion than a clique, in which every node is linked to every other) and is especially valuable when analysing collaboration and knowledge networks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community detection:&lt;/strong&gt; using an algorithm such as the Leiden community detection algorithm to determine which clusters we can distinguish within the components. These clusters then serve as the basis for graph annotation, where clusters are labeled based on their actual contents (see the labels in the autonomous driving graph above).&lt;/li&gt;
&lt;/ul&gt;
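
&lt;p&gt;To make the first two metrics more concrete, here is a small, dependency-free Python sketch of component analysis and k-core extraction on an undirected graph stored as an adjacency dictionary. The sample graph and function names are assumptions made for illustration, self-loops are assumed to be absent, and community detection (for example Leiden) is left out because it is considerably more involved:&lt;/p&gt;

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        components.append(component)
    return components


def k_core(adj, k):
    """Return the nodes of the k-core by repeatedly dropping nodes with degree below k."""
    remaining = {node: set(neighbours) for node, neighbours in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if k > len(remaining[node]):
                # Removing a node lowers the degree of its neighbours.
                for neighbour in remaining[node]:
                    remaining[neighbour].discard(node)
                del remaining[node]
                changed = True
    return set(remaining)


# Hypothetical collaboration graph: a tight group A-D, a loosely attached E,
# and a separate pair F-G.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
    "F": {"G"},
    "G": {"F"},
}
print(connected_components(graph))
print(k_core(graph, 3))
```

&lt;p&gt;In this sample, the graph splits into two components, and the 3-core keeps only the tightly connected group A, B, C, and D.&lt;/p&gt;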

&lt;p&gt;On the individual node level, degree and betweenness centrality measures can be used to determine the importance of nodes in innovation graphs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Degree Centrality:&lt;/strong&gt; determining simple connection counts per node to quickly see which actors are most important in terms of the number of other nodes they are connected to. Since most innovation graphs are weighted (links have weights associated with them), weighted degree centrality is also used regularly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Betweenness Centrality:&lt;/strong&gt; a frequently used metric to determine who holds key hub positions in a graph: which organizations/people are the “key connectors” between clusters/teams? It is calculated by determining how often each node appears on the shortest paths between all other pairs of nodes in the network.&lt;/li&gt;
&lt;/ul&gt;
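
&lt;p&gt;The two node-level measures can be sketched in plain Python as follows. The betweenness function follows Brandes' algorithm for undirected, unweighted graphs and returns unnormalised scores; the sample graph (two triangles joined through a bridge node &lt;code&gt;X&lt;/code&gt;) and all names are assumptions chosen to show how a node with few links can still be the key connector:&lt;/p&gt;

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbours per node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adj}
        paths = {node: 0 for node in adj}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adj[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes and accumulate dependencies.
        dependency = {node: 0.0 for node in adj}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical network: teams {A, B, C} and {D, E, F} linked only through X.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "X"},
    "X": {"C", "D"},
    "D": {"X", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}
print(degree_centrality(graph))
print(betweenness_centrality(graph))
```

&lt;p&gt;Here &lt;code&gt;X&lt;/code&gt; has only two links, fewer than &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;D&lt;/code&gt;, yet it has the highest betweenness because every shortest path between the two teams runs through it.&lt;/p&gt;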

&lt;h2&gt;
  
  
  Up Next: Use Cases
&lt;/h2&gt;

&lt;p&gt;Now that you have an initial idea of the main ideas behind innovation graphs, we will showcase practical use cases, real-world client examples and common challenges in innovation graph analysis in the next blog post. Stay tuned! &lt;/p&gt;

</description>
      <category>technologyclusters</category>
      <category>innovationnetworkanalysis</category>
      <category>knowledgegraph</category>
      <category>graphdatabase</category>
    </item>
    <item>
      <title>In-memory vs. disk-based databases: Why do you need a larger than memory architecture?</title>
      <dc:creator>Memgraph</dc:creator>
      <pubDate>Tue, 05 Sep 2023 14:41:54 +0000</pubDate>
      <link>https://dev.to/memgraph/in-memory-vs-disk-based-databases-why-do-you-need-a-larger-than-memory-architecture-37p</link>
      <guid>https://dev.to/memgraph/in-memory-vs-disk-based-databases-why-do-you-need-a-larger-than-memory-architecture-37p</guid>
      <description>&lt;p&gt;Memgraph is an in-memory graph database that recently added support for working with data that cannot fit into memory. This allows users with smaller budgets to still load large graphs to Memgraph without paying for (more) expensive RAM. However, expanding the main-memory graph database to support disk storage is, by all means, a complex engineering endeavor. Let’s break this process down into pieces.&lt;/p&gt;

&lt;h2&gt;
  
  
  On-disk databases
&lt;/h2&gt;

&lt;p&gt;Disk-based databases have been, for a long time, a de facto standard in the database development world. Their huge advantage lies in their ability to store a vast amount of data relatively cheaply on disk. However, the development can be very complex due to the interaction with low-level OS primitives. Fetching data from disk is something that everyone strives to avoid since it takes far longer than reading it from main memory, often by several orders of magnitude depending on the storage device. Neo4j is an example of an on-disk graph database that uses disk as its main storage medium while trying to cache as much data as possible in main memory so it can be reused afterward. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliwfmi74iomv74hf3bni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliwfmi74iomv74hf3bni.png" alt="disk oriented dbms" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  In-memory databases
&lt;/h2&gt;

&lt;p&gt;In-memory databases avoid the fundamental cost of accessing data from disk by simply storing all their data in the main memory. Such an architecture also significantly simplifies the development of the storage part of the database since there is no need for a buffer pool. However, the biggest issue with in-memory databases arises when the data cannot fit into random access memory, since the only way out is to move the data to a larger and, consequently, more expensive machine. &lt;/p&gt;

&lt;p&gt;In-memory database users rely on the fact that durability is still ensured through mechanisms like transaction logging and snapshots, so that data loss does not occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Larger-than-memory architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Main memory computation
&lt;/h3&gt;

&lt;p&gt;Larger-than-memory architecture describes a database architecture in which the majority of computations still happen within the main memory, but the database can also store a vast amount of data on disk without the complexity of interacting with buffer pools. &lt;/p&gt;

&lt;h3&gt;
  
  
  Identify hot &amp;amp; cold data
&lt;/h3&gt;

&lt;p&gt;The larger-than-memory architecture utilizes the fact that there are always hot and cold parts of the database in terms of how often they are accessed. The goal is then to identify cold data and move it to disk so that transactions still have fast access to hot data. Cold data identification can be done either by directly tracking transactions’ access patterns (online) or by offline computation in which a background thread analyzes data.&lt;/p&gt;

&lt;p&gt;The second very important feature of the larger-than-memory architecture is the process of evicting cold data. This can be done in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DB tracks the memory usage and starts evicting data as soon as it reaches a predefined threshold.&lt;/li&gt;
&lt;li&gt;DB evicts data only when memory is needed for new data. &lt;/li&gt;
&lt;/ol&gt;
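
&lt;p&gt;The first eviction strategy can be illustrated with a toy Python class. It is only a sketch under stated assumptions: cold data is identified online with a least-recently-used order, the threshold is a hypothetical item count (&lt;code&gt;max_hot_items&lt;/code&gt;) rather than a byte budget, and a plain dictionary stands in for disk. It does not reflect how Memgraph implements eviction:&lt;/p&gt;

```python
from collections import OrderedDict


class HotColdStore:
    """Keeps recently used records in memory and evicts the rest to 'disk'."""

    def __init__(self, max_hot_items):
        self.max_hot_items = max_hot_items  # assumed, item-based eviction threshold
        self.hot = OrderedDict()            # in-memory records, least recently used first
        self.cold = {}                      # stand-in for on-disk storage

    def put(self, key, value):
        self.cold.pop(key, None)  # avoid keeping a stale copy on "disk"
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key not in self.hot:
            # Cold read: bring the record back into memory.
            self.hot[key] = self.cold.pop(key)
        self.hot.move_to_end(key)  # mark as most recently used
        value = self.hot[key]
        self._evict_if_needed()
        return value

    def _evict_if_needed(self):
        # Evict the coldest records once the threshold is exceeded.
        while len(self.hot) > self.max_hot_items:
            cold_key, cold_value = self.hot.popitem(last=False)
            self.cold[cold_key] = cold_value


store = HotColdStore(max_hot_items=2)
store.put("a", 1)
store.put("b", 2)
store.get("a")      # "a" becomes the most recently used record
store.put("c", 3)   # exceeds the threshold, so "b" is evicted
print(list(store.hot), list(store.cold))
```

&lt;p&gt;The second strategy would differ only in when &lt;code&gt;_evict_if_needed&lt;/code&gt; runs: instead of checking after every access, the store would evict lazily at the moment it needs room for new data.&lt;/p&gt;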

&lt;h3&gt;
  
  
  Transaction management
&lt;/h3&gt;

&lt;p&gt;Different systems also behave differently regarding transaction management. If the transaction needs data that is currently stored on the disk, it can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Abort the transaction, fetch data stored on the disk, and restart the transaction.&lt;/li&gt;
&lt;li&gt;Stall the transaction by synchronously fetching data from the disk.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Transaction must fit into memory
&lt;/h3&gt;

&lt;p&gt;The question is, what happens when the transaction data cannot fit into random access memory? In Memgraph, we decided to start with an approach in which all transaction data must fit into memory. This means that some analytical queries cannot be executed on a large dataset, but this is the tradeoff we were willing to accept in the first iteration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjms4r0m897xd4b1u0063.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjms4r0m897xd4b1u0063.png" alt="memory dbms" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of larger-than-memory databases
&lt;/h2&gt;

&lt;p&gt;Memgraph uses &lt;a href="https://rocksdb.org/" rel="noopener noreferrer"&gt;RocksDB&lt;/a&gt; as a &lt;a href="https://memgraph.com/blog/what-is-a-key-value-database" rel="noopener noreferrer"&gt;key-value store&lt;/a&gt; for extending the capabilities of the in-memory database. Not to go into too many details about RocksDB, but let’s just briefly mention that it is based on a data structure called &lt;a href="https://www.cs.umb.edu/~poneil/lsmtree.pdf" rel="noopener noreferrer"&gt;Log-Structured Merge-Tree&lt;/a&gt; (LSMT) (instead of B-Trees, typically the default option in databases), which are saved on disk and because of the design come with a much smaller &lt;a href="https://smalldatum.blogspot.com/2019/05/crum-conjecture-read-write-space-and.html" rel="noopener noreferrer"&gt;write amplification&lt;/a&gt; than B-Trees. &lt;/p&gt;

&lt;p&gt;The in-memory version of Memgraph uses Delta storage to support multi-version concurrency control (MVCC). However, for larger-than-memory storage, we decided to use the Optimistic Concurrency Control Protocol (OCC) since we assumed conflicts would rarely happen, and we could make use of &lt;a href="https://github.com/facebook/rocksdb/wiki/Transactions" rel="noopener noreferrer"&gt;RocksDB’s transactions&lt;/a&gt; without dealing with the custom layer of complexity like in the case of Delta storage. &lt;/p&gt;

&lt;p&gt;We’ve implemented OCC in a way that every transaction has its own private workspace, so potential conflicts are detected at the commit time. One of our primary requirements before starting to add disk-based data storage was not to ruin the performance of the main memory-based storage. Although we all knew there was no such thing as &lt;a href="https://www.youtube.com/watch?v=rHIkrotSwcc" rel="noopener noreferrer"&gt;zero-cost abstraction&lt;/a&gt;, we managed to stay within 10% of the original version. We decided to use &lt;a href="https://memgraph.com/blog/acid-transactions-meaning-of-isolation-levels" rel="noopener noreferrer"&gt;snapshot isolation&lt;/a&gt; as an appropriate concurrency isolation level since we believed it could be the default option for the large majority of Memgraph users.&lt;/p&gt;
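
&lt;p&gt;The idea of a private workspace with conflict detection at commit time can be shown with a generic, single-threaded Python sketch. All class and method names are hypothetical, and this is neither Memgraph's nor RocksDB's implementation. For simplicity it validates every key a transaction has read or written against a per-key version counter, and it makes no attempt to model snapshot isolation:&lt;/p&gt;

```python
class ConflictError(Exception):
    """Raised when a transaction fails validation at commit time."""


class Store:
    def __init__(self):
        self.data = {}      # key -> committed value
        self.versions = {}  # key -> number of committed writes

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.seen_versions = {}  # versions observed by this transaction
        self.writes = {}         # private workspace, invisible to others

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.seen_versions.setdefault(key, self.store.versions.get(key, 0))
        return self.store.data.get(key)

    def write(self, key, value):
        # Remember the version the write is based on, then buffer it privately.
        self.seen_versions.setdefault(key, self.store.versions.get(key, 0))
        self.writes[key] = value

    def commit(self):
        # Validation: everything we touched must be unchanged since we saw it.
        for key, seen in self.seen_versions.items():
            if self.store.versions.get(key, 0) != seen:
                raise ConflictError(f"{key!r} was modified by another transaction")
        # No conflict: publish the private workspace.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1


store = Store()
t1, t2 = store.begin(), store.begin()
t1.write("x", 1)
t2.write("x", 2)
t1.commit()  # first committer wins
try:
    t2.commit()
except ConflictError as error:
    print("conflict:", error)
```

&lt;p&gt;Neither transaction blocks the other while it works; the cost of a conflict is paid only by the transaction that commits second, which is why the approach suits workloads where conflicts are rare.&lt;/p&gt;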

&lt;h2&gt;
  
  
  Disadvantages of larger-than-memory databases
&lt;/h2&gt;

&lt;p&gt;As always, not everything is sunshine and flowers, especially when introducing such a significant feature to an existing database, so there are still improvements to be made. First, the requirement that a single transaction must fit into memory makes it impossible to use large analytical queries. &lt;/p&gt;

&lt;p&gt;It also makes our &lt;a href="https://memgraph.com/docs/memgraph/import-data/load-csv-clause/" rel="noopener noreferrer"&gt;LOAD CSV&lt;/a&gt; command for importing CSV files practically unusable since the command is executed as a single transaction. Although RocksDB is really good, fits really well into our codebase, and has proved to be very efficient in its caching mechanisms, maintaining an external library is always hard.&lt;/p&gt;

&lt;h2&gt;
  
  
  In retrospect
&lt;/h2&gt;

&lt;p&gt;Despite the significant engineering endeavor, the larger-than-memory architecture is a super valuable asset to Memgraph users since it allows them to store large amounts of data cheaply on disk without sacrificing the performance of in-memory computation. We are actively working on resolving issues introduced with the new storage mode, so feel free to &lt;a href="https://discord.com/invite/memgraph" rel="noopener noreferrer"&gt;ask&lt;/a&gt;, &lt;a href="https://github.com/memgraph/memgraph/issues" rel="noopener noreferrer"&gt;open an issue&lt;/a&gt;, or &lt;a href="https://github.com/memgraph/memgraph/pulls" rel="noopener noreferrer"&gt;open a pull request&lt;/a&gt;. We will be more than happy to help. Until next time 🫡 &lt;/p&gt;

</description>
      <category>database</category>
      <category>memgraph</category>
      <category>development</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Exciting News: LangChain Now Supports Memgraph!</title>
      <dc:creator>Memgraph</dc:creator>
      <pubDate>Fri, 25 Aug 2023 07:06:32 +0000</pubDate>
      <link>https://dev.to/memgraph/exciting-news-langchain-now-supports-memgraph-4mc8</link>
      <guid>https://dev.to/memgraph/exciting-news-langchain-now-supports-memgraph-4mc8</guid>
      <description>&lt;p&gt;We're thrilled to announce a powerful integration between LangChain and Memgraph, bringing you an unparalleled natural language interface to your Memgraph database. Say goodbye to complex queries and welcome a seamless and intuitive way to interact with your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memgraph QA chain tutorial
&lt;/h2&gt;

&lt;p&gt;If you've ever wanted to effortlessly query your Memgraph database using natural language, this tutorial is for you. This step-by-step guide will walk you through the process, ensuring you have all the tools you need to get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you dive in, make sure you have Docker and Python 3.x installed on your system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Launch a Memgraph Instance&lt;/strong&gt;: With a few simple commands, you can have your Memgraph instance up and running using Docker. Just &lt;a href="https://python.langchain.com/docs/use_cases/more/graph/graph_memgraph_qa" rel="noopener noreferrer"&gt;follow our script&lt;/a&gt; to set it up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install dependencies&lt;/strong&gt;: We've got you covered with the required packages. Use pip to install &lt;code&gt;langchain&lt;/code&gt;, &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;neo4j&lt;/code&gt;, and &lt;code&gt;gqlalchemy&lt;/code&gt;. Don't forget the &lt;code&gt;--user&lt;/code&gt; flag to ensure smooth permissions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code playtime&lt;/strong&gt;: Whether you prefer working within this notebook or want to use a separate Python file, the tutorial offers code snippets to guide you through the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's inside
&lt;/h2&gt;

&lt;p&gt;Explore the rich features and functionalities that LangChain and Memgraph offer together:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API reference&lt;/strong&gt;: We provide an overview of the key components you'll be working with, such as ChatOpenAI, GraphCypherQAChain, and MemgraphGraph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Populating the database&lt;/strong&gt;: Learn how to populate your Memgraph database effortlessly using the Cypher query language. We guide you through the process of seeding data that serves as the foundation for your work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Refresh graph schema&lt;/strong&gt;: Familiarize yourself with refreshing the graph schema, a crucial step in setting up the Memgraph-LangChain graph for Cypher queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Querying the database&lt;/strong&gt;: Discover how to interact with the OpenAI API and configure your API key. We'll show you how to utilize the GraphCypherQAChain to ask questions and receive informative responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chain modifiers&lt;/strong&gt;: Customize your chain's behavior with modifiers like return_direct, return_intermediate_steps, and top_k. Tailor the experience to your preferences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced querying&lt;/strong&gt;: Delve into advanced querying techniques and uncover tips for refining your prompts to improve query accuracy.&lt;/p&gt;

&lt;p&gt;Ready to take your data interaction to the next level? Join us in exploring the seamless synergy between LangChain and Memgraph. No more wrangling with queries – just natural language and meaningful insights. Simplify complexity, elevate your insights, and share your projects in our &lt;a href="https://discord.com/invite/memgraph" rel="noopener noreferrer"&gt;community&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>langchain</category>
      <category>memgraph</category>
      <category>python</category>
    </item>
    <item>
      <title>What is a Graph Database?</title>
      <dc:creator>Ani Ghazaryan</dc:creator>
      <pubDate>Wed, 23 Aug 2023 15:22:01 +0000</pubDate>
      <link>https://dev.to/memgraph/what-is-a-graph-database-4kl4</link>
      <guid>https://dev.to/memgraph/what-is-a-graph-database-4kl4</guid>
      <description>&lt;p&gt;While relational databases have been the go-to choice for data storage, they fall short when it comes to handling complex relationships and traversing interconnected data, which puts graph databases in a special spotlight. A graph database is a specialized database system designed to store, manage, and query highly connected data using graph theory principles. As data volumes continue to explode, companies need efficient and scalable solutions to handle the complexities of their data.&lt;/p&gt;

&lt;p&gt;Specialized database systems like graph databases offer a more natural and efficient way to model, query, and store data, leading to improved performance and better data insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding graphs
&lt;/h2&gt;

&lt;p&gt;In simple terms, at the core of graph databases lies the concept of a graph. In mathematics and computer science, a graph is a collection of nodes (also known as vertices) connected by edges. Nodes represent entities or objects, while edges depict the relationships or connections between them. This straightforward yet effective structure forms the foundation of graph databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Components of graphs
&lt;/h3&gt;

&lt;p&gt;It's your time to shine! Let's reiterate what we've learned so far. Graphs consist of two fundamental components: nodes and edges.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Nodes represent entities or objects and can have various attributes associated with them.&lt;/li&gt;
&lt;li&gt;Edges, on the other hand, depict the relationships or connections between nodes and can also carry properties.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Together, nodes and edges create a rich network of connected data.&lt;/p&gt;
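
&lt;p&gt;As a rough illustration of these two components, the Python sketch below models nodes and edges that both carry properties. The class names, fields, and sample data are assumptions made for this example and do not mirror any particular database's internal storage:&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    rel_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        self.edges.append(edge)

    def neighbours(self, node_id):
        """Nodes reachable over one outgoing edge."""
        return [self.nodes[e.target] for e in self.edges if e.source == node_id]


graph = Graph()
graph.add_node(Node(1, {"Person"}, {"name": "Ana"}))
graph.add_node(Node(2, {"Company"}, {"name": "Acme"}))
graph.add_edge(Edge(1, 2, "WORKS_AT", {"since": 2021}))
print([n.properties["name"] for n in graph.neighbours(1)])  # ['Acme']
```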

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Fnodes-and-edges.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Fnodes-and-edges.png" alt="nodes and edges"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Graph theory basics
&lt;/h3&gt;

&lt;p&gt;Another common term you may hear in connection with graphs or graph databases is graph theory, a branch of mathematics that provides the theoretical underpinning for understanding and analyzing graphs. It defines vertices as the fundamental building blocks of a graph and edges as the connections between vertices. Relationships in a graph can be represented by directed or undirected edges, capturing the nature and direction of the connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Relational databases vs. graph databases
&lt;/h3&gt;

&lt;p&gt;Opinions split when it comes to &lt;a href="https://memgraph.com/blog/how-to-choose-a-database-for-your-needs" rel="noopener noreferrer"&gt;choosing a database&lt;/a&gt;, and the debate around &lt;a href="https://memgraph.com/blog/graph-database-vs-relational-database" rel="noopener noreferrer"&gt;relational vs. graph databases&lt;/a&gt; is still hot. Relational databases have long been the dominant database model, organizing data into structured tables with predefined schemas. They excel in handling structured data and transactions but face challenges when dealing with complex relationships and traversing connected data. This is largely due to their rigid tabular structure.&lt;/p&gt;

&lt;p&gt;Joining multiple tables and navigating through numerous relationships can lead to performance bottlenecks and complex query formulations. This limits their effectiveness in scenarios where relationships play a crucial role.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2FGraph%2520vs%2520Relational%2520DB.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2FGraph%2520vs%2520Relational%2520DB.png" alt="relational vs graph databases"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Graph databases excel in modeling and querying relationships. They store connections explicitly, allowing for efficient traversals between nodes and enabling complex relationship queries with ease. And of course, graph databases provide flexibility, scalability, and performance advantages over a relational database when it comes to handling interconnected data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Characteristics of graph databases
&lt;/h2&gt;

&lt;p&gt;So far, you've been introduced to a few qualities that are typical of graphs, so let's put the learnings into structure and build on what you've grasped.&lt;/p&gt;

&lt;h3&gt;
  
  
  Schema-less nature
&lt;/h3&gt;

&lt;p&gt;Unlike relational databases, graph databases are schema-less, meaning they do not require a predefined structure or schema for data. This flexibility allows for the dynamic addition of new node types, properties, and relationships, making graph databases highly adaptable to evolving data models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Native graph processing
&lt;/h3&gt;

&lt;p&gt;Graph databases are purpose-built for processing graph data. They employ optimized algorithms and data structures to efficiently traverse and manipulate the graph structure, resulting in faster query response times and improved performance compared to non-native graph databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Graph traversal and pattern matching
&lt;/h3&gt;

&lt;p&gt;One of the key strengths of graph databases is their ability to traverse and explore relationships between nodes. Graph traversal algorithms can efficiently navigate the graph to discover patterns, uncover hidden connections, and retrieve data based on specific criteria. This capability is particularly valuable in applications such as recommendation engines, fraud detection, and knowledge graphs, which we will explore in the sections to come.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Fgraph-traversal-.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Fgraph-traversal-.png" alt="graph traversal"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases of graph databases
&lt;/h2&gt;

&lt;p&gt;Unlike a traditional relational database that relies on tabular data, a graph database utilizes a flexible and intuitive data model, allowing for the representation of intricate relationships between entities. With its ability to efficiently capture and traverse vast networks of data, a graph database has emerged as an advanced tool for diverse domains, including:&lt;/p&gt;

&lt;h3&gt;
  
  
  Social networks and recommendation engines
&lt;/h3&gt;

&lt;p&gt;Graph databases have revolutionized social networking platforms and &lt;a href="https://memgraph.com/blog/building-a-recommendation-system-using-memgraph" rel="noopener noreferrer"&gt;recommendation engines&lt;/a&gt;. They enable personalized recommendations, friend suggestions, and social network analysis by leveraging the rich network of connections between users, interests, and entities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2FSocial-recommendation-engine.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2FSocial-recommendation-engine.png" alt="social recommendation engine"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Fraud detection and network analysis
&lt;/h3&gt;

&lt;p&gt;Graphs also excel in &lt;a href="https://memgraph.com/blog/how-memgraph-helped-companyx-save-7-figures-fraud-detection" rel="noopener noreferrer"&gt;fraud detection&lt;/a&gt; and network analysis. By representing complex networks of relationships, they can identify suspicious patterns, detect fraudulent activities, and uncover hidden connections that might indicate illicit behavior, making them an invaluable tool for cybersecurity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Ffraud-detection-and-network-analysis.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Ffraud-detection-and-network-analysis.png" alt="fraud detection"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Knowledge graphs and semantic networks
&lt;/h3&gt;

&lt;p&gt;Last but not least, graph databases serve as a foundation for building knowledge graphs and semantic networks. By representing data as nodes and relationships, they capture the semantics and context of information, enabling sophisticated knowledge discovery, semantic search, and data integration across disparate sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Fknowledge-graphs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhat-is-a-graph-database%2Fknowledge-graphs.png" alt="knowledge graphs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sneak peek into graph algorithms
&lt;/h2&gt;

&lt;p&gt;Surely enough, &lt;a href="https://memgraph.com/white-paper/beginners-guide-to-graph-algorithms" rel="noopener noreferrer"&gt;graph algorithms&lt;/a&gt; play a crucial role in leveraging the power of graph databases and unlocking valuable insights from connected data. In this section, we provide a sneak peek into some fundamental graph algorithms that form the backbone of graph database operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breadth-First Search (BFS)&lt;/strong&gt;: Breadth-First Search is a fundamental algorithm used to explore and traverse a graph in a breadth-first manner. Starting from a given source node, BFS systematically explores all the neighboring nodes before moving deeper into the graph. This algorithm is commonly used to find the shortest path between two nodes in an unweighted graph, identify connected components, and perform level-based analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Depth-First Search (DFS)&lt;/strong&gt;: Depth-First Search is another crucial graph algorithm that explores a graph with a depth-first principle. DFS starts from a given source node and traverses as far as possible along each branch before backtracking. The algorithm is useful for identifying cycles in a graph, performing topological sorting, and searching for specific nodes or patterns.&lt;/p&gt;
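
&lt;p&gt;Cycle detection is one of the DFS uses mentioned above. A rough sketch, assuming a directed graph stored as an adjacency list and using illustrative names, could look like this:&lt;/p&gt;

```javascript
// Depth-first search that reports whether a directed graph contains a cycle.
// Node states: missing = unvisited, 'active' = on the current path, 'done' = fully explored.
function hasCycle(graph) {
  const state = new Map();
  function visit(node) {
    if (state.get(node) === 'active') return true; // back edge: cycle found
    if (state.get(node) === 'done') return false;  // already explored
    state.set(node, 'active');
    for (const neighbor of graph[node] || []) {
      if (visit(neighbor)) return true;
    }
    state.set(node, 'done'); // backtrack once every branch is exhausted
    return false;
  }
  return Object.keys(graph).some(visit);
}

console.log(hasCycle({ A: ['B'], B: ['C'], C: ['A'] })); // true
console.log(hasCycle({ A: ['B', 'C'], B: ['C'], C: [] })); // false
```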

&lt;p&gt;&lt;strong&gt;PageRank algorithm&lt;/strong&gt;: Developed by Google's founders, &lt;a href="https://memgraph.com/blog/pagerank-algorithm-for-graph-databases" rel="noopener noreferrer"&gt;PageRank&lt;/a&gt; is a graph algorithm used to measure the importance or relevance of nodes in a graph, particularly in web graphs. PageRank assigns each node a numerical value based on the number and quality of incoming links, and plays a vital role in search engine ranking, recommendation systems, and social network analysis.&lt;/p&gt;
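
&lt;p&gt;To make the idea concrete, here is a simplified power-iteration sketch of PageRank. It assumes every link target also appears as a key in the adjacency list; the damping factor of 0.85 and the 50 iterations are conventional illustrative defaults, not values taken from this article or from any particular database implementation.&lt;/p&gt;

```javascript
// Simplified PageRank by power iteration over an adjacency list.
function pageRank(graph, damping = 0.85, iterations = 50) {
  const nodes = Object.keys(graph);
  const n = nodes.length;
  // Start with rank spread evenly across all nodes.
  let rank = new Map(nodes.map(function (node) { return [node, 1 / n]; }));
  for (const _ of Array(iterations).keys()) {
    // Every node receives a small baseline share on each round.
    const next = new Map(nodes.map(function (node) { return [node, (1 - damping) / n]; }));
    for (const node of nodes) {
      const links = graph[node];
      // A node without outgoing links spreads its rank evenly over all nodes.
      const targets = links.length === 0 ? nodes : links;
      const share = (damping * rank.get(node)) / targets.length;
      for (const target of targets) {
        next.set(target, next.get(target) + share);
      }
    }
    rank = next;
  }
  return rank;
}

const ranks = pageRank({ A: ['C'], B: ['C'], C: ['A'] });
console.log(ranks);
```

&lt;p&gt;In this small example, node C is linked from both A and B, so it ends up with a higher score than B, which has no incoming links.&lt;/p&gt;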

&lt;p&gt;These are just a few examples of the many graph algorithms available. Graph databases employ a wide range of algorithms to perform tasks such as &lt;a href="https://memgraph.com/docs/mage/algorithms/traditional-graph-analytics/community-detection-algorithm" rel="noopener noreferrer"&gt;community detection&lt;/a&gt;, centrality analysis, graph clustering, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sum up
&lt;/h2&gt;

&lt;p&gt;In this article, we explored the world of graph databases and their significance in modern data management. We defined graph databases and highlighted their importance in handling complex relationships and interconnected data. If you're curious and want to learn more about the fascinating world of graphs, make sure to check out our &lt;a href="https://memgraph.com/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt; and give us a shout in our &lt;a href="https://memgraph.com/community" rel="noopener noreferrer"&gt;community&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Security Analysis with JupiterOne’s Starbase and Memgraph</title>
      <dc:creator>Matea Pesic</dc:creator>
      <pubDate>Tue, 22 Aug 2023 16:36:38 +0000</pubDate>
      <link>https://dev.to/memgraph/security-analysis-with-jupiterones-starbase-and-memgraph-138f</link>
      <guid>https://dev.to/memgraph/security-analysis-with-jupiterones-starbase-and-memgraph-138f</guid>
      <description>&lt;p&gt;Starbase is an open-source graph-based security analysis tool that unifies all of JupiterOne’s integrations into one. It collects assets and relationships from services and systems, including cloud infrastructure, SaaS applications, security controls, and more, into an intuitive graph visualization. With over 115 open-source graph integrations, Starbase collaborates with your existing toolkit enabling easy and insightful cyber security analysis.&lt;/p&gt;

&lt;p&gt;In this article, we’ll dig into Starbase, guiding you through the setup of two example integrations and enabling Starbase to work with Memgraph for easy ingestion and visualization of your graph data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--p3ARwUhY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/security-analysis-with-starbase-and-memgraph/starbase-logo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--p3ARwUhY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/security-analysis-with-starbase-and-memgraph/starbase-logo.png" alt="starbase logo" width="726" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Installed &lt;a href="https://yarnpkg.com/"&gt;Yarn&lt;/a&gt; package manager.&lt;/li&gt;
&lt;li&gt;Installed &lt;a href="https://nodejs.org/en"&gt;Node.js&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A running Memgraph instance—visit Memgraph’s docs for instructions on how to &lt;a href="https://memgraph.com/docs/memgraph/installation"&gt;install&lt;/a&gt; and &lt;a href="https://memgraph.com/docs/memgraph/connect-to-memgraph"&gt;connect&lt;/a&gt; to Memgraph.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Setting up Starbase
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;To kick-start your Starbase setup, first, you need to clone the &lt;a href="https://github.com/JupiterOne/starbase"&gt;JupiterOne/Starbase&lt;/a&gt; repo into your local directory and ensure you have &lt;strong&gt;Yarn&lt;/strong&gt; and &lt;strong&gt;Node.js&lt;/strong&gt; installed.&lt;/li&gt;
&lt;li&gt;Once you’ve successfully cloned the repository and installed the prerequisites, place yourself in the terminal in the directory where you cloned the repo and run the &lt;code&gt;yarn&lt;/code&gt; command. The command installs all of the necessary project dependencies.&lt;/li&gt;
&lt;li&gt;The next step is setting up configurations for your integration of choice. You can find a list of all integrations on JupiterOne’s GitHub repo. Moving forward, we are going to explore two options for possible integration, &lt;a href="https://github.com/jupiterone/graph-zoom"&gt;Zoom&lt;/a&gt;, and &lt;a href="https://github.com/jupiterone/graph-github"&gt;GitHub&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Setting up integrations
&lt;/h2&gt;

&lt;p&gt;In order to set up an integration, you need to register an account in the system the integration targets for ingestion and obtain the necessary API credentials. Starbase leverages credentials from external services to authenticate and collect data. When Starbase is started, it reads configuration data from a single configuration file named &lt;code&gt;config.yaml&lt;/code&gt; at the root of the project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zoom integration
&lt;/h3&gt;

&lt;p&gt;In order to configure the Zoom integration, we need to create a Zoom app to retrieve the needed credentials: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;a href="https://marketplace.zoom.us/"&gt;Zoom App Marketplace&lt;/a&gt; and sign into your Zoom account.&lt;/li&gt;
&lt;li&gt;In the top right corner, go to the Develop dropdown menu and select Build App.&lt;/li&gt;
&lt;li&gt;Choose to create an &lt;strong&gt;OAuth&lt;/strong&gt; type of app.&lt;/li&gt;
&lt;li&gt;Take note of your &lt;strong&gt;Account ID&lt;/strong&gt;, &lt;strong&gt;Client ID&lt;/strong&gt;, and &lt;strong&gt;Client secret&lt;/strong&gt; which we’ll need for the configuration file later on.&lt;/li&gt;
&lt;li&gt;In the Scopes section, add &lt;code&gt;group:read:admin&lt;/code&gt;, &lt;code&gt;role:read:admin&lt;/code&gt;, &lt;code&gt;user:read:admin&lt;/code&gt;, and &lt;code&gt;account:read:admin&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After you’ve successfully created your Zoom App, open up the cloned Starbase repo in your editor of choice and create your &lt;code&gt;config.yaml&lt;/code&gt; file. This is an example of a &lt;code&gt;config.yaml&lt;/code&gt; file for the Zoom integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;integrations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;zoom&lt;/span&gt;
    &lt;span class="nx"&gt;instanceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testInstanceId&lt;/span&gt;
    &lt;span class="nx"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;integrations&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;zoom&lt;/span&gt;
    &lt;span class="nx"&gt;gitRemoteUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="c1"&gt;//github.com/JupiterOne/graph-zoom.git&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nx"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;CLIENT_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;CLIENT_SECRET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;SCOPES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;group:read:admin role:read:admin user:read:admin account:read:admin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GitHub integration
&lt;/h3&gt;

&lt;p&gt;To configure the GitHub integration, we need to create a GitHub App to retrieve the needed credentials: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://github.com/settings/apps"&gt;GitHub Apps&lt;/a&gt; and choose to create a new GitHub App.&lt;/li&gt;
&lt;li&gt;Name your app, enter a homepage URL (in this case, you can use JupiterOne’s &lt;a href="https://github.com/JupiterOne/starbase"&gt;Starbase repo URL&lt;/a&gt;), uncheck the webhook, and adjust the permissions. The following permissions need to be set to &lt;strong&gt;read-only&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Repository Permissions&lt;/em&gt;: Actions, Environments, Issues, Pull Requests, and Secrets&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Organization Permissions&lt;/em&gt;: Administration, Members, and Secrets&lt;/li&gt;
&lt;/ul&gt;
The rest of the permissions are &lt;strong&gt;No access&lt;/strong&gt; by default. Read-only access to secrets does not expose the secret values themselves; it only exposes metadata showing that the secrets exist.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Any account&lt;/strong&gt; and create your GitHub App.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After you’ve successfully created your GitHub App, open up the cloned Starbase repository in your editor of choice and create your &lt;code&gt;config.yaml&lt;/code&gt; file. Generate your private key and retrieve other needed credentials from the GitHub App you previously created. Below is an example of a &lt;code&gt;config.yaml&lt;/code&gt; file for a GitHub integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;integrations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
     &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;
     &lt;span class="nx"&gt;instanceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testInstanceId&lt;/span&gt;
     &lt;span class="nx"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;integrations&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;
     &lt;span class="nx"&gt;gitRemoteUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="c1"&gt;//github.com/JupiterOne/graph-github.git&lt;/span&gt;
     &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;GITHUB_APP_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_APP_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nx"&gt;GITHUB_APP_LOCAL_PRIVATE_KEY_PATH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{YOURPATH}/{YOURFILENAME}.private-key.pem&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="nx"&gt;INSTALLATION_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;INSTALLATION_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nx"&gt;GITHUB_API_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="c1"&gt;//api.github.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use Starbase with Memgraph
&lt;/h2&gt;

&lt;p&gt;After you’ve successfully created your &lt;code&gt;config.yaml&lt;/code&gt; file, the next step is to adjust Starbase’s queries to work with Memgraph. To do that, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the terminal, go to the folder where you cloned the Starbase repo and run the &lt;code&gt;yarn starbase setup&lt;/code&gt; command. It clones or updates all integrations listed in the &lt;code&gt;config.yaml&lt;/code&gt; file and installs the dependencies for each integration.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run your Memgraph instance. Follow the instructions from Memgraph’s docs on how to connect to Memgraph, or if you are using Docker, simply run the following command:&lt;br&gt;
&lt;code&gt;docker run -it -p 3000:3000 -p 7444:7444 -p 7687:7687 memgraph/memgraph-platform&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By modifying just a single line of code, you are ready to use Starbase with Memgraph. Inside the &lt;code&gt;neo4jGraphStore.js&lt;/code&gt; file, locate the &lt;code&gt;addEntities()&lt;/code&gt; function. To enable compatibility with Memgraph, replace the following line:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runCypherCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`CREATE INDEX index_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; IF NOT EXISTS FOR (n:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;) ON (n._key, n._integrationInstanceID);`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runCypherCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`CREATE INDEX index_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; IF NOT EXISTS FOR (n:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;) ON (n._key, n._integrationInstanceID);`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runCypherCommand&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
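
&lt;p&gt;Memgraph creates label-property indexes with the &lt;code&gt;CREATE INDEX ON :Label(property)&lt;/code&gt; form rather than the &lt;code&gt;CREATE INDEX ... IF NOT EXISTS FOR ...&lt;/code&gt; form. As a rough sketch only, and assuming that one index per property is acceptable for your setup, the statements could be built as shown here. The helper name &lt;code&gt;memgraphIndexQueries&lt;/code&gt; is hypothetical and not part of Starbase; inside &lt;code&gt;addEntities()&lt;/code&gt;, each resulting string would be passed to &lt;code&gt;this.runCypherCommand&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical helper (not part of Starbase): builds Memgraph-style index
// statements for one entity type, one label-property index per property.
function memgraphIndexQueries(entityType) {
  return ['_key', '_integrationInstanceID'].map(function (property) {
    return `CREATE INDEX ON :${entityType}(${property});`;
  });
}

// Prints the two statements that would be run for the github_user label.
console.log(memgraphIndexQueries('github_user'));
```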



&lt;ol start="4"&gt;
&lt;li&gt;You are all set to use Starbase with Memgraph. The instance listens on port 7687, as defined in the code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final step is to run the &lt;code&gt;yarn starbase run&lt;/code&gt; command. Afterward, launch your browser and go to &lt;code&gt;localhost:3000&lt;/code&gt; to access &lt;strong&gt;Memgraph Lab&lt;/strong&gt; or open your desktop version to explore and visualize your graph data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explore your dataset
&lt;/h3&gt;

&lt;p&gt;Below, we’ve provided a few query examples that demonstrate how you can dig into your dataset and extract valuable insights. The following examples assume the use of GitHub integration.&lt;/p&gt;

&lt;p&gt;The following query retrieves the extracted GitHub users from a certain organization, limited to three results:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MATCH (n:github_user) RETURN n LIMIT 3;&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9sx93pZM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/security-analysis-with-starbase-and-memgraph/github-user.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9sx93pZM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/security-analysis-with-starbase-and-memgraph/github-user.png" alt="github user" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you also want to determine which code owners of organization repositories grant access to outside contributors, execute the following query:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MATCH (account:github_account)-[e:OWNS]-&amp;gt;(repo:github_repo)-[f:ALLOWS]-&amp;gt;(user:github_user {role: 'OUTSIDE'})&lt;br&gt;
RETURN account, repo, user, e, f;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DM8YN9Pf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/security-analysis-with-starbase-and-memgraph/github-user-awesome-code.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DM8YN9Pf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/security-analysis-with-starbase-and-memgraph/github-user-awesome-code.png" alt="github user awesome code" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Starbase is a powerful tool that simplifies security analysis by unifying integrations into a user-friendly graph view, enhancing cybersecurity insights. Incorporating Memgraph for data ingestion adds another dimension by enhancing its capabilities and visualizing your data. If you are curious about graphs and would like to learn more, make sure to check out our &lt;a href="https://memgraph.com/blog"&gt;blog&lt;/a&gt; and join our community on &lt;a href="https://discord.gg/memgraph"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Memgraph vs. TigerGraph</title>
      <dc:creator>Vlasta Pavicic</dc:creator>
      <pubDate>Fri, 18 Aug 2023 06:43:44 +0000</pubDate>
      <link>https://dev.to/memgraph/memgraph-vs-tigergraph-405o</link>
      <guid>https://dev.to/memgraph/memgraph-vs-tigergraph-405o</guid>
      <description>&lt;p&gt;In today's data-driven world, the necessity to process and interpret complex relationships within massive datasets is making organizations continually search for the go-to graph database, leaving the traditional relational database options behind. After the initial &lt;a href="https://db-engines.com/en/ranking/graph+dbms"&gt;DB-Engines&lt;/a&gt; consultations, two names commonly arise in conversations: TigerGraph and Memgraph. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--B5ldneKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/tigergraph-vs-memgraph/db-rankings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--B5ldneKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/tigergraph-vs-memgraph/db-rankings.png" alt="db-engines-ranks" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Background on both solutions
&lt;/h2&gt;

&lt;p&gt;Founded in 2012 by Dr. Yu Xu, &lt;strong&gt;TigerGraph's&lt;/strong&gt; core objective is to provide a scalable and efficient graph database platform that enables organizations to leverage the power of interconnected data, supporting applications ranging from fraud detection to AI and machine learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memgraph&lt;/strong&gt; is an in-memory, open-source graph database with roots in the UK and Croatia. Founded by Marko Budiselic and Dominik Tomicevic in 2016, and backed by American investors, Memgraph prioritizes high performance and developer accessibility. With a robust community edition, the platform offers a blend of ease of use and practical functionality, all presented through clear and uncomplicated &lt;a href="https://memgraph.com/blog/what-is-an-open-source-license"&gt;licensing&lt;/a&gt;, making it the backbone of many cybersecurity solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memgraph vs. TigerGraph differences
&lt;/h2&gt;

&lt;p&gt;Although both TigerGraph and Memgraph have been developed in C++ and aim to provide performant solutions for &lt;a href="https://memgraph.com/blog/real-time-graph-analytics"&gt;real-time data analytics&lt;/a&gt;, there exist some important differences between the two platforms that set them apart. Let’s check what those are.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query language
&lt;/h3&gt;

&lt;p&gt;The choice of query language plays a significant role in the overall user experience.&lt;/p&gt;

&lt;p&gt;GSQL, TigerGraph's proprietary query language, does offer an expressive, Turing-complete language tailored for graph pattern-matching and analytics functions but might present a steeper learning curve for those new to graphs. It has been specifically designed for TigerGraph, and the skillset may not be easily transferred from or to other graph database platforms. &lt;/p&gt;

&lt;p&gt;In contrast, the Cypher query language is an open-source, declarative language known for its user-friendly syntax. Cypher's human-readable style has propelled it into a standard for querying graph databases. It was developed by Neo4j but is used by various systems, including Memgraph. Thanks to its simplicity and broad community support, it is a preferred choice for many developers, who know their applications will need minimal changes if they switch to another database vendor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data storage
&lt;/h3&gt;

&lt;p&gt;TigerGraph and Memgraph offer distinctive approaches to handling data in their graph databases, each reflecting a unique strategy to balance performance, scalability, and flexibility. &lt;/p&gt;

&lt;p&gt;TigerGraph employs a hybrid memory-disk approach, leveraging RAM for storing frequently accessed data and disk storage for large graphs that may exceed available memory. This hybrid model allows TigerGraph to achieve real-time analytics, where active datasets are immediately available, while also scaling to handle massive datasets without being constrained by RAM.&lt;/p&gt;

&lt;p&gt;In contrast, Memgraph's architecture has been built natively for in-memory data analysis and storage, focusing on lightning-fast data processing. Being ACID compliant, it ensures consistency and reliability in its core design. However, Memgraph also offers flexibility. An analytical storage mode that bypasses ACID compliance is available, accelerating analytics and data import operations when absolute consistency is not required. Additionally, an on-disk storage option allows users to weigh performance against budget constraints, thus achieving a balance tailored to specific needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mTjPjUl_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/tigergraph-vs-memgraph/storage-modes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mTjPjUl_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/tigergraph-vs-memgraph/storage-modes.png" alt="storage-modes" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While TigerGraph's hybrid approach offers a comprehensive solution for both speed and scalability, Memgraph's focus on in-memory processing with adaptable options reflects a commitment to performance with versatility to suit various requirements. The distinction between these two models shows the innovation in graph database technology, catering to diverse needs in data management, analysis, and storage. &lt;/p&gt;

&lt;h3&gt;
  
  
  Data isolation level
&lt;/h3&gt;

&lt;p&gt;TigerGraph employs a &lt;a href="https://memgraph.com/blog/acid-transactions-meaning-of-isolation-levels"&gt;read-committed data isolation level&lt;/a&gt;, meaning that a transaction can see data committed both before and during its own execution.&lt;br&gt;
For example, two identical read queries inside one transaction can return different results if another transaction commits between them.&lt;/p&gt;

&lt;p&gt;Memgraph, on the other hand, uses snapshot isolation by default, where each query operates on a consistent snapshot of the data taken at the query's start time. The isolation level can be changed, but snapshot isolation provides a more consistent view of the data, reducing the chance of reading partial or uncommitted changes. This yields more accurate query results and a smoother transaction experience, which is why snapshot isolation is generally considered the more reliable approach in many scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pricing models and support
&lt;/h3&gt;

&lt;p&gt;TigerGraph has a free version that allows users to work with up to 50GB of data, making it suitable for small projects or initial exploration. Memgraph offers something different with its &lt;a href="https://memgraph.com/pricing"&gt;Community Edition&lt;/a&gt;, which is not only free but also open-source and packed with features.&lt;/p&gt;

&lt;p&gt;For example, both TigerGraph and Memgraph offer high availability features to ensure that data is consistently accessible and resistant to failures, but Memgraph's replication is available even in the Community Edition of the product. This means that the Community Edition is not "crippleware" but a fully functional version that allows users to "kick the tires" on the product and properly test it to ensure it meets requirements before deploying it in a production environment.&lt;/p&gt;

&lt;p&gt;By offering this, and a plethora of other features in the Community Edition, Memgraph not only shows a commitment to performance and reliability but also to accessibility, empowering users to explore and validate the capabilities of the software without barriers.&lt;/p&gt;

&lt;p&gt;Due to the lack of complicated layers of management that some larger companies might have, in Memgraph you can talk directly to engineers if you have questions or need help. It's a more hands-on, direct way of working that puts you closer to the people who built the product, and it can make working with Memgraph a more pleasant and efficient experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview of features
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vm3EHAvR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/memgraph-vs-tigergraph/overview-of-features.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vm3EHAvR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/memgraph-vs-tigergraph/overview-of-features.png" alt="overview-of-features" width="743" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways on both graph databases
&lt;/h2&gt;

&lt;p&gt;Memgraph and TigerGraph both offer graph database solutions, but Memgraph's native in-memory design sets it apart. Built for speed without losing stability or ACID compliance, Memgraph provides efficient real-time querying and analytics. Although TigerGraph claims to be "The World’s Fastest Graph Analytics Platform for the Enterprise", clients have reported increased performance after switching to Memgraph. If speed, reliability, direct interaction, and support from engineers are key priorities, Memgraph may be the more appealing choice. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ePlK0GrR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/tigergraph-vs-memgraph/benchgraph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ePlK0GrR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/tigergraph-vs-memgraph/benchgraph.png" alt="benchgraph" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the performance of Memgraph on your own dataset using &lt;a href="https://memgraph.com/blog/benchmark-memgraph-or-neo4j-with-benchgraph"&gt;Benchgraph&lt;/a&gt;, a graph database performance benchmark, and feel free to &lt;a href="https://memgraph.com/contact-us"&gt;contact us&lt;/a&gt; about making the switch.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What is a Key-Value Database?</title>
      <dc:creator>Kruno Golubic</dc:creator>
      <pubDate>Thu, 03 Aug 2023 09:17:43 +0000</pubDate>
      <link>https://dev.to/memgraph/what-is-a-key-value-database-34jh</link>
      <guid>https://dev.to/memgraph/what-is-a-key-value-database-34jh</guid>
      <description>&lt;p&gt;In the landscape of data management solutions, key-value databases stand apart, offering a unique blend of simplicity and performance tailored to the big data era. By abstracting data into simple key-value pairs, this type of database offers an intuitive and flexible data model that can seamlessly scale to accommodate large datasets. Before delving into the complexities of key-value databases, their advantages, use cases, and potential drawbacks, let's first establish a foundational understanding of these databases and how they represent a distinctive paradigm in data storage and retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding a key-value store
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://memgraph.com/blog/how-to-choose-a-database-for-your-needs"&gt;ever-expanding world of databases&lt;/a&gt;, key-value databases have emerged as a vital element. They are designed for storing, retrieving, and managing associative arrays, a data structure commonly known as a dictionary or hash table. The dictionary contains a collection of records, each with different fields containing data. These records are stored and retrieved using a unique key. This unique approach to data management differs from traditional relational databases and makes a key-value database an optimal solution for use cases where simplicity, speed, and scalability are paramount. Key-value databases are the &lt;a href="https://memgraph.com/blog/types-of-nosql-databases-deep-dive"&gt;simplest form of NoSQL databases&lt;/a&gt;. Each item in the database is stored as an attribute name, or key, together with its value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining key-value stores
&lt;/h2&gt;

&lt;p&gt;A key-value database, or key-value store, uses a simple key-value method to store data. Each key-value pair represents a specific piece of data. The 'key' serves as a unique identifier that is used to find the data within the database. Key-value databases treat the data as a single opaque collection, which may have different fields for every record. This offers considerable flexibility and more closely aligns with modern concepts like object-oriented programming.&lt;/p&gt;

&lt;p&gt;The primary features of a key-value database include simplicity, high performance, and the ability to handle large volumes of data efficiently. Because optional values are not represented by placeholders or input parameters, as they are in most relational databases, key-value databases often use far less memory to store the same data, which can lead to significant performance gains in certain workloads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lGJCeU5G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/what-is-a-key-value-database/key-value-database.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lGJCeU5G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/what-is-a-key-value-database/key-value-database.png" alt="key value database" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Working with a key-value database
&lt;/h2&gt;

&lt;p&gt;Key-value databases are known for their ease of use, making them a good starting point for beginners in the database realm. Thanks to their straightforward data structure, they provide a simple, user-friendly interface. The process involves storing unique keys associated with their corresponding data values and retrieving the associated value using its unique key.&lt;/p&gt;

&lt;p&gt;Key-value databases excel in &lt;a href="https://databasetown.com/key-value-database-use-cases/"&gt;use cases that involve frequently accessed data&lt;/a&gt;. For instance, in e-commerce platforms, a key-value database could be used to store product details. The unique identifier or 'key' could be the product ID, with all the related product data stored as the associated 'value'. Another scenario is user logs, where key-value stores prove to be efficient. Popular key-value databases such as Redis, DynamoDB, Riak, and RocksDB have active community support and extensive documentation, making it easier for developers to get started and troubleshoot issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits and advantages of a key-value database
&lt;/h2&gt;

&lt;p&gt;Developers and organizations &lt;a href="https://www.geeksforgeeks.org/key-value-data-model-in-nosql/"&gt;choose key-value databases for several reasons&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speed
&lt;/h3&gt;

&lt;p&gt;One of the main reasons is their speed. Key-value databases can handle large volumes of reads and writes with minimal latency. Because every operation is a lookup by key over a simple data structure, the store can provide rapid, random access to data without parsing or planning a complex query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Horizontal scaling
&lt;/h3&gt;

&lt;p&gt;Scalability is a critical factor for modern applications dealing with ever-growing data volumes and increasing user demands. Key-value stores offer an advantage in this area, as they can easily scale horizontally. Horizontal scaling involves adding more servers to the existing network to distribute the data and workload across multiple nodes. As a result, key-value databases can maintain their performance levels even as data grows, ensuring that applications can handle a high number of concurrent users and massive data storage needs. This ability to scale out effectively makes them a preferred choice for applications that require elasticity and flexibility.&lt;/p&gt;
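
&lt;p&gt;As a rough illustration of how data can be spread across servers, the sketch below assigns each key to a node by hashing the key. The node names and the &lt;code&gt;node_for_key&lt;/code&gt; helper are hypothetical and are not taken from any particular database.&lt;/p&gt;

```python
import hashlib


def node_for_key(key, nodes):
    """Pick the node responsible for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names
placement = {}
for user_id in range(1000):
    key = f"user:{user_id}"
    # Group keys by the node that would store them.
    placement.setdefault(node_for_key(key, nodes), []).append(key)
```

&lt;p&gt;Simple modulo hashing like this moves most keys when a node is added or removed, so distributed stores often rely on consistent hashing or a similar scheme instead.&lt;/p&gt;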

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QkfJCRJt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/what-is-a-key-value-database/pros-vs-cons.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QkfJCRJt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/what-is-a-key-value-database/pros-vs-cons.png" alt="pros vs cons" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Ease of use and development
&lt;/h3&gt;

&lt;p&gt;Key-value databases provide a simple and straightforward interface for data storage and retrieval. Developers can easily interact with the database using basic operations like "get" and "put" based on the associated keys. The straightforward API and data model reduce the complexity of the application code, making it easier to develop and maintain. Additionally, the ease of integration with various programming languages and frameworks allows developers to quickly incorporate key-value stores into their applications without a steep learning curve.&lt;/p&gt;
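
&lt;p&gt;To make the "get" and "put" operations concrete, here is a minimal in-memory sketch. The &lt;code&gt;KeyValueStore&lt;/code&gt; class and the key names are illustrative assumptions, not the API of any specific product.&lt;/p&gt;

```python
class KeyValueStore:
    """Toy in-memory key-value store; real stores add persistence and networking."""

    def __init__(self):
        self._data = {}  # hash table mapping each unique key to an opaque value

    def put(self, key, value):
        # Insert the value, or overwrite whatever was stored under this key.
        self._data[key] = value

    def get(self, key, default=None):
        # Look up a value by its key; missing keys return the default.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the key if present; deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("product:1001", {"name": "Laptop", "price": 999.0})
store.put("session:abc", "user-42")
laptop = store.get("product:1001")
```

&lt;p&gt;The store never inspects the values, which is why a product record and a session string can sit side by side under different keys.&lt;/p&gt;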

&lt;h3&gt;
  
  
  Flexible data models
&lt;/h3&gt;

&lt;p&gt;Key-value databases are schema-less, meaning that they do not enforce a fixed data structure or data types. This flexibility allows developers to store and manage various types of data within the same database without strict predefined schemas. This is particularly advantageous for applications dealing with diverse and evolving data formats. Whether it's storing user profiles, session data, configurations, or complex objects, key-value databases can handle different data formats efficiently, eliminating the need for complex and costly data transformations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Potential drawbacks of a key-value database
&lt;/h2&gt;

&lt;p&gt;While key-value databases are powerful, they &lt;a href="https://www.techtarget.com/searchdatamanagement/tip/NoSQL-database-types-explained-Key-value-store"&gt;aren't always the ideal choice&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex queries
&lt;/h3&gt;

&lt;p&gt;Key-value databases cannot perform complex queries or handle sophisticated relationships between data, which is where relational databases excel. If your use case involves complex queries or requires understanding the relationships between different data entities, then a relational database or a graph database might be a better fit. Key-value databases may also be unsuitable for applications that require multi-record transactions or complex data manipulation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lack of data relationships
&lt;/h3&gt;

&lt;p&gt;Key-value databases do not inherently support relationships between data items. While data can be organized and retrieved efficiently based on keys, establishing complex relationships between different pieces of data requires additional application logic. This can lead to potential data inconsistencies and increased complexity in managing relationships outside the database layer. For applications heavily reliant on data relationships, such as social networks or recommendation systems, graph databases may provide more natural and performant solutions, as they are specifically designed to handle and traverse complex data relationships efficiently.&lt;/p&gt;
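
&lt;p&gt;The sketch below shows what that additional application logic can look like. A plain dictionary stands in for the key-value store, and the &lt;code&gt;follows:NAME&lt;/code&gt; key convention and helper functions are made up for illustration.&lt;/p&gt;

```python
store = {}  # stands in for any key-value store; the key names are made up


def follow(follower, followee):
    # The relationship is encoded by convention in the key "follows:NAME".
    key = f"follows:{follower}"
    store[key] = sorted(set(store.get(key, [])) | {followee})


def friends_of_friends(user):
    # Each hop is a separate lookup that the application has to orchestrate.
    direct = set(store.get(f"follows:{user}", []))
    second_hop = set()
    for friend in direct:
        second_hop.update(store.get(f"follows:{friend}", []))
    return sorted(second_hop - direct - {user})


follow("ana", "ben")
follow("ben", "cy")
follow("ben", "ana")
follow("ana", "dee")
follow("dee", "cy")
result = friends_of_friends("ana")
```

&lt;p&gt;Nothing in the store guarantees that the lists stay consistent, so every writer has to follow the same convention.&lt;/p&gt;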

&lt;h3&gt;
  
  
  Handling complex data types
&lt;/h3&gt;

&lt;p&gt;Another challenge with key-value databases is the handling of complex data types. While they can handle structured, semi-structured, and unstructured data, they lack the advanced capabilities of a document database in managing complex objects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data size and indexing
&lt;/h3&gt;

&lt;p&gt;As the data size grows significantly, key-value databases may face challenges with indexing and maintaining performance. While they excel at handling fast key-based lookups, large datasets can impact the efficiency of these operations. Proper index design and tuning are crucial to ensure optimal performance. Additionally, some key-value databases may not offer sophisticated indexing options, limiting their capabilities to efficiently handle specific types of queries or data access patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overcoming limitations of a key-value database
&lt;/h2&gt;

&lt;p&gt;Despite these limitations, there are strategies and tools that can help overcome them. For example, secondary indexes and query layers, such as RediSearch for Redis and PartiQL support in Amazon DynamoDB, can help perform more complex queries. Hybrid models, which combine a key-value database with other database types like document databases, can handle complex queries or manage complex relationships between data entities. Secondary indexes in particular let an application look up records by attributes other than the primary key, which can speed up application responses.&lt;/p&gt;
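
&lt;p&gt;A secondary index can be approximated in application code by keeping a second mapping in step with the primary one. The following is a minimal sketch with hypothetical names. It does not reflect how Redis or DynamoDB implement indexing internally.&lt;/p&gt;

```python
primary = {}       # product_id to product record
by_category = {}   # secondary index: category to set of product ids


def put_product(product_id, record):
    # Remove the old index entry first so the index never goes stale.
    old = primary.get(product_id)
    if old is not None:
        by_category[old["category"]].discard(product_id)
    primary[product_id] = record
    by_category.setdefault(record["category"], set()).add(product_id)


def find_by_category(category):
    # Answer a non-key query via the index instead of scanning every record.
    return [primary[pid] for pid in sorted(by_category.get(category, set()))]


put_product("p1", {"name": "Laptop", "category": "computers"})
put_product("p2", {"name": "Mouse", "category": "accessories"})
put_product("p1", {"name": "Laptop", "category": "refurbished"})
names = [p["name"] for p in find_by_category("refurbished")]
```

&lt;p&gt;The cost is an extra write for every update, which is the usual trade-off for faster reads on non-key attributes.&lt;/p&gt;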

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Oq50WUSt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/what-is-a-key-value-database/success.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Oq50WUSt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/what-is-a-key-value-database/success.png" alt="success" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Key-value databases offer a compelling mix of simplicity, speed, and scalability that can be highly beneficial in certain situations. As with any technology, it's important to understand its strengths and weaknesses to make the most of it. The key-value database world is constantly evolving, and it's an exciting area to explore for any developer or organization looking to optimize their data storage and retrieval needs.&lt;/p&gt;

&lt;p&gt;They are a fantastic choice for use cases that involve handling large volumes of frequently accessed data or where horizontal scalability is required. However, it's crucial to understand the capabilities and limitations of key-value databases before choosing them as a solution. Overall, with ongoing evolution and innovation in the database world, the future looks promising for the key-value database scene.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>When to Use a NoSQL Database</title>
      <dc:creator>Katarina Supe</dc:creator>
      <pubDate>Fri, 21 Jul 2023 13:44:05 +0000</pubDate>
      <link>https://dev.to/memgraph/when-to-use-a-nosql-database-201k</link>
      <guid>https://dev.to/memgraph/when-to-use-a-nosql-database-201k</guid>
      <description>&lt;p&gt;The way we manage and process data has changed rapidly over the years, and keeping up with the newest technology trends is important. Traditional relational database management systems, such as MySQL, Oracle, and SQL Server, have long been the first choice for maintaining data integrity and fast querying, but with the rise of big data, NoSQL (Not Only SQL) databases emerged and became popular due to their speed, flexibility, and scalability. The natural question is: when is the right time to use a NoSQL database?&lt;/p&gt;

&lt;p&gt;This blog post will explore NoSQL databases and how they differ from traditional databases. You will also learn when and why to use NoSQL databases. Whether you’re a developer, data engineer, or decision maker, this blog post will provide insights into modern databases that can help you with your projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding NoSQL databases
&lt;/h2&gt;

&lt;p&gt;NoSQL databases are non-relational databases with flexible schemas, designed for high performance at a massive scale. Unlike traditional relational databases, which use tables and predefined schemas, NoSQL databases use a variety of data models. There are four main types of NoSQL databases: document, graph, key-value, and column-oriented databases. NoSQL databases are generally well-suited for unstructured data, large-scale applications, and agile development processes. Popular examples of NoSQL databases are &lt;a href="https://www.mongodb.com/" rel="noopener noreferrer"&gt;MongoDB&lt;/a&gt; (document), &lt;a href="https://memgraph.com/" rel="noopener noreferrer"&gt;Memgraph&lt;/a&gt; (graph), &lt;a href="https://redis.io/" rel="noopener noreferrer"&gt;Redis&lt;/a&gt; (key-value store) and &lt;a href="https://hbase.apache.org/" rel="noopener noreferrer"&gt;Apache HBase&lt;/a&gt; (column-oriented).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fnosql-databases.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fnosql-databases.png" alt="nosql databases"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These &lt;a href="https://memgraph.com/blog/types-of-nosql-databases-deep-dive" rel="noopener noreferrer"&gt;types of NoSQL databases&lt;/a&gt; each store data in their own way, with the pros and cons presented below:&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: If you’ve used an SQL database before, you are probably familiar with its strict schema. That can be a hassle when data changes rapidly. NoSQL databases, with their dynamic and flexible schemas, have a huge advantage in handling unstructured and semi-structured data, making them an excellent choice for diverse data types.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Data volumes can grow exponentially, and scalability becomes crucial, especially with the rise of big data. Many NoSQL databases offer horizontal scalability, allowing you to add more servers to increase capacity. SQL databases, on the other hand, are typically scaled vertically by moving to a more powerful server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Many NoSQL databases are optimized to deliver high performance, even when dealing with large data volumes or data streams. &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maturity&lt;/strong&gt;: SQL (Structured Query Language) databases have been popular in the database world for a long time, and because of that, they offer robustness, tooling, and a community that NoSQL databases have not yet managed to match. That does not mean that NoSQL will not catch up over the years, as often happens with newer technologies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: The learning curve with NoSQL databases is steeper than with SQL databases, especially because there is no standard query language. This follows from the maturity and the fact that NoSQL databases are more complex to design, implement and manage. But, the results you gain by storing your data in a proper database might be worth it. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACID compliance&lt;/strong&gt;: Unlike many NoSQL databases, most SQL databases are ACID-compliant. There are exceptions, however: Memgraph is a NoSQL graph database that is ACID-compliant. If you’re unsure what that means for your application, you are invited to check out our recent blog on &lt;a href="https://memgraph.com/blog/acid-transactions-meaning-of-isolation-levels" rel="noopener noreferrer"&gt;ACID transactions and isolation levels&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5 reasons to choose a NoSQL database
&lt;/h2&gt;

&lt;p&gt;Based on the above-mentioned advantages and disadvantages of NoSQL databases, there are various scenarios where NoSQL databases are the optimal solution for your project. Let’s take a look at 5 scenarios where NoSQL databases are great:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Handling large volumes of data at scale seamlessly
&lt;/h3&gt;

&lt;p&gt;NoSQL databases can handle large amounts of data by spreading it across multiple servers. Hence, a NoSQL database is a good fit if you have large amounts of growing data. You don’t have to worry about running out of resources to scale vertically; instead, you can focus on designing your database system properly to get the most insights from your data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fvertical-vs-horizontal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fvertical-vs-horizontal.png" alt="vertical vs horizontal"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Working easily with unstructured or semi-structured data
&lt;/h3&gt;

&lt;p&gt;Predefined schemas can be useful, but if you’ve found them a hassle for unstructured or semi-structured data that changes often, NoSQL databases are worth considering. They are built with flexible data schemas, which speeds up the development process and reduces the effort spent on database management. Formats such as XML and JSON can be stored without first being forced into a fixed set of columns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fstructured-semi-structured-unstructured-data.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fstructured-semi-structured-unstructured-data.png" alt="structured and unstructured data"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Enabling rapid development
&lt;/h3&gt;

&lt;p&gt;Again, because of the NoSQL flexibility, your data model can rapidly change, meaning you can update your application on the fly without schema updates. Nowadays, development is fast, and iterations are quick, so removing the task of updating schemas constantly saves you valuable time that can be spent on even faster development.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. High read/write speed
&lt;/h3&gt;

&lt;p&gt;If you’re working on applications that require real-time data processing, you might be stuck with traditional databases that don’t offer the required speed. NoSQL databases are optimized for high-speed read and write operations and are ideal for chat, IoT, gaming, fraud detection, and similar applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Managing complex relationships
&lt;/h3&gt;

&lt;p&gt;Complex data relationships, such as parent-child or many-to-many relationships, arise in relational databases when data from different tables is related or otherwise interconnected. Querying such data requires hopping from one table to another and combining the rows with join operations, which can be slow and resource-intensive. A graph database is a type of NoSQL database that handles highly connected data especially well and is a good choice for social networks, recommendation engines, fraud detection, or any application where you need to traverse interconnected data points.&lt;/p&gt;
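
&lt;p&gt;To show what traversing interconnected data points means in practice, the sketch below walks a small, made-up social graph breadth-first and collects everyone within a given number of hops. It is a plain-Python illustration of the traversal idea and not the query interface of Memgraph or any other graph database.&lt;/p&gt;

```python
from collections import deque

# Hypothetical social graph stored as adjacency lists (node to neighbours).
graph = {
    "ana": ["ben", "dee"],
    "ben": ["cy"],
    "cy": ["eli"],
    "dee": ["cy"],
    "eli": [],
}


def within_hops(start, max_hops):
    """Breadth-first traversal returning each reachable node and its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]
    return hops


reachable = within_hops("ana", 2)
```

&lt;p&gt;Each step follows stored connections directly from one node to its neighbours, whereas the relational equivalent would typically need one join per hop.&lt;/p&gt;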

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fdata-pouring.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fwhen-to-use-nosql-database%2Fdata-pouring.png" alt="data pouring"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Concluding thoughts
&lt;/h2&gt;

&lt;p&gt;If you still don’t know whether you need a NoSQL or a relational database management system, we prepared &lt;a href="https://memgraph.com/blog/how-to-choose-a-database-for-your-needs" rel="noopener noreferrer"&gt;another article&lt;/a&gt; on that topic that may speed up your decision process. Keep in mind that it doesn’t always have to be a fight between SQL and NoSQL. You can also consider a hybrid approach and get the best out of both. But if you did find that your use case fits the descriptions above, then a NoSQL database is likely a good fit for you! We are always open to discussions on this and other topics in the database scene, so &lt;a href="https://www.discord.gg/memgraph" rel="noopener noreferrer"&gt;join the conversation&lt;/a&gt; on our Discord server to be a part of the community.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
