knowledge networks: formal notes and understanding

Conceptual Understanding
- What are knowledge graphs?
  - knowledge graphs are made up of nodes and connected by edges
  - the nodes represent concepts (diabetes, patient A, a facility such as SSMC)
  - the edges represent relationships (diagnosed with, treated at)
  - they can be built with the help of NLP (natural language processing):
    - e.g. "patient shows symptoms of chest pain and was prescribed lisinopril"
    - NLP uses a tool, e.g. spaCy, to break this sentence down into chunks
    - (patient), (chest pain) and (lisinopril) are all added as nodes
    - (shows symptoms of) and (was prescribed) are the relationships between them
    - ontologies are also incorporated to keep everything standardised, e.g. the concept of diabetes (type 2) would be linked to the ICD-10 code E11, even if it goes under a different name in a different facility. this makes sure that the concept of diabetes is the same across all graphs
    - NLP is also essential when adding to a knowledge network: you could upload medical papers and the NLP would extract facts from them, e.g. breast cancer is associated with the gene BRCA1
    - this lets you ask richer questions using a query language like SPARQL, such as “Which patients have co-morbid conditions X and Y and are at high risk of hospitalization?” or “List patients with chest pain who have had an ECG in the last 6 months and have no history of cardiac arrest.”
    - but how is this different from SQL querying, and why might it be better?
      - graphs can interpret implicit connections using ontologies, whereas tables do not make such connections on their own, so you have to be very specific with your queries, which also requires knowing the names of the columns you want
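To make the SQL comparison a bit more concrete, here is a minimal sketch in plain Python (no graph database involved). The patients, the `is_a` links and the helper names `broader_classes` and `patients_with` are all made up for illustration; a real system would use a proper ontology and a query language such as SPARQL.

```python
# Toy triples: (subject, predicate, object). All names are made up.
triples = [
    ("Patient A", "diagnosed_with", "Type 2 diabetes"),
    ("Patient B", "diagnosed_with", "Heart attack"),
    ("Patient C", "diagnosed_with", "Hypertension"),
    # Ontology links: each concept points to its broader class.
    ("Type 2 diabetes", "is_a", "Diabetes"),
    ("Diabetes", "is_a", "Metabolic disease"),
    ("Heart attack", "is_a", "Cardiovascular disease"),
    ("Hypertension", "is_a", "Cardiovascular disease"),
]

def broader_classes(concept):
    """Return the concept plus every class reachable through is_a links."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def patients_with(disease_class):
    """Patients whose diagnosis is the class itself or any narrower concept."""
    return sorted(
        s for s, p, o in triples
        if p == "diagnosed_with" and disease_class in broader_classes(o)
    )

# An exact-match lookup finds nothing, because no diagnosis
# literally says "Cardiovascular disease".
exact = sorted(
    s for s, p, o in triples
    if p == "diagnosed_with" and o == "Cardiovascular disease"
)

print(exact)                                    # []
print(patients_with("Cardiovascular disease"))  # ['Patient B', 'Patient C']
print(patients_with("Diabetes"))                # ['Patient A']
```

The exact-match lookup behaves like a simple filter on one column, while `patients_with` follows the ontology links to find the implicit connection.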
- How are they typically structured and maintained?
  - structure
    - each connection is stored as a triple (node → (connection) → node) (subject, predicate, object)
    - ontologies, as mentioned earlier, map concepts to other concepts to make sure everything is standardised, e.g. heart attack → myocardial infarction → ICD-10 code I21
  - maintained/works
    1. data ingestion → data comes from multiple sources and is then extracted, cleaned and mapped to ontologies
    2. metadata standardisation → pieces of data are standardised, equivalent to cleaning the data, e.g. MI, heart attack and myocardial infarction are mapped to the same standard code; additionally, units and dates are also standardised
    3. ontologies → define the classes and their relationships, e.g. the E11 code for diabetes is put under the ‘disease’ classification
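    4. reasoning and inference → new facts are derived from the existing triples and the ontology, e.g. a patient diagnosed with E11 is inferred to have something in the ‘disease’ class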
    5. integration → connect new nodes to the existing graph; this is easier than adding information in SQL, as no tables need restructuring
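The maintenance steps above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: the `SYNONYMS` and `ONTOLOGY` tables, the record format and the function names are hypothetical, and the inference step is left out.

```python
# Hypothetical synonym table: free-text terms -> one standard label.
SYNONYMS = {
    "mi": "Myocardial infarction",
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "t2dm": "Type 2 diabetes",
    "type 2 diabetes": "Type 2 diabetes",
}

# Hypothetical mini ontology: standard label -> (ICD-10 code, class).
ONTOLOGY = {
    "Myocardial infarction": ("I21", "Disease"),
    "Type 2 diabetes": ("E11", "Disease"),
}

def standardise(term):
    """Step 2: map a raw term to its standard label (unknown terms pass through)."""
    cleaned = term.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)

def ingest(records, graph):
    """A miniature of steps 1, 2, 3 and 5: clean, standardise, classify, integrate."""
    for patient, raw_term in records:
        concept = standardise(raw_term)
        graph.add((patient, "diagnosed_with", concept))
        if concept in ONTOLOGY:  # step 3: attach the code and class
            code, cls = ONTOLOGY[concept]
            graph.add((concept, "has_code", code))
            graph.add((concept, "is_a", cls))
    return graph

# Two imaginary sources that name the same condition differently.
source_1 = [("Patient A", "MI"), ("Patient B", " T2DM ")]
source_2 = [("Patient C", "Heart attack")]

graph = set()            # a set of triples, so duplicate facts merge automatically
ingest(source_1, graph)
ingest(source_2, graph)  # step 5: new data is simply added to the same set

same_condition = sorted(
    s for s, p, o in graph
    if p == "diagnosed_with" and o == "Myocardial infarction"
)
print(same_condition)    # ['Patient A', 'Patient C']
```

Because both sources end up pointing at the same standard node, "MI" and "Heart attack" are treated as one condition without any change to existing data.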
- What makes them valuable in the context of healthcare data and AI?
  - integrating heterogeneous data, e.g. lab results → imaging → a diagnosis
  - helping LLMs avoid hallucination by grounding their answers in curated facts
  - acting as a clinical decision support tool
  - spotting anomalies, e.g. a patient was prescribed something but has no diagnosis linked to that specific medicine
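The anomaly example above can be sketched as a simple check over triples. Everything here is illustrative: the patients are invented, the `treats` links are deliberately incomplete and not clinical guidance, and `unexplained_prescriptions` is a hypothetical helper name.

```python
# Toy triples; the drug-condition links are illustrative only.
triples = [
    ("Patient A", "diagnosed_with", "Hypertension"),
    ("Patient A", "prescribed", "Lisinopril"),
    ("Patient B", "diagnosed_with", "Type 2 diabetes"),
    ("Patient B", "prescribed", "Lisinopril"),
    ("Patient B", "prescribed", "Metformin"),
    # Reference knowledge: which condition each drug is indicated for.
    ("Lisinopril", "treats", "Hypertension"),
    ("Metformin", "treats", "Type 2 diabetes"),
]

def objects(subject, predicate):
    """All objects linked to `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def unexplained_prescriptions():
    """(patient, drug) pairs where no diagnosis of the patient matches the drug."""
    anomalies = []
    patients = sorted({s for s, p, o in triples if p == "prescribed"})
    for patient in patients:
        diagnoses = objects(patient, "diagnosed_with")
        for drug in sorted(objects(patient, "prescribed")):
            # Flag the drug if none of the conditions it treats was diagnosed.
            if not (objects(drug, "treats") & diagnoses):
                anomalies.append((patient, drug))
    return anomalies

print(unexplained_prescriptions())  # [('Patient B', 'Lisinopril')]
```

A flagged pair is only a prompt for review, since the missing link may simply mean the diagnosis was never recorded.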
Healthcare Applications
- What are the main use cases of knowledge graphs in healthcare (e.g., clinical decision support, drug discovery, patient journey mapping)?
  - linking disparate data from different sources together
  - modelling complex relationships between diseases, patients, genes and family history
  - predictive modelling: graph embeddings can convert KGs into vectors for machine learning
  - reasoning for a diagnosis, e.g. inferring that a patient has pre-diabetes based on their insulin level and family history (basic example)
  - population health analysis: detect trends for at-risk groups
- How do they compare with traditional databases and analytics approaches?
  - thanks to ontologies, KGs can infer implicit relationships between nodes. vector databases do not spot these links, as they simply retrieve the chunks most similar to what the user asked for, and traditional relational databases are awkward for many-to-many connections, which need join tables and extra joins at query time
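As a rough illustration of the family-history use case, the sketch below follows family links in a toy graph to collect the conditions of a patient's relatives. The patients, the predicate names and the `family_history` helper are assumptions made up for this example.

```python
from collections import defaultdict

# Toy edges: (subject, predicate, object). All names are made up.
edges = [
    ("Patient A", "has_condition", "Diabetes"),
    ("Patient B", "has_condition", "Hypertension"),
    ("Patient A", "sibling_of", "Patient B"),
    ("Patient C", "child_of", "Patient A"),
]

FAMILY_PREDICATES = {"sibling_of", "child_of"}

# Index the edges once: relatives per person and conditions per person.
relatives = defaultdict(set)
conditions = defaultdict(set)
for s, p, o in edges:
    if p in FAMILY_PREDICATES:
        # Family links are treated as two-way for this sketch.
        relatives[s].add(o)
        relatives[o].add(s)
    elif p == "has_condition":
        conditions[s].add(o)

def family_history(patient):
    """Conditions found in the patient's direct relatives (one family hop)."""
    found = set()
    for relative in relatives[patient]:
        found |= conditions[relative]
    return sorted(found)

print(family_history("Patient A"))  # ['Hypertension']
print(family_history("Patient B"))  # ['Diabetes']
print(family_history("Patient C"))  # ['Diabetes']
```

Each extra hop is just another lookup in the index, whereas a relational schema would typically need another join per hop.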
Adoption in the Industry
- Who is using them today in healthcare?
  - IBM - clinical decision support
  - BMJ - as a clinical reasoning system for their clinician end users
  - Mayo Clinic - patient journey, similar to how we might utilise KGs
- BMJ currently uses knowledge graphs in their BMJ Clinical Intelligence, citing that at the point of care, the graph can be queried to look for deviations from a patient's expected course. once it detects a deviation, it can suggest an early intervention to correct the patient's care
- What companies, institutions, or startups are leading the way?
- What lessons can be drawn from these examples?
  - google deepmind + google health are using KGs to integrate biomedical literature, clinical trial data and patient data like EPIs to unify a patient's journey throughout the healthcare facility
    - lesson learnt → public-facing healthcare tools need curated knowledge sources to avoid misinformation (deepmind)
    - lesson learnt → interoperability is key (cloud healthcare API), especially when you are dealing with data from multiple sources that needs to be standardised (ontologies can help greatly with this)
  - mayo clinic + stanford are building clinical KGs that are designed to support decision making for physicians
    - lesson learnt → all suggestions must be explainable, as doctors won't accept point-blank, one-word recommendations
    - lesson learnt → to actually be adopted, KGs must integrate into current workflows, so as not to add complexity for doctors
Practical Demonstration
- Using real (non-sensitive, provided) data, build a small demo knowledge graph.
- For this, you will use open-source software (which we will install for you).
- The goal is not to build something large-scale, but to give the team a tangible example of how data can be represented and queried in this format.

EXAMPLE ONE

import spacy
import networkx as nx
import matplotlib.pyplot as plt

# spaCy model for extracting triples from text; not used below, where the triples are hand-written
nlp = spacy.load("en_core_web_sm")

# Hand-written (subject, predicate, object) triples
custom_triples = [
    ("Patient A", "has_condition", "Diabetes"),
    ("Patient A", "lives_in", "Salma"),
    ("LTC Facility 1", "located_in", "Abu Dhabi"),
    ("Patient B", "has_condition", "Hypertension"),
    ("Patient B", "goes_to", "SSMC"),
    ("Patient A", "is_brother_of", "Patient B"),
    ("Patient B", "has_condition", "Diabetes"),
    ("Diabetes", "has_code", "G6384"),
    ("G6384", "will_give_us", "1,000,000 aed"),
]

all_triples = custom_triples   # only the hand-written triples are used here

G = nx.DiGraph()

# Each triple becomes a directed edge labelled with its predicate
for sub, rel, obj in all_triples:
    G.add_edge(sub, obj, label=rel)

# Compute the layout once (fixed seed) so both drawings match
pos = nx.spring_layout(G, k=0.5, seed=42)
edge_labels = nx.get_edge_attributes(G, "label")

# Draw the plain graph
plt.figure(figsize=(8, 6))
nx.draw(G, pos, with_labels=True, node_color="lightgreen", node_size=2800, font_size=10)
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.show()

# Define what counts as a literal (simplest way: codes, places, values)
literals = {"1,000,000 aed", "Abu Dhabi", "G6384"}
seha_facilities = {"Salma", "SSMC", "LTC Facility 1"}

# Colour each node by type, in the same order as G.nodes()
node_colors = []
for node in G.nodes():
    if node in literals:
        node_colors.append("red")   # literals = red
    elif node in seha_facilities:
        node_colors.append("blue")  # facilities = blue
    else:
        node_colors.append("lightgreen")  # classes/entities = green

# Draw the graph again with the colour coding
plt.figure(figsize=(8, 6))
nx.draw(
    G, pos,
    with_labels=True,
    node_color=node_colors,
    node_size=2800,
    font_size=10
)
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.show()

this is a very rudimentary example, with basic nodes such as diabetes, hypertension, and patient A and B, as well as literals (in red). the aim was simply to see how knowledge graphs can be constructed manually from text and how they organise data.
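Example One uses hand-written triples. As a rough illustration of how the extraction step described earlier might produce them from a sentence, the sketch below uses a few regular-expression patterns as a stand-in for a real NLP pipeline such as spaCy. The `RELATION_PATTERNS` table and the `extract_triples` function are hypothetical, and the approach only handles very simple sentences.

```python
import re

# Hypothetical relation phrases mapped to predicate names.
RELATION_PATTERNS = {
    "shows symptoms of": "shows_symptoms_of",
    "was prescribed": "was_prescribed",
    "is associated with": "associated_with",
}

def extract_triples(sentence):
    """Split a simple 'subject REL object and REL object' sentence into triples."""
    text = sentence.strip().rstrip(".")
    # Locate every known relation phrase in the sentence.
    pattern = "|".join(re.escape(phrase) for phrase in RELATION_PATTERNS)
    matches = list(re.finditer(pattern, text, flags=re.IGNORECASE))
    if not matches:
        return []
    # Everything before the first relation phrase is taken as the subject.
    subject = text[:matches[0].start()].strip()
    triples = []
    for i, match in enumerate(matches):
        # The object runs up to the next relation phrase (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        obj = text[match.end():end].strip()
        obj = re.sub(r"\s+and$", "", obj)  # drop the joining "and"
        predicate = RELATION_PATTERNS[match.group().lower()]
        triples.append((subject, predicate, obj))
    return triples

print(extract_triples("Patient shows symptoms of chest pain and was prescribed lisinopril."))
print(extract_triples("Breast cancer is associated with gene BRCA1"))
```

The resulting tuples have the same (subject, predicate, object) shape as `custom_triples`, so they could be added to the graph with the same `G.add_edge` loop.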

EXAMPLE TWO

https://i9h49gsk658rqxh6-default.preview.taskade.app/

a more detailed graph i made (with the brute force of chatgpt).

the more complex diagram shows how different classes such as ‘person’ or ‘astrophysics’ can be used to link concepts. i have also included the function to add both nodes and relationships, showing how easy it is to append information to a KG (although this is of course a simplified version of the real process in SPARQL)

Useful applications:

Patient care insights:
- Quickly see all conditions a patient has and how they interact.
- Track disease progression or comorbidities.
Treatment recommendations:
- Identify patterns in which treatments work for certain combinations of conditions.
- Suggest personalized care plans.
Data querying and analytics:
- Ask questions like:
  - “Which patients with diabetes also have hypertension?”
  - “Which medications are most commonly prescribed for patients over 65 with chronic kidney disease?”
Predictive analytics:
- Later, you could integrate a machine learning model to predict risk of complications or hospital readmissions based on patterns in the graph.
Interoperability:
- Graphs can integrate other datasets (lab results, genetic data, hospital visits) easily without restructuring tables.
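The first question above, “Which patients with diabetes also have hypertension?”, can be answered over triples like those in Example One. This is a minimal sketch in plain Python rather than a real query language; the `patients_with_all` helper is a made-up name, and the triples are a shortened copy of the demo data.

```python
from collections import defaultdict

# A shortened copy of the demo triples: (subject, predicate, object).
triples = [
    ("Patient A", "has_condition", "Diabetes"),
    ("Patient A", "lives_in", "Salma"),
    ("Patient B", "has_condition", "Hypertension"),
    ("Patient B", "has_condition", "Diabetes"),
    ("Patient B", "goes_to", "SSMC"),
]

# Group conditions by patient so each question becomes a set comparison.
conditions_by_patient = defaultdict(set)
for subject, predicate, obj in triples:
    if predicate == "has_condition":
        conditions_by_patient[subject].add(obj)

def patients_with_all(*wanted):
    """Patients who have every condition listed in `wanted`."""
    required = set(wanted)
    return sorted(
        patient
        for patient, conds in conditions_by_patient.items()
        if required <= conds
    )

print(patients_with_all("Diabetes", "Hypertension"))  # ['Patient B']
print(patients_with_all("Diabetes"))                  # ['Patient A', 'Patient B']
```

Adding a new kind of fact, such as a lab result, only means appending another triple; the query code does not need to change.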
