vishalmysore

OWL-Aware Chunking Strategies: A Comprehensive Performance Analysis

This post presents a detailed performance analysis of 10 chunking strategies for Retrieval-Augmented Generation (RAG), applied to a legal ontology in Protégé. I tested both traditional text-based chunking and novel OWL-aware chunking strategies powered by the agenticmemory library. Performance varies significantly with ontology structure, metadata quality, and naming conventions. ModuleExtractionChunking achieved the highest OWL-aware score (0.7068) with exceptional consistency, while AnnotationBasedChunking (0.7010) offered fine-grained semantic grouping with 39 focused chunks.


👉 The Protégé plugin for the Lucene-based vector store is here: https://github.com/vishalmysore/lucene-protege-plugin

👉 The OWL ontology used in this article is here: https://github.com/vishalmysore/graphrag/blob/main/graphrag/ontologies/legal-case-management.owl

👉 More ontologies are here: https://github.com/vishalmysore/graphrag/tree/main/graphrag/ontologies

👉 The AgenticMemory package is here: https://github.com/vishalmysore/agenticmemory

Introduction

Chunking strategies are critical for RAG performance. The way I split knowledge into retrievable pieces directly impacts:

  • Context relevance: Whether the retrieved chunks contain the information needed
  • Answer accuracy: Whether the LLM receives complete vs. fragmented information
  • Query performance: Search time and computational cost

Traditional text-based chunking (word, sentence, paragraph boundaries) treats all text equally. However, ontologies have rich semantic structure—class hierarchies, property domains, annotation patterns—that can inform smarter chunking decisions.

This study evaluates whether OWL-aware chunking strategies outperform traditional text-based approaches.


Methodology

Test Environment

  • Platform: Protégé 5.6.7 with custom Lucene RAG plugin
  • Ontology: Legal domain (195 total axioms)
    • 3 cases (Smith v. Jones, State v. Doe, Appeal CV-2023-500)
    • 3 courts (District, Appellate, Supreme)
    • 4 judges and 3 lawyers
    • 3 evidence items
    • 2 statutes (Federal, State)
  • Vector Store: Apache Lucene 9.8.0 with KnnFloatVectorField
  • Embeddings: OpenAI text-embedding-3-small (1024 dimensions)
  • LLM: GPT-4
  • Test Query: "Which cases are currently active?"

Evaluation Metrics

  1. Chunk Count: Number of chunks created
  2. Top Similarity Score: Cosine similarity of best-matching chunk
  3. Answer Quality: Whether LLM provided correct, complete answer
  4. Chunk Distribution: How axioms were grouped

Results: Text-Based Chunking Strategies

1. WordChunking

Chunks Created: 58

Top Similarity: 0.7135

Answer Quality: ✅ Correct (both cases identified)

How It Works: Splits text at word boundaries, typically 100 words per chunk.
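
A minimal sketch of this idea (a simplified stand-in, not the agenticmemory implementation; class and method names are illustrative):

import java.util.ArrayList;
import java.util.List;

public class WordChunkerSketch {
    // Split text on whitespace and pack words into chunks of at most maxWords
    public static List<String> chunk(String text, int maxWords) {
        List<String> chunks = new ArrayList<>();
        String[] words = text.split("\\s+");
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String word : words) {
            if (count == maxWords) {              // budget reached: flush chunk
                chunks.add(current.toString().trim());
                current.setLength(0);
                count = 0;
            }
            current.append(word).append(' ');
            count++;
        }
        if (current.length() > 0) chunks.add(current.toString().trim());
        return chunks;
    }
}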

Performance:

  • Created 58 focused chunks
  • Each chunk contained 1-2 complete entities
  • No fragmentation of entity names
  • Example chunk: Full "Smith v. Jones" case with all properties

Best For:

  • Structured data where entities are under 100 words
  • Short, self-contained knowledge items
  • Content where word boundaries cleanly separate concepts

2. SentenceChunking

Chunks Created: 76

Top Similarity: 0.7258 (highest raw score)

Answer Quality: ❌ Incomplete ("Jones" instead of "Smith v. Jones")

How It Works: Splits text at sentence boundaries (periods, exclamation marks, question marks).
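
A minimal sketch, assuming a simple regex split on terminal punctuation (illustrative, not the library's code):

public static String[] splitSentences(String text) {
    // Break after '.', '!' or '?' followed by whitespace; note that the
    // period in "Smith v." also matches, causing the fragmentation below
    return text.split("(?<=[.!?])\\s+");
}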

Performance:

  • Created 76 smaller chunks (most of any strategy)
  • Critical flaw: Fragmented entity names across chunks
  • Highest similarity score but worst answer quality
  • Example: "Jones" appeared in one chunk, "Smith v." in another

Problem Identified:

Chunk A: "...the case Smith v."
Chunk B: "Jones was filed in District Court..."

When the LLM received Chunk B alone, it only saw "Jones" without "Smith v.", leading to incomplete answers.

Lesson Learned: Higher similarity scores don't guarantee better answers if chunks break semantic units.


3. ParagraphChunking

Chunks Created: 58

Top Similarity: 0.7141

Answer Quality: ✅ Correct (both cases identified)

How It Works: Splits text at paragraph boundaries (double newlines).
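
A minimal sketch, assuming paragraphs are separated by blank lines (illustrative only):

public static String[] splitParagraphs(String text) {
    // Paragraphs are assumed to be separated by one or more blank lines
    return text.split("\\n\\s*\\n");
}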

Performance:

  • Identical results to WordChunking (58 chunks)
  • This occurred because the serialized ontology entities contained no paragraph breaks
  • All entities were under 100 words, so paragraph splits coincided with word-count splits

Best For:

  • Long-form documentation with clear paragraph structure
  • Articles, papers, documentation
  • Not ideal for structured RDF/OWL data

4. FixedSizeChunking

Chunks Created: Unknown (at least 5)

Top Similarity: 0.7141

Answer Quality: ✅ Correct (Smith v. Jones identified as active)

How It Works: Fixed character/token limits regardless of content boundaries. Unlike other strategies, ignores semantic structure entirely.
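
A minimal sketch on a raw character budget (illustrative; maxChars is a hypothetical parameter):

import java.util.ArrayList;
import java.util.List;

public class FixedSizeSketch {
    public static List<String> chunk(String text, int maxChars) {
        // Slice the text into fixed-length pieces, ignoring all boundaries
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += maxChars) {
            chunks.add(text.substring(i, Math.min(i + maxChars, text.length())));
        }
        return chunks;
    }
}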

Performance:

  • Top similarity of 0.7141 (tied with ParagraphChunking)
  • Clean, well-structured chunks with complete entity information
  • Each chunk formatted with Class/Individual type, IRI, label, and properties
  • Successfully retrieved all relevant case information

Example Chunks:

Chunk 1 (0.7141): Criminal Case definition
Chunk 2 (0.7096): Appeal of CV-2023-500 (complete individual)
Chunk 3 (0.6985): Smith v. Jones (complete individual)  
Chunk 4 (0.6955): State v. Doe (complete individual)
Chunk 5 (0.6938): case status property

Key Observation: Despite ignoring semantic boundaries, FixedSizeChunking produced well-formed chunks because:

  1. Ontology entities are naturally compact (< 100 words each)
  2. RDF/OWL serialization creates natural boundaries
  3. Fixed size happened to align with entity boundaries

Best For:

  • When entity size is consistent and predictable
  • Performance-critical applications needing uniform computational load
  • Baseline comparison for other strategies

Limitation: Would fragment entities if fixed size < entity size, or waste space if fixed size >> entity size.


Results: OWL-Aware Chunking Strategies

5. ClassBasedChunking

Chunks Created: 6

Top Similarity: 0.6964

Answer Quality: ✅ Correct

How It Works: Groups axioms by class hierarchies. Creates one chunk per class hierarchy plus one "orphan" chunk for non-hierarchy axioms.
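
A hedged sketch of this grouping using the OWL API (OWL API 4.x style; class and method names are illustrative, not the agenticmemory API, and the real strategy may group by hierarchy roots rather than direct superclasses):

import java.util.*;
import org.semanticweb.owlapi.model.*;

public class ClassBasedChunkerSketch {
    public static Map<String, Set<OWLAxiom>> chunk(OWLOntology ontology) {
        Map<String, Set<OWLAxiom>> chunks = new LinkedHashMap<>();
        for (OWLAxiom axiom : ontology.getAxioms()) {
            String key = "orphan"; // default bucket for non-hierarchy axioms
            if (axiom instanceof OWLSubClassOfAxiom) {
                OWLClassExpression sup = ((OWLSubClassOfAxiom) axiom).getSuperClass();
                if (!sup.isAnonymous()) {
                    // Group hierarchy axioms under their named superclass
                    key = sup.asOWLClass().getIRI().getShortForm();
                }
            }
            chunks.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(axiom);
        }
        return chunks;
    }
}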

Chunk Distribution:

Chunk 0: Evidence hierarchy (2 axioms)
  - DocumentEvidence
  - PhysicalEvidence

Chunk 1: Statute hierarchy (2 axioms)
  - FederalStatute
  - StateStatute

Chunk 2: Court hierarchy (3 axioms)
  - AppellateCourt
  - DistrictCourt
  - SupremeCourt

Chunk 3: Case hierarchy (3 axioms)
  - CivilCase
  - CriminalCase
  - AppellateCase

Chunk 4: Person hierarchy (7 axioms)
  - Judge, Lawyer
  - DefenseAttorney, Prosecutor
  - SupremeCourtJudge
  - Plaintiff, Defendant

Chunk 5: Orphan axioms (183 axioms) ← DOMINANT CHUNK
  - All individual assertions
  - All property declarations
  - All annotations

Key Finding: 183 of 195 axioms (93.8%) ended up in the "orphan" chunk because they were individual assertions, not class hierarchy definitions.

Best For:

  • Queries about class relationships
  • "What types of cases exist?"
  • "What are the subclasses of Person?"

Limitation: Most real data (case details, property values) is concentrated in one massive orphan chunk.


6. AnnotationBasedChunking

Chunks Created: 39

Top Similarity: 0.7010

Answer Quality: ✅ Correct with best context

How It Works: Groups axioms by annotation label prefixes (first 3 characters).
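
A hedged sketch of the prefix grouping (OWL API 4.x collection-returning signatures; illustrative, not the library's code):

import java.util.*;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.search.EntitySearcher;

public class AnnotationPrefixSketch {
    public static Map<String, Set<OWLAxiom>> chunk(OWLOntology ontology) {
        Map<String, Set<OWLAxiom>> chunks = new TreeMap<>();
        for (OWLAxiom axiom : ontology.getAxioms()) {
            String key = "no-annotations";
            outer:
            for (OWLEntity entity : axiom.getSignature()) {
                for (OWLAnnotation ann : EntitySearcher.getAnnotations(entity, ontology)) {
                    // Use the first rdfs:label found; key on its first 3 characters
                    if (ann.getProperty().isLabel() && ann.getValue() instanceof OWLLiteral) {
                        String label = ((OWLLiteral) ann.getValue()).getLiteral().toLowerCase();
                        key = label.substring(0, Math.min(3, label.length()));
                        break outer;
                    }
                }
            }
            chunks.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(axiom);
        }
        return chunks;
    }
}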

Chunk Distribution:

Top chunks by axiom count:
- no-annotations: 84 axioms (labels, ranges, domains)
- "cas" prefix: 26 axioms (Case, CaseNumber, CaseStatus, Case_SmithVsJones, Case_StateVsDoe)
- "sta" prefix: 24 axioms (Statute, StatuteCode, StateStatute, FederalStatute)
- "app" prefix: 17 axioms (AppellateCase, AppellateCourt, AppealsTo property, Case_AppealCV001)
- "fil" prefix: 12 axioms (FiledIn, FilingDate, all filing-related assertions)
- "cou" prefix: Court entities and properties
- "jud" prefix: Judge-related entities
- "law" prefix: Lawyer-related entities
- "evi" prefix: Evidence-related entities
- 30 other prefixes: Varying axiom counts (1-10 axioms each)

Total: 39 semantic chunks

Key Strengths:

  1. Semantic grouping: Entities with similar names usually have related meanings
  2. Balanced chunks: 39 focused chunks vs. 1 giant orphan chunk
  3. Complete entities: "Case_SmithVsJones" stayed with "CaseStatus", "CaseNumber"
  4. Effective retrieval: Top chunk (0.7010, no-annotations) contained complete case information with all labels
  5. Metadata-dependent: Performance relies heavily on consistent naming conventions

Example Query Flow:

Query: "Which cases are currently active?"
  ↓
Embedding matches "cas" prefix chunk (0.7010 similarity)
  ↓
Chunk contains: Case_SmithVsJones (Active), Case_StateVsDoe (Trial)
  ↓
GPT-4 receives complete case information
  ↓
Answer: ✅ "Smith v. Jones (Active), State v. Doe (Trial)"

7. NamespaceBasedChunking

Chunks Created: 6

Top Similarity: 0.6964

Answer Quality: ✅ Correct

How It Works: Splits axioms by IRI namespace prefixes.
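
A hedged sketch (illustrative; keys each axiom by the namespace of the first entity in its signature):

import java.util.*;
import org.semanticweb.owlapi.model.*;

public class NamespaceChunkerSketch {
    public static Map<String, Set<OWLAxiom>> chunk(OWLOntology ontology) {
        Map<String, Set<OWLAxiom>> chunks = new LinkedHashMap<>();
        for (OWLAxiom axiom : ontology.getAxioms()) {
            String key = axiom.getSignature().stream()
                    .map(e -> e.getIRI().getNamespace()) // e.g. "http://www.semanticweb.org/legal#"
                    .findFirst()
                    .orElse("no-namespace");
            chunks.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(axiom);
        }
        return chunks;
    }
}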

Performance:

  • Fell back to ClassBasedChunking: the legal ontology uses a single namespace (http://www.semanticweb.org/legal#)
  • All chunk IDs showed "class-chunk-" rather than "namespace-chunk-"
  • Identical results to ClassBasedChunking

When It Would Excel:

Scenario: Multi-ontology project

Chunk 1: http://www.semanticweb.org/legal# (your domain)
  - Case, Court, Judge classes

Chunk 2: http://purl.org/dc/terms/ (Dublin Core)
  - creator, date, title

Chunk 3: http://xmlns.com/foaf/0.1/ (FOAF)
  - Person, Organization, name

Chunk 4: http://www.w3.org/2006/time# (OWL Time)
  - Instant, Interval, before, after

Best For:

  • Projects importing multiple external ontologies
  • Separating domain concepts from metadata
  • Modular ontology architectures

Limitation: Useless for single-namespace ontologies.


8. DepthBasedChunking

Chunks Created: 3

Top Similarity: 0.6967

Answer Quality: ✅ Correct

How It Works: Groups axioms by hierarchy depth level.
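
A hedged sketch of the depth computation (OWL API 4.x; assumes an acyclic class hierarchy; illustrative only, the library may compute depth differently):

import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.search.EntitySearcher;

public class DepthSketch {
    // Depth = longest chain of named superclasses above the class
    public static int depth(OWLClass cls, OWLOntology ontology) {
        int max = 0;
        for (OWLClassExpression sup : EntitySearcher.getSuperClasses(cls, ontology)) {
            if (!sup.isAnonymous()) {
                max = Math.max(max, 1 + depth(sup.asOWLClass(), ontology));
            }
        }
        return max; // 0 for top-level classes with no named superclass
    }
}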

Chunk Distribution:

Chunk 1: Non-class axioms (183 axioms, depth: N/A)
  - All individual assertions
  - All property declarations
  - All annotations
  - Similarity: 0.6967 ← Top result

Chunk 2: Depth Level 0 (15 axioms)
  - Direct subclass relationships
  - Evidence → DocumentEvidence, PhysicalEvidence
  - Court → DistrictCourt, AppellateCourt, SupremeCourt
  - Case → CivilCase, CriminalCase, AppellateCase
  - Person → Judge, Lawyer, Plaintiff, Defendant
  - Statute → FederalStatute, StateStatute
  - Similarity: 0.6717

Chunk 3: Depth Level 1 (2 axioms)
  - Second-level subclass relationships
  - Lawyer → DefenseAttorney, Prosecutor
  - Judge → SupremeCourtJudge
  - Similarity: 0.6226

Key Insight: The legal ontology has only two subclass depth levels, indicating a relatively flat structure.

Hierarchy Analysis:

Level 0: Top-level concepts (Case, Court, Person, Evidence, Statute)
Level 1: Direct children (17 classes)
Level 2: Grandchildren (3 classes: DefenseAttorney, Prosecutor, SupremeCourtJudge)

Best For:

  • Understanding ontology complexity
  • Queries about abstraction levels
  • "What are the top-level classes?"
  • Structural analysis

Limitation: Most data still in non-class axioms chunk.


9. ModuleExtractionChunking

Chunks Created: 28

Top Similarity: 0.7068

Answer Quality: ✅ Correct (both cases identified)

How It Works:

  • Extracts minimal, self-contained ontology modules using OWL API algorithms
  • Each module is complete and independent
  • Selects seed entities and pulls in all related axioms (see the sketch below)
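
A minimal sketch using the OWL API's syntactic locality module extractor (the agenticmemory implementation may use a different algorithm or module type):

import java.util.Collections;
import java.util.Set;
import org.semanticweb.owlapi.model.*;
import uk.ac.manchester.cs.owlapi.modularity.ModuleType;
import uk.ac.manchester.cs.owlapi.modularity.SyntacticLocalityModuleExtractor;

public class ModuleSketch {
    // Extract the self-contained module for one seed entity:
    // every axiom needed to preserve that entity's meaning
    public static Set<OWLAxiom> moduleFor(OWLEntity seed,
                                          OWLOntologyManager manager,
                                          OWLOntology ontology) {
        SyntacticLocalityModuleExtractor extractor =
                new SyntacticLocalityModuleExtractor(manager, ontology, ModuleType.STAR);
        return extractor.extract(Collections.singleton(seed));
    }
}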

Performance:

  • Created 28 modules from 195-axiom ontology
  • Remarkably tight similarity clustering: Top 5 scores range 0.7056-0.7068 (0.0012 spread)
  • Top chunk: 132 axioms with 4 seed entities
  • Most balanced retrieval: All top results highly relevant

Example Module:

module-chunk-1 (0.7068):
  - 132 axioms
  - 4 seed entities (likely: Case, Court, Judge, Lawyer)
  - Complete case information with all dependencies

Key Insight:

  • Highest top score among OWL-aware strategies (0.7068 vs 0.7010 AnnotationBased)
  • Most consistent retrieval: Tiny 0.0012 variance in top-5 scores
  • All top chunks equally useful (any could answer the query)
  • Produces self-contained, logically complete modules

Why It Excels:

  1. Dependency closure: Each module includes all related axioms
  2. Semantic completeness: No fragmented information
  3. Multiple relevant modules: Different seed entities = different perspectives
  4. Logical coherence: Uses OWL reasoning to determine relationships

Best For:

  • Large, complex ontologies where relationships span many axioms
  • Queries requiring complete context (all properties, relationships)
  • Modular ontology architectures
  • Distributed knowledge bases
  • When consistency matters more than granularity

Trade-off:

  • Larger chunks (avg ~7 axioms) vs AnnotationBased (~5 axioms)
  • Fewer chunks (28 vs 39) = less storage, faster indexing
  • But superior retrieval consistency

10. SizeBasedChunking

Status: Not tested in this study

Configuration: 50 axioms per chunk

Expected Behavior:

  • Fixed axiom count per chunk
  • Maintains entity coherence (keeps related axioms together)
  • If a single entity exceeds 50 axioms, creates a dedicated chunk (see the sketch below)
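
A minimal sketch of the fixed-count slicing (illustrative; unlike the description above, this simple version makes no attempt to keep related axioms together):

import java.util.ArrayList;
import java.util.List;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;

public class SizeBasedSketch {
    public static List<List<OWLAxiom>> chunk(OWLOntology ontology, int maxAxioms) {
        List<OWLAxiom> all = new ArrayList<>(ontology.getAxioms());
        List<List<OWLAxiom>> chunks = new ArrayList<>();
        for (int i = 0; i < all.size(); i += maxAxioms) {
            // Each slice holds at most maxAxioms axioms
            chunks.add(new ArrayList<>(all.subList(i, Math.min(i + maxAxioms, all.size()))));
        }
        return chunks;
    }
}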

Best For:

  • Consistent computational load
  • Predictable memory usage
  • Balanced query performance

Comparative Analysis

Similarity Scores Ranking

Rank  Strategy           Score   Answer Quality             Retrieval Consistency
1     SentenceChunking   0.7258  ❌ Incomplete              Low
2     FixedSizeChunking  0.7141  ✅ Correct                 Medium
2     ParagraphChunking  0.7141  ✅ Correct                 Medium
4     WordChunking       0.7135  ✅ Correct                 Medium
5     ModuleExtraction   0.7068  ✅ Correct                 ⭐ Highest (0.0012 variance)
6     AnnotationBased    0.7010  ✅ Correct (best context)  High
7     DepthBased         0.6967  ✅ Correct                 Medium
8     ClassBased         0.6964  ✅ Correct                 Medium
8     NamespaceBased     0.6964  ✅ Correct                 Medium

Critical Observations:

  1. Highest similarity score ≠ best answer quality (SentenceChunking fragmented entities)
  2. ModuleExtraction: Highest OWL-aware score (0.7068) + most consistent retrieval (0.0012 variance)
  3. AnnotationBased: Fine-grained grouping (39 chunks) effective when metadata quality is high
  4. Performance highly dependent on ontology design and metadata conventions

Chunk Count Analysis

Strategy           Chunks   Average Size  Distribution
SentenceChunking   76       2.6 axioms    Very unbalanced
WordChunking       58       3.4 axioms    Balanced
ParagraphChunking  58       3.4 axioms    Balanced
FixedSizeChunking  Unknown  Unknown       Appears balanced
ModuleExtraction   28       ~7 axioms     Varies by module
AnnotationBased    39       ~5 axioms     Well-balanced
ClassBased         6        32.5 axioms   Highly unbalanced (1 giant chunk)
DepthBased         3        65 axioms     Highly unbalanced
NamespaceBased     6        32.5 axioms   Highly unbalanced

Note: ModuleExtraction creates semantically complete modules (e.g., top chunk had 132 axioms with 4 seed entities), making "average size" less meaningful than for text-based strategies.

Insight: More chunks ≠ better retrieval. Balance matters more than count.


The Orphan Axiom Problem

Definition: Axioms not part of class hierarchy definitions (individual assertions, property declarations, annotations).

Prevalence: 183 of 195 axioms (93.8%) in legal ontology.

Impact on OWL-Aware Strategies:

  • ClassBased: 183-axiom orphan chunk dominates
  • DepthBased: 183-axiom non-class chunk dominates
  • AnnotationBased: Splits orphans into 39 semantic groups (effective when naming conventions exist)

Why This Matters: Real ontologies contain mostly ABox data (individuals), not TBox data (class definitions). Strategies that handle orphan axioms well will perform better in practice.


Key Findings

1. Performance Depends on Ontology Characteristics

  • ModuleExtractionChunking: Highest score (0.7068) + best consistency (0.0012 variance)
  • AnnotationBasedChunking: Fine-grained semantic grouping (39 chunks), effective when naming conventions are consistent
  • WordChunking: Highest correct-answer score (0.7135), simplest implementation
  • Optimal strategy depends on: ontology size, hierarchy depth, metadata quality, naming patterns

2. High Similarity Scores Can Mislead

  • SentenceChunking: 0.7258 score but fragmented entities
  • Chunk boundaries matter more than matching algorithms
  • Semantic completeness > mathematical similarity

3. OWL-Aware Strategies Excel in Specific Contexts

  • ModuleExtraction: Best for completeness and consistency (0.7068, 0.0012 variance)
  • AnnotationBased: Effective when naming conventions exist (requires metadata)
  • ClassBased: Ideal for hierarchy queries
  • DepthBased: Excellent for structural analysis
  • NamespaceBased: Essential for multi-ontology projects

4. The "Orphan Axiom" Challenge

  • 93.8% of axioms are non-hierarchical
  • Traditional OWL-aware strategies struggle with this
  • AnnotationBased solution: semantic naming patterns

5. Ontology Structure Influences Strategy Selection

  • Flat hierarchy (2 levels): DepthBased produces only 3 chunks
  • Single namespace: NamespaceBased reverts to ClassBased
  • Small entities (under 100 words): WordChunking and ParagraphChunking become equivalent

Recommendations

Strategy Selection Depends on Context

If ontology has consistent naming conventions (e.g., "case_", "judge_"):

  • AnnotationBasedChunking: Creates semantic groups automatically (39 chunks, 0.7010)
  • Requires well-designed metadata with prefix patterns

If ontology has complex relationships requiring complete context:

  • ModuleExtractionChunking: Highest accuracy (0.7068) + exceptional consistency (0.0012 variance)
  • Best for: Large ontologies, distributed knowledge bases

If seeking simplicity without OWL dependencies:

  • WordChunking: High performance (0.7135), no metadata required
  • ParagraphChunking: Good for documentation-style ontologies

Avoid in all cases:

  • SentenceChunking: Fragments entity names despite high scores

For Different Ontology Types

Deep Hierarchy Ontologies (5+ levels)

Use: DepthBasedChunking

  • Reveals abstraction layers
  • Good for "What are the most general concepts?" queries

Multi-Ontology Projects

Use: NamespaceBasedChunking

  • Clean separation between imported ontologies
  • Prevents cross-ontology confusion

Small, Class-Focused Ontologies

Use: ClassBasedChunking

  • Efficient for hierarchy queries
  • Works when most axioms are class definitions

Large, Complex Ontologies

Consider: ModuleExtractionChunking

  • Highest OWL-aware score (0.7068)
  • Self-contained modules with dependency closure
  • Exceptional consistency (0.0012 variance)
  • Best scalability

Performance-Critical Applications

Use: SizeBasedChunking

  • Predictable computational cost
  • Balanced load distribution

Technical Implementation Notes

Lucene Vector Store Configuration

LuceneVectorStore vectorStore = new LuceneVectorStore(
    "./lucene_index",  // File-based storage
    1024              // Max dimensions (Lucene 9.8.0 limit)
);

Chunking Strategy Selection (Java)

// In RagService.java
if (chunkingStrategy.equals("AnnotationBasedChunking")) {
    AnnotationBasedChunker chunker = new AnnotationBasedChunker();
    List<OWLChunk> chunks = chunker.chunk(ontology);

    for (OWLChunk chunk : chunks) {
        String chunkId = chunk.getId();        // "annotation-chunk-10"
        String text = chunk.toOWLString();     // Manchester syntax
        String strategy = chunk.getStrategy(); // "Annotation-Based"
        int axiomCount = chunk.getAxiomCount();

        // Create embedding and store (the metadata map is assembled elsewhere in RagService)
        List<Float> embedding = embeddingService.createEmbedding(text);
        vectorStore.upsert(chunkId, embedding, text, metadata);
    }
}

OpenAI Embedding Generation

// EmbeddingService.java
List<Float> embedding = embeddingService.createEmbedding(
    chunkText,
    "text-embedding-3-small",
    1024  // Dimensions
);

Similarity Score Calculation

  • Formula: Cosine similarity = (A · B) / (||A|| × ||B||)
  • Implementation: Built into Lucene's VectorSimilarityFunction.COSINE (a minimal version is sketched below)
  • Range: 0.0 (unrelated) to 1.0 (identical)
  • Typical scores: 0.65-0.75 for relevant chunks
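
A minimal Java version of the formula (Lucene computes this internally; shown here only for reference):

public static float cosine(float[] a, float[] b) {
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];      // A · B
        normA += a[i] * a[i];    // ||A||^2
        normB += b[i] * b[i];    // ||B||^2
    }
    return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
}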

Limitations of This Study

  1. Single ontology tested: Results specific to legal domain with consistent naming conventions
  2. Small scale: 195 axioms; performance at 10,000+ axioms unknown
  3. Single query type: "Which cases are active?" tests factual retrieval only
  4. Metadata-dependent: AnnotationBased performance assumes well-structured naming
  5. No hybrid testing: Didn't test combinations of strategies
  6. Limited query diversity: Different query types may favor different strategies

Future Research Directions

1. Hybrid Chunking Strategies

Combine multiple approaches:

  • AnnotationBased for orphan axioms
  • ClassBased for hierarchy axioms
  • Could achieve best of both worlds

2. Dynamic Strategy Selection

AI-powered strategy selection:

  • Analyze ontology structure
  • Choose optimal strategy automatically
  • Adapt based on query patterns

3. Custom Chunking Rules

Domain-specific configurations:

  • Legal: Group by case type
  • Medical: Group by diagnosis
  • E-commerce: Group by product category

4. Large-Scale Testing

Evaluate on:

  • SNOMED CT (300,000+ concepts)
  • Gene Ontology (45,000+ terms)
  • DBpedia (6M+ entities)

5. Multi-Modal Chunking

Incorporate:

  • Text content
  • Visual diagrams
  • Audio annotations
  • Temporal data

Conclusion

OWL-aware chunking strategies represent a significant advancement in RAG for ontologies. My analysis demonstrates that no single strategy is universally optimal—performance depends critically on:

  1. Ontology structure: Hierarchy depth, namespace diversity, entity relationships
  2. Metadata quality: Consistent naming conventions, annotation completeness
  3. Query patterns: Specific facts vs. structural understanding
  4. Implementation priorities: Accuracy vs. simplicity vs. performance

Key Insights

Highest Scores Don't Guarantee Best Answers: SentenceChunking achieved 0.7258 but fragmented entities, while lower-scoring strategies with semantic completeness produced correct answers.

Metadata Matters: AnnotationBasedChunking (0.7010, 39 chunks) excels only when ontologies follow consistent naming conventions. Poor metadata quality degrades it to random grouping.

Consistency vs. Peak Score: ModuleExtractionChunking (0.7068) achieved the highest OWL-aware score with remarkable consistency (0.0012 variance), making all top results equally useful.

Practical Guidance

  1. Analyze your ontology first: Structure, metadata quality, naming patterns
  2. Test multiple strategies on representative queries from your domain
  3. Prioritize semantic completeness over raw similarity scores
  4. Consider hybrid approaches for complex multi-domain ontologies
  5. Match strategy to use case: Completeness (ModuleExtraction), granularity (AnnotationBased), simplicity (Word)

The agenticmemory library's OWL-aware chunking strategies provide powerful tools for knowledge graph RAG, but their effectiveness depends on matching strategy to ontology characteristics. As ontology-based AI systems proliferate, sophisticated chunking will become increasingly important for retrieval quality.


References

  1. agenticmemory Library: https://github.com/vishalmysore/agenticmemory
  2. Apache Lucene KNN: https://lucene.apache.org/core/9_8_0/core/org/apache/lucene/document/KnnFloatVectorField.html
  3. OpenAI Embeddings API: https://platform.openai.com/docs/guides/embeddings
  4. OWL API Documentation: http://owlcs.github.io/owlapi/
  5. Protégé Plugin Development: https://protegewiki.stanford.edu/wiki/PluginDevelopment

Acknowledgments

This research was conducted using:

  • Protégé 5.6.7: Stanford Center for Biomedical Informatics Research
  • agenticmemory 0.1.1: Vishal Mysore
  • Apache Lucene 9.8.0: Apache Software Foundation
  • OpenAI GPT-4 & Embeddings: OpenAI
  • Neo4j 4.4.13: Neo4j, Inc.

Appendix: Complete Test Data

Legal Ontology Statistics

  • Total Axioms: 195
  • Classes: 17
  • Object Properties: 7
  • Data Properties: 6
  • Individuals: 15 (3 cases, 3 courts, 4 judges, 3 lawyers, 2 people, 3 evidence, 2 statutes, 1 verdict)
  • Annotations: 84 rdfs:label assertions
  • Hierarchy Depth: 2 levels maximum
  • Namespaces: 1 (http://www.semanticweb.org/legal#)

Chunk Distribution Visualization

ClassBasedChunking (6 chunks):
███████████████████████████████████████████████████████ Orphan (183)
█ Evidence (2)
█ Statute (2)
█ Court (3)
█ Case (3)
██ Person (7)

AnnotationBasedChunking (39 chunks):
███████ no-annotations (84)
███ cas (26)
███ sta (24)
██ app (17)
█ fil (12)
█ [34 other prefixes with varying axiom counts]

DepthBasedChunking (3 chunks):
███████████████████████████████████████████████████████ Non-class (183)
██ Depth-0 (15)
█ Depth-1 (2)
