hello-ediflow

Building EDIFlow - Infrastructure Layer: Parsers, Repositories & Data Packages (Part 4)

Series: Building EDIFlow - A Clean Architecture Journey in TypeScript (Part 4/6)
Reading Time: ~12 minutes


Recap — Where We Left Off

In Part 3, we built the Application Layer — Use Cases, Output Ports (interfaces), DTOs, and the UseCaseFactory. Everything depends on abstractions, nothing on implementations.

Now it's time for the Infrastructure Layer — where theory meets reality. This is where IMessageParser becomes EdifactMessageParser, where IMessageStructureRepository becomes FileBasedMessageStructureRepository, and where 126–319 JSON message definitions get loaded at runtime.

┌───────────────────────────────────────────┐
│  🔥 INFRASTRUCTURE LAYER                  │  ← You are here
│  Parsers · Builders · Repositories        │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │      Application (Use Cases, Ports) │  │
│  │  ┌───────────────────────────────┐  │  │
│  │  │      Domain (Entities)        │  │  │
│  │  └───────────────────────────────┘  │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘

Infrastructure in a Multi-Standard World — Why Three Packages?

In Clean Architecture, the Infrastructure Layer implements the interfaces defined by Domain and Application. But EDIFlow supports four standards (EDIFACT, X12, HIPAA, EANCOM) — so where does the infrastructure code live?

The answer: it's split across three infrastructure packages, each with a clear responsibility:

@ediflow/edifact              → EDIFACT-specific: parser, builder, validator, tokenizer
@ediflow/x12                  → X12-specific: parser, builder, delimiter detection
@ediflow/infrastructure-shared → Standard-agnostic: file loading, repositories, caching

Why not one big infrastructure package?

Because EDIFACT parsing and X12 parsing share zero implementation code. The delimiters are different (+:.' vs *~>), the envelope structure is different (UNB/UNZ vs ISA/GS/ST), the escape rules are different. Putting them together would create a God-package with no cohesion.

Why infrastructure-shared?

This package was primarily created for the CLI tool. The CLI needs to load message definitions for ALL standards — EDIFACT, X12, HIPAA, EANCOM — from a single entry point. The FileBasedMessageStructureRepository doesn't care whether the JSON describes an EDIFACT ORDERS or an X12 850. It can't live in @ediflow/edifact (X12 would depend on it) or @ediflow/x12 (vice versa). So it lives in a shared infrastructure package — mainly consumed by the CLI, but available to anyone who needs file-based data loading regardless of standard.

The dependency graph:

@ediflow/core  ←──  @ediflow/edifact
       ↑                    
       ├─────  @ediflow/x12
       ↑
       └─────  @ediflow/infrastructure-shared  ←──  @ediflow/cli

Every infrastructure package depends on core (for interfaces), never on each other. The CLI depends on all of them to wire everything together.

Now let's see what happens inside each package — starting with the parsing pipeline.


The Parsing Pipeline — Three Steps, Three Classes

Parsing an EDIFACT message isn't one operation — it's a pipeline:

Raw EDI String → Delimiter Detection → Tokenization → Segment Parsing → EDIMessage

Each step is a separate class implementing a separate interface. Here's why, and here's the real code.

Step 1: Delimiter Detection

EDIFACT messages can define custom delimiters via the UNA service string advice. Its 9 characters are the literal UNA followed by six service characters: component separator, element separator, decimal mark, release (escape) character, a reserved position, and the segment terminator:

export class EdifactDelimiterDetector implements IDelimiterDetector {
  private static readonly UNA_PREFIX = 'UNA';
  private static readonly UNA_LENGTH = 9;

  detect(message: string): Delimiters {
    if (this.hasUNA(message)) {
      return this.extractFromUNA(message);
    }
    // No UNA? Use EDIFACT defaults: + : . ? '
    return EdifactDelimiterDetector.DEFAULT_DELIMITERS;
  }

  private extractFromUNA(message: string): Delimiters {
    return Delimiters.custom({
      component: message.charAt(3),  // Usually ':'
      element:   message.charAt(4),  // Usually '+'
      decimal:   message.charAt(5),  // Usually '.'
      escape:    message.charAt(6),  // Usually '?'
      segment:   message.charAt(8),  // Usually "'"
    });
  }
}

This matters because real-world EDI partners sometimes use non-standard delimiters. Without this, your parser breaks on the first message from a partner who uses * instead of +.
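To see the character positions in action on a concrete message, here is a minimal standalone sketch. The `DelimiterSet` shape and the `detectDelimiters` function are hypothetical stand-ins for the `Delimiters` value object and the detector class above, not EDIFlow's actual API.

```typescript
// Hypothetical stand-in for the article's Delimiters value object.
interface DelimiterSet {
  component: string;
  element: string;
  decimal: string;
  escape: string;
  segment: string;
}

// EDIFACT defaults, used when the message carries no UNA segment.
const DEFAULT_DELIMITERS: DelimiterSet = {
  component: ':',
  element: '+',
  decimal: '.',
  escape: '?',
  segment: "'",
};

function detectDelimiters(message: string): DelimiterSet {
  // UNA is exactly 9 characters: "UNA" plus six service characters.
  if (!message.startsWith('UNA') || message.length < 9) {
    return DEFAULT_DELIMITERS;
  }
  return {
    component: message.charAt(3),
    element: message.charAt(4),
    decimal: message.charAt(5),
    escape: message.charAt(6),
    // Position 7 is reserved (normally a space) and is skipped.
    segment: message.charAt(8),
  };
}

// A partner that uses * for elements and ~ as segment terminator.
const custom = detectDelimiters('UNA:*.? ~UNB*UNOC:3~');
console.log(custom.element, custom.segment); // * ~

// No UNA: fall back to the defaults.
const fallback = detectDelimiters("UNB+UNOC:3'");
console.log(fallback.element, fallback.segment); // + '
```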

Step 2: Tokenization

The tokenizer splits the raw string into segment strings, respecting escape characters:

export class EdifactTokenizer implements ITokenizer {
  tokenize(message: string, delimiters: Delimiters): string[] {
    const segments: string[] = [];
    let currentSegment = '';
    let position = 0;

    while (position < message.length) {
      const char = message[position];

      // Escape sequence (e.g., ?+ means literal +): consume both characters so an escaped terminator cannot end the segment
      if (this.isEscapedCharacter(message, position, delimiters)) {
        currentSegment += this.consumeEscapedCharacter(message, position);
        position += 2;
        continue;
      }

      // Segment terminator found — flush current segment
      if (char === delimiters.segment) {
        if (currentSegment.trim().length > 0) {
          segments.push(currentSegment);
        }
        currentSegment = '';
        position++;
        continue;
      }

      currentSegment += char;
      position++;
    }

    return segments;
  }
}

Why a separate class? Because X12 tokenization works differently — segments end with ~, and there's no escape character. Same interface (ITokenizer), completely different implementation.
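For contrast, an X12 tokenizer behind the same kind of interface could be as small as the sketch below. It relies on the article's statement that X12 has no escape character. `Tokenizer`, `SegmentDelimiter`, and `X12TokenizerSketch` are illustrative names, not the real `@ediflow/x12` classes.

```typescript
// Hypothetical minimal shapes; the article's ITokenizer receives a full Delimiters object.
interface SegmentDelimiter {
  segment: string;
}

interface Tokenizer {
  tokenize(message: string, delimiters: SegmentDelimiter): string[];
}

class X12TokenizerSketch implements Tokenizer {
  tokenize(message: string, delimiters: SegmentDelimiter): string[] {
    // With no escape character to respect, a plain split on the terminator is enough.
    return message
      .split(delimiters.segment)
      .map(segment => segment.trim()) // drop line breaks between segments
      .filter(segment => segment.length > 0);
  }
}

const x12Segments = new X12TokenizerSketch().tokenize(
  'ST*850*0001~\nBEG*00*SA*PO123**20240101~\nSE*3*0001~',
  { segment: '~' }
);
console.log(x12Segments);
// [ 'ST*850*0001', 'BEG*00*SA*PO123**20240101', 'SE*3*0001' ]
```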

Step 3: The Message Parser — Orchestrating the Pipeline

The EdifactMessageParser ties everything together:

export class EdifactMessageParser implements IMessageParser {
  constructor(
    private readonly delimiterDetector: IDelimiterDetector,
    private readonly tokenizer: ITokenizer,
    private readonly segmentParser: EdifactSegmentParser
  ) {}

  parse(ediString: string, config?: ParserConfig): EDIMessage {
    this.validateMessage(ediString);

    const delimiters = config?.delimiters || this.delimiterDetector.detect(ediString);
    const segmentStrings = this.tokenizer.tokenize(ediString, delimiters);
    const segments = segmentStrings.map(s => this.segmentParser.parseSegment(s, delimiters));

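    const unhSegment = segments.find(s => s.tag === 'UNH');
    if (!unhSegment) {
      // Fail with a clear message instead of relying on a non-null assertion
      throw new Error('EDIFACT message contains no UNH segment');
    }
    const { version, messageType } = this.extractMetadata(unhSegment, delimiters);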

    const message = EDIMessageFactory.create({
      standard: Standard.EDIFACT,
      version,
      messageType
    });

    segments.forEach(segment => message.addSegment(segment));
    return message;
  }

  canParse(ediString: string): boolean {
    return ediString.includes('UNH');
  }
}

Notice: the parser doesn't know tokenization internals. It delegates to ITokenizer and IDelimiterDetector. If we needed a streaming parser for huge messages, we'd swap the tokenizer — zero changes to the parser.
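The third pipeline step, segment parsing, is not shown in this post. As a rough sketch of what that step has to do (split on element and component separators while honoring the release character), something like the following would work. `splitUnescaped`, `unescapeValue`, `parseSegmentSketch`, and the `ParsedSegment` shape are assumptions for illustration, not the real `EdifactSegmentParser`.

```typescript
// Hypothetical shapes for illustration only.
interface SegmentDelimiters {
  component: string;
  element: string;
  escape: string;
}

interface ParsedSegment {
  tag: string;
  elements: string[][]; // each element is a list of components
}

// Splits on `separator`, but keeps escape sequences (escape + next char) intact.
function splitUnescaped(input: string, separator: string, escape: string): string[] {
  const parts: string[] = [];
  let current = '';
  for (let i = 0; i < input.length; i++) {
    const char = input.charAt(i);
    if (char === escape && i + 1 < input.length) {
      current += char + input.charAt(i + 1);
      i++; // skip the escaped character
    } else if (char === separator) {
      parts.push(current);
      current = '';
    } else {
      current += char;
    }
  }
  parts.push(current);
  return parts;
}

// Removes release characters: "?+" becomes "+", "??" becomes "?".
function unescapeValue(value: string, escape: string): string {
  let result = '';
  for (let i = 0; i < value.length; i++) {
    if (value.charAt(i) === escape && i + 1 < value.length) {
      i++; // drop the release character, keep the character after it
    }
    result += value.charAt(i);
  }
  return result;
}

function parseSegmentSketch(raw: string, d: SegmentDelimiters): ParsedSegment {
  const parts = splitUnescaped(raw, d.element, d.escape);
  const tag = parts.shift() ?? '';
  return {
    tag,
    elements: parts.map(element =>
      splitUnescaped(element, d.component, d.escape).map(c => unescapeValue(c, d.escape))
    ),
  };
}

const parsed = parseSegmentSketch('NAD+BY+ACME?+CO::9', {
  component: ':',
  element: '+',
  escape: '?',
});
console.log(JSON.stringify(parsed));
// {"tag":"NAD","elements":[["BY"],["ACME+CO","","9"]]}
```

Unescaping happens only after both splits, so an escaped `+` or `:` inside a value is never mistaken for a separator.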


Building — The Reverse Pipeline

Building converts EDIMessage back to a raw string:

export class EdifactMessageBuilder implements IMessageBuilder {
  build(message: EDIMessage, options?: EdifactBuilderOptions): string {
    const delimiters = this.resolveDelimiters(options?.delimiters);
    const format = options?.format || OutputFormat.COMPACT;

    let result = '';
    if (options?.includeUNA) {
      result += delimiters.toUNA();  // "UNA:+.? '"
    }

    const segmentStrings = message.segments.map(seg =>
      this.serializeSegment(seg, delimiters)
    );

    return result + (format === OutputFormat.READABLE
      ? segmentStrings.join(delimiters.segment + '\n')
      : segmentStrings.join(delimiters.segment)) + delimiters.segment;
  }
}

Same interface IMessageBuilder — the X12 builder uses * and ~ instead.
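The `serializeSegment` helper is not shown above. A minimal sketch of what it needs to do, assuming a segment is a tag plus a list of elements with components, is to escape reserved characters and join with the delimiters. `OutDelimiters`, `OutSegment`, `escapeValue`, and `serializeSegmentSketch` are hypothetical names, not the builder's real internals.

```typescript
// Hypothetical shapes for illustration only.
interface OutDelimiters {
  component: string;
  element: string;
  escape: string;
  segment: string;
}

interface OutSegment {
  tag: string;
  elements: string[][]; // each element is a list of components
}

// Prefixes every separator, terminator, and release character with the release character.
function escapeValue(value: string, d: OutDelimiters): string {
  const reserved = [d.component, d.element, d.escape, d.segment];
  let escaped = '';
  for (let i = 0; i < value.length; i++) {
    const char = value.charAt(i);
    escaped += reserved.indexOf(char) !== -1 ? d.escape + char : char;
  }
  return escaped;
}

function serializeSegmentSketch(segment: OutSegment, d: OutDelimiters): string {
  const elements = segment.elements.map(components =>
    components.map(c => escapeValue(c, d)).join(d.component)
  );
  // The segment terminator is appended by the caller, as in build() above.
  return [segment.tag, ...elements].join(d.element);
}

const edifactDefaults: OutDelimiters = { component: ':', element: '+', escape: '?', segment: "'" };
const serialized = serializeSegmentSketch(
  { tag: 'NAD', elements: [['BY'], ['ACME+CO', '', '9']] },
  edifactDefaults
);
console.log(serialized); // NAD+BY+ACME?+CO::9
```

Escaping on the way out is what keeps a build followed by a parse lossless when a value contains a delimiter.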


Validation — Builder Pattern for Format-Specific Rules

Validation is composable. The builder lets you pick which rules to apply:

export class EdifactValidationServiceBuilder {
  private service = new ComposableValidationService<EDIMessage>();

  withBasicRules(): this {
    this.service.addRule(new MessageMustHaveSegmentsRule());
    this.service.addRule(new VersionStandardMustMatchRule());
    return this;
  }

  withEDIFACTRules(): this {
    this.service.addRule(new UNBMustBeFirstRule());
    this.service.addRule(new UNZMustBeLastRule());
    this.service.addRule(new EDIFACTSegmentTagFormatRule());
    return this;
  }

  withCustomRule(rule: IValidationRule<EDIMessage>): this {
    this.service.addRule(rule);
    return this;
  }

  build(): ComposableValidationService<EDIMessage> {
    return this.service;
  }

  // Factory shorthand
  static forEDIFACT(): ComposableValidationService<EDIMessage> {
    return new EdifactValidationServiceBuilder()
      .withBasicRules()
      .withEDIFACTRules()
      .build();
  }
}

Basic rules live in @ediflow/core (format-agnostic). EDIFACT rules live in @ediflow/edifact. X12 rules in @ediflow/x12. Each package only loads what it needs — tree-shaking friendly.
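The `ComposableValidationService` and `IValidationRule` internals are not shown in this post. The sketch below illustrates the composition idea with simplified, hypothetical shapes (`Rule`, `ComposableValidatorSketch`) and with a message reduced to its list of segment tags. The real rule classes operate on `EDIMessage`.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ValidationIssue {
  rule: string;
  message: string;
}

interface Rule<T> {
  name: string;
  check(target: T): string | null; // null means the rule passed
}

class ComposableValidatorSketch<T> {
  private readonly rules: Rule<T>[] = [];

  addRule(rule: Rule<T>): this {
    this.rules.push(rule);
    return this;
  }

  // Runs every rule and collects all failures instead of stopping at the first.
  validate(target: T): ValidationIssue[] {
    const issues: ValidationIssue[] = [];
    for (const rule of this.rules) {
      const message = rule.check(target);
      if (message !== null) {
        issues.push({ rule: rule.name, message });
      }
    }
    return issues;
  }
}

// A message reduced to its segment tags, to keep the example small.
type TagList = string[];

const unbFirst: Rule<TagList> = {
  name: 'UNBMustBeFirst',
  check: tags => (tags[0] === 'UNB' ? null : 'first segment must be UNB'),
};

const unzLast: Rule<TagList> = {
  name: 'UNZMustBeLast',
  check: tags => (tags[tags.length - 1] === 'UNZ' ? null : 'last segment must be UNZ'),
};

const validator = new ComposableValidatorSketch<TagList>().addRule(unbFirst).addRule(unzLast);
const issues = validator.validate(['UNB', 'UNH', 'BGM', 'UNT']);
console.log(issues);
// [ { rule: 'UNZMustBeLast', message: 'last segment must be UNZ' } ]
```

Because each rule is just an object with a `check` function, a format package can contribute its own rules without the composing service knowing anything about the format.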


The Repository — Loading 126–319 Message Definitions at Runtime

This is where the data packages come in. Each package (@ediflow/edifact-d20b, @ediflow/x12-004010, ...) contains JSON files that define message structures:

packages/edifact-d20b/data/
  segments.json       # All segment definitions
  elements.json       # All element definitions
  composites.json     # Composite element definitions
  codes/              # Code list values
  messages/
    ORDERS.json       # ORDERS message structure
    INVOIC.json       # INVOIC structure
    DESADV.json       # ...195 message types total

The FileBasedMessageStructureRepository implements IMessageStructureRepository:

export class FileBasedMessageStructureRepository implements IMessageStructureRepository {
  private contextCache = new Map<string, DataPackageContext>();

  constructor(private readonly basePath: string) {}

  async getMessageStructure(standard: string, version: string, messageType: string): Promise<MessageStructureDTO | null> {
    const messageFile = await this.loadMessageFile(standard, version, messageType);
    if (!messageFile) return null;

    const { builder, validator } = await this.getOrCreateContext(standard, version);

    // Validate data package integrity
    const issues = validator.validate(messageFile);
    if (issues.length > 0) {
      throw new DataPackageValidationError(standard, version, messageType, issues);
    }

    return builder.build(messageFile);
  }
}

Key design decisions:

  1. Lazy loading — segments, elements, composites are loaded on first access per version, then cached
  2. Validation — every message file is validated against the data package (segment references, element references)
  3. Package aliases — HIPAA maps to hipaa-x12-005010, EANCOM maps to eancom-2002
  4. Qualifier fallback — HIPAA uses files like 837-Q1.json instead of 837.json
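The decisions above can be sketched in a few small pieces. The alias targets and the 837-Q1.json example come from this post. Everything else (the function names, the lowercase alias keys, picking the first qualified file in sorted order, and the `LazyCache` class) is an assumption for illustration, not the repository's actual code.

```typescript
// Alias targets come from the article; the lowercase keys are an assumption.
const PACKAGE_ALIASES: Record<string, string> = {
  hipaa: 'hipaa-x12-005010',
  eancom: 'eancom-2002',
};

// Maps (standard, version) to a data package directory name.
function resolvePackageDir(standard: string, version: string): string {
  const key = standard.toLowerCase();
  const alias = PACKAGE_ALIASES[key];
  return alias !== undefined ? alias : `${key}-${version.toLowerCase()}`;
}

// Qualifier fallback: prefer "837.json", otherwise take a qualified file such as "837-Q1.json".
function pickMessageFile(messageType: string, available: string[]): string | null {
  const exact = `${messageType}.json`;
  if (available.indexOf(exact) !== -1) return exact;
  const qualified = available
    .filter(file => file.startsWith(`${messageType}-`) && file.endsWith('.json'))
    .sort();
  return qualified[0] ?? null;
}

// Lazy loading: the loader runs once per key, later calls reuse the cached promise.
class LazyCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly load: (key: string) => Promise<T>;

  constructor(load: (key: string) => Promise<T>) {
    this.load = load;
  }

  get(key: string): Promise<T> {
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.load(key);
      this.entries.set(key, entry);
    }
    return entry;
  }
}

console.log(resolvePackageDir('EDIFACT', 'D20B')); // edifact-d20b
console.log(resolvePackageDir('HIPAA', '005010')); // hipaa-x12-005010
console.log(pickMessageFile('837', ['835.json', '837-Q1.json', '837-Q2.json'])); // 837-Q1.json

let loadCount = 0;
const contexts = new LazyCache<string>(async key => {
  loadCount++; // stands in for reading segments.json, elements.json, composites.json
  return `context for ${key}`;
});
contexts.get('edifact-d20b');
contexts.get('edifact-d20b');
console.log(loadCount); // 1
```

Caching the promise rather than the resolved value means two concurrent requests for the same version share one load. The trade-off in this sketch is that a failed load stays cached until the entry is evicted.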

Monorepo Structure — Why 13 Packages?

packages/
  core/                   # Domain + Application (pure, no parsers)
  edifact/                # EDIFACT parser & builder
  x12/                    # X12 parser, builder & validator
  infrastructure-shared/  # FileBasedRepository, loaders, caching
  cli/                    # CLI tool (4 commands)
  edifact-d96a/           # 126 EDIFACT D.96A message definitions
  edifact-d01b/           # EDIFACT D.01B definitions
  edifact-d12a/           # EDIFACT D.12A definitions
  edifact-d20b/           # 195 EDIFACT D.20B definitions
  eancom-2002/            # 50 GS1 retail messages
  x12-004010/             # 293 X12 transaction sets
  x12-006040/             # 319 X12 transaction sets
  hipaa-x12-005010/       # 14 HIPAA transaction sets

We already covered why the infrastructure is split into edifact, x12, and infrastructure-shared above. The data packages follow the same principle: install only what you need. A user working with X12 004010 shouldn't download 195 EDIFACT D.20B definitions. Each data package is independent: small, focused, and installable on its own with npm install.


Lessons Learned

✅ Pipeline pattern for parsing — splitting into delimiter detection, tokenization, and segment parsing made each piece testable in isolation. When we added X12 support, we reused the pattern with different implementations.

✅ Data packages as separate npm packages — users install only what they need. Keeps bundle sizes small.

✅ Repository pattern with lazy loading — loading segments.json + elements.json + composites.json was expensive (~50ms). Caching per version eliminates this on subsequent calls.

✅ Builder pattern for validation — format-specific rules stay in format packages. Core remains agnostic. Adding HIPAA-specific rules? Create a new builder, compose with existing rules.

⚠️ Package aliases — HIPAA and EANCOM don't follow the {standard}-{version} naming convention. The alias map works but isn't elegant. Lesson: decide on naming conventions early.


What's Next — Part 5: Presentation Layer (CLI)

In Part 5, we'll see how the CLI ties everything together — the DI container that wires parsers + repositories + use cases, and how parse, validate, build, and export-schema commands work.

All already built and running in production. Part 5 walks through the real code.

Part 1: Why Clean Architecture?
Part 2: Domain Layer
Part 3: Application Layer
GitHub: @ediflow/core

⭐ If this series is useful — a star on GitHub helps others find it: github.com/ediflow-lib/core


How do you structure data packages in your monorepos? One giant package or many small ones? Drop a comment.
