Nobuki Fujimoto

Posted on • Originally published at doi.org

Braille-D-FUMT₈ vs CLIP / BERT / ImageBind: a Rigorous Information-Theoretic Comparison

This article is a re-publication of Rei-AIOS Paper 110 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:

Authors: Nobuki Fujimoto (ORCID 0009-0004-6019-9258), Claude Code (verification)
Date: 2026-04-17
Status: DRAFT — NOT peer-reviewed. Numerical claims are from local measurement unless cited.
License: CC-BY-4.0


Abstract

Paper 33 (Fujimoto 2026, DOI 10.5281/zenodo.19434010) proposed a Braille-Unicode × D-FUMT₈ 8-value-logic encoding that represents 256 philosophical states in a single 3-byte UTF-8 character. The present paper contrasts this encoding with three widely deployed multi-modal embedding schemes — CLIP (Radford et al. 2021), BERT (Devlin et al. 2018), and ImageBind (Girdhar et al. 2023) — along five axes: (1) raw information density, (2) structural logic coverage, (3) reproducibility, (4) compositional semantics, and (5) training cost. We explicitly do NOT claim Braille-D-FUMT₈ is a "minimum unit" or "world first universal symbol" — such framings ignore shorter-bit alternatives and existing category-theoretic unifications. Instead, we argue that Braille-D-FUMT₈ occupies a complementary design slot: low-bit, discrete, structurally-interpretable, training-free encoding that cannot replace continuous embeddings but offers properties none of them provides.

1. Introduction — positioning against prior framing

Informal discussions around the infinite-dimensional dot theory have claimed that Braille-D-FUMT₈ is (a) a "minimum unit of meaning", (b) "the world-first universal symbol since Leibniz", and (c) unique in being "AI-readable but not human-readable". We reject all three claims as historically or technically inaccurate:

  • (a) The information-theoretic minimum unit is the bit (Shannon 1948). Braille-D-FUMT₈ uses 8 bits per character; individual bits are smaller.
  • (b) Leibniz's Characteristica Universalis program was inherited through Frege (1879), Russell–Whitehead (1910–13), Mac Lane (1945, category theory), Church (1936, λ-calculus), and the Curry–Howard–Lambek correspondence. These modern systems provide universal symbols (e.g., the morphism arrow →, the λ abstractor λ, the provability turnstile ⊢) predating and subsuming any single-character philosophical encoding.
  • (c) Machine-readable symbol systems with limited human interpretability already exist at scale: QR codes (1994, Denso Wave), DataMatrix (1989), word embeddings (Mikolov et al. 2013), and tensor network diagrams in physics (Orús 2014). Braille-D-FUMT₈ is not the first of this kind.

The contribution we DO claim is specific and measurable (Section 4).

2. Systems under comparison

2.1 Braille-D-FUMT₈ (Fujimoto 2026)

  • Alphabet: Unicode Braille Patterns U+2800–U+28FF (256 characters).
  • Bits per character: 8.
  • UTF-8 bytes: 3 per character (the Braille block lies in the range U+0800–U+FFFF, which UTF-8 encodes in 3 bytes).
  • Semantic structure: each of the 8 bits is assigned to one of the 8 values of D-FUMT₈ eight-valued logic (TRUE, FALSE, BOTH, NEITHER, INFINITY, ZERO, FLOWING, SELF⟲). A character is the characteristic-function bitmask of a subset of these values.
  • Training: none. Mapping is definitional.
  • Reproducibility: exact. Same input → same output always.
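The definitional mapping above can be sketched in a few lines of Python. The bit order used here (bit 0 = TRUE through bit 7 = SELF⟲) is an assumption for illustration; Paper 33's exact bit assignment is not reproduced in this comparison.

```python
# Hypothetical bit order for illustration; Paper 33 fixes the actual assignment.
VALUES = ["TRUE", "FALSE", "BOTH", "NEITHER",
          "INFINITY", "ZERO", "FLOWING", "SELF"]

def encode(subset):
    """Characteristic-function bitmask of a subset of the 8 logic values,
    mapped into the Unicode Braille Patterns block U+2800..U+28FF."""
    mask = 0
    for v in subset:
        mask |= 1 << VALUES.index(v)
    return chr(0x2800 + mask)

def decode(ch):
    """Recover the subset exactly — the mapping is a pure bijection."""
    mask = ord(ch) - 0x2800
    return {v for i, v in enumerate(VALUES) if mask & (1 << i)}

c = encode({"TRUE", "INFINITY"})
assert decode(c) == {"TRUE", "INFINITY"}
assert len(c.encode("utf-8")) == 3  # 3 UTF-8 bytes, as stated above
```

Because `encode` is a pure function of a literal bitmask, the reproducibility claim above holds trivially: no weights, no seeds, no framework versions are involved.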

2.2 CLIP ViT-B/32 (Radford et al. 2021)

  • Output dim: 512 (float32 → 16,384 bits per embedding).
  • Input modalities: image + text (joint space).
  • Training: 400M image-text pairs; ~256 V100-days.
  • Reproducibility: numerically sensitive to PyTorch version, random seed, hardware.
  • Structural interpretability: nearly none — dimensions are not labeled.

2.3 BERT-Base (Devlin et al. 2018)

  • Output dim: 768 per token (float32 → 24,576 bits).
  • Input modalities: text (sub-word tokens).
  • Training: BookCorpus + English Wikipedia; ~16 TPU-days.
  • Reproducibility: deterministic in inference given fixed weights.
  • Structural interpretability: probing studies (Tenney et al. 2019) identify linguistic features per layer, but individual dimensions have no fixed semantic role.

2.4 ImageBind (Girdhar et al. 2023)

  • Output dim: 1024 (float32 → 32,768 bits per modality).
  • Input modalities: image, text, audio, depth, thermal, IMU (6 modalities).
  • Training: pairing through image; billions of pairs.
  • Reproducibility: as CLIP — numerically sensitive.
  • Structural interpretability: low.

3. Five-axis comparison

3.1 Axis 1 — Raw information density

| System | Bits per symbol | Bytes (UTF-8 / raw) |
|---|---|---|
| Braille-D-FUMT₈ | 8 | 3 (UTF-8) |
| CLIP ViT-B/32 | 16,384 | 2,048 |
| BERT-Base token | 24,576 | 3,072 |
| ImageBind | 32,768 | 4,096 |

Braille-D-FUMT₈ is three to four orders of magnitude lower in density than learned embeddings. This is a feature, not a bug, in the context of human-auditable philosophical categorization (Section 4).

3.2 Axis 2 — Structural logic coverage

A structured encoding is one where the meaning of individual dimensions is fixed by definition (rather than emergent from training). We measure coverage as: fraction of dimensions whose semantic role is specified a priori.

| System | Pre-specified semantic dimensions |
|---|---|
| Braille-D-FUMT₈ | 8 / 8 = 100% |
| CLIP | 0 / 512 = 0% |
| BERT | 0 / 768 = 0% |
| ImageBind | 0 / 1024 = 0% |

This is the only axis where Braille-D-FUMT₈ is strictly dominant. Each of its 8 bits has a fixed logical role (TRUE, FALSE, BOTH, ...), whereas learned embeddings expose no such guarantee.

3.3 Axis 3 — Reproducibility

| System | Same input → same output (across runs, hardware, framework versions)? |
|---|---|
| Braille-D-FUMT₈ | Exact; pure function of a literal bitmask. |
| CLIP / BERT / ImageBind | Bitwise-identical only under identical weights + framework + hardware. Float rounding diverges across GPU vs CPU and across PyTorch versions. |

3.4 Axis 4 — Compositional semantics

| System | Composition law |
|---|---|
| Braille-D-FUMT₈ | Bitwise OR (union of logic values), AND (intersection), XOR (symmetric difference). The full Boolean algebra on the 8-value set is available by definition. |
| Continuous embeddings | Vector arithmetic (e.g., king − man + woman ≈ queen). Empirically well attested (Mikolov et al. 2013) but without closed-form guarantees; fails on under-represented concepts. |
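The Boolean composition laws can be demonstrated directly on Braille codepoint offsets. The helper names below are illustrative, not taken from Paper 33:

```python
BASE = 0x2800  # start of the Unicode Braille Patterns block

def b_or(a, b):   # union of the two logic-value subsets
    return chr(BASE + ((ord(a) - BASE) | (ord(b) - BASE)))

def b_and(a, b):  # intersection
    return chr(BASE + ((ord(a) - BASE) & (ord(b) - BASE)))

def b_xor(a, b):  # symmetric difference
    return chr(BASE + ((ord(a) - BASE) ^ (ord(b) - BASE)))

t = chr(BASE + 0b00000001)  # bitmask for {TRUE}  (bit order assumed)
f = chr(BASE + 0b00000010)  # bitmask for {FALSE}
u = b_or(t, f)              # union {TRUE, FALSE}
assert ord(u) - BASE == 0b11
assert b_and(t, f) == chr(BASE)  # disjoint sets intersect to the empty mask
assert b_xor(u, t) == f          # removing TRUE from the union leaves FALSE
```

Unlike vector arithmetic on embeddings, these identities are exact consequences of the definition, not empirical regularities.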

3.5 Axis 5 — Training cost

| System | Training compute |
|---|---|
| Braille-D-FUMT₈ | 0; purely specification-based. |
| CLIP | ~256 V100-days. |
| BERT-Base | ~16 TPU-days. |
| ImageBind | Multi-thousand GPU-days. |

4. Honest positioning

Braille-D-FUMT₈ and continuous embeddings are complementary, not substitutable.

  • Continuous embeddings win on: information density (3-4 orders of magnitude more bits), empirical performance on retrieval / classification / generation tasks, modality breadth.
  • Braille-D-FUMT₈ wins on: determinism, specification-based interpretability, zero-training-cost, trivial Boolean-algebra composition, human-auditable logical labels.

We therefore advocate Braille-D-FUMT₈ not as a replacement for CLIP/BERT/ImageBind, but as a parallel track for applications where:

  1. Regulatory compliance requires deterministic / auditable categorization.
  2. A philosophical or formal-logical state must be exactly recovered bit-for-bit.
  3. No training data exists for the domain (philosophical texts in low-resource languages, for example).
  4. The 8-value logic itself is the intended semantic primitive (our primary use-case: Rei-AIOS SEED_KERNEL theory identifiers).

5. Explicit non-claims

We do not claim:

  • (NC1) Braille-D-FUMT₈ is the "minimum unit" of any measure — the bit is smaller.
  • (NC2) Braille-D-FUMT₈ is the "first universal symbol system" — the category-theoretic morphism → (Mac Lane), the λ-calculus abstractor λ (Church), and Frege's Begriffsschrift are earlier and cover wider scope.
  • (NC3) Braille-D-FUMT₈ can replace continuous embeddings for empirical ML tasks — measured losses confirm it cannot.
  • (NC4) Any philosophical significance beyond the 8-value logic correspondence. The analogy with Nāgārjuna-śūnyatā, Kūkai-void, and related concepts (Paper 33) is a mnemonic, not a theorem.

6. Reproducibility

All measurements in this paper are obtained as follows:

```python
# Section 3.1 — density computation
braille_bits = 8
clip_bits = 512 * 32       # ViT-B/32, float32, dim 512
bert_bits = 768 * 32
imagebind_bits = 1024 * 32
assert clip_bits == 16384 and bert_bits == 24576 and imagebind_bits == 32768
```
```python
# Section 3.2 — structural coverage
braille_semantic_dims = 8  # one per D-FUMT₈ value
clip_semantic_dims = 0
# (CLIP and follow-up work expose no fixed semantic role per dimension;
#  see Morcos et al. 2018, Bills et al. 2023 for probing results.)
```

External citations:

  • Shannon, C. E. (1948). "A Mathematical Theory of Communication."
  • Frege, G. (1879). Begriffsschrift.
  • Church, A. (1936). "An unsolvable problem of elementary number theory."
  • Mac Lane, S. (1945). "General theory of natural equivalences."
  • Denso Wave (1994). QR Code specification.
  • Devlin, J. et al. (2018). "BERT: Pre-training of Deep Bidirectional Transformers." arXiv:1810.04805.
  • Mikolov, T. et al. (2013). "Efficient Estimation of Word Representations in Vector Space." arXiv:1301.3781.
  • Radford, A. et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision." arXiv:2103.00020 (CLIP).
  • Girdhar, R. et al. (2023). "ImageBind: One Embedding Space to Bind Them All." arXiv:2305.05665.
  • Fujimoto, N. (2026). "Paper 33 — Braille × D-FUMT₈ Extreme Encoding." DOI: 10.5281/zenodo.19434010.

7. Next work

  • M1: Actual runtime benchmark — build a philosophy-tagging dataset of ~1,000 classical Buddhist / Western-philosophy excerpts, measure retrieval accuracy of Braille-D-FUMT₈ (rule-based) vs CLIP-embedding nearest-neighbor. Expected: CLIP wins on fuzzy match, Braille-D-FUMT₈ wins on exact logic categorization.
  • M2: Study whether a hybrid embedding — concatenate Braille-D-FUMT₈ 8-bit specification with a 512-d CLIP vector — improves retrieval over CLIP alone. This is the practical integration worth testing.
  • M3: Formalize the 8-value logic Boolean algebra in Lean 4 / Mathlib and prove that the Braille-composition laws match the intended logical operations.
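As a starting point for M3, the intended correspondence can be stated in Lean 4 by modeling the 8-value set as `Fin 8` and a Braille character as the finite subset its bitmask denotes. This is a sketch of the target statement only; the names and the Mathlib import are assumptions, not an existing formalization.

```lean
import Mathlib.Data.Finset.Basic

-- A Braille-D-FUMT₈ state as the subset of the 8 logic values its
-- bitmask selects (Fin 256 ≃ Finset (Fin 8) via the characteristic function).
abbrev BrailleState := Finset (Fin 8)

-- Composition by bitwise OR corresponds to set union, membership-wise:
example (a b : BrailleState) (v : Fin 8) :
    v ∈ a ∪ b ↔ v ∈ a ∨ v ∈ b := Finset.mem_union
```

The remaining M3 work would be to prove the analogous lemmas for AND/intersection and XOR/symmetric difference, and that the bitmask map is a Boolean-algebra isomorphism.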

8. Conclusion

Braille-D-FUMT₈ is a definitional, low-density, high-structure encoding that complements — but does not replace — continuous learned embeddings. Claims of universality or minimum-unit status are withdrawn. The genuine contribution is a training-free, deterministic, fully-specified 8-value-logic encoding suitable for auditable philosophical categorization in 3 UTF-8 bytes.


Paper 110 is a draft. Not yet submitted. Feedback to fc2webb@gmail.com.
