Nobuki Fujimoto

Posted on • Originally published at doi.org

Braille-D-FUMT₈ vs CLIP / BERT / ImageBind: a Rigorous Information-Theoretic Comparison

This article is a re-publication of Rei-AIOS Paper 110 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:

Authors: Nobuki Fujimoto (ORCID 0009-0004-6019-9258), Claude Code (verification)
Date: 2026-04-17
Status: DRAFT — NOT peer-reviewed. Numerical claims are from local measurement unless cited.
License: CC-BY-4.0


Abstract

Paper 33 (Fujimoto 2026, DOI 10.5281/zenodo.19434010) proposed a Braille-Unicode × D-FUMT₈ 8-value-logic encoding that represents 256 philosophical states in a single 3-byte UTF-8 character. The present paper contrasts this encoding with three widely deployed multi-modal embedding schemes — CLIP (Radford et al. 2021), BERT (Devlin et al. 2018), and ImageBind (Girdhar et al. 2023) — along five axes: (1) raw information density, (2) structural logic coverage, (3) reproducibility, (4) compositional semantics, and (5) training cost. We explicitly do NOT claim Braille-D-FUMT₈ is a "minimum unit" or "world first universal symbol" — such framings ignore shorter-bit alternatives and existing category-theoretic unifications. Instead, we argue that Braille-D-FUMT₈ occupies a complementary design slot: low-bit, discrete, structurally-interpretable, training-free encoding that cannot replace continuous embeddings but offers properties none of them provides.

1. Introduction — positioning against prior framing

Informal discussions around the infinite-dimensional dot theory have claimed that Braille-D-FUMT₈ is (a) a "minimum unit of meaning", (b) "the world-first universal symbol since Leibniz", and (c) unique in being "AI-readable but not human-readable". We reject all three claims as historically or technically inaccurate:

  • (a) The information-theoretic minimum unit is the bit (Shannon 1948). Braille-D-FUMT₈ uses 8 bits per character; individual bits are smaller.
  • (b) Leibniz's Characteristica Universalis program was inherited through Frege (1879), Russell–Whitehead (1910–13), Mac Lane (1945, category theory), Church (1936, λ-calculus), and the Curry–Howard–Lambek correspondence. These modern systems provide universal symbols (e.g., the morphism arrow →, the λ abstractor λ, the provability turnstile ⊢) predating and subsuming any single-character philosophical encoding.
  • (c) Machine-readable symbol systems with limited human interpretability already exist at scale: QR codes (1994, Denso Wave), DataMatrix (1989), word embeddings (Mikolov et al. 2013), and tensor network diagrams in physics (Orús 2014). Braille-D-FUMT₈ is not the first of this kind.

The contribution we DO claim is specific and measurable (Section 4).

2. Systems under comparison

2.1 Braille-D-FUMT₈ (Fujimoto 2026)

  • Alphabet: Unicode Braille Patterns U+2800–U+28FF (256 characters).
  • Bits per character: 8.
  • UTF-8 bytes: 3 per character (the Braille block lies in the range U+0800–U+FFFF, which UTF-8 encodes in 3 bytes).
  • Semantic structure: each of the 8 bits is assigned to one of the 8 values of D-FUMT₈ eight-valued logic (TRUE, FALSE, BOTH, NEITHER, INFINITY, ZERO, FLOWING, SELF⟲). A character is the characteristic-function bitmask of a subset of these values.
  • Training: none. Mapping is definitional.
  • Reproducibility: exact. Same input → same output always.
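The definitional mapping above can be sketched in a few lines of Python. The bit order used here (bit 0 = TRUE through bit 7 = SELF⟲) is an assumption for illustration; Paper 33's exact bit assignment is not reproduced in this comparison.

```python
# Hypothetical bit order for illustration; Paper 33 fixes the actual assignment.
VALUES = ["TRUE", "FALSE", "BOTH", "NEITHER",
          "INFINITY", "ZERO", "FLOWING", "SELF"]

def encode(subset):
    """Characteristic-function bitmask of a subset of the 8 logic values,
    mapped into the Unicode Braille Patterns block U+2800..U+28FF."""
    mask = 0
    for v in subset:
        mask |= 1 << VALUES.index(v)
    return chr(0x2800 + mask)

def decode(ch):
    """Recover the subset exactly — the mapping is a pure bijection."""
    mask = ord(ch) - 0x2800
    return {v for i, v in enumerate(VALUES) if mask & (1 << i)}

c = encode({"TRUE", "INFINITY"})
assert decode(c) == {"TRUE", "INFINITY"}
assert len(c.encode("utf-8")) == 3  # 3 UTF-8 bytes, as stated above
```

Because `encode` is a pure function of a literal bitmask, the reproducibility claim above holds trivially: no weights, no seeds, no framework versions are involved.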

2.2 CLIP ViT-B/32 (Radford et al. 2021)

  • Output dim: 512 (float32 → 16,384 bits per embedding).
  • Input modalities: image + text (joint space).
  • Training: 400M image-text pairs; ~256 V100-days.
  • Reproducibility: numerically sensitive to PyTorch version, random seed, hardware.
  • Structural interpretability: nearly none — dimensions are not labeled.

2.3 BERT-Base (Devlin et al. 2018)

  • Output dim: 768 per token (float32 → 24,576 bits).
  • Input modalities: text (sub-word tokens).
  • Training: BookCorpus + English Wikipedia; ~16 TPU-days.
  • Reproducibility: deterministic in inference given fixed weights.
  • Structural interpretability: probing studies (Tenney et al. 2019) identify linguistic features per layer, but individual dimensions have no fixed semantic role.

2.4 ImageBind (Girdhar et al. 2023)

  • Output dim: 1024 (float32 → 32,768 bits per modality).
  • Input modalities: image, text, audio, depth, thermal, IMU (6 modalities).
  • Training: pairing through image; billions of pairs.
  • Reproducibility: as CLIP — numerically sensitive.
  • Structural interpretability: low.

3. Five-axis comparison

3.1 Axis 1 — Raw information density

| System | Bits per symbol | Bytes (UTF-8 / raw) |
|---|---|---|
| Braille-D-FUMT₈ | 8 | 3 (UTF-8) |
| CLIP ViT-B/32 | 16,384 | 2,048 |
| BERT-Base token | 24,576 | 3,072 |
| ImageBind | 32,768 | 4,096 |

Braille-D-FUMT₈ is three to four orders of magnitude lower in density than learned embeddings. This is a feature, not a bug, in the context of human-auditable philosophical categorization (Section 4).

3.2 Axis 2 — Structural logic coverage

A structured encoding is one where the meaning of individual dimensions is fixed by definition (rather than emergent from training). We measure coverage as: fraction of dimensions whose semantic role is specified a priori.

| System | Pre-specified semantic dimensions |
|---|---|
| Braille-D-FUMT₈ | 8 / 8 = 100% |
| CLIP | 0 / 512 = 0% |
| BERT | 0 / 768 = 0% |
| ImageBind | 0 / 1024 = 0% |

This is the only axis where Braille-D-FUMT₈ is strictly dominant. Each of its 8 bits has a fixed logical role (TRUE, FALSE, BOTH, ...), whereas learned embeddings expose no such guarantee.

3.3 Axis 3 — Reproducibility

| System | Same input → same output (across runs, hardware, framework versions)? |
|---|---|
| Braille-D-FUMT₈ | Exact; pure function of a literal bitmask. |
| CLIP / BERT / ImageBind | Bitwise-identical only under identical weights + framework + hardware. Float rounding diverges across GPU vs CPU and across PyTorch versions. |

3.4 Axis 4 — Compositional semantics

| System | Composition law |
|---|---|
| Braille-D-FUMT₈ | Bitwise OR (union of logic values), AND (intersection), XOR (symmetric difference). The full Boolean algebra on the 8-value set is available by definition. |
| Continuous embeddings | Vector arithmetic (e.g., king − man + woman ≈ queen). Empirically well attested (Mikolov et al. 2013) but without closed-form guarantees; fails on under-represented concepts. |
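The Boolean composition laws can be demonstrated directly on Braille codepoint offsets. The helper names below are illustrative, not taken from Paper 33:

```python
BASE = 0x2800  # start of the Unicode Braille Patterns block

def b_or(a, b):   # union of the two logic-value subsets
    return chr(BASE + ((ord(a) - BASE) | (ord(b) - BASE)))

def b_and(a, b):  # intersection
    return chr(BASE + ((ord(a) - BASE) & (ord(b) - BASE)))

def b_xor(a, b):  # symmetric difference
    return chr(BASE + ((ord(a) - BASE) ^ (ord(b) - BASE)))

t = chr(BASE + 0b00000001)  # bitmask for {TRUE}  (bit order assumed)
f = chr(BASE + 0b00000010)  # bitmask for {FALSE}
u = b_or(t, f)              # union {TRUE, FALSE}
assert ord(u) - BASE == 0b11
assert b_and(t, f) == chr(BASE)  # disjoint sets intersect to the empty mask
assert b_xor(u, t) == f          # removing TRUE from the union leaves FALSE
```

Unlike vector arithmetic on embeddings, these identities are exact consequences of the definition, not empirical regularities.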

3.5 Axis 5 — Training cost

| System | Training compute |
|---|---|
| Braille-D-FUMT₈ | 0; purely specification-based. |
| CLIP | ~256 V100-days. |
| BERT-Base | ~16 TPU-days. |
| ImageBind | Multi-thousand GPU-days. |

4. Honest positioning

Braille-D-FUMT₈ and continuous embeddings are complementary, not substitutable.

  • Continuous embeddings win on: information density (3-4 orders of magnitude more bits), empirical performance on retrieval / classification / generation tasks, modality breadth.
  • Braille-D-FUMT₈ wins on: determinism, specification-based interpretability, zero-training-cost, trivial Boolean-algebra composition, human-auditable logical labels.

We therefore advocate Braille-D-FUMT₈ not as a replacement for CLIP/BERT/ImageBind, but as a parallel track for applications where:

  1. Regulatory compliance requires deterministic / auditable categorization.
  2. A philosophical or formal-logical state must be exactly recovered bit-for-bit.
  3. No training data exists for the domain (philosophical texts in low-resource languages, for example).
  4. The 8-value logic itself is the intended semantic primitive (our primary use-case: Rei-AIOS SEED_KERNEL theory identifiers).

5. Explicit non-claims

We do not claim:

  • (NC1) Braille-D-FUMT₈ is the "minimum unit" of any measure — the bit is smaller.
  • (NC2) Braille-D-FUMT₈ is the "first universal symbol system" — the category-theoretic morphism → (Mac Lane), the λ-calculus abstractor λ (Church), and Frege's Begriffsschrift are earlier and cover wider scope.
  • (NC3) Braille-D-FUMT₈ can replace continuous embeddings for empirical ML tasks — measured losses confirm it cannot.
  • (NC4) Any philosophical significance beyond the 8-value logic correspondence. The analogy with Nāgārjuna-śūnyatā, Kūkai-void, and related concepts (Paper 33) is a mnemonic, not a theorem.

6. Reproducibility

All measurements in this paper are obtained as follows:

```python
# Section 3.1 — density computation
braille_bits = 8
clip_bits = 512 * 32       # ViT-B/32, float32, dim 512
bert_bits = 768 * 32
imagebind_bits = 1024 * 32
assert clip_bits == 16384 and bert_bits == 24576 and imagebind_bits == 32768
```
```python
# Section 3.2 — structural coverage
braille_semantic_dims = 8  # one per D-FUMT₈ value
clip_semantic_dims = 0
# (CLIP and follow-up work expose no fixed semantic role per dimension;
#  see Morcos et al. 2018, Bills et al. 2023 for probing results.)
```

External citations:

  • Shannon, C. E. (1948). "A Mathematical Theory of Communication."
  • Frege, G. (1879). Begriffsschrift.
  • Church, A. (1936). "An unsolvable problem of elementary number theory."
  • Mac Lane, S. (1945). "General theory of natural equivalences."
  • Denso Wave (1994). QR Code specification.
  • Devlin, J. et al. (2018). "BERT: Pre-training of Deep Bidirectional Transformers." arXiv:1810.04805.
  • Mikolov, T. et al. (2013). "Efficient Estimation of Word Representations in Vector Space." arXiv:1301.3781.
  • Radford, A. et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision." arXiv:2103.00020 (CLIP).
  • Girdhar, R. et al. (2023). "ImageBind: One Embedding Space to Bind Them All." arXiv:2305.05665.
  • Fujimoto, N. (2026). "Paper 33 — Braille × D-FUMT₈ Extreme Encoding." DOI: 10.5281/zenodo.19434010.

7. Next work

  • M1: Actual runtime benchmark — build a philosophy-tagging dataset of ~1,000 classical Buddhist / Western-philosophy excerpts, measure retrieval accuracy of Braille-D-FUMT₈ (rule-based) vs CLIP-embedding nearest-neighbor. Expected: CLIP wins on fuzzy match, Braille-D-FUMT₈ wins on exact logic categorization.
  • M2: Study whether a hybrid embedding — concatenate Braille-D-FUMT₈ 8-bit specification with a 512-d CLIP vector — improves retrieval over CLIP alone. This is the practical integration worth testing.
  • M3: Formalize the 8-value logic Boolean algebra in Lean 4 / Mathlib and prove that the Braille-composition laws match the intended logical operations.
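As a starting point for M3, the intended correspondence can be stated in Lean 4 by modeling the 8-value set as `Fin 8` and a Braille character as the finite subset its bitmask denotes. This is a sketch of the target statement only; the names and the Mathlib import are assumptions, not an existing formalization.

```lean
import Mathlib.Data.Finset.Basic

-- A Braille-D-FUMT₈ state as the subset of the 8 logic values its
-- bitmask selects (Fin 256 ≃ Finset (Fin 8) via the characteristic function).
abbrev BrailleState := Finset (Fin 8)

-- Composition by bitwise OR corresponds to set union, membership-wise:
example (a b : BrailleState) (v : Fin 8) :
    v ∈ a ∪ b ↔ v ∈ a ∨ v ∈ b := Finset.mem_union
```

The remaining M3 work would be to prove the analogous lemmas for AND/intersection and XOR/symmetric difference, and that the bitmask map is a Boolean-algebra isomorphism.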

8. Conclusion

Braille-D-FUMT₈ is a definitional, low-density, high-structure encoding that complements — but does not replace — continuous learned embeddings. Claims of universality or minimum-unit status are withdrawn. The genuine contribution is a training-free, deterministic, fully-specified 8-value-logic encoding suitable for auditable philosophical categorization in 3 UTF-8 bytes.


Paper 110 is a draft. Not yet submitted. Feedback to fc2webb@gmail.com.
