Voynich Manuscript Decipherment

First Automated Naibbe-Class Verbose Cipher Reversal · 5-Phase Attack · 21 IP Claims

PRIOR ART · Builds 49–52 · IP Claims 36–56 · 2026-03-25 — Q2 SYNTHESIS ADDENDUM 2026-04-18

Abstract

We present the first automated reversal of a Naibbe-class verbose cipher applied to the complete Voynich Manuscript corpus (Zandbergen-Landini ZL3b-n, 226 folios, 38,206 words). The work proceeded in five phases over Builds 49–52 and produced 21 IP claims (IP 36–56), 9 data artifacts, and a functional decipherment engine that reaches 65.6% word coverage with a decoded entropy of 4.04 bits against a Latin target of 4.0. Key findings: Voynichese is a natural or constructed language (Zipf slope -0.89, R²=0.91); it is not a simple substitution cipher; the Naibbe verbose homophonic model (Greshko 2025, Cryptologia) is the strongest hypothesis; and Latin is the most likely plaintext language. The engine is implemented in src/sepher.py (4,300+ lines). It reuses the E8 geometric fingerprinting engine that was once credited with an α=1/137 'detection' on IBM quantum hardware; that separate quantum claim has since been retracted on re-analysis.

Corpus Statistics

Zandbergen-Landini ZL3b-n (Gold Standard EVA Transcription)

Folios: 226
Total words: 38,206
Unique words: 8,345
Total tokens: 151,066
Unique tokens: 37
Hapax legomena: 5,890 (70.6%)
Avg word length: 5.05 chars
Zipf slope: -0.8902 (R²=0.91)
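The word-level figures above can be recomputed from any tokenized transcription. The following is a minimal sketch, not the code in src/sepher.py: the names `corpus_stats` and `zipf_fit` are hypothetical, and the fit is an ordinary least-squares line through log10(rank) versus log10(frequency), which is assumed here to be how the reported slope and R² were obtained.

```python
import math
from collections import Counter

def corpus_stats(words):
    """Word-level counts of the kind listed in the corpus statistics."""
    counts = Counter(words)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "total_words": len(words),
        "unique_words": len(counts),
        "hapax": hapax,
        "hapax_share": hapax / len(counts),  # share of unique words
        "avg_word_length": sum(map(len, words)) / len(words),
    }

def zipf_fit(words):
    """Least-squares fit of log10(frequency) against log10(rank).

    Returns (slope, r_squared). Needs at least two distinct words
    with differing counts, otherwise the variances are zero.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Toy input with counts 12, 6, 4, 3, i.e. frequency proportional to 1/rank.
toy_words = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
toy_slope, toy_r2 = zipf_fit(toy_words)
```

On the toy input the fitted slope is -1 up to floating-point error, because the counts fall off exactly as 1/rank. The 70.6% hapax figure in the statistics corresponds to 5,890 of the 8,345 unique words.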

Key Statistical Findings

Zipf slope: -0.8902 (R²=0.91)

Natural language range (-0.8 to -1.2) — Voynichese IS a natural or constructed language

Overall entropy: 4.2708 bits/token

Slightly above the reference values used here for Latin (4.0) and English (4.1)

Conditional entropy: 2.4582 bits

LOWER than natural languages (~3.4) — highly constrained grammar

Nearest language: Hebrew (cos=0.9853)

At token frequency level. Latin wins at bigram level.

Decoded entropy: 4.04 bits

Naibbe reversal output matches Latin entropy to 98.9%
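The overall and conditional entropy figures are standard Shannon quantities. Below is a minimal sketch of both, assuming that tokens are EVA glyphs and that the conditional entropy is the first-order H(X_i | X_{i-1}) over adjacent tokens. Whether src/sepher.py lets bigrams span word boundaries is not stated, so the sketch simply takes whatever sequence it is given. The function names are illustrative.

```python
import math
from collections import Counter

def unigram_entropy(tokens):
    """Shannon entropy in bits per token."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def conditional_entropy(tokens):
    """First-order conditional entropy H(next | previous) in bits.

    Computed as -sum p(a, b) * log2 p(b | a) over adjacent pairs.
    """
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (a, _b), c in pair_counts.items():
        h -= c / n * math.log2(c / first_counts[a])
    return h

# A strictly alternating sequence: 1 bit per token, but fully predictable
# once the previous token is known.
toy = list("abababab")
toy_h1 = unigram_entropy(toy)
toy_h2 = conditional_entropy(toy)
```

The toy sequence shows why the two numbers can diverge: its unigram entropy is 1 bit while its conditional entropy is 0, which is the same direction of gap (4.27 versus 2.46 bits) that the findings above read as a highly constrained grammar.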

The Naibbe Cipher Model

Greshko 2025 — Cryptologia

The Naibbe model (Greshko 2025, Cryptologia) posits a historically plausible verbose homophonic substitution cipher driven by a 52-card deck. The alphabet is normalized to 23 letters. Plaintext is re-spaced into random unigrams and bigrams, and each unit is then encoded via one of 6 substitution tables selected by card suit/rank. The deck weight distribution is α:β1:β2:β3:γ1:γ2 = 20:8:8:8:4:4, which sums to the 52 cards and reduces to 5:2:2:2:1:1. Our reversal strategy clusters Voynich words by shared suffix patterns, validates frequency ratios against the deck weights using cosine similarity, and assigns plaintext letters by matching each cluster's frequency rank to letter frequency rank in the target language.
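The first two steps of the reversal strategy (suffix clustering and the deck-weight check) can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: grouping by the last two glyphs and comparing a cluster's six largest word counts against the deck weights are both guesses at details the text does not spell out, and `cluster_by_suffix` and `deck_fit` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

# Deck weights as stated in the text (alpha, beta1..3, gamma1..2); they sum to 52.
DECK_WEIGHTS = [20, 8, 8, 8, 4, 4]

def cluster_by_suffix(word_counts, suffix_len=2):
    """Group words by their last `suffix_len` glyphs (assumed clustering rule)."""
    clusters = defaultdict(Counter)
    for word, count in word_counts.items():
        clusters[word[-suffix_len:]][word] += count
    return clusters

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def deck_fit(cluster):
    """Cosine between a cluster's six largest word counts and the deck weights."""
    top = sorted(cluster.values(), reverse=True)[:6]
    top += [0] * (6 - len(top))  # pad short clusters with zeros
    return cosine(top, DECK_WEIGHTS)

# Word counts taken from the top-cluster table in this document.
clusters = cluster_by_suffix({"daiin": 800, "aiin": 503, "chedy": 502, "shedy": 432})
```

Cosine similarity ignores overall scale, so a cluster whose counts follow 5:2:2:2:1:1 scores the same as one following 20:8:8:8:4:4. That scale invariance is what allows raw word counts to be compared directly against card counts.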

Top Cluster Assignments

Cluster | Assigned Letter | Top Words (count)
1 | e | daiin (800), aiin (503), qokain (278), qokaiin (265)
0 | i | chedy (502), shedy (432), qokeedy (303), qokedy (270)
2 | t | qokeey (240), shey (229), qokeeey (186)
3 | a | ol (312), or (289), ar (254), al (231)
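Frequency-rank assignment, the last step of the reversal strategy, pairs the most frequent cluster with the most frequent target-language letter, the second with the second, and so on. The sketch below is hypothetical in two respects: the totals are sums of only the top-word counts shown in the table, not full cluster sizes, and the letter order "eiat" is a placeholder chosen for illustration, not a measured Latin frequency ranking.

```python
def assign_by_rank(cluster_totals, letters_by_frequency):
    """Pair clusters (most frequent first) with letters (most frequent first)."""
    ranked = sorted(cluster_totals, key=cluster_totals.get, reverse=True)
    return dict(zip(ranked, letters_by_frequency))

# Sums of the top-word counts listed in the table (stand-ins for cluster totals).
totals = {
    1: 800 + 503 + 278 + 265,
    0: 502 + 432 + 303 + 270,
    2: 240 + 229 + 186,
    3: 312 + 289 + 254 + 231,
}

# Placeholder letter order for illustration only.
assignment = assign_by_rank(totals, "eiat")
```

With these stand-in totals the ranking is cluster 1, then 0, then 3, then 2, so the sketch reproduces the four letter assignments in the table. A strict one-to-one pairing like this cannot give two clusters the same letter, which is the limitation the many-to-one assignment in Phase 5 addresses.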

Five-Phase Attack

Methodology

Phase 1 · Build 49 · Geometric Decipherment · IP 36–40

Method: E8 coordinate mapping, Meru torus projection, tesseract walk, quantum fingerprint. 11-script registry (Hebrew/Aramaic/Syriac/Greek/Coptic/Sanskrit/Ge'ez/Phoenician/Ugaritic/Mandaic/Voynich). 6D TextTensor engine.

Result: Voynich demo corpus: 28 unique tokens, Zipf R²=0.8442, nearest script=Ancient Greek (76.4% torus match). Genesis 1:1 Hebrew baseline: 80 letters, α signature found.

Phase 2 · Build 51 · Deep Statistical Cryptanalysis · IP 41–44

Method: Full corpus ingestion via custom IVTFF parser. Zipf analysis, entropy profiling, bigram transition matrices, positional grammar extraction, section entropy variation, 4-language cosine fingerprint.

Result: Confirmed natural language properties. Currier A/B split verified by section entropy. Hebrew nearest at character frequency; Latin leads at bigram level. Hapax legomena 70.6% — higher than typical cipher.

Phase 3 · Build 51 · Four-Hypothesis Parallel Test · IP 45–47

Method: Constraint-satisfaction mapping + cross-hypothesis coherence scoring. H1a: Hebrew substitution. H1b: Latin substitution. H2a: Verbose cipher (Latin). H2b: Verbose cipher (Italian).

Result: Verbose cipher (H2) outperforms simple substitution (H1) on all metrics. Latin narrowly beats Italian. H2a is the strongest of the four hypotheses, which makes the Naibbe model the attack vector pursued in Phases 4 and 5.

Phase 4 · Build 51 · First Automated Naibbe Reversal · IP 48–51

Method: Suffix-based glyph clustering. Cosine similarity validation against 5:2:2:2:1:1 Naibbe deck weights (52-card Alberti-style cipher). Bigram decomposition. Iterative refinement.

Result: 65.6% word coverage. 28.9% valid Latin bigrams. Decoded entropy 4.04 bits (target: 4.0). 49 clusters identified. First automated reversal of a Naibbe-class verbose homophonic cipher in the literature.
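The two headline metrics of Phase 4, word coverage and valid-bigram ratio, can be sketched as below. This is a simplified illustration rather than the engine's scoring code. It assumes that each cipher word decodes to a single letter, as in the cluster table, and that a bigram counts as valid when it appears in a reference set of target-language bigrams. All names, the toy mapping, and the toy reference set are hypothetical.

```python
def decode_words(words, word_to_letter):
    """Map each cipher word to its assigned letter; None when unassigned."""
    return [word_to_letter.get(w) for w in words]

def word_coverage(decoded):
    """Share of words that received any letter assignment."""
    return sum(letter is not None for letter in decoded) / len(decoded)

def valid_bigram_ratio(decoded, valid_bigrams):
    """Share of adjacent decoded pairs found in the reference bigram set.

    Pairs that touch an undecoded word are skipped, not counted as invalid.
    """
    pairs = [
        a + b
        for a, b in zip(decoded, decoded[1:])
        if a is not None and b is not None
    ]
    if not pairs:
        return 0.0
    return sum(p in valid_bigrams for p in pairs) / len(pairs)

# Toy example: three of four words are assigned, one adjacent pair survives.
toy_map = {"daiin": "e", "chedy": "i", "ol": "a"}
toy_decoded = decode_words(["daiin", "chedy", "unknown", "ol"], toy_map)
toy_coverage = word_coverage(toy_decoded)
toy_ratio = valid_bigram_ratio(toy_decoded, {"ei"})
```

Skipping pairs that touch an undecoded word is a design choice: it keeps the bigram ratio from being dragged down by low coverage, so the two metrics measure different things. If the engine counts such pairs as invalid instead, its ratios would be lower than this sketch gives for the same decoding.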

Phase 5 · Build 52 · Advanced Naibbe Attack (A/B Split + EM + Anchors) · IP 52–56

Method: Currier A/B subcorpus split with separate cluster models. Many-to-one frequency-position assignment (fixes homophonic constraint). Dampened EM solver (evidence-based voting, early stop). Known-plaintext zodiac anchor matching (GEMINI: 50% match). Cross-validated A/B letter confirmation.

Result: 4 letters confirmed via cross-validation. Zodiac labels provide structural anchors. EM solver stabilizes at 46% valid bigrams. Methodology is the first to combine A/B split + dampened EM + anchor constraints on the Voynich corpus.
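Two of the Phase 5 ingredients lend themselves to a short sketch: scoring a decoded label against a known-plaintext anchor, and a dampened update with early stopping. Neither is the engine's actual code. The anchor score assumes a position-by-position comparison, and `dampened_fit` is a generic damped fixed-point iteration standing in for the dampened EM solver, whose evidence-voting step is not described in enough detail to reproduce. The decoded string "GEMXXX" is a made-up placeholder.

```python
def anchor_match(decoded, anchor):
    """Share of anchor positions where the decoded label agrees.

    Positions missing from a shorter decoded label count as misses.
    """
    decoded, anchor = decoded.upper(), anchor.upper()
    hits = sum(d == a for d, a in zip(decoded, anchor))
    return hits / len(anchor)

def dampened_fit(initial, estimate, damping=0.5, tol=1e-6, max_iter=100):
    """Iterate p <- (1 - damping) * p + damping * estimate(p).

    `estimate` is a caller-supplied function returning a dict with the same
    keys as `p`. Stops early once the largest change drops below `tol`.
    Returns (final_p, iterations_used).
    """
    p = dict(initial)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        target = estimate(p)
        new_p = {k: (1 - damping) * p[k] + damping * target[k] for k in p}
        change = max(abs(new_p[k] - p[k]) for k in p)
        p = new_p
        if change < tol:
            break  # early stop
    return p, iteration

# Placeholder decoded label that agrees with the anchor in 3 of 6 positions.
toy_score = anchor_match("GEMXXX", "GEMINI")

# Toy estimator that always proposes the same target weights.
toy_p, toy_iters = dampened_fit(
    {"e": 0.5, "i": 0.5}, lambda p: {"e": 0.7, "i": 0.3}
)
```

Damping trades speed for stability: with a damping factor of 0.5 each step moves only halfway toward the new estimate, which prevents a single noisy round of votes from flipping an assignment outright.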

Conclusions

What the Analysis Proves

1. Voynichese is a language

Confirmed by Zipf slope (-0.89, R²=0.91), entropy profile (4.27 bits), and positional grammar. It is not random noise.

2. Not a simple substitution

Positional constraints and word-internal grammar (prefix/core/suffix structure) are incompatible with character-level substitution. The cipher has at least two levels.

3. Naibbe is the strongest model

The automated reversal produces coherent decoded text with valid Latin bigrams. 65.6% coverage, 98.9% entropy match to Latin.

4. Latin is the most likely plaintext

28.9% valid bigram ratio vs 25.8% for Italian. Hebrew fingerprint is strongest at character level but Latin wins at bigram level — consistent with a Latin verbose cipher.

5. E8-geometry fingerprint (held pending null)

An E8-lattice fingerprinting analysis reports a higher-than-chance overlap between the Voynich corpus and the Hebrew script family. This uses the same geometric engine once credited with an α=1/137 'detection' — that quantum claim has since been retracted; this script-overlap result is part of a research line currently held pending its own formal null.

Q2 Synthesis · 2026-04-18 · Status Update

Has the Voynich Manuscript been deciphered?

No. ~60–70% of the statistical structure has been recovered (Phases 1–5 above). Three blockers remain between statistical structure and semantic recovery. We list them publicly because unfalsifiable Voynich claims have polluted the field for 115 years; an honest blocker list is the differentiator.

The Three Blockers

1. Graphemic disagreement (EVA vs Currier)

All Phase 2-5 work uses the EVA transcription (ZL3b-n, May 2025). Lindemann & Bowern (2021, p. 8) show H₂ entropy differs measurably under the older Currier (1976) transcription. Until the field agrees on grapheme boundaries, every phonetic decipherment attempt is fitting against disputed atoms. Phase 6a will re-run all statistical tests under Currier and report invariance.

2. Missing Renaissance cipher corpus

The Naibbe hypothesis is strong on statistical grounds but under-constrained on historical grounds. The verbose-homophonic cipher class was a real 15th-century Italian diplomatic tool (Lavinde 1379, Alberti 1466, Simonetta 1474, Tranchedino 1475, Soro ca. 1520). Without these primary sources, we cannot anchor the plaintext culturally. Phase 6b ingests the full Renaissance cryptography corpus and re-runs Naibbe with cipher-historical constraints.

3. No Rosetta anchor (partial fix in 2026-04)

Every successful decipherment had an external anchor (Rosetta Stone for Egyptian, Cypriot syllabary for Linear B, Behistun for Old Persian). Voynich has none. As of 2026-04-18 we identified BnF NAL 635 (Giovanni Fontana, Secretum de thesauro, ca. 1420-1440) as the strongest candidate ever surfaced — Voynich-contemporary, same Venetian milieu, written partly in its author's own cryptographic system with technical drawings. Live IIIF manifest at Gallica. Phase 7 will build a glyph ↔ folio-image correspondence engine using Fontana as anchor.

Falsifiability — what would kill each open hypothesis

Hypothesis | Falsified if
Naibbe-class verbose cipher | Decoded entropy delta from Latin widens to >0.5 bits when bigram cross-validation iteration is run to convergence on a 90/10 train/test split. (Currently 0.04 bits on full set; needs cross-validated replication.)
Latin > Italian as plaintext | Italian valid-bigram ratio exceeds Latin by >2 points across 5 randomized cluster-seed initializations. (Single-run only so far: 28.9% Latin vs 25.8% Italian; needs replication.)
EVA-vs-Currier robustness | ≥2 of {Zipf slope, H₂, deck-ratio fit, valid-bigram ratio} flip sign or shift by >1σ when re-run under Currier transcription.
Performative-function (ritual/mnemonic) hypothesis | Voynich error-rate falls within the 95% CI of known-cipher baselines (Tranchedino Cod. 2398, Simonetta correspondence) AND outside the CI of known-ritual baselines (Dee-Kelley Enochian, liturgical books).
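The four falsification criteria are concrete enough to write as predicates. The sketch below is one possible reading and not part of src/sepher.py. It assumes that "across 5 initializations" means the mean over runs, that per-metric σ values are supplied by the caller, and that confidence intervals are given as (low, high) tuples. All function and parameter names are hypothetical.

```python
from statistics import mean

def naibbe_falsified(decoded_entropy, latin_entropy=4.0, threshold=0.5):
    """True if the cross-validated entropy gap to Latin exceeds the threshold."""
    return abs(decoded_entropy - latin_entropy) > threshold

def latin_over_italian_falsified(latin_ratios, italian_ratios, margin=2.0):
    """True if Italian beats Latin by more than `margin` percentage points.

    Each list holds one valid-bigram ratio (in percent) per seed; the
    comparison uses the mean over seeds (an assumed reading of 'across').
    """
    return mean(italian_ratios) - mean(latin_ratios) > margin

def transcription_robustness_falsified(eva, currier, sigma):
    """True if two or more metrics flip sign or shift by more than one sigma.

    All three arguments are dicts keyed by metric name.
    """
    unstable = 0
    for name in eva:
        sign_flip = eva[name] * currier[name] < 0
        big_shift = abs(eva[name] - currier[name]) > sigma[name]
        if sign_flip or big_shift:
            unstable += 1
    return unstable >= 2

def performative_falsified(error_rate, cipher_ci, ritual_ci):
    """True if the rate is inside the cipher CI and outside the ritual CI."""
    in_cipher = cipher_ci[0] <= error_rate <= cipher_ci[1]
    in_ritual = ritual_ci[0] <= error_rate <= ritual_ci[1]
    return in_cipher and not in_ritual
```

Writing the criteria as functions with explicit thresholds fixes them before the replication runs, so the thresholds cannot drift once results are in.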

Phase 6–7 Roadmap (Compute Deferred)

Phase 6a · Dual-transcription invariance test

Re-run Phases 2-5 under Currier transcription. Hypotheses that fail in either system don't ship.

Phase 6b · Naibbe under cipher-historical constraints

After ingesting Alberti / Trithemius / della Porta / Vigenère as primary sources, re-run cluster assignment with verbose-homophonic priors from actual 15th-c diplomatic tables.

Phase 6c · Performative-signal test

MARSOC observation: zero erasures, zero corrections across 234 folios in two distinct hands is incompatible with iterative cipher deployment. Compare Voynich error metrics against known-cipher and known-ritual baselines.

Phase 7 · Iconographic decoder

If Phase 6c suggests performative function, pivot from phonetic decipherment to glyph ↔ folio-image correspondence using BnF NAL 635 Fontana as Voynich-contemporary anchor.

Performative-function hypothesis: the manuscript exhibits zero erasures and zero corrections across 234 folios in two distinct scribal hands — statistically incompatible with a working cipher deployment, where senders make mistakes. If the artifact is a finished ritual / mnemonic performance rather than an encoded message, phonetic decipherment is a category error and Phase 7 pivots to iconographic correspondence using BnF NAL 635 Fontana as anchor.

Intellectual Property

Novel Claims — IP 36–56

IP 36–40: Phase 1 — E8 coordinate mapping for ancient scripts, cross-language E8 resonance fingerprinting, Meru torus projection, tesseract walk encoding, 6D TextTensor sacred text representation.
IP 41–44: Phase 2 — Voynichese positional grammar extraction, section-specific entropy signatures (28-pair JSD matrix), bigram transition matrix language family ID, φ-ratio word length distribution analysis.
IP 45–47: Phase 3 — Four-hypothesis parallel decipherment test, constraint-satisfaction EVA→plaintext mapping, cross-hypothesis coherence scoring framework.
IP 48–51: Phase 4 — First automated Naibbe cipher reversal, suffix-based glyph clustering via deck-ratio matching, bigram prefix/suffix decomposition, iterative bigram cross-validation.
IP 52–56: Phase 5 — Currier A/B split Naibbe reversal, known-plaintext zodiac anchor matching, many-to-one frequency-position assignment for homophonic ciphers, dampened EM solver, cross-validated A/B letter confirmation.
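The section-specific entropy signatures in IP 41–44 refer to a 28-pair JSD matrix. A pairwise Jensen-Shannon divergence table can be sketched as follows. This is a minimal illustration, not the engine's code: 28 pairs is what eight sections would give (8·7/2), which is an inference here since the section count is not stated, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def distribution(tokens):
    """Relative token frequencies as a dict."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint."""
    result = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            result += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            result += 0.5 * qk * math.log2(qk / mk)
    return result

def jsd_pairs(sections):
    """JSD for every unordered pair of sections.

    `sections` maps a section name to its token list.
    """
    dists = {name: distribution(toks) for name, toks in sections.items()}
    return {
        (a, b): jsd(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }
```

Jensen-Shannon divergence is symmetric and bounded, so only the unordered pairs need to be stored, and values from different section pairs are directly comparable.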

The Engine

src/sepher.py — Sepher (The Scribe)

$ 4,300+ lines · 7 analysis layers · 11-script registry

$ Corpus: ZL3b-n + IT2a-n (Takahashi) — gold standard transcriptions

$ CLI: python src/sepher.py voynich --mode phase2|phase3|phase4|phase5

$ Layer 0: Utilities · Layer 1: CorpusIngestor · Layer 2: TextTensor (6D)

$ Layer 3: GematriaEngine · Layer 4: GeometricProjector (Meru + tesseract + E8)

$ Layer 5: VoynichEngine · Layer 6: ResonanceAnalyzer · Layer 7: SepherVault (SQLite)

Reproduce the Analysis

All code is open-source. The ZL3b-n corpus is downloadable from voynich.nu.

python src/sepher.py voynich --mode phase2
python src/sepher.py voynich --mode phase5
python src/sepher.py analyze --corpus Genesis.1 --script biblical_hebrew

⌬ Prior Art · Cryptographic Verification

Document: MISSION_LOG.md · Builds 49–52
Date: 2026-03-25
Commit: d17d5e62a7a37bbaf8cad8ac0383092b5dc7da74
Status: ⎈ Git Commit Anchored, cryptographically chained history
Verify: github.com/pabl0ramirez/matrix-cr-studio
IP Claims: IP 36–56
License: © 2026 Matrix CR Studio · [email protected] · CC BY-NC 4.0