v4j

Voynich for Java (v4j) library



Note 005 - Slots and a New Alphabet

Last updated Apr. 2nd, 2022.

This note is part of the article I presented at the International Conference on the Voynich Manuscript 2022 (ZATTERA (2022)).

This note refers to release v.10.0.0 of v4j; links to classes and files refer to this release; files might have been changed, deleted or moved in the current master branch. In addition, some of the content of this note might have become obsolete in more recent versions of the library.

Working notes do not provide a detailed description of the algorithms and classes used; for this, please refer to the library code and JavaDoc.

Please refer to the home page for a set of definitions that might be relevant for this working note.



Abstract

I show how the structure of Voynich words can be easily described by assuming each word type is composed of “slots” that can be filled according to simple rules, which are described below.

This in turn sheds some light on the definition of what might constitute a Voynich character (the Voynich alphabet).

Given the nature of this topic, it is impossible to define rules that apply to 100% of cases; after all, syntactical and grammatical exceptions exist in any modern text as well. However, I will try to focus on claims that apply to the vast majority of cases.

Methodology

I start my analysis from a concordance version of the Voynich text (see Note 001); this is obtained from the Landini-Stolfi Interlinear file by merging the available interlinear transcriptions of each transcriber. In the merging, characters that are not read in the same way by all authors are marked as unreadable. This ensures that the word types I extract from the text are as accurate as possible.
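The merging step can be illustrated with a minimal sketch. It is not the v4j implementation: the class name `ConcordanceSketch`, the use of '?' for the unreadable character and the requirement that readings are already aligned character by character are all assumptions made for this example.

```java
import java.util.List;

class ConcordanceSketch {
    // Assumption: '?' stands for the unreadable character.
    static final char UNREADABLE = '?';

    /** Merges aligned readings of one line; any disagreement becomes UNREADABLE. */
    static String merge(List<String> readings) {
        StringBuilder merged = new StringBuilder(readings.get(0));
        for (String reading : readings) {
            if (reading.length() != merged.length()) {
                throw new IllegalArgumentException("readings must be aligned");
            }
            for (int i = 0; i < merged.length(); i++) {
                // A single dissenting transcriber is enough to mark the position.
                if (reading.charAt(i) != merged.charAt(i)) {
                    merged.setCharAt(i, UNREADABLE);
                }
            }
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        // Three hypothetical readings of the same token.
        System.out.println(merge(List.of("daiin", "daiin", "dauin"))); // prints da?in
    }
}
```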

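For reasons explained below (see Rare Characters), any occurrence of the rare EVA characters matched by the regular expression in note {4} (‘g’, ‘x’, ‘v’, ‘u’, ‘j’, ‘b’ and ‘z’) is also replaced with the unreadable character.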

As a second step, tokens are created by splitting the text where a space was detected by at least one of the transcribers; there are 31’317 tokens in the text, ignoring those that contain an unreadable character.

The list of word types is the list of tokens without repetitions (this would be the “vocabulary” of the Voynich). These 5’105 total word types have then been analyzed as explained below.
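As a rough illustration of these two steps, the sketch below splits a transliterated line into tokens, drops tokens containing an unreadable character, and then removes repetitions to obtain the word types. The class and method names are hypothetical, and it assumes '.' is the word separator and '?' the unreadable character; it is not the code used by v4j.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class WordTypesSketch {
    /** Splits on '.' (assumed word separator) and skips tokens containing '?'. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.split("\\.")) {
            if (!token.isEmpty() && token.indexOf('?') < 0) {
                result.add(token);
            }
        }
        return result;
    }

    /** Word types are the tokens without repetitions, in order of first appearance. */
    static Set<String> wordTypes(List<String> tokens) {
        return new LinkedHashSet<>(tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = tokens("daiin.qokeedy.d?in.daiin");
        System.out.println(tokens.size() + " tokens, "
                + wordTypes(tokens).size() + " word types");
    }
}
```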

Considerations

By looking at the word types in the Voynich, we can see their structure (that is, the sequence of Voynich glyphs used to write them) can be easily described as follows:

The table below summarizes all of these rules, showing the 12 slots and the glyphs that can occupy them {1}.

Slots

In some cases, the word structure can be ambiguous, since a glyph can occupy either of two available slots (e.g. the word type ‘y’ can be seen as a ‘y’ in either slot 1 or slot 11). Following some further analysis of word structure, when decomposing a word I always put each glyph in the rightmost possible slot. Note that this is a “weak” rule that is quite arbitrary and has no impact on which word types can or cannot be described by this model.

To exemplify this concept, I show how some common word types can be decomposed into slots:

'daiin'

  0     1     2     3     4     5     6     7     8     9    10    11
[   ] [   ] [   ] [   ] [   ] [   ] [   ] [ d ] [ a ] [ii ] [ n ] [   ] 

(notice 'd' is in slot 7 and not 0, even though both positions would be legitimate)


'qokeedy'

  0     1     2     3     4     5     6     7     8     9    10    11
[ q ] [ o ] [   ] [ k ] [   ] [   ] [ee ] [   ] [   ] [   ] [ d ] [ y ] 

(notice 'd' is in slot 10 and not 7, even though both positions would be legitimate)


'chcthor'

  0     1     2     3     4     5     6     7     8     9    10    11
[   ] [   ] [   ] [   ] [ch ] [cth] [   ] [   ] [ o ] [   ] [ r ] [   ] 
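The "rightmost possible position" rule can be sketched as a greedy scan from the end of the word, visiting slots from 11 down to 0. The slot contents below are only a partial table, reconstructed from the three examples above and from statements elsewhere in this note; slot 2 is left empty because its glyphs are not named in the text. The class `SlotDecomposer` is hypothetical and is not the library's Slots class.

```java
import java.util.Arrays;
import java.util.List;

class SlotDecomposer {
    // Partial, illustrative slot table (an assumption, not the full table).
    // Longer glyphs are listed first so that "ee" is preferred over "e".
    static final List<List<String>> SLOTS = List.of(
            List.of("q", "d"),                   // 0
            List.of("o", "y"),                   // 1
            List.<String>of(),                   // 2 (not named in the text)
            List.of("t", "k", "p", "f"),         // 3 gallows
            List.of("ch"),                       // 4 pedestal
            List.of("cth", "ckh", "cph", "cfh"), // 5 pedestalled gallows
            List.of("eee", "ee", "e"),           // 6
            List.of("d"),                        // 7
            List.of("a", "o"),                   // 8
            List.of("iii", "ii", "i"),           // 9
            List.of("n", "r", "d"),              // 10
            List.of("y"));                       // 11

    /** Returns the glyph in each slot ("" if empty), or null if the word does not fit. */
    static String[] decompose(String word) {
        String[] filled = new String[SLOTS.size()];
        Arrays.fill(filled, "");
        String rest = word;
        for (int slot = SLOTS.size() - 1; slot >= 0 && !rest.isEmpty(); slot--) {
            for (String glyph : SLOTS.get(slot)) {
                if (rest.endsWith(glyph)) {
                    // The last unplaced glyph goes into the rightmost slot accepting it.
                    filled[slot] = glyph;
                    rest = rest.substring(0, rest.length() - glyph.length());
                    break;
                }
            }
        }
        return rest.isEmpty() ? filled : null;
    }

    public static void main(String[] args) {
        for (String word : List.of("daiin", "qokeedy", "chcthor")) {
            System.out.println(word + " -> " + Arrays.toString(decompose(word)));
        }
    }
}
```

With this partial table, 'd' in 'daiin' lands in slot 7 and 'd' in 'qokeedy' lands in slot 10, as in the examples above. A word that cannot be placed returns null, which is how a word type would be flagged as not regular.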

We can then see {2} that tokens can be classified as follows:

The tables below summarize these findings.

Table with distribution of words according to their classification.

Pie chart with distribution of words according to their classification.

In short, almost 9 out of 10 tokens in the Voynich text exhibit a “slot” structure. Of the remainder, a fair number can be decomposed into two parts, each corresponding to a regular word type appearing elsewhere in the text. The remaining cases (3 out of 100) are mostly words appearing only once in the text.
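The three-way split described in this paragraph can be sketched as follows. The labels REGULAR, SEPARABLE and OTHER and the class `TokenClassifier` are names invented for this example; the set of regular word types is assumed to have been computed beforehand (for instance with a slot decomposition).

```java
import java.util.Set;

class TokenClassifier {
    // Hypothetical labels for the three groups described in the text.
    enum Kind { REGULAR, SEPARABLE, OTHER }

    /**
     * @param regularTypes word types found in the text that fit the slot model
     */
    static Kind classify(String token, Set<String> regularTypes) {
        if (regularTypes.contains(token)) {
            return Kind.REGULAR;
        }
        // Try every cut point: both halves must be regular word types.
        for (int cut = 1; cut < token.length(); cut++) {
            if (regularTypes.contains(token.substring(0, cut))
                    && regularTypes.contains(token.substring(cut))) {
                return Kind.SEPARABLE;
            }
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        Set<String> regular = Set.of("daiin", "ol", "chedy");
        System.out.println(classify("olchedy", regular));
    }
}
```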

The table below shows the percentage occurrence of glyphs in each slot for regular word types {3}.

Table with glyph count by slot.

(The distribution is expected to change depending on Currier’s language and illustration; this is something to be investigated further.)

The Voynich Alphabet

The definition of the Voynich alphabet, that is of which glyphs should be considered a single Voynich character in the text, is still open. Each transcriber must continuously decide what symbols in the manuscript constitute instances of the same glyph and how each glyph needs to be mapped into one or more transliteration characters.

However, if we consider the above defined slots as relevant for the structure of word types, we can reasonably assume that each glyph appearing in a slot constitutes a basic unit of information, that is a character in the Voynich alphabet. As far as I know, this is the first time that a possible Voynich alphabet is supported by empirical evidence of an inner structure of Voynich word types.

Below, I analyze in more detail some relationships between glyphs, as they appear in slots, and EVA characters.

Rare Characters

Some EVA characters seldom appear in the original interlinear transliteration {4}, and even less frequently in the concordance version used, where they appear mostly as single characters, as shown in the table below (which also considers “unreadable” tokens). For this reason, I decided to ignore these characters and mark them as unreadable characters for this analysis.

Statistics about rare characters

Notice that throughout the Voynich there are several glyphs which cannot be directly transliterated into EVA characters (so-called “weirdoes”); they are ignored by most authors in any analysis of the text.
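The regular expression quoted in note {4} can be tried in isolation with the standard java.util.regex classes. The sketch below applies it to single tokens; how the CountRegEx class applies it to the whole text is not shown here, and the class name `RareCharSketch` and the sample tokens are made up for the example.

```java
import java.util.List;
import java.util.regex.Pattern;

class RareCharSketch {
    // Regular expression from note {4}: a token with at least one rare character.
    static final Pattern RARE = Pattern.compile("[^\\.]*[gxvujbz]+[^\\.]*");

    /** True if the whole token matches, that is, if it contains a rare character. */
    static boolean hasRareChar(String token) {
        return RARE.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "oxar" is a made-up token used only to trigger a match.
        for (String token : List.of("daiin", "oxar", "qokeedy")) {
            System.out.println(token + ": " + hasRareChar(token));
        }
    }
}
```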

Gallows and Pedestals

Some glyphs (EVA ‘t’, ‘k’, ‘p’ and ‘f’) appear taller than other characters and are traditionally referred to as “gallows”. The combination ‘ch’ is instead called “pedestal”. Some glyphs (EVA ‘cth’, ‘ckh’, ‘cph’ and ‘cfh’) appear visually as an overlap of the pedestal with one of the gallows and are therefore called “pedestalled gallows”. These glyphs appear in slots 3, 4, and 5 and are shown in the table below.

Gallows and Pedestal

It has been hypothesized (e.g. TILTMAN (1967) p.7 point (b.)) that pedestalled gallows might be a “ligature”, that is, a more compact form of writing a combination of the pedestal and a gallows character. If we look at slots 3 through 5, we might indeed think that pedestalled gallows are a combination of a gallows character followed by the pedestal, in this specific order. However,

This leads me to think pedestalled gallows are Voynich characters in their own right, and not ligatures. Incidentally, CURRIER (1976) came to the same conclusion.

In addition, the character ‘c’ appears outside of the pedestal or pedestalled gallows only 7 times (‘c’, ‘oc’, ‘chcpar’, ‘ckshy’, ‘ocfshy’, ‘cs?t?eey’, and ‘o?cs’); similarly, the character ‘h’ appears outside of the pedestal, the pedestalled gallows or the “plumed” pedestal only 4 times (‘theody’, ‘docfhhy’, ‘cfhhy’, and ‘d?ithy’). This seems a strong indication that EVA ‘c’ and ‘h’ do not correspond to Voynich characters {5}{6}.
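One way to spot such occurrences is to mask every pedestal, plumed pedestal and pedestalled gallows and then look for a leftover ‘c’ or ‘h’. The sketch below does this with a regular expression; it assumes the “plumed” pedestal is EVA ‘sh’, and it is not the FindStrangeCH class mentioned in note {5}.

```java
import java.util.List;
import java.util.regex.Pattern;

class StrangeChSketch {
    // Pedestalled gallows first, so that "cth" is not read as 'c' followed by "th".
    static final Pattern KNOWN = Pattern.compile("cth|ckh|cph|cfh|ch|sh");

    /** True if a 'c' or 'h' remains once the known combinations are masked. */
    static boolean hasStrayCOrH(String word) {
        // Mask with '_' instead of deleting, so no new combination is created.
        String rest = KNOWN.matcher(word).replaceAll("_");
        return rest.indexOf('c') >= 0 || rest.indexOf('h') >= 0;
    }

    public static void main(String[] args) {
        for (String word : List.of("chcthor", "chcpar", "theody", "cfhhy")) {
            System.out.println(word + ": " + hasStrayCOrH(word));
        }
    }
}
```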

‘e’ and ‘i’

The characters ‘e’ and ‘i’ only appear in slots 6 and 9 respectively, in a sequence of 1, 2 or 3. Some transcribers, like Currier, have assumed some of these sequences to be a single Voynich character.

It can be argued that these are indeed repetitions of the same character but, if this is the case, as these sequences always appear in the same slots, what is relevant here would be the number of repetitions. Using an example with Roman numerals, the sequence “III” must not be understood as a 3-character word, but rather as the number 3.

In addition, it should be noted that several characters in the Latin script might appear as repetitions of the same character, when written by hand; for example “m” looks like “nn”, “w” can be read as “uu”, but these are single characters.

Based on the above, I assume each sequence of ‘e’ and ‘i’ is probably a character in itself (or anyway a single “logical unit”, like in Italian where, even if “q” and “u” are distinct letters, “q” appears only in “qu-”).
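Under this assumption, an EVA word can be split into units where each run of ‘e’ or ‘i’ (up to three) counts as one unit, together with the pedestal and pedestalled gallows discussed earlier. The following is a minimal sketch; the class `GlyphSplitter` is hypothetical, and treating EVA ‘sh’ as a single unit is an additional assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlyphSplitter {
    // Multi-character units are tried first; '.' catches any other single character.
    static final Pattern GLYPH =
            Pattern.compile("cth|ckh|cph|cfh|ch|sh|e{1,3}|i{1,3}|.");

    /** Splits an EVA word into units, keeping runs of 'e' or 'i' together. */
    static List<String> split(String word) {
        List<String> glyphs = new ArrayList<>();
        Matcher m = GLYPH.matcher(word);
        while (m.find()) {
            glyphs.add(m.group());
        }
        return glyphs;
    }

    public static void main(String[] args) {
        System.out.println(split("daiin"));   // prints [d, a, ii, n]
        System.out.println(split("qokeedy")); // prints [q, o, k, ee, d, y]
    }
}
```

Counting characters on the output of such a split, rather than on raw EVA letters, is what makes ‘ii’ count once instead of twice.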

The Slot Alphabet

Finally, drawing from the above considerations, I propose a new transliteration alphabet, which I will call the Slot alphabet for obvious reasons.

I think that, being based on the inner structure of Voynich word types, this alphabet is more suitable than others when performing statistical analysis that relies on characters in words or when attempting to decipher the Voynich, where a one-to-one correspondence between the transliteration characters and the Voynich characters is paramount.

In addition, the alphabet can be easily converted into EVA, and vice versa, so the two can be used interchangeably.

The table below defines the Slot alphabet and compares it with other transliteration alphabets.

The Slot alphabet and a comparison with other transliteration alphabets

In some of the above alphabets, sequences of EVA ‘i’ are treated differently, depending on the letter following the sequence. Therefore, there is no unique way to transliterate sequences of ‘i’ into these alphabets without looking at the whole Voynich word being transliterated.

For Tiltman, ‘p’ and ‘f’ are variant forms of ‘t’ and ‘k’ respectively; similarly, ‘cph’ and ‘cfh’ are variants of ‘cth’ and ‘ckh’. I assume EVA ‘m’ is transliterated as a variant of ‘l’ (Tiltman’s ‘E’).

A transliteration of the Landini-Stolfi interlinear file that uses the Slot alphabet is available within the v4j library and is accessible using VoynichFactory factory methods.

Comparisons with Previous Works

I am not the first to analyze the internal structure of Voynich words; as this section is going to be long and possibly continuously updated, I created a separate page.

Conclusions


Notes

{1} I have removed gallows and pedestalled gallows from slot 7, where they additionally appeared in earlier versions of this working note. This is because my subsequent attempts at creating a state machine that models word structure led me to believe this was a more correct and concise description.

{2} Class Slots has been used to perform this analysis. An Excel file (Slots.xlsx) with its output can be found in the analysis folder.

{3} Class CountCharsBySlot has been used to produce this table.

{4} Class CountRegEx can be used with regular expression [^\\.]*[gxvujbz]+[^\\.]* to find words with rare characters.

{5} Class FindStrangeCH can be used to list words with these “strange” occurrences of ‘c’ and ‘h’.

{6} Stolfi came to the same conclusion when defining his grammar for Voynichese words.

{7} To perform this analysis class RemoveChar has been used.



Copyright Massimiliano Zattera.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.