Voynich for Java (v4j) library
Last updated Dec. 19th, 2024.
This note refers to release v.4.0.0 of v4j. Links to classes and files point to this release; those files might have been changed, deleted or moved in the current master branch. In addition, some of the content of this note might be obsolete in more recent versions of the library.
Working notes do not provide a detailed description of the algorithms and classes used; for this, please refer to the library code and JavaDoc.
Please refer to the home page for a set of definitions that might be relevant for this working note.
The class ‘MostUsedTerms’ finds the 20 most used word types for each cluster defined in Note 003 and prints the result in CSV format.
An Excel file (“MostUsedTerms.xlsx”) containing this data can be found under the analysis folder.
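The counting step described above can be sketched as follows. This is a minimal illustration and not the ‘MostUsedTerms’ class shipped with v4j: the class name ‘MostUsedTermsSketch’, the method ‘topTermsCsv’, the CSV column layout and the sample tokens are all hypothetical. It also assumes that relative frequency means the occurrences of a word type divided by the total number of tokens in the cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; this is not the MostUsedTerms class shipped with v4j.
class MostUsedTermsSketch {

    /** Returns CSV rows "cluster,wordType,count,relativeFrequency" for the n most used word types. */
    static List<String> topTermsCsv(String cluster, List<String> tokens, int n) {
        // Count how many times each word type occurs among the tokens.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        // Sort by descending count; ties are broken alphabetically so the output is deterministic.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            // Assumed definition: occurrences of the word type / total tokens in the cluster.
            double relative = (double) e.getValue() / tokens.size();
            rows.add(String.format(Locale.ROOT, "%s,%s,%d,%.4f",
                    cluster, e.getKey(), e.getValue(), relative));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up tokens, for illustration only; these are not counts from the manuscript.
        List<String> tokens = Arrays.asList("daiin", "chey", "daiin", "dy");
        System.out.println("cluster,wordType,count,relativeFrequency");
        for (String row : topTermsCsv("HA", tokens, 20)) {
            System.out.println(row);
        }
    }
}
```

Calling ‘topTermsCsv’ once per cluster and concatenating the rows would give a single CSV with one block of up to 20 lines per cluster.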
The table below summarizes the results, showing the relative frequency of word types in each cluster.
As expected from the cluster analysis, besides word types that appear frequently in all clusters (such as ‘chey’, ‘daiin’, ‘dar’, ‘dy’, and ‘or’), there are word types characteristic of a single cluster; the table below shows them.
It might be interesting to note that:
The most common word types in Herbal A pages (HA cluster) start with ‘ch-’ or ‘sh-’, the latter prefix appearing only here.
Common word types in Pharmaceutical pages (PA cluster) end in ‘-ol’, which is rare in other clusters. In addition, they seem to prefer the ‘ok-’ or ‘qok-’ prefix.
Common word types in Zodiac pages (ZZ cluster) mostly start with ‘ot-’, which is uncommon in the clusters above. Moreover, these pages feature single characters as common word types.
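Observations of this kind can be checked mechanically on the top-20 lists. The sketch below is only an illustration under stated assumptions: the class ‘AffixShareSketch’ and its methods are hypothetical and not part of v4j, and the sample list is made up rather than taken from the results. It computes the share of word types in a list that start with a given prefix or end with a given suffix.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the v4j library.
class AffixShareSketch {

    /** Fraction of word types starting with the given prefix (0.0 for an empty list). */
    static double prefixShare(List<String> wordTypes, String prefix) {
        if (wordTypes.isEmpty()) {
            return 0.0;
        }
        int matches = 0;
        for (String wordType : wordTypes) {
            if (wordType.startsWith(prefix)) {
                matches++;
            }
        }
        return (double) matches / wordTypes.size();
    }

    /** Fraction of word types ending with the given suffix (0.0 for an empty list). */
    static double suffixShare(List<String> wordTypes, String suffix) {
        if (wordTypes.isEmpty()) {
            return 0.0;
        }
        int matches = 0;
        for (String wordType : wordTypes) {
            if (wordType.endsWith(suffix)) {
                matches++;
            }
        }
        return (double) matches / wordTypes.size();
    }

    public static void main(String[] args) {
        // Made-up list, for illustration only; not the actual top-20 list of any cluster.
        List<String> sample = Arrays.asList("chol", "shol", "daiin", "chor");
        System.out.println("share starting with ch-: " + prefixShare(sample, "ch"));
        System.out.println("share ending in -ol: " + suffixShare(sample, "ol"));
    }
}
```

Running the same two checks on each cluster's list would make it easy to compare, for example, the share of ‘ot-’ word types in ZZ against the other clusters.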
Copyright Massimiliano Zattera.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.