v4j

Voynich for Java (v4j) library


Project maintained by mzattera Hosted on GitHub Pages — Theme by mattgraham

Note 004 - On Word types

Last updated Sep. 18th, 2021.

This note refers to release v.4.0.0 of v4j; links to classes and files refer to this release, and those files might have been changed, deleted or moved in the current master branch. In addition, some of this note's content might have become obsolete in more recent versions of the library.

Working notes do not provide a detailed description of the algorithms and classes used; for this, please refer to the library code and JavaDoc.

Please refer to the home page for a set of definitions that might be relevant for this working note.



The class ‘MostUsedTerms’ finds the 20 most used word types for each cluster defined in Note 003 and prints the result in CSV format.
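To make the computation concrete, here is a minimal sketch of a per-cluster "top N word types" count with CSV output. It is not the code of ‘MostUsedTerms’ and does not use the v4j API: the class name, the method names, the cluster labels "A" and "B", the CSV column names and the token lists are all assumptions made up for illustration, and relative frequency is assumed to mean occurrences divided by the total number of tokens in the cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of v4j.
class TopWordTypesSketch {

    /** Returns the n most frequent word types with their relative frequency, most frequent first. */
    static LinkedHashMap<String, Double> topWordTypes(List<String> tokens, int n) {
        // Count the occurrences of each word type.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        // Sort by descending count; ties are broken alphabetically so the result is deterministic.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep the first n entries and convert counts to relative frequencies.
        LinkedHashMap<String, Double> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.put(e.getKey(), e.getValue() / (double) tokens.size());
        }
        return top;
    }

    /** Builds one CSV row per (cluster, word type) pair; the column names are an assumption. */
    static String toCsv(Map<String, List<String>> tokensByCluster, int n) {
        StringBuilder sb = new StringBuilder("cluster,word_type,relative_frequency\n");
        for (Map.Entry<String, List<String>> cluster : tokensByCluster.entrySet()) {
            for (Map.Entry<String, Double> e : topWordTypes(cluster.getValue(), n).entrySet()) {
                sb.append(String.format(Locale.ROOT, "%s,%s,%.4f\n",
                        cluster.getKey(), e.getKey(), e.getValue()));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented token lists, for illustration only; these are not data from the manuscript.
        Map<String, List<String>> tokensByCluster = new LinkedHashMap<>();
        tokensByCluster.put("A", Arrays.asList("daiin", "chey", "daiin", "dy", "or"));
        tokensByCluster.put("B", Arrays.asList("dar", "dy", "dy", "daiin"));
        System.out.print(toCsv(tokensByCluster, 20));
    }
}
```

Sorting the whole list of word types is the simplest approach and is adequate for a vocabulary of this size; a bounded priority queue would only matter for much larger inputs.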

An Excel file (“MostUsedTerms.xlsx”) containing this data can be found under the analysis folder.

The table below summarizes the results, showing the relative frequency of word types in each cluster.

Most used word types

As expected from cluster analysis, besides word types that appear frequently in all clusters (such as ‘chey’, ‘daiin’, ‘dar’, ‘dy’, and ‘or’), there are word types characteristic of a single cluster; the table below shows them.

Most used word types
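One possible way to single out such word types is sketched below, under the assumption that "characteristic of a single cluster" means "present in the top list of exactly one cluster". This definition, the class and method names, the cluster labels and the placeholder word types ‘wordX’ and ‘wordY’ are all hypothetical; the note does not state how the table was derived.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of v4j.
class CharacteristicWordTypesSketch {

    /** For each cluster, keeps only the word types found in no other cluster's top list. */
    static Map<String, Set<String>> characteristic(Map<String, Set<String>> topByCluster) {
        // Count in how many clusters' top lists each word type appears.
        Map<String, Integer> clusterCount = new HashMap<>();
        for (Set<String> top : topByCluster.values()) {
            for (String wordType : top) {
                clusterCount.merge(wordType, 1, Integer::sum);
            }
        }

        // A word type is kept for a cluster only if that cluster is the single one listing it.
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : topByCluster.entrySet()) {
            Set<String> unique = new TreeSet<>();
            for (String wordType : e.getValue()) {
                if (clusterCount.get(wordType) == 1) {
                    unique.add(wordType);
                }
            }
            result.put(e.getKey(), unique);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented top lists; 'wordX' and 'wordY' are placeholders, not real word types.
        Map<String, Set<String>> topByCluster = new LinkedHashMap<>();
        topByCluster.put("A", new HashSet<>(Arrays.asList("daiin", "chey", "wordX")));
        topByCluster.put("B", new HashSet<>(Arrays.asList("daiin", "dy", "wordY")));
        System.out.println(characteristic(topByCluster));
    }
}
```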

It might be interesting to note that:



Copyright Massimiliano Zattera.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.