
VMs: list of words from candidate languages...



Hello

For some time I have been playing with the file Intrln17.txt. I loaded it
into an SQL database, and I run queries on the occurrence of words. For
instance, this query:

    select mot, count(*) from voynich
    where codlangage = 'F'
      and charindex('%',mot) = 0 and charindex('*',mot) = 0
      and charindex('!',mot) = 0 and charindex('[',mot) = 0
      and charindex(']',mot) = 0 and charindex('|',mot) = 0
      and patindex('%AM%', mot) <> 0
    group by mot
    having count(*) > 10
    order by len(mot), mot desc

produces this list:
AM	233
TAM	41
SAM	16
RAM	75
OAM	24
HAM	42
EAM	12
DAM	90
8AM	827
2AM	163
TDAM	18
T8AM	14
ORAM	45
OHAM	139
OEAM	41
ODAM	181
O8AM	54
HZAM	13
GHAM	36
GDAM	38
G8AM	19
EDAM	39
ARAM	11
4OAM	18
TODAM	12
TO8AM	39
TC8AM	30
SO8AM	18
SC8AM	12
OEDAM	44
4OHAM	94
4ODAM	336
4O8AM	39
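
For anyone without SQL Server, here is a minimal sketch of the same kind of query using Python's built-in sqlite3 module. It assumes a table voynich(mot, codlangage) shaped like the one described above. SQLite has no charindex() or patindex(), so instr() stands in for both (instr() is case-sensitive). The helper name words_containing and the sample rows in the demo are hypothetical, invented only to show the call; they are not counts from Intrln17.txt.

```python
import sqlite3

# Characters the query above uses to skip doubtful or annotated readings.
EXCLUDED = ["%", "*", "!", "[", "]", "|"]


def words_containing(conn, key, language, min_count):
    """Return (mot, count) pairs for words containing `key`, transcribed
    under code `language`, seen more than `min_count` times."""
    # instr() returns 0 when the substring is absent, like charindex().
    exclusions = " ".join("and instr(mot, ?) = 0" for _ in EXCLUDED)
    sql = f"""
        select mot, count(*) from voynich
        where codlangage = ? {exclusions}
          and instr(mot, ?) <> 0
        group by mot
        having count(*) > ?
        order by length(mot), mot desc
    """
    return conn.execute(sql, [language, *EXCLUDED, key, min_count]).fetchall()


if __name__ == "__main__":
    # Hypothetical sample rows, not real data from the interlinear file.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table voynich (mot text, codlangage text)")
    rows = ([("8AM", "F")] * 3 + [("DAM", "F")] * 2 + [("AM", "F")] * 2
            + [("8AM!", "F")] * 5 + [("OR", "F")] * 4 + [("8AM", "H")] * 4)
    conn.executemany("insert into voynich values (?, ?)", rows)
    print(words_containing(conn, "AM", "F", 1))
    # -> [('AM', 2), ('DAM', 2), ('8AM', 3)]
```

As in the original query, "order by length(mot), mot desc" sorts by length first and then in descending order within each length, which is why DAM comes before 8AM.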

This helps me identify a key (in this case the "word" AM) and its
occurrences in the "text". I think this particular key (AM) behaves
interestingly enough to make it a good candidate for identifying the
language... There are lots of 4-letter words ending in AM, much as English
has baLL, taLL, waLL... Which brings me to a suggestion to the group...
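
One way to make a key's "behaviour" concrete (this is only an assumption about what to measure) is to count how many distinct words ending in the key occur at each word length. The sketch below does that for the AM words in the list above and for the three English words baLL, taLL, waLL. The name length_profile is hypothetical.

```python
from collections import Counter

# The words ending in AM from the list above (counts left out).
AM_WORDS = """AM TAM SAM RAM OAM HAM EAM DAM 8AM 2AM
TDAM T8AM ORAM OHAM OEAM ODAM O8AM HZAM GHAM GDAM G8AM EDAM ARAM 4OAM
TODAM TO8AM TC8AM SO8AM SC8AM OEDAM 4OHAM 4ODAM 4O8AM""".split()


def length_profile(words, suffix):
    """Map word length -> number of distinct words ending in `suffix`."""
    return Counter(len(w) for w in set(words) if w.endswith(suffix))


print(sorted(length_profile(AM_WORDS, "AM").items()))
# -> [(2, 1), (3, 9), (4, 14), (5, 9)]
print(sorted(length_profile(["ball", "tall", "wall"], "ll").items()))
# -> [(4, 3)]
```

Counting distinct words rather than occurrences keeps a single frequent form such as 8AM (827) from dominating the profile. A token-weighted variant would sum the counts instead.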

FWIW, it might be of interest to all to have a place where one could
download a list of words in any particular language one might think of...
I suppose most of you scan the web for dictionaries and then extract the
words to get a list... like I did! If we put all of our files in one place
(say, the Voynich site), we'd have a very good sample of languages: Old
French, English, Italian, Latin, even Khowar :) Of course, the idea is to
"train" a program to match the "key's behaviour" ("AM") in the target
language's word list...

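As a very rough stand-in for that "training" (one possible approach, not a worked-out method), one could rank every ending in a candidate language's word list by how closely its length profile matches the profile of AM. Here a length profile means the number of distinct words ending in it at each length. The sketch uses the L1 distance between normalized profiles, but any other distance would do. The names rank_suffixes and AM_PROFILE and the toy word list are hypothetical.

```python
from collections import Counter, defaultdict

# Distinct words ending in AM per word length, counted from the list above.
AM_PROFILE = {2: 1, 3: 9, 4: 14, 5: 9}


def normalize(profile):
    """Turn counts per length into proportions that sum to 1."""
    total = sum(profile.values())
    return {length: n / total for length, n in profile.items()}


def l1_distance(p, q):
    """Sum of absolute differences over all lengths seen in either profile."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def rank_suffixes(target, words, suffix_len=2, min_words=3):
    """Rank each `suffix_len`-letter ending in `words` by how close its
    length profile is to `target`; a smaller distance is a better match."""
    profiles = defaultdict(Counter)
    for w in set(words):
        if len(w) >= suffix_len:  # the bare ending counts, like AM itself
            profiles[w[-suffix_len:]][len(w)] += 1
    goal = normalize(target)
    return sorted(
        (l1_distance(goal, normalize(prof)), suffix)
        for suffix, prof in profiles.items()
        if sum(prof.values()) >= min_words  # ignore rare endings
    )


# Toy word list for illustration only; a real run would read a dictionary file.
words = ["cam", "ream", "seam", "ill", "bill", "hill", "still", "spill"]
for distance, suffix in rank_suffixes(AM_PROFILE, words):
    print(f"{suffix}: {distance:.2f}")
```

With a full word list from one of the shared dictionaries, the endings at the top of the ranking would be those whose distribution over word lengths looks most like AM's. Whether that says anything about the language is, of course, an open question.
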
If anyone is interested in the VB code that loads the interlinear file into
SQL, I can email it. The end result is a 7 MB SQL database, which I could
also upload to an FTP site...

That's it for now. I just thought it worth mentioning the idea of a common
place where one could get "words" from various languages...


Cheers.

______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list