VMs: list of words from candidate languages...
Hello
For some time I have been playing with the file Intrln17.txt...
I loaded it into a SQL database and I run queries on the occurrence of words.
For instance, this query:
    select mot, count(*) from voynich
    where codlangage = 'F'
      and charindex('%', mot) = 0 and charindex('*', mot) = 0
      and charindex('!', mot) = 0 and charindex('[', mot) = 0
      and charindex(']', mot) = 0 and charindex('|', mot) = 0
      and patindex('%AM%', mot) <> 0
    group by mot
    having count(*) > 10
    order by len(mot), mot desc
produces this list:
AM 233
TAM 41
SAM 16
RAM 75
OAM 24
HAM 42
EAM 12
DAM 90
8AM 827
2AM 163
TDAM 18
T8AM 14
ORAM 45
OHAM 139
OEAM 41
ODAM 181
O8AM 54
HZAM 13
GHAM 36
GDAM 38
G8AM 19
EDAM 39
ARAM 11
4OAM 18
TODAM 12
TO8AM 39
TC8AM 30
SO8AM 18
SC8AM 12
OEDAM 44
4OHAM 94
4ODAM 336
4O8AM 39
This helps me identify a key (in this case the "word" AM) and its
occurrences in the "text".
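For anyone without a SQL server at hand, the same count can be sketched in Python. This is only a rough equivalent of the query above, not the code I actually use: the function name key_counts and its parameters are made up for illustration, and it assumes the words have already been pulled out of the interlinear file and restricted to codlangage = 'F'. Note that the match here is case-sensitive, which may differ from the database's collation.

```python
from collections import Counter

# Characters the query above excludes (any word containing one is skipped).
EXCLUDED = set("%*![]|")

def key_counts(words, key="AM", min_count=10):
    """Count words containing `key`, keeping those seen more than min_count times.

    Hypothetical helper: `words` is assumed to be an iterable of word strings
    already filtered to a single codlangage value.
    """
    counts = Counter(
        w for w in words
        if key in w and not (EXCLUDED & set(w))
    )
    frequent = [(w, n) for w, n in counts.items() if n > min_count]
    # Same ordering as the query: by length, then word descending.
    # Two stable sorts: secondary criterion first, primary criterion last.
    frequent.sort(key=lambda item: item[0], reverse=True)
    frequent.sort(key=lambda item: len(item[0]))
    return frequent
```

Calling key_counts(words) on the full word list should give (word, count) pairs in the same order as the list above, provided the extraction matches what is in the database.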
I think this particular key (AM) has interesting enough behaviour to make
it a good candidate to identify the language... Lots of 4 letter words
ending in AM, like baLL, taLL, waLL in the case of English... Which brings
me to a suggestion to the group...
FWIW, it might be of interest to all to have a place where one could
download a list of words in any particular language that one might think
of... I suppose most of you scan the web for dictionaries and then extract
the words to have a list... Like I did! If we put all of our files in one
place (say the Voynich site) we'd have a very good sample of languages,
from Old French, English, Italian, Latin, even Khowar :) Of course the idea
is to "train" a program to match the "key's behaviour" ("AM") in the target
language's word list...
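To make the "matching" idea a bit more concrete, here is one possible way to go about it, sketched in Python. It is only an assumption about what such a program could look like: it describes an ending by the lengths of the distinct words that carry it, and compares two such profiles with a simple distance. The names ending_profile, profile_distance and best_matching_ending are hypothetical, and a real comparison would probably need more than word lengths.

```python
from collections import Counter

def ending_profile(words, ending):
    """Share of distinct words of each length among those ending in `ending`."""
    matches = {w for w in words if w.endswith(ending)}
    if not matches:
        return {}
    lengths = Counter(len(w) for w in matches)
    total = sum(lengths.values())
    return {length: n / total for length, n in lengths.items()}

def profile_distance(p, q):
    """Total variation distance between two profiles (0 = identical, 1 = disjoint)."""
    if not p or not q:
        # An ending that never occurs is treated as maximally different.
        return 1.0
    lengths = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in lengths)

def best_matching_ending(key_profile, words, endings):
    """Return the candidate ending whose profile is closest to key_profile."""
    return min(
        endings,
        key=lambda e: profile_distance(key_profile, ending_profile(words, e)),
    )
```

One would build the profile of AM from the Voynich word list, then call best_matching_ending with each language's word list and a set of candidate endings, and see which language gives the smallest distance.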
If anyone is interested in having the VB code that loads the interlinear
file into SQL, I can email it. The end result is a 7 MB SQL database, which
I could also upload to an FTP site...
That's it for now, just thought it worth mentioning that stuff about a
common place where one would get "words" from various languages...
Cheers.