
Re: VMs: Word Dependency -> ARABIC??

>Gabriel Landini:
> Let me see if we agree on the terminology we have used with respect to the 
> VMS:
> A paragraph may be made up of several lines of text, so in a single paragraph 
> we have as many <BOL>s and <EOL>s as there are lines in the text.
> The first <BOL> is also the beginning of paragraph <BOP>.
> The last <EOL> is also the end of paragraph "=" or <EOP>, and it must be the 
> end of a sentence as well.
Now I agree with your terminology. 
>Gabriel Landini:
> But we know that they must start at the beginning of paragraph <BOP>, and end 
> somewhere else. We know that at least one sentence (which we do not know where 
> it started) should be ending at the end of paragraph "=" or <EOP>. 
> <EOL>s may or may not coincide with "end of sentence" (most likely they 
> will not). But! there is an interesting issue with <EOL> characters as 
> observed by Currier (namely some characters tend to appear more often there). 
> This can also be inferred from the spectral analysis plots of the VMS without 
> any spacing, as there is a (not very high) peak in the power spectrum that 
> more or less corresponds to the modal line length of the VMS.

My hypothesis is as follows:

The hypothetical end-of-sentence "=~" should be added in the following cases:
(1) Between words, "statistically strong" at the <EOP>, and gallows
(like ch-words, d-words, -m words, -g words, -y words (not -dy, not -ey))
(2) Between words, "statistically very strong" at the <EOP>, if they are on the
<EOL> (-m words, -g words)
(3) Before gallows, if they are after <EOL>
Exclusion: One-word sentences should be avoided.
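
The rules above could be tried out on a transcription with something like
the following Python sketch. It is only one possible reading of the rules,
under several assumptions that are not stated in this message: words are in
EVA, the gallows are the EVA letters t, k, p, f, rule (1) is read as "a word
of the listed class followed by a word starting with a gallows", and the
exclusion is applied greedily from the start of the paragraph. The function
names and the sample words are made up for illustration.

```python
# Assumption: EVA transcription, with t, k, p, f taken as the gallows letters.
GALLOWS = ("t", "k", "p", "f")
MARK = "=~"  # hypothetical end-of-sentence marker


def is_strong(word):
    """Class from rule (1): ch-words, d-words, -m, -g, -y (not -dy, not -ey)."""
    if word.startswith(("ch", "d")) or word.endswith(("m", "g")):
        return True
    return word.endswith("y") and not word.endswith(("dy", "ey"))


def is_very_strong(word):
    """Class from rule (2): -m words and -g words."""
    return word.endswith(("m", "g"))


def starts_with_gallows(word):
    return word.startswith(GALLOWS)


def mark_sentences(lines):
    """Insert MARK into one paragraph, given as a list of lines of words."""
    words = [w for line in lines for w in line]

    # Indices of words that sit at <EOL>.
    line_final = set()
    pos = 0
    for line in lines:
        pos += len(line)
        if line:
            line_final.add(pos - 1)

    # Candidate breaks: index i means "put MARK after words[i]".
    candidates = []
    for i in range(len(words) - 1):
        cur, nxt = words[i], words[i + 1]
        at_eol = i in line_final
        rule1 = is_strong(cur) and starts_with_gallows(nxt)
        rule2 = at_eol and is_very_strong(cur)
        rule3 = at_eol and starts_with_gallows(nxt)
        if rule1 or rule2 or rule3:
            candidates.append(i)

    # Exclusion: drop any break that would leave a one-word sentence.
    kept = set()
    start = 0
    for i in candidates:
        long_enough_before = (i - start + 1) >= 2
        long_enough_after = (len(words) - (i + 1)) >= 2
        if long_enough_before and long_enough_after:
            kept.add(i)
            start = i + 1

    out = []
    for i, w in enumerate(words):
        out.append(w)
        if i in kept:
            out.append(MARK)
    return out


if __name__ == "__main__":
    paragraph = [
        ["daiin", "chedy", "qokam"],
        ["tchor", "shol", "dam"],
        ["okeey", "kchy", "otaiin"],
    ]
    print(" ".join(mark_sentences(paragraph)))
```

The marked text can then be split at "=~" to get sentence lengths, which
could be compared with the line lengths Gabriel mentions.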

All my best
