Re: VMs: Word Dependency -> ARABIC??
On Wednesday 19 Mar 2003 10:11 am, vladimir@xxxxxxxx wrote:
> Here I mean "sentence" as formally delimited by "=" in the transcription.
I think that we are using some confusing terminology.
Perhaps this would be "end of paragraph", which must also be the end of a
sentence. That corresponds to the end of a block of text.
> So, I use <EOL> and <BOL> according to "=", beginning of paragraph,
> end-of-paragraph.
Let me see if we agree on the terminology we have used with respect to the
VMS:
A paragraph may be made out of various lines of text, so in a single paragraph
we have as many <BOL>s and <EOL>s as lines there are in the text.
The first <BOL> is also the beginning of paragraph <BOP>.
The last <EOL> is also end of paragraph "=" or <EOP> and it must be the end of
a sentence as well.
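
As a minimal sketch of the terminology above: the snippet assumes a
transcription given as a list of text lines, with words separated by
whitespace and "=" closing the last line of each paragraph. The function name
tag_lines and the dictionary keys are hypothetical, chosen only for
illustration.

```python
def tag_lines(lines):
    """Tag each non-empty line with its <BOL>/<EOL> words and paragraph flags.

    Assumption: a trailing "=" marks the last line of a paragraph (<EOP>),
    and the line following an <EOP> opens a new paragraph (<BOP>).
    """
    tagged = []
    start_of_paragraph = True  # the very first line opens a paragraph
    for raw in lines:
        text = raw.strip()
        is_eop = text.endswith("=")
        words = text.rstrip("=").split()
        if not words:
            continue  # skip blank lines
        tagged.append({
            "bol": words[0],            # first word of the line
            "eol": words[-1],           # last word of the line
            "bop": start_of_paragraph,  # this <BOL> is also a <BOP>
            "eop": is_eop,              # this <EOL> is also an <EOP>
        })
        start_of_paragraph = is_eop
    return tagged


# Made-up sample lines, not an actual transcription.
sample = ["daiin chol chey", "pcheol am=", "kydaiin tschor", "chey daiin="]
for row in tag_lines(sample):
    print(row)
```

Every line contributes one <BOL> and one <EOL>; only the flags say whether
they coincide with <BOP> or <EOP>.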
> Really I do not know the beginnings and ends of real "sentences".
Yes, we agree here :-)
But we know that they must start at the beginning of paragraph <BOP>, and end
somewhere else. We know that at least one sentence (which we do not know where
it started) should be ending at the end of paragraph "=" or <EOP>.
<EOL>s may or may not coincide with "end of sentence" (most likely that they
will not). But there is an interesting issue with <EOL> characters, as
observed by Currier (namely, some characters tend to appear more often there).
This can also be inferred from the spectral analysis plots of the VMS without
any spacing, as there is a (not very high) peak in the power spectrum that
more or less corresponds to the modal line length of the VMS.
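
One way to look for such a peak can be sketched as follows. This is not
necessarily how the plots mentioned above were produced: it assumes the text
is a single string with spaces and line breaks removed, builds a 0/1
indicator sequence for one chosen glyph, and evaluates the periodogram at a
single trial period. The function name power_at_period is hypothetical.

```python
import cmath


def power_at_period(text, glyph, period):
    """Periodogram value of the 0/1 indicator of `glyph` at one trial period.

    `period` is measured in glyphs. A glyph that prefers line ends should
    give a larger value near the typical line length than elsewhere.
    """
    x = [1.0 if ch == glyph else 0.0 for ch in text]
    n = len(x)
    if n == 0:
        raise ValueError("empty text")
    mean = sum(x) / n  # remove the constant component before transforming
    total = sum(
        (v - mean) * cmath.exp(-2j * cmath.pi * t / period)
        for t, v in enumerate(x)
    )
    return abs(total) ** 2 / n


# Synthetic stream: 20 "lines" of 10 glyphs, each ending in "m".
stream = ("a" * 9 + "m") * 20
print(power_at_period(stream, "m", 10))  # trial period equal to line length
print(power_at_period(stream, "m", 7))   # unrelated trial period
```

In real text the line length varies, so any peak would be broader and lower
than in this synthetic case.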
It would be interesting to see the distribution of words at the <EOL>s
(excluding <EOP>s) versus the <EOP>s.
This could (if *very lucky*) indicate which are the likely candidates for "end
of sentence" within a chunk of text.
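
The comparison could be sketched like this, again assuming whitespace-separated
words and a trailing "=" on the last line of each paragraph. The function name
line_final_counts is hypothetical and the sample lines are made up.

```python
from collections import Counter


def line_final_counts(lines):
    """Count line-final words separately for plain <EOL>s and for <EOP>s."""
    eol = Counter()  # last words of lines that do not end a paragraph
    eop = Counter()  # last words of lines that end a paragraph ("=")
    for raw in lines:
        text = raw.strip()
        is_eop = text.endswith("=")
        words = text.rstrip("=").split()
        if not words:
            continue  # skip blank lines
        target = eop if is_eop else eol
        target[words[-1]] += 1
    return eol, eop


# Made-up sample lines, not an actual transcription.
sample = ["daiin chol chey", "pcheol am=", "kydaiin tschor", "chey daiin="]
eol, eop = line_final_counts(sample)
print("EOL only:", eol.most_common())
print("EOP:     ", eop.most_common())
```

Words that are frequent at <EOP> and also frequent line-internally would be
the ones to look at as "end of sentence" candidates.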
> Later, I am going to introduce internal (hypothetical) sentence-delimiter.
> For example, if I see something like "am kydaiin" or "daiin pcheol" or
> "chey tschor", I will insert an end-of-sentence between them.
> "am =~ kydaiin" , "daiin =~ pcheol" , "chey =~ tschor"
Looking forward to seeing that!
Best wishes,
Gabriel