
Re: VMs: truncated repeating sequences

To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: truncated repeating sequences
From: Gabriel Landini <G.Landini@xxxxxxxxxx>
Date: Thu, 9 Sep 2004 16:10:31 +0100
In-reply-to: <JAEKJOMCOCMKCPMKKHGMEEGMCKAA.markefincher@travelinfosystems.com>
Organization: The University of Birmingham, UK.
References: <JAEKJOMCOCMKCPMKKHGMEEGMCKAA.markefincher@travelinfosystems.com>
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx
User-agent: KMail/1.7

On Thursday 09 September 2004 15:07, Marke Fincher wrote:
> The next step is to see if >99% of the VMs can be created by pasting
> together decent sized chunks taken from a small set of "master sequences". 

The answer is very likely to be "no" because of the word frequency
distribution.
Note that the procedure will have to fit (somehow) all the words that appear
once or twice in the entire ms. Those words account for more than 1% of the
text, so there is no chance that 99% of the ms. can be produced from repeated
sequences alone.

I just checked: words that appear only once make up about 14% of the corpus.
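To make the frequency argument concrete, here is a minimal Python sketch that measures what fraction of the running text consists of words occurring exactly once or twice. It is illustrative only, not necessarily how the 14% figure above was obtained. It assumes the transcription has already been split into a list of word tokens (e.g. EVA words separated by spaces), and the function name `rare_word_fractions` is hypothetical.

```python
from collections import Counter

def rare_word_fractions(tokens, max_count=2):
    """Return {k: fraction of all tokens whose word occurs exactly k times}.

    Hypothetical helper: `tokens` is assumed to be a pre-split list of words.
    """
    counts = Counter(tokens)
    total = len(tokens)
    result = {}
    for k in range(1, max_count + 1):
        # A word type seen exactly k times contributes k tokens to the text.
        n_tokens = sum(c for c in counts.values() if c == k)
        result[k] = n_tokens / total if total else 0.0
    return result
```

If `result[1] + result[2]` is well above 0.01, then more than 1% of the text cannot be copied from any chunk that is reused, which is the point made above.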

It would also be useful to look at Stolfi's concordance lists to see to what
extent the repetitions are common.
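Short of going through the concordance lists by hand, one rough automatic proxy is to measure how much of the text is covered by word n-grams that occur at least twice. This is an assumption on my part, not Stolfi's method; the function name `repeated_ngram_coverage` is hypothetical and the input is again a pre-split list of word tokens.

```python
from collections import Counter

def repeated_ngram_coverage(tokens, n):
    """Fraction of token positions covered by at least one word n-gram (n >= 1)
    that occurs two or more times anywhere in the text."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    covered = [False] * len(tokens)
    for i, gram in enumerate(ngrams):
        if counts[gram] >= 2:
            # Mark every position spanned by this repeated n-gram.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens)
```

Running it for increasing `n` shows how quickly coverage by repeated sequences falls off as the chunks get longer.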

I would also say that it is important to test the same algorithm with other
data (e.g. other languages and word-scrambled texts). A sample of n=1 will
not be very convincing, as we do not know how relevant the effect may be.
Worth doing, though. I would be interested to know how common this effect is
in real languages.
I am sure that lots of repetitions are found in Askham's herbal. If I remember
correctly, most plant descriptions start with "This herbe is called ...".
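A word-scrambled control keeps exactly the same word frequencies while destroying word order, so any repetition measure computed on it indicates how much repetition the frequency distribution alone would produce. A minimal sketch, with hypothetical function names and a fixed seed for reproducibility; the repeated-bigram count is just one simple measure, chosen for illustration.

```python
import random
from collections import Counter

def word_scrambled(tokens, seed=0):
    """Return a shuffled copy of `tokens`: same word frequencies, random order."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def repeated_bigram_types(tokens):
    """Number of distinct adjacent word pairs that occur two or more times."""
    counts = Counter(zip(tokens, tokens[1:]))
    return sum(1 for c in counts.values() if c >= 2)
```

Averaging `repeated_bigram_types(word_scrambled(tokens, seed))` over many seeds and comparing it with `repeated_bigram_types(tokens)` gives a first, rough idea of whether the repetitions exceed what chance would give, for the VMs and for comparison texts alike.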

Cheers,

Gabriel


