
Re: Doubled words



Hi!

Quick question on this one:

--- Jorge Stolfi <stolfi@xxxxxxxxxxxxx> wrote:

>  > [Philip Neal:] If the current word is qokeey, there is a 6%
>  > chance that the next word will be qokeey.

> The repetitions of "qokeey" are indeed exceptional,
>
> [...]
>
> I looked for doublets (consecutive word repeats, ignoring punctuation)
> in some of my reference texts, see the table below. The columns are
>
>   ndup   number of doublets in the text
>   fdup   frequency of doublets relative to num of tokens
>   topwd  the most frequent word appearing in those doublets
>   ntd    count of "topwd topwd" doublets

Here's my question: are ndup and fdup summed
over all words, or computed for the most
commonly reduplicated word only?

>   sample   language   book                    ndup   fdup  topwd      ntd
>   -------- ---------- ----------------------- ---- ------  ---------- ---
>   chin/red Mandarin   Dream_of_Red_Mansion     351 .01002  lao3   (*)  44

So: what would fdup for the most commonly reduplicated
word be?
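
To make the two readings concrete, here is a minimal Python sketch of
how I would compute the columns myself. It is only my guess at the
definitions, not necessarily how the table was produced. The function
names are my own, the input is assumed to be a list of word tokens with
punctuation already stripped, and a run like "a a a" is assumed to count
as two doublets.

```python
from collections import Counter


def doublet_stats(tokens):
    """Return (ndup, fdup, topwd, ntd) for a list of word tokens."""
    # A doublet is a pair of identical consecutive tokens.
    doublets = Counter(a for a, b in zip(tokens, tokens[1:]) if a == b)
    ndup = sum(doublets.values())                  # summed over all words
    fdup = ndup / len(tokens) if tokens else 0.0   # relative to token count
    topwd, ntd = doublets.most_common(1)[0] if doublets else (None, 0)
    return ndup, fdup, topwd, ntd


def top_word_rates(tokens):
    """Two candidate 'fdup for the top word only' figures."""
    _, _, topwd, ntd = doublet_stats(tokens)
    if topwd is None:
        return None, None
    per_token = ntd / len(tokens)             # same denominator as fdup
    per_occurrence = ntd / tokens.count(topwd)  # chance topwd is followed by itself
    return per_token, per_occurrence


if __name__ == "__main__":
    sample = "a a b a a a c b b".split()
    print(doublet_stats(sample))
    print(top_word_rates(sample))
```

The second figure from top_word_rates is the one that would be
comparable to the 6% quoted for qokeey, since it is relative to the
occurrences of the word itself rather than to all tokens.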

By the way, what is table-guessed Pinyin? :-)

Cheers, Rene
