Re: Words, terms, tokens, etc.
On Mar 8, 10:45, Jorge Stolfi wrote:
> The recent exchange about the meaning of "word" and "term" in Antoine's
> messages highlights a common problem.
>
> Anyone who deals with text has the need to distinguish between those
> two concepts. Unfortunately each community (if not each author) will
> pick different names for them. For instance, I recall that Gabriel's
> Zipf law paper uses "word" for the instance, and "token" for the
> dictionary entry; whereas the compiler-writer community makes precisely
> the opposite choice.
>
> So, a plea for all list members: when you post statistics to the list,
> please take the time to define your terms (or words, tokens, whatever 8-).
I agree. I seem to remember a book titled "Type and Token" by
some lexicostatistician or numerolinguist (or whatever-you-call-em)
a long time ago, making this distinction. There might be 150
instances of the word "cat" in a text; to Gustav Herdan that makes
150 tokens of the "cat" type. So when I first saw Gabriel's
terms I had a hard time adjusting.
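
To make the distinction concrete, here is a minimal sketch in Python, using "token" for each running instance and "type" for each distinct dictionary entry, as in the "cat" example above. The function name and the sample sentence are invented for illustration, and splitting on whitespace after lowercasing is only an assumption about what counts as a word.

```python
from collections import Counter


def type_token_counts(text):
    """Return (number of tokens, number of types, per-type counts).

    Assumes words are whitespace-separated and case-insensitive;
    real texts usually need a more careful tokenizer.
    """
    tokens = text.lower().split()   # every running instance is a token
    counts = Counter(tokens)        # each distinct key is a type
    return len(tokens), len(counts), counts


# Hypothetical sample sentence, not taken from any corpus.
sample = "the cat saw the other cat"
n_tokens, n_types, counts = type_token_counts(sample)
print(n_tokens, "tokens,", n_types, "types;",
      counts["cat"], "tokens of the type 'cat'")
```

Posting both numbers with statistics, along with the rule used to split the text, removes most of the ambiguity about which sense of "word" is meant.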
--
Jim Reeds, AT&T Labs - Research
Shannon Laboratory, Room C229, Building 103
180 Park Avenue, Florham Park, NJ 07932-0971, USA
reeds@xxxxxxxxxxxxxxxx, phone: +1 973 360 8414, fax: +1 973 360 8178