
RE: Fw: Character n anomaly



Hi Jorge,

I am afraid that it won't do. With your method, the probability of a
random text token having k letters would be roughly p*(1-p)**(k-1)
where p is the probability of inserting a space (1/10 or 1/6 in your
examples).
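
For anyone who wants to see what that formula implies, here is a minimal Python sketch. It assumes only that a space is inserted independently after each letter with probability p; the function name geometric_pmf is made up for illustration and comes from neither of us.

```python
def geometric_pmf(p, k):
    """Probability that a token has exactly k letters, assuming a space
    follows each letter independently with probability p."""
    return p * (1 - p) ** (k - 1)


if __name__ == "__main__":
    # The two space probabilities mentioned in the examples.
    for p in (1 / 10, 1 / 6):
        print(f"p = {p:.4f}")
        for k in range(1, 11):
            print(f"  k = {k:2d}  P = {geometric_pmf(p, k):.4f}")
```

Whatever p is chosen, the most probable token length under this model is 1, and each longer length is less probable than the one before it.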

The point I was (inelegantly) trying to make was that if the observed distribution were driven by an external process, a good thought experiment would be to reconstruct how that could have been achieved using only contemporaneous information tools, like dice or cards (or chess-pieces?).


i.e., throw two dice, and use (their sum - 1) as the length of the next fake token. Why would someone go to the trouble of making a significantly more complex model than that?
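
As a rough illustration of the two-dice idea, the following Python sketch enumerates all 36 outcomes and tabulates the resulting token lengths. It assumes two fair six-sided dice; the function name dice_length_distribution is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_length_distribution():
    """Exact distribution of (sum of two fair dice - 1), as fractions of 36."""
    counts = Counter(a + b - 1 for a, b in product(range(1, 7), repeat=2))
    return {k: Fraction(n, 36) for k, n in sorted(counts.items())}


if __name__ == "__main__":
    for length, prob in dice_length_distribution().items():
        print(f"length {length:2d}: {prob} ({float(prob):.3f})")
```

The lengths run from 1 to 11 and the distribution is symmetric about its peak at 6 letters, which has probability 1/6.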

One way to test "Nullspace" theories is to remove all spaces from the
VMS text, then re-insert them according to the proposed method. If the
theory is correct, the resulting text should have the same word
statistics and structure as the original. The above space-insertion
methods would definitely fail this test.
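
The test described above can be sketched in Python roughly as follows. This is only an outline under stated assumptions: the sample string is a made-up placeholder rather than a VMS transcription, all function names are hypothetical, and comparing token-length histograms is just one of the word statistics one would want to check.

```python
import random
from collections import Counter


def length_histogram(text):
    """Count how many tokens of each length occur in a space-separated text."""
    return Counter(len(tok) for tok in text.split())


def respace(text, next_length, rng):
    """Strip all whitespace, then cut the letters into tokens whose lengths
    are drawn one at a time from next_length(rng)."""
    letters = "".join(text.split())
    tokens, i = [], 0
    while i < len(letters):
        k = max(1, next_length(rng))  # guard against zero-length tokens
        tokens.append(letters[i:i + k])
        i += k
    return " ".join(tokens)


def geometric_length(p):
    """Length sampler for 'insert a space after each letter with probability p'."""
    def sample(rng):
        k = 1
        while rng.random() >= p:
            k += 1
        return k
    return sample


def two_dice_length(rng):
    """Length sampler for 'sum of two dice, minus 1'."""
    return rng.randint(1, 6) + rng.randint(1, 6) - 1


if __name__ == "__main__":
    # Placeholder text only; substitute a real transcription to run the test.
    original = "abcde fghij klm nopqrs tuv wxyz abcd efghi jkl mnopq"
    rng = random.Random(0)
    print("original :", sorted(length_histogram(original).items()))
    for name, sampler in (("geometric", geometric_length(1 / 6)),
                          ("two dice ", two_dice_length)):
        fake = respace(original, sampler, rng)
        print(name, ":", sorted(length_histogram(fake).items()))
```

If the re-spaced histograms differ markedly from the original one, the proposed insertion method fails the test for that statistic.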

Completely agreed. But if the distribution is too artificial for a natural language, and too structured for a random process, what is it indicative of?


Cheers, .....Nick Pelling.....