
VMs: Proof, a la Rugg, that the German language is nothing but a hoax (shorter and perhaps more interesting)

> This was posted today on sci.lang and other newsgroups.
> Even though the author might not have used a Cardan grill,
> this is complete gibberish. But much of it is recognizable
> English. Therefore...? Yes, therefore, anything written
> in English is gibberish. Obvious, isn't it?
> (... lots of lines removed ...)

Hello all,

A few days ago I tried to generate random text fragments with the same
entropy as the EVA transcriptions of the VM. I compared them against the
real transcription text, but my results did not reveal any new features
of the VM text.

I wrote a little program and called it "nonsense" - it takes an
arbitrary text as input and generates random and meaningless output with
the same entropy in groups of N characters. That's what I call "data
processing": fine input and gibberish output. ;-)
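One possible reading of "entropy in groups of N characters" is the Shannon entropy of overlapping N-character blocks. The sketch below is only that, a sketch: the function name block_entropy and the choice of overlapping blocks are assumptions, not a description of the "nonsense" program.

```python
import math
from collections import Counter

def block_entropy(text, n):
    """Shannon entropy (in bits) of the overlapping n-character blocks of text.

    Assumption: blocks overlap and are counted with equal weight; the
    original program may measure entropy differently.
    """
    blocks = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not blocks:
        return 0.0  # text shorter than n: no blocks to measure
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Calling block_entropy on the input text and on the generated text for N = 1, 2, 3 gives numbers that can be compared directly.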

(If anyone wants to experiment with the program, contact me by mail.)

Of course I checked my program by processing a fair amount of meaningful
text. A few weeks ago I began writing an introductory essay about the
VM in German (I am a native speaker), and I decided to test my
program on it.

I assume that most readers of the mailing list do not understand German.
If you look at the following two paragraphs, you may find it
difficult to decide which of them is German text and which is nonsense:

  Bei jedem heutigen Entschlüsselungsversuch und
  jeder anderen Form der Forschung am Text des
  Manuskriptes wird man einen Computer zur Hilfe
  nehmen wollen, das ist gar keine Frage. Und damit
  stellt sich ein Problem, welches die Schwierigkeiten
  der Symbolinterpretation auf die Spitze treibt:
  Der Text muss in einer Form vorliegen, die mit
  Hilfe eines Computerprogrammes zu verarbeiten ist.

  Beine Eine Schrauenles Buch ebe sind scher
  Zustracher Um wicht sind. So bekannen das Editors
  wüsser Fraus als einem ist sind ja schen noch gut
  ungen nich ein Objekten. Wer Annahme: Im an einender
  an durchspracht gewöhnliche Kulturmentielem, wares
  Mitten Dabet nache beschließensichkeits das Manuskript,
  dem Gruppen ihm ein des Notationen hine kleidung, Dokume
  ermöglichen Textes sind.

The second paragraph is the nonsense output. I processed a larger chunk
of generated nonsense and found that the word length distribution is
close to that of my input text, that the distributions of single
characters, character pairs and character triplets are close to those of
my input text, that the average sentence length is close to that of my
input text, and that nearly 50 percent of the generated words are valid
German words.
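A comparison of this kind could be sketched as follows. Note the assumptions: "close" is measured here by the total variation distance between two frequency distributions, and the function names are invented for illustration. This is not the procedure actually used for the figures above.

```python
from collections import Counter

def ngram_distribution(text, n):
    """Relative frequencies of the overlapping n-character groups in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def word_length_distribution(text):
    """Relative frequencies of word lengths, splitting on whitespace."""
    counts = Counter(len(word) for word in text.split())
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}

def total_variation(p, q):
    """Distance between two distributions: 0.0 if identical, 1.0 if disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

For example, total_variation(ngram_distribution(real, 3), ngram_distribution(fake, 3)) gives a single number for how far apart the triplet statistics of two texts are.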

The "nonsense" text generation can be done with a (well-constructed)
three-dimensional array of character frequencies and a random number
generator in a very simple algorithm. Suppose someone who does not
understand German creates fragments of gibberish by populating such an
array and applying an algorithm to that data, and the result is
"German-looking" text with some statistical properties close to those of
real German texts. What does that prove? What kind of evidence is it? For
Rugg it seems to prove that German is gibberish, and the "problem of the
German language" is solved. I am a native speaker of German, so I do
not exist either, q.e.d.

(In my program I did not use an array, but it is possible to do so.)


