Re: VMs: Gordon Rugg's study follow ups
Bruce Grant wrote:
> This raises an interesting question: is there a good way to measure
> the similarity of texts objectively?
> (I realize that calculation of entropy is one such test, but a broad one
> to be sure.)
Way too broad, I think.
> I have seen a couple of books which applied "stylometric" techniques to
> New Testament Greek texts, for example, by comparing the relative
> frequencies of synonyms, but these techniques don't appear too useful
> for a text whose meaning is unknown.
Yes, about 40 years ago I saw an article on that, in our
favorite publication in fact! The article said that not all
of Paul's letters in the New Testament are in fact by Paul,
something now accepted by the majority of scholars. I also
saw a book, *Trouble Enough*, doing the same thing on the
Book of Mormon. I think the algorithm counted common words
like "the", "and", various prepositions, etc. I believe
this is also what the New Testament studies did. The
*Trouble Enough* study showed that the books of the Book of
Mormon were not by several authors. So these methods
compare texts within a corpus and could help establish the
difference between A and B, but I don't know what else they
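
For what it's worth, the common-word counting described above can be
sketched in a few lines of Python. This is only my guess at the general
idea, not the algorithm used in any of the studies mentioned: the word
list, the per-1000-token rate, and the names function_word_profile and
profile_distance are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed, illustrative list of common words; real studies use longer lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that"]


def function_word_profile(text, words=FUNCTION_WORDS):
    """Return the rate per 1000 tokens of each listed word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: 1000.0 * counts[w] / total for w in words}


def profile_distance(p, q):
    """Mean absolute difference between two profiles over the same words."""
    return sum(abs(p[w] - q[w]) for w in p) / len(p)
```

Two texts with similar habits in these words give a small distance.
Whether a given distance is large enough to mean "different author" is
a separate statistical question that this sketch does not answer.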
Most recently, the SHAXICON style checker showed that
Newsweek staffer Joe Klein wrote the Clinton-era *roman à
clef* "Primary Colors". I've also seen a style checker used
to identify a notorious troll on USENET. I don't know how
these programs work. Jim Gillogly mentioned SHAXICON on the
list a long time ago, so perhaps he does.
Gabriel compared the Zipf's Law curves of known Latin texts
by different authors, and due to that wondered whether A and
B are as different as we think. So that's something else to
There is the chi-squared test, of course. Jacques has
mentioned the phi-squared test from time to time and said
that phi-squared tests not just the significance of an
observed difference between two sample data sets but also
the magnitude of the difference. That sounds like the best
thing of all.
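
To make the distinction concrete, here is a minimal Python sketch,
assuming the usual textbook definitions: Pearson's chi-squared on a
2 x k table of counts, and phi-squared as chi-squared divided by the
total number of observations. I don't know that this is exactly the
form Jacques uses, and the function names are mine.

```python
def chi_squared(counts_a, counts_b):
    """Pearson chi-squared statistic for a 2 x k table of counts.

    counts_a[i] and counts_b[i] are the counts of category i
    (for example, one word) in samples A and B.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples need the same categories")
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one observation")
    n = total_a + total_b
    chi2 = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # category absent from both samples
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def phi_squared(counts_a, counts_b):
    """Chi-squared divided by the total count: a size-independent measure."""
    n = sum(counts_a) + sum(counts_b)
    return chi_squared(counts_a, counts_b) / n
```

With the same proportions in samples ten times larger, chi-squared
grows tenfold while phi-squared stays put, which is why phi-squared
reads as a measure of how big the difference is rather than how
certain we are that it exists.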