I wonder what chi2 you would see between two texts in the

I don't know a lot about the chi-square test, but whenever I have seen it used in books, it involved a situation where something was classified into a small number of categories, and the chi-square statistic was then calculated for the difference between the observed distribution by category and the distribution expected under some hypothesis. If the calculated statistic (which grows with the degree of difference) was too large, the hypothesis was rejected.
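To make the textbook case concrete, here is a minimal sketch of that calculation. The die-roll counts are invented purely for illustration, and the function name `chi_square` is my own.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented example: 120 rolls of a die, hypothesis "the die is fair".
observed = [18, 22, 20, 25, 15, 20]
expected = [sum(observed) / 6] * 6      # 20 rolls expected per face
stat = chi_square(observed, expected)
df = len(observed) - 1                  # degrees of freedom for 6 categories

# The 5% critical value for 5 degrees of freedom is about 11.07,
# so a statistic well below that would not reject the hypothesis.
print(f"chi-square = {stat:.2f} with {df} degrees of freedom")
```

The point is only that the test needs a fixed list of categories and an expected count for each one before anything can be computed.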
For a pair of texts, though, it is not clear to me how to "categorize" the texts, that is, which of the many possible categorization schemes to use. For example, you could categorize the texts by the distribution of letter frequencies, the distribution of word lengths, the distribution of positions within the line of "gallows letter" words, and so on. Each scheme would yield a different chi-square statistic.
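As a rough illustration of how much the choice of scheme matters, the sketch below compares the same two strings under two of those schemes. Everything here is an assumption on my part: the two strings are toy examples in EVA-like spelling, not real transcription lines, and the comparison uses the standard 2 x k contingency-table form of the statistic, which may not be what the original poster had in mind.

```python
from collections import Counter

def two_sample_chi2(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of category counts.

    Returns (statistic, degrees_of_freedom). Both texts must be non-empty.
    """
    categories = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand = total_a + total_b
    stat = 0.0
    for cat in categories:
        col = counts_a[cat] + counts_b[cat]      # Counter gives 0 if absent
        for obs, row_total in ((counts_a[cat], total_a),
                               (counts_b[cat], total_b)):
            exp = row_total * col / grand        # expected if texts are alike
            stat += (obs - exp) ** 2 / exp
    return stat, len(categories) - 1

def letter_counts(text):
    """Scheme 1: categorize by letter."""
    return Counter(ch for ch in text if ch.isalpha())

def word_length_counts(text):
    """Scheme 2: categorize by word length."""
    return Counter(len(word) for word in text.split())

# Toy strings for illustration only (hypothetical, not real VMS lines).
text_a = "daiin chedy qokeedy shedy daiin okaiin"
text_b = "chol chor shol cthy daiin chol"

stat_letters, df_letters = two_sample_chi2(letter_counts(text_a),
                                           letter_counts(text_b))
stat_lengths, df_lengths = two_sample_chi2(word_length_counts(text_a),
                                           word_length_counts(text_b))
print(f"letters:      chi-square = {stat_letters:.2f}, df = {df_letters}")
print(f"word lengths: chi-square = {stat_lengths:.2f}, df = {df_lengths}")
```

The two schemes give different statistics and different degrees of freedom for the very same pair of texts, so the numbers are not directly comparable. With samples this small the expected counts are far too low for the test to be trusted anyway; the strings are only there to show the mechanics.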
The categorization is also sensitive to features of the languages involved. For example, the existence of lots of "fuzzy matches" between sentences in the VMS suggests the possibility that some characters which EVA considers to be distinct might actually be the same. If this were true, it would strongly affect the categorization by letter frequency and could result in a quite different chi-square statistic.
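Here is a small sketch of that effect, again on toy EVA-like strings rather than real transcription lines. The choice to merge "a" into "o" is purely hypothetical and picked only to show the mechanics; I am not claiming those two EVA characters are actually the same glyph.

```python
from collections import Counter

def letter_chi2(text_a, text_b, merge=None):
    """2 x k chi-square statistic on letter counts, optionally merging glyphs.

    merge is a hypothetical mapping such as {"a": "o"}, meaning
    "count every 'a' as an 'o'". Returns (statistic, degrees_of_freedom).
    """
    table = str.maketrans(merge or {})
    counts_a = Counter(ch for ch in text_a.translate(table) if ch.isalpha())
    counts_b = Counter(ch for ch in text_b.translate(table) if ch.isalpha())
    categories = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand = total_a + total_b
    stat = 0.0
    for cat in categories:
        col = counts_a[cat] + counts_b[cat]
        for obs, row_total in ((counts_a[cat], total_a),
                               (counts_b[cat], total_b)):
            exp = row_total * col / grand
            stat += (obs - exp) ** 2 / exp
    return stat, len(categories) - 1

# Toy strings for illustration only (hypothetical, not real VMS lines).
text_a = "daiin chedy qokeedy shedy daiin okaiin"
text_b = "chol chor shol cthy daiin chol"

original_stat, original_df = letter_chi2(text_a, text_b)
merged_stat, merged_df = letter_chi2(text_a, text_b, merge={"a": "o"})
print(f"EVA as given:   chi-square = {original_stat:.2f}, df = {original_df}")
print(f"'a' counted as 'o': chi-square = {merged_stat:.2f}, df = {merged_df}")
```

In this toy case the merge removes one category and lowers the statistic, because the two texts differ in how they split between "a" and "o" but not in the combined count. A different merge could just as easily move the statistic the other way.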