
Re: AW: VMs: introducing myself and 2 questions :)



Hi Sebastian,

oh sh.... those who need to click on all these links will end up with
carpal tunnel syndrome *g*
i downloaded the jpegs and did a short compression test with photoshop:
it's possible to make them about 30% smaller without any noticeable loss in
quality. i would do the compressing and zip them into one file, but:
is it legal to offer them for download?

Commendably, GC has already done this (in a 23MB PDF) on his site, http://www.baconbooks.net/ - you can find it at http://www.baconbooks.net/Voynich/voynich.htm to be precise. Thank heavens for broadband, eh? :-)


Personally, I prefer having individual files to the PDF (perhaps because I'm used to using PhotoShop and Debabelizer Pro, etc) - but feel free to use whatever works best for you. :-)

i thought about making a mysql-db with an entry for each mail. fields would
include author, date and so on, so each mail would become a single
database entry.
these entries would be easily searchable, so this would be an answer to the
question:
>(b) [user-centric]: what do we mean by "useful"?
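
The one-entry-per-mail idea can be sketched roughly as follows. This is only an illustration, assuming each mail is already available as raw text with standard "From:", "Date:" and "Subject:" headers; the function name `to_record` and the field names are hypothetical, and the resulting dictionary merely stands in for a database row.

```python
from email import message_from_string

def to_record(raw_mail):
    """Turn one raw mail into a dict that could become a single database entry.

    The field names (author, date, subject, body) are illustrative only.
    """
    msg = message_from_string(raw_mail)
    return {
        "author": msg["From"],
        "date": msg["Date"],        # kept as the original header string
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),  # assumes a plain, non-multipart mail
    }
```

The structured fields are easy to fill in this way; whether they are the fields people actually search on is a separate question.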

I'm not sure I agree - as you'll probably have noticed already, the "Subject:" line of a post to the VMs list is largely irrelevant to where most threads lead. Also, only rarely would you want to look at every post by date or by From: (though I should perhaps say that I have done this for GC and for Rayman Malekei, to name but two)... the vast majority of the time, the purpose of trawling the archives is to find what has been discussed about a particular topic or person (say, Edward Kelly, Pietro d'Abano, alchemical projection, etc), which is a kind of glorified grep.


The majority of searches, then, would be content-field-based rather than structure-field-based - which is not really where you want to be with MySQL. :-(
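
To make the "glorified grep" concrete, here is a minimal sketch of a content search, assuming the mails are already held as a list of plain-text strings. The function name `grep_messages` is made up for illustration, and a real archive would probably want something smarter than a linear scan.

```python
import re

def grep_messages(messages, term):
    """Return (message index, matching line) pairs, ignoring case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for index, body in enumerate(messages):
        for line in body.splitlines():
            if pattern.search(line):
                hits.append((index, line.strip()))
    return hits
```

Note that nothing here uses the author, date or subject fields: the search runs over the body text alone.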

hm, it's possible by coding a fetcher program that cuts the textfiles at
each mail's beginning and end. i have some experience with this, because a
few months ago i needed to transform html-pages from a nonexistent
db-system into a mysql-db.
it's not that hard, but you need a separator text where you can see that an
old mail ends and a new mail starts. i had a short look at the txtfiles and
maybe this could be "From ".

There's also a lot of quoting going on, which might throw that out - check carefully that you'll get what you expect. Apps like MHonArc work hard to divide stuff up sensibly, but not always successfully.
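
A minimal sketch of such a splitter, with one way to guard against stray "From " lines in the body text. It assumes the archive files are mbox-style, i.e. each mail starts with a line like "From someone@example.com Fri Jan 10 12:34:56 2003" that follows a blank line (or the start of the file); that layout is an assumption and should be checked against the real files. The names `SEPARATOR` and `split_messages` are made up for illustration.

```python
import re

# Assumed separator shape: "From <address> <weekday> ...". Not verified
# against the actual archive files.
SEPARATOR = re.compile(r"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) ")

def split_messages(text):
    """Split archive text into a list of raw mails."""
    messages = []
    current = None          # None until the first separator is seen
    previous_blank = True   # the start of the file counts as a boundary
    for line in text.splitlines():
        if previous_blank and SEPARATOR.match(line):
            if current is not None:
                messages.append("\n".join(current).strip("\n"))
            current = [line]
        elif current is not None:
            current.append(line)
        previous_blank = (line.strip() == "")
    if current is not None:
        messages.append("\n".join(current).strip("\n"))
    return messages
```

A body line such as "From what I can tell..." does not match the stricter pattern, and a quoted "> From ..." line does not start with "From " at all, so neither starts a new mail. Anything before the first separator is dropped.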


cool idea, but we need to feed google first.

Absolutely - but the mistakes you make at the very first stage are likely to stay with you for a good while, so it's worth trying to do it right. :-o


we could cut the mailarchive in pieces like i described above and generate
an html-page for each email and then link it from several webpages so that
google will index it, because afaik google doesn't index dynamic pages?

Like most web-crawlers, Google will index dynamic pages fine, as long as they have a unique URL (ie, ?msg=0045667 etc) to be referenced by, and a way for its robot to crawl through them all (for example, a big list of threads, with each thread cross-linked internally).


AFAIK, it's *frames* and *session-based stuff* which it can't digest. :-o
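
As a rough sketch of the "unique URL plus crawlable list" idea: the helper below builds one stable link per mail in the ?msg=0045667 style and collects the links into a single plain HTML list that a robot can follow. The function names, the seven-digit padding and the (id, subject) input shape are assumptions made for the example.

```python
from html import escape

def message_url(msg_id):
    """Stable, unique URL for one mail, e.g. ?msg=0045667."""
    return "?msg=%07d" % msg_id

def build_index(messages):
    """Build an HTML list linking to every mail.

    messages is a list of (msg_id, subject) pairs.
    """
    items = []
    for msg_id, subject in messages:
        items.append('<li><a href="%s">%s</a></li>'
                     % (message_url(msg_id), escape(subject)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Because every link is an ordinary href with no frames or session state, the same list works whether the target pages are generated dynamically or written out as static files.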

well it is at least something that will keep me busy thinking about the
whole weekend and my girlfriend will wonder why i take the dog for a walk
4 hours a day *g*

It's a "three pipe problem", for sure! :-)


Feel free to email the list as your thoughts develop, and we'll try to move it forward between us. :-)

Cheers, .....Nick Pelling.....

