
Re: VMS-List-Archive Search Engine (was: AW: AW: VMs: introducing myself and 2 questions :))



Sebastian Unterreitmeier sebastian@xxxxxxxxxxxxxx

> hi jeff,
>
> i'm not really sure i understand you right, because of my limited
> english abilities, but i think you mean something like a search index?
> i think it's like the idea that nick had:
> for often-searched terms like the page numbers or people's names like
> bacon or dee we create a table in the database where one field
> stores the search term and the other field stores an array of all
> email numbers where this search term can be found.
> if one types in the search term, the system searches this table
> first, and if the term is not referenced there it searches the
> whole database. if the term is listed, the system reads the array,
> fetches the emails referenced there and just needs to send them
> to the browser. the indexing process that builds up the
> index table would be a simple cron job running at night.
>
> we could also use this to build a page with the index list, so one
> can easily click on it, like nick mentioned.
> for example you can have a list with all the page numbers on it and
> so you can easily retrieve all information, e.g. for page f5v (or
> all information about "bacon"). :)
> i would also pre-generate this information as a text file at night
> with a cron job, so that one can simply download it
> for offline reading (of course pdf is possible, but a license
> is needed for that.)
>
> kind regards,
> sebastian
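
For reference, the index-table idea quoted above could be sketched roughly
as follows. This is only an illustration in Python, not anything that exists
on the list server: the sample messages, the term list and the function
names are all made up, and a real version would read from and write to the
database instead of a dictionary.

```python
import re
from collections import defaultdict

# Hypothetical archive: message number -> message text.
MESSAGES = {
    1: "Bacon is often named as a possible author.",
    2: "The plant on page f5v is hard to identify.",
    3: "Dee may have owned the manuscript; see f5v too.",
}

def words(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(messages, terms):
    """The nightly job: map each often-searched term to the numbers
    of the messages that contain it."""
    index = defaultdict(list)
    for number, text in sorted(messages.items()):
        present = set(words(text))
        for term in terms:
            if term in present:
                index[term].append(number)
    return dict(index)

def search(term, messages, index):
    """Look in the index table first; fall back to a full scan."""
    term = term.lower()
    if term in index:
        return index[term]
    return [n for n, text in sorted(messages.items()) if term in words(text)]

INDEX = build_index(MESSAGES, ["bacon", "dee", "f5v"])
```

With this sample data, search("f5v", MESSAGES, INDEX) is answered from the
index table, while search("plant", MESSAGES, INDEX) falls through to the
full scan.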

Hi Sebastian. What I really meant was text compression, where common words
become numeric tokens. The word "manuscript" could be replaced by the number
(or hash) #1000 as a token, and other common words by other allocated
numbers. The server would accept a search query and tokenize it before
doing the database search. Once the results are retrieved, one of two things
could happen: in the worst case the server does the demangling, or else
JavaScript at the client end could demangle. It all depends on how big the
token table gets. If the exact number of tokens were determined at the
server end, an on-the-fly lookup script could be constructed and sent back
to the browser.

Another method would be to have a Java applet at the client end collect the
tokenized results and connect to a server process on the server machine on a
specified port. This server would carry out database lookups as and when
necessary, in real time. Instead of a raw socket connection, the endpoint
could be a PHP script returning a text/plain content type on port 80, so it
could still run through the web server. The applet would then only need to
issue an HTTP GET to the PHP URL.
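
To make the token idea a bit more concrete, here is a rough sketch in
Python. Everything in it is an assumption for illustration: the token
numbers, the tiny token table, the sample message and the function names.
It only shows the round trip of tokenizing stored text and the query,
searching, and demangling the hits. It does not cover the applet or PHP
side.

```python
import re

# Hypothetical token table: common word -> allocated number.
TOKENS = {"manuscript": 1000, "page": 1001, "the": 1002}
WORDS = {number: word for word, number in TOKENS.items()}

def tokenize(text):
    """Replace each common word with '#<number>'; leave other words alone.
    Note that the original capitalisation of a replaced word is lost."""
    def swap(match):
        word = match.group(0)
        number = TOKENS.get(word.lower())
        return f"#{number}" if number is not None else word
    return re.sub(r"[A-Za-z]+", swap, text)

def demangle(text):
    """Turn '#<number>' back into the word; unknown numbers stay as they are."""
    def swap(match):
        return WORDS.get(int(match.group(1)), match.group(0))
    return re.sub(r"#(\d+)", swap, text)

# Hypothetical store: message number -> tokenized message text.
STORED = {1: tokenize("the manuscript has a plant on page f5v")}

def search(query):
    """Tokenize the query, match it against the tokenized store,
    and demangle whatever is found (the server-side worst case)."""
    needle = tokenize(query)
    return [demangle(text) for _, text in sorted(STORED.items()) if needle in text]
```

Here the message is stored as "#1002 #1000 has a plant on #1001 f5v", and
search("manuscript") looks for "#1000" in it. The plain substring match is a
simplification: it only stays safe while all tokens have the same number of
digits, and text that already contains something like "#1000" would need
escaping first.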

Any comments welcome.

jeff

