Tuesday, May 1, 2007

[BLUG] May meeting

Who wants to volunteer to present for the May meeting? I was hoping to
be able to present on Xen this month, but I haven't had the time to work
on it enough to do a presentation.

BTW: I actually enjoyed the "grab bag" meeting we had last month.
Sometimes it is just nice to find someone who I can actually discuss Linux
and Open Source with. I probably drive the wife-to-be nuts when I get
chatty.

Thank you,
Scott Blaydes
_______________________________________________
BLUG mailing list
BLUG@linuxfan.com
http://mailman.cs.indiana.edu/mailman/listinfo/blug

Re: [BLUG] open source search engines?

I favor Namazu much more than ht://Dig.

http://www.namazu.org/

It can search any type of document provided you create a (Perl) filter
for it. It also has support for boolean expressions.

It ships with filters(+) for the following (from /usr/share/namazu/filter):

apachecache.pl  gzip.pl       man.pl      postscript.pl  taro56.pl
bzip2.pl        hdml.pl       mhonarc.pl  powerpoint.pl  taro7_10.pl
compress.pl     hnf.pl        mp3.pl      rfc.pl         tex.pl
deb.pl          html.pl       msword.pl   rpm.pl
dvi.pl          macbinary.pl  ooo.pl      rtf.pl
excel.pl        mailnews.pl   pdf.pl      taro.pl

(+) Some of the filters require third-party tools to extract the data
into a more easily parsable form.

Creating filters is a straightforward process. I've created several.
A custom filter is useful, for example, when:

* You have internal documentation in HTML with a strict style
guideline (for example, a title page with the author and date modified,
which is more reliable than the HEAD tags). You can exploit the known
style to pull meta-data from the document.

* You use a mail to HTML gateway like Pipermail and you want the correct
meta-data displayed.
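
To illustrate the first case, here is a minimal sketch of the extraction
logic only. Real Namazu filters are written in Perl, so this is not a
Namazu filter; it is Python used purely to show the idea. The class names
"author" and "modified" are hypothetical stand-ins for whatever markup
your own style guideline mandates.

```python
from html.parser import HTMLParser


class TitlePageParser(HTMLParser):
    """Collect text from elements whose class marks title-page meta-data."""

    # Hypothetical class names; substitute the ones your style guide uses.
    FIELDS = {"author", "modified"}

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        # Only the first occurrence of each field counts (the title page).
        if css_class in self.FIELDS and css_class not in self.meta:
            self._current = css_class
            self.meta[css_class] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.meta[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the field; nested markup is not handled.
        if self._current is not None:
            self.meta[self._current] = self.meta[self._current].strip()
            self._current = None


def extract_meta(html_text):
    """Return a dict of the title-page fields found in html_text."""
    parser = TitlePageParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta
```

Because the style is known in advance, the parser can ignore the HEAD
section entirely and trust only the title-page elements.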

What is neat about Namazu is that in addition to a CGI-based interface,
it also provides a command-line interface. I get a kick out of searching
my documents from the command line.
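
If you want to script those command-line searches, something like the
following sketch could work. It assumes the namazu binary is on your PATH
and that you have already built an index with mknmz; the function names
and the index path in the usage line are hypothetical.

```python
import subprocess


def namazu_command(query, index_dir):
    """Build the argument list for a command-line Namazu search."""
    # namazu takes the query first, then one or more index directories.
    return ["namazu", query, index_dir]


def search(query, index_dir):
    """Run the search and return whatever namazu prints."""
    result = subprocess.run(
        namazu_command(query, index_dir),
        capture_output=True,
        text=True,
        check=True,  # raise if namazu exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical index location; boolean operators go in the query string.
    print(search("xen and linux", "/var/lib/namazu/index"))
```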

Not all PDF documents will be easily parsable. Some PDF documents are
actually stored as images internally (for instance, faxes that arrive
as PDF documents). In such cases you would want to use OCR software on
the PDF. (You would likely need to convert it to another format first.)
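
One rough way to spot such image-only PDFs before indexing is to look at
how much text the extraction step actually produced. The sketch below is
only a heuristic: it assumes you already have the extracted text for each
page (from whatever tool your filter uses), and the threshold of 20
characters per page is an arbitrary guess you would need to tune.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-only from its per-page extracted text.

    page_texts is a list with one string of extracted text per page.
    """
    if not page_texts:
        # Nothing could be extracted at all, so treat it as image-only.
        return True
    # Count only non-whitespace characters; blank lines prove nothing.
    total = sum(
        1 for text in page_texts for ch in text if not ch.isspace()
    )
    return total / len(page_texts) < min_chars_per_page
```

A document flagged this way can be routed to an OCR step instead of being
indexed with an empty body.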

Cheers,
Steven Black

On Tue, 2007-05-01 at 14:45 -0400, Joe Auty wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> Does anybody have any experience working with open source search
> engines? I've looked at ht://dig, but had some problems getting it to
> do what I wanted. Has anybody used any other?
>
> Requirements:
>
> - - must be able to search .doc, .pdf, and a wide variety of other formats
> - - must be able to pass on a username and password to a site using
> Apache basic authentication
> - - must work over SSL sites
> - - must return useful results :)
>
>
> - --
> Joe Auty
> NetMusician: web publishing software for musicians
> http://www.netmusician.org
> joe@netmusician.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGN4q9CgdfeCwsL5ERArvUAJkBklb7yKsMZoQWz6dDzuI4ONl0qACcD8al
> 9nveG4STsH9pbYDlB1YPHgU=
> =5FAT
> -----END PGP SIGNATURE-----

[BLUG] open source search engines?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Does anybody have any experience working with open source search
engines? I've looked at ht://dig, but had some problems getting it to
do what I wanted. Has anybody used any other?

Requirements:

- - must be able to search .doc, .pdf, and a wide variety of other formats
- - must be able to pass on a username and password to a site using
Apache basic authentication
- - must work over SSL sites
- - must return useful results :)


- --
Joe Auty
NetMusician: web publishing software for musicians
http://www.netmusician.org
joe@netmusician.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGN4q9CgdfeCwsL5ERArvUAJkBklb7yKsMZoQWz6dDzuI4ONl0qACcD8al
9nveG4STsH9pbYDlB1YPHgU=
=5FAT
-----END PGP SIGNATURE-----