Hello,
I've just received an enquiry from a client of mine who is planning a site using Movable Type as the CMS. The important content of this site will be a series of PDF files, and they want a facility where users can search the content of these files.
Are there any Movable Type plugins that can search and index PDFs? I've been searching this forum and the web without much luck so far.
Any other ideas for this will be a great help.
Thanks in advance,
Rio
It sounds to me like what your client needs is a good search tool instead. I actually can't think of a single open source CMS which does what your client needs. Integrating a separate web-based search tool into a site with MT, however, should not be particularly difficult.
Hi Mike,
Thanks for your input. I hadn't thought about this option.
However, I did search for a search tool for this job, and so far I haven't come across one. Do you know of any good web-based search tool you can recommend?
Rio
If your host is a company with a decent budget, you might want to consider Autonomy. Otherwise, I don't know of anything that would be suitable that I can vouch for.
WRT Autonomy, it's an index and search **service** that you install on the server, not a web app. All of its index and search features can be controlled via simple HTTP requests. Therefore, if your client has some money and wants a large, scalable search system for a big web site, you could manage most of the site via MT and write your own CGI or PHP scripts which would query Autonomy via an HTTP request, parse the returned XML and display it cleanly to the user.
Autonomy can also be expensive, like 'moon landings' expensive.
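For what it's worth, here is a rough sketch of that "query over HTTP, parse the XML, display it" flow, written in Python rather than PHP. Everything specific in it is an assumption for illustration: the endpoint URL, the `text` and `maxresults` parameter names, and the `<results><hit>` XML layout are made up and are not Autonomy's actual API. You would swap in whatever the search service you pick really documents.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical endpoint, for illustration only (not a real Autonomy URL).
SEARCH_URL = "http://localhost:9000/search"


def build_query_url(terms, max_results=10):
    """Build the GET URL for a search request (parameter names are assumed)."""
    params = urllib.parse.urlencode({"text": terms, "maxresults": max_results})
    return f"{SEARCH_URL}?{params}"


def parse_hits(xml_text):
    """Extract (title, url) pairs from an assumed <results><hit>... layout."""
    root = ET.fromstring(xml_text)
    return [
        (hit.findtext("title", default=""), hit.findtext("url", default=""))
        for hit in root.iter("hit")
    ]


def render_hits(hits):
    """Render hits as an HTML list, escaping everything that came from the index."""
    items = "".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in hits
    )
    return f"<ul>{items}</ul>"


def search(terms):
    """Query the search service and return an HTML fragment for the page."""
    with urllib.request.urlopen(build_query_url(terms), timeout=10) as resp:
        return render_hits(parse_hits(resp.read().decode("utf-8")))
```

The HTML fragment returned by `search()` could then be dropped into a page that uses the same templates as the rest of the MT site, so the results look like part of the site.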
How many PDFs are you looking to index?
What platform are you running MT on?
If you have access to developers you could do something with Lucene (http://lucene.apache.org/) or DTSearch (http://www.dtsearch.com/).
If this is a public site, you could have Google index them and then add a custom Google search box.
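As a minimal sketch of the Google route: a search box can simply send the user's terms to Google with the `site:` and `filetype:` operators added, so only PDFs on your own domain come back. The function name and the `example.com` domain below are placeholders, and this only works once Google has actually crawled the files.

```python
from urllib.parse import urlencode


def google_pdf_search_url(terms, domain):
    """Build a Google search URL limited to PDF files on one domain."""
    # site: and filetype: are standard Google search operators.
    query = f"{terms} site:{domain} filetype:pdf"
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder for the client's real domain.
    print(google_pdf_search_url("annual report", "example.com"))
```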
The problem with Lucene is that it requires a Java environment in order to work. Unless they are using dedicated servers or running this on a corporate network where they can build out their own configuration with a good budget, they should stick to stuff that works on the LAMP stack.
This is an interesting series of articles on using PHP (Zend) Lucene to index PDFs: http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-1-pdf-meta-data-2081.html
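Whichever library ends up doing the work, the basic idea behind indexing is the same: get the text out of each PDF, then build a map from words to the files that contain them. Here is a toy sketch of that second step in Python. It assumes the text has already been extracted by some other tool (for example `pdftotext`), and the function names are made up for illustration. It is not how Lucene or Zend Lucene is implemented, and a real engine adds ranking, stemming and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of files containing it.

    documents: dict of filename -> text already extracted from the PDF.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the files that contain every word in the query, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)
```

For a collection of around a thousand files or fewer, an index like this fits comfortably in memory, which is part of why a lightweight tool can be enough for a site of that size.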
Hi there,
It sounds more complicated than I first thought.
This will be a brand new site, so the platform will be whichever version of MT is the latest at the start of the project. The number of PDF files to index won't be more than 1000.
I found an application called SearchBlox yesterday.
Has anyone heard about this one?
http://www.searchblox.com/index.html
If this application works well, maybe I can add a search panel to MT to access it. Does this sound like a good idea?
Just make sure that it's easy to customize its UI so you can integrate it cleanly into your site.