Search PDF files

Hello,

I've just received an enquiry from a client who is planning a site using Movable Type as its CMS. The main content of the site will be a series of PDF files, and they want a facility that lets users search the content of these files.

Are there any Movable Type plugins that can index and search PDFs? I've been searching this forum and the web without much luck so far.

Any other ideas would be a great help.

Thanks in advance,

Rio

8 Replies

  • It sounds to me like what your client needs is a good search tool instead. I actually can't think of a single open source CMS that does what your client needs. Integrating a separate web-based search tool into an MT site, however, should not be particularly difficult.

  • Hi Mike,

    Thanks for your input. I hadn't thought about this option.
    However, I did look for a search tool for this job, and so far I haven't come across one. Do you know of a good web-based search tool you can recommend?

    Rio

  • If your host is a company with a decent budget, you might want to consider Autonomy. Otherwise, I don't know of anything that would be suitable that I can vouch for.

    WRT Autonomy, it's an index and search **service** that you install on the server, not a web app. All of its index and search features can be controlled via simple HTTP requests. Therefore, if your client has some money and wants a large, scalable search system for a big web site, you could manage most of the site via MT and write your own CGI or PHP scripts that query Autonomy via an HTTP request, parse the returned XML and display it cleanly to the user.
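The query, parse and display flow described above could look roughly like the sketch below. It is written in Python rather than CGI/PHP, and it does not use Autonomy's real interface: the endpoint URL, the `q` and `max` parameter names, and the `<hit>`, `<title>` and `<url>` elements are all hypothetical placeholders that would have to be replaced with whatever the search service actually documents.

```python
from html import escape
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the real host, port and path depend on the service.
SEARCH_ENDPOINT = "http://localhost:9000/search"


def build_query_url(terms, max_results=10):
    """Build the HTTP query URL (parameter names are assumptions)."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": terms, "max": max_results})


def parse_results(xml_text):
    """Extract title/url pairs from an assumed <hit> based XML layout."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("hit"):
        hits.append({
            "title": hit.findtext("title", default="(untitled)"),
            "url": hit.findtext("url", default=""),
        })
    return hits


def render_html(hits):
    """Render hits as an HTML list, escaping values taken from the XML."""
    if not hits:
        return "<p>No documents matched.</p>"
    items = "".join(
        '<li><a href="{}">{}</a></li>'.format(
            escape(h["url"], quote=True), escape(h["title"])
        )
        for h in hits
    )
    return "<ul>" + items + "</ul>"


def search(terms):
    """Query the service over HTTP and return an HTML fragment."""
    with urlopen(build_query_url(terms), timeout=10) as response:
        return render_html(parse_results(response.read().decode("utf-8")))
```

The HTML fragment returned by `search` could then be dropped into a page whose surrounding layout is published by MT, so the search results match the rest of the site.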

  • Autonomy can also be expensive, like 'moon landings' expensive.

    How many PDFs are you looking to index?
    What platform are you running MT on?

    If you have access to developers you could do something with Lucene (http://lucene.apache.org/) or DTSearch (http://www.dtsearch.com/).

    If this is a public site, you could have Google index the PDFs and then add a custom Google search box.
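As a rough illustration of the Google route, the sketch below builds a search URL restricted to PDFs on one site using Google's `site:` and `filetype:` operators. This is a simpler stand-in for a proper custom search box, not the same thing, and it only works once Google has crawled the files. The function name and the `example.com` domain are made up for the example.

```python
from urllib.parse import urlencode


def google_pdf_search_url(terms, site):
    """Return a Google search URL limited to PDF files on the given site."""
    # site: and filetype: are standard Google search operators.
    query = "{} site:{} filetype:pdf".format(terms, site)
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # A search form on the site could redirect visitors to a URL like this.
    print(google_pdf_search_url("budget 2012", "example.com"))
```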

    • The problem with Lucene is that it requires a Java environment in order to work. Unless they are using dedicated servers or running this on a corporate network where they can build out their own configuration with a good budget, they should stick to stuff that works on the LAMP stack.

  • This is an interesting series of articles on using PHP (Zend) Lucene to index PDFs: http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-1-pdf-meta-data-2081.html

  • Hi there,

    It sounds more complicated than I first thought.

    This will be a brand new site, so the platform will be whichever version of MT is the latest at the start of the project. There won't be more than 1000 PDF files to index.

    I found an application called SearchBlox yesterday.
    Has anyone heard about this one?
    http://www.searchblox.com/index.html

    If this application works well, maybe I can add a search panel to MT to access it. Does this sound like a good idea?

  • Just make sure that it's easy to customize its UI so you can integrate it cleanly into your site.
