ColdFusion SOLR error: org/apache/pdfbox/pdmodel/PDDocument null

I got the following pretty obscure error the other day from a scheduled task (<cfschedule>) that runs nightly to index documents uploaded to our site:

org/apache/pdfbox/pdmodel/PDDocument null

Turns out the error is caused by a file with an uppercase .PDF extension instead of .pdf. No, really. Luckily I only had one offending file, but what if I had many? And what if users uploaded more after I renamed the problematic one? There are two parts to “future proofing” my situation. The first part is to handle .PDF extensions when files are uploaded. The second part, and what I’m going to pass on to you, is a custom tag that looks in a directory you specify and renames all .PDF extensions to .pdf.
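The downloadable tag itself isn't reproduced here, so the following is only a hypothetical sketch of what a pdf_cleanup.cfm custom tag along these lines might contain. The tag name and the dirToClean attribute come from this post; everything else (variable names, the two-step rename) is an assumption. It checks the extension case itself with compare() rather than relying on cfdirectory's filter attribute, whose case handling may depend on the operating system.

```cfml
<!--- pdf_cleanup.cfm: hypothetical sketch, not the contents of the downloadable tag --->

<!--- If the tag is called with a closing tag it runs twice; only work on the start pass --->
<cfif thisTag.executionMode neq "start">
    <cfexit method="exittag">
</cfif>

<!--- dirToClean is required; cfparam throws if it is missing --->
<cfparam name="attributes.dirToClean" type="string">

<!--- List plain files only (no subdirectories, no recursion) --->
<cfdirectory action="list" directory="#attributes.dirToClean#" name="qFiles" type="file">

<cfloop query="qFiles">
    <cfset fileName = qFiles.name>
    <cfset ext = listLast(fileName, ".")>

    <!--- Skip names without a dot and names already ending in lowercase .pdf.
          compare() is case-sensitive, so .PDF, .Pdf, etc. all get renamed. --->
    <cfif listLen(fileName, ".") gt 1 and lCase(ext) eq "pdf" and compare(ext, "pdf") neq 0>
        <cfset baseName = left(fileName, len(fileName) - len(ext) - 1)>
        <cfset src = qFiles.directory & "/" & fileName>
        <cfset tmp = src & ".renaming">
        <cfset dest = qFiles.directory & "/" & baseName & ".pdf">

        <!--- Two-step rename via a temporary name, as a precaution in case a
              case-only rename is refused on a case-insensitive file system --->
        <cffile action="rename" source="#src#" destination="#tmp#">
        <cffile action="rename" source="#tmp#" destination="#dest#">
    </cfif>
</cfloop>
```

On a case-sensitive file system, a directory holding both report.PDF and report.pdf would make the second rename throw, so you may want to wrap the renames in a cftry if that can happen on your server.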

To implement:

  • Download the pdf_cleanup custom tag
  • Unzip it to whatever directory you keep your custom tags in
  • Call it using the following syntax just before you run your <cfindex> operation(s):
    <cf_pdf_cleanup dirToClean="C:\mysuperdocs">
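Putting the steps above together with the cftry advice from the comments below, a nightly indexing template might look roughly like this. The collection name mysuperdocs and the log file name solr_index are hypothetical, and the cfindex attributes shown are just one plausible configuration for indexing a directory of PDFs.

```cfml
<!--- Normalise extensions first, then index. Collection and log names are hypothetical. --->
<cf_pdf_cleanup dirToClean="C:\mysuperdocs">

<cftry>
    <cfindex action="update"
             collection="mysuperdocs"
             type="path"
             key="C:\mysuperdocs"
             extensions=".pdf"
             recurse="false">

    <cfcatch type="any">
        <!--- Log message and detail; the offending file name may appear in either --->
        <cflog file="solr_index" type="Error"
               text="cfindex failed: #cfcatch.message# #cfcatch.detail#">
    </cfcatch>
</cftry>
```

Note that one commenter reports this particular error slipping past cftry and landing in a site-wide error handler, so the SOLR logs may still be the more reliable place to look.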

Be forewarned: I take no responsibility for your use of the tag ;-)

Download the pdf_cleanup custom tag

7 thoughts on “ColdFusion SOLR error: org/apache/pdfbox/pdmodel/PDDocument null”

  1. Did you try this in 901+CHF? Solr fixes were added that may have fixed this. If not, please be sure to file a bug report. Adobe does NOT search out blog posts like this, so it’s up to us guys to use the public bug tracker.

  2. Hi, this happened to me, but in my case there was a PDF file without an extension. So instead of File.pdf it was only File. Thanks for this info.

  3. Wow! Thanks so much for this. I was going crazy trying to find the single PDF in my collection that was causing SOLR to crash with a 500 error. I narrowed it down to one PDF (after putting one in a folder, re-indexing, adding a second PDF, re-indexing, and so on, until I found the single file that always bombed the indexing). Anyway, I didn’t even notice the upper-case PDF extension.

    Wow, this is a *BIG* bug in SOLR. Crazy, crazy. Thank you so much!

  4. My CFINDEX crashes when I have a .txt file with HTML code inside. Is there an easy way to find which file causes the CFINDEX crash?

  5. Did you have the cfindex wrapped in a cftry? If not I’d start there and see if the file name is listed in the cfcatch. Next I would check the SOLR logs. On CF9 Windows they are in C:\ColdFusion9\solr\logs. On CF10 Windows they are in C:\ColdFusion10\cfusion\jetty\logs.

  6. I too have recently received the “org/apache/pdfbox/pdmodel/PDDocument null” error! (Running CF10, Update 13.)

    I had my cfindex wrapped in a try/catch but this error was caught by my site wide error handler.

    The error occurred while trying to index a .cid file (http://www.ntia.doc.gov/el-cid-support-center). Every time I try to index one of these file types I get the error! Ugh.
