ColdFusion SOLR error: org/apache/pdfbox/pdmodel/PDDocument null

I got the following pretty obscure error the other day from a scheduled task (cfschedule) that runs nightly to index documents uploaded to our site:

org/apache/pdfbox/pdmodel/PDDocument null

Turns out the error is caused by a file with the extension .PDF instead of .pdf. No, really. Luckily I had only one offending file, but what if I had many? And what if users uploaded more after I renamed the problematic one? There are two parts to “future proofing” my situation. The first part is to address the .PDF extensions in the uploads. The second part, and the one I’m going to pass on to you, is a custom tag that looks in a directory you specify and renames all .PDF extensions to .pdf.
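The downloadable tag isn't reproduced here, so what follows is only a minimal sketch of what a tag like pdf_cleanup could do, assuming ColdFusion 8 or later (for the `type` attribute of `<cfdirectory>`). The file name pdf_cleanup.cfm and the local variable names are illustrative. The sketch lists the files directly inside `attributes.dirToClean` (no subdirectories) and renames any whose extension is "pdf" in a non-lowercase form, such as .PDF or .Pdf. It does not check whether a lowercase file with the same name already exists, which can matter on case-sensitive file systems.

```cfml
<!--- pdf_cleanup.cfm: a hypothetical sketch, not the downloadable tag itself.
      Renames files in attributes.dirToClean whose extension is "pdf" in any
      non-lowercase form (.PDF, .Pdf, ...) so they end in lowercase .pdf.
      Assumes CF8+ and does NOT check for an existing lowercase twin file. --->

<!--- If the tag is called with a closing tag it runs twice; act only on the start pass. --->
<cfif thisTag.executionMode NEQ "start">
    <cfexit method="exitTag">
</cfif>

<cfparam name="attributes.dirToClean" type="string">

<cfif NOT directoryExists(attributes.dirToClean)>
    <cfthrow message="pdf_cleanup: directory '#attributes.dirToClean#' does not exist.">
</cfif>

<!--- List only files (not subdirectories) in the target directory. --->
<cfdirectory action="list" directory="#attributes.dirToClean#" name="qFiles" type="file">

<cfloop query="qFiles">
    <cfset ext = listLast(qFiles.name, ".")>
    <!--- compareNoCase() = 0 means the extension is some form of "pdf";
          compare() is case-sensitive, so NEQ 0 means it is not already lowercase. --->
    <cfif listLen(qFiles.name, ".") GT 1
          AND compareNoCase(ext, "pdf") EQ 0
          AND compare(ext, "pdf") NEQ 0>
        <!--- Keep everything up to the last dot, then append the lowercase extension. --->
        <cfset newName = left(qFiles.name, len(qFiles.name) - len(ext)) & "pdf">
        <cffile action="rename"
                source="#qFiles.directory#/#qFiles.name#"
                destination="#qFiles.directory#/#newName#">
    </cfif>
</cfloop>
```

Lowercasing only the extension, rather than the whole file name, leaves the rest of each user-supplied name untouched.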

To implement:

  • Download the pdf_cleanup custom tag
  • Unzip it to whatever directory you keep your custom tags in
  • Call it using the following syntax just before you run your <cfindex> operation(s):
    <cf_pdf_cleanup dirToClean="C:\mysuperdocs">
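In a nightly indexing job, the call might sit directly in front of the `<cfindex>` operation roughly like this. The collection name `mysuperdocs`, the path, and the other attribute values are hypothetical placeholders for your own settings; this is a sketch, not the actual scheduled job described above.

```cfml
<!--- Hypothetical nightly indexing page: collection name and path are placeholders. --->

<!--- 1. Normalize .PDF/.Pdf extensions to .pdf before indexing. --->
<cf_pdf_cleanup dirToClean="C:\mysuperdocs">

<!--- 2. Index the cleaned directory into the Solr collection. --->
<cfindex action="update"
         collection="mysuperdocs"
         type="path"
         key="C:\mysuperdocs"
         extensions=".pdf"
         recurse="no">
```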

Be forewarned I take no responsibility for your use of the tag ;-)

Download the pdf_cleanup custom tag