Bamboo PDF Document Parser is a custom document parser for PDF documents and PDF forms.
It allows you to import metadata and form data from your PDF form into your SharePoint list (Promotion) and also populate your PDF document with metadata from the List (Demotion).
Download the installer from Bamboo Labs (http://community.bamboosolutions.com/media/p/4597.aspx).
Extract the zip file.
Run Setup.bat.
You will see this setup screen:

Start by installing the PDF Document Parser on all your web front ends. This is needed because the parser is a COM object and thus can't be installed using the WSP solution.
After this is done, install the PDF Document Parser build 0.5.0.0 on one of the servers. This is a WSP solution so it will be deployed to all servers in the farm.
Now it's time to activate the feature. It has farm scope, so you need to log in to Central Admin and go to Operations > Manage Farm Features.

You have now activated parsing for all files with the PDF extension. The next step is to create a document library whose properties correspond to the fields in the PDF document. We will use IRS form SS-4 as our sample.
The document parser comes with a command-line tool, PDFhelp.exe, which lists the fields in a PDF form and can also create a document library with properties corresponding to these fields.

It has two commands, list and create. List does what it sounds like: it lists all fields in the specified document. Create will create a field for each of those properties in a document library that you specify. If the library doesn't exist, it will also create it for you.
Properties from the PDF document are mapped to list fields by matching the property name to the internal name of the field. After the field has been created you can rename it to something descriptive; this is called the field's Title or Display Name. (Because of this, you cannot remap properties by renaming an existing field. You have to create a new field.)
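To make the matching rule concrete, here is a minimal Python sketch. It is not the parser's code: the ListField class, the promote function and the sample values are hypothetical, and the field name is only modeled on the f1_nn(0) pattern that the SS-4 form uses. It shows why renaming a field changes what you see in the list but not which PDF property feeds it.

```python
from dataclasses import dataclass


@dataclass
class ListField:
    # Hypothetical model of a list field, not a SharePoint API type.
    internal_name: str  # fixed when the field is created
    display_name: str   # the Title, which can be renamed later


def promote(pdf_properties, fields):
    """Return {display name: value} for properties matching an internal name."""
    by_internal = {f.internal_name: f for f in fields}
    row = {}
    for name, value in pdf_properties.items():
        field = by_internal.get(name)
        if field is not None:  # properties without a matching field are skipped
            row[field.display_name] = value
    return row


fields = [ListField("f1_01(0)", "f1_01(0)")]
pdf_properties = {"f1_01(0)": "Acme Corp", "f9_99(0)": "no such field"}

before_rename = promote(pdf_properties, fields)

# Renaming only touches the display name, so the match still succeeds.
fields[0].display_name = "Legal name"
after_rename = promote(pdf_properties, fields)

print(before_rename)
print(after_rename)
```

A field whose display name happens to equal a PDF property name, but whose internal name differs, would not receive the value in this model, which is the reason a remap needs a new field.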
If you run the list command on the IRS form SS-4 you will see something like this:

The listing format is [Field Name] == [Value]. Since we have not filled in anything, all values are empty.
Run the create command on this file:
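If you want to work with the listing in a script, a small parser is enough. The sketch below assumes each field is printed on its own line as name == value, which is how the format is described here; the exact spacing and any header lines in the real output are not known, so lines without "==" are simply skipped. The parse_listing name and the sample text are made up for illustration.

```python
def parse_listing(text):
    """Turn 'name == value' lines into a dict of field name to value."""
    fields = {}
    for line in text.splitlines():
        if "==" not in line:
            continue  # skip blank lines and anything that is not a field line
        name, _, value = line.partition("==")
        fields[name.strip()] = value.strip()
    return fields


# Hypothetical sample modeled on an empty form: every value is blank.
sample = "f1_01(0) == \nf1_02(0) == \n"
parsed = parse_listing(sample)
print(parsed)
```

Splitting on the first "==" only means a value that itself contains "==" is kept intact.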
PDFhelp.exe -create fss4.PDF http://localhost/ss4library
You will get a document library looking like this:

As you can see, the field names don't tell us much about the fields; they are just f1_nn(0). If you have a form with field names like these, there is an easy way to give most of the fields a descriptive name: import a filled-out form. When a field has a value, the import program will create the field with the proper internal name but give it the field's value as its display name.
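The naming rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the tool's implementation: the plan_display_names function is hypothetical, and falling back to the field name when the value is empty is an assumption based on what the empty form produced.

```python
def plan_display_names(listing):
    """Map each internal field name to the display name it would receive."""
    plan = {}
    for name, value in listing.items():
        # A filled-in value becomes the display name; an empty one is assumed
        # to leave the raw field name showing.
        plan[name] = value if value else name
    return plan


# Hypothetical template form: each field is filled in with the label we want.
template = {"f1_01(0)": "Legal name", "f1_02(0)": "Trade name", "f1_03(0)": ""}
names = plan_display_names(template)
print(names)
```

So the trick is to fill in each field of a copy of the form with the label you want to see in the library, and import that copy.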
Let's try it on the SS-4 form.
Now let's run the same import again with this form.
And your list looks like this:
Now it's time to upload this form to see that the properties are actually read from the PDF and stored in the list. We will take the same document we just used as a template.

Since this is a pre-release version you should be aware of some limitations:
- Property Demotion is not yet implemented.
- All properties are currently promoted as strings.
Posted Sep 12 2008, 11:49 AM by Jonas Nilsson