There are several PDF modules available for python, so far I’ve found Slate to be the simplest to use and PDFMiner to be potentially the most powerful but also the most complicated to use.  For the problem I needed to solve: extracting text with whitespace characters intact I found the following fragment of PDFMiner code on StackOverflow to be only solution:

"""Extract text from PDF file using PDFMiner with whitespace inatact."""

from pdfminer.pdfparser import PDFDocument, PDFParser
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter, process_pdf
from pdfminer.pdfdevice import PDFDevice, TagExtractor
from pdfminer.converter import XMLConverter, HTMLConverter, TextConverter
from pdfminer.cmapdb import CMapDB
from pdfminer.layout import LAParams
from cStringIO import StringIO

def scrap_pdf(path):
    """From http://stackoverflow.com/a/8325135/39040."""
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(path, 'rb')
    process_pdf(rsrcmgr, device, fp)
    fp.close()
    device.close()
    str = retstr.getvalue()
    retstr.close()
    return str

If you don’t need whitespace to be left intact I’d strongly recommend Slate over PDfMiner as its significantly easier to work with, although it does offer a smaller feature set.