San Francisco-based financial planner uses Adobe's Paper Capture tool to ensure that text in images is searchable once scanned. Plus, PDFzone checks out some other options.
A rhetorical question for the Internet age: If you can't search on the contents of a digital document, is it still better than paper? For Michael Altschul, IT manager at San Francisco-based financial planning firm Kochis Fitz, the answer is both yes and no.
With the proliferation of desktop search technology, Altschul has been excited about including search in his company's budget for the coming year. So far, he said, everything works magically with desktop search tools, including Word, Excel, e-mail, and non-image PDFs. But there's one major area that's lacking: image PDF search in conjunction with batch OCR (optical character recognition).
Like many companies that are awash in paper files and client documents, Kochis Fitz has been striving to cut down on paper use. Toward this goal, about 80 percent of all the files on the network are scanned PDFs. But Altschul has not found a search product that adequately searches on the text contained within images. The ability is vital, he noted, because many of the images within documents contain relevant information on client accounts.
The lack of image search capability is hindering the company from going fully paperless, Altschul said.
"At present, we scan client, mission-critical documents only," he noted. "For these documents, the accessibility of paper files is almost on par with electronic accessibility for the lay user, so we don't gain anything by scanning documents pertaining to the day-to-day firm management."
If the company could easily access scanned documents via keyword searches, Altschul said, then scanning everything in the office would be a snap.
Some Solutions
The difficulty with PDF image files is that when they are created from a scanner or an online fax service, the PDF doesn't contain text information. Like a photograph, the PDF is just a snapshot of the document. That makes it tough to tease out searchable elements such as individual pieces of information contained in the image.
When Altschul went looking for vendors to solve the problem, he found few that could address the challenge, but one did seem up to the task.
Adobe Systems Inc.'s Acrobat Professional, version 6.0 or 7.0, can do the conversion using the Capture tool, either on a single file or in batch mode. The standard version of the application has the OCR tool but doesn't include the batch mode.
"What I like about Capture is that it seems to write the OCR-[translated] text to the PDF file, such that desktop search utilities can recognize text," said Altschul. "The other vendors I researched lacked such tight integration."
Another option is Google Desktop Search, which Altschul has been considering in place of Capture. Released in March, the Google utility supports PDF search for the first time.
To take advantage of the technology, ScanSoft Inc. has developed the OmniPage Search Indexer, a plug-in that mainly supports PDFs containing text, but also has the ability to use OCR and index image-based PDFs that have scanned text. The application returns results on a locally served page.
ScanSoft's OCR technology recreates the text information from the image content without doctoring the original file. This is crucial, according to the company, because many images have to be preserved as is for legal or regulatory reasons to show the authenticity of a receipt, contract, or piece of correspondence.
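The idea of indexing recognized text separately from the untouched image file can be illustrated with a small sketch. This is not ScanSoft's implementation; it simply assumes the OCR step has already produced plain text for each scanned PDF, and the names build_index and search are hypothetical. The index lives apart from the files, so the originals are never modified.

```python
import re
from collections import defaultdict


def build_index(ocr_text_by_file):
    """Map each word to the set of files whose OCR text contains it.

    `ocr_text_by_file` is assumed to be {file path: text recognized by OCR}.
    The scanned files themselves are never opened or changed here.
    """
    index = defaultdict(set)
    for path, text in ocr_text_by_file.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the files that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A keyword search then becomes a lookup in the index rather than a scan of the image files, which is why the original scans can stay exactly as they were captured.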
One more tool up for consideration is the AutoCapture-X4 from Acrobotics Inc. According to the company, the tool fully automates OCR on an unlimited number of PDF files. It continuously monitors complex folder and subfolder structures and performs automated, unattended OCR translation of PDF files that are added to any folder or subfolder. With AutoCapture-X4, all Acrobat 7.0 Paper Capture features are supported and automated.
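For readers curious how unattended folder monitoring works in general, here is a minimal polling sketch. It does not reflect how AutoCapture-X4 is built; the function names are hypothetical, and the `ocr` argument stands in for whatever OCR step is available.

```python
import os


def find_new_pdfs(root, seen):
    """Walk root and every subfolder; return PDF paths not yet in `seen`."""
    new_files = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            if name.lower().endswith(".pdf") and path not in seen:
                new_files.append(path)
    return new_files


def process_once(root, seen, ocr):
    """Run one polling pass: hand each new PDF to `ocr`, then remember it.

    `ocr` is an assumed callable that takes a file path; it is a placeholder
    for a real OCR tool, not part of any product named in this article.
    """
    handled = find_new_pdfs(root, seen)
    for path in handled:
        ocr(path)
        seen.add(path)
    return handled
```

Calling process_once in a loop with a pause between passes (for example, time.sleep(60)) gives continuous monitoring: each pass picks up only the files added since the last one.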
Once Altschul has PDF image search in place, he's confident that he can move closer to a goal of implementing a utility that uses OCR to batch-convert all of the company's scanned PDFs.
One application that can handle both tasks is FineReader, from Abbyy Software House. The 7.0 Professional Edition offers enhanced PDF recognition accuracy and functionality, according to the company.
"Individually, these products are cool," Altschul said. "Collectively, they would quickly become invaluable."
Can PDFzone help you solve a problem? Just ask. Click here to e-mail Editor John MacKenna.