Google says it's implementing a fix, but circumventing PDF security right now could be as easy as opening the document inside Google's e-mail service.

PDF authors go to great lengths to protect their documents. But circumventing PDF security could be as easy as opening the document inside Google's e-mail service.
Last month, a blog post written by Andreas Bovens, a Belgian doctoral candidate in Japanese Studies attending school in Tokyo, demonstrated how Gmail's PDF-to-HTML filter could circumvent some rights-management features in PDFs, such as copying and printing limitations set by a PDF document's author.
That loophole, according to Adobe Systems, either is now closed or will be shortly. John Landwehr, Adobe's director of security solutions and strategy, said that Adobe contacted Google when it learned of the issue and the two companies worked together on a fix.
"Google's implementation of Gmail Web-based e-mail was not accurately interpreting particular permission bits via its PDF-to-HTML conversion," Landwehr wrote in an e-mail to PDFzone. "As an aside, the Google.com search engine does interpret these bits correctly."
The DRM (digital rights management) issue involves how a PDF viewer parses the permission settings that authors choose in their authoring software when creating a PDF. The document specification lets authors allow or disallow printing, as well as copying and pasting, of a document's contents; honoring those settings is left to the software that opens the file.
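For readers curious about what those "permission bits" look like, here is a minimal sketch. It assumes the PDF 1.7 layout (ISO 32000-1), where the restrictions live as bit flags in the /P integer of the document's encryption dictionary: bit 3 governs printing and bit 5 governs copying or extracting text and graphics, counting from 1 at the low-order end. The function name `decode_permissions` is hypothetical, and the sketch assumes the /P value has already been read from the file. Note that the flags are only advisory: a viewer or converter that ignores them can still render the content.

```python
# Permission bits from the /P entry of a PDF encryption dictionary
# (PDF 1.7 / ISO 32000-1). Bit n, counted from 1 at the low-order end,
# corresponds to the mask 1 << (n - 1).
PRINT = 1 << 2     # bit 3: print the document
MODIFY = 1 << 3    # bit 4: modify the contents
COPY = 1 << 4      # bit 5: copy or otherwise extract text and graphics
ANNOTATE = 1 << 5  # bit 6: add or modify annotations, fill in forms


def decode_permissions(p: int) -> dict[str, bool]:
    """Report which actions a /P value allows (hypothetical helper).

    /P is stored as a signed 32-bit integer, so typical values are
    negative (e.g. -3904). Masking with 0xFFFFFFFF yields the unsigned
    32-bit pattern before individual flags are tested.
    """
    bits = p & 0xFFFFFFFF
    return {
        "print": bool(bits & PRINT),
        "modify": bool(bits & MODIFY),
        "copy": bool(bits & COPY),
        "annotate": bool(bits & ANNOTATE),
    }


if __name__ == "__main__":
    # -3904 sets only the bits the spec reserves as 1, so every action is denied.
    print(decode_permissions(-3904))
    # -3900 is -3904 with bit 3 added back: printing allowed, copying not.
    print(decode_permissions(-3900))
```

A viewer that honors these flags would grey out its Print or Copy commands; Gmail's converter, by Landwehr's account, simply did not check them.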
Bloggers who tested several documents reported that although Gmail didn't always reproduce page layout and images faithfully on DRM-enabled documents, it did let users print and copy content the authors had not wanted duplicated.
Landwehr said that, moving forward, Gmail's HTML interpreter will no longer convert a PDF to HTML if the document's owner specifies that it isn't to be copied or printed. That mirrors how the Google search engine's HTML interpreter handles such documents: On most PDF documents Google finds, a "View As HTML" button appears with the search listing, but when a PDF author doesn't want text copied or pasted, that button is absent from the search results. In quick tests done for this article, the "cached" button also appears to be missing on these documents.
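The policy Landwehr describes amounts to a simple gate in front of the converter. The sketch below is hypothetical and is not Google's code: the function name `should_offer_html_view` is invented for illustration, it assumes the /P permission integer (PDF 1.7 bit layout) has already been extracted, with `None` standing for an unencrypted file that carries no restrictions, and it assumes conversion is refused when either printing or copying is withheld.

```python
from typing import Optional

PRINT = 1 << 2  # bit 3 of /P: printing allowed
COPY = 1 << 4   # bit 5 of /P: copying/extracting text and graphics allowed


def should_offer_html_view(p: Optional[int]) -> bool:
    """Decide whether a hypothetical converter may render a PDF as HTML.

    p is the /P permission integer, or None when the PDF is not
    encrypted and therefore carries no permission restrictions.
    """
    if p is None:
        return True
    # Assumption: require BOTH print and copy permission; if the owner
    # withheld either one, skip the HTML conversion entirely.
    required = PRINT | COPY
    return (p & required) == required


if __name__ == "__main__":
    for value in (None, -4, -3904, -3900):
        print(value, should_offer_html_view(value))
```

Refusing to convert, rather than converting and trying to hide the text afterward, matches the search-engine behavior described above, where the "View As HTML" option simply does not appear.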
Google didn't specify exactly what changes it's making to the Gmail HTML interpreter, but it did confirm that Adobe had contacted the company about it.
"We were notified of an issue with the way PDFs were displayed in Gmail and worked with Adobe on a change that is now being deployed," said a Google spokesperson in an e-mail to PDFzone.
While that covers Google and Gmail, discussion among blog readers suggests that other browsers, online PDF-to-HTML converters and even some creative pasting of content into Mac OS utilities might still be used to crack these PDF DRM attributes.
Such tools enable mobile users, who might only have a Web connection and a browser, to view PDFs on the go so they can get work done while away from their desks. In some cases, the tools enable blind and low-vision users to access PDFs through screen-reading software.
To keep these tools from being used to access documents not intended for public consumption, Landwehr said that PDF authors should use the more robust protections available to them in Acrobat and other PDF authoring tools. If authors assign passwords to their PDFs, HTML interpreters will be less of a problem, he said.
"For greater assurances of information protection, Adobe recommends that customers use the additional encryption capabilities from Adobe that give more granular controls beyond simple permissions to restrict who can open a document," Landwehr said.
This can be accomplished with passwords, public key infrastructure and, for large companies that can afford to handle DRM for their PDF documents at the server level, enterprise rights management using Adobe Policy Server.