PDFZone Ziff-Davis Enterprise
Blinkx: Finding and Organizing Your PDFs
By Don Fluckinger

And yes, they could make scanned-text PDFs searchable with on-the-fly OCR, but that's not quite ready, yet, for prime time.

Right now, desktop searching is one of those emerging technologies that in theory (looking through both your hard drive and the Web at the same time for relevant hits) sounds life-changing but in practice still seems like the MP3 player before the iPod came out: Yeah, it works, but no one uses it.

 

Microsoft and Apple are building desktop search features into future versions of their operating systems (respectively code-named Longhorn and Tiger), and search engine giants Google and Yahoo have their own branded desktop search utilities. Even ScanSoft gets into the discussion with its PaperPort OCR/desktop organizer software, which doesn't search the Web but overlaps with many of the desktop search capabilities that the others offer.

 

And then there's Blinkx, a software company taking on all of the above. Some of this startup's competitors work only on one platform or, in the case of Google, don't search PDFs.

 

We caught up with Blinkx's founder and CTO Suranga Chandratillake to chat about PDFs, searching them, why he thinks PDF users will like Blinkx and what on earth the company was thinking earlier this month when it issued a Blinkx Mac beta to coincide with Macworld and Steve Jobs' trumpeting of Spotlight, Tiger's desktop search package.

 

Don Fluckinger: Is searching a PDF technically more challenging than other documents?

 

Suranga Chandratillake: No, not really. If anything, they're slightly better. Indexing is pretty much the same thing, but once you've got a search, you can highlight text inside a PDF. The words that you search for are highlighted in a PDF, which you can't do in Word, for example.

 

Fluckinger: Do we already have more PDFs than we can organize on our hard drives? Will we have more in the future?

 

Chandratillake: It's an extremely popular format, particularly in the business context--everything from sales orders and proposal letters all the way to ad copy and brochures. ... Being able to index them and sort through them is critical. There's no way we could have launched a product without support for PDF.

 

PDF is definitely as significant as any Microsoft Office format. In the surveys and analyses that we've done, the biggest data type, by far, is e-mail, which can be up to 60 percent of the average person's data. ... The other 40 percent is split between the productivity formats. There are some exceptions. You do get designers, for example, who have a lot of CAD files, but for the average office worker we see a split--by file size--in which 35 percent to 40 percent of what's left is PDF, and the rest is split among the Windows Office formats.

 

I think that PDFs are going to get extremely popular. Think of the things we buy on the Web and the services we pay for on the Web. I pay my phone bill, cable bill, power on the Web. All these people send me PDFs. People are just going to keep using it more and more and more. Sure, at the moment it's companies sending things off to individuals, and that inevitably leads to individuals sending things to each other. That all points to a massive increase. Finding the right PDF at the right time gets to be a bigger and bigger problem, and that's where Blinkx wants to step in.
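
Editor's note: the percentages Chandratillake quotes above can be combined into a rough share of total data. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and it assumes the "up to 60 percent" e-mail figure is taken at face value, so the result is an estimate rather than a number from Blinkx's surveys.

```python
# Figures quoted in the interview (shares of an average office worker's data, by file size).
EMAIL_SHARE = 0.60                 # "up to 60 percent" is e-mail
PDF_SHARE_OF_REST = (0.35, 0.40)   # PDF's share of what's left after e-mail


def pdf_share_of_total(email_share, pdf_share_of_rest):
    """Return PDF's share of all data, given e-mail's share and PDF's share of the remainder."""
    return (1.0 - email_share) * pdf_share_of_rest


low, high = (pdf_share_of_total(EMAIL_SHARE, share) for share in PDF_SHARE_OF_REST)
print(f"PDF: {low:.0%} to {high:.0%} of total data")
```

On those figures, PDF works out to roughly 14 to 16 percent of everything on the average office worker's drive.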

 

Fluckinger: How does Blinkx deal with password-protected PDFs or files otherwise rights-managed?

 

Chandratillake: We can't search through anything that's encrypted. If you don't have the password, there's no way for us to get to it. If it's in a secure directory or a secure file-share, but you have access to it, then we can still see that file.

 

We can index any version of PDF, including pre-version 4 [of Acrobat], which were actually pretty different formats than the ones that are popular today. The only ones we don't support are those that don't have any text, just images. We do index metadata, however, so if those PDFs have metadata embedded like the company name or author name, it will draw that out.
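
Editor's note: the indexing rules described above (skip what is encrypted, index body text where it exists, and still pull in metadata for image-only files) can be illustrated with a toy inverted index. This is a minimal sketch, not Blinkx's implementation: the `Doc` class is a hypothetical stand-in for an already-parsed PDF, and no real PDF library is involved.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for an already-parsed PDF; not a real PDF reader."""
    name: str
    text: str = ""                      # extracted body text; empty for image-only scans
    metadata: dict = field(default_factory=dict)
    encrypted: bool = False             # True when no password is available


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for doc in docs:
        if doc.encrypted:
            continue                    # unreadable without the password, so skipped
        # Body text and metadata values are both made searchable.
        sources = [doc.text, *map(str, doc.metadata.values())]
        for source in sources:
            for word in source.lower().split():
                index[word.strip(".,;:!?\"'()")].add(doc.name)
    return index


docs = [
    Doc("proposal.pdf", text="Sales proposal for the new brochure."),
    Doc("fax-scan.pdf", metadata={"Author": "Acme Corp"}),   # image-only: metadata only
    Doc("locked.pdf", text="Confidential proposal", encrypted=True),
]
index = build_index(docs)
print(sorted(index["proposal"]))   # only the unencrypted match
print(sorted(index["acme"]))       # found through metadata alone
```

The scanned fax has no body text, so it is reachable only through its author field, and the locked file never enters the index at all.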

 

Fluckinger: PDFzone users typically have PDFs of faxes or scanned paper text pages. Will future versions of Blinkx be able to do OCR and search these too?

 

Chandratillake: We can do it, but it's very difficult for us to release. We've played around [with a freeware OCR engine plug-in]. That essentially works, but we need to find a good OCR engine and see if we can license it.

 

And secondly, everything gets a lot bigger. Good OCR engines tend to be 10 to 15MB in size, which completely blows our download size out of the water; right now we're around 5MB. It's definitely possible, but right now it's not ideally structured [for Blinkx]. Once it becomes available in the right way, we'll definitely do it.
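
Editor's note: the download-size concern is easy to quantify from the figures quoted above. The sketch below assumes the OCR engine would simply be added to the existing download with no extra compression or sharing of components, which is a simplification; the names are illustrative.

```python
CURRENT_DOWNLOAD_MB = 5     # "right now we're around 5MB"
OCR_ENGINE_MB = (10, 15)    # "10 to 15MB in size"


def bundled_size(base_mb, engine_mb):
    """Return (total size in MB, growth factor) if the engine ships inside the download."""
    total = base_mb + engine_mb
    return total, total / base_mb


for engine_mb in OCR_ENGINE_MB:
    total, factor = bundled_size(CURRENT_DOWNLOAD_MB, engine_mb)
    print(f"{engine_mb}MB engine -> {total}MB download ({factor:.0f}x current size)")
```

Under that assumption, bundling OCR would make the download three to four times its current size.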

 

Fluckinger: Why would Blinkx release a Mac-side utility with Apple building similar features into Tiger?

 

Chandratillake: The main reason is because people asked us to do it. We got lots of e-mails, really from Day One of the PC release, saying, "When are you going to do a Mac version?" Funny thing is, we talked to a lot of journalists, bloggers, writers of various sorts and analysts about the technology in the earlier days, and many of them are closet Mac users and said their personal stuff was on the Mac. Because of that massive demand, we always were going to build one.

 

The thing with Spotlight, it's a phenomenal operating system-level metadata-based search engine, but it doesn't do all the things Blinkx does. When it comes to keyword search or search based on metadata, and doing it very fast, I think Spotlight will rapidly become the industry standard on the Mac.

 

The Blinkx toolbar does a whole bunch of other things: conceptual linking, which automatically looks at text and links you to text without you searching for it, and smart folders. I think when Spotlight comes out it will take care of the straightforward search and we will be seen as the complementary tool that does some stuff alongside it.



