Thursday, March 05, 2009
OCR Terminal Converts PDF and Images to Searchable Text
How many times have you stared at the text in a PDF file or screenshot and wished you could simply copy and paste it?
It's always frustrating to type out all that material, word for word, just because there's no alternative.
Well, here's some good news:
With OCR Terminal, you can convert PDFs and the words in images to .txt or .rtf files. Free membership is available at the OCR Terminal homepage. Once you've completed the registration process, you can upload, scan, and convert up to 30 pages in popular document and image formats: PDF, TIFF, JPG, and screenshots. If you need to convert more than 30 pages, you can contact OCR Terminal.
Currently, OCR Terminal doesn't support languages other than English, but the developers are planning to add more. Other planned projects include a desktop client for uploading multiple files and storage for all OCR'd documents. Product updates are posted on the company blog and Twitter page.
I wanted to test the conversion tool earlier this week, but due to a huge influx of traffic, the site was down for maintenance. Fortunately, it was up and running this morning, and I uploaded a PDF file (a copy of my LinkedIn profile). Converting the file was a simple four-step process:
- Browse to and select the PDF file or image you want to convert.
- Click Upload.
- Click Yes to begin processing.
- Choose and click the format you wish to convert to: .txt, .doc, .rtf, or .pdf.
My LinkedIn profile converted smoothly to Notepad and Microsoft Word, so I decided to try a JPG.
I chose our company logo, Labitat, to convert to Word. OCR Terminal successfully scanned it and converted it to Word in 8-point Arial, without the color scheme. I wasn't sure whether that was the default format, so I uploaded a 336x40 black-and-red marketingshift logo. I was surprised when the LARGER of the two images converted to a SMALLER font size (Arial 5), AND the color scheme remained intact.
Converting to .txt, however, produced an inaccurate transcription of the text. The "ing" in "marketing" was interpreted as "M", so the .txt version read "IMARKETMG SHIFT." To be fair, OCR Terminal is still in beta, and small errors like these are simple for the user to correct.
As OCR Terminal is refined and word spreads, demand for this valuable service will continue to grow. I expect the tool to become a major asset for anyone in academia or business.
By Matt O'Hern at 11:28 AM