Make Text In Image Searchable Onenote



  • Notes and images pasted a month ago still have no option to OCR or search their text in OneNote (or the even less useful Copy Text from Picture). The image itself should simply be searchable, without needing two copies of the text. It is also unclear whether a PDF that already had OCR run on it keeps its searchable text once imported into OneNote.
  • Make Text in Image Searchable is disabled: when you clip a screenshot into OneNote and want to copy text from the picture, the right-click menu sometimes looks different and has no 'Copy Text from Picture' item.

The other day a high school teacher gave me some work: typing up past examination papers that she was compiling into a book. There were quite a lot of test papers, a whole file’s worth to be exact.

In OneNote 2016, right-click over the image, and then select Make Text in Image Searchable. Select the appropriate language, and then locate the Search field in the upper-right corner.

Knowing this would be a challenge in terms of the time and effort I’d have to set aside, I decided to look for a much quicker solution. The first thing that came to my mind was OCR (Optical Character Recognition).

OCR is basically the identification of text from image files. In layman’s terms, think of it as converting images to text.

OCR can thus save you time and money that you’d otherwise spend typing or outsourcing to professionals. In my case, I was able to reduce the workload of this particular job by about 70-80%, and the saving would have been higher were it not for a few wrongly identified characters and the touching up of some diagrams.

For my OCR needs I went with MS Office. I had considered other options before settling on it, such as the feature-rich PDF-XChange Editor, which bundles OCR with its PDF viewer. Ultimately, however, they all proved less capable than the OCR engine in MS Office, which was both more accurate and quicker.

I suppose that could be attributed to them using the free Tesseract OCR engine which, while powerful in its own right, tends to be outperformed by commercial alternatives.

Getting Started: Microsoft Office OCR Options

MS Office does OCR in two ways:

  • Using OneNote
  • Using Microsoft Office Document Imaging (MODI)

Any version of OneNote from 2007 to 2016 will do for this purpose. For MODI, however, things are a little different because it was discontinued: MS Office 2007 was the last version to include it.

However, you don’t need MS Office 2007 itself to use MODI, as it can be installed separately and used alongside newer versions of MS Office.


What You’ll Need

  • First things first, you’ll need MS Office installed. Any version will do: Office 2007, 2010, 2013 or 2016.
  • To install MODI, either SharePoint Designer 2007 or MS Office 2007. SharePoint Designer 2007 is provided as a free download by Microsoft; get it from Microsoft’s download centre.
  • If you already have a licensed copy of MS Office 2007, you can install MODI from it instead of downloading SharePoint Designer 2007.
  • Image to OCR
  • A scanner if you want to OCR during the scanning process.

1. OCR with OneNote

1. Launch OneNote and start by creating a New Note.

2. In the ribbon, go to the Insert tab and insert the image to OCR.

Insert Image to OCR

3. Inside the note, right-click the inserted image and select Copy Text from Picture.

Copy Text from Image

4. Open MS Word or a text editor and paste the text that has been recognized.

5. You can alternatively search the text within OneNote instead of copy-pasting it elsewhere. To do that, right-click the inserted image and select Make Text in Image Searchable then select the language the text is in.

Make Text in Image Searchable

You can then use Ctrl+F to search for text inside the image. If it finds a match, it will be highlighted.

NOTE: If you need a different language, check the bottom of this post on how to install additional language packs.

2. OCR with Microsoft Office Document Imaging (MODI)

Step 1. Installing MODI

1. Run your SharePoint Designer 2007 or MS Office 2007 setup.

2. Select the Customize installation option.

3. Set all the available options to Not Available then expand Office Tools and set Microsoft Office Document Imaging to Run all from my Computer.

Install MODI

4. Now leave it to install.

Step 2: OCR with MODI

MODI OCRs in two ways:

  • OCRs Image Files
  • Connects with your scanner and automatically OCRs after the scanning is complete

i. OCR an Image

MODI only OCRs images that are in TIFF (*.tif, *.tiff) format. If your picture is in another format (e.g. JPEG, PNG, GIF), you can use one of the many free image editors available online (XnView, IrfanView etc.) to convert it to TIFF.

You can even use Paint to do the conversion. Just open the image with Paint, choose to Save as then select Other Formats. In the save dialog, select the TIFF type and save the image.
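If you have a whole folder of pictures to convert, doing it by hand in Paint gets tedious. Here is a minimal sketch of a batch conversion in Python, assuming the third-party Pillow library is installed. The function names and the list of source extensions are my own choices for illustration, not part of MODI or MS Office.

```python
from pathlib import Path

# Extensions treated as convertible; an assumption made for this sketch.
SOURCE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def tiff_path_for(image_path):
    """Return the path the converted TIFF would be written to."""
    return Path(image_path).with_suffix(".tif")


def convert_to_tiff(image_path):
    """Convert one image to TIFF next to the original and return the new path."""
    # Pillow is a third-party package (pip install Pillow), imported lazily
    # so the path helper above works even without it.
    from PIL import Image

    target = tiff_path_for(image_path)
    with Image.open(image_path) as img:
        # Flatten palette or transparent images to plain RGB before saving.
        img.convert("RGB").save(target, format="TIFF")
    return target


def convert_folder(folder):
    """Convert every supported image in a folder and return the TIFF paths."""
    return [
        convert_to_tiff(path)
        for path in sorted(Path(folder).iterdir())
        if path.suffix.lower() in SOURCE_EXTENSIONS
    ]
```

Calling convert_folder on the folder that holds your scans should leave a .tif copy beside each original, ready to open in MODI.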

Once you have your images in this format, do the following:

1. Open the Start menu, go to the Microsoft Office Tools folder under Programs and open Microsoft Office Document Imaging.

2. Inside MODI, click the Open icon and select your TIFF image from the dialog.

Open Image

3. Once the image is loaded inside MODI, click the Recognize Text Using OCR button.

4. Give it time to do the OCR. Once it’s done, click the Send Text to Word button.

Send Text to Word

5. A dialog will pop up with options to send the text. If the TIFF had multiple pages, make sure to select the All Pages option. If the image had pictures/diagrams inside it that you’d wish to export too, check the option to Maintain pictures in output. Click the OK button.

Send Text to Word

6. The recognized text and any pictures it may have found will be exported to an HTML file, which opens in whichever version of Word you have installed.
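With many TIFF files, clicking through these steps for each one adds up. MODI also exposes a COM automation object, so the recognition step can be scripted. The following is only a sketch, assuming Windows, an installed MODI and the third-party pywin32 package; the helper names are hypothetical, and I would pass a full path to the TIFF to be safe.

```python
# Language IDs from MODI's MiLANGUAGES enumeration.
MI_LANG_ENGLISH = 9
MI_LANG_SPANISH = 10
MI_LANG_FRENCH = 12


def ocr_tiff_with_modi(tiff_path, language=MI_LANG_ENGLISH):
    """Return the text MODI recognizes on each page of a TIFF file."""
    # pywin32 is a third-party, Windows-only package (pip install pywin32).
    import win32com.client

    doc = win32com.client.Dispatch("MODI.Document")
    doc.Create(tiff_path)
    try:
        # Arguments: language ID, auto-orient the image, straighten the image.
        doc.OCR(language, True, True)
        images = doc.Images
        return [images.Item(i).Layout.Text for i in range(images.Count)]
    finally:
        doc.Close(False)  # False: do not save changes back to the TIFF


def join_pages(pages):
    """Join per-page text with a blank line between pages."""
    return "\n\n".join(page.strip() for page in pages)
```

This skips the Send Text to Word step entirely and gives you plain text, so any diagrams in the scan are not exported this way.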

ii. OCR directly from the Scanner

1. Connect your scanner and load the item to scan.

2. Open the Start menu, go to the Microsoft Office Tools folder under Programs and open Microsoft Office Document Scanning.

3. In the scanning window, click the Scanner button and select your scanner.

Select Scanner

4. Depending on the nature of the item you’re scanning, select a suitable color preset for it: Color, Grayscale or Black and White.

5. Click the Scanning button. Your item will be scanned, and once that is done the OCR will run automatically.

6. The recognized text will then be opened in MODI. Finish by clicking the Send Text to Word button to transfer the recognized text and any pictures to Word.

NOTE:
In the default save folder, you’ll find the HTML file containing the OCR information. Check inside the corresponding HTML folder for any pictures such as diagrams that MODI will have exported.
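If you only want the plain text out of that HTML file, for instance to paste it somewhere other than Word, the markup can be stripped with Python’s standard library. This is a minimal sketch; the class and function names are my own, and I have not verified exactly which tags MODI writes, so it handles only the common ones.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style> and <head> content."""

    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in ("p", "br", "div"):
            # Treat block-level tags and line breaks as new lines.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per row."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

To use it, read the exported file into a string and pass it to html_to_text. The pictures in the accompanying folder are left untouched.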

Language Support for MODI and OneNote

The OCR feature in OneNote and MODI comes embedded with support for only three languages: English, French and Spanish. By default it will use the language that your installed MS Office is using. To change the language MODI uses for the OCR do the following:

  • Open Microsoft Office Document Imaging
  • Go to Tools > Options…
  • Select OCR and then choose OCR Language

Select OCR Language

For other languages, particularly those written in a different script from English, such as Greek, Korean, Chinese, Japanese, Arabic or the Cyrillic-script languages (Russian, Bulgarian, Serbian, Ukrainian etc.), you’ll have to install the corresponding Language Pack for them to work with OneNote or MODI.

1. Installing OCR Language Packs for OneNote

To install a language pack for OneNote to OCR with:

  • Open OneNote and go to: Options > Language.
  • Add the language from the drop-down menu, then when it appears inside the languages box, click the Not Installed link below the Proofing column.

Add Language

That will take you to the Microsoft Office support site where you can download the free language packs. Make sure to download the correct language pack for the version of MS Office you’re using, i.e. the 32-bit or 64-bit edition of MS Office 2010, 2013 or 2016.

Download Language Pack

2. Installing OCR Language Packs for MODI

For MODI, the process is a little bit complicated but there’s a really good guide on how to go about installing the language packs here.

If all this sounds like a lot of work, you can opt to use Tesseract, which supports a wide range of languages. Tesseract is a command-line tool, but you can find a couple of GUIs (versions with a user interface) for it online such as this one.
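If you don’t mind the command line, a Tesseract run is easy to script. Here is a minimal sketch in Python, assuming the tesseract program is installed and on your PATH, along with the data files for the language you ask for. The function names are hypothetical; the command itself follows Tesseract’s usual form of image, output base name and the -l language option.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list: tesseract <image> <output base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_tesseract(image_path, output_base, lang="eng"):
    """Run Tesseract, which writes the recognized text to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    return f"{output_base}.txt"
```

Tesseract uses three-letter language codes, so English, French and Spanish are eng, fra and spa respectively.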