Azure's Computer Vision API includes Optical Character Recognition (OCR) capabilities that extract printed or handwritten text from images. You can extract text from images, such as photos of license plates or containers with serial numbers, as well as from documents - invoices, bills, financial reports, articles, and more.
Read API
The Computer Vision Read API is Azure's latest OCR technology (learn what's new) that extracts printed text (in several languages), handwritten text (English only), digits, and currency symbols from images and multi-page PDF documents. It's optimized to extract text from text-heavy images and multi-page PDF documents with mixed languages. It supports detecting both printed and handwritten text in the same image or document.
Input requirements
The Read call takes images and documents as its input. They have the following requirements:
- Supported file formats: JPEG, PNG, BMP, PDF, and TIFF
- For PDF and TIFF files, up to 2000 pages (only first two pages for the free tier) are processed.
- The file size must be less than 50 MB (4 MB for the free tier) and dimensions at least 50 x 50 pixels and at most 10000 x 10000 pixels.
- The PDF dimensions must be at most 17 x 17 inches, corresponding to legal or A3 paper sizes and smaller.
Note
Language input
The Read call has an optional request parameter for language. Read supports auto language identification and multilingual documents, so only provide a language code if you would like to force the document to be processed as that specific language.
OCR, or Optical Character Recognition, is a sophisticated software technique that allows a computer to extract text from images. In the early days, OCR software was rough and unreliable. Now, with tons of computing power on tap, it's often the fastest way to convert text in an image into something you can edit with a word processor.
OCR demo (examples)
Step 1: The Read operation
The Read API's Read call takes an image or PDF document as the input and extracts text asynchronously. The call returns with a response header field called `Operation-Location`. The `Operation-Location` value is a URL that contains the Operation ID to be used in the next step.
| Response header | Result URL |
| --- | --- |
| Operation-Location | https://cognitiveservice/vision/v3.1/read/analyzeResults/49a36324-fc4b-4387-aa06-090cfbf0064f |
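The REST call itself is not shown in this copy, so here is a minimal C# sketch of step 1. The endpoint, key, and image URL are placeholders of my own; only the v3.1 analyze path and the `Operation-Location` header come from the documentation above.

```csharp
// Step 1 sketch: submit an image URL to the Read API and capture Operation-Location.
using System;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ReadStep1
{
    static async Task Main()
    {
        var endpoint = "https://<your-resource>.cognitiveservices.azure.com"; // placeholder
        var key = "<your-key>";                                               // placeholder

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);

        // Hypothetical image URL; a binary body with application/octet-stream also works.
        using var body = new StringContent(
            "{\"url\":\"https://example.com/sample.jpg\"}", Encoding.UTF8, "application/json");

        var response = await client.PostAsync($"{endpoint}/vision/v3.1/read/analyze", body);
        response.EnsureSuccessStatusCode();

        // The Operation-Location header carries the URL (with operation ID) for step 2.
        string operationLocation = response.Headers.GetValues("Operation-Location").First();
        Console.WriteLine(operationLocation);
    }
}
```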
Note
Billing
The Computer Vision pricing page includes the pricing tier for Read. Each analyzed image or page is one transaction. If you call the operation with a PDF or TIFF document containing 100 pages, the Read operation counts it as 100 transactions and you will be billed for 100 transactions. If you make 50 calls to the operation and each call submits a document with 100 pages, you will be billed for 50 × 100 = 5,000 transactions.
Step 2: The Get Read Results operation
The second step is to call the Get Read Results operation. This operation takes as input the operation ID that was created by the Read operation. It returns a JSON response that contains a status field with the following possible values. You call this operation iteratively until it returns with the succeeded value. Use an interval of 1 to 2 seconds to avoid exceeding the requests per second (RPS) rate.
| Field | Type | Possible values |
| --- | --- | --- |
| status | string | `notStarted`: The operation has not started.<br>`running`: The operation is being processed.<br>`failed`: The operation has failed.<br>`succeeded`: The operation has succeeded. |
Note
The free tier limits the request rate to 20 calls per minute. The paid tier allows 10 requests per second (RPS), which can be increased upon request; use the Azure support channel or your account team to ask for a higher RPS rate.
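Putting step 2 together, a minimal polling sketch might look like the following. The helper name and error handling are mine; only the status values and the 1-2 second interval come from the documentation above.

```csharp
// Step 2 sketch: poll Get Read Results until the operation completes.
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

static class ReadStep2
{
    // 'client' must still send the Ocp-Apim-Subscription-Key header from step 1.
    public static async Task<string> PollReadResultAsync(HttpClient client, string operationLocation)
    {
        while (true)
        {
            string json = await client.GetStringAsync(operationLocation);
            using JsonDocument doc = JsonDocument.Parse(json);
            string status = doc.RootElement.GetProperty("status").GetString();

            if (status == "succeeded") return json;                       // extracted text is in this JSON
            if (status == "failed") throw new Exception("Read operation failed.");

            await Task.Delay(TimeSpan.FromSeconds(2));                    // stay under the RPS limit
        }
    }
}
```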
When the status field has the succeeded value, the JSON response contains the extracted text content from your image or document. The JSON response maintains the original line groupings of recognized words. It includes the extracted text lines and their bounding box coordinates. Each text line includes all extracted words with their coordinates and confidence scores.
Note
The data submitted to the `Read` operation are temporarily encrypted and stored at rest for a short duration, and then deleted. This lets your applications retrieve the extracted text as part of the service response.
Sample JSON output
See the following example of a successful JSON response:
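The original sample was not preserved in this copy; the following abbreviated sketch shows the general shape of a v3.1 `succeeded` response (the values are made up):

```json
{
  "status": "succeeded",
  "createdDateTime": "2021-02-04T06:32:08Z",
  "lastUpdatedDateTime": "2021-02-04T06:32:09Z",
  "analyzeResult": {
    "version": "3.1.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0.8,
        "width": 502,
        "height": 252,
        "unit": "pixel",
        "lines": [
          {
            "boundingBox": [58, 42, 314, 38, 314, 77, 58, 81],
            "text": "Tabs vs Spaces",
            "words": [
              { "boundingBox": [58, 42, 147, 40, 149, 79, 60, 81], "text": "Tabs", "confidence": 0.984 }
            ]
          }
        ]
      }
    ]
  }
}
```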
Natural reading order output (Latin only)
With the Read 3.2 preview API, specify the order in which the text lines are output with the `readingOrder` query parameter. Use `natural` for a more human-friendly reading order output, as shown in the following example. This feature is only supported for Latin languages.
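For example, a request path might look like this (the resource name and preview version number are hypothetical):

```
POST https://<your-resource>.cognitiveservices.azure.com/vision/v3.2-preview.3/read/analyze?readingOrder=natural
```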
Handwritten classification for text lines (Latin only)
The Read 3.2 preview API response classifies whether each text line is in a handwriting style, along with a confidence score. This feature is only supported for Latin languages. The following example shows the handwritten classification for the text in the image.
Select page(s) or page ranges for text extraction
With the Read 3.2 preview API, for large multi-page documents, use the `pages` query parameter to specify page numbers or page ranges to extract text from only those pages. The following example shows a document with 10 pages, with text extracted for both cases: all pages (1-10) and selected pages (3-6).
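For instance, to extract text from pages 3 through 6 only (again with a hypothetical resource name and preview version):

```
POST https://<your-resource>.cognitiveservices.azure.com/vision/v3.2-preview.3/read/analyze?pages=3-6
```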
Supported languages
The Read APIs support a total of 73 languages for print style text. Refer to the full list of OCR-supported languages. Handwritten style OCR is supported exclusively for English.
Use the cloud API or deploy on-premises
The Read 3.x cloud APIs are the preferred option for most customers because of ease of integration and fast productivity out of the box. Azure and the Computer Vision service handle scale, performance, data security, and compliance needs while you focus on meeting your customers' needs.
For on-premises deployment, the Read Docker container (preview) enables you to deploy the new OCR capabilities in your own local environment. Containers are great for specific security and data governance requirements.
OCR API
The OCR API uses an older recognition model, supports only images, and executes synchronously, returning immediately with the detected text. See the OCR supported languages; for new work, prefer the Read API described above.
Note
The Computer Vision 2.0 RecognizeText operations are being deprecated in favor of the new Read API covered in this article. Existing customers should transition to using Read operations.
Next steps
- Get started with the Computer Vision REST API or client library quickstarts.
- Learn about the Read 3.1 REST API.
- Learn about the Read 3.2 public preview REST API with support for a total of 73 languages.
Introduction
In this article, we will create an optical character recognition (OCR) application using Angular and the Azure Computer Vision Cognitive Service.
Computer Vision is an AI service that analyzes content in images. We will use the OCR feature of Computer Vision to detect the printed text in an image. The application will extract the text from the image and detect the language of the text.
Currently, the OCR API supports 25 languages.
Prerequisites
- Install the latest LTS version of Node.js from https://nodejs.org/en/download/
- Install the Angular CLI from https://cli.angular.io/
- Install the .NET Core 3.1 SDK from https://dotnet.microsoft.com/download/dotnet-core/3.1
- Install the latest version of Visual Studio 2019 from https://visualstudio.microsoft.com/downloads/
- An Azure subscription account. You can create a free Azure account at https://azure.microsoft.com/en-in/free/
Source Code
You can get the source code from GitHub.
We will use an ASP.NET Core backend for this application. The ASP.NET Core backend provides a straightforward authentication process to access Azure Cognitive Services. This also ensures that the end user won't have direct access to Cognitive Services.
Create the Azure Computer Vision Cognitive Service resource
Log in to the Azure portal, search for cognitive services in the search bar, and click on the result. Refer to the image shown below.
On the next screen, click on the Add button. It will open the cognitive services marketplace page. Search for Computer Vision in the search bar and click on the search result. It will open the Computer Vision API page. Click on the Create button to create a new Computer Vision resource. Refer to the image shown below.
On the Create page, fill in the details as indicated below.
- Name: Give a unique name for your resource.
- Subscription: Select the subscription type from the dropdown.
- Pricing tier: Select the pricing tier as per your choice.
- Resource group: Select an existing resource group or create a new one.
Click on the Create button. Refer to the image shown below.
After your resource is successfully deployed, click on the 'Go to resource' button. You can see the Key and the endpoint for the newly created Computer Vision resource. Refer to the image shown below.
Make a note of the key and the endpoint. We will be using these in the latter part of this article to invoke the Computer Vision OCR API from the .NET Code. The values are masked here for privacy.
Creating the ASP.NET Core application
Open Visual Studio 2019 and click on 'Create a new Project'. A 'Create a new Project' dialog will open. Select 'ASP.NET Core Web Application' and click on Next. On the 'Configure your new project' screen, provide the name for your application as `ngComputerVision` and click on Create. Refer to the image shown below.
You will be navigated to the 'Create a new ASP.NET Core web application' screen. Select '.NET Core' and 'ASP.NET Core 3.1' from the dropdowns at the top. Then, select the 'Angular' project template and click on Create. Refer to the image shown below.
This will create our project. The folder structure of the application is shown below.
The `ClientApp` folder contains the Angular code for our application. The Controllers folder will contain our API controllers. The Angular components are present inside the `ClientApp/src/app` folder.
The default template contains a few Angular components. These components won't affect our application, but for the sake of simplicity, we will delete the fetchdata and counter folders from the `ClientApp/src/app` folder. Also, remove the references to these two components from the `app.module.ts` file.
Installing Computer Vision API library
We will install the Azure Computer Vision API library which will provide us with the models out of the box to handle the Computer Vision REST API response. To install the package, navigate to Tools >> NuGet Package Manager >> Package Manager Console. It will open the Package Manager Console. Run the command as shown below.
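The command itself did not survive in this copy; assuming the package in question is the official Computer Vision client library, it would be:

```
Install-Package Microsoft.Azure.CognitiveServices.Vision.ComputerVision
```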
You can learn more about this package at the NuGet gallery.
Create the Models
Right-click on the `ngComputerVision` project and select Add >> New Folder. Name the folder as Models. Again, right-click on the Models folder and select Add >> Class to add a new class file. Put the name of your class as `LanguageDetails.cs` and click Add.
Open LanguageDetails.cs and put the following code inside it.
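The original listing is missing from this copy; here is a minimal sketch, assuming the class mirrors an entry of the Translate Text languages response (name, native name, and text direction) and the project's default namespace:

```csharp
// Models/LanguageDetails.cs - shape assumed from the Translate Text languages response.
using Newtonsoft.Json;

namespace ngComputerVision.Models
{
    public class LanguageDetails
    {
        [JsonProperty("name")]
        public string Name { get; set; }

        [JsonProperty("nativeName")]
        public string NativeName { get; set; }

        [JsonProperty("dir")]
        public string Dir { get; set; }
    }
}
```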
Similarly, add a new class file AvailableLanguage.cs and put the following code inside it.
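Again the listing is missing; a sketch assuming the class wraps the response's translation dictionary, keyed by language code:

```csharp
// Models/AvailableLanguage.cs - wraps the "translation" dictionary of the languages response.
using System.Collections.Generic;
using Newtonsoft.Json;

namespace ngComputerVision.Models
{
    public class AvailableLanguage
    {
        [JsonProperty("translation")]
        public Dictionary<string, LanguageDetails> Translation { get; set; }
    }
}
```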
We will also add two classes as DTO (Data Transfer Object) for sending data back to the client.
Create a new folder and name it DTOModels. Add the new class file AvailableLanguageDTO.cs in the DTOModels folder and put the following code inside it.
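A plausible shape for this DTO, given how it is used later (a language code plus a display name; the property names are my assumption):

```csharp
// DTOModels/AvailableLanguageDTO.cs - a code/name pair sent to the client.
namespace ngComputerVision.DTOModels
{
    public class AvailableLanguageDTO
    {
        public string LanguageID { get; set; }
        public string LanguageName { get; set; }
    }
}
```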
Add the OcrResultDTO.cs file and put the following code inside it.
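Similarly, a plausible shape for the OCR result DTO (detected text plus the language code; names are my assumption):

```csharp
// DTOModels/OcrResultDTO.cs - the extracted text and detected language code.
namespace ngComputerVision.DTOModels
{
    public class OcrResultDTO
    {
        public string DetectedText { get; set; }
        public string Language { get; set; }
    }
}
```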
Adding the OCR Controller
We will add a new controller to our application. Right-click on the Controllers folder and select Add >> New Item. An 'Add New Item' dialog box will open. Select 'Visual C#' from the left panel, then select 'API Controller Class' from the templates panel and put the name as `OCRController.cs`. Click on Add. Refer to the image below.
The `OCRController` will handle the image recognition requests from the client app. This controller will also return the list of all the languages supported by the OCR API.
Open the OCRController.cs file and put the following code inside it.
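The controller listing is missing from this copy. The following condensed sketch follows the description below: the constructor holds the key and endpoint, `Post` converts the upload to a byte array and calls `ReadTextFromStream`, and `GetAvailableLanguages` queries the Translate Text languages endpoint. The key/endpoint values are placeholders and the exact request paths are my assumptions, not the article's exact code.

```csharp
// Condensed sketch of OCRController (placeholder key/endpoint; paths assumed).
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using Newtonsoft.Json;
using ngComputerVision.DTOModels;
using ngComputerVision.Models;

[Route("api/[controller]")]
public class OCRController : Controller
{
    private readonly string subscriptionKey;
    private readonly string ocrEndpoint;

    public OCRController()
    {
        subscriptionKey = "<your-key>";                                  // placeholder
        ocrEndpoint = "https://<your-resource>.cognitiveservices.azure.com/vision/v2.0/ocr";
    }

    [HttpPost, DisableRequestSizeLimit]
    public async Task<OcrResultDTO> Post()
    {
        // Read the uploaded image into a byte array.
        IFormFile file = Request.Form.Files[0];
        using var ms = new MemoryStream();
        await file.CopyToAsync(ms);

        string json = await ReadTextFromStream(ms.ToArray());
        OcrResult ocrResult = JsonConvert.DeserializeObject<OcrResult>(json);

        // Form the sentence by walking regions -> lines -> words.
        var text = new StringBuilder();
        foreach (OcrRegion region in ocrResult.Regions)
            foreach (OcrLine line in region.Lines)
            {
                foreach (OcrWord word in line.Words)
                    text.Append(word.Text).Append(' ');
                text.AppendLine();
            }

        return new OcrResultDTO { DetectedText = text.ToString(), Language = ocrResult.Language };
    }

    private async Task<string> ReadTextFromStream(byte[] imageData)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        using var content = new ByteArrayContent(imageData);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        HttpResponseMessage response = await client.PostAsync(ocrEndpoint, content);
        return await response.Content.ReadAsStringAsync();
    }

    [HttpGet]
    public async Task<List<AvailableLanguageDTO>> GetAvailableLanguages()
    {
        // The Translate Text languages endpoint needs no key for the languages scope.
        const string uri =
            "https://api.cognitive.microsofttranslator.com/languages?api-version=3.0&scope=translation";

        using var client = new HttpClient();
        string json = await client.GetStringAsync(uri);
        AvailableLanguage languages = JsonConvert.DeserializeObject<AvailableLanguage>(json);

        var result = new List<AvailableLanguageDTO>();
        foreach (KeyValuePair<string, LanguageDetails> kv in languages.Translation)
            result.Add(new AvailableLanguageDTO { LanguageID = kv.Key, LanguageName = kv.Value.Name });
        return result;
    }
}
```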
In the constructor of the class, we have initialized the key and the endpoint URL for the OCR API.
The Post method will receive the image data as a file collection in the request body and return an object of type `OcrResultDTO`. We will convert the image data to a byte array and invoke the `ReadTextFromStream` method. We will deserialize the response into an object of type `OcrResult`. We will then form the sentence by iterating over the `OcrWord` objects.
Inside the `ReadTextFromStream` method, we will create a new `HttpRequestMessage`. This HTTP request is a POST request. We will pass the subscription key in the header of the request. The OCR API will return a JSON object having each word from the image as well as the detected language of the text.
The `GetAvailableLanguages` method will return the list of all the languages supported by the Translate Text API. We will set the request URI and create an `HttpRequestMessage`, which will be a GET request. This request URI will return a JSON object which will be deserialized to an object of type `AvailableLanguage`.
Why do we need to fetch the list of supported languages?
The OCR API returns the language code (e.g. en for English, de for German, etc.) of the detected language. But we cannot display the language code on the UI as it is not user-friendly. Therefore, we need a dictionary to look up the language name corresponding to the language code.
The Azure Computer Vision OCR API supports 25 languages. To know all the languages supported by the OCR API, see the list of supported languages. These languages are a subset of the languages supported by the Azure Translate Text API.
Since there is no dedicated API endpoint to fetch the list of languages supported by OCR API, we are using the Translate Text API endpoint to fetch the list of languages. We will create the language lookup dictionary using the JSON response from this API call and filter the result based on the language code returned by the OCR API.
Working on the Client side of the application
The code for the client-side is available in the ClientApp folder. We will use Angular CLI to work with the client code.
Using Angular CLI is not mandatory. I am using Angular CLI here as it is user-friendly and easy to use. If you don't want to use the CLI, you can create the files for components and services manually.
Navigate to the ngComputerVision/ClientApp folder on your machine and open a command window. We will execute all our Angular CLI commands in this window.
Create the client-side models
Create a folder called models inside the `ClientApp/src/app` folder. Now we will create a file availablelanguage.ts in the models folder. Put the following code in it.
Similarly, create another file inside the models folder called ocrresult.ts. Put the following code in it.
You can observe that both these classes have the same definition as the DTO classes we created on the server-side. This will allow us to bind the data returned from the server directly to our models.
Create the Computervision Service
We will create an Angular service which will invoke the Web API endpoints, convert the Web API response to JSON and pass it to our component. Run the following command.
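The command itself did not survive in this copy; with Angular CLI it would presumably be:

```
ng generate service services/computervision
```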
This command will create a folder named services and then create the following two files inside it.
- computervision.service.ts — the service class file.
- computervision.service.spec.ts — the unit test file for service.
Open computervision.service.ts file and put the following code inside it.
We have defined a variable baseURL which will hold the endpoint URL of our API. We will initialize the baseURL in the constructor and set it to the endpoint of the `OCRController`.
The `getAvailableLanguage` method will send a GET request to the `GetAvailableLanguages` method of the `OCRController` to fetch the list of supported languages for OCR.
The `getTextFromImage` method will send a POST request to the `OCRController` and supply a parameter of type `FormData`. It will fetch the detected text from the image and the language code of the text.
Create the Ocr component
Run the following command in the command prompt to create the `OcrComponent`.
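The command is missing here; with Angular CLI it would presumably be:

```
ng generate component ocr --module app
```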
The `--module` flag will ensure that this component gets registered in `app.module.ts`.
Open ocr.component.html and put the following code in it.
We have defined a text area to display the detected text and a text box for displaying the detected language. We have defined a file upload control which will allow us to upload an image. After uploading the image, a preview of the image will be displayed using an `<img>` element.
Open ocr.component.ts and put the following code in it.
We will inject the `ComputervisionService` in the constructor of the `OcrComponent` and set a message and the value for the maximum image size allowed inside the constructor.
We will invoke the `getAvailableLanguage` method of our service in `ngOnInit` and store the result in an array of type `AvailableLanguage`.
The `uploadImage` method will be invoked upon uploading an image. We will check if the uploaded file is a valid image and within the allowed size limit. We will process the image data using a `FileReader` object. The `readAsDataURL` method will read the contents of the uploaded file.
Upon successful completion of the read operation, the `reader.onload` event will be triggered. The value of `imagePreview` will be set to the result returned by the FileReader object, which is of type `ArrayBuffer`.
Inside the `GetText` method, we will append the image file to a variable of type `FormData`. We will invoke the `getTextFromImage` method of the service and bind the result to an object of type `OcrResult`. We will search for the language name in the `availableLanguage` array based on the language code returned from the service. If the language code is not found, we will set the language as unknown.
We will add the styling for the text area in ocr.component.css as shown below.
Adding the links in Nav Menu
We will add the navigation links for our components in the nav menu. Open nav-menu.component.html and remove the links for Counter and Fetch data components. Add the following lines in the list of navigation links.
Execution Demo
Press F5 to launch the application. Click on the Computer Vision button on the nav menu at the top. You can upload an image and extract the text from the image as shown in the image below.
Summary
We have created an optical character recognition (OCR) application using Angular and the Computer Vision Azure Cognitive Service. The application extracts the printed text from the uploaded image and recognizes the language of the text. We used the OCR API of Computer Vision, which can recognize text in 25 languages.
I just released a free eBook on Angular and Firebase. You can download the free book from Build a Full-Stack Web Application Using Angular & Firebase
If you like the article, share it with your friends. You can also connect with me on Twitter and LinkedIn.