Get Started with AI Text Recognition (OCR)

Text recognition, also known as optical character recognition (OCR), is supported by a set of Windows AI APIs that can detect and extract text within images and convert it into machine-readable character streams.

These APIs can identify characters, words, lines, and polygonal text boundaries, and provide confidence levels for each match. They are supported exclusively through hardware acceleration on devices with a neural processing unit (NPU), which makes them faster and more accurate than the legacy Windows.Media.Ocr.OcrEngine APIs in the Windows platform SDK.

For API details, see API ref for Text Recognition (OCR).

What can I do with AI Text Recognition?

Use AI Text Recognition features to identify and recognize text in an image. You can also get the text boundaries and confidence scores for the recognized text.

Note

Characters that are illegible or small in size can generate inaccurate results.
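
Because illegible or small characters can produce inaccurate results, one option is to discard words whose confidence falls below a cutoff before the text is used. The following is a minimal sketch of that idea, not part of the Windows AI APIs: SketchWord and FilterByConfidence are hypothetical names, the types are plain stand-ins for the recognizer output, and the confidence value is assumed to lie between 0.0 and 1.0.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a recognized word; not a Windows AI API type.
struct SketchWord
{
    std::wstring text;
    float confidence; // assumed to lie in [0.0, 1.0]
};

// Returns only the words whose confidence is at least minConfidence,
// preserving their original order.
std::vector<SketchWord> FilterByConfidence(
    const std::vector<SketchWord>& words, float minConfidence)
{
    std::vector<SketchWord> kept;
    for (const auto& word : words)
    {
        if (word.confidence >= minConfidence)
        {
            kept.push_back(word);
        }
    }
    return kept;
}
```

In a real app the same loop would run over the words of each recognized line. The cutoff is a judgment call that depends on how costly a misread word is for your scenario.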

Create an ImageBuffer from a file

In this WinUI example we call a LoadImageBufferFromFileAsync function to get an ImageBuffer from an image file.

In the LoadImageBufferFromFileAsync function, we complete the following steps:

Create a StorageFile object from the specified file path.
Open a stream on the StorageFile using OpenAsync.
Create a BitmapDecoder for the stream.
Call GetSoftwareBitmapAsync on the bitmap decoder to get a SoftwareBitmap object.
Return an image buffer from CreateBufferAttachedToBitmap.

using System.Threading.Tasks;
using Microsoft.Graphics.Imaging;
using Microsoft.Windows.AI.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public async Task<ImageBuffer> LoadImageBufferFromFileAsync(string filePath)
{
    // Open the image file and decode it into a SoftwareBitmap.
    StorageFile file = await StorageFile.GetFileFromPathAsync(filePath);
    IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);
    BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
    SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();

    if (bitmap == null)
    {
        return null;
    }

    // Wrap the bitmap in an ImageBuffer for use with the imaging APIs.
    return ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
}

#include <iostream>
#include <sstream>
#include <winrt/Microsoft.Windows.AI.Imaging.h>
#include <winrt/Windows.Graphics.Imaging.h>
#include <winrt/Microsoft.Graphics.Imaging.h>
#include <winrt/Microsoft.UI.Xaml.Controls.h>
#include <winrt/Microsoft.UI.Xaml.Media.h>
#include <winrt/Microsoft.UI.Xaml.Shapes.h>

using namespace winrt;
using namespace Microsoft::UI::Xaml;
using namespace Microsoft::Windows::AI;
using namespace Microsoft::Windows::AI::Imaging;
using namespace winrt::Microsoft::UI::Xaml::Controls;
using namespace winrt::Microsoft::UI::Xaml::Media;

winrt::Windows::Foundation::IAsyncOperation<winrt::hstring>
    MainWindow::RecognizeTextFromSoftwareBitmap(
        Windows::Graphics::Imaging::SoftwareBitmap const& bitmap)
{
    winrt::Microsoft::Windows::AI::Imaging::TextRecognizer textRecognizer =
        EnsureModelIsReady().get();
    Microsoft::Graphics::Imaging::ImageBuffer imageBuffer =
        Microsoft::Graphics::Imaging::ImageBuffer::CreateForSoftwareBitmap(bitmap);
    RecognizedText recognizedText =
        textRecognizer.RecognizeTextFromImage(imageBuffer);

    std::wstringstream stringStream;
    for (const auto& line : recognizedText.Lines())
    {
        stringStream << line.Text().c_str() << std::endl;
    }

    co_return winrt::hstring{ stringStream.str() };
}

Recognize text in a bitmap image

The following example shows how to recognize some text in a SoftwareBitmap object as a single string value:

Create a TextRecognizer object through a call to the EnsureModelIsReady function, which also confirms there is a language model present on the system.
Using the bitmap obtained in the previous snippet, we call the RecognizeTextFromSoftwareBitmap function.
Call CreateBufferAttachedToBitmap on the bitmap to get an ImageBuffer object.
Call RecognizeTextFromImage to get the recognized text from the ImageBuffer.
Create a wstringstream object and load it with the recognized text.
Return the string.

Note

The EnsureModelIsReady function is used to check the readiness state of the text recognition model (and install it if necessary).

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Graphics.Imaging;
using Microsoft.Windows.AI;
using Microsoft.Windows.AI.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public async Task<string> RecognizeTextFromSoftwareBitmap(SoftwareBitmap bitmap)
{
    TextRecognizer textRecognizer = await EnsureModelIsReady();
    ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
    RecognizedText recognizedText = textRecognizer.RecognizeTextFromImage(imageBuffer);

    // Concatenate the recognized lines, one per line of output.
    StringBuilder stringBuilder = new StringBuilder();
    foreach (var line in recognizedText.Lines)
    {
        stringBuilder.AppendLine(line.Text);
    }

    return stringBuilder.ToString();
}

public async Task<TextRecognizer> EnsureModelIsReady()
{
    // Install the model only if it is not ready yet.
    if (TextRecognizer.GetReadyState() == AIFeatureReadyState.NotReady)
    {
        var loadResult = await TextRecognizer.EnsureReadyAsync();
        if (loadResult.Status != AIFeatureReadyResultState.Success)
        {
            // ExtendedError is a property in the C# projection, not a method.
            throw new Exception(loadResult.ExtendedError.Message);
        }
    }

    return await TextRecognizer.CreateAsync();
}

winrt::Windows::Foundation::IAsyncOperation<winrt::Microsoft::Windows::AI::Imaging::TextRecognizer>
    MainWindow::EnsureModelIsReady()
{
    if (winrt::Microsoft::Windows::AI::Imaging::TextRecognizer::GetReadyState() ==
        AIFeatureReadyState::NotReady)
    {
        auto loadResult = TextRecognizer::EnsureReadyAsync().get();

        if (loadResult.Status() != AIFeatureReadyResultState::Success)
        {
            throw winrt::hresult_error(loadResult.ExtendedError());
        }
    }

    return winrt::Microsoft::Windows::AI::Imaging::TextRecognizer::CreateAsync();
}

Get word bounds and confidence

Here we show how to visualize the BoundingBox of each word in a SoftwareBitmap object as a collection of color-coded polygons on a Grid element.

Note

For this example we assume a TextRecognizer object has already been created and passed in to the function.

using Microsoft.Graphics.Imaging;
using Microsoft.UI.Xaml.Controls;
using Microsoft.UI.Xaml.Media;
using Microsoft.UI.Xaml.Shapes;
using Microsoft.Windows.AI.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public void VisualizeWordBoundariesOnGrid(
    SoftwareBitmap bitmap,
    Grid grid,
    TextRecognizer textRecognizer)
{
    ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
    RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);

    SolidColorBrush greenBrush = new SolidColorBrush(Microsoft.UI.Colors.Green);
    SolidColorBrush yellowBrush = new SolidColorBrush(Microsoft.UI.Colors.Yellow);
    SolidColorBrush redBrush = new SolidColorBrush(Microsoft.UI.Colors.Red);

    foreach (var line in result.Lines)
    {
        foreach (var word in line.Words)
        {
            // Build a four-corner polygon from the word's bounding box.
            PointCollection points = new PointCollection();
            var bounds = word.BoundingBox;
            points.Add(bounds.TopLeft);
            points.Add(bounds.TopRight);
            points.Add(bounds.BottomRight);
            points.Add(bounds.BottomLeft);

            Polygon polygon = new Polygon();
            polygon.Points = points;
            polygon.StrokeThickness = 2;

            // Color-code the outline by confidence, using the same
            // MatchConfidence member as the C++ version of this example.
            if (word.MatchConfidence < 0.33)
            {
                polygon.Stroke = redBrush;
            }
            else if (word.MatchConfidence < 0.67)
            {
                polygon.Stroke = yellowBrush;
            }
            else
            {
                polygon.Stroke = greenBrush;
            }

            grid.Children.Add(polygon);
        }
    }
}

void MainWindow::VisualizeWordBoundariesOnGrid(
    Windows::Graphics::Imaging::SoftwareBitmap const& bitmap,
    Grid const& grid,
    TextRecognizer const& textRecognizer)
{
    Microsoft::Graphics::Imaging::ImageBuffer imageBuffer =
        Microsoft::Graphics::Imaging::ImageBuffer::CreateForSoftwareBitmap(bitmap);
    RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);

    auto greenBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Green());
    auto yellowBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Yellow());
    auto redBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Red());

    for (const auto& line : result.Lines())
    {
        for (const auto& word : line.Words())
        {
            PointCollection points;
            const auto& bounds = word.BoundingBox();
            points.Append(bounds.TopLeft);
            points.Append(bounds.TopRight);
            points.Append(bounds.BottomRight);
            points.Append(bounds.BottomLeft);

            winrt::Microsoft::UI::Xaml::Shapes::Polygon polygon{};
            polygon.Points(points);
            polygon.StrokeThickness(2);

            if (word.MatchConfidence() < 0.33)
            {
                polygon.Stroke(redBrush);
            }
            else if (word.MatchConfidence() < 0.67)
            {
                polygon.Stroke(yellowBrush);
            }
            else
            {
                polygon.Stroke(greenBrush);
            }

            grid.Children().Append(polygon);
        }
    }
}
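
The color-coding rule and the four-corner bounding box in the example above can be tried out without any UI. The following is a rough sketch using plain stand-in types rather than the Windows AI APIs: SketchPoint, SketchQuad, SketchRect, ConfidenceBand, ClassifyConfidence, and EnclosingRect are all hypothetical names. It reuses the 0.33 and 0.67 thresholds from the example and, as an extra step that is our own assumption about what an app might need, reduces the four corners to an axis-aligned rectangle (useful for cropping or hit-testing when a word's polygon is slanted).

```cpp
#include <algorithm>

// Hypothetical stand-ins; not Windows AI API types.
struct SketchPoint { float x; float y; };

struct SketchQuad
{
    SketchPoint topLeft, topRight, bottomRight, bottomLeft;
};

struct SketchRect { float left, top, right, bottom; };

enum class ConfidenceBand { Low, Medium, High };

// Same thresholds as the visualization example:
// below 0.33 is low, below 0.67 is medium, anything else is high.
ConfidenceBand ClassifyConfidence(float confidence)
{
    if (confidence < 0.33f) return ConfidenceBand::Low;
    if (confidence < 0.67f) return ConfidenceBand::Medium;
    return ConfidenceBand::High;
}

// Smallest axis-aligned rectangle that contains all four corners.
SketchRect EnclosingRect(const SketchQuad& q)
{
    const auto xs = std::minmax(
        { q.topLeft.x, q.topRight.x, q.bottomRight.x, q.bottomLeft.x });
    const auto ys = std::minmax(
        { q.topLeft.y, q.topRight.y, q.bottomRight.y, q.bottomLeft.y });
    return { xs.first, ys.first, xs.second, ys.second };
}
```

Keeping the threshold logic in one small function means the red, yellow, and green brushes can be chosen from the returned band in a single place, so the cutoffs only need to change once if you tune them.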

Responsible AI

We have used a combination of steps to ensure these imaging APIs are trustworthy, secure, and built responsibly. We recommend reviewing the best practices described in Responsible Generative AI Development on Windows when implementing AI features in your app.
