You can use ML Kit to recognize text in images or video, such as the text of a street sign. The main characteristics of this feature are:
Text Recognition API | |
---|---|
Description | Recognize Latin-script text in images or videos. |
SDK name | GoogleMLKit/TextRecognition (version 2.2.0) |
Implementation | Assets are statically linked to your app at build time. |
App size impact | About 20 MB |
Performance | Real-time on most devices. |
Try it out
- Play around with the sample app to see an example usage of this API.
- Try the code yourself with the codelab.
Before you begin
- Include the following ML Kit pods in your Podfile:
pod 'GoogleMLKit/TextRecognition','2.2.0'
- After you install or update your project's Pods, open your Xcode project using its
.xcworkspace
. ML Kit is supported in Xcode version 12.4 or greater.
1. Create an instance of TextRecognizer
Create an instance of TextRecognizer
by calling
+textRecognizer
:
Swift
let textRecognizer = TextRecognizer.textRecognizer()
Objective-C
MLKTextRecognizer *textRecognizer = [MLKTextRecognizer textRecognizer];
2. Prepare the input image
Pass the image as aUIImage
or a CMSampleBufferRef
to the
TextRecognizer
's process(_:completion:)
method:
Create a VisionImage
object using a UIImage
or a
CMSampleBuffer
.
If you use a UIImage
, follow these steps:
- Create a
VisionImage
object with theUIImage
. Make sure to specify the correct.orientation
.Swift
let image = VisionImage(image: UIImage) visionImage.orientation = image.imageOrientation
Objective-C
MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image]; visionImage.orientation = image.imageOrientation;
If you use a
CMSampleBuffer
, follow these steps:-
Specify the orientation of the image data contained in the
CMSampleBuffer
.To get the image orientation:
Swift
func imageOrientation( deviceOrientation: UIDeviceOrientation, cameraPosition: AVCaptureDevice.Position ) -> UIImage.Orientation { switch deviceOrientation { case .portrait: return cameraPosition == .front ? .leftMirrored : .right case .landscapeLeft: return cameraPosition == .front ? .downMirrored : .up case .portraitUpsideDown: return cameraPosition == .front ? .rightMirrored : .left case .landscapeRight: return cameraPosition == .front ? .upMirrored : .down case .faceDown, .faceUp, .unknown: return .up } }
Objective-C
- (UIImageOrientation) imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation cameraPosition:(AVCaptureDevicePosition)cameraPosition { switch (deviceOrientation) { case UIDeviceOrientationPortrait: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored : UIImageOrientationRight; case UIDeviceOrientationLandscapeLeft: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored : UIImageOrientationUp; case UIDeviceOrientationPortraitUpsideDown: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored : UIImageOrientationLeft; case UIDeviceOrientationLandscapeRight: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored : UIImageOrientationDown; case UIDeviceOrientationUnknown: case UIDeviceOrientationFaceUp: case UIDeviceOrientationFaceDown: return UIImageOrientationUp; } }
- Create a
VisionImage
object using theCMSampleBuffer
object and orientation:Swift
let image = VisionImage(buffer: sampleBuffer) image.orientation = imageOrientation( deviceOrientation: UIDevice.current.orientation, cameraPosition: cameraPosition)
Objective-C
MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer]; image.orientation = [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation cameraPosition:cameraPosition];
3. Process the image
Then, pass the image to the
process(_:completion:)
method:Swift
textRecognizer.process(visionImage) { result, error in guard error == nil, let result = result else { // Error handling return } // Recognized text }
Objective-C
[textRecognizer processImage:image completion:^(MLKText *_Nullable result, NSError *_Nullable error) { if (error != nil || result == nil) { // Error handling return; } // Recognized text }];
4. Extract text from blocks of recognized text
If the text recognition operation succeeds, it returns a
Text
object. AText
object contains the full text recognized in the image and zero or moreTextBlock
objects.Each
TextBlock
represents a rectangular block of text, which contain zero or moreTextLine
objects. EachTextLine
object contains zero or moreTextElement
objects, which represent words and word-like entities such as dates and numbers.For each
TextBlock
,TextLine
, andTextElement
object, you can get the text recognized in the region and the bounding coordinates of the region.For example:
Swift
let resultText = result.text for block in result.blocks { let blockText = block.text let blockLanguages = block.recognizedLanguages let blockCornerPoints = block.cornerPoints let blockFrame = block.frame for line in block.lines { let lineText = line.text let lineLanguages = line.recognizedLanguages let lineCornerPoints = line.cornerPoints let lineFrame = line.frame for element in line.elements { let elementText = element.text let elementCornerPoints = element.cornerPoints let elementFrame = element.frame } } }
Objective-C
NSString *resultText = result.text; for (MLKTextBlock *block in result.blocks) { NSString *blockText = block.text; NSArray<MLKTextRecognizedLanguage *> *blockLanguages = block.recognizedLanguages; NSArray<NSValue *> *blockCornerPoints = block.cornerPoints; CGRect blockFrame = block.frame; for (MLKTextLine *line in block.lines) { NSString *lineText = line.text; NSArray<MLKTextRecognizedLanguage *> *lineLanguages = line.recognizedLanguages; NSArray<NSValue *> *lineCornerPoints = line.cornerPoints; CGRect lineFrame = line.frame; for (MLKTextElement *element in line.elements) { NSString *elementText = element.text; NSArray<NSValue *> *elementCornerPoints = element.cornerPoints; CGRect elementFrame = element.frame; } } }
Input image guidelines
-
For ML Kit to accurately recognize text, input images must contain text that is represented by sufficient pixel data. Ideally, each character should be at least 16x16 pixels. There is generally no accuracy benefit for characters to be larger than 24x24 pixels.
So, for example, a 640x480 image might work well to scan a business card that occupies the full width of the image. To scan a document printed on letter-sized paper, a 720x1280 pixel image might be required.
-
Poor image focus can affect text recognition accuracy. If you aren't getting acceptable results, try asking the user to recapture the image.
-
If you are recognizing text in a real-time application, you should consider the overall dimensions of the input images. Smaller images can be processed faster. To reduce latency, ensure that the text occupies as much of the image as possible, and capture images at lower resolutions (keeping in mind the accuracy requirements mentioned above). For more information, see Tips to improve performance.
Tips to improve performance
- For processing video frames, use the
results(in:)
synchronous API of the detector. Call this method from theAVCaptureVideoDataOutputSampleBufferDelegate
'scaptureOutput(_, didOutput:from:)
function to synchronously get results from the given video frame. KeepAVCaptureVideoDataOutput
'salwaysDiscardsLateVideoFrames
astrue
to throttle calls to the detector. If a new video frame becomes available while the detector is running, it will be dropped. - If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the updatePreviewOverlayViewWithLastFrame in the ML Kit quickstart sample for an example.
- Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-03-24 UTC.
[] [] -
-