Introduction
In the age of mobile solutions, the ability to scan and manage documents has evolved from a niche feature to a common expectation in modern applications. Whether archiving important documents or interacting with e-government platforms that require uploading forms and receipts, a clear, well-cropped scan can make all the difference.
However, manual cropping is tedious, error-prone, and often produces inconsistent results. Users don’t want to pinch, zoom, and drag; they expect the app to do the hard work: detect the document automatically, crop it accurately, and deliver a clean, scan-ready image with minimal effort. This is where edge detection algorithms like the Canny Edge Detector come in handy.
The Canny Edge detector is a multi-stage algorithm that detects a wide range of image edges. Developed by John F. Canny in 1986, it's a renowned and widely used algorithm in computer vision for its effectiveness and precision. It proceeds through the following phases:
- Grayscale Conversion: The initial step converts the input image into a single grayscale channel. This simplifies subsequent intensity analysis by removing color information, focusing solely on luminosity variations.
- Gaussian Blur: Smooths the image to remove noise (random variations in pixel intensity) before edge detection.
- Gradient Calculation (Sobel Operators): This phase detects the intensity changes (gradients) in both horizontal (Gx) and vertical (Gy) directions using Sobel operators. These gradients highlight regions where pixel intensity changes rapidly, indicating potential edges. The magnitude of the gradient (G) and its direction (θ) are calculated as G = √(Gx² + Gy²) and θ = atan2(Gy, Gx).
- Non-Maximum Suppression: After calculating gradients, many pixels might indicate an edge. Non-Maximum Suppression thins these edges by retaining only the local maxima of the gradient magnitude along the gradient direction. This results in sharper, single-pixel-wide edges.
- Double Thresholding: Classifies potential edge pixels into three categories:
  - Strong edges: Pixels with a gradient magnitude above a high threshold. These are definite edges.
  - Weak edges: Pixels with a gradient magnitude between the low and high thresholds. These might be edges but need further verification.
  - Irrelevant pixels: Pixels below the low threshold, discarded as non-edges.
  Otsu's method can optionally be employed to determine the high and low thresholds automatically, adapting to different lighting conditions and image content.
- Hysteresis: Connects weak edges to strong ones. This process ensures that only continuous and reliable edges are preserved, effectively filtering out isolated noise.
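To make the gradient step concrete, here is a minimal, self-contained Dart sketch. It is independent of the article's repository and operates on a plain list of luminance rows rather than an image object; border handling is deliberately omitted.

```dart
import 'dart:math' as math;

/// Computes the Sobel gradient magnitude and direction at (x, y) of a
/// grayscale image given as rows of luminance values (0–255).
/// Illustrative only: pixels on the image border are not handled.
({double magnitude, double direction}) sobelAt(
    List<List<int>> gray, int x, int y) {
  const kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]; // horizontal kernel (Gx)
  const ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]; // vertical kernel (Gy)
  double gx = 0, gy = 0;
  for (var j = -1; j <= 1; j++) {
    for (var i = -1; i <= 1; i++) {
      final v = gray[y + j][x + i];
      gx += kx[j + 1][i + 1] * v;
      gy += ky[j + 1][i + 1] * v;
    }
  }
  return (
    magnitude: math.sqrt(gx * gx + gy * gy), // G = sqrt(Gx^2 + Gy^2)
    direction: math.atan2(gy, gx),           // theta = atan2(Gy, Gx)
  );
}

void main() {
  // A vertical edge: dark left columns, bright right column.
  final gray = List.generate(3, (_) => [0, 0, 255]);
  final g = sobelAt(gray, 1, 1);
  print(g.magnitude); // 1020.0: strong response to the vertical edge
  print(g.direction); // 0.0: the gradient points horizontally
}
```

A real implementation would run this over every interior pixel to build the gradient map that Non-Maximum Suppression then thins.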
The output of the Canny Edge Detector is a binary image where the detected borders are prominently highlighted in white.
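As an aside on the thresholding step: Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram, and in a Canny setting the high threshold is commonly set to the Otsu value with the low threshold a fraction of it. Below is a compact, illustrative Dart version; it is not the repository's implementation.

```dart
/// Returns the Otsu threshold (0–255) for a list of 8-bit luminance values.
int otsuThreshold(List<int> pixels) {
  final hist = List<int>.filled(256, 0);
  for (final p in pixels) {
    hist[p]++;
  }

  final total = pixels.length;
  var sumAll = 0;
  for (var t = 0; t < 256; t++) {
    sumAll += t * hist[t];
  }

  var sumBg = 0, weightBg = 0;
  var bestThreshold = 0;
  var bestVariance = -1.0;

  for (var t = 0; t < 256; t++) {
    weightBg += hist[t]; // background = pixels at or below t
    if (weightBg == 0) continue;
    final weightFg = total - weightBg;
    if (weightFg == 0) break;

    sumBg += t * hist[t];
    final meanBg = sumBg / weightBg;
    final meanFg = (sumAll - sumBg) / weightFg;

    // Between-class variance for this candidate threshold.
    final variance =
        weightBg * weightFg * (meanBg - meanFg) * (meanBg - meanFg);
    if (variance > bestVariance) {
      bestVariance = variance;
      bestThreshold = t;
    }
  }
  return bestThreshold;
}

void main() {
  // Two clearly separated clusters of luminance values.
  final pixels = [10, 12, 11, 10, 200, 210, 205, 198];
  print(otsuThreshold(pixels)); // → 12, the top of the dark cluster
}
```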
Our Strategy
Our approach to automatically cropping documents leverages the Canny output to precisely identify the document's boundaries. It is broken down into three steps:
- Apply the Canny Algorithm to the original image, one document at a time
- Find the outermost borders of the document
- Create a crop area based on the previous step, and crop the image
While a detailed implementation of the Canny Edge Algorithm is beyond the scope of this article (as it would be quite extensive), a repository with the code example will be provided at the end of this article.
The Code
Apply the Canny Algorithm to the original image
In the code repository, we created a service for the crop, called CropService, so that we can call the crop function anywhere in the codebase without repetition. In the code below, we call the crop function from a simple button, passing the original image to it.
ElevatedButton.icon(
  style: ElevatedButton.styleFrom(
    backgroundColor: Colors.black,
    padding: const EdgeInsets.symmetric(horizontal: 20, vertical: 12),
  ),
  onPressed: () {
    cropService.processAndTrimImage('assets/img_sample2.JPG');
  },
  icon: const Icon(
    Icons.crop,
    color: Colors.white,
    size: 32,
  ),
  label: const Text(
    'Crop Image',
    style: TextStyle(color: Colors.white, fontSize: 32),
  ),
),
Our crop function is defined as follows:
Future<void> processAndTrimImage(String sourceAssetPath) async {
  final rawImageData = File(sourceAssetPath).readAsBytesSync();
  final baseVisualRepresentation = decodeImage(rawImageData);
  if (baseVisualRepresentation == null) {
    print('Failed to decode initial image data from: $sourceAssetPath');
    return;
  }

  await _cannyAlgorithm(baseVisualRepresentation);

  final temporaryWorkArea = await getTemporaryDirectory();
  final edgeFeatureMapPath = '${temporaryWorkArea.path}/map_visual.png';
  File(edgeFeatureMapPath)
      .writeAsBytesSync(encodePng(baseVisualRepresentation));
  print('Edge feature map saved to: $edgeFeatureMapPath');

  await _extractDocument(edgeFeatureMapPath, sourceAssetPath);
}
After this step, we will have the following output and can proceed to the next phase.
Find the outermost borders of the document
The Canny Edge Detector is incredibly effective at identifying all edges within an image, but for our document cropping purpose, this presents a challenge. It will highlight not only the document's perimeter but also internal features like fingerprints, facial outlines, and even individual letters. Therefore, we need a way to isolate the document boundaries specifically.
To do this, we make the assumption that the document is roughly centered in the image. Based on this assumption, we perform the following:
Left Border: We start scanning from the leftmost column, moving inwards towards the center. The moment we hit a white pixel, we record its horizontal coordinate. This marks our minX.
Right Border: Similarly, we scan from the rightmost column, moving inwards. The first white pixel we find gives us the maxX coordinate.
Top Border: We begin scanning from the topmost row, moving downwards. The first white pixel encountered defines our minY.
Bottom Border: Finally, we scan from the bottommost row, moving upwards. The first white pixel here gives us our maxY.
These four coordinates (minX, minY, maxX, maxY) precisely define the rectangular bounding box of the document within the image (assuming the document is not tilted), setting the stage for the final cropping step.
Future<void> _extractDocument(
    String featureMapPath, String originalAssetPath) async {
  final featureMapFile = File(featureMapPath);
  final featureMapRawBytes = await featureMapFile.readAsBytes();
  final featureMapImage = decodeImage(featureMapRawBytes);

  final originalFile = File(originalAssetPath);
  final originalRawBytes = await originalFile.readAsBytes();
  final originalVisual = decodeImage(originalRawBytes);

  if (featureMapImage == null || originalVisual == null) {
    print('Failed to interpret required image data for trimming.');
    return;
  }

  final imageWidth = featureMapImage.width;
  final imageHeight = featureMapImage.height;
  final centralHorizontalPoint = imageWidth ~/ 2;
  final centralVerticalPoint = imageHeight ~/ 2;

  int? boundaryX1, boundaryX2, boundaryY1, boundaryY2;
  const int trimOverlapPixels = 3;

  // A pixel counts as an edge if all of its channels are above a small threshold.
  bool isProminentEdge(Pixel px) =>
      px.r > 12 && px.g > 12 && px.b > 12 && px.a > 12;

  // Skip a small margin at the image borders to avoid frame artifacts.
  final horizontalScanBuffer = (imageWidth * 0.025).toInt();
  final verticalScanBuffer = (imageHeight * 0.025).toInt();

  // Left -> Center: the first edge pixel on the central row marks minX.
  for (int x = horizontalScanBuffer; x <= centralHorizontalPoint; x++) {
    if (isProminentEdge(featureMapImage.getPixel(x, centralVerticalPoint))) {
      boundaryX1 = x;
      break;
    }
  }
  // Repeat for Right -> Center, Top -> Center, Bottom -> Center
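The remaining scans follow the same pattern. Here is a sketch consistent with the variables above (the repository contains the full version):

```dart
// Right -> Center: the first edge pixel on the central row marks maxX.
for (int x = imageWidth - 1 - horizontalScanBuffer;
    x >= centralHorizontalPoint; x--) {
  if (isProminentEdge(featureMapImage.getPixel(x, centralVerticalPoint))) {
    boundaryX2 = x;
    break;
  }
}
// Top -> Center: the first edge pixel on the central column marks minY.
for (int y = verticalScanBuffer; y <= centralVerticalPoint; y++) {
  if (isProminentEdge(featureMapImage.getPixel(centralHorizontalPoint, y))) {
    boundaryY1 = y;
    break;
  }
}
// Bottom -> Center: the first edge pixel on the central column marks maxY.
for (int y = imageHeight - 1 - verticalScanBuffer;
    y >= centralVerticalPoint; y--) {
  if (isProminentEdge(featureMapImage.getPixel(centralHorizontalPoint, y))) {
    boundaryY2 = y;
    break;
  }
}
```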
Create a cropping area and crop
If all four coordinates are found, we use the copyCrop() function from the Dart image package to create the cropping area.
if ([boundaryX1, boundaryX2, boundaryY1, boundaryY2].any((b) => b == null)) {
  print('Failed to locate all four document boundaries.');
  return;
}

final finalCropLeft = (boundaryX1! - trimOverlapPixels).clamp(0, imageWidth);
final finalCropTop = (boundaryY1! - trimOverlapPixels).clamp(0, imageHeight);
final finalCropRight = (boundaryX2! + trimOverlapPixels).clamp(0, imageWidth);
final finalCropBottom = (boundaryY2! + trimOverlapPixels).clamp(0, imageHeight);

final croppedImage = copyCrop(
  originalVisual,
  x: finalCropLeft,
  y: finalCropTop,
  width: finalCropRight - finalCropLeft,
  height: finalCropBottom - finalCropTop,
);
Finally, the result just needs to be saved. Since the crop is encoded with encodePng(), we give the output a .png extension:

final finalProcessedAssetPath = originalAssetPath.replaceFirst(
    RegExp(r'\.(\w+)$'), '_refined_doc.png');
await File(finalProcessedAssetPath).writeAsBytes(encodePng(croppedImage));
print('Document cropping successful! Output saved to: $finalProcessedAssetPath');
Conclusion
Automatic document cropping is an essential feature in modern applications, transforming a manual process into an efficient experience. By leveraging the power of the Canny Edge Detector and a strategic approach to identify document boundaries, we can achieve highly accurate and consistent results. The multi-stage Canny Algorithm effectively pinpoints edges, and our simple strategy refines the result by locating the outermost borders. This method not only enhances user satisfaction but also ensures the production of clean, scan-ready documents, demonstrating the practical and powerful application of computer vision in everyday mobile solutions.
Comment down below if you have any improvements for this algorithm!