I was recently tasked with extracting text from an invoice using the Google Vision API.
I've used Google Vision before, but mainly for pages from a book, where the text simply runs top to bottom in straight, left-to-right lines. Vision does a great job with that standard layout. However, when reading an invoice that has, for instance, the product purchased on the left and the price on the right of the image, Vision's raw response would take the right-hand "columns" and place them at the bottom of the text, leaving the response unstructured and hard to read as plain text.
The challenge I had was that my client needed the data as structured as possible to ensure data integrity.
I thought I would share my experience and the approach I took to accomplish this task.
The first step in the process was to read the image with Google Vision. This required converting the image to a base64-encoded string so that it could be sent in the request to the API. I then used the Guzzle HTTP client to send a POST request to the Google Vision API:
// Get image contents as base64
$image_base64 = base64_encode(file_get_contents("path/to/your/image.jpg"));
// Use Guzzle to get the OCR data
$client = new \GuzzleHttp\Client();
$yourApiKey = "YOUR API KEY HERE";
$response = $client->request('POST', "https://vision.googleapis.com/v1/images:annotate?key={$yourApiKey}", [
    'json' => [
        'requests' => [
            [
                'image' => [
                    'content' => $image_base64
                ],
                'features' => [
                    [
                        'type' => 'TEXT_DETECTION',
                        'maxResults' => 1,
                    ]
                ]
            ]
        ]
    ],
]);
// Get the response JSON from Vision
$response = json_decode((string) $response->getBody(), true);
Once I received the response from the API, I decoded the JSON and extracted the text annotations from the data.
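To make the later steps easier to follow, here is an abridged sketch of the decoded response shape. The values below are hypothetical; the structure is the key point: `textAnnotations[0]` holds the full detected text and its bounding box, while each later element is a single word with its own box.

```php
<?php
// Abridged, hypothetical illustration of the decoded Vision response.
$response = [
    'responses' => [[
        'textAnnotations' => [
            [
                // Element 0: the entire detected text block
                'description' => "Widget \$9.99\nTotal \$9.99",
                'boundingPoly' => ['vertices' => [
                    ["x" => 10, "y" => 10], ["x" => 200, "y" => 10],
                    ["x" => 200, "y" => 80], ["x" => 10, "y" => 80],
                ]],
            ],
            [
                // Elements 1..n: one entry per detected word
                'description' => 'Widget',
                'boundingPoly' => ['vertices' => [
                    ["x" => 10, "y" => 10], ["x" => 70, "y" => 10],
                    ["x" => 70, "y" => 25], ["x" => 10, "y" => 25],
                ]],
            ],
            // ... one entry per remaining word
        ],
    ]],
];
```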
Next, I needed to extract the vertices for the whole receipt from the textAnnotations array and calculate the "center" of the receipt text. I accomplished this by using array_reduce to iterate through the $centervertices array and sum the x and y values:
// Get textAnnotations
$textAnnotations = $response['responses'][0]['textAnnotations'];
// Get vertices for the whole receipt
$centervertices = $textAnnotations[0]['boundingPoly']['vertices'];
// Calculate the "center" of the receipt text
$centerA = [
    "x" => array_reduce($centervertices, function($carry, $item) {
        return $carry + $item['x'];
    }, 0) / count($centervertices),
    "y" => array_reduce($centervertices, function($carry, $item) {
        return $carry + $item['y'];
    }, 0) / count($centervertices)
];
Then I needed to extract the vertices for the longest string from the textAnnotations array, calculate the "center" of the longest string, and calculate the angle the receipt is running at. I used the array_reduce function to find the longest string, and the atan2 and pi functions to calculate the angle.
// Get vertices for the longest string
$vertices = array_reduce(array_slice($textAnnotations, 1), function($carry, $item) {
    return strlen($item['description']) > strlen($carry['description']) ? $item : $carry;
}, ['description' => ''])['boundingPoly']['vertices'];
// Calculate the "center" of the longest string
$centerB = [
"x" => array_reduce($vertices, function($carry, $item) {
return $carry + $item['x'];
})/count($vertices),
"y" => array_reduce($vertices, function($carry, $item) {
return $carry + $item['y'];
})/count($vertices)
];
// Calculate the angle the receipt is running
$xDiff = $vertices[0]["x"] - $centerB["x"];
$yDiff = $vertices[0]["y"] - $centerB["y"];
$angle = (atan2($yDiff, $xDiff) * 180 / pi()) + 180;
$angle_to_rotate = -(pi() * ($angle-5) / 180);
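The next step applies this angle with the standard 2D rotation formulas. As a minimal, standalone sketch of those formulas (the point and angle here are hypothetical, not taken from a real receipt), rotating the point (10, 0) by 90 degrees around the origin:

```php
<?php
// Hypothetical example: rotate (10, 0) by 90 degrees around the origin.
// Note: in image coordinates y grows downward, so a positive angle
// reads as a clockwise rotation on screen.
$angle_to_rotate = pi() / 2; // 90 degrees in radians
$x = 10;
$y = 0;
$new_x = ($x * cos($angle_to_rotate)) - ($y * sin($angle_to_rotate));
$new_y = ($x * sin($angle_to_rotate)) + ($y * cos($angle_to_rotate));

// (10, 0) ends up at (0, 10), allowing for floating-point error
echo round($new_x) . "," . round($new_y); // 0,10
```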
The final steps were to re-sort the lines of text by remapping and sorting the rows and words after fixing the rotation. I used the array_map function to update the vertex coordinates and add a new 'lineSort' key to each row, then the usort function to sort the array on that key. I did the same for the words with a 'columnSort' key, sorted them with usort, and then imploded each row using a space:
// Remap/sort the rows after fixing rotation
$textAnnotationsSorted = array_map(function($row) use ($angle_to_rotate, $centerA) {
    $vertices = $row['boundingPoly']['vertices'];
    $new_vertices = array();
    foreach ($vertices as $vertex) {
        $x = $vertex["x"] - $centerA["x"];
        $y = $vertex["y"] - $centerA["y"];
        $new_x = ($x * cos($angle_to_rotate)) - ($y * sin($angle_to_rotate)) + $centerA["x"];
        $new_y = ($x * sin($angle_to_rotate)) + ($y * cos($angle_to_rotate)) + $centerA["y"];
        $new_vertices[] = ["x" => $new_x, "y" => $new_y];
    }
    $row['boundingPoly']['vertices'] = $new_vertices;
    $row['lineSort'] = $row['boundingPoly']['vertices'][0]['y'];
    return $row;
}, array_slice($textAnnotations, 1));
usort($textAnnotationsSorted, function($a, $b) {
    return $a['lineSort'] <=> $b['lineSort'];
});
$textAnnotationsSorted = array_values($textAnnotationsSorted);
// Setup faux rows
$newRows = [];
$index = 0;
// Setup base Y vertical
$curY = $textAnnotationsSorted[0]['boundingPoly']['vertices'][0]['y'];
// Loop the sorted rows and append faux rows
foreach ($textAnnotationsSorted as $v) {
    if ($v['boundingPoly']['vertices'][0]['y'] > $curY + 15) {
        $index++;
    }
    if ($v['boundingPoly']['vertices'][0]['y'] < $curY - 15) {
        $index--;
    }
    $newRows[$index][] = $v;
    $curY = $v['boundingPoly']['vertices'][0]['y'];
}
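To illustrate the bucketing above in isolation, here is the same ±15px threshold applied to three hypothetical word boxes, simplified down to bare y values:

```php
<?php
// Hypothetical word boxes, already sorted by y: two words on one visual
// line (y within 15px of each other) and one word on the next line.
$sorted = [
    ["description" => "Widget", "y" => 100],
    ["description" => "9.99",   "y" => 104],
    ["description" => "Total",  "y" => 140],
];

$rows = [];
$index = 0;
$curY = $sorted[0]['y'];
foreach ($sorted as $word) {
    // Start a new faux row when the word sits more than 15px below the last
    if ($word['y'] > $curY + 15) {
        $index++;
    }
    $rows[$index][] = $word['description'];
    $curY = $word['y'];
}

// "Widget" and "9.99" share a row; "Total" starts a new one
echo count($rows); // 2
```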
// Loop faux rows and sort columns
foreach ($newRows as &$row) {
    $row = array_map(function($v) {
        $v['columnSort'] = $v['boundingPoly']['vertices'][0]['x'];
        return $v;
    }, $row);
    usort($row, function($a, $b) {
        return $a['columnSort'] <=> $b['columnSort'];
    });
    $row = implode(" ", array_map(function($item) {
        return $item['description'];
    }, $row));
}
unset($row); // break the reference left over from the foreach
I was then able to echo out the reformatted OCR data:
// Echo the newly ordered OCR data
ksort($newRows); // ensure faux rows come out in top-to-bottom order
echo implode("\n", $newRows);
I hope this tutorial helps others understand the process I took to return a more human-readable OCR response.