The first task is to manually implement a function that calculates the intersection over union (IoU) between two boxes. Let's assume each box comes as a list of coordinates [x1, y1, x2, y2], where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
Note: I just learned that, unlike the Cartesian axes we're used to in math, where the origin (0, 0) sits at the bottom-left and y increases as we go up, in computer vision and image processing the origin is at the top-left and y increases as we go down. The x-axis behaves the same as before. This kind of makes sense when you think of images as 2D matrices: we read them starting from the top-left pixel, moving right and then down, so it makes sense that the indices grow the same way, with y increasing as we go down.
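Here is a tiny sketch of that convention, using a plain nested list as a stand-in for an image. The pixel values and the `pixel_at` helper are made up for illustration, they are not from any library.

```python
# A 3x4 "image" stored as a list of rows; the values are arbitrary.
image = [
    [10, 11, 12, 13],  # y = 0 (top row)
    [20, 21, 22, 23],  # y = 1
    [30, 31, 32, 33],  # y = 2 (bottom row)
]


def pixel_at(img, x, y):
    """Return the pixel at column x, row y (origin at the top-left)."""
    return img[y][x]  # rows are indexed first, so y comes before x


print(pixel_at(image, 0, 0))  # top-left pixel -> 10
print(pixel_at(image, 3, 0))  # moving right increases x -> 13
print(pixel_at(image, 0, 2))  # moving down increases y -> 30
```

Indexing as `img[y][x]` is the part that tends to trip people up: the row (y) comes first, even though we usually say "x, y".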
Now for how we actually calculate the IoU. Let me try rephrasing the logic: why intersection over union? We want the box the model predicts to match, as closely as possible, the ground-truth box (manually annotated by a human). So the predicted box needs to overlap the ground-truth bounding box: same coordinates, same area and all.
If the two boxes overlap completely, the area of their intersection equals the area of their union, so same number / same number = 1 and the guess was 100% "correct". If instead they don't overlap at all, the intersection area is 0, so 0 / whatever = 0, which means the model is 0% correct, completely off. Anything in between gives a score between 0 and 1.
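Written as a formula, the idea looks like this. The symbols are my own notation, with $A$ for the predicted box and $B$ for the ground-truth box:

```latex
\mathrm{IoU}(A, B) = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)},
\qquad 0 \le \mathrm{IoU}(A, B) \le 1
```

The two extremes described above are $A = B$, which gives $\mathrm{IoU} = 1$, and $A \cap B = \emptyset$, which gives $\mathrm{IoU} = 0$.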
So now let's implement this:
First we need the intersection area, and for that we need the coordinates of the intersection box the two boxes create.
In this case the top-left point of the intersection is the top-left point of box2, as we can see, but that's not always so. It might just as well be the other way around: we can't control where the model guesses the object to be and draws a box around it, it can be anywhere. What we do know is that, for our calculation, each coordinate of that point needs to be the "biggest" of the two boxes: the larger of the two x1 values and the larger of the two y1 values. These are picked independently, so the x1 and the y1 can come from different boxes.
For the bottom-right point it's the opposite: we take the smaller of the two x2 values and the smaller of the two y2 values.
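A quick sketch of that corner logic on its own, with two example boxes whose coordinates I made up for illustration:

```python
# Two example boxes in [x1, y1, x2, y2] form (values made up for illustration).
box1 = [0, 1, 4, 5]
box2 = [2, 0, 6, 3]

# Top-left corner of the intersection: the larger x1 and the larger y1.
x1_inter = max(box1[0], box2[0])  # max(0, 2) = 2, from box2
y1_inter = max(box1[1], box2[1])  # max(1, 0) = 1, from box1

# Bottom-right corner: the smaller x2 and the smaller y2.
x2_inter = min(box1[2], box2[2])  # min(4, 6) = 4, from box1
y2_inter = min(box1[3], box2[3])  # min(5, 3) = 3, from box2

print([x1_inter, y1_inter, x2_inter, y2_inter])  # [2, 1, 4, 3]
```

In this example the x1 of the intersection comes from box2 while its y1 comes from box1, so each coordinate really has to be compared separately rather than picking one box's corner as a whole.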
Next we calculate the width and height (to compute the area with), but here we need another safety net. Imagine a case where the boxes don't overlap at all: the model guesses a completely wrong box, no intersection. Logically the intersection area should be 0. We know that, but we need to make the computer do that too. Say we have two boxes [0,0,2,2] and [4,4,5,5]. Without the safety net, Python will just calculate things like:
x1_inter = 4
x2_inter = 2
w_inter = 2 - 4 = -2 (a negative width!)
That is geometrically meaningless, but Python doesn't know that, it simply calculates whatever is asked of it. Here w_inter and h_inter both end up negative, so their product is positive (-2 * -2 = 4), and we end up with a very wrong result: a nonzero intersection area for boxes that never touch.
To handle this situation, we add a safety net that sets the intersection width and height to 0 whenever the computed values are negative. That gives an intersection area of 0 (correct and logical) and an IoU of 0, a 0% match.
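To see the problem and the fix side by side, here is a small sketch on the same two boxes. The names `w_raw` and `h_raw` are mine, just to label the unclamped values:

```python
box1 = [0, 0, 2, 2]
box2 = [4, 4, 5, 5]

# Without the safety net: both differences come out negative.
w_raw = min(box1[2], box2[2]) - max(box1[0], box2[0])  # 2 - 4 = -2
h_raw = min(box1[3], box2[3]) - max(box1[1], box2[1])  # 2 - 4 = -2
print(w_raw * h_raw)  # 4, a bogus positive "intersection area"

# With the safety net: negative sizes are clamped to 0.
w_inter = max(0, w_raw)
h_inter = max(0, h_raw)
print(w_inter * h_inter)  # 0
```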
def IoU(box1, box2):
    # Each box is [x1, y1, x2, y2]: top-left corner, then bottom-right corner.
    x1_inter = max(box1[0], box2[0])
    y1_inter = max(box1[1], box2[1])
    x2_inter = min(box1[2], box2[2])
    y2_inter = min(box1[3], box2[3])
    # Clamp to 0 so boxes that don't overlap give an intersection area of 0.
    w_inter = max(0, x2_inter - x1_inter)
    h_inter = max(0, y2_inter - y1_inter)
    area_inter = w_inter * h_inter
    w1 = box1[2] - box1[0]
    h1 = box1[3] - box1[1]
    area1 = w1 * h1
    w2 = box2[2] - box2[0]
    h2 = box2[3] - box2[1]
    area2 = w2 * h2
    area_union = area1 + area2 - area_inter
    # Guard against dividing by zero when both boxes have zero area.
    if area_union == 0:
        return 0.0
    iou = area_inter / area_union
    return iou
Note: we need to subtract the intersection area from the sum of the two box areas here because it would otherwise be counted twice, once in the area of box1 and a second time in the area of box2.
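A few sanity checks help confirm the whole thing behaves as expected. The block below is a compact restatement of the same steps under a hypothetical name, `iou_sketch`, so it can run on its own. It assumes both boxes have a positive area, so the union is never 0.

```python
def iou_sketch(box1, box2):
    """Compact restatement of the IoU steps for boxes in [x1, y1, x2, y2] form."""
    # Intersection width and height, clamped to 0 when the boxes don't overlap.
    w_inter = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    h_inter = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    area_inter = w_inter * h_inter
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # The overlap is inside both areas, so subtract it once.
    area_union = area1 + area2 - area_inter
    return area_inter / area_union  # assumes area_union > 0


print(iou_sketch([0, 0, 2, 2], [0, 0, 2, 2]))  # identical boxes -> 1.0
print(iou_sketch([0, 0, 2, 2], [4, 4, 5, 5]))  # no overlap -> 0.0
print(iou_sketch([0, 0, 2, 2], [1, 1, 3, 3]))  # partial overlap: 1 / (4 + 4 - 1)
```

In the last case each box has area 4 and they share a 1x1 square, so the union is 4 + 4 - 1 = 7 and the IoU is 1/7, roughly 0.14.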

