My pet project is about food recognition. More info here.
Scrapy produced a folder of images and a .csv file with rows like this:
Apple Cake,"Some apple cake description...",https://www.some-recipes-website.ru/binfiles/images/20200109/m12b509e.jpg,"[{'url': 'https://www.some-recipes-website.ru/binfiles/images/20200109/m12b509e.jpg', 'path': 'full/ae00a78059ad08506aa4767ed925bef5dccabf63.jpg', 'checksum': '55088c744a564af5ed8d4e5ea6478d20', 'status': 'downloaded'}]"
Now, I needed to create .csv files where each row holds a file name, bounding box coordinates in pixels, and a class label, like this:
pic-037.jpg,80,20,500,120,risotto
pic-025.jpg,520,250,1152,953,risotto
pic-with-nothing.jpg,,,,,
pic-004.jpg,0,0,1600,1113,beans
...
To do that, I googled something like "best machine learning label tools 2022" and found Label Studio. I followed these steps from their docs:
python3 -m venv env
source env/bin/activate
python -m pip install label-studio
But I couldn't launch Label Studio until I had done some of these things:
pip install wheel
pip install spacy
pip install cymem
brew install postgresql
(link1, link2 might be helpful)
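Once those were in place, Label Studio launched with the command from its docs:
label-studio start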
I didn't have any problems importing my data into Label Studio. The only setup I had to do was adding my labeling interface config:
<View>
  <Image name="image" value="$image"/>
  <Choices name="choice" toName="image" showInLine="true">
    <Choice value="Salad" background="blue"/>
    <Choice value="Soup" background="green"/>
    <Choice value="Pastry" background="orange"/>
    <Choice value="Nothing" background="orange"/>
  </Choices>
  <RectangleLabels name="label" toName="image">
    <Label value="Salad" background="green"/>
    <Label value="Soup" background="blue"/>
    <Label value="Pastry" background="orange"/>
    <Label value="Nothing" background="black"/>
  </RectangleLabels>
</View>
After that, the labeling interface looked like this:
I added a "Nothing" label for confusing images that I decided to exclude from the dataset:
After I finished labeling a portion of the images, I exported them in .csv format and got a file with rows like this:
/data/upload/1/83d8ce57-7478f053119ca2a85c4932870ef3e1833eb3eeb5.jpg,14,Pastry,"[{""x"": 13.157894736842104, ""y"": 6.5625, ""width"": 41.578947368421055, ""height"": 74.375, ""rotation"": 0, ""rectanglelabels"": [""Pastry""], ""original_width"": 570, ""original_height"": 320}]",1,20,2022-06-06T07:11:56.377261Z,2022-06-10T21:02:35.758255Z,1209.599
I was surprised when I saw the x, y, width and height values. Then I read in the docs that "Image annotations exported in JSON format use percentages of overall image size, not pixels, to describe the size and location of the bounding boxes."
I wrote a small Python script to check the exported regions:
from PIL import Image

# One of the images Scrapy downloaded earlier
img = Image.open('../../images/full/7478f053119ca2a85c4932870ef3e1833eb3eeb5.jpg')

# Values copied from one exported region; x, y, width and height are percentages
x = 13.157894736842104
y = 6.5625
width = 41.578947368421055
height = 74.375
original_width = 570
original_height = 320

# Convert percentages of the original image size into pixels
pixel_x = x / 100.0 * original_width
pixel_y = y / 100.0 * original_height
pixel_width = width / 100.0 * original_width
pixel_height = height / 100.0 * original_height

# PIL's crop() takes a (left, upper, right, lower) box
left = pixel_x
upper = pixel_y
right = pixel_x + pixel_width
lower = pixel_y + pixel_height
box = (left, upper, right, lower)

region = img.crop(box)
region.show()
When I ran the script, it showed me the correct region cropped out of the original image:
So now I understand how to get pixel annotations if I need them.
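To turn the whole export into the per-image pixel format from the beginning of the post, something like the sketch below should work. It is only a draft under a few assumptions: the file names project-1-export.csv and annotations.csv are placeholders, the export columns image and label are assumed to be named after the tags in my labeling config, and the uploaded file name seems to be the original name with a short prefix before the first dash (worth double-checking on your own export).

import csv
import json
import os

with open('project-1-export.csv', newline='', encoding='utf-8') as src, \
        open('annotations.csv', 'w', newline='', encoding='utf-8') as dst:
    writer = csv.writer(dst)
    for row in csv.DictReader(src):
        # Strip the prefix Label Studio adds to uploaded files to get back
        # the name Scrapy saved (assumption based on my export)
        filename = os.path.basename(row['image']).split('-', 1)[-1]
        regions = json.loads(row['label']) if row['label'] else []
        if not regions:
            # Images with nothing on them keep a row with empty coordinates
            writer.writerow([filename, '', '', '', '', ''])
            continue
        for r in regions:
            # Same percent-to-pixel conversion as in the script above
            x1 = round(r['x'] / 100.0 * r['original_width'])
            y1 = round(r['y'] / 100.0 * r['original_height'])
            x2 = round((r['x'] + r['width']) / 100.0 * r['original_width'])
            y2 = round((r['y'] + r['height']) / 100.0 * r['original_height'])
            writer.writerow([filename, x1, y1, x2, y2, r['rectanglelabels'][0]])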
The next post is going to be about trying to feed the dataset to RetinaNet. I am going to use only the 48 images scraped so far, just to see what input format it really needs.