Hatem Elseidy

Posted on Oct 8, 2023

Part 4 - Page Processor

Introduction

In this part of the series, we will discuss the Page Processor (link). Below are two examples from a story called "The Dark Forest": the raw input of the page processor on the left and the expected output on the right. As you can see, the text is placed on the right on one page and on the left on the next. This is a simple alternating logic that we will discuss here as well.

Text Image

Let's start with writing text to an image. Here's a simple method that draws white text on a black background image with the default font.

from PIL import Image, ImageDraw

def text_image():
    # input parameters
    text_message = "This is the fantastic story line"
    width, height = 256, 64    # image size
    bg_color = (0, 0, 0)    # black color
    font_color = (255, 255, 255)    # white color (each channel ranges from 0 to 255)

    # create an empty image with width, height and background color
    image = Image.new('RGB', (width, height), bg_color)

    # draw text on image
    draw = ImageDraw.Draw(image)
    draw.text((10, 30), text_message, fill=font_color)

    return image

This would result in the following

Now let's improve it by using a custom font. I used a font called Playfulist and you can find it here. We need to define a font size, use it to create a font object, and pass the font object to draw.text.

...
font_size = 12
font = ImageFont.truetype("Playfulist.ttf", font_size)
...
draw.text(
    (10, 30),
    text_message,
    font=font,
    fill=font_color,
)

Now, instead of hard-coding (10, 30) as the text location, which is the top left corner of the text, let's put it in the center of the image. To do so, we first need to calculate the size of the text in pixels. Hence, we will use ImageDraw.textbbox, which returns the bounding box (in pixels) of the given text relative to the given anchor when rendered in the font with the provided direction, features, and language (reference). The box is returned as a (left, top, right, bottom) tuple.

We can call draw.textbbox and then draw.rectangle to visualize it.

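# textbbox returns (left, top, right, bottom), so w and h below are the
# right and bottom edges; they equal the width and height only when x = y = 0
x, y, w, h = draw.textbbox((0, 0), text_message, font=font)
draw.rectangle((x, y, w, h), outline=(0, 255, 0), width=1)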

To shift that box to the center of the image we need a little bit of geometry. We know the center point of the image lies at x = width/2 and y = height/2. Let's highlight it with blue color.

Now, we want the green box to be centered exactly around the blue dot. But when we render text, we only specify its top left corner. So, given the overall size of the text box, we need to calculate the top left corner with respect to the center point. How do we do that? We start at the center point, move left by half of the width of the text box, and move up by half of its height.

# center point of the image
center_x = width / 2
center_y = height / 2

# from the bounding box results from 'draw.textbbox'
bbox_width = w - x
bbox_height = h - y

# start at the center, move left by half the box width and up by half the box height
top_left_x = center_x - (bbox_width / 2)
top_left_y = center_y - (bbox_height / 2)

"""
But we also know that we started from 0, 0 when
calculating the bounding box. We can just ignore x, y.
Although, when we look at the actual values in the above example, we got x = 0 and y = 3. That's probably
because PIL adds some margin for specific fonts.
But we can ignore the tiny little margins to simplify
the calculation.
"""

# assuming x = 0 and y = 0
bbox_width = w
bbox_height = h

# Hence
top_left_x = (width / 2) - (w / 2)
top_left_y = (height / 2) - (h / 2)

# And we can simplify it more to 
top_left_x = (width - w) / 2
top_left_y = (height - h) / 2

Let's draw a rectangle with these calculations first by using:

draw.rectangle(((width - w) / 2, (height - h) / 2, (width - w) / 2 + w, (height - h) / 2 + h), outline=(255, 0, 0), width=1)

Then, let's add back the text. And as you can see below, it fits the expected rectangle exactly.

draw.text(
    ((width - w) / 2, (height - h) / 2),
    text_message,
    font=font,
    fill=font_color,
)

Removing the guide lines, we get the following nicely centered text.
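The simplification above ignores the small offset that textbbox reported (x = 0 and y = 3 in the example). As a minimal sketch, assuming we want the visible text box itself to be centered, the helper below subtracts that offset from the drawing position. The name centered_text_origin is hypothetical and not part of the project. It is plain Python and only needs the tuple returned by draw.textbbox((0, 0), ...); the right and bottom values in the example are made up for illustration.

```python
from typing import Tuple


def centered_text_origin(
    image_size: Tuple[int, int],
    bbox: Tuple[float, float, float, float],
) -> Tuple[float, float]:
    """Return the xy to pass to draw.text so the text's bounding box is centered.

    `bbox` is (left, top, right, bottom) as measured at origin (0, 0).
    """
    width, height = image_size
    left, top, right, bottom = bbox
    text_width = right - left
    text_height = bottom - top
    # position where the top left corner of the box should land
    box_x = (width - text_width) / 2
    box_y = (height - text_height) / 2
    # the box sits at (left, top) relative to the drawing origin, so shift back
    return (box_x - left, box_y - top)


# offset as seen in the article (x = 0, y = 3); right and bottom are made-up values
origin = centered_text_origin((256, 64), (0, 3, 100, 15))
print(origin)  # (78.0, 23.0)
```

With these numbers, drawing at the returned origin puts the top of the box at y = 26 and the bottom at y = 38, which leaves 26 pixels of margin on both sides of a 64 pixel tall image.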

Justify Text

To break the story sentences into multiple lines so they look better on the page, I chose a library called JustifyText. All we need to do is give it the original sentence and the line width in characters, and it will break it down into shorter lines and return them as a list of strings.

text_message = "Once upon a time, there was a small village nestled deep in the heart of the Dark Forest."
results = justify(text_message, 20)
for r in results:
    print(r)

Running this results in the following lines. All of them except the last are padded with extra spaces to the same length.

Once  upon  a  time,
there  was  a  small
village nestled deep
in the heart of  the
Dark Forest.

And with the exact same image calculations we did before, we get this.
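To make the line breaking less of a black box, here is a minimal sketch of how such justification can work. It is not the JustifyText implementation; justify_sketch is a hypothetical name, and the rule of giving leftover spaces to the rightmost gaps is an assumption chosen because it reproduces the sample output above. The library may distribute spaces differently in other cases.

```python
from typing import List


def justify_sketch(text: str, width: int) -> List[str]:
    """Break `text` into lines of at most `width` characters and pad them with spaces."""
    # 1) greedy line breaking: keep adding words while they fit with single spaces
    lines: List[List[str]] = []
    current: List[str] = []
    letters = 0  # characters in `current`, not counting spaces
    for word in text.split():
        if current and letters + len(current) + len(word) > width:
            lines.append(current)
            current, letters = [], 0
        current.append(word)
        letters += len(word)
    if current:
        lines.append(current)

    # 2) padding: spread the missing spaces over the gaps between words
    result: List[str] = []
    for index, words in enumerate(lines):
        gaps = len(words) - 1
        if index == len(lines) - 1 or gaps == 0:
            # the last line and single-word lines are left as they are
            result.append(" ".join(words))
            continue
        spaces = width - sum(len(w) for w in words)
        base, extra = divmod(spaces, gaps)
        line = ""
        for gap, word in enumerate(words[:-1]):
            # assumption: the rightmost `extra` gaps get one additional space
            pad = base + (1 if gap >= gaps - extra else 0)
            line += word + " " * pad
        result.append(line + words[-1])
    return result


story = "Once upon a time, there was a small village nestled deep in the heart of the Dark Forest."
for line in justify_sketch(story, 20):
    print(line)
```

A word longer than the requested width ends up alone on its own line and is not split, so callers that need a hard limit would have to handle that case separately.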

Background Color

We are almost done with the text part of the image. As a last step, what if we set the background color of the text part to something relevant to the image part of the page? To do this, we can calculate the dominant color of the image, lighten it so the background always stays light (that's a personal preference), and use the result as the background color of the text part.

We could find a dominant color algorithm on the internet and try to implement it, but like lots of other things in this world, there's already a library that can do this for us. It's called fast-colorthief. We just need to convert the image to a NumPy array before calling get_dominant_color.

import numpy
from fast_colorthief import get_dominant_color

rgba_image = story_page_content.image.convert("RGBA")
ndarray = numpy.array(rgba_image).astype(numpy.uint8)
dominant_color = get_dominant_color(ndarray, quality=1)
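For intuition only, here is a much simpler way to think about a "dominant" color: group pixels into coarse RGB buckets and average the most populated one. This is a swapped-in toy technique, not the algorithm fast-colorthief uses, and dominant_color_sketch and bucket_size are hypothetical names. It takes any iterable of (r, g, b) tuples.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]


def dominant_color_sketch(pixels: Iterable[RGB], bucket_size: int = 32) -> RGB:
    """Return the average color of the most populated coarse RGB bucket."""
    counts: Counter = Counter()
    sums: Dict[RGB, RGB] = {}
    for r, g, b in pixels:
        # pixels whose channels fall into the same ranges share a bucket
        key = (r // bucket_size, g // bucket_size, b // bucket_size)
        counts[key] += 1
        sr, sg, sb = sums.get(key, (0, 0, 0))
        sums[key] = (sr + r, sg + g, sb + b)
    if not counts:
        raise ValueError("no pixels given")
    key, n = counts.most_common(1)[0]
    sr, sg, sb = sums[key]
    return (sr // n, sg // n, sb // n)


# six dark bluish pixels outnumber four light gray ones
sample = [(10, 20, 30)] * 3 + [(12, 22, 28)] * 3 + [(200, 200, 200)] * 4
print(dominant_color_sketch(sample))  # (11, 21, 29)
```

A real library is still the better choice for full-size images, since looping over every pixel in pure Python is slow and fixed buckets can split visually similar colors.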

To lighten the color we can use the following method. The idea is to move each of the 3 RGB values towards white by a factor called _BACKGROUND_TINT_FACTOR. For example, take a single channel value of 150 on the 0 to 255 scale. To make it lighter, we need to move it towards 255, so we need to add something to it. How much? The method below takes the difference between 255 and the current value (255 - 150 = 105) and multiplies it by the tint factor, e.g. 0.7, which gives 73.5. Adding that to 150 and truncating to an integer gives 223, so the value moved 70% of the way towards 255. The extreme cases make this clearer: a tint factor of 1.0 gives 255 for any value, which is white (100% lightened), while a factor of 0.0 returns the original value (0% lightened).

def _lighten_color(self, color: Tuple[int, int, int]) -> Tuple[int, int, int]:
    return (
        int(color[0] + (255 - color[0]) * self._BACKGROUND_TINT_FACTOR),
        int(color[1] + (255 - color[1]) * self._BACKGROUND_TINT_FACTOR),
        int(color[2] + (255 - color[2]) * self._BACKGROUND_TINT_FACTOR),
    )
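The arithmetic is easy to check with a standalone version of the same formula. In this sketch the tint factor is passed as a parameter instead of being read from the class; the function name lighten and the parameter tint_factor are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def lighten(color: RGB, tint_factor: float) -> RGB:
    """Move each channel towards 255 by `tint_factor` (0.0 = unchanged, 1.0 = white)."""
    r, g, b = color
    return (
        int(r + (255 - r) * tint_factor),
        int(g + (255 - g) * tint_factor),
        int(b + (255 - b) * tint_factor),
    )


print(lighten((150, 150, 150), 0.7))  # (223, 223, 223)
print(lighten((150, 150, 150), 1.0))  # (255, 255, 255)
print(lighten((150, 150, 150), 0.0))  # (150, 150, 150)
```

Because the difference to 255 shrinks as a channel gets brighter, already bright channels barely change while dark ones move a lot, which is why the result always looks like a pastel version of the input.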

The image below shows the generated image on the left, the calculated dominant color in the middle, and the lightened version on the right. Now we can use that as the background color for the text part so it blends better with the generated image on the same page.

Going back to the Text Image section, we only need to change two input parameters: the background color becomes the lightened version of the dominant color, and the font color becomes black (or any other color). Here's an example:

Note: to make the background color of the text image even cooler, I added a gradient effect that you can find here. I'll skip it in this tutorial but you can take a look if you are interested.

Put them together

Now that we have the generated image and a nice looking text part, let's put them together, one on the left side and one on the right side. To do this, the 2 images must be of the same height. We can create an empty image of the expected size and paste each part at the correct location.

result = Image.new(
    "RGB", (image_left.width + image_right.width, image_left.height)
)
result.paste(image_left, (0, 0))
result.paste(image_right, (image_left.width, 0))
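The snippet above silently assumes equal heights. As a small sketch, the geometry can be separated into a helper that validates the heights and returns the canvas size and both paste offsets. The name side_by_side_layout is hypothetical, and it works on plain (width, height) tuples such as the size attribute of a PIL image.

```python
from typing import Tuple

Size = Tuple[int, int]      # (width, height)
Offset = Tuple[int, int]    # (x, y) of a part's top left corner


def side_by_side_layout(left: Size, right: Size) -> Tuple[Size, Offset, Offset]:
    """Return the canvas size and the paste offsets for two parts placed side by side."""
    left_width, left_height = left
    right_width, right_height = right
    if left_height != right_height:
        raise ValueError(f"heights differ: {left_height} != {right_height}")
    canvas = (left_width + right_width, left_height)
    # the left part starts at the origin, the right part starts where the left one ends
    return canvas, (0, 0), (left_width, 0)


canvas, left_offset, right_offset = side_by_side_layout((512, 512), (256, 512))
print(canvas, left_offset, right_offset)  # (768, 512) (0, 0) (512, 0)
```

The returned canvas size can be passed to Image.new and the two offsets to the paste calls shown above.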

And to make sure that it looks and feels like a book, we can alternate between text on left and text on right every other page.

if int(story_page_content.page_number) % 2 == 0:
    page_image: Image.Image = self._concat_horizontally(
        story_page_content.image, text_img
    )
else:
    page_image = self._concat_horizontally(text_img, story_page_content.image)
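The same alternating rule can be written as a tiny standalone function, which makes it easy to see what happens over several pages. This is only an illustration; order_parts is a hypothetical name and the strings stand in for the actual images.

```python
from typing import Tuple, TypeVar

T = TypeVar("T")


def order_parts(page_number: int, image_part: T, text_part: T) -> Tuple[T, T]:
    """Return (left, right): image first on even pages, text first on odd pages."""
    if page_number % 2 == 0:
        return image_part, text_part
    return text_part, image_part


for page in range(4):
    print(page, order_parts(page, "image", "text"))
# 0 ('image', 'text')
# 1 ('text', 'image')
# 2 ('image', 'text')
# 3 ('text', 'image')
```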

Paper Wrinkling Effect

This is a nice, simple technique that you can also use to add other effects to the whole page. It blends an existing paper image (here) with the full page image, weighted by a blend factor, using Image.blend from PIL.

def _add_paper_effect(self, page_image: Image.Image) -> Image.Image:
    # load the paper image and match the page's mode and size, as Image.blend requires
    paper: Image.Image = Image.open(self._PAPER_IMAGE_PATH).convert(page_image.mode)
    paper = paper.resize(page_image.size)
    return Image.blend(page_image, paper, self._PAPER_BLEND_FACTOR)
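Image.blend interpolates linearly between the two images: each output value is image1 * (1.0 - alpha) + image2 * alpha. The sketch below applies that formula to a single pixel to show what the blend factor does. blend_pixel is a hypothetical helper, and its use of round is an assumption, so results may differ from Pillow by one unit in some cases.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def blend_pixel(page: RGB, paper: RGB, alpha: float) -> RGB:
    """Per-channel linear interpolation: page * (1 - alpha) + paper * alpha."""
    r, g, b = (round(p * (1.0 - alpha) + q * alpha) for p, q in zip(page, paper))
    return (r, g, b)


page_px = (200, 100, 0)
paper_px = (100, 200, 250)
print(blend_pixel(page_px, paper_px, 0.0))  # (200, 100, 0)   only the page
print(blend_pixel(page_px, paper_px, 0.5))  # (150, 150, 125) halfway
print(blend_pixel(page_px, paper_px, 1.0))  # (100, 200, 250) only the paper
```

A small factor therefore keeps the page readable while letting just a hint of the paper texture show through.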

Bring it all together

To bring it all together we need to do the following steps:

Calculate dominant/lightened background color.
Create text image
Concatenate image and text parts
Add paper wrinkling effect
Save and return

def create_page(
    self,
    workdir: str,
    story_page_content: StoryPageContent,
    audio: AudioInfo,
    story_size: StorySize,
) -> StoryPage:

    # calculate dominant/lightened background color
    background_color = self._calculate_background_color(story_page_content)

    # create text part
    text_img: Image.Image = self._create_text_image(
        size=(story_size.text_part_width, story_size.text_part_height),
        bg_color=background_color,
        message=story_page_content.sentence,
        font=ImageFont.truetype(self._FONT, story_size.font_size),
        font_color=self._BLACK_COLOR,
    )

    # alternate between text on left and text on right
    if int(story_page_content.page_number) % 2 == 0:
        page_image: Image.Image = self._concat_horizontally(
            story_page_content.image, text_img
        )
    else:
        page_image = self._concat_horizontally(text_img, story_page_content.image)

    # add paper effect
    page_image = self._add_paper_effect(page_image)
    page_filepath = os.path.join(
        workdir, f"page_{story_page_content.page_number}.png"
    )

    # save and return
    page_image.save(page_filepath)
    return StoryPage(
        page_content=story_page_content,
        page_image=page_image,
        page_filepath=page_filepath,
        audio=audio,
    )

In this section, we did lots of image processing: we played with colors and fonts, and we did a little bit of geometry to place the text correctly in the center. We can keep adding effects to how the page looks and feels, and there are lots of parameters to play with, including font size and color, the tint factor for the background color, and the blend factor for the paper effect.

Now that we have the nice looking pages built in this section and our AI narrator ready from the previous section, let's compile the final video in the next part.