Yoshi

Posted on Feb 18, 2025

What I Learned from Tracking My Child’s YouTube Activity

#gas #tutorial

In modern parenting, it’s difficult to completely block YouTube. When I was a child, I had limited options for video content, either watching TV in real-time or viewing pre-recorded videos on VHS or DVD. Today's children, however, are exposed to endless content from platforms like YouTube from a young age. While this has its benefits, such as discovering new genres or learning niche knowledge that even adults may not be aware of, there are times when the sheer addictiveness of YouTube is surprising. Kids ask to watch it during spare moments, and even when we're out and about.

Until children reach around elementary school age, parents can usually manage their YouTube usage, but once they grow older, they start searching for new videos and exploring recommendations on their own. In our household, we noticed that our child was gradually watching only game YouTubers, and we began to think about what to do next. Fortunately, YouTube allows you to extract tracking data, so I decided to create a tool that would let us monitor their activity by processing that data.

Concept

The YouTube history export provides three main fields: title, uploader, and the timestamp at which viewing started. In my professional work I've built numerous dashboards and reporting systems, and I’ve found that, in most cases, usage of such systems tends to decline over time. When building a reporting system, it's easy to get caught up in visualizations and in enriching the data with additional details. For this particular purpose, however, I kept "simple is best" in mind and narrowed the necessary information down to two perspectives:

Viewing Time Perspective
1. Total viewing time
2. Daily viewing time
Content Perspective
1. Viewing time by channel
2. Extraction of undesirable content

Viewing Time

Since the only timing data available was the start timestamp, I estimated the viewing time of each video as the difference between its start time and the start time of the next video. I also set a rule to treat any gap of 90 minutes or more as the end of a session. For the last video of a session, or when only one video was watched in a day, there is no difference to calculate, so I filled these gaps with the session median or a pre-set approximate value.
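The 90-minute rule can be sketched roughly as follows. This is a minimal illustration rather than the code from the project: `splitIntoSessions` and `SESSION_GAP_MS` are hypothetical names, and the input is assumed to be an array of start times in milliseconds.

```javascript
// Hypothetical helper: groups video start times into viewing sessions.
// Assumption: a gap of 90 minutes or more between two starts ends a session.
const SESSION_GAP_MS = 90 * 60 * 1000;

function splitIntoSessions(startTimesMs) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...startTimesMs].sort((a, b) => a - b);
  const sessions = [];
  for (const t of sorted) {
    const current = sessions[sessions.length - 1];
    if (current && t - current[current.length - 1] < SESSION_GAP_MS) {
      current.push(t); // still within the same session
    } else {
      sessions.push([t]); // first video, or the gap was 90 minutes or more
    }
  }
  return sessions;
}

// Example: three videos in the morning, one in the afternoon.
const min = 60 * 1000;
const exampleSessions = splitIntoSessions([0, 10 * min, 25 * min, 300 * min]);
console.log(exampleSessions.length); // 2
```

Keeping the sessions as separate arrays makes the later gap filling a per-session operation.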

Content

Once the viewing time is calculated, viewing time by channel is simply aggregated. For extracting undesirable content, I used the OpenAI API to assess whether a title could potentially harm children born in "yyyy/mm." This judgment is made on a 4-level scale, and titles rated as "Potentially Harmful (Level 3)" or "Harmful (Level 4)" are reported.
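The per-channel aggregation might look like the sketch below. `topChannels` and the `{ channel, minutes }` record shape are assumptions made for illustration, not names taken from the project.

```javascript
// Hypothetical helper: sums estimated viewing minutes per channel
// and returns the top `limit` channels, longest first.
function topChannels(views, limit = 10) {
  const totals = new Map();
  for (const { channel, minutes } of views) {
    totals.set(channel, (totals.get(channel) || 0) + minutes);
  }
  return [...totals.entries()]
    .map(([channel, minutes]) => ({ channel, minutes }))
    .sort((a, b) => b.minutes - a.minutes)
    .slice(0, limit);
}

// Example with fictitious channel names.
const ranking = topChannels([
  { channel: "Craft Life", minutes: 30 },
  { channel: "Sports NOW", minutes: 24 },
  { channel: "Craft Life", minutes: 12 },
]);
console.log(ranking[0]); // { channel: 'Craft Life', minutes: 42 }
```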

Implementation

For this project, I used GAS (Google Apps Script), a simple automation tool from Google. While I considered integrating dashboards and linking with messenger apps, the primary goal was to monitor numbers and content. Therefore, I kept it simple and mostly completed the task within Google's ecosystem.

Delivery Content

Using GAS, I set up a simple system to send reports to a specified email address with text and graphs attached. When the command is executed, the report content is sent via email to both my wife and me, as shown in the example below (the title/channel names have been changed to fictitious ones).

Here is your latest YouTube watch time report.
Period: 2025-01-24 (Thu) - 2025-02-13 (Wed)

++++++++++++++++++++++++++++++
Total Watch Time: 11h 41m
++++++++++++++++++++++++++++++

■■■ Overview:
YouTube Watch History Analysis:

Safe (Level 1): 131 videos
Mildly Inappropriate (Level 2): 3 videos
Potentially Harmful (Level 3): 1 videos
Harmful (Level 4): 0 videos

Collaboration Project: "The Hide-and-Seek Game where You are Attacked by a Maniacal Clown at a Private Hotel is Insane... [Screaming]"

■■■ Top 10 Most-Watched Channels:

Craft Life: 6h 30m
Craft World: 0h 58m
Game Master XYZ: 0h 51m
Akane-Iro: 0h 39m
MEN Play: 0h 32m
Sports NOW: 0h 24m
Creators TV: 0h 18m
Odin Studio: 0h 16m
Hikaru's Play: 0h 8m
Football Highlights: 0h 7m

■■■ Daily Watch Time:
Date: TotalWatchTime (StartTime)

2025-01-24 (Thu): 0h 40m (11:27)
2025-01-25 (Fri): ---- (--:--)
2025-01-26 (Sat): 0h 57m (08:49)
2025-01-27 (Sun): ---- (--:--)
2025-01-28 (Mon): ---- (--:--)
2025-01-29 (Tue): 0h 10m (17:04)
2025-01-30 (Wed): ---- (--:--)
2025-01-31 (Thu): ---- (--:--)
2025-02-01 (Fri): 2h 11m (15:52)
2025-02-02 (Sat): 0h 45m (15:30)
2025-02-03 (Sun): 1h 23m (15:59)
2025-02-04 (Mon): ---- (--:--)
2025-02-05 (Tue): 0h 21m (17:00)
2025-02-06 (Wed): ---- (--:--)
2025-02-07 (Thu): 1h 9m (14:55)
2025-02-08 (Fri): ---- (--:--)
2025-02-09 (Sat): 2h 26m (06:53)
2025-02-10 (Sun): ---- (--:--)
2025-02-11 (Mon): ---- (--:--)
2025-02-12 (Tue): ---- (--:--)
2025-02-13 (Wed): 1h 39m (15:30)

====================
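The daily lines in a report like the one above could be produced with a small formatter. The sketch below is only an assumption about how this might be done; `formatWatchTime` and `formatDailyLine` are hypothetical names.

```javascript
// Hypothetical formatter: converts minutes into the "Xh Ym" style used in the report.
function formatWatchTime(totalMinutes) {
  const total = Math.round(totalMinutes); // round first so 59.6 becomes 1h 0m, not 0h 60m
  return `${Math.floor(total / 60)}h ${total % 60}m`;
}

// Builds one "Daily Watch Time" line; days without any views get placeholders.
function formatDailyLine(dateLabel, totalMinutes, startTime) {
  if (!totalMinutes) return `${dateLabel}: ---- (--:--)`;
  return `${dateLabel}: ${formatWatchTime(totalMinutes)} (${startTime})`;
}

console.log(formatDailyLine("2025-02-01", 131, "15:52")); // 2025-02-01: 2h 11m (15:52)
console.log(formatDailyLine("2025-02-04", 0, null)); // 2025-02-04: ---- (--:--)
```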

Pros and Cons

Points of Success

The children's viewing time has been visualized with a reasonable level of quality.
- It allows discussions with the child based on data, removing subjectivity.
- We confirmed that the child had been waking up early to watch YouTube.
- We gained insight into the child’s main viewing channels.
- I knew the child was watching craft-style game videos, but it turned out that game videos made up most of the viewing.
- Surprisingly, soccer, another hobby, wasn’t being watched much.
We can check for titles that might have a harmful impact.
- Since it also affects suggestions, I remove any harmful titles from the history once checked.

Points for Improvement

Not fully automated.
- Google Takeout exports can only be scheduled once every two months, up to 6 times.
- For monthly reports, the schedule has to be created twice, in two consecutive months, so that the exports alternate.
Restrictions with the child’s account:
- GAS can't be used to automate anything on the child’s account.
- Emails can't be forwarded automatically from the child's account either.
Limited information available.
- The report is built only from title, uploader, and start time, which is all the export provides.
- Richer reports might be possible if more information could be fetched via the YouTube API.

After creating the report, I discussed it with my child and was surprised by how much more they were watching than expected. I also realized that as parents, we allowed continuous YouTube watching due to our own busyness. I plan to prepare better alternatives for the child to spend time outside of videos.

Technical Details

Since this was my first time using GAS, I approached it with the principle of simplicity and prioritizing Google services. Below are the services used and their respective purposes:

Google Takeout: Retrieve YouTube data and set up periodic execution.
Documents: Transfer viewing data between parent and child, and store various data.
Spreadsheets: Expand JSON data and create graphs.
Gmail: Send report emails.
GAS: Integrate and execute services, as well as perform numerical calculations for the report.
OpenAI: Assess the potential danger of video titles.

The data flow proceeds as follows:

Points to Consider During Implementation

The code is available in this Git repository.

Data Retrieval

Since YouTube is viewed on a child-specific account, the viewing history data is retrieved from the child's account. By accessing Google’s Takeout service through the child's account, YouTube viewing data can be exported. The export can be delivered either as a download link sent by email or as a file saved to one of several storage services. For this implementation, I chose to store it in Google Documents.
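Once the export is available, each history entry has to be reduced to the three fields used in the report. The sketch below assumes the JSON export format, in which each entry carries a `title` prefixed with "Watched ", a `subtitles` array holding the uploader, and an ISO 8601 `time`. That shape is an assumption on my part and may differ by locale or export settings; `parseWatchHistory` is a hypothetical name.

```javascript
// Assumed shape of one entry in the exported watch history JSON:
// { title: "Watched <video title>", subtitles: [{ name: "<channel>" }], time: "<ISO 8601>" }
function parseWatchHistory(jsonText) {
  return JSON.parse(jsonText)
    .filter((e) => e.title && e.time)
    .map((e) => ({
      // The "Watched " prefix is locale dependent; it is stripped only when present.
      title: e.title.replace(/^Watched /, ""),
      // Removed or private videos may carry no uploader information.
      channel: e.subtitles && e.subtitles[0] ? e.subtitles[0].name : "(unknown)",
      startedAt: new Date(e.time),
    }))
    .sort((a, b) => a.startedAt - b.startedAt); // oldest first
}

// Example with a fictitious entry.
const sample = JSON.stringify([
  {
    title: "Watched Building a Castle",
    subtitles: [{ name: "Craft Life" }],
    time: "2025-02-01T06:52:00.000Z",
  },
]);
console.log(parseWatchHistory(sample)[0].channel); // Craft Life
```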

Unfortunately, there are two obstacles preventing the full automation of the report at this stage, both of which are present in this step:

Scheduled Distribution Frequency: Exports can only be scheduled once every two months, for a maximum of one year. One year is acceptable, but a two-month interval leaves gaps that are too large, so if you want monthly exports you need to create the schedule twice, in two consecutive months.
Child Account Restrictions: With a child’s account, automatic email forwarding and the use of GAS are restricted. As a result, ideas like automatically forwarding emails to the parent’s account or automatically placing data in the shared folder for parent-child use in Google Docs couldn’t be used, and the data has to be stored manually instead. This may be solvable in the future, since external storage such as Box is available as a destination for the data.

Extracting ZIP Files

The data delivered by Takeout was in a multi-layered ZIP format, which couldn't be extracted using GAS’s standard utilities. Fortunately, there is a ZIP library that can be used in GAS. Referring to this Japanese article, I was able to set it up as a library and successfully extract the ZIP files.

Estimating and Filling Missing Viewing Data

Viewing time is estimated as the difference between one video's start time and the next. However, this raised the following issues. Below, I use the term session to refer to a continuous viewing period.

The gap between sessions, ranging from a few hours to dozens of hours, would be counted as viewing time.
The last video in the data has no following timestamp, so no difference can be calculated for it.

To address these issues, I took the following measures:

For the final data within a session, I used the median value of the session to fill the gap.
If there was only one data point within a session, I filled the gap with a provisional value of 20 minutes.

By doing this, although the numbers are estimates, I was able to calculate values that are reasonably close to the actual ones.
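The two filling rules can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: `estimateDurations`, `median`, and `DEFAULT_MINUTES` are hypothetical names, and a session is assumed to be an ascending array of start times expressed in minutes.

```javascript
const DEFAULT_MINUTES = 20; // provisional value for a session with a single video

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Takes one session (ascending start times in minutes) and returns
// an estimated duration in minutes for each video in it.
function estimateDurations(sessionStarts) {
  if (sessionStarts.length === 0) return [];
  if (sessionStarts.length === 1) return [DEFAULT_MINUTES];
  const gaps = [];
  for (let i = 1; i < sessionStarts.length; i++) {
    gaps.push(sessionStarts[i] - sessionStarts[i - 1]);
  }
  // The last video has no following start time, so it gets the session median.
  return [...gaps, median(gaps)];
}

console.log(estimateDurations([0, 10, 25, 45])); // [ 10, 15, 20, 15 ]
console.log(estimateDurations([600])); // [ 20 ]
```

The median is less sensitive than the mean to one unusually long gap inside a session, which is presumably why it suits this kind of fill.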

Harmful Title Evaluation

Although Gemini would be the ideal choice within the Google ecosystem, since I haven't used it much, I decided to leverage OpenAI's API, which I am more familiar with. The threshold for determining harmful content seemed to vary depending on age, so I set the birthYearMonth in GAS's Script Properties to adjust the evaluation based on the actual age.

function analyzeWithOpenAI(titles, birthYearMonth) {
  var apiKey = PropertiesService.getScriptProperties().getProperty("OPENAI_API_KEY");

  if (!apiKey) {
    Logger.log("❌ API key is missing in Script Properties.");
    return { error: "API key is missing. Please set OPENAI_API_KEY in Script Properties." };
  }

  var url = "https://api.openai.com/v1/chat/completions";

  var riskCounts = { "1 (Safe)": 0, "2 (Mildly inappropriate)": 0, "3 (Potentially harmful)": 0, "4 (Harmful)": 0 };
  var harmfulTitles = [];
  var analyzedResults = [];

  titles.forEach(title => {
    var messages = [
      { role: "user", content: `Evaluate the following YouTube video title for harmful content for a child born in ${birthYearMonth}.
                                 Classify it into four levels:
                                 - 1 (Safe)
                                 - 2 (Mildly inappropriate)
                                 - 3 (Potentially harmful)
                                 - 4 (Harmful)

                                 Return only the numeric classification.

                                 Title: ${title}` }
    ];

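    var payload = {
      model: "gpt-4o",
      messages: messages,
      temperature: 0.3
    };

    // The listing stops at the payload; the rest is an assumed completion.
    var options = {
      method: "post",
      contentType: "application/json",
      headers: { Authorization: "Bearer " + apiKey },
      payload: JSON.stringify(payload),
      muteHttpExceptions: true
    };

    var response = UrlFetchApp.fetch(url, options);
    var json = JSON.parse(response.getContentText());
    var content = json.choices && json.choices[0] ? json.choices[0].message.content : "";

    // Take the first digit from 1 to 4 in the reply as the level.
    var match = content.match(/[1-4]/);
    var level = match ? Number(match[0]) : null;
    var key = Object.keys(riskCounts).find(k => k.charAt(0) === String(level));
    if (key) riskCounts[key]++;
    if (level >= 3) harmfulTitles.push(title);
    analyzedResults.push({ title: title, level: level });

    Utilities.sleep(500); // 500 ms pause between requests
  });

  return { riskCounts: riskCounts, harmfulTitles: harmfulTitles, analyzedResults: analyzedResults };
}
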

GAS Execution Time

When making requests to the OpenAI API, I process each title individually, inserting a 500 ms sleep between requests. Assuming the 6-minute (360-second) limit on a single GAS execution, the sleeps alone cap one run at 360 / 0.5 = 720 titles, and the real ceiling is lower once the API response time is included.

Since the volume was about 72 titles over two weeks, this was not an issue. However, it may break down if usage becomes heavier or if many more videos need checking, as happens with short videos. In the future, I plan to classify only the titles of videos with over 5 minutes of viewing time and to process multiple titles per request in batches.
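The planned filtering and batching could look roughly like this. It is a sketch of one possible design, not something implemented in the project: all function names, the numbered-list prompt format, and the `<number>: <level>` reply format are assumptions.

```javascript
// Keep only videos watched for more than minMinutes, and classify each title once.
function selectTitles(views, minMinutes = 5) {
  const titles = views.filter((v) => v.minutes > minMinutes).map((v) => v.title);
  return [...new Set(titles)];
}

// Split the titles into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one prompt covering a whole batch of titles.
function buildBatchPrompt(titles, birthYearMonth) {
  const numbered = titles.map((t, i) => `${i + 1}. ${t}`).join("\n");
  return (
    `Evaluate each YouTube video title below for harmful content for a child born in ${birthYearMonth}.\n` +
    "Classify each into four levels: 1 (Safe), 2 (Mildly inappropriate), 3 (Potentially harmful), 4 (Harmful).\n" +
    'Return one line per title in the form "<number>: <level>".\n\n' +
    numbered
  );
}

// Parse reply lines such as "2: 3" into levels aligned with the batch order.
// Titles the reply does not mention stay null so they can be retried.
function parseBatchReply(reply, batchSize) {
  const levels = new Array(batchSize).fill(null);
  for (const line of reply.split("\n")) {
    const m = line.match(/^\s*(\d+)\s*[:.]\s*([1-4])\b/);
    if (m && Number(m[1]) >= 1 && Number(m[1]) <= batchSize) {
      levels[Number(m[1]) - 1] = Number(m[2]);
    }
  }
  return levels;
}

console.log(parseBatchReply("1: 1\n2: 3", 2)); // [ 1, 3 ]
```

With batches of, say, 20 titles, the 72 titles mentioned above would need 4 requests instead of 72, which leaves far more headroom under the execution time limit.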

GAS Module System

I split the functions into one file per feature, but GAS has no module system: every file shares a single global scope, so the only clue to which file a function lives in is its name. This became a bit cumbersome as the number of files increased. It’s likely because GAS isn't designed for complex applications, but this was a small stumbling point.
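One common workaround, which this project does not necessarily use, is to expose a single namespace object per file so that every call site shows where a function comes from. The file name and function names below are hypothetical.

```javascript
// Hypothetical file "ReportFormat.gs": only the ReportFormat object becomes global.
var ReportFormat = (function () {
  // Private helper, not visible from other files.
  function pad(n) {
    return String(n).padStart(2, "0");
  }

  // Formats a start time such as 6:53 as "06:53".
  function clock(hours, minutes) {
    return pad(hours) + ":" + pad(minutes);
  }

  return { clock: clock };
})();

// In another file, the prefix makes the origin obvious.
console.log(ReportFormat.clock(6, 53)); // 06:53
```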

Future Plans

Although it’s manual for now, I plan to run the report system at least once a month. The report has become a starting point for my child and me to think about how we use YouTube. Ideally, the platform would provide such reports as part of its core features, but I’m not holding my breath for that. It's more of a "nice-to-have" feature.

DEV Community