Justin L Beall

Social Learning Journal - Density

In my previous post, Social Learning Journal - Classification, we built a classifier to see how the number of learning events was distributed across the year.

(Chart: 2020 classified learning event counts)

This is a good start, but not all events are created equal, which raises the question of how to evaluate the density of each event. As this project has progressed, I have landed on a weighted time value: Knowledge Consumption Velocity (kcv). It is a made-up unit that every learning medium can be converted into. In future posts, I will go into detail about each medium, but for now, let's quickly look at one example.

Book vs AudioBook Example

How could we compare reading a book to another event, like listening to an audiobook?

Book

(Image: journal entry for a finished book, annotated "^432p")

We have one constant: the number of pages in the book. In the image above, this is annotated as "^432p". I use the ^ as an indicator of density, and the unit p stands for "pages". So this event records a finished 432-page book.

Our other two variables are a bit more subjective: words per page and reading speed. Publishers recommend roughly 250-300 words per page, and we also need to assume an average reading speed, around 150 words per minute for a non-fiction technical book. With this information, we can do a little math to calculate a weighted time value (aka kcv).

given:
- 432 pages (p)
assumed:
- 250 words per page (wpp)
- 150 read words per minute (rwpm)

kcv = (p * wpp) / rwpm
kcv = (432 * 250) / 150
kcv = 720

With our above example, we land on a value of 720, an estimate of weighted time that can be compared against other events.
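To make this repeatable, here is a minimal Python sketch of the same formula. The book_kcv helper is a hypothetical name of my own; the constants simply mirror the words-per-page and reading-speed values assumed above.

# Hypothetical helper illustrating the book formula above.
WORDS_PER_PAGE = 250          # assumed publisher average
READ_WORDS_PER_MINUTE = 150   # assumed speed for non-fiction technical reading


def book_kcv(pages: int) -> float:
    """Estimate weighted time (kcv) for a finished book."""
    return (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE


print(book_kcv(432))  # 720.0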

Audiobook

(Image: journal entry for a finished audiobook, annotated "^595m")

Audiobooks are a little easier: the duration of the production is the constant here. This event is annotated as "^595m", where the unit m stands for minutes. Given this event, we have finished a 595-minute audiobook, and we take that duration directly as the kcv value.

Comparison

We can now compare the book and the audiobook on a shared variable. The calculation is not perfect, and tweaking the book variables slightly can produce vastly different results. In addition, some people listen to audiobooks at faster than 1x speed.

What if we compared the physical book Pragmatic Programmer 2nd Edition against its audiobook counterpart?

We know the audiobook has a value of 595. Let's quickly run the calculation for the book.

given:
- 352 pages (p)
kcv = (p * wpp) / rwpm
kcv = (352 * 250) / 150
kcv = 586.66

This is not a perfect comparison, but you can see we have a similar value for the book (586) and audiobook (595).
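Using the hypothetical book_kcv helper from the sketch above, the comparison (including the playback-speed caveat) looks like this; the 1.25x figure is only an illustration:

print(book_kcv(352))   # ~586.67 kcv for the 352-page print edition
print(595)             # 595 kcv for the audiobook's running time in minutes

# Listening faster than 1x shrinks the audiobook's effective kcv, e.g. at 1.25x:
print(595 / 1.25)      # 476.0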

Outcomes

The next step is to add weight to the classified learning events. I am going to assume that the start and finish of an event denote a single value and ignore distributing kcv over time (a future post). In addition, a Tweet counts as a static value of 2. I have played with the idea of scraping referenced URLs and deriving a value from them, but that is beyond the scope of today.

Extracting Density

Instead of just adding one to each classified event, we are going to slightly change our data model and create a new extractor that will parse the kcv value for each event.

# tweet_event_model
from typing import List


class TweetEventModel:
    """Value object holding the parts of a tweet we classify and weigh."""

    def __init__(self, hashtags: List[str], text: str):
        self.hashtags = hashtags
        self.text = text
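As a quick usage sketch (the hashtags and text below are made up for illustration):

event = TweetEventModel(hashtags=["agile", "book"], text="Finished another technical book ^432p")
print(event.hashtags)  # ['agile', 'book']
print(event.text)      # 'Finished another technical book ^432p'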

I pushed the classification code into its own extractor class - classification_extractor.py. Then based upon the file classify-tweets-for-this-year.py, I created a new script:

# tweet-density-for-this-year.py
import json
from functools import reduce

# ClassificationExtractor, TimeExtractor, TweetEventModel, the constants
# DATA_SEED_TWITTER_PATH and current_year, and the seed classifications dict
# are imported from elsewhere in the project.


def reduce_classifications(result: dict, tweet_event: TweetEventModel) -> dict:
    classification_extractor = ClassificationExtractor(tweet_event)
    classification = classification_extractor.classify()

    result[classification] = result[classification] + 1

    return result


def hashtags_from_tweet(tweet: dict):
    return [hashtag_entity['text'].lower()
            for hashtag_entity in tweet['tweet']['entities']['hashtags']]


def full_text_from_tweet(tweet: dict):
    return tweet['tweet']['full_text']


if __name__ == "__main__":
    with open(DATA_SEED_TWITTER_PATH) as data_seed:
        data = json.load(data_seed)

    time_extractor = TimeExtractor(data)
    tweets_from_this_year = time_extractor.tweets_for_year(current_year)

    tweet_events_from_this_year = [
        TweetEventModel(hashtags_from_tweet(tweet), full_text_from_tweet(tweet))
        for tweet in tweets_from_this_year]

    classified_tweets = reduce(reduce_classifications, tweet_events_from_this_year, classifications)

    for key, value in classified_tweets.items():
        print(f'{key}: {value}')

This gives us the same results as the classification script, but it uses the model and the classification extractor instead of a raw list. Now we need to adjust the reduce_classifications function to account for our weighted time annotations.

First, we need to do some string manipulation to find the annotation and its value:

if "^" in tweet_event.text:
    kcv_annotation = tweet_event.text[tweet_event.text.index("^") + 1:].split("\n")[0]
Enter fullscreen mode Exit fullscreen mode

Next, depending on the unit suffix, we either take the int value directly (minus the unit) or perform the book calculation:

# Minutes map directly to kcv; pages go through the book formula.
if kcv_annotation.endswith("m"):
    kcv_value = int(kcv_annotation.replace("m", ""))
elif kcv_annotation.endswith("p"):
    pages = int(kcv_annotation.replace("p", ""))
    kcv_value = (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE
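Putting those pieces together, the adjusted reducer might look roughly like the sketch below. The reduce_density name, the DEFAULT_TWEET_KCV constant, and the constant values are my own placeholders, chosen to match the rules from the Outcomes section (an unannotated tweet counts as a static 2); the exact version lives in the linked script.

WORDS_PER_PAGE = 250
READ_WORDS_PER_MINUTE = 150
DEFAULT_TWEET_KCV = 2  # static value for a tweet with no density annotation


def reduce_density(result: dict, tweet_event: TweetEventModel) -> dict:
    classification = ClassificationExtractor(tweet_event).classify()

    kcv_value = DEFAULT_TWEET_KCV
    if "^" in tweet_event.text:
        kcv_annotation = tweet_event.text[tweet_event.text.index("^") + 1:].split("\n")[0]
        if kcv_annotation.endswith("m"):
            kcv_value = int(kcv_annotation.replace("m", ""))
        elif kcv_annotation.endswith("p"):
            pages = int(kcv_annotation.replace("p", ""))
            kcv_value = (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE

    result[classification] = result[classification] + kcv_value
    return result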

The full script can be found here: tweet-density-for-this-year

Conclusion

By applying a density calculation to the journaled events, we can quantify their value. Compared to the "raw" count, we end up with a very different distribution of learning events over the year.

2020 Tweet Density (kcv)

  • Engineering: 2783
  • Agile: 9557
  • Leadership: 3199
  • Other: 1414.33

(Chart: 2020 tweet density by classification)

Pushing everything into a time variable lets us quantify the learning data in ways that are easy for the human mind to process. The COVID-19 pandemic has been difficult on us all, and I found myself spending more time on my family and video games, and less on my career, than in previous years. Given there are 1440 minutes in a day, we can do a raw comparison of "days" spent Sharpening the Saw.
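As a quick sanity check, summing the 2020 per-category kcv values above and dividing by 1440 recovers the days figure below:

total_kcv_2020 = 2783 + 9557 + 3199 + 1414.33  # 16953.33 kcv (minutes)
print(total_kcv_2020 / 1440)                   # ~11.77 "days" of learning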

Time Spent Learning

  • 2020: 11.77 days
  • 2019: 14.72 days
