Justin L Beall

Social Learning Journal - Density

In my previous post, Social Learning Journal - Classification, we built a classifier to see how the number of learning events was distributed across the year.

(Chart: 2020 classified learning event counts)

This is a good start, but not all events are created equal, which raises the question of how to evaluate the density of each event. As this project has progressed, I have landed on a weighted time value: Knowledge Consumption Velocity (kcv). It is a made-up unit that every learning medium can be converted into. In future posts, I will go into detail about each medium, but for now, let's quickly look at one example.

Book vs AudioBook Example

How could we compare reading a book to another event, like listening to an audiobook?

Book

(Image: journal entry for a finished book, annotated "^432p")

We have one constant: the number of pages in the book. In the image above, this is annotated as "^432p". I use the ^ as an indicator of density, and the unit p stands for "pages". So this event records a finished 432-page book.

Our other two variables are a bit more subjective: words per page and reading speed. Publishers recommend roughly 250-300 words per page, and we also need to assume an average reading speed, around 150 words per minute for a non-fiction technical book. With this information, we can do a little math to calculate a weighted time value (aka kcv).

given:
- 432 pages (p)
assumed:
- 250 words per page (wpp)
- 150 read words per minute (rwpm)

kcv = (p * wpp) / rwpm
kcv = (432 * 250) / 150
kcv = 720

With our above example, we land on a value of 720, an estimate of weighted time that can be compared against other events.
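To make this repeatable, here is a minimal Python sketch of the same formula. The book_kcv helper is a hypothetical name of my own; the constants simply mirror the words-per-page and reading-speed values assumed above.

# Hypothetical helper illustrating the book formula above.
WORDS_PER_PAGE = 250          # assumed publisher average
READ_WORDS_PER_MINUTE = 150   # assumed speed for non-fiction technical reading


def book_kcv(pages: int) -> float:
    """Estimate weighted time (kcv) for a finished book."""
    return (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE


print(book_kcv(432))  # 720.0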

Audiobook

(Image: journal entry for a finished audiobook, annotated "^595m")

Audiobooks are a little easier: the duration of the production is the constant here. This event is annotated as "^595m", where the unit m stands for minutes. Given this event, we have finished a 595-minute audiobook, and we take that duration directly as the kcv value.

Comparison

We can now compare the book and the audiobook on a shared variable. The calculation is not perfect, and tweaking the book variables slightly can produce vastly different results. In addition, some people listen to audiobooks at faster than 1x speed.

What if we compared the physical book Pragmatic Programmer 2nd Edition against its audiobook counterpart?

We know the audiobook has a value of 595. Let's quickly run the calculation for the book.

given:
- 352 pages (p)
kcv = (p * wpp) / rwpm
kcv = (352 * 250) / 150
kcv = 586.66

This is not a perfect comparison, but you can see we have a similar value for the book (586) and audiobook (595).
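Using the hypothetical book_kcv helper from the sketch above, the comparison (including the playback-speed caveat) looks like this; the 1.25x figure is only an illustration:

print(book_kcv(352))   # ~586.67 kcv for the 352-page print edition
print(595)             # 595 kcv for the audiobook's running time in minutes

# Listening faster than 1x shrinks the audiobook's effective kcv, e.g. at 1.25x:
print(595 / 1.25)      # 476.0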

Outcomes

The next step is to add weight to the classified learning events. I am going to assume that the start and finish of an event denote a single value and ignore distributing kcv over time (a future post). In addition, a Tweet counts as a static value of 2. I have played with the idea of scraping referenced URLs and deriving a value from them, but that is beyond the scope of today.

Extracting Density

Instead of just adding one to each classified event, we are going to slightly change our data model and create a new extractor that will parse the kcv value for each event.

# tweet_event_model
from typing import List


class TweetEventModel:
    """Value object holding the parts of a tweet we classify and weigh."""

    def __init__(self, hashtags: List[str], text: str):
        self.hashtags = hashtags
        self.text = text
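As a quick usage sketch (the hashtags and text below are made up for illustration):

event = TweetEventModel(hashtags=["agile", "book"], text="Finished another technical book ^432p")
print(event.hashtags)  # ['agile', 'book']
print(event.text)      # 'Finished another technical book ^432p'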

I pushed the classification code into its own extractor class - classification_extractor.py. Then based upon the file classify-tweets-for-this-year.py, I created a new script:

# tweet-density-for-this-year.py
import json
from functools import reduce

# ClassificationExtractor, TimeExtractor, TweetEventModel, the constants
# DATA_SEED_TWITTER_PATH and current_year, and the seed classifications dict
# are imported from elsewhere in the project.


def reduce_classifications(result: dict, tweet_event: TweetEventModel) -> dict:
    classification_extractor = ClassificationExtractor(tweet_event)
    classification = classification_extractor.classify()

    result[classification] = result[classification] + 1

    return result


def hashtags_from_tweet(tweet: dict):
    return [hashtag_entity['text'].lower()
            for hashtag_entity in tweet['tweet']['entities']['hashtags']]


def full_text_from_tweet(tweet: dict):
    return tweet['tweet']['full_text']


if __name__ == "__main__":
    with open(DATA_SEED_TWITTER_PATH) as data_seed:
        data = json.load(data_seed)

    time_extractor = TimeExtractor(data)
    tweets_from_this_year = time_extractor.tweets_for_year(current_year)

    tweet_events_from_this_year = [
        TweetEventModel(hashtags_from_tweet(tweet), full_text_from_tweet(tweet))
        for tweet in tweets_from_this_year]

    classified_tweets = reduce(reduce_classifications, tweet_events_from_this_year, classifications)

    for key, value in classified_tweets.items():
        print(f'{key}: {value}')

This gives us the same results as the classification script, but it uses the model and the classification extractor instead of a raw list. Now we need to adjust the reduce_classifications function to account for our weighted time annotations.

First, we need to do some string manipulation to find the annotation and its value:

if "^" in tweet_event.text:
    kcv_annotation = tweet_event.text[tweet_event.text.index("^") + 1:].split("\n")[0]
Enter fullscreen mode Exit fullscreen mode

Next, depending on the unit suffix, we either take the int value directly (minus the unit) or perform the book calculation:

# Minutes map directly to kcv; pages go through the book formula.
if kcv_annotation.endswith("m"):
    kcv_value = int(kcv_annotation.replace("m", ""))
elif kcv_annotation.endswith("p"):
    pages = int(kcv_annotation.replace("p", ""))
    kcv_value = (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE
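Putting those pieces together, the adjusted reducer might look roughly like the sketch below. The reduce_density name, the DEFAULT_TWEET_KCV constant, and the constant values are my own placeholders, chosen to match the rules from the Outcomes section (an unannotated tweet counts as a static 2); the exact version lives in the linked script.

WORDS_PER_PAGE = 250
READ_WORDS_PER_MINUTE = 150
DEFAULT_TWEET_KCV = 2  # static value for a tweet with no density annotation


def reduce_density(result: dict, tweet_event: TweetEventModel) -> dict:
    classification = ClassificationExtractor(tweet_event).classify()

    kcv_value = DEFAULT_TWEET_KCV
    if "^" in tweet_event.text:
        kcv_annotation = tweet_event.text[tweet_event.text.index("^") + 1:].split("\n")[0]
        if kcv_annotation.endswith("m"):
            kcv_value = int(kcv_annotation.replace("m", ""))
        elif kcv_annotation.endswith("p"):
            pages = int(kcv_annotation.replace("p", ""))
            kcv_value = (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE

    result[classification] = result[classification] + kcv_value
    return result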

The full script can be found here: tweet-density-for-this-year

Conclusion

By applying a density calculation to the journaled events, we can quantify their value. Compared to the "raw" count, we end up with a very different distribution of learning events over the year.

2020 Tweet Density (kcv)

  • Engineering: 2783
  • Agile: 9557
  • Leadership: 3199
  • Other: 1414.33

(Chart: 2020 tweet density by classification)

Pushing everything into a time variable lets us quantify the learning data in ways that are easy for the human mind to process. The COVID-19 pandemic has been difficult on us all, and I found myself spending more time on my family and video games, and less on my career, than in previous years. Given there are 1440 minutes in a day, we can do a raw comparison of "days" spent Sharpening the Saw.
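As a quick sanity check, summing the 2020 per-category kcv values above and dividing by 1440 recovers the days figure below:

total_kcv_2020 = 2783 + 9557 + 3199 + 1414.33  # 16953.33 kcv (minutes)
print(total_kcv_2020 / 1440)                   # ~11.77 "days" of learning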

Time Spent Learning

  • 2020: 11.77 days
  • 2019: 14.72 days
