DEV Community

Cover image for Wordcount for huge data
Mukit, Ataul
Mukit, Ataul

Posted on

Wordcount for huge data

If we are to process unique word counts for input data (read many sentences), the solution is pretty straightforward using a map. Javascript example below:

Map Solution

However, if the number of unique words is huge and there are memory limitations, then finding an optimal solution becomes difficult. In that case we can go with a tree traversal route to save memory. Say, we have a to find the count of few words from input data. Words are "Stack", "Stick", "Stuck", "The", "Then". We can form a multi node tree where the leaf nodes will contain the count. Illustration below:

Binary Tree

As you can see, instead of having stuck, stack, stick, stuck as separate elements in the map, we are creating a tree to save some letters (bytes) being reused.

Disclaimers note: I don't have the exact algorithms in my head as code yet :) and the representation of "the" and "then" in the image is not exactly right.

Thanks Joynal (https://www.linkedin.com/in/joynaluu/) for bringing up the topic to ponder upon in social media.

signing off,
Mukit

Heroku

Build apps, not infrastructure.

Dealing with servers, hardware, and infrastructure can take up your valuable time. Discover the benefits of Heroku, the PaaS of choice for developers since 2007.

Visit Site

Top comments (0)

AWS Security LIVE!

Tune in for AWS Security LIVE!

Join AWS Security LIVE! for expert insights and actionable tips to protect your organization and keep security teams prepared.

Learn More

👋 Kindness is contagious

Dive into an ocean of knowledge with this thought-provoking post, revered deeply within the supportive DEV Community. Developers of all levels are welcome to join and enhance our collective intelligence.

Saying a simple "thank you" can brighten someone's day. Share your gratitude in the comments below!

On DEV, sharing ideas eases our path and fortifies our community connections. Found this helpful? Sending a quick thanks to the author can be profoundly valued.

Okay