<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Itsdru</title>
    <description>The latest articles on DEV Community by Itsdru (@itsdru).</description>
    <link>https://dev.to/itsdru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1027420%2F9678ca02-df16-4fb2-aae7-118704f6e8d7.jpg</url>
      <title>DEV Community: Itsdru</title>
      <link>https://dev.to/itsdru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/itsdru"/>
    <language>en</language>
    <item>
      <title>Kenyan Small-Medium Enterprises &amp; The Digital Divide</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Wed, 07 Feb 2024 08:09:48 +0000</pubDate>
      <link>https://dev.to/itsdru/kenyan-small-medium-enterprises-the-digital-divide-40m8</link>
      <guid>https://dev.to/itsdru/kenyan-small-medium-enterprises-the-digital-divide-40m8</guid>
      <description>&lt;p&gt;Is there a digital divide between SMEs and digital business tools in Kenya? &lt;/p&gt;

&lt;p&gt;While Kenya boasts a vibrant entrepreneurial spirit, many small and medium-sized enterprises (SMEs) remain stuck in the paper age. This "digital divide" between awareness and adoption presents a crucial question: could bridging this gap hold the key to Kenyan SME growth? Other factors are considered along the way.&lt;/p&gt;

&lt;p&gt;My recent participation in the Google Hustle Academy's 5-Day Bootcamp for Kenyan Small Businesses opened my eyes to this stark reality. The conversations I picked up on revealed a vast digital divide separating the sophisticated business tools offered by tech companies from the everyday needs of small and medium-sized enterprises (SMEs) in Kenya. &lt;/p&gt;

&lt;p&gt;While I, a data and workflow automation enthusiast, envision building solutions to optimize business and organizational operations, the reality was far more grounded: many small businesses still rely on paper records, or even lack any formal record-keeping system at all. Whether this is by design or from a lack of knowledge of existing tools, I do not know. &lt;/p&gt;

&lt;p&gt;This disconnect sparked a crucial question: are business tool providers failing to reach this critical segment of the Kenyan market? SMEs are the backbone of any economy, and Kenya is no exception. They contribute significantly to GDP and employment, and their success hinges on their ability to adapt to the digital landscape.&lt;/p&gt;

&lt;p&gt;Looking at the goals of the business owners, at least during the bootcamp, which I suspect also reflects the reality on the ground, the majority, if not all, of the businesses are looking to increase sales, access capital, build their brand, and expand their operations, whether through growth or scaling. Integrating business tools into their operations would definitely come in handy; tools such as customer relationship management (CRM) software, data analytics tools, basic accounting software, online ordering software, social media marketing tools, digital payment systems, and review and information platforms.&lt;/p&gt;

&lt;p&gt;Well, it is not all gloomy, as a number of businesses had already implemented some of these systems and were looking to tweak them further. Even more interesting is the fact that some business owners mentioned they had implemented some of the tools during the bootcamp itself and had already seen positive change. So it is not all gloom, but the question still lingers: is this a case of digital divide, or am I wrong?&lt;/p&gt;

&lt;p&gt;So, why this disconnect? While cost remains a significant barrier for many SMEs operating on tight margins, a lack of awareness about available tools plays an equally important role in hindering digital adoption. Unlike in advanced economies, where internet access long ago stopped being a luxury, many Kenyan SMEs remain unaware of the vast possibilities offered by modern business tools. Stuck in their established, albeit outdated, systems, they are missing out on the transformative potential of digital solutions. This creates a significant knowledge gap that needs to be addressed.&lt;/p&gt;

&lt;p&gt;To effectively bridge this gap, we need to move beyond solely cost-related concerns. While affordability of these tools is crucial, let's not underestimate the power of targeted marketing strategies. Are current campaigns truly reaching this segment of the market? Do they resonate with the specific needs and concerns of smaller, less tech-savvy businesses? Perhaps a shift towards localized messaging, showcasing success stories of similar Kenyan SMEs who have embraced digitalization, could bridge the understanding gap and encourage wider adoption. By demonstrating the tangible benefits and impact of these tools, we can empower SMEs to move beyond outdated systems and embrace the opportunities of the digital age.&lt;/p&gt;

&lt;p&gt;Remember, these businesses don't need to be convinced that digital tools exist; they need to see how these tools can solve their specific problems. It's not just about providing knowledge; it's about demonstrating the value and making it attainable.&lt;/p&gt;

&lt;p&gt;Bridging the digital divide for Kenyan SMEs isn't just about individual business success; it's about unlocking the collective potential of a vibrant and entrepreneurial sector. By addressing the knowledge gap, making tools more accessible, and tailoring marketing strategies, we can empower these businesses to thrive in the digital age. Let's not leave them behind with pen and paper, but equip them with the tools they need to write their own stories of success.&lt;/p&gt;

&lt;p&gt;Exploring the Possibilities with &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Andrew Muhoro&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kenya</category>
      <category>business</category>
      <category>digitization</category>
      <category>smallmediumenterprises</category>
    </item>
    <item>
      <title>Python(/Programming Languages) and Human Languages(English)</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Tue, 30 Jan 2024 08:58:42 +0000</pubDate>
      <link>https://dev.to/itsdru/pythonprogramming-languages-and-human-languagesenglish-2fmh</link>
      <guid>https://dev.to/itsdru/pythonprogramming-languages-and-human-languagesenglish-2fmh</guid>
      <description>&lt;p&gt;What makes Python so similar to the language we use every day? &lt;/p&gt;

&lt;p&gt;We know Python is a general-purpose programming language. Its design philosophy revolves around readability and indentation: whitespace at the beginning of a line marks structure. At its core, Python whispers a familiar tune, echoing the very language we use to navigate the world: human language. &lt;/p&gt;

&lt;p&gt;Human languages are “Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! A means to communicate to create understanding." This is according to two people, &lt;a href="https://www.linkedin.com/in/laura-litaba/"&gt;Laura&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/daniel-litaba-06922220b/"&gt;Dan&lt;/a&gt;, who were sitting right here at the time of writing. That is the best I can do at explaining what human languages are. I know you know what a human language is by virtue of being a human being on earth reading this.&lt;/p&gt;

&lt;p&gt;Aliens on the other hand??? Eeeeeeeeeeeeh yup!!! &lt;/p&gt;

&lt;p&gt;Python is considered a beginner-friendly programming language, as it is easy to read and to follow its flow of execution, especially for English speakers. I am not sure whether the same holds in other languages. Now that I look at it, it is ironic that the majority of the programming languages I have come across are oriented toward English speakers. For non-English speakers, what is your experience with programming languages in your language? I started my own reading here: &lt;a href="https://nlp.cs.gmu.edu/publication/otten-etal-23-unipy/#:~:text=The%20Python%20programming%20language%20plays,do%20not%20have%20this%20advantage"&gt;Towards a Universal Python: Translating the Natural Modality of Python into Other Human Languages&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the sake of my laziness, and to avoid rewriting this article so far, we will use English as a representative of human languages. Do forgive me!&lt;/p&gt;

&lt;p&gt;If you think about it, any human language, be it English, Swahili, or Mandarin, rests on a foundation of building blocks: letters, numbers, and symbols. They are the basic foundation for building meaning in that language. Python, too, dances to this tune. Its fundamental units, data types like integers, floats, and strings, act as its alphabet, forming the bedrock of complex programs. As we all know, programming languages are built to process data into useful information, and at the lowest level that data comes in the form of bits: 0s and 1s, states of on and off.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Human Language Building Blocks&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu737ruval45511u20rf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu737ruval45511u20rf.jpg" alt="Human Language Building Blocks" width="564" height="457"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Datatypes (a small sketch of these follows the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Integer - 4, 99, 100&lt;/li&gt;
&lt;li&gt;Float - 4.0, 99.78, 100.29&lt;/li&gt;
&lt;li&gt;String - “Let Us”&lt;/li&gt;
&lt;li&gt;Boolean - True(1), False(0)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
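
&lt;p&gt;To make this concrete, here is a quick sketch of those building blocks written in Python itself (the variable names are mine, purely for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# The "alphabet" of Python: its fundamental data types
count = 4              # integer
price = 99.78          # float
phrase = "Let Us"      # string
is_on = True           # boolean: True behaves as 1, False as 0

# Characters combine into sentences; data types combine into expressions
print(f"{phrase} count from {count} up to {price}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;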

&lt;p&gt;Human languages take these seemingly simple elements and turn them into words, sentences, and stories that carry meaning. That meaning comes from characters mixed together to form words, which are then structured according to the language’s syntax and semantics. Python is built around the same structure: instructions that give meaning to data types are composed according to the language’s syntax and semantics.&lt;/p&gt;

&lt;p&gt;So can I go ahead and say that these two share the following concepts, among others?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Building Blocks - Just as letters, symbols, and numerals construct words, Python's data types (int, float, string) serve as the fundamental elements for building complex instructions. &lt;/li&gt;
&lt;li&gt;Meaning &amp;amp; Structure - Similar to how words combine with syntax and semantics to convey meaning in human languages, Python follows a specific grammatical structure (syntax) and rules of interpretation (semantics) to give meaning to combinations of data types and instructions (see the short sketch after this list).&lt;/li&gt;
&lt;li&gt;Flow of Execution - The emphasis on readability and the clear flow of execution in Python mirrors how human languages strive for clarity and logical progression in spoken or written communication.&lt;/li&gt;
&lt;li&gt;Abstraction - Python, and many other high-level languages, utilize abstraction to hide the underlying machinery of bits and bytes, allowing programmers to focus on the bigger picture logic, just like human languages allow us to express complex ideas without dwelling on the mechanics of sound or symbols.&lt;/li&gt;
&lt;li&gt;Evolution - Both human languages and programming languages evolve over time. New words and expressions arise in languages, while features and libraries are added to programming languages. Exploring the parallels in how these changes occur and impact usage and understanding could be another interesting angle.&lt;/li&gt;
&lt;li&gt;Limitation - Despite the similarities, it's important to acknowledge the differences. Human languages have ambiguity and nuance that programming languages struggle with. Conversely, programming languages offer precision and determinism that are often absent in human communication.&lt;/li&gt;
&lt;/ol&gt;
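
&lt;p&gt;A small sketch of the second point: in Python, as in English, a statement can be grammatically well-formed yet still make no sense, and it is the semantics that catch it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Syntactically valid AND semantically meaningful
total = 4 + 99

# The next line is also syntactically valid, but semantically wrong:
# mixing an int and a str raises a TypeError at run time, much like
# "colorless green ideas sleep furiously" parses in English but means nothing
# broken = 4 + "Let Us"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;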

&lt;p&gt;So, is Python truly a linguistic twin to the human languages? Perhaps not a perfect mirror image, but certainly a close cousin. &lt;/p&gt;

&lt;p&gt;I could be wrong but I think I am onto something. It's a conversation worth having, reminding us that even in the cold logic of machines, there beats a familiar rhythm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Andrew Muhoro&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>humanlanguage</category>
      <category>learning</category>
    </item>
    <item>
      <title>Streamlining Your Feedback Workflow: An Automated Solution</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Wed, 08 Nov 2023 10:18:46 +0000</pubDate>
      <link>https://dev.to/itsdru/streamlining-your-feedback-workflow-an-automated-solution-2o6e</link>
      <guid>https://dev.to/itsdru/streamlining-your-feedback-workflow-an-automated-solution-2o6e</guid>
      <description>&lt;p&gt;In today's fast-paced world, efficiency is key, especially when it comes to handling customer feedback. We've all been there, managing feedback forms, deciding where they should go, and storing valuable data. But what if I told you that there's a way to streamline this process and make your life easier?&lt;/p&gt;

&lt;p&gt;In this article, we'll explore a simple yet powerful implementation of an automated workflow for a feedback form. Our mission is clear: save all submissions to a database and send relevant details to a Slack channel, based on the type of feedback.&lt;/p&gt;

&lt;p&gt;Here are the tools we'll be using:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Automation: &lt;a href="https://www.perceptif.ai/"&gt;Perceptif AI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Form: &lt;a href="https://www.jotform.com/"&gt;Jotform&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Database: &lt;a href="https://supabase.com/"&gt;Supabase&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Communication: &lt;a href="https://slack.com/"&gt;Slack&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our workflow, created in &lt;a href="https://www.perceptif.ai/"&gt;Perceptif AI&lt;/a&gt;, is triggered by a submission made in the &lt;a href="https://www.jotform.com/"&gt;Jotform&lt;/a&gt; form. But here's where the magic happens. We've integrated a decision-making node into our workflow. This node assesses the feedback type and makes a crucial choice: should we send it to a &lt;a href="https://slack.com/"&gt;Slack&lt;/a&gt; channel, or should we leave it as it is?&lt;/p&gt;

&lt;p&gt;The next step involves sending the submission to a table in a &lt;a href="https://supabase.com/"&gt;Supabase&lt;/a&gt; database, where we can safely store and organize the data. The entire process is beautifully demonstrated in this &lt;a href="https://vimeo.com/882030575?share=copy#t=0"&gt;video&lt;/a&gt;.&lt;/p&gt;
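
&lt;p&gt;For readers who like to see the logic spelled out, here is a minimal Python sketch of what the workflow does conceptually. This is not Perceptif AI's internal code; it uses the supabase-py client and a Slack incoming webhook, and the table name, field names, and environment variables are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

import requests
from supabase import create_client  # supabase-py client

# Placeholder configuration -- supply your own values
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def handle_submission(submission: dict) -&gt; None:
    """Store every submission; alert Slack only for certain feedback types."""
    # 1. Save the submission to a Supabase table (assumes a 'feedback' table exists)
    supabase.table("feedback").insert(submission).execute()

    # 2. Decision node: route complaints to the Slack channel, leave the rest as-is
    if submission.get("feedback_type") == "complaint":
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"New complaint: {submission.get('message', '')}"},
            timeout=10,
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;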

&lt;p&gt;If you're intrigued and eager to explore the platform we're building, don't hesitate to reach out. You can schedule a demo by following this link, &lt;a href="https://lnkd.in/dyDAxTFz"&gt;Schedule Demo Form&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We're excited to show you how this solution can streamline your operations and elevate your business to new heights. Join us on this journey of process mining, automation, and business process optimization. Let's make your workflow work for you!&lt;/p&gt;

</description>
      <category>processinsights</category>
      <category>processautomation</category>
      <category>processmining</category>
      <category>perceptifai</category>
    </item>
    <item>
      <title>My Current Data Project Workflow: Simplifying Complexity with a Touch of Humor</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Wed, 23 Aug 2023 07:00:00 +0000</pubDate>
      <link>https://dev.to/itsdru/my-current-data-project-workflow-simplifying-complexity-with-a-touch-of-humor-377e</link>
      <guid>https://dev.to/itsdru/my-current-data-project-workflow-simplifying-complexity-with-a-touch-of-humor-377e</guid>
      <description>&lt;p&gt;As the curtains draw back on the realm of data projects, I find myself standing at the intersection of experience and reflection. A simple thought crossed my mind: how can I make this journey even better? With an imaginary cup of coffee in one hand and a notepad in the other, I embarked on a quest to review my data project routine, dissecting each step to reveal its strengths, shortcomings, and the potential pitfalls that often hide in the shadows.&lt;/p&gt;

&lt;p&gt;To make it even more interesting, I will be dissecting a &lt;a href="https://dev.to/itsdru/case-example-elevating-customer-insights-for-e-commerce-success-45fp"&gt;case example&lt;/a&gt; where a client seeks to unravel the story hidden in their E-commerce business data. &lt;/p&gt;

&lt;p&gt;It's essential to note that these steps are not standalone; they interact and iterate with each other in a dynamic dance of refinement and improvement. Each phase influences the other, creating a continuous loop of enhancement and adaptation. &lt;/p&gt;

&lt;p&gt;This iterative nature ensures that insights gained from one step can ripple through the entire workflow, sparking adjustments and enhancements elsewhere. As one progresses, the synergy between these steps becomes evident, enriching the overall quality and depth of the data projects.&lt;/p&gt;

&lt;p&gt;Time to put on our data detective hats! &lt;/p&gt;

&lt;h2&gt;
  
  
  A Walkthrough of My Data Project Workflow
&lt;/h2&gt;

&lt;h4&gt;
  
  
  1. Project Introduction: The Spark of Possibility
&lt;/h4&gt;

&lt;p&gt;Every journey begins with that first spark, the client reaching out with their data enigma. Our paths converge, each with its unique mysteries to unravel.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Deep Dive Discussion: Navigating the Landscape
&lt;/h4&gt;

&lt;p&gt;A collaborative dance of conversations, where the intricacies of the data landscape are mapped out. Like cartographers of information, we navigate challenges and chart solutions.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Blueprint Creation: Sketching the Masterplan
&lt;/h4&gt;

&lt;p&gt;In the solitude of preparation, I sit down to craft the masterplan. It's the blueprint that guides our actions and charts our course.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Client Synergy: Transforming Plans into Reality
&lt;/h4&gt;

&lt;p&gt;A client rendezvous, a moment of truth. Ideas merge, and the plan inches closer to reality. It's the bridge between concept and execution.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Execution: Breathing Life into the Blueprint
&lt;/h4&gt;

&lt;p&gt;The stage is set, and the script is in hand. Code is written, algorithms executed, data's transformation begins.&lt;/p&gt;

&lt;h4&gt;
  
  
  6. Pipeline Development: Laying Digital Foundations
&lt;/h4&gt;

&lt;p&gt;Data pipelines weave like a digital tapestry. Yet, the challenge lies in their seamless integration. Like architects, we build these invisible bridges.&lt;/p&gt;

&lt;h4&gt;
  
  
  7. Data Understanding: The Core Unveiled
&lt;/h4&gt;

&lt;p&gt;Data's anatomy is unveiled, its structure and intricacies laid bare. In this process, we uncover insights hidden within the layers.&lt;/p&gt;

&lt;h4&gt;
  
  
  8. Filtering and Enhancement: Crafting the Gem
&lt;/h4&gt;

&lt;p&gt;Raw data, a mine of potential. But like mining, it requires sifting through the rubble to reveal the gem within.&lt;/p&gt;

&lt;h4&gt;
  
  
  9. Analysis: The Alchemist's Touch
&lt;/h4&gt;

&lt;p&gt;Data is raw material. Analysis, the alchemy that turns it into gold. Insights shimmer in the crucible of statistical exploration.&lt;/p&gt;

&lt;h4&gt;
  
  
  10. Dashboard Design: Visual Tales Unfold
&lt;/h4&gt;

&lt;p&gt;Numbers have stories. Dashboards are the storytellers, presenting data as visual narratives that resonate with clients.&lt;/p&gt;

&lt;h4&gt;
  
  
  11. Creativity Unleashed: Designing Data's Home
&lt;/h4&gt;

&lt;p&gt;Dashboards are not just data displays; they're the homes where numbers find meaning. Design choices mirror the data's essence.&lt;/p&gt;

&lt;h4&gt;
  
  
  12. Client Preview: The Dress Rehearsal
&lt;/h4&gt;

&lt;p&gt;A moment of truth, clients review the draft. Feedback and iterations, the symphony of improvements begins.&lt;/p&gt;

&lt;h4&gt;
  
  
  13. Refinement: Polishing the Gem
&lt;/h4&gt;

&lt;p&gt;Revisions refine the data gem. It's the quest for perfection that guides us through the final stages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying Pitfalls and Seeking Improvement
&lt;/h2&gt;

&lt;p&gt;In this journey, I've observed moments that shimmer with efficiency and steps that occasionally stumble. Pitfalls, though often hidden, are stepping stones to progress. One notable aspect is the substantial time invested in building data pipelines and data preprocessing. While these steps are crucial, streamlining them might offer more room for the creative and analytical processes.&lt;/p&gt;

&lt;p&gt;Moreover, ensuring seamless communication with clients at each stage can help preempt misunderstandings and fine-tune the direction. A tighter integration of storytelling techniques within the data analysis and dashboard creation phases can also elevate the final presentation, making data more relatable and impactful.&lt;/p&gt;

&lt;p&gt;As the curtain falls on this introspection, I stand poised to elevate my data project routine, to transform challenges into opportunities, and data into insights. After all, it's in these evolutions that true mastery emerges.&lt;/p&gt;

&lt;p&gt;And now, a dry joke to wrap it up:&lt;br&gt;
Why did the data analyst stay calm during the storm? Because he knew it was just a data set in a sea of numbers!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>dataproject</category>
      <category>workflow</category>
      <category>projectmanagement</category>
    </item>
    <item>
      <title>Case Example: Elevating Customer Insights for E-Commerce Success</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Wed, 23 Aug 2023 07:00:00 +0000</pubDate>
      <link>https://dev.to/itsdru/case-example-elevating-customer-insights-for-e-commerce-success-45fp</link>
      <guid>https://dev.to/itsdru/case-example-elevating-customer-insights-for-e-commerce-success-45fp</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction: A Retailer's Dilemma&lt;/strong&gt;&lt;br&gt;
Step into the world of an e-commerce retailer facing a perplexing puzzle of customer behaviors. Despite their vigorous e-marketing efforts, their online sales remain frustratingly low. The client is on a quest to uncover the elusive reasons behind this discrepancy, seeking the key to unlock their website's true potential.&lt;/p&gt;

&lt;p&gt;This is the story of their journey and how I am about to apply my data project workflow to uncover insights that will fuel their understanding of their operations.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Project Introduction: The Call That Sparks Possibility
&lt;/h4&gt;

&lt;p&gt;The retailer's call reverberates with the struggle of an e-commerce endeavour grappling with the intricacies of customer data. With that call, the wheels of possibility begin to turn.&lt;/p&gt;

&lt;p&gt;The retailer shares their story, describing the journey that led them to this crossroads. I listen keenly, probing for every detail. The challenge is intriguing, and I'm eager to dive into the data to uncover the hidden truths.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Deep Dive Discussion: Deciphering the Shopping Landscape
&lt;/h4&gt;

&lt;p&gt;In a virtual rendezvous, the retailer and I immerse ourselves in the intricacies of their business. We delve into the granular details of customer data, searching for patterns and insights that lie beneath the surface.&lt;/p&gt;

&lt;p&gt;By the end of this discussion, I've developed a nuanced understanding of the retailer's unique challenges. Their objectives and goals are etched into my mind as I mentally piece together the fragments of the puzzle.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Blueprint Creation: Mapping the Insight Pathway
&lt;/h4&gt;

&lt;p&gt;Alone in the realm of planning, I set the stage by sketching a comprehensive blueprint. I outline the systematic steps needed to unearth the coveted insights and identify the data sources that will fuel this journey.&lt;/p&gt;

&lt;p&gt;Though a work in progress, this blueprint serves as my guiding compass. With each element, I grow increasingly confident in my ability to navigate this complex data terrain.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Client Synergy: Breathing Life into the Blueprint
&lt;/h4&gt;

&lt;p&gt;In another virtual rendezvous, the blueprint takes center stage. The retailer's excitement mirrors my own as we explore the possibilities my ideas bring forth.&lt;/p&gt;

&lt;p&gt;Together, we refine the blueprint, aligning it with their vision. We discuss timelines, budgets, and set the stage for a collaborative journey to uncover the hidden treasures within their data.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Execution: Turning Plans into Data Transformation
&lt;/h4&gt;

&lt;p&gt;With the blueprint in hand, the data transformation begins. I collect the necessary data from their website, carefully preparing it for analysis.&lt;/p&gt;

&lt;p&gt;Data pipelines come to life, seamlessly channeling information from the retailer's website to my analytical tools. This ensures a constant flow of fresh data, enabling agile and informed decision-making.&lt;/p&gt;

&lt;h4&gt;
  
  
  6. Pipeline Development: Building Data Highways
&lt;/h4&gt;

&lt;p&gt;Data pipelines become our virtual construction sites. Like tributaries merging into a river, datasets converge to create a stream of information that guides us toward clarity.&lt;/p&gt;

&lt;p&gt;The pipelines I construct form the backbone of our analysis, ensuring that we have access to the most up-to-date data. These data highways empower me to derive meaningful insights.&lt;/p&gt;

&lt;h4&gt;
  
  
  7. Data Understanding: Peeling Layers of Customer Patterns
&lt;/h4&gt;

&lt;p&gt;Layer by layer, the intricate patterns of customer behavior are revealed. Their preferences and digital footprints guide us closer to understanding their motivations.&lt;/p&gt;

&lt;p&gt;As I delve into the data, I begin to unravel the narrative of the retailer's customers. I identify trends, preferences, and critical touchpoints that shape their online journey.&lt;/p&gt;

&lt;h4&gt;
  
  
  8. Filtering and Enhancement: Crafting Customer Insights
&lt;/h4&gt;

&lt;p&gt;Just as an artisan shapes raw stone into a gem, I refine the raw data. I filter out noise, extract valuable signals, and unveil the glittering insights hidden within.&lt;/p&gt;

&lt;p&gt;The data enhancement process ensures that the insights we extract are accurate and meaningful. It's akin to sculpting, revealing the true essence of customer behavior.&lt;/p&gt;

&lt;h4&gt;
  
  
  9. Analysis: Unveiling Shopping Patterns
&lt;/h4&gt;

&lt;p&gt;Like a detective deciphering a cryptic message, I analyze the data. Shopping patterns emerge: a tale woven from clicks, selections, and purchases.&lt;/p&gt;

&lt;p&gt;With rigorous analysis techniques, I bring order to the data chaos. The patterns that surface hold the keys to understanding customer behavior and, subsequently, the keys to improving sales.&lt;/p&gt;

&lt;h4&gt;
  
  
  10. Dashboard Design: Visual Narratives Emerge
&lt;/h4&gt;

&lt;p&gt;Numbers transition into narratives as dashboards transform data into visual stories. These canvases illustrate the customer's journey, painting vibrant tales of interactions and decisions.&lt;/p&gt;

&lt;p&gt;Through thoughtful dashboard design, I create a visual symphony that resonates with the retailer. The insights embedded in these visuals bridge the gap between data and action.&lt;/p&gt;

&lt;h4&gt;
  
  
  11. Creativity Unleashed: Crafting Data's Visual Symphony
&lt;/h4&gt;

&lt;p&gt;Dashboards are more than a collage of graphs; they're an artistic symphony. Design choices harmonize, amplifying the resonance of the insights and enriching the data narrative.&lt;/p&gt;

&lt;p&gt;The creative process of dashboard design merges data and aesthetics. Colors, layout, and visual representations blend seamlessly to evoke understanding and engagement.&lt;/p&gt;

&lt;h4&gt;
  
  
  12. Client Preview: Revealing the First Act
&lt;/h4&gt;

&lt;p&gt;As the curtains rise on our insights, the retailer witnesses the first act of their data-driven success story. Draft dashboards are unveiled, offering a glimpse into the treasure trove of insights.&lt;/p&gt;

&lt;p&gt;I present the draft dashboards to the retailer, their eyes lighting up as they absorb the revelations. This initial presentation marks the beginning of their journey toward data-enriched decision-making.&lt;/p&gt;

&lt;h4&gt;
  
  
  13. Refinement: From Rough Draft to Masterpiece
&lt;/h4&gt;

&lt;p&gt;Feedback becomes the guiding star as refinement takes center stage. The symphony crescendos, each iteration honing the dashboards into a masterpiece of actionable insights.&lt;/p&gt;

&lt;p&gt;The retailer's input guides me as I refine the dashboards, ensuring that the insights are crystal clear. The journey from rough draft to polished final form mirrors the data's transformation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pitfalls and Lessons
&lt;/h4&gt;

&lt;p&gt;In this data voyage, potential pitfalls lurk amidst the excitement. One challenge lies in managing incomplete or inaccurate data, which can skew insights. Rigorous data quality checks are imperative to prevent such distortions.&lt;/p&gt;

&lt;p&gt;Another pitfall is the risk of drawing erroneous conclusions. Approaching analysis with a balanced perspective and corroborating findings through various techniques can counteract this.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion: A Symphony of Retail Insights
&lt;/h4&gt;

&lt;p&gt;In the end, this data project is about transforming data into insights and insights into actions. By following a structured workflow, I aim to provide the retailer with the information they need to improve their e-commerce performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>data</category>
      <category>workflow</category>
      <category>ecommerce</category>
    </item>
    <item>
      <title>A Short Story on Process Mining</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Thu, 27 Jul 2023 08:51:03 +0000</pubDate>
      <link>https://dev.to/itsdru/a-short-story-on-process-mining-4jg1</link>
      <guid>https://dev.to/itsdru/a-short-story-on-process-mining-4jg1</guid>
      <description>&lt;p&gt;Once upon a time in the bustling world of modern business, there was a powerful technique known as "Process Mining." This remarkable method held the key to unlocking hidden insights within an organization's processes, enabling them to embark on a journey of growth and improvement.&lt;/p&gt;

&lt;p&gt;At the heart of Process Mining were event logs and data, which acted as the building blocks of its magical capabilities. These event logs, derived from various systems like ERP, CRM, and data-capturing tools, revealed the secrets of how processes were executed in the organization.&lt;/p&gt;

&lt;p&gt;Our protagonist, an e-commerce company, found themselves facing a daunting challenge. Their once-thriving business was now plagued by a decrease in sales and a surge of customer complaints. The wise management suspected that inefficiencies and bottlenecks lurked within their order fulfillment process, causing delayed shipments and leaving customers dissatisfied.&lt;/p&gt;

&lt;p&gt;Feeling the weight of this challenge, the company turned to Process Mining for guidance. The first step on their adventure was to gather event logs from the different systems involved in the order fulfillment process – the order management system, inventory management system, shipping providers, and customer support system.&lt;/p&gt;

&lt;p&gt;With the event logs in hand, they embarked on a journey of data preprocessing, ensuring that irrelevant and sensitive information was cast aside, leaving behind pristine data in a suitable format for analysis.&lt;/p&gt;

&lt;p&gt;The hero's next step was process discovery, where Process Mining tools like &lt;a href="https://www.perceptif.ai/"&gt;Perceptif&lt;/a&gt; wove their magic, creating process flow diagrams that unveiled the intricate paths through which the order fulfillment process unfolded.&lt;/p&gt;

&lt;p&gt;As the story unfolded, the company delved into performance analysis. Here, they unveiled the hidden bottlenecks, delays, and deviations that had been sabotaging their efficiency all along. Patterns and trends began to emerge, shedding light on the mysterious forces impacting their operations.&lt;/p&gt;

&lt;p&gt;With determination and courage, they conducted a thorough root cause analysis, bravely drilling down into specific process steps to identify the culprits behind the delays and customer complaints. The true enemies were revealed – inventory issues, communication gaps, and pesky manual errors.&lt;/p&gt;

&lt;p&gt;Armed with these newfound insights, they knew what needed to be done. They embarked on a quest for process optimization, devising strategies to combat their adversaries. Automation was summoned to relieve the burden of manual tasks, steps were reorganized for greater harmony, and communication channels were fortified to facilitate smoother operations.&lt;/p&gt;

&lt;p&gt;The company's efforts did not end there. With the process now optimized, they embraced the practice of continuous monitoring. Using Process Mining techniques as their trusty compass, they kept a watchful eye on their journey, ensuring that the path they chose remained true and effective.&lt;/p&gt;

&lt;p&gt;The tale of their bravery did not go unnoticed. Key performance indicators such as order processing time, customer satisfaction, and sales metrics bore witness to the positive impact of their efforts. As sales soared and customer complaints dwindled, the company's success echoed throughout the land.&lt;/p&gt;

&lt;p&gt;And so, the e-commerce company's journey with Process Mining proved to be a story of transformation and triumph. With newfound insights and a commitment to improvement, they thrived in a world of competitive challenges, forever guided by the powerful technique that is Process Mining.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start Your Journey to Process Excellence
&lt;/h3&gt;

&lt;p&gt;Optimize Your Processes Today by Harnessing the Power of Process Mining!&lt;br&gt;
&lt;strong&gt;&lt;u&gt;&lt;a href="https://linktr.ee/andrewmuhoro"&gt;Get In Touch!&lt;/a&gt;&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>processmining</category>
      <category>efficiencyimprovement</category>
      <category>ecommerceoptimization</category>
      <category>perceptifai</category>
    </item>
    <item>
      <title>Tales from the Technical Writing 101 Trenches: An Imaginative Approach to Technical Writing</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Tue, 20 Jun 2023 11:39:08 +0000</pubDate>
      <link>https://dev.to/itsdru/tales-from-the-technical-writing-101-trenches-an-imaginative-approach-to-technical-writing-5elc</link>
      <guid>https://dev.to/itsdru/tales-from-the-technical-writing-101-trenches-an-imaginative-approach-to-technical-writing-5elc</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F719xp2bem0yuj66cig6a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F719xp2bem0yuj66cig6a.jpg" alt="Yeah" width="430" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whenever I hear about technical writing, I often wonder if I'm the only one whose mind immediately conjures up perplexing manuals filled with mind-boggling jargon and those oh-so-brief explanations that leave you scratching your head. Unless you're a master in the field, it's like reading an alien language! But fear not, my fellow adventurer, for I have taken up the noble quest to unravel the secrets of technical writing. My hope is that, armed with this knowledge, I can banish confusion and make those complex materials delightfully simple for all. Or am I over-reaching? Well, there is no harm in trying.&lt;/p&gt;

&lt;p&gt;As technology zooms ahead at its current pace, it's becoming crystal clear that regular folks like you and me are diving deeper into its intricate layers. No longer confined to the realm of tech wizards, we're all venturing into the exciting world of technical know-how. So bridging the gap between complex technical concepts and user-friendly documentation is sorely needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does this mean?
&lt;/h2&gt;

&lt;p&gt;To translate technical concepts into short, simple text explanations, I will probably need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Study the basics:&lt;/strong&gt; To master any skill, you must master its basics. This is no different: grasping the fundamental principles of technical writing is essential, which means understanding the purpose of technical documentation, its target audience, and the importance of clarity and accuracy in conveying information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Become a master in plain language:&lt;/strong&gt; A key aspect of technical writing is communicating in plain language. The key is to avoid unnecessarily complex jargon, acronyms, and terminology. Always keep the reader in mind; the aim is that anyone can understand, regardless of their technical expertise. Opt for simple and concise wording.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand the User's Perspective:&lt;/strong&gt; Put yourself in the user's shoes to anticipate their needs and the questions and challenges they may have. This helps in proactively addressing them in the writing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Break Down Complex Concepts:&lt;/strong&gt; This is the belly of technical writing: breaking complex concepts down into their simplest forms. Peeling away the layers of complexity reveals the core essence of a concept, making it easier for you to communicate the idea and for users to grasp it. This is a quest to simplify the seemingly insurmountable!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test and Iterate:&lt;/strong&gt; Once you have the document, seek feedback from users and subject matter experts. This will give you insights on clarity, comprehensibility, and usability, on the basis of which you can review and improve your document. To create high-quality technical documentation, always iterate and continuously improve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore Tools and Resources:&lt;/strong&gt; To make your work easier and produce good-quality output, take advantage of the numerous tools and resources that support writing in general and technical writing in particular. These come in the form of style guides, grammar checkers, readability tools, and technical writing communities or forums where you can connect with professionals in the field, learn from their experiences, and exchange insights.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Creative Approach to Technical Writing
&lt;/h2&gt;

&lt;p&gt;I think technical writing doesn't have to be monotonous and dry. We know for a fact that it is often associated with rigid structures and formalities, but injecting some creativity into it can make it more engaging, memorable, and effective.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos4dad7f7laxg6rmgufj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos4dad7f7laxg6rmgufj.jpg" alt="Yeeeeees" width="400" height="304"&gt;&lt;/a&gt;&lt;br&gt;
Here are some ideas I think can help, and which I will be exploring in later articles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling Techniques:&lt;/strong&gt; Weave narratives to make technical content relatable and compelling.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualizing Complex Concepts:&lt;/strong&gt; Utilize illustrations, diagrams, and infographics to enhance understanding.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gamification:&lt;/strong&gt; Incorporate interactive elements and challenges to make learning technical concepts enjoyable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infusing Humour and Wit:&lt;/strong&gt; Use humor strategically to lighten the tone and engage readers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Multimedia Presentations:&lt;/strong&gt; Create dynamic presentations using multimedia elements like videos and interactive slides.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step-by-Step Instructions:&lt;/strong&gt; Break down complex procedures into easy-to-follow steps, guiding users through the process.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By embracing and mastering the basics of technical writing and infusing creativity into it, we can transform technical writing into a more captivating way of communicating complex ideas. In a world where technology rapidly advances, the need to bridge the gap between technical concepts and user-friendly documentation becomes increasingly vital. Let's strive to make technical writing friendlier, enabling everyone to navigate the intricate layers of technology with ease. We can conquer the challenges and create a harmonious connection between users and the technical world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;How did the technical writer fix their broken computer? They turned it off and on again, then documented the entire process.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>technicalwriting</category>
      <category>techwritingfoolishly</category>
      <category>artoftechnicalwriting</category>
      <category>creativetechwriting</category>
    </item>
    <item>
      <title>Simplifying Data Exploration with Dexplorer</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Mon, 15 May 2023 13:25:14 +0000</pubDate>
      <link>https://dev.to/itsdru/simplifying-data-exploration-with-dexplorer-aka</link>
      <guid>https://dev.to/itsdru/simplifying-data-exploration-with-dexplorer-aka</guid>
      <description>&lt;p&gt;Data exploration is a vital step in the field of data science, enabling analysts to uncover patterns, relationships, and the underlying structure within datasets. However, this process often proves time-consuming and demands expertise in data analysis tools and techniques. To address this challenge, I developed a user-friendly web application called Dexplorer. With its streamlined interface and automation capabilities, Dexplorer aims to simplify and accelerate the data cleaning and exploration process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing Dexplorer: Making Data Exploration Effortless
&lt;/h3&gt;

&lt;p&gt;In my quest to harness the capabilities of Streamlit and eliminate repetitive initial data exploration steps, I set out to create a straightforward, bare-bones web application. Dexplorer, my solution, seeks to automate the data cleaning and exploration process, making it more accessible to a wider audience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvd7fqchwnvyn6594mfeb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvd7fqchwnvyn6594mfeb.png" alt="Dexplorer's transform view" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can access the Dexplorer application &lt;a href="https://andrewmuhoro-dexplorer-app-39bddc.streamlit.app/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Version 1.0 Features and Benefits
&lt;/h3&gt;

&lt;p&gt;In its current version, Dexplorer offers the following key features, with a small illustrative sketch after the list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data Upload and Preview: Easily upload and preview data in various formats, including CSV and Excel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Basic Data Insights: Gain quick insights into the uploaded data, such as the number of rows, columns, missing values, and duplicates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Row Previews: View a sample of the first and last rows of the dataset, providing a glimpse into the data's structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Column Overview: Get an overview of data types, missing values, and column names presented side by side for easy reference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Descriptive Statistics: Obtain summary statistics for selected numeric columns, providing a deeper understanding of the data's distribution and characteristics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Sampling and Manipulation: Sample a percentage of the dataset, drop unnecessary columns, and select and order specific fields for customized data exploration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download Processed Data: Download the processed data as a CSV file, allowing for further analysis or sharing with colleagues.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
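
&lt;p&gt;To give a feel for how little code such an interface needs, here is a minimal Streamlit sketch of the upload-preview-insights pattern. This is not Dexplorer's actual source, just an illustration of the same idea:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import streamlit as st

st.title("Mini data explorer")

uploaded = st.file_uploader("Upload a CSV file", type=["csv"])
if uploaded is not None:
    df = pd.read_csv(uploaded)

    # Basic insights: size, missing values, duplicates
    st.write(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
    st.write(f"Missing values: {int(df.isna().sum().sum())}")
    st.write(f"Duplicate rows: {int(df.duplicated().sum())}")

    # Row previews
    st.subheader("First and last rows")
    st.dataframe(df.head())
    st.dataframe(df.tail())

    # Descriptive statistics for numeric columns
    st.subheader("Summary statistics")
    st.dataframe(df.describe())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;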

&lt;h3&gt;
  
  
  Enhancements and Future Plans
&lt;/h3&gt;

&lt;p&gt;While Dexplorer's current version provides a basic yet powerful set of features, I have plans to expand its capabilities in future iterations. I aim to address more advanced data exploration techniques and incorporate user feedback to improve the application's functionality and usability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try Dexplorer Today
&lt;/h3&gt;

&lt;p&gt;You can access the live version of Dexplorer hosted on Streamlit's cloud deployment &lt;a href="https://andrewmuhoro-dexplorer-app-39bddc.streamlit.app/"&gt;here&lt;/a&gt;. I welcome your ideas and suggestions for future enhancements. Feel free to reach out and share any features or improvements that you believe would add value to Dexplorer.&lt;/p&gt;

&lt;p&gt;By simplifying the data exploration process, Dexplorer empowers analysts of all skill levels to gain meaningful insights from their datasets efficiently and effectively. Discover the power of automated data exploration today with Dexplorer!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>streamlit</category>
      <category>dexplorer</category>
      <category>data</category>
    </item>
    <item>
      <title>A Friendly Data Science Workflow</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Fri, 07 Apr 2023 11:02:37 +0000</pubDate>
      <link>https://dev.to/itsdru/a-friendly-data-science-workflow-20l3</link>
      <guid>https://dev.to/itsdru/a-friendly-data-science-workflow-20l3</guid>
      <description>&lt;p&gt;As a human being learning to build my problem-solving skills, I've found that data science projects can often feel daunting and overwhelming without a clear roadmap. However, by breaking down the project into smaller steps and following a simple workflow, I've discovered that the process becomes more manageable and less intimidating.&lt;/p&gt;

&lt;p&gt;From conception to completion, the steps involved in solving a problem in data science are iterative, much like in other areas. But the key to success is having a clear understanding of the problem you want to solve, the data you need, and the tools you'll use to analyze and model that data.&lt;/p&gt;

&lt;p&gt;One powerful approach to problem-solving is first-principle thinking, which involves breaking down a problem into its fundamental elements and reasoning from those basic principles. By taking this approach, you can develop a deeper understanding of the problem and identify more effective solutions.&lt;/p&gt;

&lt;p&gt;But first-principle thinking is just one part of a successful data science workflow. It's also important to have a clear plan for data collection, cleaning, and preprocessing, as well as a solid understanding of the tools and technologies needed to build and deploy models. Let us explore the steps of a basic data science workflow with an example that is hosted &lt;a href="https://github.com/andrewmuhoro/MilkGrade/blob/main/Predict_Milk_Quality.ipynb"&gt;here&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Define the problem: Start by clearly defining the problem you want to solve and identify what you want to achieve.&lt;br&gt;
&lt;em&gt;We want to predict the quality of milk based on certain parameters such as fat content, pH, temperature, turbidity, etc.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collect and clean data: Gather data from various sources and clean it so it's ready for analysis. &lt;br&gt;
&lt;em&gt;Gather data on milk quality from various sources such as dairy farms or milk processing plants. Clean the data to remove any missing values or outliers. In this case, we will just download a kaggle dataset that is already cleaned.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Download the dataset
!kaggle datasets download -d harinuu/milk-quality-prediction

# Unzip the downloaded dataset
!unzip milk-quality-prediction.zip

# Load data into a dataframe
data = pd.read_csv('milknew.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Analyze data: Use exploratory data analysis to find patterns and insights in the data.
&lt;em&gt;We can plot the distribution of milk quality scores and see if there are any correlations between the different parameters.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# plot a scatter plot to visualize any correlation between fat and color  
plt.scatter(data_cp['Fat'], data_cp['Turbidity'])
plt.xlabel('Fat Content')
plt.ylabel('Turbidity')
plt.title('Correlation between Fat and Turbidity')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create features: Create new features or transform existing ones to extract more useful information.&lt;br&gt;
&lt;em&gt;For example, we can calculate the ratio of fat content to Turbidity to see if this has an impact on milk quality.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Train a model: Choose a machine learning algorithm, train it on the data, and evaluate its performance.&lt;br&gt;
&lt;em&gt;For example, we can use a support vector machine (SVM) classifier to predict milk quality based on the parameters we've collected. We'll split the data into a training set and a testing set, using the training set to train the model and the testing set to evaluate its performance.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)


print("Accuracy of SVM classifier: {:.2f}%".format(accuracy_svm * 100))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Optimize the model: Fine-tune the model by adjusting its parameters to improve its performance.&lt;br&gt;
&lt;em&gt;For example, we can try different kernels or values of the regularization parameter C for the SVM classifier and see which settings give the best results.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluate the model: Test the model's performance on a validation dataset to make sure it can generalize well.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Evaluate the support vector machine classifier
y_pred_svm = svm.predict(X_test)
accuracy_svm = accuracy_score(y_test, y_pred_svm)

print(classification_report(y_test, y_pred_svm))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Deploy the model: Once the model is ready, deploy it in a production environment so it can be used by others.
&lt;em&gt;This could involve creating a web application or integrating the model into an existing software system; a minimal sketch follows below.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
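
&lt;p&gt;As a hedged illustration (this is not part of the linked notebook, and the model file name is a placeholder), a tiny Flask app could serve the trained model, assuming it was first saved with joblib:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal deployment sketch: serve the trained classifier over HTTP
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("milk_quality_svm.joblib")  # placeholder: a model saved earlier with joblib.dump

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object whose keys match the training feature columns
    features = pd.DataFrame([request.get_json()])
    prediction = model.predict(features)[0]
    return jsonify({"grade": str(prediction)})

if __name__ == "__main__":
    app.run(port=5000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;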

&lt;p&gt;To make this workflow easier, you can use some tools like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Virtual environment: A virtual environment is a way to create an isolated environment for your project so that the dependencies and packages you use in your project don't conflict with other projects or the system-level packages. You can create a virtual environment using tools like virtualenv, conda, or pipenv.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Requirements.txt: A requirements.txt file is a text file that lists all the packages and dependencies needed for your project. This file makes it easy for others to install and set up your project without having to manually install all the dependencies (a short example follows after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;.gitignore: A .gitignore file is a configuration file that tells Git which files or directories to ignore when tracking changes to your project. This is useful when you have files or directories that don't need to be version controlled, such as temporary files, log files, or large data files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Version Control (DVC): DVC is a version control system for data and models that works alongside Git. DVC makes it easy to track changes to your data and models, collaborate with others, and reproduce experiments. DVC also provides tools for data pipeline management, data versioning, and data storage. You can refer to this &lt;a href="https://dev.to/itsdru/introduction-to-data-version-control-fpg"&gt;article&lt;/a&gt; I did on DVC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker: Docker is a containerization platform that allows you to package your project and its dependencies into a container that can be run on any platform or environment. Docker makes it easy to deploy and scale your project in a consistent and reproducible way. With Docker, you can create a container image of your project that includes all the dependencies, configurations, and files needed to run it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
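
&lt;p&gt;For instance, a requirements.txt for the milk-quality example above might look like this (the package list is inferred from the snippets shown, not taken from the linked repository):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# requirements.txt -- packages used in the snippets above
pandas
matplotlib
scikit-learn
kaggle
flask
joblib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;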

&lt;p&gt;By combining first-principle thinking with a simple workflow and the right tools, you can approach data science projects with more confidence and focus, reducing the likelihood of giving up and increasing your chances of success.&lt;/p&gt;

&lt;p&gt;Why don't scientists trust atoms?&lt;br&gt;
Because they make up everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>workflow</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Introduction to Data Version Control</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Tue, 28 Mar 2023 12:00:33 +0000</pubDate>
      <link>https://dev.to/itsdru/introduction-to-data-version-control-fpg</link>
      <guid>https://dev.to/itsdru/introduction-to-data-version-control-fpg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;While using &lt;a href="https://git-scm.com/"&gt;Git&lt;/a&gt;, I have come to learn of "Git for data", specifically &lt;a href="https://dvc.org/"&gt;Data Version Control, DVC&lt;/a&gt;. This is an open-source tool that works like Git to manage versioning for data science projects.&lt;/p&gt;

&lt;p&gt;It was developed by &lt;a href="https://iterative.ai/"&gt;iterative&lt;/a&gt; to help teams build models faster through data and experiment versioning and reproducible pipelines. &lt;/p&gt;

&lt;p&gt;It is designed to simplify the process of tracking changes and collaborating on projects, and is increasingly becoming an essential tool for data scientists and machine learning engineers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;The main difference between Git and DVC is the purpose they both serve. Git is primarily a version control system for source code, while DVC is a version control system for data and machine learning models.&lt;/p&gt;

&lt;p&gt;The two have a somewhat similar structure in how they are used to control versioning.&lt;/p&gt;

&lt;p&gt;Using dvc, data experts can store and version control their datasets in a central repository, much like a code repository, ensuring collaborators have seamless access to the latest project version. The tool also versions machine learning models, making it easy to track changes to models, experiment with different parameters and techniques, and keep records of previous versions.&lt;/p&gt;

&lt;p&gt;A key benefit of dvc is its seamless integration with existing machine learning frameworks like &lt;a href="https://www.tensorflow.org/"&gt;TensorFlow&lt;/a&gt;, &lt;a href="https://pytorch.org/"&gt;PyTorch&lt;/a&gt; and &lt;a href="https://scikit-learn.org/stable/#"&gt;scikit-learn&lt;/a&gt;. It also provides a range of other useful features, such as data and model pipelines, automated experiments, and visualization tools, which can automate many of the repetitive aspects of data science projects.&lt;/p&gt;

&lt;p&gt;Using &lt;a href="https://iterative.ai/"&gt;iterative&lt;/a&gt;'s &lt;a href="https://studio.iterative.ai/"&gt;Studio&lt;/a&gt;, for example, one can automate bookkeeping tasks such as visualizing important metrics across projects, or iterate faster by re-using code in a no-code environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;In this example, we will use git to version a Python file and dvc to version a data file and a trained machine learning model. We will also go step by step through how versioning works: initialize, add, commit, etc. The task instructions for both git and dvc are listed in the same block to compare the two systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initialize a repository
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Initialize git repository
git init

# Initialize dvc repository
dvc init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Add file to created repository
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Add a file to the git repository
git add example.py

# Add data file to dvc repository
dvc add data_file.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Commit changes to repository
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Commit the file to the git repository
git commit -m "Initial commit"

# Commit data file to dvc repository
git add data_file.csv.dvc
git commit -m "Add data file to dvc repository"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Make changes
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Make changes to the file in git
echo "print('Hello, World!')" &amp;gt;&amp;gt; example.py

# Train machine learning model in dvc
python train_model.py data_file.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Add changes
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Add the changes to the git repository
git add example.py

# Add trained model to dvc repository
dvc add model.pkl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Commit changes
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Commit the changes to the repository
git commit -m "Add print statement"

# Commit trained model to dvc repository
git add model.pkl.dvc
git commit -m "Add trained model to dvc repository"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
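
&lt;p&gt;In a real project you would typically also configure remote storage and push the dvc-tracked files there so collaborators can pull them. A minimal sketch (the remote name and bucket URL are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Configure a default dvc remote
dvc remote add -d storage s3://mybucket/dvcstore

# Upload the dvc-tracked data and model to the remote
dvc push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;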



&lt;p&gt;As observed above, even though the two share a lot of similarities, they have different commands and workflows tailored to their specific use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;DVC is an essential tool for data scientists and machine learning engineers who are looking to streamline their workflow and collaborate effectively. It is a tool worth checking out for anyone doing data science/machine learning related projects. &lt;/p&gt;

&lt;p&gt;Please note this is not meant to be a comprehensive knowledge check; rather, it is a quick run over what the tool is. &lt;/p&gt;

&lt;p&gt;JokeofTheDay: Why did the data scientist use both Git and DVC?&lt;br&gt;
Because he didn't want to get data-tached from his version control!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>git</category>
      <category>dataversioncontrol</category>
    </item>
    <item>
      <title>Getting started with Sentiment Analysis</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Tue, 21 Mar 2023 15:33:36 +0000</pubDate>
      <link>https://dev.to/itsdru/getting-started-with-sentiment-analysis-556a</link>
      <guid>https://dev.to/itsdru/getting-started-with-sentiment-analysis-556a</guid>
      <description>&lt;h3&gt;
  
  
  Intro'
&lt;/h3&gt;

&lt;p&gt;Sentiment analysis is a technique used to determine the emotional tone behind a particular text. For example, a business can use sentiment analysis to classify reviews as positive, negative or neutral.&lt;/p&gt;

&lt;p&gt;By looking at online reviews, insights can be gained into the sentiment behind each review, and the common themes frequently mentioned across reviews can be identified. Based on these insights, a business, organisation or individual can make informed decisions in their respective operations.&lt;/p&gt;

&lt;p&gt;Advances in technology have made it possible for systems to learn how to perform tasks through Artificial Intelligence, AI. It is therefore also possible to teach a system to perform sentiment analysis, removing the need for repetitive manual analysis of the data.&lt;/p&gt;

&lt;p&gt;In this article, we will briefly go over how to get a computer to perform sentiment analysis by itself using machine learning algorithms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataset
&lt;/h3&gt;

&lt;p&gt;In order to do this, we will use a collection of about 1.6 million tweets. This dataset &lt;a href="https://www.kaggle.com/datasets/kazanova/sentiment140"&gt;Sentiment140&lt;/a&gt; is hosted on &lt;a href="https://www.kaggle.com/"&gt;Kaggle&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The tweets in the dataset were collected in February 2009 using the Twitter API and were labeled with sentiment polarity using emoticons present in the tweets. For instance, tweets with positive emoticons like :) were labeled as positive, tweets with negative emoticons like :( were labeled as negative, and tweets without any emoticons were labeled as neutral.&lt;/p&gt;

&lt;p&gt;The Sentiment140 dataset is commonly used in research and industry for sentiment analysis tasks due to its large size and labeled sentiment polarity. Researchers and practitioners can use this dataset to develop and evaluate machine learning models for sentiment analysis tasks, such as sentiment classification or sentiment regression.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;As with any data science project, there are general steps involved in performing the analysis. In this case, here are the steps:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Data Collection:
&lt;/h4&gt;

&lt;p&gt;Instead of downloading the data to the local machine, the dataset will be extracted from Kaggle directly into &lt;a&gt;Colab&lt;/a&gt; where the analysis will happen.&lt;/p&gt;

&lt;p&gt;Authenticating the Kaggle API client&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Get the username and key from your Kaggle account
os.environ['KAGGLE_USERNAME'] = "username"
os.environ['KAGGLE_KEY'] = "key"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download and unzip the dataset from Kaggle&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!kaggle datasets download -d kazanova/sentiment140

# Unzip the downloaded dataset
!unzip sentiment140
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Load the downloaded dataset&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tweets_df = pd.read_csv('training.1600000.processed.noemoticon.csv', encoding='latin-1')
tweets_df.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p1ysz6hin33wb3usmep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p1ysz6hin33wb3usmep.png" alt="Loaded DataFrame" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Data Pre-Processing:
&lt;/h4&gt;

&lt;p&gt;The next step is to preprocess the data by cleaning it and converting it into a structured format that can be used for analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Using the .columns method insert a list of the column names
tweets_df.columns = ['target', 'id', 'date', 'flag', 'user', 'text']
tweets_df.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7cj84adhshevu1bdrl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7cj84adhshevu1bdrl9.png" alt="Column Headers Added" width="800" height="119"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pre-process the text column data using regular expressions to remove elements like punctuation, special characters, URLs, hashtags, stop-words and usernames, and convert everything to lowercase.&lt;/p&gt;

&lt;p&gt;Before making any structural changes to the dataset, I created a copy of the original dataset and worked on the copy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# import NLTK, Natural Language Toolkit, library
# This library provides good tools for loading and cleaning text
import nltk
import re
from nltk.corpus import stopwords

nltk.download('stopwords')

stop_words = set(stopwords.words('english'))

# define a function to implement the pre-processing &amp;amp; cleaning of the text data
def clean_text(text):
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'@[^\s]+', '', text)  # Remove usernames
    text = re.sub(r'#([^\s]+)', r'\1', text)  # Strip the # from hashtags but keep the word
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = text.lower()  # Convert to lowercase
    text = ' '.join([word for word in text.split() if word not in stop_words])  # Remove stopwords
    return text

# Work on a copy of the original dataframe so the raw data stays intact
tweets_cp = tweets_df.copy()

# Apply the above clean_text function to the text column values
# Drop the text column after adding the clean_text column to the dataframe
tweets_cp['clean_text'] = tweets_cp['text'].apply(clean_text)
tweets_cp = tweets_cp.drop(['text'], axis=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faperwef0j7sijh4um5nh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faperwef0j7sijh4um5nh.png" alt="Cleaned Text" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Feature Extraction
&lt;/h4&gt;

&lt;p&gt;After data preprocessing, the next step is to convert the preprocessed text into a numerical format that can be used for analysis. This involves a technique like TF-IDF, Term Frequency-Inverse Document Frequency, which scores how relevant a word is to a document relative to the rest of the corpus. A tiny worked illustration follows the vectorizer snippet below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Convert the text data into numerical features using TF-IDF
tfidf = TfidfVectorizer(stop_words='english', max_features=5000)
X = tfidf.fit_transform(tweets_cp['clean_text'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
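
&lt;p&gt;To make the TF-IDF idea concrete, here is a tiny worked illustration on a made-up three-document corpus (the documents are just examples):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['good phone', 'bad phone', 'good service']
vec = TfidfVectorizer()
X_demo = vec.fit_transform(docs)

# Words that appear in fewer documents ('bad', 'service') get a higher
# weight than words spread across the corpus ('good', 'phone')
print(vec.get_feature_names_out())
print(X_demo.toarray().round(2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;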





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, tweets_cp['target'], test_size=0.3, random_state=42)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Model Selection
&lt;/h4&gt;

&lt;p&gt;The next step is to pick an appropriate machine learning algorithm to classify the sentiment of the tweet text. In this case, we will try Naive Bayes, instantiating the classifier here and training it in the next step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Train a Naive Bayes classifier on the training data
nb = MultinomialNB()
nb.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  5. Model Training
&lt;/h4&gt;

&lt;p&gt;We will train the model using the labeled training dataset that we split in the Feature Extraction step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Test the model on the testing data
y_pred = nb.predict(X_test)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  6. Model Evaluation
&lt;/h4&gt;

&lt;p&gt;After training the model, we need to evaluate its performance on the test dataset (30% of the original dataset) that we split in the Feature Extraction section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1-Score:', f1_score(y_test, y_pred))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without any fine-tuning, the model has an accuracy score of 75% and a precision of 75%.&lt;/p&gt;

&lt;p&gt;Evaluation Score&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Accuracy: 0.7511354166666667
Precision: 0.7564523638210522
Recall: 0.7427183457378064
F1-Score: 0.7495224455818614
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  7. Predict
&lt;/h4&gt;

&lt;p&gt;We will try to predict the sentiment of a new tweet using the model we have trained, tested and evaluated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_tweet = 'I hate Mondays'
new_tweet_cleaned = clean_text(new_tweet)
new_tweet_vectorized = tfidf.transform([new_tweet_cleaned])
sentiment = nb.predict(new_tweet_vectorized)[0]
print('Sentiment:', sentiment)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model predicts the new tweet has a negative tone.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Sentiment: 0&lt;/code&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Sentiment analysis can help gauge how the outside world feels about a business, product, trend and much more. With the integration of machine learning models into such analysis, the results can be outstanding. Even a simple model like the one we just built, with some fine-tuning, can really inform decision-making.&lt;/p&gt;

&lt;p&gt;You can find the model code at this &lt;a href="https://github.com/andrewmuhoro/Tweets-SentimentAnalysis"&gt;Link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why did the sentiment analyst's computer keep crashing? It couldn't handle all the feelings.&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>sentimentanalysis</category>
      <category>machinelearning</category>
      <category>naivebayes</category>
    </item>
    <item>
      <title>Essential SQL Commands for Data Science Tasks</title>
      <dc:creator>Itsdru</dc:creator>
      <pubDate>Tue, 14 Mar 2023 08:21:07 +0000</pubDate>
      <link>https://dev.to/itsdru/essential-sql-commands-for-data-science-tasks-m3i</link>
      <guid>https://dev.to/itsdru/essential-sql-commands-for-data-science-tasks-m3i</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;SQL is a language for asking, or 'requesting', a store that holds information to provide you with the specific information you are looking for. A real-life example that illustrates why SQL is essential is going to a library that holds thousands of books when you want to read or borrow one specific book. You could manually search the many shelves yourself, or you could simply ask the librarian whether they have the book, or to point you in the right direction to locate it.&lt;/p&gt;

&lt;p&gt;Asking the librarian is efficient and straightforward, as they have access to a library system that keeps a record of the books they hold, whether on the shelves or out on loan. The other way may take you days if you don't know how to locate the right section, and you may spend all that time searching only to find out they don't have the book. SQL, in this case, is the librarian: you give it requests and it provides the information you are looking for.&lt;/p&gt;

&lt;p&gt;SQL is essential for working with data as it makes it possible to make queries to databases that may hold a few rows or as many as millions of rows. SQL(Structured Query Language) is a programming language used for managing and manipulating data in relational databases.&lt;/p&gt;

&lt;p&gt;SQL allows you to store, manipulate and retrieve data. Reasons SQL is widely used in transactional processing and analytical applications include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inserting, updating, and deleting data from a relational database.&lt;/li&gt;
&lt;li&gt;Describing structured data.&lt;/li&gt;
&lt;li&gt;Building, deleting and updating databases and tables.&lt;/li&gt;
&lt;li&gt;Establishing permissions and restrictions for table columns, views and stored procedures.&lt;/li&gt;
&lt;li&gt;Accessing data from a relational database management system.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Real Life Example of SQL at work
&lt;/h3&gt;

&lt;p&gt;Imagine you have a company that sells products, and you want to keep track of your inventory. You can create a database with a table called "products" that has columns such as "product_id", "product_name", "price", and "quantity_in_stock".&lt;/p&gt;

&lt;p&gt;To add a new product to the database, you would use an SQL INSERT statement. For example, to add a new product with product_id = 1001, product_name = "iPhone 13", price = 999.99, and quantity_in_stock = 50, you would write the following SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO products (product_id, product_name, price, quantity_in_stock) 
VALUES (1001, 'iPhone 13', 999.99, 50);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To update the price of an existing product, you would use an SQL UPDATE statement. For example, to update the price of the product with product_id = 1001 to 1099.99, you would write the following SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE products 
SET price = 1099.99 
WHERE product_id = 1001;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To retrieve information about the products in your database, you would use an SQL SELECT statement. For example, to retrieve the product_id, product_name, price, and quantity_in_stock for all products in the database, you would write the following SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT product_id, product_name, price, quantity_in_stock 
FROM products;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
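
&lt;p&gt;Deleting data follows the same pattern. For example, if the product with product_id = 1001 were discontinued, you could remove its row with an SQL DELETE statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM products 
WHERE product_id = 1001;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;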



&lt;p&gt;This is just a simple example of what SQL can do, but it should give you an idea of how it can be used to manage and manipulate data in a database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Essential SQL Commands
&lt;/h3&gt;

&lt;p&gt;Please note that this post is by no means aimed to be a comprehensive list of the commands you need to know.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Retrieval
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;SELECT&lt;/em&gt; - Used to retrieve data from a database.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT * FROM customers;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will retrieve all data from the customers table.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;DISTINCT&lt;/em&gt; - Used to retrieve only the unique values from a column in a table.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT DISTINCT category FROM products;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will retrieve all the unique categories from the table products.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Retrieval with Conditions
&lt;/h4&gt;

&lt;p&gt;In this case, the data retrieved must meet a specified condition.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;WHERE&lt;/em&gt; - This command is used to filter data based on certain conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM customers 
WHERE age &amp;gt; 30;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will retrieve all the data from the customers table where their age is greater than 30.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ORDER BY&lt;/em&gt; - This command sorts the data in a descending or ascending order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM customers 
ORDER BY age DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will retrieve all data from the customers table and order it by age in a descending order.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;LIMIT&lt;/em&gt; - This command limits the retrieved data to the specified count.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM customers 
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will retrieve the first 10 rows from the customers table.&lt;/p&gt;

&lt;p&gt;You can also specify the starting row for the retrieved data using the OFFSET clause.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM customers 
LIMIT 5 
OFFSET 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will retrieve 5 rows from the customers table starting from the 11th row.&lt;/p&gt;

&lt;h4&gt;
  
  
  Aggregations
&lt;/h4&gt;

&lt;p&gt;Aggregations are used to get a summary of a dataset.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GROUP BY&lt;/em&gt; - This command groups data based on the specified criteria.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT country, COUNT(*) FROM customers 
GROUP BY country;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will retrieve the count of customers in each country.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;COUNT()&lt;/em&gt; - This command is used to count the number of rows that meet a specific condition in a table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(*) FROM customers 
WHERE country = 'USA';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will return the number of rows where the customers have 'USA' as their country.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;SUM()&lt;/em&gt; - This command is used to total the values in a specified column.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT SUM(sales) FROM orders;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will return the sum of the values in the "sales" column of the table "orders".&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AVG()&lt;/em&gt; - This command is used to calculate the average of the values in the specified column.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT AVG(salary) FROM employees;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will return the average of the values in the "salary" column of table "employees".&lt;/p&gt;

&lt;p&gt;&lt;em&gt;HAVING&lt;/em&gt; - This command is used in combination with the GROUP BY command to filter out results based on a condition applying to the groups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT category, SUM(sales) FROM products 
GROUP BY category 
HAVING SUM(sales) &amp;gt; 1000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will group the rows in the "products" table by "category" and calculate the sum of sales for each group. The HAVING clause then filters the results to only include groups where the sum of sales is greater than 1000.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;MIN()&lt;/em&gt; - This command is used to find the minimum value in a specified column.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT MIN(price) FROM products;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will return the minimum value in the "price" column of table "products".&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Alias&lt;/em&gt; - This gives a temporary name to a table or column in a query. Aliases are used to make queries easier to read, or to avoid naming conflicts when combining data from multiple tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT p.product_name AS name, s.quantity AS stock 
FROM products p JOIN stock s ON p.product_id = s.product_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query joins the "products" and "stock" tables using aliases to rename the "product_name" and "quantity" columns to "name" and "stock" respectively.&lt;/p&gt;

&lt;h4&gt;
  
  
  Joins
&lt;/h4&gt;

&lt;p&gt;Joins are used to combine data from multiple tables into a single result set based on a common key shared by the tables.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;INNER JOIN&lt;/em&gt; - Inner join returns only the rows that have matching values in both tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query will return the Name of the customer, OrderID, and OrderDate for all customers who have placed an order.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;LEFT JOIN&lt;/em&gt; - Returns all the rows from the left table and the matching rows from the right table. If there is no matching row in the right table, the result will contain NULL values for the right table columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;RIGHT JOIN&lt;/em&gt; - Returns all the rows from the right table and the matching rows from the left table. If there is no matching row in the left table, the result will contain NULL values for the left table columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
RIGHT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;FULL OUTER JOIN&lt;/em&gt; - Returns all the rows from both tables, including those with no matching rows in the other table. If there is no matching row in one of the tables, the result will contain NULL values for the columns of the other table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
FULL OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Create Database
&lt;/h4&gt;

&lt;p&gt;To create a database in SQL, you can use the CREATE DATABASE statement followed by the database name.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE DATABASE my_database;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will create a database named "my_database".&lt;/p&gt;

&lt;h4&gt;
  
  
  Create Table
&lt;/h4&gt;

&lt;p&gt;To create a table in SQL, you can use the CREATE TABLE statement followed by the table name and column definitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Customers (
CustomerID int,
Name varchar(255),
Address varchar(255)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a table named "Customers" with columns for CustomerID, Name, and Address.&lt;/p&gt;

&lt;h4&gt;
  
  
  Change Data Types
&lt;/h4&gt;

&lt;p&gt;To change the data type of a column in SQL, you can use the ALTER TABLE statement followed by the table name and column definition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE Customers
ALTER COLUMN CustomerID varchar(50);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will change the data type of the CustomerID column in the Customers table from int to varchar(50).&lt;/p&gt;

&lt;h4&gt;
  
  
  Complex Conditions
&lt;/h4&gt;

&lt;p&gt;SQL supports complex conditions using logical operators such as AND, OR, and NOT. You can also use parentheses to group conditions.&lt;/p&gt;

&lt;p&gt;Suppose we have a table named Products with columns for ProductID, ProductName, Category, and Price. We want to retrieve all products in the "Electronics" category that are either priced at $100 or less or have "Discounted" in their product name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT *
FROM Products
WHERE Category = 'Electronics'
AND (Price &amp;lt;= 100 OR ProductName LIKE '%Discounted%');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AND operator requires the category to match, while the parenthesised OR allows either the price condition or the product-name condition to satisfy the filter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;SQL is an essential tool for data scientists and data analysts alike. With its ability to manipulate and retrieve data, SQL is indispensable for managing and analyzing large datasets. By mastering the essential SQL commands, data scientists can more effectively and efficiently work with relational databases, providing insights that can drive critical business decisions.&lt;/p&gt;

&lt;p&gt;This is just the tip of the iceberg of what you can do with SQL, and this is definitely not a comprehensive list of commands.&lt;/p&gt;

&lt;p&gt;In conclusion, we wondered why the SQL query crossed the road, only to find out it went to get the other SELECT-ion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this &lt;a href="https://linktr.ee/andrewmuhoro"&gt;Link&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>sql</category>
      <category>programming</category>
      <category>computerscience</category>
    </item>
  </channel>
</rss>
