<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Catherine Kawara</title>
    <description>The latest articles on DEV Community by Catherine Kawara (@ckawara).</description>
    <link>https://dev.to/ckawara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F852577%2F59e3fc07-14e2-457c-b6bb-afc7c58bfafb.jpg</url>
      <title>DEV Community: Catherine Kawara</title>
      <link>https://dev.to/ckawara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ckawara"/>
    <language>en</language>
    <item>
      <title>Getting started with sentiment analysis</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Tue, 21 Mar 2023 12:11:16 +0000</pubDate>
      <link>https://dev.to/ckawara/getting-started-with-sentiment-analysis-55g2</link>
      <guid>https://dev.to/ckawara/getting-started-with-sentiment-analysis-55g2</guid>
      <description>&lt;p&gt;Sentiment analysis is a technique used to determine the emotional tone of a piece of text. It involves the use of natural language processing, machine learning, and other computational methods to classify text into categories such as positive, negative, or neutral. &lt;br&gt;
Sentiment analysis is used in a wide range of applications, including social media monitoring, customer feedback analysis, and market research.&lt;br&gt;
In this article, we'll provide an introduction to sentiment analysis and walk you through the steps to get started.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Define Your Problem
&lt;/h3&gt;

&lt;p&gt;What type of text do you want to analyze, and what do you want to learn from it? Some common applications of sentiment analysis include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Social media monitoring&lt;/strong&gt;: Analyzing tweets, Facebook posts, and other social media content to track brand sentiment and customer opinions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer feedback analysis&lt;/strong&gt;: Analyzing customer reviews, surveys, and other feedback to identify areas for improvement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market research&lt;/strong&gt;: Analyzing news articles, blogs, and other content to identify trends and track sentiment about particular products or companies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you've defined your problem, you can begin to gather the data you need to perform sentiment analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Gather Data
&lt;/h3&gt;

&lt;p&gt;You'll need a dataset of text that you want to analyze. There are several ways to gather data for sentiment analysis. You can collect data manually by downloading social media posts or customer reviews, or you can use an API to automatically collect data from social media platforms or review sites.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Preprocess Data
&lt;/h3&gt;

&lt;p&gt;Preprocessing involves cleaning the text data and preparing it for analysis. Common preprocessing steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lowercasing&lt;/strong&gt;: Convert all text to lowercase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noise removal&lt;/strong&gt;: Remove unnecessary information, such as URLs and special characters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Removing stop words&lt;/strong&gt;: Stop words are common words such as "and" and "the" that don't carry much meaning. Removing them can improve the accuracy of your analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization&lt;/strong&gt;: Tokenization involves breaking the text into individual words or phrases, which can then be analyzed separately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stemming&lt;/strong&gt;: Stemming involves reducing words to their base form (e.g., "running" becomes "run"). This can help to reduce the number of unique words in the dataset, making analysis easier.&lt;/li&gt;
&lt;/ul&gt;
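&lt;p&gt;The steps above can be sketched in plain Python. This is a minimal illustration only: the stop-word list is a toy one and the stemmer is deliberately naive, whereas a real pipeline would typically use NLTK's stop-word corpus and a proper stemmer such as PorterStemmer:&lt;/p&gt;

```python
import re

# Toy stop-word list for illustration; libraries such as NLTK ship full lists.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}

def stem(word):
    # Very naive stemming: strip a few common suffixes.
    # A real stemmer (e.g. NLTK's PorterStemmer) handles many more cases.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                          # lowercasing
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)        # remove special characters
    tokens = text.split()                        # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [stem(t) for t in tokens]             # stemming

print(preprocess("Check out https://example.com! The movie was GREAT and exciting"))
# ['check', 'out', 'movie', 'was', 'great', 'excit']
```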

&lt;h3&gt;
  
  
  4. Choose a Sentiment Analysis Tool
&lt;/h3&gt;

&lt;p&gt;Once you've preprocessed your data, you can begin to perform sentiment analysis. There are several tools available for sentiment analysis, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rule-based systems&lt;/strong&gt;: Rule-based systems use predefined rules to classify text into positive, negative, or neutral categories. These systems can be useful for simple analyses, but they may not be accurate for complex data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine learning systems&lt;/strong&gt;: Machine learning systems use algorithms to learn from data and classify text based on patterns. These systems can be more accurate than rule-based systems, but they require more training data.&lt;/li&gt;
&lt;/ul&gt;
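&lt;p&gt;To make the rule-based idea concrete, here is a minimal lexicon-based classifier. The word lists are invented for illustration; production lexicons such as VADER contain thousands of scored entries:&lt;/p&gt;

```python
# Tiny illustrative sentiment lexicon (real lexicons are far larger).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def classify(text):
    words = text.lower().split()
    # Score = (# positive words) - (# negative words)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love this great product"))   # positive
print(classify("What terrible awful service")) # negative
```

Even this toy version shows the limits of rules: it has no notion of negation ("not great") or intensity, which is where machine learning systems tend to do better.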

&lt;h3&gt;
  
  
  5. Analyze Your Data
&lt;/h3&gt;

&lt;p&gt;The final step is to analyze your data. Depending on the sentiment analysis tool you've chosen, you may be able to get a simple positive/negative/neutral classification for each piece of text, or you may be able to get a more detailed analysis that includes information such as sentiment intensity and topic analysis.&lt;/p&gt;

&lt;p&gt;In conclusion, sentiment analysis can be a powerful tool for understanding the emotional tone of text data. By defining your problem, gathering data, preprocessing the data, choosing a sentiment analysis tool, and analyzing the data, you can gain valuable insights into customer opinions, brand sentiment, and market trends. &lt;br&gt;
With the availability of open-source libraries like NLTK, spaCy, and TextBlob, it's easier than ever to get started with sentiment analysis.&lt;/p&gt;

&lt;p&gt;For a practical example of sentiment analysis using Python and the NLTK library, check out this &lt;a href="https://github.com/CKawara/Twitter-Sentiment-Analysis"&gt;GitHub repository&lt;/a&gt;.&lt;br&gt;
I performed sentiment analysis on tweets fetched from the &lt;a href="https://www.kaggle.com/datasets/kazanova/sentiment140"&gt;sentiment140 dataset&lt;/a&gt; using Python's NLTK library, Pandas for data manipulation, Matplotlib for data visualization, and Scikit-learn for building machine learning models.&lt;/p&gt;

&lt;p&gt;Till next time, happy coding!✌️&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Essential SQL Commands for Data Science</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Wed, 15 Mar 2023 11:46:47 +0000</pubDate>
      <link>https://dev.to/ckawara/essential-sql-commands-for-data-science-ph</link>
      <guid>https://dev.to/ckawara/essential-sql-commands-for-data-science-ph</guid>
      <description>&lt;p&gt;In the previous article, &lt;a href="https://dev.to/ckawara/introduction-to-sql-for-data-analysis-3ab1"&gt;Introduction to SQL for Data Analysis&lt;/a&gt;, I covered the basics of SQL syntax, data types, operators, and table creation/modification. In this article, we will delve into more advanced SQL features for complex data analysis tasks and cover some of the essential SQL commands that every data scientist should know.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. SELECT
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;SELECT&lt;/code&gt; is the most basic and important command in SQL. It is used to retrieve data from a table. The basic syntax of the SELECT command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ... FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. WHERE
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;WHERE&lt;/code&gt; is used to filter data based on certain conditions. The basic syntax of the WHERE command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ... FROM table_name WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. GROUP BY
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;GROUP BY&lt;/code&gt; is used to group rows based on the values in one or more columns. The basic syntax of the GROUP BY command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ..., aggregate_function(column_name)
FROM table_name
GROUP BY column1, column2, ...;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;aggregate_function&lt;/code&gt; is a function used to perform an operation on a set of values, such as &lt;code&gt;COUNT&lt;/code&gt;, &lt;code&gt;SUM&lt;/code&gt;, &lt;code&gt;AVG&lt;/code&gt;, &lt;code&gt;MAX&lt;/code&gt;, or &lt;code&gt;MIN&lt;/code&gt;. For example, if you want to retrieve the number of customers from each country in the "customers" table, you can use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT country, COUNT(*)
FROM customers
GROUP BY country;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command returns one row per country, together with the number of customers from that country.&lt;/p&gt;
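&lt;p&gt;If you'd like to run this end-to-end, here is a sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module; the "customers" rows below are invented toy data:&lt;/p&gt;

```python
import sqlite3

# In-memory database with a toy "customers" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "Kenya"), (2, "Bob", "Kenya"), (3, "Carol", "Germany")],
)

# Count customers per country (ORDER BY added for a deterministic result)
rows = conn.execute(
    "SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('Germany', 1), ('Kenya', 2)]
```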

&lt;h3&gt;
  
  
  4. ORDER BY
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ORDER BY&lt;/code&gt; is used to sort the data in ascending or descending order based on one or more columns. The basic syntax of the ORDER BY command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ... FROM table_name ORDER BY column_name [ASC|DESC];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, column_name is the name of the column that you want to sort by.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. JOIN
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;JOIN&lt;/code&gt; is used to combine rows from two or more tables based on a related column between them. There are different types of joins, including &lt;code&gt;INNER JOIN&lt;/code&gt;, &lt;code&gt;LEFT JOIN&lt;/code&gt;, &lt;code&gt;RIGHT JOIN&lt;/code&gt;, and &lt;code&gt;FULL OUTER JOIN&lt;/code&gt;. The basic syntax of the &lt;code&gt;JOIN&lt;/code&gt; command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ... FROM table1 JOIN table2 ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;table1&lt;/code&gt; and &lt;code&gt;table2&lt;/code&gt; are the names of the tables that you want to join, and &lt;code&gt;column_name&lt;/code&gt; is the name of the related column between them.&lt;/p&gt;

&lt;h4&gt;
  
  
  INNER JOIN
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;INNER JOIN&lt;/code&gt; returns only the rows in which the joined tables have matching values. For example, if we have two tables - employees and departments - with a common column "department_id", we can join them as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement will return only the rows where the "department_id" column in the employees table matches the "department_id" column in the departments table.&lt;/p&gt;

&lt;h4&gt;
  
  
  LEFT JOIN
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;LEFT JOIN&lt;/code&gt; is used to combine all rows from the left table and the matching rows from the right table. If there are no matching rows from the right table, the result will contain NULL values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ... FROM table1 LEFT JOIN table2 ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Say you want to retrieve all the customers and their orders from the "customers" and "orders" tables, even if some customers have not made any orders yet, you can use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
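&lt;p&gt;You can see the NULL-filling behaviour of &lt;code&gt;LEFT JOIN&lt;/code&gt; with a small &lt;code&gt;sqlite3&lt;/code&gt; sketch (toy data: only Alice has placed an order):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO orders VALUES (100, 1)")  # only Alice has an order

rows = conn.execute(
    """SELECT customers.customer_name, orders.order_id
       FROM customers
       LEFT JOIN orders ON customers.customer_id = orders.customer_id
       ORDER BY customers.customer_name"""
).fetchall()
print(rows)  # [('Alice', 100), ('Bob', None)]
```

Bob appears in the result even though he has no orders; his `order_id` comes back as NULL (`None` in Python).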



&lt;h4&gt;
  
  
  RIGHT JOIN
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;RIGHT JOIN&lt;/code&gt; is used to combine all rows from the right table and the matching rows from the left table. If there are no matching rows from the left table, the result will contain NULL values. Here's the syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ... FROM table1 RIGHT JOIN table2 ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  FULL OUTER JOIN
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;FULL OUTER JOIN&lt;/code&gt; is used to combine all rows from both tables, including rows that have no matching values in the other table. The basic syntax of the FULL OUTER JOIN command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ...
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, &lt;code&gt;table1&lt;/code&gt; and &lt;code&gt;table2&lt;/code&gt; are the names of the tables that you want to join, and &lt;code&gt;column_name&lt;/code&gt; is the related column between them. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT c.name, o.order_id
FROM customers c
FULL OUTER JOIN orders o
ON c.customer_id = o.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query will return all customers and their corresponding order IDs, including customers who have not placed any orders yet. For customers without orders, the order ID column will contain NULL values.&lt;/p&gt;

&lt;h4&gt;
  
  
  SELF JOIN
&lt;/h4&gt;

&lt;p&gt;Self join is a type of join where a table is joined with itself. This is useful when you want to compare data in the same table or when you have hierarchical data in a table.&lt;/p&gt;

&lt;p&gt;To perform a self join on a table, we need to give the table an alias name so that we can refer to it separately within the same query. Here's an example of a self join that retrieves the names of all employees and their corresponding supervisors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT e.employee_name, s.employee_name as supervisor_name
FROM employees e
INNER JOIN employees s ON e.supervisor_id = s.employee_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, we gave the "employees" table the alias name "e" for the first instance of the table, and "s" for the second instance of the table. We then joined the two instances of the table on the "supervisor_id" column in the first instance of the table, and the "employee_id" column in the second instance of the table.&lt;/p&gt;

&lt;p&gt;The result of the query will be a list of all employee names with their corresponding supervisor names.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Subqueries
&lt;/h3&gt;

&lt;p&gt;Subqueries are used to nest one query inside another query. They can be used to retrieve data that meets specific criteria or to create temporary tables for use in other queries.&lt;/p&gt;

&lt;p&gt;For example, if we want to retrieve the names of employees who work in departments with more than 100 employees, we can use a subquery as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT first_name, last_name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM employees
GROUP BY department_id
HAVING COUNT(*) &amp;gt; 100
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement uses a subquery to retrieve the department IDs of departments with more than 100 employees, and then retrieves the names of employees who work in those departments.&lt;/p&gt;
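&lt;p&gt;Here is the same subquery pattern run with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The data is a three-row toy table, so the threshold is lowered from 100 to 1 employee purely for illustration:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Lee", 1), ("Ben", "Kim", 1), ("Cy", "Odhiambo", 2)],
)

# Inner query: departments with more than 1 employee.
# Outer query: the employees who work in those departments.
rows = conn.execute(
    """SELECT first_name, last_name
       FROM employees
       WHERE department_id IN (
           SELECT department_id FROM employees
           GROUP BY department_id
           HAVING COUNT(*) > 1
       )
       ORDER BY first_name"""
).fetchall()
print(rows)  # [('Ann', 'Lee'), ('Ben', 'Kim')]
```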

&lt;h3&gt;
  
  
  7. Aggregation Functions
&lt;/h3&gt;

&lt;p&gt;Aggregation functions are used to perform calculations on a set of data, such as finding the average, maximum, or minimum value in a column. Common aggregation functions include &lt;code&gt;SUM&lt;/code&gt;, &lt;code&gt;AVG&lt;/code&gt;, &lt;code&gt;MAX&lt;/code&gt;, &lt;code&gt;MIN&lt;/code&gt;, and &lt;code&gt;COUNT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, if we want to find the average salary of employees in each department, we can use the AVG function as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement groups employees by department and calculates the average salary for each department.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Window Functions
&lt;/h3&gt;

&lt;p&gt;Window functions are used to perform calculations on a subset of data within a larger data set. They can be used to calculate running totals, rankings, and other complex calculations.&lt;br&gt;
For example, to rank the sales figures for each month, we can use the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT month, sales, RANK() OVER (ORDER BY sales DESC) AS sales_rank
FROM sales_table;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;month&lt;/code&gt; is the column that contains the month name or number, &lt;code&gt;sales&lt;/code&gt; is the column that contains the sales figures for each month, and &lt;code&gt;sales_table&lt;/code&gt; is the name of the table that contains the data.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;RANK&lt;/code&gt; function ranks the rows based on a specific column, and the &lt;code&gt;OVER&lt;/code&gt; clause defines the window and the order in which the rows are ranked. &lt;code&gt;sales_rank&lt;/code&gt; is the column that will contain the rank of each month's sales figures, and &lt;code&gt;DESC&lt;/code&gt; orders the sales figures in descending order, so the highest sales receive rank 1.&lt;/p&gt;

&lt;p&gt;Overall, window functions provide a powerful tool for performing complex calculations on subsets of data within a larger data set.&lt;/p&gt;
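&lt;p&gt;The ranking query can be tried with &lt;code&gt;sqlite3&lt;/code&gt; (SQLite supports window functions since version 3.25). The sales figures are invented, and an outer &lt;code&gt;ORDER BY&lt;/code&gt; is added only to make the output deterministic; note how &lt;code&gt;RANK&lt;/code&gt; gives tied values the same rank:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_table (month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?)",
    [("Jan", 120), ("Feb", 300), ("Mar", 120)],  # Jan and Mar tie
)

rows = conn.execute(
    """SELECT month, sales, RANK() OVER (ORDER BY sales DESC) AS sales_rank
       FROM sales_table
       ORDER BY sales_rank, month"""
).fetchall()
print(rows)  # [('Feb', 300, 1), ('Jan', 120, 2), ('Mar', 120, 2)]
```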

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we covered some of the more advanced SQL features that are commonly used in data science. Joins allow us to combine data from multiple tables based on a common column, subqueries enable us to nest one query inside another query, aggregation functions let us perform calculations on a set of data, and window functions allow us to perform calculations on a subset of data within a larger dataset.&lt;/p&gt;

&lt;p&gt;As you continue to work with SQL for data analysis, it's important to keep practicing and exploring the different features and commands available. With enough practice, you'll become more comfortable with SQL and be able to use it to extract valuable insights from even the largest datasets.&lt;/p&gt;

&lt;p&gt;Till next time, happy coding!✌️&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>sql</category>
      <category>machinelearning</category>
      <category>data</category>
    </item>
    <item>
      <title>When to Use Machine Learning Solutions</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Wed, 01 Mar 2023 07:54:57 +0000</pubDate>
      <link>https://dev.to/ckawara/when-to-use-machine-learning-solutions-gdc</link>
      <guid>https://dev.to/ckawara/when-to-use-machine-learning-solutions-gdc</guid>
      <description>&lt;p&gt;Machine learning has become increasingly popular in recent years, and for a good reason. With the ability to analyze large amounts of data and make predictions based on patterns, machine learning has the potential to solve a wide range of problems. This article will explore when to use machine learning solutions and when they may be unnecessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does your problem need a machine learning solution?
&lt;/h2&gt;

&lt;p&gt;Understanding what machine learning is and what it can do is essential. Machine learning is an approach that enables computers to identify and learn complex patterns from data and make predictions or decisions based on that data.&lt;/p&gt;

&lt;p&gt;Now to our question: does your problem need a machine learning solution? Certain characteristics make a problem a good candidate for machine learning, and we will explore a few of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data
&lt;/h3&gt;

&lt;p&gt;ML algorithms require large amounts of data to learn from. If you have a problem involving a lot of data, machine learning may be a good solution.&lt;/p&gt;

&lt;p&gt;For example, you want to analyze customer behavior on an e-commerce website. In that case, you'll likely have a lot of data to work with, such as clickstream data, purchase history, and demographic information. Machine learning algorithms can help you identify patterns in this data and predict customer behavior.&lt;/p&gt;

&lt;p&gt;It is possible to launch an ML system without training data and let it learn from incoming data in production. All the same, serving insufficiently trained models to users comes with risks, such as a poor customer experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Complex patterns
&lt;/h3&gt;

&lt;p&gt;ML is especially useful when there are patterns in the data that are difficult for humans to identify.&lt;/p&gt;

&lt;p&gt;Say you want to detect credit card fraud. There may be patterns in the data that are not immediately obvious. ML algorithms can help you identify these patterns and flag potentially fraudulent transactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Dynamic environment
&lt;/h3&gt;

&lt;p&gt;ML algorithms can adapt to changing environments and make predictions based on new data. If you have a problem that involves a dynamic environment, hardcoded solutions like handwritten rules can quickly become outdated.&lt;/p&gt;

&lt;p&gt;Figuring out how your problem has changed so you can update your handwritten rules can be too expensive or impossible. This is when ML comes in.&lt;/p&gt;

&lt;p&gt;For example, if you want to predict stock prices or weather patterns, ML can analyze new data and make predictions based on the latest information.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Repetitiveness
&lt;/h3&gt;

&lt;p&gt;A repetitive problem occurs frequently and consistently over time, which means there are consistent patterns in the data for the ML algorithm to learn from. This makes it easier to collect enough data to train the algorithm effectively and to make accurate predictions on new data.&lt;/p&gt;

&lt;p&gt;Now we could continue with the list of use cases, which will keep growing as ML grows in the industry. Even though ML can solve many problems very well, there are specific problems for which machine learning may not be the best solution. For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;If your problem involves simple rules that can be easily programmed, then a traditional programming solution may be more appropriate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ML algorithms are only as good as the data they are trained on. If your data is biased, incomplete, or inaccurate, the algorithm won't be as effective as it should be. The same goes for the amount of data available: if you have a small amount of data and no way to collect more, then ML may not be effective.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Machine learning can be a powerful tool for solving certain types of problems. When considering whether to use a machine learning solution, it's important to consider the characteristics of the problem and determine whether machine learning is the best approach.&lt;/p&gt;

&lt;p&gt;Additionally, it's important to consider the resources required to implement a machine learning solution and whether you have the expertise and resources to do so.&lt;/p&gt;

&lt;p&gt;However, if ML can't solve the whole of your problem, it might be possible to break it into smaller components and use ML to solve some of them.&lt;/p&gt;

&lt;p&gt;Till next time, Happy coding!✌️&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Exploratory Data Analysis - Ultimate Guide</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Sun, 26 Feb 2023 06:45:59 +0000</pubDate>
      <link>https://dev.to/ckawara/exploratory-data-analysis-ultimate-guide-3mea</link>
      <guid>https://dev.to/ckawara/exploratory-data-analysis-ultimate-guide-3mea</guid>
      <description>&lt;p&gt;Exploratory data analysis (EDA) is an essential step in any data analysis project. It involves examining and analyzing data to uncover patterns, relationships, and insights. EDA is an iterative process that begins with simple data exploration and progresses to more complex analysis. &lt;/p&gt;

&lt;p&gt;The main objective of EDA is to gain a deeper understanding of the data and use this understanding to guide further analysis and modeling. EDA can help in making informed decisions and generating new hypotheses, which can lead to new insights and discoveries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps of EDA
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Understanding your variables
&lt;/h3&gt;

&lt;p&gt;Before beginning data cleaning, it is important first to understand the variables in your dataset. Understanding the variables will help you identify potential data quality issues and determine the appropriate cleaning techniques to use.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reading data
&lt;/h4&gt;

&lt;p&gt;We will use the &lt;a href="https://www.kaggle.com/datasets/parulpandey/2020-it-salary-survey-for-eu-region"&gt;IT Salary Survey for EU region (2018-2020) dataset&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = pd.read_csv("./Data/IT Salary Survey EU  2020.csv")

# get first five rows
data.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZVbAAw02--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c3fgfp5bdlszyombmaka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZVbAAw02--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c3fgfp5bdlszyombmaka.png" alt="data.head() output" width="880" height="389"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# get the last five columns
data.tail()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DhMmJ24X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pgrh219z7k9751dbmydx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DhMmJ24X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pgrh219z7k9751dbmydx.png" alt="data.tail() output" width="880" height="422"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data.shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns the number of rows and columns in the dataset. The output is (1253, 23), meaning our dataset has 1253 rows and 23 columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data.describe()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This summarizes the count, mean, standard deviation, min, and max for the numeric variables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Gyu3gKmR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f8s689i5pj4f5nvtddp9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Gyu3gKmR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f8s689i5pj4f5nvtddp9.png" alt="data.describe() output" width="880" height="227"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# get the name of all the columns in our dataset
data.columns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9u5ctAke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yb4ec0rounnjfh4iu6dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9u5ctAke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yb4ec0rounnjfh4iu6dj.png" alt="data.columns output" width="880" height="197"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# check for unique values
data.nunique()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6q5ZaA67--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/txqg2inyr6txzga739rf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6q5ZaA67--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/txqg2inyr6txzga739rf.png" alt="data.nunique() output" width="880" height="370"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# check unique value for specific values
data['Seniority'].unique()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KDesa0wZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zzwlvxvun63vwu93i3v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KDesa0wZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zzwlvxvun63vwu93i3v6.png" alt="unique specific values output" width="662" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cleaning the data
&lt;/h3&gt;

&lt;p&gt;Before you start exploring your data, it's essential to clean and preprocess the data. Data cleaning involves identifying and correcting errors and inconsistencies in the data, such as missing values, duplicate records and incorrect data types.&lt;br&gt;
By doing so, we can ensure that our analysis is based on high-quality data and that any insights or conclusions drawn from the data are accurate and reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Renaming columns&lt;/strong&gt;&lt;br&gt;
First, we rename our columns with shorter, more relevant names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data.columns = ["Year", "Age", "Gender","City","Position","Experience","German_Experience","Seniority","Main_language","Other_Language","Yearly_salary","Yearly_bonus","Last_year_salary","Last_year_bonus","Vacation_days","Employment_status","Сontract_duration","Language","Company_size","Company_type","Job_loss_COVID","Kurzarbeit","Monetary_Support"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Remove any columns we don't need&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data.drop([
        'German_Experience', # analysis focuses on only German
        'Last_year_salary', # analysis covers only 2020 
        'Last_year_bonus',  # analysis covers only 2020 
        'Job_loss_COVID', # analysis isn't related to covid
        'Kurzarbeit', # analysis isn't related to covid
        'Monetary_Support', # analysis isn't related to covid
        ], axis=1, inplace=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dealing with missing and duplicate values&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Duplicates are records that are identical in all variables. Duplicates and missing values can affect the statistical analysis and visualization. One approach to handling this is to remove them from the dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# check for missing values
data.isna().sum()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UWxCJ6Y7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c2oa9nfpyf6erofip4jz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UWxCJ6Y7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c2oa9nfpyf6erofip4jz.png" alt="Missing values" width="543" height="378"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Drop missing and duplicate values
data=data.dropna(subset=['Age','Gender','Position','Experience','Seniority','Main_language', 'Сontract_duration', 'Yearly_bonus']) 
data=data.drop_duplicates()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use &lt;code&gt;data.isna().sum()&lt;/code&gt; again to confirm that no missing values remain.&lt;/p&gt;
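&lt;p&gt;As a quick sanity check, the same drop-then-verify pattern can be reproduced on a tiny illustrative DataFrame (the columns below are made up for the sketch):&lt;/p&gt;

```python
import pandas as pd

# tiny illustrative frame with one missing value and one duplicate row
df = pd.DataFrame({
    "Age": [30, None, 25, 25],
    "Seniority": ["Senior", "Junior", "Middle", "Middle"],
})

df = df.dropna(subset=["Age", "Seniority"])  # remove rows with missing values
df = df.drop_duplicates()                    # remove identical rows

print(df.isna().sum().sum())  # 0 -> no missing values left
print(len(df))                # 2 rows remain
```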

&lt;p&gt;&lt;strong&gt;Handling data type issues&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Data type issues can arise from errors in data collection or data entry. For example, a variable may be stored as a string instead of a numerical value. One approach to handling data type issues is to convert the variable to the appropriate data type. In our case, we will extract the year from the datetime values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#  changing datetime to date
data['Year'] = pd.to_datetime(data['Year']).dt.year
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Analyzing relationships between variables
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Correlation Matrix&lt;/strong&gt;&lt;br&gt;
The correlation is a measurement that describes the relationship between two variables. Therefore, a correlation matrix is a table that shows the correlation coefficients between many variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# calculate correlation matrix
correlation = data.corr()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use &lt;code&gt;sns.heatmap()&lt;/code&gt; to plot the correlation matrix of all of the variables in our dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# plot the heatmap
sns.heatmap(correlation, xticklabels=correlation.columns, yticklabels=correlation.columns, annot=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A7GP5GhW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p26trukuqat418ube4n1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A7GP5GhW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p26trukuqat418ube4n1.png" alt="plot result" width="690" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Countplot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A countplot is a type of visualization in which the frequency of categorical data is plotted. &lt;br&gt;
It is a bar plot where the bars represent the number (or count) of occurrences of each category in a categorical variable.&lt;br&gt;
Countplots are a useful tool for exploring the distribution of categorical data and identifying patterns or trends.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sns.countplot(x='Gender', data=data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LpDeOkgO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hdx0euacrr9mklj9tnw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LpDeOkgO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hdx0euacrr9mklj9tnw7.png" alt="countplot" width="764" height="625"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hue parameter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;hue&lt;/code&gt; parameter is used to display the frequency of the data based on a second categorical variable. In our case, &lt;code&gt;Contract_duration&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sns.countplot(x='Gender', hue='Сontract_duration', data=data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting plot shows the frequency of each gender, grouped by contract duration. &lt;/p&gt;
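&lt;p&gt;We can also compute the same grouped counts directly with pandas; here is a small sketch on invented stand-in data (the real plot uses our survey columns):&lt;/p&gt;

```python
import pandas as pd

# invented stand-in for the survey data
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Female"],
    "Contract_duration": ["Unlimited", "Temporary", "Unlimited",
                          "Unlimited", "Unlimited"],
})

# the table of counts that sns.countplot(..., hue=...) visualizes
counts = pd.crosstab(df["Gender"], df["Contract_duration"])
print(counts.loc["Male", "Unlimited"])  # 2
```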

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8TAxGJeO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/403pd4swpyjkx0f2hkws.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8TAxGJeO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/403pd4swpyjkx0f2hkws.png" alt="adding hue" width="739" height="582"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Histogram&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A histogram is a graphical representation of the distribution of a continuous variable, typically showing the frequencies of values in intervals or "bins" along a numerical range.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bins = [15,20,25,30,35,40,45,50] 
pd.cut(data['Age'],bins=bins).value_counts(normalize=True).mul(100).round(1).plot(kind='bar')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M18xY1u9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0tb7z9xu4m5zkjeywcym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M18xY1u9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0tb7z9xu4m5zkjeywcym.png" alt="age histogram" width="734" height="649"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;EDA is a critical step in any data analysis project. It allows you to understand the data, identify patterns and relationships, and extract insights that can help you make informed decisions. &lt;br&gt;
There are several other types of visualizations that we didn't cover that you can use depending on the dataset, but by following the steps outlined in this article, you can conduct an effective EDA and extract valuable insights from your data.&lt;/p&gt;

&lt;p&gt;The rest of the code where I perform more EDA on this dataset can be found on &lt;a href="https://github.com/CKawara/Exploratory-Data-Analysis"&gt;GitHub&lt;/a&gt;.&lt;br&gt;
Till next time, Happy coding!✌️&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>data</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Introduction to SQL for Data Analysis</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Sun, 19 Feb 2023 09:51:58 +0000</pubDate>
      <link>https://dev.to/ckawara/introduction-to-sql-for-data-analysis-3ab1</link>
      <guid>https://dev.to/ckawara/introduction-to-sql-for-data-analysis-3ab1</guid>
      <description>&lt;p&gt;Structured Query Language (SQL) is a programming language used for managing and manipulating relational databases. It is one of the most popular languages used for data analysis and is used by data analysts, data scientists, and database administrators alike. SQL can be used to store, access, and extract massive amounts of data in order to carry out the data science process smoothly.&lt;/p&gt;

&lt;p&gt;This article aims to provide an introduction to SQL for data analysis. We will cover the basics of SQL, including its syntax, data types, and operators. &lt;br&gt;
We will also discuss how to create, modify, and delete tables in SQL, and how to retrieve data from these tables using queries.&lt;br&gt;
Finally, we will provide an overview of some of the more advanced SQL features that can be used for more complex data analysis tasks.&lt;/p&gt;
&lt;h3&gt;
  
  
  SQL Basics
&lt;/h3&gt;

&lt;p&gt;SQL is a declarative language, meaning that users define what they want the database to do, and the database management system (DBMS) determines the best way to perform the operation. &lt;br&gt;
SQL statements consist of one or more clauses, each of which specifies a particular action to be taken. The most common statements are SELECT, INSERT, UPDATE, DELETE, and CREATE. We will look at them in detail later in this article.&lt;/p&gt;

&lt;p&gt;SQL syntax is relatively simple, with commands composed of keywords and parameters that are often enclosed in parentheses or quotes. It is not case-sensitive, but it is common practice to write SQL keywords in uppercase.&lt;/p&gt;
&lt;h3&gt;
  
  
  SQL Data Types
&lt;/h3&gt;

&lt;p&gt;SQL supports several data types, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Numeric (integers, decimals, and floats)&lt;/li&gt;
&lt;li&gt;Character (strings of text)&lt;/li&gt;
&lt;li&gt;Date/time&lt;/li&gt;
&lt;li&gt;Boolean types (values of true or false)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  SQL Operators
&lt;/h3&gt;

&lt;p&gt;Operators are used to perform operations on SQL data. There are several types of operators, including: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arithmetic - Addition, subtraction, multiplication, and division&lt;/li&gt;
&lt;li&gt;Comparison - Equal to, not equal to, greater than, and less than&lt;/li&gt;
&lt;li&gt;Logical - AND, OR, and NOT &lt;/li&gt;
&lt;li&gt;String operators - Concatenation and pattern matching.&lt;/li&gt;
&lt;/ul&gt;
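&lt;p&gt;To see several of these operators at work together, here is a small sketch using Python's built-in sqlite3 module (the table and values are invented for the example):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (name TEXT, price REAL, in_stock INTEGER)")
cur.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("pen", 1.5, 1), ("book", 12.0, 1), ("lamp", 30.0, 0)],
)

# arithmetic (price * 2), comparison (price > 10),
# logical (AND), and string concatenation (||) in one query
cur.execute(
    "SELECT name || '!', price * 2 FROM products "
    "WHERE price > 10 AND in_stock = 1"
)
print(cur.fetchall())  # [('book!', 24.0)]
```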
&lt;h3&gt;
  
  
  Creating and Modifying Tables
&lt;/h3&gt;

&lt;p&gt;One of the primary functions of SQL is to create, modify, and delete tables. Tables are used to organize data into rows and columns, making it easier to analyze and manipulate the data.&lt;br&gt;
We use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE table_name(
column1 datatype,
column2 datatype,
.....
columnN datatype);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, to create an employees table with various columns, we would use the following statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  salary DECIMAL(10,2),
  hire_date DATE
);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement creates a table with five columns, each with a different data type. The "employee_id" column is specified as the primary key, which means that it uniquely identifies each row in the table.&lt;/p&gt;

&lt;p&gt;Once a table has been created, it can be modified using the &lt;code&gt;ALTER TABLE&lt;/code&gt; statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE table_name {ADD|DROP|MODIFY} column_name {data_type};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement allows users to add or remove columns, change the data type of a column, or modify the constraints on the table.&lt;/p&gt;
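&lt;p&gt;As a quick illustration, the ADD form can be tried with Python's built-in sqlite3 module (MODIFY support varies between database systems, and the table here is invented):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT)"
)

# add a new column to the existing table
cur.execute("ALTER TABLE employees ADD COLUMN salary REAL")

# list the column names to confirm the change
cols = [row[1] for row in cur.execute("PRAGMA table_info(employees)")]
print(cols)  # ['employee_id', 'first_name', 'salary']
```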

&lt;h3&gt;
  
  
  Retrieving Data from Tables
&lt;/h3&gt;

&lt;p&gt;Users must use the &lt;code&gt;SELECT&lt;/code&gt; statement to retrieve data from a table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2....columnN
FROM   table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;SELECT&lt;/code&gt; statement specifies the columns to be retrieved and the table from which to retrieve the data. &lt;br&gt;
For example, to retrieve all of the data from the "employees" table, we would use the following SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users can also select particular columns and use &lt;code&gt;WHERE&lt;/code&gt; clauses to filter the data. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT first_name, last_name
FROM employees
WHERE salary &amp;gt; 50000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement retrieves only the "first_name" and "last_name" columns for employees whose "salary" is greater than $50,000.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced SQL Features
&lt;/h3&gt;

&lt;p&gt;SQL also includes several advanced features that can be used for more complex data analysis tasks, we have discussed this at length in the second part of this article, &lt;a href="https://dev.to/ckawara/essential-sql-commands-for-data-science-ph"&gt;essential-sql-commands-for-data-science&lt;/a&gt;. These features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Joins: They are used to combine data from multiple tables based on a common column or key. There are several types of joins, including &lt;em&gt;inner joins&lt;/em&gt;, &lt;em&gt;outer joins&lt;/em&gt;, and &lt;em&gt;self-joins&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Subqueries: Subqueries are used to nest one query inside another query. They can be used to retrieve data that meets specific criteria or to create temporary tables for use in other queries.&lt;/li&gt;
&lt;li&gt;Aggregation functions: Aggregation functions are used to perform calculations on a set of data, such as finding the average, maximum, or minimum value in a column. Common aggregation functions include SUM, AVG, MAX, MIN, and COUNT.&lt;/li&gt;
&lt;li&gt;Window functions: Window functions are used to perform calculations on a subset of data within a more extensive data set. They can be used to calculate running totals, rankings, and other complex calculations.&lt;/li&gt;
&lt;/ul&gt;
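&lt;p&gt;As a small taste of these features, here is a sketch that combines an inner join with an aggregation function, again run through Python's sqlite3 module on invented tables:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO employees VALUES
  ('Ada', 90000, 1), ('Grace', 95000, 1), ('Joan', 60000, 2);
""")

# inner join + aggregation: average salary per department
cur.execute("""
SELECT d.dept_name, AVG(e.salary)
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
GROUP BY d.dept_name
ORDER BY d.dept_name
""")
print(cur.fetchall())  # [('Engineering', 92500.0), ('Sales', 60000.0)]
```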

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;SQL is a powerful tool for managing and manipulating large amounts of data. Its simple syntax and flexible data types make it an ideal language for data analysis tasks.&lt;br&gt;
Mastering SQL as an analysis tool is essential for anyone interested in data analysis, and to master SQL you need to know the basics first. There are many resources available for learning SQL, and hopefully the explanation above can help you get started.&lt;/p&gt;

&lt;p&gt;Till next time, happy coding!✌️&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>sql</category>
      <category>data</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Data Structure and Algorithms 102: Deep Dive into Data Structure and Algorithms</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Fri, 08 Jul 2022 08:44:13 +0000</pubDate>
      <link>https://dev.to/ckawara/data-structure-and-algorithms-102-deep-dive-into-data-structure-and-algorithms-56ap</link>
      <guid>https://dev.to/ckawara/data-structure-and-algorithms-102-deep-dive-into-data-structure-and-algorithms-56ap</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/ckawara/data-structures-101-introduction-to-data-structures-and-algorithms-3njf"&gt;previous DSA article&lt;/a&gt;, we did a basic introduction to data structures and algorithms. Today we are going to talk about something a little bit more complex &lt;strong&gt;Time Complexity&lt;/strong&gt;. For this article, I will be using Ruby for the examples.&lt;/p&gt;

&lt;p&gt;When writing code it is important to consider the amount of time it would take for the code to complete a specific task even as the input size grows. Time complexity determines how an algorithm is going to perform as the input size grows. Input size in this context is the arguments a method takes. If a method takes in a string as an argument, that is the input. The length of the string becomes the input size.&lt;/p&gt;

&lt;h2&gt;
  
  
  Big O Notation
&lt;/h2&gt;

&lt;p&gt;Big O notation describes the complexity of your code using algebraic terms. Big O notation can express an algorithm's best, worst, and average-case running time. For our purposes, we're going to focus primarily on Big O as it relates to time complexity and we'll be covering four main time complexities. &lt;br&gt;
There's more on the Big O notation &lt;a href="https://www.freecodecamp.org/news/big-o-notation-why-it-matters-and-why-it-doesnt-1674cfa8a23c/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constant Runtime: “O (1)”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;O (1)&lt;/code&gt; means that it takes a constant time to run an algorithm, regardless of the size of the input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arr = [3, 1, 6, 9, 10, 2]
def print_all(arr)
   puts "#{arr[0]}" # prints out 3
   puts "#{arr[1]}" # prints out 1
end
print_all(arr)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Linear Runtime: “O (n)”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;O (n)&lt;/code&gt; means that the run-time increases at the same pace as the input. One of the most common linear-time operations is traversing an array. Methods like &lt;code&gt;each&lt;/code&gt; and &lt;code&gt;map&lt;/code&gt; run through the entire collection of data, from start to finish.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arr = [3, 1, 6, 9, 10, 2]
def print_all(arr)
   arr.each do |num|
      puts "#{num}"
   end
end
print_all(arr)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Quadratic Runtime: “O (n²)”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;O(n²)&lt;/code&gt; means that the calculation runs in quadratic time, which is the squared size of the input data.&lt;br&gt;
In programming, many of the more basic sorting algorithms have a worst-case run time of &lt;code&gt;O(n²)&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bubble Sort&lt;/li&gt;
&lt;li&gt;Insertion Sort&lt;/li&gt;
&lt;li&gt;Selection Sort&lt;/li&gt;
&lt;/ul&gt;
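&lt;p&gt;The nested-loop shape behind these quadratic sorts is easy to see in a minimal bubble sort (a rough sketch for illustration, not a production sort):&lt;/p&gt;

```ruby
# Bubble sort: for every element, sweep the array again and swap
# adjacent out-of-order pairs. Two nested loops over n elements
# give the O(n^2) worst-case runtime.
def bubble_sort(arr)
  arr = arr.dup
  (arr.length - 1).times do
    (arr.length - 1).times do |i|
      arr[i], arr[i + 1] = arr[i + 1], arr[i] if arr[i] > arr[i + 1]
    end
  end
  arr
end

puts bubble_sort([3, 1, 6, 9, 10, 2]).inspect # [1, 2, 3, 6, 9, 10]
```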

&lt;p&gt;In the example below we have a nested loop: for each iteration of the outer loop, the inner loop iterates over the ENTIRE array before the outer loop moves on to its next element, and this continues until the outer loop reaches the end of the array. That's a LOT of iterations; even an array with as few as 1,000 elements would produce a million operations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def print_all(arr)
   arr.each do |letter1|
      arr.each do |letter2|
         puts "#{letter1}" + "#{letter2}"
      end
   end
end
print_all(["A", "B", "C"]) # prints out 9 pairs
print_all(["A", "B", "C", "D"]) # prints out 16 pairs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Logarithmic Runtime: “O (log n)”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;O(log n)&lt;/code&gt; means that the running time grows in proportion to the logarithm of the input size. This means that the run time barely increases as you exponentially increase the input. This is a highly efficient runtime. An example that always has logarithmic runtime is Binary Search. When you have a sorted array, a binary search will continually halve your data by checking if the item is larger or smaller than a given middle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def binary_search(n, arr)
  i = 0
  j = arr.length - 1

  while i &amp;lt;= j
    middle = (i + j) / 2    # midpoint of the current search window
    if arr[middle] == n
      return true
    elsif arr[middle] &amp;lt; n
      i = middle + 1        # target is in the right half
    else
      j = middle - 1        # target is in the left half
    end
  end
  false
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's all for today, it might be a lot to take in so take your time to try and understand. &lt;a href="https://www.bigocheatsheet.com/"&gt;Here's&lt;/a&gt; a cheat sheet that might be of great help.&lt;br&gt;
Till next time, Happy coding ✌!&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>datastructures</category>
      <category>algorithms</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Hosting your sinatra backend(API) with heroku</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Sun, 03 Jul 2022 21:10:43 +0000</pubDate>
      <link>https://dev.to/ckawara/hosting-your-sinatra-backendapi-with-heroku-3j32</link>
      <guid>https://dev.to/ckawara/hosting-your-sinatra-backendapi-with-heroku-3j32</guid>
      <description>&lt;p&gt;I was recently tasked with a full-stack project that incorporated the use of a react front end and ruby backend. The backend would use Sinatra and active records for migrations, database handling, and endpoint creation. Then host it to get the API to be consumed by our react front-end. The most challenging part of this project was hosting the API and that is what I will talk about today.&lt;br&gt;
&lt;a href="https://github.com/CKawara/Little-lavender-log-backend"&gt;Here's&lt;/a&gt; the repository to this project&lt;/p&gt;

&lt;p&gt;Assuming that you have your project set up,&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create Heroku Account&lt;/strong&gt;&lt;br&gt;
This is a pretty straightforward process, and can be done &lt;a href="https://signup.heroku.com/"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a new app&lt;/strong&gt;&lt;br&gt;
 Once the account is set up, proceed to create a new Heroku app. You can do this in many ways, but the easiest is using the GUI. Give the app the name you'd want your API to be called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connect your project to Heroku&lt;/strong&gt;&lt;br&gt;
There are various ways to do this, as you can find &lt;a href="https://blog.heroku.com/six-strategies-deploy-to-heroku"&gt;here&lt;/a&gt;, but for this example, we will use the CLI, for simplicity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install Heroku CLI with &lt;code&gt;npm install -g heroku&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Log in to your Heroku account &lt;code&gt;heroku login
&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; Clone your project's source code to your local machine
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;heroku git:clone -a your-project 
cd your-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Make some changes to the code you just cloned and deploy them to Heroku using Git.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add .
git commit -am "make it better"
git push heroku master
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Configure database&lt;/strong&gt;&lt;br&gt;
In your database.yml file, set the adapter to the SQL database of your choice: Postgres, SQLite3, MySQL, or Oracle. Note that using SQLite3 will be problematic if you want to use this API in production.&lt;br&gt;
I would suggest using Postgres.&lt;/p&gt;
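&lt;p&gt;For example, a minimal database.yml for Postgres might look like this (the database names are placeholders):&lt;/p&gt;

```yaml
# config/database.yml - illustrative values only
development:
  adapter: postgresql
  database: myapp_development

production:
  adapter: postgresql
  database: myapp_production
```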

&lt;p&gt;&lt;strong&gt;Add Postgres gem to your gem file&lt;/strong&gt;&lt;br&gt;
Add this using &lt;code&gt;gem 'pg'&lt;/code&gt; so it will be included when installing the gems&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a Procfile&lt;/strong&gt;&lt;br&gt;
In your file setup, add a file called &lt;strong&gt;&lt;em&gt;Procfile&lt;/em&gt;&lt;/strong&gt;; it is in charge of running the entry points in any Heroku web app. In our case, the config.ru&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add commands to the Procfile&lt;/strong&gt;&lt;br&gt;
Add this to your Procfile so that when you push your work to production, it is executed first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;web: bundle exec rackup config.ru -p $PORT
release: bundle exec rake db:migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line will run our config.ru;&lt;br&gt;
the second line will run the migrations and apply the schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a Postgres extension on Heroku&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On your Heroku account,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to resources&lt;/li&gt;
&lt;li&gt;Search Postgres, and pick the first suggestion (Heroku Postgres)&lt;/li&gt;
&lt;li&gt;Add the extension and submit the order form.
This extension will manage your database in production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And we're done! Once all this is set up, push your code to Heroku, check your logs, and test out your API.&lt;/p&gt;

&lt;p&gt;Till next time, Happy coding ✌&lt;/p&gt;

</description>
      <category>heroku</category>
      <category>ruby</category>
      <category>sinatra</category>
      <category>api</category>
    </item>
    <item>
      <title>Data Structures 101: Introduction to Data Structures and Algorithms.</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Sun, 19 Jun 2022 20:58:12 +0000</pubDate>
      <link>https://dev.to/ckawara/data-structures-101-introduction-to-data-structures-and-algorithms-3njf</link>
      <guid>https://dev.to/ckawara/data-structures-101-introduction-to-data-structures-and-algorithms-3njf</guid>
      <description>&lt;p&gt;&lt;strong&gt;A data structure&lt;/strong&gt; is a particular way of organizing data in a computer so that it can be used effectively.As developers, we must have good knowledge about data structures as it is a key topic when it comes to Software Engineering interview questions.&lt;br&gt;
The most commonly used data structures include;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Arrays&lt;/li&gt;
&lt;li&gt;Linked lists&lt;/li&gt;
&lt;li&gt;Stack&lt;/li&gt;
&lt;li&gt;Queues&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Algorithms
&lt;/h2&gt;

&lt;p&gt;An Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a certain order to get the desired output. They are normally independent of any programming language, as they can be implemented in more than one programming language.&lt;/p&gt;
&lt;h3&gt;
  
  
  Characteristics of a good algorithm
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Input − should have 0 or more well-defined inputs.&lt;/li&gt;
&lt;li&gt;Output −should have 1 or more well-defined outputs, and should match the desired output.&lt;/li&gt;
&lt;li&gt;Finiteness − they must terminate after a finite number of steps.&lt;/li&gt;
&lt;li&gt;Feasibility − Should be feasible with the available resources.&lt;/li&gt;
&lt;li&gt;Independent − should be independent of any programming code.&lt;/li&gt;
&lt;li&gt;Unambiguous − Each of its steps, and their inputs/outputs should be clear and must lead to only one meaning.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Example of an algorithm
&lt;/h3&gt;

&lt;p&gt;Algorithms are mostly written in a step-by-step manner. For example, an algorithm to add two numbers would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: Start
Step 2: Declare variables num1, num2 and sum.
Step 3: Read values for num1 and num2.
Step 4: Add num1 and num2 and assign the result to sum.
sum=num1+num2
Step 5: Display sum
Step 6: Stop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
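&lt;p&gt;Rendered in an actual language (Ruby here, purely as an illustration), the same steps are only a few lines:&lt;/p&gt;

```ruby
# Step-by-step translation of the algorithm above
num1 = 3          # declare and read the first value
num2 = 4          # declare and read the second value
sum = num1 + num2 # add the numbers and assign the result
puts sum          # display the sum
```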



&lt;p&gt;That is the basic introduction to algorithms, of course this is a much deeper topic which we will continue to explore. Till next time, happy coding✌!&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>Crypto React App;Material UI, Firebase and Vercel</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Fri, 27 May 2022 14:56:18 +0000</pubDate>
      <link>https://dev.to/ckawara/crypto-react-appmaterial-ui-firebase-and-vercel-1fka</link>
      <guid>https://dev.to/ckawara/crypto-react-appmaterial-ui-firebase-and-vercel-1fka</guid>
      <description>&lt;p&gt;This week I was tasked with building a react site from scratch, and implement a Firebase authentication. I came up with the project idea which was, &lt;em&gt;A site where users can access both the news and information about various cryptocurrencies.&lt;/em&gt; This was my first &lt;em&gt;big-ish&lt;/em&gt; react app.&lt;br&gt;
The live site can be found &lt;a href="https://coin-watch-app.vercel.app/"&gt;here&lt;/a&gt; , the design &lt;a href="https://www.figma.com/file/uixKCtl6niypOAnO7I8g70/Crypto-watch-react-app?node-id=8%3A101"&gt;here&lt;/a&gt; and the repository on &lt;a href="https://github.com/CKawara/Coin-watch"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MVPs for the site were;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A home page with a list of the coins.&lt;/li&gt;
&lt;li&gt;A filter for the currency they want to use for the coin price (USD / EUR)&lt;/li&gt;
&lt;li&gt;A news page&lt;/li&gt;
&lt;li&gt;Coin details page&lt;/li&gt;
&lt;li&gt;Time filter for the graph in the detail section&lt;/li&gt;
&lt;li&gt;Log in to add coins to favorites &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this article, I will talk about my experience with the various technologies I used. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://reactjs.org/"&gt;React-js&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;React is a JavaScript library for building user interfaces.&lt;br&gt;
To create a React app, run &lt;code&gt;npx create-react-app my-app&lt;/code&gt; in your terminal, and it will scaffold a project with some starter code.&lt;br&gt;
I worked with hooks like &lt;em&gt;useState()&lt;/em&gt; to manage state and &lt;em&gt;useEffect()&lt;/em&gt; to fetch data from APIs. I also used the Context API to share state without passing props down through every level.&lt;/p&gt;
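&lt;p&gt;The &lt;em&gt;useEffect()&lt;/em&gt; fetching pattern above can be kept thin by moving the data shaping into a plain async helper. A rough sketch, where the &lt;code&gt;fetcher&lt;/code&gt; argument and field names are assumptions rather than the app's actual code:&lt;/p&gt;

```javascript
// Fetch coin data and reduce it to just the fields the UI needs.
// `fetcher` is injected so this logic stays testable outside React;
// a component would call it from useEffect() and store the result in state.
async function loadCoins(fetcher, currency) {
  const coins = await fetcher(currency);
  return coins.map((c) => ({ id: c.id, name: c.name, price: c.current_price }));
}

// Inside a component, roughly:
// useEffect(() => { loadCoins(apiFetcher, currency).then(setCoins); }, [currency]);
```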

&lt;h2&gt;
  
  
  &lt;a href="https://mui.com/"&gt;Material UI&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I used &lt;em&gt;MUI&lt;/em&gt; for styling. It provides a simple, customizable, and accessible library of React components.&lt;br&gt;
To install &lt;em&gt;MUI&lt;/em&gt;, run &lt;code&gt;npm install @mui/material @emotion/react @emotion/styled&lt;/code&gt; in your terminal.&lt;br&gt;
With React 18, you might encounter a peer-dependency error; appending &lt;code&gt;--legacy-peer-deps&lt;/code&gt; (or &lt;code&gt;--force&lt;/code&gt;) works around it, because some libraries had not yet updated their peer dependencies when React 18 launched.&lt;br&gt;
It was my first time working with MUI, so it was a bit challenging, but with tutorials and the documentation, I was able to achieve my goals.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://firebase.google.com/"&gt;Firebase&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I used Firebase for authentication and to store users' favorite coins.&lt;br&gt;
This was a challenging feature to implement, as it was one I had not planned for.&lt;br&gt;
I was able to reach my goals with the help of various YouTube tutorials and articles.&lt;/p&gt;
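&lt;p&gt;Stripped of the Firebase calls, the favorites feature boils down to toggling a coin id in a list and writing the result back to the user's record. A minimal sketch of that toggle — the Firestore persistence is left out, and the names are illustrative, not the app's actual code:&lt;/p&gt;

```javascript
// Add the coin if absent, remove it if present. Returns a new array,
// which would then be saved to the user's Firestore document.
function toggleFavorite(favorites, coinId) {
  return favorites.includes(coinId)
    ? favorites.filter((id) => id !== coinId)
    : [...favorites, coinId];
}

console.log(toggleFavorite(["bitcoin"], "ethereum")); // -> ["bitcoin", "ethereum"]
console.log(toggleFavorite(["bitcoin"], "bitcoin"));  // -> []
```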

&lt;h2&gt;
  
  
  &lt;a href="https://vercel.com/"&gt;Vercel&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I used Vercel for deployment. I had issues deploying my site, as my git files somehow got corrupted while pushing some code to GitHub.&lt;br&gt;
I had planned on using Netlify, as I had worked with it before, but it could not deploy my work, either from GitHub or manually. That is when I discovered Vercel. It is very easy to use, it doesn't need any build command, and if it encounters a problem while deploying, it will suggest a solution. All you need to do is connect it to your GitHub account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;React is a very interesting library. I enjoyed working on this project and I'm looking forward to working on more React projects.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>react</category>
      <category>vercel</category>
      <category>firebase</category>
    </item>
    <item>
      <title>My experience building a cryptocurrency news site - CryptoHut</title>
      <dc:creator>Catherine Kawara</dc:creator>
      <pubDate>Sun, 24 Apr 2022 15:30:31 +0000</pubDate>
      <link>https://dev.to/ckawara/my-experience-building-a-cryptocurrency-news-site-cryptohut-43co</link>
      <guid>https://dev.to/ckawara/my-experience-building-a-cryptocurrency-news-site-cryptohut-43co</guid>
      <description>&lt;p&gt;My project this week at &lt;a href="https://moringaschool.com/"&gt;Moringa School&lt;/a&gt; was to build a Single Page Application using HTML and CSS for the frontend and JavaScript to communicate with a public API of our choice. The learning goals of this assignment were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design and architect features across a frontend&lt;/li&gt;
&lt;li&gt;Communicate and collaborate in a technical environment&lt;/li&gt;
&lt;li&gt;Integrate JavaScript and an external API&lt;/li&gt;
&lt;li&gt;Build and iterate on a project MVP&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Development Process
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Research&lt;/strong&gt;&lt;br&gt;
Since we had to come up with our own project ideas, I had to brainstorm for an existing problem in society that could be solved. During this stage I also had to consider a couple of factors, like the kind of API I was going to use for the project.&lt;br&gt;
Eventually I came up with a problem statement and settled on a crypto-news website.&lt;br&gt;
I decided I was going to use the &lt;a href="https://cryptonews-api.com/"&gt;Crypto-news API&lt;/a&gt; for my project, only to find out I needed to provide my credit card information to use it. So, a few days before my presentation, I switched to &lt;a href="https://rapidapi.com/"&gt;Rapid API&lt;/a&gt; for the news data and the &lt;a href="https://developers.coinranking.com/api/documentation"&gt;CoinRanking API&lt;/a&gt; for real-time prices of the top coins.&lt;/p&gt;
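&lt;p&gt;For reference, RapidAPI endpoints are authenticated with two request headers. A small sketch of building the fetch options — the key and host values here are placeholders, not this project's actual configuration:&lt;/p&gt;

```javascript
// Build fetch() options for a RapidAPI endpoint.
// RapidAPI authenticates via the X-RapidAPI-Key and X-RapidAPI-Host headers.
function rapidApiOptions(apiKey, host) {
  return {
    method: "GET",
    headers: {
      "X-RapidAPI-Key": apiKey,
      "X-RapidAPI-Host": host,
    },
  };
}

// Usage, roughly (placeholder host, key, and render function):
// fetch("https://example-news.p.rapidapi.com/articles",
//       rapidApiOptions(MY_KEY, "example-news.p.rapidapi.com"))
//   .then((res) => res.json())
//   .then(renderStories);
```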

&lt;p&gt;&lt;strong&gt;UI Design and Building&lt;/strong&gt;&lt;br&gt;
I outlined the features of my MVP so I could design around them and the user journey. The MVP included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A landing page&lt;/li&gt;
&lt;li&gt;Display of the top currencies section&lt;/li&gt;
&lt;li&gt;Display of the top and general stories&lt;/li&gt;
&lt;li&gt;A feature for the users to star their favorite coin&lt;/li&gt;
&lt;li&gt;A contact section&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now normally, I would do a hi-fi design before I start the actual coding, but this time the low-fidelity design was enough to get me started.&lt;br&gt;
I wanted to get the logic right first, then focus on the UI details. Honestly, looking back, I don't think that's the best approach; at least for me it's not. I drew some of my UI inspiration from &lt;a href="https://cointelegraph.com/"&gt;Cointelegraph&lt;/a&gt;, seeing as it is a crypto-news website, and &lt;a href="https://dribbble.com/"&gt;Dribbble&lt;/a&gt;, a very useful site to share and also get design inspiration.&lt;/p&gt;

&lt;p&gt;So I started coding and interacting with the APIs, and I can confidently say it was not a walk in the park. Although some of the blockers could have been avoided if I had done more research on the APIs, this was definitely a challenging yet interesting task.&lt;br&gt;
I learnt so much from this assignment; I feel like I grew as a developer and generally as a person.&lt;br&gt;
The most important all-round lessons I learnt were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do a lot of research on a product you intend to use.&lt;/li&gt;
&lt;li&gt;Be patient with yourself especially when learning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Testing and Deployment&lt;/strong&gt;&lt;br&gt;
This stage was not very difficult, as I was using GitHub Pages (gh-pages) to host my site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
I really enjoyed working on this project, and I learnt a lot of things I wouldn't have by just reading.&lt;br&gt;
Practice is always the best way to understand a new concept, and what better way to grow as a developer than to take on new projects often.&lt;/p&gt;

&lt;p&gt;You can interact more with this project on &lt;a href="https://github.com/CKawara/Cryptohut"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>cryptocurrency</category>
      <category>api</category>
    </item>
  </channel>
</rss>
