DEV Community: Kelvin Vosky

Kilifi County Insights (km-insights)

Kelvin Vosky — Sat, 04 Apr 2026 18:04:29 +0000

Kilifi County 2022 General Election Statistics (Last Election: August 9, 2022)

Kilifi County had 588,602 registered voters (IEBC data). Overall voter turnout was low, at approximately 49%. That is in line with other coastal counties and well below the national average of about 64.8%; likely factors include voter apathy, economic hardship, and post-COVID effects.

Presidential results (county-wide): Raila Odinga (Azimio/ODM) dominated with 71.64% (204,536 votes) vs. William Ruto (UDA) at 27.09% (77,331 votes). Valid votes totalled about 285,496, excluding rejected ballots. This reflected strong Azimio support along the coast.
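The percentages above can be re-derived from the vote counts. This is a small sketch in Python using only the figures quoted in this post; the valid-vote total is the approximate one given above, so the turnout it produces counts valid votes only and leaves out rejected ballots.

```python
# Figures quoted in the paragraphs above (IEBC numbers as cited in this post).
registered_voters = 588_602
valid_votes = 285_496      # approximate valid-vote total quoted above
odinga_votes = 204_536
ruto_votes = 77_331

# Each candidate's share of the valid votes, as a percentage.
odinga_share = 100 * odinga_votes / valid_votes
ruto_share = 100 * ruto_votes / valid_votes

# Valid votes as a share of registered voters. Rejected ballots are not
# included, so actual turnout is slightly higher than this figure.
valid_vote_turnout = 100 * valid_votes / registered_voters

print(f"Odinga: {odinga_share:.2f}%")
print(f"Ruto: {ruto_share:.2f}%")
print(f"Valid votes / registered voters: {valid_vote_turnout:.1f}%")
```

The shares match the quoted 71.64% and 27.09%, and the valid-vote turnout comes out just under the roughly 49% mentioned earlier.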

Gubernatorial results: Gideon Mung’aro (ODM) won with 50.69% (143,773 votes), beating Aisha Jumwa (UDA) at 23.23% and George Kithi (PAA) at 22.68%.

Women Representative: Gertrude Mbeyu (ODM) won with 39.33% (111,039 votes).

Registered Voters by Constituency and Ward

Here are the IEBC-registered voter figures (per County Assembly Ward/CAW). Kilifi has 35 wards across 7 constituencies:

Kilifi North (116,742 total)

Tezo: 14,830
Sokoni: 23,955
Kibarani: 15,655
Dabaso: 13,717
Matsangoni: 15,168
Watamu: 15,341
Mnarani: 18,076

Kilifi South (97,696 total)

Junju: 15,747
Mwarakaya: 12,714
Shimo La Tewa: 32,052 (largest ward)
Chasimba: 14,090
Mtepeni: 23,093

Kaloleni (73,009 total)

Mariakani: 21,332
Kayafungo: 14,897
Kaloleni: 25,995
Mwanamwinga: 10,785

Rabai (59,165 total)

Mwawesa: 8,793 (smallest ward in Rabai; Kakuyuni in Malindi, at 8,750, is the smallest in the county)
Ruruma: 12,058
Kambe/Ribe: 11,119
Rabai/Kisurutini: 27,195

Ganze (67,257 total)

Ganze: 16,586
Bamba: 18,599
Jaribuni: 12,782
Sokoke: 19,290

Malindi (94,605 total)

Jilore: 10,464
Kakuyuni: 8,750
Ganda: 18,566
Malindi Town: 30,570
Shella: 26,255

Magarini (80,128 total)

Marafa: 9,026
Magarini: 17,099
Gongoni: 17,685
Adu: 16,263
Garashi: 10,466
Sabaki: 9,589

(Full ward-level registered voters from official IEBC PDF.)
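A quick way to sanity-check figures like these is to add the wards up and compare against the stated constituency totals. The sketch below does that in Python with the numbers copied from the lists above; nothing in it comes from outside this post.

```python
# Registered voters per ward, copied from the lists above.
wards = {
    "Kilifi North": {"Tezo": 14830, "Sokoni": 23955, "Kibarani": 15655,
                     "Dabaso": 13717, "Matsangoni": 15168, "Watamu": 15341,
                     "Mnarani": 18076},
    "Kilifi South": {"Junju": 15747, "Mwarakaya": 12714, "Shimo La Tewa": 32052,
                     "Chasimba": 14090, "Mtepeni": 23093},
    "Kaloleni": {"Mariakani": 21332, "Kayafungo": 14897, "Kaloleni": 25995,
                 "Mwanamwinga": 10785},
    "Rabai": {"Mwawesa": 8793, "Ruruma": 12058, "Kambe/Ribe": 11119,
              "Rabai/Kisurutini": 27195},
    "Ganze": {"Ganze": 16586, "Bamba": 18599, "Jaribuni": 12782,
              "Sokoke": 19290},
    "Malindi": {"Jilore": 10464, "Kakuyuni": 8750, "Ganda": 18566,
                "Malindi Town": 30570, "Shella": 26255},
    "Magarini": {"Marafa": 9026, "Magarini": 17099, "Gongoni": 17685,
                 "Adu": 16263, "Garashi": 10466, "Sabaki": 9589},
}

# Constituency totals as stated in the headings above.
stated_totals = {"Kilifi North": 116742, "Kilifi South": 97696,
                 "Kaloleni": 73009, "Rabai": 59165, "Ganze": 67257,
                 "Malindi": 94605, "Magarini": 80128}

# Sum each constituency's wards and record any disagreement with the heading.
computed_totals = {name: sum(w.values()) for name, w in wards.items()}
mismatches = {name: (computed_totals[name], stated_totals[name])
              for name in wards if computed_totals[name] != stated_totals[name]}

county_total = sum(computed_totals.values())

# Flatten to (ward, voters) pairs to find the extremes across the county.
all_wards = [(ward, voters) for w in wards.values() for ward, voters in w.items()]
largest = max(all_wards, key=lambda pair: pair[1])
smallest = min(all_wards, key=lambda pair: pair[1])

print("Mismatched constituencies:", mismatches)
print("County total:", county_total)
print("Largest ward:", largest)
print("Smallest ward:", smallest)
```

Every constituency total matches its wards, the 35 wards add up to the 588,602 registered voters quoted earlier, and the smallest ward by these figures is Kakuyuni.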

Elected MCAs by Ward (IEBC Gazette Results)

ODM performed strongly, winning a majority of the 35 ward seats: 18 of the 32 wards listed below went to ODM, reflecting the party's coastal dominance. Other wins went to PAA, UDA, independents, ANC, JP, and SPK. Here is the list of elected MCAs with party and votes garnered (where available from IEBC declarations):

Kilifi North

Tezo: Thomas Mumba Chengo (PAA) – 1,887
Sokoni: Ray Maro Katana (ODM) – 3,099
Kibarani: Moses Furaha Kea (ODM) – 2,010
Dabaso: Emmanuel Changawa Kombe (UDA) – 2,031
Matsangoni: Hassan Mohamed Said (PAA) – 2,059
Watamu: Ibrahim Abdi Athman (ODM) – 2,155
Mnarani: Juma Chengo Thoya (ODM) – 2,427

Kilifi South

Junju: Said Juma Iddi (ODM) – 2,507
Mwarakaya: Stallone Humphrey Mkadi (ODM) – 3,304
Shimo La Tewa: Haron Tete Ndundi (ODM) – 5,789 (highest votes in county)
Chasimba: Ronald Kazungu Mbura (ODM) – 3,287
Mtepeni: Brown Safari Kahindi (ODM) – 2,257

Kaloleni

Mariakani: Martha Koki Musyoki (ODM) – 3,176
Kayafungo: Agnes Sidi (ANC) – 2,049
Kaloleni: Jonathan Birya Fondo (ODM) – 4,148
Mwanamwinga: Edward Kazungu Ziro (IND) – 1,704

Rabai

Mwawesa: Kalama Mumba Ngome (SPK) – 738 (lowest votes)
Ruruma: Naphtali Nyae Kombo (JP) – 2,487
Kambe/Ribe: Morgan Mwagawe Kubo (UDA) – 2,924
Rabai/Kisurutini: Cantona Mae Mwadena (ODM) – 2,738

Ganze

Ganze: Benson Karisa Maneno (ODM) – 3,380
Bamba: Mwambire Mohamed Kadhengi (PAA) – 4,971
Jaribuni: Peter Safari Shehe (UDA) – 3,121
Sokoke: Harrison Taura Mweni (PAA) – 3,837

Malindi

Jilore: Hamisi Mumbo Jambo (ODM) – 1,945
Kakuyuni: Morris Kitsao Hinzano (UDA) – 1,568
Ganda: Oscar Iha Wanje (ODM) – 2,999
Malindi Town: Rashid Odhiambo Ogelo (ODM) – 5,029
Shella: Twaher Abdulkarim Mohamed (ODM) – 3,224

Magarini (partial, from available data)

Marafa: Emmanuel Karisa Baya (ODM) – 2,851
Magarini: Paul Charo Kitsao (IND) – 2,432
Gongoni: Baya Stephen Mwaro (party and vote count not specified in the available data)
Adu, Garashi, Sabaki: winners and vote counts are missing from the available data.

The 51 MCAs (including nominated) were sworn in on September 20, 2022.
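To see how the seats split by party, the winners listed above can be tallied. This sketch uses only the party labels from the list; Gongoni is marked as unknown, and the three Magarini wards that have no entry are left out.

```python
from collections import Counter

# Party of each listed ward winner, in the same order as the lists above.
# None marks a ward whose party is not given (Gongoni).
winners = {
    "Kilifi North": ["PAA", "ODM", "ODM", "UDA", "PAA", "ODM", "ODM"],
    "Kilifi South": ["ODM", "ODM", "ODM", "ODM", "ODM"],
    "Kaloleni": ["ODM", "ANC", "ODM", "IND"],
    "Rabai": ["SPK", "JP", "UDA", "ODM"],
    "Ganze": ["ODM", "PAA", "UDA", "PAA"],
    "Malindi": ["ODM", "UDA", "ODM", "ODM", "ODM"],
    "Magarini": ["ODM", "IND", None],
}

# Number of wards that appear in the list at all (with or without a party).
listed_wards = sum(len(parties) for parties in winners.values())

# Seats per party, skipping the ward with no party recorded.
tally = Counter(party for parties in winners.values()
                for party in parties if party is not None)

print("Wards listed:", listed_wards)
for party, seats in tally.most_common():
    print(f"{party}: {seats}")
```

On these figures ODM holds 18 of the 35 ward seats even before the four wards with missing party data are counted.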

Who Were the Majority of Voters?

Ethnic majority: The vast majority of voters in Kilifi wards are from the Mijikenda ethnic group, particularly the Giriama subgroup (dominant in rural and hinterland areas like Kilifi North, Ganze, and Magarini). Other Mijikenda subgroups (e.g., Chonyi, Rabai) are prominent in specific wards. This is the core voting bloc across most wards.
Demographics: Nationally (and broadly applicable to Kilifi), women made up about 49% of registered voters and youth (ages 18-35) about 40%. No ward-specific gender or age turnout data exists publicly, but coastal elections often see strong youth and women participation driven by local issues. Turnout was lower in more urban wards such as Malindi Town and Shimo La Tewa than in rural ones.

Voting was heavily influenced by ethnic loyalty to Mijikenda candidates, local patronage, and party machinery (especially ODM's coastal base), rather than strict national tickets.

Manifestos and What “Worked”

Detailed ward-level manifesto analysis is limited in public post-election reports, but patterns from results and coastal politics show:

ODM/Azimio-aligned platforms succeeded most: Promises on local development (roads, water, health facilities, markets, title deeds/land rights, fisheries/agriculture support, and youth jobs) resonated strongly. Coastal grievances like historical land injustice and marginalization played a role. ODM's incumbency advantage and Raila's coastal popularity helped many MCA candidates.
What worked overall:
- Service delivery focus over national ideology (e.g., ODM MCAs in Shimo La Tewa and Malindi Town won big by emphasizing infrastructure).
- Anti-dynasty or “hustler” messaging in some UDA/PAA wins (e.g., in competitive wards like Bamba or Jaribuni), appealing to youth disillusionment.
- Independent/local personality wins (e.g., Mwanamwinga, Magarini) where voters rejected party tickets in favor of known community figures promising direct accountability.
Low turnout suggests manifestos on economic relief (post-COVID, inflation, drought) mattered more than grand promises. Ethnicity and “homeboy/homegirl” factors often trumped detailed policy.

Kilifi politics remains personality- and ethnicity-driven at the ward level, with ODM consolidating power in 2022.

SQL joins and window functions

Kelvin Vosky — Sun, 08 Mar 2026 22:12:11 +0000

Joins and Window Functions

SQL Joins

When working with relational databases, data is usually spread across different tables. Joins combine two or more tables based on a related column between them, usually a primary key in one table and a foreign key in the other.
Types of Joins

Inner Join
Left Join
Right Join
Full Join
Self Join

Inner Join
Combines rows that have matching values in both tables. Rows without a match in the other table, including rows whose join column is NULL, are left out.
Example: Suppose we have a customers table with columns (customer_id, first_name, last_name, membership_status) and a sales table with columns (sale_id, customer_id, product_id, sale_date, total_amount). To find customers who made sales and their total purchase amounts:

This returns only customers with matching sales records.
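Here is a minimal sketch of this inner join. It runs the SQL through Python's built-in sqlite3 module so it can be tried without setting up a database server. The table and column names come from the example above; the sample rows and customer names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, membership_status TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    product_id INTEGER, sale_date TEXT, total_amount REAL);
-- Hypothetical sample rows: customer 3 has no sales.
INSERT INTO customers VALUES (1, 'Amina', 'Said', 'Gold'),
                             (2, 'Brian', 'Otieno', 'Silver'),
                             (3, 'Chao', 'Kahindi', 'Gold');
INSERT INTO sales VALUES (101, 1, 10, '2026-01-05', 250.0),
                         (102, 1, 11, '2026-01-09', 100.0),
                         (103, 2, 10, '2026-01-07', 80.0);
""")

# Only customers with at least one matching sale appear in the result.
inner_rows = conn.execute("""
    SELECT c.first_name, c.last_name, SUM(s.total_amount) AS total_spent
    FROM customers AS c
    INNER JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
    ORDER BY c.customer_id
""").fetchall()

print(inner_rows)
```

The customer with no sales does not appear at all, which is the defining behaviour of an inner join.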

Left Join
A left join returns all rows from the left table and only the matched rows from the right table. Where there is no match, the right table's columns are NULL.
Example: Using the customers and sales tables, to list all customers and their sales, even if they have no sales:

Returns all customers. For those without sales, sale_id and total_amount show NULL.
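A minimal sketch of this left join, again run through Python's built-in sqlite3 module. The sample rows are hypothetical; SQL NULL comes back as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, membership_status TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    product_id INTEGER, sale_date TEXT, total_amount REAL);
-- Hypothetical sample rows: customer 3 has no sales.
INSERT INTO customers VALUES (1, 'Amina', 'Said', 'Gold'),
                             (2, 'Brian', 'Otieno', 'Silver'),
                             (3, 'Chao', 'Kahindi', 'Gold');
INSERT INTO sales VALUES (101, 1, 10, '2026-01-05', 250.0),
                         (102, 1, 11, '2026-01-09', 100.0),
                         (103, 2, 10, '2026-01-07', 80.0);
""")

# Every customer is kept; sale columns are NULL where no sale matches.
left_rows = conn.execute("""
    SELECT c.first_name, s.sale_id, s.total_amount
    FROM customers AS c
    LEFT JOIN sales AS s ON s.customer_id = c.customer_id
    ORDER BY c.customer_id, s.sale_id
""").fetchall()

for row in left_rows:
    print(row)
```

The last row is the customer without sales, with None in both sale columns.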

Right Join
The opposite of a left join: it returns all rows from the right table and the matching rows from the left table. If there is no match, the left table's columns show NULL.
Example: Using the sales and inventory tables (inventory has product_id, stock_quantity), to list all sales and their stock, even if some sales have no current inventory match:

This includes all sales records. For sales with no matching inventory row, stock_quantity shows NULL.
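A right join is a left join with the two tables swapped, and the sketch below shows both spellings. It uses Python's built-in sqlite3 module; SQLite only accepts RIGHT JOIN from version 3.39.0, so the right-join form is run only when the installed version supports it. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    product_id INTEGER, sale_date TEXT, total_amount REAL);
CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock_quantity INTEGER);
-- Hypothetical sample rows: product 12 was sold but has no inventory row.
INSERT INTO sales VALUES (101, 1, 10, '2026-01-05', 250.0),
                         (102, 1, 11, '2026-01-09', 100.0),
                         (103, 2, 12, '2026-01-07', 80.0);
INSERT INTO inventory VALUES (10, 5), (11, 0), (99, 40);
""")

# Portable form: sales is the table we want to keep, so it goes on the left.
left_form = """
    SELECT s.sale_id, s.product_id, i.stock_quantity
    FROM sales AS s
    LEFT JOIN inventory AS i ON i.product_id = s.product_id
    ORDER BY s.sale_id
"""
# The same query written as a RIGHT JOIN: sales is now the right table.
right_form = """
    SELECT s.sale_id, s.product_id, i.stock_quantity
    FROM inventory AS i
    RIGHT JOIN sales AS s ON i.product_id = s.product_id
    ORDER BY s.sale_id
"""

right_rows = conn.execute(left_form).fetchall()

# Run the RIGHT JOIN spelling only where SQLite understands it.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    assert conn.execute(right_form).fetchall() == right_rows

print(right_rows)
```

All three sales are kept, and the sale of product 12 shows None for stock_quantity. The inventory row for product 99, which was never sold, is dropped.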

Full Join (Outer)
Returns all rows from both tables, with NULLs where there is no match. It combines the results of a left join and a right join.
Example: Using the products table (product_id, product_name) and sales table, to list all products and sales, including unsold products and sales without product matches:

This includes all products and all sales, with NULLs for unmatched rows on either side.
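The sketch below builds a full join from two left joins, which works on any SQLite version, and compares it with the native FULL OUTER JOIN where the installed SQLite (3.39.0 or later) supports it. It uses Python's built-in sqlite3 module, keeps only the columns the query needs, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                    total_amount REAL);
-- Hypothetical rows: product 13 is unsold; sale 103 has no product row.
INSERT INTO products VALUES (10, 'Phone'), (11, 'Laptop'), (13, 'Tablet');
INSERT INTO sales VALUES (101, 10, 250.0), (102, 11, 100.0), (103, 12, 80.0);
""")

# Part 1: every product with its sales (NULL sale for unsold products).
# Part 2: the sales that matched no product at all.
full_rows = conn.execute("""
    SELECT p.product_id, p.product_name, s.sale_id
    FROM products AS p
    LEFT JOIN sales AS s ON s.product_id = p.product_id
    UNION ALL
    SELECT p.product_id, p.product_name, s.sale_id
    FROM sales AS s
    LEFT JOIN products AS p ON p.product_id = s.product_id
    WHERE p.product_id IS NULL
""").fetchall()

# Native spelling, available from SQLite 3.39.0 onward.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native_rows = conn.execute("""
        SELECT p.product_id, p.product_name, s.sale_id
        FROM products AS p
        FULL OUTER JOIN sales AS s ON s.product_id = p.product_id
    """).fetchall()
    assert set(native_rows) == set(full_rows)

for row in full_rows:
    print(row)
```

The unsold Tablet appears with no sale, and sale 103 appears with no product.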
Self Join
A self join joins a table to itself, using aliases to tell the two copies apart.
SQL aliases give a table or column a temporary name, which makes queries easier to read.
The syntax for a self join is:
SELECT column_list
FROM table alias1
INNER JOIN table alias2
ON alias1.common_column = alias2.common_column;

Example: In the customers table, to find pairs of customers with the same membership_status:

This pairs customers sharing the same status without duplicates.
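One way to write this self join is sketched below, using Python's built-in sqlite3 module. The condition a.customer_id < b.customer_id is what keeps a customer from pairing with themselves and stops each pair from showing up twice. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, membership_status TEXT);
-- Hypothetical sample rows: two Gold members and one Silver member.
INSERT INTO customers VALUES (1, 'Amina', 'Said', 'Gold'),
                             (2, 'Brian', 'Otieno', 'Silver'),
                             (3, 'Chao', 'Kahindi', 'Gold');
""")

# a and b are two aliases for the same customers table.
pairs = conn.execute("""
    SELECT a.first_name, b.first_name, a.membership_status
    FROM customers AS a
    INNER JOIN customers AS b
        ON a.membership_status = b.membership_status
       AND a.customer_id < b.customer_id
    ORDER BY a.customer_id, b.customer_id
""").fetchall()

print(pairs)
```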
There are also natural joins and cross joins, but they are used less often.

Window Functions
SQL window functions perform calculations across a set of rows related to the current row, without collapsing those rows into a single result. They are commonly used for tasks like:
Running totals (e.g., cumulative sales over time).
Rankings (e.g., top customers by spending).
Moving averages (e.g., average stock over months).
Year-over-year comparisons.
Window functions fall into three main categories: aggregate, ranking, and value.
1. Aggregate Window Functions
These apply aggregate functions such as SUM(), AVG(), COUNT(), MAX(), and MIN() over a window, using an OVER clause.
Example: To find the average stock quantity across all products:
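A minimal sketch of an aggregate window function, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The inventory rows are hypothetical. An empty OVER () makes the whole table one window, so every row keeps its own stock_quantity and also shows the overall average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock_quantity INTEGER);
-- Hypothetical sample rows.
INSERT INTO inventory VALUES (10, 5), (11, 0), (12, 40);
""")

# AVG(...) OVER () computes one average over all rows without grouping them.
avg_rows = conn.execute("""
    SELECT product_id,
           stock_quantity,
           AVG(stock_quantity) OVER () AS avg_stock
    FROM inventory
    ORDER BY product_id
""").fetchall()

for row in avg_rows:
    print(row)
```

Compare this with a plain SELECT AVG(stock_quantity), which would return a single row.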

2. Ranking Window Functions
These assign ranks or positions: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), PERCENT_RANK(), CUME_DIST().

ROW_NUMBER(): Unique sequential number per row in the window (no ties).

Example: Number each customer's sales by date.
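A minimal sketch of ROW_NUMBER(), using Python's built-in sqlite3 module (SQLite 3.25 or later). PARTITION BY restarts the numbering for each customer, and ORDER BY sets the order inside each partition. The sample sales are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    product_id INTEGER, sale_date TEXT, total_amount REAL);
-- Hypothetical sample rows: customer 1 has two sales, customer 2 has one.
INSERT INTO sales VALUES (101, 1, 10, '2026-01-05', 250.0),
                         (102, 1, 11, '2026-01-09', 100.0),
                         (103, 2, 10, '2026-01-07', 80.0);
""")

# Number each customer's sales from oldest to newest.
numbered = conn.execute("""
    SELECT customer_id,
           sale_id,
           sale_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY sale_date) AS sale_number
    FROM sales
    ORDER BY customer_id, sale_date
""").fetchall()

for row in numbered:
    print(row)
```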

RANK(): Rank with gaps for ties (e.g., 1, 2, 2, 4).

Example: Rank products by total sales (join sales and products).
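The sketch below ranks products by total sales and puts RANK() next to DENSE_RANK() so the difference on ties is visible. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) and keeps only the columns the query needs. The products and amounts are hypothetical, chosen so that two products tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                    total_amount REAL);
-- Hypothetical rows: Phone and Tablet both total 300.
INSERT INTO products VALUES (10, 'Phone'), (11, 'Laptop'),
                            (12, 'Tablet'), (13, 'Charger');
INSERT INTO sales VALUES (101, 11, 500.0), (102, 10, 200.0), (103, 10, 100.0),
                         (104, 12, 300.0), (105, 13, 50.0);
""")

# First total the sales per product, then rank those totals.
ranked = conn.execute("""
    WITH product_totals AS (
        SELECT p.product_name, SUM(s.total_amount) AS total_sales
        FROM sales AS s
        INNER JOIN products AS p ON p.product_id = s.product_id
        GROUP BY p.product_id, p.product_name
    )
    SELECT product_name,
           total_sales,
           RANK() OVER (ORDER BY total_sales DESC) AS sales_rank,
           DENSE_RANK() OVER (ORDER BY total_sales DESC) AS dense_sales_rank
    FROM product_totals
    ORDER BY sales_rank, product_name
""").fetchall()

for row in ranked:
    print(row)
```

After the tie at rank 2, RANK() jumps to 4 while DENSE_RANK() continues with 3.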

DENSE_RANK(): Rank without gaps for ties (e.g., 1, 2, 2, 3, 4, 4).
NTILE(n): Divides rows into n roughly equal groups (e.g., quartiles for n = 4).
Example: Divide customers into 4 groups by registration date.
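A minimal sketch of NTILE(4), using Python's built-in sqlite3 module (SQLite 3.25 or later). The registration_date column is an assumption, since the customers table described earlier does not list one, and the eight customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT,
                            registration_date TEXT)  -- assumed column
""")
# Eight hypothetical customers, registered one month apart in 2025.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, f"Customer {i}", f"2025-{i:02d}-15") for i in range(1, 9)],
)

# Split customers into 4 equal-sized groups, earliest registrations first.
grouped = conn.execute("""
    SELECT customer_id,
           registration_date,
           NTILE(4) OVER (ORDER BY registration_date) AS quartile
    FROM customers
    ORDER BY registration_date
""").fetchall()

quartiles = [row[2] for row in grouped]
print(quartiles)
```

With eight rows and four groups, each group gets exactly two customers. When the rows do not divide evenly, the earlier groups get the extra rows.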

PERCENT_RANK() and CUME_DIST(): Percentile-based ranks (0-1 scale).
Example: Percent rank of sales amounts.

With ascending order, PERCENT_RANK() gives the lowest amount 0 and the highest amount 1, while CUME_DIST() gives the fraction of rows at or below the current amount.
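A minimal sketch of PERCENT_RANK() and CUME_DIST() side by side, using Python's built-in sqlite3 module (SQLite 3.25 or later). The five sale amounts are hypothetical and the table keeps only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, total_amount REAL);
-- Hypothetical sample rows with five distinct amounts.
INSERT INTO sales VALUES (101, 100.0), (102, 200.0), (103, 300.0),
                         (104, 400.0), (105, 500.0);
""")

# PERCENT_RANK = (rank - 1) / (rows - 1); CUME_DIST = rows at or below / rows.
dist_rows = conn.execute("""
    SELECT total_amount,
           PERCENT_RANK() OVER (ORDER BY total_amount) AS pct_rank,
           CUME_DIST() OVER (ORDER BY total_amount) AS cume_dist
    FROM sales
    ORDER BY total_amount
""").fetchall()

percent_ranks = [round(row[1], 2) for row in dist_rows]
cume_dists = [round(row[2], 2) for row in dist_rows]
print(percent_ranks)
print(cume_dists)
```

PERCENT_RANK() starts at 0 for the lowest amount, while CUME_DIST() is always greater than 0 because the current row counts toward its own fraction.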
3. Value (Analytic) Window Functions
Used to return values from other rows
Includes LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE().
1) LEAD(column, offset): Value from a following row (offset defaults to 1).
Example: Compare each sale to the next for a customer.

In my data the customers had no subsequent sales, so LEAD() returned NULL. Had a customer made another sale, LEAD() would have returned that next sale.
2) LAG(column, offset): Value from a preceding row (offset defaults to 1).
For example, LAG() gives access to the quantity in the previous row for each customer.
LEAD() can be used to compare a customer's orders across time periods, such as comparing the quantity purchased this month with the next month.
LAG() is useful for analyzing trends or behavior changes over time, such as comparing how much a customer ordered in the previous period with their current order.
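A minimal sketch of LAG() and LEAD() together, using Python's built-in sqlite3 module (SQLite 3.25 or later). The sales rows are hypothetical: customer 1 has three sales so both functions have something to return, and customer 2 has a single sale so both return NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    product_id INTEGER, sale_date TEXT, total_amount REAL);
-- Hypothetical sample rows.
INSERT INTO sales VALUES (101, 1, 10, '2026-01-05', 250.0),
                         (102, 1, 11, '2026-01-09', 100.0),
                         (103, 2, 10, '2026-01-07', 80.0),
                         (104, 1, 12, '2026-02-01', 300.0);
""")

# For each sale, fetch the same customer's previous and next sale amounts.
lag_lead_rows = conn.execute("""
    SELECT customer_id,
           sale_date,
           total_amount,
           LAG(total_amount) OVER (PARTITION BY customer_id
                                   ORDER BY sale_date) AS previous_amount,
           LEAD(total_amount) OVER (PARTITION BY customer_id
                                    ORDER BY sale_date) AS next_amount
    FROM sales
    ORDER BY customer_id, sale_date
""").fetchall()

for row in lag_lead_rows:
    print(row)
```

The first sale of each customer has no previous amount and the last has no next amount, which is why those columns come back empty at the edges of each partition.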

Which window function do you find most useful for your SQL needs? Let's connect in the comments.

How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI.

Kelvin Vosky — Tue, 10 Feb 2026 18:14:24 +0000

Power BI is an interactive data visualization and business intelligence tool developed by Microsoft. It’s part of the Microsoft Power Platform and brings together apps, services, and connectors that turn data from sources like databases, spreadsheets, PDFs, APIs, and cloud services into interactive reports and dashboards.

At the core of Power BI’s analytical power is Data Analysis Expressions (DAX)—a formula language used to create calculations, measures, and logic within data models. DAX helps analysts move beyond raw numbers to meaningful metrics that drive decisions.

The real value comes from how analysts work with messy data, model it properly, write smart DAX, and design dashboards that answer real business questions.

Data Cleaning

In the real world, data is rarely clean.

You’ll see:

Missing values

Duplicate records

Inconsistent naming (e.g. “Nairobi”, “NRB”, “NBO”)

Dates in different formats

Numbers stored as text

Analysts use Power Query to handle this stage:

Removing duplicates

Standardizing formats

Splitting and merging columns

Creating calculated columns

Combining multiple data sources

This step is less about perfection and more about making data usable and reliable.

Data Modeling

Once data is cleaned, analysts design a data model—usually a star schema.

Fact tables → transactions, sales, orders, events

Dimension tables → dates, customers, products, locations

Good modeling:

Reduces DAX complexity

Improves performance

Makes measures reusable

Prevents incorrect totals

Relationships, cardinality, and filter direction matter more than visuals at this stage. A clean model is what allows dashboards to scale.

DAX

DAX is where analysis happens.

Instead of just showing totals, analysts write measures like:

Revenue growth

Month-over-month change

Rolling averages

Conversion rates

Performance vs targets

Common DAX concepts include:

Measures vs calculated columns

Filter context vs row context

Time intelligence functions

CALCULATE, FILTER, ALL, VALUES

Dashboards

A good dashboard doesn’t try to show everything.

It focuses on:

Key metrics

Trends over time

Comparisons

Exceptions and outliers

Analysts design dashboards around questions like:

What’s performing well?

What’s declining?

Where should attention go now?

Interactivity (filters, slicers, drill-throughs) allows users to explore data without overwhelming them.

From Dashboard to Action

The final step is impact.

Power BI dashboards help teams:

Track KPIs in real time

Identify inefficiencies

Support operational decisions

Communicate insights clearly

When data is modeled correctly, DAX is intentional, and dashboards are focused, insights move from reports into decisions and action.

What is your favorite part of using Power BI for your analysis work?

Schemas and Data Modelling in Power BI

Kelvin Vosky — Mon, 02 Feb 2026 08:00:41 +0000

A Beginner-Friendly Guide to Building Fast and Accurate Reports

Introduction

When working with Power BI, many beginners focus heavily on visuals, charts, and dashboards. While visuals are important, the real foundation of a good Power BI report is the data model behind it.

If your data model is poorly designed:

Reports become slow

Numbers don’t add up correctly

Filters behave unexpectedly

Dashboards are hard to maintain

This is where schemas and data modelling come in. Understanding these concepts early will save you a lot of frustration and help you build reliable, high-performance reports.

What Is a Schema?

A schema is the overall structure or blueprint of how data is organized in a database or data model.

Think of a schema as a map that shows:

What tables exist

What columns (fields) each table contains

How the tables are connected to each other

In Power BI, the schema is what you see in the Model view, where tables are connected by relationship lines.

In simple terms:

A schema explains how your data is arranged and how different pieces of data relate to one another.

Data Modelling

Data modelling is the process of designing that structure in a way that makes data easy to analyze, accurate to report on, and fast to query.

Data modelling involves decisions like:

Which tables should exist

Which tables store numbers vs descriptions

How tables should be connected

Which relationships should be active

How filters should flow across the model

You can think of data modelling as:

Designing the “engine room” of your Power BI report.

Good visuals sit on top of a good model. Poor visuals often hide a bad model.

Key Building Blocks of Data Models

Before looking at schema types, it’s important to understand two fundamental table types used in BI.

Fact Tables

A fact table stores measurable, numerical, transactional data.

Characteristics:

Contains metrics like sales, quantity, revenue, cost

Usually very large

Records events or transactions

Contains foreign keys that link to dimensions

Example: Sales Fact Table

OrderID

CustomerID

ProductID

DateID

Quantity

Revenue

Fact tables answer questions like:

How much?

How many?

How often?

Dimension Tables

A dimension table stores descriptive information that helps explain or categorize facts.

Characteristics:

Smaller than fact tables

Used for filtering, grouping, and slicing data

Contains names, categories, dates, and attributes

Examples:

Customer Dimension → CustomerID, Name, Age, Region

Product Dimension → ProductID, Product Name, Category, Brand

Date Dimension → DateID, Year, Quarter, Month, Day

Dimension tables answer questions like:

Who?

What?

Where?

When?

Relationships in Power BI

Relationships define how tables are connected.

The most common relationship in Power BI is:

One-to-Many (1:*)

Example:

One customer → many sales

One product → many sales

One date → many transactions

In Power BI:

Relationships are created in Model view

Filters usually flow from dimension tables to fact tables

Correct relationships are critical for accurate calculations

Types of Schemas in Business Intelligence

The most common schemas used in Power BI are:

Star Schema

Snowflake Schema

Galaxy Schema (Fact Constellation)
1. Star Schema

The Star Schema is the most recommended and widely used schema in Power BI.

Structure

A central fact table in the middle

Dimension tables directly connected to the fact table

The layout looks like a star

Components of a Star Schema
Fact Table

Contains numeric measures

Contains foreign keys to dimensions

Large in size

Example: Sales Fact

OrderID

CustomerID

ProductID

DateID

Quantity

Revenue

Dimension Tables

Contain descriptive attributes

Smaller in size

Used in slicers and filters

Examples:

Customer → Name, Age, Region

Product → Category, Brand

Date → Year, Month, Day
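To make the star shape concrete, here is a small sketch built in SQLite through Python's built-in sqlite3 module. It is a stand-in for a Power BI model, not how Power BI stores data. The table and column names follow the examples above, the Date dimension is left out to keep it short, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per customer or product.
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, Region TEXT);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Category TEXT, Brand TEXT);
-- Fact table: one row per order, with foreign keys to the dimensions.
CREATE TABLE Sales (OrderID INTEGER PRIMARY KEY,
                    CustomerID INTEGER REFERENCES Customer(CustomerID),
                    ProductID INTEGER REFERENCES Product(ProductID),
                    Quantity INTEGER, Revenue REAL);
-- Hypothetical sample rows.
INSERT INTO Customer VALUES (1, 'Amina', 'Coast'), (2, 'Brian', 'Nairobi');
INSERT INTO Product VALUES (10, 'Phones', 'BrandA'), (11, 'Laptops', 'BrandB');
INSERT INTO Sales VALUES (1, 1, 10, 2, 200.0),
                         (2, 1, 11, 1, 50.0),
                         (3, 2, 10, 1, 100.0);
""")

# Group the fact table by a dimension attribute: one join per dimension.
revenue_by_region = conn.execute("""
    SELECT c.Region, SUM(f.Revenue) AS total_revenue
    FROM Sales AS f
    INNER JOIN Customer AS c ON c.CustomerID = f.CustomerID
    GROUP BY c.Region
    ORDER BY c.Region
""").fetchall()

# A filter on one dimension (Product) narrows the facts before grouping
# by another (Customer), much like a slicer on Category.
phone_revenue_by_region = conn.execute("""
    SELECT c.Region, SUM(f.Revenue) AS total_revenue
    FROM Sales AS f
    INNER JOIN Customer AS c ON c.CustomerID = f.CustomerID
    INNER JOIN Product AS p ON p.ProductID = f.ProductID
    WHERE p.Category = 'Phones'
    GROUP BY c.Region
    ORDER BY c.Region
""").fetchall()

print(revenue_by_region)
print(phone_revenue_by_region)
```

Each dimension is exactly one join away from the fact table, which is what keeps star-schema queries short.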

Why Star Schema Is Recommended

Star schema is considered best practice in Power BI because:

High Performance
Fewer joins mean faster queries and better dashboard responsiveness.

Simplicity
Easy for beginners, analysts, and business users to understand.

Accurate Calculations
DAX measures behave more predictably with clean one-to-many relationships.

Better Filtering
Filters flow cleanly from dimensions to facts.

Microsoft Recommendation
Microsoft explicitly recommends star schemas for Power BI semantic models.

2. Snowflake Schema

A Snowflake Schema is a more complex version of a star schema.

Instead of having one large dimension table, dimensions are split into multiple related tables.

Structure

Fact table remains central

Dimension tables are normalized

Dimensions branch out into sub-dimensions

Example:

Customer → City → Region → Country

Product → Category → Department

Example Snowflake Design

Fact Table: Sales

OrderID

CustomerID

ProductID

DateID

Revenue

Dimension Tables:

Customer (CustomerID, Name, CityID)

City (CityID, CityName, RegionID)

Region (RegionID, RegionName, CountryID)

Country (CountryID, CountryName)
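The sketch below builds this snowflake design in SQLite through Python's built-in sqlite3 module, as a stand-in for a Power BI model. The table and column names come from the design above; the rows are hypothetical. Note how many joins it takes to get from a sale to its country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (CountryID INTEGER PRIMARY KEY, CountryName TEXT);
CREATE TABLE Region (RegionID INTEGER PRIMARY KEY, RegionName TEXT,
                     CountryID INTEGER REFERENCES Country(CountryID));
CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT,
                   RegionID INTEGER REFERENCES Region(RegionID));
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                       CityID INTEGER REFERENCES City(CityID));
CREATE TABLE Sales (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                    ProductID INTEGER, DateID INTEGER, Revenue REAL);
-- Hypothetical sample rows.
INSERT INTO Country VALUES (1, 'Kenya'), (2, 'Tanzania');
INSERT INTO Region VALUES (1, 'Coast', 1), (2, 'Arusha', 2);
INSERT INTO City VALUES (1, 'Kilifi', 1), (2, 'Malindi', 1), (3, 'Arusha', 2);
INSERT INTO Customer VALUES (1, 'Amina', 1), (2, 'Brian', 2), (3, 'Chao', 3);
INSERT INTO Sales VALUES (1, 1, 10, 20260105, 100.0),
                         (2, 2, 10, 20260107, 150.0),
                         (3, 3, 11, 20260109, 70.0);
""")

# Revenue by country needs the whole chain:
# Sales -> Customer -> City -> Region -> Country.
revenue_by_country = conn.execute("""
    SELECT co.CountryName, SUM(f.Revenue) AS total_revenue
    FROM Sales AS f
    INNER JOIN Customer AS cu ON cu.CustomerID = f.CustomerID
    INNER JOIN City AS ci ON ci.CityID = cu.CityID
    INNER JOIN Region AS r ON r.RegionID = ci.RegionID
    INNER JOIN Country AS co ON co.CountryID = r.CountryID
    GROUP BY co.CountryName
    ORDER BY co.CountryName
""").fetchall()

print(revenue_by_country)
```

In a star schema the country name would sit directly on the Customer dimension, and the same question would need a single join.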
Advantages of Snowflake Schema

Reduces data redundancy

Saves storage space

Maintains better data consistency

Useful for very large datasets with complex hierarchies

Disadvantages in Power BI

More complex to design and maintain

Slower performance due to many joins

Harder for beginners and business users

More complicated DAX calculations

Because of this, snowflake schemas are generally not recommended for Power BI unless absolutely necessary.

Why Good Data Modelling Is Critical in Power BI

Data modelling directly affects everything in Power BI.

1. Performance

Well-designed models:

Reduce the number of joins

Improve query speed

Use memory efficiently

Keep dashboards responsive

2. Accurate Reporting

Clean relationships ensure:

Correct totals and aggregations

Filters work as expected

No duplicated or missing values

3. Scalability

Good models are:

Easier to extend

Easier to debug

Easier to reuse across reports

4. User Experience

Clear schemas:

Make reports easier to understand

Reduce confusion for stakeholders

Improve self-service analytics

5. Alignment with Best Practices

Power BI works best when you use:

Star schemas

One-to-many relationships

Dedicated date tables

Clean, well-named fields

Final Summary

Schemas and data modelling are the foundation of effective Power BI reporting.

A schema defines how data is structured and connected

Fact tables store numbers and metrics

Dimension tables store descriptive information

Star schemas are simple, fast, and recommended

Snowflake schemas save space but add complexity

In short:

Good data modelling turns raw data into fast, accurate, and trusted insights.

Thanks for reading. Comment below which schema you prefer and why.

I don't know Excel. Where should I start?

Kelvin Vosky — Sun, 25 Jan 2026 13:56:44 +0000

Microsoft Excel is one of the easiest and most popular tools for basic data analysis — especially when you're just starting out. You don't need to be a math genius or a programmer. Excel helps you organize numbers (and text), quickly find answers, spot patterns, and make simple charts — all with clicks and a few easy formulas.
Step 1: Get Your Data into Excel
Open Excel → new blank workbook
Type or paste your data (example: sales records)
Example simple dataset :

Step 2: Basic Cleaning & Organizing (Very Important First Step)

Remove duplicates → Select data → Data tab → Remove Duplicates
Fix empty cells → Home tab → Find & Select → Go To Special → Blanks → type a value or delete the rows
Convert text to numbers/dates if needed (select column → Data → Text to Columns)
Make your data a Table → Select all data → Insert tab → Table. This adds filters automatically and makes formulas easier.

Step 3: Quick Calculations

Excel formulas start with the = sign.
The most useful ones for beginners:
=SUM(B2:B100) → adds the values in the selected cells
=AVERAGE(B2:B100) → the average of the selected values
=MAX(B2:B100) / =MIN(B2:B100) → the highest and lowest values
=COUNT(B2:B100) → how many cells contain numbers
=COUNTA(B2:B100) → how many cells are not empty (numbers and text)
=COUNTIF(D2:D100, "Nairobi") → how many sales in Nairobi
Example: In a new cell, type
=SUM(F2:F6)

Step 4: Sort & Filter – Find Answers Fast
Click anywhere in your table
Data tab → Sort (example: sort by Sales highest to lowest)
Or click the little arrow in column header → filter only "Phone" or sales > 50000

Step 5: The Super-Powerful PivotTable
PivotTables let you summarize hundreds/thousands of rows in seconds.
How to create one
Click anywhere in your data/table
Insert tab → PivotTable → OK (new sheet)

In the right panel drag fields:
Drag Region to Rows
Drag Sales to Values (it becomes Sum of Sales automatically)

Result: You instantly see total sales per region.
Change it to:
Average sale per region
Count of sales per product
Add Product to Columns → now you get a cross-table (pivot!)
You can also add filters (slicers) to simplify things further

Step 6: Make Simple Charts (Visuals Tell the Story Better)

Select your summary numbers (or the PivotTable)
Insert tab → Recommended Charts
Or directly: Insert → Column chart / Pie chart / Line chart
Example good charts:
Column/Bar chart → compare sales by product or region
Pie chart → show % share of regions (only if few categories)
Line chart → sales over time (if you have dates)
Finally, Visualization
Create a new blank sheet and format it with a fill color, then add the main titles, excerpts, and KPIs. Add slicers for your charts, place the charts beside the slicers, and format the colors for a consistent look.

I am learning data science but I don't know where to start: A Beginner's Guide to Git, Git Bash, Installation, and SSH Keys

Kelvin Vosky — Sun, 18 Jan 2026 09:18:15 +0000

What is Version Control & Why Git?
Version control is a system that records changes to files over time so you can recall specific versions later.
Git is a distributed version control system:

Every developer has a full copy of the project history on their computer.
You work offline, then sync when ready.

Benefits for beginners:

Undo mistakes easily
See who changed what and when
Experiment without fear (branches!)
Collaborate without overwriting others' work

Most devs use GitHub (or GitLab/Bitbucket) as the "cloud home" for repos.

Git

Git is free, open-source, and used everywhere—from Kaggle competitions to real-world AI jobs. Git Bash? It's a command-line interface that makes Git feel like it's running on Linux/Mac, even on Windows. And SSH keys? They're a secure way to connect to remote repos without typing passwords every time.

Ready? Let's install and set up.
Step 1: Installing Git
First things first—download Git. It's available for Windows, Mac, and Linux.

Go to the Official Site: Head to https://git-scm.com/downloads and pick your OS.
For Windows Users (Recommended: Includes Git Bash):
Download the installer.
Run it and follow the prompts. Stick to defaults.
This installs Git and Git Bash—a terminal where you'll run Git commands.

For Mac Users:
If you have Homebrew (a package manager—install it via /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"), just run brew install git.
Or download from the site.

For Linux (e.g., Ubuntu):
Open your terminal and type sudo apt update && sudo apt install git.

Verify Installation:
Open your terminal (Git Bash on Windows) and type git --version. You should see something like "git version 2.43.0".

Step 2: Setting Up Git Basics

Before we connect to the cloud, configure Git on your machine.

Set Your Identity (Git needs to know who you are for commits):
git config --global user.name "Your Name" # e.g., "Kelvin Vosky"
git config --global user.email "your.email@example.com" # Use the email tied to your GitHub account
Create a GitHub Account (If You Don't Have One):
Sign up at github.com—it's free!
This is where you'll store your data science projects remotely.

Push and pull requests

What is git push? (The Direct Upload)

git push is a simple command that uploads (sends) your committed changes from your local computer up to the remote repository (like GitHub).
Example:
git push origin main

This means: "Take my local 'main' branch commits and push them to the 'main' branch on GitHub."
It's fast and direct — great when:
You're working alone (your own data science projects).
You have permission to push directly to the main branch (e.g., your personal repo).

In data science: Use it to backup your analysis notebooks, update your portfolio, or sync code between your laptop and desktop.

Think of push as "I'm sending my work up to the cloud!"

What is a Pull Request (PR)? (The Review & Merge Step)
A pull request is not a Git command like git push — it's a feature on GitHub (or GitLab/Bitbucket) that lets you propose changes and ask someone to review and merge them.
Here's how it usually works (the safe, team-friendly way):

You create a new branch (e.g., feature-new-model).
Make changes, commit, and push the branch to GitHub: git push origin feature-new-model
On GitHub, click "Compare & pull request" (or "New pull request").
You describe your changes (e.g., "Added random forest model for better accuracy").
Others review your code, comment, suggest fixes.
When approved, someone (or you, if allowed) merges the PR into the main branch.

Initialize Your First Repo:

Create a folder for your project: mkdir my-data-project then cd my-data-project.
Run git init to start tracking changes.

Now, let's make your first commit:

Create a file, e.g., touch README.md (or use a text editor).
Add some text: "My first data science project!"
Stage it: git add README.md
Commit: git commit -m "Initial commit"
Check status: git status (should say "nothing to commit").

Boom—your changes are tracked!

Step 3: Generating and Connecting SSH Keys

Passwords are annoying, especially when pushing code multiple times a day. SSH keys provide secure, passwordless access to GitHub.

Check for Existing Keys:

In Git Bash/terminal: ls -al ~/.ssh
If you see id_rsa.pub or id_ed25519.pub, you might already have one. Skip to adding it to GitHub.

Generate a New SSH Key:

Run: ssh-keygen -t ed25519 -C "your.email@example.com"
Hit Enter for defaults (or set a passphrase for extra security).
This creates two files: ~/.ssh/id_ed25519 (private key—keep secret!) and ~/.ssh/id_ed25519.pub (public key—to share).

Add the Key to SSH Agent (For Auto-Loading):
Start the agent: eval "$(ssh-agent -s)"
Add your key: ssh-add ~/.ssh/id_ed25519

Copy the Public Key:
Mac/Linux: cat ~/.ssh/id_ed25519.pub (copy the output).
Windows (Git Bash): clip < ~/.ssh/id_ed25519.pub (copies to clipboard).

Add to GitHub:
Log in to GitHub > Settings (top-right avatar) > SSH and GPG keys > New SSH key.
Paste your public key, title it (e.g., "My Laptop Key"), and add.

Test the Connection:
Run: ssh -T git@github.com
If it says "Hi username! You've successfully authenticated," you're good!

Now, when creating/cloning repos, use the SSH URL (e.g., git@github.com:username/repo.git) instead of HTTPS.
Step 4: Connecting to a Remote Repo and Basic Commands
Let's put it all together.

Create a Repo on GitHub:
New > Name it "my-data-project" > Create (don't add README yet).

Link Local to Remote:
In your local folder: git remote add origin git@github.com:yourusername/my-data-project.git

Push Your Code:
git push -u origin main (first time; later just git push). If your local branch is named master rather than main, rename it first with git branch -M main.

Pull Changes (If Collaborating):
git pull origin main to get updates from remote.

Other essentials for data science:

Clone a Repo: git clone git@github.com:username/repo.git (great for Kaggle datasets or open-source projects).
Branching: git checkout -b experiment-model (try new ideas without breaking main).
Merge: git checkout main then git merge experiment-model.
Ignore Files: Create .gitignore for big datasets (e.g., add "data/*.csv").

Common Pitfalls (And How to Fix Them)

Permission Denied? Double-check SSH setup—test with ssh -T git@github.com.
Conflicts on Pull? Git will highlight the conflicting sections; edit the files, then run git add and git commit.
Forgot to Commit? Always run git status before pushing.
Next: Installing Power BI, DBeaver, and PostgreSQL

I'm New Here

Kelvin Vosky — Tue, 13 Jan 2026 18:30:08 +0000

Hello Techies, I'm new here.