DEV Community: Lameck Odhiambo

Publishing and Embedding - BEST ways to SHARE your PowerBi Reports

Lameck Odhiambo — Wed, 15 Jul 2026 10:49:31 +0000

Introduction

Imagine (or try it in practice) that you have finished your analysis in Power BI Desktop and want to share your insights. How do you share these well-thought-out insights? Worry no more, you are in the right place at the right time to get better at Power BI. Sit tight and buckle up!

That's where publishing and embedding come in. These are just 'fancy' terms; trust me, the processes are as easy as ABC. Publishing or embedding is the last step in preparing Power BI dashboards and reports, so that you can share them with the people concerned.

Before we continue, let's get a clear distinction between two pairs of terms: Power BI Desktop vs Power BI Service, and publishing vs embedding.

Power BI Desktop vs Power BI Service

Power BI Desktop is a free Windows application used for building data models and designing reports. Power BI Service is a cloud-based web platform used to publish, share, and collaborate on those reports.

Publishing vs Embedding

Publishing is the initial step, where you upload a report from Power BI Desktop to the cloud (Power BI Service). Embedding is the secondary step of taking that published report and displaying it directly inside an external website, portal, or custom application (via an iframe or the REST API).

Prerequisites

Curiosity
A Power BI Service account, opened in your favourite browser (Premium account)
Power BI Desktop app
VS Code or any other editor
The Power BI report you want to publish/embed

1. Publishing Reports

Publish from Power BI Desktop when you want to move your local .pbix file to the cloud for sharing and collaboration. You should publish when your report is ready, requires team collaboration, or needs to be embedded into secured platforms like Microsoft Teams or SharePoint.

Below is a sample report I want to publish, named 'diaspora remittance'.

Create an account in Power BI Service, as stated earlier, by opening your browser and searching for Power BI Service.
Using the same details, also log in to Power BI Desktop.

Create a workspace in Power BI Service to hold your project, e.g. I created one named 'Data_Remittance'.

In Power BI Desktop, click Publish on the top right and select the workspace you created in Power BI Service.
My workspace was as below: the workspace I created in Power BI Service. You should also be able to see yours here.
After selecting your workspace, go back to Power BI Service. You should see your report under your workspace.
You can share the report as a link with the targeted audience or person. This is one process.

2. Embedding Reports

Power BI embedding is used to integrate interactive data visuals directly into your web applications, internal portals (like SharePoint), or SaaS products. This allows end users to analyze data without leaving their familiar workflows or logging in to the external Power BI Service.

To get the embed code, open the published report in Power BI Service, go to the File menu, select Embed report, then Publish to web.
Copy the HTML code. You can use it in your website by pasting it into the page's HTML.
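To make the pasting step concrete, here is a minimal sketch in Python that wraps an embed link in an iframe and prints a tiny web page around it. The function name `build_embed_page`, the page layout and the `EMBED_URL` value are all my own placeholders, not something Power BI gives you; swap in the `src` link from the HTML code you copied.

```python
from html import escape


def build_embed_page(embed_url: str, title: str, width: int = 800, height: int = 500) -> str:
    """Return a minimal HTML page that shows a published report inside an iframe."""
    iframe = (
        f'<iframe title="{escape(title)}" width="{width}" height="{height}" '
        f'src="{escape(embed_url)}" frameborder="0" allowfullscreen="true"></iframe>'
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  {iframe}\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical placeholder: replace with the src link from your own embed code.
EMBED_URL = "https://example.com/your-publish-to-web-link"

page = build_embed_page(EMBED_URL, "Diaspora Remittance")
print(page)
```

You could save the printed text as an .html file and open it in a browser. Keep in mind that a Publish to web link can be opened by anyone who has it, so it suits data that is fine to be public.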

Below is an example of how I embedded it in a sample web page using the VS Code editor.
This is how it looked within the sample web page. Amazing!

Conclusion

Just like that, you can share your Power BI reports. Publishing and embedding reports in Power BI democratizes your data by turning static files into dynamic, cloud-based experiences. It bridges the gap between raw analysis and actionable business insights, allowing stakeholders to easily access, filter, and consume live data directly within their daily workflows.

The ONLY Guide you NEED to connect SQL Databases to POWERBI

Lameck Odhiambo — Sun, 05 Jul 2026 08:41:41 +0000

Introduction

Integrating SQL databases with Power BI is one of the most effective and widely adopted approaches for creating powerful, scalable business intelligence solutions. With Power BI, you can easily connect to various relational databases such as Microsoft SQL Server, Azure SQL Database, PostgreSQL, MySQL, Oracle, and many others. This seamless connectivity enables you to analyze massive datasets with ease and build dynamic, interactive reports and dashboards.

In this guide I access data sitting in a database, both local and in the cloud; in this case it is a PostgreSQL database. After accessing the database, I connect it to Power BI.

Must have

PostgreSQL must be installed and running on your computer (or a local/server machine you can access). Make sure you note the following details while installing PostgreSQL:

Host (usually localhost or 127.0.0.1)
Port (default is 5432)
Database name
Username (e.g. postgres)
Password (of the database created)

An Aiven account with a PostgreSQL service created in it
DBeaver (universal database tool)
Power BI Desktop

Local Database

A local database is a data storage system that resides directly on your device or a restricted local network rather than on a remote cloud server. In this case it resides in our system device.

Step 1: DBeaver and database setup

Open DBeaver.
Click Database > New Database Connection > Select PostgreSQL.

Fill in the details for your local database: Server Host, Port, Database, Username, Password.
Click Test Connection. If it says "Connected", you’re good.
Expand your database to confirm you can see the tables. Once this works, you can use the exact same details in Power BI.

Inside your database you can create a schema so that you can import or insert data as a test experiment. In this case I uploaded a jcars.csv dataset.

To import data into your PostgreSQL database, right-click on the schema, where there will be an option to import data.
Select input files > Browse (to pick the file on your PC) > proceed and finish.

Step 2: Connecting the local database to Power BI

Open Power BI Desktop.
Click Home > Get Data > Database > PostgreSQL database.

In the connection window, fill in:
Server: localhost (or 127.0.0.1 or your server name), e.g. 127.0.0.1:5432
Database: your database name, e.g. postgres
Click OK.
Enter your Username (e.g. postgres) and Password when prompted.
Click Connect.
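The same five connection details show up in slightly different shapes depending on the tool. As a rough sketch (the helper names `power_bi_server_field` and `postgres_uri` are made up for illustration), the snippet below builds the host:port text that goes into the Server box above, plus the `postgresql://` URI form that many PostgreSQL clients accept. The password in the example is invented.

```python
from urllib.parse import quote


def power_bi_server_field(host: str, port: int = 5432) -> str:
    """Text for the Server box: host and port joined with a colon."""
    return f"{host}:{port}"


def postgres_uri(host: str, port: int, database: str, username: str, password: str) -> str:
    """Connection URI; special characters in the credentials are percent-encoded."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    db = quote(database, safe="")
    return f"postgresql://{user}:{secret}@{host}:{port}/{db}"


print(power_bi_server_field("127.0.0.1"))
print(postgres_uri("localhost", 5432, "postgres", "postgres", "p@ss/word"))
```

Percent-encoding matters when a password contains characters such as @ or /, which would otherwise be read as part of the URI structure.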

Step 3: Load Your Data

In the Navigator window, you will see all available tables and views. Select the tables you want to use.
Click Load or Transform Data (to clean/edit first in Power Query).

A cloud-based SQL database, such as PostgreSQL hosted on Aiven

Step 1: Test Connection in DBeaver

Open DBeaver.
Click New Connection > Select PostgreSQL > Next.
Fill in the details from your Aiven account; you can simply copy and paste them.

Inside your database you can create a schema so that you can import or insert data as a test experiment. In this case I uploaded a jcars.csv dataset.
To import data into your PostgreSQL database, right-click on the schema, where there will be an option to import data.
Select input files > Browse (to pick the file on your PC) > proceed and finish.

Step 2: Open Power BI Desktop

Go to Home > Get Data > Database > PostgreSQL database.

Enter the connection details:
Server: yourhost.aivencloud.com:port (example: pg-abc123.aivencloud.com:port, using the port Aiven provides)
Database: defaultdb (or your database name)
Click OK.
Enter your Username and Password when prompted.
In the Navigator window, select the tables you want, then click Load if the data is clean (or Transform Data to clean it first).

In my case, as shown below, Power BI was unable to connect. To solve this issue you must download the CA certificate and install it.

The CA certificate is available in your Aiven account.
In the Windows search bar, search for "Manage user certificates", then follow the steps below.
Trusted Root Certification Authorities > right-click Certificates > All Tasks > Import > Next > Browse > change the file type to All Files > select the downloaded certificate > Next > Finish > Yes > OK
When browsing for the CA certificate, select the ca.pem file you downloaded from Aiven, import it, then finish installing.
Close Power BI Desktop, reopen it and enter the details again. This time it will connect to the database.

Conclusion

Connecting a database to Power BI removes the need for manual, error-prone spreadsheet exports. By centralizing information, organizations create a reliable single source of truth. This ensures consistent metrics and unlocks scalable, automated business intelligence.

Data Modeling, Joins, Relationships and Schemas

Lameck Odhiambo — Mon, 22 Jun 2026 07:52:40 +0000

Before data reaches its final destination, it needs to be organized in a structured way to enable easy retrieval, good performance and efficient storage. This is data modeling: the process of creating a blueprint of how data is connected, stored and retrieved in a system. It enables you to create an organized structure for your tables.

Reasons for data modeling

Data consistency
Optimize performance of the queries
Scalability and maintenance
Optimize cost in storage

Layers of data modeling

1. Conceptual Data Model
The highest-level, business-focused view. It defines what data is being collected and how business concepts relate to one another, that is, the subjects, their characteristics and their relations. It involves gathering information from stakeholders.
Requirements can be gathered using the Waterfall or the Agile method, two fundamentally different approaches to project management. Waterfall is a linear, step-by-step process where each phase must be completed before the next begins. Agile is an iterative, flexible approach that breaks projects into smaller cycles for continuous improvement and rapid delivery.

Focus: Business entities (e.g., Customers, Products, Orders) and their relationships.
Audience: Business stakeholders, domain experts, and product managers.
Details: Tech-agnostic; no attributes, data types, or system implementations are specified.

2. Logical Data Model
The bridge between the business requirements and the technical solution. It defines structure by establishing facts (events) and dimensions (context).

Focus: Data attributes, primary/foreign keys, and specific data objects.
Audience: Data architects and business analysts.
Details: Technology-neutral; independent of the specific database management system (DBMS) being used.

Here we come up with an Entity-Relationship (ER) diagram.

3. Physical Data Model
The most technical and concrete layer. It dictates exactly how the data will be stored and structured in a specific database.

Focus: Table names, column specifications, data types, storage methods, and compression techniques.
Audience: Database administrators, developers, and data engineers.
Details: Highly specific to a chosen engine (e.g., PostgreSQL, Snowflake, BigQuery)

Types of Data modeling

1. OLTP (Online Transaction Processing)

OLTP modeling is the process of designing databases to handle high volumes of fast, real-time, day-to-day transactions (such as e-commerce checkouts or banking transfers). Its primary goal is to ensure data integrity, eliminate redundancy, and support rapid write, update, and delete operations.
This is the first step, taken before data is moved from a database to a data warehouse.

Core Principles of OLTP Modeling

Normalization (up to 3NF): Data is broken down into smaller, logical tables to eliminate duplication. For instance, a customer’s address will live in a single Addresses table rather than being repeated on every single order.
Entity-Relationship (ER) Design: Models are created by identifying distinct entities (e.g., Customers, Products, Orders) and establishing strict relationships (e.g., one-to-many, many-to-many) between them.
ACID Compliance: The model prioritizes atomicity, consistency, isolation, and durability so that complex, multi-step transactions either succeed entirely or roll back cleanly without data corruption

Best Practices

Implement Strong Constraints: Use Primary Keys (PK), Foreign Keys (FK), UNIQUE constraints, and NOT NULL rules at the database level to enforce strict data integrity.
Index Wisely: Index your Primary and Foreign Keys to speed up row retrieval, but avoid over-indexing, as this will slow down write-heavy transactions.
Choose the Right Technology: Use a robust Relational Database Management System (RDBMS) such as PostgreSQL, Oracle or MySQL.
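The constraint and indexing practices above can be sketched in a few lines. This is a minimal example under stated assumptions: it uses SQLite through Python's built-in sqlite3 module as a stand-in so it runs without a server (PostgreSQL syntax is close but not identical), and the table names, column names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    full_name   TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      NUMERIC NOT NULL
);
-- Index the foreign key so looking up a customer's orders stays fast.
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
""")

conn.execute("INSERT INTO customers VALUES (1, 'amina@example.com', 'Amina')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.50)")


def try_insert(sql: str) -> str:
    """Run an INSERT and report whether the database accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same email as customer 1, so the UNIQUE constraint should block it.
duplicate_email = try_insert("INSERT INTO customers VALUES (2, 'amina@example.com', 'Other')")
# Customer 999 does not exist, so the foreign key should block it.
orphan_order = try_insert("INSERT INTO orders VALUES (101, 999, 10.00)")

print(duplicate_email)
print(orphan_order)
```

The point of the sketch is that bad rows are refused by the database itself, not by application code that might forget to check.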

Common Data Types in Data Modeling

Data types are generally divided into standard primitive types and advanced complex structures.

Numeric Types

Integer: Stores whole numbers without decimals (e.g., ID numbers or inventory counts).
Float / Real: Stores approximate numerical values with fractional decimals for scientific data.
Decimal / Numeric: Stores exact fixed-point decimals, making it ideal for financial amounts.
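Why does the float vs decimal distinction matter for money? A quick sketch in Python shows the same behaviour that database FLOAT and DECIMAL columns have; the amounts are arbitrary examples.

```python
from decimal import Decimal

# Binary floating point stores 0.1 and 0.2 only approximately.
float_total = 0.1 + 0.2
float_matches = float_total == 0.3

# Decimal keeps exact base-10 digits, like a DECIMAL/NUMERIC column.
exact_total = Decimal("0.10") + Decimal("0.20")
exact_matches = exact_total == Decimal("0.30")

print(float_matches)
print(exact_matches)
```

The float comparison fails while the decimal one succeeds, which is why financial amounts are usually stored as Decimal / Numeric.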

String and Text Types

CHAR: Holds fixed-length text character strings, padding shorter inputs with spaces.
VARCHAR: Holds variable-length text strings up to a specified maximum length.
TEXT / CLOB: Stores large blocks of character data, such as product descriptions or articles.

Date and Time Types

DATE: Records calendar dates consisting of the year, month, and day.
TIME: Captures precise hours, minutes, and seconds.
TIMESTAMP: Combines date and time to track real-time systemic events or logs.

Logical and Binary Types

Boolean: Evaluates to true or false states to support logical checks.
BLOB: Keeps raw binary large objects, including uploaded imagery, video files, or document attachments.

Complex and Semi-Structured Types

Array: Groups a list of multiple values inside a single column field.
Struct / JSON: Embeds a nested key-value format block to represent flexible, semi-structured object details

Keys in a Database

In a database, a key is an attribute (column) or a collection of attributes used to uniquely identify rows within a table and establish relationships between multiple tables. Keys are foundational for enforcing data integrity, preventing duplication, and ensuring efficient data retrieval.

Why Database Keys Matter

Enforce Uniqueness: They stop identical duplicate rows from muddying your datasets.
Connect Data: They link related concepts (e.g., matching a CustomerID foreign key in an Orders table back to the master Customers profile).
Speed Up Searches: Database engines automatically build indexes on primary key and unique columns (foreign keys often need an explicit index), which greatly speeds up lookups.

Relationships in a database

Database relationships are logical links established between two or more tables based on a common column. In a relational database management system (RDBMS) like MySQL or PostgreSQL, these connections dictate how records interact. They use Primary Keys (PK) and Foreign Keys (FK) to eliminate redundant data and maintain data integrity.
To connect different entities in a database we need relationships, and to configure relationships we need cardinality.

Use Cases

One to Many
A one-to-many (1:N) relationship occurs when a single record in one table (the parent) links to multiple records in another table (the child), but each child record maps back to exactly one parent record. It is the most common pattern in database design because it minimizes redundant data and enforces clear hierarchies.
e.g. Customers and Orders, Departments and Employees

Many to Many
A many-to-many (M:N) relationship occurs in a database when multiple records in one table are associated with multiple records in another table, e.g. Students and Courses. A relational database cannot express this link with a single foreign key without duplicating data, so it is modeled with a third table (a junction or bridge table) that holds one row per pairing and a foreign key to each side.
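A many-to-many link is easier to see in code. The sketch below is only an illustration: it uses SQLite via Python's sqlite3 module, and the students, courses and enrollments tables with their sample rows are invented. The third table, enrollments, turns one M:N relationship into two 1:N relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Junction table: one row per (student, course) pairing.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO students VALUES (1, 'Asha'), (2, 'Brian');
INSERT INTO courses  VALUES (10, 'SQL'), (20, 'Power BI');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Walk through the junction table to list who takes what.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()

print(rows)
```

Asha appears twice (two courses) and SQL appears twice (two students), yet each student and each course is stored only once in its own table.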

One to One
A one-to-one (1:1) database relationship occurs when a single record in Table A is linked to exactly one record in Table B, and vice versa. Each row in either table has at most one matching row on the other side, e.g. Person and Passport, Country and Capital City, Car and License Plate, Store User and Shopping Cart, Employee and Desk Assignment, App Account and Premium Subscription.

Normalization

Database normalization is a systematic design process used to organize data in a relational database to minimize data redundancy and eliminate data modification anomalies. It proceeds through normal forms, e.g. 1NF, 2NF and 3NF.

After defining these relationships, the last step is the physical layer: implementing the conceptual and logical models by writing SQL scripts.

2. OLAP Data Modeling (Online Analytical Processing)

The data sources are the databases created using OLTP data modeling. Online Analytical Processing (OLAP) data modeling structures data for rapid querying and business intelligence. It organizes information into a multidimensional model, and the data moves through the following layers:

Database ---------> Bronze -----------> Silver ----------> Gold

Bronze

Exact replica of tables from database

Silver

Transformed data
Aggregations, e.g. One Big Table (OBT)

Gold

Dimension data model
Fact and Dimensions Tables

The most common ways to physically structure OLAP models are through specific schemas in a data warehouse

1. Star Schema
The most widely used and recognizable model.

Structure: Consists of a central fact table surrounded by multiple dimension tables.

Fact Table: Contains the quantitative measurements (e.g., Sales Amount, Units Sold) and foreign keys mapping to the dimensions.

Dimension Tables: Highly denormalized tables containing descriptive attributes (e.g., Customer Name, Store Location, Product Category).

Benefit: Simplicity and extremely fast read times, as it requires fewer table joins to get analytical results.
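Here is a small sketch of a star schema, assuming SQLite through Python's sqlite3 module as a stand-in for a data warehouse. The table names (fact_sales, dim_product, dim_store) and the sample figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_location TEXT);

-- Fact table: measurements plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    units_sold   INTEGER,
    sales_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (1, 'Nairobi'), (2, 'Mombasa');
INSERT INTO fact_sales  VALUES (1, 1, 2, 2000.0), (1, 2, 1, 1000.0), (2, 1, 3, 450.0);
""")

# One join from the fact table to a dimension answers "sales by category".
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(sales_by_category)
```

Because category sits directly on the product dimension, the question needs a single join. In a snowflake schema the category would live in its own sub-dimension and the same question would need one more join.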

2. Snowflake Schema
A refinement of the star schema.

Structure: Similar to the star schema, but the dimension tables are normalized, meaning they branch out into sub-dimension tables.

Example: A "Product" dimension might connect to a "Category" sub-dimension, which connects to a "Department" sub-dimension.

Benefit: Reduces data redundancy and takes up less storage space, though queries may require more complex joins

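Dimension tables change over time, hence the need for SCDs (Slowly Changing Dimensions):

Type 0 - No change (the original value is kept)
Type 1 - Upsert / Overwrite (the old value is replaced, no history)
Type 2 - Tracking history of changes (a new row is added per change)
Type 3 - Adds a new column (to hold the previous value)

For a fuller explanation of SCDs, see https://en.wikipedia.org/wiki/Slowly_changing_dimension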
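The difference between Type 1 and Type 2 is easiest to see side by side. The sketch below uses plain Python lists of dictionaries in place of a dimension table; the column names (valid_from, valid_to, is_current) follow a common convention but are my own assumption, as are the sample customer and dates.

```python
from datetime import date


def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite the value in place, so the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return rows


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row, then append a new current row."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return rows


type1_dim = [{"customer_id": 1, "city": "Kisumu"}]
apply_type1(type1_dim, 1, "Nairobi")

type2_dim = [{
    "customer_id": 1,
    "city": "Kisumu",
    "valid_from": date(2024, 1, 1),
    "valid_to": None,
    "is_current": True,
}]
apply_type2(type2_dim, 1, "Nairobi", date(2025, 6, 1))

print(type1_dim)
print(type2_dim)
```

After the change, the Type 1 table has one row and no memory of Kisumu, while the Type 2 table has two rows, so reports on older sales can still be tied to the city that was valid at the time.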

Joins in data modeling

Joins are heavily used in both OLTP and OLAP, but they are used for completely different reasons and perform differently in each system.
In data modeling, joins are operations used to combine rows from two or more tables horizontally into a single dataset, based on a related common key (such as an ID). They are fundamental for integrating normalized databases and bringing related data together for reporting and analysis.

OLTP (Online Transaction Processing)

How it’s used: Joins are necessary. OLTP systems process day-to-day business transactions (like an e-commerce checkout) and use highly normalized schemas.

The Goal: Data is split into many small tables (e.g., customers, orders, products) to prevent duplication and ensure fast, accurate data entry.

Impact: Queries join a few tables together, but they typically only touch a very small number of rows (e.g., a single customer's specific order), making these joins extremely fast.

OLAP (Online Analytical Processing)

How it’s used: Joins are typically used in relational data warehouses (using Star or Snowflake schemas) to connect a central Fact table to surrounding Dimension tables.

The Goal: OLAP is designed for complex, historical analysis scanning millions of rows. Because large-scale joins are computationally expensive, OLAP models use denormalization (duplicating some data) to keep joins to a minimum and boost query performance.

Impact: Queries involve multi-table joins and massive aggregations, which naturally take longer (seconds or minutes) but yield deep business insights.

The 4 Primary Types of Joins

The type of join you choose determines how unmatched data (rows that don't share a common key) is handled:

INNER JOIN: Returns only the rows where there is a matching value in both tables. If a record doesn't exist on both sides, it is excluded.

LEFT JOIN (Left Outer): Returns all rows from the left table, and the matching rows from the right table. If there is no match on the right side, the result will contain null for the right-hand columns.

RIGHT JOIN (Right Outer): Returns all rows from the right table, and the matching rows from the left table. If there is no match on the left side, the result will show null for the left-hand columns.

FULL JOIN (Full Outer): Returns all rows from both tables. Rows are paired where the keys match; where a row has no match on the other side, that side's columns contain null.
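To see how unmatched rows are handled, here is a small sketch using SQLite through Python's sqlite3 module. The customers and orders tables and their rows are invented: Brian has no orders, and order 101 points at a customer that does not exist. The sketch sticks to INNER and LEFT joins because older SQLite builds do not support RIGHT and FULL joins; a RIGHT JOIN gives the same rows as a LEFT JOIN with the two tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO customers VALUES (1, 'Asha'), (2, 'Brian');
-- Order 101 points at customer 3, who is not in the customers table.
INSERT INTO orders VALUES (100, 1, 50.0), (101, 3, 75.0);
""")

# INNER JOIN: only rows with a match on both sides.
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where there is no order.
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Tables swapped: every order, with NULL where there is no customer.
swapped_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
print(swapped_rows)
```

A FULL JOIN would return the union of the last two results: Asha with her order, Brian with no order, and order 101 with no customer.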

Point worth noting

Joins vs. Relationships: Joins physically merge or combine datasets to create a new, static result set (commonly used in SQL queries or Power Query). Relationships establish an ongoing, logical connection between tables so the modeling tool (like Microsoft Power BI or Tableau) can calculate metrics across the tables dynamically.

Conclusion

The Master Blueprint of Data Success
Data modeling is not just about organizing tables. It is the secret blueprint that turns messy, raw numbers into powerful business insights. By mastering schemas, relationships, and joins, you build a solid foundation for any data project. The Star Schema serves as your ultimate map, keeping your data clean and organized. Relationships act as smart bridges, letting your tables talk to each other without creating clutter. Meanwhile, joins work like glue to merge data when you need a single, complete view. When these three tools work together, magic happens. Your reports run faster, your numbers stay accurate, and your business can grow without slowing down. In short, a great data model turns confusing data into clear, actionable answers.

Linux Fundamentals for Data Engineers

Lameck Odhiambo — Mon, 08 Jun 2026 19:30:49 +0000

Introduction

Linux is a popular open-source operating system modeled after UNIX (Think of Unix as the original blueprint or architectural inspiration, and Linux as a modern, completely independent recreation built using that same blueprint). At its core is the Linux kernel - the base code that manages the communication between a computer's hardware and software.

Use cases of Linux other than Data Engineering

You likely use Linux every day without realizing it:
Mobile Devices: The Android operating system is built on top of the Linux kernel.
Servers & Cloud: The vast majority of web servers and cloud services (like AWS and Google Cloud) run on Linux.
Smart Home & IoT: Smart TVs, routers, and embedded devices often use Linux.
Supercomputers: All of the world's 500 fastest supercomputers (the TOP500 list) run on Linux for peak performance and efficiency.
Gaming: Handheld gaming devices and PC gaming platforms (like SteamOS) rely heavily on Linux to run Windows-based games.

Because we are focusing on Data Engineering, let's see how data engineers use Linux. Come along...

Data engineers use Linux as the underlying foundation for modern data infrastructure, since nearly all cloud environments, container systems, and big data frameworks run natively on Linux servers.

Linux use cases for Data Engineers

Processing data before Python touches it
Building Automation & Ingestion scripts
Interacting with Cloud Systems and remote servers
Deploying containers and Orchestration tools
Debugging and Infrastructure monitoring

Sample Linux Commands

File & Directory management

ls -la                     # List all files (including hidden) with details
ls -lh                     # List files with human-readable sizes
pwd                        # Print current working directory
cd /path/to/dir            # Change directory
cd ~                       # Go to home directory
cd -                       # Go back to previous directory

mkdir foldername           # Create directory
mkdir -p dir1/dir2/dir3    # Create nested directories
touch filename.txt         # Create empty file

cp file.txt /dest/         # Copy file
cp -r folder/ /dest/       # Copy folder recursively
mv oldname newname         # Rename or move file/folder
rm file.txt                # Remove file
rm -rf folder/             # Remove folder and contents (use with caution!)

System Information

uname -a                   # Show kernel and system info
lsb_release -a             # Show distribution info
cat /etc/os-release        # Show OS details
hostname                   # Show hostname
uptime                     # Show system uptime
free -h                    # Show memory usage (human readable)
df -h                      # Show disk space usage
du -sh /path               # Show size of directory
top                        # Live process viewer (press q to quit)
htop                       # Better interactive process viewer (if installed)

Process Management

ps aux                     # List all running processes
ps aux | grep nginx        # Find specific process
kill 1234                  # Kill process by PID
kill -9 1234               # Force kill process
pkill nginx                # Kill process by name
jobs                       # List background jobs
fg %1                      # Bring job to foreground
bg %1                      # Send job to background

File searching & Content

find / -name "*.txt" 2>/dev/null   # Find files by name
locate filename                    # Fast search (needs updatedb)
grep "search text" file.txt        # Search inside file
grep -r "text" /path/              # Recursive search in directory
cat file.txt                       # Display file content
less file.txt                      # View file with scrolling
head -n 20 file.txt                # First 20 lines
tail -n 20 file.txt                # Last 20 lines
tail -f /var/log/syslog            # Follow log file in real-time

Networking

ip addr show               # Show network interfaces (modern)
ifconfig                   # Show interfaces (older)
ping google.com            # Test connectivity
curl -I https://example.com # Get HTTP headers
wget https://example.com/file.zip   # Download a file
ssh user@192.168.1.100     # SSH into remote server
scp file.txt user@host:/path/   # Copy file via SSH
netstat -tuln              # Show listening ports
ss -tuln                   # Modern alternative to netstat

Package Management

# Debian/Ubuntu
sudo apt update            # Refresh the package lists
sudo apt upgrade           # Upgrade installed packages
sudo apt install htop      # Install a package
sudo apt remove htop       # Remove a package

User & Permissions

whoami                     # Current user
sudo command               # Run as superuser
su - username              # Switch user
chmod 755 file.sh          # Change permissions (rwxr-xr-x)
chmod +x script.sh         # Make executable
chown user:group file.txt  # Change owner
id                         # Show user/group IDs
passwd                     # Change password
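The number in chmod 755 packs three permission sets (owner, group, others) into one octal value. As a small illustration, Python's standard stat module can translate such a number into the rwx string that ls -la shows. The helper name `describe_mode` is made up for this sketch.

```python
import stat


def describe_mode(octal_mode: int) -> str:
    """Return the ls-style permission string for a regular file with these bits."""
    return stat.filemode(stat.S_IFREG | octal_mode)


print(describe_mode(0o755))  # -rwxr-xr-x
print(describe_mode(0o644))  # -rw-r--r--
```

Each digit is a sum of read (4), write (2) and execute (1), so 7 means rwx and 5 means r-x.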

Compression & Archives

tar -czvf archive.tar.gz /folder/     # Create compressed tarball
tar -xzvf archive.tar.gz              # Extract
zip -r archive.zip folder/            # Create zip
unzip archive.zip                     # Extract zip

Practical Example

Conclusion

Linux is the essential foundation for modern data engineering. Mastery of Linux command-line skills, shell scripting, text processing, process management, and server administration is critical for building, managing, and troubleshooting data pipelines effectively. As data infrastructure grows more complex with cloud, containers, and tools like Spark, Kafka, Airflow, and Kubernetes, strong Linux knowledge provides a significant competitive edge. It enables faster automation, better problem-solving, and higher efficiency.

Key Takeaway: Investing in Linux fundamentals offers one of the best returns for any data engineer. The terminal is the primary language of data platforms, so master it to unlock greater productivity and career growth.