<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amo InvAnalysis</title>
    <description>The latest articles on DEV Community by Amo InvAnalysis (@amo_invanalysis_392c1de71).</description>
    <link>https://dev.to/amo_invanalysis_392c1de71</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3729235%2F49bdfa83-11f0-40b4-9f2e-34921bdd6705.png</url>
      <title>DEV Community: Amo InvAnalysis</title>
      <link>https://dev.to/amo_invanalysis_392c1de71</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amo_invanalysis_392c1de71"/>
    <language>en</language>
    <item>
      <title>A Comprehensive Guide to Publishing and Embedding Power BI Reports on the Web with IFrames</title>
      <dc:creator>Amo InvAnalysis</dc:creator>
      <pubDate>Sun, 05 Apr 2026 18:12:27 +0000</pubDate>
      <link>https://dev.to/amo_invanalysis_392c1de71/a-comprehensive-guide-to-publishing-and-embedding-power-bi-reports-on-the-web-with-iframes-3f4j</link>
      <guid>https://dev.to/amo_invanalysis_392c1de71/a-comprehensive-guide-to-publishing-and-embedding-power-bi-reports-on-the-web-with-iframes-3f4j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: The Power of Public Data
&lt;/h2&gt;

&lt;p&gt;A screenshot of a bar chart tells people what was true when you took it.&lt;/p&gt;

&lt;p&gt;A live, embedded Power BI report tells them what is true right now, and lets them interact with the data themselves. That distinction matters, and it may be exactly the edge your audience needs.&lt;/p&gt;

&lt;p&gt;Web-embedded Power BI reports allow readers to filter, drill down, and explore data without leaving the page they are on. &lt;/p&gt;

&lt;p&gt;For data analysts building public portfolios, web developers integrating business intelligence into client sites, and business owners who want customers or stakeholders to engage with real numbers in real time, this capability changes what a webpage can do.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you will need before you start:
&lt;/h4&gt;

&lt;p&gt;• A Power BI Pro or Premium Per User license (or Premium capacity for your workspace)&lt;br&gt;
• A published report in the Power BI Service&lt;br&gt;
• Basic familiarity with HTML&lt;/p&gt;
&lt;h2&gt;
  
  
  Phase 1: Preparing Your Report for the Web
&lt;/h2&gt;

&lt;p&gt;Data sanitization comes first. The "Publish to Web" feature in Power BI creates a publicly accessible link. Anyone with that link can view your report, and search engines can index it. Before you generate any embed code, confirm that your report contains no personally identifiable information, confidential financials, or any data your organization has not cleared for public distribution.&lt;/p&gt;

&lt;p&gt;Once you have confirmed the data is safe to publish, optimize the report canvas for web viewing.&lt;/p&gt;
&lt;h3&gt;
  
  
  Canvas configuration checklist:
&lt;/h3&gt;

&lt;p&gt;• Set your page size to 16:9 for standard web layouts, or use a custom size if your target container has specific dimensions&lt;br&gt;
• Configure the Mobile Layout under the View tab in Power BI Desktop so the report adapts on smaller screens&lt;br&gt;
• Check that every visual loads correctly and all slicers and filters are set to the default state you want visitors to land on&lt;br&gt;
• Remove any developer or test filters you left on during build&lt;/p&gt;
&lt;h2&gt;
  
  
  Phase 2: The Publishing Workflow
&lt;/h2&gt;

&lt;p&gt;Open your report in Power BI Desktop. Select File &amp;gt; Publish &amp;gt; Publish to Power BI and choose your target workspace. Wait for the confirmation message, then open the Power BI Service at app.powerbi.com and navigate to that workspace.&lt;/p&gt;

&lt;p&gt;Locate your report in the workspace list and open it. Then follow this path: File &amp;gt; Embed Report &amp;gt; Publish to Web (Public).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Power BI offers two different embed options. "Embed Report" (without the "Public" qualifier) generates a secure embed for internal portals and requires viewers to authenticate. "Publish to Web (Public)" generates an embed code with no authentication requirement. Make sure you select the correct one for your use case.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Power BI will display a dialog with two outputs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; A direct URL you can share as a link&lt;/li&gt;
&lt;li&gt; An HTML IFrame snippet ready to paste into a webpage&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Copy the IFrame code. You will use it in the next phase.&lt;/p&gt;
&lt;h2&gt;
  
  
  Phase 3: The Anatomy of an IFrame
&lt;/h2&gt;

&lt;p&gt;An IFrame (inline frame) is an HTML element that loads a separate webpage inside a defined rectangle on your page. &lt;/p&gt;

&lt;p&gt;When someone visits your site, their browser fetches the Power BI report in the background and renders it inside that rectangle.&lt;/p&gt;

&lt;p&gt;The code Power BI generates looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;
&lt;span class="nt"&gt;&amp;lt;iframe&lt;/span&gt;
  &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"My Report"&lt;/span&gt;
  &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"800"&lt;/span&gt;
  &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"600"&lt;/span&gt;
  &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://app.powerbi.com/view?r=XXXX"&lt;/span&gt;
  &lt;span class="na"&gt;frameborder=&lt;/span&gt;&lt;span class="s"&gt;"0"&lt;/span&gt;
  &lt;span class="na"&gt;allowFullScreen=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/iframe&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each attribute controls something specific:&lt;br&gt;
• src points to the Power BI embed URL. Do not modify this value manually.&lt;br&gt;
• width and height set the dimensions in pixels. Hard-coded pixel values break responsiveness, which is why you will replace these shortly.&lt;br&gt;
• frameborder="0" removes the default browser border around the frame.&lt;br&gt;
• allowFullScreen lets viewers expand the report to full screen.&lt;/p&gt;

&lt;p&gt;DRY principle in practice: If you plan to embed multiple reports on one site, avoid copying the style or width attributes into each IFrame tag individually. &lt;/p&gt;

&lt;p&gt;Instead, define a shared CSS class and apply it to all your IFrame elements. This way, changing the layout of all your embeds requires editing one CSS rule, not ten IFrame tags.&lt;/p&gt;
&lt;h2&gt;
  
  
  Phase 4: Implementing the Embed
&lt;/h2&gt;

&lt;p&gt;Vanilla HTML/CSS: Paste the IFrame code directly into your .html file. To make the embed responsive, wrap it in a container div and use the aspect-ratio CSS property:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"report-container"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;iframe&lt;/span&gt;
    &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"My Report"&lt;/span&gt;
    &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://app.powerbi.com/view?r=XXXX"&lt;/span&gt;
    &lt;span class="na"&gt;frameborder=&lt;/span&gt;&lt;span class="s"&gt;"0"&lt;/span&gt;
    &lt;span class="na"&gt;allowFullScreen=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/iframe&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
/* Accompanying CSS (e.g. styles.css) */
.report-container {
  position: relative;
  width: 100%;
  aspect-ratio: 16 / 9;
}

.report-container iframe {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Pro Tip: The aspect-ratio property is supported in all modern browsers and eliminates the need for the older "padding-top hack." Use it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;WordPress / CMS: In the block editor, add a "Custom HTML" block and paste the wrapper div and IFrame code into it. Avoid using the default embed block for Power BI URLs since it does not handle the iframe attributes correctly.&lt;/p&gt;

&lt;p&gt;React or Vue: Wrap the IFrame in a component and pass the src as a prop. This lets you reuse the same component for multiple reports by swapping the URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Customization: URL Parameters
&lt;/h2&gt;

&lt;p&gt;The Power BI embed URL accepts query parameters that let you control what the viewer sees on load.&lt;/p&gt;

&lt;p&gt;Filter by field value: Append a filter string to the URL to pre-filter the report to a specific value:&lt;br&gt;
?filter=Store/City eq 'Nairobi'&lt;/p&gt;

&lt;p&gt;The syntax follows the format: TableName/FieldName eq 'Value'. String values go in single quotes. Numbers do not.&lt;/p&gt;

&lt;p&gt;Clean up the interface: Two parameters reduce UI clutter for a polished embed:&lt;br&gt;
• &amp;amp;navContentPaneEnabled=false hides the left navigation pane&lt;br&gt;
• &amp;amp;filterPaneEnabled=false hides the filter pane on the right&lt;/p&gt;

&lt;p&gt;Combine them at the end of your src URL:&lt;br&gt;
&lt;a href="https://app.powerbi.com/view?r=XXXX&amp;amp;navContentPaneEnabled=false&amp;amp;filterPaneEnabled=false" rel="noopener noreferrer"&gt;https://app.powerbi.com/view?r=XXXX&amp;amp;navContentPaneEnabled=false&amp;amp;filterPaneEnabled=false&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting and Common Pitfalls
&lt;/h2&gt;

&lt;p&gt;The grey box issue: If your IFrame renders a grey or blank box instead of the report, check three things. &lt;/p&gt;

&lt;p&gt;First, confirm the embed link is still active in the Power BI Service under Admin Portal &amp;gt; Embed Codes. &lt;/p&gt;

&lt;p&gt;Second, confirm the workspace has not been moved or renamed. &lt;/p&gt;

&lt;p&gt;Third, check whether a network or browser policy is blocking IFrame content from external domains.&lt;/p&gt;

&lt;p&gt;Data refresh latency: The embed does not update the moment your dataset refreshes. There is typically a lag of up to one hour between when your data refreshes in the Power BI Service and when the embedded report reflects those changes. Plan your refresh schedules accordingly if your audience expects near-real-time data.&lt;/p&gt;

&lt;p&gt;License expiry: If your Pro trial expires, your published reports stop loading for anyone who visits the page. The embed link breaks silently. Set a calendar reminder before your trial ends so you can upgrade or migrate the report.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security and Ethical Considerations
&lt;/h2&gt;

&lt;p&gt;"Public" in "Publish to Web (Public)" means genuinely public. The report URL can be discovered through search engines, shared by anyone who has it, and viewed without authentication. &lt;/p&gt;

&lt;p&gt;Treat the decision to publish a report this way the same way you would treat publishing any public document.&lt;/p&gt;

&lt;p&gt;For internal dashboards, client-facing portals, or reports that contain any sensitive data, use Power BI Embedded via the REST API instead. &lt;/p&gt;

&lt;p&gt;This method requires viewers to authenticate, supports row-level security, and gives you control over who sees what. It requires more setup and an Azure subscription, but for non-public data, it is the correct path.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can check and revoke all active public embed codes from the Power BI Admin Portal under Embed Codes. Review this list periodically.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The workflow from Power BI Desktop to a live web embed is straightforward once you understand each phase. You prepare the report, publish it to the Power BI Service, generate the public embed code, and paste it into your HTML with a responsive wrapper. &lt;/p&gt;

&lt;p&gt;URL parameters give you control over filtering and layout without requiring additional code.&lt;/p&gt;

&lt;p&gt;For anyone building a public data portfolio, this approach is worth learning. A well-designed embedded report does more to demonstrate your skills than any static image or PDF export. Start with public datasets, embed them on a personal site or GitHub Pages, and build from there.&lt;/p&gt;


</description>
      <category>analytics</category>
      <category>microsoft</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained</title>
      <dc:creator>Amo InvAnalysis</dc:creator>
      <pubDate>Sun, 29 Mar 2026 18:50:45 +0000</pubDate>
      <link>https://dev.to/amo_invanalysis_392c1de71/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-ep8</link>
      <guid>https://dev.to/amo_invanalysis_392c1de71/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-ep8</guid>
      <description>&lt;p&gt;According to Microsoft, data modeling in Power BI refers to the process of connecting data from various sources, defining their relationships with each other and structuring it all for efficient analysis and visualization.&lt;/p&gt;

&lt;p&gt;You might have read all this and gone, so what? Why data modeling in Power BI? Imagine that you have a massive box full of many random receipts from your purchases. If someone asked you to determine how much you spent on fuel, or say fruits on a certain month, how would you go about it?&lt;/p&gt;

&lt;p&gt;If you were to manually sift through it all, not only would it be tedious, but there is also a high chance your answer would not be 100 percent accurate. However, if you happened to have a system of organizing those receipts into folders for each week or month, and a way to link it all together, you would probably find a more accurate answer easily.&lt;/p&gt;

&lt;p&gt;That, in essence, is data modeling in Power BI, or at least a simplified version of it. You can think of it as gathering the receipts (raw data) from various sources into folders, then connecting them, in other words, defining their relationships, with the end goal of structuring it all for analysis and visualization.&lt;/p&gt;

&lt;p&gt;By the end of this article, you will understand data modeling in Power BI. And to keep things light, you can expect this article to take you through joins, relationships and schemas in Power BI with lots of analogies relating to day-to-day stuff you encounter. Don't let the technical jargon scare you, dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  Foundations First: Fact vs. Dimension Tables
&lt;/h2&gt;

&lt;p&gt;Before diving into joins and relationships, you need to understand the two fundamental table types in data modeling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fact Tables (Verbs)&lt;/strong&gt; contain the measurements or metrics of your business, the things that happen. &lt;/p&gt;

&lt;p&gt;Think of them as the action tables. Sales transactions, website clicks, insurance claims, or M-Pesa transactions are all fact tables. They answer questions like "how much?" or "how many?" These tables typically contain numeric values you want to aggregate: revenue, quantity sold, claim amounts, or transaction counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dimension Tables (Nouns)&lt;/strong&gt; contain the descriptive attributes that give context to your facts. These are the "who, what, where, when" tables. Customer information, product catalogs, date calendars, or regional data are dimension tables. They provide the categories you use to slice and analyze your fact tables: by product category, by customer location, by month, or by sales region.&lt;/p&gt;

&lt;p&gt;The relationship between these is straightforward: dimension tables describe the context, fact tables record what happened in that context. A sales fact table records transactions; dimension tables tell you which customer made the purchase, what product they bought, and when it occurred.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL Joins in Power Query
&lt;/h2&gt;

&lt;p&gt;Going by the earlier definition of data modeling in Power BI, joins, also called merges, are a critical component in bringing together all the pieces of the puzzle: in this case, stitching two separate tables or data sources into a unified table based on a related column between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Six Types of Joins
&lt;/h2&gt;

&lt;p&gt;There are different types of joins in SQL and Power Query. Here's a quick rundown using a relatable example.&lt;br&gt;
Assume you're a teacher preparing for a school Math competition. &lt;/p&gt;

&lt;p&gt;As part of this competition, you have a list of all students at the school (List A) and a list of students taking part in the competition (List B). Here's how the various joins work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Inner Join (Matching Pairs Only)&lt;/strong&gt; returns only students who appear on both lists, i.e., students who are both enrolled at the school AND competing. If a student is enrolled but not competing, they're excluded. If someone is on the competitor list but not enrolled (perhaps from another school), they're also excluded. This is the most restrictive join.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Left Outer Join (The Default Join)&lt;/strong&gt; is more generous. It retains all names from List A (all students), and where there's a match, brings in competition data from List B. Students not competing show up with blank/null values in the competition columns. This is Power Query's default merge type because it preserves all records from your primary table.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Right Outer Join&lt;/strong&gt; is the mirror image of the left outer join. It retains all names on the competitor list, bringing in the student record from List A where there's a match and leaving nulls where there isn't. This join is less common in Power BI workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Full Outer Join&lt;/strong&gt; is for when you want everyone on both lists in your results, regardless of whether they match. This is perfect for creating a master directory: every student appears, every competitor appears, with nulls filling in where data doesn't exist on one side.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Left Anti Join&lt;/strong&gt; shows who doesn't match. In our example, a left anti join would show non-competitor students, helping identify potential recruits for the competition. This is useful for finding gaps: customers who haven't made purchases, products without sales, or claims without follow-up.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Right Anti Join&lt;/strong&gt; shows competitors who are not students. This helps identify non-students trying to compete, essentially flagging data quality issues or exceptions that need investigation.&lt;/li&gt;
&lt;/ul&gt;
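&lt;p&gt;The same six joins can be reproduced outside Power Query; pandas is used below purely as a stand-in, since its merge function follows the same semantics. The student names are made up:&lt;/p&gt;

```python
import pandas as pd

# List A: all students at the school; List B: competition entrants
students = pd.DataFrame({"name": ["Asha", "Brian", "Cate"]})
entrants = pd.DataFrame({"name": ["Brian", "Cate", "Dan"], "team": [1, 2, 1]})

inner = students.merge(entrants, on="name", how="inner")  # Brian, Cate only
left = students.merge(entrants, on="name", how="left")    # all of List A
right = students.merge(entrants, on="name", how="right")  # all of List B
full = students.merge(entrants, on="name", how="outer")   # everyone

# Anti joins keep only the unmatched rows, via the indicator column
both = students.merge(entrants, on="name", how="outer", indicator=True)
left_anti = both[both["_merge"] == "left_only"]    # Asha: enrolled, not competing
right_anti = both[both["_merge"] == "right_only"]  # Dan: competing, not enrolled
```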

&lt;h2&gt;
  
  
  Power BI Relationships (The Model View)
&lt;/h2&gt;

&lt;p&gt;While joins in Power Query physically combine tables into one, relationships in Power BI's Model View keep tables separate but connected. This is more efficient for large datasets and more flexible for analysis.&lt;/p&gt;

&lt;p&gt;When you load data into Power BI, you can switch to Model View to see all your tables and the lines connecting them. These lines are relationships, and they tell Power BI how tables relate to each other without duplicating data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cardinality: "Who Connects to Whom"
&lt;/h3&gt;

&lt;p&gt;Cardinality defines how many records in one table can relate to records in another table. Understanding this is critical for accurate analysis.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;One-to-Many (1:M)&lt;/strong&gt; is the most common relationship type. One record in the dimension table relates to many records in the fact table. For example, one customer can have many transactions, one product can appear in many sales records, or one county can contain many customers. This is the backbone of star schema design and should be your default relationship type.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Many-to-Many (M:M)&lt;/strong&gt; occurs when multiple records in one table can relate to multiple records in another. For instance, students and classes: one student takes many classes, and one class has many students. While Power BI supports this relationship type, it can create performance issues and ambiguous filter paths. Use sparingly and only when necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;One-to-One (1:1)&lt;/strong&gt; means one record in each table matches exactly one record in the other. This is rare and usually indicates that your tables should be combined. An example might be employee basic information in one table and employee salary information in another (separated for security reasons). While valid in specific scenarios, question whether you actually need two tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cross-Filter Direction
&lt;/h3&gt;

&lt;p&gt;This determines how filters flow between related tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Direction&lt;/strong&gt; is the default and recommended setting. Filters flow from the "one" side to the "many" side. When you select a customer, it filters their transactions. When you select a product, it filters sales of that product. This is predictable, performs well, and avoids circular dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Both Directions (Bi-directional filter)&lt;/strong&gt; allows filters to flow both ways. Select a transaction, and it filters back to show only relevant customers. While this sounds convenient, it can create ambiguous filter paths, performance issues, and unexpected results. Microsoft recommends avoiding bi-directional filtering except in specific many-to-many scenarios. If you think you need it, reconsider your data model first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Active vs. Inactive Relationships
&lt;/h3&gt;

&lt;p&gt;Power BI allows only one active relationship between any two tables. All other relationships become inactive, shown as dashed lines in Model View.&lt;/p&gt;

&lt;p&gt;Active relationships filter automatically. Inactive relationships require explicit activation in DAX using the USERELATIONSHIP function. &lt;/p&gt;

&lt;p&gt;This is useful when you have multiple date fields (order date, ship date, delivery date) all relating to the same calendar table. One relationship is active by default; you activate the others in specific measures when needed.&lt;/p&gt;
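&lt;p&gt;For illustration, a measure that switches on an inactive ship-date relationship might look like the sketch below (the table and column names are hypothetical):&lt;/p&gt;

```dax
-- Hypothetical model: Sales[ShipDate] has an inactive relationship to 'Date'[Date]
Shipped Sales =
CALCULATE (
    SUM ( Sales[Amount] ),
    USERELATIONSHIP ( Sales[ShipDate], 'Date'[Date] )
)
```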

&lt;h2&gt;
  
  
  Schemas: Designing the Architecture
&lt;/h2&gt;

&lt;p&gt;How you arrange your fact and dimension tables matters. The schema you choose affects query performance, model complexity, and how easily users can understand your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Star Schema (Highly Recommended/Standard)
&lt;/h3&gt;

&lt;p&gt;The star schema is the gold standard for Power BI data modeling. One central fact table connects directly to multiple dimension tables, forming a star pattern when viewed in Model View.&lt;/p&gt;

&lt;p&gt;For example, a sales fact table sits in the center, connected directly to Customer, Product, Date, and Store dimension tables. Each dimension connects only to the fact table, not to each other. This design is simple, performs exceptionally well, and is easy for users to understand. Microsoft explicitly recommends star schema for Power BI implementations.&lt;/p&gt;

&lt;p&gt;The benefits are significant: fast query performance because Power BI's engine is optimized for this pattern, easy DAX calculations since relationships are straightforward, and simple troubleshooting when things go wrong.&lt;/p&gt;
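&lt;p&gt;Every star-schema query follows the same shape: filter or group through a dimension, then aggregate the fact. A toy pandas equivalent of that pattern (the rows are fabricated, purely illustrative):&lt;/p&gt;

```python
import pandas as pd

# Central fact table: one row per transaction, with a key into each dimension
sales = pd.DataFrame({"product_id": [1, 1, 2], "amount": [100, 250, 80]})
# Dimension table: one row per product, descriptive attributes only
products = pd.DataFrame({"product_id": [1, 2], "category": ["Food", "Fuel"]})

# One-to-many relationship: the dimension describes, the fact records; aggregate
report = (sales.merge(products, on="product_id", how="left")
               .groupby("category")["amount"].sum())
```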

&lt;h3&gt;
  
  
  The Snowflake Schema
&lt;/h3&gt;

&lt;p&gt;The snowflake schema extends the star by normalizing dimension tables. Instead of one Product dimension, you might have Product, Category, and Subcategory tables linked together.&lt;/p&gt;

&lt;p&gt;While this reduces data redundancy and looks elegant from a database design perspective, it creates problems in Power BI. Additional table joins slow query performance. More complex relationship paths make DAX harder to write and maintain. Users struggle to understand multi-hop relationships.&lt;/p&gt;

&lt;p&gt;Unless you're working with extremely large dimension tables where normalization significantly reduces data size, avoid snowflake schemas in Power BI. The performance cost outweighs the storage savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Flat Table (Denormalized Large Aggregated Table)
&lt;/h3&gt;

&lt;p&gt;A flat table combines everything into one massive table: all facts and dimensions merged together. Every row contains complete information: transaction amount, customer name, product details, date information, everything.&lt;/p&gt;

&lt;p&gt;This approach seems simple at first. No relationships to manage, no joins to configure. However, it creates serious problems: massive data redundancy (customer names repeated millions of times), poor performance as table size explodes, difficult maintenance when dimension attributes change, and inefficient memory usage.&lt;/p&gt;

&lt;p&gt;Flat tables have their place: small datasets for quick analysis, or pre-aggregated summary tables. But for proper data modeling in Power BI, stick with the star schema.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Your Hands Dirty: Creating All These in Power BI
&lt;/h2&gt;

&lt;p&gt;Theory is important, but implementation is where understanding solidifies. Here's how to go about it all:&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Join (Merge) in Power Query
&lt;/h3&gt;

&lt;p&gt;Open Power BI Desktop and load your data sources. Click "Transform Data" to open Power Query Editor. &lt;/p&gt;

&lt;p&gt;Select the first table you want to merge, then click "Merge Queries" in the Home ribbon.&lt;/p&gt;

&lt;p&gt;Choose the second table from the dropdown. Select the matching columns in both tables—these are your join keys. &lt;/p&gt;

&lt;p&gt;Choose your join type from the options provided (Left Outer is default). Click OK.&lt;/p&gt;

&lt;p&gt;A new column appears with "Table" values. Click the expand icon next to the column header, select which columns to bring from the second table, and click OK. The tables are now merged. &lt;/p&gt;

&lt;p&gt;Click "Close &amp;amp; Apply" to load the result into Power BI.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Create Relationships in Model View
&lt;/h3&gt;

&lt;p&gt;Switch to Model View using the icon on the left sidebar. You'll see all loaded tables displayed as boxes.&lt;/p&gt;

&lt;p&gt;To create a relationship, click and drag a field from one table to the matching field in another table. Power BI automatically detects the relationship type and cardinality based on the data.&lt;/p&gt;

&lt;p&gt;A line appears connecting the tables. Click the line to see relationship properties: cardinality (1:M, M:M, 1:1), cross-filter direction (single or both), and active/inactive status.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using "Manage Relationships"
&lt;/h3&gt;

&lt;p&gt;For more control, click "Manage Relationships" in the Home ribbon. This opens a dialog showing all relationships in your model.&lt;/p&gt;

&lt;p&gt;Click "New" to create a relationship manually. Select the two tables and the columns that should relate them. &lt;/p&gt;

&lt;p&gt;Choose cardinality and filter direction. Check "Make this relationship active" if needed. Click OK.&lt;/p&gt;

&lt;p&gt;From this dialog, you can also edit existing relationships, delete relationships, or toggle them between active and inactive. &lt;/p&gt;

&lt;p&gt;This is particularly useful when you have multiple relationships between the same tables and need to control which one is active.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Data modeling isn't just a technical exercise, it's the foundation of every insight your Power BI reports will generate. &lt;/p&gt;

&lt;p&gt;A well-designed model makes DAX calculations straightforward, query performance fast, and reports easy to maintain. A poorly designed model creates endless headaches: slow refreshes, incorrect results, and DAX measures that take hours to write.&lt;/p&gt;

&lt;p&gt;Start with star schema unless you have compelling reasons to deviate. Use one-to-many relationships with single-direction filtering as your default. Merge tables in Power Query when you need to physically combine data, but use relationships in Model View whenever possible to keep your model flexible and performant.&lt;/p&gt;

&lt;p&gt;The receipt box analogy holds true: organization upfront saves massive time later. Invest effort in proper data modeling before building a single visual. Your future self and everyone who uses your reports will thank you for it.&lt;/p&gt;




&lt;p&gt;References&lt;br&gt;
[1] Microsoft Power BI Documentation - Data Modeling - &lt;a href="https://docs.microsoft.com/en-us/power-bi/transform-model/desktop-modeling-view" rel="noopener noreferrer"&gt;https://docs.microsoft.com/en-us/power-bi/transform-model/desktop-modeling-view&lt;/a&gt;&lt;br&gt;
[2] Microsoft Power BI - Bi-directional Relationships Guidance - &lt;a href="https://docs.microsoft.com/en-us/power-bi/guidance/relationships-bidirectional-filtering" rel="noopener noreferrer"&gt;https://docs.microsoft.com/en-us/power-bi/guidance/relationships-bidirectional-filtering&lt;/a&gt;&lt;br&gt;
[3] Microsoft Power BI - Star Schema Guidance - &lt;a href="https://docs.microsoft.com/en-us/power-bi/guidance/star-schema" rel="noopener noreferrer"&gt;https://docs.microsoft.com/en-us/power-bi/guidance/star-schema&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>data</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Introduction to Linux for Data Engineers, Including Practical Use of Vi and Nano with Examples</title>
      <dc:creator>Amo InvAnalysis</dc:creator>
      <pubDate>Tue, 10 Feb 2026 04:45:38 +0000</pubDate>
      <link>https://dev.to/amo_invanalysis_392c1de71/introduction-to-linux-for-data-engineers-including-practical-use-of-vi-and-nano-with-examples-2kb4</link>
      <guid>https://dev.to/amo_invanalysis_392c1de71/introduction-to-linux-for-data-engineers-including-practical-use-of-vi-and-nano-with-examples-2kb4</guid>
      <description>&lt;p&gt;The average assumption is that with Python and SQL mastered, you have it all figured out as a data engineer. While this might hold true, it isn’t the complete truth. True mastery of the terminal is what truly sets you apart as a competent data engineer.&lt;/p&gt;

&lt;p&gt;Why, you might ask? It’s simple. Most, if not all, of the infrastructure in the data engineering ecosystem runs on Linux, which leans heavily on the terminal. As a data engineer, you’re bound to face a failed pipeline in a Kubernetes pod or something similar.&lt;/p&gt;

&lt;p&gt;In such circumstances, no graphical user interface (GUI) tool will come to the rescue. You’ll have to get your hands dirty, figure out what went wrong, and apply the fix from the terminal. &lt;/p&gt;

&lt;p&gt;Data engineers don’t always have the luxury of swanky GUI tools like, say, data analysts or business intelligence analysts. Few, if any, of the tools they use have a GUI component, so learning the terminal, and by extension terminal-based editors like Vi and Nano, is non-negotiable.&lt;/p&gt;

&lt;p&gt;Dig in and learn basic navigation, essential data manipulation in the terminal, and how to edit files on a server using Vi and Nano.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Data Engineers Live in the Terminal
&lt;/h2&gt;

&lt;p&gt;GUIs, as easy, intuitive, and convenient as they may seem, are often slow, impractical, or simply unavailable in data engineering. This isn’t necessarily a bad thing: from a data engineer’s perspective, the terminal is just as good, often better.&lt;/p&gt;

&lt;p&gt;You can do more, and do it faster, in the terminal. Here are three reasons why the terminal is, or is about to become, your staple as a data engineer:&lt;/p&gt;

&lt;h3&gt;
  
  
  I. Cloud Dominance
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmvpcefkvkl3diqn8wjt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmvpcefkvkl3diqn8wjt.jpg" alt="Cloud computing Photo by Growtika on Unsplash" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
As a discipline, data engineering involves systems and infrastructure that collect, store and process massive amounts of raw data with the aim of transforming it into usable formats. This naturally calls for significant investments in infrastructure, which, while possible in certain circumstances, is certainly not feasible in all cases.&lt;/p&gt;

&lt;p&gt;This is where Amazon Web Services (AWS) and Google Cloud Platform (GCP) come into play. Data engineers and organizations can “rent” these massive systems and infrastructure at a fraction of the actual cost to collect, store and process their data. &lt;/p&gt;

&lt;p&gt;The upside is that organizations and data engineers reduce operational overhead while transferring the ever-present security risk to the cloud vendors. Not to mention infrastructure de-risking, where the responsibility of building, maintaining and scaling the necessary systems shifts to a more capable provider.&lt;/p&gt;

&lt;p&gt;The downside? Not much, if the terminal is second nature to you. Quite a lot if it isn’t.&lt;/p&gt;

&lt;p&gt;To provide such a critical service, cloud providers run most of their infrastructure on Linux, an operating system that is conveniently more resource-efficient in terms of idle RAM usage and the background processes that eat into precious CPU power.&lt;/p&gt;

&lt;p&gt;The price for this is more terminal usage and little to no GUI. Simply put, the terminal is a primary skill for a data engineer: the better you get at it, the more effective an engineer you’ll be. &lt;/p&gt;
&lt;h3&gt;
  
  
  II. Big Data Tools
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z6osemkfrp04g68vsfn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z6osemkfrp04g68vsfn.jpg" alt="Big data Illustration by VectorElements on Unsplash" width="800" height="509"&gt;&lt;/a&gt;&lt;br&gt;
Data tools like Microsoft Excel or even Google Sheets are impressive for simple data analysis tasks, or even for light extraction and processing work. To a data engineer, however, they’re vastly under-equipped for the job.&lt;/p&gt;

&lt;p&gt;To put it into perspective: Microsoft Excel is akin to a spade or shovel fit for a small pile of dirt, whereas data engineering involves breaking down and processing a whole mountain. Hence the need for tools better equipped for the task, like Spark, Kafka and Hadoop, to name a few.&lt;/p&gt;

&lt;p&gt;Most of these tools are Linux-native, meaning they are either built on top of Linux or run best on it, which again involves a lot of terminal usage. But it’s about more than just the fact that most big data tools are Linux-native.&lt;/p&gt;

&lt;p&gt;When the main objective is to process whole “mountains” of data, there is literally no RAM or CPU power to spare for fancy GUI icons and animations. &lt;/p&gt;
&lt;h3&gt;
  
  
  III. The Automation Factor
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faff2isopefm6ona1sytx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faff2isopefm6ona1sytx.jpg" alt="Man watching automation Image by Gerd Altmann from Pixabay" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
Automation is the name of the game in data engineering. Imagine this for a second: your job involves extracting a factory’s throughput data each day at exactly 2:30 AM, the only window when operational costs are low enough for the job to run. After extracting this data, you’re supposed to clean it and store it in a specified data lake, six days a week.&lt;/p&gt;

&lt;p&gt;The smart thing to do here is automate everything, while ensuring it all runs correctly, consistently and reliably. You can’t risk the workflow breaking by relying on something not built for automation, which brings you back to Linux: the industry standard, especially when it comes to automation.&lt;/p&gt;

&lt;p&gt;With Linux, such a job is as easy as writing a simple bash script and setting up a cron job so that Linux runs the workflow at precisely 2:30 AM, six days a week, without you lifting a finger.&lt;/p&gt;
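To make that concrete, here is a minimal sketch of such a job. The file names, the cleaning step, and the load target are all illustrative assumptions, not a production pipeline:

```shell
#!/bin/sh
# run_pipeline.sh -- extract, clean and land the factory throughput data.
# Scheduled with cron: the entry `30 2 * * 1-6 /opt/jobs/run_pipeline.sh`
# runs it at 2:30 AM, Monday through Saturday (six days a week).
set -eu    # abort on the first failed step so a broken run never half-loads data

RAW="raw_throughput.csv"
CLEAN="clean_throughput.csv"

# 1. Extract -- simulated here with a local write; in practice this might be
#    a curl call against the factory's export endpoint or a database dump.
printf 'machine,units\r\nA,120\r\n\r\nB,95\r\n' > "$RAW"

# 2. Clean -- strip Windows carriage returns and drop blank lines.
cat "$RAW" | tr -d '\r' | grep -v '^$' > "$CLEAN"

# 3. Load -- stand-in for the real upload to the data lake (e.g. an S3 copy).
echo "Prepared $CLEAN for loading"
```

Editing the schedule itself is a one-liner: `crontab -e` opens your cron table, and the five fields before the command read minute, hour, day of month, month, and day of week.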

&lt;p&gt;Despite the fearmongering around terminal, it is actually an amazing, convenient tool once you get the hang of it. The more you keep at it, the better you’ll get working your way around the terminal.&lt;/p&gt;
&lt;h2&gt;
  
  
  Basic Linux Commands – Essential Navigation &amp;amp; File Management
&lt;/h2&gt;

&lt;p&gt;Now it’s time to slay the beast. Here are a few basic Linux terminal commands to help you navigate the terminal and manage your files.&lt;/p&gt;
&lt;h3&gt;
  
  
  Finding Your Bearings
&lt;/h3&gt;

&lt;p&gt;Can’t tell where you are once in the terminal? Type in and run this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pwd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Print Working Directory (pwd) command tells you what folder or directory you’re in, or as the definition says, the directory you’re working from.&lt;/p&gt;

&lt;p&gt;Now you know what folder or directory you’re working from but can’t tell what else lurks within. Run this command to find out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list (ls) command shines a spotlight on what else is in the folder or directory you’re working in. Its output isn’t very descriptive, though; if you need more detail, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls -lh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This combination outputs extra details about the files and folders in the directory, including human-readable file sizes: instead of, say, 52428800 bytes, you get 50M, which is much easier to make sense of.&lt;/p&gt;
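A few other ls flags come up constantly in day-to-day work; the demo directory below is just an illustration:

```shell
# Create a couple of sample files to list
mkdir -p demo
touch demo/report.csv demo/.env

ls -a demo      # include hidden (dot) files such as .env
ls -lt demo     # long listing sorted by modification time, newest first
ls -lhS demo    # long listing sorted by size, largest first, human-readable
ls -R demo      # list subdirectories recursively
```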

&lt;h3&gt;
  
  
  Navigating Around
&lt;/h3&gt;

&lt;p&gt;Moving around is the easy part. Navigating in the terminal comes down to this simple command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The change directory (cd) command moves you up or down the folder/directory structure. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd folder_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This moves you into “folder_name”, which now becomes your current working directory. To move back to the previous folder, or in Linux terms, up a directory, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s it for navigating around the terminal.&lt;/p&gt;
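Putting pwd, ls and cd together, a quick session might look like this (the directory name is illustrative):

```shell
mkdir -p projects   # a sample directory to move into
pwd                 # print where you are now
cd projects         # step into projects
pwd                 # the path now ends in /projects
cd ..               # move back up one level
# `cd` with no argument (or `cd ~`) jumps straight to your home directory
```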

&lt;h3&gt;
  
  
  Viewing Files Safely
&lt;/h3&gt;

&lt;p&gt;Viewing files is another easy part, but with a major caveat: done the wrong way, you can easily freeze your terminal, and you don’t want that. Several commands do roughly the same thing, but used in the wrong context they can cause problems.&lt;/p&gt;

&lt;p&gt;A prime example is viewing files with this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although cat is commonly used to quickly view files, and even to create or combine them, running it on a sufficiently large file can easily overwhelm your session: the command explicitly directs the terminal to print the file’s entire contents at once, which can cause it to freeze.&lt;/p&gt;

&lt;p&gt;Here's how to view files safely on terminal:&lt;/p&gt;

&lt;p&gt;Run the following command with “filename.md” being the name of the file you want to view&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;less filename.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command opens the file in a pager, a safe, book-like reading view that lets you scroll through a file’s contents without crashing your terminal session or server.&lt;br&gt;
Alternatively, you can also view files safely in the terminal using these two commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;head filename.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tail filename.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These two show the first and the last 10 lines of a given file, respectively.&lt;/p&gt;
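Both commands accept a line count via -n, and tail has a follow mode that is invaluable for watching log files grow; the log file here is fabricated for illustration:

```shell
# Build a small sample log
printf '%s\n' line1 line2 line3 line4 line5 line6 line7 > pipeline.log

head -n 3 pipeline.log    # first 3 lines instead of the default 10
tail -n 2 pipeline.log    # last 2 lines
# tail -f pipeline.log    # keep printing new lines as they arrive (Ctrl+C to stop)
```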

&lt;h3&gt;
  
  
  Creating and Deleting Folders/Directories
&lt;/h3&gt;

&lt;p&gt;Creating and deleting folders or directories on terminal is similarly simple. Simply run the following commands to create or delete a directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command creates a folder named “new_folder”;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rmdir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whereas rmdir removes or deletes it. Note that rmdir only works on empty directories; to delete a folder that still has contents, use rm -r, and double-check the path first.&lt;/p&gt;
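Two variations worth knowing, with illustrative directory names: -p builds a nested path in one go, and rm -r removes a directory that still has contents, which rmdir refuses to do:

```shell
mkdir -p data/raw/2026    # -p creates the whole nested path in one go
rmdir data/raw/2026       # succeeds because the directory is empty

mkdir -p data/raw/2026
touch data/raw/2026/jan.csv
rm -r data                # deletes the directory AND everything inside -- double-check first
```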

&lt;h2&gt;
  
  
  Nano vs. Vi/Vim Plus Practical Usage
&lt;/h2&gt;

&lt;p&gt;Aside from knowing basic Linux commands, at some point you’ll have to edit files, and while you can always use any editor, it’s best practice to settle on one and master it. So which should you go for: Nano or Vi/Vim?&lt;/p&gt;

&lt;p&gt;The answer depends on your personal preferences and the task at hand. Here’s a quick rundown of the differences between the two editors that should inform your pick.&lt;/p&gt;

&lt;p&gt;First off, Nano is a simple-looking editor, much like Notepad on the Windows operating system. Vi/Vim, on the other hand, is anything but simple, at first. &lt;/p&gt;

&lt;p&gt;However, the tradeoff for an easy-to-use, simple-looking editor is less utility on Nano’s part. Vi/Vim, despite its steep learning curve, is considerably more powerful.&lt;/p&gt;

&lt;p&gt;When it comes down to it, Nano is perfect for simple, regular editing, and with time and effort, Vi/Vim can feel just as simple. The catch is availability: Nano isn’t installed on every server, whereas Vi/Vim is the default editor on virtually all Unix-like systems.&lt;/p&gt;

&lt;p&gt;Both are capable editors, but if your aim is to be able to work on as many servers as possible, mastering Vi/Vim is the better choice. &lt;/p&gt;

&lt;p&gt;Now let’s move onto the practical side of things. Here’s how to edit files using Nano and Vi/Vim:&lt;/p&gt;

&lt;h3&gt;
  
  
  Nano
&lt;/h3&gt;

&lt;p&gt;First off, navigate to the directory you want to work from, then run the following command, with “myfirstedit.md” being the filename:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano myfirstedit.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o3icycipwzfk77zlu6i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o3icycipwzfk77zlu6i.png" alt="terminal image" width="800" height="433"&gt;&lt;/a&gt;&lt;br&gt;
This opens the file in Nano and, as you can see, it’s just like any other text editor you’ve worked with before. You can start typing away, with the relevant commands for navigating and using the editor right at the bottom.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc565rp1d5n5dblrz20m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc565rp1d5n5dblrz20m.png" alt="terminal image" width="800" height="428"&gt;&lt;/a&gt;&lt;br&gt;
After you’ve edited the file to your satisfaction, hit Ctrl + O to save your changes, confirm the filename at the prompt (or type one if the file is still unnamed), and press Enter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnd84hcgnqkmoj5c5ciia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnd84hcgnqkmoj5c5ciia.png" alt="terminal image" width="800" height="430"&gt;&lt;/a&gt;&lt;br&gt;
(You may notice all the commands at the bottom of the terminal have this symbol “^” which represents the “Ctrl” button)&lt;/p&gt;

&lt;p&gt;Lastly, hit Ctrl + X to close the editor.&lt;/p&gt;
&lt;h3&gt;
  
  
  Vi/Vim
&lt;/h3&gt;

&lt;p&gt;After navigating to the appropriate directory, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi myFirstEdit.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qw41dthrcr488xr8zuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qw41dthrcr488xr8zuy.png" alt="terminal image" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
This opens your file using the Vi editor. As you can see, the interface is stripped bare, with zero clues on how to use it. You also can’t edit the file right away: first you must enter “insert mode” by hitting i (the letter “i”).&lt;/p&gt;

&lt;p&gt;Once in insert mode, type away. When you’re done, press the “Esc” key to exit insert mode, then type “:wq” (write and quit) and hit “Enter” to save your changes and exit the editor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqgign05qi7j7cxf1duo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqgign05qi7j7cxf1duo.png" alt="terminal image" width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
And that’s how easy it is to use Nano and Vi/Vim. The more you practice, the easier it gets.&lt;/p&gt;
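Vi’s commands also work non-interactively through ex, its line-editor mode, which is handy when a script has to edit a file on a server; the file name and typo below are purely illustrative:

```shell
printf 'helo from the server\n' > note.md    # a file with a typo

# Pipe in the same commands you would type inside Vi:
# %s/helo/hello/ substitutes on every line, wq writes and quits.
printf '%%s/helo/hello/\nwq\n' | ex -s note.md

cat note.md
```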

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The terminal isn't your enemy—it's your advantage.&lt;/p&gt;

&lt;p&gt;As a data engineer, the Linux terminal is where the real work happens. Python and SQL open doors, but terminal fluency keeps you effective when pipelines fail at 3 AM, when you're troubleshooting a Kubernetes pod, or when you're automating workflows that process terabytes of data daily.&lt;/p&gt;

&lt;p&gt;GUI tools won't save you in production environments. Cloud infrastructure runs on Linux. Big data tools like Spark and Hadoop are Linux-native. Automation workflows depend on bash scripts and Cron jobs. The sooner you learn the terminal, the faster you'll move from competent to indispensable.&lt;/p&gt;

&lt;p&gt;Start small. Practice basic navigation with pwd, ls, and cd until they become second nature. Get comfortable viewing files with less instead of risking a terminal freeze with cat. Pick your editor—Nano for simplicity, Vi/Vim for power and ubiquity—and commit to mastering it. These aren't just commands; they're the building blocks of everything you'll do as a data engineer.&lt;/p&gt;

&lt;p&gt;The learning curve feels steep at first. Every data engineer you admire went through this same process. They struggled with Vi's modes, forgot to add -lh to their ls commands, and accidentally froze their terminals with massive files. The difference between them and beginners isn't talent, it's repetition.&lt;/p&gt;

&lt;p&gt;The more you work in the terminal, the more natural it becomes. What feels awkward today will be muscle memory in weeks. Commands that seem cryptic now will become your preferred way of working because they're faster, more powerful, and more reliable than any GUI alternative.&lt;/p&gt;

&lt;p&gt;Your job as a data engineer is to build systems that turn raw data into business value. The terminal is your workshop. Learn to use it well, and you'll build better systems, solve problems faster, and establish yourself as someone who can handle whatever production throws at you.&lt;/p&gt;

&lt;p&gt;Get comfortable in the terminal. Your future self will thank you.&lt;/p&gt;

</description>
      <category>cli</category>
      <category>dataengineering</category>
      <category>linux</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How Analysts Translate Messy Data, Dax, and Dashboards Into Action Using Power BI</title>
      <dc:creator>Amo InvAnalysis</dc:creator>
      <pubDate>Mon, 09 Feb 2026 17:00:37 +0000</pubDate>
      <link>https://dev.to/amo_invanalysis_392c1de71/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-3dok</link>
      <guid>https://dev.to/amo_invanalysis_392c1de71/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-3dok</guid>
      <description>&lt;p&gt;Data is everywhere. According to &lt;a href="https://community.ibm.com/community/user/blogs/tanvi-kakodkar1/2020/01/30/integration-of-hdfs?mhsrc=ibmsearch_a&amp;amp;mhq=90%20percent%20worlds%20data%20created%20last%20two%20years" rel="noopener noreferrer"&gt;IBM&lt;/a&gt;, 90% of the world's data was created in just the last two years. Yet despite this explosion of information, most organizations struggle with a fundamental challenge: their data is messy, scattered across multiple systems, and nearly impossible to transform into actionable insights.&lt;/p&gt;

&lt;p&gt;For business analysts working in insurance, banking, retail, and telecom across Kenya and globally, this problem is particularly acute. You receive spreadsheets with inconsistent formats, databases with duplicate entries, and reports with conflicting numbers yet leadership expects clear answers by Friday. The gap between having data and making data-driven decisions has never been wider.&lt;/p&gt;

&lt;p&gt;Enter Power BI. Microsoft's business intelligence platform has become the bridge between chaos and clarity for analysts worldwide. With &lt;a href="https://powerbi.microsoft.com/en-us/blog/power-bi-2025-holiday-recap-a-decade-of-innovation-and-impact/" rel="noopener noreferrer"&gt;over 30 million&lt;/a&gt; monthly active users as of 2025, Power BI's ecosystem creates a repeatable workflow that transforms raw data into executive-ready insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reality of Messy Data
&lt;/h2&gt;

&lt;p&gt;Before diving into solutions, let's acknowledge the elephant in the room: most organizational data is a disaster.&lt;/p&gt;

&lt;p&gt;Messy data takes many forms. Customer names appear as “John Mwangi,” “JOHN MWANGI,” and “Mwangi, John” in the same database. Date fields contain both “12/31/2024” and “31-Dec-24” formats. Excel files arrive with merged cells, hidden rows, and calculations embedded in ways that break when you try to refresh them. &lt;/p&gt;

&lt;p&gt;M-Pesa transaction data lives in one system, inventory in another, and customer information in a third with no clear way to connect them.&lt;/p&gt;

&lt;p&gt;The cost of ignoring this cleanup phase is quite significant. A 2023 Gartner report found that poor data quality costs organizations an average of &lt;a href="https://www.dataversity.net/articles/putting-a-number-on-bad-data/" rel="noopener noreferrer"&gt;$12.9 million annually&lt;/a&gt;. But more importantly, dashboards built on messy data create misleading insights. &lt;/p&gt;

&lt;p&gt;Traditional approaches involve manually cleaning data in Excel every single time it refreshes. This is where Power BI's methodology differs fundamentally: you build the cleanup process once, and then apply it automatically forever.&lt;/p&gt;

&lt;p&gt;In this article, we’ll look at how analysts translate messy data, DAX and dashboards into action using Power BI, across four major phases:&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: Taming Chaos with Power Query
&lt;/h2&gt;

&lt;p&gt;Power Query is Power BI's transformation engine, and it's where the magic begins.&lt;/p&gt;

&lt;p&gt;The process starts with connecting to your data sources, and the amazing thing about Power BI is that it supports over 100 connectors: Excel files, SQL databases, SharePoint lists, cloud platforms like Azure and AWS, and web APIs, just to name a few. &lt;/p&gt;

&lt;p&gt;However, here is a critical insight experienced analysts understand: you should almost never load data directly into Power BI. Always transform first.&lt;/p&gt;

&lt;p&gt;When you click “Transform Data” instead of “Load,” Power Query Editor opens a dedicated environment for reshaping your data. Every transformation you make is recorded as a step, creating an automated workflow that executes each time your data refreshes.&lt;/p&gt;

&lt;p&gt;Essential transformations include removing duplicates with the “Remove Duplicates” function and handling blanks with “Remove Blank Rows” or “Replace Values.” Fixing data types is fundamental: a text column showing “1000” won’t sum properly until you change its type to “Whole Number.” &lt;/p&gt;

&lt;p&gt;Splitting and merging columns handles real-world messiness, while pivoting and unpivoting restructures crosstab layouts into proper tabular format.&lt;/p&gt;

&lt;p&gt;Let's walk through a practical example. An insurance analyst at a Kenyan insurer receives monthly claims data with these issues: dates in mixed formats, duplicate claim IDs, and claim amounts in both KES and USD.&lt;/p&gt;

&lt;p&gt;The transformation workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Connect to the Excel file and open Power Query Editor&lt;/li&gt;
&lt;li&gt; Select the Date column and click “Change Type” → “Date” to standardize all formats&lt;/li&gt;
&lt;li&gt; Select the Claim_ID column and click “Remove Duplicates”&lt;/li&gt;
&lt;li&gt; Add a conditional column: If Currency = “USD”, then Amount * 130 (assuming current exchange rate), else Amount&lt;/li&gt;
&lt;li&gt; Name this new column “Amount_KES” and remove the original Amount and Currency columns&lt;/li&gt;
&lt;li&gt; Click “Close &amp;amp; Apply”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result of this? Clean, analysis-ready data. When next month's file arrives with the same messy structure, you simply refresh and Power Query will apply every transformation automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Creating Intelligence with DAX
&lt;/h2&gt;

&lt;p&gt;Clean data is necessary but not sufficient. Data Analysis Expressions (DAX) is Power BI’s formula language, and it’s what transforms clean data into business intelligence.&lt;/p&gt;

&lt;p&gt;Contrary to how it may seem at first, DAX isn’t Excel formulas, though the syntax looks similar. The fundamental difference is context: in Excel, a formula in cell B2 references specific cells, whereas in DAX, calculations operate on entire columns and respond dynamically to filters and slicers.&lt;/p&gt;

&lt;p&gt;In practice, analysts use three types of DAX calculations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Calculated columns are added to tables and computed row by row. Example: Profit = Sales[Revenue] - Sales[Cost].&lt;/li&gt;
&lt;li&gt;Measures are dynamic calculations that respond to how users interact with your dashboard. Example: Total Revenue = SUM(Sales[Amount]).&lt;/li&gt;
&lt;li&gt;Calculated tables create entirely new tables from expressions, such as a date table: Calendar = CALENDAR(DATE(2020,1,1), DATE(2025,12,31)).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In most cases, analysts also include time intelligence calculations, since understanding year-over-year performance is critical for business context. Here are some examples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sales YTD = TOTALYTD(SUM(Sales[Amount]), Calendar[Date])&lt;/li&gt;
&lt;li&gt;Sales vs Last Year = [Total Sales] - CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Calendar[Date]))&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The CALCULATE function is DAX’s Swiss Army knife for filtering with context:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;High Value Sales = CALCULATE([Total Sales], Sales[Amount] &amp;gt; 100000)&lt;/li&gt;
&lt;li&gt;Flagged Claims = CALCULATE([Claim Count], Claims[Status] = “Under Review”)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;According to &lt;a href="https://learn.microsoft.com/en-us/dax/" rel="noopener noreferrer"&gt;Microsoft's documentation&lt;/a&gt;, mastering 15-20 core functions covers 80% of business analysis scenarios. Start with SUM, AVERAGE, COUNT, IF, CALCULATE, and time intelligence functions like TOTALYTD then add more to your stack with time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3: Designing Dashboards That Drive Decisions
&lt;/h2&gt;

&lt;p&gt;A dashboard with accurate calculations but poor design won’t drive action. Surfacing the right insights is half the battle; getting the target audience to understand and act on them is the other half.&lt;/p&gt;

&lt;p&gt;Research from the &lt;a href="https://www.nngroup.com/articles/f-shaped-pattern-reading-web-content/" rel="noopener noreferrer"&gt;Nielsen Norman Group&lt;/a&gt; shows that users scan digital content in an F-pattern, focusing on the top and left side of screens. This means your most important metrics belong in the top-left corner.&lt;/p&gt;

&lt;p&gt;Choosing the right visual is critical. KPI cards work for single important numbers like Total Revenue or Customer Count. Bar and column charts excel at comparing categories like say, Sales by County or Product Performance. &lt;/p&gt;

&lt;p&gt;Line charts, on the other hand, excel at showing trends over time. Avoid pie charts with more than three categories, as they quickly become unreadable. &lt;/p&gt;

&lt;p&gt;Consider a fraud-monitoring dashboard for an insurance company. Given that insurance fraud costs the American economy &lt;a href="https://insurancefraud.org/fraud-stats/" rel="noopener noreferrer"&gt;$80 billion annually&lt;/a&gt;, and similar proportional losses occur in Kenya's growing insurance sector, detection dashboards are mission-critical.&lt;/p&gt;

&lt;p&gt;The dashboard structure might include a top section with key metrics at a glance. So, Total Claims Submitted, Claims Flagged for Review, Fraud Detection Rate. The middle section would show trend analysis: flagged claims over time with month-over-month comparisons. The bottom section provides detailed breakdown by claim type, region, and adjuster. A sidebar contains interactive filters for date range, region, and claim amount thresholds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: The Final Translation (Actionable Insights)
&lt;/h2&gt;

&lt;p&gt;This is where Power BI transcends traditional business intelligence tools. The gap between seeing data and taking action is where most dashboards fail, and Power BI's features are designed specifically to close this gap.&lt;/p&gt;

&lt;p&gt;Here are a few of the features that help do just that:&lt;br&gt;
&lt;strong&gt;Power BI Goals (Metrics)&lt;/strong&gt; are scorecards that track KPIs against targets over time. Define the metric, set your target, assign owners. For instance, tracking fraud detection rates against quarterly targets creates accountability. Remember, making metrics visible drives action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart Narratives&lt;/strong&gt; use AI to automatically generate plain-English summaries of your data. The system analyzes your visuals and identifies key insights, trends, and outliers. Example: “High-value claims increased 23% in Nairobi region, primarily driven by property damage claims.” These narratives update automatically with data refreshes, perfect for executive summaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alerts&lt;/strong&gt; turn data into triggers. Set up data-driven alerts in Power BI Service to notify you when fraud detection rates drop below 85%, when claims in a region exceed thresholds, or when inventory levels hit reorder points. Integration with email and mobile notifications shifts your operation from reactive to proactive, thereby helping catching problems before they escalate.&lt;/p&gt;

&lt;p&gt;This represents movement up the analytics maturity curve. Most organizations are stuck at descriptive analytics (what happened?), but Power BI enables the leap to prescriptive analytics (what should we do?).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Teams Integration&lt;/strong&gt; embeds Power BI reports directly in Teams channels. Stakeholders discuss insights without leaving their collaboration hub. Real-time notifications to Teams when alerts trigger mean a fraud investigation team gets instant notification when suspicious claims are flagged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Power Automate&lt;/strong&gt; creates automated workflows triggered by Power BI data. With over 400 connectors available, the possibilities are extensive. Example workflow: when a fraud score exceeds the threshold, create a case in the CRM, assign it to an investigator, then send a notification. Another scenario: when inventory drops below the reorder point, generate a purchase order, then email the supplier.&lt;/p&gt;

&lt;p&gt;Consider a real-world scenario that a Nairobi insurance company might implement: the dashboard flags a claim with high fraud probability. An alert triggers automatically. Power Automate creates an investigation case in the case management system. A Teams notification goes out to the fraud investigation team. The investigator receives an email, plus a secondary notification, with the claim details and a dashboard link. All of this happens in under 60 seconds from claim submission.&lt;/p&gt;

&lt;p&gt;In such a case, data doesn't just inform decisions, it initiates action automatically when it matters most, rather than reacting to problems long after they've happened.&lt;/p&gt;
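To make the flow concrete, here is a minimal Python sketch of that alert-to-action pipeline. Every name in it (`handle_claim`, the 0.85 threshold, the in-memory case log) is a hypothetical illustration; in practice these steps would be wired together with Power BI alerts, Power Automate, and the Teams connector rather than hand-written code.

```python
# Hypothetical sketch of the alert-to-action pipeline described above.
# The threshold, function names, and in-memory lists are illustrative only.

FRAUD_THRESHOLD = 0.85  # alert fires when the fraud score exceeds this

def handle_claim(claim: dict, case_log: list, notifications: list) -> None:
    """When a claim's fraud score exceeds the threshold, open a case and notify."""
    if claim["fraud_score"] > FRAUD_THRESHOLD:
        case = {"claim_id": claim["id"], "status": "open", "assignee": "fraud-team"}
        case_log.append(case)                                 # create case in CRM
        notifications.append(f"Claim {claim['id']} flagged")  # Teams/email notice

cases, notices = [], []
handle_claim({"id": "C-1001", "fraud_score": 0.92}, cases, notices)
handle_claim({"id": "C-1002", "fraud_score": 0.40}, cases, notices)
print(len(cases), len(notices))  # prints "1 1": only the high-score claim acted
```

The point of the sketch is the shape of the pipeline: the data itself, not a human reviewer, decides when a case gets opened.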

&lt;h2&gt;
  
  
  The Analyst as Translator
&lt;/h2&gt;

&lt;p&gt;The best Power BI analysts aren't just technical experts, they're translators. They translate between the IT department's capabilities and the C-suite's questions, transforming raw data into the language of business value and turning insights into actions that move the organization forward. This is a strategic role, not just a technical one.&lt;/p&gt;

&lt;p&gt;Done the right way, the formula is straightforward: Clean Data + Smart DAX + Intentional Design = ROI.&lt;/p&gt;

&lt;p&gt;Clean data through Power Query eliminates the manual cleanup treadmill. Build your transformations once, benefit forever. This alone saves analysts 60-80% of their time. Smart DAX reveals what matters. Not every dashboard needs complex calculations, but knowing when and how to use them separates good analysts from great ones. Intentional design means dashboards built around specific business questions, with clear hierarchies and appropriate interactivity.&lt;/p&gt;

&lt;p&gt;The result? Organizations make faster decisions, spot opportunities earlier, allocate resources more effectively. That's measurable ROI.&lt;/p&gt;

&lt;p&gt;Power BI Desktop is free to download from Microsoft. Start with one messy dataset you work with regularly. Apply the four phases: clean with Power Query, calculate with DAX, design for clarity, enable action with Goals and Alerts.&lt;/p&gt;

&lt;p&gt;In a data-saturated world, competitive advantage belongs to organizations that act on insights faster than competitors. The messy spreadsheets on your desk right now contain answers to business questions that matter.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>microsoft</category>
      <category>powerfuldevs</category>
    </item>
    <item>
      <title>Git – What is it and Why is Version Control Important – A Comprehensive Guide</title>
      <dc:creator>Amo InvAnalysis</dc:creator>
      <pubDate>Sat, 24 Jan 2026 15:19:49 +0000</pubDate>
      <link>https://dev.to/amo_invanalysis_392c1de71/git-what-is-it-and-why-is-version-control-important-a-comprehensive-guide-45ol</link>
      <guid>https://dev.to/amo_invanalysis_392c1de71/git-what-is-it-and-why-is-version-control-important-a-comprehensive-guide-45ol</guid>
      <description>&lt;p&gt;A Brief Overview on Git and GitHub&lt;br&gt;
At some point, you may have come across or heard of Git. Everyone mentions it, and it seems to be a staple for software engineers, data engineers, you name it. However, to understand what Git is, consider the following.&lt;br&gt;
Any piece of code is bound to run into issues at some point, and in the midst of you trying to debug the underlying issue(s) you may introduce a breaking change into your code. What do I mean by this? Have you ever tried to fix your code but it only ended up worse? That is but one case. If it were a project in production, the consequences could be far worse.&lt;br&gt;
Now consider this other case. You are part of a large team working on a single codebase. With everyone working on their part, locally, or put simply in their own isolated personal computers (PCs), some conflicts are bound to occur. One person or more may introduce breaking changes, or even overwrite what someone else has already done.&lt;br&gt;
The answer to this, and many other issues, is Git. &lt;br&gt;
Think of Git as an open-source, distributed version control system. With Git, a large team can easily collaborate on a project, and should there be any conflicts, Git can help pinpoint where exactly each conflict originated. In addition, each individual can access a copy of the entire codebase, work on their part, and, upon completion or when necessary, merge or consolidate their work with the rest of the team’s.&lt;br&gt;
That is just a brief overview of what Git is. It helps with tracking changes in code no matter how insignificant, tracking who made what changes to the code and enabling collaboration in a codebase.&lt;br&gt;
In addition to Git, you may also have heard of GitHub. So what is the difference? First off, Git is open-source, distributed version control software, usually used as a command-line tool. Linus Torvalds developed it in 2005 in answer to the numerous issues he was running into with the version control system used at the time while working on the Linux operating system.&lt;br&gt;
GitHub, on the other hand, is a server-based or web-based hosting service for Git repositories, with a more user-friendly graphical user interface. GitHub is owned and maintained by Microsoft, whereas Git is an open-source project maintained by its developer community. &lt;br&gt;
Another major difference between the two is pricing: Git is entirely free to use, while GitHub is a freemium service. Users can use GitHub at no cost on the basic tier, but advanced features and heavier usage, such as larger teams, come at a cost.&lt;br&gt;
There are many other web-based Git hosting services like Bitbucket, but GitHub is the most commonly used.&lt;br&gt;
How to Push Code to GitHub&lt;br&gt;
So how do you push code to GitHub? It may sound hard, but it is easier than you might think. Here are three major ways to push code to GitHub: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Using terminal or a command line interface (CLI) like Git&lt;/li&gt;
&lt;li&gt;Via a graphical user interface (GUI) like GitHub Desktop&lt;/li&gt;
&lt;li&gt;Using the web-based GitHub&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For brevity’s sake, this article discusses how to push code to GitHub using terminal or a CLI tool like Git.&lt;br&gt;
Using Terminal or a CLI like Git&lt;br&gt;
Before you proceed, ensure you have met the following prerequisites:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Git is installed on your personal computer (PC). Mac and Linux users have it installed by default; if not, or if you’re a Windows user, you can quickly install it from &lt;a href="https://git-scm.com/" rel="noopener noreferrer"&gt;https://git-scm.com/&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You have a project on your PC (something simple like an HTML file will do, nothing too overboard).&lt;/li&gt;
&lt;li&gt;You have a GitHub account (you can create one for free at GitHub.com).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That said, here is how to push code to GitHub using Git or terminal.&lt;br&gt;
Step One: Create a Repository on GitHub&lt;br&gt;
Navigate to your GitHub.com account and log in if you have not done so already.&lt;br&gt;
Look for the + (plus) sign at the top right corner of your browser, click on it, and select “New repository”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhttzqf9fyxhh8e4zpqma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhttzqf9fyxhh8e4zpqma.png" alt=" " width="800" height="395"&gt;&lt;/a&gt;&lt;br&gt;
Give your new repository any name you like; in practice, the name should reflect what your project is actually about. Once you are done, click on “Create repository”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkfu9zfis946m5gf6nyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkfu9zfis946m5gf6nyx.png" alt=" " width="800" height="393"&gt;&lt;/a&gt;&lt;br&gt;
Next, copy the repository uniform resource locator (URL) as shown. The link should end with .git; the https:// option is recommended.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F146u658zhowhilx0gfvr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F146u658zhowhilx0gfvr.png" alt=" " width="800" height="397"&gt;&lt;/a&gt;&lt;br&gt;
Step Two: Launch Git and Navigate to Your Local Project Folder&lt;br&gt;
Launch Git on your device and navigate to the relevant project directory. In this case you would use the cd or change directory command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd path/to/your/project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check whether you are in the correct directory by typing the following command, which prints the current working directory. If you are not where you expected, use the cd command above to navigate to the project directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pwd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step Three: Stage Your Local Project&lt;br&gt;
Once you are in the project folder or directory, key in this command and hit enter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Git: hey, this is a new local Git repository. From here on, Git can start taking snapshots of whatever you do in the folder.&lt;br&gt;
Next, type this in and hit enter&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command stages, or in simpler terms readies, your files for movement from your local computer to GitHub. More specifically, it readies all files in the current folder or directory you are working in. Git already knows the files exist in your working directory, but with this command you are telling it to start actively taking note of the file(s) you have specified.&lt;br&gt;
In case you want to stage an individual file, you can use the following command, with “filename.txt” being a placeholder for the actual name of the file you wish to upload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add filename.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, it’s time to commit what you’ve staged to your local repository. Think of a commit as a saved snapshot with a timestamp and a description. Type the following command and hit enter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "My first commit"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step Four: Connect the Local to the Remote Repository, then Push to Remote&lt;br&gt;
Now to the final bit. &lt;br&gt;
Type the following into your terminal, and hit enter after each command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin https://github.com/USERNAME/myfirstrepo.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first command tells Git on your local computer where you want to push the code, and the second tells Git to push the code to that destination.&lt;br&gt;
With the last command, you may have to complete authentication, but this is usually a breeze. Once you’re done, you should be able to see the code from your local repository in your GitHub repository.&lt;/p&gt;

&lt;p&gt;How to Pull Code from GitHub&lt;br&gt;
First, navigate to your GitHub account and open the repository that you want to pull code from.&lt;br&gt;
Next, look for a green “&amp;lt; &amp;gt; Code” button as shown below and click on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbuqusn84ivxjwlt8k155.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbuqusn84ivxjwlt8k155.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
Copy the URL, either option would do but HTTPS is much easier if you’re doing it for the first time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvion5glha3tc4gv984r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvion5glha3tc4gv984r.png" alt=" " width="800" height="306"&gt;&lt;/a&gt;&lt;br&gt;
Next, launch your terminal or Git and navigate to the directory or folder you’re working with for the project.&lt;br&gt;
Lastly, key in the following command, pasting the URL you copied in place of the placeholder, and hit enter. (Do not include the quotation marks in your command.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone "pasted-url"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For subsequent pulling, you will only need to navigate to the directory you used above and run this command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git pull
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
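As an aside, you can rehearse clone and pull end to end without a network connection or a GitHub account. The sketch below uses a bare repository on disk as a stand-in for the GitHub remote; all paths and commit messages are illustrative.

```shell
# Offline rehearsal of clone and pull: a local bare repository stands in
# for the GitHub remote, so no network or account is needed.
set -e
tmp=$(mktemp -d)

# Build the stand-in remote and seed it with one commit.
git init --bare "$tmp/remote.git"
git clone "$tmp/remote.git" "$tmp/seed"
cd "$tmp/seed"
git config user.email "you@example.com" && git config user.name "Seed"
echo "v1" > notes.txt && git add . && git commit -m "Initial commit"
git push origin HEAD

# First-time pull: clone the remote, exactly as with a GitHub HTTPS URL.
git clone "$tmp/remote.git" "$tmp/local"

# Simulate a teammate pushing a new commit to the shared remote...
cd "$tmp/seed"
echo "v2" >> notes.txt && git add . && git commit -m "Update notes"
git push origin HEAD

# ...then fetch it with a plain git pull from your working copy.
cd "$tmp/local"
git pull
cat notes.txt   # now contains both lines
```

With a real GitHub repository, the only change is swapping the local path for the HTTPS URL you copied from the Code button.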



&lt;p&gt;That is how easy it is to pull code from GitHub to your local computer.&lt;br&gt;
How to Track Changes using Git&lt;br&gt;
If there is one thing Git is exceptionally good at, it’s tracking changes. Tracking changes with Git typically follows this cycle: modify-add-commit.&lt;br&gt;
That said, let us try tracking changes with Git in practice.&lt;br&gt;
In this instance, let’s assume you’re trying to create a simple car driving manual, and we’ll track changes using Git.&lt;br&gt;
First, let’s create a directory named “driving_tutorial” by running the following command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir driving_tutorial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, navigate to the directory you have just created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd driving_tutorial/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a file named manual_car.md, type the following text into it “Manual car”, and in the next line, add “Key Instructions.” This example uses Vi as the editor, but you can use Nano, or any other you prefer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi manual_car.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon0588pfnqlqk9vlskrc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon0588pfnqlqk9vlskrc.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now time to track changes. First we’ll check whether you created the file correctly, then check the contents of the file, and lastly, check whether Git is taking note of any changes to files in your working directory.&lt;br&gt;
So, start with “ls”, which lists the files in the current working directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then, “cat”, which reveals the contents of the file in question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat manual_car.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And lastly,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuobti1kv3gv68beihhe0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuobti1kv3gv68beihhe0.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output from this last command tells us Git is not tracking the file. To resolve this, run the following command, which tells Git to start tracking changes. (“git add .” would work similarly, but since we know which file we need to track, the command below will do.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add manual_car.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you run the command, check the status again with git status.&lt;br&gt;
This is what you should see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov7398pcfqtf5d678bfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov7398pcfqtf5d678bfl.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
Now that Git knows it has to track changes to files in your working directory, the next step is to record all this in a sort of snapshot referred to as a commit, much like saving your video game’s progress at a checkpoint, or whenever you feel like it. (The commit message can be anything, but keep it brief and adequately descriptive of what you’re committing.)&lt;br&gt;
To do this, run the following command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "Create a handy manual for new drivers"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sjysxzh4g4xqkbmtb5u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sjysxzh4g4xqkbmtb5u.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now check again on the status of your file with git status&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5u5tu5xvplgf7uu616m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5u5tu5xvplgf7uu616m.png" alt=" " width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
If you get this, or something similar, you’re well on track.&lt;br&gt;
Now lastly, run the following command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u2xuuouknfbcsppmo6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u2xuuouknfbcsppmo6k.png" alt=" " width="800" height="438"&gt;&lt;/a&gt; &lt;br&gt;
With that, you’ve managed to track changes with Git. However, in practice you won’t be limited to this. Remember, it’s a constant “modify-add-commit” cycle with Git, and you just have to keep doing it to ensure Git keeps track of everything you do.&lt;/p&gt;
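To see that modify-add-commit rhythm in full, here is a sketch that runs the cycle twice in a throwaway directory: the first round creates and commits the manual, the second modifies it and commits again. File contents and commit messages are illustrative.

```shell
# The full modify-add-commit cycle, twice over, in a throwaway directory.
set -e
dir=$(mktemp -d) && cd "$dir"
git init
git config user.email "you@example.com" && git config user.name "Your Name"

# Round one: create the file, stage it, commit it.
printf "Manual car\nKey Instructions\n" > manual_car.md
git add manual_car.md
git commit -m "Create a handy manual for new drivers"

# Round two: modify the same file, stage it again, commit again.
printf "Always press the clutch before shifting gears.\n" >> manual_car.md
git add manual_car.md
git commit -m "Add the first driving instruction"

git log --oneline   # shows two commits, newest first
```

Each pass through the cycle adds one more entry to the history that git log displays, which is exactly the trail of changes Git is tracking for you.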

</description>
      <category>beginners</category>
      <category>git</category>
      <category>github</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
