Abdulganeey Abdulazeez

Posted on Apr 3

Getting Data from Different Sources in Power BI

#analytics #dataengineering #microsoft #tutorial

Introduction

In today's data-driven world, organizations rarely store all their information in a single system. Sales figures might live in Excel, transactional data in SQL Server, external reports arrive as PDFs, employee directories sit in SharePoint, and API responses come back as JSON. Making sense of all this scattered information requires a tool that can bridge every one of these silos, and that is precisely where Microsoft Power BI excels.

Power BI is Microsoft's industry-leading business intelligence platform that enables analysts, engineers, and decision-makers to connect to virtually any data source, transform raw data into clean, structured datasets, and then visualize those datasets in rich, interactive reports and dashboards. It is used by millions of professionals worldwide and integrates seamlessly into the Microsoft 365 ecosystem as well as dozens of third-party services.

This article focuses on one of the most fundamental, yet most important, capabilities of Power BI: data ingestion. We will walk through five of the most commonly encountered source types:

Excel workbooks and flat text files (CSV, TXT)
Relational SQL Server databases
Semi-structured JSON files
PDF documents containing tabular data
SharePoint Online folder libraries

For each source, we will cover: why it matters, how to connect, key transformation steps in Power Query Editor, practical examples, annotated screenshot placeholders, and best practices. By the end of this guide, you will have a clear mental model of Power BI's data connectivity layer and the confidence to pull data from any combination of these sources into a single coherent data model.

Architectural Overview

Before diving into individual connectors, it is important to understand how Power BI's data acquisition layer is structured. At a high level, every piece of data that appears in a Power BI report has passed through three distinct stages:

The Get Data dialog is your entry point into stage 1. Power BI Desktop ships with over 100 built-in connectors organized into categories such as File, Database, Power Platform, Azure, Online Services, and Other. When you choose a connector, Power BI hands control to Power Query Editor, a powerful ETL (Extract, Transform, Load) environment built on top of the M functional language. Every action you perform in Power Query (rename a column, filter rows, split a text field) is recorded as a step in a query recipe that can be audited, edited, or replayed at any time.

Understanding this pipeline is essential because it determines where problems appear: authentication errors manifest at stage 1, data quality issues surface at stage 2, and relationship or aggregation problems become visible at stage 3. This guide focuses primarily on stages 1 and 2.

Power BI allows you to connect to a wide range of data sources. Below are step-by-step guides for each major source.

1. Excel and Text/CSV Files

1.1 Why These Sources Matter

Excel is the world's most ubiquitous data tool. Finance teams, HR departments, operations managers, and field engineers all produce Excel workbooks daily. Similarly, CSV (Comma-Separated Values) files are the lingua franca of data exchange between systems; virtually every enterprise application can export to CSV. Power BI's flat-file connectors are therefore among the most frequently used in real-world projects.

1.2 Connecting to an Excel Workbook

Excel files can contain multiple sheets, named Tables, and named Ranges. Power BI can see all three. Named Tables are preferable because they carry metadata (column headers, data types inferred from Excel) and auto-expand when rows are added.

Step-by-step guide

Open Power BI Desktop and click Home > Get Data > Excel Workbook (or press Alt+H, GD, E).
Browse to your .xlsx or .xls file in the file picker dialog and click Open.
The Navigator dialog appears. The left pane lists all Sheets (Sheet1, Sheet2…), named Tables (e.g. SalesData), and named Ranges. Select one or more objects by checking their boxes.
Click Transform Data to open Power Query Editor (recommended), or Load to load directly into the model.
In Power Query Editor, verify column names and data types. Right-click a column header to change its type, or use the Home > Transform > Detect Data Type ribbon button.
Click Home > Close & Apply to commit the query and load data into the model.

Key Power Query transformations for Excel data

• Promote Headers: If the first row contains column names, use Home > Use First Row as Headers.
• Remove Top Rows: Skip metadata rows at the top that are not part of the dataset.
• Unpivot: If months or categories are spread across columns (wide format), select those columns and choose Transform > Unpivot Columns to convert to a tall format suitable for analysis.
• Data Type casting: Always explicitly set each column's data type rather than relying on Auto-detect to avoid silent coercions.

1.3 Connecting to a CSV or Text File

CSV files follow the same workflow with one important difference: Power BI must infer the delimiter and encoding, which can sometimes go wrong with non-standard files.

Step-by-step guide

Click Home > Get Data > Text/CSV.
Select the file. Power BI shows a preview with its best guess at the delimiter (comma, semicolon, tab, fixed-width, etc.) and encoding (UTF-8, Windows-1252, etc.).
If the preview looks incorrect, click the Delimiter dropdown and select Custom, then type the actual delimiter character. Adjust the File Origin (encoding) as needed.
Click Transform Data to proceed to Power Query Editor.
Power Query may insert a 'Changed Type' step automatically inspect this step carefully. If it incorrectly typed a column, delete the step and re-cast manually

Handling common CSV challenges

Tip: For Excel/CSV files that are updated daily, store them in a SharePoint document library or OneDrive for Business. Power BI can connect to those paths and refresh automatically without needing to update local file paths.

2. SQL Server Databases

2.1 Why SQL Server Is the Most Important Connector

For most medium-to-large enterprises, SQL Server (or Azure SQL Database, Azure Synapse Analytics, or SQL Server on AWS RDS) is the canonical store of record for transactional and operational data. Connecting Power BI directly to SQL Server unlocks the full power of the relational model. You can join tables, push query logic down to the server (query folding), and work with datasets far too large to fit in memory.

2.2 Import Mode vs DirectQuery

When connecting to a database, Power BI offers two fundamental storage modes:

2.3 Step-by-Step: Connecting to SQL Server

Click Home > Get Data > SQL Server.
In the SQL Server Database dialog enter the Server name (e.g. PROD-SQL01\MSSQLSERVER or mydb.database.windows.net for Azure SQL) and optionally the Database name.
Choose Data Connectivity mode: Import (default) or DirectQuery.
Expand Advanced Options if you want to paste a custom SQL query directly (useful for complex multi-table joins you want the server to execute).
Click OK. Choose authentication: Windows (Kerberos/NTLM SSO), Database (SQL login), or Microsoft Account (for Azure AD).
The Navigator appears. Browse the database tree — Tables, Views, and Stored Procedures are all visible. Select one or more objects.
Click Transform Data to open Power Query Editor, where you can filter rows before loading (this enables query folding, the filter is pushed to the server as a WHERE clause, drastically reducing data transfer)

2.4 Query Folding — The Performance Secret

Query folding is the mechanism by which Power Query translates your applied steps back into native SQL (or the query language of the source). When folding is active, filtering, sorting, grouping, and joining operations all execute on the server, and only the resulting rows travel over the network to Power BI. This is critical for large tables.

To check whether a step is folding, right-click the step in the Applied Steps pane in Power Query Editor. If View Native Query is available (not greyed out), the step folds. If it is greyed out, that step and all subsequent steps execute locally.

Tip: Steps that break query folding include: custom M functions, Table.AddColumn with complex logic, merges against non-SQL sources, and certain Text transformation functions. Always place non-folding steps as late as possible in the query pipeline to maximize the folded portion.

2.5 Writing a Native SQL Query

For complex analytical queries it is often more efficient to write the SQL yourself rather than building the equivalent transformations in Power Query. Use the Advanced Options > SQL Statement field in the connector dialog:

Note: When using a native SQL query, Power BI wraps it as a subquery. This means query folding for subsequent Power Query steps is disabled. Consider using Views or Stored Procedures on the SQL Server side for complex logic.

2.6 Handling Multiple Related Tables

A strength of the SQL Server connector is the ability to load multiple tables and define relationships in Power BI's Model view. The recommended pattern is:

Load each dimension table (Customers, Products, Date, Geography) as a separate query.
Load the fact table (Sales, Orders, Transactions) as a separate query.
In Model view, drag and drop keys to create relationships (Power BI often auto-detects these from column name matches).
Use a star schema layout, fact table in the centre, dimensions surrounding it for optimal DAX performance.

3. JSON and PDF Files

3.1 JSON Files

When JSON data appears in Power BI projects

JSON (JavaScript Object Notation) is ubiquitous in modern web APIs, microservices, and configuration files. You will encounter it when exporting data from REST APIs (e.g., Salesforce, HubSpot, Stripe), reading configuration exports, or receiving data from IoT or log-aggregation platforms. Unlike tabular CSV, JSON is hierarchical — records contain nested objects and arrays, which requires careful expansion in Power Query.

Step-by-step: Loading a JSON file

Click Home > Get Data > JSON.
Select your .json file. Power Query opens and displays the data as a single cell showing 'List' (if the root is an array) or 'Record' (if the root is an object).
If the root is a List, click the List cell, then click the Convert To Table button in the ribbon (Transform > To Table). This converts the list into a table with a single column named Column1.
Click the Expand icon (double arrow) at the top of Column1 to expand the nested Records into columns. Uncheck any columns you do not need.
Repeat the expand operation for any nested List or Record columns until the data is fully flattened.
Set data types on all columns explicitly.
Click Home > Close & Apply.

Example: Parsing a typical REST API JSON export

Consider a JSON file returned by a sales API with the following structure:

The transformation sequence in Power Query would be:

Convert root List to Table → Column1 contains Records.
Expand Column1 → reveals order_id, order_date, customer (Record), items (List).
Expand customer Record → reveals customer.id, customer.name, customer.country.
Expand items List → each row duplicates the order data, once per line item.
Expand items Record → reveals items.sku, items.qty, items.price.
Rename and type-cast all columns

Tip: When consuming live JSON from a REST API (rather than a file), use Power BI's Web connector (Get Data > Web) and paste the API endpoint URL. For authenticated APIs, use the Basic or OAuth credential options, or pre-process the API call using Power Automate into a SharePoint list.

3.2 PDF Files

Power BI's built-in PDF table extractor
Many organizations receive data locked inside PDF reports — supplier invoices, government statistical publications, bank statements, or regulatory filings. Power BI Desktop (version 2.73+) includes a native PDF connector that uses heuristic algorithms to detect and extract tables from PDF pages.

Step-by-step: Loading a PDF file

Click Home > Get Data > PDF (if not visible, search for 'PDF' in the search bar of the Get Data dialog).
Select your .pdf file. Power BI analyses every page and lists detected tables named Table001, Table002, Page001, Page002, etc.
Preview each item in the Navigator to identify the tables you need. Tables will show a grid preview; Page items show the raw extracted text from a page.
Select the relevant tables and click Transform Data.
In Power Query Editor, the first row often contains column headers — use Home > Use First Row as Headers.
Remove any blank rows or footer rows (typically the last 1-2 rows) using Home > Remove Rows.
Verify and fix data types, especially for numeric columns which PDF extraction sometimes returns as text

Limitations and workarounds

Note: The PDF connector quality varies significantly based on how the original PDF was created. PDFs generated programmatically (e.g. from Excel or reporting software) extract cleanly. Scanned documents or complex layout PDFs (columns, watermarks, rotated text) require pre-processing.

4. SharePoint Folders

4.1 Why SharePoint as a Data Source

SharePoint Online document libraries are the de facto file collaboration platform for Microsoft 365 organizations. Teams frequently drop updated Excel workbooks, CSV exports, or PDF reports into shared SharePoint folders on a regular schedule. Rather than manually downloading and re-importing each file, Power BI's SharePoint Folder connector can monitor an entire folder and automatically combine all files of the same type into a single unified dataset.

This pattern is extremely powerful for operational reporting: a finance team adds a new monthly Excel file to a SharePoint folder, and the scheduled Power BI refresh automatically picks it up and incorporates it into the report — zero manual intervention required.

4.2 Finding Your SharePoint Site URL

Before connecting, you need the root URL of your SharePoint site (not the full URL of a specific document library). Examples:

Do not include the document library name (/Shared Documents, /Reports, etc.) in this URL; Power BI will enumerate all libraries within the site

4.3 Step-by-Step: SharePoint Folder Connector

Click Home > Get Data > More… and search for 'SharePoint', or click Online Services > SharePoint Online List / SharePoint Folder. Use SharePoint Folder (not List) to access document library files.
Paste your SharePoint site root URL into the Site URL field. Click OK.
Sign in with your Microsoft 365 / Azure AD organizational account when prompted.
The Navigator shows all document libraries on that site. Each library is listed with its path. Choose the specific folder path you need, or select the library root and filter in Power Query.
Click Transform Data to open Power Query Editor. You will see a table with columns: Content (binary), Name, Extension, Date accessed, Date modified, Date created, Attributes, Folder Path.
Filter the Extension column to keep only the file types you want (e.g., .xlsx or .csv) using the column dropdown filter.
Filter the Folder Path column if you only want files from a specific subfolder.
Click the Combine Files button (double-down-arrow icon) on the Content column header. Power BI asks you to choose the transformation sample file — select the most representative file in the folder. Click OK.
Power BI generates a combination query that applies the sample file's transformation to every file in the folder. Review the 'Transform Sample File' query and adjust column types or header promotion as needed — these changes propagate to all files.
Click Home > Close & Apply.

4.4 How the Combine Files Pattern Works Internally

When you click Combine Files, Power BI automatically creates three helper queries behind the scenes:

Transform Sample File query: applies your defined transformations to a single representative file.
Transform File function: a parameterized version of the sample transformation, applied to each binary file.
The main query: iterates over every row (file) in the folder table and invokes the Transform File function, then appends all results into a single table.

This architecture means you only need to configure the transformation once (on the sample file), and it automatically applies to every existing and future file in the folder. Adding a new file to the SharePoint library and clicking Refresh is all that is needed.

4.5 Best Practices for the SharePoint Folder Pattern

Tip: If you need to connect to a SharePoint folder that is also used by non-Power BI users, consider adding a Power Automate flow that copies only finalized/approved files to a dedicated subfolder. This prevents Power BI from picking up in-progress drafts

4.6 SharePoint vs OneDrive for Business

OneDrive for Business is technically SharePoint under the hood — the same connector and authentication flow applies. Use the OneDrive for Business site URL:

For files that are personal/individual rather than team-shared, OneDrive is the appropriate choice. For team data that multiple people contribute to, use a SharePoint team site document library.

5. Conclusion

Data acquisition is the foundation upon which every Power BI report is built. No matter how sophisticated your DAX measures or how beautiful your visualizations, a report is only as trustworthy as the data flowing into it. This guide has walked through five of the most essential connector patterns in Power BI:

Excel and CSV files provide rapid access to analyst-produced and system-exported data with minimal setup.
SQL Server delivers enterprise-grade relational data with the performance benefits of query folding and support for both Import and DirectQuery modes.
JSON enables integration with modern APIs and semi-structured data sources through Power Query's hierarchical expansion capabilities.
PDF unlocks data that would otherwise remain trapped in static documents, with caveats around OCR-required scans.
SharePoint Folders automate the ingestion of regularly-updated file collections, eliminating manual refresh workflows.

A critical theme across all five connectors is the role of Power Query as the transformation layer. Investing time in building clean, well-documented, and folding-aware Power Query recipes pays dividends in report reliability, refresh performance, and maintainability. A poorly-designed query is as dangerous as a bad data source — both corrupt the insights your stakeholders depend on.

As you advance, explore combining multiple sources in a single model: a Sales fact table from SQL Server enriched with a Territory mapping from Excel, product metadata from a JSON API, quota targets from a SharePoint Excel file, and a historical performance PDF report — all unified in one Power BI data model. This multi-source integration capability is where Power BI truly earns its title as an enterprise analytics platform.

Quick Reference: Connector Cheatsheet

Further Learning: Microsoft Learn's Power BI documentation (learn.microsoft.com/power-bi), the Power Query M Language Reference, and the Power BI Community forums (community.powerbi.com) are the best resources for deepening your connector and transformation knowledge.

DEV Community

Getting Data from Different Sources in Power BI

Introduction

Architectural Overview

1. Excel and Text/CSV Files

1.1 Why These Sources Matter

1.2 Connecting to an Excel Workbook

Step-by-step guide

Key Power Query transformations for Excel data

1.3 Connecting to a CSV or Text File

Step-by-step guide

2. SQL Server Databases

2.1 Why SQL Server Is the Most Important Connector

2.2 Import Mode vs DirectQuery

2.3 Step-by-Step: Connecting to SQL Server

2.4 Query Folding — The Performance Secret

2.5 Writing a Native SQL Query

2.6 Handling Multiple Related Tables

3. JSON and PDF Files

3.1 JSON Files

Step-by-step: Loading a JSON file

Example: Parsing a typical REST API JSON export

3.2 PDF Files

Step-by-step: Loading a PDF file

Limitations and workarounds

4. SharePoint Folders

4.1 Why SharePoint as a Data Source

4.2 Finding Your SharePoint Site URL

4.3 Step-by-Step: SharePoint Folder Connector

4.4 How the Combine Files Pattern Works Internally

4.5 Best Practices for the SharePoint Folder Pattern

4.6 SharePoint vs OneDrive for Business

5. Conclusion

Quick Reference: Connector Cheatsheet

Top comments (0)