<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sankalp</title>
    <description>The latest articles on DEV Community by Sankalp (@sankalp_fabric_data_architect).</description>
    <link>https://dev.to/sankalp_fabric_data_architect</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3712508%2F0ea72dca-cfd9-430e-b798-3093e7b1780a.png</url>
      <title>DEV Community: Sankalp</title>
      <link>https://dev.to/sankalp_fabric_data_architect</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sankalp_fabric_data_architect"/>
    <language>en</language>
    <item>
      <title>Shortcut &amp; Mirroring</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Sun, 26 Apr 2026 19:01:30 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/shortcut-mirroring-jma</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/shortcut-mirroring-jma</guid>
<description>&lt;p&gt;&lt;strong&gt;Shortcut:&lt;/strong&gt; Points to the original data source; it is only a virtual connection, and the data is not stored physically in the target.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mirroring:&lt;/strong&gt; Data is physically replicated to and stored in the target location.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros &amp;amp; cons&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7y79wex07okl8lxwil2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7y79wex07okl8lxwil2.png" alt=" " width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;
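
&lt;p&gt;A minimal PySpark sketch (the shortcut name &lt;code&gt;ext_orders&lt;/code&gt; is hypothetical, and &lt;code&gt;spark&lt;/code&gt; is the notebook session): consuming a shortcut looks exactly like reading a local lakehouse table, because only the reference lives in the target.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# "Tables/ext_orders" is a shortcut pointing at data in another
# workspace or an external store; no bytes were copied here.
df = spark.read.format("delta").load("Tables/ext_orders")
df.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;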

</description>
      <category>beginners</category>
      <category>computerscience</category>
      <category>data</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>MS Fabric Architect Interview Questions</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Sun, 26 Apr 2026 17:48:01 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/ms-fabric-architect-interview-questions-50mm</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/ms-fabric-architect-interview-questions-50mm</guid>
      <description>&lt;p&gt;&lt;strong&gt;Q1. What is Microsoft Fabric?&lt;/strong&gt;&lt;br&gt;
Unified SaaS data platform combining Data Engineering, Data Science, Data Warehouse, Real-Time Analytics, and Power BI&lt;br&gt;
Built on OneLake (a single, unified data lake)&lt;br&gt;
Eliminates the need to run separate services such as ADF, Synapse, and Power BI&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2. What is OneLake?&lt;/strong&gt;&lt;br&gt;
Central storage layer (like OneDrive for data)&lt;br&gt;
Uses Delta Lake format&lt;br&gt;
Supports shortcuts (no data duplication)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3. Difference between Microsoft Fabric and Azure Data Factory?&lt;/strong&gt;&lt;br&gt;
ADF → orchestration + pipelines only&lt;br&gt;
Fabric → end-to-end platform (storage + compute + BI)&lt;br&gt;
Fabric pipelines ≈ ADF but tightly integrated with the lakehouse&lt;/p&gt;

&lt;p&gt;🔹 2. Architecture &amp;amp; Design&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q4. How do you design a scalable Fabric architecture for 1000+ customers?&lt;/strong&gt;&lt;br&gt;
Expected points:&lt;br&gt;
Workspace strategy (per customer vs domain-based)&lt;br&gt;
Capacity planning (F SKU sizing)&lt;br&gt;
Data isolation (schemas, folders, lakehouses)&lt;br&gt;
Use of shortcuts for shared datasets&lt;br&gt;
Governance via Purview&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q5. What are Lakehouse and Warehouse in Fabric?&lt;/strong&gt;&lt;br&gt;
Lakehouse&lt;br&gt;
Files + tables (Delta format)&lt;br&gt;
Good for data engineering &amp;amp; ML&lt;br&gt;
Warehouse&lt;br&gt;
SQL-based analytics&lt;br&gt;
Optimized for BI queries&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q6. When would you use Lakehouse vs Warehouse?&lt;/strong&gt;&lt;br&gt;
Lakehouse → ingestion, transformation, ML&lt;br&gt;
Warehouse → reporting, star schema, Power BI&lt;/p&gt;

&lt;p&gt;🔹 3. Data Engineering &amp;amp; Pipelines&lt;br&gt;
&lt;strong&gt;Q7. How do Fabric Data Pipelines differ from ADF pipelines?&lt;/strong&gt;&lt;br&gt;
Similar UI and activities&lt;br&gt;
Fabric pipelines are tightly integrated with OneLake&lt;br&gt;
No need for a separate integration runtime (IR), mostly&lt;br&gt;
Better native support for Lakehouse&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q8. Explain incremental load strategies in Fabric.&lt;/strong&gt;&lt;br&gt;
Common strategies (expect follow-ups on each):&lt;br&gt;
Watermark (last run timestamp)&lt;br&gt;
CDC (Change Data Capture)&lt;br&gt;
Delta table merge&lt;br&gt;
Using Copy Activity with filters&lt;/p&gt;
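
&lt;p&gt;A minimal PySpark sketch of the watermark approach (table and column names are hypothetical, and &lt;code&gt;spark&lt;/code&gt; is the notebook session):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Read the last successful load time from a control table, then
# pull only the rows that changed after that watermark.
last_run = (spark.read.table("control.watermarks")
            .filter("table_name = 'orders'")
            .first()["last_run_ts"])

incremental = (spark.read.table("source.orders")
               .filter(f"modified_ts &amp;gt; '{last_run}'"))

incremental.write.format("delta").mode("append").saveAsTable("bronze.orders")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;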

&lt;p&gt;&lt;strong&gt;Q9. How do you implement pagination in Fabric pipelines?&lt;/strong&gt;&lt;br&gt;
(Often asked as a scenario, e.g., ingesting from a paginated REST API such as an ESRI service)&lt;br&gt;
Use Until loop&lt;br&gt;
Maintain offset variable&lt;br&gt;
Call API using Copy/Web activity&lt;br&gt;
Append to Lakehouse table&lt;/p&gt;
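
&lt;p&gt;The same logic sketched in Python for clarity (hypothetical endpoint; ESRI-style parameter names); in a Fabric pipeline this maps to an Until loop that increments an offset variable:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

base_url = "https://example.com/arcgis/rest/services/layer/query"
offset, page_size, rows = 0, 1000, []

# Keep fetching until a page comes back empty (the "Until loop").
while True:
    resp = requests.get(base_url, params={"resultOffset": offset,
                                          "resultRecordCount": page_size,
                                          "f": "json"})
    resp.raise_for_status()
    batch = resp.json().get("features", [])
    if not batch:
        break
    rows.extend(batch)      # in a pipeline: append to a Lakehouse table
    offset += page_size
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;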

&lt;p&gt;🔹 4. Delta Lake &amp;amp; Data Modeling&lt;br&gt;
&lt;strong&gt;Q10. What is Delta Lake and why is it important in Fabric?&lt;/strong&gt;&lt;br&gt;
ACID transactions&lt;br&gt;
Time travel&lt;br&gt;
Schema evolution&lt;br&gt;
Supports incremental loads&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q11. How do you handle slowly changing dimensions (SCD) in Fabric?&lt;/strong&gt;&lt;br&gt;
Use MERGE in Delta tables&lt;br&gt;
dbt snapshots (if using dbt)&lt;br&gt;
Maintain valid_from, valid_to&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q12. Bronze, Silver, Gold architecture in Fabric?&lt;/strong&gt;&lt;br&gt;
Bronze → raw ingestion&lt;br&gt;
Silver → cleaned/transformed&lt;br&gt;
Gold → business-ready&lt;/p&gt;

&lt;p&gt;🔹 5. Performance Optimization&lt;br&gt;
&lt;strong&gt;Q13. How do you optimize performance in Fabric Lakehouse?&lt;/strong&gt;&lt;br&gt;
Partitioning (date/customer)&lt;br&gt;
Z-ordering&lt;br&gt;
File size optimization (avoid small files)&lt;br&gt;
Caching&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q14. What is a shortcut in OneLake, and when should you use it?&lt;/strong&gt;&lt;br&gt;
Reference external data without copying&lt;br&gt;
Useful for multi-workspace sharing&lt;/p&gt;

&lt;p&gt;🔹 6. Security &amp;amp; Governance&lt;br&gt;
&lt;strong&gt;Q15. How do you secure data in Fabric?&lt;/strong&gt;&lt;br&gt;
Workspace-level access&lt;br&gt;
Row-level security (Power BI)&lt;br&gt;
Object-level security&lt;br&gt;
Integration with Purview&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q16. How do you manage multi-tenant data securely?&lt;/strong&gt;&lt;br&gt;
Separate workspaces OR schemas&lt;br&gt;
Use RBAC&lt;br&gt;
Data masking&lt;/p&gt;

&lt;p&gt;🔹 7. Real-Time &amp;amp; Advanced Topics&lt;br&gt;
&lt;strong&gt;Q17. What is Real-Time Analytics in Fabric?&lt;/strong&gt;&lt;br&gt;
Event streams + KQL database&lt;br&gt;
Used for IoT/log analytics&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q18. How would you design IoT data ingestion in Fabric?&lt;/strong&gt;&lt;br&gt;
Event streaming → KQL DB&lt;br&gt;
Store raw in Lakehouse&lt;br&gt;
Transform to Delta tables&lt;br&gt;
Serve via Power BI&lt;/p&gt;

&lt;p&gt;🔹 8. Scenario-Based Questions (VERY IMPORTANT)&lt;br&gt;
&lt;strong&gt;Q19. A client has 700+ customers and 1000+ workspaces. How would you optimize?&lt;/strong&gt;&lt;br&gt;
Consolidate workspaces (domain-based)&lt;br&gt;
Use shortcuts instead of duplication&lt;br&gt;
Central governance&lt;br&gt;
Capacity optimization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q20. API data ingestion with pagination and failure handling?&lt;/strong&gt;&lt;br&gt;
Until loop&lt;br&gt;
Retry logic&lt;br&gt;
Logging table&lt;br&gt;
Idempotent loads&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q21. How do you handle data quality in Fabric?&lt;/strong&gt;&lt;br&gt;
A data-quality (DQ) rules table&lt;br&gt;
PySpark validation&lt;br&gt;
Separate failed records&lt;br&gt;
Monitoring dashboards&lt;/p&gt;

&lt;p&gt;🔹 9. Integration with Other Tools&lt;br&gt;
&lt;strong&gt;Q22. How does Fabric integrate with dbt?&lt;/strong&gt;&lt;br&gt;
Use dbt with Lakehouse/Warehouse&lt;br&gt;
dbt models for transformation&lt;br&gt;
dbt snapshots for SCD&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q23. Can Fabric replace Snowflake?&lt;/strong&gt;&lt;br&gt;
Depends:&lt;br&gt;
Fabric → unified + cheaper (in some cases)&lt;br&gt;
Snowflake → mature + strong performance&lt;br&gt;
Many orgs use hybrid&lt;/p&gt;

&lt;p&gt;🔹 10. Trick / Deep Questions&lt;br&gt;
&lt;strong&gt;Q24. What are limitations of Fabric?&lt;/strong&gt;&lt;br&gt;
Still evolving&lt;br&gt;
Some enterprise features missing vs Synapse/Snowflake&lt;br&gt;
Capacity-based pricing challenges&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q25. How does compute work in Fabric?&lt;/strong&gt;&lt;br&gt;
Capacity-based (F SKUs)&lt;br&gt;
Shared compute across workloads&lt;/p&gt;

&lt;p&gt;🔥 How to Prepare Smartly&lt;br&gt;
If you have already worked on pagination pipelines, incremental loads, and dbt + Snowflake, focus on:&lt;br&gt;
Mapping that experience to Fabric concepts&lt;br&gt;
Scenario-based answers (interviewers love these)&lt;br&gt;
Architecture decisions, not just feature lists&lt;/p&gt;

</description>
    </item>
    <item>
      <title>ACID with delta table</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Mon, 19 Jan 2026 09:25:51 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/acid-with-delta-table-47li</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/acid-with-delta-table-47li</guid>
      <description>&lt;p&gt;************************ ACID property *******************************&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Atomicity:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Either all or nothing.&lt;/li&gt;
&lt;li&gt;A transaction must complete all of its operations successfully before committing; otherwise the entire transaction is rolled back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consistency:&lt;/strong&gt;    &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E.g., the total balance of accounts A &amp;amp; B must be the same before and after the transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Isolation:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concurrent transactions behave as if they were executed in some serial schedule, which keeps the data consistent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Durability:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Changes made by a committed transaction are permanent, even after a system failure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;********************** Parquet Vs Delta file format *******************&lt;/p&gt;

&lt;p&gt;Parquet:&lt;br&gt;&lt;br&gt;
Type        :   Columnar storage format.&lt;br&gt;
Optimized for   :   Efficient read performance, especially for data analytics.&lt;/p&gt;

&lt;p&gt;Key features    :   1.) Column-wise compression --&amp;gt; reduces file size.&lt;br&gt;
                2.) Splittable files        --&amp;gt; parallel processing.&lt;br&gt;
                3.) Works well with Hive, Spark, BigQuery, etc.&lt;/p&gt;

&lt;p&gt;Delta:&lt;br&gt;&lt;br&gt;
    Built on    : Parquet format + a transactional layer (_delta_log).&lt;br&gt;
    Optimized for   : Reliable, scalable data lakes with ACID transaction support. &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Key features    :   1.) Acid transactions (safe read/write)

            2.) Schema Enforcement        :     Delta Lake ensures that the data written to a table matches the table’s schema. 
                                    This prevents issues like inserting a string into a column that expects an integer.

            3.) Schema Evolution          :     When enabled, Delta can automatically adapt to changes in the schema (e.g., adding new                                      columns) during write operations. This is useful for agile data pipelines where the                                         schema may evolve over time.

            4.) Time Travel (Query Past Versions) : Delta Lake maintains a transaction log (_delta_log) that records every change to the                                        data. You can query a table as it existed at a specific point in time or version using.                                         This is useful for debugging, auditing.                                                             
                                    e.g. 
                                    SELECT * FROM table_name VERSION AS OF 5;
                                    -- or
                                    SELECT * FROM table_name TIMESTAMP AS OF '2025-06-10T12:00:00';

            5.)     Ideal for Streaming + Batch (Unified Workflows):
                                    &amp;gt;   Delta Lake supports both streaming and batch reads/writes on the same table.
                                    &amp;gt;   This unification simplifies architecture: you don’t need separate pipelines or                                            storage for real-time and historical data.

                                    e.g. 
                                    You can ingest real-time data using Spark Structured Streaming and run batch analytics                                      on the same Delta table.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Conclusion:&lt;br&gt;
        Use Parquet when you need fast, storage-efficient analytics on append-only data.&lt;br&gt;
        Use Delta when you need reliability, schema control, time travel, and transactional operations on top of Parquet.&lt;/p&gt;
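
&lt;p&gt;A minimal PySpark sketch of the difference (paths are hypothetical): the same DataFrame written both ways, with time travel only possible on the Delta side.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Plain Parquet: fast columnar files, but no transaction log.
df.write.mode("overwrite").parquet("Files/demo_parquet")

# Delta: the same Parquet files plus a _delta_log directory,
# which enables ACID commits, schema enforcement and time travel.
df.write.format("delta").mode("overwrite").save("Tables/demo_delta")

# Read the table as it existed at version 0.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("Tables/demo_delta")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;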

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        ************************ SCD Types *******************************
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;SCD Type 0  :   Fixed, no changes allowed (e.g., Date of Birth).&lt;/p&gt;

&lt;p&gt;SCD Type 1  :   Overwrite, Old data is overwritten. No history is kept. Simple but loses historical data.&lt;/p&gt;

&lt;p&gt;SCD Type 2  :   Add Row, New row for every change with versioning or effective dates. Full history preserved.&lt;/p&gt;

&lt;p&gt;SCD Type 3  :   Add Column, Adds a new column to track previous value. Limited history (usually just 1 change).&lt;/p&gt;

&lt;p&gt;SCD Type 4  :   History Table, Separate historical table stores changes; main table holds current data. Good for large history storage.&lt;/p&gt;

&lt;p&gt;SCD Type 6  :   Hybrid, Combination of Types 1, 2, and 3. Tracks history with current data easily accessible.&lt;/p&gt;

</description>
    </item>
    <item>
<title>when multiple jobs write the same delta table</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 11:33:06 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/when-multiple-job-write-same-delta-table-1lo8</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/when-multiple-job-write-same-delta-table-1lo8</guid>
<description>&lt;p&gt;Delta tables solve this using their ACID properties.&lt;br&gt;
Each write operation commits to the transaction log; when Delta detects that multiple concurrent jobs conflict, it safely fails one job instead of silently corrupting data.&lt;/p&gt;
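
&lt;p&gt;A minimal PySpark sketch (hypothetical table path; requires the delta-spark package) of catching the conflict and retrying the losing job:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from delta.exceptions import ConcurrentAppendException
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "Tables/events")
updates = spark.createDataFrame([(1, "a")], ["id", "value"])

def upsert(df):
    (target.alias("t")
        .merge(df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

try:
    upsert(updates)
except ConcurrentAppendException:
    # Delta aborted this commit instead of corrupting the table;
    # the conflicting writer can simply retry against the new version.
    upsert(updates)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;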

</description>
    </item>
    <item>
      <title>sort merge join, hash join, nested loop join</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 11:24:41 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/sort-merge-join-hash-join-nested-loop-join-f0h</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/sort-merge-join-hash-join-nested-loop-join-f0h</guid>
<description>&lt;p&gt;&lt;strong&gt;nested loop join:&lt;/strong&gt;&lt;br&gt;
Usable for small data sets.&lt;br&gt;
Easy to implement, but it scans every row to find matching records, so performance degrades when the tables are big.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sort merge join:&lt;/strong&gt;&lt;br&gt;
Works well for large tables, but the data must be sorted (or bucketed), and sorting is an expensive operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;hash join:&lt;/strong&gt;&lt;br&gt;
Works well for large tables, but needs extra memory for the hash table.&lt;br&gt;
The hash table groups records by join key, so matching rows can be looked up directly by hash.&lt;/p&gt;
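
&lt;p&gt;A minimal PySpark sketch (toy DataFrames) of steering Spark toward each strategy with join hints:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pyspark.sql.functions import broadcast

small = spark.range(100).withColumnRenamed("id", "k")
large = spark.range(1_000_000).withColumnRenamed("id", "k")

# Broadcast hash join: ship the small side to every executor,
# avoiding a shuffle of the large side.
bhj = large.join(broadcast(small), "k")

# Sort-merge join: both sides are shuffled and sorted on the key.
smj = large.join(small.hint("merge"), "k")

# Shuffle hash join: builds an in-memory hash table per partition.
shj = large.join(small.hint("shuffle_hash"), "k")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;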

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=-htbah3eCYg" rel="noopener noreferrer"&gt;&lt;/a&gt; &lt;/p&gt;

</description>
    </item>
    <item>
      <title>starter pool and custom pool/ spark pool</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 09:23:25 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/starter-pool-and-custom-pool-spark-pool-55i3</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/starter-pool-and-custom-pool-spark-pool-55i3</guid>
      <description>&lt;p&gt;&lt;strong&gt;starter pool:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Good for testing notebooks: there are always-active, medium-size nodes with default libraries, dynamic allocation, and autoscale (1 to 10 nodes).&lt;br&gt;
This helps initialize a session in 5 to 10 seconds.&lt;br&gt;
Charges apply only while a session is active.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;spark pool/ custom pool:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users can configure node size, autoscaling, and libraries to match the demands of the workload.&lt;/p&gt;
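
&lt;p&gt;A minimal sketch, assuming Fabric supports the Livy-style &lt;code&gt;%%configure&lt;/code&gt; notebook magic (all values are illustrative), of shaping a session to the workload:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "28g",
    "executorCores": 4,
    "numExecutors": 2
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;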

</description>
    </item>
    <item>
      <title>Microsoft purview - short detail</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 07:52:35 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/microsoft-purview-short-detail-3kcg</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/microsoft-purview-short-detail-3kcg</guid>
<description>&lt;p&gt;&lt;strong&gt;Purview&lt;/strong&gt; is Microsoft's family of data governance, risk, and compliance solutions, integrated with Fabric.&lt;br&gt;
It helps govern, protect, and manage your entire data estate across cloud, on-premises, and SaaS sources.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;provides lineage across the entire data estate for data governance&lt;/li&gt;
&lt;li&gt;automatically surfaces the metadata of all Fabric items in the Purview unified catalog&lt;/li&gt;
&lt;li&gt;allows viewing and managing data from the Purview catalog&lt;/li&gt;
&lt;li&gt;protects data using sensitivity labels (labels can be defined in the catalog)&lt;/li&gt;
&lt;li&gt;DLP (Data Loss Prevention) policies can be applied to Power BI semantic models (e.g., detecting credit card numbers and generating alerts)&lt;/li&gt;
&lt;li&gt;all Fabric user activity is logged in the Purview audit log&lt;/li&gt;
&lt;li&gt;the Purview hub, accessible to Fabric admins, offers dashboards for insights&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Z-Ordering optimization</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 07:30:57 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/z-ordering-optimization-414a</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/z-ordering-optimization-414a</guid>
<description>&lt;p&gt;Z-Ordering is a technique that co-locates related information in the same set of files.&lt;br&gt;
This feature improves read performance dramatically because related data can be read from a small set of files. &lt;/p&gt;

&lt;p&gt;E.g.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;BEFORE Z-Ordering -&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;Data files are not organized by customer_id or order_date&lt;br&gt;
Spark has no idea where the relevant rows live&lt;/p&gt;

&lt;p&gt;So it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scans ~2,500 out of 2,700 files&lt;/li&gt;
&lt;li&gt;Reads a huge amount of data&lt;/li&gt;
&lt;li&gt;Causes high disk I/O&lt;/li&gt;
&lt;li&gt;Takes a long time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Result:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large scan&lt;/li&gt;
&lt;li&gt;Slow query&lt;/li&gt;
&lt;li&gt;Wasted resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;AFTER Z-Ordering -&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What happens now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spark knows which files are likely to contain customer_id = 101&lt;/li&gt;
&lt;li&gt;It skips irrelevant files (data skipping)&lt;/li&gt;
&lt;li&gt;It reads only ~120 files instead of ~2,500&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Result:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small scan&lt;/li&gt;
&lt;li&gt;Low I/O&lt;/li&gt;
&lt;li&gt;Much faster query&lt;/li&gt;
&lt;/ul&gt;
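
&lt;p&gt;A minimal PySpark sketch (table and column names are hypothetical) of applying it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Rewrite the table's files so rows with nearby customer_id values
# are co-located, enabling the data skipping described above.
spark.sql("OPTIMIZE orders ZORDER BY (customer_id)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;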

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnth32dq1j36r5tk1o9xn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnth32dq1j36r5tk1o9xn.png" alt=" " width="517" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Refer to the Delta Lake documentation for configuration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.delta.io/optimizations-oss/#z-ordering-multi-dimensional-clustering" rel="noopener noreferrer"&gt;https://docs.delta.io/optimizations-oss/#z-ordering-multi-dimensional-clustering&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>OPTIMIZE property configuration in fabric</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 07:22:10 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/optimize-property-configuration-in-fabric-4hf6</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/optimize-property-configuration-in-fabric-4hf6</guid>
<description>&lt;p&gt;Spark performs very well on large, standard-size files, but problems start when it has to deal with many small files at the same time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OPTIMIZE coalesces many small files into larger ones to maintain a balanced, standard file size.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It dynamically optimizes the partitions by generating files of 128 MB by default (the default size can be changed as per requirement). &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;maintains the benefits of V-Order and Z-Order&lt;/li&gt;
&lt;li&gt;coalesces small files into large, balanced files (no matter how many tuples per file)&lt;/li&gt;
&lt;li&gt;auto compaction of Delta tables and files&lt;/li&gt;
&lt;li&gt;no impact on reading the Delta table before and after OPTIMIZE&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
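
&lt;p&gt;A minimal PySpark sketch (the table name is hypothetical) of compacting a table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Coalesce many small files into ~128 MB files; readers see the
# same rows before and after the rewrite.
spark.sql("OPTIMIZE sales")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;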

&lt;p&gt;Refer to the Delta Lake documentation for configuration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.delta.io/optimizations-oss/" rel="noopener noreferrer"&gt;https://docs.delta.io/optimizations-oss/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
<title>as per workload - resource profile</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 06:44:08 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/as-per-work-load-resource-profile-4i4d</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/as-per-work-load-resource-profile-4i4d</guid>
<description>&lt;p&gt;Resource profiles can be configured to match the workload, as listed below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;readHeavyForSpark&lt;/li&gt;
&lt;li&gt;readHeavyForPBI&lt;/li&gt;
&lt;li&gt;writeHeavy&lt;/li&gt;
&lt;li&gt;custom&lt;/li&gt;
&lt;/ul&gt;
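
&lt;p&gt;A minimal sketch, assuming the &lt;code&gt;spark.fabric.resourceProfile&lt;/code&gt; property name from the documentation linked below, of switching the profile for the current session:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assumed property name (see the Microsoft doc linked below):
# tune the session for write-heavy ingestion workloads.
spark.conf.set("spark.fabric.resourceProfile", "writeHeavy")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;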

&lt;p&gt;Refer to the official Microsoft documentation for configuration in MS Fabric:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/fabric/data-engineering/configure-resource-profile-configurations" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/fabric/data-engineering/configure-resource-profile-configurations&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>V-Order optimization</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 06:17:23 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/v-order-optimization-388g</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/v-order-optimization-388g</guid>
<description>&lt;p&gt;V-Order optimizes parquet files through sorting, row-group distribution, encoding, and compression.&lt;/p&gt;

&lt;p&gt;The disadvantage of V-Order is that it can increase write times by up to 15%. On the positive side, it boosts data compression by about 50% and improves read times by around 10%, and in some cases by up to 50%.&lt;/p&gt;

&lt;p&gt;Any parquet engine can read it as a regular parquet file.&lt;br&gt;
There is no impact on other Delta table features, like Z-Order, vacuum, time travel, compaction, etc.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;V-Order is disabled by default&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03h7zti8x77q4bk0h0s3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03h7zti8x77q4bk0h0s3.png" alt=" " width="634" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In Fabric runtime 1.3 and higher versions, the spark.sql.parquet.vorder.enable setting is removed. As V-Order is applied automatically during Delta optimization using OPTIMIZE statements, there's no need to manually enable this setting in newer runtime versions. If you're migrating code from an earlier runtime version, you can remove this setting, as the engine now handles it automatically.&lt;/p&gt;
&lt;/blockquote&gt;
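
&lt;p&gt;A minimal PySpark sketch (the table name is hypothetical), assuming the &lt;code&gt;VORDER&lt;/code&gt; clause shown in the linked documentation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Compact the table and apply V-Order to the rewritten files.
spark.sql("OPTIMIZE sales VORDER")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;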

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>performance</category>
    </item>
    <item>
      <title>partition pruning</title>
      <dc:creator>Sankalp</dc:creator>
      <pubDate>Fri, 16 Jan 2026 05:59:52 +0000</pubDate>
      <link>https://dev.to/sankalp_fabric_data_architect/partition-pruning-2fdp</link>
      <guid>https://dev.to/sankalp_fabric_data_architect/partition-pruning-2fdp</guid>
      <description>&lt;p&gt;Working with Big Data often presents the challenge of slow query results due to the overhead of scanning massive datasets. Optimization involves more than just how you read or aggregate data; for high-performance scanning, data must be organized in a way that the Spark engine can consume efficiently. &lt;br&gt;
This is where partition pruning becomes essential. If data is well-partitioned within storage systems like HDFS, S3, or ADLS, Spark queries will only scan the specific partition folders required, significantly reducing processing time. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As you can see in the code below, instead of one giant file, Spark creates a directory hierarchy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;df.write.partitionBy("Year", "Month").parquet("/data/consumer")&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Physical Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/data/consumer/Year=2023/Month=01/&lt;/li&gt;
&lt;li&gt;/data/consumer/Year=2023/Month=02/&lt;/li&gt;
&lt;li&gt;/data/consumer/Year=2024/Month=01/&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Partition Pruning happens automatically when you query the data. &lt;br&gt;
If you run: &lt;br&gt;
&lt;code&gt;from pyspark.sql.functions import col&lt;br&gt;df = spark.read.parquet("/data/consumer").filter(col("Year") == 2024)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Spark looks at the query, looks at the folder structure, and says, "Okay, the user only wants 2024. I am going to completely ignore the 'Year=2023' folder. I won't even list the files inside it."&lt;/p&gt;

&lt;p&gt;This can turn a 10TB scan into a 100GB scan instantly.&lt;/p&gt;

</description>
      <category>data</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
