<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hammoudi wissem</title>
    <description>The latest articles on DEV Community by hammoudi wissem (@hammoudi_wissem_2a55b8cc2).</description>
    <link>https://dev.to/hammoudi_wissem_2a55b8cc2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2826896%2Ffa59cfad-e949-4686-ab51-4e7d0c3386a3.jpg</url>
      <title>DEV Community: hammoudi wissem</title>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hammoudi_wissem_2a55b8cc2"/>
    <language>en</language>
    <item>
      <title>The Power of Context in Data Visualization (Part 1)</title>
      <dc:creator>hammoudi wissem</dc:creator>
      <pubDate>Thu, 20 Nov 2025 15:30:43 +0000</pubDate>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2/the-power-of-context-in-data-visualization-part-1-10f8</link>
      <guid>https://dev.to/hammoudi_wissem_2a55b8cc2/the-power-of-context-in-data-visualization-part-1-10f8</guid>
      <description>&lt;p&gt;Effective data visualization doesn’t start with charts, colors, or dashboards. It starts long before that—&lt;em&gt;with context&lt;/em&gt;. Understanding the “why” behind your communication is the foundation of compelling data storytelling. In this article, we’ll break down what context really means, why it matters, and how you can use it to communicate insights that influence decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Exploratory vs. Explanatory Analysis: Two Different Worlds&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;One of the biggest mistakes people make is mixing exploratory and explanatory analysis. Although they work together in the analytics process, they serve very different purposes:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Exploratory Analysis&lt;/strong&gt; – &lt;em&gt;The Search&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;This is your behind-the-scenes detective work. You dig into the data, test hypotheses, slice and filter metrics, and look at patterns from many angles. Think of opening 100 oysters to find a few valuable pearls.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Explanatory Analysis&lt;/strong&gt; – &lt;em&gt;The Story&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;This is what you share with your audience. You don’t show all the oysters—you present only the pearls. The goal is clarity, not completeness.&lt;/p&gt;

&lt;p&gt;Too often, presenters drown audiences in exploratory material instead of focusing on the insights that matter. The key is restraint: highlight the findings your audience &lt;em&gt;needs&lt;/em&gt;, not everything you discovered along the way.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Who, What, and How: The Core of Context&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Before designing any visual or writing a single line of narrative, you must answer three questions with precision:&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. Who: Understand Your Audience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Effective communication starts with a clear picture of who you’re speaking to. Avoid vague definitions like “everyone” or “stakeholders.” Identify &lt;em&gt;one specific audience&lt;/em&gt; or decision-maker and tailor your message around them.&lt;/p&gt;

&lt;p&gt;Consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does this audience care about?&lt;/li&gt;
&lt;li&gt;What are their goals and constraints?&lt;/li&gt;
&lt;li&gt;How familiar are they with the topic?&lt;/li&gt;
&lt;li&gt;What is your relationship with them—trusted advisor, newcomer, peer?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your understanding of these factors determines how you frame your message, what you emphasize, and even how much detail you include.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. What: Inspire Meaningful Action&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ask yourself: &lt;strong&gt;What do I want my audience to &lt;em&gt;know&lt;/em&gt; or &lt;em&gt;do&lt;/em&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the heart of your communication. If you cannot articulate why your message matters, you’re not ready to present it.&lt;/p&gt;

&lt;p&gt;As the person closest to the data, you are in the best position to guide your audience. Even if you can’t make direct recommendations, you can still suggest possible directions or pose thoughtful next-step questions.&lt;/p&gt;

&lt;p&gt;Clear calls to action turn passive listening into meaningful response.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3. How: Choosing the Right Format and Tone&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Mechanism – The Format&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;How will you deliver your message?&lt;/p&gt;

&lt;p&gt;Communication exists on a spectrum.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live presentations&lt;/strong&gt; offer control, emphasis, and interaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Written reports&lt;/strong&gt; offer depth, reviewability, and precision.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right format depends on your audience, the complexity of your message, and the action you want them to take.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j44gkdfwvab01j6ql21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j44gkdfwvab01j6ql21.png" alt=" " width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Tone – The Emotion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Tone shapes how your message is received.&lt;br&gt;
Are you celebrating success? Addressing a problem? Urging change?&lt;/p&gt;

&lt;p&gt;Your words, pacing, and style should reinforce the emotional intention of your message.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;How to Build Your Story with Data&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Once you know your audience and your objective, the next step is to determine &lt;em&gt;what data&lt;/em&gt; best supports your narrative.&lt;/p&gt;

&lt;p&gt;Data provides credibility and evidence, but it should never overshadow your story. Use data to strengthen your argument—not complicate it.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Crafting Clarity: The 3-Minute Story and the Big Idea&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Two powerful frameworks can help you distill even the most complex information into a crisp narrative:&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The 3-Minute Story&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This approach forces you to explain your insights concisely—no slides, no distractions. It’s perfect for quick executive updates or impromptu conversations.&lt;/p&gt;

&lt;p&gt;A strong 3-minute story includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The problem or opportunity&lt;/li&gt;
&lt;li&gt;What you did&lt;/li&gt;
&lt;li&gt;What you found&lt;/li&gt;
&lt;li&gt;What should happen next&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Big Idea&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Your entire message in &lt;em&gt;one clear sentence&lt;/em&gt;.&lt;br&gt;
Nancy Duarte describes three key elements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;unique point of view&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A sense of &lt;strong&gt;what’s at stake&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;complete, standalone sentence&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your Big Idea guides your entire communication, ensuring consistency and focus.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Example&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;3-Minute Story:&lt;/strong&gt;&lt;br&gt;
A group of us in the science department were brainstorming ways to address a recurring issue: fourth-grade students often enter science classes with negative attitudes. We wondered whether giving students earlier exposure to science could change this perception. So we piloted a summer program for second- and third-graders. To measure its impact, we surveyed students before and after. Initially, 40% of students felt only “OK” about science. After the program, nearly 70% expressed positive interest. This indicates strong success, and we recommend continuing and expanding the program.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Big Idea:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;The pilot summer learning program improved students’ perceptions of science, and we recommend expanding it; please approve the budget.&lt;/em&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Storyboarding: Your Blueprint for Clear Communication&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Storyboarding is one of the most valuable yet overlooked steps in data storytelling. It acts as a visual outline of your message and ensures clarity before you ever open presentation software.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why Avoid Starting with Slides?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Tools like PowerPoint push you into “slide creation mode.” You start decorating instead of thinking. The flow becomes cluttered and unfocused.&lt;/p&gt;

&lt;p&gt;Instead, start &lt;em&gt;low-tech&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paper&lt;/li&gt;
&lt;li&gt;Post-it notes&lt;/li&gt;
&lt;li&gt;Whiteboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These give you the freedom to rearrange, refine, and rethink without emotional attachment to specific layouts.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Example Storyboard: Summer Learning Program&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; Students have negative perceptions of science.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; Introduce a summer early-exposure program.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Results:&lt;/strong&gt; Significant improvement in perceptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Big Idea:&lt;/strong&gt; Expand the program.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might even experiment by &lt;em&gt;starting&lt;/em&gt; with the Big Idea, especially if you want your audience to engage immediately and sharply.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Why Storyboarding Matters&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Storyboarding ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A logical, audience-focused flow&lt;/li&gt;
&lt;li&gt;Purpose behind every section&lt;/li&gt;
&lt;li&gt;Fewer revisions later&lt;/li&gt;
&lt;li&gt;Faster content creation overall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It saves time, improves clarity, and sets you up for a more persuasive final product.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Start Smart, Not Fast&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Though storyboarding may feel like a slow first step, it prevents wasted effort and confusion later. It grounds your message in intent and gives you a strong foundation for building visuals and narrative.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This article is inspired by &lt;em&gt;Storytelling with Data&lt;/em&gt; by Cole Nussbaumer Knaflic, which offers foundational principles for impactful data communication.&lt;/p&gt;

</description>
      <category>analytics</category>
    </item>
    <item>
      <title>Storytelling with Data: The Art of Communicating Insights Effectively (Introduction)</title>
      <dc:creator>hammoudi wissem</dc:creator>
      <pubDate>Thu, 20 Nov 2025 15:25:12 +0000</pubDate>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2/storytelling-with-data-the-art-of-communicating-insights-effectively-part-1-24j8</link>
      <guid>https://dev.to/hammoudi_wissem_2a55b8cc2/storytelling-with-data-the-art-of-communicating-insights-effectively-part-1-24j8</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Bad Graphs Are Everywhere&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;No one intentionally creates a bad graph—yet we see them everywhere. Across industries, organizations, and even major media outlets, poorly designed charts make their way into presentations and reports. Most of these come from well-meaning individuals who simply haven’t been taught how to visualize data effectively.&lt;/p&gt;

&lt;p&gt;But &lt;em&gt;why&lt;/em&gt; is this so common?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1tmfo5jgysnhrkfjncb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1tmfo5jgysnhrkfjncb.png" alt=" " width="800" height="968"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;We’re Not Naturally Equipped for Data Storytelling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In school, we develop two main skill sets: language and mathematics. We learn to craft cohesive stories and analyze numerical problems. Yet these two skills—storytelling and data analysis—rarely meet.&lt;/p&gt;

&lt;p&gt;In today’s data-driven world, the ability to tell meaningful stories with numbers has become essential. But most of us were never taught how to turn raw data into a compelling narrative, nor how to design visuals that communicate clearly.&lt;/p&gt;

&lt;p&gt;As technology enables us to collect enormous amounts of data, making sense of it has become both a challenge and a necessity. Data visualization bridges this gap, transforming numbers into insights that support better decisions.&lt;/p&gt;

&lt;p&gt;However, without proper training, many people rely heavily on tools like Excel or PowerPoint. While these tools make graph creation accessible, they also produce visuals overloaded with unnecessary 3D effects, excessive colors, and confusing pie charts.&lt;/p&gt;

&lt;p&gt;The result?&lt;br&gt;
&lt;strong&gt;Missed opportunities to inform, persuade, and inspire.&lt;/strong&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;The Journey of Storytelling with Data&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Effective data storytelling is a transformative skill—it allows you to communicate complex insights clearly and persuasively. This series of articles will explore the core principles from &lt;em&gt;Storytelling with Data&lt;/em&gt;, breaking each concept down into actionable techniques you can apply immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 1: The Importance of Context&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before creating a single chart, start with one critical question: &lt;strong&gt;What is the context?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Who is my audience?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What do I want them to know or do?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your answers determine the structure, tone, and level of detail in your narrative. Executives may need a high-level summary, while technical teams may want deeper analysis.&lt;/p&gt;

&lt;p&gt;Understanding context from the beginning reduces rework and ensures your visuals support your objectives.&lt;/p&gt;

&lt;p&gt;Our next article will explore practical ways to uncover audience needs and define your narrative purpose.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 2: Choosing the Right Visual&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Selecting the right visual form is essential for clarity. Common and effective visuals include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bar charts&lt;/strong&gt; – for comparisons&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Line graphs&lt;/strong&gt; – for trends over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heatmaps&lt;/strong&gt; – for patterns and correlations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slopegraphs&lt;/strong&gt; – for showing changes between two points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Equally important is knowing what &lt;em&gt;not&lt;/em&gt; to use. Pie charts and 3D graphs often distort information and confuse audiences.&lt;/p&gt;

&lt;p&gt;In an upcoming article, we’ll explore how to match the visual to your message—ensuring clarity, accuracy, and impact.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 3: Clutter Is Your Enemy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Clutter is the silent killer of good data visualization. Every extra line, color, or label makes it harder for your audience to understand your message.&lt;/p&gt;

&lt;p&gt;To avoid this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply &lt;strong&gt;Gestalt principles&lt;/strong&gt; to organize visual elements&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;white space&lt;/strong&gt;, &lt;strong&gt;alignment&lt;/strong&gt;, and &lt;strong&gt;contrast&lt;/strong&gt; to guide the eye&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you remove unnecessary elements, your insight becomes the center of attention. We’ll share concrete decluttering techniques in a future article.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 4: Focus Your Audience’s Attention&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Great storytelling directs attention where it matters most.&lt;/p&gt;

&lt;p&gt;Use techniques like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preattentive attributes&lt;/strong&gt; (color, size, position)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual hierarchy&lt;/strong&gt; to emphasize key points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools ensure your audience immediately sees what you want them to notice. Our article on this topic will include practical examples of guiding attention through design.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 5: Think Like a Designer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Good design is not decoration—it's functionality. In data visualization, &lt;strong&gt;form follows function&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Key design concepts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility&lt;/strong&gt; – making visuals usable for everyone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Affordances&lt;/strong&gt; – making important elements stand out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aesthetics&lt;/strong&gt; – balancing beauty with clarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will explore how design thinking helps create visuals that are both clear and appealing.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 6: Dissecting Model Visuals&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;What makes a visualization &lt;em&gt;great&lt;/em&gt;? In this section, we’ll analyze exemplary charts, examining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How graph types are selected&lt;/li&gt;
&lt;li&gt;How data is ordered&lt;/li&gt;
&lt;li&gt;How color and alignment strengthen the message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By breaking down strong examples, you’ll learn how thoughtful design improves understanding and decision-making.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 7: Lessons in Storytelling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Numbers rarely stick with people—but stories do.&lt;/p&gt;

&lt;p&gt;This chapter introduces storytelling frameworks with clear beginnings, middles, and ends, applying them specifically to data communication. You'll learn how to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Narrative flow&lt;/li&gt;
&lt;li&gt;Repetition for emphasis&lt;/li&gt;
&lt;li&gt;Smooth transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools help transform your analysis into a compelling message that people remember.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Part 8: Pulling It All Together&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The final chapter brings the entire process to life through a complete example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defining the context&lt;/li&gt;
&lt;li&gt;Choosing appropriate visuals&lt;/li&gt;
&lt;li&gt;Removing clutter&lt;/li&gt;
&lt;li&gt;Designing for clarity and impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a coherent and persuasive data story that informs, convinces, and inspires action.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;The Journey Begins&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Storytelling with data is more than presenting numbers—it’s about crafting narratives that influence decisions. This series will guide you through each step of the process, helping you build the skills necessary to communicate insights powerfully and effectively.&lt;/p&gt;

&lt;p&gt;Let’s begin this journey together—one part at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This article is inspired by &lt;em&gt;Storytelling with Data&lt;/em&gt; by Cole Nussbaumer Knaflic, a highly recommended resource for mastering the principles of effective data visualization.&lt;/p&gt;





</description>
      <category>analytics</category>
    </item>
    <item>
      <title>Change Data Capture (Example using Microsoft SQL Server): Log-based CDC vs Trigger-based CDC</title>
      <dc:creator>hammoudi wissem</dc:creator>
      <pubDate>Thu, 20 Nov 2025 15:16:41 +0000</pubDate>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2/change-data-captureexample-using-microsoft-sql-server-log-based-cdc-vs-trigger-based-cdc-15og</link>
      <guid>https://dev.to/hammoudi_wissem_2a55b8cc2/change-data-captureexample-using-microsoft-sql-server-log-based-cdc-vs-trigger-based-cdc-15og</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Definition of Change Data Capture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Change Data Capture (CDC) is a technique used to detect, record, and track changes—such as inserts, updates, and deletes—made to data in real-time or near real-time. It is widely used in databases and information systems to maintain an accurate history of modifications and synchronize data across systems.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; CDC is considered an ETL (Extract, Transform, Load) technique where only changed data is extracted from a source system and loaded into a target system such as a data warehouse or data lake.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Importance of Change Data Capture&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Real-time Data Synchronization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CDC captures and propagates changes instantly, ensuring all connected systems remain synchronized without delay. This is especially important when multiple databases or systems depend on updated data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Event-Driven Architectures&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CDC transforms data changes into events that trigger downstream processes or workflows. This enables responsive systems capable of reacting immediately to user actions, business events, or operational changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Efficient Data Processing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since CDC transmits only modified data, it reduces the overhead of full database scans or batch processing. This speeds up pipelines and ensures downstream systems always receive fresh data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Scalability and Flexibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CDC supports asynchronous, decoupled communication, making it easier for distributed systems to scale horizontally while maintaining reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Enhanced Analytics and Insights&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Real-time data flow enables updated dashboards, instant anomaly detection, and more proactive decision-making.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Change Data Capture (CDC) Principles&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capture:&lt;/strong&gt; Identify inserts, updates, and deletes without impacting the source system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log-based Tracking:&lt;/strong&gt; Leverage transaction logs or replication logs to detect changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Updates:&lt;/strong&gt; Transfer only changed rows instead of full datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time or Near Real-time:&lt;/strong&gt; Propagate changes quickly to ensure data freshness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent Processing:&lt;/strong&gt; Avoid duplicate processing and maintain consistency.&lt;/li&gt;
&lt;/ul&gt;
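
&lt;p&gt;As a sketch of the idempotent-processing principle, a T-SQL &lt;code&gt;MERGE&lt;/code&gt; can apply a batch of captured changes so that replaying the same batch leaves the target in the same state (the table and column names here are illustrative, not from any specific system):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Apply captured changes idempotently: re-running the same
-- batch produces the same target state.
MERGE INTO person_target AS t
USING person_changes AS c
    ON t.person_id = c.person_id
WHEN MATCHED AND c.operation = 'D' THEN
    DELETE
WHEN MATCHED AND c.operation IN ('I', 'U') THEN
    UPDATE SET t.full_name = c.full_name
WHEN NOT MATCHED AND c.operation IN ('I', 'U') THEN
    INSERT (person_id, full_name) VALUES (c.person_id, c.full_name);
&lt;/code&gt;&lt;/pre&gt;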




&lt;h2&gt;
  
  
  &lt;strong&gt;Use Cases of Change Data Capture&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Warehousing&lt;/strong&gt; – Keep analytical systems updated with the latest transactional data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replication&lt;/strong&gt; – Synchronize data across distributed systems for load balancing or disaster recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integration&lt;/strong&gt; – Integrate modern and legacy systems or synchronize between cloud and on-premises systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Analytics&lt;/strong&gt; – Feed streaming platforms or dashboards with immediate updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Synchronization&lt;/strong&gt; – Maintain consistency across microservices, caching layers, or distributed components.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Real-World Example: CDC at Netflix&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Netflix employs a CDC pipeline built on &lt;strong&gt;Apache Kafka&lt;/strong&gt; and &lt;strong&gt;Apache Flink&lt;/strong&gt; to extract changes from internal databases and process them in real-time. This architecture powers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service Usage Monitoring&lt;/strong&gt; – Tracking user interactions to optimize performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendation Engine&lt;/strong&gt; – Providing personalized content suggestions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fraud Detection&lt;/strong&gt; – Identifying abnormal activities instantly.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Change Data Capture Methods&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Below are the most common CDC techniques, including how they work and their pros/cons.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. Log-Based CDC&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Changes are captured directly from the database transaction log, which records every data modification for recovery purposes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2myrkcce9522pw88ad9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2myrkcce9522pw88ad9.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;
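
&lt;p&gt;In SQL Server, log-based CDC is a built-in feature enabled per database and then per table; a capture job reads the transaction log, so the source tables themselves are untouched. A minimal setup looks like this (the database and table names are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Enable CDC at the database level (requires sysadmin)
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for one table (requires db_owner)
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'person',
    @role_name     = NULL;  -- NULL: no gating role

-- Read all changes between two log sequence numbers (LSNs)
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_person');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_person(@from_lsn, @to_lsn, N'all');
&lt;/code&gt;&lt;/pre&gt;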

&lt;h3&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Minimal performance overhead&lt;/li&gt;
&lt;li&gt;No changes required to the source schema&lt;/li&gt;
&lt;li&gt;Highly accurate and reliable&lt;/li&gt;
&lt;li&gt;No triggers or additional queries needed&lt;/li&gt;
&lt;li&gt;Works with most transactional databases&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Some tools require a paid license&lt;/li&gt;
&lt;li&gt;More complex to configure&lt;/li&gt;
&lt;li&gt;Requires primary or unique keys&lt;/li&gt;
&lt;li&gt;Logs must be retained if the target system is down&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. Trigger-Based CDC&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Database triggers are created on the source table to record inserts, updates, or deletes into a &lt;strong&gt;shadow table&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5st5kr374efpqygks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5st5kr374efpqygks.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
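
&lt;p&gt;A minimal T-SQL sketch of this pattern, assuming a &lt;code&gt;person&lt;/code&gt; source table with a &lt;code&gt;person_id&lt;/code&gt; key: the trigger records every change in a shadow (audit) table, using the &lt;code&gt;inserted&lt;/code&gt; and &lt;code&gt;deleted&lt;/code&gt; pseudo-tables to tell the three operations apart.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Shadow table recording each change to dbo.person
CREATE TABLE person_audit (
    audit_id   INT IDENTITY PRIMARY KEY,
    person_id  INT       NOT NULL,
    operation  CHAR(1)   NOT NULL,  -- 'I', 'U', or 'D'
    changed_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- One trigger covering inserts, updates, and deletes
CREATE TRIGGER trg_person_cdc
ON person
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Row in "inserted" only: INSERT; in both pseudo-tables: UPDATE
    INSERT INTO person_audit (person_id, operation)
    SELECT i.person_id,
           CASE WHEN d.person_id IS NULL THEN 'I' ELSE 'U' END
    FROM inserted i
    LEFT JOIN deleted d ON d.person_id = i.person_id;

    -- Row in "deleted" only: DELETE
    INSERT INTO person_audit (person_id, operation)
    SELECT d.person_id, 'D'
    FROM deleted d
    LEFT JOIN inserted i ON i.person_id = d.person_id
    WHERE i.person_id IS NULL;
END;
&lt;/code&gt;&lt;/pre&gt;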

&lt;h3&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Works on any database supporting triggers&lt;/li&gt;
&lt;li&gt;Provides real-time capture&lt;/li&gt;
&lt;li&gt;Shadow table can be customized&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Impacts source database performance&lt;/li&gt;
&lt;li&gt;Adds processing overhead&lt;/li&gt;
&lt;li&gt;More difficult to maintain, especially with schema changes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Timestamp-Based CDC&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires a &lt;code&gt;LastModified&lt;/code&gt; or timestamp column.&lt;/li&gt;
&lt;li&gt;Queries fetch only rows where the timestamp &amp;gt; last extraction time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwmqwocf8kufe72wuo8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwmqwocf8kufe72wuo8g.png" alt=" " width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;
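
&lt;p&gt;A typical extraction query, assuming the application keeps a &lt;code&gt;LastModified&lt;/code&gt; column current on every write (column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Fetch only rows changed since the previous extraction
DECLARE @last_extraction DATETIME2 = '2025-11-01T00:00:00';

SELECT person_id, LastModified
FROM person
WHERE LastModified &amp;gt; @last_extraction
ORDER BY LastModified;
-- An index on LastModified keeps this from becoming a full table scan.
&lt;/code&gt;&lt;/pre&gt;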

&lt;h3&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Easy to implement&lt;/li&gt;
&lt;li&gt;No triggers or log access required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Deletes cannot be detected&lt;/li&gt;
&lt;li&gt;Can miss changes if timestamps are not updated&lt;/li&gt;
&lt;li&gt;Can slow performance due to full table scans&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion on CDC Methods&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Choosing the right CDC method depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time requirements&lt;/li&gt;
&lt;li&gt;Database load tolerance&lt;/li&gt;
&lt;li&gt;Schema complexity&lt;/li&gt;
&lt;li&gt;Infrastructure capabilities&lt;/li&gt;
&lt;li&gt;Operational constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Log-based CDC is generally the best option for performance-critical systems, while trigger-based CDC is useful when log access is restricted.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Log-Based CDC and Trigger-Based CDC Using SQL Server&lt;/strong&gt;
&lt;/h1&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Installation&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SQL Server&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Download and install SQL Server (Developer Edition recommended).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SQL Server Management Studio (SSMS)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Download SSMS and connect to the SQL Server instance.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Types of SQL Server Authentication&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Windows Authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Uses the logged-in Windows user credentials.&lt;br&gt;
Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No password management&lt;/li&gt;
&lt;li&gt;More secure&lt;/li&gt;
&lt;li&gt;Integrated with enterprise policies&lt;/li&gt;
&lt;li&gt;Simplifies permission handling&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;SQL Server Authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Requires a username and password stored in SQL Server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5dy0xlcgqrs8pmht4d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5dy0xlcgqrs8pmht4d5.png" alt=" " width="800" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Choice:&lt;/strong&gt; Windows Authentication was selected for its security and ease of use.&lt;/p&gt;
&lt;/blockquote&gt;
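&lt;p&gt;To confirm which mode an instance is actually running, you can query the server property:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Returns 1 for Windows Authentication only, 0 for mixed mode
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;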


&lt;h1&gt;
  
  
  &lt;strong&gt;Implementation of Trigger-Based CDC&lt;/strong&gt;
&lt;/h1&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Create Source Table&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;person_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;person_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Create History Table&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;hist_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;person_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;person_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;UserName&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;InsertedDateTime&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Create Trigger&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;tr_AllOperationsTrigger&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;person&lt;/span&gt;
&lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="c1"&gt;-- DELETE&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InsertedDateTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'deleted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- INSERT&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InsertedDateTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'inserted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- UPDATE&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;BEGIN&lt;/span&gt;
        &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'updated old value'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'updated new value'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
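&lt;p&gt;With the trigger in place, a quick smoke test (using the table and columns defined above) shows what gets recorded for each operation type:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Exercise the trigger with each operation type
INSERT INTO person (person_name) VALUES ('Alice');
UPDATE person SET person_name = 'Alicia' WHERE person_name = 'Alice';
DELETE FROM person WHERE person_name = 'Alicia';

-- Inspect the captured history rows
SELECT person_id, person_name, status, UserName, InsertedDateTime
FROM person_history
ORDER BY hist_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;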



&lt;h1&gt;
  
  
  &lt;strong&gt;Implementation of Log-Based CDC&lt;/strong&gt;
&lt;/h1&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Enable CDC for the Database&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sp_cdc_enable_db&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enable CDC for a Table&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sp_cdc_enable_table&lt;/span&gt;
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;source_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="s1"&gt;'dbo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="s1"&gt;'persontri'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;role_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disable Cleanup Job (Optional)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Prevents cleanup from removing captured data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sp_cdc_drop_job&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;job_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'cleanup'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
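&lt;p&gt;If dropping the job entirely feels too drastic, an alternative is to keep it and extend the retention window instead (&lt;code&gt;@retention&lt;/code&gt; is expressed in minutes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Keep the cleanup job but retain captured changes for 7 days
EXEC sys.sp_cdc_change_job @job_type = 'cleanup', @retention = 10080;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;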



&lt;h3&gt;
  
  
  &lt;strong&gt;Verify CDC Status&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;is_cdc_enabled&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;databases&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;change_tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjci4gyax3ghesopnb0s9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjci4gyax3ghesopnb0s9.png" alt=" " width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;View CDC Data&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo_person_CT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatu07duvztz8gmxxggzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatu07duvztz8gmxxggzy.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Operation Types in &lt;code&gt;__$operation&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Insert&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Update (old values)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Update (new values)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Metadata operation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
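&lt;p&gt;The codes can also be decoded inline with a &lt;code&gt;CASE&lt;/code&gt; expression instead of a lookup table, for example against the change table created above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT
    CASE [__$operation]
        WHEN 1 THEN 'Delete'
        WHEN 2 THEN 'Insert'
        WHEN 3 THEN 'Update (old values)'
        WHEN 4 THEN 'Update (new values)'
    END AS operation_name,
    person_id,
    person_name
FROM cdc.dbo_person_CT;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;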




&lt;h2&gt;
  
  
  &lt;strong&gt;Adding User and Operation Details&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Add User Column&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo_person_CT&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;userid2&lt;/span&gt; &lt;span class="n"&gt;nvarchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;USER_NAME&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Create Operation Table&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;operation&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operation_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;operation_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="k"&gt;operation&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Delete'&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Insert'&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'update(old)'&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'update(new)'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Convert LSN to Timestamp&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CONVERT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;CONVERT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;DATETIME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn_cdc_map_lsn_to_time&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;__start_lsn&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="mi"&gt;108&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ModificationHourMinute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CONVERT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn_cdc_map_lsn_to_time&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;__start_lsn&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ModificationDate&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="k"&gt;operation&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;
    &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo_person_CT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;operation_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
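&lt;p&gt;As an alternative to querying the change table directly, enabling CDC also generates table-valued functions for reading changes over an LSN range (the function name is derived from the capture instance, assumed here to be &lt;code&gt;dbo_person&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_person');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- 'all' returns one row per change, with new values for updates
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_person(@from_lsn, @to_lsn, N'all');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;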






&lt;h1&gt;
  
  
  &lt;strong&gt;Comparison Between Log-based and Trigger-based CDC&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Testing tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQLQueryStress&lt;/strong&gt; → workload generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Monitor&lt;/strong&gt; → hardware-level performance metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Monitored Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Disk Reads/Writes per second&lt;/li&gt;
&lt;li&gt;Virtual memory page reads/writes&lt;/li&gt;
&lt;li&gt;Transactions per second&lt;/li&gt;
&lt;li&gt;Lock requests per second&lt;/li&gt;
&lt;li&gt;Batch requests per second&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Performance Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SQLQueryStress Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Log-Based CDC&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~0.0037 seconds/iteration&lt;/li&gt;
&lt;li&gt;~5.672 logical reads&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Trigger-Based CDC&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~0.0017 seconds/iteration&lt;/li&gt;
&lt;li&gt;~3.013 logical reads&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;→ Log-based CDC outperforms trigger-based CDC by &lt;strong&gt;55 seconds&lt;/strong&gt; over 100,000 iterations.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Performance Monitor Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Log-Based CDC&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lower transactions/sec (623,995)&lt;/li&gt;
&lt;li&gt;Lower disk writes/sec (316,574)&lt;/li&gt;
&lt;li&gt;Much lower lock requests/sec (4,978)&lt;/li&gt;
&lt;li&gt;Higher page writes/sec&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Trigger-Based CDC&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;More batch requests/sec (833,331)&lt;/li&gt;
&lt;li&gt;Very high lock requests/sec (2,548,997)&lt;/li&gt;
&lt;li&gt;Higher disk writes/sec (370,824)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Interpretation:&lt;/strong&gt;&lt;br&gt;
Log-based CDC uses transaction logs efficiently without introducing overhead, while trigger-based CDC generates extra writes and locks, causing higher resource consumption.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Final Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log-based CDC&lt;/strong&gt; is the best choice for large-scale, high-performance systems.&lt;br&gt;
It minimizes overhead, provides accurate change tracking, and integrates naturally with SQL Server transaction logs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trigger-based CDC&lt;/strong&gt; is easier to implement and works on many databases but introduces overhead and complexity.&lt;br&gt;
It is suitable when log access is restricted or custom auditing logic is required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In performance tests, log-based CDC clearly outperforms trigger-based CDC, resulting in lower lock contention, fewer disk operations, and overall faster execution.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Change Data Capture (Example Using Microsoft SQL Server): Log-Based CDC vs Trigger-Based CDC</title>
      <dc:creator>hammoudi wissem</dc:creator>
      <pubDate>Thu, 20 Nov 2025 15:16:41 +0000</pubDate>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2/change-data-captureexample-using-microsoft-sql-server-log-based-cdc-vs-trigger-based-cdc-1a9p</link>
      <guid>https://dev.to/hammoudi_wissem_2a55b8cc2/change-data-captureexample-using-microsoft-sql-server-log-based-cdc-vs-trigger-based-cdc-1a9p</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Definition of Change Data Capture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Change Data Capture (CDC) is a technique used to detect, record, and track changes—such as inserts, updates, and deletes—made to data in real-time or near real-time. It is widely used in databases and information systems to maintain an accurate history of modifications and synchronize data across systems.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; CDC is considered an ETL (Extract, Transform, Load) technique where only changed data is extracted from a source system and loaded into a target system such as a data warehouse or data lake.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Importance of Change Data Capture&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Real-time Data Synchronization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CDC captures and propagates changes instantly, ensuring all connected systems remain synchronized without delay. This is especially important when multiple databases or systems depend on updated data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Event-Driven Architectures&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CDC transforms data changes into events that trigger downstream processes or workflows. This enables responsive systems capable of reacting immediately to user actions, business events, or operational changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Efficient Data Processing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since CDC transmits only modified data, it reduces the overhead of full database scans or batch processing. This speeds up pipelines and ensures downstream systems always receive fresh data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Scalability and Flexibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CDC supports asynchronous, decoupled communication, making it easier for distributed systems to scale horizontally while maintaining reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Enhanced Analytics and Insights&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Real-time data flow enables updated dashboards, instant anomaly detection, and more proactive decision-making.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Change Data Capture (CDC) Principles&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capture:&lt;/strong&gt; Identify inserts, updates, and deletes without impacting the source system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log-based Tracking:&lt;/strong&gt; Leverage transaction logs or replication logs to detect changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Updates:&lt;/strong&gt; Transfer only changed rows instead of full datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time or Near Real-time:&lt;/strong&gt; Propagate changes quickly to ensure data freshness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent Processing:&lt;/strong&gt; Avoid duplicate processing and maintain consistency.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Use Cases of Change Data Capture&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Warehousing&lt;/strong&gt; – Keep analytical systems updated with the latest transactional data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replication&lt;/strong&gt; – Synchronize data across distributed systems for load balancing or disaster recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integration&lt;/strong&gt; – Integrate modern and legacy systems or synchronize between cloud and on-premises systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Analytics&lt;/strong&gt; – Feed streaming platforms or dashboards with immediate updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Synchronization&lt;/strong&gt; – Maintain consistency across microservices, caching layers, or distributed components.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Real-World Example: CDC at Netflix&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Netflix employs a CDC pipeline built on &lt;strong&gt;Apache Kafka&lt;/strong&gt; and &lt;strong&gt;Apache Flink&lt;/strong&gt; to extract changes from internal databases and process them in real-time. This architecture powers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service Usage Monitoring&lt;/strong&gt; – Tracking user interactions to optimize performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendation Engine&lt;/strong&gt; – Providing personalized content suggestions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fraud Detection&lt;/strong&gt; – Identifying abnormal activities instantly.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Change Data Capture Methods&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Below are the most common CDC techniques, including how they work and their pros/cons.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. Log-Based CDC&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Changes are captured directly from the database transaction log, which records every data modification for recovery purposes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2myrkcce9522pw88ad9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2myrkcce9522pw88ad9.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Minimal performance overhead&lt;/li&gt;
&lt;li&gt;No changes required to the source schema&lt;/li&gt;
&lt;li&gt;Highly accurate and reliable&lt;/li&gt;
&lt;li&gt;No triggers or additional queries needed&lt;/li&gt;
&lt;li&gt;Works with most transactional databases&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Some tools require a paid license&lt;/li&gt;
&lt;li&gt;More complex to configure&lt;/li&gt;
&lt;li&gt;Requires primary or unique keys&lt;/li&gt;
&lt;li&gt;Logs must be retained if the target system is down&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. Trigger-Based CDC&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Database triggers are created on the source table to record inserts, updates, or deletes into a &lt;strong&gt;shadow table&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5st5kr374efpqygks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5st5kr374efpqygks.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Works on any database supporting triggers&lt;/li&gt;
&lt;li&gt;Provides real-time capture&lt;/li&gt;
&lt;li&gt;Shadow table can be customized&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Impacts source database performance&lt;/li&gt;
&lt;li&gt;Adds processing overhead&lt;/li&gt;
&lt;li&gt;More difficult to maintain, especially with schema changes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Timestamp-Based CDC&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires a &lt;code&gt;LastModified&lt;/code&gt; or timestamp column.&lt;/li&gt;
&lt;li&gt;Queries fetch only rows where the timestamp &amp;gt; last extraction time.&lt;/li&gt;
&lt;/ul&gt;
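&lt;p&gt;As a rough sketch, assuming a &lt;code&gt;person&lt;/code&gt; table with a &lt;code&gt;LastModified&lt;/code&gt; column and a watermark value saved by the ETL job (both names are illustrative), an incremental extraction query might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Watermark recorded at the end of the previous extraction (illustrative value)
DECLARE @last_extraction DATETIME = '2025-11-01 00:00:00';

-- Fetch only rows modified since the last run
SELECT person_id, person_name, LastModified
FROM person
WHERE LastModified &amp;gt; @last_extraction;

-- The ETL job then stores MAX(LastModified) as the new watermark
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;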

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwmqwocf8kufe72wuo8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwmqwocf8kufe72wuo8g.png" alt=" " width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Easy to implement&lt;/li&gt;
&lt;li&gt;No triggers or log access required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Deletes cannot be detected&lt;/li&gt;
&lt;li&gt;Can miss changes if timestamps are not updated&lt;/li&gt;
&lt;li&gt;Can slow performance due to full table scans&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion on CDC Methods&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Choosing the right CDC method depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time requirements&lt;/li&gt;
&lt;li&gt;Database load tolerance&lt;/li&gt;
&lt;li&gt;Schema complexity&lt;/li&gt;
&lt;li&gt;Infrastructure capabilities&lt;/li&gt;
&lt;li&gt;Operational constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Log-based CDC is generally the best option for performance-critical systems, while trigger-based CDC is useful when log access is restricted.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Log-Based CDC and Trigger-Based CDC Using SQL Server&lt;/strong&gt;
&lt;/h1&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Installation&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SQL Server&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Download and install SQL Server (Developer Edition recommended).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SQL Server Management Studio (SSMS)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Download SSMS and connect to the SQL Server instance.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Types of SQL Server Authentication&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Windows Authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Uses the logged-in Windows user credentials.&lt;br&gt;
Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No password management&lt;/li&gt;
&lt;li&gt;More secure&lt;/li&gt;
&lt;li&gt;Integrated with enterprise policies&lt;/li&gt;
&lt;li&gt;Simplifies permission handling&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;SQL Server Authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Requires a username and password stored in SQL Server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5dy0xlcgqrs8pmht4d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5dy0xlcgqrs8pmht4d5.png" alt=" " width="800" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Choice:&lt;/strong&gt; Windows Authentication was selected for its security and ease of use.&lt;/p&gt;
&lt;/blockquote&gt;
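&lt;p&gt;For reference, the two modes differ only in how credentials are supplied at connection time. A minimal &lt;code&gt;sqlcmd&lt;/code&gt; sketch (the instance and login names are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Windows Authentication: -E uses the logged-in Windows account
sqlcmd -S localhost -E

# SQL Server Authentication: login and password stored in SQL Server
sqlcmd -S localhost -U my_login -P 'my_password'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;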


&lt;h1&gt;
  
  
  &lt;strong&gt;Implementation of Trigger-Based CDC&lt;/strong&gt;
&lt;/h1&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Create Source Table&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;person_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;person_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Create History Table&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;hist_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;person_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;person_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;UserName&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;InsertedDateTime&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Create Trigger&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;tr_AllOperationsTrigger&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;person&lt;/span&gt;
&lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="c1"&gt;-- DELETE&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InsertedDateTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'deleted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- INSERT&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InsertedDateTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'inserted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- UPDATE&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;BEGIN&lt;/span&gt;
        &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'updated old value'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;person_history&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;person_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'updated new value'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSER_SNAME&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
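&lt;p&gt;To verify the trigger, run a few DML statements against &lt;code&gt;person&lt;/code&gt; and then inspect &lt;code&gt;person_history&lt;/code&gt; (the values here are arbitrary test data):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;INSERT INTO person (person_name) VALUES ('Alice');
UPDATE person SET person_name = 'Alicia' WHERE person_name = 'Alice';
DELETE FROM person WHERE person_name = 'Alicia';

-- Inspect the captured changes in order
SELECT * FROM person_history ORDER BY hist_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;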



&lt;h1&gt;
  
  
  &lt;strong&gt;Implementation of Log-Based CDC&lt;/strong&gt;
&lt;/h1&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Enable CDC for the Database&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sp_cdc_enable_db&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enable CDC for a Table&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sp_cdc_enable_table&lt;/span&gt;
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;source_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="s1"&gt;'dbo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="s1"&gt;'persontri'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;role_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GO&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disable Cleanup Job (Optional)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Prevents cleanup from removing captured data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sp_cdc_drop_job&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;job_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'cleanup'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verify CDC Status&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;is_cdc_enabled&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;databases&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;change_tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjci4gyax3ghesopnb0s9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjci4gyax3ghesopnb0s9.png" alt=" " width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;View CDC Data&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo_person_CT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatu07duvztz8gmxxggzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatu07duvztz8gmxxggzy.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Operation Types in &lt;code&gt;__$operation&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Insert&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Update (old values)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Update (new values)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Metadata operation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
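&lt;p&gt;Besides querying the change table directly, SQL Server generates a table-valued function per capture instance. Assuming the capture instance is named &lt;code&gt;dbo_person&lt;/code&gt;, all changes between two LSNs can be read like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_person');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- 'all' returns one row per change; update rows carry the new values
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_person(@from_lsn, @to_lsn, N'all');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;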




&lt;h2&gt;
  
  
  &lt;strong&gt;Adding User and Operation Details&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Add User Column&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo_person_CT&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;userid2&lt;/span&gt; &lt;span class="n"&gt;nvarchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;USER_NAME&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Create Operation Table&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;operation&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operation_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;operation_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="k"&gt;operation&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Delete'&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Insert'&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'update(old)'&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'update(new)'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Convert LSN to Timestamp&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CONVERT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;CONVERT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;DATETIME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn_cdc_map_lsn_to_time&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;__start_lsn&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="mi"&gt;108&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ModificationHourMinute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CONVERT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn_cdc_map_lsn_to_time&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;__start_lsn&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ModificationDate&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="k"&gt;operation&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;
    &lt;span class="n"&gt;cdc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo_person_CT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;operation_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  &lt;strong&gt;Comparison Between Log-based and Trigger-based CDC&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Testing tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQLQueryStress&lt;/strong&gt; → workload generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Monitor&lt;/strong&gt; → hardware-level performance metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Monitored Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Disk Reads/Writes per second&lt;/li&gt;
&lt;li&gt;Virtual memory page reads/writes&lt;/li&gt;
&lt;li&gt;Transactions per second&lt;/li&gt;
&lt;li&gt;Lock requests per second&lt;/li&gt;
&lt;li&gt;Batch requests per second&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Performance Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SQLQueryStress Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Log-Based CDC&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~0.0037 seconds/iteration&lt;/li&gt;
&lt;li&gt;~5.672 logical reads&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Trigger-Based CDC&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~0.0017 seconds/iteration&lt;/li&gt;
&lt;li&gt;~3.013 logical reads&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;→ Log-based CDC outperforms trigger-based CDC by &lt;strong&gt;55 seconds&lt;/strong&gt; over 100,000 iterations.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Performance Monitor Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Log-Based CDC&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lower transactions/sec (623,995)&lt;/li&gt;
&lt;li&gt;Lower disk writes/sec (316,574)&lt;/li&gt;
&lt;li&gt;Much lower lock requests/sec (4,978)&lt;/li&gt;
&lt;li&gt;Higher page writes/sec&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Trigger-Based CDC&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;More batch requests/sec (833,331)&lt;/li&gt;
&lt;li&gt;Very high lock requests/sec (2,548,997)&lt;/li&gt;
&lt;li&gt;Higher disk writes/sec (370,824)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Interpretation:&lt;/strong&gt;&lt;br&gt;
Log-based CDC uses transaction logs efficiently without introducing overhead, while trigger-based CDC generates extra writes and locks, causing higher resource consumption.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;strong&gt;Final Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log-based CDC&lt;/strong&gt; is the best choice for large-scale, high-performance systems.&lt;br&gt;
It minimizes overhead, provides accurate change tracking, and integrates naturally with SQL Server transaction logs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trigger-based CDC&lt;/strong&gt; is easier to implement and works on many databases but introduces overhead and complexity.&lt;br&gt;
It is suitable when log access is restricted or custom auditing logic is required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In performance tests, log-based CDC clearly outperforms trigger-based CDC, resulting in lower lock contention, fewer disk operations, and overall faster execution.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Web Scraping with Python and BeautifulSoup: A Beginner-Friendly Guide</title>
      <dc:creator>hammoudi wissem</dc:creator>
      <pubDate>Wed, 19 Nov 2025 08:45:00 +0000</pubDate>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2/web-scraping-with-python-and-beautifulsoup-a-beginner-friendly-guide-4e82</link>
      <guid>https://dev.to/hammoudi_wissem_2a55b8cc2/web-scraping-with-python-and-beautifulsoup-a-beginner-friendly-guide-4e82</guid>
      <description>&lt;h2&gt;
  
  
  A General Overview
&lt;/h2&gt;

&lt;p&gt;When people discuss "web scrapers," they commonly refer to a process that involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Retrieving HTML data from a domain name&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parsing that data for target information&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storing the target information&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optionally, moving to another page to repeat the process&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
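&lt;p&gt;Those four steps can be sketched in a few lines of Python (the URL is just a placeholder example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup

# 1. Retrieve HTML data from a domain name
html = urlopen('http://www.pythonscraping.com/pages/page1.html')

# 2. Parse that data for target information
bs = BeautifulSoup(html.read(), 'html.parser')
title = bs.h1.get_text()

# 3. Store the target information (here, just a list in memory)
results = [title]

# 4. Optionally, move to another page and repeat the process
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;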

&lt;h2&gt;
  
  
  Introduction to BeautifulSoup
&lt;/h2&gt;

&lt;p&gt;Because the BeautifulSoup library is not a default Python library, it must be installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install beautifulsoup4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most commonly used object in the BeautifulSoup library is, appropriately, the BeautifulSoup object. Let's take a look at it in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.h1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|&amp;lt;h1&amp;gt;Beautiful Soup Documentation&amp;lt;a class="headerlink" href="#module-bs4" title="Link to this heading"&amp;gt;¶&amp;lt;/a&amp;gt;&amp;lt;/h1&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that this returns only the first instance of the h1 tag found on the page. By convention, only one h1 tag should be used on a single page, but conventions are often broken on the web, so you should be aware that this will retrieve the first instance of the tag only, and not necessarily the one that you're looking for.&lt;/p&gt;

&lt;p&gt;Another popular parser is lxml. This can be installed through pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install lxml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;lxml has some advantages over html.parser in that it is generally better at parsing "messy" or malformed HTML code. It is forgiving and fixes problems like unclosed tags, tags that are improperly nested, and missing head or body tags. It is also somewhat faster than html.parser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
bs = BeautifulSoup(html, 'lxml')
print(bs.h1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connecting Reliably and Handling Exceptions
&lt;/h2&gt;

&lt;p&gt;The web is messy. Data is poorly formatted, websites go down, and closing tags go missing. One of the most frustrating experiences in web scraping is to go to sleep with a scraper running, dreaming of all the data you'll have in your database the next day - only to find that the scraper hit an error on some unexpected data format and stopped execution shortly after you stopped looking at the screen.&lt;br&gt;
Two main things can go wrong when you request a page:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The page is not found on the server (or there was an error in retrieving it)&lt;/li&gt;
&lt;li&gt;The server is not found.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, if the page is retrieved successfully from the server, there is still the issue of the content on the page not quite being what you expected. Every time you access a tag in a BeautifulSoup object, it's smart to add a check to make sure the tag actually exists.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        return None
    try:
        bs = BeautifulSoup(html.read(), 'html.parser')
        title = bs.body.h1
    except AttributeError as e:
        return None
    return title
title = getTitle('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
if title is None:
    print('Title could not be found')
else:
    print(title)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  find() and find_all() with BeautifulSoup
&lt;/h2&gt;

&lt;p&gt;BeautifulSoup's find() and find_all() are the two functions you will likely use the most. With them, you can easily filter HTML pages to find lists of desired tags, or a single tag, based on their various attributes. The two functions are extremely similar, as evidenced by their definitions in the BeautifulSoup documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;find_all(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These examples demonstrate how find_all can be used to locate elements based on different criteria such as tag names, attributes, and text content within an HTML or XML document parsed with BeautifulSoup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bs.find_all(['h1','h2','h3','h4','h5','h6'])
bs.find_all('span', {'class':{'n'}})
nameList = bs.find_all(text='soup')
title = bs.find_all(id='searchlabel')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The find_all method returns all occurrences of tags that match the given criteria, while the find method returns only the first occurrence.&lt;/p&gt;
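A minimal sketch of that difference, using a small made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# Small made-up snippet for illustration
html = '<div><h2>First</h2><h2>Second</h2></div>'
bs = BeautifulSoup(html, 'html.parser')

print(bs.find('h2').get_text())   # only the first match: First
print(len(bs.find_all('h2')))     # every match, as a list: 2
```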

&lt;h2&gt;
  
  
  Navigating Trees
&lt;/h2&gt;

&lt;p&gt;The find_all function is responsible for finding tags based on their name and attributes. But what if you need to find a tag based on its location in a document?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dealing with children and other descendants&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if you need to retrieve the contents of a table by accessing only its immediate children, you can use the .children attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
bs = BeautifulSoup(html, 'html.parser')
for child in bs.find('table').children:
    print(child)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dealing with siblings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Additionally, to extract data from a table beyond its immediate children, BeautifulSoup offers the next_siblings attribute, which lets us retrieve the siblings of a table-row (tr) tag:&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
bs = BeautifulSoup(html, 'html.parser')
for child in bs.find('table').tr.next_siblings:
    print(child)
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Dealing with parents&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you're scraping web pages, you often focus more on finding children or siblings of tags rather than their parents. Typically, when you begin crawling an HTML page, you start by examining the top-level tags and then work your way down to locate specific pieces of data. However, there are times when you might encounter unusual scenarios that necessitate using BeautifulSoup's parent-finding functions like .parent and .parents.&lt;/p&gt;

&lt;p&gt;We can also get the table rows by finding a td tag, then finding its parent and printing all of the parent's siblings:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
bs = BeautifulSoup(html, 'html.parser')

element = bs.find('td')  
if element:
    parent_element = element.parent
    for sibling in parent_element.next_siblings:
        print(sibling)
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;strong&gt;Regular Expressions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One classic example of regular expressions can be found in the practice of identifying email addresses. Although the exact rules governing email addresses vary slightly from mail server to mail server, we can create a few general rules:&lt;br&gt;
[A-Za-z0-9._+]+@[A-Za-z]+\.(com|org|edu|net)&lt;/p&gt;
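As a quick sanity check, this pattern can be tried with Python's re module. Note that the dot before the top-level domain must be escaped, and a non-capturing group keeps findall returning whole addresses (the sample text below is made up):

```python
import re

# Email pattern from above, with the dot escaped and a
# non-capturing group so findall returns full matches
pattern = r'[A-Za-z0-9._+]+@[A-Za-z]+\.(?:com|org|edu|net)'
text = 'Contact alice.smith@example.com or bob_2@test.org for details.'

print(re.findall(pattern, text))
# ['alice.smith@example.com', 'bob_2@test.org']
```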

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r2j146lyemvmaaznmx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r2j146lyemvmaaznmx0.png" alt=" " width="800" height="923"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When scraping web pages using BeautifulSoup, the synergy with regular expressions becomes invaluable, especially for tasks like extracting specific elements such as event images from complex HTML structures. For instance, consider a scenario where you need to retrieve URLs of event images from a page.&lt;/p&gt;

&lt;p&gt;To address this, leveraging regular expressions allows you to pinpoint images by specific attributes like their src attribute, which contains the file path. This approach ensures you accurately target the event images regardless of their position or surrounding elements on the page.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen('http://enetcomje.com/gallerie.php')
bs = BeautifulSoup(html, 'html.parser')

# Adjust the regular expression pattern to match 'event' images
images = bs.find_all('img', {'src': re.compile(r'event\d+\.png')})

for image in images:
    print(image['src'])
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;strong&gt;Accessing Attributes&lt;/strong&gt;&lt;br&gt;
When web scraping, you often need to access the attributes of HTML tags rather than their content. This is especially useful for anchor (a) tags, where the URL is in the href attribute, or image (img) tags, where the image source is in the src attribute. BeautifulSoup simplifies this process with its attrs attribute, which returns a Python dictionary containing all attributes of a tag.&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(bs.img.attrs['src'])

&lt;/code&gt;&lt;/pre&gt;
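For instance, on a small made-up snippet, attrs exposes every attribute as a dictionary, and dict-style .get() avoids a KeyError when an attribute might be missing:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration
html = '<img src="logo.png" alt="Logo"><a href="/about">About</a>'
bs = BeautifulSoup(html, 'html.parser')

print(bs.img.attrs)               # {'src': 'logo.png', 'alt': 'Logo'}
print(bs.a.attrs['href'])         # /about
print(bs.img.attrs.get('title'))  # None instead of a KeyError
```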

&lt;h2&gt;
  
  
  Writing Web Crawlers
&lt;/h2&gt;

&lt;p&gt;So far, you've seen single static pages. In this part, you'll start looking at real-world problems, with scrapers traversing multiple pages and even multiple sites.&lt;/p&gt;

&lt;p&gt;Web crawlers are called such because they crawl across the web. At their core is an element of recursion. They must retrieve page contents for a URL, examine that page for another URL, and retrieve that page, ad infinitum.&lt;/p&gt;

&lt;p&gt;Here is code that retrieves an arbitrary Wikipedia page and produces a list of the links on that page:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://en.wikipedia.org/wiki/Kevin_Bacon')
bs = BeautifulSoup(html, 'html.parser')
for link in bs.find_all('a'):
    if 'href' in link.attrs:
        print(link.attrs['href'])
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;If you look at the list of links produced, you'll notice that all the articles you'd expect are there: "Apollo 13," "Philadelphia," "Primetime Emmy Award," and so on. However, there are some things that you don't want as well: //wikimediafoundation.org/wiki/Privacy_policy //en.wikipedia.org/wiki/Wikipedia:Contact_us&lt;/p&gt;

&lt;p&gt;If you examine the links that point to article pages (as opposed to other internal pages), you'll see that they all have three things in common:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They reside within the div with the id set to bodyContent.&lt;/li&gt;
&lt;li&gt;The URLs do not contain colons.&lt;/li&gt;
&lt;li&gt;The URLs begin with /wiki/.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we can improve the previous code to retrieve only the desired article links by using the regular expression ^(/wiki/)((?!:).)*$&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen('http://en.wikipedia.org/wiki/Kevin_Bacon')
bs = BeautifulSoup(html, 'html.parser')
for link in bs.find('div', {'id':'bodyContent'}).find_all(
    'a', href=re.compile('^(/wiki/)((?!:).)*$')):
    if 'href' in link.attrs:
        print(link.attrs['href'])
&lt;/code&gt;&lt;/pre&gt;



&lt;h2&gt;
  
  
  Crawling an Entire Site
&lt;/h2&gt;

&lt;p&gt;Web scrapers that traverse an entire site are useful for several purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generating a Site Map: Using a crawler, you can scan an entire site, gather all internal links, and organize the pages into their actual folder structure. This could help you discover hidden sections and accurately count the number of pages.&lt;/li&gt;
&lt;li&gt;Gathering Data: If you need to collect articles (such as stories, blog posts, news articles, etc.) to create a prototype for a specialized search platform, you would need data from only a few sites but want a broad collection. Therefore, you will build crawlers that traverse each site and collect data specifically from article pages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The general approach to an exhaustive site crawl is to start with a top-level page (such as the home page), and search for a list of all internal links on that page. Every one of those links is then crawled, and additional lists of links are found on each one of them, triggering another round of crawling.&lt;/p&gt;

&lt;p&gt;Here is code that collects every internal link available on Wikipedia:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def getLinks(pageUrl):
    global pages
    html = urlopen('http://en.wikipedia.org{}'.format(pageUrl))
    bs = BeautifulSoup(html, 'html.parser')
    for link in bs.find_all('a', href=re.compile('^(/wiki/)')):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                #We have encountered a new page
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)
getLinks('')
&lt;/code&gt;&lt;/pre&gt;
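One caveat with this recursive approach: Python's default recursion limit (typically 1,000 frames) means the crawler will raise RecursionError long before it exhausts a site the size of Wikipedia. The limit can be inspected and raised with the sys module, though rewriting the crawl around an explicit queue of URLs is usually the safer design:

```python
import sys

# Python caps recursion depth (typically at 1000 frames), so the
# recursive getLinks() above will eventually raise RecursionError
print(sys.getrecursionlimit())

# The cap can be raised, at the risk of crashing the interpreter
# if the call stack actually grows that deep
sys.setrecursionlimit(5000)
print(sys.getrecursionlimit())  # 5000
```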



&lt;h2&gt;
  
  
  Collecting Data Across an Entire Site
&lt;/h2&gt;

&lt;p&gt;Web crawlers would be fairly boring if all they did was hop from one page to the other. To make them useful, you need to be able to do something on the page while you're there. Let's look at how to build a scraper that collects the title, the first paragraph of content, and the link to edit the page (if available).&lt;/p&gt;

&lt;p&gt;As always, the first step to determine how best to do this is to look at a few pages from the site and determine a pattern. By looking at a handful of Wikipedia pages (both articles and nonarticle pages such as the privacy policy page), the following things should be clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All titles (on all pages, regardless of their status as an article page, an edit history page, or any other page) have titles under h1 → span tags, and these are the only h1 tags on the page.&lt;/li&gt;
&lt;li&gt;As mentioned before, all body text lives under the div#bodyContent tag. However, if you want to get more specific and access just the first paragraph of text, you might be better off using div#mw-content-text → p (selecting the first paragraph tag only). This is true for all content pages except file pages (for example, &lt;a href="https://en.wikipedia.org/wiki/File:Orbit_of_274301_Wikipedia.svg" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/File:Orbit_of_274301_Wikipedia.svg&lt;/a&gt;), which do not have sections of content text.&lt;/li&gt;
&lt;li&gt;Edit links occur only on article pages. If they occur, they will be found in the li#ca-edit tag, under li#ca-edit → span → a.
&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def getLinks(pageUrl):
    global pages
    html = urlopen('http://en.wikipedia.org{}'.format(pageUrl))
    bs = BeautifulSoup(html, 'html.parser')
    try:
        print(bs.h1.get_text())
        print(bs.find(id ='mw-content-text').find_all('p')[0])
        print(bs.find(id='ca-edit').find('span')
         .find('a').attrs['href'])
    except AttributeError:
        print('This page is missing something! Continuing.')

    for link in bs.find_all('a', href=re.compile('^(/wiki/)')):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
            #We have encountered a new page
                newPage = link.attrs['href']
                print('-'*20)
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)
getLinks('')
&lt;/code&gt;&lt;/pre&gt;



&lt;h2&gt;
  
  
  Note: Handling Redirects
&lt;/h2&gt;

&lt;p&gt;Redirects allow a web server to point one domain name or URL to a piece of content at a different location. There are two types of redirects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server-side redirects, where the URL is changed before the page is loaded&lt;/li&gt;
&lt;li&gt;Client-side redirects, sometimes seen with a "You will be redirected in 10 seconds" type of message, where the page loads before redirecting to the new one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With server-side redirects, you usually don't have to worry. If you're using the urllib library with Python 3.x, it handles redirects automatically! If you're using the requests library, make sure the allow_redirects flag is set to True (it already is by default for GET requests):&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;r = requests.get('http://github.com', allow_redirects=True)
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;Just be aware that, occasionally, the URL of the page you're crawling might not be exactly the URL that you entered the page on.&lt;/p&gt;
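Both points can be demonstrated locally with a throwaway HTTP server, so no real website is contacted (the /old and /new paths are invented for the example): urllib follows the 301 automatically, and geturl() reveals that the final URL differs from the one requested:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Throwaway local server: /old answers with a 301 redirect to /new,
# so the example runs without contacting any real site
class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'final page')

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = urlopen('http://127.0.0.1:{}/old'.format(server.server_port))
body = resp.read().decode()
print(resp.geturl())  # ends with /new: urllib followed the redirect
print(body)           # final page
server.shutdown()
```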

&lt;h2&gt;
  
  
  Crawling Across the Internet
&lt;/h2&gt;

&lt;p&gt;Before you start writing a crawler that follows all outbound links, you should ask yourself a few questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What data am I trying to gather? Can this be accomplished by scraping just a few predefined websites (almost always the easier option), or does my crawler need to be able to discover new websites I might not know about?&lt;/li&gt;
&lt;li&gt;When my crawler reaches a particular website, will it immediately follow the next outbound link to a new website, or will it stick around for a while and drill down into the current website?&lt;/li&gt;
&lt;li&gt;Are there any conditions under which I would not want to scrape a particular site? Am I interested in non-English content?&lt;/li&gt;
&lt;li&gt;How am I protecting myself against legal action if my web crawler catches the attention of a webmaster on one of the sites it runs across?
&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import random
from urllib.request import urlopen
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup
import re

pages = set()

# Seed random generator with current time
random.seed(time.time())

def getInternalLinks(bs, includeUrl):
    """ Retrieves internal links from a BeautifulSoup object """
    includeUrl = '{}://{}'.format(urlparse(includeUrl).scheme,
                                  urlparse(includeUrl).netloc)
    internalLinks = []
    for link in bs.find_all('a', href=re.compile('^(/|.*'+includeUrl+')')):
        if 'href' in link.attrs:
            if link.attrs['href'] is not None:
                url = urljoin(includeUrl, link.attrs['href'])
                if url not in internalLinks:
                    internalLinks.append(url)
    return internalLinks

def getExternalLinks(bs, excludeUrl):
    """ Retrieves external links from a BeautifulSoup object """
    externalLinks = []
    for link in bs.find_all('a', href=re.compile('^(http|www)((?!'+excludeUrl+').)*$')):
        if 'href' in link.attrs:
            if link.attrs['href'] is not None:
                url = link.attrs['href']
                if url not in externalLinks:
                    externalLinks.append(url)
    return externalLinks

def getRandomExternalLink(startingPage):
    """ Retrieves a random external link from the starting page """
    html = urlopen(startingPage)
    bs = BeautifulSoup(html, 'html.parser')
    externalLinks = getExternalLinks(bs, urlparse(startingPage).netloc)
    if len(externalLinks) == 0:
        print('No external links found, looking for internal links')
        domain = '{}://{}'.format(urlparse(startingPage).scheme, urlparse(startingPage).netloc)
        internalLinks = getInternalLinks(bs, domain)
        if internalLinks:
            return getRandomExternalLink(random.choice(internalLinks))
        else:
            return None
    else:
        return random.choice(externalLinks)

def followExternalOnly(startingSite):
    """ Follows external links recursively from the starting site """
    if startingSite in pages:
        return
    pages.add(startingSite)
    externalLink = getRandomExternalLink(startingSite)
    if externalLink:
        print('Random external link is: {}'.format(externalLink))
        followExternalOnly(externalLink)
    else:
        print('No external links found on {}'.format(startingSite))

# Start following external links from a specified starting site
followExternalOnly('http://oreilly.com')
&lt;/code&gt;&lt;/pre&gt;



&lt;h2&gt;
  
  
  Web Crawling Models
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Planning and Defining Objects
&lt;/h2&gt;

&lt;p&gt;One common trap of web scraping is defining the data that you want to collect based entirely on what's available in front of your eyes.&lt;/p&gt;

&lt;p&gt;If you want to collect product data, you may first look at a clothing store and decide that each product you scrape needs to have the following fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product name&lt;/li&gt;
&lt;li&gt;Price&lt;/li&gt;
&lt;li&gt;Description&lt;/li&gt;
&lt;li&gt;Sizes&lt;/li&gt;
&lt;li&gt;Colors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at another website, you find that it has SKUs (stock keeping units, used to track and order items) listed on the page. You definitely want to collect that data as well, even if it doesn't appear on the first site! You add this field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Item SKU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clearly, this is an unsustainable approach. Simply adding attributes to your product type every time you see a new piece of information on a website will lead to far too many fields to keep track of. Not only that, but every time you scrape a new website, you'll be forced to perform a detailed analysis of the fields the website has and the fields you've accumulated so far, and potentially add new fields (modifying your Python object type and your database structure). This will result in a messy and difficult-to-read dataset that may lead to problems using it.&lt;/p&gt;

&lt;p&gt;One of the best things you can do when deciding which data to collect is often to ignore the websites altogether. You don't start a project that's designed to be large and scalable by looking at a single website and saying, "What exists?" but by saying "What do I need?" and then finding ways to seek the information that you need from there.&lt;/p&gt;

&lt;p&gt;It's important to take a step back and perform a checklist for each item you consider and ask yourself the following questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Will this information help with the project goals? Will it be a roadblock if I don't have it, or is it just "nice to have" but won't ultimately impact anything?&lt;/li&gt;
&lt;li&gt;If it might help in the future, but I'm unsure, how difficult will it be to go back and collect the data at a later time?&lt;/li&gt;
&lt;li&gt;Is this data redundant to data I've already collected?&lt;/li&gt;
&lt;li&gt;Does it make logical sense to store the data within this particular object? (Storing a description in a product doesn't make sense if that description changes from site to site for the same product.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you do decide that you need to collect the data, it's important to ask a few more questions to then decide how to store and handle it in code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this data sparse or dense? Will it be relevant and populated in every listing, or just a handful out of the set?&lt;/li&gt;
&lt;li&gt;How large is the data?&lt;/li&gt;
&lt;li&gt;Especially in the case of large data, will I need to regularly retrieve it every time I run my analysis, or only on occasion?&lt;/li&gt;
&lt;li&gt;How variable is this type of data? Will I regularly need to add new attributes, modify types, or is it set in stone (shoe sizes)?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dealing with Different Website Layouts
&lt;/h2&gt;

&lt;p&gt;The most obvious approach is to write a separate web crawler or page parser for each website. Each might take in a URL, string, or BeautifulSoup object, and return a Python object for the thing that was scraped. &lt;/p&gt;

&lt;p&gt;The following is an example of a Content class (representing a piece of content on a website, such as a news article) and two scraper functions that take in a Beauti fulSoup object and return an instance of Content:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup
class Content:
    def __init__(self, url, title, body):
        self.url = url
        self.title = title
        self.body = body
def getPage(url):
    req = requests.get(url)
    return BeautifulSoup(req.text, 'html.parser')
def scrapetheguardian(url):
    bs = getPage(url)
    title_tag = bs.find("h1", {"class": "dcr-u0152o"})
    title = title_tag.text.strip() if title_tag else "Title not found."

    article_body_div = bs.find("div", {"class": "article-body-commercial-selector article-body-viewer-selector dcr-fp1ya"})
    if article_body_div:
        paragraphs = article_body_div.find_all("p")
        body = '\n'.join([p.text.strip() for p in paragraphs])
    else:
        body = "Body content not found."
    return Content(url, title, body)
def scrapeBrookings(url):
    bs = getPage(url)
    title_tag = bs.find("h1")
    title = title_tag.text.strip() if title_tag else "Title not found."
    body_tag = bs.find("section", {"id": "content"})
    if body_tag:
        paragraphs = body_tag.find_all("p")
        body = '\n'.join([p.text.strip() for p in paragraphs])
    else:
        body = "Body content not found."
    return Content(url, title, body)
# Example usage
url_brookings = 'https://www.brookings.edu/articles/delivering-inclusive-urban-access-3-uncomfortable-truths/'
content_brookings = scrapeBrookings(url_brookings)
print('Title: {}'.format(content_brookings.title))
print('URL: {}\n'.format(content_brookings.url))
print(content_brookings.body)
url_theguardian = 'https://www.theguardian.com/us-news/article/2024/jul/01/trump-hush-money-supreme-court-immunity'
content_theguardian = scrapetheguardian(url_theguardian)
print('\nTitle: {}'.format(content_theguardian.title))
print('URL: {}\n'.format(content_theguardian.url))
print(content_theguardian.body)
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;a href="https://github.com/wissemhammoudi/Extraction-resume-from-resumeviking" rel="noopener noreferrer"&gt;Example: Extracting Resumes Using Selenium and BeautifulSoup&lt;/a&gt;&lt;/p&gt;



</description>
      <category>webscraping</category>
    </item>
    <item>
      <title>The Data Analytics Lifecycle</title>
      <dc:creator>hammoudi wissem</dc:creator>
      <pubDate>Sun, 16 Nov 2025 20:31:10 +0000</pubDate>
      <link>https://dev.to/hammoudi_wissem_2a55b8cc2/the-data-analytics-lifecycle-14ph</link>
      <guid>https://dev.to/hammoudi_wissem_2a55b8cc2/the-data-analytics-lifecycle-14ph</guid>
      <description>&lt;p&gt;The data analytics lifecycle is a series of steps to transform raw data into valuable and easily consumable data products. These can range from well-managed datasets to dashboards, reports, APIs, or even web applications. In other words, it describes how data is created, collected, processed, used, and analyzed to achieve a specific product or business goal.&lt;/p&gt;

&lt;p&gt;The increasing complexity in organizational dynamics directly impacts how data is handled. Numerous people must use the same data but with different goals. While a top executive might need to know just a few top-level key performance indicators to track business performance, a middle manager might need a more granular report to support daily decisions.&lt;/p&gt;

&lt;p&gt;This highlights the need for a governed and standardized approach to creating and maintaining data products based on the same data foundation. Given the many decisions an organization must make regarding its data governance, technologies, and management processes, following a structured approach is fundamental to documenting and continuously updating an organization's data strategy.&lt;/p&gt;

&lt;p&gt;The data analytics lifecycle is, therefore, an essential framework for understanding and mapping the phases and processes involved in creating and maintaining an analytics solution. It is an essential concept in data science and analytics and provides a structured approach to managing the various tasks and activities required to create an effective analytics solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ny176fh6wsdxhjnemmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ny176fh6wsdxhjnemmi.png" alt=" " width="800" height="819"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data analytics lifecycle typically includes the following stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Problem definition :&lt;/strong&gt;&lt;br&gt;
 The first phase of the analytics cycle is about understanding the problem that needs to be solved. This includes identifying the business objectives, the available data, and the resources needed to solve the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Data modeling :&lt;/strong&gt;&lt;br&gt;
 After the business requirements are identified, and an assessment of data sources is completed, you can begin modeling your data according to the modeling technique that best meets your needs. You can choose a diamond strategy, a star schema, a Data Vault, or even a fully denormalized technique.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Data ingestion and transformation :&lt;/strong&gt;&lt;br&gt;
 The next phase is to ingest and prepare the data that's coming from the source systems to match the models created. Depending on the overall information architecture, you can opt for a schema-on-write strategy, where you put more effort into transforming the raw data directly into your models, or a schema-on read strategy, where you ingest and store the data with minimal transformations and move heavy transformations to the downstream layers of your data platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Data storage and structuring :&lt;/strong&gt;&lt;br&gt;
 Once the data pipelines are designed and potentially implemented, you need to decide on the file formats to use - simple Apache Parquet or more advanced formats like Delta Lake or Apache Iceberg - as well as the partitioning strategies and storage components to use - a cloud-based object store like Amazon Simple Storage Service (S3) or a more data warehouse–like platform like Redshift, BigQuery, or Snowflake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Data visualization and analysis:&lt;/strong&gt;&lt;br&gt;
 Once the data is available, the next step is to explore it, visualize it, or create dashboards that directly support decision making or enable business process monitoring. This phase is very business oriented and should be created in close coordination with business stakeholders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Data quality monitoring, testing, and documentation :&lt;/strong&gt;&lt;br&gt;
Although illustrated as the final phase of the analytics lifecycle, data quality should be an end-to-end concern and ensured by design across the whole flow. It involves implementing all quality controls to ensure that stakeholders can trust your exposed data models, documenting all transformations and semantic meanings, and ensuring proper testing along the pipelines as the data continues to flow.&lt;/p&gt;

&lt;p&gt;The analytics lifecycle is a key concept that enables organizations to approach data engineering, science, and analytics processes in a structured and consistent manner. By following a structured process, organizations can ensure they are solving the right problem, using the right data, and building data products that are accurate and reliable, ultimately leading to better decision making and better business results.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
