<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marvin Zhang</title>
    <description>The latest articles on DEV Community by Marvin Zhang (@tikazyq).</description>
    <link>https://dev.to/tikazyq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F566723%2F136515fd-dd85-4e6e-b36a-a81123260f87.jpeg</url>
      <title>DEV Community: Marvin Zhang</title>
      <link>https://dev.to/tikazyq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tikazyq"/>
    <language>en</language>
    <item>
      <title>Introducing LeanSpec: A Lightweight SDD Framework Built from First Principles</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Thu, 27 Nov 2025 14:34:15 +0000</pubDate>
      <link>https://dev.to/tikazyq/introducing-leanspec-a-lightweight-sdd-framework-built-from-first-principles-18a3</link>
      <guid>https://dev.to/tikazyq/introducing-leanspec-a-lightweight-sdd-framework-built-from-first-principles-18a3</guid>
      <description>&lt;p&gt;Earlier this year, I was amazed by agentic AI coding with Claude Sonnet 3.7. The term "vibe coding" hadn't been coined yet, but that's exactly what I was doing—letting AI generate code while I steered the conversation. It felt magical. Until it didn't.&lt;/p&gt;

&lt;p&gt;After a few weeks, I noticed patterns: code redundancy creeping in, intentions drifting from my original vision, and increasing rework as the AI forgot context between sessions. The honeymoon was over. I needed structure, but not the heavyweight processes that would kill the speed I'd gained.&lt;/p&gt;

&lt;p&gt;That search led me through several existing tools—Kiro, Spec Kit, OpenSpec—and eventually to building &lt;a href="https://github.com/codervisor/lean-spec" rel="noopener noreferrer"&gt;LeanSpec&lt;/a&gt;, a lightweight Spec-Driven Development framework that hits v0.2.7 today with 10 releases in under three weeks. This post shares why I built it, what makes it different, and how you can try it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Vibe Coding's Hidden Costs
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Vibe Coding Trap&lt;/strong&gt;&lt;br&gt;
AI coding assistants are incredibly productive—until they're not. Without structured context, AI generates plausible but inconsistent code, leading to technical debt that compounds session after session.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you've used AI coding tools extensively, you've likely encountered these patterns:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code redundancy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI doesn't remember previous implementations&lt;/td&gt;
&lt;td&gt;Duplicate logic scattered across files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Intention drift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Context lost between sessions&lt;/td&gt;
&lt;td&gt;Features that don't quite match your vision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Increased rework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No persistent source of truth&lt;/td&gt;
&lt;td&gt;Circular conversations explaining the same thing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inconsistent architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No structural guidance&lt;/td&gt;
&lt;td&gt;Components that don't fit together cleanly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The industry's answer has been &lt;strong&gt;Spec-Driven Development (SDD)&lt;/strong&gt;—writing specifications before code to give AI (and humans) persistent context. But when I explored the existing tools, I found a gap.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Related Reading&lt;/strong&gt;&lt;br&gt;
New to SDD? Start with my foundational article &lt;a href="https://dev.to/blog/spec-driven-development"&gt;Spec-Driven Development: A Systematic Approach to Complex Features&lt;/a&gt; for methodology basics, or dive into the &lt;a href="https://dev.to/blog/sdd-tools-practices"&gt;2025 SDD Tools Landscape&lt;/a&gt; for a comprehensive comparison of industrial tools. Want to try the methodology without installing anything? The &lt;a href="https://www.lean-spec.dev/docs/tutorials/sdd-without-toolkit" rel="noopener noreferrer"&gt;Practice SDD Without the Toolkit&lt;/a&gt; tutorial has you covered.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why I Built LeanSpec
&lt;/h2&gt;

&lt;p&gt;My journey through the SDD landscape revealed three categories of tools, each with trade-offs that didn't fit my needs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vendor lock-in&lt;/strong&gt;: Kiro (Amazon's SDD IDE) offers tight integration but requires abandoning my existing workflow. I like my tools—switching IDEs wasn't an option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cognitive overhead&lt;/strong&gt;: Spec Kit provides comprehensive structure, but its elaborate format creates significant cognitive load. Even with AI-assisted writing, parsing and maintaining those specs demands mental bandwidth that feels excessive for solo and small-team work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing project management&lt;/strong&gt;: OpenSpec came closest to my ideal—lightweight and flexible—but lacked the project management capabilities I needed to track dozens of specs across multiple projects.&lt;/p&gt;

&lt;p&gt;I wanted something different: &lt;strong&gt;a methodology, not just a tool&lt;/strong&gt;. Something like Agile—a set of principles anyone can adopt, with lightweight tooling that gets out of the way.&lt;/p&gt;

&lt;p&gt;So I built LeanSpec. And then I used LeanSpec to build LeanSpec.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Principles: The Foundation
&lt;/h2&gt;

&lt;p&gt;LeanSpec isn't just tooling—it's built on five first principles that guide every design decision:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt31tks8rv248er6oa8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt31tks8rv248er6oa8t.png" width="784" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Economy&lt;/strong&gt;: Specs must fit in working memory—both human and AI. Target under 300 lines. If you can't read it in 10 minutes, it's too long.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal-to-Noise Maximization&lt;/strong&gt;: Every line must inform decisions. No boilerplate, no filler, no ceremony for ceremony's sake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent Over Implementation&lt;/strong&gt;: Capture &lt;em&gt;why&lt;/em&gt;, not just &lt;em&gt;how&lt;/em&gt;. Implementation details change; intentions persist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bridge the Gap&lt;/strong&gt;: Specs serve both humans and AI. If either can't understand it, the spec has failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Progressive Disclosure&lt;/strong&gt;: Start simple, add structure only when pain is felt. No upfront complexity.&lt;/p&gt;

&lt;p&gt;These principles aren't just documentation—LeanSpec's &lt;code&gt;validate&lt;/code&gt; command enforces them automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Web UI for Visual Management
&lt;/h3&gt;

&lt;p&gt;The feature I'm most excited about: &lt;code&gt;lean-spec ui&lt;/code&gt; launches a full web interface for managing your specs visually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Launch the web UI&lt;/span&gt;
npx lean-spec ui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The UI provides Kanban-style board views, spec detail pages with Mermaid diagram rendering, and dependency visualization—all without leaving your browser. Perfect for planning sessions or reviewing project status.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9s1w76j0ich5iekg2lu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9s1w76j0ich5iekg2lu9.png" alt="LeanSpec Kanban Board View" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qpfpt8aq76nlfjy7lts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qpfpt8aq76nlfjy7lts.png" alt="LeanSpec Spec Detail View" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  First Principles Validation
&lt;/h3&gt;

&lt;p&gt;LeanSpec doesn't just store specs—it validates them against first principles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your specs against first principles&lt;/span&gt;
lean-spec validate

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# specs/045-user-auth/README.md&lt;/span&gt;
&lt;span class="c"&gt;#   ⚠️  warning  Spec exceeds 300 lines (342)  context-economy&lt;/span&gt;
&lt;span class="c"&gt;#   ⚠️  warning  Missing overview section      structure&lt;/span&gt;
&lt;span class="c"&gt;# &lt;/span&gt;
&lt;span class="c"&gt;# ✖ 2 warnings in 1 spec&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps specs lean and meaningful, preventing the specification bloat that plagues heavyweight SDD tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Search &amp;amp; Project Management
&lt;/h3&gt;

&lt;p&gt;Finding relevant specs shouldn't require remembering exact names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Semantic search across all specs&lt;/span&gt;
lean-spec search &lt;span class="s2"&gt;"authentication flow"&lt;/span&gt;

&lt;span class="c"&gt;# Advanced queries&lt;/span&gt;
lean-spec search &lt;span class="s2"&gt;"status:in-progress tag:api"&lt;/span&gt;
lean-spec search &lt;span class="s2"&gt;"created:&amp;gt;2025-11-01"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Kanban board gives you instant project visibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lean-spec board

&lt;span class="c"&gt;# 📋 LeanSpec Board&lt;/span&gt;
&lt;span class="c"&gt;# ─────────────────────────────────────&lt;/span&gt;
&lt;span class="c"&gt;# 📅 Planned (12)     🚧 In Progress (3)     ✅ Complete (47)&lt;/span&gt;
&lt;span class="c"&gt;# ─────────────────────────────────────&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  MCP Server for AI Integration
&lt;/h3&gt;

&lt;p&gt;LeanSpec includes an MCP (Model Context Protocol) server, enabling AI assistants to directly interact with your specs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"leanspec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@leanspec/mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Claude Code, Cursor, GitHub Copilot, and other MCP-compatible tools. AI agents can search specs, read context, and update status—all programmatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example Projects for Quick Start
&lt;/h3&gt;

&lt;p&gt;New to SDD? Start with a working example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scaffold a complete tutorial project&lt;/span&gt;
npx lean-spec init &lt;span class="nt"&gt;--example&lt;/span&gt; dark-theme
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three examples available: &lt;code&gt;dark-theme&lt;/code&gt;, &lt;code&gt;dashboard-widgets&lt;/code&gt;, and &lt;code&gt;api-refactor&lt;/code&gt;—each demonstrating different SDD patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Journey: Building LeanSpec with LeanSpec
&lt;/h2&gt;

&lt;p&gt;The most meta aspect of this project: after the initial release, &lt;strong&gt;LeanSpec has been developed entirely using LeanSpec&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Milestone&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First line of code&lt;/td&gt;
&lt;td&gt;Oct 23, 2025&lt;/td&gt;
&lt;td&gt;Started with basic spec CRUD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.1.0 (First release)&lt;/td&gt;
&lt;td&gt;Nov 2, 2025&lt;/td&gt;
&lt;td&gt;10 days from scratch to release&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.2.0 (Production-ready)&lt;/td&gt;
&lt;td&gt;Nov 10, 2025&lt;/td&gt;
&lt;td&gt;First principles validation, comprehensive CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.2.7 (Current)&lt;/td&gt;
&lt;td&gt;Nov 26, 2025&lt;/td&gt;
&lt;td&gt;10 releases in 24 days&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Over 120 specs have been created within LeanSpec itself—covering features, architecture decisions, reflections, and even marketing strategy. The feedback loop is tight: identify friction → write spec → implement → validate with real use.&lt;/p&gt;

&lt;p&gt;I've also applied LeanSpec to other projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/crawlab-team/crawlab" rel="noopener noreferrer"&gt;Crawlab&lt;/a&gt; (12k+ stars) — web crawler management platform&lt;/li&gt;
&lt;li&gt;This blog (marvinzhang.dev)&lt;/li&gt;
&lt;li&gt;Upcoming projects under the &lt;a href="https://github.com/codervisor" rel="noopener noreferrer"&gt;codervisor&lt;/a&gt; org&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern holds across all of them: specs provide context that survives between sessions, AI stays aligned with my intentions, and I spend less time re-explaining.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes LeanSpec Different
&lt;/h2&gt;

&lt;p&gt;If you've read my &lt;a href="https://dev.to/blog/sdd-tools-practices"&gt;SDD Tools analysis&lt;/a&gt;, you know I evaluated six major tools in this space. Here's where LeanSpec fits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Heavyweight Tools&lt;/th&gt;
&lt;th&gt;LeanSpec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning curve&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Days to weeks&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spec overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extensive upfront work&lt;/td&gt;
&lt;td&gt;Write as you go&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Often &amp;gt;2,000 per spec&lt;/td&gt;
&lt;td&gt;&amp;lt;300 lines target&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rigid structure&lt;/td&gt;
&lt;td&gt;Adapt to your workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vendor lock-in&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Often required&lt;/td&gt;
&lt;td&gt;Works anywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Philosophy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool-first&lt;/td&gt;
&lt;td&gt;Methodology-first&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LeanSpec is "lean" in multiple senses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Methodology&lt;/strong&gt;: Like Agile, it's principles you can adopt regardless of tooling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive load&lt;/strong&gt;: Low overhead, quick to learn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token economy&lt;/strong&gt;: Specs stay small, fitting in AI context windows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Adapt to your workflow, not the other way around&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Try LeanSpec in under 5 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install globally&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lean-spec

&lt;span class="c"&gt;# Initialize in your project&lt;/span&gt;
lean-spec init

&lt;span class="c"&gt;# Create your first spec&lt;/span&gt;
lean-spec create user-authentication

&lt;span class="c"&gt;# Launch the web UI&lt;/span&gt;
lean-spec ui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or try an example project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx lean-spec init &lt;span class="nt"&gt;--example&lt;/span&gt; dark-theme
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Already using Spec Kit or OpenSpec?&lt;/strong&gt; Check out the &lt;a href="https://www.lean-spec.dev/docs/guide/migration" rel="noopener noreferrer"&gt;migration guide&lt;/a&gt;—the transition is straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;LeanSpec is actively evolving. Current development focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VS Code extension for inline spec management (&lt;a href="https://web.lean-spec.dev/specs/17" rel="noopener noreferrer"&gt;Spec 17&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;AI chatbot UI for interactive spec assistance (&lt;a href="https://web.lean-spec.dev/specs/94" rel="noopener noreferrer"&gt;Spec 94&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Comprehensive internationalization support (&lt;a href="https://web.lean-spec.dev/specs/91" rel="noopener noreferrer"&gt;Spec 91&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GitHub multi-project ingration (&lt;a href="https://web.lean-spec.dev/specs/98" rel="noopener noreferrer"&gt;Spec 98&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built LeanSpec to solve my own problems—code quality degradation from vibe coding, context loss between AI sessions, the cognitive overhead of heavyweight SDD tools. If you face similar challenges, I hope it helps you too.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📦 &lt;a href="https://github.com/codervisor/lean-spec" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📚 &lt;a href="https://www.lean-spec.dev/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📊 &lt;a href="https://www.npmjs.com/package/lean-spec" rel="noopener noreferrer"&gt;npm package&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions, feedback, or feature requests? Open an issue or start a &lt;a href="https://github.com/codervisor/lean-spec/discussions" rel="noopener noreferrer"&gt;discussion&lt;/a&gt;. I read everything.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>github</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Web Crawler in Action: How to use Webspot to implement automatic recognition and data extraction of list web pages</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Sun, 09 Apr 2023 05:17:16 +0000</pubDate>
      <link>https://dev.to/tikazyq/web-crawler-in-action-how-to-use-webspot-to-implement-automatic-recognition-and-data-extraction-of-list-web-pages-8kb</link>
      <guid>https://dev.to/tikazyq/web-crawler-in-action-how-to-use-webspot-to-implement-automatic-recognition-and-data-extraction-of-list-web-pages-8kb</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Using web crawling programs to extract list web pages is a one of those common web data extraction tasks. For engineers to write web crawlers, how to efficiently code and generate extraction rules is quite necessary, otherwise most of the time can be wasted on writing CSS selectors and XPath data extraction rules of web crawling programs. In light of this issue, this article will introduce an example of using open source tool &lt;a href="https://github.com/crawlab-team/webspot" rel="noopener noreferrer"&gt;Webspot&lt;/a&gt; to automatically recognize and extract data of list web pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Webspot
&lt;/h2&gt;

&lt;p&gt;Webspot is an open source project aimed at automating web page data extraction. Currently, it supports recognition and crawling rules extraction of list pages and pagination. In addition, it provides a web UI interface for users to visually view the identified results, and allows developers to use APIs to obtain recognition results.&lt;/p&gt;

&lt;p&gt;Installation of Webspot is quite easy, you can refer to &lt;a href="https://github.com/crawlab-team/webspot" rel="noopener noreferrer"&gt;the official documentation&lt;/a&gt; for the installation tutorial with Docker and Docker Compose. Execute the commands below to install and start Webspot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# clone git repo&lt;/span&gt;
git clone https://github.com/crawlab-team/webspot

&lt;span class="c"&gt;# start docker containers&lt;/span&gt;
docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for it to start up, which might take half a minute to initialize the application.&lt;/p&gt;

&lt;p&gt;After the initialization, we can visit the web page &lt;a href="http://localhost:9999" rel="noopener noreferrer"&gt;http://localhost:9999&lt;/a&gt;, and should be able to see the user interface below, which means it has started successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvgx8n408g8wq9imgqiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvgx8n408g8wq9imgqiw.png" alt="Webspot 初始化界面" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we can create a new request for page recognition. Click "New Request" and enter &lt;a href="https://quotes.toscrape.com" rel="noopener noreferrer"&gt;https://quotes.toscrape.com&lt;/a&gt;. Then click "Submit". Wait for a while and we should be able to see the page below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6blkjb1weyuvmjfl9b5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6blkjb1weyuvmjfl9b5i.png" alt="Webspot 列表页识别" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use API to auto extract data
&lt;/h2&gt;

&lt;p&gt;We are now using Python to call the API of Webspot to auto extract data.&lt;/p&gt;

&lt;p&gt;The whole process is as below.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Call the Webspot API to obtain the extraction rules for list pages and pagination. The extraction rules are CSS Selectors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define the retrieval target based on the extraction rules of the list page, that is, each item and its corresponding field on the list page.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Determine the target for crawling the next page based on the extraction rules of pagination, and let the crawler automatically crawl the data of the next page.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Call API
&lt;/h4&gt;

&lt;p&gt;Calling the API is very simple, just pass the URL to be recognized into the body. The code is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;

&lt;span class="c1"&gt;# API endpoint
&lt;/span&gt;&lt;span class="n"&gt;api_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9999/api&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# url to extract
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# call API to recognize list page and pagination elements
&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_endpoint&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/requests&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the code above in Python Console can obtain the recognition result data similar to the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no_async&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pagination&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pagination&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Next&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scores&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;next&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attribute&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pagination&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;node_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;li.next &amp;gt; a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;css&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}}}],&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;
             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;plain_list&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
                                                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attribute&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                         &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Field_text_1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                         &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;node_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                         &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div.quote &amp;gt; span.text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                         &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                         &lt;span class="p"&gt;...],&lt;/span&gt;
                                        &lt;span class="p"&gt;...}],&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;...}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recognition results include CSS Selectors for list pages and pagination, as well as the corresponding fields for each item on the list page.&lt;/p&gt;

&lt;h4&gt;
  
  
  List page and fields extraction logics
&lt;/h4&gt;

&lt;p&gt;Next, we will write the logic for extracting list pages and fields.&lt;/p&gt;

&lt;p&gt;First, we can obtain list page selectors and fields through &lt;code&gt;results&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# list result
&lt;/span&gt;&lt;span class="n"&gt;list_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;plain_list&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# list items selector
&lt;/span&gt;&lt;span class="n"&gt;list_items_selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;list_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;full_items&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_items_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# fields
&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;list_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we can write the logic for parsing list page items.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# data
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# items
&lt;/span&gt;    &lt;span class="n"&gt;items_elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_items_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items_elements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# row data
&lt;/span&gt;        &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="c1"&gt;# iterate fields
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# field name
&lt;/span&gt;            &lt;span class="n"&gt;field_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="c1"&gt;# field element
&lt;/span&gt;            &lt;span class="n"&gt;field_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

            &lt;span class="c1"&gt;# skip if field element not found
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;field_element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

            &lt;span class="c1"&gt;# add field value to row
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attribute&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# add row to data
&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the function &lt;code&gt;get_data&lt;/code&gt; in the above code, we pass a &lt;code&gt;BeautifulSoup&lt;/code&gt; instance as parameter, and use &lt;code&gt;list_items_selectors&lt;/code&gt; and &lt;code&gt;fields&lt;/code&gt; to parse and obtain the list data which is then returned the function caller.&lt;/p&gt;

&lt;h4&gt;
  
  
  Request list page and pagination logics
&lt;/h4&gt;

&lt;p&gt;Then, we need to write the logics of requesting list pages and pagination, that is, to request a given URL and parse its pagination, then call the function above &lt;code&gt;get_data&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We need to first obtain the pagination's CSS Selector.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pagination next selector
&lt;/span&gt;&lt;span class="n"&gt;next_selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pagination&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;next&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we write crawler logic, which continuously crawls data from website list pages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# all data to crawl
&lt;/span&gt;    &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;requesting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# request url
&lt;/span&gt;        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# beautiful soup of html
&lt;/span&gt;        &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# add parsed data
&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;

        &lt;span class="c1"&gt;# pagination next element
&lt;/span&gt;        &lt;span class="n"&gt;next_el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# end if pagination next element not found
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;next_el&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

                &lt;span class="c1"&gt;# url of next page
&lt;/span&gt;        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urljoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next_el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we have completed all the coding parts.&lt;/p&gt;

&lt;h4&gt;
  
  
  Putting them all together
&lt;/h4&gt;

&lt;p&gt;The following is the complete code for the entire crawl logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urljoin&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# data
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# items
&lt;/span&gt;    &lt;span class="n"&gt;items_elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_items_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items_elements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# row data
&lt;/span&gt;        &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="c1"&gt;# iterate fields
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# field name
&lt;/span&gt;            &lt;span class="n"&gt;field_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="c1"&gt;# field element
&lt;/span&gt;            &lt;span class="n"&gt;field_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

            &lt;span class="c1"&gt;# skip if field element not found
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;field_element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

            &lt;span class="c1"&gt;# add field value to row
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attribute&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# add row to data
&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# all data to crawl
&lt;/span&gt;    &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;requesting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# request url
&lt;/span&gt;        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# beautiful soup of html
&lt;/span&gt;        &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# add parsed data
&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;

        &lt;span class="c1"&gt;# pagination next element
&lt;/span&gt;        &lt;span class="n"&gt;next_el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# end if pagination next element not found
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;next_el&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# url of next page
&lt;/span&gt;        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urljoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next_el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_data&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# API endpoint
&lt;/span&gt;    &lt;span class="n"&gt;api_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9999/api&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# url to extract
&lt;/span&gt;    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# call API to recognize list page and pagination elements
&lt;/span&gt;    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_endpoint&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/requests&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# list result
&lt;/span&gt;    &lt;span class="n"&gt;list_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;plain_list&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# list items selector
&lt;/span&gt;    &lt;span class="n"&gt;list_items_selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;list_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;full_items&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_items_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# fields
&lt;/span&gt;    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;list_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# pagination next selector
&lt;/span&gt;    &lt;span class="n"&gt;next_selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pagination&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;next&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# start crawling
&lt;/span&gt;    &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# print crawled results
&lt;/span&gt;    &lt;span class="nf"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_data&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the code and we can obtain the following result data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[{'Field_link_url_6': '/author/Albert-Einstein',
  'Field_link_url_8': '/tag/change/page/1/',
  'Field_text_1': '“The world as we have created it is a process of our '
                  'thinking. It cannot be changed without changing our '
                  'thinking.”',
  'Field_text_2': '“The world as we have created it is a process of our '
                  'thinking. It cannot be changed without changing our '
                  'thinking.”',
  'Field_text_3': '\n'
                  '            Tags:\n'
                  '            \n'
                  'change\n'
                  'deep-thoughts\n'
                  'thinking\n'
                  'world\n',
  'Field_text_4': 'Albert Einstein',
  'Field_text_5': '(about)',
  'Field_text_7': 'change'},
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we have achieved the crawling task of automatically extracting lists using Webspot. In this way, there is no need to explicitly define CSS Selectors or XPaths. Just call the Webspot API and we can obtain the list page data.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>crawler</category>
      <category>programming</category>
    </item>
    <item>
      <title>Talking Algorithm: Exploration of Intelligent Web Crawlers</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Sat, 25 Mar 2023 06:57:36 +0000</pubDate>
      <link>https://dev.to/tikazyq/talking-algorithm-exploration-of-intelligent-web-crawlers-2kk2</link>
      <guid>https://dev.to/tikazyq/talking-algorithm-exploration-of-intelligent-web-crawlers-2kk2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"If I had asked people what they wanted, they would have said faster horses" -- Henry Ford&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today is the era of artificial intelligence. Whether it is ChatGPT or the various intelligent applications that follow, many people see the upcoming sci-fi world that was almost unimaginable a few years ago. However, in the field of reptiles, artificial intelligence does not seem to be involved too much. It is true that crawlers, as an "ancient" technology, have created many technical industries such as search engines, news aggregation, and data analysis in the past 20 years, but we have not seen obvious technological breakthroughs yet: crawler engineers still mainly rely on technologies such as XPath and reverse engineering to automatically obtain web data. However, with the development of artificial intelligence and machine learning, crawler technology can theoretically achieve "automatic driving". This article will introduce the current status and possible future development direction of the so-called &lt;strong&gt;intelligent crawler&lt;/strong&gt; (intelligent, automated data extraction crawler technology) from multiple perspectives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Web Crawling Technology
&lt;/h2&gt;

&lt;p&gt;A web crawler is an automated program used to obtain data from the Internet or other computer networks. They usually use automated scraping techniques to automatically visit the website and collect, parse and store information on the website. This information can be structured or unstructured data.&lt;/p&gt;

&lt;p&gt;Crawler technology in the traditional sense mainly includes the following modules or systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Network request&lt;/strong&gt; : initiate an HTTP request to a website or web page to obtain data such as HTML;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web page parsing&lt;/strong&gt; : parse HTML to form a structured tree structure, and obtain target data through XPath or CSS Selector;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data storage&lt;/strong&gt; : store the parsed structured data, which can be in the form of a database or a file;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL management&lt;/strong&gt; : manage the URL list to be crawled and the URL list that has been crawled, such as URL resolution and request for paging or list pages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjm7t0lmu82eeydjvc9t0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjm7t0lmu82eeydjvc9t0.png" alt="web crawling system" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above is the basic crawler technology modules. For a large crawler system, it is also necessary to have the necessary modules for the production environment such as task scheduling, error management, and log management. The author's crawler management platform &lt;a href="https://www.crawlab.cn" rel="noopener noreferrer"&gt;Crawlab &lt;/a&gt;is a crawler management platform for enterprise-level production environments. In addition, for some anti-crawling measures, such as verification code or IP blocking, additional modules are usually required, such as verification code identification, IP proxy, etc.&lt;/p&gt;

&lt;p&gt;However, at present, the development of crawler programs is mainly focused on webpage parsing, which consumes a lot of manpower. Although HTML needs to be parsed into the webpage data, the layout, format, style, and content of different websites are different, so each website and webpage needs separate parsing logic, which greatly increases the cost of manual coding. Although some general-purpose crawlers such as search engine crawlers do not need to write too much parsing logic, such crawlers usually cannot focus on data extraction of specific topics. Therefore, in order to reduce the cost of manual writing, it is best to automatically extract web page data without writing or writing a small amount of parsing logic, which is the main goal of intelligent crawlers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Known Implementations
&lt;/h2&gt;

&lt;p&gt;It is not easy to implement intelligent web page extraction, but there are already some attempts to develop intelligent crawlers. Among them, &lt;a href="https://github.com/GeneralNewsExtractor/GeneralNewsExtractor" rel="noopener noreferrer"&gt;GNE (GeneralNewsExtractor)&lt;/a&gt; developed by Kingname is an open source implementation of webpage text extraction, based on &lt;a href="https://kns.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFQ&amp;amp;dbname=CJFDLAST2019&amp;amp;filename=GWDZ201908029&amp;amp;v=MDY4MTRxVHJXTTFGckNVUkxPZmJ1Wm5GQ2poVXJyQklqclBkTEc0SDlqTXA0OUhiWVI4ZVgxTHV4WVM3RGgxVDM=" rel="noopener noreferrer"&gt;text and punctuation density extraction algorithms&lt;/a&gt;. &lt;a href="https://github.com/Gerapy/GerapyAutoExtractor" rel="noopener noreferrer"&gt;GerapyAutoExtractor &lt;/a&gt;developed by Cui Qingcai implemented webpage list page recognition based on &lt;a href="https://cuiqingcai.com/9531.html" rel="noopener noreferrer"&gt;list cluster and SVM algorithm&lt;/a&gt;. &lt;a href="https://www.octoparse.com/" rel="noopener noreferrer"&gt;Octoparse&lt;/a&gt;, a commercial client software, has developed an automatic list recognition module. &lt;a href="https://www.diffbot.com/" rel="noopener noreferrer"&gt;Diffbot&lt;/a&gt; is an API-based intelligent web page recognition platform, with a very high recognition accuracy rate, claiming to be 99%. Known smart crawler implementations are currently based mainly on the HTML structure and content of web pages, such as GNE and GerapyAutoExtractor. For commercial software such as Octopus and Diffbot, we cannot know the specific implementation method.&lt;/p&gt;

&lt;h2&gt;
  
  
  Explore List Page Recognition
&lt;/h2&gt;

&lt;p&gt;Now the accuracy of text recognition is very high, and there are many technical implementations and applications. Here we mainly focus on the identification of list pages, which is the web page parsing work of many crawlers.&lt;/p&gt;

&lt;p&gt;We can infer from experience how to automatically identify desired content. Humans are visual animals. When we see a web page with a list of articles, we will immediately recognize the list of articles without any surprises, as shown in the figure below. But how exactly do we recognize it? In fact, we naturally group the article list items of the same category into one category. So, we'll quickly realize that this is actually a list page. Of course, why are these list items similar? We can see that the child elements in these lists are also similar, so it's natural to tie them together. The individual sub-elements add up to a single list item, which our brains automatically group them together. This is the process of listing page recognition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcodao.crawlab.cn%2Fimages%2F2023-03-25-062942.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcodao.crawlab.cn%2Fimages%2F2023-03-25-062942.png" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on such an analysis, it is actually very easy to think of the clustering algorithm in machine learning. All we need to do is to extract the characteristics of each node on the webpage, and then use the clustering algorithm to filter out the nodes of the same category. Of course, the feature selection here needs to be considered. Instead of simply looking at a single node of HTML, we need to associate it with other nodes to extract features, so that we can get some nodes of different categories. Then, we can filter out the desired list page according to the overall information of the node cluster.&lt;/p&gt;

&lt;p&gt;Of course, it is not an easy task to actually implement such an algorithm with code. It is necessary to model and vectorize each node of HTML, and build a tree-like graph based on them. This is very tedious thing. Fortunately, the author has used sklearn, networkx and other libraries to implement a basic list page recognition system &lt;a href="https://github.com/crawlab-team/webspot" rel="noopener noreferrer"&gt;Webspot&lt;/a&gt;, which can automatically recognize list elements on a list page, and can visually display the recognition results, as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ps85p188va1lpphr2el.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ps85p188va1lpphr2el.png" alt="Webspot" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For most listings, Webspot's ability to identify is good. Although it is not as accurate as Diffbot, it can still be accurately identified for pages that are not very complicated.&lt;/p&gt;

&lt;p&gt;So, why invent a new wheel when there is already a list page identification solution like Diffbot? One of the most important reasons is that commercial and high-accuracy software such as Diffbot cannot directly provide reusable extraction rules such as XPath and CSS Selector. Extraction rules for automatic identification are all we need. Then, by integrating into open source crawlers such as Scrapy and Colly, the cost of data capture can be greatly reduced. This is also a feature that Webspot can currently bring to users. It is not only able to identify list page elements and corresponding fields, but also provide extraction rules, as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9pymjdse5o7n3lgio22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9pymjdse5o7n3lgio22.png" alt="Webspot Fields" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With such an extraction rule, data can be automatically extracted from similar web pages only by automatic identification once.&lt;/p&gt;

&lt;p&gt;Currently, Webspot is still in the early stages of development, and there should be more new features and algorithm development and optimization in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Development
&lt;/h2&gt;

&lt;p&gt;Intelligent crawlers are equivalent to autopilot on web pages, allowing crawlers to obtain the desired data or information as required without too many manual operations. This is an ideal technology for many data demanders and crawler engineers. However, intelligent crawlers are not yet mature at present, and the existing implementations and technologies are relatively simple. In the future, technologies such as deep learning and reinforcement learning may be used to improve the recognition ability of intelligent crawlers. In addition, the combination of graph theory and artificial intelligence, along with visual technology, may allow intelligent crawlers to achieve breakthroughs in accuracy. The author will continue to explore on intelligent crawlers through the Webspot project to solve the cost problem of data extraction. Those who are interested in the development of intelligent crawlers please feel free to contact me on GitHub "tikazyq".&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Talking Architecture: What skills should architects have apart from drawing architecture diagrams?</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Mon, 14 Nov 2022 10:37:25 +0000</pubDate>
      <link>https://dev.to/tikazyq/talking-architecture-what-skills-should-architects-have-apart-from-drawing-architecture-diagrams-109j</link>
      <guid>https://dev.to/tikazyq/talking-architecture-what-skills-should-architects-have-apart-from-drawing-architecture-diagrams-109j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Architecture is about the important stuff... whatever it is."&lt;/em&gt; -- Ralph Johnson&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Architect&lt;/strong&gt; is a position with powerfulness and respect. When you hear that someone is an architect of a company, will you feel a sense of awe? Architects are usually believed to be relevant to system design, technical strength, leadership and influence. It is precisely for this reason that many of the positions of architects in the enterprise are held by experienced and skilled senior software engineers. However, the &lt;strong&gt;definition&lt;/strong&gt; of an architect in the software industry is &lt;strong&gt;not quite clear&lt;/strong&gt;: cloud service providers such as Amazon and Alibaba Cloud have their own teams of architects, but most of them provide after-sales service to customers under the name "architect"; Some architects are nothing more than using his rich experience and excellent strength to solve technical problems, which is equivalent to what a senior software engineer can do. These are very different from the omniscient architect who designs the architecture diagram. &lt;/p&gt;

&lt;p&gt;The concepts related to architects in this article mainly come from a book &lt;em&gt;&lt;a href="https://www.oreilly.com/library/view/fundamentals-of-software/9781492043447/" rel="noopener noreferrer"&gt;Fundamentals of Software Architecture&lt;/a&gt;&lt;/em&gt; (authored by Mark Richards, Neal Ford) that I have been reading recently. This article will briefly introduce about what a &lt;em&gt;pragmatic&lt;/em&gt; architect &lt;strong&gt;should do&lt;/strong&gt; and &lt;strong&gt;required skills&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vobzlu3vug32g0jtixu.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vobzlu3vug32g0jtixu.jpeg" alt="Fundamentals of Software Architecture" width="250" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chief Engineer
&lt;/h2&gt;

&lt;p&gt;First of all, an architect should be a &lt;strong&gt;chief engineer&lt;/strong&gt; of the entire software project, who is responsible for the overall design, implementation and quality of the software project. Therefore, for software architects, they need to have excellent programming skills, good understanding of the software project development process, and a certain breadth and depth in various technical areas. Not only that, because the architect is the chief person in charge of the technology side, he (or she) usually needs to think it through, from a perspective of the system as a whole, about how the various modules interact, whether the division of functional services is reasonable, where the bottleneck of the whole system will be, and so on. These are all in the technical area.&lt;/p&gt;

&lt;p&gt;When a software engineer has been engaged in many projects, he or she should have some deep understanding of software engineering and architecture, and with some system learning, he or she could ultimately become a qualified architect. Therefore, with technical consideration only, &lt;strong&gt;a software architect is equivalent to a senior software engineer&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is Architecture Diagram Necessary?
&lt;/h2&gt;

&lt;p&gt;You may have seen more or less some architecture diagrams: hierarchical architecture diagram, physical network topology diagram, flow diagram, interaction logic diagram, etc. But to answer the question that whether the architecture diagram is necessary, we need to understand the reason behind it. The main purpose of the architecture diagram is to help internal engineers or external technicians &lt;strong&gt;quickly understand&lt;/strong&gt; the &lt;strong&gt;system module information&lt;/strong&gt; contained in it. Therefore, those seemingly cool, professional and pleasing architecture diagrams, if lacking in meaningful content, would not be helpful for understanding. Instead, it is the architecture diagram which is &lt;strong&gt;simple&lt;/strong&gt;, &lt;strong&gt;clear&lt;/strong&gt; and &lt;strong&gt;easy to understand&lt;/strong&gt;, even if looking ugly, that will be helpful for software developers. Therefore, the architecture diagram is necessary, but on condition that it reveals the system module information with simpleness, conciseness, and clearness.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviwhi0k67xbu8vyqoqzj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviwhi0k67xbu8vyqoqzj.jpeg" alt="WTF Architecture" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Not Only Tech
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, the architect, as the chief engineer of software engineering, will take technical responsibilities. But in reality, few architects can work on a project smoothly if they only focus on the technical bits. A successful architect, or an influential architect, needs at least good leadership or team management skills.&lt;/p&gt;

&lt;p&gt;Half of the eight skill requirements for architects in &lt;em&gt;Software Architecture Foundation&lt;/em&gt; are not directly related to technology. Let's take a look at the technical and non-technical requirements mentioned in this book.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical Requirements

&lt;ul&gt;
&lt;li&gt;Make architecture decisions&lt;/li&gt;
&lt;li&gt;Continually analyze the architecture&lt;/li&gt;
&lt;li&gt;Keep current with latest trends&lt;/li&gt;
&lt;li&gt;Diverse exposure and experience&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Non-Technical Requirements

&lt;ul&gt;
&lt;li&gt;Ensure compliance with decisions&lt;/li&gt;
&lt;li&gt;Have business domain knowledge&lt;/li&gt;
&lt;li&gt;Process interpersonal skills&lt;/li&gt;
&lt;li&gt;Understand and navigate politics&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;You may be surprised at the non-technical requirements for architects, such as why architects need to consider politics? As a young junior developer, I did not understand at first, but with the continuous accumulation of project experience, I found that many architectural decisions were not only from the technical rationality of the architecture itself, but also related to corporate politics. The politics we mentioned here is not bureaucracy in the traditional sense, but more referring to the cooperation or competition between various departments. This is particularly important for large companies. I will not elaborate in detail here, but may write an article later for this topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnlqpis4bn22v11fms2c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnlqpis4bn22v11fms2c.jpg" alt="Not Only Architect" width="800" height="685"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Continuous Improvement
&lt;/h2&gt;

&lt;p&gt;Unfortunately, many architects promoted from software engineers continue to stick with their technical skills, ignoring non-technical things. Therefore, it is emphasized in this article that the responsibilities and requirements of architects should not only be restricted to technology and architecture themselves, but also more importantly focused on non-technical skills and experience accumulation. For example, how do we balance compliance, resources and development efficiency? What business value does your software project provide and what impact it will have? How to explain professional terms to non-technical personnel (especially the boss)? When external departments refuse to cooperate, how do we navigate to ensure the project success? Such issues require non-technical skills and experience significantly, and architects who would like to pursue their career goal should take them as necessity. When you step out of your comfort zone, your are probably more like to enjoy continuous learning and growing.&lt;/p&gt;

</description>
      <category>architecture</category>
    </item>
    <item>
      <title>Go Project Source Code Analysis: Schedule Job Library "cron"</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Sat, 12 Nov 2022 03:22:06 +0000</pubDate>
      <link>https://dev.to/tikazyq/go-project-source-code-analysis-schedule-job-library-cron-104f</link>
      <guid>https://dev.to/tikazyq/go-project-source-code-analysis-schedule-job-library-cron-104f</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;There are many excellent open-source projects on GitHub, where the code is transparent and available to everyone. As software developer, we can learn a lot from them including software engineering, unit testing, coding style standardization, etc. We can even find issues by looking into their code, and submit pull requests to contribute to tech communities. Today we are going to dig into the source code of a popular Golang open-source project on GitHub, &lt;a href="https://github.com/robfig/cron" rel="noopener noreferrer"&gt;robfig/cron&lt;/a&gt;, which is small and with clear annotations, and is very suitable for new developers to learn how to read and analyze source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment Preparation
&lt;/h2&gt;

&lt;p&gt;First, we are going to fork the project into our personal repo. We can do so by clicking the &lt;code&gt;Fork&lt;/code&gt; button and enter the project name. After it's forked, we can either download to our local, or click &lt;code&gt;Create codespace on master&lt;/code&gt; to create a &lt;strong&gt;Codespace&lt;/strong&gt; on the GitHub repo's home page. Codespace is a Azure-based GitHub service that allows developers to develop remotely, and it is now available to individuals. Let's give it a try!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbes1t5okfuv3qcsbbr54.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbes1t5okfuv3qcsbbr54.jpeg" alt="Create Codespace" width="413" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After clicking, a new page will be opened in the browser, where there is a web interface of VS Code, displaying the directory, code and terminal of the project. See the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vv9bumq8nbl4gsf4udm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vv9bumq8nbl4gsf4udm.png" alt="Codespace" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As our objective is to analyze source code, we will use it to display code for reading, instead of executing it.&lt;/p&gt;

&lt;p&gt;Now, we can start reading and analyzing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Entry File
&lt;/h2&gt;

&lt;p&gt;A good way of analyzing source code is find the &lt;strong&gt;entry file&lt;/strong&gt;, which is like a book's introduction, where the overall project structure will show up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcar22vs4b7pgwp35tjx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcar22vs4b7pgwp35tjx.png" alt="Code Structure" width="282" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can find in &lt;code&gt;README.md&lt;/code&gt; file that the usage of this library is something like &lt;code&gt;cron.New(cron.WithSeconds())&lt;/code&gt;, or the method &lt;code&gt;cron.New&lt;/code&gt;. Therefore, we reckon that this method is in the file &lt;code&gt;cron.go&lt;/code&gt;. Let's open it and take a look.&lt;/p&gt;

&lt;p&gt;A quick go-through can allow us to locate the &lt;code&gt;New&lt;/code&gt; method at Line 113, as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas3lg0i33ic2necf7sqo.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas3lg0i33ic2necf7sqo.jpeg" alt="Method New" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we take a close look, it is basically a pointer to the instance of the &lt;code&gt;Cron&lt;/code&gt; struct. The parameters &lt;code&gt;opts ...Option&lt;/code&gt; is called Functional Option. The actual implementation is to build an instance pointer &lt;code&gt;c&lt;/code&gt; of the struct &lt;code&gt;Cron&lt;/code&gt;, applied by functional options, and return it. &lt;/p&gt;

&lt;p&gt;With above analysis, we can conclude that the main logic is inside the &lt;code&gt;Cron&lt;/code&gt; struct.&lt;/p&gt;

&lt;p&gt;Regardless to say, we can be sure that the entry file is &lt;code&gt;cron.go&lt;/code&gt;. What we can do next is analyze the core modules of this file to fully understand the source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Struct/Class
&lt;/h2&gt;

&lt;p&gt;Now let's take a look at the structure of the core struct &lt;code&gt;Core&lt;/code&gt; to find some enlightenment. &lt;/p&gt;

&lt;p&gt;We can locate the struct &lt;code&gt;Cron&lt;/code&gt; at Line 13. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qz0sfkj093ktzqzq64n.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qz0sfkj093ktzqzq64n.jpeg" alt="Cron Struct" width="800" height="757"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are quite many fields in the struct &lt;code&gt;Cron&lt;/code&gt;, including the lower-case private attributes &lt;code&gt;entries&lt;/code&gt;, &lt;code&gt;chain&lt;/code&gt;, &lt;code&gt;parser&lt;/code&gt;, but we can hardly know their actual meanings and functionalities. According to the comments at Line 10-12, we can deduct that it is used for tracking &lt;code&gt;entries&lt;/code&gt; and executing functions defined by &lt;code&gt;schedule&lt;/code&gt;. However, we are still not quite sure about what it does exactly, so only through reading the source code further, we can understand more details.&lt;/p&gt;

&lt;p&gt;We can also find 3 interfaces just under beneath with their descriptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ScheduleParser&lt;/code&gt;: parser of the schedule, which parses specs and returns &lt;code&gt;Schedule&lt;/code&gt; instance.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Job&lt;/code&gt;: submitted scheduled jobs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Schedule&lt;/code&gt;: used for describing running cycles of jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In fact, these 3 interfaces are very important, given their locations in the source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Entry Method
&lt;/h2&gt;

&lt;p&gt;Before we explore further, let's recall the usage of the library apart from &lt;code&gt;cron.New&lt;/code&gt;: it takes effect after calling &lt;code&gt;c.Start()&lt;/code&gt;. Therefore, we need to dig into the &lt;code&gt;Start&lt;/code&gt; method in &lt;code&gt;Cron&lt;/code&gt;, which is actually the &lt;strong&gt;entry method&lt;/strong&gt; of the core struct. &lt;/p&gt;

&lt;p&gt;We can locate the &lt;code&gt;Start&lt;/code&gt; method at Line 215 in &lt;code&gt;cron.go&lt;/code&gt; as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz2zma8vjllw3v68k2da.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz2zma8vjllw3v68k2da.jpeg" alt="Cron Start" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An experienced Go developer could notice this is an atomic operation. &lt;code&gt;c.runningMu&lt;/code&gt; is an instance of &lt;code&gt;sync.Mutex&lt;/code&gt;, which can &lt;strong&gt;lock&lt;/strong&gt; at the beginning, and &lt;strong&gt;unlock&lt;/strong&gt; after the function execution finishes. This technique avoids data racing in the repeated calls of this method. &lt;code&gt;if c.running { return }&lt;/code&gt; denotes that the method won't go further and return immediately if it's already started. &lt;code&gt;c.running&lt;/code&gt; is set as &lt;code&gt;true&lt;/code&gt;. The last line is an essential statement, &lt;code&gt;go c.run()&lt;/code&gt;, meaning it starts a &lt;strong&gt;goroutine&lt;/strong&gt; to asynchronously execute &lt;code&gt;c.run&lt;/code&gt;. Now we have found the real core method, and the rest of the work is to analyze it through. &lt;/p&gt;

&lt;p&gt;Is it similar to we playing RPG games and unlocking all the milestones and entering the next episode?&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Wait! Is it over?&lt;/p&gt;

&lt;p&gt;The reason why we stopped here is that I want to keep the article short. Reading and analyzing source code a &lt;strong&gt;tedious and boring&lt;/strong&gt; process, which might prevent readers from going further. As a result, the objective of this article is to introduce the &lt;strong&gt;right and basic techniques&lt;/strong&gt; of analyzing source code to readers, allowing you guys to do the analysis by yourselves. This way can make you learn faster and with more interest. &lt;/p&gt;

&lt;p&gt;Now let's summarize the techniques for analyzing source code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the entry file&lt;/li&gt;
&lt;li&gt;Locate the core struct/class&lt;/li&gt;
&lt;li&gt;Analyze the entry method&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>go</category>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>Talking Data: Why data governance is so important in digital transformation?</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Sun, 30 Oct 2022 03:15:40 +0000</pubDate>
      <link>https://dev.to/tikazyq/talking-data-why-data-governance-is-so-important-in-digital-transformation-35mj</link>
      <guid>https://dev.to/tikazyq/talking-data-why-data-governance-is-so-important-in-digital-transformation-35mj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Organisms are living upon negentropy."&lt;/em&gt;--&lt;em&gt;What is Life?&lt;/em&gt; --by Erwin Schrödinger&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In today's Internet era, data is an important asset for enterprises. We generate data all the time: every time we open a mobile app, place an order online, or even drive through traffic lights, data will be generated. Data is everywhere. This is especially true in enterprises. With so much raw data and increasingly mature data analysis technology, entrepreneurs are excited, because this is the gold mine piled up for companies. However, things are not as simple as imagined. It is not easy to extract valuable treasures from these messy so-called "gold mines" that look like garbage dumps. In my previous article &lt;a href="https://dev.to/tikazyq/talking-data-what-do-we-need-for-engaging-data-analytics-3g9k"&gt;Talking Data: What do we need for engaging data analytics?&lt;/a&gt;, the concept of &lt;strong&gt;Data Governance&lt;/strong&gt; was mentioned. This article will introduce how data governance creates value in the chaos of enterprise data from the perspective of enterprise data management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolated Data Island
&lt;/h2&gt;

&lt;p&gt;Large and medium-sized enterprises (generally more than 100 people with multiple departments) will encounter the problem of management chaos while their business grows fast. The sales department has its own sales statistics, typically in the form of large and scattered Excel files; The IT department manages the inventory systems by themselves; The HR department maintains an entire personnel statistics list. This situation will lead to many headaches: the boss often complains that he or she can only receive management reports every week; Managers looked at the up and down data in the report, doubting the data integrity; employees work overtime to sort out the data for the reports, but they are often questioned about data quality. Sounds familiar? These are very common issues in companies. The direct cause is the so-called &lt;strong&gt;Isolated Data Island&lt;/strong&gt; problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf3stks5l5yo9e14wayf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf3stks5l5yo9e14wayf.png" alt="Isolated Data Islands" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main reason for isolated data island issues is that data from various departments or teams is &lt;strong&gt;disconnected&lt;/strong&gt;. Many times, because of the rapid growth of business, some teams need to quickly set up a data system, but are incapable of developing it in time, so they can only use Excel or some fast and effective data entry tools to ensure the efficiency of business operations. With the growth of business, these &lt;em&gt;hacking&lt;/em&gt; workarounds continue to derive new internal processes, and bottlenecks will gradually emerge after reaching a certain scale, especially when integration with other departments or external systems is required.&lt;/p&gt;

&lt;p&gt;When bosses become aware of the isolated data island issue and decide to make a change, they will find it difficult to move forward most of the time. This is because they have many problems to solve, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Format and Standard&lt;/strong&gt;: The monthly sales data of the marketing department may be stored in various Excel files, while the product department is stored in the SQL Server database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statistical Caliber&lt;/strong&gt;: The minimum statistical caliber of the accounting and financial department is by month, but the marketing department is by a single order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Association&lt;/strong&gt;: The marketing department uses the order number, while the product department uses the database order ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale&lt;/strong&gt;: Thousands of Excel files with different formats are difficult to be efficiently processed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Governance
&lt;/h2&gt;

&lt;p&gt;The so-called data governance is actually to solve the problem of data islands through data integration tools. As shown in the figure below, the data governance team need to integrate, clean, transform, and store the business data of each department in the Data Warehouse, which will eventually be used by the entire organization. Through data governance and data warehouse, data islands of various business departments will be connected. After some sort of &lt;strong&gt;association&lt;/strong&gt; and &lt;strong&gt;integration&lt;/strong&gt; of the data, various data sources can generate many &lt;strong&gt;insights&lt;/strong&gt;. For example, the marketing team will be surprised to find that their main customers rarely use APP!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jzjk0m8g36g9j4cn2vo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jzjk0m8g36g9j4cn2vo.png" alt="Data Governance" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, the above figure is a simplified process. In the real world, data governance in a large company or organization will be &lt;strong&gt;more complex&lt;/strong&gt;, which normally involve organizational, cultural, political, business nature and other factors. This is why many large and medium-sized enterprises, especially those accustomed to traditional Excel reports, will encounter &lt;strong&gt;obstacles&lt;/strong&gt; in digital transformation.&lt;/p&gt;

&lt;p&gt;For the data governance team, the key to success is not the technology, but the familiarity and understanding of &lt;strong&gt;business process&lt;/strong&gt; and &lt;strong&gt;enterprise culture&lt;/strong&gt;. There are ready-made tools and techniques for data cleaning and data transformation, but business processes and corporate culture are often quite subtle. This requires the data governance team to have good &lt;strong&gt;communication skills&lt;/strong&gt; and &lt;strong&gt;business knowledge&lt;/strong&gt;. The success of data governance requires not only practical tools and technologies, but also &lt;strong&gt;wisdom&lt;/strong&gt; in dealing with various business teams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkr5fwa2y5gz3vabwqcv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkr5fwa2y5gz3vabwqcv.png" alt="Success Factors of Data Governance" width="774" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep improving
&lt;/h2&gt;

&lt;p&gt;The business development cannot live without data insights, but to achieve data driven decision making, communication and collaboration between teams and departments are often necessary. This is why many companies have started to adopt Agile. Here you can refer to the author's previous article &lt;a href="https://dev.to/tikazyq/talking-agile-are-you-sure-your-team-is-practicing-agile-properly-1l5"&gt;Talking Agile: Are you sure your team is practicing Agile properly?&lt;/a&gt;. What data governance needs to do more is to understand the business, think about how to use data integration to optimize the business, and create more value from data insights.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Golang in Action: How to implement a simple distributed system</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Wed, 26 Oct 2022 05:25:43 +0000</pubDate>
      <link>https://dev.to/tikazyq/golang-in-action-how-to-implement-a-simple-distributed-system-2n0n</link>
      <guid>https://dev.to/tikazyq/golang-in-action-how-to-implement-a-simple-distributed-system-2n0n</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Nowadays, many cloud-native and distributed systems such as Kubernetes are written in Go. This is because Go natively supports not only asynchronous programming but also static typing to ensure system stability. My open-source project Crawlab, a web crawler management platform, has applied distributed architecture. This article will introduce about how to design and implement a simple distributed system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ideas
&lt;/h2&gt;

&lt;p&gt;Before we start to code, we need to think about what we need to implement.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Master Node&lt;/strong&gt;: A central control system, similar to a troop commander to issue orders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker Node&lt;/strong&gt;: Executors, similar to soldiers to execute tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apart from the concepts above, we would need to implement some simple functionalities.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Report Status&lt;/strong&gt;: Worker nodes report current status to the master node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assign Task&lt;/strong&gt;: Client makes API requests to the master node which assign tasks to worker nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute Script&lt;/strong&gt;: Worker nodes execute scripts from tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The overall architectural diagram is as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s2nae4tf7waja4ft92i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s2nae4tf7waja4ft92i.png" alt="Main Process Diagram" width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Action
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Node Communication
&lt;/h3&gt;

&lt;p&gt;The communication between nodes are very important in distributed systems. If each node runs individually, it will be not necessary to use a distributed system. Therefore, the communication module is an essential part in distributed systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  gRPC Protocol
&lt;/h4&gt;

&lt;p&gt;First, let's think about how to make nodes to communicate with each other. The most common way is API, which yet has a drawback that it requires nodes to expose their IP addresses and ports to others, which is very insecure particularly in the public network. In light of that, we chose &lt;a href="https://grpc.io/" rel="noopener noreferrer"&gt;gRPC&lt;/a&gt;, a popular Remote Procedure Call (RPC) framework. We won't go deep into its technical details, but it is actually a mechanism to allow remote machines to execute commands by RPC callers. &lt;/p&gt;

&lt;p&gt;To use gRPC, let's first create a file named &lt;code&gt;go.mod&lt;/code&gt; and enter below content, then execute &lt;code&gt;go mod download&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module go-distributed-system

go 1.17

require (
    github.com/golang/protobuf v1.5.0
    google.golang.org/grpc v1.27.0
    google.golang.org/protobuf v1.27.1
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we create a &lt;a href="https://developers.google.com/protocol-buffers" rel="noopener noreferrer"&gt;Protocol Buffers&lt;/a&gt;  file &lt;code&gt;node.proto&lt;/code&gt;, a gRPC protocol file, and enter below content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="na"&gt;syntax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"proto3"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;core&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;option&lt;/span&gt; &lt;span class="na"&gt;go_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;".;core"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;Request&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;service&lt;/span&gt; &lt;span class="n"&gt;NodeService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;ReportStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;){};&lt;/span&gt;       &lt;span class="c1"&gt;// Simple RPC&lt;/span&gt;
  &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;AssignTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;){};&lt;/span&gt;  &lt;span class="c1"&gt;// Server-Side RPC&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we have created two RPC service methods: one for reporting current status with a Simple RPC, another for assigning tasks with a Server-Side RPC. The difference between Simple RPC and Server-Side RPC is that Server-Side RPC can allow the server (master node in our case) to send data to the client (worker node) through a stream, but Simple RPC can only allow clients to call servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ohl7hg4c308662pfgkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ohl7hg4c308662pfgkh.png" alt="RPC Diagram" width="800" height="763"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After &lt;code&gt;.proto&lt;/code&gt; file is created, we need to compile it into &lt;code&gt;.go&lt;/code&gt; code file so that it can be used by the Go program. Let's execute the command as below. (Note: the compiler &lt;code&gt;protocol&lt;/code&gt; is not built-in and needs to be downloaded, please refer to &lt;a href="https://grpc.io/docs/protoc-installation/" rel="noopener noreferrer"&gt;https://grpc.io/docs/protoc-installation/&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;core
protoc &lt;span class="nt"&gt;--go_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./core &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--go-grpc_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./core &lt;span class="se"&gt;\&lt;/span&gt;
    node.proto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After it is executed, you can see two Go code files under the directory &lt;code&gt;core&lt;/code&gt;, &lt;code&gt;node.pb.go&lt;/code&gt; and &lt;code&gt;node_grpc.pb.go&lt;/code&gt; respectively, which serve as the gRPC library.&lt;/p&gt;

&lt;h4&gt;
  
  
  gRPC Server
&lt;/h4&gt;

&lt;p&gt;Now let's start writing server-side code.&lt;/p&gt;

&lt;p&gt;Firstly create a new file &lt;code&gt;core/node_service_server.go&lt;/code&gt;, and enter the content below. It implemented the two gRPC service methods created before. The channel &lt;code&gt;CmdChannel&lt;/code&gt; in particular would transfer commands to be executed on worker nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;UnimplementedNodeServiceServer&lt;/span&gt;

    &lt;span class="c"&gt;// channel to receive command&lt;/span&gt;
    &lt;span class="n"&gt;CmdChannel&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ReportStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AssignTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;NodeService_AssignTaskServer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CmdChannel&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="c"&gt;// receive command and send to worker node (client)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt;

&lt;span class="c"&gt;// GetNodeServiceGrpcServer singleton service&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetNodeServiceGrpcServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;CmdChannel&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  gRPC Client
&lt;/h4&gt;

&lt;p&gt;We don't have to care too much about the implementation of gRPC client. Instead, we only need to call the methods in the gRPC client, and the rest of the service requests and response will be handled automatically by the program.&lt;/p&gt;

&lt;h3&gt;
  
  
  Master Node
&lt;/h3&gt;

&lt;p&gt;After we implemented the node communication part, we can now write the master node, which is the core of the distributed system.&lt;/p&gt;

&lt;p&gt;Let's create a new file &lt;code&gt;node.go&lt;/code&gt; and enter the content below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/gin-gonic/gin"&lt;/span&gt;
    &lt;span class="s"&gt;"google.golang.org/grpc"&lt;/span&gt;
    &lt;span class="s"&gt;"net"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// MasterNode is the node instance&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MasterNode&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;api&lt;/span&gt;     &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Engine&lt;/span&gt;            &lt;span class="c"&gt;// api server&lt;/span&gt;
    &lt;span class="n"&gt;ln&lt;/span&gt;      &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listener&lt;/span&gt;           &lt;span class="c"&gt;// listener&lt;/span&gt;
    &lt;span class="n"&gt;svr&lt;/span&gt;     &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;           &lt;span class="c"&gt;// grpc server&lt;/span&gt;
    &lt;span class="n"&gt;nodeSvr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;NodeServiceGrpcServer&lt;/span&gt; &lt;span class="c"&gt;// node service&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;// TODO: implement me&lt;/span&gt;
  &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"implement me"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;// TODO: implement me&lt;/span&gt;
  &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"implement me"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt;

&lt;span class="c"&gt;// GetMasterNode returns the node instance&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetMasterNode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// node&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="c"&gt;// initialize node&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two placeholder methods &lt;code&gt;Init&lt;/code&gt; and &lt;code&gt;Start&lt;/code&gt; to be implemented.&lt;/p&gt;

&lt;p&gt;In the initialization method &lt;code&gt;Init&lt;/code&gt;, we will two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regster gRPC services&lt;/li&gt;
&lt;li&gt;Register API services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, we can add below code in &lt;code&gt;Init&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// grpc server listener with port as 50051&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":50051"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// grpc server&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// node service&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodeSvr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GetNodeServiceGrpcServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// register node service to grpc server&lt;/span&gt;
    &lt;span class="n"&gt;RegisterNodeServiceServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodeSvr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// api&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/tasks"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// parse payload&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Cmd&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"cmd"`&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ShouldBindJSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AbortWithStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusBadRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c"&gt;// send command to node service&lt;/span&gt;
        &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodeSvr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CmdChannel&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cmd&lt;/span&gt;

        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AbortWithStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we created a gRPC server, and registered &lt;code&gt;NodeServiceGrpcServer&lt;/code&gt; created before. We then use API framework &lt;code&gt;gin&lt;/code&gt; to create a simple API service, which can allow POST request to &lt;code&gt;/tasks&lt;/code&gt; to send commands to the channel &lt;code&gt;CmdChannel&lt;/code&gt; and pass to &lt;code&gt;NodeServiceGrpcServer&lt;/code&gt;. All pieces have been put together!&lt;/p&gt;

&lt;p&gt;The starting method &lt;code&gt;Start&lt;/code&gt; is quite simple, which is simplely to start the gRPC server and API server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MasterNode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// start grpc server&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// start api server&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":9092"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// wait for exit&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;下一步，我们就要实现实际做任务的工作节点了。&lt;/p&gt;

&lt;h3&gt;
  
  
  Worker Node
&lt;/h3&gt;

&lt;p&gt;现在，我们创建一个新文件 &lt;code&gt;core/worker_node.go&lt;/code&gt;，输入以下内容。&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"google.golang.org/grpc"&lt;/span&gt;
    &lt;span class="s"&gt;"os/exec"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;WorkerNode&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClientConn&lt;/span&gt;  &lt;span class="c"&gt;// grpc client connection&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;    &lt;span class="n"&gt;NodeServiceClient&lt;/span&gt; &lt;span class="c"&gt;// grpc client&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;WorkerNode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// connect to master node&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:50051"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithInsecure&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// grpc client&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NewNodeServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;WorkerNode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// log&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker node started"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// report status&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="c"&gt;// assign task&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AssignTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// receive command from master node&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Recv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c"&gt;// log command&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"received command: "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c"&gt;// execute command&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;workerNode&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;WorkerNode&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetWorkerNode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;WorkerNode&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;workerNode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// node&lt;/span&gt;
        &lt;span class="n"&gt;workerNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;WorkerNode&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="c"&gt;// initialize node&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;workerNode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;workerNode&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code, we created the gRPC client and connected it to the gRPC server in &lt;code&gt;Init&lt;/code&gt; method.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;Start&lt;/code&gt; method, we have done several things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Report status with a Simple RPC method.&lt;/li&gt;
&lt;li&gt;Assign tasks to acquire a stream with a Server-Side RPC method.&lt;/li&gt;
&lt;li&gt;Continuously receive data from the server (master node) via the acquired stream and execute commands.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now we have completed all core logics in the distributed systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Putting them all together
&lt;/h3&gt;

&lt;p&gt;Finally, we need to encapsulate these core functionalities.&lt;/p&gt;

&lt;p&gt;Create the main entry file &lt;code&gt;main.go&lt;/code&gt; and enter the content below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"go-distributed-system/core"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"master"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetMasterNode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"worker"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetWorkerNode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"invalid node type"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the simple distributed system is all done!&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Results
&lt;/h3&gt;

&lt;p&gt;We can then test the code.&lt;/p&gt;

&lt;p&gt;Open two command prompts. Enter &lt;code&gt;go run main.go master&lt;/code&gt; in one prompt to start the master node, and enter &lt;code&gt;go run main.go worker&lt;/code&gt; to start the worker node in another.&lt;/p&gt;

&lt;p&gt;If the master code starts successfully, you should be able to see the logs below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /tasks                    --&amp;gt; go-distributed-system/core.(*MasterNode).Init.func1 (3 handlers)
[GIN-debug] [WARNING] You trusted all proxies, this is NOT safe. We recommend you to set a value.
Please check https://pkg.go.dev/github.com/gin-gonic/gin#readme-don-t-trust-all-proxies for details.
[GIN-debug] Listening and serving HTTP on :9092
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For worker node, you can see logs like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;worker node started
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the master node and worker node have all started, we can open another command prompt to execute the command below to make an API call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"cmd": "touch /tmp/hello-distributed-system"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    http://localhost:9092/tasks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the worker node logs, you should be able to see &lt;code&gt;received command:  touch /tmp/hello-distributed-system&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then let's check if the file has been created by executing &lt;code&gt;ls -l /tmp/hello-distributed-system&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-rw-r--r--  1 marvzhang  wheel     0B Oct 26 12:22 /tmp/hello-distributed-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The file was successfully created, which means the worker node has executed the task successfully. Hooray!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article introduced a way to develop a simple distributed system written in Golang, with gRPC and built-in Go channel. &lt;/p&gt;

&lt;p&gt;Core libraries and techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://grpc.io/" rel="noopener noreferrer"&gt;gRPC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/protocol-buffers" rel="noopener noreferrer"&gt;Protocol Buffers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://golangdocs.com/channels-in-golang" rel="noopener noreferrer"&gt;channel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gin-gonic/gin" rel="noopener noreferrer"&gt;gin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;os/exec&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code of the whole project is on GitHub: &lt;a href="https://github.com/tikazyq/codao-code/tree/main/2022-10/go-distributed-system" rel="noopener noreferrer"&gt;https://github.com/tikazyq/codao-code/tree/main/2022-10/go-distributed-system&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>CI/CD in Action: Manage auto builds of large open-source projects with GitHub Actions?</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Fri, 21 Oct 2022 05:55:40 +0000</pubDate>
      <link>https://dev.to/tikazyq/cicd-in-action-manage-auto-builds-of-large-open-source-projects-with-github-actions-3lcn</link>
      <guid>https://dev.to/tikazyq/cicd-in-action-manage-auto-builds-of-large-open-source-projects-with-github-actions-3lcn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the previous article about &lt;em&gt;&lt;a href="https://dev.to/tikazyq/cicd-in-action-how-to-use-microsofts-github-actions-in-a-right-way-4g89"&gt;CI/CD in Action: How to use Microsoft's GitHub Actions in a right way?&lt;/a&gt;&lt;/em&gt;, we introduced how to use GitHub Actions workflows with a practical Python project. However, this is quite simple and not comprehensive enough for &lt;strong&gt;large projects&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;This article introduces practical CI/CD applications with GitHub Actions of my open-source project &lt;strong&gt;&lt;a href="https://github.com/crawlab-team/crawlab" rel="noopener noreferrer"&gt;Crawlab&lt;/a&gt;&lt;/strong&gt;. For those who are not familiar with Crawlab, you can refer to the &lt;a href="https://www.crawlab.cn" rel="noopener noreferrer"&gt;official site&lt;/a&gt; or &lt;a href="https://docs.crawlab.cn" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. In short, Crawlab is a web crawler management platform for efficient data collection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overall CI/CD Architecture
&lt;/h2&gt;

&lt;p&gt;The new version of Crawlab v0.6 split general functionalities into separated modules, so that the whole project is consisted of a few dependent sub-projects. For example, the main project &lt;a href="https://github.com/crawlab-team/crawlab" rel="noopener noreferrer"&gt;crawlab&lt;/a&gt; depends on the front-end project &lt;a href="https://github.com/crawlab-team/crawlab-ui" rel="noopener noreferrer"&gt;crawlab-ui&lt;/a&gt; and back-end project &lt;a href="https://github.com/crawlab-team/crawlab-core" rel="noopener noreferrer"&gt;crawlab-core&lt;/a&gt;. Higher &lt;strong&gt;decoupling&lt;/strong&gt; and &lt;strong&gt;maintainability&lt;/strong&gt; are the benefits.&lt;/p&gt;

&lt;p&gt;Below is the diagram of the overall CI/CD architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcucf124w2zl879jrujb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcucf124w2zl879jrujb8.png" alt="Crawlab CI/CD" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The building process of the whole Crawlab project is a little bit trivial. The ultimate deliverable or the Docker image &lt;a href="https://hub.docker.com/crawlabteam/crawlab" rel="noopener noreferrer"&gt;crawlabteam/crawlab&lt;/a&gt; depends on the main repository, which depends on the sub-projects of front-end, back-end, base images and plugins. They are come from their own repos, which again depend on upstream core-module repos. Here we have simplified the dependencies of front-end and back-end modules. &lt;/p&gt;

&lt;h2&gt;
  
  
  Front-End Building
&lt;/h2&gt;

&lt;p&gt;We start with the front-end part.&lt;/p&gt;

&lt;p&gt;The front-end repo crawlab-ui is distributed through NPM. Let's take a look at the CI/CD workflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish to NPM registry&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;created&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;publish&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v2&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;12.22.7'&lt;/span&gt;
          &lt;span class="na"&gt;registry-url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://registry.npmjs.com/&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get version&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "TAG_VERSION=${GITHUB_REF#refs/*/}" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yarn install&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yarn run build&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;TAG_VERSION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{env.TAG_VERSION}}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.event_name == 'release' }}&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish npm&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm publish --registry ${REGISTRY}&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;NODE_AUTH_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{secrets.NPM_PUBLISH_TOKEN}}&lt;/span&gt;
          &lt;span class="na"&gt;TAG_VERSION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{env.TAG_VERSION}}&lt;/span&gt;
          &lt;span class="na"&gt;REGISTRY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://registry.npmjs.com/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are some important parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up Node.js environment &lt;code&gt;uses: actions/setup-node@v2&lt;/code&gt; and its version &lt;code&gt;node-version: '12.22.7'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Install dependencies &lt;code&gt;run: yarn install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Build the package &lt;code&gt;yarn run build&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Publish the package to NPM registry &lt;code&gt;npm publish --registry ${REGISTRY}&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The token for publishing NPM package is  &lt;code&gt;${{secrets.NPM_PUBLISH_TOKEN}}&lt;/code&gt;, which is a GitHub &lt;strong&gt;secret&lt;/strong&gt; configured by the repo owner, and private to the public for security reasons.&lt;/p&gt;

&lt;p&gt;After the workflow is set up, a GitHub Actions workflow job will be &lt;strong&gt;automatically triggered&lt;/strong&gt; once any code commit is push to crawlab-ui.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kbhfczy7yyvxoz57n8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kbhfczy7yyvxoz57n8q.png" alt="image-20221021113449174" width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We barely need to take care of anything for NPM package publishing, because it is fully automated. Awesome!&lt;/p&gt;

&lt;h2&gt;
  
  
  Base Image Building
&lt;/h2&gt;

&lt;p&gt;Let's see another special workflow: base image building. The GitHub repo is &lt;a href="https://github.com/crawlab-team/docker-base-images" rel="noopener noreferrer"&gt;docker-base-images&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As the new published base image needs to be integrated into the final Docker image, we need to re-trigger a workflow job in crawlab once it is built. Let's see how this workflow is configured.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker crawlab-base&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;published&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;repository_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;crawlab-base&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;IMAGE_PATH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crawlab-base&lt;/span&gt;
  &lt;span class="na"&gt;IMAGE_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crawlabteam/crawlab-base&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Image&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v2&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get changed files&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;changed-files&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tj-actions/changed-files@v18.7&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check matched&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# check changed files&lt;/span&gt;
          &lt;span class="s"&gt;for file in ${{ steps.changed-files.outputs.all_changed_files }}; do&lt;/span&gt;
            &lt;span class="s"&gt;if [[ $file =~ ^\.github/workflows/.* ]]; then&lt;/span&gt;
              &lt;span class="s"&gt;echo "file ${file} is matched"&lt;/span&gt;
              &lt;span class="s"&gt;echo "is_matched=1" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
              &lt;span class="s"&gt;exit 0&lt;/span&gt;
            &lt;span class="s"&gt;fi&lt;/span&gt;
            &lt;span class="s"&gt;if [[ $file =~ ^${IMAGE_PATH}/.* ]]; then&lt;/span&gt;
              &lt;span class="s"&gt;echo "file ${file} is matched"&lt;/span&gt;
              &lt;span class="s"&gt;echo "is_matched=1" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
              &lt;span class="s"&gt;exit 0&lt;/span&gt;
            &lt;span class="s"&gt;fi&lt;/span&gt;
          &lt;span class="s"&gt;done&lt;/span&gt;

          &lt;span class="s"&gt;# force trigger&lt;/span&gt;
          &lt;span class="s"&gt;if [[ ${{ inputs.forceTrigger }} == true ]]; then&lt;/span&gt;
              &lt;span class="s"&gt;echo "is_matched=1" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
              &lt;span class="s"&gt;exit 0&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build image&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.is_matched == '1' }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;cd $IMAGE_PATH&lt;/span&gt;
          &lt;span class="s"&gt;docker build . --file Dockerfile --tag image&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Log into registry&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.is_matched == '1' }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo ${{ secrets.DOCKER_PASSWORD}} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Push image&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.is_matched == '1' }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;IMAGE_ID=$IMAGE_NAME&lt;/span&gt;

          &lt;span class="s"&gt;# Strip git ref prefix from version&lt;/span&gt;
          &lt;span class="s"&gt;VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')&lt;/span&gt;

          &lt;span class="s"&gt;# Strip "v" prefix from tag name&lt;/span&gt;
          &lt;span class="s"&gt;[[ "${{ github.ref }}" == "refs/tags/"* ]] &amp;amp;&amp;amp; VERSION=$(echo $VERSION | sed -e 's/^v//')&lt;/span&gt;

          &lt;span class="s"&gt;# Use Docker `latest` tag convention&lt;/span&gt;
          &lt;span class="s"&gt;[ "$VERSION" == "main" ] &amp;amp;&amp;amp; VERSION=latest&lt;/span&gt;

          &lt;span class="s"&gt;echo IMAGE_ID=$IMAGE_ID&lt;/span&gt;
          &lt;span class="s"&gt;echo VERSION=$VERSION&lt;/span&gt;

          &lt;span class="s"&gt;docker tag image $IMAGE_ID:$VERSION&lt;/span&gt;
          &lt;span class="s"&gt;docker push $IMAGE_ID:$VERSION&lt;/span&gt;

          &lt;span class="s"&gt;if [[ $VERSION == "latest" ]]; then&lt;/span&gt;
            &lt;span class="s"&gt;docker tag image $IMAGE_ID:main&lt;/span&gt;
            &lt;span class="s"&gt;docker push $IMAGE_ID:main&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Trigger other workflows&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.is_matched == '1' }}&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;peter-evans/repository-dispatch@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.WORKFLOW_ACCESS_TOKEN }}&lt;/span&gt;
          &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crawlab-team/crawlab&lt;/span&gt;
          &lt;span class="na"&gt;event-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker-crawlab&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the workflow, the last step  &lt;code&gt;name: Trigger other workflows&lt;/code&gt; will &lt;strong&gt;trigger another GitHub Actions workflow job in another GitHub repo&lt;/strong&gt; crawlab-team/crawlab through &lt;code&gt;peter-evans/repository-dispatch@v2&lt;/code&gt;, a re-usable action. That means, if we make modifications in the base image code and push the commits, the base image will be built automatically before it triggers another workflow job in the repo crawlab to build the final image.&lt;/p&gt;

&lt;p&gt;This is so great! We can sit down and take a coffee, waiting for the job to finish, instead of doing any manual work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Today we introduced the use of GitHub Actions in the large open-source project Crawlab along with its automatic building process and overall CI/CD architecture. Overall, GitHub Actions supports the CI/CD integration of large projects quite well. &lt;/p&gt;

&lt;p&gt;Techniques used:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Automatic triggers to build&lt;/li&gt;
&lt;li&gt;Publish NPM packages&lt;/li&gt;
&lt;li&gt;Repo secrets&lt;/li&gt;
&lt;li&gt;Trigger workflows in other repos&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The code of the whole project is in the repos of Crawlab on GitHub and publicly available.&lt;/p&gt;

</description>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>Talking Algorithm: The hidden secret of nature in the divide-and-conquer algorithm</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Tue, 18 Oct 2022 06:57:01 +0000</pubDate>
      <link>https://dev.to/tikazyq/talking-algorithm-the-hidden-secret-of-nature-in-the-divide-and-conquer-algorithm-2d0o</link>
      <guid>https://dev.to/tikazyq/talking-algorithm-the-hidden-secret-of-nature-in-the-divide-and-conquer-algorithm-2d0o</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"The empire, long divided, must unite; long united, must divide. " -- &lt;em&gt;The Romance of the Three Kingdoms&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is hard to deny the importance of &lt;strong&gt;algorithms&lt;/strong&gt; in our modern IT industry. A well designed algorithm can run software programs with the minimal resources in the most efficient way. Therefore, algorithms are important, so that IT companies set high standards in the hiring process. Think about the algorithm tests in the technical interviews. Many people might feel that algorithms are quite distant from us, but I think its efficiency enhancement comes naturally, as the reason behind can be found in nature.&lt;/p&gt;

&lt;h2&gt;
  
  
  From a snowflake
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c0tpocl96j54cf1r7ub.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c0tpocl96j54cf1r7ub.jpg" alt="snowflake" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we all know, a snowflake is beautiful, because of not only its shining appearing, but also the shape. It looks like a polished hexagon, and each branch is a snowflake-like sub-hexagon. The term for this structure with recursive self similarity is &lt;strong&gt;Fractal&lt;/strong&gt;. Fractals are so common in nature, such as tree roots and tree leaf branches, pulmonary capillaries, natural coastal lines, or even broken glasses. &lt;/p&gt;

&lt;p&gt;So, why? What is the reason that fractals are so common in nature? Is it the design from Gods, or some &lt;strong&gt;fundamental mathematical laws&lt;/strong&gt; behind it? Academic researchers believe in the latter. According the thermodynamics, snowflakes are formed when water vapor encounters an abrupt drop in temperature; according to fluid mechanics, the tree-like structure of pulmonary capillaries would allow the oxygen to be absorbed by red blood cells. In conclusion, &lt;strong&gt;fractals exist for efficiencies&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Divide-and-conquer algorithm
&lt;/h2&gt;

&lt;p&gt;Now let's go back to the algorithm. For those who are familiar with algorithms should have known the techniques using the divide-and-conquer methodology. Its fast speed comes from the process where a large piece is recursively divided into the most granular parts and then merged back, or the self-similar recursion. It is just like the fractal theory. Another example of fractal principles is the balanced tree (B-tree) database index, which transform unstructured data layer-by-layer into a tree-like structure, and ultimately make database querying much faster. The process of optimizing B-tree indexes is identical to that of optimizing the querying efficiency. The math expression &lt;em&gt;LogN&lt;/em&gt; you often see in the complexity analysis, comes from the sum of arithmetic sequence in fractals. You can try deriving the equations through apply your mathematical knowledge from middle school.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is the world a fractal?
&lt;/h2&gt;

&lt;p&gt;The fractal theory is a new scientific research area, and many scientists associate it with &lt;strong&gt;the theory of complex systems&lt;/strong&gt;. Many complex systems are formed from some simple matters or laws. For example, ant colonies are like intelligent living individuals who can do loads of smart things, but only replying on very little information hormones. Life is made of cells, and cells come from DNAs, and DNAs are formed by micro particles including molecules, protons, and neutrons. &lt;strong&gt;Everything in the world is made up by these physical models&lt;/strong&gt;. Just like the English poet put it, "To see a World in a Grain of Sand, And a Heaven in a Wild Flower". &lt;/p&gt;

&lt;p&gt;Although the fractal theory explains most of the formation in nature, it is still imperfect. As we are amazed by the powerfulness of the fractal theory, we should realize that it is not ideal, and we should explore as we see. &lt;/p&gt;

</description>
      <category>algorithms</category>
    </item>
    <item>
      <title>CI/CD in Action: How to use Microsoft's GitHub Actions in a right way?</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Fri, 14 Oct 2022 03:32:17 +0000</pubDate>
      <link>https://dev.to/tikazyq/cicd-in-action-how-to-use-microsofts-github-actions-in-a-right-way-4g89</link>
      <guid>https://dev.to/tikazyq/cicd-in-action-how-to-use-microsofts-github-actions-in-a-right-way-4g89</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;&lt;/strong&gt; is the official &lt;strong&gt;CI/CD&lt;/strong&gt; workflow service provided by GitHub. It is aimed at making it easy for open-source project contributors to manage operational maintenance, and enable open-source communities to embrace cloud-native &lt;strong&gt;DevOps&lt;/strong&gt;. GitHub Actions is integrated into most of my open-source projects including &lt;a href="https://github.com/crawlab-team/crawlab" rel="noopener noreferrer"&gt;Crawlab&lt;/a&gt; and &lt;a href="https://github.com/crawlab-team/artipub" rel="noopener noreferrer"&gt;ArtiPub&lt;/a&gt;. As a contributor, I think GitHub Actions is not only easy to use, but also free (which is the most important). Therefore, I hope this article will allow open-source project contributors who are not familiar with GitHub Actions, to really get ideas on how to utilize it and make an impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting from documentation
&lt;/h2&gt;

&lt;p&gt;For those who are not familiar with GitHub Actions, it is strongly recommended that you read &lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;the official documentation&lt;/a&gt; first, where you can find &lt;a href="https://youtu.be/cP0I9w2coGU" rel="noopener noreferrer"&gt;Introduction Video&lt;/a&gt;, &lt;a href="https://docs.github.com/en/actions/quickstart" rel="noopener noreferrer"&gt;Quick Start&lt;/a&gt;, &lt;a href="https://docs.github.com/en/actions/examples" rel="noopener noreferrer"&gt;Examples&lt;/a&gt;, concepts, how it works, etc. If you read through the docs, you can easily do GitHub DevOps with your own experience in CI/CD. References of all codes in this article can be found on the official documentation,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst0fjn386v7uyz6xxqxf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst0fjn386v7uyz6xxqxf.png" alt="GitHub Actions Docs" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Ideas
&lt;/h2&gt;

&lt;p&gt;Let's first figure out what we would like to implement, i.e. using GitHub Actions to run a web crawler to get daily ranking from &lt;a href="https://github.com/trending" rel="noopener noreferrer"&gt;GitHub Trending&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Steps of implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload crawler code to GitHub repo.&lt;/li&gt;
&lt;li&gt;Create and commit a GitHub Actions workflow.&lt;/li&gt;
&lt;li&gt;Trigger the GitHub Actions workflow to run the crawler.&lt;/li&gt;
&lt;li&gt;Check out the status of crawling tasks.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Crawler code
&lt;/h2&gt;

&lt;p&gt;Now we can start coding!&lt;/p&gt;

&lt;p&gt;Let's first upload the code to GitHub repo. As our topic today is focused on GitHub Actions, we are not going to dig deep into the code details.&lt;/p&gt;

&lt;p&gt;The code is as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://github.com/trending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;article.Box-row&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;repo_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;h1 a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;stars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a.Link--muted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stars&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; stars): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The first workflow
&lt;/h2&gt;

&lt;p&gt;We can click on the &lt;strong&gt;Actions&lt;/strong&gt; tab in the GitHub repo page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ot6dh2yvqtkn6dmlv4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ot6dh2yvqtkn6dmlv4f.png" alt="GitHub Actions Repo Page" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then you can see the welcome page as above, which indicates there is no workflows in your repo yet. You may also find there are some introductions as well as popular template workflows, where you can click the &lt;code&gt;Configure&lt;/code&gt; button to create a new workflow.&lt;/p&gt;

&lt;p&gt;We can search the keyword &lt;code&gt;Python&lt;/code&gt;, then we can find a workflow that runs Python programs. Let's use it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4koj1r5odvp531240zdb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4koj1r5odvp531240zdb.png" alt="Python Application Workflow" width="570" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After clicking the &lt;code&gt;Configure&lt;/code&gt; button, you will enter the page as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hqrsj4ve79xxr8tmziy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hqrsj4ve79xxr8tmziy.png" alt="Create Workflow" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A GitHub Actions workflow is actually a &lt;strong&gt;&lt;code&gt;YAML&lt;/code&gt; configuration file&lt;/strong&gt;, similar to popular concepts like &lt;code&gt;PaaS&lt;/code&gt;, &lt;code&gt;IaaS&lt;/code&gt;, cloud-native applications, which automate configurations through codes.&lt;/p&gt;

&lt;p&gt;There is already some default workflow code, and we can slightly modify it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GitHub Trending Crawler&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Python &lt;/span&gt;&lt;span class="m"&gt;3.10&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v3&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.10"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;cd 2022-10/github-actions-python&lt;/span&gt;
        &lt;span class="s"&gt;python -m pip install --upgrade pip&lt;/span&gt;
        &lt;span class="s"&gt;if [ -f requirements.txt ]; then pip install -r requirements.txt; fi&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;cd 2022-10/github-actions-python&lt;/span&gt;
        &lt;span class="s"&gt;python main.py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click the &lt;code&gt;Commit&lt;/code&gt; button on the web UI, and you can find that the workflow configuration file  &lt;code&gt;github-trending-crawler.yml&lt;/code&gt; is generated on the directory  &lt;code&gt;.github/workflows&lt;/code&gt; .&lt;/p&gt;

&lt;p&gt;Overall, the workflow above has done several things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checkout the repo's code&lt;/li&gt;
&lt;li&gt;Set up Python environment&lt;/li&gt;
&lt;li&gt;Install dependencies&lt;/li&gt;
&lt;li&gt;Run the program&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  View task status
&lt;/h2&gt;

&lt;p&gt;As you submit the workflow commit, a CI/CD task will &lt;strong&gt;automatically be triggered&lt;/strong&gt; given the default triggering conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It means the workflow will be auto-triggered once we push commits or create pull requests.&lt;/p&gt;

&lt;p&gt;Now if go to the &lt;code&gt;Actions&lt;/code&gt; tab to check out the running status, you can find a list of workflow runs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x25qehx680eymtrjh5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x25qehx680eymtrjh5l.png" alt="GitHub Actions Executions" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's click the latest one and click the &lt;code&gt;build&lt;/code&gt; job, you can see relevant logs. Now click to expand the details of the step &lt;code&gt;Run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jp209d0miqak0wys7hn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jp209d0miqak0wys7hn.png" alt="GitHub Actions Logs" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the top repos on GitHub Trending are already printed out in the logs, as expected. The only thing left is to validate if it is in line with the actual ranking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdip25t46lskx12402ui8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdip25t46lskx12402ui8.png" alt="GitHub Trending" width="800" height="664"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The content on the page is exactly the same as the output results are exactly. That's awesome, all done!&lt;/p&gt;

&lt;h2&gt;
  
  
  Scheduled workflow
&lt;/h2&gt;

&lt;p&gt;How can we run crawlers without &lt;strong&gt;schedules&lt;/strong&gt;? And here we are, GitHub Actions supports that just fine. Let's add it with pleasure.&lt;/p&gt;

&lt;p&gt;Open the edit panel of the created workflow, and add some code in the trigger part as below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cron expression &lt;code&gt;0 * * * *&lt;/code&gt; means triggering at 0 every hour. For those who are unfamiliar with cron expression, please refer to  &lt;a href="https://crontab.guru/" rel="noopener noreferrer"&gt;Cron Guru&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After the edit is committed, we can see the hourly executions of the &lt;strong&gt;crawler's scheduled tasks&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We introduced how to use GitHub Actions workflows to deploy a simple web crawling schedule task, with techniques as below.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub repo&lt;/li&gt;
&lt;li&gt;GitHub Actions workflows, including scheduled triggering&lt;/li&gt;
&lt;li&gt;Python crawler development, including dependencies &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;requests&lt;/a&gt; and &lt;a href="https://pypi.org/project/bs4/" rel="noopener noreferrer"&gt;bs4&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The code of the whole project is on GitHub: &lt;a href="https://github.com/tikazyq/codao-code/tree/main/2022-10/go-task-scheduler" rel="noopener noreferrer"&gt;https://github.com/tikazyq/codao-code/tree/main/2022-10/go-task-scheduler&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>devops</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Talking Testing: the love and hate of Unit Tests</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Tue, 11 Oct 2022 10:13:05 +0000</pubDate>
      <link>https://dev.to/tikazyq/talking-testing-the-love-and-hate-of-unit-tests-5gl</link>
      <guid>https://dev.to/tikazyq/talking-testing-the-love-and-hate-of-unit-tests-5gl</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"No code is the best way to write secure and reliable applications."--Kelsey Hightower&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Many developers have probably more or less heard of &lt;strong&gt;Unit Tests&lt;/strong&gt;, and even written one and are familiar with it. However, in the volatile and fast changing environment, unit tests seem to be in an embarrassed situation. Developers know it is useful, but treat it with neglect. "The schedule is tight. What time do we have for unit tests?" Does it sound familiar?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Unit Test?
&lt;/h2&gt;

&lt;p&gt;Unit Test is some lines of code written by developers to validate whether their own functional codes can run as expected. If the code is not passed, it means the functional codes are problematic.&lt;/p&gt;

&lt;p&gt;This &lt;em&gt;self-testing&lt;/em&gt; method looks self-deceiving, similar to taking an exam with official answers. In validation area, this term is called &lt;strong&gt;White Box Test&lt;/strong&gt;. The counterpart of White Box Test is &lt;strong&gt;Black Box Test&lt;/strong&gt; which uses other methods to validate things. Unit Test is White Box Test, and higher-level testing methods such as &lt;em&gt;Integration Test,&lt;/em&gt; &lt;em&gt;End to End Test&lt;/em&gt;, and &lt;em&gt;UI Test&lt;/em&gt; are all Black Box Tests. &lt;strong&gt;Unit Tests only test code itself&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvasej6kp63qhfaxtpsrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvasej6kp63qhfaxtpsrs.png" alt="Testing Pyramid" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the benefits of unit testing?
&lt;/h2&gt;

&lt;p&gt;Unit testing is a very useful tool in &lt;strong&gt;Agile Development&lt;/strong&gt;. Some agile frameworks, such as eXtreme Programming (XP), requires that every feature must be covered by unit test cases. My previous article &lt;a href="https://dev.to/tikazyq/talking-agile-are-you-sure-your-team-is-practicing-agile-properly-1l5"&gt;Talking Agile: Are you sure your team is practicing Agile properly&lt;/a&gt; mentioned the importance of unit tests.&lt;/p&gt;

&lt;p&gt;Overall, unit tests have the main benefits as below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sustainable Quality Assurance&lt;/strong&gt;: Unit tests can make sure functional codes working as expected after being changed or refactored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Integration&lt;/strong&gt;: Unit tests are normally integrated in CI/CD pipelines, and will be triggered once the code is committed. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature Documentation&lt;/strong&gt;: Unit test cases can help new maintainers to get familiar with features and validation criteria.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, it looks like unit tests can bring a decent amount of benefits to software development. &lt;/p&gt;

&lt;h2&gt;
  
  
  Why give up unit testing practice?
&lt;/h2&gt;

&lt;p&gt;Unit testing can improve our product quality and test efficiency, so why so many developers are not fond of writing unit tests? According to JetBrains, only 57% of respondents write unit tests, and only 35% will integrate automatic tests in most of their projects.&lt;/p&gt;

&lt;p&gt;So why is that? There are several possible reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More work&lt;/strong&gt;: "I need to write 100 lines of code to test a feature with only 50. How can I get it done without overtime?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-confidence&lt;/strong&gt;: "As a senior developer, can you imagine there are bugs in my code?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not useful&lt;/strong&gt;: "Do I really need to write unit tests for a limited number of pages?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated testers&lt;/strong&gt;: "Here you have QAs. Aren't they the guys to find bugs?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, they are not strong enough to be correct. Firstly, the amount of codes for bug fixing is way more larger than that. Apart from that, statistically speaking, there is a chance for a highly skilled developer to make mistakes if they write enough code. Furthermore, simple modules referenced by many other ones can become too important to fail. Finally, the responsibility of QA is ensure the overall quality instead of the locality of robustness, and have you seen any high-quality products full of buggy components?&lt;/p&gt;

&lt;h2&gt;
  
  
  To be farsighted
&lt;/h2&gt;

&lt;p&gt;In fact, the main reason for the issue is the &lt;strong&gt;mindset&lt;/strong&gt;. Most of the time developers tend to think from an individual and short-term perspective, instead of considering &lt;strong&gt;long-term benefits&lt;/strong&gt;. As a good developer, he or she should develop something that is of most value in a most efficient way. Therefore, the fact that unit tests cannot bring visible benefits in a short period, makes it neglected by the majority.&lt;/p&gt;

&lt;p&gt;Unit testing is like reading books, which cannot quickly make people knowledgeable, wealthy or famous, but can be significantly effective in the long run. If unit testing becomes the corporate culture, it is much easier to deliver high-quality products and services. Promoting it requires something more: either fundamental agile processes like XP or TDD, or high-level management such as CTO or architects.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>agile</category>
    </item>
    <item>
      <title>Talking Data: What do we need for engaging data analytics?</title>
      <dc:creator>Marvin Zhang</dc:creator>
      <pubDate>Thu, 06 Oct 2022 15:58:46 +0000</pubDate>
      <link>https://dev.to/tikazyq/talking-data-what-do-we-need-for-engaging-data-analytics-3g9k</link>
      <guid>https://dev.to/tikazyq/talking-data-what-do-we-need-for-engaging-data-analytics-3g9k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"According to incomplete statistics, the proportion of white hair of data workers is higher than the average of the same age group."&lt;/em&gt; by a data worker&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Data&lt;/em&gt;, a familiar but mysterious word, has become a totem pursued by everyone. Managers love fancy data reports, data analysts are keen on building complicated statistical models, and salesmen take dashboards as compasses to see whether they can complete their KPIs. Since over ten years ago, the data industry has been developing fast, and there have been some novel yet formidable jargons, such as &lt;em&gt;Big Data&lt;/em&gt;, &lt;em&gt;Data Science&lt;/em&gt;, &lt;em&gt;Data Lake&lt;/em&gt;, &lt;em&gt;Data Mesh&lt;/em&gt;, &lt;em&gt;Data Governance&lt;/em&gt;. Yet the "traditional" terms are still abstruse:  &lt;em&gt;Data Warehouse&lt;/em&gt;, &lt;em&gt;Business Intelligence&lt;/em&gt;, &lt;em&gt;Data Mart&lt;/em&gt;, &lt;em&gt;Data Mining&lt;/em&gt;. What is more headachy is that many people are still unable to understand their relationship with recently popular concepts such as &lt;em&gt;Artificial Intelligence&lt;/em&gt;, &lt;em&gt;Machine Learning&lt;/em&gt;, and &lt;em&gt;Deep Learning&lt;/em&gt;. These hot buzzwords are the results of aggressive development in the data area.&lt;/p&gt;

&lt;h2&gt;
  
  
  Professional Doctor or Fortune Teller?
&lt;/h2&gt;

&lt;p&gt;Years ago, with the rapid development of the Internet industry, the bubble of the data industry was getting larger. Data, the by-product of the Internet applications, has large volumes and diversities. Data owners would like to get the most out of it and regard it as the gold mine. Therefore, data mining engineers became one of the most popular professionals. Later, a brand new yet more popular position &lt;em&gt;Data Scientist&lt;/em&gt; emerged as "the sexiest job in the 21st century".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg22whx0ou51ypnz5u1nt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg22whx0ou51ypnz5u1nt.jpg" alt="data-science" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The popularity of data scientists is its requirement for abilities and experience in various areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Programming Skills&lt;/strong&gt;: at least able to use Python or R to do data cleansing, analysis and modeling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mathematics and Statistics&lt;/strong&gt;: familiar with probability theory, calculus, and discrete mathematics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Knowledge&lt;/strong&gt;: deep understanding of market, process and macro trends in related areas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication Skills&lt;/strong&gt;: able to convey insights and analysis results in a human-friendly way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, the sexiness of data scientist comes from its high barriers, because people who are excellent in all above are quite rare. However, even for the versatile talents, many data science projects would ultimately fail. The 2 major issues are &lt;strong&gt;scale&lt;/strong&gt; and &lt;strong&gt;quality&lt;/strong&gt;. According to CrowdFlower, for data science projects in 2016, 80% of the time was spent on data collection and data cleansing, whilst only 20% was spent on analysis and modeling. This is a huge waste.&lt;/p&gt;

&lt;p&gt;As most system architectures of enterprises are unable to support large-scale and high-quality data processing pipelines, so some work has to be manually done, just like the so called "human intelligence". There are their data models with low prediction accuracy, so  their data scientists are labelled as "quacks" or "fortune tellers". To become a true "professional doctor", you need not only the "professional medical knowledge" (&lt;strong&gt;core abilities&lt;/strong&gt;), but also the support from "professional medical equipment" (&lt;strong&gt;architecture and process&lt;/strong&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Where is the way？
&lt;/h2&gt;

&lt;p&gt;Many data workers are complaining about the fierce competition in the data area. Fortunately, the situation seems to be improving. Data analysts had to manually analyze distribution charts for deep insights, but now they can use smart machine learning models to &lt;strong&gt;automate&lt;/strong&gt; this process. Traditional data analysis and modeling skills have been gradually becoming easy. For instance, &lt;a href="https://powerbi.microsoft.com/en-us/" rel="noopener noreferrer"&gt;Power BI&lt;/a&gt; or &lt;a href="https://www.tableau.com/" rel="noopener noreferrer"&gt;Tableau&lt;/a&gt; allow users to use a drag-and-drop low-code fashion to generate visual charts and models, whilst the old way is to import Python libraries such as &lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;pandas&lt;/a&gt;, &lt;a href="https://matplotlib.org/" rel="noopener noreferrer"&gt;matplotlib&lt;/a&gt; and &lt;a href="https://scikit-learn.org/" rel="noopener noreferrer"&gt;sklearn&lt;/a&gt; to do the same in &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter Notebook&lt;/a&gt;. Open-source projects &lt;a href="https://superset.apache.org/" rel="noopener noreferrer"&gt;Apache Superset&lt;/a&gt; and &lt;a href="https://www.metabase.com/" rel="noopener noreferrer"&gt;Metabase&lt;/a&gt; allow users to easily analyze data on the web pages. This is quite similar to the development of digital cameras, from the film cameras to digital cameras and to smartphone cameras used by everyone. With lower and lower technical barriers, the whole industry can be developing fast. "Everyone can be data analyst" will no longer be a fantasy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa13hldha72abj391k8d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa13hldha72abj391k8d1.png" alt="powerbi" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, &lt;strong&gt;data quality&lt;/strong&gt; is still an issue. Although we can automatically fill missing data and correct wrong data with intelligent machine learning models, most of the time manual interactions are still needed. The powerful AI models based on deep learning are trained from a large amount of manually annotated data. As a result, many organizations are promoting &lt;strong&gt;data standardization&lt;/strong&gt;, an essential part of &lt;strong&gt;data governance.&lt;/strong&gt; &lt;em&gt;Garbage in ,garbage out&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  No Silver Bullet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data automation&lt;/strong&gt; and &lt;strong&gt;data standardization&lt;/strong&gt; is the mega trend in the future development. However, we should not regard it as the only solution to deal with data issue, given its wide range of application areas. Apart from fundamental professional data skills, the more important skills for current data workers would &lt;strong&gt;data sensitivity&lt;/strong&gt; and &lt;strong&gt;logical thinking&lt;/strong&gt;, which would not be taught in textbooks or courses. They have to come from project experience. Some seemingly high-profile terms, may not be as useful as those &lt;strong&gt;simple and practical methodologies&lt;/strong&gt;. &lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>python</category>
    </item>
  </channel>
</rss>
