<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Flo Comuzzi</title>
    <description>The latest articles on DEV Community by Flo Comuzzi (@flopi).</description>
    <link>https://dev.to/flopi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F105190%2F57233942-823a-428d-9e9c-7bfab50b266e.png</url>
      <title>DEV Community: Flo Comuzzi</title>
      <link>https://dev.to/flopi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flopi"/>
    <language>en</language>
    <item>
      <title>PART 3 A Helm Chart for Ephemeral Environments</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Fri, 03 Oct 2025 17:46:29 +0000</pubDate>
      <link>https://dev.to/flopi/part-3-a-helm-chart-for-ephemeral-environments-4m4d</link>
      <guid>https://dev.to/flopi/part-3-a-helm-chart-for-ephemeral-environments-4m4d</guid>
      <description>&lt;h3&gt;
  
  
  Part 3 — Ephemeral Environments with Helm and Argo CD (Starry IDP)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: &lt;em&gt;We package a complete, isolated preview stack (two frontends, one backend, Redis, two DBs) into a single Helm chart&lt;/em&gt; and reconcile it with Argo CD. Each preview lives in its own namespace, gets secrets from External Secrets, exposes stable ingress with managed TLS, and tears down cleanly via prune/TTL—shrinking feedback loops and avoiding collisions in shared dev/stage.&lt;/p&gt;

&lt;p&gt;Note: I use the terms &lt;em&gt;preview&lt;/em&gt; and &lt;em&gt;ephemeral environment&lt;/em&gt; interchangeably; both refer to short-lived environment instances.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who this is for
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Platform teams&lt;/strong&gt; adopting GitOps on GKE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;App teams&lt;/strong&gt; wanting per-PR previews with minimal toil.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites and assumptions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GKE with the GCE Ingress controller and ManagedCertificate support available.&lt;/li&gt;
&lt;li&gt;Artifact Registry for images; Google Secret Manager for secrets.&lt;/li&gt;
&lt;li&gt;Argo CD installed in (or able to reach) the cluster.&lt;/li&gt;
&lt;li&gt;Optional: External Secrets Operator (GSM integration), Workload Identity.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Why previews (recap)
&lt;/h3&gt;

&lt;p&gt;Shared environments create collisions, noisy logs, version skew, and review friction. Previews isolate each change into its own namespace with predictable URLs, allowing fast, deterministic testing without blocking teammates.&lt;/p&gt;




&lt;h3&gt;
  
  
  Architecture (at a glance)
&lt;/h3&gt;

&lt;p&gt;A preview environment has two frontends that connect to a backend. The backend connects to a Redis instance and two databases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlhd9566f6qi4q6apk76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlhd9566f6qi4q6apk76.png" alt=" " width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  From PR to preview (sequence)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rg2zlgqb10hyd7cw1ar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rg2zlgqb10hyd7cw1ar.png" alt=" " width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Helm chart below is what Argo CD renders when creating an ephemeral/preview environment.&lt;/p&gt;




&lt;h3&gt;
  
  
  File hierarchy (Helm chart)
&lt;/h3&gt;

&lt;p&gt;You can create a Git repository with a similar file hierarchy. Each file is a template for a Kubernetes resource needed for an ephemeral environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ephemeral-environment-helm/
├─ Chart.yaml
├─ values.yaml
├─ values.preview.yaml            # optional per-env defaults (e.g., TTL/resource caps)
├─ values.schema.json             # optional, validates user-provided values
├─ charts/                        # optional, vendored dependencies
├─ templates/
│  ├─ _helpers.tpl               # names/labels templates
│  ├─ configmap.yaml             # non‑secret settings
│  ├─ externalsecret.yaml        # pulls secrets from GSM (or your vault)
│  ├─ serviceaccount.yaml        # workload identity
│  ├─ rbac-role.yaml             # minimal namespace Role
│  ├─ rbac-rolebinding.yaml
│  ├─ ingress.yaml               # GCE Ingress with hosts per app/env
│  ├─ managedcertificate.yaml    # TLS for Ingress hosts (GKE)
│  ├─ service-backend.yaml
│  ├─ deployment-backend.yaml
│  ├─ hpa-backend.yaml           # optional autoscaling
│  ├─ service-frontend1.yaml
│  ├─ deployment-frontend1.yaml
│  ├─ hpa-frontend1.yaml         # optional
│  ├─ service-frontend2.yaml
│  ├─ deployment-frontend2.yaml
│  ├─ hpa-frontend2.yaml         # optional
│  ├─ redis-statefulset.yaml
│  ├─ redis-service.yaml
│  ├─ db1-statefulset.yaml       # ephemeral DB1 (init/fixtures optional)
│  ├─ db1-service.yaml
│  ├─ db2-statefulset.yaml       # ephemeral DB2
│  ├─ db2-service.yaml
│  ├─ cronjob-cleanup.yaml       # TTL enforcement / garbage collection
│  ├─ networkpolicy.yaml         # optional, if using NetworkPolicies
│  └─ NOTES.txt                  # optional Helm install notes
└─ README.md                     # chart overview and values reference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
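&lt;p&gt;For reference, the &lt;code&gt;Chart.yaml&lt;/code&gt; at the chart root can stay minimal. The name, description, and versions below are placeholders, not values from a real chart:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v2
name: ephemeral-environment
description: Preview/ephemeral environment stack (two frontends, backend, Redis, two DBs)
type: application
version: 0.1.0        # chart version, bumped on template changes
appVersion: "1.0.0"   # informational; per-preview image tags come from values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;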



&lt;h3&gt;
  
  
  Minimal configuration snippets
&lt;/h3&gt;

&lt;p&gt;Small, copy‑pasteable examples to get a preview running.&lt;/p&gt;

&lt;h4&gt;
  
  
  values.yaml (minimal)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;global&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ttlMinutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;starry.env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;preview-123&lt;/span&gt;

&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gce&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sample-be.preview-123.starry.mycompany.com&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sample-fe-1.preview-123.starry.mycompany.com&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sample-fe-2.preview-123.starry.mycompany.com&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;managedCertificate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/myproj/sample-be&lt;/span&gt;
    &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pr-123-abcd123&lt;/span&gt;
    &lt;span class="na"&gt;pullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;REDIS_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis://starry-redis-master:6379&lt;/span&gt;
    &lt;span class="na"&gt;DB1_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres://db1:5432/app&lt;/span&gt;
    &lt;span class="na"&gt;DB2_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres://db2:5432/app&lt;/span&gt;

&lt;span class="na"&gt;frontend1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/myproj/sample-fe-1&lt;/span&gt;
    &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pr-123-abcd123&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;VITE_API_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://sample-be.preview-123.starry.mycompany.com&lt;/span&gt;

&lt;span class="na"&gt;frontend2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/myproj/sample-fe-2&lt;/span&gt;
    &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pr-123-abcd123&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;VITE_API_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://sample-be.preview-123.starry.mycompany.com&lt;/span&gt;

&lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;databases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;db1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;db2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;externalSecret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gsm-store&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretStore&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
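&lt;p&gt;To show how these values flow into the templates, here is a sketch of what &lt;code&gt;deployment-backend.yaml&lt;/code&gt; might look like. The helper names (&lt;code&gt;starry.fullname&lt;/code&gt;, &lt;code&gt;starry.labels&lt;/code&gt;) are illustrative and would be defined in &lt;code&gt;_helpers.tpl&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "starry.fullname" . }}-backend
  labels: {{- include "starry.labels" . | nindent 4 }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "starry.fullname" . }}-backend
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "starry.fullname" . }}-backend
    spec:
      serviceAccountName: {{ include "starry.fullname" . }}
      containers:
        - name: backend
          image: "{{ .Values.backend.image.repository }}:{{ .Values.backend.image.tag }}"
          imagePullPolicy: {{ .Values.backend.image.pullPolicy }}
          ports:
            - containerPort: {{ .Values.backend.service.port }}
          env:
            # render the free-form env map from values.yaml
            {{- range $key, $value := .Values.backend.env }}
            - name: {{ $key }}
              value: {{ $value | quote }}
            {{- end }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;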



&lt;h4&gt;
  
  
  values.schema.json (guardrails excerpt)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft-07/schema#"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"global"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ttlMinutes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maximum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ttlMinutes"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"backend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"tag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tag"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Argo CD Application (preview)
&lt;/h4&gt;

&lt;p&gt;Starry can make a request to the Kubernetes API to create an &lt;code&gt;Application&lt;/code&gt; resource that will manage a preview environment instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Application&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;preview-123&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
  &lt;span class="na"&gt;finalizers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;resources-finalizer.argocd.argoproj.io&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;starry&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://github.com/myorg/environment-helm.git&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;helm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueFiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;values.yaml&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://kubernetes.default.svc&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;preview-123&lt;/span&gt;
  &lt;span class="na"&gt;syncPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;automated&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;prune&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;selfHeal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;syncOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CreateNamespace=true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
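&lt;p&gt;Rather than committing a values file per preview, Starry can inject the PR-specific bits when it creates the &lt;code&gt;Application&lt;/code&gt;. This is a sketch of the &lt;code&gt;spec.source.helm&lt;/code&gt; section using Argo CD's &lt;code&gt;parameters&lt;/code&gt; overrides; the keys mirror the values file above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;helm:
  valueFiles:
    - values.yaml
  parameters:                       # override per-preview image tags at sync time
    - name: backend.image.tag
      value: pr-123-abcd123
    - name: frontend1.image.tag
      value: pr-123-abcd123
    - name: frontend2.image.tag
      value: pr-123-abcd123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;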



&lt;h4&gt;
  
  
  ExternalSecret (GSM)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;preview-123&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;refreshInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1h&lt;/span&gt;
  &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gsm-store&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretStore&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD&lt;/span&gt;
      &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;projects/123/secrets/db-password/versions/latest&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;REDIS_PASSWORD&lt;/span&gt;
      &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;projects/123/secrets/redis-password/versions/latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
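&lt;p&gt;When &lt;code&gt;ingress.tls.managedCertificate&lt;/code&gt; is enabled, &lt;code&gt;managedcertificate.yaml&lt;/code&gt; can render a GKE &lt;code&gt;ManagedCertificate&lt;/code&gt; covering the preview hosts, and the Ingress then references it via the &lt;code&gt;networking.gke.io/managed-certificates&lt;/code&gt; annotation. The resource name below is illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: preview-123-cert
  namespace: preview-123
spec:
  domains:   # must match the Ingress hosts exactly
    - sample-be.preview-123.starry.mycompany.com
    - sample-fe-1.preview-123.starry.mycompany.com
    - sample-fe-2.preview-123.starry.mycompany.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;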






&lt;h3&gt;
  
  
  Security hardening
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Workload Identity on ServiceAccounts to access GSM; least‑privilege IAM on secrets.&lt;/li&gt;
&lt;li&gt;Namespace‑scoped Roles/RoleBindings; deny cluster‑wide privileges by default.&lt;/li&gt;
&lt;li&gt;No secrets in Git; inject via External Secrets only.&lt;/li&gt;
&lt;li&gt;Optional NetworkPolicies to confine pod traffic and egress.&lt;/li&gt;
&lt;/ul&gt;
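&lt;p&gt;As a sketch of the optional &lt;code&gt;networkpolicy.yaml&lt;/code&gt;, a default-deny ingress policy combined with a same-namespace allow keeps preview traffic confined to its own namespace. The selectors here are assumptions, not the chart's actual labels:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: preview-123
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # allow traffic only from pods in this namespace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;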

&lt;h3&gt;
  
  
  Cost and quotas
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Defaults: &lt;code&gt;ttlMinutes: 30&lt;/code&gt;, HPA &lt;code&gt;minReplicas: 1&lt;/code&gt;, narrow requests/limits for density.&lt;/li&gt;
&lt;li&gt;Cap concurrent previews per team/namespace with ResourceQuota/LimitRange.&lt;/li&gt;
&lt;li&gt;Cleanup CronJob as a watchdog for stragglers.&lt;/li&gt;
&lt;/ul&gt;
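&lt;p&gt;The watchdog &lt;code&gt;cronjob-cleanup.yaml&lt;/code&gt; can be as simple as a CronJob that deletes the Argo CD &lt;code&gt;Application&lt;/code&gt; once the namespace is older than &lt;code&gt;ttlMinutes&lt;/code&gt;. This sketch assumes an image containing &lt;code&gt;kubectl&lt;/code&gt; and a &lt;code&gt;cleanup&lt;/code&gt; ServiceAccount allowed to read namespaces and delete Applications; adapt it to your setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: batch/v1
kind: CronJob
metadata:
  name: preview-ttl-cleanup
spec:
  schedule: "*/5 * * * *"           # check every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup
          restartPolicy: Never
          containers:
            - name: cleanup
              image: bitnami/kubectl:latest   # any image with kubectl works
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # delete the Application once the namespace exceeds its TTL
                  created=$(kubectl get ns {{ .Release.Namespace }} -o jsonpath='{.metadata.creationTimestamp}')
                  age=$(( $(date +%s) - $(date -d "$created" +%s) ))   # GNU date assumed
                  if [ "$age" -gt $(( {{ .Values.global.ttlMinutes }} * 60 )) ]; then
                    kubectl delete application {{ .Release.Name }} -n argocd
                  fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;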

&lt;h3&gt;
  
  
  Observability and troubleshooting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Standard labels (&lt;code&gt;app.kubernetes.io/*&lt;/code&gt;, &lt;code&gt;starry.env&lt;/code&gt;) and probes for all pods.&lt;/li&gt;
&lt;li&gt;Common issues and quick checks:

&lt;ul&gt;
&lt;li&gt;Cert Pending: DNS/host mismatch or quota; verify &lt;code&gt;ManagedCertificate&lt;/code&gt; status.&lt;/li&gt;
&lt;li&gt;404 after sync: NEG endpoints warming; check Service → Endpoints and pod readiness.&lt;/li&gt;
&lt;li&gt;Image pull back‑off: tag/registry typo; confirm Artifact Registry permissions.&lt;/li&gt;
&lt;li&gt;RBAC denied: verify namespace Role/RoleBinding and ServiceAccount name.&lt;/li&gt;
&lt;li&gt;Quota exceeded: review ResourceQuota and HPA limits.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;By shipping previews as a Helm chart reconciled by Argo CD, we get isolation by default, secure secret handling, fast spin‑up/tear‑down, and a fully auditable GitOps trail. CI stays simple—build and push images on PR branches—while the platform provides predictable URLs, ephemeral data stores, and least‑privilege access.&lt;/p&gt;

&lt;p&gt;In future articles, we’ll automate the PR lifecycle end‑to‑end: auto‑create on PR open, auto‑destroy on merge/close, add quotas and policies for cost control, layer optional E2E tests and data seeding for production‑like previews, and discuss Terraform for infrastructure setup.&lt;/p&gt;

</description>
      <category>idp</category>
      <category>cloud</category>
      <category>devex</category>
      <category>devops</category>
    </item>
    <item>
      <title>PART 2 Starry: An Internal Developer Platform (IDP) for Ephemeral Environments</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Mon, 29 Sep 2025 21:07:24 +0000</pubDate>
      <link>https://dev.to/flopi/part-2-starry-an-internal-developer-platform-idp-for-ephemeral-environments-192b</link>
      <guid>https://dev.to/flopi/part-2-starry-an-internal-developer-platform-idp-for-ephemeral-environments-192b</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/flopi/starry-an-internal-developer-platform-idp-for-ephemeral-environments-part-1-26ac"&gt;last post&lt;/a&gt;, I introduced &lt;em&gt;Starry&lt;/em&gt; as a custom internal developer platform (IDP) that creates ephemeral environments for merge requests.  The platform uses well known, interoperable tools (Kubernetes, Helm, ArgoCD, Terraform) which makes the solution practical and adaptable to a range of needs. In this article, I will dive deeper into the architecture, a simplified CICD flow, and the relationship between &lt;em&gt;Starry&lt;/em&gt; and Argo CD. &lt;/p&gt;




&lt;p&gt;Let's simplify the system further by supposing that a user will manually create environments through the IDP. In a follow-up post I will introduce what is required to fully automate environment creation from merge requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Push To Ephemeral Environment
&lt;/h2&gt;

&lt;p&gt;I am running CICD in GitHub for the repos &lt;code&gt;sample-be&lt;/code&gt;, &lt;code&gt;sample-fe-1&lt;/code&gt;, and &lt;code&gt;sample-fe-2&lt;/code&gt;. When I have a merge request open and push to the feature branch, an image is built and pushed to Artifact Registry. This approach is deliberately simple: the CICD pipeline, triggered by a push to a branch with an open pull request, only needs to build a container image, tag it properly, and push it to the image registry. The developer can then create an environment with any tag pointing to &lt;code&gt;sample-be&lt;/code&gt;, &lt;code&gt;sample-fe-1&lt;/code&gt;, and &lt;code&gt;sample-fe-2&lt;/code&gt; images.&lt;/p&gt;
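&lt;p&gt;As a sketch (the action versions, registry path, and secret name are placeholders, not the actual pipeline), the GitHub Actions workflow for each repo can be as small as this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;name: build-preview-image
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  build-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: google-github-actions/setup-gcloud@v2
      - name: Build and push
        run: |
          # tag encodes the PR number and short SHA, e.g. pr-123-abcd123
          TAG="pr-${{ github.event.pull_request.number }}-${GITHUB_SHA::7}"
          gcloud auth configure-docker us-docker.pkg.dev --quiet
          docker build -t "us-docker.pkg.dev/myproj/sample-be:${TAG}" .
          docker push "us-docker.pkg.dev/myproj/sample-be:${TAG}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;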

&lt;h3&gt;
  
  
  Ephemeral Environment Management Workflow
&lt;/h3&gt;

&lt;p&gt;A user can go to the &lt;em&gt;Starry&lt;/em&gt; Internal Developer Platform (IDP) and create an ephemeral environment by passing in an image tag for the &lt;code&gt;sample-be&lt;/code&gt; backend. Optionally, a user may also include &lt;code&gt;sample-fe-1&lt;/code&gt;, &lt;code&gt;sample-fe-2&lt;/code&gt;, or both by passing in image tags for those. The backend and frontends become available at ephemeral URLs like &lt;code&gt;sample-env-backend.ephemeral.mycompany.com&lt;/code&gt; and &lt;code&gt;sample-env-fe-1.ephemeral.mycompany.com&lt;/code&gt;. The user can then run tests manually by clicking around and checking endpoints, or the IDP can offer additional features like automated end-to-end tests that span the backend and frontends. Once an environment reaches its time-to-live (&lt;code&gt;ttl&lt;/code&gt;), say 30 minutes after creation, it is destroyed automatically; the user can also trigger deletion earlier through &lt;em&gt;Starry&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1ahn37duomg97uuh212.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1ahn37duomg97uuh212.png" alt="CICD and Environment flow" width="800" height="894"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How is the &lt;em&gt;Starry&lt;/em&gt; application itself managed in Kubernetes? How does &lt;em&gt;Starry&lt;/em&gt; manage an environment? Understanding the Argo CD setup is key, so let's go into that next.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;starry-helm&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;To understand &lt;em&gt;Starry&lt;/em&gt;, it is important to understand how it lives in Kubernetes. To begin with, &lt;em&gt;Starry&lt;/em&gt; is a Python FastAPI app that uses some frontend technologies and can be bundled into an image like any other app. We have a Dockerfile, and the CICD pipeline builds an image and pushes it to the image registry when there is a push to the &lt;code&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;Here is a diagram of the Kubernetes resources the &lt;em&gt;Starry&lt;/em&gt; Helm chart templates out:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F102lh95as5jcwryw7qqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F102lh95as5jcwryw7qqa.png" alt=" " width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Starry&lt;/em&gt; Helm chart provisions a simple ingress–service–deployment architecture suited for ephemeral use: a stateless web app runs as a &lt;code&gt;Deployment&lt;/code&gt; scaled by a &lt;code&gt;HorizontalPodAutoscaler&lt;/code&gt; (2–10 replicas) and configured via a &lt;code&gt;ConfigMap&lt;/code&gt;, using a &lt;code&gt;ServiceAccount&lt;/code&gt; annotated for (GCP) Workload Identity. It’s exposed internally by a &lt;code&gt;Service&lt;/code&gt; (ClusterIP with NEG) and externally by a GCE &lt;code&gt;Ingress&lt;/code&gt; on host &lt;code&gt;starry.mycompany.com&lt;/code&gt;, with TLS handled by a &lt;code&gt;ManagedCertificate&lt;/code&gt; so the Google L7 HTTP(S) Load Balancer terminates TLS and forwards HTTP to the &lt;code&gt;Service&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Caching/coordination is provided by a single‑replica Redis &lt;code&gt;StatefulSet&lt;/code&gt; using &lt;code&gt;emptyDir&lt;/code&gt; storage (no persistence) and reached via the &lt;code&gt;starry-redis-master&lt;/code&gt; ClusterIP (and a headless service for stable DNS). A &lt;code&gt;CronJob&lt;/code&gt; runs cleanup tasks and talks to the app through the in‑cluster &lt;code&gt;Service&lt;/code&gt;. RBAC (&lt;code&gt;ClusterRole&lt;/code&gt;, &lt;code&gt;ClusterRoleBinding&lt;/code&gt;, &lt;code&gt;Role&lt;/code&gt;, &lt;code&gt;RoleBinding&lt;/code&gt;) grants the app and Argo CD the required permissions.&lt;/p&gt;

&lt;p&gt;Networking flows &lt;code&gt;Ingress&lt;/code&gt; → &lt;code&gt;Service&lt;/code&gt; (NEG) → pods by label selector, pods reach Redis via service DNS, and outbound internet access uses the cluster’s standard egress/NAT. This favors fast spin‑up/tear‑down with minimal state and operational overhead.&lt;/p&gt;
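&lt;p&gt;To make that concrete, here is a hedged sketch of the knobs such a chart might expose in its &lt;code&gt;values.yaml&lt;/code&gt;. The field names and values are illustrative assumptions, not the actual chart:&lt;/p&gt;

```yaml
# Illustrative values.yaml for the starry chart (all names are assumptions)
image:
  repository: us-docker.pkg.dev/my-project/starry/starry
  tag: "1.4.2"
autoscaling:
  minReplicas: 2
  maxReplicas: 10
ingress:
  host: starry.mycompany.com
  managedCertificate: true        # provision a GCE ManagedCertificate for TLS
redis:
  persistence: false              # emptyDir only; Redis state is disposable
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: starry@my-project.iam.gserviceaccount.com
cleanup:
  schedule: "*/5 * * * *"         # CronJob that deletes expired environments
```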

&lt;p&gt;Note that the &lt;code&gt;CronJob&lt;/code&gt; cleanup tasks work by deleting environment resources for any environment that has exceeded the time-to-live setting.&lt;/p&gt;
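&lt;p&gt;The TTL check at the heart of that cleanup reduces to a timestamp comparison. A minimal Python sketch (the function name and the idea of a stored creation timestamp per environment are assumptions, not &lt;em&gt;Starry&lt;/em&gt;'s actual code):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def is_expired(created_at: datetime, ttl_minutes: int) -> bool:
    """Return True once an environment has outlived its time-to-live."""
    return datetime.now(timezone.utc) - created_at >= timedelta(minutes=ttl_minutes)

# An environment created 45 minutes ago with a 30-minute TTL is expired.
created = datetime.now(timezone.utc) - timedelta(minutes=45)
print(is_expired(created, ttl_minutes=30))  # True
```

&lt;p&gt;Each CronJob run would list environments, apply this check, and delete the expired ones.&lt;/p&gt;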

&lt;h3&gt;
  
  
  Argo CD and the &lt;code&gt;starry-helm&lt;/code&gt; Chart
&lt;/h3&gt;

&lt;p&gt;To let Argo CD manage &lt;em&gt;Starry&lt;/em&gt; and its environments safely, the Helm chart is designed to be fully declarative and follow GitOps best practices. It gives just enough permissions for Argo CD and the &lt;em&gt;Starry&lt;/em&gt; app to do their jobs—nothing more.&lt;/p&gt;

&lt;p&gt;The chart includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;RBAC setup: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;Role&lt;/code&gt; and &lt;code&gt;RoleBinding&lt;/code&gt; in the &lt;code&gt;argocd&lt;/code&gt; namespace let the &lt;em&gt;Starry&lt;/em&gt; app create and update Argo CD Application resources—these represent the ephemeral environments.&lt;/li&gt;
&lt;li&gt;A read-only &lt;code&gt;ClusterRoleBinding&lt;/code&gt; lets the app watch core Kubernetes resources so it can monitor environment status.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Argo CD sync hints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sync annotations are added to guide Argo CD on how to apply changes safely:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Replace=true&lt;/code&gt; on the Redis &lt;code&gt;StatefulSet&lt;/code&gt; to handle immutable field changes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PruneLast=true&lt;/code&gt; on the &lt;code&gt;HorizontalPodAutoscaler&lt;/code&gt; so it’s cleaned up last.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ServerSideApply=true&lt;/code&gt; and related annotations on the cleanup &lt;code&gt;CronJob&lt;/code&gt; to preserve managed fields.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Stable naming and labeling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes labels and names follow predictable patterns (like &lt;code&gt;app.kubernetes.io/name&lt;/code&gt;) so Argo CD can track changes cleanly and avoid drift.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
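&lt;p&gt;These sync hints are plain annotations on the rendered manifests. The annotation key and option values below are real Argo CD sync options; their placement here is an illustrative sketch of what the chart might template out:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: starry-redis
  annotations:
    argocd.argoproj.io/sync-options: Replace=true   # recreate on immutable-field changes
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: starry
  annotations:
    argocd.argoproj.io/sync-options: PruneLast=true  # clean up after everything else
```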

&lt;h2&gt;
  
  
  &lt;code&gt;argocd-apps&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;In the &lt;code&gt;argocd-apps&lt;/code&gt; repository, we declare the Argo CD setup. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd21yky33iy58btkwo485.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd21yky33iy58btkwo485.png" alt=" " width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How the repo is structured
&lt;/h3&gt;

&lt;p&gt;This repo uses Argo CD’s App of Apps pattern to keep environments simple and consistent. There’s one root &lt;code&gt;Application&lt;/code&gt; per environment (dev and prod). Each root points at a Kustomize overlay directory that lists the child Applications to deploy. This keeps environment differences limited to overlays, while the core app definitions live in a shared base.&lt;/p&gt;

&lt;p&gt;At the project level, an &lt;code&gt;AppProject&lt;/code&gt; named &lt;code&gt;starry&lt;/code&gt; defines what Git repos are allowed, where resources can be deployed, and who can operate them. Think of the &lt;code&gt;AppProject&lt;/code&gt; as the guardrails: it scopes access and keeps all Applications operating within an approved perimeter.&lt;/p&gt;
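&lt;p&gt;A minimal sketch of such an &lt;code&gt;AppProject&lt;/code&gt;; the repo URLs and namespace pattern are placeholders, while the field names are standard Argo CD ones:&lt;/p&gt;

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: starry
  namespace: argocd
spec:
  sourceRepos:                      # only these Git repos may be deployed from
    - https://gitlab.com/mycompany/starry-helm.git
    - https://gitlab.com/mycompany/argocd-apps.git
  destinations:                     # and only into these cluster/namespace targets
    - server: https://kubernetes.default.svc
      namespace: "starry-*"         # preview namespaces share a predictable prefix
```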

&lt;h3&gt;
  
  
  What gets deployed first (and why)
&lt;/h3&gt;

&lt;p&gt;Sync waves ensure the right order. Operators that everything else depends on come first: External Secrets and cert-manager are deployed early so secrets and certificates exist before workloads need them. External Secrets pulls a GitLab token from Google Secret Manager and materializes an Argo CD “repository” Secret; this gives Argo CD access to your Git repos without hardcoding credentials.&lt;/p&gt;
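&lt;p&gt;As a sketch, the ExternalSecret behind that repository credential might look like the following. The store name, secret names, and repo URL are assumptions; the &lt;code&gt;argocd.argoproj.io/secret-type: repository&lt;/code&gt; label is what makes Argo CD treat the resulting Secret as a repo credential:&lt;/p&gt;

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: argocd-repo-creds
  namespace: argocd
spec:
  secretStoreRef:
    name: gcp-secret-manager        # a ClusterSecretStore defined elsewhere
    kind: ClusterSecretStore
  target:
    template:
      metadata:
        labels:
          argocd.argoproj.io/secret-type: repository
      data:
        type: git
        url: https://gitlab.com/mycompany/argocd-apps.git
        username: git
        password: "{{ .token }}"
  data:
    - secretKey: token
      remoteRef:
        key: gitlab-argocd-token    # secret name in Google Secret Manager
```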

&lt;h3&gt;
  
  
  Workload and automation
&lt;/h3&gt;

&lt;p&gt;The main workload, &lt;code&gt;starry&lt;/code&gt;, is deployed via its Helm chart. Image updates are automated by Argo CD Image Updater, which watches your container registry and, when a new allowed tag appears, writes the tag back to Git. That Git change drives Argo CD to reconcile the updated chart, keeping deployments declarative and auditable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling environment drift and platform quirks
&lt;/h3&gt;

&lt;p&gt;Some Kubernetes fields are mutated by the platform (e.g., GKE Autopilot) or are immutable in StatefulSets. &lt;code&gt;ignoreDifferences&lt;/code&gt; rules are applied where needed so Argo CD focuses on meaningful drift and doesn’t fight expected, safe mutations. Overall, the combination of App of Apps, overlays, operators-first ordering, External Secrets, and Image Updater results in a clean, environment-aware, and fully GitOps-driven deployment flow.&lt;/p&gt;
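&lt;p&gt;For illustration, an &lt;code&gt;ignoreDifferences&lt;/code&gt; stanza on an Application might look like this; the exact paths depend on what your platform mutates, so treat these as assumptions:&lt;/p&gt;

```yaml
spec:
  ignoreDifferences:
    - group: apps
      kind: StatefulSet
      jsonPointers:
        - /spec/volumeClaimTemplates            # immutable after creation
    - group: apps
      kind: Deployment
      jqPathExpressions:
        - .spec.template.spec.containers[].resources  # Autopilot may adjust requests
```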

&lt;h2&gt;
  
  
  Starry-Managed Environments via Argo CD
&lt;/h2&gt;

&lt;p&gt;When a new ephemeral environment is requested (for example, from a pull/merge request), the &lt;em&gt;starry&lt;/em&gt; service programmatically creates an Argo CD &lt;code&gt;Application&lt;/code&gt; custom resource by talking directly to the Kubernetes API. Running in‑cluster with a dedicated &lt;code&gt;ServiceAccount&lt;/code&gt;, it authenticates using the standard Kubernetes client flow and has RBAC to manage &lt;code&gt;applications.argoproj.io&lt;/code&gt; resources in the &lt;code&gt;argocd&lt;/code&gt; namespace. The service calculates a unique preview name (e.g., &lt;code&gt;starry-pr-123&lt;/code&gt;), target namespace, and value overrides (hostnames, image tag, replica counts), then submits a new &lt;code&gt;Application&lt;/code&gt; object pointing to the environment Helm chart and Git revision for that preview. The &lt;code&gt;spec.destination.namespace&lt;/code&gt; is set to the preview namespace, and the Helm/Kustomize parameters embed any per‑environment differences.&lt;/p&gt;
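&lt;p&gt;As a sketch, the &lt;code&gt;Application&lt;/code&gt; manifest the service submits might be built like this. The chart repo URL, value keys, and labels are illustrative assumptions; the finalizer and &lt;code&gt;syncPolicy&lt;/code&gt; fields are standard Argo CD ones:&lt;/p&gt;

```python
def build_preview_application(pr_number: int, image_tag: str) -> dict:
    """Build an Argo CD Application manifest for one preview environment.

    Hypothetical sketch: repo URL, Helm value names, and labels are made up;
    only the Application schema itself mirrors Argo CD's API.
    """
    name = f"starry-pr-{pr_number}"
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": name,
            "namespace": "argocd",
            # With this finalizer, deleting the Application cascades to
            # everything Argo CD deployed for the preview.
            "finalizers": ["resources-finalizer.argocd.argoproj.io"],
            "labels": {"starry/preview": str(pr_number)},
        },
        "spec": {
            "project": "starry",
            "source": {
                "repoURL": "https://gitlab.com/mycompany/env-chart.git",
                "targetRevision": "main",
                "path": "chart",
                "helm": {
                    "parameters": [
                        {"name": "backend.image.tag", "value": image_tag},
                        {"name": "ingress.host",
                         "value": f"{name}.ephemeral.mycompany.com"},
                    ]
                },
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": name,          # one namespace per preview
            },
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }

app = build_preview_application(123, "sha-abc123")
print(app["metadata"]["name"])  # starry-pr-123
```

&lt;p&gt;In-cluster, a dict like this could be submitted with the official Kubernetes Python client via &lt;code&gt;CustomObjectsApi.create_namespaced_custom_object(group="argoproj.io", version="v1alpha1", namespace="argocd", plural="applications", body=app)&lt;/code&gt;.&lt;/p&gt;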

&lt;p&gt;Once the &lt;code&gt;Application&lt;/code&gt; object is created, the Argo CD controller does the heavy lifting. It watches the new &lt;code&gt;Application&lt;/code&gt;, pulls the referenced Git content, renders the manifests (via Helm/Kustomize), and applies them to the preview namespace. Sync waves ensure platform prerequisites (like secrets or certs, if included) are present before the workload rolls out. If needed, the starry service can pre‑create the target namespace or supporting resources, but generally it relies on Argo CD with &lt;code&gt;CreateNamespace=true&lt;/code&gt; to keep the flow simple and declarative.&lt;/p&gt;

&lt;p&gt;The lifecycle is symmetrical. To tear down a preview, the starry service deletes the corresponding &lt;code&gt;Application&lt;/code&gt; object through the Kubernetes API. Because Applications are created with Argo CD finalizers and pruning enabled, Argo CD prunes everything it deployed for that preview and the environment disappears cleanly. To keep operations safe and repeatable, the service uses idempotent “apply”-style calls (or server‑side apply), sets labels/annotations that encode the preview context, and ensures the &lt;code&gt;ServiceAccount&lt;/code&gt; only holds the minimal RBAC needed to create, update, and delete Application resources and the preview namespace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Starry&lt;/em&gt; keeps ephemeral environments simple, fast, and consistent by leaning on proven building blocks: Kubernetes for runtime, Helm for packaging, and Argo CD for declarative GitOps. CI’s job is minimal—build and push images when a branch with an open PR changes. Users then create environments in the IDP by choosing which images to run (backend and optional frontends), and &lt;em&gt;Starry&lt;/em&gt; provisions a short‑lived, isolated stack at predictable URLs. Because the chart is stateless by default and scoped to a unique namespace, environments spin up quickly and tear down cleanly when TTL is reached or the user deletes them. We will go more into the ephemeral environment Helm chart in another article.&lt;/p&gt;

&lt;p&gt;Under the hood, Argo CD provides the control loop while the chart and RBAC give just enough permission for &lt;em&gt;Starry&lt;/em&gt; to create/update Application CRs safely. Sync waves ensure platform dependencies (External Secrets, cert‑manager) land first, and optional image automation (Argo CD Image Updater) keeps updates auditable by writing tags back to Git. Drift rules suppress harmless platform mutations so reconciles stay focused on meaningful changes.&lt;/p&gt;

&lt;p&gt;This approach is practical and adaptable: each piece is interchangeable, the flow is fully declarative, and environments are reproducible across dev and prod. Today, creation is explicit and user‑driven for clarity; in a follow‑up, we’ll layer on automation to go from “merge request opened” to “environment ready” without manual steps. We will also go over what an ephemeral environment Helm chart could look like.&lt;/p&gt;

</description>
      <category>idp</category>
      <category>devex</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Starry: An Internal Developer Platform (IDP) for Ephemeral Environments Part 1</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Mon, 15 Sep 2025 20:22:07 +0000</pubDate>
      <link>https://dev.to/flopi/starry-an-internal-developer-platform-idp-for-ephemeral-environments-part-1-26ac</link>
      <guid>https://dev.to/flopi/starry-an-internal-developer-platform-idp-for-ephemeral-environments-part-1-26ac</guid>
      <description>&lt;p&gt;Suppose that your company has a large backend application that supplies information to several frontend applications. Right now, when a developer makes changes to one of the applications, they must merge their changes to a &lt;code&gt;develop&lt;/code&gt; branch to deploy the changes to the development environment shared by all developers. This setup is fraught with issues. A developer could deploy a change that breaks the development environment thereby getting in the way of other developers' testing until the change is fixed or removed from the branch. Developers must test their changes along with other changes at the same time so it could be hard to isolate where errors are coming from. How many issues can you think of when testing a set of apps in limited environments like only development, staging, and prod?&lt;/p&gt;

&lt;p&gt;By creating a short-lived ephemeral environment based on a developer's feature branch, changes can be isolated for better testing. That's where an internal developer platform (IDP) comes in. What is an IDP?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An Internal Developer Platform (IDP) is built by a platform team to build golden paths and enable developer self-service. An IDP consists of many different techs and tools, glued together in a way that lowers cognitive load on developers without abstracting away context and underlying technologies. Following best practices, platform teams treat their platform as a product and build it based on user research, maintaining and continuously improving it. &lt;a href="https://internaldeveloperplatform.org/what-is-an-internal-developer-platform/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An IDP makes it easy for a developer to perform some task that combines actions that interact with various systems. In our case, I will walk you through an IDP design that enables easy creation of ephemeral environments. &lt;/p&gt;

&lt;p&gt;So, let's go back to the setup I presented in the beginning. Your company has a large backend application that supplies information to several frontend applications. Users interact with the frontends through their browsers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyl6tkqar2o6bphl78zm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyl6tkqar2o6bphl78zm.png" alt=" " width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We could design a system, we'll call it &lt;em&gt;starry&lt;/em&gt;, that creates ephemeral environments such that, for an environment called &lt;code&gt;test-00&lt;/code&gt;, a developer can access the apps at &lt;code&gt;test-00.{app-name}.starry.mycompany.com&lt;/code&gt;. The &lt;code&gt;test-00&lt;/code&gt; environment could be testing code changes made to the &lt;code&gt;sample-be&lt;/code&gt; repo under the &lt;code&gt;feat/myfeature-00&lt;/code&gt; branch. To see how those changes affect the frontends, instances of the frontends are also spun up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzjll9u55ucq8bb1oz7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzjll9u55ucq8bb1oz7w.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tooling and Artifacts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Kubernetes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt; makes spinning up applications and managing resources simpler, and it has a rich ecosystem of tools and resources, so let's use it to manage our system. Suppose we are running on GCP: our GKE cluster will interact with Artifact Registry for backend and frontend container images and with Secret Manager for secrets used by &lt;em&gt;starry&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bmbuir88edxklz1ie31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bmbuir88edxklz1ie31.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This has been simplified for the sake of this article. For example, VPCs and networking are not included here.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Helm
&lt;/h3&gt;

&lt;p&gt;To encapsulate application definitions, we will use &lt;strong&gt;Helm charts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1) For environments, we can create a custom Helm chart that manages backend and frontend deployments, ingress, databases, and cache. The Helm chart will also manage the services needed for the frontend apps to connect to the backend as well as secrets and service accounts. The chart creates a unique namespace for each environment. The environment Helm chart defines a single instance of an environment.&lt;/p&gt;

&lt;p&gt;Any environment, whether that be development, staging, production, or an ephemeral one, has two frontend apps, one backend app, two database instances and one cache instance. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceqo8bads1moizhbp14d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceqo8bads1moizhbp14d.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) We will also need a Helm chart for our &lt;em&gt;starry&lt;/em&gt; application which will provide a frontend for our users to create environments, view details, and delete environments after testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  ArgoCD
&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;starry&lt;/em&gt; application as well as the environment applications have Kubernetes resources that have to be managed. When you merge changes to &lt;em&gt;starry&lt;/em&gt;'s &lt;code&gt;main&lt;/code&gt; branch, you want the changes reflected in production. When you delete an ephemeral environment, all associated resources, like config maps, need to be deleted, not just deployments, and we want to easily track the status of individual resources. Even for platform developers, we want an interface to our applications that is easy to understand, complete with details about individual resource status. We may also want to expose some of these details to developers through a custom, well-thought-out UI for their own troubleshooting. A system like &lt;strong&gt;ArgoCD&lt;/strong&gt; makes all of this easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform
&lt;/h3&gt;

&lt;p&gt;Finally, all of this infrastructure needs to be bootstrapped, and we can use the &lt;em&gt;Infrastructure-as-Code (IaC)&lt;/em&gt; tool &lt;strong&gt;Terraform&lt;/strong&gt; for that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstxqxuqz6xpk6wb64oui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstxqxuqz6xpk6wb64oui.png" alt=" " width="800" height="835"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Artifacts
&lt;/h3&gt;

&lt;p&gt;We will end up with backend, frontend, Helm chart, ArgoCD, and Terraform &lt;strong&gt;repositories&lt;/strong&gt; in GitHub which will define our system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs2g7p0zbayn3jyfd6pu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs2g7p0zbayn3jyfd6pu.png" alt=" " width="800" height="952"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There will be four container image repositories: &lt;code&gt;sample-be&lt;/code&gt;, &lt;code&gt;sample-fe-1&lt;/code&gt;, &lt;code&gt;sample-fe-2&lt;/code&gt;, and &lt;code&gt;starry&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There we have our tools and artifacts.&lt;/p&gt;




&lt;p&gt;After choosing tools based on our needs, we are ready to think through more specific patterns that we will use, more granular interactions between services, as well as the repository structures that will shape our implementation. &lt;em&gt;Stay tuned.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l4ylqbbefmk4tb773x2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l4ylqbbefmk4tb773x2.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devex</category>
      <category>idp</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Recent Platform Engineer Interview Questions</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Sun, 31 Aug 2025 02:50:03 +0000</pubDate>
      <link>https://dev.to/flopi/recent-platform-engineer-interview-questions-g3o</link>
      <guid>https://dev.to/flopi/recent-platform-engineer-interview-questions-g3o</guid>
      <description>&lt;p&gt;I have been interviewing for roles for several months now (ask me why if you want to!) and thought I would write down some of the questions that came up.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A question I got from a recruiter recently is: what is the difference between a Dockerfile and a container? I wonder what they were filtering for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What do I know about networking? VPC, BGP, firewall, subnets, IPs, cross-network communication&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another company gave me a take-home assignment with instructions to showcase my proficiency in Terraform and Google Cloud Platform (GCP) by developing a Terraform module for provisioning a GCP environment, including a Virtual Private Cloud (VPC) and a Google Kubernetes Engine (GKE) cluster, along with all the necessary prerequisites detailed in the document. I later used the setup to test some cPanel, which broke the build, but everything worked for the follow-up interview in which we went over my submission. Check &lt;a href="https://github.com/florenciacomuzzi/k8s-environment-terraform" rel="noopener noreferrer"&gt;it&lt;/a&gt; out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In yet another interview, I was asked to write code so that engineers can queue up functions. How would I ensure that functions don't run past some timeout? Don't use &lt;code&gt;inspect&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write a Dockerfile for a known technology.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Review this Terraform module. What can you tell me about it? What is it doing? What about naming conventions? Data sources? Backend?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How does cherry-picking work in Git? Can you cherry-pick a branch?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Show me how you would Google the answer to this question...&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explain Kubernetes fundamental resources. What about Helm?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
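&lt;p&gt;For the function-queue question, here is one answer sketch using only the Python standard library. It runs queued functions in order and bounds the wait for each result by a timeout; note that a timed-out thread keeps running in the background, so truly killing runaway work would require a process pool instead:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

def run_queue(funcs, timeout_s: float):
    """Run queued zero-argument functions in order, bounding each by a timeout."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:  # one worker = FIFO queue
        for fn in funcs:
            future = pool.submit(fn)
            try:
                results.append(("ok", future.result(timeout=timeout_s)))
            except FutureTimeout:
                # The worker thread itself cannot be killed; we just stop waiting.
                results.append(("timeout", None))
    return results

jobs = [lambda: 1 + 1, lambda: time.sleep(2) or "slow"]
print(run_queue(jobs, timeout_s=0.5))  # [('ok', 2), ('timeout', None)]
```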

</description>
      <category>career</category>
      <category>devex</category>
    </item>
    <item>
      <title>Intro to Data Ingestion and Data Lakes</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Fri, 09 Aug 2019 20:30:41 +0000</pubDate>
      <link>https://dev.to/flopi/intro-to-data-ingestion-and-data-lakes-3fdc</link>
      <guid>https://dev.to/flopi/intro-to-data-ingestion-and-data-lakes-3fdc</guid>
      <description>&lt;p&gt;I landed in the data engineering space by a bit of luck and a bit of blind faith. As graduation was approaching, I had landed a job through a new grad program. When asked about areas I'd be interested in working with, I mentioned "Big Data" because a friend had told me her mentor advised her to pursue it. It's the hot thing right now, she said. Now, years later, I'm glad I chose this route because data engineering is a superset of software engineering with a focus on performance that can be super fun. &lt;/p&gt;

&lt;p&gt;In this first post of this series, I'll go through what a data lake is and how it relates to data ingestion. I start with data ingestion because it gives a look into the work that is commonly done on data engineering teams and trends in the field.&lt;/p&gt;

&lt;p&gt;In the rest of the series, I look to give you a view into the concerns that mire my work life and go through how to plan, design, and build data pipelines. I hope you'll get a solid mix of business and engineering perspective. I haven't seen much writing about data engineering that is accessible to many folks so I also hope to provide some of that here and, of course, I am open to feedback!&lt;/p&gt;

&lt;p&gt;This series is motivated by and dedicated to my greatest mentors. I am grateful to them for walking alongside this long road with me.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is a data ingestion pipeline?
&lt;/h1&gt;

&lt;p&gt;With any data pipeline, the objective is getting data from A to B and sometimes even C, D, E, etc. &lt;em&gt;Ingestion&lt;/em&gt; is a term specific to &lt;strong&gt;data lakes&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a data lake?
&lt;/h2&gt;

&lt;p&gt;Well, a data &lt;em&gt;warehouse&lt;/em&gt; is usually a small set of curated datasets for specific purposes. Data in data warehouses typically lives for a short time, e.g. 30 days, and is used very often. A data warehouse may fit into a traditional database.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To have good query performance, the working set (the set of records your query touches) should typically be able to fit into memory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In comparison, what we think of as "Big Data" (think any amount of data that doesn't all fit into memory at once) would need to be processed in some distributed way on several machines. We have indeed developed frameworks for this kind of distributed processing, like &lt;code&gt;MapReduce&lt;/code&gt; and &lt;code&gt;Spark&lt;/code&gt;. Because "Big Data" by definition does not fit into the memory of a single machine, it should not reside in a traditional database. It should live in another location... So goes the idea of a data lake. &lt;/p&gt;

&lt;p&gt;Data Lakes store massive amounts of data, typically historical data going back really far in time. Whereas we would call data in a data warehouse &lt;em&gt;hot&lt;/em&gt; because it is the most relevant and therefore the most used/queried, data in a data lake may not be so hot. This &lt;em&gt;colder&lt;/em&gt; data can perhaps be stored on storage volumes that have slower retrieval times. Slower hardware (usually) = cheaper hardware...&lt;/p&gt;

&lt;p&gt;For now, &lt;strong&gt;think of a data lake as a place where you store large amounts of data.&lt;/strong&gt; Yes, there is more to the concept of a data lake, and you can read about it in The Enterprise Big Data Lake by Alex Gorelik.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ok, so what does ingestion refer to?
&lt;/h2&gt;

&lt;p&gt;Most data lakes are organized into several zones. For now, we will think of 3 zones: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;landing/dump zone&lt;/li&gt;
&lt;li&gt;raw zone&lt;/li&gt;
&lt;li&gt;transformed zone. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A common pattern is to dump data in the dump zone and have &lt;em&gt;something&lt;/em&gt; &lt;strong&gt;ingest&lt;/strong&gt; the data into the raw zone. Data in the raw zone should remain as close to its original form as possible. Datasets that you have changed should go in the transformed zone.&lt;/p&gt;
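&lt;p&gt;These zones often appear directly as prefixes in the object store. A purely illustrative layout (the bucket and dataset names are made up):&lt;/p&gt;

```
s3://company-lake/landing/sales/2019-08-09/orders.csv      # as dumped by the source
s3://company-lake/raw/sales/dt=2019-08-09/orders.parquet   # ingested, near-original form
s3://company-lake/transformed/sales_daily/dt=2019-08-09/   # curated, built from raw
```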

&lt;p&gt;So,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ingestion refers to the process of bringing in data from some location into the raw zone of the data lake &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;where it can be queried, using technology like Presto and Hive, for understanding so that other datasets can be built off of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ok, now show me how to do the thing...
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll go through how to think through the design of a data ingestion pipeline! &lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>datalake</category>
      <category>dataingestion</category>
    </item>
    <item>
      <title>Live notetaking as I learn about distributed computing</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Sun, 21 Apr 2019 20:28:46 +0000</pubDate>
      <link>https://dev.to/flopi/live-notetaking-as-i-learn-about-distributed-computing-3j4b</link>
      <guid>https://dev.to/flopi/live-notetaking-as-i-learn-about-distributed-computing-3j4b</guid>
      <description>&lt;p&gt;In my previous &lt;a href="https://dev.to/floinnyc_/live-notetaking-as-i-learn-spark-odj"&gt;post&lt;/a&gt;, Live notetaking as I learn Spark, I learned some of the basics of Spark:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Spark is a distributed programming model in which the user specifies transformations. Multiple transformations build up a directed acyclic graph of instructions. An action begins the process of executing that graph of instructions, as a single job, by breaking it down into stages and tasks to execute across the cluster. The logical structures that we manipulate with transformations and actions are DataFrames and Datasets. To create a new DataFrame or Dataset, you call a transformation. To start computation or convert to native language types, you call an action." Spark: The Definitive Guide&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I liked the live notetaking format because it pushed me to write down what I was learning and kept me accountable. I have so many drafted posts that I haven't published because I haven't finished my thoughts. Live notetaking takes the pressure off of constantly thinking about a possible pitch as I'm learning. I am using this live learning pattern again here.&lt;/p&gt;

&lt;p&gt;I am now up to chapter 4 of Spark: The Definitive Guide. In the time between my last live notetaking post and this one, I have done some reading on my own about the anatomy of database systems and &lt;code&gt;git&lt;/code&gt; as a distributed system. Seeing Spark described as a &lt;em&gt;distributed programming model&lt;/em&gt; got my attention since I haven't seen Spark described in this way before, so I definitely want to be clear on what the term means. Julia Evans describes the importance of identifying what you don't understand in her &lt;a href="https://jvns.ca/blog/2018/09/01/learning-skills-you-can-practice/" rel="noopener noreferrer"&gt;post&lt;/a&gt; on how to teach yourself hard things.&lt;/p&gt;

&lt;p&gt;In this post, I will put together an outline of concepts related to distributed computing programming models. I have an additional goal as well: recognize patterns I use to learn a new concept. Is this the most effective way to learn? Can I optimize these patterns? What motivates me to use these patterns?&lt;/p&gt;




&lt;p&gt;The first thing I did was enter "distributed programming model" into Google. I chose the third result because it looked like it could be the syllabus to a class on this topic. I like syllabi. They usually contain readings and homework assignments I can complete for a topic. I also typically compare the textbooks that appear in different syllabi. If a textbook is used in several courses, I look into it to see if it is &lt;em&gt;the&lt;/em&gt; book on a topic. &lt;/p&gt;

&lt;p&gt;The &lt;a href="https://heather.miller.am/teaching/cs7680/" rel="noopener noreferrer"&gt;syllabus&lt;/a&gt; is for a class titled "SPECIAL TOPICS IN COMPUTER SYSTEMS:&lt;br&gt;
Programming Models for Distributed Computing" at Northeastern University. I'd definitely take this course if I could: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Topics we will cover include, promises, remote procedure calls, message-passing, conflict-free replicated datatypes, large-scale batch computation à la MapReduce/Hadoop and Spark, streaming computation, and where eventual consistency meets language design, amongst others."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;I got some feedback from a colleague to focus on the Spark data structures next&lt;/strong&gt; and this course covers conflict-free replicated datatypes. I don't know what that means yet, but I want to know. Over the semester, the class authors a literature review on the landscape of programming models for distributed computation. So cool.&lt;/p&gt;

&lt;h2&gt;
  
  
  RPC
&lt;/h2&gt;

&lt;p&gt;I have already read a bit on RPC in &lt;a href="https://www.amazon.com/Distributed-Systems-Maarten-van-Steen/dp/1543057381" rel="noopener noreferrer"&gt;Distributed Systems&lt;/a&gt; by Tanenbaum. Tanenbaum describes the Remote Procedure Call proposal made by Birrell and Nelson in Implementing Remote Procedure Calls (1984):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When a process on machine A calls a procedure on machine B, the calling process on A is suspended, and execution of the called procedure takes place on B. Information can be transported from the caller to the callee in the parameters and can come back in the procedure result. No message passing at all is visible to the programmer." p. 173&lt;/p&gt;
&lt;/blockquote&gt;
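&lt;p&gt;Tanenbaum's description can be sketched with Python's built-in &lt;code&gt;xmlrpc&lt;/code&gt; module. This is my own minimal example, not from the book, and the &lt;code&gt;add&lt;/code&gt; procedure is illustrative: the client invokes it like a local function, while the message passing stays hidden behind the proxy object.&lt;/p&gt;

```python
# A minimal sketch of the RPC idea, using Python's built-in xmlrpc module.
# The procedure name `add` is illustrative, not from the paper.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    # The "remote" procedure: it executes in the server process,
    # but the caller writes an ordinary function call.
    return a + b

# Start a server on localhost; port 0 lets the OS pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# No message passing is visible to the programmer: the proxy marshals the
# arguments, ships them over HTTP, and unmarshals the result.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.add(2, 3)
server.shutdown()
```

&lt;p&gt;Running this, &lt;code&gt;result&lt;/code&gt; is &lt;code&gt;5&lt;/code&gt;, computed in the "remote" process.&lt;/p&gt;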

&lt;ul&gt;
&lt;li&gt;Implementing Remote Procedure Calls (1984)&lt;/li&gt;
&lt;li&gt;A Distributed Object Model for the Java System (1996)&lt;/li&gt;
&lt;li&gt;A Note on Distributed Computing (1994)&lt;/li&gt;
&lt;li&gt;A Critique of the Remote Procedure Call Paradigm (1988)&lt;/li&gt;
&lt;li&gt;Convenience Over Correctness (2008)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Futures, promises
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multilisp: A language for concurrent symbolic computation (1985)&lt;/li&gt;
&lt;li&gt;Promises: linguistic support for efficient asynchronous procedure calls in distributed systems (1988)&lt;/li&gt;
&lt;li&gt;Oz dataflow concurrency. Selected sections from the textbook Concepts, Techniques, and Models of Computer Programming. 
Sections to read: 1.11: Dataflow, 2.2: The single-assignment store, 4.93-4.95: Dataflow variables as communication channels ...etc.&lt;/li&gt;
&lt;li&gt;The F# asynchronous programming model (2011)&lt;/li&gt;
&lt;li&gt;Your Server as a Function (2013)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Message passing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Concurrent Object-Oriented Programming (1990)&lt;/li&gt;
&lt;li&gt;Concurrency among strangers (2005)&lt;/li&gt;
&lt;li&gt;Scala actors: Unifying thread-based and event-based programming (2009)&lt;/li&gt;
&lt;li&gt;Erlang (2010)&lt;/li&gt;
&lt;li&gt;Orleans: cloud computing for everyone (2011)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Distributed Programming Languages
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Distributed Programming in Argus (1988)&lt;/li&gt;
&lt;li&gt;Distribution and Abstract Types in Emerald (1987)&lt;/li&gt;
&lt;li&gt;The Linda alternative to message-passing systems (1994)&lt;/li&gt;
&lt;li&gt;Orca: A Language For Parallel Programming of Distributed Systems (1992)&lt;/li&gt;
&lt;li&gt;Ambient-Oriented Programming in AmbientTalk (2006)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Consistency, CRDTs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services (2002)&lt;/li&gt;
&lt;li&gt;Conflict-free Replicated Data Types (2011)&lt;/li&gt;
&lt;li&gt;A comprehensive study of Convergent and Commutative Replicated Data Types (2011)&lt;/li&gt;
&lt;li&gt;CAP Twelve Years Later: How the "Rules" Have Changed (2012)&lt;/li&gt;
&lt;li&gt;Cloud Types for Eventual Consistency (2012)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Languages &amp;amp; Consistency
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Consistency Analysis in Bloom: a CALM and Collected Approach (2011)&lt;/li&gt;
&lt;li&gt;Logic and Lattices for Distributed Programming (2012)&lt;/li&gt;
&lt;li&gt;Consistency Without Borders (2013)&lt;/li&gt;
&lt;li&gt;Lasp: A language for distributed, coordination-free programming (2015)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Languages Extended for Distribution
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Towards Haskell in the Cloud (2011)&lt;/li&gt;
&lt;li&gt;Alice Through the Looking Glass (2004)&lt;/li&gt;
&lt;li&gt;Concurrency Oriented Programming in Termite Scheme (2006)&lt;/li&gt;
&lt;li&gt;Type-safe distributed programming with ML5 (2007)&lt;/li&gt;
&lt;li&gt;MBrace

&lt;ul&gt;
&lt;li&gt;MBrace: cloud computing with monads (2013)&lt;/li&gt;
&lt;li&gt;MBrace Programming Model (Tutorial)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Large-scale parallel processing (batch)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;MapReduce: simplified data processing on large clusters (2008)&lt;/li&gt;
&lt;li&gt;DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language (2008)&lt;/li&gt;
&lt;li&gt;Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing (2012)&lt;/li&gt;
&lt;li&gt;Spark SQL: Relational Data Processing in Spark (2015)&lt;/li&gt;
&lt;li&gt;FlumeJava: Easy, Efficient Data-Parallel Pipelines (2010)&lt;/li&gt;
&lt;li&gt;GraphX: A Resilient Distributed Graph System on Spark (2013)&lt;/li&gt;
&lt;li&gt;Dremel: Interactive Analysis of Web-Scale Datasets (2010)&lt;/li&gt;
&lt;li&gt;Pig latin: a not-so-foreign language for data processing (2008)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Large-scale parallel processing (streaming)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;TelegraphCQ: continuous dataflow processing (2003)&lt;/li&gt;
&lt;li&gt;Naiad: A Timely Dataflow System (2013)&lt;/li&gt;
&lt;li&gt;Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters (2012)&lt;/li&gt;
&lt;li&gt;The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (2015)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From these &lt;a href="https://www.eecs.harvard.edu/~michaelm/postscripts/ReadPaper.pdf" rel="noopener noreferrer"&gt;tips&lt;/a&gt; on how to read a research paper and some common sense, I know I won't be able to read all these papers quickly, nor am I interested in doing that right now. &lt;/p&gt;




&lt;p&gt;From &lt;a href="https://blog.ably.io/what-is-a-distributed-systems-engineer-f6c1d921acf8" rel="noopener noreferrer"&gt;https://blog.ably.io/what-is-a-distributed-systems-engineer-f6c1d921acf8&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding of a hash ring: Cassandra, Riak, Dynamo, Couchbase Server&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;protocols for keeping track of changes in cluster topology, in response to network partitions, failures, and scaling events: &lt;br&gt;
 "Various protocols exist to ensure that this can happen, with varying levels of consistency and complexity. This needs to be dynamic and real time because nodes come and go in elastic systems, failures need to be detected quickly, and load and state needs to be rebalanced in real time." &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gossip protocol&lt;/li&gt;
&lt;li&gt;Paxos protocol&lt;/li&gt;
&lt;li&gt;Raft consensus algorithm&lt;/li&gt;
&lt;li&gt;Popular consensus backed systems like &lt;code&gt;etcd&lt;/code&gt; and Zookeeper, and gossip backed systems like Serf.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Eventually consistent data types and read/write consistencies&lt;br&gt;
Locks are impractical to implement and impossible to scale. As a result, trade-offs need to be made between the consistency and availability of data. In many cases, for example, availability can be prioritised, and consistency guarantees weakened to eventual consistency, with data structures such as CRDTs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;familiar with CRDT or Operational Transform, the concepts of variable consistencies for queries or writes to data in a distributed data store

&lt;ul&gt;
&lt;li&gt;Operational Transform — implemented by Google originally in their Wave product and now in Google Docs. It has uses in collaboration apps, but OTs are complex and not widely implemented.&lt;/li&gt;
&lt;li&gt;Conflict-free Replicated Data Types, or CRDTs, provide an eventually consistent result so long as the available data types are used. Used by the Riak distributed database and by Presence in Phoenix.&lt;/li&gt;
&lt;li&gt;Consistency levels for both read and writes in distributed databases like Cassandra&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;At each layer, be confident in your understanding and ability to debug problems at a packet or frame level:&lt;br&gt;
WebSockets example&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DNS protocol and UDP for address lookup.&lt;/li&gt;
&lt;li&gt;File descriptors (on *nix) and buffers used for connections, NAT tables, conntrack tables etc.&lt;/li&gt;
&lt;li&gt;IP to route packets between hosts&lt;/li&gt;
&lt;li&gt;TCP to establish a connection&lt;/li&gt;
&lt;li&gt;TLS handshakes, termination and certificate authentication&lt;/li&gt;
&lt;li&gt;HTTP/1.1 or, more recently, HTTP/2, used extensively by gRPC.&lt;/li&gt;
&lt;li&gt;WebSocket upgrades over HTTP.

&lt;ul&gt;
&lt;li&gt;higher level protocols such as HTTP, WebSockets, gRPC and TCP sockets and the full stack of protocols they rely on all the way down to the OS itself&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;also be a solid systems engineer&lt;/strong&gt;: have the fundamentals such as programming languages, general design patterns, version control, infrastructure management, continuous integration and deployment systems already in place.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
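&lt;p&gt;The CRDT idea above can be sketched with one of the simplest examples: a grow-only counter (G-Counter). This is my own illustration, not from the linked post. Each replica increments only its own slot, and merging takes the per-replica maximum, so merges are commutative, associative, and idempotent and concurrent updates converge without coordination.&lt;/p&gt;

```python
# A sketch of a grow-only counter (G-Counter), one of the simplest CRDTs.
# All names here are illustrative.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count observed from that replica

    def increment(self, n=1):
        # A replica only ever increments its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise maximum: safe to apply in any order, any number of times.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

# Two replicas update independently, then exchange state and converge.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

&lt;p&gt;The assertion at the end shows convergence: both replicas agree on 5 regardless of merge order.&lt;/p&gt;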

</description>
      <category>livelearning</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Reflect as You Work: My Python Project Workflow</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Sat, 13 Apr 2019 02:38:56 +0000</pubDate>
      <link>https://dev.to/flopi/reflect-as-you-work-my-python-project-workflow-49he</link>
      <guid>https://dev.to/flopi/reflect-as-you-work-my-python-project-workflow-49he</guid>
      <description>&lt;p&gt;One of the apprenticeship patterns in &lt;a href="http://shop.oreilly.com/product/9780596518387.do" rel="noopener noreferrer"&gt;Apprenticeship Patterns&lt;/a&gt; is &lt;strong&gt;Reflect as You Work&lt;/strong&gt;. This pattern is about introspecting on how you work regularly. Doing this often allows developers to notice how their practices have changed and even how they haven't. This isn't just about observing yourself. As the book says, &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Unobtrusively watch the journeymen and master craftsmen on your team. Reflect on the practices, processes, and techniques they use to see if they can be connected to other parts of your experiences. Even as an apprentice, you can discover novel ideas simply by closely observing more experienced craftsmen as they go about their work." p. 36&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have been thinking about my own practices and those of others around me. The workflow I follow when I create new Python projects particularly stands out because I learned it from sitting with another engineer. I noted what they did and asked questions. Then, I went back to my desk and tried it myself while taking more notes. I followed the resulting workflow so many times that the steps now flow from my fingertips with ease. &lt;/p&gt;

&lt;p&gt;I think there could be ways to optimize even this workflow, but first I am going to note it down here for the potential future reader and for future me to look back on!&lt;/p&gt;

&lt;p&gt;P.S. Many of the extra details I included here I learned from my colleagues. A big thank you to them for sharing what they know with me 💓&lt;/p&gt;



&lt;h1&gt;
  
  
  Prerequisites
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pyenv&lt;/code&gt; is installed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  New Python Project Checklist
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Install a specific Python version.&lt;/li&gt;
&lt;li&gt;Create a project directory. Go to the directory.&lt;/li&gt;
&lt;li&gt;Set the Python version for the project.&lt;/li&gt;
&lt;li&gt;Create a virtual environment.&lt;/li&gt;
&lt;li&gt;Activate the virtual environment.&lt;/li&gt;
&lt;li&gt;Install dependencies.&lt;/li&gt;
&lt;li&gt;Save packages.&lt;/li&gt;
&lt;li&gt;Run the code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note: this workflow should work on macOS. &lt;/p&gt;



&lt;h2&gt;
  
  
  PREREQUISITE: Install &lt;code&gt;pyenv&lt;/code&gt;.
&lt;/h2&gt;

&lt;p&gt;Mac OS X comes with Python 2.7 out of the box. If you haven't fiddled with anything, you should be able to open up a Terminal window, type in &lt;code&gt;python --version&lt;/code&gt;, and get some 2.7 variant. You probably don't want to use the version of Python that comes shipped with your OS (Operating System), though. There are many reasons for this, such as the version being out of date. I have even come across an important library that was missing. &lt;/p&gt;

&lt;p&gt;Not only do you want to avoid the version of Python shipped with your machine; in your work you will also need to have several different versions of Python installed at once. For example, perhaps one codebase is using an older version of Python due to some library dependency. Upgrading the Python version for that project could require refactoring that you haven't prioritized. At the same time, you may be using a newer Python version on other projects because you want to take advantage of shiny new features.&lt;/p&gt;

&lt;p&gt;Having several Python versions installed on your machine is a realistic scenario for a Python developer. Managing these versions effectively is important.&lt;/p&gt;

&lt;p&gt;There are instructions on how to install &lt;code&gt;pyenv&lt;/code&gt; &lt;a href="https://github.com/pyenv/pyenv" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you run a command like &lt;code&gt;python&lt;/code&gt; or &lt;code&gt;pip&lt;/code&gt;, your operating system searches through a list of directories to find an executable file with that name. This list of directories lives in an environment variable called &lt;code&gt;PATH&lt;/code&gt;, with each directory in the list separated by a colon...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;pyenv&lt;/code&gt; works by inserting a directory of &lt;em&gt;shims&lt;/em&gt; at the front of your &lt;code&gt;PATH&lt;/code&gt; so that when you call &lt;code&gt;python&lt;/code&gt; or &lt;code&gt;pip&lt;/code&gt;, these shims are the first thing your OS finds. The commands you enter are then intercepted and sent to &lt;code&gt;pyenv&lt;/code&gt;, which decides which version of Python to use for your command based on some rules. &lt;/p&gt;
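&lt;p&gt;You can sketch that &lt;code&gt;PATH&lt;/code&gt; search yourself. The snippet below is my own simplified model of what the OS does (it is not pyenv code): scan each directory in order and run the first executable with the requested name. A shim directory "wins" simply by being first in the list.&lt;/p&gt;

```python
# A simplified model of PATH lookup: the first matching executable wins.
import os
import stat
import tempfile

def resolve(command, path):
    """Return the first executable named `command` found along `path`."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

# Simulate a shim directory sitting in front of a "system" directory,
# each containing an executable called `python`.
shims = tempfile.mkdtemp()
system = tempfile.mkdtemp()
for d in (shims, system):
    exe = os.path.join(d, "python")
    with open(exe, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)

# The shim wins because its directory comes first in the search list.
winner = resolve("python", os.pathsep.join([shims, system]))
assert winner == os.path.join(shims, "python")
```

&lt;p&gt;This is exactly why the installation instructions have you prepend the shim directory to &lt;code&gt;PATH&lt;/code&gt; in your shell startup file.&lt;/p&gt;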

&lt;p&gt;Follow the instructions to install &lt;code&gt;pyenv&lt;/code&gt;. Make sure you follow the rest of the post-installation steps under Basic GitHub Checkout even if you use Homebrew to install. When I was installing, I found that I had a &lt;code&gt;.bashrc&lt;/code&gt; AND a &lt;code&gt;.bash_profile&lt;/code&gt;. &lt;a href="http://www.joshstaiger.org/archives/2005/07/bash_profile_vs.html" rel="noopener noreferrer"&gt;Here&lt;/a&gt; is an article on the difference between them and when either file is used. If, after following the instructions, you type in &lt;code&gt;pyenv&lt;/code&gt; and do not get something like the following, go back and make sure you set up the other bash file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flo at MacBook-Pro &lt;span class="k"&gt;in&lt;/span&gt; ~ &lt;span class="nv"&gt;$ &lt;/span&gt;pyenv
pyenv 1.2.8
Usage: pyenv &amp;lt;&lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&amp;lt;args&amp;gt;]

Some useful pyenv commands are:
   commands    List all available pyenv commands
   &lt;span class="nb"&gt;local       &lt;/span&gt;Set or show the &lt;span class="nb"&gt;local &lt;/span&gt;application-specific Python version
   global      Set or show the global Python version
   shell       Set or show the shell-specific Python version
   &lt;span class="nb"&gt;install     &lt;/span&gt;Install a Python version using python-build
   uninstall   Uninstall a specific Python version
   rehash      Rehash pyenv shims &lt;span class="o"&gt;(&lt;/span&gt;run this after installing executables&lt;span class="o"&gt;)&lt;/span&gt;
   version     Show the current Python version and its origin
   versions    List all Python versions available to pyenv
   which       Display the full path to an executable
   whence      List all Python versions that contain the given executable

See &lt;span class="sb"&gt;`&lt;/span&gt;pyenv &lt;span class="nb"&gt;help&lt;/span&gt; &amp;lt;&lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;' for information on a specific command.
For full documentation, see: https://github.com/pyenv/pyenv#readme
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 1: Install a specific Python version.
&lt;/h2&gt;

&lt;p&gt;Suppose I'm creating a script that will open the latest xkcd comic in a web browser. I'm going to run it with Python 3.7.0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in ~ $ pyenv install 3.7.0
python-build: use openssl from homebrew
python-build: use readline from homebrew
Downloading Python-3.7.0.tar.xz...
-&amp;gt; https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz
Installing Python-3.7.0...
python-build: use readline from homebrew
Installed Python-3.7.0 to /Users/flo/.pyenv/versions/3.7.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Create a project directory. Go to the directory.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in ~ $ mkdir Documents/comic-creator
flo at MacBook-Pro in ~ $ cd Documents/comic-creator/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Set the Python version for the project.
&lt;/h2&gt;

&lt;p&gt;First, look at the files in the folder, even the hidden files (&lt;code&gt;-la&lt;/code&gt; will show hidden files).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator $ ls -la
total 0
drwxr-xr-x   2 flo  staff    64 Apr 12 21:12 .
drwx------+ 33 flo  staff  1056 Apr 12 21:12 ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, set the Python version for the project. Afterward, you can see a new hidden file (hidden files start with a dot). When you look inside &lt;code&gt;.python-version&lt;/code&gt;, you can see the version we set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator $ pyenv local 3.7.0
flo at MacBook-Pro in .../comic-creator $ ls -la
total 8
drwxr-xr-x   3 flo  staff    96 Apr 12 21:16 .
drwx------+ 33 flo  staff  1056 Apr 12 21:12 ..
-rw-r--r--   1 flo  staff     6 Apr 12 21:16 .python-version
flo at MacBook-Pro in .../comic-creator $ cat .python-version 
3.7.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Create a virtual environment.
&lt;/h2&gt;

&lt;p&gt;Just as you may have several Python versions installed on your machine, you may also have different versions of Python packages installed. Imagine the dependency graph for one of your projects looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;requests==2.21.0
  - certifi [required: &amp;gt;=2017.4.17, installed: 2019.3.9]
  - chardet [required: &amp;gt;=3.0.2,&amp;lt;3.1.0, installed: 3.0.4]
  - idna [required: &amp;gt;=2.5,&amp;lt;2.9, installed: 2.8]
  - urllib3 [required: &amp;gt;=1.21.1,&amp;lt;1.25, installed: 1.24.1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another project, you may be using a different version of &lt;code&gt;requests&lt;/code&gt; which depends on a different version of &lt;code&gt;certifi&lt;/code&gt;. By using virtual environments, we can keep package installations isolated by project. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system. &lt;a href="https://docs.python.org/3/library/venv.html" rel="noopener noreferrer"&gt;Python venv docs&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, first, you can verify (again) that we correctly set the Python version for the project. Then, create a virtual environment with the &lt;code&gt;venv&lt;/code&gt; module and name that new environment &lt;code&gt;venv&lt;/code&gt;. You can now see the environment is created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator $ python --version
Python 3.7.0
flo at MacBook-Pro in .../comic-creator $ python -m venv venv
flo at MacBook-Pro in .../comic-creator $ ls
venv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Activate the virtual environment.
&lt;/h2&gt;

&lt;p&gt;Look inside &lt;code&gt;venv&lt;/code&gt;. Then, look inside &lt;code&gt;venv/bin&lt;/code&gt;. &lt;code&gt;bin&lt;/code&gt; stands for &lt;em&gt;binary&lt;/em&gt;. In Linux/Unix-like systems, executable programs needed to run the system are found in &lt;a href="http://linfo.org/bin.html" rel="noopener noreferrer"&gt;&lt;code&gt;/bin&lt;/code&gt;&lt;/a&gt;. Similarly, Python executable programs are stored in &lt;code&gt;bin&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Activate the virtual environment with &lt;code&gt;source&lt;/code&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;source&lt;/code&gt; is a Unix command that evaluates the file following the command executed in the current context... Frequently the "current context" is a terminal window into which the user is typing commands during an interactive session. The &lt;code&gt;source&lt;/code&gt; command can be abbreviated as just a dot (.) in Bash and similar POSIX-ish shells. &lt;a href="https://en.wikipedia.org/wiki/Source_(command)" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that if you open a new Terminal window, you will need to source the &lt;code&gt;activate&lt;/code&gt; file again to activate the virtual environment in that window! Also note that you can type in &lt;code&gt;. venv/bin/activate&lt;/code&gt; and it will do the exact same thing as &lt;code&gt;source venv/bin/activate&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator $ ls venv/
bin        include    lib        pyvenv.cfg
flo at MacBook-Pro in .../comic-creator $ ls venv/bin/
activate         activate.csh     activate.fish    easy_install     easy_install-3.7 pip              pip3             pip3.7           python           python3
flo at MacBook-Pro in .../comic-creator $ source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's look at &lt;code&gt;activate&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ cat venv/bin/activate
# This file must be used with "source bin/activate" *from bash*
# you cannot run it directly

deactivate () {
    # reset old environment variables
    if [ -n "${_OLD_VIRTUAL_PATH:-}" ] ; then
        PATH="${_OLD_VIRTUAL_PATH:-}"
        export PATH
        unset _OLD_VIRTUAL_PATH
    fi
    if [ -n "${_OLD_VIRTUAL_PYTHONHOME:-}" ] ; then
        PYTHONHOME="${_OLD_VIRTUAL_PYTHONHOME:-}"
        export PYTHONHOME
        unset _OLD_VIRTUAL_PYTHONHOME
    fi

    # This should detect bash and zsh, which have a hash command that must
    # be called to get it to forget past commands.  Without forgetting
    # past commands the $PATH changes we made may not be respected
    if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
        hash -r
    fi

    if [ -n "${_OLD_VIRTUAL_PS1:-}" ] ; then
        PS1="${_OLD_VIRTUAL_PS1:-}"
        export PS1
        unset _OLD_VIRTUAL_PS1
    fi

    unset VIRTUAL_ENV
    if [ ! "$1" = "nondestructive" ] ; then
    # Self destruct!
        unset -f deactivate
    fi
}

# unset irrelevant variables
deactivate nondestructive

VIRTUAL_ENV="/Users/flo/Documents/comic-creator/venv"
export VIRTUAL_ENV

_OLD_VIRTUAL_PATH="$PATH"
PATH="$VIRTUAL_ENV/bin:$PATH"
export PATH

# unset PYTHONHOME if set
# this will fail if PYTHONHOME is set to the empty string (which is bad anyway)
# could use `if (set -u; : $PYTHONHOME) ;` in bash
if [ -n "${PYTHONHOME:-}" ] ; then
    _OLD_VIRTUAL_PYTHONHOME="${PYTHONHOME:-}"
    unset PYTHONHOME
fi

if [ -z "${VIRTUAL_ENV_DISABLE_PROMPT:-}" ] ; then
    _OLD_VIRTUAL_PS1="${PS1:-}"
    if [ "x(venv) " != x ] ; then
        PS1="(venv) ${PS1:-}"
    else
    if [ "`basename \"$VIRTUAL_ENV\"`" = "__" ] ; then
        # special case for Aspen magic directories
        # see http://www.zetadev.com/software/aspen/
        PS1="[`basename \`dirname \"$VIRTUAL_ENV\"\``] $PS1"
    else
        PS1="(`basename \"$VIRTUAL_ENV\"`)$PS1"
    fi
    fi
    export PS1
fi

# This should detect bash and zsh, which have a hash command that must
# be called to get it to forget past commands.  Without forgetting
# past commands the $PATH changes we made may not be respected
if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
    hash -r
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
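&lt;p&gt;Reading the script above, activation mostly just prepends &lt;code&gt;venv/bin&lt;/code&gt; to &lt;code&gt;PATH&lt;/code&gt; and exports &lt;code&gt;VIRTUAL_ENV&lt;/code&gt;. A small check I find handy (my own addition, not from the workflow): from inside Python, you can tell whether a venv is active by comparing &lt;code&gt;sys.prefix&lt;/code&gt; with &lt;code&gt;sys.base_prefix&lt;/code&gt;.&lt;/p&gt;

```python
# Detect whether the current interpreter is running inside a virtual
# environment: in a venv, sys.prefix points at the environment directory
# while sys.base_prefix still points at the base Python installation.
import sys

def in_virtualenv():
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```

&lt;p&gt;If the activated prompt prefix ever disappears (say, in a fresh Terminal window), this check tells you whether you actually need to &lt;code&gt;source venv/bin/activate&lt;/code&gt; again.&lt;/p&gt;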



&lt;h2&gt;
  
  
  Step 6: Install dependencies.
&lt;/h2&gt;

&lt;p&gt;If you haven't come across "dependencies", this word is often used to say that something is dependent on something else... Makes sense. In our case, our Python project will depend on installing various libraries that don't come already bundled with Python 3.7.0.&lt;/p&gt;

&lt;p&gt;This is what our code looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;webbrowser&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# url of latest xkcd comic
&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://xkcd.com/info.0.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Comic is located at {}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;img&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;webbrowser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;img&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; {}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a file &lt;code&gt;comic_popup.py&lt;/code&gt; in the project and add this code. If you try to run it now, you will get an error: the &lt;code&gt;requests&lt;/code&gt; module isn't installed yet. Let's install it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ touch comic_popup.py
flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ ls
comic_popup.py venv
flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ pip install requests
Collecting requests
  Using cached https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl
Collecting chardet&amp;lt;3.1.0,&amp;gt;=3.0.2 (from requests)
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting urllib3&amp;lt;1.25,&amp;gt;=1.21.1 (from requests)
  Using cached https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl
Collecting idna&amp;lt;2.9,&amp;gt;=2.5 (from requests)
  Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Collecting certifi&amp;gt;=2017.4.17 (from requests)
  Using cached https://files.pythonhosted.org/packages/60/75/f692a584e85b7eaba0e03827b3d51f45f571c2e793dd731e598828d380aa/certifi-2019.3.9-py2.py3-none-any.whl
Installing collected packages: chardet, urllib3, idna, certifi, requests
Successfully installed certifi-2019.3.9 chardet-3.0.4 idna-2.8 requests-2.21.0 urllib3-1.24.1
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 7: Save packages.
&lt;/h2&gt;

&lt;p&gt;Notice what is printed when you enter &lt;code&gt;pip freeze&lt;/code&gt;. This command outputs installed packages in requirements format ({library-name}=={version}). In the next line, &lt;em&gt;redirect&lt;/em&gt; that output to a file called &lt;code&gt;requirements.txt&lt;/code&gt; using &lt;a href="https://en.wikipedia.org/wiki/Redirection_(computing)" rel="noopener noreferrer"&gt;&lt;code&gt;&amp;gt;&lt;/code&gt;&lt;/a&gt;. A single &lt;code&gt;&amp;gt;&lt;/code&gt; will overwrite the contents of the file if the file already exists. Using &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; would append to an already existing file.&lt;/p&gt;
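As a quick sanity check of the redirection behavior described above (the file name here is arbitrary), here is a minimal shell sketch:

```shell
echo "first"  > demo.txt    # '>' creates the file (or overwrites it)
echo "second" > demo.txt    # overwrites again: only "second" remains
echo "third" >> demo.txt    # '>>' appends instead: now two lines
cat demo.txt
```

Running `pip freeze > requirements.txt` therefore always leaves a fresh snapshot of your packages, while `>>` would stack old and new snapshots in the same file.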

&lt;p&gt;You don't have to call the file &lt;code&gt;requirements.txt&lt;/code&gt; but that is what most Python developers use so follow the convention! More on requirements files &lt;a href="https://pip.pypa.io/en/stable/user_guide/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You may also notice that &lt;code&gt;requests&lt;/code&gt; isn't the only library output by &lt;code&gt;pip freeze&lt;/code&gt;. The others are libraries that &lt;code&gt;requests&lt;/code&gt; depends on, so when you install &lt;code&gt;requests&lt;/code&gt; they must be installed too for &lt;code&gt;requests&lt;/code&gt; to work. These other libraries are referred to as &lt;em&gt;transitive dependencies&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ pip freeze
certifi==2019.3.9
chardet==3.0.4
idna==2.8
requests==2.21.0
urllib3==1.24.1
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ pip freeze &amp;gt; requirements.txt
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ cat requirements.txt 
certifi==2019.3.9
chardet==3.0.4
idna==2.8
requests==2.21.0
urllib3==1.24.1
flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ pip --help

Usage:   
  pip &amp;lt;command&amp;gt; [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
  --log &amp;lt;path&amp;gt;                Path to a verbose appending log.
  --proxy &amp;lt;proxy&amp;gt;             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries &amp;lt;retries&amp;gt;         Maximum number of retries each connection should attempt (default 5 times).
  --timeout &amp;lt;sec&amp;gt;             Set the socket timeout (default 15 seconds).
  --exists-action &amp;lt;action&amp;gt;    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort).
  --trusted-host &amp;lt;hostname&amp;gt;   Mark this host as trusted, even though it does not have valid or any HTTPS.
  --cert &amp;lt;path&amp;gt;               Path to alternate CA bundle.
  --client-cert &amp;lt;path&amp;gt;        Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
  --cache-dir &amp;lt;dir&amp;gt;           Store the cache data in &amp;lt;dir&amp;gt;.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.
  --no-color                  Suppress colored output

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 8: Run the code.
&lt;/h2&gt;

&lt;p&gt;That's it. You should be able to run the code now. Strictly speaking, it was runnable as soon as its dependencies were installed, but don't forget to save your requirements!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flo at MacBook-Pro in .../comic-creator using virtualenv: venv $ python comic_popup.py 
Comic is located at https://imgs.xkcd.com/comics/election_commentary.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if you save your code to a repo, anyone can pull the code and run it. Add a &lt;code&gt;README.md&lt;/code&gt; and include which version of Python to use to run the code. The next developer will set up the right Python version and install the requirements by running &lt;code&gt;pip install -r requirements.txt&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Don't include the &lt;code&gt;.python-version&lt;/code&gt; file in the repo because the file is &lt;code&gt;pyenv&lt;/code&gt; specific and other developers may have their own way to manage Python versions. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As a rule of thumb, I don't include files that are specific to me like configuration files for various IDEs (Integrated Development Environments) because they clutter up the repository. Ignore these files in your repository by adding and configuring a &lt;code&gt;.gitignore&lt;/code&gt; file.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;That's my development workflow when I start a Python project! I included some developer best practices where I felt they fit, along with as much context as seemed appropriate. I encourage you to try out different ways of doing the same thing to see the pros and cons of each.&lt;/p&gt;

&lt;p&gt;Feel free to ask any questions! I'd love to chat about best practices and what works for you as well. So many parts of our workflows are by convention or because that's the way we first learned it or we don't know any better. I'd love to hear from you!&lt;/p&gt;

</description>
      <category>python</category>
      <category>developerworkflow</category>
    </item>
    <item>
      <title>Live notetaking as I learn Spark</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Sat, 06 Apr 2019 16:00:20 +0000</pubDate>
      <link>https://dev.to/flopi/live-notetaking-as-i-learn-spark-odj</link>
      <guid>https://dev.to/flopi/live-notetaking-as-i-learn-spark-odj</guid>
      <description>&lt;h1&gt;
  
  
  What is this?
&lt;/h1&gt;

&lt;p&gt;I would love to get in the mind of other developers. I want to see how they think and I want to watch how they learn &lt;em&gt;live&lt;/em&gt;. So, &lt;strong&gt;this is an experiment.&lt;/strong&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have a somewhat vague goal: learn the theoretical foundations of Spark so I can look at a program and optimize it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I will put my notes here as I go. My hope is that I will start to glean some patterns from these notes. For example, so far I have noticed that I categorize the questions I have and define concepts I don't understand as I come across them, sometimes putting off defining them because I want to understand a larger concept first. This is valuable information for me because I want to learn in the most efficient manner! There is so much I am curious about. I want to live life to the fullest and explore topics that give me joy.&lt;/p&gt;

&lt;p&gt;This experiment is inspired by Jessie Frazelle's blog post, &lt;a href="https://blog.jessfraz.com/post/digging-into-risc-v-and-how-i-learn-new-things/" rel="noopener noreferrer"&gt;Digging into RISC-V and how I learn new things&lt;/a&gt;. By inspired I mean that I felt that same excitement I felt when I first started learning to code when I read this post. It brought that feeling of wonder and reverence back. Romance is not dead :p &lt;/p&gt;

&lt;p&gt;If you have been feeling burned out or like your work does not matter, I urge you to think outside your immediate situation (whether a shitty job or overwhelming schoolwork) and learn about areas that &lt;strong&gt;you&lt;/strong&gt; are curious about. See how other people are learning. Try new things. Find inspiration.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wwahlhckgat8nlusioq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wwahlhckgat8nlusioq.jpg" alt="Fuzzy pink pen" width="236" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Notes
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;From Spark: The Definitive Guide (recommended to me!)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to download and run Spark locally, the first step is to make sure that you have Java installed on your machine (available as &lt;code&gt;java&lt;/code&gt;), as well as a Python version if you would like to use Python. Next, visit the project’s official download page, select the package type of “Pre-built for Hadoop 2.7 and later,” and click “Direct Download.” This downloads a compressed TAR file, or tarball, that you will then need to extract.

&lt;ul&gt;
&lt;li&gt;I already installed the pre-built version for 2.9&lt;/li&gt;
&lt;li&gt;I moved the uncompressed folder to my Applications folder but couldn't find the folder thru the terminal. What have I done?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Small snafu
&lt;/h3&gt;

&lt;p&gt;There is the folder:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogyta79br58cplblz5if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogyta79br58cplblz5if.png" alt="screenshot of Finder showing spark install in Applications folder" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But I don't see any content in the folder thru the Terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; ~/Applications/
Chrome Apps.localized
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, I Googled and found that &lt;a href="https://apple.stackexchange.com/questions/44475/access-applications-directory-in-terminal" rel="noopener noreferrer"&gt;Applications on Mac is at the root&lt;/a&gt; and I had been looking at my user's Applications. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spark can run locally without any distributed storage system, such as Apache Hadoop, so I won't install Hadoop since the last time I did it was incredibly slow on my machine. (although I was using a different machine with shittier specs)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"You can use Spark from Python, Java, Scala, R, or SQL. Spark itself is written in Scala, and runs on the Java Virtual Machine (JVM), so therefore to run Spark either on your laptop or a cluster, all you need is an installation of Java. If you want to use the Python API, you will also need a Python interpreter (version 2.7 or later)." &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'll be using PySpark. I use &lt;code&gt;pyenv&lt;/code&gt; to manage installations of Python and create virtual environments. I see that I'll need a virtual environment because the book says, "In Spark 2.2, the developers also added the ability to install Spark for Python via &lt;code&gt;pip install pyspark&lt;/code&gt;. This functionality came out as this book was being written, so we weren’t able to include all of the relevant instructions." So I know that I'll need a virtual environment because I want to manage the version of &lt;code&gt;pyspark&lt;/code&gt; by project instead of installing it globally. I also now know that I may need to do more Googling since this book doesn't have all the instructions I may need. I don't know exactly how many instructions are missing -- hope this doesn't derail me for too long.  I'm anticipating it though and I wonder if that gets in the way of me pushing thru on other projects.&lt;/p&gt;

&lt;p&gt;This is how I'm installing &lt;code&gt;pyspark&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ mkdir spark-trial
$ cd spark-trial/
$ python --version
Python 2.7.10
$ pyenv local 3.6.5
$ python -m venv venv
$ source venv/bin/activate
$ python --version
Python 3.6.5
$ pip install pyspark
$ pip freeze &amp;gt; requirements.txt
$ cat requirements.txt 
py4j==0.10.7
pyspark==2.4.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, I want to make sure I can run the thing. The book says to run the following from the home directory of the Spark installation: &lt;code&gt;$ ./bin/pyspark&lt;/code&gt;.&lt;br&gt;
 Then, type “spark” and press Enter. You’ll see the SparkSession object printed. &lt;/p&gt;

&lt;p&gt;So I went back to the directory where I installed Spark and ran those commands. This was the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python 3.6.5 (default, Nov 20 2018, 15:26:21) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Applications/spark-2.4.1-bin-hadoop2.7/jars/spark-unsafe_2.11-2.4.1.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/04/06 12:44:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/

Using Python version 3.6.5 (default, Nov 20 2018 15:26:21)
SparkSession available as 'spark'.
&amp;gt;&amp;gt;&amp;gt; spark
&amp;lt;pyspark.sql.session.SparkSession object at 0x10e262ac8&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I don't know why it is running Python 3.6.5 even though the system default is 2.7. I know &lt;code&gt;pyenv&lt;/code&gt; manages which version of Python is set for a project by creating a &lt;code&gt;.python-version&lt;/code&gt; file, so I'm going to exit this shell and look for that file in this spark directory.&lt;/p&gt;

&lt;p&gt;When I run &lt;code&gt;python --version&lt;/code&gt; from the same directory that Spark is installed in I see the version is 3.6.5 -- the same version as my virtual environment. I set this bash theme that displays this when virtual environments are activated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;○ flo at MacBook-Pro in .../-2.4.1-bin-hadoop2.7 using virtualenv: venv $ python --version
Python 3.6.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I forgot that the virtual environment I created above is activated. I thought &lt;code&gt;pyenv&lt;/code&gt; worked by reading the &lt;code&gt;.python-version&lt;/code&gt; file to determine which Python version is set for a project but I did not put together how Python virtual environments work... what does the &lt;code&gt;activate&lt;/code&gt; script that comes with virtual environments do? After looking at the script, I see that it sets the &lt;code&gt;VIRTUAL_ENV&lt;/code&gt; environment variable to the path of this virtual environment, prepends &lt;code&gt;$VIRTUAL_ENV/bin&lt;/code&gt; to the system &lt;code&gt;PATH&lt;/code&gt;, and unsets &lt;code&gt;PYTHONHOME&lt;/code&gt; if it is set, so that the first Python the system finds is the activated virtual environment's Python. Cool!&lt;/p&gt;
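To make that concrete, here is a simplified sketch of what `bin/activate` does (the real script also saves the old values so `deactivate` can undo all of this; the venv path below is hypothetical):

```shell
VIRTUAL_ENV="$HOME/comic-creator/venv"   # hypothetical path to this venv
export VIRTUAL_ENV

# Prepend the venv's bin directory so its python is found first.
PATH="$VIRTUAL_ENV/bin:$PATH"
export PATH

# Make sure nothing overrides which standard library the interpreter uses.
unset PYTHONHOME
```

After this, `python` resolves to `$VIRTUAL_ENV/bin/python` (assuming the venv actually exists at that path), which is exactly why my shell kept reporting 3.6.5.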

&lt;h2&gt;
  
  
  Spark Architecture
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Single machines do not have enough power and resources to perform computations on huge amounts of information (or the user probably does not have the time to wait for the computation to finish). A &lt;em&gt;cluster&lt;/em&gt;, or group, of computers, pools the resources of many machines together, giving us the ability to use all the cumulative resources as if they were a single computer. Now, a group of machines alone is not powerful, you need a framework to coordinate work across them. Spark does just that, managing and coordinating the execution of tasks on data across a cluster of computers." Spark the definite guide&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have a textbook, Distributed Systems, by Andrew Tanenbaum (my favorite textbook author -- yes, I have a fav!). This morning, I read about different types of distributed computing, like cluster computing vs grid computing, so I am going to go back to the book to clarify if &lt;em&gt;cluster&lt;/em&gt; in the quote above has a connection to &lt;em&gt;cluster computing&lt;/em&gt; or if it's an overloaded term.&lt;/p&gt;

&lt;p&gt;There are many classes of distributed systems; one class of distributed systems is for &lt;strong&gt;high-performance computing tasks&lt;/strong&gt;. There are two subgroups within this class: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cluster computing: 

&lt;ul&gt;
&lt;li&gt;underlying hardware consists of a &lt;strong&gt;collection of similar workstations&lt;/strong&gt; or PCs closely &lt;strong&gt;connected by&lt;/strong&gt; means of a high-speed &lt;strong&gt;local-area network&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;each node runs the same operating system &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;grid computing:

&lt;ul&gt;
&lt;li&gt;often constructed as a federation of computer systems, where each system may fall under a different administrative domain, and may be very different when it comes to hardware, software, and deployed network technology&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I have only come in contact with Spark within single companies which I assume each have a single network??? so for now I am going to assume that Spark falls under cluster computing.&lt;/p&gt;

&lt;p&gt;Distributed Systems textbook then goes into examples of cluster computers. For example, the &lt;em&gt;MOSIX system&lt;/em&gt;. MOSIX tries to provide a &lt;strong&gt;single-system image&lt;/strong&gt; of a cluster which means that a cluster computer tries to appear as a single computer to a process. I've come across my first distributed system gem: IT IS IMPOSSIBLE TO PROVIDE A SINGLE SYSTEM IMAGE UNDER ALL CIRCUMSTANCES.&lt;/p&gt;

&lt;p&gt;I am finally starting to connect the dots between Spark and distributed systems concepts. I am going back in the textbook to design goals of distributed systems to learn more about transparency and adding a section to my notes called Design Goals since I am now getting what these design goals are all about.&lt;/p&gt;

&lt;p&gt;Back to Spark, &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The cluster of machines that Spark will use to execute tasks is managed by a cluster manager like Spark’s standalone cluster manager, YARN, or Mesos. We then submit Spark Applications to these cluster managers, which will grant resources to our application so that we can complete our work."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Cluster manager
&lt;/h3&gt;

&lt;p&gt;The cluster manager controls physical machines and allocates resources to Spark Applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spark Applications
&lt;/h3&gt;

&lt;p&gt;Spark Applications consist of a &lt;em&gt;driver process&lt;/em&gt; and a set of &lt;em&gt;executor processes&lt;/em&gt;.  The driver and executors are simply processes, which means that they can live on the same machine or different machines. In local mode, the driver and executors run (as threads) on your individual computer instead of a cluster. &lt;/p&gt;

&lt;h4&gt;
  
  
  Driver
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;called the &lt;code&gt;SparkSession&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;1:1 &lt;code&gt;SparkSession&lt;/code&gt; to Spark Application&lt;/li&gt;
&lt;li&gt;runs &lt;code&gt;main()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;sits on a node in the cluster&lt;/li&gt;
&lt;li&gt;responsible for three things: 

&lt;ul&gt;
&lt;li&gt;maintaining information about the Spark Application during the lifetime of the application&lt;/li&gt;
&lt;li&gt;responding to a user’s program or input&lt;/li&gt;
&lt;li&gt;analyzing, distributing, and scheduling work across the executors &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Executors
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;responsible for only two things: 

&lt;ul&gt;
&lt;li&gt;executing code assigned to it by the driver&lt;/li&gt;
&lt;li&gt;reporting the state of the computation on that executor back to the driver node&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I've done quite a bit of learning about concepts. I am itching to build before my attention wavers. There are still some parts in the text that I gleaned and seem relevant, so I'm going to keep going and hopefully will get to build soon. Otherwise, I'll pivot myself.&lt;/p&gt;

&lt;h1&gt;
  
  
  Spark APIs
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;language APIs and structured vs unstructured APIs&lt;/li&gt;
&lt;li&gt;Each &lt;em&gt;language&lt;/em&gt; API maintains the same core concepts described (driver, executors, etc.?). 

&lt;ul&gt;
&lt;li&gt;There is a &lt;code&gt;SparkSession&lt;/code&gt; object available to the user, which is the entrance point to running Spark code.&lt;/li&gt;
&lt;li&gt;When using Spark from Python or R, you don’t write explicit JVM instructions; instead, &lt;strong&gt;you write Python and R code that Spark translates into code that it then can run on the executor JVMs.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Starting the Spark shell is how I can send commands to Spark so that Spark can then send them to executors. So, starting a Spark shell creates an interactive Spark Application. The shell starts Spark in local mode. I can also send standalone applications to Spark via the &lt;code&gt;spark-submit&lt;/code&gt; process, whereby I submit a precompiled application to Spark. &lt;/p&gt;

&lt;h2&gt;
  
  
  Distributed collections of data
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;core data structures are &lt;em&gt;immutable&lt;/em&gt; so they cannot change after they are created&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DataFrames
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;a table of data with rows and columns &lt;/li&gt;
&lt;li&gt;
&lt;em&gt;schema&lt;/em&gt; is list of columns and types&lt;/li&gt;
&lt;li&gt;parts of the DataFrame can reside on different machines

&lt;ul&gt;
&lt;li&gt;A &lt;em&gt;partition&lt;/em&gt; is a collection of &lt;em&gt;rows&lt;/em&gt; that sit on one physical machine in your cluster. A DataFrame’s partitions represent how the data is physically distributed across the cluster of machines during execution. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Python/R DataFrames mostly exist on a single machine but can convert to Spark DataFrame&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;myRange = spark.range(1000).toDF("number")&lt;/code&gt; in Python creates a DataFrame with one column containing 1,000 rows with values from 0 to 999. This range of numbers represents a distributed collection. When run on a cluster, each part of this range of numbers exists on a different executor. This is a Spark DataFrame.&lt;/li&gt;

&lt;li&gt;most efficient and easiest to use&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transformations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;narrow: those for which each input partition will contribute to only one output partition

&lt;ul&gt;
&lt;li&gt;Spark will automatically perform an operation called &lt;em&gt;pipelining&lt;/em&gt;, meaning that if we specify multiple filters on DataFrames, they’ll all be performed in-memory. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;wide: a wide dependency (or wide transformation) has input partitions contributing to many output partitions, aka a &lt;em&gt;shuffle&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;when a shuffle is performed, Spark writes the results to disk&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
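To build intuition about narrow vs. wide, here is a toy model in plain Python (not Spark code) where each inner list stands in for one partition living on one machine:

```python
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Narrow: each input partition feeds exactly one output partition, so a
# filter and a map can be pipelined in memory, partition by partition.
narrow = [[x * 10 for x in part if x % 2 == 0] for part in partitions]

# Wide (a shuffle): rows are regrouped by key, so every input partition
# may contribute to every output partition.
num_out = 2
shuffled = [[] for _ in range(num_out)]
for part in partitions:
    for x in part:
        shuffled[x % num_out].append(x)  # the key (x % num_out) picks the target

print(narrow)    # [[20], [40, 60], [80]]
print(shuffled)  # [[2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

The narrow step never moves data between "machines", which is why Spark can keep it in memory; the wide step regroups rows across all partitions, which is the part Spark writes to disk.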

&lt;h3&gt;
  
  
  Actions
&lt;/h3&gt;

&lt;h2&gt;
  
  
  Theoretical foundations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Goals
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Making distribution transparent/invisible: hide that processes and resources are physically distributed across multiple computers

&lt;ul&gt;
&lt;li&gt;access: hide differences in data representation and how a process or resource is accessed&lt;/li&gt;
&lt;li&gt;location: hide where a process or resource is located&lt;/li&gt;
&lt;li&gt;relocation: hide that a resource or process may be moved to another location while in use&lt;/li&gt;
&lt;li&gt;migration: hide that a resource or process may move to another location&lt;/li&gt;
&lt;li&gt;replication: hide that a resource or process is replicated&lt;/li&gt;
&lt;li&gt;concurrency: hide that a resource or process may be shared by several independent users&lt;/li&gt;
&lt;li&gt;failure: hide the failure and recovery of a resource or process &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;What are the main problems in distributed systems?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiset&lt;/li&gt;
&lt;li&gt;working set: the amount of memory that a process requires in a given time interval, typically the units of information in question are considered to be memory pages. This is suggested to be an approximation of the set of pages that the process will access in the future and more specifically is suggested to be an indication of what pages ought to be kept in main memory to allow most progress to be made in the execution of that process.&lt;/li&gt;
&lt;li&gt;cluster computing paradigms&lt;/li&gt;
&lt;li&gt;distributed shared memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Architectural foundations&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RDD: resilient distributed dataset, a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way&lt;/li&gt;
&lt;li&gt;The Dataframe API was released as an abstraction on top of the RDD, followed by the Dataset API. In Spark 1.x, the RDD was the primary application programming interface (API), but as of Spark 2.x use of the Dataset API is encouraged.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cluster Computing Paradigm&lt;/p&gt;

&lt;p&gt;Limitations of MapReduce cluster computing paradigm&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spark and its RDDs were developed in 2012 in response to limitations in the MapReduce cluster computing paradigm.&lt;/li&gt;
&lt;li&gt;MapReduce forces a particular linear dataflow structure on distributed programs: 

&lt;ul&gt;
&lt;li&gt;MapReduce programs read input data from disk&lt;/li&gt;
&lt;li&gt;map a function across the data &lt;/li&gt;
&lt;li&gt;reduce the results of the map&lt;/li&gt;
&lt;li&gt;store reduction results on disk&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.&lt;/li&gt;

&lt;li&gt;Spark facilitates the implementation of both iterative algorithms, which visit their data set multiple times in a loop, and interactive/exploratory data analysis, i.e., the repeated database-style querying of data. The latency of such applications may be reduced by several orders of magnitude compared to Apache Hadoop MapReduce implementation. &lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Apache Spark requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a cluster manager: standalone (native Spark cluster), Hadoop YARN, Apache Mesos&lt;/li&gt;
&lt;li&gt;a distributed storage system: Alluxio, HDFS, MapR-FS, Cassandra, OpenStack Swift, Amazon S3, Kudu, a custom solution, pseudo-distributed local mode where distributed storage is not required and the local file system can be used instead (Spark is run on a single machine with one executor per CPU core in this case)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why use RDDs?&lt;br&gt;
RDDs are lower-level abstractions because they reveal physical execution characteristics, like partitions, to end users. You might use RDDs to parallelize raw data stored in memory on the driver machine. &lt;/p&gt;
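&lt;p&gt;As a rough plain-Python analogy (not Spark code), parallelizing raw driver-side data amounts to slicing it into partitions that can be processed independently; the &lt;code&gt;numSlices&lt;/code&gt; name in the comment mirrors the PySpark parameter, but the code itself is a hypothetical sketch.&lt;/p&gt;

```python
# Plain-Python analogy for what SparkContext.parallelize does conceptually:
# data held in driver memory is sliced into partitions, and a function is
# then applied within each partition independently (in Spark, by executors).

def partition(data, num_partitions):
    """Slice a list into roughly equal contiguous partitions."""
    size, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts

raw = list(range(10))       # "raw data stored in memory on the driver"
parts = partition(raw, 3)   # analogous to sc.parallelize(raw, numSlices=3)
print(parts)                # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A map over the "RDD" is just the function applied within each partition.
squared = [[x * x for x in p] for p in parts]
print(squared)
```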

&lt;p&gt;Scala RDDs are &lt;strong&gt;not&lt;/strong&gt; equivalent to Python RDDs.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Chapter 4: Structured API Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spark is a distributed programming model in which the user specifies transformations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Spark is a distributed programming model in which the user specifies transformations. Multiple transformations build up a directed acyclic graph of instructions. An action begins the process of executing that graph of instructions, as a single job, by breaking it down into stages and tasks to execute across the cluster. The logical structures that we manipulate with transformations and actions are DataFrames and Datasets. To create a new DataFrame or Dataset, you call a transformation. To start computation or convert to native language types, you call an action."&lt;/p&gt;
&lt;/blockquote&gt;
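&lt;p&gt;The lazy-transformation/eager-action split in the quote can be loosely mimicked with Python generators (an analogy, not Spark itself): chaining generators builds up a pipeline without doing any work, and a terminal call like &lt;code&gt;list()&lt;/code&gt; plays the role of the action that triggers execution.&lt;/p&gt;

```python
# Analogy for lazy transformations vs. eager actions, using generators.
# Chaining generator expressions builds a pipeline without computing
# anything -- like chaining DataFrame/RDD transformations. A terminal
# call such as list() acts like a Spark action: it walks the whole
# pipeline and produces concrete values.

executed = []

def traced(x):
    executed.append(x)   # record when work actually happens
    return x * 10

data = range(5)
mapped = (traced(x) for x in data)         # "transformation": nothing runs yet
filtered = (x for x in mapped if x >= 20)  # another lazy "transformation"

print(executed)          # [] -- no work has been done so far

result = list(filtered)  # the "action": triggers the whole pipeline
print(result)            # [20, 30, 40]
print(executed)          # [0, 1, 2, 3, 4] -- work happened at action time
```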

</description>
      <category>spark</category>
      <category>distributedsystems</category>
      <category>learningplan</category>
    </item>
    <item>
      <title>A technical leadership lesson from interacting with folks outside of engineering</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Fri, 08 Mar 2019 02:30:46 +0000</pubDate>
      <link>https://dev.to/flopi/a-technical-leadership-lesson-from-interacting-with-folks-outside-of-engineering-416n</link>
      <guid>https://dev.to/flopi/a-technical-leadership-lesson-from-interacting-with-folks-outside-of-engineering-416n</guid>
      <description>&lt;p&gt;At work, I have the privilege of interacting with folks from other (non-engineering) teams often. I really enjoy this part of the job! I studied computer science in college and have worked as an engineer since so being in a role that gives me visibility into other roles is refreshing. I see every interaction with someone new as a chance to learn something and perhaps &lt;em&gt;build together&lt;/em&gt;? 😊  &lt;/p&gt;

&lt;p&gt;So when I came across this blog post I was immediately intrigued:&lt;/p&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="/samjarman" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F19747%2F97b2fe78-0de2-4c81-91aa-a41493c58b60.png" alt="samjarman"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/samjarman/daniele-bernardi-on-working-with-non-technical-people-and-getting-recognition-right-533l" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Daniele Bernardi on Working with Non-Technical People and Getting Recognition Right&lt;/h2&gt;
      &lt;h3&gt;Sam Jarman 👨🏼‍💻 ・ Mar 5 '19&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#career&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#nontechnicalpeople&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#softskills&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;This quote resonated with me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'm not afraid to say I struggled with this aspect at the beginning of my career. I thought technical leadership would simply mean displaying your knowledge by painstakingly listing all the details in a project, but I soon found out that aspect is actually perceived as overzealous. Technical leadership is really about using your best judgement to condense the most difficult aspects into a straightforward statement; it also means hiding details you know don't require broad consensus among the key stakeholders of your project."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I recently created a document whose primary audience is not engineers. I was proud of having started the conversation that led to this very document being written. I truly believed this document was taking my team one step forward on the road to success. So, I included several hundred pages of sample records, which took many evenings during off-hours to put together. I wanted to be as precise as possible, and I felt good about my effort right up until I received feedback from the other team. They had removed the sample records because they didn't add to the discussion. &lt;/p&gt;

&lt;p&gt;I realize I did not practice enough self-awareness when writing this document. Yes, I was concerned with how the readers would perceive me and my knowledge but now I see that I was more concerned with proving I knew what I was talking about. I missed the point.&lt;/p&gt;

&lt;p&gt;The blog post I reference above is not the first time I have heard about the delivery of simplified messages as a key aspect of technical leadership. Other people I look up to and respect have shared this with me. This is, however, the first time that it really hit home for me as I reflected on this experience with another team outside engineering. So I will be mulling over this lesson, practicing empathy in future interactions, and actively seeking feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I am curious about your thoughts! Please send me a message or comment below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;P.S. I hate referring to folks as "non-technical" but I couldn't help how someone else titled their blog post. April Wensel explains some of the issues I have with the term on her &lt;a href="https://medium.com/compassionate-coding/if-you-can-use-a-fork-youre-technical-352e21d92c87" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>technicalleadership</category>
    </item>
    <item>
      <title>Why you should join a new grad program</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Tue, 05 Mar 2019 02:22:25 +0000</pubDate>
      <link>https://dev.to/flopi/why-you-should-join-a-new-grad-program-4aec</link>
      <guid>https://dev.to/flopi/why-you-should-join-a-new-grad-program-4aec</guid>
      <description>&lt;p&gt;I immigrated to the United States when I was seven years old. Life was hard for my family. They were not working glamorous jobs. They were making ends meet the best they could. My mom never had a Bring Your Kid to Work Day at her low wage job. I did not get exposure to folks working in corporate settings either. So when I started working in the tech industry after college, I was very nervous -- I didn't know what to expect but I knew that I really &lt;em&gt;needed&lt;/em&gt; to succeed at this. This was my chance to make a better life for myself. &lt;strong&gt;I had been waiting for this moment my whole life.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenu1gzyiof9294qux1mx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenu1gzyiof9294qux1mx.jpg" alt="Fairly Odd Parents sequence switch corporate" width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I consider myself very fortunate. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;My first job was through J.P. Morgan's &lt;a href="https://careers.jpmorgan.com/us/en/students/programs/software-engineer-fulltime" rel="noopener noreferrer"&gt;Software Engineer Program&lt;/a&gt;. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I remember getting the call with the job offer. Compensation was several times more than what my mom made in a year. I was thrilled. I was terrified. Many people I looked up to told me that this was a place I could learn best practices and grow. Also, the money... so I accepted. I had many restless nights after that. I was worried that any second the offer would be rescinded. I did not rest until the first day went by without a hitch. I am happy to say today that everything worked out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fs1.dmcdn.net%2FjeS0o%2Fx480-_W9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fs1.dmcdn.net%2FjeS0o%2Fx480-_W9.jpg" alt="Welcome junior business leaguers" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The program offers many benefits:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You’ll have access to continuous training both on-the-job and via courses to build your technical and business skills. We’ll cover topics ranging from cybersecurity to presentation skills to further your career development. Our teams are dedicated to your support and advocacy throughout the two years of the program."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They're not kidding. I got a lot of support here. I had access to mentors and trainings, and I built a strong network. My manager was cool and I learned more from them than I ever thought imaginable. &lt;strong&gt;I highly recommend joining a program for new grads if you want extra support during your transition from student to professional developer.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;New grad programs set the expectation that you will need extra support to succeed. When you tell more senior colleagues (the good ones!) you are in the program, they will make themselves available to you for help. Many will offer to mentor you. Take them up on it. &lt;/p&gt;

&lt;p&gt;Programs also tend to have policies to ensure that those in charge of your success are doing right by you. Managers often have to attend trainings to learn how to best support you. Seeing how seriously my manager at the time took the trainings increased my trust in them. They often shared what they learned at the trainings and kept me in the loop about what was coming up like the dreaded performance reviews (which they prepared me for!). &lt;/p&gt;

&lt;p&gt;On the flip side, Jesse Jiryu Davis writes about getting his interns taken away at one point: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"My managers were watching me founder and they issued ultimata: get involved, make goals, get the project on track. I never did. With only weeks left in the summer, Intern Protective Services reassigned my apprentices to Mike O'Brien, who maintained the MongoDB Connector for Hadoop at the time."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Jesse &lt;a href="https://emptysqua.re/blog/mentoring/#disaster" rel="noopener noreferrer"&gt;did eventually triumph as a mentor (yay!)&lt;/a&gt; but what I'd like to point to here is the structure set in place to ensure folks are supported. You want to know that your employer will rectify the situation if you find yourself on a team that is not working for you and you want to be sure it is easy to say something to someone who can help you before the situation gets too dire. &lt;strong&gt;Communicating difficult issues will be much easier in a new grad program where regular check-ins with program coordinators are the norm.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvignette.wikia.nocookie.net%2Ffairlyoddparents%2Fimages%2Fc%2Fc4%2FPixiesInc048.png%2Frevision%2Flatest%3Fcb%3D20110321022504%26path-prefix%3Den" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvignette.wikia.nocookie.net%2Ffairlyoddparents%2Fimages%2Fc%2Fc4%2FPixiesInc048.png%2Frevision%2Flatest%3Fcb%3D20110321022504%26path-prefix%3Den" alt="Fairly Odd Parents Pixie complaint desk" width="636" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I remember running into my program coordinator at the time and them saying, "Hi Flo, how are you?". They remember my name?! They must have interviewed hundreds of candidates! In a firm of over 400,000, do not underestimate how many people you will get to know. They will be rooting for you. That's pretty special and it will make a difference in your life. As you grow in your career, you will still get support but the expectation will be that you will seek out what you need. &lt;strong&gt;The support you get at this stage will model the support you will most likely seek in the future.&lt;/strong&gt; Make sure you have high standards for yourself!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ldb3iwf3sz6wuszfbfo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ldb3iwf3sz6wuszfbfo.png" alt="Close up of Pixie Boss saying you're in charge" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, in conclusion, if you are uneasy about the transition to professional life, consider a new grad program. You will learn what you need to succeed, have a structure in place to make sure you are supported, and you will have dedicated folks helping you every step of the way. I am really grateful my career started out like this!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I am happy to talk to you if you are considering a new grad program :)&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>newgrad</category>
    </item>
    <item>
      <title>How I'm developing my learning plan this year</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Sun, 03 Mar 2019 19:49:38 +0000</pubDate>
      <link>https://dev.to/flopi/how-im-developing-my-learning-plan-this-year-1pj5</link>
      <guid>https://dev.to/flopi/how-im-developing-my-learning-plan-this-year-1pj5</guid>
      <description>&lt;h1&gt;
  
  
  Motivation
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;My grandfather took my sister and me to the library every week as kids. I remember being in awe of the large books older folks would pick up. I remember telling myself that one day I too would be able to read such long books.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have wanted to be part of a &lt;a href="https://www.recurse.com/about" rel="noopener noreferrer"&gt;Recurse Center&lt;/a&gt; batch ever since I found out the center exists. The thought of spending extended time learning about what I want brings me joy. Getting myself to a place where I feel comfortable embarking in self-guided learning &lt;em&gt;for hard things&lt;/em&gt; is also a huge motivation for me.&lt;/p&gt;

&lt;p&gt;Working at something you want to get better at and &lt;a href="https://www.healthyplace.com/blogs/buildingselfesteem/2015/11/feel-motivated-and-confident-with-this-dbt-skill" rel="noopener noreferrer"&gt;building mastery&lt;/a&gt; is a Dialectical Behavioral Therapy skill used to increase self-confidence. Through years of DBT, I have learned that when you want to achieve something, you need clear, actionable steps to get there; otherwise, you're setting yourself up for failure. I know I want to be able to learn any difficult topic, so I must practice learning a difficult topic, reflect on what worked and what didn't, and continue.&lt;/p&gt;

&lt;h1&gt;
  
  
  Expectations
&lt;/h1&gt;

&lt;p&gt;I looked to what the Recurse Center &lt;a href="https://www.recurse.com/what-we-look-for" rel="noopener noreferrer"&gt;looks for in applicants&lt;/a&gt; for a good model of possible habits to strive towards. I created the Daily Affirmations graphic below and set it as my screen background. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy0jivmgxag1loqg87kx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy0jivmgxag1loqg87kx.png" alt="My affirmations" width="800" height="1030"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To be clear, I don't think you need all of these to succeed. For example, I don't think you need to enjoy programming to get better at it; however, these aspirations align with my interests. I do enjoy programming! Doing activities that bring us happiness often increases well-being. What can I do to feed this interest? I find this reminder grounding whenever I am feeling frustrated by mundane work or feeling external pressure that doesn't align with my values.&lt;/p&gt;

&lt;p&gt;Also, note that one of my values is being &lt;em&gt;intellectually honest&lt;/em&gt;. I don't pretend to know something really well if I don't! To me, this isn't about moral superiority but rather the opportunities that open up when you are honest with yourself. When you fill in what you know about a topic you can see where the gaps are in your understanding and seek help. One of my fears when I started in this field was stagnation. I have learned over time that it is rare for things to take you by surprise when you are honest with yourself and practice self-awareness. Being honest with yourself also means being kind to yourself and that is so much easier to do when you know that you don't understand pointers because you are still fuzzy on references, for example, instead of rejecting C altogether because you've been struggling for a while.&lt;/p&gt;

&lt;h1&gt;
  
  
  Learning Goals
&lt;/h1&gt;

&lt;p&gt;At first, I knew I wanted to learn something thoroughly but I wasn't sure what exactly so I wrote down a list of interests in a Google doc. This is that list:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What are my interests?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementation of different database types i.e NoSQL, SQL, graph&lt;/li&gt;
&lt;li&gt;Seeing how changes to implementation impact performance&lt;/li&gt;
&lt;li&gt;Database performance&lt;/li&gt;
&lt;li&gt;Systems&lt;/li&gt;
&lt;li&gt;Optimization of code at the lowest level i.e. assembly&lt;/li&gt;
&lt;li&gt;Drivers&lt;/li&gt;
&lt;li&gt;How networks work&lt;/li&gt;
&lt;li&gt;Physics of wi-fi&lt;/li&gt;
&lt;li&gt;How engines work e.g. storage engine, what does engine mean?&lt;/li&gt;
&lt;li&gt;How does JVM work? What is Java bytecode? What does that mean?&lt;/li&gt;
&lt;li&gt;Regex and state machines&lt;/li&gt;
&lt;li&gt;Designing distributed systems&lt;/li&gt;
&lt;li&gt;Assembler commands to machine commands, CPU understands binary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Designing Data Intensive Applications&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database algorithms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Database Reliability Engineering&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SLAs&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a lot going on in this list. To know something well, you must first know it not so well. I currently use Python at work so I decided to learn this language thoroughly. I also noticed that the JavaScript community is welcoming and there is lots of accessible learning material out there. Learning JavaScript alongside Python should give me a chance to touch on some of the topics I am interested in like performance, low level details of languages, and how engines work.&lt;/p&gt;

&lt;h1&gt;
  
  
  Desired Outcomes
&lt;/h1&gt;

&lt;p&gt;I know I want to know Python and JavaScript thoroughly, but because I haven't created a learning plan of this size and scope yet, there are still many unknowns. &lt;/p&gt;

&lt;p&gt;I know I need to reinforce my learning, so I will be blogging about what I learn along the way. I am also gathering all my notes in the same place so I can clearly see where the gaps in my knowledge are. I decided to go with &lt;a href="https://www.literatureandlatte.com/scrivener/overview" rel="noopener noreferrer"&gt;Scrivener&lt;/a&gt;, a word processor used for putting together literary works. I like it because it allows you to (re-)organize your thoughts into sections and subsections easily and integrates with &lt;em&gt;BibTeX&lt;/em&gt; for citation management. &lt;/p&gt;

&lt;p&gt;This is what the project structure looks like right now:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5zh1ake8el678cxpk1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5zh1ake8el678cxpk1g.png" alt="Project structure" width="648" height="966"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I add subtopics as I go. I am still looking for a good language implementation book. I am thinking about getting "the dragon book". &lt;strong&gt;If you have any recommendations, please let me know!&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Progress so far
&lt;/h1&gt;

&lt;p&gt;I am making good progress! Learning about JavaScript in conjunction with Python has made it easier to recognize language implementation patterns and what the lingo for those patterns is. For example, I came across this excellent &lt;a href="https://blog.bitsrc.io/understanding-execution-context-and-execution-stack-in-javascript-1c9ea8642dd0" rel="noopener noreferrer"&gt;JavaScript execution context post&lt;/a&gt;. I realized that though I knew of the concept of an execution context, I had not thought about it formally. &lt;strong&gt;Knowing what keywords to search for is so important.&lt;/strong&gt; By looking up Python execution context info, I learned more about PYTHONPATH and why some code of mine a while ago was acting the way it was. Now I know what to search for when learning &lt;em&gt;any&lt;/em&gt; new programming language. &lt;/p&gt;
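&lt;p&gt;For instance, the module search path that &lt;code&gt;PYTHONPATH&lt;/code&gt; feeds into can be inspected directly. This small sketch is my own illustration (not from the linked post) of where Python actually looks when resolving imports:&lt;/p&gt;

```python
import os
import sys

# Python resolves imports by scanning sys.path in order; the PYTHONPATH
# environment variable (if set) contributes entries near the front of
# that list. Inspecting it can explain surprising behavior, e.g. when
# an unexpected module shadows the one you meant to import.

print(sys.path[:3])  # the first few search locations

extra = os.environ.get("PYTHONPATH")
if extra:
    print("PYTHONPATH contributes:", extra.split(os.pathsep))
else:
    print("PYTHONPATH is not set; sys.path still has its defaults.")
```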

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Making a plan for myself and starting with the basics like creating motivational content for myself has been helpful. I found something to aspire to (joining a Recurse Center batch) that already had a basic guide on the habits I need to get to my goal. I chose topics to focus on and created a structure that lets me see what I'm missing in order to fully understand a concept.&lt;/p&gt;

&lt;p&gt;I am actively writing down what I learn and reflecting on both the content and execution (no pun intended!). I have found that learning this way is super fun. I don't feel burdened with completing an entire textbook before proceeding to the next topic. I can switch from JavaScript to Python and vice versa when I get bored or when a concept is difficult to understand in one language. I constantly find new things to try out, like profiling Python code or deploying my own vanilla JS site to my new domain (!), that give me a quick feeling of satisfaction in between the difficult concepts like EBNF grammar files and lexical environments. &lt;/p&gt;

&lt;p&gt;Importantly, I notice that I am making connections between the material I learn &lt;em&gt;for fun&lt;/em&gt; and the material I learn &lt;em&gt;for work&lt;/em&gt; without the imposter syndrome anxiety. &lt;strong&gt;I see that I am growing as a person and developing interests that are completely my own and not fueled by a paycheck which has increased my feelings of self-efficacy.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;I'd love to hear about your learning plans and reflections! I have seen how some of you on this platform use blogging to keep yourselves accountable in your learning and it's super motivating! Keep up the good work, folks :)&lt;/p&gt;

</description>
      <category>python</category>
      <category>javascript</category>
      <category>learningplan</category>
    </item>
    <item>
      <title>I get nervous as a mentor, too!</title>
      <dc:creator>Flo Comuzzi</dc:creator>
      <pubDate>Fri, 01 Mar 2019 02:26:35 +0000</pubDate>
      <link>https://dev.to/flopi/i-get-nervous-as-a-mentor-too-2954</link>
      <guid>https://dev.to/flopi/i-get-nervous-as-a-mentor-too-2954</guid>
      <description>&lt;p&gt;Dear Reader, &lt;br&gt;
I want to be candid with you about my experiences as a mentor. I have mentored steadily in some capacity since I started working in industry. Mentoring caused complex emotions for me in the beginning. I love helping people and I took on the responsibility knowing that giving anything less than 100% would be difficult for me. I put so much pressure on myself. &lt;em&gt;I was nervous.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you have a mentor, you may also be nervous about your interactions with them. I remember fearing that I was wasting my mentors' time. I feared that their offers to answer any questions or meet up in the future were only courtesies. Now, as a mentor, I know that when I offer my time to someone, I mean it. I &lt;em&gt;love&lt;/em&gt; when someone reaches out after I offer help. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If I enthusiastically tell you to email me anytime and give you my card, please know that I have already decided that you are an intelligent person with important things to say and I am incredibly interested in hearing more from you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I see incredible potential in you. I think you are special. I want to see you succeed. You have made a great impression on me. Please do not forget this!&lt;/p&gt;

&lt;p&gt;If you are a new mentor, you should remind yourself that your mentee is likely incredibly nervous because they think you're a cool person. Recently, I met a wonderful and smart young person who, as soon as they met me, expressed how awesome they think I am because I work at [cool place]. I was so struck by the idea that someone looks up to me for being at a job that they would like to be at that I froze for a few seconds. I instantly saw the worry in their face from my lack of response. I remembered how inspiring/intimidating it was for me to meet engineers working in industry. &lt;em&gt;As a mentor, cherish the moments your mentee expresses genuine emotion, whether happiness, sadness, fear, or anger.&lt;/em&gt; It is difficult to be honest with people we respect for fear that we may come off as inappropriate or unprofessional. Remember to practice empathy! If your mentee expresses something that throws you off and you don't know what to say, think back to when you were in a similar position as your mentee and share some tidbit about what that was like for you. That's usually enough to comfort your mentee and let them know they haven't said something egregious. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Though I still get anxious from time to time, these feelings have mostly decreased in intensity as I have come to understand the dynamics of mentoring relationships more.  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You'll get there too! Stay honest with yourself throughout the process, validate your feelings, and know that you'll get better at this thing. The lessons you'll learn and the relationships you'll build by sticking with it are truly worth it.&lt;/p&gt;

&lt;p&gt;Signed, &lt;br&gt;
A Very Eager Mentor&lt;/p&gt;

</description>
      <category>mentorship</category>
    </item>
  </channel>
</rss>
