<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakub T</title>
    <description>The latest articles on DEV Community by Jakub T (@digitaldisorder).</description>
    <link>https://dev.to/digitaldisorder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F95942%2F79455787-46ea-4236-b6cb-c5025e1cfd42.jpg</url>
      <title>DEV Community: Jakub T</title>
      <link>https://dev.to/digitaldisorder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/digitaldisorder"/>
    <language>en</language>
    <item>
      <title>Tail AWS CloudWatch logs</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Fri, 07 Jan 2022 14:38:12 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/tail-aws-cloudwatch-logs-1494</link>
      <guid>https://dev.to/digitaldisorder/tail-aws-cloudwatch-logs-1494</guid>
      <description>&lt;p&gt;Sometimes it is very useful to be able to stream remote logs from CloudWatch this is very easy with the &lt;a href="https://awscli.amazonaws.com/v2/documentation/api/latest/reference/logs/tail.html"&gt;AWS CLI tail&lt;/a&gt; command.&lt;/p&gt;

&lt;p&gt;Basic version of the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws logs tail log_group_name --tail since 1h --follow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also filter by the log stream name or its prefix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws logs  tail log_group_name --log-stream-name-prefix web --since 1h --follow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
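
&lt;p&gt;We can also narrow the output down with a CloudWatch Logs filter pattern and a more compact output format - a sketch (the log group name is a placeholder, as above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws logs tail log_group_name --since 1h --follow --filter-pattern "ERROR" --format short
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;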



</description>
      <category>aws</category>
      <category>logs</category>
      <category>cli</category>
    </item>
    <item>
      <title>pyenv install fails on macOS 11 Big Sur</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Wed, 06 Jan 2021 09:23:57 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/penv-install-fails-on-macos-11-bigsur-4a13</link>
      <guid>https://dev.to/digitaldisorder/penv-install-fails-on-macos-11-bigsur-4a13</guid>
      <description>&lt;p&gt;If you are getting the following error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pyenv install  3.8.6
python-build: use openssl@1.1 from homebrew
python-build: use readline from homebrew
Downloading Python-3.8.6.tar.xz...
-&amp;gt; https://www.python.org/ftp/python/3.8.6/Python-3.8.6.tar.xz

Installing Python-3.8.6...
python-build: use readline from homebrew
python-build: use zlib from xcode sdk

BUILD FAILED (OS X 11.1 using python-build 20180424)

Inspect or clean up the working tree at /var/folders/6x/cp1l4tr97y7g3jxm74n4c6v80000gn/T/python-build.20210106100539.35673
Results logged to /var/folders/6x/cp1l4tr97y7g3jxm74n4c6v80000gn/T/python-build.20210106100539.35673.log

Last 10 log lines:
    mod_name, mod_spec, code = _get_module_details(mod_name)
  File "/private/var/folders/6x/cp1l4tr97y7g3jxm74n4c6v80000gn/T/python-build.20210106100539.35673/Python-3.8.6/Lib/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/private/var/folders/6x/cp1l4tr97y7g3jxm74n4c6v80000gn/T/python-build.20210106100539.35673/Python-3.8.6/Lib/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "&amp;lt;frozen zipimport&amp;gt;", line 241, in load_module
  File "&amp;lt;frozen zipimport&amp;gt;", line 709, in _get_module_code
  File "&amp;lt;frozen zipimport&amp;gt;", line 570, in _get_data
zipimport.ZipImportError: can't decompress data; zlib not available
make: *** [install] Error 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It breaks because the Python build cannot find zlib. &lt;/p&gt;

&lt;p&gt;The solution is easy - provide CFLAGS pointing at the Xcode SDK headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ CFLAGS="-I$(xcrun --show-sdk-path)/usr/include" pyenv install 3.8.6
python-build: use openssl@1.1 from homebrew
python-build: use readline from homebrew
Downloading Python-3.8.6.tar.xz...
-&amp;gt; https://www.python.org/ftp/python/3.8.6/Python-3.8.6.tar.xz
Installing Python-3.8.6...
python-build: use readline from homebrew
python-build: use zlib from xcode sdk


Installed Python-3.8.6 to /Users/kuba/.pyenv/versions/3.8.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
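
&lt;p&gt;As a quick sanity check (not part of the original build output), we can confirm that the freshly built interpreter really can import zlib:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ~/.pyenv/versions/3.8.6/bin/python -c "import zlib; print(zlib.__name__)"
zlib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;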



</description>
      <category>pyenv</category>
      <category>python</category>
      <category>macosx</category>
      <category>bigsur</category>
    </item>
    <item>
      <title>Getting started with Terraform on AWS</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Tue, 07 Jul 2020 06:26:15 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/getting-started-with-terraform-on-aws-2eip</link>
      <guid>https://dev.to/digitaldisorder/getting-started-with-terraform-on-aws-2eip</guid>
      <description>&lt;p&gt;It is overwhelming at first to start managing the infrastructure using Terraform. It also seems like an overkill if you are not a big company with devops team but at the end you will thank yourself for that but it is really easier than you think. &lt;/p&gt;

&lt;h1&gt;
  
  
  Getting started
&lt;/h1&gt;

&lt;p&gt;First you need to install terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install terraform
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then create the directory where you will store your files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir terraform
cd terraform
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And start creating your infrastructure. &lt;/p&gt;

&lt;p&gt;Let's start by creating a simple &lt;code&gt;init.tf&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region = "us-east-1"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;In order for this to work you need a properly configured &lt;a href="https://aws.amazon.com/cli/"&gt;aws-cli&lt;/a&gt; with the credentials saved in the &lt;code&gt;~/.aws/credentials&lt;/code&gt; file. This will save you from putting your secrets into the Terraform files and allow you to commit everything to source control. &lt;/p&gt;
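
&lt;p&gt;For reference, the &lt;code&gt;~/.aws/credentials&lt;/code&gt; file usually looks like this (the values here are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;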

&lt;p&gt;Other useful options are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;profile - if you have multiple profiles in your &lt;code&gt;.aws/credentials&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;shared_credentials_file - if you want to point at a specific credentials file &lt;/li&gt;
&lt;/ul&gt;
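
&lt;p&gt;For example, a provider block using a named profile could look like this (the profile name is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region                  = "us-east-1"
  profile                 = "work"
  shared_credentials_file = "~/.aws/credentials"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;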

&lt;p&gt;More about that: &lt;a href="https://www.terraform.io/docs/providers/aws/index.html#shared-credentials-file"&gt;https://www.terraform.io/docs/providers/aws/index.html#shared-credentials-file&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Initializing terraform
&lt;/h1&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "aws" (hashicorp/aws) 2.56.0...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.aws: version = "~&amp;gt; 2.56"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
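
&lt;p&gt;Following the suggestion in the init output, it is a good idea to pin the provider version in &lt;code&gt;init.tf&lt;/code&gt; - for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region  = "us-east-1"
  version = "~&amp;gt; 2.56"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;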



&lt;h1&gt;
  
  
  Creating the first AWS resource
&lt;/h1&gt;

&lt;p&gt;As an example we will create an S3 bucket. &lt;br&gt;
In order to do that we will create the s3.tf file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket" "private-backups" {
  bucket = "private-backups"
  acl = "private"
  tags = {
    product = "infra"
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As you can see, it is fairly straightforward and self-explanatory.&lt;/p&gt;

&lt;p&gt;We can now preview what will be executed before applying anything, by running (&lt;code&gt;tf&lt;/code&gt; here is a shell alias for &lt;code&gt;terraform&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tf plan
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once validated, we can just apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tf apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_s3_bucket.private-backups will be created
  + resource "aws_s3_bucket" "private-backups" {
      + acceleration_status         = (known after apply)
      + acl                         = "private"
      + arn                         = (known after apply)
      + bucket                      = "private-backups"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags                        = {
          + "product" = "infra"
        }
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_s3_bucket.private-backups: Creating...
aws_s3_bucket.private-backups: Still creating... [10s elapsed]
aws_s3_bucket.private-backups: Still creating... [20s elapsed]
aws_s3_bucket.private-backups: Still creating... [30s elapsed]
aws_s3_bucket.private-backups: Creation complete after 30s [id=private-backups]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If we now run the plan again, it will report that nothing needs to be changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tf plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_s3_bucket.private-backups: Refreshing state... [id=private-backups]

------------------------------------------------------------------------

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The next step would be to commit the files to source control so we can keep track of the changes.&lt;/p&gt;
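
&lt;p&gt;If you use git, a minimal &lt;code&gt;.gitignore&lt;/code&gt; keeps the local plugin cache and the state files (which can contain secrets) out of the repository - a common sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.terraform/
*.tfstate
*.tfstate.backup
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;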

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>devops</category>
    </item>
    <item>
      <title>Throwaway postgres</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Sat, 09 May 2020 08:16:57 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/throwaway-postgres-1gml</link>
      <guid>https://dev.to/digitaldisorder/throwaway-postgres-1gml</guid>
      <description>&lt;p&gt;I don't like to install too many services on my machine as after a while I end up with lots of services - Postgres, Redis, Mongo, Memcache, Elasticsearch that have lots of data from different POC projects. &lt;/p&gt;

&lt;p&gt;So whenever I need to test something quickly I just prefer to run it in a docker container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --name postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=kuba -e POSTGRES_DB=db -e POSTGRES_USER=kuba postgres:12
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
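
&lt;p&gt;When I am done, throwing it away is just as quick - this removes the container together with its anonymous volumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker rm -f -v postgres
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;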



&lt;p&gt;Or with docker compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3'
services:
    postgres:
        image: postgres:12
        environment:
            - POSTGRES_USER=kuba
            - POSTGRES_PASSWORD=kuba
            - POSTGRES_DB=db
        ports:
            - "5432:5432"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And we can quickly run the Postgres DB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose up
Starting sql_postgres_1 ... done
Attaching to sql_postgres_1
postgres_1  |
postgres_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres_1  |
postgres_1  | 2020-05-09 08:13:45.246 UTC [1] LOG:  starting PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgres_1  | 2020-05-09 08:13:45.247 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres_1  | 2020-05-09 08:13:45.247 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres_1  | 2020-05-09 08:13:45.249 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1  | 2020-05-09 08:13:45.262 UTC [25] LOG:  database system was shut down at 2020-05-09 08:12:41 UTC
postgres_1  | 2020-05-09 08:13:45.266 UTC [1] LOG:  database system is ready to accept connections
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And we have a throwaway db running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ psql postgres://kuba:kuba@localhost:5432/db
Null display is "NULL".
Pager is always used.
Timing is on.
psql (12.2)
Type "help" for help.

kuba@localhost:5432 db#
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
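
&lt;p&gt;When we are done with the compose variant, one command tears everything down, volumes included:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose down -v
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;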



</description>
      <category>postgres</category>
      <category>dev</category>
      <category>docker</category>
      <category>dockercompose</category>
    </item>
    <item>
      <title>How to develop Apache Airflow DAGs in Docker Compose</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Tue, 05 May 2020 11:05:03 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/how-to-develop-apache-airflow-dags-in-docker-compose-95m</link>
      <guid>https://dev.to/digitaldisorder/how-to-develop-apache-airflow-dags-in-docker-compose-95m</guid>
      <description>&lt;h1&gt;
  
  
  How to run a development environment on docker-compose
&lt;/h1&gt;

&lt;p&gt;Quick overview of how to run Apache Airflow for development and tests on your local machine using docker-compose.&lt;/p&gt;

&lt;p&gt;We will still be using the unofficial &lt;code&gt;puckel/docker-airflow&lt;/code&gt; image. There is already an official Docker image, but I haven't tested it yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;docker &lt;/li&gt;
&lt;li&gt;docker-compose - &lt;a href="https://docs.docker.com/compose/install/"&gt;https://docs.docker.com/compose/install/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Project structure
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;docker-compose.yml - configuration file for the docker-compose&lt;/li&gt;
&lt;li&gt;dags - will contain all our dags&lt;/li&gt;
&lt;li&gt;lib - will contain all our custom code&lt;/li&gt;
&lt;li&gt;test - will contain our pytests&lt;/li&gt;
&lt;li&gt;.env - file with environment variables that we wish to pass to the containers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The environment variables are very handy because they allow you to customize almost everything in Airflow (&lt;a href="https://airflow.apache.org/docs/stable/best-practices.html?highlight=environment#configuration"&gt;https://airflow.apache.org/docs/stable/best-practices.html?highlight=environment#configuration&lt;/a&gt;)&lt;/p&gt;
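
&lt;p&gt;As a sketch, the &lt;code&gt;.env&lt;/code&gt; file can override any Airflow configuration option via the &lt;code&gt;AIRFLOW__SECTION__KEY&lt;/code&gt; naming convention (the values below are illustrative, not from the original setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=graph
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;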

&lt;h2&gt;
  
  
  docker-compose.yml
&lt;/h2&gt;

&lt;p&gt;The basic structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '2.1'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
    webserver:
        image: puckel/docker-airflow:1.10.9
        restart: always
        mem_limit: 2048m
        depends_on:
            - postgres
        env_file:
            - .env
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        volumes:
            - ./dags:/usr/local/airflow/dags
            - ./test:/usr/local/airflow/test
            - ./plugins:/usr/local/airflow/plugins
            # Uncomment to include custom plugins
            - ./requirements.txt:/requirements.txt
            - ~/.aws:/usr/local/airflow/.aws
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As you can see, there are several things going on here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we pass custom environment variables straight from the dotenv file (best practice is to keep it out of source control)&lt;/li&gt;
&lt;li&gt;we use a Postgres instance running as another docker container&lt;/li&gt;
&lt;li&gt;we share our dags/test/plugins directories with the host, so we can edit the code on our local machine and run all the tests in the container&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dummy DAG
&lt;/h2&gt;

&lt;p&gt;Let's create our first DAG in dags/dummy_dag.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime

with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
    op = DummyOperator(task_id='op')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  Running the environment
&lt;/h2&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose up

Starting airflow-on-docker-compose_postgres_1 ... done
Starting airflow-on-docker-compose_webserver_1 ... done
Attaching to airflow-on-docker-compose_postgres_1, airflow-on-docker-compose_webserver_1
[...]
webserver_1  | __init__.py:51}} INFO - Using executor [2020-05-05 10:19:08,741] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | LocalExecutor
webserver_1  | [2020-05-05 10:19:08,743] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Let's open &lt;a href="http://localhost:8080"&gt;http://localhost:8080&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ve8g2Gi9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fgcht7jww93s9tazb4g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ve8g2Gi9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fgcht7jww93s9tazb4g2.png" alt="Airflow instance on docker-compose"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Running the tests in the environment
&lt;/h1&gt;

&lt;p&gt;In order to run the tests in the environment we can just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose run webserver bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This will give us access to a bash shell running in the container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;➜  airflow-on-docker-compose git:(master) ✗ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
WARNING: You are using pip version 20.0.2; however, version 20.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
airflow@be3e69366e23:~$ ls
airflow.cfg  dags  plugins  test
airflow@be3e69366e23:~$ pytest test
bash: pytest: command not found
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Of course, we haven't installed pytest yet - fixing that is very easy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ echo "pytest" &amp;gt;&amp;gt; requirements.txt
$ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
Collecting pytest
  Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
     |████████████████████████████████| 246 kB 222 kB/s
Collecting more-itertools&amp;gt;=4.0.0
  Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
     |████████████████████████████████| 43 kB 3.1 MB/s
Collecting wcwidth
  Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
Requirement already satisfied: importlib-metadata&amp;gt;=0.12; python_version &amp;lt; "3.8" in /usr/local/lib/python3.7/site-packages (from pytest-&amp;gt;-r /requirements.txt (line 1)) (1.5.0)
Collecting packaging
  Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
Collecting pluggy&amp;lt;1.0,&amp;gt;=0.12
  Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Collecting py&amp;gt;=1.5.0
  Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
     |████████████████████████████████| 83 kB 956 kB/s
Requirement already satisfied: attrs&amp;gt;=17.4.0 in /usr/local/lib/python3.7/site-packages (from pytest-&amp;gt;-r /requirements.txt (line 1)) (19.3.0)
Requirement already satisfied: zipp&amp;gt;=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata&amp;gt;=0.12; python_version &amp;lt; "3.8"-&amp;gt;pytest-&amp;gt;-r /requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from packaging-&amp;gt;pytest-&amp;gt;-r /requirements.txt (line 1)) (1.14.0)
Collecting pyparsing&amp;gt;=2.0.2
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 624 kB/s
Installing collected packages: more-itertools, wcwidth, pyparsing, packaging, pluggy, py, pytest
  WARNING: The scripts py.test and pytest are installed in '/usr/local/airflow/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 wcwidth-0.1.9
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We can implement our first basic test, taken directly from &lt;a href="https://github.com/apache/airflow/blob/master/docs/best-practices.rst"&gt;https://github.com/apache/airflow/blob/master/docs/best-practices.rst&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from airflow.models import DagBag

def test_dag_loading():
    dagbag = DagBag()
    dag = dagbag.get_dag(dag_id='dummy_dag')
    assert dagbag.import_errors == {}
    assert dag is not None
    assert len(dag.tasks) == 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And now we can freely run our tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;airflow@a6ca8c1b706d:~$ .local/bin/pytest
========================================================================== test session starts ==========================================================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /usr/local/airflow
plugins: celery-4.4.0
collected 1 item

test/test_dag_loading.py .

===================================================================== 1 passed in 0.83s =====================================================================

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;All the code can be found here: &lt;a href="https://github.com/troszok/airflow-on-docker-compose"&gt;https://github.com/troszok/airflow-on-docker-compose&lt;/a&gt;&lt;/p&gt;

</description>
      <category>apacheairflow</category>
      <category>python</category>
      <category>docker</category>
      <category>dockercompose</category>
    </item>
    <item>
      <title>Accessing Tableau with Python</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Wed, 15 Jan 2020 09:03:19 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/accessing-tableau-with-python-3gmk</link>
      <guid>https://dev.to/digitaldisorder/accessing-tableau-with-python-3gmk</guid>
      <description>&lt;p&gt;You can find the Tableau SDK Python docs: &lt;a href="https://tableau.github.io/server-client-python/docs/"&gt;https://tableau.github.io/server-client-python/docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In order to be able to refresh the datasources, you need to get their IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tableauserverclient as TSC


USERNAME = ''
PASSWORD = ''
SITE_ID  = ''


tableau_auth = TSC.TableauAuth(USERNAME, PASSWORD, site_id=SITE_ID)
server = TSC.Server('https://dub01.online.tableau.com')
server.auth.sign_in(tableau_auth)

datasources, _ = server.datasources.get()

tags_to_refresh = {'production'}

# We want to refresh only the datasources with the production tag

datasource_ids = [datasource.id for datasource in datasources if (datasource.tags &amp;amp; tags_to_refresh)]

for datasource_id in datasource_ids:
    item = server.datasources.get_by_id(datasource_id)
    result = server.datasources.refresh(item)
    print(result)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



</description>
      <category>tableau</category>
      <category>python</category>
      <category>api</category>
    </item>
    <item>
      <title>How to run pyspark with additional Spark packages</title>
      <dc:creator>Jakub T</dc:creator>
      <pubDate>Wed, 01 Jan 2020 14:26:46 +0000</pubDate>
      <link>https://dev.to/digitaldisorder/how-to-run-pyspark-with-additional-spark-packages-5fc4</link>
      <guid>https://dev.to/digitaldisorder/how-to-run-pyspark-with-additional-spark-packages-5fc4</guid>
      <description>&lt;p&gt;When trying to run the tests of my PySpark jobs with delta.io I hit the following problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Caused by: java.lang.ClassNotFoundException: delta.DefaultSource
E                       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
E                       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
E                       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
E                       at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
E                       at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
E                       at scala.util.Try$.apply(Try.scala:192)
E                       at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
E                       at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
E                       at scala.util.Try.orElse(Try.scala:84)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
E                       ... 13 more

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is due to the fact that the delta.io packages are not available by default in the Spark installation. &lt;/p&gt;

&lt;p&gt;When writing Spark applications in Scala you will probably add the dependencies in your build file or when launching the app you will pass it using the &lt;code&gt;--packages&lt;/code&gt; or &lt;code&gt;--jars&lt;/code&gt; command-line arguments.&lt;/p&gt;
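
&lt;p&gt;For comparison, with &lt;code&gt;spark-submit&lt;/code&gt; the same dependency would be passed directly on the command line (the job file name here is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark-submit --packages io.delta:delta-core_2.11:0.5.0 my_job.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;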

&lt;p&gt;In order to force PySpark to install the delta packages, we can use the &lt;code&gt;PYSPARK_SUBMIT_ARGS&lt;/code&gt; environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PYSPARK_SUBMIT_ARGS='--packages io.delta:delta-core_2.11:0.5.0 pyspark-shell'
pytest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then you can execute the tests as before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pytest tests/delta_job.py

============================================================================================================================= test session starts ==============================================================================================================================
platform darwin -- Python 3.7.5, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
rootdir: /Users/kuba/work/delta-jobs/
plugins: requests-mock-1.7.0, mock-1.13.0, flaky-3.6.1, cov-2.8.1
collected 4 items

tests/delta_job.py ....                                                                                                                                                                                                                              [100%]

============================================================================================================================== 4 passed in 26.32s ==============================================================================================================================
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;and everything is working as expected.&lt;/p&gt;

&lt;p&gt;Of course, this way you can pass any available &lt;code&gt;spark-submit&lt;/code&gt; command-line argument. More about it here: &lt;a href="https://spark.apache.org/docs/latest/submitting-applications.html"&gt;https://spark.apache.org/docs/latest/submitting-applications.html&lt;/a&gt; &lt;/p&gt;

</description>
      <category>spark</category>
      <category>python</category>
      <category>pyspark</category>
      <category>delta</category>
    </item>
  </channel>
</rss>
