This is the second part of the series about "Python concepts for people who are in need of using Apache Airflow but have little or no knowledge of Python".
While Airflow concepts will be explained en passant, the main focus of these articles is Python concepts and techniques.
In this article I will focus on string concatenation, that is, putting together a text from multiple pieces such as hard-coded strings, variables, and/or templates.
Why string concatenation?
Initially I wanted to dedicate this article to Python data structures (like lists and dictionaries), but to show them in practice, most of my Airflow-related examples were about strings.
Feel free to skip this post, or quickly skim it, if you are familiar with Python strings.
Other reasons why I wanted to focus a bit more on this topic:
- over time Python introduced multiple ways to concatenate strings, so looking at code written by other people can sometimes be confusing
- it is a topic for interview questions, especially for junior engineers or analysts
- it is easy to do it wrong (wrong = in a way which is hard to maintain)
Some people find string concatenation confusing, and it can be. I hope that by the end of this article you will have the tools to understand string concatenation and formatting. If not, feel free to write your feedback in the comments.
A basic DAG with strings
from datetime import datetime
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
dag_name = "string_sample"
task1_name = "start"
task2_name = "end"
my_dag = DAG(dag_id=dag_name + "_dag",  # concatenation with +
             start_date=datetime(2020, 4, 29),
             schedule_interval="0 0 * * *"
             )
task1 = DummyOperator(dag=my_dag,
                      task_id="task_{}".format(task1_name)  # str.format()
                      )
task2 = DummyOperator(dag=my_dag,
                      task_id=f"task_{task2_name}"  # f-string
                      )
task1 >> task2
If we look at the Airflow web UI, we will see a DAG called string_sample_dag, which consists of two tasks: task_start and task_end. The operators used are, again, DummyOperator ones. This DAG is just an excuse to show you how to concatenate and format strings in Python.
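To check those names without running Airflow, here is a minimal sketch that rebuilds the same strings in plain Python. It only reuses the variable values from the DAG above; no Airflow import is needed.

```python
# Values taken from the DAG above
dag_name = "string_sample"
task1_name = "start"
task2_name = "end"

# The three techniques used in the DAG
dag_id = dag_name + "_dag"               # plus operator
task1_id = "task_{}".format(task1_name)  # str.format()
task2_id = f"task_{task2_name}"          # f-string

print(dag_id)    # string_sample_dag
print(task1_id)  # task_start
print(task2_id)  # task_end
```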
The many ways to concatenate strings in Python
The lazy way, the +
The simplest way to concatenate multiple strings is to use the plus sign, +:
dag_name + "_dag"
It works well when adding strings; here is another working example:
>>> text = "world"
>>> print("hello " + text)
hello world
And here is where being lazy doesn't work so well:
>>> text = 123
>>> print("hello " + text)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: must be str, not int
The + operator is a simple man: it sees values and tries to concatenate them. Sometimes it works, sometimes it doesn't. In our case, it is not able to join a string with an integer. To make it work, we need to transform 123 into a string, "123":
>>> print("hello " + str(text))
hello 123
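The same rule applies when a number ends up in a task id. A minimal sketch, with a hypothetical batch_number variable: every non-string piece needs its own str() call before + can join it. The exact wording of the TypeError differs between Python versions, so the sketch just prints whatever message it gets.

```python
batch_number = 3  # hypothetical value, e.g. a batch index

try:
    task_id = "load_batch_" + batch_number  # an int cannot be added to a str
except TypeError as error:
    print("Failed:", error)

# Converting the integer first makes the concatenation work
task_id = "load_batch_" + str(batch_number)
print(task_id)  # load_batch_3
```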
Better formatting with .format()
In the first task of the DAG there is a second, better way to concatenate strings. The string method .format() allows us to fill in a string that we use as a template:
"task_{}".format(task1_name)
In our previous "hello world" example, the template for our greeting is "hello {}", where in place of {} we want to have world, 123, or maybe John.
>>> text = "world"
>>> print("hello {}".format(text))
hello world
And it works with numbers too:
>>> text = 123
>>> print("hello {}".format(text))
hello 123
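Because the template is just a string, it can be written once and reused. A minimal sketch with a hypothetical template and hypothetical task names, producing one task id per name:

```python
TASK_TEMPLATE = "task_{}"                 # hypothetical template shared by all tasks
task_names = ["start", "extract", "end"]  # hypothetical task names

task_ids = []
for name in task_names:
    # The same template is filled in with a different value each time
    task_ids.append(TASK_TEMPLATE.format(name))

print(task_ids)  # ['task_start', 'task_extract', 'task_end']
```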
With multiple values
>>> greeting = "hello"
>>> text = "world"
>>> print("{} {}".format(text, greeting))
world hello
Uhm, this doesn't look right. The .format() method assigns the values to the {} placeholders in the order they appear. Of course we could switch the order of the passed values, but what if the template changes over time?
To avoid this, use positional indexes or, even better, name the placeholders.
First with numbers:
>>> print("{1} {0}".format(text, greeting))
hello world
Note: the first value has index 0, the second index 1.
Naming the placeholders:
>>> print("{hi} {who}".format(who=text, hi=greeting))
hello world
Using named placeholders, our template string is more meaningful (which will be nice when reviewing this code after a while), but we need to specify which value is hi and which is who.
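Named placeholders also protect you when the template changes. In this sketch the second template is a hypothetical later version of the greeting with the words in a different order; the .format() call stays exactly the same and each placeholder still receives the right value.

```python
greeting = "hello"
text = "world"

template_v1 = "{hi} {who}"
template_v2 = "{who} says {hi}"  # hypothetical later version of the template

# Same call, same keyword arguments, different templates
print(template_v1.format(who=text, hi=greeting))  # hello world
print(template_v2.format(who=text, hi=greeting))  # world says hello
```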
For more formatting options and examples, take a look at the Python documentation on format string syntax. You probably will not need them for your DAGs, but it is good to know where to start (besides Stack Overflow).
The f-string
Python 3.6 introduced a new way to format strings:
>>> text = "world"
>>> print(f"hello {text}")
hello world
The f stands for formatted (the official name is "formatted string literal"). But it could also stand for fast, because this method is faster to write and usually faster to produce the final result.
In an f-string, the values between {} are expressions that will be evaluated at runtime. Therefore it is possible to do things like:
>>> print(f"{5 - 4} hello to {text.lower()}")
1 hello to world
Again, more use cases and advanced options can be found in the Python documentation.
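One of those options that comes in handy in DAGs is the format specification written after a colon. A minimal sketch, assuming you want a date-stamped task id (the load_ prefix and the task number are hypothetical): for datetime objects the part after : uses the same codes as strftime, while for integers :03d pads the number to three digits.

```python
from datetime import datetime

start_date = datetime(2020, 4, 29)
task_number = 7  # hypothetical task number

# ":%Y%m%d" formats the date, ":03d" zero-pads the integer
task_id = f"load_{start_date:%Y%m%d}_{task_number:03d}"
print(task_id)  # load_20200429_007
```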
A few additional things about Python strings
These are probably details you will not use when writing DAGs, but they are worth mentioning to complete this basic overview of strings.
You have probably noticed that string formatting in Python uses curly braces, so how do you format a string that contains curly braces? Double them, duh:
>>> print(f"In {{text}} is {text}")
In {text} is world
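This matters in Airflow more than it seems: templated fields use Jinja syntax such as {{ ds }}, which also relies on curly braces. If you build such a string with an f-string, every brace must be doubled so that Jinja still receives its own template. A minimal sketch, with a hypothetical bash command:

```python
dag_name = "string_sample"

# {{{{ and }}}} become {{ and }} in the result, leaving the Jinja part intact
bash_command = f"echo 'Running {dag_name} for {{{{ ds }}}}'"
print(bash_command)  # echo 'Running string_sample for {{ ds }}'
```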
What about the double quote sign, "? First, in Python you can delimit a string with either ' or ", and you need to use the same sign to close it, so a double quote can sit safely inside a string delimited by single quotes. Another way is to escape the quote sign with a backslash, \:
>>> print('This is a double quote: "')
This is a double quote: "
>>> print("This is a double quote too: \"")
This is a double quote too: "
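Both approaches produce exactly the same string, which you can verify directly. The same trick works the other way around for single quotes (apostrophes).

```python
with_single_quotes = 'This is a double quote: "'
with_escape = "This is a double quote: \""
print(with_single_quotes == with_escape)  # True

# The other way around: an apostrophe inside a string
print("It's a DAG" == 'It\'s a DAG')  # True
```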
Finally, how to deal with very long strings. Python allows you to break a long line of code into multiple lines using the backslash, \:
>>> print("hello " \
... "world")
hello world
As you can see, Python joins the two strings and ignores the line break in the code (if you want an actual new line in the output, you need to write it explicitly, as in "hello \n").
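The joining happens because Python automatically concatenates string literals written next to each other. Inside parentheses the backslash is not even needed, and PEP 8 prefers this implicit continuation for long lines. A minimal sketch with a hypothetical table name and query:

```python
table_name = "sales"  # hypothetical table name

# Adjacent literals inside parentheses are joined into one string;
# an f-string piece can be mixed with plain pieces
query = (
    "SELECT * "
    f"FROM {table_name} "
    "WHERE created_at >= '2020-04-29'"
)
print(query)  # SELECT * FROM sales WHERE created_at >= '2020-04-29'
```

Note the trailing space at the end of each piece: without it, the words would be glued together.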
But you can also create a multi-line string using """ or ''':
>>> print("""hello
... world""")
hello
world
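A multi-line string keeps its line breaks as \n characters, which is convenient for longer text such as a SQL query passed to an operator. A minimal sketch (the query and table name are hypothetical), showing that the f prefix works with triple quotes too:

```python
table_name = "sales"  # hypothetical table name

# Triple quotes keep the line breaks; the f prefix still fills in {table_name}
query = f"""SELECT *
FROM {table_name}
WHERE created_at >= '2020-04-29'"""

print(query)
print(query.count("\n"))  # 2
```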
More additional things
Actually there is much more to say about Python strings, concatenation, and formatting, but that would require another post (or more).
Personally, I believe these things are not very interesting for someone approaching Python for the first time. But I have been wrong before: vote unicorn to tell me that I am wrong and that you want to know more about strings (and maybe leave a comment on what you find difficult about strings in Python).
Which one to use?
If you are using Python 3.6 or above, the f-string is the way to go: it is fast to write and easy to maintain.
But I am the first to admit that for quick debugging I still resort to the + (when possible).
The .format() method is worth mentioning because there are many examples in the wild using it, and you may encounter some old code that requires your attention.
That said, I suggest you take a look at the Python documentation to be aware of the formatting possibilities it offers; sooner or later you will need them.
Shameless plug
In case you need support or assistance, feel free to reach out to me in the comments or via direct message. On Twitter you can find me with the handle @mucio.
If you need more structured help, the nice people at Untitled Data Company (which includes me) will be happy to help you with all your data needs.
Credits
Cover photo by Jess Bailey on Unsplash.