DEV Community

fatumakaliku

INTRODUCTION TO PYTHON FOR DATA ENGINEERING

Python is a high-level, general-purpose, interpreted programming language. High-level means it is easy to learn and does not require you to understand the details of a computer in order to use it. General-purpose means it can be used in various domains such as web development, automation, ML, and AI. Interpreted means the source code is executed by an interpreter at run time rather than compiled ahead of time into a standalone machine-code program.
Python is used in data science because it is rich in the mathematical tools required to analyze data.
Python programs have the extension .py and are run from the command line with python file_name.py.

Python Hello World

Let's create our first Python program, a hello world program. First create a folder, named for example mycode, where you'll save your code files. Then launch VS Code and open the folder you created, mycode.
Next, create a new Python file named app.py, enter the following code, and save the file.

print('Hello World!')

print() is a built-in function that displays its argument on your screen. Here it displays the Hello World! message.
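A quick way to see what print() actually does is to capture whatever it hands back. The sketch below is only an illustration (the variable name result is made up for this example): the text goes to the screen, while the value returned to the program is None.

```python
# print() writes its arguments to the screen.
result = print('Hello World!')

# What print() hands back to the program is None, not the message.
print(result is None)
```

This is why print() is used for showing output, not for passing values between parts of a program.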

Comments.

Comments start with #.
When writing code, you sometimes want to document it, for example to note why a piece of code works the way it does. You can do so using comments.
Typically you use comments to explain formulas, algorithms, and complex logic. When executing a Python program, the interpreter ignores the comments and runs only the code.
Python provides three kinds of comments: block comments, inline comments, and documentation strings.

  1. Python block comment. A block comment explains the code that follows it and is indented to the same level as the code it explains.
# Increase price of cat by 1000
price = price + 1000

  2. Python inline comments. These are comments placed on the same line as a statement.
cat = cat + 1000  # increase the cat price by 1000

  3. Documentation string. A documentation string, or docstring, is a string literal placed as the first statement of a module, function, class, or method. Technically, docstrings are not comments: the Python interpreter does not ignore them but stores the string in the object's __doc__ attribute.
def sort():
    """Sort the list using a sort algorithm."""
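To see that a docstring is kept by the interpreter while a comment is not, the following minimal sketch reads the docstring back at run time. The function name sort_items and its body are hypothetical, chosen only for this illustration.

```python
def sort_items(items):
    """Return a new list with the items in ascending order."""
    # This comment is discarded by the interpreter.
    return sorted(items)

# The docstring is stored on the function object and can be read back.
print(sort_items.__doc__)

# The function itself works as usual.
print(sort_items([3, 1, 2]))  # [1, 2, 3]
```

Tools such as the built-in help() function rely on this same attribute to display documentation.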


Variables

Variables are labels that you can assign values to. Their sole purpose is to label and store data in memory. This data can then be used throughout your program.

favorite_animal = "cat"
print(favorite_animal)


The variable favorite_animal can hold different values at different times: assigning to it again replaces its value, so the value can change throughout the program.
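As a small sketch of how a variable's value changes over time, the example below reassigns favorite_animal twice. The replacement values "dog" and 3 are arbitrary and used only for illustration.

```python
favorite_animal = "cat"
print(favorite_animal)        # cat

# Assigning again replaces the old value.
favorite_animal = "dog"
print(favorite_animal)        # dog

# A name can also be rebound to a value of a different type.
favorite_animal = 3
print(type(favorite_animal))  # <class 'int'>
```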

Arithmetic Operations

print(15 + 5)  # 20 (addition)
print(11 - 9)  # 2 (subtraction)
print(4 * 4)  # 16 (multiplication)
print(4 / 2)  # 2.0 (division)
print(2 ** 8)  # 256 (exponent)
print(7 % 2)  # 1 (remainder of the division)
print(11 // 2)  # 5 (floor division)


Comparison and Logical Operators

Python comparison operators are used to compare two values:
==, !=, >, <, >=, <=
Python Logical Operators

  • Logical operators are used to combine conditional statements: and, or, not

Python Arithmetic Operators

  • Arithmetic operators are used with numeric values to perform common mathematical operations: +, -, *, /, %, **, //
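The comparison and logical operators listed above can be tried out with a short sketch like the one below. The variables age, has_ticket, and can_enter are hypothetical and exist only for this example.

```python
age = 25           # hypothetical values for illustration
has_ticket = True

# Comparison operators produce a Boolean.
print(age == 25)   # True
print(age != 25)   # False
print(age >= 18)   # True

# Logical operators combine Boolean expressions.
can_enter = age >= 18 and has_ticket
print(can_enter)               # True
print(age < 18 or has_ticket)  # True
print(not has_ticket)          # False
```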

Data Types.

1. Strings

Strings in Python are surrounded by either single quotation marks (') or double quotation marks (").
You can display a string literal with the print() function.
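The short sketch below shows that both quote styles create the same kind of string, and that choosing one style lets the other appear inside the text. The variable names are made up for the example.

```python
single = 'Hello'
double = "Hello"

# Both quote styles create the same string.
print(single == double)  # True

# Double quotes outside let an apostrophe appear inside without escaping.
message = "It's a great language"
print(message)
```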

2. Numbers.

There are three numeric types in Python:

integer
float
complex

x = 1    # int
y = 2.8  # float
z = 1j   # complex
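To check which numeric type a value has, the built-in type() function can be used. The sketch below reuses x, y, and z from above. The name total is introduced only for this illustration.

```python
x = 1
y = 2.8
z = 1j

print(type(x))  # <class 'int'>
print(type(y))  # <class 'float'>
print(type(z))  # <class 'complex'>

# Mixing an int and a float in arithmetic gives a float.
total = x + y
print(type(total))  # <class 'float'>
```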

3. Booleans.

  • Booleans represent one of two values: True or False
  • When you use a condition in an if statement, Python evaluates it to True or False
#booleans
a = 1000
b = 200

if b > a:
  print("b is greater than a")
else:
  print("b is not greater than a")


4. Lists

Lists are used to store multiple items in a single variable and are created using square brackets [].
Lists are one of 4 built-in data types in Python used to store collections of data. The other 3 are Tuple, Set, and Dictionary, each with different qualities and usage.

# can store any data type
multiple_types = [False, 5.7, "Hello"]

# accessed and modified
favourite_animals = ["cats", "dogs", "rabbits"]
print(favourite_animals[1])  # dogs
favourite_animals[0] = "parrots"
print(favourite_animals[0])  # parrots

# subsets
print(favourite_animals[1:3])  # ['dogs', 'rabbits']
print(favourite_animals[2:]) # ['rabbits']
print(favourite_animals[0:2]) # ['parrots', 'dogs']

# append
favourite_animals.append("bunnies")

# insert at index
favourite_animals.insert(1, "horses")

# remove
favourite_animals.remove("bunnies")


5. Dictionaries

Dictionaries store key-value pairs and are surrounded by {}. A dictionary is a collection that is ordered (insertion order is preserved as of Python 3.7), changeable, and does not allow duplicate keys.

thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
# access, modify, delete
print(thisdict["brand"])  # Ford
print(thisdict["model"])  # Mustang
thisdict["year"] = 2020   # modify an existing value
del thisdict["year"]
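What "no duplicates" means in practice is that each key can appear only once. The sketch below uses a separate, hypothetical dictionary named car to show that a repeated key keeps only the last value, and that keys come back in insertion order.

```python
car = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964,
    "year": 2020,   # same key again: the later value wins
}

print(car["year"])  # 2020
print(len(car))     # 3

# Keys are returned in the order they were first inserted (Python 3.7+).
print(list(car))    # ['brand', 'model', 'year']
```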

Loops

# With the while loop we can execute a set of statements as long as a condition is true.
i = 1
while i < 6:
  print(i)
  i += 1
# A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).
fruits = ["apple", "banana", "cherry"]
for x in fruits:
  print(x)
# Looping Through a String
for x in "banana":
  print(x)
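Since a for loop can also iterate over a dictionary, here is a minimal sketch of what that looks like. The dictionary contents and the helper list keys_seen are assumptions made for this example.

```python
thisdict = {"brand": "Ford", "model": "Mustang"}

# Iterating over a dictionary yields its keys.
keys_seen = []
for key in thisdict:
    keys_seen.append(key)
print(keys_seen)  # ['brand', 'model']

# .items() yields key-value pairs.
for key, value in thisdict.items():
    print(key, "->", value)
```

Using .items() avoids a second lookup when both the key and its value are needed inside the loop.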

File I/O

The simplest way to produce output is the print() function, to which you can pass zero or more expressions separated by commas.

print("Python is really a great language,", "isn't it?")
# This produces the following result on your screen:
# Python is really a great language, isn't it?

# In Python 3, the built-in input() function reads a line of text from standard input, which by default comes from the keyboard
user_text = input("Enter your input: ")
print("Received input is:", user_text)
# If you type "Hello Python", the output is:
# Received input is: Hello Python

