DEV Community: Christopher Ambala

Python List Cheat Sheet

Christopher Ambala — Wed, 10 Apr 2024 12:02:20 +0000

A list is an ordered, mutable, and heterogeneous collection of items.

Creating a list

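A list is created by placing comma-separated items inside square brackets and assigning the result to a name such as my_list: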

my_list = ['apple','banana','cherry']

Accessing List Items

Items are referenced by their index number. Indexing starts at 0, so the first item has index 0 and the second item has index 1.

first_item = my_list[0]
print(first_item)  #Outputs: apple

Modifying List Items

This is done by referring to their index.

my_list[1] = 'blueberry'
print(my_list) #Outputs:['apple','blueberry','cherry']

List Comprehension

It's a syntactic construct that lets you create a list from another list or any other iterable.

new_list = [expression for item in iterable if condition]

expression is an operation applied to each item in the iterable that satisfies the condition.
item is a variable used to represent members of the iterable.
iterable is a sequence, collection, or an iterator object to be traversed.
condition is an optional filter that only includes item in the new_list if the condition is True.

squares = [x**2 for x in range(10) if x % 2 == 0]

This will result in squares being a list of the squares of all even numbers from 0 to 9: [0, 4, 16, 36, 64].

List comprehensions are a powerful feature of Python and can make your code more readable and efficient.
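To make the parts of the comprehension easier to see, here is a small sketch that builds the same list with an ordinary for loop. The name squares_loop is only used for this illustration.

```python
# The comprehension from the text.
squares = [x**2 for x in range(10) if x % 2 == 0]

# The same result built step by step with a for loop.
squares_loop = []
for x in range(10):          # item in iterable
    if x % 2 == 0:           # condition
        squares_loop.append(x**2)  # expression

print(squares)       # [0, 4, 16, 36, 64]
print(squares_loop)  # [0, 4, 16, 36, 64]
```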

List Operations

append(): Adds an element at the end of the list.
extend(): Adds the elements of a list (or any iterable) to the end of the current list.
insert(): Adds an element at the specified position.
remove(): Removes the first item with the specified value.
pop(): Removes and returns the element at the specified position (the last element if no position is given).
index(): Returns the index of the first element with the specified value.
count(): Returns the number of times a value appears in the list.
sort(): Sorts the list.
reverse(): Reverses the order of the list.

# Create a list
fruits = ["apple", "banana", "cherry"]

# Add an element to the end of the list
fruits.append("orange")

# Add multiple elements to the end of the list
fruits.extend(["kiwi", "mango"])

# Add an element at a specific position
fruits.insert(1, "pineapple")

# Remove an element from the list
fruits.remove("banana")

# Remove the last element in the list
last_fruit = fruits.pop()

# Get the index of the first occurrence of an element
index_of_cherry = fruits.index("cherry")

# Count the number of times an element appears in the list
num_apples = fruits.count("apple")

# Sort the list
fruits.sort()

# Reverse the list
fruits.reverse()
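Two details of these methods are easy to miss: pop() hands back the item it removes, and sort() changes the list in place and returns None. The short sketch below illustrates both, using small example lists chosen just for this demonstration.

```python
fruits = ["apple", "banana", "cherry"]

# pop() with no argument removes and returns the last item.
last = fruits.pop()
# pop(0) removes and returns the item at index 0.
first = fruits.pop(0)
print(last, first, fruits)  # cherry apple ['banana']

numbers = [3, 1, 2]
# sorted() returns a new list and leaves the original alone.
ordered = sorted(numbers)
# sort() reorders the list in place and returns None.
result = numbers.sort()
print(ordered, numbers, result)  # [1, 2, 3] [1, 2, 3] None
```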

List Concatenation

Concatenation can be done in three ways:
Using the + Operator: This is the most straightforward method. The + operator can be used to add together two lists:

list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list1 + list2  # combined is now [1, 2, 3, 4, 5, 6]

Using the extend() Method: The extend() method adds elements from another list (or any iterable) to the end of the current list:

list1 = [1, 2, 3]
list2 = [4, 5, 6]
list1.extend(list2)  # list1 is now [1, 2, 3, 4, 5, 6]

Using List Comprehension: This is a more advanced method that builds a new list by iterating over each of the existing lists (or other iterables) in turn:

list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = [item for sublist in [list1, list2] for item in sublist]  # combined is now [1, 2, 3, 4, 5, 6]
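One practical difference between these approaches is that + builds a new list while extend() changes the existing one. The sketch below shows this with a second name, alias, which is introduced here only for illustration.

```python
list1 = [1, 2, 3]
alias = list1  # a second name for the same list object

# The + operator builds a new list; list1 is left untouched.
combined = list1 + [4, 5, 6]
print(list1)              # [1, 2, 3]

# extend() modifies the existing list, so alias sees the change too.
list1.extend([4, 5, 6])
print(alias)              # [1, 2, 3, 4, 5, 6]

print(combined == list1)  # True  (same contents)
print(combined is list1)  # False (different objects)
```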

Dictionaries The Python Guide

Christopher Ambala — Mon, 01 Apr 2024 10:05:40 +0000

A dictionary in Python is a collection of key-value pairs. Each item in the dictionary has a unique key, and each key maps to a value. Values do not have to be unique.

Dictionary

student = {
    "name": "John Doe",
    "age": 20,
    "grade": "Sophomore"
}

"name", "age", and "grade" are keys, and "John Doe", 20, and "Sophomore" are their respective values.

Key Characteristics

Ordered by insertion, not by position: Since Python 3.7, dictionaries keep their items in the order they were inserted (earlier versions gave no ordering guarantee). Items still have no numeric index, so you look them up by key rather than by position.
Mutable: Dictionaries are mutable. This means you can change, add, and remove items after the dictionary is created.
Indexed by Keys: Unlike lists, which are indexed by a range of numbers, dictionaries are indexed by keys. This key-value pair system makes dictionaries incredibly versatile for storing and organizing data.
Cannot Contain Duplicate Keys: Each key in a dictionary must be unique. If a duplicate key is assigned a value, the original key’s value will be overwritten.
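A quick sketch of two of these characteristics, assuming Python 3.7 or later: a repeated key keeps only the last value, and new keys are added at the end. The second "age" entry is added here purely for illustration.

```python
# A key written twice in a literal keeps only the last value.
student = {"name": "John Doe", "age": 20, "age": 21}
print(student)  # {'name': 'John Doe', 'age': 21}

# Assigning to an existing key overwrites its value.
student["name"] = "Jane Doe"

# A new key is added after the existing ones (insertion order).
student["grade"] = "Sophomore"
print(list(student))  # ['name', 'age', 'grade']
```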

Dictionary Creation

Dictionaries are created by enclosing key-value pairs in curly braces {}.
A colon : separates each key from its associated value.

person = {"name": "Alice", "age": 25}

Accessing Elements

print(person["name"])  # Output: Alice
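Square brackets raise a KeyError when the key does not exist. As a minimal sketch, the example below shows that error and the get() method, which returns None or a default instead. The "city" key and the "unknown" default are illustrative choices.

```python
person = {"name": "Alice", "age": 25}

# Square brackets raise KeyError when the key is missing.
try:
    person["city"]
except KeyError as err:
    print("Missing key:", err)  # Missing key: 'city'

# get() returns None, or a default you supply, instead of raising.
missing = person.get("city")
fallback = person.get("city", "unknown")
print(missing)   # None
print(fallback)  # unknown
```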

Modifying Elements

Dictionaries are mutable. You can change the value of a specific item by referring to its key.

person["age"] = 30
print(person)  # Output: {'name': 'Alice', 'age': 30}

Adding Elements

Adding an item to the dictionary is done by assigning a value to a new key.

person["city"] = "New York"
print(person)  # Output: {'name': 'Alice', 'age': 30, 'city': 'New York'}

Removing Elements

Use the del statement.

del person["age"]
print(person)  # Output: {'name': 'Alice', 'city': 'New York'}

Making a copy

Use the copy() method.

x = {'one': 'uno', 'two': 2, 'three': 3}
y = x.copy()
print(y)  # Output: {'one': 'uno', 'two': 2, 'three': 3}

Removing all items

Use the clear() method.

x.clear()
print(x)  # Output: {}

Getting the Number of Items

Use the built-in len() function.

print(len(y))  # Output: 3

Looping Over values

for item in y.values():
    print(item)

Using `if` statement to get values

if "one" in y:
    print(y['one'])  # Output: uno
if "two" not in y:
    print("Two not found")
if "three" in y:
    del y['three']

Python dictionaries are a flexible and efficient data type that allow you to organize and manipulate data effectively. They are a fundamental part of Python and understanding them is crucial to becoming proficient in the language. Whether you’re storing configuration settings, managing data in a web application, or even building complex data structures, Python dictionaries are an excellent tool to have in your programming toolkit.

Boolean,The Truth and False Of Python

Christopher Ambala — Mon, 25 Mar 2024 14:29:22 +0000

The Python Boolean type has only two possible values:

True
False

No other value will have bool as its type. You can check the type of True and False with the built-in type():

>>> type(False)
<class 'bool'>
>>> type(True)
<class 'bool'>

Input: 1==1
Output: True 

Input: 2<1 
Output: False

Evaluate Variables and Expressions

The built-in bool() function evaluates a value or expression and returns the corresponding Boolean value, True or False. Its syntax is:

bool([x])

Python bool() Function

Booleans can be evaluated using the bool() function as:

# Python program to illustrate
# built-in method bool()

# Returns False as x is not equal to y
x = 5
y = 10
print(bool(x==y))

# Returns False as x is None
x = None
print(bool(x))

# Returns False as x is an empty sequence
x = ()
print(bool(x))

# Returns False as x is an empty mapping
x = {}
print(bool(x))

# Returns False as x is 0
x = 0.0
print(bool(x))

# Returns True as x is a non empty string
x = 'Greatest'
print(bool(x))

Output:

False
False
False
False
False
True

Conversion of Integers and Floats to Booleans

Numbers can be converted to Boolean values with Python's built-in bool() function. Any integer, floating-point number, or complex number equal to zero is considered False; any other number, positive or negative, is considered True.

var1 = 0
print(bool(var1))

var2 = 1
print(bool(var2))

var3 = -9.7
print(bool(var3))

Output:

False
True
True
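The rule above can be checked across all three numeric types in one pass. This is a small sketch; the sample values are arbitrary picks for illustration.

```python
# Zero in any numeric type is False; every other number is True.
values = [0, 0.0, -0.0, 0j, 1, -9.7, 1e-300, 2 + 0j]
truthy = [bool(v) for v in values]

print(truthy)
# [False, False, False, False, True, True, True, True]
```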

Boolean Operators

or
and
not
== (equivalent)
!= (not equivalent)

Boolean OR Operator

The Boolean or operator returns True if any one of the inputs is True; otherwise it returns False.
syntax:

# Python program to demonstrate
# or operator

a = 1
b = 2
c = 4

if a > b or b < c:
    print(True)
else:
    print(False)

if a or b or c:
    print("At least one number has boolean value as True")

Output:

True
At least one number has boolean value as True

Boolean And Operator

The Boolean and operator returns False if any one of the inputs is False; otherwise it returns True.
syntax:

# Python program to demonstrate
# and operator

a = 0
b = 2
c = 4

if a > b and b<c:
    print(True)
else:
    print(False)

if a and b and c:
    print("All the numbers have boolean value as True")
else:
    print("At least one number has boolean value as False")

output:

False
At least one number has boolean value as False
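When the operands are not already Booleans, or and and return one of the operands themselves rather than True or False, and they stop evaluating as soon as the result is known. The sketch below illustrates this; never_called is a hypothetical function used only to show short-circuiting.

```python
# or returns the first truthy operand (or the last operand if none is truthy).
first_truthy = 0 or 2
name = "" or "default"

# and returns the first falsy operand (or the last operand if all are truthy).
first_falsy = 0 and 2
all_truthy = 1 and 4

print(first_truthy, name, first_falsy, all_truthy)  # 2 default 0 4


def never_called():
    # Hypothetical helper: raising here proves it is never evaluated.
    raise RuntimeError("should not run")


# Short-circuit: the right-hand side is skipped once the result is known.
skipped = 1 or never_called()
print(skipped)  # 1
```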

Boolean Not Operator

The Boolean not operator takes a single argument and returns its negation, that is, True for a false value and False for a true value.
syntax:

# Python program to demonstrate
# not operator

a = 0

if not a:
    print("Boolean value of a is False")

output:

Boolean value of a is False

Boolean == (equivalent) and != (not equivalent) Operator

Both operators are used to compare two results. The == (equivalent) operator returns True if the two results are equal, and the != (not equivalent) operator returns True if the two results are not the same.
syntax:

# Python program to demonstrate
# equivalent and not equivalent
# operator

a = 0
b = 1

if a == 0:
    print(True)

if a == b:
    print(True)

if a != b:
    print(True)

output:

True
True

Python is Operator

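The is operator tests whether two variables refer to the same object. It returns True only if both names point to one object, and False for two distinct objects even if their values are equal. (In the example below, x is y is True for the integer 10 because CPython reuses small integer objects, which is an implementation detail.)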
syntax:

# Python program to demonstrate
# is keyword


x = 10
y = 10

if x is y:
    print(True)
else:
    print(False)

x = ["a", "b", "c", "d"]
y = ["a", "b", "c", "d"]

print(x is y)

output:

True
False
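To separate identity from equality more clearly, here is a minimal sketch using lists, where the result does not depend on any caching. The names z and value are added for illustration.

```python
x = ["a", "b"]
y = ["a", "b"]   # equal contents, but a separate object
z = x            # another name for the same object as x

print(x == y)  # True  (same values)
print(x is y)  # False (different objects)
print(x is z)  # True  (same object)

# The most common use of is: checking for None.
value = None
print(value is None)  # True
```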

Python in Operator

The in operator checks for membership, that is, whether a value is present in a list, tuple, range, string, or other container.
syntax:

# Python program to demonstrate
# in keyword

# Create a list
animals = ["dog", "lion", "cat"]

# Check if lion in list or not
if "lion" in animals:
    print(True)

output:

True

Python Strings In A Nutshell

Christopher Ambala — Mon, 18 Mar 2024 18:00:03 +0000

Strings are enclosed within single (' '), double (" "), or triple (''' ''' or """ """) quotes.

single_quoted = 'This is a single-quoted string.'
double_quoted = "This is a double-quoted string."
triple_quoted = '''This is a triple-quoted string.'''

Immutability

Strings are immutable: they cannot be changed in place after they are created. For example, you can't change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name.
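A short sketch of what immutability means in practice: assigning to a position fails with a TypeError, while building a new string and rebinding the name works. The string "Spam" is just an example value.

```python
S = "Spam"

# Assigning to a position is not allowed for strings.
try:
    S[0] = "z"
except TypeError as err:
    print("TypeError:", err)
    # TypeError: 'str' object does not support item assignment

# Building a new string and assigning it to the same name works.
S = "z" + S[1:]
print(S)  # zpam
```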

Concatenation and Slicing

Python allows you to concatenate strings using the + operator, making it easy to combine text elements.

first_name = "John"
last_name = "Doe"
full_name = first_name + " " + last_name  # full_name is "John Doe"

text = "Hello, World!"
first_char = text[0]  # first_char is 'H'
substring = text[7:12]  # substring is 'World'

Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists and dictionaries are not.

String Operations

>>> S = 'Spam' # The example string used below

>>> S[-1] # The last item from the end in S
'm'
>>> S[-2] # The second to last item from the end
'a'

A negative index is simply added to the string's size, so the following two operations are equivalent:

>>> S[-1] # The last item in S
'm'
>>> S[len(S)-1] # Negative indexing, the hard way
'm'

Sequences also support a more general form of indexing, known as slicing, which is a way to extract an entire section.

>>> S # A 4-character string
'Spam'
>>> S[1:3] # Slice of S from offsets 1 through 2 (not 3)
'pa'

The general form, X[I:J], means "give me everything in X from offset I up to but not including offset J." The result is returned in a new object. The second of the preceding operations, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 3 - 1) as a new string. The effect is to slice or "parse out" the two characters in the middle.
In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations, such as S[1:] and S[:-1], which appear in the examples further down.

The plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that is called polymorphism.

Methods In String

>>> S.find('pa') # Find the offset of a substring
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ') # Replace occurrences of a substring with another
'SXYZm'
>>> S
'Spam'

>>> S[1:] # Everything past the first (1:len(S))
'pam'
>>> S # S itself hasn't changed
'Spam'
>>> S[0:3] # Everything but the last
'Spa'
>>> S[:3] # Same as S[0:3]
'Spa'
>>> S[:-1] # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:] # All of S as a top-level copy (0:len(S))
'Spam'

Strings also support concatenation with the plus sign and repetition with the asterisk:

>>> S
'Spam'
>>> S + 'xyz' # Concatenation
'Spamxyz'
>>> S # S is unchanged
'Spam'
>>> S * 8 # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'

>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',') # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']
>>> S = 'spam'
>>> S.upper() # Upper- and lowercase conversions
'SPAM'
>>> S.isalpha() # Content tests: isalpha, isdigit, etc.
True
>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line = line.rstrip() # Remove whitespace characters on the right side
>>> line
'aaa,bbb,ccccc,dd'

Pattern Matching

Python's re module provides pattern matching for text. It has calls analogous to the string methods for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:

>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
>>> match.group(1)
'Python '

This example searches for a substring that begins with the word "Hello," followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word "world." The following pattern picks out three groups separated by slashes:

>>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')

The 123 of Python Integers

Christopher Ambala — Mon, 11 Mar 2024 13:39:15 +0000

Today we look at two of Python's core numeric data types, starting with the integer.

Integer data type

Integers in Python are represented by the int data type (the abbreviation int comes from the word integer). An integer value is written as a sequence of digits from 0 to 9.

An explicitly specified numeric value in the program code is called an integer literal. When Python encounters an integer literal, it creates an int object that stores the specified value.

n = 17 # integer literal
m = 7 # integer literal

The integer data type int is used not only because it occurs in the real world, but also because it naturally occurs when creating most programs.

Converting a string to an integer

To convert a string to an integer, we use the int() command:

num = int(input()) # converting a read string to an integer

It is not necessary to use the input() command to convert a string to an integer.

The following code converts the string 12345 to an integer:

n = int('12345') # convert string to integer

If the string does not represent a whole number, the conversion raises a ValueError.
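One way to deal with strings that are not valid integers is to catch the ValueError. The sketch below is a minimal example; to_int and its default parameter are hypothetical names made up for this illustration, not part of Python.

```python
def to_int(text, default=None):
    """Hypothetical helper: return int(text), or default if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int("12345"))  # 12345
print(to_int(" 42 "))   # 42   (surrounding whitespace is ignored)
print(to_int("12.5"))   # None (not a whole-number string)
print(to_int("abc", 0)) # 0
```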

Integer operators

Python provides four basic arithmetic operators for working with integers (+, -, *, /), and also three additional ones (% for remainder, // for integer division, and ** for exponentiation).

The following program demonstrates all integer operators:

a = 13
b = 7

total = a + b
diff = a - b
prod = a * b
div1 = a / b
div2 = a // b
mod = a % b
exp = a ** b

print(a, '+', b, '=', total)
print(a, '-', b, '=', diff)
print(a, '*', b, '=', prod)
print(a, '/', b, '=', div1)
print(a, '//', b, '=', div2)
print(a, '%', b, '=', mod)
print(a, '**', b, '=', exp)

Running this program prints:

13 + 7 = 20
13 - 7 = 6
13 * 7 = 91
13 / 7 = 1.8571428571428572
13 // 7 = 1
13 % 7 = 6
13 ** 7 = 62748517

The usual division (/) always produces a float, even when the result is a whole number. Dividing by zero raises a ZeroDivisionError.
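Integer division and remainder are tied together: the quotient times the divisor plus the remainder gives back the original number. The sketch below checks this with the built-in divmod() and also shows that // rounds down (towards negative infinity) for negative operands. The negative example is added for illustration.

```python
a, b = 13, 7

# divmod() returns the pair (a // b, a % b).
q, r = divmod(a, b)
print(q, r)            # 1 6
print(q * b + r == a)  # True

# Floor division rounds down, so negative results move away from zero.
neg_q = -13 // 7
neg_r = -13 % 7
print(neg_q, neg_r)    # -2 1

# Division by zero raises ZeroDivisionError.
try:
    13 / 0
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```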

Long arithmetic

A distinctive feature of the Python language is its unlimited integer data type. In fact, the size of a number depends only on the free memory available on the computer. This distinguishes Python from languages such as C++, C, C#, and Java, where integer types have a fixed range. For example, in C# the 64-bit long type is limited to the range from -2^63 to 2^63 - 1.

atom = 10 ** 80 # number of atoms in the universe
print('Number of atoms =', atom)

The result of the program will be a number with 81 digits:

Number of atoms = 100000000000000000000000000000000000000000000000000000000000000000000000000000000

Separator character

For easy reading of numbers, you can use the underscore character:

num1 = 25_000_000
num2 = 25000000
print(num1)
print(num2)

The result of executing such code will be:

25000000
25000000

Floating point numbers

Along with integers, Python can work with fractional (real) numbers. For example, the numbers 2/3 and π are real, and the integer type int is not enough to represent them.

Fractional (real) numbers in computer science are called floating-point numbers.

Python uses the float data type to represent floating-point numbers.

e = 2.71828 #floating point literal
pi = 3.1415 #floating point literal

Note that Python uses a dot as the decimal separator, not the comma used in the mathematical notation of some countries.

Converting a string to a floating point number

To convert a string to a floating-point number, we use the float() command:

num = float(input()) #converting a read string to a floating-point number

It is not necessary to use the input() command to convert a string to a floating-point number.

The following code converts the string 1.2345 to a floating point number:

n = float('1.2345') #converting a string to a floating-point number

If the string does not represent a number, the conversion raises a ValueError.

Arithmetic operators

Python provides four basic arithmetic operators for working with floating-point numbers (+, -, *, /) and one additional operator (** for exponentiation).

The following program demonstrates arithmetic operators:

a = 13.5
b = 2.0

total = a + b
diff = a - b
prod = a * b
div = a / b
exp = a ** b

print(a, '+', b, '=', total)
print(a, '-', b, '=', diff)
print(a, '*', b, '=', prod)
print(a, '/', b, '=', div)
print(a, '**', b, '=', exp)

Running this program prints:

13.5 + 2.0 = 15.5
13.5 - 2.0 = 11.5
13.5 * 2.0 = 27.0
13.5 / 2.0 = 6.75
13.5 ** 2.0 = 182.25

Dividing by zero leads to an error.

Conversion between int and float

Implicit conversion. Any integer (int type) can be used where a floating-point number (float type) is expected, because Python automatically converts integers to floating-point numbers if necessary.

Explicit conversion. A floating-point number cannot be implicitly converted to an integer. For such a conversion, it is necessary to use an explicit conversion using the int() command.

num1 = 17.89
num2 = -13.56
num3 = int(num1)
num4 = int(num2)

print(num3)
print(num4)

The result of executing such code will be:

17
-13

Note that converting a floating-point number to an integer rounds towards zero, that is, int(1.7) = 1 and int(-1.7) = -1. For further reading, refer to the Python Documentation.
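Rounding towards zero is only one of several ways to turn a float into an integer. As a rough comparison, the sketch below puts int() next to math.floor(), math.ceil(), and round() for the same two values.

```python
import math

rows = []
for x in (1.7, -1.7):
    # int() truncates, floor() rounds down, ceil() rounds up,
    # round() goes to the nearest integer.
    rows.append((int(x), math.floor(x), math.ceil(x), round(x)))
    print(x, int(x), math.floor(x), math.ceil(x), round(x))

# 1.7 1 1 2 2
# -1.7 -1 -2 -1 -2
```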

Python Tuples

Christopher Ambala — Mon, 04 Mar 2024 08:55:08 +0000

It's been a while since my last post 😐, but I'm back with interesting Python topics as I learn 😃
Today we look at the core Python data types, which are:

Tuples
Int
Lists
Dictionaries
Numbers
sets
File

Tuples

Tuples are ordered collections of heterogeneous data that are unchangeable.
They have the following characteristics;

Ordered: Tuples are part of sequence data types, which means they hold the order of the data insertion. It maintains the index value for each item.
Unchangeable: Tuples are unchangeable (immutable), which means that we cannot add, change, or delete items after the tuple is created.
Heterogeneous: Tuples are a sequence of data of different data types (like integer, float, list, string, etc;) and can be accessed through indexing and slicing.
Contains Duplicates: Tuples can contain duplicates, which means they can have items with the same value.
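The "unchangeable" property applies to the tuple itself, not to mutable objects stored inside it. The sketch below shows both sides, using an example tuple that contains a list.

```python
my_tuple = ("mouse", [8, 4, 6], (1, 2, 3))

# Replacing an item of the tuple is not allowed.
try:
    my_tuple[0] = "cat"
except TypeError as err:
    print("TypeError:", err)
    # TypeError: 'tuple' object does not support item assignment

# A list stored inside the tuple is still a mutable list.
my_tuple[1].append(10)
print(my_tuple)  # ('mouse', [8, 4, 6, 10], (1, 2, 3))
```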

Creating tuples
They are created using () or the built-in function tuple().

# Using empty parentheses
mytuple = ()

# Using tuple() function
mytuple = tuple()

Creating a tuple with elements.

A tuple is created by placing all the items (elements) inside parentheses, separated by commas. The parentheses are optional; however, it is good practice to use them.

A tuple can have any number of items and they may be of different types (integer, float, list, string, etc.).

# Different types of tuples
# Empty tuple
my_tuple = ()
print(my_tuple)

# Tuple having integers
my_tuple = (1, 2, 3)
print(my_tuple)

# tuple with mixed datatypes
my_tuple = (1, "Hello", 3.4)
print(my_tuple)

# nested tuple
my_tuple = ("mouse", [8, 4, 6], (1, 2, 3))
print(my_tuple)

Output

()
(1, 2, 3)
(1, 'Hello', 3.4)
('mouse', [8, 4, 6], (1, 2, 3))

Using the `tuple()` constructor.

tuple([iterable])
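As a brief illustration of the constructor, tuple() accepts any iterable and returns a tuple of its items. The sketch also shows the trailing comma needed for a one-item tuple; the variable names are chosen only for this example.

```python
from_list = tuple([1, 2, 3])
from_string = tuple("abc")
from_range = tuple(range(3))

print(from_list)    # (1, 2, 3)
print(from_string)  # ('a', 'b', 'c')
print(from_range)   # (0, 1, 2)

# A one-item tuple needs a trailing comma; (5) is just the integer 5.
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```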

Tuple packing and unpacking:

Packing: Packing is the process of putting values into a tuple. You can create a tuple by separating values with commas, and Python will automatically pack them into a tuple.

# Packing
my_tuple = 1, 2, 'three', 4.0
print(my_tuple)  # Output: (1, 2, 'three', 4.0)

Unpacking: Unpacking is the process of extracting values from a tuple. You can assign the elements of a tuple to multiple variables in a single line.

# Unpacking
a, b, c, d = my_tuple
print(a)  # Output: 1
print(b)  # Output: 2
print(c)  # Output: three
print(d)  # Output: 4.0

Accessing tuples by index.

To access an item through its index, you can use the following syntax:

tuple_object[index]

# Creating a tuple
my_tuple = (1, 2, 'three', 4.0)

# Accessing elements using indexing
first_element = my_tuple[0]
second_element = my_tuple[1]
third_element = my_tuple[2]
fourth_element = my_tuple[3]

# Printing the elements
print("First element:", first_element)   # Output: First element: 1
print("Second element:", second_element)  # Output: Second element: 2
print("Third element:", third_element)    # Output: Third element: three
print("Fourth element:", fourth_element)  # Output: Fourth element: 4.0

Using a negative index:

# Accessing elements using negative indexing
last_element = my_tuple[-1]    # Equivalent to my_tuple[3]
second_last = my_tuple[-2]     # Equivalent to my_tuple[2]

# Printing the elements
print("Last element:", last_element)       # Output: Last element: 4.0
print("Second last element:", second_last)  # Output: Second last element: three

Retrieving Multiple elements in tuples.

tuple_object[start:stop:step]

start: The index where the slice begins.
stop: The index where the slice ends (exclusive).
step: The step or stride between elements.

my_tuple = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Slicing from index 2 to index 7 (exclusive) with a step of 2
sliced_tuple = my_tuple[2:7:2]

print(sliced_tuple)

#Output

(3, 5, 7)

What happens if you index out of range.

If you use an index greater than or equal to the tuple’s length, then you get an IndexError exception:

my_tuple = (1, 2, 'three', 4.0)
print(my_tuple[5])


Traceback (most recent call last):
    ...
IndexError: tuple index out of range

Accessing elements within a nested tuple.

Accessing elements within nested tuples involves using multiple levels of indexing. Each level of nesting requires an additional set of square brackets to access the desired element.

# Creating a nested tuple
nested_tuple = (1, 2, (3, 4), ('five', 6))

# Accessing elements in the nested tuple
first_element = nested_tuple[0]
third_element_nested = nested_tuple[2]
first_element_nested = nested_tuple[2][0]
second_element_nested = nested_tuple[3][1]

# Printing the accessed elements
print("First element:", first_element)
# Output: First element: 1

print("Third element (nested tuple):", third_element_nested)
# Output: Third element (nested tuple): (3, 4)

print("First element of the nested tuple:", first_element_nested)
# Output: First element of the nested tuple: 3

print("Second element of the nested tuple:", second_element_nested)
# Output: Second element of the nested tuple: 6

Tuple concatenation

You can concatenate two tuples in Python using the + operator. The result will be a new tuple that contains the elements of both original tuples.


# Two tuples to be concatenated
tuple1 = (1, 2, 3)
tuple2 = ('four', 'five', 'six')

# Concatenating the two tuples
concatenated_tuple = tuple1 + tuple2

# Printing the concatenated tuple
print("Concatenated Tuple:", concatenated_tuple)

#Output

Concatenated Tuple: (1, 2, 3, 'four', 'five', 'six')

Compare two tuples

In Python, you can compare two tuples using the comparison operators (==, !=, <, >, <=, >=). The comparison is performed element-wise, starting from the first element, and stops as soon as a decisive result is reached.

tuple1 = (1, 2, 3)
tuple2 = (1, 2, 4)

# Equality check
print("tuple1 == tuple2:", tuple1 == tuple2)  # Output: False


# Inequality check
print("tuple1 != tuple2:", tuple1 != tuple2)  # Output: True


# Less than check
print("tuple1 < tuple2:", tuple1 < tuple2)    # Output: True

# Greater than check
print("tuple1 > tuple2:", tuple1 > tuple2)    # Output: False


# Less than or equal to check
print("tuple1 <= tuple2:", tuple1 <= tuple2)  # Output: True


# Greater than or equal to check
print("tuple1 >= tuple2:", tuple1 >= tuple2)  # Output: False
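Element-wise comparison also covers tuples of different lengths and is what sorted() relies on when ordering tuples. The following sketch uses a few made-up tuples to show this.

```python
# The first differing element decides; later elements are ignored.
print((1, 3) > (1, 2, 99))   # True
print((2,) > (1, 99, 99))    # True

# If one tuple is a prefix of the other, the shorter one is smaller.
print((1, 2) < (1, 2, 0))    # True

# Sorting tuples uses the same rule: first item, then second, and so on.
pairs = [(2, "b"), (1, "z"), (1, "a")]
ordered = sorted(pairs)
print(ordered)  # [(1, 'a'), (1, 'z'), (2, 'b')]
```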

Using tuple packing and unpacking to return multiple values from a function

Tuple packing and unpacking in Python can be used to return multiple values from a function. This is a convenient way to bundle multiple values together and then easily unpack them when needed.

def get_coordinates():
    x = 10
    y = 20
    z = 30
    # Tuple packing
    return x, y, z


# Function call
result = get_coordinates()

# Result is a tuple
print(result)  # Output: (10, 20, 30)

Tuple Unpacking: When calling the function, you can unpack the returned tuple into individual variables:

def get_coordinates():
    x = 10
    y = 20
    z = 30
    return x, y, z

# Tuple unpacking
x_result, y_result, z_result = get_coordinates()

# Individual values
print("X:", x_result)  # Output: X: 10
print("Y:", y_result)  # Output: Y: 20
print("Z:", z_result)  # Output: Z: 30

This is just a brief introduction to tuples. For further reading, the Python Documentation covers in depth what I have touched on here.

Data Engineering For Beginners.

Christopher Ambala — Tue, 31 Oct 2023 09:54:00 +0000

So you want to break into data engineering? Start today by learning more about data engineering and the fundamental concepts.

Data engineering encompasses the processes that collect raw data from various sources and integrate it into a unified, accessible data repository that can be used for analytics and other applications.
What Does a Data Engineer Do?

Extracting and integrating data from a variety of sources—data collection.
Preparing the data for analysis: processing the data by applying suitable transformations to prepare the data for analysis and other downstream tasks. Includes cleaning, validating, and transforming data.
Designing, building, and maintaining data pipelines that encompass the flow of data from source to destination.
Design and maintain infrastructure for data collection, processing, and storage—infrastructure management.

Data Engineering Concepts
We have incoming data from sources across the spectrum: from relational databases and web scraping to news feeds and user chats. The data coming from these sources can be classified into one of three broad categories:

Structured data: Has a well-defined schema (data in relational databases, spreadsheets, and the like).
Semi-structured data: Has some structure but no rigid schema. Typically has metadata tags that provide additional information (JSON and XML data, emails, zip files).
Unstructured data: Lacks a well-defined schema (images, videos and other multimedia files, website data).

Data Repositories: Data Warehouses, Data Lakes, and Data Marts

Before we take a deep dive, we'll learn about two data processing systems, namely OLTP and OLAP systems:

OLTP, or Online Transactional Processing, systems are used to store day-to-day operational data for applications such as inventory management. OLTP systems include relational databases that store data that can be used for analysis and deriving business insights.
OLAP, or Online Analytical Processing, systems are used to store large volumes of historical data for carrying out complex analytics. In addition to databases, OLAP systems also include data warehouses and data lakes.

Data warehouses: A data warehouse refers to a single comprehensive store house of incoming data.

Data lakes: Data lakes allow you to store all data types, including semi-structured and unstructured data, in their raw format without processing them. Data lakes are often the destination for ELT processes.

Data marts: You can think of a data mart as a smaller subsection of a data warehouse, tailored for a specific business use case.

Data lake houses: Recently, data lake houses are also becoming popular as they allow the flexibility of data lakes while offering the structure and organization of data warehouses.

Data Pipelines: ETL and ELT Processes

Data pipelines encompass the journey of data—from source to the destination systems—through ETL and ELT processes.

ETL—Extract, Transform, and Load—process includes the following steps:

Extract data from various sources
Transform the data—clean, validate, and standardize data
Load the data into a data repository or a destination application

ETL processes often have a data warehouse as the destination.

ELT—Extract, Load, and Transform—is a variation of the ETL process where instead of extract, transform, and load, the steps are in the order: extract, load, and transform.

This means the raw data collected from the source is loaded into the data repository before any transformation is applied, which allows us to apply transformations specific to a particular application. ELT processes often have data lakes as their destination.
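The difference in ordering between ETL and ELT can be sketched in a few lines of plain Python. Everything here is a toy stand-in made up for illustration: RAW_ROWS plays the source system, the warehouse and lake lists play the destinations, and extract and transform are hypothetical helper names. Real pipelines would use databases and dedicated tools instead.

```python
# Hypothetical raw records standing in for data pulled from a source system.
RAW_ROWS = [
    {"name": " Alice ", "amount": "10.5"},
    {"name": "bob", "amount": "n/a"},   # invalid amount
    {"name": "Carol", "amount": "7"},
]


def extract():
    """Extract: read the raw rows from the (toy) source."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: clean names, validate amounts, standardize types."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # validation: drop rows whose amount is not a number
        clean.append({"name": row["name"].strip().title(), "amount": amount})
    return clean


# ETL: transform first, then load only the cleaned rows.
warehouse = []
warehouse.extend(transform(extract()))

# ELT: load the raw rows as they are, transform later when needed.
lake = []
lake.extend(extract())
report = transform(lake)

print(warehouse)
# [{'name': 'Alice', 'amount': 10.5}, {'name': 'Carol', 'amount': 7.0}]
print(len(lake), len(report))  # 3 2
```

In the ELT half the lake still holds all three raw rows, so a different application could later apply its own transformation to the same data.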

Tools Data Engineers Should Know

Programming language: Intermediate to advanced proficiency in a programming language, preferably one of Python, Scala, and Java
Databases and SQL: Good understanding of database design and ability to work with databases, both relational databases such as MySQL and PostgreSQL and non-relational databases such as MongoDB
Command-line fundamentals: Familiarity with shell scripting and data processing at the command line
Knowledge of operating systems and networking
Data warehousing fundamentals
Fundamentals of distributed systems

Data engineering also requires strong software engineering skills including version control, logging, and application monitoring. You should also know how to use containerization tools like Docker and container orchestration tools like Kubernetes.

dbt (data build tool) for analytics engineering
Apache Spark for big data analysis and distributed data processing
Airflow for data pipeline orchestration
Fundamentals of cloud computing and working with at least one cloud provider such as AWS or Microsoft Azure.

Data Engineering For Beginners.

Christopher Ambala — Tue, 31 Oct 2023 09:53:59 +0000

So you want to break into data engineering? Start today by learning more about data engineering and the fundamental concepts.

Extracting and integrating data from a variety of sources—data collection.
Preparing the data for analysis: processing the data by applying suitable transformations to prepare the data for analysis and other downstream tasks. Includes cleaning, validating, and transforming data.
Designing, building, and maintaining data pipelines that encompass the flow of data from source to destination.
Designing and maintaining infrastructure for data collection, processing, and storage (infrastructure management).

Structured data: Has a well-defined schema. Examples include data in relational databases, spreadsheets, and the like.
Semi-structured data: Has some structure but no rigid schema. Typically has metadata tags that provide additional information. Examples include JSON and XML data, emails, and zip files.
Unstructured data: Lacks a well-defined schema. Examples include images, videos and other multimedia files, and website data.

Data Repositories: Data Warehouses, Data Lakes, and Data Marts

Before we take a deep dive, we'll learn about two data processing systems: OLTP and OLAP.

OLTP, or Online Transaction Processing, systems are used to store day-to-day operational data for applications such as inventory management. OLTP systems include relational databases that store data that can be used for analysis and deriving business insights.
OLAP, or Online Analytical Processing, systems are used to store large volumes of historical data for carrying out complex analytics. In addition to databases, OLAP systems also include data warehouses and data lakes.

Data warehouses: A data warehouse refers to a single comprehensive storehouse of incoming data.

Data marts: You can think of a data mart as a smaller subsection of a data warehouse, tailored for a specific business use case.

Data lakehouses: Recently, data lakehouses have also become popular, as they offer the flexibility of data lakes together with the structure and organization of data warehouses.

Data Pipelines: ETL and ELT Processes

Data pipelines encompass the journey of data—from source to the destination systems—through ETL and ELT processes.

The ETL (Extract, Transform, and Load) process includes the following steps:

Extract data from various sources
Transform the data—clean, validate, and standardize data
Load the data into a data repository or a destination application

ETL processes often have a data warehouse as the destination.
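The three ETL steps can be sketched as three small functions. This is a minimal illustration using only the standard library. The inline CSV text, the readings table, and the function names are hypothetical, and an in-memory SQLite database stands in for the destination.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from files, APIs, or databases.
SOURCE_CSV = """id,city,temp_c
1, Nairobi ,24.5
2,Mombasa,
3,kisumu,27.0
"""

def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: clean, validate, and standardize each record."""
    cleaned = []
    for rec in records:
        temp = rec["temp_c"].strip()
        if not temp:  # validation: drop rows with a missing reading
            continue
        cleaned.append((int(rec["id"]), rec["city"].strip().title(), float(temp)))
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows to the destination."""
    conn.execute("CREATE TABLE readings (id INTEGER, city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM readings ORDER BY id").fetchall()
print(loaded)
```

Only cleaned rows reach the destination here, which is the main contrast with ELT, where the raw rows are stored first.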

ELT (Extract, Load, and Transform) is a variation of the ETL process in which the last two steps are swapped: the order is extract, load, then transform.

Tools Data Engineers Should Know

Programming language: Intermediate to advanced proficiency in a programming language, preferably one of Python, Scala, and Java
Databases and SQL: Good understanding of database design and the ability to work with both relational databases such as MySQL and PostgreSQL and non-relational databases such as MongoDB
Command-line fundamentals: Familiarity with shell scripting and data processing at the command line
Knowledge of operating systems and networking
Data warehousing fundamentals
Fundamentals of distributed systems

dbt (data build tool) for analytics engineering
Apache Spark for big data analysis and distributed data processing
Airflow for data pipeline orchestration
Fundamentals of cloud computing and working with at least one cloud provider such as AWS or Microsoft Azure.

TIME SERIES ANALYSIS.

Christopher Ambala — Wed, 25 Oct 2023 06:34:13 +0000

Time Series Analysis (TSA) is a way of studying the characteristics of a response variable with time as the independent variable.
To estimate the target variable when predicting or forecasting, use the time variable as the reference point.
A time series is ordered by time; the unit can be years, months, weeks, days, hours, minutes, or seconds. It is a sequence of observations taken at successive, discrete time intervals.
Some real-world applications of TSA include weather forecasting models, stock market predictions, signal processing, and control systems.

What Is Time Series Analysis?
Time series analysis is a specific way of analyzing a sequence of data points collected over time. In TSA, analysts record data points at consistent intervals over a set period rather than just recording the data points intermittently or randomly.

Objectives of Time Series Analysis

To understand how a time series works and what factors affect certain variable(s) at different points in time.
To provide insights into how the features of the given dataset change over time, and with what consequences.
To support predicting future values of the time series variable.
Assumptions: TSA rests on one key assumption, stationarity, which means that the origin of time does not affect the statistical properties of the process.

How to Analyze Time Series?
To perform time series analysis, we follow these steps:

Collecting the data and cleaning it
Visualizing the key feature against time
Observing the stationarity of the series
Developing charts to understand its nature.
Model building – AR, MA, ARMA and ARIMA
Extracting insights from prediction

Significance of Time Series
TSA is the backbone for prediction and forecasting analysis, specific to time-based problem statements.

Analyzing the historical dataset and its patterns
Understanding and matching the current situation with patterns derived from the previous stage.
Understanding the factor or factors influencing certain variable(s) in different periods.

With “Time Series,” we can prepare numerous time-based analyses and results.

Forecasting: Predicting any value for the future.
Segmentation: Grouping similar items together.
Classification: Classifying a set of items into given classes.
Descriptive analysis: Analysis of a given dataset to find out what is there in it.
Intervention analysis: Effect of changing a given variable on the outcome.

Components of Time Series Analysis

Trend: A long-term movement with no fixed interval; any divergence within the dataset runs along a continuous timeline. The trend can be negative, positive, or null.
Seasonality: Shifts that recur at a regular or fixed interval within the dataset along a continuous timeline. The pattern can resemble a bell curve or a sawtooth.
Cyclical: Movements with no fixed interval, and with uncertainty in their movement and pattern.
Irregularity: Unexpected situations, events, or scenarios that cause spikes over a short time span.
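To make the trend and seasonality components concrete, here is a small sketch that separates them with a centered moving average. The series is hypothetical, built from a known linear trend plus a repeating four-step pattern, so the result can be checked by eye. The function name is illustrative, not from any library.

```python
# Hypothetical series: a linear upward trend plus a repeating 4-step seasonal pattern.
season = [2.0, 0.0, -2.0, 0.0]
series = [0.5 * t + season[t % 4] for t in range(16)]

def centered_moving_average(values, period):
    """Estimate the trend with a centered moving average over one full (even) period."""
    half = period // 2
    trend = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        # Half weights on both ends so that an even period stays centered on i.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend.append(total / period)
    return trend

trend = centered_moving_average(series, period=4)

# Subtracting the trend leaves the seasonal pattern (no noise was added here).
detrended = [series[i + 2] - trend[i] for i in range(len(trend))]
print(trend[:4])
print(detrended[:4])
```

Averaging over exactly one period cancels the seasonal swings, which is why the moving average recovers the underlying trend.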

What Are the Limitations of Time Series Analysis?

As with many other models, missing values are not supported by TSA.
The data points must be linear in their relationship.
Data transformations are mandatory, which makes the analysis somewhat expensive.
Models mostly work on univariate data.

Data Types of Time Series
There are two major types – stationary and non-stationary.

Stationary: A stationary dataset follows the rules of thumb below and has no trend, seasonality, cyclical, or irregularity components.

The mean should be constant over the period being analyzed.
The variance should be constant with respect to the time frame.
The covariance, which measures the relationship between two variables, should not change with time.

Non-stationary: If the mean, variance, or covariance changes with respect to time, the dataset is called non-stationary.
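A rough, informal way to look for a changing mean or variance is to compare the two halves of a series. The sketch below uses only the standard library and two hypothetical series. It is a quick sanity check, not a formal stationarity test such as the augmented Dickey-Fuller test.

```python
import statistics

def split_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (
        (statistics.fmean(first), statistics.pvariance(first)),
        (statistics.fmean(second), statistics.pvariance(second)),
    )

# Hypothetical data: one series oscillates around a fixed level, the other trends upward.
flat = [10 + (-1) ** t for t in range(20)]
trending = [10 + t + (-1) ** t for t in range(20)]

flat_first, flat_second = split_stats(flat)
trend_first, trend_second = split_stats(trending)

print(flat_first, flat_second)    # same mean and variance in both halves
print(trend_first, trend_second)  # the mean shifts between halves
```

If the halves disagree noticeably, as they do for the trending series, the data is a candidate for differencing or another transformation before modeling.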

Time Series Data Models.

Autoregressive (AR) models: An AR model is a representation of a type of random process, which is why it is used for data describing time-varying processes, such as changes in weather, economics, etc.
Integrated (I) models: Integrated models are series with random walk components. They are called integrated because these series are the sums of weakly stationary components.
Moving-average (MA) models: Moving-average models are used for modeling univariate time series. In MA models, the output variable depends linearly on the current and various past values of an imperfectly predictable (stochastic) term.

These three classes, in various combinations, produce the following three models commonly used in time series analytics.

Autoregressive moving average (ARMA) models: ARMA models combine the AR and MA classes. The AR part involves regressing the variable on its own past values, while the MA part models the error term as a linear combination of error terms occurring contemporaneously and at various times in the past. ARMA models are frequently used for analytics and predicting future values in a series.
Autoregressive integrated moving average (ARIMA) models: ARIMA models are a generalization of the ARMA model and are used where data show evidence of non-stationarity. An initial differencing step, corresponding to the integrated part of the model, can be applied one or more times to eliminate the non-stationarity of the mean function. Both ARMA and ARIMA models are frequently used for analytics and predicting future values in a series.
Autoregressive fractionally integrated moving average (ARFIMA) models: The ARFIMA model, in turn, generalizes ARIMA models (or, basically, all three basic classes) by allowing non-integer values of the differencing parameter. ARFIMA models are frequently used for modeling so-called long-memory time series, where deviations from the long-run mean decay more slowly than an exponential decay.

When it comes to nonlinear time series models, there are a number of models that represent changes in variability over time that are predicted by or related to recent past values of the observed series.

Autoregressive conditional heteroscedasticity (ARCH) models: ARCH is one such model. It describes the variance of the current error term, or innovation, as a function of the actual sizes of the error terms in the previous time periods.
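Two of the ideas above, regressing a variable on its own past value (AR) and differencing (the integrated step in ARIMA), can be sketched without any modeling library. The simulated data, the coefficient 0.7, and the function names are assumptions chosen for illustration. In practice a dedicated time series library would normally be used to fit these models.

```python
import random

random.seed(0)

# Simulate a hypothetical AR(1) process: x[t] = phi * x[t-1] + noise.
phi_true = 0.7
x = [0.0]
for _ in range(2000):
    x.append(phi_true * x[-1] + random.gauss(0, 1))

def fit_ar1(values):
    """Least-squares estimate of phi when regressing x[t] on x[t-1] (no intercept)."""
    num = sum(a * b for a, b in zip(values[1:], values[:-1]))
    den = sum(b * b for b in values[:-1])
    return num / den

phi_hat = fit_ar1(x)

def difference(values):
    """First difference: the step ARIMA applies to remove non-stationarity in the mean."""
    return [b - a for a, b in zip(values, values[1:])]

# A random walk is non-stationary, but its first difference is just the noise.
walk = [0.0]
for _ in range(200):
    walk.append(walk[-1] + random.gauss(0, 1))
steps = difference(walk)

print(round(phi_hat, 2))
print(len(steps))
```

The estimate should land close to the true coefficient, and differencing shortens the series by one observation.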

Exploratory Data Analysis: Data Visualization

Christopher Ambala — Sun, 08 Oct 2023 15:03:05 +0000

In this article, we’ll use data visualization to explore a dataset from Streeteasy which contains information about housing rentals in New York City.

Exploratory Data Analysis (EDA) is a process of describing the data by means of statistical and visualization techniques in order to bring important aspects of that data into focus for further analysis.

Univariate analysis
Univariate analysis focuses on a single variable at a time. Univariate data visualizations can help us answer questions like:

What is the typical price of a rental in New York City?
What proportion of NYC rentals have a gym?
Depending on the type of variable (quantitative or categorical) we want to visualize, we need to use slightly different visualizations.

Quantitative variables
Box plots (or violin plots) and histograms are common choices for visually summarizing a quantitative variable. These plots are useful because they simultaneously communicate information about minimum and maximum values, central location, and spread. Histograms can additionally illuminate patterns that can impact an analysis (e.g., skew or multimodality).

For example, suppose we are interested in learning more about the price of apartments in NYC. A good starting place is to plot a box plot of the rent variable. We could plot a boxplot of rent as follows:

# Load libraries
import seaborn as sns
import matplotlib.pyplot as plt

# `rentals` is assumed to be a pandas DataFrame already loaded from the StreetEasy data

# Create the plot
sns.boxplot(x='rent', data=rentals)
plt.show()

We can see that most rental prices fall within a range of $2500-$5000; however, there are many outliers, particularly on the high end. For more detail, we can also plot a histogram of the rent variable.
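The summary a box plot draws can also be computed directly. The sketch below uses only the standard library and a small hypothetical list of rents (not the StreetEasy data). It flags values more than 1.5 times the interquartile range beyond the quartiles, which is the default whisker rule in seaborn's boxplot.

```python
import statistics

# Hypothetical monthly rents; not taken from the StreetEasy dataset.
rents = [2200, 2500, 2800, 3000, 3200, 3500, 4000, 4500, 5000, 12000]

# Quartiles: the edges of the box and the line inside it.
q1, median, q3 = statistics.quantiles(rents, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [r for r in rents if r < lower_fence or r > upper_fence]

print(q1, median, q3)
print(outliers)
```

The exact quartile values depend on the interpolation method, so a plotting library may place the box edges slightly differently.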

# Create a histogram of the rent variable
sns.displot(data=rentals, x='rent', bins=10, kde=False)
plt.show()

The histogram highlights the long right tail for rental prices. We can get a more detailed look at this distribution by increasing the number of bins:

# Create a histogram of the rent variable with more bins
sns.displot(data=rentals, x='rent', bins=50, kde=False)
plt.show()

Categorical variables
For categorical variables, we can use a bar plot (instead of a histogram) to quickly visualize the frequency (or proportion) of values in each category. For example, suppose we want to know how many apartments are available in each borough. We can visually represent that information as follows:

# Create a barplot of the counts in the borough variable
# The palette parameter will set the color scheme for the plot
sns.countplot(x='borough', data=rentals, palette='winter')
plt.show()

Bivariate analysis
In many cases, a data analyst is interested in the relationship between two variables in a dataset. For example:

Do apartments in different boroughs tend to cost different amounts?
What is the relationship between the area of an apartment and how much it costs?
Depending on the types of variables we are interested in, we need to rely on different kinds of visualizations.

One quantitative variable and one categorical variable
Two good options for investigating the relationship between a quantitative variable and a categorical variable are side-by-side box plots and overlapping histograms.

For example, suppose we want to understand whether apartments in different boroughs cost different amounts. We could address this question by plotting side by side box plots of rent by borough:

# Create a box plot of the borough variable relative to rent
sns.boxplot(x='borough', y='rent', data=rentals, palette='Accent')
plt.show()

This plot indicates that rental prices in Manhattan tend to be higher and have more variation than rental prices in other boroughs. We could also investigate the same question in more detail by looking at overlapping histograms of rental prices by borough:

plt.hist(rentals.rent[rentals.borough=='Manhattan'], label='Manhattan', density=True, alpha=.5)
plt.hist(rentals.rent[rentals.borough=='Queens'], label='Queens', density=True, alpha=.5)
plt.hist(rentals.rent[rentals.borough=='Brooklyn'], label='Brooklyn', density=True, alpha=.5)
plt.legend()
plt.show()

Two quantitative variables
A scatter plot is a great option for investigating the relationship between two quantitative variables. For example, if we want to explore the relationship between rent and size_sqft, we could create a scatter plot of these two variables:

# Create a scatterplot of the size_sqft variable relative to rent
# x and y are passed as keywords; recent seaborn versions do not accept them positionally
sns.scatterplot(x='size_sqft', y='rent', data=rentals)
plt.show()

The plot indicates that there is a strong positive linear relationship between the cost to rent a property and its square footage. Larger properties tend to cost more money.
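The strength of a linear relationship like this one can be summarized with the Pearson correlation coefficient. The sketch below computes it by hand on a few hypothetical size and rent values (not the StreetEasy data). The helper name pearson_r is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical apartment sizes (square feet) and monthly rents.
sizes = [400, 600, 800, 1000, 1200]
rents = [2000, 2600, 3100, 3900, 4400]

r = pearson_r(sizes, rents)
print(round(r, 3))
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.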

Two categorical variables
Side by side (or stacked) bar plots are useful for visualizing the relationship between two categorical variables. For example, suppose we want to know whether rentals that have an elevator are more likely to have a gym. We could plot a side by side bar plot as follows:

sns.countplot(x='has_elevator', hue='has_gym', data=rentals)
plt.show()

This plot tells us that buildings with elevators are approximately equally likely to have a gym or not have a gym; meanwhile, apartments without elevators are very unlikely to have a gym.
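The same comparison can be made numerically by counting each combination of the two categories. The sketch below uses a handful of hypothetical (has_elevator, has_gym) pairs rather than the real dataset, and the helper name gym_share is illustrative.

```python
from collections import Counter

# Hypothetical (has_elevator, has_gym) flags for eight listings.
listings = [(1, 1), (1, 0), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]

# Count how often each combination occurs, as a side-by-side bar plot would show.
counts = Counter(listings)

def gym_share(has_elevator):
    """Proportion of listings with a gym, among those with the given elevator flag."""
    with_gym = counts[(has_elevator, 1)]
    total = with_gym + counts[(has_elevator, 0)]
    return with_gym / total

share_with_elevator = gym_share(1)
share_without_elevator = gym_share(0)
print(share_with_elevator, share_without_elevator)
```

Comparing proportions rather than raw counts helps when the two groups have very different sizes.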

Multivariate analysis
Sometimes, a data analyst is interested in simultaneously exploring the relationship between three or more variables in a single visualization. Many of the visualization methods presented up to this point can include additional variables by using visual cues such as colors, shapes, and patterns. For example, we can investigate the relationship between rental price, square footage, and borough by using color to introduce our third variable:

sns.scatterplot(x='size_sqft', y='rent', hue='borough', data=rentals, palette='bright')
plt.show()

Another common data visualization for multivariate analysis is a heat map of a correlation matrix for all quantitative variables:

# Define the colormap which maps the data values to the color space defined with the diverging_palette method  
colors = sns.diverging_palette(150, 275, s=80, l=55, n=9, as_cmap=True)

# Create a heatmap of the correlation matrix; numeric_only=True skips text columns such as borough
sns.heatmap(rentals.corr(numeric_only=True), center=0, cmap=colors, robust=True)
plt.show()

Conclusion
In this article, I’ve summarized some of the important considerations for choosing a data visualization based on the question a data analyst wants to answer and the type of data that is available.

Between linear regression and random forest regression, which model would perform better and why?

Christopher Ambala — Sun, 08 Oct 2023 13:36:55 +0000

Let's look at the two in depth to better understand which one works better.

Random forest regression is based on the ensemble machine learning technique of bagging. The two key concepts of random forests are:

Random sampling of training observations when building trees.
Random subsets of features for splitting nodes

Random forest regressions also discretize continuous variables since they are based on decision trees, which function through recursive binary partitioning at the nodes. This effectively means that we can split not only categorical variables, but also split continuous variables. Additionally, with enough data and sufficient splits, a step function with many small steps can approximate a smooth function for predicting an output.
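The claim that many small steps can approximate a smooth function can be checked with a short sketch. This is not a random forest: it uses equal-width steps and midpoint values as a stand-in for the splits a tree would learn from data. The function names are hypothetical.

```python
def step_approximation(func, lo, hi, n_steps):
    """Piecewise-constant approximation: one constant value per equal-width interval."""
    width = (hi - lo) / n_steps
    # Use the function value at each interval's midpoint as that step's prediction.
    levels = [func(lo + (k + 0.5) * width) for k in range(n_steps)]

    def predict(x):
        k = min(int((x - lo) / width), n_steps - 1)
        return levels[k]

    return predict

def max_error(func, approx, lo, hi, samples=1000):
    """Largest absolute gap between func and approx on an evenly spaced grid."""
    points = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(func(p) - approx(p)) for p in points)

def smooth(x):
    return x ** 2

coarse_error = max_error(smooth, step_approximation(smooth, 0.0, 1.0, 4), 0.0, 1.0)
fine_error = max_error(smooth, step_approximation(smooth, 0.0, 1.0, 64), 0.0, 1.0)
print(coarse_error, fine_error)
```

With 64 steps the worst-case error is far smaller than with 4, which mirrors how additional splits let tree-based models follow a curve more closely.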

Linear regression, on the other hand, is the standard regression technique in which relationships are modeled using a linear predictor function, the most common example being y = Ax + B. Linear regression models are often fitted using the least-squares approach.
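For a single predictor, the least-squares estimates of A and B have a closed form. The sketch below applies it to a small hypothetical dataset that lies exactly on a line. The function name fit_line is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of A (slope) and B (intercept) in y = A*x + B."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

A, B = fit_line(xs, ys)
print(A, B)
```

The fitted coefficients are directly readable as "change in y per unit of x" and "value of y at x = 0", which is the interpretability advantage discussed later in the comparison.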

There are also four main assumptions in linear regression:

A normal distribution of error terms
Independence in the predictors
The mean residuals must equal zero with constant variance
No correlation between the features

So how do we differentiate between random forest regression and linear regression independent of the problem statement?

The differences between random forest regression and standard regression techniques for many applications are:

Random forest regression can approximate complex nonlinear shapes without a prior specification. Linear regression performs better when the underlying function is linear and has many continuous predictors.
Random forest regression allows the use of arbitrarily many predictors (more predictors than data points is possible)
Random forest regression can also capture complex interactions between predictors without a prior specification.
Both will give some semblance of a "feature importance." However, linear regression feature importance is much more interpretable than random forest feature importance, given the linear regression coefficient values attached to each predictor.

Conclusion.
Random forest tends to perform better than linear regression when:

The data has a large number of features.
The data has complex, non-linear relationships.
The data contains missing values or outliers.

Data Science Roadmap.

Christopher Ambala — Fri, 29 Sep 2023 13:41:23 +0000

Every organization stores data about its customers. Organizations need to take advantage of this data to improve their services and better manage their marketing campaigns. This work falls to data scientists, who have the skills in math, programming, and statistics to organize this extensive data and apply their knowledge to find the solutions hidden in it.

This article will show you the resources you need to learn to become a data scientist.

1. Learn Python Language.

This journey starts with learning Python, a programming language that almost every data scientist should understand very well. It is used a lot when working with data, for example when collecting data through web scraping or from a database. You will also need it to visualize the data and to create machine learning models for prediction.

2. Data Processing & Visualization
You can define data visualization as the process of converting your dataset, after cleaning it, into charts that have meaning and can drive decisions: offering better services, improving the user experience, understanding more about your customers, and so on. There are many data processing and visualization libraries that work with Python. Let's first explore two of the best data processing libraries:

2.1. NumPy
This is a Python library developed to work with arrays. You can use it for mathematical calculations, which is very important to know if you are a data scientist. It is also one of the essential Python libraries every machine learning engineer and data scientist should learn.

2.2. Pandas
This is used for working with tabular data such as CSV files and for importing your data from different sources. It is used a lot for data analysis and for cleaning your data before using it.

2.3. Matplotlib
This is the most common Python library for data visualization. It can create some fantastic graphs and charts with simple programming commands, and it also supports 3D visualization. Data scientists and ML engineers should learn Matplotlib in 2023 along with NumPy and Pandas.

2.4. Tableau
Tableau is a data visualization tool that doesn’t need any programming skills to use, and it is used a lot in the business intelligence industry. Non-technical people can use it for making customized dashboards.

2.5. Power BI
Microsoft Power BI is a cloud-based data analytics and visualization service from Microsoft that offers speed and efficiency. Versions are available for both phone and desktop.

These are the best libraries and tools that data scientists use in their daily routine, but you can explore others if you want, such as Plotly and Leaflet.

3. Learn Math
You don’t need to have excellent skills in math to be a data scientist. Still, it would be best to have a basic understanding of math, such as linear algebra, calculus, probabilities, and statistics.

These skills will be beneficial when working with data, such as transforming it into another shape or performing operations using the NumPy library.

4. Machine Learning
Machine learning can be very useful if you want to become a data scientist, since it helps you make predictions and lets the machine make the right decisions without any human intervention.

4.1. TensorFlow
This is an open-source artificial intelligence library developed by Google and used a lot in deep learning models where you need to analyze a large amount of data.

4.2. Scikit-Learn

This is one of the most used libraries among machine learning engineers and data scientists. It is well suited to small amounts of data and is easy to use compared to TensorFlow.

Conclusion
This is an overview of the data science roadmap. You can also learn other programming languages used among data scientists, such as R, and dive deeper into machine learning and deep learning.
