Continuing our exploration of Data Science from Scratch by Joel Grus (ch2). The book emphasizes pure Python with minimal libraries, so we may not see much of NumPy in it. However, since NumPy is so pervasive in data science applications, and since `lists` and NumPy `arrays` have much overlap, I think it would be useful to take this opportunity to compare and contrast the two, knowing that for **most** of the book, we'll be using Python lists.

Lists are fundamental to Python, so I'm going to spend some time exploring their features. For data science, `NumPy arrays` are used frequently, so I thought it'd be good to implement all `list` operations covered in this section with `NumPy arrays` to *tease apart their similarities and differences*.

Below are the similarities.

Much of what can be done with Python `lists` can also be done with NumPy `arrays`, including: getting the *nth* element of the list/array with square brackets, slicing the list/array, slicing with *start, stop, step*, using the `in` operator to check list/array membership, checking length, and unpacking lists/arrays.

```
# setup
import numpy as np
# create comparables
python_list = [1,2,3,4,5,6,7,8,9]
numpy_array = np.array([1,2,3,4,5,6,7,8,9])
# bracket operations
# get nth element with square bracket
python_list[0] # 1
numpy_array[0] # 1
python_list[8] # 9
numpy_array[8] # 9
python_list[-1] # 9
numpy_array[-1] # 9
# square bracket to slice
python_list[:3] # [1, 2, 3]
numpy_array[:3] # array([1, 2, 3])
python_list[1:5] # [2, 3, 4, 5]
numpy_array[1:5] # array([2, 3, 4, 5])
# start, stop, step
python_list[1:8:2] # [2, 4, 6, 8]
numpy_array[1:8:2] # array([2, 4, 6, 8])
# use in operator to check membership
1 in python_list # True
1 in numpy_array # True
0 in python_list # False
0 in numpy_array # False
# finding length
len(python_list) # 9
len(numpy_array) # 9
# unpacking
x,y = [1,2] # now x is 1, y is 2
w,z = np.array([1,2]) # now w is 1, z is 2
```
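One caveat about the slicing similarity: the syntax is the same, but what comes back is not. Slicing a list gives a new list, while basic slicing of a NumPy array gives a view onto the same data. Here is a minimal sketch of that behavior; the variable names (`list_slice`, `array_slice`, `independent`) are mine, for illustration only.

```python
import numpy as np

python_list = [1, 2, 3, 4, 5]
numpy_array = np.array([1, 2, 3, 4, 5])

list_slice = python_list[:3]    # a new list (shallow copy)
array_slice = numpy_array[:3]   # a view that shares memory with numpy_array

# write to the first element of each slice
list_slice[0] = 99
array_slice[0] = 99

print(python_list[0])   # 1  (original list is unchanged)
print(numpy_array[0])   # 99 (original array changed through the view)

# call .copy() when an independent array is wanted
independent = numpy_array[:3].copy()
independent[1] = -1
print(numpy_array[1])   # 2  (original array is unchanged this time)
```

So if you slice an array and then modify the slice, reach for `.copy()` unless you really do want the original to change too.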

Now, here are the differences.

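Some tasks can be done with Python `lists` but require a different approach with NumPy `arrays`, such as growing the collection (`extend`, `append` and `+` for a list; `np.append` for an array). Finally, lists can store mixed data types, while a NumPy array holds a single data type and converts its elements to a common one (strings, in the example that follows).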

```
# python lists can store mixed data types
heterogeneous_list = ['string', 0.1, True]
type(heterogeneous_list[0]) # str
type(heterogeneous_list[1]) # float
type(heterogeneous_list[2]) # bool
# numpy arrays store a single data type
# because one element is a string, every element is converted to a string
homogeneous_numpy_array = np.array(['string', 0.1, True]) # stored as strings
type(homogeneous_numpy_array[0]) # numpy.str_
type(homogeneous_numpy_array[1]) # numpy.str_
type(homogeneous_numpy_array[2]) # numpy.str_
# modifying list vs numpy array
# lists can use extend to modify list in place
python_list.extend([10,12,13]) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13]
# numpy arrays have no extend method (left commented out so the script keeps running)
# numpy_array.extend([10,12,13]) # AttributeError: 'numpy.ndarray' object has no attribute 'extend'
# use np.append instead; it returns a new array, so reassign the result
numpy_array = np.append(numpy_array,[10,12,13])
# python lists can be concatenated with other lists using +
new_python_list = python_list + [14,15] # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15]
# on a numpy array, + means element-wise addition, not concatenation
# numpy_array + [14,15] # ValueError: operands could not be broadcast together with shapes (12,) (2,)
# use np.append to concatenate instead
# array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15])
new_numpy_array = np.append(numpy_array, [14,15])
# python lists have an append method that modifies the list in place
python_list.append(0) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 0]
# numpy arrays have no append method; np.append is a function that returns a new array
numpy_array = np.append(numpy_array, [0])
```
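It's worth spelling out what `+` actually does with an array, since it's the reason the addition above raises a `ValueError`. With lists, `+` concatenates. With arrays, `+` adds element by element, so the shapes have to be compatible. A small sketch, using a short array `a` and result names of my own choosing:

```python
import numpy as np

a = np.array([1, 2, 3])

# list + list concatenates
print([1, 2, 3] + [10, 20, 30])   # [1, 2, 3, 10, 20, 30]

# array + same-length sequence adds element by element
summed = a + [10, 20, 30]
print(summed.tolist())            # [11, 22, 33]

# array + scalar adds the scalar to every element
shifted = a + 1
print(shifted.tolist())           # [2, 3, 4]

# mismatched lengths cannot be added element-wise
try:
    a + [14, 15]
except ValueError as err:
    print("ValueError:", err)

# to join arrays end to end, use np.concatenate (or np.append)
joined = np.concatenate([a, [14, 15]])
print(joined.tolist())            # [1, 2, 3, 14, 15]
```

Like `np.append`, `np.concatenate` returns a new array rather than modifying `a` in place.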

Python `lists` and NumPy `arrays` have much in common, but there are meaningful differences as well.
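One of those differences, the single data type per array, can be sketched a bit more concretely. As far as I understand it, NumPy picks one common `dtype` that can represent every element, and strings are only the outcome when a string is in the mix. You can also ask for `dtype=object` to keep the original Python objects, at the cost of most of NumPy's speed. The variable names here are made up for illustration.

```python
import numpy as np

# ints and floats are stored together as floats
ints_and_floats = np.array([1, 2.5])
print(ints_and_floats.dtype)        # float64

# booleans join in as numbers too
with_bool = np.array([True, 2, 3.5])
print(with_bool.tolist())           # [1.0, 2.0, 3.5]

# add a string and everything becomes a (unicode) string
with_string = np.array(['string', 0.1, True])
print(with_string.dtype.kind)       # U

# dtype=object keeps the original Python objects
as_objects = np.array(['string', 0.1, True], dtype=object)
print([type(x).__name__ for x in as_objects])   # ['str', 'float', 'bool']
```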

#### Python Lists vs NumPy Arrays: What's the difference?

Now that we know that there *are* meaningful differences, what can we attribute these differences to? This explainer from UCF highlights differences in three areas:

- Size
- Performance
- Functionality
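Here is a quick, rough sketch of how you might poke at each of these three points yourself. It is not a rigorous benchmark: the element count `n`, the doubling operation, and the variable names are all my own choices, and the timings will vary by machine, so I've left them without expected values.

```python
import sys
import timeit

import numpy as np

n = 100_000
python_list = list(range(n))
numpy_array = np.arange(n)

# Size: a list holds references to separate int objects,
# while an array stores the raw values in one block of memory
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)
array_bytes = numpy_array.nbytes
print(list_bytes, array_bytes)

# Performance: double every element, 20 times each
list_time = timeit.timeit(lambda: [x * 2 for x in python_list], number=20)
array_time = timeit.timeit(lambda: numpy_array * 2, number=20)
print(list_time, array_time)

# Functionality: arrays support whole-array math directly
doubled = numpy_array * 2
mean = numpy_array.mean()
print(doubled[:3].tolist(), mean)   # [0, 2, 4] 49999.5
```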

I'm tempted to go down this 🐇 🕳️ of further `lists` vs `arrays` comparisons, but we'll hold off for now.

