Overview
In Python, indexing and slicing are ways to get elements from iterables (like lists, tuples, and even string).
ingredients = ['beef', 'onions', 'tomato', 'potato', 'carrot']
print(ingredients[1]) # onions
print(ingredients[1:4]) # ['onions', 'tomato', 'potato']
This is something I have to look at recently. I am personally not a fan of deep dives in the C-level source code since things can always change (for example, ROT_TWO is set to be replaced in Python 3.11). Nevertheless, I think reading C code as a Python developer is a fun experience I want to share.
Below codes are performed on Python 3.10.
AST
In CPython, actual Python syntax go through parsing and compiling.
Python code gets parsed and converted into an abstract syntax tree. The ast module can help us illustrate this step.
import ast
print(ast.dump(ast.parse('[][0]', mode='eval'), indent=4))
# Result:
# Expression(
# body=Subscript(
# value=List(elts=[], ctx=Load()),
# slice=Constant(value=0),
# ctx=Load()))
print(ast.dump(ast.parse('[][::-1]', mode='eval'), indent=4))
# Result:
# Expression(
# body=Subscript(
# value=List(elts=[], ctx=Load()),
# slice=Slice(
# step=UnaryOp(
# op=USub(),
# operand=Constant(value=1))),
# ctx=Load()))
What we see in the two code snippets above is that both indexing and slicing are essentially the same operation, Subscript.
Subscript takes a value, a slice, and a ctx. We can learn from here that the difference between indexing and slicing is that:
- indexing is when a
Constanttree is passed into theSubscript's slice; whereas - slicing is when a
Slicetree is passed otherwise.
Bytecode
The abstract syntax tree as shown above will eventually get compiled into bytecode. The dis module help us see it.
from dis import dis
dis('[][0]')
# Result:
# 1 0 BUILD_LIST 0
# 2 LOAD_CONST 0 (0)
# 4 BINARY_SUBSCR
# 6 RETURN_VALUE
dis('[][::-1]')
# Result:
# 1 0 BUILD_LIST 0
# 2 LOAD_CONST 0 (None)
# 4 LOAD_CONST 0 (None)
# 6 LOAD_CONST 1 (-1)
# 8 BUILD_SLICE 3
# 10 BINARY_SUBSCR
# 12 RETURN_VALUE
What we're seeing in these snippets is that Subscript is referred to as BINARY_SUBSCR at bytecode level. We can also see that if any of start, stop, or step is not supplied, None is put in place.
From the docs: the Slice Object
Part of the Python's builtins is the slice function which generates the
From the docs:
Slice objects are also generated when extended indexing syntax is used. For example:
a[start:stop:step]ora[start:stop, i].
We can infer that when slicing ingredients[1:4], a slice(1,4) is generated along the way.
Under the C
The bytecodes such as BUILD_SLICE and BINARY_SUBSCR can be found at Python's ceval.c
Below are snippets:
TARGET(BINARY_SUBSCR) {
PREDICTED(BINARY_SUBSCR);
PyObject *sub = POP();
PyObject *container = TOP();
PyObject *res = PyObject_GetItem(container, sub);
Py_DECREF(container);
Py_DECREF(sub);
SET_TOP(res);
if (res == NULL)
goto error;
JUMPBY(INLINE_CACHE_ENTRIES_BINARY_SUBSCR);
DISPATCH();
}
TARGET(BUILD_SLICE) {
PyObject *start, *stop, *step, *slice;
if (oparg == 3)
step = POP();
else
step = NULL;
stop = POP();
start = TOP();
slice = PySlice_New(start, stop, step);
Py_DECREF(start);
Py_DECREF(stop);
Py_XDECREF(step);
SET_TOP(slice);
if (slice == NULL)
goto error;
DISPATCH();
}
Some Stuff
As stated above, it's indexing if a Contstant (i.e., int) is passed; whereas it's slicing if a Slice is passed.
Can we pass a slice() in __getitem__?
# Turns out, yes
#
print(ingredients[slice(1,4)])
# Result:
# ['onions', 'tomato', 'potato']
#
print(ingredients.__getitem__(slice(1,4)))
# Same result
From this line from the C code:
PyObject *res = PyObject_GetItem(container, sub);
we can infer that PyObject_GetItem is the C equivalent of __getitem__ from the data model. We now know that indexing, slicing and getting a value from a dictionary are all just __getitem__. This should not be a surprise considering the syntax, but I am.
(As a side note: if I had just read the Data Model docs, this readings done to make this post wouldn't have been necessary.)
Some implementations for the slice object is in Objects/sliceobject.c. There's a lot going in here.
In one of the snippets above, it is PySlice_New which creates a new slice object (it's been around since it was first introduced).
Looking at this commit, it seems that PySlice_New works like a def __init__ which simply returns a PyObject struct with attributes start, stop, and step. Calculations happen elsewhere. For example, the pointer *start is calculated on five occasions.
if (r->start == Py_None) {
*start = *step < 0 ? length-1 : 0;
} else {
if (!PyLong_Check(r->start)) return -1;
*start = PyLong_AsSsize_t(r->start);
if (*start < 0) *start += length;
}
if (r->start == Py_None) {
*start = *step < 0 ? PY_SSIZE_T_MAX : 0;
}
if (*start < 0) {
*start += length;
if (*start < 0) {
*start = (step < 0) ? -1 : 0;
}
}
else if (*start >= length) {
*start = (step < 0) ? length - 1 : length;
}
I don't think I need to understand this yet. Moreso, there's a lot of pointers at play in order to really read it, you need to stop thinking in Python and start thinking in C.
Another cool thing to note is a factory function at the bottom called slice_new, which, looking at the pull request, seem to be the slice builtin.
Top comments (0)