jess unrein

Posted on Nov 8, 2018 • Edited on Nov 9, 2018

Converting a csv writer from Python 2 to Python 3

#python #csv #explainlikeimfive #debugging

Please use Python 3

Converting projects from Python 2.7 to Python 3.x is usually a pretty painless process. Usually, the checklist items go something like this:

Change print debugging statements from the print statement to the print() function
Check dependencies for Python 3 compatibility
Consider using mypy gradual static typing. Consider further. Decide it's too much work for now but a great goal for v1 - whenever that will be.
Check to make sure you're not relying on integer rather than float division for any critical business logic

But dealing with reading and writing strings can suddenly get tricky. Even so, with Python 2's 2020 EOL fast approaching, using Python 3 is simply the responsible choice.

Example

I recently decided to port a project I wrote in Python 2 to Python 3.

The project is pretty simple:

Take a csv export of a Goodreads user's library
Write the contents out to a new csv using Libib's expected input fields

import csv
import re

def convert_csv():
    .
    .
    .
    with open('goodreads_export.csv', 'r') as f:
        reader = csv.DictReader(f)
        # Read the rows while the file is still open
        rows = [x for x in reader]

    books = [book for book in rows]
    print(books[1])
    # Simplifying the actual fields here so the example won't get too long :)
    header_keys = ['Author', 'ISBN-13', 'Title']
    print(len(header_keys))

    with open('libib_export.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(header_keys)
        for book in books:
            row = []
            authors = [book.get('Author', '')]
            authors.append(book.get('Additional Authors', ''))
            row.append(','.join(authors))
            row.append(book.get('ISBN13', ''))
            row.append(book.get('Title', ''))

            writer.writerow(row)

I like this project because it's a very simple process that requires a file input and creates a file output, so it's a great example for testing out different deployment processes or system configurations. So I changed my print debugging statements, but I kept getting an error:

File "converter.py", line 15, in convert_csv
    writer.writerow(libib_keys)
TypeError: a bytes-like object is required, not 'str'

Which I thought made sense. In Python 3, plain string literals are Unicode text unless you explicitly mark them as bytestrings. So I commented out the bulk of my process, converted my header keys to bytestrings, and tried again.

header_keys = [b'Author', b'ISBN-13', b'Title']

with open('libib_export.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(header_keys)

But for some reason I still got the same error. After fruitlessly running it a few more times, hoping for different results, I decided to google the error. Which, of course, told me what I already knew about the difference between strings in Python 2 and Python 3. So I decided to take a look back at the Python csv writer docs to see what assumptions I was probably mucking up.

In the Python 2 docs, the example for constructing a csv writer looks like this:

import csv
with open('eggs.csv', 'wb') as csvfile:
    etc
    etc

but the Python 3 docs do it like this:

import csv
with open('eggs.csv', 'w', newline='') as csvfile:

The Python 2 docs use the b mode when reading and writing files, but the Python 3 docs don't! I thought that was pretty weird, so I changed my output file definition to not use b mode, changed all of the bytestrings back to unicode strings, and the csv converter worked!
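For reference, here is a minimal sketch of the write step after that change, assuming Python 3. The sample rows and the temporary output path are made up for illustration; the real script builds its rows from the Goodreads export.

```python
import csv
import os
import tempfile

header_keys = ['Author', 'ISBN-13', 'Title']
# Hypothetical sample rows standing in for the converted Goodreads data
rows = [
    ['Author One', '0000000000001', 'First Title'],
    ['Author Two,Author Three', '0000000000002', 'Second Title'],
]

# Write to a throwaway directory so the sketch does not clobber real files
out_path = os.path.join(tempfile.mkdtemp(), 'libib_export.csv')

# Text mode ('w') plus newline='' is what the Python 3 csv docs use
with open(out_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(header_keys)
    writer.writerows(rows)

# Read the file back to confirm the rows survived the round trip
with open(out_path, newline='', encoding='utf-8') as f:
    written = list(csv.reader(f))

print(written[0])
```

Passing encoding='utf-8' explicitly is a choice made for this sketch, so the output does not depend on the platform's default encoding.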

What happened?

b mode causes the open builtin function to open the file in binary mode, and is suitable for opening non-text files. The Python 2 docs on the open function state that some systems don't treat text and binary files differently, and that appending b to the modes is good for documentation purposes. Since all of the docs use rb and wb for csv manipulation in the Python 2 docs, it made sense to past-me to include the b mode in my csv writer.

However, in Python 3 a file opened in binary mode expects and returns raw bytes, not str. When I ended up digging into the actual definition of CSVWriter.writerow I found this:

    def writerow(self, row):
        if sys.version_info[0] < 3:
            r = []
            for item in row:
                if isinstance(item, text_type):
                    item = item.encode('utf-8')
                r.append(item)
            row = r
        self.writer.writerow(row)

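On Python 3 that if branch never runs, so nothing gets encoded to bytes. The csv writer builds each row as a str (bytestrings like b'Author' are simply stringified with str()) and hands that str to the file's write method, even if I've declared my items as bytestrings before passing them in! So when the file expected bytes objects it was getting the wrong type, no matter what I was giving the writer.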
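The mismatch can be reproduced without touching the filesystem. The sketch below assumes Python 3 and swaps in io.BytesIO and io.StringIO as stand-ins for files opened in 'wb' and 'w' mode; it illustrates the behavior and is not code from the converter itself.

```python
import csv
import io

header_keys = [b'Author', b'ISBN-13', b'Title']

# A binary buffer behaves like a file opened with 'wb'
binary_file = io.BytesIO()
try:
    csv.writer(binary_file).writerow(header_keys)
    error_message = None
except TypeError as exc:
    # The writer handed a str to a buffer that only accepts bytes
    error_message = str(exc)
print(error_message)

# A text buffer behaves like a file opened with 'w'
text_file = io.StringIO()
csv.writer(text_file).writerow(header_keys)
print(repr(text_file.getvalue()))
```

The first call fails with a TypeError whatever the row contains, because the writer always passes a str to the buffer. The second call succeeds, but the output contains the literal text b'Author', which shows the bytestrings were stringified rather than written as bytes.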

What's the point?

String interactions can get weird in Python 3 if you're used to Python 2's laissez-faire attitude, and builtin functions don't always handle inputs the way you expect them to. This is an example of the exact thing you should be looking out for (and writing regression tests for!) when porting your project over to Python 3.
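As one possible shape for such a regression test, here is a minimal sketch using unittest, assuming Python 3. The write_rows helper and the sample data are hypothetical and not part of the converter above; the point is only that a round trip through csv.writer and csv.reader would catch this kind of file mode mismatch.

```python
import csv
import os
import tempfile
import unittest


def write_rows(path, header, rows):
    """Hypothetical helper: write a header and rows to a csv file in text mode."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


class WriteRowsRoundTripTest(unittest.TestCase):
    def test_round_trip_preserves_rows(self):
        header = ['Author', 'ISBN-13', 'Title']
        # Include a comma and a non-ASCII character to exercise quoting and encoding
        rows = [['Author One,Author Two', '0000000000001', 'Caf\u00e9 Title']]
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, 'libib_export.csv')
            write_rows(path, header, rows)
            with open(path, newline='', encoding='utf-8') as f:
                read_back = list(csv.reader(f))
        self.assertEqual(read_back, [header] + rows)


# Run the test case directly so the sketch works as a plain script
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WriteRowsRoundTripTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

If write_rows opened the file with 'wb' instead, this test would fail under Python 3 with the same TypeError shown earlier.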

Top comments (2)

Raunak Ramakrishnan • Nov 8 '18

Did you try 2to3 on the Python 2 file? I'm interested in knowing what output it gave. I have heard quite a few problems are identified on a single run of the script.

jess unrein • Nov 8 '18

2to3 just identified the issues with print statements. Technically, opening a file in binary mode isn't a bug or unexpected behavior, based on the way I wrote the Python 2 file. But the way that binary mode works in 3, and the way that the Python 3 csv writer writes str under the hood, created an unexpected interaction that broke forwards compatibility.
