Let's take a look at how to convert a string column whose values are formatted like dictionaries into separate pandas dataframe columns.
Read In Data
We're going to read in a csv and display the column that holds our data. Pandas will display this column as an object, and we can't access the values in the dictionary because each row element is actually a string.
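Here is a minimal sketch of this step. The csv content, the `id` column, and the `attributes` column name are all made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical csv content; in practice you would pass a file path to read_csv.
csv_text = """id,attributes
1,"{'color': 'red', 'size': 3}"
2,"{'color': 'blue', 'size': 5}"
"""

df = pd.read_csv(io.StringIO(csv_text))

# Each element looks like a dictionary but is really a plain string,
# so key lookups such as value["color"] will not work yet.
first_value = df["attributes"].iloc[0]
print(type(first_value))
print(first_value)
```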
Using AST To Evaluate Strings
Let's use Python's ast library to evaluate each string as a Python literal. This turns our values into real dictionaries. We'll use the pandas .apply() method to run this on each element in the column.
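A sketch of what this can look like, assuming the function in question is `ast.literal_eval` from the standard library (the post doesn't name it, so treat that as my guess). The sample dataframe and its `attributes` column are again hypothetical.

```python
import ast

import pandas as pd

# Hypothetical data: the "attributes" column holds dictionary-like strings.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "attributes": [
            "{'color': 'red', 'size': 3}",
            "{'color': 'blue', 'size': 5}",
        ],
    }
)

# literal_eval only parses Python literals (dicts, lists, strings, numbers...),
# so unlike eval() it will not execute arbitrary code found in the string.
df["attributes"] = df["attributes"].apply(ast.literal_eval)

print(type(df["attributes"].iloc[0]))
print(df["attributes"].iloc[0]["color"])
```

If some rows are empty or malformed, `ast.literal_eval` will raise an error, so you may want to clean or filter those rows first.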
Normalize Into New Columns
Using the pandas function json_normalize, we can convert our dictionary values into columns in a new dataframe, which we will merge into the original later. You don't have to assign the json_normalize output to a new dataframe; I just like how it comes out.
We now have a new dataframe with the columns being the key and value pairs from our dictionary.
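As a rough sketch, assuming the column already holds real dictionaries (the sample data and the names `df` and `attrs` are hypothetical):

```python
import pandas as pd

# Hypothetical data: "attributes" already contains dictionaries, not strings.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "attributes": [
            {"color": "red", "size": 3},
            {"color": "blue", "size": 5},
        ],
    }
)

# json_normalize takes a list of dictionaries and returns a dataframe
# with one column per key and one row per dictionary.
attrs = pd.json_normalize(df["attributes"].tolist())

print(attrs)
```

The new dataframe gets a fresh 0..n-1 index, which matters when lining it back up with the original rows.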
Combining Data
Once you have your new columns, you can either assign them back as columns in the original dataframe or merge them into a larger dataframe using pd.merge().
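Both options can be sketched like this. I'm assuming the original dataframe has a default 0..n-1 index so its rows line up with the json_normalize output; the sample data and the names `assigned` and `merged` are hypothetical.

```python
import pandas as pd

# Hypothetical data with a default RangeIndex.
df = pd.DataFrame(
    {
        "id": [10, 20],
        "attributes": [
            {"color": "red", "size": 3},
            {"color": "blue", "size": 5},
        ],
    }
)
attrs = pd.json_normalize(df["attributes"].tolist())

# Option 1: assign the new columns back onto a copy of the original dataframe.
assigned = df.copy()
for col in attrs.columns:
    assigned[col] = attrs[col]

# Option 2: merge on the index, dropping the original dictionary column.
merged = pd.merge(
    df.drop(columns="attributes"),
    attrs,
    left_index=True,
    right_index=True,
)

print(merged)
```

If your dataframe's index is not 0..n-1 (for example after filtering rows), calling `df.reset_index(drop=True)` before normalizing is one way to keep the rows aligned.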