Pandas Change Column Type – Definitive Guide

Vikram Aruchamy
Building things on Cloud and Writing how to do it :)
Originally published at stackvidhya.com on ・15 min read

This article was originally published on the Stack Vidhya blog as Pandas Change Column Type.

A pandas dataframe is a two-dimensional data structure used to store and manipulate data for analysis tasks.

After creating a dataframe, you may need to change the type of one or more columns, for example to convert a column to a numeric format that can be used for modelling and classification.

In this tutorial, you’ll learn how to change the column type of a pandas dataframe using

  • pandas astype()
  • pandas to_numeric()

If You’re in a Hurry…

You can use the below code snippet to change column type of the pandas dataframe using the astype() method.

df = df.astype({"Column_name": str}, errors='raise') 

df.dtypes

Where,

  • df.astype() – Invokes the astype() method on the dataframe.
  • {"Column_name": str} – Dictionary mapping each column to its target type. Column_name is the column to be cast, and str is the datatype its values should be converted to. You can use any of the builtin datatypes of Python or the datatypes available in NumPy.
  • errors='raise' – Specifies how exceptions are handled during conversion. raise raises the error; ignore suppresses it and returns the original, unconverted data.

This is how you can convert datatypes of columns in dataframe.

If You Want to Understand Details, Read on…

In this detailed tutorial, you’ll learn how to change the column type in a pandas dataframe using the methods that pandas itself provides.

You’ll also see examples of different types of conversion.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

  • import pandas as pd to use functionalities provided by pandas
  • import numpy as np to use functionalities provided by numpy. You’ll specifically use the datatype int64 from np as int64 is not available in python by default.

Snippet

import pandas as pd
import numpy as np

# Creating a Dictionary
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
        "Unit_Price":[500,200, 5000, 10000, 250.50],
        "No_Of_Units":[5,5, 10, 20, 8],
        "Available_Quantity":[5,10,11,15, "Not Available"],
        "Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','01/05/2021']
       }

# Creating a dataframe from the dictionary
df = pd.DataFrame(data)

# Printing the datatype of the columns
df.dtypes

You can check the datatype of each column by using df.dtypes, which prints the type of every column.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

The dataframe consists of types object, float64 and int64.

Note: String types are displayed as object.
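To make the note above concrete, here is a minimal sketch. The series name names is illustrative and not part of the sample dataframe, and dtype=object is set explicitly to mirror the output shown in this tutorial. It shows that an object column holds ordinary Python str values, and that pandas also offers a dedicated "string" dtype if you want the column type to say so explicitly.

```python
import pandas as pd

# Illustrative series; dtype=object mirrors what df.dtypes shows in this tutorial
names = pd.Series(["Keyboard", "Mouse", "Monitor"], dtype=object)

print(names.dtype)          # object
print(type(names.iloc[0]))  # <class 'str'>

# Optional: pandas' dedicated string dtype (available since pandas 1.0)
names_string = names.astype("string")
print(names_string.dtype)
```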

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

You now have the sample dataframe with columns of different datatypes.

Next, you’ll see how columns of different types can be cast to another format.

Pandas Change Column Type To String

In this section, you’ll learn how to change column type to String.

You can do this by using the astype() method and passing str as the target datatype.

In the sample dataframe, the column Unit_Price is float64. When the below line is executed, Unit_Price column will be converted to String format.

Snippet

df = df.astype({"Unit_Price": str})

df.dtypes

Where,

  • df.astype – Method to convert to another datatype
  • {"Unit_Price": str} – Unit_Price is the column name and str is the target datatype.

df.dtypes prints the type of each column.

Datatypes of Columns

    product_name object
    Unit_Price object
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

Before conversion, the column Unit_Price was float64.

Now you can see the Unit_Price is converted to String, and it is displayed as object type.

Refer to this link to understand why String is displayed as object.

You’ve learnt how to cast a column type to String.

Next, you’ll see how to convert column type to int.

Pandas Change Column Type To Int

In this section, you’ll learn how to change column type to int.

You can convert a column to int using the to_numeric() method or astype() method.

Let’s look at both methods in detail.

Using to_numeric()

to_numeric() method will convert a column to int or float based on the values available in the column.

  • If the column contains only whole numbers, to_numeric() will convert it to int64.
  • If the column contains numbers with decimal points, to_numeric() will convert it to float64.
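The two rules above can be sketched on small standalone series. The names whole and decimal are illustrative and not part of the sample dataframe.

```python
import pandas as pd

whole = pd.Series(["5", "10", "20"])    # strings holding whole numbers
decimal = pd.Series(["500", "250.50"])  # at least one value with a decimal point

whole_num = pd.to_numeric(whole)
decimal_num = pd.to_numeric(decimal)

print(whole_num.dtype)    # int64
print(decimal_num.dtype)  # float64
```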

Example: The Unit_Price column in the sample dataframe contains decimal numbers and the No_Of_Units column contains only whole numbers.

Hence the to_numeric() method will convert the Unit_Price column to float64 and the No_Of_Units column to int64.

# convert column "Unit_Price" of a DataFrame

df["Unit_Price"] = pd.to_numeric(df["Unit_Price"])

df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])

df.dtypes

Datatypes after converting it using the to_numeric() method.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

Now, you’ll see how to handle exceptions while using to_numeric() method.

Error Handling in to_numeric

Error handling is good programming practice, because any operation in a program can fail.

While converting a column to int, errors may occur because the column can contain non-numeric values. In that case, the conversion cannot take place, so you need to specify how to handle the errors that occur during conversion.

You can use the additional optional parameter errors to specify how the errors should be handled.

errors='raise' will raise the error.

For example, the Available_Quantity column in the sample dataframe contains the String value Not Available in one of its cells. It cannot be converted to a number, so the conversion will raise an error.

Snippet

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='raise')

The error is raised as ValueError: Unable to parse string "Not Available", as follows.

Error Output

    ---------------------------------------------------------------------------

    ValueError Traceback (most recent call last)

    pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()

    ValueError: Unable to parse string "Not Available"

    During handling of the above exception, another exception occurred:

    pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()

    ValueError: Unable to parse string "Not Available" at position 4

This is how you can raise the error and stop the conversion if there is any problem during conversion.
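When a conversion fails like this, it can help to see which cells are responsible. The following is a minimal sketch, assuming a standalone series named quantity with the same values as the Available_Quantity column; bad_rows is an illustrative name.

```python
import pandas as pd

quantity = pd.Series([5, 10, 11, 15, "Not Available"])

# Values that are present but cannot be parsed become NaN under errors="coerce"
parsed = pd.to_numeric(quantity, errors="coerce")

# Keep only the original cells that failed to parse
bad_rows = quantity[parsed.isna() & quantity.notna()]

print(bad_rows)
```

Here bad_rows contains only the row at position 4, which matches the position reported in the error message above.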

Next, you’ll see how to ignore the errors.

Ignoring the errors

You can ignore the errors that occur during conversion by using errors='ignore'. Note that pandas 2.2 and later deprecate errors='ignore' in to_numeric() and recommend catching the exception explicitly instead.

For example, when you convert the Available_Quantity column, which contains a String value, to int, errors will occur.

When errors='ignore' is used, the conversion stops silently without raising any errors, and the original column is returned intact.

Snippet

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='ignore')

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

This is how you can ignore the errors while converting.
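The same "convert if possible, otherwise leave the column alone" behaviour can also be written with an explicit try/except, which may be useful on pandas versions where errors='ignore' emits a deprecation warning. This is only a sketch; to_numeric_or_original is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def to_numeric_or_original(series):
    """Hypothetical helper: return the numeric version of `series`,
    or the unchanged series if any value cannot be parsed."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series

quantity = pd.Series([5, 10, 11, 15, "Not Available"])
units = pd.Series(["5", "5", "10", "20", "8"])

quantity_out = to_numeric_or_original(quantity)  # conversion fails, original returned
units_out = to_numeric_or_original(units)        # conversion succeeds

print(quantity_out.dtype)  # object
print(units_out.dtype)     # int64
```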

Coercing the Error

To coerce means to force someone to do something they are unwilling to do. Similarly, in this context, you’ll force the to_numeric() method to convert the column even though it has some invalid values.

It’ll convert the values that can be converted and replace the invalid values with NaN.

Snippet

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='coerce')

df.dtypes

You can see that the Available_Quantity column is converted to float64. The String value in the column is converted to NaN, which stands for Not a Number.

You can see that in the dataframe printed below.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity float64
    Available_Since_Date object
    dtype: object

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                5.0            11/5/2021
1        Mouse       200.0            5               10.0            4/23/2021
2      Monitor      5000.0           10               11.0           08/21/2021
3          CPU     10000.0           20               15.0           09/18/2021
4     Speakers       250.5            8                NaN           01/05/2021

This is how you can use the to_numeric() to convert the column to any of the number types.
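Coercion produces float64 because NaN is a floating-point value. If you would rather keep whole numbers as integers, one option (not covered in the original tutorial, so treat it as a suggestion) is pandas' nullable Int64 dtype, written with a capital I. The sketch below assumes that all the valid values are whole numbers.

```python
import pandas as pd

quantity = pd.Series([5, 10, 11, 15, "Not Available"])

as_float = pd.to_numeric(quantity, errors="coerce")  # float64, invalid cell becomes NaN
as_nullable_int = as_float.astype("Int64")           # nullable integer, NaN becomes <NA>

print(as_float.dtype)         # float64
print(as_nullable_int.dtype)  # Int64
```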

Next, you’ll learn about the astype() method.

Using astype()

The astype() method converts a column to the type specified in the method parameter.

You can convert a column to int by specifying int in the method parameter as shown below.

Snippet

df = df.astype({"No_Of_Units": int})

df.dtypes

Where,

  • df.astype() – Invokes the astype() method on the dataframe.
  • {"No_Of_Units": int} – Dictionary mapping each column to its target type. No_Of_Units is the column to be cast, and int is the datatype its values should be converted to. In the output below, the column is converted to int32; the width that int maps to depends on the platform and NumPy version.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int32
    Available_Quantity object
    Available_Since_Date object
    dtype: object

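Note: in the output above, astype(int) produced int32, whereas to_numeric() produces int64. The width that plain int maps to is platform-dependent: it is typically 32-bit on Windows with NumPy versions before 2.0 and 64-bit elsewhere.

astype() is useful, but keep this in mind: use np.int64 if you want to be sure the column is converted to a 64-bit integer.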
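A short sketch of the point above, using an illustrative standalone series named units. Printing np.dtype(int) shows what plain int maps to on your machine, while naming the width explicitly gives the same result everywhere.

```python
import numpy as np
import pandas as pd

units = pd.Series(["5", "5", "10", "20", "8"])

# Width of the plain `int` alias on this machine (int32 or int64)
print(np.dtype(int))

# Naming the width explicitly removes the platform dependence
units_64 = units.astype(np.int64)
units_32 = units.astype("int32")

print(units_64.dtype)  # int64
print(units_32.dtype)  # int32
```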

Now, let’s see how to handle error during astype() conversion.

Error Handling in astype()

As said before, errors are a part of programming. You need to specify how they should be handled when they occur.

You can do this by using the optional parameter errors.

errors='raise' will raise the error.

For example, the Available_Quantity column in the sample dataframe contains the String value Not Available in one of its cells. It cannot be converted to a number, so the conversion will raise an error.

Snippet

df = df.astype({"Available_Quantity": float}, errors='raise')

df.dtypes

Error will be raised as below.

Error Output

    ---------------------------------------------------------------------------

    ValueError Traceback (most recent call last)

    <ipython-input-13-616dd5b910d4> in <module>
    ----> 1 df = df.astype({"Available_Quantity": float},errors='raise')
          2 
          3 df.dtypes

    ValueError: could not convert string to float: 'Not Available'

The error was raised during conversion.

Next, you’ll see how to ignore the errors.

Ignoring the errors

You can ignore the errors that occur during conversion by using errors='ignore'.

For example, when you convert the Available_Quantity column, which contains a String value, to float, errors will occur.

When errors='ignore' is used, the conversion stops silently without raising any errors, and you’ll have the original dataframe intact.

df = df.astype({"Available_Quantity": float}, errors='ignore')

df.dtypes

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int32
    Available_Quantity object
    Available_Since_Date object
    dtype: object

You can see that the Available_Quantity column is still of type object, which means it was not converted, but no error was raised either.

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

This is how you can ignore the errors during conversion.

Note:

astype() does not coerce, that is, it does not convert only the applicable values. It either converts the whole column or, with errors='ignore', returns the original values. Hence, you cannot use errors='coerce' with the astype() method.
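If you want coercion but start from astype(), one possible pattern is to attempt the strict conversion first and fall back to to_numeric() with errors='coerce' when it fails. This is a sketch of that idea on an illustrative dataframe named df_demo, not a built-in feature of astype().

```python
import pandas as pd

df_demo = pd.DataFrame({"Available_Quantity": [5, 10, 11, 15, "Not Available"]})

try:
    # Strict conversion: fails because of the "Not Available" cell
    df_demo = df_demo.astype({"Available_Quantity": float})
except ValueError:
    # Fallback: coerce unparseable cells to NaN with to_numeric()
    df_demo["Available_Quantity"] = pd.to_numeric(
        df_demo["Available_Quantity"], errors="coerce"
    )

print(df_demo.dtypes)
```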

You’ve learnt how to cast column type to int.

Next, you’ll see how to convert object to int64.

Pandas Change Column Type From Object to Int64

In this section, you’ll learn how to change column type from object to int64.

You can do it by using the to_numeric() method as shown below. It automatically converts numbers to int64 by default.

Snippet

df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])

df.dtypes

You’re converting the No_Of_Units column to int. You can see that it is converted to int64.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

Now, let’s see the default behavior of the astype() method and how it can be used to convert object to int64.

If you just specify int in astype(), the resulting width is platform-dependent; in the output below, the column is converted to int32.

Snippet

df = df.astype({"No_Of_Units": int})

df.dtypes

You’re converting the No_Of_Units column to int. Here it is converted to int32.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int32
    Available_Quantity object
    Available_Since_Date object
    dtype: object

Now, you’ll convert object to int64 using astype().

You can pass np.int64 as the type to convert the column to int64.

Snippet

df = df.astype({"No_Of_Units": np.int64})

df.dtypes

You’re converting the No_Of_Units column to np.int64. You can see that it is converted to int64.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

This is how you can use to_numeric() and astype() to cast a column type from object to int64.

Next, you’ll see how to convert column type from int to string.

Pandas Change Column Type From Int To String

In this section, you’ll learn how to change column type from Int to String.

You can use the astype() method to convert an int column to String.

In the sample dataframe, the column No_Of_Units is of number type. Now you’ll convert it to string.

Snippet

df = df.astype({"No_Of_Units": str}, errors='raise')

df.dtypes

Where,

  • df.astype() – Invokes the astype() method on the dataframe.
  • {"No_Of_Units": str} – Dictionary mapping each column to its target type. No_Of_Units is the column to be cast, and str is the datatype its values should be converted to.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units object
    Available_Quantity object
    Available_Since_Date object
    dtype: object

Now you can see the No_Of_Units is converted to String, and it is displayed as object type.

Note: Refer to this link to understand why String is displayed as object.

This is how you can cast an int column to String or Object.

Next, you’ll see how to convert column type to float.

Pandas Change Column Type To Float

In this section, you’ll learn how to change column type to float.

You can use the astype() method to convert a column to float.

In the sample dataframe, the column Unit_Price has numbers with decimal values but column type is String format. Now you’ll convert it to float.

df = df.astype({"Unit_Price": float})

df.dtypes

Where,

  • df.astype() – Invokes the astype() method on the dataframe.
  • {"Unit_Price": float} – Dictionary mapping each column to its target type. Unit_Price is the column to be cast, and float is the datatype its values should be converted to.

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

You can see that the Unit_Price column is converted into float64.

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

You’ve converted a column which has only numbers to float.

Now, let’s try to convert the column Available_Quantity to float. It has non-numeric characters in one of its cells: the value Not Available.

Note that you’re using errors='coerce', which forces the conversion of the values that can be converted.

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='coerce')

df.dtypes

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity float64
    Available_Since_Date object
    dtype: object

The column is converted to float64 without any problems. The non-numeric value is converted to NaN, which stands for Not a Number.

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                5.0            11/5/2021
1        Mouse       200.0            5               10.0            4/23/2021
2      Monitor      5000.0           10               11.0           08/21/2021
3          CPU     10000.0           20               15.0           09/18/2021
4     Speakers       250.5            8                NaN           01/05/2021

This is how you can cast column type to float.

Next, you’ll learn how to cast column type to Datetime.

Pandas Change Column Type To Datetime64

In this section, you’ll learn how to change column type to Datetime64.

You can use the method to_datetime() to convert a string to datetime.

In the sample dataframe, the column Available_Since_Date holds date values as String type.

You’ll convert the column type to datetime using the below snippet.

Snippet

df['Available_Since_Date']= pd.to_datetime(df['Available_Since_Date'])

df.dtypes

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date datetime64[ns]
    dtype: object

You can see that the Available_Since_Date column is converted into datetime64[ns].

to_datetime() also supports error handling, where

  • errors='raise' raises an error if any cell contains an invalid date value.
  • errors='ignore' silently ignores invalid date values and returns the column intact. Note that pandas 2.2 and later deprecate this option.
  • errors='coerce' converts the valid dates to datetime type and sets the other cells to NaT.
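As a small illustration of errors='coerce' with dates, the sketch below uses a standalone series named dates with one deliberately invalid entry. The explicit format string "%m/%d/%Y" is an assumption that the sample dates are month/day/year, which is consistent with values such as 4/23/2021.

```python
import pandas as pd

dates = pd.Series(["11/5/2021", "4/23/2021", "not a date"])

# format is given explicitly as month/day/year; invalid entries become NaT
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

print(parsed)
```

Stating the format explicitly avoids relying on pandas to guess whether a value such as 11/5/2021 means November 5 or May 11.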

This is how you can convert column type to datetime.

Next, you’ll see how to convert multiple columns to int.

Pandas Convert Multiple Columns to Int

In this section, you’ll learn how to convert multiple columns to int using the astype() method.

It’s similar to how you converted a single column to int using astype(). You just select the additional columns as shown below.

df[['column_1','column_2']] = df[['column_1','column_2']].astype(np.int64)

df.dtypes

The column_1 and column_2 columns will be converted to int using astype().

In the example below, only one column is converted, as the sample dataframe has only one integer column.

df[['No_Of_Units']] = df[['No_Of_Units']].astype(np.int64)

df.dtypes

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

You can see that the No_Of_Units column is converted into int64.
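When the columns need different target types, astype() also accepts a dictionary, as in the quick-start snippet at the top of this tutorial. The sketch below uses an illustrative dataframe named df_demo whose numeric columns are stored as strings.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "No_Of_Units": ["5", "5", "10"],
    "Unit_Price": ["500", "200", "250.50"],
    "product_name": ["Keyboard", "Mouse", "Speakers"],
})

# One call, several columns, each with its own target type;
# columns not named in the dictionary are left unchanged
df_demo = df_demo.astype({"No_Of_Units": "int64", "Unit_Price": "float64"})

print(df_demo.dtypes)
```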

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

Next, let’s convert multiple columns using the to_numeric() method.

You have to use the apply() method to apply the to_numeric() function to the specified columns as shown below.

df[['column_1','column_2']] = df[['column_1','column_2']].apply(pd.to_numeric)

df.dtypes

In the example below, only one column is converted, as the sample dataframe has only one integer column.

df[["No_Of_Units"]] = df[["No_Of_Units"]].apply(pd.to_numeric)

df.dtypes

Datatypes of Columns

    product_name object
    Unit_Price float64
    No_Of_Units int64
    Available_Quantity object
    Available_Since_Date object
    dtype: object

You can see that the No_Of_Units column is converted into int64.
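Keyword arguments passed to apply() are forwarded to the applied function, so the errors parameter discussed earlier can be used for several columns at once. A minimal sketch on an illustrative dataframe named df_demo:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "No_Of_Units": ["5", "5", "10"],
    "Available_Quantity": [5, 10, "Not Available"],
})

cols = ["No_Of_Units", "Available_Quantity"]

# errors="coerce" is forwarded by apply() to pd.to_numeric for each column
converted = df_demo[cols].apply(pd.to_numeric, errors="coerce")
df_demo[cols] = converted

print(converted.dtypes)
```

Here No_Of_Units becomes int64, while Available_Quantity becomes float64 because its invalid cell is replaced with NaN.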

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

This is how you can convert multiple column types to another format.

Next, you’ll see how to cast all columns to another type.

Pandas Convert All Columns

In this section, you’ll learn how to change the column type of all columns in a dataframe, for example converting all object columns to String.

You can also use the astype() method to convert all columns.

First, create a list of all columns called columns_list by using list(df).

Then select those columns from the dataframe and invoke the astype() method, passing the target datatype.

For example, pass str to convert all columns to String.

Snippet

columns_list = list(df)

df[columns_list] = df[columns_list].astype(str)

df.dtypes

Datatypes of Columns

    product_name object
    Unit_Price object
    No_Of_Units object
    Available_Quantity object
    Available_Since_Date object
    dtype: object

You can see that all the columns of the dataframe are converted to String and displayed as object.

Refer to this link to understand why String is displayed as object.
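Since every column is being converted, the column list is optional: calling astype() on the dataframe itself applies the target type to all columns. A minimal sketch on an illustrative dataframe named df_demo:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5, 5],
})

# Calling astype() on the dataframe itself converts every column
all_text = df_demo.astype(str)

print(all_text.iloc[0].tolist())  # ['Keyboard', '500.0', '5']
```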

Printing the dataframe

df

Dataframe Looks Like

  product_name  Unit_Price  No_Of_Units Available_Quantity Available_Since_Date
0     Keyboard       500.0            5                  5            11/5/2021
1        Mouse       200.0            5                 10            4/23/2021
2      Monitor      5000.0           10                 11           08/21/2021
3          CPU     10000.0           20                 15           09/18/2021
4     Speakers       250.5            8      Not Available           01/05/2021

Conclusion

To summarize, you’ve learnt how to change the column type in a pandas dataframe.

You’ve used the to_numeric() and astype() methods to change column types, and seen how to use them for various type conversions along with exception handling.

If you have any questions, comment below.
