This article was originally published on the blog Stack Vidhya as Pandas Change Column Type.
A pandas dataframe is a powerful two-dimensional data structure that you can use to store and manipulate data for your data analysis tasks.
Once you create a dataframe, you may need to change the type of a column, for example to convert a column to a numeric format that can be used for modelling and classification.
In this tutorial, you’ll learn how to change the column type of a pandas dataframe using
- pandas astype()
- pandas to_numeric()
If You’re in a Hurry…
You can use the code snippet below to change the column type of a pandas dataframe using the astype() method.
df = df.astype({"Column_name": str}, errors='raise')
df.dtypes
Where,
- df.astype() – Method that casts the dataframe to the given types.
- {"Column_name": str} – Dictionary that maps each column to its target type. Column_name is the column to be cast, and str is the target datatype to which the column values should be converted. You can use any of the built-in datatypes of Python or the datatypes available in NumPy.
- errors='raise' – Specifies how exceptions are handled during conversion. raise will raise the error, and ignore will suppress the error and return the original data unchanged.
This is how you can convert the datatypes of columns in a dataframe.
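As a minimal sketch of the snippet above, the dictionary can hold more than one entry, so several columns are cast in a single call. The dataframe demo and its column names price and units are made up for this illustration.

```python
import pandas as pd

# Hypothetical dataframe used only for this illustration
demo = pd.DataFrame({"price": [500, 200], "units": [5, 10]})

# Each dictionary key is a column name, each value is its target type
converted = demo.astype({"price": float, "units": str}, errors="raise")

print(converted.dtypes)
```

Columns that are not listed in the dictionary keep their current type.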
If You Want to Understand Details, Read on…
In this detailed tutorial, you’ll learn how to change the column type in a pandas dataframe using the different methods provided by pandas itself.
You’ll also see examples of different types of conversion.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
- import pandas as pd – to use the functionalities provided by pandas.
- import numpy as np – to use the functionalities provided by NumPy. You’ll specifically use the datatype int64 from np, as int64 is not available in Python by default.
Snippet
import pandas as pd
import numpy as np
# Creating a Dictionary
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
"Unit_Price":[500,200, 5000, 10000, 250.50],
"No_Of_Units":[5,5, 10, 20, 8],
"Available_Quantity":[5,10,11,15, "Not Available"],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','01/05/2021']
}
# Creating a dataframe from the dictionary
df = pd.DataFrame(data)
# Printing the datatype of the columns
df.dtypes
You can check the datatype of each column by using df.dtypes. It prints the type of each column.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
The dataframe consists of the types object, float64 and int64.
Note: String columns are displayed as object. Available_Quantity is also object because it mixes numbers with the text Not Available.
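The following sketch looks inside a mixed column to show why it is reported as object. It uses a small, made-up dataframe (demo, with the hypothetical columns name, price and stock) rather than the sample dataframe.

```python
import pandas as pd

# Hypothetical dataframe: "stock" mixes an integer with a text value
demo = pd.DataFrame({
    "name": ["Keyboard", "Mouse"],
    "price": [500, 250.50],
    "stock": [5, "Not Available"],
})

# A column of integers and decimals is stored as one float type
print(demo["price"].dtype)

# A mixed column keeps the original Python objects, so its dtype is object
types_in_stock = [type(value).__name__ for value in demo["stock"]]
print(demo["stock"].dtype, types_in_stock)
```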
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
You’ve created the sample dataframe with different datatypes.
Next, you’ll see how columns of different types can be cast to another format.
Pandas Change Column Type To String
In this section, you’ll learn how to change a column type to String.
You can do this by using the astype() method with str as the target datatype.
In the sample dataframe, the column Unit_Price is float64. When the line below is executed, the Unit_Price column is converted to String format.
Snippet
df = df.astype({"Unit_Price": str})
df.dtypes
Where,
- df.astype – Method to convert to another datatype.
- {"Unit_Price": str} – Unit_Price is the column name and str is the target datatype.
df.dtypes prints the types of the columns.
Datatypes of Columns
product_name object
Unit_Price object
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
Before conversion, the column Unit_Price was float64.
Now you can see that Unit_Price is converted to String, and it is displayed as the object type.
In the pandas version used here, strings are stored with the object dtype, which is why String is displayed as object.
You’ve learnt how to cast a column type to String.
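To illustrate what the cast changes, the sketch below converts a small float column to strings and then joins text onto each value, which only works on strings. The series prices and the suffix " USD" are assumptions made for this example.

```python
import pandas as pd

# Hypothetical price column
prices = pd.Series([500.0, 250.5], name="Unit_Price")

# After the cast, every value is a Python string
as_text = prices.astype(str)

# "+" now concatenates text instead of adding numbers
labels = as_text + " USD"
print(labels.tolist())
```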
Next, you’ll see how to convert column type to int.
Pandas Change Column Type To Int
In this section, you’ll learn how to change a column type to int.
You can convert a column to int using the to_numeric() method or the astype() method.
Let’s look at both methods in detail.
Using to_numeric()
The to_numeric() method converts a column to int or float based on the values available in the column.
- If the column contains only whole numbers, to_numeric() converts it to int64.
- If the column contains numbers with decimal points, to_numeric() converts it to float64.
Example: The Unit_Price column in the sample dataframe contains decimal numbers and the No_Of_Units column contains only whole numbers.
Hence the to_numeric() method converts the Unit_Price column to float64 and the No_Of_Units column to int64.
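The two rules above can be checked on small inputs. This is a sketch with made-up values, stored as text in object columns so that to_numeric() has to parse them.

```python
import pandas as pd

# Text values that are all whole numbers
whole = pd.to_numeric(pd.Series(["5", "10", "20"], dtype=object))

# Text values where at least one has a decimal point
decimal = pd.to_numeric(pd.Series(["500", "250.50"], dtype=object))

print(whole.dtype)    # integer type
print(decimal.dtype)  # floating-point type
```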
# convert column "Unit_Price" of a DataFrame
df["Unit_Price"] = pd.to_numeric(df["Unit_Price"])
df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])
df.dtypes
These are the datatypes after the conversion using the to_numeric() method.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
Now, you’ll see how to handle exceptions while using the to_numeric() method.
Error Handling in to_numeric
Handling errors is a good programming practice, because any operation in a program is prone to errors.
While converting a column to int, errors may occur because the column can contain non-numeric values. In that case, the conversion cannot take place, so you need to specify how to handle the errors that occur during conversion.
You can use the optional parameter errors to specify how the errors should be handled.
errors='raise' will raise the error.
For example, the Available_Quantity column in the sample dataframe contains the String value Not Available in one of the cells. It cannot be converted to a number. In this case, the conversion will raise the error.
Snippet
df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='raise')
The error is raised as ValueError: Unable to parse string "Not Available", as follows.
Error Output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "Not Available"
During handling of the above exception, another exception occurred:
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "Not Available" at position 4
This is how you can raise the error and stop the conversion if there is any problem during conversion.
Next, you’ll see how to ignore the errors.
Ignoring the errors
You can ignore the errors that occur during the conversion by using errors='ignore'.
For example, when you convert the Available_Quantity column, which has a String value, to int, errors will occur.
When errors='ignore' is used, the conversion is stopped silently without raising any errors. You’ll have the original dataframe intact.
Snippet
df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='ignore')
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
This is how you can ignore the errors while converting.
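Recent pandas releases (2.2 and later) mark errors='ignore' as deprecated in to_numeric() and recommend catching the exception explicitly instead. The sketch below shows one way to get the same "convert or leave as is" behavior. The helper name to_numeric_or_original and the sample values are hypothetical.

```python
import pandas as pd


def to_numeric_or_original(column):
    """Return the numeric version of column, or column itself if parsing fails."""
    try:
        return pd.to_numeric(column, errors="raise")
    except (ValueError, TypeError):
        # Same outcome as errors='ignore': the input comes back untouched
        return column


# A column with an unparseable value is returned as is
stock = pd.Series([5, 10, "Not Available"])
unchanged = to_numeric_or_original(stock)

# A column with only numeric text is converted
units = pd.Series(["5", "10"], dtype=object)
parsed = to_numeric_or_original(units)

print(unchanged.dtype, parsed.dtype)
```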
Coercing the Error
To coerce means to force someone to do something they are unwilling to do. Similarly, in this context, you’ll force the to_numeric() method to convert the column even though it has some invalid values.
It converts the cell values that can be parsed and sets the invalid values to NaN.
Snippet
df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='coerce')
df.dtypes
You can see that the Available_Quantity column is converted to float64. The String value in the column is converted to NaN, which denotes Not a Number.
You can see that in the dataframe printed below.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity float64
Available_Since_Date object
dtype: object
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5.0 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10.0 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11.0 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15.0 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | NaN | 01/05/2021 |
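Because coercion replaces bad values silently, it can help to check which values were lost. The sketch below compares the column before and after conversion. The series stock and its values are assumptions for this example.

```python
import pandas as pd

# Hypothetical column: one unparseable text value and one genuinely missing value
stock = pd.Series([5, 10, "Not Available", None])

as_numbers = pd.to_numeric(stock, errors="coerce")

# Values that are NaN after conversion but were present before it
# are the ones that coercion could not parse
failed = stock[as_numbers.isna() & stock.notna()]

print(as_numbers.dtype)
print(failed)
```

The value that was already missing is not reported, only the text that failed to parse.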
This is how you can use to_numeric() to convert a column to one of the number types.
Next, you’ll learn about the astype() method.
Using astype()
The astype() method converts a column to any type specified in the method parameter.
You can convert a column to int by specifying int in the method parameter as shown below.
Snippet
df = df.astype({"No_Of_Units": int})
df.dtypes
Where,
- df.astype() – Method that casts the dataframe to the given types.
- {"No_Of_Units": int} – Dictionary that maps each column to its target type. No_Of_Units is the column to be cast, and int is the target datatype. In the output below, the column is converted to int32. The width of int depends on the platform, so on other systems the same call can produce int64.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int32
Available_Quantity object
Available_Since_Date object
dtype: object
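Note: In the output above, astype() with int converts the column to int32, whereas to_numeric() converts it to int64. Python’s int maps to the platform’s default integer type, which is int32 on Windows with NumPy versions before 2.0 and int64 on most other platforms.
If you want a 64-bit integer regardless of platform, use np.int64.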
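The sketch below contrasts an explicit integer width with the plain int target. The series units is made up for this example, and no output is shown for the plain int case because it differs between platforms.

```python
import numpy as np
import pandas as pd

# Hypothetical integer column
units = pd.Series([5, 10, 20])

# Explicit widths give the same dtype on every platform
explicit_64 = units.astype(np.int64)
explicit_32 = units.astype("int32")

# Plain int follows the platform default, so it may be int32 or int64
platform_default = units.astype(int)

print(explicit_64.dtype, explicit_32.dtype, platform_default.dtype)
```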
Now, let’s see how to handle errors during astype() conversion.
Error Handling in astype()
As said before, errors are a part of programming. You need to specify how they are handled when they occur.
You can do this by using the optional parameter errors.
errors='raise' will raise the error.
For example, the Available_Quantity column in the sample dataframe contains the String value Not Available in one of the cells. It cannot be converted to a number. In this case, the conversion will raise the error.
Snippet
df = df.astype({"Available_Quantity": float}, errors='raise')
df.dtypes
Error will be raised as below.
Error Output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-616dd5b910d4> in <module>
----> 1 df = df.astype({"Available_Quantity": float},errors='raise')
2
3 df.dtypes
ValueError: could not convert string to float: 'Not Available'
You’ve raised the error during conversion.
Next, you’ll see how to ignore the errors.
Ignoring the errors
You can ignore the errors that occur during the conversion by using errors='ignore'.
For example, when you convert the Available_Quantity column, which has a String value, to float, errors will occur.
When errors='ignore' is used, the conversion is stopped silently without raising any errors. You’ll have the original dataframe intact.
df = df.astype({"Available_Quantity": float}, errors='ignore')
df.dtypes
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int32
Available_Quantity object
Available_Since_Date object
dtype: object
You can see that the Available_Quantity column is still of type object, which means it is not converted, and no error is raised either.
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
This is how you can ignore the errors during conversion.
Note: astype() doesn’t support errors='coerce'. It either converts the whole column or, with errors='ignore', returns the original values. Hence, you can’t use it to convert only the valid values in a column.
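One possible workaround, sketched below, is to let to_numeric() do the coercion first and then use astype() for the final integer cast. Replacing the missing values with 0 is an assumption made for this example, and whether that is acceptable depends on your data.

```python
import pandas as pd

# Hypothetical column with one unparseable value
stock = pd.Series([5, 10, "Not Available"])

# Step 1: coerce, which gives float64 with NaN for the bad value
numeric = pd.to_numeric(stock, errors="coerce")

# Step 2: fill the gaps (0 is an assumed placeholder), then cast to integer
as_int = numeric.fillna(0).astype("int64")

print(as_int.tolist())
```

The fill step is needed because a plain integer column cannot hold NaN.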
You’ve learnt how to cast a column type to int.
Next, you’ll see how to convert object to int64.
Pandas Change Column Type From Object to Int64
In this section, you’ll learn how to change a column type from object to int64.
You can do it by using the to_numeric() method as shown below. It converts whole numbers to int64 by default.
Snippet
df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])
df.dtypes
You’re converting the No_Of_Units column to int. You can see that it is converted to int64.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
Now, let’s see the default behavior of the astype() method and how it can be used to convert object to int64.
If you just specify int in astype(), the column is converted to the platform’s default integer type, which is int32 in the output below.
Snippet
df = df.astype({"No_Of_Units": int})
df.dtypes
You’re converting the No_Of_Units column to int. You can see that it is converted to int32 here.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int32
Available_Quantity object
Available_Since_Date object
dtype: object
Now, you’ll convert object to int64 using astype().
You can pass np.int64 as the type to convert the column to int64.
Snippet
df = df.astype({"No_Of_Units": np.int64})
df.dtypes
You’re converting the No_Of_Units column to np.int64. You can see that it is converted to int64.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
This is how you can use to_numeric() and astype() to cast a column type from object to int64.
Next, you’ll see how to convert a column type from int to string.
Pandas Change Column Type From Int To String
In this section, you’ll learn how to change a column type from Int to String.
You can use the astype() method to convert an int column to String.
In the sample dataframe, the column No_Of_Units is of a number type. Now you’ll convert it to string.
Snippet
df = df.astype({"No_Of_Units": str}, errors='raise')
df.dtypes
Where,
- df.astype() – Method that casts the dataframe to the given types.
- {"No_Of_Units": str} – Dictionary that maps each column to its target type. No_Of_Units is the column to be cast, and str is the target datatype to which the column values should be converted.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units object
Available_Quantity object
Available_Since_Date object
dtype: object
Now you can see that No_Of_Units is converted to String, and it is displayed as the object type.
Note: In the pandas version used here, strings are stored with the object dtype, which is why String is displayed as object.
This is how you can cast an int column to String or Object.
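A common reason to cast integers to strings is to format them as fixed-width codes. The sketch below pads a made-up units column with leading zeros. The width of 4 is an arbitrary choice for this example.

```python
import pandas as pd

# Hypothetical integer column
units = pd.Series([5, 10, 120])

# Cast to string first, then use a string method to pad with zeros
codes = units.astype(str).str.zfill(4)

print(codes.tolist())
```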
Next, you’ll see how to convert column type to float.
Pandas Change Column Type To Float
In this section, you’ll learn how to change a column type to float.
You can use the astype() method to convert a column to float.
Suppose the column Unit_Price holds numbers with decimal values, but its column type is String, as it was after the string conversion shown earlier. Now you’ll convert it to float.
df = df.astype({"Unit_Price": float})
df.dtypes
Where,
- df.astype() – Method that casts the dataframe to the given types.
- {"Unit_Price": float} – Dictionary that maps each column to its target type. Unit_Price is the column to be cast, and float is the target datatype to which the column values should be converted.
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
You can see that the Unit_Price column is converted into float64.
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
You’ve converted a column which has only numbers to float.
Now, let’s try to convert the column Available_Quantity, which has non-numeric characters in one of the cells, to float. The non-numeric value is Not Available.
Note that you’re using to_numeric() with errors='coerce', which converts the values that can be parsed and sets the rest to NaN.
df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='coerce')
df.dtypes
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity float64
Available_Since_Date object
dtype: object
The column is converted to float64 without any problems. The non-numeric value is converted to NaN, which means Not a Number.
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5.0 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10.0 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11.0 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15.0 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | NaN | 01/05/2021 |
This is how you can cast a column type to float.
Next, you’ll learn how to cast a column type to Datetime.
Pandas Change Column Type To Datetime64
In this section, you’ll learn how to change a column type to Datetime64.
You can use the to_datetime() method to convert a string to datetime.
In the sample dataframe, the column Available_Since_Date holds the date value as a String type.
You’ll convert the column type to datetime using the snippet below.
Snippet
df['Available_Since_Date']= pd.to_datetime(df['Available_Since_Date'])
df.dtypes
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date datetime64[ns]
dtype: object
You can see that the Available_Since_Date column is converted into datetime64[ns].
to_datetime() also supports error handling where,
- errors='raise' will raise an error if there are invalid date values in any of the cells.
- errors='ignore' will silently ignore errors if there are invalid date values in any of the cells and return the column intact.
- errors='coerce' will convert the valid dates to the datetime type and set the other cells to NaT.
This is how you can convert column type to datetime.
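The sketch below shows the coerce option on dates, together with an explicit format string so that the parsing does not depend on inference. It assumes the dates are written as month/day/year, as the sample value 4/23/2021 suggests. The series raw and its invalid entry are made up for this example.

```python
import pandas as pd

# Hypothetical date column with one invalid entry
raw = pd.Series(["11/5/2021", "4/23/2021", "not a date"], dtype=object)

# Assumed layout: month/day/year. Unparseable cells become NaT.
dates = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(dates)
```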
Next, you’ll see how to convert multiple columns to int.
Pandas Convert Multiple Columns to Int
In this section, you’ll learn how to convert multiple columns to int using the astype() method.
It’s similar to how you converted a single column to int using astype(). You just select the additional columns as shown below.
df[['column_1','column_2']] = df[['column_1','column_2']].astype(np.int64)
df.dtypes
column_1 and column_2 will be converted to int using astype().
For example, the snippet below converts only one column, because the sample dataframe has only one integer column.
df[['No_Of_Units']] = df[['No_Of_Units']].astype(np.int64)
df.dtypes
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
You can see that the column No_Of_Units is converted into int64.
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
Next, let’s convert multiple columns using the to_numeric() method.
You have to use the apply() method to apply the to_numeric() function to the specified columns as shown below.
df[['column_1','column_2']] = df[['column_1','column_2']].apply(pd.to_numeric)
df.dtypes
For example, the snippet below converts only one column, because the sample dataframe has only one integer column.
df[["No_Of_Units"]] = df[["No_Of_Units"]].apply(pd.to_numeric)
df.dtypes
Datatypes of Columns
product_name object
Unit_Price float64
No_Of_Units int64
Available_Quantity object
Available_Since_Date object
dtype: object
You can see that the column No_Of_Units is converted into int64.
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
This is how you can convert multiple column types to another format.
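As a sketch of the apply() approach with two real columns, the example below also passes errors='coerce' through apply(), which forwards extra keyword arguments to to_numeric(). The dataframe demo and its columns units and stock are made up for this illustration.

```python
import pandas as pd

# Hypothetical dataframe where both columns hold numbers stored as text
demo = pd.DataFrame(
    {"units": ["5", "10", "8"], "stock": ["5", "Not Available", "11"]},
    dtype=object,
)

cols = ["units", "stock"]

# apply() calls pd.to_numeric once per column and forwards errors="coerce"
demo[cols] = demo[cols].apply(pd.to_numeric, errors="coerce")

print(demo.dtypes)
```

Each column gets its own result type: the clean column becomes an integer column, while the column with a coerced value becomes a float column.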
Next, you’ll see how to cast all columns to another type.
Pandas Convert All Columns
In this section, you’ll learn how to change the column type of all columns in a dataframe, for example, converting all columns to String.
You can also use the astype() method to convert all columns.
First, create a list of all columns called columns_list by using list(df).
Then pass this list to the dataframe, invoke the astype() method, and pass the target datatype to it.
For example, pass str to convert all columns to string.
Snippet
columns_list = list(df)
df[columns_list] = df[columns_list].astype(str)
df.dtypes
Datatypes of Columns
product_name object
Unit_Price object
No_Of_Units object
Available_Quantity object
Available_Since_Date object
dtype: object
You can see that all the columns of the dataframe are converted to String and displayed as object.
In the pandas version used here, strings are stored with the object dtype, which is why String is displayed as object.
Printing the dataframe
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
|---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
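When every column gets the same target type, calling astype() on the whole dataframe is a shorter alternative to building a column list first. This is a sketch on a made-up one-row dataframe, not the sample dataframe.

```python
import pandas as pd

# Hypothetical dataframe with text, float and integer columns
demo = pd.DataFrame({"name": ["Keyboard"], "price": [500.0], "units": [5]})

# A single type passed to astype() is applied to every column
all_text = demo.astype(str)

print(all_text.iloc[0].tolist())
```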
Conclusion
To summarize, you’ve learnt how to change the column type in a pandas dataframe.
You’ve used the methods to_numeric() and astype() to change column types, and you’ve seen how to use these methods for various type conversions along with exception handling.
If you have any questions, comment below.