PySpark Convert String To Date Format – Check 2 Great Examples!

In this post I will show you how to convert a string to a date format in PySpark. Since Spark 2.2 this is very easy: you can use the built-in functions to_date or to_timestamp, because both support a format argument.

PySpark Convert String To Date Format

Very often when we work with Spark we need to convert data from one type to another. Converting from String to Date or from String to Timestamp is (I think) the hard case, because a date or timestamp can be written in many formats, such as yyyyMM, yyyy-MM-dd, or yyyy-MM-dd HH:mm:ss.

To convert a string to a date format in PySpark, you can use the to_date function in the pyspark.sql.functions module. This function takes a string column as input and returns a column of date type.

Here is an example of how you can use to_date to convert a string column to a date column in a PySpark DataFrame:

from pyspark.sql.functions import to_date

df = spark.createDataFrame([("2022-12-28",),("2022-12-29",)], ["date_string"])

df = df.withColumn("date", to_date(df.date_string))

df.show()

This will output the following DataFrame:

+-----------+----------+
|date_string|      date|
+-----------+----------+
| 2022-12-28|2022-12-28|
| 2022-12-29|2022-12-29|
+-----------+----------+

You can also specify the format of the input string using the format argument of to_date. For example, if the input strings are in the format “dd-MM-yyyy” (such as 28-12-2022, not the yyyy-MM-dd values used above), you can use the following code to parse them:

df = df.withColumn("date", to_date(df.date_string, "dd-MM-yyyy"))

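Note that the to_date function returns null if the input string does not match the format. This is the behavior with ANSI mode off; with spark.sql.ansi.enabled set to true, an unparseable string raises an error instead. You can use the when and otherwise functions from pyspark.sql.functions to handle such cases. For example:

from pyspark.sql.functions import to_date, when

# Build the parsed column once and reuse it in the condition and the result
parsed = to_date(df.date_string, "dd-MM-yyyy")

df = df.withColumn("date", when(parsed.isNotNull(), parsed).otherwise(None))

This will set the date column to null for any rows where the date_string column is not in the “dd-MM-yyyy” format. Because to_date already returns null for those rows, the when/otherwise wrapper gives the same result as calling to_date on its own; it becomes useful once you replace None with a fallback value.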

How To Use to_timestamp Function?

To convert a String to a Timestamp in PySpark, use the to_timestamp function:

pyspark.sql.functions.to_timestamp(col, format=None)

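Converts a Column of pyspark.sql.types.StringType or pyspark.sql.types.TimestampType into pyspark.sql.types.TimestampType using the optionally specified format. According to the Spark 2.2 documentation linked below, the default format is ‘yyyy-MM-dd HH:mm:ss’ and formats are specified according to SimpleDateFormat. Spark 3.0 and later use Spark's own datetime patterns instead.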

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.to_timestamp

Example Of: to_timestamp Method

from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_timestamp(df.t).alias('dt')).collect()

[Row(dt=datetime.datetime(2022, 2, 14, 16, 15))]

How To Use to_utc_timestamp Function?

Given a timestamp, which corresponds to a certain time of day in the given timezone, returns another timestamp that corresponds to the same time of day in UTC.

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.to_utc_timestamp

Example Of: to_utc_timestamp Method

from pyspark.sql.functions import to_utc_timestamp

df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_utc_timestamp(df.t, "PST").alias('t')).collect()

[Row(t=datetime.datetime(2022, 2, 15, 0, 15))]