In this post I will show you how to convert a String to a Date format in PySpark. Since Spark 2.2+ this is very easy: you can simply use the built-in functions to_date or to_timestamp, because both support the format argument.
PySpark Convert String To Date Format
Very often when we work with Spark we need to convert data from one type to another. Conversion from String to Date or from String to Timestamp is (I think) the hard one, due to the fact that a Date or Timestamp can be represented in various formats like yyyyMM, yyyy-MM-dd, yyyy-MM-dd HH:mm:ss etc.
To convert a string to a date format in PySpark, you can use the to_date function from the pyspark.sql.functions module. This function takes a string column as input and returns a column of date type.
Here is an example of how you can use to_date to convert a string column to a date column in a PySpark DataFrame:
from pyspark.sql.functions import to_date

df = spark.createDataFrame([("2022-12-28",), ("2022-12-29",)], ["date_string"])
df = df.withColumn("date", to_date(df.date_string))
df.show()
This will output the following DataFrame:
+-----------+----------+
|date_string|      date|
+-----------+----------+
| 2022-12-28|2022-12-28|
| 2022-12-29|2022-12-29|
+-----------+----------+
You can also specify the format of the input string using the format argument of to_date. For example, if the input string is in the format “dd-MM-yyyy”, you can use the following code to parse it:
df = df.withColumn("date", to_date(df.date_string, "dd-MM-yyyy"))
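For instance, here is a minimal sketch with made-up sample data in the dd-MM-yyyy layout:

from pyspark.sql.functions import to_date

# sample strings in day-month-year order (made-up data)
df = spark.createDataFrame([("28-12-2022",), ("29-12-2022",)], ["date_string"])
df = df.withColumn("date", to_date(df.date_string, "dd-MM-yyyy"))
df.show()
# +-----------+----------+
# |date_string|      date|
# +-----------+----------+
# | 28-12-2022|2022-12-28|
# | 29-12-2022|2022-12-29|
# +-----------+----------+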
Note that the to_date function will return null if the input string is not in a valid date format. You can use the when function from pyspark.sql.functions together with the otherwise column method to handle such cases. For example:
from pyspark.sql.functions import to_date, when

df = df.withColumn(
    "date",
    when(to_date(df.date_string, "dd-MM-yyyy").isNotNull(),
         to_date(df.date_string, "dd-MM-yyyy")).otherwise(None)
)
This will set the date column to null for any rows where the date_string column is not in the “dd-MM-yyyy” format.
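Here is a small sketch of that null handling with one deliberately malformed row (made-up data; with the default non-ANSI settings the bad row simply becomes null instead of failing the job):

from pyspark.sql.functions import to_date, when

# one valid and one malformed string (made-up data)
df = spark.createDataFrame([("28-12-2022",), ("not a date",)], ["date_string"])
df = df.withColumn(
    "date",
    when(to_date(df.date_string, "dd-MM-yyyy").isNotNull(),
         to_date(df.date_string, "dd-MM-yyyy")).otherwise(None)
)
df.show()
# +-----------+----------+
# |date_string|      date|
# +-----------+----------+
# | 28-12-2022|2022-12-28|
# | not a date|      null|
# +-----------+----------+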
How To Use to_timestamp Function?
PySpark" Convert String to Date or to Timestamp – please find the function to_timestamp which you can use to convert String to Timestamp in PySpark".
pyspark.sql.functions.to_timestamp(col, format=None)
Converts Column of pyspark.sql.types.StringType or pyspark.sql.types.TimestampType into pyspark.sql.types.DateType using the optionally specified format. Default format is ‘yyyy-MM-dd HH:mm:ss’. Specify formats according to SimpleDateFormats.
https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.to_timestamp
Example Of: to_timestamp Method
from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_timestamp(df.t).alias('dt')).collect()
[Row(dt=datetime.datetime(2022, 2, 14, 16, 15))]
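to_timestamp also accepts the format argument, so a string in a non-default layout can be parsed the same way. A sketch with a made-up dd/MM/yyyy HH:mm value:

from pyspark.sql.functions import to_timestamp

# made-up value in a non-default layout
df = spark.createDataFrame([('14/02/2022 16:15',)], ['t'])
df.select(to_timestamp(df.t, 'dd/MM/yyyy HH:mm').alias('dt')).collect()
[Row(dt=datetime.datetime(2022, 2, 14, 16, 15))]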
How To Use to_utc_timestamp Function?
Given a timestamp, which corresponds to a certain time of day in the given timezone, returns another timestamp that corresponds to the same time of day in UTC.
https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.to_utc_timestamp
Example Of: to_utc_timestamp Method
from pyspark.sql.functions import to_utc_timestamp

df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_utc_timestamp(df.t, "PST").alias('t')).collect()
[Row(t=datetime.datetime(2022, 2, 15, 0, 15))]
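Region-based zone IDs also work and, depending on your Spark version, may be preferred over three-letter abbreviations like "PST". A sketch using "Europe/Warsaw" (CET, UTC+1 in February):

from pyspark.sql.functions import to_utc_timestamp

# 16:15 in Warsaw (UTC+1 in winter) corresponds to 15:15 UTC
df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_utc_timestamp(df.t, "Europe/Warsaw").alias('t')).collect()
[Row(t=datetime.datetime(2022, 2, 14, 15, 15))]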
Could You Please Share This Post?
I appreciate It And Thank YOU! :)
Have A Nice Day!