python - polars date quarter parsing using strptime returns null - Stack Overflow

Using the documentation here (which also points to here) I would expect the following use of the Polars strptime function to produce a pl.Date value:

import polars as pl

date_format = "%Y-Q%q-%d"
df = pl.DataFrame({
    "quarter_str": ["2024-Q1-01", "2023-Q3-01", "2025-Q2-01"]
})

## another approach that does not work
#date_format = "%Y-Q%q"
#df = pl.DataFrame({
#    "quarter_str": ["2024-Q1", "2023-Q3", "2025-Q2"]
#})

result = df.with_columns(
    pl.col("quarter_str").str.strptime(pl.Date, format=date_format, strict=False).alias("parsed_date")
)

print(result)

I'm not sure whether this is a mistake on my part, a Polars issue, or an issue with the underlying Rust library, but the quarter does not seem to be parsed as expected: parsed_date comes back null. Neither of the above approaches works (see the commented-out variant). At first I thought the parser simply would not assume the first day of the quarter, but adding a day via %d didn't help either.

Is there a way in Python Polars to convert a string containing only a year and a quarter into a pl.Date value?

The expected output for the string '2023-Q3', for example, would be July 1st, 2023, since that date is the first day of the 3rd quarter. The expected output for the string '2023-Q3-27' would be July 27th, 2023 (the 27th day of the quarter). The expected output for the string '2023-Q3-45' would be the 45th day of the 3rd quarter of 2023, which falls in mid-August (August 14th).

asked Mar 25 at 15:10 by sicsmpr, edited Mar 25 at 16:19 by Rodalm
  • What is the expected output? How would you determine the month of the date knowing only the year, the quarter and the day of the month? That is probably why the parsing fails. Or did you expect %d to represent the day of the quarter instead of the day of the month? – Rodalm Commented Mar 25 at 15:26
  • The expected output for the string '2023-Q3', for example, would be July 1st, 2023, since that date is the first day of the 3rd quarter. The expected output for the string '2023-Q3-27' would be July 27th, 2023 (the 27th day of the quarter). The expected output for the string '2023-Q3-45' would be the 45th day of the 3rd quarter of 2023, sometime in mid-August. But I realize that making assumptions can be dangerous. Just passing a year and month, for example, does not assume the 1st day of the month, so why should passing a year and quarter assume the 1st day of the quarter? – sicsmpr Commented Mar 25 at 15:42
  • I was hoping for an 'easy button' but there does not seem to be one. – sicsmpr Commented Mar 25 at 15:43

1 Answer

As far as I know, there is no direct way to parse those formats using pl.Expr.str.strptime. An alternative approach using column expressions is:

  1. Extract the year, quarter and optionally the day since the start of the quarter from the quarter_str

  2. Determine the starting month of the quarter

  3. Construct the quarter start date

  4. If quarter_str contains days at the end of the string, add them to the quarter start date

import polars as pl

df = pl.DataFrame(
    {
        "quarter_str": ["2023-Q3-01", "2023-Q3-27", "2025-Q4", "2025-Q3-45"],
    }
)

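# Extract the quarter number and the four-digit year; raw strings keep \d intact.
quarter = pl.col("quarter_str").str.extract(r"Q(\d)").cast(pl.Int8)
year = pl.col("quarter_str").str.extract(r"(\d{4})").cast(pl.Int16)
# First month of the quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10.
month = (quarter - 1) * 3 + 1
quarter_start = pl.date(year, month, 1)

# A trailing "-<n>" is the 1-based day of the quarter, so the offset is n - 1 days.
has_day_of_quarter = pl.col("quarter_str").str.contains(r"\d{4}-Q\d-\d+")
days_since_quarter_start = (
    pl.when(has_day_of_quarter)
    .then(pl.col("quarter_str").str.extract(r"-(\d+)$").cast(pl.Int16) - 1)
    .otherwise(0)
)

# Shift the quarter start by the day offset (0 when no day is given).
result = df.with_columns(
    (quarter_start + pl.duration(days=days_since_quarter_start)).alias("parsed_date")
)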

Output:

>>> result

shape: (4, 2)
┌─────────────┬─────────────┐
│ quarter_str ┆ parsed_date │
│ ---         ┆ ---         │
│ str         ┆ date        │
╞═════════════╪═════════════╡
│ 2023-Q3-01  ┆ 2023-07-01  │
│ 2023-Q3-27  ┆ 2023-07-27  │
│ 2025-Q4     ┆ 2025-10-01  │
│ 2025-Q3-45  ┆ 2025-08-14  │
└─────────────┴─────────────┘
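
For cross-checking the dates in the table above, the same arithmetic can be written as a plain-Python reference function using only the standard library. This is a minimal sketch, not part of Polars: the name parse_quarter_date and the choice to return None for strings that do not match (for example a quarter outside 1-4, or a day of 0) are assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical helper for illustration only; it is not a Polars API.
# Accepts "YYYY-Qn" or "YYYY-Qn-d", where d is the 1-based day of the quarter.
_QUARTER_RE = re.compile(r"(\d{4})-Q([1-4])(?:-(\d+))?")


def parse_quarter_date(text: str) -> Optional[date]:
    """Return the date for a year/quarter string, or None if it cannot be parsed."""
    match = _QUARTER_RE.fullmatch(text)
    if match is None:
        return None
    year = int(match.group(1))
    quarter = int(match.group(2))
    # A missing day means the first day of the quarter.
    day_of_quarter = int(match.group(3) or 1)
    if day_of_quarter < 1:
        return None
    try:
        # First month of the quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10.
        quarter_start = date(year, (quarter - 1) * 3 + 1, 1)
        return quarter_start + timedelta(days=day_of_quarter - 1)
    except (ValueError, OverflowError):
        # Year 0000 or an absurdly large day offset falls outside the date range.
        return None


for s in ["2023-Q3-01", "2023-Q3-27", "2025-Q4", "2025-Q3-45"]:
    print(s, parse_quarter_date(s))
```

Like the expression-based version, this sketch does not cap the day count at the length of the quarter, so a large enough value simply rolls over into the following quarter. It runs one string at a time, so the column expressions above remain the better fit for large frames.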

Posted by: admin. When reposting, please credit the source: http://www.yc00.com/questions/1744186819a4562243.html
