python - Simplest way to convert aggregated data to visualize in polars

Suppose I have aggregated the mean and the median of some value over 3 months, like:

df = (data.group_by('month_code').agg(pl.col('value').mean().alias('avg'),
                                      pl.col('value').median().alias('med')
                                      )
          .sort('month_code')
          .collect()
      )

Resulting in something like:

df = pl.DataFrame({'month': ['M202412','M202501','M202502'],
                   'avg': [0.037824, 0.03616, 0.038919],
                   'med': [0.01381, 0.013028, 0.014843]
                   })

And I'd like to visualize it, so I need to convert it to the following format:

df_ = pl.DataFrame({'month': ['M202412','M202501','M202502']*2,
                   'type': ['avg','avg','avg','med','med','med'],
                   'value': [0.037824, 0.03616, 0.038919, 0.01381, 0.013028, 0.014843],
                   })

Which is then easy to visualize:

df_.plot.line(x='month',y='value',color='type').properties(width=400, height=350, title='avg and med')

What is the simplest way to convert df to df_ above?

Asked Mar 25 at 9:28 by lmocsi; edited Mar 25 at 9:40 by Sandipan Dey.


2 Answers


In Polars, df.melt is deprecated in favor of df.unpivot.

Unpivot also does not require you to list the columns to unpivot if the "index" column is specified: every column not named in index is unpivoted.

So, in current Polars, a solution looks like this:

df = pl.DataFrame({
    'month': ['M202412','M202501','M202502'],
    'avg': [0.037824, 0.03616, 0.038919],
    'med': [0.01381, 0.013028, 0.014843],
})
df_ = df.unpivot(index="month", variable_name="type")

print(df_)

# shape: (6, 3)
# ┌─────────┬──────┬──────────┐
# │ month   ┆ type ┆ value    │
# │ ---     ┆ ---  ┆ ---      │
# │ str     ┆ str  ┆ f64      │
# ╞═════════╪══════╪══════════╡
# │ M202412 ┆ avg  ┆ 0.037824 │
# │ M202501 ┆ avg  ┆ 0.03616  │
# │ M202502 ┆ avg  ┆ 0.038919 │
# │ M202412 ┆ med  ┆ 0.01381  │
# │ M202501 ┆ med  ┆ 0.013028 │
# │ M202502 ┆ med  ┆ 0.014843 │
# └─────────┴──────┴──────────┘

You can try using df.melt to convert from wide to long format. It keeps the month column as an identifier and unpivots the avg and med columns into rows, with corresponding type and value columns.

df_ = df.melt(
    id_vars=["month"],
    value_vars=["avg", "med"],
    variable_name="type",
    value_name="value"
)

Full reproducible example:

import polars as pl
import plotly.express as px

df = pl.DataFrame({
    'month': ['M202412', 'M202501', 'M202502'],
    'avg': [0.037824, 0.03616, 0.038919],
    'med': [0.01381, 0.013028, 0.014843]
})

df_ = df.melt(
    id_vars=["month"],
    value_vars=["avg", "med"],
    variable_name="type",
    value_name="value"
)

fig = px.line(
    df_.to_pandas(),  
    x='month',
    y='value',
    color='type',
    title='Average and Median Values'
)

fig.update_layout(
    width=400,
    height=350
)

fig.show()
