python - Simplest way to convert aggregated data to visualize in polars - Stack Overflow

Suppose I have aggregated the mean and the median of some value over 3 months, like:

df = (data.group_by('month_code').agg(pl.col('value').mean().alias('avg'),
                                      pl.col('value').median().alias('med')
                                      )
          .sort('month_code')
          .collect()
      )

Resulting in something like:

df = pl.DataFrame({'month': ['M202412','M202501','M202502'],
                   'avg': [0.037824, 0.03616, 0.038919],
                   'med': [0.01381, 0.013028, 0.014843]
                   })

I'd like to visualize it, so I need to convert it to this long format:

df_ = pl.DataFrame({'month': ['M202412','M202501','M202502']*2,
                   'type': ['avg','avg','avg','med','med','med'],
                   'value': [0.037824, 0.03616, 0.038919, 0.01381, 0.013028, 0.014843],
                   })

Which is then easy to visualize:

df_.plot.line(x='month',y='value',color='type').properties(width=400, height=350, title='avg and med')

What is the simplest way to convert df to df_ above?

asked Mar 25 at 9:28 by lmocsi, edited Mar 25 at 9:40 by Sandipan Dey

2 Answers

In Polars, df.melt is deprecated in favor of df.unpivot.

unpivot also does not require listing the columns to unpivot when the "index" column is specified: every remaining column is unpivoted.

So a current Polars solution looks like this:

df = pl.DataFrame({
    'month': ['M202412','M202501','M202502'],
    'avg': [0.037824, 0.03616, 0.038919],
    'med': [0.01381, 0.013028, 0.014843],
})
df_ = df.unpivot(index="month", variable_name="type")

print(df_)

# shape: (6, 3)
# ┌─────────┬──────┬──────────┐
# │ month   ┆ type ┆ value    │
# │ ---     ┆ ---  ┆ ---      │
# │ str     ┆ str  ┆ f64      │
# ╞═════════╪══════╪══════════╡
# │ M202412 ┆ avg  ┆ 0.037824 │
# │ M202501 ┆ avg  ┆ 0.03616  │
# │ M202502 ┆ avg  ┆ 0.038919 │
# │ M202412 ┆ med  ┆ 0.01381  │
# │ M202501 ┆ med  ┆ 0.013028 │
# │ M202502 ┆ med  ┆ 0.014843 │
# └─────────┴──────┴──────────┘

You can try using df.melt to convert from wide to long format. It keeps the month column as an identifier and unpivots the avg and med columns into rows, with corresponding type and value columns.

df_ = df.melt(
    id_vars=["month"],
    value_vars=["avg", "med"],
    variable_name="type",
    value_name="value"
)

Full reproducible example:

import polars as pl
import plotly.express as px

df = pl.DataFrame({
    'month': ['M202412', 'M202501', 'M202502'],
    'avg': [0.037824, 0.03616, 0.038919],
    'med': [0.01381, 0.013028, 0.014843]
})

df_ = df.melt(
    id_vars=["month"],
    value_vars=["avg", "med"],
    variable_name="type",
    value_name="value"
)

fig = px.line(
    df_.to_pandas(),  
    x='month',
    y='value',
    color='type',
    title='Average and Median Values'
)

fig.update_layout(
    width=400,
    height=350
)

fig.show()
