Suppose I have aggregated the mean and the median of some value over 3 months, like:
df = (data.group_by('month_code').agg(pl.col('value').mean().alias('avg'),
pl.col('value').median().alias('med')
)
.sort('month_code')
.collect()
)
Resulting in something like:
df = pl.DataFrame({'month': ['M202412','M202501','M202502'],
'avg': [0.037824, 0.03616, 0.038919],
'med': [0.01381, 0.013028, 0.014843]
})
I'd like to visualize it, so I need to convert it to the long format:
df_ = pl.DataFrame({'month': ['M202412','M202501','M202502']*2,
'type': ['avg','avg','avg','med','med','med'],
'value': [0.037824, 0.03616, 0.038919, 0.01381, 0.013028, 0.014843],
})
Which is then easy to visualize:
df_.plot.line(x='month',y='value',color='type').properties(width=400, height=350, title='avg and med')
What is the simplest way to convert df to df_ above?
Asked Mar 25 at 9:28 by lmocsi, edited Mar 25 at 9:40 by Sandipan Dey
2 Answers
In Polars, df.melt is deprecated in favor of df.unpivot.
Unpivot also does not require listing the columns to unpivot when the "index" column is specified.
So a current Polars solution looks like this:
df = pl.DataFrame({
'month': ['M202412','M202501','M202502'],
'avg': [0.037824, 0.03616, 0.038919],
'med': [0.01381, 0.013028, 0.014843],
})
df_ = df.unpivot(index="month", variable_name="type")
print(df_)
# shape: (6, 3)
# ┌─────────┬──────┬──────────┐
# │ month ┆ type ┆ value │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ f64 │
# ╞═════════╪══════╪══════════╡
# │ M202412 ┆ avg ┆ 0.037824 │
# │ M202501 ┆ avg ┆ 0.03616 │
# │ M202502 ┆ avg ┆ 0.038919 │
# │ M202412 ┆ med ┆ 0.01381 │
# │ M202501 ┆ med ┆ 0.013028 │
# │ M202502 ┆ med ┆ 0.014843 │
# └─────────┴──────┴──────────┘
You can try using df.melt to convert from wide to long format. It keeps the month column as an identifier and unpivots the avg and med columns into rows with corresponding type and value columns.
df_ = df.melt(
id_vars=["month"],
value_vars=["avg", "med"],
variable_name="type",
value_name="value"
)
Full reproducible example:
import polars as pl
import plotly.express as px
df = pl.DataFrame({
'month': ['M202412', 'M202501', 'M202502'],
'avg': [0.037824, 0.03616, 0.038919],
'med': [0.01381, 0.013028, 0.014843]
})
df_ = df.melt(
id_vars=["month"],
value_vars=["avg", "med"],
variable_name="type",
value_name="value"
)
fig = px.line(
df_.to_pandas(),
x='month',
y='value',
color='type',
title='Average and Median Values'
)
fig.update_layout(
width=400,
height=350
)
fig.show()