pandas - Add column with missing values by position to timeseries - Stack Overflow

I have a dataframe that I like to add a column of values from array of tuples. The tuple contains the c

I have a dataframe that I like to add a column of values from array of tuples. The tuple contains the coordinates (position, value). An example:

import pandas as pd
import numpy as np

alpha = [chr(i) for i in range(ord('A'), ord('K')+1)]

dt = pd.date_range(start='2025-1-1', freq='1h', periods = len(alpha))

df = pd.DataFrame ( alpha , index = dt )
df.index.name = 'timestamp'
df.columns = ['item']

c = np.array( [(1, 100), (2, 202), (6, 772)] )

which gives:

timestamp item
2025-01-01 00:00:00 A
2025-01-01 01:00:00 B
2025-01-01 02:00:00 C
2025-01-01 03:00:00 D
2025-01-01 04:00:00 E
2025-01-01 05:00:00 F
2025-01-01 06:00:00 G
2025-01-01 07:00:00 H
2025-01-01 08:00:00 I
2025-01-01 09:00:00 J
2025-01-01 10:00:00 K

I have a dataframe that I like to add a column of values from array of tuples. The tuple contains the coordinates (position, value). An example:

import pandas as pd
import numpy as np

alpha = [chr(i) for i in range(ord('A'), ord('K')+1)]

dt = pd.date_range(start='2025-1-1', freq='1h', periods = len(alpha))

df = pd.DataFrame ( alpha , index = dt )
df.index.name = 'timestamp'
df.columns = ['item']

c = np.array( [(1, 100), (2, 202), (6, 772)] )

which gives:

timestamp item
2025-01-01 00:00:00 A
2025-01-01 01:00:00 B
2025-01-01 02:00:00 C
2025-01-01 03:00:00 D
2025-01-01 04:00:00 E
2025-01-01 05:00:00 F
2025-01-01 06:00:00 G
2025-01-01 07:00:00 H
2025-01-01 08:00:00 I
2025-01-01 09:00:00 J
2025-01-01 10:00:00 K

I am trying to join column c, in such a way that ROW[1] contains [B and 100].

I have accomplished what I want with the following:

df.reset_index(inplace = True) 
df.index.name = 'pos'

for x,y in c:
    df.loc[ int(x) , 'price'] = y

df.set_index("timestamp", inplace=True)

This gave me the desired results:

timestamp item price
2025-01-01 00:00:00 A nan
2025-01-01 01:00:00 B 100
2025-01-01 02:00:00 C 202
2025-01-01 03:00:00 D nan
2025-01-01 04:00:00 E nan
2025-01-01 05:00:00 F nan
2025-01-01 06:00:00 G 772
2025-01-01 07:00:00 H nan
2025-01-01 08:00:00 I nan
2025-01-01 09:00:00 J nan
2025-01-01 10:00:00 K nan

However, the idea of dropping and recreating the index for this feels a bit awkward, especially if I have multiple indexes.

My question, is there a better way that dropping and recreating an index to add a column with missing values, using position ?

Share Improve this question asked Mar 13 at 1:38 MansourMansour 6881 gold badge6 silver badges14 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 3

Index.take returns the index of your dataframe based on the position and we can use the first column of your array to get the index.

df.loc[df.index.take(c[:, 0]), 'price'] = c[:, 1]

You can also use a combination of loc and iloc.


df.loc[df.iloc[c[:, 0]].index, 'price'] = c[:, 1]

End result:

                    item  price
2025-01-01 00:00:00    A    NaN
2025-01-01 01:00:00    B  100.0
2025-01-01 02:00:00    C  202.0
2025-01-01 03:00:00    D    NaN
2025-01-01 04:00:00    E    NaN
2025-01-01 05:00:00    F    NaN
2025-01-01 06:00:00    G  772.0
2025-01-01 07:00:00    H    NaN
2025-01-01 08:00:00    I    NaN
2025-01-01 09:00:00    J    NaN
2025-01-01 10:00:00    K    NaN

You can do it this way by creating a key from hour in dataframe index and join on a dataframe created from the list of tuples.

df.reset_index().assign(key=df.index.hour).merge(pd.DataFrame(c, columns=['key', 'price']), how='left')

Output:

             timestamp item  key  price
0  2025-01-01 00:00:00    A    0    NaN
1  2025-01-01 01:00:00    B    1  100.0
2  2025-01-01 02:00:00    C    2  202.0
3  2025-01-01 03:00:00    D    3    NaN
4  2025-01-01 04:00:00    E    4    NaN
5  2025-01-01 05:00:00    F    5    NaN
6  2025-01-01 06:00:00    G    6  772.0
7  2025-01-01 07:00:00    H    7    NaN
8  2025-01-01 08:00:00    I    8    NaN
9  2025-01-01 09:00:00    J    9    NaN
10 2025-01-01 10:00:00    K   10    NaN

Or even shorted:

df.assign(price=df.index.hour.map(pd.Series(c[:,1], index=c[:,0])))

Output:

                    item  price
timestamp                      
2025-01-01 00:00:00    A    NaN
2025-01-01 01:00:00    B  100.0
2025-01-01 02:00:00    C  202.0
2025-01-01 03:00:00    D    NaN
2025-01-01 04:00:00    E    NaN
2025-01-01 05:00:00    F    NaN
2025-01-01 06:00:00    G  772.0
2025-01-01 07:00:00    H    NaN
2025-01-01 08:00:00    I    NaN
2025-01-01 09:00:00    J    NaN
2025-01-01 10:00:00    K    NaN

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744723462a4590048.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信