pandas - Add column with missing values by position to timeseries

I have a dataframe that I like to add a column of values from array of tuples. The tuple contains the coordinates (position, value). An example:

import pandas as pd
import numpy as np

alpha = [chr(i) for i in range(ord('A'), ord('K')+1)]

dt = pd.date_range(start='2025-1-1', freq='1h', periods = len(alpha))

df = pd.DataFrame ( alpha , index = dt )
df.index.name = 'timestamp'
df.columns = ['item']

c = np.array( [(1, 100), (2, 202), (6, 772)] )

which gives:

timestamp	item
2025-01-01 00:00:00	A
2025-01-01 01:00:00	B
2025-01-01 02:00:00	C
2025-01-01 03:00:00	D
2025-01-01 04:00:00	E
2025-01-01 05:00:00	F
2025-01-01 06:00:00	G
2025-01-01 07:00:00	H
2025-01-01 08:00:00	I
2025-01-01 09:00:00	J
2025-01-01 10:00:00	K

I have a dataframe that I like to add a column of values from array of tuples. The tuple contains the coordinates (position, value). An example:

import pandas as pd
import numpy as np

alpha = [chr(i) for i in range(ord('A'), ord('K')+1)]

dt = pd.date_range(start='2025-1-1', freq='1h', periods = len(alpha))

df = pd.DataFrame ( alpha , index = dt )
df.index.name = 'timestamp'
df.columns = ['item']

c = np.array( [(1, 100), (2, 202), (6, 772)] )

which gives:

timestamp	item
2025-01-01 00:00:00	A
2025-01-01 01:00:00	B
2025-01-01 02:00:00	C
2025-01-01 03:00:00	D
2025-01-01 04:00:00	E
2025-01-01 05:00:00	F
2025-01-01 06:00:00	G
2025-01-01 07:00:00	H
2025-01-01 08:00:00	I
2025-01-01 09:00:00	J
2025-01-01 10:00:00	K

I am trying to join column c, in such a way that ROW[1] contains [B and 100].

I have accomplished what I want with the following:

df.reset_index(inplace = True) 
df.index.name = 'pos'

for x,y in c:
    df.loc[ int(x) , 'price'] = y

df.set_index("timestamp", inplace=True)

This gave me the desired results:

timestamp	item	price
2025-01-01 00:00:00	A	nan
2025-01-01 01:00:00	B	100
2025-01-01 02:00:00	C	202
2025-01-01 03:00:00	D	nan
2025-01-01 04:00:00	E	nan
2025-01-01 05:00:00	F	nan
2025-01-01 06:00:00	G	772
2025-01-01 07:00:00	H	nan
2025-01-01 08:00:00	I	nan
2025-01-01 09:00:00	J	nan
2025-01-01 10:00:00	K	nan

However, the idea of dropping and recreating the index for this feels a bit awkward, especially if I have multiple indexes.

My question, is there a better way that dropping and recreating an index to add a column with missing values, using position ?

Share Improve this question asked Mar 13 at 1:38 Mansour 6881 gold badge6 silver badges14 bronze badges

Add a comment |

2 Answers 2

Sorted by: Reset to default 3

Index.take returns the index of your dataframe based on the position and we can use the first column of your array to get the index.

df.loc[df.index.take(c[:, 0]), 'price'] = c[:, 1]

You can also use a combination of loc and iloc.


df.loc[df.iloc[c[:, 0]].index, 'price'] = c[:, 1]

End result:

                    item  price
2025-01-01 00:00:00    A    NaN
2025-01-01 01:00:00    B  100.0
2025-01-01 02:00:00    C  202.0
2025-01-01 03:00:00    D    NaN
2025-01-01 04:00:00    E    NaN
2025-01-01 05:00:00    F    NaN
2025-01-01 06:00:00    G  772.0
2025-01-01 07:00:00    H    NaN
2025-01-01 08:00:00    I    NaN
2025-01-01 09:00:00    J    NaN
2025-01-01 10:00:00    K    NaN

You can do it this way by creating a key from hour in dataframe index and join on a dataframe created from the list of tuples.

df.reset_index().assign(key=df.index.hour).merge(pd.DataFrame(c, columns=['key', 'price']), how='left')

Output:

             timestamp item  key  price
0  2025-01-01 00:00:00    A    0    NaN
1  2025-01-01 01:00:00    B    1  100.0
2  2025-01-01 02:00:00    C    2  202.0
3  2025-01-01 03:00:00    D    3    NaN
4  2025-01-01 04:00:00    E    4    NaN
5  2025-01-01 05:00:00    F    5    NaN
6  2025-01-01 06:00:00    G    6  772.0
7  2025-01-01 07:00:00    H    7    NaN
8  2025-01-01 08:00:00    I    8    NaN
9  2025-01-01 09:00:00    J    9    NaN
10 2025-01-01 10:00:00    K   10    NaN

Or even shorted:

df.assign(price=df.index.hour.map(pd.Series(c[:,1], index=c[:,0])))

Output:

                    item  price
timestamp                      
2025-01-01 00:00:00    A    NaN
2025-01-01 01:00:00    B  100.0
2025-01-01 02:00:00    C  202.0
2025-01-01 03:00:00    D    NaN
2025-01-01 04:00:00    E    NaN
2025-01-01 05:00:00    F    NaN
2025-01-01 06:00:00    G  772.0
2025-01-01 07:00:00    H    NaN
2025-01-01 08:00:00    I    NaN
2025-01-01 09:00:00    J    NaN
2025-01-01 10:00:00    K    NaN

发布者：admin，转转请注明出处：http://www.yc00.com/questions/1744723462a4590048.html

pandas - Add column with missing values by position to timeseries - Stack Overflow

2 Answers 2

发表回复

评论列表（0条）

联系我们

400-800-8888

pandas - Add column with missing values by position to timeseries - Stack Overflow

2 Answers 2

相关推荐