Let us assume we have this dataframe:
import pandas as pd
df = pd.DataFrame.from_dict({1:{"a": 10, "b":20, "c":30}, 2:{"a":100, "b":200, "c":300}}, orient="index")
Further, let us assume I want to apply a function to each row that adds 1 to the values in columns a and b:
def add(x):
    return x["a"] + 1, x["b"] + 1
Now, if I use the apply function to modify and overwrite the columns twice, some values are flipped:
>>> df.loc[:, ["a", "b"]] = df[["a", "b"]].apply(lambda x: add(x), axis=1)
>>> df
    a    b    c
1  11  101   30
2  21  201  300
>>>
>>> df.loc[:, ["a", "b"]] = df[["a", "b"]].apply(lambda x: add(x), axis=1)
>>> df
     a    b    c
1   12   22   30
2  102  202  300
>>>
Could somebody explain to me why b1 and a2 get flipped?
asked Mar 10 at 22:42 by Langtec
2 Answers
The issue lies in your add function. You defined it to return the tuple x['a'] + 1, x['b'] + 1, which causes the values to "flip" between columns a and b.
In your case, the function you pass to apply does not need to know anything about columns.
Simply define the function as:
def add(x):
    return x + 1

df.loc[:, ["a", "b"]] = df[["a", "b"]].apply(add)
You can even drop the axis argument when passing the function to apply.
Since you call apply only on the dataframe columns 'a' and 'b', your add function does not need to specify that those are the columns you want to add 1 to.
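To see why the column names are not needed inside add, here is a minimal sketch. The `seen` list is only an illustration helper and not part of the original answer. Without an axis argument, apply defaults to axis=0 and hands each selected column to the function as a whole Series:

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)

seen = []  # hypothetical helper: records which columns reach the function


def add(x):
    # x is a whole column (a Series), not a single row
    seen.append(x.name)
    return x + 1


# default axis=0: the function is called with columns, not rows
result = df[["a", "b"]].apply(add)

print(sorted(set(seen)))  # names of the columns that were passed in
print(result)             # a DataFrame with the same index and columns
```

Because the function receives and returns a whole column, the result is already a DataFrame labelled with a and b, so nothing can be swapped on assignment.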
This is your original DataFrame:
     a    b    c
1   10   20   30
2  100  200  300
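Now, look at the output of df[['a', 'b']].apply(add, axis=1):
df[['a', 'b']].apply(add, axis=1)
1      (11, 21)
2    (101, 201)
dtype: object
This creates a Series of tuples, which means the right-hand side has two items, (11, 21) and (101, 201), and those are objects (tuples). The first item will be assigned to a, the second to b. In other words, the tuple computed for row 1 fills column a and the tuple computed for row 2 fills column b, which is why b1 and a2 look swapped.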
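The difference between "one tuple per column" and "one tuple per row" can be sketched as follows. This only mimics the effect of the assignment by building two small DataFrames by hand; the names `as_columns` and `as_rows` are made up for the illustration, and it does not reproduce what .loc does internally:

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)


def add(x):
    return x["a"] + 1, x["b"] + 1


# One tuple per row: a Series of objects, not a two-column DataFrame
result = df[["a", "b"]].apply(add, axis=1)
row_tuples = [tuple(int(v) for v in t) for t in result.tolist()]

# Interpretation 1: each tuple becomes a column (matches the flipped output)
as_columns = pd.DataFrame(
    {col: list(t) for col, t in zip(["a", "b"], row_tuples)},
    index=df.index,
)

# Interpretation 2: each tuple becomes a row (what was intended)
as_rows = pd.DataFrame(row_tuples, columns=["a", "b"], index=df.index)

print(as_columns)
print(as_rows)
```

The two frames are transposes of each other, which is exactly the "flip" seen in the question.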
Let's see what happens if you assign two strings instead:
df.loc[:, ['a', 'b']] = ['x', 'y']
   a  b    c
1  x  y   30
2  x  y  300
The first item (x) gets assigned to a, the second (y) to b.
Your unexpected behavior is due to a combination of two things:
- you are ignoring the index with .loc[:, ...]
- the right hand side is a Series (of objects)
If you remove either condition, the flip no longer happens:
# let's assign on the columns directly
df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1)
# KeyError: 0
# let's convert the output to list
df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1).tolist()
#      a    b    c
# 1   11   21   30
# 2  101  201  300
In addition, the flip only happened silently because the selection had the same number of rows and columns. With 3 columns, this would have raised an error:
df.loc[:, ['a', 'b', 'c']] = df[['a', 'b', 'c']].apply(add, axis=1)
# ValueError: Must have equal len keys and value when setting with an iterable
Take home message
If you need to use a function with apply and axis=1 and you want to output several "columns", either convert the output to a list if the output has the same columns as the selection:
df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1).tolist()
Or output a DataFrame by making the function return a Series:
def add(x):
    return pd.Series({'a': x['a']+1, 'b': x['b']+1})

df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1)
In any case, never use df.loc[:, ...] unless you know why you're doing this (i.e. you're purposely breaking the index alignment).
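As a quick check of the advice above, here is a sketch that applies a row-wise function twice, as in the question, and confirms nothing gets flipped. The first variant returns a Series, as recommended. The second keeps the tuple-returning function and uses apply's result_type="expand" option, which is not mentioned in the answer above and is shown only as a possible alternative. The names make_df, add_series and add_tuple are made up for the example:

```python
import pandas as pd


def make_df():
    return pd.DataFrame.from_dict(
        {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
        orient="index",
    )


def add_series(x):
    # Returning a labelled Series makes apply build a DataFrame
    return pd.Series({"a": x["a"] + 1, "b": x["b"] + 1})


def add_tuple(x):
    # Same function as in the question
    return x["a"] + 1, x["b"] + 1


# Variant 1: function returns a Series
df1 = make_df()
for _ in range(2):
    df1[["a", "b"]] = df1[["a", "b"]].apply(add_series, axis=1)

# Variant 2: keep the tuple, let apply expand it into columns 0 and 1,
# then relabel those columns before assigning
df2 = make_df()
for _ in range(2):
    expanded = df2[["a", "b"]].apply(add_tuple, axis=1, result_type="expand")
    df2[["a", "b"]] = expanded.set_axis(["a", "b"], axis=1)

print(df1)
print(df2)
```

In both variants the right-hand side is a DataFrame with one row per original row, so each value lands in the row and column it was computed for.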
Vectorized operations
Of course, the above assumes you have complex, non-vectorized functions to use. If your goal is to perform a simple addition, you can operate on the columns directly:
# adding 1 to both a and b
df[['a', 'b']] += 1
# adding 1 to a and 2 to b
df[['a', 'b']] += [1, 2]
# adding 1 to a and 2 to b, using add
df[['a', 'b']] = df[['a', 'b']].add([1, 2])
# adding 1 to a and 2 to b, using a dictionary
df[['a', 'b']] = df[['a', 'b']].add({'b': 2, 'a': 1})
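For reference, a small sketch comparing these vectorized forms on the example data without modifying df. One assumption to flag: instead of the plain dictionary, the last variant wraps the mapping in a pd.Series, which aligns on column labels in the same way; the variable names are made up for the illustration:

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)
cols = df[["a", "b"]]

# A list is matched to the columns by position: +1 for a, +2 for b
via_operator = cols + [1, 2]
via_add = cols.add([1, 2])

# A Series is matched to the columns by label, so the order does not matter
via_series = cols.add(pd.Series({"b": 2, "a": 1}))

print(via_operator)
print(via_add)
print(via_series)
```

All three produce the same values, with no row-wise apply and therefore no tuples to misplace.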