python - confusion on re-assigning pandas columns after modification with apply

Let us assume we have this dataframe:

import pandas as pd

df = pd.DataFrame.from_dict({1:{"a": 10, "b":20, "c":30}, 2:{"a":100, "b":200, "c":300}}, orient="index")

Further, let us assume I want to apply a function to each row that adds 1 to the values in columns a and b:

def add(x):
    return x["a"] +1, x["b"] +1

Now, if I use apply to modify and overwrite the columns twice, some values are flipped:

>>> df.loc[:, ["a", "b"]] = df[["a", "b"]].apply(lambda x: add(x), axis=1)
>>> df
    a    b    c
1  11  101   30
2  21  201  300
>>> 
>>> df.loc[:, ["a", "b"]] = df[["a", "b"]].apply(lambda x: add(x), axis=1)
>>> df
     a    b    c
1   12   22   30
2  102  202  300
>>>

Could somebody explain to me why the values at b1 (row 1, column b) and a2 (row 2, column a) get flipped?

asked Mar 10 at 22:42 by Langtec

2 Answers

The issue lies in your add function. It returns the tuple x['a'] + 1, x['b'] + 1 for each row, and assigning those tuples back is what "flips" the values between columns a and b.

The function you pass to apply should in your case not know anything about columns.
Simply define the function as:

def add(x):
    return x + 1

df.loc[:, ["a", "b"]] = df[["a", "b"]].apply(add)

You can even drop the axis argument when passing the function to apply.
Since you call apply only on columns 'a' and 'b', the add function does not need to name the columns it adds 1 to.
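As a quick sanity check of the column-wise approach, the sketch below rebuilds the question's frame and compares apply(add) with plain addition. With the default axis, each column is handed to add as a whole Series. The names via_apply and direct are illustrative only.

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)

def add(x):
    # With the default axis=0, x is an entire column (a Series),
    # so x + 1 adds 1 to every value in that column at once.
    return x + 1

via_apply = df[["a", "b"]].apply(add)  # column-wise apply
direct = df[["a", "b"]] + 1            # plain vectorized addition

# Both routes should give the same frame
print(via_apply.equals(direct))
```

Because the result is already a DataFrame with the same index and column labels, assigning it back keeps every value in its own row and column.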

This is your original DataFrame:

     a    b    c
1   10   20   30
2  100  200  300

Now, look at the output of df[['a', 'b']].apply(add, axis=1):

df[['a', 'b']].apply(add, axis=1)

1      (11, 21)
2    (101, 201)
dtype: object

This creates a Series of tuples, one per row: (11, 21) for row 1 and (101, 201) for row 2, stored as objects. On assignment, the first item is assigned to column a and the second to column b, so a becomes 11, 21 and b becomes 101, 201, which is the flipped result you observed.
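The mechanism can be reproduced without .loc. The sketch below rebuilds the question's frame, collects the per-row tuples, and then emulates the positional assignment by hand (first item to column a, second item to column b). The names tuples, emulated and intended are illustrative, and the emulation is an assumption about what the assignment effectively does, not pandas' internal code path.

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)

def add(x):
    # Returns a plain tuple for each row
    return x["a"] + 1, x["b"] + 1

# One tuple per row: each tuple holds that row's (a + 1, b + 1)
tuples = df[["a", "b"]].apply(add, axis=1)

# Emulate the positional assignment: the first tuple fills column a,
# the second tuple fills column b, so rows and columns end up swapped.
emulated = pd.DataFrame(
    {"a": list(tuples.iloc[0]), "b": list(tuples.iloc[1])},
    index=df.index,
)

# What was intended: each row's new values stay in that row
intended = df[["a", "b"]] + 1

print(emulated)
print(intended)
```

In this sketch emulated is the transpose of intended, which would also explain why running the assignment a second time flips the values back.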

Let's see what happens if you assign two strings instead:

df.loc[:, ['a', 'b']] = ['x', 'y']

   a  b    c
1  x  y   30
2  x  y  300

The first item (x) gets assigned to a, the second (y) to b.

Your unexpected behavior is due to a combination of two things:

you are ignoring the index with .loc[:, ...]
the right hand side is a Series (of objects)

If you remove either condition, the silent flip no longer happens:

# let's assign on the columns directly
df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1)

# KeyError: 0


# let's convert the output to list
df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1).tolist()

#      a    b    c
# 1   11   21   30
# 2  101  201  300

In addition, the silent flip only occurred because the selection had the same number of rows and columns. With 3 columns, this would have raised an error:

df.loc[:, ['a', 'b', 'c']] = df[['a', 'b', 'c']].apply(add, axis=1)

# ValueError: Must have equal len keys and value when setting with an iterable

Take home message

If you need to use a function with apply and axis=1 and you want to output several "columns", either convert the output to lists if you have the same columns as output:

df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1).tolist()

Or output a DataFrame by making the function return a Series:

def add(x):
    return pd.Series({'a': x['a']+1, 'b': x['b']+1})

df[['a', 'b']] = df[['a', 'b']].apply(add, axis=1)

In any case, never use df.loc[:, ...] unless you know why you're doing this (i.e. you're purposely breaking the index alignment).
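Another possibility, not covered above, is to keep the tuple-returning function and ask apply to expand the tuples into columns with result_type="expand". This is only a sketch: the name expanded is illustrative, and renaming the columns before assigning is a defensive choice so the assignment aligns on labels.

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)

def add(x):
    # Same tuple-returning function as in the question
    return x["a"] + 1, x["b"] + 1

# result_type="expand" turns each returned tuple into a row of a DataFrame;
# the row index is kept and the new columns are labelled 0 and 1.
expanded = df[["a", "b"]].apply(add, axis=1, result_type="expand")

# Relabel the columns so the assignment aligns on both index and columns
expanded.columns = ["a", "b"]

df[["a", "b"]] = expanded
print(df)
```

Since the right-hand side is a DataFrame carrying the original index, each value should land in the row it was computed from.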

Vectorized operations

Of course, the above assumes you have complex, non-vectorized functions to use. If your goal is to perform a simple addition:

# adding 1 to both a and b
df[['a', 'b']] += 1

# adding 1 to a and 2 to b
df[['a', 'b']] += [1, 2]

# adding 1 to a and 2 to b, using add
df[['a', 'b']] = df[['a', 'b']].add([1, 2])

# adding 1 to a and 2 to b, using a dictionary
df[['a', 'b']] = df[['a', 'b']].add({'b': 2, 'a': 1})
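To tie this back to the question, here is a minimal sketch that repeats the vectorized addition twice on the question's frame, mirroring the two apply calls. The loop is illustrative only.

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {1: {"a": 10, "b": 20, "c": 30}, 2: {"a": 100, "b": 200, "c": 300}},
    orient="index",
)

# Add 1 to columns a and b, twice, as in the question
for _ in range(2):
    df[["a", "b"]] += 1

print(df)
```

Each pass adds 1 in place per cell, so there are no tuples to misplace and the intermediate result is not flipped.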
