python - Dropping different values from multiple columns - Stack Overflow


I have a Pandas dataframe with 28 columns in total. Each one has a unique number after a name. I want to drop all the numbers from the columns but keep the name. How can I do that best?

Here is an example of the columns:

Miscellaneous group | 00002928  Alcoholic Beverages | 0000292   Animal fats group | 000029

I already tried .rename(), but mapping all 28 columns by hand is inefficient and time-consuming. It also creates a very long code cell in my Google Colab notebook.


Asked Feb 23 at 10:58 by Joelle Kappert; edited Feb 23 at 11:16 by ouroboros1

2 Answers


Using df.columns.str.split:

import pandas as pd

# Sample column names from the question
columns = ["Miscellaneous group | 00002928",
           "Alcoholic Beverages | 0000292",
           "Animal fats group | 000029"]

df = pd.DataFrame(columns=columns)

df.columns = df.columns.str.split(r'\s+\|', regex=True).str[0]

Or df.columns.str.replace:

df.columns = df.columns.str.replace(r'\s+\|.*$', '', regex=True)

Also possible via map and re.sub:

import re

df.columns = map(lambda x: re.sub(r'\s+\|.*$', '', x), df.columns)

With df.rename you could apply logic like:

df = df.rename(columns=lambda x: x.split(' |')[0])

Or indeed via re.split:

df = df.rename(columns=lambda x: re.split(r'\s+\|', x)[0])

For the regex pattern, see regex101.
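As a quick sanity check, the variants above can be run side by side on the sample columns from the question. This is only a minimal sketch: the names `raw`, `via_replace`, `via_rename` and `cleaned` are made up for illustration, and it assumes a pandas version whose `str.replace` accepts `regex=True`.

```python
import re
import pandas as pd

# Sample column names taken from the question.
raw = ["Miscellaneous group | 00002928",
       "Alcoholic Beverages | 0000292",
       "Animal fats group | 000029"]

df = pd.DataFrame(columns=raw)

# Variant 1: vectorized replace on the column Index.
via_replace = df.columns.str.replace(r'\s+\|.*$', '', regex=True)

# Variant 2: rename with a callable; df itself is left untouched here.
via_rename = df.rename(columns=lambda x: re.split(r'\s+\|', x)[0]).columns

cleaned = list(via_replace)
print(cleaned)
```

Both variants should yield the same three names; which one to use is mostly a matter of taste, since `rename` returns a new DataFrame while assigning to `df.columns` changes the existing one.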

Assuming you're starting off with, e.g.

df.columns = ["Miscellaneous group | 00002928",
              "Alcoholic Beverages | 0000292",
              "Animal fats group | 000029"]

The simplest solution is probably a list comprehension that iterates over the column names, splits each one on " | ", and keeps the first part of the resulting list:

df.columns = [col.split(" | ")[0] for col in df.columns]

The column names are then:

['Miscellaneous group', 'Alcoholic Beverages', 'Animal fats group']

Alternatively, you could do this with a regex:

import re

df.columns = [re.sub(r'\s*\|.*', '', col) for col in df.columns]

This matches optional whitespace, followed by |, followed by everything after it, and replaces the whole match with an empty string.
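To see what the whitespace part of the pattern does, here is a small sketch that compares `\s*` (used in this answer) with `\s+` (used in the other answer). The second input, with no space before the |, is a hypothetical case and does not appear in the question.

```python
import re

names = ["Animal fats group | 000029",   # space before the pipe (as in the question)
         "Animal fats group| 000029"]    # hypothetical: no space before the pipe

# \s* allows zero or more whitespace characters before the pipe.
optional_ws = [re.sub(r'\s*\|.*', '', n) for n in names]

# \s+ requires at least one whitespace character before the pipe.
required_ws = [re.sub(r'\s+\|.*', '', n) for n in names]

print(optional_ws)
print(required_ws)
```

With `\s*` both names are cleaned, while `\s+` leaves the second one unchanged because nothing matches.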

Final alternative:

df.columns = [re.sub(r'\s*\d+$', '', s) for s in df.columns]

This matches optional whitespace followed by digits at the end of each string, so it removes the trailing number regardless of what precedes it (useful if the | isn't always present). Note that it does not remove the | itself, so on the original example columns it would produce:

['Miscellaneous group |', 'Alcoholic Beverages |', 'Animal fats group |']
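Since the digit-only pattern leaves the | behind when one is present, one possible variation is to make the | an optional part of the match. The pattern and the sample names without a | are assumptions for illustration, not something taken from the question.

```python
import re

names = ["Miscellaneous group | 00002928",  # from the question
         "Animal fats group 000029",        # hypothetical: no pipe at all
         "Plain name"]                      # hypothetical: nothing to strip

# Optional whitespace, an optional pipe, optional whitespace,
# then the trailing digits at the end of the string.
pattern = r'\s*\|?\s*\d+$'

stripped = [re.sub(pattern, '', n) for n in names]
print(stripped)
```

Names with no trailing number are left unchanged, because the pattern requires at least one digit at the end of the string.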

