I have a DataFrame with two columns (NAME1 and NAME). I would like to use a dictionary of column names and loop over it, comparing the paired columns and showing only the values that are not the same.
When I try the following, I get the error "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"
import pandas as pd
# create df
test2 = {'NAME1': ['Tom', 'nick', 'krish', 'jack'],
         'NAME': ['Tom', 'nick', 'Carl', 'Bob']}
dfx = pd.DataFrame(test2)
# create dictionary
thisdict = {
    "NAME1": "NAME"
}
# loop and display differences
for a, b in thisdict.items():
    if dfx[a] != dfx[b]:  # this line raises the ValueError
        x = dfx[[a, b]]
        print(x)
edited Mar 4 at 14:50 by jsbueno
asked Mar 4 at 13:57 by chemicaluser
4 Comments
- Can you share the output you're trying to get for this sample data? – Mureinik Commented Mar 4 at 14:07
- Please refer to stackoverflow/questions/36921951/… – Salt Commented Mar 4 at 14:08
- @Mureinik the output is a list or a DataFrame that shows the values 'krish', 'jack' and, beside them, 'Carl', 'Bob'... that is, the values that do not match in the two columns – chemicaluser Commented Mar 4 at 14:10
- The error occurs because comparing two Series with != produces a Series of boolean values, and `if` cannot reduce that to a single True/False. Instead, compare the Series element-wise (with != or the equivalent .ne() method) and use the result to filter the DataFrame. Let me know if that makes sense; if not, I will post the code and output in an answer – Anant Arun Commented Mar 4 at 14:13
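To make the point from the comments concrete, here is a minimal sketch (reusing the sample data from the question) of what the comparison returns and why `if` rejects it. The variable names `mask`, `same_mask`, `raised`, `has_any_mismatch` and `all_mismatch` are only illustrative.

```python
import pandas as pd

dfx = pd.DataFrame({
    "NAME1": ["Tom", "nick", "krish", "jack"],
    "NAME": ["Tom", "nick", "Carl", "Bob"],
})

# Element-wise comparison: one boolean per row, not a single True/False.
mask = dfx["NAME1"] != dfx["NAME"]
same_mask = dfx["NAME1"].ne(dfx["NAME"])  # .ne() is the method form of !=

# Using the Series where a single bool is expected raises the ValueError
# from the question, which is what `if dfx[a] != dfx[b]:` does implicitly.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# To get a single answer, reduce the Series explicitly.
has_any_mismatch = bool(mask.any())  # at least one row differs
all_mismatch = bool(mask.all())      # every row differs

print(mask.tolist())  # [False, False, True, True]
print(has_any_mismatch, all_mismatch)
```

Note that `.any()` or `.all()` only tells you whether mismatches exist; to see which rows differ, the mask has to be used for filtering, as the answers do.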
2 Answers
You need to compare the values row by row and keep only the rows where the two columns are not equal. Try something like this:
# Loop and display differences
for a, b in thisdict.items():
    # Compare the columns row by row
    mismatches = dfx[dfx[a] != dfx[b]]
    if not mismatches.empty:
        print(f"Mismatches between '{a}' and '{b}':")
        print(mismatches[[a, b]])
This kind of row-wise filtering is what pandas and other dataframe libraries give you for free.
Comparing the two columns yields a boolean Series, which can in turn be used as a mask to index the original DataFrame and select only the rows that interest you:
for a, b in thisdict.items():
    diff = dfx[dfx[a] != dfx[b]]
    print(diff[[a, b]])
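For completeness, the snippets above can be combined into one self-contained, runnable sketch. The helper name `find_mismatches` is hypothetical (it is not part of pandas or of either answer); it simply wraps the same boolean-mask filtering so the result can be reused instead of only printed.

```python
import pandas as pd

def find_mismatches(df, column_pairs):
    """Return {(a, b): the rows of df[[a, b]] where the two columns differ}."""
    result = {}
    for a, b in column_pairs.items():
        mask = df[a] != df[b]                  # boolean Series, one entry per row
        result[(a, b)] = df.loc[mask, [a, b]]  # keep mismatched rows, both columns
    return result

dfx = pd.DataFrame({
    "NAME1": ["Tom", "nick", "krish", "jack"],
    "NAME": ["Tom", "nick", "Carl", "Bob"],
})
thisdict = {"NAME1": "NAME"}

mismatches = find_mismatches(dfx, thisdict)
for (a, b), rows in mismatches.items():
    if not rows.empty:
        print(f"Mismatches between '{a}' and '{b}':")
        print(rows)
```

For the sample data this selects the rows at index 2 and 3 ('krish'/'Carl' and 'jack'/'Bob'). One caveat worth keeping in mind: a missing value (NaN) never compares equal to anything, so rows where both columns are NaN would also be reported as mismatches by this approach.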