Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Determining if a Pandas dataframe row has multiple specific values

I have a Pandas data frame represented by the one below:

   A  B  C  D
   1  1  1  3
   1  1  1  2
   2  3  4  5

I need to iterate through this data frame, looking for rows where the values in columns A, B, and C match. When they do, I need to compare the values in column D for those rows and delete the row with the smaller value. The example above would then look like this:

   A  B  C  D
   1  1  1  3
   2  3  4  5

I've written the following code, but something isn't right and it's causing an error. It also looks more complicated than it may need to be, so I am wondering if there is a better, more concise way to write this.

for col, row in df.iterrows():
    df1 = df.copy()
    df1.drop(col, inplace = True)
    for col1, row1 in df1.iterrows():
        if df[0].iloc[col] == df1[0].iloc[col1] & df[1].iloc[col] == df1[1].iloc[col1] & \
           df[2].iloc[col] == df1[2].iloc[col1] & df1[3].iloc[col1] > df[3].iloc[col]:
            df.drop(col, inplace = True)


1 Answer


Here is one solution:

df[~((df[['A', 'B', 'C']].duplicated(keep=False)) & (df.groupby(['A', 'B', 'C'])['D'].transform('min')==df['D']))]

Explanation:

df[['A', 'B', 'C']].duplicated(keep=False)

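returns a boolean mask that is True for every row whose combination of ['A', 'B', 'C'] values appears more than once (keep=False marks all occurrences, not just the later ones).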

df.groupby(['A', 'B', 'C'])['D'].transform('min')==df['D']

returns a boolean mask that is True for every row holding the minimum 'D' value within its ['A', 'B', 'C'] group.

Combining the two masks with & selects the rows that have duplicated ['A', 'B', 'C'] values and also hold the minimum 'D' for their group. Negating the combined mask with ~ keeps every other row.
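To make the two masks easier to inspect, here is a minimal sketch that builds them separately on the sample data from the question. The variable names (keys, is_dup, is_group_min, result) are only illustrative, and the data frame is reconstructed by hand from the table in the question.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 3], "C": [1, 1, 4], "D": [3, 2, 5]})

keys = ["A", "B", "C"]  # columns that define a "matching" row

# True for every row whose (A, B, C) combination occurs more than once.
is_dup = df.duplicated(subset=keys, keep=False)

# True for every row whose D equals the smallest D in its (A, B, C) group.
is_group_min = df.groupby(keys)["D"].transform("min") == df["D"]

# Drop rows that are both duplicated and the group minimum; keep the rest.
result = df[~(is_dup & is_group_min)]
print(result)
```

Note that a row whose key combination is unique is always its own group minimum, which is why the duplicated mask is needed: without it, every unique row would be dropped as well.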

Result for the provided input:

   A  B  C  D
0  1  1  1  3
2  2  3  4  5
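
If the goal is simply to keep one row per ['A', 'B', 'C'] combination, namely the one with the largest 'D', an alternative worth considering is to sort and then drop duplicates. This is a sketch of a different technique, not the approach above, and it assumes that "keep the largest D per group" is what is wanted. The names keys and kept are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 3], "C": [1, 1, 4], "D": [3, 2, 5]})

keys = ["A", "B", "C"]

kept = (
    df.sort_values("D", ascending=False)        # largest D first
      .drop_duplicates(subset=keys, keep="first")  # keep one row per key combination
      .sort_index()                             # restore the original row order
)
print(kept)
```

The two approaches can differ on other inputs. When a group has three or more rows, the mask version removes only the rows holding the minimum, while this version keeps only a single row with the maximum. When all rows in a duplicated group share the same 'D', the mask version removes all of them, while this version keeps one.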
