[ pandas query rows by list ]

I have a pandas data frame and want to return the rows from the data frame corresponding to the customer ids that appear in a list of target ids.

For example, if my data frame looks like this:

id    Name    ...    ...
-------------------------
1     Bob     ...    ...
2     Dave    ...    ...
2     Dave    ...    ...
3     Phil    ...    ...
4     Rick    ...    ...
4     Rick    ...    ...

Basically, I want to return the rows for customers who appear more than once in this data frame, i.e. every row whose id occurs more than once:

id    Name    ...    ...
-------------------------
2     Dave    ...    ...
2     Dave    ...    ...
4     Rick    ...    ...
4     Rick    ...    ...

I can get a list of those ids by doing the following:

grouped_ids = df.groupby('id').size()
id_list = grouped_ids[grouped_ids>1].index.tolist()

And now I'd like to go back to the data frame and return all the rows corresponding to those ids in the list.

Is this possible?

Thanks for the help.

Answer 1


I think you are looking for isin():

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'customer_id':range(5), 'A':('a', 'b', 'c', 'd', 'e')})

In [3]: df
Out[3]: 
   A  customer_id
0  a            0
1  b            1
2  c            2
3  d            3
4  e            4

In [4]: df[df.customer_id.isin((1,3))]
Out[4]: 
   A  customer_id
1  b            1
3  d            3

[edit] To match a given target list, just pass it as the argument to the isin() method:

In [5]: mylist = (1,3)

In [6]: df[df.customer_id.isin(mylist)]
Out[6]: 
   A  customer_id
1  b            1
3  d            3
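
Applied to the data frame from the question, this might look like the sketch below. The sample data is reconstructed from the question's table with the elided columns left out, and the names repeated and repeated_alt are made up for illustration. The last step is an alternative that is not part of the answer above: Series.duplicated(keep=False) marks every row whose id occurs more than once, so it should give the same rows without building id_list first.

```python
import pandas as pd

# Sample data mirroring the question's table (assumption: the "..." columns are omitted)
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 4],
    "Name": ["Bob", "Dave", "Dave", "Phil", "Rick", "Rick"],
})

# ids that occur more than once, computed exactly as in the question
grouped_ids = df.groupby("id").size()
id_list = grouped_ids[grouped_ids > 1].index.tolist()

# Keep only the rows whose id appears in the target list
repeated = df[df["id"].isin(id_list)]

# Alternative: flag all rows with a repeated id directly, skipping the list
repeated_alt = df[df["id"].duplicated(keep=False)]

print(repeated)
```

The isin() version is the more general one, since id_list can come from anywhere; the duplicated() version only covers the "occurs more than once" case.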