[ Pandas: Get an if statement/.loc to return the index for that row ]

I've got a dataframe with 2 columns and I'm adding a 3rd.

I want the 3rd column to depend on the value of the 2nd, returning either a set answer or the corresponding index label for that row.

An example of the dataframe is below:

print (df)
            Amount      Percentage
Country      
Belgium      20           .0952
France       50           .2380
Germany      60           .2857
UK           80           .3809

Now I want my new third column to say 'Other' if the percentage is below 25% and to say the name of the country if the percentage is above 25%. So this is what I've written:

df['Country']='Other'
df.loc[df['Percentage']>0.25, 'Country']=df.index

Unfortunately my output doesn't give the equivalent index; it just gives the index in order:

 print (df)
            Amount      Percentage      Country
Country      
Belgium      20           .0952         Other
France       50           .2380         Other
Germany      60           .2857         Belgium
UK           80           .3809         France

Obviously I want to see Germany across from Germany and UK across from UK. How can I get it to give me the index which is in the same row as the number which trips the threshold in my code?

Answer 1


You can try numpy.where:

import numpy as np

df['Country'] = np.where(df['Percentage']>0.25, df.index, 'Other')
print(df)
         Amount  Percentage  Country
Country                             
Belgium      20      0.0952    Other
France       50      0.2380    Other
Germany      60      0.2857  Germany
UK           80      0.3809       UK

Or create a Series from the index with to_series, so that the assignment aligns on the index labels:

df['Country']='Other'
df.loc[df['Percentage']>0.25, 'Country']=df.index.to_series()
print(df)
         Amount  Percentage  Country
Country                             
Belgium      20      0.0952    Other
France       50      0.2380    Other
Germany      60      0.2857  Germany
UK           80      0.3809       UK
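
For anyone who wants to run both variants end to end, here is a minimal sketch. It assumes the frame from the question can be rebuilt as shown (the construction itself is my guess at how the data was created), and the names via_where and via_series are only there to keep the two results apart:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; the index holds the country names.
df = pd.DataFrame(
    {"Amount": [20, 50, 60, 80],
     "Percentage": [0.0952, 0.2380, 0.2857, 0.3809]},
    index=pd.Index(["Belgium", "France", "Germany", "UK"], name="Country"),
)

# Variant 1: pick, row by row, either the index label or the string 'Other'.
via_where = df.copy()
via_where["Country"] = np.where(via_where["Percentage"] > 0.25,
                                via_where.index, "Other")

# Variant 2: start from 'Other', then assign a Series built from the index.
# Because the right-hand side is a Series, pandas matches it to the selected
# rows by label rather than by position.
via_series = df.copy()
via_series["Country"] = "Other"
via_series.loc[via_series["Percentage"] > 0.25, "Country"] = via_series.index.to_series()

print(via_where)
print(via_series)
```

Both copies should end up with Germany and UK next to their own rows and Other everywhere else.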

Answer 2


To use the method you were trying to implement:

df['Country'] = 'Other'
df.loc[df['Percentage'] > 0.25, 'Country'] = df.loc[df['Percentage'] > 0.25].index

>>> df
         Amount  Percentage  Country
Country                             
Belgium      20      0.0952    Other
France       50      0.2380    Other
Germany      60      0.2857  Germany
UK           80      0.3809       UK

Because the filter is the same on both sides, on large datasets it is often best to build the mask once and reuse it, so that the comparison only runs once:

mask = df['Percentage'] > 0.25
df.loc[mask, 'Country'] = df.loc[mask].index

# Delete the mask once finished with it to save memory if needed.
del mask
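
If you need this in more than one place, the mask approach can be wrapped in a small function. This is only a sketch: label_above_threshold is a hypothetical helper name (not part of pandas), and the example frame is rebuilt here on the assumption that it looks like the one in the question.

```python
import pandas as pd


def label_above_threshold(df, column, threshold, default="Other"):
    """Return the index label where df[column] > threshold, else `default`.

    Hypothetical helper for illustration only; not a pandas function.
    """
    mask = df[column] > threshold              # comparison is evaluated once
    labels = pd.Series(default, index=df.index, dtype=object)
    # Both sides select the same rows in the same order, so the labels line up.
    labels.loc[mask] = df.loc[mask].index
    return labels


# Assumed reconstruction of the frame from the question.
df = pd.DataFrame(
    {"Amount": [20, 50, 60, 80],
     "Percentage": [0.0952, 0.2380, 0.2857, 0.3809]},
    index=pd.Index(["Belgium", "France", "Germany", "UK"], name="Country"),
)

df["Country"] = label_above_threshold(df, "Percentage", 0.25)
print(df)
```

Since the mask only lives inside the function, there is nothing to delete afterwards.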