[ Apply regex replace to python pandas data frame ]
I am trying to remove the last octet from IP addresses stored in a pandas data frame column.
Currently, I am trying to run the following code:
import re

def rem_last_oct(ip):
    return re.sub(r'\d+$', '', ip)
    # also tried running with plain string manipulation:
    # return ''.join(str(ip).rpartition('.')[:1])

df['cut_ipaddress'] = df['ipaddress'].apply(rem_last_oct)
For some reason, the function works properly on plain strings, but when I run it on the data frame with apply it returns empty strings rather than the first three octets.
What is the correct way to do this?
Answer 1
You can use the replace method of the str attribute (to access string manipulation functions, see docs):
In [11]: s = pd.Series(["22.231.113.64", "194.66.82.11"])
In [12]: s
Out[12]:
0 22.231.113.64
1 194.66.82.11
dtype: object
In [14]: s.str.replace(r'\d+$', '', regex=True)
Out[14]:
0 22.231.113.
1 194.66.82.
dtype: object
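As a standalone version of the session above, here is a minimal sketch. It passes regex=True explicitly, since newer pandas releases treat the pattern as a literal string unless told otherwise. The rsplit variant is an addition that is not part of the original answer; it avoids regex altogether and also drops the trailing dot. The variable names trimmed and prefix are illustrative only.

```python
import pandas as pd

s = pd.Series(["22.231.113.64", "194.66.82.11"])

# Regex approach: strip the trailing run of digits (the last octet).
# regex=True is spelled out because the default differs between pandas versions.
trimmed = s.str.replace(r"\d+$", "", regex=True)

# Non-regex alternative (an assumption about what is wanted): split once
# from the right on '.', and keep the left-hand part, without the dot.
prefix = s.str.rsplit(".", n=1).str[0]

print(trimmed.tolist())
print(prefix.tolist())
```

Which of the two fits depends on whether the trailing dot should be kept.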
By the way, your approach above does work for me. That is, it works on a Series:
In [20]: s.apply(rem_last_oct)
Out[20]:
0 22.231.113.
1 194.66.82.
dtype: object
Accessing the column with df['ipaddress'] normally returns a Series as well, so this should also work. What error message do you get?
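If no error is raised and the result really is a column of empty strings, one thing worth checking (a guess, not a diagnosis) is whether the values actually contain dots. Both the regex and the rpartition variant return an empty string for input made up only of digits, for example an address stored in integer form. The sketch below uses plain Python; the sample value "383480128" and the helper name rem_last_oct_partition are made up for illustration.

```python
import re

def rem_last_oct(ip):
    return re.sub(r'\d+$', '', ip)

def rem_last_oct_partition(ip):
    # The string-manipulation variant from the question.
    return ''.join(str(ip).rpartition('.')[:1])

# Dotted input behaves as intended.
dotted = rem_last_oct("22.231.113.64")

# Hypothetical all-digit input: \d+$ matches the whole string, leaving ''.
all_digits = rem_last_oct("383480128")

# Without a '.', rpartition returns ('', '', value), so the first part is ''.
no_dot = rem_last_oct_partition("383480128")

# A trailing space means '$' no longer follows a digit, so nothing is removed.
trailing_space = rem_last_oct("22.231.113.64 ")

print(repr(dotted), repr(all_digits), repr(no_dot), repr(trailing_space))
```

Looking at df['ipaddress'].head() and df['ipaddress'].dtype should show quickly whether the column holds dotted strings or something else.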