[ How to regex the value of a specific key in Python ]
I have a long string with key-value pairs in this format:
"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"
I want to get the values (the infotext strings) of all "info" keys. How can this be done?
Answer 1
Use the json, Luke
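import json
s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'
def pairs_hook(pairs):
    # Keep only the values whose key is "info"; duplicates survive because
    # the hook receives the raw (key, value) pairs before any dict is built.
    return [val for key, val in pairs if key == 'info']
# Wrap the fragment in braces so it parses as a single JSON object.
p = json.loads('{' + s + '}', object_pairs_hook=pairs_hook)
print(p)  # ['infotext', 'infotext2', 'infotext3']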
From the json module documentation:
object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict.
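To see why the hook is needed at all, here is a quick comparison (variable names are just for illustration): without the hook, json.loads builds an ordinary dict, and the repeated "info" keys overwrite one another so only the last one survives.

```python
import json

s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'

# Without a hook, json builds a plain dict: each repeated "info" key
# overwrites the previous one, so only the last value is kept.
as_dict = json.loads('{' + s + '}')
print(as_dict)  # {'info': 'infotext3', 'day': 'today', '12': 'here'}

# With object_pairs_hook, every (key, value) pair is passed in order,
# duplicates included, and the hook's return value replaces the dict.
def pairs_hook(pairs):
    return [val for key, val in pairs if key == 'info']

as_list = json.loads('{' + s + '}', object_pairs_hook=pairs_hook)
print(as_list)  # ['infotext', 'infotext2', 'infotext3']
```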
Just for the sake of completeness, here's a regular expression that does the same:
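import re
rg = r'''(?x)
    "info"              # the literal key
    \s* : \s*           # colon, optionally surrounded by whitespace
    "                   # opening quote of the value
    (                   # capture the value:
        (?:\\.|[^"])*   #   escaped characters, or anything except a quote
    )
    "                   # closing quote
'''
print(re.findall(rg, s))  # ['infotext', 'infotext2', 'infotext3']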
This also handles whitespace around the colon and escaped quotes inside the value, e.g.:
"info" : "some \"interesting\" information"
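One difference from the json approach worth knowing: the regex returns the text exactly as it appears between the quotes, so escape sequences stay escaped, while json.loads decodes them. A small sketch, assuming the input is otherwise valid JSON once wrapped in braces (t, raw and decoded are illustrative names):

```python
import json
import re

# Same verbose pattern as above, repeated so this snippet runs on its own.
rg = r'''(?x)
    "info"
    \s* : \s*
    "
    ( (?:\\.|[^"])* )
    "
'''

# Raw string literal, so the backslashes reach the parsers unchanged.
t = r'"info" : "some \"interesting\" information"'

# The regex captures the raw text between the quotes; \" stays as \".
raw = re.findall(rg, t)
print(raw)  # ['some \\"interesting\\" information']

# json decodes the string literal, so the escapes are resolved.
decoded = json.loads('{' + t + '}',
                     object_pairs_hook=lambda pairs: [v for k, v in pairs if k == 'info'])
print(decoded)  # ['some "interesting" information']
```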
Answer 2
As long as your infotext does not contain (escaped) quotes, you could try something like this:
>>> import re
>>> m = re.findall(r'"info":"([^"]+)', s)
>>> m
['infotext', 'infotext2', 'infotext3']
We simply match "info":" and then as many non-" characters as possible; that run is captured by the group, so re.findall returns it.
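To illustrate that caveat, plus a second one (the names are illustrative): an escaped quote cuts the match short, and because [^"]+ needs at least one character, an empty value is not returned at all.

```python
import re

pattern = r'"info":"([^"]+)'

# The quote after the backslash ends the [^"]+ run, truncating the value.
escaped = r'"info":"say \"hi\"","day":"today"'
print(re.findall(pattern, escaped))  # ['say \\']

# [^"]+ requires at least one character, so the empty value is skipped.
empty = '"info":"","info":"infotext2"'
print(re.findall(pattern, empty))  # ['infotext2']
```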
Answer 3
Use this regex: (?<="info":")(.+?)(?=")
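Used with re.findall, it might look like this (the variable names are illustrative). Since the lookbehind and lookahead are zero-width, the match is already just the value; the capturing group does not change the result here. One caveat: .+? must consume at least one character, so an empty value is mishandled.

```python
import re

s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'

# The lookbehind requires '"info":"' right before the match and the lookahead
# requires '"' right after it; neither is part of the returned text.
pattern = r'(?<="info":")(.+?)(?=")'
print(re.findall(pattern, s))  # ['infotext', 'infotext2', 'infotext3']

# With an empty value, .+? swallows the closing quote and stops at the
# next quote instead, returning garbage rather than ''.
print(re.findall(pattern, '"info":"","day":"today"'))  # ['",']
```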
Answer 4
import re
strs = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'
# Grab each complete "info":"word" pair, then keep what follows the last colon, minus the quotes.
print([x.split(":")[-1].strip('"') for x in re.findall(r'"info":"\w+"', strs)])
# ['infotext', 'infotext2', 'infotext3']
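Note that \w only matches letters, digits and underscore. A quick check with some hypothetical values containing a space and a hyphen shows they are silently dropped rather than reported:

```python
import re

# "two words" contains a space and "v-1" a hyphen, so \w+ cannot reach
# the closing quote for either; only "infotext2" matches.
strs = '"info":"two words","info":"infotext2","info":"v-1"'
result = [x.split(":")[-1].strip('"') for x in re.findall(r'"info":"\w+"', strs)]
print(result)  # ['infotext2']
```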