
[ how to regex a value of a specific key in python ]

I have a long string of key-value pairs in this format:

"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"

I want to get the values (the infotexts) of all "info" keys. How can this be done?

Answer 1


Use the json, Luke

s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'

import json

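def pairs_hook(pairs):
    # pairs is the ordered list of (key, value) tuples for one JSON object
    return [val for key, val in pairs if key == 'info']

# Wrap the string in braces so it parses as a single JSON object
p = json.loads('{' + s + '}', object_pairs_hook=pairs_hook)
print(p)  # ['infotext', 'infotext2', 'infotext3']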

From the docs:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict.
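
To see why the hook is needed here, a minimal sketch (the sample string doc is made up for illustration): a plain json.loads keeps only the last value of a duplicated key, while object_pairs_hook receives every pair in order.

```python
import json

# Hypothetical sample with a duplicated "info" key, for illustration only.
doc = '{"info":"a","day":"today","info":"b"}'

# Without a hook, the result is a dict, so the later "info" overwrites the earlier one.
plain = json.loads(doc)

# Passing list as the hook returns the raw ordered list of (key, value) tuples.
pairs = json.loads(doc, object_pairs_hook=list)

print(plain)
print(pairs)
```

The filtering hook in the answer works on exactly this list of tuples.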

Just for the sake of completeness, here's a regular expression that does the same:

import re

rg = r'''(?x)

    "info"
    \s* : \s*
    "
        (
            (?:\\.|[^"])*
        )
    "
'''
re.findall(rg, s) # ['infotext', 'infotext2', 'infotext3']

This also handles spaces around : and escaped quotes inside strings, e.g.

 "info"  :   "some \"interesting\" information"

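A quick check of that claim, as a sketch: the variable name tricky and its contents are assumptions chosen for illustration, and the pattern is the same one as above.

```python
import re

rg = r'''(?x)
    "info"
    \s* : \s*
    "
        (
            (?:\\.|[^"])*
        )
    "
'''

# Hypothetical input: extra spaces around the colon and escaped quotes in the value.
tricky = r'"info"  :   "some \"interesting\" information","day":"today"'

values = re.findall(rg, tricky)
print(values)
```

Note that the captured text still contains the backslashes. The regex only finds the value; it does not unescape it the way json.loads would.
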
Answer 2


As long as your infotext does not contain (escaped) quotes, you could try something like this:

>>> import re
>>> m = re.findall(r'"info":"([^"]+)', s)  # s holds the input string
>>> m
['infotext', 'infotext2', 'infotext3']

We simply match "info":" and then as many non-" characters as possible (which are captured and thus returned).
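
To illustrate the caveat about escaped quotes, here is a small sketch with a made-up input (the name quoted and its contents are assumptions): the capture stops at the first quote character, even an escaped one.

```python
import re

# Hypothetical input whose first value contains escaped quotes.
quoted = r'"info":"say \"hi\"","info":"plain"'

# [^"]+ cannot cross any quote, so the first value is cut off at the backslash.
truncated = re.findall(r'"info":"([^"]+)', quoted)
print(truncated)
```

The first value comes back truncated, so this pattern is only safe when the values are known to be quote-free.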

Answer 3


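Use this regex: (?<="info":")(.+?)(?=")

The lookbehind anchors the match right after "info":" and the lookahead stops it at the next quote, so only the value itself is matched.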
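
A minimal usage sketch with Python's re module, run against the string from the question (the variable names s, pattern and values are chosen here for illustration):

```python
import re

s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'

# Fixed-width lookbehind for the key, lazy match for the value, lookahead for the closing quote.
pattern = r'(?<="info":")(.+?)(?=")'

values = re.findall(pattern, s)
print(values)
```

Like the previous answer, this assumes the values contain no quotes and that there is no whitespace around the colon.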

Answer 4


In [140]: import re

In [141]: strs='''"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'''

In [146]: [x.split(":")[-1].strip('"') for x in re.findall(r'"info":"\w+"', strs)]
Out[146]: ['infotext', 'infotext2', 'infotext3']