
[ Extract runs of consecutive capital letters from a sentence in one line, preferably using reduce or map in Python ]

I am trying to get all runs of consecutive capital letters in a sentence. I tried the following, which gives 'LJ' as output:

I can't figure out why it doesn't add STRR and HLLJ to the list and returns only LJ instead. Does it treat [""] as a string?

from functools import reduce  # on Python 3, reduce lives in functools
reduce(lambda x, y: x[-1] + y if y.isupper() or y.isspace() else x, "STRR hello HLLJ", [""])
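Tracing the fold one character at a time shows where the list disappears. The sketch below re-runs the same lambda as a named function (`step` is a hypothetical name used only for illustration). On the first call x is the list [""], so x[-1] is "" and the result is the string "S". From then on x is a string, x[-1] is only its last character, and every step keeps at most two characters.

```python
from functools import reduce

def step(x, y):
    # Same logic as the lambda above: x[-1] is the last list element only on
    # the first call; once x is a string it is just the last character
    return x[-1] + y if y.isupper() or y.isspace() else x

acc = [""]
history = []
for ch in "STRR hello HLLJ":
    acc = step(acc, ch)
    history.append(acc)
    print(repr(ch), "->", repr(acc))

# reduce performs exactly the same left-to-right fold
print(reduce(step, "STRR hello HLLJ", [""]))  # prints LJ
```

The third line printed is 'R' -> 'TR', so the leading S is already gone. To accumulate a list, the lambda has to return a list on every branch, for example x[:-1] + [x[-1] + y].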

My input is STRR hello HLLJ and I want the output ["STRR", "HLLJ"].

Test case: for ABCD AAA lkjl JJJJJJ it should give ["ABCD AAA", "JJJJJJ"].

Any help is appreciated.

Using reduce, I finally came up with this, but it's not efficient:

from functools import reduce  # on Python 3, reduce lives in functools
reduce(lambda x, y: x[:-1] + [x[-1] + y] if y.isupper() or y.isspace() else x + [""] if x[-1].strip() != "" else x, "STRR Hello HLLJ", [""])

Answer 1


Finding patterns in strings is what the re module is for:

In [1]: import re
In [2]: re.findall("[A-Z]+(?: [A-Z]+)*", "ABCD AAA lkjl JJJJJJ")
Out[2]: ['ABCD AAA', 'JJJJJJ']

Or, if you don't want to include capital letters that are part of a mixed-case word, you can exclude them using word-boundary anchors:

In [3]: re.findall(r"\b[A-Z]+(?: [A-Z]+)*\b", "ABCD AAA Lkjl JJJJJJ")
Out[3]: ['ABCD AAA', 'JJJJJJ']
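The difference the anchors make is easiest to see on inputs that contain a capitalised word such as Hello or Lkjl. A short comparison, assuming Python 3 and only the standard re module (`loose` and `strict` are illustrative names):

```python
import re

# Pattern from this answer: runs of A-Z joined by single spaces
loose = re.compile(r"[A-Z]+(?: [A-Z]+)*")
# Same pattern with \b anchors, so a match cannot stop inside a word
strict = re.compile(r"\b[A-Z]+(?: [A-Z]+)*\b")

samples = ["STRR hello HLLJ", "STRR Hello HLLJ", "ABCD AAA Lkjl JJJJJJ"]
for text in samples:
    print(repr(text), loose.findall(text), strict.findall(text))
```

Without the anchors, the capital that starts Hello or Lkjl is glued onto the preceding run ('STRR H', 'ABCD AAA L'); with them, both inputs give only the all-caps words.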

Caveat: This only looks for ASCII letters.
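A small illustration of that caveat, assuming Python 3 strings: [A-Z] is a literal range over the 26 ASCII capitals, whereas str.isupper() consults Unicode case data. The sample text is written with escapes so that É is unambiguously the single code point U+00C9.

```python
import itertools
import re

text = "\u00c9COLE ferm\u00e9e"  # "ÉCOLE fermée"

# [A-Z] only covers ASCII capitals, so the accented first letter is skipped
ascii_runs = re.findall(r"[A-Z]+", text)

# str.isupper() uses Unicode case information, so É counts as uppercase
unicode_runs = ["".join(g) for k, g in itertools.groupby(text, key=str.isupper) if k]

print(ascii_runs)    # ['COLE']
print(unicode_runs)  # ['ÉCOLE']
```

If a regex is still preferred for non-ASCII text, the third-party regex package supports Unicode property classes such as \p{Lu}; the standard re module does not.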

Answer 2


With a regular expression and re.findall

>>> asd="HELLO worLD"
>>> import re
>>> re.findall(r"[A-Z\s]+", asd)
['HELLO ', 'LD']

Explanation:

  • [A-Z\s]+ matches one or more consecutive capital letters or whitespace characters
  • findall returns a list of all matches.
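Because \s is inside the character class, each match also carries the spaces next to it, so the raw result is not quite the list asked for in the question. A minimal post-processing sketch (the helper name capital_runs is hypothetical) that trims those spaces and drops matches consisting only of whitespace:

```python
import re

def capital_runs(s):
    # [A-Z\s]+ also swallows the spaces around each run, so strip them
    # and discard matches that were whitespace only
    matches = (m.strip() for m in re.findall(r"[A-Z\s]+", s))
    return [m for m in matches if m]

print(capital_runs("HELLO worLD"))           # ['HELLO', 'LD']
print(capital_runs("STRR hello HLLJ"))       # ['STRR', 'HLLJ']
print(capital_runs("ABCD AAA lkjl JJJJJJ"))  # ['ABCD AAA', 'JJJJJJ']
```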

Answer 3


Just for completeness, another solution using itertools.groupby:

>>> import itertools
>>> s = "STRR hello HLLJ"
>>> [''.join(g) for k, g in itertools.groupby(s, key=str.isupper) if k]
['STRR', 'HLLJ']
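Grouping on str.isupper alone splits ABCD AAA from the second test case into two items, because the space between them forms its own group. A sketch of one way to keep such runs together, assuming whitespace between capital words should stay inside a run (upper_runs is a hypothetical helper name):

```python
import itertools

def upper_runs(s):
    # Group consecutive characters by "uppercase or whitespace", keep the
    # True groups, then trim edge spaces and drop whitespace-only groups
    groups = itertools.groupby(s, key=lambda c: c.isupper() or c.isspace())
    runs = ("".join(g).strip() for is_run, g in groups if is_run)
    return [r for r in runs if r]

s = "ABCD AAA lkjl JJJJJJ"
plain = ["".join(g) for k, g in itertools.groupby(s, key=str.isupper) if k]
print(plain)          # ['ABCD', 'AAA', 'JJJJJJ']
print(upper_runs(s))  # ['ABCD AAA', 'JJJJJJ']
```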

Answer 4


This is my version using reduce. For the input "STRR Hello HLLJ" it gives ['STRR H', ' HLLJ'], so it still picks up the H of Hello and a leading space. It's nice to see different ways to solve this.

from functools import reduce  # on Python 3, reduce lives in functools
reduce(lambda x, y: x[:-1] + [x[-1] + y] if y.isupper() or y.isspace() else x + [""] if x[-1].strip() != "" else x, "STRR Hello HLLJ", [""])

I know this is not efficient.
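For comparison, a reduce-based sketch that avoids the stray H by folding over whitespace-separated words rather than single characters. It assumes that a run should consist of whole words for which str.isupper() is True (so Hello or worLD never contribute), which roughly mirrors the \b pattern in the first answer; fold_word and capital_runs are hypothetical names. The accumulator is a pair: the runs found so far, and whether the previous word was all caps.

```python
from functools import reduce

def fold_word(acc, word):
    runs, prev_upper = acc
    if word.isupper():
        if prev_upper:
            # Previous word was also all caps: extend the current run
            return runs[:-1] + [runs[-1] + " " + word], True
        # Otherwise start a new run
        return runs + [word], True
    # A word that is not all caps ends any current run
    return runs, False

def capital_runs(s):
    runs, _ = reduce(fold_word, s.split(), ([], False))
    return runs

print(capital_runs("STRR Hello HLLJ"))       # ['STRR', 'HLLJ']
print(capital_runs("ABCD AAA lkjl JJJJJJ"))  # ['ABCD AAA', 'JJJJJJ']
```

Like the version above, runs[:-1] + [...] copies the list on each step, so a plain loop that appends would be cheaper for long inputs. Note also that s.split() collapses repeated spaces inside a run.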