[ Get runs of consecutive capital letters from a sentence in one line, preferably using reduce or map in Python ]
Here I am trying to get all the runs of consecutive capital letters in a sentence. I have tried the following, which gives 'LJ' as output. I haven't been able to figure out why it doesn't add STRR and HLLJ to the list but returns only LJ instead. Does it treat [""] as a string?
reduce(lambda x ,y : x[-1] + (y) if y.isupper() or y.isspace() else x,"STRR hello HLLJ",[""])
My input is:
STRR hello HLLJ
and I want to get ["STRR","HLLJ"] as output.
Test case:
ABCD AAA lkjl JJJJJJ
Here it should give ["ABCD AAA","JJJJJJ"].
Any help is appreciated.
Using reduce, I finally came up with this, but it's not efficient:
reduce(lambda x, y : x[0:len(x)-1] + [x[-1]+y] if y.isupper() or y.isspace() else x + [""] if x[-1].strip() != "" else x,"STRR Hello HLLJ", [""])
Answer 1
Finding patterns in strings is what the re module is for:
In [1]: import re
In [2]: re.findall("[A-Z]+(?: [A-Z]+)*", "ABCD AAA lkjl JJJJJJ")
Out[2]: ['ABCD AAA', 'JJJJJJ']
Or, if you don't want to include capital letters that are part of another word, you can exclude them using word boundary anchors:
In [3]: re.findall(r"\b[A-Z]+(?: [A-Z]+)*\b", "ABCD AAA Lkjl JJJJJJ")
Out[3]: ['ABCD AAA', 'JJJJJJ']
Caveat: This only looks for ASCII letters.
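To see what the word boundary anchors change, here is a minimal sketch that runs both patterns from this answer on the same input. The helper name capital_runs_re and its word_bounded flag are made up for illustration; they are not part of the re module.

```python
import re

# Both patterns are copied from the answer above; only the wrapper is new.
PLAIN = re.compile(r"[A-Z]+(?: [A-Z]+)*")
BOUNDED = re.compile(r"\b[A-Z]+(?: [A-Z]+)*\b")

def capital_runs_re(text, word_bounded=True):
    """Hypothetical helper: runs of ASCII capitals joined by single spaces."""
    pattern = BOUNDED if word_bounded else PLAIN
    return pattern.findall(text)

s = "ABCD AAA Lkjl JJJJJJ"
print(capital_runs_re(s, word_bounded=False))  # ['ABCD AAA L', 'JJJJJJ']
print(capital_runs_re(s))                      # ['ABCD AAA', 'JJJJJJ']
```

Without the anchors, the leading L of Lkjl is pulled into the first match; with them, the engine backtracks to the last position that ends on a word boundary.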
Answer 2
With a regular expression and re.findall:
>>> asd="HELLO worLD"
>>> import re
>>> re.findall(r"[A-Z\s]+",asd)
['HELLO ', 'LD']
Explanation:
[A-Z\s]+ matches one or more consecutive capital letters or whitespace characters, so a match can carry leading or trailing whitespace. findall returns a list of all matches.
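Because this pattern also matches whitespace, the raw matches may need trimming before they look like the output the question asks for. A minimal sketch, assuming that stripping each match and dropping whitespace-only matches is acceptable; capital_runs_ws is a hypothetical name:

```python
import re

def capital_runs_ws(text):
    # Hypothetical helper: strip each match and drop whitespace-only matches.
    matches = re.findall(r"[A-Z\s]+", text)
    return [m.strip() for m in matches if m.strip()]

print(capital_runs_ws("STRR hello HLLJ"))       # ['STRR', 'HLLJ']
print(capital_runs_ws("ABCD AAA lkjl JJJJJJ"))  # ['ABCD AAA', 'JJJJJJ']
```

Whitespace inside a run (for example a double space between two capitalized words) is left as it is.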
Answer 3
Just for completeness, another solution using itertools.groupby:
>>> import itertools
>>> s = "STRR hello HLLJ"
>>> [''.join(g) for k, g in itertools.groupby(s, key=str.isupper) if k]
['STRR', 'HLLJ']
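Grouping on str.isupper alone splits at every space, so the question's test case would come back as three separate runs. A possible variation, sketched under the assumption that whitespace between capitalized words should stay inside the run; capital_runs_groupby and in_run are hypothetical names:

```python
import itertools

def in_run(ch):
    # A character belongs to a run if it is uppercase or whitespace.
    return ch.isupper() or ch.isspace()

def capital_runs_groupby(text):
    # Hypothetical helper: group consecutive run characters, then trim the
    # whitespace at the edges and drop groups that were whitespace only.
    groups = itertools.groupby(text, key=in_run)
    runs = ("".join(g).strip() for k, g in groups if k)
    return [r for r in runs if r]

print(capital_runs_groupby("ABCD AAA lkjl JJJJJJ"))  # ['ABCD AAA', 'JJJJJJ']
print(capital_runs_groupby("STRR hello HLLJ"))       # ['STRR', 'HLLJ']
```

Since str.isupper is Unicode-aware, this variant is not limited to ASCII letters.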
Answer 4
This is my version using reduce, which works as it should: for the input "STRR Hello HLLJ" it gives the output ['STRR H', ' HLLJ']. It's nice to see different ways to solve this.
reduce(lambda x, y : x[0:len(x)-1] + [x[-1]+y] if y.isupper() or y.isspace() else x + [""] if x[-1].strip() != "" else x,"STRR Hello HLLJ", [""])
I know this is not efficient.
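As for why the first attempt in the question returns 'LJ': its lambda returns x[-1] + y, which is a string, so after the first capital letter the accumulator is no longer the list [""] and x[-1] becomes the last character of that string. The sketch below reproduces this and shows one way to keep the list as the accumulator. The names step and capital_runs_reduce are hypothetical, and the trimming of spaces is an assumption about the desired output.

```python
from functools import reduce  # in Python 3, reduce lives in functools

# The question's first attempt: the accumulator turns into a str.
buggy = reduce(lambda x, y: x[-1] + y if y.isupper() or y.isspace() else x,
               "STRR hello HLLJ", [""])
print(buggy)  # LJ

def step(runs, ch):
    """Hypothetical helper: fold one character into the list of runs."""
    if ch.isupper() or (ch.isspace() and runs[-1]):
        # Extend the current run; the list itself stays the accumulator.
        return runs[:-1] + [runs[-1] + ch]
    if runs[-1]:
        # Any other character closes a non-empty run and opens a new one.
        return runs[:-1] + [runs[-1].rstrip(), ""]
    return runs

def capital_runs_reduce(text):
    runs = reduce(step, text, [""])
    return [r.rstrip() for r in runs if r.strip()]

print(capital_runs_reduce("STRR hello HLLJ"))       # ['STRR', 'HLLJ']
print(capital_runs_reduce("ABCD AAA lkjl JJJJJJ"))  # ['ABCD AAA', 'JJJJJJ']
```

Like the one-liner above, this rebuilds the list on every character, so it is still not efficient; it only makes the accumulator handling explicit.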