[ Curly Braces with {} grep and regular expressions: Why does it exceed the maximum value? ]
I've been self-studying shell scripting for a while now, and I came across a section of a Linux Fundamentals manual about grep and curly braces {}. My problem is that when I use {} to ask grep for a pattern that occurs between a minimum and a maximum number of times, the results include lines that exceed the maximum I specified.
Here is what happened:
Express11:~/unix_training/reg_ex # cat reg_file2
ll
lol
lool
loool
loooose
Express11:~/unix_training/reg_ex # grep -E 'o{2,3}' reg_file2
lool
loool
loooose
Express11:~/unix_training/reg_ex #
According to the manual, this should not be the case, since I am specifying that I am only looking for strings containing two to three consecutive o's.
EDIT: Actually, the reason why I did not understand how the curly braces worked was because of this simplistic explanation by the manual. And I quote:
19.4.10. between n and m times
And here we demand exactly from minimum 2 to maximum 3 times.
paul@debian7:~$ cat list2
ll
lol
lool
loool
paul@debian7:~$ grep -E 'o{2,3}' list2
lool
loool
paul@debian7:~$ grep 'o\{2,3\}' list2
lool
loool
paul@debian7:~$ cat list2 | sed 's/o\{2,3\}/A/'
ll
lol
lAl
lAl
paul@debian7:~$
Thanks to all those who replied.
Answer 1
# grep -E 'o{2,3}' reg_file2
lool
loool
loooose
The command works correctly: it matches the first three o's in the last line, which is why the last line also appears in the output.
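To see which part of each line the pattern actually matched, one option is the -o flag, which prints only the matched text. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The temporary file and the variable names are illustrative only.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' ll lol lool loool loooose > "$sample"

# -o prints only the matched text, one match per output line.
matched=$(grep -oE 'o{2,3}' "$sample")
printf '%s\n' "$matched"

rm -f "$sample"
```

This prints oo, ooo and ooo: in loooose the pattern takes three of the four o's, and the single o left over is too short to match again.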
I think the command you're actually looking for is,
$ grep -P '(?<!o)o{2,3}(?!o)' file
lool
loool
Explanation:
(?<!o) is a negative lookbehind, which asserts that the match is not preceded by the letter o.
o{2,3} matches 2 or 3 o's.
(?!o) is a negative lookahead, which asserts that the match is not followed by the letter o.
OR
$ grep -E '(^|[^o])o{2,3}($|[^o])' file
lool
loool
Explanation:
(^|[^o]) matches the start of a line (^) or any character other than o.
o{2,3} matches 2 or 3 o's.
($|[^o]) matches the end of the line ($) or any character other than o.
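As a quick check of the -E form, the sketch below runs it against the lines from the question plus two extra edge cases (oo and fooo, which are not in the original file and are added here only for illustration). It also shows one difference from lookarounds: the bounding characters are consumed as part of the match. The second command assumes a grep that supports -o.

```shell
# Hypothetical test input: the question's lines plus two edge cases.
input=$(printf '%s\n' ll lol lool loool loooose oo fooo)

# A run of 2-3 o's bounded on each side by a non-o character or a line edge.
selected=$(printf '%s\n' "$input" | grep -E '(^|[^o])o{2,3}($|[^o])')
printf '%s\n' "$selected"

# With -o the neighbouring characters are included in the matched text.
with_neighbors=$(printf '%s\n' lool | grep -oE '(^|[^o])o{2,3}($|[^o])')
printf '%s\n' "$with_neighbors"
```

The first command selects lool, loool, oo and fooo, and skips loooose. The second prints lool rather than oo, which is harmless when you only want to select lines but matters if you extract or replace the match.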
Answer 2
You have misunderstood how the regex is applied.
The pattern o{2,3} makes grep go through each line looking for oo or ooo. As long as there is a match anywhere in the line, grep prints that line. Since you didn't add any other constraints to your pattern, the output of grep -E 'o{2,3}' reg_file2 is correct.
I guess that in your case you only want runs of exactly two or three consecutive o's, so you will need a regex like the one in Raj's answer: one that matches oo or ooo that is neither preceded nor followed by the letter o.
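One way to convince yourself of this is to compare the unanchored interval with a plain search for oo. Any line containing two or three consecutive o's contains oo, and vice versa, so for line selection the upper bound has no effect. A small sketch, with illustrative variable names:

```shell
# The lines from the question's reg_file2.
input=$(printf '%s\n' ll lol lool loool loooose)

# Unanchored interval versus a plain two-character search.
with_interval=$(printf '%s\n' "$input" | grep -E 'o{2,3}')
with_plain=$(printf '%s\n' "$input" | grep 'oo')

if [ "$with_interval" = "$with_plain" ]; then
    echo "same lines selected"
fi
```

Both commands select lool, loool and loooose, which is why the maximum only becomes visible once the pattern also says what may come before and after the run of o's.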