
[ Curly Braces with {} grep and regular expressions: Why does it exceed the maximum value? ]

I've been self-studying shell scripting for a while now, and I came across a section of a Linux Fundamentals manual about grep and curly braces {}. My problem is that when I use {} to ask grep for a pattern repeated between a minimum and a maximum number of times, the results include lines that exceed the maximum I specified.

Here is what happened:

Express11:~/unix_training/reg_ex # cat reg_file2
ll
lol
lool
loool
loooose
Express11:~/unix_training/reg_ex # grep -E 'o{2,3}' reg_file2
lool
loool
loooose
Express11:~/unix_training/reg_ex #

According to the manual, this should not be the case, since I am specifying that I only want strings containing two to three consecutive o's.

EDIT: Actually, the reason I did not understand how the curly braces work is this simplistic explanation in the manual, which I quote:

19.4.10. between n and m times

And here we demand exactly from minimum 2 to maximum 3 times.

paul@debian7:~$ cat list2
ll
lol
lool
loool
paul@debian7:~$ grep -E 'o{2,3}' list2
lool
loool
paul@debian7:~$ grep 'o\{2,3\}' list2
lool
loool
paul@debian7:~$ cat list2 | sed 's/o\{2,3\}/A/'
ll
lol
lAl
lAl
paul@debian7:~$

Thanks to all those who replied.

Answer 1


# grep -E 'o{2,3}' reg_file2
lool
loool
loooose

The command works correctly: o{2,3} matches the first three o's in the last line, and grep prints every line that contains a match anywhere in it. That's why the last line also appears in the output.
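
To see which part of each line actually matched, the -o option prints only the matched text. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The temporary file just recreates reg_file2 from the question.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' ll lol lool loool loooose > "$sample"

# -o prints only the matched text, one match per output line.
matched=$(grep -oE 'o{2,3}' "$sample")
printf '%s\n' "$matched"

rm -f "$sample"
```

This should print oo, ooo and ooo: the third line comes from loooose, where the first three of the four o's satisfy o{2,3} and the leftover single o is simply ignored.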

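I think the command you're actually looking for is the following (-P enables Perl-compatible regular expressions, which requires GNU grep built with PCRE support):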

$ grep -P '(?<!o)o{2,3}(?!o)' file
lool
loool

Explanation:

  • (?<!o) Negative lookbehind which asserts that the match is not preceded by the letter o.

  • o{2,3} Matches 2 or 3 o's.

  • (?!o) Negative lookahead which asserts that the match won't be followed by the letter o.

OR

$ grep -E '(^|[^o])o{2,3}($|[^o])' file
lool
loool

Explanation:

  • (^|[^o]) Matches the start of the line ^ or any character other than o

  • o{2,3} Matches 2 or 3 o's

  • ($|[^o]) Matches the end of the line $ or any character other than o
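
One difference from the lookaround version is worth noting: the [^o] groups consume the neighbouring characters, whereas lookarounds are zero-width. That makes no difference when grep prints whole lines, but it shows up with -o. The sketch below assumes a grep that supports -o, and the sample word zoom is made up for illustration rather than taken from the question.

```shell
# Hypothetical sample line, not from the question.
line='zoom'

# Plain quantifier: only the o's are part of the match.
plain=$(printf '%s\n' "$line" | grep -oE 'o{2,3}')

# Bounded ERE pattern: the non-o neighbours become part of the match too.
bounded=$(printf '%s\n' "$line" | grep -oE '(^|[^o])o{2,3}($|[^o])')

printf 'plain:   %s\nbounded: %s\n' "$plain" "$bounded"
```

Here plain should be oo while bounded should be zoom, because z and m are matched by the two [^o] groups.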

Answer 2


It looks like you have misunderstood how regex matching works.

The pattern o{2,3} makes grep go through each line looking for oo or ooo. As long as there is a match anywhere in the line, grep prints that line. Since you didn't add any other rules to your pattern, the output of grep -E 'o{2,3}' reg_file2 is correct.
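
One way to convince yourself of this: without anything around it, the upper bound has no effect on which lines are selected, because any line containing oo already matches. A minimal sketch, again recreating reg_file2 from the question in a temporary file (the variable names are only for illustration):

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' ll lol lool loool loooose > "$sample"

# Lines selected with the bounded quantifier.
with_bounds=$(grep -E 'o{2,3}' "$sample")

# Lines selected by simply asking for two consecutive o's.
two_only=$(grep -E 'oo' "$sample")

if [ "$with_bounds" = "$two_only" ]; then
    echo "same lines selected"
fi

rm -f "$sample"
```

Both commands select lool, loool and loooose, so the 3 in {2,3} only starts to matter once the pattern also says what may come before and after the run of o's.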

I guess in your case you only want runs of exactly two or three consecutive o's, so you will need a regex like the one in Raj's answer: match oo or ooo that is neither preceded nor followed by another 'o'.