[ Grouping regex based on the previous grouping result ]
I have some parameters that I have to sort into different lists. The prefix determines which list should it belong to.
I use prefixes like: c
, a
, n
, o
and an additional hyphen (-
) to determine whether to put it in include l it or exclude list.
I use the regex grouped as:
/^(-?)([o|a|c|n])(\w+)/
But here the third group (\w+
) is not generic, and it should actually be dependent on the second group's result. I.e, if the prefix is:
- 'c' or 'a' ->
/\w{3}/
- 'o' ->
/\w{2}/
- else ->
/\w+/
Can I do this with a single regex? Currently I am using an if
condition to do so.
Example input:
Valid:
"-cABS", "-aXYZ", "-oWE", "-oqr", "-ncanbeanyting", "nstillanything", "a123", "-conT" (will go to c_exclude_list)
Invalid:
"cmorethan3chars", "c1", "-a1234", "prefizisnotvalid", "somethingelse", "oABC"
Output: for each arg push to the correct list, ignore the invalid.
c_include_list, c_exclude_list, a_include_list, a_exclude_list etc.
Answer 1
You can use this pattern:
/(-?)\b([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\b/
The idea consists to use lookbehinds to check the previous character without including it in the capture group.
Answer 2
Since version 2.0, Ruby has switched from Oniguruma to Onigmo (a fork of Oniguruma), which adds support for conditional regex, among other features.
So you can use the following regex to customize the pattern based on the prefix:
^-(?:([ca])|(o)|(n))?(?(1)\w{3}|(?(2)\w{2}|(?(3)\w+)))$
Answer 3
Is a single, mind-bending regex the best way to deal with this problem?
Here's a simpler approach that does not employ a regex at all. I suspect that it would be at least as efficient as a single regex, considering that with the latter you must still assign matching strings to their respective arrays. I think it also reads better and would be easier to maintain. The code below should be easy to modify if I have misunderstood some fine points of the question.
Code
def devide_em_up(str)
h = { a_exclude: [], a_include: [], c_exclude: [], c_include: [],
o_exclude: [], o_include: [], other_exclude: [], other_include: [] }
str.split.each do |s|
exclude = (s[0] == ?-)
s = s[1..-1] if exclude
first = s[0]
s = s[1..-1] if 'cao'.include?(first)
len = s.size
case first
when 'a'
(exclude ? h[:a_exclude] : h[:a_include]) << s if len == 3
when 'c'
(exclude ? h[:c_exclude] : h[:c_include]) << s if len == 3
when 'o'
(exclude ? h[:o_exclude] : h[:o_include]) << s if len == 2
else
(exclude ? h[:other_exclude] : h[:other_include]) << s
end
end
h
end
Example
Let's try it:
str = "-cABS cABT -cDEF -aXYZ -oWE -oQR oQT -ncanbeany nstillany a123 " +
"-conT cmorethan3chars c1 -a1234 prefizisnotvalid somethingelse oABC"
devide_em_up(str)
#=> {:a_exclude=>["XYZ"], :a_include=>["123"],
# :c_exclude=>["ABS", "DEF"], :c_include=>["ABT"],
# :o_exclude=>["WE", "QR"], :o_include=>["QT"],
# :other_exclude=>["ncanbeany"], :other_include=>["nstillany"]}