grep: Basic vs Extended
3.6 Basic vs Extended Regular Expressions
=========================================
In basic regular expressions the characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and
‘)’ lose their special meaning; instead use the backslashed versions
‘\?’, ‘\+’, ‘\{’, ‘\|’, ‘\(’, and ‘\)’. Also, a backslash is needed
before an interval expression’s closing ‘}’, and an unmatched ‘\)’ is
invalid.
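
As an illustration of the correspondence described above, the following sketch runs the same search in both syntaxes. The sample input and the variable names are made up for this example.

```shell
# Hypothetical sample input, one test string per line.
input='abab
a+b
aab'

# ERE: parentheses group and braces form an interval expression.
ere_hits=$(printf '%s\n' "$input" | grep -E '(ab){2}')

# BRE: the same pattern needs backslashes, including one before the
# closing brace of the interval expression.
bre_hits=$(printf '%s\n' "$input" | grep '\(ab\)\{2\}')

# BRE: an unescaped '+' is an ordinary character, so this finds only
# the line containing the three characters a, +, b.
bre_literal=$(printf '%s\n' "$input" | grep 'a+b')

# ERE: '+' is a repetition operator, so the same pattern text now means
# "one or more 'a' followed by 'b'".
ere_plus=$(printf '%s\n' "$input" | grep -E 'a+b')

printf 'ERE (ab){2}:\n%s\n' "$ere_hits"
printf 'BRE \\(ab\\)\\{2\\}:\n%s\n' "$bre_hits"
printf 'BRE a+b:\n%s\n' "$bre_literal"
printf 'ERE a+b:\n%s\n' "$ere_plus"
```

The first two searches select the same line, while the last two show that identical pattern text can select different lines depending on the syntax in use.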
Portable scripts should avoid the following constructs, as POSIX says
they produce undefined results:
• Extended regular expressions that use back-references.
• Basic regular expressions that use ‘\?’, ‘\+’, or ‘\|’.
• Empty parenthesized regular expressions like ‘()’.
• Empty alternatives (e.g., ‘a|’).
• Repetition operators that immediately follow empty expressions,
  unescaped ‘$’, or other repetition operators.
• A backslash escaping an ordinary character (e.g., ‘\S’), unless it
  is a back-reference.
• An unescaped ‘[’ that is not part of a bracket expression.
• In extended regular expressions, an unescaped ‘{’ that is not part
  of an interval expression.
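
Several of the constructs listed above have substitutes that stay within POSIX-defined behavior. The sketch below shows a few possible rewrites; the sample input and variable names are invented for illustration, and other rewrites may suit a given script better.

```shell
# Hypothetical sample input, one test string per line.
input='aab
b
a b
ab'

# Instead of the BRE construct 'a\+b', spell out "one or more" with an
# interval expression (the pattern 'aa*b' would also work).
one_or_more=$(printf '%s\n' "$input" | grep 'a\{1,\}b')

# Instead of the BRE construct '^b$\|^ab$', give each alternative its
# own -e option.
either=$(printf '%s\n' "$input" | grep -e '^b$' -e '^ab$')

# Instead of '\S', which escapes an ordinary character, use a bracket
# expression with a character class.
no_space=$(printf '%s\n' "$input" | grep -E '^[^[:space:]]+$')

# Instead of an empty alternative such as '^(a|)b$', make the term
# optional.
optional_a=$(printf '%s\n' "$input" | grep -E '^a?b$')

printf 'one or more a, then b:\n%s\n' "$one_or_more"
printf 'b or ab:\n%s\n' "$either"
printf 'no whitespace:\n%s\n' "$no_space"
printf 'optional a, then b:\n%s\n' "$optional_a"
```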
Traditional ‘egrep’ did not support interval expressions and some
‘egrep’ implementations use ‘\{’ and ‘\}’ instead, so portable scripts
should avoid interval expressions in ‘grep -E’ patterns and should use
‘[{]’ to match a literal ‘{’.
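
A brief sketch of this advice follows, assuming a made-up sample input: the literal brace is matched through a bracket expression, and a fixed repetition is written out instead of using an interval expression.

```shell
# Hypothetical sample input, one test string per line.
input='f{x}
aa
a'

# A bracket expression matches a literal '{' without depending on how
# a particular 'grep -E' treats an unescaped brace.
brace_hits=$(printf '%s\n' "$input" | grep -E 'f[{]x')

# Instead of the interval expression 'a{2}', repeat the atom explicitly.
double_a=$(printf '%s\n' "$input" | grep -E 'aa')

printf 'literal brace:\n%s\n' "$brace_hits"
printf 'two a characters:\n%s\n' "$double_a"
```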
GNU ‘grep -E’ attempts to support traditional usage by assuming that
‘{’ is not special if it would be the start of an invalid interval
expression. For example, the command ‘grep -E '{1'’ searches for the
two-character string ‘{1’ instead of reporting a syntax error in the
regular expression. POSIX allows this behavior as an extension, but
portable scripts should avoid it.
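
The difference can be sketched as follows, with an invented two-line input. The first search uses the portable bracket form; the second relies on the GNU behavior described above and falls back to a placeholder, since other implementations may reject the pattern.

```shell
# Hypothetical sample input, one test string per line.
input='x{1y
x1y'

# Portable: the bracket expression makes the brace unambiguously literal.
portable_hits=$(printf '%s\n' "$input" | grep -E '[{]1')

# GNU-specific: '{1' cannot start a valid interval expression, so GNU
# 'grep -E' is documented to treat the brace as an ordinary character.
# Assumption: a non-GNU grep may fail here, hence the fallback value.
gnu_hits=$(printf '%s\n' "$input" | grep -E '{1' 2>/dev/null) ||
  gnu_hits='(pattern not accepted by this grep)'

printf 'portable form:\n%s\n' "$portable_hits"
printf 'GNU extension:\n%s\n' "$gnu_hits"
```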