Overview
The following ERE (Extended Regular Expression) operators were defined to achieve consistency between programs like `grep`, `sed`, and `awk`. In POSIX, regexps are greedy: among the matches that start at the leftmost possible position, the longest one is chosen.
- `.` matches any single character.
  - There are application-specific exclusions. For instance, many programs do not let it match a newline or the `NUL` character.
- `[...]`, the bracket expression, matches any enclosed character.
  - An optional `-` can be included to denote a range.
  - `-` is treated literally if it is the first or last specified character.
  - `]` is treated literally if it is the first specified character.
  - `^` complements the set if it is the first specified character.
- `^` is the leading anchor. It matches the starting position of a string.
- `$` is the trailing anchor. It matches the ending position of a string.
- `*` matches the preceding element zero or more times.
- `+` matches the preceding element one or more times.
- `?` matches the preceding element zero or one time.
- `{n}`, an interval expression, matches the preceding element exactly `n` times.
  - `{n,}` matches the preceding element at least `n` times.
  - `{n,m}` matches the preceding element between `n` and `m` times, inclusive.
  - Interval expressions cannot contain repetition counts `> 255`. Results are otherwise undefined.
- `|` is the alternation operator. It allows specifying match alternatives.
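The operators above can be tried out with `grep -E`, which interprets its pattern as an ERE. The following is a minimal sketch rather than a reference: it assumes a POSIX shell and a `sed` that accepts `-E` (GNU and BSD `sed` both do). The sample words and the `match` helper are made up for illustration.

```shell
# Use the C locale so ranges such as [a-o] follow ASCII order.
LC_ALL=C; export LC_ALL

# Hypothetical sample input, one candidate string per line.
words='cat
cot
coat
ct
cart
concat
dog'

# Hypothetical helper: print the lines of $words that match the ERE in $1.
match() { printf '%s\n' "$words" | grep -E -- "$1"; }

dot=$(match '^c.t$')             # '.'      : cat, cot
negated=$(match '^c[^a]t$')      # '[^...]' : cot
star=$(match '^co*t$')           # '*'      : cot, ct
plus=$(match '^co+t$')           # '+'      : cot
optional=$(match '^c.?t$')       # '?'      : cat, cot, ct
exact=$(match '^c.{2}t$')        # '{n}'    : coat, cart
bounded=$(match '^c.{2,4}t$')    # '{n,m}'  : coat, cart, concat
either=$(match '^(cat|dog)$')    # '|'      : cat, dog
suffix=$(match 'cat$')           # '$'      : cat, concat

# A '-' placed last in a bracket expression is a literal hyphen.
hyphen=$(printf 'a-b\na+b\na*b\n' | grep -E '^a[+-]b$')   # a-b, a+b

# Greedy matching: '.*' extends to the last X, not the first.
greedy=$(printf 'aXbXc\n' | sed -E 's/a.*X/[&]/')         # [aXbX]c

printf '%s\n' "$greedy"
```

Anchoring every pattern with `^` and `$` makes `grep` behave like a whole-line matcher, which keeps the effect of each operator easy to see. Without the anchors, `grep` reports any line that contains a match anywhere.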
Character Classes
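Character classes are a notation for describing a class of characters specific to a given locale/character set. They are only valid inside a bracket expression, for example `[[:alpha:]]`. The "Similar To" column below assumes an ASCII-based locale.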
Class | Similar To | Meaning |
---|---|---|
[:alnum:] | [A-Za-z0-9] | Alphanumeric characters |
[:alpha:] | [A-Za-z] | Alphabetic characters |
[:blank:] | [ \t] | Space and TAB characters |
[:cntrl:] | | Control characters |
[:digit:] | [0-9] | Numeric characters |
[:graph:] | [^ [:cntrl:]] | Printable and visible characters |
[:lower:] | [a-z] | Lowercase alphabetic characters |
[:print:] | [ [:graph:]] | Printable characters |
[:punct:] | | All graphic characters except letters and digits |
[:space:] | [ \t\n\r\f\v] | Whitespace characters |
[:upper:] | [A-Z] | Uppercase alphabetic characters |
[:xdigit:] | [0-9A-Fa-f] | Hexadecimal digits |
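The classes in the table can be exercised the same way. This is a small sketch, again assuming a POSIX shell with `grep -E`. The input lines and the `match` helper are hypothetical. Note the doubled brackets: the outer pair is the bracket expression, the inner pair belongs to the class name.

```shell
# Fix the locale so class membership matches the ASCII column in the table.
LC_ALL=C; export LC_ALL

# Hypothetical sample input, one candidate string per line.
lines='abc123
hello world
42
ABC
foo_bar'

# Hypothetical helper: print the lines of $lines that match the ERE in $1.
match() { printf '%s\n' "$lines" | grep -E -- "$1"; }

digits=$(match '^[[:digit:]]+$')     # only digits          : 42
alnum=$(match '^[[:alnum:]]+$')      # only letters/digits  : abc123, 42, ABC
upper=$(match '^[[:upper:]]+$')      # only uppercase       : ABC
blank=$(match '[[:blank:]]')         # contains space/TAB   : hello world
punct=$(match '[[:punct:]]')         # contains punctuation : foo_bar

# A class can be combined with other members in one bracket expression.
ident=$(match '^[[:alpha:]_]+$')     # letters or '_'       : ABC, foo_bar

printf '%s\n' "$ident"
```

Writing `[:digit:]` without the outer brackets would instead be read as an ordinary bracket expression containing the characters `:`, `d`, `i`, `g`, and `t`, which is a common source of confusion.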
Bibliography
- “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
- Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf