Overview

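POSIX defines the following ERE (Extended Regular Expression) operators so that programs such as grep, sed, and awk interpret patterns consistently. POSIX matching is greedy in the leftmost-longest sense: among all matches that begin at the earliest possible position, the longest one is chosen. ERE defines no lazy (non-greedy) quantifiers.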
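
The leftmost-longest rule can be observed through the POSIX regcomp/regexec interface in <regex.h>. The sketch below assumes a POSIX system (e.g. glibc or musl); ere_match_span is a hypothetical helper written for illustration, not part of any library. It reports the offset and length of the whole match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: finds the leftmost match of the POSIX ERE
 * `pattern` in `text`. Returns the match length and, if `start` is not
 * NULL, stores the match offset there. Returns -1 if the pattern fails
 * to compile or does not match. */
long ere_match_span(const char *pattern, const char *text, long *start)
{
    regex_t re;
    regmatch_t m[1];  /* m[0] receives the span of the whole match */
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, m, 0) == 0) {
        len = (long)(m[0].rm_eo - m[0].rm_so);
        if (start != NULL)
            *start = (long)m[0].rm_so;
    }
    regfree(&re);
    return len;
}
```

On a conforming implementation, ere_match_span("a|ab", "abc", NULL) should return 2, whereas a leftmost-first engine (such as a Perl-compatible one) would stop at the one-character match a. Likewise, x* against abc yields an empty match at offset 0: the earliest starting position wins before length is considered.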

  • . matches any single character.

    • Applications may exclude certain characters. For instance, . often does not match a newline (as with the REG_NEWLINE flag of regcomp) or the NUL character.
  • [...], the bracket expression, matches any enclosed character.

    • A - between two characters denotes a range, as in [a-z].
    • - is treated literally if it is the first character (after a leading ^, if any) or the last character.
    • ] is treated literally if it is the first character (after a leading ^, if any).
    • ^ complements the set if it is the first character: [^0-9] matches any character except a digit.
  • ^ is the leading anchor. It matches the starting position of a string.

  • $ is the trailing anchor. It matches the ending position of a string.

  • * matches the preceding element zero or more times.

  • + matches the preceding element one or more times.

  • ? matches the preceding element zero or one times.

  • {n}, an interval expression, matches the preceding element exactly n times.

    • {n,} matches the preceding element at least n times.
    • {n,m} matches the preceding element between n and m times.
    • Repetition counts greater than RE_DUP_MAX (at least 255 on every POSIX system) give undefined results, so portable patterns keep counts at or below 255.
  • | is the alternation operator: a|b matches either a or b. Parentheses group alternatives, as in gr(a|e)y.
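
The operators above can be checked against a real ERE implementation with regcomp(REG_EXTENDED) and regexec. This is a minimal sketch assuming a POSIX C library; ere_matches, ere_cases, and ere_case_failures are hypothetical names chosen for illustration. Each table entry pairs a pattern with a subject string and the result POSIX ERE semantics call for.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the POSIX ERE `pattern` matches somewhere
 * in `text`, 0 if not, -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* One illustrative case per operator described above. */
struct ere_case {
    const char *pattern;
    const char *text;
    int expected;
};

static const struct ere_case ere_cases[] = {
    { "^a.c$",      "abc",   1 },  /* . matches any single character */
    { "^a.c$",      "ac",    0 },  /* ... but exactly one of them */
    { "^[a-c]+$",   "cab",   1 },  /* range inside a bracket expression */
    { "^[a-]$",     "-",     1 },  /* trailing - is literal */
    { "^[]x]$",     "]",     1 },  /* leading ] is literal */
    { "^[^0-9]$",   "7",     0 },  /* leading ^ complements the set */
    { "^ab",        "xab",   0 },  /* ^ anchors at the start */
    { "ab$",        "xab",   1 },  /* $ anchors at the end */
    { "^ab*c$",     "ac",    1 },  /* * allows zero occurrences */
    { "^ab+c$",     "ac",    0 },  /* + requires at least one */
    { "^colou?r$",  "color", 1 },  /* ? makes the element optional */
    { "^a{3}$",     "aa",    0 },  /* {n}: exactly n */
    { "^a{2,}$",    "aaaa",  1 },  /* {n,}: at least n */
    { "^a{2,4}$",   "aaaaa", 0 },  /* {n,m}: between n and m */
    { "^gr(a|e)y$", "grey",  1 },  /* | selects between alternatives */
};

/* Returns how many entries of ere_cases disagree with their expected result. */
int ere_case_failures(void)
{
    int failures = 0;
    size_t n = sizeof ere_cases / sizeof ere_cases[0];

    for (size_t i = 0; i < n; i++)
        if (ere_matches(ere_cases[i].pattern, ere_cases[i].text) !=
            ere_cases[i].expected)
            failures++;
    return failures;
}
```

REG_NOSUB is passed because only a yes/no answer is needed, so regexec is not asked to report match offsets.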

Character Classes

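A character class names a locale-dependent set of characters and is written inside a bracket expression, as in [[:digit:]] or [^[:space:]]. The "Similar to" column gives the ASCII equivalent in the POSIX (C) locale.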

Class       Similar to      Meaning
[:alnum:]   [A-Za-z0-9]     Alphanumeric characters
[:alpha:]   [A-Za-z]        Alphabetic characters
[:blank:]   [ \t]           Space and tab characters
[:cntrl:]                   Control characters
[:digit:]   [0-9]           Numeric characters
[:graph:]   [^ [:cntrl:]]   Visible characters (printable, excluding space)
[:lower:]   [a-z]           Lowercase alphabetic characters
[:print:]   [ [:graph:]]    Printable characters (visible characters plus space)
[:punct:]                   Graphic characters other than letters and digits
[:space:]   [ \t\n\r\f\v]   Whitespace characters
[:upper:]   [A-Z]           Uppercase alphabetic characters
[:xdigit:]  [0-9A-Fa-f]     Hexadecimal digits
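
Because a class name is only meaningful inside brackets, a one-character expression such as [[:digit:]] can be applied to each character of a string in turn. The sketch below assumes a POSIX C library and a program still running in the default C locale (no setlocale call), so the ASCII column of the table applies; count_class_chars is a hypothetical helper for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters of `text` matched by the
 * one-character POSIX ERE `bracket`, e.g. "[[:digit:]]".
 * Returns -1 if the expression fails to compile. */
int count_class_chars(const char *bracket, const char *text)
{
    regex_t re;
    char one[2] = { 0, 0 };  /* single-character subject string */
    int count = 0;

    if (regcomp(&re, bracket, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (const char *p = text; *p != '\0'; p++) {
        one[0] = *p;  /* test each character on its own */
        if (regexec(&re, one, 0, NULL, 0) == 0)
            count++;
    }
    regfree(&re);
    return count;
}
```

For example, "[[:blank:]]" counts 2 characters in "a b\tc\n" while "[[:space:]]" counts 3, since a newline is whitespace but not blank.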
