Notes - Jeremy

<regex> (C++11)

This header is part of the text processing library.

std::basic_regex

Defined in header <regex>

template<
  class CharT,
  class Traits = std::regex_traits<CharT>
> class basic_regex;

The class template basic_regex provides a general framework for holding regular expressions. Several typedefs for common character types are provided:

Type	Definition
std::regex	std::basic_regex<char>
std::wregex	std::basic_regex<wchar_t>

Member types

Member type	Definition
value_type	CharT
traits_type	Traits
string_type	Traits::string_type
locale_type	Traits::locale_type
flag_type	std::regex_constants::syntax_option_type

Member functions

Function	Definition
(constructor)	constructs the regex object
(destructor)	destructs the regex object
operator=	assigns the contents
assign	assigns the contents
mark_count	returns the number of marked sub-expressions within the regular expression
flags	returns the syntax flags
getloc	get locale information
imbue	set locale information
swap	swaps the contents

Constants

Grammar option	Effect(s)
ECMAScript	Use the Modified ECMAScript regular expression grammar.
basic	Use the basic POSIX regular expression grammar.
extended	Use the extended POSIX regular expression grammar.
awk	Use the regular expression grammar used by the awk utility in POSIX.
grep	Use the regular expression grammar used by the grep utility in POSIX. This is effectively the same as the basic option with the addition of newline ‘\n’ as an alternation separator.
egrep	Use the regular expression grammar used by the grep utility, with the -E option, in POSIX. This is effectively the same as the extended option with the addition of newline ‘\n’ as an alternation separator in addition to ‘|’.

Grammar variation

Grammar variation	Effect(s)
icase	Character matching should be performed without regard to case.
nosubs	When performing matches, all marked sub-expressions (expr) are treated as non-marking sub-expressions (?:expr). No matches are stored in the supplied std::match_results object and mark_count() is zero.
optimize	Instructs the regular expression engine to make matching faster, with the potential cost of making construction slower. For example, this might mean converting a non-deterministic FSA to a deterministic FSA.
collate	Character ranges of the form “[a-b]” will be locale sensitive.
multiline (c++17)	Specifies that ^ shall match the beginning of a line and $ shall match the end of a line, if the ECMAScript engine is selected.

At most one grammar option can be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected. The other options serve as variations, such that

std::regex("meow", std::regex::icase) is equivalent to std::regex("meow", std::regex::ECMAScript|std::regex::icase)

The member constants in basic_regex are duplicates of the syntax_option_type constants defined in the namespace std::regex_constants.

Non-member functions

Function	Definition
std::swap(std::basic_regex) (c++11)	specializes the std::swap algorithm

Deduction guides (since C++17)

// Since C++17
template< class ForwardIt >
basic_regex( ForwardIt, ForwardIt,
             std::regex_constants::syntax_option_type = std::regex_constants::ECMAScript )
-> basic_regex<typename std::iterator_traits<ForwardIt>::value_type>;

Example:

#include <regex>
#include <vector>

int main() {
  std::vector<char> v = {'a', 'b', 'c'};
  std::basic_regex re(v.begin(), v.end()); //uses explicit deduction guide
}

std::sub_match

// since c++11
template< class BidirIt >
class sub_match;

The class template std::sub_match is used by the regular expression engine to denote sequences of characters matched by marked sub-expressions. A match is a [begin, end) pair within the target range matched by the regular expression, but with additional observer functions to enhance code clarity.

Only the default constructor is publicly accessible. Instances of std::sub_match are normally constructed and populated as a part of a std::match_results container during the processing of one of the regex algorithms.

The member functions return defined default values unless the matched member is true.

std::sub_match inherits from std::pair<BidirIt, BidirIt>, although it cannot be treated as a std::pair object because member functions such as assignment will not work as expected.

Type requirements
- BidirIt must meet the requirements of LegacyBidirectionalIterator.

Specializations

Several specializations for common character sequence types are provided:

Type	Definition
std::csub_match	std::sub_match<const char*>
std::wcsub_match	std::sub_match<const wchar_t*>
std::ssub_match	std::sub_match<std::string::const_iterator>
std::wssub_match	std::sub_match<std::wstring::const_iterator>

Data members

Member	Description
bool matched	whether this match was successful

Inherited from std::pair
BidirIt first	start of the match sequence
BidirIt second	one-past-the-end of the match sequence

Member functions

Function	Definition
(constructor)	constructs the match object
length	returns the length of the match (if any)
str, operator string_type	converts to the underlying string type
compare	compares matched subsequence (if any)
swap	swaps the contents

Non-member functions

Function	Definition
operator==	compares a sub_match with another sub_match, a string, or a character
operator!= (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator< (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator<= (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator> (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator>= (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator<=> (C++20)	compares a sub_match with another sub_match, a string, or a character
operator<<	outputs the matched character subsequence

Example

#include <cassert>
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string sentence{"Friday the thirteenth."};
  const std::regex re{"([A-Za-z]+) ([a-z]+) ([a-z]+)"};
  std::smatch words;
  std::regex_search(sentence, words, re);
  std::cout << std::boolalpha;
  for (const auto &m : words) {
    assert(m.matched);
    std::cout << "m: [" << m << "], m.length(): " << m.length() << ", "
                 "*m.first: '" << *m.first << "', "
                 "*m.second: '" << *m.second << "'\n";
  }
}

/**
 * Output:
 * m: [Friday the thirteenth], m.length(): 21, *m.first: 'F', *m.second: '.'
 * m: [Friday], m.length(): 6, *m.first: 'F', *m.second: ' '
 * m: [the], m.length(): 3, *m.first: 't', *m.second: ' '
 * m: [thirteenth], m.length(): 10, *m.first: 't', *m.second: '.'
 */

See also

regex_token_iterator (c++11): iterates through the specified sub-expressions within all regex matches in a given string or through unmatched substrings

std::match_results

// Since C++11
template<
    class BidirIt,
    class Alloc = std::allocator<std::sub_match<BidirIt>>
> class match_results;

// Since C++17
namespace pmr {
  template <class BidirIt>
  using match_results = std::match_results<BidirIt,
                             std::pmr::polymorphic_allocator<
                                std::sub_match<BidirIt>>>;
}

The class template std::match_results holds a collection of character sequences that represent the result of a regular expression match.

This is a specialized allocator-aware container. It can only be default created, obtained from std::regex_iterator, or modified by std::regex_search or std::regex_match. Because std::match_results holds std::sub_matches, each of which is a pair of iterators into the original character sequence that was matched, it’s undefined behavior to examine std::match_results if the original character sequence was destroyed or iterators to it were invalidated for other reasons.

The first std::sub_match (index 0) contained in a std::match_results always represents the full match within a target sequence made by a regex, and subsequent std::sub_matches represent sub-expression matches, numbered in the order in which their left parentheses appear in the regex.

std::match_results meets the requirements of an AllocatorAwareContainer and of a SequenceContainer, except that only copy assignment, move assignment, and operations defined for const containers are supported, and that the semantics of comparison functions are different from those required for a container.

Type requirements
- BidirIt must meet the requirements of LegacyBidirectionalIterator.
- Alloc must meet the requirements of Allocator.

Specializations

Type	Definition
std::cmatch	std::match_results<const char*>
std::wcmatch	std::match_results<const wchar_t*>
std::smatch	std::match_results<std::string::const_iterator>
std::wsmatch	std::match_results<std::wstring::const_iterator>
std::pmr::cmatch (C++17)	std::pmr::match_results<const char*>
std::pmr::wcmatch (C++17)	std::pmr::match_results<const wchar_t*>
std::pmr::smatch (C++17)	std::pmr::match_results<std::string::const_iterator>
std::pmr::wsmatch (C++17)	std::pmr::match_results<std::wstring::const_iterator>

Member types

Member type	Definition
allocator_type	Alloc
value_type	std::sub_match<BidirIt>
const_reference	const value_type&
reference	value_type&
const_iterator	implementation-defined (depends on the underlying container)
iterator	const_iterator
difference_type	std::iterator_traits<BidirIt>::difference_type
size_type	std::allocator_traits<Alloc>::size_type
char_type	std::iterator_traits<BidirIt>::value_type
string_type	std::basic_string<char_type>

Member functions

Function	Definition
(constructor)	constructs the object
(destructor)	destructs the object
operator=	assigns the contents
get_allocator	returns the associated allocator
ready	checks if the results are available
empty	checks whether the match was successful
size	returns the number of matches in a fully-established result state
max_size	returns the maximum possible number of sub-matches
length	returns the length of the particular sub-match
position	returns the position of the first character of the particular sub-match
str	returns the sequence of characters for the particular sub-match
operator[]	returns specified sub-match
prefix	returns sub-sequence between the beginning of the target sequence and the beginning of the full match
suffix	returns sub-sequence between the end of the full match and the end of the target sequence
begin/cbegin	returns iterator to the beginning of the list of sub-matches
end/cend	returns iterator to the end of the list of sub-matches
format	formats match results for output
swap	swaps the contents

Non-member functions

Function	Definition
operator==/operator!= (operator!= removed in C++20)	lexicographically compares the values in the two match results
std::swap(std::match_results) (C++11)	specializes the std::swap algorithm

Important

std::match_results<Iterator> is a template.

std::smatch = std::match_results<std::string::const_iterator>

std::regex_iterator

// since C++11
template<
    class BidirIt,
    class CharT = typename std::iterator_traits<BidirIt>::value_type,
    class Traits = std::regex_traits<CharT>
> class regex_iterator;

std::regex_iterator is a read-only iterator that accesses the individual matches of a regular expression within the underlying character sequence. It meets the requirements of a LegacyForwardIterator, except that for dereferenceable values a and b with a == b, *a and *b will not be bound to the same object.

On construction, and on every increment, it calls std::regex_search and remembers the result (that is, saves a copy of the std::match_results<BidirIt> value). The first object may be read when the iterator is constructed or when the first dereferencing is done. Otherwise, dereferencing only returns a copy of the most recently obtained regex match.

The default-constructed std::regex_iterator is the end-of-sequence iterator. When a valid std::regex_iterator is incremented after reaching the last match (std::regex_search returns false), it becomes equal to the end-of-sequence iterator. Dereferencing or incrementing it further invokes undefined behavior.

A typical implementation of std::regex_iterator holds the begin and the end iterators for the underlying sequence (two instances of BidirIt), a pointer to the regular expression (const regex_type*), the match flags (std::regex_constants::match_flag_type), and the current match (std::match_results<BidirIt>).

Type requirements
- BidirIt must meet the requirements of LegacyBidirectionalIterator.

Specializations

Type	Definition
std::cregex_iterator	std::regex_iterator<const char*>
std::wcregex_iterator	std::regex_iterator<const wchar_t*>
std::sregex_iterator	std::regex_iterator<std::string::const_iterator>
std::wsregex_iterator	std::regex_iterator<std::wstring::const_iterator>

Member Types

Type	Definition
value_type	std::match_results<BidirIt>
difference_type	std::ptrdiff_t
pointer	const value_type*
reference	const value_type&
iterator_category	std::forward_iterator_tag
iterator_concept	(C++20) std::input_iterator_tag
regex_type	std::basic_regex<CharT, Traits>

Data members

Member	Description
BidirIt begin (private)	the begin iterator
BidirIt end (private)	the end iterator
const regex_type* pregex (private)	a pointer to the regular expression
regex_constants::match_flag_type flags (private)	the match flags
match_results<BidirIt> match (private)	the current match

Member functions

Function	Definition
(constructor)	constructs the object
(destructor)	destructs the object
operator=	assigns contents
operator==/operator!= (operator!= removed in C++20)	compares two regex_iterators
operator*/operator->	accesses the current match
operator++/operator++(int)	advances the iterator to the next match

Example

#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
  const std::string s = "Quick brown fox.";

  std::regex words_regex("[^\\s]+");
  auto words_begin = std::sregex_iterator(s.begin(), s.end(), words_regex);
  auto words_end = std::sregex_iterator();

  std::cout << "Found " << std::distance(words_begin, words_end) << " words:\n";

  for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
    std::smatch match = *i;
    std::string match_str = match.str();
    std::cout << match_str << "\n";
  }
}

/**
 * Output:
 * Found 3 words:
 * Quick
 * brown
 * fox.
 */

std::regex_token_iterator

// since C++11
template<
    class BidirIt,
    class CharT = typename std::iterator_traits<BidirIt>::value_type,
    class Traits = std::regex_traits<CharT>
> class regex_token_iterator;

std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).

On construction, it constructs an std::regex_iterator and on every increment it steps through the requested sub-matches from the current match_results, incrementing the underlying std::regex_iterator when incrementing away from the last submatch.

The default-constructed std::regex_token_iterator is the end-of-sequence iterator. When a valid std::regex_token_iterator is incremented after reaching the last submatch of the last match, it becomes equal to the end-of-sequence iterator. Dereferencing or incrementing it further invokes undefined behavior.

Just before becoming the end-of-sequence iterator, a std::regex_token_iterator may become a suffix iterator, if the index -1 (non-matched fragment) appears in the list of the requested submatch indices. Such an iterator, if dereferenced, returns a std::sub_match corresponding to the sequence of characters between the last match and the end of the sequence.

A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector) of the requested submatch indices, the internal counter equal to the index of the submatch, a pointer to std::sub_match, pointing at the current submatch of the current match, and a std::sub_match object containing the last non-matched character sequence (used in tokenizer mode).

Type requirements
- BidirIt must meet the requirements of LegacyBidirectionalIterator.

Specializations

Type	Definition
std::cregex_token_iterator	std::regex_token_iterator<const char*>
std::wcregex_token_iterator	std::regex_token_iterator<const wchar_t*>
std::sregex_token_iterator	std::regex_token_iterator<std::string::const_iterator>
std::wsregex_token_iterator	std::regex_token_iterator<std::wstring::const_iterator>

Member functions

Function	Definition
(constructor)	constructs a new regex_token_iterator
(destructor)	destructs a regex_token_iterator, including the cached value
operator=	assigns contents
operator==/operator!= (operator!= removed in C++20)	compares two regex_token_iterators
operator*/operator->	accesses the current submatch
operator++/operator++(int)	advances the iterator to the next submatch

Notes

It is the programmer’s responsibility to ensure that the std::basic_regex object passed to the iterator’s constructor outlives the iterator. Because the iterator stores a std::regex_iterator which stores a pointer to the regex, incrementing the iterator after the regex was destroyed results in undefined behavior.

Example

#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
  // Tokenization (non-match fragments)
  // Note that regex is matched only two times; when the third value is obtained
  // the iterator is a suffix iterator.
  const std::string text = "Quick brown fox.";
  const std::regex ws_re("\\s+"); // whitespace
  std::copy(std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
            std::sregex_token_iterator(),
            std::ostream_iterator<std::string>(std::cout, "\n")
           );
  std::cout << "\n";

  // Iterating the first submatches
  const std::string html = R"(<p><a href="http://google.com">google</a> )"
                           R"(< a HREF ="http://cppreference.com">cppreference</a>\n</p>)";

  const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);

  std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
            std::sregex_token_iterator(),
            std::ostream_iterator<std::string>(std::cout, "\n"));

  return 0;
}

/**
 * Output:
 * Quick
 * brown
 * fox.
 *
 * http://google.com
 * http://cppreference.com
 */

std::regex_error

Member functions

Function	Definition
(constructor)	constructs a regex_error object
operator=	replaces the regex_error object
code	gets the std::regex_constants::error_type for a regex_error

Example

#include <iostream>
#include <regex>

int main() {
  try {
    std::regex re("[a-b][a");
  } catch (const std::regex_error &e) {
    std::cout << "regex_error caught: " << e.what() << "\n";
    if (e.code() == std::regex_constants::error_brack) {
      std::cout << "The code was error_brack\n";
    }
  }
}

/**
 * Possible output (the what() text is implementation-defined):
 * regex_error caught: Unexpected character within '[...]' in regular expression
 * The code was error_brack
 */

std::regex_traits

// since C++11
template< class CharT >
class regex_traits;

The type trait template regex_traits supplies std::basic_regex with the set of types and functions necessary to operate on the type CharT.

Since many regex operations are locale-sensitive (when the std::regex_constants::collate flag is set), the regex_traits class typically holds an instance of std::locale as a private member.

Standard specializations
- std::regex_traits<char>
- std::regex_traits<wchar_t>

Member types

Type	Definition
char_type	CharT
string_type	std::basic_string<CharT>
locale_type	The locale used for localized behavior in the regular expression. Must be CopyConstructible.
char_class_type	Represents a character classification and is capable of holding an implementation specific set returned by lookup_classname. Must be a BitmaskType.

Member functions

Function	Definition
(constructor)	constructs the regex_traits object
length [static]	calculates the length of a null-terminated character string
translate	determines the equivalence key for a character
translate_nocase	determines the case-insensitive equivalence key for a character
transform	determines the sort key for the given string, used to provide collation order
transform_primary	determines the primary sort key for the character sequence, used to determine equivalence class
lookup_collatename	gets a collation element by name
lookup_classname	gets a character class by name
isctype	indicates membership in a localized character class
value	translates the character representing a numeric digit into an integral value
imbue	sets the locale
getloc	gets the locale

Reference

1. cppreference.com
