Notes - Jeremy

<Regex>

This header is part of the text processing library.

std::basic_regex

Defined in header <regex>

template<
  class CharT,
  class Traits = std::regex_traits<CharT>
> class basic_regex;

The class template basic_regex provides a general framework for holding regular expressions. Several typedefs for common character types are provided:

Type	Definition
std::regex	std::basic_regex<char>
std::wregex	std::basic_regex<wchar_t>
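
As a minimal sketch of how the two typedefs are used, the snippet below checks narrow and wide strings with the same pattern. The helper names all_digits and demo_typedefs are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

// The typedefs are plain aliases for the two common instantiations.
static_assert(std::is_same<std::regex, std::basic_regex<char>>::value,
              "std::regex is basic_regex<char>");
static_assert(std::is_same<std::wregex, std::basic_regex<wchar_t>>::value,
              "std::wregex is basic_regex<wchar_t>");

// Hypothetical helper: true if the whole narrow string is decimal digits.
bool all_digits(const std::string& s) {
  static const std::regex re{"[0-9]+"};
  return std::regex_match(s, re);
}

// Hypothetical helper: the same check for wide strings.
bool all_digits(const std::wstring& s) {
  static const std::wregex re{L"[0-9]+"};
  return std::regex_match(s, re);
}

// Hypothetical self-check; call it from main() to exercise the helpers.
void demo_typedefs() {
  assert(all_digits(std::string{"2024"}));
  assert(!all_digits(std::string{"20x4"}));
  assert(all_digits(std::wstring{L"2024"}));
  assert(!all_digits(std::wstring{L""}));
}
```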

Member types

Member type	Definition
value_type	CharT
traits_type	Traits
string_type	Traits::string_type
locale_type	Traits::locale_type
flag_type	std::regex_constants::syntax_option_type

Member functions

Function	Definition
(constructor)	constructs the regex object
(destructor)	destructs the regex object
operator=	assigns the contents
assign	assigns the contents
mark_count	returns the number of marked sub-expressions within the regular expression
flags	returns the syntax flags
getloc	gets locale information
imbue	sets locale information
swap	swaps the contents
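
The following sketch exercises a few of the member functions listed above (mark_count, flags, assign, swap). The function name demo_members and the patterns are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <regex>

// Hypothetical walkthrough of some basic_regex members.
void demo_members() {
  // Three parenthesized groups, so mark_count() reports 3.
  std::regex date{"(\\d{4})-(\\d{2})-(\\d{2})"};
  assert(date.mark_count() == 3u);
  // The default grammar flag is ECMAScript.
  assert((date.flags() & std::regex::ECMAScript) == std::regex::ECMAScript);

  // assign() replaces both the pattern and the flags.
  std::regex word;
  word.assign("[a-z]+", std::regex::icase);
  assert(word.mark_count() == 0u);
  assert((word.flags() & std::regex::icase) == std::regex::icase);

  // swap() exchanges the compiled expressions.
  date.swap(word);
  assert(date.mark_count() == 0u);
  assert(word.mark_count() == 3u);
  assert(std::regex_match("Hello", date));       // now the icase word regex
  assert(std::regex_match("2024-05-17", word));  // now the date regex
}
```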

Constants

Grammar option	Effect(s)
ECMAScript	Use the Modified ECMAScript regular expression grammar.
basic	Use the basic POSIX regular expression grammar.
extended	Use the extended POSIX regular expression grammar.
awk	Use the regular expression grammar used by the awk utility in POSIX.
grep	Use the regular expression grammar used by the grep utility in POSIX. This is effectively the same as the basic option with the addition of newline '\n' as an alternation separator.
egrep	Use the regular expression grammar used by the grep utility, with the -E option, in POSIX. This is effectively the same as the extended option with the addition of newline '\n' as an alternation separator in addition to '|'.

Grammar variation	Effect(s)
icase	Character matching should be performed without regard to case.
nosubs	When performing matches, all marked sub-expressions (expr) are treated as non-marking sub-expressions (?:expr). No matches are stored in the supplied std::match_results structure and mark_count() is zero.
optimize	Instructs the regular expression engine to make matching faster, with the potential cost of making construction slower. For example, this might mean converting a non-deterministic FSA to a deterministic FSA.
collate	Character ranges of the form “[a-b]” will be locale sensitive.
multiline (c++17)	Specifies that ^ shall match the beginning of a line and $ shall match the end of a line, if the ECMAScript engine is selected.

At most one grammar option can be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected. The other options serve as variations, such that

std::regex("meow", std::regex::icase) is equivalent to std::regex("meow", std::regex::ECMAScript|std::regex::icase)

The member constants in basic_regex are duplicates of the syntax_option_type constants defined in the namespace std::regex_constants.
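
A short sketch of how the grammar options and variations behave in practice. The function name demo_syntax_options and the patterns are hypothetical; the comparison of basic and extended relies on '|' being an ordinary character in the basic POSIX grammar.

```cpp
#include <cassert>
#include <regex>

// Hypothetical demonstration of syntax_option_type flags.
void demo_syntax_options() {
  // icase on its own implies the ECMAScript grammar.
  const std::regex a{"meow", std::regex::icase};
  const std::regex b{"meow", std::regex::ECMAScript | std::regex::icase};
  assert(std::regex_match("MeOw", a));
  assert(std::regex_match("MeOw", b));

  // nosubs turns every (expr) into a non-marking group.
  const std::regex grouped{"(me)(ow)"};
  const std::regex flat{"(me)(ow)", std::regex::nosubs};
  assert(grouped.mark_count() == 2u);
  assert(flat.mark_count() == 0u);

  // In the extended grammar '|' is alternation...
  const std::regex ext{"a|b", std::regex::extended};
  assert(std::regex_match("b", ext));

  // ...while in the basic grammar it is an ordinary character.
  const std::regex bas{"a|b", std::regex::basic};
  assert(std::regex_match("a|b", bas));
  assert(!std::regex_match("b", bas));
}
```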

Non-member functions

Function	Definition
std::swap(std::basic_regex) (c++11)	specializes the std::swap algorithm

Deduction guides (since C++17)

// Since C++17
template< class ForwardIt >
basic_regex( ForwardIt, ForwardIt,
             std::regex_constants::syntax_option_type = std::regex_constants::ECMAScript )
-> basic_regex<typename std::iterator_traits<ForwardIt>::value_type>;

Example:

#include <regex>
#include <vector>

int main() {
  std::vector<char> v = {'a', 'b', 'c'};
  std::basic_regex re(v.begin(), v.end()); //uses explicit deduction guide
}

std::sub_match

// since c++11
template< class BidirIt >
class sub_match;

The class template std::sub_match is used by the regular expression engine to denote sequences of characters matched by marked sub-expressions. A match is a [begin, end) pair within the target range matched by the regular expression, but with additional observer functions to enhance code clarity.

Only the default constructor is publicly accessible. Instances of std::sub_match are normally constructed and populated as a part of a std::match_results container during the processing of one of the regex algorithms.

The member functions return defined default values unless the matched member is true.

std::sub_match inherits from std::pair<BidirIt, BidirIt>, although it cannot be treated as a std::pair object because member functions such as assignment will not work as expected.
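
To illustrate the default values returned for a group that did not participate in the match, here is a minimal sketch. The function name demo_unmatched_group and the pattern are assumptions for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical demonstration: only one side of the alternation can match.
void demo_unmatched_group() {
  const std::regex re{"(a)|(b)"};
  const std::string text{"b"};
  std::smatch m;
  assert(std::regex_match(text, m, re));
  assert(m.size() == 3u);  // whole match plus two groups

  // Group 1 did not participate: matched is false and the
  // observers return their defaults.
  assert(!m[1].matched);
  assert(m[1].length() == 0);
  assert(m[1].str().empty());

  // Group 2 matched the single character 'b'.
  assert(m[2].matched);
  assert(m[2].str() == "b");
  assert(m[2].first == text.begin() && m[2].second == text.end());
}
```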

Type requirements

- BidirIt must meet the requirements of LegacyBidirectionalIterator.

Specializations

Several specializations for common character sequence types are provided:

Type	Definition
std::csub_match	std::sub_match<const char*>
std::wcsub_match	std::sub_match<const wchar_t*>
std::ssub_match	std::sub_match<std::string::const_iterator>
std::wssub_match	std::sub_match<std::wstring::const_iterator>
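
These specializations are the element types of the matching std::match_results typedefs, which the sketch below checks at compile time. The helper names first_number and demo_specializations are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

// Each match_results typedef stores the corresponding sub_match typedef.
static_assert(std::is_same<std::cmatch::value_type, std::csub_match>::value, "");
static_assert(std::is_same<std::wcmatch::value_type, std::wcsub_match>::value, "");
static_assert(std::is_same<std::smatch::value_type, std::ssub_match>::value, "");
static_assert(std::is_same<std::wsmatch::value_type, std::wssub_match>::value, "");

// Hypothetical helper: returns the first run of digits in a C string,
// or an empty string if there is none.
std::string first_number(const char* text) {
  assert(text != nullptr);
  static const std::regex re{"[0-9]+"};
  std::cmatch m;  // searching a const char* yields csub_match elements
  if (!std::regex_search(text, m, re)) return {};
  const std::csub_match& whole = m[0];
  return whole.str();
}

// Hypothetical self-check; call it from main().
void demo_specializations() {
  assert(first_number("abc 123 def") == "123");
  assert(first_number("no digits").empty());
}
```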

Data members

Member	Description
bool matched	whether this match was successful

Inherited from std::pair

Member	Description
BidirIt first	start of the match sequence
BidirIt second	one-past-the-end of the match sequence

Member functions

Function	Definition
(constructor)	constructs the match object
length	returns the length of the match (if any)
str, operator string_type	converts to the underlying string type
compare	compares matched subsequence (if any)
swap	swaps the contents
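
A minimal sketch of length, str, the implicit string conversion, and compare, assuming a simple key=value input. The function name demo_sub_match_members and the sample text are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical demonstration of the sub_match observers.
void demo_sub_match_members() {
  const std::string line{"mode=fast"};
  const std::regex re{"([a-z]+)=([a-z]+)"};
  std::smatch m;
  assert(std::regex_match(line, m, re));

  const std::ssub_match key = m[1];
  const std::ssub_match value = m[2];

  assert(key.length() == 4);
  assert(key.str() == "mode");

  // operator string_type allows implicit conversion to std::string.
  const std::string as_string = value;
  assert(as_string == "fast");

  // compare() follows std::string::compare conventions.
  assert(value.compare("fast") == 0);                // against a C string
  assert(value.compare(std::string{"slow"}) < 0);    // against a std::string
  assert(key.compare(value) > 0);                    // "mode" sorts after "fast"
}
```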

Non-member functions

Function	Definition
operator==	compares a sub_match with another sub_match, a string, or a character
operator!= (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator< (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator<= (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator> (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator>= (removed in C++20)	compares a sub_match with another sub_match, a string, or a character
operator<=> (C++20)	compares a sub_match with another sub_match, a string, or a character
operator<<	outputs the matched character subsequence
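
The comparison and stream operators can be sketched as follows. The function name demo_sub_match_operators and the sample text are assumptions; in C++20 the != and < forms are synthesized from == and <=>, so the same code should compile under either standard.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical demonstration of the non-member sub_match operators.
void demo_sub_match_operators() {
  const std::string text{"alpha beta"};
  const std::regex re{"([a-z]+) ([a-z]+)"};
  std::smatch m;
  assert(std::regex_match(text, m, re));

  assert(m[1] == "alpha");              // sub_match vs. C string
  assert(m[2] == std::string{"beta"});  // sub_match vs. std::string
  assert(m[1] != m[2]);                 // sub_match vs. sub_match
  assert(m[1] < m[2]);                  // lexicographic: "alpha" < "beta"

  // operator<< writes the matched characters to the stream.
  std::ostringstream out;
  out << m[1] << '/' << m[2];
  assert(out.str() == "alpha/beta");
}
```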

Example

#include <cassert>
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string sentence{"Friday the thirteenth."};
  // [A-Za-z] matches letters only; [A-z] would also accept [ \ ] ^ _ `
  const std::regex re{"([A-Za-z]+) ([a-z]+) ([a-z]+)"};
  std::smatch words;
  std::regex_search(sentence, words, re);
  std::cout << std::boolalpha;
  for (const auto &m : words) {
    assert(m.matched);
    std::cout << "m: [" << m << "], m.length(): " << m.length() << ", "
                 "*m.first: '" << *m.first << "', "
                 "*m.second: '" << *m.second << "'\n";
  }
}

/**
 * Output:
 * m: [Friday the thirteenth], m.length(): 21, *m.first: 'F', *m.second: '.'
 * m: [Friday], m.length(): 6, *m.first: 'F', *m.second: ' '
 * m: [the], m.length(): 3, *m.first: 't', *m.second: ' '
 * m: [thirteenth], m.length(): 10, *m.first: 't', *m.second: '.'
 */

See also

regex_token_iterator (c++11): iterates through the specified sub-expressions within all regex matches in a given string or through unmatched substrings
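
As a sketch of both modes mentioned above, the helpers below use std::sregex_token_iterator with sub-expression index -1 (the unmatched substrings between matches) and index 1 (the first marked sub-expression of every match). The names split_on_commas, keys, and demo_token_iterator are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: index -1 yields the text between delimiter matches.
std::vector<std::string> split_on_commas(const std::string& text) {
  static const std::regex comma{","};
  std::vector<std::string> parts;
  for (std::sregex_token_iterator it(text.begin(), text.end(), comma, -1), end;
       it != end; ++it) {
    parts.push_back(it->str());
  }
  return parts;
}

// Hypothetical helper: index 1 yields group 1 (the key) of every match.
std::vector<std::string> keys(const std::string& text) {
  static const std::regex pair{"([a-z]+)=([0-9]+)"};
  std::vector<std::string> out;
  for (std::sregex_token_iterator it(text.begin(), text.end(), pair, 1), end;
       it != end; ++it) {
    out.push_back(it->str());
  }
  return out;
}

// Hypothetical self-check; call it from main().
void demo_token_iterator() {
  assert((split_on_commas("a,b,c") == std::vector<std::string>{"a", "b", "c"}));
  assert((keys("x=1 y=22") == std::vector<std::string>{"x", "y"}));
}
```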

Reference

1. cppreference.com
