
regcomp(3p) — Linux manual page


REGCOMP(3P)             POSIX Programmer's Manual            REGCOMP(3P)

PROLOG

       This manual page is part of the POSIX Programmer's Manual.  The
       Linux implementation of this interface may differ (consult the
       corresponding Linux manual page for details of Linux behavior), or
       the interface may not be implemented on Linux.

NAME

       regcomp, regerror, regexec, regfree — regular expression matching

SYNOPSIS

       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict pattern,
           int cflags);
       size_t regerror(int errcode, const regex_t *restrict preg,
           char *restrict errbuf, size_t errbuf_size);
       int regexec(const regex_t *restrict preg, const char *restrict string,
           size_t nmatch, regmatch_t pmatch[restrict], int eflags);
       void regfree(regex_t *preg);

DESCRIPTION

       These functions interpret basic and extended regular expressions
       as described in the Base Definitions volume of POSIX.1‐2017,
       Chapter 9, Regular Expressions.

       The regex_t structure is defined in <regex.h> and contains at
       least the following member:

         ┌───────────────┬──────────────┬───────────────────────────┐
         │ Member Type   │ Member Name  │ Description               │
         ├───────────────┼──────────────┼───────────────────────────┤
         │ size_t        │ re_nsub      │ Number of parenthesized   │
         │               │              │ subexpressions.           │
         └───────────────┴──────────────┴───────────────────────────┘

       The regmatch_t structure is defined in <regex.h> and contains at
       least the following members:

         ┌───────────────┬──────────────┬───────────────────────────┐
         │ Member Type   │ Member Name  │ Description               │
         ├───────────────┼──────────────┼───────────────────────────┤
         │ regoff_t      │ rm_so        │ Byte offset from start of │
         │               │              │ string to start of        │
         │               │              │ substring.                │
         │ regoff_t      │ rm_eo        │ Byte offset from start of │
         │               │              │ string of the first       │
         │               │              │ character after the end   │
         │               │              │ of substring.             │
         └───────────────┴──────────────┴───────────────────────────┘

       The regcomp() function shall compile the regular expression
       contained in the string pointed to by the pattern argument and
       place the results in the structure pointed to by preg.  The cflags
       argument is the bitwise-inclusive OR of zero or more of the
       following flags, which are defined in the <regex.h> header:

       REG_EXTENDED  Use Extended Regular Expressions.

       REG_ICASE     Ignore case in match (see the Base Definitions
                     volume of POSIX.1‐2017, Chapter 9, Regular
                     Expressions).

       REG_NOSUB     Report only success/fail in regexec().

       REG_NEWLINE   Change the handling of <newline> characters, as
                     described in the text.

       The default regular expression type for pattern is a Basic Regular
       Expression. The application can specify Extended Regular
       Expressions using the REG_EXTENDED cflags flag.

       If the REG_NOSUB flag was not set in cflags, then regcomp() shall
       set re_nsub to the number of parenthesized subexpressions
       (delimited by "\(\)" in basic regular expressions or "()" in
       extended regular expressions) found in pattern.

       The regexec() function compares the null-terminated string
       specified by string with the compiled regular expression preg
       initialized by a previous call to regcomp().  If it finds a match,
       regexec() shall return 0; otherwise, it shall return non-zero
       indicating either no match or an error. The eflags argument is the
       bitwise-inclusive OR of zero or more of the following flags, which
       are defined in the <regex.h> header:

       REG_NOTBOL    The first character of the string pointed to by
                     string is not the beginning of the line. Therefore,
                     the <circumflex> character ('^'), when taken as a
                     special character, shall not match the beginning of
                     string.

       REG_NOTEOL    The last character of the string pointed to by
                     string is not the end of the line. Therefore, the
                     <dollar-sign> ('$'), when taken as a special
                     character, shall not match the end of string.

       If nmatch is 0 or REG_NOSUB was set in the cflags argument to
       regcomp(), then regexec() shall ignore the pmatch argument.
       Otherwise, the application shall ensure that the pmatch argument
       points to an array with at least nmatch elements, and regexec()
       shall fill in the elements of that array with offsets of the
       substrings of string that correspond to the parenthesized
       subexpressions of pattern: pmatch[i].rm_so shall be the byte
       offset of the beginning and pmatch[i].rm_eo shall be one greater
       than the byte offset of the end of substring i.  (Subexpression i
       begins at the ith matched open parenthesis, counting from 1.)
       Offsets in pmatch[0] identify the substring that corresponds to
       the entire regular expression. Unused elements of pmatch up to
       pmatch[nmatch-1] shall be filled with -1. If there are more than
       nmatch subexpressions in pattern (pattern itself counts as a
       subexpression), then regexec() shall still do the match, but shall
       record only the first nmatch substrings.

       When matching a basic or extended regular expression, any given
       parenthesized subexpression of pattern might participate in the
       match of several different substrings of string, or it might not
       match any substring even though the pattern as a whole did match.
       The following rules shall be used to determine which substrings to
       report in pmatch when matching regular expressions:

        1. If subexpression i in a regular expression is not contained
           within another subexpression, and it participated in the match
           several times, then the byte offsets in pmatch[i] shall
           delimit the last such match.

        2. If subexpression i is not contained within another
           subexpression, and it did not participate in an otherwise
           successful match, the byte offsets in pmatch[i] shall be -1. A
           subexpression does not participate in the match when:

           '*' or "\{\}" appears immediately after the subexpression in a
           basic regular expression, or '*', '?', or "{}" appears
           immediately after the subexpression in an extended regular
           expression, and the subexpression did not match (matched 0
           times)

           or:    '|' is used in an extended regular expression to select
                  this subexpression or another, and the other
                  subexpression matched.

        3. If subexpression i is contained within another subexpression
           j, and i is not contained within any other subexpression that
           is contained within j, and a match of subexpression j is
           reported in pmatch[j], then the match or non-match of
           subexpression i reported in pmatch[i] shall be as described in
           1. and 2. above, but within the substring reported in
           pmatch[j] rather than the whole string. The offsets in
           pmatch[i] are still relative to the start of string.

        4. If subexpression i is contained in subexpression j, and the
           byte offsets in pmatch[j] are -1, then the pointers in
           pmatch[i] shall also be -1.

        5. If subexpression i matched a zero-length string, then both
           byte offsets in pmatch[i] shall be the byte offset of the
           character or null terminator immediately following the
           zero-length string.

       If, when regexec() is called, the locale is different from when
       the regular expression was compiled, the result is undefined.

       If REG_NEWLINE is not set in cflags, then a <newline> in pattern
       or string shall be treated as an ordinary character. If
       REG_NEWLINE is set, then <newline> shall be treated as an ordinary
       character except as follows:

        1. A <newline> in string shall not be matched by a <period>
           outside a bracket expression or by any form of a non-matching
           list (see the Base Definitions volume of POSIX.1‐2017, Chapter
           9, Regular Expressions).

        2. A <circumflex> ('^') in pattern, when used to specify
           expression anchoring (see the Base Definitions volume of
           POSIX.1‐2017, Section 9.3.8, BRE Expression Anchoring), shall
           match the zero-length string immediately after a <newline> in
           string, regardless of the setting of REG_NOTBOL.

        3. A <dollar-sign> ('$') in pattern, when used to specify
           expression anchoring, shall match the zero-length string
           immediately before a <newline> in string, regardless of the
           setting of REG_NOTEOL.

       The regfree() function frees any memory allocated by regcomp()
       associated with preg.

       The following constants are defined as the minimum set of error
       return values, although other errors listed as implementation
       extensions in <regex.h> are possible:

       REG_BADBR     Content of "\{\}" invalid: not a number, number too
                     large, more than two numbers, first larger than
                     second.

       REG_BADPAT    Invalid regular expression.

       REG_BADRPT    '?', '*', or '+' not preceded by valid regular
                     expression.

       REG_EBRACE    "\{\}" imbalance.

       REG_EBRACK    "[]" imbalance.

       REG_ECOLLATE  Invalid collating element referenced.

       REG_ECTYPE    Invalid character class type referenced.

       REG_EESCAPE   Trailing <backslash> character in pattern.

       REG_EPAREN    "\(\)" or "()" imbalance.

       REG_ERANGE    Invalid endpoint in range expression.

       REG_ESPACE    Out of memory.

       REG_ESUBREG   Number in "\digit" invalid or in error.

       REG_NOMATCH   regexec() failed to match.
       If more than one error occurs in processing a function call, any
       one of the possible constants may be returned, as the order of
       detection is unspecified.

       The regerror() function provides a mapping from error codes
       returned by regcomp() and regexec() to unspecified printable
       strings. It generates a string corresponding to the value of the
       errcode argument, which the application shall ensure is the last
       non-zero value returned by regcomp() or regexec() with the given
       value of preg.  If errcode is not such a value, the content of the
       generated string is unspecified.

       If preg is a null pointer, but errcode is a value returned by a
       previous call to regexec() or regcomp(), the regerror() still
       generates an error string corresponding to the value of errcode,
       but it might not be as detailed under some implementations.

       If the errbuf_size argument is not 0, regerror() shall place the
       generated string into the buffer of size errbuf_size bytes pointed
       to by errbuf.  If the string (including the terminating null)
       cannot fit in the buffer, regerror() shall truncate the string and
       null-terminate the result.

       If errbuf_size is 0, regerror() shall ignore the errbuf argument,
       and return the size of the buffer needed to hold the generated
       string.

       If the preg argument to regexec() or regfree() is not a compiled
       regular expression returned by regcomp(), the result is undefined.
       A preg is no longer treated as a compiled regular expression after
       it is given to regfree().
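
       The rules for pmatch offsets described above can be illustrated
       with a minimal sketch. The helper names copy_group() and
       match_pair() and the "key=value" pattern are hypothetical and
       chosen only for illustration; they are not part of POSIX. The
       sketch assumes an implementation that follows the rules above,
       so that an optional subexpression that did not participate in the
       match is reported with offsets of -1.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of POSIX): copy the text of subexpression
 * idx from a successful match into out (capacity outsz, always terminated).
 * Returns the substring length, or -1 if the subexpression did not
 * participate in the match (rm_so == -1) or does not fit in out.
 */
static int
copy_group(const char *string, const regmatch_t *pm, size_t idx,
    char *out, size_t outsz)
{
    if (pm[idx].rm_so == -1)
        return -1;

    /* rm_eo is one past the last byte, so the difference is the length. */
    size_t len = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, string + pm[idx].rm_so, len);
    out[len] = '\0';
    return (int)len;
}

/*
 * Hypothetical helper: match "key=value" with an optional ";comment"
 * suffix. Returns 0 on a match and non-zero otherwise. On a match,
 * pm[0] is the whole match, pm[1] the key, pm[2] the value, and pm[3]
 * the optional comment.
 */
static int
match_pair(const char *string, regmatch_t pm[4])
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^([a-z]+)=([0-9]+)(;.*)?$", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, string, 4, pm, 0);
    regfree(&re);
    return rc;
}
```

       Calling match_pair("width=80", pm) and then copy_group() with
       index 1 or 2 extracts the key or the value. Because the offsets
       are relative to the start of string, the same string that was
       passed to regexec() must be passed to copy_group().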

RETURN VALUE

       Upon successful completion, the regcomp() function shall return 0.
       Otherwise, it shall return an integer value indicating an error as
       described in <regex.h>, and the content of preg is undefined. If a
       code is returned, the interpretation shall be as given in
       <regex.h>.

       If regcomp() detects an invalid RE, it may return REG_BADPAT, or
       it may return one of the error codes that more precisely describes
       the error.

       Upon successful completion, the regexec() function shall return 0.
       Otherwise, it shall return REG_NOMATCH to indicate no match.

       Upon successful completion, the regerror() function shall return
       the number of bytes needed to hold the entire generated string,
       including the null termination. If the return value is greater
       than errbuf_size, the string returned in the buffer pointed to by
       errbuf has been truncated.

       The regfree() function shall not return a value.

ERRORS

       No errors are defined.

       The following sections are informative.

EXAMPLES

           #include <regex.h>
           #include <stddef.h>     /* For NULL. */

           /*
            * Match string against the extended regular expression in
            * pattern, treating errors as no match.
            *
            * Return 1 for match, 0 for no match.
            */

           int
           match(const char *string, const char *pattern)
           {
               int    status;
               regex_t    re;

               if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                   return(0);      /* Compile error: treat as no match. */
               }
               status = regexec(&re, string, (size_t) 0, NULL, 0);
               regfree(&re);
               if (status != 0) {
                   return(0);      /* No match, or an error. */
               }
               return(1);
           }

       The following demonstrates how the REG_NOTBOL flag could be used
       with regexec() to find all substrings in a line that match a
       pattern supplied by a user.  (For simplicity of the example, very
       little error checking is done.)  Here p is a pointer into buffer
       that marks where the next search starts.

           (void) regcomp (&re, pattern, 0);
           /* This call to regexec() finds the first match on the line. */
           p = &buffer[0];
           error = regexec (&re, p, 1, &pm, 0);
           while (error == 0) {  /* While matches found. */
               /* Substring found between p + pm.rm_so and p + pm.rm_eo. */
               /* Offsets in pm are relative to p, so advance p itself. */
               p += pm.rm_eo;
               /* This call to regexec() finds the next match. */
               error = regexec (&re, p, 1, &pm, REG_NOTBOL);
           }
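
       The repeated-search idea above can be written as a self-contained
       function. This is a minimal sketch under stated assumptions: the
       name count_matches() is hypothetical, the pattern is treated as a
       basic regular expression, and an empty match is handled by
       stepping forward one byte, which is only appropriate for
       single-byte character encodings.

```c
#include <regex.h>

/*
 * Hypothetical helper: count the non-overlapping matches of a basic
 * regular expression in line. Returns the count, or -1 if the pattern
 * does not compile.
 */
static int
count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;   /* Start of the part still to be searched. */
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        /* Offsets in pm are relative to p, not to line. */
        if (pm.rm_eo > pm.rm_so) {
            p += pm.rm_eo;
        } else {
            /* Empty match: step one byte so the loop makes progress. */
            if (p[pm.rm_eo] == '\0')
                break;
            p += pm.rm_eo + 1;
        }
        /* p no longer points at the beginning of the line. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return count;
}
```

       With the pattern "^ab" and the line "abab" the function reports
       one match: after the first match the search continues with
       REG_NOTBOL set, so the <circumflex> cannot match again in the
       middle of the line.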

APPLICATION USAGE

       An application could use:

           regerror(code, preg, (char *)NULL, (size_t)0)

       to find out how big a buffer is needed for the generated string,
       malloc() a buffer to hold the string, and then call regerror()
       again to get the string. Alternatively, it could allocate a fixed,
       static buffer that is big enough to hold most strings, and then
       use malloc() to allocate a larger buffer if it finds that this is
       too small.

       To match a pattern as described in the Shell and Utilities volume
       of POSIX.1‐2017, Section 2.13, Pattern Matching Notation, use the
       fnmatch() function.
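
       The two-call approach to regerror() described above might look
       like the following sketch. The function name regerror_alloc() is
       hypothetical and not part of POSIX; the caller is assumed to
       free() the returned string.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for code,
 * or NULL if memory cannot be allocated. The caller frees the result.
 */
static char *
regerror_alloc(int code, const regex_t *preg)
{
    /* First call: a size of 0 asks only for the required buffer size. */
    size_t need = regerror(code, preg, NULL, 0);
    char *msg = malloc(need);

    if (msg != NULL) {
        /* Second call: fill the buffer, including the terminating null. */
        (void) regerror(code, preg, msg, need);
    }
    return msg;
}
```

       A typical caller passes the non-zero value returned by regcomp()
       or regexec() together with the same preg, prints the message, and
       then frees it.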

RATIONALE

       The regexec() function must fill in all nmatch elements of pmatch,
       where nmatch and pmatch are supplied by the application, even if
       some elements of pmatch do not correspond to subexpressions in
       pattern.  The application developer should note that there is
       probably no reason for using a value of nmatch that is larger than
       preg->re_nsub+1.

       The REG_NEWLINE flag supports a use of RE matching that is needed
       in some applications like text editors. In such applications, the
       user supplies an RE asking the application to find a line that
       matches the given expression. An anchor in such an RE anchors at
       the beginning or end of any line. Such an application can pass a
       sequence of <newline>-separated lines to regexec() as a single
       long string and specify REG_NEWLINE to regcomp() to get the
       desired behavior. The application must ensure that there are no
       explicit <newline> characters in pattern if it wants to ensure
       that any match occurs entirely within a single line.

       The REG_NEWLINE flag affects the behavior of regexec(), but it is
       in the cflags parameter to regcomp() to allow flexibility of
       implementation. Some implementations will want to generate the
       same compiled RE in regcomp() regardless of the setting of
       REG_NEWLINE and have regexec() handle anchors differently based on
       the setting of the flag. Other implementations will generate
       different compiled REs based on the REG_NEWLINE.

       The REG_ICASE flag supports the operations taken by the grep -i
       option and the historical implementations of ex and vi.  Including
       this flag will make it easier for application code to be written
       that does the same thing as these utilities.

       The substrings reported in pmatch[] are defined using offsets from
       the start of the string rather than pointers. This allows
       type-safe access to both constant and non-constant strings.
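
       The text-editor use of REG_NEWLINE described above can be
       sketched as follows. The function name find_first() and the
       sample text are hypothetical; the sketch simply reports where the
       first match starts, with and without REG_NEWLINE, so the effect
       on anchors can be compared.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the byte offset in text at which the first
 * match of the extended regular expression pattern starts, or -1 if
 * there is no match or the pattern does not compile. newline_flag is
 * either REG_NEWLINE or 0.
 */
static long
find_first(const char *pattern, const char *text, int newline_flag)
{
    regex_t re;
    regmatch_t pm;
    long off = -1;

    if (regcomp(&re, pattern, REG_EXTENDED | newline_flag) != 0)
        return -1;
    if (regexec(&re, text, 1, &pm, 0) == 0)
        off = (long)pm.rm_so;
    regfree(&re);
    return off;
}
```

       For the text "alpha\nbeta\ngamma", the pattern "^beta$" matches
       at offset 6 when REG_NEWLINE is passed, because the anchors then
       apply at each <newline>. Without the flag the anchors apply only
       to the whole string and there is no match.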
       The type regoff_t is used for the elements of pmatch[] to ensure
       that the application can represent large arrays in memory
       (important for an application conforming to the Shell and
       Utilities volume of POSIX.1‐2017).

       The 1992 edition of this standard required regoff_t to be at least
       as wide as off_t, to facilitate future extensions in which the
       string to be searched is taken from a file. However, these future
       extensions have not appeared.  The requirement rules out popular
       implementations with 32-bit regoff_t and 64-bit off_t, so it has
       been removed.

       The standard developers rejected the inclusion of a regsub()
       function that would be used to do substitutions for a matched RE.
       While such a routine would be useful to some applications, its
       utility would be much more limited than the matching function
       described here. Both RE parsing and substitution are possible to
       implement without support other than that required by the ISO C
       standard, but matching is much more complex than substituting. The
       only difficult part of substitution, given the information
       supplied by regexec(), is finding the next character in a string
       when there can be multi-byte characters. That is a much larger
       issue, and one that needs a more general solution.

       The errno variable has not been used for error returns to avoid
       filling the errno name space for this feature.

       The interface is defined so that the matched substrings rm_sp and
       rm_ep are in a separate regmatch_t structure instead of in
       regex_t.  This allows a single compiled RE to be used
       simultaneously in several contexts; in main() and a signal
       handler, perhaps, or in multiple threads of lightweight processes.
       (The preg argument to regexec() is declared with type const, so
       the implementation is not permitted to use the structure to store
       intermediate results.) It also allows an application to request an
       arbitrary number of substrings from an RE. The number of
       subexpressions in the RE is reported in re_nsub in preg.  With
       this change to regexec(), consideration was given to dropping the
       REG_NOSUB flag since the user can now specify this with a zero
       nmatch argument to regexec().  However, keeping REG_NOSUB allows
       an implementation to use a different (perhaps more efficient)
       algorithm if it knows in regcomp() that no subexpressions need be
       reported. The implementation is only required to fill in pmatch if
       nmatch is not zero and if REG_NOSUB is not specified. Note that
       the size_t type, as defined in the ISO C standard, is unsigned, so
       the description of regexec() does not need to address negative
       values of nmatch.

       REG_NOTBOL was added to allow an application to do repeated
       searches for the same pattern in a line. If the pattern contains a
       <circumflex> character that should match the beginning of a line,
       then the pattern should only match when matched against the
       beginning of the line.  Without the REG_NOTBOL flag, the
       application could rewrite the expression for subsequent matches,
       but in the general case this would require parsing the expression.
       The need for REG_NOTEOL is not as clear; it was added for
       symmetry.

       The addition of the regerror() function addresses the historical
       need for conforming application programs to have access to error
       information more than ``Function failed to compile/match your RE
       for unknown reasons''.

       This interface provides for two different methods of dealing with
       error conditions. The specific error codes (REG_EBRACE, for
       example), defined in <regex.h>, allow an application to recover
       from an error if it is so able. Many applications, especially
       those that use patterns supplied by a user, will not try to deal
       with specific error cases, but will just use regerror() to obtain
       a human-readable error message to present to the user.

       The regerror() function uses a scheme similar to confstr() to deal
       with the problem of allocating memory to hold the generated
       string. The scheme used by strerror() in the ISO C standard was
       considered unacceptable since it creates difficulties for
       multi-threaded applications.

       The preg argument is provided to regerror() to allow an
       implementation to generate a more descriptive message than would
       be possible with errcode alone. An implementation might, for
       example, save the character offset of the offending character of
       the pattern in a field of preg, and then include that in the
       generated message string. The implementation may also ignore preg.

       A REG_FILENAME flag was considered, but omitted. This flag caused
       regexec() to match patterns as described in the Shell and
       Utilities volume of POSIX.1‐2017, Section 2.13, Pattern Matching
       Notation instead of REs. This service is now provided by the
       fnmatch() function.

       Notice that there is a difference in philosophy between the
       ISO POSIX‐2:1993 standard and POSIX.1‐2008 in how to handle a
       ``bad'' regular expression. The ISO POSIX‐2:1993 standard says
       that many bad constructs ``produce undefined results'', or that
       ``the interpretation is undefined''. POSIX.1‐2008, however, says
       that the interpretation of such REs is unspecified. The term
       ``undefined'' means that the action by the application is an
       error, of similar severity to passing a bad pointer to a function.

       The regcomp() and regexec() functions are required to accept any
       null-terminated string as the pattern argument. If the meaning of
       the string is ``undefined'', the behavior of the function is
       ``unspecified''. POSIX.1‐2008 does not specify how the functions
       will interpret the pattern; they might return error codes, or they
       might do pattern matching in some completely unexpected way, but
       they should not do something like abort the process.
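
       The note above that nmatch need not exceed preg->re_nsub+1
       suggests sizing the pmatch array from the compiled expression.
       The following is a minimal sketch; the function name
       compile_with_matches() and its output parameters are
       hypothetical, and the choice to report an allocation failure as
       REG_ESPACE is an assumption made for this example.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: compile an extended regular expression and
 * allocate a pmatch array sized from re_nsub. On success returns 0 and
 * stores the array in *pm_out and its length in *nmatch_out; the caller
 * later calls free() on the array and regfree() on re. On failure
 * returns a non-zero regcomp() error code and re is not left compiled.
 */
static int
compile_with_matches(regex_t *re, const char *pattern,
    regmatch_t **pm_out, size_t *nmatch_out)
{
    int rc = regcomp(re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    /* One slot per parenthesized subexpression, plus pmatch[0]. */
    size_t n = re->re_nsub + 1;
    regmatch_t *pm = malloc(n * sizeof *pm);
    if (pm == NULL) {
        regfree(re);
        return REG_ESPACE;
    }
    *pm_out = pm;
    *nmatch_out = n;
    return 0;
}
```

       The returned length can be passed directly as the nmatch argument
       of regexec(), so that every subexpression is reported and no
       unused elements are filled in.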

FUTURE DIRECTIONS

       None.

SEE ALSO

       fnmatch(3p), glob(3p)

       The Base Definitions volume of POSIX.1‐2017, Chapter 9, Regular
       Expressions, regex.h(0p), sys_types.h(0p)

       The Shell and Utilities volume of POSIX.1‐2017, Section 2.13,
       Pattern Matching Notation

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic
       form from IEEE Std 1003.1-2017, Standard for Information
       Technology -- Portable Operating System Interface (POSIX), The
       Open Group Base Specifications Issue 7, 2018 Edition, Copyright
       (C) 2018 by the Institute of Electrical and Electronics Engineers,
       Inc and The Open Group.  In the event of any discrepancy between
       this version and the original IEEE and The Open Group Standard,
       the original IEEE and The Open Group Standard is the referee
       document. The original Standard can be obtained online at
       http://www.opengroup.org/unix/online.html .

       Any typographical or formatting errors that appear in this page
       are most likely to have been introduced during the conversion of
       the source files to man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group               2017                       REGCOMP(3P)

Pages that refer to this page: regex.h(0p)



HTML rendering created 2025-09-06 by Michael Kerrisk, author of The Linux Programming Interface.
