
    grapheme_extract

    (PHP 5 >= 5.3.0, PHP 7, PHP 8, PECL intl >= 1.0.0)

    grapheme_extract - Function to extract a sequence of default grapheme clusters from a text buffer, which must be encoded in UTF-8

    Description

    Procedural style

    grapheme_extract(
        string $haystack,
        int $size,
        int $type = GRAPHEME_EXTR_COUNT,
        int $offset = 0,
        int &$next = null
    ): string|false

    Function to extract a sequence of default grapheme clusters from a text buffer, which must be encoded in UTF-8.

    Parameters

    haystack

    String to search.

    size

    Maximum number of items to return, measured in the units selected by type.

    type

    Defines the type of units referred to by the size parameter:

    • GRAPHEME_EXTR_COUNT (default) - size is the number of default grapheme clusters to extract.
    • GRAPHEME_EXTR_MAXBYTES - size is the maximum number of bytes returned.
    • GRAPHEME_EXTR_MAXCHARS - size is the maximum number of UTF-8 characters (Unicode code points) returned.
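This is a minimal sketch of how the same size value is interpreted under each type. It assumes the intl extension is loaded. The input is "å" followed by "ö", both in normalization form D (a base letter plus a combining mark), so it is 6 bytes, 4 code points and 2 grapheme clusters. The variable names are illustrative only.

```php
<?php
// Assumes the intl extension is loaded.
// Each cluster below is a base letter plus a combining mark:
// 2 code points and 3 UTF-8 bytes per cluster.
$text = "a\xCC\x8A" . "o\xCC\x88"; // 6 bytes, 4 code points, 2 clusters

// The same size (3) means different things depending on $type.
$byCount = grapheme_extract($text, 3, GRAPHEME_EXTR_COUNT);    // up to 3 clusters: both fit
$byChars = grapheme_extract($text, 3, GRAPHEME_EXTR_MAXCHARS); // up to 3 code points: only the first cluster fits
$byBytes = grapheme_extract($text, 3, GRAPHEME_EXTR_MAXBYTES); // up to 3 bytes: only the first cluster fits

echo urlencode($byCount), PHP_EOL; // a%CC%8Ao%CC%88
echo urlencode($byChars), PHP_EOL; // a%CC%8A
echo urlencode($byBytes), PHP_EOL; // a%CC%8A
```

Only whole clusters are ever returned. The second cluster is therefore dropped as soon as including it would exceed the code-point or byte budget.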

    offset

    Starting position in haystack in bytes. If given, it must be zero or a positive value that is less than or equal to the length of haystack in bytes, or a negative value that counts from the end of haystack. If offset does not point to the first byte of a UTF-8 character, the start position is moved to the next character boundary.
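The adjustment described above moves the start to the next character (code point) boundary, not to the next grapheme cluster boundary. This sketch assumes the intl extension is loaded and PHP 7.1.0 or later for the negative offset. It starts once at the first byte of a combining mark and once from the end of the string.

```php
<?php
// Assumes the intl extension is loaded; negative offsets need PHP 7.1.0+.
$text = "a\xCC\x8A" . "o\xCC\x88"; // bytes: 61 CC 8A 6F CC 88

// Byte 1 (\xCC) is the first byte of U+030A COMBINING RING ABOVE, so no
// adjustment happens. Extraction starts at the combining mark itself, which
// on its own forms a single (degenerate) grapheme cluster.
$atMark = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT, 1);

// A negative offset counts back from the end: -3 resolves to byte 3 ("o").
$fromEnd = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT, -3);

echo urlencode($atMark), PHP_EOL;  // %CC%8A
echo urlencode($fromEnd), PHP_EOL; // o%CC%88
```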

    next

    Reference to a value that will be set to the next starting position. When the call returns, this may point to the first byte position past the end of the string.
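This short sketch chains calls through next and assumes the intl extension is loaded. Each call resumes where the previous one stopped. Once everything has been consumed, next is equal to strlen() of the input.

```php
<?php
// Assumes the intl extension is loaded.
$text = "a\xCC\x8A" . "o\xCC\x88"; // 6 bytes, 2 grapheme clusters

$next = 0;
$first = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT, 0, $next);
echo urlencode($first), ' next=', $next, PHP_EOL;  // a%CC%8A next=3

// Resume from the position reported by the previous call.
$second = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT, $next, $next);
echo urlencode($second), ' next=', $next, PHP_EOL; // o%CC%88 next=6

// $next now points one byte past the end of the string.
var_dump($next === strlen($text)); // bool(true)
```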

    Return Values

    A string starting at offset offset and ending on a default grapheme cluster boundary that conforms to the size and type specified, or false on failure.
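The result always ends on a grapheme cluster boundary. Because of this, GRAPHEME_EXTR_MAXBYTES can act as a byte-limited truncation that never cuts a cluster in half, unlike substr(). The helper name truncate_bytes() is hypothetical. The sketch assumes the intl extension is loaded and the input is valid UTF-8. Mapping false to an empty string is an illustrative choice, not documented behavior.

```php
<?php
// Assumes the intl extension is loaded and $text is valid UTF-8.
// truncate_bytes() is a hypothetical helper, not part of PHP.
function truncate_bytes(string $text, int $maxBytes): string
{
    $result = grapheme_extract($text, $maxBytes, GRAPHEME_EXTR_MAXBYTES);
    // Illustrative choice: treat a failure (false) as "nothing fits".
    return $result === false ? '' : $result;
}

$text = "a\xCC\x8A" . "o\xCC\x88"; // 6 bytes, 2 clusters of 3 bytes each

echo urlencode(substr($text, 0, 5)), PHP_EOL;      // a%CC%8Ao%CC (splits the second cluster mid-character)
echo urlencode(truncate_bytes($text, 5)), PHP_EOL; // a%CC%8A (stops on a cluster boundary)
```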

    Changelog

    Version  Description
    7.1.0    Support for negative offsets has been added.

    Examples

    Example #1 grapheme_extract() example

    <?php

    $char_a_ring_nfd = "a\xCC\x8A";      // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5) normalization form "D"
    $char_o_diaeresis_nfd = "o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6) normalization form "D"

    print urlencode(grapheme_extract($char_a_ring_nfd . $char_o_diaeresis_nfd, 1, GRAPHEME_EXTR_COUNT, 2));

    ?>

    The above example will output:

    o%CC%88


    User Contributed Notes (3 notes)

    AJH
    14 years ago
    Here's how to use grapheme_extract() to loop across a UTF-8 string character by character.

    <?php

    $str = "سabcक’…";
    // if the previous line didn't come through, the string contained:
    // U+0633, U+0061, U+0062, U+0063, U+0915, U+2019, U+2026

    $n = 0;

    // Caution: with GRAPHEME_EXTR_MAXCHARS and a size of 1, a grapheme cluster made of
    // several code points (e.g. a letter plus a combining mark) cannot be returned,
    // so $next stops advancing and this loop never ends on such input.
    for (
        $start = 0, $next = 0, $maxbytes = strlen($str), $c = '';
        $start < $maxbytes;
        $c = grapheme_extract($str, 1, GRAPHEME_EXTR_MAXCHARS, ($start = $next), $next)
    ) {
        if (empty($c)) {
            continue;
        }
        echo "This utf8 character is " . strlen($c) . " bytes long and its first byte is " . ord($c[0]) . "\n";
        $n++;
    }
    echo "$n UTF-8 characters in a string of $maxbytes bytes!\n";
    // Should print: 7 UTF-8 characters in a string of 14 bytes!
    ?>
    Philo
    1 year ago
    The other comments on this page were helpful for me.
    However, consider using something stricter than empty($value) when checking the value returned by grapheme_extract(), since it can legitimately return "0", which empty() treats as false.
    yevgen dot grytsay at gmail dot com
    4 years ago
    Looping through grapheme clusters:

    <?php

    // Example taken from Rust documentation: https://doc.rust-lang.org/book/ch08-02-strings.html#bytes-and-scalar-values-and-grapheme-clusters-oh-my
    $str = "नमस्ते";
    // Alternatively:
    // $str = pack('C*', ...[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]);
    $next = 0;
    $maxbytes = strlen($str);

    var_dump($str);

    while ($next < $maxbytes) {
        $char = grapheme_extract($str, 1, GRAPHEME_EXTR_COUNT, $next, $next);
        if (empty($char)) {
            continue;
        }
        echo "{$char} - This utf8 character is " . strlen($char) . ' bytes long', PHP_EOL;
    }

    // string(18) "नमस्ते"
    // न - This utf8 character is 3 bytes long
    // म - This utf8 character is 3 bytes long
    // स् - This utf8 character is 6 bytes long
    // ते - This utf8 character is 6 bytes long
    // (Output as reported in this note. ICU 74 and later implement the Unicode 15.1
    // Indic conjunct rule, so स्ते may instead be returned as one 12-byte cluster.)
    ?>
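Philo suggests testing the return value strictly instead of with empty(). A cluster-splitting loop written that way keeps a cluster consisting of "0". This sketch assumes the intl extension is loaded. graphemes() is a hypothetical helper name, not a PHP function.

```php
<?php
// Assumes the intl extension is loaded.
// graphemes() is a hypothetical helper, not part of PHP.
function graphemes(string $str): array
{
    $clusters = [];
    $next = 0;
    $length = strlen($str);

    while ($next < $length) {
        $offset = $next;
        $cluster = grapheme_extract($str, 1, GRAPHEME_EXTR_COUNT, $offset, $next);

        // Strict checks instead of empty(): "0" is a valid cluster and is kept.
        // false signals an error and '' would mean no progress, so stop in
        // either case rather than risk an endless loop.
        if ($cluster === false || $cluster === '') {
            break;
        }
        $clusters[] = $cluster;
    }

    return $clusters;
}

echo implode(' | ', graphemes("10\u{00B0}C")), PHP_EOL; // 1 | 0 | ° | C
echo count(graphemes("a\u{030A}o\u{0308}")), PHP_EOL;  // 2
```

The empty-string check is defensive. With GRAPHEME_EXTR_COUNT and a size of 1, each successful call should advance by at least one cluster.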