NotificationsYou must be signed in to change notification settings
Fork9.7k
Star30.8k

[String] a new component for object-oriented strings management with an abstract unit system#33553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

fabpot merged 3 commits intosymfony:masterfromnicolas-grekas:string-component

Sep 26, 2019

Merged

[String] a new component for object-oriented strings management with an abstract unit system#33553

fabpot merged 3 commits intosymfony:masterfromnicolas-grekas:string-component

Sep 26, 2019

Conversation

Copy link

Member

nicolas-grekas commentedSep 11, 2019•
edited
Loading

Q	A
Branch?	master
Bug fix?	no
New feature?	yes
Deprecations?	no
Tickets	-
License	MIT
Doc PR	symfony/symfony-docs#12376

[EDIT: classes have been renamed in#33816]

This is a reboot of#22184 (thanks@hhamon for working on it) and a generalization of my previous work on the topic (patchwork/utf8). Unlike existing libraries (includingpatchwork/utf8), this component provides a unified API for the 3 unit systems of strings: bytes, code points and grapheme clusters.

The unified API is defined by theAbstractString class. It has 2 direct child classes:BinaryString andAbstractUnicodeString, itself extended byUtf8String andGraphemeString.

All objects are immutable and provide clear edge-case semantics, using exceptions and/or (nullable) types!

Two helper functions are provided to create such strings:

newGraphemeString('foo') ==u('foo');// when dealing with Unicode, prefer grapheme unitsnewBinaryString('foo') ==b('foo');

GraphemeString is the most linguistic-friendly variant of them, which means it's the one ppl should use most of the timewhen dealing with written text.

Future ideas:

improve tests
add more docblocks (only where they'd add value!)
consider adding more methods in the string API (is*()?,*Encode()?, etc.)
first class Emoji support
merge the Inflector component into this one
usewidth() to improvetruncate() andwordwrap()
~~move methodslug() to a dedicated locale-aware service class~~ see[String] Introduce a locale-aware Slugger in the String component #33768
propose your ideas (send PRs after merge)

Out of (current) scope:

whatintl provides (collations, transliterations, confusables, segmentation, etc)

Here is the unified API I'm proposing in this PR, borrowed from looking at many existing libraries, but also Java, Python, JavaScript and Go.

function __construct(string$string ='');staticfunction unwrap(array$values): arraystaticfunction wrap(array$values):arrayfunction after($needle, bool$includeNeedle = false, int$offset =0): self;function afterLast($needle,bool$includeNeedle =false,int$offset =0):self;function append(string ...$suffix):self;function before($needle,bool$includeNeedle =false,int$offset =0):self;function beforeLast($needle,bool$includeNeedle =false,int$offset =0):self;function camel():self;function chunk(int$length =1):array;function collapseWhitespace():selffunction endsWith($suffix): bool;function ensureEnd(string$suffix):self;function ensureStart(string$prefix):self;function equalsTo($string):bool;function folded():self;function ignoreCase():self;function indexOf($needle,int$offset =0): ?int;function indexOfLast($needle,int$offset =0): ?int;function isEmpty():bool;function join(array$strings):self;function jsonSerialize():string;function length():int;function lower():self;function match(string$pattern,int$flags =0,int$offset =0):array;function padBoth(int$length,string$padStr =''):self;function padEnd(int$length,string$padStr =''):self;function padStart(int$length,string$padStr =''):self;function prepend(string ...$prefix):self;function repeat(int$multiplier):self;function replace(string$from,string$to):self;function replaceMatches(string$fromPattern,$to):self;function slice(int$start =0,int$length =null):self;function snake():self;function splice(string$replacement,int$start =0,int$length =null):self;function split(string$delimiter,int$limit =null,int$flags =null):array;function startsWith($prefix):bool;function title(bool$allWords =false):self;function toBinary(string$toEncoding =null):BinaryString;function toGrapheme():GraphemeString;function toUtf8():Utf8String;function trim(string$chars ="\t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"):self;function trimEnd(string$chars ="\t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"):self;function trimStart(string$chars ="\t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"):self;function truncate(int$length,string$ellipsis =''):self;function upper():self;function width(bool$ignoreAnsiDecoration =true):int;function wordwrap(int$width =75,string$break ="\n",bool$cut =false):self;function __clone();function __toString():string;

AbstractUnicodeString adds these:

staticfunctionfromCodePoints(int ...$codes): self;function ascii(array$rules = []):self;function codePoint(int$index =0): ?int;function folded(bool$compat =true):parent;function normalize(int$form =self::NFC):self;function slug(string$separator ='-'):self;

andBinaryString:

staticfunction fromRandom(int$length =16): self;function byteCode(int$index =0): ?int;function isUtf8():bool;function toUtf8(string$fromEncoding =null):Utf8String;function toGrapheme(string$fromEncoding =null):GraphemeString;

Case insensitive operations are done with theignoreCase() method.
e.g.b('abc')->ignoreCase()->indexOf('B') will return1.

For reference, CLDR transliterations (used in theascii() method) are defined here:
https://github.com/unicode-org/cldr/tree/master/common/transforms

julienbourdeau, ro0NL, you-ser, keichinger, kunicmarko20, mickaelandrieu, DavidFeller, romainnorberg, fbourigault, deleugpn, and 109 more reacted with thumbs up emoji

brzuchal, funivan, babeuloula, gsdevme, amodliszewski, kaznovac, ayanozturk, NickvdMeij, mario-deluna, jmoz, and 6 more reacted with thumbs down emoji

linaori, fbourigault, OskarStark, reyostallenberg, babeuloula, javiereguiluz, bobey, ayanozturk, decima, atailouloute, and 7 more reacted with hooray emoji

julienbourdeau, Devristo, mnavarrocarter, romainnorberg, fbourigault, jeremyFreeAgent, ogizanagi, OskarStark, kubk, gocanto, and 16 more reacted with heart emoji

julienbourdeau, romainnorberg, fbourigault, fancyweb, knmitov, jjanvier, jeremyFreeAgent, ogizanagi, OskarStark, javiereguiluz, and 6 more reacted with rocket emoji

romainnorberg reacted with eyes emoji

nicolas-grekas added this to thenext milestone

Sep 11, 2019

nicolas-grekas mentioned this pull request

Sep 11, 2019

[Utf8] New component with Bytes, CodePoints and Graphemes implementations of string objects#22184

Closed

nicolas-grekas force-pushed thestring-component branch 2 times, most recently from6a1137d toc613662Compare

September 11, 2019 10:25

Copy link

Member

stof commentedSep 11, 2019

no way to unignore case ? and no way to know whether the current object is ignoring case ? This makes the API unusable for code wanting to deal with the string in a case sensitive way while accepting an external string object.

Also should we merge this new component in 4.4, which would mean that its first release is already non-experimental ? We are not allowed to have experimental components in LTS versions, per our LTS policy.

Copy link

MemberAuthor

nicolas-grekas commentedSep 11, 2019•
edited
Loading

no way to unignore case ? and no way to know whether the current object is ignoring case ?

->ignoreCase() applies only to the very next call in the fluent API chain. This should answer both your questions. SeeAbstractString::__clone()

Also should we merge this new component in 4.4

That's something we need to decide indeed. On my side, I think we can make it non-experimental.

Copy link

Member

stof commentedSep 11, 2019

and what happens for all methods accepting a string as argument, when passing non-UTF-8 strings to the method on aUtf8String orGraphemeString ?

Regarding the naming, shouldUtf8String be renamed to highlight it is about code points ? AFAIK,GraphemeString also expects the string to be in UTF-8.

Note that these comments are based purely on your PR description. I haven't looked at the code yet.

nicolas-grekas force-pushed thestring-component branch fromc613662 to8945735Compare

September 11, 2019 10:34

Copy link

MemberAuthor

nicolas-grekas commentedSep 11, 2019•
edited
Loading

what happens for all methods accepting a string as argument, when passing non-UTF-8 strings to the method on a Utf8String or GraphemeString ?

anInvalidArgumentException is thrown

should Utf8String be renamed to highlight it is about code points ?

I think UTF-8 is more common vocabulary. The previous PR usedCodePoint indeed, but this is cryptic to many, and doesn't convey the technical encoding scheme (it could use UTF-16BE/LE, etc., nothing would tell). That's why I thinkUtf8String is better.

stof reviewed

Sep 11, 2019

View reviewed changes

src/Symfony/Component/String/CHANGELOG.md OutdatedShow resolvedHide resolved

Copy link

Member

stof commentedSep 11, 2019

@nicolas-grekas but the whole component is about UTF-8 strings. AFAICT, even BinaryString is not meant to operate on other encodings, as it does not validate that the string is valid UTF-8 before converting it to other implementations.
Btw, this means that naming the componentString might also be too generic.

ro0NL reviewed

Sep 11, 2019

View reviewed changes

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

Copy link

MemberAuthor

nicolas-grekas commentedSep 11, 2019•
edited
Loading

BinaryString is not meant to operate on other encodings, as it does not validate that the string is valid UTF-8 before converting it to other implementations

It does, dunno why you think otherwise. If you try to convert a random binary string to UTF-8/Grapheme, you'll get anInvalidArgumentException too.

BinaryString is what it tells: it can handle any binary strings and doesn't care about the encoding, like the native PHP string functions, just using an OOP API.

Thus the name of the component.

Copy link

Member

fabpot commentedSep 11, 2019

To me, this should go as experimental in 5.0.

Copy link

Contributor

drupol commentedSep 11, 2019

Definitely supporting this :-) Nice !

ro0NL reviewed

Sep 11, 2019

View reviewed changes

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

Copy link

Member

javiereguiluz commentedSep 11, 2019•
edited by nicolas-grekas
Loading

Sorry to sound naive, but I can't find in this pull request or the previous one, some brief explanation about when/where should developers use this.

Why/when should we use these classes/methods instead of the normal str_ PHP functions or the mb_str UTF8 functions? Thanks!

Note: I'm not questioning this ... I just want to know where this fits in Symfony developers and Symfony itself. Thanks a lot!

edit: see#33553 (comment)

nicolas-grekas force-pushed thestring-component branch 2 times, most recently from5ccd5a7 tof9b903bCompare

September 11, 2019 10:54

Copy link

Member

javiereguiluz commentedSep 11, 2019

For your consideration, we could turn these 4 methods:

function ensureLeft(string$prefix):selffunction ensureRight(string$suffix): selffunction padLeft(int$length, string$padStr = ' '): selffunction padRight(int$length, string$padStr = ' '): self

Into these 2 methods if we change the order of the arguments:

function padLeft(string$padStr ='',int$length =null):selffunction padRight(string$padStr = ' ', int $length = null): self

Example:

// BEFORE$s1 =u('lorem')->ensureLeft('abc');// $s1 = 'abclorem'$s2 =u('lorem')->ensureRight('abc');// $s2 = 'loremabc'$s3 =u('lorem')->padLeft(8,'abc');// $s3 = 'abcabcablorem'$s4 =u('lorem')->padRight(8,'abc');// $s4 = 'loremabcabcab'// AFTER$s1 =u('lorem')->padLeft('abc');// $s1 = 'abclorem'$s2 =u('lorem')->padRight('abc');// $s2 = 'loremabc'$s3 =u('lorem')->padLeft('abc',8);// $s3 = 'abcabcablorem'$s4 =u('lorem')->padRight('abc',8);// $s4 = 'loremabcabcab'

Copy link

MemberAuthor

nicolas-grekas commentedSep 11, 2019

when/where should developers use this.

All the time would be fine. e.g.$matches = $string->match('/some-regexp/) is a much more friendly API thanpreg_match('/some-regexp/', $string, $matches) (even more if you consider error handling).

More specifically, I've observed ppl randomly add anmb_ prefix to string functions and magically expect this to fix their encoding issues. This is way too complex right now, doing it correctly is hard. e.g. the Console component deals with utf-8 strings everywhere, it's not pretty. This component would help a lot there. Twig is another place where strings are heavily manipulated and where graphemes are missing actually. It would benefit from the component too.

Copy link

MemberAuthor

nicolas-grekas commentedSep 11, 2019

For your consideration, we could turn these 4 methods:
Into these 2 methods if we change the order of the arguments:

This would be totally unexpected to me. I've seen no other libraries have this API and I'm not sure it works actually.

is in your plans that the methods returning self return a new mutated reference keeping the original one intact? If not/yes, why?

Absolutely! That's critical design concern, not just an implementation detail :) I added a note about it in the desription. Thanks for asking.

azjezz reviewed

Sep 11, 2019

View reviewed changes

Copy link

Contributor

azjezz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is great, i believe this would make it easier for developers to deal with string encoding, just few notes about method naming :)

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

src/Symfony/Component/String/AbstractString.phpShow resolvedHide resolved

Copy link

Member

javiereguiluz commentedSep 11, 2019

@nicolas-grekas thanks for the explanation. It's perfectly clear now!

Another question: some methods are called "left", "right" instead of "prefix/suffix" or "start/end". What happens when the text is Arabic/Persian/Hebrew and uses right-to-text direction? For example,trimRight() removes things at the end of English text ... but at the beginning of Arabic text?

Copy link

Contributor

leofeyer commentedSep 11, 2019

We have been usingtchwork/utf8 in Contao for years and it really is essential if you work with multiple languages beyond the ASCII character range. So +1 for adding this in Symfony and keep up the good work@nicolas-grekas. 👍

ro0NL reviewed

Sep 11, 2019

View reviewed changes

src/Symfony/Component/String/GraphemeString.php OutdatedShow resolvedHide resolved

fancyweb reviewed

Sep 11, 2019

View reviewed changes

src/Symfony/Component/String/Exception/ExceptionInterface.php OutdatedShow resolvedHide resolved

Copy link

Contributor

Devristo commentedSep 11, 2019

It looks amazing. I am curious how it would work together with the rest of the ecosystem. Lets say compatibility with doctrine, intl, symfony/validator, etc? I am sure it will take time before it trickles down to other components, but the future seems bright ;)

BackEndTea reviewed

Sep 11, 2019

View reviewed changes

src/Symfony/Component/String/Exception/ExceptionInterface.php OutdatedShow resolvedHide resolved

src/Symfony/Component/String/composer.jsonShow resolvedHide resolved

azjezz reviewed

Sep 11, 2019

View reviewed changes

Copy link

Contributor

azjezz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

i suggest addingAbstractString::contains(string ...$needles): bool, where it returns true in case the string contains one of the needles.

if ($text->contains(...$blacklisted)) {echo'nope!';}

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

nicolas-grekas force-pushed thestring-component branch from5676e2a tof398091Compare

September 23, 2019 14:07

jakzal reviewed

Sep 23, 2019

View reviewed changes

src/Symfony/Component/String/GraphemeString.phpShow resolvedHide resolved

nicolas-grekas force-pushed thestring-component branch 2 times, most recently from892f621 to921b92fCompare

September 23, 2019 18:02

xabbuh reviewed

Sep 23, 2019

View reviewed changes

Copy link

Member

xabbuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am not finished reviewing this PR, but here are some ideas I got so far.

src/Symfony/Component/String/AbstractString.phpShow resolvedHide resolved

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

src/Symfony/Component/String/AbstractString.phpShow resolvedHide resolved

nicolas-grekas force-pushed thestring-component branch 2 times, most recently from8900c38 to3b6f46aCompare

September 23, 2019 21:25

xabbuh reviewed

Sep 23, 2019

View reviewed changes

src/Symfony/Component/String/AbstractString.phpShow resolvedHide resolved

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

tigitz suggested changes

Sep 24, 2019

View reviewed changes

src/Symfony/Component/String/GraphemeString.phpShow resolvedHide resolved

src/Symfony/Component/String/BinaryString.php

		return \strlen($this->string) - \strlen($suffix) === ($this->ignoreCase ? strripos($this->string, $suffix) : strrpos($this->string, $suffix));
		}

		public function equalsTo($string): bool

Copy link

Contributor

tigitzSep 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change

	publicfunction equalsTo($string):bool
	/**
	* @param AbstractString\|string\|string[]$string
	*/
	publicfunction equalsTo($string):bool

Can be useful for autocompletion, static analysis and so on. Other methods could benefit from this doc.

Copy link

MemberAuthor

nicolas-grekasSep 25, 2019•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Any object that implements__toString() is allowed actually. That's whatstring means already to me. What's the relation with autocompletion?

Copy link

Contributor

tigitzSep 26, 2019•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I mean that when working with PhpStorm it seems that it can help you autocomplete the variable to pass in a function given what's available in the scope if the function is properly documented:

As you can see above$id is not suggested here as it doesn't respect the documented type hint

src/Symfony/Component/String/AbstractUnicodeString.phpShow resolvedHide resolved

nicolas-grekas force-pushed thestring-component branch from3b6f46a to278fd29Compare

September 25, 2019 07:38

xabbuh requested changes

Sep 25, 2019

View reviewed changes

src/Symfony/Component/String/AbstractString.php OutdatedShow resolvedHide resolved

src/Symfony/Component/String/Utf8String.phpShow resolvedHide resolved

src/Symfony/Component/String/Utf8String.php OutdatedShow resolvedHide resolved

nicolas-grekasand others added3 commits

September 25, 2019 16:38

[String] a new component for object-oriented strings management with …

012e92a

…an abstract unit system

[String] add tests

82a0095

[String] add more tests

dd8745a

nicolas-grekas force-pushed thestring-component branch from278fd29 todd8745aCompare

September 25, 2019 14:39

fabpot approved these changes

Sep 26, 2019

View reviewed changes

Copy link

Member

fabpot commentedSep 26, 2019

Thank you@nicolas-grekas.

fabpot added a commit that referenced this pull request

Sep 26, 2019

feature#33553[String] a new component for object-oriented strings m…

5d154fb

…anagement with an abstract unit system (nicolas-grekas, hhamon, gharlan)This PR was merged into the 5.0-dev branch.Discussion----------[String] a new component for object-oriented strings management with an abstract unit system| Q             | A| ------------- | ---| Branch?       | master| Bug fix?      | no| New feature?  | yes| Deprecations? | no| Tickets       | -| License       | MIT| Doc PR        | -This is a reboot of#22184 (thanks@hhamon for working on it) and a generalization of my previous work on the topic ([patchwork/utf8](https://github.com/tchwork/utf8)). Unlike existing libraries (including `patchwork/utf8`), this component provides a unified API for the 3 unit systems of strings: bytes, code points and grapheme clusters.The unified API is defined by the `AbstractString` class. It has 2 direct child classes: `BinaryString` and `AbstractUnicodeString`, itself extended by `Utf8String` and `GraphemeString`.All objects are immutable and provide clear edge-case semantics, using exceptions and/or (nullable) types!Two helper functions are provided to create such strings:```phpnew GraphemeString('foo') == u('foo'); // when dealing with Unicode, prefer grapheme unitsnew BinaryString('foo') == b('foo');````GraphemeString` is the most linguistic-friendly variant of them, which means it's the one ppl should use most of the time *when dealing with written text*.Future ideas: - improve tests - add more docblocks (only where they'd add value!) - consider adding more methods in the string API (`is*()?`, `*Encode()`?, etc.) - first class Emoji support - merge the Inflector component into this one - use `width()` to improve `truncate()` and `wordwrap()` - move method `slug()` to a dedicated locale-aware service class - propose your ideas (send PRs after merge)Out of (current) scope: - what [intl](https://php.net/intl) provides (collations, transliterations, confusables, segmentation, etc)Here is the unified API I'm proposing in this PR, borrowed from looking at many existing libraries, but also Java, Python, JavaScript and Go.```phpfunction __construct(string $string = '');static function unwrap(array $values): arraystatic function wrap(array $values): arrayfunction after($needle, bool $includeNeedle = false, int $offset = 0): self;function afterLast($needle, bool $includeNeedle = false, int $offset = 0): self;function append(string ...$suffix): self;function before($needle, bool $includeNeedle = false, int $offset = 0): self;function beforeLast($needle, bool $includeNeedle = false, int $offset = 0): self;function camel(): self;function chunk(int $length = 1): array;function collapseWhitespace(): selffunction endsWith($suffix): bool;function ensureEnd(string $suffix): self;function ensureStart(string $prefix): self;function equalsTo($string): bool;function folded(): self;function ignoreCase(): self;function indexOf($needle, int $offset = 0): ?int;function indexOfLast($needle, int $offset = 0): ?int;function isEmpty(): bool;function join(array $strings): self;function jsonSerialize(): string;function length(): int;function lower(): self;function match(string $pattern, int $flags = 0, int $offset = 0): array;function padBoth(int $length, string $padStr = ' '): self;function padEnd(int $length, string $padStr = ' '): self;function padStart(int $length, string $padStr = ' '): self;function prepend(string ...$prefix): self;function repeat(int $multiplier): self;function replace(string $from, string $to): self;function replaceMatches(string $fromPattern, $to): self;function slice(int $start = 0, int $length = null): self;function snake(): self;function splice(string $replacement, int $start = 0, int $length = null): self;function split(string $delimiter, int $limit = null, int $flags = null): array;function startsWith($prefix): bool;function title(bool $allWords = false): self;function toBinary(string $toEncoding = null): BinaryString;function toGrapheme(): GraphemeString;function toUtf8(): Utf8String;function trim(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;function trimEnd(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;function trimStart(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;function truncate(int $length, string $ellipsis = ''): self;function upper(): self;function width(bool $ignoreAnsiDecoration = true): int;function wordwrap(int $width = 75, string $break = "\n", bool $cut = false): self;function __clone();function __toString(): string;````AbstractUnicodeString` adds these:```phpstatic function fromCodePoints(int ...$codes): self;function ascii(array $rules = []): self;function codePoint(int $index = 0): ?int;function folded(bool $compat = true): parent;function normalize(int $form = self::NFC): self;function slug(string $separator = '-'): self;```and `BinaryString`:```phpstatic function fromRandom(int $length = 16): self;function byteCode(int $index = 0): ?int;function isUtf8(): bool;function toUtf8(string $fromEncoding = null): Utf8String;function toGrapheme(string $fromEncoding = null): GraphemeString;```Case insensitive operations are done with the `ignoreCase()` method.e.g. `b('abc')->ignoreCase()->indexOf('B')` will return `1`.For reference, CLDR transliterations (used in the `ascii()` method) are defined here:https://github.com/unicode-org/cldr/tree/master/common/transformsCommits-------dd8745a [String] add more tests82a0095 [String] add tests012e92a [String] a new component for object-oriented strings management with an abstract unit system

fabpot merged commitdd8745a intosymfony:master

Sep 26, 2019

fabpot mentioned this pull request

Sep 26, 2019

[String] a new component for object-oriented strings management with an…symfony/symfony-docs#12376

Closed

nicolas-grekas deleted the string-component branch

September 26, 2019 08:32

Copy link

MemberAuthor

nicolas-grekas commentedSep 26, 2019

Thank you everyone for the reviews, it's been invaluable!
Please send PRs for any follow-ups now!

barryvdh mentioned this pull request

Sep 26, 2019

[7.x] Use external ascii packagelaravel/framework#29947

Merged

tigitz mentioned this pull request

Sep 26, 2019

Notifier Component#33687

Merged

nicolas-grekas mentioned this pull request

Oct 2, 2019

[String] renamed core classes to Byte/CodePoint/UnicodeString#33816

Merged

nicolas-grekas added a commit that referenced this pull request

Oct 7, 2019

minor#33816[String] renamed core classes to Byte/CodePoint/UnicodeS…

2dba6e7

…tring (nicolas-grekas)This PR was merged into the 5.0-dev branch.Discussion----------[String] renamed core classes to Byte/CodePoint/UnicodeString| Q             | A| ------------- | ---| Branch?       | master| Bug fix?      | no| New feature?  | no| Deprecations? | no| Tickets       | -| License       | MIT| Doc PR        | -In#33553 there have been discussions about the naming of the classes - eg. "what's a grapheme", "why `Utf8String`", "lowercase on binary is weird", etc.What about these names? Would they get your votes *vs* the current ones?- `BinaryString` -> `ByteString`- `Utf8String` -> `CodePointString`- `GraphemeString` -> `UnicodeString`Commits-------63c105d [String] renamed core classes to Byte/CodePoint/UnicodeString

alexeyshockov mentioned this pull request

Oct 17, 2019

Is this repository unmaintained?danielstjules/Stringy#198

Open

fabpot mentioned this pull request

Nov 12, 2019

Release v5.0.0-BETA1#34339

Merged

Copy link

Contributor

MaPePeR commentedNov 21, 2019

no way to unignore case ? and no way to know whether the current object is ignoring case ?
->ignoreCase() applies only to the very next call in the fluent API chain. This should answer both your questions. SeeAbstractString::__clone()

I think this answer is incomplete, because there is nothing that stops someone from calling a function likefoo(u("abc")->ignoreCase()), which will then result in every operation in that function being case insensitive unless the function creates a clone of the argument.

This might also happen unintentionally when someone intends to do a lot of case insensitive operations, so they do$a = u("abc")->ignoreCase(); and later afoo($a) is added and strange things might happen.

You might say "Use the code in a wrong way and you will get wrong results", but i still think that this behavior is kind of odd.
I like the style of writing/reading it - that part is ok - but because we cannot overwrite the assignment operator, like in C++, to clear theignoreCase-Flag this can cause some weird side effects.

Maybe have theignoreCase() function return an object of a different class, that is not part of theAbstractString hierarchy and implements a subset of the string functions, so we can at least protect against this with type hints?