DEV Community

Paweł bbkr Pabian

Fun with UTF-8: browsing code points namespace

Iterating over the whole code point range is a very useful trick when you need to find all code points with a given name or property. Let's find all characters that have an ogonek (the tiny tail) from the previous post.

$ raku -e 'for 1..1_112_064 {
    next unless .uniname.contains( "WITH OGONEK" );
    say .chr, " ", .uniname;
}'
Ą LATIN CAPITAL LETTER A WITH OGONEK
ą LATIN SMALL LETTER A WITH OGONEK
Ę LATIN CAPITAL LETTER E WITH OGONEK
ę LATIN SMALL LETTER E WITH OGONEK
Į LATIN CAPITAL LETTER I WITH OGONEK
į LATIN SMALL LETTER I WITH OGONEK
Ų LATIN CAPITAL LETTER U WITH OGONEK
ų LATIN SMALL LETTER U WITH OGONEK
Ǫ LATIN CAPITAL LETTER O WITH OGONEK
ǫ LATIN SMALL LETTER O WITH OGONEK
Ǭ LATIN CAPITAL LETTER O WITH OGONEK AND MACRON
ǭ LATIN SMALL LETTER O WITH OGONEK AND MACRON

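In Raku you can call the uniname or chr methods directly on an integer to get the name of a code point or the character at that code point, respectively. If you are not familiar with the .method syntax: it is just a lazy way to call a method inside a block on whatever value the iteration is currently at, without assigning it explicitly to a named variable. If you want, you can be more explicit about it: for 1..1_112_064 -> $codepoint { next unless $codepoint.uniname... }. Note that 1_112_064 is the number of Unicode scalar values, while the highest code point is 0x10FFFF (1_114_111). The range used here skips only the tail of a private use area and two noncharacters, so the results below are unaffected.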
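The explicit form can be sketched as a complete script. This is only an illustration built from the one-liner in this post: the @matches array is a name introduced here, and the range is the same one used throughout the post.

```raku
# Collect code points whose name mentions an ogonek, using a named loop variable.
# @matches is a name introduced for this sketch; it is not from the original one-liner.
my @matches;
for 1 .. 1_112_064 -> $codepoint {
    # Same filter as the one-liner, with $codepoint spelled out instead of the bare .method form.
    next unless $codepoint.uniname.contains( "WITH OGONEK" );
    @matches.push: $codepoint;
}

# Print each character followed by its Unicode name.
for @matches -> $codepoint {
    say $codepoint.chr, " ", $codepoint.uniname;
}
```

Keeping the matches in an array is a design choice for the sketch: it lets you reuse the result (count it, filter it again) without scanning the whole range a second time.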

Did you know that (exact counts depend on the Unicode version your Raku supports):

  • There are 899 code points with DIGIT in their name?
$ raku -e '( 1 .. 1_112_064 ).grep( *.uniname.contains( "DIGIT" ) ).elems.say;'
899
  • There are 154 sentence terminals?
$ raku -e '( 1 .. 1_112_064 ).grep( *.uniprop( "Sentence_Terminal" ) ).elems.say;'
154

(Unicode properties will be explained in the next post.)

  • There are 22 code points whose name ends with QUESTION MARK, several of them among the sentence terminals above?
$ raku -e 'for 1 .. 1_112_064 { next unless .uniname.ends-with( "QUESTION MARK" ); say .chr, " ", .uniname; }'
? QUESTION MARK
¿ INVERTED QUESTION MARK
; GREEK QUESTION MARK
՞ ARMENIAN QUESTION MARK
؟ ARABIC QUESTION MARK
፧ ETHIOPIC QUESTION MARK
᥅ LIMBU QUESTION MARK
...
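Name filters and property filters can also be combined in one pass. The sketch below lists the code points whose name ends with QUESTION MARK and marks the ones that also have the Sentence_Terminal property. It reuses only the uniname, uniprop and chr calls shown in this post. The variable names are made up for the example, and the output will vary with the Unicode version your Raku supports.

```raku
# All code points whose name ends with QUESTION MARK (same filter as in the post).
my @question-marks = ( 1 .. 1_112_064 ).grep( *.uniname.ends-with( "QUESTION MARK" ) );

# The subset that also has the Sentence_Terminal property.
my @terminals = @question-marks.grep( *.uniprop( "Sentence_Terminal" ) );

# Print every question mark, flagging the ones that are sentence terminals.
for @question-marks -> $codepoint {
    my $note = ( $codepoint (elem) @terminals ) ?? " (sentence terminal)" !! "";
    say $codepoint.chr, " ", $codepoint.uniname, $note;
}

say @terminals.elems, " of ", @question-marks.elems, " are sentence terminals";
```

Filtering by name first keeps the property lookup limited to a handful of code points instead of the whole range.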

So, what was the funniest thing you found in Unicode?¿⸮
