Sebastian Rapetti

Refactor of a function

Before starting, please consider this my personal opinion: it is how I see things, not the only way to solve the problem.

I am working on a validator library, and one of its filters does HTML escaping (replacing special characters with the correct HTML entities). To escape a special character I need to know the name or the number of its HTML entity. Not all entities have a name, so using the number is obligatory.
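
As a minimal sketch of what "using the number" means, a numeric entity is just the decimal code point wrapped in `&#` and `;`. The helper name `numericEntity` is hypothetical and is not taken from the validator library.

```php
<?php
// Hypothetical helper (an assumption for illustration, not the library's API):
// builds a decimal numeric HTML entity from a Unicode code point.
function numericEntity(int $codePoint): string
{
    return '&#' . $codePoint . ';';
}

// U+20AC (euro sign) is code point 8364; it also has the named entity &euro;.
echo numericEntity(8364), "\n"; // prints &#8364;
```

The hard part, which the rest of the article deals with, is obtaining that code point from the raw UTF-8 bytes.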

To get the number of a character, PHP offers a function called ord(). Unfortunately, ord() does not work with UTF-8 multi-byte characters (it returns only the value of the first byte), and if the multi-byte string functions are not available we need to find another solution.
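
A small sketch of the limitation, using the euro sign written as explicit bytes so the example does not depend on the file encoding:

```php
<?php
// "€" is U+20AC (decimal 8364); in UTF-8 it is the three bytes E2 82 AC.
$euro = "\xE2\x82\xAC";

// ord() only interprets the first byte of the string.
$firstByte = ord($euro);

// strlen() counts bytes, not characters.
$length = strlen($euro);

echo $firstByte, "\n"; // prints 226 (0xE2), not 8364
echo $length, "\n";    // prints 3
```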

To be honest, I don't like using multi-byte functions in a filter because I cannot presume that all strings or characters passed to the filter are multi-byte. In this case defensive programming helps us: we treat everything passed to the filter as potentially dangerous.

In the comments of the ord() function documentation I found a little snippet of code that seems to be the solution to my problem. It returns the character number for all multi-byte characters.

Original code

This is the original version of the function (it's five-year-old code), present in the PHP docs comments:

<?php
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    if ($code >= 128) { //otherwise 0xxxxxxx
        if ($code < 224)
            $bytesnumber = 2; //110xxxxx
        else if ($code < 240)
            $bytesnumber = 3; //1110xxxx
        else if ($code < 248)
            $bytesnumber = 4; //11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128; //10xxxxxx
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string))
        $offset = -1;
    return $code;
}

The code is designed to walk through a whole phrase, not only one character at a time. Inside this function there are two IF, five IF-ELSE (counting the two ternary operators) and one FOR statements. Too complicated.
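
To make the role of the by-reference `$offset` concrete, here is a sketch of how the original function can be driven over a whole string. The function body is a copy of the snippet above; the loop and the sample string are assumptions added for illustration.

```php
<?php
// Copy of the original function from the PHP docs comments.
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    if ($code >= 128) {
        if ($code < 224)
            $bytesnumber = 2;
        else if ($code < 240)
            $bytesnumber = 3;
        else if ($code < 248)
            $bytesnumber = 4;
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset++;
            $code2 = ord(substr($string, $offset, 1)) - 128;
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string))
        $offset = -1;
    return $code;
}

// "a", "é" (C3 A9) and "€" (E2 82 AC) written as explicit bytes.
$text = "a\xC3\xA9\xE2\x82\xAC";

$offset = 0;
$codes = [];
// The function advances $offset itself and sets it to -1 at the end of the string.
while ($offset >= 0) {
    $codes[] = ordutf8($text, $offset);
}

echo implode(', ', $codes), "\n"; // prints 97, 233, 8364
```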

Step 1

We are in 2018 and PHP 7 (7.2 is the current release) provides a lot of new features; the one I love most is type checking!
Next, I only need to handle one character at a time, so the offset argument is no longer required.

<?php
function ordutf8(string $char): int
{
    $code = ord(substr($char, 0, 1));
    if ($code >= 128) {
        if ($code < 224)
            $bytesnumber = 2;
        else if ($code < 240)
            $bytesnumber = 3;
        else if ($code < 248)
            $bytesnumber = 4;
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        $offset = 0;
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset++;
            $code2 = ord(substr($char, $offset, 1)) - 128;
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    return $code;
}

One of the two IF statements was removed.
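
The type declarations added in this step also move some validation out of the function body. A minimal sketch of the effect, using a hypothetical helper `firstByteOf` with the same signature style; note that in PHP's default (non-strict) mode a scalar such as an integer would still be converted to a string, so the example passes an array, which cannot be converted.

```php
<?php
// Hypothetical helper (assumption for illustration) with a typed
// parameter and return value, like the Step 1 signature.
function firstByteOf(string $char): int
{
    return ord($char);
}

$rejected = false;
try {
    // An array cannot be converted to a string, so PHP 7+ throws a TypeError
    // before the function body runs.
    firstByteOf(['not', 'a', 'string']);
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($rejected);        // bool(true)
var_dump(firstByteOf('A')); // int(65)
```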

Step 2

This piece of code contains two IF-ELSE statements written using ternary operators:

<?php
$codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);

We replace them with static values, providing the correct number directly instead of checking it every time.

<?php
function ordutf8(string $char): int
{
    $code = ord(substr($char, 0, 1));
    if ($code >= 128) {
        $count = 0;
        if ($code < 224) {
            $bytesnumber = 2;
        } else if ($code < 240) {
            $bytesnumber = 3;
            $count = 32;
        } else if ($code < 248) {
            $bytesnumber = 4;
            $count = 48;
        }
        $codetemp = $code - 192 - $count;
        $offset = 0;
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset++;
            $code2 = ord(substr($char, $offset, 1)) - 128;
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    return $code;
}

Two of the five IF-ELSE statements were removed.
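
The constants are not arbitrary: 192, 224 (192 + 32) and 240 (192 + 48) are the marker bits of a 2-, 3- and 4-byte lead byte, so the subtraction simply strips the marker and leaves the payload bits. The sketch below checks this against the equivalent bit masks for every valid-looking lead byte; both helper names are hypothetical.

```php
<?php
// Hypothetical helper: payload of a lead byte via the Step 2 subtraction.
function payloadBySubtraction(int $code): int
{
    $count = 0;
    if ($code >= 224 && $code < 240) {
        $count = 32;
    } elseif ($code >= 240) {
        $count = 48;
    }
    return $code - 192 - $count;
}

// Hypothetical helper: the same payload obtained by masking off the marker bits.
function payloadByMask(int $code): int
{
    if ($code < 224) {
        return $code & 0x1F; // 110xxxxx keeps 5 bits
    }
    if ($code < 240) {
        return $code & 0x0F; // 1110xxxx keeps 4 bits
    }
    return $code & 0x07;     // 11110xxx keeps 3 bits
}

// Compare both approaches for lead bytes 0xC0..0xF7.
$mismatches = 0;
for ($code = 192; $code < 248; $code++) {
    if (payloadBySubtraction($code) !== payloadByMask($code)) {
        $mismatches++;
    }
}

echo $mismatches, "\n"; // prints 0
```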

Step 3

We can refactor the FOR statement to remove variables and reduce the lines of code.

<?php
function ordutf8(string $char): int
{
    $code = ord(substr($char, 0, 1));
    if ($code >= 128) {
        $count = 0;
        if ($code < 224) {
            $bytes = 2;
        } else if ($code < 240) {
            $bytes = 3;
            $count = 32;
        } else if ($code < 248) {
            $bytes = 4;
            $count = 48;
        }
        $temp = $code - 192 - $count;
        for ($i = 1; $i < $bytes; $i++) {
            $code = $temp = $temp * 64 + ord(substr($char, $i, 1)) - 128;
        }
    }
    return $code;
}

In detail, from this:

<?php
$codetemp = $code - 192 - $count;
$offset = 0;
for ($i = 2; $i <= $bytesnumber; $i++) {
    $offset++;
    $code2 = ord(substr($char, $offset, 1)) - 128;
    $codetemp = $codetemp * 64 + $code2;
}
$code = $codetemp;

To this:

<?php
$temp = $code - 192 - $count;
for ($i = 1; $i < $bytes; $i++) {
    $code = $temp = $temp * 64 + ord(substr($char, $i, 1)) - 128;
}
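
Each pass of the loop multiplies the accumulated value by 64 (a shift of six bits) and adds the low six bits of the next byte. A small trace for the euro sign, as a sketch with the byte count and the constant 32 hard-coded for a 3-byte character:

```php
<?php
// "€" is U+20AC (decimal 8364), encoded as the bytes E2 82 AC.
$char = "\xE2\x82\xAC";

// Payload of the lead byte of a 3-byte sequence: 226 - 192 - 32 = 2.
$temp = ord($char[0]) - 192 - 32;
$steps = [$temp];

// Same accumulation as the refactored loop, recording every intermediate value.
for ($i = 1; $i < 3; $i++) {
    $temp = $temp * 64 + ord($char[$i]) - 128;
    $steps[] = $temp;
}

echo implode(' -> ', $steps), "\n"; // prints 2 -> 130 -> 8364
```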

Step 4

Now we need to try to remove some of the IF-ELSE statements. This step seems more complicated than the previous ones, but let us reflect on the conditions used in the IF statements.

We now have four conditions:

  • >=128 multi-byte character
  • <224 2 bytes
  • <240 3 bytes
  • <248 4 bytes

We can rewrite them using only three:

  • >127 2 bytes
  • >223 3 bytes
  • >239 4 bytes

without using IF-ELSE statements, simply overwriting the variables whenever the next condition is also true.

<?php
function ordutf8(string $char): int
{
    $code = ord(substr($char, 0, 1));
    if ($code > 127) {
        $bytes = 2;
        $count = 0;
        if ($code > 223) {
            $bytes = 3;
            $count = 32;
        }
        if ($code > 239) {
            $bytes = 4;
            $count = 48;
        }
        $temp = $code - 192 - $count;
        for ($i = 1; $i < $bytes; $i++) {
            $code = $temp = $temp * 64 + ord(substr($char, $i, 1)) - 128;
        }
    }
    return $code;
}

All five IF-ELSE statements were removed. Three IF statements and one FOR statement remain.
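
A refactor like this is easy to verify exhaustively, because there are only 120 possible lead bytes between 128 and 247. The sketch below compares the IF-ELSE chain with the "overwrite" version; both helper names are hypothetical, and lead bytes of 248 and above are left out because the chain version never handled them.

```php
<?php
// Hypothetical helper: byte count and constant chosen by the IF-ELSE chain.
function chainVersion(int $code): array
{
    $bytes = 0;
    $count = 0;
    if ($code < 224) {
        $bytes = 2;
    } else if ($code < 240) {
        $bytes = 3;
        $count = 32;
    } else if ($code < 248) {
        $bytes = 4;
        $count = 48;
    }
    return [$bytes, $count];
}

// Hypothetical helper: same decision with independent IFs that overwrite.
function overwriteVersion(int $code): array
{
    $bytes = 2;
    $count = 0;
    if ($code > 223) {
        $bytes = 3;
        $count = 32;
    }
    if ($code > 239) {
        $bytes = 4;
        $count = 48;
    }
    return [$bytes, $count];
}

// Count the lead bytes for which the two versions disagree.
$differences = 0;
for ($code = 128; $code < 248; $code++) {
    if (chainVersion($code) !== overwriteVersion($code)) {
        $differences++;
    }
}

echo $differences, "\n"; // prints 0
```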

Step 5

We now try to remove the FOR statement and rewrite everything as a mathematical formula.

<?php
function ordutf8(string $char): int
{
    $code = ord(substr($char, 0, 1));
    if ($code > 239) {
        return ((($code - 240) * 64 + ord(substr($char, 1, 1)) - 128)
                * 64 + ord(substr($char, 2, 1)) - 128)
                * 64 + ord(substr($char, 3, 1)) - 128;
    }
    if ($code > 223) {
        return (($code - 224) * 64 + ord(substr($char, 1, 1)) - 128)
                * 64 + ord(substr($char, 2, 1)) - 128;
    }
    if ($code > 127) {
        return ($code - 192) * 64 + ord(substr($char, 1, 1)) - 128;
    }
    return $code;
}

Now only three IF statements remain.
UPDATE: An earlier version of this step commented out the ($code - 240) * 64 term in the first IF statement because it looked like it always returned zero. It is zero only when the lead byte is 0xF0 (code points U+10000 to U+3FFFF). For lead bytes 0xF1 to 0xF4 it is not zero, so the term has to stay in the formula.
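
Whether the ($code - 240) * 64 term matters can be checked with two four-byte characters: one whose lead byte is 0xF0 and one whose lead byte is 0xF4. This is a sketch; the helper `fourByteCodePoint` and its `$keepLeadTerm` switch are hypothetical and exist only to compare the two variants of the formula.

```php
<?php
// Hypothetical helper: decodes one 4-byte UTF-8 character with the Step 5
// formula. $keepLeadTerm switches the ($code - 240) * 64 term on or off.
function fourByteCodePoint(string $char, bool $keepLeadTerm): int
{
    $code = ord($char[0]);
    $lead = $keepLeadTerm ? ($code - 240) * 64 : 0;

    return (($lead + ord($char[1]) - 128) * 64
        + ord($char[2]) - 128) * 64
        + ord($char[3]) - 128;
}

$emoji = "\xF0\x9F\x98\x80"; // U+1F600, lead byte 0xF0
$last  = "\xF4\x8F\xBF\xBF"; // U+10FFFF, lead byte 0xF4

var_dump(fourByteCodePoint($emoji, true));  // int(128512)
var_dump(fourByteCodePoint($emoji, false)); // int(128512), the term is zero here
var_dump(fourByteCodePoint($last, true));   // int(1114111)
var_dump(fourByteCodePoint($last, false));  // int(65535), wrong without the term
```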

Conclusion

I hope this article can help you understand that you do not have to stop at the moment the code works. The code can always be improved, simplified and made more efficient.

The code improvement can be appreciated by running a little test on https://3v4l.org/TWFLX.

The code is also available on my Gist.
