Commit d3f4532

Improve code clarity in epilogue of UTF-8 verification fast path
The previous coding was correct, but the style and commentary were a bit vague about which operations had to happen, in what circumstances, and in what order. Rearrange so that the epilogue does nothing in the DFA END state. That allows turning some conditional statements in the backtracking logic into asserts. With that, we can be more explicit about needing to backtrack at least one byte in non-END states to ensure checking the current byte sequence in the slow path. No change to the regression tests, since they should be able to catch deficiencies here already.

In passing, improve the comments around DFA states where the first continuation byte has a restricted range.

1 parent 9007d4e commit d3f4532

1 file changed: +25 -25 lines changed

src/common/wchar.c

Lines changed: 25 additions & 25 deletions
@@ -1807,12 +1807,11 @@ pg_utf8_verifychar(const unsigned char *s, int len)
 #define CS1 16
 #define CS2  1
 #define CS3  5
-/* Leading byte was E0/ED, expect 1 more continuation byte */
-#define P3A  6
-#define P3B 20
-/* Leading byte was F0/F4, expect 2 more continuation bytes */
-#define P4A 25
-#define P4B 30
+/* Partial states, where the first continuation byte has a restricted range */
+#define P3A  6    /* Lead was E0, check for 3-byte overlong */
+#define P3B 20    /* Lead was ED, check for surrogate */
+#define P4A 25    /* Lead was F0, check for 4-byte overlong */
+#define P4B 30    /* Lead was F4, check for too-large */
 /* Begin and End are the same state */
 #define END BGN
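
The four partial states exist because UTF-8 narrows the allowed range of the first continuation byte after the leads E0, ED, F0 and F4. The sketch below spells out those ranges as plain comparisons instead of DFA transitions. The function name `first_continuation_ok` is hypothetical and does not appear in wchar.c; it also assumes the caller has already established that `lead` is a valid multibyte lead byte.

```c
#include <stdbool.h>

/*
 * Illustrative only, not the DFA from wchar.c: returns true if `second`
 * is acceptable as the first continuation byte after `lead`.
 * Assumes `lead` is already known to be a valid multibyte lead byte.
 */
static bool
first_continuation_ok(unsigned char lead, unsigned char second)
{
    switch (lead)
    {
        case 0xE0:              /* P3A: 80..9F would be a 3-byte overlong */
            return second >= 0xA0 && second <= 0xBF;
        case 0xED:              /* P3B: A0..BF would encode a surrogate */
            return second >= 0x80 && second <= 0x9F;
        case 0xF0:              /* P4A: 80..8F would be a 4-byte overlong */
            return second >= 0x90 && second <= 0xBF;
        case 0xF4:              /* P4B: 90..BF would exceed U+10FFFF */
            return second >= 0x80 && second <= 0x8F;
        default:                /* other leads: full continuation range */
            return second >= 0x80 && second <= 0xBF;
    }
}
```

For example, `first_continuation_ok(0xED, 0xA0)` is false because ED A0 begins the encoding of a surrogate, while `first_continuation_ok(0xE1, 0xA0)` is true.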

@@ -1941,31 +1940,32 @@ pg_utf8_verifystr(const unsigned char *s, int len)
             len -= STRIDE_LENGTH;
         }

-        /*
-         * The error state persists, so we only need to check for it here. In
-         * case of error we start over from the beginning with the slow path
-         * so we can count the valid bytes.
-         */
+        /* The error state persists, so we only need to check for it here. */
         if (state == ERR)
         {
+            /*
+             * Start over from the beginning with the slow path so we can
+             * count the valid bytes.
+             */
             len = orig_len;
             s = start;
         }
-
-        /*
-         * We treat all other states as success, but it's possible the fast
-         * path exited in the middle of a multibyte sequence, since that
-         * wouldn't have caused an error. Before checking the remaining bytes,
-         * walk backwards to find the last byte that could have been the start
-         * of a valid sequence.
-         */
-        while (s > start)
+        else if (state != END)
         {
-            s--;
-            len++;
-
-            if (!IS_HIGHBIT_SET(*s) || pg_utf_mblen(s) > 1)
-                break;
+            /*
+             * The fast path exited in the middle of a multibyte sequence.
+             * Walk backwards to find the leading byte so that the slow path
+             * can resume checking from there. We must always backtrack at
+             * least one byte, since the current byte could be e.g. an ASCII
+             * byte after a 2-byte lead, which is invalid.
+             */
+            do
+            {
+                Assert(s > start);
+                s--;
+                len++;
+                Assert(IS_HIGHBIT_SET(*s));
+            } while (pg_utf_mblen(s) <= 1);
         }
     }
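
The new epilogue can be exercised outside PostgreSQL with a small stand-alone sketch. Everything below is an assumption made for illustration: `sketch_mblen` is a simplified stand-in for `pg_utf_mblen`, the three-value `enum sketch_state` collapses all partial DFA states into one, and `backtrack_to_lead` and `slow_path_start` are hypothetical helpers that work on offsets rather than on the `s`/`len` pair used in wchar.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for pg_utf_mblen(): length implied by a lead byte, else 1. */
static int
sketch_mblen(const unsigned char *s)
{
    if ((*s & 0xE0) == 0xC0)
        return 2;
    if ((*s & 0xF0) == 0xE0)
        return 3;
    if ((*s & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* ASCII, continuation byte, or invalid */
}

/*
 * Hypothetical helper: the fast path stopped at offset `pos` in a partial
 * state. Step back at least one byte, then keep going until a lead byte.
 */
static size_t
backtrack_to_lead(const unsigned char *buf, size_t pos)
{
    do
    {
        assert(pos > 0);                    /* a lead byte precedes pos */
        pos--;
        assert((buf[pos] & 0x80) != 0);     /* partial bytes are non-ASCII */
    } while (sketch_mblen(&buf[pos]) <= 1);

    return pos;
}

/* Collapsed view of the DFA state at the end of the fast path. */
enum sketch_state { SK_END, SK_ERR, SK_PARTIAL };

/* Offset at which the slow path should resume, per the three cases. */
static size_t
slow_path_start(enum sketch_state state, const unsigned char *buf, size_t pos)
{
    if (state == SK_ERR)
        return 0;               /* start over to count the valid bytes */
    if (state == SK_END)
        return pos;             /* nothing to do */
    return backtrack_to_lead(buf, pos);
}
```

With the bytes `'a', 0xC3, 'z'` and the fast path stopped at offset 2, `backtrack_to_lead` returns 1, so the slow path rechecks the 2-byte lead followed by an ASCII byte and rejects it. Testing the byte at offset 2 before stepping back would have stopped on `'z'` and skipped over the invalid sequence, which is why the loop is a `do ... while`.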
