Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitcf98467

Browse files
committed
Improve behavior of to_timestamp()/to_date() functions
to_timestamp()/to_date() functions were introduced mainly for Oraclecompatibility, and became very popular among PostgreSQL users. However, somebehavior of to_timestamp()/to_date() functions are both incompatible with Oracleand confusing for our users. This behavior is related to handling of spaces andseparators in non FX (fixed format) mode. This commit reworks this behaviormaking less confusing, better documented and more compatible with Oracle.Nevertheless, there are still following incompatibilities with Oracle.1) We don't insist that there are no format string patterns unmatched to input string.2) In FX mode we don't insist space and separators in format string to exactly match input string.3) When format string patterns are divided by mix of spaces and separators, we don't distinguish them, while Oracle takes into account only last group of spaces/separators.Discussion:https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.comAuthor: Artur Zakirov, Alexander Korotkov, Liudmila MantrovaReview: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston
1 parent5f08acc commitcf98467

File tree

4 files changed

+280
-26
lines changed

4 files changed

+280
-26
lines changed

‎doc/src/sgml/func.sgml

Lines changed: 58 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6262,16 +6262,57 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
62626262
<listitem>
62636263
<para>
62646264
<function>to_timestamp</function> and <function>to_date</function>
6265-
skip multiple blank spaces in the input string unless the
6266-
<literal>FX</literal> option is used. For example,
6267-
<literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
6265+
skip multiple blank spaces at the beginning of the input string and
6266+
around date and time values unless the <literal>FX</literal> option is used. For example,
6267+
<literal>to_timestamp('&nbsp;2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
6268+
<literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but
62686269
<literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
6269-
because <function>to_timestamp</function> expectsone space only.
6270+
because <function>to_timestamp</function> expectsa single space only.
62706271
<literal>FX</literal> must be specified as the first item in
62716272
the template.
62726273
</para>
62736274
</listitem>
62746275

6276+
<listitem>
6277+
<para>
6278+
A separator (a space or a non-letter/non-digit character) in the template string of
6279+
<function>to_timestamp</function> and <function>to_date</function>
6280+
matches any single separator in the input string or is skipped,
6281+
unless the <literal>FX</literal> option is used.
6282+
For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and
6283+
<literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but
6284+
<literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
6285+
returns an error because the number of separators in the input string
6286+
exceeds the number of separators in the template.
6287+
</para>
6288+
<para>
6289+
If <literal>FX</literal> is specified, separator in template string
6290+
matches to exactly one character in input string. Notice we don't insist
6291+
input string character to be the same as template string separator.
6292+
For example, <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>
6293+
works, but <literal>to_timestamp('2000/JUN', 'FXYYYY&nbsp;&nbsp;MON')</literal>
6294+
returns an error because a space second template string space consumed
6295+
letter <literal>J</literal> from the input string.
6296+
</para>
6297+
</listitem>
6298+
6299+
<listitem>
6300+
<para>
6301+
<literal>TZH</literal> template pattern can match a signed number.
6302+
Without the <literal>FX</literal> option, it may lead to ambiguity in
6303+
interpretation of the minus sign, which can also be interpreted as a separator.
6304+
This ambiguity is resolved as follows. If the number of separators before
6305+
<literal>TZH</literal> in the template string is less than the number of
6306+
separators before the minus sign in the input string, the minus sign
6307+
is interpreted as part of <literal>TZH</literal>.
6308+
Otherwise, the minus sign is considered to be a separator between values.
6309+
For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches
6310+
<literal>-10</literal> to <literal>TZH</literal>, but
6311+
<literal>to_timestamp('2000 -10', 'YYYY&nbsp;&nbsp;TZH')</literal>
6312+
matches <literal>10</literal> to <literal>TZH</literal>.
6313+
</para>
6314+
</listitem>
6315+
62756316
<listitem>
62766317
<para>
62776318
Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6328,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
62876328
string; for example <literal>"XX"</literal> skips two input characters
62886329
(whether or not they are <literal>XX</literal>).
62896330
</para>
6331+
<tip>
6332+
<para>
6333+
Prior to <productname>PostgreSQL</productname> 12, it was possible to
6334+
skip arbitrary text in the input string using non-letter or non-digit
6335+
characters. For example,
6336+
<literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to
6337+
work. Now you can only use letter characters for this purpose. For example,
6338+
<literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and
6339+
<literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal>
6340+
skip <literal>y</literal>, <literal>m</literal>, and
6341+
<literal>d</literal>.
6342+
</para>
6343+
</tip>
62906344
</listitem>
62916345

62926346
<listitem>

‎src/backend/utils/adt/formatting.c

Lines changed: 98 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,8 @@ typedef struct
165165
#defineNODE_TYPE_END1
166166
#defineNODE_TYPE_ACTION2
167167
#defineNODE_TYPE_CHAR3
168+
#defineNODE_TYPE_SEPARATOR4
169+
#defineNODE_TYPE_SPACE5
168170

169171
#defineSUFFTYPE_PREFIX1
170172
#defineSUFFTYPE_POSTFIX2
@@ -955,6 +957,7 @@ typedef struct NUMProc
955957
staticconstKeyWord*index_seq_search(constchar*str,constKeyWord*kw,
956958
constint*index);
957959
staticconstKeySuffix*suff_search(constchar*str,constKeySuffix*suf,inttype);
960+
staticboolis_separator_char(constchar*str);
958961
staticvoidNUMDesc_prepare(NUMDesc*num,FormatNode*n);
959962
staticvoidparse_format(FormatNode*node,constchar*str,constKeyWord*kw,
960963
constKeySuffix*suf,constint*index,intver,NUMDesc*Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
10441047
returnNULL;
10451048
}
10461049

1050+
staticbool
1051+
is_separator_char(constchar*str)
1052+
{
1053+
/* ASCII printable character, but not letter or digit */
1054+
return (*str>0x20&&*str<0x7F&&
1055+
!(*str >='A'&&*str <='Z')&&
1056+
!(*str >='a'&&*str <='z')&&
1057+
!(*str >='0'&&*str <='9'));
1058+
}
1059+
10471060
/* ----------
10481061
* Prepare NUMDesc (number description struct) via FormatNode struct
10491062
* ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
13191332
if (*str=='\\'&&*(str+1)=='"')
13201333
str++;
13211334
chlen=pg_mblen(str);
1322-
n->type=NODE_TYPE_CHAR;
1335+
1336+
if (ver==DCH_TYPE&&is_separator_char(str))
1337+
n->type=NODE_TYPE_SEPARATOR;
1338+
elseif (isspace((unsignedchar)*str))
1339+
n->type=NODE_TYPE_SPACE;
1340+
else
1341+
n->type=NODE_TYPE_CHAR;
1342+
13231343
memcpy(n->character,str,chlen);
13241344
n->character[chlen]='\0';
13251345
n->key=NULL;
@@ -2987,27 +3007,66 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
29873007
intlen,
29883008
value;
29893009
boolfx_mode= false;
3010+
/* number of extra skipped characters (more than given in format string) */
3011+
intextra_skip=0;
29903012

29913013
for (n=node,s=in;n->type!=NODE_TYPE_END&&*s!='\0';n++)
29923014
{
2993-
if (n->type!=NODE_TYPE_ACTION)
3015+
/*
3016+
* Ignore spaces at the beginning of the string and before fields when
3017+
* not in FX (fixed width) mode.
3018+
*/
3019+
if (!fx_mode&& (n->type!=NODE_TYPE_ACTION||n->key->id!=DCH_FX)&&
3020+
(n->type==NODE_TYPE_ACTION||n==node))
3021+
{
3022+
while (*s!='\0'&&isspace((unsignedchar)*s))
3023+
{
3024+
s++;
3025+
extra_skip++;
3026+
}
3027+
}
3028+
3029+
if (n->type==NODE_TYPE_SPACE||n->type==NODE_TYPE_SEPARATOR)
3030+
{
3031+
if (!fx_mode)
3032+
{
3033+
/*
3034+
* In non FX (fixed format) mode one format string space or
3035+
* separator match to one space or separator in input string.
3036+
* Or match nothing if there is no space or separator in
3037+
* the current position of input string.
3038+
*/
3039+
extra_skip--;
3040+
if (isspace((unsignedchar)*s)||is_separator_char(s))
3041+
{
3042+
s++;
3043+
extra_skip++;
3044+
}
3045+
}
3046+
else
3047+
{
3048+
/*
3049+
* In FX mode, on format string space or separator we consume
3050+
* exactly one character from input string. Notice we don't
3051+
* insist that the consumed character match the format's
3052+
* character.
3053+
*/
3054+
s+=pg_mblen(s);
3055+
}
3056+
continue;
3057+
}
3058+
elseif (n->type!=NODE_TYPE_ACTION)
29943059
{
29953060
/*
2996-
* Separator, so consume one character from input string. Notice
2997-
* we don't insist that the consumed character match the format's
2998-
* character.
3061+
* Text character, so consume one character from input string.
3062+
* Notice we don't insist that the consumed character match the
3063+
* format's character.
3064+
* Text field ignores FX mode.
29993065
*/
30003066
s+=pg_mblen(s);
30013067
continue;
30023068
}
30033069

3004-
/* Ignore spaces before fields when not in FX (fixed width) mode */
3005-
if (!fx_mode&&n->key->id!=DCH_FX)
3006-
{
3007-
while (*s!='\0'&&isspace((unsignedchar)*s))
3008-
s++;
3009-
}
3010-
30113070
from_char_set_mode(out,n->key->date_mode);
30123071

30133072
switch (n->key->id)
@@ -3086,10 +3145,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
30863145
n->key->name)));
30873146
break;
30883147
caseDCH_TZH:
3089-
out->tzsign=*s=='-' ?-1 :+1;
3090-
3148+
/*
3149+
* Value of TZH might be negative. And the issue is that we
3150+
* might swallow minus sign as the separator. So, if we have
3151+
* skipped more characters than specified in the format string,
3152+
* then we consider prepending last skipped minus to TZH.
3153+
*/
30913154
if (*s=='+'||*s=='-'||*s==' ')
3155+
{
3156+
out->tzsign=*s=='-' ?-1 :+1;
30923157
s++;
3158+
}
3159+
else
3160+
{
3161+
if (extra_skip>0&&*(s-1)=='-')
3162+
out->tzsign=-1;
3163+
else
3164+
out->tzsign=+1;
3165+
}
30933166

30943167
from_char_parse_int_len(&out->tzh,&s,2,n);
30953168
break;
@@ -3261,6 +3334,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
32613334
SKIP_THth(s,n->suffix);
32623335
break;
32633336
}
3337+
3338+
/* Ignore all spaces after fields */
3339+
if (!fx_mode)
3340+
{
3341+
extra_skip=0;
3342+
while (*s!='\0'&&isspace((unsignedchar)*s))
3343+
{
3344+
s++;
3345+
extra_skip++;
3346+
}
3347+
}
32643348
}
32653349
}
32663350

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp