NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commitcf98467

committed

Improve behavior of to_timestamp()/to_date() functions

to_timestamp()/to_date() functions were introduced mainly for Oraclecompatibility, and became very popular among PostgreSQL users. However, somebehavior of to_timestamp()/to_date() functions are both incompatible with Oracleand confusing for our users. This behavior is related to handling of spaces andseparators in non FX (fixed format) mode. This commit reworks this behaviormaking less confusing, better documented and more compatible with Oracle.Nevertheless, there are still following incompatibilities with Oracle.1) We don't insist that there are no format string patterns unmatched to input string.2) In FX mode we don't insist space and separators in format string to exactly match input string.3) When format string patterns are divided by mix of spaces and separators, we don't distinguish them, while Oracle takes into account only last group of spaces/separators.Discussion:https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.comAuthor: Artur Zakirov, Alexander Korotkov, Liudmila MantrovaReview: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston

1 parent5f08acc commitcf98467Copy full SHA for cf98467

File tree

4 files changed

+280

-26

lines changed

doc/src/sgml
- func.sgml
src
- backend/utils/adt
  - formatting.c
- test/regress
  - expected
    - horology.out
  - sql
    - horology.sql

4 files changed

+280

-26

lines changed

`‎doc/src/sgml/func.sgml‎`

Lines changed: 58 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -6262,16 +6262,57 @@ SELECT regexp_match('abc01234xyz', '(?:(.?)(\d+)(.)){1,1}');`
`6262`	`6262`	`<listitem>`
`6263`	`6263`	`<para>`
`6264`	`6264`	`<function>to_timestamp</function> and <function>to_date</function>`
`6265`		`- skip multiple blank spaces in the input string unless the`
`6266`		`- <literal>FX</literal> option is used. For example,`
`6267`		`- <literal>to_timestamp('2000    JUN', 'YYYY MON')</literal> works, but`
	`6265`	`+ skip multiple blank spaces at the beginning of the input string and`
	`6266`	`+ around date and time values unless the <literal>FX</literal> option is used. For example,`
	`6267`	`+ <literal>to_timestamp(' 2000    JUN', 'YYYY MON')</literal> and`
	`6268`	`+ <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but`
`6268`	`6269`	`<literal>to_timestamp('2000    JUN', 'FXYYYY MON')</literal> returns an error`
`6269`		`- because <function>to_timestamp</function> expectsone space only.`
	`6270`	`+ because <function>to_timestamp</function> expectsa single space only.`
`6270`	`6271`	`<literal>FX</literal> must be specified as the first item in`
`6271`	`6272`	`the template.`
`6272`	`6273`	`</para>`
`6273`	`6274`	`</listitem>`
`6274`	`6275`
	`6276`	`+ <listitem>`
	`6277`	`+ <para>`
	`6278`	`+ A separator (a space or a non-letter/non-digit character) in the template string of`
	`6279`	`+ <function>to_timestamp</function> and <function>to_date</function>`
	`6280`	`+ matches any single separator in the input string or is skipped,`
	`6281`	`+ unless the <literal>FX</literal> option is used.`
	`6282`	`+ For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and`
	`6283`	`+ <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but`
	`6284`	`+ <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>`
	`6285`	`+ returns an error because the number of separators in the input string`
	`6286`	`+ exceeds the number of separators in the template.`
	`6287`	`+ </para>`
	`6288`	`+ <para>`
	`6289`	`+ If <literal>FX</literal> is specified, separator in template string`
	`6290`	`+ matches to exactly one character in input string. Notice we don't insist`
	`6291`	`+ input string character to be the same as template string separator.`
	`6292`	`+ For example, <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>`
	`6293`	`+ works, but <literal>to_timestamp('2000/JUN', 'FXYYYY  MON')</literal>`
	`6294`	`+ returns an error because a space second template string space consumed`
	`6295`	`+ letter <literal>J</literal> from the input string.`
	`6296`	`+ </para>`
	`6297`	`+ </listitem>`
	`6298`	`+`
	`6299`	`+ <listitem>`
	`6300`	`+ <para>`
	`6301`	`+ <literal>TZH</literal> template pattern can match a signed number.`
	`6302`	`+ Without the <literal>FX</literal> option, it may lead to ambiguity in`
	`6303`	`+ interpretation of the minus sign, which can also be interpreted as a separator.`
	`6304`	`+ This ambiguity is resolved as follows. If the number of separators before`
	`6305`	`+ <literal>TZH</literal> in the template string is less than the number of`
	`6306`	`+ separators before the minus sign in the input string, the minus sign`
	`6307`	`+ is interpreted as part of <literal>TZH</literal>.`
	`6308`	`+ Otherwise, the minus sign is considered to be a separator between values.`
	`6309`	`+ For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches`
	`6310`	`+ <literal>-10</literal> to <literal>TZH</literal>, but`
	`6311`	`+ <literal>to_timestamp('2000 -10', 'YYYY  TZH')</literal>`
	`6312`	`+ matches <literal>10</literal> to <literal>TZH</literal>.`
	`6313`	`+ </para>`
	`6314`	`+ </listitem>`
	`6315`	`+`
`6275`	`6316`	`<listitem>`
`6276`	`6317`	`<para>`
`6277`	`6318`	`Ordinary text is allowed in <function>to_char</function>`
`@@ -6287,6 +6328,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.?)(\d+)(.)){1,1}');`
`6287`	`6328`	`string; for example <literal>"XX"</literal> skips two input characters`
`6288`	`6329`	`(whether or not they are <literal>XX</literal>).`
`6289`	`6330`	`</para>`
	`6331`	`+ <tip>`
	`6332`	`+ <para>`
	`6333`	`+ Prior to <productname>PostgreSQL</productname> 12, it was possible to`
	`6334`	`+ skip arbitrary text in the input string using non-letter or non-digit`
	`6335`	`+ characters. For example,`
	`6336`	`+ <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to`
	`6337`	`+ work. Now you can only use letter characters for this purpose. For example,`
	`6338`	`+ <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and`
	`6339`	`+ <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal>`
	`6340`	`+ skip <literal>y</literal>, <literal>m</literal>, and`
	`6341`	`+ <literal>d</literal>.`
	`6342`	`+ </para>`
	`6343`	`+ </tip>`
`6290`	`6344`	`</listitem>`
`6291`	`6345`
`6292`	`6346`	`<listitem>`

`‎src/backend/utils/adt/formatting.c‎`

Lines changed: 98 additions & 14 deletions

Original file line number	Diff line number	Diff line change
`@@ -165,6 +165,8 @@ typedef struct`
`165`	`165`	`#defineNODE_TYPE_END1`
`166`	`166`	`#defineNODE_TYPE_ACTION2`
`167`	`167`	`#defineNODE_TYPE_CHAR3`
	`168`	`+#defineNODE_TYPE_SEPARATOR4`
	`169`	`+#defineNODE_TYPE_SPACE5`
`168`	`170`
`169`	`171`	`#defineSUFFTYPE_PREFIX1`
`170`	`172`	`#defineSUFFTYPE_POSTFIX2`
`@@ -955,6 +957,7 @@ typedef struct NUMProc`
`955`	`957`	`staticconstKeyWordindex_seq_search(constcharstr,constKeyWord*kw,`
`956`	`958`	`constint*index);`
`957`	`959`	`staticconstKeySuffixsuff_search(constcharstr,constKeySuffix*suf,inttype);`
	`960`	`+staticboolis_separator_char(constchar*str);`
`958`	`961`	`staticvoidNUMDesc_prepare(NUMDescnum,FormatNoden);`
`959`	`962`	`staticvoidparse_format(FormatNodenode,constcharstr,constKeyWord*kw,`
`960`	`963`	`constKeySuffixsuf,constintindex,intver,NUMDesc*Num);`
`@@ -1044,6 +1047,16 @@ suff_search(const char str, const KeySuffix suf, int type)`
`1044`	`1047`	`returnNULL;`
`1045`	`1048`	`}`
`1046`	`1049`
	`1050`	`+staticbool`
	`1051`	`+is_separator_char(constchar*str)`
	`1052`	`+{`
	`1053`	`+/* ASCII printable character, but not letter or digit */`
	`1054`	`+return (str>0x20&&str<0x7F&&`
	`1055`	`+!(str >='A'&&str <='Z')&&`
	`1056`	`+!(str >='a'&&str <='z')&&`
	`1057`	`+!(str >='0'&&str <='9'));`
	`1058`	`+}`
	`1059`	`+`
`1047`	`1060`	`/* ----------`
`1048`	`1061`	`* Prepare NUMDesc (number description struct) via FormatNode struct`
`1049`	`1062`	`* ----------`
`@@ -1319,7 +1332,14 @@ parse_format(FormatNode node, const char str, const KeyWord *kw,`
`1319`	`1332`	`if (str=='\\'&&(str+1)=='"')`
`1320`	`1333`	`str++;`
`1321`	`1334`	`chlen=pg_mblen(str);`
`1322`		`-n->type=NODE_TYPE_CHAR;`
	`1335`	`+`
	`1336`	`+if (ver==DCH_TYPE&&is_separator_char(str))`
	`1337`	`+n->type=NODE_TYPE_SEPARATOR;`
	`1338`	`+elseif (isspace((unsignedchar)*str))`
	`1339`	`+n->type=NODE_TYPE_SPACE;`
	`1340`	`+else`
	`1341`	`+n->type=NODE_TYPE_CHAR;`
	`1342`	`+`
`1323`	`1343`	`memcpy(n->character,str,chlen);`
`1324`	`1344`	`n->character[chlen]='\0';`
`1325`	`1345`	`n->key=NULL;`
`@@ -2987,27 +3007,66 @@ DCH_from_char(FormatNode node, char in, TmFromChar *out)`
`2987`	`3007`	`intlen,`
`2988`	`3008`	`value;`
`2989`	`3009`	`boolfx_mode= false;`
	`3010`	`+/* number of extra skipped characters (more than given in format string) */`
	`3011`	`+intextra_skip=0;`
`2990`	`3012`
`2991`	`3013`	`for (n=node,s=in;n->type!=NODE_TYPE_END&&*s!='\0';n++)`
`2992`	`3014`	`{`
`2993`		`-if (n->type!=NODE_TYPE_ACTION)`
	`3015`	`+/*`
	`3016`	`+ * Ignore spaces at the beginning of the string and before fields when`
	`3017`	`+ * not in FX (fixed width) mode.`
	`3018`	`+ */`
	`3019`	`+if (!fx_mode&& (n->type!=NODE_TYPE_ACTION\|\|n->key->id!=DCH_FX)&&`
	`3020`	`+(n->type==NODE_TYPE_ACTION\|\|n==node))`
	`3021`	`+{`
	`3022`	`+while (s!='\0'&&isspace((unsignedchar)s))`
	`3023`	`+{`
	`3024`	`+s++;`
	`3025`	`+extra_skip++;`
	`3026`	`+}`
	`3027`	`+}`
	`3028`	`+`
	`3029`	`+if (n->type==NODE_TYPE_SPACE\|\|n->type==NODE_TYPE_SEPARATOR)`
	`3030`	`+{`
	`3031`	`+if (!fx_mode)`
	`3032`	`+{`
	`3033`	`+/*`
	`3034`	`+ * In non FX (fixed format) mode one format string space or`
	`3035`	`+ * separator match to one space or separator in input string.`
	`3036`	`+ * Or match nothing if there is no space or separator in`
	`3037`	`+ * the current position of input string.`
	`3038`	`+ */`
	`3039`	`+extra_skip--;`
	`3040`	`+if (isspace((unsignedchar)*s)\|\|is_separator_char(s))`
	`3041`	`+{`
	`3042`	`+s++;`
	`3043`	`+extra_skip++;`
	`3044`	`+}`
	`3045`	`+}`
	`3046`	`+else`
	`3047`	`+{`
	`3048`	`+/*`
	`3049`	`+ * In FX mode, on format string space or separator we consume`
	`3050`	`+ * exactly one character from input string. Notice we don't`
	`3051`	`+ * insist that the consumed character match the format's`
	`3052`	`+ * character.`
	`3053`	`+ */`
	`3054`	`+s+=pg_mblen(s);`
	`3055`	`+}`
	`3056`	`+continue;`
	`3057`	`+}`
	`3058`	`+elseif (n->type!=NODE_TYPE_ACTION)`
`2994`	`3059`	`{`
`2995`	`3060`	`/*`
`2996`		`- * Separator, so consume one character from input string. Notice`
`2997`		`- * we don't insist that the consumed character match the format's`
`2998`		`- * character.`
	`3061`	`+ * Text character, so consume one character from input string.`
	`3062`	`+ * Notice we don't insist that the consumed character match the`
	`3063`	`+ * format's character.`
	`3064`	`+ * Text field ignores FX mode.`
`2999`	`3065`	`*/`
`3000`	`3066`	`s+=pg_mblen(s);`
`3001`	`3067`	`continue;`
`3002`	`3068`	`}`
`3003`	`3069`
`3004`		`-/* Ignore spaces before fields when not in FX (fixed width) mode */`
`3005`		`-if (!fx_mode&&n->key->id!=DCH_FX)`
`3006`		`-{`
`3007`		`-while (s!='\0'&&isspace((unsignedchar)s))`
`3008`		`-s++;`
`3009`		`-}`
`3010`		`-`
`3011`	`3070`	`from_char_set_mode(out,n->key->date_mode);`
`3012`	`3071`
`3013`	`3072`	`switch (n->key->id)`
`@@ -3086,10 +3145,24 @@ DCH_from_char(FormatNode node, char in, TmFromChar *out)`
`3086`	`3145`	`n->key->name)));`
`3087`	`3146`	`break;`
`3088`	`3147`	`caseDCH_TZH:`
`3089`		`-out->tzsign=*s=='-' ?-1 :+1;`
`3090`		`-`
	`3148`	`+/*`
	`3149`	`+ * Value of TZH might be negative. And the issue is that we`
	`3150`	`+ * might swallow minus sign as the separator. So, if we have`
	`3151`	`+ * skipped more characters than specified in the format string,`
	`3152`	`+ * then we consider prepending last skipped minus to TZH.`
	`3153`	`+ */`
`3091`	`3154`	`if (s=='+'\|\|s=='-'\|\|*s==' ')`
	`3155`	`+{`
	`3156`	`+out->tzsign=*s=='-' ?-1 :+1;`
`3092`	`3157`	`s++;`
	`3158`	`+}`
	`3159`	`+else`
	`3160`	`+{`
	`3161`	`+if (extra_skip>0&&*(s-1)=='-')`
	`3162`	`+out->tzsign=-1;`
	`3163`	`+else`
	`3164`	`+out->tzsign=+1;`
	`3165`	`+}`
`3093`	`3166`
`3094`	`3167`	`from_char_parse_int_len(&out->tzh,&s,2,n);`
`3095`	`3168`	`break;`
`@@ -3261,6 +3334,17 @@ DCH_from_char(FormatNode node, char in, TmFromChar *out)`
`3261`	`3334`	`SKIP_THth(s,n->suffix);`
`3262`	`3335`	`break;`
`3263`	`3336`	`}`
	`3337`	`+`
	`3338`	`+/* Ignore all spaces after fields */`
	`3339`	`+if (!fx_mode)`
	`3340`	`+{`
	`3341`	`+extra_skip=0;`
	`3342`	`+while (s!='\0'&&isspace((unsignedchar)s))`
	`3343`	`+{`
	`3344`	`+s++;`
	`3345`	`+extra_skip++;`
	`3346`	`+}`
	`3347`	`+}`
`3264`	`3348`	`}`
`3265`	`3349`	`}`
`3266`	`3350`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commitcf98467

File tree

4 files changed

4 files changed

`‎doc/src/sgml/func.sgml‎`

`‎src/backend/utils/adt/formatting.c‎`

0 commit comments