Commit 8870917

Apply auto-vectorization to the inner loop of numeric multiplication.

Compile numeric.c with -ftree-vectorize where available, and adjust the innermost loop of mul_var() so that it is amenable to being auto-vectorized. (Mainly, that involves making it process the arrays left-to-right not right-to-left.)

Applying -ftree-vectorize actually makes numeric.o smaller, at least with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too. Independently of that, fixing the inner loop to be vectorizable also makes things a bit faster. But doing both is a huge win for multiplications with lots of digits. For me, the numeric regression test is the same speed to within measurement noise, but numeric_big is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o bloat quite a bit, and the additional speed improvement is very marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com

1 parent 695de5d, commit 8870917

File tree

2 files changed, +15 -3 lines changed

src/backend/utils/adt
- Makefile
- numeric.c

`‎src/backend/utils/adt/Makefile‎`

Lines changed: 3 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -125,6 +125,9 @@ clean distclean maintainer-clean:`
`125`	`125`
`126`	`126`	`like.o: like.c like_match.c`
`127`	`127`
	`128`	`+# Some code in numeric.c benefits from auto-vectorization`
	`129`	`+numeric.o: CFLAGS += ${CFLAGS_VECTORIZE}`
	`130`	`+`
`128`	`131`	`varlena.o: varlena.c levenshtein.c`
`129`	`132`
`130`	`133`	`include $(top_srcdir)/src/backend/common.mk`

`‎src/backend/utils/adt/numeric.c‎`

Lines changed: 12 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -8191,6 +8191,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,`
`8191`	`8191`	`int res_weight;`
`8192`	`8192`	`int maxdigits;`
`8193`	`8193`	`int *dig;`
	`8194`	`+int *dig_i1_2;`
`8194`	`8195`	`int carry;`
`8195`	`8196`	`int maxdig;`
`8196`	`8197`	`int newdig;`
`@@ -8327,10 +8328,18 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,`
`8327`	`8328`	`*`
`8328`	`8329`	`* As above, digits of var2 can be ignored if they don't contribute,`
`8329`	`8330`	`* so we only include digits for which i1+i2+2 <= res_ndigits - 1.`
	`8331`	`+ *`
	`8332`	`+ * This inner loop is the performance bottleneck for multiplication,`
	`8333`	`+ * so we want to keep it simple enough so that it can be`
	`8334`	`+ * auto-vectorized. Accordingly, process the digits left-to-right`
	`8335`	`+ * even though schoolbook multiplication would suggest right-to-left.`
	`8336`	`+ * Since we aren't propagating carries in this loop, the order does`
	`8337`	`+ * not matter.`
`8330`	`8338`	`*/`
`8331`		`-for (i2 = Min(var2ndigits - 1, res_ndigits - i1 - 3), i = i1 + i2 + 2;`
`8332`		`-i2 >= 0; i2--)`
`8333`		`-dig[i--] += var1digit * var2digits[i2];`
	`8339`	`+i = Min(var2ndigits - 1, res_ndigits - i1 - 3);`
	`8340`	`+dig_i1_2 = &dig[i1 + 2];`
	`8341`	`+for (i2 = 0; i2 <= i; i2++)`
	`8342`	`+dig_i1_2[i2] += var1digit * var2digits[i2];`
`8334`	`8343`	`}`
`8335`	`8344`
`8336`	`8345`	`/*`

0 commit comments