Commit 58a359e

Speedup tuple deformation with additional function inlining

This adjusts slot_deform_heap_tuple() to add special-case loops to
eliminate much of the branching that was done within the body of the
main deform loop.

Previously, while looping over each attribute to deform,
slot_deform_heap_tuple() would always recheck if the given attribute was
NULL by looking at HeapTupleHasNulls() and if so, went on to check the
tuple's NULL bitmap. Since many tuples won't contain any NULLs, we can
just check HeapTupleHasNulls() once and when there are no NULLs, use a
more compact version of the deforming loop which contains no NULL checking
code at all.

The same is possible for the "slow" mode checking part of the loop. That
variable was checked several times for each attribute, once to determine
if the offset to the attribute value could be taken from the attcacheoff,
and again to check if the offset could be cached for next time.

These "slow" checks can mostly be eliminated by instead having multiple
loops. Initially, we can start in the non-slow loop and break out of
that loop if and only if we must stop caching the offset. This
eliminates branching for both slow and non-slow deforming methods. The
amount of code required for the no nulls / non-slow version is very
small. It's possible to have separate loops like this due to the fact
that once we move into slow mode, we never need to switch back into
non-slow mode for a given tuple.

We have the compiler take care of writing out the multiple required
loops by having a pg_attribute_always_inline function which gets called
various times passing in constant values for the "slow" and "hasnulls"
parameters. This allows the compiler to eliminate const-false branches
and remove comparisons for const-true ones.

This commit has shown overall query performance increases of around 5-20%
in deform-heavy OLAP-type workloads.

Author: David Rowley
Reviewed-by: Victor Yegorov
Discussion: https://postgr.es/m/CAGnEbog92Og2CpC2S8=g_HozGsWtt_3kRS1sXjLz0jKSoCNfLw@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvo9e0XG71WrefYaRv5n4xNPLK4k8LjD0mSR3c9KR2vi2Q@mail.gmail.com
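
The specialization approach described in the commit message can be illustrated outside PostgreSQL. What follows is a minimal sketch, not the committed code: ToyRow, toy_deform_internal, toy_deform, TOY_MAX_ATTS and the ALWAYS_INLINE macro are hypothetical names made up for this illustration, and the row layout (packed int32 values plus a one-byte bitmap in which a set bit means "not NULL") is an assumption chosen for brevity. The worker is always inlined and receives "hasnulls" as a literal at each call site, so an optimizing compiler is free to drop the NULL test entirely from the "false" copy. Whether it actually does so depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#define ALWAYS_INLINE inline	/* fallback: only a hint to the compiler */
#endif

#define TOY_MAX_ATTS 8

/* Hypothetical toy row: non-NULL values are stored packed in data[]. */
typedef struct ToyRow
{
	bool		hasnulls;	/* does any column hold a NULL? */
	uint8_t		nullbits;	/* bit i set => column i is NOT NULL */
	int32_t		data[TOY_MAX_ATTS];
} ToyRow;

/*
 * Worker loop.  "hasnulls" is passed as a literal by every caller, so after
 * inlining the compiler sees a constant and can remove the bitmap test from
 * the hasnulls == false copy.
 */
static ALWAYS_INLINE void
toy_deform_internal(const ToyRow *row, int natts, bool hasnulls,
					int32_t *values, bool *isnull)
{
	int			next = 0;	/* index of the next packed value */

	for (int attnum = 0; attnum < natts; attnum++)
	{
		if (hasnulls && !(row->nullbits & (1u << attnum)))
		{
			values[attnum] = 0;
			isnull[attnum] = true;
			continue;
		}

		isnull[attnum] = false;
		values[attnum] = row->data[next++];
	}
}

/* Check the per-row flag once, then pick a specialized copy of the loop. */
static void
toy_deform(const ToyRow *row, int natts, int32_t *values, bool *isnull)
{
	assert(natts >= 0 && natts <= TOY_MAX_ATTS);

	if (!row->hasnulls)
		toy_deform_internal(row, natts, false, values, isnull);
	else
		toy_deform_internal(row, natts, true, values, isnull);
}
```

The two calls in toy_deform look redundant at the source level, but that duplication is the point: each call site gets its own inlined copy of the loop with a compile-time constant in place of the flag.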
1 parent d85ce01 commit 58a359e

1 file changed: 154 additions, 54 deletions

src/backend/executor/execTuples.c

@@ -991,54 +991,40 @@ tts_buffer_heap_store_tuple(TupleTableSlot *slot, HeapTuple tuple,
 }
 
 /*
- * slot_deform_heap_tuple
- *      Given a TupleTableSlot, extract data from the slot's physical tuple
- *      into its Datum/isnull arrays. Data is extracted up through the
- *      natts'th column (caller must ensure this is a legal column number).
+ * slot_deform_heap_tuple_internal
+ *      An always inline helper function for use in slot_deform_heap_tuple to
+ *      allow the compiler to emit specialized versions of this function for
+ *      various combinations of "slow" and "hasnulls". For example, if a
+ *      given tuple has no nulls, then we needn't check "hasnulls" for every
+ *      attribute that we're deforming. The caller can just call this
+ *      function with hasnulls set to constant-false and have the compiler
+ *      remove the constant-false branches and emit more optimal code.
  *
- *      This is essentially an incremental version of heap_deform_tuple:
- *      on each call we extract attributes up to the one needed, without
- *      re-computing information about previously extracted attributes.
- *      slot->tts_nvalid is the number of attributes already extracted.
+ * Returns the next attnum to deform, which can be equal to natts when the
+ * function manages to deform all requested attributes. *offp is an input and
+ * output parameter which is the byte offset within the tuple to start deforming
+ * from which, on return, gets set to the offset where the next attribute
+ * should be deformed from. *slowp is set to true when subsequent deforming
+ * of this tuple must use a version of this function with "slow" passed as
+ * true.
  *
- * This is marked as always inline, so the different offp for different types
- * of slots gets optimized away.
+ * Callers cannot assume when we return "attnum" (i.e. all requested
+ * attributes have been deformed) that slow mode isn't required for any
+ * additional deforming as the final attribute may have caused a switch to
+ * slow mode.
  */
-static pg_attribute_always_inline void
-slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
-                       int natts)
+static pg_attribute_always_inline int
+slot_deform_heap_tuple_internal(TupleTableSlot *slot, HeapTuple tuple,
+                                int attnum, int natts, bool slow,
+                                bool hasnulls, uint32 *offp, bool *slowp)
 {
     TupleDesc   tupleDesc = slot->tts_tupleDescriptor;
     Datum      *values = slot->tts_values;
     bool       *isnull = slot->tts_isnull;
     HeapTupleHeader tup = tuple->t_data;
-    bool        hasnulls = HeapTupleHasNulls(tuple);
-    int         attnum;
     char       *tp;             /* ptr to tuple data */
-    uint32      off;            /* offset in tuple data */
     bits8      *bp = tup->t_bits;   /* ptr to null bitmap in tuple */
-    bool        slow;           /* can we use/set attcacheoff? */
-
-    /* We can only fetch as many attributes as the tuple has. */
-    natts = Min(HeapTupleHeaderGetNatts(tuple->t_data), natts);
-
-    /*
-     * Check whether the first call for this tuple, and initialize or restore
-     * loop state.
-     */
-    attnum = slot->tts_nvalid;
-    if (attnum == 0)
-    {
-        /* Start from the first attribute */
-        off = 0;
-        slow = false;
-    }
-    else
-    {
-        /* Restore state from previous execution */
-        off = *offp;
-        slow = TTS_SLOW(slot);
-    }
+    bool        slownext = false;
 
     tp = (char *) tup + tup->t_hoff;
 
@@ -1050,14 +1036,20 @@ slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
         {
             values[attnum] = (Datum) 0;
             isnull[attnum] = true;
-            slow = true;        /* can't use attcacheoff anymore */
-            continue;
+            if (!slow)
+            {
+                *slowp = true;
+                return attnum + 1;
+            }
+            else
+                continue;
         }
 
         isnull[attnum] = false;
 
+        /* calculate the offset of this attribute */
         if (!slow && thisatt->attcacheoff >= 0)
-            off = thisatt->attcacheoff;
+            *offp = thisatt->attcacheoff;
         else if (thisatt->attlen == -1)
         {
             /*
@@ -1066,31 +1058,140 @@ slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
              * pad bytes in any case: then the offset will be valid for either
              * an aligned or unaligned value.
              */
-            if (!slow &&
-                off == att_nominal_alignby(off, thisatt->attalignby))
-                thisatt->attcacheoff = off;
+            if (!slow && *offp == att_nominal_alignby(*offp, thisatt->attalignby))
+                thisatt->attcacheoff = *offp;
             else
             {
-                off = att_pointer_alignby(off, thisatt->attalignby, -1,
-                                          tp + off);
-                slow = true;
+                *offp = att_pointer_alignby(*offp,
+                                            thisatt->attalignby,
+                                            -1,
+                                            tp + *offp);
+
+                if (!slow)
+                    slownext = true;
             }
         }
         else
         {
             /* not varlena, so safe to use att_nominal_alignby */
-            off = att_nominal_alignby(off, thisatt->attalignby);
+            *offp = att_nominal_alignby(*offp, thisatt->attalignby);
 
             if (!slow)
-                thisatt->attcacheoff = off;
+                thisatt->attcacheoff = *offp;
+        }
+
+        values[attnum] = fetchatt(thisatt, tp + *offp);
+
+        *offp = att_addlength_pointer(*offp, thisatt->attlen, tp + *offp);
+
+        /* check if we need to switch to slow mode */
+        if (!slow)
+        {
+            /*
+             * We're unable to deform any further if the above code set
+             * 'slownext', or if this isn't a fixed-width attribute.
+             */
+            if (slownext || thisatt->attlen <= 0)
+            {
+                *slowp = true;
+                return attnum + 1;
+            }
         }
+    }
 
-        values[attnum] = fetchatt(thisatt, tp + off);
+    return natts;
+}
 
-        off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+/*
+ * slot_deform_heap_tuple
+ *      Given a TupleTableSlot, extract data from the slot's physical tuple
+ *      into its Datum/isnull arrays. Data is extracted up through the
+ *      natts'th column (caller must ensure this is a legal column number).
+ *
+ *      This is essentially an incremental version of heap_deform_tuple:
+ *      on each call we extract attributes up to the one needed, without
+ *      re-computing information about previously extracted attributes.
+ *      slot->tts_nvalid is the number of attributes already extracted.
+ *
+ * This is marked as always inline, so the different offp for different types
+ * of slots gets optimized away.
+ */
+static pg_attribute_always_inline void
+slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
+                       int natts)
+{
+    bool        hasnulls = HeapTupleHasNulls(tuple);
+    int         attnum;
+    uint32      off;            /* offset in tuple data */
+    bool        slow;           /* can we use/set attcacheoff? */
+
+    /* We can only fetch as many attributes as the tuple has. */
+    natts = Min(HeapTupleHeaderGetNatts(tuple->t_data), natts);
 
-        if (thisatt->attlen <= 0)
-            slow = true;        /* can't use attcacheoff anymore */
+    /*
+     * Check whether the first call for this tuple, and initialize or restore
+     * loop state.
+     */
+    attnum = slot->tts_nvalid;
+    if (attnum == 0)
+    {
+        /* Start from the first attribute */
+        off = 0;
+        slow = false;
+    }
+    else
+    {
+        /* Restore state from previous execution */
+        off = *offp;
+        slow = TTS_SLOW(slot);
+    }
+
+    /*
+     * If 'slow' isn't set, try deforming using deforming code that does not
+     * contain any of the extra checks required for non-fixed offset
+     * deforming. During deforming, if or when we find a NULL or a variable
+     * length attribute, we'll switch to a deforming method which includes the
+     * extra code required for non-fixed offset deforming, a.k.a slow mode.
+     * Because this is performance critical, we inline
+     * slot_deform_heap_tuple_internal passing the 'slow' and 'hasnull'
+     * parameters as constants to allow the compiler to emit specialized code
+     * with the known-const false comparisons and subsequent branches removed.
+     */
+    if (!slow)
+    {
+        /* Tuple without any NULLs? We can skip doing any NULL checking */
+        if (!hasnulls)
+            attnum = slot_deform_heap_tuple_internal(slot,
+                                                     tuple,
+                                                     attnum,
+                                                     natts,
+                                                     false, /* slow */
+                                                     false, /* hasnulls */
+                                                     &off,
+                                                     &slow);
+        else
+            attnum = slot_deform_heap_tuple_internal(slot,
+                                                     tuple,
+                                                     attnum,
+                                                     natts,
+                                                     false, /* slow */
+                                                     true,  /* hasnulls */
+                                                     &off,
+                                                     &slow);
+    }
+
+    /* If there's still work to do then we must be in slow mode */
+    if (attnum < natts)
+    {
+        /* XXX is it worth adding a separate call when hasnulls is false? */
+        attnum = slot_deform_heap_tuple_internal(slot,
+                                                 tuple,
+                                                 attnum,
+                                                 natts,
+                                                 true,  /* slow */
+                                                 hasnulls,
+                                                 &off,
+                                                 &slow);
     }
 
     /*
@@ -1104,7 +1205,6 @@ slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
         slot->tts_flags &= ~TTS_FLAG_SLOW;
 }
 
-
 const TupleTableSlotOps TTSOpsVirtual = {
     .base_slot_size = sizeof(VirtualTupleTableSlot),
     .init = tts_virtual_init,
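
The hand-off between the non-slow and slow loops in the diff (return the next attnum, set *slowp, let the caller resume in slow mode) can be sketched in isolation. This is a simplified toy under stated assumptions, not PostgreSQL code: ToyAttr, toy_walk_internal, toy_walk and ALWAYS_INLINE are hypothetical names, alignment and NULLs are ignored, and a variable-length value is assumed to be a one-byte length followed by that many payload bytes. The sketch only records where each attribute starts, which is enough to show when offsets can be cached and when the walk has to switch modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#define ALWAYS_INLINE inline	/* fallback: only a hint to the compiler */
#endif

/* Hypothetical per-column descriptor, shared by all rows of a table. */
typedef struct ToyAttr
{
	int			attlen;		/* > 0: fixed width; -1: length byte + payload */
	int			cacheoff;	/* cached start offset, or -1 if unknown */
} ToyAttr;

/*
 * Walk attributes [attnum, natts) and record each start offset in starts[].
 * Returns the next attnum to process.  In non-slow mode the function stops
 * right after the first variable-length attribute and sets *slowp, because
 * offsets beyond that point differ from row to row and must not be cached.
 */
static ALWAYS_INLINE int
toy_walk_internal(ToyAttr *atts, const uint8_t *tp, int attnum, int natts,
				  bool slow, uint32_t *offp, uint32_t *starts, bool *slowp)
{
	for (; attnum < natts; attnum++)
	{
		ToyAttr    *att = &atts[attnum];
		uint32_t	len;

		if (!slow && att->cacheoff >= 0)
			*offp = (uint32_t) att->cacheoff;	/* reuse cached offset */
		else if (!slow)
			att->cacheoff = (int) *offp;		/* still fixed: cache it */

		starts[attnum] = *offp;

		/* step over this attribute's bytes */
		len = (att->attlen > 0) ? (uint32_t) att->attlen : 1u + tp[*offp];
		*offp += len;

		/* a variable-length attribute ends the cacheable prefix */
		if (!slow && att->attlen <= 0)
		{
			*slowp = true;
			return attnum + 1;
		}
	}

	return natts;
}

/* Start in non-slow mode; finish in slow mode if the first pass stopped. */
static void
toy_walk(ToyAttr *atts, const uint8_t *tp, int natts, uint32_t *starts)
{
	uint32_t	off = 0;
	bool		slow = false;
	int			attnum;

	attnum = toy_walk_internal(atts, tp, 0, natts, false, &off, starts, &slow);

	if (attnum < natts)
		attnum = toy_walk_internal(atts, tp, attnum, natts, true, &off,
								   starts, &slow);

	assert(attnum == natts);
	(void) slow;	/* real code would persist this flag for the next call */
}
```

Because the switch to slow mode is one-way for a given row, the caller needs at most two calls, and each inlined copy of the loop sees "slow" as a constant.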
