//===- TargetTransformInfo.h ------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// This pass exposes codegen information to IR-level passes. Every
/// transformation that uses codegen information is broken into three parts:
/// 1. The IR-level analysis pass.
/// 2. The IR-level transformation interface, which provides the needed
///    information.
/// 3. Codegen-level implementation which uses target-specific hooks.
///
/// This file defines #2, which is the interface that IR-level transformations
/// use for querying the codegen.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H
#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H

class BlockFrequencyInfo;
class OptimizationRemarkEmitter;
class InterleavedAccessInfo;
class LoopVectorizationLegality;
class ProfileSummaryInfo;
class RecurrenceDescriptor;
class TargetLibraryInfo;
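// Illustrative sketch (not part of the original header): an IR-level pass
// typically obtains a TargetTransformInfo instance through TargetIRAnalysis
// and then queries it. "ExamplePass" is a hypothetical pass name.
//
// \code
//   PreservedAnalyses ExamplePass::run(Function &F, FunctionAnalysisManager &AM) {
//     TargetTransformInfo &TTI = AM.getResult<TargetIRAnalysis>(F);
//     // ... query TTI for costs and capabilities ...
//     return PreservedAnalyses::all();
//   }
// \endcode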
/// Information about a load/store intrinsic defined by the target.
  /// This is the pointer that the intrinsic is loading from or storing to.
  /// If this is non-null, then analysis/optimization passes can assume that
  /// this intrinsic is functionally equivalent to a load/store from this
  /// pointer.

  // Ordering for atomic operations.
  // Same Id is set by the target for corresponding load/store intrinsics.

/// Attributes of a target dependent hardware loop.
  // value in every iteration.
  // another hardware loop?
  // the loop via a phi?
  // icmp ne zero on the loop counter value and
  // produces an i1 to guard the loop entry.

  // If ScalarizationCost is UINT_MAX, the cost of scalarizing the
  // arguments and the return value will be computed based on types.
                          bool TypeBasedOnly = false);
  /// Don't use tail folding
  /// Use predicate only to mask operations on data in the loop.
  /// When the VL is not known to be a power-of-2, this method requires a
  /// runtime overflow check for the i + VL in the loop because it compares the
  /// scalar induction variable against the tripcount rounded up by VL which may
  /// overflow. When the VL is a power-of-2, both the increment and uprounded
  /// tripcount will overflow to 0, which does not require a runtime check
  /// since the loop is exited when the loop induction variable equals the
  /// uprounded trip-count, which are both 0.
  /// Same as Data, but avoids using the get.active.lane.mask intrinsic to
  /// calculate the mask and instead implements this with a
  /// splat/stepvector/cmp.
  /// FIXME: Can this kind be removed now that SelectionDAGBuilder expands the
  /// active.lane.mask intrinsic when it is not natively supported?
  /// Use predicate to control both data and control flow.
  /// This method always requires a runtime overflow check for the i + VL
  /// increment inside the loop, because it uses the result directly in the
  /// active.lane.mask to calculate the mask for the next iteration. If the
  /// increment overflows, the mask is no longer correct.
  /// Use predicate to control both data and control flow, but modify
  /// the trip count so that a runtime overflow check can be avoided
  /// and such that the scalar epilogue loop can always be removed.
  /// Use predicated EVL instructions for tail-folding.
  /// Indicates that VP intrinsics should be used.
class TargetTransformInfo;
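// Illustrative IR sketch of the predicated (tail-folded) loop body described
// above; values such as %i, %p and %tripcount are placeholders. Each iteration
// derives a lane mask from the induction variable, so no scalar epilogue is
// required:
//
// \code
//   %mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %i, i64 %tripcount)
//   %v = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %p, i32 4,
//                                                  <4 x i1> %mask, <4 x i32> poison)
// \endcode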
210/// This pass provides access to the codegen interfaces that are needed 211/// for IR-level transformations. 216 /// Get the kind of extension that an instruction represents. 220 /// Construct a TTI object using a type implementing the \c Concept 223 /// This is used by targets to construct a TTI wrapping their target-specific 224 /// implementation that encodes appropriate costs for their target. 227 /// Construct a baseline TTI object using a minimal implementation of 228 /// the \c Concept API below. 230 /// The TTI implementation will reflect the information in the DataLayout 231 /// provided if non-null. 234// Provide move semantics. 238// We need to define the destructor out-of-line to define our sub-classes 242 /// Handle the invalidation of this information. 244 /// When used as a result of \c TargetIRAnalysis this method will be called 245 /// when the function this was computed for changes. When it returns false, 246 /// the information is preserved across those changes. 249// FIXME: We should probably in some way ensure that the subtarget 250// information for a function hasn't changed. 254 /// \name Generic Target Information 257 /// The kind of cost model. 259 /// There are several different cost models that can be customized by the 260 /// target. The normalization of each cost model may be target specific. 261 /// e.g. TCK_SizeAndLatency should be comparable to target thresholds such as 262 /// those derived from MCSchedModel::LoopMicroOpBufferSize etc. 270 /// Underlying constants for 'cost' values in this interface. 272 /// Many APIs in this interface return a cost. This enum defines the 273 /// fundamental values that should be used to interpret (and produce) those 274 /// costs. The costs are returned as an int rather than a member of this 275 /// enumeration because it is expected that the cost of one IR instruction 276 /// may have a multiplicative factor to it or otherwise won't fit directly 277 /// into the enum. Moreover, it is common to sum or average costs which works 278 /// better as simple integral values. Thus this enum only provides constants. 279 /// Also note that the returned costs are signed integers to make it natural 280 /// to add, subtract, and test with zero (a common boundary condition). It is 281 /// not expected that 2^32 is a realistic cost to be modeling at any point. 283 /// Note that these costs should usually reflect the intersection of code-size 284 /// cost and execution cost. A free instruction is typically one that folds 285 /// into another instruction. For example, reg-to-reg moves can often be 286 /// skipped by renaming the registers in the CPU, but they still are encoded 287 /// and thus wouldn't be considered 'free' here. 289TCC_Free = 0,
                   ///< Expected to fold away in lowering.
    TCC_Basic = 1, ///< The cost of a typical 'add' instruction.

  /// Estimate the cost of a GEP operation when lowered.
  ///
  /// \p PointeeType is the source element type of the GEP.
  /// \p Ptr is the base pointer operand.
  /// \p Operands is the list of indices following the base pointer.
  ///
  /// \p AccessType is a hint as to what type of memory might be accessed by
  /// users of the GEP. getGEPCost will use it to determine if the GEP can be
  /// folded into the addressing mode of a load/store. If AccessType is null,
  /// then the resulting target type based off of PointeeType will be used as an
  /// approximation.

  /// Describe known properties for a set of pointers.
    /// All the GEPs in a set have the same base address.
    /// These properties are only valid if SameBaseAddress is set.
    /// True if all pointers are separated by a unit stride.
    /// True if the distance between any two neighbouring pointers is a known value.
      return {/*IsSameBaseAddress=*/1, /*IsUnitStride=*/1,
              /*IsKnownStride=*/1, 0};
      return {/*IsSameBaseAddress=*/1, /*IsUnitStride=*/0,
              /*IsKnownStride=*/1, 0};
      return {/*IsSameBaseAddress=*/1, /*IsUnitStride=*/0,
              /*IsKnownStride=*/0, 0};
  static_assert(sizeof(PointersChainInfo) == 4,
                "Was size increase justified?");
  /// Estimate the cost of a chain of pointers (typically pointer operands of a
  /// chain of loads or stores within the same block) operations set when lowered.
  /// \p AccessTy is the type of the loads/stores that will ultimately use the
  /// calculated pointers.
                        const PointersChainInfo &Info, Type *AccessTy,
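// Illustrative sketch of costing the pointer operands of a chain of
// unit-strided loads; the query name getPointersChainCost, the helper
// getUnitStride, and the surrounding variables are assumed for illustration.
//
// \code
//   TargetTransformInfo::PointersChainInfo PtrsInfo =
//       TargetTransformInfo::PointersChainInfo::getUnitStride();
//   InstructionCost Cost =
//       TTI.getPointersChainCost(Ptrs, BasePtr, PtrsInfo, AccessTy, CostKind);
// \endcode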
349 /// \returns A value by which our inlining threshold should be multiplied. 350 /// This is primarily used to bump up the inlining threshold wholesale on 351 /// targets where calls are unusually expensive. 353 /// TODO: This is a rather blunt instrument. Perhaps altering the costs of 354 /// individual classes of instructions would be better. 360 /// \returns The bonus of inlining the last call to a static function. 363 /// \returns A value to be added to the inlining threshold. 366 /// \returns The cost of having an Alloca in the caller if not inlined, to be 367 /// added to the threshold 370 /// \returns Vector bonus in percent. 372 /// Vector bonuses: We want to more aggressively inline vector-dense kernels 373 /// and apply this bonus based on the percentage of vector instructions. A 374 /// bonus is applied if the vector instructions exceed 50% and half that 375 /// amount is applied if it exceeds 10%. Note that these bonuses are some what 376 /// arbitrary and evolved over time by accident as much as because they are 377 /// principled bonuses. 378 /// FIXME: It would be nice to base the bonus values on something more 379 /// scientific. A target may has no bonus on vector instructions. 382 /// \return the expected cost of a memcpy, which could e.g. depend on the 383 /// source/destination type and alignment and the number of bytes copied. 386 /// Returns the maximum memset / memcpy size in bytes that still makes it 387 /// profitable to inline the call. 390 /// \return The estimated number of case clusters when lowering \p 'SI'. 391 /// \p JTSize Set a jump table size only when \p SI is suitable for a jump 398 /// Estimate the cost of a given IR user when lowered. 400 /// This can estimate the cost of either a ConstantExpr or Instruction when 403 /// \p Operands is a list of operands which can be a result of transformations 404 /// of the current operands. The number of the operands on the list must equal 405 /// to the number of the current operands the IR user has. Their order on the 406 /// list must be the same as the order of the current operands the IR user 409 /// The returned cost is defined in terms of \c TargetCostConstants, see its 410 /// comments for a detailed explanation of the cost values. 415 /// This is a helper function which calls the three-argument 416 /// getInstructionCost with \p Operands which are the current operands U has. 423 /// If a branch or a select condition is skewed in one direction by more than 424 /// this factor, it is very likely to be predicted correctly. 427 /// Returns estimated penalty of a branch misprediction in latency. Indicates 428 /// how aggressive the target wants for eliminating unpredictable branches. A 429 /// zero return value means extra optimization applied to them should be 433 /// Return true if branch divergence exists. 435 /// Branch divergence has a significantly negative impact on GPU performance 436 /// when threads in the same wavefront take different paths due to conditional 439 /// If \p F is passed, provides a context function. If \p F is known to only 440 /// execute in a single threaded environment, the target may choose to skip 441 /// uniformity analysis and assume all values are uniform. 444 /// Returns whether V is a source of divergence. 446 /// This function provides the target-dependent information for 447 /// the target-independent UniformityAnalysis. 
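// Illustrative sketch of the cost queries described above; "Threshold" is a
// hypothetical pass-local limit.
//
// \code
//   InstructionCost Cost =
//       TTI.getInstructionCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
//   if (!Cost.isValid() || Cost > Threshold)
//     return false; // Treat the instruction as too expensive to transform.
// \endcode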
450// Returns true for the target specific 451// set of operations which produce uniform result 452// even taking non-uniform arguments 455 /// Query the target whether the specified address space cast from FromAS to 459 /// Return false if a \p AS0 address cannot possibly alias a \p AS1 address. 462 /// Returns the address space ID for a target's 'flat' address space. Note 463 /// this is not necessarily the same as addrspace(0), which LLVM sometimes 464 /// refers to as the generic address space. The flat address space is a 465 /// generic address space that can be used access multiple segments of memory 466 /// with different address spaces. Access of a memory location through a 467 /// pointer with this address space is expected to be legal but slower 468 /// compared to the same memory location accessed through a pointer with a 469 /// different address space. 471 /// This is for targets with different pointer representations which can 472 /// be converted with the addrspacecast instruction. If a pointer is converted 473 /// to this address space, optimizations should attempt to replace the access 474 /// with the source address space. 476 /// \returns ~0u if the target does not have such a flat address space to 480 /// Return any intrinsic address operand indexes which may be rewritten if 481 /// they use a flat address space pointer. 483 /// \returns true if the intrinsic was handled. 489 /// Return true if globals in this address space can have initializers other 497 std::pair<const Value *, unsigned>
500 /// Rewrite intrinsic call \p II such that \p OldV will be replaced with \p 501 /// NewV, which has a different address space. This should happen for every 502 /// operand index that collectFlatAddressOperands returned for the intrinsic. 503 /// \returns nullptr if the intrinsic was not handled. Otherwise, returns the 504 /// new value (which may be the original \p II with modified operands). 508 /// Test whether calls to a function lower to actual program function 511 /// The idea is to test whether the program is likely to require a 'call' 512 /// instruction or equivalent in order to call the given function. 514 /// FIXME: It's not clear that this is a good or useful query API. Client's 515 /// should probably move to simpler cost metrics using the above. 516 /// Alternatively, we could split the cost interface into distinct code-size 517 /// and execution-speed costs. This would allow modelling the core of this 518 /// query more accurately as a call is a single small instruction, but 519 /// incurs significant execution cost. 523 /// TODO: Some of these could be merged. Also, a lexical ordering 524 /// isn't always optimal. 535 /// Parameters that control the generic loop unrolling transformation. 537 /// The cost threshold for the unrolled loop. Should be relative to the 538 /// getInstructionCost values returned by this API, and the expectation is 539 /// that the unrolled loop's instructions when run through that interface 540 /// should not exceed this cost. However, this is only an estimate. Also, 541 /// specific loops may be unrolled even with a cost above this threshold if 542 /// deemed profitable. Set this to UINT_MAX to disable the loop body cost 545 /// If complete unrolling will reduce the cost of the loop, we will boost 546 /// the Threshold by a certain percent to allow more aggressive complete 547 /// unrolling. This value provides the maximum boost percentage that we 548 /// can apply to Threshold (The value should be no less than 100). 549 /// BoostedThreshold = Threshold * min(RolledCost / UnrolledCost, 550 /// MaxPercentThresholdBoost / 100) 551 /// E.g. if complete unrolling reduces the loop execution time by 50% 552 /// then we boost the threshold by the factor of 2x. If unrolling is not 553 /// expected to reduce the running time, then we do not increase the 556 /// The cost threshold for the unrolled loop when optimizing for size (set 557 /// to UINT_MAX to disable). 559 /// The cost threshold for the unrolled loop, like Threshold, but used 560 /// for partial/runtime unrolling (set to UINT_MAX to disable). 562 /// The cost threshold for the unrolled loop when optimizing for size, like 563 /// OptSizeThreshold, but used for partial/runtime unrolling (set to 564 /// UINT_MAX to disable). 566 /// A forced unrolling factor (the number of concatenated bodies of the 567 /// original loop in the unrolled loop body). When set to 0, the unrolling 568 /// transformation will select an unrolling factor based on the current cost 569 /// threshold and other factors. 571 /// Default unroll count for loops with run-time trip count. 573// Set the maximum unrolling factor. The unrolling factor may be selected 574// using the appropriate cost threshold, but may not exceed this number 575// (set to UINT_MAX to disable). This does not apply in cases where the 576// loop is being fully unrolled. 578 /// Set the maximum upper bound of trip count. Allowing the MaxUpperBound 579 /// to be overrided by a target gives more flexiblity on certain cases. 
580 /// By default, MaxUpperBound uses UnrollMaxUpperBound which value is 8. 582 /// Set the maximum unrolling factor for full unrolling. Like MaxCount, but 583 /// applies even if full unrolling is selected. This allows a target to fall 584 /// back to Partial unrolling if full unrolling is above FullUnrollMaxCount. 586// Represents number of instructions optimized when "back edge" 587// becomes "fall through" in unrolled loop. 588// For now we count a conditional branch on a backedge and a comparison 591 /// Allow partial unrolling (unrolling of loops to expand the size of the 592 /// loop body, not only to eliminate small constant-trip-count loops). 594 /// Allow runtime unrolling (unrolling of loops to expand the size of the 595 /// loop body even when the number of loop iterations is not known at 598 /// Allow generation of a loop remainder (extra iterations after unroll). 600 /// Allow emitting expensive instructions (such as divisions) when computing 601 /// the trip count of a loop for runtime unrolling. 603 /// Apply loop unroll on any kind of loop 604 /// (mainly to loops that fail runtime unrolling). 606 /// Allow using trip count upper bound to unroll loops. 608 /// Allow unrolling of all the iterations of the runtime loop remainder. 610 /// Allow unroll and jam. Used to enable unroll and jam for the target. 612 /// Threshold for unroll and jam, for inner loop size. The 'Threshold' 613 /// value above is used during unroll and jam for the outer loop size. 614 /// This value is used in the same manner to limit the size of the inner 617 /// Don't allow loop unrolling to simulate more than this number of 618 /// iterations when checking full unroll profitability 620 /// Don't disable runtime unroll for the loops which were vectorized. 622 /// Don't allow runtime unrolling if expanding the trip count takes more 623 /// than SCEVExpansionBudget. 625 /// Allow runtime unrolling multi-exit loops. Should only be set if the 626 /// target determined that multi-exit unrolling is profitable for the loop. 627 /// Fall back to the generic logic to determine whether multi-exit unrolling 628 /// is profitable if set to false. 632 /// Get target-customized preferences for the generic loop unrolling 633 /// transformation. The caller will initialize UP with the current 634 /// target-independent defaults. 639 /// Query the target whether it would be profitable to convert the given loop 640 /// into a hardware loop. 645// Query the target for which minimum vectorization factor epilogue 646// vectorization should be considered. 649 /// Query the target whether it would be prefered to create a predicated 650 /// vector loop, which can avoid the need to emit a scalar epilogue loop. 653 /// Query the target what the preferred style of tail folding is. 654 /// \param IVUpdateMayOverflow Tells whether it is known if the IV update 655 /// may (or will never) overflow for the suggested VF/UF in the given loop. 656 /// Targets can use this information to select a more optimal tail folding 657 /// style. The value conservatively defaults to true, such that no assumptions 658 /// are made on overflow. 662// Parameters that control the loop peeling transformation 664 /// A forced peeling factor (the number of bodied of the original loop 665 /// that should be peeled off before the loop body). When set to 0, the 666 /// a peeling factor based on profile information and other factors. 668 /// Allow peeling off loop iterations. 670 /// Allow peeling off loop iterations for loop nests. 
  /// Allow peeling based on profile. Used to enable peeling off all
  /// iterations based on the provided profile.
  /// If the value is true, the peeling cost model can decide to peel only
  /// some iterations and in this case it will set this to false.

  /// Get target-customized preferences for the generic loop peeling
  /// transformation. The caller will initialize \p PP with the current
  /// target-independent defaults with information from \p L and \p SE.

  /// Targets can implement their own combinations for target-specific
  /// intrinsics. This function will be called from the InstCombine pass every
  /// time a target-specific intrinsic is encountered.
  ///
  /// \returns std::nullopt to not do anything target specific, or a value that
  /// will be returned from the InstCombiner. It is also possible to stop
  /// further processing of the intrinsic by returning nullptr.

  /// Can be used to implement target-specific instruction combining.
  /// \see instCombineIntrinsic
                                   KnownBits &Known, bool &KnownBitsComputed) const;
  /// Can be used to implement target-specific instruction combining.
  /// \see instCombineIntrinsic
          SimplifyAndSetOp) const;
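// Illustrative sketch of a target-specific instCombineIntrinsic hook as
// described above; "MyTargetTTIImpl" and the intrinsic ID are hypothetical.
// Returning std::nullopt leaves the intrinsic to the generic combines.
//
// \code
//   std::optional<Instruction *>
//   MyTargetTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
//     if (II.getIntrinsicID() == Intrinsic::my_target_identity) // hypothetical
//       return IC.replaceInstUsesWith(II, II.getArgOperand(0));
//     return std::nullopt;
//   }
// \endcode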
708 /// \name Scalar Target Information 711 /// Flags indicating the kind of support for population count. 713 /// Compared to the SW implementation, HW support is supposed to 714 /// significantly boost the performance when the population is dense, and it 715 /// may or may not degrade performance if the population is sparse. A HW 716 /// support is considered as "Fast" if it can outperform, or is on a par 717 /// with, SW implementation when the population is sparse; otherwise, it is 718 /// considered as "Slow". 721 /// Return true if the specified immediate is legal add immediate, that 722 /// is the target has add instructions which can add a register with the 723 /// immediate without having to materialize the immediate into a register. 726 /// Return true if adding the specified scalable immediate is legal, that is 727 /// the target has add instructions which can add a register with the 728 /// immediate (multiplied by vscale) without having to materialize the 729 /// immediate into a register. 732 /// Return true if the specified immediate is legal icmp immediate, 733 /// that is the target has icmp instructions which can compare a register 734 /// against the immediate without having to materialize the immediate into a 738 /// Return true if the addressing mode represented by AM is legal for 739 /// this target, for a load/store of the specified type. 740 /// The type may be VoidTy, in which case only return true if the addressing 741 /// mode is legal for a load/store of any legal type. 742 /// If target returns true in LSRWithInstrQueries(), I may be valid. 743 /// \param ScalableOffset represents a quantity of bytes multiplied by vscale, 744 /// an invariant value known only at runtime. Most targets should not accept 745 /// a scalable offset. 747 /// TODO: Handle pre/postinc as well. 749bool HasBaseReg, int64_t Scale,
                                  int64_t ScalableOffset = 0) const;
753 /// Return true if LSR cost of C1 is lower than C2. 757 /// Return true if LSR major cost is number of registers. Targets which 758 /// implement their own isLSRCostLess and unset number of registers as major 759 /// cost should return false, otherwise return true. 762 /// Return true if LSR should drop a found solution if it's calculated to be 763 /// less profitable than the baseline. 766 /// \returns true if LSR should not optimize a chain that includes \p I. 769 /// Return true if the target can fuse a compare and branch. 770 /// Loop-strength-reduction (LSR) uses that knowledge to adjust its cost 771 /// calculation for the instructions in a loop. 774 /// Return true if the target can save a compare for loop count, for example 775 /// hardware loop saves a compare. 786 /// Return the preferred addressing mode LSR should make efforts to generate. 790 /// Return true if the target supports masked store. 792 /// Return true if the target supports masked load. 795 /// Return true if the target supports nontemporal store. 797 /// Return true if the target supports nontemporal load. 800 /// \Returns true if the target supports broadcasting a load to a vector of 801 /// type <NumElements x ElementTy>. 804 /// Return true if the target supports masked scatter. 806 /// Return true if the target supports masked gather. 808 /// Return true if the target forces scalarizing of llvm.masked.gather 811 /// Return true if the target forces scalarizing of llvm.masked.scatter 815 /// Return true if the target supports masked compress store. 817 /// Return true if the target supports masked expand load. 820 /// Return true if the target supports strided load. 823 /// Return true is the target supports interleaved access for the given vector 824 /// type \p VTy, interleave factor \p Factor, alignment \p Alignment and 825 /// address space \p AddrSpace. 827Align Alignment,
                                       unsigned AddrSpace) const;
  // Return true if the target supports masked vector histograms.

  /// Return true if this is an alternating opcode pattern that can be lowered
  /// to a single instruction on the target. In X86 this is for the addsub
  /// instruction which corresponds to a Shuffle + Fadd + FSub pattern in IR.
  /// This function expects two opcodes: \p Opcode0 and \p Opcode1 being
  /// selected by \p OpcodeMask. The mask contains one bit per lane and is a `0`
  /// when \p Opcode0 is selected and `1` when \p Opcode1 is selected.
  /// \p VecTy is the vector type of the instruction to be generated.

  /// Return true if we should be enabling ordered reductions for the target.

  /// Return true if the target has a unified operation to calculate division
  /// and remainder. If so, the additional implicit multiplication and
  /// subtraction required to calculate a remainder from division are free. This
  /// can enable more aggressive transformations for division and remainder than
  /// would typically be allowed using throughput or size cost models.

  /// Return true if the given instruction (assumed to be a memory access
  /// instruction) has a volatile variant. If that's the case then we can avoid
  /// addrspacecast to generic AS for volatile loads/stores. The default
  /// implementation returns false, which prevents address space inference for
  /// volatile loads/stores.

  /// Return true if the target doesn't mind addresses in vectors.

  /// Return the cost of the scaling factor used in the addressing
  /// mode represented by AM for this target, for a load/store
  /// of the specified type.
  /// If the AM is supported, the return value must be >= 0.
  /// If the AM is not supported, it returns a negative value.
  /// TODO: Handle pre/postinc as well.
                                unsigned AddrSpace = 0) const;
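// Illustrative sketch: clients such as LSR ask whether a candidate addressing
// mode (BaseGV + BaseReg + Scale*IndexReg + Offset) is legal for an access of
// a given type; the parameter list shown here is abbreviated.
//
// \code
//   if (TTI.isLegalAddressingMode(AccessTy, /*BaseGV=*/nullptr,
//                                 /*BaseOffset=*/Offset, /*HasBaseReg=*/true,
//                                 /*Scale=*/2))
//     ; // The scaled index can be folded into the memory access.
// \endcode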
873 /// Return true if the loop strength reduce pass should make 874 /// Instruction* based TTI queries to isLegalAddressingMode(). This is 875 /// needed on SystemZ, where e.g. a memcpy can only have a 12 bit unsigned 876 /// immediate offset and no index register. 879 /// Return true if it's free to truncate a value of type Ty1 to type 880 /// Ty2. e.g. On x86 it's free to truncate a i32 value in register EAX to i16 881 /// by referencing its sub-register AX. 884 /// Return true if it is profitable to hoist instruction in the 885 /// then/else to before if. 890 /// Return true if this type is legal. 893 /// Returns the estimated number of registers required to represent \p Ty. 896 /// Return true if switches should be turned into lookup tables for the 900 /// Return true if switches should be turned into lookup tables 901 /// containing this constant value for the target. 904 /// Return true if lookup tables should be turned into relative lookup tables. 907 /// Return true if the input function which is cold at all call sites, 908 /// should use coldcc calling convention. 913 /// Identifies if the vector form of the intrinsic has a scalar operand. 915unsigned ScalarOpdIdx)
const;
  /// Identifies if the vector form of the intrinsic is overloaded on the type
  /// of the operand at index \p OpdIdx, or on the return type if \p OpdIdx is
  /// -1.

  /// Identifies if the vector form of the intrinsic that returns a struct is
  /// overloaded at the struct element index \p RetIdx.

  /// Estimate the overhead of scalarizing an instruction. Insert and Extract
  /// are set if the demanded result elements need to be inserted and/or
  /// extracted from vectors. The involved values may be passed in VL if
                                     const APInt &DemandedElts,
                                     bool Insert, bool Extract,
  /// Estimate the overhead of scalarizing an instruction's unique
  /// non-constant operands. The (potentially vector) types to use for each
  /// argument are passed via Tys.
                                           ArrayRef<Type *> Tys,
946 /// If target has efficient vector element load/store instructions, it can 947 /// return true here so that insertion/extraction costs are not added to 948 /// the scalarization cost of a load/store. 951 /// If the target supports tail calls. 954 /// If target supports tail call on \p CB 957 /// Don't restrict interleaved unrolling to small loops. 960 /// Returns options for expansion of memcmp. IsZeroCmp is 961// true if this is the expansion of memcmp(p1, p2, s) == 0. 963// Return true if memcmp expansion is enabled. 966// Maximum number of load operations. 969// The list of available load sizes (in bytes), sorted in decreasing order. 972// For memcmp expansion when the memcmp result is only compared equal or 973// not-equal to 0, allow up to this number of load pairs per block. As an 974// example, this may allow 'memcmp(a, b, 3) == 0' in a single block: 975// a0 = load2bytes &a[0] 976// b0 = load2bytes &b[0] 977// a2 = load1byte &a[2] 978// b2 = load1byte &b[2] 979// r = cmp eq (a0 ^ b0 | a2 ^ b2), 0 982// Set to true to allow overlapping loads. For example, 7-byte compares can 983// be done with two 4-byte compares instead of 4+2+1-byte compares. This 984// requires all loads in LoadSizes to be doable in an unaligned way. 987// Sometimes, the amount of data that needs to be compared is smaller than 988// the standard register size, but it cannot be loaded with just one load 989// instruction. For example, if the size of the memory comparison is 6 990// bytes, we can handle it more efficiently by loading all 6 bytes in a 991// single block and generating an 8-byte number, instead of generating two 992// separate blocks with conditional jumps for 4 and 2 byte loads. This 993// approach simplifies the process and produces the comparison result as 994// normal. This array lists the allowed sizes of memcmp tails that can be 995// merged into one block 1001 /// Should the Select Optimization pass be enabled and ran. 1004 /// Should the Select Optimization pass treat the given instruction like a 1005 /// select, potentially converting it to a conditional branch. This can 1006 /// include select-like instructions like or(zext(c), x) that can be converted 1010 /// Enable matching of interleaved access groups. 1013 /// Enable matching of interleaved access groups that contain predicated 1014 /// accesses or gaps and therefore vectorized using masked 1015 /// vector loads/stores. 1018 /// Indicate that it is potentially unsafe to automatically vectorize 1019 /// floating-point operations because the semantics of vector and scalar 1020 /// floating-point semantics may differ. For example, ARM NEON v7 SIMD math 1021 /// does not support IEEE-754 denormal numbers, while depending on the 1022 /// platform, scalar floating-point math does. 1023 /// This applies to floating-point math operations and calls, not memory 1024 /// operations, shuffles, or casts. 1027 /// Determine if the target supports unaligned memory accesses. 1031unsigned *
                                      Fast = nullptr) const;
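// Illustrative sketch of the memcmp expansion options described above; the
// field names are assumed. A target with fast unaligned loads might return:
//
// \code
//   TargetTransformInfo::MemCmpExpansionOptions Options;
//   Options.MaxNumLoads = 8;
//   Options.LoadSizes = {8, 4, 2, 1};     // Decreasing order.
//   Options.AllowOverlappingLoads = true; // e.g. 7 bytes as two 4-byte loads.
// \endcode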
1033 /// Return hardware support for population count. 1036 /// Return true if the hardware has a fast square-root instruction. 1039 /// Return true if the cost of the instruction is too high to speculatively 1040 /// execute and should be kept behind a branch. 1041 /// This normally just wraps around a getInstructionCost() call, but some 1042 /// targets might report a low TCK_SizeAndLatency value that is incompatible 1043 /// with the fixed TCC_Expensive value. 1044 /// NOTE: This assumes the instruction passes isSafeToSpeculativelyExecute(). 1047 /// Return true if it is faster to check if a floating-point value is NaN 1048 /// (or not-NaN) versus a comparison against a constant FP zero value. 1049 /// Targets should override this if materializing a 0.0 for comparison is 1050 /// generally as cheap as checking for ordered/unordered. 1053 /// Return the expected cost of supporting the floating point operation 1054 /// of the specified type. 1057 /// Return the expected cost of materializing for the given integer 1058 /// immediate of the specified type. 1062 /// Return the expected cost of materialization for the given integer 1063 /// immediate of the specified type for a given instruction. The cost can be 1064 /// zero if the immediate can be folded into the specified instruction. 1073 /// Return the expected cost for the given integer when optimising 1074 /// for size. This is different than the other integer immediate cost 1075 /// functions in that it is subtarget agnostic. This is useful when you e.g. 1076 /// target one ISA such as Aarch32 but smaller encodings could be possible 1077 /// with another such as Thumb. This return value is used as a penalty when 1078 /// the total costs for a constant is calculated (the bigger the cost, the 1079 /// more beneficial constant hoisting is). 1083 /// It can be advantageous to detach complex constants from their uses to make 1084 /// their generation cheaper. This hook allows targets to report when such 1085 /// transformations might negatively effect the code generation of the 1086 /// underlying operation. The motivating example is divides whereby hoisting 1087 /// constants prevents the code generator's ability to transform them into 1088 /// combinations of simpler operations. 1094 /// \name Vector Target Information 1097 /// The various kinds of shuffle patterns for vector queries. 1101SK_Select,
///< Selects elements from the corresponding lane of 1102 ///< either source operand. This is equivalent to a 1103 ///< vector select with a constant condition operand. 1108 ///< with any shuffle mask. 1111SK_Splice///< Concatenates elements from the first input vector 1112 ///< with elements of the second input vector. Returning 1113 ///< a vector of the same type as the input vectors. 1114 ///< Index indicates start offset in first input vector. 1117 /// Additional information about an operand's possible values. 1125 /// Additional properties of an operand's values. 1132// Describe the values an operand can take. We're in the process 1133// of migrating uses of OperandValueKind and OperandValueProperties 1134// to use this class, and then will change the internal representation. 1157 /// \return the number of registers in the target-provided register class. 1160 /// \return true if the target supports load/store that enables fault 1161 /// suppression of memory operands when the source condition is false. 1164 /// \return the target-provided register class ID for the provided type, 1165 /// accounting for type promotion and other type-legalization techniques that 1166 /// the target might apply. However, it specifically does not account for the 1167 /// scalarization or splitting of vector types. Should a vector type require 1168 /// scalarization or splitting into multiple underlying vector registers, that 1169 /// type should be mapped to a register class containing no registers. 1170 /// Specifically, this is designed to provide a simple, high-level view of the 1171 /// register allocation later performed by the backend. These register classes 1172 /// don't necessarily map onto the register classes used by the backend. 1173 /// FIXME: It's not currently possible to determine how many registers 1174 /// are used by the provided type. 1177 /// \return the target-provided register class name 1182 /// \return The width of the largest scalar or vector register type. 1185 /// \return The width of the smallest vector register type. 1188 /// \return The maximum value of vscale if the target specifies an 1189 /// architectural maximum vector length, and std::nullopt otherwise. 1192 /// \return the value of vscale to tune the cost model for. 1195 /// \return true if vscale is known to be a power of 2 1198 /// \return True if the vectorization factor should be chosen to 1199 /// make the vector of the smallest element type match the size of a 1200 /// vector register. For wider element types, this could result in 1201 /// creating vectors that span multiple vector registers. 1202 /// If false, the vectorization factor will be chosen based on the 1203 /// size of the widest element type. 1204 /// \p K Register Kind for vectorization. 1207 /// \return The minimum vectorization factor for types of given element 1208 /// bit width, or 0 if there is no minimum VF. The returned value only 1209 /// applies when shouldMaximizeVectorBandwidth returns true. 1210 /// If IsScalable is true, the returned ElementCount must be a scalable VF. 1213 /// \return The maximum vectorization factor for types of given element 1214 /// bit width and opcode, or 0 if there is no maximum VF. 1215 /// Currently only used by the SLP vectorizer. 1216unsignedgetMaximumVF(
                        unsigned ElemWidth, unsigned Opcode) const;
  /// \return The minimum vectorization factor for the store instruction. Given
  /// the initial estimation of the minimum vector factor and store value type,
  /// it tries to find the lowest possible VF, which still might be profitable
  /// for the vectorization.
  /// \param VF Initial estimation of the minimum vector factor.
  /// \param ScalarMemTy Scalar memory type of the store operation.
  /// \param ScalarValTy Scalar type of the stored value.
  /// Currently only used by the SLP vectorizer.
                               Type *ScalarValTy) const;
  /// \return True if it should be considered for address type promotion.
  /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
  /// profitable without finding other extensions fed by the same input.
      const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const;
  /// \return The size of a cache line in bytes.

  /// The possible cache levels
    L1D, // The L1 data cache
    L2D,
// The L2 data cache 1243// We currently do not model L3 caches, as their sizes differ widely between 1244// microarchitectures. Also, we currently do not have a use for L3 cache 1245// size modeling yet. 1248 /// \return The size of the cache level in bytes, if available. 1251 /// \return The associativity of the cache level, if available. 1254 /// \return The minimum architectural page size for the target. 1257 /// \return How much before a load we should place the prefetch 1258 /// instruction. This is currently measured in number of 1262 /// Some HW prefetchers can handle accesses up to a certain constant stride. 1263 /// Sometimes prefetching is beneficial even below the HW prefetcher limit, 1264 /// and the arguments provided are meant to serve as a basis for deciding this 1265 /// for a particular loop. 1267 /// \param NumMemAccesses Number of memory accesses in the loop. 1268 /// \param NumStridedMemAccesses Number of the memory accesses that 1269 /// ScalarEvolution could find a known stride 1271 /// \param NumPrefetches Number of software prefetches that will be 1272 /// emitted as determined by the addresses 1273 /// involved and the cache line size. 1274 /// \param HasCall True if the loop contains a call. 1276 /// \return This is the minimum stride in bytes where it makes sense to start 1277 /// adding SW prefetches. The default is 1, i.e. prefetch with any 1280unsigned NumStridedMemAccesses,
                                unsigned NumPrefetches, bool HasCall) const;
1283 /// \return The maximum number of iterations to prefetch ahead. If 1284 /// the required number of iterations is more than this number, no 1285 /// prefetching is performed. 1288 /// \return True if prefetching should also be done for writes. 1291 /// \return if target want to issue a prefetch in address space \p AS. 1294 /// \return The cost of a partial reduction, which is a reduction from a 1295 /// vector to another vector with fewer elements of larger size. They are 1296 /// represented by the llvm.experimental.partial.reduce.add intrinsic, which 1297 /// takes an accumulator and a binary operation operand that itself is fed by 1298 /// two extends. An example of an operation that uses a partial reduction is a 1299 /// dot product, which reduces two vectors to another of 4 times fewer and 4 1300 /// times larger elements. 1306 std::optional<unsigned> BinOp = std::nullopt)
const;
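// Illustrative IR sketch of the partial reduction described above: a dot
// product accumulating the product of two sign-extended <16 x i8> inputs into
// an accumulator with 4x fewer, 4x larger elements.
//
// \code
//   %a.ext = sext <16 x i8> %a to <16 x i32>
//   %b.ext = sext <16 x i8> %b to <16 x i32>
//   %mul = mul nsw <16 x i32> %a.ext, %b.ext
//   %acc.next = call <4 x i32> @llvm.experimental.partial.reduce.add.v4i32.v16i32(
//                   <4 x i32> %acc, <16 x i32> %mul)
// \endcode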
1308 /// \return The maximum interleave factor that any transform should try to 1309 /// perform for this target. This number depends on the level of parallelism 1310 /// and the number of execution units in the CPU. 1313 /// Collect properties of V used in cost analysis, e.g. OP_PowerOf2. 1316 /// This is an approximation of reciprocal throughput of a math/logic op. 1317 /// A higher cost indicates less expected throughput. 1318 /// From Agner Fog's guides, reciprocal throughput is "the average number of 1319 /// clock cycles per instruction when the instructions are not part of a 1320 /// limiting dependency chain." 1321 /// Therefore, costs should be scaled to account for multiple execution units 1322 /// on the target that can process this type of instruction. For example, if 1323 /// there are 5 scalar integer units and 2 vector integer units that can 1324 /// calculate an 'add' in a single cycle, this model should indicate that the 1325 /// cost of the vector add instruction is 2.5 times the cost of the scalar 1326 /// add instruction. 1327 /// \p Args is an optional argument which holds the instruction operands 1328 /// values so the TTI can analyze those values searching for special 1329 /// cases or optimizations based on those values. 1330 /// \p CxtI is the optional original context instruction, if one exists, to 1331 /// provide even more information. 1332 /// \p TLibInfo is used to search for platform specific vector library 1333 /// functions for instructions that might be converted to calls (e.g. frem). 1335unsigned Opcode,
      Type *Ty,
      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
      const TargetLibraryInfo *TLibInfo = nullptr) const;
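// Illustrative sketch of the reciprocal-throughput query described above,
// costing a <4 x i32> add; "Ctx" is an available LLVMContext.
//
// \code
//   auto *VecTy = FixedVectorType::get(Type::getInt32Ty(Ctx), 4);
//   InstructionCost AddCost = TTI.getArithmeticInstrCost(
//       Instruction::Add, VecTy, TargetTransformInfo::TCK_RecipThroughput);
// \endcode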
  /// Returns the cost estimation for alternating opcode pattern that can be
  /// lowered to a single instruction on the target. In X86 this is for the
  /// addsub instruction which corresponds to a Shuffle + Fadd + FSub pattern in
  /// IR. This function expects two opcodes: \p Opcode0 and \p Opcode1 being
  /// selected by \p OpcodeMask. The mask contains one bit per lane and is a `0`
  /// when \p Opcode0 is selected and `1` when \p Opcode1 is selected.
  /// \p VecTy is the vector type of the instruction to be generated.
      VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
      const SmallBitVector &OpcodeMask,
1354 /// \return The cost of a shuffle instruction of kind Kind and of type Tp. 1355 /// The exact mask may be passed as Mask, or else the array will be empty. 1356 /// The index and subtype parameters are used by the subvector insertion and 1357 /// extraction shuffle kinds to show the insert/extract point and the type of 1358 /// the subvector being inserted/extracted. The operands of the shuffle can be 1359 /// passed through \p Args, which helps improve the cost estimation in some 1360 /// cases, like in broadcast loads. 1361 /// NOTE: For subvector extractions Tp represents the source type. 1366 ArrayRef<const Value *>
      Args = {}, const Instruction *CxtI = nullptr) const;
1369 /// Represents a hint about the context in which a cast is used. 1371 /// For zext/sext, the context of the cast is the operand, which must be a 1372 /// load of some kind. For trunc, the context is of the cast is the single 1373 /// user of the instruction, which must be a store of some kind. 1375 /// This enum allows the vectorizer to give getCastInstrCost an idea of the 1376 /// type of cast it's dealing with, as not every cast is equal. For instance, 1377 /// the zext of a load may be free, but the zext of an interleaving load can 1378 //// be (very) expensive! 1380 /// See \c getCastContextHint to compute a CastContextHint from a cast 1381 /// Instruction*. Callers can use it if they don't need to override the 1382 /// context and just want it to be calculated from the instruction. 1384 /// FIXME: This handles the types of load/store that the vectorizer can 1385 /// produce, which are the cases where the context instruction is most 1386 /// likely to be incorrect. There are other situations where that can happen 1387 /// too, which might be handled here but in the long run a more general 1388 /// solution of costing multiple instructions at the same times may be better. 1390None,
                ///< The cast is not used with a load/store of any kind.
    Normal,     ///< The cast is used with a normal load/store.
    Masked,     ///< The cast is used with a masked load/store.
    Interleave, ///< The cast is used with an interleaved load/store.
    Reversed,
///< The cast is used with a reversed load/store. 1398 /// Calculates a CastContextHint from \p I. 1399 /// This should be used by callers of getCastInstrCost if they wish to 1400 /// determine the context from some instruction. 1401 /// \returns the CastContextHint for ZExt/SExt/Trunc, None if \p I is nullptr, 1402 /// or if it's another type of cast. 1405 /// \return The expected cost of cast instructions, such as bitcast, trunc, 1406 /// zext, etc. If there is an existing instruction that holds Opcode, it 1407 /// may be passed in the 'I' parameter. 1414 /// \return The expected cost of a sign- or zero-extended vector extract. Use 1415 /// Index = -1 to indicate that there is no information about the index value. 1420 /// \return The expected cost of control-flow related instructions such as 1421 /// Phi, Ret, Br, Switch. 1427 /// \returns The expected cost of compare and select instructions. If there 1428 /// is an existing instruction that holds Opcode, it may be passed in the 1429 /// 'I' parameter. The \p VecPred parameter can be used to indicate the select 1430 /// is using a compare with the specified predicate as condition. When vector 1431 /// types are passed, \p VecPred must be used for all lanes. For a 1432 /// comparison, the two operands are the natural values. For a select, the 1433 /// two operands are the *value* operands, not the condition operand. 1440const Instruction *
                 I = nullptr) const;
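// Illustrative sketch tying CastContextHint to a cast cost query for a
// zero-extended load; "ZExt", "DstTy" and "SrcTy" are assumed locals.
//
// \code
//   TargetTransformInfo::CastContextHint CCH =
//       TargetTransformInfo::getCastContextHint(&ZExt); // Normal for a plain load.
//   InstructionCost Cost =
//       TTI.getCastInstrCost(Instruction::ZExt, DstTy, SrcTy, CCH,
//                            TargetTransformInfo::TCK_RecipThroughput, &ZExt);
// \endcode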
  /// \return The expected cost of vector Insert and Extract.
  /// Use -1 to indicate that there is no information on the index value.
  /// This is used when the instruction is not available; a typical use
  /// case is to provision the cost of vectorization/scalarization in
  /// vectorizer passes.
                  unsigned Index = -1, Value *Op0 = nullptr,
                  Value *Op1 = nullptr) const;
  /// \return The expected cost of vector Insert and Extract.
  /// Use -1 to indicate that there is no information on the index value.
  /// This is used when the instruction is not available; a typical use
  /// case is to provision the cost of vectorization/scalarization in
  /// vectorizer passes.
  /// \param ScalarUserAndIdx encodes the information about extracts from a
  /// vector with 'Scalar' being the value being extracted, 'User' being the
  /// user of the extract (nullptr if user is not known before vectorization)
  /// and 'Idx' being the extract lane.
      ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const;
  /// \return The expected cost of vector Insert and Extract.
  /// This is used when the instruction is available, and the implementation
  /// asserts 'I' is not nullptr.
  ///
  /// A typical suitable use case is cost estimation when a vector instruction
  /// exists (e.g., from basic blocks during transformation).
                                 unsigned Index = -1) const;
1476 /// \return The cost of replication shuffle of \p VF elements typed \p EltTy 1477 /// \p ReplicationFactor times. 1479 /// For example, the mask for \p ReplicationFactor=3 and \p VF=4 is: 1480 /// <0,0,0,1,1,1,2,2,2,3,3,3> 1483const APInt &DemandedDstElts,
  /// \return The cost of Load and Store instructions.
                  const Instruction *I = nullptr) const;
1494 /// \return The cost of VP Load and Store instructions. 1501 /// \return The cost of masked Load and Store instructions. 1506 /// \return The cost of Gather or Scatter operation 1507 /// \p Opcode - is a type of memory access Load or Store 1508 /// \p DataTy - a vector type of the data to be loaded or stored 1509 /// \p Ptr - pointer [or vector of pointers] - address[es] in memory 1510 /// \p VariableMask - true when the memory access is predicated with a mask 1511 /// that is not a compile-time constant 1512 /// \p Alignment - alignment of single element 1513 /// \p I - the optional original context instruction, if one exists, e.g. the 1514 /// load/store to transform or the call to the gather/scatter intrinsic 1516unsigned Opcode,
      Type *DataTy, const Value *Ptr, bool VariableMask,
1520 /// \return The cost of strided memory operations. 1521 /// \p Opcode - is a type of memory access Load or Store 1522 /// \p DataTy - a vector type of the data to be loaded or stored 1523 /// \p Ptr - pointer [or vector of pointers] - address[es] in memory 1524 /// \p VariableMask - true when the memory access is predicated with a mask 1525 /// that is not a compile-time constant 1526 /// \p Alignment - alignment of single element 1527 /// \p I - the optional original context instruction, if one exists, e.g. the 1528 /// load/store to transform or the call to the gather/scatter intrinsic 1530unsigned Opcode,
      Type *DataTy, const Value *Ptr, bool VariableMask,
1534 /// \return The cost of the interleaved memory operation. 1535 /// \p Opcode is the memory operation code 1536 /// \p VecTy is the vector type of the interleaved access. 1537 /// \p Factor is the interleave factor 1538 /// \p Indices is the indices for interleaved load members (as interleaved 1539 /// load allows gaps) 1540 /// \p Alignment is the alignment of the memory operation 1541 /// \p AddressSpace is address space of the pointer. 1542 /// \p UseMaskForCond indicates if the memory access is predicated. 1543 /// \p UseMaskForGaps indicates if gaps should be masked. 1548bool UseMaskForCond =
      false, bool UseMaskForGaps = false) const;
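// Illustrative IR sketch of an interleave-factor-2 load of the kind costed
// above: one wide load followed by de-interleaving shuffles for members 0 and 1.
//
// \code
//   %wide = load <8 x i32>, ptr %p, align 4
//   %even = shufflevector <8 x i32> %wide, <8 x i32> poison,
//                         <4 x i32> <i32 0, i32 2, i32 4, i32 6>
//   %odd  = shufflevector <8 x i32> %wide, <8 x i32> poison,
//                         <4 x i32> <i32 1, i32 3, i32 5, i32 7>
// \endcode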
  /// A helper function to determine the type of reduction algorithm used
  /// for a given \p Opcode and set of FastMathFlags \p FMF.
    return FMF && !(*FMF).allowReassoc();
1556 /// Calculate the cost of vector reduction intrinsics. 1558 /// This is the cost of reducing the vector value of type \p Ty to a scalar 1559 /// value using the operation denoted by \p Opcode. The FastMathFlags 1560 /// parameter \p FMF indicates what type of reduction we are performing: 1561 /// 1. Tree-wise. This is the typical 'fast' reduction performed that 1562 /// involves successively splitting a vector into half and doing the 1563 /// operation on the pair of halves until you have a scalar value. For 1565 /// (v0, v1, v2, v3) 1566 /// ((v0+v2), (v1+v3), undef, undef) 1567 /// ((v0+v2+v1+v3), undef, undef, undef) 1568 /// This is the default behaviour for integer operations, whereas for 1569 /// floating point we only do this if \p FMF indicates that 1570 /// reassociation is allowed. 1571 /// 2. Ordered. For a vector with N elements this involves performing N 1572 /// operations in lane order, starting with an initial scalar value, i.e. 1573 /// result = InitVal + v0 1574 /// result = result + v1 1575 /// result = result + v2 1576 /// result = result + v3 1577 /// This is only the case for FP operations and when reassociation is not 1581unsigned Opcode,
VectorType *Ty, std::optional<FastMathFlags> FMF,
1588 /// Calculate the cost of an extended reduction pattern, similar to 1589 /// getArithmeticReductionCost of an Add reduction with multiply and optional 1590 /// extensions. This is the cost of as: 1591 /// ResTy vecreduce.add(mul (A, B)). 1592 /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). 1597 /// Calculate the cost of an extended reduction pattern, similar to 1598 /// getArithmeticReductionCost of a reduction with an extension. 1599 /// This is the cost of as: 1600 /// ResTy vecreduce.opcode(ext(Ty A)). 1606 /// \returns The cost of Intrinsic instructions. Analyses the real arguments. 1607 /// Three cases are handled: 1. scalar instruction 2. vector instruction 1608 /// 3. scalar instruction which is to be vectorized. 1612 /// \returns The cost of Call instructions. 1617 /// \returns The number of pieces into which the provided type must be 1618 /// split during legalization. Zero is returned when the answer is unknown. 1621 /// \returns The cost of the address computation. For most targets this can be 1622 /// merged into the instruction indexing mode. Some targets might want to 1623 /// distinguish between address computation for memory operations on vector 1624 /// types and scalar types. Such targets should override this function. 1625 /// The 'SE' parameter holds pointer for the scalar evolution object which 1626 /// is used in order to get the Ptr step value in case of constant stride. 1627 /// The 'Ptr' parameter holds SCEV of the access pointer. 1632 /// \returns The cost, if any, of keeping values of the given types alive 1633 /// over a callsite. 1635 /// Some types may require the use of register classes that do not have 1636 /// any callee-saved registers, so would require a spill and fill. 1639 /// \returns True if the intrinsic is a supported memory intrinsic. Info 1640 /// will contain additional information - whether the intrinsic may write 1641 /// or read to memory, volatility and the pointer. Info is undefined 1642 /// if false is returned. 1645 /// \returns The maximum element size, in bytes, for an element 1646 /// unordered-atomic memory intrinsic. 1649 /// \returns A value which is the result of the given memory intrinsic. New 1650 /// instructions may be created to extract the result from the given intrinsic 1651 /// memory operation. Returns nullptr if the target cannot create a result 1652 /// from the given intrinsic. 1654Type *ExpectedType)
const;
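// Illustrative IR sketch of the extended add-reduction-of-multiply pattern
// mentioned above, i.e. ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B))).
//
// \code
//   %a.ext = zext <8 x i16> %a to <8 x i32>
//   %b.ext = zext <8 x i16> %b to <8 x i32>
//   %mul = mul <8 x i32> %a.ext, %b.ext
//   %res = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %mul)
// \endcode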
  /// \returns The type to use in a loop expansion of a memcpy call.
                            unsigned DestAddrSpace, Align SrcAlign, Align DestAlign,
                            std::optional<uint32_t> AtomicElementSize = std::nullopt) const;
  /// \param[out] OpsOut The operand types to copy RemainingBytes of memory.
  /// \param RemainingBytes The number of bytes to copy.
  ///
  /// Calculates the operand types to use when copying \p RemainingBytes of
  /// memory, where source and destination alignments are \p SrcAlign and
  /// \p DestAlign respectively.
      unsigned RemainingBytes, unsigned SrcAddrSpace, unsigned DestAddrSpace,
      std::optional<uint32_t> AtomicCpySize = std::nullopt) const;
1674 /// \returns True if the two functions have compatible attributes for inlining 1679 /// Returns a penalty for invoking call \p Call in \p F. 1680 /// For example, if a function F calls a function G, which in turn calls 1681 /// function H, then getInlineCallPenalty(F, H()) would return the 1682 /// penalty of calling H from F, e.g. after inlining G into F. 1683 /// \p DefaultCallPenalty is passed to give a default penalty that 1684 /// the target can amend or override. 1686unsigned DefaultCallPenalty)
const;
1688 /// \returns True if the caller and callee agree on how \p Types will be 1689 /// passed to or returned from the callee. 1691 /// \param Types List of types to check. 1695 /// The type of load/store indexing. 1704 /// \returns True if the specified indexed load for the given type is legal. 1707 /// \returns True if the specified indexed store for the given type is legal. 1710 /// \returns The bitwidth of the largest vector type that should be used to 1711 /// load/store in the given address space. 1714 /// \returns True if the load instruction is legal to vectorize. 1717 /// \returns True if the store instruction is legal to vectorize. 1720 /// \returns True if it is legal to vectorize the given load chain. 1722unsigned AddrSpace)
const;
  /// \returns True if it is legal to vectorize the given store chain.
                                  unsigned AddrSpace) const;
1728 /// \returns True if it is legal to vectorize the given reduction kind. 1732 /// \returns True if the given type is supported for scalable vectors 1735 /// \returns The new vector factor value if the target doesn't support \p 1736 /// SizeInBytes loads or has a better vector factor. 1738unsigned ChainSizeInBytes,
1741 /// \returns The new vector factor value if the target doesn't support \p 1742 /// SizeInBytes stores or has a better vector factor. 1744unsigned ChainSizeInBytes,
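  // Illustrative sketch (not part of the interface): a load-chain vectorizer
  // could combine the two queries above; ChainBytes, Alignment, AS, VF,
  // LoadSize and CandidateVecTy are assumed to describe the candidate chain.
  //
  //   if (!TTI.isLegalToVectorizeLoadChain(ChainBytes, Alignment, AS))
  //     return false;                       // give up on this chain
  //   // Let the target shrink (or keep) the vector factor for these loads.
  //   unsigned NewVF = TTI.getLoadVectorFactor(VF, LoadSize, ChainBytes,
  //                                            CandidateVecTy);
  //   if (NewVF != VF)
  //     VF = NewVF;                         // re-split the chain accordingly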
  /// Flags describing the kind of vector reduction.
  struct ReductionFlags {
    ReductionFlags() = default;
    bool IsMaxOp =
        false; ///< If the op is a min/max kind, true if it's a max operation.
    bool IsSigned = false; ///< Whether the operation is a signed int reduction.
    bool NoNaN =
        false; ///< If op is an fp min/max, whether NaNs may be present.
  };

  /// \returns True if the target prefers fixed-width vectorization if the
  /// loop vectorizer's cost-model assigns an equal cost to the fixed and
  /// scalable version of the vectorized loop.
  bool preferFixedOverScalableIfEqualCost() const;

  /// \returns True if the target prefers reductions to be kept in the loop.
  bool preferInLoopReduction(unsigned Opcode, Type *Ty,
                             ReductionFlags Flags) const;

  /// \returns True if the target prefers the select of a reduction to be kept
  /// in the loop when tail folding, i.e.
  ///
  /// loop:
  ///   p = phi (0, s)
  ///   a = add (p, x)
  ///   s = select (mask, a, p)
  /// vecreduce.add(s)
  ///
  /// As opposed to the normal scheme of p = phi (0, a) which allows the select
  /// to be pulled out of the loop. If the select(.., add, ..) can be
  /// predicated by the target, this can lead to cleaner code generation.
  bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
                                       ReductionFlags Flags) const;

  /// Return true if the loop vectorizer should consider vectorizing an
  /// otherwise scalar epilogue loop.
  bool preferEpilogueVectorization() const;

  /// \returns True if the target wants to expand the given reduction intrinsic
  /// into a shuffle sequence.
  bool shouldExpandReduction(const IntrinsicInst *II) const;

  /// \returns The shuffle sequence pattern used to expand the given reduction
  /// intrinsic.
  ReductionShuffle
  getPreferredExpandedReductionShuffle(const IntrinsicInst *II) const;

  /// \returns the size cost of rematerializing a GlobalValue address relative
  /// to a stack reload.
  unsigned getGISelRematGlobalCost() const;

  /// \returns the lower bound of a trip count to decide on vectorization
  /// while tail-folding.
  unsigned getMinTripCountTailFoldingThreshold() const;

  /// \returns True if the target supports scalable vectors.
  bool supportsScalableVectors() const;

  /// \return true when scalable vectorization is preferred.
  bool enableScalableVectorization() const;

  /// \name Vector Predication Information
  /// Whether the target supports the %evl parameter of VP intrinsic
  /// efficiently in hardware, for the given opcode and type/alignment. (see
  /// LLVM Language Reference - "Vector Predication Intrinsics").
  /// Use of %evl is discouraged when that is not the case.
  bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                             Align Alignment) const;
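  // Illustrative sketch (not part of the interface): a vectorizer deciding
  // whether to emit a VP (vector-predicated) load could first ask whether the
  // target handles the %evl parameter efficiently for that operation; VecTy
  // and Alignment are assumed to describe the access.
  //
  //   if (TTI.hasActiveVectorLength(Instruction::Load, VecTy, Alignment)) {
  //     // Emit llvm.vp.load with a real %evl.
  //   } else {
  //     // Fall back to a masked load or a scalar epilogue.
  //   }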
  /// Return true if sinking I's operands to the same basic block as I is
  /// profitable, e.g. because the operands can be folded into a target
  /// instruction during instruction selection. After calling the function
  /// \p Ops contains the Uses to sink ordered by dominance (dominating users
  /// come first).
  bool isProfitableToSinkOperands(Instruction *I,
                                  SmallVectorImpl<Use *> &Ops) const;

  /// Return true if it's significantly cheaper to shift a vector by a uniform
  /// scalar than by an amount which will vary across each lane. On x86 before
  /// AVX2 for example, there is a "psllw" instruction for the former case, but
  /// no simple instruction for a general "a << b" operation on vectors.
  /// This should also apply to lowering for vector funnel shifts (rotates).
  bool isVectorShiftByScalarCheap(Type *Ty) const;

  struct VPLegalization {
    enum VPTransform {
      // keep the predicating parameter
      Legal = 0,
      // where legal, discard the predicate parameter
      Discard = 1,
      // transform into something else that is also predicating
      Convert = 2
    };

    // How to transform the EVL parameter.
    // Legal:   keep the EVL parameter as it is.
    // Discard: Ignore the EVL parameter where it is safe to do so.
    // Convert: Fold the EVL into the mask parameter.
    VPTransform EVLParamStrategy;

    // How to transform the operator.
    // Legal:   The target supports this operator.
    // Convert: Convert this to a non-VP operation.
    // The 'Discard' strategy is invalid.
    VPTransform OpStrategy;
  };

  /// \returns How the target needs this vector-predicated operation to be
  /// transformed.
  VPLegalization getVPLegalizationStrategy(const VPIntrinsic &PI) const;

  /// \returns Whether a 32-bit branch instruction is available in Arm or Thumb
  /// state.
  ///
  /// Used by the LowerTypeTests pass, which constructs an IR inline assembler
  /// node containing a jump table in a format suitable for the target, so it
  /// needs to know what format of jump table it can legally use.
  ///
  /// For non-Arm targets, this function isn't used. It defaults to returning
  /// false, but it shouldn't matter what it returns anyway.
  bool hasArmWideBranch(bool Thumb) const;

  /// Returns a bitmask constructed from the target-features or fmv-features
  /// metadata of a function.
  uint64_t getFeatureMask(const Function &F) const;

  /// Returns true if this is an instance of a function with multiple versions.
  bool isMultiversionedFunction(const Function &F) const;

  /// \return The maximum number of function arguments the target supports.
  unsigned getMaxNumArgs() const;

  /// \return For an array of given Size, return alignment boundary to
  /// pad to. Default is no padding.
  unsigned getNumBytesToPadGlobalArray(unsigned Size, Type *ArrayType) const;

private:
  /// The abstract base class used to type erase specific TTI
  /// implementations.
  class Concept;

  /// The template model for the base class which wraps a concrete
  /// implementation in a type erased interface.
  template <typename T> class Model;

  std::unique_ptr<Concept> TTIImpl;
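  // Illustrative sketch (not part of this header): the Concept/Model pair
  // declared above is ordinary type erasure. A minimal standalone analogue,
  // using hypothetical names, looks like this:
  //
  //   struct Concept {
  //     virtual ~Concept() = default;
  //     virtual int answer() const = 0;
  //   };
  //   template <typename T> struct Model final : Concept {
  //     T Impl;
  //     Model(T Impl) : Impl(std::move(Impl)) {}
  //     int answer() const override { return Impl.answer(); }
  //   };
  //   // Any object with an answer() method can be stored behind a
  //   // std::unique_ptr<Concept>, exactly as TTIImpl stores a
  //   // target-specific TTI implementation here.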
public:
  /// Construct a TTI object using a type implementing the \c Concept
  /// API below.
  template <typename T> TargetTransformInfo(T Impl);
};

class TargetTransformInfo::Concept {
public:
  virtual ~Concept() = 0;

  virtual std::pair<const Value *, unsigned>
  getPredicatedAddrSpace(const Value *V) const = 0;
  virtual Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II,
                                                  Value *OldV,
                                                  Value *NewV) const = 0;
  virtual std::optional<Value *>
  simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
                                   APInt DemandedMask, KnownBits &Known,
                                   bool &KnownBitsComputed) = 0;
  virtual std::optional<Value *> simplifyDemandedVectorEltsIntrinsic(
      InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts,
      APInt &UndefElts, APInt &UndefElts2, APInt &UndefElts3,
      std::function<void(Instruction *, unsigned, APInt, APInt &)>
          SimplifyAndSetOp) = 0;
  virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
                                     int64_t BaseOffset, bool HasBaseReg,
                                     int64_t Scale, unsigned AddrSpace,
                                     Instruction *I,
                                     int64_t ScalableOffset) = 0;
2008Align Alignment) = 0;
2010Align Alignment) = 0;
2016unsigned AddrSpace) = 0;
2028bool HasBaseReg, int64_t Scale,
2029unsigned AddrSpace) = 0;
2042unsigned ScalarOpdIdx) = 0;
2093Type *Ty =
nullptr)
const = 0;
2103bool IsScalable)
const = 0;
2106Type *ScalarValTy)
const = 0;
2108constInstruction &
I,
bool &AllowPromotionWithoutCommonHeader) = 0;
  /// \return How much before a load we should place the prefetch
  /// instruction. This is currently measured in number of
  /// instructions.
  virtual unsigned getPrefetchDistance() const = 0;

  /// \return Some HW prefetchers can handle accesses up to a certain
  /// constant stride. This is the minimum stride in bytes where it
  /// makes sense to start adding SW prefetches. The default is 1,
  /// i.e. prefetch with any stride. Sometimes prefetching is beneficial
  /// even below the HW prefetcher limit, and the arguments provided are
  /// meant to serve as a basis for deciding this for a particular loop.
  virtual unsigned getMinPrefetchStride(unsigned NumMemAccesses,
                                        unsigned NumStridedMemAccesses,
                                        unsigned NumPrefetches,
                                        bool HasCall) const = 0;

  /// \return The maximum number of iterations to prefetch ahead. If
  /// the required number of iterations is more than this number, no
  /// prefetching is performed.
  virtual unsigned getMaxPrefetchIterationsAhead() const = 0;

  /// \return True if prefetching should also be done for writes.
  virtual bool enableWritePrefetching() const = 0;

  /// \return if target wants to issue a prefetch in address space \p AS.
  virtual bool shouldPrefetchAddressSpace(unsigned AS) const = 0;

  /// \return The cost of a partial reduction, which is a reduction from a
  /// vector to another vector with fewer elements of larger size. They are
  /// represented by the llvm.experimental.partial.reduce.add intrinsic, which
  /// takes an accumulator and a binary operation operand that itself is fed by
  /// two extends. An example of an operation that uses a partial reduction is
  /// a dot product, which reduces two vectors to another of 4 times fewer and
  /// 4 times larger elements.
  virtual InstructionCost
  getPartialReductionCost(unsigned Opcode, Type *InputTypeA, Type *InputTypeB,
                          Type *AccumType, ElementCount VF,
                          PartialReductionExtendKind OpAExtend,
                          PartialReductionExtendKind OpBExtend,
                          std::optional<unsigned> BinOp) const = 0;
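  // Illustrative sketch (not part of the interface): costing a dot product
  // as a partial reduction, i.e. an add of mul(ext(a), ext(b)) where
  // <VF x i8> inputs feed a 4x-narrower i32 accumulator vector. The concrete
  // types and the `ZExt` extend kind below are assumptions for the example.
  //
  //   PartialReductionExtendKind ZExt = /* the zero-extend kind */;
  //   InstructionCost Cost = TTI.getPartialReductionCost(
  //       Instruction::Add, /*InputTypeA=*/I8Ty, /*InputTypeB=*/I8Ty,
  //       /*AccumType=*/I32Ty, VF, /*OpAExtend=*/ZExt, /*OpBExtend=*/ZExt,
  //       /*BinOp=*/Instruction::Mul);
  //   // Compare against the cost of an extend + mul + ordinary reduction to
  //   // decide whether to form llvm.experimental.partial.reduce.add.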
2162VectorType *VecTy,
unsigned Opcode0,
unsigned Opcode1,
2190 /// \param ScalarUserAndIdx encodes the information about extracts from a 2191 /// vector with 'Scalar' being the value being extracted,'User' being the user 2192 /// of the extract(nullptr if user is not known before vectorization) and 2193 /// 'Idx' being the extract lane. 2197ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) = 0;
2205constAPInt &DemandedDstElts,
2223bool VariableMask,
Align Alignment,
2228bool VariableMask,
Align Alignment,
2235bool UseMaskForCond =
false,
bool UseMaskForGaps =
false) = 0;
2238 std::optional<FastMathFlags> FMF,
2265Type *ExpectedType) = 0;
2268unsigned DestAddrSpace,
Align SrcAlign,
Align DestAlign,
2269 std::optional<uint32_t> AtomicElementSize)
const = 0;
2273unsigned RemainingBytes,
unsigned SrcAddrSpace,
unsigned DestAddrSpace,
2275 std::optional<uint32_t> AtomicCpySize)
const = 0;
2279unsigned DefaultCallPenalty)
const = 0;
2290unsigned AddrSpace)
const = 0;
2293unsigned AddrSpace)
const = 0;
2298unsigned ChainSizeInBytes,
2301unsigned ChainSizeInBytes,
2318Align Alignment)
const = 0;
template <typename T>
class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
  T Impl;

public:
  Model(T Impl) : Impl(std::move(Impl)) {}
  ~Model() override = default;

  const DataLayout &getDataLayout() const override {
    return Impl.getDataLayout();
  }
2347 getGEPCost(Type *PointeeType,
const Value *
Ptr,
2348 ArrayRef<const Value *>
Operands, Type *AccessType,
2352 InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
2354const PointersChainInfo &
Info,
2359unsigned getInliningThresholdMultiplier()
const override{
2360return Impl.getInliningThresholdMultiplier();
2362unsigned adjustInliningThreshold(
const CallBase *CB)
override{
2363return Impl.adjustInliningThreshold(CB);
2365unsigned getInliningCostBenefitAnalysisSavingsMultiplier()
const override{
2366return Impl.getInliningCostBenefitAnalysisSavingsMultiplier();
2368unsigned getInliningCostBenefitAnalysisProfitableMultiplier()
const override{
2369return Impl.getInliningCostBenefitAnalysisProfitableMultiplier();
2371int getInliningLastCallToStaticBonus()
const override{
2372return Impl.getInliningLastCallToStaticBonus();
2374int getInlinerVectorBonusPercent()
const override{
2375return Impl.getInlinerVectorBonusPercent();
2377unsigned getCallerAllocaCost(
const CallBase *CB,
2378const AllocaInst *AI)
const override{
2379return Impl.getCallerAllocaCost(CB, AI);
2381 InstructionCost getMemcpyCost(
const Instruction *
I)
override{
2382return Impl.getMemcpyCost(
I);
2385uint64_t getMaxMemIntrinsicInlineSizeThreshold()
const override{
2386return Impl.getMaxMemIntrinsicInlineSizeThreshold();
2389 InstructionCost getInstructionCost(
const User *U,
2394 BranchProbability getPredictableBranchThreshold()
override{
2395return Impl.getPredictableBranchThreshold();
2397 InstructionCost getBranchMispredictPenalty()
override{
2398return Impl.getBranchMispredictPenalty();
2400bool hasBranchDivergence(
const Function *
F =
nullptr)
override{
2401return Impl.hasBranchDivergence(
F);
2403bool isSourceOfDivergence(
const Value *V)
override{
2404return Impl.isSourceOfDivergence(V);
2407bool isAlwaysUniform(
const Value *V)
override{
2408return Impl.isAlwaysUniform(V);
2411bool isValidAddrSpaceCast(
unsigned FromAS,
unsigned ToAS)
const override{
2412return Impl.isValidAddrSpaceCast(FromAS, ToAS);
2415bool addrspacesMayAlias(
unsigned AS0,
unsigned AS1)
const override{
2416return Impl.addrspacesMayAlias(AS0, AS1);
2419unsigned getFlatAddressSpace()
override{
return Impl.getFlatAddressSpace(); }
2421bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
2423return Impl.collectFlatAddressOperands(OpIndexes, IID);
2426bool isNoopAddrSpaceCast(
unsigned FromAS,
unsigned ToAS)
const override{
2427return Impl.isNoopAddrSpaceCast(FromAS, ToAS);
2431 canHaveNonUndefGlobalInitializerInAddressSpace(
unsigned AS)
const override{
2432return Impl.canHaveNonUndefGlobalInitializerInAddressSpace(AS);
2435unsigned getAssumedAddrSpace(
const Value *V)
const override{
2436return Impl.getAssumedAddrSpace(V);
2439bool isSingleThreaded()
const override{
return Impl.isSingleThreaded(); }
2441 std::pair<const Value *, unsigned>
2442 getPredicatedAddrSpace(
const Value *V)
const override{
2443return Impl.getPredicatedAddrSpace(V);
2446Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *
II, Value *OldV,
2447 Value *NewV)
const override{
2448return Impl.rewriteIntrinsicWithAddressSpace(
II, OldV, NewV);
2451bool isLoweredToCall(
const Function *
F)
override{
2452return Impl.isLoweredToCall(
F);
2454void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
2455 UnrollingPreferences &UP,
2456 OptimizationRemarkEmitter *ORE)
override{
2457return Impl.getUnrollingPreferences(L, SE, UP, ORE);
2459void getPeelingPreferences(Loop *L, ScalarEvolution &SE,
2460 PeelingPreferences &PP)
override{
2461return Impl.getPeelingPreferences(L, SE, PP);
2463bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
2464 AssumptionCache &AC, TargetLibraryInfo *LibInfo,
2465 HardwareLoopInfo &HWLoopInfo)
override{
2466return Impl.isHardwareLoopProfitable(L, SE, AC, LibInfo, HWLoopInfo);
2468unsigned getEpilogueVectorizationMinVF()
override{
2469return Impl.getEpilogueVectorizationMinVF();
2471bool preferPredicateOverEpilogue(TailFoldingInfo *TFI)
override{
2472return Impl.preferPredicateOverEpilogue(TFI);
2475 getPreferredTailFoldingStyle(
bool IVUpdateMayOverflow =
true)
override{
2476return Impl.getPreferredTailFoldingStyle(IVUpdateMayOverflow);
2478 std::optional<Instruction *>
2479 instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &
II)
override{
2480return Impl.instCombineIntrinsic(IC,
II);
2482 std::optional<Value *>
2483 simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &
II,
2484 APInt DemandedMask, KnownBits &Known,
2485bool &KnownBitsComputed)
override{
2486return Impl.simplifyDemandedUseBitsIntrinsic(IC,
II, DemandedMask, Known,
2489 std::optional<Value *> simplifyDemandedVectorEltsIntrinsic(
2490 InstCombiner &IC, IntrinsicInst &
II, APInt DemandedElts, APInt &UndefElts,
2491 APInt &UndefElts2, APInt &UndefElts3,
2492 std::function<
void(Instruction *,
unsigned, APInt, APInt &)>
2493 SimplifyAndSetOp)
override{
2494return Impl.simplifyDemandedVectorEltsIntrinsic(
2495 IC,
II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
2498bool isLegalAddImmediate(int64_t Imm)
override{
2499return Impl.isLegalAddImmediate(Imm);
2501bool isLegalAddScalableImmediate(int64_t Imm)
override{
2502return Impl.isLegalAddScalableImmediate(Imm);
2504bool isLegalICmpImmediate(int64_t Imm)
override{
2505return Impl.isLegalICmpImmediate(Imm);
2507bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
2508bool HasBaseReg, int64_t Scale,
unsigned AddrSpace,
2509 Instruction *
I, int64_t ScalableOffset)
override{
2510return Impl.isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg, Scale,
2511 AddrSpace,
I, ScalableOffset);
2513bool isLSRCostLess(
const TargetTransformInfo::LSRCost &C1,
2514const TargetTransformInfo::LSRCost &C2)
override{
2515return Impl.isLSRCostLess(C1, C2);
2517bool isNumRegsMajorCostOfLSR()
override{
2518return Impl.isNumRegsMajorCostOfLSR();
2520bool shouldDropLSRSolutionIfLessProfitable()
const override{
2521return Impl.shouldDropLSRSolutionIfLessProfitable();
2523bool isProfitableLSRChainElement(Instruction *
I)
override{
2524return Impl.isProfitableLSRChainElement(
I);
2526bool canMacroFuseCmp()
override{
return Impl.canMacroFuseCmp(); }
2527bool canSaveCmp(Loop *L, BranchInst **BI, ScalarEvolution *SE, LoopInfo *LI,
2528 DominatorTree *DT, AssumptionCache *AC,
2529 TargetLibraryInfo *LibInfo)
override{
2530return Impl.canSaveCmp(L, BI, SE, LI, DT, AC, LibInfo);
2533 getPreferredAddressingMode(
const Loop *L,
2534 ScalarEvolution *SE)
const override{
2535return Impl.getPreferredAddressingMode(L, SE);
2537bool isLegalMaskedStore(Type *DataType, Align Alignment)
override{
2538return Impl.isLegalMaskedStore(DataType, Alignment);
2540bool isLegalMaskedLoad(Type *DataType, Align Alignment)
override{
2541return Impl.isLegalMaskedLoad(DataType, Alignment);
2543bool isLegalNTStore(Type *DataType, Align Alignment)
override{
2544return Impl.isLegalNTStore(DataType, Alignment);
2546bool isLegalNTLoad(Type *DataType, Align Alignment)
override{
2547return Impl.isLegalNTLoad(DataType, Alignment);
2549bool isLegalBroadcastLoad(Type *ElementTy,
2550 ElementCount NumElements)
const override{
2551return Impl.isLegalBroadcastLoad(ElementTy, NumElements);
2553bool isLegalMaskedScatter(Type *DataType, Align Alignment)
override{
2554return Impl.isLegalMaskedScatter(DataType, Alignment);
2556bool isLegalMaskedGather(Type *DataType, Align Alignment)
override{
2557return Impl.isLegalMaskedGather(DataType, Alignment);
2559bool forceScalarizeMaskedGather(
VectorType *DataType,
2560 Align Alignment)
override{
2561return Impl.forceScalarizeMaskedGather(DataType, Alignment);
2563bool forceScalarizeMaskedScatter(
VectorType *DataType,
2564 Align Alignment)
override{
2565return Impl.forceScalarizeMaskedScatter(DataType, Alignment);
2567bool isLegalMaskedCompressStore(Type *DataType, Align Alignment)
override{
2568return Impl.isLegalMaskedCompressStore(DataType, Alignment);
2570bool isLegalMaskedExpandLoad(Type *DataType, Align Alignment)
override{
2571return Impl.isLegalMaskedExpandLoad(DataType, Alignment);
2573bool isLegalStridedLoadStore(Type *DataType, Align Alignment)
override{
2574return Impl.isLegalStridedLoadStore(DataType, Alignment);
2576bool isLegalInterleavedAccessType(
VectorType *VTy,
unsigned Factor,
2578unsigned AddrSpace)
override{
2579return Impl.isLegalInterleavedAccessType(VTy, Factor, Alignment, AddrSpace);
2581bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType)
override{
2582return Impl.isLegalMaskedVectorHistogram(AddrType, DataType);
2584bool isLegalAltInstr(
VectorType *VecTy,
unsigned Opcode0,
unsigned Opcode1,
2585const SmallBitVector &OpcodeMask)
const override{
2586return Impl.isLegalAltInstr(VecTy, Opcode0, Opcode1, OpcodeMask);
2588bool enableOrderedReductions()
override{
2589return Impl.enableOrderedReductions();
2591bool hasDivRemOp(Type *DataType,
bool IsSigned)
override{
2592return Impl.hasDivRemOp(DataType, IsSigned);
2594bool hasVolatileVariant(Instruction *
I,
unsigned AddrSpace)
override{
2595return Impl.hasVolatileVariant(
I, AddrSpace);
2597bool prefersVectorizedAddressing()
override{
2598return Impl.prefersVectorizedAddressing();
2600 InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
2601 StackOffset BaseOffset,
bool HasBaseReg,
2603unsigned AddrSpace)
override{
2604return Impl.getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg, Scale,
2607bool LSRWithInstrQueries()
override{
return Impl.LSRWithInstrQueries(); }
2608bool isTruncateFree(Type *Ty1, Type *Ty2)
override{
2609return Impl.isTruncateFree(Ty1, Ty2);
2611bool isProfitableToHoist(Instruction *
I)
override{
2612return Impl.isProfitableToHoist(
I);
2614bool useAA()
override{
return Impl.useAA(); }
2615bool isTypeLegal(Type *Ty)
override{
return Impl.isTypeLegal(Ty); }
2616unsigned getRegUsageForType(Type *Ty)
override{
2617return Impl.getRegUsageForType(Ty);
2619bool shouldBuildLookupTables()
override{
2620return Impl.shouldBuildLookupTables();
2622bool shouldBuildLookupTablesForConstant(Constant *
C)
override{
2623return Impl.shouldBuildLookupTablesForConstant(
C);
2625bool shouldBuildRelLookupTables()
override{
2626return Impl.shouldBuildRelLookupTables();
2628bool useColdCCForColdCall(Function &
F)
override{
2629return Impl.useColdCCForColdCall(
F);
2632return Impl.isTargetIntrinsicTriviallyScalarizable(
ID);
2636unsigned ScalarOpdIdx)
override{
2637return Impl.isTargetIntrinsicWithScalarOpAtArg(
ID, ScalarOpdIdx);
2641int OpdIdx)
override{
2642return Impl.isTargetIntrinsicWithOverloadTypeAtArg(
ID, OpdIdx);
2646int RetIdx)
override{
2647return Impl.isTargetIntrinsicWithStructReturnOverloadAtField(
ID, RetIdx);
2650 InstructionCost getScalarizationOverhead(
VectorType *Ty,
2651const APInt &DemandedElts,
2652bool Insert,
bool Extract,
2654 ArrayRef<Value *> VL = {})
override {
2655return Impl.getScalarizationOverhead(Ty, DemandedElts, Insert, Extract,
2659 getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
2660 ArrayRef<Type *> Tys,
2662return Impl.getOperandsScalarizationOverhead(Args, Tys,
CostKind);
2665bool supportsEfficientVectorElementLoadStore()
override{
2666return Impl.supportsEfficientVectorElementLoadStore();
2669bool supportsTailCalls()
override{
return Impl.supportsTailCalls(); }
2670bool supportsTailCallFor(
const CallBase *CB)
override{
2671return Impl.supportsTailCallFor(CB);
2674bool enableAggressiveInterleaving(
bool LoopHasReductions)
override{
2675return Impl.enableAggressiveInterleaving(LoopHasReductions);
2677 MemCmpExpansionOptions enableMemCmpExpansion(
bool OptSize,
2678bool IsZeroCmp)
const override{
2679return Impl.enableMemCmpExpansion(OptSize, IsZeroCmp);
2681bool enableSelectOptimize()
override{
2682return Impl.enableSelectOptimize();
2684bool shouldTreatInstructionLikeSelect(
const Instruction *
I)
override{
2685return Impl.shouldTreatInstructionLikeSelect(
I);
2687bool enableInterleavedAccessVectorization()
override{
2688return Impl.enableInterleavedAccessVectorization();
2690bool enableMaskedInterleavedAccessVectorization()
override{
2691return Impl.enableMaskedInterleavedAccessVectorization();
2693bool isFPVectorizationPotentiallyUnsafe()
override{
2694return Impl.isFPVectorizationPotentiallyUnsafe();
2696bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
unsignedBitWidth,
2698unsigned *
Fast)
override{
2703return Impl.getPopcntSupport(IntTyWidthInBit);
2705bool haveFastSqrt(Type *Ty)
override{
return Impl.haveFastSqrt(Ty); }
2707bool isExpensiveToSpeculativelyExecute(
const Instruction*
I)
override{
2708return Impl.isExpensiveToSpeculativelyExecute(
I);
2711bool isFCmpOrdCheaperThanFCmpZero(Type *Ty)
override{
2712return Impl.isFCmpOrdCheaperThanFCmpZero(Ty);
2715 InstructionCost getFPOpCost(Type *Ty)
override{
2716return Impl.getFPOpCost(Ty);
2719 InstructionCost getIntImmCodeSizeCost(
unsigned Opc,
unsignedIdx,
2720const APInt &Imm, Type *Ty)
override{
2721return Impl.getIntImmCodeSizeCost(Opc,
Idx, Imm, Ty);
2723 InstructionCost getIntImmCost(
const APInt &Imm, Type *Ty,
2725return Impl.getIntImmCost(Imm, Ty,
CostKind);
2727 InstructionCost getIntImmCostInst(
unsigned Opc,
unsignedIdx,
2728const APInt &Imm, Type *Ty,
2730 Instruction *Inst =
nullptr)
override{
2731return Impl.getIntImmCostInst(Opc,
Idx, Imm, Ty,
CostKind, Inst);
2734const APInt &Imm, Type *Ty,
2736return Impl.getIntImmCostIntrin(IID,
Idx, Imm, Ty,
CostKind);
2738bool preferToKeepConstantsAttached(
const Instruction &Inst,
2739const Function &Fn)
const override{
2740return Impl.preferToKeepConstantsAttached(Inst, Fn);
2742unsigned getNumberOfRegisters(
unsigned ClassID)
const override{
2743return Impl.getNumberOfRegisters(ClassID);
2745bool hasConditionalLoadStoreForType(Type *Ty =
nullptr)
const override{
2746return Impl.hasConditionalLoadStoreForType(Ty);
2748unsigned getRegisterClassForType(
boolVector,
2749 Type *Ty =
nullptr)
const override{
2750return Impl.getRegisterClassForType(
Vector, Ty);
2752constchar *getRegisterClassName(
unsigned ClassID)
const override{
2753return Impl.getRegisterClassName(ClassID);
2755 TypeSize getRegisterBitWidth(
RegisterKind K)
const override{
2756return Impl.getRegisterBitWidth(K);
2758unsigned getMinVectorRegisterBitWidth()
const override{
2759return Impl.getMinVectorRegisterBitWidth();
2762return Impl.getMaxVScale();
2764 std::optional<unsigned> getVScaleForTuning()
const override{
2765return Impl.getVScaleForTuning();
2767bool isVScaleKnownToBeAPowerOfTwo()
const override{
2768return Impl.isVScaleKnownToBeAPowerOfTwo();
2770bool shouldMaximizeVectorBandwidth(
2772return Impl.shouldMaximizeVectorBandwidth(K);
2774 ElementCount getMinimumVF(
unsigned ElemWidth,
2775bool IsScalable)
const override{
2776return Impl.getMinimumVF(ElemWidth, IsScalable);
2778unsigned getMaximumVF(
unsigned ElemWidth,
unsigned Opcode)
const override{
2779return Impl.getMaximumVF(ElemWidth, Opcode);
2781unsigned getStoreMinimumVF(
unsigned VF, Type *ScalarMemTy,
2782 Type *ScalarValTy)
const override{
2783return Impl.getStoreMinimumVF(VF, ScalarMemTy, ScalarValTy);
2785bool shouldConsiderAddressTypePromotion(
2786const Instruction &
I,
bool &AllowPromotionWithoutCommonHeader)
override{
2787return Impl.shouldConsiderAddressTypePromotion(
2788I, AllowPromotionWithoutCommonHeader);
2790unsigned getCacheLineSize()
const override{
return Impl.getCacheLineSize(); }
2791 std::optional<unsigned> getCacheSize(
CacheLevel Level)
const override{
2792return Impl.getCacheSize(Level);
2794 std::optional<unsigned>
2795 getCacheAssociativity(
CacheLevel Level)
const override{
2796return Impl.getCacheAssociativity(Level);
2799 std::optional<unsigned> getMinPageSize()
const override{
2800return Impl.getMinPageSize();
  /// Return the preferred prefetch distance in terms of instructions.
  unsigned getPrefetchDistance() const override {
    return Impl.getPrefetchDistance();
  }

  /// Return the minimum stride necessary to trigger software
  /// prefetching.
  unsigned getMinPrefetchStride(unsigned NumMemAccesses,
                                unsigned NumStridedMemAccesses,
                                unsigned NumPrefetches,
                                bool HasCall) const override {
    return Impl.getMinPrefetchStride(NumMemAccesses, NumStridedMemAccesses,
                                     NumPrefetches, HasCall);
  }

  /// Return the maximum prefetch distance in terms of loop
  /// iterations.
  unsigned getMaxPrefetchIterationsAhead() const override {
    return Impl.getMaxPrefetchIterationsAhead();
  }

  /// \return True if prefetching should also be done for writes.
  bool enableWritePrefetching() const override {
    return Impl.enableWritePrefetching();
  }

  /// \return if target wants to issue a prefetch in address space \p AS.
  bool shouldPrefetchAddressSpace(unsigned AS) const override {
    return Impl.shouldPrefetchAddressSpace(AS);
  }
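  // Illustrative sketch (not part of the interface): how a software-prefetch
  // pass might combine these hooks for one loop; TTI is assumed to be the
  // pass's TargetTransformInfo reference, and the per-loop statistics
  // (StrideInBytes, access counts, HasCall) are computed elsewhere.
  //
  //   unsigned MinStride = TTI.getMinPrefetchStride(
  //       NumMemAccesses, NumStridedMemAccesses, NumPrefetches, HasCall);
  //   if (StrideInBytes < MinStride)
  //     return;                                  // not worth prefetching
  //   unsigned Dist = TTI.getPrefetchDistance(); // instructions ahead
  //   unsigned MaxIters = TTI.getMaxPrefetchIterationsAhead();
  //   // Clamp the iteration look-ahead derived from Dist to MaxIters and
  //   // emit the prefetch, skipping stores unless enableWritePrefetching().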
2837 InstructionCost getPartialReductionCost(
2838unsigned Opcode, Type *InputTypeA, Type *InputTypeB, Type *AccumType,
2841 std::optional<unsigned> BinOp = std::nullopt)
const override{
2842return Impl.getPartialReductionCost(Opcode, InputTypeA, InputTypeB,
2843 AccumType, VF, OpAExtend, OpBExtend,
2847unsigned getMaxInterleaveFactor(ElementCount VF)
override{
2848return Impl.getMaxInterleaveFactor(VF);
2850unsigned getEstimatedNumberOfCaseClusters(
const SwitchInst &SI,
2852 ProfileSummaryInfo *PSI,
2853 BlockFrequencyInfo *BFI)
override{
2854return Impl.getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);
2856 InstructionCost getArithmeticInstrCost(
2858 OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
2859 ArrayRef<const Value *> Args,
2860const Instruction *CxtI =
nullptr)
override{
2861return Impl.getArithmeticInstrCost(Opcode, Ty,
CostKind, Opd1Info, Opd2Info,
2864 InstructionCost getAltInstrCost(
VectorType *VecTy,
unsigned Opcode0,
2866const SmallBitVector &OpcodeMask,
2868return Impl.getAltInstrCost(VecTy, Opcode0, Opcode1, OpcodeMask,
CostKind);
2875 ArrayRef<const Value *> Args,
2876const Instruction *CxtI)
override{
2877return Impl.getShuffleCost(Kind, Tp, Mask,
CostKind,
Index, SubTp, Args,
2880 InstructionCost getCastInstrCost(
unsigned Opcode, Type *Dst, Type *Src,
2883const Instruction *
I)
override{
2884return Impl.getCastInstrCost(Opcode, Dst, Src, CCH,
CostKind,
I);
2886 InstructionCost getExtractWithExtendCost(
unsigned Opcode, Type *Dst,
2888unsignedIndex)
override{
2889return Impl.getExtractWithExtendCost(Opcode, Dst, VecTy,
Index);
2892const Instruction *
I =
nullptr)
override{
2893return Impl.getCFInstrCost(Opcode,
CostKind,
I);
2895 InstructionCost getCmpSelInstrCost(
unsigned Opcode, Type *ValTy, Type *CondTy,
2898 OperandValueInfo Op1Info,
2899 OperandValueInfo Op2Info,
2900const Instruction *
I)
override{
2901return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred,
CostKind,
2902 Op1Info, Op2Info,
I);
2904 InstructionCost getVectorInstrCost(
unsigned Opcode, Type *Val,
2906unsignedIndex, Value *Op0,
2907 Value *Op1)
override{
2908return Impl.getVectorInstrCost(Opcode, Val,
CostKind,
Index, Op0, Op1);
2910 InstructionCost getVectorInstrCost(
2913 ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx)
override{
2914return Impl.getVectorInstrCost(Opcode, Val,
CostKind,
Index, Scalar,
2917 InstructionCost getVectorInstrCost(
const Instruction &
I, Type *Val,
2919unsignedIndex)
override{
2923 getReplicationShuffleCost(Type *EltTy,
int ReplicationFactor,
int VF,
2924const APInt &DemandedDstElts,
2926return Impl.getReplicationShuffleCost(EltTy, ReplicationFactor, VF,
2929 InstructionCost getMemoryOpCost(
unsigned Opcode, Type *Src, Align Alignment,
2932 OperandValueInfo OpInfo,
2933const Instruction *
I)
override{
2937 InstructionCost getVPMemoryOpCost(
unsigned Opcode, Type *Src, Align Alignment,
2940const Instruction *
I)
override{
2941return Impl.getVPMemoryOpCost(Opcode, Src, Alignment,
AddressSpace,
2944 InstructionCost getMaskedMemoryOpCost(
unsigned Opcode, Type *Src,
2947return Impl.getMaskedMemoryOpCost(Opcode, Src, Alignment,
AddressSpace,
2951 getGatherScatterOpCost(
unsigned Opcode, Type *DataTy,
const Value *
Ptr,
2952bool VariableMask, Align Alignment,
2954const Instruction *
I =
nullptr)
override{
2955return Impl.getGatherScatterOpCost(Opcode, DataTy,
Ptr, VariableMask,
2959 getStridedMemoryOpCost(
unsigned Opcode, Type *DataTy,
const Value *
Ptr,
2960bool VariableMask, Align Alignment,
2962const Instruction *
I =
nullptr)
override{
2963return Impl.getStridedMemoryOpCost(Opcode, DataTy,
Ptr, VariableMask,
2966 InstructionCost getInterleavedMemoryOpCost(
2967unsigned Opcode, Type *VecTy,
unsigned Factor, ArrayRef<unsigned> Indices,
2969bool UseMaskForCond,
bool UseMaskForGaps)
override{
2970return Impl.getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
2972 UseMaskForCond, UseMaskForGaps);
2975 getArithmeticReductionCost(
unsigned Opcode,
VectorType *Ty,
2976 std::optional<FastMathFlags> FMF,
2978return Impl.getArithmeticReductionCost(Opcode, Ty, FMF,
CostKind);
2983return Impl.getMinMaxReductionCost(IID, Ty, FMF,
CostKind);
2986 getExtendedReductionCost(
unsigned Opcode,
bool IsUnsigned, Type *ResTy,
2989return Impl.getExtendedReductionCost(Opcode, IsUnsigned, ResTy, Ty, FMF,
2993 getMulAccReductionCost(
bool IsUnsigned, Type *ResTy,
VectorType *Ty,
2995return Impl.getMulAccReductionCost(IsUnsigned, ResTy, Ty,
CostKind);
2997 InstructionCost getIntrinsicInstrCost(
const IntrinsicCostAttributes &ICA,
2999return Impl.getIntrinsicInstrCost(ICA,
CostKind);
3001 InstructionCost getCallInstrCost(Function *
F, Type *
RetTy,
3002 ArrayRef<Type *> Tys,
3006unsigned getNumberOfParts(Type *Tp)
override{
3007return Impl.getNumberOfParts(Tp);
3009 InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
3010const SCEV *
Ptr)
override{
3011return Impl.getAddressComputationCost(Ty, SE,
Ptr);
3013 InstructionCost getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys)
override{
3014return Impl.getCostOfKeepingLiveOverCall(Tys);
3016bool getTgtMemIntrinsic(IntrinsicInst *Inst,
3017 MemIntrinsicInfo &
Info)
override{
3018return Impl.getTgtMemIntrinsic(Inst,
Info);
3020unsigned getAtomicMemIntrinsicMaxElementSize()
const override{
3021return Impl.getAtomicMemIntrinsicMaxElementSize();
3023Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
3024 Type *ExpectedType)
override{
3025return Impl.getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);
3027Type *getMemcpyLoopLoweringType(
3028 LLVMContext &Context, Value *
Length,
unsigned SrcAddrSpace,
3029unsigned DestAddrSpace, Align SrcAlign, Align DestAlign,
3030 std::optional<uint32_t> AtomicElementSize)
const override{
3031return Impl.getMemcpyLoopLoweringType(Context,
Length, SrcAddrSpace,
3032 DestAddrSpace, SrcAlign, DestAlign,
3035void getMemcpyLoopResidualLoweringType(
3036 SmallVectorImpl<Type *> &OpsOut, LLVMContext &Context,
3037unsigned RemainingBytes,
unsigned SrcAddrSpace,
unsigned DestAddrSpace,
3038 Align SrcAlign, Align DestAlign,
3039 std::optional<uint32_t> AtomicCpySize)
const override{
3040 Impl.getMemcpyLoopResidualLoweringType(OpsOut, Context, RemainingBytes,
3041 SrcAddrSpace, DestAddrSpace,
3042 SrcAlign, DestAlign, AtomicCpySize);
3045const Function *Callee)
const override{
3046return Impl.areInlineCompatible(Caller, Callee);
3048unsigned getInlineCallPenalty(
const Function *
F,
const CallBase &Call,
3049unsigned DefaultCallPenalty)
const override{
3050return Impl.getInlineCallPenalty(
F, Call, DefaultCallPenalty);
3052bool areTypesABICompatible(
const Function *Caller,
const Function *Callee,
3053const ArrayRef<Type *> &Types)
const override{
3054return Impl.areTypesABICompatible(Caller, Callee, Types);
3057return Impl.isIndexedLoadLegal(
Mode, Ty, getDataLayout());
3060return Impl.isIndexedStoreLegal(
Mode, Ty, getDataLayout());
3062unsigned getLoadStoreVecRegBitWidth(
unsigned AddrSpace)
const override{
3063return Impl.getLoadStoreVecRegBitWidth(AddrSpace);
3065bool isLegalToVectorizeLoad(LoadInst *LI)
const override{
3066return Impl.isLegalToVectorizeLoad(LI);
3068bool isLegalToVectorizeStore(StoreInst *SI)
const override{
3069return Impl.isLegalToVectorizeStore(SI);
3071bool isLegalToVectorizeLoadChain(
unsigned ChainSizeInBytes, Align Alignment,
3072unsigned AddrSpace)
const override{
3073return Impl.isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment,
3076bool isLegalToVectorizeStoreChain(
unsigned ChainSizeInBytes, Align Alignment,
3077unsigned AddrSpace)
const override{
3078return Impl.isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,
3081bool isLegalToVectorizeReduction(
const RecurrenceDescriptor &RdxDesc,
3082 ElementCount VF)
const override{
3083return Impl.isLegalToVectorizeReduction(RdxDesc, VF);
3085bool isElementTypeLegalForScalableVector(Type *Ty)
const override{
3086return Impl.isElementTypeLegalForScalableVector(Ty);
3088unsigned getLoadVectorFactor(
unsigned VF,
unsigned LoadSize,
3089unsigned ChainSizeInBytes,
3091return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
3093unsigned getStoreVectorFactor(
unsigned VF,
unsigned StoreSize,
3094unsigned ChainSizeInBytes,
3096return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
3098bool preferFixedOverScalableIfEqualCost()
const override{
3099return Impl.preferFixedOverScalableIfEqualCost();
3101bool preferInLoopReduction(
unsigned Opcode, Type *Ty,
3102 ReductionFlags Flags)
const override{
3103return Impl.preferInLoopReduction(Opcode, Ty, Flags);
3105bool preferPredicatedReductionSelect(
unsigned Opcode, Type *Ty,
3106 ReductionFlags Flags)
const override{
3107return Impl.preferPredicatedReductionSelect(Opcode, Ty, Flags);
3109bool preferEpilogueVectorization()
const override{
3110return Impl.preferEpilogueVectorization();
3113bool shouldExpandReduction(
const IntrinsicInst *
II)
const override{
3114return Impl.shouldExpandReduction(
II);
3118 getPreferredExpandedReductionShuffle(
const IntrinsicInst *
II)
const override{
3119return Impl.getPreferredExpandedReductionShuffle(
II);
3122unsigned getGISelRematGlobalCost()
const override{
3123return Impl.getGISelRematGlobalCost();
3126unsigned getMinTripCountTailFoldingThreshold()
const override{
3127return Impl.getMinTripCountTailFoldingThreshold();
3130bool supportsScalableVectors()
const override{
3131return Impl.supportsScalableVectors();
3134bool enableScalableVectorization()
const override{
3135return Impl.enableScalableVectorization();
3138bool hasActiveVectorLength(
unsigned Opcode, Type *DataType,
3139 Align Alignment)
const override{
3140return Impl.hasActiveVectorLength(Opcode, DataType, Alignment);
3143bool isProfitableToSinkOperands(Instruction *
I,
3144 SmallVectorImpl<Use *> &Ops)
const override{
3145return Impl.isProfitableToSinkOperands(
I, Ops);
3148bool isVectorShiftByScalarCheap(Type *Ty)
const override{
3149return Impl.isVectorShiftByScalarCheap(Ty);
3153 getVPLegalizationStrategy(
const VPIntrinsic &PI)
const override{
3154return Impl.getVPLegalizationStrategy(PI);
3157bool hasArmWideBranch(
bool Thumb)
const override{
3158return Impl.hasArmWideBranch(Thumb);
3161uint64_t getFeatureMask(
const Function &
F)
const override{
3162return Impl.getFeatureMask(
F);
3165bool isMultiversionedFunction(
const Function &
F)
const override{
3166return Impl.isMultiversionedFunction(
F);
3169unsigned getMaxNumArgs()
const override{
3170return Impl.getMaxNumArgs();
3173unsigned getNumBytesToPadGlobalArray(
unsignedSize,
template <typename T>
TargetTransformInfo::TargetTransformInfo(T Impl)
    : TTIImpl(new Model<T>(Impl)) {}
/// Analysis pass providing the \c TargetTransformInfo.
///
/// The core idea of the TargetIRAnalysis is to expose an interface through
/// which LLVM targets can analyze and provide information about the middle
/// end's target-independent IR. This supports use cases such as target-aware
/// cost modeling of IR constructs.
///
/// This is a function analysis because much of the cost modeling for targets
/// is done in a subtarget specific way and LLVM supports compiling different
/// functions targeting different subtargets in order to support runtime
/// dispatch according to the observed subtarget.
class TargetIRAnalysis : public AnalysisInfoMixin<TargetIRAnalysis> {
public:
  typedef TargetTransformInfo Result;

  /// Default construct a target IR analysis.
  ///
  /// This will use the module's datalayout to construct a baseline
  /// conservative TTI result.
  TargetIRAnalysis();

  /// Construct an IR analysis pass around a target-provided callback.
  ///
  /// The callback will be called with a particular function for which the TTI
  /// is needed and must return a TTI object for that function.
  TargetIRAnalysis(std::function<Result(const Function &)> TTICallback);

  // Value semantics. We spell out the constructors for MSVC.
  TargetIRAnalysis(const TargetIRAnalysis &Arg)
      : TTICallback(Arg.TTICallback) {}
  TargetIRAnalysis(TargetIRAnalysis &&Arg)
      : TTICallback(std::move(Arg.TTICallback)) {}
  TargetIRAnalysis &operator=(const TargetIRAnalysis &RHS) {
    TTICallback = RHS.TTICallback;
    return *this;
  }
  TargetIRAnalysis &operator=(TargetIRAnalysis &&RHS) {
    TTICallback = std::move(RHS.TTICallback);
    return *this;
  }

  Result run(const Function &F, FunctionAnalysisManager &);

private:
  friend AnalysisInfoMixin<TargetIRAnalysis>;
  static AnalysisKey Key;

  /// The callback used to produce a result.
  ///
  /// We use a completely opaque callback so that targets can provide whatever
  /// mechanism they desire for constructing the TTI for a given function.
  ///
  /// FIXME: Should we really use std::function? It's relatively inefficient.
  /// It might be possible to arrange for even stateful callbacks to outlive
  /// the analysis and thus use a function_ref which would be lighter weight.
  /// This may also be less error prone as the callback is likely to reference
  /// the external TargetMachine, and that reference needs to never dangle.
  std::function<Result(const Function &)> TTICallback;

  /// Helper function used as the callback in the default constructor.
  static Result getDefaultTTI(const Function &F);
};

/// Wrapper pass for TargetTransformInfo.
///
/// This pass can be constructed from a TTI object which it stores internally
/// and is queried by passes.
class TargetTransformInfoWrapperPass : public ImmutablePass {
  TargetIRAnalysis TIRA;
  std::optional<TargetTransformInfo> TTI;

  virtual void anchor();

public:
  static char ID;

  /// We must provide a default constructor for the pass but it should
  /// never be used.
  ///
  /// Use the constructor below or call one of the creation routines.
  TargetTransformInfoWrapperPass();

  explicit TargetTransformInfoWrapperPass(TargetIRAnalysis TIRA);

  TargetTransformInfo &getTTI(const Function &F);
};

/// Create an analysis pass wrapper around a TTI object.
///
/// This analysis pass just holds the TTI instance and makes it available to
/// clients.
ImmutablePass *createTargetTransformInfoWrapperPass(TargetIRAnalysis TIRA);
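// Illustrative sketch (not part of this header): how a tool embedding LLVM
// typically wires TargetIRAnalysis into the new pass manager so that passes
// get a target-aware TTI. `TM` is assumed to be a TargetMachine* the tool
// already owns; with no target machine the analysis falls back to the
// conservative default described above.
//
//   FunctionAnalysisManager FAM;
//   FAM.registerPass([&] {
//     return TM ? TM->getTargetIRAnalysis() : TargetIRAnalysis();
//   });
//   // ... later, inside a pass or after running the pipeline:
//   TargetTransformInfo &TTI = FAM.getResult<TargetIRAnalysis>(F);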
virtual bool preferEpilogueVectorization() const =0
virtual InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind, const Instruction *I)=0
virtual int getInliningLastCallToStaticBonus() const =0
virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const =0
virtual bool isIndexedStoreLegal(MemIndexedMode Mode, Type *Ty) const =0
virtual bool allowsMisalignedMemoryAccesses(LLVMContext &Context, unsigned BitWidth, unsigned AddressSpace, Align Alignment, unsigned *Fast)=0
virtual unsigned getInliningCostBenefitAnalysisProfitableMultiplier() const =0
This pass provides access to the codegen interfaces that are needed for IR-level transformations.
bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) const
bool isLegalToVectorizeLoad(LoadInst *LI) const
std::optional< unsigned > getVScaleForTuning() const
static CastContextHint getCastContextHint(const Instruction *I)
Calculates a CastContextHint from I.
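The static hint calculation pairs naturally with getCastInstrCost. A minimal sketch follows (the helper name costOfCast is illustrative, not part of the API), showing how a transform might reuse the context of an existing cast when costing it:
```cpp
// Sketch: cost an existing cast, reusing the load/store context it appears in.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

static InstructionCost costOfCast(const TargetTransformInfo &TTI,
                                  const CastInst &Cast) {
  // Derive Masked/Interleaved/Normal/None from the surrounding use pattern.
  TargetTransformInfo::CastContextHint CCH =
      TargetTransformInfo::getCastContextHint(&Cast);
  return TTI.getCastInstrCost(Cast.getOpcode(), Cast.getDestTy(),
                              Cast.getSrcTy(), CCH,
                              TargetTransformInfo::TCK_RecipThroughput, &Cast);
}
```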
unsigned getMaxNumArgs() const
bool addrspacesMayAlias(unsigned AS0, unsigned AS1) const
Return false if an AS0 address cannot possibly alias an AS1 address.
bool isLegalMaskedScatter(Type *DataType, Align Alignment) const
Return true if the target supports masked scatter.
InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask, Align Alignment, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, const Instruction *I=nullptr) const
bool shouldBuildLookupTables() const
Return true if switches should be turned into lookup tables for the target.
bool isLegalToVectorizeStore(StoreInst *SI) const
bool enableAggressiveInterleaving(bool LoopHasReductions) const
Don't restrict interleaved unrolling to small loops.
uint64_t getFeatureMask(const Function &F) const
Returns a bitmask constructed from the target-features or fmv-features metadata of a function.
bool isMultiversionedFunction(const Function &F) const
Returns true if this is an instance of a function with multiple versions.
bool isFCmpOrdCheaperThanFCmpZero(Type *Ty) const
Return true if it is faster to check if a floating-point value is NaN (or not-NaN) versus a compariso...
bool preferInLoopReduction(unsigned Opcode, Type *Ty, ReductionFlags Flags) const
bool supportsEfficientVectorElementLoadStore() const
If target has efficient vector element load/store instructions, it can return true here so that inser...
bool isAlwaysUniform(const Value *V) const
unsigned getAssumedAddrSpace(const Value *V) const
bool shouldDropLSRSolutionIfLessProfitable() const
Return true if LSR should drop a found solution if it's calculated to be less profitable than the bas...
bool isLSRCostLess(const TargetTransformInfo::LSRCost &C1, const TargetTransformInfo::LSRCost &C2) const
Return true if LSR cost of C1 is lower than C2.
unsigned getPrefetchDistance() const
Type * getMemcpyLoopLoweringType(LLVMContext &Context, Value *Length, unsigned SrcAddrSpace, unsigned DestAddrSpace, Align SrcAlign, Align DestAlign, std::optional< uint32_t > AtomicElementSize=std::nullopt) const
bool isLegalMaskedExpandLoad(Type *DataType, Align Alignment) const
Return true if the target supports masked expand load.
bool prefersVectorizedAddressing() const
Return true if target doesn't mind addresses in vectors.
InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy, CmpInst::Predicate VecPred, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, OperandValueInfo Op1Info={OK_AnyValue, OP_None}, OperandValueInfo Op2Info={OK_AnyValue, OP_None}, const Instruction *I=nullptr) const
bool hasBranchDivergence(const Function *F=nullptr) const
Return true if branch divergence exists.
MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const
InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *SE=nullptr, const SCEV *Ptr=nullptr) const
bool invalidate(Function &, const PreservedAnalyses &, FunctionAnalysisManager::Invalidator &)
Handle the invalidation of this information.
void getUnrollingPreferences(Loop *L, ScalarEvolution &, UnrollingPreferences &UP, OptimizationRemarkEmitter *ORE) const
Get target-customized preferences for the generic loop unrolling transformation.
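As a rough illustration (the helper name targetAllowsRuntimeUnroll is hypothetical), a loop pass seeds an UnrollingPreferences struct with generic defaults and then lets the target hook adjust it:
```cpp
// Sketch: ask the target to customize generic unrolling knobs for L.
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

static bool targetAllowsRuntimeUnroll(Loop *L, ScalarEvolution &SE,
                                      const TargetTransformInfo &TTI,
                                      OptimizationRemarkEmitter *ORE) {
  TargetTransformInfo::UnrollingPreferences UP;
  // Seed a few generic defaults (a real pass initializes every field);
  // the target hook may then raise thresholds or veto runtime unrolling.
  UP.Threshold = 150;
  UP.PartialThreshold = 150;
  UP.Partial = false;
  UP.Runtime = true;
  TTI.getUnrollingPreferences(L, SE, UP, ORE);
  return UP.Runtime;
}
```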
bool shouldBuildLookupTablesForConstant(Constant *C) const
Return true if switches should be turned into lookup tables containing this constant value for the ta...
InstructionCost getOperandsScalarizationOverhead(ArrayRef< const Value * > Args, ArrayRef< Type * > Tys, TTI::TargetCostKind CostKind) const
Estimate the overhead of scalarizing an instruction's unique non-constant operands.
bool supportsTailCallFor(const CallBase *CB) const
Returns true if the target supports a tail call on CB.
std::optional< Instruction * > instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const
Targets can implement their own combinations for target-specific intrinsics.
bool isProfitableLSRChainElement(Instruction *I) const
TypeSize getRegisterBitWidth(RegisterKind K) const
unsigned getInlineCallPenalty(const Function *F, const CallBase &Call, unsigned DefaultCallPenalty) const
Returns a penalty for invoking call Call in F.
bool isExpensiveToSpeculativelyExecute(const Instruction *I) const
Return true if the cost of the instruction is too high to speculatively execute and should be kept be...
bool isLegalMaskedGather(Type *DataType, Align Alignment) const
Return true if the target supports masked gather.
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, OperandValueInfo OpdInfo={OK_AnyValue, OP_None}, const Instruction *I=nullptr) const
std::optional< unsigned > getMaxVScale() const
InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF, const APInt &DemandedDstElts, TTI::TargetCostKind CostKind) const
InstructionCost getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef< unsigned > Indices, Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, bool UseMaskForCond=false, bool UseMaskForGaps=false) const
bool isSingleThreaded() const
std::optional< Value * > simplifyDemandedVectorEltsIntrinsic(InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts, APInt &UndefElts2, APInt &UndefElts3, std::function< void(Instruction *, unsigned, APInt, APInt &)> SimplifyAndSetOp) const
Can be used to implement target-specific instruction combining.
bool enableOrderedReductions() const
Return true if we should be enabling ordered reductions for the target.
InstructionCost getInstructionCost(const User *U, TargetCostKind CostKind) const
This is a helper function which calls the three-argument getInstructionCost with Operands which are t...
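For example (sketch only; blockCodeSize is not an LLVM API), summing the size cost of a basic block needs only this two-argument form, since the operands default to those of the user itself:
```cpp
// Sketch: accumulate the code-size cost of every instruction in a block.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"

using namespace llvm;

static InstructionCost blockCodeSize(const BasicBlock &BB,
                                     const TargetTransformInfo &TTI) {
  InstructionCost Cost = 0;
  for (const Instruction &I : BB)
    Cost += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize);
  return Cost;
}
```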
unsigned getInliningCostBenefitAnalysisProfitableMultiplier() const
InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA, TTI::TargetCostKind CostKind) const
InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty, std::optional< FastMathFlags > FMF, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
Calculate the cost of vector reduction intrinsics.
unsigned getAtomicMemIntrinsicMaxElementSize() const
InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src, TTI::CastContextHint CCH, TTI::TargetCostKind CostKind=TTI::TCK_SizeAndLatency, const Instruction *I=nullptr) const
bool LSRWithInstrQueries() const
Return true if the loop strength reduce pass should make Instruction* based TTI queries to isLegalAdd...
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize, unsigned ChainSizeInBytes, VectorType *VecTy) const
VPLegalization getVPLegalizationStrategy(const VPIntrinsic &PI) const
static PartialReductionExtendKind getPartialReductionExtendKind(Instruction *I)
Get the kind of extension that an instruction represents.
bool enableWritePrefetching() const
bool shouldTreatInstructionLikeSelect(const Instruction *I) const
Should the Select Optimization pass treat the given instruction like a select, potentially converting...
bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const
bool shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const
TailFoldingStyle getPreferredTailFoldingStyle(bool IVUpdateMayOverflow=true) const
Query the target what the preferred style of tail folding is.
InstructionCost getGEPCost(Type *PointeeType, const Value *Ptr, ArrayRef< const Value * > Operands, Type *AccessType=nullptr, TargetCostKind CostKind=TCK_SizeAndLatency) const
Estimate the cost of a GEP operation when lowered.
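A minimal sketch (gepIsLikelyFree is illustrative) of using this query to ask whether a GEP is expected to fold into its users' addressing modes:
```cpp
// Sketch: a GEP whose size-and-latency cost is TCC_Free is expected to fold
// away into the addressing mode of its memory users.
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

static bool gepIsLikelyFree(const GetElementPtrInst &GEP,
                            const TargetTransformInfo &TTI) {
  SmallVector<const Value *, 4> Indices(GEP.idx_begin(), GEP.idx_end());
  InstructionCost Cost = TTI.getGEPCost(GEP.getSourceElementType(),
                                        GEP.getPointerOperand(), Indices);
  return Cost == TargetTransformInfo::TCC_Free;
}
```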
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const
bool isLegalInterleavedAccessType(VectorType *VTy, unsigned Factor, Align Alignment, unsigned AddrSpace) const
Return true if the target supports interleaved access for the given vector type VTy,...
unsigned getRegUsageForType(Type *Ty) const
Returns the estimated number of registers required to represent Ty.
bool isLegalBroadcastLoad(Type *ElementTy, ElementCount NumElements) const
Returns true if the target supports broadcasting a load to a vector of type <NumElements x ElementTy>.
bool isIndexedStoreLegal(enum MemIndexedMode Mode, Type *Ty) const
std::pair< const Value *, unsigned > getPredicatedAddrSpace(const Value *V) const
unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const
ReductionShuffle getPreferredExpandedReductionShuffle(const IntrinsicInst *II) const
InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned, Type *ResTy, VectorType *Ty, FastMathFlags FMF, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
Calculate the cost of an extended reduction pattern, similar to getArithmeticReductionCost of a reduc...
static OperandValueInfo getOperandInfo(const Value *V)
Collect properties of V used in cost analysis, e.g. OP_PowerOf2.
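A sketch (binOpCostWithOperandInfo is illustrative) of feeding the collected operand properties back into an arithmetic cost query, which lets targets discount e.g. multiplies by uniform power-of-two constants:
```cpp
// Sketch: cost a binary operator more precisely by describing its operands.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

static InstructionCost binOpCostWithOperandInfo(const TargetTransformInfo &TTI,
                                                const Instruction &BinOp) {
  TargetTransformInfo::OperandValueInfo Op1Info =
      TargetTransformInfo::getOperandInfo(BinOp.getOperand(0));
  TargetTransformInfo::OperandValueInfo Op2Info =
      TargetTransformInfo::getOperandInfo(BinOp.getOperand(1));
  return TTI.getArithmeticInstrCost(BinOp.getOpcode(), BinOp.getType(),
                                    TargetTransformInfo::TCK_RecipThroughput,
                                    Op1Info, Op2Info);
}
```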
InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
Calculate the cost of an extended reduction pattern, similar to getArithmeticReductionCost of an Add ...
unsigned getRegisterClassForType(bool Vector, Type *Ty=nullptr) const
bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, bool HasBaseReg, int64_t Scale, unsigned AddrSpace=0, Instruction *I=nullptr, int64_t ScalableOffset=0) const
Return true if the addressing mode represented by AM is legal for this target, for a load/store of th...
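For instance (sketch; canFoldConstOffset16 is hypothetical), checking whether a base-register-plus-16 form is legal for an i32 access in address space 0:
```cpp
// Sketch: is [BaseReg + 16] a legal addressing mode for an i32 load/store?
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"

using namespace llvm;

static bool canFoldConstOffset16(const TargetTransformInfo &TTI,
                                 LLVMContext &Ctx) {
  return TTI.isLegalAddressingMode(Type::getInt32Ty(Ctx), /*BaseGV=*/nullptr,
                                   /*BaseOffset=*/16, /*HasBaseReg=*/true,
                                   /*Scale=*/0);
}
```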
PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) const
Return hardware support for population count.
unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI, unsigned &JTSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) const
bool isElementTypeLegalForScalableVector(Type *Ty) const
bool forceScalarizeMaskedGather(VectorType *Type, Align Alignment) const
Return true if the target forces scalarizing of llvm.masked.gather intrinsics.
unsigned getMaxPrefetchIterationsAhead() const
bool canHaveNonUndefGlobalInitializerInAddressSpace(unsigned AS) const
Return true if globals in this address space can have initializers other than undef.
ElementCount getMinimumVF(unsigned ElemWidth, bool IsScalable) const
InstructionCost getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, Type *Ty, TargetCostKind CostKind) const
bool enableMaskedInterleavedAccessVectorization() const
Enable matching of interleaved access groups that contain predicated accesses or gaps and therefore v...
InstructionCost getIntImmCostInst(unsigned Opc, unsigned Idx, const APInt &Imm, Type *Ty, TargetCostKind CostKind, Instruction *Inst=nullptr) const
Return the expected cost of materialization for the given integer immediate of the specified type for...
bool isLegalStridedLoadStore(Type *DataType, Align Alignment) const
Return true if the target supports strided load.
TargetTransformInfo & operator=(TargetTransformInfo &&RHS)
InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty, FastMathFlags FMF=FastMathFlags(), TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
TargetCostKind
The kind of cost model.
@ TCK_RecipThroughput
Reciprocal throughput.
@ TCK_CodeSize
Instruction code size.
@ TCK_SizeAndLatency
The weighted sum of size and latency.
@ TCK_Latency
The latency of the instruction.
InstructionCost getArithmeticInstrCost(unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, TTI::OperandValueInfo Opd1Info={TTI::OK_AnyValue, TTI::OP_None}, TTI::OperandValueInfo Opd2Info={TTI::OK_AnyValue, TTI::OP_None}, ArrayRef< const Value * > Args={}, const Instruction *CxtI=nullptr, const TargetLibraryInfo *TLibInfo=nullptr) const
This is an approximation of reciprocal throughput of a math/logic op.
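The choice of TargetCostKind matters: the same operation can be cheap for speed but not for size, or vice versa. A sketch (addIsCheapForSpeedAndSize is illustrative):
```cpp
// Sketch: query the same add under two cost kinds and require both to be
// no more expensive than a basic instruction.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

static bool addIsCheapForSpeedAndSize(const TargetTransformInfo &TTI,
                                      Type *Ty) {
  InstructionCost Speed = TTI.getArithmeticInstrCost(
      Instruction::Add, Ty, TargetTransformInfo::TCK_RecipThroughput);
  InstructionCost Size = TTI.getArithmeticInstrCost(
      Instruction::Add, Ty, TargetTransformInfo::TCK_CodeSize);
  return Speed.isValid() && Size.isValid() &&
         Speed <= TargetTransformInfo::TCC_Basic &&
         Size <= TargetTransformInfo::TCC_Basic;
}
```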
bool areTypesABICompatible(const Function *Caller, const Function *Callee, const ArrayRef< Type * > &Types) const
bool enableSelectOptimize() const
Should the Select Optimization pass be enabled and run.
bool collectFlatAddressOperands(SmallVectorImpl< int > &OpIndexes, Intrinsic::ID IID) const
Return any intrinsic address operand indexes which may be rewritten if they use a flat address space ...
OperandValueProperties
Additional properties of an operand's values.
int getInliningLastCallToStaticBonus() const
InstructionCost getPointersChainCost(ArrayRef< const Value * > Ptrs, const Value *Base, const PointersChainInfo &Info, Type *AccessTy, TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
Estimate the cost of a chain of pointers (typically pointer operands of a chain of loads or stores wi...
bool isVScaleKnownToBeAPowerOfTwo() const
bool isIndexedLoadLegal(enum MemIndexedMode Mode, Type *Ty) const
unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const
bool isSourceOfDivergence(const Value *V) const
Returns whether V is a source of divergence.
bool isLegalICmpImmediate(int64_t Imm) const
Return true if the specified immediate is a legal icmp immediate, that is, the target has icmp instructi...
bool isTypeLegal(Type *Ty) const
Return true if this type is legal.
static bool requiresOrderedReduction(std::optional< FastMathFlags > FMF)
A helper function to determine the type of reduction algorithm used for a given Opcode and set of Fas...
bool isLegalToVectorizeReduction(const RecurrenceDescriptor &RdxDesc, ElementCount VF) const
std::optional< unsigned > getCacheAssociativity(CacheLevel Level) const
bool isLegalNTLoad(Type *DataType, Align Alignment) const
Return true if the target supports nontemporal load.
InstructionCost getMemcpyCost(const Instruction *I) const
unsigned adjustInliningThreshold(const CallBase *CB) const
bool isLegalAddImmediate(int64_t Imm) const
Return true if the specified immediate is a legal add immediate, that is, the target has add instruction...
bool isTargetIntrinsicWithStructReturnOverloadAtField(Intrinsic::ID ID, int RetIdx) const
Identifies if the vector form of the intrinsic that returns a struct is overloaded at the struct elem...
InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, const Instruction *I=nullptr) const
unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize, unsigned ChainSizeInBytes, VectorType *VecTy) const
InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
bool canSaveCmp(Loop *L, BranchInst **BI, ScalarEvolution *SE, LoopInfo *LI, DominatorTree *DT, AssumptionCache *AC, TargetLibraryInfo *LibInfo) const
Return true if the target can save a compare for loop count, for example hardware loop saves a compar...
bool isTargetIntrinsicTriviallyScalarizable(Intrinsic::ID ID) const
Value * rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV, Value *NewV) const
Rewrite intrinsic call II such that OldV will be replaced with NewV, which has a different address sp...
InstructionCost getCostOfKeepingLiveOverCall(ArrayRef< Type * > Tys) const
unsigned getMinPrefetchStride(unsigned NumMemAccesses, unsigned NumStridedMemAccesses, unsigned NumPrefetches, bool HasCall) const
Some HW prefetchers can handle accesses up to a certain constant stride.
bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty, ReductionFlags Flags) const
InstructionCost getShuffleCost(ShuffleKind Kind, VectorType *Tp, ArrayRef< int > Mask={}, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, int Index=0, VectorType *SubTp=nullptr, ArrayRef< const Value * > Args={}, const Instruction *CxtI=nullptr) const
bool shouldPrefetchAddressSpace(unsigned AS) const
InstructionCost getIntImmCost(const APInt &Imm, Type *Ty, TargetCostKind CostKind) const
Return the expected cost of materializing for the given integer immediate of the specified type.
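A sketch (immediateIsFree is illustrative) of how a pass such as constant hoisting might use this to decide whether a constant is worth hoisting at all:
```cpp
// Sketch: an immediate whose materialization cost is TCC_Free fits directly
// into instructions and is not worth hoisting.
#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Type.h"

using namespace llvm;

static bool immediateIsFree(const TargetTransformInfo &TTI, const APInt &Imm,
                            Type *IntTy) {
  InstructionCost Cost =
      TTI.getIntImmCost(Imm, IntTy, TargetTransformInfo::TCK_CodeSize);
  return Cost == TargetTransformInfo::TCC_Free;
}
```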
unsigned getMinVectorRegisterBitWidth() const
bool isLegalNTStore(Type *DataType, Align Alignment) const
Return true if the target supports nontemporal store.
unsigned getFlatAddressSpace() const
Returns the address space ID for a target's 'flat' address space.
bool preferToKeepConstantsAttached(const Instruction &Inst, const Function &Fn) const
It can be advantageous to detach complex constants from their uses to make their generation cheaper.
bool hasArmWideBranch(bool Thumb) const
const char * getRegisterClassName(unsigned ClassID) const
bool preferEpilogueVectorization() const
Return true if the loop vectorizer should consider vectorizing an otherwise scalar epilogue loop.
bool shouldConsiderAddressTypePromotion(const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const
BranchProbability getPredictableBranchThreshold() const
If a branch or a select condition is skewed in one direction by more than this factor,...
unsigned getCallerAllocaCost(const CallBase *CB, const AllocaInst *AI) const
unsigned getCacheLineSize() const
bool allowsMisalignedMemoryAccesses(LLVMContext &Context, unsigned BitWidth, unsigned AddressSpace=0, Align Alignment=Align(1), unsigned *Fast=nullptr) const
Determine if the target supports unaligned memory accesses.
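A sketch (fastUnaligned128 is illustrative) that checks both legality and speed of a 128-bit unaligned access in the default address space:
```cpp
// Sketch: unaligned accesses may be legal but slow; check the Fast flag too.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/Support/Alignment.h"

using namespace llvm;

static bool fastUnaligned128(const TargetTransformInfo &TTI,
                             LLVMContext &Ctx) {
  unsigned Fast = 0;
  return TTI.allowsMisalignedMemoryAccesses(Ctx, /*BitWidth=*/128,
                                            /*AddressSpace=*/0, Align(1),
                                            &Fast) &&
         Fast != 0;
}
```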
InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask, Align Alignment, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput, const Instruction *I=nullptr) const
int getInlinerVectorBonusPercent() const
bool hasActiveVectorLength(unsigned Opcode, Type *DataType, Align Alignment) const
unsigned getEpilogueVectorizationMinVF() const
PopcntSupportKind
Flags indicating the kind of support for population count.
InstructionCost getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm, Type *Ty) const
Return the expected cost for the given integer when optimising for size.
AddressingModeKind getPreferredAddressingMode(const Loop *L, ScalarEvolution *SE) const
Return the preferred addressing mode LSR should make efforts to generate.
bool isLoweredToCall(const Function *F) const
Test whether calls to a function lower to actual program function calls.
bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const
bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) const
Query the target whether it would be profitable to convert the given loop into a hardware loop.
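A sketch (wantsHardwareLoop is hypothetical) of a plausible two-step check, assuming the HardwareLoopInfo(Loop *) constructor: first ask the target whether the conversion pays off, then validate the candidate loop structure:
```cpp
// Sketch: combine the profitability hook with the structural candidate check.
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Dominators.h"

using namespace llvm;

static bool wantsHardwareLoop(Loop *L, ScalarEvolution &SE, LoopInfo &LI,
                              DominatorTree &DT, AssumptionCache &AC,
                              TargetLibraryInfo *TLI,
                              const TargetTransformInfo &TTI) {
  HardwareLoopInfo HWLoopInfo(L);
  if (!TTI.isHardwareLoopProfitable(L, SE, AC, TLI, HWLoopInfo))
    return false;
  return HWLoopInfo.isHardwareLoopCandidate(SE, LI, DT);
}
```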
unsigned getInliningThresholdMultiplier() const
InstructionCost getBranchMispredictPenalty() const
Returns estimated penalty of a branch misprediction in latency.
unsigned getNumberOfRegisters(unsigned ClassID) const
bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1, const SmallBitVector &OpcodeMask) const
Return true if this is an alternating opcode pattern that can be lowered to a single instruction on t...
bool isProfitableToHoist(Instruction *I) const
Return true if it is profitable to hoist an instruction in the then/else blocks to before the if.
bool supportsScalableVectors() const
bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) const
Return true if the given instruction (assumed to be a memory access instruction) has a volatile varia...
bool isLegalMaskedCompressStore(Type *DataType, Align Alignment) const
Return true if the target supports masked compress store.
std::optional< unsigned > getMinPageSize() const
bool isFPVectorizationPotentiallyUnsafe() const
Indicate that it is potentially unsafe to automatically vectorize floating-point operations because t...
bool isLegalMaskedStore(Type *DataType, Align Alignment) const
Return true if the target supports masked store.
bool shouldBuildRelLookupTables() const
Return true if lookup tables should be turned into relative lookup tables.
PartialReductionExtendKind
unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy, Type *ScalarValTy) const
std::optional< unsigned > getCacheSize(CacheLevel Level) const
std::optional< Value * > simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II, APInt DemandedMask, KnownBits &Known, bool &KnownBitsComputed) const
Can be used to implement target-specific instruction combining.
bool isLegalAddScalableImmediate(int64_t Imm) const
Return true if adding the specified scalable immediate is legal, that is the target has add instructi...
bool isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID, unsigned ScalarOpdIdx) const
Identifies if the vector form of the intrinsic has a scalar operand.
bool hasDivRemOp(Type *DataType, bool IsSigned) const
Return true if the target has a unified operation to calculate division and remainder.
InstructionCost getAltInstrCost(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1, const SmallBitVector &OpcodeMask, TTI::TargetCostKind CostKind=TTI::TCK_RecipThroughput) const
Returns the cost estimation for alternating opcode pattern that can be lowered to a single instructio...
TargetCostConstants
Underlying constants for 'cost' values in this interface.
@ TCC_Expensive
The cost of a 'div' instruction on x86.
@ TCC_Free
Expected to fold away in lowering.
@ TCC_Basic
The cost of a typical 'add' instruction.
InstructionCost getScalarizationOverhead(VectorType *Ty, const APInt &DemandedElts, bool Insert, bool Extract, TTI::TargetCostKind CostKind, ArrayRef< Value * > VL={}) const
Estimate the overhead of scalarizing an instruction.
bool enableInterleavedAccessVectorization() const
Enable matching of interleaved access groups.
unsigned getMinTripCountTailFoldingThreshold() const
InstructionCost getInstructionCost(const User *U, ArrayRef< const Value * > Operands, TargetCostKind CostKind) const
Estimate the cost of a given IR user when lowered.
unsigned getMaxInterleaveFactor(ElementCount VF) const
bool enableScalableVectorization() const
bool isVectorShiftByScalarCheap(Type *Ty) const
Return true if it's significantly cheaper to shift a vector by a uniform scalar than by an amount whi...
bool isNumRegsMajorCostOfLSR() const
Return true if LSR major cost is number of registers.
unsigned getInliningCostBenefitAnalysisSavingsMultiplier() const
bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const
InstructionCost getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy, unsigned Index) const
unsigned getGISelRematGlobalCost() const
unsigned getNumBytesToPadGlobalArray(unsigned Size, Type *ArrayType) const
MemIndexedMode
The type of load/store indexing.
@ MIM_Unindexed
No indexing.
@ MIM_PostInc
Post-incrementing.
@ MIM_PostDec
Post-decrementing.
@ MIM_PreDec
Pre-decrementing.
@ MIM_PreInc
Pre-incrementing.
bool areInlineCompatible(const Function *Caller, const Function *Callee) const
bool useColdCCForColdCall(Function &F) const
Return true if the input function, which is cold at all call sites, should use the coldcc calling convention.
InstructionCost getFPOpCost(Type *Ty) const
Return the expected cost of supporting the floating point operation of the specified type.
bool supportsTailCalls() const
If the target supports tail calls.
bool canMacroFuseCmp() const
Return true if the target can fuse a compare and branch.
Value * getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType) const
bool isValidAddrSpaceCast(unsigned FromAS, unsigned ToAS) const
Query the target whether the specified address space cast from FromAS to ToAS is valid.
unsigned getNumberOfParts(Type *Tp) const
bool hasConditionalLoadStoreForType(Type *Ty=nullptr) const
InstructionCost getPartialReductionCost(unsigned Opcode, Type *InputTypeA, Type *InputTypeB, Type *AccumType, ElementCount VF, PartialReductionExtendKind OpAExtend, PartialReductionExtendKind OpBExtend, std::optional< unsigned > BinOp=std::nullopt) const
InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, StackOffset BaseOffset, bool HasBaseReg, int64_t Scale, unsigned AddrSpace=0) const
Return the cost of the scaling factor used in the addressing mode represented by AM for this target,...
bool isTruncateFree(Type *Ty1, Type *Ty2) const
Return true if it's free to truncate a value of type Ty1 to type Ty2.
bool isProfitableToSinkOperands(Instruction *I, SmallVectorImpl< Use * > &Ops) const
Return true if sinking I's operands to the same basic block as I is profitable, e....
void getMemcpyLoopResidualLoweringType(SmallVectorImpl< Type * > &OpsOut, LLVMContext &Context, unsigned RemainingBytes, unsigned SrcAddrSpace, unsigned DestAddrSpace, Align SrcAlign, Align DestAlign, std::optional< uint32_t > AtomicCpySize=std::nullopt) const
bool preferPredicateOverEpilogue(TailFoldingInfo *TFI) const
Query the target whether it would be preferred to create a predicated vector loop, which can avoid the...
bool forceScalarizeMaskedScatter(VectorType *Type, Align Alignment) const
Return true if the target forces scalarizing of llvm.masked.scatter intrinsics.
bool isTargetIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID, int OpdIdx) const
Identifies if the vector form of the intrinsic is overloaded on the type of the operand at index OpdI...
bool haveFastSqrt(Type *Ty) const
Return true if the hardware has a fast square-root instruction.
bool shouldExpandReduction(const IntrinsicInst *II) const
TargetTransformInfo(T Impl)
Construct a TTI object using a type implementing the Concept API below.
uint64_t getMaxMemIntrinsicInlineSizeThreshold() const
Returns the maximum memset / memcpy size in bytes that still makes it profitable to inline the call.
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index=-1, Value *Op0=nullptr, Value *Op1=nullptr) const
ShuffleKind
The various kinds of shuffle patterns for vector queries.
@ SK_InsertSubvector
InsertSubvector. Index indicates start offset.
@ SK_Select
Selects elements from the corresponding lane of either source operand.
@ SK_PermuteSingleSrc
Shuffle elements of single source vector with any shuffle mask.
@ SK_Transpose
Transpose two vectors.
@ SK_Splice
Concatenates elements from the first input vector with elements of the second input vector.
@ SK_Broadcast
Broadcast element 0 to all other elements.
@ SK_PermuteTwoSrc
Merge elements from two source vectors into one with any shuffle mask.
@ SK_Reverse
Reverse the order of the vector.
@ SK_ExtractSubvector
ExtractSubvector. Index indicates start offset.
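A sketch (reverseCheaperThanBroadcast is illustrative) comparing two shuffle kinds for the same vector type, using only the required arguments of getShuffleCost:
```cpp
// Sketch: compare the reciprocal-throughput cost of two shuffle kinds.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DerivedTypes.h"

using namespace llvm;

static bool reverseCheaperThanBroadcast(const TargetTransformInfo &TTI,
                                        VectorType *VecTy) {
  InstructionCost Rev =
      TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VecTy);
  InstructionCost Bcast =
      TTI.getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
  return Rev < Bcast;
}
```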
void getPeelingPreferences(Loop *L, ScalarEvolution &SE, PeelingPreferences &PP) const
Get target-customized preferences for the generic loop peeling transformation.
InstructionCost getCallInstrCost(Function *F, Type *RetTy, ArrayRef< Type * > Tys, TTI::TargetCostKind CostKind=TTI::TCK_SizeAndLatency) const
InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind=TTI::TCK_SizeAndLatency, const Instruction *I=nullptr) const
CastContextHint
Represents a hint about the context in which a cast is used.
@ Reversed
The cast is used with a reversed load/store.
@ Masked
The cast is used with a masked load/store.
@ None
The cast is not used with a load/store of any kind.
@ Normal
The cast is used with a normal load/store.
@ Interleave
The cast is used with an interleaved load/store.
@ GatherScatter
The cast is used with a gather/scatter.
OperandValueKind
Additional information about an operand's possible values.
@ OK_UniformConstantValue
@ OK_NonUniformConstantValue
CacheLevel
The possible cache levels.
bool preferFixedOverScalableIfEqualCost() const
bool isLegalMaskedLoad(Type *DataType, Align Alignment) const
Return true if the target supports masked load.
The instances of the Type class are immutable: once they are created, they are never changed.
This is the common base class for vector predication intrinsics.
LLVM Value Representation.
Base class of all SIMD vector types.
bool areInlineCompatible(const Function &Caller, const Function &Callee)
AtomicOrdering
Atomic ordering for LLVM's memory model.
ImmutablePass * createTargetTransformInfoWrapperPass(TargetIRAnalysis TIRA)
Create an analysis pass wrapper around a TTI object.
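In the legacy pass manager this is typically driven from a TargetMachine; a minimal sketch (addTTIWrapper is illustrative):
```cpp
// Sketch: register the TTI wrapper pass so codegen-aware IR passes can query it.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

static void addTTIWrapper(legacy::PassManager &PM, TargetMachine &TM) {
  PM.add(createTargetTransformInfoWrapperPass(TM.getTargetIRAnalysis()));
}
```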
constexpr unsigned BitWidth
@ DataAndControlFlowWithoutRuntimeCheck
Use predicate to control both data and control flow, but modify the trip count so that a runtime over...
@ DataWithEVL
Use predicated EVL instructions for tail-folding.
@ DataAndControlFlow
Use predicate to control both data and control flow.
@ DataWithoutLaneMask
Same as Data, but avoids using the get.active.lane.mask intrinsic to calculate the mask and instead i...
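A sketch (useEVLTailFolding is illustrative) of a vectorizer-style query that switches to EVL-based tail folding only when the target prefers it:
```cpp
// Sketch: choose EVL-based tail folding when the target asks for it.
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

static bool useEVLTailFolding(const TargetTransformInfo &TTI,
                              bool IVUpdateMayOverflow) {
  return TTI.getPreferredTailFoldingStyle(IVUpdateMayOverflow) ==
         TailFoldingStyle::DataWithEVL;
}
```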
This struct is a compact representation of a valid (non-zero power of two) alignment.
A CRTP mix-in that provides informational APIs needed for analysis passes.
A special type used by analysis passes to provide an address that identifies that particular analysis...
Attributes of a target dependent hardware loop.
bool canAnalyze(LoopInfo &LI)
HardwareLoopInfo()=delete
bool isHardwareLoopCandidate(ScalarEvolution &SE, LoopInfo &LI, DominatorTree &DT, bool ForceNestedLoop=false, bool ForceHardwareLoopPHI=false)
Information about a load/store intrinsic defined by the target.
unsigned short MatchingId
Value * PtrVal
This is the pointer that the intrinsic is loading from or storing to.
InterleavedAccessInfo * IAI
TailFoldingInfo(TargetLibraryInfo *TLI, LoopVectorizationLegality *LVL, InterleavedAccessInfo *IAI)
LoopVectorizationLegality * LVL
unsigned Insns
TODO: Some of these could be merged.
Returns options for expansion of memcmp; IsZeroCmp is true if the expansion is of a comparison against zero (memcmp(p1, p2, s) == 0).
SmallVector< unsigned, 8 > LoadSizes
unsigned NumLoadsPerBlock
bool AllowOverlappingLoads
SmallVector< unsigned, 4 > AllowedTailExpansions
bool isNegatedPowerOf2() const
OperandValueInfo getNoProps() const
OperandValueProperties Properties
bool AllowPeeling
Allow peeling off loop iterations.
bool AllowLoopNestsPeeling
Allow peeling off loop iterations for loop nests.
bool PeelProfiledIterations
Allow peeling based on profile.
unsigned PeelCount
A forced peeling factor (the number of bodies of the original loop that should be peeled off before t...
Describe known properties for a set of pointers.
unsigned IsKnownStride
True if the distance between any two neighbouring pointers is a known value.
static PointersChainInfo getKnownStride()
bool isUnitStride() const
unsigned IsUnitStride
These properties are only valid if SameBaseAddress is set.
bool isKnownStride() const
unsigned IsSameBaseAddress
All the GEPs in a set have the same base address.
static PointersChainInfo getUnitStride()
static PointersChainInfo getUnknownStride()
Flags describing the kind of vector reduction.
bool IsSigned
Whether the operation is a signed int reduction.
bool IsMaxOp
If the op is a min/max kind, true if it's a max operation.
bool NoNaN
If op is an fp min/max, whether NaNs may be present.
Parameters that control the generic loop unrolling transformation.
unsigned Count
A forced unrolling factor (the number of concatenated bodies of the original loop in the unrolled loo...
bool UpperBound
Allow using trip count upper bound to unroll loops.
unsigned Threshold
The cost threshold for the unrolled loop.
bool Force
Apply loop unroll on any kind of loop (mainly to loops that fail runtime unrolling).
unsigned PartialOptSizeThreshold
The cost threshold for the unrolled loop when optimizing for size, like OptSizeThreshold,...
bool UnrollVectorizedLoop
Don't disable runtime unroll for the loops which were vectorized.
unsigned DefaultUnrollRuntimeCount
Default unroll count for loops with run-time trip count.
unsigned MaxPercentThresholdBoost
If complete unrolling will reduce the cost of the loop, we will boost the Threshold by a certain perc...
bool RuntimeUnrollMultiExit
Allow runtime unrolling multi-exit loops.
unsigned SCEVExpansionBudget
Don't allow runtime unrolling if expanding the trip count takes more than SCEVExpansionBudget.
unsigned UnrollAndJamInnerLoopThreshold
Threshold for unroll and jam, for inner loop size.
unsigned MaxIterationsCountToAnalyze
Don't allow loop unrolling to simulate more than this number of iterations when checking full unroll ...
bool AllowRemainder
Allow generation of a loop remainder (extra iterations after unroll).
bool UnrollAndJam
Allow unroll and jam. Used to enable unroll and jam for the target.
bool UnrollRemainder
Allow unrolling of all the iterations of the runtime loop remainder.
unsigned FullUnrollMaxCount
Set the maximum unrolling factor for full unrolling.
unsigned PartialThreshold
The cost threshold for the unrolled loop, like Threshold, but used for partial/runtime unrolling (set...
bool Runtime
Allow runtime unrolling (unrolling of loops to expand the size of the loop body even when the number ...
bool Partial
Allow partial unrolling (unrolling of loops to expand the size of the loop body, not only to eliminat...
unsigned OptSizeThreshold
The cost threshold for the unrolled loop when optimizing for size (set to UINT_MAX to disable).
bool AllowExpensiveTripCount
Allow emitting expensive instructions (such as divisions) when computing the trip count of a loop for...
unsigned MaxUpperBound
Set the maximum upper bound of trip count.
bool shouldDoNothing() const
VPTransform EVLParamStrategy
VPLegalization(VPTransform EVLParamStrategy, VPTransform OpStrategy)