Given integer values x and y, C and C++ both return as the quotient q = x/y the floor of the floating point equivalent. I'm interested in a method of returning the ceiling instead. For example, ceil(10/5)=2 and ceil(11/5)=3.
The obvious approach involves something like:
q = x / y;
if (q * y < x) ++q;
This requires an extra comparison and multiplication; and other methods I've seen (used in fact) involve casting as a float or double. Is there a more direct method that avoids the additional multiplication (or a second division) and branch, and that also avoids casting as a floating point number?
- (107) the divide instruction often returns both quotient and remainder at the same time so there's no need to multiply, just q = x/y + (x % y != 0); is enough. – phuclv, Jan 25, 2014 at 11:17
- (1) @LưuVĩnhPhúc Seriously you need to add that as the answer. I just used that for my answer during a codility test. It worked like a charm though I am not certain how the mod part of the answer works but it did the job. – Zachary Kraus, Aug 26, 2014 at 0:56
- (3) @AndreasGrapentin the answer below by Miguel Figueiredo was submitted nearly a year before Lưu Vĩnh Phúc left the comment above. While I understand how appealing and elegant Miguel's solution is, I'm not inclined to change the accepted answer at this late date. Both approaches remain sound. If you feel strongly enough about it, I suggest you show your support by up-voting Miguel's answer below. – andand, Aug 26, 2014 at 2:51
- (3) Strange, I have not seen any sane measurement or analysis of the proposed solutions. You talk about speed on near-the-bone, but there is no discussion of architectures, pipelines, branching instructions and clock cycles. – Rado, Dec 18, 2016 at 19:35
- (1) See also: stackoverflow.com/questions/63436490/… – andand, May 8, 2021 at 3:23
12 Answers
For positive numbers, where you want to find the ceiling (q) of x when divided by y:
unsigned int x, y, q;
To round up ...
q = (x + y - 1) / y;
or (avoiding overflow in x+y)
q = 1 + ((x - 1) / y); // if x != 0
10 Comments
- … x/y is the ceiling of the division. C90 didn't specify how to round, and I don't think the current C++ standard does either.
- … x == 0 ? 0 : 1 + ((x - 1) / y) resolve this safely and efficiently?
For positive numbers:
q = x/y + (x % y != 0);
6 Comments
- … idiv remainder trick, ending up calling it twice even with /O2. godbolt proof: godbolt.org/z/YM16j3xes . gcc generates a nice minimal assembly with one idiv, a testne and an add. clang too, one idiv only, a cmp and a weird sbb eax,-1
Sparky's answer is one standard way to solve this problem, but as I also wrote in my comment, you run the risk of overflows. This can be solved by using a wider type, but what if you want to divide long longs?
Nathan Ernst's answer provides one solution, but it involves a function call, a variable declaration and a conditional, which makes it no shorter than the OP's code and probably even slower, because it is harder to optimize.
My solution is this:
q = (x % y) ? x / y + 1 : x / y;
It will be slightly faster than the OP's code, because the compiler can see that the modulo and the division are equivalent operations and performs them with a single instruction on the processor. At least gcc 4.4.1 performs this optimization with the -O2 flag on x86.
In theory the compiler might inline the function call in Nathan Ernst's code and emit the same thing, but gcc didn't do that when I tested it. This might be because it would tie the compiled code to a single version of the standard library.
As a final note, none of this matters on a modern machine, except if you are in an extremely tight loop and all your data is in registers or the L1-cache. Otherwise all of these solutions will be equally fast, except for possibly Nathan Ernst's, which might be significantly slower if the function has to be fetched from main memory.
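To make the overflow concern concrete, here is a minimal sketch comparing the addition-based form with the remainder-based one near the top of the unsigned range. The helper names ceil_div_add and ceil_div_rem and the constant kMax are made up for this illustration and do not come from any of the answers.

```cpp
#include <limits>

// Hypothetical helper names, used only for this illustration.

// Addition-based form: the intermediate x + y - 1 can wrap around.
constexpr unsigned ceil_div_add(unsigned x, unsigned y) {
    return (x + y - 1) / y;
}

// Remainder-based form: no intermediate sum, so nothing wraps.
constexpr unsigned ceil_div_rem(unsigned x, unsigned y) {
    return (x % y) ? x / y + 1 : x / y;
}

constexpr unsigned kMax = std::numeric_limits<unsigned>::max();

// The two forms agree on small inputs.
static_assert(ceil_div_add(11u, 5u) == 3u, "addition form, small input");
static_assert(ceil_div_rem(11u, 5u) == 3u, "remainder form, small input");

// Near the maximum, x + y - 1 wraps to 0 and the addition form returns 0.
static_assert(ceil_div_add(kMax, 2u) == 0u, "x + y - 1 wrapped around");
static_assert(ceil_div_rem(kMax, 2u) == kMax / 2 + 1, "remainder form stays correct");
```

The checks run at compile time, so compiling the snippet is enough to confirm them; whether the remainder form is also faster still depends on the compiler and target.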
9 Comments
- … q = (x > 0)? 1 + (x - 1)/y: (x / y);
- … q = x / y + (x % y > 0); is easier than ? : expression?
You could use the div function in cstdlib to get the quotient & remainder in a single call and then handle the ceiling separately, as in the code below:
#include <cstdlib>
#include <iostream>

int div_ceil(int numerator, int denominator)
{
    std::div_t res = std::div(numerator, denominator);
    return res.rem ? (res.quot + 1) : res.quot;
}

int main(int, const char**)
{
    std::cout << "10 / 5 = " << div_ceil(10, 5) << std::endl;
    std::cout << "11 / 5 = " << div_ceil(11, 5) << std::endl;
    return 0;
}
3 Comments
- … return res.quot + !!res.rem; :)
- … std::div is overloaded for int, long, long long and intmax_t (the latter two since C++11); whether it internally promotes would be an implementation detail (and I can't see a strong reason for why they wouldn't implement it independently for each).
- … ldiv promotes, but std::div shouldn't need to.
There's a solution for both positive and negative x but only for positive y with just 1 division and without branches:
int div_ceil(int x, int y) {
    return x / y + (x % y > 0);
}
Note, if x is positive then division is towards zero, and we should add 1 if the remainder is not zero.
If x is negative then division is towards zero, that's what we need, and we will not add anything because x % y is not positive.
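A few worked cases may help here. The sketch below repeats the one-liner as a constexpr function so the cases can be checked at compile time; the constexpr qualifier and the chosen inputs are additions for illustration, and the last check shows why the positive-y restriction matters.

```cpp
// Same expression as the one-liner in this answer; constexpr is added
// only so the cases below can be verified by the compiler.
constexpr int div_ceil(int x, int y) {  // assumes y > 0
    return x / y + (x % y > 0);
}

// Positive x: truncation rounds down, so a nonzero remainder adds 1.
static_assert(div_ceil(10, 5) == 2, "exact division");
static_assert(div_ceil(11, 5) == 3, "11/5 = 2 rem 1, so 2 + 1");

// Negative x: truncation already rounds up, and the remainder is not positive.
static_assert(div_ceil(-11, 5) == -2, "-11/5 = -2 rem -1, nothing added");
static_assert(div_ceil(-10, 5) == -2, "exact division");

// Negative y breaks the assumption: 7 / -2 = -3 rem 1, so 1 is added
// and the result is -2, although the true ceiling of -3.5 is -3.
static_assert(div_ceil(7, -2) == -2, "wrong result for negative y");
```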
5 Comments
- … x and y were negative, both { x, y } < -1, but not divisible, your approach would end up being either truncated or floored division, because (x % y) < 0 even though the quotient is >= 0.
How about this? (requires y non-negative, so don't use this in the rare case where y is a variable with no non-negativity guarantee)
q = (x > 0)? 1 + (x - 1)/y: (x / y);
I reduced y/y to one, eliminating the term x + y - 1 and with it any chance of overflow.
I avoid x - 1 wrapping around when x is an unsigned type and contains zero.
For signed x, negative and zero still combine into a single case.
Probably not a huge benefit on a modern general-purpose CPU, but this would be far faster in an embedded system than any of the other correct answers.
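As a rough check of the three points above, the expression can be wrapped in a template and exercised with unsigned and signed arguments. The name ceil_div_pos and the template form are assumptions made for this sketch; y is taken to be strictly positive.

```cpp
#include <limits>

// Hypothetical wrapper around the expression from this answer; assumes y > 0.
template <typename T>
constexpr T ceil_div_pos(T x, T y) {
    return x > 0 ? 1 + (x - 1) / y : x / y;
}

// Unsigned zero takes the x / y branch, so x - 1 never wraps.
static_assert(ceil_div_pos(0u, 5u) == 0u, "unsigned zero");

// No x + y - 1 term, so the largest unsigned value is handled correctly.
static_assert(ceil_div_pos(std::numeric_limits<unsigned>::max(), 1u)
                  == std::numeric_limits<unsigned>::max(),
              "no overflow at the maximum");

// Signed x: zero and negative values share the truncating branch.
static_assert(ceil_div_pos(7, 2) == 4, "1 + 6/2");
static_assert(ceil_div_pos(6, 2) == 3, "1 + 5/2");
static_assert(ceil_div_pos(-7, 2) == -3, "truncation is already the ceiling");
```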
2 Comments
I would have rather commented but I don't have a high enough rep.
As far as I am aware, for positive arguments and a divisor which is a power of 2, this is the fastest way (tested in CUDA):
// example y=8
q = (x >> 3) + !!(x & 7);
For generic positive arguments only, I tend to do it like so:
q = x/y + !!(x % y);
2 Comments
- … q = x/y + !!(x % y); stacks up against q = x/y + (x % y == 0); and the q = (x + y - 1) / y; solutions performance-wise in contemporary CUDA.
- … q = x/y + (x % y == 0); should be q = x/y + (x % y != 0); instead
This works for positive or negative numbers:
q = x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
If there is a remainder, checks to see if x and y are of the same sign and adds 1 accordingly.
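To see the sign check at work, the expression can be evaluated for each combination of signs. This is only a sketch; the function name ceil_div_signed is invented here, and the expression itself is copied from this answer.

```cpp
// Hypothetical wrapper around the expression from this answer.
constexpr int ceil_div_signed(int x, int y) {
    return x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
}

// Same signs: the true quotient is positive, truncation rounds down, add 1.
static_assert(ceil_div_signed(7, 2) == 4, "3.5 rounds up to 4");
static_assert(ceil_div_signed(-7, -2) == 4, "3.5 rounds up to 4");

// Opposite signs: the true quotient is negative, truncation is already the ceiling.
static_assert(ceil_div_signed(-7, 2) == -3, "-3.5 rounds up to -3");
static_assert(ceil_div_signed(7, -2) == -3, "-3.5 rounds up to -3");

// No remainder: nothing is added regardless of sign.
static_assert(ceil_div_signed(6, -3) == -2, "exact division");
static_assert(ceil_div_signed(0, 3) == 0, "zero dividend");
```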
2 Comments
- !((x > 0) ^ (y > 0)) - what a convoluted way of saying ( x <= 0 )^( 0 < y ) - you're essentially trying to say "sign matches", or XNOR - so just invert one side of the xor equation, then you can skip the logical negate altogether
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation). Cannot cause overflow.
Corresponding floored and modulo constexpr implementations here, along with templates to select the necessary overloads (as full optimization and to prevent mismatched sign comparison warnings):
https://github.com/libbitcoin/libbitcoin-system/wiki/Integer-Division-Unraveled
Comments
Simplified generic form:
int div_up(int n, int d) {
    return n / d + (((n < 0) ^ (d > 0)) && (n % d));
} // i.e. +1 iff (not exact int && positive result)
For a more generic answer, see "C++ functions for integer division with well defined rounding strategy".
Comments
With the usual caveats about profiling if this really matters (it won't unless you're doing this A LOT):
As @phuclv says, on modern processors quotient and remainder will be calculated in one instruction. All these assume unsigned numbers without worrying about overflow. With x86-64 GCC -O3
unsigned int f(unsigned int x, unsigned int y)
{
    return x / y + (x % y != 0);
}
produces
mov eax, edi
xor edx, edx     # zero edx
div esi          # divides edx:eax (x) by esi (y)
                 # eax = quotient, edx = remainder
cmp edx, 1       # set CF = (edx - 1 < 0), i.e. edx == 0
sbb eax, -1      # eax -= CF - 1, i.e. eax += 1 - CF, no branch
ret
https://godbolt.org/z/4Gsn3Kj5s
unsigned int f(unsigned int x, unsigned int y)
{
    return (x + y - 1) / y;
}
is clever and uses lea to do the addition and subtraction
lea eax, [rsi-1+rdi]
xor edx, edx
div esi
ret
https://godbolt.org/z/9sfsc1Wa5
For 64-bit inputs, the results are similar but with 64-bit registers instead.
I would guess LEA is faster than CMP/SBB as LEA is a fast instruction, but I didn't benchmark anything.
In a deleted answer, @Matt suggests the remainder increment version is faster, but his g++ compile command didn't include an optimization flag, which is suspect.
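For readers who don't read x86 assembly, here is a rough source-level model of what the cmp/sbb pair in the first listing computes. It is not the compiler's output and the function name is made up; it just spells out "eax += 1 - CF", where CF is set exactly when the remainder is zero.

```cpp
// Hypothetical model of the branchless cmp/sbb sequence, for illustration only.
constexpr unsigned ceil_div_sbb_model(unsigned x, unsigned y) {
    // "cmp edx, 1" borrows exactly when the remainder is 0, so carry is 1 then.
    // "sbb eax, -1" computes quotient - (-1) - carry, i.e. quotient + 1 - carry.
    return x / y + 1u - static_cast<unsigned>(x % y < 1u);
}

static_assert(ceil_div_sbb_model(10u, 5u) == 2u, "remainder 0: carry cancels the +1");
static_assert(ceil_div_sbb_model(11u, 5u) == 3u, "remainder 1: the +1 survives");
static_assert(ceil_div_sbb_model(0u, 5u) == 0u, "zero dividend");
```

This says nothing about speed; it only shows that the sequence is arithmetically the same as x / y + (x % y != 0).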
Comments
Compile with -O3; the compiler optimizes this well.
q = x / y;
if (x % y) ++q;
Comments












