This is my implementation of "merge sort" in C++. I'm still a newbie at C++ so:
- Have I understood how merge sort works and implemented it in the right way?
- Have I used any bad practices?
- How could the code be improved upon in terms of efficiency?
Criticism is welcome and appreciated!
#include <iostream>#include <vector>#include <string>#include <sstream>using namespace std;vector<float> merge(vector<float> firstHalf, vector<float> secondHalf){ vector<float> combined; for(int i = firstHalf.size() + secondHalf.size(); i > 0; i--){//merge two vectors if(firstHalf.back() > secondHalf.back() && !firstHalf.empty()){ combined.push_back(firstHalf.back()); firstHalf.pop_back(); }else if(!secondHalf.empty()){ combined.push_back(secondHalf.back()); secondHalf.pop_back(); } } vector<float> revCombined;//reverse merged vectors. Vectors don't have pop_front and I didn't want to use lists. for(int i = 0; i < combined.size(); i++){ revCombined.push_back(combined[combined.size()-i-1]); } return revCombined;}vector<float> mergeSort(vector<float> &inputArray){//for example [9, 8, 1] as input if(inputArray.size() > 1){ vector<float> firstHalf; vector<float> secondHalf; for(int i = 0; i < inputArray.size()/2; i++){//auto round the input array because size() returns int firstHalf.push_back(inputArray[i]); }//first half = [9, 8] for(int i = inputArray.size()/2; i < inputArray.size(); i++){ secondHalf.push_back(inputArray[i]); }//second half = [1] return merge(mergeSort(firstHalf), mergeSort(secondHalf)); } else{ return inputArray; }}vector<string> split(string str, char delimiter) { vector<string> internal; stringstream ss(str); // Turn the string into a stream. string tok; while(getline(ss, tok, delimiter)) { internal.push_back(tok); } return internal;}vector<float> floatVectorInput(){ string inputString; getline(cin, inputString); vector<string> stringArray = split(inputString, ' '); vector<float> array; for(int i = 0; i < stringArray.size(); i++){ array.push_back(stof(stringArray[i])); } return array; }int main(){ cout << "Array to sort (separate by spaces): " << endl; vector<float> inputArray = floatVectorInput(); vector<float> sorted = mergeSort(inputArray); cout << endl << "Sorted Array:" << endl; for(int i = 0; i < sorted.size(); i++){ cout << sorted[i]; if(i == sorted.size()-1){ cout << endl << endl; }else{ cout << ", "; } } return 0;}- \$\begingroup\$Have you tested your code? To me it seems that the check
if(firstHalf.back() > secondHalf.back() && !firstHalf.empty())will cause a segfault. Also, you keep creating and moving a lot of vectors, perhaps it is better to just keep one scratchpad vector and do the merging on it. Finally, you usually sort a vector by modifying it rather than creating another vector with elements sorted.\$\endgroup\$Raziman T V– Raziman T V2017-01-21 17:26:11 +00:00CommentedJan 21, 2017 at 17:26 - \$\begingroup\$Doesn't cause a Segmentation Fault for me\$\endgroup\$user2635139– user26351392017-01-21 17:34:53 +00:00CommentedJan 21, 2017 at 17:34
- \$\begingroup\$@RazimanT.V. The code seems to work correctly. You can play with it (and various inputs) here:ideone.com/rWbAsl\$\endgroup\$πάντα ῥεῖ– πάντα ῥεῖ2017-01-21 17:37:07 +00:00CommentedJan 21, 2017 at 17:37
- 2\$\begingroup\$The code performs
back()on provably empty vectors. That is undefined behaviour and just waiting for the right input to crash.\$\endgroup\$Raziman T V– Raziman T V2017-01-21 17:40:14 +00:00CommentedJan 21, 2017 at 17:40 - 2\$\begingroup\$What version of C++ are you using? If it isc++11 orc++14, please add the relevant tags to your question.\$\endgroup\$5gon12eder– 5gon12eder2017-01-21 18:15:02 +00:00CommentedJan 21, 2017 at 18:15
2 Answers2
merge callsback() on empty vector
The logic in yourmerge function has a bug.
if (firstHalf.back() > secondHalf.back() && !firstHalf.empty()) { … }The first problem is that you accessfirstHalf.back()before the check that!firstHalf.empty(). The checks should be in reversed order.
The second problem is that you're also accessingsecondHalf.back() without checking whether!secondHalf.empty().
While the code could be fixed with less changes, I suggest that you look at the problem again: Merging only really makes sense ifneither of the halves is empty. So let's break the problem into two parts:
- Merge two non-empty containers.
- Append the remaining elements to the already merged container.
std::vector<float> combined{};// Merge two non-empty containers.while (!firstHalf.empty() && !secondHalf.empty()) { if (firstHalf.back() > secondHalf.back()) { combined.push_back(firstHalf.back()); firstHalf.pop_back(); } else { combined.push_back(secondHalf.back()); secondHalf.pop_back(); }}// Append the remaining elements to the already merged container.while (!firstHalf.empty()) { combined.push_back(firstHalf.back()); firstHalf.pop_back();}while (!secondHalf.empty()) { combined.push_back(secondHalf.back()); secondHalf.pop_back();}This version is not only correct but also simpler to understand.
Input does not handle excess spaces correctly
If you feed your program with input that puts more than one space between two numbers, your program will crash.
While it is a legitimate choice to require exactly one space between subsequent numbers, it turns out that you can make the behavior more user-friendly andsimplify the code at the same time.
std::vector<float> floatVectorInput(){ std::vector<float> numbers{}; std::string line{}; std::getline(std::cin, line); std::istringstream iss{line}; float x; while (iss >> x) { numbers.push_back(x); } if (!iss.eof()) { throw std::runtime_error{"Garbage input"}; } return numbers; }Instead of first splitting into a vector of strings, I'm reading thefloats from thestd::istringstream directly. This will skip over any amount of white-space as desired.
The!iss.eof() check is there because I want to make sure that we stop because the input is exhausted, not because there was something that is not a floating-point number.
Avoidusing namespace std
I know that many C++ tutorials want you to put
using namespace std;into every file. I don't think that this is good advice and recommend against doing it. Importingnamespaces is a complex operation and you probably don't understand all its implications. For example, what will the following code print with and without theusing namespace std;?
#include <iostream>#include <utility>//using namespace std;void swap(double x, double y){ std::clog << "swap(" << x << ", " << y << ")\n";}int main(){ int a = 1; int b = 2; swap(a, b); swap(3, 4);}With the offending line commented out, the code will print
swap(1, 2)andswap(3, 4)as you'd probably expect. However, withusing namespace std, it will only print the second line.
What happened?
By
using namespace std, we've madestd::swap(defined in<utility>) visible. Since our ownswaptakesdoubles as parameters, calling it with an argument of typeintis not a perfect match. What the C++ compiler does is adding an implicit conversion frominttodouble. However, if there is also a function that doesn't require this conversion to happen, the compiler will prefer it. It just so happens thatstd::swapis atemplatethat will be a better match in this case. So why is only the first call resolved tostd::swapthen? This is becausestd::swaptakes its arguments as mutable references. A temporary object (like an integer literal in this case) doesn't bind to a mutable reference.
I understand that this is complicated stuff for a beginner and you probablyshouldn't have to worry about it at this point. But if you'reusing namespace std, you'll have to know it or you won't understand your code.
That said,using namespace std is also frowned upon in production-quality code (written by people who ought to understand the aforementioned language rules) so you're teaching yourself a bad habit by using it. Just be explicit and prefix symbols from the standard library withstd::. It will tell the reader at a glance where the symbol comes from which makes understanding the code easier for readers of any level of experience.
Beconst correct
YourmergeSort function doesn't modify its argument (sorts the vector in-place) but actually returns a new, sorted vector. Therefore, make its argumentconst.
std::vector<float> mergeSort(const std::vector<float>& inputArray);// ^^^^^You should do this with any variable that you don't intend to modify.
Use the correct integer types
This comment is actually wrong.
auto round the input array because size() returns int
std::vector::size is of typestd::vector::size_type which is probably an alias forstd::size_t. In any case, it is anunsigned integer type. Compiling your code with warnings enabled will warn you about this mixing ofsigned andunsigned types.
Unfortunately,std::size_t is anunsigned integer for historic reasons. This meas that you cannot let inidces go below zero which makes the loop conditions a bit more complicated in some places.
Makedouble your go-to floating-point type
Unless you have a good reason to usefloat, preferdouble for its greater range and precision. On modern hardware, you'll hardly notice a performance difference.
That's not to say there are no valid use-cases forfloat. Butunless you have such a case, your default-choice should be fordouble.
Prefer'\n' overstd::endl by default
std::endl does more than outputting a new line. It alsoflushes the output stream. Which is an expensive operation.
Sometimes, this is just what you want. For example, in this line,
cout << "Array to sort (separate by spaces): " << endl;you've used it correctly. The text must be visible before the instruction on the subsequent line gets executed or the program might start blocking for user input before the unlucky user was asked to provide some.
However, in all other places in your program, there is no need to flush. And output flushing isslow. It probably doesn't make any measurable difference for your small program but I've seen code where replacingstd::endl with'\n' literally made the code ten times faster! And as a bonus: it's also less typing.
Consider usingauto type deduction (C++11)
C++11 introduced a great feature:auto type deduction. This means that if you're declaring a variable and immediately initialize it with some value, you don't have to spell out its type. The compiler willdeduce it for you.
So, for example, you could replace
vector<float> sorted = mergeSort(inputArray);by
auto sorted = mergeSort(inputArray);and the compiler will figure out for you thatsorted is of typevector<float>.
Usingauto systematically can help you avoid repeating the same information over and over again and also avoid doing some silly mistakes that happen when youthink you know the type.
(Applying the earlier advice, we should useconst auto here, of course.)
Avoid index-basedfor loops (C++11)
You're using C-stylefor loops throughout your code. While they are notwrong, they're less readable than ideal and allow you to make some avoidable bugs with the index bookkeeping. In some cases, they might also perform worse than ideal.
Since C++11, you can iterate over any standard library container using the following idiom, known asrange-basedfor loop.
Before:
for (int i = 0; i < stringArray.size(); i++) { array.push_back(stof(stringArray[i]));}After:
for (auto&& s : stringArray) { array.push_back(stof(s));}Notice how I am usingauto type deduction here. The type ofs will be deduced tostd::string& here but you don't necessarily have to worry about it. You can always useauto&& if you don't care about the actual type. Preferconst auto& if you don't intend to modify the object, though. Don't use plainauto as it will make an unnecessary copy.
Prefer standard library algorithms over raw loops (intermediate level)
The previous item notwithstanding, you should try to avoid raw loops in your code at all, if possible. (Search for Sean Parent's great “Better Code” talks and in particular for his “no raw loops” advice.)
For example, the loop from above could become this.
std::transform( std::begin(stringArray), std::end(stringArray), std::back_inserter(array), [](const std::string& s){ return std::stof(s); });I understand that this might not be very easy to understand for a beginner and it is perfectly acceptable to write your own loops at this point. But once you become more familiar with C++, standard algorithms are definitely something to look into.
(Taking this further, production code would of course use the standard library'sstd::stable_sort algorithm instead of rolling its own merge sort implementation. But implementing algorithms yourself for learning is a good exercise.)
Make the code generic (intermediate level)
Your sorting logic can currently only be used for sortingstd::vector<float>. You could generalize it to work with any container of any type that defines a comparison function.
This is not a defect in your code. If you only have to sortstd::vector<float>, that's fine. In fact, it is better to write specific code that is correct than to write generic code that is flawed. Once you've learned abouttemplates andmove semantics, you can attempt to make your code generic.
Avoid unnecessary copies
I liked this comment in your code.
Reverse merged vectors. Vectors don't have pop_front and I didn't want to use lists.
Indeed, using lists would be a bad idea here (and almost everywhere else, too). However, there is no need to create another vector. For one, you can reverse the elements of a vectorin-place. In this case, however, an even better solution would be to use astd::deque (which haspop_front) instead of astd:vector.
Of course, you could also re-work your code such that you don't actually pop the items off the front of your vector but merely iterate over it.
There are some other places where your code makes copies that could be avoided. For example, the splitting into thefirstHalf andsecondHalf vectors isn't necessary. You could instead always pass a reference to the original vector and a pair of indices. Alternatively (and preferably) useiterators.
An expert would probably also try to avoid the allocations for the merged intermediary results and instead use a single working array. But that is probably too complicated for now.
The naming could be improved slightly
Your code is generally very readable but at some places, I'd suggest looking carefully at the chosen names. For example,internal doesn't really tell me anything useful. More as a matter of taste, I'd avoid names likestringArray in favor ofstrings. Especially since it is astd::vector and not astd::array.
Use tools to detect easy bugs
Some of the issues pointed out in this review could easily have been found by tools.
The most important tool is your compiler. Always compile with a high level of warnings enabled and treat them as errors. For GCC and Clang, I recommend you use the-Wall,-Wextra,-Werror and-pedantic options. There are even more warnings available but these should cover a good part.
If your C++ implementation supports it, build debug builds with a “checked STL”. That is, a modified standard library that performs more checking than required (or even allowed) by the standard. When using GCC, this is really simple. All you have to do is compile with the-D_GLIBCXX_DEBUG option.
This cannot find all issues, however. So in addition, I recommend that you compile your code with asanitizer. GCC and Clang both support the-fsanitize=TOOL flag whereTOOL is the name of the sanitizer to use. I recommend that you useaddress andundefined routinely. Using these switches will instrument your binaries with some run-time checks.
Alternatively – or, preferably, in addition – you can useValgrind which is a tool to instrument your binaries on the fly.
Instrumentation of your code is only useful, however, if you have appropriatetests that execute the instrumented code. In your case, testing would be easy. Just write a function that callsmergeSort with a number of interesting inputs. “Interesting” in this case might be: an empty vector, a vector of a single element, a sorted vector, a vector sorted in reverse, a vector with duplicate elements, a random vector, etc. Of course, the test should also verify that the output is sorted correctly. Re-run these tests whenever you make modifications to your code to be alerted of newly introduced bugs immediately.
- \$\begingroup\$Wow!! Fantastic! This is the most insightful thing I've read in a while! What a brilliant answer! Thank You!\$\endgroup\$user2635139– user26351392017-01-21 22:38:48 +00:00CommentedJan 21, 2017 at 22:38
- \$\begingroup\$Here are my improvements so far:github.com/BenjaminThomas1999/Merge-Sort/blob/master/main.cpp\$\endgroup\$user2635139– user26351392017-01-21 22:40:24 +00:00CommentedJan 21, 2017 at 22:40
- Have I understood how merge sort works and implemented it in the right way?
You understood the algorithm correctly, and your code yields the desired results.
- Have I used any bad practices?
- How could the code be improved upon in terms of efficiency?
See my further points below please:
1. Don't useusing namespace std;
While that would work in your particular case, it's considered bad practice. Especially when you move out your code to separate header files.
See more details here please:
Why is “using namespace std;” considered bad practice?
2. Check constraints in order
This code may call undefined behavior, if the constraints aren't checked first (logical boolean arithmetic operations are executed in order):
if(firstHalf.back() > secondHalf.back() && !firstHalf.empty()){Since there's the possibility callingstd::vector::back() with an empty vector, that code should be
if(!firstHalf.empty() && !secondHalf.empty() && firstHalf.back() > secondHalf.back()){to avoid calling the offensive statements
3. Simplify your data input processing
Your data input function can be simplified a lot. You don't need anothersplit() function to do this. Simply use astd::istringstream to do this:
vector<float> floatVectorInput(){ string inputString; getline(cin, inputString); vector<float> array; std::istringstream iss(inputString); float val; while(iss >> val){ array.push_back(val); } return array; }4. Prefer loops and dynamically allocated stacks over recursion
Recursive function calls like
return merge(mergeSort(firstHalf), mergeSort(secondHalf));always are limited to the (OS defined) stack size regarding the call stack depth.
On large input lists this may bail out with a Stack Overflow error.
You can simply replace that with astd::stack<std::pair<std::vector<float>,std::vector<float>>> structure that is handled within a loop.
- \$\begingroup\$Vectors are being modified inside, so const won't work.\$\endgroup\$Raziman T V– Raziman T V2017-01-21 20:07:08 +00:00CommentedJan 21, 2017 at 20:07
- \$\begingroup\$@RazimanT.V. Ah, the
pop_back()stuff, yes. THX, I've been overlooking that.\$\endgroup\$πάντα ῥεῖ– πάντα ῥεῖ2017-01-21 20:09:06 +00:00CommentedJan 21, 2017 at 20:09
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

