This is an implementation of Huffman Coding that works on ASCII values. I simplifiedmain to show an example of user input to the program. I only removed handling of non-printable ASCII characters, as that is not something I need reviewed. Initially, I tried to usestd::priority_queue<std::unique_ptr<node>>, but you cannot easily remove these pointers due tostd::priority_queue::top returning a const reference, so I had to usestd::shared_ptr. More or less, I just want to know if there are any glaring issue with how I used the standard library or guideline violations.
#include <iostream>#include <map>#include <memory>#include <queue>#include <string>#include <unordered_map>#include <utility>#include <optional>#include <sstream>/// node/// node in huffman treestruct node { node(char c, int freq, std::shared_ptr<node> left, std::shared_ptr<node> right) : c {c} , freq {freq} , left {std::move(left)} , right {std::move(right)} {} node(int freq, std::shared_ptr<node> left, std::shared_ptr<node> right) : freq {freq} , left {std::move(left)} , right {std::move(right)} {} std::optional<char> c; int freq; std::shared_ptr<node> left; std::shared_ptr<node> right;};/// build_huffman_codings/// traverses a huffman tree and finds encodings for all characters/// \param root root of huffman tree or subtree/// \param accumulator prefix of huffman code gathered before root/// \param codings mapping of character to its final encodingvoid build_huffman_codings(const std::shared_ptr<node>& root, std::map<char, std::string>& codings, std::string accumulator = "") { // leaf node adds to codings if (!root->left && !root->right) { codings[root->c.value()] = accumulator; return; } // left branch if (root->left) { build_huffman_codings(root->left, codings, accumulator + "0"); } // right branch if (root->right) { build_huffman_codings(root->right, codings, accumulator + "1"); }}/// huffman/// compute huffman codings for given frequencies of characters/// \param freq mapping of character and frequencies/// \return mapping of character to a binary representationstd::map<char, std::string> huffman(const std::map<char, int>& freq) { // pre-allocate nodes std::vector<std::shared_ptr<node>> nodes; for (const auto& pair : freq) { nodes.emplace_back(std::make_shared<node>(pair.first, pair.second, nullptr, nullptr)); } // compare freq of node pointers auto compare_greater = [] (const auto& p1, const auto& p2) { return p1->freq > p2->freq; }; // priority queue holds nodes in increasing order by frequency std::priority_queue<std::shared_ptr<node>, std::vector<std::shared_ptr<node>>, decltype(compare_greater)> queue {compare_greater, std::move(nodes)}; const std::size_t size = queue.size(); // repeat size - 1 times for (std::size_t i = 1; i < size; ++i) { // remove first two nodes std::shared_ptr<node> x = queue.top(); queue.pop(); std::shared_ptr<node> y = queue.top(); queue.pop(); // add new node queue.emplace(std::make_shared<node>(x->freq + y->freq, x, y)); } std::map<char, std::string> codings; build_huffman_codings(queue.top(), codings); return codings;}int main() { // store character with its frequency std::map<char, int> frequencies; // example user input - real implementation handles non-printable ascii std::stringstream freq_txt { "a 5\n" "b 25\n" "c 7\n" "d 15\n" "e 4\n" "f 12\n" }; char c; int f; while (freq_txt) { freq_txt >> c; freq_txt >> f; frequencies[c] = f; } // huffman code table const auto codings = huffman(frequencies); for (const auto& pair : codings) { std::cout << pair.first << ' ' << pair.second << '\n'; }}1 Answer1
At first glance, I see a a lot of good usage of the standard library. So, the remarks I have are rather small:
- in huffman(), you create a vector
nodes, as you know exactly the amount of elements to store into it, I would reserve it. This reduces memory usage slightly and improves performance. (nodes.reserve(freq.size())) - In the same function, you don't seem to have handling for
size == 0, most likely because of the simplification? - a small optimization in the for-loop, you can use
std::move()onxandy. - In build_huffman_condings, you have
std::string accumulator ="". It's better to write= std::string{}as this doesn't requirestrlento be called.
On the more high level, I am wondering about the used datastructures, especiallystd::map. I knowstd::unordered_map ain't an ideal hash-map, though, it does perform better thanstd::map if you don't need ordering. I don't see in your code that need, hence I would recommend replacing it. (You can also use abseil ... for a better implementation, or as the values are small, boosts flat_map)
In short, your usage of the standard library looks fine.
- \$\begingroup\$Thanks for responding. I used map over unordered_map because my calling code needed to output the table with ascii values in order.\$\endgroup\$Brady Dean– Brady Dean2020-04-26 12:59:24 +00:00CommentedApr 26, 2020 at 12:59
- \$\begingroup\$Even than, if you know the characters, depending on the length of the text, it might be beneficial to use hash_map and at the end loop over all characters and locate. Though that all depends on the use cases.\$\endgroup\$JVApen– JVApen2020-04-26 15:52:41 +00:00CommentedApr 26, 2020 at 15:52
- \$\begingroup\$Since
queue.top()returns aconst shared_ptr&- and even if I madexandyconst, should I still usemoveon them? I'm getting warnings to not move const variables. I'm assuming it will handle the reference counts correctly and the warning may not apply in this situation.\$\endgroup\$Brady Dean– Brady Dean2020-04-27 02:08:36 +00:00CommentedApr 27, 2020 at 2:08 - \$\begingroup\$You indeed make a copy of the shared_ptr when creating variable x, which is needed. However, when passing it along to the new node, you could move it.\$\endgroup\$JVApen– JVApen2020-04-27 10:54:45 +00:00CommentedApr 27, 2020 at 10:54
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

