Implementation of Huffman Coding

Question 1

This is an implementation of Huffman Coding that works on ASCII values. I simplifiedmain to show an example of user input to the program. I only removed handling of non-printable ASCII characters, as that is not something I need reviewed. Initially, I tried to usestd::priority_queue<std::unique_ptr<node>>, but you cannot easily remove these pointers due tostd::priority_queue::top returning a const reference, so I had to usestd::shared_ptr. More or less, I just want to know if there are any glaring issue with how I used the standard library or guideline violations.

#include <iostream>#include <map>#include <memory>#include <queue>#include <string>#include <unordered_map>#include <utility>#include <optional>#include <sstream>/// node/// node in huffman treestruct node {    node(char c, int freq, std::shared_ptr<node> left, std::shared_ptr<node> right)        : c {c}        , freq {freq}        , left {std::move(left)}        , right {std::move(right)}    {}    node(int freq, std::shared_ptr<node> left, std::shared_ptr<node> right)        : freq {freq}        , left {std::move(left)}        , right {std::move(right)}    {}    std::optional<char> c;    int freq;    std::shared_ptr<node> left;    std::shared_ptr<node> right;};/// build_huffman_codings/// traverses a huffman tree and finds encodings for all characters/// \param root root of huffman tree or subtree/// \param accumulator prefix of huffman code gathered before root/// \param codings mapping of character to its final encodingvoid build_huffman_codings(const std::shared_ptr<node>& root,                           std::map<char, std::string>& codings,                           std::string accumulator = "") {    // leaf node adds to codings    if (!root->left && !root->right) {        codings[root->c.value()] = accumulator;        return;    }    // left branch    if (root->left) {        build_huffman_codings(root->left, codings, accumulator + "0");    }    // right branch    if (root->right) {        build_huffman_codings(root->right, codings, accumulator + "1");    }}/// huffman/// compute huffman codings for given frequencies of characters/// \param freq mapping of character and frequencies/// \return mapping of character to a binary representationstd::map<char, std::string> huffman(const std::map<char, int>& freq) {    // pre-allocate nodes    std::vector<std::shared_ptr<node>> nodes;    for (const auto& pair : freq) {        nodes.emplace_back(std::make_shared<node>(pair.first, pair.second, nullptr, nullptr));    }    // compare freq of node pointers    auto compare_greater = [] (const auto& p1, const auto& p2) {        return p1->freq > p2->freq;    };    // priority queue holds nodes in increasing order by frequency    std::priority_queue<std::shared_ptr<node>,        std::vector<std::shared_ptr<node>>,        decltype(compare_greater)>        queue {compare_greater, std::move(nodes)};    const std::size_t size = queue.size();    // repeat size - 1 times    for (std::size_t i = 1; i < size; ++i) {        // remove first two nodes        std::shared_ptr<node> x = queue.top();        queue.pop();        std::shared_ptr<node> y = queue.top();        queue.pop();        // add new node        queue.emplace(std::make_shared<node>(x->freq + y->freq, x, y));    }    std::map<char, std::string> codings;    build_huffman_codings(queue.top(), codings);    return codings;}int main() {    // store character with its frequency    std::map<char, int> frequencies;    // example user input - real implementation handles non-printable ascii    std::stringstream freq_txt {        "a 5\n"        "b 25\n"        "c 7\n"        "d 15\n"        "e 4\n"        "f 12\n"    };    char c;    int f;    while (freq_txt) {        freq_txt >> c;        freq_txt >> f;        frequencies[c] = f;    }    // huffman code table    const auto codings = huffman(frequencies);    for (const auto& pair : codings) {        std::cout << pair.first << ' ' << pair.second << '\n';    }}

Question 2

At first glance, I see a a lot of good usage of the standard library. So, the remarks I have are rather small:

in huffman(), you create a vectornodes, as you know exactly the amount of elements to store into it, I would reserve it. This reduces memory usage slightly and improves performance. (nodes.reserve(freq.size()))
In the same function, you don't seem to have handling forsize == 0, most likely because of the simplification?
a small optimization in the for-loop, you can usestd::move() onx andy.
In build_huffman_condings, you havestd::string accumulator ="". It's better to write= std::string{} as this doesn't requirestrlen to be called.

On the more high level, I am wondering about the used datastructures, especiallystd::map. I knowstd::unordered_map ain't an ideal hash-map, though, it does perform better thanstd::map if you don't need ordering. I don't see in your code that need, hence I would recommend replacing it. (You can also use abseil ... for a better implementation, or as the values are small, boosts flat_map)

In short, your usage of the standard library looks fine.

Question 3

Thanks for responding. I used map over unordered_map because my calling code needed to output the table with ascii values in order.

Question 4

Even than, if you know the characters, depending on the length of the text, it might be beneficial to use hash_map and at the end loop over all characters and locate. Though that all depends on the use cases.

Question 5

Sincequeue.top() returns aconst shared_ptr& - and even if I madex andy const, should I still usemove on them? I'm getting warnings to not move const variables. I'm assuming it will handle the reference counts correctly and the warning may not apply in this situation.

Question 6

You indeed make a copy of the shared_ptr when creating variable x, which is needed. However, when passing it along to the new node, you could move it.

JVApen 1,1916 silver badges17 bronze badges · Accepted Answer · 2020-04-26 08:23:37Z

At first glance, I see a a lot of good usage of the standard library. So, the remarks I have are rather small:

in huffman(), you create a vectornodes, as you know exactly the amount of elements to store into it, I would reserve it. This reduces memory usage slightly and improves performance. (nodes.reserve(freq.size()))
In the same function, you don't seem to have handling forsize == 0, most likely because of the simplification?
a small optimization in the for-loop, you can usestd::move() onx andy.
In build_huffman_condings, you havestd::string accumulator ="". It's better to write= std::string{} as this doesn't requirestrlen to be called.

On the more high level, I am wondering about the used datastructures, especiallystd::map. I knowstd::unordered_map ain't an ideal hash-map, though, it does perform better thanstd::map if you don't need ordering. I don't see in your code that need, hence I would recommend replacing it. (You can also use abseil ... for a better implementation, or as the values are small, boosts flat_map)

In short, your usage of the standard library looks fine.

Thanks for responding. I used map over unordered_map because my calling code needed to output the table with ascii values in order.
Even than, if you know the characters, depending on the length of the text, it might be beneficial to use hash_map and at the end loop over all characters and locate. Though that all depends on the use cases.
Sincequeue.top() returns aconst shared_ptr& - and even if I madex andy const, should I still usemove on them? I'm getting warnings to not move const variables. I'm assuming it will handle the reference counts correctly and the warning may not apply in this situation.
You indeed make a copy of the shared_ptr when creating variable x, which is needed. However, when passing it along to the new node, you could move it.

Movatterモバイル変換

Stack Exchange Network

Implementation of Huffman Coding

1 Answer1

You mustlog in to answer this question.

Related

Hot Network Questions

Subscribe to RSS