I wrote a simple huffman coding algorithm for learning and practice. I just used the technique given on thewikipedia page.
Could you tell me my missing points and mistakes?
Node.java
package ged.gont.bst.huffmancode;public class Node implements Comparable<Node> { private char letter; private int freq; private Node leftChild; private Node rightChild; public Node(char letter, int freq, Node leftChild, Node rightChild) { this.letter = letter; this.freq = freq; this.leftChild = leftChild; this.rightChild = rightChild; } public char getLetter() { return letter; } public int getFreq() { return freq; } public Node getLeftChild() { return leftChild; } public Node getRightChild() { return rightChild; } public boolean isLeaf() { return this.leftChild == null && this.rightChild == null; } @Override public int compareTo(Node arg0) { return this.freq - arg0.freq; }}HuffmanCode.java
package ged.gont.bst.huffmancode;import java.util.LinkedHashMap;import java.util.Map;import java.util.PriorityQueue;public class HuffmanCode { private Node root; Map<Character, String> charMap = new LinkedHashMap<>(); /** * Encodes string in which most used characters have min codeword length * * @param inputString compressed string * @return encoded string * @throws IllegelArgumentException if inputString contains invalid ASCII character */ public String encode(String inputString) { char[] letters = inputString.toCharArray(); Map<Character, Integer> charFreq = new LinkedHashMap<>(); PriorityQueue<Node> priorityQueue = new PriorityQueue<>(); String encodedString = ""; for (char c : letters) { if ((int) c > 255) { throw new IllegalArgumentException("Input contains invalid ASCII character"); } if (charFreq.containsKey(c)) { charFreq.put(c, charFreq.get(c) + 1); } else { charFreq.put(c, 1); } } for (Character c : charFreq.keySet()) { priorityQueue.offer(new Node(c, charFreq.get(c), null, null)); } while (priorityQueue.size() > 1) { Node leftChild = priorityQueue.remove(); Node rightChild = priorityQueue.remove(); priorityQueue.offer(new Node(Character.MIN_VALUE, leftChild.getFreq() + rightChild.getFreq(), leftChild, rightChild)); } root = priorityQueue.remove(); generatePrefix(root, ""); for (int i = 0; i < inputString.length(); i++) { encodedString += (charMap.get(inputString.charAt(i))); } return encodedString; } /** * Generates prefix code in bit string format * * @param root * @param code */ private void generatePrefix(Node root, String prefix) { if (!root.isLeaf()) { generatePrefix(root.getLeftChild(), prefix.concat("0")); generatePrefix(root.getRightChild(), prefix.concat("1")); } else { charMap.put(root.getLetter(), prefix); } } /** * Decodes the given encoded string * * @param encodedString * @return decoded string */ public String decode(String encodedString) { String decodedString = ""; Node currentNode = root; for (int i = 0; i < encodedString.length(); i++) { if (encodedString.charAt(i) == '0') { currentNode = currentNode.getLeftChild(); } else if (encodedString.charAt(i) == '1') { currentNode = currentNode.getRightChild(); } if (currentNode.isLeaf()) { decodedString += currentNode.getLetter(); currentNode = root; } } return decodedString; }}TestHuffmanCode.java
package ged.gont.testbst.testhuffmancode;import static org.junit.jupiter.api.Assertions.assertEquals;import org.junit.jupiter.api.*;import ged.gont.bst.huffmancode.*;public class TestHuffmanCode { static HuffmanCode huffmanCode; static String inputString; static String expectedEncodedString; @BeforeAll public static void init() { huffmanCode = new HuffmanCode(); inputString = "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED"; expectedEncodedString = "1000011101001000110010011101100111001001000111110010011111011111100010001111110100111001001011111011101000111111001"; } @Test public void testEncode() { assertEquals(expectedEncodedString, huffmanCode.encode(inputString)); } @Test public void testDecode() { assertEquals(inputString, huffmanCode.decode(huffmanCode.encode(inputString))); }}- 3\$\begingroup\$What's going on in
code.prefix("0")? It doesn't look likecodeis defined (but it looks like it used to be a parameter and isn't anymore)\$\endgroup\$user555045– user5550452022-02-19 12:06:03 +00:00CommentedFeb 19, 2022 at 12:06 - 2\$\begingroup\$You represent the encoded text as a string consisting ofcharacters '0' and '1'. This is by the factor of 16 larger space than a java.util.BitSet would normally require.\$\endgroup\$coderodde– coderodde2022-02-19 12:45:51 +00:00CommentedFeb 19, 2022 at 12:45
- 1\$\begingroup\$Is there a reason why you are not using standard classes for binary tree representation avaible in Java?\$\endgroup\$convert– convert2022-02-20 14:02:31 +00:00CommentedFeb 20, 2022 at 14:02
- \$\begingroup\$@convert: suggest a JRE class. Then, gedofgont introduced
for learning and practice(tagging reinventing-the-wheel is definitewhere a wheel is well-known).\$\endgroup\$greybeard– greybeard2022-03-13 09:42:48 +00:00CommentedMar 13, 2022 at 9:42
1 Answer1
General
Classes not carefully designed for extension should be marked `final`.root is problematic as a class-level variable. The API of HuffmanCode implies thatdecode() depends only on the encoded string, and that one could calldecode more than once with different values. That will fail, though. The code only works if you callencode directly before the correspondingdecode.
- It would be better to localize
rootand do whatever computation is necessary fordecodein that method. This would mean changing the output ofencodeso that the tree can be rebuilt`. - Another option would be to separate out the building of the tree into its own method, and then have
encodeanddecodetake the tree as an argument. The downside here is needing to track both the tree and the encoded string. - A third option is to build the HuffmanCode class using a static
encodefactory method. The class would keep the tree as an instance variable. It could have a publictoStringmethod to return the encoded value, and a publicdecodemethod to return the decoded value.
Which of these is best depends on your needs. But the API right now will result in cranky users callingencode() multiple times and then trying todecode() multiple times and getting nonsense results.
Node
This class is immutable, so all the variables can be marked asfinal. This conveys the intent that they cannot be changed, and prevents accidental modification after construction time.
This class is only intended for use inside its package, and should therefore have default (package-private) access, not public.
This class has one constructor, but it should have two, one for leaf node construction and one for internal node construction.
arg0 might be better named asnode orotherNode.
HuffmanCode
`charMap` should be `private`.Abbreviations are confusing and should be avoided.characterFrequency would be preferable tocharFreq. LikewisecharMap could becharacterMap orcharacterEncodings.
characters can be inlined and still clear to readers.
letters is not a good variable name, since the values are actually ASCII characters, not letters.
It's not necessary to convert a char to an int for numerical comparisons. The compiler can do it natively.
It would be preferable if the invalid ASCII character exception contained both the invalid character and the input string.
Setting the map value to 1 or adding 1 to the map value is cleaner usingMap.merge(). It would look likecharFreq.merge(c, 1, Integer::sum);
The code could use an int[] instead of a map, since ASCII is a constrained set of ints from 0-255. Using anint[256] and incrementing the values there should be more time and space efficient. Prefer whichever is easier to read unless you have a documented performance bottleneck.
It's preferable to localize variables so they're initialized as close to where they're first used as is reasonable. The convention of declaring them at the top of a method is a holdover from languages where it's not possible to declare them later.
When modifyingStrings, it is preferable to useStringBuilder (orStringBuffer if thread safety is an issue). This is for efficiency reasons and clarity-of-intent reasons. The compiler will generally figure it out, but the hint doesn't hurt.
This block:
for (int i = 0; i < inputString.length(); i++) { encodedString.append(characterEncodings.get(inputString.charAt(i))); }can be replaced with
for (char c : inputString.toCharArray()) { encodedString.append(characterEncodings.get(c)); }IngeneratePrefix, it's easier to read if the non-negated case is in theif part and the negation is in theelse.
generatePrefix is a confusing name, because it's really generating the character encodings.
IngeneratePrefix,prefix might be better namedencoding.root might be better namednode.
It would be preferable ifdecode() handled invalid input more aggressively, rather than silently continuing if it doesn't see a0 or1.
TestHuffmanCode
Only having one test is not especially useful. There should be tests for all boundary conditions, and a variety of interesting inputs. For instance, zero characters, one character, non-ASCII characters only, non-ascii characters embedded in ASCII characters, etc.- 1\$\begingroup\$
class-level variable- what is that? Re. Java, I'd accept that designation forstaticdata members. While I'd preferHuffmanCoDec, what's problematic about instance data memberrootinHuffmanCode? I agree that documentation of use and interdependency ofencode()&decode(), while present, is lacking. Then again:learning and practice.\$\endgroup\$greybeard– greybeard2022-03-13 20:04:33 +00:00CommentedMar 13, 2022 at 20:04 - 1\$\begingroup\$@greybeard
HuffmanCode hc = new HuffmanCode();String s1 = hc.encode("abcde");String s2 = hc.encode("wxyz");System.out.println(hc.decode(s1));I would expect the last statement to actually decode s1. It will not, because root is now built to "wxyz". Either HuffmanCode is restricted to the string it's built from, in which case it needs to be constructed with the instance it's encoding, or it's not, in which case the implementation ofencodeneeds to change to embed the node tree.\$\endgroup\$Eric Stein– Eric Stein2022-03-14 00:27:40 +00:00CommentedMar 14, 2022 at 0:27 - 1\$\begingroup\$And I fixed the typo. You can feel free to edit stuff like that directly in the future. :)\$\endgroup\$Eric Stein– Eric Stein2022-03-14 11:55:10 +00:00CommentedMar 14, 2022 at 11:55
- \$\begingroup\$thank u so much for review and advise, i gonna try refactoring my code in accordance with your answer.\$\endgroup\$gedofgont– gedofgont2022-03-21 19:58:01 +00:00CommentedMar 21, 2022 at 19:58
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

