3
\$\begingroup\$

I wrote a simple huffman coding algorithm for learning and practice. I just used the technique given on thewikipedia page.

Could you tell me my missing points and mistakes?

Node.java

package ged.gont.bst.huffmancode;public class Node implements Comparable<Node> {    private char letter;    private int freq;    private Node leftChild;    private Node rightChild;    public Node(char letter, int freq, Node leftChild, Node rightChild) {        this.letter = letter;        this.freq = freq;        this.leftChild = leftChild;        this.rightChild = rightChild;    }    public char getLetter() {        return letter;    }    public int getFreq() {        return freq;    }    public Node getLeftChild() {        return leftChild;    }    public Node getRightChild() {        return rightChild;    }    public boolean isLeaf() {        return this.leftChild == null && this.rightChild == null;    }    @Override    public int compareTo(Node arg0) {        return this.freq - arg0.freq;    }}

HuffmanCode.java

package ged.gont.bst.huffmancode;import java.util.LinkedHashMap;import java.util.Map;import java.util.PriorityQueue;public class HuffmanCode {    private Node root;    Map<Character, String> charMap = new LinkedHashMap<>();    /**     * Encodes string in which most used characters have min codeword length     *      * @param inputString compressed string     * @return encoded string     * @throws IllegelArgumentException if inputString contains invalid ASCII character      */    public String encode(String inputString) {        char[] letters = inputString.toCharArray();        Map<Character, Integer> charFreq = new LinkedHashMap<>();        PriorityQueue<Node> priorityQueue = new PriorityQueue<>();        String encodedString = "";        for (char c : letters) {            if ((int) c > 255) {                throw new IllegalArgumentException("Input contains invalid ASCII character");            }            if (charFreq.containsKey(c)) {                charFreq.put(c, charFreq.get(c) + 1);            } else {                charFreq.put(c, 1);            }        }        for (Character c : charFreq.keySet()) {            priorityQueue.offer(new Node(c, charFreq.get(c), null, null));        }        while (priorityQueue.size() > 1) {            Node leftChild = priorityQueue.remove();            Node rightChild = priorityQueue.remove();            priorityQueue.offer(new Node(Character.MIN_VALUE, leftChild.getFreq() + rightChild.getFreq(), leftChild, rightChild));        }        root = priorityQueue.remove();        generatePrefix(root, "");        for (int i = 0; i < inputString.length(); i++) {            encodedString += (charMap.get(inputString.charAt(i)));        }        return encodedString;    }    /**     * Generates prefix code in bit string format     *      * @param root     * @param code     */    private void generatePrefix(Node root, String prefix) {        if (!root.isLeaf()) {            generatePrefix(root.getLeftChild(), prefix.concat("0"));            generatePrefix(root.getRightChild(), prefix.concat("1"));        } else {            charMap.put(root.getLetter(), prefix);        }    }    /**     * Decodes the given encoded string     *      * @param encodedString     * @return decoded string     */    public String decode(String encodedString) {        String decodedString = "";        Node currentNode = root;        for (int i = 0; i < encodedString.length(); i++) {            if (encodedString.charAt(i) == '0') {                currentNode = currentNode.getLeftChild();            } else if (encodedString.charAt(i) == '1') {                currentNode = currentNode.getRightChild();            }            if (currentNode.isLeaf()) {                decodedString += currentNode.getLetter();                currentNode = root;            }        }        return decodedString;    }}

TestHuffmanCode.java

package ged.gont.testbst.testhuffmancode;import static org.junit.jupiter.api.Assertions.assertEquals;import org.junit.jupiter.api.*;import ged.gont.bst.huffmancode.*;public class TestHuffmanCode {    static HuffmanCode huffmanCode;    static String inputString;    static String expectedEncodedString;    @BeforeAll    public static void init() {        huffmanCode = new HuffmanCode();        inputString = "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED";        expectedEncodedString = "1000011101001000110010011101100111001001000111110010011111011111100010001111110100111001001011111011101000111111001";    }    @Test    public void testEncode() {        assertEquals(expectedEncodedString, huffmanCode.encode(inputString));    }    @Test    public void testDecode() {        assertEquals(inputString, huffmanCode.decode(huffmanCode.encode(inputString)));    }}
200_success's user avatar
200_success
146k22 gold badges191 silver badges481 bronze badges
askedFeb 19, 2022 at 11:11
gedofgont's user avatar
\$\endgroup\$
4
  • 3
    \$\begingroup\$What's going on incode.prefix("0")? It doesn't look likecode is defined (but it looks like it used to be a parameter and isn't anymore)\$\endgroup\$CommentedFeb 19, 2022 at 12:06
  • 2
    \$\begingroup\$You represent the encoded text as a string consisting ofcharacters '0' and '1'. This is by the factor of 16 larger space than a java.util.BitSet would normally require.\$\endgroup\$CommentedFeb 19, 2022 at 12:45
  • 1
    \$\begingroup\$Is there a reason why you are not using standard classes for binary tree representation avaible in Java?\$\endgroup\$CommentedFeb 20, 2022 at 14:02
  • \$\begingroup\$@convert: suggest a JRE class. Then, gedofgont introducedfor learning and practice (tagging reinventing-the-wheel is definitewhere a wheel is well-known).\$\endgroup\$CommentedMar 13, 2022 at 9:42

1 Answer1

2
\$\begingroup\$

General

Classes not carefully designed for extension should be marked `final`.

root is problematic as a class-level variable. The API of HuffmanCode implies thatdecode() depends only on the encoded string, and that one could calldecode more than once with different values. That will fail, though. The code only works if you callencode directly before the correspondingdecode.

  • It would be better to localizeroot and do whatever computation is necessary fordecode in that method. This would mean changing the output ofencode so that the tree can be rebuilt`.
  • Another option would be to separate out the building of the tree into its own method, and then haveencode anddecode take the tree as an argument. The downside here is needing to track both the tree and the encoded string.
  • A third option is to build the HuffmanCode class using a staticencode factory method. The class would keep the tree as an instance variable. It could have a publictoString method to return the encoded value, and a publicdecode method to return the decoded value.

Which of these is best depends on your needs. But the API right now will result in cranky users callingencode() multiple times and then trying todecode() multiple times and getting nonsense results.

Node

This class is immutable, so all the variables can be marked asfinal. This conveys the intent that they cannot be changed, and prevents accidental modification after construction time.

This class is only intended for use inside its package, and should therefore have default (package-private) access, not public.

This class has one constructor, but it should have two, one for leaf node construction and one for internal node construction.

arg0 might be better named asnode orotherNode.

HuffmanCode

`charMap` should be `private`.

Abbreviations are confusing and should be avoided.characterFrequency would be preferable tocharFreq. LikewisecharMap could becharacterMap orcharacterEncodings.

characters can be inlined and still clear to readers.

letters is not a good variable name, since the values are actually ASCII characters, not letters.

It's not necessary to convert a char to an int for numerical comparisons. The compiler can do it natively.

It would be preferable if the invalid ASCII character exception contained both the invalid character and the input string.

Setting the map value to 1 or adding 1 to the map value is cleaner usingMap.merge(). It would look likecharFreq.merge(c, 1, Integer::sum);

The code could use an int[] instead of a map, since ASCII is a constrained set of ints from 0-255. Using anint[256] and incrementing the values there should be more time and space efficient. Prefer whichever is easier to read unless you have a documented performance bottleneck.

It's preferable to localize variables so they're initialized as close to where they're first used as is reasonable. The convention of declaring them at the top of a method is a holdover from languages where it's not possible to declare them later.

When modifyingStrings, it is preferable to useStringBuilder (orStringBuffer if thread safety is an issue). This is for efficiency reasons and clarity-of-intent reasons. The compiler will generally figure it out, but the hint doesn't hurt.

This block:

    for (int i = 0; i < inputString.length(); i++) {        encodedString.append(characterEncodings.get(inputString.charAt(i)));    }

can be replaced with

    for (char c : inputString.toCharArray()) {        encodedString.append(characterEncodings.get(c));    }

IngeneratePrefix, it's easier to read if the non-negated case is in theif part and the negation is in theelse.

generatePrefix is a confusing name, because it's really generating the character encodings.

IngeneratePrefix,prefix might be better namedencoding.root might be better namednode.

It would be preferable ifdecode() handled invalid input more aggressively, rather than silently continuing if it doesn't see a0 or1.

TestHuffmanCode

Only having one test is not especially useful. There should be tests for all boundary conditions, and a variety of interesting inputs. For instance, zero characters, one character, non-ASCII characters only, non-ascii characters embedded in ASCII characters, etc.
answeredMar 13, 2022 at 18:10
Eric Stein's user avatar
\$\endgroup\$
4
  • 1
    \$\begingroup\$class-level variable - what is that? Re. Java, I'd accept that designation forstatic data members. While I'd preferHuffmanCoDec, what's problematic about instance data memberroot inHuffmanCode? I agree that documentation of use and interdependency ofencode() &decode(), while present, is lacking. Then again:learning and practice.\$\endgroup\$CommentedMar 13, 2022 at 20:04
  • 1
    \$\begingroup\$@greybeardHuffmanCode hc = new HuffmanCode();String s1 = hc.encode("abcde");String s2 = hc.encode("wxyz");System.out.println(hc.decode(s1)); I would expect the last statement to actually decode s1. It will not, because root is now built to "wxyz". Either HuffmanCode is restricted to the string it's built from, in which case it needs to be constructed with the instance it's encoding, or it's not, in which case the implementation ofencode needs to change to embed the node tree.\$\endgroup\$CommentedMar 14, 2022 at 0:27
  • 1
    \$\begingroup\$And I fixed the typo. You can feel free to edit stuff like that directly in the future. :)\$\endgroup\$CommentedMar 14, 2022 at 11:55
  • \$\begingroup\$thank u so much for review and advise, i gonna try refactoring my code in accordance with your answer.\$\endgroup\$CommentedMar 21, 2022 at 19:58

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.