2
\$\begingroup\$

I’m learningSwift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.

Here’s an example of how the API is used:

let huffEncoded = Huffman("MISSISSIPPI_RIVER!")let decode = huffEncoded.decode()
  1. I’d love to learn of anything I’m not doing that would be more swift-like.
  2. I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.
  3. I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.
  4. I’m happy to hear any suggestions on my approach to the algorithm itself
  5. My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?

Thanks for any help!

class Huffman {    private var key = [String: String]()    private(set) var code = [String]()    lazy private var queue: DispatchQueue = {        return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)    }()    init(_ input: String) {        self.key = Huffman.getKey(for: input)        self.code = Huffman.encode(for: input, with: self.key)    }    func decode() -> String {        var word = ""        queue.sync { [unowned self] in            var reverseKey = [String:String]()            for (k, v) in self.key {                reverseKey[v] = k            }            for prefix in self.code {                if let letter = reverseKey[prefix] {                   word += letter                }            }        }        return word    }    static private func getKey(for input: String) -> [String: String] {        // sort letter frequency by decreasing count        let sortedFrequency = Array(input)            .reduce(into: [String: Int](), { freq, char in                let letter = String(char)                return freq[letter] = freq[letter, default: 0] + 1            })            .sorted(by: {$0.1 > $1.1})        // create queue of initial Nodes        let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}        // generate key by traversing tree        return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")    }    static private func encode(for input: String, with key: [String: String]) -> [String] {        var code = [String]()        let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)        queue.sync {            for char in input {                if let prefix = key[String(char)] {                    code.append(prefix)                }            }        }        return code    }    static private func generateKey(for node: Node, prefix: String) -> [String: String] {        var key = [String: String]()        if let left = node.left, let right = node.right {            key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})            key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})        }else {            key[node.name] = prefix        }        return key    }    static private func createTree(with queue: [Node]) -> Node {        var queue = queue        // until we have 1 root node, get subtree of least frequency        while queue.count > 1 {            let last = queue.count - 1            let node1 = queue[last]            let node2 = queue[last - 1]            // if we have a third then compare frequency to second            if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {                queue.remove(at: last - 1)                queue.remove(at: last - 2)                queue.append(Huffman.createRoot(with: node2, and: node3))            } else {                queue.removeLast()                queue.removeLast()                queue.append(Huffman.createRoot(with: node1, and: node2))            }        }        return queue[0]    }    static private func createRoot(with first: Node, and second: Node) -> Node {        return Node(name: "\(first.name)\(second.name)", value: first.value + second.value, left: first, right: second)    }}class Node {    var name: String    var value: Int    var left: Node?    var right: Node?    init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {        self.name = name        self.value = value        self.left = left        self.right = right    }}extension Collection {    subscript (safe index: Index) -> Element? {        return indices.contains(index) ? self[index] : nil    }}
Sᴀᴍ Onᴇᴌᴀ's user avatar
Sᴀᴍ Onᴇᴌᴀ
29.6k16 gold badges46 silver badges203 bronze badges
askedDec 4, 2018 at 19:10
Turnipdabeets's user avatar
\$\endgroup\$
2
  • 1
    \$\begingroup\$Here is an alternative implementation, you might glean some ideas from it.\$\endgroup\$CommentedDec 4, 2018 at 22:30
  • \$\begingroup\$Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please seewhat you may and may not do after receiving answers.\$\endgroup\$CommentedDec 5, 2018 at 21:07

1 Answer1

2
\$\begingroup\$

Concurrency

In the encode and decode method you usequeue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.

Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).

Dispatching the code with GCD – if necessary – should be done by thecaller of these methods, not in the Huffman class itself.

The API

In its current form, the class seems of limited use to me.

let huffEncoded = Huffman("MISSISSIPPI_RIVER!")

builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to

  • retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,
  • retrieve the compressed string as a sequence of zeros and ones,
  • an initializer which takes a (previously created) Huffman tree,
  • a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.

Simplifications and other remarks

All properties inclass Node are never mutated, and can be declared as constants (withlet).

Infunc decode() the reverse dictionary mapping can be created with

let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))

instead of a loop. The following loop can be more simply done withcompactMap():

let word = code.compactMap({ reverseKey[$0] }).joined()

Infunc getKey() it is not necessary to create an array from the given string. Thereturn in the closure of thereduce() call is misleading: An assignment does not return anything.+= can be used for the increment. Using$n.value instead of$n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:

let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in        freq[String(char), default: 0] += 1    })    .sorted(by: {$0.value > $1.value})

Infunc encode(),compactMap() can be used again instead of a for loop:

let code = input.compactMap( { key[String($0)] } )

Infunc createTree() a “safe subscript” method (defined as an extensionof theCollection protocol) is used to access the third last queue element if it exists. Testingif queue.count >= 3 would seem more clear to me.

The Huffman tree algorithm

Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your program generates the codes

"D": 01"G": 11"F": 001"E": 101"A": 0001"C": 0000"H": 1000"B": 1001

with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.

answeredDec 4, 2018 at 21:37
Martin R's user avatar
\$\endgroup\$
4
  • \$\begingroup\$Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?\$\endgroup\$CommentedDec 4, 2018 at 21:47
  • \$\begingroup\$@Turnipdabeets: Here is an example: You start with weights1, 1, 1, 1, 1, 1. Then the last two are joined:1, 1, 1, 1, 2. Then1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.\$\endgroup\$CommentedDec 4, 2018 at 22:22
  • \$\begingroup\$Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.\$\endgroup\$CommentedDec 5, 2018 at 15:50
  • \$\begingroup\$I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array forcode or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance varcode is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.\$\endgroup\$CommentedDec 5, 2018 at 15:51

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.