I've been working on the classic LFU (Least Frequently Used) cache running in O(1) time problem lately and, as a student, I kind of struggled with the algorithms I found online. I got most of the idea behind them, but it still was a nightmare replicating them. I've tried to start coding it from scratch. Here's my result (Kotlin):
package main// Object stored in the cache, could be anythingclass Object (val key: String, val value: Int)class LeastFrequentlyUsed (private val cacheLimit :Int) { // Holds objects private val cache = mutableListOf<Object>() // Keeps track of hashes -> index, used in get(key) => hash(key) => index private val indexLinks = mutableMapOf<Int,Int>() // Keeps track of calls private val calledObjects = mutableMapOf<Int, Int>() // Keeps track of the last inserted index private var lastInserted :Int = 0 // Collectors and printer for the LFU cache state private fun collectCache() :MutableList<Pair<String,Int>> { val collector = mutableListOf<Pair<String,Int>>() cache.forEach { collector.add(Pair(it.key, it.value)) } return collector } private fun collectCounters () :MutableList<Pair<String,Int>> { val collector = mutableListOf<Pair<String,Int>>() cache.forEach { collector.add(Pair( it.key, calledObjects[it.key.hashCode()]!! )) } return collector } private fun collectIndexLinks () :MutableList<Pair<String,Int>> { val collector = mutableListOf<Pair<String,Int>>() cache.forEach { collector.add(Pair( it.key, indexLinks[it.key.hashCode()]!! )) } return collector } fun printLfu() { println("Cache: ${collectCache()}") println("Counters: ${collectCounters()}") println("IndexLinks: ${collectIndexLinks()}") println() } // LFU setter fun set(key :String, value: Int) { val hashKey = key.hashCode() // If we reached the cache limit we need to take away the less used item in the cache if (lastInserted == cacheLimit) { // Lower lastInserted of 1 just to bring it back to max later lastInserted-- // Brings up least used item, stored in the last index of the cache val lessUsedItem = cache.last() // Removes the least used item from cache cache.removeLast() // Clears indexLinks from the element removed from cache indexLinks.remove(lessUsedItem.key.hashCode()) // Clears calledObjects from the element removed from cache calledObjects.remove(lessUsedItem.key.hashCode()) } // Adds the new element to cache cache.add(Object(key,value)) // Adds the new element to indexLinks indexLinks[hashKey] = lastInserted // Adds the new element to calledObjects calledObjects[hashKey] = 0 // Turns back lastInserted to max = cache limit lastInserted++ } // LFU getter fun get (key :String) :Object? { val hashKey = key.hashCode() // Verify if the key is present in the LFU indexLinks[hashKey] ?: return null // Updates the calledObjects counter calledObjects[hashKey] = calledObjects[hashKey]!!.plus(1) // Gets the current element position in the cache val currentItemIndex = indexLinks[hashKey]!! if (currentItemIndex > 0 ) { // Gets the previous element position in the cache, which is the current -1 val previousItemIndex = currentItemIndex - 1 // Gets the key of the previous item to find it in calledObjects and see its number of calls val previousItemKey = cache[previousItemIndex].key.hashCode() // Checks which is the most called: the current or the previous. // If is the current, swap the two if (calledObjects[hashKey]!! > calledObjects[previousItemKey]!!) { // Adjusts the position in the cache val temp = cache[previousItemIndex] cache[previousItemIndex] = cache[currentItemIndex] cache[currentItemIndex] = temp // Adjusts their indices in the indexLists map indexLinks[hashKey] = indexLinks[hashKey]!! - 1 indexLinks[previousItemKey] = indexLinks[previousItemKey]!! + 1 } } return cache[indexLinks[hashKey]!!] }}fun main () { val lfu = LeastFrequentlyUsed(3) lfu.set("Alice", 10) lfu.set("Bob", 20) lfu.set("Charlie",30) lfu.printLfu() lfu.get("Charlie") lfu.get("Charlie") lfu.set("Dean", 50) lfu.get("Dean") lfu.printLfu() lfu.get("gianni")}The code actually works, but would someone more experienced tell me how near to an actual solution I am? Any problems I did not see or hints for improvement?
Git gist of the code
2 Answers2
helpful comments
I didn't find this comment very helpful.
// Object stored in the cache, could be anythingclass Object (val key: String, val value: Int)It turns out we're actually talking aboutaCachedObject, so I'd appreciate it ifyou would describe it in that way.The running system contains many objects,and only a subset of them are cached.That is the subset we restrict our focus to ATM.
Additionally, I assume we have some distinctionbetween Things that are RAM resident versusThings that are stored on disk.You have not yet helped me to name those.If you threw me a bone, like "DB record",then already I'd be on familiar groundand we would be on the same page.
And having read all the way to the bottom,no, it turns out it can't be "anything",it is not an OOP object, it won't be e.g. a Callable.It is restricted to a(String, Int) tuple.It's just an entry in a key-value store.
odd whitespace
Why would we introducefoo() :Intwhen we could more naturally writefoo(): Int ?I was initially thinking in terms of a Ruby or Common Lisp:kwkeyword symbol.
I speak several ALGOL languages.I confess Kotlin is not my native tongue.Don't y'all have a pretty printer which adjudicates such details?If you're trying to communicate technical ideas to humans,don't obfuscate them with idiosyncratic spacing.
superlative
"Less" is relative.
// If ... we need to take away the LESS used item ... val lessUsedItem = ...Prefer "least" when you intend the superlative.There exists no item less used than the LFU,it being theleast frequently used.
hash collision
This makes perfect sense.
cache.removeLast()What is going on here?
indexLinks.remove(lessUsedItem.key.hashCode()) ... calledObjects.remove(lessUsedItem.key.hashCode())If there's a collision on the hash code, how could that possibly be correct?
Rather than eliding important information with the.hashCode() call,you might be better off working with the actual.keyand letting the underlying datastructure sweat the details.That datastructure might be different from the currently proposed mutable map.
Do me a favor.Go search the keyspace for a hash collision.(Youwill find one!)Then add that key pair to your automated test suite.It will help this codebase to better evolve.
// Adds the new element to indexLinks indexLinks[hashKey] = lastInsertedSame remark.We don't care to first see if we're overwriting a non-empty entry?!?I can't say that I'm following thecalledObjects assignment, either.Oh, wait,get() explains that it seems to be some sort of ref countor frequency count, got it.
comments are for the "why", not the "how"
These comments are absurd, please stop it, just stop it.
// Updates the calledObjects counter ... // Gets the current element position in the cache ... // Gets the previous element position in the cache, which is the current -1The code already eloquently told usexactly what the comments describe.Yet still I am left hungry, with more questions to ask.Like "what doescalledObjectsmean?"
I am starting to believe that you take "call" as asynonym for "use" or "reference", good.I wish you had taught me that in the Review Context,or closer to the top of the source code.
RegardingObject
getshould not return anObject, it should only return itsvalue.Objectis now only used internally inLeastFrequentlyUsed. To keep the namespace clean you should make it a private nested class ofLeastFrequentlyUsed.- You say that
Object"could be anything", but in reality it can only use a String key and an Int value. Instead of changing the comment you should make it come true: Use generics. Objectis not a good name since it may be confused with ajava.lang.Objectthat is the base class of everything in Java. I would recommend to use something more innocuous instead, likeItem.
You class should then look something like this:
class LeastFrequentlyUsed<K, V> { fun get(key: K): V? { // ... } // ... // Item that is stored in the cache private class Item<K, V>( val key: K, val value: V, )}Other minor improvements
LeastFrequentlyUsedis not a good class name because it doesn't describe what that is. The class is a cache, so a better name would beLeastFrequentlyUsedCache. Also thelfuvariable inmainwould be better calledcache.
You could even try to implementMutableMap<K, V>because a lot of the semantics are the same (you should only consider this when the rest is done), then you should name your classLeastFrequentlyUsedMapor something similar instead.A better name for
cacheLimitwould becapacity. That's also the name that a lot of the collections of the standard lib use.This capacity must be greater than 0, so you should fail fast when that isn't the case. A good location to verify that is an
initblock that will be executed during object creation:init { require(capacity > 0)}Probably just a typo, but the variable
lessUsedItemshould be renamed toleastUsedItemorlfuItemor even justlfu.You should make
getandsetanoperator:operator fun get(key: K): V? { /*...*/ }operator fun set(key: K, value: V) { /*...*/ }Then you can use a more concise syntax. The following does the same with the second line using the new syntax enabled by the operator:
cache.set("Alice", 10)cache["Alice"] = 10Somewhere in
getyou seem to just want to add 1 by usingcalledObjects[hashKey]!!.plus(1). Instead of using theplusfunction you could simply use+ 1.You might want to consider using a
Longfor your counter instead of anInt. If your cache would run for a long time and would be heavily used it is conceivable that you would reach theInt's maximum value, triggering an integer overflow. Have a look at themaximum values in Kotlin to decide what would be appropriate for you.Instead of your
printLfufunction you could usetoStringto return a string representation of your object. That is how it is usually done:override fun toString(): String = "Cache: ${collectCache()}\n" + "Counters: ${collectCounters()}\n" + "IndexLinks: ${collectIndexLinks()}\n"Instead of
cache.printLfu()you could then simply useprintln(cache).And finally, let your IDE help you with the formatting. If you use IntelliJ IDEA or Android Studio you can use the shortcutCtrl+Alt+L.
Serious issues
Unfortunately, your code exhibits some unwanted/undefined behavior. I do not think it works as intended.
Adding the same key multiple times to the cache won't overwrite the previous value, it will just add it as an additional item. That will result in a crash in certain constellations because you generously used
!!throughout your code. Try this to reproduce:cache["Alice"] = 10cache["Bob"] = 20cache["Alice"] = 30cache["Charlie"] = 40Much less of a problem, but probably also not the desired behavior is that whenever there is a tie for theleast frequently used item you pick themost recently used of these. You probably would want to use theleast recently used to break the tie, though.
Finally, there is a huge issue where this algorithm simply doesn't work as it should:
cache["Alice"] = 10cache["Bob"] = 20cache["Charlie"] = 30cache["Charlie"]cache["Dean"] = 40cache["Dean"]println(cache)cache["Eve"] = 50println(cache)Using only the first letter of the names: A and B are never accessed so their counter is 0. C and D are both accessed once. When D is inserted B is dropped. So far so good, but when E is inserted now you would expect A to be dropped, but insteadD is dropped:
Counters: [(Alice, 0), (Charlie, 1), (Dean, 1)]Counters: [(Alice, 0), (Charlie, 1), (Eve, 0)](only
Countersis shown for brevity's sake)The reason for that is that both C and D were accessed the same number of times. Since you only compare the current and the previous item's counter to determine if they should switch positions (C and D), you will never reach the itembefore the previous item (A) that should be dropped.
These issues cannot be easily fixed, a major redesign of your algorithm is necessary. This is out of scope here so I decided to reviveyour previous StackOverflow question where you asked the same thing. I posted a fixed version of this algorithm that should work as intended.
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

