2
\$\begingroup\$

I've been working on the classic LFU (Least Frequently Used) cache running in O(1) time problem lately and, as a student, I kind of struggled with the algorithms I found online. I got most of the idea behind them, but it still was a nightmare replicating them. I've tried to start coding it from scratch. Here's my result (Kotlin):

package main// Object stored in the cache, could be anythingclass Object (val key: String, val value: Int)class LeastFrequentlyUsed (private val cacheLimit :Int) {   // Holds objects   private val cache = mutableListOf<Object>()   // Keeps track of hashes -> index, used in get(key) => hash(key) => index   private val indexLinks = mutableMapOf<Int,Int>()   // Keeps track of calls   private val calledObjects = mutableMapOf<Int, Int>()   // Keeps track of the last inserted index   private var lastInserted :Int = 0   // Collectors and printer for the LFU cache state   private fun collectCache() :MutableList<Pair<String,Int>> {      val collector = mutableListOf<Pair<String,Int>>()      cache.forEach {         collector.add(Pair(it.key, it.value))      }      return collector   }   private fun collectCounters () :MutableList<Pair<String,Int>> {      val collector = mutableListOf<Pair<String,Int>>()      cache.forEach {         collector.add(Pair(          it.key,            calledObjects[it.key.hashCode()]!!         ))      }      return collector   }   private fun collectIndexLinks () :MutableList<Pair<String,Int>> {      val collector = mutableListOf<Pair<String,Int>>()      cache.forEach {         collector.add(Pair(            it.key,            indexLinks[it.key.hashCode()]!!         ))      }      return collector   }   fun printLfu() {      println("Cache: ${collectCache()}")      println("Counters: ${collectCounters()}")      println("IndexLinks: ${collectIndexLinks()}")      println()   }   // LFU setter   fun set(key :String, value: Int) {      val hashKey = key.hashCode()      // If we reached the cache limit we need to take away the less used item in the cache      if (lastInserted == cacheLimit) {         // Lower lastInserted of 1 just to bring it back to max later         lastInserted--         // Brings up least used item, stored in the last index of the cache         val lessUsedItem = cache.last()         // Removes the least used item from cache         cache.removeLast()         // Clears indexLinks from the element removed from cache         indexLinks.remove(lessUsedItem.key.hashCode())         // Clears calledObjects from the element removed from cache         calledObjects.remove(lessUsedItem.key.hashCode())      }      // Adds the new element to cache      cache.add(Object(key,value))      // Adds the new element to indexLinks      indexLinks[hashKey] = lastInserted      // Adds the new element to calledObjects      calledObjects[hashKey] = 0      // Turns back lastInserted to max = cache limit      lastInserted++   }   // LFU getter   fun get (key :String) :Object? {      val hashKey = key.hashCode()      // Verify if the key is present in the LFU      indexLinks[hashKey] ?: return null      // Updates the calledObjects counter      calledObjects[hashKey] = calledObjects[hashKey]!!.plus(1)      // Gets the current element position in the cache      val currentItemIndex = indexLinks[hashKey]!!      if (currentItemIndex > 0 ) {         // Gets the previous element position in the cache, which is the current -1         val previousItemIndex = currentItemIndex - 1         // Gets the key of the previous item to find it in calledObjects and see its number of calls         val previousItemKey = cache[previousItemIndex].key.hashCode()         // Checks which is the most called: the current or the previous.         // If is the current, swap the two         if (calledObjects[hashKey]!! > calledObjects[previousItemKey]!!) {            // Adjusts the position in the cache            val temp = cache[previousItemIndex]            cache[previousItemIndex] = cache[currentItemIndex]            cache[currentItemIndex] = temp            // Adjusts their indices in the indexLists map            indexLinks[hashKey] = indexLinks[hashKey]!! - 1            indexLinks[previousItemKey] = indexLinks[previousItemKey]!! + 1         }      }      return cache[indexLinks[hashKey]!!]   }}fun main () {   val lfu = LeastFrequentlyUsed(3)   lfu.set("Alice", 10)   lfu.set("Bob", 20)   lfu.set("Charlie",30)   lfu.printLfu()   lfu.get("Charlie")   lfu.get("Charlie")   lfu.set("Dean", 50)   lfu.get("Dean")   lfu.printLfu()   lfu.get("gianni")}

The code actually works, but would someone more experienced tell me how near to an actual solution I am? Any problems I did not see or hints for improvement?

Git gist of the code

toolic's user avatar
toolic
16.4k6 gold badges29 silver badges220 bronze badges
askedJun 15, 2024 at 22:17
NicolaM94's user avatar
\$\endgroup\$

2 Answers2

3
\$\begingroup\$

helpful comments

I didn't find this comment very helpful.

// Object stored in the cache, could be anythingclass Object (val key: String, val value: Int)

It turns out we're actually talking aboutaCachedObject, so I'd appreciate it ifyou would describe it in that way.The running system contains many objects,and only a subset of them are cached.That is the subset we restrict our focus to ATM.

Additionally, I assume we have some distinctionbetween Things that are RAM resident versusThings that are stored on disk.You have not yet helped me to name those.If you threw me a bone, like "DB record",then already I'd be on familiar groundand we would be on the same page.

And having read all the way to the bottom,no, it turns out it can't be "anything",it is not an OOP object, it won't be e.g. a Callable.It is restricted to a(String, Int) tuple.It's just an entry in a key-value store.

odd whitespace

Why would we introducefoo() :Intwhen we could more naturally writefoo(): Int ?I was initially thinking in terms of a Ruby or Common Lisp:kwkeyword symbol.

I speak several ALGOL languages.I confess Kotlin is not my native tongue.Don't y'all have a pretty printer which adjudicates such details?If you're trying to communicate technical ideas to humans,don't obfuscate them with idiosyncratic spacing.

superlative

"Less" is relative.

      // If ... we need to take away the LESS used item ...         val lessUsedItem = ...

Prefer "least" when you intend the superlative.There exists no item less used than the LFU,it being theleast frequently used.

hash collision

This makes perfect sense.

         cache.removeLast()

What is going on here?

         indexLinks.remove(lessUsedItem.key.hashCode())         ...         calledObjects.remove(lessUsedItem.key.hashCode())

If there's a collision on the hash code, how could that possibly be correct?

Rather than eliding important information with the.hashCode() call,you might be better off working with the actual.keyand letting the underlying datastructure sweat the details.That datastructure might be different from the currently proposed mutable map.

Do me a favor.Go search the keyspace for a hash collision.(Youwill find one!)Then add that key pair to your automated test suite.It will help this codebase to better evolve.

      // Adds the new element to indexLinks      indexLinks[hashKey] = lastInserted

Same remark.We don't care to first see if we're overwriting a non-empty entry?!?I can't say that I'm following thecalledObjects assignment, either.Oh, wait,get() explains that it seems to be some sort of ref countor frequency count, got it.

comments are for the "why", not the "how"

These comments are absurd, please stop it, just stop it.

      // Updates the calledObjects counter      ...      // Gets the current element position in the cache      ...         // Gets the previous element position in the cache, which is the current -1

The code already eloquently told usexactly what the comments describe.Yet still I am left hungry, with more questions to ask.Like "what doescalledObjectsmean?"

I am starting to believe that you take "call" as asynonym for "use" or "reference", good.I wish you had taught me that in the Review Context,or closer to the top of the source code.

answeredJun 15, 2024 at 23:37
J_H's user avatar
\$\endgroup\$
0
\$\begingroup\$

RegardingObject

  1. get should not return anObject, it should only return itsvalue.
  2. Object is now only used internally inLeastFrequentlyUsed. To keep the namespace clean you should make it a private nested class ofLeastFrequentlyUsed.
  3. You say thatObject "could be anything", but in reality it can only use a String key and an Int value. Instead of changing the comment you should make it come true: Use generics.
  4. Object is not a good name since it may be confused with ajava.lang.Object that is the base class of everything in Java. I would recommend to use something more innocuous instead, likeItem.

You class should then look something like this:

class LeastFrequentlyUsed<K, V> {    fun get(key: K): V? {        // ...    }    // ...    // Item that is stored in the cache    private class Item<K, V>(        val key: K,        val value: V,    )}

Other minor improvements

  1. LeastFrequentlyUsed is not a good class name because it doesn't describe what that is. The class is a cache, so a better name would beLeastFrequentlyUsedCache. Also thelfu variable inmain would be better calledcache.
    You could even try to implementMutableMap<K, V> because a lot of the semantics are the same (you should only consider this when the rest is done), then you should name your classLeastFrequentlyUsedMap or something similar instead.

  2. A better name forcacheLimit would becapacity. That's also the name that a lot of the collections of the standard lib use.

  3. This capacity must be greater than 0, so you should fail fast when that isn't the case. A good location to verify that is aninit block that will be executed during object creation:

    init {    require(capacity > 0)}
  4. Probably just a typo, but the variablelessUsedItem should be renamed toleastUsedItem orlfuItem or even justlfu.

  5. You should makeget andset anoperator:

    operator fun get(key: K): V? { /*...*/ }operator fun set(key: K, value: V) { /*...*/ }

    Then you can use a more concise syntax. The following does the same with the second line using the new syntax enabled by the operator:

    cache.set("Alice", 10)cache["Alice"] = 10
  6. Somewhere inget you seem to just want to add 1 by usingcalledObjects[hashKey]!!.plus(1). Instead of using theplus function you could simply use+ 1.

  7. You might want to consider using aLong for your counter instead of anInt. If your cache would run for a long time and would be heavily used it is conceivable that you would reach theInt's maximum value, triggering an integer overflow. Have a look at themaximum values in Kotlin to decide what would be appropriate for you.

  8. Instead of yourprintLfu function you could usetoString to return a string representation of your object. That is how it is usually done:

    override fun toString(): String =    "Cache: ${collectCache()}\n" +    "Counters: ${collectCounters()}\n" +    "IndexLinks: ${collectIndexLinks()}\n"

    Instead ofcache.printLfu() you could then simply useprintln(cache).

  9. And finally, let your IDE help you with the formatting. If you use IntelliJ IDEA or Android Studio you can use the shortcutCtrl+Alt+L.

Serious issues

Unfortunately, your code exhibits some unwanted/undefined behavior. I do not think it works as intended.

  1. Adding the same key multiple times to the cache won't overwrite the previous value, it will just add it as an additional item. That will result in a crash in certain constellations because you generously used!! throughout your code. Try this to reproduce:

    cache["Alice"] = 10cache["Bob"] = 20cache["Alice"] = 30cache["Charlie"] = 40
  2. Much less of a problem, but probably also not the desired behavior is that whenever there is a tie for theleast frequently used item you pick themost recently used of these. You probably would want to use theleast recently used to break the tie, though.

  3. Finally, there is a huge issue where this algorithm simply doesn't work as it should:

    cache["Alice"] = 10cache["Bob"] = 20cache["Charlie"] = 30cache["Charlie"]cache["Dean"] = 40cache["Dean"]println(cache)cache["Eve"] = 50println(cache)

    Using only the first letter of the names: A and B are never accessed so their counter is 0. C and D are both accessed once. When D is inserted B is dropped. So far so good, but when E is inserted now you would expect A to be dropped, but insteadD is dropped:

    Counters: [(Alice, 0), (Charlie, 1), (Dean, 1)]Counters: [(Alice, 0), (Charlie, 1), (Eve, 0)]

    (onlyCounters is shown for brevity's sake)

    The reason for that is that both C and D were accessed the same number of times. Since you only compare the current and the previous item's counter to determine if they should switch positions (C and D), you will never reach the itembefore the previous item (A) that should be dropped.

These issues cannot be easily fixed, a major redesign of your algorithm is necessary. This is out of scope here so I decided to reviveyour previous StackOverflow question where you asked the same thing. I posted a fixed version of this algorithm that should work as intended.

answeredJun 18, 2024 at 0:37
tyg's user avatar
\$\endgroup\$

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.