Given a string, find the first non-repeating character in it andreturn its index. If it doesn't exist, return -1.
Examples:
s = "leetcode"return 0.s = "loveleetcode",return 2.Note: You may assume the string contain only lowercase letters.
My solution
public class FirstUniqueCharacter { public int firstUniqChar(String s) { LinkedHashMap<Character, Integer> map = new LinkedHashMap<>(); for (char c : s.toCharArray()) { if (Objects.isNull(map.get(c))) { map.put(c, 1); } else { int count = (Integer) map.get(c); map.put(c, ++count); } } for (Map.Entry e : map.entrySet()) { int value = (int) e.getValue(); Character c = (Character) e.getKey(); if (value == 1) { return s.indexOf(c); } } return -1; }}The statistics shows it’s still slower than 60% of the solutions submitted in terms of speed and space. How can I improve it? Any comments regarding code readibility are most welcome.
- 1\$\begingroup\$
string contain only lowercase letters- Constraints are not aligned with theoriginal problem statement on Leetcode and stated in a way which is slightly confusing. Also, technically, it's not a quote.\$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024-05-06 15:30:15 +00:00CommentedMay 6, 2024 at 15:30
3 Answers3
You don’t needObjects.isNull(...). Simply using!= null would suffice, and avoids an extra function call so should be faster (unless the compiler can optimize the call out).
Your loop at the end is using raw types. You should use
for(Map.Entry<Character, Integer> e : map.entrySey()) {instead, for type safety. Your castings then are unnecessary.
There is no need to fetch the characterc = e.getKey() unlessvalue == 1 is true; you can move that inside theif for a minor performance gain.
Counting the character occurrences is slightly dangerous: you could overflow anInteger, or even aLong with a long enough string. Simply flagging the character as “seen once” or “more than once” avoids the counting overflow bug.
Themap.put(c, 1) call returns the previous value stored in the map, ornull if no value was stored. Instead of fetching the value, testing whether it was present or not, and then storing another value, why not store & fetch in one operation? Then, if a value was already present, you can flag it as occurring twice (or more).
if( map.put(c, 1) != null ) map.put(c, 2);Since you are no longer counting, you don’t need aLinkedHashMap<Character, Integer>. You just have 3 states: not seen, seen once, and seen more than once. ABoolean can cover this.Boolean.TRUE is seen once (unique),Boolean.FALSE is seen more than once, and not present (null) is never seen.
LinkedHashMap<Character, Boolean> unique = new LinkedHashMap<>();for(char c: ...) { if (unique.put(c, Boolean.TRUE) != null) unique.put(c, Boolean.FALSE);}We’ve saved a tiny bit of space, since we only have twoBoolean objects, instead of several (possibly interned)Integer objects. Much more importantly, we’ve avoided auto boxing, so this should be much faster.
We are still wasting time storing bothBoolean.TRUE andBoolean.FALSE successively on the third and subsequent occurrences of any character.
If we always storeBoolean.FALSE, then on the first occurrencenull will be returned. We can detect that and overwrite it withBoolean.TRUE instead, so the exceptional first occurrence has 2 mapput operations, but subsequent occurrences only use 1 mapput operation, for better performance.
if (unique.put(c, Boolean.FALSE) == null) unique.put(c, Boolean.TRUE);To truly gain speed and reduce memory usage, avoid clunkyHashSet<> memory structures, and store the data yourself. You are told you can assume only lowercase letters are used, so you can use anew byte[26] array for “not seen”, “seen once”, and “seen multiple times” storage. And use anew char[26] array to maintain encounter order of “first seen” characters.
You can use a bit more memory and store the index of the first seen characters in anew int[26], so you can avoid the linears.indexOf(c) search at the end. You could even use0 for not seen,index+1 for seen once, and-1 for seen multiple times, and avoid thenew byte[26] flag storage.
- 1\$\begingroup\$Never assume that a "letter" means something between a-z.compart.com/en/unicode/category/Ll Regards, Your Scandinavian fellow\$\endgroup\$TorbenPutkonen– TorbenPutkonen2019-05-02 05:38:16 +00:00CommentedMay 2, 2019 at 5:38
- \$\begingroup\$@TorbenPutkonen The task description explicitly says you can assume that.\$\endgroup\$RoToRa– RoToRa2019-05-02 11:21:40 +00:00CommentedMay 2, 2019 at 11:21
- 1\$\begingroup\$@RoToRa Actually it says “You may assume the string contains only lowercase letters.”. It doesn’t say anything about ‘a-z’. On the other hand , it doesn’t mention the vulgarities ofUnicode combining characters. The “C” version of the problem uses
char *for its strings, but that doesn’t preclude multibyte character encodings either. Even with ‘a-z’ restrictions, using EBCDIC encoding would complicate the storage into just 26 array entries. Usingbyte[256]storage would be better, but still wrong for some encodings.\$\endgroup\$AJNeufeld– AJNeufeld2019-05-02 14:31:53 +00:00CommentedMay 2, 2019 at 14:31
Pay attention to Constraints
While cracking algorithmic problems, pay very close attention to the constraints of the problem at hand. They can often give valuable clues on how to attack the problem.
If you're attending an interview, and the interviewer didn't mention any input constraints, it's always good to raise a question about it.
According to theproblem statement on Leetcode the input string is guaranteed to contain onlylowercase English letters. I.e. we have only up to26 unique characters in the string.
And also each character is guaranteed to be represented by a singlechar, which is not the case with some languages, but you're unlikely to encounter such scenario when tasked with an algorithmic problem.
UsingString.toCharArray() is Costly
This method haslinear time complexity (not constant) and requires additionalO(n) space.
Before Java 9 when the strings were backed by achar[], the methodtoCharArray() would return itscopy (because exposing the internal storage will break the encapsulation and undermine the immutability ofString). And starting from Java 9 so-called compact strings backed bybyte[] were introduced, andtoCharArray() generateschar[] on demand based on the encoding of a particular string (which for now is eitherUTF-16 orLATIN1).
For a large string, the memory footprint of the code you shared will be dominated by thechar[] produced bytoCharArray(), not by the map of characters which is confined to at most26 entries.
Instead of paying this cost, you can simply iterate over the character indices and reduce the space complexity fromO(n) toO(1).
Leveraging the problem constraints
This problem can be solved in asingle iteration over the given string and by utilizingO(1) additional space. More precisely, by using asingle integer array that has26 elements, storing character indices.
Here's how it can be implemented:
private static final int NOT_PRESENT = -1;private static final int DUPLICATE = -2;private static final char MIN_LETTER = 'a';public static int firstUniqueChar(String input) { return getFirst(letterIndices(input));}private static int[] letterIndices(String input) { final int[] indices = initializeIndices(); for (int i = 0; i < input.length(); i++) { int position = input.charAt(i) - MIN_LETTER; indices[position] = indices[position] == NOT_PRESENT ? i : DUPLICATE; } return indices;}private static int[] initializeIndices() { int[] indices = new int[26]; Arrays.fill(indices, NOT_PRESENT); return indices;}private static int getFirst(final int[] indices) { int first = NOT_PRESENT; for (int i: indices) { if (i == DUPLICATE || i == NOT_PRESENT) continue; first = first == NOT_PRESENT ? i : Math.min(first, i); } return first;}Note: one might ask what about all these method calls, aren't they causing performance overhead? (and the interviewer might ask it as well to test your knowledge)
To understand how to answer this question, you need to have some knowledge aboutJIT compilation. In short, JIT compiler identifies the most frequently utilized code parts (hotspots), then inlines method calls and reorders instructions to optimize performance. The shorter a method is, the quicker it gets optimized. There's no advantage neither in piling code into one method, no in making use of built-in functionality offered by the JDK (such asMath.min() andArrays.fill()).
Minor points
- "Never abbrev" (a joke from Martin Fowler). Don't skimp on characters, use proper names, code quality has no nothing to do with number of keystrokes spared.
firstUniqChar ->firstUniqueChar
Leetcode platform will not allow you this change, since it'll be looking for the methodfirstUniqChar. But it doesn't invalidate the point. You can define a properly named method and call it from the one declared by Leetcode. Naming code elements is not an easy task, and requires practice.
- There's no reason is for
firstUniqCharto be an instance method (again, Leetcode specific thing, the remedy is the same as mentioned above).
- There's nothing in bad performing anexplicit null-check
Please, don't confuse what I'm saying with a situation when application is suffering from null-infestation and there arenull-checks all over the place. The point is thatby itself null-check is not evil, the many JDK methods returningnull and we need to dial with it.
MethodObjects.isNull() along with some other methods was introduced in Java 8 to serve as a predicateObjects::isNull. It's not useful when you're not dialing with functional interfaces.
Compare this
if (Objects.isNull(map.get(c))) { ... }with this
if (map.get(c) == null) { ... }And it becomes clear more clear that it's basically aMap.containsKey() check in disguise, and there's no need to mess withnull.
Whenever you find yourself writing the following conditional logic:if a key is absent, then put some value, otherwise update existing one; then it's good use case forMap.merge().
So, this
LinkedHashMap<Character, Integer> map = new LinkedHashMap<>();for (char c : s.toCharArray()) { if (Objects.isNull(map.get(c))) { map.put(c, 1); } else { int count = (Integer) map.get(c); map.put(c, ++count); }}can be changed to this
var charCount = new LinkedHashMap<Character, Integer>(); for (int i = 0; i < s.length(); i++) { charCount.merge(s.charAt(i), 1, Integer::sum);}- Proper names
The name of the code element should communicate its purpose to the reader of the code.
The namemap fails to do so. It doesn't reveal the purpose of the variable. No doubt, it's a map, it has no chance to be anything but a map. It's redundant to include type in the name, Java is a statically typed language, hence if you'll to treat this variable as, let'sLocalTime, Java compiler will tell you (and your IDE will tell as well).
Instead, the name should tell what this map is intended to store. Therefore, consider names likecharCount,countByChar, etc.
In Java Use the FirstIndexOf() and LastIndexOf() if these both return the same number then its the unique character in the statring of the string return the index of it break the loop. This is the fastest the solution.
class Solution {public int firstUniqChar(String s) { for (int i = 0; i < s.length(); i++) { int first = s.indexOf(s.charAt(i)); int last = s.lastIndexOf(s.charAt(i)); if (first == last) { return i; }- 3\$\begingroup\$Welcome to Code Review! You have presented an alternative solution, but haven't reviewed the code. Pleaseedit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)readingHow to Answer.\$\endgroup\$Toby Speight– Toby Speight2024-05-03 08:53:23 +00:00CommentedMay 3, 2024 at 8:53
- 3\$\begingroup\$A statement such as "This is the fastest solution" really requires corroboration. Please provide your benchmark results for it and the other solutions you tested. (I doubt you're right, given that this scales as O(n²))\$\endgroup\$Toby Speight– Toby Speight2024-05-03 08:55:29 +00:00CommentedMay 3, 2024 at 8:55
- 3\$\begingroup\$The goal of this site is declared as follows: "Code Review is a question and answer site forseeking peer review of your code". However, the answer you provided contains no attempt to review the code shared by OP. Secondly, this claim"This is the fastest the solution" is just a subjective opinion, it should be backed by the code complexity analysis. If you take a look at the implementations of
indexOf()andlastIndexOf(), you'll find out that doesn't perform in constant time (as you probably assumed) and your claim is incorrect. Your code has quadratic time complexity.\$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024-05-03 09:32:27 +00:00CommentedMay 3, 2024 at 9:32 - 2\$\begingroup\$"Use the FirstIndexOf() and LastIndexOf()" - note that method names are incorrect\$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024-05-03 09:40:28 +00:00CommentedMay 3, 2024 at 9:40
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.
