@@ -80,4 +80,24 @@ public List<List<String>> findDuplicate(String[] paths) {
return result ;
}
+ /** Answers to follow-up questions:
+ * 1. Imagine you are given a real file system, how will you search files: DFS or BFS?
+ * A: Both BFS and DFS can do the job. BFS needs extra memory for its queue, but it also benefits from
+ *    memory locality (entries in the same directory are visited together), so BFS could be faster in practice.
+ *
+ * 2. If the file content is very large (GB level), how will you modify your solution?
+ * A: First group files by size, since files with different sizes are guaranteed to differ. Within each group,
+ *    hash a small part of each file (e.g. with MD5 or SHA-256); only when the digests match do we compare the
+ *    contents byte by byte. (A sketch of this size -> hash -> byte-compare pipeline follows below.)
+ *
+ * 3. If you can only read the file 1 KB at a time, how will you modify your solution?
+ * A: The approach stays the same: feed each 1 KB chunk into the hash incrementally, and still compare
+ *    byte by byte only when necessary.
+ *
+ * 4. What is the time complexity of the modified solution? What are its most time-consuming and
+ *    memory-consuming parts, and how could they be optimized?
+ * A: O(n^2 * k) in the worst case, where n is the number of files and k is the file length, since a file may
+ *    have to be compared against every other file. Comparing files (by size, by hash, and eventually byte by
+ *    byte) is the most time-consuming part; keeping a hash for every file is the most memory-consuming part.
+ *    The size and chunked-hash filters are the optimization: they keep the expensive byte-by-byte step rare.
+ *
+ * 5. How do you make sure the duplicate files you find are not false positives?
+ * A: The size comparison and hash check only narrow down candidates; the final byte-by-byte comparison is
+ *    what rules out false positives.
+ */
+
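+    // Hedged sketch (illustration only, not part of the original solution): one possible way to apply
+    // the follow-up answers above to real files on disk. It groups candidate files by size, then by a
+    // SHA-256 digest streamed 1 KB at a time, leaving byte-by-byte verification of the surviving groups
+    // as the final step. Method and constant names (groupLikelyDuplicates, chunkedSha256, CHUNK_SIZE)
+    // are hypothetical; it assumes imports of java.io.InputStream, java.nio.file.Files, java.nio.file.Path,
+    // java.security.MessageDigest and java.util.*, and uses `throws Exception` purely to keep the sketch short.
+    private static final int CHUNK_SIZE = 1024; // follow-up 3: read at most 1 KB per call
+
+    private static Map<String, List<Path>> groupLikelyDuplicates(List<Path> files) throws Exception {
+        // Step 1: bucket by size -- files of different sizes can never be identical.
+        Map<Long, List<Path>> bySize = new HashMap<>();
+        for (Path f : files) {
+            bySize.computeIfAbsent(Files.size(f), k -> new ArrayList<>()).add(f);
+        }
+        // Step 2: within each size bucket, bucket again by a digest of the content, streamed in
+        // CHUNK_SIZE pieces so a GB-level file never has to fit in memory (follow-up 2).
+        Map<String, List<Path>> byDigest = new HashMap<>();
+        for (List<Path> sameSize : bySize.values()) {
+            if (sameSize.size() < 2) continue; // a unique size cannot be a duplicate
+            for (Path f : sameSize) {
+                byDigest.computeIfAbsent(chunkedSha256(f), k -> new ArrayList<>()).add(f);
+            }
+        }
+        byDigest.values().removeIf(group -> group.size() < 2);
+        // Step 3 (not shown): compare each remaining group byte by byte to rule out the
+        // astronomically unlikely hash collisions (follow-up 5).
+        return byDigest;
+    }
+
+    private static String chunkedSha256(Path file) throws Exception {
+        MessageDigest md = MessageDigest.getInstance("SHA-256");
+        try (InputStream in = Files.newInputStream(file)) {
+            byte[] buffer = new byte[CHUNK_SIZE];
+            int read;
+            while ((read = in.read(buffer)) != -1) {
+                md.update(buffer, 0, read); // incremental hashing, at most 1 KB per read
+            }
+        }
+        return java.util.Base64.getEncoder().encodeToString(md.digest());
+    }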
}