Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Update suffix-array.md for Linear Time approach#1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
PeroxideParadox wants to merge3 commits intocp-algorithms:main
base:main
Choose a base branch
Loading
fromPeroxideParadox:patch-1

Conversation

PeroxideParadox
Copy link

Hi@adamant-pwn ,

The previous version of suffix Array did not include the Linear time algorithm , I have modified the same using the Skew Algorithm to introduce theO(n) time complexity approach
It will benefit competitive programmers by optimizing time-critical problems.

Reference:Skew Algorithm Paper

@adamant-pwn
Copy link
Member

Thanks! I'm not very familiar with linear-time suffix array construction algorithms, but random places over the internet claim that SA-iS is simpler and faster than Kärkkäinen-Sanders, is that true? I wonder if we should describe SA-IS instead of DC3... 🤔

@PeroxideParadox
Copy link
Author

Thank you for the feedback! You're absolutely right , While the DC3 algorithm is efficient for large inputs and widely used when memory consumption is a constraint, I agree that SA-IS is simpler and faster in many practical scenarios, especially in competitive programming. Given this, I’m happy to switch the implementation to SA-IS, as it better aligns with the goals of this repository.

Would you prefer I update the pull request to describe SA-IS instead of DC3, and I can add that?

@adamant-pwn
Copy link
Member

Ideally we'd rely on some benchmarks, for example implement both algorithms (in an as simple way as possible) and see how they fare onLibrary Checker. Then, if the performance turns out to be a close match, we should describe the one that is simpler to understand, otherwise we should probably focus on the one with better performance.

@PeroxideParadox
Copy link
Author

on benchmarks SA-IS performs better than DC3 and is easier to implement as well
Screenshot 2024-10-26 at 2 55 12 PM

So let's switch to SA-IS and let me update the pull request
Is that good ?

@adamant-pwn
Copy link
Member

Yes, let's do it then.

@PeroxideParadox
Copy link
Author

I have updated the Pull Request with the latest changes as discussed

github-actionsbot added a commit that referenced this pull requestApr 11, 2025
@mhaytermhayter reopened thisApr 11, 2025
@mhayter
Copy link
Contributor

It seems like this slipped through the cracks
Is this ready to go?

Copy link
Member

@adamant-pwnadamant-pwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hi, thanks for the pull request, and sorry for the late review!

I only reviewed part of it, but I think we should add some explanation, in text, of what constitutes the algorithm (not just tldr), and also some proper explanation on why it is linear, so I think it makes sense to return to review once it is done.

Also a lot of itemized/numerated lists misrender because of missing newlines before the first list item, which should be fixed.

Also, so that I better understand this pull request, could you please tell if you have used AI when making it (to which extent, if yes), and if the code implementation is your own?


### Understanding L-Type and S-Type Suffixes

In the SA-IS algorithm, suffixes are classified as:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please check your suggestion against thepreview. Lists require a newline before the first item, otherwise they misrender.

- **S-type (Right)**: A suffix is S-type if it is lexicographically smaller than or equal to the suffix immediately following it.

For the string `banana$`, let's classify each suffix:
1. Starting from the end, `$` is considered S-type by definition.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This also requires a newline before the first list item.

| 3 | `ana$` | S |
| 2 | `nana$` | L |
| 1 | `anana$`| S |
| 0 | `banana$`| S |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I thinkbanana$ is actually larger thananana$?

The `L` and `S` types provide crucial information for sorting suffixes using induced sorting.
### Example:
For the string `s = "banana"`, The steps were :
1. The algorithm first classifies suffixes into L-type and S-type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This misses the newline.

### Example:
For the string `s = "banana"`, The steps were :
1. The algorithm first classifies suffixes into L-type and S-type.
2. It identifies LMS positions based on the L and S classifications.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Intro text says "the steps were", but the example above actually doesn't contain or explain any further steps.

if (n == 2) return (s[0] < s[1]) ? std::vector<int>{0, 1} : std::vector<int>{1, 0};
```
This part initializes the `sa_is` function, which constructs the suffix array for a given input vector `s`. It first handles edge cases:
- If the string is empty (`n == 0`), it returns an empty array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Newline needed before the first list item.


### 1. **Suffix Array Initialization and Base Cases**
```cpp
std::vector<int> sa_is(const std::vector<int>& s, int upper) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What isupper? It appears to be max possible character, but it's not explicitly stated anywhere.

Comment on lines +296 to +297
if (!ls[i]) sum_s[s[i]]++;
else sum_l[s[i] + 1]++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Shouldn't it be the other way around?ls[i] = true$\iff$ the suffix at$i$ is smaller than the suffix at$i+1$, so it means it is$S$-suffix?

2. **Identifying LMS Substrings**:
- SA-IS identifies LMS (Leftmost S-type) positions, which are boundaries between L-type and S-type suffixes. LMS substrings are critical as they serve as anchor points for the sorting process.

3. **Induced Sorting**:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think we really need to add further explanation, in text, of how exactly induced sorting works, and why recursive calls would be linear.

3. **Induced Sorting**:
- The algorithm first sorts LMS substrings recursively, then induces the order of the remaining suffixes. Sorting the LMS substrings is the key part of the algorithm, and once sorted, the remaining suffixes are arranged by their lexicographical order relative to LMS substrings.

4. **Recursive Call for LMS Substrings**:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Isn't this already included in the previous step?..

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@adamant-pwnadamant-pwnadamant-pwn requested changes

Requested changes must be addressed to merge this pull request.

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@PeroxideParadox@adamant-pwn@mhayter

[8]ページ先頭

©2009-2025 Movatter.jp