Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Allow for in-place restores.#1094

Open
mattl-netflix wants to merge18 commits into4.x
base:4.x
Choose a base branch
Loading
fromfeature/in_place_restore_4

Conversation

mattl-netflix
Copy link
Contributor

6369b6a 2024-05-02 | Cease stopping Cassandra in the restore process and reveal the ability to restore to a staging area and then import or move the data depending on whether Cassandra is running.
4ece8d3 2024-05-01 | Move getBackupDirectories to BackupRestoreUtil and change its signature to cease requiring an IConfiguration but rather just take the data directory path as a String.
6b3d45a 2024-04-10 | Cease wiping the data directory in advance of restore. It is counter-intuitive and will be empty in practice in most cases anyway.
f10289f 2024-04-10 | Consolidate restore status API.
4d6d07f 2024-03-09 | Remove backup racs as that configuration is not used in practice and is not relevant to whether a restore should be attempted.
aa3dffe 2024-03-09 | Remove redundant vertical whitespace.
8c23800 2024-03-09 | Remove redundant logging.
646973d 2024-03-09 | Remove all non-javadoc comments that don't provide warnings to future maintainers.
8d18f51 2024-03-09 | Remove waitForCompletion parameter to download method which is always false in practice. Remove final modifier on static method. Make static property an instance variable instead. Use parameterized constructor call in place of two lines.
ce6c7c1 2024-03-09 | Remove unused code from our restore tooling.

@mattl-netflixmattl-netflixforce-pushed thefeature/in_place_restore_4 branch from6369b6a to9557275CompareMay 5, 2024 20:11
String restoreFileName =
PATH_JOINER.join(dataDir, keyspace, columnFamily, indexDir, fileName);
return_ = new File(restoreFileName);
return_ =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

While we are in here, consider renaming this variable, its not idiomatic java

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done.

String dataDirectory, String monitoringFolder) throws IOException {
ImmutableSet.Builder<Path> backupPaths = ImmutableSet.builder();
Path dataPath = Paths.get(dataDirectory);
if (Files.exists(dataPath) && Files.isDirectory(dataPath))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done.

Files.newDirectoryStream(keyspaceDirPath, Files::isDirectory)) {
for (Path columnfamilyDirPath : keyspaceStream) {
Path backupDirPath =
Paths.get(columnfamilyDirPath.toString(), monitoringFolder);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I believe you can usecolumnFamilyDirPath.resolve(monitoringFolder) here instead of converting back and forth between string and path. Reference:https://docs.oracle.com/javase/8/docs/api/index.html?java/nio/file/Path.html

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done.

for (Path columnfamilyDirPath : keyspaceStream) {
Path backupDirPath =
Paths.get(columnfamilyDirPath.toString(), monitoringFolder);
if (Files.exists(backupDirPath) && Files.isDirectory(backupDirPath)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Same comment here that theFile.exists call is redundant.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done.

if (instanceState.getRestoreStatus() != null
&& instanceState.getRestoreStatus().getStatus() != null
&& instanceState.getRestoreStatus().getStatus() == Status.STARTED) {
if (instanceState.isRestoring()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I like this cleanup 🙌

mattl-netflix reacted with thumbs up emoji
failedImports.addAll(importData(keyspace, table, tableDir.toString()));
}
} else {
recursiveMove(Paths.get(srcDir), Paths.get(configuration.getDataFileLocation()));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Since the move is non-atomic I can see a weird edge case where we start Cassandra while the move is happening (and we have like 10,000+ files). Should we have some sort of "fence" that tells the subsystem that starts Cassandra it needs to wait while this happens?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes we should. I will need to address this in a follow-up, however. This is analogous to Cassandra getting started accidentally before the full download is complete the in the model we use today. Which hasn't happened yet, knock on wood. It might requires some rethinking about consolidating Cassandra's remediation policy in our environment. That is, the between Priam and systemd.

if (Files.isRegularFile(path)) {
Files.move(path, destination.resolve(path.getFileName()));
} else if (Files.isDirectory(path)) {
recursiveMove(path, destination.resolve(path.getFileName()));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If am reading this right, if the source path we are currently iterating over is a directory we recurse by moving the files in that directory into[destination/directory name]. Cant we just move the entire directory instead of recursing then?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My goal here was for the the operation to be strictly additive. That is, if there are already files present I want to ensure that for them to get overwritten that the operator does something explicit.

Note that to that end I have changed the policy regarding whether to replace files if they are already present in a follow-up commit.

} else if (Files.isDirectory(path)) {
recursiveMove(path, destination.resolve(path.getFileName()));
} else {
throw new IOException("Failed determining type of inode is " + path);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Do we need this error or just skip/log? Likely an edge case but do we really want to fail if thats the case?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In the new model using walkFileTree() we always continue.


// Cleanup local data
File dataDir = new File(config.getDataFileLocation());
if (dataDir.exists() && dataDir.isDirectory()) FileUtils.cleanDirectory(dataDir);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Question about this: I know we don't need to stopcass anymore because we useimport but I was surprised to see we no longer clean the directory. Is this for the case where we want to restore a subset of the data but keep whats there? Do we still need the option to clean the directory for some restores?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The common cases for restores are:

  1. Restores into empty clusters (e.g., red-black)
  2. Restored into live clusters (e.g., emergencies)

In neither case do we want any data to be deleted. I would prefer that in outlier cases where that is required that there is a separate explicit job or command that the operator needs to run to best manage expectations. It is scary today that we could configure Priam such that a restart deletes all of the data.

"config.doesCassandraStartManually() is set to True, hence Cassandra needs to be started manually ...");
List<String> failedImports = cassOps.importAll(config.getRestoreDataLocation());
if (!failedImports.isEmpty()) {
instanceState.endRestore(Status.FAILED, LocalDateTime.now());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

do we want to use the list of failed imports somewhere to log what failed and be able to resume? instead of starting over from the beginning?

Also if we only need the boolean why return the list? I would rather us use the list but if we are not going to I mean.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is a good point. In case we don't specify --copy-data and Cassandra's behavior is to remove the origin files regardless of successful import, we want a log of what failed so we can at least get a sense of the scope of what is missing. I will log the failures.

…hs when fetching incrementals. The iterators are always fully materialized. Also, remove the now-redundant method from BackupRestoreUtil that merely wrapped the MetaProxy call.
… false in practice.Remove final modifier on static method.Make static property an instance variable instead.Use parameterized constructor call in place of two lines.
…is not relevant to whether a restore should be attempted.
…intuitive and will be empty in practice in most cases anyway.
…re to cease requiring an IConfiguration but rather just take the data directory path as a String.
…y to restore to a staging area and then import or move the data depending on whether Cassandra is running.
@mattl-netflixmattl-netflixforce-pushed thefeature/in_place_restore_4 branch from9557275 toe1ebaa5CompareMarch 17, 2025 17:38
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@jrwestjrwestjrwest left review comments

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants
@mattl-netflix@jrwest

[8]ページ先頭

©2009-2025 Movatter.jp