Cassandra backup with EBS backup, Cassandra, EBS

2023-09-11 12:05:08 Author: 心亦冷♡情亦散

I am currently looking into how backup and restore can be done in Cassandra. We have set up a three-node cluster in AWS. I understand that we can take a snapshot with the nodetool snapshot tool, but it is a somewhat cumbersome process.

My idea is to make use of EBS snapshots, because they are more durable and easy to set up. The one problem I see with EBS is inconsistent backups. Hence, my plan is to run a script prior to taking the EBS snapshot. The script would run the flush command to write all the memtable data to disk (as SSTables) and then prepare hard links to the flushed SSTables. Once that is done, it would initiate the EBS snapshot. This way we can address the inconsistency issue we might face if we used EBS snapshots alone.
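A minimal sketch of the pre-snapshot script described above, not a definitive implementation. It assumes `nodetool` and the AWS CLI are installed and configured on the node; the volume ID, the keyspace name, and the tag format are hypothetical placeholders. With `dry_run=True` it only prints the commands it would run.

```python
import shlex
import subprocess
from datetime import datetime, timezone


def build_backup_commands(volume_id, tag, keyspace=None):
    """Return the ordered commands for one node: flush, hard-link snapshot, EBS snapshot."""
    # Explicit flush, mirroring the plan in the question.
    flush = ["nodetool", "flush"]
    # nodetool snapshot creates hard links to the current SSTables under the given tag.
    snapshot = ["nodetool", "snapshot", "-t", tag]
    if keyspace:
        # Restrict both steps to one keyspace; leave it out to cover all keyspaces.
        flush.append(keyspace)
        snapshot.append(keyspace)
    ebs = [
        "aws", "ec2", "create-snapshot",
        "--volume-id", volume_id,
        "--description", f"cassandra backup {tag}",
    ]
    return [flush, snapshot, ebs]


def run_backup(volume_id, keyspace=None, dry_run=True):
    """Print each command and, unless dry_run is set, execute it and stop on failure."""
    tag = datetime.now(timezone.utc).strftime("backup-%Y%m%dT%H%M%SZ")
    commands = build_backup_commands(volume_id, tag, keyspace)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical volume ID and keyspace, for illustration only.
    run_backup("vol-0123456789abcdef0", keyspace="my_keyspace", dry_run=True)
```

The script would have to run on every node, against the EBS volume that holds that node's data directory.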

Please let me know if you see any issues with this approach, or share your suggestions.

Recommended answer

Being immutable, SSTables do indeed help a lot when it comes to backups.
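To illustrate why immutability helps, here is a small sketch of the hard-link idea that snapshots rely on. It does not touch Cassandra at all: the file name and directory layout are made up for the demonstration, and it assumes a filesystem that supports hard links. Because the file is never modified in place, a hard link is a cheap and stable copy that survives removal of the original.

```python
import os
import tempfile


def hard_link_snapshot(sstable_path, snapshot_dir):
    """Hard-link one file into snapshot_dir and return the path of the link."""
    os.makedirs(snapshot_dir, exist_ok=True)
    link_path = os.path.join(snapshot_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)
    return link_path


def demo():
    """Return (same_inode, survived) for a fake SSTable in a temporary directory."""
    with tempfile.TemporaryDirectory() as data_dir:
        # Hypothetical file name standing in for a flushed SSTable.
        sstable = os.path.join(data_dir, "example-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"immutable sstable bytes")

        link = hard_link_snapshot(sstable, os.path.join(data_dir, "snapshots", "tag1"))
        # Both names point at the same data on disk, so no bytes were copied.
        same_inode = os.stat(sstable).st_ino == os.stat(link).st_ino

        # Simulate the live file being removed later; the link keeps the data alive.
        os.remove(sstable)
        with open(link, "rb") as f:
            survived = f.read() == b"immutable sstable bytes"
        return same_inode, survived


if __name__ == "__main__":
    print(demo())
```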

Your idea sounds OK for situations where everything is healthy on your cluster. Actually, Cassandra is consistency-configurable (if I say eventually consistent, some people may be offended here, hehe), and as the system itself may not be fully consistent at a given time, you cannot say your backup will be either. On the other hand, one of the beauties of Cassandra (and NoSQL models) is that it tends to recover pretty well, which is true for Cassandra in most situations (quite the opposite of relational databases, which are very sensitive to data loss). It is very unlikely that you will end up with a bunch of useless data if you have at least fully preserved SSTable files.

Be aware that EBS snapshots are block-level. So, when you have a filesystem on top of the volume, that may be a concern as well. Fortunately, any modern filesystem has journaling nowadays and is pretty reliable, so that shouldn't be a problem. Still, keeping your data on a separate partition is good practice, because it lowers the chance of something else writing to it right after a full flush.

You may have some lost replicas when you eventually need to restore your cluster, requiring you to run nodetool repair, which, if you have done it before, is a bit painful and takes very long for large amounts of data. (But running repair regularly is recommended anyway, especially if you delete a lot.)
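As a rough sketch of the repair step after a restore, the commands could be generated per node as below. The host names and keyspace are hypothetical, and it assumes `nodetool` can reach each node with `-h`. The `-pr` option repairs only each node's primary range, so it is meant to be run on every node in turn.

```python
import shlex
import subprocess


def build_repair_commands(hosts, keyspace=None):
    """Return one `nodetool repair -pr` command per host, in the order given."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        if keyspace:
            # Limit the repair to one keyspace; leave it out to repair all of them.
            cmd.append(keyspace)
        commands.append(cmd)
    return commands


def run_repairs(hosts, keyspace=None, dry_run=True):
    """Run the repairs one node at a time; with dry_run, only print the commands."""
    commands = build_repair_commands(hosts, keyspace)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical addresses for a three-node cluster.
    run_repairs(["10.0.0.1", "10.0.0.2", "10.0.0.3"], keyspace="my_keyspace")
```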

Another thing to consider is hinted handoffs (writes whose row owners are missing, but which are kept by other nodes until the owners come back). I don't know what happens to them when you flush, but I guess they are kept only in memory and in the commit logs.

And, of course, do a full restore test before you assume this will work in the future.

I don't have much experience with Cassandra, but the backup solutions I have heard about for it are whole-cluster replicas in another region or datacenter, rather than cold backups like snapshots. That is probably more expensive, but also more reliable than raw disk snapshots like the ones you are trying to do.