Best practice for Cassandra setup on EC2 with large amounts of data

2023-09-11 23:40:04 Author: 〃爱情和情爱哪个更痛

I am doing a large migration from physical machines to EC2 instances.

As of right now I have 3 x.large nodes, each with 4 instance store drives (RAID-0, 1.6TB). After I set this up I remembered that "The data on an instance store volume persists only during the life of the associated Amazon EC2 instance; if you stop or terminate an instance, any data on instance store volumes is lost."

What do people usually do in this situation? I am worried that if one of the boxes crashes, all of the data on that box will be lost unless it is 100% replicated on another.

I read in this link (http://www.hulen.com/?p=326) that these guys use ephemeral drives and periodically back up the content using EBS drives and snapshots.

In the question "how to take backup of aws ec2 instance/ephemeral storage?", people claim that you cannot back up ephemeral data onto EBS snapshots.

Is my best choice to use a few EBS drives, RAID 0 them together, and take snapshots directly from them? I know this is probably the most expensive solution, but it seems to make the most sense.

Any info would be great.

Thank you for your time.

Recommended answer

I have been running Cassandra on EC2 for over 2 years. To address your concerns, you need to form a proper availability architecture on EC2 for your Cassandra cluster. Here is a bullet list for you to consider:

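- Consider setting up your cluster across at least 3 zones.
- Use NetworkTopologyStrategy with EC2Snitch / EC2MultiRegionSnitch to propagate a replica of your data to each zone. This means that the machines in each zone, combined, will hold your entire data set. For example, strategy_options would look like {us-east:3}.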

The above two tips should satisfy basic availability in AWS. If your queries are sent using LOCAL_QUORUM, your application will be fine even if one zone goes down.
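To make the LOCAL_QUORUM claim concrete, here is a minimal Python sketch of the arithmetic. It assumes the replication factor is a multiple of the zone count, so that NetworkTopologyStrategy places the same number of replicas in every zone. The keyspace name `my_keyspace` is hypothetical, and the `us-east` datacenter name is taken from the strategy_options example above.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_alive(replication_factor: int, zones: int, zones_down: int) -> int:
    """Replicas still reachable when some zones are down.

    Assumption: RF is a multiple of the zone count, so each zone holds
    RF / zones replicas of every row.
    """
    if replication_factor % zones != 0:
        raise ValueError("this sketch assumes RF is a multiple of the zone count")
    per_zone = replication_factor // zones
    return per_zone * (zones - zones_down)


def local_quorum_ok(replication_factor: int, zones: int, zones_down: int) -> bool:
    """True if enough replicas remain to satisfy LOCAL_QUORUM."""
    alive = replicas_alive(replication_factor, zones, zones_down)
    return alive >= quorum(replication_factor)


def create_keyspace_cql(keyspace: str, datacenter: str, replication_factor: int) -> str:
    """Build a CQL 3 statement for a NetworkTopologyStrategy keyspace."""
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': {replication_factor}}};"
    )


if __name__ == "__main__":
    # "my_keyspace" is a hypothetical name used only for illustration.
    print(create_keyspace_cql("my_keyspace", "us-east", 3))
    for down in range(3):
        status = "OK" if local_quorum_ok(3, 3, down) else "unavailable"
        print(f"{down} zone(s) down -> LOCAL_QUORUM {status}")
```

With a replication factor of 3 spread over 3 zones, a quorum needs 2 replicas, so losing one zone still leaves enough replicas, while losing two does not.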

If you are concerned about 2 zones going down at once (I don't recall that happening in AWS in the 2 years I have used it), you can also add another region to your cluster.

With the above, if any node dies for any reason, you can restore it from nodes in other zones. After all, Cassandra was designed to provide you with this kind of availability.

About EBS vs Ephemeral:

I have always been against using EBS volumes in anything production because EBS is one of the worst AWS services in terms of availability. The volumes go down several times a year, and their downtime usually cascades to other AWS services like ELBs and RDS. They are also effectively network-attached storage, so every read and write has to go over the network. Don't use them. Even DataStax doesn't recommend them:

http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/../../cassandra/architecture/architecturePlanningEC2_c.html

About backups:

I use a solution called Priam (https://github.com/Netflix/Priam), which was written by Netflix. It can take a nightly snapshot of your cluster and copy everything to S3. If you enable incremental_backups, it also uploads incremental backups to S3. If a node goes down, you can trigger a restore on that specific node with a simple API call. Restoring this way is a lot faster and does not put much streaming load on your other nodes. I also added a patch to it which lets you do fancy things like bringing up multiple DCs inside one AWS region.
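As a rough illustration of how a nightly snapshot and incremental backups combine at restore time, the sketch below picks the objects a node would need to reach a given point in time. It is not Priam's code or API. The `BackupObject` type, its fields, and the selection rule (latest snapshot at or before the target, plus every later incremental up to the target) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class BackupObject:
    kind: str            # "snapshot" or "incremental"
    taken_at: datetime   # when the backup was taken
    key: str             # hypothetical S3 object key


def files_to_restore(backups: List[BackupObject], target: datetime) -> List[BackupObject]:
    """Return the latest snapshot at or before `target`, followed by the
    incrementals taken after that snapshot and up to `target`, oldest first."""
    snapshots = [b for b in backups if b.kind == "snapshot" and b.taken_at <= target]
    if not snapshots:
        raise ValueError("no snapshot available at or before the target time")
    base = max(snapshots, key=lambda b: b.taken_at)
    incrementals = sorted(
        (
            b for b in backups
            if b.kind == "incremental" and base.taken_at < b.taken_at <= target
        ),
        key=lambda b: b.taken_at,
    )
    return [base] + incrementals


if __name__ == "__main__":
    # Hypothetical backup history for one node.
    history = [
        BackupObject("snapshot", datetime(2014, 1, 1, 0, 0), "snap-0101"),
        BackupObject("incremental", datetime(2014, 1, 1, 6, 0), "inc-0101-06"),
        BackupObject("snapshot", datetime(2014, 1, 2, 0, 0), "snap-0102"),
        BackupObject("incremental", datetime(2014, 1, 2, 6, 0), "inc-0102-06"),
        BackupObject("incremental", datetime(2014, 1, 2, 12, 0), "inc-0102-12"),
    ]
    for obj in files_to_restore(history, datetime(2014, 1, 2, 8, 0)):
        print(obj.kind, obj.key)
```

Because the restore starts from the most recent snapshot, only the incrementals taken since then have to be fetched, which fits the point above that restoring from S3 avoids streaming the data from the other nodes.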

You can read about my setup here: http://aryanet.com/blog/shrinking-the-cassandra-cluster-to-fewer-nodes

Hope the above helps.