.Net and Hadoop: what should I know, and what is available?

2023-09-04 01:28:35 Author: 习惯了灬寂寞

My question is about BigData in .Net. BigData is used to store and query huge amounts of data (Facebook, Google, Twitter, ...). Examples of BigData technologies are MapReduce, Hadoop, Dryad, ...

Microsoft dropped their Dryad (DryadLinq) alternative in favor of Hadoop (Dryad and the article), so I'd like to prepare myself for it and everything that has to do with it.

What is available now?

Hadoop connectors

SQL Server 2012 RC (do not use in production :))

Microsoft Big Data information

Where to learn more about releases and development?

Sign up for the tech preview

Question 1: What should I know about Hadoop that isn't unique to the .Net platform (how to query, specific patterns, architecture, ...) and will be useful in a .Net environment?

Question 2: Is there more information on Hadoop on the .Net platform than what I already know?

Recommended answer

It's a vague question, so here's a vague answer :)

Hadoop on its own is a tool to run map-reduce jobs in a cluster. It's highly optimized for performance, and a good deal of this optimization comes from distributing the data in a way that makes it easy to consume without incurring I/O penalties.
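To make the map-reduce idea concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, run_job) are made up for illustration, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence (hypothetical driver, single process)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node"]))
```

The point of the split is that map_phase only ever sees one record and reduce_phase only ever sees one key, which is what lets a framework spread the work across machines.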

For this you should read about HDFS and the internals that explain how this is done. In a nutshell, the input data is clumped together on nodes so that the processes run locally and read it sequentially (this is a property/limitation of HDFS).

This way you input your "BigData" and it gets split and processed in the most efficient way inside the cluster.
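The splitting and local processing described above can be sketched roughly like this. It is a toy model, not HDFS itself: the round-robin placement, the node names and all function names are assumptions for illustration, and real HDFS also replicates blocks and takes rack layout into account.

```python
def split_into_blocks(data, block_size):
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign block indexes to nodes round-robin (an assumed, simplified policy)."""
    placement = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


def run_locally(blocks, placement, task):
    """Each node runs the task only on the blocks it holds, one block at a time."""
    results = {}
    for node, block_ids in placement.items():
        for block_id in block_ids:
            results[block_id] = (node, task(blocks[block_id]))
    return results


if __name__ == "__main__":
    blocks = split_into_blocks("abcdefghij", block_size=4)
    placement = place_blocks(blocks, ["node-a", "node-b"])  # hypothetical node names
    print(run_locally(blocks, placement, task=len))
```

Because every task reads a block that already sits on its own node, no data has to cross the network before the computation starts, which is the I/O saving the answer refers to.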

That's all there is to Hadoop itself. There are tools that work on top of it and let you work with the data through higher-level abstractions (map-reduce is among the simplest procedures).

These include:

Pig (http://pig.apache.org/): a language for working with the map-reduce process and constructing more complex operations

Hive (http://hive.apache.org/): similar to Pig but more SQL-oriented

Cascading (http://www.cascading.org/): yet another one, more focused on data flow than on queries

Cascalog (https://github.com/nathanmarz/cascalog): based on Cascading, written in Clojure

HBase (http://hbase.apache.org/): a type of NoSQL database on top of HDFS

ElephantDB (https://github.com/nathanmarz/elephantdb): another NoSQL database for Hadoop

Specifics for .NET

For Hadoop on Azure (.Net), there's an introduction on MSDN here, with more info here, related to building Hadoop applications through their platform. It's only a CTP for now, but of course this will change.

Here's another good blog post about Hadoop and MapReduce, with code.

Additionally, there's a company that frequently publishes information about Hadoop: Cloudera. Check there regularly; the Cloudera page linked above covers all the concepts around Hadoop (it's pretty advanced, though).

I'm pretty sure this isn't what you were looking for, but I have no idea what you want, so at least I hope you can check out a few new projects that may help.

Also check Storm: https://github.com/nathanmarz/storm. It's not related to Hadoop, but it works on realtime scenarios, which Hadoop is not suited for.
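As a rough illustration of the batch versus realtime difference (this is not Storm's API; both function names are invented for the sketch), compare a count that is only available once all input has been read with one that is updated after every event:

```python
from collections import Counter


def batch_count(events):
    """Batch style: the result exists only after the whole input has been processed."""
    return Counter(events)


def streaming_count(events):
    """Streaming style: yield the updated count for each event as it arrives."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        yield event, counts[event]


if __name__ == "__main__":
    clicks = ["home", "cart", "home"]
    print(batch_count(clicks))
    for event, running_total in streaming_count(clicks):
        print(event, running_total)
```

A Hadoop job behaves like the first function: you wait for the job to finish. A realtime system behaves like the second: each incoming event produces an updated answer right away.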