Spark: reading a CSV file from S3 using Scala
Tags: Spark, file, Scala, csv

2023-09-11 09:57:13 Author: brave


I am writing a Spark job and trying to read a text file using Scala. The following works fine on my local machine.

  import scala.io.Source

  // myHashMap is assumed to be a mutable map defined elsewhere in the job.
  val myFile = "myLocalPath/myFile.csv"
  for (line <- Source.fromFile(myFile).getLines()) {
    val data = line.split(",")
    myHashMap.put(data(0), data(1).toDouble)
  }

Then I tried to make it work on AWS with the code below, but it does not seem to read the entire file properly. What is the proper way to read a text file like this from S3? Thanks a lot!

  val credentials = new BasicAWSCredentials("myKey", "mySecretKey");
  val s3Client = new AmazonS3Client(credentials);
  val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myFile.csv"));

  val reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent()));

  var line = ""
  while ((line = reader.readLine()) != null) {
    val data = line.split(",")
    myHashMap.put(data(0), data(1).toDouble)
    println(line);
  }

Solution

I think I got it to work as follows:

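    import scala.io.Source

    // The object key here includes the path prefix inside the bucket.
    val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myPath/myFile.csv"))

    // getLines() returns a lazy iterator over the object's content stream.
    val myData = Source.fromInputStream(s3Object.getObjectContent()).getLines()
    for (line <- myData) {
        val data = line.split(",")
        myMap.put(data(0), data(1).toDouble)
    }

    println(" my map : " + myMap.toString())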
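A likely reason the Java-style loop in the question misbehaves: in Scala an assignment evaluates to Unit, so `(line = reader.readLine()) != null` never tests the line that was read. The sketch below shows two ways to read "key,value" lines from any InputStream using only the standard library. It is an illustration under assumptions, not the poster's exact code: `CsvStreamSketch`, `parseLine`, `readWithSource`, `readWithReader` and `sampleStream` are hypothetical names, the in-memory stream stands in for `s3Object.getObjectContent()`, the data is assumed to be UTF-8, and the result is an immutable Map rather than the mutable `myMap` used above.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import scala.io.Source

object CsvStreamSketch {

  // Hypothetical helper: split one "key,value" line into a (String, Double) pair.
  private def parseLine(line: String): (String, Double) = {
    val data = line.split(",")
    data(0).trim -> data(1).trim.toDouble
  }

  // Variant 1: scala.io.Source, the same approach as the solution above.
  // toMap consumes the lazy iterator before the source is closed.
  def readWithSource(in: InputStream): Map[String, Double] = {
    val source = Source.fromInputStream(in, "UTF-8")
    try {
      source.getLines().filter(_.trim.nonEmpty).map(parseLine).toMap
    } finally {
      source.close()
    }
  }

  // Variant 2: BufferedReader, with the end-of-stream check applied to the
  // value returned by readLine() instead of to an assignment expression.
  def readWithReader(in: InputStream): Map[String, Double] = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    try {
      Iterator
        .continually(reader.readLine())
        .takeWhile(_ != null)
        .filter(_.trim.nonEmpty)
        .map(parseLine)
        .toMap
    } finally {
      reader.close()
    }
  }

  // Stand-in for s3Object.getObjectContent(): sample data held in memory.
  def sampleStream(): InputStream =
    new ByteArrayInputStream("a,1.5\nb,2.0\nc,3.25\n".getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    println(readWithSource(sampleStream()))
    println(readWithReader(sampleStream()))
  }
}
```

With the AWS SDK on the classpath, passing `s3Object.getObjectContent()` in place of `sampleStream()` should work the same way, since both are plain InputStreams. Closing the source or reader in `finally` also closes the underlying stream once the data has been read.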

 