How to redirect Scala Spark Dataset.show to a log4j logger

2023-09-06 14:07:26 Author: △我是疯子,但疯的真实

The Spark API docs show how to get a pretty-printed snippet of a Dataset or DataFrame sent to stdout.

Can this output be directed to a log4j logger? Alternatively, can someone share code that produces output formatted like df.show()?

Is there a way to do this that lets stdout go to the console both before and after the .show() output is pushed to the logger?

http://spark.apache.org/docs/latest/sql-programming-guide.htm

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

Recommended answer

The showString() function in teserecter's answer comes from the Spark source code (Dataset.scala).

You can't call that function from your own code because it is package-private, but you can place the following snippet in a file named DatasetShims.scala in your source tree and mix the trait into your classes to access it.

package org.apache.spark.sql

trait DatasetShims {
  implicit class DatasetHelper[T](ds: Dataset[T]) {
    def toShowString(numRows: Int = 20, truncate: Int = 20, vertical: Boolean = false): String =
      "\n" + ds.showString(numRows, truncate, vertical)
  }
}
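
A minimal usage sketch, assuming the DatasetShims trait above is compiled into your project, Spark 2.3 or later (where showString accepts the vertical argument), and the log4j 1.x API (org.apache.log4j.Logger) on the classpath. The object name ShowToLogger, the render helper, and the sample rows are hypothetical and only for illustration.

```scala
import org.apache.log4j.Logger
import org.apache.spark.sql.{DataFrame, DatasetShims, SparkSession}

// Hypothetical driver object; mixing in DatasetShims brings the
// implicit DatasetHelper (and so toShowString) into scope.
object ShowToLogger extends DatasetShims {
  // Assumption: log4j 1.x API is available; swap in your own logger if not.
  private val log = Logger.getLogger(getClass.getName)

  // Builds the same table layout that df.show() prints, as a String.
  def render(df: DataFrame, numRows: Int = 20): String =
    df.toShowString(numRows)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("show-to-logger")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative in-memory data instead of people.json.
    val df = Seq((30, "Andy"), (19, "Justin")).toDF("age", "name")

    println("before")      // plain stdout, untouched
    log.info(render(df))   // the table is handed to the logger
    println("after")       // plain stdout, untouched

    spark.stop()
  }
}
```

Because toShowString only builds a string and never redirects stdout, anything printed before or after the log call still reaches the console as usual. Where the table itself ends up depends on your log4j appender configuration.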
 