I am new to Spark and need help transposing the input dataframe below into the desired output df
(rows to columns) using PySpark or Spark SQL.
Input dataframe -
A B C D
1 2 3 4
10 11 12 13
......
........
Required Output (transposed) data
A 1
B 2
C 3
D 4
A 10
B 11
C 12
D 13
....
......
Ideally, I would also like to be able to pivot only a subset of the input columns, as per our requirement.
You can make a generalized function like the one below (inspired by my previous answer here):
def stack_multiple_col(df, cols=None, output_columns=["col", "values"]):
    """Stacks multiple columns in a dataframe;
    stacks all columns by default unless passed a list of column names."""
    if cols is None:  # a default argument cannot reference df, so resolve it here
        cols = df.columns
    # builds e.g. stack(4,"A",A,"B",B,"C",C,"D",D) as (col,values)
    return (f"""stack({len(cols)},{','.join(map(','.join,
        zip([f'"{c}"' for c in cols], cols)))}) as ({','.join(output_columns)})""")
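To make the generated SQL concrete, here is the same string-building logic inlined for the four sample columns (a standalone sketch that reproduces the expression without needing Spark):

```python
cols = ["A", "B", "C", "D"]
output_columns = ["col", "values"]

# same construction as in stack_multiple_col: pair each quoted column label
# with the column reference, then flatten into stack()'s argument list
expr = (f"""stack({len(cols)},{','.join(map(','.join,
    zip([f'"{c}"' for c in cols], cols)))}) as ({','.join(output_columns)})""")
print(expr)
# stack(4,"A",A,"B",B,"C",C,"D",D) as (col,values)
```

Passing that string to `df.selectExpr(...)` is what turns each input row into one output row per column.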
Sample run:
df.selectExpr(stack_multiple_col(df)).show()
+---+------+
|col|values|
+---+------+
| A| 1|
| B| 2|
| C| 3|
| D| 4|
| A| 10|
| B| 11|
| C| 12|
| D| 13|
+---+------+
df.selectExpr(stack_multiple_col(df,cols=['A','B'],output_columns=["A","B"])).show()
+---+---+
| A| B|
+---+---+
| A| 1|
| B| 2|
| A| 10|
| B| 11|
+---+---+
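To see what `stack` is doing row by row, here is a pure-Python sketch of the same row-to-column expansion (illustrative only; no Spark involved):

```python
rows = [(1, 2, 3, 4), (10, 11, 12, 13)]
cols = ["A", "B", "C", "D"]

# each input row fans out into one (column_name, value) pair per column,
# mirroring what stack() emits into the (col, values) output columns
stacked = [(name, value) for row in rows for name, value in zip(cols, row)]
print(stacked)
# [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('A', 10), ('B', 11), ('C', 12), ('D', 13)]
```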