Adding text annotations to a clustered scatter plot (tSNE)

2023-09-06 08:05:32 Author: 青衫如故

I have XY data (a 2D tSNE embedding of high-dimensional data) that I'd like to show as a scatter plot. The data are assigned to several clusters, so I'd like to color-code the points by cluster and then add a single label for each cluster that has the same color as the cluster and is located outside the cluster's points (as much as possible).

Any idea how to do this in R, using either ggplot2 with ggrepel or plotly?

Here's the example data (the XY coordinates and cluster assignments are in df, and the labels are in label.df) and the ggplot2 part of it:

library(dplyr)
library(ggplot2)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)

label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))

ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none")

Recommended answer

The geom_label_repel() function in the ggrepel package lets you add labels to a plot while trying to "repel" them so that they do not overlap with other elements. Below is a slight addition to your existing code: we summarize the data to get the coordinates where each label should go (here I chose the upper-left-ish region of each cluster, i.e. the min of x and the max of y) and join the result with your existing data frame containing the cluster labels. Pass this data frame to geom_label_repel() and map the column that holds the label text to the label aesthetic in aes().

library(dplyr)
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)

label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
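# One row per cluster: anchor each label at the cluster's min x and max y
label.df_2 <- df %>% 
  group_by(cluster) %>% 
  summarize(x = min(x), y = max(y)) %>% 
  # name the join key explicitly instead of relying on the automatic match
  left_join(label.df, by = "cluster")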

ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none") +
  ggrepel::geom_label_repel(data = label.df_2, aes(label = label))
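
One caveat with the approach above: the label layer only receives one row per cluster, so the labels repel from each other and from their own anchor points, but not from the rest of the cluster's points. A possible variation, sketched here under my own assumptions rather than taken from the answer, is to anchor each label at the cluster centroid and also pass every data point to geom_label_repel() with an empty label. ggrepel does not draw empty labels but still treats those points as obstacles. The names centers and repel_df are made up for this illustration, the values chosen for box.padding and seed are arbitrary, and max.overlaps requires a reasonably recent ggrepel release (0.9.0 or later, as far as I know).

```r
library(ggplot2)
library(ggrepel)

# Same simulated data as in the question
set.seed(1)
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i) {
  data.frame(x = rnorm(50, mean = i, sd = 1),
             y = rnorm(50, mean = i, sd = 1),
             cluster = i)
}))
df$cluster <- factor(df$cluster)

# One labelled row per cluster, anchored at the cluster centroid (base R)
centers <- aggregate(cbind(x, y) ~ cluster, data = df, FUN = mean)
centers$label <- paste0("cluster: ", centers$cluster)

# Every data point gets an empty label: nothing is drawn for it,
# but the visible labels are pushed away from it
points_blank <- transform(df, label = "")
cols <- c("x", "y", "cluster", "label")
repel_df <- rbind(points_blank[, cols], centers[, cols])

p <- ggplot(df, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_label_repel(
    data = repel_df,
    aes(label = label),
    box.padding = 1,         # arbitrary: extra room around each label
    min.segment.length = 0,  # always draw the connector to the centroid
    max.overlaps = Inf,      # do not drop labels in crowded regions
    seed = 1                 # arbitrary: makes the layout reproducible
  ) +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```

Because the label layer inherits color = cluster from the main ggplot() call, each label keeps its cluster's color. Whether the labels end up fully outside the point clouds depends on how much free space the plot has, so box.padding and the plot size may need some tuning.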