dplyr & tidyr: calculating many linear models at once with factors

2023-09-03 10:17:35 Author: 平凡之路

After getting deeper into the tidyverse, I started fitting many linear models at once, as described in this post. That is, I do something along these lines:

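library(dplyr)
library(tidyr)
library(purrr)

# toy data: one response and two numeric covariates
df <- data.frame(y = rnorm(10), 
                 x1 = runif(10),
                 x2 = runif(10))

df %>%
  gather(covariate, value, x1:x2) %>%   # long format: one row per (covariate, value) pair
  group_by(covariate) %>% 
  nest() %>%                            # one nested data frame per covariate
  mutate(model = map(.x = data, .f = ~lm(y ~ value, data = .))) %>%   # fit y ~ value per group
  mutate(rsquared = map_dbl(.x = model, .f = ~summary(.)$r.squared))  # extract R^2 per model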

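The problem is that this approach breaks when the covariates have different types (for example, one numeric and one factor), because gather() coerces the whole value column to a single common type (here, character). For example: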

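# x3 is categorical; stringsAsFactors = TRUE reproduces the pre-R 4.0 default
df <- data.frame(y = rnorm(10), 
                 x1 = runif(10),
                 x3 = sample(c("a", "b", "c"), 10, replace = TRUE),
                 stringsAsFactors = TRUE)

df %>%
  gather(covariate, value, x1:x3) %>% 
  sapply(class)   # class of each column after reshaping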

This prints the following warning, followed by the column classes:

Warning message:
attributes are not identical across measure variables; they will be dropped 

          y   covariate       value 
  "numeric" "character" "character"

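The value column is now character, so the nest() trick no longer works: lm() treats a character predictor as a factor, so every covariate, including the numeric x1, would enter its model as a factor.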
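To make the failure concrete, here is a small base-R sketch (the seed and the column name `x1_chr` are illustrative assumptions, not from the question). It fits the same x1 values once as numbers and once as character strings, which is how they end up after gather().

```r
# Illustrative data; the seed is arbitrary
set.seed(1)
df <- data.frame(y = rnorm(10), x1 = runif(10))

# Numeric predictor: intercept plus one slope
fit_num <- lm(y ~ x1, data = df)

# Same values stored as character (as after gather()):
# lm() converts them to a factor with one level per distinct value
df$x1_chr <- as.character(df$x1)
fit_chr <- lm(y ~ x1_chr, data = df)

length(coef(fit_num))  # 2
length(coef(fit_chr))  # 10: intercept plus 9 dummy variables
```

With 10 distinct values in 10 rows, the character version is a saturated model that reproduces the data exactly, so its R² says nothing about x1.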

Is there a tidy way to handle this?

Recommended answer

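You can convert the types while fitting the models, but, as pointed out in the comments, proceed with caution, because automatic type guessing can have unintended consequences.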

If you still want to convert, you can use type_convert from readr on the whole data frame, or base R's type.convert on just the value vector.
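A rough sketch of what the two helpers do to one group's character `value` column. The example vectors are made up; `as.is = TRUE` is an assumption chosen so non-numeric strings stay character (recent R versions warn when `as.is` is left unspecified), and the readr part only runs if readr is installed.

```r
# Made-up values standing in for one group's `value` column after gather()
num_group <- c("0.27", "0.81", "0.55")  # came from a numeric covariate
chr_group <- c("a", "c", "b")           # came from a categorical covariate

# type.convert() guesses the most specific type each vector can hold
conv_num <- type.convert(num_group, as.is = TRUE)
conv_chr <- type.convert(chr_group, as.is = TRUE)

class(conv_num)  # "numeric"
class(conv_chr)  # "character"; lm() still treats it as a factor

# readr::type_convert() applies the same kind of guessing to every
# character column of a data frame
if (requireNamespace("readr", quietly = TRUE)) {
  grp <- data.frame(y = c(1.2, 0.4, 2.0), value = num_group,
                    stringsAsFactors = FALSE)
  str(readr::type_convert(grp))  # value is now numeric
}
```

Because the guess is made per group, the numeric covariate's group gets a numeric predictor and the categorical one keeps its levels.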

Using type_convert:

mutate(model = map(.x = data , .f = ~lm(y ~ value, data = readr::type_convert(.))))

Using type.convert:

mutate(model = map(.x = data, .f = ~lm(y ~ type.convert(value, as.is = TRUE), data = .)))

Used as part of the chain, either option gives the expected result in this case:

df %>%
    gather(covariate, value, x1:x3) %>% 
    group_by(covariate) %>% 
    nest() %>% 
    # convert value per group: numeric for x1, character (treated as factor) for x3
    mutate(model = map(.x = data, .f = ~lm(y ~ type.convert(value, as.is = TRUE), data = .))) %>% 
    mutate(rsquared = map_dbl(.x = model, .f = ~summary(.)$r.squared))

# A tibble: 2 x 4
  covariate              data    model   rsquared
      <chr>            <list>   <list>      <dbl>
1        x1 <tibble [10 x 2]> <S3: lm> 0.33176960
2        x3 <tibble [10 x 2]> <S3: lm> 0.06150498
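An alternative that is not part of the answer above: skip gather() entirely and map over the covariate names, so each column keeps its own type and nothing has to be guessed back. This is a sketch assuming dplyr, purrr and tibble; `reformulate()` is base R and builds `y ~ x1` and `y ~ x3` from strings. The seed is arbitrary.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(1)
df <- data.frame(y = rnorm(10),
                 x1 = runif(10),
                 x3 = sample(c("a", "b", "c"), 10, replace = TRUE))

# One row per covariate name; each model is fitted on the original
# wide data, so x1 stays numeric and x3 stays categorical
models <- tibble(covariate = c("x1", "x3")) %>%
  mutate(
    model = map(covariate, ~ lm(reformulate(.x, response = "y"), data = df)),
    rsquared = map_dbl(model, ~ summary(.x)$r.squared)
  )

models
```

Here x1 enters its model as a numeric slope and x3 as a factor, with no conversion step.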
