文章/答案/技术大牛

发布

社区首页 >专栏 >全网最全 | R语言中的方差分析汇总

全网最全 | R语言中的方差分析汇总

邓飞

发布于 2021-09-27 08:48:31

3.1K00

代码可运行

文章被收录于专栏：育种数据分析之放飞自我育种数据分析之放飞自我

运行总次数：0

代码可运行

一文展示R语言中的方差分析常用模型 #2021.9.11

方差分析是一个全新的思路，它采用的是变异分解的思路，将组内组件分开，查看显著性。

变异分解，和数量遗传学的创立也密不可分，比如

表型 = 基因+ 环境

更进一步：表型 = 加性效应 + 非加性效应 + 环境

更更进一步：表型 = 加性效应 + 显性效应 + 上位性效应 + 环境

育种值是加性效应的部分

杂种优势是显性和上位性效应的部分

基因与环境互作是：环境*基因的效应

另外还有重复力效应（个体永久环境效应）、母体效应、窝别效应等等，都是使用表型数据剖分的形式进行计算和评估。

很多人分析数据，想看一下显著性与否，显著的话就说明有差异，具体差异是多少，需要进行多重比较。所以，先要有方差分析，才有显著性，只有显著了，才可以进行多重比较。先后顺序不能错。

方差分析，还有一定的前提假定。需要进行检验。

方差分析后，多重比较也有很多方法。

好在，现在的R语言足够友好，各种功能都已经打包好了，直接拿来用就行了。

下面看我的总结：

1. 方差分析的假定

上面这个思维导图，也可以看出，方差分析有三大假定：正态，独立和齐次，如果不满足，可以使用广义线性模型或者混合线性模型，或者广义线性混合模型去分析。

「本次我们的主题有：」

2. 数据来源

这里，我们使用的数据来源于R包agridat，它是讲农业相关的论文，书籍中相关的数据收集在了一起，更加符合我们的背景。

包的下载地址：https://cran.r-project.org/web/packages/agridat/index.html

「包的介绍」

❝「agridat: Agricultural Datasets」Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more. ❞

「包的安装方式：」

install.packages("agridat")

3. 单因素方差分析

❝比如一个处理有A，B，C三个水平，观测值y，想看一下这个处理是否达到显著性水平，这就可以用到方差分析了。 ❞

「数据描述：」

❝Corn yield and nitrogen fertilizer treatment with field characteristics for the Las Rosas farm, Rio Cuarto, Cordoba, Argentina. ❞

data(lasrosas.corn)
dat <- lasrosas.corn
str(dat)

「数据结构：」

> str(dat)
'data.frame': 3443 obs. of  9 variables:
 $ year : int  1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
 $ lat  : num  -33.1 -33.1 -33.1 -33.1 -33.1 ...
 $ long : num  -63.8 -63.8 -63.8 -63.8 -63.8 ...
 $ yield: num  72.1 73.8 77.2 76.3 75.5 ...
 $ nitro: num  132 132 132 132 132 ...
 $ topo : Factor w/ 4 levels "E","HT","LO",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ bv   : num  163 170 168 177 171 ...
 $ rep  : Factor w/ 3 levels "R1","R2","R3": 1 1 1 1 1 1 1 1 1 1 ...
 $ nf   : Factor w/ 6 levels "N0","N1","N2",..: 6 6 6 6 6 6 6 6 6 6 ...

这里数据有很多列，但是我们要演示单因素方差分析，这里的因素为nf，自变量（Y变量）是yield，想要看一下nf的不同水平是否达到显著性差异。

「建模：」

Y变量：yield

因子：nf

「R中的建模代码：」

 m1 = aov(yield ~ nf, data=dat)

m1为模型保存的名称
aov为R中的方差分析代码
yield为数据中的Y变量，这里是yield
~，波浪号前面为Y变量，后面为X变量
nf为分析的因子变量
dat为数据

「结果：」

> m1 = aov(yield ~ nf, data=dat)
> summary(m1)
              Df  Sum Sq Mean Sq F value   Pr(>F)    
nf             5   23987    4797    12.4 6.08e-12 ***
Residuals   3437 1330110     387                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看出，nf之间达到极显著水平，可以进行多重比较。

「方差分析的显著性和多重比较有何关系？？？」

❝方差分析，一般会得到显著性（即P值小于0.05，说明显著，小于0.01，说明极显著，大于0.05，说明不显著），显著的意思是因素之间至少有一对水平达到显著性差异，具体是那一对呢？有几对呢？这就需要用到多重比较。所以，多重比较是在方差分析达到显著性之后进行的，只有显著了（P值小于0.05）才有能进行多重比较。多重比较的方法有很多，比如LSD，Tukey，Bonferroni，Holm等等，我们后面系统的介绍。 ❞

4. 单因素随机区组

这里区组的设置，可以控制一些环境误差。

「数据描述：」

❝Switchback trial in dairy with three treatments ❞

data(lucas.switchback)
dat <- lucas.switchback
str(dat)

「数据结构：」

> str(dat)
'data.frame': 36 obs. of  5 variables:
 $ cow   : Factor w/ 12 levels "C1","C10","C11",..: 1 5 6 7 8 9 10 11 12 2 ...
 $ trt   : Factor w/ 3 levels "T1","T2","T3": 1 2 3 1 2 3 1 2 3 1 ...
 $ period: Factor w/ 3 levels "P1","P2","P3": 1 1 1 1 1 1 1 1 1 1 ...
 $ yield : num  34.6 22.8 32.9 48.9 21.8 25.4 30.4 35.2 30.8 38.7 ...
 $ block : Factor w/ 3 levels "B1","B2","B3": 1 1 1 1 1 1 2 2 2 3 ...

「建模：」

Y变量：yield

因子：trt

区组：block

「R中的建模代码：」

m2 = aov(yield ~ block +trt, data=dat)
summary(m2)

「结果：」

> summary(m2)
            Df Sum Sq Mean Sq F value Pr(>F)  
block        2   30.9   15.46   0.306 0.7385  
trt          2  273.8  136.88   2.709 0.0823 .
Residuals   31 1566.1   50.52                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看出，trt之间达到没有极显著水平。

5. 二因素方差分析（无交互）

这里区组的设置，可以控制一些环境误差。

无交互的意思是，二因素之间不考虑互作。

「数据描述：」

❝Switchback trial in dairy with three treatments ❞

data(lucas.switchback)
dat <- lucas.switchback
str(dat)

「数据结构：」

> str(dat)
'data.frame': 36 obs. of  5 variables:
 $ cow   : Factor w/ 12 levels "C1","C10","C11",..: 1 5 6 7 8 9 10 11 12 2 ...
 $ trt   : Factor w/ 3 levels "T1","T2","T3": 1 2 3 1 2 3 1 2 3 1 ...
 $ period: Factor w/ 3 levels "P1","P2","P3": 1 1 1 1 1 1 1 1 1 1 ...
 $ yield : num  34.6 22.8 32.9 48.9 21.8 25.4 30.4 35.2 30.8 38.7 ...
 $ block : Factor w/ 3 levels "B1","B2","B3": 1 1 1 1 1 1 2 2 2 3 ...

「建模：」

Y变量：yield
因子1：trt
因子2：period
区组：block

「R中的建模代码：」

m3 = aov(yield ~ block +trt +period, data=dat)
summary(m3)

「结果：」

> summary(m3)
            Df Sum Sq Mean Sq F value Pr(>F)  
block        2   30.9   15.46   0.308 0.7374  
trt          2  273.8  136.88   2.725 0.0823 .
period       2  109.5   54.74   1.090 0.3497  
Residuals   29 1456.6   50.23                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看出，trt之间达到没有极显著水平，period之间没有达到显著性水平。

6. 二因素方差分析（有交互）

这里区组的设置，可以控制一些环境误差。

交互的意思是，二因素之间考虑互作。

「数据描述：」

❝Switchback trial in dairy with three treatments ❞

data(lucas.switchback)
dat <- lucas.switchback
str(dat)

「数据结构：」

> str(dat)
'data.frame': 36 obs. of  5 variables:
 $ cow   : Factor w/ 12 levels "C1","C10","C11",..: 1 5 6 7 8 9 10 11 12 2 ...
 $ trt   : Factor w/ 3 levels "T1","T2","T3": 1 2 3 1 2 3 1 2 3 1 ...
 $ period: Factor w/ 3 levels "P1","P2","P3": 1 1 1 1 1 1 1 1 1 1 ...
 $ yield : num  34.6 22.8 32.9 48.9 21.8 25.4 30.4 35.2 30.8 38.7 ...
 $ block : Factor w/ 3 levels "B1","B2","B3": 1 1 1 1 1 1 2 2 2 3 ...

「建模：」

Y变量：yield
因子1：trt
因子2：period
区组：block

「R中的建模代码：」

m4 = aov(yield ~ block +trt*period, data=dat)
summary(m4)

注意，这里的trt*period是R中公式的简写，表示trt + period + trt:period，其中trt：period表示互作效应。

「结果：」

> summary(m4)
            Df Sum Sq Mean Sq F value Pr(>F)  
block        2   30.9   15.46   0.339 0.7160  
trt          2  273.8  136.88   2.997 0.0681 .
period       2  109.5   54.74   1.199 0.3183  
trt:period   4  315.0   78.75   1.725 0.1761  
Residuals   25 1141.6   45.66                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看出，trt之间达到没有极显著水平，period之间没有达到显著性水平，trt和period交互没有达到显著水平。

7. 多因素方差分析

这里区组的设置，可以控制一些环境误差。

多于两个因素的方差分析

「数据描述：」

❝Switchback trial in dairy with three treatments ❞

data(lucas.switchback)
dat <- lucas.switchback
str(dat)

「数据结构：」

> str(dat)
'data.frame': 36 obs. of  5 variables:
 $ cow   : Factor w/ 12 levels "C1","C10","C11",..: 1 5 6 7 8 9 10 11 12 2 ...
 $ trt   : Factor w/ 3 levels "T1","T2","T3": 1 2 3 1 2 3 1 2 3 1 ...
 $ period: Factor w/ 3 levels "P1","P2","P3": 1 1 1 1 1 1 1 1 1 1 ...
 $ yield : num  34.6 22.8 32.9 48.9 21.8 25.4 30.4 35.2 30.8 38.7 ...
 $ block : Factor w/ 3 levels "B1","B2","B3": 1 1 1 1 1 1 2 2 2 3 ...

「建模：」

Y变量：yield
因子1：trt
因子2：period
因子3：cow
区组：block

「R中的建模代码：」

m5 = aov(yield ~ block + trt*period + cow, data=dat)
summary(m5)

注意，这里的trt*period是R中公式的简写，表示trt + period + trt:period，其中trt：period表示互作效应。

「结果：」

> summary(m5)
            Df Sum Sq Mean Sq F value   Pr(>F)    
block        2   30.9   15.46  11.155 0.000926 ***
trt          2  273.8  136.88  98.739 9.96e-10 ***
period       2  109.5   54.74  39.486 6.49e-07 ***
cow          9 1428.8  158.76 114.523 8.29e-13 ***
trt:period   4    5.6    1.41   1.015 0.429042    
Residuals   16   22.2    1.39                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看出，trt之间达到极显著水平，period之间达到显著性水平，trt和period交互没有达到显著水平，cow达到极显著水平。

8. 裂区试验方差分析

裂区试验，包括主区（wplot）和裂区（splot），其中裂区是镶嵌在主区中的，主区和裂区的残差不一样，在模型中需要特殊定义。

「数据描述：」

❝Strip-split plot of barley with fertilizer, calcium, and soil factors. ❞

data(cox.stripsplit)
dat <- cox.stripsplit
str(dat)

「数据结构：」

> str(dat)
'data.frame': 96 obs. of  5 variables:
 $ rep    : Factor w/ 4 levels "R1","R2","R3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ soil   : Factor w/ 3 levels "S1","S2","S3": 1 1 1 1 1 1 1 1 2 2 ...
 $ fert   : Factor w/ 4 levels "F0","F1","F2",..: 1 1 2 2 3 3 4 4 1 1 ...
 $ calcium: Factor w/ 2 levels "C0","C1": 1 2 1 2 1 2 1 2 1 2 ...
 $ yield  : num  4.91 4.63 4.76 5.04 5.38 6.21 5.6 5.08 4.94 3.98 ...

「建模：」

Y变量：yield
主区：fert
裂区：soil
区组：brep

「R中的建模代码：」

m6 = aov(yield ~ rep + soil*fert + Error(rep/fert),data=dat)
summary(m6)

注意，这里的Error(rep/fert)是R中公式的定义残差，主要用于不同因素使用不同残差的情况，这里fert是主区。

「结果：」

> summary(m6)

Error: rep
    Df Sum Sq Mean Sq
rep  3   6.28   2.093

Error: rep:fert
          Df Sum Sq Mean Sq F value Pr(>F)  
fert       3  7.221  2.4071   3.562 0.0604 .
Residuals  9  6.082  0.6758                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
          Df Sum Sq Mean Sq F value  Pr(>F)   
soil       2  1.927  0.9633   7.155 0.00146 **
soil:fert  6  0.688  0.1147   0.852 0.53433   
Residuals 72  9.693  0.1346                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看出，soil达到极显著，fert不显著，soil和fert的互作不显著。

9. 正态性检验

方差分析中，结果是否可信，在于数据是否满足假定条件。方差分析的假定包括数据正态性，数据的方差齐性，数据的独立性，其中可以检验的假定有：

数据的正态性
数据的齐性

这里，我们介绍如何对数据的正态性进行检验。

可以使用球性检验（Shapiro-Wilk）检验数据的正态性，也可以用qqplot查看残差的图，判断数据的正态性，也可以对数据做直方图，查看数据的正态性。

「数据描述：」

❝A classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks. Each half of a fractional factorial design confounding the NPK interaction was used on 3 of the plots. ❞

data(npk)
dat <- npk
str(dat)

「数据结构：」

> str(dat)
'data.frame': 24 obs. of  5 variables:
 $ block: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ N    : Factor w/ 2 levels "0","1": 1 2 1 2 2 2 1 1 1 2 ...
 $ P    : Factor w/ 2 levels "0","1": 2 2 1 1 1 2 1 2 2 2 ...
 $ K    : Factor w/ 2 levels "0","1": 2 1 1 2 1 2 2 1 1 2 ...
 $ yield: num  49.5 62.8 46.8 57 59.8 58.5 55.5 56 62.8 55.8 ...

一般分析时，我们仅对Y变量进行正态性检验，如果是单因素或者多因素，也可以根据因素分组进行正态性检验，数据量大时，对于稍微偏态的数据，即使不太符合正态分布，也不影响结论。

这里，我们对yield进行正态性检验

「作直方图」

hist(dat$yield)

可以看到，yield大体符合正态分布。

「做qq图」

使用car软件包中的qqPlot函数。

library(car)
qqPlot(dat$yield)

可以看到，数据基本落在置信区间里面，数据符合正态分布。

「使用Shapiro-Wilk」检验数据正态分布

> shapiro.test(dat$yield)

 Shapiro-Wilk normality test

data:  dat$yield
W = 0.97884, p-value = 0.8735

可以看到，P值为0.8735，数据符合正态分布，与上图显示的结果一致。

10. 齐性检验

方差分析中，我们对结果是否自信，在于数据是否满足假定条件，方差分析的假定条件包括数据正态性，数据的方差齐性，数据的独立性，其中可以检验的假定有：

数据的正态性
数据的齐性

这里，我们介绍如何对数据的齐性进行检验。齐性检验，是检验不同样本的总体方差是否相同，是根据特定的模型，需要考虑不同的因子放到模型中，不能单独对某一列变量进行齐性检验。

「齐性检验：」

Bartlet检验
Levene检验

「数据描述：」

❝A classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks. Each half of a fractional factorial design confounding the NPK interaction was used on 3 of the plots. ❞

data(npk)
dat <- npk
str(dat)

「数据结构：」

> str(dat)
'data.frame': 24 obs. of  5 variables:
 $ block: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ N    : Factor w/ 2 levels "0","1": 1 2 1 2 2 2 1 1 1 2 ...
 $ P    : Factor w/ 2 levels "0","1": 2 2 1 1 1 2 1 2 2 2 ...
 $ K    : Factor w/ 2 levels "0","1": 2 1 1 2 1 2 2 1 1 2 ...
 $ yield: num  49.5 62.8 46.8 57 59.8 58.5 55.5 56 62.8 55.8 ...

比如上面数据中，相对N进行单因素方差分析，查看不同N的水平是否满足方差齐性，可以这样操作：

「Bartlett检验」

Bartlett检验可以比较多个总体的方差

bartlett.test(yield ~ N,data=dat)

结果：

> bartlett.test(yield ~ N,data=dat)

 Bartlett test of homogeneity of variances

data:  yield by N
Bartlett's K-squared = 0.057652, df = 1, p-value = 0.8102

结果可以看出，不同的N之间，方差满足齐性要求。

「Levene检验」

Bartlett检验对数据的正态性非常敏感，而Levene检验是一种非参数检验方法，使用范围更广。

library(car)
leveneTest(yield ~ N, data=dat)

结果：

> leveneTest(yield ~ N, data=dat)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0054 0.9421
      22

结果可以看出，P值为0.9421，大于0.05，说明数据符合方差齐性。

11. 多重比较

这里，我们介绍一下方差分析中的多重比较方法。

「agricolae包」

❝This package contains functionality for the Statistical Analysis of experimental designs applied specially for field experiments in agriculture and plant breeding. ❞

「多重比较方法：」

LSD
scheffe
HSD（Tukey）
SNK
Duncan

「数据描述：」

❝A classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks. Each half of a fractional factorial design confounding the NPK interaction was used on 3 of the plots. ❞

data(npk)
dat <- npk
str(dat)

「数据结构：」

> str(dat)
'data.frame': 24 obs. of  5 variables:
 $ block: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ N    : Factor w/ 2 levels "0","1": 1 2 1 2 2 2 1 1 1 2 ...
 $ P    : Factor w/ 2 levels "0","1": 2 2 1 1 1 2 1 2 2 2 ...
 $ K    : Factor w/ 2 levels "0","1": 2 1 1 2 1 2 2 1 1 2 ...
 $ yield: num  49.5 62.8 46.8 57 59.8 58.5 55.5 56 62.8 55.8 ...

「方差分析」这里对N进行单因素方差分析，block为区组：

> m9 = aov(yield ~ block + N, data=dat)
> summary(m9)
            Df Sum Sq Mean Sq F value Pr(>F)   
block        5  343.3   68.66   3.395 0.0262 * 
N            1  189.3  189.28   9.360 0.0071 **
Residuals   17  343.8   20.22                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

结果可以看到，N因素达到极显著水平

11.1 LSD多重比较

❝Multiple comparisons of treatments by means of LSD and a grouping of treatments. The level by alpha default is 0.05. Returns p-values adjusted using one of several methods ❞

# LSD
re1 = LSD.test(m9,"N")
re1$groups

结果：

> re1 = LSD.test(m9,"N")
> re1$groups
     yield groups
1 57.68333      a
0 52.06667      b

11.2 scheffe多重比较

❝Scheffe 1959, method is very general in that all possible contrasts can be tested for significance and confidence intervals can be constructed for the corresponding linear. The test is conservative. ❞

代码：

(scheffe.test(m9,"N"))

结果

> (scheffe.test(m9,"N"))
$statistics
   MSerror Df        F   Mean       CV  Scheffe CriticalDifference
  20.22284 17 4.451322 54.875 8.194955 2.109816           3.873379

$parameters
     test name.t ntr alpha
  Scheffe      N   2  0.05

$means
     yield      std  r  Min  Max   Q25   Q50    Q75
0 52.06667 5.377957 12 44.2 62.8 48.30 52.35 55.625
1 57.68333 5.791347 12 48.8 69.5 54.85 57.85 60.350

$comparison
NULL

$groups
     yield groups
1 57.68333      a
0 52.06667      b

11.3 Tukey多重比较

❝It makes multiple comparisons of treatments by means of Tukey. The level by alpha default is 0.05. ❞

代码：

# Turkey
(HSD.test(m9,"N"))

结果

> (HSD.test(m9,"N"))
$statistics
   MSerror Df   Mean       CV      MSD
  20.22284 17 54.875 8.194955 3.873379

$parameters
   test name.t ntr StudentizedRange alpha
  Tukey      N   2          2.98373  0.05

$means
     yield      std  r  Min  Max   Q25   Q50    Q75
0 52.06667 5.377957 12 44.2 62.8 48.30 52.35 55.625
1 57.68333 5.791347 12 48.8 69.5 54.85 57.85 60.350

$comparison
NULL

$groups
     yield groups
1 57.68333      a
0 52.06667      b

11.4 SNK多重比较

❝SNK is derived from Tukey, but it is less conservative (finds more differences). Tukey controls the error for all comparisons, where SNK only controls for comparisons under consideration. The level by alpha default is 0.05. ❞

代码：

# SNK
(SNK.test(m9,"N"))

结果

> (HSD.test(m9,"N"))
$statistics
   MSerror Df   Mean       CV      MSD
  20.22284 17 54.875 8.194955 3.873379

$parameters
   test name.t ntr StudentizedRange alpha
  Tukey      N   2          2.98373  0.05

$means
     yield      std  r  Min  Max   Q25   Q50    Q75
0 52.06667 5.377957 12 44.2 62.8 48.30 52.35 55.625
1 57.68333 5.791347 12 48.8 69.5 54.85 57.85 60.350

$comparison
NULL

$groups
     yield groups
1 57.68333      a
0 52.06667      b

11.5 Duncan多重比较

❝This test is adapted from the Newman-Keuls method. Duncan's test does not control family wise error rate at the specified alpha level. It has more power than the other post tests, but only because it doesn't control the error rate properly. The Experimentwise Error Rate at: 1-(1-alpha)^(a-1); where "a" is the number of means and is the Per-Comparison Error Rate. Duncan's procedure is only very slightly more conservative than LSD. The level by alpha default is 0.05. ❞

代码：

# Duncan
(duncan.test(m9,"N"))

结果

> (duncan.test(m9,"N"))
$statistics
   MSerror Df   Mean       CV
  20.22284 17 54.875 8.194955

$parameters
    test name.t ntr alpha
  Duncan      N   2  0.05

$duncan
    Table CriticalRange
2 2.98373      3.873379

$means
     yield      std  r  Min  Max   Q25   Q50    Q75
0 52.06667 5.377957 12 44.2 62.8 48.30 52.35 55.625
1 57.68333 5.791347 12 48.8 69.5 54.85 57.85 60.350

$comparison
NULL

$groups
     yield groups
1 57.68333      a
0 52.06667      b