Correlation Types


This vignette can be cited as:

citation("correlation")
> 
> To cite 'correlation' in publications use:
> 
>   Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2020).
>   Methods and Algorithms for Correlation Analysis in R. Journal of Open
>   Source Software, 5(51), 2306. doi:10.21105/joss.02306
> 
> A BibTeX entry for LaTeX users is
> 
>   @Article{,
>     title = {Methods and Algorithms for Correlation Analysis in R.},
>     author = {Dominique Makowski and Mattan S. Ben-Shachar and Indrajeet Patil and Daniel Lüdecke},
>     doi = {10.21105/joss.02306},
>     year = {2020},
>     journal = {Journal of Open Source Software},
>     number = {51},
>     volume = {5},
>     pages = {2306},
>     url = {https://joss.theoj.org/papers/10.21105/joss.02306},
>   }

Different Methods for Correlations

Correlation tests are arguably among the most commonly used statistical procedures, and they serve as a basis for many applications such as exploratory data analysis, structural modeling, and data engineering. In this context, we present correlation, a toolbox for the R language (R Core Team 2019) and part of the easystats collection, focused on correlation analysis. Its goal is to be lightweight and easy to use, and to allow for the computation of many different kinds of correlations, such as:

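Pearson’s correlation:

\[r_{xy} = \frac{\mathrm{cov}(x,y)}{SD_x \times SD_y}\]

Spearman’s rank correlation (Pearson’s formula applied to the ranks):

\[r_{s_{xy}} = \frac{\mathrm{cov}(rank_x, rank_y)}{SD(rank_x) \times SD(rank_y)}\]

Kendall’s rank correlation:

\[\tau_{xy} = \frac{2}{n(n-1)}\sum_{i<j}\operatorname{sign}(x_i - x_j) \times \operatorname{sign}(y_i - y_j)\]

Partial correlation of x and y given z, i.e., the correlation between the residuals \(e_{x.z}\) and \(e_{y.z}\) obtained by regressing x and y on z:

\[r_{xy.z} = r_{e_{x.z},e_{y.z}}\]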
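The four formulas can be checked numerically with base R alone. The sketch below simulates hypothetical variables `x`, `y` and `z` and computes each coefficient directly from its definition; it is meant for intuition, not as the correlation package’s implementation. One caveat: the Kendall formula above is tau-a, whereas `cor(..., method = "kendall")` returns tau-b; the two coincide when there are no ties, as with continuous simulated data.

```r
# Base-R check of the formulas above on simulated (hypothetical) data
set.seed(42)
n <- 100
z <- rnorm(n)
x <- z + rnorm(n)
y <- 0.5 * x + z + rnorm(n)

# Pearson: covariance divided by the product of the standard deviations
r_pearson <- cov(x, y) / (sd(x) * sd(y))

# Spearman: the same formula applied to the ranks
r_spearman <- cov(rank(x), rank(y)) / (sd(rank(x)) * sd(rank(y)))

# Kendall (tau-a): average sign agreement over all pairs i < j
kendall_tau_a <- function(x, y) {
  n <- length(x)
  # s[i, j] = sign(x_i - x_j) * sign(y_i - y_j)
  s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
  # upper.tri() keeps exactly the pairs with i < j
  2 * sum(s[upper.tri(s)]) / (n * (n - 1))
}
tau <- kendall_tau_a(x, y)

# Partial correlation of x and y given z: correlate the residuals
# of the regressions x ~ z and y ~ z
e_xz <- residuals(lm(x ~ z))
e_yz <- residuals(lm(y ~ z))
r_partial <- cor(e_xz, e_yz)

# Each comparison with base R's built-in estimators should return TRUE
all.equal(r_pearson, cor(x, y))
all.equal(r_spearman, cor(x, y, method = "spearman"))
all.equal(tau, cor(x, y, method = "kendall"))  # tau-b equals tau-a without ties
```

The residual-based partial correlation should also agree with the familiar closed form \((r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}\).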

Comparison

We will compute different types of correlations on simulated data with relationships of varying strength and type.
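The data for this comparison come from `bayestestR::simulate_correlation()`. To make the idea of a fixed relationship strength concrete, here is a base-R sketch of one way to build two variables whose sample Pearson correlation equals a target `r` exactly. `simulate_pair()` is a hypothetical helper; it does not reproduce bayestestR’s internals, and the `V1`/`V2` column names simply mirror the ones used below.

```r
# Hypothetical helper, for illustration only: n pairs whose *sample*
# Pearson correlation equals r exactly (for -1 <= r <= 1)
simulate_pair <- function(n, r) {
  x <- as.numeric(scale(rnorm(n)))        # standardize: mean 0, SD 1
  noise <- rnorm(n)
  e <- residuals(lm(noise ~ x))           # strip any linear link to x
  e <- as.numeric(scale(e))               # mean 0, SD 1, uncorrelated with x
  y <- r * x + sqrt(1 - r^2) * e          # variance r^2 + (1 - r^2) = 1
  data.frame(V1 = x, V2 = y)
}

set.seed(1)
d <- simulate_pair(100, r = 0.3)
cor(d$V1, d$V2)   # equals 0.3 up to floating-point error
```

Removing the linear component of the noise before rescaling is what makes the sample correlation hit `r` exactly rather than only on average.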

Let’s first load the required libraries for this analysis.

library(correlation)
library(bayestestR)
library(see)
library(ggplot2)
library(tidyr)
library(dplyr)

Utility functions

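generate_results <- function(r, n = 100, transformation = "none") {
  # Simulate n observations of two variables, V1 and V2, correlated at r
  data <- bayestestR::simulate_correlation(round(n), r = r)
  
  if (transformation != "none") {
    # `transformation` is the opening part of an R expression (e.g., "exp(" or "1/").
    # Append data$V2, plus a closing parenthesis when the prefix contains "(",
    # then evaluate the completed expression to transform V2.
    var <- if (grepl("(", transformation, fixed = TRUE)) "data$V2)" else "data$V2"
    transformation <- paste0(transformation, var)
    data$V2 <- eval(parse(text = transformation))
  }
  
  # One row per call; the completed expression is kept as the label
  out <- data.frame(n = n, transformation = transformation, r = r)

  # Estimate the V1-V2 association with each method
  out$Pearson <- cor_test(data, "V1", "V2", method = "pearson")$r
  out$Spearman <- cor_test(data, "V1", "V2", method = "spearman")$rho
  out$Kendall <- cor_test(data, "V1", "V2", method = "kendall")$tau
  out$Biweight <- cor_test(data, "V1", "V2", method = "biweight")$r
  out$Distance <- cor_test(data, "V1", "V2", method = "distance")$r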
  
  out
}
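The formulas listed earlier cover Pearson, Spearman, Kendall and partial correlations, but `generate_results()` also requests biweight and distance correlations. For intuition, the following base-R sketches implement textbook definitions of both. `biweight_midcor()` and `distance_cor()` are hypothetical helper names, and the package’s own implementations may differ in details (its distance method, for instance, may use a bias-corrected estimator), so `cor_test()` remains the tool for actual analyses.

```r
# Biweight midcorrelation (median/MAD-based weights); a sketch of the common
# textbook definition, not the correlation package's internal code
biweight_midcor <- function(x, y) {
  weighted_dev <- function(v) {
    m <- median(v)
    u <- (v - m) / (9 * mad(v, constant = 1))  # unscaled MAD; assumes MAD > 0
    w <- (1 - u^2)^2 * (abs(u) < 1)            # weight is 0 when |u| >= 1
    dev <- (v - m) * w
    dev / sqrt(sum(dev^2))                     # normalize to unit length
  }
  sum(weighted_dev(x) * weighted_dev(y))
}

# Classic (uncorrected) sample distance correlation
distance_cor <- function(x, y) {
  double_centre <- function(v) {
    d <- as.matrix(dist(v))                    # d[i, j] = |v_i - v_j|
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- double_centre(x)
  B <- double_centre(y)
  dcov2 <- max(mean(A * B), 0)                 # squared distance covariance
  sqrt(dcov2 / sqrt(mean(A * A) * mean(B * B)))
}

# Usage: a U-shaped (non-monotonic) relationship
set.seed(2)
xu <- runif(200, -1, 1)
c(pearson  = cor(xu, xu^2),
  biweight = biweight_midcor(xu, xu^2),
  distance = distance_cor(xu, xu^2))
```

On a U-shaped relationship such as this one, Pearson’s r is expected to be near zero, whereas distance correlation, which is zero only under independence, stays clearly positive.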

Effect of Relationship Type

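# Accumulate one row per combination of correlation strength (200 values
# from 0 to 0.999), sample size (100) and relationship type (12 prefixes)
data <- data.frame()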
for (r in seq(0, 0.999, length.out = 200)) {
  for (n in c(100)) {
    for (transformation in c(
      "none",
      "exp(",
      "log10(1+max(abs(data$V2))+",
      "1/",
      "tan(",
      "sin(",
      "cos(",
      "cos(2*",
      "abs(",
      "data$V2*",
      "data$V2*data$V2*",
      "ifelse(data$V2>0, 1, 0)*("
    )) {
      data <- rbind(data, generate_results(r, n, transformation = transformation))
    }
  }
}
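Some of these prefixes produce strictly increasing transformations of V2 (`exp(`, `log10(1+max(abs(data$V2))+`, and `data$V2*data$V2*`, i.e., a cube), while others, such as `abs(` and `cos(`, are not monotonic. The distinction matters because Spearman’s and Kendall’s coefficients depend only on the ordering of the values. The base-R sketch below, on freshly simulated data that is independent of the loop above, illustrates this; `estimate()` is a hypothetical helper.

```r
# Illustration on fresh simulated data (independent of the loop above)
set.seed(123)
V1 <- rnorm(100)
V2 <- 0.6 * V1 + rnorm(100, sd = 0.8)

cor_methods <- c("pearson", "spearman", "kendall")
# Hypothetical helper: correlation of V1 with y under each method
estimate <- function(y) sapply(cor_methods, function(m) cor(V1, y, method = m))

rbind(
  original = estimate(V2),
  exp      = estimate(exp(V2)),       # strictly increasing: ranks unchanged
  cube     = estimate(V2 * V2 * V2),  # strictly increasing: ranks unchanged
  cos      = estimate(cos(V2))        # non-monotonic: ranks reshuffled
)
```

Under the two increasing transformations the Spearman and Kendall columns stay identical to the original row, whereas Pearson’s r, which uses the raw values, changes.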


# Reshape to long format (one row per method), fix the method order, and
# plot each method's estimate against the simulated r, per transformation
data %>%
  tidyr::pivot_longer(-c(n, r, transformation),
                      names_to = "Type",
                      values_to = "Estimation") %>%
  dplyr::mutate(Type = forcats::fct_relevel(Type, "Pearson", "Spearman", "Kendall", "Biweight", "Distance")) %>%
  ggplot(aes(x = r, y = Estimation, fill = Type)) +
  # alpha = 0 hides the confidence ribbon and keeps only the smoothed line
  geom_smooth(aes(color = Type), method = "loess", alpha = 0) +
  # Dashed reference lines at 0.5 on both axes
  geom_vline(xintercept = 0.5, linetype = "dashed") +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  see::theme_modern() +
  scale_color_flat_d(palette = "rainbow") +
  scale_fill_flat_d(palette = "rainbow") +
  facet_wrap(~transformation)