peacesciencer: Tools and Data for Quantitative Peace Science


peacesciencer is an R package that includes functions and data sets to make analyses in quantitative peace science easier. The goal is to reasonably approximate what made EUGene so attractive to scholars working in the field in the early 2000s. EUGene shone because it encouraged replications of conflict models while having the user generate the data from scratch. Likewise, this package offers tools to approximate what EUGene did, but within the R environment (i.e. without requiring Windows for installation).

Installation

Ideally, you will soon be able to install this from CRAN as follows:

install.packages("peacesciencer")

Until then, you can install the development version of this package through the devtools package.

devtools::install_github("svmiller/peacesciencer")

Usage

The package is very much a work in progress. Right now, it has the following functions:

It also has the following data sets:

The workflow is going to look something like this. This is a “tidy”-friendly approach to a data-generating process in quantitative peace science.

First, start with one of two processes to create either dyad-year or state-year data. The dyad-year data are created with the create_dyadyears() function, which has a few optional parameters with default values. The user can specify which state system data to use (system), either Correlates of War ("cow") or Gleditsch-Ward ("gw"); whether to extend the data to the most recently concluded calendar year (mry), e.g. Correlates of War state system membership data are current as of Dec. 31, 2016, and the script can extend them to the end of 2019; and whether to create directed or non-directed dyad-year data (directed).
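For readers new to the directed/non-directed distinction, here is a minimal base R sketch on three Correlates of War state codes (2, 20, and 200). It does not use the package and is not how create_dyadyears() is implemented; the rule of keeping only pairs with ccode1 < ccode2 for non-directed dyads is an assumption made for illustration.

```r
# Three Correlates of War state codes: USA (2), Canada (20), UK (200).
codes <- c(2, 20, 200)

# Every ordered combination of the three codes, including self-pairs.
all_pairs <- expand.grid(ccode1 = codes, ccode2 = codes)

# Directed dyads: A-B and B-A are separate rows; self-pairs are dropped.
directed <- all_pairs[all_pairs$ccode1 != all_pairs$ccode2, ]

# Non-directed dyads: each pair appears once (assumed rule: ccode1 < ccode2).
nondirected <- all_pairs[all_pairs$ccode1 < all_pairs$ccode2, ]

nrow(directed)     # 6
nrow(nondirected)  # 3
```

A directed data set therefore has twice as many rows per year as its non-directed counterpart, which matters for both run time and memory.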

The create_stateyears() function works much the same way, though “directed” and “non-directed” make no sense in the state-year context. Both functions default to Correlates of War state system membership data, extended to the most recently concluded calendar year.
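As a sketch of what overriding those defaults might look like, the calls below spell the arguments out. The argument names (system, mry, directed) and the "cow"/"gw" values come from the description above; passing TRUE/FALSE to mry and directed is an assumption about the argument types, so check the function documentation for your installed version.

```r
library(peacesciencer)

# Non-directed Correlates of War dyad-years, without the extension
# to the most recently concluded calendar year.
cow_dyads <- create_dyadyears(system = "cow", mry = FALSE, directed = FALSE)

# Gleditsch-Ward state-years, extended to the most recently concluded year.
# There is no `directed` argument here; it has no meaning for state-years.
gw_states <- create_stateyears(system = "gw", mry = TRUE)
```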

Thereafter, the user can specify what additional variables they want added to these dyad-year or state-year data. Note that the additional functions lean primarily on Correlates of War state code identifiers; indeed, the bulk of the quantitative peace science data ecosystem is built around the Correlates of War project. The variables the user wants are added in a “pipe”. The user may want to break up the data-generating process into a few manageable “chunks” (e.g. first generating dyad-year data and saving them to an object, then adding to it piece by piece).
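A chunked version of that process might look like the sketch below. It uses only functions named in this README with their default arguments; the object name dyad_years and the grouping of steps into chunks are arbitrary choices for illustration.

```r
library(tidyverse, quietly = TRUE)
library(peacesciencer)

# Chunk 1: build the base dyad-year data once and keep it in an object.
dyad_years <- create_dyadyears()

# Chunk 2: add contiguity and major power information.
dyad_years <- dyad_years %>%
  add_contiguity() %>%
  add_cow_majors()

# Chunk 3: add National Material Capabilities data in a separate step.
dyad_years <- dyad_years %>%
  add_nmc()
```

Working this way lets the user inspect the object between steps and rerun only the chunk that changed, rather than the whole pipe.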

All told, the process will look something like this.

library(tidyverse, quietly = TRUE)
library(peacesciencer)
library(tictoc)

tic()
create_dyadyears() %>%
  # Add Gleditsch-Ward codes
  add_gwcode_to_cow() %>%
  # Add GML MIDs 
  add_mids() %>%
  # Add trade data
  add_cow_trade() %>%
  # Add capital-to-capital distance
  add_capital_distance() %>%
  # Add contiguity information
  add_contiguity() %>%
  # Add major power data
  add_cow_majors() %>%
  # Add democracy variables
  add_democracy() %>%
  # Add IGOs data
  add_igos() %>%
  # Add National Material Capabilities
  add_nmc() %>%
  # Add alliance data from Correlates of War
  add_cow_alliance() %>%
  # You should probably filter to politically relevant dyads sooner rather than later...
  # Or not; it's your time and your computer's processor...
  filter_prd()
## # A tibble: 2,063,670 x 75
##    ccode1 ccode2  year gwcode1 gwcode2 dispnum midongoing midonset sidea1 sidea2
##     <dbl>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>      <dbl>    <dbl>  <dbl>  <dbl>
##  1      2     20  1920       2      20      NA          0        0     NA     NA
##  2      2     20  1921       2      20      NA          0        0     NA     NA
##  3      2     20  1922       2      20      NA          0        0     NA     NA
##  4      2     20  1923       2      20      NA          0        0     NA     NA
##  5      2     20  1924       2      20      NA          0        0     NA     NA
##  6      2     20  1925       2      20      NA          0        0     NA     NA
##  7      2     20  1926       2      20      NA          0        0     NA     NA
##  8      2     20  1927       2      20      NA          0        0     NA     NA
##  9      2     20  1928       2      20      NA          0        0     NA     NA
## 10      2     20  1929       2      20      NA          0        0     NA     NA
## # … with 2,063,660 more rows, and 65 more variables: revstate1 <dbl>,
## #   revstate2 <dbl>, revtype11 <dbl>, revtype12 <dbl>, revtype21 <dbl>,
## #   revtype22 <dbl>, fatality1 <dbl>, fatality2 <dbl>, fatalpre1 <dbl>,
## #   fatalpre2 <dbl>, hiact1 <dbl>, hiact2 <dbl>, hostlev1 <dbl>,
## #   hostlev2 <dbl>, orig1 <dbl>, orig2 <dbl>, hiact <dbl>, hostlev <dbl>,
## #   mindur <dbl>, maxdur <dbl>, outcome <dbl>, settle <dbl>, fatality <dbl>,
## #   fatalpre <dbl>, stmon <dbl>, endmon <dbl>, recip <dbl>, numa <dbl>,
## #   numb <dbl>, ongo2010 <dbl>, version <chr>, flow2 <dbl>, flow1 <dbl>,
## #   smoothflow2 <dbl>, smoothflow1 <dbl>, capdist <dbl>, conttype <dbl>,
## #   cowmaj1 <dbl>, cowmaj2 <dbl>, v2x_polyarchy1 <dbl>, polity21 <dbl>,
## #   xm_qudsest1 <dbl>, v2x_polyarchy2 <dbl>, polity22 <dbl>, xm_qudsest2 <dbl>,
## #   dyadigos <dbl>, milex1 <dbl>, milper1 <dbl>, irst1 <dbl>, pec1 <dbl>,
## #   tpop1 <dbl>, upop1 <dbl>, cinc1 <dbl>, milex2 <dbl>, milper2 <dbl>,
## #   irst2 <dbl>, pec2 <dbl>, tpop2 <dbl>, upop2 <dbl>, cinc2 <dbl>,
## #   defense <dbl>, neutrality <dbl>, nonaggression <dbl>, entente <dbl>,
## #   prd <dbl>
toc()
## 31.857 sec elapsed
# state-years now...

tic()
create_stateyears() %>%
  add_gwcode_to_cow() %>%
  add_capital_distance() %>%
  add_contiguity() %>%
  add_cow_majors() %>%
  add_cow_trade() %>%
  add_democracy() %>%
  add_igos() %>%
  add_nmc()
## # A tibble: 16,731 x 24
##    ccode statenme  year gwcode mincapdist  land   sea cowmaj imports exports
##    <dbl> <chr>    <dbl>  <dbl>      <dbl> <dbl> <dbl>  <dbl>   <dbl>   <dbl>
##  1     2 United …  1816      2      5742.     0     0      0      NA      NA
##  2     2 United …  1817      2      5742.     0     0      0      NA      NA
##  3     2 United …  1818      2      5742.     0     0      0      NA      NA
##  4     2 United …  1819      2      5742.     0     0      0      NA      NA
##  5     2 United …  1820      2      5742.     0     0      0      NA      NA
##  6     2 United …  1821      2      5742.     0     0      0      NA      NA
##  7     2 United …  1822      2      5744.     0     0      0      NA      NA
##  8     2 United …  1823      2      5744.     0     0      0      NA      NA
##  9     2 United …  1824      2      5744.     0     0      0      NA      NA
## 10     2 United …  1825      2      5744.     0     0      0      NA      NA
## # … with 16,721 more rows, and 14 more variables: v2x_polyarchy <dbl>,
## #   polity2 <dbl>, xm_qudsest <dbl>, sum_igo_full <dbl>,
## #   sum_igo_associate <dbl>, sum_igo_observer <dbl>, sum_igo_anytype <dbl>,
## #   milex <dbl>, milper <dbl>, irst <dbl>, pec <dbl>, tpop <dbl>, upop <dbl>,
## #   cinc <dbl>
toc()
## 2.977 sec elapsed