Abstract

How to query COMPASS from R

Introduction

Installation

Installing rcompass from GitHub requires the devtools package.

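# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install rcompass from its GitHub repository
devtools::install_github("onertipaday/rcompass")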

Getting help

To get help, open an issue on the rcompass GitHub page.

The resource

COMPASS GraphQL endpoint; COMPASS documentation

The package

COMPASS (COMpendia Programmatic Access Support Software) is a software layer that provides a GraphQL endpoint to query compendia built using COMMAND>_ technology [1]. COMPASS is meant to be a bare-bones interface on top of a compendium database, giving access from a single location with a unified output format. The rcompass package makes this possible from within R. It relies on the httr and ghql packages to query the GraphQL interface and to access and retrieve data. The rcompass package is built around a set of functions that query the compendium of interest and retrieve, store and manipulate data. Each of these functions is described in more detail in its manual page. We start by loading the package.

library(rcompass)
#> 
#> This is 'rCOMPASS' version 0.0.9 based on COMPASS version 0.9.4.
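To make the GraphQL layer more concrete, here is a minimal sketch of how a GraphQL request can be sent from R with httr. It is not the rcompass implementation: `build_graphql_body()` and `send_graphql()` are hypothetical helpers written for this illustration, and the endpoint URL is left as an argument for the reader to supply. The example query is the standard GraphQL introspection query, so it assumes nothing about the COMPASS schema.

```r
# Build the JSON-ready body of a GraphQL request.
# Hypothetical helper for illustration; not part of rcompass.
build_graphql_body <- function(query, variables = NULL) {
  stopifnot(is.character(query), length(query) == 1L, nzchar(query))
  body <- list(query = query)
  # Attach variables only when some are supplied, so the field is
  # omitted rather than sent empty.
  if (length(variables) > 0L) body$variables <- variables
  body
}

# POST the request with httr and return the parsed JSON response.
# Hypothetical helper for illustration; not part of rcompass.
send_graphql <- function(endpoint, query, variables = NULL) {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("The 'httr' package is required to send the request.")
  }
  resp <- httr::POST(endpoint,
                     body = build_graphql_body(query, variables),
                     encode = "json")
  httr::stop_for_status(resp)  # fail loudly on HTTP errors
  httr::content(resp, as = "parsed", type = "application/json")
}

# Standard GraphQL introspection query; it names no COMPASS-specific field.
introspection_query <- "{ __schema { queryType { name } } }"
payload <- build_graphql_body(introspection_query)
str(payload)
```

Calling `send_graphql(endpoint, introspection_query)` with the address of a GraphQL endpoint would return the parsed response as a nested list, provided the server allows introspection. In practice the rcompass functions hide this step, so a sketch like this is mainly useful for understanding what travels over the wire.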

The COMPASS GraphQL endpoint can host different compendia. At the moment there is only the VESPUCCI compendium, but there are different versions of VESPUCCI, and each version might have data normalized in different ways. Currently there are two versions of VESPUCCI: version 1.0 (legacy) and version 2.0 (latest). The latter has data normalized in two different ways, TPM and LIMMA (the default), while the legacy version has the legacy normalization only (i.e. per-sample log-ratios). Every query needs to indicate the compendium to use; if the version, normalization or database is not specified, the default values (version 2.0, normalization limma) are used.
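The default rules described above can be sketched as a small function. This is only an illustration of the logic, not rcompass code: `resolve_compendium()` is a hypothetical helper, the identifiers `"vespucci"`, `"limma"`, `"tpm"` and `"legacy"` are assumed spellings, and defaulting version 1.0 to the legacy normalization is inferred from the fact that it is the only one available there.

```r
# Fill in the documented defaults for a compendium selection.
# Hypothetical helper for illustration; not part of rcompass.
resolve_compendium <- function(compendium = "vespucci",
                               version = NULL,
                               normalization = NULL) {
  # Normalizations available per version, as described in the text;
  # the first entry of each vector is treated as that version's default.
  available <- list(
    "1.0" = "legacy",
    "2.0" = c("limma", "tpm")
  )
  # The latest version is the default.
  if (is.null(version)) version <- "2.0"
  # Accept 2, 2.0 or "2.0" and normalise to one decimal place.
  version <- format(as.numeric(version), nsmall = 1)
  if (!version %in% names(available)) {
    stop("Unknown version: ", version)
  }
  if (is.null(normalization)) normalization <- available[[version]][1]
  normalization <- tolower(normalization)
  if (!normalization %in% available[[version]]) {
    stop("Normalization '", normalization,
         "' is not available for version ", version)
  }
  list(compendium = compendium,
       version = version,
       normalization = normalization)
}

default_sel <- resolve_compendium()                 # version 2.0, limma
legacy_sel  <- resolve_compendium(version = "1.0")  # version 1.0, legacy
str(default_sel)
str(legacy_sel)
```

Requesting a combination that does not exist, such as version 1.0 with TPM normalization, stops with an error instead of silently falling back to another normalization.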

We can use rcompass to retrieve some statistics about VESPUCCI. The normalized Vitis gene expression compendium, version 2.0, contains values for 29090 biological features (genes), measured for 1689 sample sets. This corresponds to a total of 160 experiments and 4101 samples measured on 53 different platforms.