wigglescout is an R library that allows you to calculate summary values across bigWig files and BED files and visualize them in a genomics-relevant manner. It is based on broadly used libraries such as rtracklayer and GenomicRanges, among others, for calculation, and mostly ggplot2 for visualization. You can look at the DESCRIPTION file to get more information about all the libraries that make this one possible.
There are also many other tools whose functionality overlaps, to a lesser or greater extent, with wigglescout, but there was no single tool that included all that I needed. The aim of this library is therefore not to replace any of those tools, or to provide a silver-bullet solution to genomics data analysis, but to provide a comprehensive yet simple enough set of tools focused on bigWig files that can be used entirely from the R environment, without switching back and forth between tools.
Other tools and libraries with similar purposes that you may be looking for include deepTools, SeqPlots, bwtool and wiggletools, among many others.
wigglescout allows you to summarize and visualize the contents of bigWig files in two main ways:
- Genome-wide. The genome is partitioned into equally sized bins and an aggregated value is calculated for each bin. Useful to get a general idea of the signal distribution without looking at specific places.
- Across sets of loci. Values can be either summarized per category or kept as individual values per locus, as in genome-wide analyses.
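The two modes above can be illustrated with a minimal base-R sketch. It does not use wigglescout or read any bigWig file: the `signal` vector, the bin size and the `loci` table are made-up toy values, and the mean is assumed as the aggregation statistic purely for illustration.

```r
# Toy per-base signal for a hypothetical 20 bp chromosome (made-up values).
signal <- c(rep(1, 5), rep(3, 5), rep(0, 5), rep(2, 5))
bin_size <- 5

# Genome-wide mode: assign every position to an equally sized bin
# (positions 1-5 go to bin 1, 6-10 to bin 2, and so on).
bin_id <- ceiling(seq_along(signal) / bin_size)

# Aggregate the signal per bin (mean assumed here).
bin_means <- tapply(signal, bin_id, mean)
bins <- data.frame(
  start = (seq_along(bin_means) - 1) * bin_size + 1,
  end   = pmin(seq_along(bin_means) * bin_size, length(signal)),
  score = as.numeric(bin_means)
)

# Loci mode: one value per locus (hypothetical loci and categories).
loci <- data.frame(
  start    = c(1, 6, 11),
  end      = c(5, 10, 15),
  category = c("A", "A", "B")
)
loci$score <- mapply(function(s, e) mean(signal[s:e]), loci$start, loci$end)

# Summarizing per category drops the genomic coordinates.
by_category <- aggregate(score ~ category, data = loci, FUN = mean)

print(bins)
print(loci)
print(by_category)
```

Note that `bins` and `loci` keep their coordinates, while `by_category` only has one row per category, which mirrors why a per-category summary cannot be represented as genomic ranges.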
wigglescout functionality is built in two layers. Names of functions that calculate values over bigWig files start with bw_. These return GRanges objects when possible and data.frame objects otherwise (i.e. when values are summarized over some category, since genomic location is lost in this process).
On the other hand, functions that plot such values, and that usually make internal use of bw_ functions, start with plot_.
Installation
wigglescout is a package under active development. You can install it from this repository. For this, you will need remotes (and devtools if you plan to work on the package).
Additionally, there was an issue in the past with installing dependencies that come from the Bioconductor repository. This seems to have been fixed now, but if you run into problems, I recommend installing these dependencies manually before running the installation:
install.packages('BiocManager')
BiocManager::install(c('GenomeInfoDb', 'GenomicRanges', 'rtracklayer'))
Then you can install directly from this GitHub repository:
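A sketch of the installation call, using `remotes::install_github`. The repository owner is not stated in this text, so `<owner>` below is a placeholder that must be replaced with the account hosting this repository before the call can work.

```r
# Placeholder: replace <owner> with the GitHub account that hosts wigglescout.
repo <- "<owner>/wigglescout"

# Install the package and its dependencies from GitHub.
remotes::install_github(repo)
```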
Getting started
The vignettes or online documentation give a comprehensive overview of what is available in the package. You can check the vignettes with browseVignettes("wigglescout").
Troubleshooting
Q: When running install_github I get the following error:
Error: package or namespace load failed for ‘GenomeInfoDb’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘GenomeInfoDbData’
Error: package ‘GenomeInfoDb’ could not be loaded
Execution halted
A: This seemed to be a problem that came from installing Bioconductor dependencies. A workaround is to install the Bioconductor packages manually, as described in the Installation section.
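For convenience, the manual installation commands from the Installation section are repeated here as a sketch. Adding GenomeInfoDbData explicitly is an assumption based on the error message above, which reports that package as missing. It is not a step confirmed by the original instructions.

```r
# BiocManager handles installation of Bioconductor packages.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Dependencies listed in the Installation section.
BiocManager::install(c("GenomeInfoDb", "GenomicRanges", "rtracklayer"))

# Assumption: install the data package named in the error message as well.
BiocManager::install("GenomeInfoDbData")
```

After this, retrying the install_github call should no longer stop at the GenomeInfoDb loading step if missing Bioconductor dependencies were the cause.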