About

ChIP-seq is now routinely used to identify DNA-protein interactions and chromatin modifications, while DNase-seq is one of the most prominent methods for identifying open chromatin regions. The output of both techniques is a list of enriched regions, whose statistical significance is usually quantified with a score (p-value). Typically, only enriched regions whose p-value is below a user-defined threshold (e.g. p-value < 10^-8) are considered. However, the simultaneous presence of an enriched region in replicate experiments would justify a local relaxation of this stringency criterion, on the principle that repeated evidence compensates for weak evidence.

MuSERA is a tool that implements multiple-sample enriched region analysis, with a user-friendly graphical interface. It jointly analyzes the enriched regions of multiple replicates, distinguishing between biological and technical replicates, and accepts the following user-defined parameters: weak (T^w) and stringent (T^s) significance thresholds, combined significance threshold (γ), minimum number of replicates in which the overlapping enriched regions should be present (C), and multiple-testing correction threshold (α). The output of MuSERA consists of sample-specific lists of enriched regions, which account for the presence of replicates. These lists can be used to generate several automatic plots and can be visualized in a genome browser integrated in the tool. The enriched regions are classified as stringent-confirmed, stringent-discarded, weak-confirmed, or weak-discarded based on the combined statistical significance obtained over replicates, evaluated with Fisher's method.
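The classification described above can be illustrated with a minimal Python sketch. It is not MuSERA's actual code: the function names are hypothetical, and the decision rule (a region is confirmed when at least C replicates contribute a region and the Fisher-combined p-value does not exceed γ) is an assumption made for illustration. Treating regions with a p-value above T^w as unclassified is also an assumption. Fisher's method itself is standard: the statistic -2 Σ ln(p_i) follows a chi-squared distribution with 2k degrees of freedom for k independent p-values, and for an even number of degrees of freedom its tail probability has a closed form.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is chi-squared distributed with 2k degrees of freedom under the null.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-squared survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


def classify_region(p, overlapping_p, t_weak, t_stringent, gamma, c):
    """Hypothetical classification rule (an assumption, not MuSERA's code).

    p             -- p-value of the region in the sample under analysis
    overlapping_p -- p-values of overlapping regions in other replicates
    Returns None for regions weaker than the weak threshold.
    """
    if p > t_weak:
        return None  # assumption: too weak to be considered at all
    strength = "stringent" if p <= t_stringent else "weak"
    all_p = [p] + list(overlapping_p)
    _, combined = fisher_combined_pvalue(all_p)
    confirmed = len(all_p) >= c and combined <= gamma
    return f"{strength}-{'confirmed' if confirmed else 'discarded'}"
```

For example, under these assumptions a weak region with p = 10^-6 that overlaps a region with p = 10^-5 in a second replicate is confirmed for T^w = 10^-4, T^s = 10^-8, γ = 10^-8, and C = 2, whereas the same region with no overlap in any other replicate is discarded.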

Given reference annotations, MuSERA supports the positional analysis of the enriched regions in each output list with respect to those annotations, and complements it with illustrative plots. Results are exported as standard BED files and/or as an XML file with detailed information about each region; furthermore, each plot can be exported as a high-resolution image.
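As a rough illustration of what a positional analysis against reference annotations involves, the sketch below computes the distance from an enriched region to the nearest annotated position on the same chromosome. The function name is hypothetical, annotations are simplified to single positions (for instance transcription start sites), and coordinates are treated as inclusive; none of this is taken from MuSERA's implementation.

```python
import bisect


def nearest_annotation_distance(region_start, region_end, sorted_positions):
    """Distance from a region to the nearest annotated position.

    sorted_positions must be sorted in ascending order and refer to the
    same chromosome as the region. Returns 0 when an annotation falls
    inside the region (inclusive coordinates, a simplifying assumption).
    """
    if not sorted_positions:
        raise ValueError("no annotations provided")
    # Index of the first annotation at or after the region start.
    i = bisect.bisect_left(sorted_positions, region_start)
    if i < len(sorted_positions) and sorted_positions[i] <= region_end:
        return 0
    candidates = []
    if i > 0:
        # Closest annotation upstream of the region.
        candidates.append(region_start - sorted_positions[i - 1])
    if i < len(sorted_positions):
        # Closest annotation downstream of the region.
        candidates.append(sorted_positions[i] - region_end)
    return min(candidates)
```

Keeping the annotations sorted lets each lookup run as a binary search, which matters when many regions are compared against a large annotation set.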

The goal of MuSERA is to facilitate the investigation of results obtained under different parameter choices by integrating data visualization in a genome browser, functional analysis with user-chosen annotations, and nearest neighbor search. Additionally, MuSERA can combine a large collection of replicates in different, mutually independent sessions through a batch process defined using a simplified XML structure. This feature facilitates the analysis of a large collection of samples, each with its own parameters, without requiring any coding or scripting knowledge.

The tool is implemented as an open-source project and is freely available from the download section. For source code, discussions, or issues, please refer to the CodePlex page of the project.

Authors

Contacts

Vahid Jalili: vahid DOT jalili AT polimi DOT it