Download

GMQL software is freely available. Downloads for GNU/Linux systems:

GMQL Version 1.0.1 over Apache Hadoop 1.x
GMQL Version 1.1.2 over Apache Hadoop 2.x with YARN
GMQL Services over Apache Hadoop 2.x with YARN, tested on Apache Tomcat 7 with Java 7.

Please refer to:

GMQL Quick Start for GMQL installation
GMQL Tutorial for datasets and data sample management and for a step-by-step explanation of a typical GMQL query.
GMQL Services for GMQL services documentation and usage.

GMQL Services installation

To install the GMQL Web Services, perform the following steps in order:

Install the GMQL shell version v.1.2
If not installed, install the Apache Tomcat 7 server
Add the file setenv.sh (without renaming it) to the $TOMCAT_HOME/bin/ location (e.g. /usr/share/tomcat7/bin/).
Update the content of the setenv.sh file by setting the environment variables it contains to the same values used to configure the installation of the GMQL shell version (see GMQL Quick Start).
Change the permissions on the setenv.sh file to make it executable, e.g. by executing the command:
chmod +x /usr/share/tomcat7/bin/setenv.sh
In order for the GMQL Web Services to manipulate the repository configured during installation, the user that runs the Tomcat server (e.g. tomcat7) must have full access (read, write and execute privileges) to the repository's folder (e.g. /home/gmql_repository/). This user should either own the folder or be a member of the folder's group; the latter can be arranged with the commands below:
- Add the Tomcat user (e.g. tomcat7) to the group (e.g. hadoop) of the repository folder owner, using the command:
  sudo usermod -a -G hadoop tomcat7
- Give the group (e.g. hadoop) read, write and execute permissions on the repository folder, using the command:
  sudo chmod -R 775 /home/gmql_repository/
Restart the Apache Tomcat 7 server with the following command: sudo service tomcat7 restart
Upload the GMQL services war file using the Apache Tomcat 7 manager webpage.
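
Before uploading the WAR file, it can help to confirm that the permission step actually took effect. The following is only a minimal sketch, not part of the GMQL packages: the function name has_full_access and the REPO_DIR variable are hypothetical, and the default path is just the example folder used above. It checks the top-level folder only, not its contents.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the current user can read, write
# and traverse the given folder, and 1 otherwise.
has_full_access() {
    dir="$1"
    [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]
}

# Example path taken from the text; replace it with your own repository folder.
REPO_DIR="${REPO_DIR:-/home/gmql_repository/}"

if has_full_access "$REPO_DIR"; then
    echo "OK: full access to $REPO_DIR"
else
    echo "WARNING: missing read, write or execute access to $REPO_DIR"
fi
```

To test the access of the Tomcat user rather than your own, the script would need to be run as that user, for example with sudo -u tomcat7 sh followed by the script file name.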

The provided GMQL packages (both v.1.0.1 and v.1.2) include four example GMQL queries addressing some typical biological use cases, such as:

Finding ChIP-seq peaks in promoter regions
Finding distal bindings in transcription regulatory regions
Associating transcriptomics and epigenomics
Finding somatic mutations in exons

The packages also include a few small-scale datasets with data and metadata from the ENCODE and TCGA projects, which we provide only for testing the examples and demonstrating the power and flexibility of GMQL at work in a rich set of biological use cases.

Note that GMQL is designed for cloud computing processing of big data in the Hadoop framework (i.e. when used in MapReduce mode).
Its strengths show in particular when it is applied to numerous data samples with many genomic regions and of multiple data types, in order to identify the genomic regions that satisfy given distance constraints.

GMQL can also be used with small data and on non-parallel computing frameworks (i.e. when used in Local mode); in these cases other available tools may show much shorter running times, although they fail on massive data.

Run examples in two clicks

The packages include two shortcut commands that let even first-time GMQL users quickly execute the provided examples in Local mode and see their results.

First click: After GMQL installation (see GMQL Quick Start), before running the four provided examples, the datasets used as input in the examples must be created using the data in the folder GMQLPackage/EXAMPLES/data/. To do so, execute: ./GMQLPackage/EXAMPLES/createInputDataSets.sh
This makes the following four input datasets available in your GMQL user account:

HG19_ANN
HG19_MUT
HG19_PEAK
HG19_RNA

Second click: To run all four example GMQL queries together, execute: ./GMQLPackage/EXAMPLES/runScriptExamples.sh
After the execution finishes (a few seconds), all generated result datasets are shown in the printout:

PROM_HM_TF
TF_res
Genome_space
Exon_res
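
The two clicks can also be chained, so that the queries run only if the input datasets were created successfully. This is a sketch under assumptions: the function name run_gmql_examples is hypothetical, and it assumes the two scripts can be invoked by path from the current directory, as in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the two example scripts in order and
# stops if dataset creation fails.
# $1: path to the unpacked package folder (GMQLPackage in the text).
run_gmql_examples() {
    pkg_dir="${1:-./GMQLPackage}"
    create="$pkg_dir/EXAMPLES/createInputDataSets.sh"
    run="$pkg_dir/EXAMPLES/runScriptExamples.sh"

    # Fail early with a clear message if either script is missing.
    for script in "$create" "$run"; do
        if [ ! -x "$script" ]; then
            echo "Missing or non-executable script: $script" >&2
            return 1
        fi
    done

    # The second script runs only if the first one exits successfully.
    "$create" && "$run"
}
```

After sourcing the file, a call such as run_gmql_examples ./GMQLPackage performs both steps.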

Each GMQL example materializes only one dataset, so four output datasets in total, one per example, are generated in the GMQL repository. From there, their data files can be extracted and placed in a local user folder for use outside GMQL by executing the following command (see Section 1.5 of the GMQL Tutorial):
repositoryManagerV1 CopyDSToLocal <DatasetName> <DestinationLocalFolder>
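
To export all four example results, the command can be repeated once per dataset. The sketch below only prints the commands so they can be reviewed first; the function name print_copy_commands and the destination root ./gmql_results are hypothetical, and whether the destination folders must already exist is not stated here, so check the GMQL Tutorial before running them.

```shell
#!/bin/sh
# Hypothetical helper: prints one CopyDSToLocal command per example
# result dataset, without executing anything.
# $1: local root folder for the copies (illustrative default below).
print_copy_commands() {
    dest_root="${1:-./gmql_results}"
    for ds in PROM_HM_TF TF_res Genome_space Exon_res; do
        echo "repositoryManagerV1 CopyDSToLocal $ds $dest_root/$ds"
    done
}

print_copy_commands "./gmql_results"
```

Once the printed commands look right, they can be executed one by one or piped to sh; paths containing spaces would need quoting added.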

Thanks to the standard data formats used, both input data samples and generated results can be directly loaded in a Genome Browser (e.g. UCSC Genome Browser, Integrated Genome Browser (IGB), or Integrative Genomics Viewer (IGV)) for their easy visualization, browsing and evaluation.