Federated GMQL

GenoMetric Query Language (GMQL) is a novel query language for multi-sample integration and processing of heterogeneous datasets of genomic features and known annotations. Federated GMQL extends the single-site system to a set of communicating GMQL servers and supports distributed processing of genomic datasets stored at several federated data sources. The resulting system extends data exploration, sharing and reuse beyond the boundaries of single data centers. It is therefore a significant step forward in genomic data management for modern research and clinical practice, whose players naturally act in federations.


From the model point of view, federated datasets are added to public and private datasets; they can be transparently inspected and browsed from any GMQL system that holds suitable access authorizations, which result from simple bilateral or multilateral agreements between administrators. From the language point of view, minimal syntactic additions turn single-system queries into federated queries, and default query allocation policies automatically allocate operations without any change to the single-system query expression. From the software point of view, both the GMQL servers and the new Name server, which coordinates the data repositories, can be easily installed and initialized (also using Docker).


Federated GMQL technology includes software solutions for managing distributed naming and access protocols, query splitting, distributed logging, query distribution and data protection policies. Federated GMQL is stand-alone software, freely available for non-commercial use as an open source project under the Apache License 2.0 on GitHub.


Full documentation of all GMQL operators and examples of their use are available here.
GMQL System source code and documentation are available here.


The single-site version of GMQL is available here.


Try the Federated GMQL Web interface with a pre-loaded demo query. To run it, first launch COMPILE and then launch EXECUTE; press RUNNING at any time to see how the query is progressing. Execution time depends on the load on our servers and is expected to be between 6 and 10 minutes. Try Demo

The GMQL System is supported by the Data-Driven Genomic Computing (GeCo) project, funded by the European Research Council (ERC) (Advanced ERC Grant 693174).