The GMQL system is based on the GenoMetric Query Language (GMQL), a high-level declarative language for expressing queries over genomic regions and their metadata, much as Relational Algebra and the Structured Query Language (SQL) do over a relational database. It uses the Genomic Data Model (GDM), which is based on the notion of genomic region, mediates existing data formats, and also covers metadata of arbitrary structure; thanks to these features, GDM can support data interoperability by describing semantically heterogeneous data.
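To make the idea of regions plus free-form metadata more concrete, here is a minimal Python sketch of a GDM-like structure. It is an illustration only and not the GMQL implementation: the class names `Region` and `Sample`, the helper `select_by_metadata`, the field names, and the example values are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Region:
    """A genomic region: fixed coordinates plus format-specific values (hypothetical layout)."""
    chrom: str
    start: int
    stop: int
    strand: str = "*"
    values: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Sample:
    """A sample pairs a list of regions with metadata of arbitrary structure."""
    sample_id: str
    regions: List[Region]
    metadata: Dict[str, str]


def select_by_metadata(samples: List[Sample], **conditions: str) -> List[Sample]:
    """Keep the samples whose metadata match every attribute=value condition."""
    return [
        s for s in samples
        if all(s.metadata.get(key) == value for key, value in conditions.items())
    ]


# Two samples from different (invented) sources: the region values differ,
# but both fit the same coordinates-plus-metadata shape.
dataset = [
    Sample(
        sample_id="S1",
        regions=[Region("chr1", 100, 250, "+", {"p_value": 0.001})],
        metadata={"assay": "ChIP-seq", "cell": "HeLa"},
    ),
    Sample(
        sample_id="S2",
        regions=[Region("chr2", 500, 501, "*", {"ref": "A", "alt": "T"})],
        metadata={"assay": "mutations", "cell": "K562"},
    ),
]

hela_samples = select_by_metadata(dataset, cell="HeLa")
print([s.sample_id for s in hela_samples])
```

The sketch keeps region coordinates in fixed fields and everything format-specific in a dictionary, which is one simple way to hold heterogeneous sources in a single structure; selecting samples by metadata then needs no knowledge of the region contents.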
The GMQL system has a well-designed modular architecture featuring:
The system design is inspired by the dominant cloud computing paradigms, which are supported by a variety of next-generation cloud-based data engines. GMQL scripts are translated into these paradigms and then executed; as a result, the evolution of GMQL in terms of portability, performance and scalability can build on the engines maintained by the key actors of cloud computing.