Welcome to the GPKB RESTful Web API Help Page!

This page will help you understand how to use the read-only Web API. The API is designed both to give complete insight into the Genomic and Proteomic Knowledge Base (GPKB) and to allow comprehensive querying of it via HTTP requests.

A Web application for the GPKB is also available at the GPKB Web site.

API Structure

The API is made up of two distinct parts: the metadata API and the public data API.

Multiple databases are available to the user on both parts of the API. How to select a database is described in the Database Selection section.

Note:

The data warehouse has a complex structure that changes over time in order to preserve the information of the original data sources and to extract the data efficiently. To support this, the REST services are divided into two sub-sections.
The first service (metadata API) provides the structural information of the selected data warehouse version. The structure of the annotations and of the features is obtained through these services, so every client should always start by using them.
The second service (public API) provides the annotations imported into the data warehouse. By compiling an XML document with the structure defined in the metadata API, the client retrieves the data from the services.

Database Selection

The API can be used over different databases. To get the list of the databases and their descriptions, call the resource below:

/GPKB-REST/rest/resources/registered-databases

The result contains an entry, with its description, for each registered database.

To call the API on a specific database, add the "db-handle" parameter to the query string of the resource URI.
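As a minimal sketch, assuming Python and a hypothetical host http://localhost:8080 serving the /GPKB-REST paths shown on this page, the helper below appends db-handle to a resource URI while keeping any query parameters already present. The handle value gpkb_2013 is a placeholder; real values come from the registered-databases resource.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical base URL: replace it with the host that actually serves GPKB-REST.
BASE_URL = "http://localhost:8080"


def with_db_handle(path: str, db_handle: str, base_url: str = BASE_URL) -> str:
    """Return base_url + path with the db-handle query parameter set."""
    parts = urlsplit(base_url + path)
    # Keep existing query parameters (in order) and add or overwrite db-handle.
    query = dict(parse_qsl(parts.query))
    query["db-handle"] = db_handle
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))
```

For example, with_db_handle("/GPKB-REST/rest/resources/selections", "gpkb_2013") yields http://localhost:8080/GPKB-REST/rest/resources/selections?db-handle=gpkb_2013.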



The response is defined by the Database XSD (see also the database XSD documentation); its root element is registered_databases.

Metadata API

The metadata API first of all exposes the list of all the features contained in the GPKB: genes, proteins, DNA sequences and so on. For each feature it is then possible to get its metadata details, namely the list and structure of the tables that hold information about the feature, and the associations with other features in which it is involved.

API outline

Content negotiation: XML, Application-XML.

1) GET features list

2) GET specific feature details

3) GET specific feature association details

4) GET specific encoded field values list
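The exact paths of the first three metadata resources are not listed here, but the Scenario section on this page uses the encoded field values resource rest/resources/features/encoded/gene/taxonomy_id and its prefix search rest/resources/features/encoded/gene/taxonomy_id/search/startsWith/homo/limit/5/offset/0. The sketch below rebuilds those paths for any feature and field, assuming the same /GPKB-REST prefix used by the other resources on this page; only the startsWith search is covered because it is the only one documented here.

```python
from urllib.parse import quote

# Assumed prefix, mirroring /GPKB-REST/rest/resources/... used elsewhere on this page.
ENCODED_PREFIX = "/GPKB-REST/rest/resources/features/encoded"


def encoded_values_path(feature: str, field: str) -> str:
    """Path listing the encoded values of one field of a feature."""
    return f"{ENCODED_PREFIX}/{quote(feature, safe='')}/{quote(field, safe='')}"


def encoded_search_path(feature: str, field: str, starts_with: str,
                        limit: int = 5, offset: int = 0) -> str:
    """Path searching encoded values that start with a prefix, with paging."""
    return (f"{encoded_values_path(feature, field)}/search/startsWith/"
            f"{quote(starts_with, safe='')}/limit/{limit}/offset/{offset}")
```

With the scenario's values, encoded_search_path("gene", "taxonomy_id", "homo") reproduces the taxonomy search link shown there, prefixed with /GPKB-REST.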


Public data API


By means of a single parametric service, the public data API allows comprehensive querying of the data warehouse.

Content negotiation: XML, Application-XML.

The resource URI for the public API is /GPKB-REST/rest/resources/selections.
This API accepts POST requests with an XML body and returns XML (application/xml); set the Content-Type and Accept header fields accordingly.

All selections must be posted in the request payload, and the posted XML must conform to the public XML Schema grammar (download the public XSD or read the public XSD documentation); its root element is feature_details.
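A minimal sketch, using only the Python standard library, of preparing such a request: the selection XML is sent as the POST body to /GPKB-REST/rest/resources/selections with Content-Type and Accept set to application/xml, and db-handle is added to the query string when a specific database is targeted. The base URL and handle used below are hypothetical placeholders.

```python
import urllib.parse
import urllib.request
from typing import Optional

SELECTIONS_PATH = "/GPKB-REST/rest/resources/selections"


def build_selection_request(base_url: str, selection_xml: str,
                            db_handle: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST of a selection document to the public API."""
    url = base_url + SELECTIONS_PATH
    if db_handle is not None:
        url += "?" + urllib.parse.urlencode({"db-handle": db_handle})
    return urllib.request.Request(
        url,
        data=selection_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml", "Accept": "application/xml"},
        method="POST",
    )
```

The request is only built here; passing it to urllib.request.urlopen would send it and return the result XML.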

Vocabulary of XML Schema for selections

The following table describes only the most important tags; self-explanatory ones are omitted.

XML Schema tag: Description
feature: By adding this element to the selections and specifying the feature name in its name attribute, the user chooses the feature(s) to include in the query. For each feature the user then chooses the tables to query (attribute_groups element) and, for each table, the attributes of interest (attribute element). Finally, filter options on these attributes can be specified using the options available in the selected_options element.
Furthermore, in queries involving several instances of the same feature (the so-called "alias queries"), the user has to assign an alias to each feature instance so that instances with the same name can be distinguished. The alias name must conform to the syntax "[feature_name]_[counter]".
feature_association: By adding this element to the selections and specifying the feature association name in its name attribute, the user chooses the feature association(s) to include in the query. For each association the user then chooses the tables to query (attribute_groups element) and, for each table, the attributes of interest (attribute element). Finally, filter options on these attributes can be specified using the options available in the selected_options element.
Furthermore, in queries involving several instances of the same feature association (the so-called "alias queries"), the user has to assign an alias to each association instance so that associations with the same name can be distinguished. The alias name must conform to the syntax "[feature1_name]_[counter]TO[feature2_name]_[counter]" or "[feature1_name]_[counter]to[feature2_name]_[counter]".
queries_general_options: This element specifies general query options, e.g. DISTINCT, LIMIT, OFFSET, ORDER BY. The only_matching element, when set to "TRUE", restricts the result to INNER JOINs between all the selected tables. The counting element chooses between an exact and an estimated total count of the rows in the query's result set. However, for counting queries whose cost exceeds a predefined threshold, the service returns an approximate count even if an exact count was requested, so as not to degrade the response time.
alias_general_options: This element specifies advanced options for joining tables in alias queries. In particular, by choosing the join tables and join attributes of interest, tables that are not directly joined with each other can be joined by adding WHERE clauses to the generated SQL statement.
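The alias syntax described in the table can be captured in two small helpers. This is a sketch; in particular, numbering the counters from 1 in the usage example is an assumption, so check the example queries or the XSD before relying on it.

```python
def feature_alias(feature_name: str, counter: int) -> str:
    """Alias for one instance of a feature: "[feature_name]_[counter]"."""
    return f"{feature_name}_{counter}"


def association_alias(feature1: str, counter1: int, feature2: str, counter2: int,
                      sep: str = "TO") -> str:
    """Alias for a feature association: "[feature1]_[c1]TO[feature2]_[c2]" (or "to")."""
    if sep not in ("TO", "to"):
        raise ValueError("the documented separators are 'TO' and 'to'")
    return f"{feature_alias(feature1, counter1)}{sep}{feature_alias(feature2, counter2)}"
```

For instance, two gene instances would be aliased gene_1 and gene_2, and association_alias("gene", 1, "biological_function_feature", 2) gives gene_1TObiological_function_feature_2.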

Returned results grammar

The XML returned by the service, containing the query results together with some metadata about them, conforms to the result XML Schema grammar (download the result XSD or read the schema documentation); its root element is result.

Vocabulary of XML Schema for showing results

The following table describes only the most important tags; self-explanatory ones are omitted.

XML Schema tag: Description
entries_total_count: The total number of rows returned by the query. N.B.: unless requested otherwise, the returned count is exact. However, if the exact total-count query would exceed a predefined response-time threshold, the service returns an approximate count even when the user requested an exact one; this approximate value is taken from the output of an SQL EXPLAIN query.
entries_showed_count: The number of rows actually returned by the query, according to the LIMIT option specified in the selection.
attributes_group_names_list: Metadata about the returned results, namely the selected columns ("attribute" tag) and the names/aliases of the tables they belong to.
rows_group: The collection of rows that make up the actual query result. For each row, the result set contains the row number and the list of tables involved, each holding the result data of the query.
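A sketch of reading the two counters and the row collection from a result document with xml.etree.ElementTree. It matches elements by local name so it works whether or not the result XSD declares a target namespace, and it assumes that the counts are plain integers and that each child of rows_group is one row; both are assumptions, not guarantees of the schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _local_name(tag: str) -> str:
    # Drop any "{namespace}" prefix so lookups work with or without a target namespace.
    return tag.rsplit("}", 1)[-1]


def _first(root: ET.Element, name: str) -> Optional[ET.Element]:
    # Depth-first search (root included) for the first element with this local name.
    for element in root.iter():
        if _local_name(element.tag) == name:
            return element
    return None


def summarize_result(xml_text: str) -> dict:
    """Extract entries_total_count, entries_showed_count and the number of rows_group children."""
    root = ET.fromstring(xml_text)

    def count(name: str) -> Optional[int]:
        element = _first(root, name)
        if element is None or element.text is None:
            return None
        return int(element.text)  # int() tolerates surrounding whitespace

    rows = _first(root, "rows_group")
    return {
        "total": count("entries_total_count"),
        "shown": count("entries_showed_count"),
        "row_elements": len(rows) if rows is not None else 0,
    }
```

Comparing shown with total tells a client whether more pages remain to be fetched with the LIMIT and OFFSET options of queries_general_options; keep in mind that total may be an estimate.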

Example queries for the public data API

Example 1:

In this example, the user selects all the biological function features (gene ontology) whose name contains "DNA replication". The "%" sign is used as a wildcard (matching any missing characters) both before and after the pattern. The result is ordered by the name of the biological function feature.
These examples show the solution step by step:
Example 1a
Example 1b
Example 1c
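To preview locally which names a pattern such as "%DNA replication%" would select, the sketch below treats "%" as a wildcard for any sequence of characters, as in SQL LIKE. It is case-sensitive and treats every other character literally; whether the service behaves the same way (for example regarding letter case) is an assumption to verify against real results.

```python
import re


def contains_pattern(text: str) -> str:
    """Wrap text in '%' wildcards, as in Example 1 ("%DNA replication%")."""
    return f"%{text}%"


def like_matches(pattern: str, value: str) -> bool:
    """Local, case-sensitive preview of a '%'-wildcard pattern (SQL LIKE-style).

    Only '%' is a wildcard here; all other characters must match literally.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

For example, like_matches(contains_pattern("DNA replication"), "regulation of DNA replication") is True, while the same pattern does not match "DNA repair".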

Scenario

This scenario shows, step by step, how to extract gene ontology entries (biological function feature) and their related genes.
  1. First of all, the user has to check which features are available in the selected database version. The features are retrieved from the link described in the GET features list section of the Metadata API.
    The result contains all the features together with the links to their detailed information. In this scenario the features "biological_function_feature" and "gene" are used, so their detail links are the ones to follow. Note: the database is selected through the URL query string with the db-handle parameter, as described in the Database Selection section.
  2. In the biological_function_feature details result, the source id, source name, name and description columns of the feature table (in the attribute group tag) are the starting point for the query. Every feature in the databases has the pair of columns "{feature}_id" and "{feature}_source" that identifies an element in an external data source, e.g. biological_function_feature_id and biological_function_feature_source (in older versions these columns are named source_id and source_name, respectively).
    The feature ID is the ID in the original data source, and the feature source is the name of the source the feature element comes from.
    A feature tag with the name attribute "biological_function_feature" is added to the XML, and inside it an attribute group element is compiled with the name biological_function_feature and the type feature table. The selected columns are then listed as attribute elements of that group.
    The full query is in example_1a.xml; it can be run by sending it as a POST request to /GPKB-REST/rest/resources/selections. See the Public data API section for further information, including the required header settings.
    Note: the attributes returned by the metadata feature details API are added as inner attribute elements of the attribute groups. The XML for the public API is compatible with the XML returned by the feature details API, since both are defined by the same XSD.
  3. This step restricts the query to the biological function features whose name contains "DNA replication" by adding a filter option to the previous example: a new "selected_options" sub-element is added to the name attribute.
    The full query is in example_1b.xml.
  4. The queries_general_options element is optional. Here counting is set to "exact" so that the total number of rows returned by the query defined in the XML is counted exactly, and the result is ordered by a column defined in the attribute groups. Both settings are defined in the queries_general_options tag.
    The full query is in example_1c.xml.
    Note: it is better to request an exact count only on the first call and an estimated count on the following ones, so that the cost of the exact count query is paid only once.
  5. The biological_function_feature details (first step) show that this feature is associated with enzyme, gene, pathway and other features, and the association links tag gives the URL for further information about each association.
    For further analysis, the genes associated with the gene ontology entries are added to the query. The gene feature details are obtained from its link; in this example, the source id, source name, name and symbol of the gene are selected and added to the query. The taxonomy is also filtered to "Homo sapiens", but it is not returned in the result because filter_only is set on its attribute element.
    This request XML is in example2.xml.
    Note: the possible taxonomy IDs are available at the following link, taken from the gene feature details: rest/resources/features/encoded/gene/taxonomy_id. The taxonomies starting with "homo" can be retrieved from: rest/resources/features/encoded/gene/taxonomy_id/search/startsWith/homo/limit/5/offset/0
  6. The analysis continues by extracting the evidence and qualifier of the association between gene and gene ontology. To find where these attributes are defined in the data warehouse, check the gene and biological_function_feature association details at the association link: they are available in the "pub_ref_4_gene2biological_function_feature" attribute group.
    The full query is in example3.xml.
  7. Features that have an ontology definition can be unfolded. Such features are marked in the feature details by an ontology element set to true under the feature element, as is the case for biological_function_feature.
    If the feature is an ontology, the query can be extended semantically; this is specified in the feature element of the query.
    The full query is available in example4.xml.
    Note: the descendants of a feature column are available in a column with the same name plus the suffix "_descendant". The unfolded feature information can also be retrieved in a single query; the full query is available in example5.xml.
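The selection document built in steps 2, 4 and 5 of the scenario can be sketched with xml.etree.ElementTree as below. The root element feature_details and the tag names feature, attribute_groups, attribute, queries_general_options and counting come from this page, but the exact nesting, the attribute_group child element, the name/type/filter_only XML attributes and the "feature table" type value are assumptions. The filters of steps 3 and 5 (selected_options) are left out because their structure is defined only in the public XSD. Compare the output with example_1c.xml and example2.xml before use. The descendant_column helper reflects the "_descendant" suffix described in step 7.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Sequence


def add_feature(root: ET.Element, feature_name: str, group_name: str,
                attributes: Sequence[str], group_type: str = "feature table",
                filter_only: Iterable[str] = ()) -> ET.Element:
    """Append a feature with one attribute group listing the given columns."""
    hidden = set(filter_only)
    feature = ET.SubElement(root, "feature", name=feature_name)
    groups = ET.SubElement(feature, "attribute_groups")
    # Assumed element and attribute names: check them against the public XSD.
    group = ET.SubElement(groups, "attribute_group", name=group_name, type=group_type)
    for column in attributes:
        attr = ET.SubElement(group, "attribute", name=column)
        if column in hidden:
            # Filter on this column without returning it (assumed encoding of filter_only).
            attr.set("filter_only", "true")
    return feature


def add_general_options(root: ET.Element, counting: str = "exact") -> ET.Element:
    """Append queries_general_options with the requested counting mode."""
    options = ET.SubElement(root, "queries_general_options")
    ET.SubElement(options, "counting").text = counting
    return options


def descendant_column(column: str) -> str:
    """Name of the unfolded (descendant) column of an ontology feature column."""
    return column + "_descendant"


def scenario_selection() -> str:
    """Hypothetical selection combining biological_function_feature and gene."""
    root = ET.Element("feature_details")
    add_feature(root, "biological_function_feature", "biological_function_feature",
                ["biological_function_feature_id", "biological_function_feature_source",
                 "name", "description"])
    add_feature(root, "gene", "gene",
                ["gene_id", "gene_source", "name", "symbol", "taxonomy_id"],
                filter_only=["taxonomy_id"])
    add_general_options(root, counting="exact")
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be used as the request body of the selections resource once it has been aligned with the public XSD.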