Multiple databases are available to the user on both parts of the API. How to select a database is described in the Database Selection section.

Note:

The data warehouse has a complex structure that changes over time, in order to preserve the original data source information and to extract the data efficiently. To support the data warehouse, the REST services are divided into two sub-sections.
The first service (metadata API) provides the structural information of the selected data warehouse version. The structure of the features and of their annotations is obtained through this service, so each client should always start using the API by calling it.
The second service (public API) provides the annotations that are imported into the data warehouse. By compiling the XML structure defined in the metadata API, a client retrieves the data from this service.

Metadata API

The metadata API first exposes the list of all the features contained in the GPKB: genes, proteins, DNA sequences and so on. For each feature, it is then possible to get the metadata details of the selected feature, namely the list and structure of the tables containing the information about the feature, and the associations with other features in which it is involved.

API outline

Content negotiation: XML, Application-XML.

1) GET features list

Resource URI: /GPKB-REST/rest/resources/features
Resource Info:
By sending an HTTP GET request to the above URI, the service responds with an XML representation listing the names of all the features, together with a link for each feature that gives access to the details of that feature.
Required parameters: None.
Returned results vocabulary: /.
Example: /GPKB-REST/rest/resources/features
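As a minimal sketch of how a client might prepare this call, the following Python snippet builds the GET request with the standard library. The host in BASE_URL is a placeholder assumption (the actual deployment address is not given here), and sending an Accept header of application/xml is assumed from the content negotiation note above.

```python
import urllib.request

# Hypothetical host: replace with the address of the actual GPKB deployment.
BASE_URL = "http://localhost:8080"
FEATURES_PATH = "/GPKB-REST/rest/resources/features"


def build_features_request(base_url=BASE_URL):
    """Prepare (without sending) the GET request for the features list."""
    return urllib.request.Request(
        base_url + FEATURES_PATH,
        headers={"Accept": "application/xml"},
        method="GET",
    )


request = build_features_request()

# To actually call the service, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     xml_text = response.read().decode("utf-8")
```

The XML returned by the service can then be inspected to collect the per-feature links described above.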

2) GET specific feature details

Resource URI: /GPKB-REST/rest/resources/features/{feature_name}
Resource Info:
By sending an HTTP GET request to /GPKB-REST/rest/resources/features/{feature_name}, the service responds with an XML representation containing the details of the selected feature: table structure, attributes, data types and relationships with other features.

Required parameters:

Parameter	Description
feature_name	Name of the feature of interest. E.g.: gene, dna_sequence, enzyme and so on.

Returned results vocabulary:

Returned tag	Description
attribute_groups	The list of all the tables regarding the feature
values	Inside the attribute tag of each attribute group, this tag is non-empty only for encoded fields, namely fields that can assume a predefined range of values. For encoded fields with a small range of values, the values are contained directly in value tags. Otherwise, if the range of values is too broad, a link is provided that can be queried to get further details about the values of the specific encoded field.
value	Inside the values tag, this tag contains either a hyperlink to the full list of values, if the number of values exceeds a certain threshold, or the list of values for the specific encoded field. An encoded field value has the following structure: an id that represents the identifier of the encoded field value, a name that represents the actual value of the field, and a count that represents the total number of elements assuming that specific value. Both id and name can be used, alternatively or together, in the public data queries to specify filter options for the encoded field.
associated_features	Contains the list of associations in which the feature is involved. For each association, the association_basic_info tag holds some basic information about the association; in particular, the name tag contains the name of the actual table that stores the information about the association in the data warehouse.
source	Contains the name of the source or sources that provide the information about the feature.

Example: /GPKB-REST/rest/resources/features/gene
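The URI template above can be filled in programmatically. The sketch below is only an illustration: the helper name feature_details_path is hypothetical, and percent-encoding the feature name is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

FEATURES_PATH = "/GPKB-REST/rest/resources/features"


def feature_details_path(feature_name):
    """Return the resource path of the details of one feature."""
    # quote() leaves letters, digits and "_" untouched and escapes the rest.
    return f"{FEATURES_PATH}/{quote(feature_name, safe='')}"


gene_path = feature_details_path("gene")
dna_path = feature_details_path("dna_sequence")
```

The resulting paths are relative; they have to be appended to the address of the host that serves the API.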

3) GET specific feature association details

Resource URI: /GPKB-REST/rest/resources/features/{feature_name}/assoc/{feature_name}/{associated_feature_name}
Resource Info:
Returns the details of the selected pairwise feature association, namely all the tables, attributes and data types of the first feature involved in the association, of the second feature involved in the association, and of the association itself.

Required parameters:

Parameter	Description
feature_name	Name of the feature of interest. E.g.: gene, dna_sequence, enzyme and so on.
feature_name	The second path parameter represents the name of the first feature involved in the association.
associated_feature_name	Name of the second feature involved in the association.

Returned results vocabulary:

Returned tag	Description
attribute_groups	Inside the association portion, it represents the list of all the tables regarding the feature association
values	Inside the attribute tag of each attribute group, this tag is non-empty only for encoded fields, namely fields that can assume a predefined range of values. For encoded fields with a small range of values, the values are contained directly in value tags. Otherwise, if the range of values is too broad, a link is provided that can be queried to get further details about the values of the specific encoded field.
value	Inside the values tag, this tag contains either a hyperlink to the full list of values, if the number of values exceeds a certain threshold, or the list of values for the specific encoded field. An encoded field value has the following structure: an id that represents the identifier of the encoded field value, a name that represents the actual value of the field, and a count that represents the total number of .... Both id and name can be used in the public data queries to specify filter options for the encoded field.
associated_features	Contains the list of associations in which the feature is involved. For each association, the association_basic_info tag holds some basic information about the association; in particular, the name tag contains the name of the actual table that stores the information about the association in the data warehouse.
source	Contains the name of the source or sources that provide the information about the feature association.

Example: /GPKB-REST/rest/resources/features/gene/assoc/gene/protein
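As the example shows, the name of the first feature appears twice in the path. A small sketch of a helper that reproduces this pattern follows; the function name association_details_path is hypothetical and the percent-encoding is an assumption.

```python
from urllib.parse import quote

FEATURES_PATH = "/GPKB-REST/rest/resources/features"


def association_details_path(feature_name, associated_feature_name):
    """Return the resource path of a pairwise feature association."""
    first = quote(feature_name, safe="")
    second = quote(associated_feature_name, safe="")
    # The first feature name is repeated after "assoc", as in the URI template.
    return f"{FEATURES_PATH}/{first}/assoc/{first}/{second}"


gene_protein_path = association_details_path("gene", "protein")
```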

4) GET specific encoded field values list

Resource URI: /GPKB-REST/rest/resources/features/encoded/{table_name}/{encoded_field_name}
Resource Info:
The service responds with an XML representation containing the total number of values assumed by the encoded field, plus the links to get the full list of values.

Required parameters:

Parameter	Description
table_name	The name of the table that contains the field.
encoded_field_name	The name of the encoded field.

Returned results vocabulary:

Returned tag	Description
link_plain_search	Link to query in order to discover, sequentially, all the values of the specific field. To discover the values it is necessary and sufficient to specify a starting point (offset) and a limit on the number of values returned. E.g.: a request to /GPKB-REST/rest/resources/features/encoded/expasy_enzyme/cofactor/limit/10/offset/0 will return the first 10 values of the encoded field "cofactor" of table "expasy_enzyme".
search_string_links	Links to query in order to discover, by means of a search string, the values of the specific field. To discover the values it is sufficient to specify: the search string ({search_string}) with the wildcard in the desired position (startsWith, contains or endsWith), a starting point (offset) and a limit on the number of values returned. E.g.: a request to /GPKB-REST/rest/resources/features/encoded/expasy_enzyme/cofactor/search/startsWith/b/limit/10/offset/0 or its equivalent /GPKB-REST/rest/resources/features/encoded/expasy_enzyme/cofactor/search/startsWith/B/limit/10/offset/0 will return the first 10 values of the "cofactor" field of table "expasy_enzyme" starting with the letter b/B.

Example: /GPKB-REST/rest/resources/features/encoded/gene/taxonomy_id
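The paging and search links described in this section can be generated as in the following sketch. The helper names are hypothetical, and the paging helper assumes that offset counts values already skipped (not pages) and that the total number of values has been read from the service response beforehand.

```python
from urllib.parse import quote

ENCODED_PATH = "/GPKB-REST/rest/resources/features/encoded"
SEARCH_MODES = ("startsWith", "contains", "endsWith")


def plain_search_path(table_name, field_name, limit, offset):
    """Path returning `limit` values of an encoded field, starting at `offset`."""
    return f"{ENCODED_PATH}/{table_name}/{field_name}/limit/{limit}/offset/{offset}"


def string_search_path(table_name, field_name, mode, search_string, limit, offset):
    """Path returning the values that match `search_string` in the given mode."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"mode must be one of {SEARCH_MODES}")
    text = quote(search_string, safe="")
    return (f"{ENCODED_PATH}/{table_name}/{field_name}"
            f"/search/{mode}/{text}/limit/{limit}/offset/{offset}")


def page_paths(table_name, field_name, total_values, page_size):
    """List the paths needed to walk through all values page by page."""
    return [plain_search_path(table_name, field_name, page_size, offset)
            for offset in range(0, total_values, page_size)]


first_page = plain_search_path("expasy_enzyme", "cofactor", 10, 0)
b_values = string_search_path("expasy_enzyme", "cofactor", "startsWith", "b", 10, 0)
pages = page_paths("expasy_enzyme", "cofactor", 25, 10)
```

With 25 values and a page size of 10, the last helper yields three paths, for offsets 0, 10 and 20.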

Public data API

The public data API, by means of a single parametric service, allows for the comprehensive querying of the data warehouse.

Content negotiation: XML, Application-XML.

The resource URI for the public API is /GPKB-REST/rest/resources/selections.
This API accepts XML as its media type and returns XML (application/xml). The Content-Type and Accept header fields have to be set as follows:

Content-Type:application/xml
Accept:application/xml
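A minimal sketch of how a client might prepare such a request with the Python standard library is given below. The host in BASE_URL is a placeholder assumption, and the payload used here is only an empty stand-in, not a valid selection; a real payload has to conform to the public XSD.

```python
import urllib.request

# Hypothetical host: replace with the address of the actual GPKB deployment.
BASE_URL = "http://localhost:8080"
SELECTIONS_PATH = "/GPKB-REST/rest/resources/selections"


def build_selection_request(xml_payload, base_url=BASE_URL):
    """Prepare (without sending) the POST request carrying a selection XML."""
    return urllib.request.Request(
        base_url + SELECTIONS_PATH,
        data=xml_payload.encode("utf-8"),
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
        },
        method="POST",
    )


# Stand-in payload for illustration only; it is not a valid selection.
request = build_selection_request("<feature_details/>")

# To actually call the service, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     result_xml = response.read().decode("utf-8")
```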

All the selections are to be posted in the request payload, and every selection posted in the XML has to conform to the following XML Schema grammar:
Download public XSD, public XSD documentation and root element feature_details.

Vocabulary of XML Schema for selections

The following table describes only the most important tags. The explanation is omitted for the intuitive ones.

Xml Schema tag	Description
feature	Adding this element to the selections, with the proper feature name in the name attribute, lets the user choose the specific feature or features to include in the query. For each feature the user can then choose the tables to query (attribute_groups element), and for each table the attributes of interest (attribute element). Finally, the user can also specify filter options on these attributes, using the options available in the selected_options element. Furthermore, in queries involving different instances of the same feature (the so-called "alias queries"), the user has to specify an alias for each feature involved, so as to distinguish among the different instances of the features having the same name. The alias name has to conform to the following syntax: "[feature_name]_[counter]".
feature_association	Adding this element to the selections, with the proper feature association name in the name attribute, lets the user choose the specific feature association or associations to include in the query. For each association the user can then choose the tables to query (attribute_groups element), and for each table the attributes of interest (attribute element). Finally, the user can also specify filter options on these attributes, using the options available in the selected_options element. Furthermore, in queries involving different instances of the same feature association (the so-called "alias queries"), the user has to specify an alias for each feature association involved, so as to distinguish among the different instances of the associations having the same name. The alias name has to conform to the following syntax: "[feature1_name]_[counter]TO[feature2_name]_[counter]" or "[feature1_name]_[counter]to[feature2_name]_[counter]".
queries_general_options	This element allows specifying general options for the query, e.g. DISTINCT, LIMIT, OFFSET, ORDER BY. Furthermore, the only_matching element, when set to "TRUE", makes the query use only INNER JOINs between all the selected tables. The counting element, instead, allows choosing between an exact and an estimated total count of the rows returned as the resultSet of the query. In any case, for counting queries whose cost is higher than a predefined threshold, the service returns an approximate count even if the user chose an exact count, so as not to negatively impact the response time.
alias_general_options	This element allows specifying advanced options for the joining of tables in alias queries. In particular, by choosing the join tables and join attributes of interest, it is possible to join tables that are not directly joined with each other, by adding WHERE clauses to the generated SQL statement.
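The two alias syntaxes given in the table can be produced mechanically. The sketch below uses hypothetical helper names, and the counter values in the usage lines are illustrative; the numbering convention for counters is not specified here.

```python
def feature_alias(feature_name, counter):
    """Build an alias of the form [feature_name]_[counter]."""
    return f"{feature_name}_{counter}"


def association_alias(feature1_name, counter1, feature2_name, counter2,
                      separator="TO"):
    """Build an alias of the form [feature1]_[n]TO[feature2]_[m] (or with 'to')."""
    if separator not in ("TO", "to"):
        raise ValueError("separator must be 'TO' or 'to'")
    return (f"{feature_alias(feature1_name, counter1)}"
            f"{separator}"
            f"{feature_alias(feature2_name, counter2)}")


first_gene = feature_alias("gene", 1)
second_gene = feature_alias("gene", 2)
gene_to_protein = association_alias("gene", 1, "protein", 1)
```

Two instances of the gene feature in one query would then be told apart as gene_1 and gene_2.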

Returned results grammar

The XML returned by the service, containing the results of the query together with some metadata regarding the results, conforms to the following XML Schema grammar:

Download result XSD, schema documentation and root element result.

Vocabulary of XML Schema for showing results

The following table describes only the most important tags. The explanation is omitted for the intuitive ones.

Xml Schema tag	Description
entries_total_count	This element holds the total number of rows that are returned by the query. N.B. Unless stated otherwise, the count returned is always exact. There are cases, however, in which the service returns an approximate value even if the user selected an exact count, namely when the exact total count query exceeds a predefined response time threshold. In that case the value is extracted from the information returned by a SQL EXPLAIN query.
entries_showed_count	This element shows the number of rows that are actually returned by the query, according to the limit option specified at selection time.
attributes_group_names_list	This element contains the metadata regarding the results returned, namely the columns selected ("attribute" tag) and the table names or table aliases they belong to.
rows_group	This element contains the collection of rows that represent the actual result of the query. In the resultSet, each row carries its row number and the list of tables involved, each of which contains the result data of the query.
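To illustrate how the two count elements relate, the sketch below reads them from a result document. The sample XML is a simplified assumption made for this example: only the tag names result, entries_total_count and entries_showed_count come from this page, while their nesting and the plain integer content are guesses that should be checked against the result XSD.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical result document used only for illustration.
SAMPLE_RESULT = """<result>
  <entries_total_count>120</entries_total_count>
  <entries_showed_count>10</entries_showed_count>
</result>"""


def read_counts(result_xml):
    """Return (total rows of the query, rows contained in this response)."""
    root = ET.fromstring(result_xml)
    # ".//tag" matches the element at any depth below the root.
    total = root.find(".//entries_total_count")
    showed = root.find(".//entries_showed_count")
    return int(total.text), int(showed.text)


total_rows, showed_rows = read_counts(SAMPLE_RESULT)
more_rows_available = showed_rows < total_rows
```

When more rows are available, the remaining ones can be requested by repeating the query with a different offset.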

Scenario

In this scenario, the extraction of gene ontology terms (biological function feature) and their related genes is shown step by step.

First of all, the user has to check which features are available in the selected database version. The features are retrieved from the link defined in the GET features list section.
The result contains all the features and the links to their detailed information. In this scenario, the features "biological function feature" and "gene" are used, and the links to get the details of these features are:
- rest/resources/features/biological_function_feature
- rest/resources/features/gene
Note: The database is selected through the query string of the URL, with the parameter db-handle. It is described in the Database Selection section.
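As a sketch of the note above, the db-handle parameter can be appended to any resource path as follows. The helper name with_db_handle and the value "example_db" are hypothetical; the valid handle values are the ones listed in the Database Selection section.

```python
from urllib.parse import urlencode


def with_db_handle(path, db_handle):
    """Append the db-handle query parameter to a resource path."""
    separator = "&" if "?" in path else "?"
    return path + separator + urlencode({"db-handle": db_handle})


# "example_db" is a placeholder, not a real database handle.
gene_details = with_db_handle("rest/resources/features/gene", "example_db")
```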
From the biological_function_feature details result, the source id, source name, name and description of the feature table (in the attribute group tag) are the starting point for our query. Every feature in the databases has the pair "{feature}_id" and "{feature}_source", which identifies an element in an external data source, e.g. biological_function_feature_id and biological_function_feature_source (in the older version these columns are source_id and source_name, respectively).
The feature ID is the original data source ID, and the feature source is the name of the source of the feature element.
The feature tag is added to the XML with the name attribute "biological_function_feature", and an attribute group element with the name biological_function_feature and the type feature table is added as an inner element of the feature tag. The selected attributes (source id, source name, name and description) are added as attribute elements inside that attribute group.
The full query is in example_1a.xml, and this query XML can be run against /GPKB-REST/rest/resources/selections as a POST request. Please check the Public data API section for further information; the headers should be set as described there.
Note: The attributes taken from the result of the metadata feature details API are added as inner attribute elements of the attribute groups. The XML for the public API is compatible with the XML result of the feature details API: both are defined by the same XSD.
In this step, the query is restricted to the biological function features whose name contains "DNA replication". A new filtering option is added to the previous example: a new "selected_options" sub-element is added to the name attribute.
The full query is in example_1b.xml.
The queries_general_options element is optional. Here, however, counting is set to "exact" in order to count exactly the total number of rows returned by the query defined in the XML, and the result is ordered by a column defined in the attribute groups. This information is defined in the queries_general_options tag.
The full query is in example_1c.xml.
Note: It is better to request an exact count only in the first call and an estimated count in the following ones. In this way the cost of the exact count query is paid only in the first run.
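The advice in this note can be expressed as a small paging plan. This is only a sketch: the helper names are hypothetical, and the strings "exact" and "estimated" follow the wording of this page, so their exact spelling should be checked against the public XSD before they are used in a query.

```python
def counting_mode(call_index):
    """Use an exact count on the first call and an estimated one afterwards."""
    return "exact" if call_index == 0 else "estimated"


def paging_plan(number_of_pages, page_size):
    """List (offset, limit, counting mode) for each page of a paged query."""
    return [(index * page_size, page_size, counting_mode(index))
            for index in range(number_of_pages)]


plan = paging_plan(3, 100)
```

Each tuple of the plan supplies the OFFSET, LIMIT and counting values for one request.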
From the result of the biological_function_feature feature details (first step), the feature has associated features such as enzyme, gene, pathway, etc. The association links tag provides the URL to get further information about each association.
For further analysis, the genes that are associated with the gene ontology terms are added to the query by adding the gene feature. The feature details are extracted from the corresponding link. In this example, the source id, source name, name and symbol of the gene are selected and added to the query. The taxonomy is filtered to "Homo sapiens" but is not returned in the result, which is obtained by setting filter_only on the attribute element.
This condition is defined in the request XML example2.xml.
Note: The possible taxonomy ids are found at the following link, which is extracted from the details query of the gene feature: rest/resources/features/encoded/gene/taxonomy_id. From this link, the user retrieves the taxonomies starting with "homo" through: rest/resources/features/encoded/gene/taxonomy_id/search/startsWith/homo/limit/5/offset/0
The analysis continues with the extraction of the evidence and qualifier of the gene and gene ontology association. To find where these attributes are defined in the data warehouse, the details of the gene and biological_function_feature association are checked from the corresponding link. These attributes are available in the "pub_ref_4_gene2biological_function_feature" attribute group.
The query is available in example3.xml.
Features that have an ontology definition can be unfolded. Such features are marked in the feature details by an ontology element set to true under the feature element; this is the case for biological_function_feature.
If the feature is an ontology, it is possible to extend the query semantically; this is defined in the feature element of the query.
The full query is available in example4.xml.
Note: The descendant column of the feature is available under the column name with the suffix "_descendant". The unfolded feature information is also available within a single query; the full query is available in example5.xml.

GPKB

Genomic and Proteomic Knowledge Base - RESTful Web API

Welcome to the GPKB RESTful Web API Help Page!
