Introduction
- Language Definition.
- Query examples.
- Querying biological networks.
Query language functionality is implemented as Query Window, from 'Tools'
menu, where users can type and run their queries. Results of user's queries
are immediately visualized in the main window. Pull-down menu of the query
window describes six different types of queries as examples.
To run these queries, start from applying annotations to your graph. To check
what kind of annotations are applied to your nodes/edges just right click
on them. If 'Node Attribute Browser' does not appear, you need to download yeast-context.jar.
Save it on your computer and upload through 'Plugins'->'Load plugins from
Jar File' submenu. Note, for your plugins to work, they should be loaded before
opening your graphs.
Apply your ontology annotations: GO, KEGG,.. . Click on the Ontology icon
of the menu and add levels you are going to search by your queries. In general
case of querying you need to apply all ontology levels.
After all annotations and ontologies are added you can start querying by clicking
on 'Query Engine' from 'Tools' menu.
Be aware of next:
The Query must EXACTLY satisfy the language syntax.
Be sure your terms are correctly spelled.
End your query with ';' and spell ontology terms dividing words by "_" like: "DNA_binding", "protein_serine/threonine_kinase", "general_transcriptional_repressor".
Note, queries searching through all GO tree (like descendants of highter ontology
levels binding, enzyme, trancription_regulator) execute longer than simple
queries (like children, parent, etc.).
Language Definition.
• //SELECTSELECTStatement
::= SELECT ("P_UNION" | "P_INTERSECTION" | "P_DIFFERENCE") SELECTSELECT
::= "SELECT" ("DISTINCT")? SELECTList FROMClause [WHEREClause]SELECTList
::= SELECTItem(","SELECTItem)*SELECTItem::=
(AggFunction"("ReturnType")")|(ReturnType)
•//FROMFROMClause
::= "FROM" ( Database(sb) "." DAG(sb) ( "AS" (f = )
()? )? )*Database::= DAG::=
•//WHEREWHEREClause
::= "WHERE" SQLExpressionSQLExpression::=SQLPrimaryExpression(sb)
(","SQLPrimaryExpression(sb))*SQLPrimaryExpression::=
(( "." ) | ) "="
(( "." ) | NodeFunctionExp |
("REG_EXP" "(\"" LinkType(sb) ) ) )
Query examples.
You can see the language syntax from the list of following
examples:
1.a
SELECT
path_bindings(a)
FROM
dag(x,y,z)
WHERE
a = idlist(1|2|3|999)
1.b
SELECT
path_bindings(a//b)
FROM
dag(x,y,z)
WHERE
a = idlist(1|3),
b = idList(16|)
2.a
SELECT
path_expression(a//b)
FROM
dag(x,y,z)
WHERE
a = idlist(1|3),
b = idList(16|)
2.b
SELECT
full_path_expression(a//b)
FROM
dag(x,y,z)
WHERE
a = idlist(1|3),
b = idList(16|)
3.a
SELECT
full_path_expression(a//b/c)
FROM
dag(x,y,z)
WHERE
a = nameEquals(nervous_system) union nameEquals(pns),
b = idlist(31|57|),
c=idlist(22|21)
3.b
SELECT
path_bindings(a//b/c)
FROM
dag(x,y,z)
WHERE
a = nameEquals(nervous_system) union nameEquals(pns),
b = idlist(31|57|),
c = idlist(22|21)
4
SELECT
full_path_expression(a//b/c//d)
FROM
dag(x,y,z)
WHERE
a = idlist(1|) difference idList(2|),
b = idlist(48|),
c = idlist(5|31|67|3),
d = idlist(12|)
5.a
SELECT
path_bindings(a//b, c//d)
FROM
dag(x,y,z)
WHERE
a = nameEquals(nervous_system),
b = idList(19|20|),
c = idlist(46|),
d = nameEquals(dendrite)
5.b
SELECT
path_expression(a//b), full_path_expression(c//d)
FROM
dag(x,y,z)
WHERE
a = nameEquals(nervous_system),
b = idList(19|20|),
c = idlist(46|),
d = nameEquals(dendrite)
5.c
SELECT
path_bindings(a{l1}//b{l2}//c//d{l3}/e//f, g//h{l4}//i), path_expression(j{l5}//k/l)
FROM
dag(database,username,passwd)
WHERE
a = idlist(12|13|14) intersect nameEquals(protein) union idList(4|),
l3= "isa",
b = idList(23|24|25)
6.
SELECT
full_path_expression(a{l1}//b)
FROM
dag(x,y,z)
WHERE
a = idlist(1|),
b = idlist(31|57|),
l1 = "isa"
7.
SELECT
path_expression(a{l1}//b{l2}/c)
FROM
dag(x,y,z)
WHERE
a = idlist(1|),
b = idlist(49|50|48|52|),
c = idlist(5|31|67|3),
l2 = "has"
8.
SELECT
full_path_expression(a{l1}//{l2}//{l3}b{l4}//d)
FROM
dag(x,y,z)
WHERE
a = idlist(23),
b = idList(49),
d = idlist(12),
l2 = "has"
,
l3 = "has"
l4 = "has"
Querying biological networks.
1) SELECT
path(a)
FROM
Yeast.go
WHERE
a = descendant(nameEquals$DNA_binding);
Select a subnetwork of genes whose GO annotations fall under descendants
of ‘DNA_binding’ term in GO hierarchy.
2) SELECT
path(a)
FROM
Yeast.go
WHERE
a = children (nameEquals$protein_kinase);
Select a subnetwork of genes whose GO annotations fall under children
of ‘protein_kinase’ term in GO hierarchy.
3) SELECT
path(a)
FROM
Yeast.go
WHERE
a = parent (nameEquals$acid_phosphatase);
Select a subnetwork of genes whose GO annotations fall under parent of
‘acid_phosphatase’ term in GO hierarchy.
4) SELECT
path_bindings(a//b)
FROM
Yeast.go
WHERE
a = children (nameEquals$protein_kinase);
b = parent (nameEquals$protein_threonine/tyrosine_kinase);
Select a binding (union) of two subnetworks one of genes whose GO annotations
fall under children of ‘protein_kinase’ term in GO hierarchy and another
whose GO annotations fall under parent of ‘protein_threonine/tyrosine_kinase’
term in GO hierarchy .
5) "Find colocalized proteins grouped by location".
SELECT p.location, set(p)
FROM yeastGraphDB G(N, E)
WHERE p:N and p.type = ’protein’
GROUP BY p.location;
Here, the from clause refers to a graph. Further, thanks to the grouping
condition, the output is a nested relation instead of a graph, where due
to the inner structuring element set, this query produces a set of tuples
(genepairs) for every binding of location.
6) In the next query, we use graph operations in the body of the query,
and the return data type is a graph. “Find networks of colocalized proteins
that are parts of some protein complex and are connected by either a 2-hybrid
(y2h) edge or a comimmunoprecipitation (coIP) edge.”
SELECT graph(N2(n.name, n.source), E2(e.label,e.source))
FROM yeastGraphDB G1(N, E)
WHERE n:N and c:N and e:E
and n.type << ’protein’
and c.type = ’protein complex’
and (e.label = ’y2h’ or e.label = ’coIP’)
and pathExpr(G1, c//[member of]n) = true
The query declares a variable c whose type is protein complex.
The query returns a graph, whose nodes n should be tuples with the attributes
name and source (i.e., data source), and whose edges e should have a
label and a source from which that edge was known.
Recall that the system will convert this to a query on a connector node.
The << operation specifies that the type of the node should be “under”
protein in the node type hierarchy
The last line should be read as “n has an edge whose label has the value
member, and this edge points to c”, where c is declared before. Note that
we did not mention the relationship between nodes n and edges e, namely,
an instance of the returned edge set e connects instances of the returned
node set n. This constraint, expressed as n.edge = e, is implied by the
construct of line 2, where n and e are constrained to be parts of the same
graph.
7) Finally, we present an example from systems biological analysis that
uses the graph-theoretic attributes. In this example, we take two subnetworks
b1 and b2 produced by two subqueries, each using an aggregate graph function.
For each network, the query computes the distribution of the betweenness centrality
of the nodes of the respective graphs, and then uses the F-test to compare
them.
WITH b1 AS (
SELECT distribution(betweenness centrality(*,0.05))
FROM yeastGraphDB G1(N, E)
WHERE n:N and e:E and
n.source IN (’Gavin-DB’, ’Ito-DB’, ’Tong-DB’)
and n.degree() > 2),
b2 AS (...)
SELECT F-test(b1, b2)
FROM b1, b2, stat-source;
Due to a heavy use of statistical operations, a number of statistical operations
have been packaged in a source called stat-source. The function between-centrality
produces a bag of values corresponding to the betweenness centrality of
all nodes satisfying the remaining constraints. The function distribution
takes a set of values and a bucket size and outputs a histogram, which
is known to the system as a basic statistical data type defined as a table
of 2-tuples {category, count} – here the category comes from the number
of distinct values of the centrality measure with a bucket size of 0.05.