Query Language

Introduction

  1. Language Definition.
  2. Query examples.
  3. Querying biological networks.

Query language functionality is implemented as Query Window, from ‘Tools’ menu, where users can type and run their queries. Results of user’s queries are immediately visualized in the main window. Pull-down menu of the query window describes six different types of queries as examples.

To run these queries, start from applying annotations to your graph. To check what kind of annotations are applied to your nodes/edges just right click on them. If ‘Node Attribute Browser’ does not appear, you need to download yeast-context.jar. Save it on your computer and upload through ‘Plugins’->’Load plugins from Jar File’ submenu. Note, for your plugins to work, they should be loaded before opening your graphs.

Apply your ontology annotations: GO, KEGG,.. . Click on the Ontology icon of the menu and add levels you are going to search by your queries. In general case of querying you need to apply all ontology levels.

After all annotations and ontologies are added you can start querying by clicking on ‘Query Engine’ from ‘Tools’ menu.

Be aware of next:
The Query must EXACTLY satisfy the language syntax.
Be sure your terms are correctly spelled.
End your query with ‘;’ and spell ontology terms dividing words by “_” like: “DNA_binding”, “protein_serine/threonine_kinase”, “general_transcriptional_repressor”.
Note, queries searching through all GO tree (like descendants of highter ontology levels binding, enzyme, trancription_regulator) execute longer than simple queries (like children, parent, etc.).

Language Definition.

• //SELECTSELECTStatement
	::= SELECT ("P_UNION" | "P_INTERSECTION" | "P_DIFFERENCE") SELECTSELECT 
	::= "SELECT" ("DISTINCT")? SELECTList FROMClause [WHEREClause]SELECTList	
	::= SELECTItem(","SELECTItem)*SELECTItem::=
	(AggFunction"("ReturnType")")|(ReturnType)

•//FROMFROMClause
	::= "FROM" ( Database(sb) "." DAG(sb) ( "AS" (f = )
	()? )? )*Database::= DAG::=

•//WHEREWHEREClause
	::= "WHERE" SQLExpressionSQLExpression::=SQLPrimaryExpression(sb) 
	(","SQLPrimaryExpression(sb))*SQLPrimaryExpression::=
	((  "."  ) |  ) "="
	((  "."  ) | NodeFunctionExp | 
	("REG_EXP" "(\"" LinkType(sb) ) ) )

Query examples.

You can see the language syntax from the list of following examples:

1.a

SELECT 
	path_bindings(a)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|2|3|999)
1.b

SELECT 
	path_bindings(a//b)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|3),
	b = idList(16|)
2.a

SELECT 
	path_expression(a//b)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|3),
	b = idList(16|)
2.b

SELECT 
	full_path_expression(a//b)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|3),
	b = idList(16|)
3.a

SELECT 
	full_path_expression(a//b/c)
FROM 
	dag(x,y,z) 
WHERE 
	a = nameEquals(nervous_system) union nameEquals(pns), 
	b = idlist(31|57|), 
	c=idlist(22|21)
3.b

SELECT 
	path_bindings(a//b/c)
FROM 
	dag(x,y,z) 
WHERE 
	a = nameEquals(nervous_system) union nameEquals(pns), 
	b = idlist(31|57|), 
	c = idlist(22|21)
4

SELECT 
	full_path_expression(a//b/c//d)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|) difference idList(2|), 
	b = idlist(48|), 
	c = idlist(5|31|67|3),
	d = idlist(12|)
5.a

SELECT 
	path_bindings(a//b, c//d)
FROM 
	dag(x,y,z) 
WHERE 
	a = nameEquals(nervous_system), 
	b = idList(19|20|), 
	c = idlist(46|),
	d = nameEquals(dendrite)
5.b

SELECT 
	path_expression(a//b), full_path_expression(c//d)
FROM 
	dag(x,y,z) 
WHERE 
	a = nameEquals(nervous_system), 
	b = idList(19|20|), 
	c = idlist(46|),
	d = nameEquals(dendrite)
5.c

SELECT 
	path_bindings(a{l1}//b{l2}//c//d{l3}/e//f, g//h{l4}//i), path_expression(j{l5}//k/l) 
FROM 
	dag(database,username,passwd) 
WHERE 
	a = idlist(12|13|14) intersect nameEquals(protein) union idList(4|), 
	l3= "isa", 
	b = idList(23|24|25)
6.

SELECT
	full_path_expression(a{l1}//b)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|), 
	b = idlist(31|57|), 	
	l1 = "isa"	
7.

SELECT 
	path_expression(a{l1}//b{l2}/c)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(1|), 
	b = idlist(49|50|48|52|), 
	c = idlist(5|31|67|3),
	l2 = "has"
8.

SELECT 
	full_path_expression(a{l1}//{l2}//{l3}b{l4}//d)
FROM 
	dag(x,y,z) 
WHERE 
	a = idlist(23),
    b = idList(49),
	d = idlist(12),
    l2 = "has"
    ,
    l3 = "has"
    l4 = "has"

Querying biological networks.

 

  • 1) SELECT
    	path(a)
    FROM
    	Yeast.go
    WHERE
    a = descendant(nameEquals$DNA_binding);
    

    Select a subnetwork of genes whose GO annotations fall under descendants of ‘DNA_binding’ term in GO hierarchy.

  • 2) SELECT
    	path(a)
    FROM
    	Yeast.go
    WHERE
    a = children (nameEquals$protein_kinase);
    

    Select a subnetwork of genes whose GO annotations fall under children of ‘protein_kinase’ term in GO hierarchy.

  • 3) SELECT
    	path(a)
    FROM
    	Yeast.go
    WHERE
    a = parent (nameEquals$acid_phosphatase);
    

    Select a subnetwork of genes whose GO annotations fall under parent of ‘acid_phosphatase’ term in GO hierarchy.

  • 4) SELECT
    	path_bindings(a//b)
    FROM
    	Yeast.go
    WHERE
    a = children (nameEquals$protein_kinase);
    b = parent (nameEquals$protein_threonine/tyrosine_kinase);
    

    Select a binding (union) of two subnetworks one of genes whose GO annotations fall under children of ‘protein_kinase’ term in GO hierarchy and another whose GO annotations fall under parent of ‘protein_threonine/tyrosine_kinase’ term in GO hierarchy .

  • 5) “Find colocalized proteins grouped by location”.
    SELECT p.location, set(p)
    FROM yeastGraphDB G(N, E)
    WHERE p:N and p.type = ’protein’
    GROUP BY p.location;
    

    Here, the from clause refers to a graph. Further, thanks to the grouping condition, the output is a nested relation instead of a graph, where due to the inner structuring element set, this query produces a set of tuples (genepairs) for every binding of location.

  • 6) In the next query, we use graph operations in the body of the query, and the return data type is a graph. “Find networks of colocalized proteins that are parts of some protein complex and are connected by either a 2-hybrid (y2h) edge or a comimmunoprecipitation (coIP) edge.”
    SELECT graph(N2(n.name, n.source), E2(e.label,e.source))
    FROM yeastGraphDB G1(N, E)
    WHERE n:N and c:N and e:E
    and n.type << ’protein’
    and c.type = ’protein complex’
    and (e.label = ’y2h’ or e.label = ’coIP’)
    and pathExpr(G1, c//[member of]n) = true
    

    The query declares a variable c whose type is protein complex.
    The query returns a graph, whose nodes n should be tuples with the attributes name and source (i.e., data source), and whose edges e should have a label and a source from which that edge was known.
    Recall that the system will convert this to a query on a connector node.
    The << operation specifies that the type of the node should be “under” protein in the node type hierarchy
    The last line should be read as “n has an edge whose label has the value member, and this edge points to c”, where c is declared before. Note that we did not mention the relationship between nodes n and edges e, namely, an instance of the returned edge set e connects instances of the returned node set n. This constraint, expressed as n.edge = e, is implied by the construct of line 2, where n and e are constrained to be parts of the same graph.

  • 7) Finally, we present an example from systems biological analysis that uses the graph-theoretic attributes. In this example, we take two subnetworks b1 and b2 produced by two subqueries, each using an aggregate graph function. For each network, the query computes the distribution of the betweenness centrality of the nodes of the respective graphs, and then uses the F-test to compare them.
    WITH b1 AS (
    SELECT distribution(betweenness centrality(*,0.05))
    FROM yeastGraphDB G1(N, E)
    WHERE n:N and e:E and
    n.source IN (’Gavin-DB’, ’Ito-DB’, ’Tong-DB’)
    and n.degree() > 2),
    b2 AS (...)
    SELECT F-test(b1, b2)
    FROM b1, b2, stat-source;
    

    Due to a heavy use of statistical operations, a number of statistical operations have been packaged in a source called stat-source. The function between-centrality produces a bag of values corresponding to the betweenness centrality of all nodes satisfying the remaining constraints. The function distribution takes a set of values and a bucket size and outputs a histogram, which is known to the system as a basic statistical data type defined as a table of 2-tuples {category, count} – here the category comes from the number of distinct values of the centrality measure with a bucket size of 0.05.

 

Biological Articles: