PathSys Architecture

Introduction

  1. PathSys System Architecture
  2. Database Schema
  3. Representation of the Data
  4. Physical Storage Schema (DDL)

PathSyS System Architecture

PathSys system consists of six major parts: Client Side Application, Graph Query Engines, Integration Client/Manager, Data Importer, Schema Mapping Tools and Data Warehouse.

Client Application implements all business logic and a significant part of the user interface. Two novel Graph Query Engines store and query molecular interaction network and directed acyclic graphs such as ontologies and taxonomies, using specialized algorithms (see Chen et al. (2005) for example) customized for each kind of graph. Data Importer can accept data from the external data source, validate it against the schema, and store the data in the warehouse. Integration Client/Manager specifies a new database that needs to be integrated in the system. The user provides the system with the schema of the new source, and the schema is validated and stored in the Schema Library.

 

Database Schema

The PathSys database was designed to represent generic network data.
Current implementation defines three classes of vertices – primary nodes (primary objects), connector nodes(events of interaction or regulation between primary objects) and graph nodes (complex objects (protein complexes, cell processes) that might contain graphs). Connector nodes are identified by mechanism and effect type. Vertices are stored in the table Nodes. Nodes themselves can be of several types (Proteins, Small Molecules, Cell Processes, Expression Controls, Binding, Protein Modification etc.), which are recorded in the table NodeType.

The current implementation supports three classes of links between vertices– directed and non directed(as defined by the field Direction in the table Edges) and membership (as defined by the field Relation in the table Edges). Directed links can describe biological notion of regulation, such as “protein A activates protein B”; non-directed links are used to describe binding events: “protein A binds protein B”; membership links describe situations such as “protein complex P contains protein A”. The database structure does not limit the number of different classes of nodes and edges. Also, there can be any number of node types, attribute types.

The two component networks that we promote are a natural extension of the data models accepted for metabolic databases, and will provide a unifying ground for such diverse phenomena as signaling pathways, protein interaction maps, and gene expression networks.

Any node can have an arbitrary number of associated attributes. Node attributes usually store information about functional class, localization, chemical structure etc. These attribute, as well as names of proteins and other objects, are searchable fields. Attribute values can be translated into color or shape coding of nodes during visualization. Connector node attributes store information about tissues, cell types, experimental conditions, and other biologically meaningful details. All fields are searchable.

Attribute of an object can be assigned a single value, as well as multiple values. Different properties can form sets to describe complex attributes, such as experimental conditions that might include cell type, tissue, organism, species, p_values etc.

 

Representation of the Data

Database tables
Nodes	- Stores primary, connector and graph nodes
	ID 	- Internal node ID [auto number]
	URN	- Global node ID [string]
	Node_Type- Node type ID (Protein, Cell Process, Binding, etc.) [number]
	Node_Class- Node class ID (Node, Control) [number]
	isDeleted	   - “true” if node has been deleted
	DataSource- External database source name where the node comes from

Edges	- Stores links between nodes (primary, connector and graph nodes) 
	ID 	- Internal link ID [auto number]
	IDSrc 	- Internal ID of the source node [number]
	IDDst 	- Internal ID of the destination node [number]
	Direction 	- Link class (directed, non-directed)
	Relation	- Special relationship types(subgraph, member-of, ...)

Graph_Edges	- Stores information about graph edge membership, for performance 
		  purposes, generally for graph nodes (complexes) searches
	graph_ID	- Internal graph ID [auto number]
	Edge_ID	- Link ID (from Edges table) [number]

Graph_Nodes - Stores information about graph node membership, for performance 
		  purposes, generally for graph nodes (complexes) searches
	graph_ID	- Internal graph ID [auto number]
	Node_ID	- Node ID (from Node table) [number]

NodeType	- Stores node type information (Protein, Binding, etc.) as hierarchy
	ID		- Node type ID [auto number]
	Name		- Node type name [string]
	child_of		- child of the node type in the node type hierarchy [number]
	NodeType_SynonymsID- Node Type synonims ID [number]

NodeType_Synonyms - Stores information about Node Type synonyms
	ID		- Node type synonym ID [auto number]
	NodeType_Synonym - Node Type synonym [number]

Attributes	- Stores attributes associated with nodes
	ID		- Internal attribute value ID [auto number]
	NodeID		- ID of the corresponding Node [number]
	Attribute_Type_ID	- Attribute type ID (Name, Organism, Effect, etc.) [number]
	AttrOption	- Attribute option (multi-value, single value) [number]
	DataType 	- Data type of attribute [string]

DoubleValue - Stores values for dictionary-based properties
	ID	- Internal value ID [auto number]
	Value	- Property value 
	NodeID	- Corresponding Node ID. For performance purposes only.
		 It reduces the number of joins, although the information is duplicated.

IntegerValue - Stores values for dictionary-based properties
	ID	- Internal value ID [auto number]
	Value	- Property value
	NodeID	- Corresponding Node ID. For performance purposes only.
		 It reduces the number of joins, although the information is duplicated.

FloatValue - Stores values for dictionary-based properties
	ID	- Internal value ID [auto number]
	Value	- Property value
	NodeID	- Corresponding Node ID. For performance purposes only.
		 It reduces the number of joins, although the information is duplicated.

StringValue	- Stores values for dictionary-based properties
	ID	- Internal value ID [auto number]
	Value	- Property value
	NodeID	- Corresponding Node ID. For performance purposes only.
		 It reduces the number of joins, although the information is duplicated.

LongStingValue - Stores values for dictionary-based properties
	ID	- Internal value ID [auto number]
	Value	- Property value
	NodeID	- Corresponding Node ID. For performance purposes only.
		 It reduces the number of joins, although the information is duplicated.
	
Attribute_Type	- Stores attribute type information (Name, Organism, Effect, etc.)
	ID		- Attribute type ID [auto number]
	AttributeName	- Attribute type name [string]
	
Attribute_Type_Category	- Stores attribute category hierarchy
	ID		- Internal attribute category ID [auto number]
	Attribute_Type_Name- Attribute type name 
	Relation		- Special relationship types (subgraph, member-of, ...)
	child_of		- child of the node type in the attribute category hierarchy [number]

AT_NT	- Stores the attribute type and node type relationships, for performance purposes only
	Attribute_Type	- Attribute type ID [number]
	Node_Type	- Node Type ID [number]

Data_Source - Stores the information about different data sources 
	ID		- Internal Data Source ID [auto number]
	IDName		- Data Source Name [string]
	FullName		- Full name of data source [string]
	Last_Depos_Time	- Last time data was deposited
	Description	- Data source description [string]
	Reference	- Reference to data source [string]
	UserID		- Corresponding user's ID [number]

User	- Stores the information about different users, allowed to store, update or edit 
	 database information
	ID	- Internal User ID [auto number]
	Name	- User Name [string]
	Institute	- User's institution name [string]
	email	- User's e-mail [string]

Physical Storage Schema (DDL)

CREATE TABLE NODES
(
ID NUMBER(8) NOT NULL PRIMARY KEY,
URN VARCHAR2(50) NOT NULL,
Node_Type NUMBER(5) NOT NULL,
Node_Class CHAR NOT NULL,
isDeleted boolean,
Data_Source NUMBER(5),
CONSTRAINT Node_Type_FKEY
	FOREIGN KEY (Node_Type) REFERENCES NODETYPE (ID),
CONSTRAINT Data_Source_FKEY
	FOREIGN KEY (Data_Source) REFERENCES DATA_SOURCE (ID)
)
;
CREATE INDEX Node_ID_Ind ON NODES(ID);
CREATE INDEX Node_Type_Ind ON NODES(Node_Type);
CREATE INDEX Node_Calss_Ind ON NODES(Node_Class);



CREATE TABLE EDGES
(
ID NUMBER(10) NOT NULL PRIMARY KEY,
IDSrc NUMBER(8) NOT NUL,
IDDst NUMBER(8) NOT NULL,
Direction CHAR NOT NULL,
Relation VARCHAR2(50)
CONSTRAINT IDSrc_FKEY
	FOREIGN KEY (IDSrc) REFERENCES NODES (ID),
CONSTRAINT IDDst_FKEY
	FOREIGN KEY (IDDst) REFERENCES NODES(ID),
)
;
CREATE INDEX Edges_Ind ON EDGES(ID);
CREATE INDEX Edges_Ind ON EDGES(IDSrc);
CREATE INDEX Edges_Ind ON EDGES(IDDst);
CREATE INDEX Edges_Ind ON EDGES(Direction);



GRAPH_EDGES
(
graph_ID NUMBER(5) NOT NULL,
Edge_ID NUMBER(10) NOT NULL
CONSTRAINT EDGEID_FKEY
	FOREIGN KEY (EDGE_ID) REFERENCES EDGES (ID)
)
;
CREATE INDEX graph_nodes_Ind1 ON GRAPH_EDGES(graph_ID);
CREATE INDEX graph_edge_Ind1 ON GRAPH_EDGES(Edge_ID);



GRAPH_NODES
(
graph_ID NUMBER(5) NOT NULL,
Node_ID NUMBER(8) NOT NULL
CONSTRAINT NODEID_FKEY
	FOREIGN KEY (NODE_ID) REFERENCES NODES (ID)
)
;
CREATE INDEX graph_nodes_Ind2 ON GRAPH_NODES(graph_ID);
CREATE INDEX graph_edge_Ind2 ON GRAPH_NODES(Node_ID);



CREATE TABLE NODE_TYPE
(
ID NUMBER(5) NOT NULL PRIMARY KEY,
Name VARCHAR2(50) NOT NUL,
child_of NUMBER(5),
NodeType_SynonymsID NUMBER(10)
CONSTRAINT child_of_FKEY2
	FOREIGN KEY (child_of) REFERENCES NODE_TYPE (ID),
CONSTRAINT SynonimsID_of_FKEY2
	FOREIGN KEY (NodeType_SynonymsID) REFERENCES NodeType_SYNONIMS (ID),
)
;
CREATE INDEX Edges_Ind ON NODE_TYPE(ID);
CREATE INDEX Edges_Ind ON NODE_TYPE(child_of);



CREATE NodeType_SYNONIMS
(
ID NUMBER(10) NOT NULL,
NodeType_Synonym VARCHAR2(50)
)
;
CREATE INDEX NodeType_Syn_Ind ON NodeType_SYNONIMS(ID);



CREATE TABLE ATTRIBUTES
(
ID NUMBER(20) NOT NULL PRIMARY KEY,
NodeID NUMBER(8) NOT NULL,
Attribute_Type_ID NUMBER(5) NOT NULL,
attrOption NUMBER(3) NOT NULL,
DataType VARCHAR2(20)
CONSTRAINT NodeID_FKEY
	FOREIGN KEY (NodeID) REFERENCES NODES (ID),
CONSTRAINT Attribute_Type_ID_FKEY
	FOREIGN KEY (Attribute_Type_ID) REFERENCES ATTRIBUTE_TYPE (ID),
CONSTRAINT Node_Type_ID_FKEY
	FOREIGN KEY (Node_Type_ID) REFERENCES NODETYPE (ID),
)
;
CREATE INDEX Attributes_Ind1 ON ATTRIBUTES(ID);
CREATE INDEX Attributes_Ind2 ON ATTRIBUTES(NodeID);



CREATE TABLE DOUBLEVALUE
(
ID NUMBER(8) NOT NULL PRIMARY KEY,
VALUE NUMBER(10),
NodeID NUMBER(8)

CONSTRAINT DOUBLEVALUE_FKEY
	FOREIGN KEY (ID) REFERENCES ATTRIBUTES (ID),
CONSTRAINT NODEID_FKEY
	FOREIGN KEY (NodeID) REFERENCES NODES (ID)
)
;
CREATE INDEX SETVALUE_Ind1 ON DOUBLEVALUE(ID);
CREATE INDEX NODE_Ind1 ON DOUBLEVALUE(NodeID);



CREATE TABLE INTEGERVALUE
(
ID NUMBER(8) NOT NULL PRIMARY KEY,
VALUE NUMBER(5),
NodeID NUMBER(8)

CONSTRAINT INTEGERVALUE_FKEY
	FOREIGN KEY (ID) REFERENCES ATTRIBUTES (ID),
CONSTRAINT NODEID_FKEY
	FOREIGN KEY (NodeID) REFERENCES NODES (ID)
)
;
CREATE INDEX INTEGERVALUE_Ind ON INTEGERVALUE(ID);
CREATE INDEX NODE_Ind2 ON INTEGERVALUE(NodeID);


CREATE TABLE FLOATVALUE
(
ID NUMBER(8) NOT NULL PRIMARY KEY,
VALUE NUMBER(8,3),
NodeID NUMBER(8)

CONSTRAINT FLOATVALUE_FKEY
	FOREIGN KEY (ID) REFERENCES ATTRIBUTES (ID),
CONSTRAINT NODEID_FKEY
	FOREIGN KEY (NodeID) REFERENCES NODES (ID)
)
;
CREATE INDEX FLOATVALUE_Ind ON FLOATVALUE(ID);
CREATE INDEX NODE_Ind3 ON FLOATVALUE(NodeID);


CREATE TABLE STRINGVALUE
(
ID NUMBER(8) NOT NULL PRIMARY KEY,
VALUE VARCHAR2(50),
NodeID NUMBER(8)

CONSTRAINT STRINGVALUE_FKEY
	FOREIGN KEY (ID) REFERENCES ATTRIBUTES (ID),
CONSTRAINT NODEID_FKEY
	FOREIGN KEY (NodeID) REFERENCES NODES (ID)
)
;
CREATE INDEX STRINGVALUE_Ind ON STRINGVALUE(ID);
CREATE INDEX NODE_Ind4 ON STRINGVALUE(NodeID);


CREATE TABLE LONGSTRINGVALUE
(
ID NUMBER(8) NOT NULL PRIMARY KEY,
VALUE VARCHAR2(1000),
NodeID NUMBER(8)

CONSTRAINT LONGSTRINGVALUE_FKEY
	FOREIGN KEY (ID) REFERENCES ATTRIBUTES (ID),
CONSTRAINT NODEID_FKEY
	FOREIGN KEY (NodeID) REFERENCES NODES (ID)
)
;
CREATE INDEX LONGSTRINGVALUE_Ind ON STRINGVALUE(ID);
CREATE INDEX NODE_Ind5 ON LONGSTRINGVALUE(NodeID);


CREATE TABLE ATTRIBUTE_TYPE
(
ID NUMBER(5) NOT NULL PRIMARY KEY,
AttributeName VARCHAR2(30) NOT NULL,
CONSTRAINT ATTRIBUTE_TYPE_FKEY
	FOREIGN KEY (ID) REFERENCES ATTRIBUTE_TYPE_CATEGORY (ID)
)
;
CREATE INDEX ATTRIBUTE_TYPE_Ind ON ATTRIBUTE_TYPE(ID);



CREATE TABLE ATTRIBUTE_TYPE_CATEGORY
(
ID NUMBER(5) NOT NULL PRIMARY KEY,
Attrubute_Type_Name VARCHAR2(30) NOT NULL,
Relation VARCHAR2 (50),
child_of NUMBER(5)
CONSTRAINT attrchild_of_FKEY
	FOREIGN KEY (child_of) REFERENCES ATTRIBUTE_TYPE_CATEGORY (ID)
)
;
CREATE INDEX ATTRIBUTE_TYPE_CATEGORY_Ind ON ATTRIBUTE_TYPE_CATEGORY(ID);
CREATE INDEX ATTRIBUTE_TYPE_CATEGORY_child_Ind ON ATTRIBUTE_TYPE_CATEGORY(child_of);



AT_NT
(
ATTRIBUTE_TYPE NUMBER(5) NOT NULL,
NODE_TYPE NUMBER(5) NOT NULL

CONSTRAINT ATTRIBUTE_TYPE_of_FKEY
	FOREIGN KEY (ATTRIBUTE_TYPE) REFERENCES ATTRIBUTE_TYPE (ID),
CONSTRAINT NODE_TYPE_of_FKEY
	FOREIGN KEY (NODE_TYPE) REFERENCES NODE_TYPE (ID),
)
;
CREATE INDEX ATTRIBUTE_TYPE__Ind ON AT_NT(ATTRIBUTE_TYPE);
CREATE INDEX NODE_TYPE__Ind ON AT_NT(NODE_TYPE);



CREATE TABLE DATA_SOURCE
(
ID NUMBER(5) NOT NULL PRIMARY KEY,
IDName Varchar2(20) NOT NULL,
FullName varchar2(30) NOT NULL,
Last_Depos_Time Date,
Description varchar2(2000),
Reference varchar2(50),
UserID NUMBER(5),
CONSTRAINT User_ID_FKEY
	FOREIGN KEY (User_ID) REFERENCES USER (ID)
)
;



CREATE TABLE User
(
ID NUMBER(5) NOT NULL PRIMARY KEY,
Name Varchar2(20) NOT NULL,
Institute varchar2(20),
email varchar2(30)
)
;

Biological Articles: