BiologicalNetworks is a program developed by the San Diego Supercomputer Center (SDSC) at UCSD that provides a research environment for inferring cellular molecular mechanisms and elucidating the interrelated factors that act at different levels of an organism, including genes, biomolecules, cells, and cell systems.
The program draws on more than 100 public databases covering thousands of eukaryotic, prokaryotic and viral genomes, integrated in IntegromeDB, the back-end database of BiologicalNetworks.
BiologicalNetworks provides a “one stop shop” for researchers, supplying the information needed to decipher gene regulatory networks, along with sequence and experimental data, functional annotation, orthology relations, and transcriptional regulatory region analysis.
The program is developed through multiple Driving Projects; about 10 of them are embedded in the program, and more are available on the BiologicalNetworks website. Driving Projects are research projects selected for their scientific merit in answering important biological questions; together they represent a broad range of research endeavors that advance their disciplines. Driving Projects push the BiologicalNetworks project to improve its technologies and provide feedback on our work.
The querying capabilities of BiologicalNetworks let users formulate queries with virtually any combination of properties (name/synonym, function, sequence, expression, etc.) and conditions on any combination of entities (gene/protein, promoter, COG, pathway, etc.) and/or relations (interactions, co-expression, co-citations, etc.). Queries can be combined with the pathway-building infrastructure to discover molecular interactions, relationships and modules from high-throughput experiments.
The Integrated Genome Viewer lets the user search and analyze gene regulatory regions, transcription factor binding sites and their conservation across multiple species, in conjunction with molecular pathways/networks, experimental data and functional annotations.
BiologicalNetworks uses a specialized graph visualization engine to display biological pathways, gene regulation networks and protein-protein interaction maps for intuitive exploration and prediction. The software handles many tasks, including graphic drawing and layout optimization, data filtering and pathway expansion, and classification and prioritization of proteins.
BiologicalNetworks uses a proprietary file format (BNX, the BiologicalNetworks model XML format) that stores the model together with its simulation environment. It also lets the user import and export models in the Systems Biology Markup Language (SBML), SIF and GML file formats.
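The manual does not describe how BiologicalNetworks reads SIF files, so what follows is only a minimal Python sketch of the format itself. It follows the common Cytoscape convention: each line holds a source node, a relationship type (for example `pp` for protein-protein) and one or more target nodes. Fields are tab-delimited when tabs are present and whitespace-delimited otherwise. The functions `parse_sif` and `write_sif` are hypothetical names for illustration, not part of BiologicalNetworks.

```python
from typing import List, Set, Tuple

# Hypothetical helpers for illustration; not a BiologicalNetworks API.
Edge = Tuple[str, str, str]  # (source, relationship type, target)


def parse_sif(text: str) -> Tuple[Set[str], List[Edge]]:
    """Parse SIF text into a node set and a list of (source, relation, target) edges."""
    nodes: Set[str] = set()
    edges: List[Edge] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Tabs take priority so node names may contain spaces in tab-delimited files.
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in fields if f.strip()]
        source = fields[0]
        nodes.add(source)
        if len(fields) < 3:
            continue  # a lone node name (or a line with no targets) adds no edges
        relation = fields[1]
        for target in fields[2:]:
            nodes.add(target)
            edges.append((source, relation, target))
    return nodes, edges


def write_sif(edges: List[Edge]) -> str:
    """Serialize edges as tab-delimited SIF, one edge per line."""
    return "".join(f"{s}\t{r}\t{t}\n" for s, r, t in edges)
```

For example, `parse_sif("TP53\tpp\tMDM2\tEP300\n")` yields two edges that share the source `TP53`. A line with a single name records a node that has no interactions.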
The following is a summary of the functionality of BiologicalNetworks:
BiologicalNetworks is a freely distributed, web-based, multi-platform Java application. It has been tested on Windows and Macintosh platforms.
- Biologist-friendly User Interface (UI)
The user interface has been designed to be as friendly as possible for biologists. The UI provides adequate tools for cell model building while hiding certain underlying mechanisms from the user.
- Database Integration
The BiologicalNetworks platform is integrated with IntegromeDB, an integrated database system compiled from over 100 databases and ontological data sources.
- Database Querying
The BiologicalNetworks platform provides extended querying functionalities to specify and retrieve biologically meaningful networks.
- Network Analysis
BiologicalNetworks provides network statistics for the loaded models. Another main analysis function is finding conserved pathways.
- Genomic Sequences Analysis
BiologicalNetworks provides an integrated environment for working with genomic sequences and regulatory regions in the context of biological pathways and gene regulatory networks. It offers extended functionality for searching regulatory regions and other sequences, and for comparative genomics analysis.
- 3D Protein Structure Analysis
BiologicalNetworks provides an integrated environment for working with 3D protein structure data in the context of biological pathways and gene regulatory networks.
- SQL-like querying language
The system is equipped with a novel query engine with a built-in SQL-like query language that supports operations on paths, trees and graphs.
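The syntax of the SQL-like query language is not given here, so the sketch below does not try to reproduce it. It only illustrates what a basic path operation computes: a breadth-first search for a shortest chain of interactions between two nodes in an undirected network. The functions `build_adjacency` and `shortest_path` are hypothetical, written in Python for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical helpers for illustration; not the BiologicalNetworks query engine.


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; every endpoint becomes a key."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return one shortest path from start to goal, or None if they are not connected."""
    if start not in adj or goal not in adj:
        return None
    previous: Dict[str, Optional[str]] = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk back to the start
                path.append(cur)
                cur = previous[cur]
            return path[::-1]
        for neighbor in sorted(adj[node]):  # sorted only to make results deterministic
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

Breadth-first search explores nodes in order of distance from the start, so the first time it reaches the goal it has found a path with the fewest interactions.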
This tutorial guides users through BiologicalNetworks by introducing its features and functions in greater detail.
Chapter 2 describes how to download and install BiologicalNetworks. Existing BiologicalNetworks users may skip directly to Chapter 3.
Chapter 3 summarizes how to access the main features included in BiologicalNetworks.
Chapter 4 describes the GUI in detail and introduces users to the various functions of BiologicalNetworks.
Chapter 5 presents biologically relevant examples that provide a step-by-step guide to building models with BiologicalNetworks.
Chapter 6 explains data management in BiologicalNetworks.
Chapter 7 then provides more detail on working with Microarray data.
Finally, Chapter 8 demonstrates how to work with Gene interactions and relationships.
2. Getting Started
2.1 System Requirements
In order to run BiologicalNetworks successfully, your computer must meet the following minimum system requirements:
2.2 Download Additional Data Files
- To try microarray, 3D protein structure and functional data analysis, you can download example data files.
Download the ExampleData.ZIP file containing Stanford (tab delimited), Affymetrix, TIGR, GenePix microarray data, GeneOntology data files and PDB 3D structures of proteins from the BiologicalNetworks website (http://BiologicalNetworks.net/ExampleData.zip). Before you download a file, notice that its byte size is provided on the download page. Once the download has completed, check that you have downloaded the full, uncorrupted data file.
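The comparison with the byte size shown on the download page can be automated. The Python sketch below uses only the standard library and assumes you copy the expected size from the download page yourself. `check_download` is a hypothetical helper, not part of BiologicalNetworks. It checks the size first, then asks `zipfile` to verify the CRC of every archive member.

```python
import os
import zipfile

# Hypothetical helper for illustration; not part of BiologicalNetworks.


def check_download(path: str, expected_size: int) -> bool:
    """True if the file has the expected byte size and every zip member passes its CRC check."""
    if os.path.getsize(path) != expected_size:
        return False  # truncated or otherwise incomplete download
    if not zipfile.is_zipfile(path):
        return False  # no valid zip directory found
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None if all are intact.
        return archive.testzip() is None
```

For example, `check_download("ExampleData.zip", size_from_page)` returns `False` for a partially downloaded file. In that case, download the file again before unzipping it.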
- The zip file also contains GO annotation files for Gene Ontology annotation analysis.
- Unzip the file anywhere on your hard drive. You can then load these data files into the BiologicalNetworks analysis environment.
If you have problems running BiologicalNetworks, try disabling your firewall program and launching BiologicalNetworks again. If this solves the problem, change your firewall settings to allow Sun Java to connect to the Internet.
To allow Sun Java to access the Internet in Norton Internet Security (NIS), follow these steps:
- Open NIS.
- Click on Status & Setting menu item on the left.
- Click on Personal Firewall.
- Press Configure button.
- Select Programs tab.
- Find the java item in the list.
- Select the Permit All option in the Internet Access column.
- Click OK.
If you have a different firewall program, please refer to its specific manual for more details.
If disabling the firewall does not help, restart your computer.
BiologicalNetworks also has an embedded Report Issue tool that lets users submit critical errors, enhancement and feature requests, or general questions about the program. The tool is simple to use, so first-time users and experts alike can reach the developers easily.
To access this feature:
- Open the “Tools” submenu;
- Select “Report Issue”;
- Fill out the necessary sections in the new window (Figure 2.1);
- Press OK to submit the form.
Figure 2.1 – Report Issue Window.
The three sections of the Report Issue window let the user type a description of the problem/request, attach problematic files, and include an email address at which he or she can be reached.
3. Quick Tour of BiologicalNetworks
BiologicalNetworks is an application for navigating and analyzing molecular networks, protein structures, genomic sequences and various kinds of experimental data. The software is also a powerful drawing tool that lets researchers summarize experimental results and produce publication-quality pathways.
3.0 Standard Features
The software can handle a variety of tasks, including: graphics and layout optimization, data filtering and pathway expansion, as well as classification and prioritization of proteins. BiologicalNetworks supports multiple features available in public pathway browsers and graphical toolkits.
With BiologicalNetworks you can:
- Create, manage and modify objects – pathways, nodes, reactions, interactions and relations, etc.
- Design BiologicalNetworks projects to share with other users.
- Move and lay out individual nodes, selected subgroups, and colored groups.
- Access pathways, genomic sequences, protein 3D structures, and experimental data (e.g. microarrays).
- View and modify the Driving Projects (Figure 1.2 below) that are embedded in the program and located on the web. (Currently there are 17 Driving Projects, and new ones are being created.)
3.1 General Layout of Interface
As shown in Figure 3.1 below, the new interface is composed of four major areas: the Project Pane on the left, the Search Results Pane at the bottom, the central Pathway Navigation Pane, and the Properties Pane on the right.
Figure 3.1 – BiologicalNetworks program layout.
- The Pathway Navigation Pane can display multiple pathways, one tab per pathway. The Pathway Map is fully configurable: it can show the whole pathway or any set of nodes, and for a selected group of proteins, additional detailed information is displayed in the List Pane.
- The Project Pane organizes data in a Workspace. Proteins, small molecules, and cell processes can be stored in folders. Folders are hierarchical and can include both subfolders and single objects. A separate set of folders is reserved to represent a functional protein ontology and/or classification hierarchies.
- The Search Results Pane is organized as several tables, in which each column represents one characteristic of a node or control found by your search. Useful characteristics include protein symbol, gene name, links to external databases (GenBank, LocusLink, Golden Path, HUGO), experimental conditions, etc. These characteristics can be used to filter or sort objects in the table. A selection in the table can be highlighted on the map and vice versa. Behind the Project Pane there are three other panels: the Palette Panel, Curated Pathways and Database Tables Pane.
- The Pathways Pane contains all previously created and readily available pathways.
- The Viewer Pane provides an interactive 3D model of the selected node that visually represents its chemical components. The model can be rotated about any axis with the mouse.
- The Experimental Data Pane allows the user to search through preexisting data and microarray files.
- The Properties Pane includes detailed information about proteins, genes, and network connections.
- The Palette Pane, shown in Figure 3.2 below, displays the whole list of node and control types and their graphical representations.
3.2 Beginning BiologicalNetworks
3.2.1 Opening Files/ Projects
BiologicalNetworks lets users open many types of existing files and projects through the menu options. Figure 3.2.1 shows the different types of files that can be loaded into the program. Example files are available for download on the BiologicalNetworks website (http://BiologicalNetworks.net/ExampleData.zip).
3.2.2 Creating a Pathway/ Model
There are several ways to create a pathway/model in BiologicalNetworks. Users can load ready-made pathways/models from pathway databases or SBML files (covered in later chapters), or create a new pathway/model manually from scratch. This section describes manual creation.
Creating a project – Click the New Project button to open a new project workspace. A new modeling canvas is generated automatically.
Creating a species – Use the Species buttons in the Palette Pane to create new species. Click a species button once, then click on the modeling canvas where you want to place the new species.
Creating a reaction node – Use the Reaction button to create a new reaction in the modeling canvas. Click on the button once and click in the modeling canvas where you want the new reaction created.
Creating a reaction – Use the Linking Tool to link species to the reaction node. Click the tool once, click the species you want to link with the reaction node, then hold down the left mouse button and drag the arrow to the reaction node.
Creating a compartment – Use the Compartment button to create a new compartment in the modeling canvas. Click on the workspace and drag the mouse to the desired region which covers all the reactions taking place in the same compartment.
You can explore all the species in the model tree viewer by category and edit any of them (Figure 3.2.4).
Figure 3.2.3 – The Palette tab in the Projects Pane.
Figure 3.2.4 – BiologicalNetworks Project Browser.
3.3 Defining Pathways/Models
3.3.1 Defining a Node
Single-clicking a node in the modeling canvas highlights the editable fields of the Node Properties Panel (Figure 3.3.1). This panel lets you view and modify the bioentity properties: Name, Aliases, various IDs, etc. You can also include biological information, e.g. full GO annotations and organism and pathway information, as well as kinetic information: Quantity, Concentration, Unit, Molecular Weight, etc. In addition, links lead to online databases with more detailed information, and the Full Description window at the bottom of the panel shows the full description of the bioentity.
3.3.2 Defining a Process/Reaction
Double-click the Process node in the modeling canvas to open the Process Properties panel (Figure 3.3.2). Modify the process properties in this panel (process stoichiometry, whether the process is reversible, and other properties). You can also click the “kinetics” tab to view and modify the reaction/process kinetics.
3.4 Exploring Curated Pathways
Click the Pathways Data Source Window tab in the left panel of the application window to bring up the Curated Pathways tab pane. This pane contains a database tree panel whose root node is ‘KEGG’. Click the KEGG node to show the organisms available in the KEGG database; after the data are retrieved, over 1,000 organism nodes appear. Click any organism node to show its pathways, and click any pathway to view its list of reaction nodes.
Figure 3.4.1 – Curated Pathways Pane.
Figure 3.4.2 – Pathways: Organisms, Pathways, Reactions, Metabolites.
To load a pathway into the Pathway Panel, drag its node and drop it on the Pathway Panel window. The pathway's reactions, compounds and their interactions are drawn in the model workspace window at random positions; use the layout tools to arrange them.
4. BiologicalNetworks Graphical User Interface
4.1 BiologicalNetworks Components
The main BiologicalNetworks window is divided into three panes and a Menu ToolBar. The panes are:
- Pathway Pane.
- Project Pane.
- List Pane.
Panes are the parts of the software interface that help you manage your data. This section describes the panes and gives guidelines for their appearance and general use (Figure 3.1).
4.2 BiologicalNetworks Panes
4.2.1 Project Pane
The Project Pane contains:
- Project properties pane.
- Choose organism drop-down menu.
- Project Folders.
The Project Pane contains all the currently opened projects and their respective subcategories.
The project properties pane lets you define personal project settings so that the database security system recognizes you and allows you to add, edit, or remove data in the database.
Choosing a project from the menu allows you to view all of its individual components.
4.2.2 Pathways Pane
The Pathway Navigation Pane, or Graph Display, can display several pathway views, one window per pathway. The pathway map is fully configurable and can show individual pathways or any selection of nodes. More detailed information about a selected group of proteins can be viewed in the List Pane; for example, Figures 3.3.1 and 3.3.2 show how to find a description of selected proteins.
4.2.3 Search Results Pane
The Search Results Pane functionality allows you to:
- Show query results and group contents.
- Copy and paste contents of the table into the pathway.
- Copy and paste contents of the table into MS Excel.
- Select nodes from the table on the pathway diagram.
- Retrieve the selection from the pathway diagram.
- Find pathways and groups that contain nodes of interest.
- Link protein names stored in the List Pane tables to HUGO, LocusLink, and GenBank. Hyperlinks are shown in blue.
When the Search Results Pane shows the contents of groups or search results, you can examine useful characteristics of nodes and controls, including symbol, gene name, functional group, links to external databases (GenBank, LocusLink, Golden Path, HUGO), experimental conditions, etc. These characteristics can serve as the basis for filtering or sorting objects in the table. Table selections can be highlighted on a pathway map and vice versa.
To show nodes of interest in the Search Results Pane:
- Create a query. For example, use the Attributes Search (Figure 6.3.2) to find nodes that contain the string LAT in their names;
- Run the query;
- The results appear in the Search Results Pane.
To copy a node of interest from the Search Results Pane to the pathway diagram:
- Select the node of interest;
- Add the selected node to the pathway diagram using copy and paste from the Search Results Pane, or simply drag and drop the selection into the Pathway Pane.
To select a node of interest on the graph:
- In the Search Results Pane, hover the mouse over the record containing the node of interest;
- Open the right click menu and choose Select on Graph;
- If the node of interest is included in the current pathway diagram, it will be selected on the graph.
To retrieve selection from the graph:
- Open the Search Results Pane;
- Open the pathway of interest. Select the node of interest on the graph;
- Open the context menu on the List Pane and choose Get Selection from Graph;
- If the node of interest is included in the current list of nodes, it will be selected on the Search Results Pane;
The Search Results Pane functionality allows you to find groups and pathways that the node of interest is a member of. Find groups and pathways containing the selected object(s) by using the steps described in Section 6.4.
4.2.4 Palette Pane
The Palette Pane contains the full list of node and control types and their graphical representations. Property values can be displayed through the shape and color of the node in the Pathway Pane. Every kind of node or control in the software has a unique name and graphical representation. By default, BiologicalNetworks has built-in styles for all kinds of nodes and controls stored in the List Pane (Figure 3.2).
4.3 Menu, Toolbar and Model Kit
Menu, toolbar and model kit can be used for quick access to various components of BiologicalNetworks. The menu options are listed in Table 4-1.
|New||Create a new model workspace.|
|Open||Open an existing model saved in the .bnm format.|
|Import SBML||Import an SBML model file. The imported model can be laid out automatically on the model workspace using one of two layout algorithms: force-directed or hierarchical.|
|Save||Save the model in .bnm format. This saves the current model along with its diagrammatic layout.|
|Save As||Save the model as another .bnm file or in another format.|
|Print||Print the model diagram.|
|Network||Allows the user to add selected coordinate files.|
|Build Pathway Wizard||Creates pathways relevant to a selected group of nodes.|
|Build Homology Wizard||Finds nodes with properties similar to the selected node across all species and organizes the information in a table by species name and the originally selected data.|
|Arranges the drawn network using layout algorithms (Force-Directed, Hierarchical-Embedding, etc.).|
|View||Lets the user personalize the program layout by hiding some panes and showing others. It also has options for zooming in and out.|
|Clustering||Allows the user to import selected clusters of information to the workspace.|
|Tools Menu||Provides tools that make the program easier to use.|
|Report Issue||Sends comments about the program, such as errors and recommendations, to the developers. (Most critical errors are sent automatically; additional requests and comments can be sent with this option.)|
|Allows the user to trace other documents relevant to the current work.|
|Personalizes the workspace setup by managing the placement of the panels.|
|Manages the open toolbars/windows.|
The toolbar contains shortcuts for some commonly used menu options, such as Open, Save, Print and Help, which are activated by clicking the corresponding icons.
4.3.1 Node Editor
In any model, a node may be one of the following bioentities: Gene, Protein, Cell Object, Enzyme, mRNA, Pathway, Complex, Small Molecule, etc. For easy identification, each category has its own icon, as shown in Figure 3.2.3. The Node Editor is a pane located on the right of the main window; Figure 3.3.1 shows a snapshot of it. The following information can be entered through the Node Editor:
|Label||Name displayed on the model workspace (the graph).|
|Full Name||Full name of the bioentity.|
|Organism||The organism in which the molecule is located.|
|inNetworkId||Number of moles of the bioentity|
|IntegromeDBId||Number of molecules of the bioentity|
|Node type||The classification of the node.|
|Other properties||Properties included in the database that scientists can use and learn from.|
4.3.2 Process Properties
An in-silico model of a biological system is abstracted as a network of bioprocesses or chemical reactions. Section 3.3 describes the steps for constructing a model on the model workspace. Process Properties is an interface (Figure 3.3.2) for viewing and changing the properties of a bioprocess or reaction, such as stoichiometry and rate laws.
The Process editor consists of two tabs:
- Process: Displays the characteristic properties of the selected node, including its name, the organism it belongs to and the node type. Additional properties stored in the database are also shown; scientists can use them to conduct research and reach conclusions.
- Visual Properties : The visual properties only contain information relevant to how that node is represented in the program, including color, shape and size. The flexibility of these properties allows the user to personalize his or her workspace and the layout of the graphical display.
4.3.3 Compartment Editor
Chemical reactions may take place in different compartments. The Compartment Editor is an interface for changing the properties of a compartment.
4.4 Overview of Context Menus
- Right-click menus, or context menus, are hidden menus found throughout BiologicalNetworks. They give access to commands for the selected objects; in other words, a right-click menu lists the commands that can be used in the current context.
5. BiologicalNetworks Modeling Environment
5.1 Building A Model
Every model requires a project space. To define a new model, first open a new project using the icon in the toolbar or the menu option (File->New Project). This creates a model workspace on which the model network can be built. The following sections describe how to define and edit nodes and processes. Free-text details or references for the model can be entered in the description window (Section 4.2). Other specific information, such as User Name, Date and Species Name, can be entered in the project properties tab mentioned in Section 4.2.
5.1.1 Adding Biological Components
After creating the model workspace, the next step is to define the species in the network. To add a new species to the model:
- Click the icon of the species category (gene, enzyme, etc.) in the Palette Panel. The mouse pointer inside the model workspace changes to the icon of the selected category.
- Place the new species on the model workspace.
- Change the properties of the species in the Species editor (Figure 3.3.1). By default, the species is assigned a name, and its concentration is set to “1” if it is a reactant and to “0” if it is a product. Moving the mouse pointer over a species icon displays a tooltip with its name, volume and concentration.
5.1.2 Adding Processes/Reactions
The next step after creating the bioentities in the model is to define the processes/reactions and the associated kinetics. In BiologicalNetworks reactions are graphically represented by bioprocesses/reaction icons and linking arrows. The steps for defining a bioprocess/reaction are as follows:
- Select the bioprocess/reaction icon from the Palette Panel
- Place the bioprocess/reaction link icon on the workspace by clicking on the workspace
- Choose the bioprocess/reaction arrow from the Palette Panel and link the bioprocess/reaction link to each bioentity in the bioprocess/reaction by clicking and dragging the mouse. If the bioentity is a source/reactant, the arrow should originate from it with the arrowhead at the bioprocess/reaction link. To indicate a destination/product, the arrowhead should point to the bioentity and the tail should be on the bioprocess/reaction link.
- Open the Bioprocess/Reaction Editor window (Figure 4.7) by double-clicking the bioprocess/reaction icon, then view or change its properties and enter the stoichiometry, rate law and other information as explained earlier. Moving the mouse pointer over a bioprocess/reaction icon displays a tooltip with its name, bioprocess/reaction equation, rate law, rate law parameters, etc.
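The arrow directions described in these steps amount to a small bipartite graph: species point into a reaction node, and the reaction node points to its products. The Python sketch below models that convention, together with the default concentrations described in Section 5.1.1. The `Reaction` class and `default_concentrations` function are hypothetical and do not reflect how BiologicalNetworks stores reactions internally. Rate laws are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model for illustration; not BiologicalNetworks' internal representation.


@dataclass
class Reaction:
    """A reaction node with stoichiometric coefficients for reactants and products."""
    name: str
    reactants: Dict[str, float] = field(default_factory=dict)
    products: Dict[str, float] = field(default_factory=dict)
    reversible: bool = False

    def arrows(self) -> List[Tuple[str, str]]:
        """Directed arrows as drawn on the canvas: reactant -> reaction, reaction -> product."""
        incoming = [(species, self.name) for species in self.reactants]
        outgoing = [(self.name, species) for species in self.products]
        return incoming + outgoing

    def equation(self) -> str:
        """Readable equation such as '2 H2 + O2 -> 2 H2O' (a coefficient of 1 is omitted)."""
        def side(terms: Dict[str, float]) -> str:
            return " + ".join(s if c == 1 else f"{c:g} {s}" for s, c in terms.items())
        arrow = "<->" if self.reversible else "->"
        return f"{side(self.reactants)} {arrow} {side(self.products)}"


def default_concentrations(reactions: List[Reaction]) -> Dict[str, float]:
    """Initial concentrations: 1 for reactants, 0 for products.

    A species that is a reactant in any reaction gets 1; this tie-break is an
    assumption, since the manual does not cover species that play both roles.
    """
    conc: Dict[str, float] = {}
    for r in reactions:
        for s in r.products:
            conc.setdefault(s, 0.0)
    for r in reactions:
        for s in r.reactants:
            conc[s] = 1.0
    return conc
```

Keeping reactions as separate nodes, rather than drawing direct species-to-species edges, is what allows one reaction to carry several reactants and products, each with its own stoichiometric coefficient.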
5.1.3 Adding Compartments
In BiologicalNetworks a compartment is graphically represented by a compartment icon (Figure 4.8).
The steps for defining a compartment are as follows:
- Select the compartment icon from the Palette Panel
- Click on the workspace and drag the mouse to the desired region, which covers all the reactions taking place in the same compartment.
- Change the properties of the compartment in the Compartment Editor (Figure 4-6). By default, the compartment is assigned a name and its volume is set to “1.0E-14”. Moving the mouse pointer over a compartment displays a tooltip with its name and volume.
5.1.4 Editing Model
The model network definition is complete once all the species and reactions have been created on the workspace and their properties entered through the Species and Reaction editors. To better organize and aesthetically enhance the components on the workspace, you can:
- Move the components on the workspace. First switch to selection mode, then choose the component on the workspace; it can then be moved around and resized. The network topology is preserved throughout these operations.
- Change the color of the components. A component may be given any color from the color palette in the Palette Panel.
- Add text to the model workspace. Free text can be added to any section of the model by selecting the annotation icon from the model kit and placing a text box on the workspace. The font and color of the text can be changed using the color and font icons in the Palette Panel.
- Cut, copy, paste and delete components. These standard operations can be performed on any component on the model workspace (species, reaction arrow, reaction link, text box) through the Edit menu or standard keystrokes.
- Undo and Redo operations. Up to 10 operations can be undone and redone using the icons in the toolbar.
- Zoom in/out/fit and zoom in on a selected region. This feature becomes useful as the model grows large. The zoom in, zoom out, zoom to fit and zoom in selected region icons in the toolbar implement this functionality.
- Resize component. The species and reaction icons can be resized by selecting them and dragging one of the control points.
- Layout component. BiologicalNetworks incorporates two intelligent layout algorithms for automatically creating an aesthetically pleasing network layout of the model on the drawing workspace. These are:
- Force Directed algorithm: This algorithm is better suited to laying out networks with many loops.
- Hierarchical algorithm: This algorithm gives better results for networks with fewer loops, e.g. linear cascades.
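The manual does not say which force-directed variant BiologicalNetworks implements. The Python sketch below is a simplified version of the classic Fruchterman-Reingold scheme. Every pair of nodes repels with force k²/d, every edge pulls its endpoints together with force d²/k, and the maximum step shrinks linearly each iteration. The function name and parameters are hypothetical. The all-pairs repulsion loop is O(n²) per iteration; larger graphs usually need approximations.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical function for illustration; not the BiologicalNetworks layout engine.


def _unit(pos: Dict[str, List[float]], a: str, b: str) -> Tuple[float, float, float]:
    """Unit vector from b to a, plus their distance (guarded against zero)."""
    dx = pos[a][0] - pos[b][0]
    dy = pos[a][1] - pos[b][1]
    dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero for overlapping nodes
    return dx / dist, dy / dist, dist


def force_directed_layout(nodes: List[str], edges: List[Tuple[str, str]],
                          width: float = 1.0, height: float = 1.0,
                          iterations: int = 100, seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Simplified Fruchterman-Reingold layout: all node pairs repel, linked nodes attract."""
    rng = random.Random(seed)  # fixed seed: the same input always gives the same drawing
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal distance between nodes
    t0 = width / 10  # initial maximum move per iteration ("temperature")

    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):  # repulsion between every pair of nodes
            for b in nodes[i + 1:]:
                ux, uy, d = _unit(pos, a, b)
                f = k * k / d
                disp[a][0] += ux * f
                disp[a][1] += uy * f
                disp[b][0] -= ux * f
                disp[b][1] -= uy * f
        for a, b in edges:  # attraction along each edge
            ux, uy, d = _unit(pos, a, b)
            f = d * d / k
            disp[a][0] -= ux * f
            disp[a][1] -= uy * f
            disp[b][0] += ux * f
            disp[b][1] += uy * f
        temperature = t0 * (1 - it / iterations)  # linear cooling
        for n in nodes:  # move each node, capped by the temperature, kept inside the frame
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length == 0:
                continue
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))
    return {n: (x, y) for n, (x, y) in pos.items()}
```

Repulsion acts between all pairs, while attraction acts only along edges. Densely interconnected regions such as loops therefore tend to settle into compact rings, which may be one reason this style of algorithm suits loop-rich networks.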
5.2 Managing Models and SBML Support
Building models of biological processes is an intensive and tedious exercise. Reusability and ease of exchange of models are therefore an asset for any software tool aimed at studying biological networks and processes. In BiologicalNetworks this requirement is addressed by compliance with the SBML standard (www.sbml.org). SBML is an XML-based modeling language for describing biochemical network models. It is an ongoing international collaborative effort and is fast becoming a standard for specifying and exchanging biochemical models. BiologicalNetworks supports SBML Level 1 Version 1, Level 2 Version 1 and Level 2 Version 2.
The following sections describe how to save, import and export models in BiologicalNetworks.
5.2.1 Saving Models
Models constructed in BiologicalNetworks can be saved with the Save option in the menu or the save icon in the toolbar. Models are saved in the proprietary .bnm format, which is essentially a binary form of the model's Java class.
The .bnm file captures all the network information entered through species editor, reaction editor, simulation setup window and the layout of the network on the model workspace.
5.2.2 Importing Models
BiologicalNetworks can import models in the .bnm (BiologicalNetworks) format or in the SBML format. BiologicalNetworks models can be imported by choosing the Open option from the File menu, or by clicking the open icon in the toolbar, and then browsing for the .bnm file. The model is recreated along with its network layout.
SBML models can be imported with the Import SBML option from the File menu, then browsing for the SBML file. SBML does not specify the layout of the network on the model workspace, so BiologicalNetworks uses an intelligent layout algorithm to create the network layout automatically from the SBML information.
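To show which parts of a model an SBML file carries, the minimal Python sketch below reads a Level 2 SBML document with the standard `xml.etree.ElementTree` module. `read_sbml` is a hypothetical helper. It ignores compartments, kinetic laws and units, and it does not handle Level 1 documents, whose element and attribute naming differs. For real work, the libSBML library is the more robust choice.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Tuple

# Hypothetical helper for illustration; not the BiologicalNetworks SBML importer.


def read_sbml(xml_text: str) -> Dict[str, Any]:
    """Extract species IDs and reactions (with stoichiometry) from an SBML Level 2 document."""
    root = ET.fromstring(xml_text)
    # The namespace URI differs between SBML levels, so read it from the root tag.
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""

    def q(tag: str) -> str:
        """Qualify a tag name with the document's namespace."""
        return f"{{{ns}}}{tag}" if ns else tag

    model = root.find(q("model"))
    if model is None:
        raise ValueError("no <model> element found")

    def refs(reaction: ET.Element, list_tag: str) -> List[Tuple[str, float]]:
        """(species, stoichiometry) pairs from listOfReactants or listOfProducts."""
        container = reaction.find(q(list_tag))
        if container is None:
            return []
        return [(ref.get("species", ""), float(ref.get("stoichiometry", "1")))
                for ref in container.findall(q("speciesReference"))]

    species = [s.get("id", "") for s in model.iter(q("species"))]
    reactions = [
        {
            "id": r.get("id", ""),
            # SBML Level 2 treats a reaction as reversible unless stated otherwise.
            "reversible": r.get("reversible", "true").lower() == "true",
            "reactants": refs(r, "listOfReactants"),
            "products": refs(r, "listOfProducts"),
        }
        for r in model.iter(q("reaction"))
    ]
    # Nothing above carries x/y coordinates; the layout has to be computed separately.
    return {"species": species, "reactions": reactions}
```

The returned dictionary contains only species and reactions. This is why an importer must run a layout algorithm before the model can be drawn.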
5.2.3 Exporting Models
BiologicalNetworks models can be exported to SBML files. Use the Export SBML option from the File menu, then specify the file name and the SBML version for the exported model. Some information, for example the simulation setup and the layout, is not saved in SBML files.
5.3 Network Statistical Analysis
BiologicalNetworks provides tools to extract topological information from the model. These tools provide statistical information about the network.
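This section does not list the statistics BiologicalNetworks reports. As an assumption about what such tools typically compute, the Python sketch below derives node and edge counts, density, average degree and the degree distribution from an undirected edge list. `network_statistics` is a hypothetical function, not the program's API. For an undirected graph with N nodes and E edges, density is 2E / (N(N - 1)) and average degree is 2E / N.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical function for illustration; not a BiologicalNetworks API.


def network_statistics(edges: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Basic topological statistics for an undirected network given as an edge list."""
    # Treat A-B and B-A as the same edge; drop self-loops.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree: Counter = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    n = len(degree)  # only nodes that appear in at least one edge are counted
    m = len(unique)
    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
        "degree": dict(degree),
        # degree value -> number of nodes with that degree
        "degree_distribution": dict(Counter(degree.values())),
    }
```

For example, a triangle A-B-C with an extra edge C-D has 4 nodes, 4 edges and density 2/3.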
6. Data Management
The software database provides storage and organization functions for proteins, small molecules, cell processes, events of regulation, chemical reactions, and other objects used in studies of molecular networks and pathway analysis.
6.1 Pathway Representation
The pathway concept is central to the design of the system. The program stores each pathway as a diagram together with its groups, annotations, and user settings.
A pathway stores not only the list of its proteins, small molecules, and cellular processes, but also the links and relationships between them. Graphically, each pathway is shown in a separate window; it organizes the data and saves the results of searches in the underlying database. Pathways can be saved to the database, and the list of existing pathways is displayed in the Pathway Pane.
Pathways can be exported as separate XML files, exchanged between researchers for collaboration purposes and reimported back into the database.
To create a new pathway:
- Choose File > New > Project from the Main Menu;
- A pathway named New Pathway appears in the Pathways folder;
- Rename the new pathway by typing the new name in the folder box.
To remove the pathway from the Workspace:
- Select the pathway of interest;
- Choose Delete from the context menu;
- Press Yes.
The Workspace is a collection of the pathways, as well as custom annotations and other private data. Custom annotations may include folders and text fields functionally classifying a gene (for example, as a receptor) or linking it to a specific disease. Users may add both new nodes and links, which will be saved to the Workspace.
The Workspace is implemented as a local file. The default Workspace is the welcome page, which contains multiple projects/databases to choose from. The Integrated Database workspace also contains approximately 100,000 functional links extracted from PubMed and full-text articles.
6.3 Database Search
This section introduces you to the database search procedures. A database search can be used to locate any type of object stored in the database. In general, the software supports two types of searches:
- Context search
- Search by attributes
6.3.1 Context Search
The context index retrieves the sets of objects whose attributes contain the specified query phrase (string), either partially or completely. Text fields are split into words in the conventional way. The quick context search is optimized for retrieval speed.
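The context search described above behaves like an inverted index over tokenized text fields. The following Python sketch is a minimal, hypothetical model of that idea, not the program's implementation: `ContextIndex` maps each lowercase word to the objects containing it, and a query returns objects that match any query word, with complete matches (more words found) ranked above partial ones.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on any run of characters that are not letters or digits."""
    return [word for word in re.split(r"[^0-9a-z]+", text.lower()) if word]

class ContextIndex:
    """Toy inverted index: word -> ids of objects whose attributes contain that word."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, obj_id, *attribute_texts):
        for text in attribute_texts:
            for word in tokenize(text):
                self.postings[word].add(obj_id)

    def search(self, phrase):
        """Return ids matching any query word; objects matching more words come first."""
        hits = defaultdict(int)
        for word in set(tokenize(phrase)):
            for obj_id in self.postings.get(word, ()):
                hits[obj_id] += 1
        return sorted(hits, key=lambda obj_id: (-hits[obj_id], obj_id))

index = ContextIndex()
index.add("TP53", "tumor protein p53", "cellular tumor antigen")
index.add("MDM2", "E3 ubiquitin-protein ligase", "binds p53")
index.add("EGFR", "epidermal growth factor receptor")
print(index.search("tumor p53"))  # ['TP53', 'MDM2']
```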
BiologicalNetworks lets you refine and filter a search with two filters. The first, located to the left of the search text box, restricts the search to a particular organism. The second, located to the right of the search text box, filters the results by data source.
The search results are displayed in a separate Search Result List Pane and can be added to a pathway (or a group) by copy-and-paste or drag-and-drop.
6.3.2 Search by Attributes
The search by attributes allows you to search database objects using many types of data as search conditions. These include, for example, node type, effect (positive, negative, unknown), mechanism (transcription, phosphorylation), tissue type, description/user defined attributes text, and so forth.
The Comprehensive Search icon is located to the left of the Search Textbox and lets the user search by selected attributes (Figure 6.1).
To run a Comprehensive Search:
- Click the Comprehensive Search icon;
- The Find dialog box appears;
- Select the type of attribute by opening the corresponding folder in the left panel;
- Select specific attributes by highlighting them, then click the icon on the right side of the panel to add them to the search;
- In the Logic field along the bottom, select how the search parameters are combined;
- Click OK to run the query; the search results appear in the Search Results pane at the bottom of the program window (Figure 3.1).
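Combining attribute conditions in the Logic field amounts to a Boolean filter over database objects. In this Python sketch, `BioEntity`, `attribute_search`, and the attribute names are hypothetical, and AND/OR are assumed to be the available Logic choices.

```python
from dataclasses import dataclass, field

@dataclass
class BioEntity:
    """Hypothetical stand-in for a database object with searchable attributes."""
    name: str
    attributes: dict = field(default_factory=dict)

def attribute_search(entities, conditions, logic="AND"):
    """Keep entities whose attributes satisfy the (attribute, value) conditions.

    logic="AND" requires every condition to hold; logic="OR" requires at least one.
    """
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be 'AND' or 'OR'")
    combine = all if logic == "AND" else any
    return [e for e in entities
            if combine(e.attributes.get(attr) == value for attr, value in conditions)]

entities = [
    BioEntity("EGFR", {"node_type": "Protein", "tissue": "epithelium"}),
    BioEntity("GRB2", {"node_type": "Protein", "tissue": "brain"}),
    BioEntity("ATP", {"node_type": "SmallMolecule"}),
]
conditions = [("node_type", "Protein"), ("tissue", "epithelium")]
print([e.name for e in attribute_search(entities, conditions, "AND")])  # ['EGFR']
print([e.name for e in attribute_search(entities, conditions, "OR")])   # ['EGFR', 'GRB2']
```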
6.3.3 Quick Search
You can perform a quick search for any object type stored in the Workspace. In the Quick Search window, enter the search text and press the Start Quick Search button. All objects found are placed in a separate list in the Search Result List Pane. You can save the search results as an individual group in the Groups folder, or as a search in the Searches folder (the default).
The Quick Search box is located above the Properties Pane (Figure 6.5).
Enter keywords to retrieve all information about them that is available in the database.
The search results will be listed in the Search Results Pane (Figure 6.6).
6.3.4 Search by Keyword Input
In addition, the user can search the database by importing a text file of keywords, using the multiple keywords command located immediately to the right of the Search Textbox (Figure 6.7).
To search the database through file input:
- Press the command button located to the right of the Search Textbox;
- When a window opens, select the text file that contains the keywords;
- Press OK;
- Search results will be displayed in the Search Results Pane.
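The file-input search can be pictured as reading a keyword list and running one search per keyword. The manual does not specify the file format; this Python sketch assumes one keyword per line. `read_keywords` and `batch_search` are hypothetical names, and a simple substring match stands in for the database search.

```python
import os
import tempfile

def read_keywords(path):
    """Read one keyword per line (assumed format), skipping blank lines and repeats."""
    keywords = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if word and word not in keywords:
                keywords.append(word)
    return keywords

def batch_search(keywords, search):
    """Run a single-keyword search function for every keyword; map keyword -> results."""
    return {word: search(word) for word in keywords}

# Demo with a throwaway file and a stand-in search over a small list of names.
names = ["TP53", "TP63", "MDM2", "EGFR"]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("TP\n\nMDM2\nTP\n")
results = batch_search(read_keywords(tmp.name),
                       lambda word: [n for n in names if word.lower() in n.lower()])
os.remove(tmp.name)
print(results)  # {'TP': ['TP53', 'TP63'], 'MDM2': ['MDM2']}
```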
6.4 Querying Database to Explore Gene Relationships
You can drag and drop rows from the Search Result List Pane to your active pathway, or build a pathway from a selected group of bioentities by right-clicking the chosen group (Figure 6.8).
To activate the Build Pathway Wizard:
- Choose Build Pathway to open the Pathway Wizard panel (Figure 6.9);
- Check the starting entities for your pathway (Figure 6.10);
- Choose an algorithm for pathway building (Figure 6.11).
Figure 6.9 – Build Pathway Wizard Window.
The Pathway Wizard lets you create pathways from the molecular interaction data in the database. It starts by suggesting several ways to determine the starting entities, and then opens the Build Pathway dialog box (Figure 6.10).
The next step is to apply an advanced attribute search to specify any logical combination of properties of the searched bioentities and bioprocesses (Figure 6.11).
Click Next to access the advanced filter settings (Figure 6.1).
Figure 6.12 – Search Mode Selection.
Figure 6.13 – Build Pathways Wizard Result displays interactions between the selected nodes as well as interactions with other curated pathways and networks.
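The manual leaves the pathway-building algorithms unspecified. One simple strategy consistent with Figure 6.13, keeping the interactions among the selected nodes and optionally adding their direct interaction partners, is sketched below in Python. `build_pathway` and the pair-list input are hypothetical, and interactions are treated as undirected.

```python
from collections import defaultdict

def build_pathway(interactions, seeds, add_neighbors=True):
    """Collect seed entities, optionally their direct partners, and the interactions among them.

    interactions: list of (entity, entity) pairs (read twice, so not a generator).
    """
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    nodes = set(seeds)
    if add_neighbors:
        for seed in seeds:
            nodes |= partners[seed]
    # Keep every known interaction whose two ends are both in the pathway.
    edges = {tuple(sorted(pair)) for pair in interactions
             if pair[0] in nodes and pair[1] in nodes}
    return nodes, edges

interactions = [("TP53", "MDM2"), ("MDM2", "UBE3A"), ("TP53", "CDKN1A"), ("EGFR", "GRB2")]
nodes, edges = build_pathway(interactions, ["TP53"])
print(sorted(nodes))  # ['CDKN1A', 'MDM2', 'TP53']
print(sorted(edges))  # [('CDKN1A', 'TP53'), ('MDM2', 'TP53')]
```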
Refer to the User Manual if you would like to learn more about the Build Pathway Wizard.
7. Working with Microarray Data
7.1.1 Loading Microarray Files
BiologicalNetworks allows the user to search through preexisting experimental data and analyze these data using expression tables and data clustering. The search results are sorted into three to five categories, as shown in Figure 7.1.1.
The first folder contains all of the unsorted search results, while the next two group them into experiments in which the search input was significantly under-expressed or over-expressed compared with the average. The last two categories apply only when more than one item is entered in the search field; they list experiments in which two or three of the items had a correlation greater than 0.7.
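The sorting of search results into folders can be sketched as follows. This Python code is an assumption-laden illustration: `categorize_experiments` is hypothetical; each experiment is a mapping from gene to its values across that experiment's samples; "significantly" is modeled as a z-score of at least `z_cut` (2.0 here, an arbitrary choice) computed over the first query gene's mean level in each experiment; and correlation is Pearson's r with the 0.7 cutoff from the text.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def categorize_experiments(experiments, genes, z_cut=2.0, r_cut=0.7):
    """Sort experiments into folders; assumes at least one experiment measures genes[0].

    experiments: {experiment id: {gene: [values across that experiment's samples]}}
    genes: the query items; the first one drives the under/over-expression folders.
    """
    folders = {"all": [], "under": [], "over": [], "two_correlated": [], "three_correlated": []}
    level = {e: mean(d[genes[0]]) for e, d in experiments.items() if genes[0] in d}
    values = list(level.values())
    mu, sd = mean(values), pstdev(values)
    for e, d in experiments.items():
        present = [g for g in genes if g in d]
        if not present:
            continue
        folders["all"].append(e)
        z = (level[e] - mu) / sd if e in level and sd else 0.0
        if z <= -z_cut:
            folders["under"].append(e)
        elif z >= z_cut:
            folders["over"].append(e)

        def linked(a, b):
            return pearson(d[a], d[b]) > r_cut

        if any(linked(a, b) for a, b in combinations(present, 2)):
            folders["two_correlated"].append(e)
        if any(all(linked(a, b) for a, b in combinations(trio, 2))
               for trio in combinations(present, 3)):
            folders["three_correlated"].append(e)
    return folders
```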
7.1.2 Opening Microarray Data
Before using the microarray data environment of BiologicalNetworks, make sure you have microarray files to open.
Example data files can be downloaded here: http://biologicalnetworks.org/downloads/ExampleData.zip
Choosing one of the available file types opens the corresponding Import Expression Data Wizard (Figure 3.2.1).
The Import Expression Wizards let you import expression data into BiologicalNetworks. The data files can be TXT or MS Excel files (for Stanford tab-delimited and Affymetrix data), .mev and .ann files (for TIGR expression data), or .gpr files (for GenePix expression data). To import your data, choose File > Open > Microarray and select the format. Then specify the location of the source data file and follow the steps of the Wizard. The imported data will be opened in the Import Expression Wizard.
The imported file may contain empty or non-numeric cells; these are colored grey and are excluded from calculations.
Importing Stanford (tab delimited) Data.
To import data in Stanford (tab delimited) TXT or MS Excel formats:
- Choose File > Open > Microarray and select the Stanford Format (Tab Delimited) option.
- The Import Expression Wizard appears.
- In the Import Expression Wizard window, specify the content of the first row of the data file and the columns that contain the Gene IDs by clicking the upper-leftmost expression value.
- Then press “Load” to load the data.
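Clicking the upper-leftmost expression value fixes which rows are headers and which columns hold annotations. This Python sketch (hypothetical `read_expression_table`, assuming gene IDs in the first column and sample names in the row just above the first value) parses such a tab-delimited table and turns empty or non-numeric cells into `None`, matching the rule that such cells are excluded from calculations.

```python
import csv
import io

def read_expression_table(text, first_row=1, first_col=1):
    """Parse a tab-delimited expression table.

    first_row/first_col: 0-based position of the upper-leftmost expression value.
    The row above it holds sample names; column 0 holds gene IDs.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[first_row - 1][first_col:]
    matrix = {}
    for row in rows[first_row:]:
        if not row:
            continue  # skip blank lines
        values = []
        for cell in row[first_col:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)  # empty/non-numeric: shown grey, left out of calculations
        values += [None] * (len(samples) - len(values))  # pad short rows
        matrix[row[0]] = values
    return samples, matrix

samples, matrix = read_expression_table("GENE\tS1\tS2\nYAL001C\t1.5\t\nYAL002W\tNA\t-0.3\n")
print(samples, matrix)  # ['S1', 'S2'] {'YAL001C': [1.5, None], 'YAL002W': [None, -0.3]}
```

Passing `first_col=2` would skip an extra annotation column (such as a gene name) placed between the IDs and the values.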
7.2 Color schemes and Visual styles settings
When an expression experiment is opened as a heat map, a colored box represents the expression level of each gene (protein). The Expression Experiment Viewer has two default color schemes, corresponding to the two data formats supported by the software (Signal and Ratio). You can change the color scheme with the Color Scheme button located in the Pathway Navigation Pane (Figure 3.1).
Ratio data: the color intensity is proportional to the log ratio of the current sample to the base sample and is represented as a double-gradient color map. Negative values are allowed in this format. On the heat map, green represents negative log ratios and red represents positive log ratios; greens of increasing intensity correspond to increasingly negative log ratios, and reds of increasing intensity to increasingly positive ones. In the Color Settings dialog window, you can set the color range for the min and max ratio values, the cutoff values, and the color for missing data.
Signal data: the color intensity is proportional to the signal value and is represented as a single-gradient color map. There are no negative values in this representation. By default, the software uses green for low expression values, red for high expression values, and yellow for missing values.
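The two color schemes can be sketched as functions from a value to an RGB triple. In this Python sketch, the function names, the ±2 saturation limits, black as the ratio midpoint, and grey for missing ratios are assumptions; the green/red directions and the yellow missing-signal color follow the text above.

```python
import math

def log_ratio(sample, base):
    """log2 of the sample-to-base ratio; None when either value is missing or non-positive."""
    if sample is None or base is None or sample <= 0 or base <= 0:
        return None
    return math.log2(sample / base)

def ratio_color(value, min_ratio=-2.0, max_ratio=2.0, missing=(128, 128, 128)):
    """Double gradient: green for negative, red for positive log ratios, black at zero."""
    if value is None:
        return missing
    if value < 0:
        t = min(value / min_ratio, 1.0)  # 0..1, saturating at min_ratio
        return (0, round(255 * t), 0)
    t = min(value / max_ratio, 1.0)      # 0..1, saturating at max_ratio
    return (round(255 * t), 0, 0)

def signal_color(value, low, high, missing=(255, 255, 0)):
    """Single gradient from green (low signal) to red (high signal); yellow if missing."""
    if value is None:
        return missing
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)

print(ratio_color(log_ratio(1.0, 4.0)))  # (0, 255, 0): four-fold down, fully green
```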
Visual Styles for Gene Expression
Visual styles are used to make your gene expression map more intuitive and clear.
- Change the scale of the expression map by zooming in and out and by setting the element size.
- Use the Brightness option to adjust the color intensity of the heat map for better viewing.
- Change the color range of a particular gene expression map. Use the Microarray toolbar icon to adjust the Expression Viewer options. In this dialog box you can also enter the cutoff values and set the color range for the min and max ratio values and for missing data.
- In the heat map, a colored box represents the expression level of each gene. The software supports two default color schemes (Ratio and Signal) for expression data. To change the color scheme of an opened experiment, use the Microarray toolbar and select the radio button for the scheme you want.
7.3 Expression Experiment Viewer
Loaded Microarray data appears in a separate Expression Experiment Viewer tab in the Pathways Viewer Pane.
The Expression Experiment Viewer displays a graphical representation of gene expression and proteomics experiment data, usually generated by microarray experiments. It provides the algorithms and workspace for examining data from expression or proteomics experiments and for superimposing these data onto open pathways and gene regulatory networks (Figure 7.3).
The Microarray submenu and the Microarray Experiment Manager menu bar allow the user to:
– Create new pathways as well as new groups from an expression experiment.
– Select a number of genes and create a group or a pathway from them.
– Display expression data visually on an existing pathway diagram, using shades of green/red according to the fold change of expression.
– Apply the numerous clustering, filtering, normalization, and search methods available in BiologicalNetworks.
7.4 Expression viewer toolbar
The Expression Viewer toolbar provides a wide range of functions:
- Color entities by expression. Select this option to color the entities of the pathway of interest by their expression values.
- Cluster genes with common characteristics into a selected number of groups.
- Create group/Store cluster from selection option creates a group from the selected genes.
- Visual styles and color settings for gene expression map.
7.5 Filtering, Normalization and Data Transformation
Different types of adjustments can be applied on top of one another in any sequence, and the same type of adjustment may be applied repeatedly to the matrix. Adjustments may not necessarily affect the main display or the values displayed when elements are clicked on the matrix displays, but will influence the calculation of the expression matrix, the foundation of all analyses. Adjustments will also be reflected when the entire matrix or individual clusters are saved as text files, although the original data files are not overwritten. Furthermore, with the exception of three options: “Set Lower Cutoffs”, “Set Percentage Cutoffs” and “Adjust Intensities of Zero”, all the changes made to an expression matrix are irreversible for the current session.
Because of the above features, a good way to use these options might be to apply any required adjustments to the data set, save the entire adjusted matrix as a tab delimited formatted text file (using the “Save Microarray Matrix” option under the “Microarray” Menu), and then load this new file in a new session, during which no further data adjustments will be made. This will ensure consistency throughout the session.
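The recommended workflow (adjust a copy, then save the adjusted matrix as tab-delimited text) can be sketched in Python. The manual does not enumerate the adjustments here; `log2_transform` and `median_center_genes` are hypothetical stand-ins for two common ones, and every function returns a new matrix so that the original data are never overwritten.

```python
import csv
import math
from statistics import median

def log2_transform(matrix):
    """New matrix with log2 of each positive value; missing or non-positive values become None."""
    return {gene: [math.log2(v) if v is not None and v > 0 else None for v in row]
            for gene, row in matrix.items()}

def median_center_genes(matrix):
    """New matrix with each gene's median (over non-missing values) subtracted from its row."""
    centered = {}
    for gene, row in matrix.items():
        present = [v for v in row if v is not None]
        mid = median(present) if present else 0.0
        centered[gene] = [None if v is None else v - mid for v in row]
    return centered

def save_matrix(path, samples, matrix):
    """Write a matrix as tab-delimited text, leaving missing values empty."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["GENE", *samples])
        for gene, row in matrix.items():
            writer.writerow([gene, *("" if v is None else v for v in row)])

raw = {"G1": [1.0, 4.0, 16.0], "G2": [2.0, None, 8.0]}
adjusted = median_center_genes(log2_transform(raw))  # raw itself is left untouched
print(adjusted)  # {'G1': [-2.0, 0.0, 2.0], 'G2': [-1.0, None, 1.0]}
```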
7.6 Sorting and searching over expression data
The Sort feature permits the user to sort the data:
- By Expression Value
- By Chromosomal Order
- By Gene ID
- The Search feature lets the user search the data for genes or samples that match a search term under given criteria.
- The Search initialization dialog offers the option of finding genes or samples. The search criteria include a search term, a selection to make the search case-sensitive, and a selection that requires the search term to be either an exact match or simply a contiguous portion of a larger annotation term.
- Search results are returned in a new window. The upper section is a table of the genes or samples that match the search criteria, and the lower section provides shortcut links to the cluster viewers that contain the identified samples or genes.
- Navigation shortcuts provide a means to open the cluster viewers that contain the elements found in the search.
- Elements in the table can be deselected using the checkboxes. Clicking the Update Shortcuts button produces a new search result window with only the previously selected entries and the associated viewer shortcuts. This lets you prune unwanted elements from the search result.
- The Store Cluster button stores the selected items as a cluster and assigns it a user-selected color.
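The search criteria and sorting options above can be modeled with two small Python helpers. Both names are hypothetical: `search_labels` implements the case-sensitive and exact-versus-contiguous-substring choices, and `sort_by_expression` orders genes by one sample's value, placing missing values last (an assumption).

```python
def search_labels(labels, term, case_sensitive=False, exact=False):
    """Indices of labels matching term, either exactly or as a contiguous substring."""
    if not case_sensitive:
        term = term.lower()
    hits = []
    for index, label in enumerate(labels):
        text = label if case_sensitive else label.lower()
        if (text == term) if exact else (term in text):
            hits.append(index)
    return hits

def sort_by_expression(matrix, column, descending=True):
    """Gene IDs ordered by their value in one sample column; missing values go last."""
    present = [g for g, row in matrix.items() if row[column] is not None]
    missing = [g for g, row in matrix.items() if row[column] is None]
    present.sort(key=lambda g: matrix[g][column], reverse=descending)
    return present + missing

labels = ["YAL001C", "yal002w", "YBR100W"]
print(search_labels(labels, "yal"))                       # [0, 1]
print(search_labels(labels, "yal", case_sensitive=True))  # [1]
print(search_labels(labels, "ybr100w", exact=True))       # [2]
```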
7.7 Clustering of Experimental Data
Each clustering algorithm available in BiologicalNetworks can be launched from the Tools menu in the Microarray tab. All clustering algorithms can be used to cluster either genes or samples. Clustering results appear in the Analysis sub-tree of the Project Properties navigation tree; the tabs within this sub-tree contain the results of the method's calculations. Each algorithm presents a dialog or form for entering the parameters specific to that algorithm.
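As an example of one algorithm behind these dialogs, the Python sketch below implements plain k-means (K-Means is one of the clustering methods named in Section 8.2). It is not BiologicalNetworks' code: `kmeans`, Euclidean distance, random initial centroids, and the requirement of no missing values are all assumptions. The returned centroids are the mean profiles of each cluster, the quantity a centroid graph displays.

```python
import random
from math import dist

def kmeans(profiles, k, iterations=100, seed=0):
    """Plain k-means on expression profiles (equal-length lists without missing values).

    Returns ({gene: cluster index}, [centroid profile per cluster]).
    """
    rng = random.Random(seed)
    genes = list(profiles)
    centroids = [list(profiles[g]) for g in rng.sample(genes, k)]
    assignment = {}
    for _ in range(iterations):
        # Assign every gene to its nearest centroid (Euclidean distance).
        updated = {g: min(range(k), key=lambda c: dist(profiles[g], centroids[c]))
                   for g in genes}
        if updated == assignment:
            break  # converged: no gene changed cluster
        assignment = updated
        # Move each centroid to the mean profile of its members.
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

profiles = {"g1": [1.0, 1.0], "g2": [1.2, 0.9], "g3": [8.0, 8.0], "g4": [8.1, 7.9]}
assignment, centroids = kmeans(profiles, k=2)
print(assignment["g1"] == assignment["g2"], assignment["g1"] != assignment["g3"])  # True True
```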
7.8 Clustering analysis viewers
Viewers are the graphical displays used to present the results of the microarray analysis. The viewers will appear as a sub-tree under the method’s Analysis Tree within the Project Properties navigation tree.
This viewer is used in the main window of the Expression Viewer as well as in the clustering analysis viewers. Every colored rectangle represents a gene. Each column represents all the genes from a single experiment, and each row represents the expression of a gene across all experiments. The default color scheme for expression levels is red/green (red for over-expression, green for under-expression) and can be adjusted with the color scheme button in the embedded toolbar. See Section 7.2.
Double-clicking any rectangle in this view opens a window with more information about that gene's expression level.
The Expression Graphs Viewer displays graphs of the expression levels of each gene across the experimental conditions. The mean expression levels of genes in the cluster are shown as a centroid graph overlaid on top of the individual expression graphs.
Gene cluster Table Views
The Table View of clustering results shows the annotations for each gene in the cluster.
- You can drag columns horizontally across the table to change their relative order.
- You can sort the rows in ascending or descending order of a column's entries by clicking that column's header repeatedly.
- You can sort the “Stored Color” column to bring together elements that have been stored with the same cluster color.
- You can restore the original order of elements by CTRL-clicking any column header.
Right-clicking the table view opens a context menu with the following options:
- Store a subset of the table rows as a cluster in the Groups/Clusters manager.
- Store the entire table as a cluster in the Groups/Clusters manager.
- Search the table.
- Save the currently viewed cluster to a file.
- Delete all rows in the table, or a subset of them.
- Delete a cluster stored from this viewer.
7.9 GeneOntology terms overrepresentation analysis
BiologicalNetworks provides an implementation of the GeneOntology Fisher’s overrepresentation test, a method that gives the researcher an initial biological interpretation of gene clusters based on the indices provided in the input data set and on information linking those indices to biological “themes”. These themes are generally GO terms, KEGG pathways, or any other descriptive term related to biological role or biochemical pathway information. The result of the analysis is the set of biological themes represented in the cluster. A statistic reports the probability that the prevalence of a particular theme within the cluster is due to chance alone, given the prevalence of that theme in the population of genes under study (all “genes” loaded into BiologicalNetworks).
Fisher Exact Probability
The Fisher Exact Probability reports the probability that a biological theme is over-represented in the cluster of interest relative to the representation of that theme in the total gene population.
For example, suppose that one has a list of 50 genes drawn from a population of 10,000 genes, and that 10 of the 50 genes are related to pathway “A” while only 13 genes in the whole population are associated with pathway “A”. This scenario yields a low probability that the observed number of hits (occurrences of pathway “A”) in the small sample is due to chance alone. The statistic is based on the hypergeometric distribution and has an advantage over the chi-square test in that it is appropriate for finite populations.
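The worked example can be checked directly. This Python sketch computes the one-sided Fisher exact p-value as the upper tail of the hypergeometric distribution, that is, the probability of seeing at least `hits` theme genes in a random cluster of the same size. `enrichment_p_value` is a hypothetical name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def enrichment_p_value(hits, cluster_size, theme_size, population_size):
    """One-sided Fisher exact (hypergeometric upper-tail) p-value.

    Probability of drawing at least `hits` theme genes when `cluster_size` genes are
    picked at random from `population_size` genes, `theme_size` of which carry the theme.
    """
    total = comb(population_size, cluster_size)
    upper = min(theme_size, cluster_size)
    tail = sum(comb(theme_size, i) * comb(population_size - theme_size, cluster_size - i)
               for i in range(hits, upper + 1))
    return tail / total  # exact integer arithmetic until this final division

# The worked example: 10 of 50 cluster genes hit pathway "A",
# which has 13 members among 10,000 genes.
p = enrichment_p_value(hits=10, cluster_size=50, theme_size=13, population_size=10_000)
print(f"{p:.2e}")
```

The printed value is vanishingly small, which is what "a low probability ... due to chance alone" means in practice.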
Annotation parameters Panel
Population and Cluster Selection Option
This option specifies a gene population or a gene cluster list. The default selection is to use a population file which is simply all of the genes loaded into BiologicalNetworks.
The Annotation parameters Panel also displays the gene clusters currently stored in the BiologicalNetworks cluster repository. If no clusters have been saved, a blank browser page is displayed on this panel and the Cluster Analysis mode option is disabled. Selecting a row (or a group of rows using the Alt key) in the cluster table displays the cluster in the expression graph area of the browser. Cluster analysis will be executed on the selected clusters.
This area contains a drop-down list of the available annotation types that can be used to identify genes (the annotation key). It is generally best to use an index or accession that uniquely identifies the spotted material.
Annotation Conversion File
This optional file maps your annotation key (above) to the index used to link to biological themes (GO terms, KEGG pathways, etc.). If your annotation key type is the one used in the linking file (below), this conversion (mapping) is not needed. If needed, these files are typically stored in the Convert directory.
Gene Annotation / Gene Ontology Linking Files
This section allows one to specify one or more annotation files. These files contain gene indices paired with biological themes such as GO terms. These files typically reside in the Class directory.
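The conversion and linking files feed the counts used by the Fisher test. The file layout is not specified in the manual; this Python sketch assumes tab-separated `key<TAB>value` lines, and `read_pairs` and `theme_counts` are hypothetical. For each theme it returns the number of cluster genes and of population genes carrying it, the two counts that the Fisher exact test in Section 7.9 compares.

```python
from collections import defaultdict

def read_pairs(lines):
    """Parse 'key<TAB>value' lines into {key: set of values}; blank and '#' lines are skipped."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split("\t", 1)
        mapping[key].add(value.strip())
    return dict(mapping)

def theme_counts(cluster, population, links, conversion=None):
    """{theme: (cluster genes with theme, population genes with theme)}."""
    def themes(gene):
        # Optional conversion step: annotation key -> index used in the linking file.
        keys = conversion.get(gene, {gene}) if conversion else {gene}
        found = set()
        for key in keys:
            found |= links.get(key, set())
        return found

    counts = defaultdict(lambda: [0, 0])
    for gene in population:
        for theme in themes(gene):
            counts[theme][1] += 1
    for gene in cluster:
        for theme in themes(gene):
            counts[theme][0] += 1
    return {theme: tuple(pair) for theme, pair in counts.items()}

links = read_pairs(["g1\tGO:0006915", "g2\tGO:0006915", "g3\tGO:0008283", "# comment"])
conversion = read_pairs(["probe1\tg1", "probe2\tg2", "probe3\tg3"])
counts = theme_counts(["probe1", "probe2"], ["probe1", "probe2", "probe3", "probe4"],
                      links, conversion)
print(counts)  # {'GO:0006915': (2, 2), 'GO:0008283': (0, 1)}
```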
Results of GeneOntology Analysis
The primary result is reported in a table whose entries are ordered by the reported statistic. The table can be sorted on any column. Right-clicking in the table opens a menu that allows you to:
Store Selection as Cluster: Stores the genes associated with a biological theme as a cluster that will be stored in the cluster manager.
8. Explore gene relationships with expression data
The hypothesis that the expression of interacting entities is correlated, for evolutionary or physical reasons, makes it possible to predict networks of interactions from expression values. This is a good place to start, especially if you have no initial hypothesis about your gene expression data.
8.1 Color network by expression values.
To overlay Expression Experiment results onto an existing pathway diagram:
- Open an expression experiment;
- Open a pathway of interest;
- Press the Coloring by Expression Values Toolbar button to color the active pathway by expression values;
- From the drop-down menu, choose the sample time point you would like to visualize in the active Pathway Pane.
8.2 Extract pathways from expression data.
Correlation algorithms group genes according to similarities in patterns of expression variation over all the samples. A correlation network is a group of genes whose expression profiles are highly predictive of one another. Each pair of genes related by a correlation coefficient larger than a minimum threshold and smaller than a maximum threshold (assigned in the initialization dialog box) is connected by an edge. Groups of genes connected to one another are referred to as networks.
To extract pathways from the expression data, you first need to store the data as a cluster, either with K-Means/Medians Clustering or Hierarchical Clustering, or by right-clicking the list of data. Once the data has been stored as a cluster, right-click it and select the “open” option to view the pathways.
The algorithm calculates the correlation coefficient between genes by comparing the expression pattern of each gene to that of every other gene. The ability of each gene to predict the expression of each other gene is measured as a correlation coefficient. Genes are represented as nodes in a network and edges are drawn between them if their correlation coefficient falls between the minimum and maximum thresholds specified in the initialization dialog. The experiment sub-tree created in the Project Properties Panel contains information regarding the networks predicted. Under the Network tab is a graph of all of the subnets generated. A subnet is a group of genes in which each gene is connected to at least one other gene. The Correlation Subnets tab contains network diagrams for each of the individual subnets, and the Expression Images folder contains expression views for the genes in each of them.
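The correlation-network procedure can be sketched in Python. `correlation_network` is hypothetical; it uses Pearson's r (an assumption, since the manual does not name the coefficient), connects genes with `r_min < r <= r_max` (the upper bound is treated as inclusive here so that perfectly correlated pairs are kept when `r_max` is 1), and reports subnets as connected components, so genes without any edge belong to no subnet.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Clamp tiny floating-point overshoots outside [-1, 1].
    return max(-1.0, min(1.0, sxy / sqrt(sxx * syy))) if sxx and syy else 0.0

def correlation_network(profiles, r_min=0.8, r_max=1.0):
    """Edges between genes with r_min < r <= r_max, plus subnets (connected components)."""
    genes = sorted(profiles)
    edges = [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]
             if r_min < pearson(profiles[a], profiles[b]) <= r_max]
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    subnets, seen = [], set()
    for start in genes:
        if start in seen or not neighbors[start]:
            continue  # already placed, or unconnected (not part of any subnet)
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(neighbors[gene] - component)
        seen |= component
        subnets.append(sorted(component))
    return edges, subnets

profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 3, 2, 1],
            "E": [5, 1, 2, 9], "F": [10, 2, 4, 18.5]}
edges, subnets = correlation_network(profiles)
print(edges)    # [('A', 'B'), ('E', 'F')]
print(subnets)  # [['A', 'B'], ['E', 'F']]
```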
8.3 Build Pathways for selected expression values
To create a new pathway from an expression experiment:
- Open the expression experiment and select genes;
- Press the Create New Group toolbar button;
- Enter the group name in the dialog box;
- In the dialog box, press the Create Group button, and then press Close;
- A new group appears in the Groups/Clusters sub-tree of the Project Properties tree.