In this project we describe a data integration, analysis, and management environment for systems biology research on host-pathogen interactions. We developed methods and tools that simplify and streamline the integration of diverse experimental data types, including molecular interactions, gene expression levels, genomic sequences, protein structure information, phylogenetic classification, and virulence data for pathogen-related studies. Specifically, systems-level studies of influenza viruses identified host molecular pathways that appear to be induced or repressed during infection, and these results highlight pathways whose responses are specific to a given pathogen infection. These results demonstrate the usefulness of dynamic data integration techniques and support a hypothesis-generation platform for major human disease systems. The integrated database provides access to heterogeneous information that is of value to researchers in epidemiology, virology, and vaccine development.
We use evolutionary distance, as defined by phylogenetic trees, as the organizing principle for information integration in this database. The underlying assumption is that virus properties such as virulence, infectivity, host specificity, the ability to jump between host species, geographic location, morbidity in an epidemic, and host-specific reactions are related through evolutionary lineage. RNA and protein sequence data, together with the phylogenetic trees constructed from these sequences, form the core of the database.
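To make the idea of organizing annotations by evolutionary distance concrete, the following is a minimal sketch, not the database's actual implementation. It assumes a tree stored as a child-to-parent map with branch lengths and measures evolutionary distance as the patristic distance (the sum of branch lengths on the path between two leaves). The strain names, branch lengths, annotation labels, and the helper names `path_to_root`, `patristic_distance`, and `rank_by_distance` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tree: child -> (parent, branch length).
# Node names and branch lengths are invented for illustration only.
PARENT: Dict[str, Tuple[str, float]] = {
    "n1": ("root", 0.10),
    "n2": ("root", 0.30),
    "strainA": ("n1", 0.05),
    "strainB": ("n1", 0.07),
    "strainC": ("n2", 0.02),
}


def path_to_root(node: str, parent: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Map each ancestor of `node` (and `node` itself) to its cumulative distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        distances[node] = total
    return distances


def patristic_distance(a: str, b: str, parent: Dict[str, Tuple[str, float]]) -> float:
    """Sum of branch lengths on the path between nodes `a` and `b`."""
    from_a = path_to_root(a, parent)
    from_b = path_to_root(b, parent)
    # With non-negative branch lengths, the lowest common ancestor is the
    # shared ancestor that minimizes the combined distance.
    return min(from_a[n] + from_b[n] for n in from_a if n in from_b)


def rank_by_distance(
    query: str,
    annotations: Dict[str, str],
    parent: Dict[str, Tuple[str, float]],
) -> List[Tuple[float, str, str]]:
    """Return (distance, strain, annotation) for annotated strains, closest first."""
    ranked = [
        (patristic_distance(query, strain, parent), strain, label)
        for strain, label in annotations.items()
        if strain != query
    ]
    return sorted(ranked)


# Hypothetical annotations attached to leaves of the tree.
VIRULENCE = {"strainB": "high", "strainC": "low"}

for distance, strain, label in rank_by_distance("strainA", VIRULENCE, PARENT):
    print(f"{strain}: distance={distance:.2f}, virulence={label}")
```

Under the stated assumption that properties track lineage, the closest annotated relatives of a query strain are the first place to look for candidate properties. In practice the trees would be parsed from a standard format rather than written by hand.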