Maria G. Samsonova1 and Alexander V. Spirov1,2
1Institute of High-Performance Computing and Data Bases
2The Sechenov Institute of Evolutionary Physiology and Biochemistry
St Petersburg, Russia
The study of embryogenesis is of crucial importance for understanding the molecular mechanisms of health and disease, as well as for developing efficient therapies for myriad disorders ranging from hyperplasia to degenerative diseases. Rapid progress in this field leads to the accumulation of a large amount of information, whose analysis is now impossible without electronic data storage, retrieval and display tools.
Progress in developmental biology has led to the understanding that, to reveal the mechanisms of cell functioning, it is necessary to consider ensembles of interacting genes (Schuler et al., 1996; Lander, 1996; Hunter, 1997). The central role in these ensembles, termed genetic networks (Kauffman, 1971; Somogyi and Sniegorski, 1996; Harris et al., 1997), is played by genes encoding transcription factors, which activate or repress other genes of the network. The products of these genes in turn act on other target genes. Structural genes are the terminal elements of a genetic network. Thus a genetic network can be considered as a complex web of genes turning each other on and off (Hunt and Krumlauf, 1991; Jackle et al., 1992).
During embryogenesis, the processing of information in loosely interconnected genetic networks, which function in different parts of the embryo and at definite times, defines genome activity and ultimately leads to the development of the organism. Clearly, the mechanisms of embryogenesis cannot be deciphered without elucidating the principles underlying the functioning of genetic networks (Hunt and Krumlauf, 1991; Jackle et al., 1992; Somogyi et al., 1996; McAdams and Arkin, 1997).
We developed the GeNet database to describe the genetic networks that control embryogenesis. GeNet is intended as a new type of database, oriented towards representing the results of analysis of the structure, function and evolution of genetic networks (Spirov and Samsonova, 1997).
GeNet is a hypertext database. It runs on a CONVEX C120 under CONVEX OS at the Institute for High Performance Computing and Databases. The programs that search the database, as well as the interface for querying the database and generating a report, are written in ANSI C. The applet Genes\_Graph for graphical presentation of genetic network maps is written in Java (JDK 1.02 and java.awt). The program builds on several Java classes, in particular java.applet.Applet and java.awt.Panel. Two new classes, Nodes and Edges, were developed to represent genes and gene interactions. The applet can be accessed through any WWW browser; however, optimal results are obtained with Netscape Navigator 3.0 or higher.
The concept of genetic networks forms the basis for structuring information in the GeNet database. In this database each gene characterised thus far is considered a ''node'' of a genetic network, while the links between nodes represent interactions of genes or their products. The structure of the GeNet database is shown schematically in Figure 1. GeNet is subdivided into sections, which hold information on genetic networks in different organisms: Caenorhabditis elegans, sea urchin, Drosophila and vertebrates.
Each section contains five types of data: genetic network maps, gene entries, gene sequence entries, regulatory region entries and bibliography.
The maps of genetic networks are shown as flow diagrams (Figure 2) and as Java applets (e.g., http://www.csa.ru/Inst/gorb\_dep/inbios/genet/Graph /genes.html). Both methods of presentation depict genes as rectangles and their interactions as arrows. In the Java applets, however, the arrows differ in shape and colour depending on the interaction type. Red arrows connect a gene with the upstream genes of the network, blue arrows with the downstream genes. Filled and hollow arrows reflect the mode of gene action: activation and repression, respectively.
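The node-and-arrow model described above can be sketched as a small directed graph with a mode of action on each arrow. This is a minimal illustration in Python, not the applet's Java code: the class `GeneticNetwork`, its methods and the gene names `geneA` to `geneD` are all hypothetical.

```python
from dataclasses import dataclass, field

ACTIVATION = "activation"   # drawn as a filled arrow in the applet
REPRESSION = "repression"   # drawn as a hollow arrow in the applet


@dataclass
class GeneticNetwork:
    # Maps (regulator, target) to the mode of action of that interaction.
    edges: dict = field(default_factory=dict)

    def add_interaction(self, regulator, target, mode):
        if mode not in (ACTIVATION, REPRESSION):
            raise ValueError(f"unknown mode: {mode}")
        self.edges[(regulator, target)] = mode

    def upstream(self, gene):
        """Genes that regulate `gene` (the red arrows)."""
        return sorted(r for (r, t) in self.edges if t == gene)

    def downstream(self, gene):
        """Genes regulated by `gene` (the blue arrows)."""
        return sorted(t for (r, t) in self.edges if r == gene)


# Hypothetical example network; the gene names are placeholders.
network = GeneticNetwork()
network.add_interaction("geneA", "geneB", ACTIVATION)
network.add_interaction("geneC", "geneB", REPRESSION)
network.add_interaction("geneB", "geneD", ACTIVATION)

print(network.upstream("geneB"))
print(network.downstream("geneB"))
```

Storing the mode on the edge rather than on the gene lets the same gene activate one target and repress another, which is what the filled and hollow arrows express.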
The applet (Figure 3) subdivides the browser window into two panels: GraphPanel and CtrlPanel. The former displays the information on the genetic network, while the latter contains the control elements. These elements are: the choices ''Up'' and ''Down'', which provide information on genes interacting with a given gene; checkboxes, which display or hide any combination of groups of genes; and the buttons ''All'' (shows all genes and switches on all checkboxes), ''Path Only'' (leaves in the window only the previously marked genes) and ''Help''.
Each gene entry contains information on the gene's function, subcellular location, encoded protein, expression pattern and regulatory interactions (upstream and downstream genes), as well as links to other databases. In the GeNet subdivision SegNet, the expression pattern field incorporates images of the expression patterns of segmentation genes (Figure 4) in the fruit fly D. melanogaster (e.g., http://www.csa.ru/Inst/gorb\_dep/ inbios/genet/krueppel.html).
Work on incorporating quantitative data on segmentation gene expression into GeNet proceeds in collaboration with Dr. J. Reinitz (Brookdale Center of Mol. Biol., Mt. Sinai School of Medicine), whose group obtains gene expression data experimentally (Reinitz and Sharp, 1995; Kosman et al., in press). The aim of the collaboration is to design a quantitative atlas of D. melanogaster segmentation gene expression, containing both 3D and numerical data.
The regulatory element entry in GeNet contains data on the source organism, bibliography, element sequence and coordinates of transcription factor binding sites, as well as key words and a definition. Gene interaction entries hold information on the mechanism of the interaction and the method by which it was demonstrated.
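The entry types described above can be pictured as simple records. The following Python sketch is only an illustration of the listed fields: GeNet itself stores entries as hypertext files, and all class names, field names, sample values and the 0-based half-open coordinate convention are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GeneEntry:
    name: str
    function: str = ""
    subcellular_location: str = ""
    encoded_protein: str = ""
    expression_pattern: str = ""   # text, or a path to an image
    upstream_genes: list = field(default_factory=list)
    downstream_genes: list = field(default_factory=list)
    external_links: list = field(default_factory=list)


@dataclass
class RegulatoryElementEntry:
    definition: str
    organism: str
    sequence: str
    # Each site is (factor, start, end); 0-based, end-exclusive (assumed).
    binding_sites: list = field(default_factory=list)
    key_words: list = field(default_factory=list)
    bibliography: list = field(default_factory=list)

    def site_sequence(self, factor):
        """Return the sequence of the first binding site listed for `factor`."""
        for name, start, end in self.binding_sites:
            if name == factor:
                return self.sequence[start:end]
        return None


@dataclass
class GeneInteractionEntry:
    regulator: str
    target: str
    mechanism: str
    evidence_method: str


# Placeholder values only; none of this is real GeNet data.
element = RegulatoryElementEntry(
    definition="hypothetical enhancer",
    organism="hypothetical organism",
    sequence="ACGTACGTAC",
    binding_sites=[("factorX", 2, 6)],
)
print(element.site_sequence("factorX"))
```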
The search engine Finder treats GeNet as an array of key words assigned to database sections, to the entry types within each section and to specific entry fields. The search engine can process up to 30 key words simultaneously, applying intersection and negation operations to the matched documents.
The search engine consists of three files describing the database structure and two programs, IndexCreate and FindsLinks. IndexCreate indexes the database. FindsLinks searches for matching documents, executes union and intersection operations on the results for the individual key words and generates a report as an HTML file.
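The indexing and set operations described above can be sketched with an inverted index. This is a hedged Python illustration, not the ANSI C implementation of IndexCreate and FindsLinks: the function names `index_create`, `find_links` and `report_html`, their parameters, the whitespace tokenisation and the sample documents are all assumptions.

```python
from collections import defaultdict
from html import escape

MAX_KEY_WORDS = 30  # limit stated in the text


def index_create(documents):
    """Build an inverted index: key word -> set of document names."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def find_links(index, all_docs, required=(), optional=(), excluded=()):
    """Return the sorted names of documents matching the query.

    required: every key word must occur (intersection)
    optional: at least one key word must occur (union)
    excluded: none of the key words may occur (negation)
    """
    if len(required) + len(optional) + len(excluded) > MAX_KEY_WORDS:
        raise ValueError("too many key words")
    matched = set(all_docs)
    for word in required:
        matched &= index.get(word.lower(), set())
    if optional:
        joined = set()
        for word in optional:
            joined |= index.get(word.lower(), set())
        matched &= joined
    for word in excluded:
        matched -= index.get(word.lower(), set())
    return sorted(matched)


def report_html(names):
    """Render the matching documents as a list of hyperlinks."""
    items = "\n".join(
        f'<li><a href="{escape(n, quote=True)}">{escape(n)}</a></li>'
        for n in names
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


# Hypothetical documents; file names and contents are placeholders.
documents = {
    "entry1.html": "gap gene transcription factor",
    "entry2.html": "pair-rule gene transcription factor",
    "entry3.html": "gap gene receptor",
}
index = index_create(documents)
hits = find_links(index, documents, required=["gene"], excluded=["gap"])
print(report_html(hits))
```

Building the index once and answering each query with set operations keeps a search independent of the number of files scanned at query time, which is presumably why indexing and searching are separate programs.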
Three entry points (Figure 5) are provided, allowing the user to browse the database, to search the database and to work with genetic network maps. While browsing GeNet, the user moves sequentially from the page listing the database sections to the maps of the genetic networks that control development in the organism under consideration. Each gene on a genetic network map is linked to its gene entry, which in turn holds hypertext links to data on gene sequence, regulatory regions, gene interactions and bibliography. Thus, by clicking on a gene name in a genetic network map, the user gets detailed information about the gene and the mechanisms of its regulation.
In addition to browsing, an interface is available for composing arbitrary searches (Figure 6). The user first selects the database section, entry type and field. After specifying the key words and submitting the query, the user receives an HTML file hyperlinked to the GeNet entries matching the search criteria (Figure 7).
Another entry point into GeNet enables the user to work with genetic network maps. Maps depicted as diagrams show which genes regulate a particular gene, as well as which genes it in turn influences. As each gene on the diagram is hyperlinked to its gene entry, the user can retrieve all the information about a gene of interest by following the hypertext links in the database.
The genetic networks controlling development consist of tens of genes involved in complex regulatory interactions. These genes function at different time intervals and in different parts of the embryo. Diagrams of large genetic networks are therefore too complex to comprehend easily. Moreover, it is difficult to reflect the time-space pattern of gene expression in a diagram. These difficulties led us to develop tools for presenting genetic network maps as Java applets.
By selecting a gene on such a map the user can display its interactions with other genes (Figure 8). In addition, selecting a gene fills the choices ''Up'' and ''Down'' with lists of its upstream and downstream genes. Since these genes are hyperlinked to their gene entries, selecting one from the choice menu calls up the information about it.
A selected gene can be dragged to a new place for better visualization of gene interactions. In addition, by clicking on a gene while pressing the Ctrl key, the user opens its gene entry in a new browser window. Two procedures make it possible to visualize the interactions of genes related by function (e.g., acting at definite time intervals or controlling the development of definite parts of the embryo) or by structure.
The first uses checkboxes (Figure 9), which enable the user to display or hide any combination of groups of genes. The second involves clicking sequentially on the genes of interest while pressing the Shift key, and then pressing the ''Path Only'' button on the CtrlPanel (Figure 10). As a result, only the selected genes remain on the screen.
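Both procedures amount to filtering the network before it is drawn. The Python sketch below illustrates the idea under stated assumptions and is not the applet's code: the function names, the group labels and gene names are hypothetical, and keeping the arrows between the remaining genes in the ''Path Only'' case is an assumption, since the text only says that the selected genes remain on screen.

```python
def visible_genes(groups, checked_groups):
    """Checkbox method: genes belonging to at least one checked group."""
    visible = set()
    for group in checked_groups:
        visible |= groups.get(group, set())
    return visible


def path_only(edges, marked):
    """''Path Only'' method: keep the marked genes and the arrows among them."""
    marked = set(marked)
    return {
        (regulator, target): mode
        for (regulator, target), mode in edges.items()
        if regulator in marked and target in marked
    }


# Hypothetical groups and interactions; all names are placeholders.
groups = {
    "early": {"geneA", "geneB"},
    "late": {"geneC", "geneD"},
}
edges = {
    ("geneA", "geneB"): "activation",
    ("geneB", "geneC"): "repression",
    ("geneC", "geneD"): "activation",
}

print(sorted(visible_genes(groups, ["early"])))
print(path_only(edges, ["geneA", "geneB", "geneC"]))
```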
The current GeNet version holds approximately 800 files in *.html and *.gif format and occupies ca. 10 Mb of disk space. It contains information on 200 genes and 60 regulatory elements. GeNet holds 10 genetic network maps and 30 images of gene expression patterns.
Large-scale projects on sequencing the DNA of human and several model organisms have led to rapid growth of biological information (Schuler et al., 1996; Lander, 1996; Nowak, 1995). A great many databases have been designed for the storage, processing and retrieval of molecular biology data. These databases, in much the same way as formulae in physics, generalize and formalize knowledge and are therefore of primary importance for biologists. At present neither the analysis of results nor the planning of experiments is possible without consulting these databases.
Most molecular biology databases today can be classified into two categories. The first comprises databases containing essentially structural information (e.g., Swissprot and Genbank), while the second comprises genome databases of model organisms and human. As the emphasis of biomedical research now shifts from identifying genes to characterising their function, the design of databases containing functional information becomes crucial.
Efforts are underway to design databases containing information on signalling pathways (the SPAD database, http://www.grt.kyushu-u.ac.jp/eny-doc/spad.html, and the Cell Signalling Networks Database, http://geo.nihs.go.jp/csndb.htm); the GXD database on the expression of genes controlling embryogenesis in mouse (http://www.informatics.jax.org/gxd.html); and The Interactive Fly database, containing information on the mechanisms of development in Drosophila (http://sdb.bio.purdue.edu/fly/aimain/1aahome.htm).
Like the Interactive Fly and GXD databases, GeNet contains functional information on the mechanisms of gene action in embryogenesis. The distinctive feature of GeNet, however, is its model for presenting information, which is based on the concept of genetic networks and a comparative evolutionary approach. This database structure enables end users to retrieve information on the functional organization and evolutionary conservation of a whole ensemble of interacting genes.
At present GeNet contains about 30 digital images of the expression patterns of genes controlling segmentation in Drosophila. These images are unique material that makes it possible to evaluate gene expression at the level of individual nuclei. The images were obtained and kindly placed at our disposal by Dr. J. Reinitz of the Brookdale Center for Molecular Biology, Mount Sinai School of Medicine (USA). We are currently working on incorporating into GeNet the numerical data on segmentation gene expression obtained by Dr. Reinitz's group (see Kosman et al., in press).
To our knowledge, GeNet is at present the only database that contains genetic network maps. Our presentation of genetic network maps as Java applets enables the visualization of large genetic networks and makes it possible to reflect the time-space pattern of gene expression in the diagram. GeNet can be used to obtain information on the functioning of genetic networks, quantitative data on the expression of segmentation genes in Drosophila, and digital images of Drosophila segmentation gene expression patterns.
In future we plan to broaden the content of GeNet by adding information on the genetic networks controlling later stages of embryogenesis, as well as the morphogenesis of different organs (eye, limb, nervous system, etc.). We will also incorporate further graphical and numerical information on the expression patterns of genes controlling morphogenesis in Drosophila. The new version of GeNet will allow Internet end users to access and process this information on-line.
This work was supported in part by the Russian Foundation for Basic Research (Grant No 96-04-49350).
Harris S.E., Sawhill B.K., Wuensche A. and Kauffman S., Regulation rules suggest genome behavior is near edge of chaos, Santa Fe Institute, Preprint, (1997).
Hunt P. and Krumlauf R., Deciphering of the Hox code: Clues to patterning branchial regions of the head, Cell, 66:1075-1078, (1991).
Hunter T., Oncoprotein networks, Cell, 88:333, (1997).
Jackle H., Hoch M., Pankratz M.J., Gerwin N., Sauer F. and Bronner G., Transcriptional control by Drosophila gap genes, J. Cell Sci. Suppl., 16:39-51, (1992).
Kauffman S.A., Gene regulation networks: a theory for their global structure and behavior, Current Topics in Dev.Biol., 6:145, (1971).
Kosman D., Reinitz J. and Sharp D.H., Automated assay of gene expression at cellular resolution, In: Proc. of PSB'97, In press.
Lander E.S., The new genomics: Global view of biology, Science, 274:536, (1996).
McAdams H.H. and Arkin A., Stochastic mechanisms in gene expression, Proc.Natl.Acad.Sci. USA, 94:814, (1997).
Nowak R., Entering in postgenome era, Science, 270: 368-369, (1995).
Reinitz J. and Sharp D.H., Mechanism of eve stripe formation, Mech. Dev., 49:133-158, (1995).
Schuler G.D., Boguski M.S., Stewart L.D. et al., A gene map of the human genome, Science, 274:540, (1996).
Somogyi R. and Sniegorski A., Modeling the complexity of genetic networks: understanding multigenic and pleiotropic regulation, Complexity, 1:45, (1996).
Somogyi R., Fuhrman S., Askenazi M. and Wuensche A., The gene expression matrix: towards the extraction of genetic network architectures, In: Proc. of the Second World Congress of Nonlinear Analysis, Elsevier Science, (1996).
Spirov A.V. and Samsonova M.G., The GeNet database as a tool for the analysis of regulatory gene networks, In: Proc. of Int. Workshop on Information Processing in Cells and Tissues, Sheffield UK, 1-4 Sept. 1997, P.56-63.