e-den genetics

Each Bug carries with it a genome, or genetic text, that consists of a long string of digits (example). The genome is consulted while the Bug is developing within its spore: the sequence of digits determines the organism's structure and internal parameters. Once the Bug has hatched, the genome has no further influence over the Bug's structure or behaviour but copies of the genome are provided to its offspring. The way the genetic text is copied and transmitted is not dependent on its semantic content.

The Genome during Development

Immediately prior to hatching, a spore consults its genetic text and uses the information to specify all of its internal parameters, including:

its skeletal structure

its neural connections

the behavioural parameters of its individual neurons

the metabolic threshold required for replication

the amount of metabolic fuel provided to its offspring

its mutation parameters.

If the genome specifies only some of the organism's parameters, the remaining parameters are, according to the user's preference, set to defaults or to random values. This does not apply to neural connections, which default to the unconnected state.

The Genome viewed as a Program

Rather than setting the values in a predefined template, the digital genome functions much like a small program. Imagine a piece of computer code that looked as follows:

CreateNeuron()
{
    number = 121 ;
    SetThreshold(45) ;
    SetFlux(+3) ;
    CreateAxon(number 1, type EXCITATORY, effect 99) ;
    AttachAxon(recipient 111) ;
}

Copymode = COPYAXONSANDDENDRITES ;
SetSourceNeuron(121) ;
for(target_neuron = 150) to (target_neuron = 221)
    NeuroCopy(source_neuron,target_neuron) ;

Copymode = COPYSOMA ; //flux, threshold, drift etc...
for(target_neuron = 207) to (target_neuron = 357)
    NeuroCopy(source_neuron,target_neuron) ;

Although such a piece of code might create a viable organism, almost any mutation of the written text (or the compiled code) would produce a syntax error.

e-den allows instruction sequences just like the one illustrated and even statements like:

For the first segment in each stack of 100 neurons, take the 3rd neuron and attach it to the 15th with an axon of type 1 and an effect of type 45...

To make such instructions meaningful in the face of random mutations, e-den simply replaces all function names, keywords and punctuation marks with digits. A "7" digit that specifies a flux of 7 in one generation might be part of a function call in the next. Thus, all possible strings of digits are allowed syntactically.

As I wrote in a recent e-mail, this syntax is infinitely extendable, so ultimately it will be possible to write the equivalent of "pavlov(235,324,456)" and three neurons will be created such that when the first (235) fires and then the second (324) fires soon after, the third (456) will lower the threshold or increase the sensitivity for the first neuron. If the first neuron is a motor neuron and the second neuron detects some favourable state (such as a full energy tank), this will mean that a simple reinforcing relationship has been established. I envisage a hundred or so definitions of such relationships, involving groups of 2 to 5 neurons each. To invoke a function like "pavlov", it would only be necessary to write something like .....7 (= start gene) 1 (= set gene type to neurorelationship) 01 (= pavlov) 235324456 (= the three neurons) 9 (= end gene)... I will require a lot of feedback from users to know what relationships are thought desirable. As you can see, the sky is the limit.
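To make the proposed encoding concrete, here is a minimal Python sketch that decodes the single example gene quoted above. It is not part of e-den: the function name, the lookup table and the fixed layout (start digit 7, type digit 1, a two-digit relationship code, three 3-digit neuron numbers, end digit 9) are assumptions read directly off that one example, and relationships involving 2, 4 or 5 neurons are not handled.

```python
# Hypothetical lookup table: the text gives only the code 01 for "pavlov".
RELATIONSHIP_NAMES = {"01": "pavlov"}


def parse_relationship_gene(gene: str):
    """Decode a gene laid out as 7, 1, two-digit code, nine neuron digits, 9.

    Returns (relationship_name, (neuron_a, neuron_b, neuron_c)).
    The layout is an assumption based on the single example in the text.
    """
    if len(gene) != 14 or any(ch not in "0123456789" for ch in gene):
        raise ValueError("expected exactly 14 digits")
    if gene[0] != "7" or gene[1] != "1" or gene[13] != "9":
        raise ValueError("not a neurorelationship gene of the assumed form")
    code = gene[2:4]
    if code not in RELATIONSHIP_NAMES:
        raise ValueError("unknown relationship code " + code)
    body = gene[4:13]
    # Each neuron is identified by a 3-digit number.
    neurons = tuple(int(body[i:i + 3]) for i in range(0, 9, 3))
    return RELATIONSHIP_NAMES[code], neurons


name, neurons = parse_relationship_gene("71012353244569")
print(name, neurons)
```

Because every field is a run of digits, a mutated copy of this gene is still a legal string; it may simply decode to a different relationship, to different neurons, or to no gene at all.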

A general understanding of the semantic logic of Bug genomes can be obtained by experimenting in the Bug Lab window, which provides step-by-step coaching on the design of new organisms, or by using the decode.exe utility, which inserts explanatory labels into a genome text file. During the organism's embryological development, most digits or groups of digits are used either to set the value of a particular parameter or to identify which parameter is about to be set. Two variables within the embryological software, one major and one minor, determine the context (or "gene mode") within which the subsequent digit is interpreted. There are ten major gene modes and, for each of these, one to twenty minor gene modes (or "gene sub-modes"). The major gene modes are as follows:

0 Between Genes

1 Skeletal

2 Visual neurons

3 Metabolic

4 Current Neuron

5 Multiconnect

6 Alphabetical (Species identifier)

7 General neurons

8 Neurocopy

9 New Gene

Commencing with the default gene mode, 'Between Genes', and the default gene sub-mode 'On Standby', each digit is examined in turn. The first occurrence of a '9' sets the gene mode to 'New Gene' (other digits prior to the first '9' are ignored). The subsequent digit sets the mode and the one after sets the sub-mode. Thereafter, the syntax and semantic logic within each sub-mode is different. For some sub-modes, an instruction is generated merely by choosing the sub-mode and the sub-mode then immediately reverts to 'On Standby'. In most cases, however, the gene mode and sub-mode determine which parameter is about to be set and the subsequent digits are used to actually set the parameter. For these parameters, the sub-mode reverts to 'On Standby' only when the necessary number of digits have been read from the genome (some parameters require a single digit; others, such as axonal connections, require many digits). At this stage, a '9' can end the gene or another digit can specify, within the same mode, a new sub-mode (and hence begin setting a new parameter). A sequence of digits within one of the gene modes 1 to 8 is considered to be a single gene; such a sequence always starts and ends with a '9'. Between these genes, the digits have no meaning and are not interpreted. These sections are analogous to the 'junk DNA' found between genes in carbon-based Life. (See example).
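The opening steps of this process can be sketched in a few lines of Python. The sketch below is illustrative only and is not e-den's embryological code: it skips junk up to the first '9', then reads one digit for the major gene mode and one for the sub-mode. The function name is hypothetical, a mode digit of 0 or 9 is not treated specially, and nothing after the sub-mode is interpreted, because the number of digits each sub-mode consumes is not given here.

```python
# Major gene modes as listed in the table above.
MAJOR_GENE_MODES = {
    0: "Between Genes",
    1: "Skeletal",
    2: "Visual neurons",
    3: "Metabolic",
    4: "Current Neuron",
    5: "Multiconnect",
    6: "Alphabetical (Species identifier)",
    7: "General neurons",
    8: "Neurocopy",
    9: "New Gene",
}


def read_first_gene_header(genome: str):
    """Return (index_of_9, major_mode_name, sub_mode_digit) for the first gene.

    Returns None if the genome has no '9' or ends before the mode and
    sub-mode digits can be read.
    """
    start = genome.find("9")  # digits before the first '9' are ignored
    if start == -1 or start + 2 >= len(genome):
        return None
    mode = int(genome[start + 1])      # digit after the '9' sets the mode
    sub_mode = int(genome[start + 2])  # the next digit sets the sub-mode
    return start, MAJOR_GENE_MODES[mode], sub_mode


print(read_first_gene_header("0042911912913"))
```

Here the leading digits 0042 are treated as junk, and the first gene opens at the '9' in position 4 with mode 1 (Skeletal) and sub-mode 1.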

The full range of sub-modes, their meaning and the details of the embryological process are specified in Appendix A [**not yet written**]. Some combinations of mode and sub-mode are not currently allocated but are likely to be given new meanings and new syntactical rules in software extensions. These should be avoided when designing organisms in the Bug Laboratory.

The set of default values supplied when the genome fails to set all the necessary parameters is described in Appendix B [**not yet written**]. If the user has elected to use the default parameters, then any parameter that is not assigned a value during the embryological process retains its default value. These values may be changed in later software versions and so should not be relied upon when designing organisms; if the current default value for some parameter is truly appropriate for an organism, then it is better to write a gene explicitly adopting that value. (The Bug Lab offers the option of adding a pre-written genetic sequence encoding the current defaults, to make this easier).

Although the defaults are useful when designing new organisms or when wanting to work with short genomes, long evolutionary sessions should usually be run with these defaults turned off because this preserves evolutionary pressure on all organisms to encode their preferred parameters.

Likely Future Extensions

The current version of e-den uses nearly all combinations of gene mode and sub-mode but mechanisms will need to be found for specifying new parameters. Most importantly, the current syntax assumes that all neurons can be identified by a 3-digit number and a way of overriding this will have to be established. One way to open up a new range of mode and sub-mode combinations without rendering old organisms obsolete would be to operate within what is currently the 'junk' genome between genes. For instance, instead of commencing every new gene with a '9', a whole new syntax could apply to genes starting with some other digit. Ultimately, all odd digits could be used to start genes, leaving only even digits to serve as gene-spacers. In anticipation of this, junk sections of engineered organisms should avoid odd digits. The decode.exe utility automatically identifies junk and will eventually offer an option to strip the junk of all odd digits.

The Genome as Passive Text

A Bug genome is entirely composed of the digits 0-9, which can be arranged in any order and in continuous strings of any length. Thus, every natural number from 0 to infinity represents a legitimate genome although, for practical reasons, most Bugs have genomes of a few thousand digits or fewer. Although genome files stored on computer disks may contain non-numerical characters, these are stripped (along with all leading zeroes) when the genome is assigned to an organism.

A Bug's genome remains unaltered throughout its life but, during the process of replication, the copy that is passed to a Bug's offspring may develop mutations. The most common mutations are single digit mutations, which may be insertions, substitutions or deletions. (An example of an insertion is "911912913"=>"9171912913", a substitution is "911912913"=>"913912913" and a deletion is "911912913"=>"91912913"). Mutations can also involve the repetition or deletion of long sections of genome (up to 100 digits), which are known as "backwards jumps" when sections already copied are recopied and "forwards jumps" when the copying mechanism skips over a section.

The probability of any of these mutations occurring is partly dependent upon the parents' mutation parameters and partly dependent upon the background mutation rate set by the user. The background rate represents the average number of mutation opportunities arising per thousand digits copied and can range from 0 to 99 (with the zero rate only recommended for genetic engineering sessions). The individual mutation parameters of each Bug are known as its insertion frequency, substitution frequency, deletion frequency and jump frequency. Ranging from 0 to 100, these parameters represent the chance, per thousand, that each mutation opportunity will actually lead to a mutation of the specified type; they are added to the default value of 1 (to prevent any organism from having a flawless replication process). Thus, the probability of an insertion mutation occurring during the copying of any one digit can be as low as zero, if the background mutation rate is set to zero, or as high as 100/1000 x 100/1000, which equals 1%. The lowest non-zero probability of an insertion mutation is 1/1000 x 1/1000, or 0.0001%.
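The following Python sketch reproduces the three single-digit examples and the probability arithmetic given above. It is an illustration rather than e-den's own code: all function names are hypothetical, jump mutations are not modelled, and per_digit_probability takes the effective frequency (that is, the value after the default of 1 has been added) as an assumption about how the two rates combine.

```python
def normalise_genome(text: str) -> str:
    """Strip non-digit characters and leading zeroes from a genome file's text."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    return digits.lstrip("0")


def insert_digit(genome: str, index: int, digit: str) -> str:
    """Insertion: a new digit appears before position `index`."""
    return genome[:index] + digit + genome[index:]


def substitute_digit(genome: str, index: int, digit: str) -> str:
    """Substitution: the digit at `index` is replaced."""
    return genome[:index] + digit + genome[index + 1:]


def delete_digit(genome: str, index: int) -> str:
    """Deletion: the digit at `index` is dropped."""
    return genome[:index] + genome[index + 1:]


def per_digit_probability(background_rate: int, effective_frequency: int) -> float:
    """Chance of one mutation type per digit copied.

    Both arguments are expressed per thousand, so the result is
    (background_rate / 1000) * (effective_frequency / 1000).
    """
    return (background_rate * effective_frequency) / 1_000_000


genome = normalise_genome("00 91-1912 x913")
print(genome)
print(insert_digit(genome, 2, "7"))
print(substitute_digit(genome, 2, "3"))
print(delete_digit(genome, 1))
print(per_digit_probability(100, 100), per_digit_probability(1, 1))
```

The last line corresponds to the 1% and 0.0001% figures quoted in the text, expressed as fractions.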

During sexual reproduction, a new genome is produced which is partly copied from one parent and partly copied from the other. Although the copying process advances along both parental genomes at the same rate, only one genome is actively read and copied at any one moment. Initially, each of the two genomes has a 50% chance of being the actively copied genome but, once chosen, the active genome remains the favoured genome until a crossover event occurs, and copying swaps to the other genome. The probability that a crossover will occur at any one moment is determined by a replication parameter (the crossover frequency) that is derived from both parents. As with the other mutation parameters, the crossover frequency may range from 1 to 100 per thousand.
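The copying scheme described above might be sketched in Python as follows. This is a simplified illustration, not e-den's implementation: the function name is hypothetical, "at any one moment" is assumed to mean once per digit copied, copying stops at the end of the shorter parent, and neither mutation nor realignment at the crossover point is modelled.

```python
import random


def crossover_copy(first: str, second: str, crossover_per_thousand: int,
                   rng: random.Random) -> str:
    """Build an offspring genome by copying from one parent at a time."""
    # Each parent has a 50% chance of being the initially active genome.
    if rng.random() < 0.5:
        active, other = first, second
    else:
        active, other = second, first

    offspring = []
    # Both genomes are advanced at the same rate, one digit per step.
    for position in range(min(len(first), len(second))):
        # Assumed: one crossover opportunity per digit copied.
        if rng.random() < crossover_per_thousand / 1000:
            active, other = other, active
        offspring.append(active[position])
    return "".join(offspring)


rng = random.Random(0)  # seeded only so that the sketch is repeatable
print(crossover_copy("1111111111", "2222222222", 100, rng))
```

With identical parents the offspring is identical to both, whatever the crossover frequency; with a frequency of zero the offspring is a straight copy of whichever parent was chosen first.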

In order for sexual reproduction to begin, the first three digits of each genome must match and the species identifier (name) of each parent must be identical. At each crossover, an attempt is made to realign the two parental genomes by comparing the next four digits of each and crossing in a manner that preserves in the offspring any genetic sequence common to both parents. For example, instead of crossing as follows, which would lead to deletion of the digit '2' because of malalignment -

1st parent 0123456789012345678901...

offspring 012345678901345678901...

2nd parent 123456789012345678901...

- the crossover process examines the crossover region and realigns the parental genomes as shown.

1st parent 0123456789012345678901...

offspring 0123456789012345678901...

2nd parent 12345678901-2345678901...

Without this corrective process, a single insertion or deletion in either parent could disrupt important genetic sequences in the offspring at every crossover point downstream from the initial mutation. If the parental genomes cannot be perfectly realigned locally (with respect to the four digits downstream from the crossover point), the crossover is made without any attempt at realignment. A corollary of this is that, once alignment between the parental genomes has been lost (because of a two- or three-digit deletion in one of the parents, say) then it is unlikely to be regained. If the two parents are only distantly related, it is likely that each crossover will completely disrupt the genes at the point of crossover. This is less likely to be deleterious if it occurs "between genes"; that is, at points in the genetic text that are lacking in semantic content. Thus, it is expected that sexually reproducing species are likely to accumulate a greater proportion of genetic text "between genes".
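The realignment step can be illustrated with the example genomes shown above. The Python sketch below is one possible reading of the description, not the algorithm e-den actually uses: the function names are hypothetical, and the search is limited to a shift of one digit in either direction (max_shift=1), an assumption suggested by the remark that a two- or three-digit deletion is unlikely to be recovered.

```python
def realigned_position(active: str, other: str, position: int,
                       window: int = 4, max_shift: int = 1) -> int:
    """Find where to resume copying in `other` after a crossover at `position`.

    The next `window` digits of the active genome are compared with the other
    genome at the same position and at nearby shifted positions. If no exact
    match is found, the unshifted position is returned (no realignment).
    """
    target = active[position:position + window]
    if len(target) < window:
        return position
    shifts = [0]
    for step in range(1, max_shift + 1):
        shifts += [-step, step]
    for shift in shifts:
        candidate = position + shift
        if candidate >= 0 and other[candidate:candidate + window] == target:
            return candidate
    return position


def cross_over(active: str, other: str, position: int) -> str:
    """Copy `active` up to `position`, then continue from the realigned `other`."""
    return active[:position] + other[realigned_position(active, other, position):]


first_parent = "0123456789012345678901"
second_parent = "123456789012345678901"  # same text with the leading 0 missing

print(first_parent[:12] + second_parent[12:])      # crossover without realignment
print(cross_over(first_parent, second_parent, 12))  # crossover with realignment
```

The first printed line loses the digit '2' at the crossover point, as in the first diagram; the second preserves the sequence common to both parents, as in the second diagram.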

(Within the software of e-den, the process does not actually occur exactly as described above, because the first parent has often left the scene when the second parent re-fertilises an existing spore. Thus, in place of the first parent's genome, the offspring's copy is used. The offspring's copy and the second parent's genome are used to create a new, merged genome that is then reassigned to the offspring. The result is the same, however, as the process described.)

In sections of genome derived from either parent, the usual processes of insertion, substitution and deletion can be seen to have occurred. Jump mutations only occur when copying from the first parent, however. The mutation parameters applying to all of these processes are those of the relevant parent, depending on which genome is actively being copied. The crossover frequency is the average of the two parental crossover frequencies.


