Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Towards a virtual fly brain

J. Douglas Armstrong

Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK

Jano I. van Hemert

National e-Science Centre, School of Informatics, University of Edinburgh, 15 South College Street, Edinburgh EH8 9AA, UK


    Models of the brain that simulate sensory input, behavioural output and information processing in a biologically plausible manner pose significant challenges to both computer science and biology. Here we investigated strategies that could be used to create a model of the insect brain, specifically that of Drosophila melanogaster that is very widely used in laboratory research. The scale of the problem is an order of magnitude above the most complex of the current simulation projects, and it is further constrained by the relative sparsity of available electrophysiological recordings from the fly nervous system. However, fly brain research at the anatomical and behavioural levels offers some interesting opportunities that could be exploited to create a functional simulation. We propose to exploit these strengths of Drosophila central nervous system research to focus on a functional model that maps biologically plausible network architecture onto phenotypic data from neuronal inhibition and stimulation studies, leaving aside biophysical modelling of individual neuronal activity for future models until more data are available.

    1. Introduction

    The behaviour of the small fruitfly Drosophila melanogaster was first described in the academic literature just over 100 years ago (reviewed in Sturtevant 1915), and since then it has ‘evolved’ into one of the most powerful genetic model organisms. Molecular genetic study in Drosophila has uncovered many of the biological principles underpinning developmental processes, genetic regulation and cell signalling. These processes are conserved across animal species and, in humans, are ultimately related to inherited diseases (Chien et al. 2002). Yet, to many, the conceptual jump from limb development in an insect to limb development in a human is much smaller than the jump between their nervous systems. This is understandable given the six orders of magnitude difference in brain size and complexity between the two species.

    The adult Drosophila nervous system (Power 1943) comprises in the region of 100 000 neurons. Most of the central nervous system (CNS) neurons reside within the head, but there are some in the thoracic cavity, mostly associated with locomotor and reflex functions. As with other insects, the neurons are not myelinated (the electrical insulation used to improve transmission rates). Owing to the very primitive cardiovascular system, the cell bodies of the neurons all live in an outer cortex region where they are bathed in the insect haemolymph—a nutrient bath. Oxygen diffuses from the tracheole system rather than being actively carried by blood cells.

    Despite this relative simplicity in terms of brain size and structure, many of the genes and proteins expressed in the mammalian brain are conserved and can be identified in the fly genome (Venter et al. 2001; Emes et al. 2008). Moreover, at a functional level, the Drosophila brain displays plasticity. In other words, the brain of this simple organism is not ‘hard-wired’, but rather it is adapted or remodelled in response to changes in its developmental stage, experience and environmental context. Classic experiments in this area demonstrated that sensory deprivation has measurable effects on the volume of subdomains in the adult brain (Heisenberg et al. 1995).

    Many of the most famous contributions made by the study of Drosophila are in developmental processes, particularly at the molecular genetic level (e.g. Nüsslein-Volhard & Wieschaus 1980). There is also a long and important contribution in the field of neurogenetics to our understanding of the relationship between genes–brain–behaviour. Although many research laboratories have contributed to this, the late Seymour Benzer's research group at Caltech pioneered a large proportion of these studies—Weiner (1999) provides a historical perspective. By tradition, genes in Drosophila are named by the group who demonstrate function, usually through the use of a mutated form where the gene is named after the mutant phenotype. For example, the learning genes dunce and rutabaga in the fly and their mutant forms have been used to dissect molecular mechanisms that have been highly conserved both in terms of gene/protein sequence and in behavioural function across all animal species (Davis 1996; Dubnau & Tully 1998). The human equivalent, PDE4B, of the fly dunce gene is the site of action of the antidepressant rolipram and has been linked to both depression and more recently schizophrenia (Miller et al. 2005).

    However, flies do not get schizophrenia, they do not play football or exhibit any number of complex behaviours observed in ‘higher’ animals. At a core molecular level, they share a basic principle, yet there is clear evidence for evolutionary divergence since the last common ancestor, over 600 Myr ago. This has recently been dissected in more detail using state-of-the-art proteomic techniques where the protein complexes at the synapse in mammal brains (mouse) have been directly compared with those in the fly brain (Emes et al. 2008). Synapses, which form the connections between neurons, are the fundamental units of computation in the brain. They not only transmit information between cells but also detect patterns of neural activity and process this information by activating intracellular biochemical signalling pathways, which subsequently changes the properties of the neuron. When we compare the molecular complexity found in the synapse, we see that the vertebrate lineage has recruited additional complexity since its ancestral divergence from the insect lineage. While this has also occurred along the insect line, the degree of expansion is less, resulting in a simpler complex. Thus, we see a common set of core mechanisms shared across all species with a nervous system with the vertebrates having developed both larger brains and additional molecular complexity.

    2. Simulating the fly brain

    Here we propose a virtual fly brain. If implemented, this would capture our biological and informatics domain understanding of natural computation in the brain. Building a biologically plausible model of a real brain is not a light undertaking and will take many years. At this point, it is worth asking: why would we consider such a challenge?

    The human brain with its 10¹² neurons and 10¹⁵ synapses is the most complex organ known to biology. The neuroscience community requires a good exemplar model brain to understand how it works. The small size, availability of advanced genetic tools and measurable behaviours make the fly brain particularly amenable to research and ultimately modelling. With some 2000 of the 3000 genes linked to human inherited diseases conserved between fly and human (Chien et al. 2002), this simple animal is clearly of medical relevance. Given the ethical and economic pressure to reduce the amount of mammal-based research in the pharmaceutical industry and in academic research, we may see further growth in the development of insect-based models for human disease where these are appropriate (Bilen & Bonini 2005).

    The fly nervous system combines multiple sensory modalities and integrates them into a highly adaptive autonomous system that can clearly illuminate new strategies for computation. Existing insect-inspired algorithmic approaches such as ant colony optimization (Dorigo & DiCaro 1999) are already successful for certain tasks and yet these are only loosely inspired by biological strategies for computation. Our current understanding of neural computation does not support the range of behaviours and adaptive capability that insects (or potentially any biological brain) can achieve with their limited computational resources. Simply put, we have a lot to learn from biological computation and the fly brain is by far the most tractable model.

    3. State of the ‘art’

    The ‘connectome’ concept that has been proposed and described by various groups (including Sporns et al. 2005; Lichtman et al. 2008) aims to catalogue and collate connection information for the human brain. These data will in turn be useful to underpin models directly based on human brain micro-architecture. However, the best-known large-scale project for brain modelling is the collaboration between EPFL and IBM known as the Blue Brain project (Markram 2006). The programme objective is to develop a model of a rat cortical column comprising some 10 000 neurons. With this objective in mind, the group aims to enable scientific discovery, develop methods to handle the complexity at such a scale and to provide a framework for data integration. The models themselves are highly constrained by the underlying biology. At an anatomical level, advanced imaging techniques are employed to reconstruct an average cortical column connectivity matrix from histological slices. At a physiological level, multi-electrode recordings are taken from real column preparations. Gene expression profiles are used to generate raw data to support the properties of ion channels in neuron populations that help define the parameters for the models. Since the literature on ion channel properties is not complete, the group also expresses ion channels in Chinese hamster ovary cells to sample their parameters directly. These data are then integrated into the models, which are themselves then used to generate new hypotheses about the neuronal network, and thus the loop from model to biological validation is closed.

    The scale of the Blue Brain project is truly pioneering and yet it is approximately one-tenth of the complexity of a fly brain in raw neuron numbers. Drawing on the experience of this research leads to some suggestions for planning a virtual fly brain. We need to be very explicit about which parameters need to be defined in the model and where we are going to source these (e.g. literature versus de novo matched studies). A robust theoretical effort should be constrained by biological observation wherever possible. To achieve this, the modelling and experimental teams need to be as tightly integrated as possible.

    The Blue Brain model (Markram 2006) is focused on a specific level of the biological organization and one for which there is a strong body of smaller scale modelling studies and in vivo physiological recordings to build upon. This level may not be the most appropriate one for the first fly brain simulations. In Drosophila, studies reporting physiological recordings from neurons are scarce for technical and historical reasons—both the neurons and the animal are small and thus electrophysiology has generally been performed on larger insects. However, understanding and data from other levels of brain organization, for example, functional (behavioural), anatomical and genetic levels, are more advanced than that in many other species. Therefore, it may be more appropriate to model these first and bring physiology in later.

    4. Combining existing work

    Constructing biologically plausible models is difficult. For example, systems biology has existed for many years, yet it could not have risen to its recent prominence (Trewavas 2006) without very significant breakthroughs in biotechnology that now allow the measurement and monitoring of gene and protein expression/activity in a high-throughput manner and in the context of a sequenced (even if not fully understood) genome. The very nature of nerve cells and the networks they form make obtaining the parameters required to inform brain models a particular challenge. The properties of ion channels and their responses to ligands have to be measured using physiological recordings (e.g. patch clamps) that are data rich and not very amenable to rapid, high-throughput technologies. Through exploiting data sharing and integration methods from eScience, there are now community efforts to maximize the value in such data (Fletcher et al. 2008).

    It may perhaps be more useful to consider first what resources already exist, rather than follow the established route to such a model. First and foremost, the Drosophila community has a long tradition of open sharing of tools and resources (at least after publication), and it has a well-established community database (FlyBase; Wilson et al. 2008) indexing such resources and curating published (and directly submitted) data. Having roughly half the number of genes of mammals means that the fly still has approximately 11 500 genes to be indexed, characterized and mutated. Indexed in FlyBase are some 20 000 alleles (both natural and induced mutations), stocks of a further 20 000 or so, cDNA clones and a large variety of tools too numerous to go into here. In summary, there is a genetic toolkit where many genes have already been mutated or could be mutated very rapidly.

    Whole genome RNAi libraries are now available and these allow single genes to be inhibited in a tissue of choice (Flockhart et al. 2006; Dietzl et al. 2007). Through technologies such as these, the role of identified genes in specific parts of the brain may be assessed. By a slightly cruder, but no less useful, means, expression of dominant mutations can be used to disrupt cellular activity, sometimes in a conditional manner (Waddell et al. 2000), such that circuits within the brain may be suppressed for a period of time. Such disruptive methods are extremely useful tools to implicate a structure as having an essential role within a functional process and/or within a temporal window, but they do not imply any kind of sufficiency for a functional process. The latest generation of technologies for Drosophila behaviour addresses this. By expressing light-sensitive channels in Drosophila, specific groups of neurons can be depolarized without the need to implant an electrode. Thus, the effect of neuronal firing can be assessed against a direct behavioural outcome in the intact animal (Lima & Miesenböck 2005). This is an extremely powerful method for testing and refining any model for neuronal function, assuming one knows which neurons have been manipulated.

    Mapping structure to function in the brain requires an anatomical map of the former and a conceptual map of the latter. As mentioned above, the fly nervous system comprises somewhere in the region of 100 000 neurons, but actually counting neurons accurately is an unsolved problem. These neurons are compacted into a dense neural tissue, resulting in a small volume relative to neuron number. The small brain can be scanned at a high resolution (down to approx. 300 nm) with a confocal microscope without dissecting it apart or sectioning—both of which make subsequent stitching it all back together very difficult. At this resolution, one can visualize every neuron and resolve single processes so long as there is adequate separation (e.g. Yang et al. 1995). However, the dense tracks of neuronal processes merge together and individual synapses between neurons cannot be reliably identified without adding stochastic markers (Lee & Luo 1999). Within the brain, a combination of anatomical features such as tracts, some glial boundaries and molecular markers are used to subdivide the brain into regions. Some of these are very highly defined (e.g. the mushroom bodies), and can be observed in many species across the phyla and have a wealth of molecular and behavioural studies to support their functional role (Heisenberg 2003). Other regions are much more diffuse in nature and therefore it is more difficult to define their absolute boundaries. Many of these have only recently been named and their function is mostly unknown. A ‘nomenclature working group’ has recently been established to review the existing names of the neuropil regions, agree on their boundaries and propose new names for some of the less well-understood regions. This group is due to report its findings in the form of a hierarchical nomenclature that can rapidly be incorporated into the existing anatomical ontology in 2009.

    Alignment, registration and segmentation methods can be and have been applied to the fly brain. Initial attempts involved manual registration to a template brain (Rein et al. 2002), but more recent methods use a combination of automatic registration techniques, often supervised or primed by minimal expert effort. Combining such template brains, defined regions and neuronal tracing techniques, it is now possible to build connectivity maps of the brain in a systematic, statistical fashion. The limit of resolution imposed by the confocal microscope and the whole brain does restrict this to a region–region connection rather than evidence linking specific neurons together.
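
    The region-to-region connectivity map described above can be sketched as a simple counting exercise. The following is a minimal illustration, not the pipeline used by any group cited here: it assumes that each traced neuron has already been registered to the template and reduced to the regions containing its input and output arbours. The region labels, neuron identifiers and the `region_connectivity` helper are hypothetical placeholders, and the direction of each link assumes that neuronal polarity is known.

```python
from collections import Counter

# Hypothetical input: each traced neuron is reduced to the template regions
# in which its input and output arbours were found after registration.
# Region labels and neuron IDs are placeholders, not real data.
traced_neurons = [
    {"id": "n1", "inputs": ["AL"], "outputs": ["MB", "LH"]},
    {"id": "n2", "inputs": ["AL"], "outputs": ["MB"]},
    {"id": "n3", "inputs": ["MB"], "outputs": ["CX"]},
]


def region_connectivity(neurons):
    """Count, for every ordered region pair, how many traced neurons link them."""
    counts = Counter()
    for neuron in neurons:
        # Use sets so that a region listed twice is counted once per neuron.
        for src in set(neuron["inputs"]):
            for dst in set(neuron["outputs"]):
                counts[(src, dst)] += 1
    return counts


connectivity = region_connectivity(traced_neurons)
for (src, dst), n in sorted(connectivity.items()):
    print(f"{src} -> {dst}: {n} neuron(s)")
```

    Counts of this kind support a statistical statement that two regions are linked, but, as noted above, they say nothing about which individual neurons synapse onto each other.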

    To link two neurons requires resolution of the synaptic connection between two marked neurons. At present, this can only be done by electron microscopy, and it has been achieved in small regions of the brain; one of these, the medulla (part of the visual system), can be used to measure the scale of the problem (Takemura et al. 2008). To scale such studies to the entire brain would result in approximately 26 TB of information covering 7 million μm³ at 4000× magnification. The volume of data itself is not the issue; rather, it is the highly skilled technical effort required to prepare the material, which would take approximately 7000 person-years at the rate achieved in published studies (I. A. Meinertzhagen 2008, personal communication). The more recently developed higher throughput methods such as block face microscopy (Denk & Horstmann 2004) and the automatic tape-collecting lathe ultramicrotome (Hayworth et al. 2006) offer new ways to minimize these problems. In summary, the resolution required to provide connectivity information is currently limiting, but new technologies might well address these issues in the coming 5–10 years.
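
    The whole-brain figures quoted above can be turned into per-unit rates with a few lines of arithmetic. The sketch below uses only the three numbers given in the text; the choice of decimal terabytes and the `years_for_team` helper are assumptions made for illustration.

```python
# Figures quoted in the text.
total_data_tb = 26        # terabytes of image data for the whole brain
total_volume_um3 = 7e6    # cubic micrometres of tissue
total_effort_py = 7000    # person-years of preparation effort

# Assumption: decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
mb_per_um3 = total_data_tb * 1e12 / total_volume_um3 / 1e6

# Tissue volume that one person prepares in one year at the published rate.
um3_per_person_year = total_volume_um3 / total_effort_py

# Edge length of a cube with that volume, in micrometres.
cube_edge_um = um3_per_person_year ** (1 / 3)


def years_for_team(team_size):
    """Calendar years needed if the effort divides evenly across a team."""
    return total_effort_py / team_size


print(f"{mb_per_um3:.1f} MB per cubic micrometre")
print(f"{um3_per_person_year:.0f} cubic micrometres per person-year")
print(f"{years_for_team(100):.0f} years for a team of 100")
```

    At roughly 3.7 MB per μm³ the storage requirement is modest, whereas about 1000 μm³ per person-year corresponds to a cube only some 10 μm on a side, which is why the preparation effort rather than the data volume is the limiting factor.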

    At the other end of mapping structure to function lies behaviour. While behaviour in Drosophila has been studied longer than its genetics, the technological advances in association with molecular genetics have massively accelerated our study of the animal. Technological advances for behavioural research have been more incremental with some automated assays in the past couple of decades and the more recent emergence of bespoke computer vision tracking solutions for behaviours. Until very recently, the community has largely focused on rapidly generating simple measures of behaviour which collapse the components into a single index value. These types of studies fit with mutant screens, and were the basis for many of the well-known discoveries in this area. To build a more realistic model, we will need to start looking more carefully at the component behaviours again and measure the more subtle effects that gene or anatomical manipulation have on these. Technological advances in automated experimentation, computer vision and machine learning will probably support the capture and classification of Drosophila behaviour in ever more detail, but this needs a formal framework, or ontology, onto which the behavioural components may be mapped.

    To delve into the mechanics of any structure–function map, we will want to understand how each neuron, and ultimately each synapse, is processing the information passing through it. Large insects with their relatively simple biology, and large neurons, have been used as a major experimental workbench for electrophysiologists for decades. Drosophila with its very compact nervous system is technically more difficult to work with and as a result neurophysiology in Drosophila is largely inferred from larger insects. This is being addressed by a renewed interest in this area and groups starting to look at recording more directly in the fly brain (e.g. Turner et al. 2008). Where Drosophila already has an edge over many other models is in physiological measurement systems that do not use electrode-based recordings but rather exploit the genetic toolkit of the fly. There is now a range of genetically targeted luminescent and fluorescent reporters that can be used to measure brain activity in ensembles of neurons (Rosay et al. 2001; Miesenböck 2004).

    5. Towards a virtual fly brain

    With its biological relevance and scale—parameters constrained by direct biological observation across 10 000 neurons in a cortical column—the Markram/IBM Blue Brain project would appear to be the obvious model to follow for a virtual fly brain. However, this does not play to the strengths of the organism and would require a massive shift in research emphasis within the community towards electrophysiology.

    At the scale we propose, we also need to include input and output from the CNS. Here, Drosophila is relatively strong with research groups investigating the major sensory modalities including olfactory, gustatory, auditory, visual, gravitactic, tactile and even, recently, magnetosensory ones. Many of the neurons involved have been mapped and in some cases the downstream signalling mechanics are beginning to be understood. In terms of output, behaviours in Drosophila have been studied for in excess of 100 years. While we are far from a comprehensive understanding of all complex behaviours in Drosophila, we can measure indicators for the major behaviours and their disorders including (to mention just a few) reflexive responses, locomotion (walking, climbing and flying), circadian rhythm, associative learning and memory, sleep, aggression, sexual preferences and addiction.

    Binary expression systems, already extremely popular in Drosophila neuroscience research, could be used to underpin the data required for a high-level model that incorporates sensory systems, internal neuronal architecture and its output to behaviour. For those unfamiliar with binary systems, we create a ‘driver’ strain by putting a transcription factor from another organism (e.g. GAL4) under the control of an endogenous or a cloned Drosophila gene regulatory sequence (reviewed by Elliot & Brand 2008). The newly added transcription factor can then be used to drive any sequence of choice in the neurons where it is active, through the use of simple genetic crosses (in flies this takes a few days).

    Through the use of conditional expression systems, we can visualize, knock out and activate identified neurons (e.g. figure 1). Visualization is normally achieved through the expression of a variety of general cellular markers (e.g. green fluorescent protein (GFP), lacZ) and increasingly through the use of markers anchored to molecules that are localized within the cell (e.g. DSCAM exons can be used to preferentially target expression in dendrites; Wang et al. 2004) and that thus indicate information flow within the neuron. Knockout can be achieved using expression of toxins that kill or inhibit the cell. A range of these are available, of which the most widely used at present is a temperature-sensitive allele of the shibire gene (Kitamoto 2001). This displaces the wild-type protein and results in neurons whose synaptic transmission is sensitive to small changes in temperature, an effect that is reversible within an experiment. Examples include experiments that inhibited specific neurons during learning, but then allowed them to resume normal activity during recall to test the role of different circuit elements in learning and memory (Waddell et al. 2000). Finally, recent methods have been used to test the involvement of neurons within circuit models. Lima & Miesenböck (2005) described a photosensitive activator that can be targeted to specific neurons. Putting these three technologies together allows a single driver strain to be used to first identify a set of neurons and their structures, map their polarity, next disrupt their function and assess impact (requirement) on sensory perception or behaviour and also to trigger their firing to confirm their role (sufficiency).
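
    The requirement and sufficiency logic above can be written down as a small decision rule. This is an illustrative sketch only: the `classify_role` function, the driver names and the outcomes in `results` are hypothetical, and real experiments yield graded phenotypes rather than the yes/no values assumed here.

```python
def classify_role(silencing_disrupts, activation_triggers):
    """Interpret paired silencing and activation results for one driver line.

    silencing_disrupts:  True if blocking the labelled neurons impairs the behaviour.
    activation_triggers: True if firing the labelled neurons evokes the behaviour.
    """
    if silencing_disrupts and activation_triggers:
        return "required and sufficient"
    if silencing_disrupts:
        return "required, sufficiency not shown"
    if activation_triggers:
        return "sufficient, requirement not shown"
    return "no evidence of involvement"


# Hypothetical outcomes for three made-up driver lines and one behaviour.
results = {
    "driver_A": {"silencing_disrupts": True, "activation_triggers": True},
    "driver_B": {"silencing_disrupts": True, "activation_triggers": False},
    "driver_C": {"silencing_disrupts": False, "activation_triggers": False},
}

for driver, outcome in results.items():
    print(driver, "->", classify_role(**outcome))
```

    Recording both outcomes per driver line and per behaviour keeps the distinction between requirement and sufficiency explicit when the results are later mapped onto a circuit model.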

    To achieve this requires carefully mapped markers for neurons that can drive expression. A number of collections already exist, and these are being added to through the use of new techniques that clone subregions of regulatory sequence and thus achieve greater specificity of expression downstream (Pfeiffer et al. 2008). The tools to map these expression patterns to identified brain regions are rapidly maturing. These are currently limited to a general region with accuracy in the micrometre range. However, as discussed above, technologies for synaptic-level maps of the brain are under development.

    To allow a virtual fly brain to be constructed and then used afterwards requires the composition of methodologies from a number of areas in the field of informatics. One obvious area is that of image processing and analysis to deal with the electronic output data from microscopes and other image-based techniques, such as optical tomography. The primary goal is to map images to standard reference models and to extract useful features from these data. These reference models and the data that refer to them need to be properly annotated to allow modelling. The area of data and knowledge engineering is concerned with building efficient systems and representations to support annotation and modelling, which often rely on and build on methods from the area of machine learning. Ontologies form another important contribution from this area, which are used to make linking data from different aspects possible and allow experts to formalize models, for instance in the form of functional and spatial atlases.

    The systems and methodologies from these two areas are dependent on the availability of powerful data and computational resources. The large quantities of data obtained through many laboratories performing experiments and contributing their results in combination with the computational requirements of image processing and machine learning methods lead to a surge in demand for computational resources. This demand is met by the area of eScience, which aims to enable scientific applications to make use of large-scale data and computational resources. These resources may be made available using one or more of several methods, such as grid computing, utility computing and high-performance computing. Furthermore, eScience provides the methodologies that facilitate collaboration of researchers working in a distributed context and on different aspects of the same problem.

    What might this first ‘virtual fly brain’ model look like? We envisage a series of models, the first of which is simply based on integrating the most readily available data types, namely behavioural role and connectivity and perhaps using a control theory model in the first instance. Such initial models could be used to predict functional consequences of targeted brain disruption and/or genetic defects and answer questions such as, ‘if I block or trigger this neuron, what behaviour am I most likely to observe in the animal?’, or, ‘protein X is expressed in these following neurons, what phenotype is most likely to be associated with a mutant?’. Although fairly simplistic, these sorts of hypotheses can be tested against the model and the loop between model and biology closed as results may be used to refine the model. Immediate next steps that extend the model would be performing combinatorial queries on information flow in the network, particularly where there are multiple potential paths through the network and then subsequently incorporating development processes that will link genes back to structure. The integration of dynamic information, in particular, physiological information once it becomes available, will then open up many other more advanced possible modelling and direct neuronal simulation approaches.
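
    The two kinds of query mentioned above, predicting the consequence of blocking a neuron and enumerating alternative paths through the network, can be illustrated on a toy graph. The sketch below is a plain reachability model rather than the control theory model suggested in the text; the circuit, its node names and the `simple_paths` and `outputs_lost` helpers are all hypothetical.

```python
# Hypothetical toy circuit: nodes are neuron groups or regions and edges are
# inferred connections. All names are placeholders, not real anatomy.
circuit = {
    "odour_input": ["relay_A", "relay_B"],
    "relay_A": ["integrator"],
    "relay_B": ["integrator", "reflex_out"],
    "integrator": ["learned_avoidance"],
    "reflex_out": [],
    "learned_avoidance": [],
}


def simple_paths(graph, start, goal, blocked=frozenset(), path=None):
    """Yield every cycle-free path from start to goal that avoids blocked nodes."""
    if start in blocked:
        return
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:
            yield from simple_paths(graph, nxt, goal, blocked, path)


def outputs_lost(graph, source, outputs, blocked):
    """Return the outputs the source can no longer reach once nodes are silenced."""
    blocked = frozenset(blocked)
    lost = []
    for out in outputs:
        before = any(simple_paths(graph, source, out))
        after = any(simple_paths(graph, source, out, blocked))
        if before and not after:
            lost.append(out)
    return lost


behaviours = ["reflex_out", "learned_avoidance"]
for target in ["relay_A", "relay_B", "integrator"]:
    print(target, "blocked ->", outputs_lost(circuit, "odour_input", behaviours, {target}))
```

    In this toy example, silencing one relay leaves the learned response intact because a parallel route exists, which is the kind of prediction that could then be tested with the silencing and activation tools described earlier.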

    Figure 1. Brain mapping pipeline. (a) The expression pattern of a GAL4 driver strain (c61) viewed using a GFP reporter is shown. The brain is also registered onto a standard template where the boundaries between neuropil are pre-defined (e.g. Rein et al. 2002). (b) Neuronal tracing algorithms (courtesy of Mark Longair) are then used to segment out the connections. From these tracings and the registration data, a connectivity map can then be inferred to form the basis of a structural model. Subsequent studies would then add functional significance to the circuit shown. Scale bar, 20 μm.


    J.D.A. and J.I.v.H. were supported by the EPSRC and the e-Science Institute. Many of the ideas presented here were the result of discussions at the first Virtual Fly Brain workshop held in Edinburgh during summer 2008. We would like to thank the following participants: Malcolm Atkinson; Richard Baldock; Mikko Juusola; Ian Meinertzhagen; Eugene Myers; Charles Peck; Nick Strausfeld; David Sutherland; Barbara Webb; Jan Wessnitzer; and David Willshaw. Images for figure 1 are courtesy of M. Longair (2008, personal communication).


    One contribution of 15 to a Theme Issue ‘The virtual physiological human: tools and applications II’.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.