Imprint: New York: Springer. Series: Interdisciplinary applied mathematics. Contributor: Terman, David H. (David Hillel). Bibliography: includes bibliographical references. Contents: The Hodgkin-Huxley Equations.

These equations and the methods that arose from this combination of modeling and experiments have since formed the basis for nearly every subsequent model for active cells.

Dynamical systems and computational methods are now being used to study activity patterns in a variety of neuronal systems. It is becoming increasingly recognized, by both experimentalists and theoreticians, that issues raised in neuroscience and the mathematical analysis of neuronal models provide unique interdisciplinary collaborative research and educational opportunities. This book is motivated by a perceived need for an overview of how dynamical systems and computational analysis have been used in understanding the types of models that come out of neuroscience.

Our hope is that this will help to stimulate an increasing number of collaborations between mathematicians and other theoreticians, looking for interesting and relevant problems in applied mathematics and dynamical systems, and neuroscientists, looking for new ways to think about the biological mechanisms underlying experimental data. The book arose out of several courses that the authors have taught.

One of these is a graduate course in computational neuroscience that has students from the disciplines of psychology, mathematics, computer science, physics, and neuroscience.

In previous work we systematically investigated the memory demands of large-scale network simulations by developing a mathematical model for the memory consumption of the most important data structures. Guided by this model, we presented a design that yielded substantial improvements in memory consumption on massive supercomputers (Kunkel et al.).

In a subsequent work (Kunkel et al.) we investigated theoretically […]. Instantiated for a particular software and computer architecture, the memory model predicts the memory requirement of a planned simulation and hence the size of the required machine. In the present work, we assess the performance of the software and the accuracy of the memory model described in Kunkel et al. The conceptual and algorithmic work described here is a module in our long-term collaborative project to provide the technology for neural systems simulations (Gewaltig and Diesmann).

The theoretical performance of the system is […]. A compute node in the K system is mainly composed of a CPU, memory modules of 16 GB, and a chip for the interconnect of nodes.

The theoretical performance per chip is […] GFlops. Each core has two SIMD units that concurrently execute four double precision floating point multiply and add operations. A three-level parallel programming model is available on the K computer: (1) SIMD processing in the core, (2) thread programming in a compute node using OpenMP directives, and (3) distributed-memory parallel programming with MPI.

JUGENE's theoretical performance was 1 PFlops. Each core has a dual floating point unit.

The theoretical performance per chip is […].

The technology to simulate networks of these model neurons has been documented in an ongoing series of publications (Rotter and Diesmann; Morrison et al.). For a large class of frequently used neuron models, the time evolution of the dynamic equations is essentially linear and can often be integrated exactly (Rotter and Diesmann). This also holds for many abstracted forms of spike-timing dependent plasticity, including models in which neuromodulators influence synaptic plasticity as a third factor (Morrison et al.). Non-linearities are typically concentrated in a single thresholding operation.
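To make "integrated exactly" concrete, here is a minimal sketch of exact integration for one linear subthreshold model. It assumes a current-based leaky integrate-and-fire neuron with a single exponentially decaying synaptic current (dI/dt = -I/tau_syn, dV/dt = -V/tau_m + I/C, with tau_m != tau_syn); the function names and parameter values are illustrative, not NEST's code. The threshold and reset, the one non-linearity mentioned above, are deliberately left out.

```python
import math

def propagators(h, tau_m, tau_syn, C):
    """Exact-integration propagator elements for a step of h ms (illustrative model).

    Assumed dynamics: dI/dt = -I / tau_syn,  dV/dt = -V / tau_m + I / C.
    Units: ms for times, pF for C, pA for I, mV for V. Needs tau_m != tau_syn.
    """
    p11 = math.exp(-h / tau_syn)                       # decay of the synaptic current
    p22 = math.exp(-h / tau_m)                         # decay of the membrane potential
    p21 = (tau_m * tau_syn / (C * (tau_m - tau_syn))   # coupling current -> voltage
           * (math.exp(-h / tau_m) - math.exp(-h / tau_syn)))
    return p11, p21, p22

def step(V, I, p11, p21, p22):
    """Advance (V, I) by one step: 3 multiplications and 1 addition."""
    return p22 * V + p21 * I, p11 * I

# Usage: ten steps of 0.1 ms starting from a synaptic current of 100 pA.
p = propagators(h=0.1, tau_m=10.0, tau_syn=0.5, C=250.0)
V, I = 0.0, 100.0
for _ in range(10):
    V, I = step(V, I, *p)
```

Because the solution is exact rather than approximate, ten steps of 0.1 ms agree with a single step of 1 ms up to rounding, which is what makes the per-step cost so small.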

The NEST simulator implements a hybrid update scheme: time-driven updates in regular intervals propagate the neuronal dynamics, and event-driven updates of synapses are performed only when the pre-synaptic neuron fires an action potential (Morrison et al.). Efficient time-driven update schemes have also been developed that enable the exact evolution of the neuronal dynamics, representing the firing times in double precision (Morrison et al.). In time-discrete simulations, for each time step by default 0.[…]

For the common case that the immediate effect of incoming spikes on the neuron dynamics is linear (e.g. […]). This enables an efficient representation of short conduction delays on the order of milliseconds, quantized in units of the simulation time step (Morrison et al.). For the simplest models of this class, the number of floating point operations required per neuron and time step can be as low as 2, in addition to the on average 1 to 10 floating point additions required to accumulate the synaptic input in the ring buffer.
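The ring buffer mentioned above can be sketched as follows. This is a minimal illustration, not NEST's implementation: it assumes delays are integers measured in simulation steps, with 1 <= delay <= max_delay_steps, and the class name is hypothetical. Each incoming spike costs one floating point addition into the slot for its arrival step; each neuron update reads and clears the current slot.

```python
class RingBuffer:
    """Accumulates linear synaptic input for future time steps (sketch).

    Assumption: delays are integers d in simulation steps, 1 <= d <= max_delay_steps.
    """

    def __init__(self, max_delay_steps):
        # One extra slot so that the largest delay never lands on the current slot.
        self.buf = [0.0] * (max_delay_steps + 1)
        self.now = 0  # index of the slot belonging to the current time step

    def add(self, delay_steps, weight):
        if not 1 <= delay_steps < len(self.buf):
            raise ValueError("delay out of range")
        # A single floating point addition per incoming spike.
        self.buf[(self.now + delay_steps) % len(self.buf)] += weight

    def pop_current(self):
        """Return and clear the summed input of the current step, then advance."""
        value = self.buf[self.now]
        self.buf[self.now] = 0.0
        self.now = (self.now + 1) % len(self.buf)
        return value

# Usage: a maximum delay of 15 steps, i.e. 1.5 ms at an assumed 0.1 ms resolution.
rb = RingBuffer(max_delay_steps=15)
rb.add(1, 2.0)
rb.add(1, 0.5)
rb.add(3, 1.0)
inputs = [rb.pop_current() for _ in range(4)]  # [0.0, 2.5, 0.0, 1.0]
```

Since the buffer length is fixed by the maximum delay, memory per neuron stays constant regardless of how many spikes arrive.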

Although spike-timing dependent plasticity (STDP) requires only a few extra floating point operations at irregularly spaced time points determined by the spikes of the pre-synaptic neurons (Morrison et al.), […]. The neurons of the network are evenly distributed over the compute nodes in a round-robin fashion, and communication between machines is performed by collective MPI functions (Eppler et al.).

These architectures feature a multi-level parallel programming model, each level potentially operating at a different granularity. The coarsest level is provided by the process-based distribution, using MPI for inter-process communication (Message Passing Interface; Pacheco). Within each process, the next finer level is covered by threads, which can be forked and joined in a flexible manner with OpenMP-enabled compilers (OpenMP Architecture Review Board). The finest level is provided by streaming instructions that make use of concurrently operating floating point units within each core.
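The round-robin distribution of neurons described above can be sketched in a few lines. The mapping through "virtual processes" (one per thread of every MPI process) is an assumption loosely modeled on NEST's scheme, not a statement of its exact code; the function names are hypothetical.

```python
def owner(gid, n_procs, n_threads):
    """Map a 0-based neuron id to its (MPI process, thread) pair, round-robin.

    Assumed scheme: neurons are dealt out in turn to n_procs * n_threads
    virtual processes, and consecutive virtual processes sit on
    consecutive MPI processes.
    """
    vp = gid % (n_procs * n_threads)
    return vp % n_procs, vp // n_procs

def local_neurons(n_neurons, proc, thread, n_procs, n_threads):
    """List the neuron ids owned by one (process, thread) pair."""
    return [g for g in range(n_neurons)
            if owner(g, n_procs, n_threads) == (proc, thread)]

# Usage: 24 neurons on 2 processes with 3 threads each; every pair gets 4 neurons.
print(local_neurons(24, 0, 1, 2, 3))  # [2, 8, 14, 20]
```

Dealing neurons out in turn keeps the load balanced even when neuron ids are grouped by population, since every population is spread across all processes and threads.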

The code used in this work combines distribution by MPI, starting one process per compute node, with multi-threaded OpenMP-based software components within each process during the setup and simulation phases. Using threads instead of one MPI process per core is essential, because each MPI process entails an additional memory overhead due to replicated data structures and process management. Moreover, the communication load and memory consumption caused by the currently employed collective data exchange scheme (Morrison et al.) […]

In Kunkel et al. we developed a model of the memory consumption of NEST. The model expresses the memory consumption of each MPI process as a function of the total number of neurons N, the number of incoming connections per neuron K, and the number of MPI processes M. Here, we instantiate the model terms and parameters for the revision of the simulation software NEST used in this work, in order to obtain reliable predictions of the maximum size of a randomly connected neuronal network that fits on a specific supercomputing architecture without saturating the available memory resources.

We extend the original formulation of the memory usage model to account for threading. The second term of the model is the memory consumption of neurons. It is dominated by the storage of the state variables, which is typically around […] B per neuron. In addition, it contains the contribution from a sparse table (Silverstein) needed to check for the local existence of a neuron on a given compute node, and the overhead caused by the vector storing the local neurons.

These contributions to the memory consumption are rather minor in the regime of up to 32,[…] compute nodes considered here. To account for threads as they are implemented in NEST, only the third model term needs adaptation; the first and the second model terms remain unaltered.
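The three-term structure described in words above can be summarized as follows. This is a sketch only: the symbol names for the terms are assumed, the per-thread serial contribution T N m_c^0 follows the statement that each thread's infrastructure costs m_c^0 per neuron, the N K / M m_c term assumes the locally stored connections are evenly distributed over processes, and the remaining contributions of the connection term are collapsed into the dots.

```latex
\mathcal{M}(M,T,N,K) = \mathcal{M}_0(M) + \mathcal{M}_n(M,N) + \mathcal{M}_c(M,T,N,K),
\qquad
\mathcal{M}_c(M,T,N,K) = T\,N\,m_c^{0} + \frac{N K}{M}\, m_c + \ldots
```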

Figure 1 illustrates the connection infrastructure of NEST that is required on each compute node when a simulation is run with T threads. On the highest level, a vector of dimension T holds a sparse table (Silverstein) for each thread. For this purpose, the index space 1, …, N of all neurons is partitioned into n_gr equally sized groups (here of 48 entries each). For each entry in a group, 1 bit is stored in the bit-field (tiny squares in Figure 1) indicating the existence of a target neuron for the specific presynaptic neuron j.

If neuron j has at least one target, the sparse table stores a pointer to a vector, which enables different synapse types (e.g. […]).
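The sparse table described above can be sketched as groups of 48 slots, each holding a bit-field and a dense list that contains only the present entries. This is a simplified illustration of the idea behind Silverstein's sparse table, not the data structure used in NEST; here a stored value stands in for the pointer to a neuron's synapse container, and the class names are hypothetical.

```python
class SparseGroup:
    """One group of a sparse table: a bit-field plus a dense list of present values."""
    SIZE = 48  # group size given in the text

    def __init__(self):
        self.bitmap = 0   # bit i set <=> slot i of this group is occupied
        self.values = []  # only occupied slots are stored, in slot order

    def _rank(self, i):
        # Number of occupied slots before slot i = position of slot i in self.values.
        return bin(self.bitmap & ((1 << i) - 1)).count("1")

    def get(self, i):
        if not (self.bitmap >> i) & 1:
            return None
        return self.values[self._rank(i)]

    def set(self, i, value):
        pos = self._rank(i)
        if (self.bitmap >> i) & 1:
            self.values[pos] = value
        else:
            self.values.insert(pos, value)
            self.bitmap |= 1 << i


class SparseTable:
    """Maps a presynaptic neuron index j in [0, n) to an optional value."""

    def __init__(self, n):
        n_gr = -(-n // SparseGroup.SIZE)  # ceil(n / 48) groups
        self.groups = [SparseGroup() for _ in range(n_gr)]

    def get(self, j):
        g, i = divmod(j, SparseGroup.SIZE)
        return self.groups[g].get(i)

    def set(self, j, value):
        g, i = divmod(j, SparseGroup.SIZE)
        self.groups[g].set(i, value)

# Usage: one table per thread, as in the per-thread vector of Figure 1.
T, N = 2, 100
tables = [SparseTable(N) for _ in range(T)]
tables[0].set(5, "synapses of neuron 5")
```

Even an empty table still pays for one bit per neuron plus the per-group bookkeeping, which is why this structure contributes a serial overhead proportional to N on every thread.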


For a detailed description of the fundamental data structures in NEST and how these can be mapped to the model terms, please see Kunkel et al.

Figure 1. Connection infrastructure in NEST optimized for supercomputers for the case that a simulation is run with T threads. Vertical dark orange rectangles and tiny squares indicate the per-group overhead of the sparse table.

For simplicity, the figure shows the additional infrastructure required for each neuron with local connections (light orange) only once. The filled pink square illustrates a locally stored connection object. Figure adapted from Kunkel et al.

On each thread, this infrastructure causes an overhead of m_c^0 per neuron. We call such a linear dependence on the total number N of neurons a serial overhead, indicating that this contribution does not benefit from distributing the N neurons over M parallel machines. Please see Kunkel et al. for details.

The parameters used for the predictions in Figure 4 are displayed in Table 1. The value for the memory consumed by a single connection object, m_c, is based on a representation of the synaptic parameters and weight in double precision, as this allows for the most generality and represents the underlying mathematical model most accurately. However, if a lower precision (for example, the float data type) is adequate for the scientific question at hand, a lighter-weight synapse can of course be implemented without any change to the framework.
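The effect of the precision choice on m_c can be illustrated with Python's struct module. The four-field layout below (for example, a weight, a delay, and two STDP-related variables) is purely hypothetical; the actual NEST connection object has a different layout and additional members.

```python
import struct

# Hypothetical connection layouts, for illustration only: four numeric fields.
# "=" selects standard sizes without alignment padding.
DOUBLE_LAYOUT = "=dddd"  # four 8-byte doubles
FLOAT_LAYOUT = "=ffff"   # four 4-byte floats

def connection_bytes(layout):
    """Size in bytes of one connection object with the given layout."""
    return struct.calcsize(layout)

def synapse_memory_gib(n_connections, layout):
    """Memory needed by n_connections such objects, in GiB."""
    return n_connections * struct.calcsize(layout) / 2**30

print(connection_bytes(DOUBLE_LAYOUT))  # 32
print(connection_bytes(FLOAT_LAYOUT))   # 16
```

Because connections dominate the total memory, halving the size of each connection object nearly halves the memory of a connection-dominated simulation, at the cost of precision in the synaptic variables.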

Table 1.

We use a recurrent random network of current-based integrate-and-fire model neurons with spike-timing dependent plasticity in the connections from excitatory to excitatory neurons as a benchmark simulation.


All parameter values for the neuronal dynamics and the details of the employed models are taken from Morrison et al. Constant numbers of inputs ensure that networks of different sizes are in comparable dynamical states. Our benchmark model contains a Poisson source to model external input to the network (Morrison et al.). These sources are stochastic by definition. Moreover, the connectivity of the network is generated randomly. However, the random numbers required to realize the connectivity and the Poisson spike trains are drawn such that identical sequences are produced when rerunning the same simulation.

This is crucial not only to be able to reproduce scientific results, but also to have a means of testing different implementations against each other during software development, in particular to benchmark performance improvements. The mechanisms to achieve this reproducibility even across different numbers of MPI processes are described in Morrison et al. The key ingredients of the implementation are the use of thread-local random number generators, initialized with the same seed at the beginning of each run, and a parallelization of the connection setup routines that ensures that the same numbers of random variates are drawn from each thread-local generator irrespective of the number of processes.
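The reproducibility mechanism described above can be illustrated with a small sketch. It assumes one generator per virtual process (a fixed number of thread slots over the whole machine), seeded from the run's base seed and the virtual process index, and that neuron gid belongs to virtual process gid % n_vps. The seeding formula and function name are hypothetical. Because each generator serves exactly the same neurons in the same order, the drawn connectivity does not depend on how the virtual processes are spread over MPI processes.

```python
import random

def draw_sources(n_neurons, k_in, n_vps, seed, n_procs):
    """Draw k_in random presynaptic sources for every neuron (sketch).

    Assumptions: one generator per virtual process vp, seeded with
    seed * n_vps + vp; neuron gid is owned by vp = gid % n_vps; vps are
    distributed round-robin over n_procs processes. n_procs only decides
    where the work happens, not which numbers are drawn.
    """
    rngs = {vp: random.Random(seed * n_vps + vp) for vp in range(n_vps)}
    sources = {}
    for proc in range(n_procs):
        for vp in range(proc, n_vps, n_procs):      # vps hosted by this process
            rng = rngs[vp]
            for gid in range(vp, n_neurons, n_vps):  # neurons owned by this vp
                sources[gid] = [rng.randrange(n_neurons) for _ in range(k_in)]
    return sources

# Usage: the same network results on 1, 2, 4, or 8 processes for 8 virtual processes.
reference = draw_sources(50, 5, n_vps=8, seed=42, n_procs=1)
same = all(draw_sources(50, 5, 8, 42, m) == reference for m in (2, 4, 8))
```

Note that this sketch keeps the total number of thread-local generators fixed; changing that number changes which generator serves which neuron and hence the realization.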

We use identical revisions of the NEST software on both supercomputing architectures. The code incorporates the optimizations for large supercomputers as described in Kunkel et al. In order to assess whether the simulation code makes good use of the parallel architecture of the supercomputers, we show a strong scaling of the simulation time in Figure 2 A using K and in Figure 2 B using JUGENE, keeping the problem size (number of neurons) constant while increasing the number of processor cores used.

In particular, the communication between machines at regular intervals, which is required to deliver the spikes (point events in time) to the target neurons, imposes synchronization points for the threads and the MPI processes, possibly limiting the scalability of the application. For a fixed size of the network of about 3.[…] A high slope of 0.[…] Changing the seeds of the random number generators results in different realizations of the random networks. This has a negligible effect on the firing rates in the network and therefore on the simulation time.

The largest source of fluctuations in the runtime of these simulations is the difference in load on the compute nodes caused by other users. To quantify these fluctuations, we performed 5 identical simulations for each number of nodes in the strong scaling on K. The standard error of the mean over the 5 runs is shown as error bars in Figure 2 A.
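The two quantities used in the scaling analysis, the standard error of the mean over repeated runs and the slope of a scaling curve, can be computed as sketched below. It is an assumption that the reported slopes are those of the speedup in double-logarithmic coordinates, so that ideal linear scaling gives a slope of 1. The data in the usage lines are synthetic, not measurements from this study.

```python
import math
import statistics

def sem(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

def speedup_slope(cores, times):
    """Least-squares slope of log(speedup) against log(cores).

    The speedup is proportional to 1 / time, so this is minus the slope of
    log(time) against log(cores); ideal strong scaling gives 1.
    """
    xs = [math.log(c) for c in cores]
    ys = [-math.log(t) for t in times]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Usage with synthetic data: runtimes that fall like cores**-0.8.
cores = [1024, 2048, 4096, 8192]
times = [1000.0 * (1024 / c) ** 0.8 for c in cores]
slope = speedup_slope(cores, times)  # approximately 0.8
```

With only 5 repetitions per point, the standard error is itself a rough estimate, which is worth keeping in mind when comparing error bars across machine sizes.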

Note that this error increases for larger machine sizes, as the communication time becomes dominant. At this point of highest load per processor, the slope reaches 0.[…]

Figure 2. Strong scaling of the NEST simulator. Optimal linear scaling is shown by the dashed line. At […] cores the network consumes all available memory, causing the highest possible workload per core for the given network size. The scaling of the simulation time (solid curve) between […] and […] cores has a slope slightly below linear scaling (0.[…]). The error bars denote the standard error of the mean from 5 repetitions of identical simulations (same random numbers).

The setup time (allocation and creation of neuron objects and synapses) is shown by the dotted curve. At […] cores the network consumes all available memory. Same symbol code as in A.

For such large networks, not only the simulation time needs to be taken into account; the setup of the network may also consume a considerable amount of time.


In the current implementation, the wiring process is therefore also performed in parallel on two levels: first on the coarse-grained level of MPI processes, and second with finer-grained parallelism implemented with OpenMP directives. As the synaptic information is exclusively stored on the machine that harbors the target neuron, in a thread-specific structure (Figure 1), both levels of parallelization are implemented in a natural way: each MPI process and thread establishes and stores the incoming connections for the neurons that are local to that process and thread.

The second level of parallelization, using one OpenMP thread per available core (4 on JUGENE and 8 on K), is possible because the threads work independently on disjoint parts of the connection infrastructure (see Figure 1). Figures 2 A,B show the scaling of the time required for network setup. The absolute value is below 10 min at the highest load per processor. As a production run for the network considered here is typically longer than 1 s of biological time (Morrison et al.), […]. In contrast to the simulation time, the speedup for the setup time increases with decreasing load per processor and almost reaches optimal linear scaling at the point at which the machine is 4 times larger (in cores) than the minimal required size, as seen in Figures 2 A,B.
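The lock-free parallel setup described above relies on each thread writing only to its own part of the connection infrastructure. The sketch below shows that ownership pattern with Python threads; it assumes neuron gid belongs to thread gid % n_threads, and the function names are hypothetical. Python threads do not yield a real speedup here because of the global interpreter lock; the point is only that disjoint per-thread structures need no synchronization.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_local_connections(thread_id, n_threads, n_neurons, k_in, seed):
    """Create the incoming connections of the neurons owned by one thread.

    Assumption: neuron gid belongs to thread gid % n_threads. Each thread
    uses its own generator and writes only to its own dict, so no locks
    are needed.
    """
    rng = random.Random(seed * n_threads + thread_id)
    local = {}
    for gid in range(thread_id, n_neurons, n_threads):
        local[gid] = [rng.randrange(n_neurons) for _ in range(k_in)]
    return local

def build_all(n_threads, n_neurons, k_in, seed):
    """Run one setup task per thread and collect the per-thread structures."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(build_local_connections, t, n_threads,
                               n_neurons, k_in, seed)
                   for t in range(n_threads)]
        return [f.result() for f in futures]

# Usage: 4 threads set up 100 neurons with 10 incoming connections each.
parts = build_all(4, 100, 10, seed=1)
```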

Near the point of maximum filling, connection setup is less effective. This suggests that the memory allocation for the synaptic infrastructure may dominate the setup time. Strong scaling of simulation and network setup is an important measure of the degree of parallelism achieved by the application, but it is less informative for the typical use of the simulation tool by a neuroscientist.

Given a neuroscientific question to be investigated, the number of neurons N is determined by the chosen model of the biological system. The researcher needs to determine the size of the machine required to address this question by simulation. It is desirable to determine the minimum machine size, in terms of number of CPUs and working memory, that is sufficient, both to keep the energy consumption small and because the effort spent on computation time grant applications typically increases with the size of the machine requested.

Moreover, the shared use of high performance computing resources by a large community of users requires thoughtful behavior from each individual. Using the smallest possible portion of the machine for a given task leads to faster scheduling and thus often to the shortest turn-around times, as the startup and queuing times contribute considerably to the turn-around time.

In the case of spiking network simulations, the feasibility of a particular simulation is determined by memory constraints rather than by the required performance of the machine. The memory consumption of neurons increases with the number of cores because of memory overhead that is serial in N. Each instance of NEST has a sparse table for the total number of neurons, which is needed to determine whether a neuron is locally represented (Kunkel et al.). Similarly, the sparse table in the connection infrastructure (see Figure 1) needs to store, for each neuron 1, …, N in the network, whether it has a target on the local machine.

As both structures grow proportionally to N, they constitute a serial memory overhead. As a result, in Figures 3 A,B the number of neurons per core needs to decrease with increasing network size in order to remain within the memory constraints on each compute node. In the log-linear plot the number of neurons per core decreases linearly, demonstrating an approximately logarithmic dependence on the number of cores. The linear extrapolation intersects the x-axis at a certain number of cores, revealing the limits of the current implementation.
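Why a serial overhead makes the affordable number of neurons per core shrink can be seen from a toy version of the memory model. The sketch below finds, by bisection, the largest network that fits into a fixed per-process memory budget. The four-term cost function follows the structure described in the text (base, per-neuron, per-thread serial, per-connection), but all parameter values are illustrative assumptions, not the values of Table 1, and the function names are hypothetical.

```python
def memory_per_process(n, m_procs, t_threads, k_in, p):
    """Hypothetical per-process memory in bytes for a network of n neurons.

    p holds illustrative parameters (not those of Table 1):
      base - fixed usage of the process
      m_n  - bytes per local neuron
      m_c0 - bytes per neuron and thread (serial overhead)
      m_c  - bytes per locally stored connection
    """
    local_neurons = n / m_procs
    return (p["base"]
            + local_neurons * p["m_n"]
            + t_threads * n * p["m_c0"]
            + local_neurons * k_in * p["m_c"])

def max_network_size(capacity, m_procs, t_threads, k_in, p):
    """Largest integer n whose memory fits into capacity, by bisection.

    Assumes memory_per_process(0, ...) <= capacity.
    """
    lo, hi = 0, 1
    while memory_per_process(hi, m_procs, t_threads, k_in, p) <= capacity:
        lo, hi = hi, hi * 2  # grow the bracket until it no longer fits
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if memory_per_process(mid, m_procs, t_threads, k_in, p) <= capacity:
            lo = mid
        else:
            hi = mid
    return lo

# Usage: 16 GB per process, 8 threads, 10,000 inputs per neuron (illustrative).
params = {"base": 200e6, "m_n": 1000.0, "m_c0": 1.5, "m_c": 48.0}
per_core = [max_network_size(16e9, m, 8, 10_000, params) / (8 * m)
            for m in (1024, 4096, 16384)]
```

The parallel terms shrink as 1/M while the serial term T N m_c^0 does not, so the neurons per core in `per_core` decrease as the machine grows, qualitatively as in Figure 3.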

Correspondingly, the total number of neurons as a function of the machine size, shown in Figures 3 A,B, increases slightly sub-linearly.

Figure 3. Maximum filling scaling. The network size (number of neurons) is chosen so that the simulation consumes all available memory. The affordable number of neurons per core (black round symbols) decreases with increasing number of cores to keep the total memory consumption close to the usable maximum. The dotted lines give linear fits to the data.


The colored symbols show the memory consumption at different stages of the simulation (round symbols: after allocating the neurons; crosses: after establishing the synapses; triangles: after running the simulation).

The largest contribution is due to the synapses. All data are represented using log-linear axes. (A) K computer with […]. Note that the last point at […] cores is slightly below the maximum memory usage, as the bisection method we used to empirically determine the largest possible number of neurons per core did not converge before the end of our access period to K.

Figure 4. Simulation time required for 1 s of biological time (black round symbols) for a firing rate per neuron of 6.[…] Total size of the network (i.e. […]). The theoretical prediction (1) of the maximum possible network size is shown as colored crosses. Optimal linear scaling is shown as dashed lines; the dotted lines give linear fits to the data.

(A) K computer; the estimated slope of the maximum network size is 0.[…]

The memory model developed in Kunkel et al. […]. At small numbers of cores the memory consumption is slightly overestimated; at larger machine sizes the theory is below the measured value. A possible source of error is the memory management system of the operating system: the memory overhead involved in managing dynamic memory allocation within the kernel is not part of our model. Future implementations will need to address this issue, for example by employing more effective pool allocation strategies (Stroustrup). The deviations of the model from the measured memory consumption are larger for the K computer than for JUGENE (see Figure 4), but the slopes of the theoretical and estimated curves approximately agree.


For the latter, we observed a non-monotonic dependence on, e.g. […]. As we rely on these measurements to determine the parameters of the memory model (see Sec. […]), […]. The decrease is due to the reduced workload (neurons per core, see Figures 3 A,B). As we are using collective MPI communication, the communication time increases with the machine size. This explains the subsequent increase of the simulation time at higher numbers of cores.

On K the simulation time and the slope of the simulation time predominantly increase with machine size (Figure 4 A). Different configurations of the Tofu communication topology (see Sec. […]) were considered. We expected that rerunning the same simulation as in Figure 4 with different topologies would show a notable effect on the runtime; however, this turned out to be false. As the K software environment matures, further investigations into the optimal configuration can be carried out.

On K the largest network of 6.[…] Decreasing the firing rate in the network to 1.[…] The memory consumption was only marginally affected (data not shown). The short return times allow for a quasi-interactive working style with the model for short simulations of a few seconds of biological time. Figure 5 compares the two supercomputers.

The maximum size of the network for a given machine size is shown in Figure 5 A. In the optimal case without any serial overhead in the representation of the network, one would expect a factor of 4, corresponding to the relative size of the total available working memory. A similar observation can be made from the total memory consumption as a function of the network size, shown in Figure 5 B.

The slope of the linear fit is above the optimal linear scaling slope (1), and the memory increase on the K computer is slightly smaller than on JUGENE.

Figure 5. (A) Maximum possible network size as a function of the number of cores (same data as in Figures 4 A,B). The dashed lines are determined by a linear fit to the data points. The dashed horizontal lines at 10^7 and 10^8 neurons are given for visual guidance. (B) Memory consumption and runtime. Total memory consumption mem_tot as a function of the network size N (triangles).

Normalizing the simulation time by the workload per core reveals an increase with the number of cores (Figure 5 B), caused by the collective communication scheme. As each spike produced within the network needs to be communicated to all other processors irrespective of whether targets exist on that machine, the communication load increases with the number of cores. Additionally, the number of spikes arriving at a given machine grows almost in proportion to the number of cores.

So as the network size increases, so does the number of arriving spikes, each of which must be checked for local targets, and so too does the proportion of spikes that have no local targets and must be discarded.

The peak performance of K can be achieved by streaming instructions (single instruction, multiple data; SIMD) utilizing both concurrently working floating point units in each core. Moreover, the benchmark considered in this contribution uses rather lightweight computations per neuron, where each simulation time step comprises only a few floating point multiplications and additions per neuron.
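The growing fraction of discarded spikes under the collective communication scheme can be estimated with a simple probabilistic argument. The sketch assumes that each neuron's k_out targets are drawn independently and uniformly from all neurons, which are spread evenly over m_procs processes; it is a back-of-the-envelope model, not a description of the measured traffic.

```python
def fraction_with_local_target(m_procs, k_out):
    """Probability that a received spike has at least one target on a given process.

    Assumption: each of the k_out targets lands on any given process with
    probability 1 / m_procs, independently of the others.
    """
    return 1.0 - (1.0 - 1.0 / m_procs) ** k_out

def useful_spikes_per_second(n_neurons, rate_hz, m_procs, k_out):
    """Spikes per second a process receives (all of them) that it actually needs."""
    return n_neurons * rate_hz * fraction_with_local_target(m_procs, k_out)

# Usage: with 10,000 targets per neuron, almost every spike is useful on
# 1,000 processes, but on 100,000 processes fewer than 10 % are.
small = fraction_with_local_target(1_000, 10_000)
large = fraction_with_local_target(100_000, 10_000)
```

Since every process still receives every spike, the share of work spent on spikes that are immediately discarded rises once the number of processes exceeds the number of targets per neuron.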

However, the delivery of spikes to randomly drawn target neurons causes frequent and random memory access. Therefore, for our application the floating point performance of the CPU is less relevant than clock speed, cache efficiency, and memory bandwidth. In order to measure the performance of NEST on K, we employed the profiling tools fpcoll and fprofx that are part of the Fujitsu development kit.

We simulated 18,[…] neurons on a subset of 98,[…] cores (12,[…] nodes). Note that the total number of floating point instructions in the simulation code is very low in our benchmark, as we are using a leaky integrate-and-fire neuron with only three state variables and the exact integration method (Rotter and Diesmann). The floating point performance is thus expected to be low.

For a program without floating point operations the measure is zero. The MIPS performance is good, especially considering that the application heavily relies on random memory access.

Table 2.

NEST is an openly available tool to routinely simulate networks of spiking neurons of functionally relevant sizes on HPC facilities; its use is also taught in the major advanced computational neuroscience summer schools. The simulation framework as used in this contribution does not sacrifice any generality for efficiency.

The improved connection infrastructure as presented in Kunkel et al. […]. This is a crucial prerequisite to investigate synaptic plasticity in recurrent networks, the biological substrate hypothesized to underlie system-level learning. Moreover, the same code that we used on supercomputers here also runs on small machines, like laptops, without any penalty in performance.

This number of neurons is a critical point at which the largest areas of the visual system in the primate brain can be represented at full cellular resolution.