AUSTRALIAN FRONTIERS OF SCIENCE, 2005
Walter and Eliza Hall Institute of Medical Research, Melbourne, 12-13 April
Computational challenges in particle physics
Dr Lyle Winton, School of Physics, University of Melbourne
What I want to do today is to give you a small insight into some of the computing challenges facing high energy physics, and I will also look a little bit beyond that.
I will give you a bit of an overview of what I am going to talk about. I was going to do an introduction but Elisabetta Barberio has done that very well so I won’t spend too much time on that.
I will look at the challenges facing the Belle experiment and the ATLAS experiment, mainly because the Melbourne University Experimental Particle Physics Group has been part of those collaborations since the mid-1990s, and I will talk a little bit about data grids and grid computing, because these are a potential solution to some of the challenges that we are facing. Then I will look at some other areas of physics which are facing similar problems and similar challenges.
Elisabetta has mentioned how we do what we do. We construct particle accelerators, and we then construct instruments – precision detectors – to try and reconstruct these nanoscopic collisions. But the majority of the activities we undertake actually involve high performance computing. We need high performance computing to design these accelerators and detectors, we use high performance computing to collect and filter experimental data, and we then do further data processing and analysis on high performance computing.
This is a little bit complicated but I will try and go through it quickly. It is, I suppose, a day in the life of data within particle physics. We have data produced from the actual experiment, from the collisions that are occurring inside the detector. We get information from each of the sub-detectors. This is then processed with specialised online data acquisition equipment, as well as high performance computing.
This is then reconstructed to try and pull out tracks and vertices within the experiment itself. The data is then analysed to pick out its features and to test for particular physics processes within it. This is then further statistically analysed by scientists, who test hypotheses and work out what it is we have been trying to look at.
But the real point of the diagram on this slide is that a major part of what we do is actually simulation. We need to simulate the entire detector, the collisions within the detector and the tracking of all particles within the detector, as well as the online data acquisition system. And that simulated data then actually traverses the same reconstruction and analysis chain.
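To make that shared chain a little more concrete, here is a toy sketch (purely illustrative, not any real experiment framework) of real and simulated events flowing through the same reconstruction and analysis stages:

```python
# Toy sketch of the shared processing chain described above (illustrative only,
# not an actual experiment framework). Events from the data acquisition system
# and events from the simulation pass through the same stages.
def reconstruct(event):
    """Turn raw detector (or simulated) readout into tracks and vertices."""
    return {"tracks": event.get("raw_hits", []), "vertices": []}

def analyse(reco):
    """Extract simple physics quantities from the reconstructed event."""
    return {"n_tracks": len(reco["tracks"])}

def process(events):
    return [analyse(reconstruct(e)) for e in events]

real_events = [{"raw_hits": ["hit1", "hit2"]}]       # from the detector
simulated_events = [{"raw_hits": ["sim_hit1"]}]      # from the simulation

# Same chain, two sources: this is what lets the simulation predict what we
# should see and quantify detector acceptance and uncertainties.
print(process(real_events) + process(simulated_events))
```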
As I mentioned, we simulate the collisions and events. These simulations are then used to predict what we will see – some of the features of the data. This is absolutely essential to support the design of the systems, the electronics and the sub-detectors that we build, and it is also essential for analysis, because it tells us a lot about our acceptances – what we expect to see. It helps us to fine-tune our analysis and understand the uncertainties within the detector.
Simulation is very computationally intensive, mainly because we need to simulate the interaction at the collision point and the decays of the particles produced in the collision. We then need to simulate all of the components and all of the materials within the detector – and for ATLAS it is a very large detector, as Elisabetta mentioned. Yet the ATLAS detector measures things down to micrometre accuracy, so we need to simulate it at very fine precision.
We then need to simulate the tracking and energy deposition of all these particles throughout the entire detector. We need to simulate all the electronics effects within the detector, including signal shapes and all sorts of output. And in the end we need a ratio of greater than 3:1 of simulated to real data in order to be able to eliminate some of the statistical uncertainties generated within the simulation.
Now I will look at the Belle experiment. The Belle experiment is situated on the KEKB accelerator in Japan; the Australian high energy physics group has been a member since 1997. It collides electrons and positrons (e+e–) to generate what we call B mesons, basically to investigate a fundamental violation of symmetry in nature.
They have an interesting problem – well, actually, it is a good problem – in that the accelerator in Japan has had an increasing luminosity, an increasing number of collisions, over the last few years. (You can see the increase on the graph.) This is fantastic, because it means we get more collisions, which increases our data and our statistics, so we can probe physics more deeply. However, we then need to create more simulated data in order to maintain the 3:1 ratio.
This has created a computing challenge for the Belle experiment. In 2004 we needed to create four billion events in order to keep up, and each of these events took about 3 seconds to simulate on a modern CPU. This ended up saturating the KEK computing facilities; it just wasn't possible to complete all of them there. So this led to an effort we called the Belle Monte Carlo Production effort, in which facilities around the world contributed computing power – CPU resources – from their own high performance computing facilities. Australia was a major contributor. We used existing high performance computing facilities in Melbourne, Canberra and Sydney, and we were able to generate and replicate data between the sites within Australia, and back to Japan, using a tool called the Storage Resource Broker. These kinds of efforts are ongoing in 2005, because of course we need to keep generating data.
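As a rough back-of-envelope check (my own arithmetic, based only on the figures just quoted), the scale of that 2004 production works out to hundreds of CPU-years:

```python
# Back-of-envelope estimate of the 2004 Belle Monte Carlo production load.
# The event count and per-event time are from the talk; the rest is simple
# arithmetic assuming a single circa-2004 CPU running flat out.
events = 4e9            # simulated events needed in 2004
seconds_per_event = 3   # approximate simulation time per event

cpu_seconds = events * seconds_per_event
cpu_years = cpu_seconds / (365 * 24 * 3600)

print(f"{cpu_seconds:.2e} CPU-seconds ~ {cpu_years:.0f} CPU-years")
# -> roughly 1.2e10 CPU-seconds, i.e. of order 380 CPU-years of serial
#    computing, which is why a single site's facilities were saturated.
```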
Again I will look at the ATLAS experiment. I had better mention that, on average, about 23 proton–proton collisions occur, I think, every time we have a bunch crossing. This can create on average 7000 tracks, but it can be much larger. It can take quite a while for these tracks to actually exit the detector, so you can have particles from up to four different bunch crossings in the detector at once.
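As a rough aside (my own back-of-envelope arithmetic, not a figure from the talk), that overlap follows simply from the timing: bunches collide every 25 nanoseconds, and a relativistic particle covers only about 7.5 metres in that time, so it can still be in flight across a detector tens of metres long when the next few bunches arrive:

```python
# Back-of-envelope check of pile-up in time (my own arithmetic, using the
# LHC bunch spacing and the rough detector size quoted in this talk).
crossing_interval_ns = 25          # bunches collide every 25 ns (40 MHz)
speed_of_light_m_per_ns = 0.3      # ~0.3 metres per nanosecond
detector_length_m = 46             # approximate length of ATLAS

transit_time_ns = detector_length_m / speed_of_light_m_per_ns
crossings_in_flight = transit_time_ns / crossing_interval_ns
print(f"~{transit_time_ns:.0f} ns to cross, "
      f"~{crossings_in_flight:.1f} bunch crossings overlap")
# -> roughly 150 ns of flight time, so particles from several consecutive
#    crossings can be inside the detector at once.
```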
The detector is a precision detector, as Elisabetta mentioned, 22 metres high and wide, 46 metres long – Elisabetta says 44; there is some discrepancy there. It is 7000 tonnes. But we are tracking, within the inner detector, down to an accuracy of 10 micrometres. So there is an awful lot of information coming out of this detector.
It is also in a high radiation environment. I believe the inner detector receives up to 160,000 grays per year, so there is a lot of flux there, from thermal neutrons up to particles with energies potentially in the TeV range. So a lot of simulation has gone into the design of the inner detector, because we can't get access to it frequently and we need to make sure that everything in there survives for quite some time.
There is also an extreme volume of data coming from the detector. Bunch collisions like those shown in the second picture here occur 40 million times per second. You get an output of about one petabyte per second of data coming directly from the detector. Using the trigger and the specialised electronics that Elisabetta mentioned before, we can get that down to about three gigabytes per second. But we still need to use something called the event filter, a CPU farm of about 2000 computers, to process the data in real time and filter it down to a more manageable level – I think about 300 megabytes per second. That then gets stored to tape and disk.
In general, ATLAS expects to be producing about 10 petabytes of data per year, which need to be stored long term. And of course the goal that Elisabetta mentioned is to look for one event, like that in the third picture here, embedded among something like 10,000 billion of the events I mentioned previously.
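To put those rates together (my own arithmetic; the assumed live running time per year is an assumption, not a figure from the talk), the reduction chain and resulting annual volume look roughly like this:

```python
# Rough consistency check of the ATLAS data-reduction chain quoted above.
# The three rates are the approximate figures from the talk; the effective
# data-taking time per year is my own assumption (~1e7 seconds is typical).
raw_rate      = 1e15    # ~1 petabyte per second off the detector
after_trigger = 3e9     # ~3 gigabytes per second after the hardware trigger
after_filter  = 3e8     # ~300 megabytes per second after the event filter farm
live_seconds  = 1e7     # assumed effective data-taking time per year

print(f"trigger reduction : x{raw_rate / after_trigger:,.0f}")
print(f"filter reduction  : x{after_trigger / after_filter:,.0f}")
print(f"stored per year   : ~{after_filter * live_seconds / 1e15:.0f} PB")
# -> a reduction of several hundred thousand at the trigger, another factor
#    of ten in the filter farm, and a few petabytes per year of raw data to
#    tape, which, together with reprocessed and simulated data, is consistent
#    with the ~10 PB per year quoted above.
```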
Simulation also provides a challenge. A large body of knowledge needs to be encapsulated within code in order to do simulation: we need to simulate the interaction and decay of fundamental particles and the passage of these particles through matter, through the detector. The sheer volume, complexity and precision of the data that will be recorded, from the last slide I showed you, also pose a problem. We need to produce more than three billion simulated events per year. And a real problem is that, we discovered, the simulation can take up to 45 minutes per event on a modern CPU. So that is an awful lot of computing power.
One of the problems that we discovered is the showering of particles within the detector. Any one of the tracks in the top picture here, if it hits the right detector material, can shower and produce a large number of secondary particles. There is an example in the second illustration here of an electron showering in the electromagnetic calorimeter. Each of these secondaries, if we do a full simulation, needs to be tracked, and this adds a lot of computational time. So we are working in Melbourne to try to speed this up by parameterising these showers: we stop the tracking of the shower particles and instead simulate the shape of the shower as energy deposited within the detector. This gives a great speed-up, but we are still working on it.
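To give a feel for the idea (a minimal sketch of shower parameterisation in general, not the ATLAS or GEANT4 implementation; the profile and its parameters are illustrative assumptions), the tracking of thousands of secondaries is replaced by a single analytic energy-deposition profile along the shower axis:

```python
# Minimal sketch of shower parameterisation (illustrative only). Instead of
# tracking every secondary particle, we deposit the incident energy along the
# shower axis according to an analytic longitudinal profile, here the
# standard gamma-function form
#   dE/dt = E0 * b*(b*t)**(a-1) * exp(-b*t) / Gamma(a),
# where t is depth in radiation lengths. The parameters a and b below are
# assumptions chosen for illustration; in practice they are tuned to full
# simulation as a function of particle type and energy.
import math

def parameterised_shower(e0_gev, a=4.0, b=0.5, n_layers=25):
    """Return the energy (GeV) deposited in each longitudinal layer."""
    deposits = []
    for layer in range(n_layers):
        t = layer + 0.5                        # layer centre, in radiation lengths
        dedt = b * (b * t) ** (a - 1) * math.exp(-b * t) / math.gamma(a)
        deposits.append(e0_gev * dedt)         # energy per unit radiation length
    return deposits

profile = parameterised_shower(50.0)           # a 50 GeV electron, say
print(f"shower maximum near layer {profile.index(max(profile))}")
# One cheap function call replaces the tracking of thousands of secondaries.
```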
Some other problems arise in the ATLAS experiment, mainly because it is a very large collaboration – about 1800 physicists at the moment. Many of these individuals want to perform their own studies, so all of them require access to all the data as well as lots of CPU power. There is an extensive amount of code that has been generated by the collaboration – four million lines, I believe, at last count – and this embodies much of the physics and experimental knowledge that we need to do what we are doing.
In addition to that, we have had to develop our own management and distribution systems for the code, in order for everyone to be able to work on the components they specialise in.
An awful lot of work has also gone into documentation and communication. We are using technologies such as AccessGrid and Virtual Rooms, and a number of other standard technologies like wiki, database and metadata systems. But in addition to that, a lot of effort has gone into policy, because it is very important that we communicate efficiently.
So one of the areas that we are looking at to try to solve these challenges is new high performance computing techniques, one of which is called the Grid. The Grid is described as an infrastructure that enables integrated, collaborative use of resources owned and managed by multiple organisations, and it is often thought of by its proponents as one global peta-scale computing resource. In practice, it is an effort to provide transparent access to processing power, on tap as required, and is usually implemented using middleware.
We in high energy physics are looking at data grids. This is a specific area of grid computing where access to the data is extremely important, and there are lots of efforts going into this to try and help share, manage and process large amounts of distributed data – which is, of course, the problem that we will have within ATLAS. There are some examples of data grids around at the moment. There is the Earth Systems Grid in the United States, and a Global Bioinformatics Grid.
But the one that I am going to focus on is the LHC Computing Grid. This is the high energy physics driver for the European DataGrid project, which has become, I believe, the Enabling Grids for E-sciencE (EGEE) project – a very large project in Europe, with a very large workforce employing a lot of physicists.
The LHC Computing Grid is currently about 10 per cent complete. Even so, it is already the world's largest international scientific grid, and it is the computing infrastructure that will support the four experiments situated on the LHC. It is expected eventually to support about 5000 physicists from 500 institutes across the world, and it will need to take in tens of petabytes of data per year in order to store the information from these experiments.
In the end it will have about 10,000 CPUs – modern, standard CPUs, Pentium 4 type things. Currently there are only about 130 sites; the majority are in Europe, some are in Asia and some are in the US. When it is complete, it is expected to reach about 140 teraflops – by comparison, the fastest supercomputer at the end of 2004 was about 70 teraflops.
The ATLAS collaboration is, or will be, the largest user of the LHC Computing Grid. What it actually expects to need within the LHC Computing Grid for 2008 running is the equivalent of about 24,000 modern Pentium 4 CPUs, and by then it expects to be generating about 20 petabytes of data per year. So, in order to distribute this data throughout the grid, the information is summarised: you take the raw events, you generate some sort of event summary, and then towards the end you generate very small tags of information about the events.
The whole idea is that the LHC Computing Grid will be structured in tiers: tier 0, situated at CERN, will contain all the raw data; the tier 1s, situated at the country level, will contain some of the raw data and the majority of the summarised data; and the tier 2s, which are closer to the laboratory level, will hold only the most summarised data. Our workstations will connect in to this.
The whole idea of the Grid – this is the dream – is that from your workstation you will have extremely low latency access to all of the data within 48 hours of it being produced. You will have shared access to all the CPU within the grid, from anywhere. Users will be able to compose virtual data sets of the data they require, and that data could physically be located anywhere. In addition, some of the software will have a technique built into it called back navigation, which means that if you are processing summarised data you have transparent access to the non-summarised data – the detailed, parent data – if you require it.
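As a toy illustration of the back-navigation idea (purely my own sketch; the real ATLAS event data model is far more elaborate), a small tag record can carry a link back to its event summary, which in turn links back to the raw event:

```python
# Toy sketch of tiered event data with back navigation (illustrative only,
# not the ATLAS event data model). Analysis code working on summarised data
# can transparently reach the detailed parent data when it needs to.
from dataclasses import dataclass, field

@dataclass
class RawEvent:              # full detector readout (tier 0/1)
    event_id: int
    hits: list = field(default_factory=list)

@dataclass
class EventSummary:          # reconstructed summary (tier 1/2)
    event_id: int
    n_tracks: int
    parent: RawEvent         # back-navigation link to the raw data

@dataclass
class EventTag:              # tiny per-event tag for fast selection
    event_id: int
    interesting: bool
    parent: EventSummary     # back-navigation link to the summary

raw = RawEvent(event_id=42, hits=["hit data ..."])
summary = EventSummary(event_id=42, n_tracks=7, parent=raw)
tag = EventTag(event_id=42, interesting=True, parent=summary)

# Select on the tag, then navigate back to the detail only when needed.
if tag.interesting:
    print(tag.parent.n_tracks, len(tag.parent.parent.hits))
```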
And the idea at the end is that the complexity of all of this will be hidden from the user completely. They will have no idea of the resources they are actually using.
So is this just a dream? What we are trying to do at the moment is to go through some data challenges that have been set up by the ATLAS experiment and all the other experiments, to test the infrastructure that we have now – the 10 per cent of the LHC Computing Grid – and to prepare for the start of data-taking in 2007.
The Australian group has realised that advanced high performance computing is essential for the future of research within Australia. We need to participate in large-scale international collaborations, and we feel that high performance computing will help Australians better utilise these international facilities, particularly because it will help us gain access to the experimental data, the simulations and the results. (This is critical for doing analysis.)
So Australia set up its own high energy physics grid program, which started in 2002, and we have been investigating the use of data grid technologies. We have done this in collaboration with computer scientists, to research some of the tools that we will be using, and we have also participated in some of the major data grids around the world, like the LHC Computing Grid and the NorduGrid community. We have also aimed to drive some of the infrastructure within Australia, because we feel that high energy physics, and the challenges we will come across, will provide a 'killer application' for this kind of grid.
We have had some successful outcomes. We managed to build grid test beds throughout Australia and demonstrate the use of high energy physics applications on them. In the meantime we have also been recognised as one of the leading grid applications by the APAC Grid program within Australia, and the University of Melbourne has started an eResearch pilot program.
We have also been used a number of times as a driver for advanced networks and data infrastructure in Australia.
Now, a look at some related areas that have their own computing challenges. Quantum chromodynamics is a fundamental quantum field theory of the Standard Model – I am not an expert on this – and is being studied at the moment by the Centre for the Subatomic Structure of Matter at the University of Adelaide. It describes the interactions between quarks and gluons, like those found within nucleons (protons and neutrons). I will show you an example in a nice little animation in the upper panel here. This is a 'flux tube', and it helps us to understand why quarks are actually bound inside nucleons.
The majority of the work people do is simulations on space–time lattices; this is the only real way of studying this kind of thing, I am told. The ideal is a very large physical volume with fine lattice spacing. But typically the lattices that people run simulations on are about 20³ or slightly larger – which isn't very large.
You can see an example in the lower panel here of the QCD 'Lava Lamp', which is in fact a visualisation of the action density of the vacuum. It can take months, even years, on teraflops-scale supercomputers to generate these kinds of results. So what they have come up with is the International Lattice Data Grid community. This community's main goal is to share these generated data sets and, in addition, to share the initial lattice states (the gauge configurations), which can also take a lot of time to compute. The idea is basically to help distribute this information and save on computing time.
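To give a feel for why sharing matters (my own rough estimate; the lattice dimensions and precision are illustrative assumptions, not figures from the talk), even a modest lattice produces sizeable gauge configurations:

```python
# Rough estimate of the storage needed for one lattice QCD gauge configuration
# (illustration only; the lattice size and precision below are assumptions).
# Each link variable is an SU(3) matrix of 3x3 complex numbers, and there are
# four links per lattice site.
spatial, temporal = 20, 40        # assumed 20^3 x 40 lattice
bytes_per_complex = 16            # double precision: 2 x 8 bytes

site_count = spatial**3 * temporal
links = site_count * 4
config_bytes = links * 9 * bytes_per_complex

print(f"{config_bytes / 1e6:.0f} MB per configuration")
# -> of order 200 MB each; an ensemble of hundreds of configurations quickly
#    reaches tens of gigabytes, which is why communities such as the
#    International Lattice Data Grid share them rather than regenerate them.
```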
Simulations from experiments like ATLAS have led to some technology transfers. Some of these have come up within the astrophysics community, where similar codes are being used to simulate cosmic rays and to design telescopes. In the field of radiation protection, similar codes are also used to look at the effects of radiation on astronauts and equipment in space. But one of the major areas of transfer has been medical physics.
We do work on positron emission tomography, where radioactively tagged nuclei produce gamma rays within a patient. We are also looking at radiation therapy – for example brachytherapy, electron therapy and proton therapy.
We are also using similar codes to do equipment design within radiation therapy. Essentially, some of these therapies, like electron therapy and proton therapy, are built around what are just low energy particle accelerators.
So some efforts are under way within Australia; mainly I will talk about those between the University of Melbourne and the University of Wollongong. We are looking at positron emission tomography and radiation therapy, as I mentioned, and at nanodosimetry, where we look at the effects of radiation at the DNA and cellular level.
There is a collaboration within high energy physics called the GEANT collaboration, which has produced a toolkit, currently called GEANT4, for simulating the passage of particles through matter. This is the same toolkit that we use for the ATLAS detector.
One of the advantages of this toolkit is that it incorporates a lot of the physics processes needed both for high energy physics, such as that found at the LHC, and for the low energy physics that we see in fields like medical physics.
It is also able to model quite complex geometries, such as the ATLAS detector or a human being, for example, and we have found that it can accurately predict dose distributions within patients. So some of the work we are doing with GEANT4 includes designing medical equipment; looking at inexpensive, high resolution PET detectors (these also use some of the advanced silicon technologies employed in high energy physics experiments such as ATLAS); and building nanodosimetry detectors.
In addition to that, we are looking at a project on real-time patient planning, where we reproduce the real geometries of a patient's tissues from computed tomography and then predict an optimal treatment plan for that particular patient, in real time, before the patient moves. And we are also investigating running this on cluster and grid computing.
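As a simple illustration of the first step in such planning (my own sketch with an invented calibration, not the actual Melbourne–Wollongong workflow), a CT image can be converted into a voxelised density map that then serves as the patient geometry for the particle transport simulation:

```python
# Illustrative sketch only: convert CT numbers (Hounsfield units) into a
# voxelised density map for a dose simulation. The calibration below is
# invented for illustration; real planning systems use scanner-specific
# calibrations and map voxels to full material compositions, not just density.
def hu_to_density(hu):
    """Very rough piecewise-linear CT-number-to-density mapping (g/cm^3)."""
    if hu <= -950:                   # air
        return 0.001
    if hu <= 0:                      # lung and soft tissue below water
        return 1.0 + hu / 1000.0
    return 1.0 + hu / 1500.0         # soft tissue up to bone

# A tiny 2x3 "CT slice" of Hounsfield units (air, lung, fat, water, bone-ish).
ct_slice = [[-1000, -700, -80],
            [0, 300, 1200]]

density_map = [[hu_to_density(hu) for hu in row] for row in ct_slice]
for row in density_map:
    print([f"{d:.2f}" for d in row])
# Each voxel's density (and, in a real system, its material) then defines the
# geometry through which particle transport is simulated to predict the dose.
```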
I need to acknowledge my collaborators at the Experimental Particle Physics Group in Melbourne, who are working on the same sorts of things as I am, and a few other people – Derek Leinweber, for example, from CSSM, who provided some of those nice pictures, and our colleagues from the Centre for Medical Radiation Physics, at the University of Wollongong.


