AUSTRALIAN FRONTIERS OF SCIENCE, 2003

Canberra, 31 July to 1 August 2003

Scanning the sky: learning about the universe through large datasets
by Dr Brian Schmidt

Brian Schmidt is an ARC Professorial Fellow at the Research School of Astronomy and Astrophysics, Australian National University. He received a double degree in physics and astronomy from the University of Arizona in 1989, and his PhD in astronomy from Harvard University in 1993. From 1993-94 Brian was a postdoctoral fellow at the Harvard-Smithsonian Center for Astrophysics, and immigrated to Australia in 1995, when he took up a postdoctoral fellowship at the ANU at what was formerly known as Mount Stromlo and Siding Spring Observatories.

Brian is the leader of the High-Z-SN Search team, a collaboration of 20 astronomers on five continents whose discovery of an accelerating universe was awarded Science Magazine's Breakthrough of the Year award in 1998. He is also a member of other international investigative teams, including the Supernova Intensive Survey and the REACT Gamma Ray Burst Follow Up Program, both with the Hubble Space Telescope, and the Trans-Neptunian Object Search with the recently destroyed Great Melbourne Telescope. Schmidt has received India's Vainu Bappu medal (2002), the Australian Academy of Science's Pawsey Medal (2001), the Australian government's Malcolm McIntosh Prize (2000) and the Harvard University Bok Prize for Outstanding Astronomical Thesis (2000).

I am going to talk today about using large astronomical datasets. I am going to give you an overview of where I think astronomy is today and how an astronomer would look at where we are currently at, and then pose what I see as being the big questions for our field. Our field is a very small field, relative to what most people have been talking about here. There are roughly 3000 or 4000 active astronomers around the world, 150 of whom are in Australia.

Then, rather than concentrate so much on the results, I am going to look at how I see us tackling these problems, and the methodology that we are utilising to try to continue to make progress without spending too much money in the process.

I am at Mt Stromlo, and I think the first thing I need to do is to give you an overview of the universe: the universe in a nutshell. We live in a solar system which contains a fairly ordinary star. Stars go from being one-tenth of the mass of our sun to roughly 100 times the mass of our sun.

Figure 1
Figure 2

This solar system is embedded, on a scale 40 million times larger, in a galaxy which contains roughly 10^10 stars. Figure 1 is an image of our galaxy, the Milky Way, taken by a satellite which has managed to look at it and peer through what you would look up at in the sky as the Milky Way. This is what it actually looks like. Just to show you why we think this is a nice, normal galaxy, figure 2 is a picture of a galaxy (not the Milky Way) which has the glorious name of NGC891. You can see they look very similar. NGC891 happens to be many, many millions of light-years away.

Figure 3

This galaxy, of course, is embedded in an even larger structure, on a scale another hundred times larger, which we call the local group. Figure 3 is a sort of a map of the local group, a map of where everything in the nearby universe is. Our Milky Way of course is in the centre, because we are very important here. But then we have other galaxies, the most notable of which would be the Andromeda Galaxy, which is a nice big galaxy very similar to our own (figure 3 is a picture of it, visible from the northern hemisphere), but then lots of other things, such as a fairly unexciting member known as the Cetus dwarf. You can see that that would probably contain only 100,000 stars. So this is the local make-up of our universe.

Figure 4

This is embedded in an even larger map of the universe (figure 4), which was done here in Australia over the last five years, known as the 2dF redshift survey. That was headed up by Matthew Colless, here at the ANU. Each one of the dots represents the position in space of a galaxy more or less like our own. So there are a lot of galaxies out here. There are 219,000 (maybe up to 221,000) objects in this particular map, and you can see that these galaxies pull out structures: there are filaments, and there are voids, as we call them (voids are places where there are no galaxies), and trying to understand this structure is one of the problems of cosmology of the day.

Figure 5

If we keep going back to a larger and larger scale, this is the biggest scale in the universe we can see (figure 5). This is known as the cosmic microwave background, just recently completed this year in an experiment called WMAP, which was an experiment in space. What we are looking back here at is all these little bumps and wiggles, the remnant of the Big Bang. We are looking back to a point when the universe was roughly 100,000 years old. We are not seeing galaxies here, we are just seeing the universe as it looked when it was very warm, roughly 4000°C. The bumps and wiggles we see here are places where the universe is a little hotter or colder, which it turns out means it is a little denser or less dense, than other places. Connecting what you see here to what we see in the nearby universe through that map of where the galaxies are today is a great triumph of cosmology, in that we can actually go through and piece together what this looks like.

In this schematic view you have this universe where you have this web of density which collapses down and forms the galaxies we see today. These galaxies have gravity and they are moving around each other. If we go back and we look, we are actually looking here back into time, 14 billion years through space, seeing the galaxies on the way, in the case of optical light, and looking to that very distant time with these microwave observations, made, as it turns out, from space.

We live in a very funny universe, a universe described by General Relativity (something that Einstein is very famous for but, interestingly enough, never won a Nobel Prize for). The universe is expanding, and space may or may not be curved. If we want to get an idea of where we are at, it is actually more complicated than you might think. If space is curved and I have, for example, two spots and I want to look back in time, you would normally think, 'Well, all right, there is a nice line. Light would come to me along that line.' But light is not so simple, because instead space is curved and at the same time it is expanding, so that photon, as it comes from a galaxy to us, travels a fairly complicated path. And of course, right now we don't really know from first principles what the shape of space is. Space may actually have this geometry instead, and so you get a different curve that photons will travel.

Theorists often like the universe to be flat. That is the simplest situation. It has a nice geometry which is not curved. And this is one of the things that cosmology, over the last 30 years, has principally wanted to answer: what is the shape of space?

Over the last decade it turns out we have made incredible strides forward in measuring these basic properties of the universe. Rachel alluded to the fact that I use something called supernovae, which are exploding stars. These are stars like our sun after they die. After our sun runs out of hydrogen it becomes a small star called a white dwarf, and under certain circumstances, if you are able to put more material onto that star, it will explode and be as bright as 10 billion stars. When it does it will be one of these objects which get bright over a period of 20 days and fade away. By looking at these and seeing how bright they are, you can measure distances.

The large team that I ran over the last decade was able to measure distances from the nearby universe, where the light from these objects had only been stretched by 1 per cent. As the photons from these objects travel through space, their light gets st-r-e-tched as it goes through that universe, which is expanding. The expanding universe stretches the light as well, and makes it turn redder. Well, we have done it from light that is 1 per cent stretched, all the way out to this, where the light has been stretched by more than 100 per cent. It turns out that the nearby objects are only a few hundred million years old, but the most distant ones are 10 billion years old, so we have measured the expansion history of the universe with this.
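To put numbers on the stretching described above, here is a minimal sketch. It assumes the standard convention that a redshift z stretches a wavelength by a factor of 1 + z; the 500 nm starting wavelength is purely an illustrative choice and is not from the talk.

```python
def stretch_factor(redshift: float) -> float:
    """Factor by which a photon's wavelength has been stretched: 1 + z."""
    return 1.0 + redshift


def observed_wavelength(emitted_nm: float, redshift: float) -> float:
    """Wavelength we receive, given the wavelength that was emitted."""
    return emitted_nm * stretch_factor(redshift)


# 500 nm (green light) is an assumed example value, not a number from the talk.
nearby = observed_wavelength(500.0, 0.01)   # light stretched by 1 per cent
distant = observed_wavelength(500.0, 1.0)   # light stretched by 100 per cent
print(f"nearby: {nearby:.1f} nm, distant: {distant:.1f} nm")
```

Light stretched by 100 per cent arrives at twice its emitted wavelength, which for visible light means it has been pushed towards the infrared.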

We can compare that with what General Relativity says. And what this expansion history says is that the universe, as it has been expanding, has been speeding up. That was not expected in 1995, when we started this experiment. Instead, we expected the universe to be slowing down, because the universe is full of mass, mass has gravity, and mass should, by General Relativity, slow down the expansion of the universe. So that is one fact.

The census of gravity: by looking at all of these galaxies, Matthew Colless and his team, with the 2dF redshift survey and the Anglo-Australian Telescope, have measured how much gravity there is in the universe. It turns out that it is about 30 per cent of the amount of gravity you needed to make space flat. So that is another fact.

Finally, a snapshot of the universe, back 14 billion years ago, taken with the cosmic microwave background, tells us the geometry of the universe. It is too complicated to explain in a short period of time, but the answer here is that the universe is flat. So here we have 30 per cent of the amount to make the universe flat, the universe is flat, and then some weird result here that the universe is speeding up; there is something other than gravity at work.

The consensus from all this work is that we live in a very funny universe. We live in a universe that is dominated by something we call dark energy. It is something that is throughout space, and it repulses itself and causes the universe to expand faster over time. The rest is this gravitational matter, measured by the 2dF redshift survey, and that is about 28 per cent now of the universe, of which 4 per cent is normal stuff like stars and 24 per cent is dark matter: material we can tell has gravity there but that we cannot see in the universe. We have reason to believe it is made up of something unusual that we don't understand, because we have an idea of how much material, what we call baryons, is in the universe from our models of the Big Bang. And it seems that this is not that type of material. We have also managed to measure the universe's age very accurately now (it is about 13.7 billion years old, with an error bar less than 10 per cent), and geometrically it is flat to at least 2 per cent.
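The bookkeeping behind that budget can be sketched in a few lines. The only assumption is the usual convention that "flat" means the fractions of everything in the universe add up to exactly 1; the 4 and 24 per cent figures are the ones quoted above.

```python
# Fractions of the amount needed to make the universe flat.
TOTAL_FOR_FLAT_UNIVERSE = 1.0   # assumed convention: flat means the parts sum to 1
normal_matter = 0.04            # stars and other ordinary (baryonic) material
dark_matter = 0.24              # has gravity but cannot be seen

gravitating_matter = normal_matter + dark_matter
dark_energy = TOTAL_FOR_FLAT_UNIVERSE - gravitating_matter

print(f"gravitating matter: {gravitating_matter:.0%}")
print(f"dark energy:        {dark_energy:.0%}")
```

The matter census and the flatness measurement together leave a gap of roughly 72 per cent, and the supernova result says that whatever fills the gap is pushing the expansion along rather than slowing it.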

So this is the golden age of astronomy. In the last decade we have managed to measure the age of the universe, we have discovered the first extrasolar planet, which I haven't even discussed yet, we have mapped the cosmic neighbourhood, and we have figured out how much and what makes up the universe. And we have, because of this, as it turns out, a good idea on the ultimate fate and the beginning of the universe. We were created in a Big Bang, and we expect the universe to keep expanding forever, to infinity.

So what, you might ask, is left to do? There are still some outstanding questions. One thing that is primary for the future is observing and understanding the first generations of stars and galaxies: how did we come to be?

Another popular topic is directly detecting and studying the first extrasolar planets. We have detected extrasolar planets by the gravitational wobble that they induce in their sun as they go around it. That turns out to be a few metres per second, about how fast I can run. You can actually see that indirectly as the stars wobble, but it would be much better to actually image one of these things and take a picture of it so you can see what is going on.

Another question is: what is this dark energy that seems to be blowing the universe apart? It is 72 per cent of our universe; we have absolutely no idea what it is.

And what is the dark matter? Well, 25 per cent of the stuff we think we sort of understand, we don't understand anyway. That is this dark matter, which has been around for the last three decades.

These are four big questions that people are trying to answer. I should say that they are not the only questions in astronomy, but I would say they are four of the biggest.

So what astronomers have done is to propose a whole raft of new experiments. The James Webb Space Telescope is going to replace the Hubble Space Telescope; the OWL telescope is a 100-metre sized telescope (and there are several competing models for this); and there is something called the Square Kilometre Array (SKA), which is a square-kilometre sized radio telescope. The SKA is proposed to be in Australia; the OWL will probably be in some place like Chile; and the James Webb Space Telescope will be out way beyond the Earth, in an orbit known as the second Lagrangian Point.

You might ask: what do these three experiments have in common? Well, they all allow us to look very far back into time, and so you can look to see the first stars. But they have another commonality, which is that they all cost more than a billion dollars. Now, trying to go and raise a billion dollars in the US is hard, and in Australia it is impossible.

For this reason, astronomy cannot afford to live on these instruments alone. You have a small community (several thousand astronomers), but if you are only going to have one or two telescopes in the future, you had better figure out something else to do with your time, because you are not going to get much time on these telescopes. So it turns out that one feature of technology is that it allows you to build these huge telescopes, but it also provides the ability to create large datasets. You can sift through these for rare events or objects, or possibly add the information of a billion objects together to get the signal and information you want, rather than observing a few objects very well.

This is not to say that these big instruments do not have a place. They do, but you need to have other ways of looking at information to help make sure you utilise those big instruments well, because you are only going to get one of those big instruments every 20 or 30 years.

So, to succeed at using these big datasets, there are a few key factors. You have to still have well-chosen scientific goals. Just grabbing huge datasets does not by itself allow you to learn anything. You have to have an idea how you are going to use them. You have to have a carefully chosen experimental plan. You have to design the experiment that is going to get these big datasets. And you need to continue to work with and beyond the cutting-edge technology of the day, both hardware and software. If it is easy, someone else will have already done it, so you need to use the software and hardware, and develop it yourself, to compete in this area so that you are at the forefront.

I am going to provide you a worked example of observing and doing a large optical survey of the southern sky.

From the experimental side you might ask why we would want to do this. One of the things I said we wanted to do was to observe and understand the first generation of stars and galaxies. Well, one way is to look and make these huge telescopes to look back. Another way is to go and find the brightest objects in the early universe, of which there are only a few, which we call quasars, and study these. It may not seem quite as good as going and using your billion-dollar instrument, but at least it gives you the idea. The challenge is this: over the entire hemisphere there are probably 10 useful objects, and so you have to look at the 10 billion stars or objects in the southern sky, to pick out the 10 interesting ones.

Secondly, another thing you want to do is directly detect and study extrasolar planets. I think most of us can imagine that Venus occasionally will traverse in front of the sun. This is what it did in 1882. It is probably not a well-known fact that Captain Cook was actually being paid to do astronomy when he 'discovered' Australia: he had been sent to look at a previous transit of Venus in the late 1700s. It turns out that these eclipses happen about once every hundred years. There is another one going to be visible next year from Australia, and so you should go out and get yourself a thing of mylar and look at it. But the whole idea of an eclipse is that when a planet goes in front of a star, the star becomes a little fainter, and so you can go out and look at these things. But I have just told you Venus only does it once every hundred years, so you need to look at a lot of stars. It requires monitoring millions of stars, very accurately, on the timescale of hours, to see eclipses. If you do this, it is possible that you can directly detect these objects (by their shadow in this case, but you do get to see bits and pieces of the atmosphere).
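To give a feel for how small "a little fainter" is, here is a rough sketch. It assumes the simplest geometric picture, in which the planet blocks a fraction of the star's light equal to the ratio of the two disc areas. The radii are standard values for the Sun, Jupiter and Venus; the light curve and the `find_dips` helper are made up for illustration and are not the survey's actual method.

```python
from statistics import median

R_SUN_KM = 695_700.0
R_JUPITER_KM = 71_492.0
R_VENUS_KM = 6_051.8


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fraction of starlight blocked: ratio of the planet's disc area to the star's."""
    return (planet_radius_km / star_radius_km) ** 2


def find_dips(fluxes: list[float], min_depth: float) -> list[int]:
    """Indices where brightness falls below the median by at least min_depth (fractional)."""
    baseline = median(fluxes)
    return [i for i, f in enumerate(fluxes) if (baseline - f) / baseline >= min_depth]


# A Jupiter-sized planet dims a Sun-like star by about 1 per cent;
# a Venus-sized one, seen from far away, by less than 0.01 per cent.
print(f"Jupiter-sized: {transit_depth(R_JUPITER_KM, R_SUN_KM):.4%}")
print(f"Venus-sized:   {transit_depth(R_VENUS_KM, R_SUN_KM):.4%}")

# Hypothetical brightness measurements of one star, a few per hour.
light_curve = [1.000, 1.001, 0.999, 0.990, 0.989, 0.990, 1.000, 1.001]
print("dip at samples:", find_dips(light_curve, min_depth=0.005))
```

This is why the monitoring has to be very accurate: the signal is a dip of a per cent or less, and a real search would also have to separate it from noise, clouds and variable stars.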

Thirdly, what is the dark energy? Well, it turns out (and for lack of time I am not going to explain this) that if you look at literally tens of millions of galaxies, and how they affect the cosmic microwave background, you can learn things about the dark energy and actually discern between two popular models for what it is.

Fourthly, what is dark matter? There is a lot of dark matter in our own galaxy, and by getting a nice sample over the entire sky of tracer particles, which are distant stars (and the only way to find those is to again sift through those 10 billion objects in the sky and find the interesting 10,000 or so), you can figure out what the shape of the dark matter is, how far it extends, and hopefully get some understanding of exactly how it behaves.

So we were using the Great Melbourne Telescope. We automated this telescope and were about to start a five-year, five-colour, 24-terabyte image of the entire southern sky at 3 epochs.

Figure 6
Figure 7

Just to give you an idea of how telescopes work: typically we go and we work in this sort of Star Trek-like console (figure 6), we like to have lots of blinking lights, we press lots of buttons, we stay up late at night, and we control our instrument with another panel like this (figure 7). There is lots of button pressing, and you have to be very alert to not make any mistakes.

So you need to automate things, and in our case it involves installing a weather monitoring system, installing computer-controllable switches on all systems, moving away from things like, for example, liquid nitrogen dewars that you have to fill up every 12 hours, having control software to monitor the weather and control the telescope, and having a scheduler that decides what you are going to observe when. And finally having quality monitoring software that goes through and makes sure that the data that the telescope is taking is reasonable. Finally, if you get greedy, you want to put on a state-of-the-art imaging system that allows you to take the best data around. These things cost money, but are not prohibitively expensive.
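As a rough illustration of how the weather monitor and the scheduler fit together, here is a toy sketch. Everything in it is hypothetical: the `Field` record, the priority scale and the weather thresholds are invented for the example and are not the software or limits used on the telescope.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One patch of sky the survey wants to observe (hypothetical record)."""
    name: str
    priority: int       # higher means more urgent; the scale is made up
    observable: bool    # e.g. high enough in the sky right now
    done: bool = False


def weather_ok(wind_kmh: float, humidity_pct: float, cloudy: bool) -> bool:
    """Placeholder limits, not real observatory thresholds."""
    return wind_kmh < 60.0 and humidity_pct < 90.0 and not cloudy


def next_field(fields, wind_kmh, humidity_pct, cloudy):
    """Pick the most urgent field that can be observed now, or None to stay closed."""
    if not weather_ok(wind_kmh, humidity_pct, cloudy):
        return None
    candidates = [f for f in fields if f.observable and not f.done]
    return max(candidates, key=lambda f: f.priority, default=None)


fields = [
    Field("field-A", priority=2, observable=True),
    Field("field-B", priority=5, observable=False),
    Field("field-C", priority=3, observable=True),
]
choice = next_field(fields, wind_kmh=20.0, humidity_pct=55.0, cloudy=False)
print("observe:", choice.name if choice else "nothing, dome stays shut")
```

The point of the design is that no human presses a button: the weather check gates everything, and the scheduler only ever chooses among fields that are both unfinished and currently observable.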

Figure 8

Well, the best plans are laid to waste. We were going to start this. But this is the 50-inch telescope now, after 18 January (figure 8). So you have to plan, but you have plans for a new southern sky survey, possibly.

Figure 9

You get to design from scratch, so you have a huge field of view (in this case we are going to be going for a 7 square degree field of view, or 50 times the full moon), which means 300 million pixels of information, which we want to gather every 70 seconds. So you can do the entire sky in 10 nights, and over two years you can map the entire sky in six colours, with information at approximately 10 different times. And the idea would be to have this available to the entire community. So you can design a new telescope that does this (figure 9).
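Those numbers can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes that "the entire sky" means the southern hemisphere, that fields do not overlap, and that there is no time lost between exposures; none of these assumptions is stated in the talk.

```python
import math

FIELD_OF_VIEW_SQ_DEG = 7.0
SECONDS_PER_IMAGE = 70.0
PIXELS_PER_IMAGE = 300e6
NIGHTS_PER_PASS = 10

# The whole sky is 4*pi steradians; convert to square degrees.
full_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2
southern_sky_sq_deg = full_sky_sq_deg / 2          # assumption: one hemisphere

pointings = math.ceil(southern_sky_sq_deg / FIELD_OF_VIEW_SQ_DEG)
hours_per_pass = pointings * SECONDS_PER_IMAGE / 3600
hours_per_night = hours_per_pass / NIGHTS_PER_PASS
pixels_per_second = PIXELS_PER_IMAGE / SECONDS_PER_IMAGE

print(f"southern sky: {southern_sky_sq_deg:,.0f} square degrees")
print(f"pointings for one pass: {pointings}")
print(f"observing time for one pass: {hours_per_pass:.1f} hours")
print(f"spread over {NIGHTS_PER_PASS} nights: {hours_per_night:.1f} hours per night")
print(f"data rate: {pixels_per_second / 1e6:.1f} million pixels per second")
```

Under these assumptions one pass needs a little under 3000 pointings and just under 60 hours of exposure, which fits comfortably into 10 nights and leaves room for overlaps, overheads and some bad weather.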

Figure 10
Figure 11

Just to give you an idea of the information we are talking about, we have 115 terabytes of information produced in two years. It sounds insane, but well, you can compress the information. The difference between these two images (figures 10 and 11) is this: it turns out that I can compress that data by a factor of 10 and only increase the noise by about 5 per cent, which turns out to be negligible as far as we are concerned. So you can throw away a lot of information very quickly.
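One generic way to trade a little noise for a lot of compressibility is to store pixel values on a coarser grid. The sketch below illustrates that idea only; the talk does not say which compression method was used. It assumes Gaussian pixel noise and the textbook result that rounding to a step of size q adds noise of variance q²/12, and it uses simulated pixels, not survey data.

```python
import math
import random


def quantise(value: float, step: float) -> float:
    """Round a pixel value to the nearest multiple of step."""
    return round(value / step) * step


def step_for_noise_increase(sigma: float, fractional_increase: float) -> float:
    """Step size q such that sqrt(sigma^2 + q^2/12) = (1 + increase) * sigma."""
    return sigma * math.sqrt(12.0 * ((1.0 + fractional_increase) ** 2 - 1.0))


def std(values: list[float]) -> float:
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


rng = random.Random(42)          # fixed seed so the run is repeatable
sigma = 10.0                     # assumed noise per pixel, arbitrary units
pixels = [1000.0 + rng.gauss(0.0, sigma) for _ in range(200_000)]

step = step_for_noise_increase(sigma, 0.05)
coarse = [quantise(p, step) for p in pixels]
ratio = std(coarse) / std(pixels)

print(f"step size: {step:.2f} (about {step / sigma:.2f} times the noise)")
print(f"noise after rounding / noise before: {ratio:.3f}")
```

The step that costs 5 per cent in noise turns out to be slightly larger than the noise itself, so each pixel ends up on one of only a handful of levels, and a standard lossless compressor has far less to encode.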

But if you have 115 terabytes of information, how easy is it to analyse it? With current computers it is not hard. You can store this on 15 terabytes, and it turns out I can go out and for $40,000 buy the discs to do that. Or you can use the dedicated mass storage system.
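The storage arithmetic is simple enough to write down. This sketch uses only the figures quoted in the talk (115 terabytes, a factor of 10 compression, 15 terabytes of disc for $40,000); the cost per terabyte is a derived number.

```python
RAW_TERABYTES = 115.0
COMPRESSION_FACTOR = 10.0
DISC_TERABYTES = 15.0
DISC_COST_DOLLARS = 40_000.0

compressed_terabytes = RAW_TERABYTES / COMPRESSION_FACTOR
headroom_terabytes = DISC_TERABYTES - compressed_terabytes
cost_per_terabyte = DISC_COST_DOLLARS / DISC_TERABYTES

print(f"compressed survey: {compressed_terabytes:.1f} TB")
print(f"headroom on {DISC_TERABYTES:.0f} TB of disc: {headroom_terabytes:.1f} TB")
print(f"cost: ${cost_per_terabyte:,.0f} per terabyte")
```

So the compressed survey fits on the discs with a few terabytes to spare for catalogues and intermediate products.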

You do have to have robust software, which we write ourselves, to go through and figure out the interesting information in these images, and you don't necessarily invent it all yourself. You beg, borrow and steal it from many, many sources. In astronomy we are lucky, because nothing is proprietary. You just go out, you grab it, you get the source code, you modify it as necessary.

The current pipeline takes approximately one hour to complete on one of these images, but fortunately in astronomy we have night-time and bad weather. So on a long winter's night you would take approximately 500 images, and it turns out that what you require is about 22 computers, which is completely affordable now. And the good news is that we have Moore's Law, so in three years' time you only need eight of the newly-released 8 GHz Intel Tritium processors. So it is not a difficult problem. It is a huge amount of data, but it is not hard. Computers are on your side.
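Here is a sketch of the sizing arithmetic. The talk does not say what turnaround time or doubling time was assumed, so both are guesses: a 24-hour turnaround (finish one night's data before the next night's arrives) and a two-year doubling time for computer speed.

```python
import math


def machines_needed(images_per_night: int, hours_per_image: float,
                    turnaround_hours: float) -> int:
    """Computers required to process one night's images within the turnaround time."""
    return math.ceil(images_per_night * hours_per_image / turnaround_hours)


def machines_after(machines_now: int, years: float, doubling_years: float) -> int:
    """Computers required later, if speed doubles every doubling_years."""
    speedup = 2 ** (years / doubling_years)
    return math.ceil(machines_now / speedup)


# Assumed 24-hour turnaround: gives 21, close to the "about 22" quoted in the talk.
today = machines_needed(images_per_night=500, hours_per_image=1.0, turnaround_hours=24.0)

# Starting from the talk's 22 machines, with an assumed two-year doubling time.
in_three_years = machines_after(machines_now=22, years=3.0, doubling_years=2.0)

print(f"machines needed today: {today}")
print(f"machines needed in three years: {in_three_years}")
```

With these assumptions the estimate lands on eight machines in three years, matching the figure in the talk; a faster doubling time of 18 months would bring it down to six.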

Because I am out of time I am going to wind up here. I will just say that yes, there are problems. You have to get funding. The system I am talking about building is small potatoes compared with a big telescope, but it is still big bickies here in Australia. It is roughly a $15 million experiment. The previous incarnation, before it burnt down, was a $500,000 experiment. It is good to take your existing equipment and try to update it, rather than have your old equipment burn down and try to build from scratch.

The automated systems and instruments are non-trivial to implement. It takes a lot of time and effort to make them trouble-free. And the software has to be good. Computers can handle the 115 terabytes a survey like this will deal with, but humans cannot. So you must have pipelines which do almost all the work for you and do not need to have even any button pressed along the way, or it just completely overwhelms you.
