HIGH FLYERS THINK TANK

Sponsored by: Land & Water Australia (LWA)

Innovative technical solutions for water management in Australia

University of Adelaide, 30 October 2006

Water management informatics
by Dr Bronwyn Harch, CSIRO Mathematical and Information Sciences, Brisbane

Alex Zelinsky has given you a broader view of what the Water Resources Observation Network (WRON) is about, and I am going to chisel down into some specific aspects of the informatics needs in relation to WRON: the innovation needed for decision-support science, which I will call water resource informatics. The focus is on the quantitative sciences of mathematics, statistics and information engineering – the enabling technologies around water resource management.

Slide 1

The journey I will take you through in this talk is establishing the need for these kinds of integration and enabling technologies: what can we actually provide? Also, I am putting forward four different filters for you to focus on as we move from novel application to knowledge gaps. So I will be focusing on monitoring design; information and data acquisition; methodologies for information or data analysis; and methodologies for knowledge deployment. I will pose the novel application aspects and where the knowledge gaps are in those four areas, from my point of view, and then finish with a vision for water resource informatics.

Slide 2

'Establishing the need' is about recognising that there must be innovation in water management in Australia, and that water policy and management decisions need an understanding of the complex natural system that water sits within.

In some areas, water policy is ahead of the science required to enact those policy frameworks. You often hear people say, 'Great policy; shame about the data available.' Policy demands are also ahead of the measurement, prediction and reporting capability.

The real challenge arises because the complexity of the system, the operating rules and the reporting needs are exceeding the human capability to integrate the information. That is what establishes the need for informatics.

Slide 3

Water resource informatics is about providing knowledge for water resource managers. From my perspective, the real challenge is to enable risk-informed decisions. As I move into thinking about risk-informed decisions, there are three main areas I want us to think about.

We are often faced with the catch-cry, 'There's not enough information.' Either there is poor spatial coverage, or particular data sets that people want to get to are inaccessible or are seen as idiosyncratic.

On the other hand, people say, 'Well, there's just too much information. I don't know what to do with it. There's too much information in space and time, and there are a lot of different attributes that I have to deal with.' There are also the interactions, those complex dependencies, between space and time, that change with different attributes as well. How do you try and make your way through that space? Add into that the challenge of near real-time data acquisition as well: how do you deal with processing information that is coming too fast for you? Do you actually need to go to real-time data acquisition for particular aspects of water resource management?

The thing that is really close to my heart as a statistician is the confidence in the information enabling that risk-informed decision – that it is not just an answer; there is a bound on particular predictions and things that you are putting forward. People need to have confidence in relation to the quality of information that they are using and in relation to having measures of uncertainty for forecast scenarios.

Slide 4

These are the four filters that I will be talking about. I will just give you a broad view of these now, and then chisel down into them a little bit more as I work through them.

In relation to monitoring design: the main focus here is the optimal allocation of monitoring resources. As a bottom line, the reliability of inference depends on the representativeness of the samples that are actually taken. In this case it may not necessarily be a biophysical sample; it could be how you are collating socioeconomic information as well – that is, what is the frame in which you are making your decisions?

We need to think about the multiple objectives of the monitoring undertaken, the actual monitoring technologies that you have – whether they are surveys, sitting with catchment management groups, or the actual kinds of sensors that Alex Zelinsky showed us – and then thinking about where in the landscape you are actually going to do that monitoring.

Moving through to information or data acquisition and all the different types of information that can be collected there: we know it is not just about quality of water, it is about quantity of water. It is about images, it is about sensors, it is about sensing people's views as well and the time frames that interact with the way that information is collected.

Then moving through to methodologies for information analysis: here I have been bold in putting forward two general ways of thinking about analysis of information. One is around process models, or deterministic models, and the other looks at stochastic variation, or statistical models. People often use the terminology of 'data mining' in relation to a lot of the statistical things that are done.

Finally, one of the most important filters is methodologies for knowledge deployment, particularly focusing on the visualisation aspects of how people can gain that knowledge out of the analyses that have been undertaken.

Importantly, we need to remember the adaptive loop in relation to all of these. They aren't necessarily in sequential order – it is often a very non-linear process – but the point is to remember that there is an adaptive process going on as you move through these different aspects.

Slide 5

The main challenge for water resource management is the balance of how many dollars I have got to expend as against the level or detail of knowledge, understanding and action that is actually required. The thing that helps you balance that out, often, is to consider what you are measuring, where you measure it and how often you measure it.

Slide 6

As we move through to monitoring design, I will talk about novel application issues and then the knowledge gaps, and provide some examples or aspirational examples of those as I move through.

We are very good at monitoring design for specific objectives, or a number of objectives. We have moved away from the more familiar methods – convenience-based sampling, representative sampling and model-based sampling – because of the biases in those approaches, towards probability-based designs.

Here is an example of a probability-based design in a stream network, where we have used information about the catchment to decide how to weight where we actually put sampling effort.

In relation to knowledge gaps: there is a need to move probability-based methods from that more catchment-based system to national and state level monitoring, where you can actually put down a frame nationally and then use that at other scales, so there is that nested grid effect. The important factor here is being able to get robust inference, and so from a probabilistic point of view you need to have spatial balance in the way that you actually design these monitoring programs.

At the moment, we are helping the Queensland government redesign their stream and estuary assessment program for all of Queensland, and the biggest issue for us is coming up with variables to define the inclusion probability surface for sampling the rivers and streams of Queensland, and using those as weighting factors. There is a lot of knowledge that we have about the river systems: how can you use that to come up with a variable, or a set of variables, to help you decide where in the landscape the sampling needs to be? Simple random sampling is spatially quite uneven; you can get a lot of clumping. There are more sophisticated ways of using variables and information to come up with a probability surface and then decide where the dots go.
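
As an illustration of that idea, here is a minimal Python sketch of a probability-proportional-to-size draw from an inclusion probability surface. The covariate, the site numbers and the one-dimensional systematic draw are all invented for illustration; operational designs use spatially balanced methods such as GRTS (generalised random tessellation stratified sampling) rather than this simplified stand-in.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical candidate stream sites with a single covariate (say,
# upstream disturbance) standing in for the inclusion probability surface.
n_sites, n_sample = 500, 30
disturbance = rng.uniform(0.1, 1.0, n_sites)

# Inclusion probabilities proportional to the covariate, scaled so they
# sum to the target sample size (probability proportional to size).
incl_prob = n_sample * disturbance / disturbance.sum()

# Systematic draw through the cumulative probabilities: one random start,
# then equal steps. This suppresses the clumping that independent random
# draws can produce; designs like GRTS do the spatial analogue of this step.
cum = np.cumsum(incl_prob)
start = rng.uniform(0.0, 1.0)
picks = np.searchsorted(cum, start + np.arange(n_sample))
print("selected site indices:", picks)
```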

The other thing to think about is fixed versus mobile sensors, and how you can actually use adaptation in your monitoring design on the fly as well. That is something that we need to think about.

Slide 7

Just as another example, this shows that you can't get everywhere in one particular time slice. You could design for different time slices, but when you put all the panels together, not only do you maintain spatial balance for each individual time but as they are put together you also maintain the spatial balance there, so you have valid inference across time and space.
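
A rough sketch of that panel idea follows; the coordinates are made up, and the simple coordinate sort is a crude stand-in for the space-filling (for example, reverse hierarchical) orderings that real designs use.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical site coordinates; a real design would order sites with a
# space-filling ordering, not this crude sort.
n_sites, n_panels = 120, 4
coords = rng.uniform(0.0, 100.0, size=(n_sites, 2))
order = np.argsort(coords[:, 0] + coords[:, 1])  # stand-in spatial ordering

# Interpenetrating panels: every n_panels-th site from a balanced ordering
# keeps each time slice's panel spread out in space, and the panels
# together still cover the region, so inference holds within and across
# time slices.
panels = {year: order[year::n_panels] for year in range(n_panels)}
for year, sites in panels.items():
    print(f"year {year}: {len(sites)} sites, first few: {sites[:4]}")
```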

Slide 8

Moving on to information or data acquisition: on the novel application side, we are putting together and acquiring data from different sources, and we are very good at dealing with data that is not streaming data – handling the different time scales, validating data, detecting anomalies and visualising data.

The knowledge gaps relate to the heterogeneity of platforms and information: assimilating and integrating information from, for example, remote sensing, Alex Zelinsky's images of water flows, and the sensors that sit in streams.

And then the other challenge is streaming data. I think there are a lot of aspirations at the moment in relation to actually getting that data and pulling it together. This involves adaptive acquisition and in-sensor actuation as well, where analysis is done in the sensor rather than being sent back to home base, using the correlation in space and time within those sensors to make decisions in relation to sensors in the near vicinity.
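
A toy version of the in-sensor idea, assuming nothing about the real sensor platforms: a node keeps a short window of recent readings and flags values that sit far from the local baseline before deciding what to transmit.

```python
from collections import deque

class StreamAnomalyDetector:
    """Tiny online detector a sensor node could run locally, flagging
    readings far from the recent running mean (a stand-in for the richer
    spatio-temporal models described in the talk)."""

    def __init__(self, window=50, threshold=4.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        flagged = False
        if len(self.buf) >= 10:  # wait for a minimal baseline
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
            std = max(var ** 0.5, 1e-9)
            flagged = abs(x - mean) > self.threshold * std
        if not flagged:          # only clean readings enter the baseline
            self.buf.append(x)
        return flagged

detector = StreamAnomalyDetector()
for reading in [5.1, 5.0, 5.2, 4.9] * 5 + [12.3]:
    if detector.update(reading):
        print("anomaly:", reading)
```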

Slide 9

Methodologies for information analysis: here with process models, or deterministic models, calibration is usually done against observational data. And then with stochastic, or statistical, models we have got some really fancy spatio-temporal trend analyses where the focus is on extracting the trend from the 'noise'.

The particular plot here is of the south-east Queensland ecosystem health monitoring program, particularly the freshwater aspects of that program. We have developed some methodology for doing spatial interpolation in river networks, where what you are incorporating is a very fancy distance metric which enables you to think about correlation both 'as the crow flies' and 'as the fish swims' in river networks.
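
To illustrate the two distance metrics, here is a small invented network where two sites are close 'as the crow flies' but far 'as the fish swims'. Real river-network models are far richer than this sketch.

```python
import math

# A toy river network: two tributaries (A, B) joining at a confluence (C)
# that flows to D. Coordinates and layout are invented.
coords = {"A": (0, 4), "B": (4, 4), "C": (2, 2), "D": (2, 0)}
edges = [("A", "C"), ("B", "C"), ("C", "D")]

def euclid(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x2 - x1, y2 - y1)

# Build an undirected graph with channel lengths on the edges.
graph = {n: {} for n in coords}
for u, v in edges:
    graph[u][v] = graph.setdefault(v, {}).setdefault(u, euclid(u, v))

def network_distance(src, dst):
    """Dijkstra over the channel network: 'as the fish swims'."""
    dist, frontier = {src: 0.0}, [src]
    while frontier:
        u = min(frontier, key=dist.get)
        frontier.remove(u)
        for v, w in graph[u].items():
            if dist[u] + w < dist.get(v, math.inf):
                dist[v] = dist[u] + w
                frontier.append(v)
    return dist[dst]

# A and B are near neighbours in space but far apart along the channels,
# which is why the two metrics imply very different correlation.
print("crow flies A-B:", round(euclid("A", "B"), 2))
print("fish swims A-B:", round(network_distance("A", "B"), 2))
```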

Where are the knowledge gaps? They are in the integration of process models and observational data, combining them, with the additional challenge of doing it dynamically as you have streaming data. That will enable probability-based error propagation, so you have probabilistic envelopes rather than a single estimated trajectory. Here we are using some graphical Bayesian methodology for coming up with projections, so your projections actually have that probability envelope around them.

Then people can make decisions instead of just seeing lines going off as what a forecast could be: is there really any true difference in the forecasts that people are putting forward?
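
As a cheap stand-in for that Bayesian machinery, this sketch propagates parameter and noise uncertainty through a simulated forecast ensemble and reads off a probability envelope. The data and the linear-trend model are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly flow observations with a weak declining trend.
t = np.arange(36)
flow = 100 - 0.4 * t + rng.normal(0, 6, t.size)

# Fit a linear trend and estimate residual noise.
slope, intercept = np.polyfit(t, flow, 1)
resid_sd = np.std(flow - (intercept + slope * t), ddof=2)

# Forecast by simulation: jitter the slope by its standard error and add
# observation noise, then summarise the ensemble as an envelope rather
# than a single estimated trajectory.
horizon = np.arange(36, 48)
ensemble = np.empty((2000, horizon.size))
slope_se = resid_sd / (t.std() * np.sqrt(t.size))
for i in range(2000):
    s = slope + rng.normal(0, slope_se)
    ensemble[i] = intercept + s * horizon + rng.normal(0, resid_sd, horizon.size)

low, med, high = np.percentile(ensemble, [5, 50, 95], axis=0)
print("month 47 forecast: %.1f (90%% envelope %.1f to %.1f)"
      % (med[-1], low[-1], high[-1]))
```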

I think about this as the age of moving into data-driven modelling, in some areas. We will have to get very good at computationally efficient algorithms, mixed sensor systems, and dealing with massively multivariate prediction and mixed data types – so, sitting within terabyte-science kinds of issues.

Slide 10

In relation to novel application, I think we are pretty good at integration and knowledge deployment at local and regional scales. There is a lot of expertise that people have in interoperable and modular tools for looking at models. Alex Zelinsky put up that 'jigsaw puzzle', showing that there are all those tools and models from different aspects that can be put together. In relation to data display, we are really good at static figures and deployment of a known set of reporting requirements. This slide shows some spatio-temporal modelling work, where you can look at different temporal aspects and then look at the interaction between space and time to come up with some inference.

In relation to knowledge gaps, it is having nationally accessible integration of these systems – and that carries across to the interoperable and modular tools as well – and being able to deal with multiple scales: not just the local or regional scale but through to the national scale. Something that is going to be very important is access and privacy provisions around a lot of the information that is put out there, so that you can't identify my Dad as a farmer in the Lockyer Valley and what he is doing with his pumping. Another challenge is data display: having interactive figures and navigation interfaces, dynamic visualisation and 'personal' customisation – some stakeholders will have specific reporting requirements, but there is also an element of discovery that people will be looking for, so you need that ability for personal customisation – along with searching and alerting services, and reusable web services.

Slide 11

There are a number of knowledge deployment tools that could be used. For instance, using Google Earth, individuals could locate their dams and then request information on their particular dam levels.
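
A minimal sketch of how such a deployment could work: writing dam locations and levels into a KML file that Google Earth can display. The dam names, coordinates and levels here are invented.

```python
# Generate a KML file with one placemark per dam; the storage level
# appears in the placemark's description balloon in Google Earth.
dams = [
    ("Example Dam", 149.10, -35.31, 43.2),   # (name, lon, lat, % full)
    ("Another Dam", 148.95, -35.45, 61.7),
]

placemarks = "\n".join(
    f"""  <Placemark>
    <name>{name}</name>
    <description>Storage level: {level}% of capacity</description>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>"""
    for name, lon, lat, level in dams
)

kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
 <Document>
{placemarks}
 </Document>
</kml>"""

with open("dam_levels.kml", "w") as f:
    f.write(kml)
```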

Slide 12

Or they can look at it in terms of this gauge. A water resource manager can have it on their desktop and go in and see how the ACT region is going in relation to the water that is in the dams.

Slide 13

This slide epitomises a vision for us in relation to what can happen with water resource informatics. Let's use this example of seasonal forecasting of water allocation, looking up to about six months ahead.

We start by bringing in information on the Southern Oscillation Index and sea surface temperature. We bring flow information into the modelling structures in relation to WRON. We can get forecast models out of that, which can then be used with other modelling frameworks. There is water authority metering that can come into it as well. And then my Dad, at his farm, can go in and see, on a time scale that is relevant for him, what water is available for him to suck out of the river that runs past his place, doing it at the right time and with the right amount.
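
A toy sketch of the first link in that chain – a lagged regression of flow on the Southern Oscillation Index – with all data simulated; real seasonal forecasting models are of course much richer than a single lagged predictor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical history: monthly SOI and streamflow, with flow loosely
# following the SOI three months later (the climate-driver idea above).
n = 240
soi = rng.normal(0, 10, n)
flow = 50 + 1.2 * np.roll(soi, 3) + rng.normal(0, 8, n)
soi_lag, flow_now = soi[:-3], flow[3:]   # align SOI with flow 3 months on

# Fit the lagged relationship and issue a ~3-month-ahead forecast.
b, a = np.polyfit(soi_lag, flow_now, 1)
current_soi = soi[-1]
print("forecast flow in 3 months: %.1f" % (a + b * current_soi))
```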

The interesting thing about this is that we can set this up with the novel application stuff that we have got now, but the way that you set up the modelling frameworks and reporting needs to be generic enough that, as we come up with faster algorithms and better ways to put the models together, the infrastructure doesn't change but the components within it do. And that is the real challenge for a lot of this work: having those generic frameworks that we can put the updated componentry into.

Slide 14

I will finish now with this vision statement: delivering risk-informed, dynamic, timely reporting and forecasting of Australia's water resources.