close
The Wayback Machine - https://web.archive.org/web/20111231041123/http://scienceofdoom.com:80/
Feeds:
Posts
Comments

In the last article we had a look at the depth of the “mixed ocean layer” (MLD) and its implications for the successful measurement of climate sensitivity (assuming such a parameter exists as a constant).

In Part One I created a Matlab model which reproduced the same problems as Spencer & Braswell (2008) had found. This model had one layer  (an “ocean slab” model) to represent the MLD with a “noise” flux into the deeper ocean (and a radiative noise flux at top of atmosphere). Murphy & Forster claimed that longer time periods require an MLD of increased depth to “model” the extra heat flow into the deeper ocean over time:

Because heat slowly penetrates deeper into the ocean, an appropriate depth for heat capacity depends on the length of the period over which Eq. (1) is being applied (Watterson 2000; Held et al. 2010). For 80-yr global climate model runs, Gregory (2000) derived an optimum mixed layer depth of 150 m. Watterson (2000) found an initial global heat capacity equivalent to a mixed layer of 200 m and larger values for longer simulations.

This seems like it might make sense – if we wanted to keep a “zero dimensional model”. But it’s questionable whether the model retains any value with this “fudge”. So because heat actually moves from the mixed layer into the deeper ocean (rather than the mixed layer increasing in depth) I instead enhanced the model to create a heat flux from the MLD through a number of ocean layers with a parameter called the vertical eddy diffusivity to determine this heat flux.

So the model is now a 1D model with a parameterized approach to ocean convection.

Eddy Diffusivity

The concept here is the analogy of conductivity but when convection is instead the primary mover of heat.

Heat flow by conduction is governed by a material property called conductivity and by the temperature difference. Changes in temperature are governed by heat flow and by the heat capacity. The result is this equation for reference and interest – so don’t worry if you don’t understand it:

∂T / ∂tα∂²T / ∂z²  - the 1-d version (see note 1)

where T = temperature, t = time, α = thermal diffusivity and z = depth

What it says in almost plain English is that the change in temperature with respect to time is equal to the thermal diffusivity times the change in gradient of temperature with depth. Don’t worry if that’s not clear (and there is a explanation of the simple steps required to calculate this in note 1).

Now the thermal diffusivity, α:

α = k/cpρ, where k = conductivity, cp = heat capacity and ρ = density

So, an important bit to understand..

  • if the conductivity is high and the heat capacity is low then temperature can change quickly
  • if the conductivity is high and the heat capacity is high then it slows down temperature change, and
  • if the conductivity is low and the heat capacity is high then temperature takes much longer to change

Many researchers have attempted to measure an average value for eddy diffusivity in the ocean (and in lakes). The concept here, an explained in Part Two, is that turbulent motions of the ocean move heat much more effectively than conduction. The value can’t be calculated from first principles because that would mean solving the problem of turbulence, which is one of the toughest problems in physics. Instead it has to be estimated from measurements.

There is an inherent problem with eddy diffusivity for vertical heat transfer that we will come back to shortly.

There is also a minor problem of notation that is “solved” here by changing the notation. Usually conductivity is written as “k”. However, most papers on eddy diffusivity write diffusivity as “k”, sometimes “K”, sometimes “κ” (Greek ‘kappa’) – creating potential confusion so I revert back to “α”. And to make it clear that it is the convective value rather than the conductive value, I use αeddy. And for the equivalent parameter to conductivity, keddy.

keddy = αeddycpρ

because cp= 4200 J/K.kg and ρ ≈ 1000 kg/m³:

keddy =4.2 x 106  αeddy – it’s useful to be able to see what the diffusivity means in terms of an equivalent “conductivity” type parameter

Measurements of Eddy Diffusivity

Oeschger et al (1975):

α is an apparent global eddy diffusion coefficient which helps to reproduce an average transport phenomenon consisting of a series of distinct and overlapping mechanisms.

Oeschger and his co-workers studied the problem via the diffusion into the ocean of 14C from nuclear weapons testing.

The range they calculated for αeddy = 1 x 10-4 – 1.8 x 10-4 m²/s.

This equates to keddy = 420 – 760 W/m.K, and by comparison, the conductivity of still water, k = 0.6 W/m.K – making convection around 1,000 times more effective at moving heat vertically through the ocean.

Broecker et al (1980) took a similar approach to estimating this value and commented:

We do not mean to imply that the process of vertical eddy mixing actually occurs within the body of the main oceanic thermocline. Indeed, the values we require are an order of magnitude greater than those permitted by conventional oceanographic wisdom (see Garrett, 1979, for summary).

The vertical eddy coefficients used here should rather be thought of as parameters that take into account all the processes that transfer tracers across density horizons. In addition to vertical mixing by eddies, these include mixing induced by sediment friction at the ocean margins and mixing along the surface in the regions where density horizons outcrop.

Their calculation, like Oeschger’s, used a simple model with the observed values plugged in to estimate the parameter:

Anyone familiar with the water mass structure and ventilation dynamics of the ocean will quickly realize that the box-diffusion model is by no means a realistic representation. No simple modification to the model will substantially improve the situation.

To do so we must take a giant step in complexity to a new generation of models that attempt to account for the actual geometry of ventilation of the sea. We are as yet not in a position to do this in a serious way. At least a decade will pass before a realistic ocean model can be developed.

The values they calculated for eddy diffusivity were broken up into different regions:

  • αeddy(equatorial) = 3.5 x 10-5 m²/s
  • αeddy(temperate) = 2.0 x 10-4 m²/s
  • αeddy(polar) = 3.0 x 10-4 m²/s

We will use these values from Broecker to see what happens to the measurement problems of climate sensitivity when used in my simple model.

These two papers were cited by Hansen et al in their 1985 paper with the values for vertical eddy diffusivity used to develop the value of the “effective mixed depth” of the ocean.

In reviewing these papers and searching for more recent work in the field, I tapped into a rich vein of research that will be the subject of another day.

First, Ledwell et al (1998) who measured eddy diffusivity via SF6 that they injected into the ocean:

The diapycnal eddy diffusivity K estimated for the first 6 months was 0.12 ± 0.02 x10-4 m²/s, while for the subsequent 24 months it was 0.17 ± 0.02 x10-4 m²/s.

[Note: units changed from cm²/s into m²/s for consistency]

It is worth reading their comment on this aspect of ocean dynamics. (Note that isopycnal = contact density surfaces and diapycnal = across isopycnal):

The circulation of the ocean is severely constrained by density stratification. A water parcel cannot move from one surface of constant potential density to another without changing its salinity or its potential temperature. There are virtually no sources of heat outside the sunlit zone and away from the bottom where heat diffuses from the lithosphere, except for the interesting hydrothermal vents in special regions. The sources of salinity changes are similarly confined to the boundaries of the ocean. If water in the interior is to change potential density at all, it must be by mixing across density surfaces (diapycnal mixing) or by stirring and mixing of water of different potential temperature and salinity along isopycnal surfaces (isopycnal mixing).

Most inferences of dispersion parameters have been made from observations of the large-scale fields or from measurements of dissipation rates at very small scales. Unambiguously direct measurements of the mixing have been rare. Because of the stratification of the ocean, isopycnal mixing involves very different processes than diapycnal mixing, extending to much greater length scales. A direct approach to the study of both isopycnal and diapycnal mixing is to release a tracer and measure its subsequent dispersal. Such an experiment, lasting 30 months and involving more than 105 km² of ocean, is the subject of this paper.

From Jayne (2009):

For example, the Community Climate Simulation Model (CCSM) ocean component model uses a form similar to Eq. (1), but with an upper-ocean value of 0.1 x 10-4 m²/s and a deep-ocean value of 1.0 x 10-4 m²/s, with the transition depth at 1000 m.

However, there is no observational evidence to suggest that the mixing in the ocean is horizontally uniform, and indeed there is significant evidence that it is heterogeneous with spatial variations of several orders of magnitude in its intensity (Polzin et al. 1997; Ganachaud 2003).

More on eddy diffusivity measurements in another article – the parameter has a significant impact on modeling of the ocean in GCMs and there is a lot of current research into this subject.

Eddy Diffusivity and Buoyancy Gradient

Sarmiento et al (1976) measured isotopes near the ocean floor:

Two naturally occurring isotopes can be applied to the determination of the rate of vertical turbulent mixing in the deep sea: 222Rn (half-life 3.824 days) and 228Ra (half-life 5.75 years). In this paper we discuss the results from fourteen 222Rn and two 228Ra profiles obtained as part of the GEOSECS program.

From these results we conclude that the most important factor influencing the vertical eddy diffusivity is the buoyancy gradient [(g/p)(∂ρpot/∂z)]. The vertical diffusivity shows an inverse proportionality to the buoyancy gradient.

Their paper is very much about the measurements and calculations of the deeper ocean, but is relevant for anywhere in the ocean, and helps explain why the different values for different regions were obtained by Broecker that we saw earlier. (Prof. Wallace S. Broecker was a co-author on this paper as well, and has authored/co-authored 100′s of papers on the ocean).

What is the buoyancy gradient and why does it matter?

Cold fluids sink and hot fluids rise. This is because cold substances contract and so are more dense. So in general, in the ocean, the colder water is below and the warmer water above. Probably everyone knows this.

The buoyancy gradient is a measure of how strong this effect is. The change in density with depth determines how resistant the ocean is to being overturned. If the ocean was totally stable no heat would ever penetrate below the mixed layer. But it does. And if the ocean was totally stable then the measurements of 14C from nuclear testing would be zero below the mixed layer.

But it is not surprising that the more stable the ocean is due to the buoyancy gradient the less heat diffuses down by turbulent motion.

And this is why the estimates by Broecker shown earlier have a much lower value of diffusivity for the tropics than for the poles. In general the poles are where deep convection takes place – lots of cold water sinks, mixing the ocean – and the tropics are where much weaker upwelling takes place – because the ocean surface is strongly heated. This is part of the large scale motion of the ocean, known as the thermohaline circulation. More on this another day.

Now water is largely incompressible which means that the density gradient is only determined by temperature and salinity. This creates the problem that eddy diffusivity is a value which is not only parameterized, but also dependent on the vertical temperature difference in the ocean.

Heat flow also depends on temperature difference, but with the opposite relationship. This is not something to untangle today. Today we will just see what happens to our simple model when we use the best estimates of vertical eddy diffusivity.

Modeling, Non-Linearity and Climate Sensitivity Measurement Problems

Murphy & Forster agreed in part with Spencer & Braswell about the variation in radiative noise from CERES measurements. I quote at length, because the Murphy & Forster paper is not freely available:

For the parameter N, SB08 use a random daily shortwave flux scaled so that the standard deviation of monthly averages of outgoing radiation (N – λT) is 1.3 W/m².

They base this on the standard deviation of CERES shortwave data between March 2000 and December 2005 for the oceans between 20 °Nand 20 °S.

We have analyzed the same dataset and find that, after the seasonal cycle and slow changes in forcing are removed, the standard deviation of monthly means of the shortwave radiation is 1.24 W/m², close to the 1.3 W/m² specified by SB08. However, longwave (infrared) radiation changes the energy budget just as effectively from the earth as shortwave radiation (reflected sunlight). Cloud systems that might induce random fluctuations in reflected sunlight also change outgoing longwave radiation. In addition, the feedback parameter λ is due to both longwave and shortwave radiation.

Modeled total outgoing radiation should therefore be compared with the observed sum of longwave and shortwave outgoing radiation, not just the shortwave component. The standard deviation of the sum of longwave and shortwave radiation in the same CERES dataset is 0.94 W/m². Even this is an upper limit, since imperfect spatial sampling and instrument noise contribute to the standard deviation.

[Note I change their α (climate feedback) to λ for consistency with previous articles].

And they continue:

We therefore use 0.94 W/m² as an upper limit to the standard deviation of outgoing radiation over the tropical oceans. For comparison, the standard deviation of the global CERES outgoing radiation is about 0.55 W/m².

All of these points seem valid (however, I am still in the process of examining CERES data, and can’t comment on their actual values of standard deviation. Apart from the minor challenge of extracting the data from the netCDF format there is a lot to examine. A lot of data and a lot of issues surrounding data quality).

However, it raised an interesting idea about non-linearity. Readers who remember on Part One will know that as radiative noise increases and ocean MLD decreases the measurement problem gets worse. And as the radiative noise decreases and ocean MLD increases the measurement problem goes away.

If we average global radiative noise and global MLD, plug these values into a zero-dimensional model and get minimal measurement problem what does this mean?

Due to non-linearity, it tells us nothing.

Averaging the inputs, applying them to a global model (i.e., a zero-dimensional model) and calculating λest (from the regression) gets very different results from applying the inputs separately to each region, averaging the results and calculating λest

I tested this with a simple model – created two regions, one 10% of the surface area, the other 90%. In the larger region the MLD was 200m and the radiative noise was zero; and in the smaller region the MLD was 20m and the (standard deviation of) radiative noise was varied from 0 – 2. The temperature and radiative flux were converted into an area weighted time series and the regression produced large deviations from the real value of λ.

A similar run on a global model with an MLD of 180m and radiative noise of 0-0.2 shows an accurate assessment of λ.

This is to be expected of course.

So with this in mind I tested the new 1D model with different values of ocean depth eddy diffusivity,  radiative noise, and an AR(1) model for the radiative noise. I used values for the tropical region as this is clearly the area most likely to upset the measurement – shallow MLD, higher radiative noise and weaker eddy diffusivity.

As best as I could determine from de Boyer Montegut’s paper, the average MLD for the 20°N – 20°S region is approximately 30m.

Here are the results using Oeschger’s value of eddy diffusivity for the tropics and the tropical value of radiative noise from MF2010 – varying ocean depth around 30m and the value of the AR(1) model for radiative noise:

BERJAYA

Figure 1

For reference, as it’s hard to read off the graph, the value at 30m and φ=0.5 is λest = 2.3.

Using the current CCSM value of eddy diffusivity for the upper ocean:

BERJAYA

Figure 2

For reference,  the value at 30m and φ=0.5 is λest = 0.2. (Compared with the real value of 3.0)

Note that these values are only for one region, not for the whole globe.

Another important point is that I have used the radiative noise value as the standard deviation of daily radiative noise. I have started to dig into CERES data to see whether such a value can be calculated, and also what typical value of autoregressive parameter should be used (and what kind of ARMA model), but this might take some time.

Yet smaller values of eddy diffusivity are possible for smaller regions, according to Jochum (2009). This would likely cause the problems of estimating climate sensitivity to become worse.

Simple Models

Murphy & Forster comment:

Although highly simplified, a single box model of the earth has some pedagogic value. One must remember that the heat capacity c and feedback parameter λ are not really constants, since heat penetrates more deeply into the ocean on long time scales and there are fast and slow climate feedbacks (Knutti et al. 2008).

It is tempting to add a few more boxes to account for land, ocean, different latitudes, and so forth. Adding more boxes to an energy balancemodel can be problematic because one must ensure that the boxes are connected in a physically consistent way. A good option is to instead consider a global climate model that has many boxes connected in a physically consistent manner.

The point being that no one believes a slab model of the ocean to be a model that gives really useful results. Spencer & Braswell likewise don’t believe that the slab model is in any way an accurate model of the climate.

They used such a model just to demonstrate a possible problem. Murphy & Forster’s criticism doesn’t seem to have solved the problem of “can we measure climate sensitivity?

Or at least, it appears easy to show that slightly different enhancements of the simple model demonstrate continued problems in measuring climate sensitivity – due to the impact of radiative noise in the climate system.

Conclusion

I have produced a simple model and apparently demonstrated continued climate sensitivity measurement problems. This is in contrast to Murphy & Forster who took a different approach and found that the problem went away. However, my model has a more realistic approach to moving heat from the mixed layer into the ocean depths than theirs.

My model does have the drawback that the massive army of Science of Doom model testers and quality control champions are all away on their Xmas break. So the model might be incorrectly coded.

It’s also likely that someone else can come along and take a slightly enhanced version of this model and make the problem vanish.

I have used values for MLD and eddy diffusivity that seem to represent real-world values but I have no idea as to the correct values for standard deviation and auto-correlation of daily radiative noise (or appropriate ARMA model). These values have a big impact on the climate sensitivity measurement problem for reasons explained in Part One.

A useful approach to determining the effect of radiative noise on climate sensitivity measurement might be to use a coupled atmosphere-ocean GCM with a known climate sensitivity and an innovative way of removing radiative noise. These kind of experiments are done all the time to isolate one effect or one parameter.

Perhaps someone has already done this specific test?

I see other potential problems in measuring climate sensitivity. Here is one obvious problem – as the temperature of the mixed layer increases with continued increases in radiative forcing the buoyancy gradient increases and the eddy diffusivity reduces. We can calculate radiative forcing due to “greenhouse” gases quite accurately and therefore remove it from the regression analysis (see Spencer & Braswell 2008 for more on this). But we can’t calculate the change in eddy diffusivity and heat loss to the deeper ocean. This adds another “correlated” term that seems impossible to disentangle from the climate sensitivity calculation.

An alternative way of looking at this is that climate sensitivity might not be a constant – as already noted in Part One.

References

Potential Biases in Feedback Diagnosis from Observational Data: A Simple Model Demonstration, Spencer & Braswell, Journal of Climate (2008) – FREE

On the accuracy of deriving climate feedback parameters from correlations between surface temperature and outgoing radiation, Murphy & Forster, Journal of Climate (2010)

A box diffusion model to study the carbon dioxide exchange in nature, Oeschger et al, Tellus (1975)

Modeling the carbon system, Broecker et al, Radiocarbon (1980) – FREE

Climate response times: dependence on climate sensitivity and ocean mixing, Hansen et al, Science (1985)

The study of mixing in the ocean: A brief history, MC Gregg, Oceanography (1991) – FREE

Spatial Variability of Turbulent Mixing in the Abyssal Ocean, Polzin et al, Science (1997) – FREE

The Impact of Abyssal Mixing Parameterizations in an Ocean General Circulation Model, Steven R. Jayne, Journal of Physical Oceanography (2009)

The relationship between vertical eddy diffusion and buoyancy gradient in the deep sea, Sarmiento et al, Earth & Planetary Science Letters (1976)

Mixing of a tracer in the pycnocline, Ledwell et al, JGR (1998)

Impact of latitudinal variations in vertical diffusivity on climate simulations, Jochum, JGR (2009) – FREE

Mixed layer depth over the global ocean: An examination of profile data and a profile-based climatology, de Boyer Montegut et al, JGR (2004)

Notes

Note 1: The 1D version is really:

∂T / ∂t = ∂/∂z (α.∂T/∂z)

due to the fact that α can be a function of z (and definitely is in the case of the ocean).

Although this looks tricky – and it is tricky to find analytical solutions – solving the 1D version numerically is very straightforward and anyone can do it.

In plain English is looks something like:

- Heat flow into cell X = temperature difference between cell X and cell X-1

- Heat flow out of cell X = temperature difference between cell X and cell X+1

- Change in temperature = (Heat flow into cell X – Heat flow out of cell X) x time / heat capacity

Note 2: I am in the process of examining CERES data. Apart from the challenge of extracting the data from the netCDF format there is a lot to examine. A lot of data and a lot of issues surrounding data quality.

In Measuring Climate Sensitivity – Part One we saw that there can be potential problems in attempting to measure the parameter called “climate sensitivity”.

Using a simple model Spencer & Braswell (2008) had demonstrated that even when the value of “climate sensitivity” is constant and known, measurement of it can be obscured for a number of reasons.

The simple model was a “slab model” of the ocean with a top of atmosphere imbalance in radiation.

Murphy & Forster (2010) criticized Spencer & Braswell for a few reasons including the value chosen for the depth of this ocean mixed layer. As the mixed layer depth increases the climate sensitivity measurement problems are greatly reduced.

First, we will consider the mixed layer in the context of that simple model. Then we will consider what it means in real life.

The Simple Model of Climate Sensitivity

The simple model used by Spencer & Braswell has a “mixed ocean layer” of depth 50m.

BERJAYA

Figure 1

In the model the mixed layer is where all of the imbalance in top of atmosphere radiation gets absorbed.

The idea in the simple model is that the energy absorbed from the top of atmosphere gets mixed into the top layer of the ocean very quickly. In reality, as we will see, there isn’t such a thing as one layer but it is a handy approximation.

Murphy & Forster commented:

For the heat capacity parameter c, SB08 use the heat capacity of a 50-m ocean mixed layer. This is too shallow to be realistic.

Because heat slowly penetrates deeper into the ocean, an appropriate depth for heat capacity depends on the length of the period over which Eq. (1) is being applied (Watterson 2000; Held et al. 2010).

For 80-yr global climate model runs, Gregory (2000) derived an optimum mixed layer depth of 150 m. Watterson (2000) found an initial global heat capacity equivalent to a mixed layer of 200 m and larger values for longer simulations.

Held et al. (2010) found an initial time constant τ = c/α of about four yr in the Geophysical Fluid Dynamics Laboratory global climate model. Schwartz (2007) used historical data to estimate a globally averaged mixed layer depth of 150 m, or 106 m if the earth were only ocean.

The idea is an attempt to keep the simplicity of one mixed layer for the model, but increase the depth of this mixed layer for longer time periods.

There is always a point where models – simplified versions of the real world – start to break down. This might be the case here.

The initial model was of a mixed layer of ocean, all at the same temperature because the layer is well-mixed – and with some random movement of heat between this mixed layer and the ocean depths. In a more realistic scenario, more heat flows into the deeper ocean as the length of time increases.

What Murphy & Forster are proposing is to keep the simple model and “account” for the ever increasing heat flow into the deeper ocean by using a depth of the mixed layer that is dependent on the time period.

If we do this perhaps the model will work, perhaps it won’t. By “work” we mean provide results that tell us something useful about the real world.

So I thought I would introduce some more realism (complexity) into the model and see what happened. This involves a bit of a journey.

Real Life Ocean Mixed Layer

Water is a very bad conductor of heat – as are plastic and other insulators. Good conductors of heat include metals.

However, in the ocean and the atmosphere conduction is not the primary heat transfer mechanism. It isn’t even significant. Instead, in the ocean it is convection - the bulk movement of fluids – that moves heat. Think of it like this – if you move a “parcel” of water, the heat in that parcel moves with it.

Let’s take a look at the temperature profile at the top of the ocean. Here the first graph shows temperature:

Soloviev & Lukas (1997)

Soloviev & Lukas (1997)

Figure 2

Note that the successive plots are not at higher and higher temperatures – they are just artificially separated to make the results easier to see. During the afternoon the sun heats the top of the ocean. As a result we get a temperature gradient where the surface is hotter than a few meters down. At night and early morning the temperature gradient disappears. (No temperature gradient means that the water is all at the same temperature)

Why is this?

Once the sun sets the ocean surface cools rapidly via radiation and convection to the atmosphere. The result is colder water, which is heavier. Heavier water sinks, so the ocean gets mixed. This same effect takes place on a larger scale for seasonal changes in temperature.

And the top of the ocean is also well mixed due to being stirred by the wind.

A comment from de Boyer Montegut and his coauthors (2004):

A striking and nearly universal feature of the open ocean is the surface mixed layer within which salinity, temperature, and density are almost vertically uniform. This oceanic mixed layer is the manifestation of the vigorous turbulent mixing processes which are active in the upper ocean.

Here is a summary graphic from the excellent Marshall & Plumb:

BERJAYA

From Marshall & Plumb (2008)

Figure 3

There’s more on this subject in Does Back-Radiation “Heat” the Ocean? – Part Three.

How Deep is the Ocean Mixed Layer?

This is not a simple question. Partly it is a measurement problem, and partly there isn’t a sharp demarcation between the ocean mixed layer and the deeper ocean. Various researchers have made an effort to map it out.

Here is a global overview, again from Marshall & Plumb:

BERJAYA

Figure 4

You can see that the deeper mixed layers occur in the higher latitudes.

Comment from de Boyer Montegut:

The main temporal variabilities of the MLD [mixed layer depth] are directly linked to the many processes occurring in the mixed layer (surface forcing, lateral advection, internal waves, etc), ranging from diurnal [Brainerd and Gregg, 1995] to interannual variability, including seasonal and intraseasonal variability [e.g., Kara et al., 2003a; McCreary et al., 2001]. The spatial variability of the MLD is also very large.

The MLD can be less than 20 m in the summer hemisphere, while reaching more than 500 m in the winter hemisphere in subpolar latitudes [Monterey and Levitus, 1997].

Here is a more complete map by month. Readers probably have many questions about methodology and I recommend reading the free paper:

BERJAYA

From de Boyer Montegut et al (2004)

Figure 5 – Click for a larger image

Seeing this map definitely had me wondering about the challenge of measuring climate sensitivity. Spencer & Braswell had used 50m MLD to identify some climate sensitivity measurement problems. Murphy & Forster had reproduced their results with a much deeper MLD to demonstrate that the problems went away.

But what happens if instead we retest the basic model using the actual MLD which varies significantly by month and by latitude?

So instead of “one slab of ocean” at MLD = choose your value, we break up the globe into regions, have different values in each region each month and see what happens to climate sensitivity problems.

By the way, I also attempted to calculate the global annual (area weighted) average of MLD from the maps above, by eye. I also emailed the author of the paper to get some measurement details but no response.

My estimate of the data in this paper was a global annual area weighted average of 62 meters.

Trying Simple Models with Varying MLD

I updated the Matlab program from Measuring Climate Sensitivity – Part One. The globe is now broken up into 30º latitude bands, with the potential for a different value of mixed layer depth for each month of the year.

I created a number of different profiles:

Depth Type 0 – constant with month and latitude, as in the original article

Type 1 – using the values from de Boyer’s paper, as best as can be estimated from looking at the monthly maps.

Type 2 – no change each month, with scaling of 60ºN-90ºN = 100x the value for 0ºN – 30ºN, and 30ºN – 60ºN = 10x the value for 0ºN – 30ºN – similarly for the southern hemisphere.

Type 3 – alternating each month between Type 2 and its inverse, i.e., scaling of 0ºN – 30ºN = 100x the value for 60ºN-90ºN and 30ºN – 60ºN = 10x the value for 60ºN-90ºN.

Type 4 – no variation by latitude, but  month 1 = 1000x month 4, month 2 = 100x month 4, month 3 = 10x month 4, repeating 3 times  per year.

In each case the global annual (area weighted) average = 62m.

Essentially types 2-4 are aimed at creating extreme situations.

Here are some results (review the original article for some of the notation), recalling that the actual climate sensitivity, λ = 3.0:

BERJAYA

Figure 6

BERJAYA

Figure 7 – as figure 6 without 30-day averaging

BERJAYA

Figure 8

BERJAYA

Figure 9

BERJAYA

Figure 10

BERJAYA

Figure 11

BERJAYA

Figure 12

What’s the message from these results?

In essence, type 0 (the original) and type 1 (using actual MLDs vs latitude and month from de Boyer’s paper) are quite similar – but not exactly the same.

However, if we start varying the MLD by latitude and month in a more extreme way the results come out very differently – even though the global average MLD is the same in each case.

This demonstrates that the temporal and area variation of MLD can have a significant effect and modeling the ocean as one slab – for the purposes of this enterprise – may be risky.

Non-Linearity

We haven’t considered the effect of non-linearity in these simple models. That is, what about interactions between different regions and months. If we created a yet more complex model where heat flowed between regions dependent on the relative depths of the mixed layers what would we find?

Losing the Plot?

Now, in case anyone has lost the plot by this stage – and it’s possible that I have – don’t get confused into thinking that we are evaluating GCM’s and gosh aren’t they simplistic.. No, GCM’s have very sophisticated modeling.

What we have been doing is tracing a path that started with a paper by Spencer & Braswell. This paper used a very simple model to show that with some random daily fluctuations in top of atmosphere radiative flux, perhaps due to clouds, the measurement of climate sensitivity doesn’t match the actual climate sensitivity.

We can do this in a model – prescribe a value and then test whether we can measure it. This is where this simple model came in. It isn’t a GCM.

However, Murphy & Forster came along and said if you use a deeper mixed ocean layer (which they claim is justified) then the measurement of climate sensitivity does more or less match the actual climate sensitivity (they also had comment on the values chosen for radiative flux anomalies, a subject for another day).

What struck me was that the test model needs some significant improvement to be able to assess whether or not climate sensitivity can be measured. And this is with the caveat – if climate sensitivity is a constant.

The Next Phase – More Realistic Ocean Model

As Murphy & Forster have pointed out, the longer the time period, the more heat is “injected” into the deeper ocean from the mixed layer.

So a better model would capture this better than just creating a deeper mixed layer for a longer time. Modeling true global ocean convection is an impossible task.

As a recap, conducted heat flow:

q” = k.ΔT/d

where q” = heat flow per unit area, k = conductivity, ΔT = temperature difference, and d = depth of layer

Take a look at Heat Transfer Basics – Part Zero for more on these basics.

For water, k = 0.6 W/m².K. So, as an example, if we have a 10ºC temperature difference across 1 km depth of water, q” = 0.006 W/m². This is tiny. Heat flow via conduction is insignificant. Convection is what moves heat in the ocean.

Many researchers have measured and estimated vertical heat flow in the ocean to come up with a value for vertical eddy diffusivity. This allows us to make some rough estimates of vertical heat flow via convection.

In the next version of the Matlab program (“in press”) the ocean is modeled with different eddy diffusivities below the mixed ocean layer to see what happens to the measurement of climate sensitivity. So far, the model comes up with wildly varying results when the eddy diffusivity is low, i.e., heat cannot easily move into the ocean depths. And it comes up with normal results when the eddy diffusivity is high, i.e., heat moves relatively quickly into the ocean depths.

Due to shortness of time, this problem has not yet been resolved. More in due course.

This article is already long enough, so the next part will cover the estimated values for eddy diffusivity because it’s an interesting subject

Conclusion

Regular readers of this blog understand that navigating to any kind of conclusion takes some time on my part. And that’s when the subject is well understood. I’m finding that the signposts on the journey to measuring climate sensitivity are confusing and hard to read.

And that said, this article hasn’t shed any more light on the measurement of climate sensitivity. Instead, we have reviewed more ways in which measurements of it might be wrong. But not conclusively.

Next up we will take a detour into eddy diffusivity, hoping in the meantime that the Matlab model problems can be resolved. Finally a more accurate model incorporating eddy diffusivity to model vertical heat flow in the ocean will show us whether or not climate sensitivity can be accurately measured.

Possibly.

References

Potential Biases in Feedback Diagnosis from Observational Data: A Simple Model Demonstration, Spencer & Braswell, Journal of Climate (2008)

On the accuracy of deriving climate feedback parameters from correlations between surface temperature and outgoing radiation, Murphy & Forster, Journal of Climate (2010)

Observation of large diurnal warming events in the near-surface layer of the western equatorial Pacific warm pool, Soloviev & Lukas, Deep Sea Research Part I: Oceanographic Research Papers (1997)

Atmosphere, Ocean and Climate Dynamics: An Introductory Text, Marshall & Plumb, Elsevier Academic Press (2008)

Mixed layer depth over the global ocean: An examination of profile data and a profile-based climatology, de Boyer Montegut et al, JGR (2004)

The Creation of Time

We all would like this machine that creates time.

In the context of Science of Doom all my time has been diverted into work-related activities and I’m not sure when this will ease up.

Unless someone hands me this machine, and for a price well below market worth, I am not sure when my next post will take place.

I have lots of ideas, but like to do research and gain understanding before writing articles.

Normal service will eventually be resumed.

I don’t think this is a simple topic.

The essence of the problem is this:

Can we measure the top of atmosphere (TOA) radiative changes and the surface temperature changes and derive the “climate sensivity” from the relationship between the two parameters?

First, what do we mean by “climate sensitivity”?

In simple terms this parameter should tell us how much more radiation (“flux”) escapes to space for each 1°C increase in surface temperature.

Climate Sensitivity Is All About Feedback

Climate sensitivity is all about trying to discover whether the climate system has positive or negative feedback.

If the average surface temperature of the earth increased by 1°C and the radiation to space consequently increased by 3.3 W/m², this would be approximately “zero feedback”.

Why is this zero feedback?

If somehow the average temperature of the surface of the planet increased by 1°C – say due to increased solar radiation – then as a result we would expect a higher flux into space. A hotter planet should radiate more. If the increase in flux = 3.3 W/m² it would indicate that there was no negative or positive feedback from this solar forcing (note 1).

Suppose the flux increased by 0. That is, the planet heated up but there was no increase in energy radiated to space. That would be positive feedback within the climate system – because there would be nothing to “rein in” the increase in temperature.

Suppose the flux increased by 5 W/m². In this case it would indicate negative feedback within the climate system.

The key value is the “benchmark” no feedback value of 3.3 W/m². If the value is above this, it’s negative feedback. If the value is below this, it’s positive feedback.

Essentially, the higher the radiation to space as a result of a temperature increase the more the planet is able to “damp out” temperature changes that are forced via solar radiation, or due to increases in inappropriately-named “greenhouse” gases.

Consider the extreme case where as the planet warms up it actually radiates less energy to space – clearly this will lead to runaway temperature increases (less energy radiated means more energy absorbed, which increased temperatures, which leads to even less energy radiated..).

As a result we measure sensitivity as W/m².K which we read as Watts per meter squared per Kelvin” – and 1K change is the same as 1°C change.

Theory and Measurement

In many subjects, researchers’ algebra converges on conventional usage, but in the realm of climate sensitivity everyone has apparently adopted their own. As a note for non-mathematicians, there is nothing inherently wrong with this, but it just makes each paper confusing especially for newcomers and probably for everyone.

I mostly adopt the Spencer & Braswell 2008 terminology in this article (see reference and free link below). I do change their α (climate sensitivity) into λ (which everyone else uses for this value) mainly because I had already produced a number of graphs with λ before starting to write the article..

The model is a very simple 1-dimensional model of temperature deviation into the ocean mixed layer, from the first law of thermodynamics:

C.∂T/∂t = F + S ….[1]

where C = heat capacity of the ocean, T = temperature anomaly, t = time, F = total top of atmosphere (TOA) radiative flux anomaly, S = heat flux anomaly into the deeper ocean

What does this equation say?

Heat capacity times change in temperature equals the net change in energy

- this is a simple statement of energy conservation, the first law of thermodynamics.

The TOA radiative flux anomaly, F, is a value we can measure using satellites. T is average surface temperature, which is measured around the planet on a frequent basis. But S is something we can’t measure.

What is F made up of?

Let’s define:

F = N + f - λT ….[1a]

where N = random fluctuations in radiative flux, f = “forcings”, and λT is the all important climate response or feedback.

The forcing f is, for the purposes of this exercise, defined as something added into the system which we believe we can understand and estimate or measure. This could be solar increases/decreases, it could be the long term increase in the “greenhouse” effect due to CO2, methane and other gases. For the purposes of this exercise it is not feedback. Feedback includes clouds and water vapor and other climate responses like changing lapse rates (atmospheric temperature profiles), all of which combine to produce a change in radiative output at TOA.

And an important point is that for the purposes of this theoretical exercise, we can remove f from the measurements because we believe we know what it is at any given time.

N is an important element. Effectively it describes the variations in TOA radiative flux due to the random climatic variations over many different timescales.

The climate sensitivity is the value λT, where λ is the value we want to find.

Noting the earlier comment about our assumed knowledge of ‘f’ (note 2), we can rewrite eqn 1:

C.∂T/∂t = - λT + N + S ….[2]

remembering that - λT + N = F is the radiative value we measure at TOA

Regression

If we plot F (measured TOA flux) vs T we can estimate λ from the slope of the least squares regression.

However, there is a problem with the estimate:

λ (est) = Cov[F,T] / Var[T] ….[3]

          = Cov[- λT + N, T] / Var[T]

where Cov[a,b] = covariance of a with b, and Var[a]= variance of a

Forster & Gregory 2006

This oft-cited paper (reference and free link below) calculates the climate sensitivity from 1985-1996 using measured ERBE data at 2.3 ± 1.3 W/m².K.

Their result indicates positive feedback, or at least, a range of values which sit mainly in the positive feedback space.

On the method of calculation they say:

This equation includes a term that allows F to vary independently of surface temperature.. If we regress (- λT+ N) against T, we should be able to obtain a value for λ. The N terms are likely to contaminate the result for short datasets, but provided the N terms are uncorrelated to T, the regression should give the correct value for λ, if the dataset is long enough..

[Terms changed to SB2008 for easier comparison, and emphasis added].

Simulations

Like Spencer & Braswell, I created a simple model to demonstrate why measured results might deviate from the actual climate sensitivity.

The model is extremely simple:

  • a “slab” model of the ocean of a certain depth
  • daily radiative noise (normally distributed with mean=0, and standard deviation σN)
  • daily ocean flux noise (normally distributed with mean=0, and standard deviation σS)
  • radiative feedback calculated from the temperature and the actual climate sensitivity
  • daily temperature change calculated from the daily energy imbalance
  • regression of the whole time series to calculate the “apparent” climate sensitivity

In this model, the climate sensitivity, λ = 3.0 W/m².K.

In some cases the regression is done with the daily values, and in other cases the regression is done with averaged values of temperature and TOA radiation across time periods of 7, 30 & 90 days. I also put a 30-day low pass filter on the daily radiative noise in one case (before “injecting” into the model).

Some results are based on 10,000 days (about 30 years), with 100,000 days (300 years) as a separate comparison.

In each case the estimated value of λ is calculated from the mean of 100 simulation results. The 2nd graph shows the standard deviation σλ, of these simulation results which is a useful guide to the likely spread of measured results of λ (if the massive oversimplifications within the model were true). The vertical axis (for the estimate of λ) is the same in each graph for easier comparison, while the vertical axis for the standard deviation changes according to the results due to the large changes in this value.

First, the variation as the number of time steps changes and as the averaging period changes from 1 (no averaging) through to 90-days. Remember that the “real” value of λ = 3.0 :

BERJAYA

Figure 1

Second, the estimate as the standard deviation of the radiative flux is increased, and the ocean depth ranges from 20-200m. The daily temperature and radiative flux is calculated as a monthly average before the regression calculation is carried out:

BERJAYA

Figure 2

As figure 2, but for 100,000 time steps (instead of 10,000):

BERJAYA

Figure 3

Third, the estimate as the standard deviation of the radiative flux is increased, and the ocean depth ranges from 20-200m. The regression calculation is carried out on the daily values:

BERJAYA

Figure 4

As figure 4, but with 100,000 time steps:

BERJAYA

Figure 5

Now against averaging period and also against low pass filtering of the “radiative flux noise”:

BERJAYA

Figure 6

As figure 6 but with 100,000 time steps:

BERJAYA

Figure 7

Now with the radiative “noise” as an AR(1) process (see Statistics and Climate – Part Three – Autocorrelation), vs the autoregressive parameter φ and vs the number of averaging periods: 1 (no averaging), 7, 30, 90 with 10,000 time steps (30 years):

BERJAYA

Figure 8

And the same comparison but with 100,000 timesteps:

BERJAYA

Figure 9

Discussion of Results

If we consider first the changes in the standard deviation of the estimated value of climate sensitivity we can see that the spread in the results is much higher in each case when we consider 30 years of data vs 300 years of data. This is to be expected. However, given that in the 30-year cases σλ is similar in magnitude to λ we can see that doing one estimate and relying on the result is problematic. This of course is what is actually done with measurements from satellites where we have 30 years of history.

Second, we can see that mostly the estimates of λ tend to be lower than the actual value of 3.0 W/m².K. The reason is quite simple and is explained mathematically in the next section which non-mathematically inclined readers can skip.

In essence, it is related to the idea in the quote from Forster & Gregory. If the radiative flux noise is uncorrelated to temperature then the estimates of λ will be unbiased. By the way, remember that by “noise” we don’t mean instrument noise, although that will certainly be present. We mean the random fluctuations due to the chaotic nature of weather and climate.

If we refer back to Figure 1 we can see that when the averaging period = 1, the estimates of climate sensitivity are equal to 3.0. In this case, the noise is uncorrelated to the temperature because of the model construction. Slightly oversimplifying, today’s temperature is calculated from yesterday’s noise. Today’s noise is a random number unrelated to yesterday’s noise. Therefore, no correlation between today’s temperature and today’s noise.

As soon as we average the daily data into monthly results which we use to calculate the regression then we have introduced the fact that monthly temperature is correlated to monthly radiative flux noise (note 3).

This is also why Figures 8 & 9 show a low bias for λ even with no averaging of daily results. These figures are calculated with autocorrelation for radiative flux noise. This means that past values of flux are correlated to current vales – and so once again, daily temperature will be correlated with daily flux noise. This is also the case where low pass filtering is used to create the radiative noise data (as in Figures 6 & 7).

Maths

x = slope of the line from the linear regression

x = Cov[- λT + N, T] / Var[T] ….[3]

It’s not easy to read equations with complex terms numerator and denominator on the same line, so breaking it up:

Cov[- λT + N, T] = E[ (λT + N)T ] – E[- λT + N]E[T] ….[4], where E[a] = expected value of a

= E[-λT²] + E[NT] + λ.E[T].E[T] – E[N].E[T]

= -λ { E[T²] – (E[T])² } + E[NT] – E[N].E[T] …. [4]

And

Var[T] = E[T²] – (E[T])² …. [5]

So

x = -λ + { E[NT] – E[N].E[T] } / { E[T²] – (E[T])² } …. [6]

And we see that the regression of the line is always biased if N is correlated with T. If the expected value of N = 0 the last term in the top part of the equation drops out, but E[NT] ≠ 0 unless N is uncorrelated with T.

Note of course that we will use the negative of the slope of the line to estimate λ, and so estimates of λ will be biased low.

As a note for the interested student, why is it that some of the results show λ > 3.0?

Murphy & Forster 2010

Murphy & Forster picked up the challenge from Spencer & Braswell 2008 (reference below but no free link unfortunately). The essence of their paper is that using more realistic values for radiative noise and mixed ocean depth the error in calculation of λ is very small:

BERJAYA

From Murphy & Forster (2010)

Figure 10

The value ba on the vertical axis is a normalized error term (rather than the estimate of λ).

Evaluating their arguments requires more work on my part, especially analyzing some CERES data, so I hope to pick that up in a later article. [Update, Spencer has a response to this paper on his blog, thanks to Ken Gregory for highlighting it]

Linear Feedback Relationship?

One of the biggest problems with the idea of climate sensitivity, λ, is the idea that it exists as a constant value.

From Stephens (2005), reference and free link below:

The relationship between global-mean radiative forcing and global-mean climate response (temperature) is of intrinsic interest in its own right. A number of recent studies, for example, discuss some of the broad limitations of (1) and describe procedures for using it to estimate Q from GCM experiments (Hansen et al. 1997; Joshi et al. 2003; Gregory et al. 2004) and even procedures for estimating  from observations (Gregory et al. 2002).

While we cannot necessarily dismiss the value of (1) and related interpretation out of hand, the global response, as will become apparent in section 9, is the accumulated result of complex regional responses that appear to be controlled by more local-scale processes that vary in space and time.

If we are to assume gross time–space averages to represent the effects of these processes, then the assumptions inherent to (1) certainly require a much more careful level of justification than has been given. At this time it is unclear as to the specific value of a global-mean sensitivity as a measure of feedback other than providing a compact and convenient measure of model-to-model differences to a fixed climate forcing (e.g., Fig. 1).

[Emphasis added and where the reference to "(1)" is to the linear relationship between global temperature and global radiation].

If, for example, λ is actually a function of location, season & phase of ENSO.. then clearly measuring overall climate response is a more difficult challenge.

Conclusion

Measuring the relationship between top of atmosphere radiation and temperature is clearly very important if we want to assess the all-important climate sensitivity.

Spencer & Braswell have produced a very useful paper which demonstrates some obvious problems with deriving the value of climate sensitivity from measurements. Although I haven’t attempted to reproduce their actual results, I have done many other model simulations to demonstrate the same problem.

Murphy & Forster have produced a paper which claims that the actual magnitude of the problem demonstrated by Spencer & Braswell is quite small in comparison to the real value being measured (as yet I can’t tell whether their claim is correct).

The value called climate sensitivity might be a variable (i.e., not a constant value) and it might turn out to be much harder to measure than it really seems (and already it doesn’t seem easy).

References

The Climate Sensitivity and Its Components Diagnosed from Earth Radiation Budget Data, Forster & Gregory, Journal of Climate (2006)

Potential Biases in Feedback Diagnosis from Observational Data: A Simple Model Demonstration, Spencer & Braswell, Journal of Climate (2008)

On the accuracy of deriving climate feedback parameters from correlations between surface temperature and outgoing radiation, Murphy & Forster, Journal of Climate (2010)

Cloud Feedbacks in the Climate System: A Critical Review, Stephens, Journal of Climate (2005)

Notes

Note 1 – The reason why the “no feedback climate response” = 3.3 W/m².K is a little involved but is mostly due to the fact that the overall climate is radiating around 240 W/m² at TOA.

Note 2 - This is effectively the same as saying f=0. If that seems alarming I note in advance that the exercise we are going through is a theoretical exercise to demonstrate that even if f=0, the regression calculation of climate sensitivity includes some error due to random fluctuations.

Note 3 - If the model had one random number for last month’s noise which was used to calculate this month’s temperature then the monthly results would also be free of correlation between the temperature and radiative noise.

In a discussion a little while ago on What’s the Palaver? – Kiehl and Trenberth 1997, one of our commenters asked about the surface forcing and how it could possibly lead to anything like the IPCC-projected temperature change for doubling of CO2.

Following a request for clarification, he added:

..We first look at the RHS. We believe that the atmosphere will also increase in temperature by roughly the same amount, so there will be no change in the conductive term. The increase in the Radiative term is roughly 5.5W/m².

The increase in the evaporative term is much more difficult, but is believed to be in the range 2-7%/DegC. So the increase in the evaporative term is 1.5 to 5.5W/m², for a total change on the RHS of 7 to 11 W/m².

Since balance is an assumption, the LHS changed by the same amount. The surface sensitivity is therefore 0.095 to 0.15 DegC/W/m².

Note that this is the sensitivity to changes in Surface Forcing, whatever the source. It is NOT the response to Radiative Forcing – there is no response of the surface to Radiative Forcing, it can only respond to Sunlight and Back-Radiation.

[See the whole comment and exchange for the complete picture].

These are good questions and no doubt many people have similar ones. The definition of radiative forcing (see CO2 – An Insignificant Trace Gas? Part Seven – The Boring Numbers) is at the tropopause, which is the top of the troposphere (around 12km above the surface).

Why is it at the tropopause and not at the surface? The great Ramanathan explains (in his 1998 review paper):

..Manabe & Wetherald’s [1967] paper, which convincingly demonstrated that the CO2-induced surface warming is not solely determined by the energy balance at the surface but by the energy balance of the coupled surface-troposphere-stratosphere system.

The underlying concept of the Manabe-Wetherald model is that the surface and the troposphere are so strongly coupled by convective heat and moisture transport that the relevant forcing governing surface warming is the net radiative perturbation at the tropopause, simply known as radiative forcing.

In essence, the reason we consider the value at the tropopause is that it is the best value to tell us what will happen at the surface. It is now an idea established for over 40 years, although for some it might sound bizarre. So we will try and make sense of it here.

Here is a schematic originating in Ramanathan’s 1981 paper, but extracted here from his 1998 review paper:

From Ramanathan (1998)

From Ramanathan (1998)

Figure 1

The first thing to pay attention to is the right hand side – 1. CO2 direct surface heating - which is shown as 1.2 W/m².

The surface forcing from a doubling of CO2 is around 1 W/m² compared with around 4 W/m² at the tropopause. The surface forcing is a lot less than at the top of atmosphere!

Before too much joy sets in, let’s consider what these concepts represent. They are essentially idealized quantities, derived from considering the instantaneous change in concentrations of CO2.

As CO2 shows a steady increase year on year, the idea of doubling overnight is clearly not in accord with reality. However, it is a useful comparison point and helps to get many ideas straight. If instead we said, “CO2 increasing by 1% per year”, we would need to define a time period for this 1% annual increase, plus how long after the end before a new balance was restored. It wouldn’t make solving the problem any easier – and it would make the results harder to understand – by contrast GCM’s do consider a steadily rising CO2 level according to whatever scenario they are considering.

So, with the idea of an instantaneous doubling, if the surface increase in radiative forcing is less than the tropopause increase in radiative forcing, this must mean that the balance of energy is absorbed within the atmosphere. And also, we have to consider what happens as a result of the surface energy imbalance.

The numbers I use here are Ramanathan’s numbers from his 1981 paper. Later, and more accurate, numbers have been calculated but don’t affect the main points of this analysis. The reason for reviewing his analysis is because some (but not all) of the inherent responses of the climate system are explicitly calculated – making it easier to understand than the output of  GCM.

Immediate Response

The immediate result of this doubling of CO2 is a reduced emission of radiation (OLR = outgoing longwave radiation) from the climate system into space. See the Atmospheric Radiation and the “Greenhouse” Effect series for detailed explanations of why.

At the tropopause the OLR reduces by 3.1 W/m², and downward emission from the stratosphere into the troposphere increases by 1.2 W/m².

This results in a net forcing at the tropopause of 4.3 W/m². Most of the radiation from the atmosphere to the surface (as a result of more CO2) is absorbed by water vapor. So at the surface the DLR (downward long radiation) increases by only 1.2 W/m² – this is the (immediate) surface forcing. Here is a simple graphical explanation of why the OLR decreases and the DLR increases:

BERJAYA

Figure 2 – Click for a larger image

Response After a Few Months

The stratosphere cools and reaches a new radiative equilibrium. This reduces the downward emission from the stratosphere by a small amount. The new value of radiative forcing at the tropopause = 4.2 W/m².

Response After Many Decades

The surface-troposphere warms until a new equilibrium is reached – the radiative forcing at the tropopause has returned to zero.

The Surface

So let’s now consider the surface. Take a look at Figure 1 again. The values/ranges we will consider are calculated by a model. This doesn’t mean they are correct. It means that applying well-understood processes in a simplistic way gives us a “first order” result. The reason for assessing this kind of approach is because our mental models are usually less accurate than a calculated result which draws on well-understood physics.

As Ramanathan says in his 1998 paper:

As a caveat, the system we considered up to this point to elucidate the principles of warming is a highly simplified linear system. Its use is primarily educational and cannot be used to predict actual changes.

Process 1 is as already described – the surface forcing increases by just over 1 W/m². But the balance of 3 W/m² goes into heating the troposphere.

Process 2 – The warming of the troposphere results in increases downward radiation to the surface (because the hotter the body, the higher the radiation emitted). The calculated value is an additional 2.3 W/m², so the surface imbalance is now 3.5 W/m² and the surface temperature must increase in response. Upwards surface radiation and/or sensible and latent heat will increase to balance.

Process 3 – The surface emission of radiation increases at around 5.5 W/m² for every 1°C of surface temperature increase. But this is almost balanced by increased downward radiation from the atmosphere (“back radiation”). The net effect is only about 10% of the change in upward radiation. So latent heat and sensible heat increase to restore the energy balance, but this also heats the troposphere.

Process 4 - The tropospheric humidity increases. This increases the emissivity of the atmosphere near the surface, which increases the back radiation.

So essentially some cycles are reinforcing each other (=positive feedback). The question is about the value of the new equilibrium point.

From Ramanathan (1981)

From Ramanathan (1981)

Figure 3

In Ramanathan’s 1981 paper he gives some basic calculations before turning to GCM results. The basic calculations are quite interesting because one of the purposes of the paper was to explain why some model results of the day produced very small equilibrium temperature changes.

Sadly for some readers, a little maths is necessary to reproduce the result. It is simple maths because it is based on simple concepts – as already presented. As much as possible I follow the equation numbers and notations from Ramanathan’s 1981 paper.

Calculations

Energy balance at an “average” surface:

Upward flux = Downward flux

→  LH + SH + F↑ = F↓ + S + ΔR  ….[2]

where LH = latent heat, SH = sensible heat, F↑ = surface emitted upward radiation, F↓ = surface downward radiation from the atmosphere, S = solar radiation absorbed, ΔR = instantaneous change in energy absorbed at the surface due to an increase in CO2

And see note 1. We have simple formulas for the left hand side.

F↑ = σT4….[3a]

Latent heat and sensible heat flux have “bulk aerodynamic formulas” (note 2):

LH = ρLCDV (q*M – qS)   ….[3b]

SH = ρcpCDV (TM – TS)   ….[3c]

Where ρ = density of air = 1.3 kg/m, L = latent heat of water vapor = 2.5 x 106, CD = empirically determined coefficient ≈ 1.3 x10-3,  V = average wind speed at some reference height above the surface ≈ 5 m/s, q*M = specific humidity at saturation at the surface temperature of the ocean,  qS = specific humidity at the reference height,  TM = temperature of the ocean at the surface,  TS = temperature of the air at the reference height (typically 10m).

To give an idea of typical values, for every 1°C difference between the surface and the air at the reference height, SH = 8.5 W/m²K, and with a relative humidity of 80% at the reference height (and 100% at the ocean surface), LH = 55 W/m²K.

Now we consider changes.

TM‘ is the change in the surface temperature of the ocean as the result of the increased CO2, and similar notation for other changes in values. Missing out a few steps that you can read in the paper:

TM‘ =                                    ΔR(0) + ΔF↓(2) + ΔF↓(3)                               ….[13]

      [ ∂LH/∂TM + ∂SH/∂TM + 4σTM³] + [  ∂LH/∂TS +  ∂SH/∂TS ].TS‘/TM

This probably seems a little daunting to a lot of readers.. so let’s explain it:

  • The parameter on the top line in black, ΔR(0) is the surface radiative forcing from the increase in CO2
  • The red terms are the changes in downward radiation as a result of process 2 and 3 described above
  • The blue terms are the changes in upward flux due to only the ocean surface temperature changing
  • The green terms are the changes in upward flux due to only the atmospheric temperature near the surface changing
  • The blue term ≈ 30 W/m²K @ 15°C; the green term ≈ -8.5 W/m²K @ 15°C (note 3)

And the smaller the total under the line, the higher the increase in temperature. And there are two competing terms:

  • As the surface temperature of the ocean increases the heat transfer from the ocean to the atmosphere increases
  • As the atmospheric temperature (just above the ocean surface) increases the heat transfer from the ocean to the atmosphere decreases

As an interesting comparison, Ramanathan reviewed the methods and results of Newell & Dopplick (1979) who found a changed surface temperature, Tm’ = 0.04 °C as a result of CO2 doubling. Effectively, very little change in surface temperature as a result of doubling of CO2.

Ramanathan states that the calculations of Newell & Dopplick had ignored the red terms and the green terms. Ignoring the red terms means that the heating of the atmosphere is ignored. Ignoring the green terms means that the effect of the ocean surface heating is inflated – if the ocean surface heats and the atmosphere just above somehow stayed the same then the heat transferred would be higher than if the atmospheric temperature also increased as a result. (Because heat transfer depends on temperature difference).

I expect that many people doing their own estimates will be working from similar assumptions.

Later Work

Here is a graphic from Andrews et al (2009), reference and free link below, which shows the simplified idea:

BERJAYA

From Andrews et al (2009)

Figure 4

The paper itself is well worth reading and perhaps will be the subject of another article at a later date.

Conclusion

I haven’t demonstrated that the surface will warm by 3°C for a doubling of CO2. But I hope I have demonstrated the complexity of the processes involved and why a simplistic calculation of how the surface responds immediately to the surface forcing is not the complete answer. It is nowhere near the complete answer.

The surface temperature change as a result of doubling of CO2 is, of course, a massively important question to answer. GCM’s are necessarily involved despite their limitations.

Re-iterating what Ramanathan said in his 1998 paper in case anyone thinks I am making a case for a 3°C surface temperature increase:

As a caveat, the system we considered up to this point to elucidate the principles of warming is a highly simplified linear system. Its use is primarily educational and cannot be used to predict actual changes.

References

Trace Gas Greenhouse Effect and Global Warming, V. Ramanathan, Ambio (1998)

The role of ocean-atmosphere interactions in the CO2 climate problem, V Ramanathan, Journal of Atmospheric Sciences (1981)

Thermal equilibrium of the atmosphere with a given distribution of atmospheric humidity, Manabe & Wetherald, Journal of Atmospheric Sciences (1967)

A Surface Energy Perspective on Climate Change, Andrews, Forster & Gregory, Journal of Climate (2009)

Notes

Note 1: The equation ignores the transfer of heat into the ocean depths

Note 2: The “bulk aerodynamic formulas” – as they have become known – are more usable versions of the fundamental equations of heat and water vapor flux. Upward sensible heat flux, SH = ρcp<wT>, where w = vertical velocity, T = temperature, so <wT> is the time average of the product of vertical velocity and temperature. However, turbulent motions are so rapid, changing on such short time intervals that measurement of these values is usually impossible (or requires intensive measurement with specialist equipment in one location). We can write,

w = <w> + w’, where <w> = mean vertical velocity and w’ = deviation of vertical velocity from the mean, likewise T = <T> + T’.

So:

<wT> = <w><T> + <w’ T’> or, Total = Mean + Eddy

Near the surface the mean vertical motion is very small compared with the turbulent vertical velocity and so the turbulent component, <w’ T’>, dominates. Therefore,

SH = cρ <w’ T’>

LH = L ρ <w’ T’>

where  cp = specific heat capacity of air, ρ = density of air, L = latent heat of water vapor

By various thermodynamic arguments, and especially by lots of empirical measurements, an estimate of heat transfer can be made via the bulk aerodynamic formulas shown above, which use the average horizontal wind speed at the surface in conjunction with the coefficients of heat transfer, which are related to the friction term for the wind at the ocean surface.

Note 3: The calculation of each of the partial derivative terms is not shown in the paper, these are my calculations. I believe that ∂LH/∂TS = 0, most of the time – this is because if the atmosphere at the reference height is not saturated then an increase in the atmospheric temperature, TS, does not change the moisture flux, and therefore, does not change the latent heat. I might be wrong about this, and clearly some of the time this assumption I have made is not valid.

In the last article we saw some testing of the simplest autoregressive model AR(1). I still have an outstanding issue raised by one commenter relating to the hypothesis testing that was introduced, and I hope to come back to it at a later stage.

Different Noise Types

Before we move onto more general AR models, I did do some testing of the effectiveness of the hypothesis test for AR(1) models with different noise types.

The testing shown in Part Four has Gaussian noise (a “normal distribution”), and the theory applied is only apparently valid for Gaussian noise, so I tried uniform distribution of noise and also a Gamma noise distribution:

BERJAYA

Figure 1

The Gaussian and uniform distribution produce the same results. The Gamma noise result isn’t shown because it was also the same.

A Gamma distribution can be quite skewed, which was why I tried it – here is the Gamma distribution that was used (with the same variance as the Gaussian, and shifted to produce the same mean = 0):

BERJAYA

Figure 2

So in essence I have found that the tests work just as well when the noise component is uniformly distributed or Gamma distributed as when it has a Gaussian distribution (normal distribution).

Hypothesis Testing of AR(1) Model When the Model is Actually AR(2)

The next idea I was interested to try was to apply the hypothesis testing from Part Three on an AR(2) model, when we assume incorrectly that it is an AR(1) model.

Remember that the hypothesis test is quite simple – we produce a series with a known mean, extract a sample, and then using the sample find out how many times the test rejects the hypothesis that the mean is different from its actual value:

BERJAYA

Figure 3

As we can see, the test, which should be only rejecting 5% of the tests, rejects a much higher proportion as φ2 increases. This simple test is just by way of introduction.

Higher Order AR Series

The AR(1) model is very simple. As we saw in Part Three, it can be written as:

xt - μ = φ(xt-1 – μ) + εt

where xt = the next value in the sequence, xt-1 = the last value in the sequence, μ = the mean, εt = random quantity and φ = auto-regression parameter

[Minor note, the notation is changed slightly from the earlier article]

In non-technical terms, the next value in the series is made up of a random element plus a dependence on the last value – with the strength of this dependence being the parameter φ.

The more general autoregressive model of order p, AR(p), can be written as:

xt - μ = φ1(xt-1 – μ) + φ2(xt-2 – μ) + .. + φp(xt-p – μ) + εt

φ1..φp = the series of auto-regression parameters

In non-technical terms, the next value in the series is made up of a random element plus a dependence on the last few values. So of course, the challenge is to determine the order p, and then the parameters φ1..φp

There is a bewildering array of tests that can be applied, so I started simply. With some basic algebraic manipulation (not shown – but if anyone is interested I will provide more details in the comments), we can produce a series of linear equations known as the Yule-Walker equations, which allow us to calculate φ1..φp from the estimates of the autoregression.

If you look back to Figure 2 in Part Three you see that by regressing the time series with itself moved by k time steps we can calculate the lag-k correlation, rk, for k=1, 2, 3, etc. So we estimate r1, r2, r3, etc., from the sample of data that we have, and then solve the Yule-Walker equations to get φ1..φp

First of all I played around with simple AR(2) models. The results below are for two different sample sizes.

A population of 90,000 is created (actually 100,000 then the first 10% is deleted), and then a sample is randomly selected 10,000 times from this population. For each sample, the Yule-Walker equations are solved (each of 10,000 times) and then the results are averaged.

In these results I normalized the mean and standard deviation of the parameters by the original values (later I decided that made it harder to see what was going on and reverted to just displaying the actual sample mean and sample standard deviation):

BERJAYA

Figure 4

Notice that the sample size of 1,000 produces very accurate results in the estimation of φ1 & φ2, with a small spread. The sample size of 50 appears to produce a low bias in the calculated results, especially for φ2, which is no doubt due to not reading the small print somewhere..

Here is a histogram of the results, showing the spread across φ1 & φ2 - note the values on the axes, the sample size of 1000 produces a much tighter set of results, the sample size of 50 has a much wider spread:

BERJAYA

Figure 5

Then I played around with a more general model. With this model I send in AR parameters to create the population, but can define a higher order of AR to test against, to see how well the algorithm estimates the AR parameters from the samples.

In the example below the population is created as AR(3), but tested as if it might be an AR(4) model. The AR(3) parameters (shown on the histogram in the figure below) are φ1= 0.4, φ2= 0.2, φ3= -0.3.

The estimation seems to cope quite well as φ4 is estimated at about zero:

BERJAYA

Figure 6

The histogram of results for the first two parameters, note again the difference in values on the axes for the different sample sizes:

BERJAYA

Figure 7

[The reason for the finer detail on this histogram compared with figure 5 is just discovery of the Matlab parameters for 3d histograms].

Rotating the histograms around in 3d appears to confirm a bell-curve. Something to test formally at a later stage.

Here’s an example of a process which is AR(5) with φ1= 0.3, φ2= 0, φ3= 0, φ4= 0, φ5= 0.4; tested against AR(6):

BERJAYA

Figure 8

And the histogram of estimates of φ1& φ2:

BERJAYA

Figure 8

ARMA

We haven’t yet seen ARMA models – auto-regressive moving average models. And we haven’t seen MA models – moving average models with no auto-regressive behavior.

What is an MA or “moving average” model?

The term in the moving average is a “linear filter” on the random elements of the process. So instead of εt as the “uncorrelated noise” in the AR model we have εt plus a weighted sum of earlier random elements. The MA process, of order q, can be written as:

xt - μ = εt + θ1εt-1+ θ2εt-2 + .. + θpεt-p

θ1..θp = the series of moving average parameters

The term “moving average” is a little misleading, as Box and Jenkins also comment.

Why is it misleading?

Because for AR (auto-regressive) and MA (moving average) and ARMA (auto-regressive moving average = combination of AR & MA) models the process is stationary.

This means, in non-technical terms, that the mean of the process is constant through time. That doesn’t sound like “moving average”.

So think of “moving average” as a moving average (filter) of the random elements, or noise, in the process. By their nature these will average out over time (because if the average of the random elements = 0, the average of the moving average of the random elements = 0).

An example of this in the real world might be a chemical introduced randomly into a physical process  - this is the εt term – but because the chemical gets caught up in pipework and valves, the actual value of the chemical released into the process at time t is the sum of a proportion of the current value released plus a proportion of earlier values released. Examples of the terminology used for the various processes:

  • AR(3) is an autoregressive process of order 3
  • MA(2) is a moving average process of order 2
  • ARMA(1,1) is a combination of AR(1) and MA(1)

References

Time Series Analysis: Forecasting & Control, 3rd Edition, Box, Jenkins & Reinsel, Prentice Hall (1994)

In Part Three we started looking at time-series that are autocorrelated, which means each value has a relationship to one or more previous values in the time-series. This is unlike the simple statistical models of independent events.

And in Part Two we have seen how to test whether a sample comes from a population of a stated mean value. The ability to run this test is important and in Part Two the test took place for a population of independent events.

The theory that allows us to accept or reject hypotheses to a certain statistical significance does not work properly with serially correlated data (not without modification).

Here is a nice example from Wilks:

BERJAYA

From Wilks (2011)

Figure 1

Remember that (usually) with statistical test we don’t actually know the whole population – that’s what we want to find out about. Instead, we take a sample and attempt to find out information about the population.

Take a look at figure 1 – the lighter short horizontal lines are the means (the “averages”) of a number of samples. If you compare the top and bottom graphs you see that the distribution of the means of samples is larger in the bottom graph. This bottom graph is the timeseries with autocorrelation.

What this means is that if we take a sample from a time-series and apply the standard Student-t test to find out whether it came from a population of mean = μ, we will calculate that it didn’t come from a mean that it actually did come from too many times. So a 95% test will incorrectly reject the hypothesis a lot more than 5%.

To demonstrate this, here is the % of false rejections (“Type I errors”) as the autocorrelation parameter increases, when a standard Student-t test is applied:

BERJAYA

Figure 2

The test was done with Matlab, with a time-series population of 100,000, Gaussian (“normal distribution”) errors, and samples of 100 taken 10,000 times (in each case a random start point was chosen then the next 100 points were taken as a sample – this was repeated 10,000 times). When the time-series is generated with no serial correlation, the hypothesis test works just fine. As the autocorrelation increases (as we move to the right of the graph), the hypothesis test starts creating more false fails.

With AR(1) autocorrelation – the simplest model of autocorrelation – there is a simple correction that we can apply. This goes under different names like effective sample size and variance inflation factor.

For those who like details, instead of the standard deviation of the sample means:

s = σ/√n

we derive:

s = σ.√[(1+ρ)/n.(1-ρ)], where ρ = autocorrelation parameter.

Repeating the same test with the adjusted value:

BERJAYA

Figure 3

We see that Type I errors start to get above our expected values at higher values of autocorrelation. (I’m not sure whether that actually happens with an infinite number of tests and true random samples).

Note as well that the tests above were done using the known value of the autocorrelation parameter (this is like having secret information which we don’t normally have).

So I re-ran the tests using the derived autocorrelation parameter from the sample data (regressing the time-series against the same time-series with a one time step lag) – and got similar, but not identical results and apparently more false fails.

Curiosity made me continue (tempered by the knowledge of the large time-wasting exercise I had previously engaged in because of a misplaced bracket in one equation), so I rewrote the Matlab program to allow me to test some ideas a little further. It was good to rewrite because I was also wondering whether having one (long) time-series generated with lots of tests against it was as good as repeatedly generating a time-series and carrying out lots of tests each time.

So this following comparison had a time-series population of 100,000 events, samples of 100 items for each test, repeated for 100 tests, then the time-series regenerated – and this done 100 times. So 10,000 tests across 100 different populations – first with the known autoregression parameter, then with the estimated value of this parameter from the sample in question:

BERJAYA

Figure 4 – Each sample size = 100

The correct value of rejected tests should be 5% no matter what the autoregression parameter.

The rewritten program allows us to test for the effect of sample size. The following graph uses the known value of autogression parameter in the test, a time-series population of 100,000, drawing samples out 1000 times from each population, and repeating through 10 populations in total:

BERJAYA

Figure 5 – Using known value of autoregression parameter in Student T-test

Remembering that all of the lines should be horizontal on 5%, we can see that the largest sample population of 1000 is the most resistant to higher autoregression parameters.

This reminded me that the equation for the variance inflation factor (shown earlier) is in fact an approximation. The correct formula (for those who like to see such things):

BERJAYA

from Zwiers & von Storch (1995)

Figure 6

So I adjusted the variance inflation factor in the program and reran.

I’m really starting to slow things down now – because in each single hypothesis test we are estimating the autoregression parameter, ρ, by a lag-1 correlation, then with this estimate we have to calculate the above circled formula, which requires summing the equation from 1 through to the number of samples. So in the case of n=1000 that’s 1000 calculations, all summed, then used in a Student-t test. And this is done in each case for 1000 tests per population x 10 populations.. thank goodness for Matlab which did it in 18 minutes. (And apologies to readers trying to follow the detail – in the graphics I show the autoregression parameter as φ, when I meant to use ρ, no idea why..)

Fortunately, the result turns out almost identical to using the approximation (the graph using the approximation is not shown):

BERJAYA

Figure 7 – Using estimated autoregression parameter

So unless I have made some kind of mistake (quite possible), I take this to mean that the sampling uncertainty in the autoregression parameter adds uncertainty to the Student T-test, which can’t be corrected for easily.

With large samples, like 1000, it appears to work just fine. With time-series data from the climate system we have to take what we can get and mostly it’s not 1000 points.

We are still considering a very basic model – AR(1) with normally-distributed noise.

In the next article I hope to cover some more complex models, as well as the results from this kind of significance test if we assume AR(1) with normally-distributed noise yet actually have a different model in operation..

References

Statistical Methods in the Atmospheric Sciences, 3rd edition, Daniel Wilks, Academic Press (2011)

Taking Serial Correlation into Account in Tests of the Mean, Zwiers & von Storch, Journal of Climate (1995)

In the first two parts we looked at some basic statistical concepts, especially the idea of sampling from a distribution, and investigated how this question is answered:  Does this sample come from a population of mean = μ?

If we can answer this abstract-looking question then we can consider questions such as:

  • “how likely is it that the average temperature has changed over the last 30 years?”
  • “is the temperature in Boston different from the temperature in New York?”

It is important to understand the assumptions under which we are able to put % probabilities on the answers to these kind of questions.

The statistical tests so far described rely upon each event being independent from every other event. Typical examples of independent events in statistics books are:

  • the toss of a coin
  • the throw of a dice
  • the measurement of the height of a resident of Burkina Faso

In each case, the result of one measurement does not affect any other measurement.

If we measure the max and min temperatures in Ithaca, NY today, and then measure it tomorrow, and then the day after, are these independent (unrelated) events?

No.

Here is the daily maximum temperature for January 1987 for Ithaca, NY:

BERJAYA

Data from Wilks (2011)

Figure 1

Now we want to investigate how values on one day are correlated with values on another day. So we look at the correlation of the temperature on each day with progressively larger lags in days. The correlation goes by the inspiring and memorable name of the Pearson product-moment correlation coefficient.

This correlation is the value commonly known as “r”.

So for k=0 we are comparing each day with itself, which obviously has a perfect correlation. And for k=1 we are comparing each day with the one afterwards – and finding the (average) correlation. For k=2 we are comparing 2 days afterwards. And so on. Here are the results:

BERJAYA

Figure 2

As you can see, the autocorrelation decreases as the number of days increases, which is intuitively obvious. And by the time we get to more than 5 days, the correlation has decreased to zero.

By way of comparison, here is one random (normal) distribution with the same mean and standard deviation as the Ithaca temperature values:

BERJAYA

Figure 3

And the autocorrelation values:

BERJAYA

Figure 4

As you would expect, the correlation of each value with the next value is around zero. The reason it is not exactly zero is just the randomness associated with only 31 values.

Digression: Time-Series and Frequency Transformations

Many people will be new to the concept of how time-series values convert into frequency plots – the Fourier transform. For those who do understand this subject, skip forward to the next sub-heading..

Suppose we have a 50Hz sine wave. If we plot amplitude against time we get the first graph below.

BERJAYA

Figure 5

If we want to investigate the frequency components we do a fourier transform and we get the 2nd graph below. That simply tells us the obvious fact that a 50Hz signal is a 50Hz signal. So what is the point of the exercise?

What about if we have the time-based signal shown in the next graph – what can we tell about its real source?

BERJAYA

Figure 6

When we see the frequency transform in the 2nd graph we can immediately tell that the signal is made up of two sine waves – one at 50Hz and one at 120Hz – along with some noise. It’s not really possible to deduce that from looking at the time-domain signal (not for ordinary people anyway).

Frequency transforms give us valuable insights into data.

Just as a last point on this digression, in figure 5, why isn’t the frequency plot a perfect line at 50Hz? If the time-domain data went from zero to infinity, the frequency plot would be that perfect line. In figure 5, the time-domain data actually went from zero to 10 seconds (not all of which was plotted).

Here we see the frequency transform for a 50Hz sine wave over just 1 second:

BERJAYA

Figure 7

For people new to frequency transforms it probably doesn’t seem clear why this happens but by having a truncated time series we have effectively added other frequency components – from the 1 second envelope surrounding the 50 Hz sine wave. If this last point isn’t clear, don’t worry about it.

Autocorrelation Equations and Frequency

The simplest autocorrelation model is the first-order autoregression, or AR(1) model.

The AR(1) model can be written as:

xt+1 - μ = φ(xt – μ) + εt+1

where xt+1 = the next value in the sequence, xt = the last value in the sequence, μ = the mean, εt+1 = random quantity and φ = auto-regression parameter

In non-technical terms, the next value in the series is made up of a random element plus a dependence on the last value – with the strength of this dependence being the parameter φ.

It appears that there is some confusion about this simple modelRecently, referencing an article via Bishop Hill, Doug Keenan wrote:

To draw that conclusion, the IPCC had to make an assumption about the global temperature series. The assumption that it made is known as the “AR1” assumption (this is from the statistical concept of “first-order autoregression”). The assumption implies, among other things, that only the current value in a time series has a direct effect on the next value. For the global temperature series, it means that this year’s temperature affects next year’s, but temperatures in previous years do not. For example, if the last several years were extremely cold, that on its own would not affect the chance that next year will be colder than average. Hence, the assumption made by the IPCC seems intuitively implausible.

[Update - apologies to Doug Keenan for misunderstanding his point - see his comment below ]

The confusion in the statement above is that mathematically the AR1 model does only rely on the last value to calculate the next value – you can see that in the formula above. But that doesn’t mean that there is no correlation between earlier values in the series. If day 2 has a relationship to day 1, and day 3 has a relationship to day 2, clearly there is a relationship between day 3 and day 1 – just not as strong as the relationship between day 3 and day 2 or between day 2 and day 1.

(And it is easy to demonstrate with a lag-2 correlation of a synthetic AR1 series – the 2-day correlation is not zero).

Well, more for another article when we look at the various autoregression models.

For now we will consider the simplest model, AR1, to learn a few things about time-series data with serial correlation.

Here are some synthetic time-series with different autoregression parameters (the value φ in the equation) and gaussian (=normal, or the “bell-shaped curve”) noise. The gaussian noise is the same in each series – with a standard deviation=5.

I’ve used long time-series to make the frequency characteristics clearer (later we will see the same models over a shorter time period):

BERJAYA

Figure 8

The value <x> is the mean. Note that the standard deviation (sd) of the data gets larger as the autoregressive parameter increases. DW is the Durbin-Watson statistic which we will probably come back to at a later date.

When φ = 0, this is the same as each data value being completely independent of every other data value.

Now the frequency transformation (using a new dataset to save a little programming time on my part):

BERJAYA

Figure 9

The first graph in the panel, with φ=0, is known as “white noise“. This means that the energy per unit frequency doesn’t change with frequency. As the autoregressive parameter increases you can see that the energy shifts to lower frequencies. This is known as “red noise“.

Here are the same models over 100 events (instead of 10,000) to make the time-based characteristics easier to see:

BERJAYA

Figure 10

As the autoregression parameter increases you can see that the latest value is more likely to be influenced by the previous value.

The equation is also sometimes known as a red noise process because a positive value of the parameter φ averages or smoothes out short-term fluctuations in the serially independent series of innovations, ε, while affecting the slower variations much less strongly. The resulting time series is called red noise by analogy to visible light depleted in the shorter wavelengths, which appears reddish..

It is evident that the most erratic point to point variations in the uncorrelated series have been smoothed out, but the slower random variations are essentially preserved. In the time domain this smoothing is expressed as positive serial correlation. From a frequency perspective, the resulting series is ”reddened”.

From Wilks (2011).

There is more to cover on this simple model but the most important point to grasp is that data which is serially correlated has different statistical properties than data which is a series of independent events.

Luckily, we can still use many standard hypothesis tests but we need to make allowance for the increase in the standard deviation of serially correlated data over independent data.

References

Statistical Methods in the Atmospheric Sciences, 3rd edition, Daniel Wilks, Academic Press (2011)

In Part One we raced through some basics, including the central limit theorem which is very handy.

This theorem tells us that even if we don’t know the type of distribution of a population we can say something very specific about the mean of a sample from that population (subject to some caveats).

Even though this theorem is very specific and useful it is not the easiest idea to grasp conceptually. So it is worth taking the time to think about it – before considering the caveats..

What do we know about Samples taken from Populations?

Usually we can’t measure the entire “population”. So we take a sample from the population. If we do it once and measure the mean (= “the average”) of that sample, then repeat again and again, and then plot the “distribution” of those means of the samples we get the graph on the right:

BERJAYA

Figure 1

- and the graph on the right follows a normal distribution.

We know the probabilities associated with normal distributions, so this means that even if we have just ONE sampling distribution – the usual case – we can assess how likely it is that it comes from a specific population.

Here is a demonstration..

Using Matlab I created a population – the uniform distribution on the left of figure 1. Then I took a random sample from the population. Note that in real life you don’t know the details of the actual population, this is what you are trying to ascertain via statistical methods.

BERJAYA

Figure 2

Each sample was 100 items. The test was made using the known probabilities of the normal distribution – “is this sample from a population of mean = 10?” And for a statistical test we can’t get a definite yes or no. We can only get a % likelihood. So a % threshold was set – you can see in figure 3, it was set at 95%.

Basically we are asking, “is there a 95% likelihood that this sample was drawn from a population with a mean of 10?

The exercise of

a) extracting a random sample of 100 items, and

b) carrying out the test

- was repeated 100,000 times

Even though the sample was drawn from the actual population every single time, 5% of the time (4.95% to be precise) the test rejected the sample as coming from this population. This is to be expected. Statistical tests can only give answers in terms of a probability.

All we have done is confirmed that the test to 95% threshold gives us 95% correct answers and 5% incorrect answers. We do get incorrect answers. So why not increase the level of confidence in the test by increasing the threshold?

Ok, let’s try it. Let’s increase the threshold to 99%:

BERJAYA

Figure 3

Nice. Now we only get just under 1% false rejections. We have improved our ability to tell whether or not a sample is drawn from a specific population!

Or have we?

Unfortunately there is no free lunch, especially in statistics.

Reducing the Risk of Rejecting one Error Increases the Risk of Accepting a Different Error..

In each and every case here we happen to know that we have drawn the sample from the population. Suppose we don’t know this? - The usual situation. The wider we cast the net, the more likely we are to assume that a sample is drawn from a population when in fact it is not.

I’ll show some examples shortly, but here is a good summary of the problem – along with the terminology of Type I and Type II errors – note that H0 is the hypothesis that the sample was drawn from the population in question:

BERJAYA

From Brase & Brase (2009)

Figure 4

What we have been doing by moving from 95% to 99% certainty is reducing the possibility of making a Type I error = thinking that the sample does not come from the population in question when it actually does. But in doing so we have been increasing the possibility of making a Type II error = thinking that the sample does come from the population when it does not.

So now let’s widen the Matlab example – we have added an alternative population and are drawing samples out of that as well.

So first – as before – we take samples from the main population and use the statistical test to find out how good it is at determining whether the samples do come from this population. Then second, we take samples from the alternative population and use the same test to see whether it makes the mistake of thinking the samples come from the original population.

BERJAYA

Figure 5

As before, the % of false rejections is about what we would expect (note the number of tests was reduced to 10,000, for no particular reason) for a 95% significance test.

But now we see the % of “false acceptance” – where a sample from an alternative population is assessed to see whether it came from the original population. This error is – in this case – around 4%.

Now we increase the significance level to 99%:

BERJAYA

Figure 6

Of course, the number of false rejections (type I error) has dropped to 1%. Excellent.

But the number of false accepts (type II error) has increased from 4% to 13%. Bad news.

Now let’s demonstrate why it is that we can’t know in advance how likely Type II errors are. In the following example, the mean of the alternative population has moved to 10.5 (from 10.3):

BERJAYA

Figure 7

So no Type II errors. And we widen the test to 99%:

BERJAYA

Figure 8

Still no Type II errors. So we widen the test further to 99.9%:

BERJAYA

Figure 9

Finally we get some Type II errors. But because the population we are drawing the samples from is different enough from the population we are testing for (our hypothesis) the statistical test is very effective. The “power of the test” – in this case – is very high.

So, in summary, when you see a test “at the 5% significance level” =95%, or at the “1% significance level” = 99%, you have to understand that the more impressive the significance level, the more likely that a false result has been accepted.

Increasing the Sample Size

As the sample size increases the distribution of “the mean of the sample” gets smaller. I know, stats sounds like gobbledygook..

Let’s see a simple example to demonstrate what is a simple idea turned into incomprehensible English:

BERJAYA

Figure 10

As you increase the size of the sample, you reduce the spread of the “sampling means” and this means that separating truth from fiction becomes easier.

It isn’t always possible to increase the sample size (for example, the monthly temperatures since satellites were introduced), but if it is possible, it makes it easier to find whether a sample is drawn from a given distribution or not.

Student T-test vs Normal Distribution test

What is a student t-test? It sounds like something “entry level” that serious people don’t bother with..

Actually it is a test developed by William Gossett just over 100 years ago and he had to write under a pen name because of his employer. Statistics was one of his employer’s trade secrets..

In the tests shown earlier we had to know the standard deviation of the population from which the sample was drawn. Often we don’t know this, and so we have a sample of unknown standard deviation – and we want to test the probability that it is drawn from a population of a certain mean.

The principle is the same, but the process is slightly different.

More in the next article, and hopefully we get to the concept of autocorrelation.

In all the basic elements we have covered so far we have assumed that each element in a sample and in a population is unrelated to any other element – independent events. Unfortunately, in the atmosphere and in climate, this assumption is not true (perhaps there are some circumstances where it is true, but generally it is not true).

I am very much a novice with statistics. Until recently I have avoided stats in climate, but of course, I keep running into climate science papers which introduce some statistical analysis.

So I decided to get up to speed and this series is aimed at getting me up to speed as well as, hopefully, providing some enlightenment to the few people around who know less than me about the subject. In this series of articles I will ask questions that I hope people will answer, and also I will make confident statements that will turn out to be completely or partially wrong – I expect knowledgeable readers to put us all straight when this happens.

One of the reasons I have avoided stats is that I have found it difficult to understand the correct application of the ideas from statistics to climate science. And I have had a suspicion (that I cannot yet prove and may be totally wrong about) that some statistical analysis of climate is relying on unproven and unstated assumptions. All for the road ahead.

First, a few basics. They will be sketchy basics – to avoid it being part 30 by the time we get to interesting stuff – and so if there are questions about the very basics, please ask.

In this article:

  • independence, or independent events
  • the normal distribution
  • sampling
  • central limit theorem
  • introduction to hypothesis testing

Independent Events

A lot of elementary statistics ideas are based around the idea of independent events. This is an important concept to understand.

One example would be flipping a coin. The value I get this time is totally independent of the value I got last time. Even if I have just flipped 5 heads in a row, assuming I have a normal unbiased coin, I have a 50/50 chance of getting another head.

Many people, especially people with “systems for beating casinos”, don’t understand this point. Although there is only a 1/25 = 1/32 = 3% chance of getting 5 heads in a row, once it has happened the chance of getting one more head is 50%. Many people will calculate the chance – in advance – of getting 6 heads in a row (=1.6%) and say that because 5 heads have already been flipped, therefore the probability of getting the 6th head is 1.6%. Completely wrong.

Another way to think about this interesting subject is that the chance of getting H T H T H T H T is just as unlikely as getting H H H H H H H H. Both have a 1/28 = 1/256 = 0.4% chance.

On the other hand, the chance of getting 4 heads and 4 tails out of 8 throws is much more likely, so long as you don’t specify the order like we did above.

If you send 100 people to the casino for a night, most will lose “some of their funds”, a few will lose “a lot”, and a few will win “a lot”. That doesn’t mean the winners have any inherent skill, it is just the result of the rules of chance.

A bit like fund managers who set up 20 different funds, then after a few years most have done “about the same as the market”, some have done very badly and some have done well. The results from the best performers are published, the worst performers are “rolled up” into the best funds and those who understand statistics despair of the standards of statistical analysis of the general public. But I digress..

Normal Distributions and “The Bell Curve”

The well-known normal distribution describes a lot of stuff unrelated to climate. The normal distribution is also known as a gaussian distribution.

For example, if we measure the weights of male adults in a random country we might get a normal distribution that looks like this:

BERJAYA

Figure 1

Essentially there is a grouping around the “mean” (= arithmetic average) and outliers are less likely the further away they are from the mean.

Many distributions match the normal distribution closely. And many don’t. For example, rainfall statistics are not Gaussian.

The two parameters that describe the normal distribution are:

  • the mean
  • the standard deviation

The mean is the well-known concept of the average (note that “the average” is a less-technical definition than “the mean”), and is therefore very familiar to non-statistics people.

The standard deviation is the measure of the spread of the population. In the example of figure 1 the standard deviation = 30. A normal distribution has 68% of its values within 1 standard deviation from the mean – so in figure 1 this means that 68% of the population are between 140-200 lbs. And 95% of its values are within 2 standard deviation from the mean – so 95% of the population are between 110-230 lbs.

Sampling

If there are 300M people in a country and we want to find out their weights it is a lot of work. A lot of people, a lot of scales, and a lot of questions about privacy. Even under a dictatorship it is a ton of work.

So the idea of “a sample” is born.. We measure the weights of 100 people, or 1,000 people and as a result we can make some statements about the whole population.

The population is the total collection of “things” we want to know about.

The sample is our attempt to measure some aspects of “the population” without as much work as measuring the complete population

There are many useful statistical relationships between samples and populations. One of them is the central limit theorem.

Central Limit Theorem

Let me give an example, along with some “synthetic data”, to help get this idea clear.

I have a population of 100,000 with a uniform distribution between 9 and 11. I have created this population using Matlab.

Now I take a random sample of 100 out of my population of 100,000. I measure the mean of this sample. Now I take another random sample of 100 (out of the same population) and measure the mean. I do this many many many times (100,000 times in this example below). What does the sampling distributions of the mean look like?

BERJAYA

Figure 2

Alert readers will see that the sampling distribution of the means – right side graph – looks just like the “bell curve” of the normal distribution. Yet the original population is not a normal distribution.

It turns out that regardless of the population distribution, if you have enough items in your sample, you get a normal distribution (when you plot the mean of each sample distribution).

The mean of this normal distribution (the sampling distribution of the mean) is the same as the mean of the population, and the standard deviation, s = σ/√n, where σ= standard deviation of the population, and n = number of items in one sampling distribution.

This is the central limit theorem – in non-technical language – and is the reason why the normal distribution takes on such importance in statistical analysis. We will see more in considering hypothesis testing..

Hypothesis Testing

We have zoomed through many important statistical ideas and for people new to the concepts, probably too fast. Let’s ask this one question:

If we have a sampling distribution can we asses how likely it is that is was drawn from a particular population?

Let’s pose the problem another way:

The original population is unknown to us. How do we determine the characteristics of the original population from the sample we have?

Because the probabilities around the normal distribution are very well understood, and because the sampling distribution of the mean has a normal distribution, this means that if we have just one sampling distribution we can calculate the probability that it has come from a population of specified mean and specified standard deviation.

More in the next article in the series.

Older Posts »

Follow

Get every new post delivered to your Inbox.

Join 109 other followers