Global Average Temperature and the Propagation of Uncertainty



Because the Frank 2010 paper is still being promoted by the author and making its rounds in the blogosphere, I thought I would dedicate an entire post to it. The publication claims that the lower bound of the annual global average temperature uncertainty is ±0.46 C (1σ) or ±0.92 C (2σ). This result is then used by the blogosphere to conclude that we do not know whether or not the planet has warmed.

Before I explain what I think is wrong with the Frank 2010 publication I'll first refer readers to the rigorous uncertainty analyses provided by Berkeley Earth via Rohde et al. 2013, GISTEMP via Lenssen et al. 2019, and HadCRUT via Morice et al. 2020. Each of these datasets unequivocally shows that the planet is warming at a rate of about +0.19 C/decade since 1979. And each uncertainty analysis confirms that the true uncertainty on the monthly values is about ±0.03 C (1σ) despite using wildly different techniques. Berkeley Earth uses jackknife resampling, GISTEMP uses a bottom-up type B evaluation, and HadCRUT uses ensembling not unlike a Monte Carlo simulation.

The entirety of the conclusion from the Frank 2010 publication boils down to this series of calculations.

(1a) σ = 0.2 "standard error" from Folland 2001

(1b) sqrt(N * 0.2^2 / (N-1)) = 0.200 where N is the number of observations (2 for daily, 60 for monthly, etc.)

(1c) sqrt(0.200^2 + 0.200^2) = 0.283

(2a) σ = 0.254 gaussian fit based on Hubbard 2002

(2b) sqrt(N * 0.254^2 / (N-1)) = 0.254 where N is the number of observations (2 for daily, 60 for monthly, etc.)

(2c) sqrt(0.254^2 + 0.254^2) = 0.359

(3) sqrt(0.283^2 + 0.359^2) = 0.46

Explanation of above: (1b) and (2b) are an attempt to propagate Tmax and Tmin uncertainties into a 30yr average used as the anomaly baseline and into an annual average. (1c) and (2c) are the propagation of uncertainty for an annual anomaly value. (3) is the combined uncertainty of the Folland and Hubbard components after propagating into anomaly values.
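For readers who want to check the arithmetic, here is a minimal sketch that reproduces the chain of calculations above. The function names are my own, and I have assumed N = 21916 (the 30yr case) in steps (1b) and (2b); for any large N the result is essentially the same.

```python
import math

def frank_time_average(sigma, n):
    """Formula used in steps (1b) and (2b): sqrt(N * sigma^2 / (N - 1))."""
    return math.sqrt(n * sigma**2 / (n - 1))

def quadrature(*terms):
    """Root sum square of the supplied uncertainties."""
    return math.sqrt(sum(t**2 for t in terms))

N_30YR = 21916  # assumption: about 2 observations per day for 30 years

folland_avg = frank_time_average(0.2, N_30YR)         # (1b)
folland_anom = quadrature(folland_avg, folland_avg)   # (1c)
hubbard_avg = frank_time_average(0.254, N_30YR)       # (2b)
hubbard_anom = quadrature(hubbard_avg, hubbard_avg)   # (2c)
combined = quadrature(folland_anom, hubbard_anom)     # (3)

print(f"(1c) {folland_anom:.3f}  (2c) {hubbard_anom:.3f}  (3) {combined:.2f}")
```

This only shows that the published numbers follow from the stated formulas. It says nothing about whether the formulas are the right ones.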

Here are the 3 mistakes I believe the author made in order of increasing egregiousness. These are based on my direct conversations with the author. Even the first mistake is egregious enough that it would get the paper rejected by a reputable peer reviewed journal. The other 2 are so egregious it defies credulity that the paper even made it into the Energy & Environment journal, which is actually more of a social and policy journal than a science journal and has a history of publishing research known for being wrong.

Mistake 1. The uncertainties provided by Folland 2001 and Hubbard 2002 are for daily Tmax and Tmin observations. The author combines these in calculation (3) via the well known root sum square or summation in quadrature rule under the assumption that Folland and Hubbard are describing two different kinds of uncertainty that must be combined. The problem is that Folland is terse on details. It is impossible to say exactly what that 0.2 figure actually means. But based on context clues I personally inferred that Folland is describing the same thing as Hubbard. They just came up with slightly different estimates of the uncertainty.

Mistake 2. The formula used in steps (1b) and (2b) is σ_avg = sqrt[N * σ^2 / (N-1)] where N is the number of observations included in a time average and σ is the daily Tmax or Tmin uncertainty. For example, for a monthly average N would be ~60 and for a 30yr average N would be ~21916. As you can see, for large N the formula reduces to σ_avg = σ, implying that the uncertainty on monthly, annual, and 30yr averages is no different than the uncertainty on daily observations. The problem is that the formula is nonsense. All texts on the propagation of uncertainty including Bevington 2003, which the author cites for this formula, clearly say that the formula is σ_avg = σ / sqrt(N). This can be confirmed via the Guide to the Expression of Uncertainty in Measurement 2008 or by using the NIST uncertainty machine, which will do the general partial derivative propagation described in Bevington and the GUM for you with an accompanying Monte Carlo simulation.
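The difference between the two formulas can be checked numerically. Below is a rough sketch, not the NIST uncertainty machine, that simulates a monthly average of N = 60 observations. It assumes the individual errors are independent and normally distributed with σ = 0.2, which is the assumption under which σ / sqrt(N) applies. The function names, the seed, and the number of trials are my own choices.

```python
import math
import random
from statistics import fmean, stdev

def frank_formula(sigma, n):
    """Formula from steps (1b) and (2b)."""
    return math.sqrt(n * sigma**2 / (n - 1))

def standard_error(sigma, n):
    """Textbook uncertainty of an average of n independent values."""
    return sigma / math.sqrt(n)

def simulated_mean_uncertainty(sigma, n, trials=5000, seed=0):
    """Standard deviation of many simulated n-observation averages.

    Assumption: errors are independent draws from N(0, sigma^2).
    """
    rng = random.Random(seed)
    means = [fmean(rng.gauss(0.0, sigma) for _ in range(n)) for _ in range(trials)]
    return stdev(means)

sigma, n = 0.2, 60
sim = simulated_mean_uncertainty(sigma, n)
print(f"formula from (1b): {frank_formula(sigma, n):.4f}")
print(f"sigma / sqrt(N):   {standard_error(sigma, n):.4f}")
print(f"simulated:         {sim:.4f}")
```

Under these assumptions the simulated spread of the averages should land close to σ / sqrt(N) and nowhere near σ.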

Mistake 3. The ±0.46 C figure is advertised as the annual global average temperature uncertainty. The problem is that it is a calculation only of the uncertainty for a station anomaly value (annual absolute mean minus 30yr average). Nowhere in the publication does the author propagate the station anomaly uncertainty into the gridding, infilling, and spatial averaging steps that all datasets require to compute a global average. Because the uncertainty of an average is lower than the uncertainty of the individual measurements that go into it, the global average temperature uncertainty will be considerably lower than the individual Tmax and Tmin uncertainties. There are actually 3 steps in which an average is taken: a) averaging station data into a monthly or annual domain, b) averaging multiple stations into a grid cell, and c) averaging all of the cells in a grid mesh to get the global average. Not only does the author not calculate a) correctly (see mistake #2 above) but he does not even perform the propagation through steps b) and c). The point is this. That ±0.46 C figure is not the uncertainty for the global average as the author and blogosphere claim.

Dr. Frank, if you stumble upon this post I would be interested in your responses to my concerns and the other concerns of those who came before me.

In a future post under this thread I'll present my own type A evaluation of the monthly global average temperature uncertainty. Will it be consistent with the more rigorous analyses I mentioned above? I'll also try to periodically update the AmericanWx audience with various statistics and publications relevant to this topic. Comments (especially criticisms) are definitely welcome. I am by no means an expert in uncertainty propagation or the methods used to compute the global average temperature. We can all learn together.


I have downloaded the following global average temperature products from several datasets. These datasets include 4 surface, 2 satellite, 1 radiosonde, and 1 reanalysis.

UAHv6 - Satellite - Spencer et al. 2016 Data: https://www.nsstc.uah.edu/data/msu/v6.0/tlt/tltglhmam_6.0.txt

RSSv4 - Satellite - Mears & Wentz 2017 Data: https://data.remss.com/msu/monthly_time_series/RSS_Monthly_MSU_AMSU_Channel_TLT_Anomalies_Land_and_Ocean_v04_0.txt

RATPAC 850-300mb - Radiosonde - Free et al. 2005 Data: https://www.ncei.noaa.gov/pub/data/ratpac/ratpac-a/

NOAAGlobalTemp v5 - Surface - Huang et al. 2020 Data: https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v5/access/timeseries/

GISTEMP v4 - Surface - Lenssen et al. 2019 Data: https://data.giss.nasa.gov/gistemp/graphs_v4/graph_data/Monthly_Mean_Global_Surface_Temperature/graph.txt

BEST - Surface - Rohde et al. 2013 Data: http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt

HadCRUTv5 - Surface - Morice et al. 2020 Data: https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/download.html

ERA5 (Copernicus) - Reanalysis - Hersbach et al. 2020 Data: https://climate.copernicus.eu/surface-air-temperature-maps

All datasets have been adjusted to a common baseline: the full range 1979-2021 average. This is done so that they can be compared with each other.
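As an illustration of what moving to a common baseline involves, here is a minimal sketch. It assumes the simplest reading, in which one mean over 1979-2021 is subtracted from every value of a dataset; a month-by-month climatology would be a different choice. The function name and the toy values are hypothetical and are not real data.

```python
from statistics import fmean

def rebaseline(series, start_year=1979, end_year=2021):
    """Shift anomalies so their mean over [start_year, end_year] is zero.

    `series` maps (year, month) -> anomaly in C.
    """
    base = fmean(v for (y, _m), v in series.items() if start_year <= y <= end_year)
    return {key: value - base for key, value in series.items()}

# Hypothetical toy values for illustration only.
toy = {(1979, 1): 0.10, (1979, 2): 0.20, (2021, 12): 0.60}
shifted = rebaseline(toy)
print(shifted)
```

Once every dataset has been shifted this way, differences between them reflect disagreement in the month-to-month values rather than a different choice of reference period.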

It should be noted that it is my understanding that of the surface datasets NOAAGlobalTemp v5 is still only a partial sphere estimate. It appears research is underway to make it full sphere. See Vose et al. 2021 for details. I do not believe this is publicly available yet.

One criticism I often see is that the global average temperature warming trend is being overestimated. To test the validity of this claim I compared each of the above datasets. I also formed an equal weighted composite of the datasets for comparison. It is important to note that the datasets do not all measure the same thing. UAH and RSS measure the lower troposphere and do so at different average heights. RATPAC is the average from 850mb to 300mb, which I selected to be representative of the UAH and RSS depths although neither UAH nor RSS is equally weighted in the 850-300mb layer. And as mentioned above NOAAGlobalTemp is a partial sphere estimate while GISTEMP, BEST, HadCRUT, and ERA are full sphere. In that context it could be argued that forming a composite is inappropriate. However, I do so here because all of these datasets are or have been used as a proxy for comparisons of the "global average temperature" whether it was appropriate to do so or not.

In the graph below I have plotted each dataset including the 13 month centered average of the composite. The formula used is Σ[Σ[Tdm, 1, m], 1, d] / (dm) where d is the number of datasets (8) and m is the number of months (13). The composite 2σ timeseries represents an implied uncertainty based on the standard deviation about the 13 month centered average of the 8 datasets that go into it. The formula used is sqrt(Σ[(Tdm - Tavg)^2, 1, dm] / (dm - 1)) * 2 which is the formula for standard deviation multiplied by 2. It is primarily confined to a range of 0.05 to 0.10 C. However, it is important to note that UAH and RSS add a considerable amount of variance to the composite. A 13 month average would be expected to have a lower uncertainty than a 1 month average. That is true in this case as well though it may be hard for the astute reader to see since typical monthly uncertainties for HadCRUT, GISTEMP, and BEST are generally on the order of 0.05 C.
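The two formulas above can be sketched in a few lines. This is my reading of them: for each center month all d × 13 values in the window are pooled, the composite is their mean, and the 2σ value is twice their sample standard deviation (the dm - 1 denominator). The function name and the toy input are hypothetical.

```python
from statistics import fmean, stdev

def composite_13mo(datasets, half_window=6):
    """Return (means, two_sigmas) for every month that has a full window.

    `datasets` is a list of equal-length monthly anomaly lists.
    """
    n_months = len(datasets[0])
    means, two_sigmas = [], []
    for i in range(half_window, n_months - half_window):
        # Pool every dataset's values in the 13 month window centered on i.
        pooled = [ds[j] for ds in datasets
                  for j in range(i - half_window, i + half_window + 1)]
        means.append(fmean(pooled))
        two_sigmas.append(2 * stdev(pooled))  # stdev divides by (dm - 1)
    return means, two_sigmas

# Toy input, not real data: two flat series that differ by 0.2 C.
toy = [[0.0] * 13, [0.2] * 13]
means, two_sigmas = composite_13mo(toy)
print(means, two_sigmas)
```

Note that the pooled standard deviation picks up both the disagreement between datasets and the real month-to-month variation inside the window, so it is an implied spread rather than a pure measurement uncertainty.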

One thing that is immediately obvious is that UAH is, by far, the low outlier with a warming trend of only +0.135 C/decade. This compares with the composite of +0.187 C/decade. Also note the large difference between UAH and RATPAC and the small difference between RSS and RATPAC. It is often claimed that UAH is a better match to the radiosonde data. This could not be further from the truth, at least when comparing with RATPAC, which contains the homogenization adjustments that make it valid for climatic trend analysis.

What I find most remarkable about this graph is the broad agreement both in terms of long term warming and the short term variation even though the datasets use wildly different methodologies and subsets of available data.



For the first statistical test of the uncertainty I focused only on HadCRUT, GISTEMP, BEST, and ERA since these are all surface datasets that have full sphere coverage. In other words, they are all measuring almost exactly the same thing.

This test will determine the difference between any two measurements. The quantity we are calculating is D = Ta - Tb where D is the difference and both Ta and Tb are temperature anomalies for the same month from randomly selected datasets. The period of evaluation is 1979/01 to 2021/12 which covers 516 months. With 4 datasets there are 6 combinations. This gives us a total of 3096 comparisons or values for Ta, Tb, and D. What we want to evaluate first is the uncertainty in the difference u(D). This is pretty simple since for type A evaluations it is the standard deviation of all values of D. In the graphic below you can see the histogram of the difference.  The distribution is pretty close to normal and has a standard deviation of 0.053 C. So we set u(D) = 0.053 C.

We're not done yet though. We know u(D) = 0.053 C, but what we really want to know is u(T). We can easily do this via the well known root sum square or summation in quadrature rule which says u(z) = sqrt[u(x)^2 + u(y)^2] for a function f in the form f(x, y) = x + y. The same rule holds for a difference f(x, y) = x - y, which is our case since D = Ta - Tb. The more fundamental concept that applies to any arbitrarily complex function f is the partial differential method, but there is no need to apply the more complex general form since our function is simple and is already known to propagate uncertainty via u(z) = sqrt[u(x)^2 + u(y)^2]. Assuming u(Ta) and u(Tb) are the same and can be represented with just u(T) the formula becomes u(D) = sqrt[2 * u(T)^2]. Solving this equation for u(T) we get u(T) = sqrt[u(D)^2 / 2]. So if u(D) = 0.053 C then u(T) ≈ 0.038 C, and our 2σ expanded uncertainty is 2*u(T) ≈ 0.075 C.
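Here is a rough sketch of the procedure. Since the real dataset files are not included here it runs on synthetic data: a made-up "true" series plus independent noise with a standard deviation of 0.038 C for each of 4 fake datasets. The function name, the seed, and the synthetic series are all my own assumptions. The only point is to show that the method recovers the noise level that was put in.

```python
import math
import random
from itertools import combinations
from statistics import stdev

def pairwise_type_a(datasets):
    """Return (u_D, u_T) from all pairwise monthly differences.

    Assumes every dataset has the same standard uncertainty u(T) and that
    the errors are independent, so u(D) = sqrt(2) * u(T).
    """
    diffs = [a - b
             for x, y in combinations(datasets, 2)
             for a, b in zip(x, y)]
    u_d = stdev(diffs)
    return u_d, u_d / math.sqrt(2)

# Synthetic demonstration only, not real temperature data.
rng = random.Random(0)
truth = [0.0015 * month for month in range(516)]
fake = [[t + rng.gauss(0.0, 0.038) for t in truth] for _ in range(4)]

u_d, u_t = pairwise_type_a(fake)
print(f"u(D) = {u_d:.3f} C, u(T) = {u_t:.3f} C, 2u(T) = {2 * u_t:.3f} C")
```

With 4 datasets and 516 months the list of differences has 6 × 516 = 3096 entries, matching the count given above. If the errors in the real datasets are correlated (they share many of the same observations) the independence assumption is only an approximation.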

That's the result. The expanded 2σ uncertainty for monthly global average temperature anomalies using a type A evaluation where we compare each dataset to the others is 0.075 C. Note that this is only one among multiple different ways a type A evaluation can be performed.




  • 1 month later...

For the second statistical test I will again focus on HadCRUT, GISTEMP, BEST, and ERA since these are all surface datasets that have full sphere coverage.

This test will compare the difference between monthly measurements of each dataset and the mean of all. The mean is considered to be the best expectation of the true value. The quantity we are calculating is Dx = Tx - Tavg where Dx is the difference between the temperature Tx of dataset x and the average temperature Tavg. The period of evaluation is 1979/01 to 2021/12 which covers 516 months. That means each dataset x has 516 measurements that can be compared to the average. We will determine the uncertainty of Dx as 2*u(Dx) by calculating the standard deviation of Dx. Since it is common practice to report uncertainty at 95% confidence we will multiply by 2 for the 2-sigma range. Again, this is a type A evaluation of uncertainty. For HadCRUT this value is 0.051 C, for Berkeley Earth it is 0.060 C, for GISTEMP it is 0.057 C, and for ERA it is 0.086 C. The average implied uncertainty is 0.065 C.


I am a little surprised by ERA. It has the highest uncertainty with respect to the mean of the datasets analyzed. I'm surprised because ERA is considered to be among the best reanalysis datasets and incorporates not only orders of magnitude more observations than the other datasets but many different kinds of observations including surface, satellite, radiosonde, etc. It has much longer tails on the distribution.

So we've calculated the average implied uncertainty 2u(D) as 0.065 C. But that is only the uncertainty of the difference with respect to the average. The average itself will have an uncertainty given by u(avg) = u(x) / sqrt(N). So u(Tavg) = u(D) / sqrt(N) = 0.0325 / sqrt(4) = 0.0163 C. So we have u(D) = 0.0325, which rounds to 0.033, and u(Tavg) = 0.0163. We will apply the root sum square rule to find the final uncertainty u(T). It is u(T) = sqrt(u(D)^2 + u(Tavg)^2) = sqrt(0.033^2 + 0.0163^2) = 0.0368 C. And using the 2-sigma convention we have 2u(T) = 0.0368 * 2 = 0.074 C.
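The arithmetic in this step is short enough to write out as a sketch. The function name is my own. One thing worth knowing is that the result is slightly sensitive to rounding: carrying u(D) = 0.0325 through unrounded gives about 0.073 C, while rounding u(D) up to 0.033 first gives the 0.074 C quoted above.

```python
import math

def combined_uncertainty(u_d, n_datasets):
    """Root sum square of u(D) and the uncertainty of the n-dataset mean."""
    u_avg = u_d / math.sqrt(n_datasets)    # u(Tavg) = u(D) / sqrt(N)
    return math.sqrt(u_d**2 + u_avg**2)    # u(T)

two_u_d = 0.065                 # average 2-sigma value from the post
u_d = two_u_d / 2               # 0.0325

two_u_t = 2 * combined_uncertainty(u_d, 4)
two_u_t_rounded_input = 2 * combined_uncertainty(0.033, 4)

print(f"2u(T) with u(D) = 0.0325: {two_u_t:.4f} C")
print(f"2u(T) with u(D) = 0.033:  {two_u_t_rounded_input:.4f} C")
```

Either way the answer agrees with the first method to within about 0.002 C.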

That's our final answer. Method 1 above yields 2u = 0.075 C while method 2 here yields 2u = 0.074 C. It isn't much of a surprise that both methods give essentially the same result since they are both calculated from the same data.



  • 2 weeks later...

I often hear that UAH is the most trustworthy and honest global average temperature dataset because they do not adjust the data. I thought it might be good to dedicate a post to the topic and debunk that myth right now. Fortunately I was able to track down a lot of the information from the US Climate Change Science Program's Temperature Trends in the Lower Atmosphere - Chapter 2 by Karl et al. 2006, for which Dr. Christy was the lead author, at least for that chapter.

Year / Version / Effect / Description / Citation

Adjustment 1: 1992 : A : unknown effect : simple bias correction : Spencer & Christy 1992

Adjustment 2: 1994 : B : -0.03 C/decade : linear diurnal drift : Christy et al. 1995

Adjustment 3: 1997 : C : +0.03 C/decade : removal of residual annual cycle related to hot target variations : Christy et al. 1998

Adjustment 4: 1998 : D : +0.10 C/decade : orbital decay : Christy et al. 2000

Adjustment 5: 1998 : D : -0.07 C/decade : removal of dependence on time variations of hot target temperature : Christy et al. 2000

Adjustment 6: 2003 : 5.0 : +0.008 C/decade : non-linear diurnal drift : Christy et al. 2003

Adjustment 7: 2004 : 5.1 : -0.004 C/decade : data criteria acceptance : Karl et al. 2006 

Adjustment 8: 2005 : 5.2 : +0.035 C/decade : diurnal drift : Spencer et al. 2006

Adjustment 9: 2017 : 6.0 : -0.03 C/decade : new method : Spencer et al. 2017 [open]

That is 0.307 C/decade worth of adjustments with a net of +0.039 C/decade.
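The two totals can be checked with a few lines. The list below is copied from adjustments 2 through 9 above; adjustment 1 is left out because its effect is listed as unknown. The variable names are my own.

```python
# Trend effect of each adjustment in C/decade, adjustments 2 through 9.
adjustments = [-0.03, 0.03, 0.10, -0.07, 0.008, -0.004, 0.035, -0.03]

total_magnitude = sum(abs(a) for a in adjustments)  # ignores sign
net_effect = sum(adjustments)                       # signs allowed to cancel

print(f"total magnitude: {total_magnitude:.3f} C/decade")
print(f"net effect:      {net_effect:+.3f} C/decade")
```

The large gap between the two numbers shows how much the individual adjustments offset one another.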



  • 2 weeks later...

Just a quick post here. Using the same procedure as above I calculated the type A uncertainty on UAH and RSS satellite monthly anomalies at ±0.16 C. This is consistent with the type B evaluation from Christy et al. 2003 of ±0.20 C and the Monte Carlo evaluation by Mears et al. 2011 of ±0.2 C.

This compares to the surface station uncertainty of ±0.07 C.

It might also be interesting to point out that Spencer & Christy 1992 first assessed the monthly uncertainty as ±0.01 C then later reevaluated it as ±0.20 C. Anyway the point is that the uncertainty on global average temperatures from satellites is significantly worse/higher than those from the surface station datasets.

