From: Ben Santer
Date: Thu, 24 Apr 2008 19:34:37 -0700
To: Peter Thorne, Leopold Haimberger, Karl E. Taylor, Tom Wigley, John Lanzante, Susan Solomon, Melissa Free, peter gleckler, Phil Jones, Thomas R Karl, Steve Klein, carl mears, Doug Nychka, Gavin Schmidt, Steven Sherwood, Frank Wentz
Subject: [Fwd: JOC-08-0098 - International Journal of Climatology]

Dear folks,

I'm forwarding an email from Prof. Glenn McGregor, the IJoC editor who
is handling our paper. The email contains the comments of Reviewer #1,
and notes that comments from two additional Reviewers will be available
shortly.

Reviewer #1 read the paper very thoroughly, and makes a number of useful
comments. The Reviewer also makes some comments that I disagree with.

The good news is that Reviewer #1 begins his review (I use this personal
pronoun because I'm pretty sure I know the Reviewer's identity!) by
affirming the existence of serious statistical errors in DCPS07:

"I've read the paper under review, and also DCPS07, and I think the
present authors are entirely correct in their main point. DCPS07 failed
to account for the sampling variability in the individual model trends
and, especially, in the observational trend. This was, as I see it, a
clear-cut statistical error, and the authors deserve the opportunity to
present their counter-argument in print."

Reviewer #1 has two major concerns about our statistical analysis. Here
is my initial reaction to these concerns.

CONCERN #1: Assumption of an AR-1 model for regression residuals.

In calculating our "adjusted" standard errors, we assume that the
persistence of the regression residuals is well-described by an AR-1
model. This assumption is not unique to our analysis, and has been made
in a number of other investigations. The Reviewer would "like to see at
least some sensitivity check of the standard error formula against
alternative model assumptions." Effectively, the Reviewer is asking
whether a more complex time series model is required to describe the
persistence.
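
For readers unfamiliar with the adjustment, here is a minimal sketch of an AR-1-adjusted trend standard error. It assumes the commonly used effective-sample-size form n_e = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation of the regression residuals, and it uses n_e - 2 degrees of freedom in the residual variance. The function name `adjusted_trend_se` and the synthetic series are hypothetical illustrations, not the code used for the paper.

```python
import numpy as np

def adjusted_trend_se(y):
    """Return (trend, adjusted SE, r1, n_e) for an evenly spaced series y.

    Hypothetical helper; assumes AR-1 residuals, n_e = n * (1 - r1) / (1 + r1).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    b, a = np.polyfit(t, y, 1)              # least-squares slope and intercept
    resid = y - (a + b * t)
    # Lag-1 autocorrelation of the regression residuals
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    n_e = n * (1.0 - r1) / (1.0 + r1)       # effective sample size
    if n_e <= 2:
        raise ValueError("effective sample size too small for a trend SE")
    # Residual variance with n_e - 2 (rather than n - 2) degrees of freedom
    s_e2 = np.sum(resid ** 2) / (n_e - 2.0)
    se_b = np.sqrt(s_e2 / np.sum((t - t.mean()) ** 2))
    return b, se_b, r1, n_e

# Synthetic example: linear trend plus strongly persistent AR-1 noise
rng = np.random.default_rng(0)
noise = np.zeros(252)
for i in range(1, 252):
    noise[i] = 0.8 * noise[i - 1] + rng.standard_normal()
b, se_b, r1, n_e = adjusted_trend_se(0.01 * np.arange(252) + noise)
```

With positively autocorrelated residuals, n_e falls well below n, so the adjusted standard error is larger than the naive one.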

Estimating the order of a more complex AR model is a tricky business.
Typically, something like the BIC (Bayesian Information Criterion) or
AIC (Akaike Information Criterion) is used to do this. We could, of
course, use the BIC or AIC to estimate the order of the AR model that
best fits the regression residuals. This would be a non-trivial
undertaking. I think we would find that, for different time series, we
would obtain different estimates of the "best-fit" AR model. For
example, 20c3m runs without volcanic forcing might yield a different AR
model order than 20c3m runs with volcanic forcing. It's also entirely
likely (based on Rick Katz's experience with such AR model-fitting
exercises) that the AIC- and BIC-based estimates of the AR model order
could differ in some cases.
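
To make the AIC/BIC comparison concrete, here is a minimal sketch that fits AR(k) models of increasing order by conditional least squares and reports both criteria. It assumes the Gaussian-likelihood forms AIC = m log(sigma^2) + 2k and BIC = m log(sigma^2) + k log(m), with all orders fitted to the same m = n - k_max points so the criteria are comparable. The function name and test series are hypothetical.

```python
import numpy as np

def ar_information_criteria(x, k_max=4):
    """Fit AR(k), k = 1..k_max, by least squares; return {k: (AIC, BIC)}.

    Hypothetical helper. Every order uses the same n - k_max target points.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    m = n - k_max                       # common number of fitted points
    target = x[k_max:]
    results = {}
    for k in range(1, k_max + 1):
        # Column j holds the series lagged by j + 1 steps, aligned with target
        lags = np.column_stack([x[k_max - j - 1 : n - j - 1] for j in range(k)])
        phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
        sigma2 = np.mean((target - lags @ phi) ** 2)
        aic = m * np.log(sigma2) + 2 * k
        bic = m * np.log(sigma2) + k * np.log(m)
        results[k] = (aic, bic)
    return results

rng = np.random.default_rng(0)
x = np.zeros(300)
for i in range(1, 300):
    x[i] = 0.6 * x[i - 1] + rng.standard_normal()
crit = ar_information_criteria(x, k_max=4)
best_aic = min(crit, key=lambda k: crit[k][0])
best_bic = min(crit, key=lambda k: crit[k][1])
```

Because BIC penalizes each extra coefficient more heavily than AIC whenever log(m) > 2, BIC can never select a higher order than AIC on the same fits. The two criteria can still disagree, as noted above.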

As the Reviewer himself points out, DCPS07 "didn't make any attempt to
calculate the standard error of individual trend estimates and this
remains the major difference between the two paper." In other words, our
paired trends test incorporates statistical uncertainties for both
simulated and observed trends. In estimating these uncertainties, we
account for non-independence of the regression residuals. In contrast,
the DCPS07 trend "consistency test" does not incorporate ANY statistical
uncertainties in either observed or simulated trends. This difference in
treatment of trend uncertainties is the primary issue. The issue of
whether an AR-1 model is the most appropriate model to use for the
purpose of calculating adjusted standard errors is really a subsidiary
issue. My concern is that we could waste a lot of time looking at this
issue, without really enlightening the reader about key differences
between our significance testing procedure and the DCPS07 approach.

One solution is to calculate (for each model and observational time
series used in our paper) the parameters of an AR(K) model, where K is
the total number of time lags, and then apply equation 8.39 in Wilks
(1995) to estimate the effective sample size. We could do this for
several different K values (e.g., K=2, K=3, and K=4; we've already done
the K=1 case). We could then very briefly mention the sensitivity of our
"paired trend" test results to the choice of the AR model order K. This
would involve some work, but would be easier to explain than using the
AIC and BIC to determine the best estimate of the AR model order for
each time series.
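
A minimal sketch of the proposed sensitivity check follows. Wilks's equation 8.39 is not reproduced verbatim here. The sketch assumes the standard variance-inflation factor for an AR(K) process, V = (1 - sum_k rho_k phi_k) / (1 - sum_k phi_k)^2, with n_eff = n / V and the phi_k estimated by Yule-Walker. For K = 1 this reduces to the AR-1 result n(1 - r1)/(1 + r1), but the formula should be checked against Wilks before use. The function names and the synthetic residual series are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag of a demeaned series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def effective_sample_size_ar(x, order):
    """n / V for an AR(order) model fitted by Yule-Walker (assumed form of V)."""
    x = np.asarray(x, dtype=float)
    rho = sample_acf(x, order)
    # Yule-Walker equations: Toeplitz matrix of rho_|i-j| times phi = rho_1..rho_K
    R = np.array([[rho[abs(i - j)] for j in range(order)] for i in range(order)])
    phi = np.linalg.solve(R, rho[1 : order + 1])
    # Variance inflation factor for an AR(K) process
    V = (1.0 - np.dot(rho[1 : order + 1], phi)) / (1.0 - phi.sum()) ** 2
    return x.size / V

# Hypothetical regression residuals with AR-1 persistence
rng = np.random.default_rng(1)
resid = np.zeros(252)
for i in range(1, 252):
    resid[i] = 0.6 * resid[i - 1] + rng.standard_normal()
for K in (1, 2, 3, 4):
    print(K, round(effective_sample_size_ar(resid, K), 1))
```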

CONCERN #2: No "attempt to combine data across model runs."

The Reviewer is claiming that none of our model-vs-observed trend tests
made use of data that had been combined (averaged) across model runs.
This is incorrect. In fact, our two modified versions of the DCPS07 test
(page 29, equation 12, and page 30, equation 13) both make use of the
multi-model ensemble-mean trend.

The Reviewer argues that our paired trends test should involve the
ensemble-mean trends for each model (something which we have not done)
rather than the trends for each of 49 individual 20c3m realizations. I'm
not sure whether the rationale for doing this is as "clear-cut" as the
Reviewer contends.

Furthermore, there are at least two different ways of performing the
paired trends tests with the ensemble-mean model trends. One way (which
seems to be what the Reviewer is advocating) involves replacing in our
equation (3) the standard error of the trend for an individual
realization performed with model A with model A's intra-ensemble
standard deviation of trends. I'm a little concerned about mixing an
estimate of the statistical uncertainty of the observed trend with an
estimate of the sampling uncertainty of model A's trend.

Alternately, one could use the average (over different realizations) of
model A's adjusted standard errors, or the adjusted standard error
calculated from the ensemble-mean model A time series. I'm willing to
try some of these things, but I'm not sure how much they will enlighten
the reader. And they will not help to make an already-lengthy manuscript
any shorter.
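
The options for the model uncertainty term can be compared side by side. This minimal sketch assumes that equation (3) has the form d = (b_o - b_m) / sqrt(s_o^2 + s_m^2), where b_o and b_m are the observed and model trends and s_o and s_m their uncertainties. When b_m is model A's ensemble-mean trend, it computes three candidate values of s_m: the intra-ensemble standard deviation of trends, the average adjusted standard error, and the adjusted standard error of the ensemble-mean series. All function names and numbers are hypothetical.

```python
import numpy as np

def paired_trend_d(b_obs, se_obs, b_model, se_model):
    """Paired-trends statistic, assumed form d = (b_o - b_m) / sqrt(s_o^2 + s_m^2)."""
    return (b_obs - b_model) / np.hypot(se_obs, se_model)

def model_uncertainty_options(model_trends, model_adjusted_ses, se_of_ensemble_mean):
    """Candidate s_m values when b_m is model A's ensemble-mean trend (hypothetical)."""
    trends = np.asarray(model_trends, dtype=float)
    return {
        # Reviewer's suggestion: spread of trends across model A's realizations
        "intra_ensemble_sd": float(np.std(trends, ddof=1)),
        # Average of model A's AR-1-adjusted standard errors
        "mean_adjusted_se": float(np.mean(model_adjusted_ses)),
        # Adjusted SE computed from model A's ensemble-mean time series
        "se_of_ensemble_mean": float(se_of_ensemble_mean),
    }

# Hypothetical numbers for one model with three realizations
trends = [0.10, 0.20, 0.30]
options = model_uncertainty_options(trends, [0.05, 0.07, 0.06], 0.04)
for name, s_m in options.items():
    print(name, paired_trend_d(0.05, 0.06, np.mean(trends), s_m))
```

Only the first option mixes a sampling spread of model trends with a statistical standard error of the observed trend, which is the concern raised above.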

The Reviewer seems to be arguing that the main advantage of his approach
#2 (use of ensemble-mean model trends in significance testing) relative
to our paired trends test (his approach #1) is that non-independence of
tests is less of an issue with approach #2. I'm not sure whether I
agree. Are results from tests involving GFDL CM2.0 and GFDL CM2.1
temperature data truly "independent" given that both models were forced
with the same historical changes in anthropogenic and natural external
forcings? The same concerns apply to the high- and low-resolution
versions of the MIROC model, the GISS models, etc.

I am puzzled by some of the comments the Reviewer has made at the top of
page 3 of his review. I guess the Reviewer is making these comments in
the context of the pair-wise tests described on page 2. Crucially, the
comment that we should use "...the standard error if testing the average
model trend" (and by "standard error" he means DCPS07's sigma{SE}) IS
INCONSISTENT with the Reviewer's approach #3, which involves use of the
inter-model standard deviation in testing the average model trend.
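
The two widths at issue differ by a factor of sqrt(N - 1), so they can lead to opposite conclusions for the same trends. This minimal sketch assumes that DCPS07's sigma{SE} is sigma / sqrt(N - 1), where sigma is the inter-model standard deviation of the N model trends, and it takes sigma as the sample standard deviation, which is itself an assumption. The function name and trend values are hypothetical.

```python
import numpy as np

def consistency_checks(b_obs, model_trends):
    """Compare an observed trend with the mean model trend using two widths."""
    b = np.asarray(model_trends, dtype=float)
    n = b.size
    mean = b.mean()
    sigma = b.std(ddof=1)              # inter-model SD (sample SD assumed)
    sigma_se = sigma / np.sqrt(n - 1)  # DCPS07-style sigma{SE}
    gap = abs(b_obs - mean)
    return {
        "sigma_SE": sigma_se,
        "inter_model_sd": sigma,
        "consistent_using_sigma_SE": bool(gap <= 2 * sigma_se),
        "consistent_using_inter_model_sd": bool(gap <= 2 * sigma),
    }

print(consistency_checks(0.35, [0.10, 0.20, 0.30]))
```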

And I disagree with the Reviewer's comments regarding the superfluous
nature of Section 6. The Reviewer states that "when simulating from a
known (statistical) model... the test statistics should by definition
give the correct answer." The whole point of Section 6 is that the DCPS07
consistency test does NOT give the correct answer when applied to
randomly-generated data!
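
The following Monte Carlo sketch illustrates this point. It does not reproduce the Section 6 setup. It assumes 19 "model" series and one "observed" series of 252 time steps, all generated from the same AR noise process with zero underlying trend. A properly calibrated 5% test should therefore reject in roughly 5% of trials. The sketch counts rejections using a DCPS07-style width of 2 sigma / sqrt(N - 1) and, for comparison, a width of 2 sigma based on the inter-model standard deviation. The `phis` argument sets the AR coefficients, so a higher-order noise model such as `phis=(0.5, 0.2)` can be tried. All names and settings are hypothetical.

```python
import numpy as np

def ar_noise(rng, n_series, n_time, phis, burn=100):
    """Independent AR(K) series (K = len(phis)) with unit-variance innovations."""
    phis = np.asarray(phis, dtype=float)
    k = phis.size
    x = np.zeros((n_series, n_time + burn))
    for t in range(k, n_time + burn):
        # phi_1 * x[t-1] + ... + phi_K * x[t-K] + white noise
        x[:, t] = x[:, t - k:t][:, ::-1] @ phis + rng.standard_normal(n_series)
    return x[:, burn:]                      # drop spin-up period

def rejection_rates(n_models=19, n_time=252, phis=(0.5,), n_trials=200, seed=0):
    """Fraction of trials in which the 'observed' trend is judged inconsistent.

    All series share the same (zero) underlying trend.
    """
    rng = np.random.default_rng(seed)
    tc = np.arange(n_time) - (n_time - 1) / 2.0   # centred time axis
    rej_se = rej_sd = 0
    for _ in range(n_trials):
        y = ar_noise(rng, n_models + 1, n_time, phis)
        slopes = y @ tc / (tc @ tc)                # OLS trend of each series
        b_obs, b_mod = slopes[0], slopes[1:]
        sd = b_mod.std(ddof=1)
        gap = abs(b_obs - b_mod.mean())
        rej_se += int(gap > 2 * sd / np.sqrt(n_models - 1))  # DCPS07-style width
        rej_sd += int(gap > 2 * sd)                          # inter-model SD width
    return rej_se / n_trials, rej_sd / n_trials

print(rejection_rates())
```

Under these assumptions the DCPS07-style width rejects far more often than the nominal 5%, because it ignores the sampling variability of the single "observed" trend.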

In order to satisfy the Reviewer's curiosity, I'm perfectly willing to
repeat the simulations described in Section 6 with a higher-order AR
model. However, I don't like the idea of simulation of synthetic
volcanoes, etc. This would be a huge time sink, and would not help to
illustrate or clarify the statistical mistakes in DCPS07.

It's obvious that Reviewer #1 has put a substantial amount of effort
into reading and commenting on our paper (and even performing some
simple simulations). I'm grateful for the effort and the constructive
comments, but feel that a number of comments are off-base. Am I
misinterpreting the Reviewer's comments?

With best regards,

Ben
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675
email: santer1@llnl.gov
----------------------------------------------------------------------------




Attachment Converted: "c:\eudora\attach\- santerreport.pdf"
Date: Thu, 24 Apr 2008 15:47:33 -0400 (EDT)
From: g.mcgregor@auckland.ac.nz
To: santer1@llnl.gov
Subject: JOC-08-0098 - International Journal of Climatology

24-Apr-2008

JOC-08-0098 - Consistency of Modelled and Observed Temperature Trends in the Tropical Troposphere

Dear Dr Santer

I have received one set of comments on your paper to date. Although I would normally wait for all comments to come in before providing them to you, I thought in this case I would give you a head start in your preparation for revisions. Accordingly, please find attached one set of comments. I hope to have two more to follow in the near future.

Best,

Prof. Glenn McGregor

Attachment Converted: "c:\eudora\attach\- santerreport1.pdf"