Friday, August 30, 2013

Revisiting the question of "Has global warming stopped since 1998?"—again.

Let me be blunt: There is little evidence that global warming stopped in 1998 or any year thereafter.  Most of the evidence we have, from the energy imbalance to total heat content to ocean heat content, shows that global warming continues, as I previously explained here, here, and here.  The only piece of evidence that appears to show that global warming has stopped is that the trend in surface temperature data is not statistically significant in recent years.  However, that evidence is at best ambiguous.  No significant trend could mean that warming continues but short-term variation in the data masks the trend, that there's no warming, or that there's a cooling trend but not enough data for it to be significant.  There's no real way to tell unless you either a) add enough data for short-term variation to cancel out or b) use statistical techniques to factor out the known natural variation.

In this article, I expand on my previous analyses of surface temperature, this time using GISS, NCDC, HadCRUT4, and UAH.  Previously, I had focused on only GISS or UAH.  I first adjusted all temperature anomalies to the 1981-2010 baseline and used the technique of Foster and Rahmstorf (2011) to factor out short-term variation due to ENSO, solar cycles, and aerosols between January 1979 and March 2013.  For each start year between 1990 and 2000, I estimated the amount of autocorrelation in the temperature data before and after adjusting for short-term variation using auto.arima from the forecast package for R, and then estimated the autocorrelation-corrected trend using generalized least squares (gls) from the nlme package.  I plotted the trends ± 95% confidence intervals for raw and adjusted data to compare how the trends change depending on start year.
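For those who want to reproduce this, here's a minimal sketch of the per-start-year procedure.  The data frame Climate and the adjusted-anomaly column GISS.adj are hypothetical names standing in for my working data; everything else follows the steps described above.

```r
# Sketch: autocorrelation-corrected trend for each start year.  "Climate" is a
# hypothetical data frame with a Time column (decimal years) and an
# adjusted-anomaly column GISS.adj.
library(forecast)  # auto.arima, arimaorder
library(nlme)      # gls, corARMA

trend.since = function(start.year) {
  d = subset(Climate, Time >= start.year)
  # Let auto.arima estimate the error structure of the detrended data
  ord = arimaorder(auto.arima(resid(lm(GISS.adj ~ Time, data = d)), ic = "bic"))
  # Refit the trend with generalized least squares, assuming ARMA errors
  # (corARMA requires p + q > 0; fall back to plain lm if both are zero)
  fit = gls(GISS.adj ~ Time, data = d,
            correlation = corARMA(p = ord["p"], q = ord["q"]))
  tt = summary(fit)$tTable["Time", ]
  c(trend = tt["Value"], se = tt["Std.Error"])  # trend ± 1.96*se gives the 95% CI
}

trends = sapply(1990:2000, trend.since)  # one column per start year
```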

First, here's the raw data since 1990:

Global temperature data before adjustments for ENSO, aerosols, and the solar cycle

Not much to say that hasn't already been written.  All the data show an increase since the 1990s, with an apparent flattening of the trend since 1998 and, in the three surface datasets, a slight cooling since 2005.  After factoring out ENSO, the solar cycle, and changes in aerosols, the four datasets look like this:

Global temperature data after adjusting for ENSO, aerosols, and the solar cycle.
While variation remains, the overall trend in each data set is much easier to see after short-term variation is factored out. Now the apparent flattening doesn't begin until after 2010.  Of course, saying that temperatures have been flat for the past 3 years doesn't have quite the power of saying that they've been flat since 1998.  Plotting regression trends starting between 1990 and 2000 reveals that the trend in the adjusted data is higher than the trend in the unadjusted data and that the trend in the adjusted data is more stable than the trend in the unadjusted data.

The trend in adjusted GISS data decreases only slightly, although the change is not statistically significant, as shown by the overlap between the 95% confidence intervals.  The decline in the trend in unadjusted data is much larger.  However, that change is also not statistically significant as the 95% confidence intervals also overlap.  While the trend since 1997 is not significantly different from zero in the unadjusted data, the trend in the adjusted data remains statistically significant throughout.  The same pattern appears in the other three sets of data.




While the unadjusted data generally show a decline in the rate of warming depending on the start point, the decline is not statistically significant due to large standard errors and confidence intervals.  The trend in the adjusted data generally declines as well in the surface data but, interestingly, not in the UAH satellite data.  With less variation in the adjusted data, the standard errors of the adjusted trends are far smaller and the trends are significant regardless of start point.  The differences between the trends in adjusted versus unadjusted data clearly show the influence of short-term variation, as represented by ENSO, aerosols, and the solar cycle, in determining the trend in unadjusted temperature data, especially as the time period gets shorter.

There have been numerous claims that IPCC climate models are wrong because they don't match surface temperature data since 1998 (e.g., Fyfe et al. 2013).  While true, the reason the models don't match is not that CO2 doesn't affect global temperature or that anthropogenic climate change theory is wrong, as many deniers claim.  For either of those to be wrong, much of our understanding of atmospheric physics would have to be wrong as well, to say nothing of 152 years' worth of experimental results.  The reason most climate models don't match is that they generally account for the long-term influence of CO2 but do not accurately predict short-term variability (e.g., Rahmstorf et al. 2012).  Few climate models can accurately reproduce the timing of past ENSO events, much less predict the timing of future events; none predict changes in aerosols or solar output.  Short-term variation also cancels out in longer model runs or when averaging multiple models together.  In addition, the models overestimated the amount of change in greenhouse gas emissions and the resultant change in climate forcing (see Spencer Weart's interview with James Hansen in 2000).  It's therefore not surprising that the IPCC models do not match surface temperatures since 1998.  Once climate models are corrected for actual short-term variability and climate forcings, they replicate the slow-down in warming since 1998 quite well and show that ENSO is responsible for most of the slow-down (e.g., Kosaka and Xie 2013).

Two comments on Fyfe et al. (2013).  First, starting a temperature trend at a known outlier is bogus, as it artificially lowers the trend.  You can clearly see in the trend plots above that trends starting at 1998 are far lower than trends starting at any other year.  Starting at 1998 guarantees that the difference between observed and predicted trends will appear larger than it would starting at any other year.  Second, the data set they used, HadCRUT4, is known to show less warming than other data sets since 2000 because HadCRUT4 has poor geographic coverage of the polar regions, which happen to be the fastest-warming regions of the planet.  In short, Fyfe et al.'s choice of temperature data set and start point effectively biased their analysis toward finding that model trends were higher than observed trends, whether true or not.  For that reason, I'm surprised that Fyfe et al. (2013) got published.  Their analysis would have been much stronger if they had included other data sets (e.g., UAH, GISS, NCDC) and other start points.

What can be concluded about the slow-down in surface warming since 1998?  First, that much of the slow-down is due to short-term variation in the climate system.  A predominant cooling phase of ENSO, an increase in aerosols, and a deep solar minimum have combined to mask the long-term warming trend.  Second, that the warming trend continues unabated, as shown once ENSO and other known variation are factored out.  This means that once the short-term variation swings back toward warming phases, we'll see faster warming than we would otherwise expect.  This isn't surprising, as physical data such as the total heat content of the Earth continue to rise and the energy imbalance continues, as I've written before.  Third, just because the IPCC models are wrong does not mean the entire science is wrong.  The models don't incorporate all that we know about climate and therefore are imperfect reflections of the science.  Lastly, beware those who proclaim that global warming has stopped since 1998 based on the linear trend.  If those people were telling the full story, they'd have to admit that the rate of warming since 1998 is statistically indistinguishable from the rates of earlier years, as shown by the overlapping 95% confidence intervals.

Friday, August 23, 2013

Compensating for autocorrelation in global temperature data

Autocorrelation in global temperature data simply means that the average temperature for any one month is correlated with the average temperature of the previous month.  It is an unfortunately common problem when dealing with time series and spatial statistics.  The gist of the issue is that most of the standard statistical analysis techniques such as ANOVA, regression, and the like assume that variation in the data is random or white noise when calculating standard errors and p-values.  Autocorrelation means that the noise in the data is not random but correlated or red noise.  The degree of correlation reduces the effective size of the data set and means that the standard errors and p-values calculated from normal statistical tests will be lower than they should be and biased toward showing statistical significance when in reality the tests should not show significance.
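To see why this matters, here's a quick simulation of my own (not part of the original analysis): regress pure AR(1) red noise, which has no trend by construction, against time and count how often ordinary regression declares a "significant" trend at the 5% level.

```r
# Trendless AR(1) red noise still fools ordinary least squares:
# the false-positive rate ends up far above the nominal 5%.
set.seed(42)
n.sig = 0
for (i in 1:1000) {
  y = arima.sim(model = list(ar = 0.8), n = 240)        # 20 years of monthly red noise
  p = summary(lm(y ~ seq_along(y)))$coefficients[2, 4]  # p-value on the slope
  if (p < 0.05) n.sig = n.sig + 1
}
n.sig / 1000  # well above 0.05
```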

One of the best ways to compensate for autocorrelation is to use an Autoregressive Integrated Moving Average (ARIMA) model to account for the red noise.  Here's an example of how to construct an ARIMA model using GISS global temperature data from January 1880 to June 2013.  The first task was to isolate GISS data from the larger data object and save it as a time series:

Climate.1880 = subset(Climate, Time>=1880)
GISS.ts = ts(Climate.1880$GISS, start = c(1880, 1), frequency = 12)
plot(GISS.ts, ylab="Temperature anomaly (°C)", main="GISS global temperature anomalies January 1880-June 2013")
GISS is not a stationary time series as required for ACF and PACF analysis.  The trend is obvious.  In order to accurately measure the level of autocorrelation, we have to remove the trend, usually accomplished by differencing the data.  Differencing is when you calculate the difference between the data at time t and t-1.

GISS.diff = diff(GISS.ts)
plot(GISS.diff, main="Differenced GISS data")

While differencing once worked this time, sometimes I'll need to repeat the differencing to completely remove the trend.  The next steps are to determine the degree of autocorrelation in the data using the autocorrelation function and partial autocorrelation function on the differenced data.

acf(GISS.diff, lag.max=40)
acf(GISS.diff, lag.max=40, plot=FALSE) 
pacf(GISS.diff, lag.max=40)
pacf(GISS.diff, lag.max=40, plot=FALSE)

The ACF results show a negative correlation between adjacent months (ACF = -0.371), with none of the rest showing much correlation except for the one at lag 2.  That's likely due to random chance, as none of the intervening lags are significant.  The PACF shows a seasonal pattern in the differenced data, with the greatest negative correlation again between adjacent months and decreasing from there out to a lag of 1 year.

We can use this information to manually determine which ARIMA model will be most appropriate.  The ACF suggests AR = 1, and we had to difference the data once, so I = 1.  The MA term would have to be at least 1 but possibly 2 if the trend is accelerating.  We could figure out which is better by trial and error, or we can just allow R to figure it out for us.

library(forecast)  #auto.arima comes from the forecast package
GISS.arima = auto.arima(GISS.ts, ic = c("bic"))  #"bic" term penalizes models that have too many parameters

Series: GISS.ts
ARIMA(1,1,2)

Coefficients:
           ar1          ma1        ma2
       0.8129     -1.3465    0.3661
s.e.  0.0304      0.0438    0.0404

sigma^2 estimated as 0.01115:  log likelihood=1326.71
AIC=-2645.42   AICc=-2645.4   BIC=-2623.91

The best statistical model: ARIMA(1,1,2), as suspected.  The end result matches the observed data very well.

Graph code:
plot(GISS.ts, col="blue", type="l", ylab="Temperature anomaly (°C)", main="GISS and ARIMA model results")
points(fitted(GISS.arima), type="l", col="red")
legend(1880, 0.9, c("GISS", "ARIMA"), col=c("blue", "red"), lwd=2)
We can use the ARIMA model of the temperature data in various ways, including forecasting the short-term behavior of the data and adjusting standard errors and p-values in regression models.  How do we incorporate ARIMA models into regression models?  It's pretty easy with R, actually.  First, I calculated a regression of the GISS data as normal, assuming white noise:

Time = time(GISS.ts)-1880
Time2 = Time^2
GISS.lm = lm(GISS.ts~Time + Time2)

lm(formula = GISS.ts ~ Time + Time2)

     Min       1Q   Median       3Q      Max
-0.57989 -0.09593  0.00235  0.09732  0.56760

                   Estimate    Std. Error     t value    Pr(>|t|)  
(Intercept)  -2.474e-01  1.113e-02   -22.238   < 2e-16 ***
Time          -2.644e-03   3.852e-04  -6.864     9.53e-12 ***
Time2         6.915e-05   2.795e-06   24.737   < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1486 on 1599 degrees of freedom
Multiple R-squared: 0.7675,    Adjusted R-squared: 0.7672
F-statistic:  2639 on 2 and 1599 DF,  p-value: < 2.2e-16 
Next, I used ACF and PACF on the residuals of the regression model to check for autocorrelation:

acf(resid(GISS.lm), 40)
pacf(resid(GISS.lm), 40)
As expected, the residuals do show autocorrelation, with an ACF of -0.371 between adjacent months as well as a seasonal signal.  I then used auto.arima to compute the best model:

auto.arima(resid(GISS.lm), ic=c("bic"))

Series: resid(GISS.lm)
ARIMA(1,0,1) with zero mean    

         ar1      ma1
      0.8894  -0.4378
s.e.  0.0162   0.0337

sigma^2 estimated as 0.01114:  log likelihood=1328.93
AIC=-2651.87   AICc=-2651.85   BIC=-2635.73
The resulting model is an ARMA(1,1).  I wouldn't have expected the residuals to need differencing, as the trend had already been removed; if the residuals did need differencing, it would indicate that my regression model needed another term because it didn't capture the full trend.  And since I already included an acceleration term (Time2) in the regression model, the MA term is only first-order, to remove the seasonal cycle.  I incorporated the ARMA model into a regression analysis using generalized least squares from the nlme package:

library(nlme)  #gls and corARMA come from the nlme package
GISS.gls = gls(GISS.ts~Time + Time2, correlation=corARMA(p=1, q=1))

Generalized least squares fit by REML
  Model: GISS.ts ~ Time + Time2
  Data: NULL
       AIC         BIC           logLik
  -2603.85   -2571.587   1307.925

Correlation Structure: ARMA(1,1)
 Formula: ~1
 Parameter estimate(s):
      Phi1           Theta1
 0.8943245  -0.4420998

                       Value          Std.Error        t-value      p-value
(Intercept) -0.25014128  0.04077149   -6.135201   0.0000
Time         -0.00255438   0.00141217   -1.808825   0.0707
Time2        0.00006854   0.00001024     6.692065   0.0000

            (Intr)     Time 
Time  -0.861      
Time2  0.738   -0.968

Standardized residuals:
       Min              Q1            Med             Q3           Max
-3.8499081 -0.6388528  0.0195990  0.6492011  3.7759758

Residual standard error: 0.1501945
Degrees of freedom: 1602 total; 1599 residual

While the coefficients are similar to the white noise model, the standard errors and p-values are very different.  For example, the linear Time term is -0.002644 in the white noise model versus -0.002554 in the red noise model.  However, in the white noise model the standard error is 0.0003852, resulting in a p-value of 9.53e-12, whereas the red noise model shows that the true standard error is 0.00141217, nearly four times larger, and the true p-value, while close, is actually not significant at 0.0707.

GISS temperature data with ARIMA-adjusted regression trend added
While this seems like quite a bit of work to go through when the regression coefficients barely change, it's necessary to get the standard errors and p-values right.  And that is the ultimate goal of science.

Wednesday, August 21, 2013

Anthropogenic climate change, evolution, and scientific theories

One of the common misconceptions about science is the nature of scientific theories.  Much of the confusion stems from the word "theory" having two very different definitions.  Among non-scientists, a theory is a guess, possibly an educated guess based on some facts, but a guess nonetheless.  It's little more than an opinion and ranks below a known fact (itself usually taken to mean unchanging truth) on the hierarchy of truth.  Skeptics often dismiss climate change and evolution with statements like "Human-caused global warming is just a theory, not a fact" or "Evolution is just a theory."  Those statements reveal that the person making them is using the non-scientific definition.

In science, "fact" and "theory" have very different meanings from their nonscientific definitions.  "Facts" are data, discovered and verified via repeated observations and experiments to the point where it would be ridiculous not to acknowledge their existence.  Note that "fact" in science doesn't mean unchanging truth.  As technology advances and new experiments and observations are done, what is known as fact today can change, usually by being shown to be a subset of a larger set of facts.  Some facts are obvious to anyone, no equipment needed: Objects fall toward the ground at an accelerating rate; certain trees grow in floodplains whereas others grow on hilltops; hot air rises; the Earth is a sphere; the order of fossils in the geologic column (first worked out in the 1790s).  Other facts are not so obvious: The speed of neutrinos; the change in DNA over time due to mutations; the infrared absorption properties of CO2; the change from visible light to infrared at the Earth's surface.

"Theories" are ideas that combine known facts to explain how part of the world works.  Newton's laws of motion and gravity explained the then-known facts and still do quite well despite now being recognized as special cases of Einstein's theory of general relativity.  The sliding filament theory of muscle contraction explains the observed bands within a muscle cell and the changes in the positions of those bands as a muscle contracts.  Optimal foraging theory explains the choices animals make when selecting food.  Scientists test theories by using them to predict the results of experiments yet to be conducted and observations yet to be taken, then doing the experiments, taking the observations, and comparing the actual results with the predicted results.  If the predicted and actual results match reasonably well, great.  If they do not, it's time to modify the theory until they do match, or to come up with an entirely new theory that matches the known facts and predicts the existence of new facts better than the old one.

What does this mean for the theories of anthropogenic climate change and evolution?  Simple.  They are both scientific theories based on long-known facts (the order of the fossil record was elucidated in the late 1700s and the greenhouse effect was first proposed in the 1820s), and both have been successfully predicting the results of experiments and observations since the 19th century.  If skeptics really wanted to debunk those theories, they wouldn't rely on the general population's ignorance of what "theory" and "fact" mean in science.  They would come up with theories that fit the known facts just as well and predict the results of new experiments and observations better.  Then they would actually do experiments and take observations to show that their new theory fits the known facts and predicts experimental/observational results better.  Of course, that would mean learning the known facts about climate change and evolution in the first place and being willing to actually test their ideas against reality.  And that's something most aren't willing to do.  It's much easier to take ignorant potshots from an armchair than to do the actual work required to create and validate their own theory.

Saturday, August 10, 2013

Time series decomposition in R

Time series analysis is one of those scary-sounding terms that in reality is very simple.  All it means is that you have data where the independent variable is time (seconds, minutes, hours, days, weeks, months, or years) and a dependent variable that changes over time.  Time series analysis is just a set of methods for detecting trends in the dependent variable over time.

The most basic time series analysis is linear regression, which I previously covered here.  In this post, I'll discuss time series decomposition.  Time series decomposition means breaking a time series into its constituent parts: Trend, seasonal, and random.  The trend is the long-term change in the data.  The seasonal component is change that occurs in a regular cycle over the course of a year.  The random component is the fluctuation that is part of neither the seasonal pattern nor the trend.  I'll demonstrate time series decomposition using Antarctic sea ice data.

First, a graph of monthly average Antarctic sea ice since satellite records began in October 1978.

And here's the code I used to create it:
Ice = subset(Climate, Time>=1978.75) #create a new data object
Antarctica = ts(Ice$Antarctic.Ice, start = c(1978,10), frequency = 12)
plot(Antarctica, ylab="Sea Ice Extent (millions of square kilometers)", main="Monthly average Antarctic sea ice extent since Oct. 1978")
A quick note on the second line of code.  The "ts" command means simply "create a time series".  "Ice$Antarctic.Ice" is the standard way to isolate one single column from a larger data object.  The "start = c(1978, 10)" code just tells R that the new time series starts in October 1978 and "frequency" is how many times data is recorded over the course of a year.

Antarctic sea ice is clearly dominated by a seasonal cycle, making it difficult to see any trend in the data.  That cycle will also mess up any trend analysis, as it will make the standard errors unnecessarily large, thereby decreasing the chances of detecting significant trends.  A linear regression of the raw data gives us this:
Antarctica.lm = lm(Antarctica~time(Antarctica)) #time(Antarctica) means extract the time component
lm(formula = Antarctica ~ time(Antarctica))
    Min      1Q  Median      3Q     Max
-9.2403 -5.6873  0.2623  5.1676  7.4690
                            Estimate     Std. Error     t value    Pr(>|t|)
(Intercept)           -15.29993   54.25622     -0.282     0.778
time(Antarctica)     0.01346     0.02718      0.495     0.621
Residual standard error: 5.568 on 415 degrees of freedom
Multiple R-squared: 0.0005902, Adjusted R-squared: -0.001818
F-statistic: 0.2451 on 1 and 415 DF,  p-value: 0.6208
In this analysis, the trend of +13,460 km2 per year is nowhere near statistically significant (p = 0.6208).  The first step to removing the seasonal cycle is to confirm its existence using Fourier spectral analysis:
Antarctica.diff = diff(Antarctica) #remove any potential trends by differencing the data
spectrum(Antarctica.diff, method=c("ar"))

The main peak is at 1 year, and you can see decreasing harmonics of that cycle at 2 years, 3 years, and so on.  Removing the cycle can be done in two ways.  The first method is to decompose the data and then subtract the seasonal component to get seasonally adjusted data.
Antarctica.decomp = decompose(Antarctica)

If you want individual plots of the trend, seasonal, or random components, just type plot(Antarctica.decomp$trend), plot(Antarctica.decomp$seasonal), or plot(Antarctica.decomp$random).  Seasonally adjusted data is made by subtracting the seasonal component out of the original time series:
Antarctica.season = Antarctica - Antarctica.decomp$seasonal
plot(Antarctica.season, ylab="Sea ice extent (millions of square kilometers)", main="Seasonally adjusted Antarctic sea ice extent")
The trend is clearer, although the data are still messy, especially with that large spike in the late 1980s.  Calculating the trend by linear regression shows us this:
lm(formula = Antarctica.season ~ time(Antarctica.season))
     Min       1Q   Median       3Q      Max
-1.15876 -0.26007 -0.01012  0.26456  3.02754
                                       Estimate       Std. Error    t value     Pr(>|t|)  
(Intercept)                       -21.090252   4.393670   -4.80        2.22e-06 ***
time(Antarctica.season)   0.016377      0.002201    7.44        5.83e-13 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4509 on 415 degrees of freedom
Multiple R-squared: 0.1177, Adjusted R-squared: 0.1156
F-statistic: 55.36 on 1 and 415 DF,  p-value: 5.826e-13
Removing the yearly cycle has made detecting the trend in the data much easier.  The trend of +16,377 km2 per year is highly significant (p = 0.0000000000005826).  However, the seasonally-adjusted method has one major weakness.  It assumes that the magnitude of the yearly cycle does not change over time.  If the magnitude of the cycle is changing over time (like it is in the Arctic), then seasonally adjusting the data will result in part of the yearly cycle remaining in the data.  If that is the case, go straight to a moving average.

You can get a moving average from the decomposition data—the trend component is a 12-month moving average, averaging the 6 months prior and the 6 months after the target month.  Plotting the trend and fitting a regression line to it is simple:
Antarctica.trend = lm(Antarctica.decomp$trend~time(Antarctica.decomp$trend))
plot(Antarctica.decomp$trend, ylab="Sea ice extent (millions of square kilometers)", main="12-month moving average of Antarctic sea ice extent")
abline(Antarctica.trend, lwd=2, col="red")
This method also removes the random component from the time series in addition to the seasonal cycle, making the overall trend extremely easy to analyze.  The overall trend of +16,509 km2 per year is extremely significant (p < 0.00000000000000022).

If you just want to go straight to the moving average without first decomposing the data, you can do so using the SMA command from the TTR package.  If you don't already have TTR, install it either via your Packages Installer or via the command line:
install.packages("TTR")
and follow the prompts.  Load TTR into R with
library(TTR)
then run the simple moving average command using a 12-month running average and analyze it:
Antarctica.sma = SMA(Antarctica, n = 12)
Antarctica.sma.lm = lm(Antarctica.sma~time(Antarctica.sma))
plot(Antarctica.sma, ylab="Sea ice extent (millions of square kilometers)", main="12-month moving average of Antarctic sea ice extent")
abline(Antarctica.sma.lm, lwd=2, col="red")
The end result is nearly the same as using the trend component from the decomposition analysis, showing a trend of +16,463 km2 per year (p < 0.00000000000000022).  The difference between the decomposition trend and the SMA trend comes from a slight difference in how SMA calculates the moving average: whereas decomposition averages the previous and following 6 months, SMA averages the previous 11 months plus the target month.


Just because I know someone will ask, here's the decomposed trend for the Arctic:

With a trend of -54,533 km2 per year (p < 0.00000000000000022), the Arctic is losing far more sea ice per year than the Antarctic is gaining.

Monday, August 5, 2013

The logic is still the same...

even after six years.  This video from 2006 shows the logical absurdity of denying climate change.

Sunday, August 4, 2013

A quick tutorial in R

I've had a request for a quick tutorial on how to get started using R.  R is a statistical language that analyzes data and saves results in files called "objects."  You can then use the objects to create new analyses and objects.  Note: Wherever you see a "#", what follows is a comment.  You can delete that comment before running the code or leave it in—R ignores anything that follows a # sign.  Here's how I get my students started.
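A minimal first session might look like this (the file name climate.csv and the column names Time and GISS are placeholders for whatever data you have on hand):

```r
# Read a data file into an object, inspect it, then build new objects from it
Climate = read.csv("climate.csv")             # placeholder file name
head(Climate)                                 # look at the first six rows
summary(Climate)                              # summary statistics for each column
plot(GISS ~ Time, data = Climate)             # quick scatterplot
Climate.lm = lm(GISS ~ Time, data = Climate)  # save a regression as a new object
summary(Climate.lm)                           # reuse the saved object for new output
```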

Thursday, August 1, 2013

Yet another nail in the coffin for the "No warming since 1998" claim

As I've already covered in the past (here, here, and here), the claim that the Earth hasn't warmed since 1998 is pure bunk.  Today, I re-ran a linear regression analysis and discovered that UAH satellite temperature data now shows statistically significant warming since 1998:
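A sketch of that check, assuming the same hypothetical Climate data frame with Time (decimal years) and UAH columns used elsewhere on this blog:

```r
# Ordinary least squares trend in UAH anomalies from January 1998 onward
UAH.98 = subset(Climate, Time >= 1998)
UAH.lm = lm(UAH ~ Time, data = UAH.98)
summary(UAH.lm)                  # check the p-value on the Time coefficient
confint(UAH.lm)["Time", ] * 10   # trend per decade with its 95% CI
```

Strictly speaking, the standard errors here should also be corrected for autocorrelation, as described in the August 23 post; an uncorrected test that reaches significance despite that bias is all the more telling.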

Invasive species: The impact of Emerald Ash Borer

The impact of invasive species on native species is hitting close to home where I live right now.  Ash trees (Fraxinus sp.) were popular shade trees in my area.  However, that is changing rapidly.  Wherever I drive, I see ash trees with dead tops and thick bunches of green leaves near the crotches of their main branches due to young sprouts.  Classic signs of Emerald Ash Borer (Agrilus planipennis) infestation.