[personal profile] juan_gandhi
But I do enjoy the arguments of the opponents of the global warming (GW) theory. They rant beautifully; it feels like some kind of mental panic.

Something like this, on why there is no GW at all:

- the last five years have been the coldest in recorded history anyway;

- here in Magadan we are sick of the frosts already, let it get a bit warmer;

- in 1500, 97% of scientists believed the Earth was stationary, and whoever disagreed was burned at the stake;

- glaciers are melting on Mars too;

- everyone cares only about the temperature at the surface, and nobody cares what is going on at an altitude of 10 km;

- in the time of the dinosaurs it was sweltering anyway;

- the inertia of the ocean's behavior: the water that now comes to the surface at Bermuda (and goes on as the Gulf Stream) traveled from Antarctica, along South America, for about a thousand years;

- Greta Thunberg has not been to school for a long time;

- India and China do not care at all what the temperature is; their first priority is to feed their people;

- so what exactly should we expect here in California, droughts or floods? Every year the news says something different;

- has anyone actually studied how the Sun's behavior has changed over the last 50-100 years?

Date: 2020-01-08 03:35 pm (UTC)
From: [personal profile] sassa_nf
I understand everything about hypothesis testing. I am not clear about the preconditions being met.


"the mean value is far enough from zero so that it can't be just a natural fluctuation"

A natural fluctuation of what? Of the trend? Do we conclude "b" is nonzero? I think we will only find out whether the hypothesis that the function is linear with some slope b must be rejected.

Date: 2020-01-08 03:49 pm (UTC)
From: [personal profile] chaource
The trend function is a + b*t, so the trend coefficient is an unknown value "b". We need to decide whether b = 0 (null hypothesis) or not. We can only compute a quantity "B", the least-squares linear estimator, which is an unbiased estimator of "b": "B" is a random quantity whose probability distribution has mean E[B] equal to "b". We can also estimate the standard deviation of "B", which we call "Sigma". If | B | > 2 * Sigma, we can conclude that it is unlikely that b = 0, because we got a value of the estimator "B" that is so far from zero.
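A minimal sketch of this test in Python, for illustration only. The data are synthetic (not real temperatures), the function name `fit_trend` is made up here, and the formula for Sigma assumes that the noise U(t) is uncorrelated with constant variance, which may not hold for real temperature records.

```python
import numpy as np

def fit_trend(t, T):
    """Least-squares fit of T = A + B*t; returns (A, B, Sigma).

    Sigma is the textbook standard error of B. It assumes the noise U(t)
    is uncorrelated with constant variance (an assumption of this sketch).
    """
    t = np.asarray(t, dtype=float)
    T = np.asarray(T, dtype=float)
    n = t.size
    tc = t - t.mean()                      # centered time
    Sxx = np.sum(tc**2)
    B = np.sum(tc * (T - T.mean())) / Sxx  # slope estimate
    A = T.mean() - B * t.mean()            # intercept estimate
    resid = T - (A + B * t)
    s2 = np.sum(resid**2) / (n - 2)        # estimated noise variance
    Sigma = np.sqrt(s2 / Sxx)
    return A, B, Sigma

# Synthetic data, NOT observations: true b = 0.01 per year, noise sd = 0.2
rng = np.random.default_rng(0)
years = np.arange(1901, 2011)
T_obs = 14.0 + 0.01 * (years - 1901) + rng.normal(0.0, 0.2, years.size)

A, B, Sigma = fit_trend(years, T_obs)
reject_null = abs(B) > 2 * Sigma           # the rough two-sigma rule
print(f"B = {B:.4f}, Sigma = {Sigma:.4f}, reject b = 0: {reject_null}")
```

With real data one would pass the observed years and temperatures to `fit_trend` instead of the synthetic arrays.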

If we imagine performing this observation many times (for different time epochs, or for different imaginary Earths), we will get different values of "B" - even though "b" remains the same. This is what I called the "natural variability". We can then plot the histogram of different values of "B" and see how likely it is that actually b = 0. A simple way of doing this is assuming Gaussianity and checking 2*Sigma. If the distribution is very far from Gaussian, we would have to, say, find "B" for many 10-year intervals throughout the data set and plot a histogram of its values. But I don't think it's far from Gaussian.
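The "imaginary Earths" picture can be sketched as a small simulation. Everything here is hypothetical: the true slope, the noise level, and the assumption that the noise is independent Gaussian are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(10.0)             # one 10-year interval
true_b, noise_sd = 0.0, 0.3     # hypothetical values, for illustration only
n_earths = 5000

# Each row is one imaginary Earth: same a and b, fresh noise U(t)
T = 14.0 + true_b * t + rng.normal(0.0, noise_sd, (n_earths, t.size))

# Least-squares slope B for every row at once
tc = t - t.mean()
B = (T - T.mean(axis=1, keepdims=True)) @ tc / np.sum(tc**2)

# Histogram of B, and how often B lands more than 2 sigma from its mean
counts, edges = np.histogram(B, bins=20)
frac_beyond_2sigma = np.mean(np.abs(B - B.mean()) > 2 * B.std())
print(f"mean(B) = {B.mean():.4f}, std(B) = {B.std():.4f}")
print(f"fraction beyond 2 sigma = {frac_beyond_2sigma:.3f}")
```

The spread of `B` in this sketch is the "natural variability" of the estimator: the true `b` never changes, only the noise does.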

We can also check whether the distribution of "noise" U(t) looks like a Gaussian. Take the entire data set T(t) from 1901 to 2010, estimate a + b*t as A + B*t using least squares, and subtract from the temperature. The process T(t) - A - B*t has zero mean by construction, and is close to U(t). We can then plot the histogram of its values and see if the distribution is sufficiently close to Gaussian. I would expect that it will be.
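A rough sketch of that residual check, using a synthetic stand-in for T(t) (the numbers are invented; the real series would replace `T_obs`). Skewness and excess kurtosis are used here as a simple numerical complement to looking at the histogram; both are zero for an exact Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1901, 2011)
t = years - years.mean()        # centered time keeps the fit well conditioned
# Synthetic stand-in for the observed T(t); replace with real data
T_obs = 14.0 + 0.007 * t + rng.normal(0.0, 0.25, years.size)

B, A = np.polyfit(t, T_obs, 1)  # polyfit returns the highest power first
resid = T_obs - (A + B * t)     # approximates the noise U(t)

z = (resid - resid.mean()) / resid.std()
skewness = np.mean(z**3)                # 0 for a Gaussian
excess_kurtosis = np.mean(z**4) - 3.0   # 0 for a Gaussian
counts, edges = np.histogram(resid, bins=15)

print(f"mean of residuals = {resid.mean():.2e}")
print(f"skewness = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
```

With only about a hundred points, both statistics fluctuate noticeably even for truly Gaussian noise, so small nonzero values are not evidence against Gaussianity.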

Your calculation so far, with 10-year intervals, shows that "B" is distributed roughly within the interval [-0.2, 0.2]. If you plot a histogram of "B" for all 10-year intervals, you will probably see something like a Gaussian curve with mean close to zero. So, one will have to conclude that 10-year intervals have too much natural variability to show that the true trend "b" is nonzero; the null hypothesis cannot be refuted.
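The 10-year-window calculation could be sketched like this. The series is synthetic (a small true trend buried in noise), and `rolling_slopes` is a name invented for this example, so the numbers will not match the actual spreadsheet.

```python
import numpy as np

def rolling_slopes(t, T, width):
    """Least-squares slope B in every window of `width` consecutive points."""
    t = np.asarray(t, dtype=float)
    T = np.asarray(T, dtype=float)
    slopes = []
    for start in range(t.size - width + 1):
        tw = t[start:start + width]
        Tw = T[start:start + width]
        tc = tw - tw.mean()
        slopes.append(np.sum(tc * (Tw - Tw.mean())) / np.sum(tc**2))
    return np.array(slopes)

# Synthetic series, NOT observations: true b = 0.007 per year, noise sd = 0.25
rng = np.random.default_rng(3)
years = np.arange(1901, 2011)
T_obs = 14.0 + 0.007 * (years - 1901) + rng.normal(0.0, 0.25, years.size)

B_10 = rolling_slopes(years, T_obs, 10)
counts, edges = np.histogram(B_10, bins=12)

# Rough rule: the mean of the window slopes is well inside 2 standard deviations
null_survives = abs(B_10.mean()) < 2 * B_10.std()
print(f"mean = {B_10.mean():.4f}, std = {B_10.std():.4f}, "
      f"cannot reject b = 0: {null_survives}")
```

In this sketch the scatter of the 10-year slopes is several times larger than the true trend, which is the situation described above.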
Edited Date: 2020-01-08 04:10 pm (UTC)

Date: 2020-01-08 04:52 pm (UTC)
From: [personal profile] sassa_nf
I get this reasoning. But I have a question that precedes this.

Before estimating "b" in a + b*t, we test whether it is just a. This is not the same as estimating "b", because the result is not a choice between a non-zero "b" and a zero "b". (A "zero" b is already an estimate of b, right?) This is a test whether temperature is just a random quantity. (Perhaps, with some autocorrelation)

Then we estimate "b" to see whether a + b*t is a good description of a "trend". But it can also be seen as a test to see whether a + b*t is a good estimator of temperature. This is where I am stuck. Consequently, when we reject "b" as not a good estimate, we are not rejecting a concrete value of it, we are rejecting the hypothesis that the temperature estimator is a linear function.

Date: 2020-01-08 04:57 pm (UTC)
From: [personal profile] chaource
I don't understand this question. I don't see how we could possibly test the hypothesis that T(t) = a + U(t), without estimating b. What exactly do we compute and how do we decide that T(t) is "just a" or not "just a"?

The only way to deal with the question of trend, in my view, is to assume that T(t) = a + b*t + U(t) and to estimate "b". We can also assume other models, e.g. T(t) = a + b*t + c*t*t + U(t), etc.
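As an illustration of fitting both models, here is a sketch using `np.polyfit` with `cov=True` to get rough standard errors of the coefficients. The data are synthetic with a true c of zero, the helper name `fit_poly_trend` is invented for this example, and the standard errors again assume uncorrelated noise.

```python
import numpy as np

def fit_poly_trend(t, T, degree):
    """Least-squares polynomial fit.

    Returns (coeffs, std_errors), both ordered from the highest power down.
    """
    coeffs, cov = np.polyfit(t, T, degree, cov=True)
    return coeffs, np.sqrt(np.diag(cov))

# Synthetic data, NOT observations: linear trend only, so the true c is 0
rng = np.random.default_rng(4)
years = np.arange(1901, 2011)
t = years - years.mean()        # centered time keeps the fit well conditioned
T_obs = 14.0 + 0.007 * t + rng.normal(0.0, 0.25, years.size)

(b1, a1), (sb1, sa1) = fit_poly_trend(t, T_obs, 1)            # a + b*t
(c2, b2, a2), (sc2, sb2, sa2) = fit_poly_trend(t, T_obs, 2)   # a + b*t + c*t*t

print(f"linear:    b = {b1:.5f} +/- {sb1:.5f}")
print(f"quadratic: b = {b2:.5f} +/- {sb2:.5f}, c = {c2:.6f} +/- {sc2:.6f}")
```

The same two-sigma reasoning can then be applied to "c" to ask whether the quadratic term is distinguishable from zero.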

Date: 2020-01-08 05:05 pm (UTC)
From: [personal profile] sassa_nf
Well, can we reject the null-hypothesis that mean(T(t)) == mean(U(t))? That tells us whether it is "just a". Well, not "just a", but "something added".

I probably haven't formulated what the sticking point is for me with estimating "b".

We start with stating that our model here is a good estimate of T(t), which we show is non-linear, and want to compute "a trend". Then we build another function that is linear, "a trend", and essentially we want to prove that that is also a good estimate of T(t)! But ok, maybe the difference is that T(t) is for the whole century, and "a trend" is for a smaller period of time.
Edited Date: 2020-01-08 05:47 pm (UTC)

Date: 2020-01-08 06:01 pm (UTC)
From: [personal profile] chaource
"Well, can we reject the null-hypothesis that mean(T(t)) == mean(U(t))?"

That's my question, again: what exactly do we compute in order to reject this as a null hypothesis? I don't understand how we could do that. We can certainly ask whether the mean of T(t) is zero or not, but that would be the same as asking whether a = 0 or not. It's not asking whether T(t) is "just a" or not "just a". Can you describe what calculation needs to be performed with the T(t) data in your Excel table in order to decide this first null hypothesis?

"We start with stating that our model here is a good estimate of T(t), which we show is non-linear, and want to compute "a trend"."

I don't understand what you are saying. What exactly does it mean that T(t) is "non-linear"?

Date: 2020-01-08 08:17 pm (UTC)
From: [personal profile] sassa_nf
Compare whether means of two distributions are the same - something like this: http://www.stat.yale.edu/Courses/1997-98/101/meancomp.htm There's a bunch of approaches, but this will do for the sake of the discussion.


"What exactly does it mean that T(t) is "non-linear"?"

Maybe I am confused. Let me try and untangle what I was thinking.

So, we have a weather model, M(t). This model is a function that is not linear in the strict sense, that is, M(x+y)=M(x)+M(y) ∧ M(k*x)=k*M(x) does not hold. More to the point, there is no f(x)=a+b*x such that M(x)=f(x)+random noise. Put yet another way, we've spent so much effort modelling weather systems because they are far more complex than just a straight line.

Now we want to show that temperature has "a trend". The proposal is to estimate a and b such that the trend T(t)=a+b*t. Then we have some test procedure (which I understand).

My question is, aren't the two propositions at odds with each other? If we prove M(x) is the best fit, why do we bother with "a trend", T(t)? If we prove T(t) is the best fit, why do we bother with the model, M(x)?


Now: of course we can find a and b using least squares; this will estimate the slope that describes the temperature growth with the least error. But isn't this supposed to never be a good fit, so that what is left over is not random noise ("the least error" is not guaranteed to be normally distributed with mean 0)? We've got to reject "a trend" every time we accept that the model is non-linear!

Date: 2020-01-08 08:23 pm (UTC)
From: [personal profile] chaource
"Compare whether means of two distributions are the same"

Which two distributions are you going to compare? I only see one distribution, namely the observed T(t).

Now, as for non-linear models, of course M(t) should be a weather model that is based on some equations of atmospheric physics. But we are not considering the question of how to compute M(t). We are simply asking the question: does the observed temperature T(t) exhibit a growth trend or not, i.e. can we say that, on average, the temperature grows by X degrees per century? This implies a simple linear model, T(t) = a + b*t + U(t), where U(t) is some zero-mean random noise.

Date: 2020-01-08 08:32 pm (UTC)
From: [personal profile] sassa_nf
Oh, I see. I thought U(t) was autocorrelation describing some solar cycles, or the like. Then I was going to test whether observed T(t) and the autocorrelation U(t) match. If they don't, we know that the solar cycles are not enough, and the hunt for a better model is on.

The essence of the question about linear trend remains the same. If we manage to accept T(t)=a + b*t + random noise, we should be using T(t) as the weather predictor, not a more complex function. So I am a bit at a loss why there is some hope of having a suitable "trend" defined after some 100 years, or ever.


A completely different way of looking at it is: the question of temperature having a trend is like the question of knowing the derivative (the slope of the tangent) at a point. It has no predictive power. (Like, "what's the trend of a sine at 2019?")
Edited Date: 2020-01-08 08:39 pm (UTC)

Date: 2020-01-08 09:14 pm (UTC)
From: [personal profile] chaource
Yes, this is a very good analogy. What is the trend of T = sin(t) at t=2019? We can compute that by averaging over some period of time (e.g. between 2019 and 2019.001) and find the "trend". But it will have little predictive power if we do that over such a small interval. If we compute the trend of sin(t) over a very long interval, we will get approximately zero trend.
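The sine analogy is easy to check numerically. In this sketch the "trend" is taken to be the least-squares slope over the interval (one possible reading of "averaging"), and the grid sizes and the long interval [0, 2000] are arbitrary choices.

```python
import numpy as np

def ls_slope(t, y):
    """Least-squares slope of y against t."""
    tc = t - t.mean()
    return np.sum(tc * (y - y.mean())) / np.sum(tc**2)

t_short = np.linspace(2019.0, 2019.001, 101)   # tiny interval around t = 2019
t_long = np.linspace(0.0, 2000.0, 200001)      # many periods of the sine

slope_short = ls_slope(t_short, np.sin(t_short))
slope_long = ls_slope(t_long, np.sin(t_long))

# Over the tiny interval the slope is essentially the derivative cos(t);
# over the long interval the oscillations cancel and the slope is near zero.
print(f"short-interval trend = {slope_short:.6f}, "
      f"cos(2019.0005) = {np.cos(2019.0005):.6f}")
print(f"long-interval trend  = {slope_long:.2e}")
```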

Now, of course, sin(t) is a highly repetitive function, so if M(t) = sin(t) is the correct model of temperature, we would have a lot of predictive power. But this is not the case with temperature: we can't even predict ordinary weather for more than a week in advance, let alone for 100 years. We don't have M(t).

I am asking a very limited question: given the observed T(t), can we see a growth trend, and can we say with certainty that the temperature is growing, and that it is growing faster after 1950 than it was growing before 1950?

If somebody says that today we have "global warming" and that it has accelerated in recent years, can we verify this with observational data? What exactly do these words mean, "the temperature is growing" and "it is growing faster than before", in terms of observational data? That's all I'm asking. And we can answer that question simply by doing statistical analysis on the observed data for T(t), with no need for complicated physics.

Now, of course, that analysis will tell us nothing about predicting T(t) for the next 1000 years. It will have no predictive power for the next 1000 years. But predicting for the next 1000 years is a very hard question - and not the question I'm asking.
Edited Date: 2020-01-08 09:23 pm (UTC)

Date: 2020-01-09 08:58 am (UTC)
From: [personal profile] sassa_nf
Ok, I understand your intent. Now can we go back to the meaning of fitting "b"? Further up you mentioned that if we find "b" for different intervals, at some point it starts looking like "2 sigma different". What are you comparing to what? "b"s before 1950 and "b"s after 1950? But are "b"s before 1950 independent and identically distributed? I am not sure in what sense we can treat "b"s as a quantity to which we can apply the methods that we want to apply.

Date: 2020-01-09 01:40 pm (UTC)
From: [personal profile] chaource
So, we are back to the original question - estimate "b" from a given dataset T(t) using the assumption that T(t) = a + b*t + U(t) where U is unknown noise with zero mean.

You performed a calculation where you estimated "b" by linear fits over different time intervals. Let us first assume that the true value "b" is the same for all time from 1900 to 2020. Then you can perform a linear fit for "b" with different time intervals. For example, take the 20-year intervals 1900-1920, 1901-1921, 1902-1922 and so on until 2000-2020. The result will be about 100 different estimates of "b" (101, to be exact). They are not independent, of course, but highly correlated. Nevertheless, you can look at the resulting distribution of estimates and see if there is evidence that the mean of "b" is not zero.

You can compute the mean and the standard deviation of this set of estimates of "b". Roughly, if the mean is > 2 stdev then the mean is nonzero with high confidence. You can also use other statistical tests for nonzero mean, of course.
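A sketch of this procedure on a synthetic series (constant true "b" plus white noise, both invented for illustration). It also prints the correlation between estimates from neighbouring windows, to show how strongly the overlapping windows depend on each other.

```python
import numpy as np

# Synthetic stand-in for the observed T(t), NOT real data
rng = np.random.default_rng(5)
years = np.arange(1900, 2021)
T_obs = 14.0 + 0.008 * (years - 1900) + rng.normal(0.0, 0.25, years.size)

def window_slope(start, end):
    """Least-squares slope over the calendar years start..end inclusive."""
    mask = (years >= start) & (years <= end)
    tc = years[mask] - years[mask].mean()
    Tw = T_obs[mask]
    return np.sum(tc * (Tw - Tw.mean())) / np.sum(tc**2)

starts = np.arange(1900, 2001)     # windows 1900-1920, 1901-1921, ..., 2000-2020
b_hat = np.array([window_slope(s, s + 20) for s in starts])

mean_b = b_hat.mean()
sd_b = b_hat.std(ddof=1)
# Correlation between estimates from neighbouring, heavily overlapping windows
lag1 = np.corrcoef(b_hat[:-1], b_hat[1:])[0, 1]

print(f"{b_hat.size} estimates: mean = {mean_b:.4f}, stdev = {sd_b:.4f}")
print(f"mean > 2 stdev: {abs(mean_b) > 2 * sd_b}")
print(f"correlation between neighbouring windows = {lag1:.2f}")
```

Because neighbouring estimates share 20 of their 21 data points, the set behaves like far fewer than 101 independent values, which is why the two-stdev rule is only a rough guide here.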

Date: 2020-01-09 03:37 pm (UTC)
From: [personal profile] sassa_nf
I see. I think I understand now.

I think the criticism remains in force. If the "b"s are not iid, then "the mean is > 2 stdev" may not apply. The problem is not only the correlation between "b"s (one flavour of "dependent"), but also how they are going to be distributed (another flavour of "dependent"). Put differently, if you were to draw samples of 20 normally distributed values, and computed "b"s, would such "b"s be distributed normally? If not, then why would the two-sigma rule be meaningful?
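The last question can be probed with a quick simulation. This sketch covers only the idealized case of independent, identically distributed Gaussian values: there the least-squares slope is a fixed weighted sum of the 20 values, so it is itself Gaussian. It says nothing about correlated or non-Gaussian noise, or about overlapping windows.

```python
import numpy as np

rng = np.random.default_rng(6)
n_points, n_samples = 20, 20000
t = np.arange(n_points, dtype=float)
tc = t - t.mean()
weights = tc / np.sum(tc**2)    # slope = sum(weights * values); weights sum to 0

# Independent samples of 20 iid N(0, 1) values each, so the true b is 0
samples = rng.normal(0.0, 1.0, (n_samples, n_points))
b_hat = samples @ weights       # least-squares slope of every sample

z = (b_hat - b_hat.mean()) / b_hat.std()
skewness = np.mean(z**3)                  # 0 for a Gaussian
excess_kurtosis = np.mean(z**4) - 3.0     # 0 for a Gaussian
within_2sigma = np.mean(np.abs(z) < 2.0)  # about 0.954 for a Gaussian

print(f"std of slopes = {b_hat.std():.4f}")
print(f"skewness = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
print(f"fraction within 2 sigma = {within_2sigma:.3f}")
```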

Date: 2020-01-09 03:43 pm (UTC)
From: [personal profile] chaource
The two-sigma rule is very rough. Of course, the 100 values of "b" are not independent (although they are identically distributed). To be correct, you need to calculate the correlation between all of them. But as a very rough guide, you can take two sigma. In a more precise calculation, such as in my Fourier-based analysis, you can exactly account for correlations.
