Juan-Carlos Gandhi ([personal profile] juan_gandhi) wrote 2020-01-04 04:51 pm

not my area of expertise, of course

But I am enjoying the arguments of the opponents of the GW (global warming) theory. They spin beautiful tales; it feels like some kind of intellectual panic.

Along the lines of why there is no GW at all:

- the last five years have actually been the coldest in all of recorded history;

- here in Magadan we are sick of the frosts already; let it get warmer for once;

- in 1500, 97% of scientists believed that the Earth was stationary, and whoever disagreed was burned at the stake;

- glaciers are melting on Mars too;

- everyone cares only about the surface temperature, and nobody cares what is going on at an altitude of 10 km;

- in the age of the dinosaurs it was scorching hot anyway;

- the inertia of the ocean's behavior: the water that is now surfacing near Bermuda (and then flows on as the Gulf Stream) traveled from Antarctica along South America for about a thousand years;

- Greta Thunberg hasn't been to school in ages;

- India and China couldn't care less what the temperature is; first of all they need to feed their people;

- so what exactly should we in California expect - droughts or floods? The news changes every year;

- has anyone actually studied how the Sun's behavior has changed over the last 50-100 years?

[personal profile] sassa_nf 2020-01-08 11:44 am (UTC)(link)
Number crunching time.

I don't know what is glossed over in all the IPCC reports, so let's take a look at what to look out for:

https://data.giss.nasa.gov/tmp/gistemp/STATIONS/tmp_425003913920_14_0/station.txt - I don't recall how I got this reference, but it's some NASA data set. You will observe lots of 999.9 - that's missing data. Not sure why it would be missing in recent years, but OK. Also, it's not global temperature, but let's take a look anyway, to see how a temperature *proxy* can behave.

What's the meaning of "trend"? Is it just a linear regression fitted with the smallest error? If so, we can use LINEST in Excel.

2007..2019 - yes, 0.04 C / year, it's warming up a lot!

but 1901..2019 - no, 0.005 C / year, it's not warming up that much. Well, the claim is that in the last decade it started warming up much faster? OK, let's compute a 10-year LINEST over every decade.

I get everything from -0.217 C / year for the decade 1986..1996 to 0.187 C / year for the decade 1978..1988.

Surely I am doing something lame here, but I want to use this as the baseline dumb thing. This little exploration is the reason why, when people say "trend", I want to know more about what they mean by that - to make sure they aren't doing the same lame thing I am doing.
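
A minimal Python sketch of the same computation (a sliding 10-year least-squares fit, like LINEST over each decade) might look as follows; the parsing of station.txt is skipped, and the toy series below is purely hypothetical:

import numpy as np

def window_trends(years, temps, window=10):
    """Least-squares slope (C/year) over every sliding window."""
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    trends = []
    for i in range(len(years) - window + 1):
        y, t = years[i:i + window], temps[i:i + window]
        trends.append((int(y[0]), np.polyfit(y, t, 1)[0]))  # [0] is the slope
    return trends

rng = np.random.default_rng(0)
years = list(range(1978, 1998))  # hypothetical stand-in for station.txt
temps = [10 + 0.02 * (y - 1978) + rng.normal(0, 0.3) for y in years]
for start, b in window_trends(years, temps):
    print(f"{start}..{start + 9}: {b:+.3f} C/year")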

[personal profile] chaource 2020-01-08 12:35 pm (UTC)(link)
I can't open the link to station.txt.

Trend can be estimated in various ways, e.g. by first detecting "seasonality" and removing it, or by simple linear regression. The result will be a quantity (a trend estimator) that has a statistical error. The main task is to compute that error, e.g. the standard deviation of the trend estimator. This is what I called "sigma".

One can compute "sigma" in various ways, but one way is to take the linear estimates of trend over different 10-year intervals and see how those trends typically differ. Then do the same for 20-year intervals. Then for 30-year intervals. For example, taking a linear estimate over various 10-year intervals, you get trend numbers between -0.2 and 0.2. This is just due to natural variability and long-term correlations of temperature. You cannot reduce this uncertainty except by taking longer time intervals. Taking linear estimates for 30-year intervals, I expect you to get numbers between -0.1 and 0.1 or so. Eventually, when you take a large enough time interval, you will start seeing a trend value that is definitely two sigma away from zero. The main question is how large the time interval must be for that. I estimated 100 years.

After that, there is the second question - did the trend change after 1950. Again, you need to show this beyond two sigma. Right now, I think the data is insufficient for that. Using my estimates of sigma, I expect that we need to wait until at least 2050 to see if the trend is the same or different after 1950.

Because of the natural variability, it is impossible to show that the trend is equal to X in 2000-2010 and to Y in 2010-2020. The values X and Y are statistically indistinguishable because of too much natural variability and because of large time correlations in the temperature. The data shows very clearly that a decade is far too short for us to be able to estimate the trend reliably - even if we had no missing data, and even if we could measure the temperature with precision of 0.001 C every microsecond at every point on the Earth and in the atmosphere over a 1 mm grid in three dimensions. Natural variability together with time correlations makes this impossible.
Edited 2020-01-08 13:03 (UTC)
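
A rough sketch of this procedure, on a hypothetical series (note that overlapping windows give correlated slopes, so this is only a crude gauge of the spread):

import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1901, 2020, dtype=float)
temps = 0.005 * (years - 1901) + rng.normal(0, 0.4, years.size)

for w in (10, 20, 30, 50):
    slopes = [np.polyfit(years[i:i + w], temps[i:i + w], 1)[0]
              for i in range(years.size - w + 1)]
    print(f"{w}-year windows: slopes in [{min(slopes):+.3f}, {max(slopes):+.3f}], "
          f"std = {np.std(slopes):.3f} C/year")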

[personal profile] sassa_nf 2020-01-08 01:58 pm (UTC)(link)
Re link: hmmm... it is odd, but it is from here - download monthly data as text.

[personal profile] chaource 2020-01-08 02:02 pm (UTC)(link)
Most likely, the site checks whether the "referer" in the browser is set correctly. Thank you!

[personal profile] sassa_nf 2020-01-08 02:03 pm (UTC)(link)
Yes, what you are saying makes a lot of sense to me. I am wondering what hides behind the "consensus" of "a trend of 0.2 C / decade". There must be lots of caveats, but they don't get talked about.

[personal profile] sassa_nf 2020-01-08 02:25 pm (UTC)(link)
"Eventually, when you take a large enough time interval, you will start seeing a trend value that is definitely two sigma away from zero."

I'd like to understand this bit more. On what grounds do we use "two sigma" here? Is the slope ("the trend") meant to be normally distributed?

I mean, I can't wrap my head around the double assumption: if T were a normally distributed value, then "the trend" would be some f(T) that is not necessarily normally distributed - it would be for some f, and wouldn't be for some other f. But for "the trend" to be non-zero, T must not be a normally distributed value; why do we assume f(T) is normally distributed?.. How do we know that's the right hypothesis to test?

I mean, even the absence of a normal distribution for f(T) doesn't mean there is "a trend" - T may still be a normally distributed random value while f is such that f(T) is not normally distributed, because the f(T) are not i.i.d. (not "independent").

[personal profile] chaource 2020-01-08 02:41 pm (UTC)(link)
The main assumption is that T(t) as a function of time (t) is a sum of a zero-mean unknown "noise" U(t) and a linear trend a+b*t. We are estimating the trend coefficient "b". For simplicity, we may also assume that the "noise" is a Gaussian random process with zero mean and some fixed, stationary auto-correlation function.

The least-squares estimator for the trend coefficient "b" is a linear function of T(t). So, the trend estimator is then also a Gaussian distributed value with some (possibly nonzero) mean and some standard deviation. Our goal is to compute the mean and the standard deviation of the estimator for "b". This was the main goal in my Fourier-based analysis that I pointed out before.

The assumption of Gaussian or normal distributions is not at all important. We will conclude that the trend is nonzero not because the distribution is not normal or not Gaussian, but because the mean value is far enough from zero so that it can't be just a natural fluctuation. The mean value and the standard deviation are defined just as well for non-Gaussian distributions.

Even if the distribution of noise is not Gaussian, we still assume that it has zero mean (by definition, it is "noise" and has no trend). So, we can still assume that U(t) has some fixed auto-correlation function. The estimator for "b" is still a linear function of T(t), so we can compute the standard deviation of b.

The usual procedure is to require that some value is beyond 2 sigma away from zero. The null hypothesis is that there is zero trend. We can refute the null hypothesis with high confidence if we show that the estimate of "b" is more than 2 sigma away from zero.
Edited 2020-01-08 14:48 (UTC)
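
One way to see this argument concretely is to simulate T(t) = a + b*t + U(t) many times and look at the distribution of the least-squares estimator "B". The AR(1) auto-correlation structure and all parameter values below are assumptions for illustration, not values taken from the data:

import numpy as np

rng = np.random.default_rng(2)
a, b, n, trials = 0.0, 0.005, 100, 2000   # 100-year window, b in C/year
phi, sd = 0.6, 0.3                        # assumed AR(1) noise parameters

t = np.arange(n, dtype=float)
B = np.empty(trials)
for k in range(trials):
    u = np.zeros(n)
    for i in range(1, n):                 # auto-correlated "noise" U(t)
        u[i] = phi * u[i - 1] + rng.normal(0, sd)
    B[k] = np.polyfit(t, a + b * t + u, 1)[0]

print(f"E[B] ~ {B.mean():+.4f} (true b = {b}), Sigma ~ {B.std():.4f}")
print("2-sigma detection:", abs(B.mean()) > 2 * B.std())

Since B is a linear function of the Gaussian T(t), its histogram over the trials comes out Gaussian, centered on the true b.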

[personal profile] chaource 2020-01-08 03:07 pm (UTC)(link)
What procedure would instead be appropriate for a drilling process? What are the estimated quantities, and how should they be estimated?

[personal profile] sassa_nf 2020-01-08 03:35 pm (UTC)(link)
I understand everything about hypothesis testing. I am not clear about the preconditions being met.


"the mean value is far enough from zero so that it can't be just a natural fluctuation"

A natural fluctuation of what? Of the trend? Do we conclude "b" is nonzero? I think we will only find out whether the hypothesis that the function is linear with some b must be rejected.

[personal profile] chaource 2020-01-08 03:49 pm (UTC)(link)
The trend function is a + b*t, so the trend coefficient is an unknown value "b". We need to decide whether b = 0 (null hypothesis) or not. We can only compute a quantity "B", which is an unbiased estimator of "b" (the least-squares linear estimator). The quantity "B" is a random quantity whose probability distribution has mean E[B] equal to "b". We can also estimate the standard deviation of "B", which we call "Sigma". If the observed | B | > 2 * Sigma, we can conclude that it is unlikely that b = 0 - because we got a value of the estimator "B" that is too far from zero to be a natural fluctuation around zero.

If we imagine performing this observation many times (for different time epochs, or for different imaginary Earths), we will get different values of "B" - even though "b" remains the same. This is what I called the "natural variability". We can then plot the histogram of different values of "B" and see how likely it is that actually b = 0. A simple way of doing this is assuming Gaussianity and checking 2*Sigma. If the distribution is very far from Gaussian, we would have to, say, find "B" for many 10-year intervals throughout the data set and plot a histogram of its values. But I don't think it's far from Gaussian.

We can also check whether the distribution of "noise" U(t) looks like a Gaussian. Take the entire data set T(t) from 1901 to 2010, estimate a + b*t as A + B*t using least squares, and subtract it from the temperature. The process T(t) - A - B*t has zero mean by construction and is close to U(t). We can then plot the histogram of its values and see if the distribution is sufficiently close to Gaussian. I would expect that it will be.

Your calculation so far, with 10-year intervals, shows that "B" is distributed roughly within the interval [-0.2, 0.2]. If you plot a histogram of "B" for all 10-year intervals, you will probably see something like a Gaussian curve with mean close to zero. So, one will have to conclude that 10-year intervals have too much natural variability to show that the true trend "b" is nonzero; the null hypothesis cannot be refuted.
Edited 2020-01-08 16:10 (UTC)
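
A sketch of that residual check, again on a hypothetical series (with the real data you would load station.txt here instead):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
years = np.arange(1901, 2020, dtype=float)
temps = 0.005 * (years - 1901) + rng.normal(0, 0.4, years.size)

B, A = np.polyfit(years, temps, 1)    # polyfit returns the slope first
residuals = temps - (A + B * years)   # ~ U(t); zero mean by construction
print(f"residual mean: {residuals.mean():+.2e}")
print(f"normality test p-value: {stats.normaltest(residuals).pvalue:.3f}")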

[personal profile] sassa_nf 2020-01-08 04:52 pm (UTC)(link)
I get this reasoning. But I have a question that precedes this.

Before estimating "b" in a + b*t, we test whether it is just a. This is not the same as estimating "b", because the result is not a choice between a non-zero "b" and a zero "b". (A "zero" b is already an estimate of b, right?) This is a test of whether temperature is just a random quantity (perhaps with some autocorrelation).

Then we estimate "b" to see whether a + b*t is a good description of a "trend". But it can also be seen as a test of whether a + b*t is a good estimator of the temperature. This is where I am stuck. Consequently, when we reject "b" as not a good estimate, we are not rejecting a concrete value of it; we are rejecting the hypothesis that the temperature estimator is a linear function.

[personal profile] chaource 2020-01-08 04:57 pm (UTC)(link)
I don't understand this question. I don't see how we could possibly test the hypothesis that T(t) = a + U(t), without estimating b. What exactly do we compute and how do we decide that T(t) is "just a" or not "just a"?

The only way to deal with the question of trend, in my view, is to assume that T(t) = a + b*t + U(t) and to estimate "b". We can also assume other models, e.g. T(t) = a + b*t + c*t*t + U(t), etc.

[personal profile] sassa_nf 2020-01-08 05:05 pm (UTC)(link)
Well, can we reject the null-hypothesis that mean(T(t)) == mean(U(t))? That tells us whether it is "just a". Well, not "just a", but "something added".

I probably haven't formulated what the sticking point is for me with estimating "b".

We start by stating that our model is a good estimate of T(t), which we show to be non-linear, and then want to compute "a trend". Then we build another function that is linear - "a trend" - and essentially we want to prove that it is also a good estimate of T(t)! But OK, maybe the difference is that T(t) is for the whole century, while "a trend" is for a smaller period of time.
Edited 2020-01-08 17:47 (UTC)

[personal profile] chaource 2020-01-08 06:01 pm (UTC)(link)
Well, can we reject the null-hypothesis that mean(T(t)) == mean(U(t))?

That's my question, again: what exactly do we compute in order to reject this as a null hypothesis? I don't understand how we could do that. We can certainly ask whether the mean of T(t) is zero or not, but that would be the same as asking whether a = 0 or not. It's not asking whether T(t) is "just a" or not "just a". Can you describe what calculation needs to be performed with the T(t) data in your Excel table in order to decide this first null hypothesis?

We start with stating that our model here is a good estimate of T(t), which we show is non-linear, and want to compute "a trend".

I don't understand what you are saying. What exactly does it mean that T(t) is "non-linear"?

[personal profile] sassa_nf 2020-01-08 08:17 pm (UTC)(link)
Compare whether the means of two distributions are the same - something like this: http://www.stat.yale.edu/Courses/1997-98/101/meancomp.htm - there are a bunch of approaches, but this will do for the sake of the discussion.
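
For the sake of the discussion, a minimal sketch of such a comparison with a standard two-sample t-test (both samples hypothetical, standing in for T(t) over two different periods); note the test assumes independent samples, which is exactly what the time correlations discussed above call into question:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample_a = rng.normal(0.0, 1.0, 200)  # e.g. T(t) over one period
sample_b = rng.normal(0.2, 1.0, 200)  # e.g. T(t) over another period
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")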


"What exactly does it mean that T(t) is "non-linear"?"

Maybe I am confused. Let me try and untangle what I was thinking.

So, we have a weather model, M(t). This model is a function that is not linear - that is, M(x+y)=M(x)+M(y) ∧ M(k*x)=k*M(x) does not hold. In other words, there is no f(x)=a+b*x such that M(x)=f(x)+random noise. Put yet another way, we've spent so much effort modelling weather systems because they are far more complex than just a straight line.

Now we want to show that temperature has "a trend". The proposal is to estimate a and b such that the trend T(t)=a+b*t. Then we have some test procedure (which I understand).

My question is, aren't the two propositions at odds with each other? If we prove M(x) is the best fit, why do we bother with "a trend", T(t)? If we prove T(t) is the best fit, why do we bother with the model, M(x)?


Now: of course we can find a and b using least squares; this will estimate the slope that describes the temperature growth with the least error. But isn't this supposed never to be a good fit, so that it is bound to have non-random noise added to it ("the least error" is not guaranteed to be normally distributed with mean 0)? We've got to reject "a trend" every time we accept that the model is non-linear!


[personal profile] pappadeux 2020-01-10 12:26 am (UTC)(link)
I think I see how you are overestimating the error.

> T(t) as a function of time (t) is a sum of a zero-mean unknown "noise" U(t) and a linear trend a+b*t.

No: if you take as your basis the annually averaged temperature - the thing usually published by observers and models - then you have to account for the known multi-year variations.

Above all, this means the oscillations (El Niño / La Niña). Roughly speaking, annual averaging removes the seasonal fluctuations but leaves the multi-year regular effects in place.

I.e., the function should be a + O(t) + GW(t) + noise.

In practice, to a first approximation, IIRC, it is enough to account for the oscillations and the 11-year solar cycle.

You cannot shove phenomena that are known to us into the noise.

[personal profile] sassa_nf 2020-01-10 07:54 am (UTC)(link)
The discussion is about how the growth rate is extracted from observable data.

Then, once we extract that, how is the confidence interval estimated? Surely this should work even when we don't have any model?

(Of course, knowing some cycles and oscillations is going to help, but it shouldn't be a prerequisite, right? Otherwise we can't tell the growth rate until we have the exact model!)

And the alarming growth rates - did they publish confidence intervals for those? I can't find them easily.

[personal profile] chaource 2020-01-10 09:36 am (UTC)(link)
> it is enough to account for the oscillations and the 11-year solar cycle

The solar cycle I understand, but what exactly did you mean here by the word "oscillations"? What is the period of that function, and what is the physical mechanism that generates these temperature fluctuations?

Seasonal temperature fluctuations had already been removed before my calculations, since I was looking at the graph of annual mean temperature. That graph shows large and irregular oscillations.

If there were a well-predicted, regular oscillation with an 11-year period, then I fully agree: it should be subtracted rather than treated as noise, and I agree that sigma would come out overestimated if that were not done. However, I have not come across any mention of the solar cycle, or of any other known cyclic non-seasonal oscillations, in the papers on estimating sigma that I have read. Which papers have you read? Give links.

The main contribution to my estimate of sigma, as far as I remember, came from a large and irregular temperature fluctuation on a scale of about 30 years. The global temperature fell at the beginning of the 20th century, then rose in 1930-1940 (by then they were already predicting the complete disappearance of Arctic ice by 1950), then fell again until the 1970s (when climatologists were writing that global cooling would kill us all), then rose again until 2000 (climatologists, accordingly, predicted the disappearance of Arctic ice by 2020). Nothing that the climatologists predicted in the 20th century came to pass, i.e., they were badly underestimating sigma in their calculations.
Edited 2020-01-10 10:09 (UTC)

[personal profile] pappadeux 2020-01-10 01:59 pm (UTC)(link)
> The solar cycle I understand, but what exactly did you mean here by the word "oscillations"?

ENSO, El Nino - La Nina, "the boy and the girl"

https://www.weather.gov/mhx/ensowhat

[personal profile] chaource 2020-01-10 02:06 pm (UTC)(link)
I see. But then these are not deterministic fluctuations, since we can predict neither their occurrence nor their magnitude. Reading the weather.gov explanation:

On periods ranging from about three to seven years, the surface waters across a large swath of the tropical Pacific Ocean warm or cool by anywhere from 1°C to 3°C, compared to normal.

This is precisely one of the components of chaotic noise, i.e., something we are not in a position to predict, yet it "can have a strong influence on weather across the United States and other parts of the world". So it does contribute something to "sigma".

[personal profile] pappadeux 2020-01-10 02:54 pm (UTC)(link)
> But then these are not deterministic fluctuations, since we can predict neither their occurrence nor their magnitude.

<surprised>

It's not as if we cannot predict them at all.

There are people who work on the oscillations; there are models; there are reconstructions of past oscillations.

> This is precisely one of the components of chaotic noise, i.e., something we are not in a position to predict

<even more surprised>

Why does a phenomenon that we, let's say, don't understand very well immediately become chaos?

It is quite a regular phenomenon, it has been reconstructed back through past centuries, and temperature graphs often mark "an El Niño year" or "a La Niña year".

> So it does contribute something to "sigma".

No, of course not.

The oscillations are something we know - a regular phenomenon, not irregular noise. Therefore, when determining the forcing, we must account for the oscillations as a separate factor, a separate term in the equation, rather than lumping them wholesale into the noise.


Regarding the solar cycle, here is, as far as I understand, one of the foundational papers:

https://pubs.giss.nasa.gov/docs/2008/2008_Rind_ri07700f.pdf

[personal profile] sassa_nf 2020-01-10 03:34 pm (UTC)(link)
"сразу становится хаосом"

It doesn't. But when you don't have a function for it, you need to test whether there is anything there apart from noise. That's the purpose of testing whether we should reject the null hypothesis (that all there is, is just white noise).

You start with:

1. OK, this is a normal distribution - no function, nothing, just random temperature. Oops, that holds at a p-value of 0.5.
2. OK, here is a function that removes the solar cycles; now the rest is white noise. Oops, the remainder is white noise at a p-value of 0.2.
3. OK, here is a function that removes (some other cycle); now the rest is white noise. Oops, the remainder is white noise at a p-value of 0.15.
4. ...
5. OK, here is a linear trend of 0.8 C per century, and 0.2 C in the last decade, which removes the ever-accelerating global warming; now the rest is white noise. Oops... what p-value do we get?


(ermmm... well, I don't do this stuff for a living, so I am allowed to make imprecise statements like the above. In reality you'd need to turn them "inside out" - because the p-value means "if we reject the hypothesis that such-and-such is white noise, what's the probability of making a mistake" - so as we get more and more accurate theories, the p-value should increase, indicating that what remains is closer and closer to white noise.)
Edited 2020-01-11 11:22 (UTC)
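
A loose sketch of that "strip a component, then test for whiteness" loop; the whiteness check here is a crude lag-1 autocorrelation test rather than a proper portmanteau test, and the components and data are hypothetical:

import numpy as np

def lag1(x):
    """Lag-1 autocorrelation; under whiteness, roughly Normal(0, 1/sqrt(n))."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(5)
t = np.arange(1200, dtype=float)              # monthly steps
solar = 0.2 * np.sin(2 * np.pi * t / 132)     # ~11-year cycle
trend = 0.0007 * t                            # ~0.8 C per century
T = trend + solar + rng.normal(0, 0.3, t.size)

residual = T.copy()
for name, comp in [("solar cycle", solar), ("linear trend", trend)]:
    residual = residual - comp
    r = lag1(residual)
    print(f"after removing {name}: lag-1 r = {r:+.3f}, z = {r * np.sqrt(t.size):+.2f}")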