In principle, everything was ready today, and production deployment was approved.
And another small perf fix was approved too.
So I merged both, and deployed my oauth to prod. Btw, it's a new quarter, and a lot of management is away somewhere.
Deployed to prod, yes, and then decided, let me see if my fixed api populates the db with new signing keys.
Nope. It was not responding.
Ok, then I went to Swagger and tried, first, the APIs that I wrote - oops, they hang - then the other APIs on that server. It all hangs. Fuck. Without thinking much, I notified my colleagues and asked them how t.f. I can roll back. Oh, they say, write a PR that changes the build version back to the one you want, and run it on Jenkins.
So I did, rolled it back from 1.0.113 to 1.0.112 - the version that had been on prod since last week.
Deployed it. Does not work. Fuck.
So I rolled it two versions back, to 1.0.110 - and this one works.
Grabbed the logs of those failures of my code from Kibana. It's all Futures timing out after 5 seconds waiting for Postgres, at the beginning of the server lifecycle, when the server tries to populate the db, just in case.
WTF is going on I don't know, but I believe there are two things (or three) I should put in before deploying again.
First, crash if the db is not healthy on start.
Second, make no more than N attempts before crashing.
Third, log properly.
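
Roughly what I have in mind for those three points, as a minimal sketch in Python rather than the real service code. Everything named here (`wait_for_db`, `DatabaseUnavailable`, the `check` callable, the default of 5 attempts) is made up for illustration; `check` stands for whatever cheap query proves the db answers.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("startup")


class DatabaseUnavailable(RuntimeError):
    """Raised when the db never passes the startup health check."""


def wait_for_db(
    check: Callable[[], None],
    max_attempts: int = 5,          # hypothetical default, the "N" above
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `check` up to `max_attempts` times; return the attempt that succeeded.

    `check` is an assumed callable that raises on failure (for example a
    timeout while waiting for the db). After the last failed attempt this
    raises DatabaseUnavailable, so the caller can crash the server instead
    of letting it hang.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            log.info("db health check passed on attempt %d/%d", attempt, max_attempts)
            return attempt
        except Exception as exc:
            last_error = exc
            log.warning(
                "db health check failed on attempt %d/%d: %r",
                attempt, max_attempts, exc,
            )
            if attempt < max_attempts:
                sleep(delay_seconds)
    raise DatabaseUnavailable(
        f"db not healthy after {max_attempts} attempts"
    ) from last_error
```

On startup the server would call `wait_for_db(...)` before accepting requests, and on `DatabaseUnavailable` log the error and exit with a non-zero code, so a broken deploy is visible right away instead of hanging every API.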
I know that the db is ok; I can actually populate it with psql, first generating a proper sql statement... and maybe running it from a script. Good idea, no? Maybe I won't need to check the sanity of the db.
I don't know. But it's 10:30 pm, and I'll think about it tomorrow morning, and deploy a proper solution.