I think I got where all this bs about passing around numerical ids of entities instead of entity references (maybe lazy ones) comes from. It's like an 'error code'. It comes from ancient C programming, where we simply could not allocate a string for a readable piece of text, or for data that might take some effort to instantiate or allocate.
In short. It's stupid to pass around "ids" in a program.
no subject
Date: 2015-10-02 01:31 am (UTC)
Author is right. It is convenient to have a class wrapping the Id that knows how to handle itself. It is not convenient enough to justify switching an existing large codebase, but it was convenient enough that over the years I released projects built this way into production, and it stuck after I left.
Also, MaybeId and mix-ins for JSON serialization work well. To see the actual value, just provide a toString() implementation and hover the mouse over the object in the debugger.
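A minimal sketch of such a wrapper in Java; the class name UserId and the long backing field are illustrative assumptions for this thread, not any particular codebase's API:

```java
// Illustrative Id wrapper: equality, hashing, and a debugger-friendly toString(),
// with a single explicit escape hatch (raw()) for the persistence layer.
public final class UserId {
    private final long value;

    private UserId(long value) { this.value = value; }

    public static UserId of(long value) { return new UserId(value); }

    // Only persistence/serialization code should need the raw value.
    public long raw() { return value; }

    @Override public boolean equals(Object o) {
        return o instanceof UserId && ((UserId) o).value == value;
    }

    @Override public int hashCode() { return Long.hashCode(value); }

    // Hovering over the object in a debugger shows this string.
    @Override public String toString() { return "UserId(" + value + ")"; }
}
```

The private constructor plus static factory keeps construction in one place, so switching the backing type later (say, to a UUID) touches only this class.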
As for int as the id type - in most of my projects it's not enough. It burns through MAX_INTEGER within a year, easily.
If you create 70 persistent entities per second, one year is enough to get past the ~2.1B signed 32-bit maximum. In a lot of cases these days it is more like 1k per second. And even MySQL can handle it, with replication and such. So, definitely long.
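The arithmetic behind that claim can be checked directly (assuming a steady 70 inserts per second; the class name is just for the sketch):

```java
// Back-of-the-envelope check: does 70 ids/second exhaust a signed 32-bit id
// within a year?
public class IdExhaustion {
    public static void main(String[] args) {
        long perSecond = 70;
        long secondsPerYear = 365L * 24 * 60 * 60;    // 31,536,000
        long idsPerYear = perSecond * secondsPerYear;
        System.out.println(idsPerYear);                     // 2207520000
        System.out.println((long) Integer.MAX_VALUE);       // 2147483647
        System.out.println(idsPerYear > Integer.MAX_VALUE); // true
    }
}
```

At 1k per second the 32-bit range would be gone in under a month, which is why the comment lands on long.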
no subject
Date: 2015-10-02 01:49 am (UTC)
So you reuse int Ids then.
Or do you mean that you keep all 2B records in your database?
If you have a 2B/year volume then you probably have performance issues to solve anyway.
A long->int conversion halves the index size and can roughly double index performance, since twice as many keys fit in each index page.
no subject
Date: 2015-10-02 02:11 am (UTC)
They are sharded and/or vertically partitioned later. Some are removed, but, say, a billion or so sticks around for a year, in one of the cases. Reuse is much, much more expensive in the business-logic layer (unless you have something trivial like a rolling event log) and sometimes causes terrible, hard-to-catch issues. Indexes are slower, but not twice as slow. I can't say there are never any performance problems, but I never had one related to the size of the id.
In a couple of systems where I knew I would have to master-master synchronize parts of the database across many offices (rather, data centers) in multiple world locations, I even used 128-bit UUIDs as ids right away. Never had an issue.
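For reference, the JDK already ships 128-bit random UUIDs, so a minimal sketch of that approach needs no extra infrastructure (the class name UuidIds is illustrative):

```java
import java.util.UUID;

// Each data center generates ids independently; with version-4 (random)
// UUIDs the collision probability is negligible at any realistic insert
// rate, so no cross-site coordination is needed.
public class UuidIds {
    public static void main(String[] args) {
        UUID id = UUID.randomUUID();      // 128-bit, version 4
        System.out.println(id);           // canonical 36-character form
        System.out.println(id.version()); // 4
    }
}
```

The trade-off the thread hints at: 16-byte keys make indexes larger and random UUIDs insert in random index order, which is the price paid for coordination-free generation.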
no subject
Date: 2015-10-02 02:54 am (UTC)
If you have 1B records, what do they represent, if not something trivial like a rolling events log?
What kind of physical server handles 1B records database?
1B long is 8GB just to store values of the index.
no subject
Date: 2015-10-02 06:41 am (UTC)
Anything happening frequently enough. E.g. phone calls. Or instant messages.
> What kind of physical server handles 1B records database?
> 1B long is 8GB just to store values of the index.
Dude, you have no idea. Do you seriously think database servers of this magnitude run on something like a MacBook Pro? 8GB is peanuts. Largest databases store petabytes of data.
I sometimes admire your zest when you start arguing about things you have very little experience with. Reminds me of my younger self :) Sorry for the ad hominem.
no subject
Date: 2015-10-02 09:56 am (UTC)
Are you suggesting that database size is as critical as index size?
no subject
Date: 2015-10-02 07:08 am (UTC)
1B business transactions. Sometimes tiny transactions in terms of money, but still, they require full traceability. I did stuff for telecom, bidding in ad tech, an online screen-sharing cloud for X11 3D engineering apps, medical teleradiology, etc. Don't get me wrong, I'm not alone; a lot of smart people are needed to plan things right for this scale.
A typical setup is a master-master pair, 64 GB RAM and 32 CPU cores each, on SSD RAID. But then again, sometimes there are n of those pairs, one per vertical partition set, and some stuff goes to Mongo or Aerospike etc. It depends, case by case. One way or another you end up with a couple dozen RDBMS servers in a usual production setup, no matter how much people love NoSQL.
no subject
Date: 2015-10-02 10:02 am (UTC)
At that scale, smaller tables typically perform better and are easier to maintain.
no subject
Date: 2015-10-02 05:18 pm (UTC)
In reality, both vertical partitioning (as you proposed) and sharding (managed mostly by the db engine) are employed - but these should be used sparingly and avoided whenever reasonably possible.
This is the wrong place to discuss this stuff anyway. In the world out there, things always turn out way more complex than the books and blogs suggest. Partitioning breaks things, A LOT. But sometimes it cannot be avoided past a certain size.
I don't mean to insult you in any way; I am sure you are an awesome professional with an inquiring mind. I was naive too, but with my first real 4 TB database and over 1k transactions per second came a lot of revelations.
E.g. suddenly those awesomely stable db engines, vetted for years, turn into pumpkins at midnight - just put some real load on them, not the kind you can generate from a single test server :(
Cheers!
no subject
Date: 2015-10-02 08:33 pm (UTC)
It could be a long in the application layer (C#, Java), which is then converted by an algorithm to {TableName + int RowId} when it is time to retrieve that data.
Or even keep {TableId, RowId} in the application layer, like you suggested.
I mean the key idea is to use int in database indexes instead of long.
I know that RDBMS starts to have serious issues at scale.
That's why I prefer to use int.
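One hedged sketch of that {TableId, RowId} idea in Java - the bit layout and names here are assumptions for illustration, not the commenter's actual scheme. The application layer carries a single 64-bit id; the storage layer splits it so each per-table index only ever sees a 32-bit rowId:

```java
// Pack a 32-bit table id and a 32-bit row id into one application-layer long,
// and unpack it again when routing a query to the right table.
public class CompositeId {
    static long pack(int tableId, int rowId) {
        return ((long) tableId << 32) | (rowId & 0xFFFFFFFFL);
    }

    static int tableId(long packed) { return (int) (packed >>> 32); }

    static int rowId(long packed)   { return (int) packed; }

    public static void main(String[] args) {
        long appId = pack(7, 123_456);
        System.out.println(tableId(appId)); // 7
        System.out.println(rowId(appId));   // 123456
    }
}
```

The masking with 0xFFFFFFFFL matters: without it, sign extension of a negative rowId would clobber the tableId bits.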
no subject
Date: 2015-10-02 02:04 am (UTC)
Wrapping does not mean that the Id is no longer available to programmers, right?
no subject
Date: 2015-10-02 02:28 am (UTC)
Yeah, it's a matter of interpretation of the author's post. You rarely need the actual value until you have to debug or troubleshoot manually. So you make it such that people don't need to know what is inside or how it looks, until they really do. Also, for main entities like a user or an account, where the id has to uniquely identify some global container in human-readable monthly reports or documents, an int id is very useful, no question there. No need to use long where int suffices.