I think I got where all this bs about passing around numerical ids of entities instead of entity references (maybe lazy ones) comes from. It's like an 'error code'. It comes from ancient C programming, where we simply could not allocate a string for a readable piece of text, or for data that might take some effort to instantiate or allocate.
In short. It's stupid to pass around "ids" in a program.
no subject
Date: 2015-10-02 01:31 am (UTC)
Author is right. It is convenient to have a class wrapping the Id that knows how to handle itself. It is not convenient enough to justify switching an existing large codebase, but it was convenient enough that over the years I released projects built this way into production, and it stuck after I left.
Also, MaybeId and mix-ins for JSON serialization work well. To see the actual value, just provide a toString() implementation and hover the mouse over the object in the debugger.
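A minimal sketch of such a wrapper in Java; the class name UserId and the long backing field are illustrative assumptions for this thread, not any particular codebase's API:

```java
// Illustrative Id wrapper: equality, hashing, and a debugger-friendly toString(),
// with a single explicit escape hatch (raw()) for the persistence layer.
public final class UserId {
    private final long value;

    private UserId(long value) { this.value = value; }

    public static UserId of(long value) { return new UserId(value); }

    // Only persistence/serialization code should need the raw value.
    public long raw() { return value; }

    @Override public boolean equals(Object o) {
        return o instanceof UserId && ((UserId) o).value == value;
    }

    @Override public int hashCode() { return Long.hashCode(value); }

    // Hovering over the object in a debugger shows this string.
    @Override public String toString() { return "UserId(" + value + ")"; }
}
```

The private constructor plus static factory keeps construction in one place, so switching the backing type later (say, to a UUID) touches only this class.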
As for int as the id type - in most of my projects it's not enough. It burns through MAX_INTEGER within a year, easily.
If you create 70 persistent entities per second, one year is enough to get past the ~2.1B signed 32-bit maximum. In a lot of cases these days it is more like 1k per second. And even MySQL can handle it, with replication and such. So, definitely long.
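The arithmetic behind that claim can be checked directly (assuming a steady 70 inserts per second; the class name is just for the sketch):

```java
// Back-of-the-envelope check: does 70 ids/second exhaust a signed 32-bit id
// within a year?
public class IdExhaustion {
    public static void main(String[] args) {
        long perSecond = 70;
        long secondsPerYear = 365L * 24 * 60 * 60;    // 31,536,000
        long idsPerYear = perSecond * secondsPerYear;
        System.out.println(idsPerYear);                     // 2207520000
        System.out.println((long) Integer.MAX_VALUE);       // 2147483647
        System.out.println(idsPerYear > Integer.MAX_VALUE); // true
    }
}
```

At 1k per second the 32-bit range would be gone in under a month, which is why the comment lands on long.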
no subject
Date: 2015-10-02 01:49 am (UTC)
So you reuse int Ids then.
Or do you mean that you keep all 2B records in your database?
If you have a 2B/year volume then you probably have performance issues to solve anyway.
A long->int conversion halves the index size and can roughly double index performance, since twice as many keys fit in each index page.
no subject
Date: 2015-10-02 02:11 am (UTC)
They are sharded and/or vertically partitioned later. Some are removed, but, say, a billion or so sticks around for a year, in one of the cases. Reuse is much, much more expensive in the business-logic layer (unless you have something trivial like a rolling event log) and sometimes causes terrible, hard-to-catch issues. Indexes are slower, but not twice as slow. I can't say there are never any performance problems, but I never had one related to the size of the id.
In a couple of systems where I knew I would have to master-master synchronize parts of the database across many offices (rather, data centers) in multiple world locations, I even used 128-bit UUIDs as ids right away. Never had an issue.
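For reference, the JDK already ships 128-bit random UUIDs, so a minimal sketch of that approach needs no extra infrastructure (the class name UuidIds is illustrative):

```java
import java.util.UUID;

// Each data center generates ids independently; with version-4 (random)
// UUIDs the collision probability is negligible at any realistic insert
// rate, so no cross-site coordination is needed.
public class UuidIds {
    public static void main(String[] args) {
        UUID id = UUID.randomUUID();      // 128-bit, version 4
        System.out.println(id);           // canonical 36-character form
        System.out.println(id.version()); // 4
    }
}
```

The trade-off the thread hints at: 16-byte keys make indexes larger and random UUIDs insert in random index order, which is the price paid for coordination-free generation.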
no subject
Date: 2015-10-02 02:54 am (UTC)
If you have 1B records, what do they represent, if not something trivial like a rolling events log?
What kind of physical server handles 1B records database?
1B long is 8GB just to store values of the index.
no subject
Date: 2015-10-02 06:41 am (UTC)
Anything happening frequently enough. E.g. phone calls. Or instant messages.
> What kind of physical server handles 1B records database?
> 1B long is 8GB just to store values of the index.
Dude, you have no idea. Do you seriously think database servers of this magnitude run on something like a MacBook Pro? 8GB is peanuts. Largest databases store petabytes of data.
I sometimes admire your zest when you start arguing about things you have very little experience with. Reminds me of my younger self :) Sorry for the ad hominem.
no subject
Date: 2015-10-02 09:56 am (UTC)
Are you suggesting that database size is as critical as index size?
no subject
Date: 2015-10-02 07:08 am (UTC)
1B business transactions. Sometimes tiny transactions in terms of money, but still, they require full traceability. I did stuff for telecom, bidding in ad tech, an online screen-sharing cloud for X11 3D engineering apps, medical teleradiology, etc. Don't get me wrong, I'm not alone; a lot of smart people are needed to plan things right for this scale.
A typical setup is a master-master pair, 64 GB RAM and 32 CPU cores each, on SSD RAID. But then again, sometimes there are n of those pairs, one per vertical partition set, and some stuff goes to Mongo or Aerospike etc. It depends, case by case. One way or another you end up with a couple dozen RDBMS servers in a usual production setup, no matter how much people love NoSQL.
no subject
Date: 2015-10-02 10:02 am (UTC)
At that scale, smaller tables typically perform better and are easier to maintain.
no subject
Date: 2015-10-02 05:18 pm (UTC)
In reality, both vertical partitioning (as you proposed) and sharding (managed mostly by the db engine) are employed - but these should be used sparingly and avoided whenever reasonably possible.
This is the wrong place to discuss this stuff anyway. In the world out there, things always turn out way more complex than the books and blogs suggest. Partitioning breaks things, A LOT. But sometimes it cannot be avoided past a certain size.
I don't mean to insult you in any way; I am sure you are an awesome professional with an inquiring mind. I was naive too, but with my first real 4 TB database and over 1k transactions per second came a lot of revelations.
E.g. suddenly those awesomely stable db engines, vetted for years, turn into pumpkins at midnight - just put some real load on them, not the kind you can generate from a single test server :(
Cheers!
no subject
Date: 2015-10-02 08:33 pm (UTC)
It could be a long in the application layer (C#, Java), which is then converted by an algorithm to {TableName + int RowId} when it is time to retrieve that data.
Or even keep {TableId, RowId} in the application layer, like you suggested.
I mean the key idea is to use int in database indexes instead of long.
I know that RDBMS starts to have serious issues at scale.
That's why I prefer to use int.
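One hedged sketch of that {TableId, RowId} idea in Java - the bit layout and names here are assumptions for illustration, not the commenter's actual scheme. The application layer carries a single 64-bit id; the storage layer splits it so each per-table index only ever sees a 32-bit rowId:

```java
// Pack a 32-bit table id and a 32-bit row id into one application-layer long,
// and unpack it again when routing a query to the right table.
public class CompositeId {
    static long pack(int tableId, int rowId) {
        return ((long) tableId << 32) | (rowId & 0xFFFFFFFFL);
    }

    static int tableId(long packed) { return (int) (packed >>> 32); }

    static int rowId(long packed)   { return (int) packed; }

    public static void main(String[] args) {
        long appId = pack(7, 123_456);
        System.out.println(tableId(appId)); // 7
        System.out.println(rowId(appId));   // 123456
    }
}
```

The masking with 0xFFFFFFFFL matters: without it, sign extension of a negative rowId would clobber the tableId bits.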
no subject
Date: 2015-10-02 02:04 am (UTC)
Wrapping does not mean that the Id is no longer available to programmers, right?
no subject
Date: 2015-10-02 02:28 am (UTC)
Yeah, it's a matter of interpretation of the author's post. You rarely need the actual value until you have to debug or troubleshoot manually. So you make it such that people don't need to know what is inside or how it looks, until they really do. Also, for main entities like a user or an account, where the id has to uniquely identify some global container in human-readable monthly reports or documents, an int id is very useful, no question there. No need to use long where int suffices.