long id

Oct. 1st, 2015 02:02 pm
juan_gandhi
I think I finally get where all this BS about passing around numerical ids of entities, instead of (possibly lazy) entity references, comes from. It's like an 'error code'. It comes from ancient C programming, where we just could not afford to allocate a string for a readable piece of text, or for data that may take some effort to instantiate or allocate.

In short: it's stupid to pass around "ids" in a program.

Date: 2015-10-02 02:54 am (UTC)
From: dennisgorelik
> unless you have something trivial like rolling events log

If you have 1B records, what do they represent, if not something trivial like a rolling events log?

What kind of physical server handles a database with 1B records?
1B longs take 8 GB just to store the index key values.
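A quick check of that arithmetic, as a sketch: 10^9 keys × 8 bytes per long = 8 × 10^9 bytes, i.e. 8 GB (about 7.45 GiB), versus 4 GB for a 4-byte int. This counts only the raw key values; a real B-tree index also stores row locators, page headers and free space, so actual index sizes are larger and engine-specific.

```python
import struct

ROWS = 1_000_000_000  # 1B records, the figure used in the thread


def raw_key_bytes(rows: int, fmt: str) -> int:
    """Bytes for the key values alone: no row pointers, page headers or fill-factor slack."""
    return rows * struct.calcsize(fmt)


long_bytes = raw_key_bytes(ROWS, "<q")  # 64-bit signed, like a Java/C# long
int_bytes = raw_key_bytes(ROWS, "<i")   # 32-bit signed, like a Java/C# int

print(long_bytes / 10**9, "GB vs", int_bytes / 10**9, "GB")
print(round(long_bytes / 2**30, 2), "GiB vs", round(int_bytes / 2**30, 2), "GiB")
```

The first line prints 8.0 GB vs 4.0 GB, so switching the key from long to int halves only the key portion of the index.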

Date: 2015-10-02 06:41 am (UTC)
From: yatur.livejournal.com
> If you have 1B records, what do they represent

Anything that happens frequently enough, e.g. phone calls or instant messages.

> What kind of physical server handles 1B records database?
> 1B long is 8GB just to store values of the index.

Dude, you have no idea. Do you seriously think database servers of this magnitude run on something like a MacBook Pro? 8 GB is peanuts. The largest databases store petabytes of data.

I sometimes admire your zest when you start arguing about things you have very little experience with. Reminds me of my younger self :) Sorry for the ad hominem.

Date: 2015-10-02 09:56 am (UTC)
From: dennisgorelik
> Largest databases store petabytes of data.

Are you suggesting that database size is as critical as index size?

Date: 2015-10-02 07:08 am (UTC)
From: exceeder.livejournal.com

1B business transactions. Sometimes the transactions are tiny in terms of money, but they still require full traceability. I have worked on telecom, ad-tech bidding, an online screen-sharing cloud for X11 3D engineering apps, medical teleradiology, etc. Don't get me wrong, I'm not alone: planning things right at this scale takes a lot of smart people.


A typical setup is a master-master pair, each node with 64 GB and 32 CPU cores on SSD RAID. But then again, sometimes there are n such pairs, one per vertical partition set, and some data goes to MongoDB or Aerospike, etc. It depends, case by case. One way or another, a typical production setup ends up with a couple dozen RDBMS servers, no matter how much people love NoSQL.

Date: 2015-10-02 10:02 am (UTC)
From: dennisgorelik
Why create a single 1B-record table when you could create 20 tables of 50M records each?

At that scale, smaller tables typically perform faster and are easier to maintain.
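Dennis does not say how rows would be spread across the 20 tables. One possible assumption is contiguous ranges of 50M ids per table, sketched below in Python; the events_NN table names and route_by_range are hypothetical. Under this scheme the local row id within each table never exceeds 49,999,999.

```python
from typing import Tuple

INT32_MAX = 2**31 - 1        # largest value a Java/C# int can hold
TABLE_COUNT = 20
ROWS_PER_TABLE = 50_000_000  # 20 * 50M = 1B rows in total


def table_name(table_id: int) -> str:
    # Hypothetical naming scheme for the split tables.
    return f"events_{table_id:02d}"


def route_by_range(global_id: int) -> Tuple[str, int]:
    """Map a 0-based global id to (table, local row id) using contiguous ranges."""
    if not 0 <= global_id < TABLE_COUNT * ROWS_PER_TABLE:
        raise ValueError("id outside the partitioned range")
    table_id, local_id = divmod(global_id, ROWS_PER_TABLE)
    return table_name(table_id), local_id


print(route_by_range(0))            # first row of the first table
print(route_by_range(999_999_999))  # last row of the last table
print(TABLE_COUNT * ROWS_PER_TABLE <= INT32_MAX)
```

For reference, 20 × 50M = 1B is itself below the signed 32-bit maximum of 2,147,483,647, so the last line prints True: at exactly this size a global id would also still fit in an int, and the split mainly changes per-table size.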

Date: 2015-10-02 05:18 pm (UTC)
From: exceeder.livejournal.com
OK, say you partition it vertically (in the real world, good luck finding a good way to partition stuff in the first place). And say it is simply by id from a sequence. Then what? tableId+rowId everywhere? Wouldn't it be much easier to, e.g., have tableId = rowId % 20? But then you need a long again, no?
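A sketch of the tableId = rowId % 20 scheme, assuming ids come from a single global sequence. It illustrates the "you need a long again" point: once the sequence passes 2,147,483,647, the global id no longer fits in a signed 32-bit int, although the per-table quotient (rowId / 20) still does for a long while. The names route_by_modulo and fits_int32 are made up for this example.

```python
from typing import Tuple

INT32_MAX = 2**31 - 1
TABLE_COUNT = 20


def route_by_modulo(global_id: int) -> Tuple[int, int]:
    """tableId = rowId % 20, as in the comment; the local row id is the quotient."""
    return global_id % TABLE_COUNT, global_id // TABLE_COUNT


def fits_int32(value: int) -> bool:
    # True if the value fits in a signed 32-bit int (Java/C# int).
    return -(2**31) <= value <= INT32_MAX


gid = 3_000_000_000  # a sequence value past the int range
table_id, local_id = route_by_modulo(gid)
print(table_id, local_id, fits_int32(gid), fits_int32(local_id))
```

With gid = 3,000,000,000 this prints 0 150000000 False True: the id the application passes around needs a long, while the value stored per table could stay an int.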

In reality, both vertical partitioning (as you proposed) and sharding (mostly managed by the DB engine) are employed, but they should be used sparingly and avoided whenever reasonably possible.

This is the wrong place to discuss this stuff anyway. Out in the real world, things always turn out way more complex than the books and blogs suggest. Partitioning breaks things, A LOT. But past a certain size it sometimes cannot be avoided.

I don't mean to insult you in any way; I am sure you are an awesome professional with an inquiring mind. I was naive too, but my first real 4 TB database, with over 1k transactions per second, brought a lot of revelations.

E.g., DB engines that are awesomely stable and vetted for years suddenly turn into pumpkins at midnight. Just put some real load on them, not the kind you can generate from a single test server :(

Cheers!

Date: 2015-10-02 08:33 pm (UTC)
From: dennisgorelik
> tableId+rowId everywhere?

It could be a long in the application layer (C#, Java), which an algorithm would then convert to {TableName + int RowId} when it is time to retrieve the data.

Or even keep {TableId, RowId} in the application layer, as you suggested.

I mean, the key idea is to use int instead of long in database indexes.
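One way to do the conversion described above, sketched in Python under stated assumptions: the application passes a single 64-bit long whose high 32 bits hold a table id and whose low 32 bits hold the int row id, and splits it back into {TableId, RowId} just before querying. The bit layout, pack_id/unpack_id and the events_NN naming are hypothetical; the thread does not specify an algorithm.

```python
from typing import Tuple

INT32_MAX = 2**31 - 1
LOW_32_MASK = 0xFFFF_FFFF


def pack_id(table_id: int, row_id: int) -> int:
    """Combine a table id and an int row id into one 64-bit application-level id."""
    if not (0 <= table_id <= INT32_MAX and 0 <= row_id <= INT32_MAX):
        raise ValueError("both parts must be non-negative 32-bit ints")
    return (table_id << 32) | row_id


def unpack_id(packed: int) -> Tuple[int, int]:
    """Split the 64-bit id back into (table_id, row_id) before querying."""
    return packed >> 32, packed & LOW_32_MASK


def table_name(table_id: int) -> str:
    # Hypothetical naming scheme; the thread only says {TableName + int RowId}.
    return f"events_{table_id:02d}"


app_id = pack_id(7, 123_456)
tid, rid = unpack_id(app_id)
print(app_id, table_name(tid), rid)
```

The database indexes then only ever see the int row id, while the rest of the code handles one opaque long. Keeping both parts non-negative and below 2^31 guarantees the packed value stays within a signed 64-bit long.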


I know that RDBMS starts to have serious issues at scale.
That's why I prefer to use int.
Edited Date: 2015-10-02 08:41 pm (UTC)
