juan_gandhi (Juan-Carlos Gandhi)
So we played with storing a bunch of more or less "static" data in HBase: a million-plus records. Reading them takes 20 minutes (network transfer, converting to strings, converting to maps, parsing into objects, caching the index in memory). Then, for each key, we'd look it up in the index (finding a range) and go to HBase to retrieve the data. If there are 10k keys in a batch, kaboom: it takes forever, and all this for an innocent user request (one that involves a third-party service returning 10k records).

So we decided to store all our data in memory, reading it from a plain CSV file.

It flies: 20' for reading, merging the records that can be merged, and storing them in a TreeSet. I love TreeSets: prefix search works naturally, and the whole paraphernalia is just a few hundred lines of verbose Java code.
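Why prefix search "works naturally": all keys sharing a prefix sit next to each other in sorted order, so one jump to the first key at or after the prefix plus a short forward walk finds them all. A minimal sketch, assuming a hypothetical `PrefixIndex` class and a simple `key,rest...` CSV layout (neither is from the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: keep CSV keys in a TreeSet and answer prefix queries.
class PrefixIndex {
    private final TreeSet<String> keys = new TreeSet<>();

    // Assumes the key is the first comma-separated field; real CSV
    // (quoted fields, escaped commas) would need a proper parser.
    void loadCsvLines(Iterable<String> lines) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            keys.add(comma < 0 ? line : line.substring(0, comma));
        }
    }

    // Jump to the first key >= prefix, then walk forward while keys still
    // start with the prefix; matching keys form one contiguous run.
    List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) {
                break;
            }
            out.add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixIndex index = new PrefixIndex();
        index.loadCsvLines(List.of("apple,1", "applet,2", "apricot,3", "banana,4"));
        System.out.println(index.withPrefix("app")); // prints [apple, applet]
    }
}
```

The jump costs O(log n) and the walk touches only the matching keys, so there is no full scan.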

Like this:

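```java
    // content is the TreeSet<Entity> holding the records read from the CSV file.
    // Maps.newHashMap() comes from Guava (com.google.common.collect.Maps).
    @Override
    public Map<String, Entity> locate(Iterable<String> keys, Features features) {
        Map<String, Entity> result = Maps.newHashMap();

        for (String key : keys) {
            // All entities that sort strictly before new Entity(key)...
            SortedSet<Entity> before = entitiesBefore(key);

            if (!before.isEmpty()) {
                // ...of which the last (closest) one is the candidate to check.
                Entity candidate = before.last();

                if (candidate.matches(key, features)) {
                    result.put(key, candidate);
                }
            }
        }
        return result;
    }

    // A view (not a copy) of the entities strictly less than new Entity(key);
    // calling last() on it is a logarithmic lookup in the tree.
    private SortedSet<Entity> entitiesBefore(String key) {
        return content.headSet(new Entity(key));
    }
```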
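The same "find the range, then check it" idea as a self-contained sketch. The `RangeIndex` class and its `Range(start, end, value)` record are hypothetical stand-ins for `Entity` and `Features`, assuming non-overlapping, inclusive string ranges. It uses `floor()` (greatest start <= key), which differs from the `headSet(...).last()` above in also accepting a range that starts exactly at the key:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: non-overlapping ranges [start, end] in a TreeSet
// ordered by start; each key costs one O(log n) floor() lookup, no I/O.
class RangeIndex {
    record Range(String start, String end, String value) {}

    private final TreeSet<Range> ranges =
            new TreeSet<>(Comparator.comparing(Range::start));

    void add(Range r) {
        ranges.add(r);
    }

    Map<String, String> locate(Iterable<String> keys) {
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            // Probe compares by start only, so its end/value are irrelevant.
            Range candidate = ranges.floor(new Range(key, key, null));
            // The candidate contains the key only if the key is <= its end.
            if (candidate != null && key.compareTo(candidate.end()) <= 0) {
                result.put(key, candidate.value());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RangeIndex index = new RangeIndex();
        index.add(new Range("a", "c", "first"));
        index.add(new Range("m", "p", "second"));
        System.out.println(index.locate(List.of("b", "d", "n")));
    }
}
```

With 10k keys in a batch this is 10k in-memory tree lookups rather than 10k trips to a remote store.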



I love TreeSets! So nifty.
