Aug. 2nd, 2011

juan_gandhi: (Default)

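    import java.util.Iterator;
    import java.util.Scanner;
    import java.util.regex.Pattern;

    public final class Readables {

        /** Line separator: "\n\r", "\r\n" or a bare "\n". */
        public static final Pattern CRLF = Pattern.compile("\n\r|\r\n|\n");

        private Readables() {}

        /**
         * Iterates over the lines of input.
         *
         * Usage example:
         *    for (String s : iterate(new FileReader("system.properties"))) {
         *       ...
         *    }
         *
         * @param in the source of characters
         * @return an iterable over the lines of {@code in}
         */
        public static Iterable<String> iterate(final Readable in) {
            return iterate(in, CRLF);
        }

        /**
         * Iterates over the pieces of input separated by a delimiter.
         *
         * Usage example:
         *    for (String s : iterate(new FileReader("system.properties"), "\n")) {
         *       ...
         *    }
         *
         * @param in the source of characters
         * @param delimiter a regular expression separating the pieces
         * @return an iterable over the pieces of {@code in}
         */
        public static Iterable<String> iterate(final Readable in, String delimiter) {
            return iterate(in, Pattern.compile(delimiter));
        }

        /**
         * Iterates over the pieces of input separated by a delimiter pattern.
         *
         * Usage example:
         *    for (String s : iterate(new FileReader("system.properties"), Readables.CRLF)) {
         *       ...
         *    }
         *
         * Note: every call to iterator() wraps the same Readable in a new Scanner,
         * so the result is effectively single-use. The Scanner is never closed;
         * closing the underlying reader is up to the caller.
         *
         * @param in the source of characters
         * @param delimiter the pattern separating the pieces
         * @return an iterable over the pieces of {@code in}
         */
        public static Iterable<String> iterate(final Readable in, final Pattern delimiter) {
            return new Iterable<String>() {
                @Override
                public Iterator<String> iterator() {
                    // Scanner implements Iterator<String>, so it can be returned directly
                    return new Scanner(in).useDelimiter(delimiter);
                }
            };
        }
    }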


Thanks to sassa_nf.
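A self-contained sketch of the helper in use, reading from a `StringReader` instead of a file so it runs anywhere. The class name `LineIterationDemo` is made up for this example; its `iterate` methods mirror the helper above, with generics. The input mixes a CR-LF and a bare LF line ending to show that the `CRLF` pattern handles both.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original code.
public class LineIterationDemo {
    // Same line-break pattern as in the post: LF-CR, CR-LF, or a bare LF.
    public static final Pattern CRLF = Pattern.compile("\n\r|\r\n|\n");

    public static Iterable<String> iterate(final Readable in, final Pattern delimiter) {
        return new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                // Scanner implements Iterator<String>, so it can be returned directly.
                return new Scanner(in).useDelimiter(delimiter);
            }
        };
    }

    public static Iterable<String> iterate(final Readable in) {
        return iterate(in, CRLF);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        // Mixed line endings: CR-LF after "alpha", bare LF after "beta".
        for (String s : iterate(new StringReader("alpha\r\nbeta\ngamma"))) {
            lines.add(s);
        }
        System.out.println(lines); // prints [alpha, beta, gamma]
    }
}
```

Because a trailing delimiter is simply skipped by `Scanner`, input ending in a newline does not produce an extra empty line at the end.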
juan_gandhi: (Default)


(thanks to mgsupgs)
juan_gandhi: (Default)
So we played with storing a bunch of more or less "static" data in HBase: 1.m records. Reading them takes 20 minutes (what with pulling the data over the wire, converting to strings, converting to maps, parsing into objects, and caching the index in memory). Then for each key we'd look it up in the index (finding its range) and go to HBase to retrieve the data; if there are 10k keys in a batch, kaboom, it takes forever, all for an innocent user request (which involves a third-party service returning 10k records).

So we decided to store all our data in memory, reading it from a plain csv file.

It flies: 20' for reading, merging the records that can be merged, and storing them in a TreeSet. I love treesets: prefix search works naturally, and the whole paraphernalia is just a few hundred lines of verbose Java code.

Like this:

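    @Override
    public Map<String, Entity> locate(Iterable<String> keys, Features features) {
        // Maps.newHashMap() is Guava's factory for a HashMap
        Map<String, Entity> result = Maps.newHashMap();

        for (String key : keys) {
            // all entities that sort strictly before the key
            SortedSet<Entity> before = entitiesBefore(key);

            if (!before.isEmpty()) {
                // the nearest one below the key is the only candidate
                Entity candidate = before.last();

                if (candidate.matches(key, features)) {
                    result.put(key, candidate);
                }
            }
        }
        return result;
    }


    private SortedSet<Entity> entitiesBefore(String key) {
        // content is the TreeSet<Entity> loaded from the CSV; headSet() excludes the probe itself
        return content.headSet(new Entity(key));
    }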
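The lookup above relies on `headSet(x)` being exclusive: `headSet(probe).last()` is the greatest element strictly below the probe. Since `Entity` and `Features` aren't shown, here is a sketch with plain strings standing in for hypothetical range-start keys. It compares that approach with `NavigableSet.lower`, which returns the same element (or `null`) without building a view, and with `floor`, which also accepts an exact match.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class; the range starts below are made-up data.
public class RangeLookupDemo {
    // Each start stands for "all keys from here up to the next start".
    static final TreeSet<String> STARTS =
            new TreeSet<String>(Arrays.asList("100", "200", "300"));

    // The approach from the post: greatest element strictly less than the key, or null.
    public static String viaHeadSet(String key) {
        SortedSet<String> before = STARTS.headSet(key); // excludes key itself
        return before.isEmpty() ? null : before.last();
    }

    // Same result via NavigableSet.lower, without creating a view.
    public static String viaLower(String key) {
        return STARTS.lower(key);
    }

    // floor() differs only when the key itself is in the set: it returns the key.
    public static String viaFloor(String key) {
        return STARTS.floor(key);
    }

    public static void main(String[] args) {
        for (String key : Arrays.asList("050", "150", "200", "250")) {
            System.out.println(key + ": headSet=" + viaHeadSet(key)
                    + " lower=" + viaLower(key) + " floor=" + viaFloor(key));
        }
        // 050: headSet=null lower=null floor=null
        // 150: headSet=100 lower=100 floor=100
        // 200: headSet=100 lower=100 floor=200
        // 250: headSet=200 lower=200 floor=200
    }
}
```

Whether an exact match on a range start should count (the `200` row) depends on how `Entity` orders itself, which the post doesn't show.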



I love treesets! So nifty.
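On the remark that prefix search works naturally in a TreeSet: every string that starts with a given prefix sorts into one contiguous run beginning at the prefix itself. A minimal sketch with strings (the words are made-up data; `PrefixSearchDemo` and `withPrefix` are hypothetical names) walks `tailSet(prefix, true)` and stops at the first non-match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; not from the original post.
public class PrefixSearchDemo {
    // Returns every element of the set that starts with the given prefix, in sorted order.
    public static List<String> withPrefix(TreeSet<String> set, String prefix) {
        List<String> result = new ArrayList<String>();
        // tailSet(prefix, true) starts at the first element >= prefix; all matches
        // are contiguous from there, so the loop can stop at the first non-match.
        for (String s : set.tailSet(prefix, true)) {
            if (!s.startsWith(prefix)) {
                break;
            }
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> words = new TreeSet<String>(
                Arrays.asList("tree", "treeset", "treemap", "trie", "hash", "tr"));
        System.out.println(withPrefix(words, "tree")); // prints [tree, treemap, treeset]
    }
}
```

The early `break` is what keeps this cheap: the cost is a log-time seek plus the number of matches, not a scan of the whole set.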
