Apr. 20th, 2012

juan_gandhi: (Default)
Belmont to San Jose via Santa Cruz, anybody?

05/19; staying overnight in Santa Cruz (I'm not good enough to do it in one day).

Bring your Pioneer neckerchiefs.
juan_gandhi: (Default)
github

So, I think I have to drop my attempts to civilize the Java version. This one is way too elegant, reasonable and functional (in both senses). Go ahead, use it.

// Job, Args, TextLine and Tsv come from the Scalding package.
import com.twitter.scalding._

class Tutorial3(args : Args) extends Job(args) {

  /**
  We can ask args for the --input argument from the command line.
  If it's missing, we'll get an error.
  **/
  val input = TextLine(args("input"))
  val output = TextLine("tutorial/data/output3.txt")

  input
    .read

    /**
    flatMap is like map, but instead of returning a single item from the
    function, we return a collection of items. Each of these items will create
    a new entry in the data stream; here, we'll end up with a new entry for each word.
    **/

    .flatMap('line -> 'word){ line : String => line.split("\\s")}

    /**
    We still want to project just the 'word field for our final output.
    For interest, though, let's stash a copy of the data before we do that.
    write() returns the pipe, so we can keep chaining our pipeline.
    **/

    .write(Tsv("tutorial/data/tmp3.tsv"))
    .project('word)
    .write(output)
}
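
For anyone who wants to see the map vs. flatMap difference from the comments above without a Hadoop setup, here is a minimal sketch on plain Scala collections. The object name FlatMapSketch and the sample lines are made up for illustration; this is not the Scalding pipeline itself, just the same idea on a List.

```scala
// Plain Scala collections only; no Scalding or Hadoop involved.
object FlatMapSketch {
  // Hypothetical sample data standing in for the lines of the --input file.
  val lines: List[String] = List("hello world", "goodbye world")

  // map produces exactly one result per line: a list of word lists.
  val nested: List[List[String]] = lines.map(line => line.split("\\s").toList)

  // flatMap flattens those per-line collections: one entry per word.
  val words: List[String] = lines.flatMap(line => line.split("\\s").toList)

  def main(args: Array[String]): Unit = {
    println(nested) // List(List(hello, world), List(goodbye, world))
    println(words)  // List(hello, world, goodbye, world)
  }
}
```

The job above does the same thing with fields instead of list elements: each 'line yields several 'word entries in the stream.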

Profile

juan_gandhi: (Default)
Juan-Carlos Gandhi
