juan_gandhi: (Default)
2017-12-30 10:13 am
Entry tags:

dawn project

http://dawn.cs.stanford.edu/ 

"DAWN is a five-year research project to democratize AI by making it dramatically easier to build AI-powered applications.

Our past research–from Spark to Mesos, DeepDive, and HogWild!–already powers major functionality all over Silicon Valley–and the world. Between fighting against human trafficking, assisting in cancer diagnosis, and performing high-throughput genome sequencing, we’ve invested heavily in tools for AI and data product development.

The next step is to make these tools more efficient and more accessible, from training set creation and model design to monitoring, efficient execution, and hardware-efficient implementation. This technology holds the power to change science and society—and we’re creating this change with partners throughout campus and beyond."

juan_gandhi: (Default)
2017-11-27 10:09 am
Entry tags:

quoting Spark

override def eval(input: InternalRow): Any = {
  System.currentTimeMillis() * 1000L
}
juan_gandhi: (Default)
2017-09-21 03:01 pm
Entry tags:

just found a bug...

In spark.core, the RNG; specifically, the normal-distribution RNG.

It caches values (nextGaussian produces Gaussians in pairs and caches the spare). Now try to reseed: the cached value survives.
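A sketch of how that bites, assuming the classic subclassing bug (LeakyRandom is hypothetical, not Spark's actual class): java.util.Random.nextGaussian computes Gaussians in pairs and caches the second; java.util.Random.setSeed drops that cache, but an override that reseeds only its own state keeps the stale value alive across the reseed.

```scala
import java.util.Random

// Hypothetical sketch of the bug class -- NOT Spark's actual code.
class LeakyRandom(seed0: Long) extends Random(seed0) {
  private var state: Long = if (seed0 != 0) seed0 else 1L

  // The bug: no super.setSeed(s), so the cached Gaussian survives reseeding.
  override def setSeed(s: Long): Unit = { state = if (s != 0) s else 1L }

  // Simplified xorshift step feeding java.util.Random's machinery.
  override protected def next(bits: Int): Int = {
    state ^= state << 13; state ^= state >>> 7; state ^= state << 17
    (state >>> (64 - bits)).toInt
  }
}

val leaky = new LeakyRandom(42L)
val g1 = leaky.nextGaussian()  // computes a pair, caches the spare
leaky.setSeed(42L)             // "reseed"...
val g2 = leaky.nextGaussian()  // ...but this is the stale cached value
// g1 and g2 differ: same seed, different sequence -- not reproducible.
```

With a plain java.util.Random the same experiment gives the same value twice, because its setSeed explicitly discards the cached Gaussian.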
juan_gandhi: (Default)
2017-06-15 03:34 pm
Entry tags:

TIL

There's a cogroup operation in data engineering. It is a groupBy on each side followed by an outer join of the groups (in this case we actually get a pullback).

"You can COGROUP up to but no more than 127 relations at a time" says Google.
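On plain Scala collections the idea looks like this (a hypothetical helper, not Pig's or Spark's implementation): group each side by key, then full-outer-join the groups, so every key maps to the pair (left values, right values).

```scala
// Sketch of cogroup: groupBy both sides, then outer-join the groups.
def cogroup[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Map[K, (Seq[A], Seq[B])] = {
  val l = left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  val r = right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  // Outer join: keys present on either side survive, missing sides are empty.
  (l.keySet ++ r.keySet).iterator.map { k =>
    (k, (l.getOrElse(k, Seq.empty[A]), r.getOrElse(k, Seq.empty[B])))
  }.toMap
}

val joined = cogroup(Seq(1 -> "a", 1 -> "b", 2 -> "c"), Seq(2 -> "x", 3 -> "y"))
// key 1 has only left values, key 3 only right ones, key 2 has both
```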

juan_gandhi: (Default)
2017-06-14 11:12 am
Entry tags:

just learned that...

"Spark is faster than map/reduce". A guy working at Databricks is giving a talk about how Spark works.

Omfg; how can one possibly deal with all this?

UPD. From the same talk: cartesion() (sic)
juan_gandhi: (Default)
2017-01-31 03:04 pm
Entry tags:

in case you are interested in Spark...

    // Set name from main class if not given
    name = Option(name).orElse(Option(mainClass)).orNull
    if (name == null && primaryResource != null) {
      name = Utils.stripDirectory(primaryResource)
    }


Just a chunk of their funny shitty code.

FYI, the declaration is var name: String = null

I'd rewrite all that crap, but it's not only code, I'm afraid. It's the whole Spark world that needs a doctor.
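For what it's worth, the fallback chain above needs no nulls at all. A sketch with hypothetical Option-typed inputs (stripDirectory stands in for Utils.stripDirectory):

```scala
// Stand-in for Utils.stripDirectory: keep only the file name.
def stripDirectory(path: String): String = path.split('/').last

// Same precedence as the original: name, then mainClass, then primaryResource.
def resolveName(name: Option[String],
                mainClass: Option[String],
                primaryResource: Option[String]): Option[String] =
  name.orElse(mainClass).orElse(primaryResource.map(stripDirectory))

val fromJar  = resolveName(None, None, Some("jars/app.jar"))      // Some("app.jar")
val fromMain = resolveName(None, Some("com.example.Main"), None)  // Some("com.example.Main")
```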
juan_gandhi: (VP)
2016-09-26 03:42 pm
Entry tags:

spark masterpieces

  /**
   * Returns a Seq of the children of this node.
   * Children should not change. Immutability required for containsChild optimization
   */
  def children: Seq[BaseType]

  lazy val containsChild: Set[TreeNode[_]] = children.toSet


Imagine: subclasses override children, but the children must never change. Why t.f. then not declare it a val?!

Another masterpiece
object CurrentOrigin {
  private val value = new ThreadLocal[Origin]() {
    override def initialValue: Origin = Origin()
  }

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)

  def reset(): Unit = value.set(Origin())

  def setPosition(line: Int, start: Int): Unit = {
    value.set(
      value.get.copy(line = Some(line), startPosition = Some(start)))
  }

  def withOrigin[A](o: Origin)(f: => A): A = {
    set(o)
    val ret = try f finally { reset() }
    reset()
    ret
  }
}


So, we have a static object that wants to do something with a context. (See withOrigin.) That's "dependency injection", Fortran style. And we can do without vars, because, well, it's ThreadLocal.

Who wrote all this... could they hire Scala programmers, I wonder...
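For the record, scala.util.DynamicVariable already packages exactly this pattern: per-thread state with scoped set/restore, no manual set/reset/reset dance. A sketch (Origin here is a stand-in case class, not Spark's):

```scala
import scala.util.DynamicVariable

// Stand-in for Spark's Origin -- not the real class.
case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

object ScopedOrigin {
  // DynamicVariable wraps a ThreadLocal; withValue sets the value, runs
  // the block, and restores the previous value even on exceptions.
  private val current = new DynamicVariable(Origin())

  def get: Origin = current.value
  def withOrigin[A](o: Origin)(f: => A): A = current.withValue(o)(f)
}

val inside = ScopedOrigin.withOrigin(Origin(Some(1), Some(2))) { ScopedOrigin.get }
// inside sees the scoped Origin; outside the block the default is back
```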
juan_gandhi: (VP)
2016-07-30 07:42 pm
Entry tags:

an amazingly funny piece of code

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala

ClosureCleaner

So, Spark gets a function to apply to data (e.g. in foreach or in filter). Before running it, it (like a raccoon) cleans the function.

Enjoy the nice reading.
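What the raccoon is washing off, roughly: a lambda written inside a class method quietly captures the whole enclosing instance, and shipping the lambda to executors drags that instance along. A sketch of the disease with a hypothetical class (not Spark code):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Tokenizer is not Serializable, so closures capturing `this` can't be shipped.
class Tokenizer(val sep: String) {
  // Reads sep through `this` -- the lambda captures the whole Tokenizer.
  def splitterBad: String => Array[String] = line => line.split(sep)

  // Copies sep into a local first -- the lambda captures only a String.
  def splitterGood: String => Array[String] = {
    val s = sep
    line => line.split(s)
  }
}

// Java-serialization check, the same test Spark effectively performs.
def serializable(x: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(x); true }
  catch { case _: NotSerializableException => false }
```

splitterBad fails the check while splitterGood passes, even though both compute the same thing; ClosureCleaner exists to null out such accidental references where it safely can.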