juan_gandhi: (Default)
http://dawn.cs.stanford.edu/ 

"DAWN is a five-year research project to democratize AI by making it dramatically easier to build AI-powered applications.

Our past research – from Spark to Mesos, DeepDive, and HogWild! – already powers major functionality all over Silicon Valley – and the world. Between fighting against human trafficking, assisting in cancer diagnosis, and performing high-throughput genome sequencing, we’ve invested heavily in tools for AI and data product development.

The next step is to make these tools more efficient and more accessible, from training set creation and model design to monitoring, efficient execution, and hardware-efficient implementation. This technology holds the power to change science and society—and we’re creating this change with partners throughout campus and beyond."

juan_gandhi: (Default)
override def eval(input: InternalRow): Any = {
  System.currentTimeMillis() * 1000L
}
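This looks like Spark's CurrentTimestamp expression. Note what the multiplication buys you: the value is nominally in microseconds, but the source clock is millisecond-resolution, so the sub-millisecond digits are always zero. A quick check in plain Scala (no Spark needed):

```scala
// "Microseconds" obtained by scaling a millisecond clock by 1000:
// the last three digits can never be anything but 0.
val micros = System.currentTimeMillis() * 1000L
assert(micros % 1000 == 0)  // fake sub-millisecond precision
```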
juan_gandhi: (Default)
In spark.core: the RNG, specifically the normal-distribution RNG.

It caches values (the Gaussian generator computes them in pairs and keeps the spare). Now try to reseed.
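The failure mode, as far as I can tell: `java.util.Random.nextGaussian` computes Gaussians in pairs and caches the spare, and a subclass that overrides `setSeed` without calling `super` (or otherwise clearing that cache) leaves the stale value behind. A minimal sketch of the bug; `CountingRng` is a hypothetical class for illustration, not Spark's:

```scala
import java.util.Random

// Hypothetical RNG that keeps its own seed and overrides setSeed
// without touching java.util.Random's cached "next next Gaussian".
class CountingRng(seed0: Long) extends Random {
  var calls = 0                    // how many raw draws were made
  private var s: Long = seed0
  override def setSeed(seed: Long): Unit = { s = seed } // cache NOT cleared
  override protected def next(bits: Int): Int = {
    calls += 1
    s = s * 6364136223846793005L + 1442695040888963407L // any LCG will do
    (s >>> (64 - bits)).toInt
  }
}

val rng = new CountingRng(1L)
rng.nextGaussian()                 // computes a pair, caches the spare
val callsBeforeReseed = rng.calls  // at least 4 raw draws happened
rng.setSeed(1L)                    // try to reseed...
rng.nextGaussian()                 // ...and get served from the stale cache:
assert(rng.calls == callsBeforeReseed) // zero new raw draws -- reseed ignored
```

A correctly reseeded generator would have to redo the raw draws; this one hands back the cached value as if nothing happened.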

TIL

Jun. 15th, 2017 03:34 pm
juan_gandhi: (Default)
There's a cogroup in data science. It is a groupBy followed by a full outer join (in this case we actually get a pullback).

"You can COGROUP up to but no more than 127 relations at a time" says Google.
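Spark's `RDD.cogroup` has exactly this shape; here is a minimal model of its semantics on plain Scala collections (the helper below is a sketch of the semantics, not Spark's implementation):

```scala
// cogroup = groupBy on each side, then a full outer join of the groups:
// every key from either input appears, with (possibly empty) value lists.
def cogroup[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
  val ga = a.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  val gb = b.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  (ga.keySet ++ gb.keySet).map { k =>
    (k, (ga.getOrElse(k, Seq.empty[V]), gb.getOrElse(k, Seq.empty[W])))
  }.toMap
}

val left    = Seq("a" -> 1, "a" -> 2, "b" -> 3)
val right   = Seq("b" -> "x", "c" -> "y")
val grouped = cogroup(left, right)
assert(grouped("a") == (Seq(1, 2), Seq.empty)) // key only on the left
assert(grouped("b") == (Seq(3), Seq("x")))     // key on both sides
assert(grouped("c") == (Seq.empty, Seq("y")))  // key only on the right
```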

juan_gandhi: (Default)
"Spark is faster than map/reduce." A guy from Databricks is giving a talk on how Spark works.

Omfg; how can one possibly deal with all this?

UPD. From the same talk: cartesion() [sic]
juan_gandhi: (Default)
    // Set name from main class if not given
    name = Option(name).orElse(Option(mainClass)).orNull
    if (name == null && primaryResource != null) {
      name = Utils.stripDirectory(primaryResource)
    }


Just a chunk of their funny shitty code.

FYI, var name: String = null

I'd rewrite all that crap, but I'm afraid it's not only the code. It's the whole Spark world that needs a doctor.
juan_gandhi: (VP)
  /**
   * Returns a Seq of the children of this node.
   * Children should not change. Immutability required for containsChild optimization
   */
  def children: Seq[BaseType]

  lazy val containsChild: Set[TreeNode[_]] = children.toSet


Imagine: subclasses override this, but the children are not supposed to change. Why t.f. not declare it a val, then?!

Another masterpiece
object CurrentOrigin {
  private val value = new ThreadLocal[Origin]() {
    override def initialValue: Origin = Origin()
  }

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)

  def reset(): Unit = value.set(Origin())

  def setPosition(line: Int, start: Int): Unit = {
    value.set(
      value.get.copy(line = Some(line), startPosition = Some(start)))
  }

  def withOrigin[A](o: Origin)(f: => A): A = {
    set(o)
    val ret = try f finally { reset() }
    reset()
    ret
  }
}


So, we have a static object that wants to do something with a context (see withOrigin). That's "dependency injection," Fortran style. And we can pretend to do without vars because, well, it's a ThreadLocal.

Who wrote all this... could they have hired Scala programmers, I wonder...
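For contrast, a sketch of the same context-threading done the way a Scala programmer presumably would, with an implicit parameter: no ThreadLocal, no mutation, no reset() bookkeeping. The `Origin` here is simplified for illustration, not Spark's:

```scala
// Simplified stand-in for Spark's Origin.
case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

// The context is an ordinary (implicit) argument; lexical scoping
// replaces set()/reset() entirely.
def withOrigin[A](o: Origin)(f: Origin => A): A = f(o)

def describe(implicit o: Origin): String =
  s"line ${o.line.getOrElse(0)}"

val out = withOrigin(Origin(line = Some(7))) { implicit o => describe }
assert(out == "line 7")
```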
juan_gandhi: (VP)
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala

ClosureCleaner

So, Spark gets a function to apply to data (e.g. in foreach, or in filter). Before running it, it (like a raccoon) cleans the function: it scrubs the closure of references to enclosing scopes it doesn't need, so the closure can be serialized and shipped to the executors.
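Why cleaning is needed at all, as far as I can tell: a lambda written inside a method of a non-serializable class quietly captures `this`, and serialization then drags the whole enclosing object along. A sketch of the problem (the classes below are hypothetical, not Spark's):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A driver-side class that is NOT Serializable.
class Driver {
  val factor = 3
  // Reads the field through `this`, so the lambda captures Driver itself.
  def badFn: Int => Int = x => x * factor
  // Copies the field to a local first, so only an Int is captured.
  def goodFn: Int => Int = { val f = factor; x => x * f }
}

// True iff obj survives Java serialization.
def serializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch { case _: NotSerializableException => false }

val d = new Driver
assert(!serializable(d.badFn))  // drags the non-serializable Driver along
assert(serializable(d.goodFn))  // fine: captured only an Int
```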

Enjoy the nice reading.

Profile

juan_gandhi: (Default)
Juan-Carlos Gandhi
