juan_gandhi: (Default)
2017-12-30 10:13 am
Entry tags:

dawn project

http://dawn.cs.stanford.edu/ 

"DAWN is a five-year research project to democratize AI by making it dramatically easier to build AI-powered applications.

Our past research–from Spark to Mesos, DeepDive, and HogWild!–already powers major functionality all over Silicon Valley–and the world. Between fighting against human trafficking, assisting in cancer diagnosis, and performing high-throughput genome sequencing, we’ve invested heavily in tools for AI and data product development.

The next step is to make these tools more efficient and more accessible, from training set creation and model design to monitoring, efficient execution, and hardware-efficient implementation. This technology holds the power to change science and society—and we’re creating this change with partners throughout campus and beyond."

juan_gandhi: (Default)
2017-11-27 10:09 am
Entry tags:

quoting Spark

override def eval(input: InternalRow): Any = {
  System.currentTimeMillis() * 1000L
}
juan_gandhi: (Default)
2017-09-21 03:01 pm
Entry tags:

just found a bug...

In spark.core, the RNG; specifically, the normal-distribution RNG.

It caches values (nextGaussian produces Gaussians in pairs and caches the spare). Now try to reseed: the cached value survives.
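A sketch of how that bites, assuming the classic subclassing bug (LeakyRandom is hypothetical, not Spark's actual class): java.util.Random.nextGaussian computes Gaussians in pairs and caches the second; java.util.Random.setSeed drops that cache, but an override that reseeds only its own state keeps the stale value alive across the reseed.

```scala
import java.util.Random

// Hypothetical sketch of the bug class -- NOT Spark's actual code.
class LeakyRandom(seed0: Long) extends Random(seed0) {
  private var state: Long = if (seed0 != 0) seed0 else 1L

  // The bug: no super.setSeed(s), so the cached Gaussian survives reseeding.
  override def setSeed(s: Long): Unit = { state = if (s != 0) s else 1L }

  // Simplified xorshift step feeding java.util.Random's machinery.
  override protected def next(bits: Int): Int = {
    state ^= state << 13; state ^= state >>> 7; state ^= state << 17
    (state >>> (64 - bits)).toInt
  }
}

val leaky = new LeakyRandom(42L)
val g1 = leaky.nextGaussian()  // computes a pair, caches the spare
leaky.setSeed(42L)             // "reseed"...
val g2 = leaky.nextGaussian()  // ...but this is the stale cached value
// g1 and g2 differ: same seed, different sequence -- not reproducible.
```

With a plain java.util.Random the same experiment gives the same value twice, because its setSeed explicitly discards the cached Gaussian.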
juan_gandhi: (Default)
2017-06-15 03:34 pm
Entry tags:

TIL

There's a cogroup operation in data engineering. It is a groupBy on each side followed by an outer join of the groups (in this case we actually get a pullback).

"You can COGROUP up to but no more than 127 relations at a time" says Google.
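On plain Scala collections the idea looks like this (a hypothetical helper, not Pig's or Spark's implementation): group each side by key, then full-outer-join the groups, so every key maps to the pair (left values, right values).

```scala
// Sketch of cogroup: groupBy both sides, then outer-join the groups.
def cogroup[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Map[K, (Seq[A], Seq[B])] = {
  val l = left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  val r = right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  // Outer join: keys present on either side survive, missing sides are empty.
  (l.keySet ++ r.keySet).iterator.map { k =>
    (k, (l.getOrElse(k, Seq.empty[A]), r.getOrElse(k, Seq.empty[B])))
  }.toMap
}

val joined = cogroup(Seq(1 -> "a", 1 -> "b", 2 -> "c"), Seq(2 -> "x", 3 -> "y"))
// key 1 has only left values, key 3 only right ones, key 2 has both
```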

juan_gandhi: (Default)
2017-06-14 11:12 am
Entry tags:

just learned that...

"Spark is faster than map/reduce". A guy working at Databricks is giving a talk about how Spark works.

Omfg; how can one possibly deal with all this?

UPD. From the same talk: cartesion() (sic)
juan_gandhi: (Default)
2017-01-31 03:04 pm
Entry tags:

in case you are interested in Spark...

    // Set name from main class if not given
    name = Option(name).orElse(Option(mainClass)).orNull
    if (name == null && primaryResource != null) {
      name = Utils.stripDirectory(primaryResource)
    }


Just a chunk of their funny shitty code.

FYI, the declaration is var name: String = null

I'd rewrite all that crap, but it's not only code, I'm afraid. It's the whole Spark world that needs a doctor.
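For what it's worth, the fallback chain above needs no nulls at all. A sketch with hypothetical Option-typed inputs (stripDirectory stands in for Utils.stripDirectory):

```scala
// Stand-in for Utils.stripDirectory: keep only the file name.
def stripDirectory(path: String): String = path.split('/').last

// Same precedence as the original: name, then mainClass, then primaryResource.
def resolveName(name: Option[String],
                mainClass: Option[String],
                primaryResource: Option[String]): Option[String] =
  name.orElse(mainClass).orElse(primaryResource.map(stripDirectory))

val fromJar  = resolveName(None, None, Some("jars/app.jar"))      // Some("app.jar")
val fromMain = resolveName(None, Some("com.example.Main"), None)  // Some("com.example.Main")
```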
juan_gandhi: (VP)
2016-09-26 03:42 pm
Entry tags:

spark masterpieces

  /**
   * Returns a Seq of the children of this node.
   * Children should not change. Immutability required for containsChild optimization
   */
  def children: Seq[BaseType]

  lazy val containsChild: Set[TreeNode[_]] = children.toSet


Imagine: subclasses override children, but the children must never change. Why t.f. then not declare it a val?!

Another masterpiece
object CurrentOrigin {
  private val value = new ThreadLocal[Origin]() {
    override def initialValue: Origin = Origin()
  }

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)

  def reset(): Unit = value.set(Origin())

  def setPosition(line: Int, start: Int): Unit = {
    value.set(
      value.get.copy(line = Some(line), startPosition = Some(start)))
  }

  def withOrigin[A](o: Origin)(f: => A): A = {
    set(o)
    val ret = try f finally { reset() }
    reset()
    ret
  }
}


So, we have a static object that wants to do something with a context. (See withOrigin.) That's "dependency injection", Fortran style. And we can do without vars, because, well, it's ThreadLocal.

Who wrote all this... could they hire Scala programmers, I wonder...
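For the record, scala.util.DynamicVariable already packages exactly this pattern: per-thread state with scoped set/restore, no manual set/reset/reset dance. A sketch (Origin here is a stand-in case class, not Spark's):

```scala
import scala.util.DynamicVariable

// Stand-in for Spark's Origin -- not the real class.
case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

object ScopedOrigin {
  // DynamicVariable wraps a ThreadLocal; withValue sets the value, runs
  // the block, and restores the previous value even on exceptions.
  private val current = new DynamicVariable(Origin())

  def get: Origin = current.value
  def withOrigin[A](o: Origin)(f: => A): A = current.withValue(o)(f)
}

val inside = ScopedOrigin.withOrigin(Origin(Some(1), Some(2))) { ScopedOrigin.get }
// inside sees the scoped Origin; outside the block the default is back
```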
juan_gandhi: (VP)
2016-07-30 07:42 pm
Entry tags:

an amazingly funny piece of code

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala

ClosureCleaner

So, Spark gets a function to apply to data (e.g. in foreach or in filter). Before running it, it (like a raccoon) cleans the function.

Enjoy the nice reading.
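What the raccoon is washing off, roughly: a lambda written inside a class method quietly captures the whole enclosing instance, and shipping the lambda to executors drags that instance along. A sketch of the disease with a hypothetical class (not Spark code):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Tokenizer is not Serializable, so closures capturing `this` can't be shipped.
class Tokenizer(val sep: String) {
  // Reads sep through `this` -- the lambda captures the whole Tokenizer.
  def splitterBad: String => Array[String] = line => line.split(sep)

  // Copies sep into a local first -- the lambda captures only a String.
  def splitterGood: String => Array[String] = {
    val s = sep
    line => line.split(s)
  }
}

// Java-serialization check, the same test Spark effectively performs.
def serializable(x: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(x); true }
  catch { case _: NotSerializableException => false }
```

splitterBad fails the check while splitterGood passes, even though both compute the same thing; ClosureCleaner exists to null out such accidental references where it safely can.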