juan_gandhi: (Default)
In spark.core there's an RNG; specifically, a normal-distribution RNG.

It caches values (randomly): nextGaussian computes Gaussians in pairs and keeps the second one for later. Now try to reseed.
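To see why this bites, here's a minimal sketch — not Spark's actual class, just the same shape as its XORShiftRandom: a subclass of java.util.Random that overrides setSeed without calling super.setSeed. The superclass's nextGaussian caches the second value of each pair, and only super.setSeed clears that cache, so a reseed leaks one stale value.

```scala
// Sketch of the trap, assuming the Spark-style pattern: override setSeed,
// skip super.setSeed, and nextGaussian's pair cache survives the reseed.
class LeakyRandom extends java.util.Random {
  private var s: Long = 1L

  // Xorshift state update; java.util.Random.nextGaussian reaches this via nextDouble.
  override protected def next(bits: Int): Int = {
    s ^= s << 21
    s ^= s >>> 35
    s ^= s << 4
    (s & ((1L << bits) - 1)).toInt
  }

  // The bug: the Gaussian cached in the superclass is NOT reset here.
  override def setSeed(seed: Long): Unit = {
    s = if (seed == 0) -1L else seed
  }
}

val rng = new LeakyRandom
rng.setSeed(42)
val first = rng.nextGaussian()  // generates a pair, caches the second value
rng.setSeed(42)                 // "same seed" — but the cache is still warm
val second = rng.nextGaussian() // returns the cached leftover, not `first`
assert(first != second)         // a reseed that fails to reproduce the stream
```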


Jun. 15th, 2017 03:34 pm
juan_gandhi: (Default)
There's a cogroup in data science. It amounts to a groupBy on each side followed by a full outer join (in this case we actually get a pullback).

"You can COGROUP up to but no more than 127 relations at a time" says Google.
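On plain Scala collections the whole operation is a few lines. A sketch of what cogroup means — groupBy each side, then outer-join on the union of keys; the identifiers here are mine, not Spark's:

```scala
// cogroup = groupBy on each side + full outer join on the key set.
// Per key, the result pairs ALL matching values from both sides,
// and no key from either side is dropped — hence the pullback-ish shape.
def cogroup[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Map[K, (Seq[A], Seq[B])] = {
  val l = left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  val r = right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  (l.keySet ++ r.keySet)                      // outer join: union of keys
    .map(k => k -> (l.getOrElse(k, Nil), r.getOrElse(k, Nil)))
    .toMap
}

val events = Seq("ann" -> "click", "ann" -> "view", "bob" -> "click")
val plans  = Seq("bob" -> 10, "eve" -> 20)
cogroup(events, plans)
// keys: ann (left only), bob (both sides), eve (right only) — nobody is dropped
```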

juan_gandhi: (Default)
"Spark is faster than map/reduce." A guy from Databricks is giving a talk about how Spark works.

Omfg; how can one possibly deal with all this?

UPD. From the same talk: cartesion() (sic).
juan_gandhi: (Default)
    // Set name from main class if not given
    name = Option(name).orElse(Option(mainClass)).orNull
    if (name == null && primaryResource != null) {
      name = Utils.stripDirectory(primaryResource)
    }

Just a chunk of their funny shitty code.

FYI, var name: String = null

I'd rewrite all that crap, but it's not only code, I'm afraid. It's the whole Spark world that needs a doctor.
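For the record, a sketch of how that null-juggling could collapse into one Option chain. stripDirectory is stubbed here (in Spark it's Utils.stripDirectory, which I'm assuming just drops the directory prefix):

```scala
// Stub standing in for Spark's Utils.stripDirectory (assumption: basename).
def stripDirectory(path: String): String =
  path.substring(path.lastIndexOf('/') + 1)

// One chain instead of orNull + a null check + a reassignment:
// first defined of (name, mainClass, basename(primaryResource)).
def resolveName(name: String, mainClass: String, primaryResource: String): Option[String] =
  Option(name)
    .orElse(Option(mainClass))
    .orElse(Option(primaryResource).map(stripDirectory))

resolveName(null, null, "/opt/jobs/app.jar")  // Some("app.jar")
resolveName("myJob", null, null)              // Some("myJob")
```

Returning Option[String] instead of orNull is the point: the null never escapes.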
juan_gandhi: (VP)
  /**
   * Returns a Seq of the children of this node.
   * Children should not change. Immutability required for containsChild optimization.
   */
  def children: Seq[BaseType]

  lazy val containsChild: Set[TreeNode[_]] = children.toSet

Imagine: we subclass, and our children are not supposed to change. Why t.f. then not declare it a val?!
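The hazard, concretely (class names mine, not Catalyst's): if a subclass computes children from mutable state, containsChild is snapshotted on first use and silently goes stale.

```scala
import scala.collection.mutable.ListBuffer

abstract class TreeNode {
  def children: Seq[TreeNode]
  // Snapshot taken on first access — correct only if children never change.
  lazy val containsChild: Set[TreeNode] = children.toSet
}

class Node extends TreeNode {
  val kids = ListBuffer.empty[TreeNode]
  def children: Seq[TreeNode] = kids.toList  // nothing enforces immutability
}

val parent = new Node
val late = new Node
parent.containsChild                 // forces the lazy val: snapshot of Set()
parent.kids += late                  // "children should not change" — says who?
parent.children.contains(late)       // true
parent.containsChild.contains(late)  // false: the optimization is now lying
```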

Another masterpiece
object CurrentOrigin {
  private val value = new ThreadLocal[Origin]() {
    override def initialValue: Origin = Origin()
  }

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)

  def reset(): Unit = value.set(Origin())

  def setPosition(line: Int, start: Int): Unit = {
    value.set(
      value.get.copy(line = Some(line), startPosition = Some(start)))
  }

  def withOrigin[A](o: Origin)(f: => A): A = {
    set(o)
    val ret = try f finally { reset() }
    ret
  }
}

So, we have a static object that wants to do something with a context. (See withOrigin.) That's "dependency injection", Fortran style. And we can do without vars, because, well, it's ThreadLocal.

Who wrote all this... could they hire Scala programmers, I wonder...
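For contrast: the standard library already packages this exact pattern (ThreadLocal plus set/try/finally-reset) as scala.util.DynamicVariable. A sketch, with Origin stubbed down to the two fields the quoted code touches:

```scala
import scala.util.DynamicVariable

// Stub of Catalyst's Origin — just the two fields the quoted code uses.
case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

object CurrentOrigin {
  // DynamicVariable = ThreadLocal + scoped withValue; no hand-rolled reset().
  private val current = new DynamicVariable(Origin())

  def get: Origin = current.value

  def setPosition(line: Int, start: Int): Unit =
    current.value = current.value.copy(line = Some(line), startPosition = Some(start))

  // withValue restores the previous Origin even if f throws.
  def withOrigin[A](o: Origin)(f: => A): A = current.withValue(o)(f)
}

val seen = CurrentOrigin.withOrigin(Origin(line = Some(7))) { CurrentOrigin.get }
// inside the block the origin is visible; after it, get is back to Origin()
```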
juan_gandhi: (VP)


So, Spark gets a function to apply to the data (e.g. in foreach, or in filter). Before running it, it (like a raccoon) cleans the function.

Enjoy the nice reading.
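What the raccoon is scrubbing, sketched (class names below are mine): a closure that reads a field captures the whole enclosing instance, and if that instance isn't Serializable, shipping the closure to executors blows up. ClosureCleaner's job is to null out such unneeded outer references before serialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Driver { // deliberately NOT Serializable, like a typical driver-side class
  val factor = 3

  // Reads the field directly => the lambda captures `this`, i.e. all of Driver.
  def dirty: Int => Int = x => x * factor

  // Copies the field to a local first => the lambda captures only an Int.
  def clean: Int => Int = { val f = factor; x => x * f }
}

// Can this closure be shipped over the wire?
def shippable(f: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f); true }
  catch { case _: NotSerializableException => false }

val d = new Driver
shippable(d.dirty)  // false: drags the non-serializable Driver along
shippable(d.clean)  // true: a self-contained function
```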


Page generated Sep. 23rd, 2017 12:12 am
Powered by Dreamwidth Studios