juan_gandhi: (Default)
It turned out all the delays were caused by two things:
- reading the files in parallel, as opposed to sequentially (don't ask, will investigate)
- parallel output to stdout, instead of linearising the output. That was obviously my mistake: I have to group the data, not mix it.

Now it's still 3 times faster than the FP version. I've got a suspicion that on the JVM method dispatch is much heavier than a plain if/else. It should be. But it's the staple of FP: never use booleans or ifs, dispatch by type instead. So, well... have to investigate.
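
To make the two styles concrete, here is a minimal sketch of the same decision written both ways. All names (DispatchStyles, describeIf, Line, ErrorLine, OkLine) are made up for illustration and are not from the code being measured; the sketch says nothing about which style is faster.

```scala
object DispatchStyles {
  // Style 1: a plain if/else on a Boolean flag.
  def describeIf(isError: Boolean, text: String): String =
    if (isError) s"ERROR: $text" else s"ok: $text"

  // Style 2: no Boolean at all; the case is encoded in the type,
  // and picking the branch is a virtual method call.
  sealed trait Line { def describe: String }
  final case class ErrorLine(text: String) extends Line {
    def describe: String = s"ERROR: $text"
  }
  final case class OkLine(text: String) extends Line {
    def describe: String = s"ok: $text"
  }
}
```

Both styles produce the same strings; only the mechanism that selects the branch differs, and that is the part worth profiling.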
juan_gandhi: (Default)
    // Set name from main class if not given
    name = Option(name).orElse(Option(mainClass)).orNull
    if (name == null && primaryResource != null) {
      name = Utils.stripDirectory(primaryResource)
    }


Just a chunk of their funny shitty code.

FYI, var name: String = null

I'd rewrite all that crap, but it's not only code, I'm afraid. It's the whole Spark world that needs a doctor.
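
For what it's worth, one way the same defaulting could be expressed without bouncing between Option and null is sketched below. This is only an illustration: resolveName is a made-up name, and stripDirectory here is a hypothetical stand-in for Spark's Utils.stripDirectory that simply keeps the last path segment.

```scala
object NameDefaulting {
  // Hypothetical stand-in for Utils.stripDirectory: keep the last path segment.
  def stripDirectory(path: String): String = new java.io.File(path).getName

  // The first non-null candidate wins. Option(null) is None, so nulls are skipped,
  // and the primary resource is only stripped if it is actually needed.
  def resolveName(name: String, mainClass: String, primaryResource: String): Option[String] =
    Option(name)
      .orElse(Option(mainClass))
      .orElse(Option(primaryResource).map(stripDirectory))
}
```

The result stays an Option, so the caller decides what a missing name means instead of getting a null back.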
juan_gandhi: (VP)
case class a(i:Int)
{
  override def canEqual(a: Any) = a.isInstanceOf[a]

  override def equals(o:Any) = AnyRef.equals(o)

  override def hashCode = AnyRef.hashCode
}

val aa = new a(1)
aa == aa //false
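
As far as I can tell, the trick is that AnyRef in expression position is not the superclass but the value scala.AnyRef from the scala package, so equals compares that value with the argument and hashCode returns that value's hash. A sketch, assuming Scala 2; Broken and ByReference are made-up names, and ByReference shows reference equality spelled out explicitly for contrast.

```scala
object EqualsDemo {
  // Same shape as above: `AnyRef` here is the value scala.AnyRef, not `super`.
  case class Broken(i: Int) {
    override def equals(o: Any): Boolean = AnyRef.equals(o)
    override def hashCode: Int = AnyRef.hashCode
  }

  // Reference equality written out by hand.
  case class ByReference(i: Int) {
    override def equals(o: Any): Boolean = o match {
      case r: AnyRef => this eq r
      case _         => false
    }
    override def hashCode: Int = System.identityHashCode(this)
  }

  val broken = Broken(1)
  val byRef  = ByReference(1)

  val brokenEqualsItself: Boolean = broken == broken         // false
  val byRefEqualsItself: Boolean  = byRef == byRef           // true
  val byRefEqualsCopy: Boolean    = byRef == ByReference(1)  // false
}
```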
juan_gandhi: (VP)
You know the difference in Scala between val f: A=>B and def f(a:A): B?

I do.

The first one is a point in B^A; the second one is an arrow from A to B. Yes, there's an adjunction, so there's a 1-1 correspondence, thanks to the Yoneda lemma; it's basically a special case of currying.
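
In code, the two directions of that correspondence look roughly like this. A minimal sketch with made-up names (ValVsDef, f, g, gAsValue, fAsMethod): a method becomes a function value by eta-expansion, and a function value becomes a method by applying it.

```scala
object ValVsDef {
  // A value: an object of type Function1[Int, Int].
  val f: Int => Int = _ + 1

  // A method: not a value by itself.
  def g(a: Int): Int = a + 1

  // Method to value: eta-expansion, triggered by the expected function type.
  val gAsValue: Int => Int = g

  // Value to method: wrap the application.
  def fAsMethod(a: Int): Int = f(a)

  // Only values have combinators such as andThen.
  val composed: Int => Int = f andThen gAsValue
}
```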
juan_gandhi: (VP)
Categories for Scala Programmers

Pretty primitive, but well, the target audience... It's just to dispel some myths and clear people's conscience.

ct4scala

Oct. 15th, 2016 04:23 pm
juan_gandhi: (VP)
Just slapped together a deck of slides: category theory for Scala programmers

http://tinyurl.com/ct4scala

Comments wholeheartedly welcome
juan_gandhi: (VP)
Saw this in the Spark documentation:
transformer: DataFrame =[transform]=> DataFrame

Couldn't make any sense of it at first, but then I got it and really came to appreciate it. It's great! Too bad you can't do that in real life. A real pity.
juan_gandhi: (VP)
  /**
   * Returns a Seq of the children of this node.
   * Children should not change. Immutability required for containsChild optimization
   */
  def children: Seq[BaseType]

  lazy val containsChild: Set[TreeNode[_]] = children.toSet


Imagine: we extend the class, but our children are not supposed to change. Then why t.f. not declare it a val?!
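
Here is a small sketch of what can go wrong when children is a def. Node, Leaf and Mutable are made-up stand-ins, not Spark's TreeNode: the lazy val caches the first answer, while the def is free to give a different one later.

```scala
object ChildrenDemo {
  abstract class Node {
    def children: Seq[Node]
    // Computed once, on first access, from whatever children returns at that moment.
    lazy val containsChild: Set[Node] = children.toSet
  }

  final class Leaf extends Node {
    def children: Seq[Node] = Nil
  }

  // Nothing stops a subclass from backing the def with mutable state.
  final class Mutable extends Node {
    val buffer = scala.collection.mutable.ArrayBuffer.empty[Node]
    def children: Seq[Node] = buffer.toList
  }

  val first  = new Leaf
  val second = new Leaf
  val parent = new Mutable

  parent.buffer += first
  val cachedEarly: Boolean = parent.containsChild.contains(first) // forces the lazy val
  parent.buffer += second

  // children sees the new child, the cached set does not.
  val stale: Boolean =
    parent.children.contains(second) && !parent.containsChild.contains(second)
}
```

With an abstract val children instead, a subclass could only implement it with a val or a lazy val, so the sequence reference itself could not change between calls.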

Another masterpiece
object CurrentOrigin {
  private val value = new ThreadLocal[Origin]() {
    override def initialValue: Origin = Origin()
  }

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)

  def reset(): Unit = value.set(Origin())

  def setPosition(line: Int, start: Int): Unit = {
    value.set(
      value.get.copy(line = Some(line), startPosition = Some(start)))
  }

  def withOrigin[A](o: Origin)(f: => A): A = {
    set(o)
    val ret = try f finally { reset() }
    reset()
    ret
  }
}


So, we have a static object that wants to do something with a context. (See withOrigin.) That's "dependency injection", Fortran style. And we can do without vars, because, well, it's ThreadLocal.
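
For comparison, a sketch of passing the same kind of context explicitly, as an implicit parameter, instead of through a global ThreadLocal. This is only an illustration and not a proposal for Spark's actual API: Origin here is a simplified hypothetical stand-in, and describe and this version of withOrigin are made up.

```scala
object ExplicitOrigin {
  // Hypothetical, simplified stand-in for Spark's Origin.
  final case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

  // The context arrives as a parameter; no global state is read.
  def describe(what: String)(implicit origin: Origin): String =
    s"$what at line ${origin.line.getOrElse(-1)}"

  // The block receives the origin, so there is nothing to set and nothing to reset.
  def withOrigin[A](o: Origin)(f: Origin => A): A = f(o)

  val message: String =
    withOrigin(Origin(line = Some(42))) { implicit o => describe("node") }
}
```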

Who wrote all this... could they hire Scala programmers, I wonder...
juan_gandhi: (VP)
scala> class A[T] { var x:T = _ }
defined class A

scala> val aInt = new A[Int]
aInt: A[Int] = A@36f6e879

scala> aInt.x
res2: Int = 0

  Until you take a closer look:

scala> class A[T] { var x:T = _; println(x) }
defined class A

scala> val aInt = new A[Int]
null
aInt: A[Int] = A@3fee9989

scala> aInt.x
res3: Int = 0


null magically turns into 0.
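
As far as I can tell, the mechanism is this: the field of type T is erased to Object and initialized to null; generic code sees it as is, while reading it through the Int-typed accessor unboxes it, and Scala's unboxing maps null to 0. A sketch, assuming Scala 2; raw is a made-up helper that reads the field generically.

```scala
object NullUnboxing {
  class A[T] { var x: T = _ }

  // Generic code reads the erased field as is, with no unboxing.
  def raw[T](a: A[T]): Any = a.x

  val holder = new A[Int]

  val seenGenerically: Any = raw(holder)          // null
  val seenAsInt: Int       = holder.x             // 0
  val castDirectly: Int    = null.asInstanceOf[Int] // 0, the same unboxing rule
}
```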
juan_gandhi: (VP)
  def ×[A,B,C,D](f:A=>C)(g:B=>D): (A,B) => (C,D) = (a,b) => (f(a), g(b))


Big deal, right?
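
A quick usage sketch, with the definition repeated so the snippet stands on its own; ProductOfFunctions and incrementAndMeasure are made-up names.

```scala
object ProductOfFunctions {
  def ×[A, B, C, D](f: A => C)(g: B => D): (A, B) => (C, D) =
    (a, b) => (f(a), g(b))

  // Pair up "add one" on Ints with "length" on Strings.
  val incrementAndMeasure: (Int, String) => (Int, Int) =
    ×((n: Int) => n + 1)((s: String) => s.length)

  val result: (Int, Int) = incrementAndMeasure(1, "ab") // (2, 2)
}
```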
juan_gandhi: (VP)

  implicit class StreamOfResults[T](source: Stream[Result[T]]) {
    def |>[U](op: T ⇒ Result[U]) = source map (t ⇒ t flatMap op)
    def filter(p: T ⇒ Outcome) = source |> (x => p(x) andThen Good(x))
    def map[U](f: T ⇒ U) = source map (_ map f)
  }


  implicit class StreamOfResults[T](source: Stream[Result[T]]) {
    def |>[U](op: T ⇒ Result[U]) = source map (t ⇒ t flatMap op)
    def filter(p: T ⇒ Result[_]) = source |> (x ⇒ p(x) returning x)
    def map[U](f: T ⇒ U) = source map (_ map f)
  }


E.g. use case:
  // this method could be written in a couple of lines, but it's better to keep the details
  def streamOfNewEobSelectors(): StreamOfResults[Element] = {
    // this function pairs an html element with its content
    def withContent(e: Element): Result[(Element, String)] = e.outerHtml() map ((e, _))

    // here we have a stream of elements paired with their content
    val pairs: StreamOfResults[(Element, String)] = streamOfEobElements |> withContent

    // here we filter the stream, leaving only the elements containing new stuff
    // note that the stuff we don't need is not kicked out, it's labeled as bad with an explanation
    val newOnes: StreamOfResults[(Element, String)] = pairs filter (p => isNewClaim(p._2))

    // here we forget the html
    newOnes map {case p:(Element, String) => p._1}
  }


Note that filter does not take a boolean, it takes an Outcome, which is a logical value, from the logic that I'm trying to work on. It's not Boolean.
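
To show the idea in isolation, here is a self-contained sketch. Everything in it is a hypothetical stand-in, not the real library: Result, Good and Bad are minimal models, returning is modeled as "keep the verdict, swap the value", the filter is called keepIf to avoid clashing with the collection's own filter, and a List stands in for the Stream.

```scala
object ResultStreamSketch {
  // Minimal hypothetical model of a Result: either a good value or a reason it is bad.
  sealed trait Result[+T] {
    def map[U](f: T => U): Result[U]
    def flatMap[U](f: T => Result[U]): Result[U]
    // Keep the good/bad verdict, replace the value.
    def returning[U](u: => U): Result[U] = map(_ => u)
  }
  final case class Good[+T](value: T) extends Result[T] {
    def map[U](f: T => U): Result[U] = Good(f(value))
    def flatMap[U](f: T => Result[U]): Result[U] = f(value)
  }
  final case class Bad(reason: String) extends Result[Nothing] {
    def map[U](f: Nothing => U): Result[U] = this
    def flatMap[U](f: Nothing => Result[U]): Result[U] = this
  }

  implicit class ResultsOps[T](val source: List[Result[T]]) {
    def |>[U](op: T => Result[U]): List[Result[U]] = source.map(_.flatMap(op))
    // Rejected elements are not dropped; they become Bad, with the predicate's explanation.
    def keepIf(p: T => Result[_]): List[Result[T]] =
      this.|>[T](x => p(x).returning(x))
  }

  def isEven(n: Int): Result[Unit] =
    if (n % 2 == 0) Good(()) else Bad(s"$n is odd")

  val results: List[Result[Int]] = List(Good(1), Good(2), Bad("missing"), Good(4))
  val evens: List[Result[Int]] = results keepIf isEven
}
```

The filtered list has the same length as the input: 1 turns into Bad("1 is odd"), the earlier Bad("missing") passes through untouched, and 2 and 4 stay Good.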
juan_gandhi: (VP)
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala

ClosureCleaner

So, Spark gets a function to apply to data (e.g. in foreach or in filter). Before running it, Spark (like a raccoon) cleans the function.

Enjoy the nice reading.

Page generated Feb. 22nd, 2017 10:04 pm