juan_gandhi: (Default)
Turned out all the delays were caused by two things:
- reading the files in parallel, as opposed to sequential (don't ask, will investigate)
- parallel output to stdout, instead of linearising the output. That was obviously my mistake, have to group the data, not mix them.

Now it's still 3 times faster than FP version. Got a suspicion that in JVM method dispatch is much heavier than plain if/else. It should be. But it's the staple of FP, never use booleans or ifs, but dispatch by the type. So, well... have to investigate.
juan_gandhi: (Default)
    // Set name from main class if not given
    name = Option(name).orElse(Option(mainClass)).orNull
    if (name == null && primaryResource != null) {
      name = Utils.stripDirectory(primaryResource)

Just a chunk of their funny shitty code.

FYI, var name: String = null

I'd rewrite all that crap, but it's not only code, I'm afraid. It's the whole Spark world that needs a doctor.
juan_gandhi: (VP)
case class a(i:Int)
  override def canEqual(a: Any) = a.isInstanceOf[a]

  override def equals(o:Any) = AnyRef.equals(o)

  override def hashCode = AnyRef.hashCode

val aa = new a(1)
aa == aa //false
juan_gandhi: (VP)
You know the difference in Scala between val f: A=>B and def f(a:A): B?

I do.

The first one is a point in BA; the second one is an arrow from A to B. Yes, there's an adjunction, so there's a 1-1 correspondence, thanks to Yoneda lemma; it's basically a special case of currying.
juan_gandhi: (VP)
Categories for Scala Programmers

Pretty primitive, but well, the target audience... just to dispel some myths and clear the people's conscience.


Oct. 15th, 2016 04:23 pm
juan_gandhi: (VP)
Just slapped together a deck of slides, category theory for scala programmers


Comments wholeheartedly welcome
juan_gandhi: (VP)
В документации по спарку увидел
transformer: DataFrame =[transform]=> DataFrame

Ни хрена не мог понять, а потом понял и шибко оценил. Классно же! Жаль, нельзя так по жизни. Очень жаль.
juan_gandhi: (VP)
   * Returns a Seq of the children of this node.
   * Children should not change. Immutability required for containsChild optimization
  def children: Seq[BaseType]

  lazy val containsChild: Set[TreeNode[_]] = children.toSet

Imagine, we override the class, but our children are not to be mutable. Why t.f. then not declare it val?!

Another masterpiece
object CurrentOrigin {
  private val value = new ThreadLocal[Origin]() {
    override def initialValue: Origin = Origin()

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)

  def reset(): Unit = value.set(Origin())

  def setPosition(line: Int, start: Int): Unit = {
      value.get.copy(line = Some(line), startPosition = Some(start)))

  def withOrigin[A](o: Origin)(f: => A): A = {
    val ret = try f finally { reset() }

So, we have a static object that wants to do something with a context. (See withOrigin.) That's "dependency injection", Fortran style. And we can do without vars, because, well, it's ThreadLocal.

Who wrote all this... could they hire Scala programmers, I wonder...
juan_gandhi: (VP)
scala> class A[T] { var x:T = _ }
defined class A

scala> val aInt = new A[Int]
aInt: A[Int] = A@36f6e879

scala> aInt.x
res2: Int = 0

  Until you take a closer look:

scala> class A[T] { var x:T = _; println(x) }
defined class A

scala> val aInt = new A[Int]
aInt: A[Int] = A@3fee9989

scala> aInt.x
res3: Int = 0

null magically turns into 0.
juan_gandhi: (VP)
  def ×[A,B,C,D](f:A=>C)(g:B=>D): (A,B) => (C,D) = (a,b) => (f(a), g(b))

Big deal, right?
juan_gandhi: (VP)

  implicit class StreamOfResults[T](source: Stream[Result[T]]) {
    def |>[U](op: T ⇒ Result[U]) = source map (t ⇒ t flatMap op)
    def filter(p: T ⇒ Outcome) = source |> (x => p(x) andThen Good(x))
    def map[U](f: T ⇒ U) = source map (_ map f)

  implicit class StreamOfResults[T](source: Stream[Result[T]]) {
    def |>[U](op: T ⇒ Result[U]) = source map (t ⇒ t flatMap op)
    def filter(p: T ⇒ Result[_]) = source |> (x ⇒ p(x) returning x)
    def map[U](f: T ⇒ U) = source map (_ map f)

E.g. use case:
  // this method could be written in a couple of lines, but it's better to keep the details
  def streamOfNewEobSelectors(): StreamOfResults[Element] = {
    // this function pairs an html element with its content
    def withContent(e: Element): Result[(Element, String)] = e.outerHtml() map ((e, _))

    // here we have a stream of elements paired with their content
    val pairs: StreamOfResults[(Element, String)] = streamOfEobElements |> withContent

    // here we filter the stream, leaving only the elements containing new stuff
    // note that the stuff we don't need is not kicked out, it's labeled as bad with an explanation
    val newOnes: StreamOfResults[(Element, String)]] = pairs filter (p => isNewClaim(p._2))

    // here we forget the html
    newOnes map {case p:(Element, String) => p._1}

Note that filter does not take a boolean, it takes an Outcome, which is a logical value, from the logic that I'm trying to work on. It's not Boolean.
juan_gandhi: (VP)


So, spark gets a function to apply to data (e.g. in foreach, or in filter). Before running it, it (like a raccoon) cleans the function.

Enjoy the nice reading.


juan_gandhi: (Default)

February 2017

    1 2 3 4
5 6 7 8 9 10 11
1213 14 15 16 17 18
19 20 21 22232425


RSS Atom

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Feb. 22nd, 2017 10:04 pm
Powered by Dreamwidth Studios