juan_gandhi: (VP)
I have an applicative Result class.

Result[A] <*> Result[B] gives a Result[(A,B)].

Now Csaba, who does not like this ASCII soup and does not like the phrase "tensor product" either (do you see tensors around here?), suggested calling this construct 'andAlso'.

I like this idea a lot.

  for ((name, dob) <- findValue(html, "user") andAlso findValue(html, "date of birth"))
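
Here is a minimal sketch of what such a class could look like. The real Result is not shown in this post, so Good, Bad, the List[String] error type and the Map-based findValue are all assumptions made up for illustration. The point is only that andAlso is a readable name for <*>, and that when both sides fail, both explanations survive.

```scala
// Hypothetical minimal Result; the real class is not shown in the post.
sealed trait Result[+A] {
  def map[B](f: A => B): Result[B] = this match {
    case Good(a)  => Good(f(a))
    case bad: Bad => bad
  }
  // Tensor product: both values as a pair, or the errors from every failed side.
  def <*>[B](that: Result[B]): Result[(A, B)] = (this, that) match {
    case (Good(a), Good(b)) => Good((a, b))
    case (Bad(e1), Bad(e2)) => Bad(e1 ++ e2)
    case (bad: Bad, _)      => bad
    case (_, bad: Bad)      => bad
  }
  // Readable alias for <*>.
  def andAlso[B](that: Result[B]): Result[(A, B)] = this <*> that
}
final case class Good[+A](value: A) extends Result[A]
final case class Bad(errors: List[String]) extends Result[Nothing]

object AndAlsoDemo {
  // Stand-in for findValue: looks a key up in a Map instead of parsing HTML.
  def findValue(fields: Map[String, String], key: String): Result[String] =
    fields.get(key) match {
      case Some(v) => Good(v)
      case None    => Bad(List(s"'$key' not found"))
    }

  // Combine two independent lookups, then use both values at once.
  def greeting(fields: Map[String, String]): Result[String] =
    (findValue(fields, "user") andAlso findValue(fields, "date of birth")) map {
      case (name, dob) => s"$name, born $dob"
    }
}
```

With an empty Map, greeting returns a Bad carrying both "not found" messages, not just the first one; that is the difference from chaining the two lookups with flatMap.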
juan_gandhi: (VP)
  def downloadPDF(url: String): Result[(File, String)] = {
    loadPage(url) andThen
    waitForSelector("div.textLayer") andThen 
    runJS("return extractPdfContent()") andThen {
      Thread.sleep(1000) // give browser a chance
      val extracted = runJS("return intBuf2hex(extractedPdf)") map (_.toString)
      val pdf = extracted flatMap 
        (_.decodeHex #> File.createTempFile("download", ".pdf"))
      
      val html = runJS("return _$('div.textLayer').innerHTML") map (_.toString)
      pdf <*> html
    }
  }


What happens here?
I load a page in Mozilla via Selenium. It is actually a PDF, but Mozilla pretends it's HTML.
andThen... (meaning, if it failed, no need to proceed, right?)
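
A sketch of how andThen might behave, assuming it takes its argument by name so that later steps are not even evaluated after a failure. Result, Good, Bad and the step helper here are hypothetical stand-ins, not the real code; the log is only there to show which steps actually ran.

```scala
// Hypothetical minimal Result, reduced to what andThen needs.
sealed trait Result[+A] {
  // Evaluate `next` only if this step succeeded; otherwise keep the failure.
  def andThen[B](next: => Result[B]): Result[B] = this match {
    case Good(_)  => next
    case bad: Bad => bad
  }
}
final case class Good[+A](value: A) extends Result[A]
final case class Bad(errors: List[String]) extends Result[Nothing]

object AndThenDemo {
  // Records the name of every step that was actually executed.
  val log = scala.collection.mutable.ListBuffer.empty[String]

  // Hypothetical browser step: notes that it ran, then succeeds or fails as told.
  def step(name: String, ok: Boolean): Result[String] = {
    log += name
    if (ok) Good(name) else Bad(List(s"$name failed"))
  }

  // The second step fails, so the third one is never run.
  def run(): Result[String] =
    step("loadPage", ok = true) andThen
    step("waitForSelector", ok = false) andThen
    step("runJS", ok = true)
}
```

After run(), the log holds only loadPage and waitForSelector, and the result is the failure from waitForSelector.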
Then I extract the innerHTML of the content div; I need to parse it.
Oh, the _.js means we convert the value of this string into a JavaScript representation of the string, with apostrophes escaped, wrapped in apostrophes.
But what the server sent is actually a PDF (rendered in Mozilla by pdf.js),
so I need the PDF binary. It came over HTTPS, and there's no API for interception.
So I go into the guts of pdf.js, find the holder of the binary, and tell it to give me the bytes (in a continuation). But the whole communication with the browser is imperative, so I sleep for a second. No biggie.

When I wake up, the bytes I need have already been pulled from the pdf.js future and converted to a hex string (about 160k of text, one line).
I extract it from the browser.
Then I decode the hex, producing bytes, and send them to a temp file; the #> op returns the file... actually, monadically, a hope for a file.
There's flatMap here; we flatten all hopes within hopes into one big hope - or an explanation of why everything fell apart.
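
To make the decode-and-save step concrete, here is a sketch under assumptions: decodeHex and #> below are my guesses at what such helpers might do (the real ones are not shown), and attempt is a made-up helper that turns exceptions into explanations. The flatMap is what keeps a hope-within-a-hope from appearing.

```scala
import java.io.File
import java.nio.file.Files
import scala.util.control.NonFatal

// Hypothetical minimal Result, reduced to what this step needs.
sealed trait Result[+A] {
  def flatMap[B](f: A => Result[B]): Result[B] = this match {
    case Good(a)  => f(a)
    case bad: Bad => bad
  }
}
final case class Good[+A](value: A) extends Result[A]
final case class Bad(errors: List[String]) extends Result[Nothing]

object HexDemo {
  // Assumed behavior of decodeHex: "cafe" becomes the two bytes 0xCA, 0xFE.
  def decodeHex(hex: String): Array[Byte] =
    hex.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray

  // Assumed behavior of #>: write the bytes, return the file or an explanation.
  implicit class BytesOps(bytes: Array[Byte]) {
    def #>(file: File): Result[File] =
      try { Files.write(file.toPath, bytes); Good(file) }
      catch { case NonFatal(e) => Bad(List(e.toString)) }
  }

  // Made-up helper: any exception thrown by `body` becomes a Bad.
  def attempt[A](body: => Result[A]): Result[A] =
    try body catch { case NonFatal(e) => Bad(List(e.toString)) }

  // Result[String] in, Result[File] out; flatMap avoids Result[Result[File]].
  def save(extracted: Result[String]): Result[File] =
    extracted flatMap { hex =>
      attempt(decodeHex(hex) #> File.createTempFile("download", ".pdf"))
    }
}
```

If the extraction already failed, save just passes the explanation along; if the hex is garbage, the NumberFormatException becomes the explanation.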

Now we have, hopefully, a PDF file and, hopefully, a text. We apply the tensor product to produce either a long list of explanations of why we failed, or a tuple, a pair (file, text).
QED.

Questions? Obvious?

Juan-Carlos Gandhi