juan_gandhi: (VP)
I have an applicative Result class.

Result[A] <*> Result[B] gives a Result[(A,B)].

Now Csaba, who does not like this ASCII soup, and does not like the phrase "tensor product" either (do you see tensors around here?), suggested calling this construct 'andAlso'.

I like this idea a lot.

  for ((name, dob) <- findValue(html, "user") andAlso findValue(html, "date of birth"))
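
For readers who want to see the shape of it, here is a minimal sketch of such a Result. The names Good and Bad, and errors kept as a List[String], are assumptions made for illustration and are not the actual class; only <*>, andAlso, and the idea that both sides must succeed come from the text above.

```scala
// Hypothetical sketch, not the real Result class.
object ResultSketch {
  sealed trait Result[+A] {
    // Tensor product: a pair if both sides succeeded,
    // otherwise every explanation collected from both sides.
    def <*>[B](that: Result[B]): Result[(A, B)] = (this, that) match {
      case (Good(a), Good(b)) => Good((a, b))
      case (Bad(e1), Bad(e2)) => Bad(e1 ++ e2)
      case (Bad(e), _)        => Bad(e)
      case (_, Bad(e))        => Bad(e)
    }

    // The readable alias suggested by Csaba.
    def andAlso[B](that: Result[B]): Result[(A, B)] = this <*> that

    def map[B](f: A => B): Result[B] = this match {
      case Good(a) => Good(f(a))
      case Bad(e)  => Bad(e)
    }
  }

  final case class Good[+A](value: A) extends Result[A]
  final case class Bad(errors: List[String]) extends Result[Nothing]
}
```

With this sketch, Good("Ann").andAlso(Good("1990-01-01")) is Good(("Ann", "1990-01-01")), while two Bads combine into a single Bad carrying both messages.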
juan_gandhi: (VP)
  def downloadPDF(url: String): Result[(File, String)] = {
    loadPage(url) andThen
    waitForSelector("div.textLayer") andThen 
    runJS("return extractPdfContent()") andThen {
      Thread.sleep(1000) // give browser a chance
      val extracted = runJS("return intBuf2hex(extractedPdf)") map (_.toString)
      val pdf = extracted flatMap 
        (_.decodeHex #> File.createTempFile("download", ".pdf"))
      
      val html = runJS("return _$('div.textLayer').innerHTML") map (_.toString)
      pdf <*> html
    }
  }


What happens here?
I load a page in Mozilla, via Selenium. Actually a pdf, but Mozilla pretends it's html.
andThen... (meaning, if it failed, no need to proceed, right?)
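
A sketch of what andThen could look like, with Either[List[String], A] standing in for Result. It assumes the next step is passed by name, so after a failure it is not even evaluated; whether the real andThen works this way is not stated here.

```scala
// Hypothetical sketch: Either stands in for the real Result class.
object AndThenSketch {
  // Left holds the explanations, Right holds the value.
  type Result[A] = Either[List[String], A]

  implicit class ResultOps[A](val self: Result[A]) {
    // `next` is by-name: if `self` already failed, `next` never runs
    // and the original explanations are passed along unchanged.
    def andThen[B](next: => Result[B]): Result[B] = self.flatMap(_ => next)
  }

  // Returns the outcome and whether the second step was executed.
  def demo(): (Result[String], Boolean) = {
    var secondStepRan = false
    val failed: Result[Unit] = Left(List("page did not load"))
    val outcome = failed.andThen[String] { secondStepRan = true; Right("content") }
    (outcome, secondStepRan)
  }
}
```

Here demo() returns the original failure, and the flag stays false because the second step was skipped.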
Then I extract the innerHTML of the content div; I need to parse it.
Oh, the _.js means we convert the value of this string into a JavaScript representation of the string, with apostrophes escaped, wrapped in apostrophes.
But what the server sent is actually a PDF (rendered in Mozilla by pdf.js), so I need the PDF binary. It came over HTTPS, and there's no API for interception.
So I go into the guts of pdf.js, find the holder of the binary, and tell it to give me the bytes (in a continuation). But the whole communication with the browser is imperative; so I sleep for a second. No biggie.

When I wake up, the bytes I need are already pulled from the pdf.js future and converted to a hex string (like 160k of text, one line).
I extract it from the browser.
Then I decode the hexes, producing bytes, and send them to a temp file; the #> op returns the file... actually, monadically, a hope for a file.
There's flatMap here; we flatten all hopes within hopes into one big hope - or an explanation of why everything fell apart.
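
The decode-and-save step can be sketched with scala.util.Try standing in for Result. This is an illustration under assumptions: decodeHex is written here as a plain function rather than a method on String, writeTo is a hypothetical stand-in for the #> op, and Try stops at the first failure instead of collecting explanations.

```scala
import java.io.File
import java.nio.file.Files
import scala.util.Try

// Hypothetical sketch: Try stands in for the real Result class.
object HexToFileSketch {
  // Turn "cafe..." into bytes; fails on odd length or non-hex characters.
  def decodeHex(hex: String): Try[Array[Byte]] = Try {
    require(hex.length % 2 == 0, "odd number of hex digits")
    require(hex.forall(c => Character.digit(c, 16) >= 0), "non-hex character")
    hex.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
  }

  // Stand-in for `#>`: write the bytes and return the file, or the failure.
  def writeTo(bytes: Array[Byte], file: File): Try[File] = Try {
    Files.write(file.toPath, bytes)
    file
  }

  // Each flatMap flattens a hope within a hope into a single hope.
  def saveHex(extracted: Try[String]): Try[File] =
    extracted
      .flatMap(decodeHex)
      .flatMap(bytes => writeTo(bytes, File.createTempFile("download", ".pdf")))
}
```

If the extraction already failed, saveHex returns that failure untouched and no temp file is created.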

Now we have, hopefully, a text, and, hopefully, a pdf. We apply the tensor product to produce either a long list of explanations of why we failed, or a tuple, a pair (file, text).
QED.

Questions? Obvious?

Juan-Carlos Gandhi

Page generated Aug. 20th, 2025 05:33 pm
Powered by Dreamwidth Studios