juan_gandhi: (VP)
So, I've read the book on Tika. It is kind of better than UIMA, but what does it actually do? It successfully extracts meta info and the contents of HDF and RSS. Excuse me, but all this information is already structured for extraction, so what's the point? What I am looking for is a way of extracting content from web pages... My case is kind of specific; working on it.

Example:
====================

Deductible

                          Maximum      Spent       Left
In Network      Member:   $3,000.00    $1,652.40   $1,347.60
                Family:   $7,000.00    $1,652.40   $5,347.60
Out of Network  Member:   $6,000.00    $0.00       $6,000.00
                Family:   $12,000.00   $0.00       $12,000.00


As a result, I get something like this:
Good((
fp(Map(Limit.In Network.Family -> $3,000.00, 
       Remaining Balance.Out of Network.Member -> $3,000.00, 
       Remaining Balance.In Network.Member -> $350.84, 
       Remaining Balance.Out of Network.Family -> $6,000.00, 
       Remaining Balance.In Network.Family -> $1,688.87, 
       Limit.Out of Network.Member -> $3,000.00, 
       Accumulated.Out of Network.Member -> $0.00, 
       Accumulated.In Network.Member -> $1,149.16, 
       Limit.Out of Network.Family -> $6,000.00, 
       Limit.In Network.Member -> $1,500.00, 
       Accumulated.Out of Network.Family -> $0.00, 
       Accumulated.In Network.Family -> $1,311.13)) with prefix <<Deductible>>) ++ 
(fp(Map(Limit.In Network.Family -> $10,000.00, 
        Remaining Balance.Out of Network.Member -> $9,000.00, 
        Remaining Balance.In Network.Member -> $3,550.84, 
        Remaining Balance.Out of Network.Family -> $18,000.00, 
        Remaining Balance.In Network.Family -> $8,304.22, 
        Limit.Out of Network.Member -> $9,000.00, 
        Accumulated.Out of Network.Member -> $0.00, 
        Accumulated.In Network.Member -> $1,449.16, 
        Limit.Out of Network.Family -> $18,000.00, 
        Limit.In Network.Member -> $5,000.00, 
        Accumulated.Out of Network.Family -> $0.00, 
        Accumulated.In Network.Family -> $1,695.78)) with prefix <<Out of Pocket>>))
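
For illustration only, here is a minimal Python sketch of this kind of flattening. It is not the extractor that produced the output above (that output evidently comes from other code and other data). It assumes the page text arrives with the cells run together, as in `In NetworkMember:$3,000.00$1,652.40$1,347.60`, and that every data row carries exactly three dollar amounts. The names `parse_benefit_block` and `MONEY`, and the column labels `Maximum`, `Spent`, `Left`, are hypothetical placeholders.

```python
import re

# Assumption: every amount looks like $1,234.56
MONEY = re.compile(r"\$[\d,]+\.\d{2}")

def parse_benefit_block(lines,
                        columns=("Maximum", "Spent", "Left"),
                        networks=("In Network", "Out of Network")):
    """Flatten rows of a benefits table into 'Column.Network.Who' -> amount."""
    result = {}
    network = None  # carried over to rows that omit the network label
    for raw in lines:
        line = raw.strip()
        amounts = MONEY.findall(line)
        if len(amounts) != len(columns):
            continue  # title, header, or blank line: not a data row
        # Everything before the first amount is the row label.
        label = line[: line.index("$")].strip().rstrip(":")
        for candidate in networks:
            if label.startswith(candidate):
                network = candidate
                label = label[len(candidate):].strip()
                break
        if network is None:
            continue  # a data row before any network label: skip it
        for column, amount in zip(columns, amounts):
            result[f"{column}.{network}.{label}"] = amount
    return result

page_text = """Deductible
MaximumSpentLeft
In NetworkMember:$3,000.00$1,652.40$1,347.60
Family:$7,000.00$1,652.40$5,347.60
Out of NetworkMember:$6,000.00$0.00$6,000.00
Family:$12,000.00$0.00$12,000.00"""

deductible = parse_benefit_block(page_text.splitlines())
```

The only design decision worth noting is that the network label is remembered between rows, because the `Family` rows do not repeat it.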
juan_gandhi: (VP)
Here the author hints that we can define a fixed point of Succ. Well, it's Nat.

Here Dan Piponi discusses, in terms still too obscure for me, aspects that kind of evade my comprehension.

Here data and codata are defined simply as initial algebras and terminal coalgebras (over a comonad?).
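
To make the two notions a bit more tangible, here is a rough Python sketch, under the usual reading that data is consumed by a fold and codata is produced by an unfold. It is an illustration, not a rendering of any of the linked articles; `Zero`, `Succ`, `fold_nat` and `unfold` are hypothetical names, and a Python generator stands in for a potentially infinite stream.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Union

@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class Succ:
    pred: "Nat"

# Nat as finite data: some number of Succ layers around Zero.
Nat = Union[Zero, Succ]

def fold_nat(n, zero, succ):
    """Consume finite data: replace Zero by `zero` and every Succ by `succ`."""
    acc = zero
    while isinstance(n, Succ):
        acc = succ(acc)
        n = n.pred
    return acc

def unfold(seed, step):
    """Produce codata: `step(seed)` returns None to stop, or (value, next_seed).
    Nothing guarantees that it ever stops."""
    while True:
        out = step(seed)
        if out is None:
            return
        value, seed = out
        yield value

three = Succ(Succ(Succ(Zero())))
as_int = fold_nat(three, 0, lambda k: k + 1)

# An endless stream 0, 1, 2, ...; we can only ever observe a prefix of it.
naturals = unfold(0, lambda s: (s, s + 1))
first_four = list(islice(naturals, 4))
```

The fold always terminates because its argument is finite; the unfold makes no such promise, which is why the last line takes only a prefix.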

It all looks logical but weird.

My practical gut feeling is that there's a BIG difference between potentially infinite data structures and strictly finite ones.

E.g. if I have a fully finite (C-like) structure, with no lists etc. inside, I can successfully match it "in real time"; but if I have something that has a lazy list or whatever inside, that is impossible, and we actually have to work with it in a totally different way: scan it, not match it. The exception is a map or a function, which we can ask for the value at a given key without bothering with whatever else it contains.
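
A small Python sketch of the three situations, as an illustration of the intuition rather than a formal statement. All names here (`Account`, `agree_on_prefix`, `evens`, `doubled_naturals`, `square`) are made up for the example.

```python
from dataclasses import dataclass
from itertools import count, islice

@dataclass(frozen=True)
class Account:
    """A strictly finite, C-like record: comparing two takes bounded time."""
    owner: str
    balance: int

def agree_on_prefix(xs, ys, n):
    """Scan, don't match: compare only the first n observations of two streams."""
    return list(islice(xs, n)) == list(islice(ys, n))

def evens():
    """0, 2, 4, ... computed by multiplication; never ends."""
    return (2 * k for k in count())

def doubled_naturals():
    """0, 2, 4, ... computed by addition; never ends either."""
    k = 0
    while True:
        yield k + k
        k += 1

def square(n):
    """A function: ask for the value at one key, ignore all the others."""
    return n * n

same_record = Account("ann", 10) == Account("ann", 10)            # decided outright
same_so_far = agree_on_prefix(evens(), doubled_naturals(), 1000)  # only "so far"
one_value = square(12)                                            # a single observation
```

No finite scan can establish that the two streams are equal; `agree_on_prefix` only says they have not been told apart after `n` observations.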

There must be some pretty simple philosophy there, but somehow I cannot grasp it yet.

Update: Observational Type Theory may be the answer.

"Potentially more important than the formalisation of mathematical theories is the development of correct software for communicating systems, which typically exhibit infinite behaviour and hence demand observational reasoning."

Practically, it may also tell us why we do not need DTD or type-safe JSON/REST.
juan_gandhi: (Default)
Some people believe that you cannot climb around a tree unless the nodes hold pointers to their parents.

That's why it has to be a tree, actually. If we have a menu, and one item is repeated in several places... in short, it won't work. And yet, in principle, so what: a poset is a poset.

More advanced people keep a secret knowledge: upward links have no right to exist. I have met such people at interviews twice. They probably do not share this knowledge because they know nobody will back them up.

So, while reading Beautiful Code, I realized something.

Of course upward links are not needed. A parent holds the lists of its children, and that's all.

But when we browse, we have to remember where we came from. That is exactly the pointer to the parent. Here one can generalize to graphs (well, after settling the question of cycles)... or to a category, heck.

But the main thing is that the stack (in DFS) holds all the necessary information.

The stack is something like codata. Codata. It gets folded together with the data.
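
A minimal Python sketch of this point, assuming nodes that know only their children. `Node` and `walk` are hypothetical names; the recursion's call stack plays the role of the DFS stack, and the `ancestors` tuple just makes its contents visible.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)  # no parent pointer

def walk(node, ancestors=()):
    """Depth-first traversal yielding (node, path from the root to its parent)."""
    yield node, ancestors
    for child in node.children:
        yield from walk(child, ancestors + (node,))

# One menu item shared by two submenus: a poset, not a tree.
settings = Node("Settings")
menu = Node("Menu", [Node("File", [settings]), Node("Edit", [settings])])

# The "parent" of Settings depends on how we got there, so it lives
# in the traversal, not in the node.
parents_of_settings = [path[-1].name for node, path in walk(menu) if node is settings]
```

This sketch assumes there are no cycles; on a general graph `walk` would need a visited set or some other way of cutting loops.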

On the same topic: it would be nice to teach quantum physics in elementary school. But there is nobody to do it. Just as 10,000 years ago there was nobody to teach children to read and write (is there any Russophobia in this?).

Profile

juan_gandhi: (Default)
Juan-Carlos Gandhi
