juan_gandhi | Entries tagged with data

https://www.bls.gov/cps/cpsaat11.htm

So, I've read the book on Tika; it is kind of better than UIMA, but what does it do? It successfully extracts meta info and the contents of HDF and RSS. Excuse me, but all this information is already structured for extraction; what's the point? What I am looking for is a way of extracting content from web pages... my case is kind of specific; working on it.

Example:
====================

Deductible

		Maximum	Spent	Left
In Network	Member:	$3,000.00	$1,652.40	$1,347.60
In Network	Family:	$7,000.00	$1,652.40	$5,347.60
Out of Network	Member:	$6,000.00	$0.00	$6,000.00
Out of Network	Family:	$12,000.00	$0.00	$12,000.00

As a result, I get something like this:

Good((
fp(Map(Limit.In Network.Family -> $3,000.00, 
       Remaining Balance.Out of Network.Member -> $3,000.00, 
       Remaining Balance.In Network.Member -> $350.84, 
       Remaining Balance.Out of Network.Family -> $6,000.00, 
       Remaining Balance.In Network.Family -> $1,688.87, 
       Limit.Out of Network.Member -> $3,000.00, 
       Accumulated.Out of Network.Member -> $0.00, 
       Accumulated.In Network.Member -> $1,149.16, 
       Limit.Out of Network.Family -> $6,000.00, 
       Limit.In Network.Member -> $1,500.00, 
       Accumulated.Out of Network.Family -> $0.00, 
       Accumulated.In Network.Family -> $1,311.13)) with prefix <<Deductible>>) ++ 
(fp(Map(Limit.In Network.Family -> $10,000.00, 
        Remaining Balance.Out of Network.Member -> $9,000.00, 
        Remaining Balance.In Network.Member -> $3,550.84, 
        Remaining Balance.Out of Network.Family -> $18,000.00, 
        Remaining Balance.In Network.Family -> $8,304.22, 
        Limit.Out of Network.Member -> $9,000.00, 
        Accumulated.Out of Network.Member -> $0.00, 
        Accumulated.In Network.Member -> $1,449.16, 
        Limit.Out of Network.Family -> $18,000.00, 
        Limit.In Network.Member -> $5,000.00, 
        Accumulated.Out of Network.Family -> $0.00, 
        Accumulated.In Network.Family -> $1,695.78)) with prefix <<Out of Pocket>>))
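
The extractor itself is not shown here, so the following is only a minimal sketch of the flattening step, assuming the table arrives as tab-separated lines like in the example above. The names `DeductibleTable`, `lines` and `flatten` are made up for illustration, and the keys use the column headers as they appear in the example (Maximum, Spent, Left), not the Limit / Accumulated / Remaining Balance names from the output.

```scala
object DeductibleTable {
  // Rows copied from the example above, with tabs written as \t.
  val lines: List[String] = List(
    "\t\tMaximum\tSpent\tLeft",
    "In Network\tMember:\t$3,000.00\t$1,652.40\t$1,347.60",
    "In Network\tFamily:\t$7,000.00\t$1,652.40\t$5,347.60",
    "Out of Network\tMember:\t$6,000.00\t$0.00\t$6,000.00",
    "Out of Network\tFamily:\t$12,000.00\t$0.00\t$12,000.00"
  )

  // Turns a header line plus data rows into a flat map
  // keyed by "Column.Network.Who", e.g. "Left.In Network.Member".
  def flatten(lines: List[String]): Map[String, String] = lines match {
    case header :: rows =>
      val columns = header.split("\t").map(_.trim).filter(_.nonEmpty).toList
      rows.flatMap { row =>
        row.split("\t").map(_.trim).toList match {
          case network :: who :: values if values.length == columns.length =>
            columns.zip(values).map { case (column, value) =>
              s"$column.$network.${who.stripSuffix(":")}" -> value
            }
          case _ => Nil // skip rows that do not fit the expected shape
        }
      }.toMap
    case Nil => Map.empty
  }
}
```

With this, `DeductibleTable.flatten(DeductibleTable.lines)` yields twelve entries, one per cell of the table.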

Here the author hints that we can define a fixed point of Succ. Well, it's Nat.
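
To make the "fixed point is Nat" remark concrete, here is a small sketch. It assumes the successor functor is read as F[X] = 1 + X, which is `Option` in Scala; that reading is my assumption, not something taken from the linked text, and `FixNat`, `Fix`, `zero`, `succ` and `toInt` are names invented for the illustration.

```scala
object FixNat {
  // Fix[F] ties the recursive knot: a Fix[F] is an F filled with Fix[F].
  final case class Fix[F[_]](unfix: F[Fix[F]])

  // Assumption: the "successor" functor is Option, i.e. F[X] = 1 + X.
  type Nat = Fix[Option]

  val zero: Nat = Fix[Option](None)
  def succ(n: Nat): Nat = Fix[Option](Some(n))

  // Folding terminates because every Nat built from zero and succ is finite.
  def toInt(n: Nat): Int = n.unfix match {
    case None    => 0
    case Some(m) => 1 + toInt(m)
  }
}
```

For example, `FixNat.toInt(FixNat.succ(FixNat.succ(FixNat.zero)))` evaluates to 2.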

Here Dan Piponi discusses, in terms too obscure for me (yet), aspects that kind of evade my comprehension.

Here data and codata are defined simply as initial algebras and terminal coalgebras (over a comonad?).

It all looks logical but weird.

My practical gut feeling is that there's a BIG difference between potentially infinite data structures and strictly finite ones.

E.g., if I have a fully C-like structure, with no lists etc. inside, I can successfully match it "in real time"; but if I have something with a lazy list or the like inside, that is impossible, and we actually have to work with it in a totally different way: scan it, not match it. The exception is a map or a function, which we can ask for the value at a given key without bothering with whatever else it contains.
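
A rough Scala illustration of the three cases, assuming Scala 2.13 or later for `LazyList`; all names (`FiniteVsLazy`, `Point`, `naturals`, and so on) are made up for the example.

```scala
object FiniteVsLazy {
  // A strictly finite, C-like record: a match inspects all of it at once.
  final case class Point(x: Int, y: Int)

  def isOrigin(p: Point): Boolean = p match {
    case Point(0, 0) => true
    case _           => false
  }

  // A potentially infinite structure: we can only scan a prefix of it.
  val naturals: LazyList[Int] = LazyList.from(0)

  // Scans until the first square above the limit shows up, then stops.
  def firstSquareAbove(limit: Int): Option[Int] =
    naturals.map(n => n * n).find(_ > limit)

  // A function can be asked about one key without touching anything else.
  val square: Int => Int = n => n * n
}
```

Matching on `Point` is total and immediate, while the lazy list can only be observed element by element; trying to compare `naturals` as a whole against something would never finish.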

There must be some pretty simple philosophy there, but somehow I cannot grasp it yet.

Update: Observational Type Theory may be the answer.

"Potentially more important than the formalisation of mathematical theories is the development of correct software for communicating systems, which typically exhibit infinite behaviour and hence demand observational reasoning."

Practically, it may also tell us why we do not need DTD or type-safe JSON/REST.

generic data model

Some people believe you cannot climb around a tree unless the nodes hold pointers to their parents.

That is why it has to be a tree, really. If we have a menu and one item is repeated in several places... in short, it won't work. Yet in principle, so what: a poset is a poset.

More advanced people keep a secret knowledge: upward links have no right to exist. I have met such people at interviews twice. They probably do not share this knowledge because they know nobody will back them up.

So, while reading Beautiful Code, I realized something.

Of course, upward links are not needed. A parent holds the lists of its children, and that's all.

But when we browse, we have to remember where we came from. That is the pointer to the parent. Here one can generalize to graphs (well, after sorting out the question of cycles)... or to a category, heh.

But the main thing is that the stack (in DFS) stores all the necessary information.

The stack is something like codata. Codata. It gets folded together with the data.
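
A minimal sketch of that idea, not taken from Beautiful Code: nodes hold only their children, and the depth-first walk carries the path from the root as an argument, which plays the role of the stack. `TreeWalk`, `Node` and `paths` are hypothetical names.

```scala
object TreeWalk {
  // Nodes know only their children; there are no parent pointers.
  final case class Node(name: String, children: List[Node] = Nil)

  // Depth-first walk that returns, for every node, the path from the root.
  // The `path` argument is the stack: it remembers where we came from.
  def paths(node: Node, path: List[String] = Nil): List[List[String]] = {
    val here = path :+ node.name
    here :: node.children.flatMap(child => paths(child, here))
  }
}
```

Because the parent is remembered by the walk and not by the node, the same node can sit under several parents (the repeated menu item) and still be reached with the right path each time.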

On the same topic: it would be nice to teach quantum physics in elementary school. But there is nobody to do it. Just as 10,000 years ago there was nobody to teach children to read and write (isn't there some Russophobia in this?).

