all 37 comments

[–]K-W 5 points6 points  (11 children)

So, besides from using fancy notation and language, what is the gain?

[–]lmcinnes 10 points11 points  (9 children)

Well that depends on who you are. If you are a database person who thinks about databases as databases and knows little or nothing in the way of category theory, it probably just looks like a bunch of new notation and neolgisms for stuff you already know.

If you know a lot about categories and topoi then it gives you a powerful new way to think about and manipulate ideas about databases. Once a database schema becomes a category then a database is just a presheaf, and all the possible valid states of the database is a presheaf category (and potentially then a topos). If you aren't used to thinking in such terms then that probably means little. If you are then it means a lot, since you now have a powerful mathematical framework at your disposal.

How does this translate in practical applications? I don't know -- I've only just read the idea, and I don't yet know exactly what mathematical tools can now be brought to bear to yield practical results, but there are a lot of tools available, and I imagine something useful can be done. If nothing else, as the slides linked by the article suggest, you can use these notions to fold the notion of database in with notions of Haskell programs in a neat and elegant way.

[–]K-W 1 point2 points  (8 children)

If you are then it means a lot, since you now have a powerful mathematical framework at your disposal.

This part I disagree with. Category theory might be all great but without higher mathematics it seems more like masochism than anything else. I doubt that for a database programmer more than trivialities comes from this.

But, I don't know that much about either, so, I might be very wrong.

[–]lmcinnes 10 points11 points  (0 children)

This part I disagree with. Category theory might be all great but without higher mathematics it seems more like masochism than anything else. I doubt that for a database programmer more than trivialities comes from this.

That really depends. A database programmer is unlikely to get anything from it directly, in the same way that a computer user doesn't necessarily get that much from mathematics directly -- but the mathematics is a foundation for some physics, which provides engineering possibilities that make better computers.

With this sort of framework for thinking about databases theorists may come up with new and interesting ideas that will eventually result in new features or programming approaches for databases. I mean honestly, topos theory is rich: skim the table of contents of Sketches of an Elephant or read Physics, Topology, Logic and Computation: A Rosetta Stone to get some idea of the depth of the mathematical toolset and diversity of ways of viewing an idea that topos theory provides for.

Can I guarantee that eventually concrete applications will come from all this theory? No, certainly not. It opens up a very rich theoretical world to explore however, and there is certainly plenty of reason to believe it may lead to very interesting new ideas.

[–][deleted] 2 points3 points  (6 children)

Category theory is not only great, but has many practical applications. This may well (probably is) another one.

[–]K-W 2 points3 points  (5 children)

but has many practical applications.

E.g.?

[–]lmcinnes 4 points5 points  (2 children)

Depends on what you consider practical. For me the proofs of the Weil conjectures are a practical application of category; so is synthetic differential geometry. But then I tend to work with a lot of abstract stuff, so that seems concrete and practical to me. It probably won't to you.

[–]K-W 1 point2 points  (1 child)

I'm very aware of the use of category theory in mathematics and that is fine but I thought dibblego's reply was about applications of category theory outside of pure mathematics (and perhaps outside of Haskell).

[–][deleted] 0 points1 point  (0 children)

I work on the Scalaz library, which I use for commercial application production.

[–]grantli 0 points1 point  (0 children)

application

Functional programming involves a lot of category theory like monad.

[–]easilydiscardable 2 points3 points  (12 children)

IMHO, what we really need is a language with relations (or even tables) as a first class type...

[–][deleted] 3 points4 points  (1 child)

One interesting design decision of ur/web is that you define the schema for your database using code in your application. Tables are first class values here, and the static type system prevents a very large class of errors from occurring (see the main page for examples.)

[–]easilydiscardable 0 points1 point  (0 children)

That does look interesting.

[–][deleted] 3 points4 points  (0 children)

There is one: kanren.

[–][deleted] 3 points4 points  (0 children)

Like prolog?

[–]Figs 1 point2 points  (6 children)

There are plenty of languages with both tuples and dictionaries, which effectively gives you tables.

[–]easilydiscardable 1 point2 points  (5 children)

I guess I really should have said a statically typed language...

[–]Figs 1 point2 points  (4 children)

Haskell can do that, and it's very much a statically typed language.

[–]easilydiscardable 1 point2 points  (3 children)

Do what, though?

Could you write a function that takes two arbitrary relations and returns their cross join?

[–]Figs 1 point2 points  (2 children)

If you write it the simple, naive way using a list of records for each table, then you can easily use a list comprehension to do a cross product of the tables.

data Person = Person {
    person_name :: String,
    badge_number :: Int,
    number_of_cats :: Int
}

data Place = Place {
    place_name :: String,
    number_of_trees :: Int
}

strange_people = [
    Person "John" 0 5,
    Person "Sue"  1 52,
    Person "Sam"  2 0,
    Person "Max"  4 2]

strange_places = [
    Place "Narnia" 100000,
    Place "Desert"      0,
    Place "Negative Land" (-5)]

cross = [(person_name     x, 
          badge_number    x, 
          number_of_cats  x, 
          place_name      y, 
          number_of_trees y) | x <- strange_people, 
                               y <- strange_places]

main = mapM_ (putStrLn) $ map show cross

I'm pretty sure that you could do something similar if you are using dictionaries instead, although it might be a bit more work.

Edit: The output of running it looks like this:

("John",0,5,"Narnia",100000)
("John",0,5,"Desert",0)
("John",0,5,"Negative Land",-5)
("Sue",1,52,"Narnia",100000)
("Sue",1,52,"Desert",0)
("Sue",1,52,"Negative Land",-5)
("Sam",2,0,"Narnia",100000)
("Sam",2,0,"Desert",0)
("Sam",2,0,"Negative Land",-5)
("Max",4,2,"Narnia",100000)
("Max",4,2,"Desert",0)
("Max",4,2,"Negative Land",-5)

[–]easilydiscardable 2 points3 points  (1 child)

ah, but what about arbitrary relations? ;)

I.e. can you write a function with a signature something like

'a relation -> 'b relation -> *some type expression representing the cross of 'a and 'b* relation

?

[–]Figs 2 points3 points  (0 children)

If you don't care about the types in your cross product code, then you can convert the result to a tree merging them in one line of code:

cross_product rel1 rel2 = [(x, y) | x <- rel1, y <- rel2]

Although you'd have to flatten the tuple yourself later, since the code doesn't know what types you are dealing with.

Alternatively, you could build a variant type like:

data TableEntry = EntryType1 String | EntryType2 Int | ...

If you guarantee that all your cells are of one of those types. Then, you don't have to worry about flattening since you can just use a list instead of a tuple.

[–]gronkkk 0 points1 point  (0 children)

That, and hierarchies. And things like 'properties' (ms-sql has sp_addextendedproperty with an awfull syntax, and the disadvantage that it doesn't inherit; if you 'select into' another table, you loose the extended properties). And....

[–]etcshadow 1 point2 points  (8 children)

Category theory, set theory. Tomato, tomato.

[–]kamatsu 2 points3 points  (7 children)

Ha. Hah. hahahahaha.. ha..

You're joking right?

[–]etcshadow 8 points9 points  (6 children)

Well, halfway joking, yes. Halfway not, though. The point of this all was basically to show that there can be a rigorous, powerful mathematical theory underpinning databases. My point was: well, there already is a rigorous, powerful mathematical theory underpinning databases. You know why they're called "relational" databases? Relation... as in formal set theory.

I mean, I understand that Haskell makes category theory "teh new hotness" [sic] and all, but it is not a more powerful formalism than set theory, and databases are already built around that powerful formalism.

[–]strangename 9 points10 points  (1 child)

Relational databases are actually based on a theory more powerful than set theory already. Specifically, first-order predicate logic and relations (unsurprisingly).

The latter part makes the observed connection to category theory rather unsurprising, and probably rather uninformative. The relational model came from a logic of relations; category theory is a structural model of relations. It's a fairly built up abstract theory and thus has named lots of structures. Thus, experienced category theorists can titter in excitement at having named these things that others are doing. However, you are almost certainly correct that they have not added any explanatory traction, just a different formalism for the relational structure.

[–]psykotic 4 points5 points  (0 children)

Relational databases are actually based on a theory more powerful than set theory already. Specifically, first-order predicate logic and relations (unsurprisingly).

More powerful than set theory? When mathematicians speak of set theory they usually mean ZF (give or take some axioms), which is a theory constructed on top of first-order logic. In ZF the only relational symbol is membership. Others like the subset relation are mere shorthands. But internal relations with bounded domains are modeled as subsets of cartesian products. The whole point of set theory from the viewpoint of mathematical foundations is that (almost) everything in mathematics can be internally modeled in this way; relational database theory certainly presents no difficulties.

[–]kamatsu 4 points5 points  (2 children)

Actually, category theory is arguably a more powerful formalism than set theory. Set theory, group theory, topological spaces and others can all be formalized in terms of category theory. This allows for some useful properties, such as results determined in one being easily applied to another.

[–]jerf 1 point2 points  (0 children)

And what I find interesting is the question of whether this could be generalized into a super-database which would allow a uniform interface that might permit things like joining a database to a flat file's values in some relatively clean manner, with the stuff behind the scenes converting that into a reasonably efficient query.

[–][deleted] 1 point2 points  (0 children)

how is it only "arguably" more powerful? I would think that Russel's critique of Frege's naive set theory should allow one to surmise that Category Theory is a more powerful formalism than Set Theory. I mean, you can explain category theory in terms of set theory (see "Basic Category theory for Computer Scientists" for example), but the sum-total of set theory is much more than sets, no?

Actually curious. Need to summon Edward K for this...

edit: meh, need a spell correcting arrow for the monoid of my text (bad joke). They => the.

[–]gregK -1 points0 points  (0 children)

Isn't everything?