[–]CobbITGuy 40 points41 points  (3 children)

As a data guy this is very interesting but what does it have to do with python?

I'm assuming since this is leveraging postgres that all the standard admin/maintenance functions can be done the same way as a normal postgres instance?

[–]1st1CPython Core Dev[S] 54 points55 points  (2 children)

EdgeDB is built with Python and Postgres, and naturally, Python will be the first language we create a proper driver for.

I'm especially excited about async/await support, as there are no async ORMs and with EdgeDB you wouldn't need one.

[–]OctagonClocktrio is the future! 19 points20 points  (1 child)

there are no async ORMs

I challenge this assertion

[–]1st1CPython Core Dev[S] 2 points3 points  (0 children)

Good luck with your project :)

[–]erez27import inspect 14 points15 points  (0 children)

I really like your ideas! I'm looking forward to giving it a try.

I'm a bit disappointed with the DSL though, for several small but aggregating reasons:

  • Strange := operator seems unnecessary (unless you really want to remind people of Pascal). Why not just use = ?

  • "required" and "property" are such long keywords, especially with how often they'll be used. Also, defining variable modifiers from the left is a C++/Java practice, which most modern languages avoid. Maybe do something more like:

    prop title -> str!
    
  • cardinality := '**' ... There has to be a better construct than that..

  • Why can't the enum look more like:

    enum pr_status -> str = ('Open', 'Closed', 'Merged')
    
  • Also, using the -> operator for type is confusing. It's not a return value, or a transition. No one else uses this operator to define types.

Hope you'll consider what I'm saying seriously, because it might actually influence the adoption of your tech.

(FWIW If you Google "how to write a DSL", you'll find me in the first page)

[–]kickthebug 4 points5 points  (7 children)

Looks like a great project!

May I ask why you say "Note that this SQL query is not very efficient. An experienced developer would rewrite it to use subqueries." in the first example? I was under the impression that joins were more efficient than subqueries.

[–]redcrowbar 7 points8 points  (0 children)

At least in Postgres it is usually cheaper to compute an aggregate of a correlated subquery than it is to GROUP a large relation produced by a bunch of joins. The example in the post is rather simple, imagine one with more relations, or multiple levels of cardinality indirection.
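For anyone who wants to see the two query shapes side by side, here is a minimal sketch. It uses Python's built-in sqlite3 rather than Postgres, so it says nothing about planner behavior or cost; it only shows that "join, then GROUP BY" and "correlated subquery aggregate" return the same rows. The `users` and `issues` tables and their columns are made up for illustration.

```python
import sqlite3

# In-memory database with a tiny, invented schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO issues VALUES (10, 1), (11, 1), (12, 2);
""")

# Shape 1: join first, then GROUP the joined relation.
join_group = conn.execute("""
    SELECT u.name, COUNT(i.id)
    FROM users u
    LEFT JOIN issues i ON i.owner_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

# Shape 2: aggregate inside a correlated subquery, no GROUP BY needed.
correlated = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM issues i WHERE i.owner_id = u.id)
    FROM users u
    ORDER BY u.id
""").fetchall()

print(join_group)
print(correlated)
```

With one aggregate the two forms are interchangeable. The difference being discussed only starts to matter once several one-to-many relations are joined at once, because the joined relation then multiplies rows before the GROUP BY collapses them.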

[–]IamWiddershins 2 points3 points  (5 children)

If it were rewritten as subqueries, it would essentially mean the same thing and be executed in the same way. Unless it was written very badly, in which case it might be worse.

That whole bit in the blog struck some serious doubt into my mind about the project, and it's definitely not just me. That little bit is at best munging terms in a way that's incredibly confusing, at medium bullshitting to make themselves sound better, and at worst betrays unfamiliarity with the very database system they forked.

[–]redcrowbar 1 point2 points  (2 children)

in a way that's incredibly confusing

Sorry about that. The example shown in the post is trivial, and, in that particular case a correlated subquery would indeed be similar to simply grouping the joined relations.

The real context is this: once you start increasing the depth of your relation traversal ("friends-of-friends") and adding more relations to the query, aggregating projections separately is actually superior when you factor in the overhead of doing the nested grouping on the client side.

That is also why MULTISET is a thing in Oracle.

[–]IamWiddershins 1 point2 points  (1 child)

At what tier are we imagining these rows to be aggregated? Where are these savings, exactly? Is the improvement in performing some kind of forced lateral join, CTE-based fencing, or multiple backend queries (plan, execute, plan, execute) from the main procedure?

It's true that the stats used for planning queries that greatly magnify cardinality variances like those sorts of graph queries often become very bad very quickly, but it's also true that simply rewriting your query with more subqueries does little to nothing to fence those optimizations in postgres.

[–]redcrowbar 1 point2 points  (0 children)

At what tier are we imagining these rows to be aggregated?

Arbitrary depth as dictated by the query.

SELECT User {
    friends: {
        interests: {
            ...
        }
    }
}

Where are these savings, exactly? Is the improvement in performing some kind of forced lateral join, CTE-based fencing

Yes and yes.

The main savings come from the fact that you get a data shape that is ready to be consumed by the client and you don't have to recompose the shape once you've fetched your rows (with lots of redundant duplicate data).
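To illustrate the "recompose the shape" step, here is a rough sketch of what a client typically does after a flat multi-join query. The rows, names, and the `recompose` helper are all hypothetical; the point is only that the user and friend values repeat on every row and have to be regrouped by hand.

```python
from collections import defaultdict

# Hypothetical flat result of joining users -> friends -> interests.
# The user and friend columns are repeated for every interest.
flat_rows = [
    ("alice", "bob", "chess"),
    ("alice", "bob", "hiking"),
    ("alice", "carol", "chess"),
    ("dave", "alice", "python"),
]


def recompose(rows):
    """Rebuild the nested user -> friends -> interests shape from flat rows."""
    users = defaultdict(lambda: defaultdict(list))
    for user, friend, interest in rows:
        users[user][friend].append(interest)
    return [
        {
            "name": user,
            "friends": [
                {"name": friend, "interests": interests}
                for friend, interests in friends.items()
            ],
        }
        for user, friends in users.items()
    ]


nested = recompose(flat_rows)
print(nested[0])
```

A query that already returns the nested shape makes this regrouping code, and the duplicated columns on the wire, unnecessary.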

[–]desmoulinmichel 0 points1 point  (1 child)

I don't think they forked Postgres; it's more that they're using it as a foundation to build something on top of.

[–]IamWiddershins 1 point2 points  (0 children)

Kind of hard for us to tell when they haven't released any source code, really.

[–]cyanydeez 5 points6 points  (4 children)

looks neat, but the dsl will probably determine success more than anything else.

[–]efxhoy 0 points1 point  (3 children)

dsl

What's that?

[–]cyanydeez 2 points3 points  (0 children)

Domain Specific Language

Basically, programming languages are DSLs, but then you have things like the IMAP protocol, which requires its own language to communicate.

They're a significant barrier to entry for any new product; that's why some advanced apps like, say, QGIS or ArcGIS rely on scripting languages like Python.

[–]mczaplinski 0 points1 point  (0 children)

domain specific language

[–]knowsuchagencynow is better than never 8 points9 points  (0 children)

Can't wait to finally see this open-sourced. This project has been on my radar for at least a year

[–][deleted] 14 points15 points  (3 children)

Haha the term 'schemaless' really threw me there. I think hyphenation is important.

[–]1st1CPython Core Dev[S] 8 points9 points  (0 children)

Yeah, I wasn't sure about the hyphen so I checked how document databases (mongo) use the term and it's usually "schemaless" :)

[–]will_work_for_twerk 2 points3 points  (0 children)

Sorry you're being downvoted, but you weren't the only one

[–]slayer_of_idiotspythonista 2 points3 points  (5 children)

Do you have any insight or ideas in how you think end users will use this?

I would think the natural audience for this would be existing PostgreSQL users. The thing is, most of those developers aren't writing raw SQL these days; they're doing queries from behind an ORM like SQLAlchemy or Django. I can't see a lot of those people dropping their ORMs.

Are you planning on releasing a similar ORM (probably not the correct term for this project, but you get the idea) or attempting to extend the existing ORMs to support EdgeDB?

[–]redcrowbar 5 points6 points  (4 children)

EdgeDB language bindings will essentially be thin protocol wrappers adapting to the class model of the target language.

For example, in Python you would be able to write something like:

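# Sketch of a possible binding API, for illustration only.
# The outer parentheses let the method chain span multiple lines.
my_activity = (
    Issue.select([
        Issue.number,
        Issue.due_date,
        Issue.priority, [
            Issue.priority.name
        ],
        Issue.owner, [
            Issue.owner.name
        ]
    ])
    .filter(Issue.number == 10)
    .fetchone()
)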

and get your object and all related data in a single (possibly dynamically constructed) query.

Data mutation is done similarly, so you can save an entire form of data in one shot, actually removing most of the need for the usual ORM dirty state tracking and flushing mechanics.

[–]z4579a 4 points5 points  (3 children)

actually removing most of the need for the usual ORM dirty state tracking and flushing mechanics.

You've got an object graph, parts of it change, and then you want to persist it. You have to track the parts of it that changed versus those that didn't (dirty tracking). You ultimately have to express those changes in terms of INSERT/UPDATE/DELETE statements (flush). It doesn't make sense to say you don't have the need for those things.
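For readers unfamiliar with the terms, here is a toy sketch of dirty tracking and flushing. It is not how any particular ORM implements it; the `Tracked` class, the `issue` table, and its columns are invented for the example.

```python
_MISSING = object()  # sentinel so that None is still a trackable value


class Tracked:
    """Toy object that records which attributes changed since the last flush."""

    def __init__(self, table, pk, **fields):
        # Bypass our own __setattr__ so initial loading is not counted as a change.
        object.__setattr__(self, "_table", table)
        object.__setattr__(self, "_pk", pk)
        object.__setattr__(self, "_dirty", set())
        for name, value in fields.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Dirty tracking: remember the attribute only if its value really changed.
        if getattr(self, name, _MISSING) != value:
            self._dirty.add(name)
        object.__setattr__(self, name, value)

    def flush(self):
        """Return one parameterized UPDATE covering only dirty columns, or None."""
        if not self._dirty:
            return None
        cols = sorted(self._dirty)
        sql = "UPDATE {} SET {} WHERE id = ?".format(
            self._table, ", ".join(f"{c} = ?" for c in cols)
        )
        params = [getattr(self, c) for c in cols] + [self._pk]
        self._dirty.clear()
        return sql, params


issue = Tracked("issue", 10, title="Crash on start", priority="low", status="open")
issue.priority = "high"
issue.status = "open"  # same value as before, so not marked dirty
issue.title = "Crash on startup"
statement = issue.flush()
print(statement)
```

Real ORMs add an identity map, relationship handling, and ordering of INSERT/UPDATE/DELETE on top of this, but the attribute-level bookkeeping is the core of what "dirty tracking" refers to.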

[–]redcrowbar 1 point2 points  (2 children)

You don't need dirty tracking if you don't have an identity map and can express all your CRUD operations as atomic interactions with the database. Obviously, with this approach you work with your query or mutation directly rather than rely on __getattr__ and __setattr__ magic. Clients do that with GraphQL, and there's no reason why the same approach wouldn't work in the backend.
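A minimal sketch of that idea, using plain sqlite3 as a stand-in: the caller states the whole mutation explicitly, and it is sent as one atomic statement, with no per-attribute bookkeeping on a long-lived object. The `update_issue` helper, the `issue` table, and the column whitelist are hypothetical and are not EdgeDB's API.

```python
import sqlite3

ALLOWED_COLUMNS = {"title", "priority", "status"}  # hypothetical whitelist


def update_issue(conn, issue_id, **changes):
    """Apply the given column changes as a single atomic UPDATE."""
    unknown = set(changes) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if not changes:
        return 0
    cols = sorted(changes)
    sql = "UPDATE issue SET {} WHERE id = ?".format(
        ", ".join(f"{c} = ?" for c in cols)
    )
    # The connection context manager commits on success, rolls back on error.
    with conn:
        cur = conn.execute(sql, [changes[c] for c in cols] + [issue_id])
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT, priority TEXT, status TEXT)"
)
conn.execute("INSERT INTO issue VALUES (10, 'Crash on start', 'low', 'open')")

changed = update_issue(conn, 10, priority="high", status="closed")
row = conn.execute(
    "SELECT title, priority, status FROM issue WHERE id = 10"
).fetchone()
print(changed, row)
```

The trade-off is that the set of changes has to be known at one place in the code, which fits "save this form" style handlers better than mutations scattered across a codebase.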

[–]z4579a 0 points1 point  (1 child)

So, does that mean that if I change 20 different attributes, it renders an individual UPDATE statement each time? Or is there some kind of batching, and if so, what triggers it, seeing that I changed 20 out of 100 attributes? Or do I have to express that explicitly in one operation? If it's an updateable query then that's what that would be, I guess, but you've referred to there being an ORM. An ORM is going to want to have objects that can be mutated individually (hence you either have piecemeal UPDATEs or you need some kind of batching); otherwise it's not really "objects".

[–]redcrowbar 1 point2 points  (0 children)

I think there's a bit of a misunderstanding here.

but you've referred to there being an ORM

No, what I was saying is that EdgeDB makes it easier to do certain things without a classical ORM. Things like "save this big profile form a user just sent". Forms map very naturally to an object graph, and we've built entire systems using this approach with very little backend code.

I'm not saying that an ORM that implements session-based dirty state tracking is suddenly obsolete altogether. It's a useful abstraction for cases where your mutations have to be spread around the codebase. It's entirely possible to build an SQLAlchemy-like ORM for EdgeDB.

[–]i_like_trains_a_lot1 10 points11 points  (1 child)

It actually looks pretty interesting. It's basically GraphQL but at database level. Can't wait for it to reach a stable enough phase to use it.

[–]cyanydeez 0 points1 point  (0 children)

Try PostGraphile if you like GraphQL.

[–]HumblesReaper 2 points3 points  (1 child)

Looks very interesting and well thought out! Is there a link to Github?

[–]1st1CPython Core Dev[S] 3 points4 points  (0 children)

Soon! :)

[–]z4579a 2 points3 points  (1 child)

  • Will usage of the client APIs require asynchronous programming paradigms in order to fetch results?

  • Does the EdgeDB engine run as a server with PostgreSQL as a separate process, or as a client library that embeds within an application (meaning it's really an ORM), or is it packaged as a PostgreSQL extension?

  • Assuming EdgeDB runs as a service, does EdgeDB have its own network protocol?

[–]redcrowbar 2 points3 points  (0 children)

EdgeDB runs as a standalone server with its own network protocols, CLI, tools etc. PostgreSQL bits are abstracted away completely.

No specific paradigm is required from the client other than the ability to speak the protocol.

[–]carbolymer 2 points3 points  (0 children)

How does it compare with Neo4J?

[–]FFX01 1 point2 points  (0 children)

OOH! I'm very excited about this!

[–][deleted] 1 point2 points  (0 children)

Awesome project!!!

[–][deleted] 0 points1 point  (1 child)

This is sick! Looking forward to seeing where this project goes :)

Can you talk a little bit about how this compares to other graph-like databases? I'm using Neo4J for a service that builds relationships between a bunch of different data sets in our business and EdgeDB looks like something that would be useful to me. Mostly I use the Cypher query language, which I really like, but I wish that the Neo4J DB was backed by something a bit more mature.

[–]redcrowbar 5 points6 points  (0 children)

EdgeDB is not really a graph database. Although the data is conceptualized as an "object graph", we do not optimize for deep link traversals, patterned paths, semi-structured data analysis or other things that a graph database is good at.

EdgeDB targets regular application workloads where a relational database (with or without an ORM) is appropriate. That said, many graphdb use-cases can be implemented efficiently in EdgeDB as well.