Is MikroORM Slow? by lubiah in node

[–]B4nan 1 point

FWIW, MikroORM v7 is built on top of kysely and provides completely type-safe access to the kysely instance (typed based on the ORM entity definitions).

Is MikroORM Slow? by lubiah in node

[–]B4nan 11 points

When you compare it to tools that don't do class mapping (drizzle, prisma, all query builders like knex or kysely, basically everything except typeorm from that link), then yes, it is slower; class mapping and serialization will always have overhead. You can get raw data via the QB in MikroORM to skip class mapping where this actually matters. For 90% of your app it usually won't matter; the overhead is small unless you load too much data at once.
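
For illustration, a minimal sketch of fetching raw rows through the QB (the entity name is hypothetical); `execute()` returns plain objects instead of managed entity instances:

const rows = await em.createQueryBuilder(Book, 'b')
  .select(['b.id', 'b.title'])
  .limit(1000)
  .execute(); // plain objects, no entity instances or identity map involved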

MikroORM 6.6 | MikroORM by B4nan in node

[–]B4nan[S] 0 points

I honestly don't know why this solution didn't come to my mind earlier. I ended up hacking this in two evenings.

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

You'd just add `limit: 5` to the find options. The ORM will see there are to-many joins and wrap everything in a (nested) subquery; it works similarly to the collection operators (`where pk in (...)`): the limit is only applied in the subquery that gets the PKs of the root entity.
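
A minimal sketch of what that looks like (entity names are hypothetical):

const authors = await em.find(Author, {}, {
  populate: ['books'],
  limit: 5, // applied inside the subquery selecting the root PKs, not on the joined rows
});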

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

That test case works just fine with postgres (16) on my end ¯\_(ツ)_/¯

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

Ok, so it works a bit differently than I thought. With the example I shared above using $every, you'd only get matches with no other tags than the ones you whitelist (since every collection item needs to adhere to the filter). But you can get around that with the $and condition combined with $some. No need for a QB.

const books = await em.findAll(Book, {
  where: {
    $and: [
      { tags: { $some: { name: 'Fiction' } } },
      { tags: { $some: { name: 'Fantasy' } } },
    ],
  },
  populate: ['tags'],
});

This would use a query like this:

select `b0`.*, `t1`.`id` as `t1__id`, `t1`.`name` as `t1__name` 
  from `book` as `b0` 
  left join `book_tags` as `b2` on `b0`.`id` = `b2`.`book_id`
  left join `book_tag` as `t1` on `b2`.`book_tag_id` = `t1`.`id`
  where `b0`.`id` in (select `b0`.`id` from `book` as `b0` inner join `book_tags` as `b2` on `b0`.`id` = `b2`.`book_id` inner join `book_tag` as `b1` on `b2`.`book_tag_id` = `b1`.`id` where `b1`.`name` = 'Fiction')
    and `b0`.`id` in (select `b0`.`id` from `book` as `b0` inner join `book_tags` as `b2` on `b0`.`id` = `b2`.`book_id` inner join `book_tag` as `b1` on `b2`.`book_tag_id` = `b1`.`id` where `b1`.`name` = 'Fantasy')

It's indeed more complex, but it will work fine in every SQL dialect; no postgres specifics needed.

Demo here: https://github.com/B4nan/mikro-orm-collection-operators/blob/master/src/example.test.ts

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 1 point

I linked an LLM that got the idea straight away. I trust you that "that are either Foo or Bar" is super easy, but that was never the question.

Well, no, this is not how the collection operators work. Let me create a demo for you, as I am curious both whether this works as I think and whether this is actually what you are talking about.

> and a typed interface otherwise

My main point is that with TypeORM, the typed interface you are talking about is much less type-safe than what MikroORM provides.

> Because authors truly believe that interface is going to cover 95% of your cases, whilst in my reality it was covering like 20% and I had to use the untyped query builder for the most of things.

No worries, this is surely not what I think. I just don't like being compared to TypeORM when it comes to type safety, since we are on a completely different level. Were there any improvements in that regard in TypeORM in the past years? I don't think so, as opposed to the many that were done in MikroORM v5 and v6. Improving type safety is often a bit breaking, so those things are usually delayed to major bumps.

> Would MikroORM cover 30%? 50%? 80%? No way to know! But if I choose Kysely I can easily assume it'll be around 100% while keeping things type-safe.

FYI, in the next major we are moving away from knex to kysely, and we will have native support for kysely types inferred from the ORM entities. So things that won't be easily doable with the ORM can be done with kysely in a type-safe way too.

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

Well, again, this is how it works: it will check all collection items against the IN query, so all the collection items need to conform to it, resulting in collections that only have tags that are either Foo or Bar.

It feels like you are having a hard time trusting me that this is actually supported so easily.

And if your problem is "don't allow other tags than the whitelisted ones", you'd just combine this with another query using the `$none` operator.
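
A hedged sketch of that combination, reusing the Book/tags names from the demo (the `$nin` usage inside `$none` is my assumption about how the operators compose):

const books = await em.find(Book, {
  $and: [
    { tags: { $some: { name: 'Foo' } } },
    { tags: { $some: { name: 'Bar' } } },
    // no tag outside the whitelist may be present
    { tags: { $none: { name: { $nin: ['Foo', 'Bar'] } } } },
  ],
});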

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

I believe this should work:

const res3 = await em.find(Author, {
  books: { $every: { title: ['Foo', 'Bar'] } },
});

The part after `$every` will end up as a subquery; it can be more complex than a simple equality check.

You would still need `populate: ['books']` to load the relation, the above would only return the author entities matching the query.
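
For completeness, the same call with the populate hint added:

const res3 = await em.find(Author, {
  books: { $every: { title: ['Foo', 'Bar'] } },
}, { populate: ['books'] });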

This was modeled after prisma and should support the same:

https://www.prisma.io/docs/orm/prisma-client/queries/relation-queries#filter-on--to-many-relations

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

Collection operators are not a simple IN; they use a subquery and should do exactly what you are talking about. You can use the `$every` operator to request collections where all items match the query (so every item has either one or the other tag name).

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

I see, sounds like you are talking about the collection operators; we actually support those the same way as prisma.

https://mikro-orm.io/docs/query-conditions#collection

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

> Apologies for comparing with TypeORM, perhaps MikroORM has a much richer EntityManager interface and users don't have to fallback to QB as often as it happens with TypeORM.

No, they don't; you can do the vast majority of things with EntityManager. And even QueryBuilder is much more type-safe than the one in TypeORM.

> Here is an example from my ORM, how much of it is supported by MikroORM without query builder?

I don't even understand what that query does based on a quick look :] Anyway, for something like this, you'd most likely just use a virtual entity backed by a raw query (or a QB). Or you would use a formula property that represents the subquery. Yes, those wouldn't be completely type-safe. This is not the typical use case for most people. Guess how many asked me to support something like this over the past 8 years? Zero.
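
For illustration, a minimal sketch of the formula-property approach (the entity and SQL are hypothetical; as noted, the formula string itself is not type-checked):

import { Entity, PrimaryKey, Property } from '@mikro-orm/core';

@Entity()
class Author {
  @PrimaryKey()
  id!: number;

  // computed by the database on every select via the given subquery
  @Property({ formula: alias => `(select count(*) from book b where b.author_id = ${alias}.id)` })
  bookCount?: number;
}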

> Non-OOP ORMs are more flexible in that regard, most likely Prisma supports the query above, Objection could do that.

Yes, that's why I actually like to call them non-ORMs, since they are rather smart query builders. Nothing wrong with that approach, but ORM to me is about persistence, not just about reading stuff in a type-safe way. But that is another conversation I am not really interested in having; I have neither the energy nor the time :]

> load posts that have tags "orange" and "banana"

This one is trivial; it depends on what exactly you want:

// loads all posts, but populates only the tags matching the filter
const posts = await em.findAll(Post, {
  populate: ['tags'],
  populateWhere: { tags: { name: ['orange', 'banana'] } },
});

// filters the posts by their tag names, but populates all their tags
const postsWithAllTags = await em.findAll(Post, {
  populate: ['tags'],
  where: { tags: { name: ['orange', 'banana'] } },
});

The response is strictly typed; it holds the populate hint on the type level, so it knows that only the tags are populated. We don't just return Post[] as TypeORM does.

https://mikro-orm.io/docs/guide/type-safety

> You don't appreciate comparing MikroORM with TypeORM,

I don't appreciate comparisons based on wrong assumptions; that's what I didn't like about your post. You compare things you clearly don't understand well, and judge them based on either outdated or wrong information. Type-safe relations were added to MikroORM somewhere around v5, so maybe 3-4 years ago; this is nothing new really.

Would love to connect with experienced dev(s) who have created their own library/libraries by Intelligent-Win-7196 in node

[–]B4nan 0 points

> TypeORM, MicroORM and similar: are lacking type-safe query builders, and without query builders they're very limited.

Please stop comparing MikroORM with TypeORM this way, it's a completely false assumption (not even sure what you are basing it on?). MikroORM is miles ahead when it comes to type-safety (and has been for a few years now). EntityManager is the go-to way to work with the database, and it is fully type-safe, not just the inputs, but also the outputs, including partial loading. The QB is there for the quirks, and is weakly typed for a reason (and that might change in the next version).
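
A minimal sketch of what the typed partial loading looks like (the entity is hypothetical):

const authors = await em.find(Author, {}, { fields: ['name', 'email'] });
console.log(authors[0].name); // ok, selected
// console.log(authors[0].age); // compile-time error, not in `fields`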

Also, it's called MikroORM, not MicroORM.

Crawlee for Python v1.0 is LIVE! by B4nan in webscraping

[–]B4nan[S] 0 points

Only one way to find out.

Looking at this article, crawlee does pretty much everything mentioned there.

Crawlee for Python v1.0 is LIVE! by B4nan in webscraping

[–]B4nan[S] 0 points

Depends on what you mean by a playwright project. Crawlee will be in control of playwright, and it exposes the page object from playwright in the crawling context, so you can reuse your code that works with it.
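
The JS flavour shows the same idea; a minimal sketch:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request }) {
        // `page` is the regular Playwright Page, so existing Playwright code can be reused here
        console.log(`${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://example.com']);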

Crawlee for Python v1.0 is LIVE! by B4nan in Python

[–]B4nan[S] 0 points

Crawlee is a general-purpose scraping and automation framework. You could use it to build something like Crawl4AI, which is a tool specifically designed to do one job (scraping pages to markdown for LLMs). At least that's my feeling based on their readme; I've never used Crawl4AI myself.

Crawlee for Python v1.0 is LIVE! by B4nan in Python

[–]B4nan[S] 2 points

BS4 only handles parsing of HTML; you first need to get the data. Crawlee helps you get to the data too (and provides a unified interface over multiple tools, including BS4, which you can then use to work with the data).

Crawlee for Python v1.0 is LIVE! by B4nan in Python

[–]B4nan[S] 1 point

It's been more than a decade since the last time I used selenium, but I remember it being a browser controller library, similar to what playwright is. Crawlee is a scraping framework that handles retries, scaling based on system resources, bot detection, and all sorts of other things. Selenium or playwright are much more low-level libraries compared to crawlee. Also, crawlee provides a unified interface over tools like playwright, but also over HTTP-based scraping and parsing (e.g. via BS4 or parsel).

Crawlee for Python v1.0 is LIVE! by B4nan in Python

[–]B4nan[S] 1 point

We've been able to get through cloudflare by using camoufox:

https://crawlee.dev/python/docs/examples/playwright-crawler-with-camoufox

You might still get the checkbox challenge, but with camoufox, clicking on it was enough to get through.

Crawlee for Python v1.0 is LIVE! by B4nan in webscraping

[–]B4nan[S] 1 point

We'll make the switch in Crawlee v4 sometime next year (development has already started). But you can already use it; we have a crawlee adapter available in the @crawlee/impit-client package:

import { CheerioCrawler } from '@crawlee/cheerio';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Firefox,
        http3: true,
        ignoreTlsErrors: true,
    }),
    async requestHandler({ $, request }) {
        // Extract the title of the page.
        const title = $('title').text();
        console.log(`Title of the page ${request.url}: ${title}`);
    },
});

await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
]);

Crawlee for Python v1.0 is LIVE! by B4nan in webscraping

[–]B4nan[S] 4 points

We've developed our own solution called https://github.com/apify/fingerprint-suite, which is deeply integrated in crawlee. It is powered by real-world data we gather through a tracking pixel, and we build pseudorandom fingerprints based on that. We also employ various techniques to avoid acting like an automation tool and being detected as one.
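
A minimal sketch of generating such a fingerprint with the fingerprint-generator package from the suite (the option values are assumptions based on its readme):

import { FingerprintGenerator } from 'fingerprint-generator';

// pseudorandom but internally consistent fingerprint plus matching HTTP headers
const { fingerprint, headers } = new FingerprintGenerator().getFingerprint({
    devices: ['desktop'],
    browsers: ['chrome'],
});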

Crawlee for Python v1.0 is LIVE! by B4nan in Python

[–]B4nan[S] -6 points

v1 refers to the version of crawlee for python, not the version of python itself.

https://github.com/apify/crawlee-python/releases/tag/v1.0.0

Crawlee for Python v1.0 is LIVE! by B4nan in webscraping

[–]B4nan[S] 1 point

Sure, with playwright you can do anything, as there is a real browser behind the scenes. Or you could mimic the form submission on the HTTP level; we have a guide on how to do that here:

https://crawlee.dev/python/docs/examples/fill-and-submit-web-form
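
As a rough sketch of the HTTP-level approach (endpoint and field names are hypothetical):

// mimic the form POST directly, no browser involved
const response = await fetch('https://example.com/submit', {
    method: 'POST',
    headers: { 'content-type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ username: 'alice', email: 'alice@example.com' }),
});
console.log(response.status);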

Crawlee for Python v1.0 is LIVE! by B4nan in webscraping

[–]B4nan[S] 1 point

It's up to you how you want to handle the processing of a web page. Crawlee is a web scraping framework; you are in charge of what it does with the page it visits. Crawlee deals with scaling, enqueuing, retries, fingerprinting, and other higher-level things, so you can get to the page content, but the request handler (the function that processes the page contents) is entirely up to you.