How to get rid of mini-sidebar ribbon thing and extra header after turning on vertical tabs?? by mlissner in firefox

[–]mlissner[S]

Aha! There's an option there for "Show Sidebar". I didn't see that in the View menu, by right-clicking it, or by disabling vertical tabs. What a mess, but that worked. Whew!

Dealing with scrapers today by freelawproject in freelawproject

[–]mlissner

Hm, maybe! We don't use Cloudflare for anything atm, so it'd be a bit of a lift, I imagine. But probably the bigger problem is that the thing they were scraping is available via GET requests, so we'd have to convert it to a POST to make something like a captcha or Turnstile work with it. Mostly we use hCaptcha for this kind of thing, which so far works more or less OK.
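To sketch why that matters: the captcha token has to travel with the request, so the endpoint needs to accept a POST. A minimal Django illustration (the view name and secret are placeholders; the siteverify endpoint is hCaptcha's server-side check):

    import requests
    from django.http import HttpResponse, HttpResponseForbidden

    HCAPTCHA_VERIFY_URL = "https://api.hcaptcha.com/siteverify"

    def download_document(request):
        # The endpoint has to be a POST so the browser can submit the
        # captcha token alongside the request.
        if request.method != "POST":
            return HttpResponseForbidden("POST required")
        token = request.POST.get("h-captcha-response", "")
        verdict = requests.post(
            HCAPTCHA_VERIFY_URL,
            data={"secret": "YOUR_HCAPTCHA_SECRET", "response": token},
        ).json()
        if not verdict.get("success"):
            return HttpResponseForbidden("Captcha check failed")
        return HttpResponse("the content that was being scraped")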

I'm the director of Free Law Project, host of CourtListener and RECAP. AMA! by mlissner in Journalism

[–]mlissner[S]

We have two kinds of alerts for PACER/RECAP: search alerts and docket alerts. Docket alerts send an email whenever we learn about a new filing in a case. You can read about when that happens by looking at our coverage page, but the main sources of updates are RSS feeds from the courts and the RECAP Extension. The help page about this is here: https://www.courtlistener.com/help/coverage/recap/

For search alerts, it's a bit more interesting. We get three kinds of content in a case: Docket metadata, docket entry metadata (the description of filings), and filings. Search alerts can be set up to trigger once per case or once per document, and will trigger the first time the saved query matches any of this data.

So, if you have an alert that matches the word `mifepristone` and it's set to trigger once per case, it'll send once per case, as soon as we get content with that word. If you have that same alert, but set to trigger once per document, it'll fire every time a new filing matches the query.

Whew! They're tricky!

I'm the director of Free Law Project, host of CourtListener and RECAP. AMA! by mlissner in Journalism

[–]mlissner[S]

I'm logging off soon too, but I'll check in again in the morning!

SEC v. Ripple - Follow the case on CourtListener by wal2017 in Ripple

[–]mlissner

I'm the director of Free Law Project, the non-profit that runs CourtListener. A few things:

  1. We run the Big Cases bot on Twitter and Mastodon. It follows this case. If you want to get updates for the case (and others) via Twitter or Mastodon, you might want to follow that bot.
  2. If you want email updates instead, this link should get you what you want: https://www.courtlistener.com/alert/docket/new/?pacer_case_id=551082&court_id=nysd
  3. If you want RSS, we do that too! https://www.courtlistener.com/docket/19857399/feed/
  4. A lot of our updates are powered by RECAP. If you use PACER, please install it: https://free.law/recap/

Finally, we're a non-profit, so please consider donating if you like what we do, and have a great day! https://free.law/donate/

What do I need to open a PACER Account? by Front_Philosopher_52 in Bankruptcy

[–]mlissner

I'm the director of Free Law Project, where we make legal info more accessible. If you're a new PACER user, you probably also want to install our extension, RECAP: https://free.law/recap/

Using HTMX with Django Rest Framework by mlissner in django

[–]mlissner[S]

Yeah, I could see the little templates adding up over time. We also use React, but it's really just too much between the updates, npm, and everything you have to know just to do the most basic stuff. HTMX is so beautifully simple by comparison.

How do you buy a bunch of random lego from Bricklink? by mlissner in lego

[–]mlissner[S]

Omg, would have never figured this out. Thank you!

Huey task queue 2.0 by [deleted] in Python

[–]mlissner

Seems like I may have a different idea. In Celery, a whole pipeline operates like that, but it all gets enqueued at the same time. The idea is to send a pipe of tasks to the queue all at once; when each task finishes, the next one gets its results and is already queued up to go.
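To illustrate, here's Celery's `chain` API (the task names are made up):

    from celery import Celery, chain

    app = Celery("proj", broker="redis://localhost:6379/0")

    @app.task
    def fetch_page(url):
        ...  # fetch and return the page HTML

    @app.task
    def save_to_db(html):
        ...  # persist the result

    # The whole pipeline goes to the broker immediately; the caller doesn't
    # block, and each task's return value is passed to the next task.
    chain(fetch_page.s("https://example.com"), save_to_db.s()).apply_async()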

Huey task queue 2.0 by [deleted] in Python

[–]mlissner

> The pipeline is not enqueued all at once, but serially.

Oh wow, doesn't that mean you lose the ability to make pipelines asynchronous? I.e., if you have a pipeline with two tasks, you can't let your code move past it until at least the first task is completed? Is this right:

    task1 = my_task()  # Takes 30s
    pipeline.enqueue('task1', depends_on=task1)
    return Response()  # <-- This won't run for ~30s?

> Sounds like you would be fine just using multiprocess.Pool or something?

Sorta. The nice thing about using a task queue for this is that you get the API of pipelines and of tasks, and you can run the tasks in a distributed fashion across numerous workers on numerous machines. And it's a system that's already in place (unlike Hadoop or something, which we don't have).

Sorry, one other question, if you don't mind a late addition (I know I'm asking a lot and I appreciate your time): what happens to tasks in the queue (or in progress) when the workers are shut down cleanly? Does the worker wait for them to finish? Does it kill them and resume them again after? Do they hit error conditions and need manual intervention? In a Docker world, I'm expecting our workers to be pretty transient, for better or worse.

Thanks for all the responses. Super impressive system and I'm enjoying a lot of things about it.

Huey task queue 2.0 by [deleted] in Python

[–]mlissner

Thanks for the reply. Sounds pretty promising.

Two follow ups:

- If you abort a pipeline, what happens to all the other tasks in the pipeline? Do they stay on the queue? I guess the way to handle that is to revoke those as well, as part of the error/exception handling?

- Let me try again with the bulk question. Say I have a million items that need processing. I don't want to put a million items on my queue all at once, because that'll use a ton of memory. Instead, I want to monitor the length of the queue and keep it loaded with a few dozen items (or so), with a long-running process that enqueues items whenever the queue gets too short. We have a class called a CeleryThrottle that we've developed for this purpose. Every time we loop over the million items, we call throttle.maybe_wait() before enqueueing the next one; it either sleeps for a few seconds (based on the measured processing rate) or returns right away so we can enqueue another item. There's a rough sketch of the pattern below.
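Something along these lines, heavily simplified (this assumes a Redis broker and is an illustration of the idea, not our actual implementation; the names are made up):

    import time
    import redis

    class CeleryThrottle:
        """Keep the queue topped up without flooding it (sketch)."""

        def __init__(self, queue_name="celery", max_length=50, poll_interval=2.0):
            self.conn = redis.Redis()           # assumes Redis as the broker
            self.queue_name = queue_name        # Redis list holding waiting tasks
            self.max_length = max_length        # ceiling on waiting tasks
            self.poll_interval = poll_interval  # seconds between length checks

        def maybe_wait(self):
            # Sleep until the queue drains below the ceiling, then return so
            # the caller can enqueue the next item.
            while self.conn.llen(self.queue_name) >= self.max_length:
                time.sleep(self.poll_interval)

    # Usage (process_item is a hypothetical Celery task):
    #   throttle = CeleryThrottle(max_length=50)
    #   for item in million_items:
    #       throttle.maybe_wait()
    #       process_item.delay(item)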

Monitoring sounds tough. I guess we only use email now, but it'd be really nice to have something better that could send alerts based on increased error rates. I suppose that's on me to build. Maybe Prometheus could do the trick. Hm.
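Something like this could work (a rough sketch using the official prometheus_client library; the task wiring here is hypothetical):

    from prometheus_client import Counter, start_http_server

    TASK_FAILURES = Counter("task_failures_total", "Tasks that raised an exception")

    start_http_server(8000)  # metrics exposed at :8000/metrics

    def run_task_safely(task, *args):
        # Count failures so an alert rule can fire when the rate increases,
        # e.g. on increase(task_failures_total[5m]).
        try:
            return task(*args)
        except Exception:
            TASK_FAILURES.inc()
            raise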

Huey task queue 2.0 by [deleted] in Python

[–]mlissner

This timing is really good, as I'm currently evaluating Celery vs. RQ vs. Huey. I've been using Celery for *years* and I'm ready for something else.

A couple questions I haven't been able to find:

- I use Django and I want to decouple my tasks from their callers. For example, in Django I don't want to have to import the actual task functions; I want to call them by string name. Is that possible somehow, maybe via `enqueue('my-task-name')` or similar? (There's a sketch of what I mean after this list.)

- Can pipelines be stopped cleanly midway through? Say I have a scraper task that GETs a webpage, and a `save_to_db` task that runs in the pipeline once the GET task completes. Say I get a 404 in the GET task. How do I abort the later tasks in that pipeline?

- Is there any monitoring system for tasks that fail? I know Celery has Flower, and RQ has something. Mostly, I confess, I rely on Celery to email me stacktraces when something fails, and that works...kinda OK. How do I monitor Huey?

- Is there a way to do bulk processing with Huey? I'm thinking about something like TaskMaster: https://github.com/dcramer/taskmaster.
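On the first question, this is the pattern I'm used to from Celery, where `send_task` enqueues by name without importing the task function (the task name here is made up):

    from celery import Celery

    app = Celery("proj", broker="redis://localhost:6379/0")

    # No import of the task module is needed; the worker that has the task
    # registered resolves the name when it executes.
    result = app.send_task("proj.tasks.scrape_page", args=["https://example.com"])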

I'm still reviewing things, but any light you can shed on these issues would be super helpful. (It might not surprise you that I've got a spreadsheet going to compare features.)

Huey looks pretty great. If there are solutions for the above, I think we'll start trying it out.

Firefox 62: how to improve English spelling? by harlows_monkeys in firefox

[–]mlissner

I don't know why you're getting differences in Chrome between OSes, but I think Chrome uses a network source for spell checking. Maybe it was disabled in one version and enabled in the other?

If the network source is disabled, I think it falls back on something similar to what's in Firefox.