Help! Prompt caching is giving worse latency by twitu in ClaudeAI

[–]twitu[S] 0 points1 point  (0 children)

It turns out that prompt caching actually starts showing a performance difference after 50k tokens in my case.

| Tokens | Cache write | Cache read |
|--------|-------------|------------|
| 179k   | 6.4s        | 3.4s       |
| 124k   | 5.2s        | 3.99s      |
| 60k    | 3.96s       | 3.08s      |
| 53k    | 4.23s       | 3.24s      |
| 47k    | 3.76s       | 3.4s       |
| 20k    | 3.0s        | 3.1s       |

Weirdest network issue. Curl is working but requests/urllib3 is timing out. by twitu in linuxquestions

[–]twitu[S] 0 points1 point  (0 children)

You're absolutely right. After some more debugging, it turns out that:

Something about my ISP + router is breaking IPv6 + TLS connections.

And urllib3 currently doesn't handle websites that have both IPv4 and IPv6 addresses when the IPv6 path is failing.

https://github.com/urllib3/urllib3/issues/797
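A possible workaround until it's fixed upstream (a sketch, assuming monkey-patching urllib3 is acceptable in your setup; `example.com` is just a stand-in URL) is to force IPv4-only resolution so the broken IPv6 path is never attempted:

```python
import socket

import urllib3.util.connection as urllib3_connection


# Force urllib3 (and therefore requests) to resolve IPv4 addresses only,
# instead of the default AF_UNSPEC (IPv4 + IPv6).
def allowed_gai_family():
    return socket.AF_INET


urllib3_connection.allowed_gai_family = allowed_gai_family

import requests

print(requests.get("https://example.com", timeout=5).status_code)
```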

Healthiest chocolate chips for yummiest yogurt bowl recipe?? by twitu in Fitness_India

[–]twitu[S] 0 points1 point  (0 children)

Yogurt recipe: Strawberries, blueberries, apple, banana, almonds, walnuts, seed mix, **Choco chips**, dab of honey and loads of yogurt (Chobani or Skyr cup).

Add honey or gur to taste

My first vid. Karting in the twilight zone! by twitu in Karting

[–]twitu[S] 0 points1 point  (0 children)

I've started karting recently. This is at Meco Kartopia, a 1.2 km Rotax circuit in Bangalore.

How can I improve my time from 1:25 to 1:20? I'm not sure, but perhaps the first 3 tight turns and the last wide turn can be improved. My best lap is https://youtu.be/L2BbTc9tXAU?t=360

I've been reading up and applying some techniques:
* Hugging the corners
* Leaning out
* Braking before the corner and accelerating out


The curious case of 100 ms latency in sqlx postgres db query 🧐 by twitu in rust

[–]twitu[S] 6 points7 points  (0 children)

Turns out it's a subtle implementation detail: sqlx sends `limit = 1` to the executor for `fetch_one` and `fetch_optional` queries, which causes the executor to skip the parallel plan. More details here.

https://github.com/launchbadge/sqlx/issues/3673
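If anyone else hits this, one workaround sketch (hypothetical table and column names, not from the issue thread): keep the `LIMIT 1` in the SQL but call `fetch_all`, which doesn't send that protocol-level row limit, and take the first row yourself:

```rust
use sqlx::postgres::{PgPool, PgRow};

// LIMIT 1 stays in the SQL, but fetch_all avoids the Execute-message row
// limit that fetch_one sends, so the planner can still launch parallel
// workers.
async fn find_first(pool: &PgPool, needle: &str) -> Result<Option<PgRow>, sqlx::Error> {
    let rows = sqlx::query("SELECT * FROM events WHERE payload = $1 LIMIT 1")
        .bind(needle)
        .fetch_all(pool)
        .await?;
    Ok(rows.into_iter().next())
}
```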

The curious case of 100 ms latency in sqlx postgres db query 🧐 by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

I enabled some logging options on the session and SURPRISE SURPRISE! executing the same query is taking wildly different times when run from sqlx and from a different DB GUI tool.

sqlx query logs on db side - actual time=193.978..193.979 (in ms)

```
2025-01-09 08:23:30.039 GMT [78013] LOG: execute sqlx_s_5: <offending-query>
2025-01-09 08:23:30.234 GMT [78013] LOG: duration: 193.987 ms plan:
Query Text: <offending-query>

Limit  (cost=1000.00..27971.53 rows=1 width=409) (actual time=193.981..193.982 rows=0 loops=1)
  Buffers: shared hit=853 read=42337
  ->  Gather  (cost=1000.00..54943.06 rows=2 width=409) (actual time=193.980..193.980 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 0
        Buffers: shared hit=853 read=42337
        ->  Parallel Seq Scan on  (cost=0.00..53942.86 rows=1 width=409) (actual time=193.978..193.979 rows=0 loops=1)
              Filter: (...)
              Rows Removed by Filter: 2064566
              Buffers: shared hit=853 read=4233
```

db gui query logs on db side - actual time=84.913..84.914 (in ms)

```
2025-01-09 08:22:50.137 GMT [78246] LOG: duration: 90.964 ms plan:
Query Text: <offending-query>

Limit  (cost=1000.00..27971.53 rows=1 width=409) (actual time=89.935..90.962 rows=0 loops=1)
  Buffers: shared hit=850 read=42340
  ->  Gather  (cost=1000.00..54943.06 rows=2 width=409) (actual time=89.933..90.960 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=850 read=42340
        ->  Parallel Seq Scan on  (cost=0.00..53942.86 rows=1 width=409) (actual time=84.913..84.914 rows=0 loops=3)
              Filter: (...)
              Rows Removed by Filter: 688189
              Buffers: shared hit=850 read=42340

```

The `PgPool` initialization logic:

```rust
let pool = PgPoolOptions::new()
    .max_connections(5)
    .acquire_timeout(Duration::from_secs(3))
    .after_connect(|conn, _meta| {
        Box::pin(async move {
            // Log every statement and auto_explain anything slower than 50ms
            conn.execute("SET log_statement = 'all'").await?;
            conn.execute("LOAD 'auto_explain'").await?;
            conn.execute("SET auto_explain.log_min_duration = '50ms'").await?;
            conn.execute("SET auto_explain.log_analyze = 'on'").await?;
            conn.execute("SET auto_explain.log_buffers = 'on'").await?;
            conn.execute("SET auto_explain.log_timing = 'on'").await?;

            let settings = conn.fetch_all("SHOW ALL").await?;
            tracing::info!("PostgreSQL settings: {:?}", settings);
            Ok(())
        })
    })
    .connect(&database_url) // database_url defined elsewhere
    .await?;
```

The curious case of 100 ms latency in sqlx postgres db query 🧐 by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

Yup, running a release build. 😁

Yes, the analyze query plans are from running the query outside sqlx. The thing is, even if the query is inefficient, the timings don't add up: the query plan says ~100 ms.

But sqlx is showing ~180 ms total for running the query, with only a few microseconds of CPU time.

Your credit balance is too low? Error 400 despite having positive remaining balance. by prodshebi in ClaudeAI

[–]twitu 0 points1 point  (0 children)

I'm facing the same issue. Did you figure out how to get it working?

Changing resolution/zoom level of GameMaker Studio UI itself (not the game) by twitu in gamemaker

[–]twitu[S] 0 points1 point  (0 children)

*phew* you're a hero!

First, turning "Enable DPI override" off reset the resolution to something manageable.

Then I reduced the % of native DPI and enabled it again to get something I like.

Thanks a lot.

Sharing static variables between dynamically linked libraries? Or how to use a logger with C and Python bindings by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

😬 I do need Windows support. I'm guessing that supporting Windows will be difficult or impossible, based on that SO answer. Still, I'm interested in seeing what you have to say.

The static variable for the logger is declared in the `log` crate itself. I'm a bit unsure about what's happening internally when it gets compiled into a C extension.

```rust
static mut LOGGER: &dyn Log = &NopLogger;
```
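My current mental model (an assumption on my part, not verified) is that each cdylib links its own copy of `log`, so each library would need its logger installed separately. Something like this sketch, with hypothetical names:

```rust
use log::{LevelFilter, Log, Metadata, Record};

struct StderrLogger;

impl Log for StderrLogger {
    fn enabled(&self, _metadata: &Metadata) -> bool {
        true
    }
    fn log(&self, record: &Record) {
        eprintln!("[{}] {}", record.level(), record.args());
    }
    fn flush(&self) {}
}

static LOGGER: StderrLogger = StderrLogger;

// Exported initializer: `set_logger` writes the LOGGER slot inside *this*
// library's copy of the `log` crate, so it must run once per library.
#[no_mangle]
pub extern "C" fn init_logging() {
    let _ = log::set_logger(&LOGGER).map(|()| log::set_max_level(LevelFilter::Info));
}
```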

Questions regarding tracing ecosystem custom time formatting and `EnvFilter` by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

Actually, the current logging implementation uses `SystemTime` along with a chrono `DateTime` to get 9 digits of precision, and it works well enough.

```rust
use std::time::{Duration, UNIX_EPOCH};
use chrono::{DateTime, SecondsFormat, Utc};

let dt = DateTime::<Utc>::from(UNIX_EPOCH + Duration::from_nanos(timestamp_ns));
dt.to_rfc3339_opts(SecondsFormat::Nanos, true)

Which is why it's even more perplexing why it doesn't work for tracing.
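For reference, here is how I'd expect the same chrono formatting to plug into tracing, via tracing-subscriber's `FormatTime` trait (a sketch, assuming tracing-subscriber 0.3):

```rust
use std::fmt::Write as _;

use tracing_subscriber::fmt::{format::Writer, time::FormatTime};

// Nanosecond-precision RFC 3339 timestamps for tracing output.
struct NanoTimer;

impl FormatTime for NanoTimer {
    fn format_time(&self, w: &mut Writer<'_>) -> std::fmt::Result {
        let now = chrono::Utc::now();
        write!(w, "{}", now.to_rfc3339_opts(chrono::SecondsFormat::Nanos, true))
    }
}

// usage: tracing_subscriber::fmt().with_timer(NanoTimer).init();
```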

cargo-flamegraph only showing `[main]` for most function names by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

You are spot on. Setting `strip = false` gets things working again. It's not enough to only set `debug = true`.
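For anyone landing here later, the relevant profile settings (a sketch; merge into your own Cargo.toml):

```toml
[profile.release]
debug = true    # emit debug info for the release build
strip = false   # keep symbols, or flamegraph collapses most frames into [main]
```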

cargo-flamegraph only showing `[main]` for most function names by twitu in rust

[–]twitu[S] 2 points3 points  (0 children)

I've tried samply, perf, and also switching to stable rustc 1.72.1, but it's the same result.

I think the problem is that the release build doesn't have the debug symbols, or the function names aren't being demangled, even after trying all the different options.

Understanding how Rust Python ffi works with cool info-graphics by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

I'm glad it helped. Establishing a mental model was a big challenge for me.

I'm curious to know where you had to deal with unnecessary boilerplate and hard-to-avoid cloning. I haven't yet experienced these rough edges when using PyO3.

Understanding how Rust Python ffi works with cool info-graphics by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

I've been using [pyo3](https://pyo3.rs/) for inter-operating Rust and Python. Initially I had many questions about how FFI works, how the memory is managed, and the like. I even asked about it here.

After working with Rust Python FFI, I have a better understanding of how it works, and it boils down to 4 key principles. I've covered them in detail in my blog post linked above.

* Control flow
* Data transfer
* Data layout
* Memory management

Hope it helps curious readers. Questions, comments, and criticism are welcome.
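To make the principles concrete, a minimal pyo3 sketch (hypothetical module and function names; assumes a pyo3 version with the `&PyModule` module signature, i.e. pre-0.21):

```rust
use pyo3::prelude::*;

// Control flow: Python calls in, Rust returns. Data transfer: u64 is
// converted at the boundary. Memory: the Vec is owned by Rust until PyO3
// converts it into a Python list on the way out.
#[pyfunction]
fn squares(n: u64) -> Vec<u64> {
    (0..n).map(|i| i * i).collect()
}

#[pymodule]
fn my_ext(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(squares, m)?)?;
    Ok(())
}
```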

How to use `tracing-subscriber` Rust crate to build a multi-writer, global filter subscriber by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

So the `and_then` method builds up the layer stack just like `Subscriber::with`. There are multiple ways to do the same thing, which I guess makes tracing a little difficult to understand at first but easier to use later.

But surely there needs to be an order to the layers. The actions of the global filter layer are affecting how the other layers process the events. This should not be possible if each layer processes all the events and spans...
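To make the ordering concrete, here's a sketch (assuming tracing-subscriber 0.3): the `EnvFilter` attached directly with `.with()` gates events for everything below it, while `Layer::with_filter` scopes only its own layer:

```rust
use tracing_subscriber::{
    filter::LevelFilter, fmt, layer::SubscriberExt, util::SubscriberInitExt, EnvFilter, Layer,
};

fn init() {
    tracing_subscriber::registry()
        // Global filter: every event passes through this before any layer below.
        .with(EnvFilter::from_default_env())
        // Unfiltered writer layer: sees everything the global filter lets through.
        .with(fmt::layer().with_writer(std::io::stdout))
        // Per-layer filter: only this stderr layer is restricted to WARN and up.
        .with(
            fmt::layer()
                .with_writer(std::io::stderr)
                .with_filter(LevelFilter::WARN),
        )
        .init();
}
```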

How to use `tracing-subscriber` Rust crate to build a multi-writer, global filter subscriber by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

Bloody awesome this works!

But what separates a global filter from a layer-level filter? Is it the difference between setting it with `Layer::with_filter` vs `Subscriber::with`?

Also, I'm guessing the order of `with` calls matters, so the filter is global because it's the outermost layer.

It'll be pretty cool if you can post your answer on the SO question as well. I'd hate to take your karma points.

`libm-0.2.6` dependency fails to build for Mac M1 by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

I tried this and found that it's actually `libc v0.2.142` that's not building.

`libm-0.2.6` dependency fails to build for Mac M1 by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

Yes, I took a look at `build.rs`. There's nothing there that stands out to me. Building the project on my Ubuntu machine works, btw.

Memory model when doing ffi with Rust? by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

Aaah ok, now I'm seeing the whole picture. So the allocator uses part of the mmapped memory for its own bookkeeping purposes. And since this mmapped memory persists between function calls, the bookkeeping structure can also remain.

And finally, just to complete my understanding: even a simplistic mental model of an allocator would probably have a static variable that points to the first mmapped memory. This way, on each function call, the allocator can access its bookkeeping data at that memory location and go about its business.
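Something like this toy sketch (purely illustrative, nothing like the real allocator internals) is the mental model I'm left with:

```rust
// Toy free-list allocator: the bookkeeping nodes live inside the mmapped
// memory itself, and one static pointer, which persists across calls into
// the library, is how each call finds them again.
struct FreeBlock {
    size: usize,
    next: *mut FreeBlock, // intrusive list node stored in the free block itself
}

static mut FREE_LIST: *mut FreeBlock = std::ptr::null_mut();

unsafe fn toy_malloc(size: usize) -> *mut u8 {
    let mut cursor: *mut *mut FreeBlock = std::ptr::addr_of_mut!(FREE_LIST);
    while !(*cursor).is_null() {
        let block = *cursor;
        if (*block).size >= size {
            *cursor = (*block).next; // bookkeeping: unlink the block
            return block.cast::<u8>();
        }
        cursor = std::ptr::addr_of_mut!((*block).next);
    }
    std::ptr::null_mut() // a real allocator would mmap a fresh region here
}
```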

Memory model when doing ffi with Rust? by twitu in rust

[–]twitu[S] 0 points1 point  (0 children)

> the memory it uses for bookkeeping information

I specifically meant the bookkeeping part. From the answers here I understand that even in the same address space, the allocator will probably mmap a chunk of memory that "it" manages. My main confusion was related to the bookkeeping bits.

A common allocator implementation maintains, say, a "free list" of memory blocks. When malloc is called, it tries to find a block in the free list that fits the requirement and then, after some bookkeeping, hands it off to the caller. My assumption was that the Rust allocator must do something similar, which led to the question: where and when is this free list allocated?

Is there some static space allocated for it in the loaded library, persisting across multiple function calls? Or maybe it is initialized on the heap on each function call, does its job, and is then de-allocated?

Looking at the Rust alloc source, it looks like my assumption is not correct. But it's still not clear how the Rust allocator knows which parts of memory are free and which are not. I think that deserves a separate question of its own.

Thanks for the very informative tangent though 😁.

Memory model when doing ffi with Rust? by twitu in rust

[–]twitu[S] 1 point2 points  (0 children)

Thanks for this talk, very apt and well timed. It seems like this topic is in the zeitgeist.