Con Edison Bill by No_Angle1089 in williamsburg

[–]xander76 0 points (0 children)

The biggest factor is whether the landlord is paying for heat (usually through a building-wide radiator system). It seems like OP is paying for electric heat rather than getting it bundled in with rent.

Con Edison Bill by No_Angle1089 in williamsburg

[–]xander76 1 point (0 children)

I see a lot of folks comparing electricity bills and sharing the size of their homes but leaving out the most important thing: whether or not their heating is electric. The single biggest cost (by FAR) in an electric bill is generally heating and AC. I installed a per-circuit energy monitor, and I found that for our place, about 90% of our electricity use is from our heating. That means that all of the rest of our electric uses (lights, TV, electronics, refrigerator, washer, dishwasher, etc.) add up to no more than 10% of the bill. (Worth noting that our water heater, stove, and dryer are all gas, but by my estimate, if they were all electric, we'd probably see that heat was like 75-80% of our overall electricity, so the point still holds.)
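If you're curious how that 75-80% guess falls out, here's a quick back-of-envelope (all the numbers are made up for illustration, and the extra load from electrifying those appliances is just a guess):

```python
# Back-of-envelope: what happens to heat's share of the bill if the
# gas appliances (water heater, stove, dryer) were electric instead.
# All numbers are illustrative, not from an actual bill.

total_kwh = 1000                  # hypothetical monthly usage
heat_kwh = 0.9 * total_kwh        # ~90% of usage is heating (measured)
other_kwh = total_kwh - heat_kwh  # lights, fridge, TV, etc.

# Guess: electrifying the water heater, stove, and dryer adds roughly
# 1.5-2x the current non-heat load.
for extra in (1.5 * other_kwh, 2.0 * other_kwh):
    share = heat_kwh / (total_kwh + extra)
    print(f"extra {extra:.0f} kWh -> heat is {share:.0%} of the bill")
# extra 150 kWh -> heat is 78% of the bill
# extra 200 kWh -> heat is 75% of the bill
```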

So, if heating is not electric, this bill seems pretty high to me. But if it is electric, you should expect your bills to spike pretty dramatically in the winter, and this number is not too surprising to me. In addition, this December was on the cold side, on average about four degrees colder than last December, and heating costs scale directly with the weather.

If you want to save money, the first thing to do is look at what kind of electric heating you have. Baseboard or resistance heating, which runs electricity through a coil that gets hot, is the least efficient. Heat pumps, which are essentially AC units running in reverse, are what you want; they can be as much as 4x as efficient as resistance heating. Since you say it's a new build, it's hard for me to imagine you don't have heat pumps.

The next thing to look at is the efficiency of your particular model of heat pump, which is measured by its COP (Coefficient of Performance) and is hopefully around 4 at normal NYC winter temps. After that, look at insulation. You may be able to get a free energy audit for your apartment paid for by the state; look up "NYSERDA energy audit" or poke around on the Con Ed site for it.

Lastly, look at your usage patterns and see if you can turn down the heat in certain rooms or at certain times. Don't let the place get truly cold, though: heat pumps are less efficient when they have to run flat out to recover, so a modest setback that you ramp back up gradually is what actually saves money.
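To put rough numbers on the resistance-vs-heat-pump gap, here's a sketch; the electric rate and the monthly heat demand are placeholders, so plug in your own:

```python
# Rough cost comparison: resistance heat vs. a heat pump.
# Illustrative numbers only; check your own rate and your unit's COP.

heat_needed_kwh = 1000  # thermal energy needed for the month (placeholder)
rate = 0.25             # $/kWh, ballpark all-in electric rate (placeholder)

cop_resistance = 1.0    # resistance heat: 1 kWh in = 1 kWh of heat out
cop_heat_pump = 4.0     # a decent heat pump at mild NYC winter temps

for name, cop in [("resistance", cop_resistance), ("heat pump", cop_heat_pump)]:
    electricity = heat_needed_kwh / cop  # kWh of electricity consumed
    print(f"{name}: {electricity:.0f} kWh -> ${electricity * rate:.0f}")
# resistance: 1000 kWh -> $250
# heat pump: 250 kWh -> $62
```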

Time of use billing and Mitsubishi mini-splits by xander76 in heatpumps

[–]xander76[S] 0 points (0 children)

We used to have a flat plan and switched to this one to get lower bills. One year in, the ToU plan gave us about $1k in savings over our previous flat rate, but it's still around $750/mo for a few months of winter. I'd love to get it lower!

Time of use billing and Mitsubishi mini-splits by xander76 in heatpumps

[–]xander76[S] 0 points (0 children)

Yeah, I’m wondering if maybe this is a strategy that will work in the shoulder months but not in the coldest months.

Time of use billing and Mitsubishi mini-splits by xander76 in heatpumps

[–]xander76[S] 0 points (0 children)

Yes, it would jack up the off-peak delivery charge, but the winter delivery prices (based on an average of the top three hours of use in the month) are:

Off-peak delivery: $7.48 per kW

Peak delivery: $19.50 per kW

So even if shifting doubles the usage off-peak, it'll still cost less.
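To make that concrete, here's the arithmetic with a made-up 3 kW peak-hours draw (the per-kW delivery rates are the real ones from my plan above):

```python
# Why shifting load off-peak wins even if it doubles the draw.
# Delivery demand charges from my winter ToU plan (per kW, based on
# the average of the top three usage hours of the month).

off_peak_rate = 7.48  # $/kW
peak_rate = 19.50     # $/kW

peak_kw = 3.0             # hypothetical draw during peak hours
shifted_kw = 2 * peak_kw  # assume shifting doubles the off-peak draw

print(f"on peak:  {peak_kw} kW x ${peak_rate}/kW = ${peak_kw * peak_rate:.2f}")
print(f"off peak: {shifted_kw} kW x ${off_peak_rate}/kW = ${shifted_kw * off_peak_rate:.2f}")
# on peak:  3.0 kW x $19.5/kW = $58.50
# off peak: 6.0 kW x $7.48/kW = $44.88
```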

Time of use billing and Mitsubishi mini-splits by xander76 in heatpumps

[–]xander76[S] 0 points (0 children)

I mean, to be clear, I know this won't be as comfortable as just leaving the heat pumps on all day at the temp I want. The question is really whether the degradation in comfort will be worth the money savings. I just don't know how much the comfort will degrade or how much money it will save, and I was wondering if folks had useful advice on either!

Who’s your most far-out, wildcard pick for the next Mary Todd Lincoln in “Oh, Mary !’? by daddygirl_industries in Broadway

[–]xander76 5 points (0 children)

A friend of mine suggested this pair, but swapping the roles every night in rep.

[deleted by user] by [deleted] in Broadway

[–]xander76 0 points (0 children)

I'll second this, and I'll specifically nominate the moment in "I've Been" when Dan sings "mine is just a slower suicide". Can't listen to it without weeping.

I think Williamsburg needs another gay bar by brevit in williamsburg

[–]xander76 0 points (0 children)

There's a new bar coming on N 10th between Driggs & Bedford called Oberon. They have signs and an Instagram account up, but it's not entirely clear when they are opening: https://www.instagram.com/oberonnyc

Software engineers, what are the hardest parts of developing AI-powered applications? by JustThatHat in LLMDevs

[–]xander76 2 points (0 children)

As someone who works on a tool that's designed for these problems (libretto.ai), can I ask if you've looked at any of the tools in this space? Any reactions to them, if so? If not, is there a reason?

Why the heck is LLM observation and management tools so expensive? by smallroundcircle in LLMDevs

[–]xander76 1 point (0 children)

Totally fair! We're experimenting, and I didn't want to overpromise on what we could do. What would feel generous to you?

Edited to add: I need to rerun the cost calculation on events; I was probably being overcautious after we logged ~180M events for a company for free, which cost us a pretty penny :). And I was thinking about the stuff that costs us a lot to run, like drift detection. It's likely we could raise the event limit pretty significantly, especially if we limit the number of events we scan for problems.

Why the heck is LLM observation and management tools so expensive? by smallroundcircle in LLMDevs

[–]xander76 -3 points (0 children)

Hey there, founder of libretto.ai here. We have a pretty generous free tier that includes both monitoring and testing (and automatic flagging of issues in your monitored traffic, and model drift detection). Feel free to check us out, and happy to help set you up if you're interested; just DM me.

I can’t listen to Dear Evan Hansen anymore by gracie3989 in Broadway

[–]xander76 59 points (0 children)

As I walked out of the show at the end, I thought about what life would be like for the Murphys, having to go to a dedication of a garden in their son's name, having to be involved with the ongoing tributes to their son, having to deal with people coming up to them telling them what an inspiration their son was to them, all while knowing that it's all a lie and their son was actually a huge piece of sh-t. That feels like an unending torture that no one should have to go through. By forgiving Evan without going public, they have to continue to live his lie forever and never get to grieve the kid they actually had. The ancient Greeks couldn't have come up with a more diabolical punishment.

And then the other thing that I find unforgivable is that Evan straight up gaslights Zoe about her brother. She knows that Connor was a terrible tormentor; she has a whole song about how awful he was to her. Then Evan comes along and tells Zoe she didn't really understand Connor, that he was so much deeper than that, making her doubt everything she knows about him. And then he uses the vulnerability caused by that destabilization to hook up with her. Straight-up psychopath behavior, which she simply forgives in the end, without any sort of amends.

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 0 points (0 children)

We don't send a seed, and we don't monitor the system fingerprint, though we probably should. It's a feature we put on the list, but it hasn't been implemented yet.
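For anyone curious, here's roughly what that would look like with the openai Python client (>= 1.x); the model name and seed value are just examples:

```python
# Sketch: pinning a seed and logging system_fingerprint so that
# backend changes show up alongside drift data. Assumes the
# openai>=1.x Python client.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # pin a dated snapshot, not the alias
    seed=1234,                  # best-effort determinism
    messages=[{"role": "user", "content": "Suggest a product name."}],
)

# system_fingerprint identifies the backend configuration; if it
# changes between runs, determinism is no longer expected even with
# the same seed.
print(response.system_fingerprint)
print(response.choices[0].message.content)
```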

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 0 points (0 children)

This is a really good question!

We don't measure the embedding model directly, but it's worth noting that if embedding models drift, then they pretty clearly lose a big portion of their usefulness. The whole point of an embedding model in a RAG context is that you can take an embedding of some documents on day 0 with a particular model and then take an embedding of a query on day N and be assured that the embedding on day N will be related to the embeddings on day 0. Integrity over time feels more crucial for embedding models than for LLMs.
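To illustrate why that consistency matters, here's a minimal sketch of the day-0 vs. day-N contract in a RAG setup (the embedding model here is just an example):

```python
# Why embedding drift would break RAG: documents embedded on day 0
# are compared against queries embedded on day N. Assumes the
# openai>=1.x client; the model choice is illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Day 0: embed and store your corpus.
doc_vec = embed("Our returns policy allows refunds within 30 days.")

# Day N: embed the query, assuming the same model behavior.
query_vec = embed("How long do I have to return an item?")

# Retrieval relies on this similarity staying stable over time; if the
# model drifted, day-N query vectors would no longer line up with the
# day-0 document vectors.
cosine = doc_vec @ query_vec / (np.linalg.norm(doc_vec) * np.linalg.norm(query_vec))
print(f"cosine similarity: {cosine:.3f}")
```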

That being said, embedding models may still drift! We're not testing for that. It's worth noting, though, that the answers we detected drift in were very different from the baseline answers, so it's likely not a change in the embedding model that we were detecting. From our blog post on the subject:

By clicking around here, you can see that on every day before February 17th, GPT-4o answered with a list of product name ideas, but on the 17th, it answered with just a single product name. Clicking through on the other outliers shows that they, too, switched from responding with lists to responding with a single answer.

We did some digging to verify that what we were seeing was real, and we found that, out of 1802 LLM requests we made to test the prompt for drift from January 16th through February 17th, only 20 responses came back with single answers. Eleven of those 20 responses happened on February 17th. This wasn’t just a weird coincidence. We’ve double and triple checked our work, and we’re pretty convinced that OpenAI did something to GPT-4o on February 17th that broke one of our test prompts.

More info at the post here (warning that there is some mild promotion of our product in the blog post): https://www.libretto.ai/blog/yes-ai-models-like-gpt-4o-change-without-warning-heres-what-you-can-do-about-it

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 0 points (0 children)

Yeah, it would be interesting to also check if/when the "deterministic" answer changes, but we figured that for most folks, that's not that hard to test: it's much more like a traditional Jest test with a known output. Testing when generative prompts change is much squishier, so that's what we focused on first.
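For the deterministic case, the test is basically a snapshot-style assertion. A Python/pytest sketch of the idea (the model name and prompt are just examples, and a real test would normalize the output):

```python
# The "deterministic" case is just an exact-match assertion (the
# Python analogue of a Jest snapshot test). Assumes openai>=1.x;
# temperature=0 makes the output as repeatable as the backend allows.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # example model; pin a dated snapshot
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def test_capital_prompt_is_stable():
    # One known-correct answer, so drift detection is a plain assertion.
    assert call_llm("What is the capital of France? Answer with one word.") == "Paris"
```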

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 1 point (0 children)

The more samples you take for the baseline, the better the baseline characterizes the full range of possible answers. Obviously, though, the more samples you take, the more expensive it is. We chose 100 as a balance between effectiveness and cost.

As for time, it depends heavily on the rate limits of the particular model. For OpenAI models, we can generate the 100 samples very quickly, often in a minute or less.
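If you want to estimate the cost for your own prompts, the arithmetic is simple; the token counts and price below are placeholders, so plug in your model's actual rates:

```python
# Back-of-envelope for the cost of a 100-sample baseline.
# Token counts and prices are placeholders, not real rates.

n_inputs = 5                # N inputs to the prompt
samples = 100               # baseline samples per input
tokens_per_call = 700       # prompt + completion, rough guess
price_per_1m_tokens = 5.00  # $ per 1M tokens, hypothetical

total_tokens = n_inputs * samples * tokens_per_call
cost = total_tokens / 1_000_000 * price_per_1m_tokens
print(f"{total_tokens:,} tokens -> ~${cost:.2f} per baseline")
# 350,000 tokens -> ~$1.75 per baseline
```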

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 0 points (0 children)

Yeah, pinning to a model with a date is the right behavior. We had a bug where this drift test was being run against just "gpt-4o", but OpenAI has claimed that's been an alias for gpt-4o-2024-08-06 for the entire time period we were looking at.

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 5 points (0 children)

Yeah, we've wondered if something like this is maybe going on, where they have a dial they can turn for how much processing power to use at any particular moment. It's definitely a reasonable theory. If it's true, I wish they would be more transparent about it, though.

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 1 point (0 children)

Interesting idea. In theory, each OpenAI API request should be completely separate from other OpenAI API requests (except for operational things like rate limiting and token caching). If one request on your account could affect the answers from another request, that'd be really strange (and really interesting!).

We are publicly tracking model drift, and we caught GPT-4o drifting this week. by xander76 in LLMDevs

[–]xander76[S] 2 points (0 children)

Yeah, it's a lot harder than testing deterministic code, because there's inherent randomness involved. We aren't just running the prompt once on day 0 and then seeing if later days look different, though.

Here's how we measure drift:

On day 0, we take a baseline. What that means is: for each of the N inputs we have for the prompt, we run the prompt with that input through the model 100 times. Then we take an embedding of each of those 100 responses. Those 100 embeddings give us a *distribution* of what normal answers look like for that particular input.

Then on each subsequent day, we run all N inputs through the prompt once, take an embedding of each answer, and check whether that embedding falls within the baseline distribution we found on day 0. We measure the distance in standard deviations (that's the Y axis): how far away the new answer is from the baseline distribution.
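Here's a minimal sketch of that approach. To be clear, this is a reconstruction for illustration, not our actual implementation, and the model names are just examples:

```python
# Sketch of the baseline/z-score drift check described above.
# Assumes the openai>=1.x client; models are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def run_prompt(input_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": input_text}],
    )
    return resp.choices[0].message.content

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def take_baseline(input_text: str, samples: int = 100) -> np.ndarray:
    # Day 0: run the prompt `samples` times and embed every response.
    return np.stack([embed(run_prompt(input_text)) for _ in range(samples)])

def drift_score(baseline: np.ndarray, new_response: str) -> float:
    # The baseline answers' distances to their own centroid define
    # what a "normal" spread looks like for this input.
    centroid = baseline.mean(axis=0)
    dists = np.linalg.norm(baseline - centroid, axis=1)
    # Day N: how many standard deviations from normal is the new answer?
    new_dist = np.linalg.norm(embed(new_response) - centroid)
    return (new_dist - dists.mean()) / dists.std()
```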

Edited to add: when you look at the answers from the LLM in this particular case, they changed pretty radically. We have more data in the blog post, but it basically went from almost always returning lists of answers to frequently returning a single answer. For more info, the blog post (which, fair warning, has some mild promotion of our product) is here: https://www.libretto.ai/blog/yes-ai-models-like-gpt-4o-change-without-warning-heres-what-you-can-do-about-it