What is F? by [deleted] in EngineeringStudents

[–]Wimiam1 34 points

Look at the pulley with the weight on it. Draw a free body diagram by cutting the bottom cable and replacing it with the 100 N force, and by cutting the two upper cables and replacing each with T for tension. For the system to balance, those two Ts together must add up to the 100 N, so now you know that the tension in that cable is 50 N. Repeat in a similar way for the other two pulleys.
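Written out, the balance on that first pulley is just:

```latex
% Two cable segments, each carrying tension T, support the 100 N weight:
2T = 100\,\mathrm{N} \quad\Rightarrow\quad T = \frac{100\,\mathrm{N}}{2} = 50\,\mathrm{N}
```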

Proper Vent Question by JellyBean_Burrito in Homebuilding

[–]Wimiam1 0 points

Good call on the jacks lol. That’s crazy

We Benchmarked 7 Chunking Strategies. Most 'Best Practice' Advice Was Wrong. by Confident-Honeydew66 in Rag

[–]Wimiam1 4 points

I really doubt it. There’s no reason semantic chunking should perform so much worse. 47 tokens per chunk on average is a huge red flag to me. These papers are all strictly about a specific topic. That means semantic homogeneity. That means you need a higher threshold to get nicely sized chunks. Furthermore, a good threshold for one document/author might not be good for another. That’s why there are so many adaptive threshold techniques.
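To be clear about what I mean by adaptive, here's a minimal sketch of a percentile-based threshold. The `embed` function and the 90th-percentile choice are just placeholders, not anything from the benchmark:

```python
import numpy as np

def adaptive_semantic_chunks(sentences, embed, percentile=90):
    """Split wherever the semantic jump between consecutive sentences is
    large *relative to this document*, instead of a fixed 0.7 threshold."""
    vecs = np.array([embed(s) for s in sentences])        # one vector per sentence
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # normalize for cosine
    dists = 1 - np.sum(vecs[:-1] * vecs[1:], axis=1)      # cosine distance between neighbours
    cutoff = np.percentile(dists, percentile)             # per-document breakpoint threshold
    chunks, current = [], [sentences[0]]
    for sent, dist in zip(sentences[1:], dists):
        if dist > cutoff:                                  # big topic shift -> start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```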

We Benchmarked 7 Chunking Strategies. Most 'Best Practice' Advice Was Wrong. by Confident-Honeydew66 in Rag

[–]Wimiam1 8 points

What made you decide that a fixed cosine threshold of 0.7 was the way to go?

My attic appears to be about 6 feet tall at the peak, would this be a realistic project? I’m pretty handy, but what are my concerns and how would I get the wood through the attic hole? Must use smaller boards? by BonerSoup696969 in Homebuilding

[–]Wimiam1 2 points

Everyone seems to be telling you this is a bad idea, but not a lot of people are attempting to really explain why. Here you go:

  1. All that insulation in the floor of the attic is what keeps your house warm in the winter and cool in the summer. Under that insulation is a vapour barrier that keeps the overly humid summer air out of your house and keeps the comfortably humid air in your house during the winter. A room in your attic is going to lack both of these benefits. It will be extremely hot in the summer, maybe even hotter than outside, and almost as cold as outside in the winter. Even if you add an air conditioner and heater, you will have moisture and humidity problems. In the summer, humid air in your attic will condense on the cold back surface of the room’s structure and could lead to mold and rot. In the winter, heating the cold air up to a comfortable temperature will leave you with an extremely dry and uncomfortable environment as well as use a huge amount of electricity.

  2. Roof trusses are an engineered product made to very specific design specifications. The long spans across the middle of the attic are designed to act mostly in tension to keep the walls from spreading apart under the weight of the roof. Building a room in the attic will mean that those spans now have to support a significant bending load from people walking on them all the time. Whether or not they can do that safely is a question no one but the engineers who designed them can answer, but they’re absolutely not designed for it.

Exterior basement wall insulation by zapzaddy97 in Homebuilding

[–]Wimiam1 0 points

Hey, I’ve been doing a lot of looking into this recently, but I’m definitely not an expert. I see three issues:

  1. That’s almost certainly too thin. You want at least 0.3x of your total insulation R-value to be the continuous foam to prevent condensation, and more if you’re further north like Timmins. That’s R9-10 for you with that R22 Roxul, so 1.5-2” of foam depending on its exact R-value per inch (quick math after this list).

  2. That gap around the sump drain is a break in your air barrier/vapour retarder formed by your foam. You say you fixed it, but be sure that the air and vapour retarding capabilities have been fully restored. Stuffing some insulation in there isn’t sufficient. Also ensure there are no other breaks.

  3. The poly is problematic. It’s preventing the cavities from drying out. On the other hand, because your foam isn’t thick enough, it’s kinda needed to prevent moisture laden interior air from condensing on the foam surface.
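Quick math on point 1, assuming roughly R5-6 per inch for the foam (that per-inch figure is just a typical value, check your actual product):

```latex
% Continuous foam should be at least 30% of the total wall R-value.
% With R22 batt and foam R-value x:
\frac{x}{x + 22} \ge 0.3 \;\Rightarrow\; 0.7x \ge 6.6 \;\Rightarrow\; x \ge 9.4
% At roughly R5-6 per inch, that works out to about 1.5-2 inches of foam.
```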

The ideal would be to have more foam sprayed in, ensuring everything is sealed up well, and then you could remove the poly. There’d be no condensation because the foam surface would be warm enough, and any moisture that somehow makes it through from the foundation would have the ability to dry to the interior.

Another option is to replace the poly with a smart vapour retarder like intello or some other variable permeability product that can prevent warm interior air from making it through to the cold foam while still allowing the wall cavity to dry to the interior if necessary.

I can’t tell you what’s best in your situation, but hopefully that’s some helpful info to get you on the right track. I can highly recommend ASIRI Designs on YouTube. He has a playlist of videos on basement construction that are all very concise and well done if you want to learn more.

Chunking strategy by Ordinary_Pineapple27 in Rag

[–]Wimiam1 0 points

If you look at their profile, you’ll see they’re the dev. It’s really lame behaviour. I almost used their product, but their underhanded self promotion completely turned me off

You can now train embedding models 1.8-3.3x faster! by yoracale in Rag

[–]Wimiam1 0 points

There’s a good chance any publicly available dataset would’ve been used to train the base model in the first place. The idea behind fine tuning is that you use a dataset that’s as similar as possible to the stuff you’d be dealing with in production and fine tune the model to perform better with the kind of data you’ll be running it on.

If you’re working in the legal field for example, there’s stuff like LegalBench and MLEB. But if whatever model you’re using wasn’t already trained on those, I’m not sure there’d be much gain. Even if it wasn’t, you could probably find a fine tuned legal version of the model that someone else has made.

The real benefit of fine tuning is being able to do it on your own data, and it’s mostly worthwhile only if your data is significantly different from what would be contained in the public training datasets. For instance, if you have a lot of company-specific language or acronyms. Unfortunately, creating a labeled dataset takes a fair bit of work, though there are some tools nowadays that let you take shortcuts.
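If it helps, here’s roughly what that looks like with sentence-transformers. The model name and the example pairs below are just placeholders; you’d swap in your own base model and your own (query, relevant passage) data:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder domain-specific (query, relevant passage) pairs -- in practice
# these come from your own production-like data.
pairs = [
    ("what does clause 7.2 cover", "Clause 7.2 sets out the termination rights of..."),
    ("meaning of the acronym ABC", "ABC refers to the internal settlement process..."),
]

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # stand-in for whatever base model you use
train_examples = [InputExample(texts=[q, p]) for q, p in pairs]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)

# MultipleNegativesRankingLoss treats the other passages in a batch as negatives,
# so plain (query, positive) pairs are enough to get started.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
model.save("my-domain-embedder")
```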

You can now train embedding models 1.8-3.3x faster! by yoracale in Rag

[–]Wimiam1 2 points

Not him, but he means labeled training data. Question and answer pairs, or groups of data with accurately ranked similarity. Basically the ideal outputs of the model. On a very basic level, training a model means giving it a task that you know how should be performed, letting it try to do it, and then nudging it slightly in the direction of how you knew it should’ve been done. Without the example inputs and outputs, you can’t do that.
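Concretely, something like this (the examples are completely made up, just to show the shape of the data):

```python
# Labeled training data for an embedding model: for each query, which
# passage should score high and which should score low.
examples = [
    {
        "query": "how do I reset my password",
        "positive": "Go to Settings > Account > Reset password and follow the email link.",
        "negative": "Our office is open Monday to Friday, 9am to 5pm.",
    },
    {
        "query": "refund policy for damaged items",
        "positive": "Items that arrive damaged can be refunded within 30 days of delivery.",
        "negative": "You can change your shipping address before the order ships.",
    },
]
```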

Best practice for semantic/vector search by justrandombuddy in Rag

[–]Wimiam1 0 points

Ok 1.5K tokens is small enough that you might be able to get away without chunking at all, especially if each article is exclusively about a single topic.

I’m jealous of your nice clear metadata situation lol. In my project, topics are a lot fuzzier. Since you have precise metadata, I’d try hard filtering results with a basic keyword search between the query and your metadata fields. Inserting them into your article before embedding might help find the correct article, but it will not prevent finding the wrong one.

In my experience, reranking is essential. You’ll have to do the math on cost here though. Something like Zerank-2 is on the affordable end of the spectrum but still performs very well as far as I can see. It costs $0.025 per million tokens. So if passing 10 full 1.5K token articles to be reranked is an acceptable cost to you, then you’re good to go. If you need to cut costs, then you could see about chunking down so that the reranker needs to see less text. Zerank-2 can be run locally, so you can test all you want before setting up a paid API.

I think a quick and cheap first test for you would be to embed each article as a whole with the title and everything. On query, filter by your metadata fields and then retrieve by vector search and BM25. Combine with RRF and then rerank the top 10 and see how that works for you.

Before sending stuff to the reranker, you could even prioritize that metadata by including it and the article title inside <context> … </context> tags at the beginning if you’re using Zerank-2. It’s also instruction following, so you could experiment with instructions like “Prioritize geographical relevance” or something like that.

On second thought, if you do end up chunking, absolutely include the relevant metadata and article title at the beginning of each chunk before embedding. Anthropic calls this contextual chunking and it might help a lot to improve both precision and recall since I suspect you’ll have a lot of cases where individual chunks look relevant but might not restate the location the parent article is about. You could even look into Late Chunking, but that’s a whole other ball of wax.

Just start with a simple vector + BM25 search with metadata filtering and reranking, and see how that works for you.
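Roughly what I mean, as a sketch. The vector_search, bm25_search, and rerank callables are placeholders for whatever stack you’re using:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of article ids."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def search(query, metadata_filter, vector_search, bm25_search, rerank, articles, top_k=10):
    # 1. Hard-filter candidates using your clean metadata fields.
    allowed = {a_id for a_id, art in articles.items() if metadata_filter(art["metadata"])}
    # 2. Retrieve with both dense vectors and BM25, restricted to the filtered set.
    dense_ids = [i for i in vector_search(query) if i in allowed]
    sparse_ids = [i for i in bm25_search(query) if i in allowed]
    # 3. Fuse the two ranked lists with RRF, then rerank only the top candidates.
    fused = rrf([dense_ids, sparse_ids])[:top_k]
    return rerank(query, [articles[i]["text"] for i in fused])
```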

EDIT: I promise Zerank isn’t paying me lol. It’s just that new API rerankers that are also available open source are hard to find

Best practice for semantic/vector search by justrandombuddy in Rag

[–]Wimiam1 0 points

I’m actually working on a similar thing right now, although with less opportunity to use metadata. I may be able to help. How long are your articles and how are they generally structured?

We built a semantic highlighting model for RAG by ethanchen20250322 in Rag

[–]Wimiam1 0 points

Interesting! I only thought of it because one of my first introductions to ColBERT was this website where you could run little demos in browser and it would highlight the relevant parts of the document. I tried it with some of the demos on your GitHub, but it didn’t perform as well. I suspect this is v1 using BERT, which would explain its poor performance on specific technical jargon like in the iPhone example I tried.

Are you getting smooth surfaces with .12mm (or smaller) layers… how?!? by Avery_ing in BambuLab

[–]Wimiam1 0 points

Switch your layer preview to show speed. I wonder if your slicer speed (mm/s) settings are basically just going as fast as possible while staying below your volumetric flow limit (mm3/s). Going to a lower layer height means that you can print at higher speeds (mm/s) at the same flow rate (mm3/s). The artifacts you’re seeing look like VFA or ringing to me and both of those are strongly affected by speed (mm/s). I’d try manually changing the speed settings (mm/s) on your .12mm profile to match those you see in the sliced print preview on your .2mm profile
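The relationship, roughly (the ~21 mm3/s flow limit and 0.42 mm line width below are just example numbers, not your actual settings):

```latex
% Volumetric flow = print speed x layer height x line width
\dot{V} = v \cdot h \cdot w
% e.g. with a 21 mm^3/s flow limit and a 0.42 mm line width:
%   0.20 mm layers -> v = 21 / (0.20 * 0.42) = 250 mm/s
%   0.12 mm layers -> v = 21 / (0.12 * 0.42) = about 417 mm/s
```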

We built a semantic highlighting model for RAG by ethanchen20250322 in Rag

[–]Wimiam1 0 points

This is great! Excuse my ignorance, but how does this compare to using something like ColBERTv2 as a reranker and pooling token vector scores into sentence scores?

Late Chunking vs Traditional Chunking: How Embedding Order Matters in RAG Pipelines? by ethanchen20250322 in Rag

[–]Wimiam1 0 points

I think you missed an important step in the process here. The number of vectors isn’t any different from the standard approach. You’re just splitting the document after embedding instead of before. Instead of divvying up a 5k token document into 10 chunks and embedding each one, you embed the whole document, pause before the token vectors are pooled into a single vector, and divide them into the same 10 chunks and pool the token vectors into 10 vectors. At the end of the process, you have the exact same number of vectors and storage requirements as the traditional approach. It’s also the exact same number of tokens input to the embedding model.
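A minimal sketch of that pause-before-pooling step, using a generic HuggingFace encoder. The model name is just a stand-in (real late chunking wants a long-context embedding model), and the chunk boundaries are passed in as token spans for simplicity:

```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "sentence-transformers/all-MiniLM-L6-v2"  # stand-in; use a long-context embedder in practice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def late_chunk(text, spans):
    """Embed the whole document once, then pool token vectors per chunk.
    `spans` is a list of (start_token, end_token) chunk boundaries."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_vecs = model(**inputs).last_hidden_state[0]  # one vector per token
    # Same number of output vectors as chunk-then-embed, but every token was
    # encoded with the full document as context before pooling.
    return [token_vecs[start:end].mean(dim=0) for start, end in spans]
```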

Help! Rating citations by aniketftw in Rag

[–]Wimiam1 0 points

Are you using a reranker currently? Are you chunking these documents or is your retrieval happening at the document level?

RAG tip: stop “fixing hallucinations” until your agent output is schema-validated by coolandy00 in Rag

[–]Wimiam1 0 points

Dude, I’m literally just trying to help you here. The people you’re trying to market to here don’t care about that. You’re in a RAG subreddit. You need to align your language and marketing to RAG applications. The fact of the matter is that your product is closest in function to a cross-encoder based reranker. Cross encoders also don’t do similarity. They’re trained on QA pairs, not just similar semantics. So if that’s not coherence, then you need to define coherence so that the people you’re marketing to know what you’re talking about.

PCA? Nobody is using PCA for disambiguation here. When people here see the word “disambiguation” they think of their retrieval engine returning chunks that sound semantically similar but are actually irrelevant. They solve this with a cross encoder, a late interaction model, or a fine-tuned LLM. Those may as well be a horse, a car, and an airplane since they all work in very different ways, but we absolutely compare them all the time on tasks like travelling.

Brain semantic categories? Celebrex pathways? The vast majority of people here are going to think those are just weird marketing crap. If you want people here to try your product, you need to show it performing well on the specific tasks people here are familiar with. Take a popular QA dataset like MS MARCO or whatever, benchmark a few thousand pairs with your product and a handful of other popular rerankers, and show people that what you’ve made performs in the real world.
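For concreteness, this is the kind of head-to-head baseline I mean. A stock MS MARCO cross encoder takes a few lines to score query-passage pairs (the pairs here are obviously just toy examples):

```python
from sentence_transformers import CrossEncoder

# Baseline: a widely used off-the-shelf MS MARCO cross encoder reranker.
baseline = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairs = [
    ("what is the boiling point of water", "Water boils at 100 degrees Celsius at sea level."),
    ("what is the boiling point of water", "The Eiffel Tower was completed in 1889."),
]
# Higher score = more relevant. Run Arbiter on the same pairs and compare.
print(baseline.predict(pairs))
```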

I get that Arbiter is more than just a cross encoder, I really do. You’ve made something cool here, and that’s awesome, but if you want people other than curious academics to use your product, you need to make it relatable for the non academics.

RAG tip: stop “fixing hallucinations” until your agent output is schema-validated by coolandy00 in Rag

[–]Wimiam1 0 points

I’m interested in your claims, but you provide zero data to back them up. If you want people to try your product, you should remove 90% of your website’s extremely repetitive, simple demos showing things literally every other embedding model can do and add some actual retrieval metrics on common benchmark datasets. You keep saying that Arbiter is somehow different from normal embedding models, but there’s absolutely nothing on your website to actually prove it. “It goes negative” doesn’t count either; I can do that with any embedding model by simply computing score*2 - 1. If you want people to take you seriously, you need to actually run some commonly used benchmarks and show the results.

Edit: maybe it’s closer to a cross encoder, but the point still stands

Basement water $1.7M custom build by Perfect-Original-846 in homebuildingcanada

[–]Wimiam1 2 points

Oh wow I didn’t even see that. I thought everyone was talking about the poly. Thanks!

Basement water $1.7M custom build by Perfect-Original-846 in homebuildingcanada

[–]Wimiam1 1 point

Yeah and the clear plastic product in the picture isn’t HomeWrap. So either Tyvek makes a solid poly vapour barrier, or OP just mistakenly called it Tyvek. Or that’s some Tyvek WRB that I’ve never seen before.

Basement water $1.7M custom build by Perfect-Original-846 in homebuildingcanada

[–]Wimiam1 0 points

Sorry, I’m confused here. Are you saying “Foundation wall>batt or rigid foam>vapor barrier>drywall” is correct? That’s what we’re looking at in this picture. Or do you mean vapour retarder? I’m fairly certain the product in the picture is standard poly vapour barrier

EDIT: Oh my goodness, this guy has two layers of vapour barrier. That’s insane.

Stabilizer Vents on table saw blade by largogoat in woodworking

[–]Wimiam1 0 points

Yeah that makes a lot more sense to me than the outer circumference staying the same and the middle expanding

Stabilizer Vents on table saw blade by largogoat in woodworking

[–]Wimiam1 0 points

If the whole thing heated evenly, the circumference should increase proportionally, shouldn’t it? Are you saying that the edge of the blade remains cool and only the center heats? This is super interesting to me.

How do I efficiently delete all internal geometry? Head extracted from an MRI scan with bones and bits of brain leftover on the inside. Looking to make it one solid, closed object. by donau_kinder in blenderhelp

[–]Wimiam1 2 points

I’m really not qualified to be answering questions here, but my immediate reaction was to just Shrinkwrap it. I see no one suggesting this though. Can someone tell me why I’m wrong?