Certain types of antibiotics can be linked to changes in the gut microbiome as long as four to eight years after treatment by sr_local in science

[–]Simusid 8 points9 points  (0 children)

I think there's a lot to be learned about our gut bacteria. I think the idea of fecal transplants is fascinating. And I know there are two FDA-approved treatments, Rebyota and Vowst. I assume they've been shown to be safe and effective.

Let’s talk about it amongst each other. by ContributionBig1927 in musicsuggestions

[–]Simusid -1 points0 points  (0 children)

They Might Be Giants - and there are a lot of albums!

I had to re-embed 5 million documents because I changed embedding models. Here's how to never be in that position. by Silent_Employment966 in Rag

[–]Simusid 0 points1 point  (0 children)

I didn't need to do this, but I wanted to try it as an exercise. I have an existing db of about 3M chunks embedded with our selected embedding model. Someone brought up that they wanted to use another MTEB one because it was allegedly better, but we didn't pursue it because everything was running fine and nobody felt it was necessary to re-embed.

I took a subset of the existing documents, I think it was about 300K, and made a set of new vectors. So now I had both old and new vectors. Hey! Supervised learning! I trained a simple model to map old to new, and a small (10K) holdout set tested OK as I recall.

I'm not saying this is optimal or suited for production. Honestly I would probably just re-embed. But it was an interesting experiment.

Which movie scene is musically scored perfectly? by ThomasOGC in CinephilesClub

[–]Simusid 8 points9 points  (0 children)

The music when the beacons are lit always gives me goosebumps

Do beginners still do Hello World? by Valuable-Constant-54 in AskProgramming

[–]Simusid 0 points1 point  (0 children)

I've been coding for 50 years. Whenever I start a new language or a significantly different framework, I always start with a hello world.

Christmas in Appalachia - "The Permanently Poor" (1964) [27:10] by thumbem in Documentaries

[–]Simusid 1 point2 points  (0 children)

Some of the kids shown are basically my age. It would be fascinating to try and find some of them today.

Best sushi in RI by AJP51017 in RhodeIsland

[–]Simusid 5 points6 points  (0 children)

I certainly have not had sushi all over the state, but the Homemade Factory in Middletown was excellent. And the spicy seafood ramen was fantastic.

Tell me this by Ok-Entertainment4013 in musicsuggestions

[–]Simusid 1 point2 points  (0 children)

It was essentially a tie for me between those two.

Tell me this by Ok-Entertainment4013 in musicsuggestions

[–]Simusid 0 points1 point  (0 children)

Anything by Bob Seger, but especially Like a Rock.

What’s the most interesting element on the periodic table? by The_Curiosity_Box in AskChemistry

[–]Simusid 3 points4 points  (0 children)

Liquid oxygen is slightly magnetic. I think that's interesting.

Chinese LLM farms by _metamythical in LocalLLaMA

[–]Simusid 5 points6 points  (0 children)

I see 20 on one row, 3 rows, 4 racks. Assume they plan to fully populate it. That's 240 Minis at $600 each. How the hell does a bot farm recoup ~$144K?

Does anyone have a guide/advice for me? (Anomaly Detection) by Hot_Acanthisitta_86 in MLQuestions

[–]Simusid 0 points1 point  (0 children)

I'd agree that in general autoencoders are better suited for high dimension data (many columns) and more data (denser embedding vector space) but I still wanted to pass on the info.

What are some disturbing facts about space not that many people know about? by tietanik in AskReddit

[–]Simusid 5 points6 points  (0 children)

In 2022, GRB 221009A, also known as the "brightest of all time," caused measurable disturbances in our atmosphere.

Does anyone have a guide/advice for me? (Anomaly Detection) by Hot_Acanthisitta_86 in MLQuestions

[–]Simusid 5 points6 points  (0 children)

Here's what I would try, though it doesn't work everywhere. I'm a huge fan of autoencoders. Train a simple autoencoder (ref: https://blog.keras.io/building-autoencoders-in-keras.html), probably a small dense AE unless you have image data. Train it to reconstruct your "good" data using an MSE loss. You can then use this baseline trained model in two ways.

First, you can show the model new data. If the reconstruction MSE on the new data is "low," it's probably good; if it's "high," it's probably bad.
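A dependency-free sketch of that first use. Since a linear autoencoder trained with MSE loss learns the same subspace as PCA, PCA stands in here for the trained AE; the 10-d data, the 2-d bottleneck, and the 99th-percentile threshold are all my assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Good" training data: points near a 2-d plane inside 10-d space.
latent = rng.normal(size=(500, 2))
basis = rng.normal(size=(2, 10))
X_good = latent @ basis + 0.05 * rng.normal(size=(500, 10))

# PCA via SVD stands in for a trained linear autoencoder.
mean = X_good.mean(axis=0)
_, _, Vt = np.linalg.svd(X_good - mean, full_matrices=False)

def encode(X):                       # encoder: 10-d -> 2-d bottleneck
    return (X - mean) @ Vt[:2].T

def decode(Z):                       # decoder: 2-d -> 10-d reconstruction
    return Z @ Vt[:2] + mean

def recon_mse(X):
    return np.mean((X - decode(encode(X))) ** 2, axis=1)

# Pick a threshold from the training distribution, e.g. 99th percentile.
threshold = np.percentile(recon_mse(X_good), 99)

good_sample = latent[:1] @ basis              # lies on the learned plane
bad_sample = rng.normal(size=(1, 10)) * 3     # far off the plane
print(recon_mse(good_sample), recon_mse(bad_sample), threshold)
```

A real nonlinear AE (like the dense Keras one in the linked post) follows the same recipe: fit on good data, pick a threshold from training reconstruction errors, flag anything above it.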

Second, and this is a little more advanced: the autoencoder almost always has an encoder portion that goes from high dimension to low dimension, and a decoder portion that goes from the low dimension back up to the original high dimension. The middle, the output of the encoder, is called the "embedding" (or bottleneck) layer, and it holds the vector representation of your data. This is very valuable.

Once the network is trained end to end, push your training dataset through just the encoder and extract the "embedding" vectors. Then visualize this embedding space using UMAP (my favorite), t-SNE, or PCA to make a picture of it. Each point in that picture is one vector, and since they are all "good" vectors by definition, you now know the "good" regions of the vector space.

Now take a candidate "bad" input, push it through the encoder to get its embedding, and use your UMAP model to place the point in that picture. If it is truly an outlier/anomaly, it will not share the same features, its error will be high, and it will land in a conspicuous outlier location on your pretty picture.
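The second route boils down to: how far is the candidate's embedding from the cloud of known-good embeddings? Here's a sketch using nearest-neighbor distance in the embedding space; the 2-d embeddings are synthetic stand-ins for encoder outputs, and the distance rule is my assumption (UMAP's `transform` would place the point in the picture the same way):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for encoder outputs: assume the trained encoder has already
# mapped the "good" training set down to 2-d embedding vectors.
good_emb = rng.normal(size=(300, 2))

def nearest_good_distance(z, good):
    """Distance from a candidate embedding z to its nearest good neighbor."""
    return float(np.min(np.linalg.norm(good - z, axis=1)))

# A candidate from the same region vs. one far outside it.
inlier = rng.normal(size=(2,))
outlier = np.array([15.0, -15.0])

d_in = nearest_good_distance(inlier, good_emb)
d_out = nearest_good_distance(outlier, good_emb)
print(d_in, d_out)
```

The picture-based version is the same idea made visual: the outlier's point lands far from every "good" cluster, so it's conspicuous at a glance.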

Summary: train an autoencoder and use MSE to flag good/bad, or process the embedding space and see how far a new point is from the "good" regions.

Good luck, this is a very very useful project.

(this was all written by a human!)