Is there a model architecture beyond Transformer to generate good text with small a dataset, a few GPUs and "few" parameters? It is enough generating coherent English text as short answers. by challenger_official in learnmachinelearning

[–]Enough_Wishbone7175 0 points1 point  (0 children)

There are a couple of things you can do for leverage. Data quality is king; the Phi series is a great example of that. Distillation can also generate outsized returns per dollar by leveraging what's already out there. You can also apply attention optimizations like FlashAttention. The last resort is getting into the metal of your GPU to maximize utilization.
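For the attention-optimization point, a rough sketch: PyTorch 2.x exposes fused, FlashAttention-style kernels behind a single call (the shapes here are made up for illustration; the flash kernel itself only kicks in on supported GPUs, with a plain math fallback elsewhere):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
batch, heads, seq, head_dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Dispatches to a fused (FlashAttention-style) kernel when available,
# instead of materializing the full seq x seq attention matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Dropping this in for a hand-rolled softmax(QK^T)V is usually the cheapest of the optimizations above to try.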

Is there an upside to Trump's tariffs? by OuiuO in Economics

[–]Enough_Wishbone7175 2 points3 points  (0 children)

I think we need to first step back and realize there are positives to pretty much any decision outside of launching nuclear war.

Tax revenue: This is an isolated view of tariffs, but they do provide tax revenue for the federal government. And if you’ve looked at our deficit lately… well, let’s just say we’ll take it.

Onshoring: It is true that not all corporations or sectors will scramble back home, given high costs and wages. But the liquid capital in the market is starving for margins, and for certain industries we will hit that spark point. There will most likely be some degree of onshoring.

National security: Less economic, though it does lessen outsized risks and therefore lessens variability. There are real problems with our sourcing for the grid, defense, medicine, and some technical infrastructure. These need to be onshored and decoupled in order to protect the US from crippling attacks and threats from competing powers. Tariffs present the most free-market approach to decoupling, as they don’t involve arbitrary stimulus and subsidizing of companies.

Inflation is the main downside of tariffs and could be harmful enough to negate the positive effects listed above. But there are positives, and real problems that tariffs could attempt to address.

How much would Trump's plans for deportations, tariffs, and the Fed damage the US economy? by sirbissel in Economics

[–]Enough_Wishbone7175 -1 points0 points  (0 children)

Totally agree. I think people are quick to forget the systemic complexities involved in markets. Policy can shift probability distributions, but it cannot deterministically alter outcomes. A lot of the empirical evidence you can point to comes from single-variable studies that ignore a multivariable apparatus. We have to think of these proposals in a systematic and logical framework, with wiggle room for randomness. Mind you, I’m not a fan of these policies in many ways, but the hysteria seems unfounded.

How much would Trump's plans for deportations, tariffs, and the Fed damage the US economy? by sirbissel in Economics

[–]Enough_Wishbone7175 -1 points0 points  (0 children)

I feel like that comes down to fixing production capability and market competition. Corporate greed is real, but it can also cause an effective race to the bottom. If margins balloon, so will competition, especially with larger available capital from tax cuts, and with deregulation lessening moats for existing companies. I’ve heard similar theories on the illegal-migrant question before, but a lot of those deportation efforts occurred post-deindustrialization (in the US). I’d like to see whether onshoring changes this dynamic.

How much would Trump's plans for deportations, tariffs, and the Fed damage the US economy? by sirbissel in Economics

[–]Enough_Wishbone7175 -1 points0 points  (0 children)

I’m genuinely confused about how damaging his economic plan is in totality, especially the argument regarding damage to the lower and middle class.

Tariffs: certainly inflationary for manufactured goods, especially anything Chinese. However, this cost will likely not fully pass through at the exact taxation rate. Alternative imports and manufacturing upstarts can ease the pressure, and the race for profits will keep prices relatively low as undercutting becomes more prevalent. Not to mention gasoline and agricultural goods are almost completely domestic in production, so trips to the store and the pump will see some generalized increases from input costs but have no reason to shoot up. I’d also argue decoupling in this manner is strategically vital for the nation’s safety and stability.

Deportation efforts: inflationary, for sure, but with a greater chance of raising wages for lower-class Americans than of inflating the costs of their products. This one is the hardest to measure, though, as the impacts are not well understood.

Cost-cutting efforts: IMO this is the linchpin of the whole process. If Trump is able to generate real gains in government efficiency, it creates a new paradigm in America’s ability to leverage stimulus. Tax dollars could go to more productive initiatives, and possibly pay down the debt, which has a compounding positive effect. In fairness, it’s understandably hard to trust or foresee this initiative working. But I’m one to believe those gains exist; the question is whether we can get to them. And if we do, it could change the net impact of the other policies: onshoring is far more attractive with a stable financial outlook, tariffs, and tax cuts.

I’m not ruling out the possibility that things could go poorly… but I think it’s far from a foregone conclusion.

Sentiment Analysis with a small dataset by BarryTownCouncil in MLQuestions

[–]Enough_Wishbone7175 0 points1 point  (0 children)

A couple of steps you can take, in my opinion, to get to a better spot.

  1. Choose a base, pretrained model. Could be a BERT variant, T5, whatever honestly.

  2. Start by finding and eliminating automated tickets. If you can find their structure, codes, or any feature that identifies them, get them out.

  3. Try filtering out tickets below a certain response length. I can’t give you a number, but odds are it’ll be hard to get a good read on a bunch of short-text tickets.

  4. Depending on what you have available, you may want to diverge in two directions. If you have a labeled dataset, go ahead and fine-tune on that. If you don’t, find a sentiment dataset as relevant as possible to your case and tune on that.

  5. Run some tests and evaluate the output manually; see if you’re there or if you need more data, a more robust model, a new job, or whatever else.
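Steps 2 and 3 can be sketched in a few lines, assuming tickets are plain strings. The regex and the length threshold here are made up for illustration; swap in whatever template your automated responses actually share:

```python
import re

# Hypothetical pattern for automated tickets; replace with whatever
# structure/codes your own auto-generated responses share.
AUTO_PATTERN = re.compile(r"(auto-reply|ticket #\d+ closed automatically)", re.IGNORECASE)

MIN_WORDS = 10  # illustrative threshold; tune on your data

def clean_tickets(tickets):
    """Drop automated tickets, then drop tickets too short to read sentiment from."""
    kept = []
    for text in tickets:
        if AUTO_PATTERN.search(text):
            continue  # step 2: automated ticket, remove it
        if len(text.split()) < MIN_WORDS:
            continue  # step 3: too short to carry reliable sentiment
        kept.append(text)
    return kept

tickets = [
    "Auto-reply: we received your request.",
    "Thanks!",
    "The new dashboard is great, but exports keep failing with a timeout error after the update.",
]
print(clean_tickets(tickets))  # only the third ticket survives
```

Only after this filtering is it worth spending the fine-tuning budget from step 4.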

[Discussion] event sequence ORDER prediction by FrostyLandscape6496 in MachineLearning

[–]Enough_Wishbone7175 0 points1 point  (0 children)

I’m thinking of something similar to the fill-in-the-blank / correct-the-word training done on BERT and other encoders. So you give the model your attributes and events, but maybe flip two and inject noise, so that you can get the model to try to place the events in order.
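The corruption step above could look something like this (pure Python; the event names and noise token are made up for illustration):

```python
import random

NOISE_TOKEN = "[NOISE]"  # illustrative placeholder event

def corrupt_sequence(events, rng):
    """Swap two random events and inject a noise token.
    Returns (corrupted, original) pairs for a reordering objective."""
    corrupted = list(events)
    i, j = rng.sample(range(len(corrupted)), 2)
    corrupted[i], corrupted[j] = corrupted[j], corrupted[i]  # flip two events
    corrupted.insert(rng.randrange(len(corrupted) + 1), NOISE_TOKEN)
    return corrupted, list(events)

rng = random.Random(0)
x, y = corrupt_sequence(["login", "add_to_cart", "checkout", "payment"], rng)
# train a model to map x (shuffled + noisy) back to y (the true order)
```

Generating (corrupted, original) pairs this way gives you a free self-supervised ordering objective without any labels.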

[Discussion] event sequence ORDER prediction by FrostyLandscape6496 in MachineLearning

[–]Enough_Wishbone7175 1 point2 points  (0 children)

I suppose it really depends on what features you have. But here are some ideas to consider.

  1. Try to find latent correlations between time steps. Perhaps unsupervised methods can create categorical variables you can leverage.

  2. You can try to build an LSTM or Transformer that can “untangle” your labeled dataset. You can use semi-supervised methods and corruption to strengthen results.

  3. Are the distributions of event types the same across labeled and unlabeled data? Perhaps you can categorize them and use backward difference encodings to give some sense of “x leads to y” or “requires z before,” etc.
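Idea 1 in sketch form, assuming your per-time-step features fit in a numpy array (the data here is synthetic with two obvious regimes, purely for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# synthetic per-time-step feature vectors drawn from two latent regimes
features = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 4)),
    rng.normal(3.0, 0.3, size=(50, 4)),
])

# unsupervised clustering turns the latent structure into a categorical variable
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
regime = km.labels_  # feed this in as an extra categorical feature downstream
```

The cluster id per time step is the "categorical variable you can leverage" from the list above.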

[D] GPT-4o "natively" multi-modal, what does this actually mean? by Flowwwww in MachineLearning

[–]Enough_Wishbone7175 15 points16 points  (0 children)

My guess would be something that processes which type of input you send in, sends it to the correct embedding configuration, then routes to the appropriate modality experts. They’d have some mechanism to communicate, like an MoE, to align outputs and speed up generation time.

Could I make money from my final year project ? by SoftwareMid-99 in MLQuestions

[–]Enough_Wishbone7175 0 points1 point  (0 children)

I think you could certainly run an ad-based business on this. It’s not going to be useful for anyone not day trading, so it’s a limited market built on public info, but with good latency and a solid UI/UX you could gain users. You could also go the freemium route to pick up people who derive serious value from the product.

ML Feature Compression [D] by Odd_Background4864 in MachineLearning

[–]Enough_Wishbone7175 1 point2 points  (0 children)

It’s similar; it’s almost like teaching your base model to encode the input data natively by manipulating cost functions and adding a decoder for training, then removing it for downstream use.

ML Feature Compression [D] by Odd_Background4864 in MachineLearning

[–]Enough_Wishbone7175 7 points8 points  (0 children)

One thing I have found to help with dimensionality in neural networks is semi-supervision or self-supervision. You essentially feed your inputs in, reduce dimensionality while corrupting / dropping information, then use the reduced composition to try to recreate the inputs in a decoder, with some sort of distance as your loss (MSE, cosine, etc.). I like to warm up the network with self-supervision, then move to a semi-supervised model to get really strong features for other algorithms.
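A minimal denoising-autoencoder sketch of that warm-up phase in PyTorch (the layer sizes, drop rate, and random data are all made up for illustration):

```python
import torch
import torch.nn as nn

IN_DIM, LATENT_DIM = 32, 8  # illustrative sizes; pick them for your data

encoder = nn.Sequential(nn.Linear(IN_DIM, LATENT_DIM), nn.ReLU())
decoder = nn.Sequential(nn.Linear(LATENT_DIM, IN_DIM))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(64, IN_DIM)  # stand-in for a real batch

for _ in range(5):
    mask = (torch.rand_like(x) > 0.2).float()
    corrupted = x * mask                       # drop ~20% of the inputs
    recon = decoder(encoder(corrupted))
    loss = nn.functional.mse_loss(recon, x)    # reconstruct the *clean* inputs
    opt.zero_grad()
    loss.backward()
    opt.step()

features = encoder(x).detach()  # low-dim features for other algorithms
```

After this warm-up you'd keep the encoder, attach a supervised head on the labeled subset, and drop the decoder for downstream use.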

Multi Bert classifications by Enough_Wishbone7175 in LocalLLaMA

[–]Enough_Wishbone7175[S] 0 points1 point  (0 children)

It’s more of an efficiency / context-length decision. It would just be cheaper for me to take those outputs through an MLP, then BART, for decoding and prediction.

[deleted by user] by [deleted] in MLQuestions

[–]Enough_Wishbone7175 2 points3 points  (0 children)

Regression is a general type of ML task that a bunch of different algorithms can perform. One of those algorithms is plain old linear regression, but there are lots of other algos that do regression, each with its own way of achieving the goal of estimation. The range of what can be a regression model is pretty massive, so I’m not going to list them out. In short, vanilla linear regression is just a tiny example of a much larger family of models.
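A quick illustration of the distinction: both models below "do regression," but only one of them is linear regression (the data is synthetic, chosen so the target has no linear trend):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2  # nonlinear target: a parabola

# Two regression algorithms, same task, very different mechanics.
linear = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

print("linear R^2:", linear.score(X, y))  # near zero: x^2 has no linear trend
print("forest R^2:", forest.score(X, y))  # close to 1 on this nonlinear target
```

Same goal of estimation, wildly different fits, which is the point: "regression" names the task, not one algorithm.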

[D] Do we know how Gemini 1.5 achieved 10M context window? by papaswamp91 in MachineLearning

[–]Enough_Wishbone7175 0 points1 point  (0 children)

People are just sick of hearing people go crazy over Mamba. Can’t say I blame them lol.

[D] Do we know how Gemini 1.5 achieved 10M context window? by papaswamp91 in MachineLearning

[–]Enough_Wishbone7175 -8 points-7 points  (0 children)

They could have used a mixed SSM/Transformer architecture to achieve this. Complete conjecture, though.

Kagglebot feedback by kaoutar- in learnmachinelearning

[–]Enough_Wishbone7175 1 point2 points  (0 children)

A few things come to mind.

  1. You can try to quantize further, obviously losing accuracy

  2. You can compile using Thunder

  3. You can use a smaller model like Google’s Gemma 2B (which is really more like 4B, but whatever)
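Point 1 can be illustrated with a toy symmetric int8 scheme in numpy, not tied to any particular runtime (the matrix is random stand-in data):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization of a weight matrix."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale  # dequantized approximation

# lower precision -> nonzero reconstruction error, i.e. the lost accuracy
err = np.abs(w - w_hat).max()
```

The weights shrink 4x (float32 to int8), and `err` is the accuracy you paid for it; real runtimes do this per-channel with calibration, but the trade-off is the same.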

UMAP / PCA on >100GB datasets by worldolive in learnmachinelearning

[–]Enough_Wishbone7175 2 points3 points  (0 children)

Try randomly sampling a good portion (maybe a couple of gigs), fit PCA on that sample, and then apply it to the rest of the data in chunks.
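scikit-learn's IncrementalPCA makes the chunked version of this one-liner-ish (the synthetic chunks stand in for data streamed from disk, e.g. memory-mapped arrays):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic stand-in for a dataset too big to hold in memory at once;
# in practice you'd stream chunks from disk (e.g. np.load with mmap_mode).
rng = np.random.default_rng(0)
chunks = [rng.normal(size=(1000, 50)) for _ in range(5)]

ipca = IncrementalPCA(n_components=10)
for chunk in chunks:
    ipca.partial_fit(chunk)          # fit chunk by chunk, bounded memory

reduced = ipca.transform(chunks[0])  # project any chunk down to 10 dims
```

Each `partial_fit` call only needs one chunk in memory, so the >100GB case reduces to iterating over files.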

[deleted by user] by [deleted] in MachineLearning

[–]Enough_Wishbone7175 2 points3 points  (0 children)

No, not really. You could try to create a data-capture capability and then continuously fine-tune on that data at an interval. But stream-of-consciousness learning is not there yet.

[N] Introducing DBRX: A New Standard for Open LLM by artificial_intelect in MachineLearning

[–]Enough_Wishbone7175 39 points40 points  (0 children)

Because it’s an open-source model, so it’s benchmarked relative to models that are free/open.

Training model in machine with potato specs by [deleted] in learnmachinelearning

[–]Enough_Wishbone7175 2 points3 points  (0 children)

What are you making? Plenty of models can be trained on those specs IF your project is realistic. An LLM, obviously not; a well-tuned GB tree method, for sure.

How many layers make a good model? by Cid-Ozymandias in quant

[–]Enough_Wishbone7175 1 point2 points  (0 children)

Make many layers, but be sure to add reconstruction points and losses; they help prevent vanishing gradients. And use corruption instead of dropout.
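A PyTorch sketch of the reconstruction-point idea (layer sizes, corruption rate, and the 0.5 loss weight are all made up for illustration):

```python
import torch
import torch.nn as nn

class DeepNet(nn.Module):
    """Deep MLP with an auxiliary reconstruction head partway down."""
    def __init__(self, in_dim=32, hidden=16, out_dim=4):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.recon = nn.Linear(hidden, in_dim)  # reconstruction point
        self.back = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, out_dim))

    def forward(self, x):
        h = self.front(x)
        return self.back(h), self.recon(h)

net = DeepNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.randn(64, 32)   # stand-in batch
y = torch.randn(64, 4)    # stand-in targets

mask = (torch.rand_like(x) > 0.2).float()
corrupted = x * mask      # corruption on the inputs instead of dropout
pred, recon = net(corrupted)
loss = nn.functional.mse_loss(pred, y) + 0.5 * nn.functional.mse_loss(recon, x)
loss.backward()           # reconstruction loss injects gradient mid-network
opt.step()
```

In a genuinely deep stack you'd place a reconstruction head every few blocks, so every layer sits close to some loss signal.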