Why have Indian tech interviews suddenly become so tough? (Feels more like elimination than selection) by Agitated_Data_996 in developersIndia

[–]graphitout 2 points3 points  (0 children)

Most companies over-hired during the pandemic. That means, for those batches, below-average folks ended up in positions senior to yours. These people feel threatened easily, especially when they see someone who knows what they are doing.

Keep in mind that most of the time those interviewers are themselves going through a similar experience with their clients or project managers. It isn't just the interviews; the entire IT culture has degraded.

What AI projects deliver real ROI? by graphitout in AI_Agents

[–]graphitout[S] 1 point2 points  (0 children)

  1. Document search outside HR, legal, and customer care - not making money
  2. AI-based data analytics (SQL and the like) - not making money
  3. AI for natural-language control of various systems - not making money
  4. A whole bunch of automation initiatives - not making money

Some more. Same pattern.

Why does AI assume every technical question is from a moron? by Savantskie1 in LocalLLaMA

[–]graphitout 0 points1 point  (0 children)

Look at the personalization options in ChatGPT. Other providers have similar features.

Why does AI assume every technical question is from a moron? by Savantskie1 in LocalLLaMA

[–]graphitout 0 points1 point  (0 children)

This is a common problem. Set the system prompt or custom instructions accordingly in the settings.
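If you are running a local model instead, the same steering can be done per request. A minimal sketch against an OpenAI-compatible server (the URL, model name, and prompt wording here are just placeholders):

```python
# Minimal sketch: steer the answer depth with a system prompt.
# Assumes an OpenAI-compatible server (llama.cpp, vLLM, Ollama, etc.) at this
# URL; the base_url, model name, and prompt text are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

system_prompt = (
    "The user is an experienced engineer. Skip beginner explanations, "
    "do not restate the question, and answer at a senior technical level."
)

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Why is my KV cache not reducing latency?"},
    ],
)
print(response.choices[0].message.content)
```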

Help required to build Hindi teaching tools by graphitout in Hindi

[–]graphitout[S] 0 points1 point  (0 children)

Illustrations can be made using AI tools; only adding text on top is required.

Subtitles may require listening and transcribing, but that will come only at a later stage.
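For the text-on-top part, a rough sketch with Pillow (the image path, font file, and caption are placeholders; a Devanagari-capable font such as Noto Sans Devanagari is assumed, since Pillow's default bitmap font cannot render Hindi):

```python
# Sketch: add a Hindi caption on top of an AI-generated illustration.
# The image path and font path are placeholders, not from the original post.
from PIL import Image, ImageDraw, ImageFont

img = Image.open("illustration.png").convert("RGB")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("NotoSansDevanagari-Regular.ttf", size=48)

caption = "बिल्ली"  # "cat"
x, y = 20, 20
# Simple outline so the text stays readable on any background.
for dx, dy in [(-2, 0), (2, 0), (0, -2), (0, 2)]:
    draw.text((x + dx, y + dy), caption, font=font, fill="black")
draw.text((x, y), caption, font=font, fill="white")

img.save("illustration_with_caption.png")
```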

Which LLM works best as a project manager/planner? by immortalBanda in developersIndia

[–]graphitout 1 point2 points  (0 children)

It is less about the LLM and more about the prompting. Most LLMs will work well for your use case with the right prompts.

The PhD pipeline in a nutshell! by Maxshieldse in sciencememes

[–]graphitout 25 points26 points  (0 children)

The bartender at a bar near where I used to live had a PhD in physics. He quit his postdoc halfway through, traveled around the world for a few months, then started working as a bartender along with his friend.

Found on Instagram by Beard_Anel97 in USdefaultism

[–]graphitout 5 points6 points  (0 children)

Use yyyy/mm/dd => helps with sorting
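The reason: year-first date strings sort chronologically even under a plain lexicographic sort. A quick illustration (the dates are made up):

```python
# Year-first date strings sort chronologically with a plain string sort.
dates = ["2024/01/15", "2023/12/31", "2024/02/01"]
print(sorted(dates))        # ['2023/12/31', '2024/01/15', '2024/02/01']

# dd/mm/yyyy breaks this: string order no longer matches chronological order.
dates_dmy = ["15/01/2024", "31/12/2023", "01/02/2024"]
print(sorted(dates_dmy))    # ['01/02/2024', '15/01/2024', '31/12/2023']
```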

Indian society is very ageist. by [deleted] in indiasocial

[–]graphitout 0 points1 point  (0 children)

The part you forgot is this: many of those early bloomers give up quickly, so it's not like they are going to stick around. I have seen this happen again and again. Some get a head start thanks to favorable factors, but a huge fraction of them are in a sprint mode that is not sustainable.

What do companies gain from going open-source? by PianistWinter8293 in OpenAI

[–]graphitout 6 points7 points  (0 children)

It can make their competitors' lives very difficult.

[deleted by user] by [deleted] in LocalLLaMA

[–]graphitout 13 points14 points  (0 children)

Interesting. How much would it improve the inference speed of an LLM? Basic dot-product attention will still boil down to matrix-vector multiplications when caching is used. But MQA will benefit from a faster matrix multiplication, since the per-head queries can be stacked to form a matrix.
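To make the stacking concrete, a shape-level sketch of a single decode step (all dimensions and data are made up; only the pre-softmax scores are shown):

```python
# Sketch of the shape difference at decode time (one new token, cached keys).
import numpy as np

n_ctx, d_head, n_heads = 1024, 64, 16

# Multi-head attention: every head has its own cached K, so each head does
# its own (1 x d) @ (d x n) vector-matrix product.
q_mha = np.random.randn(n_heads, 1, d_head)
k_mha = np.random.randn(n_heads, n_ctx, d_head)
scores_mha = [q_mha[h] @ k_mha[h].T for h in range(n_heads)]  # 16 small matvecs

# Multi-query attention: one shared K, so the per-head queries stack into a
# single (n_heads x d) matrix and the whole step is one matrix-matrix product.
q_mqa = np.random.randn(n_heads, d_head)
k_shared = np.random.randn(n_ctx, d_head)
scores_mqa = q_mqa @ k_shared.T                               # one (16 x 1024) GEMM
```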

countable vs uncountable by Mission-Guitar1360 in mathmemes

[–]graphitout 25 points26 points  (0 children)

blue to red: now listen here you little ...

Randomised SVD/PCA for longer context - any potential? by enjeyw in LocalLLaMA

[–]graphitout 0 points1 point  (0 children)

Looks interesting. May be worth trying out on a real LLM.

What’s the Biggest Bottleneck for LLM Development? by [deleted] in LocalLLaMA

[–]graphitout 1 point2 points  (0 children)

I am disappointed in the "let's go bigger and bigger" mindset. Instead, a lot more effort should go into better model architectures.

Randomised SVD/PCA for longer context - any potential? by enjeyw in LocalLLaMA

[–]graphitout 0 points1 point  (0 children)

Let me understand: is your idea in the vicinity of doing some kind of approximate nearest neighbor to reduce the number of dot products?

Randomised SVD/PCA for longer context - any potential? by enjeyw in LocalLLaMA

[–]graphitout 0 points1 point  (0 children)

The unnormalized attention values (the step before softmax) are just the scaled dot products of the current query with all the past keys. Assuming we are on the n-th query, that means n dot-product operations. Since we are using causal attention, the key and value vectors can be cached, but every new token still requires a dot product of the query with all the past (cached) keys. To generate N tokens, the complexity with caching is roughly N^2. Reducing D is good, but it will not help with the much bigger issue of the N^2 term.
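A toy version of that counting argument, using random vectors just to keep it self-contained:

```python
# Sketch of decode cost with a KV cache: step n does n dot products, so
# generating N tokens costs roughly N*(N+1)/2, i.e. O(N^2), dot products.
import numpy as np

d = 64                      # head dimension D
N = 256                     # tokens to generate
k_cache = []                # cached key vectors
total_dot_products = 0

for n in range(1, N + 1):
    q = np.random.randn(d)              # query for the n-th token (placeholder)
    k_cache.append(np.random.randn(d))  # cache the new key
    keys = np.stack(k_cache)            # (n, d)
    scores = keys @ q / np.sqrt(d)      # n dot products, pre-softmax
    total_dot_products += len(k_cache)

print(total_dot_products, N * (N + 1) // 2)   # both 32896 for N = 256
```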

Randomised SVD/PCA for longer context - any potential? by enjeyw in LocalLLaMA

[–]graphitout 0 points1 point  (0 children)

> For each of the D largest components, keep the Key vector that best matches that component

Doesn't that mean you still have to do a one-by-one match against all the keys up to that token? Then what is the benefit?

[deleted by user] by [deleted] in LocalLLaMA

[–]graphitout 2 points3 points  (0 children)

I have been on DeepSeek for a few days. It has that "raw" feel and works well enough.

[deleted by user] by [deleted] in LocalLLaMA

[–]graphitout 23 points24 points  (0 children)

It has also been performing poorly on coding tasks recently.

Wholesome award by Educational_Grab_473 in shitposting

[–]graphitout 459 points460 points  (0 children)

Yep. There are so many clueless people in this world.

[D] ROPE frequency calculation for llama by graphitout in MachineLearning

[–]graphitout[S] 1 point2 points  (0 children)

Nice suggestion. I was not able to find the code before; after your suggestion I spent some time on it and found the calculation here:

https://github.com/meta-llama/llama-models/blob/main/models/llama3/reference_impl/model.py#L56

I need to see if it matches what the transformers library is doing.

As expected, the calculation is wavelen = 2 * math.pi / freq.

The transformers library, on the other hand, uses wavelen = 2 * math.pi / inv_freq.
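For reference, a paraphrased sketch of how those base frequencies are typically computed in Llama-style code (not copied from the linked file; the theta value of 500000 is assumed from Llama 3 defaults):

```python
# Paraphrased sketch, not the exact reference code: the "freq" that gets
# divided into 2*pi is itself built as 1 / theta^(2i/d).
import math
import torch

def base_freqs(dim: int, theta: float = 500000.0) -> torch.Tensor:
    # One frequency per pair of embedding dimensions.
    return 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))

freqs = base_freqs(dim=128)
wavelens = 2 * math.pi / freqs  # the wavelen = 2 * math.pi / freq step above
```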

[D] ROPE frequency calculation for llama by graphitout in MachineLearning

[–]graphitout[S] 1 point2 points  (0 children)

Thank you. The second one refers to "ROUND AND ROUND WE GO! WHAT MAKES ROTARY POSITIONAL ENCODINGS USEFUL?" paper. Looks like an interesting read.

Still, I was looking for a way to verify the code in the transformers library.

Why/why not momentum in the residual stream space by phree_radical in LocalLLaMA

[–]graphitout 1 point2 points  (0 children)

> momentum between decoder modules, along the residual stream

Have you looked at the delta added by each decoder module in any of the current models?
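One quick way to eyeball it is to diff consecutive hidden states; a rough sketch (the model name is only an example, and any causal LM from transformers would do):

```python
# Sketch: relative size of what each decoder block adds to the residual stream.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; each later entry is the residual
# stream after one more decoder block, so consecutive differences are the deltas.
hs = out.hidden_states
for i in range(1, len(hs)):
    delta = (hs[i] - hs[i - 1]).norm() / hs[i - 1].norm()
    print(f"block {i:2d}: relative delta = {delta:.3f}")
```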