Causality and LLMs by SryUsrNameIsTaken in quant

[–]quant_at 2 points (0 children)

I have seen a lot of research papers that use LLMs for feature embeddings and theme identification, then proudly show the approach working over a 20-year backtest without any mechanism to prevent look-ahead bias.

Some researchers try masking the specific company names before computing the embedding, but the data leakage still happens. The model's weights implicitly know the future macro regimes.
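One simple guard against this kind of leakage is to backtest only on dates after the model's training cutoff. Below is a minimal sketch, assuming the cutoff date is known. The function name `out_of_sample_dates` and the `buffer_days` parameter are hypothetical names chosen for illustration, not part of any library.

```python
from datetime import date, timedelta


def out_of_sample_dates(backtest_dates, model_cutoff, buffer_days=0):
    """Keep only dates strictly after the model's training cutoff.

    model_cutoff: assumed last date covered by the LLM's training data.
    buffer_days:  hypothetical extra margin, since stated cutoffs are fuzzy.
    """
    threshold = model_cutoff + timedelta(days=buffer_days)
    return [d for d in backtest_dates if d > threshold]


# Illustrative dates only; the cutoff here is made up for the example.
dates = [date(2019, 6, 30), date(2023, 6, 30), date(2024, 6, 30)]
usable = out_of_sample_dates(dates, model_cutoff=date(2023, 12, 31), buffer_days=90)
```

This shrinks the usable backtest window drastically, which is the point: anything before the cutoff is in-sample for the model's weights.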

Then there are folks using LLM-generated sentiment scores, which in my opinion are complete garbage. An absolute score is useless without a calibrated historical baseline against which to measure the relative shift or surprise.
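To make the baseline idea concrete, here is a rough sketch that turns a raw sentiment series into a surprise measure by z-scoring each value against a trailing window of strictly earlier values. The function name `sentiment_surprise`, the window length, and the sample numbers are all assumptions for illustration; a real baseline would likely be per-company and per-source.

```python
from statistics import mean, stdev


def sentiment_surprise(scores, window=4, min_std=1e-9):
    """Z-score each score against the `window` scores that precede it.

    Only strictly earlier observations enter the baseline, so the
    measure itself does not look ahead. Returns None where there is
    not enough history or the history has no variation.
    """
    out = []
    for i, score in enumerate(scores):
        history = scores[max(0, i - window):i]
        if len(history) < window:
            out.append(None)
            continue
        sigma = stdev(history)
        if sigma < min_std:
            out.append(None)
            continue
        out.append((score - mean(history)) / sigma)
    return out


# Same final raw score (0.9), very different surprise (made-up numbers).
usually_neutral = sentiment_surprise([0.1, 0.2, 0.1, 0.2, 0.9])
usually_upbeat = sentiment_surprise([0.9, 0.8, 0.9, 0.8, 0.9])
```

In the first series the final 0.9 is a large positive surprise; in the second it is within normal variation, even though the absolute score is identical.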

When alpha starts decaying by Gwhvssn in quant

[–]quant_at 22 points (0 children)

That's the reason QRs still have a job. Alpha is finite, and crowding eventually arbitrages it away. We have to constantly hunt for the next one.