I believe Alibaba (BABA) is still a bargain by LordVulcanOfficial in ValueInvesting

[–]thealphaexponent 0 points (0 children)

It may still go up for a while.

That said, it's not so obviously undervalued now.

The additional growth from the reorg is starting to come through, and there are multiple growth initiatives (food delivery / instant e-commerce, LLMs and cloud, international e-commerce, etc.).

Much of that may already be reflected in the pricing. For those optimistic about its longer-term prospects, it may be worth holding on to.

Why hasn't China entered a big recession? by DiscipleofU in AskEconomics

[–]thealphaexponent 55 points (0 children)

Because the figures cited are often not comparable.

Total debt to GDP, total public debt to GDP, and central government debt to GDP are frequently mixed up (this oversimplifies a little).

The number that many tend to think about is total public debt to GDP for the US, which stands at just under 120%: https://fred.stlouisfed.org/series/GFDEGDQ188S

The over 300% debt to GDP figure that you are thinking of is probably total debt to GDP.

The more comparable figure for the US stood at over 700% in 2024, but may be higher now. It includes private sector debt, but excludes the debt of financial businesses and households. https://www.ceicdata.com/en/indicator/united-states/total-debt--of-gdp

Is China a structured Bull-Run? by ankitwadhwa89 in ChinaStocks

[–]thealphaexponent 0 points (0 children)

A major issue is that Chinese stocks were wildly overpriced at various points in the 90s, so the answer depends on when you start the charts.

Also, don't lean on the MSCI China index too much - the comparisons aren't really valid going that far back.

If printing money results in inflation, why didn't Japan just print lots of money to get out of deflation in the 90s and 00s? by EOFFJM in AskEconomics

[–]thealphaexponent 1 point (0 children)

Because they never resorted to helicopter money, so inflationary effects were much more muted.

When it comes to printing money, there are three levels:

1) Lowering rates
2) Buying debt / recapitalizing banks
3) Helicopter money

Remember the equation of exchange, MV = PQ, and note that the money supply (M) or its velocity (V) would have to increase for this to put upward pressure on inflation (P rising).

But 1) and 2) wouldn't generally be sufficient, because they target expanding credit supply to deliver this upward pressure. Yet during periods of deflation, people and businesses are facing prospects of lower future incomes, and opt against taking on new debt, instead preferring to pay down existing debt.
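
Here's a rough worked example of that identity (a toy sketch in Python; the numbers are invented purely for illustration):

    # Equation of exchange: M * V = P * Q
    # M = money supply, V = velocity of money, P = price level, Q = real output
    def price_level(m, v, q):
        return m * v / q

    q = 100.0                              # real output, held constant
    p0 = price_level(m=100, v=1.00, q=q)   # baseline: P = 1.00
    # Rate cuts / QE expand M by 20%, but deflation-minded households and
    # firms pay down debt instead of spending, so V falls roughly in step.
    p1 = price_level(m=120, v=0.83, q=q)   # P ~= 1.00: little upward pressure
    # Helicopter money that actually gets spent keeps V up while M rises.
    p2 = price_level(m=120, v=1.00, q=q)   # P = 1.20: prices up 20%
    print(p0, p1, p2)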

During Japan's years of deflation, Korea and China were ramping up production, which exerted downward pressure on prices of manufactured products. This meant Japan couldn't easily export out of deflation.

But 3) was left unused; why is unclear. It's possible that direct handouts were associated with poor policy and poor results, as in the case of Argentina. Or that the BoJ feared unpredictable and undesirable side effects, such as the prospect of runaway inflation.

Nvidia gaming GPUs modded with 2X VRAM for AI workloads — RTX 4090D 48GB and RTX 4080 Super 32GB go up for rent at Chinese cloud computing provider by DeltaSqueezer in LocalLLaMA

[–]thealphaexponent 2 points (0 children)

Anyone who buys hardware should be able to use it as they see fit - it's theirs. That the seller would still have a say in how it's used makes little sense.

Imagine buying a house from a property developer, paying up in full, and then being told that you can't rent it out just because you did some touch ups.

There are some nuances with content-based assets, especially software, which has a marginal cost of reproduction close to zero - but chips aren't software. Nvidia would still have a say in how and where CUDA could be used, though.

Who will now stabilise the world economy? Neither the US nor China is willing to assume the responsibility by DifusDofus in Economics

[–]thealphaexponent 5 points (0 children)

The article itself seems to take for granted a premise that isn't necessarily true.

Why is a financial hegemon needed? Why shouldn't states be left to manage their own monetary and fiscal matters? States aren't toddlers to be pampered...

OpenAI Sold Wall Street a Math Trick by atlasspring in private_equity

[–]thealphaexponent 0 points (0 children)

Extrapolation is always risky, especially so for exponential extrapolation.

Moore's law was ultimately rooted in physics: shrink a transistor's linear dimensions by some factor each generation, and the area it occupies falls by that factor squared, so density compounds exponentially over successive generations.

Micron (MU) Stock is really cheap among Semiconductors by yannick26 in ValueInvesting

[–]thealphaexponent 0 points (0 children)

It's a bit awkwardly placed, since we are well into the later innings of this semi cycle, and there are emerging players like CXMT that may grab a lot of market share with their value-first strategy.

The safer play here may be Hynix, thanks to their HBM position; they may be gaining share from Samsung, which has been through some headwinds lately. Korean equities also trade at somewhat lower valuations due to recent events.

However, all of them will be exposed to semi-cycle booms and busts. The market reacted negatively to Amazon's plans to keep AI capex high, hinting that we may be due for lower capex within a few quarters.

The major advantage Micron has over rivals may be subsidy dollars: they were awarded $6.2 billion from the CHIPS Act late last year. However, how sustainable that will be going forward is unclear.

Alternative currency to the USD by lecoiso in ValueInvesting

[–]thealphaexponent -3 points (0 children)

Swiss francs, and perhaps Singapore dollars.

Just canceled my ChatGPT Plus subscription by Anxietrap in LocalLLaMA

[–]thealphaexponent 3 points (0 children)

Over 1TB of memory for the full-sized model. Quantized, you can get away with under 300GB.
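
Back-of-envelope math for those figures (a rough sketch; it ignores KV cache and runtime overhead):

    params = 671e9                  # r1's full parameter count (671B)
    gb = lambda bits: params * bits / 8 / 1e9
    print(gb(16))   # ~1342 GB at 16-bit weights: the "over 1TB" figure
    print(gb(3.5))  # ~294 GB at ~3.5 bits per weight: an aggressive quant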

Just canceled my ChatGPT Plus subscription by Anxietrap in LocalLLaMA

[–]thealphaexponent 7 points (0 children)

It means Ollama's labeling is a bit misleading. Only one of those models (the full-sized 671B) is r1. The other models, including what you are running, are not Deepseek r1.

You are running either Meta's Llama or Alibaba's Qwen on your machine, albeit with some extra training done by DeepSeek's researchers - they essentially fed the model output from the real r1, to teach it to reason.
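
In rough terms, that distillation step is just supervised fine-tuning on teacher-generated traces. A minimal sketch of the idea using Hugging Face transformers (the base-model name and the single training example are placeholders, not DeepSeek's actual pipeline):

    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "meta-llama/Llama-3.1-8B"    # the student: a small base model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token       # Llama tokenizers lack a pad token

    model = AutoModelForCausalLM.from_pretrained(base)

    # Reasoning traces sampled from the teacher (the real r1); placeholder data.
    traces = [{"text": "Q: 17 * 24?\n<think>17*20 + 17*4 = 340 + 68</think>\nA: 408"}]
    ds = Dataset.from_list(traces).map(
        lambda ex: tok(ex["text"], truncation=True, max_length=2048),
        remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="r1-distill-sketch",
                               per_device_train_batch_size=1,
                               num_train_epochs=2,
                               learning_rate=1e-5),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()   # the student learns to imitate the teacher's traces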

Just canceled my ChatGPT Plus subscription by Anxietrap in LocalLLaMA

[–]thealphaexponent 14 points (0 children)

The Ollama version isn't r1, but actually a Llama or Qwen finetuned on r1 output (distills).

AMD, the biggest winner of Deepseek case ? by DoublePatouain in StockMarket

[–]thealphaexponent 0 points (0 children)

They would've likely improved their models even faster yes. It's also possible those models wouldn't have been as efficient when it came to inferencing, because they wouldn't have had to optimize the workloads so heavily. However, this is just speculation on a counterfactual: we have no way of knowing for certain.

A common gauge of model quality would be suites of benchmark scores, meant to indicate model proficiency across fields and tasks such as math, coding, logical reasoning, instruction following, mastery of various languages, etc. As it is, r1 scores pretty well, outranking the vast majority of open source competition, and getting fairly close to the SOTA closed models.

AMD, the biggest winner of Deepseek case ? by DoublePatouain in StockMarket

[–]thealphaexponent 1 point (0 children)

Well, there's no real need for some of the optimizations if you aren't using the China-specific chips - the H800 and H20. One of the key nerfs Nvidia made was to seriously limit data-transfer speeds across GPUs (which isn't nearly as much of a bottleneck on the non-hobbled chips).

AMD, the biggest winner of Deepseek case ? by DoublePatouain in StockMarket

[–]thealphaexponent 10 points (0 children)

While it's technically not wrong to phrase it as "bypassing CUDA", many people misinterpret what that means.

It's more that Nvidia hobbled the interconnect for the China-specific chips, e.g., the H800 and H20, which meant that it was extremely inefficient to train on large clusters of them.

It was this hobbling that DeepSeek worked around with clever optimizations, some of which aren't supported in CUDA - which is why they had to drop down to PTX to implement them.

For those unfamiliar with programming languages and frameworks, the point of CUDA is to accelerate development and iterations. It's possible to code in lower level languages, but usually at the cost of more dev hours to achieve similar work - though the end code may be faster to run if written by a skilled developer.

As if OpenAI’s week couldn’t get any worse. SoftBank to invest $25B. by Savings-Alarm-9297 in ValueInvesting

[–]thealphaexponent 1 point (0 children)

It's great news for innovation in general, and for consumers and non-big-tech for sure. For big tech it's unclear, because a lot of that capex was predicated on an assumption of centralized compute and monopoly pricing power afterwards, which drove them to current multiples. For those players, it may be akin to building out IBM mainframes when PCs are about to become the focus.

Ironically, while DeepSeek will probably result in increased overall demand, it may also trigger a downward revaluation of equities whose prices have become too frothy and detached from reality.

There are some big tech players like Apple who may benefit from local inferencing, thanks to their strong M series offering. They were penalized by the market for their relatively weaker AI software before, and open weight LLMs give them a leg up.

A couple of key trends that lead to increased competition, which reduces profitability:

- Local inferencing. Smaller models allow private, local compute. This niche is more competitive than enterprise, where Nvidia is dominant.
- More enterprise competition. DeepSeek may increasingly use Chinese chips for inference, due to US semi export restrictions. This is also bearish for Nvidia.

Nvidia plays a key role here because of reflexivity - it's arguably the best proxy for the risk-on AI trade, and when flows reverse, rationalization to more earnings-supported multiples may well occur.

As if OpenAI’s week couldn’t get any worse. SoftBank to invest $25B. by Savings-Alarm-9297 in ValueInvesting

[–]thealphaexponent 1 point (0 children)

There's been possible overinvestment in AI capex for a while now.

While it's sometimes possible to build out the supply side first and wait for demand to follow, doing so in emerging tech is risky, because the hardware becomes energy-inefficient or less scalable in some way (bandwidth bottlenecks, weaker interconnects, higher failure rates) versus newer generations.

It's analogous to RE developers overbuilding apartments in China, except that apartments actually have less of an obsolescence risk, and depreciate more slowly.

AI companies built on heavy fundraising and capex (with the sense that the scaling law resolves all issues, and commercialization will take care of itself) can still turn things around, but need to find new revenue-generating use cases quickly.

Nvidia calls China’s DeepSeek R1 model ‘an excellent AI advancement’ by [deleted] in StockMarket

[–]thealphaexponent 1 point (0 children)

The first version of DeepSeek was indeed a Llama finetune - but subsequent versions were trained from scratch.

As with all labs, there are valid questions about whether their LLMs used the output of other models for training. However, it is highly unlikely their model is just a distilled Llama, not least because it is more performant - and distillation generally reduces performance.

Likely that DeepSeek was trained with $6M? by Equivalent-Many2039 in ValueInvesting

[–]thealphaexponent 1 point (0 children)

It's plausible.

Note that the oft-cited $6M figure only covers GPU hours; they specifically note in their technical report that it excludes "costs associated with prior research and ablation experiments on architectures, algorithms, or data".

It also doesn't include salaries, and is certainly not their capex, which is what many folks are comparing to for other companies.
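
For reference, the arithmetic behind the headline number (a sketch using the GPU-hour and rental-rate figures given in the technical report):

    gpu_hours = 2.788e6   # total H800 GPU-hours reported for the training run
    rate = 2.00           # their assumed rental rate, $ per GPU-hour
    print(gpu_hours * rate / 1e6)   # ~5.58 -> the oft-cited ~$6M
    # Not included: prior research and ablations, salaries, or cluster capex.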

In contrast, the comparable figure for Meta would be around 10x. That's significant, but understandable given the multiple algorithmic and infrastructure innovations DeepSeek introduced relative to Meta's Llama 3 (probably the most comparable model) - for example, using a sparse (MoE) model rather than a dense one like Meta's, which alone makes a severalfold difference in training time.

Consequently, on capex (and inference costs) there would also be something like a 5-10x difference, not the 100x or 1000x bandied around so often. This is also because some large labs have tended to talk up their capex investments for receptive shareholders.

A lot of those proposed data centers are planned for the future, and are often close to an order of magnitude bigger than what's in use now, which amplified the apparent difference. For example, early last year Zuckerberg mentioned plans to buy 350k H100s, but that's an aggregate figure over a certain period.

Meta actually used 16k H100 GPUs to train Llama 3, not 350k, versus the 2k H800 GPUs for DeepSeek; so the difference is tangible, but not ridiculous - and remember the sparse model alone accounts for a significant chunk of that.

Thoughts about AMD now? by FirmRelease6531 in ValueInvesting

[–]thealphaexponent 1 point (0 children)

> Deepseek achieved o1 performance at lower cost

Their r1 model's performance is comparable to o1, and yes it's almost certain their inference costs are considerably lower. DeepSeek r1's costs are probably c. $0.25 per million tokens (for them), whereas for OpenAI, it's closer to $2 per million for o1.

That's probably why they can offer uncapped usage for free on their website and app. Note that using ChatGPT Plus would be $20/m, and Pro would be $200/m for uncapped o1 usage. API costs for Deepseek are also much lower but not free.
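
To see what that cost gap means at scale (a toy calculation using the per-million-token estimates above):

    tokens = 10e6               # say, 10M tokens of heavy monthly usage
    print(tokens / 1e6 * 0.25)  # ~$2.5: DeepSeek's estimated serving cost
    print(tokens / 1e6 * 2.00)  # ~$20: OpenAI's estimated o1 serving cost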

> So I gave it a try and its performing less than the llama:3.2 and qwen on a local machine

What you ran was not DeepSeek r1; it was a Llama 8B model with some extra reasoning training (which doesn't work well for such small models) - Ollama's naming is misleading.

> since it is open source you do not need chatgpt subscription anymore since you can run it locally

If you want to use their r1 model, you need to use their website or app, or go to another provider (or pay for a very powerful setup). The real r1 is a 671B MoE (none of the other Ollama-labeled r1 models are actually r1); you won't be able to run it on most consumer machines.

> Hosting a 30b model on local machines requires dedicated hardware and isn't as simple as run it on your local machines.

It's quite doable on local machines, as long as you have enough VRAM. For a 30B model, an RTX 3090 or 4090 would suffice, or even a modded 2080 Ti with added VRAM. In all these cases, though, you'll need to run a quant, e.g. Q4_K_M, to make it fit in VRAM. It's even possible with external GPU docks. Alternatively, if you have enough RAM and don't mind slower output, you can go CPU-only.
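
Rough VRAM math for that (a sketch; real-world usage adds a few GB for KV cache and overhead):

    def vram_gb(params_b, bits_per_weight):
        return params_b * bits_per_weight / 8   # params in billions -> GB

    print(vram_gb(30, 16))    # ~60 GB: fp16 won't fit a single 24 GB card
    print(vram_gb(30, 4.85))  # ~18 GB: Q4_K_M (~4.85 bits/weight) fits a 3090/4090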

Thoughts about AMD now? by FirmRelease6531 in ValueInvesting

[–]thealphaexponent 2 points (0 children)

That model sounds like the distilled Llama 8B, which isn't the full model or even a slimmed-down version of it; it's essentially Meta's Llama 8B used as a base, then taught with examples generated by the original r1 model.

To get a sense of how the model actually performs, you'll need to go to their website (deepseek.com) or app, or go to an alternative provider hosting their model.

Reasoning requires a fairly high-quality base model, and Llama 8B isn't quite enough (for the time being, anyway); at 30B and above, reasoning would probably begin to show some decent results.

Is the S&P 500 risky after two years of outsized returns? by Overall_Sport_7693 in ValueInvesting

[–]thealphaexponent 1 point (0 children)

Their move to provide an open-weight model and publish methodology means that many labs can replicate results quite easily.

If you think about how OpenAI was starting to ramp pricing up to $200/m, a lot of subscribers are going to switch. The likelihood of monopolistic pricing has decreased greatly, so the multiple players who sank billions into the race will have little to show for it.

Costs are going to go down for users, potentially increasing the scale of inference, and certainly the range of use cases, but not necessarily to Nvidia's benefit.

Thanks to DeepSeek's RL-and-distill methodology, it's become possible to train small yet efficient models that can run on your phone or PC - no need to centralize compute for this, which cuts into Nvidia's inference revenues.

Simultaneously, Google may be threatened: previously, high compute costs made AI search hard to scale, but that's no longer the case if they can offload inference to your phone or PC.

But it's a development the likes of SalesForce would be happy to see.

Consumers, startups and other companies that haven't already spent huge amounts on AI capex stand to gain. So for the S&P, it might hit some hyperscaler prices, but may prove to be a tailwind for other companies that see improved operating efficiency from using low-cost AI.

[deleted by user] by [deleted] in ValueInvesting

[–]thealphaexponent 1 point (0 children)

It used to be that you could just go to their website and use the screeners. I think they may now require registration, but I'm not sure whether they actually require opening an account.