DeepSeek

 

Sara:
Hey Scott, glad we can catch up after that investment team session. There’s a lot to unpack about DeepSeek’s super low training costs, MoE architecture, and the possible impact on AI hardware demand.

Scott:
Absolutely, Sara. Ever since DeepSeek went public with its low-price strategy, we’ve been juggling questions: “Will AI hardware spending tank?” “Is HPC demand about to crater?” Let’s gather everything we discussed, so we can make sense of it before we publish any final thoughts.

Sara:
Sure. First point, remember the immediate fear: if training LLMs is suddenly 10× cheaper, do we see a rapid decline in GPU or server spending? But we highlighted that LLM-based AI is in a frontier expansion phase. Cost savings usually just let teams train bigger models or do more specialized tasks—so HPC usage often increases rather than shrinks.

Scott:
Right. There was the notion of “circular thought speculation” swirling around the market, too—where some folks concluded HPC would become obsolete. But as we clarified, when you introduce things like Mixture-of-Experts (MoE), it can reduce per-token compute, but gating overhead can be huge if your cluster isn’t well-optimized.

Sara:
Exactly. And that’s the point behind synergy between hardware and software. If you’re running MoE on a typical multi-GPU setup with limited interconnect, you might lose the theoretical gains to routing overhead. Meanwhile, if you do invest in advanced HPC fabrics (e.g., InfiniBand, Ultra Ethernet, or NVLink), you can harness those MoE benefits—but then you’re still spending heavily on HPC.

Scott:
We also noted that DeepSeek uses more than just MoE: they mention FP8 quantization to save memory, plus distillation (teacher–student). But sometimes folks conflate MoE with distillation, which is inaccurate. MoE is a sparse-activation strategy, while distillation is knowledge compression after training.

Sara:
Yes, that’s a key misconception. Another one is the idea that “FP8 → 75–85% memory cut.” In reality, partial fallback to FP16 or overhead from all-to-all routing means you don’t see a clean 4× factor in practice—maybe 2–3×.

Scott:
Right. And there’s also that rumor of “70B running on a phone.” We concluded that’s basically unrealistic in real-time usage—smartphones typically have nowhere near the DRAM for that. Distilled or not, tens of billions of parameters are just too large for on-device.
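
A quick back-of-envelope check makes this point concrete; the parameter count, byte-per-parameter formats, and phone DRAM figure below are illustrative assumptions, not measurements of any particular model or device:

```python
# Back-of-envelope: can a ~70B-parameter model run in phone DRAM?
# All figures are illustrative assumptions for a rough feasibility check.
PARAMS = 70e9          # parameters
PHONE_DRAM_GB = 16     # roughly a current flagship phone, much of it already
                       # claimed by the OS and other apps

for fmt, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{fmt}: ~{weights_gb:.0f} GB for weights alone "
          f"(~{weights_gb / PHONE_DRAM_GB:.0f}x total phone DRAM)")

# Prints roughly:
#   FP16: ~140 GB for weights alone (~9x total phone DRAM)
#   FP8: ~70 GB for weights alone (~4x total phone DRAM)
#   INT4: ~35 GB for weights alone (~2x total phone DRAM)
# And that is before the KV cache and activations needed for real-time inference.
```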

Sara:
Which ties into the bigger picture: yes, DeepSeek might slash certain costs, but the entire HPC ecosystem is still crucial for new features and bigger expansions. Remember how we discussed Shawn’s streaming analogy? Cheaper video compression led to a streaming explosion, fueling the evolution of global streaming networks and the rapid rise of CDNs.

Scott:
Totally. Shawn’s piece on video file compression—published as a blog article—makes the same argument: once costs go down, usage tends to go up, ironically boosting the total hardware demand. We suspect a similar “Jevons Paradox” effect around DeepSeek’s cost claims.

Sara:
Yes. And from a competitive standpoint, if DeepSeek or similar labs truly cut costs drastically, big players in the U.S. and Europe would respond by investing more in HPC to stay ahead, not less. Historically, perceived external competition drives HPC expansion.

Scott:
That’s precisely what we concluded in our meeting: a “global HPC arms race” is more likely than any HPC meltdown. So, ironically, these lower-cost claims could catalyze HPC growth.

Sara:
Right. That’s why we walked away with a multi-dimensional perspective. Any cost reduction in early-stage AI generally triggers new usage expansions—like extended context windows or domain-specific fine-tuning—thereby sustaining or even amplifying HPC demand.

Scott:
Exactly. So let’s organize these final points.

  1. DeepSeek’s “1/10th cost” doesn’t kill HPC spending; it’s re-channeled into bigger or broader usage.
  2. MoE overhead is real—especially for inference. Gains can vanish if hardware can’t support the gating loads.
  3. Competitive pressures from China or elsewhere typically yield more HPC investment.
  4. A “software-only” or “hardware-only” approach is short-sighted; synergy is key.
  5. Past HPC efficiency waves show expansions, not collapses.

Sara:
That’s basically our stance. So the next step is turning these insights into a research note. We have enough from our internal discussion to clarify the bigger picture.

Scott:
Yes. We’re all set to finalize. Let’s incorporate the big bullet points—MoE, distillation, gating overhead, synergy, HPC expansions, and Shawn’s streaming analogy. That’ll shape a cohesive narrative about why DeepSeek’s claims don’t signal an AI hardware crash but rather hint at a bigger HPC future.

Sara:
On a macro level, the key question is whether demand scales as cost and price dynamics play out. Efficiency improvements enable more diverse deployments, expanding the total addressable market (TAM). Simplistic views tend to dominate at first, though, because understanding these dynamics requires careful analysis before conclusions can be drawn.

Scott:
True, and at the microarchitecture level, maximum efficiency is only ever theoretical, because hardware and software keep evolving. Reaching it requires a multi-dimensional, bottom-up perspective that weighs the trade-offs of local optimizations.

Sara:
Perfect. Let’s do it then. Thanks, Scott. Looking forward to seeing how it turns out.

Scott:
Likewise, Sara. I think it’ll clear up a lot of confusion. Ultimately, long-term fundamental investment research must be multi-dimensional. It requires careful attention and, as the LLM field itself anecdotally reminds us, “the attention mechanism necessitates multi-dimensional, high-dimensional analysis.”

 

  1. Introduction

The rise of DeepSeek—a company touting exceptionally low training costs yet highly capable large language models (LLMs)—has continued to generate substantial discussion among investors and industry professionals. DeepSeek’s low-price approach, underpinned by Mixture-of-Experts (MoE) architecture and additional software optimizations, challenges common beliefs about hardware needs in AI. Some observers interpret this as heralding a wave of commoditization for LLMs, akin to “made-in-China” disruptions in hardware, fueling concerns that AI infrastructure spending could swiftly taper. However, a closer examination reveals a far more complex dynamic, where software- and hardware-level innovations proceed in mutually reinforcing ways, and where cost reductions can stimulate greater usage rather than shrinking the overall market.

To appreciate this, one must remember that LLM-based AI remains in a period of rapid expansion, far from saturation. Efficiency gains in such environments typically feed back into more ambitious R&D aims, including training larger or more specialized models, adding new functionalities like extended context windows, or venturing into multi-modal integration (text, images, speech). These expansions, in turn, require continued high-performance computing (HPC) support, meaning that raw hardware demands seldom recede.

  2. Observations and Market Impact
  1. Disruption via Low Prices
    DeepSeek has rattled established AI pricing models by offering significantly discounted usage fees. In effect, they’ve set a fresh baseline that compels other providers to revisit their cost structure. The immediate worry is that if such low prices endure—and if the performance remains competitive—then the entire revenue model for LLM-based services might adjust downward, cutting into the margins of major players.
  2. Tech Media Frenzy & Investor Reactions
    With rumors of 10× to 100× cost reductions flying around, segments of the market concluded that AI hardware spending would nosedive: if training each model is so much cheaper, fewer data-center servers and GPUs are needed. This triggered heightened volatility in stock prices across chip manufacturers, cloud providers, and AI platform companies. The phenomenon can be described as “circular thought speculation,” where initial fears stoke more panic until clarified by more nuanced reasoning.
  3. Subsequent Stabilization
    Since then, clarifications from major AI and hyperscale cloud firms tempered these alarmist views. Their core message was that software-based gains in cost-efficiency generally unlock additional, more specialized or higher-end workloads, thereby increasing HPC usage over time. This recognition calmed the earlier negativity, driving valuations back up and emphasizing that AI’s long-term growth potential remains.
  3. “Deep and Through” Approach

To position DeepSeek in its proper context, we take a three-pronged perspective:

  1. Technical Structure – Examine the interplay of MoE architecture, advanced quantization, and other LLM optimization strategies.
  2. Long-Term View – Situate these developments within the historical continuum of hardware–software co-evolution, a mainstay of HPC.
  3. Implications – Determine how these factors reshape HPC investments and cluster configurations, potentially influencing AI deployment strategies worldwide.

By synthesizing these angles, we see that new software breakthroughs seldom eliminate the requirement for robust hardware. Instead, they frequently redirect or magnify computing demands into new areas where efficiency gains can be exploited.

  4. Detailed Analysis and Key Views

4.1 Our View 1: Software Productivity Gains Don’t Necessarily Kill Investment

  1. Choice of Efficiency Usage
    a. Cost Reduction: In sectors reaching maturity, large efficiency gains might stabilize or even reduce overall hardware spending once product differentiation narrows. Firms can choose to pass savings on as lower prices, thereby shrinking total outlays.
    b. Quality Enhancement: In an evolving domain such as LLM-based AI, cost reductions are generally reinvested to stretch boundaries (e.g., training far larger models, implementing advanced reasoning layers, or exploring multi-lingual expansions). Instead of less hardware, organizations simply aim bigger.
  2. Mismatch in Market Reactions
    • Some investors jumped to the conclusion: “Cheaper training = fewer servers.” But in a frontier market, cost efficiency often fuels new feats like large-scale RLHF (Reinforcement Learning from Human Feedback), bigger token-context windows, or broader domain coverage. Each of these expansions drives HPC usage upward, not downward. Thus, the oversimplified logic that HPC demand must contract rarely holds when new AI capabilities can be created.
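
A toy model helps illustrate the point; the demand elasticity and cost factors below are purely illustrative assumptions, not estimates of any real market:

```python
# Toy Jevons-style model: total spend = unit_cost * usage, where usage responds
# to cheaper training with an assumed demand elasticity. All numbers are
# illustrative assumptions, not market estimates.

def total_spend(unit_cost, baseline_cost=1.0, baseline_usage=1.0, elasticity=1.5):
    """Usage scales as (baseline_cost / unit_cost) ** elasticity."""
    usage = baseline_usage * (baseline_cost / unit_cost) ** elasticity
    return unit_cost * usage

for cost_factor in [1.0, 0.5, 0.1]:          # same cost, 2x cheaper, 10x cheaper
    print(f"unit cost x{cost_factor}: relative total spend = "
          f"{total_spend(cost_factor):.2f}")

# With elasticity > 1 (a frontier, demand-rich market), a 10x cost cut multiplies
# usage by ~32x, so total spend rises to ~3.2x the baseline rather than falling.
# With elasticity < 1 (a mature market), the same cut would shrink total spend.
```

The single parameter that matters here is whether demand is elastic, which is exactly the “frontier expansion phase” argument made above.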

4.2 Our View 2: System Performance Advances Holistically (Hardware + Software)

  1. No Single Optimization Dominates
    • HPC systems are complex stacks. Accelerating one component (like gating or model parallelism) can expose weaknesses in memory bandwidth, interconnect topology, or scheduling algorithms. Historically, HPC progress is iterative: each software leap triggers a hardware response, which then sets the stage for the next software evolution.
  2. Local Gains, Global Implications
    • A prime illustration is quantization. Cutting parameter precision from FP32 to FP8 or INT8 can slash raw memory usage, but in MoE settings the added overhead of gating tokens to many experts can spike cross-node traffic. This, in turn, demands a robust cluster interconnect (such as InfiniBand with specialized all-to-all patterns). The net effect is that HPC hardware remains critical, and may need to become more sophisticated, to sustain these advanced software approaches.
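
A rough sketch of that trade-off follows; every shape and cluster parameter below is an assumption chosen for illustration, not DeepSeek’s actual configuration:

```python
# Rough estimate of MoE all-to-all routing traffic per batch, per MoE layer,
# set against the memory saved by lower-precision weights. All shapes and
# cluster parameters are illustrative assumptions.

tokens_per_batch     = 32_768   # batch size x sequence length
hidden_dim           = 7_168    # model width
experts_per_token    = 2        # top-k routing
bytes_per_activation = 2        # BF16 activations
cross_node_fraction  = 0.75     # share of a token's experts hosted on other nodes

# Each routed token ships its hidden state to its experts and receives the
# results back (dispatch + combine), hence the factor of 2.
traffic_gb = (2 * tokens_per_batch * experts_per_token * hidden_dim
              * bytes_per_activation * cross_node_fraction) / 1e9
print(f"all-to-all traffic per MoE layer, per batch: ~{traffic_gb:.2f} GB")

# Compare with the memory saved by taking a 10B-parameter block from FP32
# (4 bytes) down to FP8 (1 byte): ~30 GB per replica, a one-time saving that
# the recurring communication cost above must be weighed against on every step.
print(f"memory saved on a 10B-param block (FP32 -> FP8): ~{10e9 * 3 / 1e9:.0f} GB")
```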

4.3 Our View 3: DeepSeek’s Demonstration of “Heavily Software-Focused” Optimization

  1. Familiar Techniques, Real Deployment
    • MoE (Mixture-of-Experts) is not new, having been explored in multiple research labs. However, DeepSeek claims a real-world system combining:
      • MoE for sparse activation, focusing compute on relevant experts.
      • FP8 Quantization to reduce parameter footprint.
      • Distillation to compress knowledge from a large teacher model.
      • Reinforcement Learning for potential gating improvements or policy alignment.
      • Taken together, these yield the theoretical capability of major cost savings in both training and inference (a minimal gating sketch follows this list).
  2. Cheaper Pricing
    • DeepSeek has leveraged these gains to undercut established LLM vendors. If the performance indeed matches that of more expensive solutions, smaller enterprises can suddenly afford robust LLMs, igniting a wave of new usage that ironically increases HPC consumption.
  3. Mixed Precision
    • The AI industry has moved from FP32 to BF16/FP16 and is now exploring FP8. This is an ongoing trend that aims to exploit lower-precision math for speedups without catastrophic accuracy loss. DeepSeek’s alignment with FP8 indicates they are pushing the envelope of precision-lowering and memory-saving.
  4. Open-Source
    • If DeepSeek’s architecture is open-sourced or otherwise accessible, others can reproduce and extend it. This accelerates iterative improvements and fosters further HPC demands as more participants attempt large-scale training or adaptation of the model.
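
To make the sparse-activation idea concrete, here is a minimal top-k MoE layer in PyTorch. It is a simplified sketch of the general technique rather than DeepSeek’s implementation: it omits load-balancing losses, capacity limits, and multi-node dispatch, and all dimensions are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal Mixture-of-Experts layer: a router scores all experts, only the
    top-k run per token, and their outputs are combined with the normalized
    router weights. This sparse activation is where the per-token compute
    savings come from."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)           # normalize over chosen experts only
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:             # expert e received no tokens
                continue
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

moe = TopKMoE()
print(moe(torch.randn(16, 512)).shape)   # torch.Size([16, 512])
```

In a production system the experts are sharded across devices, so the per-token dispatch in the loop becomes the all-to-all communication discussed in Section 4.2.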

4.4 Our View 4: Market Oversimplifications Abound

Despite the underlying complexities, several simplified narratives continue circulating:

  1. Memory Drastically Reduced
    • Claim: “With MoE, you only store active experts in memory, so VRAM usage is tiny.”
    • Reality: In most operational MoE setups, the entire pool of experts stays resident in memory so gating decisions can be served immediately, keeping total memory usage close to the sum of all the experts’ footprints.
  2. Exaggerated Cost Savings
    • Claim: “DeepSeek is 1/10th or 1/100th the cost of GPT-based solutions.”
    • Reality: Such impressive ratios often omit overhead from gating or fail to measure real-time performance under typical small-batch conditions. Transparent methodology is rarely provided.
  3. Unverified Parameter Reduction
    • Claim: “They shrank from 670B to 37B parameters,” or “70B runs on a phone.”
    • Reality: True on-device usage for tens of billions of parameters is extremely unlikely with current smartphone DRAM (commonly under 16 GB). Distillation can reduce total parameters, but large-scale real-time inference on a phone remains implausible.
  4. MoE = Distillation
    • Claim: “Using MoE is just teacher–student compression, right?”
    • Reality: MoE is about distributing compute among specialized experts. Distillation is an entirely separate method to compress knowledge post-hoc. They can combine but are conceptually distinct.
  5. FP8 = 75–85% Memory Cut
    • Claim: “Moving from FP32 to FP8 yields a quarter the size, so memory is cut by 75%.”
    • Reality: Gains are tempered by partial higher-precision fallback (e.g., some layers or accumulations in FP16), plus overhead from metadata and alignment. A practical improvement might be 2–3×, not a clean 4×.
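
A small worked example shows why the realized factor tends to land nearer 2–3×; the fallback fraction and overhead figures below are illustrative assumptions:

```python
# Why "FP32 -> FP8 = 4x smaller" rarely survives contact with practice.
# Fallback fraction and overhead are illustrative assumptions.

fp32_bytes, fp16_bytes, fp8_bytes = 4.0, 2.0, 1.0

fp16_fallback = 0.30   # share of weights kept in FP16 (embeddings, norms,
                       # accuracy-sensitive layers)
overhead      = 0.05   # quantization scales, alignment, bookkeeping

avg_bytes = (1 - fp16_fallback) * fp8_bytes + fp16_fallback * fp16_bytes
avg_bytes *= (1 + overhead)

print(f"average bytes per parameter: {avg_bytes:.2f}")               # ~1.36
print(f"effective reduction vs FP32: {fp32_bytes / avg_bytes:.1f}x")  # ~2.9x
# A heavier fallback share (or FP16 accumulation buffers) pushes the realized
# factor further down toward 2x, in line with the 2-3x range above.
```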

4.5 Our View 5: DeepSeek R1’s MoE Gains Under Hardware Constraints

  1. Low Throughput in Benchmarks
    • On commodity multi-GPU clusters with limited interconnect, MoE models frequently benchmark below their theoretical throughput, because gating and token-routing overhead eat into the per-token compute savings.
  2. Inference Complexity
    • MoE demands short, high-volume bursts of communication among multiple nodes to route tokens to the correct experts. Load-balancing among tens or hundreds of experts can be nontrivial. HPC clusters typically require top-tier network fabrics (NVLink, InfiniBand, or Ultra Ethernet) plus specialized scheduling logic to exploit MoE fully.
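
A quick simulation illustrates why load balancing is nontrivial; the expert count, token count, and routing skew below are illustrative assumptions, not measured DeepSeek behavior:

```python
# Why load balancing matters: tokens rarely spread evenly across experts, and
# the busiest expert (and the GPUs hosting it) sets the pace of the whole step.
# Each token is routed to a single expert here for simplicity; real MoE uses top-k.
import random

random.seed(0)
N_EXPERTS, TOKENS = 64, 32_768

def busiest_vs_ideal(weights):
    """Route TOKENS tokens according to `weights`; return the busiest expert's
    load relative to a perfectly even split."""
    counts = [0] * N_EXPERTS
    for _ in range(TOKENS):
        counts[random.choices(range(N_EXPERTS), weights=weights)[0]] += 1
    return max(counts) / (TOKENS / N_EXPERTS)

uniform = [1.0] * N_EXPERTS
skewed  = [2.0 if e < 8 else 1.0 for e in range(N_EXPERTS)]   # a few "popular" experts

print(f"near-uniform routing: busiest expert at {busiest_vs_ideal(uniform):.2f}x ideal")
print(f"mildly skewed routing: busiest expert at {busiest_vs_ideal(skewed):.2f}x ideal")
# Even mild skew leaves most experts (and their GPUs) underused while the popular
# ones become the bottleneck, unless capacity limits or auxiliary losses rebalance them.
```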

4.6 Our View 6: Chinese LLM Threat & Acceleration of Global AI R&D

  1. Competitive Pressures
    • If DeepSeek or similar Chinese AI labs do indeed operate at a fraction of the cost, major US-based cloud and AI companies could hasten HPC expansions and intensify internal R&D to maintain leadership. Historically, perceived threats from abroad have spurred greater HPC investment, not a retrenchment.
  2. Low Training Cost?
    • DeepSeek’s reported cost metrics may revolve around “activated parameters” or reliance on teacher-distilled sets, drastically lowering the needed GPU hours. Yet from an industry-wide vantage point, it broadens the sense of possibility: more firms can attempt large-scale training, fueling HPC usage across the board.

4.7 Our View 7: The Best Approach Combines Hardware & Software Innovation

  1. Synergy, Not Either–Or
    • Major AI breakthroughs typically arise when hardware architectures (GPU, TPU, specialized ASICs) co-evolve with software frameworks (Transformer-based models, MoE, quantization strategies). Falling behind in hardware constrains the benefits of advanced software, and stale software fails to leverage new hardware potential.
  2. Trade-Off Reality
    • Gains in one dimension (say, memory savings from quantization) can shift overhead to another dimension (e.g., gating or networking). Distillation can yield a smaller model but might degrade certain niche task performances if not carefully tuned. True HPC synergy emerges when the entire pipeline is designed to handle these trade-offs.

4.8 Our View 8: Broader Ecosystem of High Growth

  1. Efficiency Gains Expand the TAM
    • Lowering per-token cost in LLMs typically unlocks new user segments, analogous to how cheaper data plans fueled more streaming. This expansion can be self-reinforcing, as more customers adopt AI, spurring HPC expansions to handle volume.
  2. Circular Demand Logic
    • The standard cycle: cost goes down → usage surges → HPC expansions accelerate → advanced R&D cuts cost further → usage grows yet again. This phenomenon has played out in multiple tech booms, and LLM-based AI is unlikely to break this pattern.
  3. Investment Implications
    • Fears that HPC hardware’s golden era might abruptly end resurface with each wave of software innovation, but historically, expansions in capacity have outpaced predicted slowdowns. AI is still at an early stage, implying further HPC expansions remain likely.
  5. The Extended Macro–Micro Perspective
  1. Software Efficiency Gains Often Reinforce Hardware Demand
    • Techniques like MoE, quantization, and distillation shift but do not eliminate HPC usage. Freed capacity is swiftly redirected to more demanding tasks, like training multi-modal or higher-context models.
  2. High Demand & Oversimplification
    • It’s easy to think: “Once cost drops, HPC needs vanish.” In reality, an emergent field uses those cost savings to push new frontiers—leading to HPC expansions that overshadow any baseline cost reductions.
  3. Ecosystem Growth
    • As with streaming media, improved efficiency (compression) led to exponential user uptake, and data centers and global CDNs ballooned to meet new consumption patterns. Similarly, cheaper LLMs create new use cases that keep HPC systems busier than ever.
  4. Long-Term Investment Mindset
    • In evaluating HPC hardware or AI software stocks, short-term hype cycles can obscure deeper trends. Over multi-year horizons, efficiency improvements historically open broader markets, fueling more HPC development, a pattern repeated across supercomputing, big data, and now LLM-based AI.
  6. Conclusion & Key Takeaways for Investors
  1. DeepSeek
    • Claims of “1/10th cost” or “significantly fewer GPU hours” do not automatically mean HPC demand vanishes. Instead, they typically pave the way for new or expanded AI tasks, reinforcing HPC usage long-term.
  2. MoE Overhead
    • While MoE can trim per-token compute, gating overhead plus distributed multi-expert routing can hamper throughput unless advanced hardware and software synergy is in place. Real-world gains can differ markedly from theoretical.
  3. Competitive AI Race
    • Chinese and other emerging players, if they truly achieve dramatically cheaper LLMs, might spur a global HPC arms race. Past examples (e.g., global competition in HPC for weather modeling, supercomputing for cryptanalysis) indicate that perceived external threats often boost HPC budgets.
  4. Multi-Dimensional Analysis
    • Early-stage AI simultaneously exploits cost savings to enlarge usage. A purely linear viewpoint, in which cost savings only reduce hardware needs, misses how hardware–software co-evolution fosters constant expansions.
  5. Market Speculation vs. Fundamentals
    • Dramatic headlines can trigger short-run panic, but historically each wave of HPC efficiency fosters bigger user bases and more advanced research. Over time, prudent investors see that HPC expansions typically endure.

The DeepSeek phenomenon illustrates how software breakthroughs can seem to reduce hardware requirements yet often amplify overall demand in practice, as newly enabled features and heavier usage flourish. For discerning investors, this typically indicates a sustained or rising appetite for HPC infrastructure, especially once synergy between hardware engineering and advanced software techniques unlocks possibilities not feasible under older cost structures. Maintaining a holistic, multi-dimensional perspective on the hardware–software interplay can help fundamental investors treat short-term price volatility as the premium paid for future returns while pursuing the continued growth of intrinsic value.