
Future SOTA model weights may be open

Today, the best LLMs are Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5, all of which are closed-weight. The best open-weight model, Llama 3.1, comes in fourth.

However, I suspect closed-weight models may not continue to be state of the art.

Why might open model weights win?

  • Ecosystem: open ecosystems and platforms win when they can draw on a wider base of users, developers, and partners to invest in the ecosystem
    • Today, Llama has a stronger finetuning ecosystem than its competitors
    • Tom at Etched (a startup making cheaper ASICs for inference) tells me that their chip design is informed more by the Llama architecture than by others, because Llama was open
  • Ideology: top AI researchers or company leaders may be drawn to open weights for society’s benefit
  • Economics: the marginal cost of duplicating model weights is zero, so there’s an economic pressure for companies that can profitably open their weights to do so.
    • See the fall in cost of music as a result of streaming services (though: music is still licensed via copyright)
  • Case studies from open source code
    • Code, like model weights, requires high upfront costs to produce, but the output has ~0 marginal cost of duplication
    • Linux
    • React
  • Pathways to open model weights
    • OpenAI or Anthropic decide that opening the weights helps with their mission of safe AGI, or that it’s in their self-interest
      • Not so crazy — this was the original formulation of OpenAI!
    • Meta AI takes the lead, and they uphold their commitments to open weights
      • Also seems reasonable: commitments are meaningful, and going back on them would be embarrassing
    • Government funded research initiative (like Leopold’s “the project”) decides its weights should be made public, or the public demands that its weights should be public

Why not?

  • Economics: It’s very expensive to train SOTA models, and keeping those weights private gives the trainers the ability to monetize
    • Though: for this to hold, one needs to explain why open source exists at all
      • Is there more open source or closed source code in the world? Probably closed source.
      • Is more open source or closed source code run in the world? mu.
  • Ecosystem: In LLM applications (eg Cursor) it’s pretty easy to switch out one API provider for another, meaning that applications themselves don’t care too much whether the model is open or not
    • So an alternative loop: apps choose whichever model is SOTA, and SOTA labs take those customer dollars and invest them into training even larger models
  • Case studies from non-open code:
    • Google
    • Facebook
    • Reddit
  • Case studies from generative AI:
    • After a briefly flourishing ecosystem, Stability AI (the company behind Stable Diffusion) seems to have imploded
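The provider-swapping point above can be sketched concretely: many inference providers (and local servers for open-weight models) expose similarly shaped chat APIs, so for an application the model dependency often reduces to a couple of config strings. A minimal sketch — the provider names, URLs, and model identifiers below are illustrative assumptions, not real endpoints:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    base_url: str  # where the app sends requests
    model: str     # which model it asks for

# Illustrative registry: closed providers and a locally served
# open-weight model look identical from the application's side.
PROVIDERS = {
    "closed-a": ModelConfig("https://api.provider-a.example/v1", "sota-model-a"),
    "closed-b": ModelConfig("https://api.provider-b.example/v1", "sota-model-b"),
    "local-llama": ModelConfig("http://localhost:8080/v1", "llama-3.1-70b"),
}

def switch_provider(name: str) -> ModelConfig:
    """Swapping the backing model is a one-line config change for the app."""
    return PROVIDERS[name]
```

This is why the ecosystem lock-in argument is weak at the application layer: if switching costs are this low, apps will follow whichever model is currently best, open or closed.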

What would open model weights imply?

  • Model weight security is not worth investing in
  • Does more or less value accrue to:
    • Chipmakers like Nvidia?
    • Cloud hyperscalers like AWS/Google/Microsoft?
    • Application developers?

More reading