Austin's Notes

The everything store of data (and compute?)

Musings

  • Why doesn’t this already exist?
    • Under what circumstances does a marketplace/store/platform make sense vs. bespoke negotiations?
      • Homogeneity/interchangeability of the thing you sell? Idealized financial markets trade fungible goods: stocks, commodities
  • Selling data vs compute
    • Data examples = Common Crawl, Reddit/X API access
    • Compute examples = LLM credits, transaction fees (?)
    • Data = nonrivalrous, can be sold many times; compute = consumed once, by a specific buyer
    • Compute is now sometimes the biggest input into data (think: RL environments, transcription at scale)
      • Other inputs: human time, expertise/taste, platform moats
  • Existing marketplaces
    • OpenRouter and Replicate are compute marketplaces at the API level
    • SFCompute and bespoke datacenter negotiations are compute marketplaces at the GPU level
    • Hugging Face (?) might be an existing data marketplace, or GitHub
      • Data “marketplaces” seem to skew toward free, maybe because information is nonrivalrous
    • Reddit and Twitter/X are data marketplaces if you squint
  • How do frontier labs collect and buy data?
    • How many providers sell data exclusively to one lab, vs. selling the same data to multiple labs?
    • When does a lab build in-house vs. buy?

Motivating cases

  • Imagining future apps/companies
  • Building peruse: the tradeoff between fast, cheap, universal transcripts and high-quality ones (Parakeet vs. Sonnet)
    • Also, ListenNotes
  • Building LLM apps means switching between providers, each denominated in its own credits/payment system
    • Even just for models: OpenRouter, Replicate, Modal
  • Explosion in RL companies selling environments to frontier labs, e.g. Preference Model, Mechanize