The everything store of data (and compute?)

Musings

  • Why doesn’t this already exist?
    • Under what circumstances does a marketplace/store/platform make sense versus bespoke negotiations?
      • Homogeneity/interchangeability of the thing you sell? Idealized financial markets trade stocks and commodities, which are fungible
  • Selling data vs. compute
    • Data = Common Crawl, Reddit/X API access
    • Data = nonrivalrous, can be , resellable; compute = one-time (consumed when used)
    • Compute is now sometimes the biggest input into producing data (think: RL environments, what )
      • Other inputs: human time
  • Existing marketplaces
    • OpenRouter and Replicate are compute marketplaces at the API level
    • SFCompute, or bespoke datacenter negotiations, are compute marketplaces at the GPU level
    • Hugging Face (?) might be an existing data marketplace, or GitHub
      • Data “marketplaces” seem to skew toward free, maybe because information is nonrivalrous
    • Reddit and Twitter are data marketplaces if you squint

Motivating cases

  • Building peruse: the tradeoff between fast, cheap, universal transcripts and high-quality ones (Parakeet vs. Sonnet)
    • Also, ListenNotes
  • Building LLM apps and needing to switch between providers, each with its own credits/payment system
    • Even just for models: OpenRouter, Replicate, Modal
  • Explosion in RL companies selling environments to frontier labs, e.g. Preference Model, Mechanize