Musings
- Why doesn’t this already exist?
- Under what circumstances does a marketplace/store/platform make sense, vs bespoke negotiations?
- Homogeneity/interchangeability of thing you sell? Idealized finance markets are stocks, commodities
- Selling data vs compute
  - Data examples = Common Crawl, Reddit/X API access
  - Compute examples = LLM credits, transaction fees (?)
  - Data = nonrivalrous, can be sold many times; compute = rivalrous, sold once to a specific buyer
- Compute is now sometimes the biggest input into data (think: RL environments, transcription at scale)
- Other inputs: human time, expertise/taste, platform moats
- Existing marketplaces
  - OpenRouter and Replicate are compute marketplaces at the API level
  - And something like SFCompute or bespoke datacenter negotiations are compute marketplaces at the GPU level
  - Hugging Face (?) might be an existing data marketplace. Or GitHub.
  - Seems like data “marketplaces” skew towards free, maybe thanks to the nature of info as nonrivalrous
  - Reddit and Twitter are data marketplaces if you squint
- How do frontier labs collect and buy data?
  - How many providers sell data to one lab exclusively? How many sell the same data to multiple labs?
  - When does a lab build in-house vs buy?
Motivating cases
- Imagining future apps/companies/
- Building peruse, and the tradeoff between fast, cheap, universal transcripts and high-quality ones, e.g. Parakeet vs Sonnet
- Building LLM apps and needing to switch between different providers, denominated in different credits/pay systems
  - Even just for models: OpenRouter, Replicate, Modal
- Explosion in RL companies selling environments to frontier labs, e.g. Preference Model, Mechanize
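
A rough sketch of the provider-switching pain point above: if every provider bills in its own unit (credits, GPU-seconds, tokens), comparing them means converting each quote to a common denominator first. Everything here is an assumption for illustration: the provider names, units, and rates are made-up placeholders, not real prices, and `Quote` and `cheapest` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    provider: str         # hypothetical provider name
    unit: str             # what the provider bills in, e.g. "credit"
    usd_per_unit: float   # placeholder rate, NOT a real price
    units_per_job: float  # billing units one job is assumed to consume

    def usd_per_job(self) -> float:
        # Convert the provider's native billing unit into USD for one job.
        return self.usd_per_unit * self.units_per_job


def cheapest(quotes: list[Quote]) -> Quote:
    # Compare providers on the common denominator (USD per job).
    return min(quotes, key=lambda q: q.usd_per_job())


# Made-up numbers, purely to show the comparison.
quotes = [
    Quote("provider_a", "credit", 0.01, 120),
    Quote("provider_b", "gpu_second", 0.0005, 1800),
    Quote("provider_c", "1k_tokens", 0.002, 500),
]

best = cheapest(quotes)
print(best.provider, round(best.usd_per_job(), 2))
```

The hard part in practice is probably `units_per_job`: how much of each provider's unit a given job actually consumes is exactly what a marketplace would have to standardize.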