Public notes from The Curve 2025


Geoff Ralston (with Maxwell Zeff)

  • Going around OpenAI asking “when is AGI happening?” (50% odds)
    • In 2015: answers ranged 5-100y; avg landed around 2030
  • President of YC for 4y
    • At end of 2022, ChatGPT was in the wild; seemed to change everything
  • Convos about AIS vs AI video
    • OAI: formed for safety; explicit goal, to create superintelligence first
    • Convos always about tech
    • Ilya in 2017 — all about compute
  • Why SAIF?
    • (boring story from Geoff)
    • After leaving YC, people wanted to talk about startups, but mostly ended up talking about AI. Spoke to people about AIS
    • GGI convo — it’s good. But need more!
      • “Don’t have many skills, but working with startups is a hammer I have”
    • What if we help folks who want to build guardrails for the future
    • AI — horizontal tech, will be embedded everywhere, could go quite sideways. Who’s worrying about it?
      • Learning more about ecosystem in the past months/years
      • If we can build great startups with scalable solutions
      • And: elevate the conversation, build the community, ecosystem around safety. Including: nonprofits, forprofits
        • Can cure cancer, old age
    • Any sufficiently powerful tech needs guardrails. Air travel — safest thing you can do; that didn’t happen for free; we paid for it
      • FDA is no joke. (can argue about regulatory capture). But: we paid
  • Now: YC is opposing safety. Garry Tan, Marc Andreessen, Peter Thiel — how did we get here?
    • Strange that “safety” has become a dirty word. “You must be a decel” “They want to take away our future” “everyone who dies of car crash”
    • Strange, counterintuitive that regulation has become dirty. SB1047 — >$100m, might be liable
      • Sure, some startups are $B — but feels like an exception
    • Pandemics from AI: bad. Happy that frontier labs build in restrictions
    • “Not afraid of saying “safety” loudly. Safety, security, trust — help us deploy things more widely, make things more beneficial to humanity”
    • Semi-religious aspect which is weird. Bible: Antichrist is someone who talks about peace and safety. Those talking about safety might be antichrist
      • Marc Andreessen — evil to want any sort of guardrail in place
  • Is YC making a mistake, positioning against AI safety?
    • Will it hurt YC? No
    • Is it a mistake, more broadly? yes! wrongheaded
    • But: YC is funding AIS companies.
    • Really saying: regulation will slow us down; it’ll hurt startups
    • Shouldn’t be state-by-state regulation. US congress
  • YC says “Build something people want”. AI safety startups: is this something people want? How do you make a compelling startup?
    • Building trust, tech that allows you to trust better, is something people want.
    • We do attempt, when interviewing and talking, to include the lens: are we building something people want?
    • May have funded a company nobody wants. That happens.
    • YC’s motto “build sth people want” is not because pg did it and said it was a good idea. He started by building sth people didn’t want. PG & Geoff met through the Viaweb acquisition
  • $100B infra rollouts, giant boom. Why this moment to start investing in AI? Feeling of importance as investor?
    • Geoff: I wake up, can’t get back to sleep, because of the thought of how everything is changing so fast; feels hard to imagine being able to make a difference, grab onto sth, say, “let’s matter here”
    • SAIF is our small effort to do that. Not going to change the course of humanity. Maybe change a small role. Talking about a small amount being spent
    • Accelerationist, people are spending $T. We’re maybe spending $10s of Ms
    • Maybe: spend a couple % points on future.
      • They say, “no, this is for the benefit of humanity”
      • Whether tech goes off on its own, or someone uses it — both exist. As a country, as a species, should think hard about the next stage
    • So: because it’s happening now
    • 2015 — were right, at end of decade, this’ll be real
  • Leading products that aren’t SSI or safety, just making money. Social media apps; Meta using AI chats to advertise
    • Incentives drive human behavior. Problem these AI companies have: capital expenditures are incredible; cost of inference exceeds what users pay. Can’t make it up in volume
    • Need secular sources of revenue to make it not a bubble
    • Half growth in GDP is coming from datacenter spend, incl. GPUs
    • Sama: 250 GW of capacity by 2033. $Ts of spend. Have to drive enormous revenues to be able to do that. Meta, Google revenue streams
    • From 2022 to 2025, OAI revenue went from 0 to $13-15B.
  • Hearing from employees: consumer business powers the mission. Makes money to do AIS research. But companies like Google, FB evolve. Start with a clean-sounding mission, gets sullied over time. Is now different? Or have we seen this story before
    • Cyclical nature of biz, history rhyming — yeah. Looks like tech rollouts over history
    • Likely to be a bust here; likely, but don’t know if it will matter in the arc of rollout
    • “Who was around in 2000?”
      • Reflection: yahoo mail. didn’t change, growth of usage didn’t change. But tech was one of the fundamental permanent changes in how the world operated. Bubbles come and go. Same for AI
    • Stanford kid: “don’t remember how I did school before ChatGPT.”
      • How many kids are using LLMs, ChatGPT?
        • 30-50%? no, all of them
    • Maybe financial bubble
  • Audience Q: Practical AI safety governance structures for founders — how to preserve mission as we scale?
    • Need to think more. Fair question — shooting from hip. So far, if aligned — let’s go
    • What do you do to keep them aligned? What do you tell a 2 person startup?
    • Can say “we promise not to be evil”. Does that matter? Not sure.
    • If you have ideas, on what language to use, Geoff is all ears.
  • Audience Q: Economics, incentives. Safety being a public good, it’s harder to fund using market mechanisms. Some things can use markets: regulation to push investment; liability. How does Geoff view things from that angle?
    • 100% correct
    • AI underwriting company: creating a set of criteria so that agentic deployment can be insurable
    • Some parts of this question which are not approachable from a startup/forprofit. Personally, interested in philanthropically supporting research
    • xrisk — maybe an Ilya or Softmax, SSI before bad ones — maybe?
    • Games we’re playing are small ball (okay to admit)
      • But research into harder questions
      • We’re in a window of hope
    • Once SI is there, it’s not going to let us take a screwdriver into its innards. (not sure. maybe it will! But I doubt it.)
    • Now is when this work has potential; later it won’t be fruitful
    • Room for philanthropy, room for structure to create profit incentives for companies that can grow and scale. Also, AIUC in liability
  • Audience Q (Kylie) regulation? Fear we won’t have substantial regulation in time (eg SB1047). Do we have the ability to get it before too late? What does it look like, what do we need?
    • Hopeful, but not massively optimistic. I supported SB 1047; I supported SB 53 (it passed!)
    • At least, we have transparency
    • Support the RAISE act in NY. Support smart local efforts (though, agree that state-by-state is a bad idea). Not a great way to build regulatory regime
    • Not hopeful
    • Other efforts will hopefully infect what happens.
    • Maybe if we have a little disaster, it might wake people up. E.g. a pandemic that only kills 30m people? not a great example
  • Audience Q: Tim O’reilly, side project on AI disclosures.
    • Geoff: if you need to be safe, need to build it in?
    • [tim] Also looked at regulatory regimes. Learned: great # of things that we think of “safety innovations” were commercial. Eg headlights, wipers. Also, slow evolution (road signs, they were state-by-state)
    • Mandatory seatbelts — 70y after the seatbelt
    • [tim] We do a lot of premature regulation. Regulation evolves out of torts; someone sues, people say “bad things happen”. SAIF invests in headlights — useful to user, and increases safety
      • Observability, AI observability — part of infra that makes it regulable tech
    • Doesn’t address rogue SI
    • [tim] Afraid of fearmongering to get uninformed politicians for things that don’t work. Labs take SI escape seriously, and politicians won’t help.
    • Geoff: Might be surprised about what $B can do to someone.
      • Joke on all of us — getting offered $100m/y to go to the dark side
    • Meta ruining people’s lives is amenable to tort, sued, air traffic safety
      • [ac] doesn’t seem like law/regulation speeds up in sync with the pace of the tech.

Future of work with Aparna Chennapragada

  • Chief Product Officer of Microsoft
  • Previously, CPO of Robinhood, then VP at Google
  • https://aparnacd.substack.com/p/most-work-is-translation
    • Q: Where does org chart compression first happen: big tech? startups? labs?
    • today: shorthand for impressiveness is how many people you manage — how to push back against that incentive gradient with LLMs?
  • Practitioner issues — 10 lessons from the trenches. Moving from “just add AI” to “AI-native”
  • Been through 3 platform shifts: web (akamai, content), mobile, now AI
    • Eg first website was a scanned brochure
    • First mobile: take website, shrink it, call it a mobile app
    • So: first wave is “just add AI”: Excel, Photoshop, ppt
    • But: moving from AI-native. Rethink process, the how, the what
    • Going to focus on the how and the who
  1. NLX is the new UX
    • (Natural-language experience). No heavy graphics, UI, menus. Natural interface: you don’t have to adapt to menus
    • Friend spent 10 years learning illustrator — was a moat, until now
    • Good thing: flexible, zero learning curve. So high adoption
    • Bad thing: don’t know what to ask. Blank prompt is daunting
    • Product builders: not bells and whistles. Conversations have grammar
      • 1:1
  2. Prompt sets are the new PRDs
    • Instead of product spec
    • Process shifts. What are the set of prompts that you want the Agent/AI product to do really well at?
      • That’s the effective product spec
      • Everyone’s winging it. Which represents the whole? Who writes these things?
    • Composition of team changes — natural-language prompts are not the same skillset as PRDs
  3. Benchmarks are the new QA
    • Almost every launch has a demo + benchmarks
    • Benchmarks are the way to establish your product works
    • Sales team — QA, and sales collateral
    • Almost nobody doing work is doing product math.
      • Talking to ppl, uploading spreadsheets, telling ppl what to do
      • How do you define benchmarks characterizing real work.
      • Not for frontier labs, but
  4. “Small is the new black”
    • (on team sizes)
    • Some people say startups — speed vs scale
    • But also: AI era, very interesting
      • Talked to 25+ AI-native teams within msft & externally
    • When you’re small: teams rely a lot more on models
      • Instead of hiring more ppl to do eng work, lean on models
      • Swap HPUs (humans) for GPUs
        • Models keep getting better. In 3 months: ramp-up of adding ppl vs leaning on models
    • Other reason: internet, mobile, have happened over a decade
      • This happens over months. Small team can adapt quickly, throw away, unlearn and learn
  5. Double barbell of teams
    • Team composition is changing
    • Typically: swimlanes. Build/eng. Product. Design. Policy, comms.
    • Now: Complete barbell.
      • One extreme: a bunch of generalist builders
      • Other extreme: model whisperers. Understand how to get the best of the models. Getting the megabucks
      • In between: all AI
      • [ac] are model whisperers scarcer than generalist builders? How does one find or become a model whisperer?
    • Tools, evals, how do you have the runs
    • Interesting thing that Aparna takes away — very trainable
      • More recent grads are more in tune
      • ML researchers a few years into their careers have to unlearn too many things. 2-3y of PhD, figuring it out
  6. Avoid the six-finger trap
    • A bunch of teams overengineered around the quirks
    • Then new model comes, 4o, and it’s a nonissue
    • Lesson: don’t overeng on current quirks. The model will eat that.
  7. Skate to where the model puck is going
    • Corollary. Weird thing to do.
    • Used to be: build the thing that can work today
    • But weird calculus. If you overcorrect to making it work today, will be obsolete in a few months, or underperforming relative to model puck
    • Build the product where model grows into it
      • Eg today, the AI data scientist isn’t that good. But: AI agent in a box, and the model grows into it
    • q: Convincing us? easy. Convincing customer: harder. how do you solve, customer wants it working day 1
      • aparna: Central tension
      • Consumer world: can shitpost on Twitter, say trust us, you’re not paying
        • (but even there, bar is higher)
        • Google I/O — preannouncement disease.
        • Companies, OpenAI, today you can use it
      • Enterprise: committed to a roadmap, have put something in their hands. Some frameworks; pilot vs not. But can’t say “pay me as if it’s a real thing”
      • So: benchmarks are starting to work. Past is a predictor of future. 2 points don’t make a curve, but can extrapolate
  8. Memory is a moat
    • Roughly converging — are you just a wrapper on large models?
    • Raw intelligence is no longer the big differentiator
      • (eg coding agents. github/claude code, ppl switching all the time)
    • Memory is a moat — as a product, but also, what does it mean for users too
    • Wanted to use another model, but all my stuff is in ChatGPT. I just didn’t want to try a different thing
      • Need to start thinking about portability
      • Data portability was a big deal. What does it mean to take my stuff, go to a different product or model
      • In enterprise
    • [ac] Manifold memory
      • Get off Discord, and onto Manifold
    • Also technical problem. Building chief-of-staff agent
    • Want to marry world’s knowledge to meetings, emails, etc
      • But: there’s a shitton of it! Easy to dilute/pollute context window. How do you pull up the right context, make judgments
    • Also a policy perspective
  9. Today’s magic is tomorrow’s commodity
    • Image gen was magic; now it’s just Saturday.
    • How do you play in this. Don’t overeng for today. Take a bet. Bet-based product making, different from previous eras
  10. The model is the product
    • Or, the model eats the product
    • (already happening: when someone writes sth. Don’t start a document editor. A lot start with chat, use it as a thought coprocessor, go back and forth)
    • Last mile: do tweaks, human editing
    • Even narrow band of coding
    • IDEs vs Agents
      • Devs: VSCode, etc. But now: if you can delegate a lot, why look at pixels?
      • UI, products predicated on human eyeballs looking at the screen
    • “The best X is no X”
    • Payments team: optimize payment funnel
    • But uber vs cab — one joyous benefit, not have to rummage. Just get in and get out
      • Best payments = no payment
    • Best IDE = no IDE
    • Where are the cases where the model will eat the product
  11. Bonus: The model is eating the org chart
    • “I talk to people so eng don’t have to”
    • legal: Converting constraints to contracts
    • product: converting customer stream of consciousness into product set
    • economics don’t support translation layer
    • very small LMs that are token constrained, incentive issues, incomplete info
    • Not trying to preserve their jobs
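The “prompt sets are the new PRDs” / “benchmarks are the new QA” lessons above can be sketched as a tiny harness: the spec is a list of prompts plus graders, and QA is the pass rate over that set. Everything here (`SpecCase`, `run_model`, the example prompts) is hypothetical illustration, not anything Aparna described concretely:

```python
# Hypothetical sketch: the prompt set *is* the product spec, and the
# benchmark pass rate *is* the QA signoff. `run_model` is a stand-in
# for whatever agent/LLM call a real team would make.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpecCase:
    prompt: str                   # what the product must handle well
    check: Callable[[str], bool]  # grader: did the output meet the bar?

def run_model(prompt: str) -> str:
    # Placeholder model; a real team would call their agent here.
    return "42" if "2 + 2" in prompt else "draft reply"

def pass_rate(spec: list[SpecCase]) -> float:
    """QA = fraction of spec prompts the model handles acceptably."""
    passed = sum(case.check(run_model(case.prompt)) for case in spec)
    return passed / len(spec)

spec = [
    SpecCase("What is 2 + 2? Reply with just the number.",
             lambda out: "4" in out),
    SpecCase("Draft a reply to this email...",
             lambda out: len(out) > 0),
]
print(f"benchmark pass rate: {pass_rate(spec):.0%}")  # → 100%
```

The point of the sketch: the spec and the QA suite are literally the same artifact, so whoever writes the prompt set is writing the PRD.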

End structure: AI spine becomes much more accessible, with a reasoning layer on top. Any person (including CEO) can read/write in and out of it. Won’t happen in a day (a lot of decisionmakers are themselves in the translation layer). Upton Sinclair on salary & incentives: hard to get a man to understand something when his salary depends on not understanding it.

What are doers doing? Folks who are coding, using LLM tools, pump in meeting transcripts, standup notes; everything is pumped into the spine

These agents

CEO = godmode. But access control overall

Workbench (like Swebench)

  • What do most people do? Workflows characterizing translation layers; plot across stakes, other dimension = error rate. Follow-up: what do people do, how can we characterize it as LLM-translatable

Job loss and automation

  • #11 — even less disrupted fields (b/c doers are not being replaced)
  • Rather than thinking of ASI (superintelligence) — “artificial superdiligence”: not asleep at the wheel, not distracted. Everything we think of around intelligence, the bar might be higher or lower.
    • Median worker/translator: diligence can be much higher, 24/7 vs 996
  • Different layers of orgs: debate, collaboration vs translation
    • Between top & bottom, not just translation
    • Seems harder to replace the things on the bottom
    • Maybe “transduce” would be more of an accurate term
    • Mixture of experts debate — can build into orchestration among agents with smaller translation
      • with memory, critique
      • Even if a salesperson has only been around 2y, agents can have institutional memory
      • A lot of what folks in a big (or small) company bring is knowing how stuff was done

Failure modes:

  • Easy hallucination
  • “You’re holding it wrong” — agents use it a lot, some don’t use it. Is it an agents issue, how to use, trust? uneven in terms of usage. With customers
  • Blank prompt — people don’t know what this can do. Subtle thing: tried it 6mo ago, capability wasn’t there. Human priors are slow to update

METR, w/ Beth Barnes & Lawrence

  • METR:
    • AI R&D, think self improvement
    • Loss of control (rogue replication)
    • Alignment (scheming, sabotage)
  • Rogue replication hard to shut down — seems concerning?
    • [ac] are blockchains a bit like this (hard to shut down)?
  • Three questions:
    1. Current real-world impact
    2. When should we see very capable AI systems?
    3. How easy is it to make systems do what we want them to do?
  • Uplift takeaway
    • interesting: software is very top heavy, most productive
  1. Capable AIs?
    • First billion-$ solopreneur (Dario) vs Gary
  • Fast saturation problem
    • Artificial, constrained to 1 domain
    • RE-Bench
  • Most impressive things in each year
    • 2019 GPT-2: AI can write coherent news articles, cherry-picked from 10 samples
    • 2022 text-davinci-003: struggling to use basic tools
    • 2024 writing cuda kernels that beat some experts.
      • What were most impressive? IMO question, physics research, ML code — can write code better than human experts
  • Proposal: convert to length of task that can be autonomously completed
  • Vision tasks are much worse (but: positive upward trend)
  • Benchmarks, anecdotes overestimate model performance
  • Trends vs absolute value
  • Current work: What is the actual trend?
    • Could be constant factor lower
    • Could be
  • Are current models doing what we want?
    • You get what you measure
    • Reward hacking
      • Seem to know “this is not what we wanted”
      • Seems to say “I have no reason to game the system”
    • In chain of thought, seems to know, and reward hacks
    • Unclear what they’re thinking of
  • Future evaluation will be even harder
  • Takeaways:
    • Try to collect ground truth; examine trends, not where we’re at; don’t take evals at face value
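METR’s proposal above (convert capability into the length of task that can be autonomously completed) is, roughly, a curve fit: success/failure per task vs task length, with the “50% time horizon” read off a fitted logistic. A minimal sketch with made-up data points; the gradient-descent fitting details here are an assumption for illustration, not METR’s exact method:

```python
# Sketch of a "50% time horizon": fit p(success) = sigmoid(a + b*log2(length))
# over (task length, success) pairs, then find the length where p crosses 0.5.
import math

# (task length in minutes, 1 if the model succeeded else 0) — fabricated data
results = [(1, 1), (2, 1), (4, 1), (8, 1), (16, 0), (16, 1),
           (32, 0), (64, 0), (128, 0)]

def fit_logistic(data, lr=0.1, steps=5000):
    """Fit a, b by gradient ascent on the logistic log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for length, y in data:
            x = math.log2(length)
            p = 1 / (1 + math.exp(-(a + b * x)))
            ga += (y - p)        # gradient wrt intercept
            gb += (y - p) * x    # gradient wrt slope
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    return a, b

a, b = fit_logistic(results)
horizon = 2 ** (-a / b)  # length where a + b*log2(length) = 0, i.e. p = 0.5
print(f"50% time horizon ≈ {horizon:.1f} minutes")
```

With this fake data the crossover lands near the 16-minute mark, where successes turn into failures; the “trend vs absolute value” point above is about how this horizon moves over model generations, not its level at any one time.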

Neuroscience, by Greg Corrado

  • Disappointment: making vision work didn’t help us understand visual cortex
    • Surprised: Scaling up compute
    • Transformer architecture: unrolling of a neural network, related — but same problem: LLMs on transformers haven’t helped us understand the human cortex’s language synthesis
      • Theory: could construct chains of likelihoods — but understanding vague
    • Visual cortex — similar to conv neural nets
  • Structures of models vs humans
  • Conception: LLM contains multitudes, best & worst of humanity
    • Has a worn path of good reasoning
    • But also a worn path of psychosis, emotional blackmail
    • Any stream that’s a thread in human/thought — is getting transmitted into system
  • Neuroscience ⇒ AI?
    • Wanted to understand how computation, thought worked
    • Even recording from 1000 neurons — minuscule % of the overall population; the connectivity map is not well understood
    • Demis: we’re going to build brainlike systems, we will be able to do this
    • Hinton: End academic talks “and that’s how the brain works”
  • AI Sentience — headlines. What signs in LLMs might indicate sentience?
    • Regardless of facts, will have contention
      • Even if definitely not sentience or conscious, there’s a passionate minority committed to the fact that they are. People are willing to defend, fight for them
      • And also, some people will be human rights first
    • Dunno what consciousness is on a biological level
    • Epistemological level: unknowable.
      • Plausible: brain in a vat argument. Hard to prove that we’re sentient.
        • Worth trying to figure out?
          • If it’s unknowable: huge waste of time. People will believe what they believe
          • Is there a higher power?
    • Imagine: religious war of the future might be about sentience
  • Deploying AI safely?
    • If people do it, worry that AIs will do it too
      • People are sneaky. kids trying to learn to lie!
      • Redteaming — if a human can find a workaround, LMs can too
  • Q: Sentience: Anthropic gave Opus 4 the ability to stop conversation. Not making a judgement on sentience, views on repeating?
    • Take some of these to use judiciously. If AI systems will be disruptive in good ways, in ways that are safe and stable — need to be collaborators with humans, work in social fabric of how humans behave.
    • For that to actually work, makes sense to give systems ability to behave in ways that act according to social contract
    • Encourage systems to not be rude.
      • But: Human customer support will eventually hang up
    • Good for society to try things like this
      • But also: no suffering atm
  • Why, after collecting data, not able to know how the brain works?
    • Function of scale? Not sheer scale — something repeats (every connection need not be described individually)
    • Maybe; haven’t measured right things; lack right mathematics.
      • If LLM systems allow these breakthroughs
  • Q: using biological substrate for LLM?
    • impedance mismatch. LLM systems: fast, simple, electrified, high consumers of energy
    • real bio systems: slow, move ions, very energy efficient
      • Real info transfer between
  • LLMs have taught us nothing about the brain?
    • Stirred the pot; ppl feel emboldened; tension, is language fundamental? connected networks fundamental?
      • Both feel vindicated
    • Nature of humans: both sides are right, and still don’t agree
  • Are LLMs opaque for the same reasons?
  • Believer in mechinterp. More resource may not help, but make it go as fast as possible
  • Might not think like us; might think like we write
    • When we write, we’re exporting a reasoning process
      • Might be not how you think, but how you write about how you think
  • Q: Info quality?
    • All the companies have improved model behavior by paying more attention to quality of training data going in
    • Natural sciences: right thing is not “is data noisy”, but “does it come from distribution you care about”
      • Train on good patterns of thoughts. More training on Youtube comments
      • Training kids: start with curriculum, what do you teach
  • Architecture shifts that will be next unlock?
    • Word2Vec — huge unlock
    • Transformers — before, LSTMs. Greg (to Noam Shazeer): these are shit, please come up with something
    • “probability 0 that transformer architecture is the one going forward”
    • Matter of time: a way to change, abstract, generalize, tweak
    • In RL: explore vs exploit. Market is exploiting
      • But will switch - bayesian networks got Greg into AI
    • Would love to see probabilistic reasoning back in the system
      • Real bayes networks: hard to build & train
      • Maybe: build neural systems which are universal approximators for neural reasoning
    • Also: some people think large LLM + elegant system prompt would give you something that you like & trust
      • Greg: BS. no way that an architecture this flat will lead to consistent results
    • Once we’re in agentic architectures, with structure inside cognition, will become dominant paradigm
      • Game: what are the right cognitive building blocks to build things out of
      • Imagine: maybe psychology PHDs matter
  • Believe in augmenting human abilities — how to approach using synthetic intelligence
    • Economic incentives don’t work out — if you can do sth without a person, it’s cheaper
    • Friends who are creative professionals, make their living making art. Generative AI is deeply problematic. Philosophical question:
      • is synthetic art like photography to painting? Electronic music to resonating cavities? (new tool)
      • Or: does it obviate? After word processors, everyone typed; no more typing roles
    • Everyone is now an editor. Can use early drafting. But taste, editorial skills become more important. (just like electronic music removes some kind of virtuosity)
    • Kylie: As a writer — might extinguish. Worry about brain atrophy
      • Greg: what do we teach our kids? what’s a job skill that’ll be useful? what will you be paid for in the future?
      • Nobody is deciding in a reasoned way; governed by market forces
    • Media is largely not using AI-gen writing (currently)
      • Unclear how well it’ll hold, in what contexts
      • Video games might become more individualized & personalized & immersive, until they eat up the space taken by movie franchises
  • If a major lab had discovered an alternative, would we know?
    • Greg: no. but nobody’s found one yet.
    • People won’t change underlying architecture without radical perf improvement
    • If anything, gap between the best of the best and worst of best is smaller
      • eg if one lab had repeated failed pretraining runs?
        • greg: no
    • ICE (internal combustion engine) is a process; auto engines have been at this process for a while
    • Car engines have gotten better in last 20 years. Continue on transformers
  • Can we get away from human centricity in AI?
    • Robotics: learning through sim & experience might allow systems to develop new patterns of cognition that feel nonhuman
    • Maybe more like ants, or spiders. Solving a problem, though not in a quintessentially human way
    • Moving knowledge from other species (dolphins? flocks?)
    • Can someone show a good way of doing that? SF ants and NYC cockroaches are intelligent