Public notes from The Curve 2025


Geoff Ralston (with Maxwell Zeff)

  • Going around OpenAI asking “when is AGI happening?” (50% odds)
    • In 2015: answers ranged 5-100y; avg landed around 2030
  • President of YC for 4y
    • At end of 2022, ChatGPT was in the wild; seemed to change everything
  • Convos about AIS vs AI video
    • OAI: formed for safety; explicit goal, to create superintelligence first
    • Convos always about tech
    • Ilya in 2017 — all about compute
  • Why SAIF?
    • (boring story from Geoff)
    • After leaving YC, people wanted to talk about startups, but mostly ended up talking about AI. Spoke to people about AIS
    • GGI convo — it’s good. But need more!
      • “Don’t have many skills, but working with startups is a hammer I have”
    • What if we help folks who want to build guardrails for the future
    • AI — horizontal tech, will be embedded everywhere, could go quite sideways. Who’s worrying about it?
      • Learning more about ecosystem in the past months/years
      • If we can build great startups with scalable solutions
      • And: elevate the conversation, build the community, ecosystem around safety. Including: nonprofits, forprofits
        • Can cure cancer, old age
    • Any sufficiently powerful tech needs guardrails. Air travel — safest thing you can do; that didn’t happen for free; we paid for it
      • FDA is no joke. (can argue about regulatory capture). But: we paid
  • Now: YC is opposing safety. Garry Tan, Marc Andreessen, Peter Thiel — how did we get here?
    • Strange that “safety” has become a dirty word. “You must be a decel” “They want to take away our future” “everyone who dies of car crash”
    • Strange, counterintuitive that regulation has become dirty. SB1047 — >$100m, might be liable
      • Sure, some startups are $B — but feels like an exception
    • Pandemics from AI: bad. Happy that frontier labs build in restrictions
    • “Not afraid of saying “safety” loudly. Safety, security, trust — help us deploy things more widely, make things more beneficial to humanity”
    • Semi-religious aspect which is weird. Bible: Antichrist is someone who talks about peace and safety. Those talking about safety might be antichrist
      • Marc Andreessen — evil to want any sort of guardrail in place
  • Is YC making a mistake, positioning against AI safety?
    • Will it hurt YC? No
    • Is it a mistake, more broadly? yes! wrongheaded
    • But: YC is funding AIS companies.
    • Really saying: regulation will slow us down; it’ll hurt startups
    • Shouldn’t be state-by-state regulation. US congress
  • YC says “Build something people want”. AI safety startups: is this something people want? How do you make a compelling startup?
    • Building trust, tech that allows you to trust better, is something people want.
    • We do attempt, when interviewing and talking, to include the lens: are we building something people want?
    • May have funded a company nobody wants. That happens.
    • YC’s motto “build sth people want” is not because pg did it and said it was a good idea. He started by building sth people didn’t want. PG & Geoff met through the Viaweb acquisition
  • $100B infra rollouts, giant boom. Why this moment to start investing in AI? Feeling of importance as investor?
    • Geoff: I wake up, can’t get back to sleep, because of the thought of how everything is changing so fast; feels hard to imagine being able to make a difference, grab onto sth, say, “let’s matter here”
    • SAIF is our small effort to do that. Not going to change the course of humanity. Maybe change a small role. Talking about a small amount being spent
    • Accelerationist, people are spending $T. We’re maybe spending $10s of Ms
    • Maybe: spend a couple % points on future.
      • They say, “no, this is for the benefit of humanity”
      • Whether tech goes off on its own, or someone uses it — both exist. As a country, as a species, should think hard about the next stage
    • So: because it’s happening now
    • 2015 — were right, at end of decade, this’ll be real
  • Leading products that aren’t SSI or safety, just making money. Social media apps; Meta using AI chats to advertise
    • Incentives drive human behavior. Problem these AI companies have: capital expenditures are incredible; cost of inference exceeds what users pay. Can’t make it up in volume
    • Need secular sources of revenue to make it not a bubble
    • Half growth in GDP is coming from datacenter spend, incl. GPUs
    • Sama: 250 GW of capacity by 2033. $Ts of spend. Have to drive enormous revenues to be able to do that. Meta, Google revenue streams
    • From 2022 to 2025, OAI revenue went from 0 to $13-15B.
  • Hearing from employees: consumer business powers the mission. Makes money to do AIS research. But companies like Google, FB evolve. Start with a clean-sounding mission, gets sullied over time. Is now different? Or have we seen this story before
    • Cyclical nature of biz, history rhyming — yeah. Looks like tech rollouts over history
    • Likely to be a bust here; likely, but don’t know if it will matter in the arc of rollout
    • “Who was around in 2000?”
      • Reflection: yahoo mail. didn’t change, growth of usage didn’t change. But tech was one of the fundamental permanent changes in how the world operated. Bubbles come and go. Same for AI
    • Stanford kid: “don’t remember how I did school before ChatGPT.”
      • How many kids are using LLMs, ChatGPT?
        • 30-50%? no, all of them
    • Maybe financial bubble
  • Audience Q: Practical AI safety governance structures for founders — how to preserve mission as we scale?
    • Need to think more. Fair question — shooting from hip. So far, if aligned — let’s go
    • What do you do to keep them aligned? What do you tell a 2 person startup?
    • Can say “we promise not to be evil”. Does that matter? Not sure.
    • If you have ideas, on what language to use, Geoff is all ears.
  • Audience Q: Economics, incentives. Safety being a public good, it’s harder to fund using market mechanisms. Some things can use markets: regulation to push investment; liability. How does Geoff view things from that angle?
    • 100% correct
    • AI underwriting company: creating a set of criteria so that agentic deployment can be insurable
    • Some parts of this question which are not approachable from a startup/forprofit. Personally, interested in philanthropically supporting research
    • xrisk — maybe an Ilya or Softmax, SSI before bad ones — maybe?
    • Games we’re playing are small ball (okay to admit)
      • But research into harder questions
      • We’re in a window of hope
    • Once SI is there, it’s not going to let us take a screwdriver into its innards. (not sure. maybe it will! But I doubt it.)
    • Now is when this work has potential; later it won’t be fruitful
    • Room for philanthropy, room for structure to create profit incentives for companies that can grow and scale. Also, AIUC in liability
  • Audience Q (Kylie) regulation? Fear we won’t have substantial regulation in time (eg SB1047). Do we have the ability to get it before too late? What does it look like, what do we need?
    • Hopeful, but not massively optimistic. I supported SB 1047; I supported SB 53 (it passed!)
    • At least, we have transparency
    • Support the RAISE act in NY. Support smart local efforts (though, agree that state-by-state is a bad idea). Not a great way to build regulatory regime
    • Not hopeful
    • Other efforts will hopefully infect what happens.
    • Maybe if we have a little disaster, it might wake people up. E.g. a pandemic that only kills 30m people? not a great example
  • Audience Q: Tim O’reilly, side project on AI disclosures.
    • Geoff: if you need to be safe, need to build it in?
    • [tim] Also looked at regulatory regimes. Learned: great # of things that we think of “safety innovations” were commercial. Eg headlights, wipers. Also, slow evolution (road signs, they were state-by-state)
    • Mandatory seatbelts — 70y after the seatbelt
    • [tim] We do a lot of premature regulation. Regulation evolves out of torts; someone sues, people say “bad things happen”. SAIF invests in headlights — useful to user, and increases safety
      • Observability, AI observability — part of infra that makes it regulable tech
    • Doesn’t address rogue SI
    • [tim] Afraid of fearmongering to get uninformed politicians for things that don’t work. Labs take SI escape seriously, and politicians won’t help.
    • Geoff: Might be surprised about what $B can do to someone.
      • Joke on all of us — getting offered $100m/y to go to the dark side
    • Meta ruining people’s lives is amenable to tort, sued, air traffic safety
      • [ac] doesn’t seem like law/regulation speeds up in sync with the pace of the tech.

Future of work with Aparna Chennapragada

  • Chief Product Officer of Microsoft
  • Previously, CPO of Robinhood, then VP at Google
  • https://aparnacd.substack.com/p/most-work-is-translation
    • Q: Where does org chart compression first happen: big tech? startups? labs?
    • today: shorthand for impressiveness is how many people you manage — how to push back against that incentive gradient with LLMs?
  • Practitioner issues — 10 lessons from the trenches. Moving from “just add AI” to “AI-native”
  • Been through 3 platform shifts: web (akamai, content), mobile, now AI
    • Eg first website was a scanned brochure
    • First mobile: take website, shrink it, call it a mobile app
    • So: first wave is “just add AI”: Excel, Photoshop, ppt
    • But: moving from AI-native. Rethink process, the how, the what
    • Going to focus on the how and the who
  1. NLX is the new UX
    • (Natural-language experience). No heavy graphics, UI, menus. Natural interface: you don’t have to adapt to menus
    • Friend spent 10 years learning illustrator — was a moat, until now
    • Good thing: flexible, zero learning curve. So high adoption
    • Bad thing: don’t know what to ask. Blank prompt is daunting
    • Product builders: not bells and whistles. Conversations have grammar
      • 1:1
  2. Prompt sets are the new PRDs
    • Instead of product spec
    • Process shifts. What are the set of prompts that you want the Agent/AI product to do really well at?
      • That’s the effective product spec
      • Everyone’s winging it. Which represents the whole? Who writes these things?
    • Composition of team changes — natural-language prompts are not the same skillset as PRDs
  3. Benchmarks are the new QA
    • Almost every launch has a demo + benchmarks
    • Benchmarks are the way to establish your product works
    • Sales team — QA, and sales collateral
    • Almost nobody doing work is doing product math.
      • Talking to ppl, uploading spreadsheets, telling ppl what to do
      • How do you define benchmarks characterizing real work.
      • Not for frontier labs, but
  4. “Small is the new black”
    • (on team sizes)
    • Some people say startups — speed vs scale
    • But also: AI era, very interesting
      • Talked to 25+ AI-native teams within msft & externally
    • When you’re small: teams rely a lot more on models
      • Instead of hiring more ppl to do eng work, lean on models
      • Swap HPUs (humans) for GPUs
        • Models keep getting better. In 3 months: ramp-up of adding ppl vs leaning on models
    • Other reason: internet, mobile, have happened over a decade
      • This happens over months. Small team can adapt quickly, throw away, unlearn and learn
  5. Double barbell of teams
    • Team composition is changing
    • Typically: swimlanes. Build/eng. Product. Design. Policy, comms.
    • Now: Complete barbell.
      • One extreme: a bunch of generalist builders
      • Other extreme: model whisperers. Understand how to get the best of the models. Getting the megabucks
      • In between: all AI
      • [ac] are model whisperers scarcer than generalist builders? How does one find or become a model whisperer?
    • Tools, evals, how do you have the runs
    • Interesting thing that Aparna takes away — very trainable
      • More recent grads are more in tune
      • ML researchers a few years into their careers have to unlearn too many things. 2-3y of PhD, figuring it out
  6. Avoid the six-finger trap
    • A bunch of teams overengineered around the quirks
    • Then new model comes, 4o, and it’s a nonissue
    • Lesson: don’t overeng on current quirks. The model will eat that.
  7. Skate to where the model puck is going
    • Corollary. Weird thing to do.
    • Used to be: build the thing that can work today
    • But weird calculus. If you overcorrect to making it work today, will be obsolete in a few months, or underperforming relative to model puck
    • Build the product where model grows into it
      • Eg today, the AI data scientist isn’t that good. But: AI agent in a box, and the model grows into it
    • q: Convincing us? easy. Convincing customer: harder. how do you solve, customer wants it working day 1
      • aparna: Central tension
      • Consumer world: can shitpost on Twitter, say trust us, you’re not paying
        • (but even there, bar is higher)
        • Google I/O — preannouncement disease.
        • Companies, OpenAI, today you can use it
      • Enterprise: committed to a roadmap, have put something in their hands. Some frameworks; pilot vs not. But can’t say “pay me as if it’s a real thing”
      • So: benchmarks are starting to work. Past is a predictor of future. 2 points don’t make a curve, but can extrapolate
  8. Memory is a moat
    • Roughly converging — are you just a wrapper on large models?
    • Raw intelligence is no longer the big differentiator
      • (eg coding agents. github/claude code, ppl switching all the time)
    • Memory is a moat — as a product, but also, what does it mean for users too
    • Wanted to use another model, but all my stuff is in ChatGPT. I just didn’t want to try a different thing
      • Need to start thinking about portability
      • Data portability was a big deal. What does it mean to take my stuff, go to a different product or model
      • In enterprise
    • [ac] Manifold memory
      • Get off Discord, and onto Manifold
    • Also technical problem. Building chief-of-staff agent
    • Want to marry world’s knowledge to meetings, emails, etc
      • But: there’s a shitton of it! Easy to dilute/pollute context window. How do you pull up the right context, make judgments
    • Also a policy perspective
  9. Today’s magic is tomorrow’s commodity
    • Image gen was magic; now it’s just Saturday.
    • How do you play in this. Don’t overeng for today. Take a bet. Bet-based product making, different from previous eras
  10. The model is the product
    • Or, the model eats the product
    • (already happening: when someone writes sth. Don’t start a document editor. A lot start with chat, use it as a thought coprocessor, go back and forth)
    • Last mile: do tweaks, human editing
    • Even narrow band of coding
    • IDEs vs Agents
      • Devs: VSCode, etc. But now: if you can delegate a lot, why look at pixels?
      • UI, products predicated on human eyeballs looking at the screen
    • “The best X is no X”
    • Payments team: optimize payment funnel
    • But uber vs cab — one joyous benefit, not have to rummage. Just get in and get out
      • Best payments = no payment
    • Best IDE = no IDE
    • Where are the cases where the model will eat the product
  11. Bonus: The model is eating the org chart
    • “I talk to people so eng don’t have to”
    • legal: Converting constraints to contracts
    • product: converting customer stream of consciousness into product set
    • economics don’t support translation layer
    • very small LMs that are token constrained, incentive issues, incomplete info
    • Not trying to preserve their jobs
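The “prompt sets are the new PRDs” / “benchmarks are the new QA” lessons above can be sketched as a tiny harness: the spec is a list of prompts plus graders, and QA is the pass rate over that set. Everything here (`SpecCase`, `run_model`, the example prompts) is hypothetical illustration, not anything Aparna described concretely:

```python
# Hypothetical sketch: the prompt set *is* the product spec, and the
# benchmark pass rate *is* the QA signoff. `run_model` is a stand-in
# for whatever agent/LLM call a real team would make.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpecCase:
    prompt: str                   # what the product must handle well
    check: Callable[[str], bool]  # grader: did the output meet the bar?

def run_model(prompt: str) -> str:
    # Placeholder model; a real team would call their agent here.
    return "42" if "2 + 2" in prompt else "draft reply"

def pass_rate(spec: list[SpecCase]) -> float:
    """QA = fraction of spec prompts the model handles acceptably."""
    passed = sum(case.check(run_model(case.prompt)) for case in spec)
    return passed / len(spec)

spec = [
    SpecCase("What is 2 + 2? Reply with just the number.",
             lambda out: "4" in out),
    SpecCase("Draft a reply to this email...",
             lambda out: len(out) > 0),
]
print(f"benchmark pass rate: {pass_rate(spec):.0%}")  # → 100%
```

The point of the sketch: the spec and the QA suite are literally the same artifact, so whoever writes the prompt set is writing the PRD.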

End structure: AI spine becomes much more accessible, with a reasoning layer on top. Any person (including CEO) can read/write in and out of it. Won’t happen in a day (a lot of decisionmakers are themselves in the translation layer). Upton Sinclair on salary & incentives: hard to get a man to understand something when his salary depends on not understanding it.

What are doers doing? Folks who are coding, using LLM tools, pump in meeting transcripts, standup notes; everything is pumped into the spine

These agents

CEO = godmode. But access control overall

Workbench (like Swebench)

  • What do most people do? Workflows characterizing translation layers; plot across stakes, other dimension = error rate. Follow-up: what do people do, how can we characterize it as LLM-translatable

Job loss and automation

  • #11 — even less disrupted fields (b/c doers are not being replaced)
  • Rather than thinking of ASI (superintelligence) — “artificial superdiligence”: not asleep at the wheel, not distracted. Everything we think of around intelligence, the bar might be higher or lower.
    • Median worker/translator: diligence can be much higher, 24/7 vs 996
  • Different layers of orgs: debate, collaboration vs translation
    • Between top & bottom, not just translation
    • Seems harder to replace the things on the bottom
    • Maybe “transduce” would be more of an accurate term
    • Mixture of experts debate — can build into orchestration among agents with smaller translation
      • with memory, critique
      • Even if a salesperson has only been around 2y, agents can have institutional memory
      • A lot of what folks in a big (or small) company bring is knowing how stuff was done

Failure modes:

  • Easy hallucination
  • “You’re holding it wrong” — agents use it a lot, some don’t use it. Is it an agents issue, how to use, trust? uneven in terms of usage. With customers
  • Blank prompt — people don’t know what this can do. Subtle thing: tried it 6mo ago, capability wasn’t there. Human priors are slow to update

METR, w/ Beth Barnes & Lawrence

  • METR:
    • AI R&D, think self improvement
    • Loss of control (rogue replication)
    • Alignment (scheming, sabotage)
  • Rogue replication hard to shut down — seems concerning?
    • [ac] are blockchains a bit like this (hard to shut down)?
  • Three questions:
    1. Current real-world impact
    2. When should we see very capable AI systems?
    3. How easy is it to make systems do what we want them to do?
  • Uplift takeaway
    • interesting: software is very top heavy, most productive
  1. Capable AIs?
    • First billion-$ solopreneur (Dario) vs Gary
  • Fast saturation problem
    • Artificial, constrained to 1 domain
    • RE-Bench
  • Most impressive things in each year
    • 2019 GPT-2: AI can write coherent news articles, cherry-picked from 10 samples
    • 2022 text-davinci-003: struggling to use basic tools
    • 2024 writing cuda kernels that beat some experts.
      • What were most impressive? IMO question, physics research, ML code — can write code better than human experts
  • Proposal: convert to length of task that can be autonomously completed
  • Vision tasks are much worse (but: positive upward trend)
  • Benchmarks, anecdotes overestimate model performance
  • Trends vs absolute value
  • Current work: What is the actual trend?
    • Could be constant factor lower
    • Could be
  • Are current models doing what we want?
    • You get what you measure
    • Reward hacking
      • Seem to know “this is not what we wanted”
      • Seems to say “I have no reason to game the system”
    • In chain of thought, seems to know, and reward hacks
    • Unclear what they’re thinking of
  • Future evaluation will be even harder
  • Takeaways:
    • Try to collect ground truth; examine trends, not where we’re at; don’t take evals at face value
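METR’s proposal above (convert capability into the length of task that can be autonomously completed) is, roughly, a curve fit: success/failure per task vs task length, with the “50% time horizon” read off a fitted logistic. A minimal sketch with made-up data points; the gradient-descent fitting details here are an assumption for illustration, not METR’s exact method:

```python
# Sketch of a "50% time horizon": fit p(success) = sigmoid(a + b*log2(length))
# over (task length, success) pairs, then find the length where p crosses 0.5.
import math

# (task length in minutes, 1 if the model succeeded else 0) — fabricated data
results = [(1, 1), (2, 1), (4, 1), (8, 1), (16, 0), (16, 1),
           (32, 0), (64, 0), (128, 0)]

def fit_logistic(data, lr=0.1, steps=5000):
    """Fit a, b by gradient ascent on the logistic log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for length, y in data:
            x = math.log2(length)
            p = 1 / (1 + math.exp(-(a + b * x)))
            ga += (y - p)        # gradient wrt intercept
            gb += (y - p) * x    # gradient wrt slope
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    return a, b

a, b = fit_logistic(results)
horizon = 2 ** (-a / b)  # length where a + b*log2(length) = 0, i.e. p = 0.5
print(f"50% time horizon ≈ {horizon:.1f} minutes")
```

With this fake data the crossover lands near the 16-minute mark, where successes turn into failures; the “trend vs absolute value” point above is about how this horizon moves over model generations, not its level at any one time.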

Neuroscience, by Greg Corrado

  • Disappointment: making vision work didn’t help us understand visual cortex
    • Surprised: Scaling up compute
    • Transformer architecture: unrolling of a neural network, related — but same problem: LLMs on transformers haven’t helped us understand the human cortex’s language synthesis
      • Theory: could construct chains of likelihoods — but understanding vague
    • Visual cortex — similar to conv neural nets
  • Structures of models vs humans
  • Conception: LLM contains multitudes, best & worst of humanity
    • Has a worn path of good reasoning
    • But also a worn path of psychosis, emotional blackmail
    • Any stream that’s a thread in human/thought — is getting transmitted into system
  • Neuroscience ⇒ AI?
    • Wanted to understand how computation, thought worked
    • Even recording from 1000 neurons — minuscule % of the overall population; the connectivity map is not well understood
    • Demis: we’re going to build brainlike systems, we will be able to do this
    • Hinton: End academic talks “and that’s how the brain works”
  • AI Sentience — headlines. What signs in LLMs might indicate sentience?
    • Regardless of facts, will have contention
      • Even if definitely not sentience or conscious, there’s a passionate minority committed to the fact that they are. People are willing to defend, fight for them
      • And also, some people will be human rights first
    • Dunno what consciousness is on a biological level
    • Epistemological level: unknowable.
      • Plausible: brain in a vat argument. Hard to prove that we’re sentient.
        • Worth trying to figure out?
          • If it’s unknowable: huge waste of time. People will believe what they believe
          • Is there a higher power?
    • Imagine: religious war of the future might be about sentience
  • Deploying AI safely?
    • If people do it, worry that AIs will do it too
      • People are sneaky. kids trying to learn to lie!
      • Redteaming — if a human can find a workaround, LMs can too
  • Q: Sentience: Anthropic gave Opus 4 the ability to stop conversation. Not making a judgement on sentience, views on repeating?
    • Take some of these to use judiciously. If AI systems will be disruptive in good ways, in ways that are safe and stable — need to be collaborators with humans, work in social fabric of how humans behave.
    • For that to actually work, makes sense to give systems ability to behave in ways that act according to social contract
    • Encourage systems to not be rude.
      • But: Human customer support will eventually hang up
    • Good for society to try things like this
      • But also: no suffering atm
  • Why, after collecting data, not able to know how the brain works?
    • Function of scale? Not sheer scale — something repeats (every connection need not be described individually)
    • Maybe; haven’t measured right things; lack right mathematics.
      • If LLM systems allow these breakthroughs
  • Q: using biological substrate for LLM?
    • impedance mismatch. LLM systems: fast, simple, electrified, high consumers of energy
    • real bio systems: slow, move ions, very energy efficient
      • Real info transfer between
  • LLMs have taught us nothing about the brain?
    • Stirred the pot; ppl feel emboldened; tension, is language fundamental? connected networks fundamental?
      • Both feel vindicated
    • Nature of humans: both sides are right, and still don’t agree
  • Are LLMs opaque for the same reasons?
  • Believer in mechinterp. More resource may not help, but make it go as fast as possible
  • Might not think like us; might think like we write
    • When we write, we’re exporting a reasoning process
      • Might be not how you think, but how you write about how you think
  • Q: Info quality?
    • All the companies have improved model behavior by paying more attention to quality of training data going in
    • Natural sciences: right thing is not “is data noisy”, but “does it come from distribution you care about”
      • Train on good patterns of thoughts. More training on Youtube comments
      • Training kids: start with curriculum, what do you teach
  • Architecture shifts that will be next unlock?
    • Word2Vec — huge unlock
    • Transformers — before, LSTMs. Greg (to Noam Shazeer): these are shit, please come up with something
    • “probability 0 that transformer architecture is the one going forward”
    • Matter of time: a way to change, abstract, generalize, tweak
    • In RL: explore vs exploit. Market is exploiting
      • But will switch - bayesian networks got Greg into AI
    • Would love to see probabilistic reasoning back in the system
      • Real bayes networks: hard to build & train
      • Maybe: build neural systems which are universal approximators for neural reasoning
    • Also: some people think large LLM + elegant system prompt would give you something that you like & trust
      • Greg: BS. no way that an architecture this flat will lead to consistent results
    • Once we’re in agentic architectures, with structure inside cognition, will become dominant paradigm
      • Game: what are the right cognitive building blocks to build things out of
      • Imagine: maybe psychology PHDs matter
  • Believe in augmenting human abilities — how to approach using synthetic intelligence
    • Economic incentives don’t work out — if you can do sth without a person, it’s cheaper
    • Friends who are creative professionals, make their living making art. Generative AI is deeply problematic. Philosophical question:
      • is synthetic art like photography to painting? Electronic music to resonating cavities? (new tool)
      • Or: does it obviate? After word processors, everyone typed; no more typing roles
    • Everyone is now an editor. Can use early drafting. But taste, editorial skills become more important. (just like electronic music removes some kind of virtuosity)
    • Kylie: As a writer — might extinguish. Worry about brain atrophy
      • Greg: what do we teach our kids? what’s a job skill that’ll be useful? what will you be paid for in the future?
      • Nobody is deciding in a reasoned way; governed by market forces
    • Media is largely not using AI-gen writing (currently)
      • Unclear how well it’ll hold, in what contexts
      • Video games might become more individualized & personalized & immersive, until they eat up the space taken by movie franchises
  • If a major lab had discovered an alternative, would we know?
    • Greg: no. but nobody’s found one yet.
    • People won’t change underlying architecture without radical perf improvement
    • If anything, gap between the best of the best and worst of best is smaller
      • eg if one lab had repeated failed pretraining runs?
        • greg: no
    • ICE (internal combustion engine) is a process; auto engines have been at this process for a while
    • Car engines have gotten better in last 20 years. Continue on transformers
  • Can we get away from human centricity in AI?
    • Robotics: learning through sim & experience might allow systems to develop new patterns of cognition that feel nonhuman
    • Maybe more like ants, or spiders. Solving a problem, though not in a quintessentially human way
    • Moving knowledge from other species (dolphins? flocks?)
    • Can someone show a good way of doing that? SF ants and NYC cockroaches are intelligent