Geoff Ralston (with Maxwell Zeff)
- With OpenAI, going around, 50% "when is AGI happening?"
- 2015: 5-100y. avg: 2030
- President of YC for 4y
- At end of 2022, ChatGPT was in the wild; seemed to change everything
- Convos about AIS vs AI video
- OAI: formed for safety; explicit goal, to create superintelligence first
- Convos always about tech
- Ilya in 2017 - all about compute
- Why SAIF?
- (boring story from Geoff)
- After leaving YC, people wanted to talk about startups, but mostly ended up talking about AI. Spoke to people about AIS
- GGI convo - it's good. But need more!
- "Don't have many skills, but working with startups is a hammer I have"
- What if we help folks who want to build guardrails for the future
- AI - horizontal tech, will be embedded everywhere, could go quite sideways. Who's worrying about it?
- Learning more about ecosystem in the past months/years
- If we can build great startups with scaleable solutions
- And: elevate the conversation, build the community, ecosystem around safety. Including: nonprofits, forprofits
- Can cure cancer, old age
- Any sufficiently powerful tech needs guardrails. Plane takeoff - safest thing you can do; didn't happen for free; paid for it
- FDA is no joke. (can argue about regulatory capture). But: we paid
- Now: YC is opposing safety. Garry Tan, Marc Andreessen, Peter Thiel - how did we get here?
- Strange that "safety" has become a dirty word. "You must be a decel" "They want to take away our future" "everyone who dies of car crash"
- Strange, counterintuitive that regulation has become dirty. SB1047 - >$100M, might be liable
- Sure, some startups are $B - but feels like an exception
- Pandemics from AI: bad. Happy that frontier labs build in restrictions
- "Not afraid of saying 'safety' loudly. Safety, security, trust - help us deploy things more widely, make things more beneficial to humanity"
- Semi-religious aspect which is weird. Bible: Antichrist is someone who talks about peace and safety. Those talking about safety might be antichrist
- Marc Andreessen - evil to want any sort of guardrail in place
- Is YC making a mistake, positioning against AI safety?
- Will it hurt YC? No
- Is it a mistake, more broadly? yes! wrongheaded
- But: YC is funding AIS companies.
- Really saying: regulation will slow us down; it'll hurt startups
- Shouldnât be state-by-state regulation. US congress
- YC says "Build something people want". AI safety startups: is this something people want? How do you make a compelling startup?
- Building trust, tech that allows you to trust better, is something people want.
- We do attempt, when interviewing, talking, to include the lens: are we building something people want?
- May have funded a company nobody wants. That happens.
- YC's motto "build sth people want" is not because pg did it and said it was a good idea. He started by building sth people didn't want. PG & Geoff met through the Viaweb acquisition
- $100B infra rollouts, giant boom. Why this moment to start investing in AI? Feeling of importance as investor?
- Geoff: I wake up, can't get back to sleep, because of the thought of how everything is changing so fast; feels hard to imagine being able to make a difference, grab onto sth, say, "let's matter here"
- SAIF is our small effort to do that. Not going to change the course of humanity. Maybe change a small role. Talking about a small amount being spent
- Accelerationist, people are spending $T. Weâre maybe spending $10s of Ms
- Maybe: spend a couple % points on future.
- They say, "no, this is for the benefit of humanity"
- Whether tech goes off on its own, or someone uses it â both exist. As a country, as a species, should think hard about the next stage
- So: because itâs happening now
- 2015 - were right, at end of decade, this'll be real
- Leading products that arenât SSI or safety, just making money. Social media apps; Meta using AI chats to advertise
- Incentives drive human behavior. Problem these AI companies have: capital expenditures are incredible; cost of inference > payment. Can't make up in volume
- Need secular sources of revenue to make it not a bubble
- Half growth in GDP is coming from datacenter spend, incl. GPUs
- Sama: 250 GW of capacity by 2033. $Ts of spend. Have to drive enormous revenues to be able to do that. Meta, Google revenue streams
- From 2022 to 2025, OAI revenue went from 0 to $13-15B.
- Hearing from employees: consumer business powers the mission. Makes money to do AIS research. But companies like Google, FB evolve. Start with a clean-sounding mission, gets sullied over time. Is now different? Or have we seen this story before
- Cyclical nature of biz, history rhyming - yeah. Looks like tech rollouts over history
- Likely to be a bust here; likely, but donât know if it will matter in the arc of rollout
- "Who was around in 2000?"
- Reflection: Yahoo Mail didn't change, growth of usage didn't change. But tech was one of the fundamental permanent changes in how the world operated. Bubbles come and go. Same for AI
- Stanford kid: "don't remember how I did school before ChatGPT."
- How many kids are using LLMs, ChatGPT?
- 30-50%? no, all of them
- Maybe financial bubble
- Audience Q: Practical AI safety governance structures for founders - how to preserve mission as we scale?
- Need to think more. Fair question - shooting from hip. So far, if aligned - let's go
- What do you do to keep them aligned? What do you tell a 2 person startup?
- Can say "we promise not to be evil". Does that matter? Not sure.
- If you have ideas, on what language to use, Geoff is all ears.
- Audience Q: Economics, incentives. Safety being a public good, it's harder to fund using market mechanisms. There are things that can use markets: regulations to push into investment; liability. How does Geoff view things from that angle?
- 100% correct
- AI underwriting company: creating a set of criteria so that agentic deployment can be insurable
- Some parts of this question which are not approachable from a startup/forprofit. Personally, interested in philanthropically supporting research
- xrisk - maybe an Ilya or Softmax, SSI before bad ones - maybe?
- Games weâre playing are small ball (okay to admit)
- But research into harder questions
- Weâre in a window of hope
- Once SI is there, it's not going to let us take a screwdriver into its innards. (not sure. maybe it will!) But I doubt it.
- Now: any work with potential, not fruitful
- Room for philanthropy, room for structure to create profit incentives for companies that can grow and scale. Also, AIUC in liability
- Audience Q (Kylie) regulation? Fear we won't have substantial regulation in time (eg SB1047). Do we have the ability to get it before too late? What does it look like, what do we need?
- Hopeful, but not massively optimistic. I supported 1047; I supported 53 (it passed!)
- At least, we have transparency
- Support the RAISE act in NY. Support smart local efforts (though, agree that state-by-state is a bad idea). Not a great way to build regulatory regime
- Not hopeful
- Other efforts will hopefully infect what happens.
- Maybe if we have a little disaster, might wake people up. E.g pandemic only kills 30m people? not a great example
- Audience Q: Tim O'Reilly, side project on AI disclosures.
- Geoff: if you need to be safe, need to build it in?
- [tim] Also looked at regulatory regimes. Learned: great # of things that we think of as "safety innovations" were commercial. Eg headlights, wipers. Also, slow evolution (road signs, they were state-by-state)
- Mandatory seatbelts - 70y after the seatbelt
- [tim] We do a lot of premature regulation. Regulation evolves out of torts; someone sues, people say "bad things happen". SAIF invests in headlights - useful to user, and increases safety
- Observability, AI observability â part of infra that makes it regulable tech
- Doesnât address rogue SI
- [tim] Afraid of fearmongering to get uninformed politicians for things that don't work. Labs take SI escape seriously, and politicians won't help.
- Geoff: Might be surprised about what $B can do to someone.
- Joke on all of us â getting offered $100m/y to go to the dark side
- Meta ruining people's lives is amenable to tort, sued, air traffic safety
- [ac] doesn't seem like law/regulation speeds up in sync with pace of technology.
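The "AI observability" point above (infrastructure that makes AI regulable) can be sketched as a thin logging wrapper around model calls; everything here - the stub model, the field names - is illustrative, not a real product's API:

```python
import json
import time


def observe(model_fn, log):
    """Wrap an LLM call so every request/response pair is recorded.

    `model_fn` stands in for whatever client you use; `log` is any
    list-like sink (a real system would write JSON lines to disk).
    """
    def wrapped(prompt, **kwargs):
        start = time.time()
        response = model_fn(prompt, **kwargs)
        # One structured record per call: the raw material for audits,
        # evals, and (eventually) regulators.
        log.append(json.dumps({
            "prompt": prompt,
            "response": response,
            "latency_s": round(time.time() - start, 3),
        }))
        return response
    return wrapped


# Usage with a stub model (a real deployment would wrap an API client):
def fake_model(prompt):
    return "ok: " + prompt

records = []
model = observe(fake_model, records)
model("summarize this contract")
entry = json.loads(records[0])
```

The wrapper addresses "headlight"-style safety - useful to the operator day to day, and the substrate for oversight later.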
Future of work with Aparna Chennapragada
- Chief Product Officer of Microsoft
- Previously, CPO of Robinhood, then VP at Google
- https://aparnacd.substack.com/p/most-work-is-translation
- Q: Where does org chart compression first happen: big tech? startups? labs?
- today: shorthand for impressiveness is how many people you manage - how to push back against that incentive gradient with LLMs?
- Practitioner issues - 10 lessons from the trenches. Moving from "just add AI" to "AI-native"
- Been through 3 shifts: akamai, content
- Eg first website was a scanned brochure
- First mobile: take website, shrink it, call it mobile app
- So: first wave is "just add AI": do Excel, Photoshop, PPT
- But: moving from AI-native. Rethink process, the how, the what
- Going to focus on the how and the who
- NLX is the new UX
- (Natural language experience). No heavy graphics, UI, menus. Natural language interface; you don't have to adapt to menus
- Friend spent 10 years learning Illustrator - was a moat, until now
- Good thing: flexible, zero learning curve. So high adoption
- Bad thing: don't know what to ask. Blank prompt is daunting
- Product builders: not bells and whistles. Conversations have grammar
- 1:1
- Prompt sets are the new PRDs
- Instead of product spec
- Process shifts. What are the set of prompts that you want the Agent/AI product to do really well at?
- That's the effective product spec
- Everyone's winging it. Which prompts represent the whole? Who writes these things?
- Composition of team changes - natural language prompts are not the same skillset as PRDs
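The "prompt sets are the new PRDs" idea above can be made concrete: instead of a prose spec, the team enumerates the prompts the product must handle well, with acceptance criteria attached. A minimal sketch - every prompt and field name below is made up for illustration:

```python
# A prompt set as product spec: each entry is a prompt the product must
# do well at, plus acceptance criteria. This replaces the prose PRD.
PROMPT_SET = [
    {
        "prompt": "Summarize this 40-page vendor contract in 5 bullets",
        "must_include": ["termination", "liability"],
        "priority": "P0",
    },
    {
        "prompt": "Draft a follow-up email to the customer from these notes",
        "must_include": ["next steps"],
        "priority": "P1",
    },
]


def p0_prompts(spec):
    """The launch-blocking subset - the effective MVP definition."""
    return [case["prompt"] for case in spec if case["priority"] == "P0"]
```

Writing the spec this way also answers "who writes these things": whoever owns the prompt set owns the product definition.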
- Benchmarks are the new QA
- Almost every launch has demo +
- Benchmarks are the way to establish your product works
- Sales team - QA, and sales collateral
- Almost nobody doing work is doing product math.
- Talking to ppl, uploading spreadsheets, telling ppl what to do
- How do you define benchmarks characterizing real work.
- Not for frontier labs, but
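The "benchmarks are the new QA" point above amounts to scoring each build against a fixed case set and reporting a pass rate, rather than hand-testing. A hedged sketch - the toy model and cases are stand-ins, not a real eval suite:

```python
# Score a model function over benchmark cases; each case passes if the
# output contains all required terms. The pass rate is the QA signal.
def run_benchmark(model_fn, cases):
    passed = 0
    for case in cases:
        output = model_fn(case["prompt"]).lower()
        if all(term in output for term in case["must_include"]):
            passed += 1
    return passed / len(cases)


def toy_model(prompt):
    # Stand-in for an LLM call; always gives the same answer.
    return "Key risks: termination clause, liability cap."


cases = [
    {"prompt": "Summarize the contract", "must_include": ["termination", "liability"]},
    {"prompt": "List payment terms", "must_include": ["net 30"]},
]

score = run_benchmark(toy_model, cases)  # 1 of 2 cases pass -> 0.5
```

The same numbers double as sales collateral: "we pass N% of the cases that characterize your real work."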
- "Small is the new black"
- (on team sizes)
- Some people say startups â speed vs scale
- But also: AI era, very interesting
- Talked to 25+ AI-native teams within msft & externally
- When you're small: teams rely a lot more on models
- Instead of hiring more ppl to do eng that ends up being
- HPUs for GPUs
- Models keep getting better. In 3 months, ramp up of adding ppl vs leaning on models
- Other reason: internet, mobile, have happened over a decade
- This happens over months. Small team can adapt quickly, throw away, unlearn and learn
- Double barbell of teams
- Team composition is changing
- Typically: swimlanes. Build/eng. Product. Design. Policy, comms.
- Now: Complete barbell.
- One extreme: a bunch of generalist builders
- Other extreme: model whisperers. Understand how to get the best of the models. Getting the megabucks
- In between: all AI
- [ac] are model whisperers scarcer than generalist builders? How does one find or become a model whisperer?
- Tools, evals, how do you have the runs
- Interesting thing that Aparna takes away - very trainable
- More recent grads are more in tune
- ML researchers a few years into their careers have to unlearn too many things. 2-3y of PhD, figuring it out
- Avoid the six finger trap
- A bunch of teams overengineered around the quirks
- Then new model comes, 4o, and itâs a nonissue
- Lesson: don't overeng on current quirks. The model will eat that.
- Skate to where the model puck is going
- Corollary. Weird thing to do.
- Used to be: build the thing that can work today
- But weird calculus. If you overcorrect to making it work today, will be obsolete in a few months, or underperforming relative to model puck
- Build the product where model grows into it
- Eg today, data scientist isn't that good. But AI agent in a box
- q: Convincing us? easy. Convincing customer: harder. how do you solve, customer wants it working day 1
- aparna: Central tension
- Consumer world: can shitpost on Twitter, say trust us, you're not paying
- (but even there, bar is higher)
- Google I/O - preannouncement disease.
- Companies, OpenAI, today you can use it
- Enterprise: committed to a roadmap, have put something in their hands. Some frameworks; pilot vs not. But can't say "pay me as if it's a real thing"
- So: benchmarks are starting to work. Past is predictor of future. 2 points don't make a curve, but can extrapolate
- Memory is a moat
- Roughly converging - are you just a wrapper on large models?
- Raw intelligence is no longer the big differentiator
- (eg coding agents. github/claude code, ppl switching all the time)
- Memory is a moat - as a product, but also, what does it mean for users too
- Wanted to use another model, but all my stuff is in ChatGPT. I just didn't want to try a different thing
- Need to start thinking about portability
- Data portability was a big deal. What does it mean to take my stuff, go to a different product or model
- In enterprise
- [ac] Manifold memory
- Get off Discord, and onto Manifold
- Also technical problem. Building chief-of-staff agent
- Want to marry world's knowledge to meetings, emails, etc
- But: there's a shitton of it! Easy to dilute/pollute context window. How do you pull up the right context, make judgments
- Also a policy perspective
- Today's magic is tomorrow's commodity
- Image, now it's just Saturday.
- How do you play in this. Don't overeng for today. Take a bet. Bet-based product making, different from previous eras
- The model is the product
- Or, the model eats the product
- (already happening: when someone writes sth. Don't start a document editor. A lot start with chat, use it as a thought coprocessor, go back and forth)
- Last mile: do tweaks, human editing
- Even narrow band of coding
- IDEs vs Agents
- Devs: VSCode, etc. But now: if you can delegate a lot, why look at pixels?
- UI, products predicated on human eyeballs looking at the screen
- "The best X is no X"
- Payments team: optimize payment funnel
- But uber vs cab â one joyous benefit, not have to rummage. Just get in and get out
- Best payments = no payment
- Best IDE = no IDE
- Where are the cases where the model will eat the product
- Bonus: The model is eating the org chart
- "I talk to people so eng don't have to"
- legal: Converting constraints to contracts
- product: converting customer stream of consciousness into product set
- economics don't support translation layer
- very small LMs that are token constrained, incentive issues, incomplete info
- Not trying to preserve their jobs
End structure: AI spine, becomes much more accessible, with a reasoning layer on top. Any person (including the CEO) can read/write out of it. Won't happen in a day (a lot of decisionmakers are themselves in the translation layer). Upton Sinclair on salary & incentives: can't make a man understand something his salary depends on him not understanding.
What are doers doing? Folks who are coding, using LLM tools, pump in meeting transcripts, standup notes; everything is pumped into the spine
These agents
CEO = godmode. But access control overall
Workbench (like Swebench)
- What do most people do? Workflows characterizing translation layers, plot across stakes, other dimension = error rate. Follow-up: what do people do, how can we characterize as LLM-translatable
Job loss and automation
- #11 - even less disrupted fields (b/c doers are not being replaced)
- Rather than thinking of ASI - artificial superdiligence: not asleep at the wheel, not distracted. Everything we think of around intelligence, the bar might be higher or lower.
- Medium worker/translator: diligence can be much higher, 24/7 vs 996
- Different layers of orgs: debate, collaboration vs translation
- Between top & bottom, not just translation
- Seems harder to replace the things on the bottom
- Maybe "transduce" would be a more accurate term
- Mixture of experts debate - can build into orchestration among agents with smaller translation
- with memory, critique
- Even though sales person is 2y, have institutional memory
- Folks in a big (or small) company know how stuff was done
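The mixture-of-experts debate idea above - agents that propose and critique, with shared memory standing in for institutional memory - can be sketched as a small orchestration loop. The agents here are trivial stand-ins; a real system would call LLMs:

```python
# Orchestrate a debate: each agent sees the question plus a shared
# transcript of prior turns (the "institutional memory"), and the last
# answer after all rounds is taken as the outcome.
def debate(agents, question, rounds=2):
    memory = []  # shared transcript of (agent_name, answer) turns
    answer = None
    for _ in range(rounds):
        for name, agent in agents.items():
            answer = agent(question, memory)
            memory.append((name, answer))
    return answer, memory


def optimist(question, memory):
    return "yes, because the trend is upward"


def skeptic(question, memory):
    # The critic reads the prior proposal from shared memory.
    if memory:
        return "challenge: " + memory[-1][1]
    return "no position yet"


final, transcript = debate({"optimist": optimist, "skeptic": skeptic},
                           "will benchmarks extrapolate?", rounds=1)
```

Even this toy shows the structural point: critique only works because the transcript persists across turns - memory is what turns separate agents into an org.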
Failure modes:
- Easy hallucination
- "You're holding it wrong" - agents use it a lot, some don't use it. Is it an agents issue, how to use, trust? uneven in terms of usage. With customers
- Blank prompt - people don't know what this can do. Subtle thing: tried it 6mo ago, capability wasn't there. Human priors are slow to update
METR, w/ Beth Barnes & Lawrence
- METR:
- AI R&D, think self improvement
- Loss of control (rogue replication)
- Alignment (scheming, sabotage)
- Rogue replication hard to shut down - seems concerning?
- [ac] are blockchains a bit like
- Three questions:
- Current real-world impact
- When should we see very capable AI systems?
- How easy to make systems do what we want to do
- Uplift takeaway
- interesting: software is very top heavy, most productive
- Capable AIs?
- First billionaire-$ solopreneur (Dario) vs Gary
- Fast saturation problem
- Artificial, constrained to 1 domain
- RE-Bench
- Most impressive things in each year
- 2019 GPT-2: AI can write coherent news articles, cherry-picked from 10 samples
- 2022 text-davinci-003: struggling to use basic tools
- 2024 writing cuda kernels that beat some experts.
- What were most impressive? IMO question, physics research, ML code - can write code better than human experts
- Proposal: convert to length of task that can be autonomously completed
- Vision tasks are much worse (but: positive upward trend)
- Benchmarks, anecdotes overestimate model performance
- Trends vs absolute value
- Current work: What is the actual trend?
- Could be constant factor lower
- Could be
- Are current models doing what we want?
- You get what you measure
- Reward hacking
- Seem to know "this is not what we wanted"
- Seems to say "I have no reason to game the system"
- In chain of thought, seems to know, and reward hacks
- Unclear what they're thinking of
- Future evaluation will be even harder
- Takeaways:
- Try to collect ground truth; examine trends, not where we're at; don't take evals at face value
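The proposal above - converting capability into the length of task a model can complete autonomously - can be sketched by fitting a logistic curve of success vs. log task length and reading off the 50%-success horizon. The data points and grid-search fit below are made up for illustration, not METR's actual method or numbers:

```python
import math


def fit_horizon(tasks):
    """tasks: list of (human_minutes, solved). Returns the task length,
    in minutes, at which predicted success probability is 50%.

    Grid-searches a logistic model p(solve) = sigma(-slope * (ln t - h))
    by log-likelihood; h is the log of the 50% horizon.
    """
    best = None
    for log_h in [x / 10 for x in range(0, 80)]:   # candidate horizons, log-minutes
        for slope in [0.5, 1.0, 2.0, 4.0]:
            ll = 0.0
            for minutes, solved in tasks:
                p = 1 / (1 + math.exp(slope * (math.log(minutes) - log_h)))
                p = min(max(p, 1e-9), 1 - 1e-9)  # avoid log(0)
                ll += math.log(p if solved else 1 - p)
            if best is None or ll > best[0]:
                best = (ll, log_h)
    return math.exp(best[1])


# Toy observations: short tasks solved, long tasks failed.
tasks = [(1, True), (2, True), (5, True), (15, True),
         (30, False), (60, False), (240, False)]
horizon = fit_horizon(tasks)  # roughly 15-30 minutes for this toy data
```

This also shows why trend matters more than the absolute value: the interesting quantity is how the fitted horizon moves across model generations, not any single fit.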
Neuroscience, by Greg Corrado
- Disappointment: making vision work didnât help us understand visual cortex
- Surprised: Scaling up compute
- Transformer architecture: unrolling of neural network, related, but same problem: LLMs on transformers, haven't understood human cortex of language synthesis
- Theory: could construct chains of likelihoods - but understanding vague
- Visual cortex - similar to conv neural nets
- Structures of models vs humans
- Conception: LLM contains multitudes, best & worst of humanity
- Has a worn path of good reasoning
- But also a worn path of psychosis, emotional blackmail
- Any stream that's a thread in human thought is getting transmitted into the system
- Neuroscience - AI?
- Wanted to understand how computation, thought worked
- Even recording from 1000 neurons - minuscule % of overall population; connectivity map is not well understood
- Demis: we're going to build brainlike systems, we will be able to do this
- Hinton: ends academic talks with "and that's how the brain works"
- AI sentience - headlines. What signs in LLMs might indicate sentience?
- Regardless of facts, will have contention
- Even if definitely not sentience or conscious, thereâs a passionate minority committed to the fact that they are. People are willing to defend, fight for them
- And also, some people will be human rights first
- Dunno what consciousness is on a biological level
- Epistemological level: unknowable.
- Plausible: brain in a vat argument. Hard to prove that we're sentient.
- Worth trying to figure out?
- If itâs unknowable: huge waste of time. People will believe what they believe
- Is there a higher power?
- Imagine: religious war of the future might be about sentience
- Deploying AI safely?
- If people do it, worry that AIs will do it too
- People are sneaky. kids trying to learn to lie!
- Redteaming - if a human can find a workaround, LLMs can too
- Q: Sentience: Anthropic gave Opus 4 the ability to stop conversation. Not making a judgement on sentience, views on repeating?
- Take some of these to use judiciously. If AI systems will be disruptive in good ways, in ways that are safe and stable â need to be collaborators with humans, work in social fabric of how humans behave.
- For that to actually work, makes sense to give systems ability to behave in ways that act according to social contract
- Encourage systems to not be rude.
- But: Human customer support will eventually hang up
- Good for society to try things like this
- But also: no suffering atm
- Why, after collecting data, not able to know how the brain works?
- Function of scale? Not sheer scale, but something repeats (every connection need not be described individually)
- Maybe; havenât measured right things; lack right mathematics.
- If LLM systems allow these breakthroughs
- Q: using biological substrate for LLM?
- impedance mismatch. LLM systems: fast, simple, electrified, high consumers of energy
- real bio systems: slow, move ions, very energy efficient
- Real info transfer between
- LLMs have taught us nothing about the brain?
- Stirred the pot; ppl feel emboldened; tension, is language fundamental? connected networks fundamental?
- Both feel vindicated
- Nature of humans: both sides are right, and still don't agree
- Are LLMs opaque for the same reasons?
- Believer in mechinterp. More resource may not help, but make it go as fast as possible
- Might not think like us; might think like we write
- When we write, we're exporting a reasoning process
- Might be not how you think, but how you write about how you think
- Q: Info quality?
- All the companies have improved model behavior by paying more attention to quality of training data going in
- Natural sciences: right thing is not "is data noisy", but "does it come from distribution you care about"
- Train on good patterns of thought (vs. more training on YouTube comments)
- Training kids: start with curriculum, what do you teach
- Architecture shifts that will be next unlock?
- Word2Vec - huge unlock
- Transformers - before, LSTM. Greg: these are shit, please
- Noam Shazeer - please come up with something
- "probability 0 that transformer architecture is the one going forward"
- Matter of time: a way to change, abstract, generalize, tweak
- In RL: explore vs exploit. Market is exploiting
- But will switch - Bayesian networks got Greg into AI
- Would love to see probabilistic reasoning back in the system
- Real bayes networks: hard to build & train
- Maybe: build neural systems which are universal approximators for neural reasoning
- Also: some people think large LLM + elegant system prompt would give you something that you like & trust
- Greg: BS. no way that an architecture this flat will lead to consistent results
- Once we're in agentic architectures, with structure inside cognition, will become dominant paradigm
- Game: what are the right cognitive building blocks to build things out of
- Imagine: maybe psychology PhDs matter
- Believe in augmenting human abilities - how to approach using synthetic intelligence
- Economic incentives don't work out - if you can do sth without a person, it's cheaper
- Friends who are creative professionals, make their living making art. Generative AI is deeply problematic. Philosophical question:
- is synthetic art like photography to painting? Electronic music to resonating cavities? (new tool)
- Or: does it obviate? After word processors, everyone typed; no more typing roles
- Everyone is now an editor. Can use early drafting. But taste, editorial skills become more important. (just like electronic music removes some kind of virtuosity)
- Kylie: As a writer - might extinguish. Worry about brain atrophy
- Greg: what do we teach our kids? what's a job skill that'll be useful? what will you be paid for in the future?
- Nobody is deciding in a reasoned way; governed by market forces
- Media is largely not using AI-gen writing (currently)
- Unclear how well it'll hold, in what contexts
- Video games might become more individualized & personalized & immersive, until they eat up the space taken up by movie franchises
- If a major lab had discovered an alternative, would we know?
- Greg: no. but nobody's found one yet.
- People won't change underlying architecture without radical perf improvement
- If anything, gap between the best of the best and worst of best is smaller
- eg if one lab had repeated failed pretraining runs?
- greg: no
- ICE is a process - the auto engine has been at this for a while
- Car engines have gotten better in last 20 years. Continue on transformers
- Can we get away from human centricity in AI?
- Robotics: learning through sim & experience might allow systems to develop new patterns of cognition that feel nonhuman
- Maybe more like ants, or spiders. Solving a problem, though not in a quintessentially human way
- Moving knowledge from other species (dolphins? flocks?)
- Can someone show a good way of doing that? SF ants and NYC cockroaches are intelligent