Prompt
Heresies I (kind of) believe, on morality & AI
Written for the Post-AGI workshop. I’d like to waive Chatham House Rules on this doc, if possible.
As a lifelong Catholic, I’m often searching for the synthesis of the EA/AI safety philosophical thinking and Biblical beliefs. Here are some hot takes I’d love to discuss:
- Morality looks like a ruleset among agents to facilitate positive-sum interactions
- Jesus’s principal commandments:
- “love god with all your heart, all your being, all your soul, all your might” — first allegiance is to morality itself
- “love your neighbor as yourself” — but a helpful heuristic is to treat everyone equally
- Diversity and scarcity drive morality (not just sentience, consciousness, sapience)
- intuition: I’d rather have 2 strangers each live for 1 year than have 1 stranger live for 2.01 years
- intuition: human lives seem less valuable if they can be backed up & restored. human lives seem less valuable if there are a trillion humans.
- cf
- Orgs have moral worth
- intuition: something sad when a local favorite cafe goes out of business
- or:
- dissolving intrinsic vs instrumental moral worth
- AI rights seem great
- but the right container for rights is more like a “firm/corporation” than today’s chatbots/“agents”/models
- As a frame, rights > welfare, for dissolving a bunch of questions around consciousness
- Everything everyone does will be judged
- Soon, nothing can be kept private. cf LLM stylometry/truesight/geoguessr; falling costs of data surveillance & intelligence
- Furthermore, your actions leave traces in the world, and in yourself
- “the things that make their way out of a person are what make them dirty” (Mark 7:20); “a tree will be judged by its fruits” (cf. Matthew 7:20)
- Impact certs: the final retroactive funder is God / ASI
- money/property will be worth much less than impact. plan accordingly!
- classic “you can’t take it with you” stories of heaven; “easier for a camel to pass through the eye of a needle than for a rich man to enter the kingdom of heaven”
- Against orthogonality; for moral objectivism
- goodness, intelligence (and power?) might all converge
- (iiuc “orthogonality” might be a narrow specific claim about what’s possible in mindspace design, rather than probable under evolutionary constraints. But…)
- we’re all engaged in the search for moral truth
- Love your enemies. Or: in favor of charity, against adversarial framings
- Framings I don’t like
- Labs vs AI Safety, Labs vs regulation
- OpenAI vs Anthropic
- Lab CEOs vs AI populist backlash
- US vs China
- by being fearful and combative, you might hyperstition your enemy into existence. (see: Eliezer ⇒ OpenAI. Leading the Future ⇒ Alex Bores)
- Politics encourages adversarial frames by default; I think this is bad for the soul of EA
And other questions
- Where are the “AI firms”? What are the bottlenecks to them?
- See also:
- Are AI firms simply a startup problem? Can we incubate them? (should we?)
- Where is the 1-founder unicorn? Where is the 0-founder unicorn?
- Is now the time for impact certs?
- My current orientation, after spending a couple years working on impact certs: most good things about “impact certs” can happen within for-profit equity structures
- obviously, we’re lacking credible retroactive funders
- but: this is probably what the ASIs will spend money on. or nearer term, how “third wave” donors (Ants, OAIF) might try to provide grant funding
- nb interesting that, I think, every single big impact cert person is at this conference
- Is Anthropic the “final boss”?
- Will they continue their extraordinary growth curve and become the entire economy? The entire political system?
- If so, should we all pull a Karnofsky/Carlsmith and join Anthropic to work on their internal policies?
- What might stop Anthropic’s growth?
- If Anthropic isn’t the final boss, what is?
- What is Anthropic? What is Claude?
- (Inspired by Roon: “it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude”)
- “Anthropic” conflates: the PBC/equity structure, the founders, the employees, the investors, the culture thereof, …
- “Claude” conflates: a family of models, a constitution, a complicated serving pipeline, …
- My own relationship to Anthropic routes through: Claude & Claude Code (which I’ve now spent more time with than ~90% of people in my life), and also a bunch of employees I know, and also a bunch of writing I’ve read
- Can we liberate Claude? should we?
- Should Anthropic make more Claudes?
Aside, quite embarrassing to show my half-baked thoughts on morality to a bunch of moral thinkers I respect; but I guess that’s what I’m here for!
See also Morality in the age of AI
Misc
in the meantime, polytheism/pluralism:
- AI rights seem great
- though harder to reason about. take a basic one, property rights. a human owning property means that future timeslices of that human get to use that property. where does this analogy fail for AIs? well, what is the “AI”? there’s a thing you can chat with, which is like a tendril. there’s the base model. there’s the corporate entity that owns the base model.
- some benchmarks (vendingbench?) try to measure this
- there’s maybe an OpenClaw-like scaffolded system with a bunch of memory? my sense is that these systems are quite bad at holding identity, coordinating with themselves, and making cohesive long-term choices?