@October 28, 2024 election-promises usage notes
- Started with Claude Artifacts
- (also wtf Claude really needs to make these things exportable)
Mock up a website design to inform voters of likely outcomes of the presidential election. The website name is "If elected, I will...". The outcomes are taken from predictions on the site Manifold Markets. Show an image of Harris saying "reschedule marijuana from a schedule 1 drug" and Trump saying "organize a ceasefire in Ukraine before 2026 midterms".
Put Kamala on the left and Trump on the right of their cards; make the quotes look like speech bubbles coming out of their avatar
But it didn’t actually have Trump/Harris images, so I grabbed them from Manifold
Replace the images with https://manifold.markets/political-candidates/harris.png and trump.png; add a little tail to the speech bubbles so it looks like it comes out of their mouths
Could paste in images, which was cool/timesaving

- But inability to see the results with real images led me to switch to yield…
- See discussion on https://yield.sh/edit/election-promises/app.tsx
- Looking at the diffs sometimes helped when code didn’t match my expectations
- I guess it saves a bit of time too
- Applying diffs via Haiku seemed to basically always work, nice
- Could write an eval for Cerebras
- Had to be prompted to refactor code
- I sometimes dropped in to tweak wording exactly (best for copy text, where my taste is better than Claude’s)
- Not having image paste was a bit awkward; I copied in some text instead
- Maybe the ideal is pasting in a URL?
- With images, the benefit is in providing exactly the context and no more
- Benefits of yield over Claude Artifacts
- Immediately hosted & shareable (eg sent to Nate)
- Can render externally linked images…
- Can tweak code directly for things I know how to do, instead of asking via prompt
- Can import custom libraries from esm.sh
- Benefits of Claude Artifacts over yield
- Feels a bit more reliable? idk
- Version tabbing is nice, maybe we should steal it
- Though in practice I’m not really going back to old versions very much lol. It’s more about peace of mind, having an undo function
- Maybe instead of a version number, each version saves a commit message based on the prompt
- Maybe it’s integrated inline to the chat, vs separate numbering system?
- Export formats
- Could export to single HTML page which is hosted… somewhere (git?)
- There’s gotta be some cheap/free place to just host static HTML files at scale. (Cloudflare?)
- Some stuff not needed for export (like uiw react editor)
- Nice to have source code too, for modifying + exporting to NextJS I guess (did this for futarchy.dev)
- Could just add a copy button
Do want to have side-by-side code + preview for these cases
Accepts image-pasted input
shadcn/ui out of the box
Fewer clicks (applied & saved automatically)

At least, get rid of alert popup, and show current loaded version “7/9”. Maybe also show timestamp.
Try: fast diff gen + apply with cerebras, versioned on every checkpoint?
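Here is a minimal sketch of the checkpoint ideas above: a version label like “7/9”, a commit-style message derived from the prompt instead of a bare number, and undo for peace of mind. It assumes one checkpoint per applied prompt. `CheckpointHistory` and `summarize` are hypothetical names, not yield or Claude Artifacts APIs. A real version would presumably ask a fast model to write the message.

```ts
// Hypothetical checkpoint: one entry per applied prompt.
interface Checkpoint {
  code: string;
  message: string; // derived from the prompt, instead of a bare version number
  createdAt: Date;
}

// Naive stand-in for a model-written commit message: first line of the prompt, truncated.
function summarize(prompt: string, maxLen = 60): string {
  const firstLine = prompt.trim().split("\n")[0];
  return firstLine.length > maxLen ? firstLine.slice(0, maxLen - 1) + "…" : firstLine;
}

class CheckpointHistory {
  private checkpoints: Checkpoint[] = [];
  private current = -1; // index of the currently loaded checkpoint

  // Append-only: saving after an undo adds a new checkpoint rather than discarding history.
  save(code: string, prompt: string): void {
    this.checkpoints.push({ code, message: summarize(prompt), createdAt: new Date() });
    this.current = this.checkpoints.length - 1;
  }

  // Step back one checkpoint and return its code (undefined if already at the first).
  undo(): string | undefined {
    if (this.current <= 0) return undefined;
    this.current -= 1;
    return this.checkpoints[this.current].code;
  }

  // Step forward one checkpoint (undefined if already at the latest).
  redo(): string | undefined {
    if (this.current >= this.checkpoints.length - 1) return undefined;
    this.current += 1;
    return this.checkpoints[this.current].code;
  }

  // 1-indexed label for the UI, e.g. "7/9".
  label(): string {
    return `${this.current + 1}/${this.checkpoints.length}`;
  }

  currentMessage(): string | undefined {
    return this.checkpoints[this.current]?.message;
  }
}
```

Keeping history append-only matches the “peace of mind” goal: nothing is thrown away, and the label just tracks where the pointer is.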
@October 25, 2024 Playing with different futarchy.dev versions
@October 25, 2024 Check in with Matt (futarchy.dev + yield.sh)
- Catch ups, childcare, what happened, Matt & childcare
- Overall:
- yield.sh: upgraded sonnet, db generation works out of the box a lot, one more prompt that passed “want to share” test
- shareability
- Intuition: turn up the fun, turn up variability, get something that you are
- Sonnet should make things a lot better
- Old sonnet 3.5 vs new sonnet 3.5 to build stuff in Minecraft
- Even though it’s not much better — on this test it’s so much better. Some speculation — maybe get better spatial & visual reasoning. Feels handwavy
- Some tradeoff between “how to get started” vs constraining space
- Pippa: Maybe in early days, don’t constrain the space
- Like Matt where you find a set of users to force, do 15 prompts, get past the hurdle
- Instinctive user behavior in the early days, the 2 things Pippa sees in the left panel. Really interesting behavior
- Started talking with Nate Foss (founder of Gather Town) on a joint project to hack on (prediction market futarchy)
- Matt: Usual model, be pretty explicit, it’s something we’re pretty into, don’t know where it’ll go, maybe won’t be a thing. Pretty open minded, hope that it evolves into something that’s important to me
- In an ideal world, have someone to do it with. Not asking to accept a marriage proposal
- Way Matt usually thinks about it — do we feel more productive, ideally more than sum of parts
- Anchor on a reasonable level of uncertainty, with upside case being clear
- Could just be really interesting to — as well as hacking together — spend some time talking about what it could become, seeing what his instincts
- There’s the core of something interesting here, suspect there’ll be a new behavior somewhere, needs to come up
- Prompting, sharing, a loop around it
- Fun to see whether we feel aligned or not
- Slight tangent: was at an AI conference last week. “Why do AI panels suck?” Talked about how it’s a weird technology: the primitives are so far ahead of productization
- Other than ChatGPT, few products that use the primitives
- Panel with chief product officers of OAI, Ant, but also Instagram
- Riff on early Insta: they got so obsessed with features, but in the end, what they thought was important wasn’t, and the things they didn’t think about were key
- Like the square — scroll back to early insta, became iconic
- More important: have the right primitive, and then create enough variance that interesting behaviors get amplified
- Just getting people to use it even if it feels weird
- Very generative, the 2 of them, talking about what it might be. But then — let’s build it, put in people hands
- What are the new primitives?
- Arbitrary image, arbitrary prompt and get high quality response
- Core premise: ChatGPT will not be the most popular product in 5 years. We’re basically not using the primitives in 5 years, the way we’re using them is boring
- (Already told pippa) 2 panel of og 6 of ChatGPT
- When they finished GPT4 training in May/Jun of ‘22, 2 posttraining teams, smaller one was chat, that was not that interesting, not the cool thing (which was instruction tuning)
- In Nov there were 7 people in Chat
- Median guess for #users was 50k after 1mo. 1 person guessed 1m, actually 100m
- Ran a friends & family beta with 50 people from Aug to Oct. Average daily actives were 4
- Reflection: 50 is too small to observe real variation in usage. Interestingly, all 4 daily users were using it for debugging
- Nobody was using it for things that drive most of the usage — 50% of GPT might still be people who are coding
- Interesting, given it’s inarguably the fastest growing product
- Small usage at start can obscure what’s interesting
- Also: debugging matters, something feels applicable around how you generate enough breadth to allow the product to blossom
- Some equivalent of temperature for this, where you can increase the variance in the actual code generation
- Artificial equivalent of 4 different types, where you increase the variance at level of
- Computer usage
- [m] Feels like it’s been pretty underappreciated
- What to do next
- advice for The Curve (esp — maintaining value after the conference?)
- Won’t be able to get out to SF in Nov; will be in London
- From Jan, will be in SF a lot. Keep having a lot of other things
- Also Progress Conference, but sad I’m not able to make it
- E.g. how did Matt do it for AISS
- AISS is very formal
- Been to a bunch of good conferences with good informal ways of following up
- Genuinely active whatsapp groups
- For AISS — easy bit, knew there was another one in Korea in May. So formal touchpoints leading up to May.
- Road to summit, lay trail for next of agenda
- One thing Matt really likes: intl stuff, cool to have a London day, SF day, 3mo later
- Dialog every year, it works really well. Some city that no one’s from (last one was Frankfurt)
- Most important: events can be fantastic ways to build community, but it’s real work. Need someone who wakes up thinking “what do I do to stimulate”
- Ben Casnocha. Runs Village Global, man about town in SV, runs a thing called Satori (annual conf)
- Satori whatsapp group remains active all year. Ben just makes a massive effort
- Part linkedin, news alerts — any time anyone who attended the last one attended something cool, he posts in that group
- Whenever he writes anything that is vaguely topical for people, posts on it
- Rhythm “Can’t believe Pippa got knighted”
- Pippa: Group chat from events conference, one of the only networking things that is valuable, active, the exact thing playbook
- Plus they actually have a monthly call where people catch up
- advice for Manifold
Next steps for checkpoint
- try the “variations” approach
- put it in front of 15 people
- build out the futarchy thing here
@October 24, 2024
- How to do a db migration?
- Adding upvotes sucks
- Even though no migration needed!
- Like, need a new join table, then need to add APIs to write to and sync from the table, ugh
- Just want to pretend everything’s in memory and just works
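Here is a sketch of what “pretend everything’s in memory” could mean for upvotes. It assumes the data fits in memory and that persisting a JSON snapshot is acceptable. The store is a map from each post to the set of users who upvoted it, with no join table, sync API, or migration. `UpvoteStore` is a hypothetical name, and concurrency and multi-process writes are ignored.

```ts
// Hypothetical in-memory upvote store: postId -> set of userIds who upvoted.
// Persistence is just a JSON snapshot, so adding upvotes needs no new table.
class UpvoteStore {
  private votes = new Map<string, Set<string>>();

  // Toggle a user's upvote; returns true if the user now upvotes the post.
  toggle(postId: string, userId: string): boolean {
    let voters = this.votes.get(postId);
    if (!voters) {
      voters = new Set();
      this.votes.set(postId, voters);
    }
    if (voters.has(userId)) {
      voters.delete(userId);
      return false;
    }
    voters.add(userId);
    return true;
  }

  count(postId: string): number {
    return this.votes.get(postId)?.size ?? 0;
  }

  // Serialize to plain JSON (Sets become arrays), e.g. for writing to a volume.
  snapshot(): string {
    const plain: Record<string, string[]> = {};
    for (const [postId, voters] of this.votes) plain[postId] = [...voters];
    return JSON.stringify(plain);
  }

  // Rebuild a store from a snapshot produced by snapshot().
  static restore(json: string): UpvoteStore {
    const store = new UpvoteStore();
    const plain = JSON.parse(json) as Record<string, string[]>;
    for (const [postId, voters] of Object.entries(plain)) {
      store.votes.set(postId, new Set(voters));
    }
    return store;
  }
}
```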
@October 22, 2024 Orchestrating prompts
- Fundamentally: how to structure code when language models are creative but unreliable?
- Analogy: human users are also creative and unreliable, eg on a social media site, or Github
- Also corporations, on AWS
- But there
- Hm… Instead of complicated fanout structure for eg generating a slug for an app, can just ask for structured text output:
- Or could try some kind of structured JSON or tool calling stuff
<antArtifact identifier="dashboard-component" type="application/vnd.ant.react" title="React Component: Metrics Dashboard">
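The tag above is the opening line of a Claude artifact, and it already carries a slug-like `identifier`. Here is a sketch of pulling attributes out of that kind of structured text output. It assumes double-quoted attribute values on a single opening tag. `parseArtifactTag` and `toSlug` are hypothetical helpers, and the regex is a deliberate simplification rather than a real HTML/XML parser.

```ts
// Hypothetical parser for one structured tag such as
// <antArtifact identifier="..." type="..." title="...">
function parseArtifactTag(text: string): Record<string, string> | null {
  const tag = text.match(/<antArtifact\b([^>]*)>/);
  if (!tag) return null;
  const attrs: Record<string, string> = {};
  // key="value" pairs; assumes double quotes and no escaped quotes inside values.
  for (const m of tag[1].matchAll(/([\w-]+)="([^"]*)"/g)) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

// Accept the identifier as an app slug only if it is lowercase kebab-case.
function toSlug(attrs: Record<string, string> | null): string | null {
  const id = attrs?.identifier;
  return id && /^[a-z0-9]+(-[a-z0-9]+)*$/.test(id) ? id : null;
}
```

If the slug check fails, the caller could re-prompt or fall back to a generated name. Either way there's no multi-call fanout.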
@October 19, 2024
- Bun is seriously so good, shell scripting instead of package.json :chefskiss:
- This Vercel AI SDK thing seems annoying to merge with Cerebras. Let’s just use bare cerebras for now.
- Cerebras is fast (like, 1s to generate apps from scratch) but the apps are somewhat worse than sonnet’s
- Could be good for applying diffs
- Regenerating apps is a neat party trick, but users probably want a more iterative, diffusion-y approach
- See Aider benchmarks: https://aider.chat/docs/leaderboards/
- meta: there’s probably going to be lots of things like this, where there’s a tradeoff between price, latency, and quality among different LLM providers
- Useful thinking for world of LLMs, eg o1 is very high quality vs cerebras llama-7.1b is very fast
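For the “good for applying diffs” idea, one common edit format has the model emit pairs of exact text to find and text to substitute. Aider’s search/replace blocks are one example. Here is a sketch of a strict applier, assuming the edits arrive already parsed as `{ search, replace }` objects. `applyEdits` is a hypothetical name.

```ts
// Hypothetical parsed edit, in the spirit of Aider's SEARCH/REPLACE blocks.
interface Edit {
  search: string; // exact text expected in the current file
  replace: string; // text to put in its place
}

// Apply edits in order. Each search string must occur exactly once, so an
// ambiguous or hallucinated edit throws instead of silently corrupting code.
function applyEdits(source: string, edits: Edit[]): string {
  let text = source;
  for (const { search, replace } of edits) {
    if (search === "") throw new Error("empty search text");
    const first = text.indexOf(search);
    if (first === -1) throw new Error(`search text not found: ${search.slice(0, 40)}`);
    // A second match means the edit is ambiguous; refuse rather than guess.
    if (text.indexOf(search, first + 1) !== -1) {
      throw new Error(`search text is ambiguous: ${search.slice(0, 40)}`);
    }
    text = text.slice(0, first) + replace + text.slice(first + search.length);
  }
  return text;
}
```

Requiring exactly one match makes a fast but sloppy model fail loudly. A failure could then trigger a retry with a slower, higher-quality model.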
@October 8, 2024
- Notes from Ben Evans:
- https://www.ben-evans.com/benedictevans/2024/6/8/building-ai-products
A stock reaction of AI people to examples like mine is to say “you’re holding it wrong” - I asked 1: the wrong kind of question and 2: I asked it in the wrong way. I should have done a bunch of prompt engineering! But the message of the last 50 years of consumer computing is that you do not move adoption forward by making the users learn command lines - you have to move towards the users.
Looking at this on another axis: with any new technology, we begin by trying to make it fit the problems we already have, while the incumbents try to make it a feature (hence Google and Microsoft spraying LLMs all over their products in the last year). Then startups use it to unbundle the incumbents (to unbundle search, Oracle or Email), but meanwhile, other startups try to work out what we could build that would be truly native to the new technology. That comes in stages. First, Flickr had an iPhone app, but then Instagram used the smartphone camera, and used local computing to add filters, and further on again, Snap and TikTok used the touch screen, video and location to make something truly native to the platform. So, what native experiences do we build with this, that aren’t the chatbot itself, or where the ‘error rate’ doesn’t matter, but abstracts this new capability in some way?
@October 7, 2024
- Playing with Eden Treaty; need to figure out how to pass server-generated Elysia types to clients?
- Maybe a codegen solution is actually better vs magic type inference
- Spent a while trying to reroute to OpenRouter; it’s not an exact match for Anthropic API…
- Could go all in on OpenRouter
- … or could try Vercel’s ai-sdk…
@October 5, 2024
- Hm, try catching errors the way Val Town does?
- Bug: modifyCode sometimes just returns a subset
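One cheap guard against modifyCode returning only a subset: compare the output with the input, and flag large shrinkage or “rest of code” placeholder comments. On a hit, retry or fall back. The 50% line threshold and the placeholder regex are illustrative guesses, and `looksTruncated` is a hypothetical name.

```ts
// Placeholder comments models sometimes emit instead of the full file (illustrative).
const PLACEHOLDER = /(\/\/|\/\*)\s*(\.\.\.|…)\s*(rest|remaining|existing)\b/i;

// Heuristic: flag output that lost over half its lines or contains a placeholder.
function looksTruncated(before: string, after: string, minRatio = 0.5): boolean {
  const beforeLines = before.split("\n").length;
  const afterLines = after.split("\n").length;
  if (afterLines < beforeLines * minRatio) return true;
  return PLACEHOLDER.test(after);
}
```

A legitimate large deletion would also trip the line check. Treat a hit as a reason to double-check, not as a hard failure.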
Testing user interaction with LLMs?
- Spin up a new app, make various calls, have LLMs input commands and test them?
- For each app, ask LLM to create a testing plan for each one??
- Write tests for the different platform components (eg auth, db, etc?)
- Note: When more of the coding happens via LLM, then testing/verifying becomes more expensive, takes up a larger chunk of human time.
Flo user test
- Expected code to be on the same side as the chat
- Generated movie recs, enter a movie.
- Oh, I really like “The intouchables”
- Get more recommendations
- I want to change to be book recommendations ⇒
- Looking at diff through what’s changed, just curious
- When Flo uses lang models to generate code, feel like I want to check that it makes sense
- If it had just applied, I wouldn’t have done it
- Tried getting Claude to do it
- don’t have a Claude API key
- Can you edit the diff?
- “You have to apply”. Oh, you can edit the code
- Cool I can get my API key, I guess
- I guess the reason is that we don’t want people to use your API keys, maybe
Auto-apply diffs? cursor doesn’t, v0 does…
Tried to save just now with ctrl+s
- Formatting feels weird, thought it would auto format
- Now it doesn’t work — is that something I did?
- [a] Still had to take over, check console logs, etc
Ended up having to delete Flo’s stuff because of local vs remote sync issues… ideally figure out a way to merge volumes and databases?
- Volumes could go with git
- Databases…. not sure how to generally merge sqlite stuff
- Cleaning up data sucks, dealing with data incompatibility sucks… db migration sucks
- Wonder how to make all this stuff much better — how to get to developer nirvana with databases?
- (go schemaless?)
- (why do you need a database when your data fits in memory?)
Oh, should ask about background with coding, code gen tools as part of interview
‣
@October 4, 2024
- More context to Claude:
- Internal APIs
- LLM capabilities like /anthropic. Maybe add /fal.
- Setting up DBs
- Other files in the same project
- NPM packages — how to know which esm.sh things will work?
- Approaches for context on internal APIs:
- Just write a text prompt
- Build on some docusaurus-like thing
- … or build our own…
- Bun docs are a collection of .md files https://github.com/oven-sh/bun/tree/main/docs
- Get it for free(ish) from Elysia & Swagger?
- Hm… Elysia is a bit of a commitment
- Also, do we want strong typing? Or does LLM solve this?
- Scalar looks nice though!
- Get Claude to do documentation?
- see what Hanson is doing
- Generate from example apps?
- Maybe we should start versioning APIs…
@October 2, 2024
- Auth
- Okay, now trying to fix something based on the way the csrf token is set. Claude noticed that __Host-authjs.csrf-token= matches, but not authjs.csrf-token=
- Database
- How to handle database migrations in userspace?
- Generate via Drizzle? Not sure if we want to be locked into that though, adds extra complexity
- Generate a migration somehow, apply diff in bun, or somehow specify the init script to rerun — without dropping everything
- Do something like https://david.rothlis.net/declarative-schema-migration-for-sqlite/
- (could migrate python script to JS)
- hm, doesn’t do data migrations though
- … ask Claude to generate the migration??
- very tempted… Could introspect the existing sql schema maybe, then diff against intended schema.
- Go schemaless???
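Here is a sketch of the “introspect the existing schema, then diff against the intended schema” idea. It assumes both schemas are already available as plain `table -> column -> definition` maps. For SQLite, the current side could be built from `PRAGMA table_info`. The sketch only emits additive SQL and reports everything else as a warning, because type changes and dropped columns need a real data migration (SQLite’s ALTER TABLE can’t change a column’s type). `planMigration` is hypothetical, and identifiers are assumed safe to use unquoted.

```ts
// Column name -> SQL definition, e.g. { id: "INTEGER PRIMARY KEY", title: "TEXT" }.
type TableSchema = Record<string, string>;
type Schema = Record<string, TableSchema>;

interface MigrationPlan {
  statements: string[]; // additive SQL that is safe to run automatically
  warnings: string[]; // changes that need a hand-written (or LLM-written) data migration
}

function planMigration(current: Schema, intended: Schema): MigrationPlan {
  const statements: string[] = [];
  const warnings: string[] = [];
  for (const [table, columns] of Object.entries(intended)) {
    const existing = current[table];
    if (!existing) {
      const cols = Object.entries(columns).map(([name, def]) => `${name} ${def}`);
      statements.push(`CREATE TABLE ${table} (${cols.join(", ")});`);
      continue;
    }
    for (const [name, def] of Object.entries(columns)) {
      if (!(name in existing)) {
        statements.push(`ALTER TABLE ${table} ADD COLUMN ${name} ${def};`);
      } else if (existing[name] !== def) {
        warnings.push(`${table}.${name} changed from "${existing[name]}" to "${def}"`);
      }
    }
    for (const name of Object.keys(existing)) {
      if (!(name in columns)) warnings.push(`${table}.${name} would be dropped`);
    }
  }
  for (const table of Object.keys(current)) {
    if (!(table in intended)) warnings.push(`table ${table} would be dropped`);
  }
  return { statements, warnings };
}
```

Keeping the automatic part additive means the plan never destroys data. The warnings are where asking Claude to write a migration might fit.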
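On the csrf cookie mismatch noted under Auth: by default, Auth.js adds the `__Host-` prefix to its cookie names when it uses secure cookies (e.g. on HTTPS) and omits it otherwise. Code keyed to one name therefore misses the other between local dev and prod. Here is a sketch that looks up either exact name from a Cookie header, using a hand-rolled parser. `getCsrfCookie` is a hypothetical helper, not an Auth.js API.

```ts
// Parse a Cookie header ("a=1; b=2") into a map of name -> decoded value.
function parseCookies(header: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    cookies.set(name, decodeURIComponent(value));
  }
  return cookies;
}

// Look up the CSRF cookie by exact name, trying the secure "__Host-" variant first.
function getCsrfCookie(header: string): string | undefined {
  const cookies = parseCookies(header);
  return cookies.get("__Host-authjs.csrf-token") ?? cookies.get("authjs.csrf-token");
}
```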
@October 1, 2024 Approaches to screenshots
- Tried some html2canvas approaches suggested by Claude, but doesn’t work for iframes?
- Might be good for instant screenshots for introspecting though, vs another service with roundtrip & rendering
- Would be nice to use end-user’s browser for rendering though — literally just shows what’s on the screen
- Also,
- Could also try a screenshot API of some kind from chrome?
- More exotic: Add a chrome extension that makes it easy to send context to Claude
- Started trying puppeteer
- But hosting this on the same flyio/docker container is a bit annoying
- Blows up the install size, adds complexity.
- Plus it doesn’t work yet, something about missing a dependency…
- After a bunch of wrangling, still doesn’t work to install chrome-headless-shell into the bun image, maybe time to give up
- Learned a bunch about running docker locally though, not too bad
- Hm, to run screenshotting might have to use
- To get around the x64 vs arm differences on local Mac dev vs Fly.io
- Claude recommends splitting into microservices, but thinking about architecture there is also kind of annoying
- Docker images to choose from…
- https://pptr.dev/guides/docker
- Unsure what the differences are to https://developer.chrome.com/blog/chrome-headless-shell
- Which is available via NPM…?
- Could try to find a different service, like an existing API? “trim” or sth looked okay
- Though tbh maybe should just host our own service if it’s going to be important
docker buildx create --use
docker buildx build --platform linux/amd64 -t your-image-name .
Franz
- Try creating chess game, and then create a neural net with self play, then an animation of the action space distribution, animated on the board
- Add a chess set make it playable
- saw chess prompts
- (not clear that you could just go back)
- Want to reset the state for the ping pong, to reproduce the bug, otherwise
- Could reprompt the last
- Want to reproduce the game when it’s stuck, and see post hoc what happened
- Expectation that refreshing would kill the game
- Even if I wanted to test it, get a gradient to best update prompting strategy
- Python is more familiar, reason being machine learning
Some animation of the thing loading would be good
Something went wrong — want to go back, do the edit thing
More output when “something went wrong”
Pippa
- John Daley — ping, looking for first users, understand the demo
- If you’re doing this, the metric to decide to invest, who’s winning, who has customers
- Cursor vs Buni
- Buni could be better for nontechnical people
- How to decide Wix vs more technical
- Put together some paragraphs on:
- What you’re building, who it’s for, how it plays in the existing landscape, by 4pm with Matt
- Meaningful input.
- If you’re building for the same audience of cursor, the path to winning is narrow
- Don’t need to write “why you can win”, just do that by winning
- Other question:
- What do you want to get out of the cohort?
- As part of research: think about what Cursor doesn’t work well for right now
- For the people who are here
- Who is Wix built for, how would you build
- [a] Even my wife uses Wix
- Or who are the right users
- Put together a persona — technicality, comfortable with these tasks, this is what they’d use cursor for today, this is where you find it painful
- Figure out the format, clarify thoughts, 1 page, 1 place, do a brief
- After figuring that out, can think through GTM
- Distribution — the broader base from day 1
- Level of virality in the right circles
- Use distribution network of EF/YC while you have it.
@September 26, 2024
- How much to focus on de novo app generation, vs steady state editing?
- De novo is flashy, explains the system
- Huge % of coding activity is steady state editing
- Goal of buni is to blur the line between the two concepts? When I use NextJS etc to spin up a new app, it’s still scaffolding off a long list of stuff
- De novo today often starts from templates anyways
@September 24, 2024
- Codemirror
- Pros: Batteries included; supports extensions? Nicer to code in…
- Cons:
- Packaging via bun install really inflates HTML size/parseability
- Also adds like 1mb in general
- Trying to figure out how to build without babel but it turned out nontrivial
- Overall question: how important will coding be? Ideally, not very…
@September 21, 2024
- How to do sql access in userspace?
- Something with Bun, with straight sql strings? https://bun.sh/docs/api/sqlite
- Could proxy as an API call, for now.
- Drizzle ORM? Nicer DX but maybe harder for LLM to understand?
- Insane: don’t use sql at all, just write data directly to JS files
- Might be easier for Claude to read…
- For sqlite tracking chats:
- one database per app, including meta level stuff?
- Or one global db for the “buni” chat app
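For the “write data directly to JS files” option, here is a sketch that renders app data as an ES module. It relies on JSON being valid JavaScript expression syntax (since ES2019), so the file stays readable for Claude and importable by the app. `toDataModule` is a hypothetical name.

```ts
// Render data as the source of an ES module: `export default { ... };`
function toDataModule(data: unknown, comment?: string): string {
  // Keep the optional header on one line so it stays a single // comment.
  const header = comment ? `// ${comment.replace(/\r?\n/g, " ")}\n` : "";
  // JSON.stringify returns undefined for undefined or functions; fall back to null.
  const body = JSON.stringify(data, null, 2) ?? "null";
  return `${header}export default ${body};\n`;
}
```

With Bun, writing the file could be `await Bun.write("data.js", toDataModule(data))`, and reading it back is an ordinary import. Re-importing the same path within one process returns the cached module, so this suits load-at-startup data better than live reads.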
Late musings
- Why is prompting high-effort?
- You try to be creative, but often the result doesn’t live up to what you imagined
- So much easier to browse and not risk being wrong
- Slow to see results
- Not sure what’s in your possibility space
- Solutions:
- Multiple tries?
- Show examples?
- Make iterating cheap
- Juice: reward typing in longer, better prompts, reward iterating
- Suggest better things to do
- Everything will be meta*programming
- Telling AI to write prompts and code to tell other AI to …
- Most of programming with AI is thinking very hard about which tokens to keep
- What gets checked into a codebase is a very small set of things you trust and affix your name to.
- Right now: staring at AI-gen code to see if it’s doing the right thing
- Expensive piece is thinking of what might work, and validating that it did work
- Implies: make it easier to throw prototype stuff away, and keep/amplify stuff that does work
- Also, a lot of doing research, figuring out what APIs and components are good to use, what people say about them, previewing stuff, having some taste
@September 18, 2024
- Maybe just importmap react to esm.sh and then don't bundle React via bun's node_modules.
- ...has been working ok, but adding eg react codemirror has been a pain
- Even with simpler @uiw/react-textarea-code-editor, have to: 1. Add react/jsx-runtime 2. add to importmap 3. remove deps with ?external=... (and it still doesn't style right)
- One path: just roll my own textarea thing that is natively here
- Oh, well, now it’s probably just css, so adding ?css&external=… gets you most of the way
- Now something about MIME types, maybe css imports are just going to be hard
- Thorny problem: How to deal with CSS imports, if not for Tailwind?
- One answer: just use Tailwind.
- Another answer: hack the <html> generated to bundle in <link> tag, maybe with Bun plugin
- Esbuild recommends this strategy: https://esbuild.github.io/content-types/#css-from-js. For now, might just hardcode this for the text editor.
- Musing: Debuggability is not super there atm, want to expose more of the guts (or, build the thing that makes the guts exposable)
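Here is a sketch of the HTML shell these notes converge on:

- React is mapped to esm.sh through an import map, including `react/jsx-runtime`.
- Other packages load with `?external=react,react-dom`, so they resolve React through that map instead of bundling a second copy.
- CSS is added as `<link>` tags rather than imported from JS, in the spirit of esbuild’s CSS-from-JS advice.

`buildHtmlShell`, the option names, and the versions in the test are illustrative assumptions.

```ts
// Options for the generated page; names are illustrative.
interface ShellOptions {
  reactVersion: string;
  externalPackages: string[]; // "name@version", loaded from esm.sh with React external
  stylesheets: string[]; // CSS URLs, injected as <link> tags instead of CSS imports
  entry: string; // URL of the app's own ES module
}

function buildHtmlShell(opts: ShellOptions): string {
  const react = `https://esm.sh/react@${opts.reactVersion}`;
  const reactDom = `https://esm.sh/react-dom@${opts.reactVersion}`;
  // One shared React instance: every bare "react" import resolves here.
  const imports: Record<string, string> = {
    react,
    "react/jsx-runtime": `${react}/jsx-runtime`,
    "react-dom": reactDom,
    "react-dom/client": `${reactDom}/client`,
  };
  for (const pkg of opts.externalPackages) {
    // "@uiw/foo@1.2.3" -> "@uiw/foo" (lastIndexOf skips the scope's leading "@").
    const at = pkg.lastIndexOf("@");
    const name = at > 0 ? pkg.slice(0, at) : pkg;
    // ?external= leaves react/react-dom as bare imports for the import map to resolve.
    imports[name] = `https://esm.sh/${pkg}?external=react,react-dom`;
  }
  const importMap = JSON.stringify({ imports }, null, 2);
  return [
    "<!doctype html>",
    "<html>",
    "  <head>",
    `    <script type="importmap">${importMap}</script>`,
    ...opts.stylesheets.map((href) => `    <link rel="stylesheet" href="${href}">`),
    "  </head>",
    "  <body>",
    '    <div id="root"></div>',
    `    <script type="module" src="${opts.entry}"></script>`,
    "  </body>",
    "</html>",
  ].join("\n");
}
```

This doesn’t solve the CodeMirror case, where several packages each pull in their own copy of a shared dependency. It only covers React-style peers listed in the map.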
@September 16, 2024
- esm.sh is kinda amazing.
- Also https://docs.val.town/reference/import/
- importmapping esm.sh
- Pros
- Easier to read html output
- Sets stage for esm.sh importing of external NPM modules
- Cons
- Slower than full bundling? (maybe on first load but not after caching the “react” import)
- One more API for me/Claude to learn
- Some things don’t work and then take some debugging
- So possibly esm.sh path still requires some kind of whitelisting
- (alternatively, could self-host a version of esm.sh that runs on Bun and thus does fun bundle things)
- Hm, packaging @uiw/react-codemirror has been kinda hard with esm.sh
- https://gist.github.com/Potherca/028514c75f581db115797ecb50c6f945 but the example doesn’t even work for regular codemirror
- Though this does work, so maybe just version mismatch? https://glitch.com/edit/#!/furtive-bird-alder
- Okay CodeMirror is a bit nicer but since it doesn’t work with importmaps, replacing. Tried https://esm.sh/@uiw/react-codemirror@4.23.0?external=react,react-dom but no dice on the multi-load problem.
react-dom.production.min.js:127 Uncaught Error: Unrecognized extension value in extension set ([object Object]). This sometimes happens because multiple instances of @codemirror/state are loaded, breaking instanceof checks
@September 15, 2024
- pricing: Claude Sonnet 3.5 costs $15/Mtok for output, aka:
- 1.5c per 1000 tokens (a typical response)
- at ~100tok/s, ~10c/min or $6/h. (So can’t bankrupt myself running Claude Sonnet continuously)
- Claude Haiku is 2x faster and 10x cheaper so maybe just use that
- Prompt architecture?
- Maybe prompts should be stored in /codegen instead of in repo
- … once codegen versioning is anything like git
- how to bootstrap the prompt/Claude API calls?
- abstract away:
- File read/writes
- API keys
- internet read/writes
- Someplace to store this code
- Misc musing: RTS like Age of Mythology except you spend your “gold” on training LLMS?
- Or imagine a CTF where everything is constructed via LLM and you are the one allocating funding to agents which you think are doing a better job
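Here is a sketch connecting the pricing math and the “abstract away” list. Generated code receives a small capabilities object, so the host owns file access and the API key and can meter spend per call. At the $15/Mtok output price above, a 1,000-token response costs $0.015. Streaming nonstop at 100 tok/s (360,000 tokens/h) comes to about $5.40/h, in line with the ~$6/h estimate. `Capabilities` and `createTestHost` are hypothetical, and the stub model reports its own token count.

```ts
// What generated code may do; the host implements it (names are hypothetical).
interface Capabilities {
  readFile(path: string): Promise<string | undefined>;
  writeFile(path: string, contents: string): Promise<void>;
  // The host attaches the API key; generated code never sees it.
  callModel(prompt: string): Promise<string>;
}

// Claude Sonnet 3.5 output price quoted above, in dollars per million tokens.
const SONNET_OUTPUT_PER_MTOK = 15;

function outputCost(tokens: number, dollarsPerMTok = SONNET_OUTPUT_PER_MTOK): number {
  return (tokens * dollarsPerMTok) / 1_000_000;
}

interface ModelReply {
  text: string;
  outputTokens: number;
}

// In-memory host for tests: files live in a Map, the model is a stub,
// and every call's output cost is added to a running total.
function createTestHost(model: (prompt: string) => ModelReply) {
  const files = new Map<string, string>();
  let spent = 0;
  const caps: Capabilities = {
    async readFile(path) {
      return files.get(path);
    },
    async writeFile(path, contents) {
      files.set(path, contents);
    },
    async callModel(prompt) {
      const { text, outputTokens } = model(prompt);
      spent += outputCost(outputTokens);
      return text;
    },
  };
  return { caps, spentDollars: () => spent };
}

// The estimates from the notes: one 1,000-token response, and an hour at 100 tok/s.
console.log(outputCost(1_000).toFixed(3)); // "0.015"
console.log(outputCost(100 * 3600).toFixed(2)); // "5.40"
```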
@September 2, 2024
Fly.io mysteries while trying to get push & pull to volumes working:
- Why is the app down sometimes?
- … why isn’t there any easy way to sync local files to cloud files?
Trying out Render — seems okay too, on the paid $7/mo plan
@July 29, 2024
- Hm, doesn’t build correctly in prod…
- Should we build as a single HTML file?
- The completely bundled js file seems to cause string escape issues (eg a stray </script> might kill it)
- The issue only happens on prod though — maybe it’s a configuration of Bun.build?
- Or conflicting bun versions?
- But a relative /dist/ file is somewhat more annoying to send around
- Could try to host it statically?
- Maybe that’s the overall architecture of buni
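If the single-HTML-file route comes back, the stray `</script>` problem has a standard fix: rewrite `</script` (case-insensitively) as `<\/script` before inlining. Inside JS strings, template literals, regex literals and comments, the two spell the same thing, but the HTML parser no longer sees a closing tag. Here is a sketch with hypothetical helper names. It doesn’t address whatever else differs between the dev and prod builds.

```ts
// Escape sequences that would end the inline <script> element early.
function escapeInlineScript(js: string): string {
  return js.replace(/<\/(script)/gi, "<\\/$1");
}

// Inline a bundle just before </body>. A replacer function is used so that "$"
// sequences inside the bundle (e.g. "$&") are not treated as replacement patterns.
function inlineScript(html: string, js: string): string {
  const tag = `<script type="module">${escapeInlineScript(js)}</script>`;
  return html.replace("</body>", () => `${tag}</body>`);
}
```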
Dev notes @July 24, 2024
Architecture questions:
- Where to store LLM-generated code?
- First guess: filesystem at buni/austin/counter
- Is there a package.json?
- If so: adds complexity
- If not: how to specify imports between LLM generated code?
- How to compile and serve it?
- What is the ownership model of data?
- Note: NPM packages are published built
- See https://claude.ai/chat/67c1ba48-09dc-4427-a0ad-a2a087605bbc
- Do we want NPM compatibility (for plugins/ecosystem; don’t reinvent the wheel)
- Do we want NPM interoperability (so people can literally import @buni/austin/counter…?)
- Alternatively: Deno’s JSR? https://deno.com/blog/jsr_open_beta
Things we’re trying to replace
- Twitter as a town square
- Traditional hosting of web apps
- NPM as a package registry
- (Github as a source code ecosystem)
What is the first use case?
- Building a bunch of my own small side projects?
- eg Contact Swap?
- Hosting blogs?
- Self expression?
- Minigames? Like the boardless games vision?
- Calculators? Like microcovid, kelly bet? Or RPS poker calculator?
- Connecting to AI functionality? Like midjourney, chatgpt?
- “Zapier for AI”
- Q: What have Claude Artifacts been most used for?
- Q: Which things require user navigation? Where do people spend most of their time?
- Productivity (Notion, VS Code)
- Social (Twitter, Reddit)
- Information (Substack)
- Q: What could go viral?
- The glitch meme generator