@October 28, 2024 election-promises usage notes
- Started with Claude Artifacts
- (also wtf Claude really needs to make these things exportable)
Mock up a website design to inform voters of likely outcomes of the presidential election. The website name is "If elected, I will...". The outcomes are taken from predictions on the site Manifold Markets. Show an image of Harris saying "reschedule marijuana from a schedule 1 drug" and trump saying "organize a ceasefire in Ukraine before 2026 midterms".
Put Kamala on the left and Trump on the right of their cards; make the quotes look like speech bubbles coming out of their avatar
But it didn’t actually have Trump/Harris images, so I grabbed them from Manifold
Replace the images with https://manifold.markets/political-candidates/harris.png and trump.png; add a little tail to the speech bubbles so it looks like it comes out of their mouths
Could paste in images, which was cool/timesaving
- But inability to see the results with real images led me to switch to yield…
- See discussion on https://yield.sh/edit/election-promises/app.tsx
- Looking at the diffs sometimes helped when code didn’t match my expectations
- I guess it saves a bit of time too
- Applying diffs via Haiku seemed to basically always work, nice
- Could write eval for Cerebras
- Had to be prompted to refactor code
- I sometimes dropped in to tweak wording exactly (best for copy text, where my taste is better than Claude’s)
- Not having image paste was a bit awkward; I copied in some text instead
- Maybe ideal is pasting in a URL?
- With images, the benefit is in providing exactly the context and no more
- Benefits of yield over Claude Artifacts
- Immediately hosted & shareable (eg sent to Nate)
- Can render externally linked images…
- Can tweak code directly for things I know how to do, instead of asking via prompt
- Can import custom libraries from esm.sh
- Benefits of Claude Artifacts over yield
- Feels a bit more reliable? idk
- Version tabbing is nice, maybe we should steal it
- Though in practice I’m not really going back to old versions very much lol. It’s more about peace of mind, having an undo function
- Maybe instead of versioning to a number, it saves a commit message based on the prompt
- Maybe it’s integrated inline to the chat, vs separate numbering system?
- Export formats
- Could export to single HTML page which is hosted… somewhere (git?)
- There’s gotta be some cheap/free place to just host static HTML files at scale. (Cloudflare?)
- Some stuff not needed for export (like uiw react editor)
- Nice to have source code too, for modifying + exporting to NextJS I guess (did this for futarchy.dev)
- Could just add a copy button
Do want to have side-by-side code + preview for these cases
Accepts image-pasted input
shadcn/ui out of the box
Fewer clicks (applied & saved automatically)
At least, get rid of alert popup, and show current loaded version “7/9”. Maybe also show timestamp.
Try: fast diff gen + apply with cerebras, versioned on every checkpoint?
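A minimal sketch of the checkpoint idea above, assuming every applied change is saved with the prompt as its commit message and the UI shows a label like “7/9”. `VersionHistory`, `save`, `undo`, and `label` are hypothetical names for illustration, not existing yield code.

```typescript
// One saved version: the full code plus a message (assumed here to be
// the prompt that produced it).
interface Checkpoint {
  code: string;
  message: string;
}

class VersionHistory {
  private checkpoints: Checkpoint[] = [];
  private current = -1; // index of the loaded version, -1 when empty

  // Append a new checkpoint and load it. Older versions are never
  // discarded, so an undo can always be reversed by picking a later one.
  save(code: string, message: string): void {
    this.checkpoints.push({ code, message });
    this.current = this.checkpoints.length - 1;
  }

  // Step back one version; returns null when already at the first one.
  undo(): Checkpoint | null {
    if (this.current <= 0) return null;
    this.current -= 1;
    return this.checkpoints[this.current] ?? null;
  }

  // Human-readable position, e.g. "7/9".
  label(): string {
    return `${this.current + 1}/${this.checkpoints.length}`;
  }
}
```

Keeping every checkpoint (rather than truncating after an undo) is a design choice made here for the “peace of mind” case; a real version might prune or branch instead.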
@October 25, 2024 Playing with different futarchy.dev versions
futarchy.dev + yield.sh
@October 25, 2024 Check in with Matt
- Catch ups, childcare, what happened, Matt & childcare
- Overall:
- yield.sh: upgraded sonnet, db generation works out of the box a lot, one more prompt that passed “want to share” test
- shareability
- Intuition: turn up the fun, turn up variability, get something that you are
- Sonnet should make things a lot better
- Old sonnet 3.5 vs new sonnet 3.5 to build stuff in Minecraft
- Even though it’s not much better — on this test it’s so much better. Some speculation — maybe get better spatial & visual reasoning. Feels handwavy
- Some tradeoff between “how to get started” vs constraining space
- Pippa: Maybe in early days, don’t constrain the space
- Like Matt where you find a set of users to force, do 15 prompts, get past the hurdle
- Instinctive user behavior in the early days, the 2 things Pippa sees in the left panel. Really interesting behavior
- started talking with Nate Foss (founder of gather town) on a joint project to hack on (prediction market futarchy)
- Matt: Usual model, be pretty explicit, it’s something we’re pretty into, don’t know where it’ll go, maybe won’t be a thing. Pretty open minded, hope that it evolves into something that’s important to me
- In an ideal world, have someone to do it with. Not asking to accept a marriage proposal
- Way Matt usually thinks about it — do we feel more productive, ideally more than sum of parts
- Anchor on a reasonable level of uncertainty, with upside case being clear
- Could just be really interesting to (as well as hacking together) spend some time talking about what it could become, seeing what his instincts are
- There’s the core of something interesting here, suspect there’ll be a new behavior somewhere, needs to come up
- Prompting, sharing, a loop around it
- Fun to see whether we feel aligned or not
- slight tangent: was at AI conference last week. “Why do AI panels suck?” was talking about how it’s a weird technology, the primitives are so far ahead of productization
- Other than ChatGPT, few products that use the primitives
- Panel with chief product officers of OAI, Ant, but also Instagram
- Riff on early insta: they got so obsessed with features, but in the end, what they thought was important was not, and the things they didn’t think about were key
- Like the square — scroll back to early insta, became iconic
- More important: have the right primitive, and then create enough variance that interesting behaviors get amplified
- Just getting people to use it even if it feels weird
- Very generative, the 2 of them, talking about what it might be. But then: let’s build it, put it in people’s hands
- What are the new primitives?
- Arbitrary image, arbitrary prompt and get high quality response
- Core premise: ChatGPT will not be the most popular product in 5 years. We’re basically not using the primitives in 5 years, the way we’re using them is boring
- (Already told pippa) 2 panel of og 6 of ChatGPT
- When they finished GPT4 training in May/Jun of ‘22, 2 posttraining teams, smaller one was chat, that was not that interesting, not the cool thing (which was instruction tuning)
- In Nov there were 7 people in Chat
- Median guess for #users was 50k after 1mo. 1 person guessed 1m, actually 100m
- Ran a friends & family beta of 50 from Aug - Oct. Average daily actives was 4
- Some equivalent of temperature for this, where you can increase the variance in the actual code generation
- Artificial equivalent of 4 different types, where you increase the variance at level of
- Computer usage
- What to do next
- advice for The Curve (esp — maintaining value after the conference?)
- Won’t be able to get out to SF in Nov; will be in London
- From Jan, will be in SF a lot. Keep having a lot of other things
- also Progress Conference, but sad not able to make it
- E.g. how did Matt do it for AISS
- advice for Manifold
Next steps for checkpoint
- try the “variations” approach
- put it in front of 15 people
- build out the futarchy thing here
@October 24, 2024
- How to do a db migration?
- Adding upvotes sucks
- Even though no migration needed!
- Like, need a new join table, then need to add APIs to write to and sync from the table, ugh
- Just want to pretend everything’s in memory and just works
@October 22, 2024 Orchestrating prompts
- Fundamentally: how to structure code when language models are creative but unreliable?
- Hm… Instead of complicated fanout structure for eg generating a slug for an app, can just ask for structured text output:
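A rough sketch of what the structured-text approach could look like, assuming the model is asked to reply with one `key: value` pair per line. The reply format and the names `AppMeta`, `slugify`, and `parseAppMeta` are made up for illustration; the point is that since the model is unreliable, the parser returns `null` on a malformed reply so the caller can retry.

```typescript
// Hypothetical metadata we want from a single model reply.
interface AppMeta {
  slug: string;
  title: string;
}

// Turn arbitrary text into a URL-safe slug (lowercase, hyphen-separated).
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Parse "key: value" lines from the reply. Returns null when a required
// field is missing or the slug is empty after cleaning, instead of
// trusting a malformed response.
function parseAppMeta(reply: string): AppMeta | null {
  const fields: Record<string, string> = {};
  for (const line of reply.split("\n")) {
    const match = line.match(/^\s*(\w+)\s*:\s*(.+?)\s*$/);
    if (match && match[1] !== undefined && match[2] !== undefined) {
      fields[match[1].toLowerCase()] = match[2];
    }
  }
  const title = fields["title"];
  const rawSlug = fields["slug"];
  if (!title || !rawSlug) return null;
  const slug = slugify(rawSlug);
  if (!slug) return null;
  return { slug, title };
}
```

One prompt plus one parse replaces a fanout of separate calls; the slug is still cleaned locally because the model may return something that is not URL-safe.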
@October 19, 2024
- Bun is seriously so good, shell scripting instead of package.json :chefskiss:
- This Vercel AI SDK thing seems annoying to merge with Cerebras. Let’s just use bare cerebras for now.
- Cerebras is fast (like, 1s to generate apps from scratch) but the apps are somewhat worse than sonnet’s
- meta: there’s probably going to be lots of things like this, where there’s a tradeoff between price, latency, and quality among different LLM providers
@October 8, 2024
- Notes from Ben Evans:
@October 7, 2024
- Playing with Eden Treaty; need to figure out how to pass Server generated Elysia framework to clients?
- Spent a while trying to reroute to OpenRouter; it’s not an exact match for Anthropic API…
@October 5, 2024
- Hm, try catching errors the way Val Town does?
- Bug: modifyCode sometimes just returns a subset
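One possible guard for the subset bug, sketched as a heuristic check to run on the output before applying it. `looksLikeSubset` and its `minRatio` threshold are hypothetical, and this is not how `modifyCode` works today; it only flags output that shrank a lot or contains an elision comment such as `// ...`.

```typescript
// Heuristic only: returns true when the modified code looks like a
// fragment of the original rather than a full replacement file.
function looksLikeSubset(
  original: string,
  modified: string,
  minRatio: number = 0.5, // assumed threshold, tune against real edits
): boolean {
  const originalLines = original.split("\n").length;
  const modifiedLines = modified.split("\n").length;

  // Signal 1: the file lost more than half its lines.
  const shrankALot = modifiedLines < originalLines * minRatio;

  // Signal 2: a comment that starts with "..." or "…", which models
  // often emit in place of unchanged code.
  const hasElisionMarker = /(\/\/|\/\*)\s*(\.\.\.|…)/.test(modified);

  return shrankALot || hasElisionMarker;
}
```

A legitimate large deletion would also trip the first signal, so this is better used to trigger a retry or a confirmation than to reject an edit outright.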
Testing user interaction with LLMs?
- Spin up a new app, make various calls, have LLMs input commands and test them?
- Write tests for the different platform components (eg auth, db, etc?)
- Note: When more of the coding happens via LLM, then testing/verifying becomes more expensive, takes up a larger chunk of human time.
Flo user test
- Expected code to be on the same side as the chat
- Generated movie recs, enter a movie.
- Tried getting Claude to do it
- Can you edit the diff?
- “You have to apply”. Oh, you can edit the code
Tried to save just now with ctrl+s
- Formatting feels weird, thought it would auto format
- Now it doesn’t work — is that something I did?
Ended up having to delete Flo’s stuff because of local vs remote sync issues… ideally figure out a way to merge volumes and databases?
Oh, should ask about background with coding, code gen tools as part of interview
@October 4, 2024
- More context to Claude:
- Approaches for context on internal APIs:
@October 2, 2024
- Auth
- Database
@October 1, 2024 Approaches to screenshots
- Tried some html2canvas approaches suggested by Claude, but doesn’t work for iframes?
- Could also try a screenshot API of some kind from chrome?
- Started trying puppeteer
- Could try to find a different service, like an existing API? “trim” or sth looked okay
Franz
- Try creating chess game, and then create a neural net with self play, then an animation of the action space distribution, animated on the board
- saw chess prompts
- Python is more familiar, reason being machine learning
Pippa
- John Daley — ping, looking for first users, understand the demo
- If you’re doing this, the metric to decide to invest, who’s winning, who has customers
- Cursor vs Buni
- Put together some paragraphs on:
- Other question:
- As part of research: think about what Cursor doesn’t work well for right now
- After figuring that out, can think through GTM
@September 26, 2024
- How much to focus on de novo app generation, vs steady state editing?
@September 24, 2024
- Codemirror
@September 21, 2024
- How to do sql access in userspace?
- For sqlite tracking chats:
Late musings
- Why is prompting high-effort?
- Everything will be meta*programming
- Most of programming with AI is thinking very hard about which tokens to keep
- Also, a lot of doing research, figuring out what APIs and components are good to use, what people say about them, previewing stuff, having some taste
@September 18, 2024
- Maybe just importmap react to esm.sh and then don't bundle React via bun's node_modules.
- ...has been working ok, but adding eg react codemirror has been a pain
- Even with simpler @uiw/react-textarea-code-editor, have to: 1. Add react/jsx-runtime 2. add to importmap 3. remove deps with ?external=... (and it still doesn't style right)
- Musing: Debuggability is not super there atm, want to expose more of the guts (or, build the thing that makes the guts exposable)
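The three importmap steps above could be generated rather than hand-written. This is a sketch under assumptions: `buildImportMap` is a hypothetical helper, and the React version in the usage note is only an example. The `?external=react,react-dom` suffix is the esm.sh query string used elsewhere in these notes.

```typescript
// Shape of the JSON that goes inside <script type="importmap">.
interface ImportMap {
  imports: Record<string, string>;
}

const ESM_SH = "https://esm.sh";

// Map React (including react/jsx-runtime) to esm.sh, then map each extra
// dependency with React marked as external, so the dependency resolves
// React through this import map instead of loading its own copy.
function buildImportMap(
  reactVersion: string,
  deps: Record<string, string>, // package name -> version
): ImportMap {
  const imports: Record<string, string> = {
    react: `${ESM_SH}/react@${reactVersion}`,
    "react-dom": `${ESM_SH}/react-dom@${reactVersion}`,
    "react/jsx-runtime": `${ESM_SH}/react@${reactVersion}/jsx-runtime`,
  };
  for (const name of Object.keys(deps)) {
    imports[name] = `${ESM_SH}/${name}@${deps[name]}?external=react,react-dom`;
  }
  return { imports };
}
```

For example, `JSON.stringify(buildImportMap("18.3.1", { "@uiw/react-codemirror": "4.23.0" }), null, 2)` gives the map to inline in the page. Per the September 16 notes, this did not fix the multi-load problem for CodeMirror, so treat it as a starting point rather than a fix.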
@September 16, 2024
- esm.sh is kinda amazing.
- Also https://docs.val.town/reference/import/
- importmapping esm.sh
- Hm, packaging @uiw/react-codemirror has been kinda hard with esm.sh
- Okay CodeMirror is a bit nicer but since it doesn’t work with importmaps, replacing. Tried
https://esm.sh/@uiw/react-codemirror@4.23.0?external=react,react-dom
but no dice on the multi-load problem.
@September 15, 2024
- pricing: Claude Sonnet 3.5 is $15/Mtok for output, aka:
- Prompt architecture?
- Misc musing: RTS like Age of Mythology except you spend your “gold” on training LLMs?
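To make the output pricing above concrete, a small arithmetic sketch using only the $15 per million output tokens figure from the note. `outputCostUsd` is a hypothetical helper, and it ignores input-token cost entirely.

```typescript
// $15 per million output tokens, the figure quoted in the pricing note.
const USD_PER_MILLION_OUTPUT_TOKENS = 15;

// Cost in USD of generating the given number of output tokens.
function outputCostUsd(outputTokens: number): number {
  return (outputTokens / 1e6) * USD_PER_MILLION_OUTPUT_TOKENS;
}
```

So a 2,000-token response (an assumed size, not a measured one) comes to about 2000 / 1,000,000 × 15 = $0.03 of output, and a thousand such responses to about $30.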
@September 2, 2024
Fly.io mysteries while trying to get push & pull to volumes working:
- Why is the app down sometimes?
- … why isn’t there any easy way to sync local files to cloud files?
Trying out Render — seems okay too, on the paid $7/mo plan
@July 29, 2024
- Hm, doesn’t build correctly in prod…
- Should we build as a single HTML file?
Dev notes @July 24, 2024
Architecture questions:
- Where to store LLM-generated code?
- How to compile and serve it?
- What is the ownership model of data?
- Note: NPM packages are published built
Things we’re trying to replace
- Twitter as a town square
- Traditional hosting of web apps
- NPM as a package registry
What is the first use case?
- Building a bunch of my own small side projects?
- Hosting blogs?
- Minigames? Like the boardless games vision?
- Calculators? Like microcovid, kelly bet? Or RPS poker calculator?
- Connecting to AI functionality? Like midjourney, chatgpt?
- Q: What have Claude Artifacts been most used for?
- Q: Which things require user navigation? Where do people spend most of their time?
- Q: What could go viral?