🛺

Dev notes: Subsim

Tech stack


Claude prompt:

Log

A lot of my work is in trying to parse e.g. Substack RSS feeds, or Manifold pages

https://manifold.markets/Austin?tab=questions: how to pass this in to an LLM?

Parse the HTML? How do crawlers go through and read a page if it’s JS-enhanced at all?
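For the RSS side, a rough sketch of pulling posts out of a feed with only the standard library. Assumptions: the feed is plain RSS 2.0, and the full post body sits in `<content:encoded>` (falling back to `<description>` when it is missing). `parse_rss_items` is a made-up helper name, not something from an existing library.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_rss_items(xml_text: str) -> list[dict]:
    """Return title, link, and body HTML for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # Assumption: full post HTML lives in <content:encoded>;
        # fall back to the shorter <description> if it is absent.
        body = item.findtext(f"{CONTENT_NS}encoded") or item.findtext("description") or ""
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "html": body,
            }
        )
    return items
```

Fetching is left out on purpose: the function takes the feed text (e.g. whatever an HTTP client returns), so it can be tried on a saved file first. The `html` field is still HTML, so it needs a conversion step before going to an LLM.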

How does Cursor’s “scrape the docs” thing work?
Goal: Pass in a URL, then output the markdown?

Similar to https://apify.com/apify/website-content-crawler
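A minimal sketch of the "HTML in, markdown out" step, using only the standard library's `html.parser`. It only covers headings, paragraphs, list items, and links, and it drops `<script>`/`<style>` content. It is not how Cursor or the Apify crawler do it (unknown to me), and it only sees HTML that is already in the response: a page that renders its content with JS would need a headless browser first. `MarkdownExtractor` and `html_to_markdown` are hypothetical names.

```python
import re
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, lists, links."""

    SKIP = {"script", "style", "noscript"}  # content we never want in the output
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"p", "div", "ul", "ol"}  # tags that start a new paragraph

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in self.BLOCKS:
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self.href:
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace but keep single spaces between words.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()
```

Nav bars, footers, and comment widgets all come through as text with this approach, so a real version would probably want to pick out the main article element before converting.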

A simple list like https://www.astralcodexten.com/archive has 6 MB of content for a few hundred links…

Oh, pull the links from the sitemap instead
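A sketch of the sitemap route: read the `<loc>` entries out of a standard sitemap XML file instead of scraping the archive page. Assumptions: the file follows the sitemaps.org `urlset` format (a sitemap index that points at other sitemaps is not handled), and post URLs can be picked out by a substring such as `/p/`, as in the Substack links in these notes. `sitemap_urls` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str, must_contain: str = "") -> list[str]:
    """Return every <loc> URL in a sitemap, optionally filtered by substring."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # An empty filter string matches everything.
        if url and must_contain in url:
            urls.append(url)
    return urls
```

Usage would be something like `sitemap_urls(text, "/p/")` on the downloaded sitemap text, which gives a flat list of post links without loading the heavy archive page.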

Prior art

Subreddit Simulator GPT2
LLM style finetuning: https://sarahconstantin.substack.com/p/fine-tuning-llms-for-style-transfer

And Nostalgebraist: https://sarahconstantin.substack.com/p/fine-tuning-llms-for-style-transfer/comment/59587659
Idea: Just make finetuning really easy. Paste in a Substack, get out a finetuned ACX writer model.

AI Objectives Institute research? E.g. https://ai.objectives.institute/whitepaper