Tech stack
Log
- A lot of my work is trying to parse e.g. Substack RSS feeds or Manifold
- How do I pass a page like https://manifold.markets/Austin?tab=questions in to an LLM?
- Parse the HTML? How do existing tools go through and read it if the page is JS-enhanced at all?
- How does Cursor’s “scrape the docs” thing work?
- Goal: Pass in a URL, then output the markdown?
- A simple list like https://www.astralcodexten.com/archive has 6 MB of content for a few hundred links…
- Oh, pull from sitemap
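
A minimal sketch of the sitemap idea, assuming the site publishes a standard sitemap in the sitemaps.org XML format. The names `parse_sitemap` and `fetch_sitemap` are made up for illustration. The point is that the sitemap lists the post URLs directly, so there is no need to download and pick through the heavy archive page.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_data: str | bytes) -> list[str]:
    """Return every <loc> URL in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_sitemap(url: str) -> list[str]:
    """Hypothetical helper: download a sitemap and return the URLs it lists."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_sitemap(resp.read())
```

Usage would be something like `fetch_sitemap("https://www.astralcodexten.com/sitemap.xml")`, assuming the sitemap lives at that path (not verified here). If the result is a sitemap index, the returned URLs point at further sitemaps and need a second pass.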
Prior art
- Subreddit Simulator (GPT-2)
- LLM style finetuning: https://sarahconstantin.substack.com/p/fine-tuning-llms-for-style-transfer
- And Nostalgebraist: https://sarahconstantin.substack.com/p/fine-tuning-llms-for-style-transfer/comment/59587659
- Idea: Just make finetuning really easy. Paste in a Substack, get out a finetuned ACX writer model.
- AI Objectives Institute research? E.g. https://ai.objectives.institute/whitepaper
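
A rough sketch of the first half of the "paste in a Substack" idea above: turning an RSS feed into fine-tuning records. This assumes the feed is RSS 2.0 with post bodies in `<content:encoded>` (falling back to `<description>`). The `prompt`/`completion` record shape and the function names are assumptions for illustration, not any particular provider's required format, and the regex tag stripper is deliberately crude.

```python
from __future__ import annotations

import json
import re
import xml.etree.ElementTree as ET
from html import unescape

# RSS 2.0 feeds commonly carry the full post body in <content:encoded>.
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def strip_html(html_text: str) -> str:
    """Crude cleanup: drop tags, decode entities, collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", html_text)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()


def feed_to_records(rss_xml: str | bytes) -> list[dict[str, str]]:
    """Turn each <item> into a prompt/completion record (assumed schema)."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iterfind("./channel/item"):
        title = (item.findtext("title") or "").strip()
        body = item.findtext("content:encoded", default="", namespaces=CONTENT_NS)
        if not body:
            body = item.findtext("description") or ""
        text = strip_html(body)
        if title and text:
            records.append(
                {"prompt": f"Write a post titled: {title}", "completion": text}
            )
    return records


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

The JSONL output would still need to be reshaped to whatever the chosen fine-tuning API expects. RSS feeds often only include recent posts, so the sitemap route may be needed to get a full archive.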