🛺

Dev notes: Subsim

Tech stack

‣
Claude prompt:

Log

  • A lot of my work is in trying to parse eg Substack RSS feeds, or Manifold
    • https://manifold.markets/Austin?tab=questions — how to pass in to an LLM?
      • Parse the HTML? How do they go through and read it if it’s JS enhanced at all?
    • How does the Cursor’s “scrape the docs” thing work?
    • Goal: Pass in a URL, then output the markdown?
      • Similar to https://apify.com/apify/website-content-crawler
    • A simple list like https://www.astralcodexten.com/archive has 6mb of content for a few hundred links…
      • Oh, pull from sitemap

Prior art

  • Subreddit Simulator GPT2
  • LLM style finetuning: https://sarahconstantin.substack.com/p/fine-tuning-llms-for-style-transfer
    • And Nostalgebraist: https://sarahconstantin.substack.com/p/fine-tuning-llms-for-style-transfer/comment/59587659
    • Idea: Just make finetuning really easy. Paste in a Substack, get out a finetuned ACX writer model.
  • AI Objectives Institute research? Eg https://ai.objectives.institute/whitepaper