Flagship independent browser-extension project

PromptReady

An offline-first browser extension that captures rendered web pages into clean Markdown, with local extraction, fixture coverage, and quality-gated AI cleanup.

WXT · React · TypeScript · Chrome Extension · Readability/Turndown · Vitest · OpenRouter

Chrome Web Store

Summary: PromptReady is an independent browser-extension project focused on turning messy web pages into useful Markdown for prompting, notes, and export. The core engineering work is not the popup UI; it is the capture pipeline: rendered-page HTML, local Markdown extraction, source metadata, fixture tests, and AI cleanup that is allowed to improve formatting but not erase structure.

Why I built it

I wanted a practical tool for the problem I kept hitting while working with AI systems: the quality of the prompt depends heavily on the quality of the source material. Copying from a page often brings navigation, cookie banners, broken code fences, missing headings, or an empty single-page app shell.

The first assumption was too simple: capture the page, run readability-style extraction, convert the result with Turndown, and clean up the Markdown afterward. That works on friendly pages. It is not enough for modern sites that render late, hide content behind app shells, or mix prose with code/config blocks.

PromptReady became a project about making that failure visible and testable instead of pretending every site can be handled by one generic cleanup pass.

What it does

The extension captures page or selection HTML from the browser, carries metadata such as title, URL, captured time, and a selection hash, then processes the content into Markdown locally before any AI step is considered.
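
As a rough illustration of carrying that metadata alongside the HTML, here is a minimal sketch. The `CaptureRecord` shape, the field names, and the use of a 32-bit FNV-1a hash over UTF-16 code units are all assumptions made for the example, not the extension's actual types or hash.

```typescript
// Hypothetical record shape; the real extension may name or group these differently.
interface CaptureRecord {
  title: string;
  url: string;
  capturedAt: string;    // ISO 8601 timestamp
  selectionHash: string; // identifies the exact HTML that was captured
  html: string;
}

// FNV-1a (32-bit) computed over UTF-16 code units. It is a small,
// non-cryptographic stand-in for whatever hash the extension really uses.
function hashSelection(html: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < html.length; i++) {
    hash ^= html.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Bundles the captured HTML with the metadata that travels through the pipeline.
function buildCaptureRecord(title: string, url: string, html: string, now: Date): CaptureRecord {
  return {
    title,
    url,
    capturedAt: now.toISOString(),
    selectionHash: hashSelection(html),
    html,
  };
}
```

Keeping the hash on the record means a later stage can check that the Markdown it receives still belongs to the selection that was originally captured.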

The offline path is the baseline. It handles the main extraction and canonical Markdown cleanup. The AI path, when configured with a bring-your-own OpenRouter key, receives that offline Markdown baseline as source context and is treated as a cleanup pass. If AI output drops headings, loses too much content, breaks fences, or loses technical tokens, PromptReady falls back to the local result with a stable warning instead of shipping the prettier but weaker output.
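
The acceptance rule described above can be sketched as a pure function that compares the AI candidate against the offline baseline. This is a sketch under stated assumptions: the warning code names, the 0.7 length ratio, and the definition of "technical tokens" (inline code spans plus non-empty fenced lines) are illustrative choices, not the extension's real thresholds or codes.

```typescript
const FENCE = "\u0060\u0060\u0060"; // three backticks, written as escapes

// Hypothetical warning codes; the real codes are not listed in this write-up.
type GateWarning = "AI_BROKEN_FENCES" | "AI_DROPPED_HEADINGS" | "AI_LOST_TOKENS" | "AI_CONTENT_LOSS";

interface GateResult {
  markdown: string;
  source: "ai" | "local";
  warning?: GateWarning;
}

interface Outline {
  headings: string[];
  tokens: string[];
  fencesBalanced: boolean;
}

// Collects headings, inline code spans, and fenced code lines from a Markdown string.
function outline(markdown: string): Outline {
  const headings: string[] = [];
  const tokens: string[] = [];
  let fenceMarkers = 0;
  let inFence = false;
  for (const line of markdown.split("\n")) {
    if (line.trim().indexOf(FENCE) === 0) {
      fenceMarkers++;
      inFence = !inFence;
      continue;
    }
    if (inFence) {
      if (line.trim().length > 0) tokens.push(line.trim());
      continue;
    }
    const heading = /^#{1,6}\s+(.+?)\s*$/.exec(line);
    if (heading) headings.push(heading[1].toLowerCase());
    const inlineCode = /`([^`]+)`/g;
    let match = inlineCode.exec(line);
    while (match !== null) {
      tokens.push(match[1]);
      match = inlineCode.exec(line);
    }
  }
  return { headings, tokens, fencesBalanced: fenceMarkers % 2 === 0 };
}

// Accepts the AI candidate only if it preserves the baseline's structure;
// otherwise returns the baseline with a warning code.
function gateAiOutput(baseline: string, candidate: string, minLengthRatio = 0.7): GateResult {
  const base = outline(baseline);
  const cand = outline(candidate);
  const fallback = (warning: GateWarning): GateResult => ({ markdown: baseline, source: "local", warning });

  if (!cand.fencesBalanced) return fallback("AI_BROKEN_FENCES");
  if (base.headings.some((h) => cand.headings.indexOf(h) === -1)) return fallback("AI_DROPPED_HEADINGS");
  if (base.tokens.some((t) => cand.tokens.indexOf(t) === -1)) return fallback("AI_LOST_TOKENS");
  if (candidate.trim().length < baseline.trim().length * minLengthRatio) return fallback("AI_CONTENT_LOSS");
  return { markdown: candidate, source: "ai" };
}
```

The checks only ever compare against the baseline, so the gate cannot make the output worse than the offline result: the worst case is the local Markdown plus a warning.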

Extraction Pipeline

 [Messy Web Page HTML]
         │
         ▼
 ┌───────────────┐
 │ ReadabilityJS │ ──➔ (Metadata & clean DOM)
 └───────┬───────┘
         ▼
 ┌───────────────┐
 │  TurndownJS   │ ──➔ (Baseline Markdown)
 └───────┬───────┘
         ▼
 ┌───────────────┐
 │  Local Parser │ ──➔ (Canonical code & table formatting)
 └───────┬───────┘
         ▼
 ┌───────────────┐
 │ AI Validation │ ──➔ [Passed] ──➔ Output polished MD
 │  (Structure   │
 │   Comparison) │ ──➔ [Failed] ──➔ Fall back to Local MD + Warning
 └───────────────┘

Terminal / Capture Event Log

[PromptReady:Capture] Initiating page capture for: https://news.ycombinator.com/
[CAPTURE] Querying browser tabs...
[DOM] Extracting outerHTML (document size: 42.1 KB)
[EXTRACT] Running ReadabilityJS extraction pass...
[MD] Converting DOM to Markdown via TurndownJS...
[PARSER] Canonicalizing code fences and table spacing...
[LOCAL] Local Markdown baseline generated: 1.2 KB (14 paragraphs)
[AI] Skipping optional AI cleanup (no OpenRouter API key provided)
[SUCCESS] Saved clean Markdown to clipboard.

Tech stack

PromptReady is built as a Chrome extension with WXT, React, TypeScript, content scripts, an offscreen processing path, local Markdown processing, and Vitest coverage around the extraction behavior.

The pipeline uses browser capture code at the edge, local extraction and Markdown canonicalization in the core path, and OpenRouter only as an optional BYOK enhancement. That boundary matters: the project should still produce usable Markdown when AI is unavailable, misconfigured, rate-limited, or not trustworthy for a particular page.
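
One way to picture that boundary is a small resolver that always starts from the local Markdown and treats the AI attempt as optional input. The `AiAttempt` variants and the warning code names below are hypothetical, chosen only to illustrate the unavailable, rate-limited, and untrusted cases; the actual network call is left out.

```typescript
// Hypothetical warning codes for the cases where AI was tried but not used.
type WarningCode = "AI_UNAVAILABLE" | "AI_RATE_LIMITED" | "AI_REJECTED_BY_GATE";

// What happened on the optional AI path (illustrative variants).
type AiAttempt =
  | { kind: "not_configured" } // no BYOK key: AI is skipped, not an error
  | { kind: "failed"; reason: "unavailable" | "rate_limited" }
  | { kind: "returned"; markdown: string; faithful: boolean };

interface PipelineOutput {
  markdown: string;
  source: "local" | "ai";
  warning?: WarningCode;
}

// The local Markdown is always available; AI output replaces it only when
// it came back and was judged faithful to the baseline.
function resolveOutput(localMarkdown: string, attempt: AiAttempt): PipelineOutput {
  switch (attempt.kind) {
    case "not_configured":
      return { markdown: localMarkdown, source: "local" };
    case "failed":
      return {
        markdown: localMarkdown,
        source: "local",
        warning: attempt.reason === "rate_limited" ? "AI_RATE_LIMITED" : "AI_UNAVAILABLE",
      };
    case "returned":
      return attempt.faithful
        ? { markdown: attempt.markdown, source: "ai" }
        : { markdown: localMarkdown, source: "local", warning: "AI_REJECTED_BY_GATE" };
  }
}
```

Because every branch returns usable Markdown, a missing key or a failed request changes only the `source` and `warning` fields, never whether the user gets output.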

Key engineering decisions

The most important decision was to make offline Markdown the source of truth.

That led to a few concrete constraints:

Prepare a local Markdown baseline before AI cleanup.
Preserve source metadata and selection identity across the capture and processing path.
Prefer stable warning codes over vague success/failure copy.
Treat AI output as accepted only when it preserves the baseline structure.
Keep technical Markdown repair in shared canonicalization paths, not only in the AI branch.
Use pinned fixtures and focused tests for the failure shapes that are easy to miss manually.

The project also separates capture policy from output polish. Deep capture can help when a page needs scroll and render settling, but it is not a universal fix. For example, some Reddit captures still collapse to source metadata plus a title; that remains an open boundary to investigate in the extraction/selection path, not a problem a UI toggle has solved.

Problems I ran into

The first hard failure was empty or shallow capture. Some pages looked meaningful in the browser but produced HTML that was closer to:

<div id="root"></div>

That forced the project toward rendered capture, settle waits, deep capture policy, and a fixture corpus that could be exercised without live network dependency in the normal test loop.
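
A shallow capture like the empty app shell can be flagged before extraction runs. The sketch below is a rough heuristic under assumptions: it strips tags with regular expressions rather than a real HTML parser, and the 200-character threshold and function names are illustrative. Inside the extension, a DOM-based text measurement would be the more natural choice.

```typescript
// Approximates how much readable text a captured HTML string contains.
// Regex stripping is a heuristic for the sketch, not an HTML parser.
function visibleTextLength(html: string): number {
  const text = html
    .replace(/<(script|style|noscript|template)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
    .replace(/<!--[\s\S]*?-->/g, " ") // drop comments
    .replace(/<[^>]+>/g, " ")         // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return text.length;
}

// Flags captures that are probably an unrendered shell.
// The default threshold is an assumption for the example.
function isShallowCapture(html: string, minTextLength = 200): boolean {
  return visibleTextLength(html) < minTextLength;
}
```

A capture flagged this way could then be retried with a settle wait or deep capture instead of being passed to extraction as if it were a real page.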

The second hard failure was AI output that looked cleaner but was less faithful. It could drop headings, condense structure, or glue technical blocks together:

doctor```The installer writes managed client config.

That changed the AI design. The goal stopped being “make this page nicer” and became “repair and preserve this offline Markdown document.”
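
The glued-fence shape is also something a shared canonicalization step can repair mechanically. Below is a minimal sketch, assuming fences are always exactly three backticks and that triple backticks never appear inline; `isolateFences` is a hypothetical name, not the extension's actual function.

```typescript
const FENCE = "\u0060\u0060\u0060"; // three backticks, written as escapes

// Puts every fence marker on its own line and separates code blocks from
// prose with blank lines. Unbalanced input is returned untouched so a
// later check can flag it instead of guessing.
function isolateFences(markdown: string): string {
  const parts = markdown.split(FENCE);
  // An even number of parts means an odd number of fence markers.
  if (parts.length % 2 === 0) return markdown;

  const out: string[] = [];
  for (let i = 0; i < parts.length; i++) {
    if (i % 2 === 0) {
      // Prose between code blocks.
      const prose = parts[i].replace(/^\s+|\s+$/g, "");
      if (prose.length > 0) out.push(prose);
    } else {
      // Code block: the first line is the info string, the rest is the body.
      const newline = parts[i].indexOf("\n");
      const info = newline === -1 ? "" : parts[i].slice(0, newline).trim();
      const rawBody = newline === -1 ? parts[i] : parts[i].slice(newline + 1);
      const body = rawBody.replace(/^\n+|\s+$/g, "");
      out.push(FENCE + info + "\n" + body + "\n" + FENCE);
    }
  }
  return out.join("\n\n") + "\n";
}
```

Running the same function on the offline baseline and on AI output keeps the repair in one place, and applying it twice gives the same result as applying it once.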

The third boundary is site-specific extraction. Some pages still need better rules or diagnostics when the captured material collapses to metadata and a heading only. I do not want to claim perfect live-site extraction; the honest project state is a stronger local baseline with known remaining edge cases.

Engineering Notes & Lessons Learned

Context Preservation Over Aesthetics: Markdown extraction is a contract of source preservation. A document that looks cleaner is a failure if it drops essential code blocks, structures, or URLs.
Visible Failure Boundaries: Shifting from ad-hoc debugging to a fixed fixture corpus was the inflection point for reliability. Pinned regression cases keep site layout changes and pipeline edits from silently breaking extraction.

Validation Notes

Hydrated DOM Capture: Verified extraction of main article structures from single-page apps (SPAs) and late-hydrated web pages, preventing empty-body page captures.
Structural Integrity Gate: Automatically rejects AI-generated Markdown outputs that drop headings or code blocks, falling back to the local offline baseline.
Fixture Regression Coverage: Tested against a local fixture corpus of 15+ complex document templates to protect extraction behavior against layout changes.
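
A pinned fixture check of this kind can be sketched without any test framework. The `Fixture` shape, the `Extractor` type, and `checkFixture` are hypothetical names for illustration; in the project each fixture would more naturally be a Vitest case, and the extractor would be the real local pipeline rather than the stand-in used in the usage example.

```typescript
// A pinned page plus the structure its Markdown output must keep.
interface Fixture {
  name: string;
  html: string;
  expectHeadings: string[]; // heading text that must survive extraction
  expectSnippets: string[]; // code or config fragments that must survive
}

// Any function that turns captured HTML into Markdown.
type Extractor = (html: string) => string;

// Returns a list of human-readable failures; an empty list means the fixture passed.
function checkFixture(fixture: Fixture, extract: Extractor): string[] {
  const markdown = extract(fixture.html);
  const headings = markdown
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim());

  const failures: string[] = [];
  for (const heading of fixture.expectHeadings) {
    if (headings.indexOf(heading) === -1) {
      failures.push(fixture.name + ': missing heading "' + heading + '"');
    }
  }
  for (const snippet of fixture.expectSnippets) {
    if (markdown.indexOf(snippet) === -1) {
      failures.push(fixture.name + ': missing snippet "' + snippet + '"');
    }
  }
  return failures;
}

// Usage with a stand-in extractor (not the real pipeline):
const demoFixture: Fixture = {
  name: "docs-page",
  html: "<h1>Setup</h1><p>Edit <code>config.json</code>.</p>",
  expectHeadings: ["Setup"],
  expectSnippets: ["config.json"],
};
const demoFailures = checkFixture(demoFixture, () => "# Setup\n\nEdit `config.json`.\n");
console.log(demoFailures.length === 0 ? "fixture ok" : demoFailures.join("\n"));
```

Because the fixture HTML is stored locally, a check like this runs without network access and fails loudly when an extraction change drops a heading or a code fragment.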

What I would improve next

The next work is narrower and more diagnostic: better handling for difficult social and app-heavy pages, clearer capture diagnostics in the UI, more fixture diversity, and a tighter explanation of when deep capture helps versus when extraction needs a site-specific rule.

The product boundary should stay local-first: capture, canonicalize, preserve source structure, optionally improve with AI, and fail back to the offline result when the AI output is not faithful.
