Automation
Why Chrome extension UI automation becomes whack-a-mole
Notes from building AI Studio Prompt Library: scoped MV3 messaging, layered DOM selectors, Angular form events, and user-controlled selector fallbacks.
AI Studio Prompt Library started from a narrow workflow problem: I kept reusing the same system instructions in Google AI Studio, and the product did not give me a reusable prompt library.
The extension itself is intentionally small. It stores prompts locally, exposes a searchable popup and options page, and inserts the selected prompt into AI Studio’s System instructions field. The part that became interesting was not CRUD or storage. It was making insertion reliable inside a third-party Angular/Material single-page app that I do not control.
That kind of UI automation can feel like whack-a-mole because every “simple” DOM assumption has a hidden state boundary behind it.
The naive insertion script
The tutorial version looks like this:
const textarea = document.querySelector("textarea.system-input");
textarea.value = "You are a helpful assistant...";
That is not a real contract for a modern SPA. It can fail in several different ways:
- the System instructions field may not be mounted until the panel is opened
- setting .value can update the DOM node without updating framework form state
- generated Material/CDK class names are weaker than labels, placeholders, and user-provided overrides
- the wrong visible textarea may be the main chat input, not the system field
The fix was to treat the host page as an asynchronous state machine, not as static HTML.
Keep the extension boundary small
The architecture stayed narrow on purpose. The background worker and popup send an INSERT_PROMPT message. The content script owns the host-page automation. Settings and prompt data stay in Chrome storage. The content script is scoped to https://aistudio.google.com/*, and the extension avoids external network calls.
The message contract is simple:
export type MsgInsertPrompt = {
  type: "INSERT_PROMPT";
  prompt: Prompt;
  mode?: "replace" | "append" | "prepend";
};
That boundary matters because it keeps product behavior separate from DOM surgery. The popup does not know how to find Angular fields. The content script does not know how to render the library UI. Each side has one job.
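Both sides of that boundary can be sketched in a few lines. This is a minimal illustration under assumptions, not the extension's actual code: the `Prompt` fields, the `TabsLike` interface (a hand-written slice of `chrome.tabs` so the sketch runs outside an extension), and the names `isInsertPromptMessage` and `sendInsertPrompt` are all hypothetical. The URL check on the sending side is also an assumption; it relies on the tab URL being visible to the extension.

```typescript
// Assumed shapes: the article shows only the MsgInsertPrompt type itself.
type Prompt = { id: string; title: string; body: string };
type InsertMode = "replace" | "append" | "prepend";
type MsgInsertPrompt = { type: "INSERT_PROMPT"; prompt: Prompt; mode?: InsertMode };

const INSERT_MODES: readonly string[] = ["replace", "append", "prepend"];
const AI_STUDIO_ORIGIN = "https://aistudio.google.com/";

// Receiving side (content script): validate before touching the page.
function isInsertPromptMessage(msg: unknown): msg is MsgInsertPrompt {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  if (m.type !== "INSERT_PROMPT") return false;
  const prompt = m.prompt;
  if (typeof prompt !== "object" || prompt === null) return false;
  if (typeof (prompt as Record<string, unknown>).body !== "string") return false;
  return m.mode === undefined ||
    (typeof m.mode === "string" && INSERT_MODES.includes(m.mode));
}

// Hand-written slice of chrome.tabs, so the sender can be exercised with a fake.
interface TabsLike {
  query(q: { active: boolean; currentWindow: boolean }): Promise<Array<{ id?: number; url?: string }>>;
  sendMessage(tabId: number, message: unknown): Promise<unknown>;
}

// Sending side (popup or background worker): only message an AI Studio tab.
async function sendInsertPrompt(tabs: TabsLike, msg: MsgInsertPrompt): Promise<boolean> {
  const [tab] = await tabs.query({ active: true, currentWindow: true });
  const tabId = tab?.id;
  if (tabId === undefined || !tab?.url?.startsWith(AI_STUDIO_ORIGIN)) {
    return false; // No usable tab: let the caller show a hint instead of throwing.
  }
  await tabs.sendMessage(tabId, msg);
  return true;
}
```

In an extension, `chrome.tabs` would be passed as the `tabs` argument. Returning a boolean instead of throwing keeps the popup free to decide how to tell the user that the active tab is not AI Studio.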
Find the field by evidence, not hope
The selector strategy is layered from most intentional to least intentional. A user-provided selector is the escape hatch. Stable semantics come next. Class hints and broad fallbacks are last.
The real locator also has to avoid the main prompt input:
function findSystemTextarea(
  settings: Settings | undefined,
  root: ParentNode = document,
): HTMLTextAreaElement | null {
  // 1. User-provided override: tried first, skipped if invalid or unusable.
  if (settings?.customSelector) {
    try {
      const customEl = root.querySelector(settings.customSelector);
      if (customEl && !looksLikeMainPrompt(customEl) && isVisible(customEl)) {
        return customEl as HTMLTextAreaElement;
      }
    } catch {
      // Invalid user selector: continue to built-in fallbacks.
    }
  }
  // 2. Stable semantics: the accessible label.
  let el = root.querySelector('textarea[aria-label="System instructions"]');
  if (el && !looksLikeMainPrompt(el) && isVisible(el)) {
    return el as HTMLTextAreaElement;
  }
  // 3. Placeholder text.
  el = root.querySelector(
    'textarea[placeholder="Optional tone and style instructions for the model"]',
  );
  if (el && !looksLikeMainPrompt(el) && isVisible(el)) {
    return el as HTMLTextAreaElement;
  }
  // 4. Last resort: the first visible textarea that is not the main prompt.
  const allTextareas = Array.from(root.querySelectorAll("textarea"));
  return (
    allTextareas.find(
      (candidate) => isVisible(candidate) && !looksLikeMainPrompt(candidate),
    ) ?? null // find() yields undefined on a miss; the signature promises null.
  );
}
This is less elegant than one selector. It is also closer to the real failure surface. Third-party UI automation needs a hierarchy of evidence because no single selector deserves full trust.
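The same hierarchy can also be written as data, which makes the order explicit and lets the lookup report which layer matched. The sketch below is an illustrative variation, not the extension's code: `buildStrategies`, `locate`, and the `via` field are hypothetical names, `accept` stands in for the combined `isVisible` and `looksLikeMainPrompt` checks, and the root is typed structurally so the example runs without a DOM. Unlike the locator above, it checks every match of each selector, not only the first.

```typescript
type Strategy = { name: string; selector: string };

// Anything with querySelectorAll works: a Document, an Element, or a test fake.
interface QueryRoot<T> {
  querySelectorAll(selector: string): ArrayLike<T>;
}

// Most intentional first: user override, then semantics, then the broad fallback.
function buildStrategies(customSelector?: string): Strategy[] {
  const builtIn: Strategy[] = [
    { name: "aria-label", selector: 'textarea[aria-label="System instructions"]' },
    {
      name: "placeholder",
      selector: 'textarea[placeholder="Optional tone and style instructions for the model"]',
    },
    { name: "any-textarea", selector: "textarea" },
  ];
  return customSelector
    ? [{ name: "custom", selector: customSelector }, ...builtIn]
    : builtIn;
}

function locate<T>(
  root: QueryRoot<T>,
  strategies: Strategy[],
  accept: (el: T) => boolean,
): { el: T; via: string } | null {
  for (const { name, selector } of strategies) {
    let candidates: T[];
    try {
      candidates = Array.from(root.querySelectorAll(selector));
    } catch {
      continue; // Invalid selector (for example a bad override): try the next layer.
    }
    const hit = candidates.find(accept);
    if (hit !== undefined) return { el: hit, via: name };
  }
  return null;
}
```

Knowing which layer matched is useful mainly for diagnostics: if insertion only ever succeeds through the broad fallback, the labelled selectors have probably drifted.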
Opening the panel is part of insertion
The System instructions field may not exist until the user opens the panel. That means insertion cannot just search once and fail. It has to try to open the panel and then wait for the textarea to mount.
The implementation uses a click plus a MutationObserver wait instead of a fixed sleep:
async function openSystemPanelIfNeeded(settings?: Settings) {
  const present = findSystemTextarea(settings);
  if (present) {
    return { success: true, didClick: false };
  }
  const btn = findSystemButton();
  if (!btn) {
    return { success: false, didClick: false };
  }
  btn.click();
  const ta = await waitForSystemTextarea(5000, settings);
  return { success: !!ta, didClick: true };
}
The wait itself observes document mutations and resolves as soon as the locator finds a usable field:
observer = new MutationObserver(() => {
  const ta = findSystemTextarea(settings);
  if (ta) {
    clearTimeout(timeout);
    observer.disconnect();
    resolve(ta);
  }
});
observer.observe(document.documentElement, {
  childList: true,
  subtree: true,
});
This is the part that turns “button click automation” into an actual state transition. The extension records whether it opened the panel, then can optionally close it again after insertion.
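The observe-then-resolve pattern generalizes to "wait until a probe succeeds or a timeout expires". Here is a minimal sketch of that shape, assuming a hypothetical `waitForValue` helper and a `subscribe` parameter that abstracts the change source so the logic can run outside a browser. It is not the extension's `waitForSystemTextarea`, only one way such a wait could be structured.

```typescript
type Unsubscribe = () => void;
type Subscribe = (onChange: () => void) => Unsubscribe;

// Resolves with the first non-null probe result, or null once timeoutMs elapses.
function waitForValue<T>(
  probe: () => T | null,
  subscribe: Subscribe,
  timeoutMs: number,
): Promise<T | null> {
  return new Promise((resolve) => {
    // Fast path: the value may already be there, so no subscription is needed.
    const existing = probe();
    if (existing !== null) {
      resolve(existing);
      return;
    }

    let settled = false;
    let unsubscribe: Unsubscribe = () => {};
    let timer: ReturnType<typeof setTimeout> | undefined;

    const finish = (value: T | null) => {
      if (settled) return; // Guard against late change notifications.
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      unsubscribe();
      resolve(value);
    };

    timer = setTimeout(() => finish(null), timeoutMs);
    unsubscribe = subscribe(() => {
      const found = probe();
      if (found !== null) finish(found);
    });
    // If subscribe() notified synchronously, finish() ran before the real
    // unsubscribe was assigned, so release the subscription here.
    if (settled) unsubscribe();
  });
}
```

In a content script, `subscribe` would wrap `new MutationObserver(onChange)`, observe `document.documentElement` with `{ childList: true, subtree: true }`, and return `() => observer.disconnect()`. The `probe` would be the locator call. Resolving with `null` on timeout, instead of rejecting, matches how the caller above turns the result into a `success` flag.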
Updating framework state requires events
Setting text is not enough. Angular needs to hear the input/change events that normally come from user typing. The insertion path updates the value, sets cursor position, and dispatches bubbling events:
function applyInsert(
  target: HTMLTextAreaElement | HTMLElement,
  text: string,
  mode: InsertMode = "replace",
) {
  if (isTextarea(target)) {
    const prev = target.value;
    let next = text;
    if (mode === "append") {
      // Separate with a newline unless the existing text already ends with one.
      next = prev ? prev + (prev.endsWith("\n") ? "" : "\n") + text : text;
    } else if (mode === "prepend") {
      // Mirror append: no stray trailing newline when the field is empty.
      next = prev ? text + (text.endsWith("\n") ? "" : "\n") + prev : text;
    }
    target.focus();
    target.value = next;
    // Leave the cursor at the end of the inserted text.
    target.setSelectionRange(target.value.length, target.value.length);
    // Notify the framework the way user typing would.
    target.dispatchEvent(new Event("input", { bubbles: true }));
    target.dispatchEvent(new Event("change", { bubbles: true }));
  }
}
The detail that matters is not just the event dispatch. It is preserving the user-facing insertion modes while still making the host framework accept the update.
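One way to keep the insertion modes easy to verify is to separate string composition from the DOM work, so the mode logic can be tested without a page. A minimal sketch, assuming hypothetical helper names `composeInsert` and `joinWithNewline`; this sketch chooses to return the prompt alone when the field is empty, in both append and prepend modes.

```typescript
type InsertMode = "replace" | "append" | "prepend";

// Joins two chunks with exactly one newline between them, unless one is empty
// or the first already ends with a newline.
function joinWithNewline(first: string, second: string): string {
  if (first === "") return second;
  if (second === "") return first;
  return first.endsWith("\n") ? first + second : first + "\n" + second;
}

// Pure function: computes the next field value from the previous one.
function composeInsert(prev: string, text: string, mode: InsertMode = "replace"): string {
  switch (mode) {
    case "append":
      return joinWithNewline(prev, text);
    case "prepend":
      return joinWithNewline(text, prev);
    default:
      return text; // "replace" discards the previous value.
  }
}
```

The DOM-facing function then only has to assign the composed value, place the cursor, and dispatch the events.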
The escape hatch is a product feature
Layered selectors reduce breakage, but they do not eliminate it. AI Studio can change labels, placeholders, or structure without warning.
That is why the options page includes a custom selector setting. If the built-in locator fails, a user can inspect the current page, save a selector, and make that selector the first lookup path. It is not a glamorous feature, but it turns DOM drift from a blocked release into a local configuration problem.
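Because the override is typed by hand, it helps to reject obviously broken selectors before saving them. The sketch below is an assumption about how an options page could do that, not a description of the extension: `checkCustomSelector` and the `SelectorCheck` type are hypothetical, and the `probe` callback is injected so the check can run without a DOM.

```typescript
type SelectorCheck =
  | { ok: true; selector: string }
  | { ok: false; reason: string };

// probe should throw when the selector is syntactically invalid.
function checkCustomSelector(
  raw: string,
  probe: (selector: string) => unknown,
): SelectorCheck {
  const selector = raw.trim();
  if (selector === "") {
    return { ok: false, reason: "Selector is empty." };
  }
  try {
    probe(selector); // The result is ignored: only syntax is checked here.
  } catch {
    return { ok: false, reason: "Not a valid CSS selector." };
  }
  return { ok: true, selector };
}
```

On a real options page, `probe` could be `(s) => document.createDocumentFragment().querySelector(s)`, which throws on invalid syntax without touching any live page. A syntactically valid selector can still match nothing in AI Studio, so the locator's own try/catch and fallbacks remain necessary.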
The lesson is that resilience is not only code. Sometimes resilience is giving the user a controlled override when the host page changes faster than your release cycle.
The tradeoff
This is more ceremony than an API integration:
- background commands need to target the active tab
- content scripts need scoped host-page responsibilities
- selectors need layered fallbacks
- insertion needs form events, not just value assignment
- overwrite prompts and auto-close behavior need to respect the host page’s modal layers
The tradeoff is worth it for this kind of tool. A prompt library is only useful if insertion feels boring and repeatable. The hidden work is making the browser automation boring enough that the user does not have to think about it.