Ahmed Hamza

Automation

Why Chrome extension UI automation becomes whack-a-mole

Notes from building AI Studio Prompt Library: scoped MV3 messaging, layered DOM selectors, Angular form events, and user-controlled selector fallbacks.


AI Studio Prompt Library started from a narrow workflow problem: I kept reusing the same system instructions in Google AI Studio, and the product did not give me a reusable prompt library.

The extension itself is intentionally small. It stores prompts locally, exposes a searchable popup and options page, and inserts the selected prompt into AI Studio’s System instructions field. The part that became interesting was not CRUD or storage. It was making insertion reliable inside a third-party Angular/Material single-page app that I do not control.

That kind of UI automation can feel like whack-a-mole because every “simple” DOM assumption has a hidden state boundary behind it.

The naive insertion script

The tutorial version looks like this:

const textarea = document.querySelector("textarea.system-input");
textarea.value = "You are a helpful assistant...";

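That is not a real contract for a modern SPA. It can fail in several different ways: the field may not be mounted until its panel is opened, the class name can change or the selector can match the wrong textarea, and assigning value directly never tells Angular that anything changed.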

The fix was to treat the host page as an asynchronous state machine, not as static HTML.

Keep the extension boundary small

The architecture stayed narrow on purpose. The background worker and popup send an INSERT_PROMPT message. The content script owns the host-page automation. Settings and prompt data stay in Chrome storage. The content script is scoped to https://aistudio.google.com/*, and the extension avoids external network calls.

The message contract is simple:

export type MsgInsertPrompt = {
  type: "INSERT_PROMPT";
  prompt: Prompt;
  mode?: "replace" | "append" | "prepend";
};

That boundary matters because it keeps product behavior separate from DOM surgery. The popup does not know how to find Angular fields. The content script does not know how to render the library UI. Each side has one job.
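
A typed contract only helps at compile time, so the content script may also want a runtime check at the boundary. The following is a minimal sketch rather than the extension's actual code: the Prompt fields (id, title, body) and the isInsertPromptMessage helper are hypothetical names chosen for illustration, since the real Prompt type is not shown here.

```typescript
// Hypothetical Prompt shape; the real type is not shown in this post.
type Prompt = { id: string; title: string; body: string };
type InsertMode = "replace" | "append" | "prepend";

type MsgInsertPrompt = {
  type: "INSERT_PROMPT";
  prompt: Prompt;
  mode?: InsertMode;
};

const INSERT_MODES: readonly string[] = ["replace", "append", "prepend"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Runtime guard for messages arriving at the content script.
function isInsertPromptMessage(msg: unknown): msg is MsgInsertPrompt {
  if (!isRecord(msg) || msg.type !== "INSERT_PROMPT") return false;

  const prompt = msg.prompt;
  if (!isRecord(prompt)) return false;
  if (
    typeof prompt.id !== "string" ||
    typeof prompt.title !== "string" ||
    typeof prompt.body !== "string"
  ) {
    return false;
  }

  // mode is optional; when present it must be one of the known modes.
  if (msg.mode === undefined) return true;
  return typeof msg.mode === "string" && INSERT_MODES.indexOf(msg.mode) !== -1;
}
```

A chrome.runtime.onMessage listener in the content script could call this guard first and ignore anything that does not pass, which keeps malformed messages away from the DOM code.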

Find the field by evidence, not hope

The selector strategy is layered from most intentional to least intentional. A user-provided selector is the escape hatch. Stable semantics come next. Class hints and broad fallbacks are last.

The real locator also has to avoid the main prompt input:

function findSystemTextarea(
  settings: Settings | undefined,
  root: ParentNode = document,
): HTMLTextAreaElement | null {
  if (settings?.customSelector) {
    try {
      const customEl = root.querySelector(settings.customSelector);
      if (customEl && !looksLikeMainPrompt(customEl) && isVisible(customEl)) {
        return customEl as HTMLTextAreaElement;
      }
    } catch {
      // Invalid user selector: continue to built-in fallbacks.
    }
  }

  let el = root.querySelector('textarea[aria-label="System instructions"]');
  if (el && !looksLikeMainPrompt(el) && isVisible(el)) {
    return el as HTMLTextAreaElement;
  }

  el = root.querySelector(
    'textarea[placeholder="Optional tone and style instructions for the model"]',
  );
  if (el && !looksLikeMainPrompt(el) && isVisible(el)) {
    return el as HTMLTextAreaElement;
  }

  const allTextareas = Array.from(root.querySelectorAll("textarea"));
  // Array.prototype.find returns undefined on a miss, so normalize to null
  // to match the declared return type.
  return (
    allTextareas.find(
      (candidate) => isVisible(candidate) && !looksLikeMainPrompt(candidate),
    ) ?? null
  );
}

This is less elegant than one selector. It is also closer to the real failure surface. Third-party UI automation needs a hierarchy of evidence because no single selector deserves full trust.
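
The same hierarchy can also be expressed as data, which makes the order explicit and lets the caller see which layer matched. This is a sketch under assumptions, not the extension's implementation: Strategy, buildStrategies, and locate are hypothetical names, the query and accept functions are injected so the logic does not depend on the DOM, and the final scan over all textareas is left out for brevity.

```typescript
type Strategy = { name: string; selector: string };

// Ordered from most intentional to least intentional.
function buildStrategies(customSelector?: string): Strategy[] {
  const builtIn: Strategy[] = [
    {
      name: "aria-label",
      selector: 'textarea[aria-label="System instructions"]',
    },
    {
      name: "placeholder",
      selector:
        'textarea[placeholder="Optional tone and style instructions for the model"]',
    },
  ];
  return customSelector
    ? [{ name: "custom", selector: customSelector }, ...builtIn]
    : builtIn;
}

// Tries each strategy in order and returns the first accepted element,
// together with the name of the layer that produced it.
function locate<T>(
  strategies: Strategy[],
  query: (selector: string) => T | null,
  accept: (el: T) => boolean,
): { el: T; via: string } | null {
  for (const strategy of strategies) {
    let el: T | null = null;
    try {
      el = query(strategy.selector);
    } catch {
      // Invalid selector (for example a bad user override): try the next layer.
      continue;
    }
    if (el !== null && accept(el)) {
      return { el, via: strategy.name };
    }
  }
  return null;
}
```

In the content script, query would wrap root.querySelector and accept would combine isVisible with looksLikeMainPrompt. Returning the layer name is a design choice for diagnostics: if matches start coming from a lower layer, the host page has probably drifted.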

Opening the panel is part of insertion

The System instructions field may not exist until the user opens the panel. That means insertion cannot just search once and fail. It has to try to open the panel and then wait for the textarea to mount.

The implementation uses a click plus a MutationObserver wait instead of a fixed sleep:

async function openSystemPanelIfNeeded(settings?: Settings) {
  const present = findSystemTextarea(settings);
  if (present) {
    return { success: true, didClick: false };
  }

  const btn = findSystemButton();
  if (!btn) {
    return { success: false, didClick: false };
  }

  btn.click();
  const ta = await waitForSystemTextarea(5000, settings);
  return { success: !!ta, didClick: true };
}

The wait itself observes document mutations and resolves as soon as the locator finds a usable field:

observer = new MutationObserver(() => {
  const ta = findSystemTextarea(settings);
  if (ta) {
    clearTimeout(timeout);
    observer.disconnect();
    resolve(ta);
  }
});

observer.observe(document.documentElement, {
  childList: true,
  subtree: true,
});
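
The excerpt above leaves out the surrounding promise and the timeout path. One way to structure the whole wait is sketched below, assuming a generic helper rather than the extension's real waitForSystemTextarea: waitForValue, Subscribe, and Unsubscribe are hypothetical names, and the change source is injected so the logic can run outside a browser.

```typescript
type Unsubscribe = () => void;
// Registers a change callback and returns a function that stops listening.
type Subscribe = (onChange: () => void) => Unsubscribe;

// Resolves with the first non-null probe result, or null after timeoutMs.
function waitForValue<T>(
  probe: () => T | null,
  subscribe: Subscribe,
  timeoutMs: number,
): Promise<T | null> {
  return new Promise<T | null>((resolve) => {
    // Fast path: the value is already there, so no listener is needed.
    const existing = probe();
    if (existing !== null) {
      resolve(existing);
      return;
    }

    let settled = false;
    let stop: Unsubscribe = () => {};
    let timer: ReturnType<typeof setTimeout> | undefined;

    // Single exit point: always clears the timer and stops listening.
    const finish = (value: T | null): void => {
      if (settled) return;
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      stop();
      resolve(value);
    };

    stop = subscribe(() => {
      const found = probe();
      if (found !== null) finish(found);
    });
    if (settled) {
      // The source called back synchronously; release the listener now.
      stop();
      return;
    }
    timer = setTimeout(() => finish(null), timeoutMs);
  });
}

// Browser adapter (assumes a DOM environment):
// const onDomChange: Subscribe = (onChange) => {
//   const observer = new MutationObserver(onChange);
//   observer.observe(document.documentElement, { childList: true, subtree: true });
//   return () => observer.disconnect();
// };
// waitForValue(() => findSystemTextarea(settings), onDomChange, 5000);
```

Routing both outcomes through one finish function means the observer is disconnected on timeout as well as on success, so a failed wait does not leave a listener attached to the page.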

This is the part that turns “button click automation” into an actual state transition. The extension records whether it opened the panel, then can optionally close it again after insertion.
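
The open, insert, and optionally close sequence can be sketched as a small orchestration function. This is an illustration under assumptions: PanelDeps, insertAndRestorePanel, and the closeAfterInsert flag are hypothetical names, and the three steps are injected so the control flow is visible without any DOM code.

```typescript
type OpenResult = { success: boolean; didClick: boolean };

// Hypothetical dependencies; in the content script these would wrap
// openSystemPanelIfNeeded, the insertion routine, and a panel-closing click.
interface PanelDeps {
  openPanel: () => Promise<OpenResult>;
  insert: () => boolean;
  closePanel: () => void;
}

async function insertAndRestorePanel(
  deps: PanelDeps,
  closeAfterInsert: boolean,
): Promise<boolean> {
  const opened = await deps.openPanel();
  if (!opened.success) {
    return false;
  }

  const inserted = deps.insert();

  // Only close a panel that this call opened; a panel the user opened
  // themselves is left alone.
  if (opened.didClick && closeAfterInsert) {
    deps.closePanel();
  }
  return inserted;
}
```

Keying the close step on didClick is what makes the automation restore the page to the state it found, rather than always collapsing the panel.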

Updating framework state requires events

Setting text is not enough. Assigning value from a script fires no events, so Angular never learns that the field changed. It needs to hear the input/change events that normally come from user typing. The insertion path updates the value, sets cursor position, and dispatches bubbling events:

function applyInsert(
  target: HTMLTextAreaElement | HTMLElement,
  text: string,
  mode: InsertMode = "replace",
) {
  if (isTextarea(target)) {
    const prev = target.value;
    let next = text;
    if (mode === "append") {
      next = prev ? prev + (prev.endsWith("\n") ? "" : "\n") + text : text;
    }
    if (mode === "prepend") {
      // Mirror the append branch: skip the separator when the field is empty.
      next = prev ? text + (text.endsWith("\n") ? "" : "\n") + prev : text;
    }

    target.focus();
    target.value = next;
    target.setSelectionRange(target.value.length, target.value.length);
    target.dispatchEvent(new Event("input", { bubbles: true }));
    target.dispatchEvent(new Event("change", { bubbles: true }));
  }
}

The detail that matters is not just the event dispatch. It is preserving the user-facing insertion modes while still making the host framework accept the update.
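
The mode logic is easier to reason about when it is pulled out of the DOM path into a pure function. The sketch below is one possible factoring, not the extension's code: computeNextValue and joinWithNewline are hypothetical names, and it assumes an empty field should receive the text as-is in every mode.

```typescript
type InsertMode = "replace" | "append" | "prepend";

// Joins two pieces with a newline between them, unless the first piece
// already ends with one.
function joinWithNewline(first: string, second: string): string {
  return first + (first.endsWith("\n") ? "" : "\n") + second;
}

// Computes the new field value without touching the DOM.
function computeNextValue(
  prev: string,
  text: string,
  mode: InsertMode = "replace",
): string {
  // Assumption: with nothing in the field, every mode behaves like replace.
  if (mode === "replace" || prev === "") {
    return text;
  }
  return mode === "append"
    ? joinWithNewline(prev, text)
    : joinWithNewline(text, prev);
}
```

With this split, the DOM-facing function only has to focus the field, assign the computed value, and dispatch the events, while the string handling can be unit tested on its own.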

The escape hatch is a product feature

Layered selectors reduce breakage, but they do not eliminate it. AI Studio can change labels, placeholders, or structure without warning.

That is why the options page includes a custom selector setting. If the built-in locator fails, a user can inspect the current page, save a selector, and make that selector the first lookup path. It is not a glamorous feature, but it turns DOM drift from a blocked release into a local configuration problem.
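
Because the override is typed by hand, it helps to reject obviously broken selectors before they are saved. The following is a minimal sketch of such a check, assuming hypothetical names (checkCustomSelector, SelectorCheck) and an injected tryQuery function that throws on invalid syntax the way querySelector does.

```typescript
type SelectorCheck =
  | { ok: true; selector: string }
  | { ok: false; reason: string };

// Validates a user-entered selector before it is stored in settings.
// tryQuery is expected to throw when the selector cannot be parsed.
function checkCustomSelector(
  raw: string,
  tryQuery: (selector: string) => unknown,
): SelectorCheck {
  const selector = raw.trim();
  if (selector === "") {
    return { ok: false, reason: "Selector is empty." };
  }
  try {
    tryQuery(selector);
  } catch {
    return { ok: false, reason: "Selector is not valid CSS." };
  }
  return { ok: true, selector };
}

// In the options page (browser only), a detached fragment can parse the
// selector without touching any live page:
// checkCustomSelector(input, (s) =>
//   document.createDocumentFragment().querySelector(s),
// );
```

This only proves the selector parses. Whether it matches the right field can only be known on the AI Studio page itself, which is why the locator still filters the match through its visibility and main-prompt checks.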

The lesson is that resilience is not only code. Sometimes resilience is giving the user a controlled override when the host page changes faster than your release cycle.

The tradeoff

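This is more ceremony than an API integration: layered selectors, a panel-opening step, a mutation-based wait, synthetic events, and a user-editable override, all to set one field.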

The tradeoff is worth it for this kind of tool. A prompt library is only useful if insertion feels boring and repeatable. The hidden work is making the browser automation boring enough that the user does not have to think about it.