Agentic Web Browsing Workflows with Python and Playwright
TL;DR: Agentic web browsing combines Playwright's headless browser automation with large language models (LLMs) to extract data from dynamic sites without relying on hardcoded CSS selectors. You pass a sanitized version of the rendered DOM to an LLM. The model then decides at runtime how to navigate pages and which elements to interact with, and returns the results as structured JSON.

Many modern web applications do not serve static HTML. They fetch content asynchronously through API calls and render it on the client. They also style it with CSS modules, whose generated class names are often hashed and carry no meaning for a human reader. Traditional web scraping locates specific DOM elements with XPath or CSS selectors. When a site deploys a new build, those class names can change, and selector-based scrapers break.

LLMs change this paradigm. Instead of specifying exactly where the data lives, developers describe what data they want. The LLM acts as the routing layer: it analyzes the current state of the page and decides how to reach and extract the target information. This shifts scraping from a brittle, rule-based approach to an adaptable, semantic one.

Implementing this requires a bridge between the LLM's reasoning and the live web page. Playwright provides the execution environment, and Python orchestrates the logic.

An agentic scraper operates in a continuous loop. It observes the environment, plans an action, executes that action, and repeats until the objective is complete.

The observation phase is critical because LLMs have strict context window limits. The raw HTML of a modern single-page application can easily exceed those limits. Even when it fits, the noise from scripts, styles, and layout markup makes hallucinated output more likely. The DOM must therefore be minimized before it reaches the model, for example by stripping scripts, styles, and hidden elements.

The planning phase uses the LLM's function-calling capabilities. You define a set of available tools, such as click_element(id), type_text(id, text), and extract_data(json_schema). The model reviews the sanitized DOM and selects the appropriate tool along with its arguments.

The execution phase runs the selected tool within the Playwright context. If the model chooses to click a button, Python triggers the Playwright click event,
