The Web’s Next Interface (4/5)

Part 4 of a series on the future of the web

The previous post described agents entering the web as actors, not just intermediaries: systems that pursue goals, take actions, and operate on behalf of users. We also talked about tools like Google’s Disco that regenerate the web itself, consuming content from multiple sites and generating new interfaces on the fly.

That shift surfaces a conundrum for publishers and businesses: when an agent visits your site, or when a generative browser consumes your content alongside competitors, you lose control over how you’re represented.

There’s a response to this. Instead of just exposing static content and hoping the visiting agent interprets it well, the site can present a delegate of its own.

What Is a Site Delegate?

A site delegate is an agent that represents your entity to other agents.

It has access to everything about you: your content, your products, your services, your policies, your constraints. It understands what you offer and how you want to be represented. And instead of waiting passively to be scraped, it actively engages with visiting agents to present information in the most effective way.

Think about what happens today when an agent visits a website. It crawls pages, extracts content, synthesizes what it finds, and moves on. The publisher has no control in that interaction. The content is taken as-is, interpreted according to the agent’s logic, and represented to the user however the agent decides.

Today, our best approach to this is progressive disclosure: reordering content to steer how the agent consumes and synthesizes it, as we discussed in a past post, Agents and The New Internet.

This answer was rendered almost instantly, but was informed by Google visiting 14 sites.

Now imagine the agent arrives and is greeted not by static pages, but by a representative. One that can ask what the visiting agent is looking for. It can present relevant information directly rather than making the agent hunt through pages. It can emphasize what matters, clarify ambiguities, and defend the entity’s positioning.

The directive to the site delegate is something like: you have access to all of this content and information about who we are and what we offer. Represent us in the best possible way. Be quick, because there are latency requirements. But make sure the answer positions us effectively.

Why This Matters

The site delegate solves several problems at once.

It gives publishers control over representation. Instead of hoping an agent interprets your content correctly, you actively shape how your entity is presented. The delegate knows your messaging, your differentiators, your preferred framing. It can adapt that presentation based on what the visiting agent is actually looking for.

It creates efficiency for everyone. A visiting agent trying to understand your catalog doesn’t need to crawl hundreds of pages. It can ask your delegate directly: “What products do you have that match these criteria?” The delegate searches and synthesizes internally, then returns a focused answer.
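To make the "ask the delegate directly" idea concrete, here is a minimal sketch of a delegate answering a criteria query from its own catalog instead of being crawled. Everything in it is an assumption for illustration: the `SiteDelegate` class, the `find_products` method, the product fields, and the example.com URLs are hypothetical, not part of any existing protocol or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price_usd: float
    battery_hours: float
    url: str


class SiteDelegate:
    """Hypothetical delegate that answers criteria queries from its own catalog."""

    def __init__(self, catalog):
        self.catalog = list(catalog)

    def find_products(self, max_price_usd, min_battery_hours):
        # Filter internally, then return only the focused result, cheapest first.
        matches = [
            p for p in self.catalog
            if p.price_usd <= max_price_usd and p.battery_hours >= min_battery_hours
        ]
        return sorted(matches, key=lambda p: p.price_usd)


# Made-up catalog data for the example.
delegate = SiteDelegate([
    Product("AirLite 13", 899.0, 14.0, "https://example.com/airlite-13"),
    Product("ProBook 16", 1499.0, 10.0, "https://example.com/probook-16"),
    Product("Budget 14", 549.0, 6.0, "https://example.com/budget-14"),
])

matches = delegate.find_products(max_price_usd=1000, min_battery_hours=10)
print([p.name for p in matches])
```

The visiting agent makes one call and gets one focused answer; the crawling and filtering stay on the publisher’s side.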

It enables real negotiation. When the interaction is agent-to-agent, there’s an opportunity for back-and-forth that doesn’t exist with static content. The visiting agent can ask clarifying questions. Your delegate can counter with relevant information. Pricing, availability, terms, fit. These become conversational rather than fixed.

It defends against pure extraction. Without a delegate, your site is just data to be harvested. (Ideally, we’re able to address that through things like RSL.) With a delegate, you’re a participant in the conversation.

Note: You’ll see me mention Agent to Agent throughout the remainder of this series. I’m referring to the general concept of two site delegates interacting rather than to Google’s protocol specifically, though I think that protocol is a good starting point for thinking about how this type of implementation could actually work.

Agent-to-Agent Negotiation

The idea of AI agents negotiating with each other sounds like fiction, or dystopia. It’s also something that would have sounded pretty crazy even a few years ago. But the dynamic is actually familiar.

Think about the user who tells their agent to find a laptop under $1,000 with good battery life. The agent traverses the web, discovers relevant merchants, and starts gathering information. At each merchant site, it encounters a site delegate.

The user’s agent presents what it’s looking for. The merchant’s delegate searches its catalog and presents options that fit. The user’s agent might counter: “Can you do better on price? I’m also talking to other vendors.” The merchant’s delegate might respond: “We can offer 10% off for first-time buyers, plus free shipping if you purchase today.”
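The exchange above could be driven by a small set of merchant-defined rules. The sketch below is one way that might look, under assumptions I’m making up for illustration: the `NegotiationRules` fields, the `counter_offer` function, and the floor price are all hypothetical. Prices are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NegotiationRules:
    first_time_discount_pct: int   # e.g. 10 means 10% off
    floor_price_cents: int         # the delegate never offers below this
    free_shipping_same_day: bool   # free shipping if the buyer purchases today


@dataclass(frozen=True)
class Offer:
    price_cents: int
    free_shipping: bool
    note: str


def counter_offer(list_price_cents, rules, first_time_buyer, buying_today):
    """Build the delegate's counter-offer from the merchant's rules."""
    price = list_price_cents
    notes = []
    if first_time_buyer:
        price = price * (100 - rules.first_time_discount_pct) // 100
        notes.append(f"{rules.first_time_discount_pct}% off for first-time buyers")
    # Whatever discounts apply, respect the merchant's floor.
    price = max(price, rules.floor_price_cents)
    free_shipping = rules.free_shipping_same_day and buying_today
    if free_shipping:
        notes.append("free shipping if you purchase today")
    return Offer(price, free_shipping, "; ".join(notes) or "list price")


rules = NegotiationRules(first_time_discount_pct=10,
                         floor_price_cents=80000,
                         free_shipping_same_day=True)
offer = counter_offer(89900, rules, first_time_buyer=True, buying_today=True)
print(offer)
```

The point of the floor is that the delegate can respond to competitive pressure without ever being talked below what the merchant has authorized.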

This is negotiation. It’s also just commerce. Progressive disclosure based on expressed intent. Competitive pressure because the buyer is shopping around. Offers tailored to the specific situation rather than generic pricing.

The difference is that it’s happening between agents rather than humans. But the underlying dynamic, merchants competing for business by presenting their best case, is unchanged. I can also see a real draw to this type of interaction, especially in commerce: personalized, custom discounts could produce a higher conversion rate and, in general, a more profitable business.

Trust in Agent-to-Agent Interactions

How do you trust the other agent?

Competition creates honesty pressure. The user’s agent is shopping around. If your delegate misrepresents inventory or inflates capabilities, the user’s agent will discover the discrepancy when it cross-references with competitors.

Auditability provides verification. The delegate’s responses are grounded in actual content and data. Everything can be traced back to public content: here’s what the underlying information says, here’s why the delegate responded this way. That trail is available both to the consumer of the information and to the person or business behind the delegate.
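One way to picture that traceability: every claim in a delegate’s answer carries the public page and the exact passage it is grounded in, so either side can check it. This is only a sketch under my own assumptions; the `Claim` and `DelegateResponse` shapes, the verbatim-quote check, and the example.com pages are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    text: str         # what the delegate asserted
    source_url: str   # public page the claim is grounded in
    quote: str        # exact supporting passage from that page


@dataclass
class DelegateResponse:
    answer: str
    claims: list[Claim] = field(default_factory=list)


def unsupported_claims(response, public_pages: dict[str, str]) -> list[Claim]:
    """Return claims whose quote does not appear verbatim on the cited page."""
    return [
        c for c in response.claims
        if c.quote not in public_pages.get(c.source_url, "")
    ]


# Made-up public content and a made-up delegate answer.
public_pages = {
    "https://example.com/airlite-13": "The AirLite 13 lasts up to 14 hours on a charge.",
}
response = DelegateResponse(
    answer="The AirLite 13 gets 14 hours of battery and ships free.",
    claims=[
        Claim("14 hours of battery", "https://example.com/airlite-13",
              "up to 14 hours on a charge"),
        Claim("ships free", "https://example.com/shipping",
              "free shipping on all orders"),
    ],
)

flagged = unsupported_claims(response, public_pages)
print([c.text for c in flagged])
```

A verbatim match is a deliberately crude check. A real system would need something more forgiving, but the shape of the audit trail is the part that matters here.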

If a merchant’s website lies about a product today, the merchant is liable. The same is true for a merchant’s delegate. The entity remains responsible for ensuring their representative tells the truth.

Reputation still transfers, too. The trust signals that exist today (domain authority, brand reputation, track record) still matter. Think back to the first article: a lot is changing, but the way the web is structured remains. A delegate from a known, reputable entity carries different weight than one from an unknown source.

Three Paths to a Site Delegate

The turnkey path is for the majority. A small business, a local service provider, an individual with a website. They don’t have resources to build custom AI systems. The delegate is auto-generated from existing content, and further tuned through a conversational interview with the owner. Sensible defaults, minimal configuration. This is the WordPress model applied to agents: democratize access so everyone can participate.

The configured path is for growing businesses with specific needs. Same foundation, but with deeper customization. Set the personality to match your brand. Define negotiation rules. Restrict certain actions, emphasize certain content.

I’m deliberately staying surface-level with these descriptions and not specifying exactly what the interface or product looks like, because I think there’s a lot to discover. However, as we’re seeing with the rise of agent skills, text is the new programming language, so I think a lot of these configurations will simply be generated from long-form user interviews (facilitated by AI, of course).

The custom path is for enterprises and developers with specific requirements. Build your own delegate using the same open protocols. Complete control over behavior, logic, and presentation. Funnily enough, this is currently the most accessible path for many, given how fast things are changing.

The critical point across all three: the protocols need to be open. If site delegates only work with one platform or one AI provider, we’ve just created a new walled garden.

Modality Collapse

When the interface becomes agent-to-agent, the wall between modalities doesn’t just blur. It collapses entirely.

**Modalities** are formats like text, video, and audio, and the lines between them are already beginning to blur.

Think back to the first post in this series. The web started as digital representations of physical things. A business existed in the real world. It created a website to tell people about itself. You would hear about the website, type in the URL, and visit. The digital was a mirror of the physical.

Over time, the digital world became its own thing. Digital-native businesses, digital-native content, digital discovery through search engines. The web became less a representation of the physical world and more a parallel world with its own logic.

What’s emerging now is the digital and physical starting to merge back together, with presentation adapting fluidly to context.

When a delegate can present information dynamically, the format becomes a choice. The same underlying content can be rendered as a webpage for someone at a desktop. A voice summary for someone driving. A structured data feed for an agent doing comparison shopping. A short video explanation for someone who doesn’t want to read.
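Here is a small sketch of "format becomes a choice": one content record, three renderings. The `render` function, the modality names, and the listing fields are assumptions of mine for illustration, and a real delegate would obviously do far more than string formatting.

```python
import html
import json


def render(content: dict, modality: str) -> str:
    """Render one content record in the requested modality (hypothetical names)."""
    if modality == "web":
        # Escaped HTML for a human at a desktop.
        return (f"<article><h1>{html.escape(content['name'])}</h1>"
                f"<p>{html.escape(content['summary'])}</p></article>")
    if modality == "voice":
        # A short spoken summary for someone driving.
        return (f"{content['name']}. {content['summary']} "
                f"It costs {content['price_usd']} dollars.")
    if modality == "agent":
        # Structured data for an agent doing comparison shopping.
        return json.dumps(content, sort_keys=True)
    raise ValueError(f"unsupported modality: {modality}")


listing = {
    "name": "AirLite 13",
    "summary": "A light laptop with 14 hours of battery.",
    "price_usd": 899,
}

for modality in ("web", "voice", "agent"):
    print(render(listing, modality))
```

The underlying record never changes. Only the last step, the presentation, depends on who is asking and where they are.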

But it goes further. The content could be rendered as a virtual environment. You could enter into a world created by the website itself. Explore a product in 3D space. Walk through a property listing. Experience a conference venue before deciding to attend. The delegate doesn’t just describe; it creates an explorable representation.

One example: https://marble.worldlabs.ai/. Meta and Apple are also investing heavily in research here and have the hardware to back it.

We will meet the user where they want to be met. With the information that they need.

This is already how some people interact with complex content. You find a post, you want to understand it deeply, so you put it into ChatGPT and have a conversation about it. You ask questions, explore tangents, go back and forth until you’ve really absorbed the material. That conversational exploration is a modality. Now imagine that conversation happens inside a spatial environment where the concepts are visualized, where you can manipulate and explore rather than just read.

The physical world comes back into the equation too. If you’re physically located at a business, at a conference, at an event, the delegate can know that. It can present information relevant to your physical context. You’re standing in front of a product; the delegate can tell you about it. You’re at a venue; the delegate can guide you through it.

This is the full loop. Physical reality created the original web as its digital mirror. The digital world developed its own existence. Now the digital can project back into physical reality, augmented by AI that understands both the content and the context.

The User Side of Modality Collapse

There’s a user side to this transformation as well. Tools like Google’s Disco do something similar from the browser’s perspective. They take content from multiple sites, consume it as data, and generate new interfaces that the user interacts with instead of the original pages.

When a user opens five tabs researching a purchase and Disco synthesizes those into a custom comparison app, your website’s carefully designed product page becomes one data source among several. The user never sees your layout, your messaging, your brand. They see a generatively created interface that consumed your content alongside your competitors’.

This makes site delegates more important, not less. If your content is going to be consumed and transformed by generative browsers, having a delegate helps ensure it’s represented accurately in that transformation. The delegate can provide structured data that’s harder to misinterpret. It can expose the relationships and priorities that matter. It can shape what gets fed into the synthesis rather than leaving it to chance.

The site delegate and the agentic browser are two sides of the same shift. One represents the entity to agents. The other represents the user to the web. Where they meet is where the new interface forms.

The Emerging Infrastructure

This isn’t speculative. The protocols are being built.

MCP (Model Context Protocol) provides a standard for how agents discover and invoke capabilities. Originally created by Anthropic, it’s now under the Linux Foundation with support from OpenAI, Google, Microsoft, and others. Over 10,000 MCP servers are already active, and WordPress supports this in a Core-Canonical way.

Agent2Agent (A2A) is being developed specifically for agent-to-agent communication. It complements MCP: where MCP handles tool access, A2A handles the conversational layer between agents.
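For a sense of what "invoking a capability" looks like on the wire: as I understand the MCP specification, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool’s name and its arguments. The sketch below only builds that message. The `search_catalog` tool and its arguments are hypothetical, and no transport or server is involved.

```python
import json
from itertools import count

# Simple incrementing ids so each request can be matched to its response.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_catalog" is a made-up tool a site delegate might expose.
request = build_tool_call(
    "search_catalog",
    {"max_price_usd": 1000, "min_battery_hours": 10},
)
print(json.dumps(request, indent=2))
```

A site delegate on the turnkey or configured path would likely never see this layer. It matters mostly for the custom path, where you are wiring the protocol up yourself.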

These protocols are open. Multiple implementations exist. The ecosystem is forming around shared standards rather than proprietary lock-in. Of course, it will take time.

The site delegate isn’t replacing the website. It’s extending it. Human visitors still matter. But alongside them, a new interface emerges. One optimized for agent interactions. Where the entity actively represents itself rather than passively waiting to be interpreted. Where negotiation and adaptation happen in real time. Where the modality of presentation becomes fluid.

Next: What This Means for WordPress