Google to explore alternatives to robots.txt in wake of generative AI and other emerging technologies

Google is exploring alternatives or supplemental ways of controlling crawling and indexing beyond the 30-year-old robots.txt protocol standard. “We believe it’s time for the web and AI communities to explore additional machine-readable means for web publisher choice and control for emerging AI and research use cases,” Google wrote.

Engaging with the community. Google said it is inviting members of the web and AI communities to discuss a new protocol. Google said it is “kicking off a public discussion,” inviting a “broad range of voices from across web publishers, civil society, academia and more fields from around the world” to join the discussion.

Timing. Google said it will be convening those interested in participating over “the coming months.” So nothing is happening too soon and nothing is changing tomorrow.

Paywalled content issue. Recently, OpenAI disabled the Browse with Bing feature in ChatGPT after it was found accessing paywalled content without publisher permission. This may be one of the many reasons Google is looking for alternatives to the robots.txt protocol.

Why we care. We have all become accustomed to controlling bot access to our websites using robots.txt and, more recently, newer forms of structured data. But we may be looking at new methods in the future. What those methods and protocols will look like is unknown right now, but the discussion is happening.
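For context, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page today, using Python’s standard-library urllib.robotparser. The crawler token “ExampleAIBot” and the rules shown are hypothetical illustrations, not part of any announced proposal.

```python
# A minimal sketch (not from Google's announcement) of how a crawler can
# honor robots.txt today using Python's standard-library parser. The
# crawler token "ExampleAIBot" and the rules below are hypothetical.
from urllib import robotparser

SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.modified()  # record that rules have been loaded
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# A well-behaved crawler checks each URL against the rules before fetching.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))    # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))    # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/x"))  # False
```

Whatever supplements or replaces robots.txt would presumably need to give publishers at least this level of machine-readable choice over which crawlers can access which content.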

