Cloudflare to block AI crawlers
Cloudflare is the first Internet infrastructure provider to block AI crawlers from accessing content without permission or compensation by default.
Thanks to Cloudflare’s new default setting, website owners can now choose whether to allow AI crawlers to access their content and decide how AI companies use it. Coming from one of the first Internet infrastructure providers to block AI crawlers, the option is seen as a big win for content creators, especially news and media organizations, which continue to have concerns about their content being used to train AI models.
The new setting also allows AI companies to clearly state their purpose when accessing such content. This includes stating whether their crawlers are used for training, inference, or search, so that website owners can decide which crawlers to allow. Cloudflare's new default setting is the first step toward a more sustainable future for both content creators and AI innovators.
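One way to picture the purpose-based choice described above is a small allow list keyed by a crawler's declared purpose. The following is a minimal sketch only: the names CrawlerPurpose, SitePolicy and is_allowed are hypothetical and do not correspond to any Cloudflare API.

```python
from dataclasses import dataclass, field
from enum import Enum


class CrawlerPurpose(Enum):
    """The three purposes named in the article; the enum itself is illustrative."""
    TRAINING = "training"
    INFERENCE = "inference"
    SEARCH = "search"


@dataclass
class SitePolicy:
    """Hypothetical per-site policy: the purposes the owner has chosen to allow."""
    allowed: set = field(default_factory=set)

    def is_allowed(self, purpose: CrawlerPurpose) -> bool:
        # Any purpose the owner has not explicitly allowed is treated as blocked.
        return purpose in self.allowed


# Example: an owner who welcomes search crawlers but nothing else.
policy = SitePolicy(allowed={CrawlerPurpose.SEARCH})
print(policy.is_allowed(CrawlerPurpose.SEARCH))    # True
print(policy.is_allowed(CrawlerPurpose.TRAINING))  # False
```

In this sketch an empty policy blocks every purpose, which mirrors the block-by-default idea.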
With generative AI now able to produce ever more content, one of the biggest concerns is the originality of what it produces. News organizations, musicians and writers continue to take legal action against some AI companies that train their models on creators' content without permission. The other concern is compensation for the content used for training.
“If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone – creators, consumers, tomorrow’s AI founders, and the future of the web itself,” said Matthew Prince, co-founder and CEO of Cloudflare.
“Original content is what makes the Internet one of the greatest inventions in the last century, and it's essential that creators continue making it. AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate. This is about safeguarding the future of a free and vibrant Internet with a new model that works for everyone,” added Prince.
Pay Per Crawl
According to Cloudflare, starting today, a group of leading publishers and content creators can also set a price for AI companies to access their content through a feature called Pay Per Crawl. This feature will let creators control access and get paid, ensuring AI companies can use quality content the right way.
Specifically, Pay Per Crawl allows publishers to block all AI crawlers, allow specific ones, charge for access, or grant free access. They will have full control over how their content is accessed. Meanwhile, AI crawler owners will be able to use Pay Per Crawl to register, see pricing, and choose to pay or walk away.
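The choices on both sides can be sketched as a simple decision function. This is an illustration under stated assumptions, not Cloudflare's implementation: the names Access, Rule, Publisher and crawl_decision, the crawler names, and the prices are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    BLOCK = "block"
    FREE = "free"
    CHARGE = "charge"


@dataclass
class Rule:
    access: Access
    price_usd: float = 0.0  # only meaningful when access is CHARGE


@dataclass
class Publisher:
    """Hypothetical publisher settings: a default rule plus per-crawler overrides."""
    default: Rule
    per_crawler: dict = field(default_factory=dict)

    def rule_for(self, crawler: str) -> Rule:
        # A crawler-specific rule wins; otherwise the default applies.
        return self.per_crawler.get(crawler, self.default)


def crawl_decision(publisher: Publisher, crawler: str, max_price_usd: float) -> str:
    """Return what the crawler does, given the most it is willing to pay."""
    rule = publisher.rule_for(crawler)
    if rule.access is Access.BLOCK:
        return "blocked"
    if rule.access is Access.FREE:
        return "fetch for free"
    # Access is CHARGE: the crawler sees the price and pays or walks away.
    return "pay and fetch" if rule.price_usd <= max_price_usd else "walk away"


publisher = Publisher(
    default=Rule(Access.BLOCK),  # block all AI crawlers unless listed below
    per_crawler={
        "search-bot": Rule(Access.FREE),
        "training-bot": Rule(Access.CHARGE, price_usd=0.01),
    },
)
print(crawl_decision(publisher, "search-bot", 0.0))      # fetch for free
print(crawl_decision(publisher, "training-bot", 0.05))   # pay and fetch
print(crawl_decision(publisher, "training-bot", 0.001))  # walk away
print(crawl_decision(publisher, "unknown-bot", 1.0))     # blocked
```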
Beyond safeguarding their content, publishers and content creators can also monetize it, as Pay Per Crawl enables them to unlock new income by licensing content to AI companies through a simple platform.
Cloudflare currently powers one of the world’s largest networks, helping to manage and protect traffic for 20% of the web. The company handles trillions of requests daily, a scale it says gives it the world’s most advanced bot management solutions, which accurately distinguish between human users and AI crawlers. In September 2024, Cloudflare introduced the option to block AI crawlers with a single click. More than one million customers have since chosen this option, which is meant to be an aggressive but easy solution that halts scraping while they determine their AI strategy.
The latest update enables a permission-based model for AI crawlers. AI companies will now be required to obtain explicit permission from a website before scraping it. Upon sign-up with Cloudflare, the owner of every new domain will be asked whether they want to allow AI crawlers, giving customers the choice upfront to explicitly allow or deny AI crawlers access.
This significant shift means that every new domain starts with control as the default, eliminating the need for website owners to manually configure their settings to opt out. Customers can easily check their settings and enable crawling at any time if they want their content to be freely accessed.
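The shift from opt-out to opt-in can be shown with a default value. The sketch below is hypothetical: DomainSettings, allow_ai_crawlers and should_serve are illustrative names, and it assumes that traffic not identified as an AI crawler is unaffected by the setting.

```python
from dataclasses import dataclass


@dataclass
class DomainSettings:
    """Hypothetical settings record for a newly added domain."""
    name: str
    allow_ai_crawlers: bool = False  # blocked unless the owner opts in


def should_serve(settings: DomainSettings, is_ai_crawler: bool) -> bool:
    # Assumption: only requests identified as AI crawlers are gated by the setting.
    return (not is_ai_crawler) or settings.allow_ai_crawlers


site = DomainSettings("example.com")
print(should_serve(site, is_ai_crawler=True))   # False: new domains start blocked
site.allow_ai_crawlers = True                   # the owner enables crawling later
print(should_serve(site, is_ai_crawler=True))   # True
```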
At the same time, Cloudflare has also proposed new ways for AI bots to authenticate themselves and for websites to identify those bots. This gives creators and website owners new identification mechanisms and control over which crawlers they want to allow. Cloudflare is participating in the development of a new protocol to provide bot owners and AI agent developers with a public, standard way to identify themselves.