HF Dev team announce the launch of Deploy on Cloudflare Workers AI today, a new Hugging Face Hub integration. This integration simplifies the use of open models as a serverless API. It relies on advanced GPUs in Cloudflare’s edge data centers. They start integrating popular open models from Hugging Face into Cloudflare Workers AI. These integrations leverage their production solutions, like Text Generation Inference.
Developers can create strong Generative AI applications with Deploy on Cloudflare Workers AI. They avoid managing GPU infrastructure and servers. The operating cost is low; they pay only for the compute used.
Generative AI for Developers
The new experience builds on the strategic partnership announced last year. It aims to make it easier to access and deploy open Generative AI models. A major issue for developers and organizations is the limited availability of GPUs. There are also high costs involved in deploying servers for development. Deploy on Cloudflare Workers AI offers a simple, affordable solution. It provides serverless access to popular Hugging Face Models with pay-per-request pricing.
Consider a specific example. You’re developing an RAG Application that receives around 1000 requests daily. It processes an input of 1k tokens and produces 100 tokens using Meta Llama 2 7B. The daily production costs for LLM inference would be about $1.

John Graham-Cumming, CTO of Cloudflare, expressed excitement about the rapid realization of this integration. He highlighted the benefits of giving developers access to Cloudflare’s global network of serverless GPUs. Coupled with Hugging Face’s most popular open source models, this move is expected to spark widespread innovation across their global community.
HuggingFace + Cloudflare: How it works
Using Hugging Face Models on Cloudflare Workers AI is straightforward. Here are detailed steps to use Hermes 2 Pro on Mistral 7B, the latest model by Nous Research. All available models are listed in the Cloudflare Collection. Remember: Access to a Cloudflare Account and API Token is necessary.
The “Deploy on Cloudflare” option is visible on the pages of all available models, such as Llama, Gemma, or Mistral.

Open the “Deploy” menu and choose “Cloudflare Workers AI.” This action opens an interface with instructions for using the model and sending requests.
If the “Cloudflare Workers AI” option is missing for your chosen model, it’s not supported yet. Efforts are ongoing to increase model availability in collaboration with Cloudflare. For requests, contact api-enterprise@huggingface.co.

The integration offers two methods: the Workers AI REST API or the Cloudflare AI SDK. Choose your preferred option and paste the code into your setup. Also, ensure that the ACCOUNT_ID and API_TOKEN variables are set when using the REST API.
Now, you’re ready to send requests to Hugging Face Models on Cloudflare Workers AI. Remember to use the prompt and template the model expects.
Read related articles:

