Centralized data infrastructure violates Web3’s core principle of decentralization

Open data must transition to decentralized infrastructure to realize its full potential and reap the benefits of affordable LLM training, accessible research data sharing and unstoppable DApp hosting.

Opinion by: Michael O’Rourke, founder of Pocket Network and CEO of Grove

Open data is a major contributor to the emerging global tech economy, with an estimated market value of over $350 billion. Many open data sources, however, rely on centralized infrastructure, contrary to Web3’s philosophy of autonomy and censorship resistance.

To realize its potential, open data must shift to decentralized infrastructure. Once open data channels run on decentralized, open infrastructure, many of the vulnerabilities facing user applications will be addressed.

Open infrastructure has many use cases, from hosting a decentralized application (DApp) or a trading bot to sharing research data to training and running inference on large language models (LLMs). Looking closely at each helps us understand why decentralized infrastructure is more practical for open data than centralized infrastructure.

Affordable LLM training and inference 

The launch of DeepSeek’s open-source AI model, which wiped an estimated $1 trillion from US tech markets, demonstrates the power of open-source protocols. It’s a wake-up call to focus on the new world economy of open data.

To begin with, closed-source, centralized AI models carry high costs both for training LLMs and for generating accurate results.

By contrast, the final stage of training DeepSeek R1 cost only about $5.5 million, compared with over $100 million for OpenAI’s GPT-4. Yet the emerging AI industry still relies on centralized infrastructure platforms, such as LLM API providers, which are fundamentally at odds with open-source innovation. 

Hosting open-source LLMs like Llama 2 and DeepSeek R1 is simple and inexpensive. Unlike stateful blockchains requiring constant syncing, LLMs are stateless and only need periodic updates. 


Hosting may be simple, but the computational cost of running inference on open-source models is still high, as node runners need GPUs. Even so, these models remain cheaper to operate than stateful blockchain infrastructure because they don’t require continuous real-time syncing.

The rise of generalizable base models like GPT-4 has enabled the development of new products through contextual inference. Centralized companies like OpenAI, however, do not allow arbitrary third-party networks to host or serve inference from their trained models.

On the contrary, decentralized node runners can support the development of open-source LLMs by serving as AI endpoints to provide deterministic data to clients. Decentralized networks lower entry barriers by empowering operators to launch their gateway on top of the network.

These decentralized infrastructure protocols serve millions of requests on their permissionless networks by open-sourcing the core gateway and service infrastructure. Consequently, any entrepreneur or operator can deploy their gateway and tap into an emerging market.
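The gateway pattern described above can be illustrated with a minimal sketch. The example below is purely hypothetical and not the API of any specific network: the “node runners” are stand-in functions, whereas in a real deployment they would be independent operators on a permissionless network. The gateway fans the same request out to several nodes and only returns a response that a quorum of them agree on, which is one simple way to provide deterministic data to clients without trusting any single operator.

```python
from collections import Counter

def gateway_query(prompt, node_runners, quorum=2):
    """Hypothetical gateway sketch: relay the same prompt to every
    node runner and return the answer that at least `quorum` nodes
    agree on, or None if no quorum is reached."""
    responses = [node(prompt) for node in node_runners]
    answer, count = Counter(responses).most_common(1)[0]
    return answer if count >= quorum else None

# Simulated node runners: two honest operators and one faulty one.
honest_a = lambda p: f"echo:{p}"
honest_b = lambda p: f"echo:{p}"
faulty = lambda p: "garbage"

result = gateway_query("hello", [honest_a, honest_b, faulty])
# Two of three nodes agree, so the gateway returns their answer.
```

A production gateway would add request routing, operator incentives, and cryptographic verification of responses, but the core idea is the same: the client talks to one endpoint while the network behind it stays permissionless.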
