How can Web 3.0 ever hope to advance if it is dependent upon centralized data? This is something that goes against the core of what is sought.
We have long discussed Web 3.0 and the challenges it faces. That it is struggling is no surprise. Ultimately, it comes down to infrastructure; without that, little will be accomplished.
The explosion of AI, specifically generative AI such as LLMs, helped people realize the value of data. Most knew data mattered, yet few had insight into exactly how much it was sought after.
Over the last couple of years, this was driven home. It is no surprise that Google, X, and Meta are three of the leading model developers. They have access to years' worth of data, with billions of tokens added each day.
As we can see, this is a feedback loop that improves their systems faster than anyone else can match.
Web 3.0 has a chance to step up in the open data market.
Image generated by Ideogram
Hive: Needs To Tap Into The $350 Billion Open Data Market
We are talking about a lot of money. When it comes to data, the denomination is billions. The open data market has an estimated value of $350 billion. This is only going to keep growing.
The problem is the dependency upon centralized infrastructure; hence the question at the top of this article. How will this reconcile?
My view is that it doesn't. For this reason, a transition is needed.
Enter blockchain. These networks are ideal for data storage. The infrastructure is already decentralized.
To realize its potential, open data must shift to decentralized infrastructure. Once open data channels start using a decentralized and open infrastructure, multiple vulnerabilities for user applications will be solved.
This would change the entire structure of AI models and, more importantly, inference. Even open-source models, when hosted on centralized infrastructure, require GPUs for inference, which gets very costly. Even though costs are plummeting, there is still a major barrier to entry.
In other words, the little guy is shut out. Thus we need another solution.
By contrast, decentralized node runners can support the development of open-source LLMs by serving as AI endpoints that provide deterministic data to clients. Decentralized networks lower entry barriers by empowering operators to launch their own gateway on top of the network.
Obviously, decentralized computing is something that is being explored within the Web 3.0 realm. The progress made there, however, is going to come back to the data available. Reliance on centralized infrastructure is not going to work out in the end.
Vulnerability From Centralization
The leading data hosting platforms are Amazon Web Services, Google Cloud and Microsoft Azure. This is where many applications relating to Web 3.0 are built. The reason is the familiarity by developers and the ease of use.
Issues arise when these platforms decide to shut operations down. It happened to Metamask in 2022. The companies that operate these systems are in complete control.
We see an obvious point of vulnerability. At any time, applications can be cut off from the data. This is the danger of third party providers. Open APIs are vital.
Blockchain offers a solution because anyone can set up a node. Hive data, for example, can be acquired by anyone, and an API node can be fired up at any time. Other node types are also permissionless: one can run a node for block validation or operate a layer 2 (called HAF) for dataset management.
The key here is that applications that rely upon this data cannot be cut off. Full access is always available since the application team is also in control of the API, which pulls data from a permissionless database.
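A minimal sketch of what this permissionless access looks like in practice: every Hive API node speaks the same JSON-RPC 2.0 protocol, so an application can point at any public node, or its own. The `condenser_api.get_dynamic_global_properties` method is a real Hive API call; the endpoint URL and the `build_rpc_request` helper name are illustrative choices, not a prescribed client library.

```python
import json

# One of several community-run public Hive API nodes; a team running its own
# node would simply substitute its own URL here (illustrative endpoint choice).
HIVE_API_URL = "https://api.hive.blog"

def build_rpc_request(method: str, params=None, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request understood by any Hive API node."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params if params is not None else [],
        "id": request_id,
    })

# Because every node speaks the same protocol, swapping providers (or moving
# to a self-hosted node) requires no code changes beyond the URL:
payload = build_rpc_request("condenser_api.get_dynamic_global_properties")
# To send it: POST `payload` to HIVE_API_URL with a
# Content-Type: application/json header (e.g. via urllib.request).
```

The point is not the few lines of code but the trust model: no API key, no account with a provider, and no single party that can revoke access, since the same request works against any node in the network.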
> Decentralized infrastructure is trustless, distributed, cost-effective and censorship-resistant. As a result, decentralized infrastructure will be the default choice for app developers and companies alike, leading to a mutually beneficial growth narrative.
[Source](https://cointelegraph.com/news/centralized-data-infrastructure-violates-web3)
As always, the question is who will be involved in this process?
Data Is The New Oil
By now, we have all heard this statement. Naturally, data cannot operate in a vacuum.
For Web 3.0 to truly flourish, data must be fed over open infrastructure. Reliance upon centralized corporations like Big Tech is a dangerous game. At any time, they can simply shut things down.
Open infrastructure has many use cases, from hosting a decentralized application (DApp) or a trading bot to sharing research data to training and inference of large language models (LLMs).
The success of DeepSeek shook markets around the world. Here was a model with a completely different cost structure compared to what others were spending.
While there is debate about the true costs, the structural changes in the process are evident. It is believed the dependence upon GPUs was greatly reduced (if not eliminated).
Blockchain can offer an alternative. The cost of running inference can be reduced, something that is going to be evident over the next few years. Distributed computing, a field that is growing, is only of benefit if there is data accessibility. Training an application on a system like this can be thwarted if the datasets are housed on Amazon.
There is another old saying: information wants to be free.
Our present Internet is a walled, siloed system. This is something we can start to address. Open data tears down the walls, providing access to everyone.
If we are moving towards a future where AI takes over most tasks (read jobs), then a replacement is needed. There is a growing movement of people talking about Universal Basic AI (UBAI) as a solution.
Of course, for this even to be remotely possible, we need open data or else the dependence is upon centralized systems like Big Tech or governments.
The open data market is growing and a blockchain like Hive has the potential to leverage this.