{
"labels": [],
"body": {
"proposal_body_version": "V1",
"name": "NEAR Data Integration into KYVE’s Decentralized Data Lake",
"description": "# NEAR Data Integration into KYVE’s Decentralized Data Lake\n\n**Description:**\n\nThis is a combined proposal for the following RFPs: [NEAR Lake](https://dev.near.org/infrastructure-committee.near/widget/app?page=rfp&id=3), [BigQuery, and Data Pipelines](https://dev.near.org/infrastructure-committee.near/widget/app?page=rfp&id=4). It also includes a **Trustless** **API** to query NEAR historical data, with the long-term vision of decentralizing the NEAR data infrastructure and turning NEAR data into a public good.\n\n## **Project Abstract**\n\nAs NEAR’s data infrastructure undergoes significant changes with Pagoda winding down, there is a **growing need for decentralized and validated data solutions** to ensure secure and scalable access to NEAR’s blockchain data. This proposal outlines a phased approach to integrate NEAR’s blockchain data into KYVE’s Web3 data lake, enabling validated and permanently stored data streams for developers, analysts, and businesses within the NEAR ecosystem.\n\n**What is KYVE?**\n\nKYVE, a data validation and transfer protocol, enables data providers to standardize, validate, and permanently store data streams. It leverages permanent data storage solutions like Arweave and uses its own Layer-1 Blockchain, ensuring the scalability, immutability, and availability of archived data over time.\n\nKYVE's Network is powered by decentralized validators, which are rewarded in $KYVE and other tokens ([https://docs.kyve.network/](https://docs.kyve.network/)).\n\n## **Objectives**\n\nThe goal of this integration is to create a robust and sustainable data infrastructure for the NEAR ecosystem. Unlike centralized solutions, KYVE relies on a network of decentralized providers to store, validate, and serve blockchain data. 
This reduces the risk of single points of failure and removes dependencies on centralized services.\n\nThe integration of NEAR’s blockchain data into KYVE's decentralized data lake ensures long-term availability, immutability, and verifiability. By utilizing decentralized storage providers like Arweave, NEAR data becomes a public good, eliminating the need for centralized services and reducing risks. KYVE’s built-in tools, including the Trustless API and Data Load Tool, provide seamless access to validated data, future-proofing NEAR’s data infrastructure.\n\n## **Outcomes**\n\nThe integration of NEAR’s blockchain data into KYVE’s decentralized data lake will deliver significant benefits across three major products: KYVE Data Lake, Data Load Tool, and Trustless API.\n\n**1. KYVE’s Decentralized Data Lake ([https://app.kyve.network/#/sources](https://app.kyve.network/#/sources))**\n\nNEAR’s blockchain data is validated by KYVE and stored on Arweave, ensuring permanent, decentralized storage. Validators are rewarded in $NEAR for maintaining data accuracy, reducing reliance on centralized services and securing the long-term availability of NEAR data as a public good.\n\n**2. Data Load Tool Support ([https://docs.kyve.network/access-data-sets/data-pipeline/overview](https://docs.kyve.network/access-data-sets/data-pipeline/overview))**\n\nKYVE’s Data Load Tool streamlines transferring NEAR blockchain data to external warehouses like BigQuery and Postgres, removing the need for custom pipelines. With Load and Sync commands, data is kept fresh and accessible for analysis. Additional steps enable full schematization for deeper insights.\n\n**3. Trustless API ([https://docs.kyve.network/access-data-sets/trustless-api/overview](https://docs.kyve.network/access-data-sets/trustless-api/overview))**\n\nKYVE’s Trustless API offers verifiable, decentralized access to historical and live NEAR data (2-minute latency), secured by Merkle Proofs. 
Developers can easily integrate data into applications via well-known endpoints. The open-source tool enables independent deployment for scalability without stressing node infrastructure.\n\n## **Project Team**\n\n*This is a joint proposal by KYVE Foundation & BCP Innovations, combining deep expertise in decentralized data infrastructure with a strong commitment to driving innovation in Web3 technologies. Together, we are committed to creating a secure, decentralized, and robust data ecosystem for the NEAR blockchain through KYVE’s cutting-edge solutions.*\n\n**KYVE Foundation**\n\nThe KYVE Foundation supports the development and growth of KYVE’s decentralized data infrastructure, with a strong emphasis on decentralization, security, and data integrity. Backed by key industry players, KYVE ensures secure and scalable data solutions for leading blockchain ecosystems like Cosmos Hub, Celestia, Archway, dYdX, and others.\n\n**BCP Innovations**\n\nBCP Innovations is a German Web3 development company specializing in blockchain infrastructure and decentralized applications. As the team behind KYVE, BCP Innovations delivers secure data validation and permanent storage. 
For the NEAR-KYVE integration, BCP will build secure data pipelines to ensure NEAR's blockchain data remains available and immutable.\n\n**Project Team:**\n\n**Fabian Riewe:** BCP CEO and KYVE Founder, [https://github.com/fabianriewe](https://github.com/fabianriewe) - [fabian@kyve.network](mailto:fabian@kyve.network)\n\n**Maximilian Breithecker:** Data Engineer and Blockchain Developer, [https://github.com/mbreithecker](https://github.com/mbreithecker) - [max@kyve.network](mailto:max@kyve.network)\n\n**Troy Kessler:** Protocol Developer, [https://github.com/troykessler](https://github.com/troykessler) - [troy@kyve.network](mailto:troy@kyve.network)\n\n**Christopher Brumm:** Developer, [https://github.com/christopherbrumm](https://github.com/christopherbrumm) - [christopher@kyve.network](mailto:christopher@kyve.network)\n\nFor more details on BCP contributions, please see the GitHub repositories. All tools from this integration will be released under an **Open Source license**, ensuring transparency, community collaboration, and wide adoption.\n\n- [https://github.com/KYVENetwork/chain](https://github.com/KYVENetwork/chain)\n- [https://github.com/KYVENetwork/kyvejs](https://github.com/KYVENetwork/kyvejs)\n- [https://github.com/KYVENetwork/kyve-dlt](https://github.com/KYVENetwork/kyve-dlt)\n- [https://github.com/KYVENetwork/trustless-api](https://github.com/KYVENetwork/trustless-api)\n- [https://github.com/KYVENetwork/ksync](https://github.com/KYVENetwork/ksync)\n- [https://docs.kyve.network/](https://docs.kyve.network/)\n\n## **Usage & Examples**\n\n**Current Users of KYVE's Decentralized Data Lake**\n\n**KYVE** serves as the trusted data infrastructure for leading blockchain ecosystems like **Celestia, Osmosis, dYdX**, and **more**, ensuring their on-chain data is securely validated and permanently stored through decentralized storage solutions like **Arweave**. 
These projects use KYVE to maintain the integrity, accessibility, and reliability of their mission-critical blockchain data without needing to manage complex, centralized infrastructure. By using KYVE, they benefit from decentralized validation and long-term storage, ensuring data remains immutable and available for future use as a public good. [https://app.kyve.network/#/sources](https://app.kyve.network/#/sources)\n\n**Statistics:**\n\n- **+70M Transactions**\n- **+7.87 TB validated and archived on Arweave (Mainnet)**\n- **+18.54 TB validated and archived on Testnet**\n\nKYVE continues to scale its infrastructure, making it a cornerstone for secure and scalable data management in Web3 ecosystems.\n\n**Google BigQuery Data Sets:**\n\nLeveraging KYVE’s data load pipeline, BCP Innovations has published several datasets, such as those for **Osmosis**, **dYdX**, **Celestia**, and **Noble**, making it significantly easier for developers and data analysts to access and use blockchain data. Users no longer need to sync large amounts of historical data from scratch and can query and analyze it right away. By providing pre-synced, validated datasets, KYVE reduces infrastructure overhead, enhances data accessibility, and enables faster development cycles for applications that rely on blockchain data.\n\n[https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/36527693454/locations/eu/dataExchanges/kyve_public_datasets_191a23b668b/listings/osmosis_191a28703d2](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/36527693454/locations/eu/dataExchanges/kyve_public_datasets_191a23b668b/listings/osmosis_191a28703d2)\n\n**Potential Use Cases for the NEAR <> KYVE Integration**\n\nNEAR Developers: KYVE’s decentralized data lake gives NEAR developers free access to validated, trustless data, eliminating the need for costly centralized services. 
This ensures a consistent, secure, and reliable infrastructure for applications.\n\ndApps and Blockchain Analytics: Decentralized applications and analytics platforms can rely on KYVE to validate and archive NEAR data in a fully decentralized way, ensuring data integrity and reducing reliance on centralized systems, while enabling in-depth analysis and insights.\n\n**Proof of Concept: NEAR Trustless API Demo:**\n\nThe **Trustless API** enables developers to seamlessly connect their applications and dApps to verified blockchain data. Each data item is verified with a **Merkle Proof** (included in the HTTP header), ensuring the integrity and trustworthiness of the data accessed.\n\nYou can explore NEAR blocks using the Trustless API:\n\n- **First NEAR block**: [Block #127590000](https://staging-613b8d89-data.services.kyve.network/near/value?height=127590000)\n- **Last NEAR block**: [Block #127591973](https://staging-613b8d89-data.services.kyve.network/near/value?height=127591973)\n\nThis ensures that all data retrieved through the API is cryptographically verified and secure for use in any decentralized application.\n\n# **Milestones**\n\n**Milestone 1: NEAR Runtime Development & Testnet Deployment**\n\n*Completion by January 31, 2025*\n\n- **Develop NEAR runtime** to enable the retrieval and validation of NEAR blocks and chunks (txs and receipts are included in the chunks).\n- **Collaborate with NEAR team** and **Community** to ensure smooth integration and alignment.\n- **Upload validated data to Arweave** for permanent decentralized storage. After testing is complete, the testnet pool will switch and continue uploading to the KYVE storage provider, a centralized storage solution for testnet operations.\n- **Test the NEAR runtime on KYVE’s testnet (Kaon)** to validate data retrieval, storage processes, and functionality. 
This includes a beta-testing phase with KYVE/NEAR Validators.\n- After successful testing, **prepare for mainnet deployment**.\n\n---\n\n**Milestone 2: KYVE Data Pool Deployment on Mainnet & Base Functionality Release**\n\n*Completion by February 1, 2025*\n\n- **Deploy the KYVE NEAR data pool** on mainnet to handle data extraction, validation, and permanent storage.\n- Release **base functionalities** of the KYVE **Data Pipeline** and **Trustless API** for community access to validated NEAR data.\n- **Integrate NEAR tokens** into the KYVE ecosystem based on existing NEAR-IBC bridges or other solutions, to fund the data pool and incentivize validators for maintaining the pool.\n- **Collaborate with NEAR community** to ensure smooth deployment and provide initial performance reports for monitoring.\n\n---\n\n**Milestone 3: Performance Optimization, Maintenance & Optional Advanced Data Transformations**\n\n*Completion by March 1, 2025*\n\n- **Maintain and optimize the KYVE NEAR integration** for efficient data streaming, validation, and storage.\n- **Monitor performance** and implement improvements based on usage and community feedback.\n- **Host workshops and webinars** to engage the developer community and showcase the full capabilities of the KYVE NEAR integration.\n- **Optional**: Develop advanced data transformations using **DBT (Data Build Tool)** to convert raw NEAR data into schematized BigQuery tables for advanced analytics and reporting.\n\n---\n\n**Milestone 4: Transition Planning & Knowledge Transfer**\n\n*Completion by December 31, 2025*\n\n- **Conduct workshops** with NEAR teams (Infrastructure Committee) to develop a detailed transition plan to KYVE’s decentralized data infrastructure.\n- **Prepare a comprehensive knowledge transfer document** that outlines NEAR’s current infrastructure and the steps for transitioning to KYVE.\n\n## **Budget**\n\nA KYVE data pool requires ongoing funding to operate. 
This funding can be provided by any party interested in having the data archived, validated, and made publicly accessible. The funds are distributed to the protocol validators and their delegators, who ensure the data is securely validated and uploaded. If the pool runs out of funds, the uploading stops, but the previously archived data remains accessible as a public good.\n\n**Why Arweave?**\n\nAmazon S3 charges for the entire archive every month, which can add up significantly over time, especially for large volumes of data. In contrast, with Arweave, you pay a single upfront fee for permanent storage. This model drastically reduces long-term costs, making it ideal for archives where data needs to be retained indefinitely without the burden of recurring monthly payments. Additionally, Arweave allows for free data retrieval, eliminating another cost that can become substantial with traditional cloud providers. Another key advantage of Arweave is its decentralized storage model, which distributes data across a global network, increasing resilience and security by removing any single point of failure. 
By choosing Arweave over S3 in this proposal, the goal is to minimize long-term storage expenses while benefiting from the resilience, decentralization, and free data access that Arweave offers.\n\nThe cost for the integration will be split as follows:\n\n- 1) Initial Integration\n- 2) Data Pool Funding\n- 3) Maintenance\n\n### **Phase 1) Initial Integration -** **$60,000 (Deliverables as per Milestone 1)**\n\nDevelop the NEAR Runtime for KYVE to enable seamless retrieval and validation of NEAR blocks and chunks in collaboration with the NEAR team and NEAR ecosystem partners.\n\n- Implement functionality for uploading validated NEAR data to Arweave, ensuring permanent, decentralized storage.\n- Conduct extensive testing of the NEAR runtime on KYVE’s testnet (Kaon) to validate the efficiency of data retrieval, validation, and storage processes.\n- Upon successful testing and validation on the testnet, proceed with preparations for the mainnet deployment.\n- The budget assumes a total of 600 hours of development work, divided between BCP and the validators, at a rate of $100/h.\n\nThis process will require two developers over a period of two months, as well as infrastructure costs (such as a NEAR archival node) to facilitate the integration testing. Additionally, collaboration with the NEAR team will be necessary to perform research and set up a customized Trustless API for optimized data access.\n\n### **Phase 2) Continuous Funding and Maintenance (Deliverables as per Milestone 2-4)**\n\n### **Option 1: Fixed Payment through Grant $335,000**\n\n**Historical NEAR Data Pool Funding (Genesis to current live height) - $175,000**\n\nThe funding costs mainly consist of storage expenses and the incentivization premium for validators to upload data. For this example, our suggestion is to use Arweave as the storage provider.\n\nBased on the current size of the BigQuery tables, the compressed NEAR raw data is estimated at around 6.5TB.\n\nCost Breakdown:\n\n1. 
**Storage on Arweave: $105,000**\n \n Calculated using the current rate for Arweave storage via AR Turbo: 6.5TB * 16.36 USD/GB\n \n2. **Buffer for AR Price Fluctuation: $25,000**\n \n Includes a 25% buffer for potential AR price increases\n \n3. **Validator Incentivization: $45,000**\n \n Cost to incentivize validators for data uploading and validation\n \n\n**Live Data Pool (Future Blocks) - $60,000, payable in monthly installments of $5,000 for one year**\n\nAs NEAR’s blockchain continues to grow, BCP will establish a **live data pool** to handle ongoing data from the moment of deployment on mainnet. While the historical data is managed through existing historical data pools, this live data pool will sync in real-time from NEAR’s **live height** upon deployment. The costs for this include **storage on Arweave** and the incentivization of **KYVE validators** for validating and uploading the data.\n\nBased on NEAR’s block speed of **1.09 seconds** and an average block size of **50KB**, the following costs are estimated: **Total Monthly Costs: $5,000**\n\n- **Storage on Arweave**: approx. 3.96 GB per day (about 119 GB per month), $2,000\n- **Buffer for Arweave Price Fluctuations (up to 50%)**: $1,000\n- **Validator Incentivization**: $2,000\n\n![https://ipfs.near.social/ipfs/bafkreic52g47ny6rph4ljnbma6htnd6xqqy2eobie7hh4oulabdtwmphgi](https://ipfs.near.social/ipfs/bafkreic52g47ny6rph4ljnbma6htnd6xqqy2eobie7hh4oulabdtwmphgi)\n\n*All Data Pool funding will be directly forwarded to the corresponding NEAR data pools on KYVE.*\n\n**Maintenance - $100,000, payable in 10 monthly installments of $10,000**\n\nThis covers maintaining the data pool and following its upgrade path, supporting the validators archiving the NEAR data, and maintaining and improving the Trustless API and the Data Pipeline.\n\n- The above pricing includes reading data from the Trustless API and using the Data Pipeline by any data user.\n\nIf the grant is not used in full, the remainder will be returned or used to continue the funding. Once the funding is spent, the data and tooling remain accessible as a public good.\n\n### **Option 2: Public Good Funding (KYVE Foundation runs a validator to cover the costs)**\n\nKYVE's Public Goods Funding Program is an alternative way to provide funding for the data pools and cover the maintenance.\n\nTo integrate NEAR’s data, KYVE requests a validator delegation from the NEAR Foundation. KYVE uses the returns from this delegation to fund the operation of a Public Good pool. This pool ensures that NEAR’s historical and live data is validated, stored, and continuously synced to live height without the need for costly archival nodes.\n\n- Delegation: NEAR delegates tokens to KYVE to run a Public Good validator node.\n- Rewards: The data pool is funded by validator commission rewards, and both KYVE and NEAR validators receive tokens as rewards. 
The commission rewards should cover the costs associated with the operations of all data pools as well as the maintenance by BCP.\n\n![https://ipfs.near.social/ipfs/bafkreiccelx5qsyuo7e2h3ojrcd4ybeblr7fqfxagp6ophtv2lgeqdhqau](https://ipfs.near.social/ipfs/bafkreiccelx5qsyuo7e2h3ojrcd4ybeblr7fqfxagp6ophtv2lgeqdhqau)\n\nThis model removes the need for NEAR to run its own expensive data infrastructure, allowing it to scale more easily. This mutually beneficial setup ensures NEAR’s data is maintained at minimal cost while enhancing the decentralization of its data infrastructure.\n\n# **Conclusion**\n\nKYVE's decentralized data lake provides NEAR with a complete, valid, and accessible solution for managing blockchain data. By decentralizing storage and validation, NEAR’s data becomes a public good, easily accessible without reliance on centralized infrastructure. Tools like the Trustless API and Data Load Tool allow developers to leverage NEAR’s raw data for various use cases, unlocking new opportunities for growth and innovation.\n\nKYVE Foundation invites the NEAR Foundation to partner with KYVE in building a decentralized, secure, and scalable data infrastructure. Together, we can future-proof NEAR’s data landscape and empower its ecosystem with robust tools for continued success.",
"category": "Infrastructure Committee",
"summary": "As NEAR’s data infrastructure undergoes significant changes with Pagoda winding down, there is a growing need for decentralized and validated data solutions to ensure secure and scalable access to NEAR’s blockchain data. This proposal outlines a phased approach to integrate NEAR’s blockchain data into KYVE’s Web3 data lake, enabling validated and permanently stored data streams for developers, analysts, and businesses within the NEAR ecosystem.",
"linked_proposals": [],
"requested_sponsorship_usd_amount": "395000",
"requested_sponsorship_paid_in_currency": "NEAR",
"receiver_account": "kyve_foundation.near",
"requested_sponsor": "infrastructure-committee.near",
"timeline": {
"status": "REJECTED",
"sponsor_requested_review": false,
"reviewer_completed_attestation": false
},
"linked_rfp": 3,
"supervisor": "trechriron71.near"
},
"id": 50
}