OceanDAO · proposal by 0x085EB211bB4890972C1279fbbac6003F79369d75

Resilient ML

Voting ended over 4 years ago. Status: Succeeded

Data is the modern oil of the blockchain economy. ResilientML Semantic Reservoirs brings a vast collection of carefully crafted, linguistically tailored semantic datasets, curated by experts in Natural Language Processing, for use directly in machine learning methods and sentiment models running in the Ocean environment. The datasets are available through the Ocean marketplace via the ResilientML NLP data app.

Full Proposal

https://port.oceanprotocol.com/t/resilientml-expansion-of-sentiment-data-and-sentence-structure-features-analytics-application/864

Grant Deliverables

In this round we will expand our offering to include:

  • historical and current high-level extracted sentence-structure features;
  • spectral and matrix-factorisation features obtained from the document-document and word-word matrices, released as a separate dataset alongside the currently available text data; and
  • additional Analytics components in the prototype App that surface information about dataset content and article authors, addressing the KPIs of veracity and provenance identification and complementing the currently available KPIs of volume, variety and velocity (update rate).

The dataset expansion will cover the following additional components:

  • Web3.0 coins: Filecoin, BitTorrent, Stacks, The Graph, Basic Attention Token, Siacoin, Helium, Arweave;
  • Layer-1 coins: Solana;
  • DeFi coins: Terra, PancakeSwap, Maker, THORChain, Serum;
  • Metaverse/Gaming/NFT coins: Enjin, Axie Infinity, Red Fox Labs;
  • Stablecoins: USDC, BUSD;

and additional news sources:

  • NewsBTC;
  • Bitcoinist;
  • Blocknomi;
  • Coinspeaker.

Specifically, in addition to the post-processed tokens and the text-noise-free sentences, we will provide parsimonious spectral features extracted from the articles via matrix-factorisation approaches. As before, we will expand the dataset going forward while also back-filling the currently available data. This will roughly triple the current data offering in terms of content and volume of processed corpus. We will also continue to increase the sophistication of the data provided, progressively migrating from data munging to feature extraction and data curation for feature libraries.
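As a rough illustration of what "spectral features via matrix factorisation" means here: a minimal sketch, assuming a truncated SVD of a term-document count matrix. The proposal does not publish its exact pipeline, so the tiny matrix, rank k, and variable names below are all illustrative, not ResilientML's actual method.

```python
import numpy as np

# Hypothetical term-document count matrix: 4 articles (rows) x 5 terms (cols).
# In practice this would come from tokenising the news corpus.
X = np.array([
    [2.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
])

# Truncated SVD: keep the k leading singular directions as a parsimonious
# ("spectral") summary of the corpus.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_features = U[:, :k] * s[:k]      # per-document spectral features, (4, 2)
term_features = Vt[:k, :].T * s[:k]  # per-word spectral features, (5, 2)

# The same features arise from the document-document matrix X @ X.T and the
# word-word matrix X.T @ X: their nonzero eigenvalues are the squared
# singular values s**2, which ties this sketch to the matrices named above.
doc_doc_eigvals = np.linalg.eigvalsh(X @ X.T)
```

The `doc_features` rows could then feed downstream sentiment or similarity models as low-dimensional article embeddings.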

Engage in community conversation, questions and feedback

https://discord.gg/TnXjkR5

Cast your vote below!

Off-Chain Vote

Yes: 813.36K (100%)
No: 0 (0%)

Timeline

Sep 09, 2021: Proposal created
Sep 09, 2021: Proposal vote started
Sep 13, 2021: Proposal vote ended
Oct 26, 2023: Proposal updated