Introduction
The Web is in decline due to enshittification, a term popularized by Cory Doctorow. Although Doctorow may not frame it this way, open source decentralized applications (dapps) that reach consensus on state via a blockchain are critical to overcoming this problem, and to Web3 succeeding in its mission to save humanity.
Enshittification occurs because popular web platforms are owned and controlled by centralized organizations that ultimately do not act in their users' best interests. Real blockchains, on the other hand, are either fully autonomous (they never change) or are governed by a decentralized autonomous organization (DAO).
Ethereum has evolved into a poor man's version of the original Polkadot. The next version of Polkadot (JAM) will push the boundaries even further. This is why Polkadot is the ideal technology for building the autonomous infrastructure we need to run the world.
However, dapps need rich functionality beyond what can be provided on-chain in a fully decentralized manner, for example:
- searching data that is not stored in on-chain state
- pinning / searching IPFS content
- AI queries
- media transcoding
Currently, dapps either rely on a centralized backend that will be subject to enshittification, or try to make do with only querying and verifying state directly.
The purpose of Acuity Index is to provide decentralized indexing for Polkadot dapps, starting with event log indexing.
Acuity Index was funded by 2 grants (1,2) from the Web3 Foundation.
It is one of the key technologies behind Acuity, the decentralized publication platform.
The Problem
Dapps need to write to and query blockchain state, events (logs) and decentralized filesystems such as IPFS. Current state can be read and modified on-chain by extrinsics (transactions). Historic state can only be read off-chain. Events can be written by extrinsics, but only read off-chain. Files on IPFS can only be written and read off-chain, but a cryptographic hash of the file can be stored on-chain in state or an event.
| | Read | Write | Modify |
|---|---|---|---|
| Current State | on/off-chain | on-chain | on-chain |
| Historic State | off-chain | | |
| Events | off-chain | on-chain | |
| IPFS | off-chain | off-chain | |
Writing data to an event is considerably cheaper than writing it to state. IPFS is free, except for storage and bandwidth costs to keep a file pinned.
For example, the balance of an address must be stored in state. This is necessary so that it can be checked that an account has enough balance for a transfer. The record of balance transfers is stored in events to reduce transaction fees. A user's public avatar and blog posts would be stored on IPFS with only the hashes stored in events.
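The pattern of keeping bulky content off-chain and anchoring it with a hash can be sketched as follows. This is an illustrative sketch only: std's `DefaultHasher` stands in for a real cryptographic hash (IPFS uses SHA-256 multihashes), and `content_hash` is a hypothetical helper, not part of any chain API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a cryptographic content hash. DefaultHasher is NOT
// cryptographically secure; it is used here only to illustrate the pattern.
fn content_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // A large blog post lives off-chain on IPFS...
    let post = b"A long blog post stored on IPFS...".to_vec();
    // ...while only the fixed-size hash is written on-chain, in an event.
    let on_chain = content_hash(&post);
    // Later, anyone fetching the post from IPFS can verify it against the
    // hash recorded on-chain.
    assert_eq!(content_hash(&post), on_chain);
    println!("on-chain hash: {on_chain:#x}");
}
```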
Dapps need to be able to search decentralized data. For example, a wallet dapp needs to find every balance transfer event either from or to the user's address. A map dapp needs to perform geospatial search on GPS coordinates stored in events. A feed reader dapp needs to perform full-text search on content stored on IPFS.
Currently, Polkadot dapps typically query a centralized RPC provider to read state and broadcast transactions.
This arrangement has a number of issues that undermine the decentralized nature of dapps:
Incorrect or missing data
The dapp typically trusts the RPC provider to return correct query results. In theory, the results may be incorrect or incomplete. This could be used to trick the user into doing something self-harming.
Event query limits
Polkadot RPCs do not provide any event searching facility. EVM RPC providers do, but typically have various query limits (especially free ones):
- max blocks scanned per query
- earliest scannable block
- max execution time
- event searching disabled completely
Additionally, there is no API for a dapp to query what an EVM RPC provider's limits are. If a dapp exceeds a limit it receives a non-semantic error message that can only be presented to the user verbatim. This makes for an unacceptable user experience.
Slow queries
The event indexing built into Ethereum uses a form of accelerated scanning using bloom filters. This is considerably slower to query than a real database index and uses more resources.
The provider may take a long time to respond to a query, giving the user a poor experience.
Unavailability
There are various reasons why an RPC provider may not provide query results to a dapp:
- technical problems - 100% uptime is impossible for a centralized service
- geoblocking - social and political pressure can result in queries from certain physical locations being blocked
- KYC requirements - the provider may be required by law to obtain the real-world identity of the user of the dapp
- lack of payment
No standard payment API
There are various services that offer paid access to high quality nodes, but there is no standard for how to pay for them. This creates a lot of friction for users that want to query multiple chains and switch between different providers.
Tracking
The provider could be logging which IP addresses and real-world identities are making which queries.
Encourages dapp backends
A very attractive solution to the issues with RPC providers is for dapp developers to build a centralized backend that will do everything required in a very efficient way. Unfortunately, this undermines many of the advantages of having a dapp.
More expensive transactions
Because searching for logs is unreliable, architects of smart contracts and Substrate pallets may decide to store data in state where it can be more easily retrieved. This is considerably more expensive. The additional use of block-space will also make all other transactions on the chain more expensive.
Lack of extensibility
RPC providers expose a fixed query interface. There is no way for a dapp to request richer query types, such as full-text or geospatial search.
The Solution
One solution is for the user to run their own full node for each chain being queried, but this is almost never practical. Running a full node typically requires terabytes of storage and bandwidth and can take weeks to become fully synchronized.
Dapps can query full nodes and use a light client to cryptographically verify the results are correct. In fact, Ethereum and Substrate are both introducing improvements to their light client technology. This solves the problem of incorrect data.
However, this does not solve all the other problems.
An Acuity Index node runs alongside the Substrate node it is indexing. In its simplest implementation it maintains an index of block number and event index for each key, for example account id.
High performance
Acuity Index uses the Sled key-value database to create an event index that can handle very large query throughput. It is considerably more efficient than EVM indexed topics.
Clients can request to either receive just the block number and event index of each event, or also receive the event data.
Event data can then be verified as correct using the underlying light client of the chain.
For maximum performance, the index can store the event data. Alternatively, it can retrieve this from the full node as required to save space.
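The composite-key layout at the heart of the index can be sketched with a `BTreeMap` standing in for a Sled tree. `make_key` and `get_events` are hypothetical helpers illustrating the encoding (key bytes, then block number, then event index, all big-endian so lexicographic order matches numeric order); they are not the actual Acuity Index API.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included};

// Entry key layout: account key ++ block number ++ event index, big-endian.
fn make_key(account: u32, block_number: u32, event_index: u16) -> Vec<u8> {
    let mut k = Vec::with_capacity(10);
    k.extend_from_slice(&account.to_be_bytes());
    k.extend_from_slice(&block_number.to_be_bytes());
    k.extend_from_slice(&event_index.to_be_bytes());
    k
}

// Return (block_number, event_index) for every event indexed under `account`.
fn get_events(index: &BTreeMap<Vec<u8>, ()>, account: u32) -> Vec<(u32, u16)> {
    // Scan the half-open range [account, account + 1). (account + 1 would
    // overflow for u32::MAX; a real implementation handles that edge case.)
    let start = account.to_be_bytes().to_vec();
    let end = (account + 1).to_be_bytes().to_vec();
    index
        .range((Included(start), Excluded(end)))
        .map(|(k, _)| {
            let block = u32::from_be_bytes(k[4..8].try_into().unwrap());
            let event = u16::from_be_bytes(k[8..10].try_into().unwrap());
            (block, event)
        })
        .collect()
}

fn main() {
    let mut index = BTreeMap::new();
    index.insert(make_key(7, 100, 2), ());
    index.insert(make_key(7, 105, 0), ());
    index.insert(make_key(8, 100, 1), ()); // different account
    // Events come back in (block, event index) order, for account 7 only.
    assert_eq!(get_events(&index, 7), vec![(100, 2), (105, 0)]);
}
```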
Well-defined query accounting
An index node can track cumulative query weight, either by authenticated account or by virtual account (IP address).
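One way such accounting might look, sketched with a hypothetical `Accounting` type (not the actual Acuity Index implementation):

```rust
use std::collections::HashMap;
use std::net::IpAddr;

// Per-client query accounting, keyed by "virtual account" (IP address).
// Each query's weight is added to the client's running total; a request is
// refused once the total would exceed the quota.
struct Accounting {
    quota: u64,
    used: HashMap<IpAddr, u64>,
}

impl Accounting {
    fn new(quota: u64) -> Self {
        Self { quota, used: HashMap::new() }
    }

    /// Returns true if the query is allowed, charging its weight if so.
    fn charge(&mut self, client: IpAddr, weight: u64) -> bool {
        let used = self.used.entry(client).or_insert(0);
        if *used + weight > self.quota {
            false
        } else {
            *used += weight;
            true
        }
    }
}

fn main() {
    let ip: IpAddr = "198.51.100.7".parse().unwrap();
    let mut acct = Accounting::new(100);
    assert!(acct.charge(ip, 60));
    assert!(acct.charge(ip, 40));
    assert!(!acct.charge(ip, 1)); // quota exhausted
}
```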
Standardized payment API
A standardized payment API allows clients to pay any index node for queries in a uniform way. This creates a fee market for indexes and removes the friction of switching between providers.
Privacy
If the user is accessing an index node for free, they will need to make a direct connection to the index so it can monitor and limit use based on IP address. This is not good for privacy.
If the user is paying for their queries, then they can also obfuscate their identity by querying via tor or mixnet and paying via anonymous means.
Extensible
The index can be extended with additional query types, such as full-text and geospatial search.
Lower on-chain transaction fees
With reliable decentralized event querying, on-chain code can store data in events rather than state, reducing transaction fees for everyone on the chain.
Optionally it can index event variants. For example, the index could return a list of all balance transfers. This makes the index much larger.
This entails the use of an index. Much like a full node, an event index consumes significant resources and takes a long time to synchronize, so a dapp cannot practically maintain its own. It needs to query an index run by someone else and verify the results using a light client.
This is the purpose of Acuity Index: an event indexer for all chain types whose results can be verified cryptographically.
When an Acuity index is queried for a specific key, it returns the block number and event index of every event that contains the key.
Additionally, it can return the event contents and enough information for the events to be verified by a light client.
Acuity Index is a blockchain event indexer framework written in Rust. Currently, it can be used to build indexers for Substrate blockchains (Polkadot). In the future it will also support other types of chains such as Ethereum and Bitcoin.
Typically, when writing on-chain code (for example a smart contract or a Substrate pallet) data should only be stored in chain state when it might need to be read during execution of a subsequent transaction. This ensures that transaction fees are kept to an absolute minimum. Events should be emitted containing the data that only needs to be accessed off-chain. This data can then be indexed, either directly on the user's device or via a cloud service.
Architecture
Each Substrate chain has an event schema that changes over time. For this reason, each chain family needs its own indexer, built with Acuity Index and kept current with the chain runtimes.
When building an indexer, the acuity-index-substrate Rust library is used to do most of the heavy lifting. Currently, it only supports a single schema per chain. Therefore, once the schema is updated indexing older blocks may result in missing events. This will be resolved in a future version.
For example, to index Polkadot blockchains (Polkadot, Kusama, Westend, Paseo) acuity-index-polkadot is used.
Indexed Keys
acuity-index-substrate has support for indexing the keys that are built into Substrate: AccountId, AccountIndex, BountyIndex, EraIndex, MessageId, PoolId, PreimageHash, ProposalHash, ProposalIndex, RefIndex, RegistrarIndex, SessionIndex and TipHash.
Indexers can register additional keys to be indexed.
Indexed Pallets
acuity-index-substrate has support for indexing all the pallets that are built into Substrate. Indexers can write macros to index additional pallets.
Event Variant Indexing
In addition to indexing keys, Acuity Index can index event variants. This means that the index can be queried for occurrences of a specific type of event. This entails an additional database row for every event and therefore uses more storage space than any of the key indexes.
An indexer will typically expose this via a command-line option.
Queue Depth
In order to maximize the rate of block indexing, the indexer will request multiple blocks simultaneously, according to the queue depth parameter.
Currently, this only applies when indexing old blocks. Head blocks are indexed one at a time. When indexing a fully synced node this is fine, because head blocks arrive slowly. When indexing a node that is still syncing, however, head blocks arrive too quickly and the indexer falls behind. A future release will index old and head blocks using the queue.
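The queue-depth idea can be sketched with a bounded channel. This is not the actual implementation: `fetch_block` is a placeholder for an RPC block request, and a `sync_channel` with capacity equal to the queue depth stands in for the bounded set of in-flight requests.

```rust
use std::sync::mpsc;
use std::thread;

// At most QUEUE_DEPTH block requests are queued at once.
const QUEUE_DEPTH: usize = 4;

// Placeholder for an RPC call that downloads a block.
fn fetch_block(number: u32) -> u32 {
    number
}

fn main() {
    // sync_channel's capacity bounds how far the producer can run ahead.
    let (tx, rx) = mpsc::sync_channel(QUEUE_DEPTH);
    let producer = thread::spawn(move || {
        // Index backwards from the head.
        for n in (0..16u32).rev() {
            // send() blocks once QUEUE_DEPTH blocks are already queued.
            tx.send(fetch_block(n)).unwrap();
        }
    });
    let mut indexed = Vec::new();
    for block in rx {
        indexed.push(block); // stand-in for writing index entries
    }
    producer.join().unwrap();
    assert_eq!(indexed.len(), 16);
}
```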
Example Output
Here is example output indexing the Westend blockchain:
```text
2024-04-06T04:04:51.826855Z INFO hybrid_indexer: Indexing westend
2024-04-06T04:04:51.826884Z INFO hybrid_indexer: Database path: /home/jbrown/.local/share/hybrid-indexer/westend/db
2024-04-06T04:04:51.826889Z INFO hybrid_indexer: Database mode: LowSpace
2024-04-06T04:04:51.826892Z INFO hybrid_indexer: Database cache capacity: 1024.00 MiB
2024-04-06T04:04:56.169676Z INFO hybrid_indexer: Connecting to: ws://127.0.0.1:9944
2024-04-06T04:04:56.182384Z INFO hybrid_indexer::substrate: 📇 Event variant indexing: enabled
2024-04-06T04:04:56.182454Z INFO hybrid_indexer::websockets: Listening on: 0.0.0.0:8172
2024-04-06T04:04:56.225039Z INFO hybrid_indexer::substrate: 📚 Indexing backwards from #20,277,820
2024-04-06T04:04:56.225822Z INFO hybrid_indexer::substrate: 📚 Re-indexing span of blocks from #12,841,034 to #20,277,818.
2024-04-06T04:04:56.225833Z INFO hybrid_indexer::substrate: 📚 Reason: event variants not indexed.
2024-04-06T04:04:56.225842Z INFO hybrid_indexer::substrate: 📚 Queue depth: 64
2024-04-06T04:04:56.228018Z INFO hybrid_indexer::substrate: Downloading metadata for spec version 1009000
2024-04-06T04:04:56.241092Z INFO hybrid_indexer::substrate: Finished downloading metadata for spec version 1009000
2024-04-06T04:04:58.343335Z INFO hybrid_indexer::substrate: 📚 #20,277,469: 218 blocks/sec, 2,685 events/sec, 2,685 keys/sec
2024-04-06T04:04:59.740002Z INFO hybrid_indexer::substrate: ✨ #20,277,821: 13 events, 13 keys
2024-04-06T04:05:00.227530Z INFO hybrid_indexer::substrate: 📚 #20,275,313: 1,086 blocks/sec, 13,583 events/sec, 13,614 keys/sec
2024-04-06T04:05:02.763669Z INFO hybrid_indexer::substrate: 📚 #20,272,929: 940 blocks/sec, 15,064 events/sec, 18,394 keys/sec
2024-04-06T04:05:04.426178Z INFO hybrid_indexer::substrate: 📚 #20,272,922: 4 blocks/sec, 16,659 events/sec, 33,261 keys/sec
2024-04-06T04:05:05.593444Z INFO hybrid_indexer::substrate: ✨ #20,277,822: 13 events, 13 keys
2024-04-06T04:05:06.226993Z INFO hybrid_indexer::substrate: 📚 #20,272,261: 369 blocks/sec, 23,423 events/sec, 41,978 keys/sec
2024-04-06T04:05:08.227001Z INFO hybrid_indexer::substrate: 📚 #20,270,559: 856 blocks/sec, 45,057 events/sec, 79,452 keys/sec
2024-04-06T04:05:10.331457Z INFO hybrid_indexer::substrate: 📚 #20,268,186: 1,126 blocks/sec, 26,275 events/sec, 38,673 keys/sec
2024-04-06T04:05:11.703634Z INFO hybrid_indexer::substrate: ✨ #20,277,823: 13 events, 13 keys
2024-04-06T04:05:12.227412Z INFO hybrid_indexer::substrate: 📚 #20,266,277: 1,011 blocks/sec, 35,308 events/sec, 57,998 keys/sec
2024-04-06T04:05:14.242257Z INFO hybrid_indexer::substrate: 📚 #20,264,587: 832 blocks/sec, 32,157 events/sec, 53,993 keys/sec
2024-04-06T04:05:15.775867Z INFO hybrid_indexer::substrate: ✨ #20,277,824: 13 events, 13 keys
2024-04-06T04:05:16.227287Z INFO hybrid_indexer::substrate: 📚 #20,262,524: 1,048 blocks/sec, 25,504 events/sec, 38,078 keys/sec
2024-04-06T04:05:18.247665Z INFO hybrid_indexer::substrate: 📚 #20,261,078: 753 blocks/sec, 33,254 events/sec, 57,056 keys/sec
2024-04-06T04:05:20.227680Z INFO hybrid_indexer::substrate: 📚 #20,258,950: 1,054 blocks/sec, 23,478 events/sec, 33,931 keys/sec
2024-04-06T04:05:22.299554Z INFO hybrid_indexer::substrate: 📚 #20,257,402: 722 blocks/sec, 30,677 events/sec, 52,286 keys/sec
2024-04-06T04:05:23.811687Z INFO hybrid_indexer::substrate: ✨ #20,277,825: 13 events, 13 keys
2024-04-06T04:05:24.234355Z INFO hybrid_indexer::substrate: 📚 #20,255,634: 919 blocks/sec, 23,820 events/sec, 36,200 keys/sec
2024-04-06T04:05:26.723002Z INFO hybrid_indexer::substrate: 📚 #20,253,808: 729 blocks/sec, 35,103 events/sec, 61,190 keys/sec
2024-04-06T04:05:27.747424Z INFO hybrid_indexer::substrate: ✨ #20,277,826: 13 events, 13 keys
2024-04-06T04:05:28.227545Z INFO hybrid_indexer::substrate: 📚 #20,252,081: 1,159 blocks/sec, 17,027 events/sec, 19,702 keys/sec
```
The indexer needs to connect to a full node with `--state-pruning` set to `archive-canonical` or `archive`. Typically, it is necessary to index a node that you control, because the indexer will require the node to consume resources far beyond what a public RPC endpoint is prepared to provide.
Roadmap
- support event schema versioning
- decentralized querying
- index head blocks using queue
- library support for more languages
Build Indexer
To learn how to build an indexer for a Substrate chain with Acuity Index, it is best to examine polkadot-indexer.
Follow the subxt instructions to download the metadata from the chain to be indexed:
```shell
subxt metadata --url <URL> > metadata.scale
```
It's generally a good idea to keep the metadata in a separate workspace member from the indexer. This avoids lengthy rebuilds during development.
`Cargo.toml`:

```toml
[workspace]
resolver = "2"
members = [
    "metadata",
    "indexer",
]
```
`metadata/Cargo.toml`:

```toml
[package]
name = "metadata"
version = "0.1.0"
edition = "2021"

[dependencies]
subxt = "0.34"
```
`metadata/src/lib.rs`:

```rust
#[subxt::subxt(runtime_metadata_path = "metadata.scale")]
pub mod metadata {}
```
Copy and modify the boilerplate `Cargo.toml` and `main.rs` from polkadot-indexer.
Create a struct that implements the subxt Config trait. For example:
```rust
pub enum MyChainConfig {}

impl Config for MyChainConfig {
    type Hash = H256;
    type AccountId = AccountId32;
    type Address = MultiAddress<Self::AccountId, u32>;
    type Signature = MultiSignature;
    type Hasher = BlakeTwo256;
    type Header = SubstrateHeader<u32, BlakeTwo256>;
    type ExtrinsicParams = SubstrateExtrinsicParams<Self>;
}
```
Each chain to be indexed by the indexer implements the RuntimeIndexer, IndexKey and IndexTrees traits. For example, look at PolkadotIndexer, ChainKey and ChainTrees.
Every event to be indexed is passed to `process_event()`, which determines which pallet the event is from and uses the correct macro to index it. Macros for the Substrate pallets are provided by the library; additional pallet macros can be provided.
```rust
#[derive(Clone, Debug)]
pub struct MyChainTrees {
    pub my_index: Tree,
}

impl IndexTrees for MyChainTrees {
    fn open(db: &Db) -> Result<Self, sled::Error> {
        Ok(MyChainTrees {
            my_index: db.open_tree(b"my_index")?,
        })
    }

    fn flush(&self) -> Result<(), sled::Error> {
        self.my_index.flush()?;
        Ok(())
    }
}
```
```rust
#[derive(Serialize, Deserialize, Clone, Debug, Eq, PartialEq, Hash)]
#[serde(tag = "type", content = "value")]
pub enum MyChainKey {
    MyKey(u32),
}

impl IndexKey for MyChainKey {
    type ChainTrees = MyChainTrees;

    fn write_db_key(
        &self,
        trees: &ChainTrees,
        block_number: u32,
        event_index: u16,
    ) -> Result<(), sled::Error> {
        let block_number = block_number.into();
        let event_index = event_index.into();
        match self {
            MyChainKey::MyKey(my_key) => {
                let key = U32Key {
                    key: (*my_key).into(),
                    block_number,
                    event_index,
                };
                trees.my_index.insert(key.as_bytes(), &[])?
            }
        };
        Ok(())
    }

    fn get_key_events(&self, trees: &ChainTrees) -> Vec<Event> {
        match self {
            MyChainKey::MyKey(my_key) => get_events_u32(&trees.my_index, *my_key),
        }
    }
}
```
```rust
pub struct MyChainIndexer;

impl hybrid_indexer::shared::RuntimeIndexer for MyChainIndexer {
    type RuntimeConfig = MyChainConfig;
    type ChainKey = MyChainKey;

    fn get_name() -> &'static str {
        "mychain"
    }

    fn get_genesis_hash() -> <Self::RuntimeConfig as subxt::Config>::Hash {
        hex!["91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3"].into()
    }

    fn get_versions() -> &'static [u32] {
        &[0]
    }

    fn get_default_url() -> &'static str {
        "wss://rpc.mychain.io:443"
    }

    fn process_event(
        indexer: &hybrid_indexer::substrate::Indexer<Self>,
        block_number: u32,
        event_index: u32,
        event: subxt::events::EventDetails<Self::RuntimeConfig>,
    ) -> Result<(), subxt::Error> {
        match event.as_root_event::<Event>()? {
            // Substrate pallets.
            Event::Balances(event) => {
                index_balances_event![BalancesEvent, event, indexer, block_number, event_index]
            }
            // MyChain pallets.
            Event::MyPallet(event) => {
                index_mypallet_event![MyPalletEvent, event, indexer, block_number, event_index]
            }
            _ => {}
        };
        Ok(())
    }
}
```
Custom pallet indexer macros look something like this:
```rust
#[macro_export]
macro_rules! index_mypallet_event {
    ($event_enum: ty, $event: ident, $indexer: ident, $block_number: ident, $event_index: ident) => {
        match $event {
            <$event_enum>::MyEvent { who, my_key, .. } => {
                $indexer.index_event(
                    Key::Substrate(SubstrateKey::AccountId(Bytes32(who.0))),
                    $block_number,
                    $event_index,
                )?;
                $indexer.index_event(
                    Key::Chain(MyChainKey::MyKey(my_key)),
                    $block_number,
                    $event_index,
                )?;
                2
            }
        }
    };
}
```
Examine the API documentation to help determine how to query the indexer.
JSON-RPC API
Types
Bytes32HexString
"0x0000000000000000000000000000000000000000000000000000000000000000"
Event
```json
{
  "blockNumber": Number,
  "eventIndex": Number
}
```
EventMeta
```json
{
  "index": Number,
  "name": String
}
```
Variant
```json
{
  "index": Number,
  "name": String,
  "events": EventMeta
}
```
Span
```json
{
  "start": Number,
  "end": Number
}
```
SubstrateKey
```json
{ "type": "AccountId", "value": Bytes32HexString }
{ "type": "AccountIndex", "value": Number }
{ "type": "BountyIndex", "value": Number }
{ "type": "EraIndex", "value": Number }
{ "type": "MessageId", "value": Bytes32HexString }
{ "type": "PoolId", "value": Number }
{ "type": "PreimageHash", "value": Bytes32HexString }
{ "type": "ProposalHash", "value": Bytes32HexString }
{ "type": "ProposalIndex", "value": Number }
{ "type": "RefIndex", "value": Number }
{ "type": "RegistrarIndex", "value": Number }
{ "type": "SessionIndex", "value": Number }
{ "type": "TipHash", "value": Bytes32HexString }
```
ChainKey
Chain specific keys defined by chain indexer implementation.
Key
```json
{ "type": "Variant", "value": [Number, Number] }
{ "type": "Substrate", "value": SubstrateKey }
{ "type": "Chain", "value": ChainKey }
```
Request
Status
```json
{ "type": "Status" }
```
Subscribe Status
```json
{ "type": "SubscribeStatus" }
```
Unsubscribe Status
```json
{ "type": "UnsubscribeStatus" }
```
Variants
```json
{ "type": "Variants" }
```
Get Events
```json
{ "type": "GetEvents", "key": Key }
```
Subscribe Events
```json
{ "type": "SubscribeEvents", "key": Key }
```
Unsubscribe Events
```json
{ "type": "UnsubscribeEvents", "key": Key }
```
Size On Disk
```json
{ "type": "SizeOnDisk" }
```
Response
Status
```json
{ "type": "Status", "data": [Span, ...] }
```
Variants
```json
{ "type": "Variants", "data": [Variant, ...] }
```
Events
```json
{ "type": "Events", "key": Key, "data": [Event, ...] }
```
Subscribed
```json
{ "type": "Subscribed" }
```
Unsubscribed
```json
{ "type": "Unsubscribed" }
```
Size On Disk
```json
{ "type": "SizeOnDisk", "data": Number }
```