iroh-blobs 0.90 - New Request Types and Features
by rklaehn
In our previous blog post we talked a lot about the generic API design of the new iroh-blobs. Now, let's take a look at the new features!
Previously, iroh-blobs supported just a single request type: Get. Get allows you to stream a blob, ranges of a blob, or an entire sequence of blobs or ranges thereof. It is pretty powerful, but especially the part about streaming hash sequences can also be confusing.
Protocol Additions and Changes
iroh-blobs 0.90 adds several new request types.
GetMany
For the case where you want to fetch several blobs in a single request, but the provider side doesn't necessarily group those blobs into a HashSeq, there is a new request type: GetMany. It lets you specify a set of hashes, and additionally lets you specify ranges for each of the hashes you want to download.
GetMany is useful when dealing with a large number of small blobs. If you want to download a few large blobs, running multiple Get requests in parallel is completely fine, because QUIC has very cheap independent streams.
An important difference between GetMany and multiple Get requests is that GetMany proceeds sequentially and aborts the request as soon as the provider does not have the required data, while multiple parallel Get requests succeed or fail independently.
GetMany uses a vector of hashes even though in most cases this will be a set of hashes. This allows the user to control the order in which the hashes are requested. The builder uses a set internally, however, so multiple ranges for the same hash will be combined when using the builder.
Here is an example of how to create a GetMany request using the builder:

```rust
let request = GetManyRequest::builder()
    .hash(hash1, ChunkRanges::all())
    .hash(hash2, ChunkRanges::empty()) // will be ignored!
    .hash(hash3, ChunkRanges::bytes(0..100))
    .build();
```
Push
The Push request is the reverse of a Get request. Instead of requesting a blob by hash, you send a description of what you are going to send, followed by the bao-encoded data.
Push requests are useful for uploading data. They require access control so people can't push arbitrary data to your node.
Push requests are most easily created from the corresponding Get request for the same data.
PushMany
PushMany is not implemented yet, but will be before 1.0. It is the push version of GetMany.
PushMany requests will require access control, just like Push requests.
Observe
The Observe request allows you to get information about what data a remote node has for a given hash. The response to an Observe request is a stream of bitfields, where the first bitfield is the current availability for a blob and all subsequent items are updates to that bitfield.
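The first-item-is-state, later-items-are-deltas shape of an Observe response can be sketched with plain Rust types. These are illustrative stand-ins, not the actual iroh-blobs structs:

```rust
use std::collections::BTreeSet;

/// Illustrative stand-in for a chunk-availability bitfield: the set of
/// chunk indices the remote node has. The real type is more compact,
/// but the stream semantics are the same.
#[derive(Clone, Debug, Default, PartialEq)]
struct Bitfield {
    chunks: BTreeSet<u64>,
}

impl Bitfield {
    fn from_chunks(chunks: impl IntoIterator<Item = u64>) -> Self {
        Self { chunks: chunks.into_iter().collect() }
    }

    /// Merge an update (newly available chunks) into the current state.
    fn apply(&mut self, update: &Bitfield) {
        self.chunks.extend(update.chunks.iter().copied());
    }
}

/// Consume an Observe-style stream: the first item is the full current
/// availability, every later item is a delta to fold in.
fn observe(stream: impl IntoIterator<Item = Bitfield>) -> Bitfield {
    let mut iter = stream.into_iter();
    let mut state = iter.next().unwrap_or_default();
    for update in iter {
        state.apply(&update);
    }
    state
}
```

Consuming such a stream keeps a running union of the chunks the remote has, so a consumer always knows the current availability without re-requesting it.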
New API Features
Observing a blob
There is a new API for observing the Bitfield of a blob. An Observe request returns a stream of bitfields, where each bitfield represents the current chunk availability of a blob. The stream is wrapped in an ObserveProgress struct and follows the same patterns as the other progress structs (described in our previous blobs blog post), so you can just use observe().await to get the current bitfield.
See the bitfields section for more info about bitfields.
Restructured Remote API
The API to interact with remote nodes is split into two namespaces, remote and downloader.
Remote
Remote is for executing individual requests; since blobs is a simple request/response protocol, each request always interacts with a single remote node.
In the remote module, there is a distinction between executing a request, e.g. with execute_get, which just executes the request and stores the resulting data locally without taking the local data into account, and more complex methods like fetch, which only download data that is not present locally.
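The difference between execute_get and fetch comes down to subtracting the locally available chunks from the request. Here is a self-contained sketch of that subtraction, using plain ranges rather than the real ChunkRanges type:

```rust
use std::ops::Range;

/// Given the chunk range a request asks for and the sorted,
/// non-overlapping ranges we already have locally, return the ranges
/// that still need to come from the remote. This is the kind of
/// computation a fetch-style method does, while an execute_get-style
/// method would simply request everything.
fn missing_ranges(requested: Range<u64>, local: &[Range<u64>]) -> Vec<Range<u64>> {
    let mut out = Vec::new();
    let mut cursor = requested.start;
    for have in local {
        if have.end <= cursor || have.start >= requested.end {
            continue; // no overlap with what we still need
        }
        if have.start > cursor {
            out.push(cursor..have.start.min(requested.end));
        }
        cursor = cursor.max(have.end);
    }
    if cursor < requested.end {
        out.push(cursor..requested.end);
    }
    out
}
```

For a fully available blob the result is empty, so a fetch-style call can return without touching the network at all.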
There is a local method to get the locally available data for a Blob or HashSeq, which is used internally by fetch. Whether remote is the right place for this method, given that it is a purely local operation, is up for debate.
Downloader
If you want to run complex requests that download data from multiple nodes at once, there is the Downloader. Unlike the namespaces above, this is a stateful object that contains an iroh endpoint and a connection pool.
The downloader lets you specify what you want to download (either just a hash or a complex request) via the SupportedRequest trait, and where you want to download from via the ContentDiscovery trait, which specifies a content discovery strategy.
The main user-facing method of the downloader is download, which also has an "overload", download_with_opts, that allows specifying additional parameters. Currently, the only option is a split strategy.
The SplitStrategy controls whether the downloader is allowed to split requests into multiple requests to parallelize the download, or whether it should proceed strictly sequentially. In the future, there will be more options for specifying the level of parallelism in case of a split.
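The effect of allowing a split can be sketched as carving one chunk range into contiguous sub-requests that can go to different providers in parallel. The function below is illustrative only, not the real SplitStrategy API:

```rust
use std::ops::Range;

/// Split one big chunk range into up to `parts` contiguous
/// sub-requests of near-equal size. With splitting disallowed, the
/// whole range would stay a single sequential request instead.
fn split_request(chunks: Range<u64>, parts: u64) -> Vec<Range<u64>> {
    let len = chunks.end - chunks.start;
    let parts = parts.clamp(1, len.max(1));
    let base = len / parts;
    let rem = len % parts;
    let mut out = Vec::new();
    let mut start = chunks.start;
    for i in 0..parts {
        // The first `rem` parts get one extra chunk each.
        let size = base + if i < rem { 1 } else { 0 };
        out.push(start..start + size);
        start += size;
    }
    out
}
```

Each sub-range is still verifiable on its own, since every chunk is independently checkable against the blake3 tree, which is what makes this kind of split safe.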
SupportedRequest
SupportedRequest is implemented for the two get request types, Get and GetMany, as well as for an individual hash or a HashAndFormat. You can implement it for anything that can be converted to either a Get or a GetMany request.
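The idea behind SupportedRequest can be mirrored with a small hypothetical trait. The real trait and request types differ; this only shows the conversion pattern:

```rust
/// Simplified stand-ins for the real request types.
#[derive(Debug, PartialEq)]
enum Request {
    Get { hash: String },
    GetMany { hashes: Vec<String> },
}

/// Hypothetical mirror of the SupportedRequest idea: anything that can
/// turn itself into a Get or GetMany request can be handed to the
/// downloader directly.
trait IntoRequest {
    fn into_request(self) -> Request;
}

impl IntoRequest for &str {
    // A bare hash becomes a Get for the whole blob.
    fn into_request(self) -> Request {
        Request::Get { hash: self.to_string() }
    }
}

impl IntoRequest for Vec<String> {
    // A list of hashes becomes a GetMany.
    fn into_request(self) -> Request {
        Request::GetMany { hashes: self }
    }
}
```

The convenience is that call sites can pass a hash, a list of hashes, or a fully built request through the same parameter.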
ContentDiscovery
The ContentDiscovery trait has a single method, find_providers, that returns a stream of providers. This can be either a finite stream, in which case the downloader will try each node in sequence and give up if the request cannot be completed, or an infinite stream of possibly repeated node ids, in which case the downloader will keep trying until it succeeds, or until the DownloadProgress object that acts as a handle for the request is dropped.
One important fact about content discovery is that it only works with node ids. The downloader requires node discovery to be enabled in the iroh endpoint, either via one of the built-in node discovery methods (n0 DNS, mDNS, or the mainline DHT) or via the StaticProvider in the iroh discovery system if you want to manage the data yourself.
ContentDiscovery is implemented for any sequence of things that can be converted to iroh::NodeId. So you can, for example, pass just a Vec&lt;NodeId&gt; or a HashSet&lt;NodeId&gt;. The order of the elements in the sequence controls the order in which the different nodes will be tried, so it is not arbitrary.
While the SupportedRequest trait exists just to make the API more convenient to use, the ContentDiscovery trait is intended as a way to extend content discovery to more generic mechanisms. For example, the content tracker protocol and implementation that exists in iroh-experiments can be wrapped in a struct that implements ContentDiscovery to allow content discovery via a tracker.
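As a hypothetical mirror of the trait (the real find_providers returns a stream of iroh::NodeId; strings stand in here to keep the sketch self-contained), a static, ordered provider list could look like this:

```rust
/// Hypothetical mirror of the ContentDiscovery idea: a single method
/// that yields candidate providers for a hash.
trait ProviderDiscovery {
    fn find_providers(&self, hash: &str) -> Box<dyn Iterator<Item = String> + '_>;
}

/// A static, ordered provider list. As a finite stream, the downloader
/// tries each node in sequence and gives up once it is exhausted.
struct StaticProviders(Vec<String>);

impl ProviderDiscovery for StaticProviders {
    fn find_providers(&self, _hash: &str) -> Box<dyn Iterator<Item = String> + '_> {
        // Order is preserved: earlier entries are tried first.
        Box::new(self.0.iter().cloned())
    }
}
```

A tracker-backed implementation would do the same thing, except that find_providers would query the tracker for the given hash instead of returning a fixed list.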
Provider Events and Access Control
The provider side now has more detailed yet simplified events for informing the provider of ongoing operations. Unlike many other event streams, these events can only be consumed in-process; they also contain provisions for access control.
Connections can be controlled on a per-node-id basis, and potentially dangerous requests such as Push can also be controlled on a per-request basis. For example, you can allow a certain node to push a certain hash to you, but nothing else.
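A minimal sketch of such a policy, using strings for node ids and hashes (the real API hooks into the provider event stream and uses real NodeId and Hash types):

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative per-request access control for Push: a node may only
/// push hashes it has explicitly been allowed to push.
#[derive(Default)]
struct PushPolicy {
    allowed: HashMap<String, HashSet<String>>,
}

impl PushPolicy {
    /// Allow a specific node to push a specific hash.
    fn allow(&mut self, node: &str, hash: &str) {
        self.allowed
            .entry(node.to_string())
            .or_default()
            .insert(hash.to_string());
    }

    /// Decide whether to accept an incoming Push request.
    fn permits(&self, node: &str, hash: &str) -> bool {
        self.allowed.get(node).map_or(false, |h| h.contains(hash))
    }
}
```

The default is deny: a node that has never been granted anything cannot push at all, which matches the requirement that Push is gated rather than open.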
The exact shape of this API might change in the future. For instance, it would be useful to also have access control for Get requests, as users have already requested. But we also don't want to slow down the very common case where Get is unrestricted.
None of the hooks that exist now will be removed. If anything, there will be more fine-grained control before 1.0.
Batch Add vs Non-Batch Add
All operations that add data to the store can be performed either within a Batch or globally.
When adding data within a batch, the return type will be a TempTag, and it will be your responsibility to either create a persistent Tag or prevent the data from being garbage collected in some other way. Batches are useful for adding a large number of items to a hash sequence and then creating a single persistent tag for the hash sequence.
When adding data without a batch, the default behaviour will be to create a persistent tag for every add operation. This means that your data is safe, but it can also lead to a large number of tags being created.
You can customize this behaviour by using different functions on AddProgress, such as assigning a named tag or opting out of tag creation with temp_tag.
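A toy model of the tag semantics (not the real store API) shows why batch adds need an explicit persistent tag:

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of tag-based garbage collection: a blob survives GC only
/// if a persistent tag or a live temp tag points at it.
#[derive(Default)]
struct Store {
    blobs: HashSet<String>,
    tags: HashMap<String, String>, // persistent: tag name -> hash
    temp_tags: HashSet<String>,    // live temp tags, by hash
}

impl Store {
    /// Non-batch add: creates a persistent tag per blob by default.
    fn add(&mut self, hash: &str) {
        self.blobs.insert(hash.to_string());
        self.tags.insert(format!("tag-{hash}"), hash.to_string());
    }

    /// Batch add: only a temp tag, which the caller must upgrade to a
    /// persistent tag before it is dropped, or the blob is collected.
    fn add_in_batch(&mut self, hash: &str) {
        self.blobs.insert(hash.to_string());
        self.temp_tags.insert(hash.to_string());
    }

    /// Simulate the end of a batch: all its temp tags go away.
    fn drop_temp_tags(&mut self) {
        self.temp_tags.clear();
    }

    /// Remove every blob that no tag points at.
    fn gc(&mut self) {
        let live: HashSet<&String> =
            self.tags.values().chain(self.temp_tags.iter()).collect();
        self.blobs.retain(|h| live.contains(h));
    }
}
```

In this model a non-batch add survives GC on its own, while a batch add survives only as long as its temp tag is alive, which is exactly the trade-off described above.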
Bitfields
Bitfields are the most notable reason for the rewrite of the file system based store. iroh-blobs 0.35 only kept track of partial blobs in a coarse way: by computing the missing ranges from the bao outboard and the file size. This is sufficient for use cases like sendme, or other use cases where data is always written sequentially, so that any interruption leaves a partial blob with the first x chunks complete.
The new store also keeps track of gaps, so it requires an additional bitfield file per incomplete blob. Keeping track of available ranges is also what enables the Observe request.
Bitfield files are lazily recomputed from the data and the outboard when first interacting with a blob, so they are ephemeral data. Recomputing the bitfield can be somewhat expensive for extremely large blobs, though.
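The difference between the two tracking schemes can be sketched with a set of present chunks (illustrative only, not the real store code):

```rust
use std::collections::BTreeSet;

/// Coarse tracking, as in the old store: only how many chunks from the
/// start are complete. Exact for sequential downloads, but it loses
/// information as soon as there are gaps.
fn complete_prefix(present: &BTreeSet<u64>) -> u64 {
    let mut n = 0;
    while present.contains(&n) {
        n += 1;
    }
    n
}

/// Bitfield tracking, as in the new store: the exact set of missing
/// chunks within the blob, gaps included.
fn missing_chunks(present: &BTreeSet<u64>, total: u64) -> Vec<u64> {
    (0..total).filter(|c| !present.contains(c)).collect()
}
```

With chunks 0, 1, and 3 present out of 5, the coarse view only knows the first two chunks are complete and would re-request chunk 3, while the bitfield view knows exactly that chunks 2 and 4 are missing.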
Wrapping Up
With the 0.90 release, iroh-blobs adds features that allow for a smoother, more extensible API and better observability. Whether you're fetching individual blobs, coordinating large transfers, observing availability, or pushing data securely, the new request types and API changes are designed to make it easier and more powerful.
If you're building with iroh-blobs, we'd love to hear what you're trying to do and where things can be improved. In the meantime, try out the new requests, poke at the new APIs, and let us know what you discover.
To get started, take a look at our docs, dive directly into the code, or chat with us in our Discord channel.