Introduction
dipa makes it easy to efficiently delta encode large Rust data structures.
In some applications, the data that you are sending to the client is often almost exactly the same as the data that you last sent to them.
Rather than repeatedly sending nearly identical state objects to a client, an application might calculate what has changed since the last time data was sent and then only send down those changes.
This approach can dramatically reduce both your and your users' bandwidth requirements and network traffic costs.
The process of determining what has changed between two instances of a data structure is known as delta encoding.
Historically, delta encoding code would become more and more difficult to maintain as your application's data structures grew more and more complex.
This made it a tedious optimization reserved for only the most bandwidth sensitive applications, such as networked games.
dipa eliminates the maintainability challenges of efficient delta encoding code by generating all of the code for you.
dipa is designed to generate very tiny diffs by default. In the most sensitive cases where you have application specific knowledge about your data structures that could help you generate even tinier diffs, you can implement the traits for that type yourself and let dipa's derive macro take care of the rest.
Note that dipa does not know anything about networks and has no networking code. It is only focused on encoding deltas, not transmitting them.
Use Cases
You might make use of dipa as the underlying delta compression machinery in any application where you want to reduce the network traffic required to keep clients up to date with state from a server such as:
-
Multiplayer networked games and simulations
-
Real time client side views into server side data
Note that dipa does not know anything about networks and has no networking code. It is only focused on encoding deltas, not transmitting them.
Quick Start
One key feature of dipa is the #[derive(DiffPatch)]
macro that allows you to implement delta compression
without the tedious and hard to maintain process of writing the necessary data structures and logic by hand.
Here's a quick look at dipa in action.
use dipa::DiffPatch; #[derive(DiffPatch)] struct MyStruct { a_field: MyEnum, another_field: i32, one_more: Vec<MyOtherStruct> } #[derive(DiffPatch)] enum MyEnum { A(u8), B(u16, Vec<f32>), C { set: HashSet<u128> }, } #[derive(DiffPatch)] struct MyOtherStruct(u64, BTreeMap<i8, MyEnum>); fn main () { let old = MyStruct { a_field: MyEnum::B(308, vec![1., 2.]), another_field: -10, one_more: vec![MyOtherStruct(7, BTreeMap::new())] }; let new = MyStruct { a_field: MyEnum::A(10), another_field: 650, one_more: vec![MyOtherStruct(567, BTreeMap::new())] }; let diff = old.create_delta_towards(&new); let serialized = bincode::options() .with_varint_encoding() .serialize(&diff) .unwrap(); // ... Pretend we sent the bytes over a network ... let deserialized: <MyStructDiff as dipa::Diffable<'_, MyStructDiff>>::DeltaOwned = bincode::options() .with_varint_encoding() .deserialize(&serialized) .unwrap(); let mut old = old; old.apply_patch(deserialized); // old is now equal to new. }
Using Derive
dipa provides a derive macro to generate implementations of the Diffable
and Patchable
traits for
data structures defined in your crate.
To enable the macro use the derive
feature. Then use #[derive(DiffPatch)]
on types that you want to
be able to delta encode.
# Cargo.toml
# ...
[dependencies]
dipa = { version = "0.x", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
# ...
#![allow(unused_variables)] fn main() { lib.rs use dipa::DiffPatch; #[derive(DiffPatch)] struct MyStruct { field_a: MyEnum, field_b: Vec<f64> } #[derive(DiffPatch)] struct MyEnum { field: Vec<f64> } }
Attributes
There are three categories of attributes that can be used with the derive macro. Container, variant and field attributes.
#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] #[dipa(diff_derives = "Debug, Copy, Clone")] // <-- this is a container attribute struct S { #[dipa(todo_field_attribute_here)] // <-- this is a field attribute f: i32, } #[derive(DiffPatch)] #[dipa(patch_derives = "Debug, Serialize")] // <-- this is a container attribute enum E { #[dipa(todo_variant_attribute_here)] // <-- this is a variant attribute A( #[dipa(todo_field_attribute_here)] // <-- this is a field attribute String ), } }
Container Attributes
diff_derives = "SomeDerive, AnotherDerive"
Used to add #[derive(SomeDerive, AnotherDerive)]'s for the delta encoded diff type that dipa generates for your struct or enum.
This is mainly useful internally so that we can satisfy the necessary trait bounds for using our automatically generated
Diffable::Delta
's' with the DipaImplTester
.
patch_derives = "SomeDerive, AnotherDerive"
Used to add #[derive(SomeDerive, AnotherDerive)]'s for the associated Diffable::DeltaOwned
type that dipa generates for your struct or enum.
field_batching_strategy = "..."
At this time this can either be set to one_batch
, no_batching
. There are other batching strategies planned such as being able to use multiple enums
each responsible for a few fields, or being able to annotate individual fields in order to indicate which batch of deltas that they should belong to.
-
one_batch
- A single enum will be used as theDiffable::Delta
type. This enum will be able to represent every possible combination of the struct's fields changing. By default this strategy is limited to structs that have 5 fields since as the number of fields grows the number of enum variants grows exponentially. Themax_fields_per_batch
attribute can be used to increase this limit on a per-struct basis.#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] #[dipa(field_batching_strategy = "one_batch")] struct MyStruct { field1: u32, field2: u64 } // Automatically generated delta would look something like this enum MyStructDelta<'d> { NoChange, Change_0(<u32 as dipa::Diffable<'d>::Delta), Change_1(<u64 as dipa::Diffable<'d>::Delta), Change_0_1( <u32 as dipa::Diffable<'d>::Delta, <u64 as dipa::Diffable<'d>::Delta ), } }
-
no_batching
- TheDiffable::Delta
type will be a struct with the same number of fields as your original type. This is useful when have too many fields for theon_batch
strategy. Note that in the future we will introduce other strategies that are likely to better handle large numbers of fields. So this is more of a temporary measure.#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] #[dipa(field_batching_strategy = "no_batching")] struct MyStruct { field1: u32, field2: u64 } // Automatically generated delta would look something like this struct MyStructDelta { field1: <u32 as dipa::Diffable<'d>::Delta, field2: <u64 as dipa::Diffable<'d>::Delta, } }
max_fields_per_batch = 5
This can used when the field_batching_strategy = "one_batch"
.
By default, the one_batch
strategy can only be used with structs or enum that have 5 or fewer fields. The max_fields_per_batch
allows you to increase this limit.
There is a hard cap on how high you can set max_fields_per_batch
can be set in order to prevent you from accidentally causing unreasonable compile times. Values above
7 will lead to a compile time error. In the future we will experiment with different values to see how the compile time trade-offs look.
Space Guarantees
FIXME: Move this documentation to a subchapter within the optimizations chapter where we talk about the one_batch strategy.
Single Byte No Change Rule
Given a struct or enum type that uses #[dipa(delta_strategy = "one_batch")]
, or any type in the standard library
that we've implemented the Diffable
trait for.
Its unchanged delta encoding is guaranteed to be serialize-able to a single byte.
Note that this rule is true for nested data structures. If all nested types properly implement
Diffable
and your root type uses the derive macro, it can be delta encoded down to 1 byte when
it has not changed.
Note that this rule applies to enum fields not variants. i.e. in MyEnum::A(1, 2, 3)
, the
fields are "1, 2, 3". This rule applies to enums with any number f variants.
Your nested types do not need to use the one_batch
strategy. The strategy that they use does
not matter for this rule.
In the following code snippet, all three types can be delta compressed down to a single byte when they haven't changed.
// If this type has not changed its delta can be serialized to // a single byte. #[derive(DiffPatch)] MyStruct { field_a: Vec<f32>, field_b: HashMap<u8, HashSet<MyEnum>>, field_c: AnotherStruct } // If this type has not changed its delta can be serialized to // a single byte. // Remember: this rule applies to fields, not variants. #[derive(DiffPatch, Hash, Eq, Ord)] enum MyEnum { A, B([Vec<u8>; 2], i32), C { some_field: i128 }, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z } // If this type has not changed its delta can be serialized to // a single byte. #[derive(DiffPatch)] struct AnotherStruct(i8, u8, MyEnum); fn main () { let my_struct = make_my_struct(); let other = make_my_struct(); let diff = my_struct.create_diff_towards(&other); let serialized: Vec<u8> = bincode::options() .with_varint_encoding() .serialize(&diff) .unwrap(); // True for all types that properly implement the // Diffable trait. assert_eq!(serialized.len(), 1); } fn make_my_struct () -> MyStruct { let mut hash_set = HashSet::new(); hash_set.inset(MyEnum::A); let mut field_b = HashMap::new(); field_b.insert(200, hash_set); MyStruct { field_a: vec![1., 2., 3., 4., 5.], field_b, field_c: AnotherStruct( -1, 2, MyEnum::B([vec![6, 7], vec![8]], -3) ) } }
Why 5 fields?
The derive macro generates a Diffable::Delta
associated type that is an enum containing every possible combination of changed fields.
This means that there are 2<super>n</super>
enum variants that get generated, where n
is the number of fields in your struct or within an
enum variant. Since this is exponential, it grows quickly.
For now we choose 5
as as starting point for the maximum number of fields that we combine into a single Delta
enum in this way, but in the future we
will experiment with larger numbers in order to find the sweet spot where the number is as high as it can be before the potential impact on compile
times becomes non-negligible.
We will take real applications that use #[derive(DiffPatch)]
on non trivial data structures and benchmark the fresh and incremental compile times as n
increases.
Or, perhaps we'll expose feature flags that allow you to increase n
yourself. Or better yet a procedural macro attribute that can configure the n
value.
Note that your structs can still have more than 5 fields. They'll just diff to (field_count / 5.0).ceil()
bytes when unchanged.
So, currently, if you have 9 fields in a struct it will diff to 2 bytes when unchanged.
Delta Encoding Optimizations
This chapter describes the various delta encoding techniques that the derive macro exposes.
Some of them that are almost always desirable are done by default,
while others that might have trade-offs can be enabled using the dipa(...)
procedural macro attribute.
This chapter is geared towards those that want to save every single byte possible.
If you are just getting started you can safely skip this chapter and come back to it later. Nothing here will impact the design of your application.
Note that optimizations that are not yet implemented can be found in the Roadmap/Optimizations chapter
Performance
dipa wants to be useful in real time applications.
This means that we want the amount of time that it takes to delta encode a data structure should be as low as possible, without sacrificing on the focus of generating incredibly tiny diffs.
The sections in this chapter discuss different diffing performance related topics, and also provide tips on how to maximize diffing performance in your own applications.
Sequences
The Diffable
implementation for standard library sequences such as Vec<T>
and [T; N]
relies on a
dynamic programming solution to the longest common subsequence problem.
This means that delta encoding lists has a time complexity of O(M * N)
, where M
and N
are
the lengths of the before and after lists.
When your lists are small this is unlikely to be a performance bottleneck.
However, if your application deals with lots of large lists and you have benchmarked that delta encoding your lists is a performance bottleneck, consider making use of a changed flag.
Changed Flags
Say you have a real-time application that delta compresses the following type.
#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] struct MyStruct { water_droplets: Vec<WaterDroplet> } #[derive(DiffPatch)] struct WaterDroplet([f32; 3]); }
At first you were simulating the Sahara Desert and things were going smoothly.
Now, however, you're simulating a section of the River Niger and your water_droplets
vector
can sometimes contain over 10,000
droplets.
Its currently an ice age, so these droplets don't move around very much and so your data structure rarely changes.
Because delta encoding lists is O(M * N)
time complexity where M
and N
are the lengths
of the two lists, you'd like to avoid delta encoding this data structure if possible.
#![allow(unused_variables)] fn main() { use dipa::ChangeFlagged; #[derive(DiffPatch)] struct MyStruct { water_droplets: ChangeFlagged(WaterDroplet) } #[derive(DiffPatch)] struct WaterDroplet([f32; 3]); }
Now if MyStruct.water_droplets.changed() == false
the underlying vectors will not be diffed.
NOTE: ChangeFlagged has not been implemented yet. If you need it please open an issue.
Custom Delta Encoding
In most cases, you'll start by using #[derive(DiffPatch)]
on your data structure in order to get yourself up and running quickly.
For complex applications where every byte matters you will eventually run into situations where you can use your knowledge of how your application works in order to generate even smaller deltas for your types.
In this case you can turn to the community to see if there is already a custom implementation that handles what you are after,
or simply implement Diffable
and Patchable
yourself.
#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] struct MyStruct { field_a: Vec<f32>, custom: CustomStruct } // You know things about how you're using CustomStruct that dipa does not, // and you want to use that knowledge to control how it gets delta encoded. // So, you implement Diffable and Patchable yourself. struct CustomStruct { name: String } impl Diffable<'d, CustomStruct> for CustomStruct { type Delta = MyDelta; type DeltaOwned = MyDeltaOwned; // ... } impl Patchable<MyDeltaOwned> for CustomStruct { // ... } #[derive(Serialize)] struct MyDelta<'a>(&'a u128, &'a [u8]); #[derive(Deerialize)] struct MyDeltaOwned(u128, Vec<u8>); }
Implementing Diffable
Here's a look at the Diffable
trait.
#![allow(unused_variables)] fn main() { pub trait Diffable<'s, 'e Other> { type Delta; type DeltaOwned; fn create_delta_towards(&self, end_state: &'p Other) -> CreatedDelta<Self::Delta>; } }
The create_delta_towards
method takes some end value and generates a delta encoding that can later
be used to turn your start value (&self
) into this end value.
Let's walk through a simple implementation of Diffable
for an unsigned 128 bit integer.
#![allow(unused_variables)] fn main() { impl Diffable<'s, 'e i128> for i128 { // Note, the real implementation does not use a reference here since i128 // is Copy. This is simply for illustration. type Delta = Option<&'a i128>; type DeltaOwned = Option<i128>; fn create_delta_towards(&self, end_state: &'p i128) -> CreatedDelta<Self::Delta> { let mut did_change = false; let delta = if self == end_state { None } else { did_change = true; Some(end_state) }; CreatedDelta { delta, did_change, } } } }
The i128
in impl Diffable<'s, 'e i128>
is the type that we are diffing our &self
type against.
A Diffable
implementation can diff a type against any other type. The main use case for this is being able
to diff T
and &T
.
The Diffable::Delta
associated type is the type of your delta encoding. For simple types like integers
the implementations that dipa
provides use Option<T>
as the Delta
.
For more complex types such as Vec<T>
, custom data structures are used for the Delta
.
As we see in type Delta = Option<&'a i28>
a Delta
can borrow values from the Other
type in order to avoid
clones.
NOTE: The real
dipa
implementation fori128
does not use a reference for the delta sincei128
is a small copy type. It simply usesOption<i128>
.
type DeltaOwned
is just the Delta
type with all of the reference types replaced with owned types.
You'll typically serialize to Diffable::Delta
, send the bytes across the wire (potentially after further compressing
them using either a general purpose or custom compression algorithm), and then deserialize them to the
Diffable::DeserializeOwned
type.
The did_change
is used by the #[derive(DiffPatch)]
in the code that it automatically generates for you.
The Delta
that the macro generates for is an enum with different variants that get used based on which
of those fields within the Diffable
type have changed.
Implementing Patchable
Here's a look at the Patchable
trait.
#![allow(unused_variables)] fn main() { pub trait Patchable<P> { fn apply_patch(&mut self, patch: P); }
P
is the delta encoding that was generated by the Diffable.create_delta_towards
method.
Different types will have different patches that need to be applied in different ways.
Let's talk a look at an i128
, one of the more trivial Patchable
types.
#![allow(unused_variables)] fn main() { implement Patchable<Option<i128>> for i128 { fn apply_patch(&mut self, patch: Option<i128>) { if let Some(patch) = patch { *self = patch; } } } }
Here's a slightly more complex example patch function.
#![allow(unused_variables)] fn main() { enum CustomPatchExample { Replace(i128), Add(i16) } // This is not how the real implementation for i128 looks. implement Patchable<CustomPatchExample> for i128 { fn apply_patch(&mut self, patch: CustomPatchExample) { match patch { CustomPatchExample::Replace(new) => { *self = new; } CustomPatchExample::Add(add) => { self += add as i128; } } } } }
Testing Your Implementation
By enabling the impl-tester
feature, you gain access to machinery that makes it easy to test
your custom Diffable
and Patchable
implementations.
The DipaImplTester
will.
-
Delta encode your provided start and end values.
-
Serialize the delta using
bincode
with variable integer encoding. -
Assert that the number of bytes is what you expect.
-
Deserialize the delta back to
DeltaOwned
. -
Apply the delta to your original start value.
-
Ensure that your start value now equals your end value.
# Cargo.toml
[dev-dependencies]
dipa = {version = "0.1", features = ["impl-tester"]}
#![allow(unused_variables)] fn main() { use dipa::CreateDeltaTowardsReturn; struct MyStruct { field: u8 } enum MyDelta<'a> { New(&'a u8) } enum MyDeltaOwned { New(u8) } impl<'s,'e> Diffable<'s, 'e, MyStruct> for MyStruct { type Delta = MyDelta; type DeltaOwned = MyDeltaOwned; fn create_delta_towards (&self, &end_state) -> CreateDeltaTowardsReturn<Self::Delta> { todo!() } } type MyStructPatch<'s, 'e> = <MyStruct as Diffable<'s, 'e, MyStruct>; impl<'s, 'e> Patchable<MyStructPatch<'s, 'e'>> for MyStruct { fn apply_patch (&mut self, patch: MyStructPatch<'s, 'e>) { todo!() } } #[cfg(test)] mod tests { use dipa::DipaImplTester; #[test] fn diff_my_struct_changed() { DipaImplTester { label: Some("Diff MyStruct changed"), start: &mut MyStruct { field: 2 }, end: &MyStruct { field: 5 }, expected_delta: MyDelta::New(&5), expected_serialized_patch_size: 2, expected_did_change: true } .test(); } #[test] fn diff_my_struct_no_change() { DipaImplTester { label: Some("Diff MyStruct no change"), start: &mut MyStruct { field: 2 }, end: &MyStruct { field: 2 }, expected_delta: MyDelta::New(&2), expected_serialized_patch_size: 2, expected_did_change: false } .test(); } } }
Examples
If someone opens an issue with a question we can add an example and/or a book or documentation change that would have answered that question.
-
mkdir examples
-
Add examples in that directory. Examples should be added to the cargo workspace.
Feature Flags
derive
Enables the #[derive(DiffPatch)]
macro.
impl-tester
Exposes the DipaImplTester
utility that can be used to test your custom implementations.
Roadmap
This chapter contains information about future dipa work.
Optimizations
Over time more and more ideas will pop up on how to make the dipa macro perform more delta encoding optimizations.
This section contains a list of optimizations that we plan to tackle at some point.
Booleans to Bitflags
If a struct has two or more boolean fields we can use bitflags to pack the diffs into a single integer.
TODO: Link to corresponding issue
#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] struct Foo { fielda: bool, fieldb: bool, fieldc: Vec<u32>, fieldc: bool } // This is just a quick example delta. Needs more planning and design work. enum FooDelta { NoChange, Changed_0(BitFlagsU8WithOneBitForEachBoolField), Changed_1(DiffForTheVecu32), Changed_0_1(BitFlagsU8WithOneBitForEachBoolField, DiffForTheVecu32) } }
Global Token Information
Say that you have the following data structures:
#![allow(unused_variables)] fn main() { #[derive(DiffPatch)] struct Outer { fielda: bool, fieldb: bool, inner: Inner } #[derive(DiffPatch)] struct Inner { fieldc: bool, fieldd: bool fielde: HashSet<i8> } }
If the derive macro invocations for Outer
knew about the structure of Inner
then we could generate code such
that Outer's
delta looked like this:
enum OuterDelta {
NoChange,
Change_0(Bitflag U8 that is used to represent all four booleans here),
// ...
}
In the above example we are using a single u8
bitflag to encode the changes of all of the bools in both Outer
and Inner
.
We could do further than this as well. If Outer
was given a #[dipa(max_fields_per_batch = 6)]
attribute, the derive
macro could generate an OuterDelta
that was essentially a Delta6
with all of the combinations of the 6 fields.
There are just two quick examples. There should be other things that we can do when creating a type's delta if we know information about its nested types.
Essentially, knowing about other types that are using the macro would allow us to perform optimizations that we could not otherwise.
One way to make this happen would be to make use of the filesystem, but this would mean that whenever you changed your types you would need to build once before all of the right information was cached. No good.
A better way would be if rustc had support for allowing a procedural macro to execute twice. On the first invocation
it would simply process all of the tokens and cache the information to env!(OUT_DIR)
or something like that.
Then on the second invocation it could use this cached information in order to apply optimizations like those described above.
This would need more thought and design and an RFC, but it could be an interesting avenue to pursue.
Internal Design
TODO;
Chapters with information about how dipa's macro works.
This will be useful to
-
Contributors that want to make changes
-
Developers that want to understand how their delta compression works so that they can make informed decisions when optimizing their network traffic.