Introduction

dipa makes it easy to efficiently delta encode large Rust data structures.

In some applications, the data that you are sending to the client is often almost exactly the same as the data that you last sent to them.

Rather than repeatedly sending nearly identical state objects to a client, an application might calculate what has changed since the last time data was sent and then only send down those changes.

This approach can dramatically reduce both your and your users' bandwidth requirements and network traffic costs.

The process of determining what has changed between two instances of a data structure is known as delta encoding.

Historically, delta encoding code would become more and more difficult to maintain as your application's data structures grew more and more complex.

This made it a tedious optimization reserved for only the most bandwidth sensitive applications, such as networked games.

dipa eliminates the maintainability challenges of efficient delta encoding code by generating all of the code for you.

dipa is designed to generate very tiny diffs by default. In the most sensitive cases where you have application specific knowledge about your data structures that could help you generate even tinier diffs, you can implement the traits for that type yourself and let dipa's derive macro take care of the rest.

Note that dipa does not know anything about networks and has no networking code. It is only focused on encoding deltas, not transmitting them.

Use Cases

You might make use of dipa as the underlying delta compression machinery in any application where you want to reduce the network traffic required to keep clients up to date with state from a server such as:

  • Multiplayer networked games and simulations

  • Real time client side views into server side data

Note that dipa does not know anything about networks and has no networking code. It is only focused on encoding deltas, not transmitting them.

Quick Start

One key feature of dipa is the #[derive(DiffPatch)] macro that allows you to implement delta compression without the tedious and hard to maintain process of writing the necessary data structures and logic by hand.

Here's a quick look at dipa in action.

use dipa::DiffPatch;

#[derive(DiffPatch)]
struct MyStruct {
    a_field: MyEnum,
    another_field: i32,
    one_more: Vec<MyOtherStruct>
}

#[derive(DiffPatch)]
enum MyEnum {
    A(u8),
    B(u16, Vec<f32>),
    C { set: HashSet<u128> },
}

#[derive(DiffPatch)]
struct MyOtherStruct(u64, BTreeMap<i8, MyEnum>);

fn main () {
	let old = MyStruct {
	    a_field: MyEnum::B(308, vec![1., 2.]),
	    another_field: -10,
	    one_more: vec![MyOtherStruct(7, BTreeMap::new())]
	};

	let new =  MyStruct {
	    a_field: MyEnum::A(10),
	    another_field: 650,
	    one_more: vec![MyOtherStruct(567, BTreeMap::new())]
	};

    let diff = old.create_delta_towards(&new);

    let serialized = bincode::options()
        .with_varint_encoding()
        .serialize(&diff)
        .unwrap();

    // ... Pretend we sent the bytes over a network ...

    let deserialized:
      <MyStructDiff as dipa::Diffable<'_, MyStructDiff>>::DeltaOwned =
      bincode::options()
        .with_varint_encoding()
        .deserialize(&serialized)
        .unwrap();

	let mut old = old;
    old.apply_patch(deserialized);

    // old is now equal to new.
}

Using Derive

dipa provides a derive macro to generate implementations of the Diffable and Patchable traits for data structures defined in your crate.

To enable the macro use the derive feature. Then use #[derive(DiffPatch)] on types that you want to be able to delta encode.

# Cargo.toml

# ...

[dependencies]
dipa = { version = "0.x", features = ["derive"] }
serde = { version = "1", features = ["derive"] }

# ...

#![allow(unused_variables)]
fn main() {
lib.rs

use dipa::DiffPatch;

#[derive(DiffPatch)]
struct MyStruct {
    field_a: MyEnum,
    field_b: Vec<f64>
}

#[derive(DiffPatch)]
struct MyEnum {
    field: Vec<f64>
}
}

Attributes

There are three categories of attributes that can be used with the derive macro. Container, variant and field attributes.


#![allow(unused_variables)]
fn main() {
#[derive(DiffPatch)]
#[dipa(diff_derives = "Debug, Copy, Clone")]  // <-- this is a container attribute
struct S {
    #[dipa(todo_field_attribute_here)]  // <-- this is a field attribute
    f: i32,
}

#[derive(DiffPatch)]
#[dipa(patch_derives = "Debug, Serialize")]  // <-- this is a container attribute
enum E {
    #[dipa(todo_variant_attribute_here)]  // <-- this is a variant attribute
    A(
      #[dipa(todo_field_attribute_here)]  // <-- this is a field attribute
      String
    ),
}
}

Container Attributes

diff_derives = "SomeDerive, AnotherDerive"

Used to add #[derive(SomeDerive, AnotherDerive)]'s for the delta encoded diff type that dipa generates for your struct or enum.

This is mainly useful internally so that we can satisfy the necessary trait bounds for using our automatically generated Diffable::Delta's' with the DipaImplTester.


patch_derives = "SomeDerive, AnotherDerive"

Used to add #[derive(SomeDerive, AnotherDerive)]'s for the associated Diffable::DeltaOwned type that dipa generates for your struct or enum.


field_batching_strategy = "..."

At this time this can either be set to one_batch, no_batching. There are other batching strategies planned such as being able to use multiple enums each responsible for a few fields, or being able to annotate individual fields in order to indicate which batch of deltas that they should belong to.

  • one_batch - A single enum will be used as the Diffable::Delta type. This enum will be able to represent every possible combination of the struct's fields changing. By default this strategy is limited to structs that have 5 fields since as the number of fields grows the number of enum variants grows exponentially. The max_fields_per_batch attribute can be used to increase this limit on a per-struct basis.

    
    #![allow(unused_variables)]
    fn main() {
    #[derive(DiffPatch)]
    #[dipa(field_batching_strategy = "one_batch")]
    struct MyStruct {
        field1: u32,
        field2: u64
    }
    
    // Automatically generated delta would look something like this
    enum MyStructDelta<'d> {
        NoChange,
        Change_0(<u32 as dipa::Diffable<'d>::Delta),
        Change_1(<u64 as dipa::Diffable<'d>::Delta),
        Change_0_1(
            <u32 as dipa::Diffable<'d>::Delta,
            <u64 as dipa::Diffable<'d>::Delta
        ),
    }
    }
    
  • no_batching - The Diffable::Delta type will be a struct with the same number of fields as your original type. This is useful when have too many fields for the on_batch strategy. Note that in the future we will introduce other strategies that are likely to better handle large numbers of fields. So this is more of a temporary measure.

    
    #![allow(unused_variables)]
    fn main() {
    #[derive(DiffPatch)]
    #[dipa(field_batching_strategy = "no_batching")]
    struct MyStruct {
        field1: u32,
        field2: u64
    }
    
    // Automatically generated delta would look something like this
    struct MyStructDelta {
        field1: <u32 as dipa::Diffable<'d>::Delta,
        field2: <u64 as dipa::Diffable<'d>::Delta,
    }
    }
    

max_fields_per_batch = 5

This can used when the field_batching_strategy = "one_batch".

By default, the one_batch strategy can only be used with structs or enum that have 5 or fewer fields. The max_fields_per_batch allows you to increase this limit.

There is a hard cap on how high you can set max_fields_per_batch can be set in order to prevent you from accidentally causing unreasonable compile times. Values above 7 will lead to a compile time error. In the future we will experiment with different values to see how the compile time trade-offs look.

Space Guarantees

FIXME: Move this documentation to a subchapter within the optimizations chapter where we talk about the one_batch strategy.

Single Byte No Change Rule

Given a struct or enum type that uses #[dipa(delta_strategy = "one_batch")], or any type in the standard library that we've implemented the Diffable trait for.

Its unchanged delta encoding is guaranteed to be serialize-able to a single byte.


Note that this rule is true for nested data structures. If all nested types properly implement Diffable and your root type uses the derive macro, it can be delta encoded down to 1 byte when it has not changed.

Note that this rule applies to enum fields not variants. i.e. in MyEnum::A(1, 2, 3), the fields are "1, 2, 3". This rule applies to enums with any number f variants.

Your nested types do not need to use the one_batch strategy. The strategy that they use does not matter for this rule.


In the following code snippet, all three types can be delta compressed down to a single byte when they haven't changed.

// If this type has not changed its delta can be serialized to
// a single byte.
#[derive(DiffPatch)]
MyStruct {
    field_a: Vec<f32>,
    field_b: HashMap<u8, HashSet<MyEnum>>,
    field_c: AnotherStruct
}

// If this type has not changed its delta can be serialized to
// a single byte.
// Remember: this rule applies to fields, not variants.
#[derive(DiffPatch, Hash, Eq, Ord)]
enum MyEnum {
    A,
    B([Vec<u8>; 2], i32),
    C { some_field: i128 },
    D, E, F, G, H, I, J, K, L, M,
    N, O, P, Q, R, S, T, U, V, W,
    X, Y, Z
}

// If this type has not changed its delta can be serialized to
// a single byte.
#[derive(DiffPatch)]
struct AnotherStruct(i8, u8, MyEnum);

fn main () {
    let my_struct = make_my_struct();
    let other = make_my_struct();

    let diff = my_struct.create_diff_towards(&other);

    let serialized: Vec<u8> = bincode::options()
        .with_varint_encoding()
        .serialize(&diff)
        .unwrap();

    // True for all types that properly implement the
    // Diffable trait.
    assert_eq!(serialized.len(), 1);
}

fn make_my_struct () -> MyStruct {
    let mut hash_set = HashSet::new();
    hash_set.inset(MyEnum::A);

    let mut field_b = HashMap::new();
    field_b.insert(200, hash_set);

	MyStruct {
        field_a: vec![1., 2., 3., 4., 5.],
        field_b,
        field_c: AnotherStruct(
            -1,
            2,
            MyEnum::B([vec![6, 7], vec![8]], -3)
        )
    }
}

Why 5 fields?

The derive macro generates a Diffable::Delta associated type that is an enum containing every possible combination of changed fields.

This means that there are 2<super>n</super> enum variants that get generated, where n is the number of fields in your struct or within an enum variant. Since this is exponential, it grows quickly.

For now we choose 5 as as starting point for the maximum number of fields that we combine into a single Delta enum in this way, but in the future we will experiment with larger numbers in order to find the sweet spot where the number is as high as it can be before the potential impact on compile times becomes non-negligible.

We will take real applications that use #[derive(DiffPatch)] on non trivial data structures and benchmark the fresh and incremental compile times as n increases.

Or, perhaps we'll expose feature flags that allow you to increase n yourself. Or better yet a procedural macro attribute that can configure the n value.

Note that your structs can still have more than 5 fields. They'll just diff to (field_count / 5.0).ceil() bytes when unchanged.

So, currently, if you have 9 fields in a struct it will diff to 2 bytes when unchanged.

Delta Encoding Optimizations

This chapter describes the various delta encoding techniques that the derive macro exposes.

Some of them that are almost always desirable are done by default, while others that might have trade-offs can be enabled using the dipa(...) procedural macro attribute.

This chapter is geared towards those that want to save every single byte possible.

If you are just getting started you can safely skip this chapter and come back to it later. Nothing here will impact the design of your application.

Note that optimizations that are not yet implemented can be found in the Roadmap/Optimizations chapter

Performance

dipa wants to be useful in real time applications.

This means that we want the amount of time that it takes to delta encode a data structure should be as low as possible, without sacrificing on the focus of generating incredibly tiny diffs.

The sections in this chapter discuss different diffing performance related topics, and also provide tips on how to maximize diffing performance in your own applications.

Sequences

The Diffable implementation for standard library sequences such as Vec<T> and [T; N] relies on a dynamic programming solution to the longest common subsequence problem.

This means that delta encoding lists has a time complexity of O(M * N), where M and N are the lengths of the before and after lists.

When your lists are small this is unlikely to be a performance bottleneck.

However, if your application deals with lots of large lists and you have benchmarked that delta encoding your lists is a performance bottleneck, consider making use of a changed flag.

Changed Flags

Say you have a real-time application that delta compresses the following type.


#![allow(unused_variables)]
fn main() {
#[derive(DiffPatch)]
struct MyStruct {
    water_droplets: Vec<WaterDroplet>
}

#[derive(DiffPatch)]
struct WaterDroplet([f32; 3]);
}

At first you were simulating the Sahara Desert and things were going smoothly.

Now, however, you're simulating a section of the River Niger and your water_droplets vector can sometimes contain over 10,000 droplets.

Its currently an ice age, so these droplets don't move around very much and so your data structure rarely changes.

Because delta encoding lists is O(M * N) time complexity where M and N are the lengths of the two lists, you'd like to avoid delta encoding this data structure if possible.


#![allow(unused_variables)]
fn main() {
use dipa::ChangeFlagged;

#[derive(DiffPatch)]
struct MyStruct {
    water_droplets: ChangeFlagged(WaterDroplet)
}

#[derive(DiffPatch)]
struct WaterDroplet([f32; 3]);
}

Now if MyStruct.water_droplets.changed() == false the underlying vectors will not be diffed.

NOTE: ChangeFlagged has not been implemented yet. If you need it please open an issue.

Custom Delta Encoding

In most cases, you'll start by using #[derive(DiffPatch)] on your data structure in order to get yourself up and running quickly.

For complex applications where every byte matters you will eventually run into situations where you can use your knowledge of how your application works in order to generate even smaller deltas for your types.

In this case you can turn to the community to see if there is already a custom implementation that handles what you are after, or simply implement Diffable and Patchable yourself.


#![allow(unused_variables)]
fn main() {
#[derive(DiffPatch)]
struct MyStruct {
   field_a: Vec<f32>,
   custom: CustomStruct
}

// You know things about how you're using CustomStruct that dipa does not,
// and you want to use that knowledge to control how it gets delta encoded.
// So, you implement Diffable and Patchable yourself.
struct CustomStruct { name: String }

impl Diffable<'d, CustomStruct> for CustomStruct {
  type Delta = MyDelta;
  type DeltaOwned = MyDeltaOwned;

   // ...
}

impl Patchable<MyDeltaOwned> for CustomStruct {
   // ...
}

#[derive(Serialize)]
struct MyDelta<'a>(&'a u128, &'a [u8]);

#[derive(Deerialize)]
struct MyDeltaOwned(u128, Vec<u8>);
}

Implementing Diffable

Here's a look at the Diffable trait.


#![allow(unused_variables)]
fn main() {
pub trait Diffable<'s, 'e Other> {
    type Delta;

    type DeltaOwned;

    fn create_delta_towards(&self, end_state: &'p Other)
      -> CreatedDelta<Self::Delta>;
}
}

The create_delta_towards method takes some end value and generates a delta encoding that can later be used to turn your start value (&self) into this end value.

Let's walk through a simple implementation of Diffable for an unsigned 128 bit integer.


#![allow(unused_variables)]
fn main() {
impl Diffable<'s, 'e i128> for i128 {
    // Note, the real implementation does not use a reference here since i128
    // is Copy. This is simply for illustration.
    type Delta = Option<&'a i128>;

    type DeltaOwned = Option<i128>;

    fn create_delta_towards(&self, end_state: &'p i128) ->
		CreatedDelta<Self::Delta> {
	    let mut did_change = false;

		let delta = if self == end_state {
            None
		} else {
		    did_change = true;
		    Some(end_state)
		};

		CreatedDelta {
		    delta,
		    did_change,
		}
    }
}
}

The i128 in impl Diffable<'s, 'e i128> is the type that we are diffing our &self type against.

A Diffable implementation can diff a type against any other type. The main use case for this is being able to diff T and &T.

The Diffable::Delta associated type is the type of your delta encoding. For simple types like integers the implementations that dipa provides use Option<T> as the Delta.

For more complex types such as Vec<T>, custom data structures are used for the Delta.

As we see in type Delta = Option<&'a i28> a Delta can borrow values from the Other type in order to avoid clones.

NOTE: The real dipa implementation for i128 does not use a reference for the delta since i128 is a small copy type. It simply uses Option<i128>.

type DeltaOwned is just the Delta type with all of the reference types replaced with owned types.

You'll typically serialize to Diffable::Delta, send the bytes across the wire (potentially after further compressing them using either a general purpose or custom compression algorithm), and then deserialize them to the Diffable::DeserializeOwned type.

The did_change is used by the #[derive(DiffPatch)] in the code that it automatically generates for you. The Delta that the macro generates for is an enum with different variants that get used based on which of those fields within the Diffable type have changed.

Implementing Patchable

Here's a look at the Patchable trait.


#![allow(unused_variables)]
fn main() {
pub trait Patchable<P> {
    fn apply_patch(&mut self, patch: P);
}

P is the delta encoding that was generated by the Diffable.create_delta_towards method.

Different types will have different patches that need to be applied in different ways.

Let's talk a look at an i128, one of the more trivial Patchable types.


#![allow(unused_variables)]
fn main() {
implement Patchable<Option<i128>> for i128 {
	fn apply_patch(&mut self, patch: Option<i128>) {
		if let Some(patch) = patch {
		    *self = patch;
		}
	}
}
}

Here's a slightly more complex example patch function.


#![allow(unused_variables)]
fn main() {
enum CustomPatchExample {
    Replace(i128),
    Add(i16)
}

// This is not how the real implementation for i128 looks.
implement Patchable<CustomPatchExample> for i128 {
	fn apply_patch(&mut self, patch: CustomPatchExample) {
		match patch {
			CustomPatchExample::Replace(new) => {
			    *self = new;
			}
			CustomPatchExample::Add(add) => {
			    self += add as i128;
			}
		}
	}
}
}

Testing Your Implementation

By enabling the impl-tester feature, you gain access to machinery that makes it easy to test your custom Diffable and Patchable implementations.

The DipaImplTester will.

  1. Delta encode your provided start and end values.

  2. Serialize the delta using bincode with variable integer encoding.

  3. Assert that the number of bytes is what you expect.

  4. Deserialize the delta back to DeltaOwned.

  5. Apply the delta to your original start value.

  6. Ensure that your start value now equals your end value.

# Cargo.toml

[dev-dependencies]
dipa = {version = "0.1", features = ["impl-tester"]}

#![allow(unused_variables)]
fn main() {
use dipa::CreateDeltaTowardsReturn;

struct MyStruct {
    field: u8
}

enum MyDelta<'a> {
   New(&'a u8)
}

enum MyDeltaOwned {
   New(u8)
}

impl<'s,'e> Diffable<'s, 'e, MyStruct> for MyStruct {
    type Delta = MyDelta;
    type DeltaOwned = MyDeltaOwned;

	fn create_delta_towards (&self, &end_state) -> CreateDeltaTowardsReturn<Self::Delta> {
	    todo!()
	}
}

type MyStructPatch<'s, 'e> = <MyStruct as Diffable<'s, 'e, MyStruct>;
impl<'s, 'e> Patchable<MyStructPatch<'s, 'e'>> for MyStruct {
	fn apply_patch (&mut self, patch: MyStructPatch<'s, 'e>) {
	    todo!()
	}
}

#[cfg(test)]
mod tests {
    use dipa::DipaImplTester;

    #[test]
    fn diff_my_struct_changed() {
        DipaImplTester {
            label: Some("Diff MyStruct changed"),
            start: &mut MyStruct { field: 2 },
            end: &MyStruct { field: 5 },
            expected_delta: MyDelta::New(&5),
            expected_serialized_patch_size: 2,
			expected_did_change: true
        }
        .test();
    }

    #[test]
    fn diff_my_struct_no_change() {
        DipaImplTester {
            label: Some("Diff MyStruct no change"),
            start: &mut MyStruct { field: 2 },
            end: &MyStruct { field: 2 },
            expected_delta: MyDelta::New(&2),
            expected_serialized_patch_size: 2,
			expected_did_change: false
        }
        .test();
    }
}
}

Examples

If someone opens an issue with a question we can add an example and/or a book or documentation change that would have answered that question.

  1. mkdir examples

  2. Add examples in that directory. Examples should be added to the cargo workspace.

Feature Flags

  • derive

Enables the #[derive(DiffPatch)] macro.

  • impl-tester

Exposes the DipaImplTester utility that can be used to test your custom implementations.

Roadmap

This chapter contains information about future dipa work.

Optimizations

Over time more and more ideas will pop up on how to make the dipa macro perform more delta encoding optimizations.

This section contains a list of optimizations that we plan to tackle at some point.

Booleans to Bitflags

If a struct has two or more boolean fields we can use bitflags to pack the diffs into a single integer.

TODO: Link to corresponding issue


#![allow(unused_variables)]
fn main() {
#[derive(DiffPatch)]
struct Foo {
    fielda: bool,
    fieldb: bool,
    fieldc: Vec<u32>,
    fieldc: bool
}

// This is just a quick example delta. Needs more planning and design work.
enum FooDelta {
    NoChange,
    Changed_0(BitFlagsU8WithOneBitForEachBoolField),
    Changed_1(DiffForTheVecu32),
    Changed_0_1(BitFlagsU8WithOneBitForEachBoolField, DiffForTheVecu32)
}
}

Global Token Information

Say that you have the following data structures:


#![allow(unused_variables)]
fn main() {
#[derive(DiffPatch)]
struct Outer {
    fielda: bool,
    fieldb: bool,
    inner: Inner
}

#[derive(DiffPatch)]
struct Inner {
    fieldc: bool,
    fieldd: bool
    fielde: HashSet<i8>
}
}

If the derive macro invocations for Outer knew about the structure of Inner then we could generate code such that Outer's delta looked like this:

enum OuterDelta {
    NoChange,
    Change_0(Bitflag U8 that is used to represent all four booleans here),
    // ...
}

In the above example we are using a single u8 bitflag to encode the changes of all of the bools in both Outer and Inner.

We could do further than this as well. If Outer was given a #[dipa(max_fields_per_batch = 6)] attribute, the derive macro could generate an OuterDelta that was essentially a Delta6 with all of the combinations of the 6 fields.

There are just two quick examples. There should be other things that we can do when creating a type's delta if we know information about its nested types.

Essentially, knowing about other types that are using the macro would allow us to perform optimizations that we could not otherwise.

One way to make this happen would be to make use of the filesystem, but this would mean that whenever you changed your types you would need to build once before all of the right information was cached. No good.

A better way would be if rustc had support for allowing a procedural macro to execute twice. On the first invocation it would simply process all of the tokens and cache the information to env!(OUT_DIR) or something like that.

Then on the second invocation it could use this cached information in order to apply optimizations like those described above.

This would need more thought and design and an RFC, but it could be an interesting avenue to pursue.

Internal Design

TODO;

Chapters with information about how dipa's macro works.

This will be useful to

  • Contributors that want to make changes

  • Developers that want to understand how their delta compression works so that they can make informed decisions when optimizing their network traffic.