Brim Data

Easy Deserialization of Go Interface Values

Unmarshal Go interface values using super-structured data
Author Steven McCanne

Have you ever gotten frustrated deserializing JSON into a Go interface value? Turns out you’re not the only one!

If you know what I’m talking about, you can cut to the chase, but if you are a mortal being like most of us, and you find Go interfaces a challenge to marshal, please read on.

The Problem

In the Go world, serialization and deserialization are accomplished with the Marshal and Unmarshal design pattern. While Go’s json package does a wonderful job marshaling interface values into JSON, there is an odd asymmetry when it comes to unmarshaling the very same data back into the very same interface value.

Why is this?

Let’s look at a concrete example. We’ll follow the patterns used in Greg Trowbridge’s article on this topic, where he first creates a Plant type and an Animal type, which both implement a Thing interface:

type Thing interface {
	Color() string
}

type Plant struct {
	MyColor string
}

func (p *Plant) Color() string { return p.MyColor }

type Animal struct {
	MyColor string
}

func (a *Animal) Color() string { return a.MyColor }

With this pattern, let’s make a Plant and marshal it into JSON:

p := Plant{MyColor: "green"}
byteSlice, _ := json.Marshal(p)
fmt.Println(string(byteSlice))

This of course prints out

{"MyColor":"green"}

You can try out this example live in the Go Playground. Just hit the Run button.

Marshaling Interfaces

Okay, we successfully marshaled a Go struct, but what about an interface value? Fortunately, the marshaling logic here will work just fine for our Thing type. Suppose we get an interface value from somewhere like this:

func Make(which, color string) Thing {
	switch which {
	case "plant":
		return &Plant{color}
	case "animal":
		return &Animal{color}
	default:
		return nil
	}
}

And now, if we marshal a Thing, like so,

flamingo := Make("animal", "pink")
flamingoJSON, _ := json.Marshal(flamingo)
fmt.Println(string(flamingoJSON))

we’ll get the following output (try it):

{"MyColor":"pink"}

Perfect. json.Marshal followed the interface value to its implementation and output exactly what we wanted.

Now, let’s try to unmarshal the JSON back into an interface type, e.g., (try it):

var thing Thing
err := json.Unmarshal(flamingoJSON, &thing)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Println(thing.Color())
}

Oh no, we get an error that looks like this:

json: cannot unmarshal object into Go value of type main.Thing

Why can’t Go’s json package unmarshal this object? That encoding is exactly what the Marshal function produced when we marshaled the flamingo object in the first place.

What gives?

Trowbridge boils this down to a very simple observation: what if we looked at the two JSON serializations from Go’s perspective?

To do so, here is a snippet to serialize a flamingo and a rose (try it):

rose := Make("plant", "red")
roseJSON, _ := json.Marshal(rose)
fmt.Println(string(roseJSON))
flamingo := Make("animal", "pink")
flamingoJSON, _ := json.Marshal(flamingo)
fmt.Println(string(flamingoJSON))

And, we get this output:

{"MyColor":"red"}
{"MyColor":"pink"}

Now the problem is clear: the JSON output here is exactly the same for both the Plant and the Animal. How is Go supposed to figure out which is which?
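To make the ambiguity concrete, here is a small sketch using only the standard library. It redeclares the Plant and Animal structs from above; the helper names sameEncoding and plantFromAnimalJSON are ours, introduced purely for illustration. It shows that the two encodings are byte-for-byte identical and that an Animal's JSON decodes into a Plant without complaint.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Plant struct{ MyColor string }

type Animal struct{ MyColor string }

// sameEncoding reports whether a Plant and an Animal of the same color
// marshal to byte-for-byte identical JSON.
func sameEncoding(color string) (bool, error) {
	p, err := json.Marshal(Plant{MyColor: color})
	if err != nil {
		return false, err
	}
	a, err := json.Marshal(Animal{MyColor: color})
	if err != nil {
		return false, err
	}
	return string(p) == string(a), nil
}

// plantFromAnimalJSON marshals an Animal and decodes the result as a Plant.
// Nothing in the JSON prevents this, which is exactly the problem.
func plantFromAnimalJSON(color string) (Plant, error) {
	var p Plant
	b, err := json.Marshal(Animal{MyColor: color})
	if err != nil {
		return p, err
	}
	err = json.Unmarshal(b, &p)
	return p, err
}

func main() {
	same, _ := sameEncoding("pink")
	fmt.Println(same) // prints: true

	p, err := plantFromAnimalJSON("pink")
	fmt.Println(p.MyColor, err) // prints: pink <nil>
}
```

Since a pink Animal decodes happily into a pink Plant, the decoder has nothing to go on when the target is just a Thing.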

The fundamental issue here is that neither the plant-ness of the rose nor the animal-ness of the flamingo made it into the JSON output. Ah, you say, the solution is just a small matter of programming: add a plant/animal type field to the JSON output and you’re golden.

In fact, Go’s json package makes this approach quite feasible with its custom Unmarshaler interface. Trowbridge walks you through how to do this, and after a number of non-obvious steps (especially if you’re new to Go) and a hundred or so lines of code, he declares victory at the end of the article: “YOU MADE IT!”
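For a feel of what the "add a type field" approach involves, here is a condensed sketch. It is not Trowbridge's code, just one common shape of the idea: a wrapper struct (which we call envelope here, a hypothetical name, as are the "plant" and "animal" tags) carries a type tag next to the payload, and json.RawMessage defers decoding of the payload until the tag has been inspected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Thing interface{ Color() string }

type Plant struct{ MyColor string }

func (p *Plant) Color() string { return p.MyColor }

type Animal struct{ MyColor string }

func (a *Animal) Color() string { return a.MyColor }

// envelope is a hypothetical wrapper that records which concrete type
// produced the payload.
type envelope struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

// marshalThing wraps the JSON for t in an envelope tagged with its type.
func marshalThing(t Thing) ([]byte, error) {
	var name string
	switch t.(type) {
	case *Plant:
		name = "plant"
	case *Animal:
		name = "animal"
	default:
		return nil, fmt.Errorf("unsupported Thing %T", t)
	}
	value, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	return json.Marshal(envelope{Type: name, Value: value})
}

// unmarshalThing reads the tag first, allocates the matching concrete
// type, and only then decodes the deferred payload into it.
func unmarshalThing(b []byte) (Thing, error) {
	var env envelope
	if err := json.Unmarshal(b, &env); err != nil {
		return nil, err
	}
	var t Thing
	switch env.Type {
	case "plant":
		t = &Plant{}
	case "animal":
		t = &Animal{}
	default:
		return nil, fmt.Errorf("unknown type tag %q", env.Type)
	}
	if err := json.Unmarshal(env.Value, t); err != nil {
		return nil, err
	}
	return t, nil
}

func main() {
	b, err := marshalThing(&Animal{MyColor: "pink"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b)) // prints: {"type":"animal","value":{"MyColor":"pink"}}

	t, err := unmarshalThing(b)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%T %s\n", t, t.Color()) // prints: *main.Animal pink
}
```

Even in this stripped-down form, every new implementation of Thing means touching both switch statements, which is the kind of glue code the rest of this article tries to avoid.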

Is this the best we’ve got? Surely there’s got to be a better way.

Enter ZSON

What if there were a data format like JSON but it could reflect the Go types into its serialized representation so the plant-ness and animal-ness from our example above could be handled automatically?

It turns out there is a new kind of data called super-structured data that can carry the information needed to solve our problem here.

We won’t go into all the gory details of super-structured data but suffice it to say it provides a comprehensive type system that can reliably represent any serializable Go type and includes type definitions and first-class type values so it can carry the type names of Go values into its serialized form.

To explore this concept, we’ll use the ZSON form of super-structured data. ZSON is a superset of JSON so it will look familiar, but it carries the full power of the super-structured data model.

Armed with ZSON, we can serialize the flamingo and rose with the super-structured type information (try it):

rose := Make("plant", "red")
flamingo := Make("animal", "pink")
m := zson.NewMarshaler()
m.Decorate(zson.StyleSimple)
roseZSON, _ := m.Marshal(rose)
fmt.Println(roseZSON)
flamingoZSON, _ := m.Marshal(flamingo)
fmt.Println(flamingoZSON)

And, we get this output:

{MyColor:"red"}(=Plant)
{MyColor:"pink"}(=Animal)

As you can see, the plant-ness and animal-ness of the Thing is noted in the ZSON output!

The parenthesized strings at the end of each line are called type decorators. ZSON has a fully-fledged type system and these decorators may be embedded throughout complex and highly nested ZSON values to provide precise type semantics.

Mind you, these type names look like Go-specific type names but there is nothing language-specific in the ZSON type name. It can be any string, but it just so happens the ZSON marshaler chooses type names to match the Go types being serialized.

Given the type information in the ZSON output, we should be able to unmarshal the ZSON back into an interface value, right? There’s one little twist. Because Go doesn’t have a way to convert the name of a type to a value of that type, you need to help out the ZSON unmarshaler by passing example values of the types that might be referenced in the ZSON to the unmarshaler’s Bind method. Here’s how this works (try it):

u := zson.NewUnmarshaler()
u.Bind(Animal{}, Plant{})
var thing Thing
if err := u.Unmarshal(flamingoZSON, &thing); err != nil {
	fmt.Println(err)
} else {
	fmt.Println("The flamingo is " + thing.Color())
}
if err := u.Unmarshal(roseZSON, &thing); err != nil {
	fmt.Println(err)
} else {
	fmt.Println("The rose is " + thing.Color())
}

If you run this, you will see the serialized ZSON values are successfully unmarshaled into the interface variable with the correct underlying concrete types. The output here is:

The flamingo is pink
The rose is red

Just for good measure, you can check that the concrete type is in fact correct. For instance, right after unmarshaling flamingoZSON into thing (try it):

_, ok := thing.(*Animal)
fmt.Printf("Is the thing an Animal? %t\n", ok)

and the output is

Is the thing an Animal? true

In a nutshell, unmarshaling ZSON into an interface value just works! There’s no need for custom unmarshal methods on every underlying concrete type and no need for lots of glue code with custom maps and copious use of json.RawMessage.
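To see why the unmarshaler needs example values at all, consider the following sketch of a name-to-type registry built with Go's reflect package. This is only an illustration of the general technique, not the zson package's implementation; the registry type and its bind and newThing methods are hypothetical names of ours. Go has no built-in lookup from the string "Animal" to the Animal type, so the mapping has to be built at run time from values the program supplies.

```go
package main

import (
	"fmt"
	"reflect"
)

type Thing interface{ Color() string }

type Plant struct{ MyColor string }

func (p *Plant) Color() string { return p.MyColor }

type Animal struct{ MyColor string }

func (a *Animal) Color() string { return a.MyColor }

// registry maps a type name to the Go type it stands for.
type registry map[string]reflect.Type

// bind records the type of each template value under its simple name.
func (r registry) bind(templates ...interface{}) {
	for _, tmpl := range templates {
		typ := reflect.TypeOf(tmpl)
		r[typ.Name()] = typ
	}
}

// newThing allocates a fresh zero value of the named type and returns a
// pointer to it as a Thing.
func (r registry) newThing(name string) (Thing, error) {
	typ, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("no binding for type %q", name)
	}
	thing, ok := reflect.New(typ).Interface().(Thing)
	if !ok {
		return nil, fmt.Errorf("*%s does not implement Thing", name)
	}
	return thing, nil
}

func main() {
	r := registry{}
	r.bind(Animal{}, Plant{})

	t, err := r.newThing("Animal")
	fmt.Printf("%T %v\n", t, err) // prints: *main.Animal <nil>

	_, err = r.newThing("Mineral")
	fmt.Println(err) // prints: no binding for type "Mineral"
}
```

A decoder that finds a type name in its input can consult a table like this to pick the concrete type before filling in the fields.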

Custom Type Names

You probably noticed in these examples that the ZSON marshaling logic used the exact same type names as the Go program. This can create name conflicts since the same type name may appear in different Go packages (e.g., io.Writer versus bufio.Writer).

To cope with this, the ZSON marshaler lets you specify more detailed types by providing a zson.TypeStyle to the marshaler’s Decorate method. You can use package names with zson.StylePackage, e.g., by changing

m.Decorate(zson.StyleSimple)

in our example to

m.Decorate(zson.StylePackage)

Running this variation of the code produces the following output (try it):

{MyColor:"red"}(=main.Plant)
{MyColor:"pink"}(=main.Animal)

Here, the Plant and Animal types are defined in the main package so each ZSON type is prefixed with main.
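The difference between a simple name and a package-qualified name is easy to see with Go's own reflect package. The sketch below is only an analogy for the naming styles and makes no claim about how the ZSON marshaler computes its names; the helpers writerTypes and names are ours. It shows that io.Writer and bufio.Writer share the simple name Writer and only become distinguishable once the package is included.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"reflect"
)

// writerTypes returns the reflect.Type of io.Writer (an interface) and
// of bufio.Writer (a struct).
func writerTypes() (reflect.Type, reflect.Type) {
	ioWriter := reflect.TypeOf((*io.Writer)(nil)).Elem()
	bufioWriter := reflect.TypeOf(bufio.Writer{})
	return ioWriter, bufioWriter
}

// names returns the simple name, the package-qualified name, and the
// import path of a type.
func names(t reflect.Type) (simple, qualified, pkgPath string) {
	return t.Name(), t.String(), t.PkgPath()
}

func main() {
	ioWriter, bufioWriter := writerTypes()
	fmt.Println(names(ioWriter))    // prints: Writer io.Writer io
	fmt.Println(names(bufioWriter)) // prints: Writer bufio.Writer bufio
}
```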

Type names can also be extended to include the absolute import path using zson.StyleFull and even include version numbers in the type path to provide a mechanism for versioning the “schema” of these serialized messages.

The NamedBindings method on the marshaler establishes a binding between the chosen ZSON type name and the Go data type, so we can add an option like this to our marshaling logic:

m.NamedBindings([]zson.Binding{{"CustomPlant.v0", Plant{}}, {"CustomAnimal.v0", Animal{}}})

Running this variation gives the following output (try it):

{MyColor:"red"}(=CustomPlant.v0)
{MyColor:"pink"}(=CustomAnimal.v0)

Versioned names like these are useful over time. For example, suppose you enhanced the Animal and Plant implementations so the various instances of Things evolve. You could imagine unmarshaling multiple versions of the Thing, with different ZSON version numbers, into different concrete types all behind a single Go interface value.

This is obviously a form of schema versioning, but here there’s no need to define explicit schemas as the schemas are simply implied by the Go types. Easier, don’t you think?
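As a rough illustration of the versioning idea, the sketch below dispatches on a versioned type name to pick between two generations of a struct, both hidden behind the same Thing interface. Everything here is hypothetical: the AnimalV0 and AnimalV1 types, the constructors table, and the decode helper are ours, and the bodies are plain JSON with the type name passed separately, whereas ZSON carries the name inside the value itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Thing interface{ Color() string }

// AnimalV0 and AnimalV1 are hypothetical successive versions of a type.
type AnimalV0 struct{ MyColor string }

func (a *AnimalV0) Color() string { return a.MyColor }

type AnimalV1 struct {
	Primary   string
	Secondary string
}

func (a *AnimalV1) Color() string { return a.Primary + "/" + a.Secondary }

// constructors maps each versioned type name to a function that
// allocates the matching concrete type.
var constructors = map[string]func() Thing{
	"CustomAnimal.v0": func() Thing { return &AnimalV0{} },
	"CustomAnimal.v1": func() Thing { return &AnimalV1{} },
}

// decode picks the concrete type from the versioned name and then fills
// it in from the serialized body.
func decode(typeName string, body []byte) (Thing, error) {
	newThing, ok := constructors[typeName]
	if !ok {
		return nil, fmt.Errorf("unknown type name %q", typeName)
	}
	t := newThing()
	if err := json.Unmarshal(body, t); err != nil {
		return nil, err
	}
	return t, nil
}

func main() {
	old, err := decode("CustomAnimal.v0", []byte(`{"MyColor":"pink"}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	cur, err := decode("CustomAnimal.v1", []byte(`{"Primary":"pink","Secondary":"black"}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(old.Color(), cur.Color()) // prints: pink pink/black
}
```

Callers only ever see a Thing, so old and new messages can flow through the same code path.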

Higher Fidelity Types

Another big advantage of ZSON over JSON is the high fidelity provided by its super-structured type system. For example, marshaling the following Go value to JSON:

type TraceRecord struct {
	Name string
	Host net.IP
	Hops uint8
}

func main() {
	b, _ := json.Marshal([]TraceRecord{
		{"google.com", net.ParseIP("142.250.72.142"), 8},
		{"yahoo.com", net.ParseIP("74.6.231.20"), 13},
		{"facebook.com", net.ParseIP("31.13.70.36"), 8},
	})
	fmt.Println(string(b))
}

produces this output, shown here pretty-printed (try it):

[
  {
    "Name": "google.com",
    "Host": "142.250.72.142",
    "Hops": 8
  },
  {
    "Name": "yahoo.com",
    "Host": "74.6.231.20",
    "Hops": 13
  },
  {
    "Name": "facebook.com",
    "Host": "31.13.70.36",
    "Hops": 8
  }
]

While the Name field is preserved as a string, the Host field is changed from an IP address to a string and the Hops field is changed from an unsigned 8-bit integer to a generic JSON number, which most decoders treat as a 64-bit floating point value. This loss of type fidelity is fundamental to the simplicity of JSON. Most of us are all too aware of this challenge, which often leads to custom serialization code to format non-standard types into strings so they can fit into JSON.
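You can observe the loss directly by decoding the JSON without telling Go what types to expect. The following sketch, using only the standard library, round-trips one TraceRecord through JSON into a generic map and reports the Go type each field comes back as; the helpers sample and roundTripTypes are ours.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

type TraceRecord struct {
	Name string
	Host net.IP
	Hops uint8
}

// sample returns one of the records from the example above.
func sample() TraceRecord {
	return TraceRecord{"google.com", net.ParseIP("142.250.72.142"), 8}
}

// roundTripTypes marshals rec to JSON, decodes it into a generic map,
// and returns the Go type of each decoded field as a string.
func roundTripTypes(rec TraceRecord) (map[string]string, error) {
	b, err := json.Marshal(rec)
	if err != nil {
		return nil, err
	}
	var generic map[string]interface{}
	if err := json.Unmarshal(b, &generic); err != nil {
		return nil, err
	}
	types := make(map[string]string, len(generic))
	for field, value := range generic {
		types[field] = fmt.Sprintf("%T", value)
	}
	return types, nil
}

func main() {
	types, err := roundTripTypes(sample())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(types["Name"], types["Host"], types["Hops"]) // prints: string string float64
}
```

Without the original struct definition on hand, the reader of this JSON has no way to know that Host was an IP address or that Hops fit in eight bits.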

Using ZSON, however, we can preserve all of the type information from the original Go data structure in the serialized output without doing anything special. For example, we can change the marshaling logic from above as follows:

func main() {
	m := zson.NewMarshaler()
	m.Decorate(zson.StyleSimple)
	b, _ := m.Marshal([]TraceRecord{
		{"google.com", net.ParseIP("142.250.72.142"), 8},
		{"yahoo.com", net.ParseIP("74.6.231.20"), 13},
		{"facebook.com", net.ParseIP("31.13.70.36"), 8},
	})
	fmt.Println(string(b))
}

and we get the following fully-typed ZSON output (try it):

[
    {
        Name: "google.com",
        Host: 142.250.72.142,
        Hops: 8 (uint8)
    } (=TraceRecord),
    {
        Name: "yahoo.com",
        Host: 74.6.231.20,
        Hops: 13
    } (TraceRecord),
    {
        Name: "facebook.com",
        Host: 31.13.70.36,
        Hops: 8
    } (TraceRecord)
]

Here, you can see types are preserved: Host is a native ZSON IP address and the Hops field is a uint8.

This is nice because if you have marshaled data like this hanging around, say in a file events.zson, all of the original application type information is preserved and we can use tooling like zq to interrogate it in interesting ways. For example, this query collects the set of unique IP addresses for each Hops value:

$ zq 'over this | IPs:=union(Host) by Hops' events.zson
{
    Hops: 8 (uint8),
    IPs: |[
        31.13.70.36,
        142.250.72.142
    ]|
}
{
    Hops: 13 (uint8),
    IPs: |[
        74.6.231.20
    ]|
}

(Note that the syntax |[...]| indicates a Zed set type as Zed’s union aggregate function produces a set as output.)

The Zed Project

So where did all this stuff come from?

Turns out we’ve been pretty busy the past few years working away on the open-source Zed Project with the goal of creating an easier way to manage, query, search, and transform data.

The Zed data model at the heart of the Zed system keeps surprising us in serendipitous ways. We didn’t set out to create a simpler way to serialize Go interfaces as described here; rather, we realized along the way that Zed had some powerful properties for doing so.

In fact, the Zed system itself uses these marshaling techniques for serializing its internal data structures throughout the Zed lake format.

For example, the Zed lake’s Git-like commit objects contain data actions that are serialized using the marshaling techniques described above. A key difference here though is that the Zed lake uses the ZNG format for storing Zed data rather than the text-based ZSON format. Because ZNG is binary and compact, it is far more efficient than ZSON.

These actions in a Zed lake can easily be queried using a metadata query on a lake. As an example, you can create a lake using the zed CLI command, load some dummy data into it, and then query the commit history with a meta-query as follows:

export ZED_LAKE=./test
zed init
zed create POOL
zed use POOL
echo '{a:[1,2,3]}' | zed load -
echo '{b:4}' | zed load -
echo '{c:[3,4,5]}' | zed load -

Now if you run the zed log command, you can see the Git-like commit history:

$ zed log
commit 2Brb40yTAu3jLdigRf5CKFGN2ee (HEAD -> main)
Author: mccanne@bay.lan
Date:   2022-07-12T23:02:35Z

    loaded 1 data object

    2Brb3yO9eJxXISWJljNdLWphhAP 1 record in 21 data bytes

commit 2Brb3uKM4WrpoQWVIMNkijWxQMS
Author: mccanne@bay.lan
Date:   2022-07-12T23:02:34Z

    loaded 1 data object

    2Brb3owQirVdkqqtTmlqJBUcM1B 1 record in 14 data bytes

commit 2Brb3rI4gSZ2jfnmCY0I4sEBCLl
Author: mccanne@bay.lan
Date:   2022-07-12T23:02:34Z

    loaded 1 data object

    2Brb3qZB66zPUwM02DnGwWQSp2W 1 record in 21 data bytes

But here’s where it gets interesting: under the hood, the zed log command simply runs a metadata query on the lake and the query code just marshals the Zed data in the lake’s log objects using the techniques described above.

In other words, the Zed system uses Zed marshaling inside of itself.

If you run this meta-query on data we loaded and ask the zed command to pretty-print the output as ZSON using -Z, you’ll get output that looks like this:

$ zed query -Z "from POOL@main:rawlog"
{
    id: 0x0f5baf8a800e9874e86f1e6913a4af90bb48b03d (=ksuid.KSUID),
    parent: 0x0000000000000000000000000000000000000000 (ksuid.KSUID),
    retries: 0 (uint8),
    author: "mccanne@bay.lan",
    date: 2022-07-12T23:02:34.780087Z,
    message: "loaded 1 data object\n\n  2Brb3qZB66zPUwM02DnGwWQSp2W 1 record in 21 data bytes\n",
    meta: null
} (=commits.Commit)
{
    commit: 0x0f5baf8a800e9874e86f1e6913a4af90bb48b03d (=ksuid.KSUID),
    object: {
        id: 0x0f5baf8a68439d3b094fa2dfb2f99989272424a8 (ksuid.KSUID),
        meta: {
            first: null,
            last: null,
            count: 1 (uint64),
            size: 21
        } (=data.Meta)
    } (=data.Object)
} (=commits.Add)
{
    id: 0x0f5baf8ae3d60b1010e722d80bfea31956c0ea60 (=ksuid.KSUID),
    parent: 0x0f5baf8a800e9874e86f1e6913a4af90bb48b03d (ksuid.KSUID),
    retries: 0 (uint8),
    author: "mccanne@bay.lan",
    date: 2022-07-12T23:02:34.804476Z,
    message: "loaded 1 data object\n\n  2Brb3owQirVdkqqtTmlqJBUcM1B 1 record in 14 data bytes\n",
    meta: null
} (=commits.Commit)
{
    commit: 0x0f5baf8ae3d60b1010e722d80bfea31956c0ea60 (=ksuid.KSUID),
    object: {
        id: 0x0f5baf8a32df6f8e4c0a8ac10eddb3d7f75acdb1 (ksuid.KSUID),
        meta: {
            first: null,
            last: null,
            count: 1 (uint64),
            size: 14
        } (=data.Meta)
    } (=data.Object)
} (=commits.Add)
{
    id: 0x0f5baf8bbe3c6578b076c98d037f4657fc5fa548 (=ksuid.KSUID),
    parent: 0x0f5baf8ae3d60b1010e722d80bfea31956c0ea60 (ksuid.KSUID),
    retries: 0 (uint8),
    author: "mccanne@bay.lan",
    date: 2022-07-12T23:02:35.299159Z,
    message: "loaded 1 data object\n\n  2Brb3yO9eJxXISWJljNdLWphhAP 1 record in 21 data bytes\n",
    meta: null
} (=commits.Commit)
{
    commit: 0x0f5baf8bbe3c6578b076c98d037f4657fc5fa548 (=ksuid.KSUID),
    object: {
        id: 0x0f5baf8b6946f2c8014ce409bf8e01cd3ce201c9 (ksuid.KSUID),
        meta: {
            first: null,
            last: null,
            count: 1 (uint64),
            size: 21
        } (=data.Meta)
    } (=data.Object)
} (=commits.Add)

And like the events.zson example above, we can easily manipulate this output as it’s just self-describing Zed data. You can apply whatever data manipulation queries you’d like to the resulting output using our Zed language, e.g., computing the average data object size:

$ zed query 'from POOL@main:rawlog | avg(object.meta.size)'
{avg:18.666666666666668}

Or you can select out all commit dates:

$ zed query 'from POOL@main:rawlog | has(date) | yield date'
2022-07-13T02:08:34.521281Z
2022-07-13T02:08:34.54489Z
2022-07-13T02:08:35.254674Z

Or you can even collect these dates into an array, output it as a JSON array of strings, and pretty-print the result using jq:

$ zed query -f json 'from POOL@main:rawlog | has(date) | dates:=collect(date) | yield dates' | jq
[
  "2022-07-12T23:02:34.780087Z",
  "2022-07-12T23:02:34.804476Z",
  "2022-07-12T23:02:35.299159Z"
]

Wrapping Up

So you may wonder, is all of this worth it? Why bother figuring out how to use this obscure new data format just to unmarshal JSON into Go interface values?

Of course, you can just keep doing things the way things have always been done. JSON is 20 years old and relational tables are 50 years old and going between these worlds is a big pain in the neck, especially when you want an analytics system or data warehouse to analyze the JSON events and logs that your app infrastructure produces.

We think Zed could very well turn out to be a lot easier. Could marshaling Go structs straight to Zed, then streaming batches of self-describing, marshaled values straight into a Zed lake, really be that dead simple?

That all said, our project is still early. We don’t yet have broad cross-language support. We’re working on a vector engine for high-performance queries at scale. We’re actively improving our search design.

These improvements and more will show up in our GitHub repos in the coming months.

But rest assured, many of the Zed system pieces are production quality and useful. Feel free to kick the tires.

We love working with all our users to help guide us to the best ways of solving your real, everyday problems. Give us a holler and we look forward to chatting.