Have you ever gotten frustrated deserializing JSON into a Go interface value? Turns out you’re not the only one!
If you know what I’m talking about, you can cut to the chase, but if you are a mortal being like most of us, and you find Go interfaces a challenge to marshal, please read on.
The Problem
In the Go world, serialization and deserialization are accomplished with the Marshal and Unmarshal design pattern.
While Go’s json package does a wonderful job marshaling interface values into JSON, there is an odd asymmetry when it comes to unmarshaling the very same data back into the very same interface value.
Why is this?
Let’s look at a concrete example. We’ll follow the patterns used in Greg Trowbridge’s article on this topic, where he first creates a Plant type and an Animal type, which both implement a Thing interface:
type Thing interface {
	Color() string
}
type Plant struct {
	MyColor string
}
func (p *Plant) Color() string { return p.MyColor }
type Animal struct {
	MyColor string
}
func (a *Animal) Color() string { return a.MyColor }
With this pattern, let’s make a Plant and marshal it into JSON:
p := Plant{MyColor: "green"}
byteSlice, _ := json.Marshal(p)
fmt.Println(string(byteSlice))
This of course prints out
{"MyColor":"green"}
You can try out this example live in the Go Playground. Just hit the Run button.
Marshaling Interfaces
Okay, we successfully marshaled a Go struct, but what about an interface value?
Fortunately, the marshaling logic here will work just fine for our Thing type.
Suppose we get an interface value from somewhere like this:
func Make(which, color string) Thing {
	switch which {
	case "plant":
		return &Plant{color}
	case "animal":
		return &Animal{color}
	default:
		return nil
	}
}
And now, if we marshal a Thing, like so,
flamingo := Make("animal", "pink")
flamingoJSON, _ := json.Marshal(flamingo)
fmt.Println(string(flamingoJSON))
we’ll get the following output (try it):
{"MyColor":"pink"}
Perfect. json.Marshal followed the interface value to its implementation and output exactly what we wanted.
Now, let’s try to unmarshal the JSON back into an interface type, e.g., (try it):
var thing Thing
err := json.Unmarshal(flamingoJSON, &thing)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Println(thing.Color())
}
Oh no, we get an error that looks like this:
json: cannot unmarshal object into Go value of type main.Thing
Why can’t Go’s json package unmarshal this object? That encoding is exactly what the Marshal function produced when we marshaled the flamingo object in the first place.
What gives?
Trowbridge boils this down to a very simple observation: what if we looked at the two JSON serializations from Go’s perspective?
To do so, here is a snippet to serialize a flamingo and a rose (try it):
rose := Make("plant", "red")
roseJSON, _ := json.Marshal(rose)
fmt.Println(string(roseJSON))
flamingo := Make("animal", "pink")
flamingoJSON, _ := json.Marshal(flamingo)
fmt.Println(string(flamingoJSON))
And, we get this output:
{"MyColor":"red"}
{"MyColor":"pink"}
Now the problem is clear: the JSON output here has exactly the same shape for both the Plant and the Animal. How is Go supposed to figure out which is which?
The fundamental issue here is that neither the plant-ness of the rose nor the animal-ness of the flamingo made it into the JSON output. Ah, you say, the solution is just a small matter of programming: add a plant/animal type field to the JSON output and you’re golden.
In fact, Go’s json package makes this approach all quite feasible with its custom Unmarshaler interface. Trowbridge walks you through how to do this, and after a number of non-obvious steps (especially if you’re new to Go) and a hundred or so lines of code, he declares victory at the end of the article: “YOU MADE IT!”
Is this the best we’ve got? Surely there’s got to be a better way.
Enter ZSON
What if there were a data format like JSON but it could reflect the Go types into its serialized representation so the plant-ness and animal-ness from our example above could be handled automatically?
It turns out there is a new kind of data called super-structured data that can carry the information needed to solve our problem here.
We won’t go into all the gory details of super-structured data but suffice it to say it provides a comprehensive type system that can reliably represent any serializable Go type and includes type definitions and first-class type values so it can carry the type names of Go values into its serialized form.
To explore this concept, we’ll use the ZSON form of super-structured data. ZSON is a superset of JSON so it will look familiar, but it carries the full power of the super-structured data model.
Armed with ZSON, we can serialize the flamingo and rose with the super-structured type information (try it):
rose := Make("plant", "red")
flamingo := Make("animal", "pink")
m := zson.NewMarshaler()
m.Decorate(zson.StyleSimple)
roseZSON, _ := m.Marshal(rose)
fmt.Println(roseZSON)
flamingoZSON, _ := m.Marshal(flamingo)
fmt.Println(flamingoZSON)
And, we get this output:
{MyColor:"red"}(=Plant)
{MyColor:"pink"}(=Animal)
As you can see, the plant-ness and animal-ness of each Thing are noted in the ZSON output!
The parenthesized strings at the end of each line are called type decorators. ZSON has a fully-fledged type system and these decorators may be embedded throughout complex and highly nested ZSON values to provide precise type semantics.
Mind you, these type names look like Go-specific type names but there is nothing language-specific in the ZSON type name. It can be any string, but it just so happens the ZSON marshaler chooses type names to match the Go types being serialized.
Given the type information in the ZSON output, we should be able to unmarshal the ZSON back into an interface value, right? There’s one little twist.
Because Go doesn’t have a way to convert the name of a type to a value of that type, you need to help out the ZSON unmarshaler by giving it a list of example values whose types might be referenced in the ZSON, using the Bind method on the unmarshaler. Here’s how this works (try it):
u := zson.NewUnmarshaler()
u.Bind(Animal{}, Plant{})
var thing Thing
if err := u.Unmarshal(flamingoZSON, &thing); err != nil {
	fmt.Println(err)
} else {
	fmt.Println("The flamingo is " + thing.Color())
}
if err := u.Unmarshal(roseZSON, &thing); err != nil {
	fmt.Println(err)
} else {
	fmt.Println("The rose is " + thing.Color())
}
If you run this, you will see the serialized ZSON values are successfully unmarshaled into the interface variable with the correct underlying concrete types. The output here is:
The flamingo is pink
The rose is red
Just for good measure, you can check that the concrete type behind the interface value is in fact correct. For example, immediately after unmarshaling the flamingo (try it):
_, ok := thing.(*Animal)
fmt.Printf("Is the thing an Animal? %t\n", ok)
and the output is
Is the thing an Animal? true
In a nutshell, unmarshaling ZSON into an interface value just works!
There’s no need for custom unmarshal methods on every underlying concrete type and no need for lots of glue code with custom maps and copious use of json.RawMessage.
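Why does the unmarshaler need example values at all? The sketch below is one way to picture it. It is not how the zson package is implemented: the registry type and its bind and newThing methods are hypothetical names invented for this illustration. The point is that Go cannot look up a type from a string, so something has to record a name-to-reflect.Type mapping ahead of time:

```go
package main

import (
	"fmt"
	"reflect"
)

type Thing interface {
	Color() string
}

type Plant struct{ MyColor string }

func (p *Plant) Color() string { return p.MyColor }

type Animal struct{ MyColor string }

func (a *Animal) Color() string { return a.MyColor }

// registry is a hypothetical stand-in for what a Bind-style call
// has to remember: a mapping from type name to Go type.
type registry map[string]reflect.Type

// bind records the type of each example value under its simple name.
func (r registry) bind(examples ...interface{}) {
	for _, e := range examples {
		t := reflect.TypeOf(e)
		r[t.Name()] = t
	}
}

// newThing allocates a value of the named type, fills in its
// MyColor field, and returns it behind the Thing interface.
func (r registry) newThing(name, color string) (Thing, error) {
	t, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("no binding for type %q", name)
	}
	ptr := reflect.New(t) // a *Plant or *Animal
	field := ptr.Elem().FieldByName("MyColor")
	if !field.IsValid() || field.Kind() != reflect.String {
		return nil, fmt.Errorf("type %q has no string field MyColor", name)
	}
	field.SetString(color)
	thing, ok := ptr.Interface().(Thing)
	if !ok {
		return nil, fmt.Errorf("%s does not implement Thing", ptr.Type())
	}
	return thing, nil
}

func main() {
	r := registry{}
	r.bind(Animal{}, Plant{})

	thing, err := r.newThing("Animal", "pink")
	if err != nil {
		panic(err)
	}
	_, isAnimal := thing.(*Animal)
	fmt.Println(thing.Color(), isAnimal)

	_, err = r.newThing("Rock", "gray")
	fmt.Println(err)
}
```

Without the bind step, the lookup for a name that appears in the serialized data has nothing to resolve against, which is the situation the Bind method exists to prevent.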
Custom Type Names
You probably noticed in these examples that the ZSON marshaling logic used the exact same type names as the Go program. This can create name conflicts since the same type name may appear in different Go packages (e.g., io.Writer versus bufio.Writer).
To cope with this, the ZSON marshaler lets you specify more detailed types by providing a zson.TypeStyle to the marshaler’s Decorate method. You can use package names with zson.StylePackage, e.g., by changing
m.Decorate(zson.StyleSimple)
in our example to
m.Decorate(zson.StylePackage)
Running this variation of the code produces the following output (try it):
{MyColor:"red"}(=main.Plant)
{MyColor:"pink"}(=main.Animal)
Here, the Plant and Animal types are defined in the main package so each ZSON type is prefixed with main.
Type names can also be extended to include the absolute import path using zson.StyleFull and even include version numbers in the type path to provide a mechanism for versioning the “schema” of these serialized messages.
The NamedBindings method on the marshaler establishes a binding between the chosen ZSON type name and the Go data type, so we can add an option like this to our marshaling logic:
m.NamedBindings([]zson.Binding{{"CustomPlant.v0", Plant{}}, {"CustomAnimal.v0", Animal{}}})
Running this variation gives the following output (try it):
{MyColor:"red"}(=CustomPlant.v0)
{MyColor:"pink"}(=CustomAnimal.v0)
For example, suppose you enhanced the Animal and Plant implementations over time so that the various kinds of Things evolve.
You could imagine unmarshaling multiple versions of a Thing, each with a different ZSON version number, into different concrete types, all behind a single Go interface value.
This is obviously a form of schema versioning, but here there’s no need to define explicit schemas as the schemas are simply implied by the Go types. Easier, don’t you think?
Higher Fidelity Types
Another big advantage of ZSON over JSON is the high fidelity provided by its super-structured type system. For example, marshaling the following Go value to JSON:
type TraceRecord struct {
	Name string
	Host net.IP
	Hops uint8
}
func main() {
	b, _ := json.Marshal([]TraceRecord{
		{"google.com", net.ParseIP("142.250.72.142"), 8},
		{"yahoo.com", net.ParseIP("74.6.231.20"), 13},
		{"facebook.com", net.ParseIP("31.13.70.36"), 8},
	})
	fmt.Println(string(b))
}
produces this output (try it):
[
  {
    "Name": "google.com",
    "Host": "142.250.72.142",
    "Hops": 8
  },
  {
    "Name": "yahoo.com",
    "Host": "74.6.231.20",
    "Hops": 13
  },
  {
    "Name": "facebook.com",
    "Host": "31.13.70.36",
    "Hops": 8
  }
]
While the Name field is preserved as a string, the Host field is changed from an IP address to a string and the Hops field is changed from an unsigned 8-bit integer to a generic JSON number, which most decoders treat as a 64-bit float. This loss of type fidelity is fundamental to the simplicity of JSON. Most of us are all too aware of this challenge, which often leads to custom serialization code to format non-standard types into strings so they can fit into JSON.
Using ZSON, however, we can preserve all of the type information from the original Go data structure in the serialized output without doing anything special. For example, we can change the marshaling logic from above as follows:
func main() {
	m := zson.NewMarshaler()
	m.Decorate(zson.StyleSimple)
	b, _ := m.Marshal([]TraceRecord{
		{"google.com", net.ParseIP("142.250.72.142"), 8},
		{"yahoo.com", net.ParseIP("74.6.231.20"), 13},
		{"facebook.com", net.ParseIP("31.13.70.36"), 8},
	})
	fmt.Println(string(b))
}
and we get the following fully-typed ZSON output (try it):
[
    {
        Name: "google.com",
        Host: 142.250.72.142,
        Hops: 8 (uint8)
    } (=TraceRecord),
    {
        Name: "yahoo.com",
        Host: 74.6.231.20,
        Hops: 13
    } (TraceRecord),
    {
        Name: "facebook.com",
        Host: 31.13.70.36,
        Hops: 8
    } (TraceRecord)
]
Here, you can see types are preserved: Host is a native ZSON IP address and the Hops field is a uint8.
This is nice because if you have marshaled data like this hanging around, say in a file events.zson, all of the original application type information is preserved and we can use tooling like zq to interrogate it in interesting ways. For example, this query groups the unique IP addresses by Hops value:
$ zq 'over this | IPs:=union(Host) by Hops' events.zson
{
    Hops: 8 (uint8),
    IPs: |[
        31.13.70.36,
        142.250.72.142
    ]|
}
{
    Hops: 13 (uint8),
    IPs: |[
        74.6.231.20
    ]|
}
(Note that the syntax |[...]| indicates a Zed set type, as Zed’s union aggregate function produces a set as output.)
The Zed Project
So where did all this stuff come from?
Turns out we’ve been pretty busy the past few years working away on the open-source Zed Project with the goal of creating an easier way to manage, query, search, and transform data.
The Zed data model at the heart of the Zed system keeps surprising us in serendipitous ways. We didn’t set out to create a simpler way to serialize Go interfaces as described here; rather, we realized along the way that Zed had some powerful properties for doing so.
In fact, the Zed system itself uses these marshaling techniques for serializing its internal data structures throughout the Zed lake format.
For example, the Zed lake’s Git-like commit objects contain data actions that are serialized using the marshaling techniques described above. A key difference here though is that the Zed lake uses the ZNG format for storing Zed data rather than the text-based ZSON format. Because ZNG is binary and compact, it is far more efficient than ZSON.
These actions in a Zed lake can easily be queried using a metadata query on a lake. As an example, you can create a lake using the zed CLI command, load some dummy data into it, and then query the commit history with a meta-query as follows:
export ZED_LAKE=./test
zed init
zed create POOL
zed use POOL
echo '{a:[1,2,3]}' | zed load -
echo '{b:4}' | zed load -
echo '{c:[3,4,5]}' | zed load -
Now if you run the zed log command, you can see the Git-like commit history:
$ zed log
commit 2Brb40yTAu3jLdigRf5CKFGN2ee (HEAD -> main)
Author: mccanne@bay.lan
Date: 2022-07-12T23:02:35Z
loaded 1 data object
2Brb3yO9eJxXISWJljNdLWphhAP 1 record in 21 data bytes
commit 2Brb3uKM4WrpoQWVIMNkijWxQMS
Author: mccanne@bay.lan
Date: 2022-07-12T23:02:34Z
loaded 1 data object
2Brb3owQirVdkqqtTmlqJBUcM1B 1 record in 14 data bytes
commit 2Brb3rI4gSZ2jfnmCY0I4sEBCLl
Author: mccanne@bay.lan
Date: 2022-07-12T23:02:34Z
loaded 1 data object
2Brb3qZB66zPUwM02DnGwWQSp2W 1 record in 21 data bytes
But here’s where it gets interesting: under the hood, the zed log command simply runs a metadata query on the lake and the query code just marshals the Zed data in the lake’s log objects using the techniques described above.
In other words, the Zed system uses Zed marshaling inside of itself.
If you run this meta-query on the data we loaded and ask the zed command to pretty-print the output as ZSON using -Z, you’ll get output that looks like this:
$ zed query -Z "from POOL@main:rawlog"
{
id: 0x0f5baf8a800e9874e86f1e6913a4af90bb48b03d (=ksuid.KSUID),
parent: 0x0000000000000000000000000000000000000000 (ksuid.KSUID),
retries: 0 (uint8),
author: "mccanne@bay.lan",
date: 2022-07-12T23:02:34.780087Z,
message: "loaded 1 data object\n\n 2Brb3qZB66zPUwM02DnGwWQSp2W 1 record in 21 data bytes\n",
meta: null
} (=commits.Commit)
{
commit: 0x0f5baf8a800e9874e86f1e6913a4af90bb48b03d (=ksuid.KSUID),
object: {
id: 0x0f5baf8a68439d3b094fa2dfb2f99989272424a8 (ksuid.KSUID),
meta: {
first: null,
last: null,
count: 1 (uint64),
size: 21
} (=data.Meta)
} (=data.Object)
} (=commits.Add)
{
id: 0x0f5baf8ae3d60b1010e722d80bfea31956c0ea60 (=ksuid.KSUID),
parent: 0x0f5baf8a800e9874e86f1e6913a4af90bb48b03d (ksuid.KSUID),
retries: 0 (uint8),
author: "mccanne@bay.lan",
date: 2022-07-12T23:02:34.804476Z,
message: "loaded 1 data object\n\n 2Brb3owQirVdkqqtTmlqJBUcM1B 1 record in 14 data bytes\n",
meta: null
} (=commits.Commit)
{
commit: 0x0f5baf8ae3d60b1010e722d80bfea31956c0ea60 (=ksuid.KSUID),
object: {
id: 0x0f5baf8a32df6f8e4c0a8ac10eddb3d7f75acdb1 (ksuid.KSUID),
meta: {
first: null,
last: null,
count: 1 (uint64),
size: 14
} (=data.Meta)
} (=data.Object)
} (=commits.Add)
{
id: 0x0f5baf8bbe3c6578b076c98d037f4657fc5fa548 (=ksuid.KSUID),
parent: 0x0f5baf8ae3d60b1010e722d80bfea31956c0ea60 (ksuid.KSUID),
retries: 0 (uint8),
author: "mccanne@bay.lan",
date: 2022-07-12T23:02:35.299159Z,
message: "loaded 1 data object\n\n 2Brb3yO9eJxXISWJljNdLWphhAP 1 record in 21 data bytes\n",
meta: null
} (=commits.Commit)
{
commit: 0x0f5baf8bbe3c6578b076c98d037f4657fc5fa548 (=ksuid.KSUID),
object: {
id: 0x0f5baf8b6946f2c8014ce409bf8e01cd3ce201c9 (ksuid.KSUID),
meta: {
first: null,
last: null,
count: 1 (uint64),
size: 21
} (=data.Meta)
} (=data.Object)
} (=commits.Add)
And like the events.zson example above, we can easily manipulate this output as it’s just self-describing Zed data. You can apply whatever data manipulation queries you’d like to the resulting output using our Zed language, e.g., computing the average data object size:
$ zed query 'from POOL@main:rawlog | avg(object.meta.size)'
{avg:18.666666666666668}
Or you can select out all commit dates:
$ zed query 'from POOL@main:rawlog | has(date) | yield date'
2022-07-12T23:02:34.780087Z
2022-07-12T23:02:34.804476Z
2022-07-12T23:02:35.299159Z
Or you can even collect these dates into an array, output the result as a JSON array of strings, and pretty-print it using jq:
$ zed query -f json 'from POOL@main:rawlog | has(date) | dates:=collect(date) | yield dates' | jq
[
"2022-07-12T23:02:34.780087Z",
"2022-07-12T23:02:34.804476Z",
"2022-07-12T23:02:35.299159Z"
]
Wrapping Up
So you may wonder, is all of this worth it? Why bother figuring out how to use this obscure new data format just to unmarshal JSON into Go interface values?
Of course, you can just keep doing things the way things have always been done. JSON is 20 years old and relational tables are 50 years old and going between these worlds is a big pain in the neck, especially when you want an analytics system or data warehouse to analyze the JSON events and logs that your app infrastructure produces.
We think Zed could very well turn out to be a lot easier. Could marshaling Go structs straight to Zed, then streaming batches of self-describing, marshaled values straight into a Zed lake, really be that dead simple?
That all said, our project is still early. We don’t yet have broad cross-language support. We’re working on a vector engine for high-performance queries at scale. We’re actively improving our search design.
These improvements and more will show up in our GitHub repos in the coming months.
But rest assured, many of the Zed system pieces are production quality and useful. Feel free to kick the tires.
We love working with all our users to help guide us to the best ways of solving your real, everyday problems. Give us a holler and we look forward to chatting.