Go · DTO

Passing data across software boundaries

When developing software, the concept of boundaries naturally arises. A package, library, service, or system needs to communicate information to other packages, libraries, services, systems, and users. The boundary lies between those two entities.

The following ideas are not necessarily Go specific, but the examples are.

Basic types

If both sides of the boundary speak the same language, then it’s possible to use that language’s type system. Note: as will become clear shortly, I’m specifically talking about the simple types, nothing complex.

For example, an interface is defined with a method that returns an int. An implementation of that interface returns an int, and the implementation has no explicit dependency on the package that defines the interface.

package A

type Foo interface {
    Get() int
}

------------
package B

type Bar struct{}

func (b *Bar) Get() int {
    return 42
}

------------
package User

import "fmt" // package A would also need importing; the path depends on the module layout

func Using(f A.Foo) {
    x := f.Get()
    fmt.Printf("The meaning of life is %d\n", x)
}

This is fine when both the producer and consumer speak the same language, and there is a limited set of data to exchange.

Multiple standard library types

Returning a set of values, all of which are defined by the standard library.

func foo() (int, float64, error) {
    return 1, 1.0, nil
}

It’s a common idiom in Go, with the final item usually an error, or a boolean for the comma-ok paradigm.

But this becomes messy when large numbers of values are returned: it leads to dog sledding (long chains of assignment targets), and the results are positional, so the user has to know the order, and meaning, of the returned values.

Data Transfer Objects (DTO)

However, when the dataset that needs to be exchanged is more complex, a Data Transfer Object makes a bit more sense.

There are a few options here, each have strengths and weaknesses, and none are perfect.

Maps

func foo() map[string]int {
    return map[string]int{"One": 1, "Two": 2}
}

This approach reduces the number of items being returned (no more dog sledding) and names each of the values: each value in the map has a unique key, which gives developers an opportunity to communicate meaning.

Unfortunately there’s no compulsion on a producer to put any particular data point into the map, and no way to communicate to the consumer what has been placed into it; there could be a “One”, there might not. The consumer therefore has to either loop over all the keys in the map to discover what has been passed, or ask whether a particular key has been set before using its value.

Maps also restrict the types that keys and values can take: only one type can be specified for all keys in the map, and one for all values.

Anonymous types

func Foo() struct{ A int; B float32 } {
    return struct{ A int; B float32 }{A: 1, B: 2.0}
}

An anonymous type allows named fields with a variety of types to be passed in one single object.

Their biggest weakness is their lack of a name: producers and consumers have to spell out the names and types of every field in the struct each time they want to use one.

Named types

type Foo struct {
    A int
    B float32
}

func Bar() Foo {
    return Foo{A: 1, B: 2.0}
}

Far less repetitive boilerplate is required for this approach: the fields are defined in a single place, and producers and consumers alike can see what those fields are and what types they hold.

Unfortunately there is still an issue. To use this single named type, producers and consumers both have to know about it, which forms a dependency. The norm is for a named type like this to be defined by the producer, so the consumer has to import the producer to gain that knowledge; this is also usually where interfaces are defined, because the producer holds the definition.

This breaks my favourite feature of Go, implicit interface implementation: the implementation has to know the named type, so its code cannot be used to satisfy another interface. It’s locked to that interface via that named type.

Across disparate technologies

RFC and standardised communications

When two systems aren’t able to communicate directly in the same language (e.g. different programming languages, different processes), a third system is used. The third system normally takes the form of a standardised communication protocol, where the description and types of the information being passed are published somewhere that producers and consumers can access and review.

It should be noted, however, that problems with standards exist, and are plentiful: there’s the choice of protocol, and there’s the implementation (the only compulsion producers have to adhere to a given standard is that their consumers might complain).

A few examples follow; this is a tiny selection.

JSON

The JSON (JavaScript Object Notation) specification provides a text format for collections of key/value pairs that can be used to exchange data between systems.

Producers can offer documentation on how the object will be composed. JSON objects are very much like any other key/value store (e.g. maps), but consumers can inspect the data to discover the keys and their values.

The types used in the object being passed may not be an exact match to the types that the producer or consumer understand.

Protobuf (gRPC)

Protobuf (Protocol Buffers) has emerged as a structured format for transferring data.

A .proto file is created that describes the data being transferred (messages) and the functions called to get that data (services). That file is then used by producers and consumers to ensure that the contract is fulfilled.

There are problems to be aware of here too. A minor one is deciding who owns the .proto file: it’s best practice for the producer to own the definition of messages and services, and breaking changes are best managed with versioning.

Summary

The ability to pass data across boundaries is not a solved problem. Best practice is to be as explicit as possible. Personally, I am reluctant to create a dependency and will avoid one where possible, but that’s only really going to work for a small subset of use-cases.

There is also the idea of relaxing the strong typing, where both sides of the software boundary define their own version of the DTO, with the expectation that the compiler will ensure that the two types are compatible. This would slow down compilation, as the compiler would need to introspect both types to ensure they were interchangeable, and it could also be thought of as a form of dynamic typing (at boundaries), which comes with its own set of challenges.

It should also be noted that having the business logic define the DTO means that when the business logic’s needs change, it can communicate that via the DTO.
