When Go was first announced I remember looking over the list of its key features and feeling astonished that a new language would omit the classes and inheritance that I had come to depend on so heavily. My interest faded quickly.
Fast forward a few years and our team has fully embraced Go for its speed, tooling, standard library, concurrency support, and all the other things we know and love about it. If you’re interested in learning more about how we use Go at zvelo, we’ve recently published a blog post.
The concept of interfaces, while certainly not new to us, seemed more like an afterthought in our embrace of the language. We had used interfaces in C++, and they were useful but tedious. Despite hearing so many great things about implicitly satisfied interfaces, it still took us quite a while to really internalize the implications of this simple concept.
Let’s walk through the process that a newcomer to Go might follow in developing a simple text processor that replaces instances of “hodor” with “hold the door”. We will start with a naïve implementation and refactor it over several steps.
Naïve Implementation
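A naïve version might look something like this minimal sketch (the process name and string-in, string-out signature are just for illustration):

```go
package hodor

import "regexp"

// process replaces every instance of "hodor" with "hold the door".
// Note that the regular expression is recompiled on every single call.
func process(s string) string {
	return regexp.MustCompile("hodor").ReplaceAllString(s, "hold the door")
}
```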
Let’s ignore for the duration of the exercise the simplicity of the function; it simply represents anything that has to modify data. There are several obvious problems. The most glaring is that the regular expression is compiled every time process is executed. Additionally, while there is no risk here (since the regex is valid), if the regular expression were to fail compilation (if, say, it was loaded dynamically), it would cause the application to panic at runtime. This would be unacceptable for production systems, and proper error handling (with regexp.Compile) should have been used instead, but I digress.
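As an aside, the error-handling approach is a small change; a hypothetical compilePattern helper shows the shape of it:

```go
package hodor

import (
	"fmt"
	"regexp"
)

// compilePattern demonstrates the graceful alternative: regexp.Compile
// returns an error instead of panicking, so a dynamically loaded
// pattern can be rejected cleanly at runtime.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %v", pattern, err)
	}
	return re, nil
}
```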
Precompiled Regular Expressions
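A sketch of the precompiled version; moving MustCompile into a package-level variable means it runs exactly once, at initialization:

```go
package hodor

import "regexp"

// hodorRE is compiled once, during package initialization. An invalid
// pattern now fails immediately at startup rather than on first use.
var hodorRE = regexp.MustCompile("hodor")

func process(s string) string {
	return hodorRE.ReplaceAllString(s, "hold the door")
}
```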
We keep using regexp.MustCompile, but because compilation occurs during initialization, any errors in the regex are exposed immediately on application startup. Benchmarking the updated function yields significantly better performance for small strings, but as strings get larger the benefit approaches zero.
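A minimal benchmark harness along these lines (the input sizes are illustrative assumptions) produces the ns/op, B/op and allocs/op figures discussed throughout when run with go test -bench=. -benchmem:

```go
package hodor

import (
	"strings"
	"testing"
)

var (
	shortInput = "hodor hodor hodor"
	longInput  = strings.Repeat("hodor hodor hodor ", 1<<12)
)

func BenchmarkProcessShort(b *testing.B) {
	for i := 0; i < b.N; i++ {
		process(shortInput)
	}
}

func BenchmarkProcessLong(b *testing.B) {
	for i := 0; i < b.N; i++ {
		process(longInput)
	}
}
```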
Avoiding Regular Expressions
Regular expressions are excellent tools that have many valid uses, but are often abused when simple text processing will suffice.
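The strings.Replace version is essentially a one-liner:

```go
package hodor

import "strings"

// process substitutes directly on the string, with no regular
// expression machinery. The -1 means "replace all occurrences".
func process(s string) string {
	return strings.Replace(s, "hodor", "hold the door", -1)
}
```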
By using strings.Replace instead of a regular expression, we improve performance by an order of magnitude for both short and long strings. We also minimize the number of memory allocations. Keep an eye on the B/op metric, though: it scales with string size, and that may become an issue for very large strings. Also, what if we want to operate on large files or even a network socket?
Using io.Reader
Let’s see if we can make this a bit more generic by using io.Reader instead of string. We’ve seen io.Reader used with things like os.File and figure that we can make that work somehow. But how do we return the processed data to the caller? Let’s just return another io.Reader.
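A sketch of this first io.Reader attempt (the error-returning signature is our assumption):

```go
package hodor

import (
	"io"
	"io/ioutil"
	"strings"
)

// process slurps the entire reader into memory, does the replacement,
// then wraps the result back up in a new io.Reader.
func process(r io.Reader) (io.Reader, error) {
	data, err := ioutil.ReadAll(r)
	if err != nil {
		return nil, err
	}
	s := strings.Replace(string(data), "hodor", "hold the door", -1)
	return strings.NewReader(s), nil
}
```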
Great! Now that we are using interfaces, we’re golden, right? Well… not so much. We aren’t using less memory, since we still rely on ioutil.ReadAll (which is almost always incorrect, and only works with readers that eventually return io.EOF). Further, we just wastefully turn the resulting string back into an io.Reader. To add insult to injury, our performance has dropped significantly across all metrics too.
Streaming Data
It now occurs to us that if we stream the data, we can avoid the large memory allocations. This does introduce a new knob that affects processing performance: we have to choose how much data to buffer at a time before running Replace. The larger the chunk size, the higher the B/op; the smaller the chunk size, the more times data has to be copied. There is no right answer for every situation.
It should be noted that there is a bit of a bug in this implementation: a chunk boundary could fall in the middle of a “hodor”, which would then not be replaced properly. Since this code is for demonstration only, fixing it is left as an exercise for the reader.
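A sketch of the streaming version. The chunkSize value is an arbitrary choice, and Replace here is the buffered helper defined in the next section, assumed to have the signature Replace(buf *bytes.Buffer, s, old, new []byte, n int):

```go
package hodor

import (
	"bytes"
	"io"
)

// chunkSize is the new knob: larger chunks raise B/op, smaller chunks
// copy data more often.
const chunkSize = 1024

var (
	hodor       = []byte("hodor")
	holdTheDoor = []byte("hold the door")
)

// process streams r one chunk at a time, but still manages the output
// buffer itself. The chunk-boundary bug mentioned above lives here: a
// "hodor" split across two chunks is never replaced.
func process(r io.Reader) (io.Reader, error) {
	var out, repl bytes.Buffer
	chunk := make([]byte, chunkSize)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			Replace(&repl, chunk[:n], hodor, holdTheDoor, -1)
			out.Write(repl.Bytes())
			repl.Reset()
		}
		switch err {
		case nil:
		case io.EOF:
			return &out, nil
		default:
			return nil, err
		}
	}
}
```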
This is definitely a step in the right direction, as we are truly streaming the data now. However, because we are also managing the output buffer, we still require more memory and allocations than necessary. Don’t worry about the performance loss; things are about to get much better.
A Quick Side Note about Replace
Astute readers will have noticed the as-yet-undefined Replace in the code above. In effect, it is just bytes.Replace.
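A sketch of that buffered variant. The loop mirrors what bytes.Replace does internally, except the output is appended to the caller’s bytes.Buffer (old is assumed to be non-empty; bytes.Replace treats an empty old as a special case):

```go
package hodor

import "bytes"

// Replace is bytes.Replace reworked to append into an existing
// bytes.Buffer rather than allocating and returning a new slice.
func Replace(buf *bytes.Buffer, s, old, new []byte, n int) {
	if n < 0 {
		n = bytes.Count(s, old)
	}
	// Reserve space up front so the buffer grows at most once.
	if grow := len(s) + n*(len(new)-len(old)); grow > 0 {
		buf.Grow(grow)
	}
	start := 0
	for i := 0; i < n; i++ {
		j := bytes.Index(s[start:], old)
		if j < 0 {
			break
		}
		j += start
		buf.Write(s[start:j])
		buf.Write(new)
		start = j + len(old)
	}
	buf.Write(s[start:])
}
```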
The difference between Replace and bytes.Replace is that Replace is passed a bytes.Buffer. The benefit of this is not fully realized yet, but it allows an already allocated buffer to be reused instead of requiring new allocations every time it is called. This is the same strategy that io.CopyBuffer uses. Since there is no bytes.ReplaceBuffer in the standard library, the function had to be copied and modified.
Pushing Memory Allocation to the Caller
Let’s look at one more possibility for a Process func. Rather than handling the memory allocation ourselves with a bytes.Buffer, let’s let the caller decide how to handle memory by accepting a passed-in io.Writer.
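A sketch with the writer-based signature (the parameter order follows io.Copy; it reuses chunkSize, hodor, holdTheDoor and Replace from above):

```go
package hodor

import (
	"bytes"
	"io"
)

// Process streams from r to w. The caller supplies w, and therefore
// decides how, and whether, the output is buffered.
func Process(w io.Writer, r io.Reader) error {
	var repl bytes.Buffer
	chunk := make([]byte, chunkSize)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			Replace(&repl, chunk[:n], hodor, holdTheDoor, -1)
			if _, werr := w.Write(repl.Bytes()); werr != nil {
				return werr
			}
			repl.Reset()
		}
		switch err {
		case nil:
		case io.EOF:
			return nil
		default:
			return err
		}
	}
}
```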
This is certainly cleaner, and it is a bit more performant and memory conscious. But what if there were a way to avoid the need for any write buffer at all?
Implementing our own io.Reader
By writing a Processor that implements the io.Reader interface itself, we can essentially create a pipeline for data to flow through while minimizing allocations. As an io.Reader, our Processor is usable by any number of packages in the standard library and in third-party code.
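A sketch of such a Processor (the constructor and field names are illustrative; it again reuses the hodor, holdTheDoor and Replace definitions from above):

```go
package hodor

import (
	"bytes"
	"io"
)

// Processor wraps an io.Reader and replaces "hodor" with
// "hold the door" as the data flows through it.
type Processor struct {
	r     io.Reader
	buf   bytes.Buffer // processed data not yet handed to the caller
	chunk []byte
	err   error // sticky error from the wrapped reader
}

func NewProcessor(r io.Reader, chunkSize int) *Processor {
	return &Processor{r: r, chunk: make([]byte, chunkSize)}
}

// Read implements io.Reader, refilling the internal buffer one chunk
// at a time. The chunk-boundary bug noted earlier still applies.
func (p *Processor) Read(out []byte) (int, error) {
	for p.buf.Len() == 0 {
		if p.err != nil {
			return 0, p.err
		}
		n, err := p.r.Read(p.chunk)
		if n > 0 {
			Replace(&p.buf, p.chunk[:n], hodor, holdTheDoor, -1)
		}
		p.err = err
	}
	return p.buf.Read(out)
}
```

Because it is just an io.Reader, a Processor composes with anything that accepts one; for example, io.Copy(os.Stdout, NewProcessor(f, 1024)) streams a file through the replacement without ever holding it all in memory.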
This is what we were looking for. It is nearly as performant as the strings.Replace version, but uses a fraction of the memory and causes very little garbage collector thrashing.
The ReadAll metrics treat reading the entire io.Reader as a single operation, whereas the Read metrics treat one r.Read call as a single operation.
It now becomes much clearer how different chunk sizes affect things. Smaller chunks result in more allocations but much faster individual r.Read operations. Larger chunks behave much like strings.Replace, since they get most of the data at once; their performance and memory requirements (though not their allocation counts) approach it as well.
One thing to note is that the B/op metric is a bit deceiving here, since we reuse the buffers so heavily: it reports an average across many calls. The first few calls do all of the allocations, and all subsequent calls reuse the buffers without allocating again. The actual memory used by each call corresponds more closely to the chunk size.
For reference, we ran the same test as the previous one, but using bytes.Replace instead of our buffered Replace.
The primary difference is that the B/op metric is 3 orders of magnitude larger for long strings.
If you are more of a visual learner, here is the data in a few charts, and here is its source data.
Conclusion
I’ve attempted to illustrate the learning process that someone coming from languages like Perl or Python might follow. Starting with regular expressions and ending with implementing an io.Reader is certainly a possible, if unlikely, progression. While strings.Replace certainly would have sufficed for this simple example, more complicated algorithms might justify the additional complexity.
Remember, here as always: write clean, maintainable code first, test it, then measure its performance. Only then optimize the parts where performance is required.