Go for Dev & Ops

Go (or golang, as it is often called, because “Go” is a bit vague when typed into a search engine), if you haven’t heard of it, is a relatively new language for cross-platform development. It is a modern, C-style, statically-typed language that produces platform-native binaries, but still includes memory management and garbage collection.

Go is open-source, stable (currently at version 1.6, with its first stable release dating back to 2011), and it’s not going anywhere. Written by a couple of Google’s brilliant engineers, and used internally at Google at the levels of scale and reliability you would expect from a tech giant, it is an entirely different beast from something like Ruby or NodeJS. It encourages the kind of development and deployment pipeline that we should all strive for.

Software development has been moving more and more toward interpreted, dynamically-typed, single-threaded languages like Ruby and Node. It’s been moving more toward mountains of external dependencies – I’ve seen projects where more than 90% of the code was in third-party libraries. That’s a lot of other people’s technical debt and generalized implementations to absorb into your project.

Go encourages the exact opposite. It’s statically-typed, though it offers the convenience of type inference. It’s compiled, and not just to bytecode, but to native binary. It’s innately and intuitively concurrent. Its dependency management, like everything else about it, is intentionally minimalist, implicitly discouraging overuse of external libraries and frameworks.

Go, by design, at the language level, encourages a lot of really good programming practices:

  • Because the language and standard library aim for “exactly enough and no more”, developers are more likely to follow that lead with their own code. Go is the embodiment of KISS.
  • Go makes it easy and intuitive to handle concurrency through “communication instead of sharing”, keeping code cleaner and reducing the rate of defects. It also has a built-in race condition detector that can be activated when running tests or benchmarks.
  • The core Go tool chain includes support for unit tests and benchmarks, and will even include example code from your documentation to ensure it compiles and works.
  • Go has a language-level style guide, and included in the core Go tool chain is a tool (gofmt) that reformats files to fit the standard style.
  • Go omits many facets of OOP, including inheritance (though it has mix-ins), abstract classes, method overloading, operator overloading, and multi-level access control. Though useful tools in some cases, these are most frequently used to vastly over-complicate implementations. Go really wants you to just write the simplest thing that will work.

After the code is written, Go makes it very easy to set up continuous integration, including unit tests, benchmarks, linting, and test coverage. It’s trivial to have a build go on to cross-compile for every platform and architecture you might want to deploy to.

How do you deploy a Go app? Drop the executable on a server and execute it. There is no interpreter or virtual machine to install. In fact, there are no external dependencies at all unless you create them in code. Performance is excellent and resource usage is minimal – unlike, for example, Java or Ruby. The deployment footprint is about as small as it gets, and deployments are highly reliable. You suffer none of the fragility brought on by differences between versions of Java, Ruby, Python, or NodeJS – your binary runs the same regardless of the system it’s deployed to.

I won’t say that Go is perfect, or universally applicable. There are certainly use cases it is not well-suited to. For starters, Go produces console applications, not GUIs. It is not intended for building desktop applications – though there are libraries for doing so. Go also has a focus on simplicity and minimalism, and it could get difficult to manage very large code bases in Go unless you’re very careful and very smart about it.

Where Go really shines is in producing small, self-contained, single-purpose command-line tools and microservices. The Go standard library includes a multi-threaded HTTP server perfect for building a REST API server in a hurry, while still being able to stand up to production use.

What sort of things are built in Go? Let’s see:

 

A Good Case for Third Party Libraries

Developers have a tendency to over use third party libraries. They bring them in because they want some new technical advance that’s in the spotlight – ORM or IOC or AOP. They don’t stop to consider that they’re importing all that library’s technical debt and defects, along with a black box they won’t understand when it fails.

However, there are some excellent cases for third party libraries. Unfortunately, these are the same cases w where developers tend not to reach for an outside solution; cases where developers think, hey, this is easy, I’ve got this.

The best case I can think of for a third party library is functionality that is simple, well understood, and stable, but with a lot of edge cases. The canonical example is date and time handling.

Date and time handling is easy, right? You’ve understood clocks and calendars since grade school. You even grok Unix timestamps. You could do date and time handling in your sleep, with your hands tied behind your back – if you can manage to sleep with your hands tied behind your back in the first place, but that’s neither here nor there.

But wait – you wrote this software to use the user’s local time zone with no offset. Interoperability issues ensue. OK, no big deal, time zones are tricky but they’re not rocket science. You make a fix a move on.

Damn. Daylight savings time. No one hates daylight savings time more than developers. Another patch. But wait, some places don’t do daylight savings time. Another patch. Leap years. Forgot about leap years. Another patch. But they aren’t every four years – the year 2000, for example, was not a leap year. Another patch. Wait, leap seconds? That’s a thing? OK, another patch. What do you mean different people put the day, month, and year in a different order? Alright, alright, another patch. 24-hour time? No problem, patch. Wait, is midnight 00:00 or 24:00? What if someone inputs 24:05? What if someone increments a time by 10 minutes at five till midnight? Or at 1:55 the day daylight savings starts or ends? Or before a leap second is applied? What if a user is mobile and their time zone changes while they’re using the application?

Time and date are simple, every day parts of our lives. We take them for granted. We haven’t changed the way clocks and calendars work in any significant way in our lifetimes. But they’re absolutely rife with details and complexities and edge cases that are very difficult to enumerate off the top of your head – you’re bound to miss some of them. A library is perfect for this: because the functionality doesn’t need to change, a good library will be stable, with very infrequent updates, and few defects. Why kill yourself with patch after patch when there’s bound to be a solid solution waiting out there for you to use?

These are the sorts of cases where libraries are the right answer. Not for adding new bells and whistles, but for using an existing solution to a thoroughly solved problem. Character encoding is a good case. Handling any common file format is definitely a good case – XML, JSON, YAML, CSV. If I never see another hand-written CSV parser it’ll be too soon. Sorting – for Petes sake, if your language doesn’t have sorting built in, find a library. I don’t care if you learned to write a bubble sort in college. Don’t write one at work. Ever.

Anything even remotely related to cryptography you should absolutely solve with a library. Hashing, random number generation, and encryption are solved, hard problems. Even when security isn’t critical, you may as well lean on the work of other smart people.

Solved problems are where you want a library. Solutions in search of a problem may be more exciting, and come with more interesting acronyms and blog posts and conference keynotes, but they often create at least as many problems as they solve. Don’t get sucked in.

A Wrinkle in Time

You’ve built a prototype, everything is going great. All your dates and times look great, they load and store correctly, everything is spiffy. You have your buddy give it a whirl, and it works great for them too. Then you have a friend in Curaçao test it, and they complain that all the times are wrong – time zones strike again!

But, you’ve got this covered. You just add an offset to every stored date/time, so you know the origin time zone, and then you get the user’s time zone, and voila! You can correct for time zones! Everything is going great, summer turns to fall, the leaves change, the clocks change, and it all falls apart again. Now you’re storing dates in various time zones, without DST information, you’re adjusting them to the user’s time zone, trying to account for DST, trying to find a spot here or there where you forgot to account for offsets…

Don’t fall into this trap. UTC is always the answer. It is effectively time-zone-less, as it has an offset of zero and does not observe daylight savings time. It’s reliable, it’s universal, it’s always there when you need it, and you can always convert it to any time you need. Storing a date/time with time zone information is like telling someone your age by giving your birthday and today’s date – you’re dealing with additional data and additional processing with zero benefit.

When starting a project, you’re going to be better off storing all dates as UTC from the get-go; it’ll save you innumerable headaches later on. I think it is atrocious that .NET defaults to system-local time for dates; one of the few areas where I think Java has a clearly better design. .NET’s date handling in general is a mess, but simply defaulting to local time when you call DateTime.Now encourages developers to exercise bad practices; the exact opposite of the stated goals of the platform, which is to make sure that the easy thing and the correct thing are, in fact, the same thing.

On a vaguely related note, I’ve found a (in my opinion) rather elegant solution for providing localized date/time data on a website, and it’s all wrapped up in a tiny Gist for your use: https://gist.github.com/aprice/7846212

This simple jQuery script goes through elements with a data attribute providing a timestamp in UTC, and replaces the contents (which can be the formatted date in UTC, as a placeholder) with the date/time information in the user’s local time zone and localized date/time format. You don’t have to ask the user their time zone or date format.

Unfortunately it looks like most browsers don’t take into account customized date/time formatting settings; for example, on my computer, I have the date format as yyyy-mm-dd, but Chrome still renders the standard US format of mm/dd/YYYY. However, I think this is a relatively small downside, especially considering that getting around this requires allowing users to customize the date format, complete with UI and storage mechanism for doing so.

On Code Comments

I’ve been seeing a lot of posts lately on code comments; it’s a debate that’s raged on for ages and will continue to do so, but for some reason it’s been popping up in my feeds more than usual the last few days. What I find odd is that all of the posts generally take on the same basic format: “on the gradient of too many to too few comments, you should aim for this balance, in this way, don’t use this type of comments, make your code self-documenting.” The reasoning is fairly consistent as well: comments get stale, or don’t add value, or may lead developers astray if they don’t accurately represent the code.

And therein lies the rub: they shouldn’t be representing the code at all. Code – clean, self-documenting code – represents itself. It doesn’t need a plain-text representative to speak on its behalf unless it’s poorly written in the first place.

It may sound like I’m simply suggesting aiming for the “fewer comments” end of the spectrum, but I’m not; there’s still an entity that may occasionally need representation in plain text: the developer. Comments are an excellent way to describe intent, which just so happens to take a lot longer to go stale, and is often the missing piece of the puzzle when trying to grok some obscure or obtuse section of code. The code is the content; the comments are the author’s footnotes, the director’s commentary.

Well-written code doesn’t need comments to say what it’s doing – which is just as well since, as so many others have pointed out, those comments are highly likely to wind up out-of-sync with what the code is actually doing. However, sometimes – not always, maybe even not often, but sometimes – code needs comments to explain why it’s doing whatever it’s doing. Sure, you’re incrementing Frobulator.Foo, and everybody is familiar with the Frobulator and everybody knows why Foo is important and anyone looking at the code can plainly see you’re trying to increment it. But why are you incrementing it? Why are you incrementing it the way you’re doing it in this case? What is the intent, separate from its execution? That’s where comments can provide value.

As a side note (no pun intended), I hope we can all agree that doc comments are a separate beast entirely here. Doc comments provide meta data that can be used by source code analyzers, prediction/suggestion/auto-completion engines, API documentation generators, and the like; they provide value through some technical mechanism and are generally intended for reading somewhere else, not for reading them in the source code itself. Because of this I consider doc comments to be a completely separate entity, that just happen to be encoded in comment syntax.

My feelings on doc comments are mixed; generally speaking, I think they’re an excellent tool and should be widely used to document any public API. However, there are few things in the world more frustrating that looking up the documentation for a method you don’t understand, only to find that the doc comments are there but blank (probably generated or templated), or are there but so out of date that they’re missing parameters or the types are wrong. This is the kind of thing that can have developers flipping desks at two in the morning when they’re trying to get something done.

Optimizing Entity Framework Using View-Backed Entities

I was profiling a Web application built on Entity Framework 6 and MVC 5, using the excellent Glimpse. I found that a page with three lists of five entities each was causing over a hundred query executions, eventually loading a huge object graph with hundreds of entities. I could eliminate the round trips using Include(), but that still left me loading way too much data when all I needed was aggregate/summary data.

The problem was that the aggregates I needed were complex and involved calculated properties, some of which were based on aggregates of navigation collection properties: a parent had sums of its children’s properties, which in turn had sums of their children’s properties, and in some cases parents had properties that were calculated partly based on aggregates of children’s properties. You can see how this quickly spun out of control.

My requirements were that the solution had to perform better, at returning the same data, while allowing me to use standard entity framework, code first, with migrations. My solution was to calculate this data on the server side, using entities backed by views that did the joining, grouping, and aggregation. I also found a neat trick for backward-compatible View releases:

IF NOT EXISTS (SELECT Table_Name FROM INFORMATION_SCHEMA.VIEWS WHERE Table_Name = 'MyView')
EXEC sp_executesql N'create view [dbo].[MyView] as select test = 1'
GO
ALTER VIEW [dbo].[MyView] AS
SELECT ...

It’s effectively upsert for views – it’s safe to run whether or not the view already exists, doesn’t ever drop the view if it does exist (leaving no period where a missing view might cause an error), and it doesn’t require keeping separate create and alter scripts in sync when changes are made.

I then created the entities that would represent the views, using unit tests to ensure that the properties now calculated on the server matched expected values the same way that the original, app-calculated properties did. Creating entities backed by views is fairly straightforward; they behave just like tables, but obviously can’t be modified – I made the property setters protected to enforce this at compile time. Because my View includes an entry for every “real” entity, any query against the entity type can be cast to the View-backed type and it will pull full statistics (there is no possibility of an entity existing in the base table but not in the view).

Next I had to create a one to one association between the now bare entity type and the view type holding the aggregate statistics. The only ID I had for the view was the ID of the raw entity it was connected to. This turned out to be easier said than done – entity framework expects that, in a one to one relationship, it will be managing the ID at one end of the relationship; in my case, the ID’s at both ends were DB-generated, even though they were guaranteed to match (since the ID in the view was pulled directly from the ID in the entity table).

I ended up abandoning the one-to-one mapping idea after a couple days’ struggle, instead opting to map the statistics objects as subclasses of the real types in a table per type structure. This wound up being relatively easy to accomplish – I added a table attribute to the sub type, giving the name of the view, and it was off to the races. I went through updating references to the statistics throughout LINQ queries, views, and unit tests. The unit and integration tests proved very helpful in validating the output of the views and offering confidence in the changes.

I then ran my benchmarks again and found that pages that had required over a hundred queries to generate now used only ten to twenty, and were rendering in half to a third the time – a one to two hundred percent improvement, using views designed purely to mimic the existing functionality – I hadn’t even gone about optimizing them for performance yet!

After benchmarking, it looks even better (times are in milliseconds, min/avg/max):
EF + LINQ EF + Views
3 lists of 5 entities (3 types) 360/785/1675 60/105/675
2 lists of 6 entities (1 type) 325/790/1935 90/140/740
1 entity’s details + 1 list of 50 entities 465/975/2685 90/140/650
These tests were conducted by running Apache JMeter on my own machine against the application running on Windows Azure, across a sampling of 500 requests per page per run. That’s a phenomenal 450 to 650 percent improvement across the board on the most intensive pages in the application, and has them all responding to 100% of requests in under 1 second. The performance gap will only widen as data sets grow; using views will make the scaling much more linear.
I’m very pleased with the performance improvement I’ve gotten. Calculating fields on the app side works for prototyping, but it just can’t meet the efficiency requirements of a production application. View-backed entities came to the rescue in a big way. Give it a try!

My Personal Project Workflow/Toolset

I do a lot of side projects, and my personal workflow and tooling is something that’s constantly evolving. Right now, it looks something like this:

  • Prognosticator for tracking features/improvements, measuring the iceberg, and tracking progress
  • WorkFlowy for tracking non-development tasks (the most recent addition to the toolset)
  • Trac for project documentation, and theoretically for defect tracking, though I’ve not been good about entering defects in Trac recently; it doesn’t seem worth the effort on a one-person project, though with multiple people I think it would be a must
  • Trello for cross-cutting all the above and indicating what’s next/in progress/recently completed, and for quickly jotting down ideas/defects. Most of the defect tracking actually goes in here on one-man projects right now. This is a lot of duplication and the main source of waste in my current process.
  • Bitbucket for source control (I also use Atlassian’s excellent SourceTree as a Git/Hg client.)
It’s been working well for me, the only issue I have is duplication between the tools, and failing to consistently use Trac for defect tracking. What keeps me in Trello is how quick and easy it is to add items to it, and the fact that I’m using it as a catch-all – I can put a defect or an idea or a task into it in a couple of seconds; I just have to replicate it to the appropriate place later, which is the problem.
I think the issue boils down to being torn between having a centralized repository for “stuff to be done” (Trello) and having dedicated repositories catered to each type of thing to be done (Prognosticator, Trac, and WorkFlowy); and convenience. Trello is excellent for jotting something down quickly, but lacks the additional specific utility of the other tools for specific purposes.
I think what I’ll end up doing is creating a “whiteboard” list in WorkFlowy, and using that instead of Trello to jot down quick notes when I don’t have the time to use the individual tools; then I can copy from there to the other tools when I need to. That will allow me to cut Trello down to basically being a Kanban board.

Simple Entity Framework Model Structure

I’ll say right up front, I don’t have a lot of experience with Entity Framework, and this could either be a well-known solution or a completely foolish one. All I know is that, so far, it has worked extremely well for my purposes.

Coming from the Java world, I’m used to using DAO’s to serve as an abstraction layer between the controllers and the database, with the basic CRUD methods, plus additional methods for any specific queries needed for a given entity type.

Conveniently, entity framework provides a fairly complete DAO in the form of DbSet. DbSet is very easy to work with, and provides full CRUD functionality, and acts as a proxy for more complex queries. I wanted to keep queries out of my logic, however, and in the model.

Looking at it, I didn’t want to have to write an entire wrapper for DbSet, and subclassing it seemed like asking for trouble. That’s when it occurred to me to use extension methods for queries. It turns out you can define extension types against a generic type with a type argument specified (e.g. this IEnumerable). This not only allowed me to abstract out the queries and keep them in the model, without having to wrap or subclass anything; but by defining the extensions on IEnumerable instead of DbSet, I have access to my queries on any collection of the appropriate entity type, not just DbSet. I can then chain my custom queries in a very intuitive and fluid way, keeping all of the code clean, simple, and separate.

For example, I have a table of tags. I’ve created extension methods on IEnumerable to filter to tags used by a given user, and to filter by tags starting with a given string. I can chain these to get tags used by a given user and starting with a given string. I can also use these queries on the list of tags associated with another entity, as IList implements IEnumerable, and thus inherits my query extension methods.

I don’t know if this is the best way – or even a good way – but it’s worked for me so far. I do see some possible shortcomings; mainly, the extensions don’t have access to the context, so they can’t query any other DbSets, only the collection it’s called against. This means that only explicit relationships can be queried against, which hasn’t been a roadblock so far in my (admittedly simple) application. I’m not sure this is really a drawback though – you can still add a parameter to pass in an IEnumerable to query against, which again offers the flexibility to pass a DbSet or anything else.