An Opinionated Guide to Unit Testing
A language-agnostic guide on how to write maintainable, loosely-coupled unit tests which provide confidence and reduce fragility.

I’m a software engineer living and working in East London. I’m currently helping to build a one-stop-shop for the digitisation of alternative assets over at Daphne. Although once strictly front-end, today I work across the whole stack, including dipping my toes into DevOps and writing Rust & Go.
Over the last few years I’ve increasingly adopted a test-first mentality. While I don’t subscribe to the dogma often associated with TDD purists, I do find that more often than not the resulting code is more robust and focused.
I’ve written more tests in the last 3 years than ever before, and along the way formed plenty of opinions on what works (and what doesn’t). Collected here are a few guidelines I’ve found useful:
- Don’t use module mocking
- Don’t use lifecycle hooks
- Don’t mock owned dependencies
- Only use table-tests for similar outputs
- Create test helpers to reduce noise
- Avoid snapshot tests
- Move your test doubles to the edges
- Don’t re-use static fixtures
- Prefer stubs & spies over mocks
- Don’t be afraid of test overlap
- Avoid conditionals
- Anything else?
It goes without saying that every rule can be broken if the situation requires, and if a certain strategy is working for you, I’m not here to tell you you’re doing it wrong.
That said…
Don’t use module mocking
Jest provides a feature called module mocking. It looks like this:
```javascript
import { mocked } from 'ts-jest/utils';
import getHttpClient from '../getHttpClient';

jest.mock('../getHttpClient');

const mockedGetHttpClient = mocked(getHttpClient);
```
While convenient, it quickly becomes a crutch that leads to coupling and poor maintainability. See my explanation for why it should be avoided.
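The usual alternative is to pass the dependency in explicitly, so the test can substitute a hand-rolled stub without reaching into the module system. A minimal sketch, where `fetchUser` and the client shape are hypothetical:

```javascript
// Instead of letting the module import its own HTTP client,
// accept it as a parameter (or constructor argument).
function fetchUser(httpClient, id) {
  return httpClient.get(`/users/${id}`).then((res) => res.body);
}

// In the test, a plain object satisfies the contract -- no
// jest.mock, no coupling to module paths.
const stubClient = {
  get: (url) => Promise.resolve({ body: { id: "jay", url } }),
};

fetchUser(stubClient, "jay").then((user) => {
  console.log(user.id); // "jay"
});
```

Because the double is passed in through the front door, the test no longer depends on where the real client module lives or how it is imported.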
Don’t use lifecycle hooks
Most testing frameworks come with support for lifecycle hooks, allowing you to perform repeatable setup before each test in a series.
Here’s Jest’s:
```javascript
beforeEach(() => {
  initializeCityDatabase();
});

afterEach(() => {
  clearCityDatabase();
});

test('city database has Vienna', () => {
  expect(isCity('Vienna')).toBeTruthy();
});
```
Harmless, right? What about this:
```javascript
// Applies to all tests in this file
beforeEach(() => {
  initializeCityDatabase();
});

test('city database has Vienna', () => {
  expect(isCity('Vienna')).toBeTruthy();
});

describe('matching cities to foods', () => {
  // Applies only to tests in this describe block
  beforeEach(() => {
    initializeFoodDatabase();
  });

  test('San Juan <3 plantains', () => {
    expect(isValidCityFoodPair('San Juan', 'Mofongo')).toBe(true);
  });
});
```
Frameworks typically allow as many levels of nested groupings and hooks as you like. I’ve seen up to 6 levels of nested `beforeEach` blocks in the wild.
Unfortunately, instead of the test itself describing its requirements, setup becomes distributed throughout the suite. This reduced locality not only makes the relationship between a test’s inputs and outputs opaque, but also guarantees debugging becomes an exercise in mental gymnastics (“where is this data coming from?”).
Rather than rely on lifecycle hooks, either inline the setup into the test, or, if there are many steps, create a test helper to reduce the repetition:
```javascript
test('city database has Vienna', () => {
  initializeCityDatabase();
  expect(isCity('Vienna')).toBeTruthy();
});

describe('matching cities to foods', () => {
  test('San Juan <3 plantains', () => {
    initializeCityDatabase();
    initializeFoodDatabase();
    expect(isValidCityFoodPair('San Juan', 'Mofongo')).toBe(true);
  });
});
```
Cypress and beforeEach
A similar problem crops up in Cypress tests, which often set up the navigation and any client-side interceptions in a shared hook [1]:
```typescript
describe("Orders", () => {
  beforeEach(() => {
    // Set up client-side API interception
    cy.interceptGQL(RouteMatcher.Orders, "GetAmountOfOrders", populatedOrdersResponse);
    cy.visit("/restaurant/14005/delivery");
  });
});
```
If every one of your tests genuinely does use the same data, then this works great (although should you really be using a shared fixture?). However, it’s likely that most tests have different data requirements:
```typescript
it("displays the order count when > 0", () => {});
it("displays an error when 0 orders are returned", () => {});
```
It’s usually better to create a test helper and call it explicitly:
```typescript
function setupWithOrders(amountOfOrdersResponse) {
  cy.interceptGQL(RouteMatcher.Orders, "GetAmountOfOrders", amountOfOrdersResponse);
  cy.visit("/restaurant/14005/delivery");
}

it("displays an error when 0 orders are returned", () => {
  setupWithOrders(zeroOrdersResponse);
  // Now we can test...
});
```
It’s more verbose, but much clearer. There’s no guessing about what prerequisites the test assumes. If a test needs shared setup, abstract it, but don’t hide it.
There are exceptions (such as a `beforeEach` ‘Log in’ hook); just be cautious not to overuse them.
Don’t mock owned dependencies
This one I’m stealing from Vladimir Khorikov (check out his excellent book).
If you have a test with an owned dependency, don’t mock it, use it.
Far too many tests try to substitute the database/repository layer with a test double. This avoids having to spin up a test database, which is both a performance and a convenience win, but comes at the expense of confidence. A crucial part of your application — the data access layer — is no longer part of your tests, and needs to be tested separately.
Instead, rather than verifying that a repository’s `Create` method was called, actually let the repository insert the data into the real database, then validate you can retrieve it. This drastically reduces coupling between the test and the SUT. You’re no longer verifying method calls, but the user-visible output.
Test doubles are perfect for dependencies you don’t own, such as wrappers around third-party REST APIs, but increase fragility unnecessarily when you are the sole owner.
Error handling
The exception to this rule is error handling, where producing an error (such as a dropped connection) in a real dependency can be impractical. For these cases, a double is usually required.
Performance
One common concern is the performance implication this has on running a test suite. The short answer is to keep your controllers free of business logic, pushing as much as possible into the dependency-free domain layer.
However, in reality most dev machines and databases are fast enough that even running a full integration suite is rarely a bottleneck.
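To make the shape of such a test concrete, here is a sketch. The `PostRepository` and its backing store are hypothetical, and a plain `Map` stands in for the test database so the sketch runs; the point is the assertion style: write through the real repository, then assert on what can be read back, never on which methods were called.

```javascript
// Hypothetical repository. In production this would wrap a real
// database connection; in the test suite it would point at a
// dedicated test database.
class PostRepository {
  constructor(db) {
    this.db = db;
  }

  create(post) {
    this.db.set(post.id, post);
  }

  getById(id) {
    return this.db.get(id);
  }
}

// State-based test: exercise the owned dependency end-to-end and
// assert on observable output, not on call verification.
const repo = new PostRepository(new Map()); // stand-in for a test DB
repo.create({ id: "1", title: "Hello" });

const found = repo.getById("1");
console.log(found.title); // "Hello"
```

If the repository’s internals change (different queries, different caching), this test keeps passing as long as the observable behaviour holds.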
Only use table-tests for similar outputs
Once you start using table tests (also known as ‘data-driven’ tests), soon everything becomes one. However, if you try to group too many things into the same table, you get this:
```go
tests := map[string]struct {
	input          string
	expectedResult string
	expectedErr    error
}{
	"Fails when string is empty": {
		input:       "",
		expectedErr: notFoundError{},
	},
	"Returns results when string is populated": {
		input:       "jay",
		expectedErr: nil,
	},
}
```
This table incorporates parameters which have different expected outcomes. The problem with smushing these together is that it often leads to branching logic when you need to perform an additional assertion:
```go
if test.expectedErr != nil {
	// Check the error implements the desired interface
	var notFound *svcError
	assert.ErrorAs(t, err, &notFound)
}
```
Or, equally annoyingly, forces you to choose between less communicative test helpers such as `Equal` or a conditional:
```go
assert.Equal(t, test.expectedErr, err)

// Or
if test.expectedErr != nil {
	assert.Error(t, err)
} else {
	assert.NoError(t, err)
}
```
It’s not worth the hassle. Tables are great when the outputs for each case are of the same type, but don’t try to put the kitchen sink in them.
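Splitting by outcome keeps every loop body branch-free. A sketch of the same idea in JavaScript, with a hypothetical `search` function as the SUT:

```javascript
// Hypothetical SUT: throws on empty input, returns matches otherwise.
function search(input) {
  if (input === "") throw new Error("not found");
  return [input];
}

// One table per kind of outcome, so every row in a table
// shares the same assertion shape -- no if/else in the loop.
const errorCases = [
  { name: "fails when string is empty", input: "" },
];
const successCases = [
  { name: "returns results when string is populated", input: "jay", expected: ["jay"] },
];

for (const tc of errorCases) {
  let threw = false;
  try { search(tc.input); } catch { threw = true; }
  console.assert(threw, tc.name);
}

for (const tc of successCases) {
  const result = search(tc.input);
  console.assert(result[0] === tc.expected[0], tc.name);
}
```

Each table is smaller, but every case inside it is asserted identically, which is what makes the table readable in the first place.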
Create test helpers to reduce noise
If your test is longer than 20 lines, it’s a red flag.
Avoid any extraneous or distracting setup inside your tests. Including a bunch of scaffolding inside your test makes the relationship between the inputs and outputs unclear. It quickly becomes hard to tell which parts are relevant, and which are just plumbing:
```javascript
// Contrived Java-esque example
it("should calculate the order total from the item costs", () => {
  // Irrelevant
  const address = new Address("123 test street", "E34UE", "London");
  const customer = new Customer("Jay Freestone");

  // Relevant but noisy
  const lineItems = new OrderItems();
  const orderItem = new OrderItem("sneakers", 499);
  lineItems.add(orderItem);

  // Actual SUT
  const order = new Order(customer, address, lineItems);
  expect(order.getTotal()).toEqual(499);
});
```
Instead, abstract setup logic and stub creation into helper functions:
```javascript
it("should calculate the full order total from the item cost", () => {
  const order = createOrderWithItems(new OrderItem("sneakers", 499));
  expect(order.getTotal()).toEqual(499);
});

function createOrderWithItems(...items) {
  const address = new Address("123 test street", "E34UE", "London");
  const customer = new Customer("Jay Freestone");
  const lineItems = new OrderItems();
  for (const item of items) {
    lineItems.add(item);
  }
  return new Order(customer, address, lineItems);
}
```
Feel free to create as many of these as you like, and have them as specific to a set of tests as makes sense.
Avoid snapshot tests
Snapshot tests are a tempting shortcut to great test coverage. You see them everywhere, from generated component markup to API responses.
The problem with snapshots is that they don’t communicate anything. When you read a test, it should give you an insight into the expected behaviour of the SUT. Snapshots tell you something has changed, but not its relevance.
This often leads to snapshot fatigue, where engineers instinctively re-run the generation, accept the changes and commit them.
Snapshots are excellent for regression testing, or ensuring you don’t break an API contract. They’re just not a replacement for traditional tests.
Move your test doubles to the edges
If you’re using a wrapper around a dependency that produces a side-effect which isn’t outwardly observable (e.g. a log library which prints to `stderr`), then substitute in a test double at the edge, not for the wrapper itself.
Testing your loggers
Let’s expand upon the logging example. [2] Logging libraries usually have multiple ways to construct structured entries, such as conditionally building up instances with fields which accumulate over time:
```go
// zerolog, like most Go loggers, lets you create new logger
// instances with fields prepopulated.
// https://github.com/rs/zerolog#sub-loggers-let-you-chain-loggers-with-additional-context

// Sub logger
sublogger := log.With().Str("component", "foo").Logger()
sublogger.Info().Msg("hello world")
```
In the case of Go’s `zerolog`, the library provides different ways to accomplish the same goal: write (somewhere) a log entry with a structured field. You can build up fields gradually, all at once, utilise sub-loggers, etc. The result will be the same, but the method you choose to get there may vary.
Abstracting over the entire logger with an interface in order to test method calls tightly couples your test to your application. Instead of testing the output, assertions only verify that a specific set of methods were invoked.
Here’s an example using a `gomock` mock object (not an endorsement):
```go
func TestRun(t *testing.T) {
	t.Run("Logs error with field", func(t *testing.T) {
		mockCtrl := gomock.NewController(t)
		mockLogger := NewMockLogger(mockCtrl)

		mockLogger.
			EXPECT().
			WithField("id", "jay").
			Return(mockLogger)

		mockLogger.
			EXPECT().
			WithField("user", "123").
			Return(mockLogger)

		mockLogger.
			EXPECT().
			Err(gomock.Any())

		Run(mockLogger)
	})
}
```
And an example failure:
```
=== RUN   TestRun/Logs_error_with_field
    controller.go:137: missing call(s) to *go_log.MockLogger.WithField(is equal to user (string), is equal to 123 (string)) /Users/jfree/Development/Personal/go-log/logger_test.go:20
    controller.go:137: aborting test due to missing call(s)
```
Not only is the error (necessarily) generic, we’re asserting on something we shouldn’t care about — the internals.
Living on the edge
Instead, consider inserting a test double at the edge, for the actual out-of-process dependency — in this case the writer, which defaults to `os.Stdout`:
```go
// stubWriter implements the io.Writer interface
type stubWriter func(p []byte) (n int, err error)

func (s stubWriter) Write(p []byte) (n int, err error) {
	return s(p)
}

func TestRun(t *testing.T) {
	t.Run("Logs error with field", func(t *testing.T) {
		// Arrange
		var calls [][]byte
		writer := stubWriter(func(p []byte) (n int, err error) {
			calls = append(calls, p)
			return len(p), nil
		})
		zLog := log.Output(writer) // Logger will output JSON

		// Act
		RunRealLogger(zLog)

		// Assert
		assert.NotEmpty(t, calls, "Logger was not called")

		var result struct {
			Id, User, Message string
		}
		err := json.Unmarshal(calls[0], &result)
		assert.NoError(t, err)
		assert.Equal(t, "jay", result.Id, "id did not match")
		assert.Equal(t, "123", result.User, "user did not match")
	})
}
```
We’ve used our log library of choice, and made no attempt to create an abstraction over it. The logging API is part of the contract/behaviour. We don’t try to replicate it, but instead move our test double as far out as we can.
Now we’re free to use whatever strategies we like to construct the logs, as long as we meet the requirements of the test: a written error log with two structured fields.
API Clients
Applying this rule generally, you might find yourself tempted to avoid creating interfaces around API client wrappers (e.g. an `OrdersAPIClient` which calls a REST API), instead substituting in mock HTTP servers.
While this is a great strategy for unit testing the client itself, it adds unnecessary complexity to the tests of any consumer. API client methods which result in a mutation must be atomic (you would never have separate calls to decrease inventory and place an order) and are unlikely to contain any persistent state (they are often shared), making them unlikely candidates for coupling.
Don’t re-use static fixtures
Tests often need complex stub data to operate on. Some of this will be relevant to the test, and some will be extraneous (but nonetheless required).
It’s fine to create generic stubs which are reused across tests when they don’t concern the test itself — think passing a `Company` struct to a `Person` constructor as part of validating the `person.ChangeName()` method — but avoid re-using stub fixtures for anything which impacts your assertions.
Cypress, for instance, has a `fixtures` folder which holds static JSON blobs to be used across your tests. Although you could create a separate document for each test, the hassle means you probably won’t, leading to an ever-growing shared fixture on which multiple tests depend. Testing a long username? Update the fixture. Testing a Canadian address? Update the fixture.
The downside is that by sharing a fixture you greatly increase test fragility. It’s easy to create a cascade of failures by simply modifying a fixture for your specific use case.
Where possible, prefer creating test-specific fixtures. Cypress lets you pass a serialisable object to its interception methods, meaning you can create fixtures using factories and inject them:
```javascript
const createUserResponse = (name) => ({ name });

cy.intercept('POST', '/users*', {
  statusCode: 201,
  body: createUserResponse('Peter Pan'),
});
```
This allows you to create a fixture for your test’s specific needs and avoid any overlap.
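Factories also compose well with an overrides parameter, so each test states only the fields it cares about and inherits sensible defaults for the rest. A sketch (the default response shape here is hypothetical):

```javascript
// Factory with defaults; tests override only what matters to them.
function createUserResponse(overrides = {}) {
  return {
    name: "Peter Pan",
    address: { country: "GB" },
    ...overrides,
  };
}

// A test about Canadian addresses touches only the address, and a
// later change to the default name can't break it.
const canadianUser = createUserResponse({ address: { country: "CA" } });
console.log(canadianUser.address.country); // "CA"
console.log(canadianUser.name); // "Peter Pan"
```

Because nothing is shared on disk, editing one test’s data can never cascade into failures elsewhere.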
Prefer stubs & spies over mocks
Although the term mock is commonly used to mean any kind of test-double, it actually has a very specific meaning. A mock object is a test-double which not only can respond to requests with stub data, but automatically asserts behaviour (i.e. that the double was indeed called, and it was called with the correct inputs).
Many kinds of tests don’t call for this kind of stringent validation. If your code produces the correct output, who cares what dependencies it called? [3].
If you just need a test double to return consistent data, create your own stub which matches the interface:
```go
type Post struct {
	Id string
}

type Repository interface {
	GetPosts() []Post
}

type stubRepo func() []Post

func (s stubRepo) GetPosts() []Post {
	return s()
}

func TestDouble(t *testing.T) {
	double := stubRepo(func() []Post {
		return []Post{{Id: "jay"}}
	})
}
```
Choosing a double
So what test double should you pick?
- If you need a test double to respond in a specific way, use a stub.
- If you need to validate a call to an out-of-process dependency which you have exclusive ownership over (i.e. a database), use the actual database.
- If you need to validate a call to an out-of-process dependency that you don’t have ownership over (i.e. an external API), use a spy or a mock object.
If you fall into the last camp, there’s still probably no need to reach for dedicated mock-object libraries, such as `gomock` or `mockall`. Using a library adds a lot of incidental (and unnecessary) complexity. In the case of Go’s `gomock`, it means giving up type checking while building your mock object; in the case of Rust’s `mockall`, it means leaning on macros (which typically enjoy worse type assistance). Building your own spy and verifying the calls is so simple that a library frequently isn’t worth the hassle:
```go
type Repository interface {
	Create(post Post)
}

type stubRepo func(post Post)

func (s stubRepo) Create(post Post) {
	s(post)
}

func TestDouble(t *testing.T) {
	var calls []Post
	double := stubRepo(func(post Post) {
		calls = append(calls, post)
	})

	// Exercise SUT, omitted

	assert.NotEmpty(t, calls)
	assert.Equal(t, expected, calls[0])
}
```
Regardless of library choice, the error messages produced by generic mock-object libraries are often unhelpful or needlessly cryptic. If you do use a library, prefer one that helps produce spies, such as Go’s moq.
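The same hand-rolled approach carries over to JavaScript: a spy is just a closure (or plain object) that records its arguments. A sketch, where `notifyOrderShipped` and the mailer interface are hypothetical:

```javascript
// Hypothetical SUT that talks to an unowned, external dependency.
function notifyOrderShipped(mailer, order) {
  mailer.send(order.customerEmail, `Order ${order.id} shipped`);
}

// Hand-rolled spy: records every call so the test can assert on
// them afterwards. No mocking library, no generated code.
const calls = [];
const spyMailer = { send: (...args) => calls.push(args) };

notifyOrderShipped(spyMailer, { id: "42", customerEmail: "jay@example.com" });

console.log(calls.length); // 1
console.log(calls[0][0]); // "jay@example.com"
```

When an assertion fails, the failure message is about plain arrays and values you constructed yourself, not a library’s internal matcher output.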
Don’t be afraid of test overlap
Tests naturally overlap as you build a system. In order to test editing or deleting a user, you need to have created one in the first place.
One solution is to use back-door manipulation to prepare the environment. In the case of a database, this would mean inserting the records manually beforehand. In the case of an in-memory object, it might mean manipulating the inner state:
```javascript
class Directory {
  constructor() {
    this.people = [];
  }

  addPerson(person) {
    this.people.push(person);
  }

  removePerson(personName) {
    this.people = this.people.filter((p) => p.name !== personName);
  }
}

// Test
const dir = new Directory();
dir.people = [new Person("Jay")];
// Now we can test removal...
```
Don’t do it. By using back-door manipulation, you unnecessarily couple your test to your implementation, increasing fragility.
Accept that, by testing one method, you will implicitly be testing another as well:
```javascript
const dir = new Directory();
dir.addPerson(new Person("Jay"));
// Now we can test removal...
```
The same rule applies to databases. Don’t be afraid of calling the `Create` method of a repository in a test for `Delete`. You’ll have already tested `Create` anyway.
Avoid conditionals
This one is probably a given, but is worth repeating.
If you have any kind of control flow statement in your test (`if`/`switch`, etc.), then you should refactor it out. Conditional logic makes tests hard to reason about, since it’s unclear which branch a given run actually took.
Most conditionals can be replaced with guard clauses, which fail the test immediately by checking that an invariant holds and panicking (or throwing) if it doesn’t.
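As a sketch of the difference, with a hypothetical `findUser` lookup:

```javascript
// Hypothetical lookup exercised by the test.
function findUser(id) {
  return id === "jay" ? { id: "jay", name: "Jay" } : null;
}

// Bad: a conditional silently skips the assertion when the
// lookup fails, and the test still "passes".
// if (user) { expect(user.name).toBe("Jay"); }

// Good: a guard clause fails the test the moment the invariant
// doesn't hold, then asserts unconditionally.
const user = findUser("jay");
if (!user) throw new Error("expected user to be found");
console.log(user.name); // "Jay"
```

With the guard in place there is exactly one path through the test, so a green result always means the assertion actually ran.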
If your test needs to validate an additional property of the SUT only under certain circumstances (if there is an error, check the type of error), then you probably have an overburdened table test.
Anything else?
Agree, disagree, or have another personal favourite?
Shoot me an email (mail@jayfreestone.com) or @ me on Twitter.
[2] Steve Freeman and Nat Pryce distinguish between two types of logging in Growing Object-Oriented Software Guided by Tests (p.233): logging for debugging, and logging to satisfy a business requirement. Logs are often used for mission-critical behaviour, such as alerting and performance metrics (i.e. how many errors have occurred in the last hour). Any logging used to fulfil a business or ops requirement ought to be tested.↩︎
[3] There are two different schools of thought on this, but I’m firmly in the Detroit camp.↩︎