My relationship with testing

During my last internship, I learnt a lot about testing in the context of developing software. In this post, I want to reflect on how my relationship with testing changed as I gained more software development experience.

Two years ago, after my year abroad, I did a three-month internship at a German software company. My main task was to work on a web frontend application for an internally used software product. Never having worked on a project with such a large team, I was very happy whenever I fixed a bug or developed a feature. The test suite was adequate for the project: There were a few hundred unit tests as well as a suite of Cypress end-to-end tests. The team used a CI pipeline to automatically run the unit tests and deploy into a development environment.

Sounds great, right?

At the time, however, I did not see much value in it.

Testing from a pet project point of view

Before my internship, I had only ever worked on pet projects - the kind you don’t write automated tests for because it’s just quicker to run the program three times or click around the website. Therefore, I did not understand why one should write automated tests - after all, having just developed a feature, the programmer knows best what and where to test.

Not only did I not want to write tests - the existing tests sometimes broke because of changes I made. I did not have enough experience to recognize and fix the flaky tests we had, so I came to dislike the tests themselves.

Understanding what a failing test did was often hard because it involved reading other people’s code in an unfamiliar environment (test code instead of application code). I wanted to move fast, so if a test failed, the easiest thing to do was to delete it, or at least mark it as skipped. After all, it only broke because it was no longer a valid test.

Programming without automated tests slows down development

After my internship, I gradually started working on bigger projects that didn’t have any automated tests. When it was time to release, my team and I would test manually and check whether the software still behaved as expected.

I didn’t see any issue with this at first because it allowed me to work faster. Often, this approach worked out, and manual testing didn’t find anything broken by the changes I had committed. But sometimes it didn’t, and I had to go back and review the code again. Other times, everything seemed to work on my machine and on the test system, but users reported errors in production because a feature nobody had tested broke.

This frustrated me, and I became less confident about pushing or releasing the code I’d been working on. After changing a piece of code, I often asked myself: “Does this affect any other part of the application? How would I find out?”

Do GitHub Pull Requests reduce mistakes?

A common tactic to improve software quality is code review, for example via GitHub pull requests. Reviews are good at what they are supposed to do - after all, every piece of code is seen by multiple people before it goes into production. This especially helps to improve code style and enforce conventions the team has previously established.

I like pull requests - yet they do not work well as the only means of verifying that new code does not cause regressions, especially for large changes in large software systems. The problem is that this kind of asynchronous code review puts the reviewer very close to the thing that was developed. Instead of getting the big picture, they have to look at every single line of code. Hence, as a reviewer, it is very hard to tell from a pull request whether the new code breaks any existing functionality, particularly if you didn’t develop everything yourself.

In my experience, the only way to do so in a larger codebase is to check out the branch and manually test the existing application functionality, which is certainly not free. Proper pull request reviews are time-consuming as it is; testing existing functionality on top of that is even more expensive.

Rapid prototyping vs. maintainability

It might seem at first, especially in the starting phase of a project, that the business logic is so simple that it doesn’t need any automated testing. If there is only feature A and the team is working on feature B, it is probably trivial to manually test A and B before releasing B. This is very understandable - most often, bigger systems start as prototypes. There, the objective is to move as fast as possible to see whether a concept is feasible - tests would actually slow you down.

Still, when transitioning from a prototype to something bigger, the maintainability of all code, new or old, should be a focus. As you add new features, every old feature has to be tested on every release to assure quality. By the time you arrive at feature Z, you will have tested feature A 25 times.

“It’s been in there forever, there’s no way it could break. We don’t need to test everything every release!”

Maybe it won’t be the developer who has been on the team since the beginning of the project, but a new team member refactoring existing code - either way, it is highly likely that a feature breaks at some point in the application’s lifecycle.

If you are not testing for broken features, somebody else has to. It could be the user, but I hope you agree that this is a suboptimal approach for software that is supposed to be stable. It could be your product owners in a test environment, or it could be you, clicking through your local build. But I argue that it would be best if your computer recognized bugs automatically.

This is because of the mental overhead required to get back into the changes you were working on when you introduced the bug. A test suite can catch the bug in a matter of minutes, while it might take you half an hour. Your product owners probably test when a release is coming up, so maybe a few days after you introduced the bug. Your users could take even longer to notice and report it - and even then, the bug would probably already have hurt your business.

Thus, the faster your testing process catches a bug, the sooner you can move on to other tasks.

Writing tests is sustainable

It’s true that writing a test takes more time than not writing one. But one should not only think about how to release a feature as soon as possible, but also about what might happen to it in the future.

An automated test on critical business logic will probably prevent that logic from breaking in the future. This can save many hours of debugging and will increase the quality of the software.

Good tests also reduce the need for developer documentation. Automated tests are an inherently better way of documenting your code - they are always up to date and very concrete, so any new developer can immediately grasp the API of, say, an application module.

How can one write high-quality software with reasonable speed?

A good test suite also increases the working speed of a team of developers. When test coverage is high, the workflow for releasing changes is much shorter - so much so that you probably don’t even need feature branches anymore. Google calls this trunk-based development and provides research showing that “teams [practicing trunk-based development] achieve higher levels of software delivery and operational performance (delivery speed, stability, and availability)”.

Writing a good test suite

While it’s not easy to find the optimal testing setup for your project, it is probably a lot harder to actually shoot yourself in the foot with tests.

Here are some practices I recommend when it comes to writing tests:

  1. Write the test first. This might seem counterintuitive, as no one likes failing tests. Still, we should know whether the test could fail in principle. A test that is green even when the functionality is broken does not provide any value to us. This approach is called test-driven development (TDD) and consists of iteratively writing a piece of failing test code, then some application code that makes it pass. A nice side effect is that your application becomes more modular, since it is testable by design.

    Even if you don’t write your test first, you should still check that it could fail if the feature under test broke. I argue that tests that cannot fail are even worse than no tests, since they give a false sense of safety when refactoring.

    [Figure: the testing pyramid [1]]

  2. Write the right kind of test. A common mistake is to write lots of unit tests for parts that are implementation-specific. A sign of this is needing lots of mocking to set up your test. Unit tests should target business logic (which should be separated from GUI code); there, the behavior of your module’s exports should be under test. If you find yourself mocking a lot, consider writing integration tests or end-to-end tests, which should not require any mocking.

  3. Don’t rely on implementation details. Tests higher up the testing pyramid especially should behave just as your users do. For example, a user does not look for a CSS class (they probably do not even know what one is). Instead, they look for specific elements - a checkbox, an input, a heading, some text, or ARIA roles. I believe that the queries react-testing-library supplies are very sensible when it comes to testing frontend code.

  4. Treat your tests as you would your application code. The rules of clean code also apply to tests - consider that they will be subject to change at some point, and keep the effort required to update them low. In short, deal with tests just like you deal with your application code.

As you can see, my distaste for automated tests has been reversed as I gained more programming experience. I believe that as (future) software engineers, we should try to produce quality software - and automated tests are one of the best tools available to us.

Sources

[1] Testing pyramid: https://upload.wikimedia.org/wikipedia/commons/6/64/Testing_Pyramid.svg