Test data spoofing options

Writing a good test is not a big deal. If the production code follows SOLID and isn't overloaded with logic, the test will be simple and laconic, but... at this step everything depends on the models we use.

While we operate with things like "the method takes a string" and "the method returns an integer", everything works great; but when we move to real programming with objects, we face the problem of setting up data for every model in every single test.

The following text describes what is, in my opinion, a typical evolution of testing code, and maybe you will recognize your own project at some point :)

I carry everything with me

The first variation is based on the same approach as with primitives: if I can define a value for an integer variable right inside a test, why wouldn't I fill a reference type in the same place too?

It definitely works for simple objects like a Role with the fields Id and Name, but if we have a role, we should also have a user who owns it; there is also a UserInRole class that connects the previous two entities... If the production code has a method like [get the names of all roles of the user whose name equals x], we need to create not one instance, but three.

The test gets bigger and bigger as its model grows, and reading it becomes really inconvenient. How can one tell which data are actually important to the test (the user's login, for example) and which aren't (like his name)?

With such an approach we

break readability
forget about laconicity
and spend a lot of time trying to write each test
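
To make the problem concrete, here is a minimal sketch of such a test. The model classes, the `RoleQueries.GetRoleNames` method and all the values are hypothetical, invented only for illustration; the test body is written as a plain method so that the sketch does not depend on any test framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Login { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class Role { public int Id { get; set; } public string Name { get; set; } }

public class UserInRole { public int UserId { get; set; } public int RoleId { get; set; } }

// Hypothetical code under test: names of all roles of the user with the given login.
public static class RoleQueries
{
    public static List<string> GetRoleNames(
        string login, List<User> users, List<Role> roles, List<UserInRole> links)
    {
        return (from u in users
                where u.Login == login
                join l in links on u.Id equals l.UserId
                join r in roles on l.RoleId equals r.Id
                select r.Name).ToList();
    }
}

public static class InlineDataTest
{
    public static List<string> Run()
    {
        // Arrange: three objects filled by hand right inside the test.
        // Only Login, the ids and the role name matter; Name and Email are noise.
        var user = new User { Id = 1, Login = "jdoe", Name = "John", Email = "john@example.com" };
        var role = new Role { Id = 10, Name = "Admin" };
        var link = new UserInRole { UserId = 1, RoleId = 10 };

        // Act: a single line, buried under the set-up above.
        return RoleQueries.GetRoleNames(
            "jdoe",
            new List<User> { user },
            new List<Role> { role },
            new List<UserInRole> { link });
    }
}
```

Even with three tiny classes, the set-up is already longer than the action being tested, and nothing in it tells the reader which fields the test actually relies on.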

Special methods in testing classes

Finally, someone notices that there is a lot of duplication in class X. That duplication gets moved to a separate method, right there in class X.

Tests become short again, and we don't violate the canon that prohibits code duplication, but wait a minute... There is no such canon when it comes to testing code! What have we achieved in the end? What can we say about laconicity? We have to admit that yep, the code is cleaner now... that's the first win :)

What about minuses?

readability does not get better, it probably gets worse: from this point on, to understand the context of a test, you have to jump between the test and the factory method.
there are plenty of situations in which the same object has to be used in separate classes; if we write a fake factory in class A, what should we do in class B? Duplicate it? In a small project this is hardly noticeable at the beginning, but as soon as the scale grows, we rapidly run into problems...
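
A rough sketch of this stage, with a hypothetical `Greeter` as the code under test and a hypothetical helper `CreateUser` living inside the test class (again without a test framework, so the test is just a method that returns the result):

```csharp
using System;

public class User
{
    public int Id { get; set; }
    public string Login { get; set; }
    public string Name { get; set; }
    public bool EmailConfirmed { get; set; }
}

// Hypothetical code under test.
public static class Greeter
{
    public static string Greet(User user) => "Hello, " + user.Name + "!";
}

public class GreeterTests
{
    // The duplicated set-up, moved into a helper that lives in this test class only.
    private static User CreateUser()
    {
        return new User { Id = 1, Login = "jdoe", Name = "John", EmailConfirmed = true };
    }

    public static string Greet_ReturnsGreetingWithName()
    {
        // Short, but the value the test depends on ("John") is no longer visible here.
        var user = CreateUser();
        return Greeter.Greet(user);
    }
}
```

The test is two lines long, yet to learn why the expected greeting contains "John" the reader has to scroll to `CreateUser`, and a test class for another feature cannot reuse the helper without copying it.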

A handwritten "database"

At some point a "genius" idea comes onto the scene: it is possible to hard-code a fake database in static collections, fill in all the values and relations, and use all of that for our tests!

I see no sense in talking much about this option, because it is appropriate only for projects with three classes and, preferably, without connections between them.

We haven't solved anything this way; the gain from centralized data management is dwarfed by huge disadvantages...

As soon as the scale grows, it becomes tremendously difficult to account for and spell out all the details and connections in such a system.

The initializer of the fake context turns into a monster of enormous length.

Moreover, developers in such situations tend to rely on the data in these collections: since the desired value has already been set somewhere, we just keep it in mind when writing a test and use it. That's it, now we have finally said goodbye to readability :)

In addition, those collections become sacred: god forbid you touch anything, and half of the tests fail in a moment...
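
A minimal sketch of how this looks in practice. `FakeDb`, its contents and the `CountRoles` method are all hypothetical and exist only to illustrate the idea:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User { public int Id { get; set; } public string Login { get; set; } }
public class Role { public int Id { get; set; } public string Name { get; set; } }
public class UserInRole { public int UserId { get; set; } public int RoleId { get; set; } }

// Hypothetical hand-written "database": one static data set shared by all tests.
public static class FakeDb
{
    public static readonly List<User> Users = new List<User>
    {
        new User { Id = 1, Login = "admin" },
        new User { Id = 2, Login = "guest" },
    };

    public static readonly List<Role> Roles = new List<Role>
    {
        new Role { Id = 1, Name = "Administrator" },
        new Role { Id = 2, Name = "Reader" },
    };

    public static readonly List<UserInRole> UserRoles = new List<UserInRole>
    {
        new UserInRole { UserId = 1, RoleId = 1 },
        new UserInRole { UserId = 1, RoleId = 2 },
        new UserInRole { UserId = 2, RoleId = 2 },
    };
}

public static class FakeDbTests
{
    // Hypothetical code under test: how many roles does the user with this login have?
    public static int CountRoles(string login)
    {
        var user = FakeDb.Users.Single(u => u.Login == login);
        return FakeDb.UserRoles.Count(l => l.UserId == user.Id);
    }

    public static bool AdminHasTwoRoles()
    {
        // Neither "admin" nor 2 is explained here: both facts live in FakeDb.
        return CountRoles("admin") == 2;
    }
}
```

If someone adds one more `UserInRole` row for user 1 because another test needs it, `AdminHasTwoRoles` starts failing although neither the test nor the code under test has changed.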

Handwritten factories

Finally we get tired of writing the same things in every test or trembling over a self-written database. The decision is made to move to object factories. So if we have a Role class, then in the testing project we create a RoleFakeFactory with methods like Create and CreateMany. Yeah, a number of questions will have to be answered: should the factories be static? How do they get a database context, or should they not know about it at all? Etc.

In fact, these things have little influence on the convenience of the approach under discussion. Again, with a small amount of code it feels like everything is done right and the solution really works. And for the first few weeks everyone is probably satisfied with the work done.

And again, what have we achieved?

We have finally defeated duplication, and now everything lies in its proper place. And... that's all, to tell the truth.

Minuses?

Poor code readability is still with us. Yep, the management of fakes is centralized now, but how should one interpret these field values?

public static class UserFactory
{
    // Creates a User (the model class from the production code) with hard-coded values.
    public static User Create()
    {
        return new User
        {
            Id = 0,
            Name = "0",
            Email = "0@cru.cru",
            PasswordHash = "WZRHGrs=",
            PhoneNumber = "12345",
            EmailConfirmed = true,
            PhoneNumberConfirmed = true,
            EmailConfirmationToken = "12345"
        };
    }
}

EmailConfirmed == true for a specific test? For the majority of tests? Is it the default value? It is absolutely unclear where all these values come from, which ones carry meaning and which ones do not.

Readability is still at the level of the previous option.

Sacredness has not gone away; we have just arranged the "magic" values a little differently, without actually changing the approach.

Automation

I often say that everything has already been invented for us and programmers should not reinvent the wheel at every step. Progress has, of course, also reached the unit-testing sphere in terms of mocking objects. The list of libraries needed for organizing testing includes, among other things, AutoFixture. This is what will help us overcome most of the problems discussed.

In the near future, another article will be released that talks about how to put this tool into practice; for now we will just talk about general approaches...

The main advantage is the generation of objects at runtime with random values; you no longer have to guess the purpose of hard-coded values, because everything that is really necessary for a test is explicitly defined inside the test.

Creation of instances of classes in one line. Say goodbye to factories, special methods and so on, we get the opportunity to create a workable object anywhere and with any configuration.
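
As a small taste before the practical article, here is a sketch of what this can look like, assuming the AutoFixture 4.x NuGet package (namespace `AutoFixture`). The `User` class and the `AutoFixtureSketch` wrapper are hypothetical; `Fixture`, `Create<T>()`, `Build<T>()` and `With(...)` are the library's own calls.

```csharp
using System;
using AutoFixture; // third-party NuGet package, assumed to be version 4.x

public class User
{
    public int Id { get; set; }
    public string Login { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public bool EmailConfirmed { get; set; }
}

public static class AutoFixtureSketch
{
    // One line: every public writable property is filled with a generated value.
    public static User CreateAnyUser()
    {
        return new Fixture().Create<User>();
    }

    // Only the value the test cares about is stated; the rest stays generated.
    public static User CreateUserWithLogin(string login)
    {
        var fixture = new Fixture();
        return fixture.Build<User>()
            .With(u => u.Login, login)
            .Create();
    }
}
```

In a test that checks something about the login, `CreateUserWithLogin("jdoe")` would be the whole set-up: the login is visible in the test, and nobody has to wonder what the other fields mean, because they are obviously arbitrary.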

What have we achieved by introducing autogeneration?

we can definitely be proud of the readability of the code now. All the necessary data is right in the test.
conciseness is also at its best: everything that is written is written to serve a concrete test, without any extra data.
the speed of reading and especially of writing tests has increased significantly; compared to some of the approaches described, the gain can be severalfold.
the speed is a consequence of the simplicity and convenience of working with the system. The higher these are, the more involved developers become in the process, the more willingly they write tests and the higher the quality of the production code eventually becomes.

Minuses?

the entry threshold is a bit higher. Newbie developers who have just joined a project accept the idea of factories quite easily, because "it's familiar", but generating objects at runtime with a third-party library is much harder to swallow. At the same time, we should admit that the problem is not acute: looking at an already written test, any ~~button-presser~~ programmer can figure out the general approach.
the speed of test execution. It naturally drops, and compared to a self-made database the difference may look significant. On the other hand, although the difference between 4 and 6 seconds for 400 tests seems noticeable, it is paid back by the pluses above and is not critical. Of course, if you're dealing with a cosmic-scale project with 10000+ tests, you could say that the difference becomes really significant... if you really tended to launch all these tests at once. In real life, there is no need to launch 100500 tests all the time, and as a project grows, specific groups of tests are created for separate parts of the project, to be run during development. Moreover, modern IDEs can automatically detect which code has been affected during development and launch the tests related to it.
there is one more difficulty. In some projects, the relationships between classes are so numerous, looped and tangled that auto-generation can do almost nothing. You have to hard-code almost everything, keeping dozens of relationships in mind... But I don't think it is necessary to take such things seriously as disadvantages of the approach under discussion. We are going to talk about how to deal with circular dependencies during auto-generation in the next articles. As for the complexity of relations and the heap of constants that have to be taken into account... no one promised that ~~shit-code~~ complexly structured code would be easy to test; it is almost impossible to even maintain it :)