Exploratory testing while writing code

20/01/2016 20:27
How often have I heard that test automation is not real testing? There are endless conversations on Twitter & co about that. There are passionate testers who get offended when people use inconsistent terms, and I give this a lot of thought, too, as I don't want to be sloppy (sometimes successfully, sometimes not). I guess most of that has a right to exist. But there is one impression I get that I want to do something about: there seem to be people out there who believe that you cannot do exploratory testing when you write code. And there are surely people out there who do not know how to do exploratory testing at a very low level.
 
 
How it came together for me
 
Just recently I started at a new company and joined a team that had never had a tester before. The developers hadn't even started with consistent unit testing yet. The whole project was still in a very experimental state. When I joined, I was fully aware that our GUI was just a pretty static (with very few exceptions) representation of a couple of business numbers. The main part of the logic was not written to compute those numbers, though. Instead, they were measurements of the interactions of our product with external products. The main part of our logic was server-to-server communication and computing our system's reactions to external behaviour.
 
I am passionate about exploratory testing. For most cases that I encountered in job interviews, I consider writing automation either boring or painful. So when I searched for a new job, I was very adamant about not taking any job as a 'test automator', as they call those poor people nowadays. I can write code. But I'm not a monkey automating predefined test cases, sorry. And I don't go through the pain of UI automation (anymore). In my opinion, business logic belongs in lower layers than that.
 
From the start I knew I would write a lot of automation in my new team, and I still chose it. Why? Because most of the time I am sure I can do exploratory testing even though I write code while doing so.
 
 
How I tackle exploratory testing while writing code: an example
 
When I joined, my team was a bit unsure about the correctness of the numbers I described earlier. They would compute them from measured data every night and weren't quite sure whether the results were correct. On top of that, the definitions of how these numbers were to be calculated were, let's say, 'not set in stone yet'. The heart of most of the calculations was available through a low-level calculation method that I intended to test.
 
So what did I have? I could control the input and query the output of a method that I could call directly. Perfect. That's almost like testing a mobile application: I can interact with it and see what's happening. Don't see the similarity yet? Let's have a look at the method signature:
 
def calculate(id: String, a: List[Long], b: List[Long], c: List[Long]): Numbers
 
For those who don't read Scala code yet: the calculate method takes four arguments, one id as a string and three lists a, b, and c with longs in them. The output is a Numbers object, in which we kept our measurement results. This object contained the sums of all a's, b's, and c's, as well as lists in which all those events were supposed to be summed up per date, and both of those also split up per id. The last part only makes sense if you consider that the calculations per id were accumulated in a later step. And then there were a couple of measurements where the order of the events a, b, and c was important.
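To make the examples below easier to follow, here is a rough sketch of what such a Numbers object could have looked like. The field names are my own invention for this post; the real class was different and also carried the per-id splits:

    import java.time.LocalDate

    // hypothetical shape, field names invented for illustration
    case class Numbers(
      idCount: Int,                   // how many ids went into the calculation
      sumA: Long,                     // total count of a events
      sumB: Long,                     // total count of b events
      sumC: Long,                     // total count of c events
      perDayA: Map[LocalDate, Long],  // a events summed up per date
      perDayB: Map[LocalDate, Long],  // b events summed up per date
      perDayC: Map[LocalDate, Long]   // c events summed up per date
    )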
 
One piece of information that I needed right at the start was an idea of what those lists mean. With a graphical user interface you can explain things with labels, for example. In code you usually make an effort to name things properly, but the names, though fitting, gave me no real hint of what I was supposed to deliver to the method. Asking around, I learned that the longs in the lists were timestamps for each time an event a, b, or c happened.
 
Great. Looking at the signature and getting an idea of the usual inputs gave me all I needed to at least start testing. My major oracles for the first step were the names of the different values within the Numbers object, as well as its behaviour with the three different event types. My main assumptions at the start were that the names would tell me something about which operation might lead to the different values within the Numbers object, and that the three different events would behave mostly the same with respect to those operations, or at least consistently within any one event type. Being aware of those two main assumptions, I started my journey.
 
 
The first minutes of exploration
 
The very first thing I did was find out whether I would get an "empty" Numbers object if I kept all the inputs as empty as possible. Putting in an empty string and three empty lists indeed led to everything that I could query being either empty or zero, except for a value called idCount. Well, that was to be expected, but it could also be wrong if the only id that is processed actually does not carry any real data. I made a note of it by explicitly expecting 0 as the result there, so the check would fail until it was taken care of.
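In code, that first probe could look roughly like the sketch below. I'm assuming ScalaTest's FunSuite here, the invented field names from above, and that calculate is in scope; the expectation of 0 for idCount is the deliberately failing note I mentioned:

    import org.scalatest.FunSuite

    class CalculateExplorationTest extends FunSuite {

      test("empty inputs lead to an empty Numbers object") {
        val result = calculate("", List.empty[Long], List.empty[Long], List.empty[Long])
        assert(result.sumA == 0)
        assert(result.sumB == 0)
        assert(result.sumC == 0)
        assert(result.perDayA.isEmpty)
        // deliberately red: idCount was not 0 for the lone, data-free id
        assert(result.idCount == 0)
      }
    }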
 
Next I created a list for the events a, filled with several timestamps from the past days. As I had assumed, the length of the list ended up as the sum of all a's, and I saw a distribution of the a's per day. I repeated the check with slightly different data to see that I got different results matching my mental model. I did, so this part was rather uninteresting for now.
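A sketch of that second step, with a handful of made-up epoch-millisecond timestamps from the past days:

    test("the sum of a equals the number of a timestamps") {
      val now = System.currentTimeMillis()
      val day = 24L * 60 * 60 * 1000
      // three a events spread over the past two days, made-up data
      val aEvents = List(now - 2 * day, now - day, now)
      val result = calculate("some-id", aEvents, Nil, Nil)
      assert(result.sumA == aEvents.length)
    }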
 
Assuming the same model would apply to the events b and c, I checked those and was surprised: b and c worked slightly differently. One of the sums didn't match my model. Again I left the red check in place as a basis for discussion.
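The matching b check, left red on purpose as a basis for that discussion (same invented names and data as before):

    test("the sum of b equals the number of b timestamps") {
      val now = System.currentTimeMillis()
      val bEvents = List(now - 1000L, now)
      val result = calculate("some-id", Nil, bEvents, Nil)
      // deliberately red: sumB did not match the length of the input list
      assert(result.sumB == bEvents.length)
    }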
 
A bit of knowledge about the different events told me that the chronological order is significant for a couple of the values of the Numbers object. However, I was not completely sure, so I tried out some combinations that I figured were interesting. I started with finding out whether there is a difference between an event happening after another one and an event missing completely. I marked anything that seemed off or could be interpreted as ambiguous with a red check.
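One of those probes could be sketched like this; orderSensitiveValue is a placeholder of mine for whichever order-dependent value the real Numbers object carried:

    test("b after a differs from b missing completely") {
      val withB    = calculate("some-id", List(1000L), List(2000L), Nil)
      val withoutB = calculate("some-id", List(1000L), Nil, Nil)
      // red if the two cases look suspiciously alike
      assert(withB.orderSensitiveValue != withoutB.orderSensitiveValue)
    }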
 
Thus I went on for a while longer. In the end I had a mixture of red checks and green checks left in my wake. The green checks were not useful for now, as I didn't really know anything about them other than that the current implementation created the output in their expectations. I kept them around, but I knew I would have to go through them with more information to figure out their value and whether we wanted to keep them. In case of the planned refactoring of the class containing my calculation method, they could even serve as a test harness to ensure that we changed the structure but not the behaviour. The red checks, however, were a great starting point for the work with my most important oracle: my team.
 
 
Using the output of my explorations
 
Now the part that I consider the hard one began: finding the right information. Every red check was the start of a conversation with my developers. Pretty quickly we found out that a couple of the things I had found were indeed wrong behaviour and had to be fixed, while a couple of others were working as expected according to the developers. Yet others were trickier: the developers did not know or disagreed. Here we worked together and used sources like our PO (in further steps we could also have used people from the business side and customer voices) to resolve most of the rest. Some tricky ones we kept open for investigation within the team, as we didn't know yet which rule made the most sense.
 
I kept every red check that turned out to be a bug and left the developers the task of fixing the code so the check would pass. Every check that had been red but whose behaviour had proven correct got a facelift and a new expectation, so it went green. In both cases I verified that the checks would still pass with different inputs but the same underlying model. In the cases we were unsure about, I kept the checks around but marked them as ignored until we had figured them out. I'm usually not a fan of ignored checks, but this gave me the possibility to keep them around and mark them as unsolved without marking them as red, which I read as "I know it's wrong", unless they're just on my machine.
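In ScalaTest, parking such an unresolved check is a matter of swapping test for ignore, which keeps the code around without reporting it as red (again a sketch with my invented names):

    // parked until the team has agreed on the intended rule
    ignore("a and b on the exact same timestamp both count") {
      val result = calculate("some-id", List(1000L), List(1000L), Nil)
      assert(result.sumA == 1 && result.sumB == 1)
    }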
 
 
Bottom line
 
I wrote unit tests. But I also explored. We found bugs. We investigated. I learned a lot, and I think my team did, too. The visible result of the work is a set of unit tests that we rerun on our build server any time someone pushes code to that repo. I like this kind of automation: the checks are a result of tests. We just keep around the executables that we write while we explore, to be reused later on.
 
By the way, this was already (one of) my new year's resolution(s): I wanted to find out whether this is possible, having unit tests as an outcome of exploratory testing. I thought it would be harder, so I do understand if people doubt there can be a connection.
 
Comments are welcome. Go ahead and tell me if this was useful to you or if I have overlooked something.
