
More tests are always better? How to use AI to identify tests that bring little value
Added: 2026-06-02

335 views | 138:29 | Alliance4Qualification | Original release: 2026-05-29

AI test clustering uses large language models to identify and remove redundant tests from test suites. It builds on the insight that similar tests tend to fail together when a bug is introduced. In the studies presented, keeping about 13% of a test suite still found about 90% of the bugs, which translates into roughly 90% less maintenance effort and execution time while preserving most of the bug-finding power.

[00:00:01]AI systems are different. They learn.

[00:00:03]They evolve. They behave probabilistically.

[00:00:07]Testing them requires a new level of expertise. The ISTQB Certified Tester AI Testing certification focuses exclusively on validating AI-based systems, from data quality and model behavior to bias, robustness, and life cycle risks.

[00:00:24]Structured around the machine learning life cycle, CTAI equips professionals to assess trust, reliability, and safety in AI solutions. As AI adoption accelerates, quality assurance must evolve with it. For those ready to test beyond traditional software, AI testing begins here.

[00:00:44]Discover the ISTQB CTAI certification.

[00:00:48]>> Hello and welcome back to this next session of the A4Q Testing Summit 2026.

[00:01:24]We are now dealing with AI, and I'm welcoming Fabian Streitel from T-Systems or TeamScale.

[00:01:33]>> [laughter] >> Great to have you here in this session, and I'm really curious what you will talk about today.

[00:01:41]I've looked at the title.

[00:01:43]You will talk about whether more tests are always better and how we can use AI to identify tests that bring little value, or maybe more value. So I think this is very interesting for the whole audience and for all of us testers here.

[00:02:01]Before we start, a reminder for the audience: please don't forget that you can vote for the best speaker and best presentation via the website. Also take the opportunity to ask questions during the talk, and we can deal with them at the end of the presentation. And now, Fabian, the stage is yours.

[00:02:22]>> Well, thank you very much and hello everybody. Thanks for joining my presentation.

[00:02:28]Now, as testers, we all know that having more tests is not automatically better.

[00:02:34]Having more tests means that your test suite, as it grows, takes longer and longer to run. So when you introduce a new bug into the system, you have to wait longer and longer until you find out, and that can be painful.

[00:02:45]The more tests you have, the more maintenance effort you obviously have.

[00:02:48]So whenever something changes in the application, there are more tests that you need to adjust and touch.

[00:02:54]And finally, if you ever want to switch from one automation technology to a different one, then the more tests you have, the more effort that will be as well.

[00:03:03]So, there are lots of reasons why huge test suites can be problematic and costly.

[00:03:09]At the same time, as testers, we have often experienced that not all of our tests are created equal. Some are much more valuable than others.

[00:03:17]And so the obvious question that arises is: can't we get rid of just the low-value tests in our test suite, to get all the benefits of a much leaner test suite without sacrificing much bug-finding power?

[00:03:31]And that is something that has been asked a lot in computer science research where this problem is called test suite minimization.

[00:03:38]And today, I will show you one of the newest approaches out of that research field called AI test clustering that helps you shrink a test suite without losing bug finding power.

[00:03:49]I will show you both the research behind it and how we've used it in different industry teams to actually shrink test suites by quite a lot.

[00:04:01]But before we jump into that, a little bit about myself. Who am I? My name is Fabian Streitel. I work at TeamScale, where I lead the team called Test Intelligence. That means I basically wear two hats all the time. One is the researcher hat, where I help supervise bachelor's and master's theses at different universities to find new ways to make testing more efficient and more effective.

[00:04:27]And we get a lot of interesting research results out of that, but unfortunately, many times something that looks good on paper in a research environment is actually really hard to put into practice.

[00:04:38]And so I always also wear the sort of hard hat here, where for the last 8 years we've always tried to bring these research results into industry projects to really make sure that in the end they have an impact for us practitioners.

[00:04:51]And the approach that I show today, AI test clustering, it has passed both of these tests. So we do have excellent research results, but we've also used it in multiple industry projects to show that it actually works.

[00:05:04]Now full disclaimer, TeamScale of course also implements AI test clustering, but today is not about the tool, it's about the research, it's about how to use it.

[00:05:13]And of course you can also do AI test clustering without our tool.

[00:05:19]So, the more tests the better, right?

[00:05:21]Tests bring lots of great value: they help us find bugs and they make our software quality better. But tests don't just come with upsides.

[00:05:31]They also have downsides and costs.

[00:05:33]Like for example, the maintenance effort we have to put into them.

[00:05:36]If you have hundreds or even thousands of tests, then whenever something changes in the application, you're going to have to touch some of them to make them work again. And over time, that maintenance effort, that cost compounds.

[00:05:48]At the same time, the bigger the test suites grow, the longer they take to run, and many of the teams that I work with that have test suites that have grown over many years are now at the stage where the test suites take hours, days, and sometimes even weeks to execute.

[00:06:05]And you can imagine that that is painfully slow when you introduce a new bug and then you have to wait until next week to find out. And that makes bug fixing that much more costly.

[00:06:16]And finally, if you have manual tests that you want to automate, or if you want to switch from one automation technology to a different one, then those are sort of migrations, and you have to pay a migration cost. And the bigger the test suite, the bigger the cost because you have to migrate more tests.

[00:06:33]So, there's a lot of costs that are maybe not initially apparent, but once you get to a certain size of a test suite, it can become quite painful.

[00:06:43]At the same time, we've all experienced that our tests are not exactly equally valuable.

[00:06:49]So, there are some tests that maybe never fail, so they don't really provide us any meaningful signal about bugs.

[00:06:56]There may be tests that only find bugs that aren't really relevant, or tests that cluster: if one of them fails, all the others fail as well. So the added value of the second, third, fourth, or fifth test is really not that great.

[00:07:11]And finally, maybe there are even tests that test practically the same thing as another test, so do they really contribute anything to our test suite?

[00:07:20]Many different reasons why there could be low-value tests in our test suite, and the bigger our test suites grow, the more likely that we have those, right?

[00:07:29]So the obvious question is: given that a huge test suite carries a lot of cost, and some tests don't really pull their weight, can't we get rid of some of those low-value tests? That way we reap the benefits of a much leaner test suite, reduce those costs and the effort we have to put into it, and can put that effort to better use in the future, for example by writing more high-value tests or refactoring our tests.

[00:07:56]And this problem in computer science research is called test suite minimization.

[00:08:01]So you give me a test suite, and I try to strategically pick a few tests out of it such that we get a much smaller suite that is faster to execute, has less maintenance effort, and so on.

[00:08:13]And I hope to do this in a way where the reduced, minimized test suite has practically the same bug-finding power as the original one.

[00:08:25]And that is a problem that has been studied quite well. So over the past 10 years, there have been hundreds of research papers on this, and many of them produce really good results.

[00:08:34]For industry test suites, these approaches achieve large minimizations.

[00:08:38]For example, you keep only 10% or 20% of the test suite but retain almost all of the bug-finding power.

[00:08:44]And that looks really good on paper, but unfortunately it's exactly one of those examples I mentioned at the beginning where the transfer from research paper to industry is really hard, because these approaches usually rely on data that is very hard to get for industry systems. We're usually talking on the order of a whole year just to gather the data, which is usually prohibitively expensive, so no one really does this in practice.

[00:09:11]Now fortunately, in the last few years, there's been a great development with all these AI systems and large language models like ChatGPT, Claude, Gemini, and so on.

[00:09:20]And that has opened up a new idea, namely can we use these large language models to perform this minimization?

[00:09:27]And that is called AI test clustering.

[00:09:30]And that actually has the nice property that it works almost as well as these traditional approaches, but is really, really easy to apply. And that's what makes me really excited about this because for the first time this makes test suite minimization accessible also to pretty much any industry project, even if you have a large legacy test suite that has grown over the last 10 years.

[00:09:53]So what I want to do today is show you how this works, what the research behind it is, how we validated it, and finally some examples of how this is used in industry projects.

[00:10:10]So, obviously, when we want to pack so much bug finding power in a very small amount of our tests, we'll need a good insight into our test suite to be able to do this.

[00:10:21]And the insight that we're using here for AI test clustering is that for test suites that have grown over many, many years, there's usually a high amount of redundancy in the test suite.

[00:10:33]What do I mean by that?

[00:10:35]As these test suites grow over years and years, they usually grow by copy and paste.

[00:10:41]Meaning if you have a test suite and you need a new test, you're looking for something that is already there, a test that already exists, and does something similar. And then you copy it, and in the copy you make some modifications.

[00:10:54]And then you end up with two test cases that spend 80 to 90% of their execution time doing the same thing and only 10 to 20% doing something different.

[00:11:07]Now, that's a great way to quickly create test cases, but over many years you probably don't do this just once; you do it many, many times. Then you end up with a test suite containing large clusters of tests that spend 80 to 90% of their time doing the same thing and only 10 to 20% doing something different.

[00:11:26]And that means we're spending a lot of our test suite runtime and a lot of our maintenance effort on tests where each additional test provides only marginally more value.

[00:11:38]Let me show you this in a different way.

[00:11:42]If you imagine you have such a test suite and you run those tests, then some of those will fail, all right?

[00:11:49]So, there will be some bugs in the software system. Different bugs are depicted here with different colors. So, in this case, there's three bugs, a blue bug, an orange bug, and a purple bug, and they cause some of these tests to fail, which is the red X here.

[00:12:04]And what we usually see with these copy and paste grown test suites is that it's usually not the case that if you have one bug in the system, it will cause exactly one test case to fail, but rather what happens is that one bug causes a whole bunch of tests to fail at the same time. Like you see here, the blue bug causes four different tests to fail at the same time.

[00:12:28]But to debug and fix the blue bug, we would only need one of those four, right? If we only had the failure up here, then we could debug the blue bug, we could fix it, and then the other three test failures would be fixed as well.

[00:12:43]And that's the thing we can use to minimize our test suite.

[00:12:46]If we keep only one or two tests out of each of these large clusters of copy-and-pasted tests, we get a much smaller test suite that still has practically the same bug-finding power.
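The failure pattern described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical failure data: the blue bug failing four tests mirrors the slide, while the test names and the sizes of the orange and purple clusters are made up.

```python
# Hypothetical failure data: which tests fail for which bug.
# Each bug makes a whole cluster of near-duplicate tests fail together.
failures = {
    "blue": {"t1", "t2", "t3", "t4"},
    "orange": {"t5", "t6"},
    "purple": {"t7", "t8", "t9"},
}

def bugs_detected(kept_tests, failures):
    """Return the bugs for which at least one kept test fails."""
    return {bug for bug, failing in failures.items() if failing & kept_tests}

all_tests = set().union(*failures.values())
# Keep just one representative from each cluster of co-failing tests.
reduced = {"t1", "t5", "t7"}

print(len(all_tests), len(reduced))               # 9 3
print(sorted(bugs_detected(all_tests, failures)))  # ['blue', 'orange', 'purple']
print(sorted(bugs_detected(reduced, failures)))    # ['blue', 'orange', 'purple']
```

A third of the tests still reveals every bug here, because the dropped tests only ever failed alongside a kept one.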

[00:12:58]Shown a different way, now this is an example from industry. This is a medical device.

[00:13:05]So, hardware device with some software in it, and these are the real end-to-end tests that involve the whole hardware.

[00:13:12]And as you run those tests, if you run them with no parallelization, you can see at the bottom here, it takes about 6 days to run all the tests.

[00:13:20]And what we've drawn here is a graph where after each test case has finished executing, we record how much test coverage have we achieved so far.

[00:13:28]From 0% to 100%, which means we've covered everything that the entire test suite could potentially cover.

[00:13:39]So, you can see how the test coverage builds up over time and we hit 100% here at the end.
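A coverage curve like the one on the slide can be computed by recording the cumulative coverage after each test in execution order. This is a minimal sketch, assuming per-test coverage is available as sets of covered method IDs; the test and method names are hypothetical.

```python
from typing import Dict, List, Set

def cumulative_coverage(order: List[str], covers: Dict[str, Set[str]]) -> List[float]:
    """Fraction of the whole suite's coverage reached after each test in `order`."""
    total = set().union(*covers.values())
    seen: Set[str] = set()
    curve = []
    for test in order:
        seen |= covers[test]
        curve.append(len(seen) / len(total))
    return curve

covers = {
    "login_a": {"auth.check", "ui.login"},
    "login_b": {"auth.check", "ui.login"},  # copy-and-paste of login_a
    "login_c": {"auth.check", "ui.login"},  # another copy
    "export": {"report.build", "report.pdf"},
}
# Copies run back to back: the curve plateaus.
print(cumulative_coverage(["login_a", "login_b", "login_c", "export"], covers))  # [0.5, 0.5, 0.5, 1.0]
# Dissimilar tests first: full coverage after two tests.
print(cumulative_coverage(["login_a", "export", "login_b", "login_c"], covers))  # [0.5, 1.0, 1.0, 1.0]
```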

[00:13:45]And I think what you probably all already noticed is these huge plateaus here.

[00:13:50]Where we're running hundreds of tests one after another, in this case for almost two days straight, but test coverage practically doesn't increase.

[00:13:58]Why is that?

[00:14:00]Because these are exactly the redundant copy-and-paste tests, where each test spends 80, 90, even 99% of its time doing the same thing that an earlier test has already done.

[00:14:12]So, you can really see this redundancy that's in this particular test suite.

[00:14:17]Now, watch what happens when we sort this test suite by dissimilarity.

[00:14:23]So, put all the redundant tests at the end and all very different tests in the beginning. And then do the same coverage recording. And you can see how for this particular system, we reach almost 100% of the test coverage with less than 5% of the run time.

[00:14:40]So, you can now really see the redundancy in that particular software system, right?

[00:14:45]And the idea of AI test clustering is now: if we want to reduce the test suite and really want fewer tests, then we should take tests from the green part rather than from the red part, because those provide much more value than the red ones.

[00:15:02]And that is what we want to do with AI test clustering. So, sort the tests by dissimilarity, and then you can of course decide where the line should be drawn, right? Do you want to draw it here where we reach 100%? Do you want to go a bit further to the right? Depends on this particular software system.

[00:15:18]I have some guidelines on that later.

[00:15:23]So, next question, how do we sort tests by dissimilarity?

[00:15:27]Well, first, we need to put all of our tests into a vector space.

[00:15:32]What that means is: in this picture we have a 2D vector space, because the slide is 2D, obviously, and every bubble here is a test case.

[00:15:41]And all that means is we assign coordinates to those test cases.

[00:15:46]And we do it in a way where similar test cases get similar coordinates.

[00:15:51]So, in this particular example, these five test cases at the top, these are probably created by copy and paste because they got very similar coordinates.

[00:16:00]And now to pick the most dissimilar test cases, we can start with any one of those.

[00:16:05]We pick it, and then we pick always that test case next that is furthest away in the vector space because that will be most dissimilar.

[00:16:13]So, this particular one is furthest away from the first one, and this one is furthest away from the first two, and this one is furthest away from the first three that we picked, and so on.

[00:16:24]And we can do this as long as we want until the budget is full that you have defined.

[00:16:30]So, if you say I want to keep 30% of my tests or 70% of my tests, then that's how long we do it.
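The greedy procedure just described is a form of farthest-point (max-min) selection. The sketch below is one possible implementation, assuming each test already has coordinates and using Euclidean distance; the function name, the `budget` parameter, and the toy coordinates are illustrative and not taken from TeamScale or the underlying research.

```python
import math
from typing import Dict, List, Sequence

def select_dissimilar(coords: Dict[str, Sequence[float]], budget: float, start: str) -> List[str]:
    """Greedy max-min selection: keep picking the test farthest from all tests picked so far.

    `budget` is the fraction of tests to keep, e.g. 0.3 keeps about 30%.
    """
    k = min(len(coords), max(1, round(budget * len(coords))))
    picked = [start]
    picked_set = {start}
    # Distance from each test to its nearest already-picked test.
    nearest = {t: math.dist(c, coords[start]) for t, c in coords.items()}
    while len(picked) < k:
        candidate = max((t for t in coords if t not in picked_set), key=lambda t: nearest[t])
        picked.append(candidate)
        picked_set.add(candidate)
        for t, c in coords.items():
            nearest[t] = min(nearest[t], math.dist(c, coords[candidate]))
    return picked

# Five near-duplicates at the top left, plus three tests elsewhere.
coords = {
    "t1": (1.0, 9.0), "t2": (1.2, 9.1), "t3": (0.9, 8.8), "t4": (1.1, 9.2), "t5": (1.0, 8.9),
    "t6": (9.0, 1.0), "t7": (5.0, 5.0), "t8": (8.8, 1.2),
}
print(select_dissimilar(coords, budget=0.5, start="t1"))  # ['t1', 't6', 't7', 't8']
```

With a 50% budget, only one of the five near-duplicates survives. Note that Python's `round` rounds halves to the nearest even number, so a real tool might prefer an explicit rounding rule for the budget.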

[00:16:38]Now, the important point here is, of course, we need to assign similar coordinates to similar tests.

[00:16:46]And that's where the AI comes in.

[00:16:49]So that's where we use large language models like Claude Code, Gemini, ChatGPT, and so on.

[00:16:57]But maybe not in the way that you think because we're not just taking all the test cases, putting those in a prompt, and writing, "Please give me the most dissimilar ones." That actually turns out to not work in practice.

[00:17:09]Instead, what we do is use only the first part of these large language models.

[00:17:13]Because what is the first thing that ChatGPT does when you throw any text at it, like a test case?

[00:17:19]Let's take a very simple example like only the word man, then it turns that into a vector of numbers like you can see here.

[00:17:26]Where each entry corresponds to one category of meaning that the model knows. So for example, how likely is it that the word man in the context of the text that it's in refers to human beings.

[00:17:40]Of course, this slide is a huge simplification: in real large language models, these vectors don't have just seven entries, they have hundreds of entries.

[00:17:49]And we don't really know what the semantic category of each entry is, whether it's about human beings, royalty, being a verb, and so on, because these categories are not given to the models a priori during training.

[00:18:06]Rather, they emerge naturally from the training process as the model consumes the vast amounts of text it is fed.

[00:18:16]But we don't need to know what these categories mean. All we need is that similar words get similar vectors.

[00:18:24]So that we can then use these vectors as the coordinates for our vector space.

[00:18:28]So for example, you can again very simplified see this in 3D here where man and woman are sort of on the same trajectory and king and queen as well.

[00:18:36]And that is how these so-called embedding vectors actually behave. They have to: otherwise, large language models couldn't capture the intent behind a text and couldn't reason about it.

[00:18:49]So that's a great property that we can use. So basically we get these embedding vectors for our tests and then we can run the approach as I showed with the clustering.
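To make the "similar tests get similar vectors" idea concrete, here is a sketch that compares tests by the cosine similarity of their vectors. The approach in the talk obtains the vectors from an LLM embedding model; since that needs a model or an API, the `toy_embed` function below is a hypothetical stand-in (a hashed bag of words) that only captures shared words, not meaning. The test texts are invented.

```python
import hashlib
import math
import re
from typing import List

DIM = 64  # real embedding models produce far larger vectors

def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an LLM embedding: a hashed bag of words.

    It only captures shared words, not meaning, but is enough to show
    that near-duplicate tests end up with similar vectors.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

tests = {
    "order_visa": "open shop; add item to cart; checkout; pay with visa; assert order confirmed",
    "order_mastercard": "open shop; add item to cart; checkout; pay with mastercard; assert order confirmed",
    "admin_users": "log in as admin; open user list; delete user; assert user removed",
}
vectors = {name: toy_embed(text) for name, text in tests.items()}
print(round(cosine_similarity(vectors["order_visa"], vectors["order_mastercard"]), 2))
print(round(cosine_similarity(vectors["order_visa"], vectors["admin_users"]), 2))
```

Swapping `toy_embed` for a call to an actual embedding model would leave the rest unchanged; the resulting vectors then serve as the coordinates for selecting dissimilar tests.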

[00:18:58]And the cool thing is that this works equally well for manual test steps and for automated test code, because these models are trained on both code and natural language.

[00:19:14]And the only thing that's left is you need to decide how many tests you want to keep. And what comes out of the approach is the list of the most dissimilar tests, and that's a really good starting point for deciding which tests to keep.

[00:19:33]Now, that is how it works, but the obvious next question is how well it works. And that's where the research comes in. We spent a lot of time trying this out in different contexts with different universities to see how well the approach performs on both open-source and industry systems, on real bugs from the history of those systems, and on fake bugs that we injected into those systems with a technique called mutation testing.

[00:20:04]With all of this data from the many different software systems we studied, we could judge the approach.

[00:20:12]We can check how much of my test suite I have to keep, at minimum, to reach a certain amount of bug-finding power.

[00:20:22]So we have hundreds of bugs that we're evaluating on, and you can say: I want to find at least 90% of them. Then how small can I make the test suite?

[00:20:31]And then on average over all these many systems, it turns out you need about 13% of your test suite to find 90% of the bugs.
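The evaluation question above can be sketched as follows: given the order in which the approach ranks tests and which bugs (real or injected mutants) each test detects, find the smallest prefix of the suite that finds a target share of the bugs the full suite finds. The function name and data are made up for illustration and are not from the studies.

```python
from typing import Dict, List, Set

def min_fraction_for_target(order: List[str], detects: Dict[str, Set[str]], target: float) -> float:
    """Smallest fraction of `order` whose prefix finds at least `target` of the bugs the full suite finds."""
    all_bugs = set().union(*detects.values())
    found: Set[str] = set()
    for i, test in enumerate(order, start=1):
        found |= detects.get(test, set())
        if len(found) >= target * len(all_bugs):
            return i / len(order)
    return 1.0

order = [f"t{i}" for i in range(10)]  # tests ranked most dissimilar first
detects = {
    "t0": {"b0", "b1", "b2", "b3", "b4"},
    "t1": {"b5", "b6", "b7"},
    "t2": {"b8"},
    "t3": {"b9"},
    "t4": {"b0", "b1"},  # redundant with t0
}
print(min_fraction_for_target(order, detects, 0.9))  # 0.3
print(min_fraction_for_target(order, detects, 1.0))  # 0.4
```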

[00:20:41]And that is quite the trade-off. I want to remind you what that means: you have about 90% less maintenance effort and 90% faster test execution. If you need to migrate the whole test suite to a different technology, that's 90% of tests that you don't have to migrate, and you still find almost the same amount of bugs.

[00:21:00]And the great thing is that we can play with this percentage value, because the budget is defined by you. So if you want higher bug-finding power, all you need to do is move the slider, and then for the systems in the particular studies that we did, if you go up to, for example, 25%, you're already at 99% of bug-finding power.

[00:21:22]So, you can see how this can be a really powerful approach to shrink down a test suite and not have to worry that you're losing much bug finding power.

[00:21:34]That's the research, but of course the jump to practical application is quite a big one, right? I think maybe everyone who's listened so far has had a sort of bad gut feeling about throwing away tests.

[00:21:46]And I think that's quite natural because it always feels like we're losing a lot of the security net that we built over all these years with our test suites.

[00:21:54]So, usually it takes more to convince people to throw away tests than just some research study. You need to see it for your own particular system that the approach is working for your test suite, right? So, when we approach teams with this kind of approach, let's say okay, we want to try this, then there's usually some steps we take to show that it actually works on the particular system.

[00:22:19]In the end, it will be sort of a management decision because you're balancing two types of costs, right? You're balancing all the maintenance effort, the slow feedback, the automation cost, and so on versus the cost that happens when one or two bugs slip through that you would have caught with the whole test suite.

[00:22:37]And so, in the end, how much you're going to throw away really depends on how big the pain is over here, right?

[00:22:44]So, that will be different for everyone.

[00:22:47]But I'll show you how we can make the trade-off more visible and give you more confidence that it's a trade-off worth making.

[00:22:58]All right. So, what can we do if we have your particular software system, your test suite, to make sure that the approach is working well there.

[00:23:05]Well, the first thing we do is run the approach and cluster your test suite.

[00:23:09]So, we get these clusters for your test suite and we can look at these clusters.

[00:23:12]So, we can check are there enough clusters? Are they big enough so that we have a good chance that there is enough redundancy in your test suite that it's worthwhile to run the approach.

[00:23:22]If that looks good, it still doesn't tell us anything about the selection the approach makes: which tests is it picking?

[00:23:30]So we can look at that and make sure that no large functional areas remain uncovered by tests, and that the selection is uniform across all the clusters we have.
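Both checks, cluster sizes and whether every cluster keeps at least one test, can be sketched once cluster labels are available. This assumes the labels come from some clustering of the embeddings (the talk does not specify the algorithm) and that `selected` is the set of kept tests; all names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def cluster_report(labels: Dict[str, int], selected: Set[str]) -> Dict[int, Tuple[int, int]]:
    """Map each cluster to (number of tests in it, number of those tests selected)."""
    sizes = Counter(labels.values())
    kept = Counter(labels[t] for t in selected)
    return {c: (sizes[c], kept[c]) for c in sorted(sizes)}

def uncovered_clusters(report: Dict[int, Tuple[int, int]]) -> List[int]:
    """Clusters from which not a single test was selected."""
    return [c for c, (_, n_kept) in report.items() if n_kept == 0]

labels = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "c1": 2}
report = cluster_report(labels, selected={"a1", "b2"})
print(report)                      # {0: (3, 1), 1: (2, 1), 2: (1, 0)}
print(uncovered_clusters(report))  # [2]
```

Large cluster sizes hint at redundancy worth removing; any cluster in the uncovered list points to functionality the selection would leave untested.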

[00:23:44]Next step we can do is we can make some coverage measurements.

[00:23:48]That means we measure the coverage of the whole test suite and then the coverage of one or two of these reductions.

[00:23:55]And then we can compare because obviously if there's large gaps in the coverage after we do the reduction, then for those we will lose bug finding power.

[00:24:04]So we want to see a reduction down to, for example, 30% or 20%, but still virtually the same test coverage.
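The comparison of the two coverage measurements can be sketched as a ratio plus a list of gaps. This minimal version assumes per-test coverage is available as sets of covered method IDs; the names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def coverage_retention(covers: Dict[str, Set[str]], selected: Set[str]) -> Tuple[float, List[str]]:
    """Share of the full suite's coverage kept by `selected`, plus the items that become uncovered."""
    full = set().union(*covers.values())
    reduced = set().union(*(covers[t] for t in selected))
    return len(reduced) / len(full), sorted(full - reduced)

covers = {
    "t1": {"Auth.login", "Auth.logout"},
    "t2": {"Auth.login", "Auth.logout"},  # copy of t1
    "t3": {"Export.pdf"},
    "t4": {"Import.csv"},
    "t5": {"Auth.login"},
}
print(coverage_retention(covers, {"t1", "t3"}))  # (0.75, ['Import.csv'])
```

The gap list is what you would inspect to decide whether to raise the budget or write a new, targeted test.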

[00:24:13]And then finally, those are all proxies for what we really want to measure. We can of course look at the real thing and look at bugs.

[00:24:21]And that means looking at how the particular selection performs over a longer period of time for the bugs that actually happen in your development.

[00:24:30]And you can see how the cost of each of these steps increases as we go further down. Looking at the clusters costs the least, and looking at real bugs takes some time and effort. So the more security you need, the more effort you need to put in, but in the end it's not that much. And I'll show you examples from real industry systems now.

[00:24:52]So, let's start with these clusters.

[00:24:55]Now, this is not modern art and it's not a picture of an alien baby spitting out a fireball. These are 6,600 tests from a test suite.

[00:25:04]And in the colors, you can see the clusters that were detected.

[00:25:09]Now, beware. These are the embeddings, and remember that the embeddings are high-dimensional vectors, around 100 dimensions, while this is a 2D picture. So what we've done here is squash those 100 dimensions down to two with some smart approaches from other parts of computer science research.

[00:25:28]And you have to take these pictures with a grain of salt because, of course, you can't do this compression without losing information. So the spatial arrangement of the embeddings is not exactly what it is in 100 dimensions, but that's not really what we're looking at here anyway.

[00:25:44]What we're interested in is are there enough different clusters in this test suite?

[00:25:49]Which I would say yes, because, as you can see, there's lots of different colors.

[00:25:53]So, there's lots of different clusters.

[00:25:55]And then second, are these clusters large enough?

[00:25:58]So you can see there are hundreds of tests here in orange, in purple, in blue, and lots of tests in red as well. So I would also say that yes, there are large enough clusters, which means there are lots of similar tests, which means there's probably a lot of redundancy. And that's what the approach uses, so that's a good signal.

[00:26:16]So, for this particular test suite, I would say that's a good first step.

[00:26:21]But it doesn't tell us anything about which tests are going to get selected.

[00:26:25]So that's what we do next. I'll show you a short video of a 3D visualization of these embeddings.

[00:26:31]And in red, you can see all the tests that we want to keep, and in blue, all the tests we want to throw away. In this case, for example, let's keep 25%.

[00:26:40]Now, the thing to notice is there's a little bit of red everywhere, right? There's no big areas where there's lots of blue tests and no red tests.

[00:26:50]And that means that the selection is working. We're really picking from all over that let's call it semantic space that we've mapped out and not leaving any particular functionality completely uncovered.

[00:27:04]So that is a second good indication that the approach is working for that test suite.

[00:27:10]But the thing we're really after is bugs and there's two things we can look at for that.

[00:27:15]Coverage, which our research has shown for this particular problem is a good proxy for bug finding power, and then the actual ground truth, the bugs themselves.

[00:27:26]Now for coverage, we just make two measurements: one for the whole test suite, to see what it covers, and one for the reduction.

[00:27:35]And I'm going to show this for a different industry system, but to visualize this, I need a good visualization for test coverage.

[00:27:41]And what I like to use here is this.

[00:27:43]This is called a tree map.

[00:27:46]And you have to imagine this like a map of a software system. So this is 1 million lines of code and each rectangle here represents one component in the code.

[00:27:55]This is a business information system, so you've got some UI dialogues, you've got authentication, data validation over here, and so on.

[00:28:03]The more lines of code in a component, the bigger the rectangle.

[00:28:09]And what we're going to do is we're going to drill into those rectangles, and first we're going to draw all the classes in each component, and then we're going to draw all the methods in each class.

[00:28:21]And then you have all of the executable code of this particular software system on a single slide.

[00:28:27]And what we can do is we can use colors to show what we know about the test coverage for that code.

[00:28:34]And then we compare the full run versus the reduced run.
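The treemap coloring boils down to a per-component comparison of the two runs. A minimal sketch, assuming covered methods are identified by hypothetical IDs of the form `component.Class.method` and that both runs' coverage is available as sets:

```python
from collections import defaultdict
from typing import Dict, Set

def component_retention(full_covered: Set[str], reduced_covered: Set[str]) -> Dict[str, float]:
    """Per component: share of methods covered by the full run that the reduced run still covers."""
    total: Dict[str, int] = defaultdict(int)
    kept: Dict[str, int] = defaultdict(int)
    for method in full_covered:
        component = method.split(".", 1)[0]
        total[component] += 1
        if method in reduced_covered:
            kept[component] += 1
    return {c: kept[c] / total[c] for c in sorted(total)}

full = {"auth.Login.check", "auth.Login.lock", "ui.Dialog.open", "ui.Dialog.close", "validation.Rules.run"}
reduced = {"auth.Login.check", "auth.Login.lock", "ui.Dialog.open"}
print(component_retention(full, reduced))  # {'auth': 1.0, 'ui': 0.5, 'validation': 0.0}
```

A component at 0.0 corresponds to one of the large red areas on the treemap: code the full suite covers but the reduction no longer reaches.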

[00:28:40]And this is a reduction for the industry system, the first one that we made when we started working with this one, where we picked 10% of the tests.

[00:28:48]So, everything that is green is covered by those 10% and everything that is red is only covered by the things that we would want to throw away.

[00:28:56]And you can see here that we would preserve about 60% of the original test coverage when we keep only 10% of the tests.

[00:29:05]And as nice as it is to get 60% of the coverage with just 10% of the tests, it's obviously not enough. We've got all of these big red spots here, whole components that are not covered anymore.

[00:29:14]So, that's that's not acceptable. That's not what we want.

[00:29:19]What can we do?

[00:29:20]As I said, you just turn the slider a little bit more. So, we can try higher percentages to keep more tests and see if the test coverage improves. So, the next thing that we tried for this system was keep 20%.

[00:29:34]And you can see how the picture improves drastically here.

[00:29:37]So, now there's not these whole big components that are uncovered, but only single files here and there.

[00:29:43]And in fact, this is 92% of the original coverage with just 20% of the tests.

[00:29:50]And now you could say, "Okay, that's still not enough for me. I want 30%. I want 40%."

[00:29:55]For this particular team, this was already a great result because they had to migrate all of those tests to another testing technology and that would have taken hundreds of person days. And so, if they could migrate only 20% and get 90% of the coverage, that's already pretty cool. And what they did in addition is an analysis of all the red spots and they decided to write new, better tests for those. So, they added like 10, 20 additional tests here.

[00:30:23]So, you can see how you can sort of play around with these percentages to find the right trade-off given a particular software system and test suite.

[00:30:33]And also how this reduction really frees up maintenance and migration effort that we can put into better new tests.

[00:30:46]So, we've seen coverage.

[00:30:48]For many people, that's already enough to decide to do this, but sometimes you want more security. So, there's a final step that we can do to show what the bug finding power actually looks like.

[00:30:59]In our research, coverage really correlated with bug-finding power for all of the systems we observed. So I think for most cases this should be enough, but if you need the additional security, you can do the last step.

[00:31:10]And here what I like to recommend is to do what I call quarantine and evaluate.

[00:31:15]And what that means is instead of throwing away the red part right away, let's just run it less frequently than the green part.

[00:31:24]For example, run the green part, say 20% of your tests, Monday through Friday, and you will see that most bugs are already caught there. On the weekend, run everything, and then you can see which bugs are caught only by the red part.

[00:31:39]If you do this for a few weeks, it really lets you experience what it's like to run with only the reduced test suite.

[00:31:46]Then maybe you make a few adjustments based on the results. But so far, whenever we've tried this, the result was that the reduction was actually pretty good.
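Evaluating the quarantine comes down to looking at the full weekend runs and asking which failures the reduced suite alone would not have flagged. A minimal sketch, assuming you record the set of failing test names per full run; the name is hypothetical and it uses a deliberately simple heuristic (a run counts as "missed" only if quarantined tests failed and no kept test did), which is an assumption rather than the evaluation method from the talk:

```python
def runs_missed_by_reduced_suite(
    failures_per_run: dict[str, set[str]], kept: set[str]
) -> list[str]:
    """Return IDs of full runs in which only quarantined tests failed.

    Heuristic: if at least one kept test also failed in a run, the reduced
    suite alone would have flagged that run too.
    """
    missed = []
    for run_id, failing in failures_per_run.items():
        if failing and not (failing & kept):
            missed.append(run_id)
    return missed
```

Each missed run points at quarantined tests worth a second look, which is the kind of adjustment described above.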

[00:31:59]All right.

[00:32:00]I hope I could give you a good impression of how test suite minimization works, and AI test clustering in particular: how easy it is to apply, because all you need as input is your tests and a large language model, and how you can play around with the data to find the right reduction for however big your pain with the current test suite is, and for what makes the most sense based on the data from your particular system.

[00:32:26]And you can either delete those tests, you can refactor them, or you can put them in quarantine, depending on again, what makes most sense for your system.

[00:32:36]If you like this, I invite you to scan the left QR code. You'll get my slides and also future talks from me.

[00:32:43]And if you're facing the particular challenge of migrating from one automation technology to another, you can scan the right QR code, where we have a PDF that explains how AI test clustering works specifically for that migration use case.

[00:32:59]So, thank you very much for listening in.

[00:33:01]And I'll hand it back to Richie and I hope you have some questions for me.

[00:33:08]>> Yeah, thank you very much, Fabian, for this interesting talk. I now have a lot to do with my clients to bring them there.

[00:33:18]>> [laughter] >> Yes, we have some questions here. I hope you can answer them. The first one is: does this only work for unit tests, or also for E2E tests, e.g., Playwright, Tosca, etc.?

[00:33:31]>> Ah, yes. Excellent question. Actually, not only does it work for all of these kinds of test suites, we usually use it only for those, because it usually doesn't make sense to minimize unit tests: they're fast to run anyway and don't carry much maintenance overhead, so the pain and the costs are really small there. The costs are really big for large end-to-end test suites like Tosca test suites, Playwright, and so on, and that's where we use it. In our research we've used it for many different technologies, including, for example, Tosca test suites, Playwright, and Java-based Selenium tests.

[00:34:08]>> Mhm. Okay.

[00:34:11]Yeah, the second one is: aren't the tests you throw away things like our boundary value tests? They are very similar, but don't they also provide a lot of value?

[00:34:23]>> That's a really interesting question, and I actually thought about this a lot when we did the research.

[00:34:29]And I think that yes, these boundary value tests are probably very similar to each other. So, probably the approach would not keep all of your boundary value tests.

[00:34:38]And the thing I realized after a while is why this is not a big problem. In the research results, we see that bug-finding power is still preserved, even though we probably threw away a lot of boundary value tests. I think the reason is that these tests are really valuable when you first develop your feature and want to make sure that all the boundaries are covered and so on.

[00:35:02]How valuable are these tests after two, four, or 10 years?

[00:35:06]Because what I know from the teams I work with is that the kinds of problems you face after 10 years are not that you broke one particular boundary value, but rather that you fixed something in module A and now module B is completely broken.

[00:35:18]And then it's not going to be one boundary value test that fails, but rather all of these related tests are going to again cluster and fail together. And so, if you leave out some of them, there's really not much bug finding power you're losing in practice.
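This argument relies on related tests failing together: as long as each cluster still contains at least one kept test, a regression that breaks the whole cluster is still detected. Hypothetical helpers for checking that property might look like this (the function names and the cluster contents in the usage note are illustrative):

```python
def unrepresented_clusters(clusters: list[list[str]], kept: set[str]) -> list[int]:
    """Indices of clusters in which no test was kept."""
    return [i for i, cluster in enumerate(clusters) if not kept.intersection(cluster)]


def detects(failing: set[str], kept: set[str]) -> bool:
    """True if at least one kept test is among the failing ones."""
    return bool(failing & kept)
```

With a boundary-value cluster `["limit_min", "limit_zero", "limit_max"]` of which only `limit_zero` is kept, a regression that fails the whole cluster is still detected. A regression that hits only one of the removed tests would go unnoticed, which is the trade-off described in the answer.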

[00:35:30]>> Mhm. Okay, thank you. The third one is a question I had myself during the talk: what if I have some functionality that is super important? Can I somehow keep all the tests for that?

[00:35:44]>> Yes, and actually that touches on a point I had to leave out of the talk for time reasons. Of course, this whole selection process is very structural, right? We're only talking about the similarity of tests.

[00:35:56]We don't really take into account how much business value the functionality we're testing has, how critical it is, and so on.

[00:36:03]But that's still something you can bring into the process. You don't have to take the output of the selection at face value. You can make changes, right?

[00:36:12]For example, if you have one particular cluster and it selects three tests, you could say, "Okay, I want to swap test two for another test from that same cluster because I think that's the better test to keep."

[00:36:23]>> Mhm.

[00:36:23]>> Or: here's certain functionality that's super critical, and I don't really want to throw anything away there. Then just keep those tests, right? You can just augment the list.
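Both manual adjustments, swapping a selected test for another one from the same cluster and pinning tests for critical functionality, can be applied on top of the generated list. A minimal sketch with hypothetical names, assuming the clusters are available as lists of test names:

```python
def adjust_selection(
    selected: list[str],
    clusters: list[list[str]],
    swaps: dict[str, str],
    pinned: list[str],
) -> list[str]:
    """Apply same-cluster swaps, then add pinned tests that are missing."""
    # Map every clustered test to the index of its cluster.
    cluster_of = {test: i for i, cluster in enumerate(clusters) for test in cluster}
    result: list[str] = []
    for test in selected:
        replacement = swaps.get(test, test)
        # Only allow swaps within a cluster (tests outside any cluster
        # both map to None and are left unchecked).
        if cluster_of.get(replacement) != cluster_of.get(test):
            raise ValueError(f"{replacement!r} is not in the same cluster as {test!r}")
        if replacement not in result:
            result.append(replacement)
    # Pinned tests (e.g. for business-critical features) are always kept.
    for test in pinned:
        if test not in result:
            result.append(test)
    return result
```

Raising on cross-cluster swaps is a design choice for the sketch: such a swap would silently drop a cluster's representative, which undermines the coverage argument.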

[00:36:32]>> Mhm. Okay. Thank you very much.

[00:36:36]Yeah, thank you, Fabian.

[00:36:38]I think if there are more questions, people can leave them in the comments or write to you, and you'd be happy to answer them then, too.

[00:36:47]Thank you for being part of this conference. And to the audience: don't forget to vote for the best speaker and best presentation. You can see the QR code here, where you can also get to Fabian's page and vote there. I wish you a good rest of the conference. And Fabian, thank you very much for coming.

[00:37:10]>> Well, thank you for having me, and thanks for moderating, Richie.

