AI testing requires understanding the fundamental dichotomy between software development (building artifacts from ideas) and software testing (extracting ideas from artifacts), two processes with anti-parallel information flows. The distinction matters when using AI for testing: generating test cases is actually software development (producing artifacts), while analyzing system behavior to extract insights is true testing (performing an activity). Recognizing this helps practitioners avoid conflating code generation with system comprehension.
The Dichotomy of AI Testing
AI systems are different. They learn, they evolve, they behave probabilistically.
Testing them requires a new level of expertise. The ISTQB Certified Tester AI Testing (CTAI) certification focuses exclusively on validating AI-based systems, from data quality and model behavior to bias, robustness, and life cycle risks. Structured around the machine learning life cycle, CTAI equips professionals to assess trust, reliability, and safety in AI solutions. As AI adoption accelerates, quality assurance must evolve with it. For those ready to test beyond traditional software, AI testing begins here.
Discover the ISTQB CTAI certification.
>> Welcome, everyone, to the A4Q Testing Summit, and thank you for joining us. We are excited to bring together the global testing community for 2 days of insights, practical experiences, and discussion around software testing and quality engineering.
Today, alongside the A4Q sessions, we also have a dedicated ISTQB track in English. Tomorrow, the summit will continue with our A4Q tracks, along with our German track, sponsored by the German Testing Board, and our French track, sponsored by CFTL, with many interesting sessions still waiting for you.
My name is Atifa Amblesi, and it is my pleasure to introduce Youssef Itskovich, co-founder and co-CEO of Exactpro, an independent provider of AI-enabled software testing solutions for financial organizations. Hello, Youssef.
>> Hello.
Thank you for the introduction.
>> Thank you. Youssef is experienced in software engineering and software testing. He is also a co-author of an introductory AI testing guide to the ISTQB CTAI certification.
Today Youssef will explore the relationship between building AI systems and understanding how to test them,
and why these perspectives can often seem like competing approaches in the AI area. If you like Youssef's presentation, please vote for him on our website, and please join me in welcoming Youssef Itskovich.
>> Thank you. Thank you very much.
Exactpro is a software testing business focused on mission-critical financial systems.
We spend a lot of time working with very complex non-deterministic systems.
Today I would like to drill down into why generating tests with AI requires some scrutiny. To do that, I'll go through the concept of a dichotomy and what it is.
I will give some examples from the software testing domain and try to use this framework to explore why it is not ideal to think about the process as specification-to-test.
So, what is a dichotomy?
It is a simple concept, like good versus evil or good versus bad. There are many bad materials on software testing available on the internet. There are some good materials too, and I hope my presentation will provide references to some of the good ones. A dichotomy is a division of something into two distinct, often opposing, parts or categories.
It is when a single idea, group, or concept is split into two contrasting sides.
To know that something is fast, for example that your application is fast, you need to compare it to something. The concept of fast exists only in the presence of the concept of slow. Dichotomies are attractive because they let us define what something is by declaring what it is not.
There is also the idea of a spectrum. Say, one bug is just a few.
Hundreds of bugs is many. If you introduce one more bug, it is still just a few.
If you introduce one more bug after that, it is still a few. But somewhere along the line, we get to many.
So, some things exist on a spectrum, and it is not easy to define when just a few turns into too many.
This presentation is inspired by a great book, The Dichotomy of Leadership, which explores balance in managing yourself, managing people, and managing projects. I highly recommend this book.
But the point of balance is not staying in the middle.
It is rather preserving the capability to apply the necessary extreme at the necessary moment.
So, it is not as simple as just do not stay at extremes, but it is always thinking about the capability to jump into the necessary solution when it is required.
There are true and false dichotomies in software testing, and some things, of course, exist on a spectrum.
So, we'll go through this list of dichotomies in software testing.
The first one, which is probably already annoying, is testing versus quality assurance. It is, of course, a true dichotomy.
Unfortunately, people use these two words as synonyms and then, in the same sentence, explain that they are two different things. So, we need either to accept that they are identical, or that there is indeed a dichotomy: these are two very distinct things and two very distinct roles, where quality assurance is not related to the role of software tester.
Testing versus checking is, of course, a true dichotomy, and it is now more relevant than ever. You can see a great article that was published just the day before yesterday.
It covers both testing and checking, and their relevance to the AI domain.
Manual versus automated is, of course, a false dichotomy, because there is no such thing as manual testing.
In reality, there are some reasonable alternative dichotomies that more accurately reflect what we do when we interact with complex systems and use sophisticated tools like AI. These dichotomies are interactive versus unattended, experiential versus instrumented, and exploratory versus scripted. Of course, the last one is a spectrum.
Most testing exists on a spectrum from informal to formal, from scripted to exploratory.
Verification versus validation, building the product right versus building the right product, is, of course, a true dichotomy.
Prevention versus detection is a more complex topic. I would say it is complicated.
I highly recommend a book called Drawn to Testing by Ben Rady.
It contains a chapter on what is required to prevent something, as well as a thought-provoking and insightful cartoon on what prevention is.
Deep versus shallow, broad versus narrow: testing coverage is, of course, an example of a spectrum.
Testing mindset versus development mindset. This one is controversial, but I highly recommend a book called The Scout Mindset by Julia Galef.
It explains motivated reasoning and compares what she calls the scout mindset with the soldier mindset.
Both have their merits, and both have their origins in human history and evolution.
And the scout mindset is the motivation to see things as they are, not as you wish they were.
It is an interesting dichotomy that is worth exploring.
Of course, there is taking testing seriously versus not taking testing seriously, or modern software engineering versus outdated software engineering. Proponents of both approaches will probably say that it is a true dichotomy, but it is a difficult question.
Rapid Software Testing versus Minimum CD (continuous delivery): here, people will definitely insist that these are two incompatible, separate approaches and paradigms, schools that have nothing to do with each other.
I personally think it is a false dichotomy, because different approaches are applicable at different parts of the overall development life cycle.
So, sometimes we do testing, and sometimes we do checking.
It is not possible to release software efficiently now without thorough thinking and critical exploration, and without some way of performing automated checks that can quickly confirm whether something unexpected happened in our build.
In exploring this particular dichotomy, it is also worth mentioning what software is.
Software is a collection of instructions and data that direct computer operation.
These instructions frequently include control flow constructs that evaluate incoming data. If the data is accepted, the system's internal state is modified. If the data is rejected, the state remains unchanged.
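Editor's illustration (not from the talk): a minimal Python sketch of this definition. The `Account` class and its deposit rule are hypothetical and exist only to show a control flow construct evaluating incoming data, where accepted data modifies the internal state and rejected data leaves it unchanged.

```python
class Account:
    """Hypothetical system with internal state, used only for illustration."""

    def __init__(self) -> None:
        self.balance = 0  # internal state

    def deposit(self, amount: int) -> bool:
        # Control flow construct that evaluates the incoming data.
        if amount <= 0:
            return False  # rejected: the state remains unchanged
        self.balance += amount  # accepted: the internal state is modified
        return True


account = Account()
accepted = account.deposit(50)
rejected = account.deposit(-5)
print(accepted, rejected, account.balance)  # True False 50
```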
So, from this perspective, and to explore the dichotomy between modern software engineering with CI/CD pipelines and deep testing, it is very easy to understand what unit tests are, what Gherkin scripts and step definitions are, and what CI/CD pipelines are.
If we look at the definition of software, we see that unit tests, Gherkin scripts, and CI/CD pipelines all match it.
So, TDD and BDD are just software development.
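Editor's illustration (not from the talk): a unit test is itself a set of instructions and data with control flow, so it fits the definition of software given above. The sketch uses Python's standard `unittest` module; the function under test, `is_valid_quantity`, and the helper `run_checks` are hypothetical names chosen for this example.

```python
import unittest


def is_valid_quantity(quantity: int) -> bool:
    """Hypothetical production code: accept only positive quantities."""
    return quantity > 0


class QuantityCheck(unittest.TestCase):
    """An automated check: instructions plus data, which is to say software."""

    def test_accepts_positive(self) -> None:
        self.assertTrue(is_valid_quantity(10))

    def test_rejects_zero_and_negative(self) -> None:
        for bad in (0, -1):
            self.assertFalse(is_valid_quantity(bad))


def run_checks() -> unittest.TestResult:
    """Execute the checks programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuantityCheck)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_checks()
print(result.testsRun, result.wasSuccessful())
```

Writing `QuantityCheck` is development work that produces an artifact; deciding what its pass or fail tells us about the system is a separate activity.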
Now, the main dichotomy, which explains why it is not that good to think about generating tests.
There is a key dichotomy between artifact and activity.
A test scenario or test script that exists in a fixed snapshot form is an artifact.
Software testing is an activity. A test is an activity. So, a test is something temporal. It is a performance that happens in time.
And it is a key dichotomy.
So, test cases are not testing.
When we think about generating something, we think about producing artifacts.
But when we think about doing something, we think about working with activities.
Activities can use artifacts. You can have some scenario or some text and use it as part of the activity.
An activity can also produce artifacts. You can perform a number of checks and then have the results in checklist form.
But artifacts and activities are two different things. You can't generate an activity. You can generate an artifact.
And you can perform an activity.
Thinking about this distinction is important for understanding what software and humans can and can't do.
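Editor's illustration (not from the talk): one way to picture the artifact and activity distinction in code. All names here (`Scenario`, `SessionRecord`, `perform`) are hypothetical. The frozen dataclasses stand for artifacts, fixed snapshots that can be generated and stored, while `perform` stands for an activity that happens in time, uses one artifact, and produces another.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """Artifact: a fixed snapshot that can be generated, stored, and copied."""
    steps: tuple[str, ...]


@dataclass(frozen=True)
class SessionRecord:
    """Artifact produced by an activity: observations and elapsed time."""
    observations: tuple[str, ...]
    duration_seconds: float


def perform(scenario: Scenario, observe: Callable[[str], str]) -> SessionRecord:
    """Activity: happens in time, uses one artifact, and produces another."""
    started = time.monotonic()
    observations = tuple(observe(step) for step in scenario.steps)
    return SessionRecord(observations, time.monotonic() - started)


scenario = Scenario(steps=("open order form", "submit empty form"))
record = perform(scenario, observe=lambda step: f"performed: {step}")
print(record.observations)
```

The `Scenario` and the `SessionRecord` can both be generated or saved; the call to `perform` cannot, it can only be carried out.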
Finally, there is a dichotomy between software development and testing.
It is a true dichotomy because these two processes have an anti-parallel structure of information flow.
Software development, engineering, or building something is taking an idea and turning it into a useful product. So, information goes from an idea to something that exists in the real world.
Research, science, and software testing have an information flow that goes in exactly the opposite direction.
Software testing is when we explore a real object, the software, and try to extract ideas about it.
So, software development versus software testing is a true dichotomy. And it is very important to understand it now, when we are trying to use AI for both software development and software testing.
When we use LLMs or code assistants to generate some artifacts, most of the time we are doing development. We are producing software, artifacts, instructions.
Software testing is when we look at the artifacts or at the software and try to produce ideas.
So, when we generate tests, most of the time we generate some code or some instructions. We are producing software.
We are doing coding. We are doing software development. When we are reviewing something or analyzing a data set to get some ideas, we are doing software testing.
Understanding this distinction helps us understand what we are actually doing, whether it is software development or software testing.
There are several very important dichotomies in relation to machine learning and artificial intelligence that help us understand what we can and what we can't do.
The first one is the data types dichotomy.
There are types of data like handcrafted, natural, interactive, attended, manual.
This is data that is produced from the real world by humans.
It is something that we created with our hands or in constant interaction with it.
And then there is data created by software, by computer programs. This data is synthetic, artificial, generated, unattended, and automated.
There is a very important dichotomy in data volumes.
While data volumes, of course, exist on a spectrum, we can say that there are two distinct, separate categories of data volumes.
One of them is artifacts or snippets.
These are volumes of data that can be processed using the narrow bandwidth of human attended work.
If we believe that a human can interactively process this volume of data, we can call it an artifact or a snippet.
It is, by definition, something that a human can process within a reasonable period of time.
But then there are, of course, data volumes that a human will never be able to process.
These are data sets or corpora. This is machine-scale big data.
So, this is a very important distinction and a very important dichotomy.
Humans can only process artifacts.
Software can process data sets.
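Editor's illustration (not from the talk): the definition above can be made concrete as a rough threshold on human reading time. The function name `classify_volume`, the reading speed of 200 words per minute, and the 60-minute budget are all assumptions picked for the example, not figures given by the speaker.

```python
def classify_volume(word_count: int,
                    words_per_minute: int = 200,
                    budget_minutes: int = 60) -> str:
    """Label a volume 'artifact' if a person could read it within the budget.

    The default speed and budget are illustrative assumptions only.
    """
    minutes_needed = word_count / words_per_minute
    return "artifact" if minutes_needed <= budget_minutes else "data set"


print(classify_volume(1_500))       # artifact
print(classify_volume(50_000_000))  # data set
```

The exact cut-off is a judgment call, which matches the remark that volumes sit on a spectrum even though the two categories are treated as distinct.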
Then there is the dichotomy of reliability and responsibility.
Currently there is a common perception that AI systems are very unreliable, but there are technical ways of making them at least more reliable. And, of course, humans can also be very unreliable.
On the other hand, computer software can never be made responsible.
So, software is always irresponsible.
Only humans can be responsible.
So, there is no such thing as responsible AI.
There is only responsible use of AI.
Finally, what is the natural and artificial intelligence dichotomy?
Intelligence is the capability to acquire, process, and apply knowledge and skills.
Humans have intelligence, but we do not know yet what it truly is.
So, we know that we do have intelligence. We do not really know what it is.
As humans, we interact with reality using skills. Skills are our ability to interact with reality.
We also hold knowledge in our minds in the form of ideas. So, humans have ideas.
Software has no ideas.
But we as humans do. We can internalize something: we can look at reality and produce ideas in our heads. And we can also externalize something: we can have an idea and then produce an artifact.
So, humans can convert ideas into artifacts and back in a continuous cycle.
When we think about an intelligent human, the only way for us to judge their intelligence is to look at their interactions with reality.
We look at both the interactions and the artifacts that humans produce as a proxy for understanding human intelligence.
It is important that humans are not able to work with data sets,
that is, data sets in the sense of volumes of data beyond human processing capacity.
Software, on the other hand, is just running: it is executing a set of instructions.
Software can process data and produce synthetic artifacts. It can also produce and process synthetic and natural data sets.
Software interacts through data flows and control loops.
And its interactions with reality, because we live in a digital world, in software, can be represented as data.
So, artificial intelligence is software capable of acquiring, processing, and applying knowledge and skills.
When we look at it as humans, we are not able to judge it by looking at how software processes huge data sets, because we have narrow bandwidth and cannot process those data sets. So, the synthetic artifacts produced by the software are the only proxy for us to explore artificial intelligence. So, what does this mean from a scenario generation perspective?
First of all, scenario generation, most of the time, comes from ideas.
As humans, we can produce some input, a natural artifact that we can create and interact with.
Software can turn it into a large data set of generated scenarios.
Then, it is possible to take this data set and apply it against the system under test. It is a mechanical process.
If we have enough data, we will produce another huge data set.
Due to the dichotomy of data volumes, a human is never able to process a huge data set.
It means that we will also need a process that shrinks a huge data set down into artifacts or snippets that we as humans can process.
If we use software to produce a volume of data that can be immediately processed by a human, most of the time we are using software wrongly, because this way we are not using its most powerful aspect: the ability to work with volumes beyond what humans can do.
On the other hand, in order for us to interact with these data volumes, we need to shrink them down.
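Editor's illustration (not from the talk): the flow just described, sketched in Python under stated assumptions. The functions `expand`, `system_under_test`, and `shrink`, the order fields, and the acceptance rules are all hypothetical. A small human-written seed is expanded by software into a large scenario set, applied mechanically to a system, and the large result set is shrunk down to a snippet a person can read.

```python
from collections import Counter
from itertools import product


def expand(seed: dict) -> list:
    """Software turns a compact human-written seed into a large scenario set."""
    keys = list(seed)
    return [dict(zip(keys, combo)) for combo in product(*seed.values())]


def system_under_test(order: dict) -> str:
    """Hypothetical system: rejects non-positive quantities and unknown sides."""
    if order["quantity"] <= 0:
        return "rejected: quantity"
    if order["side"] not in ("buy", "sell"):
        return "rejected: side"
    return "accepted"


def shrink(results: list) -> list:
    """Reduce a large result set to a snippet a person can read."""
    return Counter(outcome for _, outcome in results).most_common()


# Natural artifact: a compact description written by a person.
seed = {
    "side": ["buy", "sell", "hold"],
    "quantity": list(range(-100, 1001)),
    "price": [1, 10, 100],
}
scenarios = expand(seed)                                  # machine-scale input
results = [(s, system_under_test(s)) for s in scenarios]  # mechanical execution
summary = shrink(results)                                 # human-scale output
print(len(scenarios), summary)
```

Only `summary` is small enough to read in full. Turning it into ideas about the system's behavior remains a human activity that the code does not perform.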
When we work with it, we clearly need to separate what is an activity and what is an artifact.
I do not think this is an easy concept, but it is a good thing.
The more you think in terms of dichotomies, spectrums, artifacts, and activities, the better you can figure out what can and what cannot actually be achieved with software.
So, when you use AI to produce scenarios like unit tests, Gherkin scripts, or step definitions, you produce software.
So, you are using AI for software development.
If you use an LLM to process some large data set and produce shrunk-down artifacts that you can try to turn into ideas, to internalize, then you are using AI for software testing.
We are trying to cover this and similar ideas in our group, which I highly advise you to join and explore.
We also have a lot of materials on the subject, where we try to explain from our own experience how to work with AI systems, how to better understand them, and what can and what cannot be achieved by using AI in software testing. Thank you very much.
>> Thank you very much, Youssef,
for giving us a deeper perspective on the relationship between AI, software engineering, and testing.
I can already see some questions coming from the audience, so let's open the floor for questions. Please feel free to share your question in the chat. The first question: is the industry currently overvaluing code generation and undervaluing system comprehension?
>> Well, I think the wrong word here is "currently". It was always the case: code generation was always held in more respect than understanding what we have actually produced. That's why people dislike software testers.
But now a clear indication would be that when people think about using AI in software testing, they use the phrase "generating tests".
So, generating tests, but again, a test is a performance. It is not an artifact. So, what they really mean is generating scenarios, generating cases, something like that. But when you generate something, you actually produce code.
And it is a clear indication that people do value producing code more than trying to understand it. So, the way to see that somebody is really trying to apply AI to software testing is when they are not doing specification-to-test,
but when they are taking something produced by the system, like logs, data, or output, and trying to turn it into a set of ideas about the system's behavior. So, there are anti-parallel structures between development and testing.
I do hope that people eventually start thinking more in terms of comprehension of what they produce rather than in terms of generation.
>> Thank you.
And the other question that I can see here, how do you distinguish between fast delivery and reckless acceleration in AI-driven software development?
>> Well, most things should be judged by their output.
If we manage to quickly put systems into live service and they operate there reasonably well, without huge outages and functional problems, we can say that it was fast delivery.
But if we instead get problems and outages, then we say it was reckless acceleration. So, the distinction between fast delivery and reckless acceleration is in the output. Depending on what we get as the end result, we can say what it was. It is, of course, something that indeed exists, but it is not a dichotomy. It is probably something that exists on a spectrum, and any technical leader has to make constant value judgments about which of these two extremes to lean toward in their work.
>> Thank you.
Don't forget, if you enjoyed Youssef's presentation, to vote for him by scanning the QR code, or you can find the voting on the dedicated page for this talk on our website.
Also, if you have missed any presentation or would like to watch this session again, recordings are available on our website and YouTube channel.
Thank you, everyone, for joining. Thank you, Youssef, for this nice presentation, and enjoy the rest of the A4Q Testing Summit.
>> Thank you very much. Enjoy the conference everyone. Thank you.