Large language models (LLMs) are fundamentally non-deterministic, meaning identical inputs can produce different outputs, making them unsuitable for direct trading decisions. This non-determinism arises because LLMs predict the next plausible word rather than calculating deterministic results, which is essential for their creative capabilities but problematic for trading where consistency and reproducibility are critical. Traders should use AI as a research assistant for ideation and prototyping rather than a decision-maker, and build tools that run independently of AI to avoid dependency. The key principle is that AI should augment human judgment rather than replace it, as the trader good enough to run AI is the trader who never needed one in the first place.
Deep Dive
AI Realism in Trading
The name Radge is probably familiar to many of you.
Tonight we have Zach Radge presenting to us.
Zach is a professional trader, analyst and Head of Trading and Research at The Chartist.
With a Bachelor's Degree in Computer and Software Engineering, Zach incorporates real world engineering design experience into the realm of systematic trading.
He does that to design and to build robust trading strategies for both the US and the Australian markets.
And tonight, as you can see on the title slide there, Zach is going to talk to us about AI realism in trading.
The sub note there, the subheading is a trader and engineer's view of what AI, artificial intelligence, what it is, what it isn't and what it costs to confuse the two. Zach, welcome to our ATAA meeting tonight and it's over to you.
Can everyone please join me in welcoming Zach to the session?
Thank you very much, Robert.
Thank you very much for having me as well and thank you very much to Keith for a fantastic presentation.
So I'll just confirm that everyone can still hear me before I launch into things.
Yes, we can. Thank you.
Great. All right, let's get to it.
A computer can never be held accountable. Therefore, a computer should never make a management decision.
This is a quote from a 1979 IBM internal training manual.
The author of this quote was almost certainly talking about VisiCalc, which was brand new spreadsheet software. It was a proto-Excel, a precursor to Microsoft Excel. We all know how that works.
And in this manual, the author is warning the employees that the computer is merely a calculator.
You can't blame a spreadsheet for making a bad decision.
A spreadsheet with bad data in it can mislead, but the spreadsheet itself is fundamentally not responsible.
For the next hour, or hopefully less than an hour, I want to talk about why most AI adoption is misguided, why AI could destroy your trading edge, and why that 1979 author was more right than they possibly knew at the time.
We're seeing AI adoption absolutely everywhere these days, and it's no surprise at all that we want to use it in our trading systems and in our trading processes.
I know for certain I'm not the first person to give a talk about AI and trading to the ATAA, nor am I particularly early to approach the topic. However, most of the presentations I've seen have been affirmative.
They speak about how we can use AI to do this or to do that.
So I want to approach the problem more generally and from a higher level because we're in such a rush to implement all of these AI features that we haven't stopped and asked what the consequences of this might be, or if AI is even the right tool for the jobs we're trying to do.
But I also want to make it clear upfront that I'm not going to take an emotivist point of view of this.
I'm not going to tell you that AI is either good or bad because it's neither of those things and it's both of those things.
But what's for certain is that it's here and it's not going away.
So we might as well consider what good is this for us and how can we use it and how can we use it safely?
But first of all, who am I? As Robert said, my name is Zach Radge.
I'm head of trading and research at The Chartist. First and foremost, I'm a professional trader, but before that I was a software engineer.
I have a degree in computer and software engineering from the Queensland University of Technology and I implement my engineering background in trading systems, most obviously through systematic trading, but also through trading processes and automation and more recently bringing AI into that.
So what I'm talking about here isn't hypothetical.
I've built the things that I'm about to talk about and criticize.
This isn't an outsider's complaint.
A friend of mine works at a major law firm, which I won't name for obvious reasons.
And last year he received an email from the management of that law firm.
They were announcing a competition.
The management had made a significant investment in AI technology. They'd spent millions of dollars.
They'd made a multimillion-dollar investment in AI and they were launching a competition, and they were asking all the staff to participate in it.
The competition was to come up with the best use case for the investment they had just made.
They'd spent millions of dollars on an answer when they didn't quite know the question yet, to the extent that they were willing to pay their own staff to come up with the question, to come up with a use case for the investment that they had just made.
But it makes you think in what other circumstance would management spend millions of dollars on a solution and then ask the staff to find the problem for them?
What's the strategy there? From a management point of view, what is the strategy?
And the truth of it is, it's not a strategy. It's a fear of missing out.
When a company says that we're driving innovation through transformative AI integration to unlock competitive advantage, what they actually mean to say is that our competitors are doing something with AI. We don't know what, but we don't want them to beat us to the punch.
So we need to do something and we need to do it quickly.
The company knows its competitors are using AI, and it needs to figure out how to use it before it can be displaced.
So it's a fear of missing out, but this fear of missing out in tech is not something new.
It's a pattern we can see play out over and over again over the history of technology.
We can see the same mistake being made over and over again with new tools.
When a new technology comes out, when an innovator comes out and disrupts the landscape, entrenched names are forced to scramble to meet the challenge of the new technology, but often the panic to not be left behind has given companies worse products than they started with.
Think back to 2007 with the launch of the iPhone.
It was a completely new technology, the multi-touch screen on a phone, on a mobile phone.
This was the first phone to do this, the iPhone in 2007, and it posed a new question, or rather it reposed a question: what is a phone?
Up until this point in time, a phone, or a mobile phone specifically, had primarily been something that you called people on.
It was something that you could text people on.
Some phones were fancy and they could send emails.
You could take pretty bad pictures on them and then you could spend a few dollars to MMS those pictures. So the iPhone fundamentally challenged the idea of what a phone was.
Phones had physical keypads. That's what a phone was. The iPhone, as we know, and as we now take for granted, did not have a physical keypad.
It had this multi-touch screen. It had more space to look at photos, to look at videos, to look at webpages, to look at, I don't know, any app you can think of.
So the other phone manufacturers had a challenge, they had a new technology, they had a disruptor.
They had to come up with an answer to the iPhone. Blackberry was the dominant business phone developer, and their whole thing, their whole shtick, was their big physical QWERTY keyboard.
It was tactile. It was fast to type on. It was accurate.
So Blackberry had a problem.
The iPhone was a fundamental challenge to their business model and to their product.
So Blackberry needed to respond, and how did they respond?
They made their next-gen response with the Blackberry Storm.
It's a touchscreen on top and it's a tactile click underneath.
It's got a multi-touch screen. You can actually click the screen down.
It's tactile, just like the keyboard is. It's the best of both phones, but it wasn't the best.
I don't know if anybody remembers the Blackberry Storm, probably not. But if you Google it, it's commonly called the worst smartphone ever or the biggest smartphone mistake ever. It tried to be both the new thing and the old thing.
The old thing was a phone with a tactile keyboard.
The new thing was a touchscreen, and they slapped the two together.
They bolted the touchscreen onto the old thing and in the process they made something that was worse than both, because the iPhone crowd doesn't care about a touchscreen Blackberry. They've already got the iPhone. And the Blackberry crowd, who like their tactile keyboard, have been given something worse, which is not quite either.
So in Blackberry's rush to implement this new technology, they not only undermined their current product but their future product because they ruined their reputation essentially.
So that was the Blackberry Storm, the worst smartphone ever. But fast forward to 2024 and we saw the same pattern play out with the rise of artificial intelligence.
When a new technology comes, the entrenched names need to figure out how they're going to use it, how they're going to implement the new technology into their old product.
What Google did was implement Google AI overview summaries. What the Google AI overview did, or still does, because it's still a thing, was summarize the results of the page, below the search box.
Companies like OpenAI and Anthropic had put their AI chatbots into a chat window. You would talk back and forth with it. But Google had a search engine, arguably the best search engine and definitely the most used search engine.
So what Google could do was get their new AI, Gemini, and put it in front of everybody, and it would summarize the page below.
People don't even need to click on the links anymore. The Google AI would just summarize the page, or all the links, for them.
So when somebody Googled, "How do I stop the cheese from sliding off my pizza?" the result that the Google AI overview gave was that, to keep cheese from sliding off pizza, you add about an eighth of a cup of non-toxic glue to the sauce.
The Google AI overview had read the pages below and that was one of the things it found, but it didn't discern that this was a joke that somebody had made on a Reddit thread. It merely summarized the results.
It didn't take any of the context of what was being said.
So similarly, when somebody Googled "How many rocks a day should I eat?", the Google AI answer was that geologists recommend eating at least one small rock per day, because the Google AI overview was summarizing a satirical article from The Onion, a satirical newspaper.
So Google had bolted their new technology onto the old technology without asking if it was fully ready to be done.
They'd rushed to meet the new challenge and in the process undermined their Gemini offering by releasing it in a way that it probably shouldn't have been released. But Google's not dumb.
So why did Google do this? Why do we keep falling into the same trap?
It's because there's more to the pattern than merely bolting new technologies onto old ones.
The pattern shows us how we perceive new technology within the framework of what already exists.
There's a famous quote misattributed to Henry Ford.
So Henry Ford didn't say this, but everybody says he said this, so I'm just going to put it up there. It goes, "If I had asked people what they wanted, they would have said faster horses." Not Henry Ford. The obvious way to interpret this, and the way most people use this quote, usually people spruiking disruptive technologies, is that people can't imagine a solution that doesn't exist yet. That's the surface level meaning, but I don't think that's what it really means.
I don't think somebody on a horse couldn't imagine a car or somebody who only has a horse couldn't imagine a car.
Imagine you're a messenger in the 1800s and you're sitting on horseback and you need to get a message from place A to place B as fast as possible.
Sitting there on the back of a horse, your obvious thought process is going to be, "Man, I wish I had a faster horse," because that's what you've got and that's how you've always done things.
It makes sense to simply want to improve your current process.
I don't believe the messenger on horseback could not fathom the idea of a car.
In the future, when people are teleporting around, will they look back on us and think, "Look at these idiots trying to make a faster car. Why don't they just teleport?"
It's not a problem of imagination. We can imagine teleporting.
We can imagine something faster than a car.
So it's an unfair comparison, but what it does tell us is that in solving a problem, we often approach it with the framework that currently exists rather than stepping back and asking, "What is the problem that we're really trying to achieve?" So people imagine new tools inside the framework of the old ones.
For example, Blackberry, they had a phone with a keyboard.
They needed to put a touchscreen on it. What did they do?
Phone with a touchscreen with a weird hybrid touchscreen clickable keyboard thing. Google, they had a search engine.
They needed to put generative AI on it. So what did they do?
Generative AI summary of the search results. Traders, we have processes, so why not put generative AI into it?
Traders, we have strategies. Why not put generative AI into them?
Let's think about this. Let's walk through it. Recently, I was talking to a member of The Chartist.
They had a problem which they'd solved.
Their problem was that they needed to get a brokerage trade list into a portfolio tracker.
They needed to get a CSV downloaded from Interactive Brokers into their portfolio tracker. They used to do it manually.
They'd download the CSV, they'd move the columns around, they'd change the date formats, they'd check it, whatever. They'd upload it.
But they had a new solution using AI.
Their new solution went like this.
They would screenshot their trade list from inside the broker.
They would upload that screenshot into ChatGPT.
They would ask ChatGPT to reformat it.
They'd download that and then they'd upload it to the tracker.
They've put AI in the loop, but it's still a manual process.
They still need to screenshot the trade list.
They still need to download the result. They still need to upload it.
It's fragile. It's still manual.
They haven't really improved the process. Sure, they've made it faster.
They haven't stepped back and asked the question, "What am I really trying to achieve?" So here's a better solution to this simple problem that a lot of us might already do.
A lot of you listening might've already got this sort of solution.
Here's a better solution.
We can use the AI to write a script that fetches the trade list from our broker automatically. It converts it to the right format, it checks it for duplicates or for errors and then it gets sent automatically to be uploaded to the tracker.
This is a fully automated process.
It can run on a timer like clockwork. We've removed both the human and the AI dependencies.
We don't need to intervene in the process nor does the AI.
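A minimal sketch of the middle of that pipeline, in Python, using only the standard library. It shows the convert, check, and de-duplicate steps on an in-memory CSV. The column names (`TradeDate`, `Symbol`, `Quantity`, `Price`), both date formats, and the sample rows are assumptions made up for illustration; a real broker export and a real tracker will differ. Fetching from the broker and uploading to the tracker are deliberately left out because they depend on the specific services.

```python
import csv
import io
from datetime import datetime

# Hypothetical formats; a real broker export and tracker will differ.
BROKER_DATE_FORMAT = "%Y%m%d"      # assumed broker date, e.g. 20240315
TRACKER_DATE_FORMAT = "%d/%m/%Y"   # assumed tracker date, e.g. 15/03/2024
TRACKER_FIELDS = ["date", "ticker", "quantity", "price"]


def convert_trades(broker_csv_text):
    """Convert broker rows to tracker rows, rejecting bad rows and skipping repeats."""
    reader = csv.DictReader(io.StringIO(broker_csv_text))
    seen = set()
    converted = []
    for row in reader:
        quantity = float(row["Quantity"])
        price = float(row["Price"])
        # Basic error check: a zero quantity or non-positive price is suspicious.
        if quantity == 0 or price <= 0:
            raise ValueError(f"Suspicious trade row: {row}")
        trade_date = datetime.strptime(row["TradeDate"], BROKER_DATE_FORMAT)
        record = (
            trade_date.strftime(TRACKER_DATE_FORMAT),
            row["Symbol"].strip().upper(),
            quantity,
            price,
        )
        # Assumption: an exactly identical row is a duplicate, not a second fill.
        if record in seen:
            continue
        seen.add(record)
        converted.append(dict(zip(TRACKER_FIELDS, record)))
    return converted


def to_tracker_csv(trades):
    """Serialize converted trades in the (assumed) tracker column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TRACKER_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(trades)
    return buffer.getvalue()


# Invented sample export containing one repeated row.
SAMPLE = """TradeDate,Symbol,Quantity,Price
20240315,bhp,100,43.5
20240315,bhp,100,43.5
20240318,CBA,-50,115.2
"""

trades = convert_trades(SAMPLE)
tracker_csv = to_tracker_csv(trades)
print(tracker_csv)
```

Once a script like this exists, it can be scheduled with whatever timer the operating system provides, and the same input file will always produce the same output file.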
And just to clarify that, if this was a one-off process, if we only needed to do this once, then we don't need to build this pipeline.
The AI solution is fine if it's a one-off process.
But if this is something we're doing every single day, then this is a far better solution because it's not dependent on AI and it's far more robust. But you're probably thinking, what does it matter if I'm dependent on AI, if my process is dependent on AI?
AI is here and AI is here to stay.
So what does it matter that I'm building this dependency in?
Let's think about what happens when the bill comes due.
What happens when we make AI load bearing?
Let's zoom out. Let's make a more difficult problem.
This is a classic discretionary trade workflow, non-AI.
Most of us have probably done something like this at some point in time, classic discretionary trade workflow.
We run a scan using RealTest or AmiBroker to generate a trade shortlist.
A human, the trader, us, then double checks the scan, looks over it and filters that for the best signals. I don't know.
Maybe we're looking for flags, triangles, whatever.
Once we've narrowed that down and we're happy with the trades, we execute that filtered set manually. If you've ever done this, you know it's a repetitive task. It's a time intensive task.
It takes a lot of time. It takes a lot of brainpower.
So how could we speed this up?
How could we make it more efficient with this new AI technology?
Let's imagine putting AI into this process. Let's put it right at the beginning. We get the AI to do most of the work.
The AI does the initial scan for us, by momentum, whatever. But unlike the previous process, the AI can maybe look at fundamentals, maybe it can look at headlines, maybe it can bring in some meta information, so it can make a smarter decision.
After that happens, the human, the trader can double check those signals and then execute it themselves.
It's a much faster system. It's definitely faster than the first one.
The problem though is we've made the AI load bearing. If we remove the AI, the whole system will stop working.
We have to go back to the old system at least.
But I still haven't answered the question. Why can't we make AI load bearing?
Here's four questions that are worth asking. Number one, what happens when the model is deprecated or changed? Pretty straightforward.
OpenAI, Google, Anthropic, they're releasing new AI models all the time.
They slowly phase out the old ones.
That means our process is going to change slightly each time because that's out of our hands.
We don't have control of that unless we're hosting the LLM ourselves or the AI ourselves. So that's a risk.
Our process or the decisions could become slightly different.
Maybe we're fine with that. Maybe the new models are better.
What happens when the AI pricing quadruples? Claude is $30 a month, $150 a month. What happens if that price quadruples?
What happens if it's now $1,000 a month or $500 a month?
Maybe we're happy to pay that price. This is a business.
We've got to spend money to make money, so maybe that's fine.
What happens when the AI is offline at the open?
I mean, ChatGPT or Claude, they have up times of 99.5%.
They're available 99.5% of the time.
That 0.5% of the time they're not available.
We won't be able to access our trading strategy. Maybe we can live with that.
Maybe we can't. Maybe we're not doing a day trading strategy. Maybe that's fine.
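To put that 0.5% into hours, here is a small worked calculation in Python. The 99.5% figure is the one quoted in the talk, not a published service level, and the calculation assumes downtime is spread over a full 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def expected_downtime_hours(uptime_fraction, hours=HOURS_PER_YEAR):
    """Hours of unavailability implied by a given uptime fraction."""
    return (1.0 - uptime_fraction) * hours


# 99.5% is the uptime figure quoted in the talk, used here as an assumption.
downtime = expected_downtime_hours(0.995)
print(f"{downtime:.1f} hours per year")  # prints "43.8 hours per year"
```

Nothing in the figure says when those hours fall, so some of them may land on the open.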
But the most important one is this last one.
What happens when the output changes despite identical inputs?
What happens when the output changes despite identical inputs?
A calculator is deterministic. When you're using a calculator, one plus one will always equal two.
Two plus two will always equal four and so on and so forth.
We trust that the calculator is deterministic.
The same inputs into the calculator will always equal the same outputs.
It's a traditional algorithm and we depend upon it being deterministic.
If one plus one only usually equaled two, then we wouldn't have much faith in the calculator. Imagine doing your taxes with a calculator where 10% of $100,000 is usually $10,000, but sometimes it's $11,000.
Sometimes it's $12,000. If you did your tax with that calculator, I think you'd be getting a call from the ATO pretty quickly.
So we need our calculator to be deterministic.
We need the same inputs to always equal the same outputs. Likewise, RealTest is deterministic.
The same data going into the same strategy, same script, will result in the same signals every time.
Same with AmiBroker, same with TradingView.
They're deterministic algorithms.
So with a traditional deterministic algorithm, A plus B will always equal C.
Every time, we trust that A plus B will always equal C. However, a large language model, which is what most of the AI people are using these days is, and what most generative AI is too, is non-deterministic.
A plus B will usually equal C.
Sometimes it'll equal a slight variation of C, which we can call C prime.
Sometimes it'll equal an entirely different answer.
Sometimes it'll equal D because it's non-deterministic.
So why is this the case?
What is an LLM?
An LLM is not a calculator with extra steps.
It doesn't remember your last conversation.
It doesn't learn from your corrections. It doesn't reason.
It predicts the next plausible word.
Different runs produce different outputs by design.
An LLM is non-deterministic on purpose. It's intentional.
It's a feature of the LLM and it's a fundamental feature too.
Let me explain.
This is how a large language model works.
This is a massively oversimplified explanation of how an LLM works, but this is fundamentally what an LLM is doing at the very base level. The LLM is trained on a huge amount of words and sentences and documents and all sorts of things like that.
From this, it is able to predict the next plausible word contextually.
So if an LLM is assessing the sentence "The moon was", there's a high probability that the next word in that sentence is "full". The moon was full.
There's also a high probability that the next word would be bright.
The moon was bright. The moon was round. The moon was low.
The moon was not full. The moon was not bright. The moon was hidden.
There is a 0% probability, or I assume a 0% probability, that the moon was "to pay" or the moon was "conglomerate", because those sentences probably haven't come up in its training data before. So it says these are non-existent probabilities.
They're 0% probabilities. However, the LLM doesn't simply pick the highest probability word.
If it always picked the highest probability word, then it would be deterministic and the same inputs would equal the same output. So the moon would always be full.
The problem with that though is the LLM would have no creativity. It would have no ability to think.
So if the moon was always full, we wouldn't have the creativity, the humanness of the LLM. So to give the LLM that, we take lower probability words sometimes.
So sometimes the moon will be bright. Sometimes the moon will be round.
Sometimes the moon will be hidden.
If we were to take away that non-determinism and the moon was always full, we would essentially be giving the LLM a lobotomy.
It would lose all its creativity. It would lose all its humanness.
So the non-determinism is a feature of the LLM.
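The difference between always picking the top word and sometimes picking a lower probability word can be sketched in a few lines of Python. The probabilities below are invented for illustration and are not real model output, and a real LLM does this over a vocabulary of many thousands of tokens, so treat this as a toy version of the idea.

```python
import random

# Made-up probabilities for the word after "The moon was"; not real model output.
NEXT_WORD_PROBS = {
    "full": 0.40,
    "bright": 0.25,
    "round": 0.15,
    "low": 0.10,
    "hidden": 0.10,
}


def greedy_pick(probs):
    """Always take the single most probable word (deterministic)."""
    return max(probs, key=probs.get)


def sampled_pick(probs, rng):
    """Draw a word in proportion to its probability (non-deterministic)."""
    words = list(probs)
    weights = list(probs.values())
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random()  # deliberately unseeded, so every run differs

greedy_runs = {greedy_pick(NEXT_WORD_PROBS) for _ in range(1000)}
sampled_runs = {sampled_pick(NEXT_WORD_PROBS, rng) for _ in range(1000)}

print(greedy_runs)   # prints {'full'}: the moon is always full
print(sampled_runs)  # several different words across the same 1000 runs
```

The greedy version gives the same answer a thousand times out of a thousand. The sampled version is the one that can say the moon was hidden.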
But what are the consequences of this?
Here's an experiment I conducted and which you can conduct yourself.
Give two Claude windows, or any AI, any LLM, the price data for the ASX 200 stocks over the last calendar year and ask it to rank and grade each stock by its potential for immediate entry.
So we're giving it the price data from Norgate Data, so it's good data going in.
The same input data, the same prompt, the same model at exactly the same time on the same computer, on the same internet connection.
Everything going in is the same. It's the same inputs.
We should get fairly similar results from the two Claudes, right?
No, we get two very different outcomes.
So here's the top 10 from those two Claudes.
Consider this in the framework of the discretionary trader we spoke about earlier.
The discretionary trader looking at his signals or their signals, they're going to be looking at the top 10 stocks.
What are the best buys in the ASX 200 right now?
That's what we've asked of Claude. So looking at these two top 10 lists, we can see some pretty obvious differences straight away.
Claude A has ranked IFT as number one. Claude B has ranked IFT as number nine. Similarly, Claude B has ranked Macquarie Group as number one.
Claude A has ranked that as number 10.
We can also see that four of the results in the top 10 of each don't appear in the results of the other one. Claude A says NHC is the fourth best buy.
Claude B doesn't even put that in the top 10. But you're probably thinking, well, this is the top 10. Maybe they're all just reasonably good.
They're all very good signals. They're all high quality.
The order doesn't matter. The human's going to filter it out anyway.
So let's look at the dispersion of these top 10 and see where they fall.
So the same data, the same prompt, the same time, same computer, same model.
But NHC, which was number four on Claude A, is now number 38 on Claude B.
34 places down the list. Imagine you're that discretionary trader.
If you are looking at the list of Claude B, are you even going to get down to number 38?
Are you even going to assess NHC, which Claude A thought was the fourth best buy on the whole ASX 200?
It's all the way down at number 38.
Claude B doesn't even give it a buy grade. It doesn't tell you it's a strong buy. It says it's constructive.
So we can see that there's a lot of differences between the two and that's just the top 10.
There's actually an average of 14 places between the rankings of the two.
Ranking is hard; giving a specific number is difficult.
So what about the gradings? Let's look at the gradings.
Claude A gave 23 stocks a strong buy.
23 stocks were strong buy immediate entry soon.
Claude B said 35 are strong buys, immediate entries.
So there's 12 stocks there that Claude B thinks are a strong buy that Claude A doesn't agree with.
And we can see that same difference down the whole list.
Every category is different except category C, but the constituents of category C are different between Claude A and Claude B.
So what's the consensus between Claude A and Claude B on this?
65% of the time Claude A and Claude B agreed on the grading, not the ranking, the grading.
35% of the time they disagreed.
So 35% of the time, what Claude A said was a strong buy, Claude B didn't agree with. The categories are completely different.
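For anyone repeating the experiment, the two headline numbers (average places between the rankings, and how often the grades agree) can be computed as in the Python sketch below. The tickers, ranks, and grades in it are placeholders invented for illustration. They are not the figures from the experiment, and the function names are hypothetical.

```python
# Placeholder results for two runs of the same prompt. Every ticker, rank,
# and grade here is invented; none of it comes from the actual experiment.
RUN_A = {
    "AAA": (1, "Strong Buy"),
    "BBB": (2, "Strong Buy"),
    "CCC": (3, "Buy"),
    "DDD": (4, "Hold"),
}
RUN_B = {
    "AAA": (3, "Buy"),
    "BBB": (1, "Strong Buy"),
    "CCC": (2, "Buy"),
    "DDD": (4, "Hold"),
}


def mean_rank_gap(run_a, run_b):
    """Average absolute difference in rank for tickers present in both runs."""
    shared = run_a.keys() & run_b.keys()
    return sum(abs(run_a[t][0] - run_b[t][0]) for t in shared) / len(shared)


def grade_agreement(run_a, run_b):
    """Fraction of shared tickers that received the same grade in both runs."""
    shared = run_a.keys() & run_b.keys()
    return sum(run_a[t][1] == run_b[t][1] for t in shared) / len(shared)


rank_gap = mean_rank_gap(RUN_A, RUN_B)
agreement = grade_agreement(RUN_A, RUN_B)
print(rank_gap)   # prints 1.0 for this placeholder data
print(agreement)  # prints 0.75 for this placeholder data
```

The comparison itself is deterministic, so the same two saved outputs will always give the same gap and the same agreement figure.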
So how do we reconcile these two opposing outcomes?
Well, first of all, the discretionary trader running the Australian model probably isn't going to be running Claude twice.
They're probably not going to be doing the assessment twice.
So they're only going to be looking at one of these lists.
So which of the lists is right? Which can be trusted?
And beyond that, if we can't even use it to get the same result at the same time, then we can't use it to get the same result if we run the same test that we ran a week ago. So it can't be back tested.
It's not repeatable. It's not reproducible.
So how can we validate something that we can't reproduce?
More importantly, how can we trust something that we can't validate?
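One way to make "reproducible" concrete is to run the same ranking several times on the same data and check that every run matches. The Python sketch below does that for a simple rule-based ranker and for a deliberately random one. The random ranker is only a crude stand-in for a non-deterministic process, not a model of how an LLM ranks stocks, and the prices and names are made up for illustration.

```python
import random

# Made-up first and last prices per ticker; not real market data.
PRICES = {
    "AAA": [10.0, 12.0],
    "BBB": [20.0, 21.0],
    "CCC": [5.0, 5.5],
    "DDD": [8.0, 7.6],
    "EEE": [15.0, 18.0],
    "FFF": [30.0, 30.3],
}


def rule_based_rank(prices):
    """Deterministic ranking: highest simple return first, ticker as tie-break."""
    def simple_return(ticker):
        series = prices[ticker]
        return series[-1] / series[0] - 1.0

    return sorted(prices, key=lambda t: (-simple_return(t), t))


def noisy_rank(prices):
    """Crude stand-in for a non-deterministic ranker: a random order each call."""
    tickers = list(prices)
    random.shuffle(tickers)
    return tickers


def is_reproducible(ranker, prices, runs=20):
    """True only if every run returns exactly the same ranking."""
    first = ranker(prices)
    return all(ranker(prices) == first for _ in range(runs - 1))


print(is_reproducible(rule_based_rank, PRICES))  # prints True
print(is_reproducible(noisy_rank, PRICES))
```

A process that fails a check like this today cannot be rerun next week to confirm what it said last week, which is the backtesting problem in miniature.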
And more important than that, again, how much would you risk on something that you can't trust?
Sorry.
How much would you risk on something that you can't trust or you can't validate or you can't reproduce? How about $2.5 billion?
In late 2017, an investor from Hong Kong, by the last name of Lee, invested $2.5 billion into an AI supercomputer hedge fund run by a supercomputer called K1.
After investing in late 2017, the AI hedge fund immediately started losing money and immediately went into a drawdown. But that's nothing strange. We're all investors.
We know that strategies lose money. Every strategy loses money at some point.
That's the nature of the game. And Lee's a billionaire.
I'm sure he knows that too. But after just a few months invested, Lee pulled the plug. He'd had enough.
He pulled the plug after K1 lost $20 million of his own money in one day on a stop loss.
He pulled his money. He said, "I've had enough. I've lost confidence." A lawsuit immediately ensued between the investor, Lee and the company running the AI hedge fund.
The main point of contention wasn't merely that the fund was losing money.
The point of contention was that the back tests that Lee was shown or the simulations that Lee was shown were misleading.
They were unvalidated. They were unrepeatable. They were unreproducible.
The K1 supercomputer trading system was a black box so the decisions it made could not be audited. They couldn't be validated.
When the fund started losing money, Lee lost confidence immediately because he couldn't scrutinize or explain what had happened, where the money had gone, nor could the people running the hedge fund because it's a black box.
They don't even know.
So there's no confidence without validation in a strategy. On the screen here, this is a trade from our US momentum portfolio from the start of this month or the end of last month. As you can see by this chart, this is an unlucky trade, not necessarily a bad trade, but it's an unlucky trade.
The day after we exited this trade, the CEO of NVIDIA, Jensen Huang, said that Marvell, this company, the stock, is the next trillion dollar company.
So immediately the price of that stock jumped 50% the next day after we just exited.
And one of our clients who's invested in this portfolio emailed and said, "Why did we exit that?
It went gangbusters the very next day. Why did we exit? We missed out on this."
And because this is a RealTest-based systematic strategy, I was able to go back and see exactly why we exited. I was able to see that this stock exited because of the volatility.
The stock was too volatile. It had momentum. It was giving us a gain, but it was too volatile.
So it exited for something showing similar momentum, but less volatile.
So I was able to explain that to him. But more than that, I was also able to say, "What happens if I loosen that volatility filter?
What happens if I loosen that volatility filter to the point that this specific stock didn't exit?" And from that, I was able to see that by loosening the volatility, we actually make the whole performance worse. The drawdowns get bigger, the performance gets worse. So not only can we validate the specific case, but we can validate the rule as well.
And that's what gives us confidence in our strategy, the ability to validate it, the ability to reproduce it. And I explained this to the member and he was perfectly happy with that.
So from all of this, because their outputs can't be validated, we can say that non-determinism makes large language models fundamentally unsuitable for direct decision-making.
So then what is it good for?
We have this incredibly advanced, high-tech, new technology, but it's non-deterministic.
So we can't use it for direct decision-making, but let's not throw the baby out with the bathwater.
There's got to be some way that we can use this new technology. Direct decision-making is off the table.
So how about indirect decision-making? Don't let it pull the trigger.
Let it inform us. What do I mean by indirect decision-making?
I mean, research or investigation. Hey, Claude, give me some ideas. How does this strategy work?
How does this person make their money? For research, for strategy coding or prototyping.
So RealTest comes packaged with Claude instructions now.
So you can use that to quickly whip up some RealTest code or prototype a new strategy. For automated testing, similarly, you can say, "Hey, Claude, do a robustness test on this strategy for me. Find me all the holes in this strategy. See what I've missed."
We can use it for idea generation. "Hey, Claude, give me some ideas." And I keep saying Claude, but I'm not paid by Anthropic. You can use OpenAI the same way, or ChatGPT.
Okay. So that's what I mean by indirect decision-making.
However, what about the non-determinism problem?
It's still non-deterministic.
The same inputs will still equal different outputs.
For example, I gave Claude the instructions to build a momentum strategy, an ASX 100 momentum strategy, quite specific instructions of what I wanted it to do and yet I still got two different strategies out of it.
Completely different performances, quite different rules, same general structure, but two completely different outputs.
Similarly, I asked two different Claudes to analyze one of these strategies for me and give me a list of what needs to be fixed or tested. And what did they give me?
It gave me two different lists, two different lists from the same input.
There's some overlap between the lists, but fundamentally they are different lists. They're different priority lists.
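To make the overlap point concrete, here is a minimal Python sketch that splits two review lists into the items both runs raised and the items only one run raised. The list items are invented for illustration and are not the actual outputs from the experiment described in the talk; the function name `compare_lists` is also hypothetical.

```python
def compare_lists(list_a, list_b):
    """Split two lists of findings into shared and one-sided items."""
    set_a, set_b = set(list_a), set(list_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        # Jaccard similarity: 1.0 means identical sets, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical findings from two runs of the same prompt (illustrative only).
run_a = ["add slippage", "check survivorship bias", "tighten exit rule"]
run_b = ["check survivorship bias", "vary lookback period", "add slippage"]

report = compare_lists(run_a, run_b)
print(report["shared"])
print(report["only_a"], report["only_b"])
print(round(report["jaccard"], 2))
```

The code can show where the two lists disagree, but deciding which of the one-sided items actually matter is still left to the person reading the report.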
So now we have two problems.
To execute the findings of either of the two Claudes that I've just spoken about, from either of those experiments, is still our direct decision-making problem. If we execute those findings without questioning them or without testing them, it's still the direct decision-making problem.
The second problem is that there's no way to meaningfully reconcile the two outputs into one truth without some form of external judgment.
The results that the Claudes gave are informative, but they're not definitive. They're different.
They can't both be right. Each is founded on correct inputs, but the outputs value very different aspects.
The two outputs don't synthesize into a singular truth and they can't be relied on to form a singular judgment because they're conflicting.
And we can't get a third AI to adjudicate between the two.
We can't bring in Claude C and say, "Hey, Claude C, is Claude A right or is Claude B right?" It's still non-deterministic, because Claude D might come in and say, "Hey, Claude C is not right either." So it's turtles all the way down.
They're all non-deterministic.
So if we have these differing outputs, what value is in them?
What value is there in these differing outputs?
The value exists only in your own ability to interpret those results meaningfully because they're founded on correct inputs, assuming you put correct inputs in, they're founded on correct inputs.
And if you've ever used AI for this sort of thing, you would know that usually it can be quite insightful.
Sometimes it can be wrong, but for the most part they can be quite insightful.
So how do we reconcile them?
Only with our own ability and our own knowledge.
You still need the experience therefore and you still need that understanding.
So in order to safely and meaningfully run the AI, we need a firm base of practical knowledge, an actual understanding of what it is telling us, so we can take what it's telling us and interpret it meaningfully.
We can't just blindly execute the results, because that's still direct decision-making, just with extra steps.
The novice trader does not blindly execute advice because they are careless, and that's not just advice from AI. Maybe it's from a YouTube video. Maybe they saw somebody on Instagram give that piece of advice, and they blindly execute it because a source of authority told them to.
The novice trader blindly executes advice because they have no basis on which to disagree.
The novice trader doesn't know what they don't know. Nor does an expert trader, but the expert trader obviously knows more than the novice trader, so they have more of a basis on which to disagree, on which to criticize the outputs of the AI.
So how do we get that basis on which to disagree with the AI in the first place? How do we make that foundation of knowledge?
How do we form that basis of judgment?
We form that basis of judgment by doing the work ourselves, by making the mistakes ourselves. When you make a mistake and you have nobody to blame but yourself, that is when you really learn. I'm sure we have all made mistakes and we pay for them dearly sometimes and that's when we really learn.
But we also learn by finding the successes ourselves, by tinkering with things, reflecting on our decisions and finding the successes from that ourselves and by solving and fixing the problems that we come across ourselves by really thinking about them and engaging with the problems and the successes too, reflecting on successes. It's not all losses.
Sometimes we win. So there.
We form that basis of judgment by doing the work ourselves.
So what happens when we let the AI do the work for us? Well, we never get to do the work that builds the judgment.
Imagine you are a novice trader coming into trading right now.
The first thing that a lot of novice traders are going to do, they're going to open up Gemini, they're going to open up ChatGPT, they're going to open up Claude or Mistral or whatever.
They're going to ask it, "How do I trade?
How do I start trading?" And it's going to tell them exactly what to do and likely they're just going to go and do that.
And then when something goes wrong, they're going to ask Claude or ChatGPT, why did that go wrong?
And ChatGPT is going to tell them and then they're going to do whatever ChatGPT's solution is.
So by delegating completely to the AI, you lose your ability to grow.
You lose the ability to gain those skills that we need to form that basis of judgment, which in turn makes you incapable of supervising the AI because that foundation of knowledge was the predicate to running it in the first place. It's a paradox.
We can't gain the knowledge if we let the AI do all the work for us.
The more you let AI's findings stand in for your own judgment, the less judgment you will build until you can no longer tell whether the AI's findings are any good at all.
So from that, we can say that the trader good enough to run an AI is the trader who never needed one in the first place.
So then we've still got this miraculous new technology.
It's still meaningful. It still gives us useful insights.
Other people are using it successfully.
Some of us here are probably using it successfully.
So how do we use that AI without losing our edge as a trader?
How do we use it without becoming dependent on it?
Because we're not talking about dependency in the form of a business process anymore.
We're talking about dependency by means of surrendering your whole growth as a trader, your whole path to becoming a better trader, your whole means of improvement.
Making yourself reliant on something is not empowerment.
The more you outsource that decision-making, the more you'll lose your ability to make decisions at all.
So how do we use AI without losing our edge as a trader, not our strategy edge, not our mathematical edge, our human edge, our edge of knowledge as a trader. It's the difference between reliance and augmentation.
Reliance is making your decisions and systems dependent on AI.
Augmentation is using the AI as a tool to empower or enhance your trading without surrendering your ability to the AI or to the process, because the more you let the AI's findings stand in for your own judgment, the less judgment you will build.
So how do we avoid falling into the trap of reliance?
Here's four rules to avoid the trap.
Rule number one, build tools, don't be one.
Use AI to build something that runs without it, without the AI, like the member's problem earlier.
To rely is to get the AI to do the work for you directly.
To augment is to get the AI to help you build tools to improve your workflows. By your workflows, I mean workflows that you actually need and you actually use.
Don't build stuff for the sake of it. You can all build cool tools now, but just because you can doesn't mean you need to. You don't need to overcomplicate things for the sake of it.
Rule number two is to use it as a research assistant, not a decision maker.
Visualization, exploration, but don't let it pull the trigger.
To rely is to say, "Hey, Gemini, what is the solution for the problem I have?
Give me the solution." To augment is to say, "Hey, Gemini, what are all the solutions?
How have other people solved this problem?
What are the possible options? Is there anything I haven't considered?
Is there a better solution?" And then to engage with that, not to just take its best solution, you need to make the judgment yourself.
Rule number three is to use it when it's cheap to be wrong and valuable to be right. Ideation, prototyping, not execution.
To rely is to say, "Hey, ChatGPT, build me a completely finished trading solution that's ready to trade right now.
Test it for me." To augment is to say, "Hey, ChatGPT, here's an idea I've got.
Build me a prototype of that idea. Help me get there faster.
Help me explore the idea." Because I'm sure all of us here have wasted time investigating a bad idea.
I know I definitely have.
So one great thing about AI is that we can realize our ideas are bad faster, but sometimes the ideas are good. They're not all bad.
Rule number four is the disappearance test. If this AI vanished tomorrow, would my trading break?
And I don't mean merely from a process point of view, not merely if I don't have access to ChatGPT, will I be able to change the format of my CSV? I mean, if I don't have access to Claude, will I be able to make the decision I need to make at all?
Would your trading break if it's gone? If yes, then you've built a dependency.
So to rely on AI is to outsource the decision-making.
To augment is to use it as a tool to empower yourself.
Somebody once said that a computer can never be held accountable.
Therefore, a computer should never make a management decision.
Perhaps they weren't talking about mere process accountability or business accountability.
Perhaps they were talking about the moment you lose the ability to answer for yourself at all, to account for yourself, full stop.
Can you explain your decisions fully? Do you understand them?
Thank you very much. That's my talk.
Thank you, Zach.
Thank you, Robert.
I think Zach's happy to take questions.
Yes. Happy to take questions.
Any questions in the room or questions online?
No questions in the room. Oh, there is one. Two questions.
Microphone, please.
Got some questions.
Got a question in the room just passing the microphone around.
Sure. Let me turn my volume up.
When you're ready. Zach, my name's Robert Grigg.
I'd like to thank you for what I think is an absolutely superb presentation.
Yeah. Thank you very much.
I've been heavily involved in AI over the last 12 months and I thought that everything you said fits exactly, that AI can be a fabulous aid, but it is an aid, not an answer.
Exactly. Exactly.
Thank you.
No worries at all. Thank you very much.
Another question in the room.
Oh, go.
Right. Okay. Thanks.
Thanks very much for the presentation. Yeah, just very insightful.
Now I've got the words deterministic and non-deterministic in my mind, and that explains a lot about AI and how I can and can't use it.
I have been using Gemini to help review trades and what I found was that I needed to upload my trading plan to Gemini so it could refer to that when it's coming back and giving me feedback, giving it guardrails on how to think, if you like.
What's been your experience on giving Claude or whichever AI you are using reference material so that it helps fine-tune the responses that you are getting back?
Yeah, so great question.
One of the most powerful things about the AI or LLMs, one of the biggest improvements that we're seeing with it is its ability to expand its context window and not just the context in what you've been talking to it with, but its ability to place your question within the context.
So if we think back to the example of the moon is whatever, if the AI knows it's speaking in the context of a children's book, it will say the moon, it's more likely to say the moon is full or the moon is bright.
But if it knows it's talking to an astronomer, then it will know to give a more technical answer.
So instead of just saying the moon is bright or the moon is full, it will say the moon is a planetary body that orbits around the Earth and other facts about the moon that I don't know.
So the ability to expand the context window of the LLM is what really gives it its power, and that's not just in the general training sense, how OpenAI or Anthropic actually train the model, but in how we can train it ourselves.
And that's why, for example, with RealTest, when Marsten started packaging the Claude instructions with RealTest, it suddenly never made a coding mistake ever again... Well, I haven't seen it make a RealTest coding mistake while I've been using the instructions from Marsten.
So the more context it has, the better the answer it's going to give you and the more specific answer it's going to give you. I know when I talk to Claude, sometimes it will give me ridiculous suggestions.
It will base decisions or suggestions off prior conversations we've had because I let it build the context window. If you let it build that through the CLAUDE.md file, for example (I'm not sure what the equivalent is with the other LLMs that are available), that's when it starts to remember and that's when it starts to become more insightful.
But a consequence of that is sometimes it can mix ideas up.
So I had it running some tests for me not that long ago exploring an idea of different position weightings essentially. And in a new conversation, because it had saved those results to memory, it started suggesting taking these other ideas that we'd done before and that we'd found really valuable and putting them into the new thing.
But I'd actually killed the old session because what it was coming up with was really useless, yet it was bringing those ideas across. So the ability to expand that context window is incredibly powerful, but it does need to be managed, because that is essentially its memory and we don't want to give it bad memory.
And I've forgotten what the original question was. Did I answer the question?
Yeah, but partially. It's really about the type of information that you might upload, I guess with RealTest, then you're looking at the instructions for the software.
But from a strategy point of view, giving the AI the actual strategy that it should be thinking about and nothing else.
So I have conversations with Gemini.
It's got the universe of stocks.
It's got what is the state of the market at the moment?
These kinds of things that it needs to think about and only think about when I have a chat with it, when I want to have it help me evaluate some trades.
Yeah.
So I think one of the important things when you're using it or using an LLM is to give it a goal, to give it an end goal that it needs to reach.
So if you are giving it a set of trades that are either open or that you are considering, and then you're giving it news headlines or that sort of context, I think it's also really important to give it a goal, what your aim with those trades is. So to expand that context window, like I was saying, whether that goal is to avoid losses or whether that goal is as high a gain as possible or to balance the two, if you want the best output, you need to give it a goal and you need to keep it within bounds as well. Yeah.
That's great. Thank you.
Thank you so much at all. Thank you very much.
I'm wondering if you can see the chat messages.
There are a couple of questions in the chat box.
One from Roy Williams asking if you've used AI to create the slides.
Yes, I did. To what extent?
I built the slides myself and the concept myself and refined it over time, but I'm not a designer.
I'm an engineer. I have a pretty terrible taste in things. So I gave the slides that I had made and also a transcript of the first run through that I'd done.
I gave that to Claude and I gave it with our company brand book.
So the brand book that our, I don't know, what do you call them? Brand designers, graphic designers had come up with for us a while ago, I gave that brand book to the AI and I said, "Can you make this look good for me please?" Because it looks terrible.
And now it looks nice, I think.
Okay. Very good. Thank you.
And Izzy has asked a question in the chat box.
I can't see the chat box. Oh, here we go.
Let me expand this. Here we go.
Izzy, does mixture of experts improve consistency?
Are there techniques to reduce non-determinism?
Is that two questions? Does mixture of experts improve consistency?
I'll answer the second question first.
Are there techniques to reduce non-determinism? Yes.
If you are running, yes and no.
On a browser-based LLM like ChatGPT or Claude or Gemini, which is out of your control, you won't be able to change those parameters.
Those parameters are set by Anthropic or OpenAI, whatever.
If you're running a local LLM, which is actually really easy if you want to try it, you can play with the parameters so you can turn the temperature all the way down to zero and make it essentially deterministic.
However, the answers will not be good.
It will lose that creativity that makes it useful.
It's incredibly frustrating to talk to a deterministic AI when you have spoken to a non-deterministic one because it does lose its edge as an AI. So there is a reason it is non-deterministic. And for the vast majority of use cases that non-determinism is good and it is useful. However, in a situation where we do need the outputs to be completely repeatable every time, that's when the non-determinism becomes a problem. Okay.
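As an illustration of the temperature point in this answer, here is a minimal Python sketch of how a sampling temperature changes the choice of the next word. The word scores are made up, the function name `sample_next_token` is hypothetical, and real LLM runtimes expose temperature through their own settings rather than through this code. Treating a temperature of zero as "always pick the top-scoring word" is a common convention and is the assumption used here.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick one token from a dict of token -> raw score (logit)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Softmax with temperature: lower values sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the peak for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Hypothetical scores for the word that follows "The moon is" (illustrative only).
logits = {"full": 2.0, "bright": 1.8, "rising": 1.0, "cheese": -1.0}

rng = random.Random()
greedy = {sample_next_token(logits, 0, rng) for _ in range(20)}
varied = [sample_next_token(logits, 1.0, rng) for _ in range(20)]
print(greedy)  # the same single token on every call
print(varied)  # typically a mix of tokens that changes between runs
```

At temperature zero the same input always gives the same word, which is the repeatability described above, at the cost of never picking the less likely but sometimes more interesting alternatives.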
Does that answer the question?
We can come back to the first question if you want.
And Scott has asked a question too in the chat box.
Cool. Scott asks, "Have you noticed any quality differences between different AI models, Claude, Gemini, Codex, especially with respect to trading system design?
I've been pretty dedicated to Claude, so I haven't done this analysis myself yet." So, I haven't used a huge array.
I am very much a Claude person if that wasn't obvious through that presentation.
I do believe Claude is, especially for coding, the best on offer. ChatGPT way back used to be the best. These days, I think it's terrible. Well, not terrible, it's limited. It hasn't aged.
I have seen people do very impressive things with Gemini, but I haven't used it myself because Claude has been able to do everything I wanted it to do. And for the most part, so has ChatGPT, but I just think Claude is less pedantic and less of a sycophant than ChatGPT is. I've also used Mistral, that's the local LLM that I was talking about. So I self-hosted that and I've used DeepSeek as well for that. I didn't find them as good, but that's not a fair comparison because it was hosted on a local computer.
So it's not backed by a huge data center, that's not a fair comparison. So I'm very much Team Claude for the time being.
If Claude somehow deteriorates, I'd be happy to change from it. So yeah, any other questions?
Zach, thank you. We've got a question in the room here from Keith.
Keith, he's sitting right next to me.
Keith.
Hello, Zach. Keith, maybe Keith. Hey, Keith. Yeah, thanks for an excellent presentation. It was really good.
I use AI.
I use ChatGPT every day of the week and I use it a lot of the time to ask questions about options.
I just thought the other day that I really should learn to do this myself because I thought if this wasn't available, I wouldn't really know how it's coming to these conclusions.
And so you've pointed that out tonight and so tomorrow I'm going to start and ask it, well, how did you work that out so that I'm not dependent on it?
So look, thank you for that. That was excellent.
No worries at all. And thank you for your presentation too, Keith.
It was very good. Very interesting. Very true.
I think we're running out of questions here and online. Oh, Jim's got one.
Bear with us. Jim's got the microphone.
Well, Jim here.
Hey, Jim.
You mentioned that non-determinism is an issue in some regards, a benefit in other regards.
The fact that sometimes it can give you great answers and sometimes give you terrible answers, to me, seems not a good property of the system.
Given that the AIs apparently seem to improve in performance every few months, double in performance every few months, according to some measurements, do you see a time in the not-too-distant future where that problem will not be as much of a problem anymore?
Well, I think it's a problem of what we're expecting from the AI.
So the non-determinism is great when we want, for example, creative writing.
If you want to get the AI to write a newsletter for you or write a book for you, then the non-determinism is great for that because the writing doesn't sound as derivative.
But if it is deterministic and it's the same output every time, that's when you start to get cliches and you start to get tons of em dashes and the weird sycophantic phrasing that we all associate with AI now. So the non-determinism lets it stray from that path. It lets it feel more human as it becomes less predictable, not just in its outcomes, but in its turns of phrase.
When you've got a creative writing task, we don't need to revalidate it.
We don't need the creative writing to be repeatable.
It doesn't matter. If a sentence is slightly different, it doesn't really matter.
When it's giving us instructions though and we're following those instructions, then the phrasing really does start to matter because it's not just a matter of hallucination or coming up with things. And by the way, the non-determinism is why it hallucinates as well.
That randomness is why it just makes stuff up sometimes.
But when we start asking the AI to prioritize things for us or to rank things or to give some sort of qualitative or quantitative outcome, that's when it becomes the problem and that's also when we need it to be repeatable. So can I see it getting better? Yes. For another reason as well: you may also know that AIs can use tools now.
AI can run code, and the reason the AI can run code is because it has the ability to give the tool instructions, but then also to look at the output of the tool and compare it against a desired outcome or a goal, which is why a goal is important again for the AI when you're using tools with it. So the better it gets at that tool use, or the better it gets at knowing what the outcome should look like, the better it should be at operating within those contexts.
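The tool-use loop described here (give the tool instructions, look at the output, compare it against a goal) can be sketched in a few lines of Python. Everything in this sketch is a stand-in: `run_tool` is a made-up scoring function rather than a real backtest, `propose` is a trivial rule rather than a real model, and the numbers are illustrative only.

```python
def run_tool(lookback):
    """Stand-in for a real tool such as a backtest; returns a made-up score."""
    return -((lookback - 60) ** 2) / 100 + 10  # peaks at 10.0 when lookback is 60


def propose(history):
    """Stand-in for the model: suggest the next input based on past attempts."""
    if not history:
        return 20
    last_input, _ = history[-1]
    return last_input + 10


def agent_loop(goal_score, max_steps=10):
    """Propose, run the tool, and stop once the output meets the goal."""
    history = []
    for _ in range(max_steps):
        candidate = propose(history)
        score = run_tool(candidate)
        history.append((candidate, score))
        if score >= goal_score:  # compare the tool output against the goal
            return candidate, score, history
    return None, None, history  # the goal was never reached


found, score, history = agent_loop(goal_score=9.0)
print(found, score, len(history))

missed, _, attempts = agent_loop(goal_score=11.0)
print(missed, len(attempts))
```

The loop only terminates usefully because `goal_score` is a number that can be checked. With an unreachable or undefined goal it just runs out of steps, which is a toy version of the goal problem discussed in this answer.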
The problem with trading though is what does a good signal actually look like? What is a perfect setup?
The answer to that is obviously debatable and there's no one answer to that.
So if the AI can back itself with the answer of what a perfect signal actually looks like, or if you can provide the AI with what a perfect signal looks like, then that's when it would become useful. Unfortunately, we probably, well, I definitely don't know what a perfect signal looks like and you probably don't know either. If anyone does know, feel free to let me know.
But until we can quantify what a good outcome looks like, then the AI doesn't know what it's aiming for.
So it's the problem of the goal again.
But if we also know what the perfect outcome looks like, then we know how to find it without the AI in the first place.
So why do we even need the AI?
So again, it's a bit of a paradox.
Did I confuse things or does that answer the question?
No, I think Jim's happy with that. Okay, that's good.
I think we're out of questions, and it's almost time to wind up anyway.
Zach...
Go ahead.
Thank you very much for your time and effort putting that together.
As a couple of our people here have said, and there are comments in the chat box, that was a very good presentation and you've certainly opened our eyes in terms of the usefulness or otherwise of AI these days.
And as you've suggested there, and Jim suggested as well, AI is changing almost weekly and some of these AI models are leapfrogging each other in terms of what they can do or can do better. So that was a really good eye-opener. Zach, thank you again for your time and effort, and everyone here in the room and online, thank you all for your participation.