Knowledge graphs multiply the value of enterprise data by capturing relationships between data points, which enables three key benefits: connecting data across silos, improving retrieval strategies like RAG (Retrieval-Augmented Generation), and revealing hidden patterns such as loops and critical dependencies that are invisible in traditional table-based data representations. Graph databases offer flexibility in data modeling, superior performance through navigational storage, and enhanced explainability compared to vector-based AI approaches, making them essential for bridging the gap between symbolic and sub-symbolic AI in enterprise environments.
Knowledge Graphs: The Strategic Foundation for AI-ready Data in the Enterprise - GT Pharma 2025
Thank you everyone for joining us today.
It's really great to be here. Thanks to the team for the invitation.
Alex, Conrad, Pa and all the rest of the team. It's really great to be here in Munich opening this session today. I was actually watching the news this morning and was flattered to hear them say it's going to be the hottest day of the year. I thought, "Neo4j is in town," but it was about the weather, of course. Anyway, it's great. I think they've managed to get an amazing lineup of presenters, so it's only going to go uphill from my presentation here. I'll spend these next 30 minutes talking about graphs. Of course, I know there are loads of experts in the room, but there are also people who are relatively new to it. So I thought, let's try to level set a little bit and introduce the concept of graphs.
I will do it in the context of AI, of course, because I believe that graphs can be your big advantage, in the sense that they will multiply the value of your data. That might sound a little bit like a marketing slogan, but I'll explain what I mean by that, because I think there's genuine value in it. And I think it's very timely, because now is the time to make your data AI ready. I'm sure you've heard the expression from the likes of Gartner and others. We don't control the speed at which LLMs are evolving, right? That's only going to keep accelerating: they're going to get better, they're going to get more efficient. But what we do control is the state of our data, and that's what I mean by getting your data AI ready. It's our responsibility to get the most value out of it, or at least all the value that this data holds. Normally, when you're in the data space, it's common to take a cautious, conservative approach and say: take it easy, slow down, wait till these models get more stable, more reliable. You're not going to get anything like that from me today. I think it's the opposite.
I think it's time to accelerate. Of course, we will put guardrails around this technology.
We'll make sure that we ground the LLMs with trusted sources of data, but we will do all that in the same way we use the brakes in our car: not to slow us down, but to help us go faster while doing it safely. So that's my thinking around this situation. I'm also convinced that the future of AI is going to be the result of the convergence of the two main types of artificial intelligence out there. There's the most popular one these days, represented by large language models and generative AI, which is the more opaque, more implicit kind: that's the sub-symbolic AI, represented by neural network-based systems. But then there's the other type of AI, which is more transparent and more explainable, represented by knowledge representation systems like knowledge graphs. So if I manage to convince you of this and we agree by the end of my presentation, I hope you'll leave this session with the conviction that you need to have both fronts covered, the two types of AI, in order for your AI strategies to be successful. So that's my plan. We have a lot to cover, so let's get started, and I'll start with a bit of an intro. I've spent the last few years working with organizations like yours, helping them adopt graph technologies. I was talking about getting all the value out of your data, and one of the things I've observed is organizations ignoring a good part of the value in their data by ignoring a critical aspect of it. That critical aspect is the relationships, as in the connections between the data points.
These are the references between the records in your data warehouses, the references between the documents in your document databases. These relationships capture the context and, to some extent, the semantics of your data. At Neo4j we believe that these are as valuable as the data points themselves, if not more. And we believe that because you get three things out of these relationships in your data. The first thing that you get is connections.
Relationships build bridges across data silos, so you get connected data. The second one is that they're going to make the process of retrieving information a lot better. And retrieval is the R in RAG these days.
I'm sure you've all heard about RAG, unless you've been living under a rock.
It's the way that AI systems get and use data from your enterprise systems. Hold that thought, because I'm going to talk a lot more about it: we're going to see how graphs augment and improve this process of retrieving information from your enterprise data. And finally, the third thing that you get from relationships is that, when analyzed at scale, they reveal patterns in your data that are extremely important to your business. These patterns are nothing but shapes in your data, and I'm going to give you some examples, because a shape can be a ring, right?
A ring as in a loop in your data. A ring can represent a case of fraud in scientific research: the kind of situation where there are cyclic citations in articles. That can be done by teams or individuals that want to artificially inflate their impact metrics, or that want to create some perception of consensus around certain topics to get benefits in a fraudulent way. So that's a ring. And think of trying to identify a ring: if your data is represented in a flattened way, like the one that Alex showed a minute ago, it's not an easy one, but if your data is represented as a graph, it's going to be more natural.
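To make the ring idea concrete, here is a minimal sketch of finding a citation loop with a depth-first search. It is not taken from the talk: the paper names and the `CITES` mapping are made up, and a graph database would express the same search as a query rather than hand-written code.

```python
from typing import Dict, List, Optional

# Hypothetical citation data: each paper maps to the papers it cites.
CITES: Dict[str, List[str]] = {
    "paper_a": ["paper_b"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_a"],   # closes the ring a -> b -> c -> a
    "paper_d": ["paper_a"],   # cites into the ring but is not part of it
}

def find_ring(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one citation ring as a list of nodes, or None if there is none."""
    visiting: List[str] = []   # the current depth-first path
    done = set()               # nodes whose outgoing citations are fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in visiting:   # we came back to a node on the current path
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for cited in graph.get(node, []):
            ring = visit(cited)
            if ring:
                return ring
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        ring = visit(start)
        if ring:
            return ring
    return None

ring = find_ring(CITES)
print(ring)
```

The function stops at the first ring it finds; listing every ring would need a different algorithm.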
We'll see that in a moment. Another shape can be a star, or a kind of hourglass: a shape where things converge into a critical point. That can represent something like a critical manufacturer producing a component that's critical for some drug, and you have a number of things depending on it. If something goes wrong there, some kind of contamination, some kind of geopolitical situation, that can have a massive impact. So you want to identify this kind of dependency on critical points, and that's another shape. It can look like a star, it can look like an hourglass. And it's another one that's hard to identify if your data is in the form of tables.
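As a rough illustration of the star or hourglass shape, the sketch below counts how many things depend, directly or indirectly, on each node once the dependencies are held as a graph. The supply-chain names and the `DEPENDS_ON` mapping are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical supply data: each item maps to the things it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "drug_x": ["component_1"],
    "drug_y": ["component_1"],
    "drug_z": ["component_2"],
    "component_1": ["manufacturer_m"],
    "component_2": ["manufacturer_m"],
}

def transitive_dependents(graph: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """For every node, collect everything that depends on it directly or indirectly."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for source in graph:
        stack = list(graph[source])
        seen: Set[str] = set()
        while stack:
            target = stack.pop()
            if target in seen:
                continue
            seen.add(target)
            dependents[target].add(source)
            stack.extend(graph.get(target, []))
    return dependents

dependents = transitive_dependents(DEPENDS_ON)
# The node with the most dependents is the point where everything converges.
critical = max(dependents, key=lambda node: len(dependents[node]))
print(critical, sorted(dependents[critical]))
```

Counting dependents is only one possible measure of criticality; centrality algorithms are another option.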
I'm not saying it's impossible. We do that: you have to load the data into memory, you have to kind of rebuild the graph, and it will be a lot of lines of Python. But there's a more natural way to do it, and that's what we're going to see here. Essentially, none of the existing systems in your architecture is going to help you uncover these shapes in your data and translate them into value. So a fundamentally new approach is needed, and I'm here to introduce your data's new best friend, which is of course the graph. A graph is simple and intuitive, but at the same time it's really powerful and expressive. What you see there is a very simple data set with all the elements in the graph model. In a graph model you have two main building blocks. Oh, that's pretty cool. There you go. There are nodes, the circles, which represent things: entities like people, objects, locations, events. And then there are relationships that connect them to one another.
Both nodes and relationships can be enriched, can be characterized, with properties. You see how a node that represents a person has a collection of properties there: the name, the date of birth, the employee ID. What you see there doesn't really need an explanation, right? You see that there's an individual that works for a company.
It's a particular employee, the CEO, and the company is headquartered in a city.
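The building blocks just described (nodes, relationships, and properties on both) can be sketched in a few lines. This is an in-memory illustration only, not how Neo4j stores data; the class names, relationship types and all property values are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Node:
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)

# All names and values below are invented for illustration.
person = Node("Person", {"name": "Jane Doe", "date_of_birth": "1980-01-01", "employee_id": 1})
company = Node("Company", {"name": "Example Corp"})
city = Node("City", {"name": "Munich"})

graph: List[Relationship] = [
    Relationship(person, "WORKS_FOR", company, {"role": "CEO"}),
    Relationship(company, "HEADQUARTERED_IN", city),
]

def describe(rel: Relationship) -> str:
    """Read a relationship as a sentence, the way one reads a whiteboard sketch."""
    return f'{rel.start.properties["name"]} {rel.rel_type} {rel.end.properties["name"]}'

for rel in graph:
    print(describe(rel))
```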
I'm saying that this doesn't really need explanation because that's the way we think about any particular domain. If you get together with your team to discuss any subject, or even if you go into data modeling, you will probably go to a whiteboard and sketch which are the critical elements and how they relate to each other. So it's a natural way of representing things. Now, I'm sure some of you in the room might be saying: he called that a data set, but that looks more like a visual representation of the data. And you would be right in both cases, because what we're doing at Neo4j is inverting the paradigm. For a very long time we've been storing data the way you saw it in Alex's slide, in tables, and then putting a lot of effort into presenting it like this so that our business can make sense of it, understand it and reason on it. At Neo4j we're inverting that, because this model is the one that we use to store, persist, query and analyze the data. That's exactly the shape of the data as we store it. When you model and store data like that, and you pair it with a pattern-based structured query language, Cypher, which looks something like that, then surfacing the type of patterns, the type of shapes that I was talking about, becomes trivial. I'm giving you an example here. I don't expect you to be experts in Cypher, but I hope it's understandable. I'm using the same color coding: the red nodes are individuals and the blue nodes are companies. What I'm showing there is the logic that LinkedIn or any other professional network would probably use to send you a recommendation on who you should be following or which companies you should be interested in.
What it's doing is this: the exploration starts at a node that represents you, down here. It navigates your working history, which companies you've worked for over time, and then which individuals you've worked with, your ex-colleagues from the past. Then we can keep navigating and find which companies they work for now. And there will be patterns, right?
You see that there are two or three that converge on a given company. This blue node represents a company where three of your ex-colleagues, sorry about my pointing here, are working at the moment. I understand it's not rocket science, but it's an interesting pattern, and it might be interesting for you to know that three of your ex-colleagues are now working for Snowflake. You might want to follow this company or connect with them. And that's achieved in a very concise fashion, with a pattern that says: start navigating from an employee, that's you, and navigate the works-at relationship up to three levels. I've decided to cut it at three steps away. Then just aggregate: the number of common paths that take you to that company is the strength of the recommendation. So it's a pretty simple expression to capture a reasonably sophisticated logic. That's what I mean by having data represented as a graph and having a structural language that leverages these structural patterns, these structural features, in your data.
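The Cypher query from the slide is not in the transcript, so the following is only a plain-Python approximation of the recommendation logic described above. The people, the companies, and the split into `WORKED_AT` (past employers) and `WORKS_AT` (current employer) are assumptions. It also simplifies the scoring: it counts distinct ex-colleagues rather than the number of paths.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical data. WORKED_AT holds past employers, WORKS_AT the current one.
WORKED_AT: Dict[str, Set[str]] = {
    "you": {"acme", "globex"},
    "ana": {"acme"},
    "ben": {"acme", "globex"},
    "cho": {"globex"},
    "dev": {"initech"},
}
WORKS_AT: Dict[str, str] = {
    "you": "umbrella", "ana": "snowflake", "ben": "snowflake",
    "cho": "snowflake", "dev": "hooli",
}

def recommend_companies(person: str) -> List[Tuple[str, int]]:
    """Rank companies by how many ex-colleagues of `person` currently work there."""
    history = WORKED_AT.get(person, set())
    ex_colleagues = {
        other for other, past in WORKED_AT.items()
        if other != person and past & history   # shared at least one past employer
    }
    strength = Counter(WORKS_AT[c] for c in ex_colleagues if c in WORKS_AT)
    strength.pop(WORKS_AT.get(person), None)    # do not recommend the current employer
    return strength.most_common()

print(recommend_companies("you"))
```

In this toy data three ex-colleagues converge on one company, which mirrors the pattern on the slide.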
Good. So I've talked mostly about the expressivity of graphs, but it's worth spending a couple of minutes on the platform behind this model. That platform is Neo4j, and I want to talk about three characteristics of it. The first one is flexibility. In Neo4j we don't need to define the structure of your data up front. You can establish some constraints, but you don't need, as in the case of relational databases, a perfectly clear and rigid definition of the structure of your tables and the data types. If you think of relational databases, once you've defined a table, you can only insert rows that align exactly with this structure, in terms of data types and required fields. And if you want to make that evolve over time, you have to go through migrations. If you've been in that space, you know that data migrations in the relational world are not the nicest thing to go through, and they don't particularly add flexibility to the process. Especially in the early development phases, it's important to be able to have your model evolve over time and grow organically as you keep adding sources or changing it. This flexibility is native in Neo4j, in the sense that as long as you describe data in terms of nodes and relationships, you're good. You don't need to formally or rigidly define what the shape of your data is. Another conversation would be whether it's useful to do that in some cases, and we'll see that it is: it's sometimes good to overlay some form of semantic description of your data. But you have this flexibility, which introduces agility and accelerates the development phase. The second one is the performance.
I was saying that when we invert the paradigm and store the data as a graph, we don't have tables. We don't persist data in tables. What we have is, let's call it, a kind of navigational storage, where the pointers, the connections between nodes, are persisted on disk and then mapped to memory. Compare that with the way relationships are managed in the relational world, where they actually are not: it's done through foreign keys. If you have a connection between a customer and a product, what you have in a table is a reference that matches a value in another table, and you have to scan these tables, these indexes, find a match, and then determine that this customer purchased this product. So you have to work these relationships out at query time. In a graph you don't have to, because these connections are persisted. What you're effectively doing when you navigate the graph is chasing pointers, which is a lot more efficient, and that's where you get the performance improvement, the speed in deep traversals. So that's the second characteristic that I think is worth mentioning. And then the third one is the predictive nature of graphs. Typically, when we build a predictive model, we look at features of individuals.
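The contrast between working relationships out through foreign keys and chasing stored pointers can be shown with a toy in-memory example. It says nothing about Neo4j's actual storage format or real-world timings; the tables, names and the `Node` class are invented for illustration.

```python
from typing import List, Tuple

# Relational style: the relationship lives in a separate table of key pairs.
customers: List[Tuple[int, str]] = [(1, "alice"), (2, "bob")]
products: List[Tuple[int, str]] = [(10, "laptop"), (11, "phone")]
purchases: List[Tuple[int, int]] = [(1, 10), (1, 11), (2, 11)]  # (customer_id, product_id)

def purchased_by_join(customer_name: str) -> List[str]:
    """Work the relationship out at query time by matching key values."""
    ids = [cid for cid, name in customers if name == customer_name]
    product_ids = [pid for cid, pid in purchases if cid in ids]
    return [name for pid, name in products if pid in product_ids]

# Graph style: each node holds direct references to its neighbours.
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.purchased: List["Node"] = []

laptop, phone = Node("laptop"), Node("phone")
alice = Node("alice")
alice.purchased = [laptop, phone]

def purchased_by_traversal(customer: Node) -> List[str]:
    """Follow the stored references; no matching step is needed."""
    return [product.name for product in customer.purchased]

print(purchased_by_join("alice"), purchased_by_traversal(alice))
```

Both functions return the same answer; the difference is that the first scans and matches keys on every call, while the second only follows references that already exist.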
I like to quote here a book by Professor James Fowler called Connected. He takes a couple of examples that are quite interesting: he tries to predict whether an individual is going to vote in the US, or whether they're going to suffer from obesity. He builds a predictive model, and it's interesting that he gets a much more robust and better prediction if the focus is not only on the individual's features but also on their neighborhood, as in their family, friends and colleagues. Even more mind-blowing is the fact that if you extend that network to friends of friends, the prediction becomes even better. So again, having a graph representation and being able to extract these structural features easily and then inject them into your machine learning pipelines is another great advantage of graphs. That is the third characteristic. We like to think of it as what's important, what's unusual, what's next in the graph.
But the idea is that we can leverage these structural features and make use of them in a really efficient way.
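As a small sketch of what extracting structural features could look like, the code below builds a feature row from an individual's own trait plus the share of that trait among people one and two friendship steps away. The friendship graph and the `VOTED` values are made up, and this is not the model from the book.

```python
from typing import Dict, Set

# Hypothetical social graph and a binary trait per person (1 = voted last time).
FRIENDS: Dict[str, Set[str]] = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob"},
}
VOTED: Dict[str, int] = {"ann": 0, "bob": 1, "cat": 1, "dan": 0}

def neighbourhood(person: str, hops: int) -> Set[str]:
    """Everyone reachable within `hops` friendship steps, excluding the person."""
    frontier, seen = {person}, {person}
    for _ in range(hops):
        frontier = {f for p in frontier for f in FRIENDS.get(p, set())} - seen
        seen |= frontier
    return seen - {person}

def features(person: str) -> Dict[str, float]:
    """Individual feature plus the share of voters within one and two hops."""
    def share(group: Set[str]) -> float:
        return sum(VOTED[g] for g in group) / len(group) if group else 0.0
    return {
        "self": float(VOTED[person]),
        "within_1_hop": share(neighbourhood(person, 1)),
        "within_2_hops": share(neighbourhood(person, 2)),
    }

print(features("ann"))
```

A row like this could be fed to any classifier alongside the usual per-individual features.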
Then the second part. We've talked about these characteristics of the platform, but there's also the idea of capturing semantics, capturing the meaning. There are two main approaches to it, and I hinted at them in the opening. There's what I call the more opaque, less explicit one, the sub-symbolic approach, which is the one represented by neural systems. The way we capture meaning in this space is by using a vector representation, this idea of an embedding. You're probably familiar with it. An embedding is a numeric representation that somehow captures the essence of something. It can be a word, a paragraph, a video, an audio clip, anything. You pass that to a model and it gives you a vector that captures, like I say, somehow magically, the essence of it. What's interesting about this representation is that it acts as a proxy for semantic similarity. These are vectors, so you can compute the distance between them, and if two things are close to each other in the vector space, it means that their meaning is closely related. That's the interest of this kind of representation. But then there's the other one, more symbolic, more explicit, more transparent, which captures the same.
Here what we're trying to do is, say, capture the notion that an apple and an orange are both types of fruit.
In a graph I would put something like that. I would say these are two entities, or two categories, and they are subcategories, and we can put them in a hierarchical representation under the notion of fruit. That's great. That's something that I can understand. On the right-hand side, yes, it's kind of built into that vector and it happens to work, but I don't really get it. It's not for human consumption, let's put it like that. Now, when we have these two representations, we can do comparisons. Like I was saying, if we compare two vectors, and there are two main ways of comparing them, Euclidean distance or angular distance, proximity in the vector space means semantic similarity. That's great. It's amazing. We just have to trust the model that builds these vectors. Hard to explain, but it works. On the left-hand side you see the equivalent similarity computation, but in the graph space, and we have a collection of algorithms that do that for you. It can be based on the fact that two nodes have a strong common neighborhood: that's what Jaccard and overlap similarity are based on.
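The four measures named above have short standard definitions, sketched here with the standard library only. The two-dimensional "embeddings" and the neighbour sets for apple and orange are made-up values chosen so the arithmetic is easy to follow; real embeddings have hundreds of dimensions.

```python
import math
from typing import Sequence, Set

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared neighbours divided by all neighbours (sets must not both be empty)."""
    return len(a & b) / len(a | b)

def overlap(a: Set[str], b: Set[str]) -> float:
    """Shared neighbours divided by the size of the smaller set (must be non-empty)."""
    return len(a & b) / min(len(a), len(b))

# Vector space: made-up embeddings.
apple_vec, orange_vec = [3.0, 4.0], [4.0, 3.0]
print(euclidean_distance(apple_vec, orange_vec), cosine_similarity(apple_vec, orange_vec))

# Graph space: made-up neighbourhoods of the two nodes.
apple_nbrs = {"fruit", "tree", "red"}
orange_nbrs = {"fruit", "tree", "citrus", "peel"}
print(jaccard(apple_nbrs, orange_nbrs), overlap(apple_nbrs, orange_nbrs))
```

With the graph measures, the shared neighbours themselves (here `fruit` and `tree`) are the explanation for the score.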
If your data is structured in some form of taxonomy or hierarchy, there are taxonomy-based similarity metrics. You can see some of them here. But the important thing on this side is that you can always explain the result, and it will be something like this: the apple is similar to the orange because they have a common ancestor, which is the notion of fruit. That's very explainable, and that's important, because we're going to see how explainability is key, especially in the interaction with LLMs. And finally, with these two ways of comparing come two ways of exploring the data and doing search.
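The talk does not name the taxonomy-based metrics on the slide, so the sketch below uses a simple path length through the lowest common ancestor as a stand-in. The point it illustrates is the explanation: the result always comes with the ancestor that justifies it. The `PARENT` taxonomy is invented and assumed to have a single root.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each category points to its parent (single root assumed).
PARENT: Dict[str, Optional[str]] = {
    "apple": "fruit", "orange": "fruit", "carrot": "vegetable",
    "fruit": "food", "vegetable": "food", "food": None,
}

def ancestors(node: str) -> List[str]:
    """The node itself followed by its parents up to the root."""
    path = [node]
    while PARENT.get(path[-1]) is not None:
        path.append(PARENT[path[-1]])
    return path

def explain_similarity(a: str, b: str) -> str:
    """Describe how two categories are related through their lowest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = next(n for n in path_a if n in path_b)
    hops = path_a.index(common) + path_b.index(common)
    return f"{a} and {b} share the ancestor '{common}' ({hops} hops apart)"

print(explain_similarity("apple", "orange"))
print(explain_similarity("apple", "carrot"))
```

Fewer hops means more similar: apple and orange meet at `fruit`, while apple and carrot only meet higher up at `food`.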
Search in the vector space works like this: your question, your search term, defines a vector, and you find anything in proximity to it.
It defines a kind of multi-dimensional sphere, and anything that falls in that space is semantically similar. In the graph, search is based on graph exploration, on graph traversals. The great thing is that these two approaches are very different but very complementary, and it's great that both are supported in Neo4j. That's not news, because it's been around for a couple of years now. And what's great is that in a graph you can store vectors. The model doesn't change, you see it here. You can add a vector representation as a property of a node, and that can be indexed. That's great, because now we can combine the power of vector search and vector similarity with that of the symbolic representation in the graph. We're going to see what that means in a moment, when we come to retrieval in AI applications. So both types of representation are present and available in Neo4j; the link to the document is there. Good. So this is not a vision. This is a reality.
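A rough sketch of the sphere idea, with the vector stored as an ordinary node property: everything within a given radius of the query vector counts as a hit. The nodes, the two-dimensional embeddings and the radius are made up, and the scan is brute force; a real vector index exists precisely to avoid checking every node.

```python
import math
from typing import Any, Dict, List, Tuple

# Hypothetical nodes; the "embedding" property is a made-up 2-dimensional vector.
NODES: Dict[str, Dict[str, Any]] = {
    "apple":  {"label": "Fruit", "embedding": [0.9, 0.1]},
    "orange": {"label": "Fruit", "embedding": [0.8, 0.2]},
    "truck":  {"label": "Vehicle", "embedding": [0.1, 0.9]},
}

def vector_search(query: List[float], radius: float) -> List[Tuple[str, float]]:
    """Return nodes whose embedding lies within `radius` of the query, nearest first."""
    hits = []
    for name, props in NODES.items():
        distance = math.dist(query, props["embedding"])
        if distance <= radius:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])

print([name for name, _ in vector_search([1.0, 0.0], radius=0.5)])
```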
I know it sounds amazing, but it's something that leaders in all industries are already embracing. 75% of the Fortune 100 are already using graphs, and that includes, as you can see, top retailers, top pharma companies, some of which are represented here today, automakers, banks, telcos, and the list goes on. But I don't know how I'm doing for time, because I don't have the timer here. How much? Ten more minutes?
Good. So, let's speed up. Let me skip these two; I thought they were meant to be hidden. Give me just one second, because I want to jump into the AI side of the presentation. I said that my thinking is that the future of AI is going to be the convergence of these two approaches.
And I'm going to try to explain why I believe that agents, or GenAI in general, do need graphs. I'm sure you've all seen this kind of contradiction. There's a huge promise about the value that this new technology is going to deliver. This is just a random report, from Bloomberg, where it's quantified at over a trillion in revenue in the next few years. But at the same time we keep hearing that organizations are struggling to get past the prototyping phase, to get projects into production and to get actual value from these deployments. So why is that? There are a number of reasons, but the key one is that sometimes there's a bit of misunderstanding about this new technology and the fact that it's non-deterministic.
It's hard to explain. That creates some skepticism among leaders, because it's hard to build systems that rely on something that cannot be explained, or on something that can behave in different ways in reaction to the same question.
So that's the key obstacle to adoption, I would say. And the reason, like I was saying before, is that we have no idea what happens inside these big models. We might get the impression that they can explain themselves. I'm sure you have all interacted with ChatGPT: you can ask a question and then ask it to explain how it came up with the answer, and it will produce an explanation. But it's important to understand that this explanation is itself generated as well.
So it's by no means a description of what happens internally to produce the answer. So: explainability, attribution.
These systems know lots of things, but they can't tell where they learned them from.
All these things create problems, especially in highly regulated industries, where it's not sufficient to provide answers; you have to explain where you got the information from. And this is a test. I know we're past the time when we used to do these types of jokes, but it's a test that I run every time there's a new release, and this one is from a couple of weeks ago. It's an example of hallucination, but it also shows the autoregressive nature of these systems, in the sense that when they get something wrong, the answer is used to continue the conversation and it gets worse and worse. I ask a simple question like: give me the European capitals that are south of Rome. It's very articulate: it tells me that being south of a place means having a lower latitude, which is perfect, and it gives me a list.
I happen to be from Madrid, so I said, "Well, I'm missing Madrid from that list." Madrid, of course, is not there. And it said, "Of course, you're correct. It's not there because it's not south of Rome." It gives me the latitudes, and basically it's telling me that 40 is bigger than 41. It's getting it obviously wrong, but it tries to explain: it feels like it's further south because it's very hot there, and blah blah blah. So it tries to convince you. In this case I understand what being north or south is, but in an area where I'm not an expert I would struggle to determine whether what I'm getting is real or not. This is a characteristic that we need to understand, and that's where the idea of grounding these systems comes from. We have to make sure that we can provide domain-specific knowledge, our enterprise knowledge, to these models, so that they can provide answers that are not just generated but grounded in information that we trust and that we have control over. That's the idea of grounding, and the answer, like I said before, and I'm sure you've heard about it, is RAG, or at least that's the one we are converging on as an industry. We believe that graphs can augment this approach in what we call GraphRAG. Just a refresher for those of you that are less familiar with this idea of RAG: GraphRAG is basically RAG where the retrieval path includes a knowledge graph. But the idea is pretty simple, right? Instead of having an application up here... Where am I pointing?
Oh, here, there we go. So here we have an application to which we can ask questions. Instead of having that application just talk to an LLM and produce answers, we're going to have an additional step where this application goes and retrieves some information from trusted sources.
It's a generalization of the approach we take when we upload a PDF to ChatGPT and then ask questions about that PDF. I don't want ChatGPT to give me random answers from the training data. I want it to generate answers based on this PDF.
Now, if you think of this at enterprise scale, there are loads of PDFs, there are loads of data sources. So I don't know up front which one is going to be relevant.
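The additional retrieval step can be sketched as two small functions: one that picks the most relevant trusted source for a question, and one that builds it into the prompt. Everything here is an assumption for illustration: the `SOURCES` texts are invented, word overlap stands in for real retrieval, and no model is actually called.

```python
from typing import Dict, List

# Hypothetical trusted sources; in practice these would be enterprise documents.
SOURCES: Dict[str, str] = {
    "hr_policy": "Employees accrue 25 days of annual leave per year.",
    "it_policy": "Laptops are replaced every four years.",
    "travel_policy": "Flights longer than six hours may be booked in business class.",
}

def retrieve(question: str, top_k: int = 1) -> List[str]:
    """Rank sources by word overlap with the question (a stand-in for real retrieval)."""
    words = set(question.lower().split())
    ranked = sorted(
        SOURCES,
        key=lambda name: len(words & set(SOURCES[name].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question: str) -> str:
    """Assemble the text that would be sent to an LLM; no model is called here."""
    context = "\n".join(f"[{name}] {SOURCES[name]}" for name in retrieve(question))
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"

print(build_prompt("How many days of annual leave do employees get?"))
```

Keeping the source name in the prompt is one simple way to let the final answer point back to where the information came from.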
How many minutes? One minute? Oh my gosh.
Oh, my apologies, I need a timer here. Anyway, the approach is: given a question, I need to find which are the relevant sources and retrieve them on the fly, and that's what the retrieval step is for. Then I build them into the answer, so that whatever the LLM produces is grounded on that information. So, very quickly, this is what it would look like. Without going into too much detail, these days one of the standard ways in which we offer access to sources of data is through MCP, the Model Context Protocol. You build that into something like, now, instead of ChatGPT, I'm using Claude. I ask the same question that I was asking before, and now Claude knows that it has access to a source that can answer that question, and this source is in this case Neo4j. What it does is translate that question into a Cypher query and retrieve an exhaustive list of capitals that match. So in this case, it's grounded in information that I have control over.
That's the French version of it, because I presented this a couple of weeks ago. But anyway, the idea is that we can still leverage the linguistic abilities of these systems while grounding them with a trusted database.
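To illustrate why a grounded lookup avoids the "40 is bigger than 41" mistake, here is a minimal sketch that answers the same question from stored data instead of generating it. The latitudes are approximate and the list of capitals is deliberately short, not exhaustive. The Cypher string shows what a generated query might look like; its `Capital` label and `latitude` property are an assumed schema, not the one used in the demo, and the string is not executed here.

```python
from typing import Dict, List

# A hypothetical query an LLM might generate; schema names are assumptions.
EXAMPLE_CYPHER = """
MATCH (rome:Capital {name: 'Rome'}), (c:Capital)
WHERE c.latitude < rome.latitude
RETURN c.name
"""

# Approximate latitudes in degrees north, rounded to one decimal.
CAPITAL_LATITUDE: Dict[str, float] = {
    "Rome": 41.9, "Madrid": 40.4, "Lisbon": 38.7, "Athens": 38.0,
    "Valletta": 35.9, "Paris": 48.9, "Berlin": 52.5,
}

def capitals_south_of(reference: str) -> List[str]:
    """Capitals with a lower latitude than the reference city, sorted by name."""
    threshold = CAPITAL_LATITUDE[reference]
    return sorted(c for c, lat in CAPITAL_LATITUDE.items() if lat < threshold)

print(capitals_south_of("Rome"))
```

Because the comparison runs on stored values, Madrid appears in the result without any room for a persuasive but wrong explanation.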
Now, I was talking about how graphs multiply the value of your data. The way they do it, and I'm going to go fast over this, is by giving you a rich collection of retrieval strategies.
Ultimately, these AI applications are going to need to retrieve data from your enterprise systems. If we offer a rich collection of strategies, we're going to be able to get the data in many different ways. I've shown briefly, in the example before, the possibility of generating structured queries. In that case it was Cypher, but we can also combine search based on the vector index that I talked about before, and contextualize that with navigation of the graph. We can do exclusively vector search as well. We can define fixed patterns. The idea is that we can offer a rich catalog of strategies that makes sure these systems can get all the value out of your data. I'm conscious of time, so I really don't want to go too much over, but the benefit of taking this approach is that we get three things. You get much better accuracy. It's easier to develop, because we saw that as a developer, as a data person, it's much easier to deal with the symbolic representation: I can understand a graph, whereas it's harder to understand a vector representation.
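One of the strategies mentioned above, vector search contextualized with graph navigation, could look roughly like this. The chunks, embeddings, and the `MENTIONS` and `RELATED` links are all invented; the sketch only shows the two-stage shape of the retrieval, not any specific product feature.

```python
import math
from typing import Dict, List

# Hypothetical document chunks with made-up embeddings, plus graph links to entities.
CHUNKS: Dict[str, List[float]] = {
    "chunk_dosage": [0.9, 0.1],
    "chunk_supplier": [0.2, 0.8],
}
MENTIONS: Dict[str, List[str]] = {
    "chunk_dosage": ["drug_x"],
    "chunk_supplier": ["manufacturer_m"],
}
RELATED: Dict[str, List[str]] = {
    "drug_x": ["component_1", "trial_42"],
    "manufacturer_m": ["component_1"],
}

def hybrid_retrieve(query_vector: List[float]) -> Dict[str, List[str]]:
    """Vector search picks the entry chunk; graph navigation adds its context."""
    best = min(CHUNKS, key=lambda name: math.dist(query_vector, CHUNKS[name]))
    entities = MENTIONS.get(best, [])
    context = sorted({n for e in entities for n in RELATED.get(e, [])})
    return {"chunk": [best], "entities": entities, "context": context}

print(hybrid_retrieve([1.0, 0.0]))
```

The vector step finds where to start, and the traversal step brings back connected facts that the vector match alone would not surface.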
And then we have the explainability element, which is an important addition. I'm going to stop here. I had some additional elements, but I'll share the slides with you and hand over. Apologies for that, but thank you very much. I hope that was a useful introduction. Thank you.