Data catalogs are evolving from traditional metadata management tools into AI-powered context layers that combine metadata, semantic models, business glossaries, and ontologies, continuously updated by both humans and AI agents. This transformation is driven by the need for data democratization, where business users (not just analysts) need to understand and access data through natural language queries. The key challenges in data catalog implementation remain adoption and maintenance, not technical complexity. Companies should focus on providing differentiated value and understanding specific customer needs rather than building generic tools, as the market becomes increasingly fragmented with AI lowering technical barriers.
Deep Dive
Why Snowflake Bought SelectStar - and What "Data Catalog" Means Now w/ Shinji Kim
Good to see you. How's things?
Hey Joe, great to see you, and thanks for having me here.
Of course. Yeah, you've been on before.
Yeah.
>> A long time ago. Yeah, you were running a company back then.
Yes, and at an earlier stage of the company, I think. Maybe 3 years ago. Maybe not.
>> 3 years ago. Yeah, 2 or 3 years ago. Some big things have happened since then with said company.
Sure. For those who haven't seen me before, I am Shinji Kim. I am the founder and CEO of Selectstar. Selectstar is a data catalog, governance, and metadata management platform. We were acquired by Snowflake in December.
So, it's been about 3 months since I've been at Snowflake. I am integrating our technology into Snowflake's Horizon catalog.
And we will be actually releasing this new version of the catalog soon.
No way.
That seems pretty fast.
It is fast. Yeah.
I wouldn't say everything is going to be available all at the same time.
It will be a slow roll of all the features that we have.
Also, there are a lot of existing Snowflake constructs, like tags or contacts, where we will just use what exists. So there's some of that transition as well.
But from an integration perspective, I would say it is quite fast. Snowflake has also acquired many companies in the past, and I think they now have a different approach to getting startups integrated faster.
Yeah, I've had a few friends over at Snowflake before, so that's good.
Yeah, I mean, if you think about how Snowflake was even when I started Selectstar 6 years ago, and the amount of features and capabilities that they've added since. We don't use the term data warehouse anymore. It's a data platform. It's not just storing the data, but ingesting the data, transforming the data, running machine learning models on top, and now running BI and AI workloads and building applications on top. It's pretty insane to be in this giant machine that has everything.
So, what was that like? If you can say, how did this happen? Did they just call you and say, "Hey Shinji, come join us. You're pretty awesome. See you on Monday." How does this work?
>> It doesn't work that way.
Okay.
Yeah, I mean, it would be very easy to make it happen that way, but it's not that straightforward.
Well, we've been a Snowflake partner for more than 3 years. We've been a premier partner of Snowflake. 80% of Selectstar customers were already Snowflake customers.
So, we've had relationships with the product team as well as their business team for some time. And then we got into deeper conversations regarding the future of data catalogs, Horizon catalog, and where they want to go in the market, and it just made sense for us to join forces.
Yeah, I guess the 80% number of customers, was that intentional on your part, or is that just sort of how...
No, it wasn't intentional. I think one of the things that Snowflake did well from the beginning is opening up a lot of their metadata. It's the easiest integration that any vendor can make.
I think this really helped Snowflake to have so many partners, like other SaaS platforms or data tooling platforms, that run on or integrate with Snowflake.
I would say the other piece is that the piece of infrastructure, or platform, that we fit into is primarily brought on and leveraged by data platform teams, data governance teams, and analytics engineering teams.
So, naturally they have Snowflake as what is usually the main platform. And don't get me wrong, out of those 80% of customers that we supported, I would say about 40 to 50% also ran big workloads on Databricks, too.
Interesting.
But I think Snowflake has been adopted as really the main solution for a lot of these companies because it's very easy to use, especially for BI analytics workloads.
Yeah.
If you think about why companies adopt data catalogs and want to bring in more data democratization, data observability, and data governance, it's because the data analytics team has done a lot of work to pull insights out of data. And as they are standardizing their data models and want to equip everybody else in the company to understand what the data means and find their own answers about data, this becomes more important. So, I think that is also partly why Snowflake was the most used platform out of all the integrations that we supported from Selectstar.
That makes a lot of sense.
Yeah, I used to be a Snowflake partner, too, but on the SI side. And it was almost too easy to use, I would say.
It's [laughter] like, do you really need an SI, type of thing?
Yeah, I mean, there's definitely some of that, but it's also easy in the sense that you can onboard yourself. I think it's a good mirror on your data chops. If you have good data modeling, for example, you're less likely to get in trouble with it, because it's easy to use. It certainly gives you enough liberty to do whatever you want, which is good and bad. And so, we've definitely seen teams where, because it's easy to use, it's like, let's do this crazy thing, and then they blow through all their credits. Like, oh, that wasn't a...
Right.
We shouldn't have done that. But yeah, I've been long-time friends with the Snowflake crew. Sometimes I make appearances at the summit down in your hometown of SF. So, it's a fun time. But it seems to be changing though, right? And I'd love your perspective on this. The term data catalog was traditionally a BI-focused thing. Now, obviously with AI, the whole world is sort of, I think, flipping on its head and the ground's moving. And now data catalog seems to be, at least with the people I talk to, pointing at something different or meaning something different, where it's kind of a data catalog, but not just for BI, but also increasingly for agents and I guess whatever comes after that. Is that what you're seeing as well?
Yeah, for sure. I was telling my team that we may not use the term data catalog as much after a couple of years.
Interesting.
Almost like how we don't call Snowflake or Databricks a database company.
Right.
Database and data warehouse are kind of becoming the old terms, right? Even though underneath, the core of where it started was a database. So that's how I see what data catalog will become in the next 3 to 5 years.
It will still exist and people will refer to it. It is a core concept that will remain. But would Gartner still call the market enterprise data catalog?
Mm.
>> Well, we'll see, because we're going to expect a lot more than just cataloging the data. The term data catalog, and where it comes from at the beginning, and I'm talking about, let's say, the '80s or '90s, is to inventory all enterprise data.
This is the one place you can go to search what you have inside the database, and it becomes an enterprise catalog because you have multiple data systems: not just one database, but different types of databases and tools that you're running. This is where IBM data catalog, Collibra, Alation, a lot of these companies were focused. The more modern stage of data catalogs, I think, was really introduced because companies are adopting a more data-driven nature and wanting to democratize their data. There is an explosion of data models and also of the BI, data visualization, and data analysis that's happening. It is important to standardize your definitions and models and also make sure that the meaning is translated correctly throughout the organization. And with AI, I think what used to be a data catalog is becoming what we're starting to see called in the market the context layer.
Mm.
Which is primarily built on top of metadata.
Whether that metadata is data that resides within the data catalog or metadata that you are generating on top of your existing data, for example because you're summarizing different files and putting specific attributes of every data set into a format that you want, you are preserving and leveraging that context. This context layer, which any agent, system, or tool can access and continuously update and maintain, I think is going to be kind of like the next-generation platform coming up.
Yeah.
Now, I'm curious what you think about this aspect.
>> Well, coming more from data engineering, I guess, and also as a practitioner and someone who's implemented a lot of the workflows of the data teams and companies.
>> Yeah.
>> It is an interesting time that we're in, because you mentioned context layers and stuff. Oh, my dog has a... um, so yeah, context has become the word du jour. I sort of poked fun at it the other week [laughter] with a few articles. But I think underneath all of that, there is a kernel of truth: we're going to have to start moving beyond the ways we've been doing things, the tools we've been using, and the way we've been describing our work. I think you're absolutely correct that the term data catalog will still exist in some form and it'll still be a catalog of data.
But it'll be different types of data, probably not just tabular data, I'm guessing, but other types, unstructured data, ML artifacts and all these other things that we traditionally sort of just said, "Well, we don't need to worry about that."
Right? And so I think the catalog will expand, but then there's just other things, too.
I'd love your take on this, to sort of reverse the question. Semantic layers have been around for a while in some manifestation, and that's becoming another term du jour: that we need semantic layers and ontologies and taxonomies and all this other stuff. And I'm like, you know, depending, maybe, maybe not, right? And knowledge graphs is the other one. And so I'm kind of wondering, okay, is there an intersection of all these things in the catalog layer? Where do these things reside? What I have noticed is that we've had the data world sort of over here doing its thing for ages. We've been doing basically the same thing since Bill Inmon invented the data warehouse back in 1980-something. The practices have evolved, the platforms have evolved, but we're still fundamentally doing the same thing: we're getting data from source systems and making it available for analytics.
That's kind of the game. But now that's changing a lot, right? You have the data world over here and then a knowledge world over here. Why did I cover my face? These have been separate camps traditionally. The knowledge world is focused on meaning and semantics, and the semantic web comes out of this community and so forth. These worlds are colliding now.
And so it's a fascinating thing to watch, where the word ontology really hadn't been used before, and now everyone seems to be an ontologist all of a sudden. One of my favorite things to do is talk with AI about the systems it would build if it were to start from scratch and didn't have the baggage of dealing with the systems that humans have built.
And it's a fascinating thing to discuss with AI, like Claude, what it would build. Now, obviously it's a bit self-serving and based upon my chats with it, so it's kind of my own bias there. But according to at least what it tells me, it would want some sort of semantic grounding, something that's probably represented as a graph that updates in real time, something where agents can go to work with existing context, but also update it in real time, and that's shareable among other agents. And so I don't know if that is what the data catalog becomes, but at least that's what Claude tells me Claude wants. So it's interesting.
>> Yeah, I mean, conceptually that is like also what I was referring to when I was saying context layer.
Exactly. There are, I guess, a couple of aspects to this. As a data catalog, we are continuously pulling metadata from multiple different systems to keep this up to date. But it may not be fully up to date. It may be a couple of hours delayed.
But that's not the biggest deal.
That can certainly be more real time. I think there are some systems that are more push-based.
On top of this, there is what I would call more business-layer information that users add directly.
That is more ad hoc, I would say, but real time and, you know, true. And this is the piece where agents can really come in to also continuously update.
And now the input, or the trigger, for adding this business metadata, business context, however you want to call it, varies, whether it is maintained in the form of an ontology graph, a business glossary, or some context file that summarizes different business processes. There are multiple layers or types, I think, that users should be able to add.
But some of them will actually continue to evolve or get updated by agents, is how I envision this.
For the construct of what can be done within a catalog, this has traditionally been represented as a business glossary, where you define the company-specific acronyms, metrics, and perhaps processes as documentation.
And these definitions get used throughout. It's a way to standardize your documentation and also attach the definitions to the actual data sets and fields that you have.
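To make the glossary idea concrete, here is a minimal Python sketch of terms that are defined once and then attached to data sets and fields. Everything in it (the class names, the `table.column` naming convention, the sample terms) is a hypothetical illustration, not how Select Star or Horizon catalog models a glossary.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """One company-specific term: an acronym, metric, or process."""
    name: str
    definition: str
    # Assets the term is attached to, written as "table" or "table.column".
    attached_to: set[str] = field(default_factory=set)


class BusinessGlossary:
    """Hypothetical in-memory glossary keyed by lowercase term name."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}

    def define(self, name: str, definition: str) -> GlossaryTerm:
        term = GlossaryTerm(name, definition)
        self._terms[name.lower()] = term
        return term

    def attach(self, name: str, asset: str) -> None:
        self._terms[name.lower()].attached_to.add(asset)

    def terms_for(self, asset: str) -> list[GlossaryTerm]:
        """Return terms attached to the asset itself or to its parent table."""
        table = asset.split(".")[0]
        matches = (t for t in self._terms.values() if t.attached_to & {asset, table})
        return sorted(matches, key=lambda t: t.name)


glossary = BusinessGlossary()
glossary.define("ARR", "Annual recurring revenue, in USD.")
glossary.define("Churned customer", "A customer with no active subscription for 30 days.")
glossary.attach("ARR", "revenue_daily.arr_usd")        # attached to one field
glossary.attach("Churned customer", "revenue_daily")   # attached to the whole table

for term in glossary.terms_for("revenue_daily.arr_usd"):
    print(f"{term.name}: {term.definition}")
```

Looking up a field returns both the field-level term and anything attached to its table, which is one simple way a standardized definition can follow the data wherever it is viewed.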
And on top of this, I would say there can also be a form of ontology, or quote-unquote business context, that can be more unstructured.
It could be an MD file, or it could be a graph that has further explanation around the business processes and the relationships between the business entities, which may or may not always refer to a specific table or field. And I think everyone's definition of what this looks like may be different. But what I've seen, and where I'm coming from, is that AI doesn't really care about the specific format, whether that's a graph or a text file or something else.
It will be able to consume and understand it.
So it's actually for humans to still be in control, to review and easily decipher that the context the system is managing is correct and true to the actual business processes and intention. So that's how I envision it and where we're heading, because at Select Star, we started from the metadata layer.
We automated metadata updates and also drove analysis on top of metadata and logs to identify our own operational metrics, like popularity score or column-level lineage. We will infer and build out an entity relationship diagram for our customers even if they didn't have a full ERD before. All of that becomes really amazing context for AI agents to consume, because it also comes with, per table, the top used queries and joins. I wouldn't say those are verified by a human, but they are very close to that, because we review all of these based on how trustworthy each one is. If this is a query that runs often and it's powering an official dashboard or chart, then this will be one of the queries suggested for a table, for example. So those are all really helpful. But in the last year, in early 2025, we launched our version of semantic models and semantic model management in Select Star, because we recognized that there is a need for customers to specifically define actual business metrics on top of the pure physical metadata, like the data dictionary.
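The query-suggestion heuristic described here (runs often, powers an official dashboard) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `QueryStats` fields, the 30-day window, and the ranking rule are all hypothetical, and Select Star's actual popularity scoring is not described in the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStats:
    """Hypothetical per-query facts derived from query logs and BI metadata."""
    sql: str
    table: str
    runs_last_30d: int        # how often the query ran (assumed window)
    powers_dashboard: bool    # True if a published dashboard or chart depends on it


def suggest_queries(stats: list[QueryStats], table: str, limit: int = 3) -> list[str]:
    """Rank queries for one table: dashboard-backed first, then by run count."""
    candidates = [q for q in stats if q.table == table]
    ranked = sorted(
        candidates,
        key=lambda q: (q.powers_dashboard, q.runs_last_30d),
        reverse=True,
    )
    return [q.sql for q in ranked[:limit]]


log = [
    QueryStats("SELECT * FROM orders LIMIT 10", "orders", 400, False),
    QueryStats("SELECT region, SUM(amount) FROM orders GROUP BY region", "orders", 120, True),
    QueryStats("SELECT COUNT(*) FROM orders", "orders", 30, True),
    QueryStats("SELECT * FROM customers", "customers", 900, False),
]

top = suggest_queries(log, "orders", limit=2)
print(top)
```

In this sketch an ad hoc `SELECT *` that runs constantly still ranks below queries that feed a published dashboard, which mirrors the "close to verified by a human" idea: dashboard usage stands in for trust, and frequency breaks ties.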
So semantic models define metrics and also a form of quote-unquote logical model on top. You also mentioned this earlier: a lot of the data models we live with today are defined and built for analytics. The reason why we have dimensional modeling is for data modeling itself. And when I say analytics, if you think about reporting, a lot of the aggregation that we do, or creating a medallion architecture, for example, is so that we can drive calculations faster and more efficiently.
Mhm.
Whereas the semantic model is really more about the core of what the entities, fields, and metrics are that you are trying to compute. Hence it's not very usual for traditional data models to have this.
A semantic model is a way to define this on top of all the physical data models that you already have. That's how I saw it.
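As a rough illustration of a semantic model sitting on top of a physical model, the Python sketch below maps a business entity, its dimensions, and its metrics onto a physical table and compiles a metric request into SQL. The table, column, and metric names are hypothetical, and this is not the syntax of Snowflake semantic views or of the Open Semantic Interchange format.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    expression: str  # aggregate written against physical columns


@dataclass
class SemanticModel:
    entity: str                  # business entity, e.g. "Order"
    physical_table: str          # where the data physically lives
    dimensions: dict[str, str]   # logical name -> physical column
    metrics: dict[str, Metric]   # logical name -> metric definition

    def compile(self, metric: str, by: str) -> str:
        """Translate a logical request (metric by dimension) into SQL."""
        m = self.metrics[metric]
        column = self.dimensions[by]
        return (
            f"SELECT {column} AS {by}, {m.expression} AS {m.name} "
            f"FROM {self.physical_table} GROUP BY {column}"
        )


orders = SemanticModel(
    entity="Order",
    physical_table="analytics.fct_orders",
    dimensions={"region": "ship_region_cd"},
    metrics={"net_revenue": Metric("net_revenue", "SUM(gross_amt - refund_amt)")},
)

sql = orders.compile("net_revenue", by="region")
print(sql)
```

The caller, whether a person or an agent, only needs the business names `net_revenue` and `region`. The cryptic physical names and the exact formula live in one reviewed definition instead of being re-derived in every query.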
And I think there can be another ontology level on top.
But I believe semantic models are actually evolving also, because there is the Open Semantic Interchange initiative.
And there are a lot of attributes of these semantic models, like defining a relationship as one-to-one, one-to-many, or many-to-many, but the relationship between the entities can be more of a business relationship.
Mhm.
You know, "has" or "in relationship with", something that can actually be customized. Hence I think semantic models could be that layer.
But I also understand that for some specific use cases and businesses, there is a need to just write out and define the business processes. And with these business processes, I think, as you build up this layer, the more you can refer to the source layer, the better.
So, for the semantic layer to refer to the physical layer, and then for the semantic layer to be referred to by the ontology, I think that part is important.
Because it gives the agents an understanding and a better explanation when they're answering questions from the end users.
The biggest piece that I've really tried to wrap my head around was how we plan for not necessarily correct, but appropriate responses from our agent when we cannot 100% predict what the end user is going to ask. We can always say, "Oh, this is the only data we have, so I cannot answer that question."
Most of the time the LLM is going to try to answer that question and explain however it arrives at the answer.
That's how it's designed.
So, obviously the evals and tracking the responses are important, and this is where I'm coming from regarding the agents actually updating the context.
Part of this should come from the customer input and the response. That should be evaluated, and whatever context needs to be updated, that update should go directly into the context.
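One way to picture this feedback loop is the Python sketch below: interactions that fail evaluation and carry a user correction become proposed context notes. All names here are hypothetical, the evaluation itself is left out (it is just a boolean), and the review queue is an added assumption that follows the earlier point about humans staying in control of the context.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    question: str
    answer: str
    user_feedback: str   # free-text correction from the end user, "" if none
    passed_eval: bool    # outcome of whatever evaluation the team runs


@dataclass
class ContextStore:
    notes: list[str] = field(default_factory=list)            # approved context
    pending_review: list[str] = field(default_factory=list)   # awaiting a human

    def propose(self, note: str) -> bool:
        """Queue a note unless it is already known; return True if queued."""
        if note in self.notes or note in self.pending_review:
            return False
        self.pending_review.append(note)
        return True

    def approve_all(self) -> None:
        self.notes.extend(self.pending_review)
        self.pending_review.clear()


def update_context(store: ContextStore, interactions: list[Interaction]) -> int:
    """Propose one context note per failed interaction that has feedback."""
    proposed = 0
    for item in interactions:
        if item.passed_eval or not item.user_feedback:
            continue
        if store.propose(f"Q: {item.question} | Correction: {item.user_feedback}"):
            proposed += 1
    return proposed


store = ContextStore()
history = [
    Interaction("What is ARR?", "Annual recurring revenue.", "", True),
    Interaction("Revenue last week?", "$10k", "Revenue must exclude refunds.", False),
    Interaction("Revenue last week?", "$10k", "Revenue must exclude refunds.", False),
]
count = update_context(store, history)
print(count, store.pending_review)
```

Duplicate corrections collapse into a single proposal, and nothing reaches `notes` until `approve_all()` is called, so a person can check that the new context is true to the business process before agents start relying on it.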
But I feel like documenting and contextualizing the business process and how things work, outside of just how the data model has been implemented, is still very helpful for agents to be able to explain or fathom why their answers might not be 100% correct.
Yeah. And this really comes from us at SelectStar iterating a lot on our own AI agent that answers business questions for any end user. At the beginning, we started out helping the data team find and understand the data. But a lot of our end users actually became business users looking to get answers to their business questions directly from SelectStar, just because we're connected to the data and we were already processing and building the context for our customers.
Oh, that's interesting.
So, it went from analysts to kind of just regular non-technical business users.
Yeah, it's actually really interesting. It really became two different types of users asking different questions. Some customers still prefer to see the UI because it gives you all the information at once, whereas some users now prefer to use only the chat interface because they don't want to click around. They just want the answer in front of them.
But yeah. How did you get to your business users?
>> Actually, one more piece on this is the MCP server.
>> Ah, yes, yes.
So, we have a number of customers that have built their AI BI solution on top of our MCP server, because they were able to get the context from it. Or they built their semantic layer, like Snowflake semantic views, using the MCP server: they'll use Claude with our context to find the right queries, the most important queries, and they'll review them, and then it's, okay, now let's generate the semantic view on top based on this. So, how do I know if the users are business users?
Well, I'm not sure if this was our intention, but from a user management perspective, whenever someone was signing up for SelectStar, they would get an invitation from their admin, right? Most of the time they will have a team that they belong to. So, we can look that up. And also we've done a lot of user interviews, and many of them turned out to be not necessarily a data user.
Also, for each user, if they've run queries or done other activities, these show up in their interface, which gives an indication of how technical they are. Do they actually run queries, or what's their main consumption model of data?
It's really interesting. Do you think that that's going to become more widespread?
I'm not so sure. For example, companies like Snowflake have very strict restrictions around which data gets shared and what customer data is actually being used automatically to generate all of this.
>> Yeah.
So, I'm not so sure how this could actually be implemented in a company like this. For internal customers, I can imagine that some data teams would run this analysis on their own. But I guess I'm curious to understand where you're coming from. Are you asking whether this type of approach will be used by more tools to personalize the user's experience, or are you asking something else?
Yeah, it's more about the type of user that will be using these tools in the future, and the present, I suppose.
Traditionally, it's been the data team or the data person, whatever that is. Could be a number of things. But as the tools become easier to use, say you can just use your phone and chat with your data. I mean, I'm seeing this already, where people that I would never have expected to be using AI to answer analytics-like questions are having a heyday with it right now.
Yeah. In addition to writing solutions on top of their data and so forth. So, I think that's really fascinating to see. I have many non-technical friends who have suddenly become very technical, in quotes.
And it kind of makes you wonder, if you're a vendor, for example, who exactly are you trying to target in the future? Traditionally, that has been the data team. In the future, I would think that scope probably widens a bit, which would be interesting for companies like Snowflake, for example, that have a consumption model that's predicated on using compute. I wonder if that starts increasing because the addressable market just went up.
I think it depends on the tool.
Sure.
I'll give you two examples. For SelectStar, we've always had a goal of supporting business users in mind, because we wanted to help all data consumers. If it's only the data team that's leveraging the catalog, it's not going to be as effective or helpful for our customers. At the end of the day, anyone that's looking at a dashboard should be able to look up those definitions and be encouraged to find and understand other data as well, and use more data. So, that's one piece.
Whereas with Snowflake's Horizon catalog, and overall, the Snowflake interface today is designed more for technical users.
Yeah.
>> Data analysts, data scientists, and data engineers are the core users, and you can see that throughout the platform. And Snowflake Horizon catalog was also a fairly technical catalog in the past.
And now we are changing this with SelectStar to bring in more of this business catalog aspect. Part of this is almost redesigning the user experience, and we will also have more features that cater to business users. This is a conscious decision on Snowflake's part. And I'm sure there are many companies that are also starting to make this change, because data is very versatile in this way: being able to write SQL doesn't mean that you're the only one who can use data. Even if you don't write SQL, there are a lot of data consumers in the BI tool, or a lot of business professionals that leverage Excel on top of, let's say, SAP or different ERP systems.
These people are usually the subject matter experts in finance, operations, and supply chain, and they actually are the business data owners in a lot of enterprises. So, I think supporting these user personas is also important. But going back to what you were saying, AI has certainly lowered the bar for data access, and I think more companies are aware of that. The other piece is whether they will decide to expand their user persona support because of that. I think that also comes down to what, inherently, the users are using this tool for.
Mhm.
And does that still apply to this new persona or no?
Yeah, persona being not just humans, I suppose, too.
Sure, yeah, but the intention, right? If the intention is to write ETL jobs, that's probably not going to be applicable, even if you don't have to write the code and you can just use the agents.
In my Discord group a bit ago, somebody was asking about this. They wanted to make a no-code ETL tool for non-technical users.
I was like, this is a really interesting proposition. It has to be open source, it has to be easy to use, and it would save CSVs. And I'm like, this is a very interesting and strange use case that I've not yet encountered. This person, I think, ended up vibe coding it using DuckDB and React or something.
But it was one of those things where I was like, why would a non-technical user need to do ETL? This is very interesting. I don't know why.
I'll tell you why.
One big, what should I say?
I don't want to call it an industry, but marketing is the best use case for why non-technical users need to do ETL.
There are many tools that are designed for this, with a visual UI.
You get data from your Google Ads, Facebook, Meta, YouTube. You need to aggregate them together. You need to have your metrics.
This is a pure ETL job that a lot of marketers have to do and want to do.
>> Yeah, I didn't bother asking what industry this person was in, so I don't know the context of the request. But what was interesting was the request itself. I think you're right, it might be marketing. But also the fact that this person is like, I'll just make this by hand or using AI and just give it to them, and so forth. So yeah, it makes a lot of sense.
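The marketing example is a small extract, transform, and aggregate job, which can be sketched in plain Python. The per-channel records and field names below are invented for illustration. Real Google Ads, Facebook, or YouTube exports have different schemas, and no vendor API is being called here.

```python
from collections import defaultdict

# Extract: hypothetical per-channel exports, each with its own field names.
google_ads = [{"day": "2025-01-01", "cost": 120.0, "clicks": 300}]
facebook = [{"date": "2025-01-01", "spend": 80.0, "link_clicks": 160}]
youtube = [{"day": "2025-01-01", "cost": 50.0, "clicks": 40}]


def normalize(rows, channel, day_key, spend_key, clicks_key):
    """Transform: map one channel's field names onto a shared schema."""
    return [
        {"channel": channel, "day": r[day_key], "spend": r[spend_key], "clicks": r[clicks_key]}
        for r in rows
    ]


def daily_totals(rows):
    """Aggregate: total spend and clicks per day, plus a cost-per-click metric."""
    totals = defaultdict(lambda: {"spend": 0.0, "clicks": 0})
    for r in rows:
        totals[r["day"]]["spend"] += r["spend"]
        totals[r["day"]]["clicks"] += r["clicks"]
    return {
        day: {**t, "cost_per_click": t["spend"] / t["clicks"] if t["clicks"] else None}
        for day, t in totals.items()
    }


unified = (
    normalize(google_ads, "google_ads", "day", "cost", "clicks")
    + normalize(facebook, "facebook", "date", "spend", "link_clicks")
    + normalize(youtube, "youtube", "day", "cost", "clicks")
)
report = daily_totals(unified)
print(report)
```

Most of the work is the renaming step: each channel calls the same thing something different, which is the kind of mapping the visual ETL tools mentioned above let a marketer do without writing code.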
And then you're seeing, obviously, non-technical users also just vibe coding and spinning up their own solutions these days. So I'm not quite sure what this means, if people are going to spin up their own data catalogs just for funsies or what happens.
Yeah, I mean, I think customers building their own data catalog has always existed, and we may see more. I actually saw one this morning as well.
Oh, interesting.
And in a way, that type of customized version will always have more features and more workflows catered to that specific company. Now, would that actually scale? Will it also be able to support all the different types of users within their company?
Right.
>> That is a different question, right? Were you commenting more on how many companies can now build their own startup services, or were you talking more about how a lot of companies are building their own custom tools now with AI?
I think it's sort of a mix of the two. I mean, you're starting to see this more.
That's why this SaaS apocalypse is happening, or whatever you want to call it these days. But I think that for something like a data catalog, you can build the solution, but, and I'd love your take on this.
When I see data catalogs being implemented, typically some get used, and most of them get populated. The usage is a different story, right? And maintaining it has always been one of those giant question marks. Will you populate it? Is the data fresh? Do the meanings still hold? So even if you can vibe code your data catalog, it still doesn't remove what I think is one of the biggest bottlenecks that I've seen, which is freshness and so forth. What's your take on that? I guess a broader question is, where do most data catalogs go wrong?
When they're implemented. When they're implemented or when they were built in house?
Either one. Could be something they buy or could be something they build in house.
Yeah, well I guess there are similarities and differences. First of all like if someone implements their own data catalog then you've already pointed out adoption is one of the biggest problem. Usually these are become just a technical data catalog. It has all the documentation but then do everyone actually use it to update their documentation or look it up? Do people come back like There's a lot of things that that teams don't necessarily want to take care of but having to. So I think it's I think this is why customers look for a more off-the-shelf solution or solution that is designed for that purpose. We've seen many internal catalogs you know, and these teams coming to us wanting to replace that with a purpose-built tool the the real tool that will be automatically updated no maintenance work required and also designed for more users to use well and get benefit out of. Regardless of yeah, even if you're buying a third-party solution the pieces and areas where data catalogs fail the most would be on the adoption side.
Adoption can fail in multiple ways. One is because you didn't put in much effort other than buying the tool. There's so much that the tool can do. The tool brings in everything, and it can show you a lot of different things calculated automatically, like a popularity score. We can even document your data, and so on and so forth.
But if you have thousands of tables and you just have access to the catalog, it is still important to put some curation into the data: to annotate which data assets belong to which business domains, and who the main owners and stewards of these assets are. A lot of this can now be automated, but some team should spend some time denoting those, meaning you should define your taxonomy of how you're going to organize your data.
And this should be very obvious for the end users, the consumers, to follow. So that's one. The other piece, which is also about adoption, is telling everyone that this is the single pane of glass and the source of truth for data documentation.
If you don't tell them to use it, then it's also not going to get adopted.
Turns out.
Yeah, it's also because a lot of teams are used to, "Oh, let me just ask so-and-so about this question." So in a way you also have to teach people to fish, or where to ask that question, you know.
Absolutely.
>> So I think those are the two main things I've seen.
That's really interesting. Let me ask you this. It's something I like to ask founders, especially right now: if you were to start today, how would you approach it?
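[Editor's sketch] The curation point above (assign each asset a business domain and an owner or steward, against a defined taxonomy) can be illustrated with a small check for uncurated assets. This is a minimal sketch under stated assumptions, not how Select Star or Snowflake Horizon works: the `DOMAINS` taxonomy, the `Asset` fields, and the table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical taxonomy of business domains agreed on by the data team.
DOMAINS = {"finance", "marketing", "product"}


@dataclass
class Asset:
    """A catalog entry; fields are illustrative, not a real catalog schema."""
    name: str
    domain: Optional[str] = None
    owner: Optional[str] = None


def curation_gaps(assets):
    """Return names of assets that lack a valid domain or have no owner."""
    return [a.name for a in assets if a.domain not in DOMAINS or not a.owner]


assets = [
    Asset("orders", domain="finance", owner="dana"),
    Asset("campaign_clicks", domain="marketing"),  # no owner assigned
    Asset("tmp_export"),                           # no domain, no owner
]
gaps = curation_gaps(assets)
print(gaps)
```

A report like this gives the team doing the curation a concrete worklist, which is the part the conversation says cannot be fully automated.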
>> Approach the product or something else?
>> Yeah, and starting a company everything.
Huh. It's a big question. I'm also curious to hear what you're noticing in the market, and whether you see something that's different for founders building their businesses.
Um Yeah.
Obviously, from a product building perspective, there is just a lot more that can be built more quickly. At the same time, reliability and getting it to scale are still challenges that a lot of companies will have to get through.
But at a very early stage, I think the big piece that I would focus on is narrowing it down to how much value we are driving for the customer.
So when I started Select Star six years ago, I saw a gap in the market between open source solutions and enterprise data catalog solutions.
There was nothing in SaaS, and there was also a trend of companies adopting the quote unquote modern data stack, which got me thinking that this is one of those aspects: data observability, data cataloging, data governance. This is a piece that many more companies outside of enterprises are also going to need and run into, and this is why I decided to start Select Star. Whereas today, I think a lot of these quote unquote market needs are getting very fragmented because everyone has the ability to build whatever tool they need.
So I think it's about getting down to what specific value you can deliver as a, I don't know if I should call it an off-the-shelf tool, but whatever system or platform you're building. I think it will have to come down to, well, not necessarily just faster or more, but the differentiated value that you will bring to the table.
I think it becomes more important.
And then, because you asked the question as more of, if you were to start a startup again: I think go-to-market and distribution is the other piece that is quite different, just the landscape itself. You see so many companies getting quote unquote overnight traction, way more than before, and I would say that from an enterprise and B2B perspective, people are very wary of buying tools. So I don't know what is actually the best way, the right way, to cut through that noise for startups today.
If the tool is primarily a B2B enterprise solution, you might have to find a way to make it more of a PLG type of motion.
But I don't have a really good answer to that part yet.
It's funny. I don't get the sense a lot of people do. It seems that parts of the playbook from back in the day still hold. Distribution's a big one that I hear a lot, where it's easier than ever to start something, but that's exactly the paradox: everyone else can do the same thing. So, as you point out, what differentiates you from, you know, your kid basically making a competing product against you? I mean, I could literally go ask my teenage son to make a data catalog right now, and he would have no idea what it is, but I'm sure he could build one. I think a good sense of taste is another thing, but it seems like nowadays you'd want to start small, stay as small as you can, and get growth. I asked this of George Fraser from Fivetran recently. I asked him, if you were to start Fivetran again, what would you do? And he's like, I would probably just stick with the three founders, the three people that we started with, and go to like 5 million ARR.
And then maybe hire, but just have agents run everything.
And so that was uh that was interesting, but yeah, it's it's a hard one.
Yeah, that makes sense, and I think that's how a lot of companies are thinking about it, too.
And the way that your product may get differentiated will come from your years of experience and the intuition that you have around the product.
Obviously, I'm sure there might be a time when AI surpasses that, but AI still needs skills. You have to train and customize AI so that it fits, just like you have to retrofit it to your purpose.
With the tool or the startup or service that you make, if you have the expertise, you know what is more important better than others, better than even AI. So I think at least that is one aspect. In terms of pure execution, I think leveraging agents properly is probably the way to go, at least to start, but I also still think that a small team that knows what they're doing with agents would be really powerful to start a company on.
Yeah, that's awesome. And closing out, what are you most excited about over the next year?
I think this space is changing quickly, and I'm excited to see how the management of context will come.
When we did our fireside chat last week, there was a question about what happens if the metadata gets polluted because it's coming from multiple sources and there's a conflict. I think there is a bit of a cleanup that definitely needs to happen in relation to that, but earlier I was talking about how this context management system should have agents and ways for these contexts to be updated almost automatically, because your data is changing.
The way that the data is being used is also continuously changing, because there are new queries coming in and all that.
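[Editor's sketch] One way to picture the conflict question raised above is a resolver that picks a single description when several sources disagree, and flags the disagreement for cleanup. This is a minimal sketch assuming a hypothetical trust ranking (`SOURCE_PRIORITY`) with recency as the tie-breaker. The source names and the rule itself are illustrative assumptions, not anything described in the conversation.

```python
from dataclasses import dataclass

# Hypothetical trust ranking: a lower number wins when sources disagree.
SOURCE_PRIORITY = {"steward": 0, "agent": 1, "import": 2}


@dataclass
class Description:
    source: str      # who wrote it; must be a key of SOURCE_PRIORITY
    text: str
    updated_at: int  # e.g. a Unix timestamp


def resolve(candidates):
    """Pick one description and report whether the sources disagreed."""
    # Prefer the most trusted source; among equals, prefer the newest.
    best = min(candidates, key=lambda d: (SOURCE_PRIORITY[d.source], -d.updated_at))
    conflict = len({d.text for d in candidates}) > 1
    return best, conflict


candidates = [
    Description("import", "Order rows from the legacy ERP", 100),
    Description("agent", "One row per customer order", 300),
    Description("steward", "One row per confirmed customer order", 200),
]
best, conflict = resolve(candidates)
print(best.source, conflict)
```

Returning the conflict flag alongside the winner keeps the disagreement visible, so a person or an agent can still review it later.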
As users are interacting with, let's say, data analyst agents, when your data consumers are interacting with them, you will also continuously get more input and ways to evaluate that. All of that being combined together is, I think, going to be kind of the next level of system, and it will be really powerful. I've seen how much we can do with just the data side. This next phase, with more of this user input and continuously updating it all into one system, is something that I haven't seen, but it is something that I think is going to come, related to semantic models and everything else. So that's something I'm excited about. What about you?
I mean, I have a lot of things to be excited about. Yeah.
Probably too many things. Obviously books, all that kind of stuff that I finish.
Excited over the next year. I think the biggest question I have, which I'm excited about and also sort of dreading, is how good do the models get over the next year? What are the capabilities that we're going to look back on and say, okay, that was an interesting inflection point? Sort of like what happened in late November of last year, when Opus, I think it was 4.5, came out and suddenly, if you're writing code, something just flipped and it became not just good, but really, really good, and there was sort of this collective awe.
So I think that's what I'm most excited about: what is next, if it happens. We may have just hit the wall and this is it. I don't know. I don't think so. But just how good do these models get? How much capability is there? I think that's the biggest thing that I'm looking forward to, as well as potentially dreading. I guess Anthropic supposedly has a new model coming out that was sort of leaked. Right. With their CRM or something like that, and Claude Code recently also got dropped accidentally.
Oops. But you start getting glimpses of these models.
Like, I guess the newest model is supposed to be a major cybersecurity risk or something like that. Like it could just take out a lot of things, and so, yeah, I'm just curious what the hell these models look like, say, by year's end versus where they are today.
That's fascinating. Yeah, it's very exciting and also scary at the same time.
But it's the world we're in right now. So why not have more interesting adventures going on in the world right now? Never a dull moment. It was good to see you. For people who want to learn more about you, how can they find you?
I am on LinkedIn.
>> Okay.
I think just find my name, and yeah, if anyone's on Snowflake, they should look out for the new Snowflake Horizon catalog.
Sick. I'll be doing that, too. So, well, Shinji, it's always good to chat with you, and yeah, thanks for the chat.
Thank you, Joe.
All right. Sounds fun. Yeah, thanks.
Bye.