Zero-copy lakehouse architecture enables fast, bidirectional access to Apache Iceberg tables. It relies on open table formats that store table metadata directly on object storage, so multiple engines can read the same data without vendor lock-in. A catalog layer tells engines where each table lives, which enables interoperability between platforms such as Microsoft OneLake and ClickHouse: data can be queried in place on the lake or loaded into a high-performance engine for real-time analytics. The industry trend is toward catalog-first architectures that unify data representation across platforms.
Zero‑copy lakehouse: Implementing fast, bidirectional Iceberg access for scale workloads
>> Everyone, welcome. We're going to talk about the zero-copy lakehouse: implementing fast, bidirectional Iceberg access for scale workloads.
My name is Kevin Liu. I work on Microsoft Fabric. I'm also a member of the Apache Iceberg PMC. With me is Melvin.
Hey.
He's the director of product management at ClickHouse. Today we're going to talk a little bit about the integration between Microsoft and ClickHouse, as well as open table formats, Apache Iceberg, and things related to that.
Just very quickly, I want to give a spiel about Microsoft's data platform offering. So, has anyone heard of Microsoft Fabric?
Okay, that's good enough. Microsoft Fabric is essentially the data platform offering from Microsoft. Think of it as a Snowflake or Databricks competitor. We sell data products, data storage, and data engines. OneLake is a component of Fabric. It is a unified storage solution, similar to an object store. It is a way to unify everyone's data under the sun: all clouds, on-prem, all the databases, anything you can think of.
At least that's the goal.
So, data, once inside of OneLake, is available to every single other product inside of Microsoft. There are a lot of logos here that you can probably recognize. But the idea is you store it once and make it accessible to everything inside the data platform itself.
And not only do we want the data to be accessible internally, we also want the data accessible externally, because we recognize that customers have different wants and needs, and we want people to use the best engine to do the best work.
So we are integrated with a lot of the logos here, ClickHouse included. You can see there are a bunch of different vendors and open-source solutions.
The goal, again, is to make data actionable and accessible whether it's inside or outside.
So, going a little bit further: what actually is OneLake? It's a single unified SaaS data lake. Think object storage, and then we build a product on top that makes your data available to any of the engines and platforms inside of Fabric.
So, think of it this way: instead of multiple S3 buckets, there's just one bucket for everyone.
That way everyone in your organization sees the same thing and accesses the same thing.
And OneLake itself is a logical lake, right? The data can be physically stored anywhere around the world. It can be stored in different data centers, different clouds.
It's the same idea: you want data to be available when you need it so that it can be actionable and valuable.
And, you know, data mesh is what everyone's talking about. With data mesh, once you get your data into Fabric and OneLake, you can use it in different aspects and different groupings. The underlying data, the representation of the data, still exists.
There's no copying. You're just moving around what you're referencing.
And we unify everything through shortcuts and mirroring. A shortcut is like what you have on Windows, where you can create a shortcut from here to there. It's just a logical representation again. And for things that we cannot shortcut, we do mirroring: we bring data into OneLake so that it can be used throughout the different use cases.
Okay.
So, that was the spiel about the product. I'll get to the fun part.
So, as you can see, part of unifying OneLake is unifying data outside of the system.
So, we have shortcuts and mirroring for internal Microsoft products as well as external ones. Any customer with other platforms or other products in different clouds is able to unify their data within Fabric and OneLake.
And then, as you're bringing the data in, we also provide additional functionality and value. For example, with shortcut transformations, when you're bringing your data in, you can run a light ETL on top so that by the time you're using it, it's already in the right shape, and you don't have to create additional ETL pipelines on top to make the data useful.
And we have this for a lot of different use cases. A very common one is converting between formats, file formats to table formats.
And it's just a really quick and easy way to get value out of your data.
And one thing we really care about is interoperability. This is part of the reason why I'm here talking to you all and the ClickHouse folks over here. Interoperability is really important to us. We want to make sure that once you get the data inside of OneLake and your data lake, that single unified representation isn't locked away inside of Microsoft. So, we want you to bring all the data in, but also be able to bring the data out and make it useful.
And again, this is why we have a lot of partners and a lot of open source solutions: we want customers to use the best tool for the job. We recognize that you can lock away the data in the platform, but at the end of the day, you can't stop someone from using another, better solution.
So, the way we do this is at both the access layer and the engine layer, I would say. OneLake itself is an object store.
So, there's a set of APIs to access OneLake, just as you would access ADLS, S3, or GCS, for example. It's just object storage that sits inside of Microsoft. The other way is through open table formats, which we'll talk about a little bit later. The idea is that you have this set of building blocks that everyone recognizes and everyone can reuse. So, if you go to any of the vendors, the other platforms, or open source, everyone recognizes how to use and access this data. For example, open source software understands how to go to S3 and talk to S3. Similarly, it understands how to go to OneLake and get the data from there. So, we're enabling all of the use cases through integrations and partnerships.
So, a little bit of history about OneLake.
There are kind of two prominent table formats in the wild. One's called Delta Lake, the other one is called Iceberg.
Historically, OneLake has been using Delta Lake as its primary table format. That's still the case today, but for interoperability, we would also like to add Apache Iceberg as well.
And this gives our customers and our users the option to choose whichever format suits their needs best. Under the hood, we help customers virtualize the table metadata from one format to the other. So, again, this enables more interoperability no matter which format you're in.
And we do that through another Apache project called Apache XTable. Basically, it does the translation so that your external engine or external platform understands that specific protocol, whether it's Delta or Iceberg.
I'm a big Iceberg fan. So, the rest of this is about Apache Iceberg.
Has anyone heard of Apache Iceberg?
If you're in the data lake space, it's been much talked about in the last couple of years.
So, just to level set, Apache Iceberg is an open table format, and it's designed on top of object storage.
What does that mean? The way I think about it is, you store your data in a bunch of files, right? They're Parquet files on the object store.
They're out there in the cloud. Now, you want more functionality out of the file format. Things like, I want a group of files, right? I want a common schema.
And this is the problem that Iceberg solves: given a bunch of files, can I make a table out of them?
Right? And this table metadata actually lives directly on the lake itself. So, it is in S3, it's in ADLS, it's on OneLake. And the reason why that's a great thing is that now your data is not locked in to a specific vendor. It's not locked in to a warehouse. It's not in a database somewhere that you can't take it out of unless you do some kind of ETL. It is on your lake, inside your object store, and you can take that wherever you go.
Right? And that's, in my opinion, why customers are really interested in open table formats: this promise of interoperability and no lock-in.
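To make the "table on top of files" idea concrete, here is a minimal Python sketch of how a reader could walk from table metadata to data files. It is a simplified toy model, not the Iceberg specification or any library's API: the class names (`DataFile`, `Snapshot`, `TableMetadata`), the bucket paths, the schema, and the row counts are all made up for illustration. Real Iceberg metadata also involves manifest lists and manifest files, which this sketch folds into a single tuple of files per snapshot.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """One Parquet file on object storage (hypothetical path and row count)."""
    path: str
    record_count: int


@dataclass(frozen=True)
class Snapshot:
    """The full set of data files that make up the table at one point in time."""
    snapshot_id: int
    data_files: tuple


@dataclass
class TableMetadata:
    """Toy stand-in for a table metadata file stored next to the data."""
    location: str
    schema: dict
    snapshots: list = field(default_factory=list)
    current_snapshot_id: int = -1

    def current_snapshot(self):
        for snap in self.snapshots:
            if snap.snapshot_id == self.current_snapshot_id:
                return snap
        raise LookupError("table has no current snapshot")


def plan_files(table):
    """Return the Parquet paths an engine would read for the current snapshot."""
    return [f.path for f in table.current_snapshot().data_files]


def total_rows(table):
    """Sum the per-file row counts recorded in the metadata (no data is read)."""
    return sum(f.record_count for f in table.current_snapshot().data_files)


# Hypothetical table: every path below is a placeholder.
table = TableMetadata(
    location="s3://example-bucket/warehouse/trips",
    schema={"pickup_time": "timestamp", "fare": "double"},
    snapshots=[
        Snapshot(1, (
            DataFile("s3://example-bucket/warehouse/trips/data/a.parquet", 1000),
            DataFile("s3://example-bucket/warehouse/trips/data/b.parquet", 2500),
        )),
    ],
    current_snapshot_id=1,
)

print(plan_files(table))
print(total_rows(table))
```

The point of the sketch is that everything a reader needs (schema, file list, row counts) sits in metadata next to the data, so any engine that understands the layout can read the table without going through a warehouse.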
Now, another reason, I think, is that there's a great community around Apache Iceberg. This is a screenshot from the summit, the conference we had this year specifically for Apache Iceberg, which is an open source project. 600 attendees is a lot of people, especially for open source software. We had 80 sessions from a lot of the tech companies you'll recognize. We had 100 speakers and 30-plus sponsors. So, there's a really big community working around this open source software, and that makes it better for everyone.
And, just to drive the point home, it's not just the users. The vendors are now getting on board as well. These are the sponsors for the summit.
These companies are paying money to be part of the community, and you can recognize a lot of names on there.
All of the clouds are there, plus Snowflake, Databricks, ClickHouse. Essentially, a lot of companies are working with the Iceberg community so that they're able to join this kind of interoperability ecosystem, where data can be taken from anywhere in the world and come into their platform, and they can start with a data estate that is already out there.
Yeah. I already talked about this a little bit, but essentially, there are two parts to Iceberg. One is the layout of the metadata itself. This is all on the lake. So, essentially, it's just a metadata layer on top of your files. And then there's a catalog somewhere.
This catalog part is where a lot of the interoperability work is happening right now. Essentially, we have a catalog that tells each of the clients exactly where the table is.
And then the clients themselves understand this Iceberg protocol, and they can read the data directly from the lake as a table. This is different from your traditional warehouse approach, in which you log into the warehouse, you ask for some data, the warehouse runs everything and gives you back the data. This is saying, "Hey, everything lives on the lake.
Here's a protocol that is an open standard. Go and read it." And the catalog itself is a way to tell you where your data is.
This is where a lot of the innovation around interoperability has been, and this is where our partnership between Microsoft and ClickHouse starts.
So, from the OneLake side, we launched this product called the OneLake table endpoint. Essentially, it builds upon what we already have, which is object storage, and it gives you a catalog layer that tells external engines and external partners about all of the data that I have.
And the integration becomes: go look at my table endpoint, my catalog, figure out what tables exist, and then go directly into the lake to read data in open table formats, either Iceberg or Delta.
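The "look at the catalog, find the tables, then read from the lake" flow can be sketched with PyIceberg, which is mentioned later in the talk as one of the clients that works with the endpoint. This is a hedged sketch, not the documented OneLake setup: the endpoint URI, the warehouse identifier, and the token below are placeholders, and the endpoint may require additional storage credential properties that are not shown. The function names `build_catalog_properties` and `preview_first_table` are made up for this example; `load_catalog`, `list_namespaces`, `list_tables`, `load_table`, and `scan(...).to_arrow()` are PyIceberg calls.

```python
def build_catalog_properties(uri, warehouse, token):
    """Assemble properties for a PyIceberg REST catalog.

    All three values are placeholders supplied by the caller; the real
    endpoint URI and warehouse naming scheme are not given in the talk.
    """
    return {"type": "rest", "uri": uri, "warehouse": warehouse, "token": token}


def preview_first_table(props, limit=5):
    """Discover a table through the catalog, then read rows from the lake."""
    # Imported lazily so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("lake_catalog", **props)
    namespace = catalog.list_namespaces()[0]          # step 1: what namespaces exist?
    identifier = catalog.list_tables(namespace)[0]    # step 2: what tables exist?
    table = catalog.load_table(identifier)            # step 3: where is the metadata?
    return table.scan(limit=limit).to_arrow()         # step 4: read Parquet directly


# Placeholder values only; replace them with the details of a real catalog.
props = build_catalog_properties(
    uri="https://example.invalid/iceberg",
    warehouse="example-workspace/example-item",
    token="example-token",
)
print(sorted(props))
```

Calling `preview_first_table(props)` needs network access, valid credentials, and the `pyiceberg` and `pyarrow` packages, so it is left out of the runnable part above.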
I have a few demos of this working with open-source software, but I think I'll just skip through them because I think the ClickHouse demo is more interesting. But essentially, we built an interoperability layer that works with every open-source engine out there that speaks Iceberg, because there is a standard and a specification.
It's almost like a protocol that everyone understands. So, instead of building integrations for each of the engines, we speak the Iceberg protocol, and so does everyone else, and that's our integration. So, it works with DuckDB out of the box.
It works with Spark out of the box.
It works with PyIceberg, which is in the Python ecosystem, out of the box.
And it works with ClickHouse.
I have a demo, but I think it's better if we do a live demo.
So, I'll give it to Melvin.
>> My demo is on top of master, so it's going to be a dangerous demo. I hope it works.
I compiled the pull request this morning, so I'm going to show you a couple of things.
So, this is our cloud.
Who is using ClickHouse Cloud in the room?
Okay, a few people. So, for people that don't know about our cloud, this is what it looks like.
In this cloud, we have a tool called ClickPipes, which gives us a way of integrating with different data sources. For example, we can add different types of data sources such as Postgres, if you have a PostgreSQL database and you want to sync it into ClickHouse. We have things like Confluent Cloud, if you want to consume data from Kafka, et cetera. So, we have a bunch of connectors.
But recently, what we have spent time adding is a new section called catalogs. It allows you to connect directly from ClickHouse Cloud to whatever Iceberg catalog is out there. And not only Iceberg: we also support Unity Catalog for Delta tables. You can just select this one, for example, and you will be able to pick whatever catalog you want to use. So, for example, here I want to use Microsoft OneLake. You can select it, then specify the credentials of your catalog, and you will have a connection directly to your OneLake ecosystem. So, in a couple of clicks, you can have all of your data available right away.
Here, for example, you can see I've already created my OneLake catalog. You can see the number of tables, et cetera, and I can directly query them from ClickHouse. But I'm also an engineer, so I use a terminal, and given that the pull request is very fresh, it's not in my cloud environment yet. I want to show you some cool new stuff we've added in ClickHouse.
Let me find my terminal here.
Okay.
So, I'm in my terminal. I'm going to connect to my... Yeah, I'm going to zoom in.
So, I'm connected to my instance. I'm just going to connect to my cloud build. This is a fresh pull request from this morning, and I worked from a fresh pull request for a couple of reasons. The first one is that there is a new feature around ClickHouse's ability to write to Iceberg tables that I wanted to do a quick demo of. But in my ClickHouse instance, I've already created a couple of things. So, let me show you what I've done.
Here, if I type SHOW DATABASES, I can list all the databases I've created. You will see that there is one that should be called OneLake. I'm not seeing it from here.
Yeah.
Yeah, so this is the database, which is basically a connection to the Microsoft OneLake catalog.
It just takes a couple of credentials and connects to my external catalog. And through this catalog, I can actually start looking at, okay, what tables do I have in my OneLake catalog?
So, let me list the different tables I have.
Here I'm sending a SHOW TABLES request to the OneLake catalog. I'm asking, okay, show me all the Iceberg tables I have in this external catalog, which is stored in Azure. You can see I have a couple of tables, and I'm going to start querying one. For example, we have this table of 2018 green trip data. Yeah, I guess this is a taxi dataset.
It's one of the default datasets you can find in OneLake. So, I'm going to run this query.
This query is not querying any data in ClickHouse. I have no data in my ClickHouse cluster right now. It's just going to go directly to OneLake and search this Iceberg table that I have. This is a very trivial query, but of course I can do any type of analytics. If I have a more complex query that is really trying to do some analytics, you can see that it's going to take a couple of seconds to run. It's going to go to my OneLake and collect all the data. So, here we are basically reading Parquet files in an Iceberg table.
You can see how long it takes. Maybe 8 seconds, 7 seconds. I can't see it from here, but it took a few seconds.
Ultimately, ClickHouse has been designed for real-time analytics. The goal is to make ClickHouse extremely fast on top of its own native format.
We are trying to optimize querying your Iceberg tables directly as much as we can, but there will always be a ceiling there that the ClickHouse native format will not necessarily have.
So, the thing that I'm going to do here is create a table in ClickHouse, an empty table, which matches the schema of my Iceberg table.
So, now I've created a ClickHouse table.
It contains the same columns as my Iceberg table in OneLake.
Except that you can see I have an ORDER BY clause here, which is how I'm going to order my data in ClickHouse. One of the main advantages of ClickHouse is, I would say, that it has a very strong indexing structure. We have a primary index that we can use to sort data on disk, and then use the sorting to index some different values and get a performance boost. But of course, you could create many different types of indexes. For example, recently in ClickHouse we have added full-text search. Imagine you have textual data in your Iceberg table. It will be very slow to search this data using Iceberg directly, because Iceberg and Parquet don't have full-text search. ClickHouse does. So, actually having this data in ClickHouse lets you populate this index.
So, here I've created my table. And what I'm going to do is, in a single query, load the data from OneLake into ClickHouse, right? This query is reading my Iceberg table in OneLake and loading it into my local table in ClickHouse. You can see it's progressing.
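The demo steps so far (list the catalog's tables, query one in place, create a local table with an ORDER BY key, load it with a single INSERT ... SELECT) can be sketched as a sequence of ClickHouse SQL statements. The exact queries were not captured in the transcript, so everything specific here is an assumption: the catalog database name `onelake`, the dotted source table name, the local table name, and the two columns are hypothetical, and the helper functions `quote` and `demo_statements` exist only for this example. The sketch also assumes the catalog database exposes each table under a dotted `namespace.table` name that has to be backtick-quoted.

```python
def quote(identifier):
    """Backtick-quote an identifier so dotted names stay a single token."""
    if "`" in identifier:
        raise ValueError("backticks are not supported in this sketch")
    return f"`{identifier}`"


def demo_statements(catalog_db, source_table, local_table, columns, order_by):
    """Build the SQL for: discover, query in place, create local table, load."""
    src = f"{quote(catalog_db)}.{quote(source_table)}"
    dst = quote(local_table)
    col_defs = ", ".join(f"{quote(name)} {ctype}" for name, ctype in columns.items())
    col_list = ", ".join(quote(name) for name in columns)
    return [
        # 1. Ask the external catalog which tables exist.
        f"SHOW TABLES FROM {quote(catalog_db)}",
        # 2. Query the Iceberg table in place; nothing is stored in ClickHouse.
        f"SELECT count() FROM {src}",
        # 3. Create an empty native table whose sort key drives the primary index.
        f"CREATE TABLE {dst} ({col_defs}) ENGINE = MergeTree ORDER BY {quote(order_by)}",
        # 4. Load from the lake into the native table in a single statement.
        f"INSERT INTO {dst} SELECT {col_list} FROM {src}",
        # 5. Re-run the count against the local copy.
        f"SELECT count() FROM {dst}",
    ]


# Hypothetical names and columns, chosen only to make the sketch concrete.
statements = demo_statements(
    catalog_db="onelake",
    source_table="dbo.green_tripdata_2018",
    local_table="trips_local",
    columns={"pickup_time": "DateTime", "fare": "Float64"},
    order_by="pickup_time",
)
for statement in statements:
    print(statement + ";")
```

The statements could be pasted into a ClickHouse client or sent through a driver. The choice of ORDER BY column matters most, since it decides how the data is sorted on disk and therefore which filters the primary index can accelerate.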
You can see it's a local instance. I don't have a lot of CPU or memory on this machine, but you can see we have processed about 630,000 rows per second, read from Iceberg and loaded into ClickHouse.
How long did it take? 13 seconds.
So, now that my data is in ClickHouse, I can run the same queries that I ran directly on my Iceberg table.
So, let me run these queries.
Here you can see it's a simple count. I have the same number of rows that I have in my Iceberg table, of course. And I can take the same query that took 8 seconds on OneLake and execute it in ClickHouse Cloud, and you can see that it took, I don't know, less than a second, 0.5 seconds or something like this. So, that's one of the benefits of ClickHouse: the strong indexing structure, and the fact that we also have in-memory components that we can use to speed up queries, which we don't necessarily have when we are querying a fully, I would say, stateless table like Iceberg. And one of the cool things that I wanted to show is, I have this table here.
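For a sense of scale, the figures quoted in the demo can be combined with some quick arithmetic. The inputs are the approximate numbers spoken on stage (630,000 rows per second, 13 seconds, roughly 8 seconds versus roughly 0.5 seconds), so the results are rough estimates. The total row count in particular is derived here and was not stated in the talk.

```python
rows_per_second = 630_000    # load throughput quoted in the demo
load_seconds = 13            # load duration quoted in the demo
approx_rows = rows_per_second * load_seconds

lake_query_seconds = 8.0     # "maybe 8 seconds" against the Iceberg table
native_query_seconds = 0.5   # "0.5 seconds or something like this" after loading
speedup = lake_query_seconds / native_query_seconds

print(f"~{approx_rows:,} rows loaded")   # ~8,190,000 rows loaded
print(f"~{speedup:.0f}x faster")         # ~16x faster
```

So the trade-off shown is roughly a 13-second one-time load in exchange for a query that runs on the order of 16 times faster afterwards, on this one dataset and this one machine.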
So, historically, ClickHouse was able to read Iceberg tables, but more recently, we have added support for writes. So, we can write to Iceberg tables, and I want to write back to my OneLake environment. Let's say, for example, I'm doing a bunch of transformations in ClickHouse on top of my data, and then I want to send this transformed data back to OneLake for another team to analyze. I have this table here in my OneLake environment, which I created this morning, and I keep playing around with it. So, I want to see how many records I have in this table.
This is going to check my Iceberg table. You can see I have 20,000 records in this table. What I'm going to do now is use ClickHouse to generate some data, and then write it into my OneLake table.
This query is interesting because it's a new concept in ClickHouse. Before, we were not able to write through a catalog, and hopefully, in 26.6, we'll add write-through-catalog support in ClickHouse, so you can write through your catalog. We are going to support Unity right now, and OneLake is going to be the second one. And you can see that this query is doing a couple of interesting things.
Here we're going to insert into this table. I provide the columns of my table. I provide the setting that enables it, since it's a beta feature. And then here I'm going to generate a bunch of random data points using ClickHouse's native functions for generating data. So, this data is purely random. I'm going to execute this query.
What we're doing here is both generating the data and inserting it into our Iceberg table. It's going to take a couple of seconds. Now my data has been sent to the OneLake catalog, and of course my OneLake co-worker can use the data that I've just inserted. So, normally, if I run a count, it should show the new number.
Let me... It should show 30,000.
But it's still 20,000.
There's some lag on the OneLake side here.
Yeah, okay. So, now the data is committed. The catalog has been updated, probably. So, now I have access to the latest snapshot of the Iceberg table, and I can query my new data. So, briefly, I have used ClickHouse as a query engine, then loaded my data into ClickHouse for real-time analytics, and you can really see the performance difference between an Iceberg table and ClickHouse. And then I've transformed some data and sent it to my OneLake table. Of course, we have further goals at ClickHouse to foster this interoperability, which we're going to develop in the coming quarters. Hopefully this year we'll have a couple of exciting features that make a lot of these workflows more automated. For example, being able to write automatically to your Iceberg table without having to run any manual queries on the user side.
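The stale count in the demo (20,000, then 30,000 a moment later) is consistent with how snapshot-based tables publish writes: a writer first writes files and a new snapshot, then moves the catalog pointer, and a reader only sees the new rows once it picks up the new pointer. The toy model below illustrates that mechanism. It is one plausible explanation, not a diagnosis of what happened on stage, and the classes `ToyCatalog`, `ToyTable`, and `Reader` are invented for this sketch and do not correspond to any real API.

```python
class ToyCatalog:
    """Holds one pointer per table: the id of its current snapshot."""

    def __init__(self):
        self._current = {}

    def register(self, table_name, snapshot_id):
        self._current[table_name] = snapshot_id

    def current(self, table_name):
        return self._current[table_name]

    def commit(self, table_name, expected_id, new_id):
        """Move the pointer only if no other writer committed in between."""
        if self._current[table_name] != expected_id:
            raise RuntimeError("concurrent commit, retry on the new snapshot")
        self._current[table_name] = new_id


class ToyTable:
    """Snapshots map a snapshot id to the total row count visible in it."""

    def __init__(self, name, catalog, initial_rows):
        self.name = name
        self.catalog = catalog
        self.snapshots = {1: initial_rows}
        catalog.register(name, 1)

    def stage_append(self, new_rows):
        """Write data files and a new snapshot, without publishing it yet."""
        base_id = self.catalog.current(self.name)
        new_id = max(self.snapshots) + 1
        self.snapshots[new_id] = self.snapshots[base_id] + new_rows
        return base_id, new_id


class Reader:
    """A client that caches the snapshot id until it refreshes from the catalog."""

    def __init__(self, table):
        self.table = table
        self.snapshot_id = table.catalog.current(table.name)

    def refresh(self):
        self.snapshot_id = self.table.catalog.current(self.table.name)

    def count(self):
        return self.table.snapshots[self.snapshot_id]


catalog = ToyCatalog()
table = ToyTable("demo_table", catalog, initial_rows=20_000)
reader = Reader(table)

base_id, new_id = table.stage_append(10_000)
count_before_commit = reader.count()   # staged data is not visible yet

catalog.commit(table.name, base_id, new_id)
count_stale = reader.count()           # committed, but the reader has not refreshed

reader.refresh()
count_fresh = reader.count()           # the reader now sees the new snapshot

print(count_before_commit, count_stale, count_fresh)  # 20000 20000 30000
```

Because readers always see one complete snapshot, a delay like this shows up as an old but consistent count, never as a partially written table.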
Okay, great.
>> Yeah, and that's super interesting, because now you can have your data live in ClickHouse but also exported. I don't know if that's the right word, but you can age it out into object storage for interoperability and cheaper storage.
And then multiple engines are able to access that data. So, if you don't need speed, you can offload that into the object store and everyone wins.
So, I just want to close by talking about a couple of things that I think are going on in the industry, especially related to data lakes, lakehouses, open table formats, and whatnot. It's not very ClickHouse specific, but I think it serves as a good backdrop on where the industry is moving collectively.
One thing we've seen, both from Iceberg as open source and from Microsoft's perspective, is that catalogs are becoming a first-class citizen in all of the data platforms.
You probably hear announcements about vendors supporting Iceberg, which means they have a catalog set up somewhere. Even Oracle has an Iceberg-compatible layer now, and you can export data as Iceberg. So, what we're seeing is that a lot of the vendors are supporting Iceberg, the protocols, and the catalogs. And what you end up getting is, I have one catalog that is part of my Microsoft Fabric. I have another catalog that is part of my Snowflake instance. I have another one that's Databricks, another Google one. You end up with many.
And what ends up happening is you have a very disaggregated set of views of what your data looks like. This goes back to the problem that Fabric wants to solve, which is getting all of the data together so that there's a single logical representation.
So, as the world evolves to be catalog-first, with the catalog as the integration layer, what we're trying to say is, "Okay, now we have all of these discrete ponds of data in all these platforms. Can we again bring them together so that there's a single unified representation?" And the way we do that is through what we call catalog mirroring. You bring things from one catalog into your platform. These catalogs are very lightweight. It's just key-value.
It says, "There's a table here that points to this bucket over here."
So, bringing those in is very lightweight, but it gives you a great way to have a single unified representation even in this catalog-first world.
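Since a catalog is described here as essentially key-value ("there's a table here that points to this bucket over there"), mirroring can be pictured as merging several small dictionaries into one view without touching any data files. The sketch below is only an illustration of that idea: the function `mirror_catalogs`, the platform names used as prefixes, and all metadata paths are hypothetical, and a real mirroring feature would also deal with credentials, refresh, and naming conflicts.

```python
def mirror_catalogs(catalogs):
    """Merge per-platform catalogs into one view keyed by 'platform.table'.

    Only pointers are copied; the data files stay where they are.
    """
    unified = {}
    for platform, entries in catalogs.items():
        for table_name, metadata_location in entries.items():
            unified[f"{platform}.{table_name}"] = metadata_location
    return unified


# Hypothetical catalogs: each maps a table name to a metadata file location.
catalogs = {
    "fabric": {
        "sales.orders": "abfss://example-lake/orders/metadata/v3.metadata.json",
    },
    "snowflake": {
        "sales.orders": "s3://example-a/orders/metadata/v7.metadata.json",
        "hr.people": "s3://example-a/people/metadata/v2.metadata.json",
    },
}

unified = mirror_catalogs(catalogs)
for name in sorted(unified):
    print(name, "->", unified[name])
```

Prefixing each entry with its source platform keeps two tables with the same name apart, which is the kind of detail a single unified representation has to settle.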
And then another thing that we're seeing is that, with catalogs, there are actually two schools. One is your operational catalog: everything you need for a query engine to run fast. The other is everything your analyst needs to know what the heck this data is. But I think the two are converging. I think contextual metadata is super important, whether for engines, for agents, for your analysts, for everyone. So, one of the things we're looking at is, can we bring metadata, whether it's for engines, for analysts, or business context, into the platform, just as we bring Iceberg tables and your data into the platform? What we really want to do is to say, if a table is in Snowflake somewhere and it's annotated and classified, we want to bring those annotations in too, because that metadata is important, especially when you're using the data.
So, just making data available is not enough. We really want to bring in everything along with that data.
And everything comes down to security and governance.
Part of what we want to bring as well is security and governance for open table formats. We allow data and tables to be interoperable, to be used with open source and with different vendors. What customers really want is a single representative policy definition and security role for these tables, and hopefully it travels with these tables no matter where they are accessed.
Right? That's kind of the North Star.
And we want to get there through the open table formats, through open source, to define a way to say that a table defined in Snowflake is accessible in the same way, to the same set of folks, with the same policies, on other platforms. We want to make sure that the governance aspect is also interoperable.
On the Microsoft side, we started looking at this internally. We have something called OneLake security.
If you're inside of Microsoft, you define it once and all the engines respect it. And we want to expand this out into other ecosystems and other integrations. Right now, it's just a table: you get access to a table. But pretty quickly, what we want is for your table to have the metadata, the context, and the security definition, so that querying it through the owning platform is the same as querying it through open source clients or your vendor integrations.
We're starting on this journey. It's still very early. We're defining what it means to have interoperable governance, and doing it through open source.
And if you're interested in this area, I think one of the super interesting things right now, especially in open source and Apache Iceberg, is the fact that we're essentially redefining warehouses and databases.
A lot of the concepts that we've come to admire and love inside of a database are getting re-implemented in the cloud on object storage. So, we have metadata on top of Parquet data files. Now we want to do more, right? We want to add indexes. We want to add column families for your feature engineering workloads. We want to add UDFs. Anything you can think of inside your database, we want to add as a feature on top of open table formats, because every single thing that you add becomes a standard, and there's a distribution that is available in every single cloud, right? Because all the vendors want to participate in the community. We have a lot of discussion of these topics on the YouTube channel, with a lot of great speakers and great talks from the Apache Iceberg summit this year.
If you're interested, reach out to me, or go look at the YouTube videos and join the community. But, yeah, hopefully this was a good spiel on Iceberg, ClickHouse, open table formats, and the like.
Thank you.
>> Good.
Thank you very much, everybody. We are open to question. If you have any questions, it could be unrelated, could be about Iceberg, about ClickHouse, about Microsoft.
>> Yeah, I had one quick question when Melvin, when you were writing to Iceberg, like is that taking advantage of the compression ratios for ClickHouse as well, or is the data already written to Iceberg and you're just rewriting the metadata on top?
>> Uh sorry, say that again?
>> So, in your demo, you were you were essentially writing new data to Iceberg, right?
>> Yeah.
>> Um my question is, like when you're writing that data to Iceberg, is that data already sitting Let's just say we're using Azure blob storage. Is it already sitting there and you're writing it re- taking advantage of the compression or rewriting the metadata?
What does that mean?
>> So we write both things: the Parquet files and the Iceberg metadata. The thing is, we cannot just write Iceberg metadata, because ClickHouse has its own native format, called ClickHouse native format. This is a very different format from what Iceberg typically supports. It's an open source format, but it's not an open specification, right? Of course, if you read the code, the specification of our own format doesn't change very much.
We could one day decide to have an open specification for ClickHouse, but yeah. So we have to write both the Parquet files and the metadata files. But keep in mind that with ClickHouse there are a couple of pretty cool things you can do. When you are selecting your data and inserting it into your Iceberg table, you can order it, for example. That also helps with compaction, and it helps with performance, because you have better statistics in the Iceberg metadata files.
So I've just done some random data generation and written it naively into an Iceberg table, but there is a lot of optimization that you could do. I could, for example, partition my tables by specific key values, or order the rows by specific keys, and ClickHouse will write the Iceberg metadata and try to optimize it for that.
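The point about ordering and better statistics can be illustrated with a toy model. The sketch below is not ClickHouse or Iceberg code: it assumes that each data file carries a min and max for one key column and that a reader skips any file whose range cannot match the filter. The helper names (`file_stats`, `files_to_scan`) and the file sizes are hypothetical.

```python
import random

def file_stats(keys, rows_per_file):
    """Split keys into fixed-size 'files' and record each file's (min, max)."""
    files = [keys[i:i + rows_per_file] for i in range(0, len(keys), rows_per_file)]
    return [(min(f), max(f)) for f in files]

def files_to_scan(stats, lo, hi):
    """Count files whose [min, max] range overlaps the query range [lo, hi]."""
    return sum(1 for fmin, fmax in stats if fmax >= lo and fmin <= hi)

random.seed(0)  # fixed seed so the toy data is reproducible
keys = [random.randrange(1_000_000) for _ in range(100_000)]

# Same rows, same number of files; only the write order differs.
unsorted_scanned = files_to_scan(file_stats(keys, 10_000), 500_000, 510_000)
sorted_scanned = files_to_scan(file_stats(sorted(keys), 10_000), 500_000, 510_000)

print("files scanned without ordering:", unsorted_scanned)
print("files scanned with ordering:   ", sorted_scanned)
```

With randomly ordered rows, nearly every file spans the whole key range, so the statistics rule out almost nothing. With ordered rows, each file covers a narrow slice and most files can be skipped.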
>> Got it. So, thank you. And I guess another example I'll kind of give there.
So, like let's say I have I'm writing data to Iceberg today, and maybe it's parquet files, and I have my own like compression ratios, and maybe uh one gig like, you know, 10 one gig files becomes like a total of a couple gigs written to Iceberg. If I instead fork that through ClickHouse first and write that, would I maybe be able to take that 10 gigs and turn it to like a couple hundred megs? So, I'm utilizing ClickHouse to write to Iceberg in a more compressed data set or not?
>> try, but I think like all the query engine could do the same thing as well.
Like when another query engine writes data to Iceberg, of course it depends on the query engine, because as you know, there are a lot of engines that can write Parquet. They could do kind of the same thing. The compression really depends on the file structure first of all, and also on the data that you're writing into it. So if you have an engine that is relatively optimized to write the data correctly to a specific file structure, then it will be optimized.
But ClickHouse can also do a good job of optimizing it. So I would say it depends on the engine that you're using.
>> But compression is usually a file format problem, right? Well, it's both that and the layout. So either you improve your format or you improve your layout: you can sort or partition, and then it takes advantage of the encoding, right? Or you just have a better format. And ClickHouse has their own. The Iceberg community is also looking at whether we can plug in other formats, because there's a recognition that Parquet was made for specific workloads, and there are now different file formats built specifically for other types of workloads. So there's a discussion around that too, but Parquet is pretty much the dominant format in Iceberg, and that's where a lot of the interoperability is, because everyone knows how to read Parquet.
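The claim that layout matters as much as format can be checked with a small experiment. The sketch below uses `zlib` from the Python standard library purely as a stand-in compressor; it is not Parquet's encoding or ClickHouse's codecs. The column contents are made up for illustration.

```python
import random
import struct
import zlib

random.seed(1)  # fixed seed so the toy column is reproducible
# A low-cardinality column: 50,000 values drawn from 100 distinct integers.
values = [random.randrange(100) for _ in range(50_000)]

def compressed_size(column):
    """Serialize the column as little-endian uint32 and return its zlib size."""
    raw = struct.pack(f"<{len(column)}I", *column)
    return len(zlib.compress(raw))

unsorted_size = compressed_size(values)
sorted_size = compressed_size(sorted(values))

print("compressed bytes, original order:", unsorted_size)
print("compressed bytes, sorted order:  ", sorted_size)
```

The bytes and the compressor are identical in both runs; only the row order changes. Sorting produces long runs of repeated values, which is the kind of pattern that run-length and dictionary style encodings exploit.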
Yeah.
>> Maybe we could make uh ClickHouse native formats like uh >> Yeah.
>> as part of the Iceberg specification.
>> Microsoft also has their native format.
Everyone does. If you want performance, you need to own the entire end-to-end. So either we take the best out of the ClickHouse format and put it in Parquet or something else, or maybe we plug... Who knows? [laughter]
>> Yeah, there is some discussion internally actually about making ClickHouse native formats like Parquet parts, so you could actually have Parquet directly within your ClickHouse database.
Of course, you don't have all the features, like indexing structures and all of that, that come with our own native format, but that could give you a nice interoperable subset of features.
>> So, in your example, Melvin, to really get the ClickHouse performance, you had to take the data out of Iceberg, download it into your local instance, and put it in ClickHouse.
Which makes sense, right? But what do you see as the best way to keep your local fast ClickHouse storage in sync with the Iceberg catalog and the data that's constantly streaming into Iceberg? Is it a batch job, or is it a push model, or what does that look like?
>> So, you want to have ClickHouse being very fast on your data, but you still want to keep this interoperability layer.
>> Yeah, like is it incremental data loading and updates from the Iceberg data, or...
>> So we have different approaches to doing it. I did a one-time job, which is useful for a static data set, but when you have ever-changing data, it's a bit more complicated. So there are different angles that we're thinking about internally at ClickHouse.
One is that we could use a TTL approach. Imagine that instead of writing to your Iceberg table, you actually write to ClickHouse.
And then, let's say you define a TTL policy: after 30 days, 90 days, your data ages out into Iceberg format.
So basically ClickHouse is the writer for your ClickHouse data and for your Iceberg data once it ages out. And now you have your interoperability layer. The second approach, which we're seeing quite a lot of people doing, is dual writes.
So they write into both ClickHouse and Iceberg, and then they can do whatever they want. That requires, I would say, some different writers, and you will also need to maintain all of your Iceberg tables, but it gives you the interoperability layer that you are looking for. So those are, I would say, the two angles that we're seeing. And then we're seeing people using refreshable materialized views in ClickHouse to incrementally load the data from their Iceberg tables into ClickHouse. So that's also something that we see people doing.
But then the main problem is, how do you handle mutations in the data? That adds some level of complexity here.
>> Iceberg itself is also versioned, with immutable versions. So you can load it halfway, new data comes, and you load the other half. So that can work also. There are a lot of nuances here, because you can't have multiple writers, for example, and then it gets messy, right? But there are a lot of different solutions here. There's push-based, like notifications, right? Like, hey, the catalog tells you something changed.
Or pull-based, where you load it every so often and see what changed.
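The pull-based option can be sketched as a loader that remembers the last snapshot it has already loaded and, on each poll, picks up only newer ones. This is a toy in-memory model under the assumption of append-only snapshots; `ToyTable`, `Snapshot`, and `PullLoader` are hypothetical names, not Iceberg or ClickHouse APIs, and mutations are deliberately out of scope.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    snapshot_id: int
    added_rows: list  # rows appended by this snapshot (append-only toy model)

@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def append(self, rows):
        """Commit a new snapshot containing the given rows."""
        self.snapshots.append(Snapshot(len(self.snapshots) + 1, list(rows)))

@dataclass
class PullLoader:
    last_loaded_id: int = 0
    local_rows: list = field(default_factory=list)

    def poll(self, table):
        """Load only snapshots newer than the last one already loaded."""
        new = [s for s in table.snapshots if s.snapshot_id > self.last_loaded_id]
        for snap in new:
            self.local_rows.extend(snap.added_rows)
            self.last_loaded_id = snap.snapshot_id
        return len(new)

table = ToyTable()
loader = PullLoader()

table.append(["a", "b"])
first_poll = loader.poll(table)    # picks up snapshot 1
table.append(["c"])
second_poll = loader.poll(table)   # picks up snapshot 2 only
third_poll = loader.poll(table)    # nothing new
```

A push-based variant would call `poll` in response to a change notification instead of on a timer. The bookkeeping of the last loaded snapshot stays the same.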
>> I think one of the great features of the Delta format is the change data feed. Delta has a change data feed which you can read, and you get the delta between different snapshots. In ClickHouse, on top of Delta, we support this change data feed, so you can tell ClickHouse: this is my first snapshot, this is my second snapshot.
Give me all the data that changed between these two snapshots. Delta supports that; Iceberg doesn't. That would be great. It would be, yes.
Yeah, so >> of these table formats features are converging almost. So like great great features from Delta will eventually get implemented in Iceberg because there's a need for it, right? Like getting the change feed.
So there's a lot of like new innovation there as well.
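To make the idea of "the data that changed between two snapshots" concrete, here is a naive sketch that diffs two full table states keyed by a primary key. This is only an illustration of what a change feed returns. It makes no claim about how Delta or ClickHouse produce it internally, and `diff_snapshots` is a hypothetical helper.

```python
def diff_snapshots(old, new):
    """Return (inserted, updated, deleted) between two {key: row} states."""
    inserted = {k: v for k, v in new.items() if k not in old}
    deleted = {k: v for k, v in old.items() if k not in new}
    updated = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return inserted, updated, deleted

snapshot_1 = {1: "alice", 2: "bob", 3: "carol"}
snapshot_2 = {1: "alice", 2: "robert", 4: "dave"}

inserted, updated, deleted = diff_snapshots(snapshot_1, snapshot_2)
print(inserted)  # {4: 'dave'}
print(updated)   # {2: ('bob', 'robert')}
print(deleted)   # {3: 'carol'}
```

A downstream engine that receives these three sets can apply them to its local copy instead of reloading the whole table.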
>> So for the TTL approach that you mentioned, the first approach, where part of the data stays in ClickHouse and part of the data moves to Iceberg based on the TTL: how will the query work for that?
>> That's an interesting question. That's not implemented yet, so we are still looking into it.
>> So the query engine is kind of that feature that you're trying to look at.
>> Yeah, we're trying to So this is something that we're trying to work on.
But one of the main things that you need to keep in mind, and I think this is a bit tricky at the database layer, is that when ClickHouse writes data to ClickHouse, we keep track of all the parts in Keeper, right? So we know where all these parts are.
So we can read them, basically.
Especially with ClickHouse Cloud, it's very important, because we have separation of compute and storage, so we need to keep track of all these blobs.
So we have this Keeper layer that keeps track of all the data that we have in ClickHouse. Now, let's say that you start migrating this data to an external table.
There is this transition period that is important, right? You take the data from ClickHouse and you start writing it to Iceberg. So now you actually have the same data twice. So you effectively have at-least-once semantics, because you can have duplicates. There are different options. You can either accept at-least-once semantics, or you can say, "Okay, no, first we remove it from ClickHouse and then we write it into Iceberg." But then you lose your blob of data. Or the more elegant alternative is that you keep track of this data both in ClickHouse and in Iceberg. And that's a whole new can of worms, I would say, because now ClickHouse needs to track the ClickHouse data parts and the Parquet parts of your Iceberg table to avoid these at-least-once issues. So that's an idea that we're working on, but as we scratch the surface we know that it's a bit tricky. And then to answer your second question, to work on top of these two tables, there is a very trivial solution that you can use. You could use the merge table function in ClickHouse, which takes two tables and merges them together, and then you can query this merge table function and it's actually going to query the two tables and aggregate the results.
But then you have a performance impact, of course, because the performance of your query will depend on the slowest table, which will probably be Iceberg.
So that's also one of the other points of contention. I think if we finish implementing this feature, then we'll have some kind of syntactic sugar to simplify the user experience, but overall we'll need to find a more elegant way of combining all of that together for the long term. So there will be stages in how we implement such a feature, but that's an interesting database problem when you start working with different formats and different storage.
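The duplicate problem during the transition period can be sketched as follows. The function unions rows from a hot tier (standing in for ClickHouse) and a cold tier (standing in for Iceberg) and keeps one row per key. This is not the ClickHouse merge table function. The deduplication step, the "hot tier wins" rule, and the name `query_both_tiers` are assumptions added for illustration.

```python
def query_both_tiers(hot_rows, cold_rows, key):
    """Union rows from both tiers, keeping one row per key (hot tier wins)."""
    merged = {}
    for row in cold_rows:
        merged[row[key]] = row
    for row in hot_rows:
        merged[row[key]] = row  # overwrites a cold copy of the same key
    return sorted(merged.values(), key=lambda r: r[key])

# Row 2 has been copied to the cold tier but not yet removed from the hot
# tier, which is the at-least-once window described in the talk.
hot = [{"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
cold = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]

result = query_both_tiers(hot, cold, "id")
print([r["id"] for r in result])  # [1, 2, 3]
```

Without the keyed merge, a plain union would return row 2 twice. The cost is that every query has to touch both tiers, so latency follows the slower one.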
>> So, the current way is using the merge engine?
>> The current way is using the merge table function, yeah.
>> Thank you.
>> Thank you. I have a question about um Iceberg um capabilities in terms of uh concurrent writers.
Is it a real problem or >> The The Define a real problem.
>> [laughter] >> As in um So, So, on the concurrent writer piece, I think So, I mentioned that it's it's immutable, right? You go from state A to state B, right? So, when you have concurrent writers, uh there's a potential for a conflict, right? There's a couple There's a couple cases here. One is, you know, the the catalog or the commit protocol is smart enough to figure out that hey, there's actually no conflict here, right? And there there's some work to to like to to make sure that that's optimized, but essentially it comes down to you know, uh figuring out if there's a conflict and if there's a conflict, what you do.
So, in that sense, there is a there currently is a concurrent writer problem as in uh there's only one commit allowed if there's a lot of conflict, right? And that's just just the nature of like the commit protocol. But uh there's a separate kind of optimization here to figure out, okay, well, for example, if I'm just appending data, right? Appending data will never uh will never have uh uh what's it called? Concurrent um Forgetting the word.
Yeah, it would it would never like run into each other, right? Have race conditions. So, you can always append.
But yeah, in terms of like writing over the same data, there there's a um uh uh a lag in terms of like someone has to retry. So, that's the concurrent writer issue.
So, so if you do fast commit, like um what's it called? There there's a lot of like real-time streaming use cases that is bottlenecked by the commit protocol.
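The "someone has to retry" behavior can be sketched as an optimistic commit loop: read the current version, prepare the change, and commit only if the version has not moved in the meantime. This is a toy model, not the Iceberg commit protocol or any catalog API. `ToyCatalog`, `append_with_retry`, and the `interfere` hook (used only to simulate a competing writer) are hypothetical.

```python
class ToyCatalog:
    """Holds the current table version; commits succeed only via compare-and-swap."""

    def __init__(self):
        self.version = 0
        self.rows = []

    def commit(self, expected_version, new_rows):
        if expected_version != self.version:
            return False  # another writer committed first
        self.rows = new_rows
        self.version += 1
        return True

def append_with_retry(catalog, row, max_retries=5, interfere=None):
    """Append a row, retrying from the latest state after each conflict."""
    for attempt in range(max_retries):
        base_version = catalog.version
        base_rows = list(catalog.rows)
        if interfere is not None:
            interfere(attempt)  # simulate a competing writer, for the demo only
        if catalog.commit(base_version, base_rows + [row]):
            return attempt + 1  # number of attempts it took
    raise RuntimeError("gave up after repeated commit conflicts")

catalog = ToyCatalog()

def competing_writer(attempt):
    # Sneak in a commit during the first attempt only.
    if attempt == 0:
        catalog.commit(catalog.version, catalog.rows + ["other"])

attempts = append_with_retry(catalog, "mine", interfere=competing_writer)
print(attempts)      # 2
print(catalog.rows)  # ['other', 'mine']
```

Each conflict costs a full re-read and re-commit, which is why frequent small commits from many writers can bottleneck on the commit step.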
>> And I think that's also one big difference between ClickHouse and Iceberg, right? Iceberg relies on, how do you call it, optimistic writes?
>> OCC?
>> Optimistic writes, right. Whereas ClickHouse relies on quorum-based inserts. So we have Keeper: you try to write data, and you need a quorum from Keeper before we actually insert the data. And Keeper will give you the lock, so you don't need to wait for a specific lock. So ClickHouse helps you get very high ingest throughput, because we have this quorum-based allocation of the metadata. And this is why we need Keeper in ClickHouse as well, to avoid this type of bottleneck.
But that's one more component that you need to manage then.
>> Did that answer your question?
>> Uh yes.
>> Okay.
>> Somewhat, yeah.
>> If I had to summarize, Iceberg was made for batch use cases, right? That then evolved into people wanting to do more streaming use cases, where there are a lot of concurrent writes, fast writes. So I think there's a lot of discussion around how we make it more suitable for those use cases. One of the goals of the upcoming spec version, V4, is to modify the metadata structure, which is tree-like, so that you can have these fast writes, and to modify the commit protocol so that you can have more throughput. So these are all kind of engineering problems, essentially.
>> in progress.
>> It's a work in progress.
>> Okay, this is good. I have another question about the data types. I noticed that some very common data types are not supported like for example um unsigned integers.
So, for example today the the atom of computing is a big int.
>> Mhm.
>> And it seems that in Iceberg you need to represent that as a number 200.
>> Yeah.
So, a lot of it is historical. It comes from Java, I guess.
From Hive and Spark. It evolved from, you know, I have these Hive and Spark jobs, Hive and Spark data types, and Parquet, and what do I do with them? So it might have just been things that are historical, but we've been adding a lot of new data types to the project. So, a bunch of geo types, timestamp nano. So again, it's evolving and a work in progress.
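The unsigned integer question comes down to range arithmetic. Iceberg's integer types are signed, so an unsigned 64-bit value may not fit in a 64-bit `long`. The sketch below only checks the ranges. Mapping such a column to a decimal with 20 digits and scale 0 is one possible workaround, shown here as an assumption rather than something the speakers recommend.

```python
INT64_MAX = 2**63 - 1    # largest value of a signed 64-bit integer
UINT64_MAX = 2**64 - 1   # largest value of an unsigned 64-bit integer

def fits_signed_64(value):
    """True if the value is representable as a signed 64-bit integer."""
    return -(2**63) <= value <= INT64_MAX

def decimal_precision_for(max_value):
    """Number of decimal digits needed to hold max_value exactly."""
    return len(str(max_value))

print(fits_signed_64(UINT64_MAX))         # False
print(decimal_precision_for(UINT64_MAX))  # 20
```

Values up to `INT64_MAX` round-trip through a signed `long` unchanged. Only the upper half of the unsigned range needs a wider type.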
>> Okay, great. I think we're running out of time, so thank you very much, everybody, for attending, and see you soon. If you have any more questions, we'll be around.
>> Thank you.
>> [music]