Conversational music recommendation systems differ from traditional recommendation approaches by leveraging multi-turn dialogue interactions to understand user preferences and provide personalized music suggestions, requiring systems to generate both recommended items and natural language responses while maintaining context across conversation turns.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
[ACM-Recsys 2026] Tutorial Session for the Music Conversational Recommendation ChallengeAdded:
Uh thanks for waiting and I will start today's uh tutorial session about the Lexus challenge. Uh thanks for participant today's session and uh we are the challenge organizer uh and this year uh Lexis uh challenge is relate to the music recommendation but conversational context. So let's start and before the starting uh we are really really sorry for the delay of this tutorial because we have some uh problem in June session. So yeah thanks for your waiting and patient and uh this challenge is organized with six person uh so I'm also serio Bruno Cladio and Francisco is a really really big help to organizing this uh Lexus challenge.
And today's content is relate to the introducing the task and framework of this challenge and also we uh introduce some challenge data set and also uh there are several available resources and we make the baseline system and evaluation system and so how can we uh designing the evaluation metrics and also scoring uh final scoring uh things.
Can you >> can you mute your thank you and another part is the how can we improve this system. So we have several uh advice and last part is how to participant and question from you guys. So if you have any question in this Lux challenge feel free to ask in this uh J session. Yeah let's start. uh uh before I start I I will introduce our timeline. So uh currently we are uh today's date is May 25.
So uh actually sir we are in the uh the blind data set A phase. So uh we already start the red system and maybe one weeks later we will start a blind data set B phase. And this blind data set B is really really critical in final readable. So actually the training development and blind A data set is just uh some validation set. So you can uh try every uh submissions with the blind data set A and leaderboard. So you can find the score and overfeed on the blind data set A. But uh we will defer more generalization ability with the blind data set B. So please uh uh stay tuned of uh some messages and after the blind data set data set B phase uh this challenges ended and we will announce the final readerboard and winners and winner need to update your code and also paper. So uh this if you submit your paper then we in Lexis uh conference we will make some workshop about this challenge. So you will be present your paper uh uh and your idea in lexis 2026 and current participant is over 150 participant uh 100 160 uh participants uh with the teams and also they submit more than 3,000. So uh please uh keep uh submit your nice and brave ideas to our the challenging leaderboard site.
Yeah. So let's start with the our task.
So our task is our conversational recommendation system. uh this system is um very similar to the traditional recommendation system or some text to item approaches but um main difference is we more focus on the multiton interactions. So uh what if you at the first turn user will be ask to the system that they need a vision jazz with a syn bass drum and saxophone and maybe your uh assistant or system can find the relevant music in your database but uh this type of the oneshot music search or recommendation is uh maybe some limitation because user need a more improvement of recommendation. So in second term a user maybe uh ask a little bit more groupy songs uh based on the user's previous query then our assistant understand the whole the chat history and recommend the nice music for the users. So uh in this type of the multiton interaction is really important for this task.
When we compare with the previous systems like uh query by user is a traditional recommendation system then what if user uh join the some platform or music streaming service then we have a user ID then our recommendation models like the matrix factorization model or collaborate factoring model can make measure the dot product between the user embeddings and our item DB and we can find the most relevant item uh in our database right uh and another system is a query by text so in this uh system user uh they can handle the user's text query so uh this is very similar with the search bar in the your the music streaming service so user enter the natural language query then our joint embedding model measure the um for embedding and item measure the similarity between the query embedding and item embedding and we can give some recommended items but our uh this year's challenge uh task is our conversational lexis so input is a little bit different uh our input can cover the user ID or text query and also chat history so within the multi context u our system need to understand the very wrong context uh users previous uh searches uh search history in same chess session and model can find the relevant item and and also they can need to um generate the text response. This text response is a very um important role for the users satisfaction or some usability in your the music streaming service. So that is our main difference between our previous uh approaches and this year's um the challenge task and let's deep inside of our output of model. So we need a two types of outputs. First one is a recommended item and also second thing is a text response. So uh if you check the our uh submission format then we need a uh we have four uh five different column.
First one is a session ID and user ID and also turn number and also we have a predicted track ids. So our Lexus can predict the most relevant item from users uh users ID or text query and chat history. But also we need to predict the response. So um this system need to predict the natural language response for the how can help users uh music listening uh the trajectory. So and this is not a singleton search this is a merchant search. So we need to help to the how can uh help to user to more focus on the music listening or music recommendation system.
Uh and next part is our data set. So uh in this challenge we utilizing the top play data set and uh this data set main characteristic is uh this data set containing user profile. So in user profile there is a unique user ID and also user's age and country and gender.
So that is very useful information for recommendation system and this data set is our conversational data set. So uh this is our maron conversation with the user's query and recommended music and assistance uh response. So that is important characteristic. So uh there are several uh important characteristic. So we have our user information and also music item information including the metadata rick semantic tag audio and album image.
Unfortunately we did not release the audio information and album art and rex because we bypass the copyright issue.
So uh we only release the metad only the text model information. Yeah. And we have our the uh response from the assistant and also we designed the meritton interaction. So that is our main characteristic and uh one important message of data set is this data set is a synthetic data set. So this is not a real conversation between human and the our the system. This is our um llm generated data set. So uh input of data set is our uh logic scale user listening history data set we call the LFM2B and they containing user information and session information and user listening history. So that is a really important uh information and uh that is some source and grounding of this data set and uh we utilizing uh LLM Richard of O this is some kind of the L plane with the uh large language models and we make our some synthetic data set.
So uh we have our some uh input listening history and we make a profiling track and recommendation track ports and we utilizing two different type of large link motor. So we have a listen and lexis LLM and these two LLM convertation each other. So but we give quite isymmetric information. So recer LLM does not um access to the recommendation track first. So they have a uh they have a limited knowledge and conversation with the Lexus. But the Lexus LLM is some kind of a wizard. So they already know every recommendation trackers and they can give some more uh they already have a really nice information from the users listening history. So that is are one method for generating quite diverse conversation and another important information is our listener profile and conversation group.
So um actually we need to make a quite diverse and we want to cover the diverse user scenario. So we set up the conversation go for the uh for condition on this generation process.
So conversation goal is relate to the um conversation topic. So uh we designing several conversation topic including the audio and lyrics. Sometimes our album art and sometimes are contextual or um ste some users uh move and themes like uh music for working or music for studying something like that and sometimes the topic is related to interactive refinement and sometimes the user want to listen music with the relate to the metadata. So, and sometimes they want to listen music with the mood and emotion and some some users more focused on the artist and some users focus on the their own culture or geographic characteristics and social or popularities or important access and last part is a temporary uh information.
So, we designed the several conversation group as a uh some condition on generation process.
And another the conversation go is related query and target specificity. So what if this is a uh is a some kind of the search mindset. So what if user have a very spec uh they have a very specific needs and specific target for music then they maybe ask a system with the very specific queries but sometimes they does not have a very specific ideas. So sometimes they want to play some chill music. Then groundes are really really many types of uh music because uh what if query specific cities are very low then the relationship between query and item is one to many mapping but what if increase the specificity then is more one to one mapping query. So we uh designed the several the specificity. So this one is another uh condition of generation process.
So finally we release four different type of the data set split. The first one is a train data set. So uh we and also we released the model >> sorry but anyone can mute your voice.
Thank you. And uh secondly we also released the development data set. Uh uh one characteristic of the train and development set is we release the ground truths also. So you can find measure the every uh recommendation metrics and also diversity metrics. Yeah. And in case of the blind data set A and B we does not release ground truth. So uh blind data set A and B is are utilizing for the our readerboard and you can measure the performance via the readable only. So that is on one characteristic.
Um, yep. And if you want to build a train your own model then you can utilize in train and development data set and for the final test uh we only utilizing blind data set for the generalization ability. Yeah.
So uh next part anyone have a question?
Maybe not. Okay then let's move on to the available resource. So we we released conversation data set but we also released the track metadata and track better uh DB and also user metadata and user vector DB. So uh yeah I will introduce one by one.
Firstly this is our conversation data set. They have several columns session ID, user ID, session date, user profile, conversation goal and conversations and core assessment. So actually the session ID is a unique ID of the session and user ids are very important because what if if you want to utilizing the ranking model then this user ID information is really critical and session date and user profile and conversation go the user profile and conversation go is uh yeah I I already explained so I will focus on the conversation column and go assessment. So in the conversations column is our main data set. This is our main data set and this is a list of dictionary and within this dictionary we have our four different type of the sub column. The first one is a row. So every conversation containing the um user query and recommendated uh recommendation music and also assistant response. So uh this role is already always one of the this one of three uh role types and second part is a content.
The content is a actual user query or assistance response or recommended track ID. So that is the content part and thought is are very optional things but when we generate the this data set with the large lang model we uh also generate the some dot the dot is intermediate part for the final output. So uh before they generating the response or mu uh or query uh this synthetic data set already containing the thinking process so prefer thinking and give some final output and also we give some number so we can align the uh every turns. Yeah.
And another column is our goal progress assessment. uh this is are some uh optional information and uh this information is relate to the user satisfactions. So uh during the conversation sometimes the user does not want to uh the current recommendation item. So that is the reason why they can can move on the next turn. So we give we designing some uh you user satisfaction columns. So this is our uh yeah optional information but I hope this is useful for you guys.
Yeah. Next column uh next data set is our track DB. So in track DB we have a unique track ids. This is a unique identifier of the each music tracks and we release ISRC is another unique identifier with a global standard and we released some metadatas like a tren name, artist name and addon name and also we list the tag list. Uh this tag list is copy and paste from the LFM2B data set. So this is our uh tag annotation from the user. This is a foxonomy tab and also the main platform is the last effect. So very noisy but very rich information. Yeah. And we also release some ability information and release date and duration and artist and uh album ID information. Yeah.
And this is our another available resource is our uh track vector DB. So uh we also released the vector database.
Uh this is a pre-extracted embedding vector. So we released the audio embedding vectors and embedding vectors and collaborate to fresh embedding vectors and also text embedding vector.
The attribute is our tag list and also rex and also metadata. Um we extract the every music items uh of the the each uh music items each modality embedding vector and we just released uh in hugging face. Yeah.
And in case of the uh user DB uh in we have our user ids and age and age group, country code, country name and gender.
So uh yeah and this user ID is also shared with uh this conversation data set. So if you guys want to uh utilizing previous users listening session then you can filter out this data set with a user ID that these are some way to get uh see the user's previous session. Yeah.
Yeah. And we also released the user user vector DB and uh this is a user indings and we released the basian personalizing linking model. So uh this CFO BPL is a uh mapping with the track DB CFP. This uh this P track DB pair vector is a track emitting vector and this user vector DBS vector is a user emitting vector. So you can do product between the this same columns embedding vector.
Yeah.
So you can uh utilizing the linking model. Yeah.
And next part is our baseline system. So uh our uh main reference of uh baseline system is from the Lexus 2020s tutorial on conversation recommendation system.
So uh in the system um the user there are several uh sub component in conversation recommendation system. The first one is our natural language understanding module and also dialog state management module recommendation engine explanation engine and natural language generation module and also knowledge database.
uh uh compared to the 2020 uh this year is a 2026 so everybody knows a larger language model right so I think natural language understanding module natural language generation module and dialogue management module and explanation module is combined to the larger language model so we designed a larger lang model uh as a covering quite diverse module and recommendation module engine is also we can utilizing the text to item and user to item module. So uh we can utilizing lexis module and external knowledge is our user and item DB. So external knowledge is keep updated. So new users come to our system and new released music is addit to the item. So uh really important thing is our Lexus can handle the uh change of our external knowledge. So yeah that these are also important things. So we just copy and paste uh previous uh uh Lexus tutorial module in modern modern design system.
Yeah.
So let me give inside of our baseline system. So we have our four different type of component. The first one is larger language model. So main law is generate the natural language response.
So that these are really important. So uh we utilizing llama 1B model but I hope you guys uh util use use more more nice linking model because llama 1B is really bad. So and lexis module we improve uh we implement two different type of the um recommendation things. So first one is our BM25 for the text to item and second part is the birth model.
So uh this two model need uh main characteristic is that they can measure the similarity uh measure the uh distance between the user query and target item and also we have our user DB and also item DB. So that is our main characteristic and our baseline system is uh uh only utilizing text to item module. So that is some limitation of baseline system. So what if uh uh what if the uh input data is a query and conversation context our retriever module the the recommendation system module is a p 25 and bird. So they tokenizing everything and they extract the uh sparse index or global embedding for the the input conditions and that product with our uh item DB. So we can uh extract the 20 top 20 thread ids and we just select the top one uh item and that top one item is utilizing for the response generation that is current system. So our current model is a llama 1B model and this one model understand the chat history and current recommendation item they can generate the natural language response. So that is our current baseline system and next part is our evaluation metrics.
So uh as you see before our systems input is a user profile and check context uh and our conversation or recommener recommending the music and the response and recommendation music is uh uh input of lexistic evaluation and response is input of the uh response evaluation. So we can uh scoring the NDCG and diversity and LM score and diversity. So let me introduce one by one. The first evaluation metric is NDCG at 20. So we extract the uh 20 recommend top 20 recommended item and this item is a input of the uh NDCG at 20 measure. So we check the writing quality of the recommendation tracks.
And secondly we measure the track diversity uh actually the catalog diversity. So what if we have some set of uh evaluation data set and we can collect all the recommendated tracks and we measure the set of that track risk and we check the how many diverse track is recommended with this system. Yeah, that is our category diversity. And second uh another part is our lexical diversity. This is not relate to the recommendation, is more related to the response generation. So we want to check the how varied the generated ranges. So we measure the distinct two. So we measure the bram and how many diverse bar is generated from our system that is lexical diversity. And last part is LMS.
We check the quality of the generated response. We utilizing Gemini and some private prompt. We utilizing private prompt. So we can check the personalization explanation quality.
So if you just think about the just 2x two metrics. So we want to measure the quality and diversity in recommendation task and response generation task. So that is our how can we measure and uh so uh and we our the weight function is a little bit uh more focused on the recommendation quality because 0.5 for the NTCG and 0.1 for the category diversity. So actually the recommendation cover the 60% of the total score and in case of the response generation we uh only give the 0.1 weight for the lexical diversity and 0.3 for the LMSR charge. Yeah. And LM SR score is a one to five. So we normal uh we normalize this score to the 0 0 to one u utilizing the minmax normalization.
Yeah, that is our metric and scoring and next part is ah in case of the LMS charge we already check the uh relation between the human preference and this type of the uh LMS charge metric and compared to the reference based metric like a blue score low score and birth score or sentence birth this LMS as our judge is a quite high correlation with the human preference So yeah uh yeah that is our our initial evaluation of LMS.
Yeah that is our end of our evaluation system and next part is our improve the idea. So actually we have uh some tips for how can improve the ideas. So if you go to the our baseline system GitHub repository uh there have some tips. So first idea is add the re ranking model.
So uh currently we only utilizing the text to item module but we do not utilizing user to item module. So utilizing user embeddings and also uh some ranking model embeddings and if you after the this retriever model you utilizing reanker for the this type of system then maybe you can improve the your performance. So I hope you guys utilizing the user embedding and linking mod. Yeah. And another part is improved the item representations or some retrievable module. So currently we only utilizing the BM25 and B module but this is a super weak and actually this BM25 and B does not have any ability of modeling the B tong conversation. So in case of the B module they uh utilizing average pooling over the whole conversation context. So actually the retriever performance is really bad. So I hope you guys more focus on the advanced text embedding or some uh more uh item embeddings for the uh yeah more more item embedding is maybe uh very nice uh approach to improve the system especially if you check the our conversation group then you need to utilizing quite diverse embeddings or diverse lexis model because uh our conversation is not only focus on the metadata or text modality.
So I hope you guys uh think about how can utilizing the multimodal item embeddings for the retriever or recommenumentation system. Yeah, that is another tips and another thing is improve your uh generative uh your recommendation system model. So maybe nowadays semantic ID is a quite promising direction. So maybe not only utilizing the text to item model. Maybe you can utilizing the semantic retriever model or a more advanced approach is quite promising uh direction. So maybe you uh retrain the uh another language model with the quantise the semantic ID.
And maybe uh you can utilize in a generative way. Yeah.
Then these are some tips. And next part is how can participant. So uh participant is mostly easiest things. So uh firstly uh you join the our official Lexis home page uh our challenge homepage and go down and you can find the register now. So click the register now then you can find a very simple Google form. So you can write your team name uh team track institution name and write down the email address. we send a email to you guys and also you can uh you can make a koden username. So after before the register you go to the kod bench and this is our official readable.
So if you want to participant then you can read the order all the things and just join uh just participant this uh Rexis uh codimenion reader and very nice starting point is you utilizing the our baseline system. So baseline and evaluation code is already provided. So especially in the evaluation uh GitHub repository we already extract some uh stock models inference result. So you can just download the inference result and to the colen is nice to starting this challenge. Yeah.
Yeah. That is all the participant part and yeah this are all the today's material and do you guys have any question then feel free to ask me. Yeah.
>> Uh yeah I have one question can I ask?
>> Oh yeah yeah yeah go ahead. Uh yes. So we tried to experiment with the clap embeddings um lately and we were wondering which um type of encoder you used because there are >> three or four of them. I don't remember exactly and we couldn't find this information anywhere else.
>> Yeah, I'm sorry for the confusing and our audio embedding is a lion club. So uh I I will add this information in our the uh website. So so if you go to the lion then they have a several pre-trained weight. So uh we utilizing the lion collab weight of music version.
So this weight is a music specific version and we utilizing this weight for the extracting the lion cl. So >> okay >> thank you so much >> and I will >> okay >> uh one more thing is uh I think I have her some extracting code so I will show some uh audio.
Yeah, this is my uh club embedding extractor code.
Yeah, thanks for the uh question and uh anyone have question about this challenge?
>> Yeah, sorry. I have a question regarding the registration for academic teams since we are an academic team but we did not receive uh like uh any form or anything to prove that we are actually an academic team and we were wondering if uh there were some problems or uh it's fine like that.
So your question is you're you're currently academic teams but you don't have any the probing material that is your question.
>> No we we we have proving material but we did not receive an email or something that uh like to to send it or to like uh ask for proving that we are actually team. Okay. Uh you you your question is the how can we prove the academic team that's your question right yeah okay >> actually we don't have it now because we just collect the you the every user's >> uh participant information that is only to collect and but uh I think that is really important because some nonacademic members apply to the academic teams and the price is there some problem right so maybe uh after the finish this uh lex challenge uh in case of our academic team we will ask them some water ids or some official documentation yeah >> okay thank you >> yeah and I think the some member ra hand the prianka prianka is Can you?
>> Yes. Uh, hi.
>> Yeah. Yeah. Yeah. Hi.
>> Yeah.
Yeah. Hi. So, my question was like, so in the conversation history, we have a track ID given by the recommended assistant. So, can we consider it as a track listening history of a user?
Uh your question is uh how can we got a previous users recent history? That's your question.
>> Yes.
>> Oh yeah. Currently actually we did not release the users uh listening history because there are some uh data set issue. So one uh yeah one way to I recommend it is just to filter out your training data set. So in the training data set uh what if there are sever uh users previously listening history but very limited but you can access uh users history with the filter out your training data set with the user ID.
Uh yeah. So in the conversation uh >> Mhm.
>> sixth sixth column. So >> uh so the um the role music we have this content the track ID. So can we consider it as user history?
>> Mhm.
>> Listening history of the track.
>> Mhm. Look, can uh should we can we consider this track ID uh as a listening history of a user?
>> Yeah, that's right. Because uh this data ah okay okay thanks for clarification.
Yeah uh my answer is yes because uh every this conversation is grounded and based on the user's listening history.
So actually uh yeah thank for the great case. Yeah we have >> okay thank you.
>> Uh I will check the uh test session.
Ah uh you you already left your question and uh actually we we considering release the uh this trade history because uh if what if we release the user's previous uh chat history then you can retrain the your safe model or linking model but we also consider we worry about the data set wage issue. So uh we will keep discuss and uh yeah I'm not sure but maybe we will release the user track uh a user's recent history uh but maybe not so I'm not sure. Yeah.
>> Hi uh I'm Sanjay. Uh so I just have a question.
>> Uh so we have uh train data set dev data set and blind data set. Uh I'm just wondering how will this train be actually be used to train uh because uh uh even in train we have all these uh I think I think eight conversations and in >> and in blind we have seven and we have to predict the last conversation I believe. Uh so wondering how can we use this train uh this would be like directly I mean how can we use this train to uh >> uh because we have around 300 or uh common users between train and dev but uh can you shed some light on this like how can this train be useful for us?
uh >> uh how can your question is how can utilizing train data set for exact training something that's your question >> yeah yeah yeah >> oh yeah I think there are several method because we have two component in our uh baseline system right so first uh first method is fine tune your log model so this my first uh answer so Maybe uh if you utilizing the true calling or generative retriever things then you can fine-tune larger language model for the lexis and another way to utilizing larger language model uh the training data set is uh this llm are more familiar with the recommendation context because large language model is a general purpose model. So maybe not perfectly fit on the recommendation system. So training data set may be useful for the fine tune LLM and second part is updating the Rexis. So >> uh in our data set is a conversational data set. So maybe you can train the another text to item module or user to item module or multimodel recommendation system module. So uh that is main purpose. Yeah. Guard. Thanks. That answers.
>> Yeah. And just one more followup like you have user goal which is like moving.
>> Yeah. C can that be taken as a positive?
If the if the recommendation is good then it's moving towards goal and if it's not moving away from goal would mean that the recommendation was not good. Can it can we take like that?
uh actually not not relate to the that things uh because uh uh I think the uh what if the the does not moving toward goal is uh current recommendation system is not fully satisfied but partially satisfy the goal. So uh yeah that these are things and what if goal accessment is move toward goal is are quite move on to the uh more close to the user's final goal.
So yeah is some sudo pseudo signal of the user satisfaction. Yeah >> thank you.
>> Thank you and I think another message on the chat rooms. Could you share the this presentation talks? Yeah I will do that.
So >> I think >> sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry sorry for the edit I think the David bridge his hand >> yeah thank you >> earlier yeah yeah thank you for the uh waiting edit so after the David I directly ask your I answer your question yeah >> and we were wondering how many submission we have for blind B um period because for blind we have 10 submissions per day while uh >> uh yeah that is a really good question. Uh currently we our first idea is only give some maybe the 10 submission for the blind data set B. But main purpose of blind data set B is generalization ability. So uh we worry about the overfeed on the blind data set B. So uh we will send an email to the submission number of blind data set before the start of the blind data set B. So my answer is not decided yet uh because we more considering the uh generalization ability.
>> Okay, thank you.
Yeah, because uh now nowadays we think about the uh more submission make more overfitting because our current blind data set a readable performance is insanely uh higher than what I expected. So I think uh it's uh I hope to the blind data set B uh represented the generalization ability. Yeah.
>> Okay. Thank you.
>> Thank you. And Edit, can you ask your question?
>> Yep. Uh, can I ask a question please?
>> Oh, sorry. Or like we have an order. So, yeah, sorry.
>> After the edit, I will ask you. Yeah.
>> Yeah. So my question is that um um like here you have like move towards does not move towards uh and those uh hints right. So essentially like are you planning like is that for essentially like training an RL system for like fine-tuning LLM or like is that what so basically like when you were designing this problem what uh like you know what kind of like uh problem solving capability did you think like basically tra fine-tuning LLM is that what you were thinking RL thing.
>> Uh yeah so yeah thanks for the great question and firstly this uh core progress assessment is designed for the is right and >> uh and what we think is this conversation is trajectory for the find uh final music uh recommendation. So actually we think about the uh merit interaction is some trajectory and journey for find relevant music. So this is what we want to design and we think about the user query is sometimes not very spec specified so very hard to find the relevant music. So that is reason why we need a merit ton right. So uh yeah this third uh uh goal progress assessment is related to the that uh trajectory. So we we modeling that some trajectory design. Yeah.
>> Got it. And essentially like uh you know there there is like recall and uh like rec like there is candidate generation and then uh deranking right so so where does the LLM fine-tuning come into picture in this uh in both these steps can you >> uh sorry but your question is uh for uh generating the final response we allow the multi- iteration of the model inference that is your question.
>> No no no like uh when you have to like generate candidates like 20 songs. So if that like in your like 47,000 songs first you have to like uh generate candidates right like let's say 100 songs or thousand songs or something like that or 10,000 songs >> and then you have to rerank them right uh uh like and then submit the top 20 >> right so that is my basic understanding so in that where does the finetuning of LLM come into picture and like What uh step will fine-tuning LLM support in that?
>> Uh your question is are uh is it possible to fine-tune LLM for the reranking?
>> Uh exactly. So like are we doing it for reranking or are we doing it for generic response >> which like for what for which task are we doing reank uh the fine tuning?
M uh I think the LLM rank query is also useful because uh we already give the every track information in train metadata DB right. So maybe your right language model understand the very uh famous music item then they can uh model they can understand the relationship between the user query and the recommended candidate items and LLM understand that every natural language and rerank something is a possible idea and I remember some of the Lexus paper already uh uh study about the LLM reanker so yeah it that is possible possible idea and I think a reanker is not uh quite we we can design the quite diverse reanker so we can using the ranking model for the reanker and we can utilizing the multimodal limiting model for the reanker and we can also utilizing LLM as a reanker so yeah that is our open question I think yeah >> got it so essentially you're saying reanker can be multiple ways but we can use this uh fine-tuning for generating a response like is that >> Yeah. Yeah. Yeah. Yeah. Exactly. So we can we can fine-tune uh every motor. So we can fine tune the LM, we can fine tune the Lexus and we can finetune the L anchor. So that is our uh yeah everything is possible.
>> Oh sh >> Thank you.
And next part is maybe Olak you you have some question.
>> Yep. Yep. Um so the question is u uh it seems that u there is lots of turns in a train data set when uh the goal track doesn't match um the exact user requests. I mean uh the user requests uh exact track or group artist or genre and so on. and the golden track isn't doesn't always match it. Uh so um I'd like to ask you whether the blind A and blind B data sets uh have the same uh you know nature of the data when the requested track from user doesn't match uh their goal track. I mean um there is uh whether there is any recommended system which uh gave that golden track for a user or not cuz um you know we can make a perfect uh match for a user but uh this track wouldn't match the golden track in blind data set. So our recommendation matrix would be low. Um but the quality of the answer and the quality of recommendation uh will be will be high.
Uh like uh and your question is the you say about the some limitation of current data set right? So >> um um the question is whether the blind A and B data sets uh their nature is the same as the nature of train set. So it's like a >> sub yeah >> or or not or the golden tracks are in blind AB data sets always matches >> uh his preference users preferences.
Uh yeah. Uh actually yes. Uh the training code is same. So actually train test blind A blind B is are same distribution because we utilizing same code and same data set. But in case of the blind data set A and B we added one more filtering phase. So the last turn of the blind data set A and B is always the borden track. So uh that means uh we only utilizing the the last turns the core progress assessment is always uh pastic.
So that is are one difference between the training test and blind data set.
Yeah.
>> Thank you.
>> Thank you.
I think there is no uh anyone has more question about the today's session.
Okay, great. So then I think we can stop the session in here. So thanks for the participant and uh I hope everyone uh participant in this challenge and uh the final readable is based on the blind data set B. So the generalization ability is more important than overfitting. So I hope you guys think about the uh participant with the generalization ability that is super important and yeah and see you at the Lexus conference.
Bye.
Related Videos
Agentforce NOW AMA: Build with React and Salesforce Multi-Framework
SalesforceDevs
490 views•2026-05-28
How agent o11y differs from traditional o11y — Phil Hetzel, Braintrust
aiDotEngineer
450 views•2026-05-28
WEB TECHNOLOGIES UNIT-2 | Degree 4th sem BCOM Computers web technologies unit-2 full explanation💯✅
LearnwithSahera
1K views•2026-05-29
More tests are always better? How to use AI to identify tests that bring little value
Alliance4Qualification
335 views•2026-05-29
Search Algorithms Explained in 60 Seconds! 🤖💨
samarthtuliofficial
218 views•2026-06-01
People of Game of Thrones using JavaScript DOM
AltCampus
296 views•2026-05-30
Introduction to Problem Solving Part - 1 | Lecture 1 | Intermediate DSA
ascensionix
107 views•2026-05-29
So What's Odin Lang Even Good For
TechOverTea
131 views•2026-06-01











