Anna masterfully bridges the gap between technical engineering and personal nostalgia by transforming raw memories into a structured, searchable digital legacy. This project serves as a compelling blueprint for how AI can be leveraged to preserve human identity rather than just automate tasks.
Deep Dive
I recorded my entire childhood... and now I'm building an AI to relive it
I have been documenting my life since I was 7 years old. It started with diaries and then video diaries: hours and hours of footage of just me talking to the camera. None of it was ever really meant for anyone's eyes but mine. And for years it's just been sitting there, scattered across a dozen old drives, old laptops, and my parents' family computer. But there was no way in the world I was ever going to watch all of that again, let alone organize it. So that got me thinking: what if I could build an AI to do it for me?
So, as I mentioned, I started physical diaries when I was 7, and then for my 10th birthday I got a camcorder. Eventually I started making a lot of video diaries, because talking about my day and my thoughts out loud was faster than writing them all down. And when the COVID era started, I started making them almost every day. None of them were ever meant to be posted or for anyone's eyes really; literally just a diary, but in video form. When I was bored, I would then spend a lot of time manually editing them to cut out spaces and going through and naming the files with some of the keywords of what I'd talked about, to make it easier to find things. But I never got through most of it, and I definitely don't have time for something like that now, because wading through hours and hours of teenage ramblings that no one was ever meant to watch is not the highest-ROI activity.
But it would be really cool to be able to look back and reflect, kind of like flipping through a diary, which is easier to do than watching videos. And that got me thinking that perhaps we could go a step further and make it even cooler. With things like AI summaries on YouTube getting so good, and seeing how well AI in general understands context and finds connections and patterns in data, I became curious about what AI could do with this much unstructured personal video data.
Cuz they are an absolute treasure trove of information, and as someone who is very excited about tech and data, I was excited to see what I could get my hands on. On top of that, I decided to challenge myself and see if I could do it 100% locally and privately, because I recently started self-hosting a bunch of things on an old desktop tower that we turned into a Linux server: things like n8n, Immich, and Jellyfin. So I wanted to see if I could build this project with a similar philosophy.
The first step, in true data science fashion, was gathering and cleaning the data. At this point, the videos were scattered across thumb drives, SSDs, SD cards, and multiple old laptops and computers. I even made a trip to my parents' house over spring break to rescue some stray videos from old laptops and the family computer.
Next, I needed to consolidate them all into one place. I recently bought an 8 TB hard disk drive, mainly for YouTube stuff, but it had space for a second purpose. So, I decided to gather all the videos onto there.
But even with all the videos in one place, a lot of them had inconsistent metadata. Not only did they have varied naming, organization structures, and even different file types, but many files actually had the wrong creation dates. That was mainly because either I had edited the video and the date was the export date instead of the actual film date, or I had filmed it on my phone and, for some reason, when I put it on the computer, the creation date was set to the day the transfer occurred, not the day it was filmed. The way I tried to deal with this in the past, as it was happening, was a naming convention that had the real date embedded in the file name. So, I tried to carry that forward: I wrote a script to rename files with a truly standardized naming convention and treat existing file name dates as the source of truth to fix bad metadata.
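The idea of treating the date in the file name as the source of truth can be sketched in a few lines of Python. This is only an illustration, not the script from the video: the video does not show the naming convention, so the `YYYY-MM-DD` prefix assumed here is hypothetical, and so are the function names `date_from_name` and `fix_modified_time`. Note also that `os.utime` sets the access and modification times; true creation time is platform-specific and is not touched by this sketch.

```python
import os
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# ASSUMPTION: file names start with the filming date as YYYY-MM-DD.
# The real convention is not shown in the video; adjust the pattern to match.
DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def date_from_name(name: str) -> Optional[datetime]:
    """Return the date embedded at the start of a file name, or None."""
    match = DATE_PREFIX.match(name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # Noon local time keeps the calendar date stable when read back.
        return datetime(year, month, day, 12, 0)
    except ValueError:
        # Digits matched the pattern but are not a real calendar date.
        return None


def fix_modified_time(path: Path) -> bool:
    """Set a file's access/modification time from the date in its name.

    Returns True if the timestamp was updated, False if no date was found.
    """
    filmed = date_from_name(path.name)
    if filmed is None:
        return False
    stamp = filmed.timestamp()
    os.utime(path, (stamp, stamp))
    return True
```

Files without a recognizable date are left alone and reported with `False`, so they can be reviewed by hand rather than silently given a wrong date.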
After all, I now know you can change any piece of metadata about a file, if by no other method, then at least through the command line. So, once I had the files relatively cleaned up, it was time to move on to the next step and start transcribing them. For this, I used Whisper, which is OpenAI's open-source speech recognition model, and true to my challenge, I ran this entirely locally.
But, I did start running into issues right away, and the first one was I started getting a lot of hallucinations.
Whisper invented content during quiet moments when I left the room, and it also added ghost timestamps after the actual video ended. Sometimes those were not even in English; it was just random gibberish, not even in the Latin alphabet. One of the things that helped was setting the language explicitly to English, which dramatically reduced mid-video hallucinations. But an interesting side effect was that it started translating any non-English audio into English in the transcription, like the literal meaning of what was being said.
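A rough sketch of both mitigations in Python follows: pinning the language, and dropping segments timestamped past the end of the video. It assumes the open-source `openai-whisper` package, whose `transcribe` method accepts a `language` option and returns a dictionary with a `segments` list containing `start`, `end`, and `text`. The function names, the `"base"` model size, and the way the video duration is obtained (for example from `ffprobe`) are assumptions made for illustration, not details from the video.

```python
from typing import Any, Dict, List

Segment = Dict[str, Any]


def transcribe_english(video_path: str, model_name: str = "base") -> List[Segment]:
    """Transcribe a file locally with the language pinned to English.

    The import is deferred so the rest of this module works even where
    the whisper package is not installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(video_path, language="en")
    return result["segments"]


def clamp_segments(segments: List[Segment], duration: float) -> List[Segment]:
    """Drop segments that start at or after the end of the video and
    trim any segment that runs past it. The input list is not modified.

    `duration` is the video length in seconds, obtained elsewhere.
    """
    kept: List[Segment] = []
    for segment in segments:
        if segment["start"] >= duration:
            continue  # ghost segment: begins after the video has ended
        kept.append({**segment, "end": min(segment["end"], duration)})
    return kept
```

Keeping the clamping step as a pure function over the segment list means it can be checked on hand-written segments without loading a model.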
As for the ghost timestamps, I added a step in Python to clamp the transcript segments to the length of the video and then discard any that were generated for timestamps that don't exist.
Another issue I ran into with Whisper is that it kept duplicating phrases. I didn't notice this at first because I genuinely repeated myself on camera a lot, but in some places the same phrase was repeated way more times than it actually was in the video. I dug into Whisper's GitHub discussions and found it was actually a known bug, and there were some community workarounds documented there. Most notably, tweaking the compression ratio threshold seemed to make the results a lot better. The transcriptions were still far from perfect, but it was already a great start for our purposes.
From there, I started building out the actual interface through which I would access all of the features that I would build. For the back end, I used FastAPI with SQLite for the database. SQLite is perfect for a self-hosted single-user app, so there was no need to over-engineer the database layer. And then for the front end, I made a basic UI with Next.js. Next.js is not a framework I have used extensively.
Normally, I use Flutter for pretty much everything. However, even though I have made many web apps with Flutter, I don't love the default Material Design that it uses. I wanted something clean and modern out of the box, and I didn't want to spend too much time fighting with the UI. Also, I wanted to step outside my comfort zone and try learning something new. And then for the UI design itself, I used Mobbin for inspiration and references. Oh my goodness, best website ever. It has a bunch of screenshots of actual real apps and their interfaces, so it's amazing for reference.
As I was building the UI, I found a problem with the video playback. Most of the video diaries were in MTS format, which is not natively playable by the browser. So, when my back end was serving those videos as is, the browser wasn't able to play them at all.
Initially, I thought I'd have to transcode all of the files into a browser-compatible format, but it turns out I didn't quite have to. I could just remux them using FFmpeg. What that means has to do with video file science, I guess. My camera recorded in AVCHD format, which stores H.264 video inside an MTS file. Browsers can't play MTS files. However, they can play MP4 files, which also commonly use the H.264 codec. So, because the codec was already fine, I didn't have to transcode the video. I was able to just use FFmpeg to repackage it into an MP4 container, or file. That meant the conversion was very fast and the video itself was completely lossless. The only thing that changed was the audio format, which I converted to AAC for compatibility.
Once those videos were actually playing, I felt pretty happy with the beginnings of a UI, so I turned my attention back to the data. Another issue I had with the data is that there were often videos that were recorded back-to-back in one sitting but saved as separate files, even though logically they're one video. As part of the naming convention I had previously tried to use, I would write a title for the video after the standardized date, and then use the exact same name with just a number after it for videos like this that belong together. For the files that had this, I wrote a script to find those videos and then use the FFmpeg concat demuxer to join the separate videos into one. The concat demuxer is basically just video editing, but with code.
However, a lot of the videos, especially the later ones, did not have any titles or keywords at all, but still had the same problem of being recorded all in one sitting and really needing to be one video. I have an idea for how to automatically identify and process those, but it will have to wait until a little further into the project, once we have more of the actual AI architecture built out that'll be able to identify them.
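For reference, the two FFmpeg operations described above, remuxing and joining with the concat demuxer, can be sketched as command builders in Python. This is an illustration under assumptions, not the pipeline from the video: the helper names are hypothetical, and the video does not say in which order the two steps were applied. The FFmpeg options themselves are standard: `-c:v copy` copies the video stream without re-encoding, `-c:a aac` re-encodes the audio to AAC, and `-f concat -safe 0 -i list.txt -c copy` joins the files named in a list file.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def remux_command(src: PathLike, dst: PathLike) -> List[str]:
    """Repackage a file into a new container: copy video, convert audio to AAC."""
    return ["ffmpeg", "-i", str(src), "-c:v", "copy", "-c:a", "aac", str(dst)]


def concat_list(parts: Iterable[PathLike]) -> str:
    """Build the text of a concat-demuxer list file, one `file '...'` line per part."""
    lines = []
    for part in parts:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(part).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: PathLike, dst: PathLike) -> List[str]:
    """Join the files named in list_file without re-encoding any stream."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", str(list_file), "-c", "copy", str(dst),
    ]


def run(command: List[str]) -> None:
    """Execute a command and raise if FFmpeg reports an error."""
    subprocess.run(command, check=True)
```

In use, the output of `concat_list` would be written to a text file, and that file's path passed to `concat_command`. Stream copying in the concat step generally only works when the parts share the same codecs and parameters, which tends to hold for clips recorded back-to-back on the same camera.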
Speaking of AI architecture, if you're looking for a structured way to learn AI engineering, DataCamp offers some excellent hands-on courses.
If you're in a spot where you can write Python, you're comfortable with basic AI concepts, but there's still a gap between using AI tools and building projects that actually use AI, that's exactly what DataCamp's Associate AI Engineer tracks are designed to bridge.
I really like that it's hands-on and project-based rather than just video lectures. The Associate AI Engineer for Developers track covers practical AI engineering, so working with models, building pipelines, writing production-ready AI code: exactly the kind of skills you would need for a project like this. And if you're more interested in going a little bit deeper and learning how to train and fine-tune your own models, there's also the Associate AI Engineer for Data Scientists track. And these aren't just random courses thrown together. They're structured paths that end with a real certification. And AI engineering is one of the fastest-growing roles right now, so getting ahead of it is worth the investment on top of, in my opinion, just being really cool. So, the link for that is in the description, and thank you to DataCamp for sponsoring this video.
So, at this point in the project, I have the data consolidated and standardized. I have the transcription pipeline running locally. I have a basic web UI working, and I have the FFmpeg pipeline for remuxing and video concatenation if needed. In the next video, we'll talk about building out the actual AI architecture.
So, embeddings, semantic search, summaries, all the cool stuff. If this is your first time here, I'm a computer science student sharing my journey to becoming a software engineer here on this channel. So, if that's interesting to you, feel free to subscribe and stick around.