This video masterfully dismantles the "junk DNA" fallacy by revealing the functional complexity hidden within non-canonical sequences. It serves as a sharp reminder that our current genomic maps are merely a simplified sketch of a far more intricate molecular reality.
Deep Dive
What even is a DARK PROTEIN?
The great astronomers Vera Rubin and Kent Ford noticed something peculiar when studying the rotation of galaxies.
Stars far away from the center of galaxies appear to be moving so fast that if our understanding of physics is correct, they should be flung away from the spiral. But instead of galaxies spiraling away into nothingness, these galaxies remained intact, as if there was extra mass holding galaxies together. This observation was a key piece of evidence that led to broader acceptance of the idea of dark matter, an invisible, difficult to study, but almost certainly real form of matter.
What if biology had its own dark matter?
Its own invisible, difficult to study, but almost certainly real molecules.
Introducing the dark proteome, which houses dark proteins: proteins with cryptic function and an even more cryptic origin. But to understand the dark proteome, we must first understand the genome. Our genome is a code composed of very long strings of A's, G's, C's, and T's, the nucleotides of DNA.
Within those strings are regions of DNA that contain the blueprints for useful biologics like functional RNAs and proteins. You know these regions as genes. And since the start of the Human Genome Project, humanity has uncovered thousands and thousands of genes that map directly onto proteins. Huge leaps and strides have been made in just the last two decades decoding the puzzle that makes us us. But as biology collided with informatics, weird glitches started to appear in the data.
Researchers were beginning to report the existence of proteins that shouldn't exist at all. Because we can read the code of canonical genes, we can predict what every gene's protein should look like. But as more and more biological data comes pouring in, it is becoming increasingly obvious that there is a subset of proteins whose existence cannot be mapped onto the sequences we have identified as genes. Our cells are making proteins that are not accounted for in our established genomic libraries. Much like dark matter, the existence of dark proteins forces our understanding of the standard rules of biology to evolve. In this video, we're going to talk about how researchers found these dark proteins, what it means for our understanding of biology and what functions, if any, these dark proteins actually have. This is a weird one. So, strap in. Ready? Then let's begin.
To understand the dark proteome, we first have to understand the central dogma.
The central dogma is a fundamental framework for understanding the direction of information flow within biology. It states that information flows from DNA to RNA, then to protein. And while there is more and more research coming out these days that makes this rule of biology seem more like a helpful guideline, the central dogma still holds solid for the overwhelming majority of biology. DNA, the hard drive of biology, contains all of the data required to build all of the biologics that make you you. There are special regions of the DNA called genes that specifically code for useful biomolecules like RNA and protein. An enzyme called RNA polymerase can read those code snippets to make RNA. A lot of RNAs are actually immediately functional. There are so many different types of functional RNA that it's getting kind of hard to remember them all. But for the central dogma, what we care about is messenger RNA, or mRNA. After being made, mRNA is shuttled out of the nucleus and to the ribosome, where it can then be read to make protein. Every three bases in mRNA code for a specific amino acid. To sum up, DNA contains the data, mRNA is the readable form of that data, and the ribosome is able to read that mRNA and convert it into protein.
We know from the intro that some dark proteins are, by definition, proteins that do not have a canonical gene origin. So, how were researchers able to find these dark proteins to begin with? To answer that question, we need to have a quick history lesson, because I need to geek out about how incredible the lineage of scientists responsible for this breakthrough is.
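For anyone following along at a keyboard, the DNA to mRNA to protein flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real pipeline works: the names `transcribe` and `translate` and the example sequence are made up for this sketch, the input is assumed to be the coding strand of the DNA, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, laid out in the conventional U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> mRNA. Assumes the coding strand is given, so T simply becomes U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein. Reads three bases at a time and stops at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: start codon, one alanine codon, stop codon.
toy_gene = "ATGGCCTAA"
toy_mrna = transcribe(toy_gene)   # "AUGGCCUAA"
toy_protein = translate(toy_mrna) # "MA" (methionine, alanine)
```

Real cells add many steps this sketch ignores (splicing, export from the nucleus, finding the right start site), but the one-way flow of information is the same.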
I'm going to outline the contribution of each of these scientists to the technology that would enable the discovery of the dark proteome, then tie it all together. So, don't worry if this seems a little confusing at first. This story starts off with two titans of molecular biology, Joan Steitz and Marilyn Kozak. If you've taken a biology course in college, even if it's just at an introductory level, I can guarantee you that their work has wormed its way into your textbook. Joan Steitz set the foundation for understanding the mechanics of how one gene can give rise to multiple proteins through the action of splicing. She is one of the pioneers of understanding the molecular machine that conducts splicing, the spliceosome.
Together with a scientist named Lerner, Steitz identified and characterized the delightfully named snRNPs (pronounced "snurps"), which are the building blocks of the spliceosome. I gasped while reading her Wikipedia page when I realized that the queen of the snurps herself has her fingerprints in this story. Marilyn Kozak's name is literally printed in your biology textbooks, as she was the one who discovered that the conversion between mRNA and protein in eukaryotes begins at a site called the Kozak consensus sequence. You may have heard of the start codon, a sequence of A, U, and G, which denotes the actual start of the protein coding region of mRNA. But Kozak discovered that it's not just that AUG that is important. The ribosome needs a larger handle to grab onto before making protein: a whole sequence that ensures enough binding surface area for proper protein translation to start. And I just wanted to squeeze this in here, but Steitz was trained by Watson, of Watson and Crick fame. So, it's pretty cool to see that the roots of this story go so far back to the very classic days of molecular biology.
While the work of Kozak is also important, I'm going to focus on Steitz for this video. The work of Steitz showed that when the ribosome binds to mRNA, a fragment of that mRNA is completely encased within the ribosome, protected from anything that might be happening on the outside. It would take nearly 40 years for Steitz's work to serve as the inspiration for Nicholas Ingolia and Jonathan Weissman to create the high-throughput technique of ribosome profiling, also known as ribosome footprinting or Ribo-seq.
Ribosome profiling is a technique that answers a simple question. What proteins is a cell actually making? If you could freeze a cell in time and sample all of its protein production factories, what exactly is currently being produced?
Ingolia and Weissman took a look at protein translation and realized they could use Steitz's finding that the ribosome protects mRNA where it binds to develop a method to sample every actively produced protein in the cell. I could tell you how Ingolia and Weissman invented ribosome profiling, but I think it's a lot more fun to introduce a new and hopefully repeating segment called You Be the Scientist: the game that lets everybody at home, including the 50% of my audience who doesn't have an advanced biology education, put themselves in the shoes of scientific giants. Ingolia and Weissman took a look at the fact that the ribosome protects certain fragments of mRNA by wrapping fully around them. How could you take advantage of that information to find out what proteins every ribosome is currently making? If you're not a biologist, don't worry. Don't think about what is practically possible. Let your imagination run wild. Go ahead and make your prediction in the comments below.
Ready for the answer? Ingolia and Weissman realized that if you could somehow find a way to destroy all the exposed RNA in the sample, only the fragments of RNA stored within the ribosome would survive. Then, if you could somehow free those fragments later, you could use sequencing technology to determine the identity of every mRNA fragment that was stored inside a ribosome. From there, you could decode what genes those mRNAs came from, as well as the proteins they would eventually become. There was a bit of a problem, though. While this idea works in theory, what's stopping ribosomes from just slipping off of the mRNA while it's being destroyed? Ingolia and Weissman needed a way to freeze the ribosome in place. And, I assume after a fair bit of researching, they found their perfect reagent, cycloheximide.
Cycloheximide is a fungicide produced by the bacterium Streptomyces griseus, the same bacterium that gave us the antibiotic streptomycin. So the plan was pretty straightforward. Treat the cells with cycloheximide, which freezes ribosomes in place. Break open the cells. Add RNase to destroy all of the RNA not protected by the ribosome. Break open the ribosome with a chemical reagent called EDTA. Then sequence the remaining fragments.
I had a lot of positive feedback on my abiogenesis video when I got on my soapbox for epistemology. So I'll get up on another soapbox here, this time around the concept of elegance. Ribosome profiling is a very elegant solution to a very interesting problem. To me, elegance in science is when a solution is so simple it makes you wish you thought of it first. You don't need a background in theoretical quantum non-Euclidean hyperdimensional electrical engineering to see the solution yourself. Once you have Steitz's observation that the ribosome protects RNA, you just need a bit of intro-level biology and chemistry, a clear thought process, and the luck that the tools to make your plan happen exist somewhere. I've had a couple of viewers in stream mention that they were dumb, perhaps even jokingly, but I really, really don't believe that. The weird thing about the mind is that even if you know you're joking, that still does something to normalize the thought that you are dumb. You're sitting in a video breaking down cutting-edge biology in depth.
You're probably not dumb. And if you were able to put together that Ingolia and Weissman needed to destroy the non-ribosome-bound RNA, congrats. The eureka moment that happened in your brain is the same eureka moment that happened in their brains. And if you didn't get it, don't worry. I think a young me would have missed this, too.
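If you like seeing ideas as code, the freeze, digest, and sequence logic of ribosome profiling can be sketched as a small simulation. Everything here is a made-up illustration, not the actual lab protocol or analysis software: `digest_unprotected`, `map_footprints`, the two toy transcripts, and the ribosome positions are all hypothetical, and the default footprint length of about 30 bases is an assumption chosen for the sketch.

```python
def digest_unprotected(mrna: str, ribosome_starts: list[int],
                       footprint_len: int = 30) -> list[str]:
    """Simulate the RNase step: keep only bases covered by a frozen ribosome.

    ribosome_starts are the positions where each (hypothetical) ribosome's
    protected window begins. Overlapping windows merge into one fragment.
    """
    protected = [False] * len(mrna)
    for start in ribosome_starts:
        for i in range(start, min(start + footprint_len, len(mrna))):
            protected[i] = True

    fragments, current = [], []
    for base, is_protected in zip(mrna, protected):
        if is_protected:
            current.append(base)
        elif current:
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


def map_footprints(footprints: list[str],
                   transcripts: dict[str, str]) -> dict[str, int]:
    """Simulate the sequencing readout: count footprints per source transcript."""
    hits: dict[str, int] = {}
    for footprint in footprints:
        for name, sequence in transcripts.items():
            if footprint in sequence:
                hits[name] = hits.get(name, 0) + 1
    return hits


# Toy transcripts (invented sequences), each with one ribosome frozen on it.
transcripts = {
    "canonical_gene": "AUGGCCAAAGGGUUUCCCUAA",
    "presumed_noncoding": "CCGAUGUGCGAUCAUUAGGC",
}
footprints = []
for sequence in transcripts.values():
    footprints += digest_unprotected(sequence, ribosome_starts=[3], footprint_len=8)

hits = map_footprints(footprints, transcripts)
# Both transcripts show up, including the one assumed to be non-coding.
```

The point of the toy is the last line: the readout does not care what a transcript was labeled as, only whether a ribosome was sitting on it.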
Ribosome profiling was not the breakthrough required for the discovery of the very first dark proteins; that honor goes to weird glitches found in yeast biology many years prior. But it was instrumental in revealing the fact that dark proteins are not uncommon in biology. Ribosome profiling showed that ribosomes were binding RNAs that scientists thought were non-coding. And ribosomes don't typically bind to RNAs by accident, meaning there was a potential plethora of proteins coming from RNAs that themselves were coming from DNA that was not suspected to be protein coding. But before we get into the true nature of the dark proteome, if you're enjoying the video so far, consider liking and subscribing.
And if you're really enjoying the channel, I've got merch, YouTube, and Patreon memberships. Thank you for supporting the channel. And now, back to the dark proteome.
If ribosome profiling revealed that ribosomes were making proteins off of non-coding RNAs, then that must mean our definition of what a protein coding region is must be incorrect. Before I can explain to you where dark proteins come from, we first must look at the conventional definition of a protein coding region, an ORF, which stands for open reading frame. There are a couple of rules an ORF needs to follow to be considered valid. First, it must have an AUG start codon. It must also have a stop codon, which denotes the end of a protein coding region. The amount of DNA in an ORF should also correspond to at least 100 amino acids in length. And ORFs should not overlap each other. If you're a modern biologist, you might find that these rules feel a little restrictive. And you'd be right, because non-canonical ORFs, the source of data that codes for dark proteins, dance all over these rules.
Before we get into that though, we first need to understand the logic of why these rules existed in the first place. In eukaryotes, AUG is the start codon. It doesn't really make sense looking for protein coding regions that don't begin with AUG. The size restriction for protein coding regions is also very logical. If you had a random string of DNA, the average space between a start and stop codon would be roughly 21 codons long, because 3 of the 64 possible codons are stop codons. Meaning the average protein size would be 21 amino acids. And you can't really build biology on only small proteins. The longer the distance between the start and stop of a protein coding region, the more likely it is to actually code for a useful protein and not a tiny piece of spaghetti slop. The final rule on overlapping ORFs is pretty much an extension of the previous rule on size.
AUG is the start codon, but it can also code for a plain old methionine. If you see an AUG near a stop codon, it is likely that AUG is not acting as a start codon for a teeny-tiny protein, but rather as a plain old methionine that is part of a larger ORF. A non-canonical ORF violates these extremely reasonable rules for finding protein coding regions of DNA. A lot of these non-canonical ORFs are actually hidden in plain sight. If we do away with our restrictions on AUG as the canonical start codon, minimum size, and rules prohibiting overlap, we actually find tons of potentially legal protein coding regions hiding in places like the untranslated regions before and after canonical genes. We find them in regions between genes that were thought to be non-coding, potentially missed due to their tiny size. We can even find genes embedded within other genes. We've actually seen this phenomenon before in our influenza video. Influenza has a gene called PB1, but within that gene is another gene hidden inside of it. PB1-F2 encodes an entirely different protein because the reading frame has been shifted, and it would have been missed by genomics projects only focused on finding canonical genes. These original rules were not arbitrary. These tiny proteins, dubbed mini proteins, even if they did exist, would prove to be incredibly difficult to isolate and study. And back in the day, they probably didn't want a whole bunch of potential junk to gunk up their data.
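To make the canonical versus non-canonical ORF rules concrete, here is a minimal Python sketch of an ORF scanner with the rules exposed as knobs. It is an illustration under stated assumptions, not a real gene-finding tool: `find_orfs`, its parameters, and the toy mRNA are invented for this example, the toy uses a tiny size cutoff in place of the 100 amino acid rule, and the "roughly 21 codons" figure assumes completely random sequence where every codon is equally likely.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# In random sequence, 3 of the 64 codons are stops, so a stop shows up on
# average once every 64 / 3 codons (about 21), matching the size argument.
EXPECTED_RANDOM_ORF_CODONS = 64 / 3


def find_orfs(mrna, starts=("AUG",), min_codons=100, allow_overlap=False):
    """Return (start, end, n_codons) tuples for ORFs found in any frame.

    end is exclusive and includes the stop codon; n_codons counts the start
    codon but not the stop. The canonical rules correspond to the defaults;
    loosening the arguments gives a relaxed, non-canonical style search.
    """
    candidates = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] not in starts:
            continue
        # Walk codon by codon in this start's frame until the first stop.
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                n_codons = (j - i) // 3
                if n_codons >= min_codons:
                    candidates.append((i, j + 3, n_codons))
                break

    if allow_overlap:
        return candidates

    # No-overlap rule: keep the longest ORFs first, drop anything that
    # overlaps an ORF already kept.
    kept = []
    for orf in sorted(candidates, key=lambda o: -o[2]):
        if all(orf[1] <= k[0] or orf[0] >= k[1] for k in kept):
            kept.append(orf)
    return sorted(kept)


# Toy mRNA (invented): a 2-codon upstream ORF in a shifted frame that
# overlaps the start of a 5-codon "main" ORF.
toy_mrna = "AUGCC" + "AUGAAAGCUGCUGCUUAA"

strict = find_orfs(toy_mrna, min_codons=4)                        # [(5, 23, 5)]
relaxed = find_orfs(toy_mrna, min_codons=1, allow_overlap=True)   # [(0, 9, 2), (5, 23, 5)]
```

Under the strict settings only the main ORF is reported. Relaxing the size and overlap rules is what lets the tiny upstream ORF appear, which is the same trade-off described above: more hidden genes, but also more potential junk.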
But recent research is now showing that these hypothetical hard-to-study dark mini proteins might actually be kind of important. Like really important. 5% of the genome of the eukaryote Saccharomyces cerevisiae, the brewer's yeast, contains these non-canonical ORFs. And many of these are important for the yeast's survival. This also holds true for the multicellular Drosophila melanogaster, or fruit fly. Four non-canonical ORFs were shown to code for proteins that, when broken, caused fly embryos to become nonviable. All four were involved in regulating the actin cytoskeleton of the dividing embryo. You don't even have to look too far from our own biology, as the protein myoregulin is one such dark protein. This teeny-tiny 46 amino acid mini protein interacts with the calcium pumps that actively remove calcium from the cellular cytosol and store it inside the sarcoplasmic reticulum, enabling muscle relaxation. And it comes from an RNA that people assumed was non-coding. An example of a dark protein being brought up to the light.
What weirds me out is the fact that these dark proteins might be playing a really big role in cancer biology as well. The connection isn't obvious, but there are cancers like medulloblastomas that rely on dark proteins like ASDURF in order to survive. ASDURF is a dark mini protein that makes up a part of a larger protein complex, the PAQosome. The PAQosome is a molecular chaperone, meaning that it's a protein complex that helps other proteins fold. And I'm going to be so real here. Understanding ASDURF's role in cancer is pretty complicated.
From what I could understand, ASDURF may be shifting protein production towards proteins that are beneficial for cancer survival. What's interesting to note is that this means that the connection between the transcriptome, the set of mRNAs made by the cell, and the proteome, the set of proteins made by the cell, might be actively corrupted by the presence of ASDURF, which the cancer depends on. How cancers evolved to be dependent on these dark proteins is an interesting problem. One potential explanation for how cancers grew to rely on some of these dark proteins is through certain stress response pathways. Now, I'm not trying to get you to feel bad for cancer, but being cancer is stressful. A cancer cell's entire life is spent growing as fast and recklessly as possible, actively ignoring safety features and trying to dodge the immune system. It is known that certain stress responses can cause the process of protein translation to go out of whack. And one of those consequences is having ribosomes get redirected to five prime untranslated regions of genes, portions of non-coding sequence upstream of canonical gene sites.
Remember that these untranslated regions can still house these dark genes within them. Meaning cancer cells have their ribosomes redirected from canonical, I guess, light genes towards these mysterious dark genes. Just like with dark matter, these dark proteins are difficult to study. They're often really small, and it's hard to know which hypothetical dark genes actually make real protein. Cells don't usually make a whole lot of them. But myoregulin and ASDURF prove that if humanity is going to remain unrelenting in its battle against cancer, the dark proteome must be illuminated. That's where this paper comes in. It's actually really hard to identify whether a dark protein is real or not. There also isn't really a centralized resource for researchers to mine information about these dark proteins. Where this paper comes in is trying to legitimize the study of dark proteins by providing a standardized way to prove that certain dark proteins exist and stuffing them all into an accessible database. Remember, ribosome profiling only gives you what mRNA sequences ribosomes are actively binding to. But until you see the protein itself, it's all just speculation.
Fortunately, this is a problem that one of the GOATs of molecular biology can solve: the mass spectrometer. This temperamental but extremely powerful machine can essentially tell you the identity of every protein in a complex mixed sample. But where conventional mass spec struggles is with proteins that are already very, very tiny. They kind of get lost in the data. So the researchers here had to think outside the box. This massive collaboration set to work on coaxing out these small proteins for identification, and they were able to exploit a feature of the biology of the immune system to do so.
Get this. These researchers used my favorite protein, the major histocompatibility complex class I, to hand-deliver them dark proteins. Let me explain. MHC class I molecules are a class of proteins whose job it is to display fragments of internal proteins on the surface of a cell's membrane. If the protein fragment MHC displays is from the body, patrolling T cells leave that cell alone. If that protein fragment is from a foreign invader, like a virus, that T cell will mount an immune response and destroy the offending cell. The great thing about MHC class I is that if you just tease out the short peptides that MHC is displaying, you can get a sense of what proteins are being made inside of the cell. Only looking at MHC class I peptides means that you can ignore every other protein inside of the cell. By reducing the complexity and reducing the noise of the sample, these researchers are able to give mass spec the biggest chance of finding their teeny-tiny dark proteins. Across nearly 100,000 different proteomics experiments, a cohort of about 7,000 non-canonical ORFs, potential dark proteins, were identified. And by using this neat little MHC peptide display trick, 1,785 of those hypothetical dark proteins were found to be real. And I say this with all of the excitement in my heart: I don't think anybody has any idea what these little glitches do. The world of dark proteins is so incredibly cool, and I am shivering in my timbers, awaiting what researchers will find in the shadow of the genome.
Wow, what a wonderfully composed final sentence to end this video off. Surely nothing bad will happen to me on the day that I'm planning to publish this. Okay, so it's 8 a.m. on this fine Friday, and while double-checking some sources and making a few corrections, I came across this paper, which states that the dark proteome refers to proteins that definitely exist but have an unknown function. What we're talking about, according to these guys, is the ghost proteome.
Well, the paper that inspired this video refers to these non-canonical proteins as the dark proteome, as do all of the press releases surrounding its publication. What you're seeing here in real time is a linguistic battle for a scientific definition, and I kind of hate it because ghost prote... recording everything again. Bye. Y'all going to make me lose my mind. Up in here, up in here. Y'all going to make me go all out.
Up in here. Up in here. Y'all going to make me act a fool. Up in here. Up in here. Y'all going to make me lose my cool. Up in here. Up in here.