
Program Uniqueness the Easy Way | Rethinking Python Part 2
Added: 2026-05-25

110 views · 9 · 13:54 · dutctv · Original release: 2026-05-20

In CPython, the built-in id() function returns the memory address of an object. This is a point-in-time unique identifier: the address can be reused after the object is garbage collected, so storing object IDs for later comparison is unsafe, because the same ID may refer to different objects over time. Equality (==) is a transient property that can change, whereas identity (is) is permanent until a name is rebound, and there is no strict relationship between the two concepts.

[00:00:06][music] >> Now, there are a couple of qualifications on uniqueness.

[00:00:15]I first want to tell you what the unique identifier for an object actually is in CPython to the current day.

[00:00:25]When we ask for the id of an object, or for the id of a name, in Python, what we're actually going to get is the memory address of the underlying PyObject pointer that the name is set to. And so here you can see some code from the CPython interpreter where the id return value is really just: take whatever the pointer is for the current PyObject and give me a numerical value, give me a long integer value for this.

[00:00:50]And that's all this is.

[00:00:52]This is not necessarily the case in other Python interpreters. If you happen to use IronPython, you may discover that your ID is just some kind of monotonic counter. If you happen to use PyPy, it could be something else entirely.

[00:01:05]However, most of us are using CPython and as a consequence for the most part the ID of the object is actually just a memory address where that object lies.
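As a rough illustration of the claim above, here is a minimal sketch that treats the value of id() as an address and reads the object back through ctypes. This relies on CPython behavior only; the variable names are made up for the example, and the cast is only safe while the object is still alive.

```python
import ctypes

obj = ["some", "list"]
address = id(obj)  # in CPython: the memory address of the underlying PyObject

# CPython-only trick: reinterpret the integer as a PyObject pointer and
# read the object back. Only safe while `obj` is still alive.
recovered = ctypes.cast(address, ctypes.py_object).value

print(hex(address))
print(recovered is obj)
```

On other interpreters the integer may not be an address at all, so this should be read as a demonstration rather than something to rely on.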

[00:01:13]There's not a whole lot that we can do with that though and you'll see something very interesting here.

[00:01:19]I participated in the initial development of this feature of Python, the addition of audit hooks, and you can see somebody has added an audit hook to our built-in id() to trigger a check: wait, why is somebody calling id() on this object? The likely reason is that if we can figure out the memory address of an object, this can lead to some bit-twiddling tricks that we might or might not want our end users to do. They may be able to break out of a sandbox or out of some sort of code-signing mechanism this way. An example of that would be using the id of an object, knowing that it's the memory address: we can have NumPy create an ndarray, and we can grow that ndarray across the space between where it was first allocated and where the object is. We can use numpy.lib.stride_tricks to do a little bit of C magic here, and then we can mutate things that we shouldn't be able to mutate. This is a classic example of what you could do if you knew that the id was actually the memory address. You could, for example, mutate a tuple, which is otherwise supposed to be immutable. However, you're not supposed to do this. You shouldn't do this. And if somebody had something connected to their audit hook, they'd be able to catch this and tell you, "Hold on a second. Don't do this." It's kind of clever. It mostly works. But for the most part, this isn't anything that we really want to do in real code.
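The audit hook mentioned above can be observed from Python itself. The following is a small sketch, assuming CPython 3.8 or later, where calling id() raises the "builtins.id" audit event; the hook name and the list used to record calls are hypothetical.

```python
import sys

id_calls = []  # hypothetical log of every id() result the hook observes

def watch_id(event, args):
    # CPython raises "builtins.id" each time id() is called;
    # args[0] is the integer that id() is about to return.
    if event == "builtins.id":
        id_calls.append(args[0])

sys.addaudithook(watch_id)  # note: audit hooks cannot be removed once added

data = (1, 2, 3)
address = id(data)
print(address in id_calls)
```

A real deployment would more likely log or raise from the hook than append to a list; this only shows that the call is visible.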

[00:02:35]Now, one consequence of the id being the memory address in CPython is, well, memory's finite, but our program may run for a very long time, and we can't afford to never reuse a location in memory, because then we'd easily run out of memory if our program does a lot of data analysis or a lot of whatever kind of processing. And so, we are very likely to see ids reused. Here in this example, you'll see that xs and ys are clearly not the same data. One is 1, 2, 3; the other one's 1, 2, 3, 4.

[00:03:12]However, they have identical ids. And the reason for that is, in between the creation of the object to which xs refers and the object to which ys refers, we have deleted xs. We've garbage collected it because there are no other references to it, and CPython has decided to reuse that memory location.
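The reuse described above is easy to provoke, although whether a particular address is reused is an implementation detail of CPython's allocator and never guaranteed. A minimal sketch, with a hypothetical helper name:

```python
def ids_of_short_lived_objects(count):
    """Collect the id() of `count` objects that each die right after creation."""
    return {id(object()) for _ in range(count)}

# Each object() is freed as soon as its id has been taken, so CPython
# tends to hand the same address out again and again.
distinct = ids_of_short_lived_objects(1_000)
print(len(distinct))  # typically far fewer than 1000 in CPython

xs = [1, 2, 3]
old_id = id(xs)
del xs                    # last reference dropped, memory is released
ys = [1, 2, 3, 4]
print(old_id == id(ys))   # often True in CPython, never guaranteed
```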

[00:03:31]What this means is that the ID is a point-in-time unique identifier for a given object. If we looked at the ID of an object, and then we spent a minute doing something else in our program, and then took that stored ID of the object and compared it to something, there's no guarantee that this ID still refers to an object that's live, and there's no guarantee that something else hasn't fit into that memory location and might misrepresent itself as being identical to the original object.

[00:04:00]It is only the case that if we look at the id of two things at exactly the same point in time, and those ids are equal, then these two objects are identical: the names refer to identical objects. What this means is we should not store ids in structures unless the storage of that id is tied to the lifetime of the underlying object. In other words, we may have some system where we need to find unique elements of something that's not hashable. And we might say, "Well, I could put these into a set if they were hashable and then see, is the element that I'm processing in the set? Have I processed it already? Have I seen it before? If so, skip it." But if it's not hashable, I can't do that. Cleverly, I might say, "Well, I could put the id of the object in the set and see if the id belongs to that set or not." And here you can see this doesn't work: we can't put the object into the set by itself because it's not hashable, but it appears to work if we put the ids in there. However, if this structure were stored long-term, then in between when we stored an id in it and when we checked whether something belongs, that id may have gone stale. It may be referring to some object that is no longer around, which has been deallocated and whose memory has been reused by something else, and thus something will share the same id despite not being something that we've seen before. In general, we should be very careful not to use ids in this fashion, because it can lead to bugs quite quickly: you can see that memory is reused in certain parts of Python, like in that list, almost immediately. So it's very likely that this might lead immediately to a problem.
However, what we could maybe do is make this a dictionary mapping the id to the underlying object, which then ensures that the object exists as long as that entry in the dictionary exists. As a consequence, there's always one reference to this object, so it's not garbage collected. That might work for our purposes. And in fact, when we talk about things like hashability and immutability, we may see some other approaches that have the same property.
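The dictionary approach described above might look something like the following sketch. The function name is hypothetical; the key point is that each stored value keeps its object alive, so the id used as the key cannot be recycled while the entry exists.

```python
def unique_by_identity(items):
    """Return the distinct objects in `items`, compared by identity.

    Works for unhashable items. The dict values hold a reference to each
    object, so no id can be reused while its entry is still in `seen`.
    """
    seen = {}
    for item in items:
        seen.setdefault(id(item), item)
    return list(seen.values())

a = [1, 2]
b = [1, 2]          # equal to `a`, but a different object
result = unique_by_identity([a, b, a])
print(len(result))  # 2: `a` and `b` are kept once each
```

A bare set of ids would give the same answer here only because `a` and `b` happen to stay alive for the whole call.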

[00:06:07]Note that this is actually a fairly similar type of problem that we might see in a very different domain, the domain of shell scripting.

[00:06:13]When we have a shell script and we create a child process, we might want to terminate that child process later in the script. For example, when a parent process is terminated, we might want it to first go through and terminate all its child processes, so it doesn't leave child processes lying around doing work that is no longer needed. One way we may try to do that is to capture the process ID when we create the child process, and then later on use the kill built-in to kill that process. However, process IDs are also a finite resource, and they may be reused. So between when we capture that process ID and when we use it for something, that process might have terminated on its own, and another process may have started and might be reusing that process ID.

[00:07:01]If we were to run this block of code here, we would see that eventually the process IDs would be reused for this process here that runs and then terminates immediately. We'd eventually see it. But because the maximum number of process IDs that I have on my system is quite large, it may take a while before we see these reused. I could even reduce the number of process IDs, PIDs, that I can actually create, but that's not a very safe thing for me to do, and in general, you get the idea that just because we capture this value here doesn't mean that what we look at down here is the same thing.

[00:07:35]This is why, for example, if we want to terminate all child processes when a parent process terminates, we actually need some additional mechanisms, such as the use of process namespaces, in which case we might say, "Well, nothing else in this process namespace, in this PID namespace, is going to be killing other processes. So, if I see this PID again, well, it's got to be the thing that I created five lines above, because there's nothing else running in this PID namespace, so there's nothing else that could have created this process." Or we might need some other mechanisms: process groups, or mechanisms like bwrap's --die-with-parent.
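The same concern carries over when child processes are managed from Python instead of a shell. As a loose analogue (not the shell code shown in the video), the sketch below signals the child through its subprocess.Popen handle rather than through a bare PID stored elsewhere. On POSIX systems a child's PID is not released until its parent waits on it, so signalling before wait() should not hit an unrelated process; the child command and variable names here are only placeholders.

```python
import subprocess
import sys

# Placeholder child: a Python process that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

captured_pid = child.pid   # a bare integer; goes stale once the child is reaped

# Signal through the handle while the child has not been waited on yet.
child.terminate()
exit_code = child.wait()   # reaps the child; after this the PID may be reused

print(captured_pid, exit_code)
```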

[00:08:07]Now, if we take a look at the properties of equality and identity, one very interesting thing that we can see is that equality is a transient property.

[00:08:19]That is to say, if two things are equal at one point in time in our code, they may not necessarily be equal at a later point in time, irrespective of whether there's been a name binding or rebinding in between. Somebody could have cleared one of these two structures, or somebody could have appended to one of them. They're equal only at a point in time in our code. However, as long as there are no name bindings, as long as nowhere in our code do we see xs := something or xs = something, or ys := something or ys = something, then if two names refer to identical objects at some point in our code, they will always refer to identical objects. This is a permanent property up to a name rebinding. Note that we could say, "Well, up to an operation they're both permanent properties." But do note that name rebindings can really only occur through a very small number of means: either this style, or this style, or somebody's direct manipulation of, say, the globals. So maybe you could do something like this, but this is very rare. Really these are the two things that you might see, or maybe something that brings a name into scope like an import, you know, from some_module import xs. There's a finite number of things which can create a name binding, and those are usually pretty easy to spot, and we usually know when they're happening. Whereas the number of things that could possibly mutate an object is determined by whatever the type of that object is, and there are a lot of different possibilities here.

[00:09:48]Now, do note the relationship between identity and equality. Since one is a permanent property and one is a transient property, it may seem that there might be a relationship where one implies the other, but that is not the case.

[00:10:01]It's possible for two objects to be equivalent but not identical. And it's possible for them to be equivalent and also identical. Both cases are possible, and as a consequence there is no relationship: equality does not imply identity or non-identity. Similarly, it is possible for two names to refer to an identical object and for them to be equivalent, and it's possible for two names to refer to an identical object but for them not to be equivalent. Thus, identity also does not imply equivalence or non-equivalence. There's in fact no strict relationship here. However, this last case, an object which is not even equivalent to itself, is fairly rare. The most common example that we can see is the float NaN; even something like the None value doesn't have this property. The float NaN is probably the only time you'll actually see this in common use. That said, because you can implement __eq__ however you want, nothing stops you from creating a class that is always equivalent to something or never equivalent to something. There is nothing constraining you in this fashion.
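A short sketch of the distinction, using hypothetical names xs, ys and zs: mutation can change the result of == without any rebinding, while the result of `is` survives mutation.

```python
xs = [1, 2, 3]
ys = [1, 2, 3]   # equal to xs, but a separate object
zs = xs          # a second name for the same object as xs

equal_before = xs == ys      # True: same contents right now
ys.append(4)                 # mutation only, no name is rebound
equal_after = xs == ys       # False: equality was transient

xs.append(99)                # mutate through one name...
still_identical = xs is zs   # True: ...identity is unaffected

print(equal_before, equal_after, still_identical)
```

Only a rebinding such as `xs = []` would make `xs is zs` false.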
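The combinations described above can be reproduced in a few lines. The class names AlwaysEqual and NeverEqual are invented for this sketch.

```python
nan = float("nan")

class AlwaysEqual:
    def __eq__(self, other):
        return True    # claims equality with anything

class NeverEqual:
    def __eq__(self, other):
        return False   # denies equality, even with itself

never = NeverEqual()
always = AlwaysEqual()
first, second = [1, 2], [1, 2]

print(first == second, first is second)   # equal, not identical
print(first == first, first is first)     # equal and identical
print(nan == nan, nan is nan)             # identical, yet not equal
print(never == never, never is never)     # identical, yet not equal
print(always == object())                 # equal to an unrelated object
```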

[00:11:16]Note that if two names refer to identical objects, we are however guaranteed that the ids are equal. So if it's the case that x is y, then id(x) is equal to id(y). However, do note that within CPython, on a 64-bit machine, probably on Linux, at this point in time, and with a bunch of other qualifications, if two things are identical, the id values themselves are probably not identical objects. Each of these id calls is going to create an integer, and that integer is going to be large enough that it won't be interned, because it represents a memory address located in the upper ranges of your memory. As a consequence, they will be the same numeric value but they will not be the same numerical object.

[00:12:03]Now, do note that we can implement equivalence however we want, but we cannot override the identity check. The identity check always checks for the same thing. It always checks whether the unique identifier for these two objects is the same, which in CPython means checking whether the memory address of the objects to which these names refer is the same.
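To make that concrete, here is a small sketch. The equality of the two id values follows from the language; the non-identity of the two integer objects is a CPython implementation detail and carries all the qualifications listed above.

```python
x = object()
y = x                 # two names, one object

first = id(x)
second = id(y)

print(x is y)            # True
print(first == second)   # True: same numeric value
print(first is second)   # usually False in CPython: two separate int objects
```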

[00:12:23]However, because the double equals can be implemented and the is cannot be implemented, that also means that the is check in Python is really just going to do a strict pointer comparison and doesn't run custom code, which means it doesn't need to potentially create another PyEval frame, which means that in some cases you may discover that the is check, if that's appropriate for your data, is faster than the double equals check and we'll even see some performance optimizations in Python which will try to make use of that.
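One way to see that == runs custom code while `is` does not is to count calls to __eq__. The class below is a hypothetical example written for this purpose.

```python
class Counted:
    eq_calls = 0   # how many times __eq__ has run, across all instances

    def __eq__(self, other):
        Counted.eq_calls += 1
        return self is other

a = Counted()
b = Counted()

a is b                               # identity check: no method is invoked
calls_after_is = Counted.eq_calls    # still 0

a == b                               # equality check: dispatches to __eq__
calls_after_eq = Counted.eq_calls    # now 1

print(calls_after_is, calls_after_eq)
```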

[00:12:56]This may be a reason why we might want to use an Enum, because an Enum represents singleton values. In this variation of the function, where we check if x is the enumerated value ABCDEFG, this might be faster than checking if x is equal to the string "abcdefg".

[00:13:16]However, we may discover that it's actually very difficult for us to observe that difference. We'll run this a bunch of times and then the answers will be inconclusive in part because the string equality check is so fast that likely what we're seeing here is just a lot of noise in the system. And we just won't be able to observe this unless the string is really, really, really long or unless other circumstances arise.
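A comparison along these lines could be set up as follows. This is only a sketch: the Enum name Mode and both function names are made up, and the measured times depend on the machine and Python version, so no particular winner should be expected.

```python
from enum import Enum
from timeit import timeit

class Mode(Enum):
    ABCDEFG = "abcdefg"

def check_identity(x):
    # Pointer comparison against the singleton Enum member
    return x is Mode.ABCDEFG

def check_equality(x):
    # String comparison, which may run str's equality logic
    return x == "abcdefg"

member = Mode.ABCDEFG
text = "abcdefg"

identity_time = timeit(lambda: check_identity(member), number=100_000)
equality_time = timeit(lambda: check_equality(text), number=100_000)

print(f"is: {identity_time:.4f}s  ==: {equality_time:.4f}s")
```

Note that the attribute lookup `Mode.ABCDEFG` has its own cost, which is one more source of noise in a measurement like this.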

[00:13:39]Thanks for watching. If you enjoyed this video, give it a like, hit subscribe, and check out the recommended video on the screen. See you in the next one.
