Use tuple as key in Python

created at 07-27-2021 views: 4

In fact, when I got this conclusion, I thought it was very simple and normal, but when I saw this usage for the first time, I thought it was very novel:

class Solution(object):
    def lenLongestFibSubseq(self, A):
        index = {x: i for i, x in enumerate(A)}
        longest = collections.defaultdict(lambda: 2)

        ans = 0
        for k, z in enumerate(A):
            for j in xrange(k):
                i = index.get(z - A[j], None)
                if i is not None and i < j:
                    cand = longest[j, k] = longest[i, j] + 1
                    ans = max(ans, cand)

        return ans if ans >= 3 else 0

Note that longest[j, k] in the last three lines of code can actually be used like this? At first I thought it was the function provided by defaultdict, so I went to look through the implementation of collections, in _collectionsmodule.c, but did not find methods such as __getitem__, but I saw the definition of defaultdict is as follows:

typedef struct {
    PyDictObject dict;
    PyObject *default_factory;
} defdictobject;

When there is no key in the dict, it will call default_factory to get the default key. So, does dict also support this usage? Let me try:

>>> a = {}
>>> a[1, 2] = 3
>>> a[1,2]
3
>>> a
{(1, 2): 3}

It really is. Seeing this, I probably know why, because 1, 2 is actually a tuple, and a tuple is an immutable object and is hashable. Let’s verify it. But I did not find the realization of __getitem__, but I found the realization of key in dict operation:

/* Return 1 if `key` is in dict `op`, 0 if not, and -1 on error. */
int
PyDict_Contains(PyObject *op, PyObject *key)
{
    Py_hash_t hash;
    Py_ssize_t ix;
    PyDictObject *mp = (PyDictObject *)op;
    PyObject *value;

    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }
    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value);
    if (ix == DKIX_ERROR)
        return -1;
    return (ix != DKIX_EMPTY && value != NULL);
}

/* Internal version of PyDict_Contains used when the hash value is already known */
int
_PyDict_Contains(PyObject *op, PyObject *key, Py_hash_t hash)
{
    PyDictObject *mp = (PyDictObject *)op;
    PyObject *value;
    Py_ssize_t ix;

    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value);
    if (ix == DKIX_ERROR)
        return -1;
    return (ix != DKIX_EMPTY && value != NULL);
}

It can be seen that as long as hash = PyObject_Hash(key); this step is successful, it is fine. In other words, as long as the hash value can be calculated, then we can try to rewrite the __hash__ method for a class to see if it can be used as a key:

class A(list):
    pass

a = A()

d = {}

d[a] = 1

Originally, an error would be reported:

$ python test.py 
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    d[a] = 1
TypeError: unhashable type: 'A'

Because the list is not immutable, it is impossible to calculate the hash value, but we can add methods to it:

class A(list):
    def __hash__(self):
        return 1

a = A()

d = {}

d[a] = 1

that's it.

Okay, here I finally figure out why I can do this. So what are the advantages of writing dict[1, 2, 3]? That is, originally if we encountered a two-dimensional matrix or a three-dimensional matrix, we had to make a matrix: [[1, 2, 3], [4, 5, 6]] and then when accessing the data, we would access it like this: a[1][2], but with this feature, we can combine defaultdict like the original code, and then a[1, 2] to access and assign values, which feels much more convenient.

created at:07-27-2021
edited at: 07-27-2021: