Python Basics 1: Collections

This article will cover Python’s sequence, mapping, and set types. I will refer to these collectively as collection types. Collections are defined by the special methods __len__, __getitem__, __setitem__, __delitem__, and __iter__. __len__ is used by the built-in function len to get the length/size of a collection. __getitem__, __setitem__, and __delitem__ are used to define the [] (item access) operator. __iter__ is used by the built-in iter function to get an iterator over the collection’s elements. Iterators will be covered in a later article. Note that a str is actually a sequence of characters, and follows the same conventions as other sequence types, such as list and tuple.

Lists

Python’s list type is analogous to C++’s vector<void*> or Java’s List<Object>. It can automatically resize itself and the items can be of any type (including other lists).

list literals are created using square brackets:

>>> my_list = [1, 2, 3]

As with parentheses, Python ignores whitespace between square brackets:

my_list = [
    1,
    2,
    3,
]

Also notice that the final item can have a trailing comma. This allows for easy modification of list literals.

my_list = [
    1,
    2,
    3,
    4,
    5,
    'six',
]

Like str, list supports indexing, slices, and len.

>>> my_list[0]
1
>>> my_list[-1]
'six'
>>> my_list[-3:]
[4, 5, 'six']
>>> my_list.find(3)
2
>>> len(my_list)
6

You can modify the items of a list by assignment.

>>> my_list[0] = 'one'

list supports modification using append, insert, and pop.

>>> my_list.append('seven')     # add 'seven' to end of list
>>> my_list.insert(0, 'zero')   # place 'zero' at index 0, moving existing items back
>>> my_list.pop(0)              # remove and return the item at index 0
'zero'
>>> my_list.pop()               # remove and return the last item
'seven'

Note that inserting or popping anywhere except the end of the list takes linear time; inserting or popping at the end of the list takes constant time.

Since all variables are pointers, assigning a list does pointer assignment. That is, both names will refer to the same object in memory.

>>> my_other_list = my_list         # pointer assignment: both variables refer to the same list!
>>> my_other_list.append('seven')
>>> my_list
['one', 2, 3, 4, 5, 'six', 'seven']

To make a copy of a list, use slicing.

>>> my_copied_list = my_list[:]
>>> my_copied_list.append(8)
>>> my_copied_list
['one', 2, 3, 4, 5, 'six', 'seven', 8]
>>> my_list
['one', 2, 3, 4, 5, 'six', 'seven']

Note that slice copying performs a shallow copy, where both lists share references to the same objects.

>>> my_list_list = [[]]                 # list with a list in it
>>> my_copied_llist = my_list_list[:]
>>> my_copied_llist[0].append(1)        # item 0 is shared
>>> my_copied_llist.append(2)
>>> my_list_list
[[1]]
>>> my_copied_llist
[[1], 2]

(see Sequence Types and Time Complexity )

Tuples

tuple is like list, except it is immutable (cannot be modified). tuple supports all operations of list except ones that modify the tuple. tuple is often used like a C/C++ struct.

>>> point = (3, 5)
>>> point[0]
3
>>> point[-1]
5
>>> point[0] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Tuple Packing and Unpacking

The parentheses are only required to prevent ambiguity, and can be omitted.

>>> point = 3, 5

Items in a tuple (or any sequence type) can be unpacked into variables by specifying a list of variables on the left-hand side of the assignment operator.

>>> x, y = point

The number of variables must match the length of the sequence (and therefore must be known when the script is written).

>>> x, y, z = point
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 2 values to unpack

Packing and unpacking can be combined in one statement.

>>> x, y = 3, 5

This can be used for swapping variables, possibly modifying them in the process.

>>> x, y = y, x
>>> a, b = b, a+b

This can also be used for returning multiple values.

>>> def make_point():
...     return 3, 5
... 
>>> point = make_point()
>>> x, y = point
>>> # or
>>> x, y = make_point()

Sets

set is an unordered collection that contains unique values. An empty set can be created using set(). Non-empty set literals can be created using curly braces.

>>> s = set()
>>> s.add(3)
>>> s.add(3)
>>> s
{3}

Testing for membership in a set is much faster than for list or tuple.

>>> l = [3]
>>> s = {3}
>>> 3 in l # this must compare 3 to every item in l
True
>>> 3 in s # this locates 3 using it's hashcode, and takes constant time
True

set also supports set arithmetic operations union, intersection, and difference.

>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a      # unique letters in a
{'a', 'r', 'c', 'b', 'd'}
>>> b      # unique letters in b
{'a', 'z', 'c', 'l', 'm'}
>>> a - b  # set difference: letters in a but not in b
{'r', 'd', 'b'}
>>> a & b  # set intersection: letters in both a and b
{'a', 'c'}
>>> a | b  # set union: letters in either a or b
{'a', 'c', 'd', 'l', 'r', 'm', 'b', 'z'}
>>> a ^ b  # symmetric difference: letters in a or b but not both
{'d', 'l', 'r', 'm', 'b', 'z'}
>>> a ^ b == ( a - b ) | ( b - a )
True

(see Set Types)

Dictionaries

Dictionaries (dict) are Python’s associative array. Like list, dict contains several values that are accessed by index. Unlike list, dict’s indices can be any hashable object, and are referred to as keys.

An empty dict can be created using dict() or {}. Non-empty dict literals can be created using key : value pairs separated by commas and enclosed in curly braces. Like parentheses and square brackets, Python ignores whitespace between curly braces, so dict literals can span multiple lines for readability. You can test if a key is in the dict using in.

>>> tel = {'jack': 4098, 'sape': 4193}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4193, 'jack': 4098, 'guido': 4127}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'irv': 4127, 'jack': 4098, 'guido': 4127}
>>> 'guido' in tel
True
>>> 'jack' not in tel
False

The dict() constructor builds dictionaries directly from sequences of key-value pairs:

>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'jack': 4098, 'guido': 4127}

When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:

>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}

You can get a list of the keys in the dict using dict.keys(). The return value is an iterator, which will be covered later. For now, just know that you can convert it to a list or tuple by passing it to the appropriate constructor.

>>> tuple(tel.keys())
('irv', 'jack', 'guido')

The dict itself is also an iterator over its keys.

>>> tuple(tel)
('irv', 'jack', 'guido')

The keys are returned in arbitrary order. To list the keys in sorted order, use sorted.

>>> sorted(tel)
['guido', 'irv', 'jack']

You can get the values in the dictionary using the values method.

>>> tuple(tel.values())
(4127, 4098, 4127)

You can get the key-value pairs using the items method.

>>> tuple(tel.items())
(('irv', 4127), ('jack', 4098), ('guido', 4127))

(see Mapping Types)

Hashability

Python’s set and dict only work for hashable types. A hash code is a number computed from some data, such that the same data always has the same hash code, and different data should have a different hash code. An object is hashable if that object can compute a hash code of its data. A hashable type implements the method __hash__. An object’s hash code can be retrieved using the built-in function hash. Because list is mutable, it is not hashable.

>>> hash([3, 5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Since tuple is immutable, tuples are hashable as long as all of the items are hashable.

>>> hash((3, 5))
3713083796996318431
>>> hash(('john', 'doe'))
3003296910174079013
>>> hash((0, []))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

This means that tuple can be used as items in a set or keys in a dict.

>>> names = set()
>>> names.add(('john', 'doe')) # add a tuple to the set
>>> points = {}
>>> points[3, 5] = 'floor' # use tuples (3, 5) and (4, 4) as dict keys
>>> points[4, 4] = 'wall'

Implementing Hashable Types

In order for a type to be hashable, it must implement the __hash__ method. A general requirement is that two objects which compare equal must have the same hash code. You must first implement the __eq__ method so you know which fields must be incorporated in the __hash__ method.

Consider a 2d point class:

class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def __eq__(self, other):
        if isinstance(other, Point):
            return self.x == other.x and self.y == other.y
        return NotImplemented

Since the __eq__ method compares the x and y attributes, the __hash__ method must also use those attributes. An easy way to do this is to exclusive-or the hash codes of x and y:

class Point:
    # ...
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)

Another option is to pack x and y into a tuple and use the tuple’s hash code:

class Point:
    # ...
    def __hash__(self):
        return hash( (self.x, self.y) )

(see Python’s requirements on __hash__, Java’s requirements on hashing, Java’s List.hashCode, and my notes on hashing in Java and C++ )