Python的堆模块heapq

Python的heapq模块实现了一个适用于Python列表的最小堆算法。

堆是一种树形数据结构,其中子节点与父节点之间是一种有序关系。最大堆中父节点大于或等于两个子节点,最小堆父节点小于或等于两个子节点。heapq模块实现的是一个最小堆。

创建堆

创建堆有两种方式

方法一:使用heappush

1
2
3
4
5
6
7
8
In [18]: import heapq

In [19]: data = [1,3,4,5,7]

In [20]: heap = []

In [21]: for n in data:
...: heapq.heappush(heap, n)

方法二:使用heapify

如果数据已经存在于内存中,使用heapify原地重新组织列表中的元素会更加高效

1
2
3
4
5
In [24]: import heapq

In [25]: data = [1,3,4,5,7]

In [26]: heapq.heapify(data)

还有一种merge()方法,可以从多个可迭代对象中创建一个堆,相当于

1
2
3
In [27]: import itertools

In [28]: heapq.heapify(itertools.chain(*iterables))

访问堆内容

创建好堆之后,可以使用heappop()删除最小值的元素

1
2
3
4
5
6
7
8
In [29]: import heapq

In [30]: data = [1,2,34,67,8]

In [31]: heapq.heapify(data)

In [32]: heapq.heappop(data)
Out[32]: 1

如果希望在一个操作中删除现有最小元素并替换成新值,可以使用heapreplace()

1
2
3
4
5
6
7
8
9
10
11
12
In [33]:  import heapq

In [34]: data = [1,2,34,67,8]

In [35]: heapq.heapify(data)

# 先删除 再压入堆
In [36]: heapq.heapreplace(data, 12)
Out[36]: 1

In [37]: data
Out[37]: [2, 8, 34, 67, 12]

堆的数据极值

heapq还包含两个检查可迭代对象的函数,查找其中包含的最大值与最小值范围。

1
2
3
4
5
6
7
8
9
In [38]: import heapq

In [39]: data = [1,5,3,2,8,5]

In [40]: heapq.nlargest(3, data)
Out[40]: [8, 5, 5]

In [41]: heapq.nsmallest(3, data)
Out[41]: [1, 2, 3]

只有在n值比较小的情况下 nlargest 和 nsmallest 才比较高效

这两个函数还接受一个关键字参数key,用于更加复杂的数据结构中

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
In [44]: portfolio = [
...: {'name': 'IBM', 'shares': 100, 'price': 91.1},
...: {'name': 'AAPL', 'shares': 50, 'price': 543.22},
...: {'name': 'FB', 'shares': 200, 'price': 21.09},
...: {'name': 'HPQ', 'shares': 35, 'price': 31.75},
...: {'name': 'YHOO', 'shares': 45, 'price': 16.35},
...: {'name': 'ACME', 'shares': 75, 'price': 115.65}
...: ]

In [45]: cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])

In [46]: cheap
Out[46]:
[{'name': 'YHOO', 'price': 16.35, 'shares': 45},
{'name': 'FB', 'price': 21.09, 'shares': 200},
{'name': 'HPQ', 'price': 31.75, 'shares': 35}]

In [47]: expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

In [48]: expensive
Out[48]:
[{'name': 'AAPL', 'price': 543.22, 'shares': 50},
{'name': 'ACME', 'price': 115.65, 'shares': 75},
{'name': 'IBM', 'price': 91.1, 'shares': 100}]

其他有趣的方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# 定义一个 list
In [1]: h = []

In [2]: import heapq

In [3]: heapq.heappush(h, 5)

In [4]: heapq.heappush(h, 4)

In [5]: heapq.heappush(h, 8)

In [6]: h
Out[6]: [4, 5, 8]

# append 操作仅仅只是将 1 追加到 列表 h 中 并没有压入到堆中
In [7]: h.append(1)

In [8]: h
Out[8]: [4, 5, 8, 1]

# 这里输出的最小值是 4 而不是 1
In [9]: heapq.heappop(h)
Out[9]: 4

# 这个步骤不仅会将 2 压入堆 还会将 1 压入
In [10]: heapq.heappush(h, 2)

In [11]: h
Out[11]: [1, 2, 8, 5]

# 最小值是 1
In [12]: heapq.heappop(h)
Out[12]: 1

heapq.heappushpop(heap, item)

是上述heappush和heappop的合体,同时完成两者的功能.注意:相当于先操作了heappush(heap,item),然后操作heappop(heap)

1
2
3
4
5
In [13]: heapq.heappushpop(h, 3)
Out[13]: 2

In [14]: h
Out[14]: [3, 5, 8]

实战

实现堆排序

1
2
3
4
5
6
7
8
9
10
11
In [50]: from heapq import *

In [51]: def heapsort(iterable):
...: h = []
...: for value in iterable:
...: heappush(h, value)
...: return [heappop(h) for _ in range(len(h))]
...:

In [52]: heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
Out[52]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

由于堆中的元素可以为元组。所以可以对带有权值的元素进行排序

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
In [53]: from heapq import *

In [54]: h = []

In [55]: heappush(h, (5, 'write code'))

In [56]: heappush(h, (7, 'release product'))

In [57]: heappush(h, (1, 'write spec'))

In [58]: heappush(h, (3, 'create tests'))

In [59]: heappop(h)
Out[59]: (1, 'write spec')

In [60]: heappop(h)
Out[60]: (3, 'create tests')
知识就是财富
如果您觉得文章对您有帮助, 欢迎请我喝杯水!