Yunhong Min

Posted on Apr 14, 2021 • Edited on Apr 16, 2021

[파이썬 ] 우선순위 큐(priority queue)를 위한 heapq 모듈 활용법

#python #heapq #priorityqueue #binaryheap

우선순위 큐는 우선 순위가 가장 높은 자료(data)를 가장 먼저 꺼낼 수 있는 자료구조다. 배열을 사용하면 직접 구현하기는 어렵지 않지만, 파이썬에서는 heapq라는 내장(built-in) 모듈로 제공되기 때문에, 내장 모듈의 API 기능만으로 충분히 활용 가능할때는 내장 모듈을 사용하는 것이 좋다.

이 글에서 힙은 이진힙을 의미하고, 우선순위 큐는 개별의 값을 가지는 힙이므로 두 단어를 번갈아 사용하겠다

기본 사용법

1. 우선 순위 큐의 생성 및 원소 삽입

heapq.heappush를 사용해 우선 순위 큐의 원소를 삽입할 수 있다. 첫번째 인자는 힙으로 사용할 list 이고, 두번째 인자는 삽입할 데이터이다.

heapq.heappush(heap, item)

삽입별 시간 복잡도는 O(log n)이다. n개의 원소를 삽입한다면 O(n log n)의 시간 복잡도를 가진다.

다음 예제를 보자.

import heapq

h = []
heapq.heappush(h, (3, "Go to home"))
heapq.heappush(h, (10, "Do not study"))
heapq.heappush(h, (1, "Enjoy!"))
heapq.heappush(h, (4, "Eat!"))
heapq.heappush(h, (7, "Pray!"))

print(h)

결과는 배열로 구현된 힙을 보여준다.

[(1, 'Enjoy!'), (4, 'Eat!'), (3, 'Go to home'), (10, 'Do not study'), (7, 'Pray!')]

2. 원소 꺼내기

원소 꺼내기는 heapq.heappop 으로 구현하면 된다. 각 꺼내기 별 시간 복잡도는 O(log n)이다.

import heapq

h = []
heapq.heappush(h, (3, "Go to home"))
heapq.heappush(h, (10, "Do not study"))
heapq.heappush(h, (1, "Enjoy!"))
heapq.heappush(h, (4, "Eat!"))
heapq.heappush(h, (7, "Pray!"))

count = 1
while h:
    print("%dth : %s" % (count, heapq.heappop(h)))
    count += 1

우선순위 순서대로 나온 결과를 확인할 수 있다.

1th : (1, 'Enjoy!')
2th : (3, 'Go to home')
3th : (4, 'Eat!')
4th : (7, 'Pray!')
5th : (10, 'Do not study')

3. 배열로부터 우선순위 큐 만들기

1번의 방법으로 n개의 요소를 가진 우선순위 큐를 만들었을 때, 시간 복잡도는 O(n log n) 이다. 그러나 배열로부터 힙을 만드는 최적 알고리즘의 시간 복잡도는 O(n)으로 알려져 있다. 파이썬에서는 O(n)의 시간으로 배열을 힙으로 만들 수 있는 heapq.heapfy을 제공한다. 주의할 점은 인자로 넣은 배열 자체가 힙으로 바뀐다는 점이다.

import heapq

h = [(3, "Go to home"), (10, "Do not study"), (1, "Enjoy!"), (4, "Eat!"), (7, "Pray!")]
heapq.heapify(h)

count = 1
while h:
    print("%dth : %s" % (count, heapq.heappop(h)))
    count += 1

결과는 1번으로부터 생성한 우선순위 큐와 동일하다.

1th : (1, 'Enjoy!')
2th : (3, 'Go to home')
3th : (4, 'Eat!')
4th : (7, 'Pray!')
5th : (10, 'Do not study')

heapq의 한계점

눈치 챘을 수도 있겠지만, heapq 모듈은 최소 힙(min heap) 밖에 지원을 안한다. 최대 힙(max heap)을 사용할 수 있는 방법은 없을까? heapq에서 공식적으로 지원해주는 기능은 없다. 검색을 해보면, heapq._heapfy_max 나 heapq._heappop_max를 사용하는 방법도 있지만, push를 지원해주지 않기 때문에 반쪽짜리 기능이고, protected method라서, python의 버전이 바뀌면서 interface가 변경될 수 있는 위험도 있다.

유일한 방법은 키 값을 변환하여 삽입하는 수 밖에 없다. 위에서 사용한 예제를 기준으로 최대 힙을 구현한다고 하면 다음과 같다

import heapq

h = [(3, "Go to home"), (10, "Do not study"), (1, "Enjoy!"), (4, "Eat!"), (7, "Pray!")]
max_h = [(-ele[0], ele[1]) for ele in h]
heapq.heapify(max_h)


count = 1
while h:
    print("%dth : %s" % (count, heapq.heappop(max_h)))
    count += 1

결과는 다음과 같다.

1th : (-10, 'Do not study')
2th : (-7, 'Pray!')
3th : (-4, 'Eat!')
4th : (-3, 'Go to home')
5th : (-1, 'Enjoy!')

모든 요소의 값을 변경하는데 드는 시간복잡도는 O(n)이고, 모든 힙 요소를 얻는데 드는 시간 복잡도는 O(n log n)이므로, 요소를 모두 돌면서 키를 바꾼다 해도, 전체 시간복잡도에는 영향을 주지 않는다.

그러나 이 방법은 원본 데이터를 건드릴 수 밖에 없는 점과, heapq.push를 사용해 추가로 데이터를 삽입할 때마다 데이터를 변경해줘야 하고, 추출했을 때도 원본 데이터와 다르다는 단점이 있다.

해결 방법 - 최대 힙 모듈 구현

직접 데이터를 변경하여 넣어줄 때 생기는 단점을 극복할 수 있는 방법중에, 기존 heapq 모듈을 래핑하여, 최대 힙을 구현하는 방법도 있다. 파이썬의 __lt__ 내장 메소드를 사용하면, 어렵지 않게 구현할 수 있다.

# maxheapq.py
import heapq

class _ReverseLessThan(object):
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value > other.value

    def __repr__(self):
        return str(self.value)


def heappush(heap, item):
    reverse_item = _ReverseLessThan(item)
    heapq.heappush(heap, reverse_item)


def heappop(heap):
    reverse_item = heapq.heappop(heap)
    return reverse_item.value


def heapify(lst):
    for i, ele in enumerate(lst):
        lst[i] = _ReverseLessThan(ele)
    heapq.heapify(lst)

_ReverseLessThan 클래스는 생성자(__init__)에서 데이터를 인자로 받아서, value property로 저장한 후, __lt__ 메서드의 결과를 value property 기준으로 역으로 반환하여, 이 클래스의 인스턴스로 less 비교 연산 시, 비교 연산 결과가 반대로 나오도록 해준다.

heappush, heappop, heapify 함수는 _ReverseLessThan을 이용해, heapq 모듈의 함수를 래핑하여 구현하였다. heapq 모듈은 최소 힙을 지원하므로, less 연산의 결과가 반대로 나오는 인스턴스를 데이터로 넣으면, 최대힙이 구현된다.

주의할 점은 삽입 연산인 heappush, heapify 에서는 _ReverseLessThan 클래스로 데이터를 래핑해주고 있으므로, 추출 연산인 heappop에서는 언래핑 해주어야 한다.

각 함수의 전체 시간 복잡도는 heapq 모듈의 함수와 동일하다.

이 모듈을 사용해, 최대 힙을 만들어보자.

from maxheapq import heappush, heappop, heapify

# heappush로 생성하기
h1 = []

heappush(h1, (3, "Go to home"))
heappush(h1, (10, "Do not study"))
heappush(h1, (1, "Enjoy!"))
heappush(h1, (4, "Eat!"))
heappush(h1, (7, "Pray!"))
heappush(h1, (4, "Dup Eat!"))

print('>> h1 생성 결과')
print(h1)

count = 1
while h1:
    print("%dth : %s" % (count, heappop(h1)))
    count += 1


# heapify로 생성하기
h2 = [(3, "Go to home"), (10, "Do not study"), (1, "Enjoy!"), (4, "Eat!"), (7, "Pray!"), (4, "Dup Eat!")]

heapify(h2)

print('>> h2 생성 결과')
print(h2)

count = 1
while h2:
    print("%dth : %s" % (count, heappop(h2)))
    count += 1

결과를 보면 최대 힙이 정상적으로 생성된 것을 볼 수 있다.

>> h1 생성 결과
[(10, 'Do not study'), (7, 'Pray!'), (4, 'Dup Eat!'), (3, 'Go to home'), (4, 'Eat!'), (1, 'Enjoy!')]
1th : (10, 'Do not study')
2th : (7, 'Pray!')
3th : (4, 'Eat!')
4th : (4, 'Dup Eat!')
5th : (3, 'Go to home')
6th : (1, 'Enjoy!')
>> h2 생성 결과
[(10, 'Do not study'), (7, 'Pray!'), (4, 'Dup Eat!'), (4, 'Eat!'), (3, 'Go to home'), (1, 'Enjoy!')]
1th : (10, 'Do not study')
2th : (7, 'Pray!')
3th : (4, 'Eat!')
4th : (4, 'Dup Eat!')
5th : (3, 'Go to home')
6th : (1, 'Enjoy!')

지금까지 파이썬의 최소 힙 모듈인 heapq를 활용하는 간단한 방법과, heapq에서 제공해주지 않는 최대 힙을 직접 구현하여 사용할 수 있는 방법을 알아보았다. 실제 업무에서 이진힙을 사용하는 경우는 많지는 않지만, 혹시나 활용할 기회가 생긴다면 이글에 있는 내용이 도움이 되기를 기대해본다.

DEV Community

[파이썬 ] 우선순위 큐(priority queue)를 위한 heapq 모듈 활용법

기본 사용법

1. 우선 순위 큐의 생성 및 원소 삽입

2. 원소 꺼내기

3. 배열로부터 우선순위 큐 만들기

heapq의 한계점

해결 방법 - 최대 힙 모듈 구현

Top comments (0)

Read next

Building a Document Retrieval & Q&A System with OpenAI and Streamlit

12 tips on how to think like a web scraping expert

Sort Items in a Directory by Descending Size Using Python, PowerShell, C#, or Go

FireDucks: Get performance beyond pandas with zero learning cost!