Håvard Krogstie

Posted on

# Remove terrible bus routes (find an algorithm)

Here is a challenge for those who want to do some algorithms programming, similar to what you would find in algorithms competitions, such as the IOI. It is not terribly difficult, but a great way to combine pen and paper solving with programming. This task uses 2D sorting, and my python solution is ~20 lines and runs in O(n*log n) time

# Problem statement

You are taking the bus from A to B, sometime today. You have a list of all possible trips, with a trip being represented as the tuple `(leave_time, arrive_time)`. The time you leave A, and the time you get to B, respectively. Both times are represented as integers.

You don't care about what the route looks like, so even really bad choices are included. There may be arbitrarily many ways of getting from A to B, some worse than others.

Turns out many of the trips are so bad that you would never want to take them. Therefore you would want to tidy up the list. Duplicates should be removed, and any trip that is objectively worse than another trip should be removed.

A trip `X` is objectively worse than another trip `Y` if `X.leave_time <= Y.leave_time && X.arrive_time >= Y.arrive_time`. Basically if you can leave from A later, and still arrive at B earlier, there is no point in ever taking trip X.

Given the list, write a function that returns a sorted and tidied up list.

## Input

Input is read from a file or the console.
The first line contains an integer `N` (`1 < N <= 1e5`), the number of trips given.
The following `N` lines contain two integers `leave_time` and `arrive_time` (`1 <= leave_time <= arrive_time <= 1e6`), denoting a trip in the list.

## Output

Output is printed out in the console or to a file.
One the first line, print the number of trips in the tidied list `M`.
Followed by `M` lines, with the integers `arrive_time` and `leave_time` for each trip in sorted order.

## Sample input 1

``````8
15 30
12 28
15 32
8 42
15 30
10 30
25 40
28 38
``````

Expected output:

``````3
12 28
15 30
28 38
``````

### Explanation

Each interval represents a possible trip. The `(leave_time, arrival_time)` tuple is written on the box. If the trip is objectively worse than another, the box is red, with an explanation written on it.

The routes that are left (green) are then sorted.

## Sample input 2

``````30
23 59
17 82
85 90
76 95
44 87
78 78
51 88
73 80
10 31
84 95
38 56
92 96
66 71
77 98
94 98
94 98
91 99
83 98
91 94
63 77
33 69
3 63
13 54
37 80
27 40
52 92
90 98
41 91
11 96
16 65
``````

Expected output:

``````9
10 31
27 40
38 56
66 71
78 78
85 90
91 94
92 96
94 98
``````

If you want a solution, I will publish a commented one written in python in the comments. However, the solution is probably easier to understand if you get to it on your own.

E. Choroba

Perl solution, inserting each line into the final array right after having read it. Implementing the binary search myself.

``````#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

<>; # Ignore the size, just process all the lines.

my @good;
while (<>) {
my (\$leave, \$arrive) = split;

# Binary search.
my (\$from, \$to) = (0, \$#good);
while (\$from < \$to) {
my \$middle = int((\$from + \$to) / 2);
if (\$good[\$middle][0] < \$leave) {
\$from = \$middle + 1;
} else {
\$to = \$middle;
}
}
my \$new = \$from + (@good && \$good[-1][0] < \$leave);

if (\$new < \$#good && \$good[\$new][0] == \$leave) {
\$good[\$new][1] = \$arrive if \$arrive < \$good[\$new][1];
next
}

if (\$new > \$#good || \$arrive < \$good[\$new][1]) {
splice @good, \$new, 0, [\$leave, \$arrive];

if (\$new) {
my \$before = \$new;
do { --\$before } while \$before && \$good[\$before][1] >= \$arrive;
splice @good, \$before + 1, \$new - \$before - 1;
}
}
}

say scalar @good;
say "@\$_" for @good;
``````

Håvard Krogstie

# Python

Complexity O(nlog n)

``````#!/bin/env python3

n = int(input())
trips = []
for _ in range(n):
times = input().split(' ')
trips.append((int(times[0]), int(times[1])))

# Sort by decreasing arrive_time when the leave_time is the same
trips.sort(key=lambda x:x[1], reverse=True)
# By increasing leave_time
trips.sort(key=lambda x:x[0])

# Remove duplicates
trips = [trips[0]] + [b for a,b in zip(trips, trips[1:]) if a!=b]
stack = []
for trip in trips:
while stack and stack[-1][1] >= trip[1]:
#The trip on the stack is worse than trip
stack.pop()
stack.append(trip)

print(len(stack))
for a, b in stack:
print(a, b)

``````

### Explanation

The stack contains trips not yet determined to be bad. The trips are processed in ascending `leave_time`. This means any trip already on the stack, has a lower or equal `leave_time` than the trip we are currently looking at.

If the trip on the top of the stack arrives later than the current trip,
that means the trip on the stack starts earlier and ends later
than the current trip. Therefore the stack trip is worse, and is popped off the stack.

This is repeated until the trip on the top of the stack ends before the current trip.
At that point, all trips down the stack are safe, because
no trip further down the stack ends after the top trip.
After the stack-popping is done, we add the current trip.

If multiple trips start at the same time, the one
with the larger arrive_time comes first, because
it needs to be popped by the better trip, when it comes later.
The sorting we do at the start makes sure
that a trip always comes before the trip it is objectively worse than.
This means the last trip is guaranteed to be good.

E. Choroba

Why is `27 40` not part of the solution 2?

Håvard Krogstie • Edited

Ok, there is not an issue with the program, but the test input, rather. The first line says `20`, when there are 30 trips to read. Fixed.

Håvard Krogstie

That's a very good question! I'll debug my program right after breakfast.

Dmitry Yakimenko

Have you tested on the maximal `N = 1e6` as stated in the problem description? Your solution seems to be `O(N*N)` so it will be really slow.

Håvard Krogstie

One possibility is comparing every pair of trips to see if one is objectively worse than another, but that would take a long time if `N` were the theoretical maximum of 100000.