Saturday, March 24, 2007

Computational Complexity, Part II

Complexity theorists classify problems based on the amount of time and memory it takes to solve a problem. Let's say that we can define the size of a problem with a simple non-negative integer N, representing something countable such as the number of nodes in a tree or the size of a matrix. It makes sense that the larger N is, the longer it takes and/or the more memory is required to solve a problem. The question is, how do the time and memory requirements grow with N?

The time to solve many problems grows proportionally to a polynomial in N. For example, factoring an N by N matrix takes an amount of time proportional to N3. We denote this relationship by saying that factorization is "of order N-cubed," written O(N3). If we can find a polynomial-order algorithm, then the problem is in the complexity class known as P.

For some problems, such as the load balancing problem above, nobody has figured out a polynomial-order algorithm that always finds the correct answer. The only algorithms that we know will always work are prohibitively expensive, O(2N) or even worse, O(N!). Basically, the only algorithms that we know of to solve these sorts of problems require you to examine every possible combination and permutation. Problems for which there is no polynomial-order algorithm are often in the complexity class known as NP, which is an acronym for nondeterministic polynomial. "Nondeterministic" has to do with making a guess; essentially, the problems in this class are hard to solve but their solutions are easy to verify. If Stephen Hawking came to you with a solution to one of these problems, you could verify the solution in polynomial (O(Nk)) time.

There is a set of the hardest problems in NP that are equivalent to one another, so if you could solve one you could solve all the others. These problems are in a class known as NP-complete. The load balancing problem falls into this class, as do many other problems, such as tetris, N by N sudoku, and minesweeper.

The seminal NP-complete problem is called the Traveling Salesman Problem. Suppose that you are a traveling salesman, and you need to visit N cities and then return home. What is the shortest route that you can take? (This is a very important problem. If it's not a salesman traveling from city to city, it could be a message on a network, a truck belonging to a moving service, or an airplane in an airline's fleet.)

The corresponding decision problem (problem with a yes/no answer) is "Is there a route with length less than k?" If we can solve the decision problem, then we can solve the optimization problem (shortest route) by solving the decision problem for different values of k.

Another NP-complete problem is the Hamiltonian Cycle Problem. Given a graph containing vertices and edges (think airports and flights, respectively), is there a path beginning and ending at one point that visits all the vertices exactly once?

These problems sound similar in some way, but they have their differences. But we can show that they are equivalent by developing an algorithm for one that uses the other as a subroutine.

Suppose we had a good algorithm for the Traveling Salesman Problem. Then we could solve the Hamiltonian Cycle Problem with the following algorithm:

  1. Convert the airport/flight graph to a complete graph (a graph with an edge between every airport) with the following method:
    • If a flight exists between airport A and airport B, then give that edge a weight of 0. If there is no flight between A and B, then give that edge a weight of 1.
  2. Find the solution to the traveling salesman decision problem "Is there a route with length less than 1?"
    • If yes, then there is a Hamiltonian cycle. If no, then there is not.
If we had a polynomial-time algorithm that solved the Traveling Salesman Problem, then we could use that algorithm to solve the Hamiltonian Cycle Problem in polynomial time, too.

Complexity theorists have found ways to reduce the other problems I described above to be equivalent to the Traveling Salesman Problem. So if we could find a polynomial-time algorithm for one, we could solve all of them in polynomial time.

An open question in the complexity theory field is whether the complexity classes P and NP are one and the same. It's clear that P is contained within NP: if we can solve a problem in polynomial time, then we can verify a solution that someone provides to us in polynomial time, simply by solving the problem. But it's not clear that NP is contained within P; nobody's yet found a polynomial-time algorithm to solve any of the NP-complete problems. Most complexity theorists believe that P and NP are not the same, but nobody has any proof of that. If they did, then they could win a million dollars from the Clay Mathematics Institute.

Some people, such as traveling salesmen, folks in the shipping and transportation industries, and those in the load balancing or scheduling business, hope that P and NP are equivalent. Because if they are, then all our problems are solved.

Others, such as bankers, software engineers, privacy advocates, and those in the computer security business, hope that P and NP are not the same. Because if they are, then it has dire consequences for the security of our banking industry, web-based commerce, private correspondence and cell phone conversations, and password protection.

In my next entry, I will talk about problems that are even harder than NP.

2 comments:

Anonymous said...

A result of P = NP might not have "dire" consequences for the security industry. A polynomial solution on the order of n^100 is not much less prohibitive.

Rebecca said...

Job,

Good point! Although I guess the security industry might need to prepare for the eventuality of quantum computing, which can evidently crack the factoring problem.