Sunday, July 5, 2009

Navigating in Small World Networks

Pierre Fraigniaud and George Giakkoupis from the University of Paris Diderot have a really nice paper at this upcoming PODC. Their paper, titled "The Effect of Power-Law Degrees on the Navigability of Small Worlds", builds on the classic paper by Jon Kleinberg on navigating small world networks.

Jon Kleinberg's classic paper concerns navigation in a grid network where each node, in addition to its local edges, has one additional long range edge. What Kleinberg showed is that, provided each node's long range link connects to a node at distance d with probability proportional to d^-2 (the inverse square of the distance), a greedy routing algorithm ensures that any node can reach any other node in the network within about log^2 n hops, where n is the number of nodes in the network [1]. Moreover, the exponent in this probability is critical: even a slight deviation from 2 results in networks that cannot be efficiently navigated by greedy algorithms. Kleinberg was thus one of the first people to describe a type of network that might mimic the social network that allowed quick routing in Stanley Milgram's famous six degrees of separation experiments.
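To make the setup concrete, here is a small, hedged simulation sketch of Kleinberg's model. It is my own illustration, not code from either paper: the torus (which just avoids edge effects), the grid size k = 64, the rejection-sampling step, and all the helper names are my choices. Each node keeps its four local grid edges plus one long range edge drawn with probability proportional to d^-2, and greedy routing always forwards to the neighbor closest to the target in lattice distance.

```python
import math
import random

k = 64            # grid side length (illustrative choice); n = k*k nodes
n = k * k

def dist(a, b):
    """Manhattan distance between grid points a and b on the k x k torus."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)

def long_range_contact(u):
    """Sample a contact v != u with Pr[v] proportional to dist(u, v)^-2,
    via simple rejection sampling (propose uniformly, accept w.p. 1/d^2)."""
    while True:
        v = (random.randrange(k), random.randrange(k))
        d = dist(u, v)
        if d > 0 and random.random() < 1.0 / (d * d):
            return v

contacts = {(x, y): long_range_contact((x, y)) for x in range(k) for y in range(k)}

def greedy_hops(s, t):
    """Route greedily from s to t; return the number of hops taken."""
    cur, hops = s, 0
    while cur != t:
        x, y = cur
        nbrs = [((x + 1) % k, y), ((x - 1) % k, y),
                (x, (y + 1) % k), (x, (y - 1) % k),
                contacts[cur]]
        cur = min(nbrs, key=lambda v: dist(v, t))   # move closest to target
        hops += 1
    return hops

trials = 200
total = sum(greedy_hops((random.randrange(k), random.randrange(k)),
                        (random.randrange(k), random.randrange(k)))
            for _ in range(trials))
print(f"n = {n}: average greedy hops = {total / trials:.1f}, "
      f"log^2 n = {math.log(n) ** 2:.1f}")
```

Running it for a few grid sizes gives a feel for how the average hop count tracks log^2 n rather than, say, the grid diameter.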

So what about this new paper by Pierre and George? Well, for many years the exponent of 2 in log^2 has bothered people. Pierre and George show that it is possible to get rid of it with power laws. In particular, they show that if, instead of each node having exactly one long distance link, the number of long distance links per node follows a certain power-law distribution, then greedy routing works in about log n hops. A power-law distribution means that the number of nodes with x long distance links is proportional to x^-k for some fixed constant k. This is a so-called heavy tail distribution which occurs in many natural complex systems. Surprisingly, Pierre and George show that the type of power-law distribution for which greedy routing works is when k is in the range between 2 and 3, which is very similar to the exponent one observes for degree distributions in many naturally occurring social networks.
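For intuition about what such a degree distribution looks like, here is a small, hedged sketch (my own illustration, not the construction from the paper): it draws the number of long distance links per node from a power law with exponent k = 2.5, using the standard inverse-transform trick for a Pareto distribution, and the function name and sample sizes are just placeholders.

```python
import random

def power_law_links(k=2.5):
    """Return an integer x >= 1 with Pr[x] roughly proportional to x^-k,
    by taking the floor of a continuous Pareto sample (inverse transform)."""
    u = 1.0 - random.random()           # uniform in (0, 1], avoids u == 0
    return int(u ** (-1.0 / (k - 1)))   # heavy right tail with exponent k

links = [power_law_links(2.5) for _ in range(100_000)]
print("average links per node:", sum(links) / len(links))
print("maximum links at any node:", max(links))
# With k between 2 and 3 the average is finite but the variance is not, so a
# few "hub" nodes end up with very many long distance links; intuitively,
# these hubs are what help greedy routing beat the log^2 n bound.
```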

As far as I know, this is one of the first papers that suggests a nice functional property of power-law distributions. In particular, it shows that power-law distributions are more powerful than other distributions in achieving a specific mathematical goal. Are there other algorithmic or mathematical problems that power-law distributions are "good" for? It looks like a very nice paper.

[1] I'm using the term "about" to mean O(log^2 n), or roughly a function that grows like C log^2 n for some fixed constant C.

Wednesday, July 1, 2009

Bulwer-Lytton Fiction Contest Winner 2009

"She walked into my office on legs as long as one of those long-legged birds that you see in Florida - the pink ones, not the white ones - except that she was standing on both of them, not just one of them, like those birds, the pink ones, and she wasn't wearing pink, but I knew right away that she was trouble, which those birds usually aren't."

Raymond Chandler channels an ornithologist?

Sunday, June 21, 2009

Random Sampling

A fundamental statistical operation over a network is to sample a node uniformly at random. For large networks like the internet or the Facebook social network, this is the only principled way to gain information about properties of the network such as node degree distributions, the fraction of nodes with a given property, the clustering coefficient, etc.

For the most part, techniques for sampling a node uniformly at random are currently just heuristics. Maybe you take a random walk around the graph for a while and then throw in some tricks to try to correct for bias in the stationary distribution. However, I think there is the possibility of a great and important theory problem here. Increasingly, many interesting networks are huge and not completely known. Randomly sampling nodes is a great way to measure properties of such networks, but bias in the sampling can lead to all kinds of problems [1]. Here's a stab at a problem: given an unknown graph and an arbitrary node (or two or more) in that graph, devise an algorithm to sample (close to) uniformly from the set of all nodes, when the only operation available is to move along edges of the graph. Of course, the graph would have to have "sufficiently nice" properties for this to work, e.g., ergodicity, good expansion, a good degree distribution, etc. Coupling theory would be a good candidate tool for this problem.
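To make the "trick to correct for bias" concrete, here is a minimal, hedged sketch using a standard Metropolis-Hastings correction (a well-known general technique, not anyone's particular published algorithm for this problem; the toy graph and walk length are arbitrary choices of mine). A plain random walk is biased toward high-degree nodes; the acceptance step below makes the stationary distribution uniform, using only moves along edges and the degrees of the current node and the proposed neighbor.

```python
import random
from collections import Counter

# Toy undirected graph as an adjacency list; any connected graph works here.
graph = {
    'a': ['b', 'c', 'd'],
    'b': ['a', 'c'],
    'c': ['a', 'b', 'd'],
    'd': ['a', 'c'],
}

def mh_uniform_sample(graph, start, steps):
    """Metropolis-Hastings walk: propose a uniform random neighbor v of the
    current node u and accept with probability min(1, deg(u)/deg(v));
    otherwise stay put. The stationary distribution is uniform over nodes."""
    u = start
    for _ in range(steps):
        v = random.choice(graph[u])
        if random.random() < min(1.0, len(graph[u]) / len(graph[v])):
            u = v
    return u

# Repeating long walks gives approximately uniform samples; the catch is that
# the walk length needed (the mixing time) depends on properties like
# expansion, which is exactly where the "sufficiently nice" assumptions enter.
counts = Counter(mh_uniform_sample(graph, 'a', 200) for _ in range(4000))
print(counts)   # roughly 1000 picks per node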

[1] For example, see this paper for how bias due to traceroutes can lead to erroneous prediction of a power-law degree distribution where none actually exists.

P.S. A few years ago, Valerie King and I wrote a paper on choosing a random node in a peer-to-peer network. This only worked for highly structured networks and so did not solve the problem described above. However, we came up with the following algorithmic tool that might be useful for the harder problem. Imagine we can broadcast a message from some node to all other nodes in the network once for free and we want to minimize the number of nodes that need to reply to this broadcast. What is the minimum number of responses that need to be sent in order to select a node uniformly at random? A naive approach is for every node to choose a number uniformly at random between 0 and 1; have the broadcaster send out a random number between 0 and 1; have nodes respond that have random numbers closest to that of the broadcaster (mod 1); and choose the single responder that is closest. Surprisingly, this scheme has significant bias. So how then can you solve the problem? Check out the paper for the answer!
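Here is a small, hedged simulation of that naive scheme. It is my own sketch, and it assumes the nodes' random numbers are chosen once and then reused across many selection rounds (as fixed identifiers would be); the paper's exact setting may differ, and the values of n and the number of rounds are arbitrary. Even so, it shows how far the selection frequencies drift from uniform.

```python
import bisect
import random
from collections import Counter

n = 1000            # number of nodes (illustrative)
rounds = 100_000    # number of selection rounds

# Each node's number, drawn once and sorted so neighbors can be found quickly.
vals = sorted(random.random() for _ in range(n))

counts = Counter()
for _ in range(rounds):
    b = random.random()                      # the broadcaster's number
    j = bisect.bisect_left(vals, b) % n      # nearest node value above b (wrapping)
    i = (j - 1) % n                          # nearest node value below b
    d_i = min(abs(vals[i] - b), 1 - abs(vals[i] - b))   # circular distances
    d_j = min(abs(vals[j] - b), 1 - abs(vals[j] - b))
    counts[i if d_i < d_j else j] += 1       # the closest node "wins"

print("expected wins per node:", rounds / n)
print("most wins by any node: ", max(counts.values()))
print("fewest wins by any node:", min(counts[i] for i in range(n)))
# A node wins roughly in proportion to the space it owns on the circle, and
# with n random points those gaps range from far below 1/n up to about
# (log n)/n, so the win counts are far from flat.
```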

Bertinoro

Last week, I attended the Algorithms and Data Structures (ADS) workshop in Bertinoro. Bertinoro is a small medieval town near Bologna, Italy, that is perfect for a workshop. There's a hilltop castle offering well-protected meeting spaces for talks, a nearby nunnery offering living accommodations for participants, and the restaurants in town are uniformly excellent.

The workshop had a great collection of talks, including several excellent talks on data structures, an area I don't normally keep up on. Some that stood out for me:
  • Monika Henzinger, formerly head of research at Google and now at EPFL, surveyed recent results on the problem of choosing a random web page. This problem is rendered non-trivial by the fact that nobody knows the entire network of web pages and links, and so the only tool available for sampling is to walk from one page to another. More on this topic in a later post.
  • My friend Amos Fiat, of Tel Aviv University, gave a great talk on a connection between differential privacy and computational geometry. A coreset is a small subset of points that can be used to compute an approximate solution to a problem over a larger set of points. Amos introduced the concept of a private coreset, which essentially has low sensitivity to the replacement of a single point in the original point set.
  • Andrew Goldberg introduced the concept of the highway dimension of a graph to show why certain heuristics for speeding up shortest path computations work so well on road networks. Essentially, this is a number, analogous to the tree-width of a graph, that measures the structural simplicity of the graph: the smaller the highway dimension, the more algorithmically tractable the graph. I really like results like these, which develop analytic techniques that move beyond worst-case analysis.
After the conference, Amos and I traveled to Rome, where we worked with Stefano Leonardi for several days on a problem related to auctions with budgets.