All people in computer science and some fields of engineering (e.g. industrial engineering?) are very familiar with “Graphs” — those nodes and arcs. And, actually, we can represent the web as a [huge] graph. Where node=webpage, arc=(hyper)link.
From this representation, it gives us a way to understand the characteristic of the web better (as we do well with normal graphs).
graph structure in the web | web graph | more on web graph
Talking about representing document/site as a node in a graph, Peer-to-Peer people already done this since their early day.
Making it more relevant to this blog, one of the most popular P2P application is obviously an IR-like system — search for mp3 song or DivX movie, given a title or singer’s name.
Searching things on P2P network is not like a traditional search engine searching its database (which is a snapshot of a part of the web at a particular time, collected by spiders/web spiders).
Rather, the P2P search visits each node, doing searching in that node, jump to other node .. and so on, in “real time”. Clearly, it is impossible to visits every nodes in the network, there are just too many nodes out there. To decide which node it will make a visit or not, it needs a routing algorithm.
As a result, we can simplified a search problem in P2P network as a routing problem, loosely.
[ to find a document is to find a way to that document ]
There are even some more advance routing algorithm that use semantics!
bact’: I used to think about using NLP with P2P routing. But it just “thinking” anyway, never do .. lazy me 🙁