The Web provides us with a vast resource for business
intelligence. However, the large size of the Web and its
dynamic nature make foraging for relevant information
challenging. General-purpose search engines
and business portals may be used to gather some basic
intelligence. Topical crawlers, driven by richer
contexts, can then build on this basic intelligence to
support in-depth and up-to-date research. In this
paper we investigate the use of topical crawlers in
creating a small document collection that helps locate
relevant business entities. The problem of locating
business entities is encountered when an organization
looks for competitors, partners or acquisition targets. We
formalize the problem, create a test bed, introduce
metrics to measure the performance of crawlers, and
compare the results of four different crawlers. Our
results underscore the importance of identifying good
hubs and exploiting link contexts based on tag trees for
accelerating the crawl and improving overall
performance.