SPHINX: A Framework for Creating Personal, Site-Specific Web Crawlers

Crawlers, also called robots and spiders, are programs that browse the World Wide Web autonomously. This paper describes SPHINX, a Java toolkit and interactive development environment for Web crawlers. Unlike other crawler development systems, SPHINX is geared towards developing crawlers that are Web-site-specific, personally customized, and relocatable. SPHINX allows site-specific crawling rules to be encapsulated and reused in content analyzers, known as classifiers. Personal crawling tasks can be performed (often without programming) in the Crawler Workbench, an interactive environment for crawler development and testing. For efficiency, relocatable crawlers developed using SPHINX can be uploaded and executed on a remote Web server.

Keywords: crawlers, robots, spiders, Web automation, Web searching, Java, end-user programming, mobile code.
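To make the idea of a classifier concrete, the following is a minimal sketch of how a site-specific rule might be encapsulated in a reusable content analyzer. It is not SPHINX's actual API: the names `Page`, `Classifier`, `SearchResultClassifier`, and `select`, the `kind` label, and the `/results` URL pattern are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClassifierSketch {

    /** Hypothetical page model: a URL, its text, and labels attached by classifiers. */
    static final class Page {
        final String url;
        final String text;
        final Map<String, String> labels = new HashMap<>();

        Page(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    /** Hypothetical classifier interface: inspects a page and annotates it with labels. */
    interface Classifier {
        void classify(Page page);
    }

    /** Assumed site-specific rule: pages whose URL contains "/results" are search results. */
    static final class SearchResultClassifier implements Classifier {
        @Override
        public void classify(Page page) {
            if (page.url.contains("/results")) {
                page.labels.put("kind", "search-result");
            }
        }
    }

    /** Runs every classifier on every page, then keeps the pages carrying the given label. */
    static List<Page> select(List<Page> pages, List<Classifier> classifiers,
                             String key, String value) {
        List<Page> matches = new ArrayList<>();
        for (Page page : pages) {
            for (Classifier c : classifiers) {
                c.classify(page);
            }
            if (value.equals(page.labels.get(key))) {
                matches.add(page);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Page> pages = new ArrayList<>();
        pages.add(new Page("http://example.com/results?q=crawler", "ten hits"));
        pages.add(new Page("http://example.com/about", "about us"));

        List<Classifier> classifiers = new ArrayList<>();
        classifiers.add(new SearchResultClassifier());

        // Print the URL of each page that a classifier labeled as a search result.
        for (Page p : select(pages, classifiers, "kind", "search-result")) {
            System.out.println(p.url);
        }
    }
}
```

In this sketch the crawling logic in `select` depends only on labels, so the knowledge about one site's URL layout stays inside a single classifier that other crawlers could reuse.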
