Paper title
Parsisanj: a semi-automatic component-based approach towards search engine evaluation
Authors
Abstract
Over the last two decades, accessing data on the internet through search engines has become widespread, owing to the huge amount of available data and the high rate at which new data is generated daily. Accordingly, search engines are encouraged to make the most valuable existing data on the web searchable. Knowing how to handle a large amount of data at each step of a search engine's pipeline, from crawling to indexing and ranking, is just one of the challenges that a professional search engine must solve. It must also follow best practices in handling user traffic, employ state-of-the-art natural language processing tools, and address many other challenges at the edge of science and technology. As a result, evaluating these systems is highly challenging because of their internal complexity, yet it is crucial for finding the improvement path of an existing system. An evaluation procedure is therefore a standard subsystem of a search engine, with the role of building its roadmap. Recently, several countries have developed national search engine programs to build infrastructure that provides special services, based on their needs, over the web data available in their language. This research was conducted to shed light on the advancement path of two Iranian national search engines, Yooz and Parsijoo, in comparison with two international ones, Google and Bing. Unlike related work, it proposes a semi-automatic method for evaluating search engines as a first step. We obtained some interesting results, on the basis of which a component-based improvement roadmap for national search engines can be laid out concretely.