Semantic Architecture for the Extraction, Storage, Processing and Visualization of Internet Sources Through the Use of Scrapy and Crawler Techniques


Abstract:

Collecting structured data from the web poses a significant problem: extracting it from HTML pages, processing the information so that any user can reuse it, and finally feeding it into a semantic process make it difficult to find an architecture that fulfills all of these objectives. The present research work has two main objectives, which address two of the major problems of today's web. (a) Information overload: to provide a solution for collecting data hosted on the web in HTML format by combining data-collection tools (Scrapy, Selenium) and involving the user in monitoring the data to be collected. The limitations of existing tools that provide a similar service are also taken into consideration. (b) Conceptualization of the data: to provide the user with a workspace where structured data can be transformed into semantic data according to Linked Data principles. The process of giving semantics to the data takes important aspects into account, such as the reuse of vocabularies; to cover this aspect, online catalogs are used to search for existing vocabularies.
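The transformation from structured (scraped) data to semantic data described in objective (b) can be illustrated with a minimal sketch. This is not the paper's implementation: the field names, base URI, and mapping are illustrative assumptions. It reuses the real Dublin Core Terms vocabulary, in line with the abstract's emphasis on vocabulary reuse, and emits triples in N-Triples syntax using only the standard library.

```python
# Minimal illustrative sketch (not the paper's architecture): map a
# scraped record (a plain dict of field -> value) to RDF triples in
# N-Triples syntax, reusing the Dublin Core Terms vocabulary.
# The subject URI and field names are hypothetical examples.

DCT = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace


def record_to_ntriples(subject_uri, record, vocab=DCT):
    """Emit one triple <subject> <vocab+field> "value" per scraped field."""
    triples = []
    for field, value in record.items():
        # Escape backslashes and quotes as required by N-Triples literals.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'<{subject_uri}> <{vocab}{field}> "{escaped}" .')
    return "\n".join(triples)


# Hypothetical scraped record for demonstration.
record = {"title": "Semantic Architecture for Internet Sources", "date": "2019"}
print(record_to_ntriples("http://example.org/doc/1", record))
```

In a fuller pipeline, the dict would come from a Scrapy item (with Selenium rendering dynamic pages), and the chosen predicate URIs would be looked up in an online vocabulary catalog rather than hard-coded.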

Year of publication:

2019

Keywords:

  • Scrapy
  • semantic web
  • RDF
  • ontologies
  • web data
  • crawler

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Data mining
  • Software

Subject areas:

  • Operations of libraries and archives