Thesaurus-based named entity recognition system for detecting spatio-temporal crime events in Spanish language from Twitter
Abstract:
Social networks offer an invaluable amount of data from which useful information can be obtained about major issues in society, among which crime stands out. Research on extracting information about criminal events from social networks has been done primarily in English, while in Spanish the problem has not been addressed. This paper proposes a system for extracting spatio-temporally tagged tweets about crime events in Spanish. To do so, it uses a thesaurus of criminality terms and a NER (named entity recognition) system to process the tweets and extract the relevant information. The NER system is based on the OSU Twitter NLP Tools implementation, which has been enhanced for Spanish. Our results indicate improved performance relative to the most relevant tools, such as Stanford NER and OSU Twitter NLP Tools, achieving 80.95% precision, 59.65% recall and 68.69% F-measure. The end result presents the crime information, broken down by place, date and crime committed, through a web service.
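The pipeline described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract gives only the method: a thesaurus filters crime-related tweets, and NER supplies place and date entities.

Everything else here is hypothetical and chosen for illustration:

- the tiny `CRIME_THESAURUS`
- the `CrimeEvent` record
- the `find_crime_terms` and `extract_event` helpers
- the `LOC`/`DATE` labels

The NER step is not reproduced. Entities are passed in as if they came from a tagger such as the enhanced OSU Twitter NLP Tools.

The sketch also checks that the reported F-measure is the harmonic mean of the reported precision and recall.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini-thesaurus (NOT the paper's): Spanish surface term -> crime category.
CRIME_THESAURUS = {
    "robo": "robbery",
    "asalto": "assault",
    "homicidio": "homicide",
    "secuestro": "kidnapping",
}


@dataclass
class CrimeEvent:
    """Hypothetical record of one spatio-temporally tagged crime event."""
    crime: str
    places: list = field(default_factory=list)
    dates: list = field(default_factory=list)


def find_crime_terms(text):
    """Return the crime categories of thesaurus terms found in the tweet (exact token match)."""
    tokens = re.findall(r"\w+", text.lower())
    return [CRIME_THESAURUS[t] for t in tokens if t in CRIME_THESAURUS]


def extract_event(text, entities):
    """Combine thesaurus matching with NER output.

    `entities` is assumed to be a list of (span, label) pairs produced by some
    external NER tagger; only the hypothetical labels "LOC" and "DATE" are used.
    Returns None when the tweet contains no thesaurus term.
    """
    crimes = find_crime_terms(text)
    if not crimes:
        return None  # not a crime tweet according to the thesaurus
    event = CrimeEvent(crime=crimes[0])
    for span, label in entities:
        if label == "LOC":
            event.places.append(span)
        elif label == "DATE":
            event.dates.append(span)
    return event


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tweet = "Robo en la Avenida Amazonas ayer"
    ner_output = [("Avenida Amazonas", "LOC"), ("ayer", "DATE")]
    print(extract_event(tweet, ner_output))
    print(round(f_measure(80.95, 59.65), 2))
```

Exact token matching is the simplest way to consult a thesaurus. A real system would likely also handle inflected forms (e.g. "robó"), multiword terms and synonyms. With the reported figures, 2 · 80.95 · 59.65 / (80.95 + 59.65) ≈ 68.69, which matches the stated F-measure.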
Year of publication:
2017
Keywords:
- Data extraction
- Crime
- NER
Source:
Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Data mining
- Crime
- Computer science
Subject areas:
- Computer programming, programs, data, security
- Criminology
- Standard English grammar