Word embedding application for rendering and exploration of unstructured medical information

Abstract:

Digital medical records store information that can describe a patient's health. Word embedding represents text as low-dimensional dense vectors. Such models capture the semantic and syntactic information in a corpus and can therefore be used to identify linguistic relationships between words. Accordingly, the objective of this work is to apply word embedding to represent digital medical documentation and to identify similarities between selected medical terms. The results showed that the words most similar to obesity are asthma, hypertension, diabetes, congestive heart failure, atherosclerotic cardiovascular disease, depression and sleep apnea, all of which are considered comorbidities of obesity. The similarity between obesity and man was also found to differ somewhat from the similarity between obesity and woman.
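The abstract does not state which embedding model or similarity measure was used; cosine similarity between word vectors is a common choice for queries such as "which words are most similar to obesity" or "how similar are obesity and woman". The sketch below assumes cosine similarity and uses a tiny, hand-written vocabulary: the names `term_a` to `term_d`, their vector values, and the helpers `cosine_similarity` and `most_similar` are all hypothetical and are not taken from the paper's trained model or results.

```python
import math

# Hypothetical toy embeddings: words and values are invented for
# illustration only and are NOT the vectors learned in the paper.
EMBEDDINGS = {
    "term_a": [0.9, 0.1, 0.3],
    "term_b": [0.8, 0.2, 0.4],
    "term_c": [0.1, 0.9, 0.2],
    "term_d": [0.2, 0.1, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=3):
    """Rank every other vocabulary word by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Nearest-neighbour query, analogous to "words most similar to obesity".
    for other, score in most_similar("term_a", EMBEDDINGS):
        print(f"{other}: {score:.3f}")
    # Pairwise comparison, analogous to obesity/man versus obesity/woman.
    print(cosine_similarity(EMBEDDINGS["term_a"], EMBEDDINGS["term_c"]))
    print(cosine_similarity(EMBEDDINGS["term_a"], EMBEDDINGS["term_d"]))
```

With these made-up vectors the ranking for `term_a` is `term_b`, then `term_d`, then `term_c`; in a real model the vectors would come from training on the medical corpus, and a library such as gensim provides equivalent nearest-neighbour and pairwise-similarity queries.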