nference

Recapitulation and Retrospective Prediction of Biomedical Associations Using Temporally-enabled Word Embeddings

Abstract: The recent explosion of biomedical knowledge presents both a major opportunity and a major challenge for scientists tackling complex problems in healthcare. Here we present an approach for synthesizing biomedical knowledge based on a combination of word embeddings and select co-occurrences. We evaluated our ability to recapitulate and retrospectively predict disease-gene associations from the Online Mendelian Inheritance in Man (OMIM) resource. Our metrics achieved an area under the curve (AUC) of 0.981 on the recapitulation task for 2,400 disease-gene associations. At the most stringent cutoff, our metrics predicted 13.89% of these associations before their first co-occurrence in the literature, with a median of 4 years between prediction and first co-occurrence. Finally, our literature metrics can be combined with human genetics data to retrospectively predict disease-gene associations, as illustrated by the example of IL-6 and Giant Cell Arteritis. We believe this framework can provide robust biomedical hypotheses at a much faster pace than current standard practices.

  • Authors:
  • Jiho Park¹,
  • Agustin Lopez-Marquez¹,
  • Arjun Puranik¹,
  • Ajit Rajasekharan¹,
  • Murali Aravamudan¹,
  • Enrique Garcia-Rivera¹
  • ¹nference, Cambridge, MA, 02142, USA
  • Copyright:
  • The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.