Tara Khodaei
Postdoctoral Research Associate,
Dept. of Scientific Computing,
Florida State University

"TopicContml: Genome-wide Phylogeny Reconstruction Using Topic Modeling"

Wednesday, Oct 30, 2024

Abstract:

Inferring the evolutionary history of species or populations with genome-wide data is gaining ground, but computational constraints still limit our abilities in this area. We developed an alignment-free method, TopicContml, to infer the genome-wide species tree. The method uses probabilistic topic modeling, an unsupervised machine learning method inspired by natural language processing, to extract topic frequencies from k-mers derived from multilocus DNA sequences, which are then used by Contml to generate a species tree. The approach operates in two phases: (1) The multilocus or genome-wide data are broken down into k-mers; these k-mers are then used to learn a probabilistic topic model, and the topic frequencies of the sequences are extracted for each locus using a Latent Dirichlet Allocation (LDA) model. (2) These topic frequencies from multiple loci are then used to estimate a phylogeny with Contml in the PHYLIP package. We evaluate our method with different biological datasets.
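
As a rough illustration of the first step of phase (1), the sketch below breaks unaligned sequences into overlapping k-mers and pools them into one bag-of-words "document" per population at a single locus, which is the kind of input a topic model such as LDA expects. It is a minimal sketch and not the TopicContml implementation: the function names, the toy sequences, the choice k = 3, the pooling per population, and the skipping of k-mers containing non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter

DNA = set("ACGT")


def kmers(sequence, k):
    """Return all overlapping k-mers of one unaligned DNA sequence.

    Assumption for this sketch: k-mers containing characters other
    than A, C, G, T (for example N or gap symbols) are skipped.
    """
    seq = sequence.upper()
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= DNA
    ]


def kmer_counts(sequences, k):
    """Pool k-mer counts over all sequences of one population at one locus."""
    counts = Counter()
    for s in sequences:
        counts.update(kmers(s, k))
    return counts


# Hypothetical toy locus: two populations with short unaligned sequences.
locus = {
    "popA": ["ACGTAC", "ACGTTC"],
    "popB": ["TTGACA"],
}

# One k-mer "document" per population; a topic model would be fit on these.
documents = {name: kmer_counts(seqs, k=3) for name, seqs in locus.items()}
```

Because the k-mers are counted rather than positioned, sequences of different lengths can be compared without an alignment. In the workflow the abstract describes, per-locus topic frequencies learned from such documents would then be passed to Contml.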

Attachments:
Tara_Khodaei_Research.png (Research, 145 kB)
Colloquium_TaraKhodaei_oct30.pdf (Advertisement, 157 kB)