Our ongoing research addresses the task of finding topics at the sentence
level in email conversations. For instance, an email thread about arranging
a conference can discuss topics like “location and time”, “registration”,
“food menu”, “workshops”, etc. However, as an asynchronous collaborative
application, email has its own characteristics that differ from those of
written monologues (e.g., textbooks, news articles) and spoken dialogs
(e.g., meetings). Hence, existing methods, such as generative topic models
(e.g., Latent Dirichlet Allocation (LDA)) and lexical-chain-based approaches
(e.g., LCSeg), which are successful on monologue or dialog, may not be
effective on their own for asynchronous written conversations such as
email. We argue that finding topics requires considering the conversation
structure and other conversation-specific features. In our experiments on
a small development set, we find that considering conversation structure
significantly improves performance over the existing methods.
To this end, we propose a novel graph-theoretic framework that addresses
the problem using a rich feature set. Crucially, the proposed approach
captures discriminative email features and integrates the strengths of
supervised and unsupervised techniques, while still incorporating LDA and
LCSeg as important factors.