15 December 2021, 14.30 - Series of IAC general seminars: Mario Santoro (IAC)

2021-12-15

The 2021 series of IAC general seminars concludes with a presentation of one of the research activities of a new researcher at the institute.

Topic Modelling (TM) is a widely adopted generative model used to infer the thematic organization of text corpora. When document-level covariate information is available, so-called Structural Topic Modelling (STM) is the state-of-the-art approach to embed this information in the topic mining algorithm. Usually, TM algorithms rely on unigrams as the basic text generation unit, whereas the quality and intelligibility of the identified topics would significantly benefit from the detection and usage of topical phrasemes. Following on from previous research, in this paper we propose the first iterative algorithm to extend STM with n-grams, and we test our solution on textual data collected from four well-known ToR drug marketplaces. Significantly, we employ a STM-guided n-gram selection process, so that topic-specific phrasemes can be identified regardless of their global relevance in the corpus. Our experiments show that enriching the dictionary with selected n-grams improves the usability of STM, allowing the discovery of key information hidden in an apparently “mono-thematic” dataset.

LINK: https://youtu.be/iBUBPYwHiWU