Distilling Contextual Embeddings into A Static Word Embedding for Improving Hacker Forum Analytics

Benjamin Ampel, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Hacker forums provide malicious actors with a large database of tutorials, goods, and assets to leverage for cyber-attacks. Careful research of these forums can provide tremendous benefit to the cybersecurity community through trend identification and exploit categorization. This study aims to provide a novel static word embedding, Hack2Vec, to improve performance on hacker forum classification tasks. Our proposed Hack2Vec model distills contextual representations from the seminal pre-trained language model BERT to a continuous bag-of-words model to create a highly targeted hacker forum static word embedding. The results of our experimental design indicate that Hack2Vec improves performance over prominent embeddings in accuracy, precision, recall, and F1-score for a benchmark hacker forum classification task.

Original language: English (US)
Title of host publication: Proceedings - 2021 IEEE International Conference on Intelligence and Security Informatics, ISI 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665438384
State: Published - 2021
Externally published: Yes
Event: 19th Annual IEEE International Conference on Intelligence and Security Informatics, ISI 2021 - Virtual, Online, United States
Duration: Nov 2 2021 to Nov 3 2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Intelligence and Security Informatics, ISI 2021

Conference

Conference: 19th Annual IEEE International Conference on Intelligence and Security Informatics, ISI 2021
Country/Territory: United States
City: Virtual, Online
Period: 11/2/21 to 11/3/21

Keywords

  • Hacker forums
  • contextual embeddings
  • knowledge distillation
  • static word embeddings
  • text classification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
