Detecting cyber threats in non-English dark net markets: A cross-lingual transfer learning approach

Mohammadreza Ebrahimi, Mihai Surdeanu, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Recent advances in proactive cyber threat intelligence rely on early detection of cyber threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in the hacker community that provide hackers with highly specialized tools and products that may not be found on other platforms. While text classification techniques have been used for cyber threat detection in English DNMs, the task is hindered on non-English platforms by the language barrier and the lack of ground-truth data. Current approaches apply monolingual models to machine-translated data to overcome these challenges. However, translation errors can degrade the classification results. The abundance of data in English DNMs can instead be leveraged to learn non-English threats without machine translation. In this study, we show that a deep cross-lingual model that jointly learns a common language representation from two languages significantly outperforms a monolingual model trained on machine-translated data for identifying cyber threats in non-English DNMs. Unlike most studies, our approach does not require any external data source such as bilingual word embeddings or bilingual lexicons. Our experiments on Russian DNMs show that this approach can achieve better performance than state-of-the-art methods for non-English cyber threat detection in the malicious hacker community.

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Intelligence and Security Informatics, ISI 2018
Editors: Dongwon Lee, Ghita Mezzour, Ponnurangam Kumaraguru, Nitesh Saxena
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 85-90
Number of pages: 6
ISBN (Electronic): 9781538678480
DOI: https://doi.org/10.1109/ISI.2018.8587404
State: Published - Dec 24 2018
Event: 16th IEEE International Conference on Intelligence and Security Informatics, ISI 2018 - Miami, United States
Duration: Nov 9 2018 to Nov 11 2018

Other

Other: 16th IEEE International Conference on Intelligence and Security Informatics, ISI 2018
Country: United States
City: Miami
Period: 11/9/18 to 11/11/18

Keywords

  • Cross-lingual transfer learning
  • Cyber threat
  • Dark Net Markets
  • Deep learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Communication

Cite this

Ebrahimi, M., Surdeanu, M., & Chen, H. (2018). Detecting cyber threats in non-English dark net markets: A cross-lingual transfer learning approach. In D. Lee, G. Mezzour, P. Kumaraguru, & N. Saxena (Eds.), 2018 IEEE International Conference on Intelligence and Security Informatics, ISI 2018 (pp. 85-90). [8587404] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISI.2018.8587404

@inproceedings{fd71071bb6e1462b836358e0bd632004,
title = "Detecting cyber threats in non-English dark net markets: A cross-lingual transfer learning approach",
abstract = "Recent advances in proactive cyber threat intelligence rely on early detection of cyber threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in the hacker community that provide hackers with highly specialized tools and products that may not be found on other platforms. While text classification techniques have been used for cyber threat detection in English DNMs, the task is hindered on non-English platforms by the language barrier and the lack of ground-truth data. Current approaches apply monolingual models to machine-translated data to overcome these challenges. However, translation errors can degrade the classification results. The abundance of data in English DNMs can instead be leveraged to learn non-English threats without machine translation. In this study, we show that a deep cross-lingual model that jointly learns a common language representation from two languages significantly outperforms a monolingual model trained on machine-translated data for identifying cyber threats in non-English DNMs. Unlike most studies, our approach does not require any external data source such as bilingual word embeddings or bilingual lexicons. Our experiments on Russian DNMs show that this approach can achieve better performance than state-of-the-art methods for non-English cyber threat detection in the malicious hacker community.",
keywords = "Cross-lingual transfer learning, Cyber threat, Dark Net Markets, Deep learning",
author = "Mohammadreza Ebrahimi and Mihai Surdeanu and Hsinchun Chen",
year = "2018",
month = "12",
day = "24",
doi = "10.1109/ISI.2018.8587404",
language = "English (US)",
pages = "85--90",
editor = "Dongwon Lee and Ghita Mezzour and Ponnurangam Kumaraguru and Nitesh Saxena",
booktitle = "2018 IEEE International Conference on Intelligence and Security Informatics, ISI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Detecting cyber threats in non-English dark net markets

T2 - A cross-lingual transfer learning approach

AU - Ebrahimi, Mohammadreza

AU - Surdeanu, Mihai

AU - Chen, Hsinchun

PY - 2018/12/24

Y1 - 2018/12/24

AB - Recent advances in proactive cyber threat intelligence rely on early detection of cyber threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in the hacker community that provide hackers with highly specialized tools and products that may not be found on other platforms. While text classification techniques have been used for cyber threat detection in English DNMs, the task is hindered on non-English platforms by the language barrier and the lack of ground-truth data. Current approaches apply monolingual models to machine-translated data to overcome these challenges. However, translation errors can degrade the classification results. The abundance of data in English DNMs can instead be leveraged to learn non-English threats without machine translation. In this study, we show that a deep cross-lingual model that jointly learns a common language representation from two languages significantly outperforms a monolingual model trained on machine-translated data for identifying cyber threats in non-English DNMs. Unlike most studies, our approach does not require any external data source such as bilingual word embeddings or bilingual lexicons. Our experiments on Russian DNMs show that this approach can achieve better performance than state-of-the-art methods for non-English cyber threat detection in the malicious hacker community.

KW - Cross-lingual transfer learning

KW - Cyber threat

KW - Dark Net Markets

KW - Deep learning

UR - http://www.scopus.com/inward/record.url?scp=85061055388&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061055388&partnerID=8YFLogxK

U2 - 10.1109/ISI.2018.8587404

DO - 10.1109/ISI.2018.8587404

M3 - Conference contribution

AN - SCOPUS:85061055388

SP - 85

EP - 90

BT - 2018 IEEE International Conference on Intelligence and Security Informatics, ISI 2018

A2 - Lee, Dongwon

A2 - Mezzour, Ghita

A2 - Kumaraguru, Ponnurangam

A2 - Saxena, Nitesh

PB - Institute of Electrical and Electronics Engineers Inc.

ER -