Chinese underground market jargon analysis based on unsupervised learning

Kangzhi Zhao, Yong Zhang, Chunxiao Xing, Weifeng Li, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

With the rapid growth of online population, China has become the world's largest online market. This also gives rise to the Chinese underground market, which has facilitated many of the cybercrimes in China. Consequently, there is a need for research scrutinizing Chinese underground markets. One major challenge facing cybersecurity researchers is to understand the unfamiliar cybercriminal jargons. To this end, we are motivated to analyze jargons in Chinese underground market. Particularly, we utilize the recent advancements in unsupervised machine learning methods, word embedding and Latent Dirichlet Allocation. We evaluate our work on a research testbed encompassing 29 exclusive underground market QQ groups with 23,000 members. Specifically, we test the ability of the proposed approach to learn semantically similar words of known cybersecurity-related jargons. Results suggest the state-of-the-art unsupervised learning approaches can help better understand cybercriminal language, providing promising insights for future research on Chinese underground markets.
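The abstract describes testing whether learned word embeddings place semantically similar words near known jargon terms. It does not say how that lookup is implemented, so the following is only a minimal sketch of one common way to do it: rank vocabulary words by cosine similarity to a seed term. The function names, the placeholder tokens, and the three-dimensional vectors are all hypothetical and are not taken from the paper or its data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(seed, embeddings, top_n=3):
    """Rank every other vocabulary word by cosine similarity to the seed word."""
    seed_vec = embeddings[seed]
    scored = [
        (word, cosine(seed_vec, vec))
        for word, vec in embeddings.items()
        if word != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors with placeholder tokens; a real model would be
# trained on the chat corpus and have far more dimensions.
TOY_EMBEDDINGS = {
    "jargon_a": [0.9, 0.1, 0.0],
    "jargon_b": [0.8, 0.2, 0.1],
    "chat_word": [0.0, 0.1, 0.9],
    "greeting": [0.1, 0.0, 0.8],
}

if __name__ == "__main__":
    for word, score in most_similar("jargon_a", TOY_EMBEDDINGS, top_n=2):
        print(f"{word}\t{score:.3f}")
```

With trained embeddings, the same ranking step would take a known jargon term as the seed and return candidate related terms for an analyst to review.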

Original language: English (US)
Title of host publication: IEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 97-102
Number of pages: 6
ISBN (Electronic): 9781509038657
DOI: https://doi.org/10.1109/ISI.2016.7745450
State: Published - Nov 15 2016
Event: 14th IEEE International Conference on Intelligence and Security Informatics, ISI 2016 - Tucson, United States
Duration: Sep 28 2016 to Sep 30 2016

Other

Other: 14th IEEE International Conference on Intelligence and Security Informatics, ISI 2016
Country: United States
City: Tucson
Period: 9/28/16 to 9/30/16

Fingerprint

Unsupervised learning
Testbeds
Learning systems
Market analysis

Keywords

  • Chinese underground market
  • cybersecurity
  • language model
  • unsupervised learning

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality

Cite this

Zhao, K., Zhang, Y., Xing, C., Li, W., & Chen, H. (2016). Chinese underground market jargon analysis based on unsupervised learning. In IEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016 (pp. 97-102). [7745450] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISI.2016.7745450

@inproceedings{068a4dce54f54500bbc03f673159d310,
title = "Chinese underground market jargon analysis based on unsupervised learning",
abstract = "With the rapid growth of online population, China has become the world's largest online market. This also gives rise to the Chinese underground market, which has facilitated many of the cybercrimes in China. Consequently, there is a need for research scrutinizing Chinese underground markets. One major challenge facing cybersecurity researchers is to understand the unfamiliar cybercriminal jargons. To this end, we are motivated to analyze jargons in Chinese underground market. Particularly, we utilize the recent advancements in unsupervised machine learning methods, word embedding and Latent Dirichlet Allocation. We evaluate our work on a research testbed encompassing 29 exclusive underground market QQ groups with 23,000 members. Specifically, we test the ability of the proposed approach to learn semantically similar words of known cybersecurity-related jargons. Results suggest the state-of-the-art unsupervised learning approaches can help better understand cybercriminal language, providing promising insights for future research on Chinese underground markets.",
keywords = "Chinese underground market, cybersecurity, language model, unsupervised learning",
author = "Kangzhi Zhao and Yong Zhang and Chunxiao Xing and Weifeng Li and Hsinchun Chen",
year = "2016",
month = "11",
day = "15",
doi = "10.1109/ISI.2016.7745450",
language = "English (US)",
pages = "97--102",
booktitle = "IEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY  - GEN
T1  - Chinese underground market jargon analysis based on unsupervised learning
AU  - Zhao, Kangzhi
AU  - Zhang, Yong
AU  - Xing, Chunxiao
AU  - Li, Weifeng
AU  - Chen, Hsinchun
PY  - 2016/11/15
Y1  - 2016/11/15
AB  - With the rapid growth of online population, China has become the world's largest online market. This also gives rise to the Chinese underground market, which has facilitated many of the cybercrimes in China. Consequently, there is a need for research scrutinizing Chinese underground markets. One major challenge facing cybersecurity researchers is to understand the unfamiliar cybercriminal jargons. To this end, we are motivated to analyze jargons in Chinese underground market. Particularly, we utilize the recent advancements in unsupervised machine learning methods, word embedding and Latent Dirichlet Allocation. We evaluate our work on a research testbed encompassing 29 exclusive underground market QQ groups with 23,000 members. Specifically, we test the ability of the proposed approach to learn semantically similar words of known cybersecurity-related jargons. Results suggest the state-of-the-art unsupervised learning approaches can help better understand cybercriminal language, providing promising insights for future research on Chinese underground markets.
KW  - Chinese underground market
KW  - cybersecurity
KW  - language model
KW  - unsupervised learning
UR  - http://www.scopus.com/inward/record.url?scp=85004093068&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85004093068&partnerID=8YFLogxK
U2  - 10.1109/ISI.2016.7745450
DO  - 10.1109/ISI.2016.7745450
M3  - Conference contribution
AN  - SCOPUS:85004093068
SP  - 97
EP  - 102
BT  - IEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -