Machine learning for attack vector identification in malicious source code

Victor A. Benjamin, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

As computers and information technologies become ubiquitous throughout society, the security of our networks and information technologies is a growing concern. As a result, many researchers have become interested in the security domain. Among them, there is growing interest in observing hacker communities for early detection of developing security threats and trends. Research in this area has often reported hackers openly sharing cybercriminal assets and knowledge with one another. In particular, the sharing of raw malware source code files has been documented in past work. Unfortunately, malware code documentation often appears to be missing, incomplete, or written in a language foreign to researchers. Thus, analysis of such source files embedded within hacker communities has been limited. Here we utilize a subset of popular machine learning methodologies for the automated analysis of malware source code files. Specifically, we explore genetic algorithms to resolve questions related to feature selection within the context of malware analysis. Next, we utilize two common classification algorithms to test selected features for identification of malware attack vectors. Results suggest a promising direction in utilizing such techniques to help with the automated analysis of malware source code.
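The pipeline the abstract outlines (a genetic algorithm chooses a feature subset, then a classifier is scored on that subset) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy token-count features, the two class labels, the GA parameters, the per-feature penalty, and the nearest-centroid classifier, which merely stands in for the two unnamed classification algorithms.

```python
import random

def nearest_centroid_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected, nothing to classify with
    correct = 0
    for held in range(len(rows)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == held:
                continue  # hold this sample out of the centroids
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from the held-out row to a class centroid.
            return sum((rows[held][c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == labels[held]
    return correct / len(rows)

def select_features(rows, labels, generations=30, pop_size=20,
                    mutation_rate=0.1, seed=0):
    """Genetic search over feature bitmasks (hypothetical parameters)."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        # Reward accuracy, lightly penalise each selected feature (assumed penalty).
        return nearest_centroid_accuracy(rows, labels, mask) - 0.01 * sum(mask)

    # Start from the all-features mask plus random masks.
    population = [tuple([1] * n)] + [
        tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, then one-point crossover.
            a, b = (max(rng.sample(ranked, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)

# Toy data: each row holds counts of five hypothetical source-code tokens.
rows = [
    [5, 0, 1, 3, 2], [4, 1, 0, 1, 4], [6, 0, 1, 2, 1],  # class "network"
    [0, 5, 1, 2, 3], [1, 4, 0, 3, 1], [0, 6, 1, 1, 2],  # class "web"
]
labels = ["network", "network", "network", "web", "web", "web"]

best_mask, best_score = select_features(rows, labels)
print("selected feature mask:", best_mask, "fitness:", best_score)
```

Because the all-features mask is in the initial population and the best masks are carried forward each generation, the returned subset never scores worse than using every feature under this fitness function.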

Original language: English (US)
Title of host publication: IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics
Pages: 21-23
Number of pages: 3
DOI: https://doi.org/10.1109/ISI.2013.6578779
State: Published - 2013
Event: 11th IEEE International Conference on Intelligence and Security Informatics, IEEE ISI 2013 - Seattle, WA, United States
Duration: Jun 4 2013 to Jun 7 2013

Other

Other: 11th IEEE International Conference on Intelligence and Security Informatics, IEEE ISI 2013
Country: United States
City: Seattle, WA
Period: 6/4/13 to 6/7/13

Fingerprint

Learning systems
Identification (control systems)
Information technology
Malware
Feature extraction
Genetic algorithms

Keywords

  • Cyber security
  • Malware analysis
  • Static analysis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

Benjamin, V. A., & Chen, H. (2013). Machine learning for attack vector identification in malicious source code. In IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics (pp. 21-23). [6578779] https://doi.org/10.1109/ISI.2013.6578779

@inproceedings{071c970fdb4842fd88a5fa86c0c8b915,
title = "Machine learning for attack vector identification in malicious source code",
abstract = "As computers and information technologies become ubiquitous throughout society, the security of our networks and information technologies is a growing concern. As a result, many researchers have become interested in the security domain. Among them, there is growing interest in observing hacker communities for early detection of developing security threats and trends. Research in this area has often reported hackers openly sharing cybercriminal assets and knowledge with one another. In particular, the sharing of raw malware source code files has been documented in past work. Unfortunately, malware code documentation appears often times to be missing, incomplete, or written in a language foreign to researchers. Thus, analysis of such source files embedded within hacker communities has been limited. Here we utilize a subset of popular machine learning methodologies for the automated analysis of malware source code files. Specifically, we explore genetic algorithms to resolve questions related to feature selection within the context of malware analysis. Next, we utilize two common classification algorithms to test selected features for identification of malware attack vectors. Results suggest promising direction in utilizing such techniques to help with the automated analysis of malware source code.",
keywords = "Cyber security, Malware analysis, Static analysis",
author = "Benjamin, {Victor A.} and Hsinchun Chen",
year = "2013",
doi = "10.1109/ISI.2013.6578779",
language = "English (US)",
isbn = "9781467362115",
pages = "21--23",
booktitle = "IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics",

}

TY - GEN

T1 - Machine learning for attack vector identification in malicious source code

AU - Benjamin, Victor A.

AU - Chen, Hsinchun

PY - 2013

Y1 - 2013

AB - As computers and information technologies become ubiquitous throughout society, the security of our networks and information technologies is a growing concern. As a result, many researchers have become interested in the security domain. Among them, there is growing interest in observing hacker communities for early detection of developing security threats and trends. Research in this area has often reported hackers openly sharing cybercriminal assets and knowledge with one another. In particular, the sharing of raw malware source code files has been documented in past work. Unfortunately, malware code documentation appears often times to be missing, incomplete, or written in a language foreign to researchers. Thus, analysis of such source files embedded within hacker communities has been limited. Here we utilize a subset of popular machine learning methodologies for the automated analysis of malware source code files. Specifically, we explore genetic algorithms to resolve questions related to feature selection within the context of malware analysis. Next, we utilize two common classification algorithms to test selected features for identification of malware attack vectors. Results suggest promising direction in utilizing such techniques to help with the automated analysis of malware source code.

KW - Cyber security

KW - Malware analysis

KW - Static analysis

UR - http://www.scopus.com/inward/record.url?scp=84883413892&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883413892&partnerID=8YFLogxK

U2 - 10.1109/ISI.2013.6578779

DO - 10.1109/ISI.2013.6578779

M3 - Conference contribution

SN - 9781467362115

SP - 21

EP - 23

BT - IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics

ER -