A semantics-based approach to malware detection

Mila Dalla Preda, Mihai Christodorescu, Somesh Jha, Saumya K Debray

Research output: Contribution to journal (Article)

64 Citations (Scopus)

Abstract

Malware detection is a crucial aspect of software security. Current malware detectors work by checking for signatures, which attempt to capture the syntactic characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes current detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter the syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behavior of malware as well as that of the program being checked for infection, and uses abstract interpretation to "hide" irrelevant aspects of these behaviors. As a concrete application of our approach, we show that (1) standard signature matching detection schemes are generally sound but not complete, (2) the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers and (3) the malware detection scheme proposed by Kinder et al. and based on standard model-checking techniques is sound in general and complete on some, but not all, obfuscations handled by the semantics-aware malware detector.
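The contrast the abstract draws between syntactic signatures and behavior-based detection can be illustrated with a toy sketch. This is not the paper's formal framework: the three-instruction language, the NOP-insertion obfuscation, and every function name below are assumptions made for illustration only. The "abstraction" here simply collapses repeated states in a trace, a very loose stand-in for using abstract interpretation to hide irrelevant aspects of behavior.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction set: ("set", var, n), ("add", var, n), ("nop",)
Instr = Tuple
State = Tuple[Tuple[str, int], ...]  # immutable snapshot of all variables


def run(program: List[Instr]) -> List[State]:
    """Execute the program and return its trace (one state per instruction)."""
    env: Dict[str, int] = {}
    trace: List[State] = []
    for ins in program:
        op = ins[0]
        if op == "set":
            env[ins[1]] = ins[2]
        elif op == "add":
            env[ins[1]] = env.get(ins[1], 0) + ins[2]
        elif op != "nop":
            raise ValueError(f"unknown instruction: {ins!r}")
        trace.append(tuple(sorted(env.items())))
    return trace


def abstract(trace: List[State]) -> List[State]:
    """Collapse consecutive identical states, hiding steps that change nothing."""
    out: List[State] = []
    for state in trace:
        if not out or out[-1] != state:
            out.append(state)
    return out


def contains(haystack: list, needle: list) -> bool:
    """True if needle occurs as a contiguous sublist of haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def signature_detect(program: List[Instr], malware: List[Instr]) -> bool:
    """Syntactic check: look for the malware's exact instruction sequence."""
    return contains(program, malware)


def semantic_detect(program: List[Instr], malware: List[Instr]) -> bool:
    """Behavioral check: compare abstracted traces instead of instructions."""
    return contains(abstract(run(program)), abstract(run(malware)))


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Toy obfuscation: follow every instruction with a no-op."""
    out: List[Instr] = []
    for ins in program:
        out.extend([ins, ("nop",)])
    return out


MALWARE = [("set", "x", 1), ("add", "x", 2), ("set", "y", 7)]
obfuscated = insert_nops(MALWARE)

if __name__ == "__main__":
    print(signature_detect(obfuscated, MALWARE))  # False: the byte pattern is broken
    print(semantic_detect(obfuscated, MALWARE))   # True: the behavior is unchanged
```

In this sketch the signature check misses the obfuscated variant because the instruction sequence no longer matches, while the trace comparison still finds it because the inserted no-ops leave the sequence of distinct states untouched. Real obfuscations (register renaming, code reordering, and so on) would need a richer abstraction than the one assumed here.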

Original language: English (US)
Article number: 25
Journal: ACM Transactions on Programming Languages and Systems
Volume: 30
Issue number: 5
DOI: 10.1145/1387673.1387674
State: Published - Aug 1 2008

Keywords

  • Abstract interpretation
  • Malware detection
  • Obfuscation
  • Trace semantics

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

A semantics-based approach to malware detection. / Dalla Preda, Mila; Christodorescu, Mihai; Jha, Somesh; Debray, Saumya K.

In: ACM Transactions on Programming Languages and Systems, Vol. 30, No. 5, 25, 01.08.2008.

Dalla Preda, Mila ; Christodorescu, Mihai ; Jha, Somesh ; Debray, Saumya K. / A semantics-based approach to malware detection. In: ACM Transactions on Programming Languages and Systems. 2008 ; Vol. 30, No. 5.
@article{7e1b2b898d7d4382b99f7eae2fd6f726,
title = "A semantics-based approach to malware detection",
abstract = "Malware detection is a crucial aspect of software security. Current malware detectors work by checking for signatures, which attempt to capture the syntactic characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes current detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter the syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behavior of malware as well as that of the program being checked for infection, and uses abstract interpretation to {"}hide{"} irrelevant aspects of these behaviors. As a concrete application of our approach, we show that (1) standard signature matching detection schemes are generally sound but not complete, (2) the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers and (3) the malware detection scheme proposed by Kinder et al. and based on standard model-checking techniques is sound in general and complete on some, but not all, obfuscations handled by the semantics-aware malware detector.",
keywords = "Abstract interpretation, Malware detection, Obfuscation, Trace semantics",
author = "{Dalla Preda}, Mila and Mihai Christodorescu and Somesh Jha and Debray, {Saumya K}",
year = "2008",
month = "8",
day = "1",
doi = "10.1145/1387673.1387674",
language = "English (US)",
volume = "30",
journal = "ACM Transactions on Programming Languages and Systems",
issn = "0164-0925",
publisher = "Association for Computing Machinery (ACM)",
number = "5",

}

TY - JOUR

T1 - A semantics-based approach to malware detection

AU - Dalla Preda, Mila

AU - Christodorescu, Mihai

AU - Jha, Somesh

AU - Debray, Saumya K

PY - 2008/8/1

Y1 - 2008/8/1

AB - Malware detection is a crucial aspect of software security. Current malware detectors work by checking for signatures, which attempt to capture the syntactic characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes current detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter the syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behavior of malware as well as that of the program being checked for infection, and uses abstract interpretation to "hide" irrelevant aspects of these behaviors. As a concrete application of our approach, we show that (1) standard signature matching detection schemes are generally sound but not complete, (2) the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers and (3) the malware detection scheme proposed by Kinder et al. and based on standard model-checking techniques is sound in general and complete on some, but not all, obfuscations handled by the semantics-aware malware detector.

KW - Abstract interpretation

KW - Malware detection

KW - Obfuscation

KW - Trace semantics

UR - http://www.scopus.com/inward/record.url?scp=51849164885&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51849164885&partnerID=8YFLogxK

U2 - 10.1145/1387673.1387674

DO - 10.1145/1387673.1387674

M3 - Article

AN - SCOPUS:51849164885

VL - 30

JO - ACM Transactions on Programming Languages and Systems

JF - ACM Transactions on Programming Languages and Systems

SN - 0164-0925

IS - 5

M1 - 25

ER -