Anomaly-based fault detection system in distributed system

Byoung Uk Kim, Salim A Hariri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Reliability and robustness to hardware and software failures are important design criteria for distributed systems and their applications. The growing complexity, interconnectedness, and dependency of system components, together with the asynchronous interactions among them, make fault detection and tolerance a challenging research problem. These components include hardware resources (computers, servers, network devices) and software (application services, middleware, web services, etc.). In this paper, we present an approach based on statistical and data mining techniques that detects faults (hardware or software) and identifies the source of each fault. In our approach, we monitor and analyze in real time all the interactions between the components of a distributed system. We use data mining and supervised learning techniques to obtain rules that accurately model the normal interactions among these components. Our anomaly analysis engine produces an alert as soon as one or more of the interaction rules that capture normal operation are violated because of a software or hardware failure. We evaluate the effectiveness and performance of our approach in detecting software faults that we inject asynchronously, and compare the results for different noise levels.
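The idea of learning rules for normal interactions and alerting on violations can be illustrated with a minimal sketch. This is not the authors' engine: the abstract does not specify the rule format or the learning algorithm, so the sketch swaps in a simple statistical threshold (mean plus k standard deviations of latency) per interaction. The tuple layout `(source, target, operation, latency_ms)`, the function names `learn_rules` and `check`, and the parameter `k` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_rules(normal_trace, k=3.0):
    """Derive one rule per (source, target, operation) seen in fault-free traffic.

    Assumed rule form: latency must not exceed mean + k * standard deviation.
    """
    samples = defaultdict(list)
    for src, dst, op, latency_ms in normal_trace:
        samples[(src, dst, op)].append(latency_ms)
    return {key: mean(vals) + k * pstdev(vals) for key, vals in samples.items()}


def check(rules, interaction):
    """Return None for a normal interaction, otherwise an alert string
    naming the components involved (a crude pointer to the fault source)."""
    src, dst, op, latency_ms = interaction
    limit = rules.get((src, dst, op))
    if limit is None:
        # No rule was learned for this interaction, so it is treated as anomalous.
        return f"unknown interaction {src}->{dst} {op}"
    if latency_ms > limit:
        return f"slow interaction {src}->{dst} {op}: {latency_ms} ms exceeds {limit:.1f} ms"
    return None


# Hypothetical fault-free trace used only to exercise the sketch.
normal = [
    ("web", "db", "query", 10.0),
    ("web", "db", "query", 12.0),
    ("web", "db", "query", 11.0),
    ("app", "web", "get", 5.0),
    ("app", "web", "get", 7.0),
]
rules = learn_rules(normal)

for event in [("web", "db", "query", 11.5),
              ("web", "db", "query", 50.0),
              ("web", "cache", "get", 1.0)]:
    alert = check(rules, event)
    print(event, "->", alert or "ok")
```

A real deployment would need richer rules than a single latency bound, for example rules induced by a supervised learner over many interaction attributes, as the abstract suggests.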

Original language: English (US)
Title of host publication: Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications
Pages: 782-789
Number of pages: 8
DOI: https://doi.org/10.1109/SERA.2007.55
State: Published - 2007
Event: SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications - Busan, Korea, Republic of
Duration: Aug 20 2007 to Aug 22 2007

Other

Other: SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications
Country: Korea, Republic of
City: Busan
Period: 8/20/07 to 8/22/07

Fingerprint

Fault detection
Hardware
Data mining
Supervised learning
Fault tolerance
Computer networks
Middleware
Application programs
Web services
Computer hardware
Servers
Engines
Anomaly
Distributed systems
Software
Interaction
Fault

ASJC Scopus subject areas

  • Software
  • Management Science and Operations Research
  • Engineering(all)

Cite this

Kim, B. U., & Hariri, S. A. (2007). Anomaly-based fault detection system in distributed system. In Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications (pp. 782-789). [4297016] https://doi.org/10.1109/SERA.2007.55

Anomaly-based fault detection system in distributed system. / Kim, Byoung Uk; Hariri, Salim A.

Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications. 2007. p. 782-789 4297016.

Kim, BU & Hariri, SA 2007, Anomaly-based fault detection system in distributed system. in Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications., 4297016, pp. 782-789, SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications, Busan, Korea, Republic of, 8/20/07. https://doi.org/10.1109/SERA.2007.55
Kim BU, Hariri SA. Anomaly-based fault detection system in distributed system. In Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications. 2007. p. 782-789. 4297016 https://doi.org/10.1109/SERA.2007.55
Kim, Byoung Uk ; Hariri, Salim A. / Anomaly-based fault detection system in distributed system. Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications. 2007. pp. 782-789
@inproceedings{851046cf2a7f4552b167b38aa7bebed9,
title = "Anomaly-based fault detection system in distributed system",
author = "Kim, {Byoung Uk} and Hariri, {Salim A}",
year = "2007",
doi = "10.1109/SERA.2007.55",
language = "English (US)",
isbn = "0769528678",
pages = "782--789",
booktitle = "Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications",

}

TY - GEN

T1 - Anomaly-based fault detection system in distributed system

AU - Kim, Byoung Uk

AU - Hariri, Salim A

PY - 2007

Y1 - 2007

UR - http://www.scopus.com/inward/record.url?scp=38649111955&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38649111955&partnerID=8YFLogxK

U2 - 10.1109/SERA.2007.55

DO - 10.1109/SERA.2007.55

M3 - Conference contribution

AN - SCOPUS:38649111955

SN - 0769528678

SN - 9780769528670

SP - 782

EP - 789

BT - Proceedings - SERA 2007: Fifth ACIS International Conference on Software Engineering Research, Management, and Applications

ER -