|Governmental and private institutions rely heavily on reliable computer networks for
their everyday business transactions. The downtime of their infrastructure networks may result in millions of dollars in cost. Fault management systems are used to keep today’s complex networks running without significant downtime cost, either by using active techniques or passive techniques. Active techniques impose excessive management traffic, whereas passive techniques often ignore uncertainty inherent in network alarms,leading to unreliable fault identification performance. In this research work, new
algorithms are proposed for both types of techniques so as address these handicaps.
Active techniques use probing technology so that the managed network can be tested periodically and suspected malfunctioning nodes can be effectively identified and
isolated. However, the diagnosing probes introduce extra management traffic and storage space. To address this issue, two new CSP (Constraint Satisfaction Problem)-based algorithms are proposed to minimize management traffic, while effectively maintain the same diagnostic power of the available probes. The first algorithm is based on the standard CSP formulation which aims at reducing the available dependency matrix significantly as means to reducing the number of probes. The obtained probe set is used for fault detection and fault identification. The second algorithm is a fuzzy CSP-based algorithm. This proposed algorithm is adaptive algorithm in the sense that an initial reduced fault detection probe set is utilized to determine the minimum set of probes used
for fault identification. Based on the extensive experiments conducted in this research both algorithms have demonstrated advantages over existing methods in terms of the overall management traffic needed to successfully monitor the targeted network system.
Passive techniques employ alarms emitted by network entities. However, the fault
evidence provided by these alarms can be ambiguous, inconsistent, incomplete, and
random. To address these limitations, alarms are correlated using a distributed Dempster-Shafer Evidence Theory (DSET) framework, in which the managed network is divided into a cluster of disjoint management domains. Each domain is assigned an Intelligent Agent for collecting and analyzing the alarms generated within that domain. These agents are coordinated by a single higher level entity, i.e., an agent manager that combines the partial views of these agents into a global one. Each agent employs DSET-based algorithm that utilizes the probabilistic knowledge encoded in the available fault propagation model to construct a local composite alarm. The Dempster‘s rule of combination is then used by the agent manager to correlate these local composite alarms.
Furthermore, an adaptive fuzzy DSET-based algorithm is proposed to utilize the fuzzy
information provided by the observed cluster of alarms so as to accurately identify the malfunctioning network entities. In this way, inconsistency among the alarms is removed by weighing each received alarm against the others, while randomness and ambiguity of the fault evidence are addressed within soft computing framework. The effectiveness of
this framework has been investigated based on extensive experiments.
The proposed fault management system is able to detect malfunctioning behavior
in the managed network with considerably less management traffic. Moreover, it
effectively manages the uncertainty property intrinsically contained in network alarms,thereby reducing its negative impact and significantly improving the overall performance of the fault management system.