Fail-fast

Papers published on a yearly basis

Papers

Journal Article • 10.1016/J.RESS.2010.06.014

Combinatorial analysis of systems with competing failures subject to failure isolation and propagation effects

Liudong Xing¹,², Gregory Levitin²,³

University of Massachusetts Dartmouth¹, University of Electronic Science and Technology of China², Israel Electric Corporation³

01 Nov 2010 - Reliability Engineering & System Safety

TL;DR: This paper presents a combinatorial method, based on the total probability theorem, for the reliability analysis of systems subject to competing propagated failures and failure isolation effects. The method is analytical, exact, and places no limitation on the type of time-to-failure distributions of the system components.
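
For reference, the total probability theorem named in the summary can be written as below. This is only the standard statement of the theorem; the events $B_i$ and the way the paper partitions the failure scenarios are not given in this listing, so the symbols here are generic placeholders rather than the authors' notation.

```latex
% Law of total probability: B_1, ..., B_n are mutually exclusive and
% collectively exhaustive events with P(B_i) > 0 (generic placeholders).
P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)
```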

75 citations

Journal Article • 10.1109/MS.2004.1331296

Fail fast [software debugging]

J. Shore

01 Sep 2004 - IEEE Software

TL;DR: Building software to "fail fast" is a simple technique that dramatically reduces the number of hard-to-diagnose bugs. It does not reduce the overall number of bugs, at least not at first, but it makes most defects much easier to find.

Abstract: The most annoying aspect of software development is debugging. We don't mind the kinds of bugs that yield to a few minutes inspection. The bugs we hate are the ones that show up only after hours of successful operation, under unusual circumstances, or whose stack traces lead to dead ends. Fortunately, there's a simple technique that dramatically reduces the number of these bugs in our software. It won't reduce the overall number of bugs, at least not at first, but it'll make most defects much easier to find. The technique is to build our software to "fail fast".
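
As a rough illustration of the idea in this abstract, the sketch below contrasts a lenient lookup with a fail-fast one. It is not taken from the article: the function names and the `max_connections` configuration key are hypothetical, chosen only to show a defect surfacing at the point where it is first detectable.

```python
def read_max_connections_lenient(config: dict) -> int:
    # Lenient style: a missing or misspelled key silently falls back to a
    # default, so the defect only shows up later as odd runtime behavior.
    return int(config.get("max_connections", 10))


def read_max_connections_fail_fast(config: dict) -> int:
    # Fail-fast style: stop immediately, where the problem is first
    # detectable, with a message that names the cause.
    if "max_connections" not in config:
        raise KeyError("config is missing required key 'max_connections'")
    value = int(config["max_connections"])
    if value <= 0:
        raise ValueError(f"max_connections must be positive, got {value}")
    return value
```

With a misspelled key such as `{"max_conections": 50}`, the lenient version quietly returns the default, while the fail-fast version raises at once and points at the missing key.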

66 citations

Journal Article • 10.1109/TASE.2005.860613

Diagnosis of repeated failures for discrete event systems with linear-time temporal-logic specifications

Shengbing Jiang, Ratnesh Kumar

03 Jan 2006 - IEEE Transactions on Automation Science and Engineering

TL;DR: It turns out that repeatable failures can be specified as violations of invariant properties (i.e., properties that must always hold) in a system, and an algorithm is presented to refine the system model and label those states of the refined system where the property is violated.

Abstract: In our earlier work, we introduced a state-based approach for the diagnosis of repeatedly occurring failures in discrete event systems (DESs). Since temporal logic provides a simpler way of specifying system properties, in this paper a temporal-logic-based approach for diagnosing the occurrence of a repeated number of failures is developed. Linear-time temporal-logic (LTL) formulae are used to represent the specifications of DESs. Notions of prediagnosability for failures and diagnosability for repeated failures are introduced in the setting of temporal logic. A polynomial algorithm for the test of prediagnosability for failures is provided. The diagnosis problem for repeated failures in the temporal-logic setting is reduced to one in a state-based setting, and so the prior results of a state-based repeated failure diagnosis can be applied. Finally, a simple example is given for illustration. Note to Practitioners: Certain failures in a system are repeatable, such as routing errors in a manufacturing system. A theory for the diagnosis of such failures was presented in an earlier work of Jiang et al. The present paper uses temporal logic to specify such failures. It turns out that repeatable failures can be specified as violations of invariant properties (i.e., properties that must always hold). Given an invariant property that the system must always satisfy, an algorithm is presented to refine the system model and label those states of the refined system where the property is violated. The problem of repeated diagnosis then requires determining, within a bounded delay, each time a "failure-state" is visited. For this analysis, the existing theory developed by Jiang et al. is used.
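
The notion of a failure as a violated invariant can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it assumes every state is directly observable and skips both the LTL specification and the model refinement step. The routing states and the invariant are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set


def label_failure_states(
    states: Iterable[Hashable], invariant: Callable[[Hashable], bool]
) -> Set[Hashable]:
    # A state is a "failure-state" exactly when the invariant does not hold in it.
    return {state for state in states if not invariant(state)}


def count_failure_visits(
    trace: Iterable[Hashable], failure_states: Set[Hashable]
) -> int:
    # Count every visit to a failure-state, so repeated failures are all counted.
    return sum(1 for state in trace if state in failure_states)


# Hypothetical routing example: a state is (part, station).
STATES = [("A", "S1"), ("A", "S2"), ("B", "S1"), ("B", "S2")]


def routing_invariant(state) -> bool:
    # Assumed invariant: part "A" must never be routed to station "S2".
    part, station = state
    return not (part == "A" and station == "S2")


FAILURE_STATES = label_failure_states(STATES, routing_invariant)
TRACE = [("A", "S1"), ("A", "S2"), ("B", "S2"), ("A", "S2")]
```

Here `count_failure_visits(TRACE, FAILURE_STATES)` counts the two visits to `("A", "S2")`. In the setting the abstract describes, such visits must instead be inferred within a bounded delay, which is the hard part this sketch leaves out.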

61 citations

Proceedings Article • 10.2514/6.2012-3602

Common Cause Failures and Ultra Reliability

Harry W. Jones¹

Ames Research Center¹

15 Jul 2012

TL;DR: A common cause failure occurs when several failures have the same origin. Such failures are either common event failures, where the cause is a single external event, or common mode failures, where two systems fail in the same way for the same reason.

Abstract: A common cause failure occurs when several failures have the same origin. Common cause failures are either common event failures, where the cause is a single external event, or common mode failures, where two systems fail in the same way for the same reason. Common mode failures can occur at different times because of a design defect or a repeated external event. Common event failures reduce the reliability of on-line redundant systems but not of systems using off-line spare parts. Common mode failures reduce the dependability of systems using off-line spare parts and on-line redundancy.

39 citations

Patent

Multi-agent cooperative transaction method and system

Qiming Chen, Umeshwar Dayal¹

Hewlett-Packard¹

23 May 2001

TL;DR: A failure detector determines whether a failure is intra-enterprise or inter-enterprise. An intra-enterprise failure handler coupled to the detector performs failure recovery within the first enterprise, and an inter-enterprise failure handler, also coupled to the detector, performs failure recovery in a second enterprise to which a failure in the first enterprise has been transferred.

Abstract: A method and system for processing multi-agent cooperative transactions. A failure detector is provided for detecting whether a failure is an intra-enterprise failure or an inter-enterprise failure. An intra-enterprise failure handler is coupled to the failure detector for performing failure recovery for intra-enterprise failures. Failure recovery for intra-enterprise failures can include identifying the scope of failure recovery within a first enterprise. Once the scope of failure recovery has been identified, a top-down undo operation of sub-transactions in the identified scope may be performed within the first enterprise. An inter-enterprise failure handler is also coupled to the failure detector for performing failure recovery for inter-enterprise failures. Failure recovery for inter-enterprise failures can include identifying the scope of failure recovery in a second enterprise to which a failure in a first enterprise has been transferred. Once the scope of failure recovery has been identified, a top-down undo operation of sub-transactions in the identified scope may be performed in the second enterprise.

38 citations

Year	Papers
2021	1
2016	3
2015	2
2014	1
2013	2
2012	4
