Top 29 papers presented at Source Code Analysis and Manipulation in 2008

Proceedings Article•10.1109/SCAM.2008.22•

User-Input Dependence Analysis via Graph Reachability

Bernhard Scholz, Chenyi Zhang, Cristina Cifuentes

1 Mar 2008

TL;DR: A static program analysis for computing user-input dependencies is introduced and can be used as a pre-processing filter to a static bug checking tool for identifying bugs that can potentially be exploited as security vulnerabilities.

Abstract: Bug-checking tools have been used with some success in recent years to find bugs in software. For finding bugs that can cause security vulnerabilities, bug checking tools require a program analysis which determines whether a software bug can be controlled by user-input. In this paper we introduce a static program analysis for computing user-input dependencies. This analysis can be used as a pre-processing filter to a static bug checking tool for identifying bugs that can potentially be exploited as security vulnerabilities. In order for the analysis to be applicable to large commercial software in the millions of lines of code, runtime speed and scalability of the user-input dependence analysis are of key importance. Our user-input dependence analysis takes both data and control dependencies into account. We extend static single assignment (SSA) form by augmenting phi-nodes with control dependencies. A formal definition of user-input dependence is expressed in a dataflow analysis framework as a meet-over-all-paths (MOP) solution. We reduce the equation system to a sparse equation system exploiting the properties of SSA. The sparse equation system is solved as a reachability problem that results in a fast algorithm for computing user-input dependencies. We have implemented a call-insensitive and a call-sensitive analysis. The paper gives preliminary results on the comparison of their efficiency for various benchmarks.
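
Illustration: the abstract frames user-input dependence as a reachability problem over a sparse dependence structure. The sketch below shows only that general idea as a worklist search over a small, hand-built dependence graph. It is not the paper's algorithm; the graph, the node names, and the function `user_input_dependent` are assumptions made for this example, and no SSA construction is performed.

```python
from collections import deque

def user_input_dependent(edges, sources):
    """Return every node reachable from the user-input sources.

    edges maps a node to the nodes that depend on it (data or control).
    """
    tainted = set(sources)
    worklist = deque(sources)
    while worklist:
        node = worklist.popleft()
        for succ in edges.get(node, ()):
            if succ not in tainted:
                tainted.add(succ)
                worklist.append(succ)
    return tainted

# Hypothetical dependence graph for the program:
#   n = read_input(); i = 0; k = 7
#   if n > 0: x = i + 1 else: x = 2     (the merge of x is controlled by n)
#   buf[x] = k
DEPENDS = {
    "n": ["cond"],        # data dependence: the condition uses n
    "cond": ["x"],        # control dependence recorded at the merge of x
    "i": ["x"],           # data dependence: x is computed from i
    "x": ["buf_index"],   # x is used as the buffer index
    "k": ["buf_value"],   # k is the stored value
}

tainted = user_input_dependent(DEPENDS, ["n"])
```

In this example the buffer index is user-input dependent only through the control dependence, while the stored value is not, which is the kind of distinction a pre-processing filter for a bug checker could use.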

56 citations

Proceedings Article•10.1109/SCAM.2008.24•

Automated Detection of Code Vulnerabilities Based on Program Analysis and Model Checking

Lei Wang¹, Qiang Zhang¹, Pengchao Zhao¹•Institutions (1)

Peking University¹

3 Oct 2008

TL;DR: This work traces the memory size of buffer-related variables and instruments the code with corresponding constraint assertions before the potential vulnerable points by constraint based analysis, and introduces program slicing to reduce the code size.

Abstract: Ensuring the correctness and reliability of software systems is one of the main problems in software development. Model checking, a static analysis method, is preponderant in improving the precision of vulnerability detection. However, when applied to buffer overflow and other bugs, it is hard to automatically construct the model for detecting the vulnerabilities. To address this problem we propose an approach that combines constraint based analysis and model checking. We trace the memory size of buffer-related variables and instrument the code with corresponding constraint assertions before the potential vulnerable points by constraint based analysis. Then the problem of detecting vulnerabilities is converted into the problem of verifying the reachability of these assertions by model checking. In order to reduce the cost of model checking, program slicing is introduced to reduce the code size. CodeAuditor is a prototype implementation of our approach. With CodeAuditor, several previously unreported vulnerabilities are discovered in several open source software packages, and the performance is shown to be improved significantly with the help of program slicing.

41 citations

Proceedings Article•10.1109/SCAM.2008.33•

Change Impact Graphs: Determining the Impact of Prior Code Changes

Daniel M. German¹, Gregorio Robles², Ahmed E. Hassan³•Institutions (3)

University of Victoria¹, King Juan Carlos University², Queen's University³

3 Oct 2008

TL;DR: This work proposes a method that, in a pre-processing stage, analyzes prior code changes to determine which functions have been modified, and then propagates those changes throughout the rest of the system using the system's dependence graph.

Abstract: The source code of a software system is in constant change. The impact of these changes spreads out across the software system and may lead to the sudden manifestation of failures in unchanged parts. To help developers fix such failures, we propose a method that, in a pre-processing stage, analyzes prior code changes to determine what functions have been modified. Next, given a particular period of time in the past, the functions changed during this period are propagated throughout the rest of the system using the dependence graph of the system. This information is visualized using Change Impact Graphs (CIGs). Through a case study based on the Apache Web Server, we demonstrate the benefit of using CIGs to investigate several real defects.

34 citations

Proceedings Article•10.1109/SCAM.2008.23•

Exploiting the Correspondence between Micro Patterns and Class Names

Jeremy Singer¹, Chris Kirkham¹•Institutions (1)

University of Manchester¹

3 Oct 2008

TL;DR: It is shown that words in Java class names relate to class properties, expressed using the recently developed micro patterns language, and a large corpus of Java programs is analysed to create a database that links common class name words with micro patterns.

Abstract: This paper argues that semantic information encoded in natural language identifiers is a largely neglected resource for program analysis. First we show that words in Java class names relate to class properties, expressed using the recently developed micro patterns language. We analyse a large corpus of Java programs to create a database that links common class name words with micro patterns. Finally we report on prototype tools integrated with the Eclipse development environment. These tools use the database to inform programmers of particular problems or optimization opportunities in their code.

31 citations

Proceedings Article•10.1109/SCAM.2008.26•

Fast and Precise Points-to Analysis

Jonas Lundberg, Tobias Gutzmann, Welf Löwe

3 Oct 2008

TL;DR: This paper presents a new context-sensitive approach to points-to analysis where calling contexts are distinguished by the points-to sets analyzed for their target expressions. The approach provides higher precision than the call string technique and is similar in precision to the object-sensitive technique.

Abstract: Many software engineering applications require points-to analysis. Client applications range from optimizing compilers to program development and testing environments to reverse-engineering tools. In this paper, we present a new context-sensitive approach to points-to analysis where calling contexts are distinguished by the points-to sets analyzed for their target expressions. Compared to other well-known context-sensitive techniques, it is faster - twice as fast as the call string approach and by an order of magnitude faster than the object-sensitive technique - and requires less memory. At the same time, it provides higher precision than the call string technique and is similar in precision to the object-sensitive technique. These statements are confirmed by experiments.

20 citations

Proceedings Article•10.1109/SCAM.2008.18•

On the Use of Data Flow Analysis in Static Profiling

C. Boogerd¹, Leon Moonen²•Institutions (2)

Delft University of Technology¹, IEEE Computer Society²

3 Oct 2008

TL;DR: The benefits of using more involved analysis techniques in such a static profiler are examined, and the use of value range propagation is explored to improve the accuracy of the estimates.

Abstract: Static profiling is a technique that produces estimates of execution likelihoods or frequencies based on source code analysis only. It is frequently used in determining cost/benefit ratios for certain compiler optimizations. In previous work, we introduced a simple algorithm to compute execution likelihoods, based on a control flow graph and heuristic branch prediction. In this paper we examine the benefits of using more involved analysis techniques in such a static profiler. In particular, we explore the use of value range propagation to improve the accuracy of the estimates, and we investigate the differences in estimating execution likelihoods and frequencies.
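
Illustration: the abstract mentions computing execution likelihoods from a control flow graph and heuristic branch prediction. A minimal sketch of that general idea follows, assuming an acyclic graph and made-up branch probabilities. The function `execution_likelihoods`, the block names, and the 0.1/0.9 split are assumptions for this example and are not taken from the paper; loops and execution frequencies would need additional handling.

```python
def execution_likelihoods(cfg, entry, order):
    """Propagate execution likelihoods through an acyclic CFG.

    cfg maps a block to a list of (successor, branch_probability) pairs;
    order is a topological ordering of the blocks.
    """
    likelihood = {block: 0.0 for block in order}
    likelihood[entry] = 1.0
    for block in order:
        for succ, prob in cfg.get(block, ()):
            # Paths into a block are mutually exclusive, so they add up.
            likelihood[succ] += likelihood[block] * prob
    return likelihood

# Hypothetical CFG: a check that branches to a rarely taken error path.
CFG = {
    "entry": [("check", 1.0)],
    "check": [("error", 0.1), ("work", 0.9)],  # assumed heuristic prediction
    "work": [("exit", 1.0)],
    "error": [("exit", 1.0)],
}
ORDER = ["entry", "check", "error", "work", "exit"]

result = execution_likelihoods(CFG, "entry", ORDER)
```

A more involved analysis such as value range propagation would, in this picture, replace the assumed probabilities on the `check` branch with estimates derived from the possible values of the tested variable.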

19 citations

Proceedings Article•10.1109/SCAM.2008.12•

DTS - A Software Defects Testing System

Zhao Hong Yang, Yun Zhan Gong, Xiao Qing, Wang Ya Wen

3 Oct 2008

TL;DR: This demo presents DTS (software defects testing system), a tool to catch defects in source code using static testing techniques and performs some experiments on a suite of open source software whose results are briefly presented in the last part of the demo.

Abstract: This demo presents DTS (software defects testing system), a tool to catch defects in source code using static testing techniques. In DTS, various defect patterns are defined using defect pattern state machines and tested by a unified testing framework. Since DTS externalizes all the defect patterns it checks, defect patterns can be added, subtracted, or altered without having to modify the tool itself. Moreover, typical interval computation is expanded and applied in DTS to reduce false positives and compute the state of the defect state machine. In order to validate its usefulness, we perform some experiments on a suite of open source software whose results are briefly presented in the last part of the demo.

19 citations

Proceedings Article•10.1109/SCAM.2008.19•

The Semantics of Abstract Program Slicing

Damiano Zanardini¹•Institutions (1)

Complutense University of Madrid¹

3 Oct 2008

TL;DR: The present paper introduces the semantic basis for abstract slicing, which is more general than standard, concrete slicing, in that slicing criteria are abstract, i.e., defined on properties of data, rather than concrete values.

Abstract: The present paper introduces the semantic basis for abstract slicing. This notion is more general than standard, concrete slicing, in that slicing criteria are abstract, i.e., defined on properties of data, rather than concrete values. This approach is based on abstract interpretation: properties are abstractions of data. Many properties can be investigated; e.g., the nullity of a program variable. Standard slicing is a special case, where properties are exactly the concrete values. As a practical outcome, abstract slices are likely to be smaller than standard ones, since commands which are relevant at the concrete level can be removed if only some abstract property is supposed to be preserved. This can make debugging and program understanding tasks easier, since a smaller portion of code must be inspected when searching for undesired behavior. The framework also includes the possibility to restrict the input states of the program, in the style of conditioned slicing, thus lying between static and dynamic slicing.

16 citations

Proceedings Article•10.1109/SCAM.2008.34•

Beyond Annotations: A Proposal for Extensible Java (XJ)

Tony Clark, Paul Sammut, James Willans

3 Oct 2008

TL;DR: A proposal for a Java extension which generalises annotations to allow Java to be a platform for developing domain specific languages.

Abstract: Annotations provide a limited way of extending Java in order to tailor the language for specific tasks. This paper describes a proposal for a Java extension which generalises annotations to allow Java to be a platform for developing domain specific languages.

16 citations

Proceedings Article•10.1109/SCAM.2008.20•

The Evolution and Decay of Statically Detected Source Code Vulnerabilities

M. Di Penta¹, Luigi Cerulo¹, Lerina Aversano¹•Institutions (1)

University of Sannio¹

3 Oct 2008

TL;DR: An empirical study on the evolution of vulnerable statements detected in three software systems with different static analysis tools and on the decay time exhibited by different kinds of vulnerabilities is reported.

Abstract: The presence of vulnerable statements in the source code is a crucial problem for maintainers: properly monitoring and, if necessary, removing them is highly desirable to ensure high security and reliability. To this aim, a number of static analysis tools have been developed to detect the presence of instructions that can be subject to vulnerability attacks, ranging from buffer overflow exploitations to command injection and cross-site scripting. Based on the availability of existing tools and of data extracted from software repositories, this paper reports an empirical study on the evolution of vulnerable statements detected in three software systems with different static analysis tools. Specifically, the study investigates vulnerability evolution trends and the decay time exhibited by different kinds of vulnerabilities.

14 citations

Proceedings Article•10.1109/SCAM.2008.29•

Automated Migration of List Based JSP Web Pages to AJAX

J. Chu¹, Thomas R. Dean¹•Institutions (1)

Queen's University¹

3 Oct 2008

TL;DR: This paper describes the process of converting a class of Web pages from round-trip to AJAX, a Web application programming technique that allows portions of a Web page to be loaded dynamically, separate from other parts of the Web page.

Abstract: AJAX is a Web application programming technique that allows portions of a Web page to be loaded dynamically, separate from other parts of the Web page. This gives the user a much smoother experience when viewing the Web page. This paper describes the process of converting a class of Web pages from round-trip to AJAX.

Proceedings Article•10.1109/SCAM.2008.21•

Parfait - A Scalable Bug Checker for C Code

Cristina Cifuentes

3 Oct 2008

TL;DR: Parfait is a bug checker for C code that has been designed to address developers' requirements of scalability (support millions of lines of code in a reasonable amount of time), precision (report few false positives) and reporting of bugs that may be exploitable from a security vulnerability point of view.

Abstract: Parfait is a bug checker of C code that has been designed to address developers' requirements of scalability (support millions of lines of code in a reasonable amount of time), precision (report few false positives) and reporting of bugs that may be exploitable from a security vulnerability point of view. For large code bases, performance is at stake if the bug checking tool is to be integrated into the software development process, and so is precision, as each false alarm (i.e., false positive) costs developer time to track down. Further, false negatives give a false sense of security to developers and testers, as it is not obvious or clear what other bugs were not reported by the tool. A common criticism of existing bug checking tools is the lack of reported metrics on the use of the tool. To a developer it is unclear how accurate the tool is, how many bugs it does not find, how many bugs get reported that are not actual bugs, whether the tool understands when a bug has been fixed, and what the performance is for the reported bugs. In this tool demonstration we show how Parfait fares in the area of buffer overflow checking against the various requirements of scalability and precision.

Proceedings Article•10.1109/SCAM.2008.35•

Modular Decompilation of Low-Level Code by Partial Evaluation

Miguel Gómez-Zamalloa¹, Elvira Albert¹, Germán Puebla²•Institutions (2)

Complutense University of Madrid¹, Technical University of Madrid²

3 Oct 2008

TL;DR: This paper presents the first modular scheme to enable interpretive decompilation of low-level code to a high-level representation, namely, it decompiles bytecode into PROLOG, introduces two notions of optimality, and demonstrates empirically the scalability of modular decompilation by partial evaluation.

Abstract: Decompiling low-level code to a high-level intermediate representation facilitates the development of analyzers, model checkers, etc. which reason about properties of the low-level code (e.g., bytecode, .NET). Interpretive decompilation consists in partially evaluating an interpreter for the low-level language (written in the high-level language) w.r.t. the code to be decompiled. There have been proofs-of-concept that interpretive decompilation is feasible, but there remain important open issues when it comes to decompiling a real language: does the approach scale up? is the quality of decompiled programs comparable to that obtained by ad-hoc decompilers? do decompiled programs preserve the structure of the original programs? This paper addresses these issues by presenting, to the best of our knowledge, the first modular scheme to enable interpretive decompilation of low-level code to a high-level representation, namely, we decompile bytecode into PROLOG. We introduce two notions of optimality. The first one requires that each method/block is decompiled just once. The second one requires that each program point is traversed at most once during decompilation. We demonstrate the impact of our modular approach and optimality issues on a series of realistic benchmarks. Decompilation times and decompiled program sizes are linear with the size of the input bytecode program. This demonstrates empirically the scalability of modular decompilation of low-level code by partial evaluation.

Proceedings Article•10.1109/SCAM.2008.11•

90% Perspiration: Engineering Static Analysis Techniques for Industrial Applications

Paul Anderson¹•Institutions (1)

Ithaca College¹

3 Oct 2008

TL;DR: This article describes some of the engineering approaches that were taken during the development of GrammaTech's static-analysis technology that have taken it from a prototype system with poor performance and scalability and with very limited applicability, to a much-more general-purpose industrial-strength analysis infrastructure capable of operating on millions of lines of code.

Abstract: This article describes some of the engineering approaches that were taken during the development of GrammaTech's static-analysis technology that have taken it from a prototype system with poor performance and scalability and with very limited applicability, to a much-more general-purpose industrial-strength analysis infrastructure capable of operating on millions of lines of code. A wide variety of code bases are found in industry, and many extremes of usage exist, from code size through use of unusual, or non-standard features and dialects. Some of the problems associated with handling these code-bases are described, and the solutions that were used to address them, including some that were ultimately unsuccessful, are discussed.

Proceedings Article•10.1109/SCAM.2008.10•

CoordInspector: A Tool for Extracting Coordination Data from Legacy Code

Nuno Rodrigues, Luís Soares Barbosa

3 Oct 2008

TL;DR: The scope of application of CoordInspector is quite large: potentially any piece of code developed in any of the programming languages which compiles to the .Net framework.

Abstract: More and more current software systems rely on non trivial coordination logic for combining autonomous services typically running on different platforms and often owned by different organizations. Often, however, coordination data is deeply entangled in the code and, therefore, difficult to isolate and analyse separately. CoordInspector is a software tool which combines slicing and program analysis techniques to isolate all coordination elements from the source code of an existing application. Such a reverse engineering process provides a clear view of the actually invoked services as well as of the orchestration patterns which bind them together. The tool analyses Common Intermediate Language (CIL) code, the native language of Microsoft .Net framework. Therefore, the scope of application of CoordInspector is quite large: potentially any piece of code developed in any of the programming languages which compiles to the .Net framework. The tool generates graphical representations of the coordination layer together and identifies the underlying business process orchestrations, rendering them as Orc specifications.

Proceedings Article•10.1109/SCAM.2008.25•

An Empirical Study of Function Overloading in C++

Cheng Wang¹, Daqing Hou¹•Institutions (1)

Clarkson University¹

3 Oct 2008

TL;DR: The study described in this paper is focused on discovering how C++'s function overloading is used in production code using an instrumented g++ compiler, and finds that the most 'advanced' subset of function overloading tends to be defined in only a few utility modules.

Abstract: The usefulness and usability of programming tools (for example, languages, libraries, and frameworks) may greatly impact programmer productivity and software quality. Ideally, these tools should be designed to be both useful and usable. But in reality, there always exist some tools or features whose essential characteristics can be fully understood only after they have been extensively used. The study described in this paper is focused on discovering how C++'s function overloading is used in production code using an instrumented g++ compiler. Our principal finding for the system studied is that the most 'advanced' subset of function overloading tends to be defined in only a few utility modules, which are probably developed and maintained by a small number of programmers, the majority of application modules use only the 'easy' subset of function overloading when overloading names, and most overloaded names are used locally within rather than across module interfaces. We recommend these as guidelines to software designers.

Proceedings Article•10.1109/SCAM.2008.16•

Type Highlighting: A Client-Driven Visual Approach for Class Hierarchies Reengineering

Petru Florin Mihancea

3 Oct 2008

TL;DR: A metric-based visual approach is introduced to capture the extent to which the clients of a hierarchy polymorphically manipulate that hierarchy.

Abstract: Polymorphism and class hierarchies are key to increasing the extensibility of an object-oriented program but also raise challenges for program comprehension. Despite many advances in understanding and restructuring class hierarchies, there is no direct support to analyze and understand the design decisions that drive their polymorphic usage. In this paper we introduce a metric-based visual approach to capture the extent to which the clients of a hierarchy polymorphically manipulate that hierarchy. A visual pattern vocabulary is also presented in order to facilitate the communication between analysts. Initial evaluation shows that our techniques aid program comprehension by effectively visualizing large quantities of information, and can help detect several design problems.

Proceedings Article•10.1109/SCAM.2008.32•

Rejuvenate Pointcut: A Tool for Pointcut Expression Recovery in Evolving Aspect-Oriented Software

Raffi Khatchadourian¹, Awais Rashid²•Institutions (2)

Ohio State University¹, Lancaster University²

3 Oct 2008

TL;DR: An AspectJ source-level inferencing tool called rejuvenate pointcut is demonstrated which helps developers maintain pointcut expressions over the lifetime of a software product and represents a significant step towards providing tool-supported maintainability for evolving aspect-oriented software.

Abstract: Aspect-oriented programming (AOP) strives to localize the scattered and tangled implementations of crosscutting concerns (CCCs) by allowing developers to declare that certain actions (advice) should be taken at specific points (join points) during the execution of software where a CCC (an aspect) is applicable. However, it is non-trivial to construct optimal pointcut expressions (a collection of join points) that capture the true intentions of the programmer and, upon evolution, maintain these intentions. We demonstrate an AspectJ source-level inferencing tool called rejuvenate pointcut which helps developers maintain pointcut expressions over the lifetime of a software product. A key insight into the tool's construction is that the problem of maintaining pointcut expressions bears strong similarity to the requirements traceability problem in software engineering; hence, the underlying algorithm was devised by adapting existing approaches for requirements traceability to pointcut maintenance. The Eclipse IDE-based tool identifies intention graph patterns pertaining to a pointcut and, based on these patterns, uncovers other potential join points that may fall within the scope of the pointcut with a given confidence. This work represents a significant step towards providing tool-supported maintainability for evolving aspect-oriented software.

Proceedings Article•10.1109/SCAM.2008.40•

Evaluating Key Statements Analysis

David Binkley¹, Nicolas Gold¹, Mark Harman¹, Zheng Li¹, Kiarash Mahdavi¹, +1 more•Institutions (1)

King's College London¹

3 Oct 2008

TL;DR: An empirical investigation of three kinds of key statements is presented, based on Bieman and Ott's principal variables, which shows that key statements have higher average impact and higher average cohesion.

Abstract: Key statement analysis extracts from a program, statements that form the core of the program's computation. A good set of key statements is small but has a large impact. Key statements form a useful starting point for understanding and manipulating a program. An empirical investigation of three kinds of key statements is presented. The three are based on Bieman and Ott's principal variables. To be effective, the key statements must have high impact and form a small, highly cohesive unit. Using a minor improvement of metrics for measuring impact and cohesion, key statements are shown to capture about 75% of the semantic effect of the function from which they are drawn. At the same time, they have cohesion about 20 percentage points higher than the corresponding function. A statistical analysis of the differences shows that key statements have higher average impact and higher average cohesion (p<0.001).

Proceedings Article•10.1109/SCAM.2008.28•

Automatic Determination of May/Must Set Usage in Data-Flow Analysis

Andrew Stone¹, Michelle Mills Strout¹, Shweta Behere²•Institutions (2)

Colorado State University¹, Avaya²

3 Oct 2008

TL;DR: The DFAGen Tool is presented, which generates implementations for locally separable data-flow analyses that are pointer, side-effect, and aggregate cognizant from an analysis specification that assumes only scalars.

Abstract: Data-flow analysis is a common technique to gather program information for use in transformations such as register allocation, dead-code elimination, common subexpression elimination, scheduling, and others. Tools for generating data-flow analysis implementations remove the need for implementers to explicitly write code that iterates over statements in a program, but still require them to implement details regarding the effects of aliasing, side effects, arrays, and user-defined structures. This paper presents the DFAGen Tool, which generates implementations for locally separable (e.g. bit-vector) data-flow analyses that are pointer, side-effect, and aggregate cognizant from an analysis specification that assumes only scalars. Analysis specifications are typically seven lines long and similar to those in standard compiler textbooks. The main contribution of this work is the automatic determination of may and must set usage within automatically generated data-flow analysis implementations.
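
Illustration: the abstract's main point is deciding where a data-flow analysis should use may sets and where it should use must sets once pointers are involved. The sketch below shows that choice for liveness on a single statement. It is not DFAGen output or its specification language; the statement encoding, the `points_to` map, and both function names are assumptions for this example, and treating a single-target pointer as a definite overwrite is a simplification.

```python
def live_in(live_out, may_use, must_def):
    """Transfer function for liveness on one statement.

    Liveness is a 'may' analysis: uses are over-approximated (may_use) and
    kills are under-approximated (must_def), so no live variable is lost.
    """
    return may_use | (live_out - must_def)

def statement_sets(stmt, points_to):
    """Expand a scalar-style statement into may-use and must-def sets.

    stmt is (target, sources); a name starting with '*' is a dereference
    resolved through the hypothetical points_to map.
    """
    target, sources = stmt
    may_use = set()
    for src in sources:
        if src.startswith("*"):
            ptr = src[1:]
            may_use |= {ptr} | points_to[ptr]
        else:
            may_use.add(src)
    if target.startswith("*"):
        ptr = target[1:]
        may_use.add(ptr)  # the pointer itself is read by the store
        targets = points_to[ptr]
        # Simplification: a store through a pointer definitely overwrites a
        # variable only when the pointer has exactly one possible target.
        must_def = set(targets) if len(targets) == 1 else set()
    else:
        must_def = {target}
    return may_use, must_def

STORE = ("*p", ["a"])        # the statement  *p = a
LIVE_OUT = {"x", "y"}

# p may point to x or y: nothing is definitely overwritten.
wide = live_in(LIVE_OUT, *statement_sets(STORE, {"p": {"x", "y"}}))
# p can only point to x: x is definitely overwritten and is killed.
narrow = live_in(LIVE_OUT, *statement_sets(STORE, {"p": {"x"}}))
```

With two possible targets both `x` and `y` stay live before the store; with one target `x` is killed. Using the may set for the kill would be unsound here, which is the kind of decision the abstract says the tool determines automatically.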

Proceedings Article•10.1109/SCAM.2008.30•

Aspect-Aware Points-to Analysis

Qiang Sun¹, Jianjun Zhao¹•Institutions (1)

Shanghai Jiao Tong University¹

3 Oct 2008

TL;DR: The experimental result indicates that the proposed points-to analysis for AspectJ can achieve significantly higher precision than existing Java approaches while running in practical time and space.

Abstract: Points-to analysis is a fundamental analysis technique whose results are useful in compiler optimization and software engineering tools. Although many points-to analysis algorithms have been proposed for procedural and object-oriented languages like C and Java, there is no points-to analysis for aspect-oriented languages so far. Based on Andersen-style points-to analysis for Java, we propose flow- and context-insensitive points-to analysis for AspectJ. The main idea is to perform the analysis crossing the boundary between aspects and classes. Therefore, our technique is able to handle the unique aspectual features. To investigate the effectiveness of our technique, we implement our analysis approach on top of the ajc AspectJ compiler and evaluate it on nine AspectJ benchmarks. The experimental result indicates that, compared to existing Java approaches, the proposed technique can achieve significantly higher precision and run in practical time and space.
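
Illustration: the abstract builds on Andersen-style (inclusion-based), flow- and context-insensitive points-to analysis. The sketch below is a naive fixpoint solver for that general style over a tiny statement language. It is not the paper's AspectJ analysis; the statement encoding, the function `andersen`, and the modelling of an advice binding as a plain copy are assumptions for this example.

```python
from collections import defaultdict

def andersen(statements):
    """Flow- and context-insensitive inclusion-based points-to analysis.

    statements is a list of tuples:
      ("new",   x, obj)      x = new obj
      ("copy",  x, y)        x = y
      ("store", x, f, y)     x.f = y
      ("load",  x, y, f)     x = y.f
    """
    pts = defaultdict(set)   # variable or (object, field) -> abstract objects
    changed = True
    while changed:           # naive iteration until nothing changes
        changed = False
        for stmt in statements:
            kind = stmt[0]
            if kind == "new":
                _, x, obj = stmt
                updates = [(x, {obj})]
            elif kind == "copy":
                _, x, y = stmt
                updates = [(x, set(pts[y]))]
            elif kind == "store":
                _, x, f, y = stmt
                updates = [((o, f), set(pts[y])) for o in list(pts[x])]
            else:  # "load"
                _, x, y, f = stmt
                updates = [(x, set(pts[(o, f)])) for o in list(pts[y])]
            for key, objs in updates:
                if not objs <= pts[key]:
                    pts[key] |= objs
                    changed = True
    return pts

# Hypothetical program mixing class code and advice code.
PROGRAM = [
    ("new", "a", "A1"),                   # class code:  a = new A()
    ("new", "b", "B1"),                   # class code:  b = new B()
    ("store", "a", "f", "b"),             # class code:  a.f = b
    ("copy", "advice_target", "a"),       # assumed binding: target flows into advice
    ("load", "t", "advice_target", "f"),  # advice body: t = target.f
]

pts = andersen(PROGRAM)
```

Because the assumed advice binding is analysed together with the class code, the advice-local variable `t` is found to point to the object allocated in the class, which is a small picture of analysing across the aspect/class boundary.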

Proceedings Article•10.1109/SCAM.2008.13•

TBCppA: A Tracer Approach for Automatic Accurate Analysis of C Preprocessor's Behaviors

Katsuhiko Gondow¹, H. Kawashima, T. Imaizumi•Institutions (1)

Tokyo Institute of Technology¹

3 Oct 2008

TL;DR: This paper proposes a novel approach (called TBCppA) based on tracer, which generates CPP mapping information by instrumenting the unpreprocessed C source code using XML-like tags called "tracers".

Abstract: The C preprocessor (CPP) is a major reason why it is difficult to accurately analyze C source code, an analysis that is indispensable to refactoring tools for C programs. To accurately analyze C source code, we need to generate CPP mapping information between the unpreprocessed C source code and the preprocessed code. Previous works generate CPP mapping information by extending an existing CPP, which results in low portability and low maintainability due to the strong dependency on the CPP implementation. To solve this problem, this paper proposes a novel approach (called TBCppA) based on tracer, which generates CPP mapping information by instrumenting the unpreprocessed C source code using XML-like tags called "tracers". The advantage of TBCppA is high portability and high maintainability, which the previous methods do not have. We successfully implemented a first prototype of TBCppA, and our preliminary evaluation of applying TBCppA to gcc-4.1.1 produced promising results.

Proceedings Article•10.1109/SCAM.2008.9•

Using Program Transformations to Add Structure to a Legacy Data Model

Mariano Ceccato, Thomas R. Dean¹, Paolo Tonella²•Institutions (2)

Queen's University¹, Fondazione Bruno Kessler²

3 Oct 2008

TL;DR: A set of source transformations used to create a structured data model as part of a migration of eight million lines of code to Java is described.

Abstract: An appropriate translation of the data model is central to any language migration effort. Finding a mapping between original and target data models may be challenging for legacy languages (e.g., Assembly) which lack a structured data model and rely instead on explicit programmer control of the overlay of variables. Before legacy applications written in languages with an unstructured data model can be migrated to modern languages, a structured data model must be inferred. This paper describes a set of source transformations used to create such a model as part of a migration of eight million lines of code to Java. The original application is written in a proprietary language supporting variable layout by memory relocation.

Proceedings Article•10.1109/SCAM.2008.15•

Some Assembly Required - Program Analysis of Embedded System Code

Ansgar Fehnker¹, Ralf Huuck¹, Felix Rauch¹, Sean Seefried¹•Institutions (1)

University of New South Wales¹

3 Oct 2008

TL;DR: This work presents a model-checking based static analysis approach which seamlessly integrates the analysis of embedded ARM assembly with C/C++ code analysis, and provides an extended analysis framework for checking general properties of ARM code.

Abstract: Programming embedded system software typically involves more than one programming language. Normally, a high-level language such as C/C++ is used for application-oriented tasks and a low-level assembly language for direct interaction with the underlying hardware. In most cases those languages are closely interwoven and the assembly is embedded in the C/C++ code. Verification of such programs requires the integrated analysis of both languages at the same time. However, common algorithmic verification tools fail to address this issue. In this work we present a model-checking based static analysis approach which seamlessly integrates the analysis of embedded ARM assembly with C/C++ code analysis. In particular, we show how to automatically check that the ARM code complies with its interface descriptions. Given interface compliance, we then provide an extended analysis framework for checking general properties of ARM code. We implemented this analysis in our source code analysis tool Goanna, and applied it to the source code of an L4 microkernel implementation.

Proceedings Article•10.1109/SCAM.2008.14•

Is Cloned Code More Stable than Non-cloned Code?

Jens Krinke¹•Institutions (1)

FernUniversität in Hagen¹

3 Oct 2008

TL;DR: If the dominating factor of deletions is eliminated, it can generally be concluded that cloned code is more stable than non-cloned code.

Abstract: This paper presents a study on the stability of cloned code. The results from an analysis of 200 weeks of evolution of five software systems show that the stability as measured by changes to the system is dominated by the deletion of code clones. It can also be observed that additions to a system are more often additions to non-cloned code than additions to cloned code. If the dominating factor of deletions is eliminated, it can generally be concluded that cloned code is more stable than non-cloned code.

Proceedings Article•10.1109/SCAM.2008.36•

Constructing Subtle Faults Using Higher Order Mutation Testing

Yue Jia¹, Mark Harman¹•Institutions (1)

King's College London¹

3 Oct 2008

TL;DR: This paper investigates higher order mutants (HOMs) and introduces the concept of a subsuming HOM; one that is harder to kill than the first order mutants from which it is constructed.

Abstract: Traditional mutation testing considers only first order mutants, created by the injection of a single fault. Often these first order mutants denote trivial faults that are easily killed. This paper investigates higher order mutants (HOMs). It introduces the concept of a subsuming HOM; one that is harder to kill than the first order mutants from which it is constructed. By definition, subsuming HOMs denote subtle fault combinations. The paper reports the results of an empirical study into subsuming HOMs, using six benchmark programs. This is the largest study of mutation testing to date. To overcome the exponential explosion in the number of mutants considered, the paper introduces a search based approach to the identification of subsuming HOMs. Results are presented for a greedy algorithm, a genetic algorithm and a hill climbing algorithm.
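
The abstract does not give the formal definition of a subsuming HOM, so the Python sketch below assumes one simple reading of "harder to kill": the HOM is killed by at least one test, but by fewer tests than the union of tests that kill its constituent first order mutants. The toy program, the mutants, the test inputs, and the helper names (`kill_set`, `is_subsuming`) are all hypothetical.

```python
def kill_set(original, mutant, tests):
    """Return the tests on which the mutant's output differs from the original's."""
    return {t for t in tests if mutant(*t) != original(*t)}


def is_subsuming(hom_kills, fom_kill_sets):
    """Assumed reading: killed by some test, but by fewer tests than the FOMs' union."""
    union = set().union(*fom_kill_sets)
    return 0 < len(hom_kills) < len(union)


def original(x, y):
    return x + y


def fom_1(x, y):  # first order mutant: '+' replaced by '-'
    return x - y


def fom_2(x, y):  # first order mutant: x negated
    return -x + y


def hom(x, y):  # higher order mutant: both faults combined
    return -x - y


tests = [(1, -1), (0, 2), (3, 0), (2, 2), (0, 0)]
fom_kills = [kill_set(original, m, tests) for m in (fom_1, fom_2)]
hom_kills = kill_set(original, hom, tests)
print(is_subsuming(hom_kills, fom_kills))  # True
```

In this toy case the test `(1, -1)` kills both first order mutants but not the combined mutant, because the two faults mask each other on that input.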

Proceedings Article•10.1109/SCAM.2008.17•

Precise Analysis of Java Programs Using JOANA

Dennis Giffhorn, Christian Hammer¹•Institutions (1)

Karlsruhe Institute of Technology¹

3 Oct 2008

TL;DR: This demonstration presents the JOANA plugin for the Eclipse framework, which can compute and navigate through dependence graphs for full Java bytecode, analyze Java programs with a broad range of slicing and chopping algorithms, and use precise algorithms for language-based security to check programs for information leaks.

Abstract: The JOANA project (Java Object-sensitive ANAlysis) is a program analysis infrastructure for the Java language. It contains a wide range of analysis techniques such as dependence graph computation, slicing and chopping for sequential and concurrent programs, computation of path conditions and algorithms for software security. This demonstration presents the JOANA plugin for the Eclipse framework. In the current version, a user can compute and navigate through dependence graphs for full Java bytecode, analyze Java programs with a broad range of slicing and chopping algorithms, and use precise algorithms for language-based security to check programs for information leaks.

Proceedings Article•10.1109/SCAM.2008.27•

Analysis and Transformations for Efficient Query-Based Debugging

Michael Gorbovitski¹, K.T. Tekle¹, Tom Rothamel¹, Scott D. Stoller¹, Yanhong A. Liu¹•Institutions (1)

Stony Brook University¹

3 Oct 2008

TL;DR: A framework that supports powerful queries in debugging tools is described, and in particular the transformations, alias analysis, and type analysis used to make the queries efficient are described.

Abstract: This paper describes a framework that supports powerful queries in debugging tools, and describes in particular the transformations, alias analysis, and type analysis used to make the queries efficient. The framework allows queries over the states of all objects at any point in the execution as well as over the history of states. The transformations are based on incrementally maintaining the results of expensive queries studied in previous work. The alias analysis extends the flow-sensitive intraprocedural analysis to an efficient flow-sensitive interprocedural analysis for an object-oriented language, and also provides a form of context sensitivity. We also show the power of the framework and the effectiveness of the analyses through case studies and experiments with XML DOM tree transformations, an FTP client, and others. We were able to easily determine the sources of all injected bugs, and we also found an actual bug in the case study on the FTP client.

Proceedings Article•10.1109/SCAM.2008.31•

From Indentation Shapes to Code Structures

Abram Hindle¹, Michael W. Godfrey¹, Richard Holt¹•Institutions (1)

University of Waterloo¹

3 Oct 2008

TL;DR: The study indicates that indentation shape correlates positively with code structure; that is, certain shapes typically correspond to certain code structures, which can form the basis of a tool framework that can analyze code in a language independent way.

Abstract: In a previous study, we showed that indentation was regular across multiple languages and the variance in the level of indentation of a block of revised code is correlated with metrics such as McCabe cyclomatic complexity. Building on that work, the current paper investigates the relationship between the "shape" of the indentation of the revised code block (the "revision") and the corresponding syntactic structure of the code. We annotated revisions matching these three indentation shapes: "flat" (all lines are equally indented), "slash" (indentation becomes increasingly deep), or "bubble" (indentation increases and then decreases). We then classified the code structure as one of: function definition, loop, expression, comment, etc. We studied thousands of revisions, coming from over 200 software projects, written in a variety of languages. Our study indicates that indentation shape correlates positively with code structure; that is, certain shapes typically correspond to certain code structures. For example, flat shapes commonly correspond to comments while bubble shapes commonly correspond to conditionals and function definitions. These results can form the basis of a tool framework that can analyze code in a language-independent way to support browsing targeted to viewing particular code structures such as conditionals or comments.
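
The three shapes named in the abstract can be approximated with a few lines of Python. This is only a sketch under stated assumptions: the function names, the tab width of 8, the treatment of repeated indentation levels as allowed within a "slash" or "bubble", and the extra "other" category are choices made for this example, not the annotation procedure used in the study.

```python
def indent_levels(block, tab_width=8):
    """Return the leading-whitespace width of each non-blank line in a block."""
    levels = []
    for line in block.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        expanded = line.expandtabs(tab_width)
        levels.append(len(expanded) - len(expanded.lstrip(" ")))
    return levels


def classify_shape(levels):
    """Classify a list of indentation widths as flat, slash, bubble, or other."""
    if not levels:
        return "other"
    if len(set(levels)) == 1:
        return "flat"  # all lines equally indented
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    if all(d >= 0 for d in diffs):
        return "slash"  # indentation only gets deeper
    peak = levels.index(max(levels))
    rises = all(d >= 0 for d in diffs[:peak])
    falls = all(d <= 0 for d in diffs[peak:])
    if peak > 0 and rises and falls:
        return "bubble"  # indentation increases, then decreases
    return "other"


revision = "if x:\n    y = 1\n    z = 2\nprint(y)"
print(classify_shape(indent_levels(revision)))  # bubble
```

Working only on whitespace widths is what makes this kind of classification independent of the programming language being analyzed.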
