In the software industry, engineers have developed numerous abstractions to
build and maintain successful software. One of the most vital of these is the
software, or system, architecture, which plays an important role in making up a
complete software system.
In this essay, Section I introduces software architecture, architectural
change and the process of reverse engineering. Section II presents the research
question addressed in this summary, and Section III discusses the different
processes used to generate surrogate architectural views. Section IV examines
the process of obtaining the ground truth architecture, and Section V covers
Bayesian learning for software architecture recovery, which is the basis for
developing an automated architectural recovery tool. Section VI deals with the
reverse engineering tools that are currently used in practice.
What is Software Architecture?
According to Bass et al., the software architecture of a program or computing
system is the structure or structures of the system, which comprise the
software elements, the externally visible properties of those elements and the
relationships among them (Bass et al. 2003). The software architecture acts as
a base on which software architects can build a solution to a problem without
descending to lower levels of abstraction such as the source code. It also acts
as the roadmap for implementation and maintenance activities. A good software
architecture is built upon the principle of “Separation of Concerns”, where
different responsibilities and functionalities are assigned to different
architectural elements. A bad architecture introduces architectural bad smells
and increases the complexity of the software system, which makes the system
very difficult to change.
A software system evolves over time through the process of software
maintenance. The more widely changes are scattered across the software
components, the more likely they are to introduce bugs into the system. Changes
made to the code are often not documented for further reference. The
architectural documentation should be updated to keep pace with the changing
environment, but in most cases it is out of date. It therefore becomes very
difficult to maintain the system and make changes to it.
According to Kouroshfar et al., co-changes are defined as multiple changed
files that are committed together to the same repository (Kouroshfar et al.
2015). Their study indicates that co-changes spanning multiple architectural
modules are associated with more defects than co-changes localized within a
single module.
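
The distinction between localized and cross-module co-changes can be sketched
in a few lines of Python. The commit data, the file-to-module mapping and the
function name below are hypothetical, and the sketch only illustrates the
classification itself, not the analysis carried out in the cited study.

```python
from typing import Dict, List, Set


def classify_co_changes(commits: List[Set[str]],
                        module_of: Dict[str, str]) -> Dict[str, int]:
    """Count co-changes that stay inside one module versus those spanning several."""
    counts = {"localized": 0, "cross_module": 0}
    for files in commits:
        if len(files) < 2:
            continue  # a commit with a single changed file is not a co-change
        modules = {module_of[f] for f in files}
        key = "localized" if len(modules) == 1 else "cross_module"
        counts[key] += 1
    return counts


# Hypothetical mapping of files to architectural modules.
module_of = {"ui/View.java": "ui", "ui/Menu.java": "ui", "db/Store.java": "db"}
# Hypothetical commit history: each set holds the files changed in one commit.
commits = [
    {"ui/View.java", "ui/Menu.java"},   # both files belong to module "ui"
    {"ui/View.java", "db/Store.java"},  # spans modules "ui" and "db"
    {"db/Store.java"},                  # single file, ignored
]
print(classify_co_changes(commits, module_of))
```

A defect study would then compare the defect counts linked to each of the two
categories.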
Most of the time, support engineers maintain a software system without
knowledge of its underlying architecture. For this reason, the process of
reverse engineering was introduced into the field of software engineering.
According to Eilam, reverse engineering, also called back engineering, is the
process of extracting knowledge or design information from a product and
reproducing it, or reproducing anything based on the extracted information
(Eilam 2005). One of the main goals of software reverse engineering is to
recover the software architecture from the source code.
The architecture plays a vital role in the maintenance of a software system.
Because of the continuous and rapid evolution of the system, there is often no
documentation left, or the documentation is inconsistent. The research question
addressed in this summary is therefore: “Is it possible to generate the exact
architecture of the underlying system from its implementation details, such as
the source code, using the process of reverse engineering?”
The different types of architectural views that can be generated with the help
of reverse engineering techniques are (Kouroshfar et al. 2015):
1. Module View: This view provides information about the units of
implementation.
2. Component and Connector View: It provides information about the run-time
behaviour of the system and the interactions between its components.
3. Allocation View: This view provides the relationship between the software
entities and the non-software elements in their execution environment.
Because the architectural documentation is often inconsistent, surrogate models
are generated using reverse engineering techniques to obtain an approximation
of the software architecture.
Some of the commonly used reverse engineering methods for generating the
surrogate models are as follows:
1. Package View
2. Bunch View
3. ArchDH View
4. LDA View
5. ACDC View
1. Package View:
In this method, the packages represent the modules of the system architecture.
For example, in a Java project each package is taken to represent a module.
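
As a minimal sketch of the package view, the fully qualified class names of a
Java project can be grouped by their package prefix. The class names and the
function name here are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from typing import Dict, List


def package_view(class_names: List[str]) -> Dict[str, List[str]]:
    """Group fully qualified class names by package (the part before the last dot)."""
    modules = defaultdict(list)
    for name in class_names:
        package, _, simple_name = name.rpartition(".")
        # Classes without a package prefix go into a placeholder module.
        modules[package or "(default)"].append(simple_name)
    return dict(modules)


# Hypothetical classes of a small Java project.
classes = ["org.shop.cart.Cart", "org.shop.cart.Item", "org.shop.pay.Invoice"]
view = package_view(classes)
for package, members in view.items():
    print(package, members)
```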
2. Bunch View:
The Bunch view is generated by a reverse engineering tool that produces
clusters based on the dependencies between classes. It relies on source code
analysis to convert the source code into a directed graph that represents the
source code artefacts and their relationships (Wu et al. 2005).
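
To illustrate dependency-based clustering, the sketch below scores a candidate
clustering of a directed dependency graph by rewarding dependencies inside a
cluster and penalising dependencies that cross cluster boundaries. The scoring
formula is an assumption, written in the spirit of a cohesion-versus-coupling
measure; it is not taken from the Bunch tool, and the search that a clustering
tool would run over many candidate clusterings is not shown. All names and
data are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source class, target class): source depends on target


def clustering_score(edges: Set[Edge], cluster_of: Dict[str, str]) -> float:
    """Sum, over all clusters, of the share of edge ends that stay inside the cluster."""
    intra = {c: 0 for c in set(cluster_of.values())}
    inter = {c: 0 for c in intra}
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += 1
        else:
            # A crossing edge counts against both clusters it touches.
            inter[a] += 1
            inter[b] += 1
    score = 0.0
    for c in intra:
        if intra[c] > 0:
            score += 2 * intra[c] / (2 * intra[c] + inter[c])
    return score


# Hypothetical dependency graph with two natural groups: {A, B} and {C, D}.
edges = {("A", "B"), ("B", "A"), ("C", "D"), ("B", "C")}
good = {"A": "m1", "B": "m1", "C": "m2", "D": "m2"}
bad = {"A": "m1", "C": "m1", "B": "m2", "D": "m2"}
print(clustering_score(edges, good), clustering_score(edges, bad))
```

The clustering that keeps strongly connected classes together receives the
higher score, which is the property a clustering search would try to maximise.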
3. ArchDH View:
Cai et al. proposed an architecture recovery algorithm known as the
Architecture Design Rule Hierarchy (ArchDH) (Cai et al. 2013).
The steps involved in the ArchDH algorithm are:
First, the algorithm identifies the design rules and allocates them a special
position in the architecture.
Some parts of the program may depend on controllers or dispatchers. The
algorithm identifies these controllers or dispatchers in the source code and
also gives them special positions in the architecture.
The algorithm then separates the rest of the code into modules.
A dependency graph is formed from the rest of the code.
If a subgraph is still large, the design rules and controllers within that
subgraph are identified and used to separate it further, recursively.
In this way the algorithm generates a hierarchy, which is called the design
rule hierarchy.
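
The separation step can be sketched as follows, under strong simplifying
assumptions: the design rules and controllers are supplied as input rather
than detected, the remaining code is split into modules by taking the
connected components of the remaining dependency graph, and the recursive
treatment of large subgraphs is omitted. This is an illustration of the idea
only, not the ArchDH algorithm as published; all names and data are
hypothetical.

```python
from typing import Dict, List, Set, Tuple


def split_into_modules(nodes: Set[str], edges: Set[Tuple[str, str]],
                       design_rules: Set[str],
                       controllers: Set[str]) -> Dict[str, List]:
    """Set design rules and controllers aside, then group the rest into modules."""
    rest = nodes - design_rules - controllers
    # Undirected adjacency restricted to the remaining nodes.
    neighbours = {n: set() for n in rest}
    for a, b in edges:
        if a in rest and b in rest:
            neighbours[a].add(b)
            neighbours[b].add(a)
    modules, seen = [], set()
    for start in sorted(rest):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        modules.append(sorted(component))
    return {"design_rules": sorted(design_rules),
            "controllers": sorted(controllers),
            "modules": modules}


# Hypothetical system: an interface (design rule), a dispatcher and two features.
nodes = {"IShape", "Main", "Circle", "CircleMath", "Square", "SquareMath"}
edges = {("Circle", "IShape"), ("Square", "IShape"),
         ("Circle", "CircleMath"), ("Square", "SquareMath"),
         ("Main", "Circle"), ("Main", "Square")}
hierarchy = split_into_modules(nodes, edges, {"IShape"}, {"Main"})
print(hierarchy)
```

Once the shared interface and the dispatcher are set aside, the two features no
longer depend on each other and fall into separate modules.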
4. LDA View:
The LDA view is generated with the help of information retrieval and data
mining techniques such as Latent Dirichlet Allocation (LDA). LDA analyses the
textual similarities between classes and clusters them into different modules.
5. ACDC View:
The Algorithm for Comprehension-Driven Clustering (ACDC) groups program
entities based on the principle of easing comprehension (Tzerpos and Holt
2000). The algorithm first builds a skeleton of clusters based on a list of
system design patterns, such as the source file pattern and the naming pattern.
After constructing the skeleton, it assigns the left-over elements to clusters
using the orphan adoption method.
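
The orphan adoption idea can be sketched as follows, assuming that a skeleton
of clusters already exists and that an orphan is adopted by the cluster with
which it shares the most dependencies. The adoption criterion, the function
name and the data are assumptions made for illustration and may differ from
the criteria used by ACDC itself.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def adopt_orphans(skeleton: Dict[str, str], orphans: Set[str],
                  edges: Set[Tuple[str, str]]) -> Dict[str, str]:
    """Assign each orphan to the cluster it is most strongly connected to."""
    result = dict(skeleton)
    for orphan in sorted(orphans):
        votes = Counter()
        for a, b in edges:
            if a == orphan:
                other = b
            elif b == orphan:
                other = a
            else:
                continue
            # Only neighbours that already belong to a cluster can vote.
            if other in result:
                votes[result[other]] += 1
        if votes:
            result[orphan] = votes.most_common(1)[0][0]
    return result


# Hypothetical skeleton, e.g. obtained from a naming pattern on the prefixes.
skeleton = {"net_send": "net", "net_recv": "net", "ui_draw": "ui"}
edges = {("net_send", "checksum"), ("net_recv", "checksum"), ("ui_draw", "checksum")}
clusters = adopt_orphans(skeleton, {"checksum"}, edges)
print(clusters["checksum"])
```

Orphans without any dependency on a clustered entity are left unassigned in
this sketch.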
Obtaining Ground Truth
Since the architectural views described above are only surrogate views, support
engineers find it difficult to make changes to the system without knowing the
complete picture of the system architecture. The software architecture is also
difficult to maintain because of the phenomena of architectural drift and
erosion. To deal with architectural drift and erosion, the ground truth
architecture is developed. According to Garcia et al., the ground truth
architecture is the architecture of the system that has been verified as
accurate by the system’s architects or developers, who have intimate knowledge
of the underlying application and problem domain.
Garcia et al. proposed a framework for recovering the ground truth software
architecture. The principles of the framework are also known as the mapping
principles (Garcia et al. 2013). The mapping principles are subdivided into
four types.
Generic Principles: These consist of long-standing software engineering
principles such as separation of concerns, isolation of changes, and coupling.
Domain Principles: These are mapping principles based on domain information,
that is, data related to the domain of the system in question, for example
retail, telecom or banking. The domain principles are obtained from the
research literature, industry standards or the experience of an engineer who
works in that domain.
Application Principles: These are principles related to the application whose
architecture is undergoing changes. Application principles may be obtained from
the documentation or from comments.
System Context: The system context, as described in Fig 1, is a grey area that
contains mapping principles related to the infrastructure on which the
application is built.
The process involved in developing the ground truth architecture (Garcia et al.
2013) is as follows:
Step 1: Use the available documentation to gather any domain-specific or
application-specific information that can be used to produce the domain or
application principles.
Step 2: The recoverers can select any existing recovery technique to aid the
architecture recovery process. The use of a recovery technique brings the
generic principles into the recovery.
Step 3: The next step is to extract the implementation-level information
required by the selected technique.
Step 4: In this step, the recoverers apply their chosen technique to obtain the
initial architecture of the system.
Step 5: In this step, any of the mapping principles obtained in step 1 can be
used to modify the architecture obtained in step 4.
Step 6: The recoverer must identify any utility components that are being used,
such as libraries, middleware components and application frameworks. This is
done because these components affect the quality of the recovered architecture
of the system.
Step 7: By this step, the recoverer has produced a recovered authoritative
architecture that has been enriched and modified with the help of the different
mapping principles. The certifier of the system architecture then looks through
the proposed groupings and may suggest adding a new grouping, splitting an
existing group into multiple subgroups, or transferring source code components
from one group to another.
Step 8: At this point, the recoverer makes changes to the groupings based on
the input provided by the certifier. Steps 7 and 8 are repeated by the
certifier and the recoverer until both are satisfied with the results.
At the end of step 8, the ground truth architecture has been generated for the
underlying software system.
Since manual recovery of the software architecture is a tiresome process, there
are automated methods for recovering the software architecture from its
implementation details. One commonly used automated method is the Bayesian
learning based approach (Maqbool and Babri 2007). This approach is used to
recover the software architecture of a system automatically when the
documentation of the system is out of date or incomplete.
Bayesian Learning for Software Architecture Recovery:
According to Maqbool and Babri, Bayesian learning takes a probability-based
approach to reasoning and inferring results. The Naïve Bayes classifier is one
of the Bayesian learning methods and has been applied to many practical
problems (Maqbool and Babri 2007). According to the Bayesian approach, the most
probable target value $v_{MAP}$, given the attribute values
$a_1, a_2, \ldots, a_k$, is given by:

$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_k) = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_k \mid v_j)\, P(v_j)$$

where
$f(x)$: the target function, which can take any value $v_1, v_2, \ldots, v_j$
from the set $V$.
$a_1, a_2, \ldots, a_k$: denote the attribute values.

The Naïve Bayes classifier makes the simplifying assumption that the attribute
values are conditionally independent given the target value, i.e.

$$P(a_1, a_2, \ldots, a_k \mid v_j) = \prod_i P(a_i \mid v_j)$$

(Maqbool and Babri 2007). So, according to the Naïve Bayes classifier, the most
probable target value is given by the equation:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$
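
A minimal sketch of a Naïve Bayes classifier is given below. It selects the
target value that maximises the prior probability multiplied by the product of
the per-attribute conditional probabilities. The choice of attributes
(directory, name prefix, included header), the subsystem labels and the use of
add-one smoothing are assumptions made for illustration; they are not taken
from Maqbool and Babri.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


class NaiveBayes:
    def __init__(self, examples: List[Tuple[Sequence[str], str]]):
        self.class_counts = Counter(label for _, label in examples)
        self.total = len(examples)
        # value_counts[(i, label)][value]: how often attribute i had this value in the class
        self.value_counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen for attribute i
        for attrs, label in examples:
            for i, value in enumerate(attrs):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)

    def classify(self, attrs: Sequence[str]) -> str:
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Work with logarithms to avoid multiplying many small probabilities.
            score = math.log(count / self.total)  # log P(v_j)
            for i, value in enumerate(attrs):
                # Add-one smoothing (an assumption) keeps unseen values from zeroing the product.
                numerator = self.value_counts[(i, label)][value] + 1
                denominator = count + len(self.values[i])
                score += math.log(numerator / denominator)  # log P(a_i | v_j)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (directory, name prefix, included header) -> subsystem.
examples = [
    (("net", "sock", "socket.h"), "network"),
    (("net", "http", "socket.h"), "network"),
    (("gui", "win", "widget.h"), "interface"),
    (("gui", "btn", "widget.h"), "interface"),
]
model = NaiveBayes(examples)
print(model.classify(("net", "ftp", "socket.h")))
```

In an architecture recovery setting, the training examples would come from the
part of the system whose subsystem assignment is already known, and the
classifier would then be applied to the remaining files.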
Certain open source tools are available to automate the process of software
architecture recovery. Architectural recovery is performed as a three-step
process (Armstrong and Trudeau 1998):
Extraction: This is the process of extracting a source model from the code,
consisting of lower-level artefacts such as classes, variables and functions,
and the relationships between them.
Classification: In this process, the lower-level components and their
relationships are combined to form more abstract components such as files,
modules and subsystems.
Visualisation: This step produces diagrammatic representations for further
analysis.
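
The three steps can be sketched as a small pipeline. For the sake of a
self-contained example, the sketch parses Python sources with the standard
`ast` module instead of parsing C, lifts function-level calls to file-level
dependencies, and emits Graphviz DOT text as the visualisation. The function
names and the input files are hypothetical, and the tools discussed in this
section work differently in detail.

```python
import ast
from typing import Dict, Set, Tuple


def extract(sources: Dict[str, str]) -> Tuple[Dict[str, str], Set[Tuple[str, str]]]:
    """Extraction: find function definitions and the direct calls between them."""
    trees = {name: ast.parse(code) for name, code in sources.items()}
    defined_in: Dict[str, str] = {}
    for filename, tree in trees.items():
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                defined_in[node.name] = filename
    calls: Set[Tuple[str, str]] = set()
    for tree in trees.values():
        for func in ast.walk(tree):
            if not isinstance(func, ast.FunctionDef):
                continue
            for node in ast.walk(func):
                # Only plain calls such as save(1) to known functions are recorded.
                if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                        and node.func.id in defined_in):
                    calls.add((func.name, node.func.id))
    return defined_in, calls


def classify(defined_in: Dict[str, str],
             calls: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Classification: lift function-level calls to file-level dependencies."""
    return {(defined_in[a], defined_in[b]) for a, b in calls
            if defined_in[a] != defined_in[b]}


def visualise(file_deps: Set[Tuple[str, str]]) -> str:
    """Visualisation: render the file-level dependencies as Graphviz DOT text."""
    lines = ["digraph architecture {"]
    lines += [f'  "{a}" -> "{b}";' for a, b in sorted(file_deps)]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-file system; the sources are only parsed, never executed.
sources = {
    "store.py": "def save(x):\n    return x\n",
    "app.py": "def main():\n    return save(1)\n",
}
defined_in, calls = extract(sources)
dot = visualise(classify(defined_in, calls))
print(dot)
```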
Some of the commonly used architectural recovery tools are:
Rigi
Dali
The Software Bookshelf (PBS)
CIA (C Information Abstraction)
SNiFF+
Rigi: It is a public domain tool from the University of Victoria that is used
to understand large information spaces. It can extract, organize, abstract and
visualize components (M et al. 1993). It consists of a C language parser called
rigiparse and a graphical tool called rigiedit.
Dali: It is a prototype tool developed at Carnegie Mellon University. It
assists in interpreting the extracted data as architectural information (Kazman
et al. 1999).
The Software Bookshelf (PBS): This tool was developed at the University of
Toronto as a prototype reverse engineering tool for working on legacy systems
(Finnigan et al. 1997). It contains three components: a C language parser
called cfx, a relationship abstraction tool called grok and a Java-based user
interface called lsedit for visualizing the architectural components.
C Information Abstraction (CIA): It is a relational database developed at AT&T
Bell Research Laboratory. It is used to extract information about the source
code and store it in a relational database (Chen et al. 1990). CIA includes a
tool called ciao, which programmers use to query and visualize the data in the
CIA database.
SNiFF+: It was developed by the Take Five Corporation. It is an extensible and
scalable programming tool for both C and C++, and is used for parsing and
information retrieval.
Because documentation of the software architecture is often incomplete, reverse
engineering techniques may make it possible to obtain the ground truth software
architecture. Automated reverse engineering tools such as Rigi, one of the
best-known reverse engineering tools, have reduced the manual work involved in
recovering the architecture (Armstrong and Trudeau 1998).