Volume 6 Number 2 September 2016

1

Effective Analysis of Financial Data using Knowledge Discovery Process
G. Arutjothi, C. Senthamarai

Abstract: Finance is the biggest factor in the banking industry, where success and failure depend on credit. Banking industries are competitive today, with increasing volume, velocity and variety of new and existing data, which makes managing and analyzing massive data more difficult. One of the critical problems in a financial organization is evaluating credit risk properly, and credit risk is the biggest challenge for the banking industry. Credit risk encompasses the borrower's ability and willingness to pay, and it is one of the main factors in defining a lender's credit policy. This research paper focuses on reducing credit risk using a credit evaluation model. The model uses data mining techniques such as decision trees, support vector machines and logistic regression, applied with the Weka tool, and provides information for making decisions on loan proposals.
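Of the three techniques named, logistic regression is the easiest to show compactly. The sketch below is not the paper's Weka model. It fits a logistic regression by batch gradient descent in plain Python and uses the fitted model to approve or reject loan applicants. The two features (normalized income and credit-history score), the six applicants and their labels are all invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (approve) if the estimated repayment probability exceeds the threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > threshold else 0

# Hypothetical applicants: [normalized income, normalized credit-history score].
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.3], [0.3, 0.1], [0.1, 0.2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = repaid, 0 = defaulted (invented labels)

if __name__ == "__main__":
    w, b = train_logistic(X, y)
    print([predict(w, b, x) for x in X])
```

A lender could raise `threshold` above 0.5 to approve fewer, safer loans. This trades missed business against credit risk.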

2

Perslustrate of Detection Methodology against Clone Attacks in Wireless Sensor Networks
J. Sybi Cynthia, D. Shalini Punithavathani

Wireless Sensor Networks (WSNs) are a capable technology with immense potential for deployment in critical situations such as battlefields and in commercial applications such as construction, traffic observation, environment monitoring and numerous other scenarios. One of the major challenges WSNs face today is security breaches, and network security is essential to counter them. WSNs deployed in hostile environments are susceptible to clone attacks, in which an adversary captures a node and replicates it. A clone attack is probably the most vigorous adversary in a WSN, especially on a battlefield, and the field is waking up, belatedly, to the threat of clones in WSNs. The research community needs to be better organized to develop new architectures, systems and applications, and to assess the alternatives and trade-offs in emerging technologies for successful deployment. In response, this paper surveys the various methodologies for detecting clone attacks in WSNs, together with their evaluation metrics, based on security primitives. The survey is intended to support future research in the field.
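One family of detection methods the survey covers is centralized detection. In this approach, nodes forward signed location claims to the base station, which flags any node ID that is claimed at two distant positions. The sketch below covers only that final check. It assumes the claims have already been authenticated. The `(node_id, (x, y))` claim format and the `tolerance` parameter are hypothetical.

```python
import math
from collections import defaultdict

def detect_clones(claims, tolerance=1.0):
    """Return node IDs that appear at two locations farther apart than `tolerance`.

    `claims` is an iterable of (node_id, (x, y)) location claims received by the
    base station (a hypothetical format used only for this illustration).
    """
    seen = defaultdict(list)  # node_id -> positions claimed so far
    clones = set()
    for node_id, pos in claims:
        # A genuine static node always reports (nearly) the same position.
        if any(math.dist(prev, pos) > tolerance for prev in seen[node_id]):
            clones.add(node_id)
        seen[node_id].append(pos)
    return clones

if __name__ == "__main__":
    claims = [("n1", (0, 0)), ("n2", (5, 5)), ("n1", (0, 0.5)),
              ("n3", (2, 2)), ("n3", (40, 40))]
    print(detect_clones(claims))
```

The base station is a single point of failure and a communication hotspot. This is one reason the literature also studies distributed, witness-based schemes.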

3

Tree Based Opinion Mining in Tamil for Product Recommendation using R
A. Sharmista, M. Ramaswami

Sentiment analysis, the automated extraction of expressions of positive or negative attitudes from text, has received considerable attention from researchers during the past decade. In addition, the number of internet users has been growing fast alongside emerging technologies, and these users actively use online review sites, social networks and personal blogs to express their opinions. In this paper we discuss some of the challenges in sentiment extraction, especially in the Tamil language, and some of the approaches taken to address them. Our approach analyses sentiments from Twitter. Tamil, a Dravidian language, has a very rich, agglutinative morphological structure: Tamil words are made up of lexical roots followed by one or more affixes, mostly suffixes. Tamil is also a postpositional, inflectional language, so identifying the grammatical category of a word in Tamil is very complex. We have developed a parts-of-speech tagging system to handle nouns and verbs. We try to resolve this complexity by identifying categorical ambiguities and developing decision tree classification techniques at the grammatical category and grammatical feature levels. These techniques were used to annotate the corpora and were trained using the R tool. The results obtained at each level were encouraging.
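Tamil is agglutinative and mostly suffixing, so a word's trailing characters are natural inputs for a decision tree tagger. The Python sketch below is an assumption, not the authors' R feature set. It turns a word into suffix features that a tree learner could split on. Slicing is by Unicode code point, so a written letter such as "ள்" counts as two code points. A grapheme-aware segmentation is a possible refinement.

```python
def suffix_features(word, max_len=3):
    """Map a word to its trailing substrings, one feature per suffix length.

    Hypothetical feature set for a decision-tree POS tagger. Suffixes are cut
    by code point, not by written Tamil letter.
    """
    features = {"length": len(word)}
    for k in range(1, max_len + 1):
        features[f"suffix_{k}"] = word[-k:]  # shorter words yield the whole word
    return features

if __name__ == "__main__":
    # "பூக்கள்" (flowers) ends in the plural suffix "கள்".
    print(suffix_features("பூக்கள்"))
```

A tree trained on such features can learn, for example, that a `suffix_3` of "கள்" strongly indicates a plural noun. Ambiguous endings still require further splits.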

4

Lung Cancer Image Segmentation and Classification using Soft Computing Techniques
G. Madhu Bala, I. Laurence Auroquiaraj

Abstract: In current medical diagnosis, treatment and surgery, medical imaging plays one of the most significant roles, since imaging devices such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and ultrasound yield a great deal of information about diseases and organs. Lung cancer is the uncontrolled growth of abnormal cells that start in one or both lungs, usually in the cells that line the air passages. The abnormal cells do not develop into healthy lung tissue; they divide rapidly and form tumors. Because noise degrades the features of CT images, the widespread use of CT imaging calls for filters that reduce noise. In related work, most methods segment CT images by thresholding, and most previous works compare different thresholding-based segmentation algorithms on characteristics such as correctness, stability with respect to parameter choice and stability with respect to image choice. Performance measures such as precision, specificity and false positive rate are used to evaluate accuracy. The experimental evaluation shows that the artificial neural network (ANN) performs better than the individual classifiers on the lung image data sets.
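The evaluation measures named above come from a pixel-level confusion matrix. The sketch below applies a global intensity threshold to a tiny grayscale patch and compares the result with a ground-truth mask. It then computes precision, specificity and false positive rate. The 3x3 patch, the mask and the threshold of 150 are invented for illustration.

```python
def threshold_segment(image, t):
    """Global thresholding: pixels at or above `t` become foreground (1)."""
    return [[1 if px >= t else 0 for px in row] for row in image]

def confusion(pred, truth):
    """Count true/false positives and negatives between two binary masks."""
    tp = fp = tn = fn = 0
    for prow, trow in zip(pred, truth):
        for p, g in zip(prow, trow):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    safe = lambda num, den: num / den if den else 0.0  # avoid division by zero
    return {"precision": safe(tp, tp + fp),
            "specificity": safe(tn, tn + fp),
            "false_positive_rate": safe(fp, fp + tn)}

# Hypothetical 8-bit CT patch and its ground-truth nodule mask.
IMAGE = [[30, 200, 210], [25, 190, 90], [20, 35, 180]]
TRUTH = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]

if __name__ == "__main__":
    counts = confusion(threshold_segment(IMAGE, 150), TRUTH)
    print(counts, metrics(*counts))
```

Specificity and false positive rate always sum to 1, so reporting both adds readability rather than information.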

5

Mining Educational Data to Predicting Higher Secondary Students Performance
A. Dinesh Kumar, V. Radhika

Abstract: Education means imparting knowledge to students and developing their innate qualities. In recent years, data mining techniques have been applied in the educational field to discover hidden knowledge in educational data. A great deal of research has been done on identifying the factors that affect students' performance, which can be grouped into psychological and environmental factors. Student performance is affected by factors such as the learning environment, economic condition, peer group and family; this study focuses on environmental factors and educational institution factors. Predicting student performance is essential for higher secondary teachers, who need to group their students as excellent, average and below-average performers. In this study, students' environmental and educational factors are compared. The C4.5 and ID3 decision tree algorithms are applied to predict student performance, with the Ranker feature selection technique.
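ID3 chooses each split by information gain. C4.5 instead divides the gain by the split information, giving the gain ratio. The sketch below computes both scores for categorical attributes and picks the best attribute at a node. The student records, attribute names and class labels are invented for illustration and are not the study's data.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """ID3 criterion: entropy of the target minus weighted entropy after splitting on `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def gain_ratio(rows, attr, target):
    """C4.5 criterion: information gain normalized by the split information."""
    split_info = entropy([r[attr] for r in rows])
    return information_gain(rows, attr, target) / split_info if split_info else 0.0

def best_attribute(rows, attrs, target, score=information_gain):
    return max(attrs, key=lambda a: score(rows, a, target))

# Hypothetical student records.
ROWS = [
    {"study_env": "good", "family_income": "high", "result": "excellent"},
    {"study_env": "good", "family_income": "low", "result": "excellent"},
    {"study_env": "poor", "family_income": "high", "result": "below_average"},
    {"study_env": "poor", "family_income": "low", "result": "below_average"},
]

if __name__ == "__main__":
    print(best_attribute(ROWS, ["study_env", "family_income"], "result"))
```

Passing `score=gain_ratio` switches the selection to the C4.5 criterion. The gain ratio penalizes attributes with many distinct values.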

6

An Experimental Investigation on Combinatorial Testing with Proficient Constraints
M. Bharathi, V. Sangeetha

Abstract: Combinatorial testing concentrates on pinpointing errors and faults that arise from interactions between different software parameters. Whenever a process has multiple variables that may interact with each other, we use combinatorial analysis. The variables may derive from a diversity of sources, such as different operating systems, peripherals, databases or a network. In combinatorial testing, the task is to verify that distinct combinations of variables are handled correctly by the system. Combinatorial methods help to decrease the cost and increase the efficiency of software testing in numerous applications, covering the possible interactions with a better trade-off between cost and time. Due to resource constraints, it is nearly always impossible to test all combinations of parameter values exhaustively, and pairwise testing is the most familiar test planning technique for this situation. In this paper we propose an approach that generates pairwise test cases by prioritizing input parameters and their values, handles constraints between parameters and values, removes invalid test cases, and evaluates the results for 100% coverage under the specified constraints.
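A common way to build a pairwise suite is a greedy loop, and the sketch below uses that approach; it is not the authors' prioritization method. It enumerates all parameter combinations, discards those that violate a constraint, and repeatedly picks the combination that covers the most still-uncovered value pairs. Coverage therefore reaches 100% of the pairs that are valid under the constraints. Enumeration is only practical for small models. The parameter model and the `no_linux_chrome` constraint are hypothetical.

```python
from itertools import combinations, product

def pairwise_suite(params, is_valid=lambda test: True):
    """Greedy all-pairs generation over the valid combinations of `params`."""
    names = list(params)
    candidates = [c for c in product(*(params[n] for n in names))
                  if is_valid(dict(zip(names, c)))]  # drop invalid test cases

    def pairs_of(combo):
        return {((i, combo[i]), (j, combo[j]))
                for i, j in combinations(range(len(names)), 2)}

    # Only pairs that occur in some valid combination need to be covered.
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(dict(zip(names, best)))
        uncovered -= pairs_of(best)
    return suite

# Hypothetical configuration model and constraint.
PARAMS = {"os": ["linux", "windows"], "db": ["mysql", "postgres"],
          "browser": ["firefox", "chrome"]}

def no_linux_chrome(test):
    return not (test["os"] == "linux" and test["browser"] == "chrome")

if __name__ == "__main__":
    for test in pairwise_suite(PARAMS, no_linux_chrome):
        print(test)
```

Without the constraint, this model needs 4 tests instead of the 8 exhaustive combinations. The savings grow quickly as parameters are added.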

7

Analysis of Diabetic Awareness among the General Public in Tamil Nadu using Social Networking Data
J. Ramsingh, V. Bhuvaneswari

Abstract: Big data has rapidly developed into a hot topic that attracts extensive attention from sectors such as industry, government, health care and agriculture. Big data inexorably draws the attention of data analysts around the world. In India there is a huge healthcare burden from diabetes due to rapid urbanization and lifestyle changes. India faces several challenges in diabetes management due to a lack of disease awareness and the rising prevalence of diabetic complications among the public. Many barriers prevail among patients because of the existing health care systems, so diabetes awareness must be analyzed to make India healthier. New technologies have evolved that serve as tools to engage and involve patients in health care. This paper describes the role of big data analytics and Hadoop in revealing awareness of various aspects of Diabetes Mellitus (DM), which is essential for the prevention, management and control of the disease. Several studies have consistently shown that awareness of DM in the general population is low, which constitutes a major public health problem in the country. The data for this analysis were collected through the social network WhatsApp.
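The Hadoop pattern behind this kind of analysis is MapReduce. A mapper emits `(key, 1)` pairs, the framework sorts and groups them by key, and a reducer sums each group. The sketch below simulates the three phases in one Python process, counting awareness-related keywords in messages. The keyword list and the sample messages are hypothetical. With Hadoop Streaming, the mapper and reducer would instead be separate scripts that read standard input.

```python
from itertools import groupby

# Hypothetical awareness keywords; a real study would use a curated vocabulary.
KEYWORDS = {"diabetes", "sugar", "insulin", "exercise", "diet"}

def mapper(message):
    """Emit (keyword, 1) for every keyword occurrence in one message."""
    for word in message.lower().split():
        token = word.strip(".,!?;:")
        if token in KEYWORDS:
            yield token, 1

def reducer(key, values):
    return key, sum(values)

def run_job(messages):
    mapped = [kv for m in messages for kv in mapper(m)]
    mapped.sort(key=lambda kv: kv[0])  # stands in for Hadoop's shuffle-and-sort
    return dict(reducer(key, (v for _, v in group))
                for key, group in groupby(mapped, key=lambda kv: kv[0]))

MESSAGES = [
    "Diabetes can be controlled with diet and exercise.",
    "Check your sugar level; diabetes is silent!",
    "Insulin is not only for type 1",
]

if __name__ == "__main__":
    print(run_job(MESSAGES))
```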

8

Predictive Analysis for Weather Prediction using Data Mining with ANN: A Study
R. Samya, R. Rathipriya

Abstract: Weather forecasting has always been a scientifically and technologically challenging problem. Nowadays, cloudburst is one of the important forecasting problems because it can be hugely disastrous: more than 20 mm of rain may fall in a few minutes. Cloudbursts are also responsible for flash floods, and such sudden floods affect people severely, both economically and physically. It is therefore necessary to forecast cloudbursts early to avoid disasters. The main aim of this paper is to survey the cloudburst forecasting techniques in the literature that use data mining and Artificial Neural Networks (ANN). The most commonly used parameters for cloudburst forecasting are temperature, rainfall, evaporation and wind speed. The study concludes that forecasting with big data analytics is the best solution for accurate cloudburst prediction.
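The simplest artificial neural network is a single perceptron, a weighted sum followed by a step activation and trained by error correction. The sketch below trains one to flag cloudburst conditions from two normalized inputs. The survey covers richer ANN architectures, so this is only an illustration of the basic unit. The feature choice (normalized humidity and rainfall intensity) and all six samples are invented.

```python
def predict(w, b, x):
    """Step activation: 1 (cloudburst risk) if the weighted sum is positive."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=1000):
    """Classic perceptron learning rule; stops after an error-free pass."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            update = lr * (yi - predict(w, b, xi))
            if update:
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
                errors += 1
        if errors == 0:
            break
    return w, b

# Hypothetical samples: [normalized humidity, normalized rainfall intensity].
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]  # 1 = cloudburst observed (invented labels)

if __name__ == "__main__":
    w, b = train_perceptron(X, y)
    print(w, b, [predict(w, b, x) for x in X])
```

A single perceptron can only learn linearly separable patterns. Real forecasting work therefore typically uses multilayer networks trained with backpropagation.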

9

Ant Inspired Routing Protocols for Wireless Sensor Networks
O. Deepa, J. Suguna

Abstract: Internet social media services such as Twitter have seen phenomenal growth, as millions of users share opinions on different aspects of life every day. This tremendous growth has induced interest in using such data to extract valuable information, such as users' opinions and locations. In this paper we have analyzed tweets related to crimes against women and children, the different sorts of crimes that are prevailing, and the locations from which crime-related tweets occur most frequently. The proposed work uses the R language to extract real-time tweets and relies on a Hadoop-based framework to store the tweets, given their large number. The tweets are parsed in the Hive environment, and we build a sentiment classifier in R that determines positive, negative and neutral sentiment for a given phrase. We observe that the elapsed processing time under the Hadoop-based framework is significantly better than that of conventional methods, making the framework well suited to real-time streaming tweets.
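A three-way classifier like the one described can be approximated by a lexicon score. The score is the count of positive words minus negative words, and its sign gives the class, with zero meaning neutral. The sketch below is in Python rather than R and is not the authors' classifier. Both word lists are hypothetical and far smaller than any usable lexicon.

```python
# Hypothetical lexicons for crime-related tweets.
POSITIVE = {"safe", "arrested", "justice", "support"}
NEGATIVE = {"crime", "harassment", "unsafe", "assault"}

def classify(text):
    """Label a phrase positive, negative or neutral by lexicon word counts."""
    tokens = [t.strip(".,!?#@").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for tweet in ["Police arrested the accused, justice served",
                  "Another harassment case reported #unsafe",
                  "Meeting at 5 pm today"]:
        print(classify(tweet), "|", tweet)
```

Stripping `#` and `@` lets hashtags and mentions match lexicon entries. A production pipeline would also handle negation and emoji.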

10

Map Reduce K-Means based Co-Clustering Approach for Web Page Recommendation System
K. Krishnaveni, R. Rathipriya

Abstract: Co-clustering is one of the data mining techniques used for web usage mining. Co-clustering web log data is the process of simultaneously categorizing both users and pages, and it is used to extract user information based on subsets of pages. Nowadays, cyberspace is filled with huge volumes of data distributed across the world, and acquiring business knowledge from such voluminous data with conventional systems is challenging. To overcome this complexity, Google introduced MapReduce, a programming model for parallel processing in a distributed environment. In this paper, a MapReduce K-Means based Co-Clustering approach (CC-MR) is proposed for web usage data to identify constant browsing patterns, which are very useful for e-commerce applications such as target marketing and recommendation systems. The benchmark K-Means clustering algorithm is used to generate constant co-clusters from the web data. Experiments on a real-time web dataset evaluate the performance of the proposed approach. The experiments are implemented in MATLAB R2016, and the approach yields promising results.
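In the MapReduce formulation of K-Means, each map call assigns one point to its nearest centroid and emits `(centroid_id, point)`. Each reduce call averages the points for one centroid to produce the next centroid. The sketch below simulates these phases in a single Python process. It covers only the K-Means step, not the co-clustering of users and pages built on top of it. The points are hypothetical; in web usage mining, each might be a user's page-visit vector.

```python
import math
from collections import defaultdict

def kmeans_map(point, centroids):
    """Map: emit (index of nearest centroid, point)."""
    cid = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return cid, point

def kmeans_reduce(points):
    """Reduce: the new centroid is the coordinate-wise mean of its points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        cid, pt = kmeans_map(p, centroids)
        groups[cid].append(pt)
    # An empty cluster keeps its previous centroid.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]

def kmeans(points, centroids, max_iter=20):
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids

POINTS = [(0, 0), (0, 2), (10, 10), (10, 12)]  # hypothetical user vectors

if __name__ == "__main__":
    print(kmeans(POINTS, [(0, 0), (0, 2)]))
```

On a cluster, each iteration is one MapReduce job and the current centroids are broadcast to every mapper. This per-iteration job overhead is the main cost of running K-Means on MapReduce.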