Shubham to analyse behaviour of e-customers and extract information

Shubham Kadam, Pratiksha Kapse, Vedika Jadhav, Akshay Kotgirwar
Department of Computer Engineering, Smt. Kashibai Navale College of Engineering, Pune

ABSTRACT
On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize and efficiently deliver relevant information in order to alleviate information overload, which has become a problem for many Internet users. Recommendation systems address this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This project explores the characteristics and potential of different prediction techniques in recommendation systems, in order to serve as a compass for research and practice in the field. The extensive use of the Internet is fundamentally changing the way we live and communicate. Consequently, the requirements of users while browsing the Internet are changing drastically. Recommender Systems (RSs) provide a technology that helps users find relevant content on the Internet. Innovations in Internet technology and their effects on users have stimulated research in the area of recommender systems. In a regular retail shop, the behaviour of customers may reveal a lot to the shop assistant. In online shopping, however, it is not possible to observe and analyse customer behaviour such as facial expressions or the products customers check or touch. In this case, applying data mining techniques to e-customer data may provide some hints about their buying behaviour. In this study, we present a model to analyse the behaviour of e-customers, extract information and make predictions about their shopping behaviour on a digital marketplace.
This project performs a data mining application and extracts online customers' behaviour patterns, in particular whether a customer buys a product or not. Findings are discussed in the conclusion.

Keywords
data mining, clickstream, e-customer, customer behaviour, digital market, location based, recommendation system.

1. INTRODUCTION
Products and services can be marketed using digital technology so that they reach consumers. The digital market offers online customers more choices, lower prices, and easy search and access, which is why it is expanding day by day. As a result, customers' behaviour patterns are gaining importance in online buying [2]. Since the 1990s, the development of electronic markets has entirely changed the way customers perform transactions [3]. Because of the digital market, traditional markets have become an alternative source. Customer behaviour is the study of when, why, how and where people do or do not buy a product. Understanding the customer's needs is essential when designing an online e-commerce application.

Another application of data mining is web mining, which helps in discovering usage patterns and behaviours from web data [5]. Clickstream data is an example of the data used in web mining. This data helps in serving clients' requests and can also improve the sales of the business. Web usage mining has gained much attention from research and e-business professionals, and it offers many benefits to an e-commerce web site, such as:
• Targeting customers based on usage behaviour or profile (personalization)
• Adjusting web content and structure dynamically based on the page access patterns of users (adaptive web site) [5]
With the increase in web usage, clickstream data and the recorded mouse movements of online customers, we have implemented this model to analyse the behaviour of customers.
In our model, the web data of customers is generated dynamically, and the analysis is performed on the attributes defined in the section on the dataset used.

2. MOTIVATION
Recommendation systems help users find and select products from the huge number available on the web or in other electronic information sources. Recommender systems are applications that provide personalized advice to users about products or services they might be interested in. They are playing a major role in the digital and social networking revolution and are becoming a part of everyday life. They help people efficiently manage content overload and dive into the long tail of content discovery. Their social prevalence is evidenced by the evolution of, and demand for, personalized radio, television, video and online shopping. Given a large set of products and a description of the user's needs, they present to the user a small set of items that are well suited to the description. Recent work in recommendation systems includes intelligent aides for filtering and choosing web sites, news stories, TV listings and other information.

The users of such systems often have diverse, conflicting needs. Differences in personal preferences, social and educational backgrounds, and private or professional interests are pervasive. As a result, it seems desirable to have personalized intelligent systems that process, filter and display available information in a manner that suits each individual using them. The need for personalization has led to the development of systems that adapt themselves by changing their behaviour based on the inferred characteristics of the user interacting with them. The ability of computers to converse with users in natural language would arguably increase their usefulness and flexibility even further.
The proposed system is a personalized, location-based recommendation system designed to help users choose an item from a large set of items of the same basic type. The main objective is to support recommendations that become more efficient for individual users over time.

3. OVERVIEW
Existing System
The existing system [1] reports the following statistics. A confusion matrix scorer is applied to calculate accuracy. Table 1 presents the confusion matrices and accuracy statistics for both decision tree and artificial neural network analysis. Table 1 shows that the overall prediction accuracy is 90.42%. The accuracy at predicting whether a customer will leave without buying is 96.3%. When it comes to predicting whether a customer will pay and leave, the accuracy is 40.2%. Since the F-measure is 0.947 for predicting whether a customer will leave without paying, the rules generated by decision tree analysis may be used for some business purposes.

Drawbacks of Existing System
The existing system only demonstrates the practicability of the approach. It uses only some of the data features described in the dataset. Using neural network analysis and decision tree analysis together can become a complicated task. Also, offers are not given instantly to interested customers [1].

Need for the Naïve Bayes Algorithm
Using this algorithm, multiple features described in the dataset can be used at once to predict whether a person will purchase an item or not. Naive Bayes can give accurate results on large datasets. Offers can be given to interested customers instantly, and analysis can also be provided for a particular area.

4. THE DATASET USED IN THE STUDY
The dataset is the same as the one used in the existing system [1]. A server-side program collects clickstream data from the company's web server; at the same time, a JavaScript program collects data from the client side.
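As an illustration of the attributes defined in this section, one click record might be held in a small typed structure such as the following. This is a minimal sketch in Python, not the collection code used in the study: the class name `ClickRecord`, the field names, the use of 0 for "no search category" and `None` for an empty basket, and the example values are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Value sets taken from the attribute descriptions in this section.
PERIODS = ("morning", "afternoon", "evening", "after midnight")
BASKET_CATEGORIES = ("female", "male", "unisex", "child (girl)", "child (boy)")
BINARY_FIELDS = ("special_day", "search", "discounted_in_basket",
                 "left_without_purchase")


@dataclass(frozen=True)
class ClickRecord:
    """One product click within a visit (hypothetical field names)."""
    special_day: int                # 1 if shortly before an official or religious day
    day: str                        # day of the week, e.g. "Sunday"
    period_of_day: str              # one of PERIODS
    time_on_site_s: int             # total time spent on the site, in seconds
    search: int                     # 1 if the customer searched by keyword
    search_category: int            # 1..4; 0 means no search (assumption)
    basket_item_count: int          # number of different items in the basket
    discounted_in_basket: int       # 1 if at least one discounted item
    basket_category: Optional[str]  # one of BASKET_CATEGORIES; None if empty (assumption)
    item_add_time_s: int            # time of the first add to basket; 0 if empty
    click_count: int                # number of product clicks
    click_no: int                   # order of this click: 1, 2, 3, ...
    clicked_item: int               # 1..4, same grouping as search_category
    click_time_s: int               # seconds into the visit
    source: int                     # 1 search engine, 2 promotional mail, 3 other
    left_without_purchase: int      # target: 0 = paid, 1 = left without purchase

    def __post_init__(self):
        # Reject values outside the ranges described for each attribute.
        for name in BINARY_FIELDS:
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")
        if self.period_of_day not in PERIODS:
            raise ValueError("unknown period of day")
        if self.basket_category not in BASKET_CATEGORIES + (None,):
            raise ValueError("unknown basket product category")
        if self.search_category not in range(0, 5):
            raise ValueError("search_category must be 0..4")
        if self.clicked_item not in range(1, 5):
            raise ValueError("clicked_item must be 1..4")
        if self.source not in (1, 2, 3):
            raise ValueError("source must be 1, 2 or 3")


# Made-up values for illustration only.
example = ClickRecord(
    special_day=0, day="Monday", period_of_day="evening", time_on_site_s=420,
    search=1, search_category=2, basket_item_count=1, discounted_in_basket=1,
    basket_category="female", item_add_time_s=180, click_count=3, click_no=1,
    clicked_item=2, click_time_s=100, source=1, left_without_purchase=0,
)

# Derive a record for the next click of the same visit.
second_click = replace(example, click_no=2, clicked_item=3, click_time_s=160)
```

Keeping the range checks next to the field definitions means a malformed log entry is rejected when the record is built rather than during analysis.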
The data attributes collected and used in the study are as follows:
Special day: If the date is one week or less before an official or religious day such as Christmas, Independence Day etc., this parameter takes the value 1 (true); otherwise it is 0.
Day: The day of the week: Sunday, Monday, Tuesday etc.
Period of day: There are four periods for this variable: morning, afternoon, evening and after midnight.
Time spent on the site: The total time spent on the site, in seconds.
Search: If the customer searches for a certain product on the site by entering keywords, this variable takes the value 1 (true); otherwise it is 0 (false) by default.
Category of search: Products are categorized into four groups. Skirts, jeans, shorts and pants form one group labelled 1; shoes, boots and sandals are labelled 2; dresses, jackets, cardigans, overcoats, sweaters etc. are labelled 3; and underwear products are labelled 4.
Number of items in basket: The number of different items in the online shopping basket. Two identical products are counted as one; there must be at least one difference, such as colour or size, between two products for them to count as two separate items.
Discounted item in the basket: The e-commerce company runs promotion campaigns or applies discounts. This variable identifies whether the basket contains a promotional or discounted item. If there is at least one discounted item in the basket this variable takes the value 1 (true), otherwise 0 (false).
Product category of the item in the basket: There are five categories in this field: female, male, unisex, child (girl) and child (boy).
Item add time: The time (in seconds) at which the first item is added to the basket. If the basket is empty it takes the value 0.
Amount of clicks: The number of items clicked. Menu item clicks are not counted; only clicks made on products are counted.
Click No: The order of the click made by the customer.
It takes values such as 1st, 2nd, 3rd, etc.
Clicked item: Items are labelled as in category of search: skirts, jeans, shorts and pants form one group labelled 1; shoes, boots and sandals are labelled 2; dresses, jackets, cardigans, overcoats, sweaters etc. are labelled 3; and underwear products are labelled 4.
Click time: The time (in seconds) at which the product is clicked. For example, if the user clicks the first item to examine at the 100th second of the visit, 100 is assigned to this variable.
Source: The source from which the e-customer comes to the site. This may be a search engine, another site or a promotional mail sent by the company. If the customer comes from a search engine the value is 1; if from a promotional mailing, 2; all other sites or sources are labelled 3.
Left without purchase: If the customer checks out properly by making a payment, this variable takes the value 0 (false); otherwise it is 1 (true) by default.

5. SYSTEM ARCHITECTURE
Fig. Architectural Diagram
As the above figure illustrates, there are three modules: the admin module, the client module and the cloud server. Clients log in to the site and search for products, like or dislike products, view product details and so on. Based on the clients' browsing history, the cloud server performs the analysis. The admin gives offers based on the prediction. Logs of customers' activity are maintained in the database.

Admin Module
The admin application has two main parts. The first is product management, where the admin is able to add, delete and manage products. The second is the analysis part, where the actual algorithm is implemented; the admin is also able to view the prediction results. The admin application is connected to the database. Based on the prediction results, offers are given to the individual interested customers.
The admin is able to execute queries on the database to manage products, such as add, delete and update.

Client Application Module
This module will be developed with AWT/Swing. Swing is part of Oracle's Java Foundation Classes (JFC), which provide an API for graphical user interfaces. Swing provides a more sophisticated set of GUI components than AWT and a pluggable look and feel that can emulate the native look and feel of several platforms. The client module requests services from the server. Clients first register with the application. After logging in, a customer can search for products by category, name, like/dislike or rating. A customer who is interested in buying a product can view more details about it. Customers who repeatedly view products of the same category instantly receive offers on those products.

Cloud Server Module
The cloud server runs the GlassFish server. GlassFish is an open-source Java application server. It uses a derivative of Apache Tomcat as the servlet container for serving web content. The server is responsible for user authentication and provides the services requested by users. It also maintains user logs based on clicks and activity. JDBC is used to connect to the database, which is MySQL. The server applies prediction to the logs to analyse user behaviour and predicts whether an item is likely to be purchased by the customer or not.

6. ALGORITHM
Naive Bayes is a simple technique for constructing classifier models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.
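To make the idea concrete, the sketch below trains a small categorical naive Bayes classifier on a handful of made-up visits and predicts whether a customer leaves without purchase. It is a minimal illustration in Python, not the implementation used in this project: the class `CategoricalNaiveBayes`, the choice of three features (search, discounted item in basket, source), the Laplace smoothing and the toy data are all assumptions.

```python
from collections import Counter, defaultdict
import math


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(class, feature_index)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        # Distinct values seen per feature, used by the smoothing term.
        self.feature_values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(label, i)][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_scores(self, row):
        """Return log P(c) + sum_i log P(x_i | c) for every class c."""
        total = sum(self.class_counts.values())
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)  # log prior
            for i, value in enumerate(row):
                count = self.value_counts[(c, i)][value]
                k = len(self.feature_values[i])
                # Smoothed estimate of P(x_i = value | c).
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)

    def predict_proba(self, row):
        # Normalising the joint scores plays the role of dividing by P(x).
        scores = self.log_scores(row)
        top = max(scores.values())
        unnorm = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(unnorm.values())
        return {c: v / z for c, v in unnorm.items()}


# Toy data (made up): (search, discounted_in_basket, source)
rows = [
    (1, 1, 2), (1, 1, 1), (1, 0, 2),             # visits that ended in a purchase
    (0, 0, 3), (0, 0, 1), (0, 1, 3), (0, 0, 3),  # visits that left without purchase
    (1, 0, 3),
]
# Label: left_without_purchase (0 = purchased, 1 = left)
labels = [0, 0, 0, 1, 1, 1, 1, 1]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict((1, 1, 2)))        # searched, discounted item, came from mailing
print(model.predict_proba((0, 0, 3)))  # no search, no discount, other source
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied, and the smoothing term keeps a single unseen attribute value from forcing a class probability to zero. Both are common choices when implementing naive Bayes.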
Naive Bayes is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood.

Bayes' theorem provides a way of calculating the posterior probability, P(c|x), from P(c), P(x) and P(x|c). The naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of other predictors. This assumption is called class conditional independence.

P(c|x) = P(x|c) P(c) / P(x)

where
- P(c|x) is the posterior probability of the class (target) given the predictor (attribute).
- P(c) is the prior probability of the class.
- P(x|c) is the likelihood, which is the probability of the predictor given the class.
- P(x) is the prior probability of the predictor.

7. CONCLUSION
In this study, the application analyses customers' behaviour based on different attributes. The Naive Bayes algorithm makes it possible to consider several attributes at the same time, and it can give accurate results on large datasets. This analysis is expected to increase sales. Users will receive more offers as a result, so the application helps make shopping effective and easy. With the increasing popularity of GPS-enabled mobile phones, consumers are often interested in location-based recommendations. For example, a travelling user may wish to buy things that a location is famous for; through this project the user will get recommendations, based on location, for things he or she may like. Location-based recommendation is a need of today's e-customer and also enhances business and the market for e-commerce.

8. FUTURE WORK
The design and implementation of this project can help local businesses grow online. At the moment, the issue is not the recommendation algorithms but the entire architecture of how we generate recommendations.
It is naive to assume that per-user recommendations are sufficient, because this rests on the assumption that users are static, whereas they are dynamic. In this fast-moving environment, a recommendation system should keep up with the user. This system can be integrated with other platforms when required. Because of the use of XML, there is no need to design the entire system from scratch for another platform such as iOS or Windows; only a user interface needs to be created for that platform.

REFERENCES
[1] Gokhan Silahtaroglu, Hale Donertasli, 2015. Analysis and Prediction of E-Customers Behavior by Mining Clickstream Data.
[2] Thompson S. H. Teo, 2006. To buy or not to buy online: adopters and non-adopters of online shopping in Singapore, Behaviour and Information Technology, 25(6), 497-509.
[3] Aikaterini C. et al., 2013. Online and mobile customer behaviour: a critical evaluation of Grounded Theory studies, Behaviour and Information Technology, 655-667.
[4] Schaefer, Kerstin; Kummer, Tyge-F., 2013. Determining the Performance of Website Based Relationship Marketing, Expert Systems with Applications, 7571-7578.
[5] Hu, XH; Cercone, N., 2004. A Data Warehouse/Online Analytic Processing Framework for Web Usage Mining and Business Intelligence Reporting, International Journal of Intelligent Systems, 585-606.
[6] Al-Zaidy, Rabeah; Fung, Benjamin C. M.; Youssef, Amr M., 2012. Mining criminal networks from unstructured text documents, Digital Investigation, 147-160.
[7] Domingues, Marcos Aurelio; Soares, Carlos; Jorge, Alipio Mario, 2013. Using Statistics, Visualization and Data Mining for Monitoring the Quality of Meta-Data in Web Portals, Information Systems and E-Business Management, 569-595.
[8] Vicari, Donatella; Alfo, Marco, 2014. Model based clustering of customer choice data, Computational Statistics and Data Analysis, 71 Special Issue, 3-13.
[9] Berthold MR, 2008. KNIME: the Konstanz information miner. In Data Analysis, Machine Learning and Applications, Springer-Verlag, 319-326.
[10] Yan LI, Bo-qin FENG, Yan LI, Feng WANG, 2009. Page Interest Estimation Based on the User's Browsing Behavior, Second International Conference on Information and Computing Science, 258-261.
[11] Vinay Gautam, Vivek Gautam, 2013. User Behavior Based Enhanced Protocol (UBEP) for Secure Near Field Communication, World Academy of Science, Engineering and Technology, International Journal of Computer, Information, Systems and Control Engineering, Vol. 7, No. 11, 853-863.