KEY DATA EXTRACTION AND EMOTION ANALYSIS OF DIGITAL SHOPPING BASED ON BERT

Published Oct 8, 2021
Mayakannan Selvaraju, Sarika Jay, B VA N S S Prabhakar Rao

Abstract

Purpose: The objective of this paper is to extract the key words about product quality and the customer experience with it in a more efficient and accurate way, by pre-training a Bidirectional Encoder Representations from Transformers (BERT) model with quality-domain knowledge and classifying the result with a deep learning technique.

Methodology: The dataset is Amazon reviews, a medium-to-large collection combining customer reviews of a single product with reviews of several products. The dataset goes through an initial process of cleaning, data wrangling, and exploratory data analysis, and is then modelled with pre-trained BERT together with a neural-network classifier. The BERT classifier is loaded along with its tokenizer in the input modules, the BERT model is configured and trained for fine-tuning, and the prediction is made from the final fine-tuned model.
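A minimal sketch of this loading, fine-tuning, and prediction flow, assuming the HuggingFace transformers and PyTorch APIs (the model name, label encoding, example texts, and hyperparameters below are illustrative assumptions, not the paper's exact choices):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained BERT tokenizer and a sequence-classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize a batch of review texts into input IDs and attention masks.
reviews = ["Love my Echo!", "Item stopped working after a week."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative (hypothetical encoding)
encoded = tokenizer(reviews, padding=True, truncation=True, max_length=128, return_tensors="pt")

# One fine-tuning step: the forward pass returns the classification loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
optimizer.zero_grad()
outputs = model(**encoded, labels=labels)
outputs.loss.backward()
optimizer.step()

# Prediction with the fine-tuned model.
model.eval()
with torch.no_grad():
    logits = model(**encoded).logits
predictions = logits.argmax(dim=-1)
```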

Findings: A BERT model together with a TF-IDF topic-extraction model was implemented to analyse the trend and theme of the outbreak, which eventually helped to analyse public concerns and the appropriate health support. Fine-tuning of a Chinese BERT model with a softmax neural-network layer was used to train the model to classify three sentiments, which resulted in 75.65% accuracy. Higher accuracy was expected; improvements to the modelling and additional datasets from different parts of the world would yield better accuracy with regard to public concerns. A function is written to output a sample permutation, and its replications yield a single statistic. We consider the null hypothesis that the word distributions are identical, setting a minimum value of 5.9; a p-value of 0.0 indicates that the null hypothesis is rejected. The baseline is a TF-IDF model with logistic regression, for which a prediction function and prediction-matrix values are generated. The model weights and tuning are interpreted with the help of the Eli5 library. The pre-trained model is initialized, and its configuration is used with encoding and pooling layers of dimensionality 768; with this initialization, logits for the input sequence are generated.
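The baseline step can be illustrated with a short sketch, assuming scikit-learn for the TF-IDF + logistic-regression pipeline and the Eli5 library for inspecting the learned weights (the example texts, labels, and pipeline step names are assumptions, not the paper's data):

```python
import eli5
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["sound quality is great", "alexa stopped responding", "works perfectly"]
labels = [1, 0, 1]  # 1 = positive, 0 = negative (hypothetical encoding)

# Baseline pipeline: TF-IDF features fed to a logistic-regression classifier.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(texts, labels)

# Prediction function: class probabilities for new reviews.
probs = baseline.predict_proba(["very disappointed with the speaker"])

# Interpret the learned per-word weights with the Eli5 library.
explanation = eli5.explain_weights(
    baseline.named_steps["logisticregression"],
    vec=baseline.named_steps["tfidfvectorizer"],
)
print(eli5.format_as_text(explanation))
```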

Originality/value: The pre-trained BERT model tokenizes the input dataset, which is the Amazon Alexa product-review dataset. While the input is loaded, a pre-cleaning process is applied, such as balancing the negative and positive comments so that predictions are easier to make. To quantify the difference between negative and positive comments, a permutation test is implemented and the p-value is calculated from it.
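A minimal sketch of such a permutation test, assuming NumPy; the statistic (difference in mean score between the positive and negative groups) and the toy numbers are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_p_value(pos_scores, neg_scores, n_permutations=10_000):
    """Two-sided p-value for the difference in mean score between groups."""
    pooled = np.concatenate([pos_scores, neg_scores])
    n_pos = len(pos_scores)
    observed = pos_scores.mean() - neg_scores.mean()

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # one sample permutation
        replicate = shuffled[:n_pos].mean() - shuffled[n_pos:].mean()  # single statistic
        if abs(replicate) >= abs(observed):
            count += 1
    return count / n_permutations

# Example: scores of positive vs. negative Alexa reviews (toy numbers).
pos = np.array([12.0, 15.0, 11.0, 14.0, 13.0])
neg = np.array([25.0, 22.0, 27.0, 24.0, 26.0])
print(permutation_p_value(pos, neg))  # a p-value near 0.0 rejects the null hypothesis
```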

How to Cite

Selvaraju, M., Sarika Jay, & B VA N S S Prabhakar Rao. (2021). KEY DATA EXTRACTION AND EMOTION ANALYSIS OF DIGITAL SHOPPING BASED ON BERT. SPAST Abstracts, 1(01). Retrieved from https://spast.org/techrep/article/view/1742

Article Details

Keywords

Natural Language Processing, Text Summarization, Extraction Techniques, Sentiment Analysis.

Section
GE3- Computers & Information Technology
