2nd International Conference on AI, Data Mining and Data Science (AIDD 2024)

Accepted Papers

Uncovering Greenwashing: a Study on Corporate Sustainability Reports and Public Sentiment

Charlott Jakob

ABSTRACT

Greenwashing is an increasingly widespread problem in society. It is difficult to detect due to deceptive strategies and knowledge disparity between companies and the public. Companies can present either genuine or misleading statements about their efforts in their sustainability reports (SRs). Negative public opinion might be an incentive for companies to obscure certain statements in an attempt to counteract negative headlines. In this study, we investigate the link between a company’s public image and its SR. Analysing the top 60 companies in Germany, we explore the connection between linguistic text features in their SRs and the sentiment towards them in media. The paper reveals a lack of significant feature differences between companies with positive and negative public sentiment. A single significant difference was found, indicating that companies with negative public sentiment use negations more frequently than companies with predominantly positive public sentiment. This research shows that while detecting signs of greenwashing is possible, it remains a challenging task.

Keywords

Greenwashing, Sentiment Analysis, Sustainability Reports, Natural Language Processing, Public Perception.

Facilitating Stock Recommendations Through Sentiment Analysis

Shlok Bhura, Tanish Bhilare, Rylan Nathan Lewis and Dr. Kavita Kelkar, Department of Computer Engineering, K.J. Somaiya College of Engineering, Mumbai, India

ABSTRACT

Sentiment analysis is a relatively new method of stock recommendation that assesses news articles, social media feeds, and other information sources to ascertain investor sentiment towards a particular stock using machine learning and natural language processing. The model suggests whether to buy, hold, or sell the stock based on sentiment analysis. The objective is to give investors insightful information that supports their decision-making by emphasising trends and patterns in investor sentiment. Several methods, including Decision Trees, Random Forests, Logistic Regression, and Gradient Boosting, were implemented to find the most accurate sentiment analysis model. With an accuracy score of 85.02%, the Random Forest model came out as the most appropriate among them.

Keywords

Tokenization, Stocks, Sentiment Analysis, LSTM, YFinance, Gradient Boosting, Decision Trees, Random Forests, Logistic Regression, Stock Market & TextBlob.
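
As a rough illustration of the buy, hold, or sell step described in the stock-recommendation abstract, the sketch below maps per-article sentiment polarity scores to a recommendation. It is not the authors' model: the abstract describes trained classifiers, whereas this is a simple threshold rule. The function name `recommend`, the thresholds of 0.1 and -0.1, and the assumption that polarities lie in [-1, 1] (the range TextBlob reports) are all hypothetical choices made for illustration.

```python
from statistics import fmean


def recommend(polarities, buy_above=0.1, sell_below=-0.1):
    """Map per-article polarity scores in [-1, 1] to buy / hold / sell.

    The thresholds are hypothetical; a real system would tune them
    or learn the mapping with a trained classifier.
    """
    if not polarities:
        # No sentiment evidence: stay neutral.
        return "hold"
    average = fmean(polarities)
    if average > buy_above:
        return "buy"
    if average < sell_below:
        return "sell"
    return "hold"


# Hypothetical polarity scores for three batches of news articles.
print(recommend([0.6, 0.4, 0.2]))
print(recommend([-0.5, -0.3]))
print(recommend([0.05, -0.05]))
```

Averaging before thresholding keeps a single strongly worded article from flipping the recommendation, which is one plausible design choice among several.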

Similar Data Points Identification With LLM: a Human-in-the-loop Strategy Using Summarization and Hidden State Insights

Xianlong Zeng¹, Yijing Gao², Ang Liu³, and Fanghao Song¹, ¹Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, ²Data Science Institute, Brown University, Providence, RI 02912, ³Global Urban Studies Program, Rutgers University, New Brunswick, NJ 08901

ABSTRACT

This study introduces a simple yet effective method for identifying similar data points across non-free text domains, such as tabular and image data, using Large Language Models (LLMs). Our two-step approach involves data point summarization and hidden state extraction. Initially, data is condensed via summarization using an LLM, reducing complexity and highlighting essential information in sentences. Subsequently, the summarization sentences are fed through another LLM to extract hidden states, serving as compact, feature-rich representations. This approach leverages the advanced comprehension and generative capabilities of LLMs, offering a scalable and efficient strategy for similarity identification across diverse datasets. We demonstrate the effectiveness of our method in identifying similar data points on multiple datasets. Additionally, our approach enables non-technical domain experts, such as fraud investigators or marketing operators, to quickly identify similar data points tailored to specific scenarios, demonstrating its utility in practical applications. In general, our results open new avenues for leveraging LLMs in data analysis across various domains.

Keywords

Large Language Model; Data Representation; Machine Learning.
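
The two-step shape of the approach in the similar-data-points abstract (summarize each data point as text, turn the text into a vector, compare vectors) can be sketched as follows. Both model calls are replaced by stand-ins so the example runs without any LLM: `summarize_record` uses a fixed sentence template instead of an LLM summary, and `embed` uses a hashed bag-of-words vector instead of LLM hidden states. All function names, the example records, and the use of cosine similarity are assumptions for illustration, not details taken from the paper.

```python
import math
import zlib


def summarize_record(record):
    """Stand-in for LLM summarization: render a tabular row as sentences."""
    return " ".join(f"{key} is {value}." for key, value in sorted(record.items()))


def embed(text, dim=64):
    """Stand-in for hidden-state extraction: hashed bag-of-words vector."""
    vector = [0.0] * dim
    for token in text.lower().replace(".", " ").split():
        # crc32 is stable across runs, unlike Python's built-in hash().
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, records, top_k=1):
    """Return (similarity, index) pairs for the top_k closest records."""
    query_vec = embed(summarize_record(query))
    scored = [
        (cosine(query_vec, embed(summarize_record(r))), i)
        for i, r in enumerate(records)
    ]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical transaction records a fraud investigator might compare.
records = [
    {"amount": 12, "country": "US", "channel": "store"},
    {"amount": 950, "country": "DE", "channel": "web"},
    {"amount": 40, "country": "FR", "channel": "phone"},
]
query = {"amount": 950, "country": "DE", "channel": "web"}
best_score, best_index = most_similar(query, records)[0]
print(best_index, round(best_score, 3))
```

Swapping the two stand-ins for real model calls would leave `most_similar` unchanged, which is the point of separating the summarization and representation steps.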

Uncertainty-aware Seismic Signal Discrimination Using Bayesian Convolutional Neural Networks

Soma Datta Reddy and Sunitha Palissery, Earthquake Engineering Research Centre, IIIT Hyderabad, Hyderabad, India

ABSTRACT

Seismic signal classification plays a crucial role in mitigating the impact of seismic events on human lives and infrastructure. Traditional methods in seismic hazard assessment often overlook the inherent uncertainties associated with the prediction of this complex geological phenomenon. This work introduces a probabilistic framework that leverages Bayesian principles to model and quantify uncertainty in seismic signal classification by applying a Bayesian Convolutional Neural Network (BCNN). The BCNN was trained on a dataset that comprises waveforms detected in the Southern California region and achieved an accuracy of 99.1%. Monte Carlo Sampling subsequently creates a 95% prediction interval for probabilities that considers epistemic and aleatoric uncertainties. The ability to visualize both aleatoric and epistemic uncertainties provides decision-makers with information to determine the reliability of seismic signal classifications. Further, the use of Bayesian CNN for seismic signal classification provides a more robust foundation for decision-making and risk assessment in earthquake-prone regions.

Keywords

Seismic signal classification, Bayesian networks, Uncertainty quantification, Earthquake forecasting, Model trustworthiness.
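
The Monte Carlo step in the seismic-classification abstract can be illustrated with a small sketch: repeated stochastic forward passes yield a set of class probabilities, from which a 95% percentile interval and an uncertainty split are computed. Several things here are assumptions rather than details from the paper: the task is treated as binary, `fake_stochastic_pass` is a hypothetical stand-in for a BCNN forward pass with sampled weights, and the split into aleatoric (expected entropy) and epistemic (mutual information) parts is one common decomposition, not necessarily the one the authors used.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def percentile(sorted_values, q):
    """Linear-interpolation percentile of a pre-sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with parameter p."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def summarize_mc_samples(probs, level=0.95):
    """Summarize class-1 probabilities from repeated stochastic passes."""
    ordered = sorted(probs)
    tail = (1 - level) / 2
    mean_p = statistics.fmean(probs)
    total = binary_entropy(mean_p)  # entropy of the averaged prediction
    aleatoric = statistics.fmean([binary_entropy(p) for p in probs])
    return {
        "mean": mean_p,
        "interval": (percentile(ordered, tail), percentile(ordered, 1 - tail)),
        "total": total,
        "aleatoric": aleatoric,
        "epistemic": total - aleatoric,  # mutual information
    }


def fake_stochastic_pass(rng, logit=2.0, weight_noise=0.5):
    """Hypothetical stand-in for one BCNN pass with sampled weights."""
    return sigmoid(logit + rng.gauss(0.0, weight_noise))


rng = random.Random(0)
samples = [fake_stochastic_pass(rng) for _ in range(1000)]
summary = summarize_mc_samples(samples)
print(summary["interval"], summary["epistemic"])
```

A wide interval or a large epistemic term would flag a classification that the model itself is unsure about, which is the kind of signal the abstract suggests passing on to decision-makers.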

Predictive Analytics for Pilot Training in Southern Africa

Sibusiso Mzulwini and Tendani Lavhengwa, Department of Informatics, Faculty of Information and Communication Technology, Tshwane University of Technology, City of Tshwane, RSA

ABSTRACT

This research investigates the strategic adoption of predictive analytics in pilot training in Southern Africa, specifically South Africa, Namibia, and Botswana. The goal is to enhance aviation safety by identifying and addressing pilot performance weaknesses through data-driven techniques. Employing a mixed-method approach that combines systematic literature review and digital trace data from authorities like the National Transportation Safety Board (NTSB), the study utilizes Natural Language Processing (NLP) and machine learning (ML) to analyze aviation incident reports. Key insights reveal patterns of pilot errors and operational risks, offering solutions through tailored training programs. The study addresses a nonempirical gap by applying the Diffusion of Innovations (DOI) framework to examine the adoption of predictive analytics, considering technological readiness and regulatory factors. Recommendations include standardized reporting, specialized training modules, and weather analytics integration. Ultimately, the study underscores predictive analytics' transformative potential in enhancing pilot training and safety in the Southern African aviation sector.

Keywords

Predictive Analytics, Digital Trace Data (DTD), pilot training, Diffusion of Innovations (DOI), Aviation Safety & Natural Language Processing (NLP).
