Welcome to AIDD 2024

2nd International Conference on AI, Data Mining and Data Science (AIDD 2024)

October 5-6, 2024, Virtual Conference



Accepted Papers
Uncovering Greenwashing: A Study on Corporate Sustainability Reports and Public Sentiment

Charlott Jakob

ABSTRACT

Greenwashing is an increasingly widespread problem in society. It is difficult to detect because of deceptive strategies and the knowledge disparity between companies and the public. In their sustainability reports (SRs), companies can present either genuine or misleading statements about their efforts. Negative public opinion might give companies an incentive to obscure certain statements in an attempt to counteract negative headlines. In this study, we investigate the link between a company’s public image and its SR. Analysing the top 60 companies in Germany, we explore the connection between linguistic text features in their SRs and the sentiment towards them in the media. We find almost no significant feature differences between companies with positive and negative public sentiment. The single significant difference indicates that companies with negative public sentiment use negations more frequently than companies with predominantly positive public sentiment. This research shows that while detecting signs of greenwashing is possible, it remains a challenging task.
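To make the negation finding concrete, the following is a minimal sketch of how a negation-frequency feature could be computed and averaged per sentiment group. It is an illustration only, not the author's pipeline: the word list `NEGATIONS`, the function names, and the per-100-tokens normalisation are all assumptions, and the list is English although the analysed reports may be in German. Group means alone do not establish significance; a statistical test would still be needed.

```python
import re
from statistics import mean

# Hypothetical English negation list; the study's actual feature definition is not given.
NEGATIONS = {"no", "not", "never", "none", "neither", "nor", "without", "cannot"}


def negation_rate(text: str) -> float:
    """Return the number of negation tokens per 100 word tokens (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Count listed negation words plus contractions such as "don't".
    hits = sum(1 for t in tokens if t in NEGATIONS or t.endswith("n't"))
    return 100.0 * hits / len(tokens)


def mean_rate_by_group(reports: dict[str, str], sentiment: dict[str, str]) -> dict[str, float]:
    """Average the negation rate over all companies that share a sentiment label."""
    groups: dict[str, list[float]] = {}
    for company, text in reports.items():
        groups.setdefault(sentiment[company], []).append(negation_rate(text))
    return {label: mean(rates) for label, rates in groups.items()}
```

Here `reports` maps a company name to its report text and `sentiment` maps the same name to a label such as "positive" or "negative"; both inputs are hypothetical.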

Keywords

Greenwashing, Sentiment Analysis, Sustainability Reports, Natural Language Processing, Public Perception.


Facilitating Stock Recommendations Through Sentiment Analysis

Shlok Bhura, Tanish Bhilare, Rylan Nathan Lewis and Dr. Kavita Kelkar, Department of Computer Engineering, K.J. Somaiya College of Engineering, Mumbai, India

ABSTRACT

Sentiment analysis is a relatively new approach to stock recommendation. It uses machine learning and natural language processing to assess news articles, social media feeds, and other information sources and to ascertain investor sentiment towards a particular stock. Based on this analysis, the model suggests whether to buy, hold, or sell the stock. By emphasising trends and patterns in investor sentiment, the objective is to give investors insightful information that can support their decision-making. Several methods, including Decision Trees, Random Forests, Logistic Regression, and Gradient Boosting, were implemented to find the most accurate sentiment analysis model. With an accuracy of 85.02%, the highest among them, the Random Forest model proved the most appropriate.
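The final step described above, turning sentiment into a buy, hold, or sell suggestion, can be sketched as follows. This is a hedged illustration rather than the authors' model: it assumes each text has already been scored with a polarity in [-1, 1] (the range TextBlob reports), and the function name `recommend` and the thresholds `buy_at` and `sell_at` are hypothetical choices not taken from the paper.

```python
from statistics import mean


def recommend(polarities: list[float], buy_at: float = 0.1, sell_at: float = -0.1) -> str:
    """Map the average polarity of texts about one stock to a suggestion.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not polarities:
        # Without any sentiment evidence, default to the neutral action.
        return "hold"
    average = mean(polarities)
    if average >= buy_at:
        return "buy"
    if average <= sell_at:
        return "sell"
    return "hold"
```

A trained classifier such as the Random Forest mentioned in the abstract would replace the fixed thresholds with learned decision boundaries.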

Keywords

Tokenization, Stocks, Sentiment Analysis, LSTM, YFinance, Gradient Boosting, Decision Trees, Random Forests, Logistic Regression, Stock Market, TextBlob.


Similar Data Points Identification With LLM: A Human-in-the-Loop Strategy Using Summarization and Hidden State Insights

Xianlong Zeng¹, Yijing Gao², Ang Liu³, and Fanghao Song¹, ¹Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, ²Data Science Institute, Brown University, Providence, RI 02912, ³Global Urban Studies Program, Rutgers University, New Brunswick, NJ 08901

ABSTRACT

This study introduces a simple yet effective method for identifying similar data points across non-free-text domains, such as tabular and image data, using Large Language Models (LLMs). Our two-step approach involves data point summarization and hidden state extraction. First, an LLM condenses each data point into summary sentences, reducing complexity and highlighting essential information. Next, the summary sentences are fed through another LLM to extract hidden states, which serve as compact, feature-rich representations. This approach leverages the advanced comprehension and generative capabilities of LLMs, offering a scalable and efficient strategy for similarity identification across diverse datasets. We demonstrate the effectiveness of our method in identifying similar data points on multiple datasets. Additionally, our approach enables non-technical domain experts, such as fraud investigators or marketing operators, to quickly identify similar data points tailored to specific scenarios, demonstrating its utility in practical applications. In general, our results open new avenues for leveraging LLMs in data analysis across various domains.
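Once hidden states have been extracted, finding similar data points reduces to comparing vectors. The sketch below assumes the hidden-state vectors are already available as plain lists of floats and uses cosine similarity as the comparison metric; the abstract does not name a metric, so that choice and the function names `cosine` and `most_similar` are assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, vectors: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k data points whose vectors are closest to the query's vector."""
    query = vectors[query_id]
    scored = [
        (other_id, cosine(query, vector))
        for other_id, vector in vectors.items()
        if other_id != query_id  # never return the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In the workflow the abstract describes, `vectors` would hold one hidden-state representation per summarized data point, and a domain expert would inspect the returned neighbours.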

Keywords

Large Language Model; Data Representation; Machine Learning.