
1 Introduction

Recommendation systems play a crucial role in addressing the challenge of information overload and have garnered widespread attention. In recent years, numerous recommendation algorithms have been proposed, including collaborative filtering [18], content-based recommendation [15], and deep neural network-based recommendation [26]. Typically, these approaches recommend items based on their correlation with other items, aiming to identify those most likely to engage users. However, in real life, human behavior is governed by a series of latent causal models [10]. For example, in Fig. 1, users tend to buy Polaroid photo paper, batteries, and albums after purchasing a Polaroid camera, because only after purchasing the camera do users form the intention to buy photo paper and other related items. The Polaroid camera is therefore the cause of the other purchases. From this example, we can see that causal relationships play an important role in recommendation systems. By understanding the causal relationships between items, a recommendation system can make more precise and targeted suggestions, thereby improving its overall effectiveness and enhancing the user experience. In addition, by exposing causal relationships, users can easily understand how the system generates specific recommendations, which fosters understanding of and trust in the system. Hence, learning about causality in the recommendation process is crucial.

Fig. 1. Potential causal effects in people’s behavior

Currently, some scholars [7, 23] have recognized the importance of causal relationships and proposed methods for learning and discovering hidden causal relationships. They use data analysis to obtain the probability of co-occurrence between variables and assume that variables with a higher probability of co-occurrence have a causal relationship. However, the causal relationships obtained in this way may be incorrect, which can even harm recommendation performance. For example, in Fig. 2, when the weather gets hot, ice cream sales and the number of drownings both increase. Some existing methods will erroneously conclude that there is a causal relationship between the two, when in fact both are effects of a common cause, the hot weather.

Fig. 2. Causal phenomena in nature

To solve this problem, we propose a new recommendation model called CDRS that applies causal relationships. In our new model, a causal structure learning method is proposed. We first postulate a causal diagram over the relevant items and filter out irrelevant historical information. Then, we capture the causal relationships between user behaviors by fitting actual user behavior data and learning the causal diagram. With our new model, we can discover causal relationships more accurately and thereby improve recommendation performance.

The main contributions of this paper are summarized as follows:

  • We propose a method to capture the causal relationships between user behaviors, enhancing interpretability and improving recommendation performance.

  • We introduce a general recommendation framework called CDRS based on causal learning, which incorporates a causal discovery module into the recommendation model.

  • We conduct experiments on two real-world datasets, demonstrating the efficacy of our framework in enhancing recommendation performance.

2 Related Work

Causal inference is one of the core issues in statistics and data science; it refers to the process of inferring causal relationships after a certain phenomenon has occurred. Causal inference has extensive applications in biomedical research, economics, management, and the social sciences. It enables researchers to reveal causal relationships between variables and uncover the underlying mechanisms behind observed phenomena. Currently, research on causal inference mainly involves two directions: causal discovery and causal effect estimation. Causal discovery aims to identify causal relationships among variables, while causal effect estimation aims to estimate the magnitude of causal effects. Recommendation systems that apply causal inference are known as causal recommendation systems. In this paper, we aim to propose a new causal discovery method to improve recommendation performance. Therefore, in this section, we first introduce work related to causal discovery and then work related to causal recommendation.

2.1 Causal Discovery

Causal discovery aims to uncover causal relationships among variables from complex data. It involves identifying a graphical network structure that accurately describes the causal relationships between variables. Typically, this graphical structure is a directed acyclic graph (DAG).

The problem of learning directed acyclic graphs (DAGs) from data has garnered significant interest in recent years. Various methods, such as differentiable DAG learning [3] and graph neural networks [16], have been proposed to address this problem. Initially, constraint-based approaches [12] relied on a set of conditional independence tests to identify causal relationships between variables. However, these methods cannot distinguish structures within a Markov equivalence class. Score-based methods [5, 22] were then proposed; they use a score function to evaluate how well different graphs fit the data. These score-based methods can only assess the impact of one variable on the dependent variable and cannot handle real-world problems involving multiple variables. To address these limitations, a hybrid method [14] was introduced that incorporates the likelihood framework into a causal function model to discover causal structures across multiple environments. All of the above methods suffer from high complexity because the space of candidate graphs grows rapidly with the number of variables. To deal with this complexity, several methods have been proposed [4, 29]. For example, NOTEARS [29] formulates structure learning as a continuous optimization problem over a real matrix, which makes models with continuous variables easier to handle. DAG-GNN [27] extends NOTEARS to support nonlinear relationships, but it may be limited in high-dimensional settings. DAG-GAN [6] uses generative adversarial networks to simulate the causal generation mechanism, but it does not guarantee acyclicity. CASTLE [13] learns the DAG structure over all input variables and uses the direct parents of the target variable as predictors in regression or classification tasks. Overall, existing methods still face challenges in accurately discovering causal relationships among variables from complex data.

2.2 Causal Recommendation

Currently, there are three main types of work in causal recommendation.

The first category aims to eliminate biases in recommendation systems, including popularity bias [1] and exposure bias [11], by using causal inference. The inverse propensity score (IPS) method [19] has been widely used and shows good performance: it first estimates propensity scores based on certain assumptions and then uses the inverse propensity scores to reweight samples. CausE [2] performs two rounds of matrix factorization on a large biased dataset and a small unbiased dataset, using L1 or L2 regularization to enforce similarity between the two factorized embeddings. MCER [25] captures popularity bias as the direct causal effect on predicted scores and eliminates it by subtracting the direct popularity-bias effect from the total causal effect.

The second category focuses on improving recommendation performance. For example, CARS [24] proposes a causal data augmentation framework to generate new data, addressing data sparsity issues and improving both sequential and top-N recommendation. CauseREC [28] models the distribution of counterfactual data to learn accurate and reliable user representations, which are designed to be less sensitive to noisy behaviors and more faithful to fundamental behaviors.

The third category focuses on improving the interpretability of recommendation systems. For example, CountER [21] generates provider-side counterfactual explanations by finding a minimal set of user historical behaviors. The method of [20] can specify the complexity and strength of explanations and seeks simple and effective explanations for model decisions.

3 Preliminary

In this section, we first state our target problem and then briefly introduce the related definitions.

3.1 Target Problem

Let \(U=\left\{ u_{1},u_{2},\ldots ,u_{n} \right\} \) denote a set of users, \(V=\left\{ v_{1},v_{2},\ldots ,v_{m} \right\} \) denote a set of items, and \(S=\left\{ \left( u_{k},v_{k}^{1},v_{k}^{2},\ldots ,v_{k}^{l_{k} } \right) \right\} _{k=1}^{N} \) denote the set of interaction sequences between users and items, where \(l_{k}\) is the length of the k-th sequence. The objective of our paper is to predict the next item \(v_{k}^{l_{k}+1}\) to be selected. Here, \(u_{i} \in R^{1\times q^{u} } \) is a \(q^{u}\)-dimensional vector representing a user’s latent characteristics, and \(v_{i} \in R^{1\times q^{v} } \) is a \(q^{v}\)-dimensional vector representing an item’s latent characteristics. The prediction formula is given as follows.

$$\begin{aligned} \textrm{p}\left( v_{k}^{l_{k}+1} \mid \left( u_{k}, v_{k}^{1}, v_{k}^{2}, \ldots , v_{k}^{l_{k}}\right) \right) \end{aligned}$$
(1)

3.2 Relevant Definitions

Causal graphs, causal relationships, and causal discovery are fundamental concepts in the field of causal reasoning. In addition, our model employs a variational autoencoder for causal inference. Therefore, to better understand our paper, we introduce the definitions of causal graphs, causal relationships, causal discovery, and the variational autoencoder in this section.

Causal Graph. A causal graph is a directed acyclic graph \(G=\left( V,E \right) \), where V is a set of nodes and E is a set of edges, used to analyze and express the causal relationships between variables. The graph captures the direction and strength of the causal relationships between variables. A causal graph is built from three basic configurations, as shown in Fig. 3. The first configuration is a chain, shown in Fig. 3(a): the two arrows between the three variables point in the same direction, and one variable acts as the “cause" of another through an intermediate variable. The second configuration is a fork, shown in Fig. 3(b): the two arrows originate from the same variable and point to the other two variables, so the source variable is the common cause of both outcomes. The third configuration is the inverted fork (collider), shown in Fig. 3(c): the two arrows point from two different variables to the same variable, which is the common effect of both.

Fig. 3. Basic configuration of three components

Causal Relationship. Causality refers to the relationship between two variables x and y in statistical data in which a change in one variable (y) is caused by a change in the other variable (x). In this context, x is considered the cause and y the effect, and the relationship between the two variables is known as a causal relationship. For instance, after buying a printer, a consumer is likely to also purchase printing paper and ink.

Causal Discovery. The objective of causal discovery is to identify and learn the underlying causal structure among the variables of an observed dataset. Let \(X=\left\{ X_{1} ,X_{2},\ldots ,X_{m} \right\} \) denote a set of random variable samples, each represented by a d-dimensional random vector, so that \(X\in R^{m\times d } \); the causal relationships among the variables are represented by a causal graph G. \(A\in R^{m\times m }\) is the adjacency matrix of the DAG over the m nodes: \(A_{ij} =1\) represents an edge from \(X_{i}\) to \(X_{j}\), i.e., \(X_{i}\) is a cause of \(X_{j}\), whereas \(A_{ij}=0\) represents no edge between the two variables. The purpose of causal discovery is to infer the structure A. In A, any two nodes can only have a unidirectional relationship, not a bidirectional one; that is, if \(X_{i}\) is a cause of \(X_{j}\), then \(X_{j}\) cannot be a cause of \(X_{i}\).
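As a concrete illustration, the Polaroid example from Fig. 1 can be written as such an adjacency matrix. The following is a minimal sketch (the variable ordering is our own choice):

```python
import numpy as np

# Variables: 0 = Polaroid camera, 1 = photo paper, 2 = battery, 3 = album.
# A[i, j] = 1 means "variable i is a cause of variable j".
A = np.zeros((4, 4))
A[0, 1] = 1  # buying the camera causes buying photo paper
A[0, 2] = 1  # ... and batteries
A[0, 3] = 1  # ... and an album

# In a DAG no variable may be (transitively) its own cause, so every
# power of A has zero trace (no closed walks of any length).
for k in range(1, 5):
    assert np.trace(np.linalg.matrix_power(A, k)) == 0
```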

Variational Autoencoder. As shown in Fig. 4, the variational autoencoder (VAE) is a generative model that can learn the latent distribution of input data and generate new samples similar to it. Unlike traditional autoencoders, a VAE learns not only the features of the data but also their underlying distribution, enabling it to generate new data samples. The core idea of the VAE is to model the underlying distribution of the input data by encoding and decoding latent variables. In a VAE, the encoder maps the input data \(X=\left\{ X_{1},X_{2},\cdots ,X_{n} \right\} \) to the mean \(\mu \) and variance \(\delta ^{2} \) of the latent space and samples a latent vector \(Z=\left\{ Z_{1},Z_{2},\cdots ,Z_{n} \right\} \) from this distribution. The decoder then maps the latent vector back to the input space to generate a new sample \(\hat{X} =\left\{ \hat{X} _{1},\hat{X} _{2},\cdots ,\hat{X} _{n} \right\} \).

Fig. 4. Variational autoencoder
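A minimal VAE sketch in PyTorch (our own illustration rather than the authors’ implementation; the layer sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim, z_dim, h_dim=64):
        super().__init__()
        # Encoder: maps X to the mean and log-variance of q(Z | X).
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # Decoder: maps a sampled Z back to a reconstruction of X.
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: Z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar
```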

4 Methodology

4.1 Overview of Model Structure

To recommend items based on causal relationships, we propose a framework consisting of two modules: a causal discovery module and a causal recommendation module. The framework is shown in Fig. 5. First, the user and item datasets are used as inputs for causal discovery. Causal discovery is then performed using a variational autoencoder to generate reconstructed samples from the dataset. Finally, the reconstructed samples are used as new inputs, and a neural collaborative filtering network performs the recommendation, using the reconstructed samples together with negative samples to provide recommendations tailored to the potential causal relationships. The details are as follows.

Fig. 5. The overall architecture of our model

4.2 Causal Discovery

We proceed in two steps to achieve the goal of generating a new sample \(\hat{X}\) that follows the same probability distribution as X. The first step is to extract latent variables Z, from the latent variable space, that follow a distribution consistent with X. The second step is to generate the new sample \(\hat{X}\) by sampling from Z. We introduce the two steps in detail below.

Step 1 (Encoding Model): To ensure that the generated Z corresponds to the original X, we first consider the posterior distribution \(p\left( Z \mid X ^ {k} \right) \) specific to \(X ^ {k}\), under a standard normal prior \(p\left( Z \right) =N(0,I)\). Because directly computing \(p\left( Z \mid X ^ {k} \right) \) is intractable, we use a variational posterior \(q\left( Z\mid X^{k}\right) \), also assumed to be normal, to approximate it. Finally, to obtain the variational posterior \(q\left( Z\mid X^{k}\right) \), we build an encoding model: the input data \(X^k\) is encoded by an encoder neural network into a latent variable Z with density \(q\left( Z\mid X^{k} \right) \). The variational posterior is modeled as a factored Gaussian whose mean \(\mu _{Z} \in R^{m\times d}\) and standard deviation \(\delta _{Z} \in R^{m\times d}\) are produced by the encoder. The module is shown below:

$$\begin{aligned} \begin{aligned} Z &:=\left[ \mu _{Z} ^{(k)}|\log _{}{\delta _{Z}^{(k)} } \right] \\ &:=\left( I-A^{T} \right) ReLU\left( \cdot \cdot \cdot \left( ReLU\left( X W_{1}^{(k)} \right) W_{2} \right) \cdot \cdot \cdot W_{D-1} \right) W_{D}^{(k)} \end{aligned} \end{aligned}$$
(2)

where X represents the input data, \(W_{1}^{(k)},W_{2},\ldots ,W_{D}^{(k)}\) are the weight matrices of the layers of the neural network, ReLU denotes the rectified linear unit, D is the depth of the neural network, and \(A^{T}\) is the transpose of the adjacency matrix A.
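A minimal sketch of the encoder in Eq. 2, assuming PyTorch and a two-layer MLP (D = 2); the class and variable names mirror the equation but are otherwise our own:

```python
import torch
import torch.nn as nn

class CausalEncoder(nn.Module):
    """Encoder of Eq. 2: a VAE encoder whose output is mixed across
    variables by the learned adjacency matrix A via (I - A^T)."""

    def __init__(self, m, d, h_dim=64):
        super().__init__()
        # A: learnable weighted adjacency matrix over the m variables.
        self.A = nn.Parameter(torch.zeros(m, m))
        self.mlp = nn.Sequential(nn.Linear(d, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, 2 * d))

    def forward(self, x):                         # x: (m, d) sample X^(k)
        I = torch.eye(self.A.size(0), device=x.device)
        out = (I - self.A.T) @ self.mlp(x)        # (I - A^T) MLP(X)
        mu_z, log_sigma_z = out.chunk(2, dim=-1)  # [mu_Z | log sigma_Z]
        return mu_z, log_sigma_z
```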

Step 2 (Generation Model): Because \(p\left( Z\mid X^{k} \right) \) is specific to \(X^{k}\), we train a generation model \(\hat{X}^{k}=g(Z)\), which restores \(X^{k}\) as \(\hat{X}^{k}\) from a \(Z^{k}\) sampled from the approximate distribution \(q\left( Z\mid X^{k} \right) \). This model reconstructs each sample from real user behavior data, covering every element of \(X^{k}\), which yields a more precise and accurate reconstruction of the user behavior data. To compute the likelihood \(p\left( X^ {k} \mid Z \right) \), the decoder produces a mean \(\mu _{X} \in R^{m\times d}\) and a standard deviation \(\delta _{X} \in R^{m\times d}\). The generation model is defined as follows.

$$\begin{aligned} \begin{aligned} \hat{X} ^{\left( k \right) } &:= \left[ \mu _{X} ^{(k)}|\log _{}{\delta _{X}^{(k)} } \right] \\ &: =ReLU\left( \cdot \cdot \cdot \left( ReLU\left( ( I-A^{T})^{-1}Z W_{1}^{(k)} \right) W_{2} \right) \cdot \cdot \cdot W_{D-1} \right) W_{D}^{(k)} \end{aligned} \end{aligned}$$
(3)

where Z represents the input data, A is an \(m \times m\) matrix, \(W_1^{(k)}, W_2, \dots , W_{D}^{(k)}\) are parameters in the neural network that represent the weight matrix of each layer, D is the depth of the neural network, and \(\hat{X}^{(k)}\) is the output result of the neural network.
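The matching decoder of Eq. 3 inverts the structural mixing with \((I-A^{T})^{-1}\); a sketch under the same assumptions as the encoder above:

```python
import torch
import torch.nn as nn

class CausalDecoder(nn.Module):
    """Decoder of Eq. 3: undoes the structural transform with
    (I - A^T)^{-1} and maps Z to [mu_X | log sigma_X]."""

    def __init__(self, A, d, h_dim=64):
        super().__init__()
        self.A = A  # the adjacency matrix shared with the encoder
        self.mlp = nn.Sequential(nn.Linear(d, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, 2 * d))

    def forward(self, z):                        # z: (m, d) latent sample
        I = torch.eye(self.A.size(0), device=z.device)
        h = torch.linalg.solve(I - self.A.T, z)  # (I - A^T)^{-1} Z
        mu_x, log_sigma_x = self.mlp(h).chunk(2, dim=-1)
        return mu_x, log_sigma_x
```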

We train the above model by minimizing a loss based on the Evidence Lower Bound (ELBO). The ELBO consists of two parts: a reconstruction loss and a Kullback-Leibler (KL) loss. We set the ratio of the reconstruction loss to the KL loss to 1:1, which both ensures the quality of the generated samples and introduces enough noise to give the generated samples a certain generalization ability. The loss function is defined as follows.

$$\begin{aligned} L_{ELBO} =\frac{1}{n} \sum _{k=1}^{n}\left( L_{rec}^{N} - L_{KL}^{N} \right) \end{aligned}$$
(4)

where \(L_{rec}^{N}\) is the reconstruction loss and \(L_{KL}^{N}\) is the KL loss.

The reconstruction loss measures the dissimilarity between the input and the output and aims to minimize the reconstruction error. It can be expressed as \(L_{rec}^{N} =\frac{1}{n}\sum _{i=1}^{n} \left\| X-\hat{X}\right\| ^{2}\), where X is the input data and \(\hat{X}\) is the output of the model. The KL loss, on the other hand, quantifies the divergence between the prior distribution and the approximate posterior distribution and serves to regularize the latent space. The formula for the KL loss is as follows.

$$\begin{aligned} L_{KL}^{N} =\frac{1}{2} \sum _{i=1}^{n} \sum _{j=1}^{d} \left( (\delta _{Z} ) _{ij}^{2}+(\mu _{Z} ) _{ij}^{2}-2\log { \left( \delta _{Z} \right) _{ij} } -1 \right) \end{aligned}$$
(5)

where n is the batch size, d is the dimension of the latent variable Z, and \(\delta _{Z}\) and \(\mu _{Z}\) represent the standard deviation and mean of Z.
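A sketch of the two loss terms (Eqs. 4 and 5); we follow the convention above that the encoder returns log standard deviations, and the tensor shapes are our assumptions:

```python
import torch

def elbo_terms(x, x_hat, mu_z, log_sigma_z):
    # Reconstruction loss: squared error between input and reconstruction.
    l_rec = ((x - x_hat) ** 2).sum(dim=-1).mean()
    # KL loss of Eq. 5: KL(q(Z|X) || N(0, I)) for a diagonal Gaussian.
    l_kl = 0.5 * (torch.exp(2 * log_sigma_z) + mu_z ** 2
                  - 2 * log_sigma_z - 1).sum()
    return l_rec, l_kl
```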

In addition, we need to ensure that the learned DAG is acyclic. Therefore, we add the following acyclicity constraint:

$$\begin{aligned} tr\left[ \left( I+\alpha \hat{A} \circ \hat{A} \right) ^{n} \right] -n=0 \end{aligned}$$
(6)

where tr denotes the trace of a matrix, \(\hat{A}\) is the normalized adjacency matrix, \(\circ \) denotes the Hadamard product (i.e., element-wise multiplication), n is the size of the adjacency matrix, \(\alpha \) is a hyperparameter, and I is the identity matrix.

Equation 6 can be interpreted as follows: if the trace of the n-th power of \(I+\alpha \hat{A} \circ \hat{A}\) equals n, the learned topology is a legal DAG. Otherwise, if the trace exceeds n, the graph contains cycles, which can lead to instability and overfitting of the model.
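The constraint of Eq. 6 is cheap to evaluate directly; a sketch following the equation (with \(\alpha \) as a hyperparameter):

```python
import torch

def acyclicity(A_hat, alpha=1.0):
    # Eq. 6: tr[(I + alpha * A_hat ∘ A_hat)^n] - n, which is zero exactly
    # when A_hat describes a DAG and positive when cycles are present.
    n = A_hat.size(0)
    I = torch.eye(n, device=A_hat.device)
    M = I + alpha * A_hat * A_hat            # Hadamard product A ∘ A
    return torch.trace(torch.linalg.matrix_power(M, n)) - n
```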

In summary, the overall objective of optimization is shown as follows.

$$\begin{aligned} L_{DAG} =-L_{ELBO} +\lambda \left( tr\left[ \left( I+\alpha \hat{A} \circ \hat{A} \right) ^{n} \right] -n \right) +\varsigma \left\| \left\{ W_{d}^{(n)} \right\} _{d=1}^{D} \right\| _{2}^{2} \end{aligned}$$
(7)

where \(\lambda \) weights the acyclicity penalty, \( \varsigma \) is the regularization coefficient, and \(W_{d}^{(n)}\) is the weight matrix of the d-th layer. Overfitting is avoided by penalizing large weight values.
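The full objective of Eq. 7 can then be assembled from the helpers sketched above; here we read \(-L_{ELBO}\) in its standard form as reconstruction error plus KL divergence, and the hyperparameter values are placeholders:

```python
def dag_loss(x, x_hat, mu_z, log_sigma_z, A_hat, weights,
             lam=1.0, varsigma=1e-3, alpha=1.0):
    l_rec, l_kl = elbo_terms(x, x_hat, mu_z, log_sigma_z)
    neg_elbo = l_rec + l_kl                    # -L_ELBO term
    h = acyclicity(A_hat, alpha)               # acyclicity penalty (Eq. 6)
    l2 = sum((w ** 2).sum() for w in weights)  # L2 weight regularization
    return neg_elbo + lam * h + varsigma * l2
```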

4.3 Recommendation Based on Causality

In this section, we present a method to enhance recommendation accuracy by utilizing the causal preferences obtained through causal discovery. First, we conduct negative sampling based on the user characteristics \(u_i\) and compute the user’s expected preference E. Next, we calculate the similarity between the user and the remaining items and select the item \(v_j\) with the lowest similarity. Then, we replace the original user characteristics with the data acquired through causal discovery as input. Finally, we feed the negative samples along with the modified data into the model and train the neural network to obtain prediction results. To optimize the model, we use the Bayesian personalized ranking loss to maximize user preference for positive samples over negative samples. The optimization objective can be formulated as follows.

$$\begin{aligned} L_{1}(O)= -\sum _{i=1}^{N} \delta \left( f(u,v)-f(u,v^{''} ) \right) \end{aligned}$$
(8)

where u is a user, v is a positive sample, \(v''\) is a negative sample, and \(\delta \) is the sigmoid activation function.

The formula after adding the causal discovery module can be converted into the following form:

$$\begin{aligned} L_{2}(O)= -\sum _{i=1}^{N} \delta \left( f(\hat{u}_{i} ,\hat{v}_{i})-f(\hat{u}_{i},v_{i}^{''} ) \right) \end{aligned}$$
(9)

where \(\hat{u}_{i}\) and \(\hat{v}_{i}\) are the reconstructed samples.
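A sketch of the ranking objective in Eq. 9; we follow the common BPR practice of optimizing the log-sigmoid of the score difference, and assume a dot-product scorer f (both are our assumptions):

```python
import torch

def bpr_loss(u_hat, v_hat_pos, v_neg):
    # Prefer the reconstructed positive item over the sampled negative
    # item for each user, as in Eq. 9.
    score_pos = (u_hat * v_hat_pos).sum(dim=-1)   # f(u_hat, v_hat)
    score_neg = (u_hat * v_neg).sum(dim=-1)       # f(u_hat, v'')
    return -torch.nn.functional.logsigmoid(score_pos - score_neg).sum()
```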

The loss function consists of two parts: the loss from causal discovery and the loss from prediction. It takes the following form.

$$\begin{aligned} L=L_{DAG} +L_{2} (O) \end{aligned}$$
(10)

5 Experiments

In this section, we conduct experiments to show the effectiveness of the proposed framework. Our experiment aims to answer the following three essential questions:

RQ1. How does the performance of the CDRS compare to that of the baseline models?

RQ2. Does applying the causal discovery module, as a general framework, to existing recommendation models improve their performance?

RQ3. Can our methodology provide interpretability for the recommended results?

5.1 Experimental Settings

Datasets. We selected two real-world datasets, Cloud Theme Click and MovieLens10M, for our experiments. Cloud Theme Click is an e-commerce dataset collected from the Alibaba Taobao application, which captures click records between users and items in different purchase scenarios, such as 'what to bring for traveling' and 'how to dress for a party'. Cloud Theme Click also includes user purchase history in the month preceding the promotion. MovieLens10M is a movie dataset containing ratings provided by users for each movie, as well as user characteristics such as gender, age, and occupation. Gender is represented as a binary feature, occupation has 21 categories, and age is divided into 7 groups based on age range. Each movie is characterized by its ID, title, category (with 20 categories available), and year (divided into 18 categories) (Table 1).

Table 1. Statistics of datasets.

Baselines. To demonstrate the effectiveness of the causal discovery module in our model, we conducted experiments comparing it with six baseline models. BPR, GRU4Rec, and NeuMF are commonly used standard recommendation models, while IPS, DICE, and CausE are standard models for causal reasoning in recommendation systems. Selecting these baselines allows us to assess the additional value of our causal discovery module more accurately and to perform a comprehensive comparison with state-of-the-art methods.

BPR [17] is a personalized ranking algorithm based on Bayesian posterior optimization.

NeuMF [8] is a popular deep learning-based recommendation algorithm. It combines traditional matrix factorization with a multi-layer perceptron and can simultaneously extract low-dimensional and high-dimensional features.

GRU4Rec [9] is a recommendation algorithm that uses session information and gated recurrent unit (GRU) to provide recommendations.

IPS [19] is a classical inverse probability weighting method for dealing with selection bias in observed data. The algorithm weights the observed data by the probability of sample selection so that intervention effects can be estimated more accurately when dealing with causality.

CausE [2] is a machine learning-based causal inference method that uses probabilistic graphical models and machine learning methods to represent causal relationships as directed edges, learn the probability distribution of each node in the causal graph, and estimate the effect of the intervention using causal inference.

DICE [30] is a causal inference algorithm based on interpolation methods for inferring causal relationships from observed data. The algorithm uses the local structure of the causal graph to estimate the intervention effect by interpolation, and the global structure of the causal graph to control the accuracy of interpolation.

Evaluation Metrics. We chose two widely adopted evaluation metrics, Normalized Discounted Cumulative Gain (NDCG) and Recall, to assess recommendation accuracy. Specifically, we computed the NDCG and Recall scores for the top 20 and top 25 recommended items; the higher the score, the better the recommendation performance.
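For reference, the two metrics can be computed as below (a sketch; `ranked` is the model's top-K list and `relevant` the set of held-out ground-truth items, which are our assumptions about the data layout):

```python
import math

def recall_at_k(ranked, relevant, k):
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2)
               for i in range(min(len(relevant), k)))
    return dcg / idcg
```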

5.2 Overall Performance Comparison (RQ1)

Tables 2 and 3 present the overall performance of our proposed model and the six selected baselines. Table 2 displays the experimental results of the Cloud Theme Click dataset, while Table 3 presents the results of the MovieLens10M dataset. The best performance values are highlighted in bold in both tables.

The results demonstrate that our proposed method outperforms the other baselines on both datasets, exhibiting superior Recall and NDCG scores, which validates the effectiveness of our framework.

Among the baselines that do not apply causal reasoning, i.e., BPR, NeuMF, and GRU4Rec, the BPR model performs worst due to its shallow and simple structure; the NeuMF model improves on BPR because it incorporates a neural network; and the GRU4Rec model performs best among them because of its use of GRUs. IPS, CausE, and DICE are all causal inference-based recommendation models that address the bias inherent in traditional recommendation systems to achieve unbiased learning. Overall, these three methods achieve better performance than the methods that do not apply causal reasoning, demonstrating that incorporating causal inference into recommendation systems can improve model performance. Our approach focuses on causal discovery, enabling the model to learn users’ causal preferences and ensuring that the training and test sets are independent and identically distributed, which results in a significant improvement in performance.

Table 2. Experimental Results on Cloud theme click Dataset.
Table 3. Experimental Results on MovieLens10M Dataset.

5.3 Effect of Causal Discovery Module (RQ2)

To study the effectiveness of our causal discovery module, we incorporated the proposed causal discovery framework into two baseline methods, BPR and NeuMF, and compared the results with the original baseline models on the MovieLens10M dataset. NDCG and Recall are used to evaluate the performance of the recommended items. The experimental results are shown in Fig. 6. The first plot shows the NDCG and Recall results at top 20 and top 25 for the BPR model and for BPR-CauD, the BPR model equipped with the causal discovery framework; the second plot shows the corresponding results for NeuMF and NeuMF-CauD. From the results, we can see that the models equipped with the causal framework outperform the corresponding baseline models. The addition of the causal discovery module thus improves overall performance, verifying the effectiveness of our new module. Compared with traditional recommendation algorithms, the causal discovery framework can also better explain recommendation results and discover causal relationships between user behaviors.

Fig. 6. Experimental results of the two models, BPR and NeuMF.

Fig. 7. Some causal structures in the movie dataset. The green part represents a false causal relationship, while the blue part represents a true causal relationship. (Color figure online)

5.4 Interpretable Recommendation Structure (RQ3)

Our model provides reliable explanations because it considers the potential causal relationships that govern human behavior. Taking the movie dataset as an example, as shown in Fig. 7, when people choose movies, it is often assumed that the popularity of a movie influences the selection: the more popular the movie, the more likely it is to be chosen. However, this is a false causal relationship, as popular movies do not necessarily correspond to personal preferences. People’s movie choices are determined by various factors, such as the director, the cast, and online ratings. Our model uses causal inference to discover and learn such causal relationships, enabling it to provide reliable explanations and make accurate recommendations.

6 Conclusion

This paper proposes a novel approach to improving the performance and interpretability of recommendation models by capturing the causal relationships between user behaviors. Specifically, we propose a general recommendation framework based on causal learning, which incorporates a causal discovery module into the recommendation model. By leveraging causal learning, our approach yields a more robust and accurate recommendation model that is better able to explain the underlying causal relationships between variables. We conducted a series of experiments to demonstrate the effectiveness of our framework in improving recommendation performance. The results suggest that incorporating causal learning into the recommendation model leads to significant improvements in accuracy and interpretability. In summary, our proposed framework highlights the importance of causal learning in improving the performance and interpretability of recommendation models and provides a promising avenue for future research in this area.