GDPR-Compliant Data Breach Detection: Leveraging Semantic Web and Blockchain

Ansar, Kainat; Ahmed, Mansoor; Khalid, Muhammad Irfan; Helfert, Markus

doi:10.1007/978-3-031-60328-0_1

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 990)

Included in the following conference series:

World Conference on Information Systems and Technologies

Abstract

Insider attacks are becoming common and have a significant financial impact on organizations. Insider threats originate within the targeted organization: the attacks are carried out by users who have been granted access to the organization’s network, applications, or databases. An attacker with administrative privileges can edit logs and login records to remove traces of the attack, which makes insider attacks difficult to detect. Such data breaches may severely affect the life of the Data Owner. Creating a mechanism that identifies data breaches quickly therefore remains both essential and difficult. The General Data Protection Regulation (GDPR) has established processes and guidelines to address data privacy issues; under it, when a data breach occurs, the Data Controller is required to notify the Data Protection Authority. To address these problems, this article proposes a GDPR-compliant data breach detection system with a severity assessment mechanism using the semantic web and blockchain technology. The proposed method can generate an alert notification for each data breach. With the help of the severity assessment mechanism, the proposed model then conducts a breach assessment to indicate the data breach’s severity level.

1 Introduction

For every organization, a data breach is a serious issue. Any incident in which data is seen, deleted, altered, or transferred by an unauthorized party or an authorized person unintentionally or intentionally is referred to as a data breach [1]. A data breach can be caused by a variety of factors, including hardware issues, software crashes, phishing, malware, ransomware, distributed denial-of-service, human error, misplaced or lost data storage devices (such as USB drives, laptops, portable drives, and so on), malicious insiders, and external issues such as power outages [2]. However, our research focuses on malicious insider threats.

Insider data breaches are growing more common and have a higher financial impact on organizations. A recent report states that insider threats are responsible for 60% of data breaches [3]. An insider is typically someone who has been granted access to company resources and who intentionally or unintentionally damages the company. Current or former employees, contractors, or partners who have access to an organization’s systems or data may pose a threat to it [4].

Since the General Data Protection Regulation (GDPR) went into effect in May 2018, there has been a paradigm shift in data privacy [5]. The GDPR specifies processes and rules to address the challenges of insider threat and data protection. As a result, when a data breach occurs, the Data Controller (DC) must notify the Data Protection Authority (DPA) and the affected Data Owner (DO). The DC faces significant fines for failing to report a breach within the prescribed time frame. According to the GDPR, organizations that experience a data breach may be fined up to 4% of their annual revenue or €20 million, whichever is greater [6]. A system that can detect data breaches is therefore required to avoid severe penalties.

In contrast to traditional internet technology, which simply provides a “network of information,” blockchain is a cutting-edge technology that provides a “network of value” [7]. The Ethereum blockchain uses languages such as Solidity [8] to become fully programmable, allowing modern decentralized applications to be built. These decentralized applications use smart contracts, which are scripts that enable users to execute transactions without the possibility of fraud or third-party interference [9].

To address the issue of data breaches, this article proposes a GDPR-compliant detection system that takes advantage of the semantic web, smart contracts, and blockchain technologies. The processes of the proposed methodology and the functioning of the smart contracts are detailed in the subsequent sections.

2 Related Work

Malicious insider threats have recently been identified as one of the most harmful types of breach attack for organizations. Data breaches are security incidents that occur when an attacker gains access to a company’s network, application, or database and performs malicious activity. Numerous studies have been conducted to solve this issue.

In [10], the authors presented a data leakage prevention system. Authors employed document semantic signatures to detect breaches. When the semantic signatures of the outgoing document match those of the original document, the system detects data leaks. A sensitive file, however, can avoid detection if an attacker encrypts it and sends it via email. In this circumstance, the detection system cannot recognize the encrypted data as a sensitive file. As a result, sensitive data may be leaked. In [11], an anomaly detection model is proposed for database protection. The Hidden Markov Model (HMM) was utilized for prediction, and the authors achieved minimal false-positive rates. The HMM-based system, on the other hand, is dependent on the training dataset. If the training dataset is insufficient, the system may generate false-positive alarms.

The authors suggested a three-tiered data protection strategy in [12] in response to the information leakage concern created by cloud indexing. However, the process requires a pre-defined data classification. Data that has been misclassified may be leaked. The authors of [13] presented a data leak prevention strategy based on Named Entity Recognition (NER). However, the approach did not use semantic technologies to provide meaning to entities. As a result, spelling errors and related words could impact NER.

To detect insider attacks in relational database systems, the authors proposed a blockchain-based framework in [14]. However, their solution only addresses private data under a centralized control system, in which a private blockchain network is built in a privately controlled environment with no democratic participants. Furthermore, regardless of whether a network is built on blockchain technology, attackers can manipulate any data or network within an organization. Storing all proof within the same centralized controls or system can increase the attack risk. Because the authors employed a private blockchain network, anyone with access to that company can modify the network if the entire organization is compromised.

In [15], a blockchain-based event-driven data alteration detection system is presented. However, the study does not clarify how the framework will function technically, and it lacks practical application examples or solutions. Storing data on the blockchain in any structure requires a smart contract, yet the paper gives no details of the smart contract. It also does not specify how data evidence or fingerprints are kept on the blockchain network. Without an appropriate structure, the approach remains ambiguous and impractical. More generally, existing research lacks practical, GDPR-compliant methods for data leak detection that Data Protection Authorities and Data Controllers can use to determine whether affected Data Owners must be notified.

To summarize the above discussion, we find that existing blockchain-based data breach detection solutions have several limitations. As a result, developing a system capable of addressing the issues mentioned above is challenging. Considering the limitations of previous studies, we present a novel Personal Data Breach Detection (PDBD) technique in this paper. The following are the main contributions of our work.

i. A GDPR-compliant PDBD model is developed. It will enable the DC to quickly determine the necessary mitigation measures for data breach events.
ii. Semantic Web Rule Language (SWRL) rules are developed for the Data Breach Severity Assessment (DBSA) mechanism. This provides the DC with a computable tool to assess the severity of data breaches. It also helps the DC notify the data protection authorities and the affected DO of breaches accordingly.
iii. A severity level detection ontology is developed to calculate the breach severity index score. The ontology also indicates the breach severity level using SWRL rules.
iv. A Hash Variance Algorithm (HVA) is introduced to reduce the computational overhead of both DBSA and Ethereum.

3 Use Case Scenario

This part discusses a use case scenario from the health industry to show how the system performs. In modern hospitals, collecting and processing patients’ personal data has become mandatory. Almost every hospital department handles protected health information and personally identifiable information about patients. When an insider attack discloses a patient’s private information, it is hard to restore privacy or repair the psychosocial damage. Furthermore, compromised information can interfere with hospital operations and negatively impact the patient’s health and well-being. If the resulting operational delays prevent immediate treatment, the outcome may be death or permanent disability.

The use case scenario assumes that John, the data processor, is a medical specialist who frequently requests patients’ medical records for operational needs, and Michael, the Data Owner, is the patient. Michael agrees to have his medical data preserved on the blockchain. John can obtain the required patient data from the patient database by submitting a request to Robert, the Data Controller. Robert uses our proposed system for tasks such as data verification and consent validation. Before Robert provides John with any data, the proposed approach allows him to detect any alterations to the database record and confirm its authenticity.

3.1 System Design and Methodology

Figure 1 illustrates the proposed data breach detection model and its components with operation flow on DBSA and Ethereum layers. The main components of the proposed model are as follows.

Data Consumer:

Data consumers, which are assumed to be trusted third parties, are important entities of the proposed model that request Data Owners’ personal information. An example is a surgeon who frequently requests patients’ medical records for operation purposes, as discussed in the previous section.

Ethereum:

Ethereum is a blockchain-based platform. A blockchain is a collection of blocks containing transaction data that are linked to each other in a chain. It is a digital ledger that is secure, cryptography-based, and distributed across a network, which allows transactions to be secure, anonymous, and fast without any central authority. In the proposed model, we use the Ethereum (ETH) network together with shared database records. To avoid storing all the data on the blockchain, we create a Cell Signature (CS) for each data table cell and store only that on the blockchain. The cell signatures are generated using SHA256 [16], a cryptographic hash function for which it is practically infeasible to find a message or data that hashes to a given digest. For each row in the table, we generate a cell pointer CnRn. For example, if row 1 (R1) of Table X has N columns, the Cell Pointer (CP) is generated as shown in Eq. 1.

$$\begin{aligned} \text{Table X\_CP\_R1} = R_1C_1(\text{FLH}),\; R_1C_2(\text{FLH}),\; \ldots,\; R_1C_N(\text{FLH}) \end{aligned} \tag{1}$$

The sequence of CP with cell signature is depicted in Fig. 2. These CPs are then stored on a blockchain using a private key. Any modifications to a CP get logged on the blockchain with a new cell signature of the respective row. Any previous CPs of the modified data cell are also preserved in the blockchain.
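The construction of cell signatures and cell pointers can be sketched in a few lines of Python. This is a minimal illustration rather than the system's implementation: the helper names (`cell_signature`, `cell_pointer`, `record_cp`), the sample row, and the in-memory dictionary that stands in for the on-chain log are all assumptions introduced here, and no Ethereum transaction or private-key signing is performed.

```python
import hashlib

def cell_signature(value) -> str:
    """Return the SHA-256 hex digest (a fixed-length hash) of one table cell."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def cell_pointer(row) -> list:
    """Build a row's cell pointer: one cell signature per column, in order."""
    return [cell_signature(value) for value in row]

# In-memory stand-in for the on-chain log (assumption): every CP version of a
# row is appended, so earlier versions are never overwritten.
cp_history = {}

def record_cp(row_id: str, row) -> None:
    """Append the current cell pointer of a row to its history."""
    cp_history.setdefault(row_id, []).append(cell_pointer(row))

# Hypothetical patient record, stored once and then updated in its last cell.
record_cp("TableX_R1", ["P-001", "Michael", "A+"])
record_cp("TableX_R1", ["P-001", "Michael", "O-"])
```

After the second call, `cp_history["TableX_R1"]` holds both versions of the cell pointer, mirroring the statement that previous CPs of a modified cell are preserved.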

HVA:

In the previous phase, we created a cell signature for each data cell and stored the resulting CPs on the blockchain. The next phase is the Hash Variance Algorithm (HVA) phase. The cell signatures created with SHA256 in the previous (Ethereum) phase serve as its inputs. The function of the HVA mechanism is shown in Algorithm 1. HVA can reduce the computational overhead of both DBSA and Ethereum and increase system throughput. This phase mainly uses semantic web technologies such as SPARQL, SWRL rules, a reasoning engine, and the Jena framework. It fetches the cell signatures calculated in the previous phase and computes runtime cell signatures of the CPs by retrieving the shared records from the shared part of the database applications with a SPARQL query. In other words, we calculate the difference between the two sets of CPs (Ethereum and shared database) and compare them one by one according to the Fixed-Length Hash (FLH) threshold set in advance. If a difference is found, the record is considered modified and the event is treated as an attack.
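The comparison step at the core of this phase can be illustrated as follows. The sketch is not Algorithm 1: it assumes that the stored CP has already been fetched from the blockchain and that the current row has already been retrieved from the shared database, and it replaces the FLH threshold with a plain equality check on SHA-256 digests. The function and variable names are hypothetical.

```python
import hashlib

def cell_signature(value) -> str:
    """Return the SHA-256 hex digest of one table cell."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def find_modified_cells(stored_cp, current_row) -> list:
    """Return the 0-based column indices whose runtime signature differs
    from the signature stored in the cell pointer."""
    if len(stored_cp) != len(current_row):
        raise ValueError("row width does not match the stored cell pointer")
    return [
        index
        for index, (stored, value) in enumerate(zip(stored_cp, current_row))
        if stored != cell_signature(value)
    ]

# Hypothetical data: the CP was computed when the record was first stored.
original_row = ["P-001", "Michael", "A+"]
stored_cp = [cell_signature(value) for value in original_row]

# The same record as currently held in the shared database, with one cell altered.
tampered_row = ["P-001", "Michael", "O-"]

modified = find_modified_cells(stored_cp, tampered_row)
if modified:
    print(f"Breach alert: modified columns {modified}")
```

Because only the differing columns are reported, a later severity assessment can be limited to the affected cells instead of the whole record.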

DBSA:

The HVA phase above describes the basic structure and mechanism of data breach detection. Its final output is forwarded to the DBSA phase, which calculates the severity score. In this phase, a severity level detection ontology is developed to calculate the breach severity index score. The DBSA mechanism presented here uses the severity assessment methodology [17] provided by the European Union Agency for Network and Information Security (ENISA). ENISA introduced a severity level assessment formula [17] to calculate the overall severity score, shown in Eq. 2.

$$\begin{aligned} \text{Severity\_level\_score} = DPF \times ER + SB \end{aligned} \tag{2}$$

where DPF is a data processing factor, ER is the ease of recognition, and SB denotes a breach situation. DPF classifies the breached data as simple, behavioral, financial, or sensitive. ER evaluates how easily a specific person can be identified from the breached data; it can be negligible, limited, significant, or maximum. SB covers malicious intent and security loss in terms of confidentiality, integrity, and availability.
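As a worked illustration of Eq. 2, the score can be computed directly once each category is mapped to a number. The numeric scales below are assumptions made for this sketch, since the text above names the categories but not their values; the actual values should be taken from the methodology in [17].

```python
# Illustrative numeric scales (assumptions, not taken from the text above).
DPF_SCORES = {"simple": 1, "behavioral": 2, "financial": 3, "sensitive": 4}
ER_SCORES = {"negligible": 0.25, "limited": 0.5, "significant": 0.75, "maximum": 1.0}

def severity_score(dpf: str, er: str, sb: float) -> float:
    """Evaluate Eq. 2: Severity_level_score = DPF * ER + SB.

    `sb` is the numeric contribution of the breach situation (loss of
    confidentiality, integrity, availability, and malicious intent).
    """
    return DPF_SCORES[dpf] * ER_SCORES[er] + sb

# Simple data, negligible ease of recognition, no aggravating circumstances.
low_case = severity_score("simple", "negligible", 0.0)

# Sensitive data, maximum ease of recognition, hypothetical SB contribution of 0.5.
high_case = severity_score("sensitive", "maximum", 0.5)

print(low_case, high_case)
```

Under these assumed scales the two cases evaluate to 0.25 and 4.5, which shows how the multiplicative term dominates the score while SB acts as an additive correction.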

The methodology described above is implemented in the DBSA phase. A severity level detection ontology is developed to apply these guidelines using the recommended methodology [17]. The main classes and subclasses are shown in Fig. 2. In addition, SWRL rules are developed to indicate the data breach’s severity level. For the proof of concept, only two rules are modeled, as shown below.

Rule1: Affected_DO(?ado), Breach_Detected(?bd), (?ado dc:hasSB Min), (?ado dc:hasER Negligible), (?ado dc:hasDPF Simple) -> (?bd dc:setFlag Low)

Rule2: Affected_DO(?ado), Breach_Detected(?bd), (?ado dc:hasSB Confidentiality_loss), (?ado dc:hasER Maximum), (?ado dc:hasDPF Sensitive) -> (?bd dc:setFlag High)
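The effect of the two rules can be mimicked outside a reasoner with a simple lookup. The sketch below is only an illustration of the rule logic: it does not use SWRL, an ontology, or the Jena framework, and the function name `severity_flag` is hypothetical. Combinations not covered by Rule1 or Rule2 return `None`, because no further rules are modeled.

```python
from typing import Optional

# (hasSB, hasER, hasDPF) -> flag, transcribed from Rule1 and Rule2.
RULES = {
    ("Min", "Negligible", "Simple"): "Low",                     # Rule1
    ("Confidentiality_loss", "Maximum", "Sensitive"): "High",   # Rule2
}

def severity_flag(sb: str, er: str, dpf: str) -> Optional[str]:
    """Return the flag set for a detected breach, or None if no rule fires."""
    return RULES.get((sb, er, dpf))

print(severity_flag("Min", "Negligible", "Simple"))
print(severity_flag("Confidentiality_loss", "Maximum", "Sensitive"))
print(severity_flag("Min", "Maximum", "Sensitive"))
```

Keeping the rules in a table makes it straightforward to add further combinations once additional SWRL rules are defined.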

4 Conclusion and Future Works

Data Controllers are obliged to implement measures that facilitate compliance with the GDPR and to notify the Data Protection Authorities and every affected party (Data Owner) of any data breach or possible risk of data privacy violation without undue delay (72 h). Failure to issue a breach notification in time can result in a heavy fine. However, effectively detecting a data breach remains a critical and challenging task. Data Controllers therefore need an efficient system that detects data breaches in time, reports their severity level, and supports appropriate management of personal information within organizations and smart devices. This paper presented a novel semantic-blockchain-based model for rapid data breach detection that protects personal data from breaches and reduces direct and indirect damage to personal data. The proposed model generates alerts for data breaches by taking severity assessment details into account and grading each breach incident according to its impact on the Data Owner and the significance of the breach. In the future, we will implement this system using semantic web and blockchain technologies.

References

1. Firman, A., et al.: Why does medical confidentiality matter during the Covid-19 pandemic? A case study from regulations in Indonesia. J. Legal Ethical Regul. Issues 24(1), 13 (2021)
2. What Constitutes a GDPR Data Breach? Definition & Meaning. https://sectigostore.com/blog/what-constitutes-a-gdpr-data-breach-definition-meaning/. Accessed Oct 2022
3. Insider Threats Are Becoming More Frequent and More Costly. https://www.idwatchdog.com/insider-threats-and-data-breaches/. Accessed Oct 2022
4. 2022 Ponemon Cost of Insider Threats Global Report. https://www.proofpoint.com/us/resources/threat-reports/cost-of-insider-threats. Accessed Oct 2022
5. General Data Protection Regulation (GDPR). https://en.wikipedia.org/wiki/General_Data_Protection_Regulation. Accessed Apr 2021
6. Article 83 European Union General Data Protection Regulation (GDPR): General conditions for imposing administrative fines. http://www.privacy-regulation.eu/en/article-83-general-conditions-forimposing-administrative-fines-GDPR.htm#5. Accessed Apr 2021
7. Farhan, H.K.: Blockchain: Transforming the Fourth Industrial Revolution. Global Foundation for Cyber Studies and Research (2020)
8. Dannen, C.: Bridging the blockchain knowledge gap. In: Introducing Ethereum and Solidity. Apress, Berkeley, Springer (2019). https://doi.org/10.1007/978-1-4842-2535-6_1
9. Alotaibi, S.J.: Using blockchain for smart contracts. In: Innovative and Agile Contracting for Digital Transformation and Industry 4.0, pp. 208–221. IGI Global (2021)
10. Alhindi, H., Traore, I., Woungang, I.: Preventing data leak through semantic analysis. Internet Things 14, 100073 (2021)
11. Fadolalkarim, D., Bertino, E., Sallam, A.: An anomaly detection system for the protection of relational database systems against data leakage by application programs. In: 36th International Conference on Data Engineering (ICDE). IEEE (2020)
12. Squicciarini, A., Sundareswaran, S., Lin, D.: Preventing information leakage from indexing in the cloud. In: 2010 IEEE 3rd International Conference on Cloud Computing. IEEE (2010)
13. Gómez-Hidalgo, J.M., et al.: Data leak prevention through named entity recognition. In: 2010 IEEE Second International Conference on Social Computing. IEEE (2010)
14. Srivastava, S.S., et al.: Verity: blockchains to detect insider attacks in DBMS. arXiv preprint arXiv:1901.00228 (2019)
15. Srivastava, S., Kumar, A., Jha, S.K., Dixit, P., Prakash, S.: Event-driven data alteration detection using block-chain. Secur. Privacy 4(2), e146 (2021)
16. Handschuh, H., van Tilborg, H.C.A.: SHA Family (Secure Hash Algorithm) (2005)
17. Manson, C.G., Gorniak, S.: Recommendations for a methodology of the assessment of severity of personal data breaches. ENISA (European Union Agency for Network and Inform. Security) Working Document, v1. 0 (2013)

Funding

This research was conducted with the financial support of Science Foundation Ireland under Grant Agreement Nos. [13/RC/2106_P2] and [20/SP/8955] at the ADAPT SFI Research Centre at Maynooth University. ADAPT, the SFI Research Centre for AI-Driven Digital Content Technology, is funded by Science Foundation Ireland through the SFI Research Centres Programme.

Author information

Authors and Affiliations

Department of Computer Science, COMSATS University, Islamabad, Pakistan
Kainat Ansar
ADAPT Centre, Innovation Value Institute, Maynooth University, Maynooth, Ireland
Mansoor Ahmed & Markus Helfert
Faculty of Computing and Information Technology, Department of Information Technology, University of Sialkot, Sialkot, Pakistan
Muhammad Irfan Khalid

Corresponding author

Correspondence to Kainat Ansar.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisbon, Portugal
Álvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, Porto, Portugal
Fernando Moreira
Institute of Information Technology, Lodz University of Technology, Łódz, Poland
Aneta Poniszewska-Marańda

About this paper

Cite this paper

Ansar, K., Ahmed, M., Khalid, M.I., Helfert, M. (2024). GDPR-Compliant Data Breach Detection: Leveraging Semantic Web and Blockchain. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A. (eds) Good Practices and New Perspectives in Information Systems and Technologies. WorldCIST 2024. Lecture Notes in Networks and Systems, vol 990. Springer, Cham. https://doi.org/10.1007/978-3-031-60328-0_1

DOI: https://doi.org/10.1007/978-3-031-60328-0_1
Published: 16 May 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60327-3
Online ISBN: 978-3-031-60328-0
eBook Packages: Intelligent Technologies and Robotics (R0)
