A Simple and Efficient Data Hiding Method with Error Detection and Correction (2024)

1. Introduction

With the advancement of technologies such as Virtual Reality (VR), Augmented Reality (AR), and cloud computing, there is a growing need for faster data transfer speeds and lower latency, so researchers have turned their attention to the development of Fifth Generation (5G) communication technology. The research and development of 5G technology have significantly enhanced the speed and efficiency of information transmission. As 5G technology continues to evolve and becomes more widespread, the demand for information security has also steadily increased. Data hiding (DH) technology remains an enduring topic in the field of information security.

Data hiding technology can be broadly categorized into two major classes based on the extent of recoverability of cover images: reversible data hiding (RDH) [1,2,3,4,5,6,7] and irreversible data hiding (IRDH) [8,9,10,11,12,13,14,15,16,17]. As the names suggest, RDH techniques allow for the lossless restoration of the cover image after extracting the hidden data, whereas IRDH techniques introduce irreversible distortions to the cover image during the embedding of secret data. RDH techniques, in their pursuit of lossless cover image recovery, allocate extra space during the data embedding process to store auxiliary information required for image restoration. IRDH is a better choice when there is no special requirement for the quality of cover images. IRDH techniques compromise image quality in exchange for increased capacity to embed data.

However, whether it is reversible or irreversible data hiding technology, most existing DH methods have primarily been designed under the ideal assumption that the stego image remains unaltered during transmission. There are relatively few methods that enable the recipient to perform correctness checks and error correction on the extracted data. In digital communications, the reliability of data transmission is critical, and it is worth sacrificing some embedding capabilities to introduce redundant information in order to detect and correct errors.

In the field of communication, various techniques have been developed to detect and correct errors in data transmission. These techniques include parity code, Hamming code [18,19,20], cyclic redundancy check (CRC) [21,22,23], and many more. Parity code is one of the earliest error-checking codes, originally used in telegraph communication during the early 20th century. It ensures that the number of ones in the data is either odd or even by adding a parity bit (odd or even parity) to the data. This way, if a bit error occurs during transmission or storage, parity codes can detect it. In the mid-20th century, Hamming code was developed and has become widely used as an error-correcting technique. Hamming code arranges the original data bits by following specific rules to generate and append redundant check bits. At the receiving end, these check bits are used to detect errors. If only one error occurs in the data, the receiving end can analyze the check bits to determine the location of the error and enable the ability to correct it. CRC code employs a polynomial encoding technique by attaching a polynomial checksum to the data for error detection.

This paper introduces a straightforward yet effective IRDH technique that allows the receiver to identify errors in the extracted data and correct erroneous bits. The data hider begins by grouping the secret data and recording the number of bits in each group as an indicator. Then, it encodes the secret data so that each encoded group contains an odd number of bits and more than one bit. This ensures that the principle of majority voting can be applied to validate the data when extracting the secret data. If a group contains only one bit, the single bit is duplicated twice, resulting in three bits of data. To facilitate information extraction, verification, and error correction, the indicators are further encoded using a (7,4) Hamming code to guard against indicator errors. The receiver first extracts the indicators and verifies their accuracy. The extracted data stream is then grouped according to the indicators. Since all the elements within a group should be the same, errors can be easily detected and the incorrect data can be restored using the voting principle. The redundant elements can then be eliminated according to the indicators to obtain the final secret data.

The main contributions of this paper are as follows:

The copy encoding method is simpler to use than other methods and can detect errors more easily;
The use of voting strategy can effectively correct errors;
When the secret data encoded is not greater than $10 \times 10^{5}$ bits, the peak signal-to-noise ratio (PSNR) of the stego image generated by the proposed method is higher than other schemes;
It takes little time to perform encoding and decoding.

Overall, experimental results demonstrate the feasibility of the proposed algorithm, exhibiting high error-detection rate, high error-correction rate, high image visual quality and high total embedding capacity.

The remaining sections of this paper are organized as follows: Section 2 briefly reviews related work. Section 3 presents a detailed description of the proposed algorithm. Section 4 describes the experimental results and performance evaluation of the proposed approach. Section 5 presents the conclusions and outlines future research directions.

2. Related Works

In this section, we will provide a brief overview of the (7,4) Hamming code [18,19,20], and the data hiding method that combines quotient value differencing (QVD) and Hamming code capable of error detection and correction proposed by Kosuru et al. [16].

2.1. The (7,4) Hamming Code

Hamming Code is a coding technique used to detect and correct errors. Although there are other, more complex error correction codes available in modern coding technology, Hamming codes remain an important component of many systems that require error detection and correction because of their simplicity, effectiveness, and reliability. At the sender’s end, the (7,4) Hamming code generates three bits of parity information for every four bits of data. The parity bits are used to check whether the four bits of data have changed during the transmission process. If a change has occurred, these three bits of parity information indicate the location of the data changed. When a receiver receives the data, the data are divided into groups of seven bits, allowing for the analysis of the three-bit parity codes in each group to verify the integrity of the data. If there is only one incorrect bit in a group, the receiver can utilize the three-bit parity code associated with the group to identify the location of the error and correct it. The specific operations of parity coding and error location identification are depicted in Figure 1.

According to the encoding rules of the (7,4) Hamming code, the first, second, and fourth positions are designated for parity bits in a group of seven bits. The four bits of data to be encoded are placed sequentially in the remaining positions. Parity bit $C_{1}$ at the first position is responsible for checking the bits in positions 1, 3, 5, and 7. Parity bit $C_{2}$ at the second position checks the bits in positions 2, 3, 6, and 7. Parity bit $C_{3}$ at the fourth position checks the bits in positions 4, 5, 6, and 7. Parity bits are generated based on the count of ‘1’s in the data bits they oversee. For a parity bit, if there is an odd number of ‘1’s in the data being checked, the parity bit is set to 1; otherwise, it is set to 0. The specific formulas for each parity bit are shown in Equations (1)–(3), where $D_{i}$ denotes the i-th bit of secret data in the group.

$C_{1} = D_{1} \oplus D_{2} \oplus D_{4},$

(1)

$C_{2} = D_{1} \oplus D_{3} \oplus D_{4},$

(2)

$C_{3} = D_{2} \oplus D_{3} \oplus D_{4} .$

(3)

When the receiver receives a group of seven bits of data, it recalculates the parity bits based on the ‘1’ counts in the positions they oversee. The new parity bits are then used to determine whether the data changed during transmission. If the newly generated parity bits are not all zeroes, at least one error is present in the data. If there is only one error, the value of $({C^{'}}_{3} {C^{'}}_{2} {C^{'}}_{1})$ is converted from binary to decimal to give the error location. Once the location is known, the bit at that location can be corrected. If multiple errors occur at different positions, the parity bits can indicate the presence of errors but not their exact locations. Equations (4)–(6) show how the receiver’s parity bits are calculated:

${C^{'}}_{1} = C_{1} \oplus D_{1} \oplus D_{2} \oplus D_{4},$

(4)

${C^{'}}_{2} = C_{2} \oplus D_{1} \oplus D_{3} \oplus D_{4},$

(5)

${C^{'}}_{3} = C_{3} \oplus D_{2} \oplus D_{3} \oplus D_{4} .$

(6)
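
As a minimal sketch of Equations (1)–(6), the following Python snippet encodes four data bits and then locates and corrects a single flipped bit. The function names `hamming74_encode` and `hamming74_decode` are illustrative and do not come from the paper; the codeword layout (positions 1 to 7 hold $C_{1}, C_{2}, D_{1}, C_{3}, D_{2}, D_{3}, D_{4}$) follows the description above.

```python
def hamming74_encode(d):
    """Encode four data bits [D1, D2, D3, D4] into a 7-bit codeword.

    Layout (positions 1..7): C1, C2, D1, C3, D2, D3, D4.
    """
    d1, d2, d3, d4 = d
    c1 = d1 ^ d2 ^ d4  # Equation (1)
    c2 = d1 ^ d3 ^ d4  # Equation (2)
    c3 = d2 ^ d3 ^ d4  # Equation (3)
    return [c1, c2, d1, c3, d2, d3, d4]


def hamming74_decode(code):
    """Return (data_bits, error_position); the position is 0 if no error is seen."""
    c1, c2, d1, c3, d2, d3, d4 = code
    s1 = c1 ^ d1 ^ d2 ^ d4  # Equation (4)
    s2 = c2 ^ d1 ^ d3 ^ d4  # Equation (5)
    s3 = c3 ^ d2 ^ d3 ^ d4  # Equation (6)
    pos = s3 * 4 + s2 * 2 + s1  # (C'3 C'2 C'1) read as a binary number
    fixed = list(code)
    if pos:
        fixed[pos - 1] ^= 1  # flip the bit at the indicated position
    return [fixed[2], fixed[4], fixed[5], fixed[6]], pos


codeword = hamming74_encode([1, 0, 1, 1])
received = list(codeword)
received[4] ^= 1  # corrupt position 5, which holds D2
data, pos = hamming74_decode(received)
print(codeword, data, pos)
```

The correction step is only reliable under the single-error assumption stated above; with two or more flipped bits the computed position may point at a bit that was in fact correct.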

2.2. Kosuru et al.’s Method [16]

In order to make the data hiding technique capable of error detection and correction while extracting information, Kosuru et al. combined the Hamming code technique with the QVD data embedding method. Firstly, Kosuru et al. divided the cover image into non-overlapping $2 \times 2$ pixel blocks, and then divided the pixels in each block into two parts: one part was the LSB values L and the other part was the quotient values Q. As shown in Figure 2, each pixel value was divided by two to obtain one part with seven MSBs and one part with one LSB. Then the eight-bit data, $D_{8} D_{7} D_{6} D_{5} D_{4} D_{3} D_{2} D_{1}$ , was read from the secret data stream, and the coding rule of (12,8) Hamming code was used to calculate the four-bit check digit $c_{4} c_{3} c_{2} c_{1}$ . Similar to the encoding rules of (7,4) Hamming code, the specific calculation is as follows:

$\left\{\begin{matrix} c_{1} = D_{1} \oplus D_{2} \oplus D_{4} \oplus D_{5} \oplus D_{7} \\ c_{2} = D_{1} \oplus D_{3} \oplus D_{4} \oplus D_{6} \oplus D_{7} \\ c_{3} = D_{2} \oplus D_{3} \oplus D_{4} \oplus D_{8} \\ c_{4} = D_{5} \oplus D_{6} \oplus D_{7} \oplus D_{8} \end{matrix}\right. .$

(7)
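
Equation (7) can be checked with a few lines of Python. This is only a sketch of the check-bit computation; the function name `hamming128_check_bits` and the sample data bits are hypothetical, and the QVD embedding itself is not reproduced here.

```python
def hamming128_check_bits(d):
    """Compute [c1, c2, c3, c4] from data bits [D1, ..., D8] as in Equation (7)."""
    d1, d2, d3, d4, d5, d6, d7, d8 = d
    c1 = d1 ^ d2 ^ d4 ^ d5 ^ d7
    c2 = d1 ^ d3 ^ d4 ^ d6 ^ d7
    c3 = d2 ^ d3 ^ d4 ^ d8
    c4 = d5 ^ d6 ^ d7 ^ d8
    return [c1, c2, c3, c4]


# Hypothetical data bits D1..D8, chosen only for illustration.
check = hamming128_check_bits([1, 0, 1, 1, 0, 0, 1, 0])
print(check)
```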

The secret data and the check bits are concatenated to obtain $D_{8} D_{7} D_{6} D_{5} D_{4} D_{3} D_{2} D_{1} c_{4} c_{3} c_{2} c_{1}$ . Then, $D_{8} D_{7} D_{6}$ is transformed into a decimal number $b_{1}$ , $D_{5} D_{4} D_{3}$ is transformed into a decimal number $b_{2}$ , and $D_{2} D_{1} c_{4}$ is transformed into a decimal number $b_{3}$ . Next, $Q_{1} - Q_{2} = d_{1}$ , $Q_{1} - Q_{3} = d_{2}$ and $Q_{1} - Q_{4} = d_{3}$ are calculated. The 16 quantization ranges (QR) of Table 1 are then examined to find the range to which each of the absolute values of $d_{1}, d_{2}$ and $d_{3}$ belongs. Each QR has its upper bound (UB) and lower bound (LB). Using the LB differences, $b_{1}, b_{2}$ and $b_{3}$ are embedded into the interpolated $d_{1}, d_{2}$ and $d_{3}$ . The specific calculation is as follows:

${d^{'}}_{j} = \begin{cases} {LB}_{j} + b_{j}, & \text{if } d_{j} \geq 0 \\ - {LB}_{j} - b_{j}, & \text{if } d_{j} < 0 \end{cases}, \quad (j = 1, 2, 3)$

(8)

where ${L B}_{j}$ is the LB of the QR to which $d_{j}$ belongs. The values $|{d^{'}}_{1} - d_{1}| = m_{1}$ , $|{d^{'}}_{2} - d_{2}| = m_{2}$ and $|{d^{'}}_{3} - d_{3}| = m_{3}$ are further computed. The change in the difference after embedding the data is mapped to the corresponding quotient pairs through Equations (9)–(11).

$({Q_{1}}^{1}, {Q^{'}}_{2}) = \begin{cases} (Q_{1} - \lfloor \frac{m_{1}}{2} \rfloor, Q_{2} + \lceil \frac{m_{1}}{2} \rceil), & \text{if } \operatorname{mod}(d_{1}, 2) = 0 \\ (Q_{1} - \lceil \frac{m_{1}}{2} \rceil, Q_{2} + \lfloor \frac{m_{1}}{2} \rfloor), & \text{if } \operatorname{mod}(d_{1}, 2) = 1 \end{cases},$

(9)

$({Q_{1}}^{2}, {Q^{'}}_{3}) = \begin{cases} (Q_{1} - \lfloor \frac{m_{2}}{2} \rfloor, Q_{3} + \lceil \frac{m_{2}}{2} \rceil), & \text{if } \operatorname{mod}(d_{2}, 2) = 0 \\ (Q_{1} - \lceil \frac{m_{2}}{2} \rceil, Q_{3} + \lfloor \frac{m_{2}}{2} \rfloor), & \text{if } \operatorname{mod}(d_{2}, 2) = 1 \end{cases},$

(10)

3. Proposed Scheme

The (7,4) Hamming coding, although more complex than parity checks, remains relatively simple and finds applicability in numerous scenarios. However, it can only correct a single bit error for every seven bits of data, making its error tolerance limited in cases of multiple or complex errors. In this section, we will introduce a straightforward yet efficient data hiding method capable of performing data verification and correction. Section 3.1 will cover the encoding of secret data, Section 3.2 will explain the encoding of indicator data, Section 3.3 will describe the data embedding process, and Section 3.4 will detail the data extraction along with the verification and correction processes.

3.1. The Copy Encoding

We use an efficient yet simple coding method, which we call copy encoding, to encode the secret data stream. Initially, we group successive identical bits in the secret data stream into groups. The bit values in a group can only be all 0s or all 1s. The number of bits in a group is recorded as the indicator of the group. Thus, ${I N}_{j}$ is the indicator for the j-th group and j is the index of the group. If there are more than seven successive bits with the same value in the secret stream, we split them into multiple groups to guarantee that the value of ${I N}_{j}$ is in the range of [1, 7]. The recorded indicators are used to guide the extraction of secret data. Then we encode each group of the secret data stream according to the parity of $I N$ . The encoding at this stage ensures that the number of encoded bits in a group is odd and greater than one, so that the ‘majority voting rule’ can be applied to verify and correct the data in the group during the data extraction process. If ${I N}_{j}$ is an even number, we copy one bit of data in the group and append it to the end of the group so that the number of encoded bits in the group is odd. If ${I N}_{j}$ is an odd number and is greater than one, the encoded secret bit stream of the group is the same as the original. When a group contains only one bit, i.e., the group’s indicator ${I N}_{j} = 1$ , we duplicate the bit twice so that the encoded group comprises three bits.

Figure 3 provides an example of encoding secret data using copy encoding. When there is a secret data stream 000000000001000 as shown in the figure, we want to group the successive identical bits into groups first. There are 11 consecutive bits with the same bit value of 0; since we want to keep the value of an indicator in the range of [1, 7], these 11 ‘0’ bits will be split into 2 groups: the first 7 bits in 1 group and ${I N}_{1} = 7$ ; the remaining four bits into 1 group and ${I N}_{2} = 4$ . After the serial ‘0’s, there is a single bit one which forms a group on its own and ${I N}_{3} = 1$ . Finally, the last three successive bits of zero can be gathered into the fourth group and ${I N}_{4} = 3$ . With this process, we finish grouping the secret data stream and $I N$ denotes the indicators of the groups created. Following the grouping, we encode each group. Since ${I N}_{1}$ and ${I N}_{4}$ are odd numbers and not equal to one, bits in the first group and the fourth group remain unchanged. ${I N}_{2}$ is an even number; therefore, we need to copy one bit and append it to the end of the group to gain five bits of zero. When ${I N}_{3} = 1$ , we need to copy the single bit twice and append them to make it a group of three bits of one. Lastly, concatenate the encoded bits in groups to form the encoded secret data stream 000000000000111000. The algorithm for copy encoding is shown in Algorithm 1. The time complexity of the encoding process of secret data is O(N).

Algorithm 1. Copy encoding algorithm.
Input:	Secret data $S = s_{1} s_{2} s_{3} \dots s_{n}$
Output:	Indicators $I N$ , encoded data $S^{'}$
1:	add $s_{1}$ to $\mathit{group}_{j}$ /* $j = 1$ */
2:	${I N}_{j} = 1$ /* $j = 1$ */
3:	for $i = 2 : n$ do
4:	  if ${I N}_{j} \neq 7$
5:	    if $s_{i} == s_{i - 1}$
6:	      add $s_{i}$ to $\mathit{group}_{j}$
7:	      ${I N}_{j} = {I N}_{j} + 1$
8:	    else
9:	      $j = j + 1$
10:	      add $s_{i}$ to $\mathit{group}_{j}$
11:	      ${I N}_{j} = 1$
12:	    end if
13:	  else
14:	    $j = j + 1$
15:	    add $s_{i}$ to $\mathit{group}_{j}$
16:	    ${I N}_{j} = 1$
17:	  end if
18:	end for
19:	for each $\mathit{group}_{j}$ do
20:	  if $\operatorname{mod}({I N}_{j}, 2) == 0$
21:	    copy one secret bit of $\mathit{group}_{j}$ and append it to the end of $\mathit{group}_{j}$
22:	  else if ${I N}_{j} == 1$
23:	    copy the secret bit in $\mathit{group}_{j}$ twice and append both copies to $\mathit{group}_{j}$
24:	  else
25:	    the secret bits in $\mathit{group}_{j}$ are unchanged
26:	  end if
27:	end for
28:	Concatenate the data in all $\mathit{group}_{j}$ to obtain the encoded data $S^{'}$ .
29:	return $I N$ and $S^{'}$
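
Algorithm 1 can be sketched in Python as follows. This is an illustrative rendering rather than the authors' implementation; the function name `copy_encode` and the use of a character string for the bit stream are assumptions made for brevity. The sample input is the stream from Figure 3.

```python
def copy_encode(secret):
    """Return (indicators, encoded) for a string of '0'/'1' characters."""
    groups = []
    for bit in secret:
        # Extend the current run unless the bit changes or the run already has 7 bits.
        if groups and groups[-1][0] == bit and len(groups[-1]) < 7:
            groups[-1] += bit
        else:
            groups.append(bit)

    indicators, encoded = [], []
    for g in groups:
        n = len(g)
        indicators.append(n)
        if n == 1:
            encoded.append(g * 3)     # single bit: duplicate it twice
        elif n % 2 == 0:
            encoded.append(g + g[0])  # even length: append one copy
        else:
            encoded.append(g)         # odd length greater than one: unchanged
    return indicators, "".join(encoded)


indicators, encoded = copy_encode("000000000001000")
print(indicators, encoded)
```

Both loops visit each bit or group once, which is consistent with the O(N) encoding cost stated above.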

3.2. Coding of Indicators IN

To prevent indicators from being tampered with during transmission, it is necessary to encode the indicators $I N$ as well. Encoding the indicators enables us to extract the correct indicators to use in guiding data extraction with data validation and correction. The copy encoding method will inevitably produce extra indicators, so it is not suitable to encode indicators $I N$ . Therefore, we chose the traditional (7,4) Hamming coding method to encode indicators instead. When encoding the secret data, we control the indicators $I N$ to fall in the range of [1, 7], which can then be converted from decimal numbers to three-bit binary numbers. We need to first convert all the indicators $I N$ to binary representations and form an indicator stream. Subsequently, we encode the indicator stream using the (7,4) Hamming coding method. A three-bit parity code is generated for every four bits of data. The details of the encoding process were described in the Related Work section earlier.

Figure 4 shows how the indicators are converted to an indicator stream and how indicators are encoded. As shown in Figure 4, the indicators are first converted from decimals to three-bit binary streams and form the indicator stream 111100001011. Subsequently, we divide the indicator stream into four-bit groups and encode each group by using (7,4) Hamming coding method to obtain the encoded indicator stream. The encoded indicator stream we obtain from the example is 1111111 0000000 0110011.

Since the range of $I N$ is [1, 7], $I N = 0$ is used as a flag to terminate the indicator data stream. In other words, encoding $I N = 0$ means that our indicator data stream has been terminated. When we decode $I N = 0$ , indicator data stream has been fully decoded and the secret data stream follows.
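
The indicator coding of Figure 4 can be reproduced with the short sketch below. The helper names are hypothetical. The source does not state how an indicator stream whose length is not a multiple of four is completed, so this sketch simply rejects that case rather than guessing a padding rule, and it omits the $I N = 0$ terminator.

```python
def hamming74_encode(d1, d2, d3, d4):
    """Return the 7-bit codeword C1 C2 D1 C3 D2 D3 D4 as a string."""
    c1 = d1 ^ d2 ^ d4
    c2 = d1 ^ d3 ^ d4
    c3 = d2 ^ d3 ^ d4
    return f"{c1}{c2}{d1}{c3}{d2}{d3}{d4}"


def encode_indicators(indicators):
    """Convert indicators in [1, 7] to a 3-bit stream and Hamming-encode it."""
    stream = "".join(format(n, "03b") for n in indicators)
    if len(stream) % 4:
        # Assumption: a partial final block is not handled here.
        raise ValueError("indicator stream length must be a multiple of 4")
    blocks = [stream[i:i + 4] for i in range(0, len(stream), 4)]
    return stream, [hamming74_encode(*map(int, b)) for b in blocks]


stream, blocks = encode_indicators([7, 4, 1, 3])
print(stream, blocks)
```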

3.3. The Data Embedding Phase

Once the secret data and indicators are encoded, we move on to the data embedding phase. We use a simple but efficient data embedding method, least significant bit (LSB) substitution, to hide the secret data. First, we concatenate the encoded indicator stream with the encoded secret data stream to obtain the resulting data stream that we want to embed in the cover image. For each pixel of the cover image, we sequentially replace its least significant bit (LSB) with a bit from the resulting data stream. If the resulting data stream is large, we replace multiple LSBs of each pixel with multiple bits from the stream sequentially. After hiding the resulting data stream, we obtain the desired stego image for uploading.
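
A minimal sketch of one-bit LSB substitution is given below, assuming the pixels of one channel are supplied as a flat list of 8-bit integers. The function names and the sample pixel values are hypothetical, and the multi-LSB case mentioned above is not covered.

```python
def embed_lsb(pixels, bits):
    """Replace the LSB of successive pixels with successive bits of the stream."""
    if len(bits) > len(pixels):
        raise ValueError("data stream is longer than the number of pixels")
    stego = list(pixels)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | int(b)  # clear the LSB, then set it to b
    return stego


def extract_lsb(stego, n_bits):
    """Read the first n_bits LSBs back as a bit string."""
    return "".join(str(p & 1) for p in stego[:n_bits])


cover = [154, 155, 200, 37, 90, 91]  # hypothetical pixel values
stego = embed_lsb(cover, "1011")
print(stego, extract_lsb(stego, 4))
```

Each pixel changes by at most one gray level in this one-bit setting, which is why the distortion stays small for short data streams.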

3.4. Data Extraction and Error Correction Phase

When the receiver downloads the stego image from the cloud, the receiver can read the encoded indicator stream and the encoded secret data stream directly from the LSBs of the stego image pixels. The receiver should extract the indicators $I N$ first to subsequently guide the verification, correction, and extraction of secret data. The receiver needs to group the encoded indicator stream into groups and each group consists of seven bits. Subsequently, the receiver recalculates the three-bit check code in each group using Equations (4)–(6) and performs the process of data verification and error correction using (7,4) Hamming codes which were detailed in the Related Work section.

Next, the receiver removes the three-bit parity code from the encoded indicator stream and acquires the pure indicator stream, and then regroups the indicator stream into three-bit groups. After converting the three-bit indicator stream in each group into decimal form, the indicators $I N$ are subsequently used to guide the verification, correction, and extraction of secret data in the group.

Then, the receiver can deduce the length of the j-th group of the encoded secret data streams based on the indicator ${I N}_{j}$ . Since the purpose of the copy encoding method is to apply the “majority principle” to perform data verification and error correction on the data stream, the number of data bits corresponding to the indicators in the encoded group should be odd numbers and greater than one. According to the parity of the indicator ${I N}_{j}$ , we can easily deduce the length of the corresponding encoded secret data group. If the ${I N}_{j}$ is an even number, it means that we copied one bit from the j-th group of secret data. Therefore, the ${I N}_{j} + 1$ bits of data should be in the j-th group when extracting. If the ${I N}_{j}$ is an odd number and not one, it indicates that the j-th group was unchanged after encoding and the ${I N}_{j}$ bits of data can be read as the encoded data for the j-th group. If the indicator ${I N}_{j} = 1$ , it means that we copied the secret data of the j-th group twice. Therefore, when extracting, we should read ${I N}_{j} + 2$ bits of data as the encoded data for the j-th group. After the grouping is completed, the receiver needs to verify and correct the encoded data in each group. According to the proposed copy encoding rules, the elements within a group should remain the same for correct transmission. Therefore, if there is a mixture of bit value zero and bit value one in a group, it means that an error occurred during the transmission. If the amount of the encoded data in a group is an odd number and greater than one, it provides a basis for error correction using voting strategy. It is like voting in an election: the value of each element in the group represents what it supports initially. In the situation of correction, the value of the majority votes in the group is ‘the leader’ of the group, and all elements in the group should follow ‘the leader’ and correct their values to be consistent with ‘the leader’. 
Through the voting strategy, the values of the elements in a group should be consistent with the value of the majority votes. After the error correction is completed, the receiver deletes the extra bits added during the encoding based on the indicators $I N$ to complete the extraction of the secret data. The algorithm for data extraction and decoding is shown in Algorithm 2. The time complexity of the decoding process of secret data is O(N).

Algorithm 2. Data extraction and decoding algorithm.
Input:	Indicators $I N$ , encoded data $S^{'} = {s^{'}}_{1} {s^{'}}_{2} {s^{'}}_{3} \dots {s^{'}}_{n}$
Output:	Secret data $S$
1:	for $j = 1$ : end do
2:	  if $\operatorname{mod}({I N}_{j}, 2) == 0$
3:	    read the next $({I N}_{j} + 1)$ bits of encoded data into $\mathit{group}_{j}$
4:	  else if ${I N}_{j} == 1$
5:	    read the next $({I N}_{j} + 2)$ bits of encoded data into $\mathit{group}_{j}$
6:	  else
7:	    read the next ${I N}_{j}$ bits of encoded data into $\mathit{group}_{j}$
8:	  end if
9:	  if the bits of $\mathit{group}_{j}$ are all 0s or all 1s
10:	    no error occurred in $\mathit{group}_{j}$
11:	  else
12:	    an error occurred in $\mathit{group}_{j}$
13:	    correct the error according to the majority principle
14:	  end if
15:	  add ${I N}_{j}$ bits of $\mathit{group}_{j}$ to $S$
16:	end for
17:	return $S$

Figure 5 provides an example of data extraction and error correction. First, the receiver groups the encoded indicator stream into seven-bit groups; in the example, this yields three groups. According to the rules of the (7,4) Hamming code, the three-bit parity code of each group is recalculated, and the bits in the group are then error-corrected based on it. After removing the parity codes from the groups to obtain the pure indicator stream 111100001011, the receiver divides the pure indicator stream into three-bit groups. In the example, the pure indicator stream is divided into four groups, which are converted into decimals to obtain ${I N}_{j}$ (j = 1, 2, 3, 4). Since ${I N}_{1} = 7$ is an odd number, the first seven bits of data form the first group. ${I N}_{2} = 4$ is an even number, so the next ${I N}_{2} + 1 = 5$ bits of data become the second group. Since the indicator of the third group is ${I N}_{3} = 1$ , the next ${I N}_{3} + 2 = 3$ bits of data form the third group. Finally, ${I N}_{4} = 3$ is an odd number, so the next three bits of data belong to the fourth group. Following the above steps, we have grouped the encoded secret data stream. The receiver can then count the zeros and ones in each group, determine where errors occurred, and use the voting strategy to correct them based on the “majority principle”. As shown in Figure 5, there are six bits with a value of zero and one bit with a value of one in the first group. Since six of the seven bits have a value of zero, the majority value is zero, and the bit with a value of one is replaced by a bit with a value of zero. After the correction, according to ${I N}_{1} = 7$ , seven bits are read from the corrected data stream as the extracted secret data.
The other two groups with errors were corrected and extracted using the same method. The receiver can obtain the final secret data stream 000000000001000 after all four groups are extracted and corrected.
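
The grouping, verification, and majority-vote steps of Algorithm 2 can be sketched as follows. The function name `copy_decode` is illustrative, and the corrupted stream used here is a hypothetical example built from the encoded stream 000000000000111000; it is not the exact bit pattern of Figure 5.

```python
def copy_decode(indicators, encoded):
    """Return (secret, flagged), where flagged lists the groups that held errors."""
    secret, flagged, pos = [], [], 0
    for j, n in enumerate(indicators, start=1):
        # Deduce the encoded group length from the indicator.
        if n == 1:
            length = 3
        elif n % 2 == 0:
            length = n + 1
        else:
            length = n
        group = encoded[pos:pos + length]
        pos += length

        ones = group.count("1")
        if 0 < ones < length:
            flagged.append(j)  # mixed values: an error occurred in this group
        # The group length is odd, so a strict majority always exists.
        majority = "1" if ones > length // 2 else "0"
        secret.append(majority * n)  # keep IN_j bits, dropping the extra copies
    return "".join(secret), flagged


# Hypothetical corruption: one flipped bit in group 1 and one in group 3.
received = "001000000000101000"
secret, flagged = copy_decode([7, 4, 1, 3], received)
print(secret, flagged)
```

The vote recovers a group only while fewer than half of its bits are wrong; a three-bit group with two flipped bits would be decoded to the wrong value without being distinguishable from a single error.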

4. Experimental Results and Analysis

In this section, we present the results of the experimental evaluations conducted on the proposed data hiding scheme. Our tests encompassed several color images. Each color image comprises three channels (red, green, and blue), and each channel has a size of 512 × 512 pixels. During the data embedding process, we embedded data into the least significant bits (LSBs) of the pixels in the red channel, the green channel, and the blue channel sequentially. The experiments were conducted on a Windows PC using MATLAB software, version 2017a. Our test set includes the color images “Airplane”, “Baboon”, “Lake”, “Lena”, and “Peppers”.

A stream of binary data generated by a random number generator is used as the secret data, which is encoded and embedded into the cover image. The effectiveness of the proposed scheme is assessed based on embedding capacity (EC), peak signal-to-noise ratio (PSNR), Structural Similarity (SSIM), Quality Index (QI), and entropy (EN). EC represents the quantity of secret data that a method can embed, where a larger EC indicates that the method can transmit more information in a carrier, thereby possessing higher transmission efficiency. Assuming the carrier is a cover image, EC can be measured as the number of embedded secret data bits or in bits per pixel (bpp). PSNR, expressed in decibels (dB), and SSIM, a unitless index, are the two primary metrics used to evaluate the image distortion between the original image and the stego image. The calculations for PSNR and SSIM are as follows:

$M S E = \frac{1}{M \times N} \sum_{i = 1}^{M} \sum_{j = 1}^{N} (C_{i j} - S_{i j})^{2},$

(12)

$PSNR = 10 \log_{10} \frac{{(255)}^{2}}{MSE} (dB),$

(13)

$SSIM = \frac{(2 μ_{C} μ_{S} + c_{1}) (2 σ_{C S} + c_{2})}{[{(μ_{C})}^{2} + {(μ_{S})}^{2} + c_{1}] [{(σ_{C})}^{2} + {(σ_{S})}^{2} + c_{2}]} .$

(14)

In Equation (12), $M \times N$ is the size of the cover image, $C_{i j}$ represents the pixel values of the cover image, and $S_{i j}$ represents the pixel values of the stego image. A higher PSNR value indicates smaller distortion caused by the hidden data, and the stego image closely resembles the cover image. Generally, a PSNR value greater than 30 dB suggests that the information embedding has caused imperceptible distortion to human eyes, making it an acceptable result.
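
Equations (12) and (13) can be evaluated with the sketch below, which assumes single-channel images given as nested lists of 8-bit values. The function names and the tiny 2 × 2 sample images are hypothetical and serve only to show the arithmetic.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equally sized images, Equation (12)."""
    m, n = len(cover), len(cover[0])
    total = sum((cover[i][j] - stego[i][j]) ** 2
                for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(cover, stego):
    """Peak signal-to-noise ratio in dB, Equation (13); infinite if identical."""
    e = mse(cover, stego)
    return math.inf if e == 0 else 10 * math.log10(255 ** 2 / e)


cover = [[100, 101], [102, 103]]
stego = [[101, 101], [102, 102]]  # two pixels differ by one gray level
print(mse(cover, stego), psnr(cover, stego))
```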

In Equation (14), $μ$ represents the mean and serves as an estimate of brightness; $σ$ is the standard deviation and serves as an estimate of contrast; $σ_{C S}$ represents the covariance between the cover image C and the stego image S and serves as a measure of structural similarity. Additionally, $c_{1}$ and $c_{2}$ are two constants. SSIM combines factors related to brightness, contrast, and structure to comprehensively assess the similarity between two images. SSIM values can range from −1 to 1; with values close to 1 indicating that the stego image is highly similar to the cover image.

QI is another metric used for evaluating the similarity of images. We adopted the QI settings described in [16]. The calculation formula for QI is as follows:

$Q I = \frac{4 \times \bar{C} \times \bar{S} \times \{\sum_{i = 1}^{M} \sum_{j = 1}^{N} (C_{i j} - \bar{C}) \times (S_{i j} - \bar{S})\}}{\{\sum_{i = 1}^{M} \sum_{j = 1}^{N} {(C_{i j} - \bar{C})}^{2} + \sum_{i = 1}^{M} \sum_{j = 1}^{N} {(S_{i j} - \bar{S})}^{2}\} \times ({\bar{C}}^{2} + {\bar{S}}^{2})},$

(15)

where $\bar{C}$ means the average value of the cover image pixels and $\bar{S}$ is the average value of the stego image pixels. The higher the value of QI, the smaller the difference between the cover image and the stego image.
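
A direct transcription of Equation (15) into Python might look like the following sketch. The function name `quality_index` and the sample images are hypothetical; the function is undefined (division by zero) when both images are constant, a case this sketch does not handle.

```python
def quality_index(cover, stego):
    """Quality Index of Equation (15) for two equally sized images."""
    c = [p for row in cover for p in row]
    s = [p for row in stego for p in row]
    c_bar = sum(c) / len(c)
    s_bar = sum(s) / len(s)
    cross = sum((a - c_bar) * (b - s_bar) for a, b in zip(c, s))
    dev_c = sum((a - c_bar) ** 2 for a in c)
    dev_s = sum((b - s_bar) ** 2 for b in s)
    return (4 * c_bar * s_bar * cross) / ((dev_c + dev_s) * (c_bar ** 2 + s_bar ** 2))


cover = [[100, 101], [102, 103]]
stego = [[101, 101], [102, 102]]
print(quality_index(cover, cover), quality_index(cover, stego))
```

Identical images give a QI of 1, and the value drops as the stego image departs from the cover image.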

The entropy (EN) can represent the characteristics of an image’s gray scale distribution. The more similar the entropy of two images, the more similar they are. The entropy (EN) is calculated as follows:

$\mathit{entropy} = \sum_{pi = 0}^{255} θ (pi) \log \frac{1}{θ (pi)},$

(16)

where $p i$ represents the image pixel value between 0 and 255, and $θ (p i)$ is the probability of the image pixel value $p i$ .
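
Equation (16) can be computed as in the sketch below. The equation does not state the base of the logarithm; base 2 (entropy in bits) is assumed here. The function name `entropy` and the flat pixel list input are also illustrative choices.

```python
import math
from collections import Counter


def entropy(pixels):
    """Entropy of a flat list of pixel values, Equation (16), assuming log base 2."""
    counts = Counter(pixels)
    total = len(pixels)
    # theta(pi) = count / total; values that never occur contribute nothing.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(entropy([0, 0, 255, 255]), entropy([7] * 10))
```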

Table 2, Table 3 and Table 4 recorded the performance of the proposed scheme across various metrics for different embedding capacities on different test images. Upon reviewing the data in these three tables, it became evident that the proposed scheme performs quite well when embedding a small amount of secret data. For instance, when embedding 100,000 bits of information, the average PSNR across the 5 test images reached an impressive 56 dB, with an SSIM of 0.9996, and a QI value approaching 1. Additionally, the entropies of the stego images were also close to those of the original cover images. This indicates that the proposed scheme produces stego images highly similar to the original cover images when embedding a low amount of data.

As the size of the embedded data stream increases, the performance metrics of the stego images generated by the proposed scheme exhibit a decreasing trend. Even when embedding 1,000,000 bits of secret data, the scheme still produced stego images with an average PSNR of 35 dB, an SSIM of 0.97, and a QI maintained at 0.9942. The entropies of the stego images also did not differ significantly from those of the original images. It implies that when embedding a substantial amount of secret data, the proposed scheme introduces limited distortion and exhibits good performance in terms of structural similarity against the original images. The overall performance is commendable.

Table 5 shows the time taken for encoding and decoding using different binary images as secret data. The binary images used as secret data are of size $256 \times 256$ . From Table 5, it can be seen that the proposed copy encoding method takes longer to encode and decode complex secret data. However, the longest encoding time is only 0.81 s and the longest decoding time is only 0.58 s. Since more indicators are required for complex secret data, encoding and decoding the indicators with the (7,4) Hamming code takes more time as well. For smooth secret data, the proposed copy encoding method takes about 0.2 s to encode all the secret data and 0.3 s to decode them. Overall, the proposed copy encoding method is simple and efficient in encoding and decoding.

The focus of the proposed scheme is on error detection and correction. The scheme combines a copy encoding method with a voting-based error correction approach. When fewer than half of the data bits in a group are erroneous, the scheme can detect and correct them. As illustrated in Figure 5, the error correction strategy recovers the data as long as the number of errors stays below half of the data bits in the group.
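The voting step described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that the indicators have already been decoded into a list of group sizes and that every bit within a group originally carried the same value. The name `vote_correct` is hypothetical, and ties in even-sized groups are resolved to 0 here purely as an assumption.

```python
def vote_correct(received_bits, group_sizes):
    """Majority-vote each group and return (corrected_bits, flipped_count).

    received_bits: list of 0/1 values extracted from the stego image.
    group_sizes:   number of bits in each group (assumed to come from
                   the decoded indicators).
    """
    corrected = []
    flipped = 0
    pos = 0
    for size in group_sizes:
        group = received_bits[pos:pos + size]
        ones = sum(group)
        # The value held by more than half of the bits wins the vote.
        majority = 1 if 2 * ones > size else 0
        # Bits that disagree with the majority are treated as errors.
        flipped += (size - ones) if majority == 1 else ones
        corrected.extend([majority] * size)
        pos += size
    return corrected, flipped


# Two groups of sizes 3 and 5, each with one corrupted bit.
received = [1, 0, 1, 0, 0, 0, 1, 0]
print(vote_correct(received, [3, 5]))
# ([1, 1, 1, 0, 0, 0, 0, 0], 2)
```

The vote only succeeds when the erroneous bits are a strict minority of the group, which is why the guarantee is stated for fewer than half of the bits.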

Figure 6 visualizes the results of the proposed scheme before and after the encoded secret image is embedded into the cover image. Comparing Figure 6a,c, it is difficult for a malicious user to tell that (c) carries the embedded secret image (b). Figure 6d shows the pixel difference between the stego image and the original cover image; the difference is almost imperceptible at the pixel level. In general, observers will therefore take the stego image for an ordinary image and pay little attention to it. If a malicious user nevertheless attacks the stego image by adding salt and pepper noise in order to destroy the information it carries, most of the secret data image encoded with the proposed method can still be extracted correctly. Salt and pepper noise, which appears as randomly placed white or black dots, is a common type of image noise: it may consist of black pixels in bright areas, white pixels in dark areas, or both. Table 6 lists the correctness percentages of the secret data extracted from the stego image after salt and pepper noise attacks of different densities. When the stego image is attacked with a noise density of 0.01, the proposed scheme corrects the extracted data to yield a secret image that is almost the same as the original secret image, as Figure 6e,f demonstrate. At this density, the indicator data stream encoded with the (7,4) Hamming code is 99.56% correct after correction, and the secret data stream encoded with the copy encoding method is 100% correct after correction, as shown in Table 6.
As the density of the salt and pepper noise increases, the correctness of the indicator data and the secret data decreases. However, even after the stego image has suffered a salt and pepper noise attack with a density of 0.1, the indicator data can still be extracted with 94.83% correctness and the secret data with 99.85% correctness. As Figure 6g,h show, the data image extracted from the attacked stego image remains very similar to the original one. This means that the attack does not prevent the receiver from recovering a close approximation of the information hidden in the stego image. In summary, the proposed scheme has an acceptable error correction efficiency, and the secret data reach a correctness rate above 99% at all tested noise densities.
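For readers who want to reproduce a comparable attack, the sketch below adds salt and pepper noise to an 8-bit grayscale image. It assumes that each pixel is replaced independently with probability equal to the noise density and that black and white replacements are equally likely; the noise generator actually used in the experiments is not specified in this section, so the function `salt_and_pepper` and its parameters are illustrative only.

```python
import random


def salt_and_pepper(image, density, seed=None):
    """Return a noisy copy of a grayscale image (list of rows of 0..255).

    Each pixel is replaced with probability `density` by 0 (pepper)
    or 255 (salt), chosen with equal probability.
    """
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # leave the input untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < density:
                row[x] = 255 if rng.random() < 0.5 else 0
    return noisy


# Attack a flat gray 100 x 100 image at density 0.1 and count hit pixels.
stego = [[128] * 100 for _ in range(100)]
attacked = salt_and_pepper(stego, density=0.1, seed=1)
hit = sum(p != 128 for row in attacked for p in row)
print(hit / 10000)  # close to the requested density of 0.1
```

Note that a replaced pixel does not always change the embedded bits, which is consistent with the observation that noise on the stego image does not necessarily corrupt the extracted data.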

A salt and pepper noise attack on a stego image does not necessarily corrupt the embedded data. In Table 7, we therefore model the case where errors occur directly in the extracted data stream. Table 7 reports the accuracy of the information extracted by the proposed scheme under different error rates, where the error rate is the percentage of erroneous bits in the extracted data. $\gamma_{ind}$ and $\gamma_{data}$ denote the accuracy rates of indicator extraction and secret data extraction, respectively. Table 7 shows that the indicators encoded with the (7,4) Hamming code are recovered with 88.87% accuracy by the error correction mechanism even when 10% of the bits are erroneous. At the same error rate, the secret data encoded with the proposed copy encoding method are recovered with 99.34% accuracy. In other words, the error correction efficiency of the proposed scheme is high enough that the secret data can be extracted almost error-free even when 10% of the bits are in error.
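The lower accuracy of the indicators follows from the nature of the (7,4) Hamming code, which corrects exactly one erroneous bit per 7-bit codeword; two or more errors in the same codeword lead to a wrong decoding. The sketch below uses the classic bit layout (parity bits at positions 1, 2, and 4), which is an assumption and may differ from the layout shown in Figure 1.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
print(word)                    # [0, 1, 1, 0, 0, 1, 1]
word[4] ^= 1                   # flip one bit in transit
print(hamming74_decode(word))  # [1, 0, 1, 1]
```

With a 10% bit error rate, a noticeable share of 7-bit codewords contains two or more errors, which is in line with the gap between $\gamma_{ind}$ and $\gamma_{data}$ in Table 7.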

Table 8 compares the proposed scheme with several existing IRDH methods. A “✗” denotes the absence of a feature, while a “✓” denotes its presence. The hiding capacity (HC) is the total amount of data a scheme can hide in an image, without deducting the redundant bits used for error detection and correction. Pradhan et al. [11] proposed a PVD technique using 3 × 3 pixel blocks. Jung [12] proposed the concept of QVD; although this method obtained high HC values, it suffered from overflow and extraction errors. Pradhan et al. [13] combined additive–subtraction-based Quotient Value Differencing (ASQVD) with neighbor matching to improve HC. Sonar and Swain [14] combined pixel value correlation with QVD to improve HC further. However, these methods focus on data hiding and do not consider error detection, let alone error correction. Swain and Pradhan [15] proposed a hybrid approach using QVD and quotient value correlation that allows data integrity verification at the receiver. As Table 8 shows, methods that do not consider error detection and correction, such as [11,12,13,14], have advantages in HC, PSNR, and QI because they do not require extra bits for error detection. Unfortunately, a receiver cannot determine whether the data transmitted using these schemes are reliable. On the other hand, methods that take error detection and correction into account inevitably sacrifice some hiding capacity to record detection codes. Despite having similar PSNR values, these methods, including the proposed one, have smaller HCs than conventional IRDH methods. Although the method in [15] exhibits high PSNR and QI values, it can only detect data errors, not correct them. As for [16], while it slightly outperforms the proposed scheme in terms of PSNR and QI, it can correct only one error in every 12 bits of data, which results in lower error correction efficiency.
The proposed scheme improves error correction efficiency substantially because it can detect and correct errors whenever fewer than half of the data bits in a group are erroneous. With three bits in a group, it can correct one error; with five bits, two errors; with seven bits, three errors. Compared with [16], the proposed scheme therefore offers a significant improvement in error correction efficiency.
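The relation between group size and the number of correctable errors can be written down directly. The sketch below also estimates how often a voted group is recovered correctly, under the simplifying assumption (not made in the paper) that bit errors are independent with a fixed probability `p`; both function names are hypothetical.

```python
from math import comb


def correctable_errors(group_size):
    """Largest error count that is still fewer than half of the group."""
    return (group_size - 1) // 2


def group_success_probability(group_size, p):
    """Probability that majority voting recovers a group of identical bits,
    assuming independent bit errors with probability p."""
    t = correctable_errors(group_size)
    return sum(comb(group_size, k) * p ** k * (1 - p) ** (group_size - k)
               for k in range(t + 1))


for n in (3, 5, 7):
    print(n, correctable_errors(n))
# 3 1
# 5 2
# 7 3
```

Under this independence assumption, a three-bit group at a 10% bit error rate is recovered with probability 0.972, and larger groups fare better, which matches the qualitative trend that copy-encoded data tolerate errors better than data protected by a single-error-correcting code.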

5. Conclusions

The majority of existing data hiding techniques focus primarily on enhancing the embedding capacity and the visual quality of the stego images. However, limited research has addressed the ability to detect and correct errors during extraction at the receiver’s end. This paper introduces a simple and efficient data hiding method capable of detecting and correcting errors while maintaining comparable embedding capacity and PSNR. The proposed scheme first groups adjacent, identical data in the secret data and records the count of data in each group as indicators. It then encodes the secret data using a copy encoding method and encodes the indicators using the (7,4) Hamming code. Finally, the encoded secret data and the encoded indicators are embedded one after the other in the cover image. The receiver reads the indicators first to group the data and then employs a voting-based error detection and correction method to verify the correctness of the extracted data. Experimental results show that when the secret data to be encoded are smooth, encoding takes about 0.2 s and decoding takes about 0.3 s. When the secret data are complex, encoding and decoding each still take less than 1 s. The data extracted and corrected by the proposed copy encoding method reach a 99% correctness rate even when the stego image suffers a noise attack. It can be concluded that the proposed copy encoding method is a simple and efficient coding method with error detection and correction capability. Moreover, the proposed method outperforms other schemes in terms of PSNR when fewer than $10 \times 10^{5}$ bits of secret data are embedded.

A limitation of this work is the generation of significant redundancy during the encoding process due to the need for indicators to guide subsequent data extraction. In the future, we will explore various strategies to reduce redundancy without compromising error correction efficiency.

Author Contributions

Conceptualization and methodology, H.C., J.-C.L., C.-C.C. and J.-H.H.; software, H.C.; validation, H.C., J.-C.L., C.-C.C. and J.-H.H.; data curation, H.C.; writing—original draft preparation, H.C.; writing—review and editing, J.-C.L. and H.C.; supervision, C.-C.C. and J.-H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Abbreviated terms used in the paper and their full names:

LSB	Least Significant Bit
VR	Virtual Reality
AR	Augmented Reality
5G	the Fifth Generation
DH	Data Hiding
RDH	Reversible Data Hiding
IRDH	Irreversible Data Hiding
CRC	Cyclic Redundancy Check
PSNR	Peak Signal-to-Noise Ratio
QVD	Quotient Value Differencing
EC	Embedding Capacity
SSIM	Structural Similarity
QI	Quality Index
EN	Entropy
ISB	Intermediate Significant Bit

References

1. Yu, C.; Zhang, X.; Li, G.; Zhan, S.; Tang, Z. Reversible data hiding with adaptive difference recovery for encrypted images. Inf. Sci. 2022, 584, 89–110.
2. Li, L.; Yao, Y.; Yu, N. High-fidelity video reversible data hiding using joint spatial and temporal prediction. Signal Process. 2023, 208, 108970.
3. Kim, C. Dual Reversible Data Hiding Based on AMBTC Using Hamming Code and LSB Replacement. Electronics 2022, 11, 3210.
4. Wu, X.; Yang, C.-N.; Liu, Y.-W. A general framework for partial reversible data hiding using hamming code. Signal Process. 2020, 175, 107657.
5. Celik, M.U.; Sharma, G.; Tekalp, A.M.; Saber, E. Lossless generalized-LSB data embedding. IEEE Trans. Image Process. 2005, 14, 253–266.
6. Kim, C.; Yang, C.-N.; Zhou, Z.; Jung, K.-H. Dual efficient reversible data hiding using Hamming code and OPAP. J. Inf. Secur. Appl. 2023, 76, 103544.
7. Geetha, R.; Geetha, S. Embedding electronic patient information in clinical images: An improved and efficient reversible data hiding technique. Multimed. Tools Appl. 2020, 79, 12869–12890.
8. Faheem, Z.B.; Ali, M.; Raza, M.A.; Arslan, F.; Ali, J.; Masud, M.; Shorfuzzaman, M. Image Watermarking Scheme Using LSB and Image Gradient. Appl. Sci. 2022, 12, 4202.
9. Almazaydeh, L. Secure RGB image steganography based on modified LSB substitution. Int. J. Embed. Syst. 2020, 12, 453–457.
10. Mahmoud, M.M.; Elshoush, H.T. Enhancing LSB Using Binary Message Size Encoding for High Capacity, Transparent and Secure Audio Steganography–An Innovative Approach. IEEE Access 2022, 10, 29954–29971.
11. Pradhan, A.; Raja Sekhar, K.; Swain, G. Digital Image Steganography based on Seven Way Pixel Value Differencing. Indian J. Sci. Technol. 2016, 9, 1–11.
12. Jung, K.-H. Data hiding scheme improving embedding capacity using mixed PVD and LSB on bit plane. J. Real-Time Image Process. 2018, 14, 127–136.
13. Pradhan, A.; Sekhar, K.R.; Swain, G. Image steganography using add-sub based QVD and side match. In Digital Media Steganography; Hassaballah, M., Ed.; Academic Press: Cambridge, MA, USA, 2020; pp. 81–97.
14. Sonar, R.; Swain, G. Steganography based on quotient value differencing and pixel value correlation. CAAI Trans. Intell. Technol. 2021, 6, 504–519.
15. Swain, G.; Pradhan, A. Image Steganography Using Remainder Replacement Adaptive QVD and QVC. Wirel. Pers. Commun. 2022, 123, 273–293.
16. Kosuru, S.N.V.J.D.; Pradhan, A.; Basith, K.A.; Sonar, R.; Swain, G. Digital Image Steganography with Error Correction on Extracted Data. IEEE Access 2023, 11, 80945–80957.
17. Darabkh, K.A.; Al-Dhamari, A.K.; Jafar, I.F. A New Steganographic Algorithm Based on Multi Directional PVD and Modified LSB. Inf. Technol. Control 2017, 46, 1.
18. Cintas-Canto, A.; Kermani, M.M.; Azarderakhsh, R. Error Detection Constructions for ITA Finite Field Inversions Over GF(2^m) on FPGA Using CRC and Hamming Codes. IEEE Trans. Reliab. 2023, 72, 651–661.
19. Chan, C.-S.; Chang, C.-C. An efficient image authentication method based on Hamming code. Pattern Recognit. 2007, 40, 681–690.
20. Kumar, U.K.; Umashankar, B.S. Improved Hamming Code for Error Detection and Correction. In Proceedings of the 2007 2nd International Symposium on Wireless Pervasive Computing, San Juan, PR, USA, 5–7 February 2007.
21. Sobolewski, J.S. Cyclic redundancy check. In Encyclopedia of Computer Science; GBR: John Wiley and Sons Ltd.: Hoboken, NJ, USA, 2003; pp. 476–479.
22. Castagnoli, G.; Brauer, S.; Herrmann, M. Optimization of cyclic redundancy-check codes with 24 and 32 parity bits. IEEE Trans. Commun. 1993, 41, 883–892.
23. Li, B.; Shen, H.; Tse, D. An Adaptive Successive Cancellation List Decoder for Polar Codes with Cyclic Redundancy Check. IEEE Commun. Lett. 2012, 16, 2044–2047.

Figure 1. Example of coding and error correction process on (7,4) Hamming code.

Figure 2. The way to divide the pixels of a pixel block into two parts.

Figure 3. Examples of generating indicators and using copy encoding.

Figure 4. Examples of generating encoded indicators.

Figure 5. Example of extracting data and correcting errors based on indicators.

Figure 6. Visual presentation of stego image and extracted secret images. Part (a) is the cover image; (b) is the secret image; (c) is the stego image; (d) is the difference between the cover image and stego image; (e,g) are the extracted secret images after error correction when the noise density is 0.01 and 0.1, respectively; (f,h) are the differences between the extracted image and the original secret image.

Figure 7. Comparison of the proposed scheme with other schemes capable of error detection and correction. (a) Lena; (b) Baboon.

Table 1. The quantization range table of the quotient difference.

QR	QR1	QR2	QR3	QR4	QR5	QR6	QR7	QR8	QR9	QR10	QR11	QR12	QR13	QR14	QR15	QR16
LB	0	8	16	24	32	40	48	56	64	72	80	88	96	104	112	120
UB	7	15	23	31	39	47	55	63	71	79	87	95	103	111	119	127

Table 2. Performance of the proposed scheme with an embedding capacity of 100,000 bits.

Image Name	PSNR (dB)	SSIM	QI	$EN_{in}$	$EN_{out}$
Airplane	56.0251	0.9987	0.9999	6.6639	6.6634
Baboon	56.0270	0.9999	0.9999	7.7624	7.7621
Peppers	56.0138	0.9999	0.9999	7.6698	7.6694
Lake	56.0319	0.9998	0.9999	7.7622	7.7619
Lena	56.0504	0.9999	0.9999	7.7494	7.7502
Average	56.0296	0.9996	0.9999	-	-

Table 3. Performance of the proposed scheme with an embedding capacity of 500,000 bits.

Image Name	PSNR (dB)	SSIM	QI	$EN_{in}$	$EN_{out}$
Airplane	47.2959	0.9916	0.9994	6.6247	6.6634
Baboon	47.2913	0.9993	0.9996	7.7177	7.7621
Peppers	47.3644	0.9995	0.9997	7.6465	7.6694
Lake	47.2969	0.9985	0.9998	7.7169	7.7619
Lena	47.2799	0.9995	0.9997	7.7001	7.7502
Average	47.3057	0.9977	0.9996	-	-

Table 4. Performance of the proposed scheme with an embedding capacity of 1,000,000 bits.

Image Name	PSNR (dB)	SSIM	QI	$EN_{in}$	$EN_{out}$
Airplane	35.7220	0.8984	0.9904	6.4677	6.6634
Baboon	35.7160	0.9899	0.9942	7.4948	7.7621
Peppers	35.7467	0.9931	0.9957	7.4445	7.6694
Lake	35.7260	0.9794	0.9960	7.4828	7.7619
Lena	35.7370	0.9932	0.9946	7.4321	7.7502
Average	35.7295	0.9708	0.9942	-	-

Table 5. Encoding and decoding times for indicators and secret data (unit: seconds).

Image Name	Indicators Encoding	Secret Data Encoding	Indicators Decoding	Secret Data Decoding
Airplane	0.74	0.26	0.43	0.35
Baboon	1.59	0.81	0.75	0.58
Barbara	1.01	0.45	0.54	0.42
Boat	0.74	0.26	0.43	0.33
Couple	0.87	0.31	0.47	0.35
Lena	0.74	0.23	0.42	0.31
Peppers	0.72	0.23	0.42	0.31

Table 6. Correctness of extracted data under different densities of salt and pepper noise attack.

Noise Density	$\gamma_{ind}$	$\gamma_{data}$
0.01	99.56%	100.00%
0.02	99.07%	100.00%
0.03	98.62%	99.99%
0.04	97.94%	99.98%
0.05	97.49%	99.94%
0.06	96.75%	99.97%
0.07	96.26%	99.90%
0.08	95.89%	99.94%
0.09	95.08%	99.92%
0.1	94.83%	99.85%

Table 7. Error correction rates of the proposed scheme under different error rates.

Error Rate	$\gamma_{ind}$	$\gamma_{data}$
1%	99.08%	99.99%
3%	96.79%	99.97%
5%	94.80%	99.86%
10%	88.87%	99.34%

Table 8. Performance comparison of the proposed scheme with other methods.

Method	PSNR (dB)	QI	HC (bits)	Error Detection	Error Correction
Pradhan et al. [11]	39.78	0.9985	1,879,572	✗	✗
Jung [12]	35.27	0.9967	2,757,637	✗	✗
Pradhan et al. [13]	33.02	0.9947	2,794,301	✗	✗
Sonar and Swain [14]	35.15	0.9966	3,086,396	✗	✗
Swain and Pradhan [15]	39.74	0.9988	1,738,014	✓	✗
Kosuru et al. [16]	36.78	0.9977	2,359,296	✓	✓
Proposed	35.73	0.9942	2,551,826	✓	✓

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
