Determining mandatory nutritional parameters for Iberian meat products using a new method based on near infra-red reflectance spectroscopy and data mining

Original Scientific Paper

Daniel Caballero 1,2,3, Maria Asensio 1, Carlos Fernández 1, Noelia Martín 1 and Antonio Silva 1

1 Animal Source Foodstuffs Innovation Service (SiPA), University of Extremadura, Av. Ciencias S/N, ES-10003, Cáceres, Spain. 2 Media Engineering Group, School of Technology, University of Extremadura, Av. Ciencias S/N, ES-10003, Cáceres, Spain. 3 Chemometrics and Analytical Technology, Department of Food Science, Faculty of Science, University of Copenhagen, Rolighedsvej 26, DK-1958, Frederiksberg C, Denmark. Corresponding author: Daniel Caballero, dcaballero@unex.es

Abstract: The new regulation about mandatory labelling on nutrition requires the declaration of specific parameters: protein, lipid, salt and carbohydrate contents. This study reports a fast, accurate method to determine the values of these mandatory nutritional parameters based on near infra-red reflectance spectroscopy (NIRs) technology and data mining techniques, used in an automatic way. For that, two batches of different Iberian pork meat products (dry-cured ham, dry-cured loin, dry-cured shoulder, dry-fermented Salchichón sausage and dry-fermented Chorizo sausage) were used. One batch of each product was used to train the method and the remaining batch was used for validation. To develop the method, prediction equations were obtained from the NIRs and nutritional data of the training batch by applying data mining techniques, and these equations were then evaluated on the NIRs data from the validation batch. The prediction equations achieved very good to excellent degrees of relationship (R > 0.75) and accurate results (MAE < 1, RMSEC < 1, RMSEP < 1) for the training batch. These prediction equations were corroborated using the validation batch, which showed very good to excellent correlation coefficients (R > 0.75).

Introduction
In the European Union (EU), mandatory labelling of nutritional information requires specific parameters (protein, lipid, saturated lipid, salt, carbohydrate and sugar contents) to be declared (European Union, 2011). This regulation aims to protect consumers' health by informing them of the food's nutritional content. Thus, it moderates the provision of useful, understandable and uniform information to consumers, allowing them to make informed decisions and safe food choices (Benoit et al., 2016).
In this regulation, the detection limit for the main nutritional parameters is 0.1 %; if the value detected is below this limit, the amount can be declared as not detectable (European Union, 2011). In addition, this regulation requires allergens or substances causing intolerances to be declared, updating the guidance document under regulation 2000/13/EC (European Union, 2000), which remained in force until 2014.
Consumers rate meat and meat products derived from Iberian pigs highly because of the products' unique sensory traits, which are a consequence of both the raw materials' characteristics, especially lipid-related ones, and their particular processing conditions. Among these products, the most appreciated are the expensive, dry-cured Iberian hams. These are derived from Iberian pigs reared outdoors and fed on natural resources, mainly acorn and grass (Cava et al., 2000). The main nutritional parameters (protein, lipid, salt and carbohydrate contents) of meat products derived from Iberian pigs have been widely studied (Cruz and Vieira, 2017; Muriel et al., 2004; Utrilla et al., 2010; Ventanas, 2012).
Traditional physico-chemical analysis methods for nutritional information are tedious, time- and solvent-consuming and require the destruction of the samples. These analyses take around 6 days to produce results, delaying the response time. For this reason, alternative techniques such as computerized tomography (CT), magnetic resonance imaging (MRI) and near infra-red reflectance spectroscopy (NIRs) have been proposed for determining some nutritional parameters in these products. CT has been applied by some authors to characterize some nutritional parameters of meat products (Fulladosa et al., 2010; Picouet et al., 2013).
MRI was proposed to monitor the ripening process of hams (Caballero et al., 2016) and to determine some quality parameters of Iberian dry-cured loins (Ávila et al., 2018; Caballero et al., 2017a; Caballero et al., 2017b; Pérez-Palacios et al., 2017). Collell et al. (2011) applied NIRs to predict moisture, water activity and NaCl content at the surface of dry-cured ham during processing. The fatty acid composition of dry-cured ham subcutaneous fat was predicted by NIRs (Pérez-Juan et al., 2010). In lamb and rabbit meat products (Cifuni et al., 2016; Cozzolino et al., 2000), NIRs was applied to determine some nutritional parameters. Moreover, in other meat products, NIRs was applied to obtain some physico-chemical parameters (González-Mohino et al., 2018; Pérez-Palacios et al., 2019; Zamora-Rojas et al., 2011).
Data mining is an important part of a larger process known as Knowledge Discovery in Databases (KDD) (Fayyad et al., 1996). The main goal of data mining is to extract hidden information from a data set. This can be achieved by the automatic analysis of large amounts of data, which allows the extraction of interesting and previously unknown patterns (Sayad, 2011). The development of robust and efficient algorithms to process data and the increase in computing power have enabled the use of intensive computational methods for data analysis (Hastie et al., 2001). Interest in data mining has grown because of the rapidly decreasing cost of large storage devices and the increasing ease of data collection over networks (Mitchell, 1999).
There are some examples of data mining techniques applied to determine quality traits of different meats, like beef (Song et al., 2002) or lamb (Cortez et al., 2006). In the case of pork, several applications of data mining techniques have been published (Caballero et al., 2016; Caballero et al., 2017a; Caballero et al., 2017b; Caballero et al., 2019; Pérez-Palacios et al., 2014; Pérez-Palacios et al., 2017; Silva et al., 2013).
However, there is currently no fast, specific method for obtaining the nutritional information required by the EU regulation for Iberian pork products. Judprasong et al. (2013) studied the performance of laboratories in Thailand that reported nutritional analyses according to the ISO 13528 standard. The analytical method proposed in the current paper is faster than both the method proposed by Judprasong et al. (2013) and the official methods proposed by AOAC (2000). In fact, some meat companies have successfully applied this novel NIRs method to their dry-cured meat products and have confirmed that it takes less time and money to obtain the nutritional parameters.
Therefore, the main objective of this study was to report a fast and accurate method to determine the values of the main nutritional parameters mandatory on the labelling of Iberian dry-cured and dry-fermented meat products based on NIRs technology and data mining techniques used in an automatic way.

Experimental design
Two batches of five Iberian meat products (dry-fermented Chorizo sausage, dry-cured ham, dry-cured loin, dry-fermented Salchichón sausage and dry-cured shoulder) were used.
The first batch was composed of 45 samples of each Iberian meat product for training the new method, while the second batch was composed of 50 samples of each product used to validate the new method. All these Iberian meat products were acquired in different supermarkets to maximise the sample variability for each product.
First, the samples from the training batch were analysed in two ways: i) by physico-chemical analyses, to obtain the content of the main mandatory nutritional parameters, and ii) with a NIRs spectrometer (FOSS FOODSCAN lab, FOSS analytics, Hillerod, Denmark), to acquire NIRs spectra. All these data were gathered in a database, to which Multiple Linear Regression (MLR) was applied to obtain prediction equations for the mandatory nutritional parameters of each Iberian meat product as a function of the bandwidth values. Then, the samples from the validation batch were analysed in the same two ways. The prediction equations obtained from the training batch were applied to the NIRs data from the validation batch and, to validate the equations, the calculated values were compared with the physico-chemical data from the same validation samples. Figure 1 shows the experimental design followed in this study.

Physico-chemical analyses
The following physico-chemical analyses were performed in quintuplicate to determine the main nutritional parameters of each sample.
Lipid extraction of Iberian meat products was performed using the original extraction ratio of 20 parts of chloroform:methanol (2:1 v/v) to 1 part of sample. Briefly, 5 g of Iberian meat product sample were mixed with 100 ml of chloroform:methanol (2:1 v/v). The mixture was homogenised, centrifuged (10 min, 3000 rpm) and filtered. Subsequently, 5 ml of distilled water was added to the filtrate and the new mixture was shaken vigorously. The final biphasic system was separated by centrifugation (10 min, 3000 rpm). The upper aqueous phase was eliminated. The lower chloroform phase was filtered through anhydrous sodium sulphate and collected. The lipid content was then gravimetrically determined after chloroform was evaporated with a rotary evaporator under vacuum and the solvent was further evaporated under nitrogen (Pérez-Palacios et al., 2008).
Salt content was determined by the official method for meat and meat products (AOAC, 2000, ref. 971.19). It consists of mixing the sample with water and ethyl alcohol. After successive centrifugations, the final extract is obtained and further measured using volumetric analysis by precipitation.
Protein content was determined by the official method (AOAC, 2000, ref. 981.10). It consists of determining the nitrogen content by the Kjeldahl digestion method based on volumetric analysis, and then deriving the protein content by multiplying the nitrogen content by a factor of 6.25.
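The conversion from Kjeldahl nitrogen to protein can be illustrated with a minimal Python sketch. The function name and the example nitrogen value are hypothetical and are not taken from the study; only the factor of 6.25 comes from the text above.

```python
def protein_from_nitrogen(nitrogen_percent, factor=6.25):
    """Convert Kjeldahl nitrogen content (%) into crude protein content (%)."""
    if nitrogen_percent < 0:
        raise ValueError("nitrogen content cannot be negative")
    # Protein is estimated as nitrogen multiplied by the conversion factor.
    return nitrogen_percent * factor


# Hypothetical sample containing 4.0 % nitrogen.
protein = protein_from_nitrogen(4.0)
print(f"Estimated protein content: {protein:.2f} %")
```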

NIRs analyses
Approximately 20 g of each sample of Iberian meat product were minced using a commercial meat mincer (Moulinex A327R1, Moulinex, Alençon, France) and analysed with a NIRs spectrometer (FOSS FOODSCAN lab, FOSS analytics, Hillerod, Denmark) over a wavelength range from 850 nm to 1048 nm, taking 45 s for each spectrum. This spectrometer has a wavelength accuracy of 0.5 nm and a wavelength precision of 0.1 nm. For each sample, five spectra were acquired.
Next, spectral data were imported using WinIsi III (FOSS analytics, Hillerod, Denmark) to extract the numerical data from each NIRs spectrum. Then, noise was reduced with a multiplicative scatter correction (MSC) filter (Martens and Naes, 1989). In this correction, each measured spectrum is fitted against a reference spectrum and is then corrected using the parameters of this fit, notably its slope. Consequently, spectra that fit the reference poorly can be identified as outliers and removed.
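The MSC step can be sketched as follows. This is a minimal illustration and not the WinIsi implementation: it assumes the reference spectrum is the mean of all spectra (a common choice that the text does not specify), and the function name and example absorbance values are hypothetical.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra: list of equal-length lists of absorbance values.
    Returns (corrected_spectra, fits), where fits holds (intercept, slope)
    for each spectrum.
    """
    n_points = len(spectra[0])
    # Assumption: the reference spectrum is the point-wise mean spectrum.
    reference = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected, fits = [], []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit: s = intercept + slope * reference.
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        # Remove the additive and multiplicative scatter effects.
        corrected.append([(x - intercept) / slope for x in s])
        fits.append((intercept, slope))
    return corrected, fits


# Hypothetical spectra: the same shape with different offset and scale.
base = [0.10, 0.40, 0.90, 0.30]
spectra = [[0.05 + 1.2 * b for b in base], [-0.02 + 0.8 * b for b in base]]
corrected, fits = msc(spectra)
print(fits)
```

In this sketch, a spectrum whose fitted slope or intercept lies far from those of the other spectra would be a candidate outlier; the threshold for removal is left open because the text does not state one.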
Finally, values for 2 nm bandwidths were calculated from the NIRs spectra. Therefore, for each NIRs spectrum, two hundred values were calculated, extracted and gathered into the database.

Data mining analyses
Predictive data mining techniques were applied to the database constructed with results from the physico-chemical and NIRs analyses. Future models can be predicted from current data using trend analysis (Wu et al., 2008). Therefore, for each main nutritional parameter of each Iberian meat product, predictive equations were obtained so that the parameter could be calculated as a function of NIRs data. The free software WEKA (Waikato Environment for Knowledge Analysis v. 3.8.1, University of Waikato, Hamilton, New Zealand) was used to perform the predictive analyses. MLR models the linear relationship between a target variable and multiple independent variables. It produces a linear regression equation that can be used to predict future values (Hastie et al., 2001). For the selection of attributes, the M5 method was applied. This method steps through the attributes, removing the one with the smallest standardized coefficient until no further improvement is observed in the error estimate (Kira and Rendell, 1992). A ridge value of 1 × 10⁻⁴ was applied. The estimation procedure was a 10-fold cross validation (Dietterich, 1998), where the data were divided into 10 subsets of equal size. One subset was tested each time and the remaining data were used for fitting the model. The process was repeated sequentially until all subsets were tested, so all data subsets were used for both training and testing. Although this method requires 10 analyses (i.e. one for each held-out subset), it is a robust method (Grossman et al., 2010).
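The regression and cross-validation procedure can be sketched in plain Python. This is an illustrative simplification, not WEKA's implementation: it omits the M5 attribute selection, leaves the intercept unpenalised (an assumption), and uses synthetic data. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(X, y, ridge=1e-4):
    """Return [intercept, b1, ..., bp] from the ridge-regularised normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for i in range(1, p):  # assumption: the intercept is not penalised
        xtx[i][i] += ridge
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def cross_val_predict(X, y, k=10, ridge=1e-4, seed=0):
    """Predict each sample with a model fitted on the other k-1 folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of (nearly) equal size
    preds = [None] * len(y)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        coef = fit_ridge([X[i] for i in train], [y[i] for i in train], ridge)
        for i in fold:
            preds[i] = predict(coef, X[i])
    return preds


# Synthetic example: 45 samples, two band values, exact linear response.
rng = random.Random(1)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(45)]
y = [2.0 + 3.0 * a - 1.0 * b for a, b in X]
preds = cross_val_predict(X, y)
print(max(abs(p - t) for p, t in zip(preds, y)))
```

Every sample is predicted exactly once by a model that never saw it, which is what makes the cross-validated error a fairer estimate than the fitting error.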
The correlation coefficient (R; equation 2) was used for evaluating the goodness of the prediction and for its validation. According to the Colton rules (Colton, 1974), R from 0 to 0.25 is considered as little to no association, from 0.25 to 0.50 indicates a weak degree of relationship, from 0.50 to 0.75 indicates a moderate to good degree of relationship and from 0.75 to 1 indicates a very good to excellent degree of relationship.

$$R = \sqrt{1 - \frac{\sum_{i} (y_i - f_i)^2}{\sum_{i} (y_i - \mathrm{mean}(y))^2}} \tag{2}$$

where $f_i$ is the predicted value, $y_i$ is the real value and mean(y) is the average value.
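As a minimal sketch, R and its Colton category could be computed as below. Two assumptions are made here: R is taken as the square root of one minus the ratio of residual to total sum of squares, which is the form suggested by the symbols defined above, and each shared boundary value (0.25, 0.50, 0.75) is assigned to the higher category. The function names are hypothetical.

```python
from math import sqrt


def correlation_r(real, predicted):
    """R computed from real values y_i, predictions f_i and mean(y) (assumed form)."""
    y_mean = sum(real) / len(real)
    ss_res = sum((y - f) ** 2 for y, f in zip(real, predicted))
    ss_tot = sum((y - y_mean) ** 2 for y in real)
    # Clamp at zero so that very poor predictions do not give a negative radicand.
    return sqrt(max(0.0, 1.0 - ss_res / ss_tot))


def colton_category(r):
    """Map R to the degree of relationship described by the Colton rules."""
    r = abs(r)
    if r < 0.25:
        return "little to no association"
    if r < 0.50:
        return "weak"
    if r < 0.75:
        return "moderate to good"
    return "very good to excellent"


# Hypothetical lipid contents (%): reference values and predictions.
real = [20.1, 22.4, 25.0, 27.3, 30.2]
predicted = [20.5, 22.0, 25.4, 27.0, 29.8]
r = correlation_r(real, predicted)
print(round(r, 3), colton_category(r))
```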
Moreover, the mean absolute error (MAE; equation 3), root mean square error of prediction (RMSEP; equation 4) and root mean square error of calibration (RMSEC; equation 5) (Hyndman, 2006) were also used to validate the prediction results. The MAE measures the difference between real values and predicted ones. Values of MAE lower than 2 are appropriate (Hartemink and Minasny, 2016).

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| f_i - y_i \right| \tag{3}$$

where $f_i$ is the predicted value, $y_i$ is the real value and $n$ is the number of samples.
The RMSEP measures the relative difference between real values and predicted ones. This measure is commonly used to assess the predictive ability of the models, since it is a constant measure for prediction. Values of RMSEP lower than 5 are appropriate (Hartemink and Minasny, 2016).

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (f_i - y_i)^2} \tag{4}$$

where $f_i$ is the predicted value, $y_i$ is the real value and $n$ is the number of samples.
The RMSEC measures the goodness of fit between real data and the data from the calibration model. Depending on the type of data and model, this fit can be strongly optimistic because of over-fitting, compared with the error obtained when the calibration is applied to new samples (Austin and Tu, 2004). Values of RMSEC lower than 5 are appropriate (Hartemink and Minasny, 2016).

$$\mathrm{RMSEC} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (f_i - y_i)^2} \tag{5}$$

where $f_i$ is the real value, $y_i$ is the value obtained by the calibration model and $n$ is the number of calibration samples.
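The error measures can be sketched in a few lines of Python, assuming the usual definitions of the mean absolute error and the root mean square error. Under that assumption RMSEC and RMSEP share one formula and differ only in the samples used: calibration samples for RMSEC, samples not used to fit the model for RMSEP. The function names and values are hypothetical.

```python
from math import sqrt


def mae(real, predicted):
    """Mean absolute error between real and predicted values."""
    return sum(abs(f - y) for y, f in zip(real, predicted)) / len(real)


def rmse(real, predicted):
    """Root mean square error; RMSEC or RMSEP depending on the sample set."""
    return sqrt(sum((f - y) ** 2 for y, f in zip(real, predicted)) / len(real))


# Hypothetical protein contents (%) for three validation samples.
real = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 12.0]
print(mae(real, predicted), rmse(real, predicted))
```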

Comparison of analytical methods
The compatibility index (En) (Beilby, 1972) was applied to evaluate the compatibility of the official methods of analysis and the new NIRs method. This index was calculated by the following equation (equation 6):

$$E_n = \frac{\left| X_1 - X_2 \right|}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}} \tag{6}$$

where $X_1$ is the average value according to the official method and $X_2$ is the average value according to the new proposed method, $S_1$ and $S_2$ are the standard deviations for the official method and the new method, respectively, and $N_1$ and $N_2$ are the numbers of samples for the official method and the new method, respectively. Values of En lower than 2 indicate the methods are compatible (Golnick et al., 2016).
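A minimal sketch of the compatibility index is given below. It assumes that En takes the usual two-sample form, the absolute difference of the means divided by the square root of the sum of the squared standard deviations over the sample sizes, and that sample standard deviations are used. Both choices are assumptions, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def compatibility_index(official, proposed):
    """En between two sets of results (assumed two-sample form)."""
    x1, x2 = mean(official), mean(proposed)
    s1, s2 = stdev(official), stdev(proposed)  # sample standard deviations
    n1, n2 = len(official), len(proposed)
    return abs(x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Hypothetical salt contents (%) obtained by the two methods.
official = [3.9, 4.1, 4.0, 4.2, 3.8]
proposed = [4.0, 4.2, 3.9, 4.1, 4.0]
en = compatibility_index(official, proposed)
print(round(en, 3), "compatible" if en < 2 else "not compatible")
```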

Statistical analyses
Groups of values of the main nutritional parameters calculated by the official and proposed methods from training and validation batches were compared using one-way analysis of variance (ANOVA) using the general linear model procedure. In addition, the effect between batches was compared by using ANOVA. Statistical analyses were performed using the SPSS package software (IBM SPSS v. 20.0, IBM Co, New York, New York, USA, 2011).
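The F statistic behind a one-way ANOVA can be sketched as follows. This is only an illustration of the comparison between groups, not the SPSS general linear model procedure, and it stops short of the p-value, which requires the F distribution. The function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation between group means and variation within the groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical lipid contents (%) from the official and the NIRs method.
official = [30.1, 31.4, 29.8, 30.6]
nirs = [30.3, 31.0, 30.0, 30.9]
print(round(one_way_anova_f([official, nirs]), 4))
```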

Results from training batch
The main nutritional parameters of the Iberian meat products were obtained by applying the traditional physico-chemical analyses on the training batch. Table 1 shows the results from these analyses.
MLR was applied to the NIRs data from the training batch of Iberian meat products to obtain prediction equations for the main nutritional parameters (protein, lipid, salt and carbohydrate contents) of these meat products. Table 1 shows the predicted values based on the NIRs method and the correlation coefficient, p-value, RMSEP, RMSEC and MAE of the prediction equations for the main nutritional parameters of each Iberian meat product.
Table 1. Mean ± standard error value, p-value, correlation coefficient (R), root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP) and mean absolute error (MAE) between values obtained using the official and new NIRs methods for the main mandatory items on nutritional labels (lipid content (%), protein content (%), salt content (%) and carbohydrate content (%)), for the training batch of different Iberian dry-cured and dry-fermented meat products (chorizo, ham, loin, Salchichón and shoulder).

Note that the correlation coefficients for the carbohydrate contents of dry-cured ham, dry-cured loin and dry-cured shoulder were not calculated because the carbohydrate values in these Iberian dry-cured meat products were below the detection limit (European Union, 2012; Ventanas, 2012). Therefore, for the new NIRs method proposed in this study, the carbohydrate values for these meat products are shown as lower than 0.100 % (not detectable) in all samples of dry-cured ham, dry-cured loin and dry-cured shoulder. For the remaining cases, prediction by means of NIRs data and values from nutritional analyses achieved very good to excellent degrees of relationship according to the rules given by Colton (1974) (R > 0.75). For all nutritional parameters, R exceeded 0.97 for the dry-cured ham and 0.95 for the dry-cured loin. For MAE, the values obtained for all prediction equations were lower than 1, which is an appropriate value for MAE (Hyndman, 2006). For RMSEP and RMSEC, the values obtained for all prediction equations were lower than 1.50, which are appropriate values for RMSEP and RMSEC (Hartemink and Minasny, 2016). The most accurate results were obtained for the dry-cured ham (MAE < 0.15, RMSEC < 0.15, RMSEP < 0.30). Moreover, no significant differences were found between values from official methods and the new method based on prediction equations.

These results indicate that the main nutritional parameters calculated by the proposed model based on NIRs and data mining are comparable with the results obtained by the official methods of analysis. Thus, the NIRs method could be proposed as an alternative method to estimate the values of these nutritional parameters with similar accuracy to the official methods. Previous studies obtained similar results (R > 0.75) for the nutritional parameters salt and lipid contents (Collell et al., 2011; González-Mohino et al., 2018; Pérez-Juan et al., 2010; Pérez-Palacios et al., 2019; Zamora-Rojas et al., 2011). MLR was evaluated to predict some quality parameters of Iberian dry-cured meat products, namely dry-cured ham and dry-cured loin, in previous studies (Caballero et al., 2016; Caballero et al., 2017a; Caballero et al., 2017b; Pérez-Palacios et al., 2014; Pérez-Palacios et al., 2017).

Table 2. Mean ± standard error value, p-value, correlation coefficient (R), root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP) and mean absolute error (MAE) between values obtained using the official and new NIRs methods for the main mandatory items on nutritional labels (lipid content (%), protein content (%), salt content (%) and carbohydrate content (%)), for the validation batch of different Iberian dry-cured and dry-fermented meat products (chorizo, ham, loin, Salchichón and shoulder).

The ability of this proposed method to calculate the main nutritional parameters of different Iberian dry-cured meat products was next evaluated using the validation batch.

Results from validation batch
To validate the models, the prediction equations obtained from the training batch were applied to the NIRs data from the validation batch of the different Iberian meat products. Table 2 shows the values from the official methods of analysis and from the new method based on NIRs, together with the correlation coefficients, RMSEC, RMSEP and MAE between the two sets of results, for the main nutritional parameters (lipid, protein, salt and carbohydrate contents) of the studied Iberian meat products. As previously stated, the values of carbohydrates for some dry-cured products (ham, loin and shoulder) were labelled as not detectable according to the current regulation (European Union, 2012).
The values obtained from the official methods and from the new method of analysis (Table 2) were similar to the results obtained from the training batch (Table 1), and they are in agreement with the results obtained in previous studies (Caballero et al., 2016; Cruz and Vieira, 2017; Lorenzo et al., 2000; Muriel et al., 2004; Utrilla et al., 2010; Ventanas, 2012).
The values obtained for the NIRs method and the official methods for all studied cases achieved very good to excellent correlation coefficients according to the rules given by Colton (1974) (R > 0.75). The strongest degrees of relationships were achieved for dry-cured loin (R > 0.90) and dry-cured shoulder (R > 0.85). Regarding the nutritional parameters, the highest correlation coefficients were found for the lipid and salt contents (R > 0.80) from all Iberian meat products. In relation to the MAE, in all cases, the values obtained were lower than 1.50, which is a good value for MAE (Hyndman, 2006), and for RMSEP and RMSEC, all values were lower than 2, which is a very good value for RMSEP and RMSEC (Hartemink and Minasny, 2016). However, the most accurate results were obtained for the Salchichón dry-fermented sausage (MAE < 1, RMSEC < 1 and RMSEP < 1.10).
All values obtained for the compatibility index (Table 3) were lower than 2 (En < 2), indicating that the analytical methods used in this study were compatible (Golnick et al., 2016). For some products, the compatibility indices were lower than 1: dry-cured ham (En < 0.75), dry-fermented Chorizo sausage (En < 0.85) and dry-cured loin (En < 0.95). These results support the compatibility of the newly developed method based on NIRs and data mining with the official methods of analysis.

Table 3. Compatibility index (En) results comparing the official methods and the new NIRs method of analysis for the main parameters on mandatory nutritional labels, for the validation batch of different Iberian dry-fermented and dry-cured meat products.

The batch effect on the main mandatory nutritional parameters of the studied Iberian meat products (dry-fermented Chorizo sausage, dry-cured ham, dry-cured loin, dry-fermented Salchichón sausage and dry-cured shoulder) was also studied. No significant differences were found between training and validation batches for the main mandatory nutritional parameters. These results indicate that the assignment of samples to batches had no influence on the results obtained and, therefore, that the results are batch-independent. In a previous study, the batch effect acted like a random effect (Herbert et al., 1974), but in our case, the batch did not influence the results.

Average values for the main mandatory nutritional parameters determined by official methods and our new method based on NIRs are shown in Figure 2. The accuracy of the new method based on NIRs was corroborated for lipid (Figure 2A), protein (Figure 2B), salt (Figure 2C) and carbohydrate (Figure 2D) contents. No significant differences were found between values from official methods and the new method based on NIRs. These results are in accordance with the results shown in Table 3.

Conclusion
A new, fast and accurate analytical method was studied, which should be suitable for analysing the content of the main nutritional parameters required for mandatory labelling according to the new EU regulation. The new method is based on prediction equations obtained from the combination of NIRs spectra and data mining techniques. For other methods evaluated under ISO 13528, the lead time to produce results is around 5 or 6 days, with similar accuracy to the official methods (although these analyses were conducted on rice, not on meat and meat products). In contrast, the new NIRs method proposed in this study is faster and more accurate than the previous and traditional methods for the required nutritional analyses. Additionally, only a small number of samples is required before the results obtained are accurate. The new NIRs method produces the main nutritional parameters for each dry-cured meat product in around 10 minutes per sample, using a NIRs spectrometer and a conventional laptop in an automated way. This method could be of interest to inspection agencies for evaluating the nutritional labelling of Iberian meat products in a timely manner.
Disclosure statement: No potential conflict of interest was reported by authors.
Acknowledgment: Daniel Caballero thanks the "Junta de Extremadura" for the post-doctoral grant (PO17017).