Journal Description

Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability. The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
  • Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
  • High Visibility: indexed within Scopus, ESCI (Web of Science), Ei Compendex, dblp, Inspec, RePEc, and other databases.
  • Journal Rank: JCR - Q2 (Multidisciplinary Sciences) / CiteScore - Q2 (Information Systems and Management)
  • Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 26.8 days after submission; acceptance to publication is undertaken in 3.6 days (median values for papers published in this journal in the second half of 2024).
  • Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 2.2 (2023); 5-Year Impact Factor: 2.4 (2023)

Latest Articles

15 pages, 1302 KiB  
Data Descriptor
Experimental Parametric Forecast of Solar Energy over Time: Sample Data Descriptor
by Fernando Venâncio Mucomole, Carlos Augusto Santos Silva and Lourenço Lázaro Magaia
Data 2025, 10(3), 37; https://doi.org/10.3390/data10030037 - 17 Mar 2025
Abstract
Variations in solar energy when it reaches the Earth impact the production of photovoltaic (PV) solar plants and, in turn, the dynamics of clean energy expansion. This incentivizes the objective of experimentally forecasting solar energy by parametric models, the results of which are then refined by machine learning methods (MLMs). To estimate solar energy, parametric models consider all atmospheric, climatic, geographic, and spatiotemporal factors that influence decreases in solar energy. In this study, data on ozone, evenly mixed gases, water vapor, aerosols, and solar radiation were gathered throughout the year in the mid-north area of Mozambique. The results show that the calculated solar energy was close to the theoretical solar energy under a clear sky. When paired with MLMs, the clear-sky index had a correlational order of 0.98, with most full-sun days having intermediate and clear-sky types. This suggests the potential of this area for PV use, with high correlation and regression coefficients in the range of 0.86 and 0.89 and a measurement error in the range of 0.25. We conclude that evenly mixed gases and the ozone layer have considerable influence on transmittance. However, the parametrically forecasted solar energy is close to the energy forecasted by the theoretical model. By adjusting the local characteristics, the model can be used in diverse contexts to increase PV plants’ electrical power output efficiency.
(This article belongs to the Topic Smart Energy Systems, 2nd Edition)

28 pages, 68080 KiB  
Article
KRID: A Large-Scale Nationwide Korean Road Infrastructure Dataset for Comprehensive Road Facility Recognition
by Hyeongbok Kim, Eunbi Kim, Sanghoon Ahn, Beomjin Kim, Sung Jin Kim, Tae Kyung Sung, Lingling Zhao, Xiaohong Su and Gilmu Dong
Data 2025, 10(3), 36; https://doi.org/10.3390/data10030036 (registering DOI) - 14 Mar 2025
Abstract
Comprehensive datasets are crucial for developing advanced AI solutions in road infrastructure, yet most existing resources focus narrowly on vehicles or a limited set of object categories. To address this gap, we introduce the Korean Road Infrastructure Dataset (KRID), a large-scale dataset designed for real-world road maintenance and safety applications. Our dataset covers highways, national roads, and local roads in both city and non-city areas, comprising 34 distinct types of road infrastructure, from common elements (e.g., traffic signals, gaze-directed poles) to specialized structures (e.g., tunnels, guardrails). Each instance is annotated with either bounding boxes or polygon segmentation masks under stringent quality control and privacy protocols. To demonstrate the utility of this resource, we conducted object detection and segmentation experiments using YOLO-based models, focusing on guardrail damage detection and traffic sign recognition. Preliminary results confirm its suitability for complex, safety-critical scenarios in intelligent transportation systems. Our main contributions include: (1) a broader range of infrastructure classes than conventional “driving perception” datasets, (2) high-resolution, privacy-compliant annotations across diverse road conditions, and (3) open-access availability through AI Hub and GitHub. By highlighting critical yet often overlooked infrastructure elements, this dataset paves the way for AI-driven maintenance workflows, hazard detection, and further innovations in road safety.

14 pages, 3207 KiB  
Data Descriptor
A Comprehensive Indoor Environment Dataset from Single-Family Houses in the US
by Sheik Murad Hassan Anik, Xinghua Gao and Na Meng
Data 2025, 10(3), 35; https://doi.org/10.3390/data10030035 - 5 Mar 2025
Abstract
The paper describes a dataset comprising indoor environmental factors such as temperature, humidity, air quality, and noise levels. The data were collected from 10 sensing devices installed in various locations within three single-family houses in Virginia, USA. The objective of the data collection was to study the indoor environmental conditions of the houses over time. The data were collected at a frequency of one record per minute for a year, combining to a total of over 2.5 million records. The paper provides actual floor plans with sensor placements to aid researchers and practitioners in creating reliable building performance models. The techniques used to collect and verify the data are also explained in the paper. The resulting dataset can be employed to enhance models for building energy consumption, occupant behavior, predictive maintenance, and other relevant purposes.

7 pages, 407 KiB  
Data Descriptor
Draft Genome Sequence Data of the Ensifer sp. P24N7, a Symbiotic Bacteria Isolated from Nodules of Phaseolus vulgaris Grown in Mining Tailings from Huautla, Morelos, Mexico
by José Augusto Ramírez-Trujillo, Maria Guadalupe Castillo-Texta, Mario Ramírez-Yáñez and Ramón Suárez-Rodríguez
Data 2025, 10(3), 34; https://doi.org/10.3390/data10030034 - 27 Feb 2025
Abstract
In this work, we report the draft genome sequence of Ensifer sp. P24N7, a symbiotic nitrogen-fixing bacterium isolated from nodules of Phaseolus vulgaris var. Negro Jamapa planted in pots that contained mining tailings from Huautla, Morelos, México. The genomic DNA was sequenced by an Illumina NovaSeq 6000 using the 250 bp paired-end protocol, obtaining 1,188,899 reads. An assembly generated with SPAdes v. 3.15.4 resulted in a genome length of 7,165,722 bp composed of 181 contigs with an N50 of 323,467 bp, a coverage of 76X, and a GC content of 61.96%. The genome was annotated with the NCBI Prokaryotic Genome Annotation Pipeline and contains 6631 protein-coding sequences, 3 complete rRNAs, 52 tRNAs, and 4 non-coding RNAs. The Ensifer sp. P24N7 genome has 59 genes related to heavy metal tolerance predicted by the RAST server. These data may be useful to the scientific community because they can be used as a reference for other works related to heavy metals, including works in Huautla, Morelos.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)

15 pages, 838 KiB  
Article
Data Quality Tools to Enhance a Network Anomaly Detection Benchmark
by José Camacho and Rafael A. Rodríguez-Gómez
Data 2025, 10(3), 33; https://doi.org/10.3390/data10030033 - 25 Feb 2025
Abstract
Network traffic datasets are essential for the construction of traffic models, often using machine learning (ML) techniques. Among other applications, these models can be employed to solve complex optimization problems or to identify anomalous behaviors, i.e., behaviors that deviate from the established model. However, the performance of the ML model depends, among other factors, on the quality of the data used to train it. Benchmark datasets, with a profound impact on research findings, are often assumed to be of good quality by default. In this paper, we derive four variants of a benchmark dataset in network anomaly detection (UGR’16, a flow-based real-world traffic dataset designed for anomaly detection), and show that the choice among variants has a larger impact on model performance than the ML technique used to build the model. To analyze this phenomenon, we propose a methodology to investigate the causes of these differences and to assess the quality of the data labeling. Our results underline the importance of paying more attention to data quality assessment in network anomaly detection.

7 pages, 1353 KiB  
Data Descriptor
Spatial Dataset of Climate Robust and High-Yield Agricultural Areas in Brandenburg: Results of a Classification Framework Using Bio-Economic Climate Simulations
by Hannah Jona von Czettritz, Sandra Uthes, Johannes Schuler, Kurt-Christian Kersebaum and Peter Zander
Data 2025, 10(3), 32; https://doi.org/10.3390/data10030032 - 25 Feb 2025
Abstract
Coherent spatial data are crucial for informed land use and regional planning decisions, particularly in the context of securing a crisis-proof food supply and adapting to climate change. This dataset provides spatial information on climate-robust and high-yield agricultural arable land in Brandenburg, Germany, based on the results of a classification using bio-economic climate simulations. The dataset is intended to support regional planning and policy makers in zoning decisions (e.g., photovoltaic power plants) by identifying climate-robust arable land with high current and stable future production potential that should be reserved for agricultural use. The classification method used to generate the dataset includes a wide range of indicators, including established approaches, such as a soil quality index, drought, water, and wind erosion risk, as well as a dynamic approach, using bio-economic simulations, which determine the production potential under future climate scenarios. The dataset is a valuable resource for spatial planning and climate change adaptation, contributing to long-term food security especially in dry areas such as the state of Brandenburg facing increased production risk under future climatic conditions, thereby serving globally as an example for land use planning challenges related to climate change.
(This article belongs to the Section Spatial Data Science and Digital Earth)

16 pages, 7115 KiB  
Article
Using Weather Data for Improved Analysis of Vehicle Energy Efficiency
by Reno Filla
Data 2025, 10(3), 31; https://doi.org/10.3390/data10030031 - 24 Feb 2025
Abstract
In moving vehicles, the dominating energy losses are due to interactions with the environment: air resistance and rolling resistance. It is known that weather has a significant impact, yet there is a lack of literature showing how the wealth of openly available data from professional weather observations can be used in this context. This article will give an overview of how such data are structured and how they can be accessed in order to augment logs gained during vehicle operation or simulated trips. Two efficient algorithms for such data extraction and augmentation are discussed and several examples for use are provided, also demonstrating that some caveats do exist with respect to the source of weather data.
(This article belongs to the Section Spatial Data Science and Digital Earth)

9 pages, 752 KiB  
Data Descriptor
Open Georeferenced Field Data on Forest Types and Species for Biodiversity Assessment and Remote Sensing Applications
by Patrizia Gasparini, Lucio Di Cosmo, Antonio Floris, Federica Murgia and Maria Rizzo
Data 2025, 10(3), 30; https://doi.org/10.3390/data10030030 - 21 Feb 2025
Abstract
Forest ecosystems are important for biodiversity conservation, climate regulation and climate change mitigation, soil and water protection, and the recreation and provision of raw materials. This paper presents a dataset on forest type and tree species composition for 934 georeferenced plots located in Italy. The forest type is classified in the field consistently with the Italian National Forest Inventory (NFI) based on the dominant tree species or species group. Tree species composition is provided by the percent crown cover of the main five species in the plot. Additional data on conifer and broadleaves pure/mixed condition, total tree and shrub cover, forest structure, silvicultural system, development stage, and local land position are provided. The surveyed plots are distributed in the central–eastern Alps, in the central Apennines, and in the southern Apennines; they represent a wide range of species composition, ecological conditions, and silvicultural practices. Data were collected as part of a project aimed at developing a classification algorithm based on hyperspectral data. The dataset was made publicly available as it refers to forest types and species widespread in many countries of Central and Southern Europe and is potentially useful to other researchers for the study of forest biodiversity or for remote sensing applications.

19 pages, 251 KiB  
Data Descriptor
HOSPI Application to Portuguese Hospitals’ Websites
by Delfina Soares, Joana Carvalho and Dimitrios Sarantis
Data 2025, 10(3), 29; https://doi.org/10.3390/data10030029 - 21 Feb 2025
Abstract
The Health Online Service Provision Index (HOSPI) is an instrument to assess and monitor hospitals’ websites. The index comprises four criteria (Content, Services, Community Interaction and Technology Features), each with a subset of indicators and sub-indicators. HOSPI was applied to the Portuguese hospitals’ websites in 2023, originating the dataset described in this article. The article also provides a detailed account of the data collection process, which involved direct observation of the websites and specific treatment methods, ensuring the reliability and validity of the dataset. It underscores the relevance of having this data available and how it can improve service provision online in health facilities and support policymaking.

18 pages, 639 KiB  
Article
A Directory of Datasets for Mining Software Repositories
by Themistoklis Diamantopoulos and Andreas L. Symeonidis
Data 2025, 10(3), 28; https://doi.org/10.3390/data10030028 - 20 Feb 2025
Abstract
The amount of software engineering data is constantly growing, as more and more developers employ online services to store their code, keep track of bugs, or even discuss issues. The data residing in these services can be mined to address different research challenges; therefore, certain initiatives have been established to encourage sharing research datasets that collect them. In this work, we investigate the effect of such an initiative; we create a directory that includes the papers and the corresponding datasets of the data track of the Mining Software Repositories (MSR) conference. Specifically, our directory includes metadata and citation information for the papers of all data tracks, throughout the last twelve years. We also annotate the datasets according to the data source and further assess their compliance with the FAIR principles. Using our directory, researchers can find useful datasets for their research, or even design methodologies for assessing their quality, especially in the software engineering domain. Moreover, the directory can be used for analyzing the citations of data papers, especially with regard to different data categories, as well as for examining their FAIRness score throughout the years, along with its effect on the usage/citation of the datasets.
(This article belongs to the Section Information Systems and Data Management)

29 pages, 4066 KiB  
Article
SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning
by Muhammad Adnan Aslam, Fiza Murtaza, Muhammad Ehatisham Ul Haq, Amanullah Yasin and Numan Ali
Data 2025, 10(3), 27; https://doi.org/10.3390/data10030027 - 20 Feb 2025
Abstract
Education is crucial for leading a productive life and obtaining necessary resources. Higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods as a result of innovations in technology. As a high academic record raises a university’s ranking and increases student career chances, predicting learning success has been a central focus in education. Both performance analysis and providing high-quality instruction are challenges faced by modern schools. Maintaining high academic standards, juggling life and academics, and adjusting to technology are problems that students must overcome. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance, encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After meticulous preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequencies, and transportation modes. To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding our dataset from 38 to 88 attributes. Feature scaling was performed to standardize the range and distribution of data, using a normalization technique. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance. The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset’s potential for insightful academic performance prediction. In terms of model performance, Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% for the eight-class classification task. For the three-class classification, Random Forest outperformed other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies.

19 pages, 477 KiB  
Article
Consistency and Stability in Feature Selection for High-Dimensional Microarray Survival Data in Diffuse Large B-Cell Lymphoma Cancer
by Kazeem A. Dauda and Rasheed K. Lamidi
Data 2025, 10(2), 26; https://doi.org/10.3390/data10020026 - 18 Feb 2025
Abstract
High-dimensional survival data, such as microarray datasets, present significant challenges in variable selection and model performance due to their complexity and dimensionality. Identifying important genes and understanding how these genes influence the survival of patients with cancer are of great interest and a major challenge to biomedical scientists, healthcare practitioners, and oncologists. Therefore, this study combined the strengths of two complementary feature selection methodologies: a filtering (correlation-based) approach and a wrapper method based on Iterative Bayesian Model Averaging (IBMA). This new approach, termed Correlation-Based IBMA, offers a highly efficient and effective means of selecting the most important and influential genes for predicting the survival of patients with cancer. The efficiency and consistency of the method were demonstrated using diffuse large B-cell lymphoma cancer data. The results revealed that the 15 most important genes out of 3835 gene features were consistently selected at a threshold p-value of 0.001, with genes with posterior probabilities below 1% being removed. The influence of these 15 genes on patient survival was assessed using the Cox Proportional Hazards (Cox-PH) Model. The results further revealed that eight genes were highly associated with patient survival at a 0.05 level of significance. Finally, these findings underscore the importance of integrating feature selection with robust modeling approaches to enhance accuracy and interpretability in high-dimensional survival data analysis.

22 pages, 6282 KiB  
Article
CropsDisNet: An AI-Based Platform for Disease Detection and Advancing On-Farm Privacy Solutions
by Mohammad Badhruddouza Khan, Salwa Tamkin, Jinat Ara, Mobashwer Alam and Hanif Bhuiyan
Data 2025, 10(2), 25; https://doi.org/10.3390/data10020025 - 18 Feb 2025
Abstract
Crop failure is defined as crop production that is significantly lower than anticipated, resulting from plants that are harmed, diseased, destroyed, or influenced by climatic circumstances. With the rise in global food security concern, the earliest detection of crop diseases has proven to be pivotal in agriculture industries to address the needs of the global food crisis and on-farm data protection, which can be met with a privacy-preserving deep learning model. However, deep learning seems to be a largely complex black box to interpret, necessitating a prerequisite for the groundwork of the model’s interpretability. Considering this, the aim of this study was to follow up on the establishment of a robust deep learning custom model named CropsDisNet, evaluated on a large-scale dataset named “New Bangladeshi Crop Disease Dataset (corn, potato and wheat)”, which contains a total of 8946 images. The integration of a differential privacy algorithm into our CropsDisNet model could establish the benefits of automated crop disease classification without compromising on-farm data privacy by reducing training data leakage. To classify corn, potato, and wheat leaf diseases, we used three representative CNN models for image classification (VGG16, Inception Resnet V2, Inception V3) along with our custom model, and the classification accuracy for these three different crops varied from 92.09% to 98.29%. In addition, demonstration of the model’s interpretability gave us insight into our model’s decision making and classification results, which can allow farmers to understand and take appropriate precautions in the event of early widespread harvest failure and food crises.

22 pages, 3785 KiB  
Article
Visual Footprint of Separation Through Membrane Distillation on YouTube
by Ersin Aytaç and Mohamed Khayet
Data 2025, 10(2), 24; https://doi.org/10.3390/data10020024 - 8 Feb 2025
Abstract
Social media has revolutionized the dissemination of information, enabling the rapid and widespread sharing of news, concepts, technologies, and ideas. YouTube is one of the most important online video sharing platforms of our time. In this research, we investigate the trace of separation through membrane distillation (MD) on YouTube using statistical methods and natural language processing. The dataset collected on 04.01.2024 included 212 videos with key characteristics such as durations, views, subscribers, number of comments, likes, etc. The results show that the number of videos is not sufficient, but there is an increasing trend, especially since 2019. The high number of channels offering information about MD technology in countries such as the USA, India, and Canada indicates that these countries recognized the practical benefits of this technology, especially in areas such as water treatment, desalination, and industrial applications. This suggests that MD could play a pivotal role in finding solutions to global water challenges. Word cloud analysis showed that terms such as “water”, “treatment”, “desalination”, and “separation” were prominent, indicating that the videos focused mainly on the principles and applications of MD. The sentiment of the comments is mostly positive, and the dominant emotion is neutral, revealing that viewers generally have a positive attitude towards MD. The narrative intensity metric evaluates the information transfer efficiency of the videos and provides a guide for effective content creation strategies. The results of the analyses revealed that social media awareness about MD technology is still not sufficient and that content development and sharing strategies should focus on bringing the technology to a wider audience.

17 pages, 662 KiB  
Article
A Bayesian State-Space Approach to Dynamic Hierarchical Logistic Regression for Evolving Student Risk in Educational Analytics
by Moeketsi Mosia
Data 2025, 10(2), 23; https://doi.org/10.3390/data10020023 - 7 Feb 2025
Abstract
Early detection of academically at-risk students is crucial for designing timely interventions that improve educational outcomes. However, many existing approaches either ignore the temporal evolution of student performance or rely on “black box” models that sacrifice interpretability. In this study, we develop a dynamic hierarchical logistic regression model in a fully Bayesian framework to address these shortcomings. Our method leverages partial pooling across students and employs a state-space formulation, allowing each student’s log-odds of failure to evolve over multiple assessments. By using Markov chain Monte Carlo for inference, we obtain robust posterior estimates and credible intervals for both population-level and individual-specific effects, while posterior predictive checks ensure model adequacy and calibration. Results from simulated and real-world datasets indicate that the proposed approach more accurately tracks fluctuations in student risk compared to static logistic regression, and it yields interpretable insights into how engagement patterns and demographic factors influence failure probability. We conclude that a Bayesian dynamic hierarchical model not only enhances prediction of at-risk students but also provides actionable feedback for instructors and administrators seeking evidence-based interventions.

14 pages, 770 KiB  
Article
Stress Factors in Higher Education: A Data Analysis Case
byRodolfo Bojorque,Fernando Moscoso,Fernando Pesántez andÁngela Flores
Data2025,10(2), 22;https://doi.org/10.3390/data10020022 - 7 Feb 2025
Abstract
This study investigates stressors in higher education, focusing on their impact on students and faculty at Universidad Politécnica Salesiana (UPS) and using eight years of comprehensive data. Employing data mining techniques, the research analyzed enrollment, retention, graduation, employability, socioeconomic status, academic performance, and faculty workload to uncover patterns affecting academic outcomes. The study found that UPS exhibits a stable educational system, maintaining consistent metrics across student success indicators. However, the COVID-19 pandemic presented unique stressors, evidenced by a paradoxical increase in student grades during heightened faculty stress levels. This anomaly suggests a potential link between academic rigor and faculty well-being during systemic disruptions. Stressors affecting students directly correlated with reduced academic performance, highlighting the importance of early detection and intervention. Conversely, faculty stress was reflected in adjustments to grading practices, raising questions about institutional pressures and faculty motivation. These findings emphasize the value of proactive data analytics in identifying stress-induced anomalies to support student success and faculty well-being. The study advocates for further research on faculty burnout, motivation, and institutional strategies to mitigate stressors, underscoring the potential of data-driven approaches to enhance the quality and sustainability of higher education ecosystems.

12 pages, 6336 KiB  
Data Descriptor
An Open Database of the Internal and Surface Temperatures of a Reinforced-Concrete Slab-on-I-Beam Section
by Pedro Cavadia, José M. Benjumea, Oscar Begambre, Edison Osorio and María A. Mantilla
Data 2025, 10(2), 21; https://doi.org/10.3390/data10020021 - 4 Feb 2025
Abstract
Due to climate change, the temperature monitoring of reinforced-concrete (RC) structures is becoming critical for preventive maintenance and extending their lifespan. Significant temperature variations in RC elements can affect their natural frequencies and modulus of elasticity or generate abnormal stress levels, potentially leading to structural damage. Data from thermal monitoring systems are invaluable for testing and validating numerical methodologies for estimating internal thermal responses and aiding in prevention/maintenance decision making. Despite its importance, few experimental outdoor data on the internal and external temperatures of concrete structures are available. This study presents a comprehensive dataset from a 120-day temperature-monitoring campaign on a 1.2 m long reinforced-concrete slab-on-I-beam model under tropical conditions in Bucaramanga, Colombia. The monitoring system measured the internal temperatures at 40 points using embedded thermocouples, while the surface temperatures were recorded with handheld and drone-mounted thermal cameras. Simultaneously, the ambient temperature, solar radiation, rainfall, wind velocity, and other parameters were monitored using a weather station. The instrumentation ensured the synchronization and high spatial resolution of the thermal data. The data, collected at 30 min intervals, are openly available in CSV format, offering valuable resources for validating numerical models, studying thermal gradients, and enhancing structural health-monitoring frameworks.

16 pages, 2825 KiB  
Article
Seaweed-Based Bioplastics: Data Mining Ingredient–Property Relations from the Scientific Literature
by Fernanda Véliz, Thulasi Bikku, Davor Ibarra-Pérez, Valentina Hernández-Muñoz, Alysia Garmulewicz and Felipe Herrera
Data 2025, 10(2), 20; https://doi.org/10.3390/data10020020 - 1 Feb 2025
Abstract
Automated analysis of the scientific literature using natural language processing (NLP) can accelerate the identification of potentially unexplored formulations that enable innovations in materials engineering with fewer experimentation and testing cycles. This strategy has been successful for specific classes of inorganic materials, but its general application in broader material domains such as bioplastics remains challenging. To begin addressing this gap, we explore correlations between the ingredients and physicochemical properties of seaweed-based biofilms from a corpus of 2000 article abstracts from the scientific literature since 1958, using a supervised word co-occurrence analysis and an unsupervised approach based on the language model MatBERT without fine-tuning. Using known relations between ingredients and properties for test scenarios, we discuss the potential and limitations of these NLP approaches for identifying novel combinations of polysaccharides, plasticizers, and additives that are related to the functionality of seaweed biofilms. The model demonstrates a valuable predictive ability to identify ingredients associated with increased water vapor permeability, suggesting its potential utility in optimizing formulations for future research. Further use of the model revealed alternative combinations that are underrepresented in the literature. This automated method facilitates the mapping of relationships between ingredients and properties, guiding the development of seaweed bioplastic formulations. The unstructured and heterogeneous nature of the literature on bioplastics represents a particular challenge that demands ad hoc fine-tuning strategies for state-of-the-art language models for advancing the field of seaweed bioplastics.

21 pages, 6166 KiB  
Article
Impact of Various Land Cover Transformations on Climate Change: Insights from a Spatial Panel Analysis
by Mohsen Khezri
Data 2025, 10(2), 19; https://doi.org/10.3390/data10020019 - 31 Jan 2025
Abstract
This study introduces an innovative empirical methodology by integrating spatial panel models with satellite imagery data from 1970 to 2019. This approach illuminates the effects of greenhouse gas emissions, deforestation, and various global variables on regional temperature shifts and the environmental repercussions of land-use alterations, establishing a substantial empirical basis for climate change. The results revealed that global variables such as sunspot activity, the length of day (LOD), and the Global Mean Sea Level (GMSL) have negligible impacts on global temperature variations. This model uncovers the nuanced effect of deforestation on global temperatures, highlighting a decrease in temperature following deforestation above 40° N latitude, contrary to the warming effect observed in lower latitudes. Exceptionally, deforestation within the 10° N to 10° S tropical bands results in a temperature decrease, challenging established theories. The results suggest that converting forests to grass/shrublands and croplands plays a significant role in these temperature dynamics.

12 pages, 305 KiB  
Article
Statistical Approach in Personalized Nutrition Exemplified by Reanalysis of Public Datasets
by Paola G. Ferrario, Maik Döring and Christian Ritz
Data 2025, 10(2), 18; https://doi.org/10.3390/data10020018 - 30 Jan 2025
Abstract
In clinical nutrition, it is regularly observed that individuals respond differently to a dietary treatment. Personalized nutrition aims to consider such variability in response by delivering personalized nutritional recommendations. Ideally, the optimal treatment for each individual will be selected and then dispensed according to the specific individual’s characteristics. The aim of this paper is to discuss and apply existing statistical methods, which can be adequately used in the context of personalized nutrition. We discuss the estimation of individualized treatment rules (ITRs) as we wish to favor one out of two interventions. The applicability of the methods is demonstrated by reusing two public datasets: one in the context of a parallel group design and one in the context of a crossover design. The bias of the estimator of the ITRs’ underlying parameters is evaluated in a simulation study.

Topics

Topic in Data, Energies, Sensors, Sustainability, Water
Water and Energy Monitoring and Their Nexus
Topic Editors: Lucas Pereira, Hugo Morais, Wolf-Gerrit Früh
Deadline: 31 March 2025
Topic in Algorithms, Data, Earth, Geosciences, Mathematics, Land, Water, IJGI
Applications of Algorithms in Risk Assessment and Evaluation
Topic Editors: Yiding Bao, Qiang Wei
Deadline: 31 July 2025
Topic in AI, Data, Economies, Mathematics, Risks
Advanced Techniques and Modeling in Business and Economics
Topic Editors: José Manuel Santos-Jaén, Ana León-Gomez, María del Carmen Valls Martínez
Deadline: 30 September 2025
Topic in Biology, Data, Diversity, Fishes, Animals, Conservation, Hydrobiology
Intersection Between Macroecology and Data Science
Topic Editors: Paulo Branco, Gonçalo Duarte
Deadline: 30 November 2025

Special Issues

Special Issue in Data
Cutting-Edge Datasets and Algorithms for Enhancing Industrial Processes and Supply Chain Optimization
Guest Editors: Iván Pérez-Olguín, Luis Carlos Méndez González, Luis Alberto Rodríguez-Picón
Deadline: 30 April 2025
Special Issue in Data
Data-Driven Approaches for Safety in Industrial Sites
Guest Editors: Francesca Mauro, Mara Lombardi, Mario Fargnoli
Deadline: 30 June 2025
Special Issue in Data
Benchmarking Datasets in Bioinformatics, 2nd Edition
Guest Editor: Pufeng Du
Deadline: 31 July 2025
Special Issue in Data
Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition
Guest Editor: Antonio Sarasa-Cabezuelo
Deadline: 20 August 2025

Topical Collections

Topical Collection in Data
Modern Geophysical and Climate Data Analysis: Tools and Methods
Collection Editors: Vladimir Sreckovic, Zoran Mijic
Data, EISSN 2306-5729, Published by MDPI
© 1996-2025 MDPI (Basel, Switzerland) unless otherwise stated