Information and Data Quality for Intelligent Systems
Data quality (DQ) is critical to the successful implementation of artificial intelligence (AI) systems. However, both AI researchers and practitioners overwhelmingly concentrate on models and algorithms while undervaluing the impact of DQ (Sambasivan et al., 2021). AI research is now moving from model-centric AI towards data-centric AI (Ng, 2021). In 2021 alone, 108,000 journal and conference articles were published on DQ in AI and closely related areas. The demand for DQ assurance is especially intense and urgent in high-stakes domains such as medicine, law, and cybersecurity (Sambasivan et al., 2021). Recently, natural language processing (NLP) and machine learning (ML) researchers have proposed techniques such as semi-supervised learning (SSL), transfer learning (TL), few-shot learning (FSL), active learning (AL), and generative adversarial networks (GANs) to improve model performance when the training data are of insufficient quality or quantity (Lourentzou, 2019). Yet DQ assessment, assurance, and improvement across the whole life cycle of building an intelligent system have not been well investigated (Chen et al., 2021).
This special issue aims to bring AI and information science researchers together to understand the challenges, investigate the problems, propose solutions, exchange ideas, share resources, and identify new research directions in the field of data quality for intelligent systems (DQIS). It focuses on how to use state-of-the-art (SOTA) technologies for the assessment, assurance, and improvement of big data for building high-quality intelligent systems. Big data are harvested to build intelligent systems that support a broad array of applications, from biomedicine, healthcare, education, and legal intelligence to smart cities and autopilot (Roh et al., 2021). DQ can significantly affect the quality of the intelligent systems built on such data. The special issue seeks to publish research and practice on the quantitative and systematic evaluation and improvement of DQ in specific applications and domains, as well as on the roles and best practices of different ML and deep learning (DL) techniques for DQ improvement.
Indicative list of anticipated article topics:
- Data quality assessment for machine learning and deep learning (including the definition of dimensions, measurement, and evaluation techniques)
- Data quality management in high-stakes domains (e.g., legal, medical, cybersecurity)
- Quality evaluation of knowledge graphs and ontology systems
- Experimental studies of the impact of data quality on the performance of machine learning and deep learning
- Techniques for data quality issue detection and data quality improvement
- Data augmentation using transfer learning, fine-tuning, semi-supervised learning, GANs, and other current techniques
- Exploratory data analysis
- Data security and privacy
- Fairness in machine learning (e.g., how to handle missing data)
- Ethics in machine learning (e.g., biased data leads to biased results)
- The role of human factors in data quality assurance
- Other related topics
Special issue editors:
Dr. Junhua Ding, Department of Information Science, University of North Texas, Denton, Texas, USA
Dr. Haihua Chen, Department of Information Science, University of North Texas, Denton, Texas, USA
Dr. Lei Li, Department of Information Management, Beijing Normal University, Beijing, China
Dr. Ismini Lourentzou, Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
Important dates:
First announcement/CfP: December 15, 2021
Second CfP: April 15, 2022
Final reminder: May 23, 2022
Submissions due: May 30, 2022
Papers sent to reviewers: June 7, 2022
Reviews due: July 31, 2022
Author notification: August 31, 2022
Final papers: September 21, 2022
References:
Chen, Haihua, Jiangping Chen, and Junhua Ding, "Data Evaluation and Enhancement for Quality Improvement of Machine Learning," in IEEE Transactions on Reliability, vol. 70, no. 2, pp. 831-847, June 2021. doi: 10.1109/TR.2021.3070863.
Lourentzou, Ismini, "Data Quality in the Deep Learning Era: Active Semi-Supervised Learning and Text Normalization for Natural Language Understanding," Dissertation, University of Illinois at Urbana-Champaign, 2019.
Ng, Andrew, "A Chat with Andrew on MLOps: From Model-Centric to Data-Centric AI," 2021. [Online; accessed 12-01-2021].
Roh, Yuji, Geon Heo, and Steven Euijong Whang, "A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective," in IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1328-1347, 1 April 2021. doi: 10.1109/TKDE.2019.2946162.
Sambasivan, Nithya, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M. Aroyo, "'Everyone wants to do the model work, not the data work': Data Cascades in High-Stakes AI," in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-15, 2021. doi: 10.1145/3411764.3445518.