In addition to explicit data, large data sets usually contain hidden information in the form of patterns, which can be discovered with machine learning methods. The process of gaining additional knowledge from large data sets is called Knowledge Discovery in Databases (KDD). The core of the KDD process is data mining, a collection of methods for pattern recognition.
Fraunhofer IOSB is working on new and existing methods that can be used to improve data quality, for example by identifying potential errors in a data set. Extracted patterns are used to train machine learning models. The learned prediction models can then point out irregularities to the user as new data is entered. An essential factor in prediction models is interpretability: especially in sensitive areas such as medicine, the comprehensibility of the prediction results is of great importance.
The extraction of knowledge from large amounts of data can support decision-makers in different areas such as the medical field or the banking and insurance industries. In the medical domain, quality assurance methods can be used for test results such as blood values. Another field of application is the detection of incorrectly entered data in large data sets.
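The idea of flagging implausibly entered values with a comprehensible explanation can be illustrated with a minimal Python sketch. It is not the method used at Fraunhofer IOSB. It assumes a simple robust rule (median plus or minus a multiple of the median absolute deviation) as a stand-in for a learned model, and the function names `fit_ranges` and `check_entry`, the field name `hemoglobin_g_dl`, the factor `k` and the sample values are all hypothetical.

```python
from statistics import median


def fit_ranges(records, k=5.0):
    """Learn a plausible interval per field: median +/- k * MAD.

    Assumption: every record contains the same numeric fields.
    """
    ranges = {}
    for field in records[0]:
        values = [r[field] for r in records]
        med = median(values)
        # Median absolute deviation: a spread measure that outliers barely affect.
        mad = median([abs(v - med) for v in values])
        ranges[field] = (med - k * mad, med + k * mad)
    return ranges


def check_entry(entry, ranges):
    """Return one readable warning per field outside its learned interval."""
    warnings = []
    for field, value in entry.items():
        if field not in ranges:
            continue  # no history for this field, so nothing to compare against
        low, high = ranges[field]
        if not low <= value <= high:
            warnings.append(
                f"{field}={value} is outside the learned plausible range "
                f"({low:.1f} to {high:.1f})"
            )
    return warnings


# Made-up illustration values, not real measurements.
history = [
    {"hemoglobin_g_dl": v}
    for v in (13.5, 14.1, 15.0, 13.8, 14.6, 14.9, 13.2)
]
ranges = fit_ranges(history)

# A value that looks like a missing decimal point, then a plausible one.
print(check_entry({"hemoglobin_g_dl": 141.0}, ranges))
print(check_entry({"hemoglobin_g_dl": 14.3}, ranges))
```

Each warning states the learned interval in plain terms, so the person entering the data can see why a value was flagged. This is one simple way to keep such a check interpretable; more expressive models would need their own explanation mechanism.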