Synthetic Data
28 Oct 2020
An AI project is built on vast amounts of data. Good-quality data can be hard or expensive to gather, and there are serious privacy concerns when the data relates to real people. At the European level, the GDPR imposes strict standards and restrictions on how such data is gathered, managed and used.
This protects consumers well, but it does not make the data scientist's work any easier. As a result, the concept of “synthetic data” is gaining traction: fictitious data that mimics the statistical properties of an original dataset. Applications include rebalancing datasets, masking or anonymizing sensitive data, and building simulation environments for machine learning applications.
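To make "mimicking the statistical properties" concrete, here is a minimal sketch that fits a multivariate Gaussian (column means plus covariance matrix) to a purely numeric dataset and then samples new, fictitious rows from it. The Gaussian model, the function names `fit_gaussian`, `cholesky` and `synthesize`, and the toy input rows are all hypothetical choices for illustration, not a method described in this article; real synthetic-data tools typically use richer models such as copulas or generative neural networks.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate column means and the sample covariance matrix (needs >= 2 rows)."""
    n = len(rows)
    d = len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mu, cov


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def synthesize(rows, n_samples, seed=0):
    """Draw fictitious rows from a Gaussian with the same means and covariance as `rows`."""
    mu, cov = fit_gaussian(rows)
    L = cholesky(cov)
    rng = random.Random(seed)
    d = len(mu)
    out = []
    for _ in range(n_samples):
        # Independent standard-normal draws, correlated via x = mu + L z.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return out


if __name__ == "__main__":
    # Hypothetical "real" records: two correlated numeric features per row.
    real = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
    for row in synthesize(real, 3, seed=42):
        print(row)
```

Because only the first two moments are matched, the synthetic rows preserve means, variances and linear correlations, but not skew, multimodality or categorical structure.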