AI TRAINING - KIELTYKA GLADKOWSKI LEGAL | CROSS BORDER POLISH LAW FIRM RANKED IN THE LEGAL 500 EMEA SINCE 2019

LICENSING DATASETS FOR AI TRAINING

Publication date: October 23, 2025

DATASETS AND SYNTHETIC DATA

Creating and developing an AI model requires a very large amount of data. Once the data is fed in, the model analyzes it, performs calculations, and draws conclusions that inform its future operation. By analogy with humans, one could say that AI “learns” in this way: the system is trained on numerous examples and extracts patterns from them, which allows it to predict correct solutions. This process is called “AI training”.

Data is currently so expensive and difficult to access that it is estimated that it may be in short supply by 2032. One answer to this problem is synthetic data. Such data is generated by AI itself, which takes parameters from real-world data and randomly generates new scenarios. These scenarios are designed to faithfully reproduce the properties, complexities, and relationships observed in the original data from which they were derived.

Synthetic data carries certain risks. The main one is that it may be built on erroneous assumptions drawn from real data. When new real data is continuously introduced, earlier errors can be corrected and the model can “unlearn” them; if, however, the first synthetic data generated is “contaminated”, each subsequent generation will carry the same erroneous information. The undoubted advantage of synthetic data is that it can represent scenarios that occur very rarely or not at all in practice. This is used in industries such as automotive (simulating traffic scenarios), finance (detecting fraud), and healthcare (detecting and treating rare conditions).
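For readers who want to see the mechanism, the generation step described above (taking parameters from real data and randomly producing new scenarios) can be illustrated with a minimal Python sketch. It is only one simple statistical approach, not a description of how any particular AI system produces synthetic data. The sample values (vehicle speed and braking distance), the function names `fit_parameters` and `generate_synthetic`, and the choice of a bivariate normal distribution are all assumptions made for illustration.

```python
import math
import random


def fit_parameters(rows):
    """Estimate means, standard deviations and correlation from (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "std_x": math.sqrt(var_x),
        "std_y": math.sqrt(var_y),
        "corr": cov / math.sqrt(var_x * var_y),
    }


def generate_synthetic(params, count, rng):
    """Draw new (x, y) pairs from a bivariate normal with the fitted parameters."""
    rho = params["corr"]
    # Guard against tiny floating-point overshoot when |rho| is close to 1.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(count):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = params["mean_x"] + params["std_x"] * z1
        # Mixing z1 into y reproduces the correlation seen in the real data.
        y = params["mean_y"] + params["std_y"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Hypothetical "real" observations: (speed in km/h, braking distance in m).
real_rows = [(30, 14), (50, 32), (60, 45), (80, 75), (100, 110), (120, 150)]

params = fit_parameters(real_rows)
synthetic = generate_synthetic(params, 5000, random.Random(42))

# Re-estimating the parameters from the synthetic rows shows how closely
# the generated data mirrors the statistics of the original sample.
print(fit_parameters(synthetic))
```

The sketch also makes the contamination risk concrete: the generator knows only the fitted parameters, so any error in the real sample is copied into every synthetic row, and a generator fitted to those rows would inherit it in turn.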

USE OF MEDICAL DATA FOR AI TRAINING

Publication date: October 21, 2025

Under EU law, namely Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (hereinafter “GDPR”), and Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act) (hereinafter “AIA”), whose entry into application is pending, the use of sensitive data (including medical data) for AI training would be possible only after obtaining consent, in cases specified by law, or when using anonymized data. The AIA is not a lex specialis vis-à-vis the GDPR, so training an AI model on personally identifiable data requires meeting the requirements of both acts.
