KG LEGAL \ INFO
BLOG

TEXT AND DATA MINING, INCLUDING DATA EXTRACTION – LEGAL STATUS IN POLAND

publication date: January 03, 2023

As the law currently stands, Polish copyright law contains no specific provisions on fair use in the form of text and data mining for commercial purposes. The Polish legislator is currently working on the implementation of Article 4 of Directive 2019/790 on copyright and related rights in the Digital Single Market. This article presents the legal definition of ‘text and data mining’ and the difficulties in understanding, at the statutory level, that term and the data mining techniques related to it.

New important regulations for text and data mining, including data extraction for business and commercial purposes

Background

Legal issues related to data in the broad sense, and to its use, create considerable uncertainty over legal definitions. The reason is that the development of IT devices and of data mining technology as a whole, as applied in digital legal transactions, runs ahead of legislation and of any legal framework for safe and lawful data processing. These developments clearly favour new technologies such as data extraction, which can be defined as processing and combining existing data into useful information.

The process began as early as the 1990s and today is used in areas such as financial services, e-commerce, fintech and banking.

The most commonly used data extraction techniques include pattern tracking, for example detecting an increase in demand for a given product and linking it to an increase in its supply. Such patterns are closely related to inflationary processes, which matter greatly during the current crisis and global recession.
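The pattern-tracking idea described above can be illustrated with a minimal Python sketch. It assumes that demand and supply are available as equally spaced numeric series and measures how strongly supply follows demand after a chosen delay. The function names, the lag parameter and the sample figures are hypothetical and are not taken from any cited source.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(demand, supply, lag):
    """Correlate demand in period t with supply in period t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(demand, supply)
    return pearson(demand[:-lag], supply[lag:])


# Hypothetical monthly figures: supply reacts to demand one period later.
demand = [1, 2, 4, 7, 11, 16]
supply = [0, 2, 4, 8, 14, 22]
r = lagged_correlation(demand, supply, lag=1)
```

A value of r close to 1 would suggest that supply tracks demand with the chosen delay. Real analyses would also need to account for seasonality, noise and spurious correlation.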

Legal environment in the EU

Accordingly, the use of data plays a significant role in everyday legal practice and is becoming ever more widespread. This makes it necessary to provide legal safeguards for those areas of protection, such as copyright, that cover the material from which data may be extracted.

One of the most important EU legal acts concerning copyright is Directive 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC (the so-called Digital Single Market Directive)[1]. The act deals with copyright broadly, but it also pays particular attention to data extraction. The EU legislator recognises that text and data mining may be used commercially and obliges Member States to introduce exceptions or limitations for certain categories of situations. The Directive also allows reproductions and extractions to be retained for as long as necessary for the purposes of mining.

In particular, Recital 18 of Directive 2019/790 indicates that “in addition to their importance for scientific research, text and data mining techniques are also widely used by private and public entities to analyze large amounts of data in various spheres of everyday life and for various purposes, including by government services, to make complex business decisions, and to develop new applications and technologies. Rightholders should continue to be able to license uses of their works or other subject matter that go beyond the mandatory exception for text and data mining for research purposes provided for in this Directive and the existing exceptions and limitations provided for in Directive 2001/29/EC”.

This is where the purpose of the Directive becomes apparent: to remove, at the level of national law, the legal uncertainty faced by users of text and data mining as to whether reproductions and extractions made for that purpose may be carried out on lawfully accessed works or other protected subject matter. As recital 18 makes clear, that uncertainty arises in particular where reproductions or extractions made for the purposes of the technical process do not meet all the conditions of the existing exception for temporary acts of reproduction provided for in Article 5(1) of Directive 2001/29/EC.

To provide greater legal certainty in such cases and to encourage innovation in the private sector as well, the Directive provides, under certain conditions, for an exception or limitation covering reproductions and extractions of works or other subject matter for the purposes of text and data mining, and allows the copies made to be retained for as long as necessary for those purposes.

According to recital 18 of Directive 2019/790, this exception or limitation should apply only where the beneficiary has lawful access to the work or other subject matter, including where it has been made available to the public online, and only in so far as the rightholders have not appropriately reserved the rights to make reproductions and extractions for text and data mining. For content made publicly available online, a reservation should be considered appropriate only if it is made by machine-readable means, including metadata and the terms of use of the website or service. The reservation of rights for text and data mining purposes should not affect other uses. In other cases, it may be appropriate to reserve rights by other means, such as contracts or unilateral declarations. Rightholders should be able to apply measures to ensure that their reservations are respected. The recital concludes that “such an exception or limitation should be without prejudice to the mandatory exception for text and data mining for scientific research provided for in this Directive and the existing exception for temporary acts of reproduction provided for in Article 5 sec. 1 of Directive 2001/29/EC”.
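The recital does not prescribe any particular machine-readable format, so the following Python sketch is only an illustration of how a mining tool might look for a reservation before copying content. It rests on two assumptions that are not drawn from the Directive: that a response header named tdm-reservation with the value 1 signals a reservation (a convention proposed in the W3C community TDM Reservation Protocol), and that a robots.txt rule disallowing the tool's user agent is treated as a reservation as well. Whether either signal is legally sufficient is a separate question. The function name and the user-agent string are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def tdm_reserved(headers, robots_txt, url, agent="example-tdm-bot"):
    """Return True if a machine-readable reservation signal is found.

    headers    -- response headers of the page, as a plain dict
    robots_txt -- text of the site's robots.txt ("" if there is none)
    url        -- address of the content the tool intends to copy
    agent      -- hypothetical user-agent name of the mining tool
    """
    # Assumed convention: "tdm-reservation: 1" means mining rights are reserved.
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    if normalized.get("tdm-reservation") == "1":
        return True

    # Assumed convention: a robots.txt disallow rule counts as a reservation.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
blocked = tdm_reserved({}, robots, "https://example.com/private/report.html")
allowed = not tdm_reserved({}, robots, "https://example.com/news/item.html")
```

The function takes already downloaded headers and robots.txt text rather than fetching them, so the decision logic can be checked without network access. A real tool would also have to read the website's terms of use, which cannot be reduced to a simple flag.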

Another important provision of the Directive concerns the rightholder’s reservation against the above processes. The greatest emphasis, however, is placed on the introduction of statutory exceptions or limitations to the following rights:

  • the right of temporary or permanent reproduction, by any means and in any form, in whole or in part, of a database protected by copyright – Article 5(a) of Directive 96/9/EC[2];
  • the right of the maker of a database that required a qualitatively and/or quantitatively substantial investment in obtaining, verifying or presenting its contents to prevent extraction and/or re-use of the whole or a substantial part of those contents, evaluated qualitatively and/or quantitatively – Article 7(1) of Directive 96/9/EC[3];
  • the exclusive right to authorize or prohibit direct or indirect, temporary or permanent reproduction by any means and in any form, in whole or in part, which Member States provide for authors in respect of their works, for performers in respect of fixations of their performances, for producers of phonograms in respect of their phonograms, for producers of the first fixations of films in respect of the original and copies of their films, and for broadcasting organizations in respect of fixations of their broadcasts, whether those broadcasts are transmitted by wire or over the air, including by cable or satellite – Article 2 of Directive 2001/29/EC[4];
  • the right of permanent or temporary reproduction of a computer program by any means and in any form, in part or in whole (to the extent that loading, displaying, running, transmitting or storing the program requires such reproduction, those acts require the authorization of the rightholder), and the right of translation, adaptation, arrangement and any other modification of a computer program and of reproduction of the results of those operations, without prejudice to the rights of the person who modifies the program – Article 4(1)(a) and (b) of Directive 2009/24/EC[5];
  • the rights of publishers of press publications established in a Member State, as provided for in Article 2 and Article 3(2) of Directive 2001/29/EC, in respect of the online use of their press publications by information society service providers; those rights do not apply to private or non-commercial uses of press publications by individual users, to acts of linking, or to the use of single words or very short extracts from a press publication – Article 15(1) of Directive 2019/790[6].

Polish implementation of the EU rules on text and data mining, including “data extraction” for analysis supporting complex business decisions

As noted above, the term “text and data mining” poses terminological problems at the statutory level, and Directive 2019/790 requires implementation at the national level. Such an implementation is included in the Polish draft act of June and autumn 2022, currently being processed by the Polish Government Legislation Centre, amending the Act of 4 February 1994 on copyright and related rights (consolidated text, Journal of Laws of 2022, item 2509). The draft seeks to add to that Act a new Art. 26³, which uses the term “TEXT AND DATA EXPLORATION” and whose current text (subject to the legislative process) reads as follows:

“Art. 26³

1. It is permitted to reproduce disseminated works for the purpose of text and data mining, unless the rightholder has reserved otherwise.

2. The reservation referred to in sec. 1 shall be made in a manner appropriate to the way in which the works are made available. In the case of works made publicly available in such a way that everyone can access them at a place and time of their choice, the reservation is made in a machine-readable format within the meaning of the Act of August 11, 2021 on open data and re-use of public sector information (Journal of Laws, item 1641 and of 2022, item 1700), including metadata and the terms of use of the website or service.

3. Works reproduced in accordance with sec. 1 may be stored only for text and data mining purposes and only for as long as necessary for those purposes.”

The draft provision therefore introduces statutory concepts and rules for text and data mining for purposes other than research, and this legal framework should correspond to Article 4 of Directive 2019/790.

According to the justification of the legislative proposal, the draft provision responds to the fact that the ability to use digital technologies to mine texts and data is of great importance not only for research organizations but also for businesses, which use the results of mining in various spheres of everyday life and for various purposes, such as making complex business decisions, developing new business models, and developing innovative applications and technologies.

The drafter therefore points to Art. 4 of Directive 2019/790, which introduces into copyright law fair use in the form of text and data mining for commercial purposes.

The explanatory memorandum to the draft refers specifically to recital 18 of the Directive, which highlights the uncertainty as to whether reproductions and extractions made for the purposes of text and data mining are permitted. That uncertainty exists in Poland as the law currently stands. The aim of the Directive is therefore to increase legal certainty and to stimulate innovation in the private sector.

As the law currently stands, Polish law contains no provisions on fair use in the form of text and data mining for commercial purposes. It is therefore necessary to implement Article 4 of the Directive, which is to be done by adding a new Art. 26³ to the Copyright Act. Once implemented, text and data mining will be permitted as a form of fair use in line with Directive 2019/790 (the so-called DSM Directive, for Digital Single Market). The material scope of this fair use is set out in Art. 4(1) of Directive 2019/790, which provides for an exception to the rights provided for in Art. 5(a) and Art. 7(1) of Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, in Art. 2 of Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonization of certain aspects of copyright and related rights in the information society, in Art. 4(1)(a) and (b) of Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the legal protection of computer programs, and in Art. 15(1) of Directive 2019/790.

The scope of works that may be mined under this provision therefore coincides with the scope provided for in Art. 3 of Directive 2019/790 and additionally covers computer programs.

Under Art. 4(2) of the Directive, reproductions and extractions made under this fair use may be retained for as long as necessary for the purposes of text and data mining. Importantly, this form of fair use is not subject to any conditions as to the purpose of the use.
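The storage rule can be pictured with a short Python sketch. Neither the Directive nor the draft defines when retention stops being necessary, so the sketch assumes, purely for illustration, that each copy is linked to a mining project and is deleted once that project is closed. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MinedCopy:
    source_url: str  # where the work was copied from
    project: str     # mining project the copy was made for


@dataclass
class CopyStore:
    copies: list = field(default_factory=list)
    active_projects: set = field(default_factory=set)

    def add(self, copy):
        """Store a copy and mark its project as active."""
        self.active_projects.add(copy.project)
        self.copies.append(copy)

    def close_project(self, project):
        """Close a project and delete its copies; return how many were deleted."""
        self.active_projects.discard(project)
        before = len(self.copies)
        self.copies = [c for c in self.copies if c.project in self.active_projects]
        return before - len(self.copies)


store = CopyStore()
store.add(MinedCopy("https://example.com/a", "pricing"))
store.add(MinedCopy("https://example.com/b", "pricing"))
store.add(MinedCopy("https://example.com/c", "sentiment"))
deleted = store.close_project("pricing")
```

Tying each copy to a named purpose makes it possible to show later why a copy was kept and when it was removed. What counts as necessary in a given case remains a legal assessment that code cannot make.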

An important provision is contained in Article 4(3) of Directive 2019/790. Under it, the holder of the economic copyright in the works to be mined may exclude mining by making an appropriate reservation. For content made available on the Internet, the reservation is to be made by machine-readable means, including metadata and the terms of use of the website or service, as follows directly from the second paragraph of recital 18 of Directive 2019/790. This is therefore a special case of fair use, in which the very possibility of relying on it depends on the will of the rightholder, provided that this will is expressed in an appropriate way. Sec. 2 of the draft Art. 26³ of the Copyright Act indicates that the reservation should be appropriate to the manner in which the works are made available. For works made publicly available in such a way that everyone can access them at a place and time of their choice (i.e. made available on the Internet), the reservation must be made in a machine-readable format within the meaning of the Act of August 11, 2021 on open data and re-use of public sector information (Journal of Laws item 1641, as amended), including metadata and the terms of use of a website or service.

Accordingly, it can be argued that the draft provision does not itself introduce any exceptions or limitations to the above-mentioned rights. It focuses on the decision of the rightholder rather than on the more important issue, namely the legislator's obligation described above.

To sum up, the explanatory memorandum to the draft Act stresses the same objective as the Directive, namely greater legal certainty. In the absence of statutory limitations, however, legal certainty decreases.

It is worth adding that the other obligations imposed on the Polish legislator, such as allowing reproduced content or copies of it to be stored for as long as necessary for the purposes of extraction, have been met.

Consequences

Pursuant to Article 288 of the Treaty on the Functioning of the European Union, a directive, unlike a regulation, is not directly applicable and requires implementation into the national legal order, for example by means of a statute. The national legislator is therefore free to choose the form and methods of implementation, but remains obliged to give effect to all provisions of the directive.

Article 4 of Directive 2019/790, which is being implemented, states that “Member States shall provide for an exception or limitation”. Member States, including Poland, are thus obliged under Directive 2019/790 to introduce certain limitations. The Polish legislator, both in the draft provision cited above and in its justification, nevertheless focused largely on the options open to the rightholder. The draft should therefore be interpreted in the light of the main objective of Directive 2019/790, which is to provide legal certainty on technological issues such as ‘data extraction’ and ‘text and data mining’.


[1] https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:32019L0790&from=PL

[2] https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:31996L0009&from=PL

[3] https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:31996L0009&from=PL

[4] https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:32001L0029&from=PL

[5] https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:32009L0024&from=PL

[6] https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:32019L0790&from=PL
