Choosing a licence for your data

It is important that any data being shared is provided with a valid license that tells the user what they are allowed to use the data for. Licensing consideration for software and datasets are quite different and for this reason typical open source software licenses may not be suitable for sharing datasets.

Open data licences have been championed by the Open Data Commons group and Creative Commons. A selection of these licences are discussed below.

Licence types for Machine Learning (ML) models also fall into a separate category and should be considered different to software and data licences.

Creative Commons Licences

CC licences are some of the most widely used and recognised standard licences for providing access to various types of data and other resources. The rights holder grants advanced permission for the user to reuse, copy, distribute and possibly even modify the original licenced material avoid the need for the user to explicitly obtain permission for each usage. The main CC licences offer a series of 'baseline rights', with attribution (BY) as a core requirement (you must credit the work's creator), which can be combined together with three other 'licence elements' that can be selected as required to produce a customised licence:

  • Non-Commercial (NC) -- you can only use the work for non-commercial purposes
  • No-Derivatives (ND) -- you may not create adaptations of the work or merge it into other works.
  • Share Alike (SA) -- you may create adaptations of the work, but if you make them publicly available, these must be under the same licence terms as the original CC work
This structure allows the following creative commons licenses to be be created:

Attribution (BY)
Attribution No Derivatives (BY-ND)
Attribution Non-Commercial (BY-NC)
Attribution Non-Commercial No Derivatives (BY-NC-ND)
Attribution Non-Commercial Share Alike (BY-NC-SA)
Attribution Share Alike (BY-SA)

It should be noted that there can be Interoperability issues when mixing different CC licence types. For example, CC BY NC SA and CC BY SA licensed data can only be blended with their own type (not each other) or with CC BY licensed data (or equivalent licensed data) or with data released under CC0.

CC Zero (CC0) Licence

CC0 was created by Creative Commons to allow the release of content/data into the public domain. It provides the means for the rights holder to provide an irrevocable, royalty-free and unconditional licence for anyone to use the resource for any purpose.

Public Domain Mark

Public Domain Mark (PDM) is used to mark works that are already in the public domain and for which there are no known copyright or database restrictions. PDM can be used for factual data in a database to make it clear that the data is free to use.

Open Data Commons Licences

The Open Data Commons group have developed a set of legal tools and licences to aid in publishing, providing and using open data. They have created three distinct licences:

  • Public Domain Dedication and License (PDDL) -- this places the data(base) in the public domain (waiving all rights). (compatible with CC Zero)
  • Attribution License (ODC-By) -- you are free to share, create or adapt the data(base) but you must provide attribution to the original source. (compatible with CC BY)
  • Open Database License (ODC-ODbL) -- any subsequent use of the database must provide attribution, an unrestricted version of the new product must always be accessible, and any new products made using ODbL material must be distributed using the same terms. It is the most restrictive of all ODC licenses. (compatible with CC BY SA).

Note that the use of ODC-By and ODC-ODbl can lead to the problem of attribution stacking, hindering adoption for truly open data.

Choosing a licence for your dataset

Unfortunately there is no single answer as to which licence you should use for your data, and this is very much a personal choice. However, it should be noted that restrictive license types can cause significant problems for subsequent users. A common problem is 'Attribution Stacking' where the use of a single data source could require attribution to many, many creators. Restrictive licenses may prevent users from creating new datasets with all or part of a dataset, hindering reusability.

Some of the main reasons for adopting the use of CC licences are their ease of use, widespread adoption, familiarity and flexibility. However, it should be noted that CC licenses were not specifically devised for data or datasets with database right. You cannot revoke a CC licence once it has been issued.

The following table compares the previously listed licences in terms of usage restrictions, allowed modification and suitability for open data.

Licence Usage restriction Modifiable Suitability for Data, Datasets and Databases
Attribution (CC-BY) Anyone Yes, but you must attribute Not specifically developed for data, datasets and databases, but can be used with minimal amounts of data
Attribution Share Alike (BY-SA) Anyone Yes, but you must attribute for usage. You must use the CC BY SA licence for onward licensing Same as Attribution (CC-BY). Additionally Share Alike requirement can impact negatively on interoperability and prevent open data.
Attribution Non-Commercial (BY-NC) Anyone for non-commercial purposes Yes, but you must attribute Same as Attribution (CC-BY). Additionally non-commercial restriction can be a hindrance as the definition of commercial can be ambiguous. There may also be interoperability issues with other licence types.
Attribution No Derivatives (BY-ND) Anyone No, and you must attribute Same as Attribution (CC-BY). Re-use and re-purposing not permitted.
Attribution Non-Commercial Share Alike (BY-NC-SA) Anyone for non-commercial purposes Yes, but you must attribute Same as Attribution (CC-BY). Additionally Share Alike requirement can impact negatively on interoperability and prevent open data. Non-commercial restriction can be a hindrance as the definition of commercial can be ambiguous. There may also be interoperability issues with other licence types.
Attribution Non-Commercial No Derivatives (BY-NC-ND) Anyone for non-commercial purposes No, and you must attribute Same as Attribution (CC-BY). Re-use and re-purposing not permitted. Additionally non-commercial restriction can be a hindrance as the definition of commercial can be ambiguous. There may also be interoperability issues with other licence types.
Creative Commons Zero Anyone Yes, no restrictions Ideal for open data.
Open Data Commons Open Database Licence (ODC-ODbl) Anyone Yes, but you must attribute any public use of the database, or works produced from the database, as specified in the ODbL Well suited for open data. Some attribution requirements may lead to attribute stacking problems.
Open Data Commons Attribution Licence (ODC-By) Anyone Yes, but you must attribute any public use of the data, or works produced from the data, as specified in the licence. Well suited for open data. Some attribution requirements may lead to attribute stacking problems.
Public Domain and Dedication Licence (PDDL) Anyone for non-commercial purposes Yes, no restrictions Ideal for open data.

For open data, the most suitable licence types are the Creative Commons Zero (CC0) Licence and Public Domain and Dedication Licence (OODC-PDDL).