There are several datasets for AI and ML research that uses source code from open source projects. The models trained on these datasets often don’t comply with these licenses, that often require attribution of their authors, and in some cases, requires that any new projects using the code to be licensed under the same license as the original.