The Institutional Data Initiative's database spans genres, decades, and languages. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to level the playing field by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined, curated content repositories that normally only established tech giants have the resources to assemble.
The new public domain database could be used in conjunction with other licensed materials to build artificial intelligence models. As lawsuits over the use of copyrighted data for training AI wind their way through the courts, the future of how AI tools are built hangs in the balance. Projects like the Harvard database are plowing forward under the assumption that there will always be an appetite for public domain datasets.
In addition to the trove of books, the Institutional Data Initiative is working with the Boston Public Library to scan millions of articles from newspapers now in the public domain, and it is open to forming similar collaborations in the future. Other projects, startups, and initiatives likewise promise to give companies access to substantial, high-quality AI training materials without the risk of running into copyright issues.
"I think about it a bit like the way that Linux has become a foundational operating system for so much of the world. Companies would still need to use additional training data to differentiate their models from those of their competitors." — Greg Leppert, Executive Director of the Institutional Data Initiative