Harvard University is preparing to release a dataset of approximately 1 million public-domain books that anyone can use to train large language models and other AI tools. The collection spans multiple genres and languages and includes works by Dickens, Dante, and Shakespeare. The dataset, made up of books old enough to have passed out of copyright, aims to make AI […]