Top of today's GitHub trending list: the most complete database of classical Chinese poetry, with more than 300,000 poems
By Qian Ming, reporting from Aofei Temple
Reported by QbitAI (量子位) | WeChat official account: QbitAI
It contains 55,000 Tang poems, more than 280,000 Song poems, as well as the Book of Songs, the Analects, and primers...
The project, called "chinese-poetry", bills itself as "the most complete database of Chinese classical poetry" and topped the GitHub trending list today.
As of press time, it had received nearly 25,000 stars and over 4,600 forks, a clear sign of its popularity.
The project was started by Jackey, who works on operations automation at Teambition. He explained why he built the repository:
In a sense, these vast collections feel somewhat remote from us. Electronic copies are easy to duplicate, so this open-source database was born. The database is distributed in JSON format, which makes it easy to start your own project.
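As a rough illustration of what a JSON distribution makes possible, here is a minimal Python sketch that parses poem records. The field names (`title`, `author`, `paragraphs`) and the inline sample record are assumptions made for this example, not details given in the article; check the repository for the actual file layout and schema.

```python
import json

# Hypothetical sample in the assumed record shape; a real file from the
# repository would be read from disk instead of from this inline string.
SAMPLE_JSON = """
[
  {
    "title": "静夜思",
    "author": "李白",
    "paragraphs": ["床前明月光，疑是地上霜。", "举头望明月，低头思故乡。"]
  }
]
"""


def load_poems(text):
    """Parse a JSON array of poem records into plain dictionaries."""
    records = json.loads(text)
    return [
        {
            "title": record.get("title", ""),
            "author": record.get("author", ""),
            "lines": list(record.get("paragraphs", [])),
        }
        for record in records
    ]


poems = load_poems(SAMPLE_JSON)
for poem in poems:
    print(poem["title"], "by", poem["author"])
    for line in poem["lines"]:
        print("  " + line)
```

Using `.get` with defaults keeps the loader tolerant of records that lack one of the assumed fields.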
10 major datasets
The core content of the entire project is the data set.
Currently, the repository holds 10 datasets: All Tang Poems, All Song Poems, All Song Ci, the Huajian Collection (Five Dynasties), the Ci of the Two Southern Tang Rulers (Five Dynasties), the Analects, the Book of Songs, Dream Shadow (You Meng Ying), the Four Books and Five Classics, and a set of primers (Mengxue).
All of this data comes from the Internet. How was it collected? The project initiator has shared the crawling process and a data analysis for the Song ci data.
Why is there no similar write-up for the Tang and Song poems? He explained that no record of that collection process was kept: the volume of data is huge, the target website imposes restrictions, and the collection was interrupted repeatedly over more than a week.
He also ran a preliminary word-frequency analysis of the database.
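The article does not say how that analysis was carried out. One simple way to run a character-level frequency count is sketched below in Python; the sample records and the `paragraphs` field name are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical sample records in an assumed shape (a list of text lines
# under "paragraphs"); real data would be loaded from the JSON files.
POEMS = [
    {
        "title": "静夜思",
        "paragraphs": ["床前明月光，疑是地上霜。", "举头望明月，低头思故乡。"],
    }
]


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def char_frequencies(poems):
    """Count how often each Chinese character occurs, skipping punctuation."""
    counts = Counter()
    for poem in poems:
        for line in poem.get("paragraphs", []):
            counts.update(ch for ch in line if is_cjk(ch))
    return counts


freq = char_frequencies(POEMS)
for ch, n in freq.most_common(3):
    print(ch, n)
```

Counting single characters avoids the need for a word segmenter; a word-level analysis of classical Chinese would need a segmentation step that this sketch leaves out.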
But the applications of these data sets go far beyond this.
8 major case studies
The repository also lists applications built on the datasets.
These include browser-based poetry websites, an Android app called "Offline Complete Tang Poems", a char-RNN generator of Tang poems in simplified Chinese, poetry desktops, and related mini programs.
Moreover, most of these projects are open source on GitHub.
If you are interested, bookmark the link:
https://github.com/chinese-poetry/chinese-poetry