CLUECorpus2020 Corpus
| Name | Text Type | Plain Text Size |
|---|---|---|
| CLUECorpus2020 | Chinese | 200GB |
CLUECorpus2020 is obtained by cleaning the Chinese portion of Common Crawl. The open-source portion provides approximately 200GB of corpus text. Detailed information can be found on the official website. To download the corpus, apply by email as follows:
Data download application method: Send an email describing your research purpose and intended use of the corpus, your research plan, your institutional affiliation, and a brief introduction of the applicant, together with a commitment not to provide the data to third parties.
Email: CLUEbenchmark@163.com, Subject: CLUECorpus2020 200G Corpus
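As a convenience, the application email described above could be drafted with Python's standard library. This is only a minimal sketch: the function name `build_application_email`, its parameters, and the body layout are hypothetical choices for illustration, not a format required by the corpus maintainers. Only the recipient address and the subject line come from the instructions above. The sketch builds the message but does not send it.

```python
from email.message import EmailMessage

# Recipient and subject are taken from the application instructions.
RECIPIENT = "CLUEbenchmark@163.com"
SUBJECT = "CLUECorpus2020 200G Corpus"


def build_application_email(sender, purpose, plan, affiliation, applicant_intro):
    """Draft the application email (hypothetical helper; body layout is an assumption)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = RECIPIENT
    msg["Subject"] = SUBJECT

    # One paragraph per item requested in the application instructions.
    body = "\n\n".join([
        f"Research purpose and intended use of the corpus:\n{purpose}",
        f"Research plan:\n{plan}",
        f"Institutional affiliation:\n{affiliation}",
        f"Applicant introduction:\n{applicant_intro}",
        "I commit not to provide the data to third parties.",
    ])
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    draft = build_application_email(
        sender="applicant@example.org",  # placeholder address
        purpose="Pre-training a Chinese language model for research.",
        plan="Six-month study of pre-training data quality.",
        affiliation="Example University",
        applicant_intro="Graduate student working on NLP.",
    )
    print(draft)
```

The draft can be reviewed and then sent with any mail client; sending is left out on purpose, since SMTP settings depend on the applicant's own mail provider.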