A corpus is a collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures.
Corpora are generally assembled according to predefined criteria that suit their intended aims, such as linguistic research, machine translation, or natural language processing. Building a corpus is a time-consuming task.
This guide lists corpora across the world's languages:
If you're new to corpus research, The Routledge Companion to Corpus Based Language Studies and its companion, Corpus-based language studies: an advanced resource book, offer an excellent survey of corpora and of tools useful for corpus-based research.