Creating Corpora

PPTX, 6 pages, 709.4 KB

A corpus is a structured collection of texts used for linguistic research, natural language processing (NLP), and other language-related tasks.

Purpose and planning
- Clearly defining the corpus's purpose determines the scope and type of data. For example, general corpora (like the British National Corpus) include various genres and topics, while specialized corpora focus on specific domains or time periods.
- Planning includes deciding on size, balance (equal representation of genres or periods), and language varieties (dialects, formal/informal).

Data collection methods
- Manual collection: gathering texts personally or from libraries, ensuring quality and relevance.
- Web scraping: using automated tools to collect web texts; needs filtering and ethical considerations.
- Crowdsourcing: inviting contributors to submit data; useful for gathering spoken or dialectal language.

Annotation types and tools
- Morphosyntactic: tagging words with parts of speech, e.g., nouns, verbs.
- Syntactic parsing: marking sentence structure and relations.
- Semantic annotation: labeling …
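The points above on filtering scraped texts and on checking balance across genres can be illustrated with a minimal sketch. It is not taken from the slides: the record layout (`genre` and `text` keys), the function names, the `min_words` threshold, and the sample sentences are all hypothetical, and exact-duplicate removal after normalization is only one simple filtering strategy among many.

```python
import hashlib
import re
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_texts(records, min_words=5):
    """Drop very short texts and exact duplicates (compared after normalization)."""
    seen = set()
    kept = []
    for rec in records:
        norm = normalize(rec["text"])
        if len(norm.split()) < min_words:
            continue  # too short to be useful (threshold is an assumption)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same text already collected
        seen.add(digest)
        kept.append(rec)
    return kept


def genre_balance(records):
    """Return each genre's share of the total word count."""
    words = Counter()
    for rec in records:
        words[rec["genre"]] += len(rec["text"].split())
    total = sum(words.values())
    if total == 0:
        return {}
    return {genre: count / total for genre, count in words.items()}


# Hypothetical sample data standing in for scraped or collected texts.
records = [
    {"genre": "news", "text": "The council approved the new budget on Monday."},
    {"genre": "news", "text": "the council  approved the new budget on monday."},
    {"genre": "fiction", "text": "She opened the door and stepped into the rain."},
    {"genre": "forum", "text": "ok thanks"},
]

kept = filter_texts(records)
shares = genre_balance(kept)
for genre, share in sorted(shares.items()):
    print(f"{genre}: {share:.2f}")
```

Measuring balance by word count rather than by number of texts is a design choice here; a corpus plan could just as well balance by document count or by time period.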
