Data Collection and Pre-processing in Foreign Literature/CORPORA
$10-30 USD
Closed
Posted almost 6 years ago
Paid on delivery
I am expected to create CORPORA of text in the Azerbaijani language that belong to the category specified in SUBJECT.
REQUIREMENTS:
- Only Azerbaijani text is allowed (text in any other language must be excluded)
- All final text must be in a plain-text file
- Format is one sentence per line: each sentence must start on a new line
- Only a single space between words
- Only complete sentences should be used
- Poems/poetry are not allowed
- All page numbers, headers, titles, etc. must be excluded; only sentences are allowed
- If applicable, headers and footers must be removed
- Total size of the text file should be at least 15,000 lines (sentences)
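The formatting requirements above could be checked with a small script. The following is only a minimal sketch, assuming the source text is Latin-script Azerbaijani already extracted to a string; the names `to_sentences`, `meets_minimum`, and `MIN_LINES` are made up for illustration. The character whitelist rejects sentences containing other scripts (e.g. Cyrillic) but cannot tell Azerbaijani from other Latin-script languages, and it does not detect poetry or titles, so manual review is still needed.

```python
import re

# Assumption: Latin-script Azerbaijani; these are its non-ASCII letters.
AZ_LETTERS = "A-Za-zÇçƏəĞğİıÖöŞşÜü"
MIN_LINES = 15000  # minimum sentence count from the requirements

# Split after sentence-final punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Heuristic for a "complete" sentence: starts with an uppercase letter,
# contains only whitelisted characters, ends with . ! or ?
COMPLETE = re.compile(rf"^[A-ZÇƏĞİÖŞÜ][{AZ_LETTERS}0-9 ,;:()'\-]*[.!?]$")


def to_sentences(raw_text):
    """Return cleaned sentences, one list item per sentence."""
    # Drop lines that are only digits (likely page numbers).
    lines = [ln for ln in raw_text.splitlines() if not ln.strip().isdigit()]
    # Re-join wrapped lines and collapse runs of whitespace to one space.
    flat = re.sub(r"\s+", " ", " ".join(lines)).strip()
    if not flat:
        return []
    return [s for s in SENTENCE_END.split(flat) if COMPLETE.match(s)]


def meets_minimum(sentences, minimum=MIN_LINES):
    """Check the corpus size requirement."""
    return len(sentences) >= minimum


if __name__ == "__main__":
    sample = "Bu   kitab çox\nmaraqlıdır.\n12\nO gəldi!\nyarımçıq cümlə"
    for sentence in to_sentences(sample):
        print(sentence)
```

Writing the result with `"\n".join(sentences)` to a UTF-8 `.txt` file gives the one-sentence-per-line format. A title line without final punctuation would be merged into the following sentence by this sketch, so headers and titles are better stripped before this step.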
DELIVERABLES:
1) Final text file (.TXT) with all sentences
2) List of book titles used as sources
3) Source files (.PDF, .DOC electronic books) from which the text was extracted