Beyond News and Documentaries: Developing a Corpus-Based Lexical Resource for Informal North Korean Speech
Keywords:
North Korean, Corpus Analysis, Formal and Informal Register, Token and Type, Parts of Speech
Abstract
By conceptualizing North Korean (NK) news broadcasts and NK comedy talk shows as distinct Target Language Use domains (Bachman & Palmer, 2010), this study investigates lexical variation across formal and informal registers in contemporary NK discourse and examines whether differences in communicative purpose and discourse conditions are reflected in measurable lexical patterns. Two sets of video clips were compiled, manually transcribed, and normalized: an informal spoken corpus drawn from 90 minutes of seven NK comedy talk show episodes, and a formal spoken corpus drawn from 90 minutes of NK news broadcasts. The two corpora were analyzed with a focus on part-of-speech (POS) distributions and lexical frequency. The results reveal clear register-based differences in both lexical frequency and POS distributions. In particular, nouns dominate the news corpus, whereas the comedy corpus shows a higher proportion of adverbs, suggesting more descriptive and emotionally expressive language use. The comparison was intended not merely to document register contrasts but to evaluate whether reliance on a single text type, particularly formal discourse, may produce a skewed lexical representation in instructional materials. In doing so, the analysis provides empirical evidence bearing on domain representativeness in NK language education and highlights the potential limitations of narrowly defined instructional input.
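As a minimal sketch of the kind of comparison summarized above, assume each corpus has already been transcribed, normalized, and POS-tagged into (token, tag) pairs. The code below computes token and type counts and each tag's share of all tokens per corpus. The function names, the tag labels (NOUN, VERB, ADV), and the toy tokens are hypothetical and invented for illustration. They are not the study's data, tagger, or analysis procedure.

```python
from collections import Counter


def pos_proportions(tagged_tokens):
    """Return each POS tag's share of all tokens in a tagged corpus."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    # An empty corpus yields an empty dict, so no division by zero occurs.
    return {tag: n / total for tag, n in counts.items()}


def type_token_counts(tagged_tokens):
    """Return (number of tokens, number of distinct word types)."""
    tokens = [word for word, _ in tagged_tokens]
    return len(tokens), len(set(tokens))


# Hypothetical toy data: placeholder tokens, not drawn from either corpus.
news = [("w1", "NOUN"), ("w2", "NOUN"), ("w3", "VERB"), ("w1", "NOUN")]
comedy = [("w4", "ADV"), ("w5", "VERB"), ("w4", "ADV"), ("w6", "NOUN")]

for name, corpus in [("news", news), ("comedy", comedy)]:
    n_tokens, n_types = type_token_counts(corpus)
    print(name, n_tokens, n_types, pos_proportions(corpus))
```

Using proportions rather than raw counts keeps the two corpora comparable even if their token totals differ, although the corpora described here are matched by duration (90 minutes each) rather than by token count.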