Recently, text simplification has raised a lot of interest in the scientific community, since numerous texts, including classroom books, scientific articles, and legal and financial documents, prove to be too difficult and as such cannot cater to readers' needs. Although the first methods of measuring text complexity were suggested over 70 years ago, the problem is far from being solved. The diversity of languages, text types and genres, as well as of their audiences, remains a major challenge for researchers. Despite the significant progress of recent neural models (Sharoff, 2022), many challenges remain unaddressed, including the consistency of the long output such models produce in a generative setting. This Research Topic covers text complexity and simplification, related notions, and resources and methods for English, Portuguese, Spanish, and Russian. Recently, plain and easy language has also gained attention as a subject of standardization in many countries.

However, even the notions of text clarity, text simplicity, plain language, and easy language are problematic. The paper by Vecchiato discusses these four notions and the formal processes of text simplification, which should vary accordingly. She highlights that a clear text does not necessarily exclude every ambiguous expression. Vecchiato distinguishes structural, cognitive, and developmental complexity and suggests that text simplification should integrate the different levels of intelligibility, namely readability, coherence, and representability. While much work on automatic text simplification aims to shorten the text, Vecchiato states that a clear text “can be reasonably long if more words are needed to adequately explain a concept.”

The paper by Blinova and Tarasov develops a model for estimating the complexity of legal texts in Russian. Several regression and classification methods are compared. Their inputs are the scores of a fine-tuned BERT model and the values of 130 linguistic features; BERT is fine-tuned on a tagged textbook corpus of about 10 million words. It is shown that an XGBoost classification model trained on the linguistic features together with the language-model predictions achieves the best results. Such a hybrid model is applied to estimating the complexity of Russian legal texts for the first time. The proposed model shows high accuracy on test data and proves effective in identifying complex legal documents.

The paper by Ivanov is devoted to evaluating the complexity of sentences in Russian. Two datasets, used as the gold standard for the Russian language, are presented: one contains 75 thousand sentences, the other 6 million pairs of sentences. The author considers several neural network models for estimating sentence complexity and shows that fine-tuning pre-trained language models, namely RuBERT, performs slightly better than training a Graph Neural Network.
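One of the earliest complexity measures alluded to above is the Flesch Reading Ease score (1948). A minimal sketch in Python, using a crude vowel-group heuristic for syllable counting (an assumption for illustration; practical implementations use pronunciation dictionaries or better heuristics):

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
# Short sentences of short words score high (easy) on this scale.
```

Formula-based measures like this one operate only on surface statistics, which is precisely the limitation that the feature-rich and neural approaches discussed in this Research Topic aim to overcome.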
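The hybrid setup described for legal-text complexity (linguistic features combined with language-model predictions, fed to XGBoost) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the two hand-crafted features and the language-model scores below are hypothetical stand-ins for the 130 features and fine-tuned BERT outputs used in the paper.

```python
import re

def linguistic_features(text):
    """Two illustrative hand-crafted features: average sentence length
    (in words) and average word length (in characters). The actual model
    uses 130 such features; these two are hypothetical stand-ins."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return [len(words) / len(sentences),
            sum(len(w) for w in words) / len(words)]

def hybrid_row(text, lm_scores):
    """Concatenate hand-crafted features with language-model predictions,
    forming one input row for a gradient-boosted classifier such as XGBoost."""
    return linguistic_features(text) + list(lm_scores)

# Hypothetical example: two class probabilities from a fine-tuned LM.
row = hybrid_row("The court dismissed the claim. Costs were awarded.", [0.2, 0.8])
# row = [avg sentence length, avg word length, LM score 1, LM score 2]
```

The design point is that the boosted classifier can exploit both the interpretable surface statistics and the contextual signal captured by the fine-tuned language model, which the editorial reports as outperforming either source alone.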