Predicting the popularity of Internet courses by machine learning methods

Authors: Yunkova, Olena Oleksandrivna (Юнькова, Олена Олександрівна); Volodko, T. O. (Володько, Т. О.)
Record date: 2022-04-18
Year of publication: 2021
Citation: Yunkova O. O., Volodko T. O. Predicting the popularity of Internet courses by machine learning methods // Modeling and Information Systems in Economics: collection of scientific papers / Ministry of Education and Science of Ukraine, Kyiv National Economic University named after Vadym Hetman; editorial board: O. Ye. Kaminskyi (editor-in-chief) [et al.]. Kyiv: KNEU, 2021. Issue 101. pp. 169-181.
ISSN: 2616-6437
URI: https://ir.kneu.edu.ua:443/handle/2010/37256

Abstract:
Massive open online courses (MOOCs) are an example of the development of the open learning movement, which has attracted considerable attention from both the academic and public spheres. MOOCs are not a standalone phenomenon isolated from other developments in open and distance learning or educational technology. On the contrary, they are closely linked to other developments in this field and have the potential to support lifelong learning, remove barriers to learning, ensure equal opportunities in education and, most importantly, liberalize knowledge.
The paper defines the theoretical foundations of the formation of the Internet course market, analyzes the current state and trends of this market in Ukraine and worldwide, classifies Internet courses according to their rating score, and predicts the rating score in order to determine the popularity of online courses. The problem of predicting course popularity is addressed with machine learning methods that classify online courses by the rating parameter: courses that receive the maximum rating score are considered popular. Classification and regression problems in machine learning are most often solved by building ensemble models. This approach rests on the hypothesis that combining several models can yield a more powerful model. The way the models are combined should be adapted to their types. There are currently several meta-algorithms for forming combined models. In one of them (bagging), homogeneous base models are trained in parallel and independently of each other and are then combined according to a deterministic averaging rule. One variant of this algorithm is the random forest method. In another family of algorithms, the models are trained sequentially in an adaptive manner. The most popular of these are adaptive boosting and gradient boosting. The first updates the weight of each object in the training dataset, while the second updates the values of these objects (the targets that each subsequent model is fitted to). Both methods attempt to solve an optimization problem in order to find the best model, represented as a weighted sum of the weaker base models.
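The contrast between the two meta-algorithms described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weak learner (a one-dimensional decision stump), the helper names, and the ten-point toy dataset are all assumptions chosen for illustration and are unrelated to the course data used in the paper. Bagging trains stumps independently on bootstrap samples and combines them by majority vote; adaptive boosting trains them sequentially, reweighting the training objects after each round and combining the stumps as a weighted sum.

```python
import math
import random

def stump_predict(stump, x):
    """A stump (threshold, polarity) predicts polarity if x >= threshold, else -polarity."""
    thr, pol = stump
    return pol if x >= thr else -pol

def fit_stump(xs, ys, weights):
    """Return (stump, weighted_error) minimising the weighted error; labels are +1/-1."""
    best_stump, best_err = None, None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((thr, pol), x) != y)
            if best_err is None or err < best_err:
                best_stump, best_err = (thr, pol), err
    return best_stump, best_err

def bagging(xs, ys, n_models, rng):
    """Bagging: each stump is trained independently on its own bootstrap sample."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump, _ = fit_stump([xs[i] for i in idx], [ys[i] for i in idx],
                             [1.0 / n] * n)
        models.append(stump)
    return models

def bagging_predict(models, x):
    """Deterministic combination rule: unweighted majority vote."""
    votes = sum(stump_predict(m, x) for m in models)
    return 1 if votes >= 0 else -1

def adaboost(xs, ys, n_rounds):
    """Adaptive boosting: stumps are trained sequentially on reweighted objects."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this stump in the sum
        ensemble.append((alpha, stump))
        # Misclassified objects gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Final model: sign of the weighted sum of the weak stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data: no single stump separates the two classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

bagged = bagging(xs, ys, n_models=5, rng=random.Random(0))
boosted = adaboost(xs, ys, n_rounds=3)
print("bagging accuracy: ", accuracy(lambda x: bagging_predict(bagged, x), xs, ys))
print("boosting accuracy:", accuracy(lambda x: adaboost_predict(boosted, x), xs, ys))
```

On this toy set the best single stump misclassifies three of the ten points, while three boosting rounds are enough for the weighted sum to classify all ten correctly. Gradient boosting follows the same sequential scheme but fits each new model to updated target values rather than to reweighted objects.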
In this paper, gradient boosting and random forest algorithms are used to predict the popularity of online courses. The proposed models deliver a forecast accuracy of 65 percent. Factors that reduce forecast accuracy include attributes that correlate only weakly with the predicted value, as well as the imbalance and significant outliers observed in the data. The machine learning methods considered can be modified and tuned, which makes it possible to improve the classifier.

Language: uk
Keywords: information technologies; e-learning; gradient boosting algorithm; random forest algorithm
Title: Predicting the popularity of Internet courses by machine learning methods (original: Прогнозування популярності Інтернет-курсів методами машинного навчання)
Type: Article
UDC: 519.868:339.92