DEV Community

gouse


Social responsibility on articles published in News papers, forecasting

Abstract:

India is a democratic country with a huge population, governed by four pillars of democracy: the legislature, the executive, the judiciary, and the media. Each pillar must act within its own sphere while keeping the larger picture in mind. The strength of a democracy is determined by the strength of each pillar and by the way the pillars complement each other [1]. Because of its pervasive presence and undeniable importance in molding public opinion, the media has emerged as the fourth pillar. A free and objective media committed to lending a voice to the voiceless is the cornerstone of a healthy democracy. The Indian media has played this role quite creditably, barring a few exceptions at certain moments in history, such as during the Emergency. In recent times, however, the internet, commercialization, changes in ownership patterns, “News Coloured with Views,” and the proliferation of fake news, paid news, and propaganda have produced some troubling tendencies. The media, like all other institutions, should have its own system of checks and balances. There has to be a code of ethics that is adhered to voluntarily, more as an article of faith and as an expression of the media’s commitment to professionalism [1,2,3]. The main aim of this article is to forecast the kinds of articles that are going to be published in the forthcoming days. We do this with machine learning and artificial intelligence techniques, using Python and R. Such analysis can be extremely beneficial to society and the movement in assessing the print media’s duty with regard to education, empowerment, healthcare, and political awareness. To do this we used time series and regression models, which can assist in forecasting news for the next day or week.

Key words: Education, Politics, Time series, Regression, Python, R

  1. Objectives

Nowadays there is speculation that some political parties favor a few newspapers and news channels for their own benefit.

Some of these outlets give more prominence to those parties’ agendas, which affects how the print media fulfils its social responsibility to society. In this paper we forecast the news articles published in newspapers and categorize them as related to education, social awareness, or politics. These are the main goals of this paper.

  2. Methodology

We plan to run time series and regression models to predict what kind of news we can expect tomorrow. This can help the government design public welfare advertisements: for example, if newspapers are publishing more education-related articles, advertisements can be placed based on that coverage. In addition, a government body can keep an eye on the articles to tackle the disturbing news trends seen in recent times, and to find which outlets are taking advantage of the internet, commercialization, and changes in ownership patterns to publish “news colored with views”: articles that are fake or paid for as part of negative propaganda. Overall, like all other institutions, the media should have its own mechanism for social responsibility [4,5,6].

  3. Data collection

With global digitization, communication channels, including news channels and newspapers, have moved to online platforms more than ever. Most major regional newspapers and news channels now have an online presence. We referred to 3 major newspapers and measured the area of randomly selected articles. Fig 01 shows how the dimensions of each article were measured; those dimensions were converted into centimetres (cm). Data was collected on a weekly basis: out of the 7 days of each week, we randomly considered 3 to 4 days. We picked the English-language newspapers with the largest circulation across India and chose the top three: Times of India (1,614,105), Deccan Chronicle (1,064,661), and Economic Times (664,352).

Fig 01: News article cuttings
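As a minimal sketch of how the measured cuttings could be recorded and totalled per week, the snippet below assumes that an article's area is its width times its height in centimetres. The `Clipping` class, its field names, and the sample values are hypothetical and are not taken from the collected data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clipping:
    """One measured article (all field names are hypothetical)."""
    newspaper: str
    category: str      # e.g. "Education" or "Politics"
    week: int
    width_cm: float
    height_cm: float

    @property
    def area_cm2(self) -> float:
        # Assumption: area is simply width times height.
        return self.width_cm * self.height_cm


def weekly_area(clippings):
    """Sum article area per (week, newspaper, category)."""
    totals = defaultdict(float)
    for c in clippings:
        totals[(c.week, c.newspaper, c.category)] += c.area_cm2
    return dict(totals)


# Illustrative values only, not the measurements used in this article.
sample = [
    Clipping("Times of India", "Education", 1, 10.0, 12.0),
    Clipping("Times of India", "Education", 1, 5.0, 8.0),
    Clipping("Economic Times", "Politics", 1, 20.0, 15.0),
]
totals = weekly_area(sample)
```

A table keyed this way gives one series per newspaper and category, which is the shape the correlation, regression, and time series steps expect.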

  4. Analysis and Forecasting

Correlation:

To understand the data patterns we used R and Python. We ran a newspaper-wise correlation analysis for educational and political articles. For educational news there is a positive correlation between EDU ET and EDU Deccan Chronicle (52.8%) and between EDU ET and EDU Times of India (76.5%). This means that coverage of education-related news is broadly similar across the newspapers. For political news there is no correlation between one newspaper and another: each newspaper appears to publish political news only when it is favorable to it, and otherwise does not.

Correlations
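The newspaper-wise correlation can be sketched with a plain Pearson coefficient. This is an illustration under assumptions: the two series below are made-up weekly areas, and the variable names `edu_et` and `edu_toi` are hypothetical stand-ins for the EDU ET and EDU Times of India columns.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Illustrative weekly areas only, not the article's data.
edu_et = [120.0, 150.0, 90.0, 200.0, 170.0]
edu_toi = [130.0, 160.0, 100.0, 190.0, 180.0]
r = pearson(edu_et, edu_toi)
print(f"EDU ET vs EDU Times of India: r = {r:.3f}")
```

A value near 1 means the two papers expand and shrink their coverage together, while a value near 0 corresponds to the "no relation" finding reported for political news.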

Regression Model Political articles:

To predict what kind of news these newspapers would publish tomorrow, we chose a regression model [3,7]. We first ran a regression model to predict tomorrow’s education-related articles. The correlation (R) is 49.9% and the coefficient of determination (R²) is 24%, with a standard error of 13.784, which is not good enough to predict news. For example, tomorrow’s education-related news size would be 430 ± 13.784 cm. The effectiveness of this model is 24%. The regression is significant, with an F value of 5.423 (significant at the 99% level). The significant predictors are Politics Deccan Chronicle, with a t value of 2.080 (significant at the 95% level), and Politics Times of India, with a t value of 2.590 (significant at the 95% level); the remaining variable, Politics ET, is not significant. The conclusion is that with this model we can predict only Deccan Chronicle news for tomorrow, and with low accuracy. Because the accuracy is low, we should try alternative models.
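The quantities quoted for the regression (R, R², standard error, F value) can be illustrated with a small least-squares fit. This is a simplified sketch with a single predictor, whereas the models in this article use three; the function name `simple_ols` and the data are hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of the same length (>= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual sum of squares
    ss_tot = sum((b - mean_y) ** 2 for b in y)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    std_err = math.sqrt(ss_res / (n - 2))                  # standard error of the estimate
    # F statistic for one predictor: explained variance over residual variance.
    f_stat = math.inf if ss_res == 0 else (ss_tot - ss_res) / (ss_res / (n - 2))
    return {"slope": slope, "intercept": intercept, "r": math.sqrt(r2),
            "r2": r2, "std_err": std_err, "f": f_stat}


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_ols(x, y)
```

R² is the square of R, so an R of about 0.5 explains only about a quarter of the variance, which is why the model is judged too weak for forecasting.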

The regression summary tables below give the model summary, accuracy, and predictor weights.

Coefficients
Regression model results

Regression Model Education articles:

We ran a regression model to predict tomorrow’s politics-related articles. The correlation (R) is 56.6% and the coefficient of determination (R²) is 32%, with a standard error of 13.115, which is not good enough to predict news. For example, tomorrow’s politics-related news size would be 430 ± 13.115 cm. The effectiveness of this model is 23%. The regression is significant, with an F value of 7.703 (significant at the 99% level). The only significant predictor is EDU Deccan Chronicle, with a t value of 4.121 (significant at the 99% level); the remaining variables, EDU ET and EDU Times of India, are not significant. The conclusion is that with this model we can predict only Deccan Chronicle news for tomorrow, and with low accuracy. Because the accuracy is low, we should try alternative models.

The regression summary tables below give the model summary, accuracy, and predictor weights.

Regression Model Summary

co-efficient

a. Dependent Variable: Week_number

b. Predictors: (Constant), EDU Times of India, EDU Deccan Chronicle, EDU ET

Model-wise results

Summary of Regression:

The regression models are not optimal, since their effectiveness and accuracy are low, so we continued with other models. Because the data is continuous (a series), we use time series models: ARIMA or exponential smoothing [6,7].
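Of the two time series options mentioned, exponential smoothing is the simpler one, and a minimal sketch of its simple (level-only) form is shown here. The smoothing factor `alpha` and the sample series are assumptions chosen for illustration, not values from this study.

```python
def simple_exp_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * value + (1 - alpha) * level.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Illustrative daily article sizes only.
sizes = [410.0, 432.0, 425.0, 440.0, 438.0]
forecast = simple_exp_smoothing(sizes, alpha=0.3)
print(f"Forecast for the next day: {forecast:.1f}")
```

A larger `alpha` makes the forecast follow the most recent days more closely, and a smaller one smooths out day-to-day swings in coverage.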

Time series Models Politics:

Since the regression models above do not fit well enough to predict the kind of news that will be published tomorrow, we chose an alternative method. When time series data is available as an independent variable, we can use ARIMA time series models [22,23]. We are particularly interested in the autoregressive integrated moving average (ARIMA) approach. An ARIMA model is labelled ARIMA(p, d, q), where “p” is the number of autoregressive terms, “d” is the number of differences, and “q” is the number of moving-average terms. Autoregressive models assume that Yt is a linear function of the preceding values.
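To make the "p" and "d" parts concrete, the sketch below differences a series and fits a single autoregressive term by least squares, which corresponds to ARIMA(1, d, 0). It deliberately leaves out the moving-average terms ("q"), which need iterative estimation and are normally handled by a statistics package, so it is not the model fitted in this article. The function names and sample series are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'd' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares fit of y_t = const + phi * y_(t-1)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least 3 observations")
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("series is constant; phi is undefined")
    phi = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr)) / sxx
    const = mean_curr - phi * mean_prev
    return const, phi


def forecast_next(series, const, phi):
    """One-step-ahead forecast from the fitted AR(1) model."""
    return const + phi * series[-1]


# Illustrative daily article sizes only.
sizes = [410.0, 432.0, 425.0, 440.0, 438.0, 450.0]
diffs = difference(sizes, d=1)
const, phi = fit_ar1(diffs)
next_size = sizes[-1] + forecast_next(diffs, const, phi)  # undo the differencing
```

The forecast is made on the differenced series and then added back to the last observed value, which is how the "integrated" part of ARIMA is reversed.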

For the ARIMA model below, R-squared and stationary R-squared are low (19%). Even so, compared with the regression model, the ARIMA fitted values for the political-article variables are close to the actual values. The model uses 1 autoregressive term and 6 moving-average terms, with an RMSE of 74.3 and a reported model accuracy of 94% across the training and validation data. The maximum absolute error is 20.1, which is moderate relative to the actual values. The time series ARIMA model [20,24] is therefore the best of the models tried for predicting tomorrow’s daily news: with it we can estimate what kind of news will be published online the coming day.

Arima1

Arima2
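The two error measures quoted for the ARIMA fit can be computed as in the sketch below. The actual and fitted values are made up for illustration and the function names are hypothetical.

```python
import math


def rmse(actual, fitted):
    """Root mean squared error between actual and fitted values."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two non-empty series of the same length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def max_abs_error(actual, fitted):
    """Largest absolute difference between actual and fitted values."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two non-empty series of the same length")
    return max(abs(a - f) for a, f in zip(actual, fitted))


# Illustrative values only.
actual = [410.0, 432.0, 425.0, 440.0]
fitted = [405.0, 440.0, 420.0, 447.0]
print(f"RMSE = {rmse(actual, fitted):.2f}, "
      f"max absolute error = {max_abs_error(actual, fitted):.2f}")
```

When both measures are computed on the same values and in the same units, RMSE can never exceed the maximum absolute error, which makes a quick sanity check when reading model output.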

Time series Models Educational:

For the ARIMA model below, R-squared and stationary R-squared are low (36%), but this is better than the regression model. The ARIMA fitted values for the education-related article variables are close to the actual values. The model uses 1 autoregressive term and 6 moving-average terms, with an RMSE of 96 and a reported model accuracy of 24% across the training and validation data. The maximum absolute error is 69, which is moderate relative to the actual values. The time series ARIMA model is therefore the best of the models tried for predicting tomorrow’s daily news: with it we can estimate what kind of news will be published online the coming day.

Model Description:

Time series fig

Model Statistics:

time series fig2

time series fig3

Conclusion Summary:

India is a democratic country with many religions, castes, and regional political parties. Major political parties own news channels and newspapers, or are indirectly linked to newspapers that favor local regional parties. In this process the print media has been swayed by political party leaders, and items have been published for their own gain, ignoring the media’s social responsibility to society.

From this experiment we could see that the majority of the articles serve individual interests rather than social responsibility.

This article aims to help anyone who wants to use this approach to predict the news and to restrict or keep a check on the news columns published.

References:

  1. Press Information Bureau, Government of India, Vice President’s Secretariat 08-December-2019 20:06 IST

  2. The Audit Bureau of Circulations (ABC) of India is a non-profit circulation-auditing organisation.

  3. Y.-W. Cheung, K.S. Lai Lag order and critical values of the augmented Dickey–Fuller test J. Bus. Econ. Stat., 13 (1995), pp. 277–280

  4. Stanny, Monika, and Wojciech Strzelczyk. 2015. Zróżnicowanie przestrzenne sytuacji dochodowej gmin a rozwój społeczno-gospodarczy obszarów wiejskich w Polsce. Roczniki Naukowe Stowarzyszenia Ekonomistów Rolnictwa i Agrobiznesu XVII: 301–7.

  5. Stanny, Monika, and Wojciech Strzelczyk. 2018. Kondycja Finansowa Samorządów Lokalnych a Rozwój Społeczno-Gospodarczy Obszarów wiejskich; Ujęcie Przestrzenne. Warszawa: Wyd. IRWiR PAN oraz Wyd. Naukowe Scholar Spółka z o.o., pp. 113–46.

  6. Vermeulen, Ben, and Andreas Pyka. 2018. The role of network topology and the spatial distribution and structure of knowledge in regional innovation policy. A calibrated agent-based model study. Computational Economics 52: 773–808.

  7. Tang, Lijing, and Dongyan Wang. 2018. Optimization of County-Level Land Resource Allocation through the Improvement of Allocation Efficiency from the Perspective of Sustainable Development. International Journal of Environmental Research and Public Health 15: 2638.

  8. Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.

  9. Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.

  10. Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.

  11. Wu C-C, Yen-Liang C, Yi-Hung L, Xiang-Yu Y. Decision tree induction with a constrained number of leaf nodes. Appl Intell. 2016;45(3):673–85.

  12. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, et al. Top 10 algorithms in data mining. Knowl Inform Syst. 2008;14(1):1–37.

  13. Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.

  14. Xu D, Yingjie T. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–93.

  15. Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.

  16. Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22–32.

  17. Zhao Q, Bhowmick SS. Association rule mining: a survey. Singapore: Nanyang Technological University; 2003.

  18. Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.

  19. Zheng Y, Rajasegarar S, Leckie C. Parking availability prediction for sensor-enabled car parks in smart cities. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2015 IEEE Tenth International Conference on. IEEE, 2015; pages 1–6.

  20. Zhu H, Cao H, Chen E, Xiong H, Tian J. Exploiting enriched contextual information for mobile app classification. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012; pages 1617–1621.

  21. Zhu H, Chen E, Xiong H, Kuifei Y, Cao H, Tian J. Mining mobile user preferences for personalized context-aware recommendation. ACM Trans Intell Syst Technol (TIST). 2014;5(4):58.

  22. Prybutok VR, Yi J, and Mitchell D. Comparison of neural network models with ARIMA and regression models for prediction of Houston’s daily maximum ozone concentrations. Eur J Oper Res 2000; 122(1): 31–40.

  23. Ho SL, Xie M, and Goh TN. A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Comput Ind Eng 2002; 42(2–4): 371–375.

  24. Kandananond K. A comparison of various forecasting methods for autocorrelated time series. Int J Eng Bus Manage 2012; 4: 4.
