

ORIGINAL ARTICLE 

Year : 2010  Volume : 14  Issue : 3  Page : 155-159


A comparison of ordinal regression models in an analysis of factors associated with periodontal disease
Shivalingappa B Javali^{1}, Parameshwar V Pandit^{2}
^{1} Department of Biostatistics, SDM College of Dental Sciences, Dharwad, India ^{2} Department of Statistics, Bangalore University, Bangalore, Karnataka, India
Date of Submission: 26-Aug-2009
Date of Acceptance: 17-Jun-2010
Date of Web Publication: 20-Jan-2011
Correspondence Address: Shivalingappa B Javali, Department of Biostatistics, SDM College of Dental Sciences, Sattur, Dharwad - 580 009, Karnataka, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/0972-124X.75909
Abstract   
Aim: The study aimed to determine the factors associated with periodontal disease (at different levels of severity) by using different regression models for ordinal data.
Design: A cross-sectional design was employed, using clinical examination and a 'questionnaire with interview' method.
Materials and Methods: The study was conducted from June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted using the Community Periodontal Index for Treatment Needs (CPITN).
Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used to determine the factors associated with periodontal disease.
Results: The ordinal regression models with four built-in link functions (logit, probit, complementary log-log [Cloglog] and negative log-log [nloglog]) gave similar results, with negligible differences in the significant factors associated with periodontal disease. Religion, caste, sources of drinking water, timings for sweet consumption, timings for cleaning or brushing the teeth and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models.
Conclusions: The ordinal regression model with the Cloglog link is a better fit for determining the significant factors associated with periodontal disease than the models with the logit, probit and nloglog built-in link functions. Caste and time for sweet consumption are negatively associated with periodontal disease, whereas religion, sources of drinking water, timings for cleaning or brushing the teeth and materials used for brushing teeth are significantly and positively associated with periodontal disease.
Keywords: CPITN, ordinal data and built-in link functions, ordinal regression model, periodontal disease
How to cite this article: Javali SB, Pandit PV. A comparison of ordinal regression models in an analysis of factors associated with periodontal disease. J Indian Soc Periodontol 2010;14:155-9
How to cite this URL: Javali SB, Pandit PV. A comparison of ordinal regression models in an analysis of factors associated with periodontal disease. J Indian Soc Periodontol [serial online] 2010 [cited 2019 Oct 21];14:155-9. Available from: http://www.jisponline.com/text.asp?2010/14/3/155/75909
Introduction   
Periodontal disease is a major component of oral health and is often measured on an ordinal scale in epidemiologic studies. Data of this type, however, are generally reduced to a dichotomy for analysis. Several statistical models have been developed to make use of the information in ordinal response data, but these techniques have seldom been used in analyzing epidemiologic data. In this article, we give an overview of logistic regression models for ordinal data based on cumulative and conditional probabilities. The most popular ordinal regression models are embedded, under different link functions, in the framework of generalized linear models. Application of the proposed model with different link functions to periodontal disease data from a random sample of 1,760 individuals confirmed that generalized linear models are easy to use and interpret, but gave results quite different from those obtained using binary (simple) logistic regression after dichotomizing the outcome in the conventional way.
Many variants of regression models for analyzing ordinal response variables have been developed and described during the past years. ^{[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]} Compared to frequently used methods for binary and nominal data, ordinal regression models have the advantage that they make full use of ranked data. ^{[13],[16],[18]} Nevertheless, these models have been underutilized in biomedical and epidemiological research. Epidemiological data analyses concerning risk factors rely heavily on regression models. The choice of a model is largely determined by the scale of measurement of the response variable. ^{[3]}
Epidemiologists and statisticians are often interested in estimating the risk of adverse events originally measured on an interval scale (such as attachment loss), but they often choose to divide the outcome into two or more categories in order to estimate the effects of covariates. Similarly, a response variable originally measured on an ordinal scale (such as severity of periodontal disease) is often recoded into several binary variables during statistical analysis. As a motivating example, the Community Periodontal Index for Treatment Needs (CPITN) was used to assess the pattern or severity of periodontal disease. The severity of periodontal disease was recorded on a 5-level ordinal scale. Such data are more appropriately analyzed with an ordinal logistic model than by dichotomizing the levels of periodontal disease (with and without periodontal disease).
Although dichotomization is not incorrect, it often results in loss of information, because groups of the response variable are collapsed, and in a considerable loss of statistical power. Therefore, if researchers wish to study the effects of independent variables on all levels of an ordered categorical response, an ordinal regression method must be appropriately chosen in order to obtain valid results. Several statistical models for ordinal responses have been proposed in the statistical literature; however, their use in the dental epidemiological and biomedical literature has been minimal. Here we evaluate the usefulness of ordinal models in dental epidemiological research, with particular emphasis on model formulation, using severity of periodontal disease as the response variable.
In this study, the ordinal regression model was used to model the relationship between the ordinal outcome (i.e., different levels of severity of periodontal disease) and the independent variables. The framework of the ordinal regression model is described, together with the data set, in the following section.
Application - CPITN index data
Let Y (periodontal disease) be a categorical response variable with k+1 (k=4) ordered categories coded as 0, 1, 2, 3, 4. Here, we consider the severity of periodontal disease as the response variable, given by ordered categories (healthy, bleeding, calculus, shallow pocket and deep pocket), with higher values indicating greater severity.
The major goal of this article was to apply an ordinal logistic regression model with different built-in link functions ^{[20]} to CPITN data in order to predict the probability of occurrence of periodontal disease. Four built-in link functions were considered: logit, probit, complementary log-log (Cloglog) and negative log-log (nloglog).
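The formulas for the link functions are not reproduced in this copy of the article. As a hedged reconstruction in standard notation (the symbols $\gamma_j$, $\theta_j$, $\beta$ and $x$ are assumed here, not taken from the article), a cumulative link model for the five CPITN categories, with cumulative probability $\gamma_j = P(Y \le j \mid x)$, is commonly written as:

```latex
\begin{align*}
g(\gamma_j) &= \theta_j - \beta^{\top} x, \qquad j = 0, 1, 2, 3,\\
\text{logit:}\quad   g(\gamma) &= \log\frac{\gamma}{1-\gamma},\\
\text{probit:}\quad  g(\gamma) &= \Phi^{-1}(\gamma),\\
\text{Cloglog:}\quad g(\gamma) &= \log\{-\log(1-\gamma)\},\\
\text{nloglog:}\quad g(\gamma) &= -\log\{-\log(\gamma)\}.
\end{align*}
```

Here $\theta_j$ are the thresholds (cut-off points), $\beta$ is the single set of regression coefficients shared by all cut-off points, and $\Phi$ is the standard normal distribution function. The minus sign in front of $\beta^{\top}x$ is one common software convention; some packages use a plus sign, which reverses the sign of the reported coefficients.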
The strengths of the ordinal regression model with the above four built-in link functions are briefly described here. First, many indicators of periodontal disease outcome (such as CPITN) are measured on an ordinal scale. Thus, the ordinal regression model has broad scope for analyzing diverse periodontal disease outcomes. Second, like the logistic regression model, an ordinal regression model can be used to perform the following tasks:
 To identify significant independent variables that influence the ordinal response, i.e., periodontal disease
 To describe the direction of the relationship between the ordinal outcome, i.e., periodontal disease, and the independent variables
 To analyze all levels of the ordinal outcome, i.e., periodontal disease, and subsequently evaluate the predictive validity of the regression model.
Third, the four different link functions are used to model the effects of independent variables on the ordinal response. Finally, the model assumes that the relationship between the ordinal outcome and the independent variables is independent of the category. This assumption implies that the corresponding regression coefficients in the link function are equal for each cut-off point. ^{[21]} Therefore, the ordinal regression model is easy to construct and interpret: it requires only one model assumption and produces only one set of regression coefficients.
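The equal-coefficients assumption can be illustrated with a minimal Python sketch. It assumes the parameterization link(P(Y ≤ j)) = θ_j − β'x, which is one common convention and is not stated in the article. The thresholds, coefficients and covariate values are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, used as the inverse probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Inverse link functions: map a linear predictor to a cumulative probability.
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": normal_cdf,
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),
    "nloglog": lambda eta: math.exp(-math.exp(-eta)),
}


def category_probabilities(thresholds, coefficients, x, link="logit"):
    """Return P(Y = 0), ..., P(Y = k) for one subject.

    One set of coefficients is shared by every cut-off point; only the
    thresholds differ between categories.
    """
    inverse_link = INVERSE_LINKS[link]
    linear_predictor = sum(b * v for b, v in zip(coefficients, x))
    # Cumulative probabilities P(Y <= j) for each threshold, then P(Y <= k) = 1.
    cumulative = [inverse_link(theta - linear_predictor) for theta in thresholds]
    cumulative.append(1.0)
    # Category probabilities are differences of successive cumulative values.
    probabilities = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probabilities.append(cumulative[j] - cumulative[j - 1])
    return probabilities


# Hypothetical values: four increasing thresholds for five CPITN categories
# and two binary covariates. None of these numbers come from the article.
thresholds = [-1.0, 0.5, 1.5, 2.5]
coefficients = [0.4, -0.3]
x = [1, 0]

for link in INVERSE_LINKS:
    probs = category_probabilities(thresholds, coefficients, x, link)
    print(link, [round(p, 3) for p in probs])
```

Because the same coefficients enter every cumulative probability, changing a covariate shifts all four cut-off points by the same amount on the link scale.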
Materials and Methods   
Study area
The cross-sectional study was conducted from June to October 2008 in Dharwad, Karnataka, India. Dharwad is situated in north Karnataka and is one of its educational centers.
Study population and sampling procedure
The cross-sectional study involved a systematic random sample of 1760 individuals aged 18-40 years. The sample size was determined from the results of a pilot study, in which the standard deviation (SD) of the CPITN score was 0.8120, using a precision of 5% and a confidence level of 99%. The sample size was estimated to be 1,756 ≈ 1,760. The mean age of the study subjects was 34.26 ± 7.28 years.
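The article does not state the sample size formula. A minimal sketch, assuming the usual formula for estimating a mean, n = (z·SD/d)², with the precision read as an absolute margin d = 0.05 and the 99% normal quantile rounded to z = 2.58 (both assumptions), reproduces the reported figure:

```python
import math


def sample_size_for_mean(sd, precision, z):
    """Smallest n with z * sd / sqrt(n) <= precision, i.e. n = (z * sd / d)^2."""
    return math.ceil((z * sd / precision) ** 2)


# SD from the pilot study; z = 2.58 is the rounded 99% normal quantile (assumed).
n = sample_size_for_mean(sd=0.8120, precision=0.05, z=2.58)
print(n)  # 1756
```

Using the unrounded quantile 2.5758 gives a slightly smaller value, so the rounded z appears to be what yields 1,756.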
Clinical examination
The periodontal disease (CPITN) examination was carried out by two qualified dental surgeons using the standardized and widely accepted procedure recommended by the WHO report on oral health, ^{[22]} with mouth mirror, CPITN probe, dental explorer, disposable gloves and sterilized instruments under artificial light. Before the start of the actual study, a pilot study was conducted to assess the intra- and inter-examiner agreement in recording CPITN scores on a convenience sample of 140 study subjects. The intra-examiner agreement was 0.8719 for the first examiner and 0.7193 for the second examiner. The inter-examiner agreement (between the two examiners) was 0.8795.
Besides the data on periodontal disease (CPITN), data were also collected on various characteristics, such as socioeconomic and sociodemographic characteristics, food habits, eating habits, oral hygiene practices and deleterious habits, using a 'structured questionnaire and personal interview' procedure. The CPITN (periodontal disease) data were treated as an ordinal response variable. The 17 independent variables of the study were as follows. Socioeconomic and sociodemographic characteristics included gender (male=1, female=0), age (as a continuous variable), religion (Hindu=1, non-Hindu=0), caste (SC/ST/OBC=0, GM=1), socioeconomic status (low=0, intermediate=1, high=2) ^{[23]} and family size (as a continuous variable). Food habits included type of diet (vegetarian=0, non-vegetarian=1). Eating habits were assessed in terms of frequency of sweet consumption per day (once=1, twice=2, more than twice=3). Oral hygiene practices were measured in terms of oral hygiene habits (finger=0, brush/others=1), frequency of brushing (once=1, twice or more=2), methods of brushing (circular/vertical=1, horizontal=2), materials used for brushing teeth (paste/powder=1, others=2), types of toothpaste (non-fluoridated=0, fluoridated=1), duration of change of toothbrush (1-3 months=0, >3 months=1) and mouth rinsing habit (no=0, yes=1). The deleterious habits were assessed through smoking habit (no=0, yes=1) and chewing habit (no=0, yes=1). Since a cross-sectional design was adopted for the present study, data on the above-mentioned characteristics reflect the situation at the time of data collection and not past history.
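The article does not name the agreement statistic. If it was Cohen's kappa, a common choice for categorical scores such as CPITN (this is an assumption), it could be computed as in the sketch below. The two score lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the two marginal distributions.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical CPITN scores (0-4) recorded by two examiners on ten subjects.
examiner_1 = [0, 1, 2, 2, 3, 4, 1, 0, 2, 3]
examiner_2 = [0, 1, 2, 3, 3, 4, 1, 0, 2, 2]
print(round(cohen_kappa(examiner_1, examiner_2), 3))
```

For ordered scores a weighted kappa, which penalizes large disagreements more than small ones, is often preferred. The unweighted form is shown only because it is the simplest.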
Data analysis
The major goal of this article was to apply the ordinal logistic regression model with different built-in link functions, viz., logit, probit, Cloglog and nloglog, to estimate the significant factors associated with periodontal disease. There is no clear-cut method to determine the order of preference among the different link functions. However, the logit link and Cloglog link are generally suitable for analyzing ordered categorical data evenly distributed among all categories. Lastly, the investigators were also interested in establishing the fitting performance of the ordinal regression model with the different built-in link functions for the ordinal response, using the log likelihood and the Akaike information criterion (AIC). Statistical significance was set at the 5% level (P<.05). ^{[24],[25]}
Results   
The Community Periodontal Index for Treatment Needs (CPITN) ordinal data set was analyzed. The models with the four built-in link functions were compared in terms of estimates, log likelihood and AIC values. The estimates of the ordinal regression model with four built-in link functions on five categories of periodontal disease are presented in [Table 1].
Table 1: Estimates of determinants of five categories of periodontal disease using ordinal regression model with different built-in link functions
[Table 1] shows that three thresholds of the model equation are significantly different from zero and contribute substantially to the response probabilities of the different categories in the regression model with each of the four built-in link functions. Out of 21 covariates, only 6 are significantly associated with periodontal disease. Of these, caste and time for sweet consumption have negative regression coefficients, indicating a negative association with CPITN; that is, they are likely to decrease the higher-order scores of CPITN. The other four covariates, viz., religion, sources of drinking water, timings of cleaning teeth and materials used for brushing teeth, have positive regression coefficients and are therefore positively associated with CPITN. This indicates that they are likely to increase the higher-order scores of CPITN under all four built-in link functions.
Further, in order of suitability, the ordinal regression model with the Cloglog built-in link function is the better fit (log likelihood −1908.49), followed by the nloglog (−1992.05), logit (−2078.36) and probit (−2099.90) built-in link functions. This is also supported by the AIC values. AIC is smallest for the ordinal regression model with the Cloglog built-in link function (2.19), followed by the nloglog (2.29), logit (2.39) and probit (2.41) built-in link functions [Table 2]. Therefore, we conclude that the ordinal regression model with the Cloglog built-in link function fits the periodontal disease ordinal data better than the models with the logit, nloglog and probit built-in link functions.
Table 2: Performance of the ordinal regression model with different built-in link functions
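The article does not give the AIC formula it used. The reported values near 2 suggest an AIC scaled by the sample size, (−2·log L + 2k)/n. The sketch below uses that form with n = 1760 and an assumed parameter count k = 21. This k is not stated in the article as the number of estimated parameters; it was chosen here because it reproduces the reported values to two decimals.

```python
def aic_per_observation(log_likelihood, n_parameters, n_observations):
    """AIC divided by the sample size: (-2 * logL + 2 * k) / n."""
    return (-2.0 * log_likelihood + 2.0 * n_parameters) / n_observations


# Log likelihoods as reported in the text for each link function.
LOG_LIKELIHOODS = {
    "Cloglog": -1908.49,
    "nloglog": -1992.05,
    "logit": -2078.36,
    "probit": -2099.90,
}
N_OBSERVATIONS = 1760
N_PARAMETERS = 21  # assumption, not given in the article

aic = {
    link: aic_per_observation(ll, N_PARAMETERS, N_OBSERVATIONS)
    for link, ll in LOG_LIKELIHOODS.items()
}

# Rank the links from best (smallest AIC) to worst.
for link in sorted(aic, key=aic.get):
    print(f"{link}: {aic[link]:.2f}")
```

Since all four models have the same number of parameters, the AIC ranking necessarily matches the log likelihood ranking. The penalty term only matters when models of different size are compared.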
Discussions and Conclusions   
It is convenient to analyze an ordinal outcome by means of logistic or linear regression. By altering the measurement scale of the ordinal outcome, we are able to analyze the data and produce research findings. However, loss of information or incorrect analysis may occur in some cases. For instance, when the scale of outcome categories (e.g., healthy, bleeding, calculus, shallow pocket and deep pocket) is arbitrarily collapsed into a binary measure (e.g., without disease and with disease), we are forced to use logistic regression to analyze two levels of the ordinal outcome. By doing so, important information may be lost in the resulting model. Therefore, to study the effects of independent variables on all levels of the ordered categorical outcome, an ordinal regression method must be appropriately chosen in order to obtain valid research results. Using the ordinal regression method, researchers can identify the significant independent variables that influence the occurrence of periodontal disease and that could be targeted for its control.
We agree with Ananth and Kleinbaum ^{[16]} ; Scott, Goldberg and Mayo ^{[18]} ; and Bender and Benner ^{[26]} that ordinal regression models should be more widely used in epidemiology and biomedical research, especially in dental epidemiology. However, for adequate use, one has to be very careful about the goodness of fit and the validity of model assumptions. If the usual assumption of equal slopes for all ordinal response levels is fulfilled by the data, the standard models with different built-in link functions (logit, probit, Cloglog, nloglog) are powerful tools that produce easily interpretable parameters summarizing the effects of independent variables over all response levels. Otherwise, much more effort is required from the researcher to find models that describe the data adequately. Nowadays, different statistical software packages offer easy access to the standard ordinal regression models with built-in link functions (logit, probit, Cloglog, nloglog). ^{[19],[27]}
In this study, the ordinal models with different built-in link functions showed negligible differences in their log likelihood estimates and were comparable in practical application to the periodontal disease data. This can be explained by the fact that the ordinal regression models with different built-in link functions are equivalent in any case. ^{[28]} On the other hand, all the link functions are quite similar, at least for small probabilities. ^{[9]} Consequently, the built-in link functions would not usually lead to very different estimated associations between the independent variables and the response variable. The built-in link functions considered here did not yield very different estimates of the response, but they did differ in their likelihood ratio chi-square values. The goodness-of-fit statistics were acceptable and similar under the Pearson and deviance methods.
In summary, there are no differences of practical relevance in the ordinal responses of periodontal disease between the results of the models with the four built-in link functions. All built-in link functions provided similar findings, which must be checked carefully before a model with a given link can be applied adequately. The choice of the model with built-in link functions depends on the researcher's preference. ^{[29]}
References   
1.  McCullagh P. Regression models for ordinal data (With discussion). J R Stat Soc B 1980;42:109-42.
2.  Anderson JA. Regression and ordered categorical variables (With discussion). J R Stat Soc B 1984;46:1-30.
3.  Greenland S. An application of logistic models to the analysis of ordinal responses. Biom J 1985;27:189-97.
4.  Ashby D, Pocock SJ, Shaper AG. Ordered polytomous regression: An example relating serum biochemistry and haematology to alcohol consumption. Appl Stat 1986;35:289-301.
5.  Greenwood C, Farewell V. A comparison of regression models for ordinal data in an analysis of transplant-kidney function. Can J Stat 1988;16:325-35.
6.  Agresti A. Tutorial on modeling ordered categorical response data. Psychol Bull 1989;105:290-301.
7.  Armstrong B, Sloan M. Ordinal regression models for epidemiologic data. Am J Epidemiol 1989;129:191-204.
8.  Ashby D, West CR, Ames D. The ordered logistic regression model in psychiatry: Rising prevalence of dementia in old people's homes. Stat Med 1989;8:1317-26.
9.  McCullagh P, Nelder JA. Generalized linear models. New York: Chapman and Hall; 1989.
10.  Hastie TJ, Botha JL, Schnitzler M. Regression with an ordered categorical response. Stat Med 1989;8:785-94.
11.  Peterson B, Harrell FE Jr. Partial proportional odds model for ordinal response variables. Appl Stat 1990;39:205-17.
12.  Holtbrugge W, Schumacher M. A comparison of regression models for the analysis of ordered categorical data. Appl Stat 1991;40:249-59.
13.  Lee J. Cumulative logit modeling for ordinal response variables: Applications of biomedical research. Comput Appl Biosci 1992;8:555-62.
14.  Greenland S. Alternative models for ordinal logistic regression. Stat Med 1994;13:1665-77.
15.  Cox C. Location scale cumulative odds models for ordinal data: A generalized nonlinear model approach. Stat Med 1995;14:1191-203.
16.  Ananth CV, Kleinbaum DG. Regression models for ordinal data: A review of methods and applications. Int J Epidemiol 1997;26:1323-33.
17.  Cox C. Multinomial regression models based on continuation ratios. Stat Med 1997;7:435-41.
18.  Scott SC, Goldberg MS, Mayo NE. Statistical assessment of ordinal outcome in comparative studies. J Clin Epidemiol 1997;50:45-55.
19.  Bender R, Grouven U. Using binary logistic regression models for ordinal data with non-proportional odds. J Clin Epidemiol 1998;51:809-16.
20.  McCullagh P, Nelder JA. Generalized linear models. 2 ^{nd} ed. London: Chapman and Hall; 1983.
21.  Bender R, Benner A. Ordinal regression models. Biomed J 2000;42:6.
22.  World Health Organization. Oral health surveys. Basic Methods. Geneva: WHO; 1997.
23.  Prasad BG. Social classification of Indian families. J Indian Med Assoc 1961;37:250-1.
24.  SPSS, Inc. Ordinal Regression Analysis, SPSS Advanced Models 10.0., Chicago, IL, 2002.
25.  Intercooled Stata 9.2 for Windows (2006), Stata Corp LP, 4905 Lakeway Drive, College Station, TX 77845, USA.
26.  Bender R, Benner A. Calculating Ordinal Regression Models in SAS and S-Plus. Biom J 2000;42:677-99.
27.  Harrell FE Jr. Designs and functions for biostatistical/epidemiologic modeling, testing, estimation, validation, graphs, and prediction. Functions available on the Web in the StatLib repository of statistical software. Available from: http://www.lib.stat.edu/S/Harrell/. [last cited on 1998a].
28.  Laara E, Mathews JN. The equivalence of two models for ordinal data. Biometrika 1985;72:206-7.
29.  Harrell FE Jr, Margolis PA, Gove S, Manson KE, Mulholland EK, Lehmann D, et al. Tutorial in Biostatistics: Development of a clinical prediction model for an ordinal outcome: The World Health Organization Multicentre Study of Clinical Signs and Etiological Agents of Pneumonia, Sepsis and Meningitis in Young Infants. Stat Med 1998b;17:909-44.
