Importance of statistical tools in business research

Statistics is the branch of science that deals with the collection, organisation and analysis of data, and with drawing inferences from a sample about the whole population. This section reviews these fundamentals, ending with a summary of the parametric and non-parametric tests used for data analysis.

An adequate knowledge of statistics is necessary for the proper design of an epidemiological study or a clinical trial: improper statistical methods may result in erroneous conclusions and lead to unethical practice. A variable is a characteristic that varies from one individual member of a population to another.

Sex and eye colour give qualitative information and are called qualitative variables[3] [Figure 1]. Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as whole numbers such as 0, 1, 2, 3, … (integers), whereas continuous data can assume any value. Observations that can be counted constitute discrete data, and observations that can be measured constitute continuous data.

Examples of discrete data are the number of episodes of respiratory arrest or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are serial serum glucose levels, the partial pressure of oxygen in arterial blood and oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording data, based on categorical, ordinal, interval and ratio scales [Figure 1]. Categorical or nominal variables are unordered: the data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist, as in gender (male and female), the data are called dichotomous or binary. The various causes of re-intubation in an intensive care unit, such as upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment, are examples of categorical variables.

Ordinal variables have a clear ordering, but the ordered values may not be equally spaced. Examples are the American Society of Anesthesiologists physical status and the Richmond Agitation-Sedation Scale. Interval variables are similar to ordinal variables, except that the intervals between values are equally spaced. A good example of an interval scale is the Fahrenheit scale used to measure temperature.

Ratio scales are similar to interval scales in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. The centimetre scale, for example, is a ratio scale: there is a true zero point, and a value of 0 cm means a complete absence of length. A thyromental distance of 6 cm in an adult may be twice that of a child in whom it is 3 cm.

Descriptive statistics[4] describe the relationship between variables in a sample or population, providing a summary of the data in the form of the mean, median and mode.

Inferential statistics[4] use a random sample of data taken from a population to describe and make inferences about the whole population. They are valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1. The extent to which the observations cluster around a central location is described by the central tendency, and the spread towards the extremes by the degree of dispersion.

The measures of central tendency are the mean, median and mode. The mean may be influenced profoundly by extreme values. For example, the average stay of organophosphorus poisoning patients in the ICU may be influenced by a single patient who stays for around 5 months because of septicaemia. Such extreme values are called outliers. The formula for the mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. The median[6] is defined as the middle of a distribution in ranked data, with half of the variables in the sample above and half below the median value, while the mode is the most frequently occurring variable in a distribution.
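As a concrete illustration, here is a minimal Python sketch; the length-of-stay values are hypothetical, chosen only to show how a single outlier pulls the mean away from the median and mode:

import statistics

stay_days = [3, 5, 4, 5, 6, 150]  # hypothetical ICU stays in days; 150 is an outlier

print(statistics.mean(stay_days))    # about 28.8, pulled upward by the outlier
print(statistics.median(stay_days))  # 5.0, robust to the outlier
print(statistics.mode(stay_days))    # 5, the most frequent value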

The range defines the spread, or variability, of a sample. If we rank the data and group the observations into percentiles, we get better information about the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts; the median is the 50th percentile. Variance[7] is a measure of how spread out the distribution is.

It gives an indication of how closely an individual observation clusters about the mean value. The variance of a population is defined by $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, and the variance of a sample by the slightly different formula $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$. The denominator $n - 1$ reflects the degrees of freedom: each observation is free to vary except the last one, which must take a defined value once the sample mean is fixed. The variance is measured in squared units, so to keep interpretation simple and retain the basic unit of observation, the square root of the variance is used.

The square root of the variance is the standard deviation (SD). The SD of a sample is defined by $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$. An example of the calculation of variance and SD is illustrated in Table 2. Most biological variables cluster around a central value, with symmetrical positive and negative deviations about this point, following a normal (Gaussian) distribution. A skewed distribution, by contrast, shows an asymmetry of the variables about its mean. In a negatively skewed distribution [Figure 3], the mass of the distribution is concentrated on the right of the figure, leading to a longer left tail.
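The following short Python sketch writes out the sample variance and SD formulas above explicitly; the five observations are hypothetical:

x = [4, 8, 6, 5, 3]          # hypothetical observations
n = len(x)
mean = sum(x) / n
variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)  # sample variance, n - 1 degrees of freedom
sd = variance ** 0.5                                    # SD, back in the original units

print(mean, variance, sd)    # 5.2, 3.7, about 1.92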

In a positively skewed distribution [Figure 3], the mass of the distribution is concentrated on the left of the figure, leading to a longer right tail. In inferential statistics, data from a sample are analysed to make inferences about the larger population; the purpose is to answer or test hypotheses. A hypothesis (plural: hypotheses) is a proposed explanation for a phenomenon.

Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects. Probability is the measure of the likelihood that an event will occur, quantified as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty. The null hypothesis (H0) states that there is no difference or relationship between the variables, whereas the alternative hypothesis (H1 or Ha) denotes that the statement between the variables is expected to be true. The P value, or calculated probability, is the probability of the observed event occurring by chance if the null hypothesis is true.

The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [Table 3]. If the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests. These tests rest on the assumption of normality, which specifies that the means of the sample groups are normally distributed.

They also rest on the assumption of equal variance, which specifies that the variances of the samples and of their corresponding populations are equal. However, if the distribution of the sample is skewed towards one side, or the distribution is unknown due to a small sample size, non-parametric[14] statistical techniques are used. Non-parametric tests are also used to analyse ordinal and categorical data.

The parametric tests assume that the data are on a quantitative (numerical) scale with a normal distribution of the underlying population, that the samples have the same variance (homogeneity of variances), and that the samples are randomly drawn from the population, with the observations within a group independent of each other. Student's t-test is used to test the null hypothesis that there is no difference between the means of two groups.

It is used in three circumstances. The first is to test if a sample mean, as an estimate of a population mean, differs significantly from a given population mean (the one-sample t-test), with the formula $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$. The second is to test if the population means estimated by two independent samples differ significantly (the unpaired t-test), with the formula $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s_p$ is the pooled standard deviation.
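In practice these tests are run with statistical software rather than by hand. A minimal sketch of the first two circumstances using SciPy, on hypothetical data (the paired case, discussed next, uses stats.ttest_rel in the same way):

from scipy import stats

group_a = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]
group_b = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0]

# One-sample t-test: does the mean of group_a differ from a population mean of 5.0?
t1, p1 = stats.ttest_1samp(group_a, popmean=5.0)

# Unpaired t-test: do the two independent groups have different population means?
t2, p2 = stats.ttest_ind(group_a, group_b)

print(p1, p2)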

The third is to test if the population means estimated by two dependent samples differ significantly (the paired t-test); a usual setting is when measurements are made on the same subjects before and after a treatment. The group variances can be compared using the F-test: if F differs significantly from 1.0, the group variances are concluded to be significantly different. The Student's t-test cannot be used for comparison of three or more groups; for that, analysis of variance (ANOVA) is used, whose purpose is to test if there is any significant difference between the means of two or more groups.

The within-group variability (error variance) is the variation that cannot be accounted for in the study design; it is based on random differences present in our samples. The between-group (or effect) variance, by contrast, is the result of our treatment. These two estimates of variance are compared using the F-test. A repeated-measures ANOVA is used when all variables of a sample are measured under different conditions or at different points in time: as the variables are measured from one sample at different points in time, the measurement of the dependent variable is repeated.
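A one-way ANOVA can be sketched in SciPy as follows, on three hypothetical groups; the F statistic is the ratio of the between-group to the within-group variance estimate:

from scipy import stats

g1 = [23, 25, 21, 24, 22]
g2 = [30, 28, 31, 29, 32]
g3 = [26, 27, 25, 28, 26]

f_stat, p_value = stats.f_oneway(g1, g2, g3)  # one-way ANOVA across the three groups
print(f_stat, p_value)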

When the assumption of normality is not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric (distribution-free) tests are used in such situations, as they do not require the normality assumption. Their drawback is that they usually have less power than their parametric counterparts.

As with the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic, and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric techniques are delineated in Table 5.

The sign test and Wilcoxon's signed rank test are used as median tests for one sample. These tests examine whether the sample data are greater or smaller than a median reference value.

The sign test uses only the direction of the differences from the reference value, so it is useful when it is difficult to measure the values precisely. Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums. It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
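A minimal SciPy sketch of these rank-based tests, on hypothetical data:

from scipy import stats

sample_a = [12, 16, 11, 18, 14, 17]
sample_b = [22, 19, 25, 21, 23, 20]

# Wilcoxon rank sum test: do the two samples come from distributions with the same location?
stat, p = stats.ranksums(sample_a, sample_b)

# Wilcoxon signed rank test of sample_a against a reference median of 15,
# run as a one-sample test on the differences from that reference.
diffs = [x - 15 for x in sample_a]
stat2, p2 = stats.wilcoxon(diffs)

print(p, p2)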

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical.

The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.
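In SciPy this is the ks_2samp function; the sketch below draws two hypothetical samples from normal distributions with different means:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)   # hypothetical sample 1
y = rng.normal(0.5, 1.0, 200)   # hypothetical sample 2, shifted mean

d, p = stats.ks_2samp(x, y)     # d is the maximum distance between the two cumulative curves
print(d, p)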

The Kruskal-Wallis test is a non-parametric analogue of the analysis of variance. The data values are ranked in increasing order, the rank sums are calculated, and the test statistic is then computed. In contrast to the Kruskal-Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal-Wallis test.
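The Kruskal-Wallis test itself is a one-line call in SciPy, shown here on hypothetical groups (the Jonckheere test is omitted, as it is not part of SciPy's standard distribution):

from scipy import stats

g1 = [7, 9, 6, 8, 7]
g2 = [12, 14, 11, 13, 15]
g3 = [9, 10, 8, 11, 10]

h_stat, p = stats.kruskal(g1, g2, g3)  # rank-based analogue of one-way ANOVA
print(h_stat, p)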

In the UK, the latest research indicates that more than half of all small and medium-sized enterprises do not survive for more than five years. Furthermore, recent evidence suggests that the Darwinian nature of market competition is alive and well amongst larger businesses too: the recent financial crisis has seen many larger companies fall victim to their long-term failure to evolve and adapt to ever-changing customer needs.

Organisations such as Comet, HMV and Woolworths are among the prominent examples within the high-street retail domain. This paper will describe how statistical research techniques can help businesses to avoid similar scenarios by giving them a powerful means of understanding their market. When seeking to build and maintain a strong market position, organisations face a number of key challenges.

These challenges are represented within each stage of the typical marketing bow-tie model. When seeking to address these challenges, it is vital that organisations have a strong understanding of the needs of their target audience. Once an understanding of needs has been developed, it is possible for organisations to develop a compatible product or service offering that, with the correct marketing strategy, will attract new prospects and encourage them to become customers.

In addition to attracting customers, a strong understanding of needs also allows companies to segment their audience into homogeneous groups who share the same needs. Through segmentation, organisations can align their offering with different groups by developing bespoke customer value propositions for each. In the case of HMV, the early development of a distinct strategy for the newly emerging younger, digital segment of music consumers may have helped the company to fend off financial difficulties.

Moreover, segmentation allows businesses to choose where to compete in the market. Companies may choose not to pursue customers who are difficult or costly to serve. A successfully implemented segmentation strategy should lead to the development of a competitive advantage which, in turn, should be reflected in improved financial performance.

Aside from the product or service offering, once a relationship is in place with customers, it is imperative that businesses seek to understand what drives customer satisfaction and loyalty.

Ensuring that customers remain loyal, return and repurchase products or services is often a more cost-effective strategy than seeking to win new business.

Market research plays a central role in helping organisations to speak to customers and prospects. Customer research studies allow businesses to engage with customers directly about their needs and measure their performance across all aspects of the customer experience. However, research surveys that are badly designed can lead to businesses learning little more than what they already know or, in some cases, acting on ill-informed conclusions. This paper will examine two statistical research techniques designed to help organisations go beyond the basics and gain a much deeper understanding of needs and loyalty: MaxDiff analysis and derived importance.

Customer loyalty research measures the satisfaction levels of customers across all areas of the customer journey, from initial contact through to the ongoing business relationship. However, though satisfaction levels tell us where a business performs strongly or poorly, they should not necessarily be used to decide on future actions from the research.

Instead, when seeking to improve the customer experience through investment, it is important that businesses understand what drives or strengthens customer loyalty. Take the example of an organisation that has discovered that its customers are relatively dissatisfied with the way it processes and invoices orders.

The company may consider spending millions of pounds on a new customer invoicing system in an attempt to improve satisfaction with invoicing, with a view to improving the overall experience. However, such action may represent a futile investment if satisfaction with invoicing is not a driver of the overall customer experience. There is also an opportunity cost to consider: the money may well be better invested in another aspect of the customer journey.

In order to understand where to prioritise action, businesses need a method to understand what drives customer loyalty. It is possible to ask customers directly what drives their loyalty within surveys. However, there is a tendency for customers to focus on the rational aspects of the relationship, such as prices, when self-stating what is important to them. Issues that are more irrational or softer in nature, such as the business relationship, often end up being underrated and disregarded.

Derived importance refers to the use of statistical correlation to understand the hidden relationship between overall satisfaction or loyalty and satisfaction with individual attributes of the customer experience. Derived importance is typically employed within quantitative customer studies. It can be calculated by asking research respondents to rate their overall satisfaction with a supplier on a monadic rating scale.

Typically, a one-to-ten scale is employed to measure performance. Equivalent questions are then used to measure satisfaction with other areas of the customer journey, such as satisfaction with the product, prices and the business relationship. Once data have been gathered, a simple correlation analysis can be performed to understand the strength of the relationship between each individual attribute and the overall score.
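This correlation step can be sketched in Python with pandas; the attribute names and the ten respondents' scores below are entirely hypothetical:

import pandas as pd

df = pd.DataFrame({
    "overall":      [8, 7, 9, 6, 8, 5, 9, 7, 6, 8],   # overall satisfaction, 1-10
    "product":      [8, 7, 9, 5, 8, 6, 9, 7, 5, 8],
    "prices":       [6, 7, 8, 6, 7, 5, 8, 6, 6, 7],
    "relationship": [9, 6, 9, 6, 8, 4, 9, 8, 6, 9],
})

# Correlate each attribute with the overall score: the derived importance.
derived = df.drop(columns="overall").corrwith(df["overall"])
print(derived.sort_values(ascending=False))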

As a general rule of thumb, a reasonably large sample is required in order to uncover relationships that are statistically significant and stable, although, in the right circumstances, correlation can still be performed on smaller sample sizes. As a measure of correlation, derived importance is interpreted on a scale from -1 to 1, where -1 indicates a perfect negative relationship between the individual attribute and overall satisfaction or loyalty, 0 indicates no relationship, and 1 indicates a perfect positive relationship.

In normal circumstances, all satisfaction attributes should enjoy a positive relationship with overall satisfaction or loyalty; it is rare to uncover a negative relationship when derived importance is calculated from a suitable sample of respondents. Typically, a score above a certain threshold indicates that an attribute is a driver of the overall score, and we can determine that an improvement in this attribute is likely to have a positive effect on the overall score.

The two figures below show examples of how derived importance results can be presented. Correlation scores can be displayed in a simple table, as shown in Figure 2, with shading to highlight which area of the customer experience each attribute corresponds to. Figure 3 shows how derived importance can be used to prioritise actions from the research: individual attributes can be plotted in a matrix of current performance vs. derived importance.
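A matrix of this kind can be sketched with matplotlib; the attribute names, performance scores and importance scores below are hypothetical:

import matplotlib.pyplot as plt

attributes  = ["Product", "Prices", "Invoicing", "Relationship"]
performance = [7.8, 6.2, 5.9, 7.1]      # mean satisfaction, 1-10 scale
importance  = [0.55, 0.30, 0.15, 0.60]  # correlation with the overall score

fig, ax = plt.subplots()
ax.scatter(importance, performance)
for name, x, y in zip(attributes, importance, performance):
    ax.annotate(name, (x, y))  # label each attribute point
ax.set_xlabel("Derived importance (correlation with overall satisfaction)")
ax.set_ylabel("Performance (mean satisfaction)")
plt.show()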
