What is Statistics?

Data now drives decisions across the globe. Almost every activity starts with data, be it predicting sales, revenue, and cost, analyzing drug effectiveness, tracking visitors in a store, forecasting the duration of a pandemic, or predicting the best career or the best alliance for you. All of this is possible thanks to statistics, whose methods can be applied to data in every domain. Knowledge of statistics gives businesses and other organizations the power to turn data into valuable information that motivates effective action plans. Applied well, statistical methods can also give an organization an edge over its competitors.

Use Case: How Statistics Helped in Covid-19 Drug Development

To better explain the magic of stats, let’s take a use case from Covid-19 and healthcare and go through it step by step, in the order we followed.

Aim of the study

A healthcare organization wanted to predict the efficacy of a drug for treating the novel coronavirus. In the early days of the pandemic, when data was scarce, healthcare organizations turned to statistical and computational techniques to predict whether a drug was suited to curing infected people. For this, we need to understand the following:

  • the proportion of infected people with different severity of the virus,
  • reach of the infected people,
  • likelihood of infected people spreading the virus.

Assumptions and hypothesis/proposals

The first step is to write a proposal or hypothesis. At this stage, a statistician states premises like:

Null hypothesis (base assumption): The drug is not effective for treating coronavirus

Alternative hypothesis: The drug is effective for treating coronavirus

Here, medical researchers and statisticians make certain assumptions about the total number of people and the sample we will consider for drug testing. In this use case, the sample size was 30% of the population. In addition, we collected samples from different strata of the population, viz., people of different age, gender, and medical-history groups.
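Drawing from different strata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient records, the `age_band` field, and the `stratified_sample` helper are hypothetical, and the only figure taken from the text is the 30% sampling fraction.

```python
import random
from collections import defaultdict

def stratified_sample(records, strata_key, fraction, seed=0):
    """Draw the same fraction from every stratum so each group stays represented."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[strata_key(record)].append(record)
    sample = []
    for members in groups.values():
        # Take at least one record per stratum, even from very small groups.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 1,000 synthetic patients in three age bands.
population = [
    {"id": i, "age_band": ("18-39", "40-64", "65+")[i % 3]}
    for i in range(1000)
]

# 30% of each age band, mirroring the sampling fraction mentioned above.
sample = stratified_sample(population, lambda r: r["age_band"], fraction=0.30)
print(len(sample))
```

In practice, the strata would combine several attributes (age, gender, medical history) rather than a single field.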

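So, the main aim is to test whether the data give enough evidence to reject the null hypothesis, that is, the base assumption that the drug is not effective.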

Sample vs. Population

The next step is to study the quantitative data of symptomatic patients who show fever, cough, headache, and shortness of breath. For this, we treat the available data as the population. To study the patterns of the virus for drug development, we select the best sample from the population. The best sample is the one that most faithfully represents the features of the population, so its trends overlap with the population trends across data attributes like age, gender, geographical location, marital status, health status, etc. To check this, we compare statistical measures such as the average (mean) and the middle value (median) between the sample and the population. That’s how we identify the best sample.

Data attributes

The data we collected for this specific use case includes the following relevant features:

  • Number of reported/confirmed cases
  • Transmission rate
  • Imposition of lockdown
  • Vaccination
  • Deaths

EDA using Descriptive statistics

To validate that we have chosen the most representative sample from the population, we run some descriptive statistical techniques on both the sample and the population. This pre-analysis of the sample data lets us make some basic inferences before moving further. At this stage, we also clean the data to make it fit for our analysis.
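A minimal sketch of such a check compares a few descriptive measures side by side. The ages below are randomly generated stand-ins, not data from the study, and `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
import random
from statistics import mean, median, stdev

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}

# Synthetic ages stand in for the real patient data (an assumption for illustration).
rng = random.Random(42)
population_ages = [rng.randint(18, 90) for _ in range(1000)]
sample_ages = rng.sample(population_ages, 300)  # a 30% simple random sample

population_summary = summarize(population_ages)
sample_summary = summarize(sample_ages)

# A representative sample should show values close to the population's.
for name in population_summary:
    print(f"{name}: population={population_summary[name]:.2f}, "
          f"sample={sample_summary[name]:.2f}")
```

If the sample's mean, median, and spread sit far from the population's, that is a signal to redraw or re-stratify the sample.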

Inferential statistics

After the descriptive statistics, we perform inferential statistics. Here, we check proportions such as the number of active patients / total number of patients, the number of recovered patients / total number of patients, and the number of deaths / total number of patients. We also estimate the probability of people contracting the virus and the relative frequencies of patients with high, medium, and mild severity. Finally, we check the probability of people being infected after vaccination. I hope everything is clear so far!
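The ratios described above are relative frequencies: each count divided by the total. Here is a small sketch, with counts invented purely for illustration (they are not figures from the study) and a hypothetical `relative_frequencies` helper.

```python
def relative_frequencies(counts):
    """Convert raw counts into proportions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return {label: n / total for label, n in counts.items()}

# Hypothetical counts, invented for illustration only.
patient_status = {"active": 120, "recovered": 850, "deceased": 30}
severity = {"high": 50, "medium": 150, "mild": 300}

status_share = relative_frequencies(patient_status)
severity_share = relative_frequencies(severity)

print(status_share)
print(severity_share)
```

Each set of proportions sums to 1, which is a quick sanity check on the underlying counts.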

A/B testing

Once we are ready with the data, we select a small proportion of the sample as a treatment group, on which we will analyze the effect of the drug. Another group, similar to the treatment group, serves as the control (or comparator) group and is not treated with the drug. Over multiple trials, the results from the treatment and control groups are compared to determine whether the changes seen in the treated group can be attributed to the drug or are just a result of chance. Interesting, right?
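One common way to form two comparable groups is random assignment, sketched below. The text does not say how the groups were actually formed, so treat the shuffle-and-split approach, the participant IDs, and the `assign_groups` helper as assumptions.

```python
import random

def assign_groups(participant_ids, seed=0):
    """Randomly split participants into treatment and control groups of roughly equal size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..100.
participants = list(range(1, 101))
treatment, control = assign_groups(participants, seed=7)

print(len(treatment), len(control))
```

Randomizing who receives the drug helps ensure that differences between the groups come from the treatment rather than from how the groups were chosen.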

Now that we have made some assumptions for drug development, we run the trial in phases to understand whether the drug is effective.

Phase 1: The new drug is given to a small fraction of the treatment group to understand its safety, check for an immune response, and determine the right dosage. At this phase, the drug is given only to medically healthy people with no existing health conditions.

Phase 2: The drug is given to more people in the treatment group to assess its effects further and generate a stable immune response. People in this group have the same characteristics regarding sex, age, marital status, and other demographics as the control group.

Phase 3: The drug is extended to thousands of people in the treatment group, again with the same characteristics as the comparator group. This assesses the safety of the drug in a much larger population. These trials are repeated across different countries while a new cure for the pandemic is being developed. This is the last step in which the drug is administered.

Phase 4: Finally, the treatment and comparator groups are subjected to similar tests, and the results are analyzed. Once the results are available, we see that the drug has a positive impact on the treatment group compared to the control group, and that this is not due to random chance.

Hypothesis testing

Using a statistical test chosen according to the population distribution, we find that the drug's effect in curing the virus is statistically significant, so the observed effectiveness is unlikely to be a product of random chance. Therefore, we reject our base assumption (the null hypothesis) and conclude that the drug is effective in treating the novel coronavirus.
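To make the decision rule concrete, here is a sketch of one possible test: a one-sided two-proportion z-test comparing recovery rates. The article does not name the test that was used, so the choice of test, the recovery counts, the 0.05 significance level, and the `two_proportion_z_test` helper are all assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided z-test of whether group A's success rate exceeds group B's."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the null hypothesis both groups share one rate, so pool them.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value

# Hypothetical results: 160 of 200 recovered with the drug, 120 of 200 without.
z, p_value = two_proportion_z_test(160, 200, 120, 200)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.6f}, reject null: {reject_null}")
```

With these made-up counts the p-value falls well below 0.05, so the null hypothesis that the drug is not effective would be rejected. Real trial data and a test suited to its distribution could lead to a different conclusion.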

So, how did you feel about this drug development case for the coronavirus using statistical techniques? Statistics becomes essential not just while formulating a cure but also in predicting its effectiveness, right up to the point where it generates ROI for the manufacturer.


Now that we have seen what statistics is through a real-world story, the next thing to understand is how to work with statistics. For this, understanding the types of data is a prerequisite. If this story has stirred your curiosity about statistics and its use in every possible use case, stay tuned.