DEV Community
Anand

T-Test and Chi-Square Test in Data Analysis 🐍🤖🧠

We apply these tests to data to determine whether there are statistically significant differences or associations between groups or variables.


T-Test

Overview

The T-test is a statistical test used to compare means, most often the means of two groups, to determine whether they differ significantly. It assumes the data are approximately normally distributed and is especially useful when the sample size is small and the population standard deviation is unknown.
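For the independent two-group case, the statistic is sketched below in its classic pooled-variance (Student's) form, which is what SciPy's `ttest_ind` computes with its default `equal_var=True`. Here $\bar{x}_k$, $s_k^2$ and $n_k$ are the mean, sample variance and size of group $k$; the resulting $t$ is compared against a t distribution with $n_1 + n_2 - 2$ degrees of freedom.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
```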

Types of T-Tests

  1. Independent T-Test: Compares the means of two independent groups.
  2. Paired T-Test: Compares two related sets of measurements, such as the same subjects measured before and after a treatment.
  3. One-Sample T-Test: Compares the mean of a single group against a known or hypothesized value.
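A minimal sketch of how each of the three types maps onto a SciPy function: `ttest_ind`, `ttest_rel` and `ttest_1samp`. The `before`/`after` scores and the reference mean of 70 are hypothetical values invented for illustration, not data from this post.

```python
from scipy import stats

# Hypothetical data, invented for illustration only
before = [72, 75, 68, 80, 77, 74, 70, 79]  # same students before a course
after = [75, 78, 70, 84, 79, 77, 71, 83]   # ... and the same students after it
group_x = [85, 86, 88, 75, 78, 94, 91, 88]
group_y = [82, 84, 80, 72, 76, 90, 89, 85]

# 1. Independent t-test: two unrelated groups
ind = stats.ttest_ind(group_x, group_y)

# 2. Paired t-test: two measurements on the same subjects
rel = stats.ttest_rel(after, before)

# 3. One-sample t-test: one group against a reference mean (70 is an assumed value)
one = stats.ttest_1samp(before, popmean=70)

for name, res in [("independent", ind), ("paired", rel), ("one-sample", one)]:
    print(f"{name:>12}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

A paired t-test is equivalent to a one-sample t-test on the per-subject differences against 0, which is why the paired design suits before/after measurements.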

Example
Suppose we want to compare the test scores of students from two different classes to see if there is a significant difference.

```python
from scipy import stats

# Sample data
class_a_scores = [85, 86, 88, 75, 78, 94, 91, 88]
class_b_scores = [82, 84, 80, 72, 76, 90, 89, 85]

# Perform the independent two-sample t-test
t_stat, p_value = stats.ttest_ind(class_a_scores, class_b_scores)
print(f"T-Statistic: {t_stat}, P-Value: {p_value}")
```

Output: T-Statistic: 1.07950662400349, P-Value: 0.2986093279117022
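To make the computation less of a black box, here is a sketch that computes the pooled-variance t statistic by hand for the same two classes and derives a two-sided p-value from `scipy.stats.t.sf`; it should agree with `ttest_ind` under its default `equal_var=True`. The helper name `pooled_t` is hypothetical.

```python
import math
from scipy import stats

class_a_scores = [85, 86, 88, 75, 78, 94, 91, 88]
class_b_scores = [82, 84, 80, 72, 76, 90, 89, 85]

def pooled_t(a, b):
    """Hypothetical helper: pooled-variance two-sample t statistic and its df."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Sample variances with Bessel's correction (divide by n - 1)
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    # Pooled variance weights each group's variance by its degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_manual, df = pooled_t(class_a_scores, class_b_scores)
# Two-sided p-value: probability of a |t| at least this large under H0
p_manual = 2 * stats.t.sf(abs(t_manual), df)
print(f"manual: t = {t_manual:.4f}, df = {df}, p = {p_manual:.4f}")
print(stats.ttest_ind(class_a_scores, class_b_scores))
```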

Chi-Square Test

Overview
The Chi-Square Test is used to determine whether there is a significant association between two categorical variables. It compares the observed frequency of each category combination with the frequency that would be expected if there were no association.
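For the test of independence on an $r \times c$ contingency table, the standard (Pearson) statistic is sketched below. $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\text{dof} = (r - 1)(c - 1)
```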

Types of Chi-Square Tests

  1. Chi-Square Test for Independence: Assesses whether two categorical variables are independent.
  2. Chi-Square Goodness of Fit Test: Determines whether the observed distribution of a single categorical variable matches an expected (hypothesized) distribution.
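The examples in this post only cover the test for independence, so here is a minimal goodness-of-fit sketch using `scipy.stats.chisquare`. The die-roll counts are hypothetical; the null hypothesis is that the die is fair, so each of the six faces is expected 120 / 6 = 20 times.

```python
from scipy.stats import chisquare

# Hypothetical counts from 120 rolls of a six-sided die
observed = [18, 22, 16, 25, 19, 20]
# Fair-die hypothesis: every face is expected 120 / 6 = 20 times
expected = [20] * 6

result = chisquare(f_obs=observed, f_exp=expected)
# Statistic here is (4 + 4 + 16 + 25 + 1 + 0) / 20 = 2.5, with 6 - 1 = 5 degrees of freedom
print(f"Chi-Square Statistic: {result.statistic:.3f}, P-Value: {result.pvalue:.4f}")
```

A large p-value here would mean the counts are consistent with a fair die; it does not prove the die is fair.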

Example

Suppose we want to check if there is an association between smoking status (smoker/non-smoker) and exercise frequency (regular/irregular).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Sample data in a contingency table
# Rows: Smoking Status (Smoker, Non-Smoker)
# Columns: Exercise Frequency (Regular, Irregular)
data = np.array([[15, 35],
                 [40, 10]])

# Perform the Chi-Square test of independence
chi2, p, dof, expected = chi2_contingency(data)
print(f"Chi-Square Statistic: {chi2}, P-Value: {p}")
```

Output (rounded; for a 2×2 table SciPy applies Yates' continuity correction by default): Chi-Square Statistic: 23.27, P-Value: 1.4e-06
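A sketch that recomputes the smoking/exercise result by hand: expected counts come from row total × column total / grand total, and the statistic is computed both without and with Yates' continuity correction, which `chi2_contingency` applies by default when there is 1 degree of freedom. With these counts the corrected value comes out at about 23.27 and the uncorrected one at about 25.25.

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[15, 35],
                 [40, 10]])

# Expected counts under independence: row total * column total / grand total
row_totals = data.sum(axis=1, keepdims=True)
col_totals = data.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / data.sum()

# Plain Pearson statistic (no continuity correction)
chi2_plain = ((data - expected_manual) ** 2 / expected_manual).sum()

# Yates' correction as SciPy applies it when dof == 1: move each observed count
# 0.5 toward its expected count. Every |O - E| here is 12.5 (>= 0.5), so this
# is the same as squaring |O - E| - 0.5.
chi2_yates = ((np.abs(data - expected_manual) - 0.5) ** 2 / expected_manual).sum()

chi2_d, p_d, dof_d, exp_d = chi2_contingency(data)                    # correction=True
chi2_p, p_p, _, _ = chi2_contingency(data, correction=False)

print(expected_manual)
print(f"Yates:   manual {chi2_yates:.4f} vs SciPy {chi2_d:.4f}")
print(f"Pearson: manual {chi2_plain:.4f} vs SciPy {chi2_p:.4f}")
```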

Impact of T-Test and Chi-Square Test in Data Analysis

T-Test

  • Comparing Group Means: Helps compare the means of two groups, which is useful in experiments and A/B testing.
  • Hypothesis Testing: Helps determine whether observed differences are statistically significant.

Chi-Square Test

  • Association Between Variables: Useful for understanding relationships between categorical variables, such as demographic factors and preferences.

  • Goodness of Fit: Helps determine whether a sample distribution fits an expected distribution, which is useful in model validation.

→ Let's perform a T-test and a Chi-Square test using datasets from the sklearn (scikit-learn) library.

T-Test Example
We'll use the Wine dataset from sklearn for the T-test. The Wine dataset contains measurements of various chemical properties of wines from three different cultivars. We'll compare the mean of one of the chemical properties (alcohol content) between two of these cultivars.

```python
from sklearn.datasets import load_wine
import pandas as pd
from scipy import stats

# Load the wine dataset
wine = load_wine()
wine_data = pd.DataFrame(data=wine.data, columns=wine.feature_names)
wine_data['target'] = wine.target

# Extract the alcohol content for two cultivars (0 and 1)
cultivar_0 = wine_data[wine_data['target'] == 0]['alcohol']
cultivar_1 = wine_data[wine_data['target'] == 1]['alcohol']

# Perform the t-test
t_stat, p_value = stats.ttest_ind(cultivar_0, cultivar_1)
print(f"T-Statistic: {t_stat}, P-Value: {p_value}")
```

Output: T-Statistic: 16.478551495156527, P-Value: 1.9551698789379198e-33
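`ttest_ind` assumes equal variances by default. A quick robustness check, sketched here as an addition rather than part of the original analysis, is to look at the group sizes and sample variances and rerun the comparison as Welch's t-test (`equal_var=False`), which drops that assumption. It uses `load_wine(as_frame=True)`, whose `.frame` attribute holds the features plus a `target` column.

```python
from sklearn.datasets import load_wine
from scipy import stats

wine = load_wine(as_frame=True)
df = wine.frame  # feature columns plus a 'target' column
alcohol_0 = df.loc[df["target"] == 0, "alcohol"]
alcohol_1 = df.loc[df["target"] == 1, "alcohol"]

# Unequal group sizes or variances are where Welch's test matters most
print("group sizes:", len(alcohol_0), len(alcohol_1))
print("sample variances:", alcohol_0.var(), alcohol_1.var())

# Welch's t-test: no equal-variance assumption, adjusted degrees of freedom
welch = stats.ttest_ind(alcohol_0, alcohol_1, equal_var=False)
print(f"Welch T-Statistic: {welch.statistic}, P-Value: {welch.pvalue}")
```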

Chi-Square Test Example
We'll use the Iris dataset from sklearn for the Chi-Square test. This dataset contains measurements of various features of Iris flowers from three different species. We'll test if there is an association between the species and a categorical feature created from one of the numerical features (e.g., sepal length).

```python
from sklearn.datasets import load_iris
import pandas as pd
from scipy.stats import chi2_contingency

# Load the iris dataset
iris = load_iris()
iris_data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_data['species'] = iris.target

# Create a categorical feature from a numerical feature (sepal length),
# splitting it into three equal-frequency bins
iris_data['sepal_length_cat'] = pd.qcut(iris_data['sepal length (cm)'], q=3,
                                        labels=['short', 'medium', 'long'])

# Create a contingency table
contingency_table = pd.crosstab(iris_data['sepal_length_cat'], iris_data['species'])

# Perform the Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table)
print(f"Chi-Square Statistic: {chi2}, P-Value: {p}")
```

Output: Chi-Square Statistic: 123.28296703296704, P-Value: 1.0624436052362445e-25
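The chi-square approximation is only trustworthy when expected counts are not too small. This sketch rebuilds the Iris contingency table and inspects the `dof` and `expected` values that `chi2_contingency` already returns; the "every expected count at least 5" threshold is a common rule of thumb assumed here, not a hard requirement.

```python
import pandas as pd
from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency

iris = load_iris()
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_data["species"] = iris.target
iris_data["sepal_length_cat"] = pd.qcut(
    iris_data["sepal length (cm)"], q=3, labels=["short", "medium", "long"]
)
table = pd.crosstab(iris_data["sepal_length_cat"], iris_data["species"])

chi2, p, dof, expected = chi2_contingency(table)
print(table)
print("degrees of freedom:", dof)  # (3 - 1) * (3 - 1) for a 3 x 3 table
print("smallest expected count:", expected.min())
# Rule of thumb (assumed): all expected counts should be at least 5
print("expected counts >= 5:", bool((expected >= 5).all()))
```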

Conclusion

Both T-tests and Chi-Square tests are essential tools in data analysis, providing insights into the relationships between variables and helping to validate hypotheses. Their proper application can lead to meaningful conclusions and better decision-making based on statistical evidence.

Note: You can run the above Python code in your own environment to see the results of the T-test and Chi-Square test on these datasets.


About Me:
🖇️LinkedIn
🧑‍💻GitHub
