Accept or Reject an Outlier?

Stephen Lukacs, iquanta.org/instruct/python

Understanding Grubbs' Outlier Test

Grubbs' test was first proposed by Frank Grubbs in 1950. He proposed that, in any one-dimensional dataset, the smallest or largest value may be thrown out if it is truly an outlier. The test uses a statistical criterion, which assumes the data are approximately normally distributed, to decide whether that value should be rejected and disregarded or must be accepted and kept in the dataset.

It is a three-step process. First, calculate the Grubbs value using the equation: $$g = \frac{\lvert \text{outlier} - \bar{x} \rvert}{\sigma}$$ where the outlier is either the minimum or the maximum datapoint of the dataset being tested, and \(\bar{x}\) and \(\sigma\) are the mean and the sample standard deviation of the whole dataset (suspected outlier included), respectively.
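As a quick check of the first step, here is a minimal Python sketch that computes g for the page's demonstration data (6.18, 6.28, 4.85, 6.49). It assumes σ is the sample standard deviation (n − 1 in the denominator), which is what `statistics.stdev` returns; `grubbs_g` is an illustrative name, not the `Gtest` routine the page itself calls.

```python
import statistics

def grubbs_g(data, which="minimum"):
    """Return (suspect, g) for the minimum or maximum datapoint.

    Assumption: sigma is the sample standard deviation (n - 1 denominator)
    of the full dataset, with the suspected outlier still included.
    """
    suspect = min(data) if which == "minimum" else max(data)
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)
    return suspect, abs(suspect - mean) / sigma

demo = [6.18, 6.28, 4.85, 6.49]  # the page's demonstration data
for which in ("minimum", "maximum"):
    suspect, g = grubbs_g(demo, which)
    print(f"{which}: {suspect} -> g = {g:.4f}")
```

For these four points the mean is 5.95 and σ ≈ 0.7446, so the minimum (4.85) gives g ≈ 1.477 while the maximum (6.49) gives only g ≈ 0.725.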

Second, look up the Grubbs critical value in the table below for the number of datapoints, n, and the required confidence level, CL. Third, if g is less than or equal to the critical value, the datapoint must be accepted and kept; if g is greater than the critical value, you may reject and disregard that datapoint.
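Steps two and three amount to a table lookup and a single comparison. The sketch below hard-codes only the n = 3 to 6 rows of the critical-value table (copied verbatim) and uses g ≈ 1.4773, the value for the minimum (4.85) of the page's demonstration data; `G_CRIT` and `grubbs_decision` are illustrative names, not part of the page's code.

```python
# Critical values copied from the table on this page (n = 3 to 6 only, for brevity).
CLS = (50, 80, 90, 95, 98, 99, 99.5, 99.9)
G_CRIT = {
    3: (1.00000, 1.12947, 1.14837, 1.15312, 1.15445, 1.15464, 1.15468, 1.15470),
    4: (1.12500, 1.35000, 1.42500, 1.46250, 1.48500, 1.49250, 1.49625, 1.49925),
    5: (1.22903, 1.48971, 1.60163, 1.67139, 1.72528, 1.74886, 1.76368, 1.78025),
    6: (1.31660, 1.59429, 1.72888, 1.82212, 1.90360, 1.94425, 1.97282, 2.01074),
}

def grubbs_decision(g, n, cl):
    """Step 2: look up the critical value; step 3: compare it with g."""
    g_crit = G_CRIT[n][CLS.index(cl)]
    # g <= critical value: keep the point; g > critical value: it may be rejected
    verdict = "must be accepted" if g <= g_crit else "may be rejected"
    return g_crit, verdict

g = 1.4773  # g for the minimum of the demo data 6.18, 6.28, 4.85, 6.49 (n = 4)
for cl in CLS:
    g_crit, verdict = grubbs_decision(g, 4, cl)
    print(f"{cl}% CL: critical value {g_crit:.5f} -> {verdict}")
```

For these four points the minimum may be rejected at up to 95% confidence (1.4773 > 1.46250) but must be kept at 98% and above (1.4773 ≤ 1.48500).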

. .. Grubbs' Critical Values .. .

 n   50% CL   80% CL   90% CL   95% CL   98% CL   99% CL 99.5% CL 99.9% CL
 3  1.00000  1.12947  1.14837  1.15312  1.15445  1.15464  1.15468  1.15470
 4  1.12500  1.35000  1.42500  1.46250  1.48500  1.49250  1.49625  1.49925
 5  1.22903  1.48971  1.60163  1.67139  1.72528  1.74886  1.76368  1.78025
 6  1.31660  1.59429  1.72888  1.82212  1.90360  1.94425  1.97282  2.01074
 7  1.39131  1.67850  1.82798  1.93813  2.04161  2.09730  2.13911  2.20059
 8  1.45604  1.74908  1.90895  2.03165  2.15249  2.22083  2.27437  2.35863
 9  1.51291  1.80978  1.97726  2.10956  2.24427  2.32315  2.38681  2.49195
10  1.56350  1.86296  2.03623  2.17607  2.32203  2.40972  2.48208  2.60593
11  1.60895  1.91022  2.08801  2.23391  2.38915  2.48428  2.56412  2.70460
12  1.65016  1.95270  2.13410  2.28495  2.44796  2.54942  2.63573  2.79095
13  1.68780  1.99123  2.17556  2.33054  2.50011  2.60702  2.69897  2.86730
14  1.72241  2.02645  2.21320  2.37165  2.54685  2.65848  2.75537  2.93538
15  1.75440  2.05885  2.24762  2.40904  2.58909  2.70486  2.80611  2.99656
16  1.78413  2.08884  2.27931  2.44327  2.62756  2.74696  2.85208  3.05192
17  1.81187  2.11672  2.30863  2.47481  2.66282  2.78545  2.89401  3.10233
18  1.83786  2.14276  2.33591  2.50402  2.69532  2.82082  2.93248  3.14846
19  1.86230  2.16717  2.36139  2.53119  2.72542  2.85350  2.96795  3.19090
20  1.88534  2.19014  2.38527  2.55658  2.75343  2.88382  3.00080  3.23012
21  1.90714  2.21181  2.40775  2.58039  2.77959  2.91208  3.03136  3.26649
22  1.92780  2.23231  2.42895  2.60278  2.80411  2.93850  3.05988  3.30036
23  1.94745  2.25176  2.44901  2.62392  2.82716  2.96330  3.08659  3.33200
24  1.96616  2.27026  2.46805  2.64391  2.84890  2.98663  3.11169  3.36164
25  1.98401  2.28788  2.48614  2.66287  2.86946  3.00864  3.13533  3.38950
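The table can also be regenerated. Its entries appear to follow the standard one-sided formula $$G_{crit} = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^2}{n-2+t^2}}$$ where t is the upper \(\alpha/n\) quantile of Student's t distribution with n − 2 degrees of freedom and \(\alpha = 1 - CL/100\). Working several n = 3 and n = 4 entries by hand (where t has a closed form) reproduces the tabulated values, but the page does not say how the table was built, so treat the formula as an assumption. The sketch below stays within the standard library by evaluating the t tail probability through the regularized incomplete beta function (a Numerical Recipes style continued fraction) and inverting it by bisection; `grubbs_critical` and its helpers are hypothetical names. If SciPy is available, `scipy.stats.t.isf(alpha / n, n - 2)` gives the same t directly.

```python
from math import exp, lgamma, log, sqrt

def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after a full even/odd step
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_upper_tail(t, df):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0."""
    return 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))

def t_upper_quantile(p, df):
    """The t >= 0 with P(T > t) = p (0 < p <= 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def grubbs_critical(n, cl):
    """Assumed one-sided Grubbs critical value for n points at cl percent confidence."""
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 datapoints")
    alpha = 1.0 - cl / 100.0
    t = t_upper_quantile(alpha / n, n - 2)
    return (n - 1) / sqrt(n) * sqrt(t * t / (n - 2 + t * t))

for n in (3, 4, 10, 25):
    print(n, [round(grubbs_critical(n, cl), 5) for cl in (50, 95, 99.9)])
```

Comparing a few printed rows with the table above is a quick check of the one-sided assumption; the bisection is crude, but it converges far below the five decimals shown.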
Lecture by Stephen Lukacs, Ph.D., ©2011 - 2023; updated: March 7, 2023. All data confirmed via lecture_data_analysis.nb.
"""
reference: https://iquanta.org/instruct/python ::: Statistics 3: Grubb's Outlier Test ::: Stephen Lukacs, Ph.D. ©2023-02-14
"""
from py4web import URL, request
from yatl.helpers import *
from iquanta.mcp import is_str_float, is_str_int, str_to_float, str_to_int, extra_x
from iquanta.chmpy import Gtest, grubbs

BR, B = TAG['br/'], TAG['b']
#demo_data = "0.1190\n0.09847\n0.09852"
demo_data = "6.18\n6.28\n4.85\n6.49"

rtn = FORM(_action=None, _method="post")
if ('txtfile' in request.forms):
    txt, data = request.forms['txtfile'], [ ]
    # keep only the lines that parse as numbers; blank or non-numeric lines are skipped
    for l in txt.strip().split('\n'):
        if is_str_float(l.strip()):
            data.append(str_to_float(l.strip()))
    #rtn.append(CAT(data, BR()))
otype = request.forms.get('otype')

rtn.append(STYLE("input[type=text] { width: 70px; text-align: center; border-radius: 7px; } textarea { margin: 0px; width: 295px; height: 200px; border-radius: 5px; } p { margin: 2px 0px; padding: 8px; border-radius: 10px; border: 2px solid silver; }"))
rtn.append(CAT(
    # left column: the measurements (re-displays the submitted text, else the demo data)
    DIV("Enter the measurements to be averaged below...", BR(),
        TEXTAREA(txtfile if ('txtfile' in locals()) else demo_data, _name="txtfile"),
        _style="float:left;"),
    # right column: which extreme to test, plus the submit button
    DIV(BR(), "Which datapoint to test?", BR(),
        SELECT(OPTION("Minimum Outlier", _value="minimum"),
               OPTION("Maximum Outlier", _value="maximum"), _name="otype"),
        *[BR()]*2,
        INPUT(_type="submit", _value="Upload"),
        ", or just click Upload to run the demonstration.",
        _style="float:left; margin: 5px;"),
    DIV(_style="float:none;clear:both;"),
))

#rtn.append(otype)
if ('data' in locals()) and len(data) >= 3:  # Grubbs' test (and its table) needs at least 3 datapoints
    CIs = (50, 80, 90, 95, 99, 99.5, 99.9,)
    Gtests = [ Gtest(data, otype, ci) for ci in CIs ]
    g = Gtests[-1]
    p = P()
    #p.append(CAT(str(g), BR()))
    p.append(CAT(B(f'You may reject the datapoint ({g[7]}) with {"only " if (g[5] < 50.) else ""}{g[5]:.2f}% confidence, or:', BR(), _style="font-size:18pt; font-weight:bold;")))
    for ci, res in zip(CIs, Gtests):
        # judging by the messages below, res[6] is True when the datapoint must be accepted at this level
        if res[6]:
            p.append(SPAN(XML(f'<b>at {ci}% confidence</b> the datapoint <b>must be accepted</b> with x&#772; = {res[1]:.4g} and &sigma; = {res[2]:.4g}.<br/>'), _style="font-size:14pt;"))
        else:
            p.append(SPAN(XML(f'<b>at {ci}% confidence</b> the datapoint <b>may be rejected</b> with x&#772; = {res[1]:.4g} and &sigma; = {res[2]:.4g} if accepted, and x&#772; = {res[8]:.4g} and &sigma; = {res[9]:.4g} if rejected.<br/>'), _style="font-size:14pt;"))
    rtn.append(p)
dv = DIV(
    H3("Understanding Grubbs' Outlier Test"),
    "Grubbs' test was first proposed by Frank Grubbs in 1950.  He proposed that, in any one-dimensional dataset, the smallest or largest value may be thrown out if it is truly an outlier.  The test uses a statistical criterion, which assumes the data are approximately normally distributed, to decide whether that value should be rejected and disregarded or must be accepted and kept in the dataset.",
    *[BR()]*2,
    r"It is a three-step process.  First, calculate the Grubbs value using the equation: $$g = \frac{\lvert \text{outlier} - \bar{x} \rvert}{\sigma}$$ where the outlier is either the minimum or the maximum datapoint of the dataset being tested, and \(\bar{x}\) and \(\sigma\) are the mean and the sample standard deviation of the whole dataset (suspected outlier included), respectively.",
    *[BR()]*2,
    "Second, look up the Grubbs critical value in the table below for the number of datapoints, n, and the required confidence level, CL.  Third, if g is less than or equal to the critical value, the datapoint must be accepted and kept; if g is greater than the critical value, you may reject and disregard that datapoint.",
    *[BR()]*2,
    _style="",
)
g = grubbs()
gt = g.find('table#grubbs')[0]
gt['_style'] = "margin: auto;"
dv.append(CAT(g.find('style')[0], H3(". .. Grubbs' Critical Values .. ."), gt,))
#dv.append(CAT(*[BR()]*4, g))
dv.append(CAT("Lecture by Stephen Lukacs, Ph.D., ©2011 - 2023; updated: March 7, 2023.  All data confirmed via ", A("lecture_data_analysis.nb", _href=URL('static', "pdf/lecture_data_analysis8.pdf"), _target="data_analysis"), "."))
rtn.append(dv)
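The controller above hands the arithmetic to `Gtest` from `iquanta.chmpy`, whose source is not shown here, so the meaning of its result indices (`g[1]`, `g[8]`, and so on) can only be inferred from how they are printed. As a framework-free sketch of the same report, assuming the "if rejected" figures are simply the mean and sample standard deviation recomputed without the suspect point, the hypothetical helpers `parse_measurements` and `outlier_report` below reproduce that logic for the demonstration data, using three critical values copied from the n = 4 row of the table.

```python
import statistics

def parse_measurements(text):
    """Approximate the page's parsing: one number per line, skipping non-numeric lines."""
    values = []
    for line in text.strip().split("\n"):
        try:
            values.append(float(line.strip()))
        except ValueError:
            pass  # blank or non-numeric line
    return values

def outlier_report(data, which, crit_by_cl):
    """For each confidence level, report whether the min/max point may be rejected.

    crit_by_cl maps CL (%) -> Grubbs critical value for len(data) points.
    """
    suspect = min(data) if which == "minimum" else max(data)
    mean, sd = statistics.mean(data), statistics.stdev(data)
    g = abs(suspect - mean) / sd
    rest = list(data)
    rest.remove(suspect)  # drop one copy of the suspect value
    mean_r, sd_r = statistics.mean(rest), statistics.stdev(rest)
    lines = []
    for cl, g_crit in crit_by_cl.items():
        if g <= g_crit:
            lines.append(f"at {cl}% confidence {suspect} must be accepted: "
                         f"mean = {mean:.4g}, sd = {sd:.4g}")
        else:
            lines.append(f"at {cl}% confidence {suspect} may be rejected: "
                         f"mean = {mean:.4g}, sd = {sd:.4g} if accepted; "
                         f"mean = {mean_r:.4g}, sd = {sd_r:.4g} if rejected")
    return g, lines

# subset of the n = 4 row of the critical-value table
CRIT_N4 = {90: 1.42500, 95: 1.46250, 99: 1.49250}
data = parse_measurements("6.18\n6.28\n4.85\n6.49")
g, lines = outlier_report(data, "minimum", CRIT_N4)
print("\n".join(lines))
```

Dropping 4.85 moves the mean from 5.95 to about 6.317 and shrinks the standard deviation from about 0.7446 to about 0.1582, which is why the page shows both sets of figures when a point may be rejected.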