Accept or Reject an Outlier?
Stephen Lukacs (2) iquanta.org/instruct/python
"""
reference: https://iquanta.org/instruct/python ::: Statistics 3: Grubb's Outlier Test ::: Stephen Lukacs, Ph.D. ©2023-02-14
"""
from py4web import URL, request
from yatl.helpers import *
from iquanta.mcp import is_str_float, is_str_int, str_to_float, str_to_int, extra_x
from iquanta.chmpy import Gtest, grubbs
BR, B = TAG['br/'], TAG['b']
#demo_data = "0.1190\n0.09847\n0.09852"
demo_data = "6.18\n6.28\n4.85\n6.49"
rtn = FORM(_action=None, _method="post")
if ('txtfile' in request.forms):
    txt, data = request.forms['txtfile'], [ ]
    for l in txt.strip().split('\n'):
        if is_str_float(l.strip()):
            data.append(str_to_float(l.strip()))
    #rtn.append(CAT(data, BR()))
    otype = request.forms.get('otype')
rtn.append(STYLE("input[type=text] { width: 70px; text-align: center; border-radius: 7px; } textarea { margin: 0px; width: 295px; height: 200px; border-radius: 5px; } p { margin: 2px 0px; padding: 8px; border-radius: 10px; border: 2px solid silver; }"))
rtn.append(CAT(DIV("Enter the measurements to be averaged below...", BR(), TEXTAREA(txt if ('txt' in locals()) else demo_data, _name="txtfile"), _style="float:left;"), DIV(BR(), "Which datapoint to test?", BR(), SELECT(OPTION("Minimum Outlier", _value="minimum"), OPTION("Maximum Outlier", _value="maximum"), _name="otype"), *[BR()]*2, INPUT(_type="submit", _value="Upload"), ", or, just Upload to run the demonstration.", _style="float:left; margin: 5px;"), DIV(_style="float:none;clear:both;")))
#rtn.append(otype)
if ('data' in locals()):
    CIs = (50, 80, 90, 95, 99, 99.5, 99.9,)
    Gtests = [ Gtest(data, otype, ci) for ci in CIs ]
    g = Gtests[-1]
    p = P()
    #p.append(CAT(str(g), BR()))
    p.append(CAT(B(f'You may reject the datapoint ({g[7]}) with {"only " if (g[5] < 50.) else ""}{g[5]:.2f}% confidence, or:', BR(), _style="font-size:18pt; font-weight:bold;")))
    for i, (ci, g) in enumerate(zip(CIs, Gtests), 1):
        #p.append(CAT(XML(g), BR()*2,))
        if g[6]:
            p.append(SPAN(XML(f'<b>at {ci}% confidence</b> the datapoint <b>must be accepted</b> with x̄ = {g[1]:.4g} and σ = {g[2]:.4g}.<br/>'), _style="font-size:14pt;"))
        else:
            p.append(SPAN(XML(f'<b>at {ci}% confidence</b> the datapoint <b>may be rejected</b> with x̄ = {g[1]:.4g} and σ = {g[2]:.4g} if accepted, and x̄ = {g[8]:.4g} and σ = {g[9]:.4g} if rejected.<br/>'), _style="font-size:14pt;"))
    rtn.append(p)
dv = DIV(H3("Understanding Grubbs' Outlier Test"), "Grubbs' test was first proposed by Frank Grubbs in 1950. He proposed that, in any one-dimensional dataset, the smallest or largest value may be discarded if it is truly an outlier. The test uses statistical analysis to determine whether the suspect datapoint may be rejected and disregarded, or must be accepted and retained in the dataset.", *[BR()]*2, r"It is a three-step process. First, calculate the Grubbs' value using the equation: $$g = \frac{\mid \text{outlier} - \bar{x} \mid}{ \sigma }$$ where the outlier is either the minimum or maximum datapoint of the dataset to be tested, and \(\bar{x}\) and \(\sigma\) are the mean and standard deviation of the dataset, respectively.", *[BR()]*2, "Second, look up the Grubbs' critical value for the number of datapoints, n, and the required confidence level, CL, in the table below. Third, if the g value is less than or equal to the critical value, the datapoint must be accepted and retained. If the g value is greater than the critical value, you may reject and disregard that datapoint.", *[BR()]*2, _style="")
g = grubbs()
gt = g.find('table#grubbs')[0]
gt['_style'] = "margin: auto;"
dv.append(CAT(g.find('style')[0], H3(". .. Grubbs' Critical Values .. ."), gt,))
#dv.append(CAT(*[BR()]*4, g))
dv.append(CAT("lecture by Stephen Lukacs, Ph.D., ©2011 - 2023; updated: March 7, 2023. all data confirmed via ", A("lecture_data_analysis.nb", _href=URL('static', "pdf/lecture_data_analysis8.pdf"), _target="data_analysis"), "."))
rtn.append(dv)
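
The three-step test described in the page text above can be sketched in plain Python. This is a minimal illustration, not the `Gtest` routine from `iquanta.chmpy`, whose internals are not shown here. The function names `grubbs_statistic` and `may_reject` are hypothetical, the sample standard deviation (n - 1 denominator) is assumed for σ, and the critical value is passed in by the caller after being read from the table rather than computed.

```python
from statistics import mean, stdev


def grubbs_statistic(data, which="minimum"):
    """Step 1: return (g, suspect, x_bar, s) for the chosen extreme datapoint.

    Hypothetical helper for illustration only.
    """
    if len(data) < 3:
        raise ValueError("Grubbs' test needs at least three datapoints")
    # The suspect is the smallest or largest value in the dataset.
    suspect = min(data) if which == "minimum" else max(data)
    x_bar = mean(data)
    s = stdev(data)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("all datapoints are identical; g is undefined")
    g = abs(suspect - x_bar) / s
    return g, suspect, x_bar, s


def may_reject(g, g_crit):
    """Step 3: the datapoint may be rejected only if g exceeds the critical value."""
    return g > g_crit


# Demo data from this page, testing the minimum datapoint.
demo = [6.18, 6.28, 4.85, 6.49]
g, suspect, x_bar, s = grubbs_statistic(demo, "minimum")

# Step 2 is a table lookup. 1.463 is assumed here as the commonly tabulated
# value for n = 4 at 95% confidence; confirm it against the table on this page.
g_crit = 1.463
print(f"suspect = {suspect}, g = {g:.4f}, may reject: {may_reject(g, g_crit)}")
```

For the demo data, g comes out at about 1.48. Whether that exceeds the critical value depends on which table and confidence level are used, which is why the critical value is kept as an explicit argument.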