UncertaintyBase

class stats_arrays.UncertaintyBase

Bases: object

Abstract base class for uncertainty types.

All methods on uncertainty classes should be class methods, as instantiating uncertainty classes many times is not desired.

Defaults

  • default_number_points_in_pdf: 200. The default number of points to calculate for PDF/CDF functions.
  • standard_deviations_in_default_range: 3. The number of standard deviations that define the default range when calculating PDF/CDF values. In a normal distribution, 3 standard deviations on either side of the mean cover approximately 99.7% of all values.
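
The coverage of a range of standard deviations can be checked with the Python standard library. This is a minimal sketch that uses statistics.NormalDist rather than stats_arrays itself; coverage is a hypothetical helper name.

```python
from statistics import NormalDist


def coverage(k):
    """Share of a normal distribution within +/- k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(coverage(3), 4))  # 0.9973
```
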
classmethod bounded_random_variables(params, size, seeded_random=None, maximum_iterations=50)

Generate random variables repeatedly until all variables are within the bounds of each distribution. Raise MaximumIterationsError if this takes more than maximum_iterations. Uses random_variables for random number generation.

Inputs

  • params : A Parameter array.
  • size : Integer. The number of values to draw from each distribution in params.
  • seeded_random : Integer. Optional. Random seed to get repeatable samples.
  • maximum_iterations : Integer. Optional. Maximum iterations to try to fit the given bounds before an error is raised.

Output

An array of random values, with dimensions params rows by size.
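
The resampling idea can be sketched for a single normal distribution using only the standard library. This is an illustration, not the library's implementation: bounded_normal_samples and its seed argument are hypothetical, the MaximumIterationsError class here is a local stand-in, and redrawing only the out-of-bounds values is an assumption about the strategy.

```python
import random


class MaximumIterationsError(Exception):
    """Local stand-in for the library error of the same name."""


def bounded_normal_samples(loc, scale, minimum, maximum, size,
                           seed=None, maximum_iterations=50):
    """Draw `size` normal samples, redrawing any that fall outside the bounds."""
    rng = random.Random(seed)
    samples = [rng.gauss(loc, scale) for _ in range(size)]
    iterations = 0
    while True:
        # Indices of samples that violate the bounds
        out_of_bounds = [i for i, value in enumerate(samples)
                         if not minimum <= value <= maximum]
        if not out_of_bounds:
            return samples
        if iterations >= maximum_iterations:
            raise MaximumIterationsError(
                "Could not fit all samples within the bounds")
        # Redraw only the offending samples (an assumption of this sketch)
        for i in out_of_bounds:
            samples[i] = rng.gauss(loc, scale)
        iterations += 1
```

With bounds far out in the tail of the distribution, almost every draw is rejected and the iteration limit is reached quickly, which is the case the error is meant to signal.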

classmethod cdf(params, vector)

Used when a distribution is bounded, to determine where to begin or end the percentages used in calculating hypercube sampling space.

Inputs

  • params : A Parameter array.
  • vector : An array of values taken from the uncertainty distributions, with one row or the same number of rows as params.

Output

An array of cumulative densities, bounded on (0,1), with params rows and vector columns.
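
One reading of the description above is that the CDF values at the minimum and maximum bounds give the window of percentages to sample from. The sketch below illustrates this for a normal distribution with statistics.NormalDist. The helper names percentile_window and stratified_values are hypothetical, and the midpoint stratification is an assumption, not the library's sampling scheme.

```python
from statistics import NormalDist


def percentile_window(loc, scale, minimum, maximum):
    """Return the cumulative probabilities at the lower and upper bounds."""
    dist = NormalDist(mu=loc, sigma=scale)
    return dist.cdf(minimum), dist.cdf(maximum)


def stratified_values(loc, scale, minimum, maximum, steps):
    """Map evenly spaced percentages inside the window back to values."""
    dist = NormalDist(mu=loc, sigma=scale)
    lower, upper = percentile_window(loc, scale, minimum, maximum)
    # Midpoint of each of `steps` equal-width strata between lower and upper
    percentages = [lower + (upper - lower) * (i + 0.5) / steps
                   for i in range(steps)]
    return [dist.inv_cdf(p) for p in percentages]
```

Because every percentage lies strictly between the CDF values at the two bounds, every value returned by stratified_values falls within the bounds.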

classmethod check_2d_inputs(params, vector)

Convert vector to 2 dimensions if not already, and raise stats_arrays.InvalidParamsError if vector and params dimensions don’t match.

classmethod check_bounds_reasonableness(params, *args, **kwargs)

Test if there is at least a threshold percent chance of generating random numbers within the provided bounds.

Doesn’t return anything. Raises stats_arrays.UnreasonableBoundsError if this condition is not met.

Inputs

  • params : A one-row Parameter array.
  • threshold : A fraction between 0 and 1. The minimum share of the distribution that the bounds must cover; an error is raised if the coverage is lower.
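
As an illustration of this kind of check, the sketch below computes the probability mass that falls between the bounds of a normal distribution and compares it with a threshold. It uses statistics.NormalDist instead of stats_arrays; check_normal_bounds, the default threshold of 0.1, and the local UnreasonableBoundsError class are all assumptions made for the example.

```python
from statistics import NormalDist


class UnreasonableBoundsError(Exception):
    """Local stand-in for the library error of the same name."""


def check_normal_bounds(loc, scale, minimum, maximum, threshold=0.1):
    """Raise if the bounds cover less than `threshold` of the distribution."""
    dist = NormalDist(mu=loc, sigma=scale)
    covered = dist.cdf(maximum) - dist.cdf(minimum)
    if covered < threshold:
        raise UnreasonableBoundsError(
            f"Bounds cover only {covered:.2%} of the distribution")
```
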
classmethod from_dicts(*dicts)

Construct a Heterogeneous parameter array from parameter dictionaries.

Dictionary keys are the normal parameter array columns. Each distribution defines which columns are required and which are optional.

Example:

>>> from stats_arrays import UncertaintyBase
>>> import numpy as np
>>> UncertaintyBase.from_dicts(
...     {'loc': 2, 'scale': 3, 'uncertainty_type': 3},
...     {'loc': 5, 'minimum': 3, 'maximum': 10, 'uncertainty_type': 5}
...     )
array([(2.0, 3.0, nan, nan, nan, False, 3),
       (5.0, nan, nan, 3.0, 10.0, False, 5)],
       dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'),
              ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'),
              ('uncertainty_type', 'u1')])
Args:
One or more dictionaries.
Returns:
A Heterogeneous parameter array
classmethod from_tuples(*data)

Construct a Heterogeneous parameter array from parameter tuples.

The order of the parameters is:

  1. loc
  2. scale
  3. shape
  4. minimum
  5. maximum
  6. negative
  7. uncertainty_type

Each input tuple must have a length of exactly 7. For more flexibility, use from_dicts.

Example:

>>> from stats_arrays import UncertaintyBase
>>> import numpy as np
>>> UncertaintyBase.from_tuples(
...     (2, 3, np.nan, np.nan, np.nan, False, 3),
...     (5, np.nan, np.nan, 3, 10, False, 5)
...     )
array([(2.0, 3.0, nan, nan, nan, False, 3),
       (5.0, nan, nan, 3.0, 10.0, False, 5)],
       dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'),
              ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'),
              ('uncertainty_type', 'u1')])
Args:
One or more tuples of length 7.
Returns:
A Heterogeneous parameter array
classmethod pdf(params, *args, **kwargs)

Provide a standard interface to calculate the probability density function of an uncertainty distribution. By default, the function is evaluated at cls.default_number_points_in_pdf points, spanning the minimum to maximum range if bounds are present, or cls.standard_deviations_in_default_range standard deviations otherwise.

Inputs

  • params : A one-row Parameter array.
  • xs : Optional. A one-dimensional numpy array of input values.

Output

Important

The output format for PDF is different than CDF or PPF.

A tuple of a vector of x values and a vector of y values. The y values are a one-dimensional array of probability densities, bounded on (0,1), with the same length as xs, if provided, or cls.default_number_points_in_pdf otherwise.
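
The default x range described above can be sketched for a normal distribution as follows. This is not the library's code: normal_pdf is a hypothetical function, the two constants mirror the class defaults listed earlier, and centering the unbounded range on loc is an assumption.

```python
from statistics import NormalDist

DEFAULT_NUMBER_POINTS_IN_PDF = 200
STANDARD_DEVIATIONS_IN_DEFAULT_RANGE = 3


def normal_pdf(loc, scale, minimum=None, maximum=None, xs=None):
    """Return (xs, ys) for a normal distribution, building xs if not given."""
    dist = NormalDist(mu=loc, sigma=scale)
    if xs is None:
        half_width = STANDARD_DEVIATIONS_IN_DEFAULT_RANGE * scale
        # Use a bound where one is given, else fall back to loc +/- 3 * scale
        start = loc - half_width if minimum is None else minimum
        stop = loc + half_width if maximum is None else maximum
        step = (stop - start) / (DEFAULT_NUMBER_POINTS_IN_PDF - 1)
        xs = [start + i * step for i in range(DEFAULT_NUMBER_POINTS_IN_PDF)]
    ys = [dist.pdf(x) for x in xs]
    return xs, ys
```

Note that the return value is a pair of vectors, which matches the output format described above and differs from the single array returned by the CDF and PPF methods.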

classmethod ppf(params, percentages)

Return the percent point function (the inverse of the CDF, i.e. the value in the distribution below which x percent of the distribution lies) for various distributions.

Inputs

  • params : A Parameter array.
  • percentages : An array of percentages, bounded on (0,1). Each row in percentages corresponds to a row in params.

Output

An array of values within the ranges of each distribution, with params rows and percentages columns.
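
The relationship between the PPF and the CDF can be illustrated with statistics.NormalDist from the standard library. The function normal_ppf below is a hypothetical stand-in for a single normal distribution, not the stats_arrays method.

```python
from statistics import NormalDist


def normal_ppf(loc, scale, percentages):
    """Return the value at each cumulative probability in `percentages`."""
    dist = NormalDist(mu=loc, sigma=scale)
    return [dist.inv_cdf(p) for p in percentages]


values = normal_ppf(loc=10.0, scale=2.0, percentages=[0.025, 0.5, 0.975])
# Applying the CDF to each value recovers the original percentages
round_trip = [NormalDist(10.0, 2.0).cdf(v) for v in values]
```
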

classmethod random_variables(params, size, seeded_random=None)

Generate random variables for the given uncertainty. Should not check that random samples are within the (minimum, maximum) bounds. Bounds checking is provided by the bounded_random_variables class method.

Inputs

  • params : A Parameter array.
  • size : Integer. The number of values to draw from each distribution in params.
  • seeded_random : Integer. Optional.

Output

An array of random values, with dimensions params rows by size.
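
The output shape, one row per distribution and size columns, can be sketched with the standard library. Here normal_random_variables is a hypothetical helper, params is simplified to a list of (loc, scale) pairs rather than a Parameter array, and seed is an illustrative argument.

```python
import random


def normal_random_variables(params, size, seed=None):
    """Draw `size` unbounded normal samples for each (loc, scale) row."""
    rng = random.Random(seed)
    return [[rng.gauss(loc, scale) for _ in range(size)]
            for loc, scale in params]


samples = normal_random_variables([(2.0, 3.0), (5.0, 1.0)], size=4, seed=7)
# samples has 2 rows (one per distribution) and 4 columns (one per draw)
```
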

classmethod statistics(params, *args, **kwargs)

Build a dictionary of mean, mode, median, and 95% confidence interval upper and lower values.

Inputs

  • params : A one-row Parameter array.

Output

{'mean': mean value, 'mode': mode value, 'median': median value, 'upper': upper limit value, 'lower': lower limit value}. All values should be floats (not single-element arrays). Parameters that are not defined should be returned as None, not omitted.
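
A sketch of a dictionary in this format, built with the standard library, is shown below. Both functions are hypothetical and not part of stats_arrays. Taking the 95% interval from the 2.5th and 97.5th percentiles, and reporting the mode of a uniform distribution as None, are assumptions made to illustrate the conventions.

```python
from statistics import NormalDist


def normal_statistics(loc, scale):
    """Summary statistics for a normal distribution, all as plain floats."""
    dist = NormalDist(mu=loc, sigma=scale)
    return {
        "mean": dist.mean,
        "mode": dist.mode,
        "median": dist.median,
        "lower": dist.inv_cdf(0.025),  # lower end of the central 95% interval
        "upper": dist.inv_cdf(0.975),  # upper end of the central 95% interval
    }


def uniform_statistics(minimum, maximum):
    """Summary statistics for a uniform distribution; the mode is undefined."""
    span = maximum - minimum
    center = (minimum + maximum) / 2
    return {
        "mean": center,
        "mode": None,  # kept in the dictionary as None rather than omitted
        "median": center,
        "lower": minimum + 0.025 * span,
        "upper": minimum + 0.975 * span,
    }
```
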

classmethod validate(params)

Validate the parameter array for uncertainty distribution.

Validation is distribution specific. The only default check is that minimum is less than or equal to maximum; if it is not, stats_arrays.ImproperBoundsError is raised.

Doesn’t return anything.

Args:
A Parameter array.
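
The default bounds check can be sketched as follows. This is an illustration only: validate_bounds is a hypothetical function, rows are plain dictionaries instead of a Parameter array, ImproperBoundsError is a local stand-in, and skipping rows with a missing (NaN) bound is an assumption.

```python
import math


class ImproperBoundsError(Exception):
    """Local stand-in for the library error of the same name."""


def validate_bounds(rows):
    """Raise if any row has a minimum greater than its maximum."""
    for row in rows:
        minimum, maximum = row["minimum"], row["maximum"]
        # A NaN bound is treated as "not set" in this sketch
        if math.isnan(minimum) or math.isnan(maximum):
            continue
        if minimum > maximum:
            raise ImproperBoundsError(
                f"minimum {minimum} is greater than maximum {maximum}")
```
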

BoundedUncertaintyBase

class stats_arrays.BoundedUncertaintyBase

Bases: stats_arrays.distributions.base.UncertaintyBase

An uncertainty distribution where minimum and maximum bounds are required. No bounds checking is required for these distributions, as bounds are integral inputs into the sample space generator.

classmethod bounded_random_variables(params, size, seeded_random=None, maximum_iterations=None)

No bounds checking because the bounds do not exclude any of the distribution.

cdf(params, vector)

Used when a distribution is bounded, to determine where to begin or end the percentages used in calculating hypercube sampling space.

Inputs

  • params : A Parameter array.
  • vector : An array of values taken from the uncertainty distributions, with one row or the same number of rows as params.

Output

An array of cumulative densities, bounded on (0,1), with params rows and vector columns.

check_2d_inputs(params, vector)

Convert vector to 2 dimensions if not already, and raise stats_arrays.InvalidParamsError if vector and params dimensions don’t match.

classmethod check_bounds_reasonableness(params, *args, **kwargs)

Always true because the bounds do not exclude any of the distribution.

from_dicts(*dicts)

Construct a Heterogeneous parameter array from parameter dictionaries.

Dictionary keys are the normal parameter array columns. Each distribution defines which columns are required and which are optional.

Example:

>>> from stats_arrays import UncertaintyBase
>>> import numpy as np
>>> UncertaintyBase.from_dicts(
...     {'loc': 2, 'scale': 3, 'uncertainty_type': 3},
...     {'loc': 5, 'minimum': 3, 'maximum': 10, 'uncertainty_type': 5}
...     )
array([(2.0, 3.0, nan, nan, nan, False, 3),
       (5.0, nan, nan, 3.0, 10.0, False, 5)],
       dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'),
              ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'),
              ('uncertainty_type', 'u1')])
Args:
One or more dictionaries.
Returns:
A Heterogeneous parameter array
from_tuples(*data)

Construct a Heterogeneous parameter array from parameter tuples.

The order of the parameters is:

  1. loc
  2. scale
  3. shape
  4. minimum
  5. maximum
  6. negative
  7. uncertainty_type

Each input tuple must have a length of exactly 7. For more flexibility, use from_dicts.

Example:

>>> from stats_arrays import UncertaintyBase
>>> import numpy as np
>>> UncertaintyBase.from_tuples(
...     (2, 3, np.nan, np.nan, np.nan, False, 3),
...     (5, np.nan, np.nan, 3, 10, False, 5)
...     )
array([(2.0, 3.0, nan, nan, nan, False, 3),
       (5.0, nan, nan, 3.0, 10.0, False, 5)],
       dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'),
              ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'),
              ('uncertainty_type', 'u1')])
Args:
One or more tuples of length 7.
Returns:
A Heterogeneous parameter array
pdf(params, *args, **kwargs)

Provide a standard interface to calculate the probability density function of an uncertainty distribution. By default, the function is evaluated at cls.default_number_points_in_pdf points, spanning the minimum to maximum range if bounds are present, or cls.standard_deviations_in_default_range standard deviations otherwise.

Inputs

  • params : A one-row Parameter array.
  • xs : Optional. A one-dimensional numpy array of input values.

Output

Important

The output format for PDF is different than CDF or PPF.

A tuple of a vector of x values and a vector of y values. The y values are a one-dimensional array of probability densities, bounded on (0,1), with the same length as xs, if provided, or cls.default_number_points_in_pdf otherwise.

ppf(params, percentages)

Return the percent point function (the inverse of the CDF, i.e. the value in the distribution below which x percent of the distribution lies) for various distributions.

Inputs

  • params : A Parameter array.
  • percentages : An array of percentages, bounded on (0,1). Each row in percentages corresponds to a row in params.

Output

An array of values within the ranges of each distribution, with params rows and percentages columns.

random_variables(params, size, seeded_random=None)

Generate random variables for the given uncertainty. Should not check that random samples are within the (minimum, maximum) bounds. Bounds checking is provided by the bounded_random_variables class method.

Inputs

  • params : A Parameter array.
  • size : Integer. The number of values to draw from each distribution in params.
  • seeded_random : Integer. Optional.

Output

An array of random values, with dimensions params rows by size.

classmethod rescale(params)

Rescale params to a (0,1) interval. Return adjusted_means and scale. Needed because SciPy assumes a (0,1) interval for many distributions.
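
A minimal sketch of this kind of rescaling is shown below, assuming that scale is the width of the bounded interval and that the adjusted mean is the position of loc within it. The functions take plain floats rather than a Parameter array, and unscale is a hypothetical helper added to show the inverse mapping.

```python
def rescale(loc, minimum, maximum):
    """Map loc from the (minimum, maximum) interval onto (0, 1)."""
    scale = maximum - minimum
    adjusted_mean = (loc - minimum) / scale
    return adjusted_mean, scale


def unscale(value, minimum, scale):
    """Map a value on (0, 1) back onto the original interval."""
    return value * scale + minimum


adjusted_mean, scale = rescale(loc=5.0, minimum=3.0, maximum=10.0)
original = unscale(adjusted_mean, minimum=3.0, scale=scale)
```

Values sampled from a distribution defined on (0,1) can be passed through the inverse mapping to land back in the original bounds.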

statistics(params, *args, **kwargs)

Build a dictionary of mean, mode, median, and 95% confidence interval upper and lower values.

Inputs

  • params : A one-row Parameter array.

Output

{'mean': mean value, 'mode': mode value, 'median': median value, 'upper': upper limit value, 'lower': lower limit value}. All values should be floats (not single-element arrays). Parameters that are not defined should be returned as None, not omitted.