net.sourceforge.openforecast
Class DataSet

java.lang.Object
  extended by java.util.AbstractCollection<DataPoint>
      extended by net.sourceforge.openforecast.DataSet
All Implemented Interfaces:
Iterable<DataPoint>, Collection<DataPoint>

public class DataSet
extends AbstractCollection<DataPoint>

Represents a collection of data points. Data points are either observations of past data (including both the values of the independent variables and the observed value of the dependent variable), or forecasts or estimates of the dependent variable (for a given set of independent variable values).

Generally when trying to forecast future values you'll use two data sets. The first data set contains all of the observations, or historical data. This data set is used to help initialize the selected forecasting model, the details of which depend on the specific forecasting model. A second data set is then created and initialized with data points describing the values of the independent variables that are to be used to predict or forecast values of the dependent variable.

When defining any data set it is important to provide as much information as possible about the data. While this may seem trivial, the more information you provide about a data set (such as whether it is a time-based series, the name of the independent variable representing time, and the number of data points, or periods, in a year), the better the forecasting model will be able to model the data. Some models require this information to be applicable at all.
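
The two-data-set workflow described above can be sketched without the library. The following is a minimal illustration only: Point and LineModel are hypothetical stand-ins (not OpenForecast classes), and the least-squares line is just a placeholder for whichever forecasting model is actually selected.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins, not the OpenForecast API: they only illustrate the
// "observations first, then points to forecast" pattern.
class TwoDataSetSketch {

    // Stand-in data point: named independent values plus one dependent value.
    static final class Point {
        final Map<String, Double> independent = new LinkedHashMap<String, Double>();
        double dependent;

        Point(String name, double x) { independent.put(name, x); }
        Point(String name, double x, double y) { this(name, x); dependent = y; }
    }

    // Stand-in model: an ordinary least-squares line over a single variable.
    static final class LineModel {
        private final String variable;
        private double intercept;
        private double slope;

        LineModel(String variable) { this.variable = variable; }

        // Step 1: initialize the model from the historical observations.
        void init(List<Point> observations) {
            int n = observations.size();
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (Point p : observations) {
                double x = p.independent.get(variable);
                sumX += x;
                sumY += p.dependent;
                sumXY += x * p.dependent;
                sumXX += x * x;
            }
            slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            intercept = (sumY - slope * sumX) / n;
        }

        // Step 2: fill in the dependent value of each point to be forecast.
        void forecast(List<Point> toForecast) {
            for (Point p : toForecast) {
                p.dependent = intercept + slope * p.independent.get(variable);
            }
        }
    }

    // Builds both data sets and returns the forecast values.
    static double[] demo() {
        // First data set: historical observations (here y = 2t + 1).
        List<Point> observations = new ArrayList<Point>();
        observations.add(new Point("t", 1, 3));
        observations.add(new Point("t", 2, 5));
        observations.add(new Point("t", 3, 7));

        // Second data set: only the independent values are known.
        List<Point> future = new ArrayList<Point>();
        future.add(new Point("t", 4));
        future.add(new Point("t", 5));

        LineModel model = new LineModel("t");
        model.init(observations);
        model.forecast(future);

        double[] result = new double[future.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = future.get(i).dependent;
        }
        return result;
    }

    public static void main(String[] args) {
        for (double value : demo()) {
            System.out.println(value);
        }
    }
}
```

The point of the sketch is the separation of roles: the first collection carries both independent and dependent values, while the second carries only independent values until the model fills in the rest.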

Author:
Steven R. Gould

Constructor Summary
DataSet()
          Constructs a new empty data set.
DataSet(DataSet dataSet)
          Copy constructor: constructs a new data set object by copying the given data set.
DataSet(String timeVariable, int periodsPerYear, Collection<DataPoint> c)
          Constructs a new time-based data set with the named time variable, the given number of data points in a year, and the given Collection of data points.
 
Method Summary
 boolean add(DataPoint obj)
          Adds the given data point object to this data set.
 boolean addAll(Collection<? extends DataPoint> c)
          Adds a collection of data points to this data set.
 void clear()
          Removes all of the data points from this data set.
 boolean contains(Object obj)
          Returns true if this data set contains the given data point object; or false otherwise.
 boolean containsAll(Collection<?> c)
          Returns true if this DataSet contains all of the DataPoints in the specified collection.
 boolean equals(DataSet dataSet)
          Indicates whether some other DataSet is "equal to" this one.
 boolean equals(Object obj)
          Indicates whether some other object, obj, is "equal to" this one.
 String[] getIndependentVariables()
          Returns an ordered array of all independent variable names used in this data set.
 int getPeriodsPerYear()
          Returns the number of periods - or data points - in a year's worth of data for time-series data.
 String getTimeVariable()
          Returns the time variable associated with this data set, or null if no time variable has been defined.
 int hashCode()
          Returns the hash code value for this collection, based on the underlying Collection of DataPoints.
 boolean isEmpty()
          Returns true if this data set contains no data points.
 Iterator<DataPoint> iterator()
          Returns an iterator over the data points in this data set.
 boolean remove(Object obj)
          Removes a single instance of the specified data point object from this data set, if it is present.
 boolean removeAll(Collection<?> c)
          Not currently implemented - always throws UnsupportedOperationException.
 boolean retainAll(Collection<?> c)
          Not currently implemented - always throws UnsupportedOperationException.
 void setPeriodsPerYear(int periodsPerYear)
          Sets the number of periods - or data points - in a year's worth of data for time-series data.
 void setTimeVariable(String timeVariable)
          Sets the name of the time variable for this data set.
 int size()
          Returns the number of data points in this data set.
 void sort(String independentVariable)
          Sorts this data set in increasing order of the named independent variable.
 String toString()
          Overrides the default toString method.
 
Methods inherited from class java.util.AbstractCollection
toArray, toArray
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DataSet

public DataSet()
Constructs a new empty data set.


DataSet

public DataSet(DataSet dataSet)
Copy constructor: constructs a new data set object by copying the given data set.

Parameters:
dataSet - the data set to copy from to initialize the new data set.

DataSet

public DataSet(String timeVariable,
               int periodsPerYear,
               Collection<DataPoint> c)
Constructs a new time-based data set with the named time variable, the given number of data points in a year, and the given Collection of data points. This is equivalent to using the default constructor, then calling setTimeVariable, setPeriodsPerYear and addAll to initialize it.

Parameters:
timeVariable - the name of the independent variable representing time.
periodsPerYear - the number of periods - data points - in one year's worth of data.
c - a Collection of data points to initialize this data set with.
See Also:
setTimeVariable(java.lang.String), setPeriodsPerYear(int), addAll(java.util.Collection)
Method Detail

add

public boolean add(DataPoint obj)
Adds the given data point object to this data set.

Specified by:
add in interface Collection<DataPoint>
Overrides:
add in class AbstractCollection<DataPoint>
Parameters:
obj - the data point object to add to this set.
Returns:
true if this collection changed as a result of the call. This is consistent with the add method in the java.util.Collection interface.
Throws:
ClassCastException - if the specified object does not implement the DataPoint interface.
NullPointerException - if the specified data point is null.

addAll

public boolean addAll(Collection<? extends DataPoint> c)
Adds a collection of data points to this data set. All elements of the collection must be DataPoints.

Specified by:
addAll in interface Collection<DataPoint>
Overrides:
addAll in class AbstractCollection<DataPoint>
Parameters:
c - a collection of data points to add to this data set.
Returns:
true if this collection changed as a result of the call. This is consistent with the addAll method in the java.util.Collection interface.

clear

public void clear()
Removes all of the data points from this data set. This data set will be empty after this method returns unless it throws an exception.

Specified by:
clear in interface Collection<DataPoint>
Overrides:
clear in class AbstractCollection<DataPoint>

isEmpty

public boolean isEmpty()
Returns true if this data set contains no data points. Otherwise returns false.

Specified by:
isEmpty in interface Collection<DataPoint>
Overrides:
isEmpty in class AbstractCollection<DataPoint>
Returns:
true if this data set is empty.

contains

public boolean contains(Object obj)
Returns true if this data set contains the given data point object; or false otherwise. This data set is said to contain the given data point if and only if obj.equals(dp) returns true for some DataPoint object, dp, within the set of data points.

Specified by:
contains in interface Collection<DataPoint>
Overrides:
contains in class AbstractCollection<DataPoint>
Parameters:
obj - the data point object to search for in this data set.
Returns:
true if this data set contains obj.
Throws:
ClassCastException - if the specified object does not implement the DataPoint interface.
NullPointerException - if the specified object is null.
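
Because containment is decided by equals rather than by object identity, a separately constructed but equal data point counts as contained. The following is a minimal sketch of that rule using only java.util; Point is a hypothetical stand-in for a DataPoint implementation with value-based equality, and ArrayList merely plays the role of the underlying collection.

```java
import java.util.ArrayList;
import java.util.Collection;

class ContainsSketch {

    // Hypothetical stand-in for a data point with value-based equality.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
        }

        @Override
        public int hashCode() {
            return 31 * Double.valueOf(x).hashCode() + Double.valueOf(y).hashCode();
        }
    }

    // A distinct instance with the same values is reported as contained.
    static boolean containsEqualInstance() {
        Collection<Point> points = new ArrayList<Point>();
        points.add(new Point(1.0, 2.5));
        return points.contains(new Point(1.0, 2.5));
    }

    // A point with different values is not.
    static boolean containsDifferentValues() {
        Collection<Point> points = new ArrayList<Point>();
        points.add(new Point(1.0, 2.5));
        return points.contains(new Point(1.0, 3.0));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualInstance());
        System.out.println(containsDifferentValues());
    }
}
```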

containsAll

public boolean containsAll(Collection<?> c)
                    throws ClassCastException,
                           NullPointerException
Returns true if this DataSet contains all of the DataPoints in the specified collection.

Specified by:
containsAll in interface Collection<DataPoint>
Overrides:
containsAll in class AbstractCollection<DataPoint>
Parameters:
c - collection to be checked for containment in this collection.
Returns:
true if this DataSet contains all of the DataPoints in the specified collection.
Throws:
ClassCastException - if the types of one or more elements in the specified collection do not implement the DataPoint interface.
NullPointerException - if the specified collection contains one or more null elements.

remove

public boolean remove(Object obj)
Removes a single instance of the specified data point object from this data set, if it is present. Returns true if this collection contained the specified element (or equivalently, if this collection changed as a result of the call).

Specified by:
remove in interface Collection<DataPoint>
Overrides:
remove in class AbstractCollection<DataPoint>
Parameters:
obj - the data point object to remove from this data set.
Returns:
true if this collection changed as a result of the call. This is consistent with the remove method in the java.util.Collection interface.
Throws:
ClassCastException - if the specified object does not implement the DataPoint interface.
NullPointerException - if the specified object is null.

size

public int size()
Returns the number of data points in this data set. If this data set contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.

Specified by:
size in interface Collection<DataPoint>
Specified by:
size in class AbstractCollection<DataPoint>
Returns:
the number of data points in this data set.

iterator

public Iterator<DataPoint> iterator()
Returns an iterator over the data points in this data set. There are no guarantees concerning the order in which the elements are returned.

Specified by:
iterator in interface Iterable<DataPoint>
Specified by:
iterator in interface Collection<DataPoint>
Specified by:
iterator in class AbstractCollection<DataPoint>
Returns:
an iterator over the points in this data set.

sort

public void sort(String independentVariable)
Sorts this data set in increasing order of the named independent variable. The initial implementation of this sort method is somewhat cumbersome; implementing a small quicksort routine here may prove more efficient later.

Parameters:
independentVariable - the name of the independent variable to sort by. The resulting data set will be sorted in increasing order of this variable.
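
Ordering by a named variable can be illustrated with a comparator that looks the variable up in each point. This is a sketch under stated assumptions, not the library's implementation: each data point is modeled as a plain Map from variable name to value, and the helper names (sortBy, point) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortByVariableSketch {

    // Sorts the points in increasing order of the named variable.
    static void sortBy(List<Map<String, Double>> points, final String variable) {
        Collections.sort(points, new Comparator<Map<String, Double>>() {
            @Override
            public int compare(Map<String, Double> a, Map<String, Double> b) {
                return Double.compare(a.get(variable), b.get(variable));
            }
        });
    }

    // Hypothetical helper: builds a point with two independent variables.
    static Map<String, Double> point(double t, double price) {
        Map<String, Double> p = new HashMap<String, Double>();
        p.put("t", t);
        p.put("price", price);
        return p;
    }

    // Returns the values of "t" after sorting by "t".
    static double[] demo() {
        List<Map<String, Double>> points = new ArrayList<Map<String, Double>>();
        points.add(point(3, 10.0));
        points.add(point(1, 30.0));
        points.add(point(2, 20.0));

        sortBy(points, "t");

        double[] order = new double[points.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = points.get(i).get("t");
        }
        return order;
    }

    public static void main(String[] args) {
        for (double t : demo()) {
            System.out.println(t);
        }
    }
}
```

Sorting by "price" instead would reverse the order of the same three points, which is why the variable name is passed explicitly.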

getIndependentVariables

public String[] getIndependentVariables()
Returns an ordered array of all independent variable names used in this data set. The array is guaranteed not to contain duplicate names.

Returns:
a sorted array of unique independent variable names for this data set.

setTimeVariable

public void setTimeVariable(String timeVariable)
Sets the name of the time variable for this data set. If this is not set, then the data set will be treated as being non time-based. In addition to setting the time variable for time series data, it is strongly recommended that you also initialize the number of periods per year with a call to setPeriodsPerYear.

Parameters:
timeVariable - the name of the independent variable that represents the time data component. For example, this may be something like "t", "month", "period", "year", and so on.
See Also:
setPeriodsPerYear(int)

getTimeVariable

public String getTimeVariable()
Returns the time variable associated with this data set, or null if no time variable has been defined.

Returns:
the time variable associated with this data set.

setPeriodsPerYear

public void setPeriodsPerYear(int periodsPerYear)
Sets the number of periods - or data points - in a year's worth of data for time-series data. If this is not set, then no seasonality effects will be considered when forecasting using this data set.

In addition to setting the number of periods per year, you must also set the time variable; otherwise, forecasting models will not be able to consider the potential effects of seasonality.

Parameters:
periodsPerYear - the number of periods in a year's worth of data.
See Also:
setTimeVariable(java.lang.String)

getPeriodsPerYear

public int getPeriodsPerYear()
Returns the number of periods - or data points - in a year's worth of data for time-series data. If this has not been set, then a value of 0 will be returned.

Returns:
the number of periods in a year's worth of data.

removeAll

public boolean removeAll(Collection<?> c)
                  throws UnsupportedOperationException
Not currently implemented - always throws UnsupportedOperationException. The intended behavior, once implemented, is to remove all of this DataSet's elements that are also contained in the specified collection of DataPoint objects, so that after the call returns this DataSet contains no elements in common with the specified collection.

Specified by:
removeAll in interface Collection<DataPoint>
Overrides:
removeAll in class AbstractCollection<DataPoint>
Parameters:
c - DataPoint objects to be removed from this collection.
Returns:
true if this DataSet changed as a result of the call.
Throws:
UnsupportedOperationException - always, because the removeAll method is not currently implemented by this collection.
ClassCastException - if the types of one or more elements in the specified collection are not DataPoint objects.
NullPointerException - if the specified collection contains one or more null elements.

retainAll

public boolean retainAll(Collection<?> c)
                  throws UnsupportedOperationException
Not currently implemented - always throws UnsupportedOperationException. The intended behavior, once implemented, is to retain only the elements in this collection that are contained in the specified collection (optional operation). In other words, it would remove from this collection all of its elements that are not contained in the specified collection.

Specified by:
retainAll in interface Collection<DataPoint>
Overrides:
retainAll in class AbstractCollection<DataPoint>
Parameters:
c - elements to be retained in this collection.
Returns:
true if this collection changed as a result of the call.
Throws:
UnsupportedOperationException - always, because the retainAll method is not currently implemented by this collection.
ClassCastException - if the types of one or more elements in the specified collection are not DataPoint objects.
NullPointerException - if the specified collection contains one or more null elements.
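
Since removeAll and retainAll always throw, callers that need bulk removal have to build it from the supported single-element remove method. The following is one possible workaround, sketched with java.util only: NoBulkRemoval is a hypothetical stand-in for a collection with the same limitation, and removeEach and retainEach are illustrative helper names, not part of OpenForecast.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class BulkRemoveSketch {

    // Hypothetical stand-in: supports remove(Object) but not the bulk variants.
    static final class NoBulkRemoval<T> extends AbstractCollection<T> {
        private final List<T> items = new ArrayList<T>();

        @Override public boolean add(T item) { return items.add(item); }
        @Override public boolean remove(Object item) { return items.remove(item); }
        @Override public Iterator<T> iterator() { return items.iterator(); }
        @Override public int size() { return items.size(); }

        @Override public boolean removeAll(Collection<?> c) {
            throw new UnsupportedOperationException("removeAll");
        }

        @Override public boolean retainAll(Collection<?> c) {
            throw new UnsupportedOperationException("retainAll");
        }
    }

    // Emulates removeAll: removes every occurrence of each unwanted element.
    static boolean removeEach(Collection<?> target, Collection<?> unwanted) {
        boolean changed = false;
        for (Object o : unwanted) {
            while (target.remove(o)) {
                changed = true;
            }
        }
        return changed;
    }

    // Emulates retainAll: collects the elements to drop first, then removes
    // them, so the target is never modified while it is being iterated.
    static <T> boolean retainEach(Collection<T> target, Collection<?> keep) {
        List<T> unwanted = new ArrayList<T>();
        for (T item : target) {
            if (!keep.contains(item)) {
                unwanted.add(item);
            }
        }
        return removeEach(target, unwanted);
    }

    public static void main(String[] args) {
        NoBulkRemoval<Integer> values = new NoBulkRemoval<Integer>();
        values.addAll(Arrays.asList(1, 2, 3, 2, 4));

        removeEach(values, Arrays.asList(2));
        System.out.println(values);

        retainEach(values, Arrays.asList(1, 4));
        System.out.println(values);
    }
}
```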

hashCode

public int hashCode()
Returns the hash code value for this collection, based on the underlying Collection of DataPoints.

Specified by:
hashCode in interface Collection<DataPoint>
Overrides:
hashCode in class Object
Returns:
the hash code value for this collection.

equals

public boolean equals(Object obj)
Indicates whether some other object, obj, is "equal to" this one. Returns true if the Object, obj, represents another DataSet for which equals(DataSet) returns true; otherwise false.

Specified by:
equals in interface Collection<DataPoint>
Overrides:
equals in class Object
Parameters:
obj - the reference object with which to compare.
Returns:
true if this object is the same as the obj argument; false otherwise.
See Also:
equals(DataSet)

equals

public boolean equals(DataSet dataSet)
Indicates whether some other DataSet is "equal to" this one. Returns true if the DataSet, dataSet, represents another DataSet containing exactly the same data points as this DataSet. Note that neither the DataPoint objects nor the DataSet objects have to refer to the same instance. They need only refer to collections of DataPoints with the same values for the independent and dependent variables.

Parameters:
dataSet - the data set with which to compare.
Returns:
true if this data set is equal to the dataSet argument; false otherwise.
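
Declaring equals(DataSet) alongside equals(Object) means the compiler chooses between two overloads based on the static type of the argument. The sketch below illustrates why having equals(Object) delegate to the typed overload, as described above, keeps both call paths consistent. Series is a hypothetical stand-in class, not part of OpenForecast.

```java
import java.util.Arrays;

class EqualsOverloadSketch {

    // Hypothetical stand-in for a class with a typed equals overload.
    static final class Series {
        private final double[] values;

        Series(double... values) { this.values = values.clone(); }

        // Typed overload: compares by value, not by instance.
        public boolean equals(Series other) {
            return other != null && Arrays.equals(values, other.values);
        }

        // Override of Object.equals: delegates to the typed overload.
        @Override
        public boolean equals(Object obj) {
            return obj instanceof Series && equals((Series) obj);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    // The argument's static type is Series, so equals(Series) is selected.
    static boolean viaTypedReference() {
        Series a = new Series(1.0, 2.0, 3.0);
        Series b = new Series(1.0, 2.0, 3.0);
        return a.equals(b);
    }

    // The argument's static type is Object, so equals(Object) is selected,
    // which then delegates to equals(Series).
    static boolean viaObjectReference() {
        Series a = new Series(1.0, 2.0, 3.0);
        Object b = new Series(1.0, 2.0, 3.0);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(viaTypedReference());
        System.out.println(viaObjectReference());
    }
}
```

Without the delegating override, the second call would fall back to Object.equals and compare by identity, so the two calls could disagree for the same pair of objects.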

toString

public String toString()
Overrides the default toString method. Lists all data points in this data set. Note that if there are a large number of data points in this data set, then the String returned could be very long.

Overrides:
toString in class AbstractCollection<DataPoint>
Returns:
a string representation of this data set.


OpenForecast, Copyright (c) Steven Gould, 2002-2011