A training set consists of past decisions and their outcomes, organized into a table.
Parking "hourly rate" example
A large parking garage operator wishes to set prices that optimize profits. For simplicity, let's say the operator will set the hourly rate once per zip code per day.
The operator forms a table in which each row corresponds to a given zip code on a given day from the past. Each row contains the following information:
id. a label for the combination of zip code and day
attributes. the zip code and numerical day of week
decision. the hourly rate that was set for that zip code on that day
outcome. the profit taken in for that zip code on that day
rate_id, zip_code, weekday, hourly_rate, profit
20220308_1, 94103, 1, 22, 8300
20220308_2, 94107, 1, 26, 9370
20220309_1, 94103, 2, 21, 6300
20220309_2, 94107, 2, 24, 5680
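As a minimal sketch of how such a table might be read for inspection, the snippet below parses the example rows with Python's standard csv module. The function name load_rows and the choice to keep zip_code as a string are assumptions made for illustration; they are not part of any tool described here.

```python
import csv
import io

# The example training set from above, embedded as text so the sketch is self-contained.
TRAINING_CSV = """\
rate_id, zip_code, weekday, hourly_rate, profit
20220308_1, 94103, 1, 22, 8300
20220308_2, 94107, 1, 26, 9370
20220309_1, 94103, 2, 21, 6300
20220309_2, 94107, 2, 24, 5680
"""


def load_rows(text):
    """Parse the table into a list of dicts (hypothetical helper)."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for raw in reader:
        rows.append({
            "rate_id": raw["rate_id"],
            # Assumption: a zip code is treated as a label, so it stays a string.
            "zip_code": raw["zip_code"],
            "weekday": int(raw["weekday"]),
            "hourly_rate": float(raw["hourly_rate"]),
            "profit": float(raw["profit"]),
        })
    return rows


rows = load_rows(TRAINING_CSV)
```

Each resulting dict holds one past decision (hourly_rate) together with its attributes and its outcome (profit).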
Generalizing, a training set is a table in which each row represents a historical event ("sample") about which a decision was made, leading to a measurable outcome.
Each column is one of four types, with an optional prefix for flexible ordering:
sample. exactly one, optionally prefixed by "s:"
attribute. at least one, optionally prefixed by "a:"
decision. exactly one (for now), optionally prefixed by "d:"
outcome. exactly one, optionally prefixed by "o:"
The fields must appear in this order unless prefixes are used.
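The column rules above can be sketched as a small header check. This is only an illustration under stated assumptions: the function name classify_columns is hypothetical, and the sketch assumes that prefixes are used either on every column or on none, which the rules above do not state explicitly.

```python
# Prefixes as listed in the column rules above.
PREFIXES = {"s:": "sample", "a:": "attribute", "d:": "decision", "o:": "outcome"}


def classify_columns(header):
    """Map each column name to its type (hypothetical helper)."""
    names = [name.strip() for name in header]
    prefixed = [name[:2] in PREFIXES for name in names]

    if all(prefixed):
        # Prefixes present: any column order is accepted.
        roles = {name[2:]: PREFIXES[name[:2]] for name in names}
    elif not any(prefixed):
        # No prefixes: rely on the order sample, attributes, decision, outcome.
        if len(names) < 4:
            raise ValueError("need a sample, an attribute, a decision, and an outcome")
        roles = {names[0]: "sample"}
        for name in names[1:-2]:
            roles[name] = "attribute"
        roles[names[-2]] = "decision"
        roles[names[-1]] = "outcome"
    else:
        # Assumption: mixing prefixed and unprefixed columns is rejected.
        raise ValueError("use prefixes on every column or on none")

    # Enforce the counts: exactly one sample, decision, and outcome; at least one attribute.
    counts = {}
    for role in roles.values():
        counts[role] = counts.get(role, 0) + 1
    for role in ("sample", "decision", "outcome"):
        if counts.get(role, 0) != 1:
            raise ValueError(f"expected exactly one {role} column")
    if counts.get("attribute", 0) < 1:
        raise ValueError("expected at least one attribute column")
    return roles


ordered = classify_columns(["rate_id", "zip_code", "weekday", "hourly_rate", "profit"])
shuffled = classify_columns(["o:profit", "a:zip_code", "s:rate_id", "d:hourly_rate", "a:weekday"])
```

Both calls yield the same mapping of column names to types; only the iteration order of the resulting dict differs.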
The sample is the unit (customer, session, or event) about which a decision needs to be made. It should be defined as precisely as possible. For example, if a sample is the click of a checkout button, specify whether it should be included if it results in, say, a rejected or fraudulent credit card transaction.
Attributes are the properties, or features, of each sample about which a decision was made. Attributes must have been known and accessible at the time the decision was made.
The decision is either a numerical (e.g. price) or categorical (e.g. "admit" or "reject") choice made on behalf of each sample.
The outcome is a numerical measure, such as profit or revenue, that resulted from a decision. The outcome should be a "bigger is better" quantity, unlike churn or latency (although these quantities are permissible if a minus sign is put in front).
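To illustrate the minus-sign convention, here is a minimal sketch that converts a "smaller is better" measurement into a usable outcome column. The helper name to_bigger_is_better and the latency values are hypothetical, chosen only for the example.

```python
def to_bigger_is_better(values):
    """Negate a 'smaller is better' quantity so that larger means better."""
    return [-value for value in values]


# Hypothetical latencies in milliseconds; lower latency is the desirable result.
latency_ms = [120.0, 95.5, 240.0]
outcome = to_bigger_is_better(latency_ms)

# The best (largest) outcome now corresponds to the lowest latency.
best_index = outcome.index(max(outcome))
```

After negation, the row with the lowest latency carries the largest outcome value, which matches the "bigger is better" requirement.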
1. Enter comma-separated decision input values, ordered the same as the attributes, optionally with a prefixed label, and hit return.
2. Note the suggested optimal decision and your target metric's expected outcome.
3. Optionally record the dash decision ID.
4. View the comparison between the suggested decision and the past outcomes of decisions with similar inputs.
Create a table (.csv) of past decisions and their outcomes (see how), and run the following command: