Relevant, and possibly of interest. *tl;dr: Percentile reinforcement
schedules: rather than waiting for most responses to meet criterion and then
drastically reducing reinforcement frequency by shifting criteria
infrequently, it is better to change criteria frequently, maintaining a
reinforcement density that is both relatively constant and intermittent.*
SOURCES
Galbicka, Gregory. "Shaping in the 21st Century: Moving Percentile Schedules
into Applied Settings." n.d.
(http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1297861/pdf/jaba00010-0182.pdf)
(pp. 740-754)
…also: Fisher, Wayne W., Cathleen C. Piazza, and Henry S. Roane. Handbook
of Applied Behavior Analysis. New York: Guilford Press, 2011. (p. 240ish)
OVERVIEW
(Shaping) used to be considered more an art than a science. Now there are
“percentile schedules”: rules for when to deliver reinforcers, where these
momentary rules adjust based on recent (local) rates, durations, or types of
responding. (ABA p. 240)
Percentile schedules represent a formalization of the rules of shaping.
(p755)
Percentile schedules, however, do more than automate shaping. In addition,
they make explicit and objective the criteria that define responses as
criterional or noncriterional throughout acquisition and maintenance,
providing explicit prior control over reinforcement density as well as
criterional response probability. Because of this, they provide almost
complete independence from trainer- and subject-related variables. This
allows all subjects to be trained in a specified manner despite changes in
the trainer or the subject, or at different points in the differentiation.
(p740)
(If you shaped by occasionally raising criterion for reinforcement) A plot
of reinforcement density across time would reveal a pattern like a
sawtooth; with each change in the criterion, reinforcement density drops
abruptly, but as behavior gradually changes to include more and more
criterional responses, reinforcement density gradually increases until the
cycle repeats with the next criterion change. This cyclic change in
reinforcement density is more pronounced following extended training (p243)
Rather than waiting for most responses to meet criterion and then
drastically reducing reinforcement frequency by shifting criteria
infrequently, it is
better to change criteria frequently to maintain both a relatively constant
reinforcement density and an intermittent one. Both characteristics
decrease the likelihood of losing control
over responding prior to the acquisition of the terminal response. (p244)
The percentile solution, developed and expanded by Platt (1973) and
colleagues, is momentarily to abandon
the exact physical characteristics of the response and treat it as an
ordinal quantity. Ordinal quantities
are values that carry only an associated rank, as opposed to the more
typical means of quantifying
observations by assigning a cardinal number and a standard unit. (p244)
NITTY GRITTY OF HOW TO DO IT
m previous observations create m + 1 intervals, one of which must contain
the next observation. The counterintuitive notion that intervals of
different sizes are equally likely to contain the next observation arises
because the line represents a cardinal scale, but the question of which
interval will contain the next observation relates to the ordinal
properties of the observations. For the moment, ignore the fact that there
are physical values attached to any of these observations, and treat them
solely in terms of their ranks. In any distribution of values, there is one
and only one value ranked 1st, 2nd, 3rd, and so forth. The question of
interest is not “What is the expected value of the next observation (i.e.,
what distance will next be run)?” but rather is “Where will the next
observation rank?” If the assumption of independence is met, it will be as
likely to rank first or last or anywhere in between, depending on the
number of prior observations. (p745)
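A quick Monte Carlo sketch (my own illustration, not from the paper) can make the rank argument concrete: draw m observations from any continuous distribution, draw one more, and tally which of the m + 1 intervals it lands in. Each interval, regardless of its physical width, should capture the next observation about 1/(m + 1) of the time.

```javascript
// Demonstration that, under independence, the next observation is equally
// likely to fall in any of the m + 1 intervals defined by m prior
// observations, i.e. probability 1/(m + 1) each. Names here are mine.
function intervalIndex(priorSorted, x) {
  // The number of prior observations below x is the index of the interval
  // (0 .. priorSorted.length) that x falls into.
  let count = 0;
  for (const v of priorSorted) {
    if (v < x) count++;
  }
  return count;
}

function simulate(m, trials) {
  const counts = new Array(m + 1).fill(0);
  for (let t = 0; t < trials; t++) {
    const prior = Array.from({ length: m }, Math.random).sort((a, b) => a - b);
    counts[intervalIndex(prior, Math.random())]++;
  }
  return counts.map(c => c / trials); // each entry should be near 1/(m + 1)
}

console.log(simulate(5, 100000)); // six proportions, each roughly 1/6
```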
Hence, the probability that the next observation will fall into any one of
k intervals defined by m observations is k times the probability of falling
into each interval, or k/(m + 1). …establishing a criterion at the kth
rank. That is, rather than setting the criterion (for reinforcement) at a
particular fixed, physical value, the criterion can specify that the next
observation, to meet criterion, must rank higher than the value currently
ranked k. When k = 1, responses will be considered criterional if they
exceed the response currently ranked 1st (lowest)… The probability of a
criterional response (denoted w) is … w = 1 - [k/(m + 1)]. … Thus, as the
criterion is made more stringent (i.e., as k is increased), the probability
of observing a criterional response decreases accordingly, as intuition
would suggest. (p746)
(If you know you want w to be a set percentage of reinforcement, you can
rewrite the equation to find k: since w = 1 - [k/(m + 1)], solving gives
k = (1 - w)(m + 1).) (p746)
Instead of comparing current response to all previous responses (increasing
m by one each iteration), use only the most recent responses to compare to.
For example, only use the past 5 responses. (p747)
For ties, i.e., when the current response is tied with the response it must exceed: “The
simplest solution is to select ties with a random probability equal to w
and call them criterional.” (p749)
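Putting the last few pieces together, here is a minimal sketch (my own, with hypothetical names) of a percentile schedule: keep a moving window of the m most recent responses, reinforce the next response iff it exceeds the value currently ranked kth from the bottom, and reinforce ties with probability w = 1 - k/(m + 1).

```javascript
// Sketch of a percentile schedule with a moving comparison window.
// Assumptions (mine, not the paper's): responses are numbers where larger
// is closer to the target, and no reinforcement occurs until the window
// holds m observations.
function makePercentileSchedule(m, k) {
  const window = []; // the m most recent observations, oldest first
  const w = 1 - k / (m + 1); // target probability of a criterional response

  return function observe(response) {
    let reinforce = false;
    if (window.length === m) {
      const sorted = [...window].sort((a, b) => a - b);
      const criterion = sorted[k - 1]; // value currently ranked kth (lowest = 1st)
      if (response > criterion) reinforce = true;
      else if (response === criterion) reinforce = Math.random() < w; // tie rule
    }
    window.push(response);
    if (window.length > m) window.shift(); // drop the oldest observation
    return reinforce;
  };
}

// Example: m = 5 comparison responses, criterion rank k = 3, so
// w = 1 - 3/6 = 0.5 of responses should meet criterion at steady state.
const schedule = makePercentileSchedule(5, 3);
```

Because the criterion is recomputed from the window on every trial, it tracks the subject's recent behavior automatically, which is exactly the "never a need to stop shaping" property quoted below.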
Percentile schedules appear to meet all the requirements for a viable
procedure to formalize shaping except the last: they do not specify a
terminal response. The criterion is never specified as an absolute; rather,
it is described only in relative fashion (i.e., exceed the kth
rank)… There is only one terminal response of all shaping: to do better on
the next trial than on previous trials. This is what percentile schedules
program, where “better” is defined as exceeding the kth rank and “previous
trials” is given by the most recent m observations. Because criteria are
evaluated relative to ongoing behavior, there is never a need to stop
shaping. (p750)
Although sequential dependencies (e.g. responses such as 1, 2, 3, 4, 1, 2,
3, 4, 1…etc) diminish the ability of percentile schedules to control
criterional response probability, their effects can be minimized by
increasing the comparison distribution size. (p753)
The other “limitation,” that responding be ordinally rankable, could
actually aid application of
percentile schedules…To illustrate, suppose we wish to train a
developmentally disabled client to drink fluid through a straw. Prior
observation of the behavior leads the shaper to suggest that the following
five behaviors
might be involved: (1) holds glass, (2) directs glass toward mouth, (3)
holds straw with other hand, (4) directs straw into mouth, and (5) sucks on
straw. These five behaviors can easily be ranked 1 to 5, with 1 being
furthest from the terminal response and 5 being closest. A percentile
schedule could be
imposed by recording the response value (i.e., 1 through 5) on each trial.
Whether our conception of the response matches the subject’s will be
evident in the relative frequency of each of the different rankings. (So
steps can be added or taken away depending on the subject’s responses).
(p754)
On Saturday, October 12, 2013 9:22:20 AM UTC-4, Michael J.J. Tiffany wrote:
+1 to using a random reinforcement schedule.
On reinforcement probability: the state of the art in operant conditioning
is probably still in dog training. Can we learn anything from dog trainers?
We don’t see a huge amount of scientific rigor, but we do see strong
selection pressure among the population (people who stay in dog training
are the ones who produce good output – trained dogs – in the least amount
of time, else they lose to those who do). Polling from this population as
well as I can, I’ve derived a consensus figure of just 1/5 for
reinforcement of basic behaviors (e.g. sitting on command).
On reward latency: hacking some deep brain structures can work on
surprisingly long timescales. I don’t really believe in deep brain
structures, but some stimuli (e.g., “this food made me feel poisoned!”) are
more potent than others. Recall the long-ago work on induced food aversions
in rats and dogs, with radiation coming hours after the fact! (See the
“Taste Aversions” section of “Classical Conditioning: Examples and How It
Works” for an easy overview.) That insight is not immediately actionable in
your first use case, but it’s worth keeping in mind for future
experimentation, I think.
I mean, an entire generation of intellectuals was conditioned to enjoy
avant-garde theater, which I think can only be explained by the sex they
must have been having afterward.
Cheers,
Michael Tiffany
On Fri, Oct 11, 2013 at 11:15 PM, Lincoln Quirk <linc...@gmail.com> wrote:
Okie. I mocked this up with SMS and IFTTT and Google Spreadsheets:
- Created a spreadsheet with 3 columns: “Name”, “Done?” and “Date
Completed”
- Attached the below script to the “onEdit” trigger of the spreadsheet.
(This is currently tricky but if Google approves my attempt to publish the
script then maybe it’s easy? Let me know if you can’t figure it out.)
- Set up an IFTTT trigger to send me an SMS saying “get yourself a candy” for
incoming email with #todocomplete in the subject.
- Put some dark M&Ms in my fridge.
- Added to-dos to the list, and checked some off by typing a ‘y’ in the
Done column.
I’ll update you all in a few days and let you know the result. I’m
already feeling positive anticipation about checking things off the list,
so I’m optimistic 
Here’s the script.
function onEdit(e)
{
  if (e.range.getColumn() == 2 && !e.range.isBlank())
  {
    // Record the current date in column 3 ("Date Completed")
    var r2 = e.range.offset(0, 1);
    r2.setValue(new Date());
    // With some probability, send the todocomplete IFTTT trigger
    var r0 = e.range.offset(0, -1);
    if (Math.random() < 0.5)
    {
      Logger.log("Sending reward");
      MailApp.sendEmail("tri...@ifttt.com", "#todocomplete " + r0.getValue(), "");
    }
    else
    {
      Logger.log("Unlucky, no reward");
    }
  }
}
On Fri, Oct 11, 2013 at 9:17 PM, Brent Yorgey <byo...@gmail.com> wrote:
I don’t know about the optimal rate, but whatever it is, I would use a
Poisson distribution — that way, very occasionally you will get multiple
jellybeans!
-Brent
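A sketch of Brent's Poisson idea (my own code, hypothetical names): instead of dispensing 0 or 1 jellybeans, draw the count from a Poisson distribution, so most completions yield nothing or one bean but an occasional trial pays out a small jackpot. This uses Knuth's sampling algorithm.

```javascript
// Draw a Poisson-distributed jellybean count with the given mean, using
// Knuth's algorithm: multiply uniform draws until the product drops below
// e^(-mean); the number of extra draws needed is the sample.
function poissonSample(mean) {
  const limit = Math.exp(-mean);
  let k = 0;
  let product = Math.random();
  while (product > limit) {
    k++;
    product *= Math.random();
  }
  return k;
}

// e.g. mean = 1/3 matches a "reinforce about one time in three" schedule
// on average, while still allowing trials with 2+ jellybeans.
```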
On Fri, Oct 11, 2013 at 9:08 PM, Daniel Reeves <dre...@beeminder.com> wrote:
Thanks Jake and Paul! Does anyone know a concise recommendation for a
randomized reward schedule? (Eg, when the desired behavior is
exhibited, dispense a jellybean with probability 1/3. Or maybe it
shouldn’t be stateless like that and should, say, limit dry spells of
jellybeanlessness.)
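For the non-stateless variant, one hypothetical sketch (mine, not a recommendation from the literature): reinforce with probability p as usual, but guarantee a reward after some maximum run of unreinforced completions.

```javascript
// Bernoulli reinforcement with a cap on dry spells: dispense with
// probability p, but never allow more than maxDrySpell consecutive
// unreinforced completions. Names are hypothetical.
function makeCappedSchedule(p, maxDrySpell) {
  let misses = 0; // consecutive unreinforced completions so far
  return function completed() {
    if (misses >= maxDrySpell || Math.random() < p) {
      misses = 0;
      return true; // dispense a jellybean
    }
    misses++;
    return false;
  };
}

// e.g. p = 1/3 with maxDrySpell = 5 behaves like the stateless 1/3
// schedule most of the time but caps jellybeanlessness at 5 in a row.
const capped = makeCappedSchedule(1 / 3, 5);
```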
Perhaps the answer is buried in here:
Reinforcement - Wikipedia
On Fri, Oct 11, 2013 at 5:08 PM, Paul Fenwick <paul.j....@gmail.com> wrote:
Anyway, anyone tried anything like this? Any results, positive or
negative?
My exobrain¹ gives me HabitRPG XP for responding to email, flossing my
teeth (via a beeminder callback), recording my weight, and other
bits and pieces. However the novelty value for that wore off pretty
quickly. Jellybeans might indeed work better, as they’re less abstract
and more delicious. 
¹ GitHub - pjf/exobrain: Automate your life with Exobrain
–
You received this message because you are subscribed to the Google
Groups “Akratics Anonymous” group.
To unsubscribe from this group and stop receiving emails from it,
send an email to akratics+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
–
http://dreev.es – search://“Daniel Reeves”
Goal tracking + Commitment contracts == http://beeminder.com