Automated positive reinforcement

Hello,

I’ve been thinking of a project to automatically reinforce positive
behaviors.

The basic idea is to have a candy dispenser next to my computer, plugged
into the network. When I do something positive – check a bug off in the
bug tracker, complete a to-do item, get to inbox zero, etc. – the candy
dispenser automatically ejects a candy, making a pleasing noise and giving
a small reward.

Anyway, anyone tried anything like this? Any results, positive or negative?

Further thoughts:

  • Daniel Reeves inspired this with his CFAR testimonial
    (http://rationality.org/testimonials/) and so I emailed him about it, and
    he suggested two important ideas: 1) that we don’t necessarily need to
    build the hardware, if we can just convince our computer/smartphone to make
    a pleasant noise; and 2) that we could probably increase the effectiveness
    of the system by using intermittent rewards. (He also suggested I join &
    email this list which is why I’m here now.)

  • Another instance of something like this is by Kathryn
    McElroy: http://cargocollective.com/kathrynmcelroy/Edible-Email-Notifier -
    I have asked her if she had any results but haven’t heard back.

  • The latency of this system is probably important: it seems like the
    reinforcement would be substantially less effective if the reward were
    delivered more than 5-10 seconds after the event, which makes the
    technology stack a bit trickier to implement.

Clarification: I do think one part of the hardware is necessary,
namely, the actual jellybeans. :) I have the (untested) hypothesis
that it may suffice to use the honor system. You send or archive the
email or move the Trello card or whatever, a pleasant Pavlovian
bell chimes, and you may take one jellybean. If you counted the total
jellybeans you started with, you could set up a separate commitment
device to make sure it matched in the end.

On Fri, Oct 11, 2013 at 11:42 AM, Lincoln Quirk lincolnq@gmail.com wrote:

You received this message because you are subscribed to the Google Groups
“Akratics Anonymous” group.
To unsubscribe from this group and stop receiving emails from it, send an
email to akratics+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


http://dreev.es – search://“Daniel Reeves”
Goal tracking + Commitment contracts == http://beeminder.com

If you do get around to the automated solution, an Arduino + solenoid
or actuator is a nice solution. I have code to interface that with a
website, so that submitting a web form triggers the actuator.
Happy to share if it’d be helpful.

For the hardware side:

http://www.instructables.com/id/Controlling-solenoids-with-arduino/?ALLSTEPS
http://bildr.org/2011/03/high-power-control-with-arduino-and-tip120/
http://itp.nyu.edu/physcomp/Tutorials/HighCurrentLoads

On Fri, Oct 11, 2013 at 2:42 PM, Lincoln Quirk lincolnq@gmail.com wrote:


Anyway, anyone tried anything like this? Any results, positive or negative?

My exobrain¹ gives me HabitRPG XP for responding to email, flossing my
teeth (via a Beeminder callback), recording my weight, and other
bits and pieces. However, the novelty value of that wore off pretty
quickly. Jellybeans might indeed work better, as they’re less abstract
and more delicious. :)

¹ https://github.com/pjf/exobrain

Thanks Jake and Paul! Does anyone know a concise recommendation for a
randomized reward schedule? (E.g., when the desired behavior is
exhibited, dispense a jellybean with probability 1/3. Or maybe it
shouldn’t be stateless like that and should, say, limit dry spells of
jellybeanlessness.)

Perhaps the answer is buried in here: http://en.wikipedia.org/wiki/Reinforcement
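For concreteness, here’s one possible shape for a non-stateless schedule
(just a sketch; the base probability of 1/3 and the cap of 4 misses are
arbitrary illustrative choices, not recommendations):

```javascript
// Sketch of a reward schedule that is mostly random but caps "dry spells":
// reinforce with probability p, and always reinforce after maxDry
// consecutive misses. p and maxDry are illustrative values.
function makeRewardSchedule(p, maxDry) {
  var misses = 0; // consecutive unrewarded behaviors so far
  return function rewardNow() {
    if (misses >= maxDry || Math.random() < p) {
      misses = 0;
      return true;  // dispense a jellybean
    }
    misses++;
    return false;   // no reward this time
  };
}

// Example: reinforce with probability 1/3, at most 4 misses in a row.
var shouldReward = makeRewardSchedule(1 / 3, 4);
```

The cap raises the long-run reward rate slightly above the base
probability, since the guaranteed reward kicks in on long unlucky streaks.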

On Fri, Oct 11, 2013 at 5:08 PM, Paul Fenwick paul.j.fenwick@gmail.com wrote:


I don’t know about the optimal rate, but whatever it is, I would use a
Poisson distribution — that way, very occasionally you will get multiple
jellybeans!

-Brent
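In code, a Poisson draw is only a few lines (this is Knuth’s
multiply-uniforms method; the rate lambda is whatever average bean count
you want, and 0.5 below is just an example value):

```javascript
// Sample a Poisson-distributed jellybean count (Knuth's algorithm):
// multiply uniform randoms until the product drops below e^(-lambda).
// lambda is the average number of jellybeans per reinforced event.
function poissonSample(lambda) {
  var L = Math.exp(-lambda);
  var k = 0;
  var p = 1;
  do {
    k++;
    p *= Math.random();
  } while (p > L);
  return k - 1; // usually 0 or 1, occasionally more
}

// Example: average of half a jellybean per event.
var beans = poissonSample(0.5);
```

With lambda = 0.5, roughly 61% of events yield zero beans, 30% yield one,
and 9% yield two or more, so the occasional multi-bean jackpot comes free.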

On Fri, Oct 11, 2013 at 9:08 PM, Daniel Reeves dreeves@beeminder.com wrote:


Okie. I mocked this up with SMS and IFTTT and Google Spreadsheets:

  1. Created a spreadsheet with 3 columns: “Name”, “Done?” and “Date
    Completed”
  2. Attached the below script to the “onEdit” trigger of the spreadsheet.
    (This is currently tricky but if Google approves my attempt to publish the
    script then maybe it’s easy? Let me know if you can’t figure it out.)
  3. Set up an IFTTT trigger to send me an SMS saying “get yourself a candy”
    for incoming email with #todocomplete in the subject.
  4. Put some dark M&Ms in my fridge.
  5. Added to-dos to the list, and checked some off by typing a ‘y’ in the
    Done column.

I’ll update you all in a few days and let you know the result. I’m already
feeling positive anticipation about checking things off the list, so I’m
optimistic :)

Here’s the script.

function onEdit(e) {
  // Only react to non-blank edits in the "Done?" column (column 2).
  if (e.range.getColumn() == 2 && !e.range.isBlank()) {
    // Record the completion date in column 3.
    var r2 = e.range.offset(0, 1);
    r2.setValue(new Date());

    // With some probability, send the todocomplete IFTTT trigger,
    // putting the task name (column 1) in the subject line.
    var r0 = e.range.offset(0, -1);
    if (Math.random() < 0.5) {
      Logger.log("Sending reward");
      MailApp.sendEmail("trigger@ifttt.com", "#todocomplete " + r0.getValue(), "");
    } else {
      Logger.log("Unlucky, no reward");
    }
  }
}

On Fri, Oct 11, 2013 at 9:17 PM, Brent Yorgey byorgey@gmail.com wrote:


+1 to using a random reinforcement schedule.

On reinforcement probability: the state of the art in operant conditioning
is probably still in dog training. Can we learn anything from dog trainers?
We don’t see a huge amount of scientific rigor, but we do see strong
selection pressure among the population (people who stay in dog training
are the ones who produce good output – trained dogs – in the least amount
of time, else they lose to those who do). Polling from this population as
well as I can, I’ve derived a consensus figure of just 1/5 for
reinforcement of basic behaviors (e.g. sitting on command).

On reward latency: hacking some deep brain structures can work on
surprisingly long timescales. I don’t really believe in deep brain
structures, but some stimuli (e.g., “this food made me feel poisoned!”) are
more potent than others. Recall the long-ago work on induced food aversions
in rats and dogs with radiation coming hours after the fact! (see the
Taste Aversions part of
http://psychology.about.com/od/behavioralpsychology/a/classcond.htm for an
easy overview). That insight is not immediately actionable in your first
use case, but it’s worth keeping in mind for future experimentation, I
think.

I mean, an entire generation of intellectuals was conditioned to enjoy
avant-garde theater, which I think can only be explained by the sex they
must have been having afterward.

Cheers,

Michael Tiffany

On Fri, Oct 11, 2013 at 11:15 PM, Lincoln Quirk lincolnq@gmail.com wrote:


Relevant, and possibly of interest. tl;dr: Percentile reinforcement
schedules: rather than waiting for most responses to meet criterion and
then drastically reducing reinforcement frequency by shifting criteria
infrequently, it is better to change criteria frequently to maintain both a
relatively constant reinforcement density and an intermittent one.

SOURCES

Galbicka, Gregory. Shaping in the 21st Century: Moving Percentile Schedules
into Applied Settings. n.d. (
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1297861/pdf/jaba00010-0182.pdf)
(p740-754)

…also, Fisher, Wayne W., Cathleen C. Piazza, and Henry S. Roane. Handbook
of Applied Behavior Analysis. New York: Guilford Press, 2011. p240ish

OVERVIEW

(Shaping) used to be considered more an art than science. Now: “percentile
schedules” involving rules for when to deliver reinforcers, and these
momentary rules adjust based on recent (local) rates, duration, or types of
responding. (ABA p240)

percentile schedules represent a formalization of the rules of shaping.
(p755)

Percentile schedules, however, do more than automate shaping. In addition,
they make explicit and objective the criteria that define responses as
criterional or noncriterional throughout acquisition and maintenance,
providing explicit prior control over reinforcement density as well as
criterional response probability. Because of this, they provide almost
complete independence from trainer- and subject-related variables. This
allows all subjects to be trained in a specified manner despite changes in
the trainer or the subject, or at different points in the differentiation.
(p740)

(If you shaped by occasionally raising criterion for reinforcement) A plot
of reinforcement density across time would reveal a pattern like a
sawtooth; with each change in the criterion, reinforcement density drops
abruptly, but as behavior gradually changes to include more and more
criterional responses, reinforcement density gradually increases until the
cycle repeats with the next criterion change. This cyclic change in
reinforcement density is more pronounced following extended training (p243)

rather than wait for most responses to meet criterion and then drastically
reducing reinforcement frequency by shifting criteria infrequently, it is
better to change criteria frequently to maintain both a relatively constant
reinforcement density and an intermittent one. Both characteristics
decrease the likelihood of losing control over responding prior to the
acquisition of the terminal response. (p244)

The percentile solution, developed and expanded by Platt (1973) and
colleagues, is momentarily to abandon the exact physical characteristics of
the response and treat it as an ordinal quantity. Ordinal quantities are
values that carry only an associated rank, as opposed to the more typical
means of quantifying observations by assigning a cardinal number and a
standard unit. (p244)

NITTY GRITTY OF HOW TO DO IT

m previous observations create m + 1 intervals, one of which must contain
the next observation. The counterintuitive notion that intervals of
different sizes are equally likely to contain the next observation arises
because the line represents a cardinal scale, but the question of which
interval will contain the next observation relates to the ordinal
properties of the observations. For the moment, ignore the fact that there
are physical values attached to any of these observations, and treat them
solely in terms of their ranks. In any distribution of values, there is one
and only one value ranked 1st, 2nd, 3rd, and so forth. The question of
interest is not “What is the expected value of the next observation (i.e.,
what distance will next be run)?” but rather is “Where will the next
observation rank?” If the assumption of independence is met, it will be as
likely to rank first or last or anywhere in between, depending on the
number of prior observations. (p745)

Hence, the probability that the next observation will fall into any one of
k intervals defined by m observations is k times the probability of falling
into each interval, or k/(m + 1). …establishing a criterion at the kth
rank. That is, rather than setting the criterion (for reinforcement) at a
particular fixed, physical value, the criterion can specify that the next
observation, to meet criterion, must rank higher than the value currently
ranked k. When k = 1, responses will be considered criterional if they
exceed the response currently ranked 1st (lowest)… The probability of a
criterional response (denoted w) is w = 1 - [k/(m + 1)]. Thus, as the
criterion is made more stringent (i.e., as k is increased), the probability
of observing a criterional response decreases accordingly, as intuition
would suggest. (p746)

(If you know you want w to be a set percentage of reinforcement, you can
rewrite the equation to find k: k = (1 - w)(m + 1).) (p746)

Instead of comparing current response to all previous responses (increasing
m by one each iteration), use only the most recent responses to compare to.
For example, only use the past 5 responses. (p747)

For ties, when the current response is tied with the response it must
exceed: “The simplest solution is to select ties with a random probability
equal to w and call them criterional.” (p749)
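Assembling those rules (keep only the last m responses, reinforce when the
new response beats the kth-ranked one, break ties with probability w) gives
a short sketch. This is my own reading of the excerpts, not code from the
paper, and the window size m and rank k are illustrative:

```javascript
// Sketch of a percentile reinforcement schedule (after Platt/Galbicka):
// reinforce a response if it exceeds the kth-ranked (kth-lowest) value
// among the last m observations; break ties in favor of reinforcement
// with probability w = 1 - k/(m + 1). m and k are illustrative choices.
function makePercentileSchedule(m, k) {
  var recent = []; // the most recent m response values
  var w = 1 - k / (m + 1);
  return function observe(value) {
    var reinforce = false;
    if (recent.length === m) {
      var sorted = recent.slice().sort(function (a, b) { return a - b; });
      var threshold = sorted[k - 1]; // value currently ranked kth lowest
      if (value > threshold) {
        reinforce = true;
      } else if (value === threshold) {
        reinforce = Math.random() < w; // tie rule from the excerpt above
      }
    }
    recent.push(value);
    if (recent.length > m) recent.shift(); // keep only the last m
    return reinforce;
  };
}
```

With m = 5 and k = 2, independent responses end up reinforced with
probability w = 1 - 2/6 = 2/3, matching the formula from p746.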

Percentile schedules appear to meet all the requirements for a viable
procedure to formalize shaping except the last: they do not specify a
terminal response. The criterion is never specified as an absolute; rather,
it is described only in relative fashion (i.e., exceed the kth rank)…There
is only one terminal response of all shaping: to do better on the next
trial than on previous trials. This is what percentile schedules program,
where “better” is defined as exceeding the kth rank and “previous trials”
is given by the most recent m observations. Because criteria are evaluated
relative to ongoing behavior, there is never a need to stop shaping. (p750)

Although sequential dependencies (e.g. responses such as 1, 2, 3, 4, 1, 2,
3, 4, 1…etc) diminish the ability of percentile schedules to control
criterional response probability, their effects can be minimized by
increasing the comparison distribution size. (p753)

The other “limitation,” that responding be ordinally rankable, could
actually aid application of percentile schedules…To illustrate, suppose we
wish to train a developmentally disabled client to drink fluid through a
straw. Prior observation of the behavior leads the shaper to suggest that
the following five behaviors might be involved: (1) holds glass, (2)
directs glass toward mouth, (3) holds straw with other hand, (4) directs
straw into mouth, and (5) sucks on straw. These five behaviors can easily
be ranked 1 to 5, with 1 being furthest from the terminal response and 5
being closest. A percentile schedule could be imposed by recording the
response value (i.e., 1 through 5) on each trial. Whether our conception of
the response matches the subject’s will be evident in the relative
frequency of each of the different rankings. (So steps can be added or
taken away depending on the subject’s responses.) (p754)

On Saturday, October 12, 2013 9:22:20 AM UTC-4, Michael J.J. Tiffany wrote:


Thanks Erica! I like the percentile solution but it seems to only apply in
cases where you have events with an ordinal value (like how “close” you are
to the goal), not just whether or not you did the thing. Now I’m
brainstorming ways to make my to-do list events ordinal…

Anyway, I did build a cardboard prototype of the device.

Post: http://techhouse.org/~lincoln/blosxom.cgi/si/reinforcer.html
Video: Reinforcer Demo - YouTube

Further data still seems to indicate that this project is valuable – I’m
still using the to-do list system. But it’s still inconclusive, so expect
another update in a few weeks or maybe a couple months.

On Wed, Oct 16, 2013 at 7:59 PM, Erica Edelman edelmaned@gmail.com wrote:

Relevant, and possibly of interest. tl;dr: Percentile reinforcement
schedules: rather than wait for most responses to meet criterion and
then drastically reducing reinforcement frequency by shifting criteria
infrequently, it is better to change criteria frequently to maintain both a
relatively constant reinforcement density and an intermittent one.

SOURCES

Galbicka, Gregory. “Shaping in the 21st Century: Moving Percentile
Schedules into Applied Settings.” Journal of Applied Behavior Analysis
27.4 (1994): 739–754. (
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1297861/pdf/jaba00010-0182.pdf)
(p740-754)

…also, Fisher, Wayne W., Cathleen C. Piazza, and Henry S. Roane. Handbook
of Applied Behavior Analysis. New York: Guilford Press, 2011. p240ish

OVERVIEW

(Shaping) used to be considered more an art than science. Now: “percentile
schedules” involving rules for when to deliver reinforcers, and these
momentary rules adjust based on recent (local) rates, duration, or types of
responding. (ABA p240)

percentile schedules represent a formalization of the rules of shaping.
(p755)

Percentile schedules, however, do more than automate shaping. In addition,
they make explicit and objective the criteria that define responses as
criterional or noncriterional throughout acquisition and maintenance,
providing explicit prior control over reinforcement density as well as
criterional response probability. Because of this, they provide almost
complete independence from trainer- and subject-related variables. This
allows all subjects to be trained in a specified manner despite changes in
the trainer or the subject, or at different points in the differentiation.
(p740)

(If you shaped by occasionally raising criterion for reinforcement) A plot
of reinforcement density across time would reveal a pattern like a
sawtooth; with each change in the criterion, reinforcement density drops
abruptly, but as behavior gradually changes to include more and more
criterional responses, reinforcement density gradually increases until the
cycle repeats with the next criterion change. This cyclic change in
reinforcement density is more pronounced following extended training (p243)

rather than wait for most responses to meet criterion and then drastically
reducing reinforcement frequency by shifting criteria infrequently, it is
better to change criteria frequently to maintain both a relatively constant
reinforcement density and an intermittent one. Both characteristics
decrease the likelihood of losing control over responding prior to the
acquisition of the terminal response. (p244)

The percentile solution, developed and expanded by Platt (1973) and
colleagues, is momentarily to abandon the exact physical characteristics
of the response and treat it as an ordinal quantity. Ordinal quantities
are values that carry only an associated rank, as opposed to the more
typical means of quantifying observations by assigning a cardinal number
and a standard unit. (p244)

NITTY GRITTY OF HOW TO DO IT

m previous observations create m + 1 intervals, one of which must contain
the next observation. The counterintuitive notion that intervals of
different sizes are equally likely to contain the next observation arises
because the line represents a cardinal scale, but the question of which
interval will contain the next observation relates to the ordinal
properties of the observations. For the moment, ignore the fact that there
are physical values attached to any of these observations, and treat them
solely in terms of their ranks. In any distribution of values, there is one
and only one value ranked 1st, 2nd, 3rd, and so forth. The question of
interest is not “What is the expected value of the next observation (i.e.,
what distance will next be run)?” but rather is “Where will the next
observation rank?” If the assumption of independence is met, it will be as
likely to rank first or last or anywhere in between, depending on the
number of prior observations. (p745)

Hence, the probability that the next observation will fall into any one of
k intervals defined by m observations is k times the probability of falling
into each interval, or k/(m + 1). …establishing a criterion at the kth
rank. That is, rather than setting the criterion (for reinforcement) at a
particular fixed, physical value, the criterion can specify that the next
observation, to meet criterion, must rank higher than the value currently
ranked k. When k = 1, responses will be considered criterional if they
exceed the response currently ranked 1st (lowest)… The probability of a
criterional response (denoted w) is w = 1 - [k/(m + 1)]. Thus, as the
criterion is made more stringent (i.e., as k is increased), the probability
of observing a criterional response decreases accordingly, as intuition
would suggest. (p746)

(If you know you want w to be a set percentage of reinforcement, you can
rewrite equation to find k.) (p746)

Instead of comparing current response to all previous responses
(increasing m by one each iteration), use only the most recent responses to
compare to. For example, only use the past 5 responses. (p747)

For ties- when current response is tied with response it must exceed: “The
simplest solution is to select ties with a random probability equal to w
and call them criterional.” (p749)
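Putting the pieces above together (a window of the last m responses, a
criterion at the kth rank, random tie-breaking with probability w), a
percentile schedule could be sketched like this. This is my own
illustration, not code from Galbicka; the function names and the warm-up
policy (no reinforcement until m responses have been seen) are assumptions:

```javascript
// Sketch of a percentile reinforcement schedule (illustrative; names and
// warm-up policy are my assumptions, not from the paper).
// m: size of the comparison window; w: desired criterional-response probability.
function percentileSchedule(m, w) {
  // From w = 1 - k/(m + 1), so k = (1 - w)(m + 1); clamp to at least rank 1.
  var k = Math.max(1, Math.round((1 - w) * (m + 1)));
  var history = [];
  return function shouldReinforce(response) {
    var criterional = false;
    if (history.length >= m) {
      var sorted = history.slice().sort(function (a, b) { return a - b; });
      var threshold = sorted[k - 1]; // value currently ranked kth (kth lowest)
      if (response > threshold) {
        criterional = true;
      } else if (response === threshold) {
        // Ties: call them criterional with probability w (p749).
        criterional = Math.random() < w;
      }
    }
    history.push(response);
    if (history.length > m) history.shift(); // keep only the last m responses
    return criterional;
  };
}
```

With m = 5 and w = 0.5, k = 3: a response is reinforced when it beats the
3rd-lowest of the last five, so reinforcement density stays near 50% even
as performance improves — the criterion adjusts itself instead of needing
a fixed terminal value.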

Percentile schedules appear to meet all the requirements for a viable
procedure to formalize shaping except the last: they do not specify a
terminal response. The criterion is never specified as an absolute; rather,
it is described only in relative fashion (i.e., exceed the kth rank)…
There is only one terminal response of all shaping: to do better on the
next trial than on previous trials. This is what percentile schedules
program, where “better” is defined as exceeding the kth rank and “previous
trials” is given by the most recent m observations. Because criteria are
evaluated relative to ongoing behavior, there is never a need to stop
shaping. (p750)

Although sequential dependencies (e.g. responses such as 1, 2, 3, 4, 1,
2, 3, 4, 1…etc) diminish the ability of percentile schedules to control
criterional response probability, their effects can be minimized by
increasing the comparison distribution size. (p753)

The other “limitation,” that responding be ordinally rankable, could
actually aid application of percentile schedules… To illustrate, suppose
we wish to train a developmentally disabled client to drink fluid through
a straw. Prior observation of the behavior leads the shaper to suggest
that the following five behaviors might be involved: (1) holds glass, (2)
directs glass toward mouth, (3) holds straw with other hand, (4) directs
straw into mouth, and (5) sucks on straw. These five behaviors can easily
be ranked 1 to 5, with 1 being furthest from the terminal response and 5
being closest. A percentile schedule could be imposed by recording the
response value (i.e., 1 through 5) on each trial. Whether our conception
of the response matches the subject’s will be evident in the relative
frequency of each of the different rankings. (So steps can be added or
taken away depending on the subject’s responses.) (p754)

On Saturday, October 12, 2013 9:22:20 AM UTC-4, Michael J.J. Tiffany wrote:

+1 to using a random reinforcement schedule.

On reinforcement probability: the state of the art in operant
conditioning is probably still in dog training. Can we learn anything from
dog trainers? We don’t see a huge amount of scientific rigor, but we do see
strong selection pressure among the population (people who stay in dog
training are the ones who produce good output – trained dogs – in the
least amount of time, else they lose to those who do). Polling from this
population as well as I can, I’ve derived a consensus figure of just 1/5
for reinforcement of basic behaviors (e.g. sitting on command).

On reward latency: hacking some deep brain structures can work on
surprisingly long timescales. I don’t really believe in deep brain
structures, but some stimuli (e.g., “this food made me feel poisoned!”) are
more potent than others. Recall the long-ago work on induced food aversions
in rats and dogs with radiation coming hours after the fact! (see the
Taste Aversions part of http://psychology.about.com/**
od/behavioralpsychology/a/**classcond.htmhttp://psychology.about.com/od/behavioralpsychology/a/classcond.htmfor an easy overview). That insight is not immediately actionable in your
first use case, but it’s worth keeping in mind for future experimentation,
I think.

I mean, an entire generation of intellectuals was conditioned to enjoy
avant-garde theater, which I think can only be explained by the sex they
must have been having afterward.

Cheers,

Michael Tiffany

On Fri, Oct 11, 2013 at 11:15 PM, Lincoln Quirk linc...@gmail.com wrote:

Okie. I mocked this up with SMS and IFTTT and Google Spreadsheets:

  1. Created a spreadsheet with 3 columns: “Name”, “Done?” and “Date
    Completed”
  2. Attached the below script to the “onEdit” trigger of the spreadsheet.
    (This is currently tricky but if Google approves my attempt to publish the
    script then maybe it’s easy? Let me know if you can’t figure it out.)
  3. Set up an IFTTT trigger to send me an SMS to “get yourself a candy”
    for incoming email with #todocomplete in the subject.
  4. Put some dark M&Ms in my fridge.
  5. Added to-dos to the list, and checked some off by typing a ‘y’ in the
    Done column.

I’ll update you all in a few days and let you know the result. I’m
already feeling positive anticipation about checking things off the list,
so I’m optimistic :slight_smile:

Here’s the script.

function onEdit(e)
{
  if (e.range.getColumn() == 2 && !e.range.isBlank())
  {
    // Record the completion date in column 3 ("Date Completed")
    var r2 = e.range.offset(0, 1);
    r2.setValue(new Date());

    // With some probability, send the todocomplete IFTTT trigger
    var r0 = e.range.offset(0, -1);
    if (Math.random() < 0.5)
    {
      Logger.log("Sending reward");
      MailApp.sendEmail("tri...@ifttt.com", "#todocomplete " + r0.getValue(), "");
    }
    else
    {
      Logger.log("Unlucky, no reward");
    }
  }
}

On Fri, Oct 11, 2013 at 9:17 PM, Brent Yorgey byo...@gmail.com wrote:

I don’t know about the optimal rate, but whatever it is, I would use
a Poisson distribution — that way, very occasionally you will get
multiple jellybeans!

-Brent
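Brent's Poisson idea could be sketched with Knuth's sampling algorithm
(my own illustration; the function name and the per-event rate parameter
are assumptions, not anything from the thread):

```javascript
// Sample a jellybean count from a Poisson distribution (Knuth's algorithm).
// lambda is the average number of jellybeans per completed task; most events
// yield 0 or 1, but occasionally several come out at once.
function poissonSample(lambda) {
  var limit = Math.exp(-lambda);
  var k = 0;
  var p = 1;
  do {
    k += 1;
    p *= Math.random(); // multiply uniform draws until the product falls below e^-lambda
  } while (p > limit);
  return k - 1; // number of jellybeans to dispense (may be 0)
}
```

With lambda = 1/3 the long-run candy rate matches a "reward with
probability 1/3" scheme, but the occasional multi-jellybean jackpot adds
variety to the reinforcement.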

On Fri, Oct 11, 2013 at 9:08 PM, Daniel Reeves dre...@beeminder.com wrote:

Thanks Jake and Paul! Does anyone know a concise recommendation for a
randomized reward schedule? (Eg, when the desired behavior is
exhibited, dispense a jellybean with probability 1/3. Or maybe it
shouldn’t be stateless like that and should, say, limit dry spells of
jellybeanlessness.)

Perhaps the answer is buried in here:
http://en.wikipedia.org/wiki/Reinforcement
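The stateful variant mentioned above, which limits dry spells, might look
something like this. A minimal sketch, assuming a per-event probability p
and a cap maxDry on consecutive misses; both names are mine, not from the
thread:

```javascript
// Reinforce with probability p, but guarantee a reward after maxDry
// consecutive misses, so dry spells of jellybeanlessness are bounded.
function cappedRandomSchedule(p, maxDry) {
  var dry = 0; // consecutive unrewarded events so far
  return function shouldReward() {
    if (dry >= maxDry || Math.random() < p) {
      dry = 0;
      return true;
    }
    dry += 1;
    return false;
  };
}
```

Note the trade-off: the cap makes the schedule slightly more predictable,
which may cut against the intermittent-reward effect.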

On Fri, Oct 11, 2013 at 5:08 PM, Paul Fenwick paul.j....@gmail.com
wrote:

Anyway, anyone tried anything like this? Any results, positive or
negative?

My exobrain¹ gives me HabitRPG XP for responding to email, flossing my
teeth (via a beeminder callback), recording my weight, and other
bits and pieces. However, the novelty value for that wore off pretty
quickly. Jellybeans might indeed work better, as they’re less abstract
and more delicious. :slight_smile:

¹ https://github.com/pjf/exobrain/




http://dreev.es – search://“Daniel Reeves”
Goal tracking + Commitment contracts == http://beeminder.com

