Using a voice recorder for low-friction datapoint entry?

Note: I haven’t used this specific workflow myself, but it seems like it should be possible, and might be useful to someone.


Recently I bought a voice recorder (Sony TX660). It’s super small, weighs almost nothing, and records with a single button press even if it’s turned off. A month later, I’m still using it intensively.

Coincidentally,

  • the Whisper speech recognition model (from OpenAI) is rather cheap and rather accurate,
  • and the ChatGPT API can easily be prompted to automatically categorize pieces of text into whatever “buckets” you give it.

This means that if you have Beeminder tasks like “express gratitude more often” or “drink more water” or “eat three times a day” or whatever else is done throughout the day rather than in bursts, you can either

  • use your phone to record the datapoints,
  • or use a voice recorder and submit datapoints via voice recognition + automatic categorization at the end of the day.

Pros:

  • You don’t have to ever touch your phone, which is good if you have ADHD.
  • You can enter datapoints even if the phone is discharged.
  • You get to say things out loud more often, which is actually nice (try saying “KABOOM” right now to see if you’re this kind of person)

Cons:

  • You might freak people out by taking out your voice recorder and whispering “hey, I expressed gratitude now”. (Although I’ve been going around with a voice recorder for a while and nobody complained yet.)

If somebody wants to try out a scheme like this, DM me and I can help you with coding it up.
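For anyone curious what “coding it up” might involve, here’s a minimal sketch of the categorization step. The bucket names and descriptions are made-up examples; the resulting prompt would be sent to the ChatGPT API, and the single-word reply mapped to a datapoint on the matching goal.

```python
# Sketch: turn a day's worth of transcribed memos into Beeminder "buckets".
# Bucket names and descriptions below are hypothetical examples.

BUCKETS = {
    "gratitude": "the speaker expresses gratitude for something",
    "water": "the speaker reports drinking water",
    "meals": "the speaker reports eating a meal",
}

def build_classification_prompt(transcript: str) -> str:
    """Build a prompt asking the model to pick exactly one bucket (or 'none')."""
    bucket_lines = "\n".join(f"- {name}: {desc}" for name, desc in BUCKETS.items())
    return (
        "Classify this voice memo into exactly one of the buckets below, "
        "or answer 'none' if nothing fits.\n\n"
        f"{bucket_lines}\n\n"
        f'Memo: "{transcript}"\n\n'
        "Answer with the bucket name only."
    )
```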

8 Likes

This is really cool! Where do you find your ideas?
I would love something like this because I don’t like having my phone with me for beeminding, since I’m a bit addicted to it. I would love a device that would list my beemergencies and that I could talk to

(“hey, I just cleaned my room!” “hey, I just took a shower!”)

→ all kinds of physical tasks that are hard to do without Beeminder, but also hard to do with my phone on


also what do you use your recorder for, I wonder?

4 Likes

That’s really cool. Is it hard to use the Whisper API?

1 Like

also what do you use your recorder for, I wonder?

I have a post about it:

Basically, I have some long-running things that I want to track over a period of potentially up to several years. These include:

  • recording whenever I drink alcohol, whether I enjoyed it, and what the consequences were — so far it has made it much easier not to drink, because the consequences are always “my head aches”;

  • recording problems that I notice in my own life — I feel like “things never get solved” so I want to track whether they actually ever get solved or not;

  • recording changes I notice in myself — it’s easier to realize “yeah I can be wrong about this thing that I feel strongly about right now” when you have a lot of examples of changing your mind before;

  • recording my own limitations — a “know thyself” type of thing, e.g. “I get hurt easily” or “I have bad memory”;

  • and more mundane things like “TODO don’t forget about X”. Like a todo-list inbox that I always carry in my pocket but that’s relatively write-only. (I can listen to past entries on the recorder itself, but I don’t.)

6 Likes

Where do you find your ideas?

I used to carry a tiny notebook and a pen in my pocket for writing things down. During winter, it’s rather hard to write things down when I’m walking outside, because it’s winter.

Not sure how I went from that to “oh yeah I could use a VOICE RECORDER”, but eventually I did.

The whole “categorize voice memos automatically” thing only came up later when I realized that hey, I have a backlog of 300 entries and I don’t want to sort it manually.

5 Likes

That’s really cool. Is it hard to use the Whisper API?

Rather easy.

You can even use whisper-the-binary instead of the API and the voice recognition will happen locally and you won’t have to pay…

…but you either need a device with a GPU, or it will be pretty slow. I have an iMac M1 and Whisper can’t use M1’s GPU yet, so that’s why I’m using the API.

I’ve uploaded the code here: voice recorder yeah · GitHub. Whisper is called at lines 63-64.

If you want to run Whisper locally, use either openai/whisper (GPU enabled) or whisper.cpp (CPU only).
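For reference, a minimal way to drive the local openai/whisper CLI from a script (assuming `pip install openai-whisper`; the file path is a placeholder):

```python
import subprocess
from pathlib import Path

def whisper_command(audio: Path, language: str = "English") -> list[str]:
    """Build the CLI invocation for the local openai/whisper tool."""
    return ["whisper", str(audio), "--language", language, "--output_format", "txt"]

def transcribe(audio: Path) -> None:
    """Run local transcription; writes a .txt transcript into the working directory."""
    subprocess.run(whisper_command(audio), check=True)
```

On CPU this will be slow, as noted above; swapping in whisper.cpp just means changing the command list.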

5 Likes

I’m going to buy one right away. This thing is going to change my life!!!
Man, really, thanks for sharing your ideas all the time. Please continue, I always find them super interesting! I aspire to have ideas like you do and to put them into practice!


Buying the recorder

There are two versions available, so I did a bit of research, in case anyone else ends up in the same situation…

| Name | Price | Link | Weight | Dimensions | Model number | Notes |
|---|---|---|---|---|---|---|
| ICD-TX660 | 160€ | click! | 29g | 10.2 x 2 x 0.7 cm (wtf?) | ICDTX660.CE7 (link) | Sony UK and Sony FR point to this Amazon page |
| ICD-TX660 | 194€ | click! | 29g | 9.4 x 4.3 x 16.3 cm (wtf?) | SONYICDTX660 | |

It looks like one of them is a newer version that includes, among other things, USB-C connectivity and better ergonomics.

I don’t really know what’s happening, so I’ll just take the 160€ one.

5 Likes

I have the one with USB-C but I only paid €150 for it (I live in Poland) so something seems off.

Anyway, it should all be fine as long as there’s one-push recording available.

2 Likes

Update: I’m writing an app for transcribing and managing the memos. Comment here if you’re potentially interested in using it.

It might also be possible to implement commands like “send this to beeminder goal X”.

5 Likes

I got a TX660 today! I want to copy this setup to create a TagTime-like system.

I do have a question for @april as I’m trying to automate things:

I noticed the storage has a structure like this after my first several recordings:

.
├── capability_02.xml
├── MUSIC
└── REC_FILE
   └── FOLDER01
      ├── 230330_0852.mp3
      └── 230330_0852_01.mp3

As I create more recordings, will they all end up in ./REC_FILE/FOLDER01? It’s probably safe to assume I can just glob **/*.mp3 from REC_FILE, but I thought I’d ask.

2 Likes

No, sometimes it also creates FOLDER01_02 and so on. It’s safer to use REC_FILE/**/*.mp3, yeah.
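If it helps, that recursive glob is a couple of lines of stdlib Python:

```python
from pathlib import Path

def recorder_mp3s(storage_root: Path) -> list[Path]:
    """Collect every recording under REC_FILE, whether it lives in FOLDER01,
    FOLDER01_02, or any other subfolder the recorder decides to create."""
    return sorted((storage_root / "REC_FILE").glob("**/*.mp3"))
```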

1 Like

I didn’t really start transferring audio notes, despite building a primitive system for it.

I think I find more value in it being kind of like “photography, but for audio stuff”.

Maybe I’ll build an app that allows me to see what I say over time; like journaling.

I should pick up the habit of voice journaling every day!

2 Likes

I keep procrastinating on setting it up, so I created a TaskRatchet task to make sure I eventually mess around with it.

4 Likes

If you happen to use Nix, it’s trivial to run it: `nix run nixpkgs#openai-whisper ./path/to/file.mp3 --language English`

4 Likes

Are you plugging it into your computer each evening to offload and process the audio?

1 Like

That’s my idea, but I haven’t gotten that far yet. I ran into an issue with Pavlok’s notification API that’s derailed the project a bit.

2 Likes

So I’m imagining:

  • plug the device in
  • something on the computer notices the recorder’s storage
  • a script runs and transfers the files to an inputs folder
  • all recordings are submitted for transcription (maybe whisper)
  • transcript is submitted to gpt with various prompts
    • one prompt to extract user asking to add data to beeminder
    • prompts to check for various conditions that should result in data being sent to goals (“does the text express gratitude?”). Maybe conditions could be defined via fineprint hashtags or something? Or maybe just include a good enough description of the goal in its fineprint that GPT can figure it out?
    • prompt to extract todos to be submitted to TaskRatchet / other task managers?
  • transcript is submitted to whatever note-taking / archival system you prefer

Dunno. Lots of hand-waving here. Super interested to hear what types of things you guys try with it and how it goes.

It would be very nice if it could somehow auto-connect its storage to the computer via Bluetooth or something, so the recordings could be ingested without the user needing to remember to plug it in before the automations can run… Or bypass that altogether by connecting to the user’s wifi like most IoT stuff does.

I realize your smartphone does this, but then you lose the advantage of not needing to engage with your phone for all this to work. And it does sound super nice to be able to press a button and talk and not have to do anything more.

So the kinds of inputs it would be great if it could handle:

| Input speech | Action |
|---|---|
| I’m taking a walk | walks +1 |
| I’m really grateful for the sunshine and that my birthday is tomorrow | gratitude +2 |
| I commit to taking out the trash by midnight tonight or I pay $5 | add task to TaskRatchet |
| Please post a value of 5 to my pushups goal | pushups +5 |
| Tell Beeminder to take $3 | create one-off $3 Beeminder charge |

So maybe you could engineer it such that you have various GPT-powered classifiers and then you have different actions the code should take based on how the input was classified? Like maybe the “I commit…” input should add a task to TaskRatchet, but also add it to Todoist or something. So allow for how the parsed event data should be handled to be configured.
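That “classifier label → configurable actions” idea could be wired up with a plain lookup table. Everything below is hypothetical: the labels, goal slugs, and handlers are stand-ins, and the handlers just return strings describing the API calls they would make.

```python
from typing import Callable

def beeminder_datapoint(goal: str, value: float) -> str:
    # In a real system this would POST to the Beeminder API.
    return f"POST /users/me/goals/{goal}/datapoints value={value}"

def taskratchet_task(text: str) -> str:
    # In a real system this would hit the TaskRatchet API.
    return f"create TaskRatchet task: {text}"

# Each classified label maps to one or more actions, so a single input
# (e.g. "I commit...") can fan out to TaskRatchet, Todoist, a journal, etc.
HANDLERS: dict[str, list[Callable[[dict], str]]] = {
    "walk": [lambda e: beeminder_datapoint("walks", 1)],
    "gratitude": [lambda e: beeminder_datapoint("gratitude", e.get("count", 1))],
    "commitment": [lambda e: taskratchet_task(e["text"])],
}

def dispatch(label: str, event: dict) -> list[str]:
    """Run every configured action for a classified input."""
    return [handler(event) for handler in HANDLERS.get(label, [])]
```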

3 Likes

Definitely interested in taking a look at what you’ve got working so far!

1 Like

I’ve been playing around with gpt-3.5-turbo, and this prompt seems to be doing fairly well:

Prompt

Pretend you are a professional tool which generates JSON data from natural language input. Once you’ve generated the JSON data, a separate system will submit the data to Beeminder.

Here is an example JSON object:

```json
{
    "goal": "<GOAL_NAME>",
    "value": 3,
    "comment": "<USER_INPUT>",
    "system_reasoning": "<REASONING>"
}
```

The value can be any positive or negative number. Choose a number based on the user’s input in combination with the relevant goal’s description.

The following natural-language input was submitted by Beeminder user frank. This user has the following Beeminder goals:

| Name | Description |
|---|---|
| pushups | Number of pushups |
| gratitude | Number of things I have expressed gratitude for |
| dishes | Number of times I’ve completed the dishes |

Here is the natural-language input:

“<NATURAL_LANGUAGE_HERE>”

Please return the generated JSON only. Please do not return any other text.


Input / Response Pairs

| Input | Response |
|---|---|
| “I just did five pushups” | `{ "goal": "pushups", "value": 5, "comment": "I just did five pushups", "system_reasoning": "User reported completing five pushups" }` |
| “I’m really thankful that I get to go to a baseball game tomorrow and that Tony sent me a letter.” | `{ "goal": "gratitude", "value": 2, "comment": "I'm really thankful that I get to go to a baseball game tomorrow and that Tony sent me a letter.", "system_reasoning": "User expressed gratitude for 2 things in their input." }` |
| “Done with the dishes!” | `{ "goal": "dishes", "value": 1, "comment": "Done with the dishes!", "system_reasoning": "User has completed the dishes goal for the day." }` |

It would also be interesting to play around with creating a prompt that would generate a JSON array so that a single input could produce multiple datapoints in one or more goals.
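One thing worth adding before wiring this to real goals: the model’s reply should be validated before anything hits the Beeminder API, since gpt-3.5-turbo occasionally returns malformed or off-list output. A small stdlib-only check, using the goal slugs from the example prompt above:

```python
import json

KNOWN_GOALS = {"pushups", "gratitude", "dishes"}  # slugs from the example prompt

def parse_datapoint(raw: str) -> dict:
    """Parse and sanity-check the model's JSON reply before submitting it."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if data.get("goal") not in KNOWN_GOALS:
        raise ValueError(f"unknown goal: {data.get('goal')!r}")
    data["value"] = float(data["value"])  # reject non-numeric values early
    return data
```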

6 Likes

That is a really cool idea. It would make it easier to log goals. Though me being me, I would probably have trouble motivating myself to plug my audio device into my computer. Then again, Beeminder has been a great motivator.

3 Likes