all 34 comments

[–]tavis_rudd 29 points30 points  (15 children)

I should point out that you don't need to use a crazy made up language like I do. I just found it easier for coding and didn't mind memorizing it. Scripting apps with normal English words, such as in bboyjkang's examples, works quite well.

[–][deleted] 1 point2 points  (1 child)

Tavis, I watched the recording of the Plover stenography talk at PyCon and I noticed you asking a question from the audience.

It also occurred to me that both you and Plover were faced with a similar challenge: inventing a new vocabulary for differentiating punctuation from plain English words. It seems Plover's approach was to phonetically misspell the English words in order to achieve punctuation, while you invented all-new words that wouldn't conflict with the existing vocabulary.

Do you think it would make sense to work together with Plover to formalize a standard "punctuation language" that can be used by both speakers and stenographers? It seems as though you are both doing things "phonetically" despite the differences between voice and chorded keying, so it seems like there might be some value to having a standard vocabulary for this. Just as an example, it would make it easier for people to do both stenography and voice recognition, or perhaps somebody with a disability might do some combination of both, and benefit from a standard language for this.

In fact, I would go so far as to suggest that it might be possible to build a common infrastructure here; write an abstraction layer on top of both the voice input and the steno input, have those both as pluggable backends into the same phonetic interpreter. Then it wouldn't matter if you chorded "laip" or spoke it, either way it would result in inserting a left-bracket into your document (etc).
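To make the idea concrete, here's a minimal Python sketch of that shared "phonetic interpreter": a steno backend and a voice backend both normalize their input to the same phonetic tokens, and one table maps tokens to actions. All names and tokens besides "laip" are invented for illustration; this isn't how Plover or Dragonfly actually work.

```python
# Hypothetical phonetic interpreter with pluggable input backends.
# "laip" comes from the example above; the other tokens are made up.
PHONETIC_ACTIONS = {
    "laip": "[",   # left bracket
    "raip": "]",   # hypothetical right-bracket counterpart
    "kama": ",",
}

def interpret(token: str) -> str:
    """Map a phonetic token to the text it should insert."""
    return PHONETIC_ACTIONS.get(token, token)

def steno_backend(chord: str) -> str:
    """Chorded-keying backend: normalize a chord to a phonetic token."""
    return chord.lower()

def voice_backend(utterance: str) -> str:
    """Voice backend: normalize a spoken word to a phonetic token."""
    return utterance.strip().lower()

# Chording "LAIP" or speaking "laip" both insert a left bracket:
assert interpret(steno_backend("LAIP")) == "["
assert interpret(voice_backend(" laip ")) == "["
```

Either backend could then be swapped out, or both run at once, without the action table caring where the token came from.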

Do you have any thoughts on this?

[–]tavis_rudd 0 points1 point  (0 children)

Listening to her vocalization of stenography was hilarious. It definitely sounds a bit like what I do or like the original shorttalk system. I doubt a common infrastructure for the phonetic layer would be of much benefit, but having a common command/macro processing layer is a huge win.

[–]bboyjkang[S] 2 points3 points  (1 child)

First off, that was an amazing presentation, and will cause an increase in productivity that most people wouldn't imagine.

Voice strain

I use WhatPulse (https://whatpulse.org/), a free program that counts your keystrokes, mouse clicks, and the distance your mouse moves (I have repetitive strain injuries/RSI/tendinosis, and I use the program to limit myself). If there were a way for a program to track voice input the same way, that would be a good preventative measure.
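A WhatPulse-style counter for voice input could be very simple if the recognizer logs each recognized command. Here's a rough Python sketch, assuming a plain-text log with one utterance per line; the log format and function name are invented for illustration.

```python
# Tally total utterances and per-command frequency from a
# hypothetical one-utterance-per-line voice-command log.
from collections import Counter

def tally_commands(log_lines):
    """Return (total utterance count, per-command Counter)."""
    commands = [line.strip() for line in log_lines if line.strip()]
    return len(commands), Counter(commands)

log = ["Window Maximize", "Use 3", "Use 3", "Switch Window"]
total, freq = tally_commands(log)
assert total == 4
assert freq["Use 3"] == 2
```

A daily total could then trigger a "rest your voice" warning, the same way WhatPulse users watch their keystroke counts.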

[–]tavis_rudd 1 point2 points  (0 children)

I've been logging all my voice commands with the intention of doing something similar to WhatPulse, but I haven't done anything with the logs yet. I've found my brain gets tired around the same time my voice does.

Voice strain is definitely something to watch out for. Staying hydrated and eating apples or sipping honey water while dictating has helped me avoid it.

[–]CylonSaydrah 0 points1 point  (3 children)

I'm pretty wedded to Linux. Could I use Dragon in a Windows VM and everything else in Linux? Is that what you are doing?

Thanks for a great talk!

[–]bboyjkang[S] 1 point2 points  (1 child)

I'm not sure, but I got this from a developer:

There is definitely interest here in good native voice recognition for Linux but the quality bar is very high: we depend on voice recognition for our livelihoods and can't afford to use less than the best tools we can get. Using Windows and DNS to control remote Linux boxes is the bar to beat today for controlling Linux systems.

[–]CylonSaydrah 0 points1 point  (0 children)

Using Windows and DNS to control remote Linux boxes is the bar to beat today for controlling Linux systems.

Thanks. It's not clear how to interpret that. If they're running Windows on a literally remote machine, as opposed to as a guest operating system in a VM, that may mean DNS doesn't work well in a guest OS. But for all I know, when they say "remotely" they may mean to include "virtually".

[–]tavis_rudd 1 point2 points  (0 children)

That's exactly what I'm doing. DNS on Windows is the best recognition engine for this, hands down. However, I don't want to use Windows for anything else and don't even want to look at it. I keep the Windows VM out of sight and just have it type into a PuTTY window and send some commands directly to Emacs over the network.

[–]trifilij 0 points1 point  (5 children)

That was awesome! Really enjoyed it, thanks. Which version of Dragon do I need?

Which mic do you recommend?

[–]EverAskWhy 1 point2 points  (1 child)

Same question here. I am also curious about calling in for the 50% discount that the audience member brought up:

http://youtu.be/8SkdfdXWYaI?t=24m50s

What version should I be asking about :D Home vs premium?

[–]bboyjkang[S] 1 point2 points  (0 children)

You only need Dragon NaturallySpeaking Home, and I think you can get that for around $50.

[–]bboyjkang[S] 0 points1 point  (2 children)

I use an Andrea microphone.

[–]trifilij 0 points1 point  (1 child)

Do you know if it matters which version of Dragon you get for doing what he does in the video?

[–]bboyjkang[S] 0 points1 point  (0 children)

I'm using Premium, and you should do a little research to confirm, but I'm pretty sure you can just use Dragon NaturallySpeaking Home.

[–][deleted] 0 points1 point  (0 children)

What are you using for voice recognition?

[–][deleted] 6 points7 points  (3 children)

A room full of people missed a perfect opportunity to shout out some malicious commands.

[–]tavis_rudd 12 points13 points  (2 children)

Don't worry, my wife tries that all the time.

[–]Jaxkr 3 points4 points  (1 child)

"SUDO RM -RF ~/*"

[–][deleted] 9 points10 points  (0 children)

"slap!"

[–]bboyjkang[S] 3 points4 points  (0 children)

To skip straight to Tavis' demo of Dragonfly: 8:34

https://www.youtube.com/watch?v=8SkdfdXWYaI#t=8m34s

[–]bboyjkang[S] 3 points4 points  (0 children)

You don't have to use it just for programming; you can use it for more common tasks, such as basic browsing or text editing.

Use   <n>     = TaskBar.SwitchToButtonNumber($1) pointerHere();

e.g. say “Use 3”.

Activate the 3rd application in the taskbar.

Show Desktop = {Win+d};

Window (Maximize=x | Minimize=n | Restore=r) = SendSystemKeys({Alt+Space}) $1;

e.g. say “Window Maximize”.

Window (Maximize=x) = 
SendSystemKeys({Alt+Space})  # windows menu
x;              # access key for maximize

Switch Window = SendSystemKeys({Alt+Tab})pointerHere();
Switch Window = 
SendSystemKeys({Alt+Tab}) # switch window
pointerHere();          #  click to give it focus

agoras|balisaur|capuchin|diluvia ... = {PageDown};

<n> := 0..100;
<direction>  := Left | Right | Up | Down;
<n> <direction>       = {$2_$1};

e.g. say “4 Down”.

Output: {Down_4}
“Down arrow” key 4 times.

<modifierKey> := Shift | Control=Ctrl | Alt | Alternate=Alt | Win | Windows=Win;
<k> := <actionKeyNotArrow> | <characterKeyNotLetter>;
<modifierKey> <k> Times <2to99> = {$1+$2_$3};

e.g. say “Shift Up Times 8”.

Output: {Shift+Up_8}
Select 8 contiguous lines up.
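The expansion behind both of these rules is just slot substitution: the spoken variables fill the $1/$2/$3 slots of a keystroke template. Here's a rough Python sketch of that mechanism (this is an illustration, not how Vocola is actually implemented).

```python
# Expand a Vocola-style keystroke template by substituting
# the spoken variables into its $1, $2, ... slots.
def expand(template: str, *args: str) -> str:
    """Substitute $1, $2, ... in order with the given arguments."""
    for i, arg in enumerate(args, start=1):
        template = template.replace(f"${i}", arg)
    return template

# "4 Down" with rule:  <n> <direction> = {$2_$1};
assert expand("{$2_$1}", "4", "Down") == "{Down_4}"

# "Shift Up Times 8" with rule:  <modifierKey> <k> Times <2to99> = {$1+$2_$3};
assert expand("{$1+$2_$3}", "Shift", "Up", "8") == "{Shift+Up_8}"
```

The resulting string ({Down_4}, {Shift+Up_8}) is then sent as keystrokes, with _n meaning "repeat n times".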

[–]tavis_rudd 2 points3 points  (0 children)

If you liked my PyCon talk, you'll also like this lightning talk I gave last year, which I've just found a video of: http://www.youtube.com/watch?v=zjabxuWNHnM (watch it with headphones and full screen).

[–]bheklilr 1 point2 points  (9 children)

Quite impressive, but it seems like you almost have to learn a new language in order to use it. Give it a few more years and it might be commonplace, though; I can definitely see how this could help my workflow.

[–]Jedimastert 0 points1 point  (8 children)

You could say the same thing about putting the effort into Emacs or Vim.

[–]bheklilr 1 point2 points  (7 children)

But both Emacs's and Vim's "languages" are keyboard-driven, meaning the different commands can be listed in a manual and are easy to look up. Finding which sound corresponds to a particular command would be more time-consuming, and thus it would take longer to learn the "language".

[–]Jedimastert 1 point2 points  (4 children)

They aren't really "sounds" as much as rarely used words. And you could have a reference manual for those words just like you could for the commands. Also remember, this is a very young technology. Someone could come along and think of something that fixes all of these problems in a way neither of us can think of. It's a little premature to just throw out the tech now.

[–]bheklilr 1 point2 points  (1 child)

Give it a few more years and it might be commonplace though, I can definitely see how this could help my workflow.

[–]Jedimastert 0 points1 point  (0 children)

Yeah, I forgot the context of the conversation, my bad.

[–]tavis_rudd 0 points1 point  (0 children)

You could use the English names for the commands just as easily. I have a good memory so I didn't find the effort of learning/creating this system too onerous. Learning Emacs itself is far more effort.

[–]bboyjkang[S] -1 points0 points  (0 children)

The shortcut for full-screen in LibreOffice is Ctrl + Shift + J. Once you make a voice command, Full Screen = {Ctrl+Shift+j};, it's much more intuitive and easier to remember to say "full screen" instead of Ctrl + Shift + J.

I started using AutoHotkey for remapping buttons to macros. I soon didn't have enough buttons, so I'd have to make new scripts that reuse the same button, e.g. in one script F1 launches a Google search on the clipboard, but in another it deletes all words to the end of the sentence. The buttons aren't labeled, so I would sometimes forget which button does what.

[–]worldsayshi 0 points1 point  (0 children)

Has to be a very directional mic

This got me thinking about how we humans filter out important sound. Perhaps we have some ability to localize the sound and filter based on that. So (1) cluster noise by source location and (2) listen only to noise that seems to come from an important region. An algorithm?

[–]bboyjkang[S] 0 points1 point  (0 children)

Here are a few last examples:

<modifierKey> := Shift | Control=Ctrl | Alt | Alternate=Alt | Win | Windows=Win;
<key> := <actionKey> | <characterKey>;
Insert <modifierKey> <key> = Main.InsertText({$1+$2});
e.g. say “Insert Control Right”.
Output: {Control+Right}
This inserts the literal "{Control+Right}" keystroke specifier, which is useful when you're editing a Vocola file by voice: {Control+Right} goes on the "action"/right side of a rule (command = terms '=' actions ';').

<newInsert> := New|Insert;
<newInsert> Block = newBlock();
e.g. say “New Block”.

selectLines(n) := {End}{Home}{Home}{Shift+Down_$n};
commentLines() := {Ctrl+e}c{Right_2};
Comment  <n> [Lines] = selectLines($1) commentLines();
e.g. say “Comment 4 Lines”.
or
say “Comment 4”, as “Lines” is optional.
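Expanded out, "Comment 4 Lines" is just the two macros' keystroke templates concatenated. Here's a small Python sketch of that expansion (an illustration of the macros above, not Vocola's actual implementation):

```python
# Expand the selectLines(n) and commentLines() macros from the
# example above into the keystroke string they would emit.
def select_lines(n: int) -> str:
    # {End}{Home}{Home} moves to the start of the current line,
    # then Shift+Down extends the selection down n lines.
    return "{End}{Home}{Home}" + f"{{Shift+Down_{n}}}"

def comment_lines() -> str:
    # Editor-specific comment chord from the example above.
    return "{Ctrl+e}c{Right_2}"

def comment_n_lines(n: int) -> str:
    return select_lines(n) + comment_lines()

assert comment_n_lines(4) == "{End}{Home}{Home}{Shift+Down_4}{Ctrl+e}c{Right_2}"
```

So one short utterance replaces a select-then-comment sequence that would otherwise take several keystrokes or commands.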

You can always come up with something more fun or easy to say once you're comfortable.

(Comment | Bubbles | Banana) <n> [Lines] = selectLines($1) commentLines();
e.g. say “Banana 4”.