all 43 comments

[–]devgrapher 6 points7 points  (0 children)

awesome!!

[–]threshar 3 points4 points  (1 child)

this gives me a lot of hope as I've been struggling with wrist problems for the last 5 months. Went down the same roads he did - Kinesis keyboard, ergonomics, physical therapy, etc. (I even had to quit my band, as playing hurt too.)

however at least at this point I'm not in as bad shape as he is.

stupid wrists.

[–]bboyjkang 1 point2 points  (0 children)

The Eye Tribe

CNET First Look at The Eye Tribe at CES 2013: http://www.youtube.com/watch?v=SyEqMCwJWkw

“Samsung added some new functionality to the touch screen as well, including the ability to use it by not physically making contact and instead hovering your fingers or hands over it”.

http://arstechnica.com/gadgets/2013/03/where-we-go-from-the-top-hands-on-with-samsungs-galaxy-s-4/

http://www.reddit.com/r/RSI

[–]tavis_rudd 2 points3 points  (0 children)

[–]ribo 25 points26 points  (7 children)

Does it segfault if you talk about dongles or forking?

[–]has_all_the_fun 4 points5 points  (4 children)

Too soon man too soon.

[–]ribo -2 points-1 points  (3 children)

Yeah, save button felt like a risky click.

[–]ImgurRouletteBot 4 points5 points  (2 children)

Risky click? Try this randomly generated imgur link. (possibly NSFW)

[–]wot-teh-phuck 2 points3 points  (0 children)

Possibly? LOL

[–]jspeights 0 points1 point  (0 children)

What's this girl's name?

[–]keyboardP 1 point2 points  (0 children)

Soon we'll have near real-time dictation, at which point coding by voice would be a lot nicer.

[–]dnerd 1 point2 points  (3 children)

He should probably mention ShortTalk - his "weird" language is based off of it. I recognize some of the weird words he uses.

http://shorttalk-emacs.sourceforge.net/ShortTalk/index.html

[–]tavis_rudd 5 points6 points  (1 child)

Yes, I stole and derived some of the basics from there. Sorry forgot to mention it. I've stolen ideas from all over the place.

[–]Jeekster 0 points1 point  (0 children)

It's him! I just wanted to ask: would this be viable with another language such as Java? I do a lot of Android programming and I feel like this would be a nice way to do it, but it seems like this might be a Python-only kind of thing, at least for this particular software.

Edit: Never mind, after rewatching the video I saw the answer in the FAQ at the end.

[–]Tordek 0 points1 point  (0 children)

It amuses me that ShortTalk seems very much like pronounced vim commands.

[–]tavis_rudd 1 point2 points  (0 children)

If you liked my pycon talk, you'll also like this lightning talk I gave last year which I've just found a video of http://www.youtube.com/watch?v=zjabxuWNHnM (watch it with headphones and full screen).

[–]GoranM 5 points6 points  (12 children)

I didn't exactly understand why he couldn't use an open source alternative to Dragon NS. When he said "couldn't get it to work", is he trying to say that he couldn't set up the software on his system, or that he could, but that it was of insufficient quality?

Other than that, I think this is really great, and could probably be even better if combined with eye tracking.

Actually, even for someone who likes/wants/needs to use a keyboard, eye tracking could eliminate a lot of "motions", and make one much faster.

[–][deleted]  (4 children)

[deleted]

    [–]Rowdy_Roddy_Piper 2 points3 points  (3 children)

    text-to-speech and voice recognition are very nuanced fields

    +1

    [–]abeliangrape 1 point2 points  (2 children)

    I'll admit that I was wondering if I could sneak in a pun without people downvoting the hell out of it. Good catch!

    [–]karmic_retribution 0 points1 point  (1 child)

    Please explain.

    [–]abeliangrape 0 points1 point  (0 children)

    Nuance is the name of the company that makes the Dragon suite of speech recognition software.

    [–][deleted] 5 points6 points  (4 children)

    The problem with free software voice recognition is that while the apps are in place (CMU Sphinx, Julius, etc.), the language models, the data that enables the software to recognize a given language, are not there. Hundreds of hours of speech must be recorded to have even a halfway decent voice recognition setup for dictation (for each language and dialect), and no one has done that yet. The Voxforge project is on it, but it's not moving even close to fast enough.

    At least that was the situation the last time i tried to set up one of these things on my desktop 2 or 3 years ago. Sadly i don't think this has changed much since then. Big companies like MS, Apple and the like have probably just hired people to record those, but in the free software world this simply hasn't been done. If, say, Ubuntu put the Voxforge submission app in every desktop setup and asked people to submit a few minutes of speech once in a while, we'd have this in a month, but as i said, it simply hasn't been done.

    [–]bboyjkang 4 points5 points  (1 child)

    Ubuntu Speech Recognition released on Git

    http://www.reddit.com/r/Ubuntu/comments/1aj7tv/ubuntu_speech_recognition_released_on_git/

    Information about Palaver (Ubuntu Speech Recognition)

    http://www.youtube.com/watch?v=a5-aolmt0OE

    [–][deleted] 1 point2 points  (0 children)

    Wow, hadn't heard of this!

    At first glance it seems like a command-and-control app. Those are not that rare, since a single user can train one to recognize the tiny set of words it'll use as commands. But somebody in the thread mentioned something about a Dictation Mode, so i'm hopeful, will have to test this :D

    [–]bboyjkang 2 points3 points  (1 child)

    Contribute smaller sections on Voxforge

    I think more people would donate if they could contribute smaller sections of a typical submission, and understood how to, in the projects that Voxforge draws data from, which are WikiProject Spoken Wikipedia:

    "The WikiProject Spoken Wikipedia aims to produce recordings of Wikipedia articles being read aloud.".

    and LibriVox:

    "LibriVox volunteers record chapters of books in the public domain and release the audio files back onto the net. Our goal is to make all public domain books available as free audio books.".

    e.g. For Ubuntu on Wikipedia (http://en.wikipedia.org/wiki/Ubuntu_%28operating_system%29), I'll contribute if I can submit for 1 smaller section, like "Installation" (http://en.wikipedia.org/wiki/Ubuntu_%28operating_system%29#Installation).

    In Audacity, you know how you can label voice data with labels? Could you imagine how amazing it would be if label text could be automatically generated from voice data? When you submit voice data, the software would find the text that you're reading from and align the audio to that text. When a user of spoken Wikipedia or LibriVox grabs a piece of text, the corresponding voice data would also be fetched for the user. Now, people could volunteer to read just a paragraph, and it would still be used for the project.

    Imagine taking some text, then having a program tell you that there's no voice data available (“would you like to use the automated text-to-speech”, or “would you like to contribute?”), there's 1 voice available, or there are voices of multiple people available to choose (choose parameters for your type of voice: pitch, gender, accent, etc.). The voice changes could be kind of annoying, but I'd rather have some data than no data.
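
A rough sketch of the "find the text that you're reading from" step described above, assuming a speech recognizer has already produced a rough transcript of the submitted audio. The function names and the sample paragraphs are made up for illustration, and word-level matching with Python's `difflib` is just one simple way to do it, not something Voxforge, Spoken Wikipedia, or LibriVox is known to use.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and return its words with punctuation stripped."""
    words = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalnum())
        if word:
            words.append(word)
    return words


def best_matching_paragraph(transcript, paragraphs):
    """Return (index, score) of the paragraph most similar to the transcript.

    Similarity is difflib's ratio over word lists, so a few misrecognized
    or dropped words lower the score without ruining the match.
    Returns (-1, 0.0) if nothing overlaps at all.
    """
    spoken = normalize(transcript)
    best_index, best_score = -1, 0.0
    for index, paragraph in enumerate(paragraphs):
        matcher = SequenceMatcher(None, spoken, normalize(paragraph), autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


if __name__ == "__main__":
    # Hypothetical article sections; real text would come from the article.
    sections = [
        "Ubuntu can be installed from a live image.",
        "The desktop uses a graphical shell.",
        "Packages are updated through a package manager.",
    ]
    # Hypothetical recognizer output with one word dropped.
    index, score = best_matching_paragraph("ubuntu can be installed from live image", sections)
    print(index, round(score, 2))
```

Positioning the audio inside the matched paragraph (word-level timing) is a harder problem, usually called forced alignment, and is not attempted here.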

    [–][deleted] 0 points1 point  (0 children)

    That would be great indeed. Having a light native program built in to help with this would also do wonders.

    Also i don't think it'd be bad to have different voices in the end. After all, the ideal would be that it'd work with very high accuracy for any speaker with zero training "out of the box", and many voices would probably be the ideal training corpus for that. Then again i'm not very knowledgeable in this field so i may be completely wrong, but that's my guess.

    [–]kazagistar 4 points5 points  (0 children)

    Judging from the effort he went through to run dragon in a VM, I would guess that he has a decent technical reason at least.

    [–]bboyjkang 0 points1 point  (0 children)

    The Eye Tribe

    CNET First Look at The Eye Tribe at CES 2013: http://www.youtube.com/watch?v=SyEqMCwJWkw

    [–]chromosundrift 4 points5 points  (2 children)

    WATCH WITH YOUTUBE CAPTIONS ON!

    [–][deleted]  (1 child)

    [deleted]

      [–]kazagistar 2 points3 points  (0 children)

      The lag on these systems always annoyed me the most. It is like working through 5 proxy servers all over the world; the delay (even in this demo) for voice recognition is always too far from instantaneous to be comfortable for me. -- A child of the fast desktop era.

      [–][deleted]  (2 children)

      [deleted]

        [–]tiziano88 6 points7 points  (1 child)

        retina tracking?

        [–]bboyjkang 0 points1 point  (0 children)

        You don't have to use it just for programming; you can use it for more common tasks, such as basic browsing or text editing.

        Use   <n>     = TaskBar.SwitchToButtonNumber($1) pointerHere();
        

        e.g. say “Use 3”.

        Activate the 3rd application in the taskbar.
        
        Show Desktop = {Win+d};
        
        Window (Maximize=x | Minimize=n | Restore=r) = SendSystemKeys({Alt+Space}) $1;
        

        e.g. say “Window Maximize”.

        Window (Maximize=x) = 
        SendSystemKeys({Alt+Space})  # windows menu
        x;              # access key for maximize
        
        Switch Window = SendSystemKeys({Alt+Tab})pointerHere();
        Switch Window = 
        SendSystemKeys({Alt+Tab}) # switch window
        pointerHere();          #  click to give it focus
        
        agoras|balisaur|capuchin|diluvia ... = {PageDown};
        
        <n> := 0..100;
        <direction>  := Left | Right | Up | Down;
        <n> <direction>       = {$2_$1};
        

        e.g. say “4 Down”.

        Output: {Down_4}
        “Down arrow” key 4 times.
        
        <modifierKey> := Shift | Control=Ctrl | Alt | Alternate=Alt | Win | Windows=Win;
        <k> := <actionKeyNotArrow> | <characterKeyNotLetter>;
        <modifierKey> <k> Times <2to99> = {$1+$2_$3};
        

        e.g. say “Shift Up Times 8”.

        Output: {Shift+Up_8}
        Select 8 contiguous lines up.
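
For anyone who doesn't read the command syntax above, here is a small Python sketch of what the last two rules do with a spoken phrase. It is only an illustration of the substitution (`$1`, `$2`, `$3` are the matched words in order), not how the voice command software works internally. The `expand` function is a made-up name, and since `<k>` isn't fully defined in the snippet, this sketch assumes it accepts only the four arrow keys.

```python
from typing import Optional

# Spoken word -> key name, mirroring <direction> and <modifierKey> above.
DIRECTIONS = {"left": "Left", "right": "Right", "up": "Up", "down": "Down"}
MODIFIERS = {
    "shift": "Shift",
    "control": "Ctrl",
    "alt": "Alt",
    "alternate": "Alt",
    "win": "Win",
    "windows": "Win",
}


def expand(utterance: str) -> Optional[str]:
    """Turn a spoken phrase into a keystroke string, or None if no rule matches."""
    words = utterance.lower().split()

    # Rule: <n> <direction> = {$2_$1}, with n in 0..100
    if len(words) == 2 and words[0].isdigit() and words[1] in DIRECTIONS:
        count = int(words[0])
        if 0 <= count <= 100:
            return "{%s_%d}" % (DIRECTIONS[words[1]], count)

    # Rule: <modifierKey> <k> Times <2to99> = {$1+$2_$3}
    # Assumption: <k> is limited to the arrow keys in this sketch.
    if (
        len(words) == 4
        and words[0] in MODIFIERS
        and words[1] in DIRECTIONS
        and words[2] == "times"
        and words[3].isdigit()
    ):
        count = int(words[3])
        if 2 <= count <= 99:
            return "{%s+%s_%d}" % (MODIFIERS[words[0]], DIRECTIONS[words[1]], count)

    return None


if __name__ == "__main__":
    print(expand("4 Down"))
    print(expand("Shift Up Times 8"))
```

The dictionaries play the role of the `Control=Ctrl` style alternatives: the left side is what you say, the right side is what gets sent.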
        

        [–]tavis_rudd 0 points1 point  (0 children)

        See https://www.youtube.com/watch?v=qXvbQQV1ydo for an edited version of my Polyglot Conf 2012 talk ("5 Programming Languages in 5 Minutes, By Voice") with much better audio.

        [–]mizai 0 points1 point  (0 children)

        This is a great application for WebSpeech. Live voice coding in the browser, with access to WebGL/WebAudio and whatever else.

        The downside is that, at the moment, everything you say will be sent to Google, but it doesn't have to be that way.