Speech recognition software for Linux Best Free Linux Speech Recognition Tools
Friday, March 29, 2019

VoxForge

VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

We will make available all submitted audio files under the GPL license, and then ‘compile’ them into acoustic models for use with Open Source speech recognition engines such as CMU Sphinx, ISIP, Julius (github) and HTK (note: HTK has distribution restrictions).

Why Do We Need Free GPL Speech Audio?

Most acoustic models used by ‘Open Source’ speech recognition (or Speech-to-Text) engines are closed source. They do not give you access to the speech audio and transcriptions (i.e. the speech corpus) used to create the acoustic model.

The reason for this is that Free and Open Source (‘FOSS’) projects are required to purchase large speech corpora with restrictive licensing. Although there are a few instances of small FOSS speech corpora that could be used to create acoustic models, the vast majority of corpora (especially large corpora best suited to building good acoustic models) must be purchased under restrictive licenses.

How Can You Help?

Record yourself reading some text and upload your recordings to VoxForge.

    News

    New VoxForge Language: Farsi (Persian)
    By kmaclean

    5/26/2015

    I would like to thank mrt_doulaty for the Farsi (Persian) translations of the VoxForge web site and speech submission applet.

    Open Speech Data Corpus for German
    By kmaclean

    4/28/2015

    VoxForge is now mirroring the LT and the Telecooperation group Open Speech Data Corpus for German, with 35 hours of speech from about 180 speakers.

    Unless otherwise indicated, © 2006-2018 VoxForge; Legal: Terms and Conditions

    Speech recognition software for Linux

    From Wikipedia, the free encyclopedia


    There are currently several speech recognition software packages for Linux. Some of them are free and open source software while others are proprietary. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for sending operational commands to a computer.

    Contents

    • 1 Native Linux speech recognition
      • 1.1 History
      • 1.2 Current development status
        • 1.2.1 Crowdsourcing of speech samples
      • 1.3 Speech recognition concept
        • 1.3.1 Speech Recognition in Browser
        • 1.3.2 Free speech recognition engines
        • 1.3.3 Proprietary speech recognition engines
    • 2 Voice control and keyboard shortcuts
    • 3 Running Windows speech recognition software with Linux
      • 3.1 Using a compatibility layer
      • 3.2 Using virtualized Windows
    • 4 See also
    • 5 References
    • 6 External links

    Native Linux speech recognition

    History

    In the late 1990s, a Linux version of ViaVoice (created by IBM) was made available to users at no charge. However, the free SDK was removed by the developer in 2002.

    Current development status

    Recently, there has been a push to get a high-quality native Linux speech recognition engine developed. As a result, numerous projects dedicated to creating Linux speech recognition solutions were established, such as Mycroft. Mycroft is similar to Microsoft’s Cortana, but open source.

    Crowdsourcing of speech samples

    It is essential to compile a speech corpus to produce acoustic models for speech recognition projects. VoxForge is a free speech corpus and acoustic model repository that was built with the aim of collecting transcribed speech to be used in speech recognition projects. VoxForge accepts crowdsourced speech samples and corrections of recognized speech sequences. It is licensed under the GPL.
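To make the idea of a transcribed speech corpus concrete, the following Python sketch pairs recording identifiers with their transcriptions and derives a vocabulary from them. The line format (`<audio_id> <transcription>`), the `Utterance` class and the sample data are assumptions made for illustration; they are not taken from VoxForge's actual file layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    audio_id: str    # identifier of the recording (hypothetical naming scheme)
    transcript: str  # the words the speaker read aloud, upper-cased


def parse_prompts(text: str) -> List[Utterance]:
    """Parse lines of the assumed form '<audio_id> <transcription...>'."""
    utterances = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        audio_id, _, transcript = line.partition(" ")
        if not transcript:
            continue  # a recording without a transcription cannot be used for training
        utterances.append(Utterance(audio_id, transcript.strip().upper()))
    return utterances


# Hypothetical prompt list; the last entry has no transcription and is dropped.
SAMPLE = """\
# hypothetical prompt list
a0001 the quick brown fox
a0002 jumps over the lazy dog
a0003
"""

corpus = parse_prompts(SAMPLE)
# The set of distinct words covered by the transcriptions.
vocabulary = sorted({word for u in corpus for word in u.transcript.split()})
```

Training an acoustic model needs the audio itself as well; the sketch only shows the bookkeeping that ties each recording to its text.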

    Speech recognition concept

    The first step is to begin recording an audio stream on a Linux machine. The user has two main processing options:

    • (DSR) Discrete Speech Recognition – processes the voice recognition entirely on the local machine. This refers to self-contained systems in which all aspects of SR (Speech Recognition) are performed entirely within the user’s computer. As of 2018, this is becoming critical for protecting IP (Intellectual Property) and avoiding unwanted surveillance.
    • (Remote) Server-based SR – transmits the speech file to a remote server, which converts the audio file into a text string. Because of recent cloud storage schemes and data mining, this method more easily allows surveillance, theft of IP and introduction of malware.

    The second (remote) option was previously used on smartphones because they lacked the performance, disk space or RAM to process speech recognition on the device. These limitations have largely been overcome, although server-based SR remains widespread on mobile devices.
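The two processing options can be illustrated with a short Python sketch that uses only the standard library. It synthesizes a tone as a stand-in for a microphone recording, inspects the resulting WAV data, and then picks a processing route. The 16 kHz mono 16-bit format, the function names and the routing policy are assumptions for illustration, not something the article prescribes, and no actual recognition engine is called.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Synthesize a mono 16-bit 440 Hz tone as a stand-in for microphone capture."""
    frames = bytearray()
    for n in range(int(seconds * rate)):
        sample = int(12000 * math.sin(2 * math.pi * 440 * n / rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()


def describe(wav_bytes: bytes) -> dict:
    """Read back the audio parameters an engine would need to know."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width": w.getsampwidth(),
            "rate": w.getframerate(),
            "seconds": w.getnframes() / w.getframerate(),
        }


def choose_route(audio_is_sensitive: bool, local_engine_available: bool) -> str:
    """Hypothetical policy: prefer local processing, never upload sensitive audio."""
    if local_engine_available:
        return "local"
    if audio_is_sensitive:
        return "refuse"
    return "remote"


info = describe(make_test_wav())
route = choose_route(audio_is_sensitive=True, local_engine_available=False)
```

The policy in `choose_route` reflects the privacy concern raised above: audio only leaves the machine when no local engine is available and the content is not sensitive.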

    Speech Recognition in Browser

    Discrete Speech Recognition can be performed within a web browser and works well with supported browsers. Remote SR does not require installation of software on the desktop computer or mobile device as it is primarily a server-based system with the inherent security issues noted above.

    • (Remote): https://dictation.io (use Chromium/Chrome). The dictation service records an audio track of the user via the web browser. In turn, dictation.io uses the Google API for speech recognition. Within Google Docs, Google voice typing works within the Chrome browser regardless of operating system, as it is a server-based system.
    • (DSR): There are solutions that work on the client only, without sending data to servers, e.g. pocketsphinx.js.

    Free speech recognition engines

    The following is a list of current projects dedicated to implementing speech recognition in Linux, as well as major native solutions. These are not end-user applications. These are programming libraries that a programmer may use to develop an end-user application.

    • CMU Sphinx is a general term to describe a group of speech recognition systems developed at Carnegie Mellon University.
    • Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers.
    • Kaldi is a toolkit for speech recognition provided under the Apache licence.
    • Mozilla DeepSpeech is an open source Speech-To-Text engine under development, based on Baidu’s Deep Speech research paper. It is intended for end user usage in the coming months. [1]

    Possibly active projects:

    • Lera (Large Vocabulary Speech Recognition) is based on Simon and CMU Sphinx for KDE. [2]
    • Speechpad.pw [3] uses Google’s speech recognition engine and Chrome native messaging API to provide direct speech input in Linux.
    • Speech [4] uses Google’s speech recognition engine to support dictation in many different languages.
    • Speech Control is a Qt-based application that uses CMU Sphinx’s tools like SphinxTrain and PocketSphinx to provide speech recognition utilities like desktop control, dictation and transcribing to the Linux desktop.
    • Platypus [5] is an open source shim that allows the proprietary Dragon NaturallySpeaking running under Wine to work with any Linux X11 application.
    • FreeSpeech, [6] from the developer of Platypus, is a free and open source cross-platform desktop application for GTK that uses CMU Sphinx’s tools to provide voice dictation, language learning, and editing in the style of Dragon NaturallySpeaking.
    • Vedics [7] (Voice Enabled Desktop Interaction and Control System) is a speech assistant for the GNOME environment.
    • GnomeVoiceControl [8] is a dialogue system to control the GNOME desktop that was developed in the Google Summer of Code in 2007.
    • NatI [9] is a multi-language voice control system written in Python.
    • SphinxKeys [10] allows the user to trigger key presses and mouse clicks by speaking into their microphone.
    • VoxForge is a free speech corpus and acoustic model repository for open source speech recognition engines.
    • Simon [11] aims at being extremely flexible to compensate for dialects or even speech impairments. It uses either HTK/Julius or CMU Sphinx, works on Windows and Linux and supports training.
    • Speeral is a group of speech recognition tools developed at the University of Avignon.
    • Jasper (https://jasperproject.github.io/) is an open source platform for developing always-on, voice-controlled applications. It is an embedded Raspberry Pi front-end for CMU Sphinx or Julius.

    It is possible for developers to create Linux speech recognition software by using existing packages derived from open-source projects.

    Inactive projects:

    • CVoiceControl [12] is a KDE and X Window independent version of its predecessor KVoiceControl. The owner ceased development in the alpha stage.
    • Open Mind Speech, [13] a part of the Open Mind Initiative, [14] aimed to develop free (GPL) speech recognition tools and applications, as well as collect speech data. Production ended in 2000.
    • PerlBox [15] is a Perl-based control and speech output application. Development ended in early stages in 2004.
    • Xvoice [16] is a user application that provides dictation and command control to any X application. Development ended in 2009 during early project testing. It requires the proprietary ViaVoice to function.

    Proprietary speech recognition engines

    • Verbio ASR [17] is a commercial speech recognition server for Linux and Windows platforms.
    • DynaSpeak, [18] from SRI International, is a speaker-independent speech recognition software development kit that scales from small- to large-scale systems, for use in commercial, consumer, and military applications.
    • Janus Recognition Toolkit (JRTk) [19] is a closed source speech recognition toolkit, mainly targeted at Linux, developed by the Interactive Systems Laboratories at Carnegie Mellon University and Karlsruhe Institute of Technology. Commercial and research licenses are available.
    • LumenVox Speech Engine is a commercial library for Linux and Windows for inclusion in other software. It has been integrated into the Asterisk private branch exchange system. [20]
    • VoxSigma is a speech recognition software suite developed by Vocapia Research. [21]

    Voice control and keyboard shortcuts

    Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for sending operational commands to a computer or appliance. Voice control typically requires a much smaller vocabulary and thus is much easier to implement.

    Simple software combined with keyboard shortcuts has the earliest potential for practically accurate voice control in Linux.
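Why a small vocabulary makes voice control easier can be seen in a minimal Python sketch: the recognizer only has to tell a handful of fixed phrases apart, and each phrase maps directly to a keyboard shortcut. The phrase table, the shortcut strings and the function names are hypothetical; the sketch assumes some recognizer has already produced the text and stops short of sending any keys.

```python
from typing import Dict, Optional

# Hypothetical phrase-to-shortcut table; a real setup would use the
# shortcuts configured in the user's desktop environment.
COMMANDS: Dict[str, str] = {
    "open terminal": "ctrl+alt+t",
    "close window": "alt+F4",
    "switch window": "alt+Tab",
    "copy": "ctrl+c",
    "paste": "ctrl+v",
}


def normalize(utterance: str) -> str:
    """Lower-case the recognized text and collapse repeated whitespace."""
    return " ".join(utterance.lower().split())


def to_shortcut(utterance: str) -> Optional[str]:
    """Return the shortcut for a recognized phrase, or None if it is not a command."""
    return COMMANDS.get(normalize(utterance))


matched = to_shortcut("  Open   Terminal ")
unmatched = to_shortcut("write an essay about speech recognition")
```

Anything outside the table is rejected rather than guessed at, which is what keeps accuracy practical with simple software. The returned string could then be handed to a key-injection tool such as xdotool.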

    Running Windows speech recognition software with Linux

    Using a compatibility layer

    It is possible to use programs such as Dragon NaturallySpeaking in Linux by utilizing Wine, though some problems may arise, depending on which version is used. [22]

    Using virtualized Windows

    It is also possible to use Windows speech recognition software under Linux through virtualization. Using no-cost virtualization software, it is possible to run Windows and NaturallySpeaking under Linux. VMware Server and VirtualBox both support copy and paste to and from a virtual machine, making dictated text easily transferable.

    See also

    • Speech recognition
    • Speech interface guideline
    • List of speech recognition software

    References

    1. A TensorFlow implementation of Baidu’s DeepSpeech architecture, Mozilla, 2017-12-05, retrieved 2017-12-05.
    2. Lera KDE git repository (2015). https://cgit.kde.org/scratch/grasch/lera.git/ Retrieved 2017-07-25.
    3. “Speech to text online, Windows and Linux integration”. speechpad.pw.
    4. “andre-luiz-dos-santos/speech-app”. GitHub. 2018-07-12.
    5. “The Nerd Show – Platypus”. thenerdshow.com.
    6. “FreeSpeech Realtime Speech Recognition and Dictation”. TheNerdShow.com.
    7. “Vedics”.
    8. “Projects/GnomeVoiceControl – GNOME Wiki!”. wiki.gnome.org.
    9. “rcorcs/NatI”. GitHub. 2018-09-24.
    10. “worden341/sphinxkeys”. GitHub. 2016-07-11.
    11. Simon KDE. Main developer until 2015: Peter Grasch. http://simon.kde.org/ Accessed 2017-09-04.
    12. Kiecza, Daniel. “Linux”. www.kiecza.net.
    13. “Open Mind Speech – Free Speech Recognition for Linux”. freespeech.sourceforge.net.
    14. Open Mind Initiative. Archived 2003-08-05 at the Wayback Machine.
    15. “Perlbox.org Linux Speech Control and Voice Recognition”. perlbox.sourceforge.net.
    16. “Xvoice”. xvoice.sourceforge.net.
    17. “Verbio”. www.verbio.com.
    18. “SRI Speech: Home”. www.speechatsri.com.
    19. Roedder, Margit (26 January 2018). “KIT – Janus Recognition Toolkit”. isl.ira.uka.de.
    20. “Speech Recognition Software – LumenVox”. Retrieved 2013-02-28.
    21. “Speech to Text Software & Service – Speech Recognition Software”. www.vocapia.com.
    22. “WineHQ – Dragon Naturally Speaking”. appdb.winehq.org.

    External links

    • Speech Synthesis & Analysis Software
    • Gnome Voice Control (an incomplete speech recognition solution for GNOME) – Demonstration
    • Speech Recognition Software – list of speech recognition projects and solutions in Linux
    • Accessibility / SpeechRecognition – Ubuntu Help
    • Alternatives to Nuance Dragon NaturallySpeaking

    Retrieved from "https://en.wikipedia.org/w/index.php?title=Speech_recognition_software_for_Linux&oldid=863365971"
    This page was last edited on 10 October 2018, at 09:53 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.