software development: transcribe speech to text using public APIs -- 2

$250-750 USD

Cancelled
Posted over 7 years ago

Paid on delivery
1. Task

This software is meant to improve the efficiency and accuracy of human transcription by asking people to listen to and transcribe only those audio clips on which multiple leading speech recognition engines disagree with each other. You are asked to write a program that:

- takes an audio file as input,
- chops it into clips at sentence boundaries,
- sends these audio clips one by one to three different public speech recognition services,
- saves the audio clips, together with the text transcribed by the above services, into a MySQL database with the columns: timestamp, length, audio clip, Google, Baidu, iFlyTek, flag

where:

- timestamp (4-byte time): starting time of the audio clip in the original audio file
- length (4-byte integer): audio clip length in milliseconds
- audio clip (binary): 16-bit, 16 kHz, single-channel PCM
- Google transcription (text): UTF-8
- Baidu transcription (text): UTF-8
- iFlyTek transcription (text): UTF-8
- flag (integer): 0 if all three transcriptions are the same, 1 if two match, 2 if all are different.

2. Audio Source

The audio could be in mp3/m4a/aac/ogg/wma format. It is extracted from YouTube video. Our target is educational lectures. One example is this YouTube video: [login to view URL]. You can extract the audio with [login to view URL]; the downloadable mp3 result is at [login to view URL]. You can use this for any YouTube content.

3. Audio Segmentation

If you view an audio file with a tool (there are many out there), you will see the separation between silences and voices. Some silences are merely word boundaries or even just syllable boundaries. The rule we ask you to implement is to cut where either the silence is long enough or the "sentence" is already 7 seconds long. In the latter case we need to chop at a locally longest silence gap. I see this sentence boundary identification as the most challenging part for those not familiar with audio signal processing, so I have outlined the logic above. Still, the next question is how to actually calculate "silence".
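One possible answer to the silence question, as a minimal sketch and not necessarily the method from the page linked in this section: split the 16 kHz PCM samples into short frames, treat a frame as silent when its RMS energy falls below a threshold, cut at the middle of every long silence, and, when a clip would otherwise reach the 7-second limit, cut at the longest shorter gap inside it. The 20 ms frame, the threshold value, the 300 ms "long enough" silence, and all function names are assumptions chosen for illustration; only the 7-second limit and the 16 kHz format come from the description.

```python
import math

# Assumed tuning values (only the 16 kHz rate and the 7 s limit are from the spec).
FRAME_LEN = 320        # samples per frame: 20 ms at 16 kHz (assumption)
SILENCE_RMS = 100.0    # frames with RMS below this count as silent (assumption)
LONG_SILENCE = 15      # frames: 300 ms counts as "long enough" (assumption)
MAX_CLIP = 350         # frames: 7 seconds


def frame_rms(samples, frame_len):
    """RMS energy of each full, non-overlapping frame."""
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return out


def silence_runs(rms, threshold):
    """(first_frame, frame_count) for each run of frames below threshold."""
    runs, start = [], None
    for i, energy in enumerate(rms):
        if energy < threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(rms) - start))
    return runs


def cut_points(runs, total_frames, long_silence, max_clip):
    """Frame indices at which to cut, following the two rules above."""
    cuts, pending, clip_start = [], [], 0

    def force_cuts(limit):
        # While the clip would reach max_clip, cut at its longest short gap.
        nonlocal pending, clip_start
        while limit - clip_start >= max_clip and pending:
            _, mid = max(pending)
            cuts.append(mid)
            clip_start = mid
            pending = [gap for gap in pending if gap[1] > mid]

    for start, length in runs:
        mid = start + length // 2
        force_cuts(mid)
        if length >= long_silence:
            cuts.append(mid)              # rule 1: the silence is long enough
            clip_start, pending = mid, []
        else:
            pending.append((length, mid))  # candidate for a forced cut
    force_cuts(total_frames)
    return cuts


def cut_times_ms(samples):
    """Cut positions in milliseconds for a list of 16-bit PCM sample values."""
    rms = frame_rms(samples, FRAME_LEN)
    runs = silence_runs(rms, SILENCE_RMS)
    frame_ms = FRAME_LEN * 1000 // 16000
    return [c * frame_ms for c in cut_points(runs, len(rms), LONG_SILENCE, MAX_CLIP)]
```

The cut positions map directly onto the timestamp and length columns of section 1. A fixed threshold is the weakest part of this sketch; real lecture audio would likely need a threshold derived from the recording's noise floor, and leading or trailing silence is not treated specially here.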
Please follow the methods listed on this page: [login to view URL]. As one of the acceptance criteria for this project, we will randomly select 50 audio clips (using a random number generator on the Internet), listen to them, and confirm that the sentence boundary error rate is less than 5%.

4. Speech Recognition

The three speech recognition engines are:

- Google: [login to view URL]
- Baidu: A Python wrapper for the Baidu Yuyin API [login to view URL] [login to view URL]
- iFlyTek (Xunfei): Integrate iflytek SDK to Implement Chinese Voice Recognition in AOSP [login to view URL]

Note that integration with all three speech recognition engines is required. That is, you need to do three integrations, each with its own complexities, such as applying for a free account and receiving tokens/keys. For both Baidu and iFlyTek, you are encouraged to use Google Translate, as much of the content is in Chinese. Both Google and Baidu are simple REST APIs, which allows you to implement on essentially any platform and in any language. But the iFlyTek API is really an SDK. The best example I found is the Android version given above. So, taken together, your only choice is an Android application.

5. Implementation

We are open to suggestions. But given the above, we expect a pure Android APK implementation. I will first push/copy several extracted/converted audio files onto an Android phone or tablet, then run your Android APK and get the results in a corresponding set of files, either in a MySQL database or simply in CSV format. I will then pull/copy these files back to my computer. Additionally, on computer you shall provid
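The flag column from section 1 reduces to counting distinct transcriptions, which a short sketch makes concrete. The function names and the whitespace trimming are assumptions; the description does not say whether case or punctuation should be normalized before comparing, so this compares the trimmed strings exactly.

```python
def agreement_flag(google: str, baidu: str, iflytek: str) -> int:
    """0 if all three transcriptions match, 1 if two match, 2 if all differ."""
    # Trimming surrounding whitespace is an assumption, not part of the spec.
    distinct = {google.strip(), baidu.strip(), iflytek.strip()}
    return len(distinct) - 1


def needs_human(flag: int) -> bool:
    """Only clips on which the engines disagree go to a human transcriber."""
    return flag != 0
```

For example, `agreement_flag("hello", "hello", "hallo")` gives 1, so that clip would be queued for a person to check.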
Project ID: 11407167

About the project

12 bids
Remote project
Active 8 yrs ago

Awarded to:
This looks like an extensive project, but I think I can handle it quickly and efficiently. I have worked with text-to-speech before, but I am not sure how relaxed those public APIs will be. There will definitely be a quota for such public tools, but I guess for a project like this you will take care of that. I am also experienced in signal processing and can handle the signal-specific things you have described. I can do it in pure Java, but I am not sure about making it Android-specific, as I have less specific knowledge in that domain.
$250 USD in 10 days
0.0 (1 review)
0.0
12 freelancers are bidding on average $569 USD for this job

Dear Client! I have read your project description carefully. I am an honest and hard-working Android developer with more than 5 years of experience in mobile app development. I can finish your project according to your requirements and timeline, within your budget. If you are still hesitating to select me, please check my profile, reviews, and ratings. https://www.freelancer.com/u/zhandong0217.html Hope to work with you for the long term. Regards.
$300 USD in 10 days
5.0 (109 reviews)
8.4

I want to discuss this project with you further. Let me know the most suitable time for you to schedule a meeting. Feel free to message me at any time; I am usually online 14 hours a day on this website, so you will probably get a quick response from me.
$773 USD in 20 days
5.0 (11 reviews)
6.5

Hello, I'm a full-time freelancer and can take care of your project for the price and timeframe posted. I provide only high-quality, on-time results, and keep up constant communication and updates during the project. Please contact me if you have any questions or want to discuss any details before starting the project. Thanks
$500 USD in 10 days
4.9 (6 reviews)
4.7

Hi, I read through your whole project description (it was truncated). It is clearly stated. I think the most difficult part is the audio segmentation: we have to implement the algorithm according to the material you provided, with the required accuracy. But it is doable (with an extended duration). I am highly skilled in Android development and can do this for you. Please reply for further discussion.
$750 USD in 30 days
4.8 (4 reviews)
4.4

Hey, I reviewed the description you provided and would like to inform you that I have previously worked with the CMU Sphinx voice-to-text API to create a portal that lets users speak into it and converts their speech into text. This was used in tandem with the Stanford POS tagger and some other algorithms to determine the pronunciation accuracy of learners. So I am certainly aware of how the APIs you mentioned work and can create the setup that you need. However, the cost of such a setup would be much more than the budget you have specified. Would you be open to considering it? Please let me know. Thanks and Regards, Rishi
$555 USD in 10 days
4.6 (1 review)
3.5

I've used Google APIs for things like advertising and augmented reality maps. They are a joy to work with, and I will be able to complete this project well within your budget. I require a design document milestone with a token amount paid, for two reasons. First, I need to make sure that we 100% agree on what the product is; second, I need to see that you are capable of paying me. My design documentation is thorough and useful even if you choose to use a different developer to complete the project.
$277 USD in 3 days
0.0 (0 reviews)
0.0

About the client

Cupertino, United States
5.0
1
Payment method verified
Member since Aug 29, 2016
