Jason Flaks

Redmond, Washington, United States
2K followers 500+ connections

About

Jason is presently serving as the Co-Founder and Chief Technology Officer at Xembly…

Articles by Jason

How to Use Nondeterministic LLMs for Building Robust Deterministic Applications

By Jason Flaks

May 7, 2024

The Astonishing Reasons Why Your LLM is a Bad Notetaker

By Jason Flaks

Mar 29, 2024

Introducing Task-Oriented Multiparty Conversational AI: Inviting AI to the Party

By Jason Flaks

Feb 26, 2024

Activity

“You really have to ground your AI systems for enterprise use cases, imagine a nurse in a hospital system using AI to make some decision about…

Liked by Jason Flaks
I’ve come to realize that #LLMs are the MP3’s of machine learning models for discriminative #NLP tasks. They aren’t as good, only experts can tell…

Posted by Jason Flaks
Today marks my first day at my new job with Sonos, Inc. here in Seattle! I'll be working as a senior SDET on the engineering team designing test…

Liked by Jason Flaks

Publications

Audio Spectrogram Factorization for Classification of Telephony Signals below the Auditory Threshold

arxiv.org November 9, 2018
Traffic Pumping attacks are a form of high-volume SPAM that target telephone networks, defraud customers and squander telephony resources. One type of call in these attacks is characterized by very low-amplitude signal levels, notably below the auditory threshold. We propose a technique to classify so-called "dead air" or "silent" SPAM calls based on features derived from factorizing the caller audio spectrogram. We describe the algorithms for feature extraction and classification as well as our data collection methods and production performance on millions of calls per week.

The Marchex 2018 English Conversational Telephone Speech Recognition System

arxiv.org November 5, 2018
In this paper, we describe recent performance improvements to the production Marchex speech recognition system for our spontaneous customer-to-business telephone conversations. In our previous work, we focused on in-domain language and acoustic model training. In this work we employ a state-of-the-art semi-supervised lattice-free maximum mutual information (LF-MMI) training process which can supervise over full lattices from unlabeled audio. On Marchex English (ME), a modern evaluation set of conversational North American English, we observed a 3.3% (3.2% for agent, 3.6% for caller) reduction in absolute word error rate (WER) with 3x faster decoding speed over the performance of the 2017 production system. We expect this improvement to boost Marchex Call Analytics system performance, especially for the natural language processing pipeline.

Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

arXiv.org May 26, 2017
For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state-of-the-art models. Collection of labeled conversational audio, however, is prohibitively expensive, laborious and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with sufficient accuracy in the unbounded space of conversational speech. These corpora are also timeworn due to dated acoustic telephony features and the rapid advancement of colloquial vocabulary and idiomatic speech over the last decades. Utilizing the colossal scale of our unlabeled telephony dataset, we propose a technique to construct a modern, high-quality conversational speech training corpus on the order of hundreds of millions of utterances (or tens of thousands of hours) for both acoustic and language model training. We describe the data collection, selection and training, evaluating the results of our updated speech recognition system on a test corpus of 7K manually transcribed utterances. We show relative word error rate (WER) reductions of {35%, 19%} on {agent, caller} utterances over our seed model and 5% absolute WER improvements over IBM Watson STT on this conversational speech task.

The New Wave of Robocallers Costing Businesses Billions

Marchex November 18, 2015

Marchex examined more than 300 million calls placed to businesses in 2014 through its Marchex Call Analytics platform and found that call centers across the U.S. receive more than 100 million spam calls a year. That's a cost of about $1 billion. Is your business protected?

Quality of Service (QoS) for Streaming Audio Over Wireless LANs

Audio Engineering Society Mar 2001

Streaming audio in computer networks requires a level of quality of service above and beyond the "best-effort" service that is typically provided. The need for enhanced QoS is even greater in wireless networks, where issues of interference, security, roaming, and bandwidth constraints are added. This paper discusses QoS issues important for providing high-quality streaming audio over wireless networks. In addition, an overview of future QoS enhancements in IEEE 802.11e and HomeRF 2.0 is provided.

Pseudo-Continuous Source Independent Load Monitoring and Applications Through Loudspeaker Impedance Analysis. Flaks, Jason

Master's Research Project at the University of Miami April 1, 1999

Loudspeaker measurements in public facilities can often be difficult to perform due to the intrusive nature of the test signals. This paper proposes a pseudo-continuous source-independent technique for measuring loudspeaker impedance, which allows for the use of program music as a test signal. Using Fourier analysis to obtain frequency domain data of both voltage and current, an accurate plot of impedance versus frequency can be made. Averaging, smoothing, and signal thresholding are used to further increase amplitude accuracy. The impedance plot compared to a preexisting reference can alert the user to possible loudspeaker problems that can arise over time. The validity of making such measurements is given by loudspeaker impedance analysis using a derived impedance equation from a loudspeaker equivalent circuit and experimental techniques.

Speech de-esser using adaptive filters

Intl. Conf. Signal Proc. & Tech. ICSPAT, 1999

Global Musical Instrument Communication Standard (GMICS(tm)): An Integrated Digital Audio and Control Communication Specification for Instruments

Audio Engineering Society Preprints

This paper provides an in-depth look at the Global Musical Instrument Communication Standard (GMICS) from the electrical, physical, data link, and control perspectives. Using the 100-megabit Ethernet physical layer and newly defined data link and control protocols, GMICS provides a low-latency digital audio and control highway appropriate for instruments, especially in live performance situations where delay is intolerable.

RTP Payload Format for AC-3 Audio - RFC 4184

IETF

This document describes an RTP payload format for transporting audio data using the AC-3 audio compression standard. AC-3 is a high quality, multichannel audio coding system that is used for United States HDTV, DVD, cable television, satellite television and other media. The RTP payload format presented in this document includes support for data fragmentation.

Patents

Fusing virtual content into real content

Issued November 11, 2014 United States 8,884,984

A system that includes a head mounted display device and a processing unit connected to the head mounted display device is used to fuse virtual content into real content. In one embodiment, the processing unit is in communication with a hub computing device. The system creates a volumetric model of a space, segments the model into objects, identifies one or more of the objects including a first object, and displays a virtual image over the first object on a display (of the head mounted display) that allows actual direct viewing of at least a portion of the space through the display.

Sound Source Separation Using Spatial Filtering and Regularization Phases

Issued November 12, 2013 United States 8,583,428

Described is a multiple phase process/system that combines spatial filtering with regularization to separate sound from different sources such as the speech of two different speakers. In a first phase, frequency domain signals corresponding to the sensed sounds are processed into separated spatially filtered signals including by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse transform.

COMPOUND GESTURE-SPEECH COMMANDS

Issued October 23, 2012 United States 8,296,151

A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user-generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture is loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands.

Strongly typed tags

Issued October 18, 2011 United States 8,041,738

In one or more embodiments, a tag is provided and includes a property that associates a strongly typed variable with the tag. Strongly typed variables can include any suitable types. For example, in at least some embodiments, the strongly typed variable is a people type that allows the tag to be associated with an individual person or group of people by virtue of a unique identification that is associated with the person or group. Strongly typed tags can then serve as a foundation upon which various other types of information and services can be provided to enhance the user experience.

ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING

Issued July 21, 2011 United States 8,219,394

A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal.

Integrating security by obscurity with access control lists

Issued July 19, 2011 United States 7,984,512

Aspects of the subject matter described herein relate to providing and restricting access to content. In aspects, information (e.g., a URL) that identifies content and a user is provided to a user. In conjunction with providing the information to a user, a data structure (e.g., an access control list) is updated to indicate that the user has access to the content. The user may use the information to access the content and/or may send this information to other users. The other users may use the information (e.g., by pasting it into a browser) to access the content and may be added to the data structure so that they may subsequently access the content without the use of the information. Access to the content via using the information may be subsequently revoked.

Speaker Identification

Issued April 25, 2011 United States 8,719,019

Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.

Strongly typed tags

Issued March 22, 2010 United States 7,912,860

In one or more embodiments, a tag is provided and includes a property that associates a strongly typed variable with the tag. Strongly typed variables can include any suitable types. For example, in at least some embodiments, the strongly typed variable is a people type that allows the tag to be associated with an individual person or group of people by virtue of a unique identification that is associated with the person or group. Strongly typed tags can then serve as a foundation upon which various other types of information and services can be provided to enhance the user experience.

Remotely accessing protected files via streaming

Issued March 16, 2010 United States 7,681,238

A source device permits a user of a remote device to access a protected file on the source device when the user of the remote device has a right to access the protected file. The user locates the protected file on the source device using the remote device and accesses the protected file using a media player on the remote device. The media player constructs a path by which the source device streams the protected file. The remote device responds to an authentication request from the source device that the user of the remote device has a right to access the protected file. The user is authenticated to confirm that the user of the remote device has a right to access the protected file. The protected file is streamed to the remote device via a path constructed by the remote device.

Routing of resource information in a network

Issued February 23, 2010 United States 7,668,939

A media server in a Universal Plug and Play (UPnP) network includes a resource sharing service to govern the distribution of resource information regarding resources to rendering devices. In one case, the resource sharing service consults a criterion to determine whether an identified network device is authorized to receive resource information. In another case, the resource sharing service consults another criterion to determine whether a specified individual associated with the media server must consent to the transfer of the resource information in order for the transfer to occur. The resource information may include resource metadata that describes high level information regarding resources, as well as resource content. The media server includes various user interface presentations that allow the media server user to specify shared resources and distribution criteria.

Speech recognition analysis via identification information

Issued January 22, 2010 United States 8,676,581

Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.

Techniques for limiting network access

Issued January 12, 2010 United States 7,647,385

A network architecture in a Universal Plug and Play (UPnP) network includes a resource sharing service to govern the distribution of resource information from a server to a recipient entity (such as a rendering device or a control point). The network architecture includes one or more of the following provisions: (a) setting the server to operate in a predetermined private address range or an Auto IP range; (b) operating one or more parts of the network architecture on the same subnet; (c) setting a time to live (TTL) parameter associated with messages transmitted by the server to a predetermined number; (d) setting a number of permitted recipient entities to a predetermined number; (e) setting a number of permitted concurrent content distribution sessions to a predetermined session number; (f) granting access to a recipient entity on condition that the recipient entity has generated a message that conforms to the UPnP protocol; and (g) retiring a URL used to identify a location of a resource provided by the server after a predetermined amount of time.

Server architecture for network resource information routing

Issued June 30, 2009 United States 7,555,543

A media server in a Universal Plug and Play (UPnP) network includes a resource sharing service to govern the distribution of media resource information to rendering devices. The media server includes: a media service module operating in a clamped down user context (e.g., a local service user context) and configured to share resource information over the network; a supplemental module operating in a local system user context and configured to assist the media service module in sharing resource information over the network; and a control panel module operating in a logged on user context and configured to interact with a user via a user interface display. The local system user context provides a higher level of access to media server resources compared to the clamped down user context. The media server also provides fast user switching (FUS) functionality that allows multiple users to have respective instances of the control panel module pending at the same time. Further, the media server includes a mechanism to prevent rogue applications from masquerading as the control panel module and thereby gaining unauthorized access to the media service module.

Universal digital media communications and control system and method

Issued September 2, 2008 United States 7,420,112

A digital media communications and control system includes a plurality of audio devices each of which includes a device interface module for communication of digital media data and control data from at least one of the devices to at least one other of the devices. A universal data link is operatively connected to each of the device interface modules. The device interface modules and universal data links are operative in combination to connect the devices together in the system and provide full duplex communication of the digital media data and control data between the devices.

Universal digital media communications and control system and method

Issued February 3, 2004 United States 6,686,530

A digital media communications and control system includes a plurality of audio devices each of which includes a device interface module for communication of digital media data and control data from at least one of the devices to at least one other of the devices. A universal data link is operatively connected to each of the device interface modules. The device interface modules and universal data links are operative in combination to connect the devices together in the system and provide full duplex communication of the digital media data and control data between the devices.

Apparatus and method for De-esser using adaptive filtering algorithms

Issued April 16, 2002 United States 6,373,953

A method and apparatus for the real-time creation of an output audio signal from an input signal with an unwanted or noise portion. The system detects the unwanted portion of the input signal by utilizing an adaptive detection filter and reduces the unwanted portion of the input signal. The reduction of the unwanted portion is performed by compression of the unwanted signal, subtraction of the unwanted portion of the signal, or eliminating the output signal until the unwanted portion is no longer detected. The system is specifically designed to find a high frequency and high amplitude sound such as a sibilant.

See patent
Universal audio communications and control system and method

Issued March 5, 2002 United States 6,353,169

An audio communications and control system includes a plurality of audio devices each of which includes a device interface module for communication of digital audio data and control data from at least one of the devices to at least one other of the devices. A universal data link is operatively connected to each of the device interface modules. The device interface modules and universal data links are operative in combination to connect the devices together in the system and provide full duplex communication of the digital audio data and control data between the devices.

See patent
SYSTEM AND METHOD FOR ANALYZING AND CLASSIFYING CALLS WITHOUT TRANSCRIPTION VIA KEYWORD SPOTTING

Filed October 3, 2013 United States

A facility and method for analyzing and classifying calls without transcription via keyword spotting is disclosed. The facility uses a group of calls having known outcomes to generate one or more domain- or entity-specific grammars containing keywords and related information that are indicative of a particular outcome. The facility monitors telephone calls by determining the domain or entity associated with the call, loading the appropriate grammar or grammars associated with the determined domain or entity, and tracking keywords contained in the loaded grammar or grammars that are spoken during the monitored call, along with additional information. The facility performs a statistical analysis on the tracked keywords and additional information to determine a classification for the monitored telephone call.

See patent
SYSTEM AND METHOD FOR ANALYZING AND CLASSIFYING CALLS WITHOUT TRANSCRIPTION

Filed March 15, 2013 United States

A facility and method for analyzing and classifying calls without transcription. The facility analyzes individual frames of audio to identify speech and measure the amount of time spent in speech for each channel (e.g., caller channel, agent channel). Additional telephony metrics such as R-factor or MOS score and other metadata may be factored in as audio analysis inputs. The facility then analyzes the frames together as a whole and formulates a clustered-frame representation of a conversation to further identify dialogue patterns and characterize call classification. Based on the data in the clustered-frame representation, the facility is able to make estimations of call classification. The correlation of dialogue patterns to call classification may be utilized to develop targeted solutions for call classification issues, target certain advertising channels over others, evaluate advertising placements at scale, score callers, and identify spammers.

See patent
SYSTEM AND METHOD FOR HIGH-PRECISION 3-DIMENSIONAL AUDIO FOR AUGMENTED REALITY

Filed April 19, 2012 United States

Techniques are provided for providing 3D audio, which may be used in augmented reality. A 3D audio signal may be generated based on sensor data collected from the actual room in which the listener is located and the actual position of the listener in the room. The 3D audio signal may include a number of components that are determined based on the collected sensor data and the listener's location. For example, a number of (virtual) sound paths between a virtual sound source and the listener may be determined. The sensor data may be used to estimate materials in the room, such that the effect that those materials would have on sound as it travels along the paths can be determined. In some embodiments, sensor data may be used to collect physical characteristics of the listener such that a suitable HRTF may be determined from a library of HRTFs.

See patent
SPEECH RECOGNITION ANALYSIS VIA IDENTIFICATION INFORMATION

Filed July 28, 2011 United States

Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.

See patent
DISTRIBUTED ASYNCHRONOUS LOCALIZATION AND MAPPING FOR AUGMENTED REALITY

Filed June 2, 2011 United States

A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations.

See patent
SEMI-PRIVATE COMMUNICATION IN OPEN ENVIRONMENTS

Filed November 15, 2010 United States

A system and method providing semi-private conversation using an area microphone between one local user in a group of local users and a remote user. The local and remote users may be in different physical environments, using devices coupled by a network. A conversational relationship is defined between a local user and a remote user. The local user's voice is isolated from other voices in the environment, and transmitted to the remote user. Directional output technology may be used to direct the local user's utterances to the remote user in the remote environment.

See patent
Strategies for Queuing Events for Subsequent Processing

Filed July 29, 2005 United States

See patent

Organizations

IEEE

Sep 2014 - Present
Audio Engineering Society (AES)

Full Member

Mar 2013 - Present

Recommendations received

11 people have recommended Jason

More activity by Jason

Anyone remember this amazing magazine? #audio #audioengineering #sound #soundengineering

Liked by Jason Flaks

seattle is a real mother! ☔️🔥☔️

Liked by Jason Flaks

Since starting with CA College Corp last year, Learn To Be tutors have hosted 4,448 hours of tutoring for ~270 students across California. Not…

Liked by Jason Flaks

I'll be speaking at Microsoft Build in just 2 weeks! Join me as I showcase Docker's latest innovations to help developers be more productive when…

Liked by Jason Flaks

Cobus Greyling I just wrote a bit about this. While true that LLMs can replace many traditional chatbot components, any one using it at scale will…

Shared by Jason Flaks

AI is transforming almost every industry. Zocks has built an assistant for Financial Advisors that is taking that industry by storm, automating all…

Liked by Jason Flaks
