By Jason Goecke
What is Adhearsion?
Telephony development has significant issues today. It tends to be fragmented and arduous, and it requires a steep learning curve of proprietary protocols, jargon and limited APIs. These issues are exacerbated by the telecom industry’s use of proprietary systems and inflexible business models, which prevent the industry from keeping up with innovations happening elsewhere, especially in modern web development.
Adhearsion is a new way to write voice-enabled applications with Ruby. It’s a complete open-source Ruby-based framework, not just an API or library, that provides all of the necessary features to develop comprehensive voice-enabled applications. For example, one might build an Adhearsion application with a Rails interface for managing an international tech support team. Or maybe you want to use a phone call as a CAPTCHA system (confirming the phone number at the same time). Or maybe you’re coming home with groceries and want to unlock your doors by calling your house and entering a passcode. Because an Adhearsion application is fundamentally a voice-enabled Ruby application, there are virtually no limits to what may be done.
Today Adhearsion works in tandem with the Asterisk open-source telephony engine, maintaining Asterisk as the core telephony switching platform while providing an application layer atop it.
What is Asterisk?
Asterisk is an open-source telephony engine and toolkit. With respect to Adhearsion, Asterisk handles conversion between audio codecs and telephony protocols, and provides lower-level abstractions of telephony functionality. Asterisk may be molded into many applications, from office PBX systems to conference calling servers to voicemail systems. There is generally a steep learning curve to developing applications with Asterisk. Some aspects of the engine’s design also mean that, for scale and stability, it is better extended through a development framework.
The latest release of Adhearsion comes with a series of enhancements. These include a new component architecture that makes it easy to write plug-ins that may be shared among the Adhearsion community, and a complete rework of how Adhearsion interfaces with the Asterisk Manager API (a protocol used for receiving events and issuing various commands). The new interface uses a dynamic thread pool, along with a Ragel-generated state machine that parses the protocol efficiently, providing greater scalability. Adhearsion also has an exciting, rapidly evolving roadmap that adds features and support for more telephony engines.
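The Asterisk Manager API is a line-oriented text protocol: each packet is a series of "Key: Value" lines terminated by CRLF, and a blank line ends the packet. As a rough illustration of what parsing it involves, here is a minimal hand-written Ruby parser. It is a sketch only, not Adhearsion’s Ragel-based parser; the class name AmiPacketParser is hypothetical, and it ignores special cases such as the multi-line output returned by some commands.

```ruby
# Illustrative AMI packet parser (hypothetical; Adhearsion itself uses a
# Ragel-generated state machine, not this code).
class AmiPacketParser
  def initialize
    @buffer = +""
  end

  # Feed raw data read from the socket. Returns the packets (hashes of
  # header => value) completed by this chunk; any incomplete trailing
  # data stays buffered until the next call.
  def feed(data)
    @buffer << data
    packets = []
    # A blank line (CRLF CRLF) terminates each packet.
    while (idx = @buffer.index("\r\n\r\n"))
      raw = @buffer.slice!(0, idx + 4)
      packets << parse_packet(raw)
    end
    packets
  end

  private

  # Split one packet into "Key: Value" headers, skipping lines without
  # a separator.
  def parse_packet(raw)
    raw.split("\r\n").each_with_object({}) do |line, headers|
      key, value = line.split(": ", 2)
      headers[key] = value unless value.nil?
    end
  end
end

parser = AmiPacketParser.new
# The second packet arrives split across two reads.
first = parser.feed("Response: Success\r\nActionID: 1\r\nMessage: Authentication accepted\r\n\r\nEvent: Newch")
second = parser.feed("annel\r\nChannel: SIP/100-0001\r\n\r\n")
```

Buffering matters because a TCP read can end in the middle of a packet: the first call returns only the login response, and the Newchannel event is returned once the rest of it arrives.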
Hello, World!
Let’s dive right into the action and write our first Hello World application. Install the Adhearsion gem by running:
$ sudo gem install adhearsion
Now that you have Adhearsion installed, you have the ‘ahn’ command, which is used to generate, start and stop applications as well as to create, enable and disable components. You can view usage information by running:
$ ahn --help
Let’s create your first application by entering:
$ ahn create ~/my_first_app
This is similar to generating a Rails application with the “rails” command. You will see the program print a list of the files it just created in the my_first_app folder. Next, wire your application to the Adhearsion Sandbox, which is available for developers just getting started. The Sandbox lets you focus on Adhearsion without having to set up the underlying telephony system, so you can get up and running with minimal friction. For this, you must sign up for a free account at:
http://new.adhearsion.com/getting_started
Accounts are required to use the Sandbox because incoming calls need some way of finding you individually. Once you have an account, enable the Sandbox component, which ships with Adhearsion by default, from within your my_first_app directory:
$ ahn enable component sandbox
Once you have done this, edit the ~/my_first_app/components/sandbox/sandbox.yml file and enter the credentials you created on the sign-up form:
username: railsrockstar
password: rubyislove
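The credentials file is plain YAML. The sketch below shows one way to read and sanity-check such a file from Ruby with the standard yaml library; the load_sandbox_credentials helper and the REQUIRED_SANDBOX_KEYS constant are hypothetical and are not how the Sandbox component itself loads its configuration.

```ruby
require 'yaml'

# Hypothetical helper, for illustration only: the Sandbox component's own
# configuration loading is not described in this article.
REQUIRED_SANDBOX_KEYS = %w[username password].freeze

def load_sandbox_credentials(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, "sandbox.yml must contain key/value pairs" unless config.is_a?(Hash)

  # Report every required key that is absent or blank.
  missing = REQUIRED_SANDBOX_KEYS.select { |key| config[key].to_s.strip.empty? }
  raise ArgumentError, "sandbox.yml is missing: #{missing.join(', ')}" unless missing.empty?

  config.slice(*REQUIRED_SANDBOX_KEYS)
end

# In practice you would pass File.read of
# ~/my_first_app/components/sandbox/sandbox.yml; here the example
# credentials from above are inlined.
credentials = load_sandbox_credentials(<<~YAML)
  username: railsrockstar
  password: rubyislove
YAML
```

Failing early with a message that names the missing key is usually easier to debug than a rejected login later on.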
We’re almost there! Next, start the application by running:
$ ahn start .
Next, modify the ~/my_first_app/dialplan.rb file, which contains the DSL for handling all inbound calls with real-time call control methods. When you open the file, you should see something like this:
adhearsion {
simon_game
}
Add this to the bottom of the dialplan.rb file:
sandbox {
  play "hello-world"
}
When a call comes into the Sandbox, it is forwarded to the Adhearsion application running on your system. The contexts in dialplan.rb (“adhearsion” and “sandbox” in the example above) define the entry points through which calls may enter your application; by default, the Sandbox starts executing the ‘sandbox’ context. The “hello-world” String refers to a standard Asterisk sound file on the Sandbox, which is played back to you when you call.
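Because dialplan.rb is Ruby, the idea of named contexts as entry points can be illustrated with a few lines of plain Ruby. The sketch below is hypothetical and is not Adhearsion’s implementation: DialplanSketch records each top-level block under its name, and FakeCall stands in for a live call, so routing a call into the sandbox context simply runs that block with the call as self.

```ruby
# Hypothetical mini-DSL (NOT Adhearsion's implementation): each top-level
# `name { ... }` in the dialplan source is recorded as a context.
class DialplanSketch
  attr_reader :contexts

  def initialize
    @contexts = {}
  end

  # Any bare method call with a block and no arguments registers a context.
  def method_missing(name, *args, &block)
    return super unless block && args.empty?
    @contexts[name] = block
  end

  # Evaluate dialplan source text and return the populated builder.
  def self.load(source)
    builder = new
    builder.instance_eval(source)
    builder
  end
end

# Hypothetical stand-in for a live call: records the sound files "played".
class FakeCall
  attr_reader :played

  def initialize
    @played = []
  end

  def play(sound)
    @played << sound
  end
end

dialplan = DialplanSketch.load(<<~RUBY)
  adhearsion {
    play "welcome"
  }
  sandbox {
    play "hello-world"
  }
RUBY

# Route the call into the 'sandbox' context, as the Sandbox does by default.
call = FakeCall.new
call.instance_exec(&dialplan.contexts.fetch(:sandbox))
```

Running each block with instance_exec on the call object is what lets bare commands such as play resolve to methods of the current call in this sketch.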
Next, set up Voice over IP (VoIP) phone software (called a “softphone”) on your computer. There are many free softphones to choose from, but we recommend Gizmo5 (http://www.gizmo5.com) because it handles firewall issues well and runs on Windows, OS X and Linux. You’ll also need to sign up for a free Gizmo account (the last sign-up, we promise), which is actually quite useful because Gizmo’s servers help you get around firewall issues. Once you have installed and configured Gizmo5, all you need to do is dial your Sandbox account. To do this, enter the following into the text field near the top of the main Gizmo5 window:
your_username@sandbox.adhearsion.com
That’s it! If all went well, you should now hear a woman say “Hello, world!” Next, let’s build a more sophisticated application using Rails.
Rails Integration
While Adhearsion is a standalone framework, it may easily be integrated with Rails to leverage all of the business logic tucked away in your Rails models. Because Adhearsion and Rails run in separate interpreter instances, you need some form of messaging if you want to share state between the two applications beyond what is stored in your models. For this, Adhearsion supports Distributed Ruby (DRb), a Stomp message queue and a set of RESTful APIs out of the box.
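DRb ships with Ruby, so sharing a piece of state between two interpreters can be sketched without extra gems. The example below is a minimal illustration under stated assumptions, not Adhearsion’s own DRb interface: the CallCounter class is hypothetical, and for brevity the server and client run in one script, whereas in practice the server would live in one process (say, Adhearsion) and the client in the other (say, Rails).

```ruby
require 'drb/drb'

# Hypothetical shared-state object (not part of Adhearsion or Rails).
# Guarded by a mutex because a DRb server may handle requests from
# several clients concurrently.
class CallCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # e.g. invoked from the voice side whenever a call is answered
  def record_call
    @lock.synchronize { @count += 1 }
  end

  # e.g. read from the web side to show a live statistic
  def count
    @lock.synchronize { @count }
  end
end

# Server side: publish the counter. Port 0 lets the OS pick a free port,
# and DRb.uri reports the address actually bound.
DRb.start_service('druby://127.0.0.1:0', CallCounter.new)
uri = DRb.uri

# Client side: normally this runs in the other process, given the URI.
counter = DRbObject.new_with_uri(uri)
3.times { counter.record_call }
total = counter.count

DRb.stop_service
puts "Calls recorded: #{total}"
```

Only the URI needs to be shared between the processes; method calls on the DRbObject proxy are forwarded to the published object.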
To load your Rails models and database environment in the Adhearsion application you created above, modify the config/startup.rb file as follows:
config.enable_rails :path => 'gui', :env => :development
In the line above, :path is the path to the root directory of your Rails application (it may also be an absolute path or a symbolic link), and :env is the environment from database.yml you would like to use. Rails and Adhearsion still run as separate processes with their own Ruby interpreters, but both applications now share the same underlying models.
Now let’s see how we may leverage this. Let’s say you have a Rails application that lets users sign up and listen to specially recorded audio files on your podcasting website. You might have models that look something like this:
class User < ActiveRecord::Base
  validates_presence_of :password
  validates_uniqueness_of :password
  # Newest podcasts first
  has_many :podcasts, :order => "created_at desc"
end

class Podcast < ActiveRecord::Base
  belongs_to :user
end
Now, from the same dialplan.rb we modified in the Hello World example above, we may enter the following:
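podcast_content {
  # Prompt for a 5-digit PIN and give the caller 5 seconds to enter it
  password = input 5,
    :play    => 'please-enter-your-pin-number',
    :timeout => 5.seconds
  user = User.find_by_password(password)
  if user
    # Play the user's newest podcast, stored as "<user id>/<podcast id>"
    play "#{user.id}/#{user.podcasts.first.id}"
  else
    play 'vm-invalidpassword'
    play 'goodbye'
  end
  hangup
}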
The example above asks the caller a question and collects the digits entered on their phone with the input method: the 5 is the number of digits to collect, :play names the audio file that asks the question, and :timeout is how long the caller has to respond before the request times out.
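To make these semantics concrete, the sketch below simulates collecting keypresses in plain Ruby, with a Queue standing in for the phone’s keypad. The collect_digits method is hypothetical; it assumes the timeout covers the whole entry rather than each digit, which may differ from Adhearsion’s actual behavior, and it requires Ruby 3.2 or later for Queue#pop(timeout:).

```ruby
# Hypothetical sketch of an input-style prompt (not Adhearsion's code):
# gather up to max_digits keypresses, giving up once the overall timeout
# has elapsed. Requires Ruby 3.2+ for Thread::Queue#pop(timeout:).
def collect_digits(keypresses, max_digits:, timeout:)
  digits = +""
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while digits.length < max_digits
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if remaining <= 0
    key = keypresses.pop(timeout: remaining) # nil when time runs out
    break if key.nil?
    digits << key
  end
  digits
end

# Simulated caller who types six digits: only the first five are kept.
fast = Queue.new
"123456".each_char { |d| fast << d }
pin = collect_digits(fast, max_digits: 5, timeout: 5)

# Simulated caller who types two digits and then stops.
slow = Queue.new
slow << "4"
slow << "2"
partial = collect_digits(slow, max_digits: 5, timeout: 0.2)
```

Either way the prompt returns whatever was entered, so the dialplan decides what a short or empty entry means; in the podcast example it simply fails the password lookup.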
This is a contrived scenario, but it gives a good flavor of how Adhearsion can leverage not only the models in a Rails application but anything else that benefits from ActiveRecord or any other way of accessing shared state. You could just as well use CouchDB, DRb, a message queue, XML-RPC interfaces, an LDAP library or any other integration-oriented technology.
Conclusion
Adhearsion is a powerful framework that brings voice to the modern web. We have only covered a handful of its capabilities here, and there is much more to explore. Adhearsion may be used to place outbound calls, leverage Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) engines, provide advanced capabilities to call centers and enable seamless voice-enabled web services and applications, and the list goes on. The limit really is your imagination.
Historically, developers who could cross the web and voice domains were a rare breed. This no longer needs to be true for the Rails community. The true potential of Adhearsion is to let a Rails developer extend their capabilities beyond the web to include voice with minimal friction. You may leverage this not only in your own applications but also in those of your customers. With your newfound ability to include all forms of communication, you have the opportunity to be a thought leader and create more opportunities with your existing engagements and beyond.
We invite everyone to join us and start adding innovative voice solutions to your web applications. You will find more examples on the Adhearsion project site (http://adhearsion.com), along with the API documentation (http://api.adhearsion.com) and the wiki (http://docs.adhearsion.com).
Published in Issue #1: The Beginning