Sunday, November 9, 2008

The Era of Amdahl’s Law

Today, a transition is under way from the individual knowledge worker of the Era of Moore’s Law to the collective Web workers of the Era of Amdahl’s Law, who create, sort, search, and manage information across knowledge products, devices, and people.

While corporations sought to increase each individual’s productivity in the Era of Moore’s Law, distributed groups working in parallel are now reaping the benefits of plummeting transaction costs over the Web.

In 2005, IBM announced that it had doubled the performance of the world's fastest computer, Blue Gene/L, from 136.8 trillion calculations per second (teraflops) to 280.6 teraflops. Blue Gene is a massively parallel supercomputer in the IBM System Blue Gene Solution series: the epitome of centralized computing power.

At the other end of the scale, Google has developed the largest parallelized computer complex in the world by inventing its own Googleware technology for parallel processing across distributed servers, microchips, and databases.

As a result, parallel processing is affecting all aspects of the Information Revolution: mainframes have become supercomputers with massively parallel microchip configurations; the individual personal computers of the Era of Moore’s Law now run on multicore processors; and the Web relies on applications such as Googleware, built on a vast parallelized computer complex with its own specialized concurrent programming.

Parallel processing has infiltrated all aspects of computing because the limits of Moore’s Law must now be compensated for through parallelism, whose benefit is governed by Amdahl’s Law:

Speedup ≤ 1 / (F + (1-F) / N)

Amdahl's Law describes how much a program can theoretically be sped up by additional computing resources, based on the proportion of parallelizable and serial components. Here F is the fraction of the calculation that must be executed serially, given by:

F = s / (s + p)

where s is the serial portion of the execution and p is the parallel portion.

Amdahl's Law then says that on a machine with N processors, the maximum speedup is:

Speedup ≤ 1 / (F + (1 - F) / N)

As N approaches infinity, the maximum speedup converges to 1/F, or (s + p)/s.

This means that for a program in which fifty percent of the processing is executed serially, the speedup is limited to a factor of two, regardless of how many processors are available. For a program in which ten percent must be executed serially, the maximum speedup is a factor of ten.
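
As a quick illustration (a minimal sketch, not part of the original discussion; the processor count is arbitrary), the bound can be computed directly in C#:

using System;

class AmdahlExample
{
    // Maximum speedup predicted by Amdahl's Law for a serial fraction F
    // on a machine with N processors: 1 / (F + (1 - F) / N).
    static double MaxSpeedup(double f, int n) => 1.0 / (f + (1.0 - f) / n);

    static void Main()
    {
        // 50% serial: the bound stays near 2 no matter how many cores we add.
        Console.WriteLine(MaxSpeedup(0.5, 1000));   // ~1.998
        // 10% serial: the bound approaches 10.
        Console.WriteLine(MaxSpeedup(0.1, 1000));   // ~9.911
    }
}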

All computer applications must now be translated from sequential programming into parallel processing methods. As a result, the third wave of computing has become the Era of Amdahl’s Law, where the information environment of each person is connected through the Web to a variety of multicore devices.

Manycore systems hold the promise of 10 to 100 times the processing power in the next few years. However, as software developers transition from writing serial programs to writing parallel programs, there will be pitfalls on the way to robust and efficient parallel code.

Even if current applications do not have much parallel functionality, s and p can still be changed:

1. Increase p by doing more of the same: Increase the volume of data processed by the parts that are parallelizable. This is Gustafson's Law.

2. Increase p by adding new features that are parallelizable.

3. Reduce s by pipelining.

If we keep the run time constant and focus instead on increasing the problem size, the total work completed in a fixed time is:

Total Work = s + N * p
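
A minimal sketch of this scaled-speedup view (Gustafson's Law); the serial fraction, parallel fraction, and processor count below are illustrative assumptions:

using System;

class GustafsonExample
{
    // If the serial part s stays fixed and the parallel part p is scaled
    // across N processors, the work completed in the same wall-clock time
    // grows as s + N * p, giving a scaled speedup of (s + N * p) / (s + p).
    static double ScaledSpeedup(double s, double p, int n) => (s + n * p) / (s + p);

    static void Main()
    {
        // 10% serial, 90% parallel, 32 processors: roughly 28.9 times more
        // work done in the same time, versus a fixed-size Amdahl bound of ~7.8.
        Console.WriteLine(ScaledSpeedup(0.1, 0.9, 32));
    }
}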


Besides solving bigger versions of the same problem, we also have the option of adding new features.

Normally, an N-fold speedup is considered the Holy Grail, but there are ways to leverage data locality and/or perform speculative, cancelable execution to achieve superlinear speedups.

References:

[1] Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons Inc., New York, NY, 2007.

[2] Sutter, H., Break Amdahl’s Law!, Dr. Dobb’s Portal, Jan. 17, 2008.

[3] Goetz, B., et al., Java Concurrency in Practice, Addison-Wesley, 2006.

Sunday, August 10, 2008

Concurrent Programming

Mighty oaks from tiny acorns grow, or so an ancient proverb claims; consider the microchip. Smaller than a penny, it is the brain of every digital device in the world. Chips connect circuits, computers, and handheld devices, as well as satellites and an endless list of electronics. As the centerpiece of the information revolution, the chip is the driving force behind innovation as it follows an empirical rule called ‘Moore’s Law.’

In 1965, Gordon Moore, who helped found the modern semiconductor industry and went on to co-found the Intel Corporation, wrote an article in which he noted that the density of components on semiconductor chips had doubled yearly since 1959. This annual doubling in component density amounted to an exponential growth rate, now widely known as Moore's Law.

While Moore’s Law is not a physical law like gravity, it is borne out by empirical observation: the capacity of memory chips rose from one thousand bits in 1971 to one million bits in 1991 and to one billion bits by 2001. The billion-bit semiconductor memory chip represented an extraordinary six orders of magnitude of growth, and a similar growth rate has been seen in the capability of microprocessor chips to process data.

While many have speculated on the future of Moore’s Law, some have concluded that instead of focusing on obtaining greater speed from a single processor, innovators should develop multi-core processors. Rather than scaling clock speed, which drives power usage and heat emission to unacceptable levels, chip manufacturers have begun adding additional CPUs, or “cores,” to the microprocessor package to increase processing power. By working in parallel, the cores increase the total throughput of the device. Quad-core chips are already being produced commercially. These advances in parallel hardware require similar advances in optimizing the execution of multiple tasks working in parallel, called concurrent programming.

While a single-core computer can use multiple threads to interleave processes, true parallelism does not occur without multiple CPUs. Distributed computing goes further, spreading parallel work units across numerous machines, but it incurs additional requirements for task management.

Concurrent programming involves task management and task communication. The task manager distributes work units to available threads, while task communication uses shared state and memory to establish the initial parameters for a task and to collect the results of its work. Task communication requires locking mechanisms to preserve performance gains and to prevent the subtle bugs that arise when multiple tasks overwrite the same memory locations. State and shared-memory issues can be controlled by using locks, monitors, and other techniques that block one thread from altering state while another is making changes.
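
As a simple illustration of the shared-state problem (a minimal C# sketch; the counter class and member names are invented for the example), a lock keeps two tasks from corrupting a shared total:

class SharedCounter
{
    private readonly object _sync = new object();
    private int _total;

    // Without the lock, two threads can read the same value of _total,
    // both increment it, and one of the updates is silently lost.
    public void Add(int amount)
    {
        lock (_sync)
        {
            _total += amount;
        }
    }

    public int Total
    {
        get { lock (_sync) { return _total; } }
    }
}

For a counter this simple, Interlocked.Add(ref _total, amount) would avoid the lock entirely; the lock pattern matters when several related fields must change together.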

Microsoft’s Parallel Computing Development Center provides support for parallelism, programming models, libraries, and tools with F#, the Task Parallel Library (TPL), the Parallel Extensions Assembly (PFX), and PLINQ.

F# is a typed functional programming language for the .NET Framework that does not directly support concurrent programming, although it does include asynchronous workflows for I/O. TPL is designed to assist in writing managed code that can use multiple processors. PFX is being folded into TPL. PLINQ is LINQ in which the query is run in parallel; it takes advantage of TPL by partitioning query iterations and assigning the work units to threads (typically one per processor core).
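
A minimal PLINQ sketch (the data set and predicate are invented for illustration): adding AsParallel() to an ordinary LINQ query asks the runtime to partition the source and run the query on multiple cores.

using System;
using System.Linq;

class PlinqExample
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 1000000);

        // The query shape is ordinary LINQ; AsParallel() turns it into a
        // PLINQ query whose iterations are split across worker threads.
        long sumOfSquaresOfEvens = numbers
            .AsParallel()
            .Where(n => n % 2 == 0)
            .Select(n => (long)n * n)
            .Sum();

        Console.WriteLine(sumOfSquaresOfEvens);
    }
}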

Collectively, these efforts ease the development of threaded .NET applications. Nevertheless, state and shared-memory issues are still left to the programmer to solve. Adding concurrent programming capabilities to LINQ seems a natural extension.

As a result, the next series of technological advances in the information revolution will be strongly dependent on concurrent programming.

Threads

Today work items are run by creating Threads such as:

Thread t = new Thread(DoSomeWorkMethod);
t.Start(someInputValue);

For 10 work items we could create 10 threads, but this is not ideal because of the cost of context switching, cache invalidation, and the memory consumed by each thread’s stack. An alternative is to use the .NET ThreadPool class:

ThreadPool.QueueUserWorkItem(
DoSomeWorkMethod, someInputValue);

However, this lacks the richness of a full API: we do not get a reference to the work item, and there is no explicit way to know when it has completed.

Parallel Extensions introduces a new Task class with an API similar to Thread but with scheduling semantics close to ThreadPool. A code snippet for the new Task class is:

Task t = Task.Create(DoSomeWorkMethod,
someInputValue);
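
Unlike the bare ThreadPool call, the Task object gives the caller a handle it can wait on. A hedged sketch building on the snippet above (Task.Create follows the Parallel Extensions CTP described in this post; the Task Parallel Library that later shipped uses Task.Factory.StartNew instead):

Task t = Task.Create(DoSomeWorkMethod, someInputValue);

// Because we hold a reference to the work item, we can block until it
// has finished before consuming its results.
t.Wait();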

See References for further material.


REFERENCES

[1] Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons Inc., New York, NY, 2007.

[2] Clifton, M. “Concurrent Programming - A Primer” 3 Jan 2008


[3] Microsoft - Concurrent Programming 2008.

[4] Moth, D., Parallel Extensions to the .NET Framework, 28 February 2008.

Monday, June 9, 2008

Adding Meaningful Content with Resource Description Framework (RDF)

In 1990, Tim Berners-Lee laid the foundation for the World Wide Web with three basic components: HTTP (HyperText Transfer Protocol), URLs (Uniform Resource Locators), and HTML (HyperText Markup Language). These elements were the essential ingredients leading to the explosive growth of the World Wide Web. The original concept for HTML was a modest one: browsers could simply view information on Web pages. An HTML page can be written in a simple text file, and the language can be easily mastered by a high school student.

However, HTML was not extensible: its tags were fixed by specification, and vendor agreement was required before changes could be made. The eXtensible Markup Language (XML) overcame this limitation. Proposed in late 1996 by the World Wide Web Consortium (W3C), XML offered developers a way to define and manipulate their own structured data. XML simplified the process of defining and using metadata and provided a good representation of extensible, hierarchical, formatted information.

For the Web to provide meaningful content and capabilities today, it will have to add new layers of markup languages, starting with the Resource Description Framework (RDF). Figure 1 shows the projected pyramid of markup languages for the Web.


Figure 1

The Resource Description Framework (RDF) was developed by the W3C to extend XML and to make work easier for autonomous agents and automated services by introducing a rudimentary semantic capability.

RDF uses a simple relational model that allows structured data to be mixed, exported, and shared across different applications. An RDF statement consists of a subject, a predicate, and an object, which together form an RDF triple.

Consider the statement: The book is entitled Gone with the Wind.

A simple XML representation might be:

<book>
  <title>Gone with the Wind</title>
</book>

The grammatical sentence has three basic parts: a subject [The book], a predicate [is entitled], and an object [Gone with the Wind]. A machine, however, could not make an inference based upon the simple XML representation of the sentence.

For machines to make an inference automatically, it is necessary to add RDF to the traditional HTML and XML markup.

The basic RDF model produces a triple, where a resource (the subject) is linked through an arc labeled with a property (the predicate) to a value (the object). Figure 2 shows the graphical representation of the RDF statement.


Figure 2

The RDF statement can be represented as a triple: (subject, predicate, object) and also serialized in XML syntax as:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="SUBJECT">
    <dc:PREDICATE>OBJECT</dc:PREDICATE>
  </rdf:Description>
</rdf:RDF>

A collection of interrelated RDF statements is represented by a graph of interconnected nodes. The nodes are connected via various relationships. For example, let's say each node represents a person. Each person might be related to another person because they are siblings, parents, spouses, or friends.

There are many RDF applications available; for example, see Dave Beckett's Resource Description Framework (RDF) Resource Guide.

Many communities have proliferated on the Internet, from companies to professional organizations to social groupings. The Friend of a Friend (FOAF) vocabulary, originated by Dan Brickley and Libby Miller, gives a basic expression for community membership. FOAF expresses personal information and relationships, and is a useful building block for creating information systems that support online communities. Search engines can find people with similar interests through FOAF.

FOAF is simply an RDF vocabulary. Its typical method of use is akin to that of RSS.

For more on RDF and additional markup languages see the references below.

REFERENCES

Connections: Patterns of Discovery

Developing Semantic Web Services

Web Site: Video Software Lab

Friday, May 16, 2008

Synchronizing Video, Text, and Graphics with SMIL

Today, video is all over the Web. Top networks and media companies now display your favorite shows online, from nail-biting dramas, high-scoring sports, and almost-real reality TV shows to classic feature films.

Apple TV and iTunes stream 720p high-definition (HD) video, and the Hulu.com video Web site has started to add high-definition videos using Adobe Flash Player 9.0 with H.264 encoding.

Synchronized Multimedia Integration Language (SMIL) is the W3C standard streaming-media language that provides a time-based, synchronized environment for streaming audio, video, text, images, and Flash files. The key to SMIL is its use of blocks of XML (eXtensible Markup Language).

Pronounced "smile," SMIL is an XML-compliant markup language that coordinates when and how multimedia files play. Using SMIL, you can

* describe the temporal behavior of the presentation
* describe the layout of the presentation on a screen
* associate hyperlinks with media objects

SMIL players are client applications that receive and display integrated multimedia presentations. SMIL servers are responsible for providing content channels and serving presentations to clients. Although SMIL itself is an open technology, some of the players and servers use proprietary techniques to handle multimedia streaming and encoding.

A SMIL file (extension .smil) can be created with a text editor and saved as a plain text file. In its simplest form, a SMIL file lists multiple media clips played in sequence:

<smil>
  <body>
    <video src="rtsp://yourserver.yourcompany.com/video1.rm"/>
    <video src="rtsp://yourserver.yourcompany.com/video2.rm"/>
    <video src="rtsp://yourserver.yourcompany.com/video3.rm"/>
  </body>
</smil>

The master SMIL file is a container for the other media types. It provides the positions where the RealPix graphics files appear, and it starts and stops the video.

The master file is divided into three sections:

* Head: The head element contains information that is not related to the temporal behavior of the presentation. The "head" element may contain any number of "meta" elements and either a "layout" element or a "switch" element. The head contains the meta information, including copyright info, author of the page, and the title.

* Regions: The different regions, which are defined inside the REGION tags, control the layout in the RealPlayer window.

* Body: The body of the SMIL file describes the order in which the presentations will appear. The PAR tags mean that the VideoChannel, PixChannel, and TextChannel will be displayed in parallel.

The regions are arranged in a layout similar to the cells in a table. The LEFT and TOP attributes control the position of the different regions along with HEIGHT and WIDTH attributes that specify their size. SMIL has many similarities to HTML, but also some important differences. The SMIL mark-up must start with a smil tag and end with the smil closing tag. All other mark-up appears between these two tags.

A SMIL file can include an optional header section defined by head tags and requires a body section defined by body tags. Attribute values must be enclosed in double quotation marks. File names in SMIL can use upper, lower, or mixed case, but must match exactly how they appear on the server. SMIL files are saved with the extension .smi or .smil.

The SMIL sequential (seq) and parallel (par) tags allow you to structure your media. Use the seq tag to play clips in sequence. In the following, the second video clip begins when the first video clip finishes.

<seq>
  <video src="videos/video1.rm"/>
  <video src="videos/video2.rm"/>
</seq>

To play two or more clips at the same time, use the par tag. Here the video clip plays while the text of the lyrics scrolls in synchronization.

<par>
  <video src="videos/video1.rm"/>
  <textstream src="lyrics/words.rt"/>
</par>

When RealServer G2 streams parallel groups, it ensures that the clips stay synchronized. If some video frames don't arrive, RealServer either drops those frames or halts playback until the frames do arrive. SMIL timing elements let you specify when a clip starts playing and how long it plays. If you do not set timing events, the clips start and stop according to their normal timelines and their positions within par and seq groups. The easiest way to designate a time is with the shorthand markers h, min, s, and ms.

For more information about technology innovations and Web video see the following references.

REFERENCES:

Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery John Wiley & Sons, Inc. 2008.

Alesso, H. P. and Smith, C. F., e-Video: Producing Internet Video as Broadband Technologies Converge (with CD-ROM), Addison-Wesley, 2000.

Web Site:
Video Software Lab

Tuesday, May 13, 2008

CHIP Technology Trends Beyond 2008

One of the most successful long-term computer trends is based upon Moore's Law. Moore’s Law is the observation that the cost-performance of chip components doubles every 18 months. It is a centerpiece of the evolution of computer technology. While Moore’s Law is not a physical law like Newton’s law of gravitation, it is the result of Moore's empirical observation that circuit density appeared to double on a regular basis.

However, there are technical and physical obstacles looming ahead for Moore's Law. Just how complex chips can become is subject to the problem of power leakage. Chips with billions of transistors can leak up to 70 watts, which causes serious cooling problems.

In addition, it is possible that circuit dimensions cannot get much smaller than the current 65 nanometers (nm) without increasing production difficulties. In 2004, chip makers were mass-producing 90 nm integrated circuits (ICs), but by 2006 the migration to 65 nm had begun. Changing from 90 nm to 65 nm design rules was quick because the fabrication process required little change; as early as 1988, IBM had fabricated what was then the world's smallest transistor using 70 nm design rules. It used a power supply of one volt instead of five volts and required nitrogen cooling. Today such field-effect transistors (FETs) run at room temperature.

Similarly, in late 2003, NEC built a FET with a 5 nm gate length and IBM built one with a 6 nm gate. These are an order of magnitude smaller than what is used in production now.

Innovators have also been developing four- to eight-core parallel microprocessors. By working in parallel, the cores greatly increase the total throughput of the processor, and quad-core chips are already being produced commercially.

If we reach a physical limit to Moore's Law, we will need new discoveries demonstrating new proofs of principle, such as 3D chips and nanotubes.

3D chips use layers of transistors stacked like a high-rise. New technologies that could lead to molecular three-dimensional computing include nanotubes, nanotube molecular computing, and self-assembly in nanotube circuits.

Matrix Semiconductor, Inc. is already building three-dimensional circuits using conventional silicon lithography. They are manufacturing memory chips with vertically stacked planes.

Carbon nanotubes and silicon nanowires are extremely strong materials that can act as metals or semiconductors and have good thermal conductivity. They can be used as nanowires or field-effect transistors. Carbon nanotubes can be just 1 to 2 nanometers in diameter and can substantially reduce energy consumption.

In 1991, the first nanotubes were made by rolling a hexagonal network of carbon atoms into a cylinder. In a demonstration at the University of California, Irvine, Peter Burke operated nanotube circuits at 2.5 gigahertz (GHz).

The Defense Advanced Research Projects Agency, the National Science Foundation, and the Office of Naval Research have funded research into a class of molecules called rotaxanes: synthetic, dumbbell-shaped compounds used for logic operations to provide memory and routing of signals. A critical step in making a molecular computer requires that one set of wires be arranged in one direction as molecular switches and that a second set of wires be aligned across them. A single layer of rotaxane molecules sits at the junction of these wires.

An alternative approach by Fujio Masuoka, the inventor of flash memory, uses a memory design that reduces the size and cost per bit by a factor of ten.

Major progress has also been made in computing with just a few molecules or with single-electron transistors, an idea first suggested by Avi Aviram of IBM and Mark A. Ratner of Northwestern University in the 1970s.

Using a single electron to turn a transistor on and off would shrink devices as well as reduce power. However, there have been severe problems due to extreme sensitivity to background noise. Single-electron transistors could store as much as a terabit of data in a square centimeter of silicon, a two-order-of-magnitude improvement over today's technology.

Another interesting new chip technology is the crossbar latch. The Quantum Science Research (QSR) group at Hewlett-Packard has demonstrated this technology, which provides the signal restoration and inversion required for general computing without using transistors. The experimental latch is a single wire that lies between two control lines at a molecular-scale junction. Applying voltages to the control lines allows the latch to perform NOT, AND, and OR operations. It enables nanometer-scale devices and could improve computing by three orders of magnitude.

While we continue to worry about the longevity of Moore's Law, the prospects for innovation continue to be very bright.

In Connections: Patterns of Discovery the patterns of discovery are presented that produced Moore’s Law and the book explores the question, “What is the software equivalent of Moore’s Law?”

The patterns challenge the reader to think of the consequences of extrapolating trends, such as, how Moore's Law could reach machine intelligence, or retrench in the face of physical limitations.

REFERENCES:

Connections: Patterns of Discovery

Developing Semantic Web Services

Web Site:

Video Software Lab

Saturday, May 3, 2008

Search Technology Trends beyond 2008

Microsoft and Yahoo! are working feverishly to extend their market share of search, which produces lucrative advertising dollars. Though Google's leaders frown, they seem unconcerned.

Just what do Google's founders, Larry Page and Sergey Brin, know that merits such confidence?

Perhaps they are confident in their technology, Googleware, which consists of a combination of custom software and hardware for optimizing search through the world’s most powerful computational enterprise.

Perhaps they are confident in their global information collection, storage, and retrieval system, which provides the best ranking for quick and relevant access to all information. They are already developing a Google encyclopedia from the libraries of books they have been scanning.

Perhaps they are confident in their current overwhelming worldwide market dominance.
Or perhaps they have a vision for the next decade that will connect all of human knowledge to what Larry Page calls 'Perfect Search.'

Extrapolating Google’s success for the near future, we would expect a steady improvement in ranking algorithms to ensure Google’s continued dominance.

Additionally, future Google services will expand into the multimedia areas of television, movies, and music using Google TV and Google Mobile. When Google digitizes and indexes every book, movie, TV show, and song ever produced, viewers could have all of TV history to choose from while Google offers advertisers targeted search. Google Mobile and G-Phone could deliver the same service and products to cell phone technology.

One of the great areas of innovation resulting from Google’s initiatives is its ability to search the Human Genome. Such technology could lead to a personal DNA search capability within the next decade. This could result in the identification of medical prescriptions that are specific to you, and you would know exactly what kinds of side effects to expect from a given drug.

Soon search will move from the PC to small devices such as mobile phones and PDAs. Small objects with a chip and the ability to connect will be network-aware and searchable. While there are several hundred thousand books online, there are 100 million more that are not, so search will soon begin to access deep databases, such as university library systems.

Google's vision also includes connecting information through more intelligent search capabilities. A new Web architecture, such as Tim Berners-Lee’s Semantic Web, would add knowledge representation and logic to the markup languages of the Web. Semantics on the Web would offer extraordinary leaps in Web search capabilities, including the ability to handle natural language queries.

Technology futurists such as Ray Kurzweil have suggested that a Web-based system such as Google's could join a reasoning engine with a search engine to produce Strong AI (software that exhibits true intelligence). Strong AI could perform data mining at a whole new level. Consider what might happen if we could achieve 'perfect search,' where we could ask any question and get the perfect answer: an answer with real context, drawing upon all of the world's knowledge in text, video, and audio, and reflecting the real nuance of meaning. Most importantly, it would be tailored to your own particular context. That is the stated goal of IBM, Microsoft, Google, and others.

'Perfect search' would have to be reached in three steps. First, ubiquitous computing populates the world with devices, introducing microchips everywhere. Then the Ubiquitous Web connects and controls these devices on a global scale; its pervasive infrastructure allows URI access to physical objects just as the Web does in cyberspace. The final step comes when artificial intelligence reaches the level where it can manage and regulate devices seamlessly within this environment, achieving a kind of ubiquitous intelligence.

In Connections: Patterns of Discovery, the patterns of discovery are presented that produce the ‘big picture’ of the Information Revolution’s innovations leading to ubiquitous intelligence (UI), where everyone is connected through small devices to what Google founder Larry Page calls ‘perfect search.’

REFERENCES:

Connections: Patterns of Discovery

Developing Semantic Web Services

Web Site: Video Software Lab


Semantic Web Services

It's always exciting to get a glimpse of a new innovative technology just before it really takes off. One of the more interesting Web prospects is Semantic Web Services.

Today, Web Services are self-contained, self-describing component applications that can be published, located, and invoked across the Web. Web Services provide a standard means of interoperating between different software applications running on a variety of platforms. eXtensible Markup Language (XML) provides the extensibility and language neutrality that are key to standards-based interoperability of Web Services. Web Services perform functions that can range from simple query responses to complex business processes. Once a Web Service is deployed, other applications can discover and invoke it. At present, however, Web Services require human interaction for identification and implementation.

Tim Berners-Lee, the inventor of the Web, has suggested that the integration of Web Services and Semantic Web technology could offer significant performance improvements for Web applications. Integration could combine the business logic of Web Services with the Semantic Web's meaningful content. There are several areas where the two could work well together. For example, the current technologies for discovery (Universal Description, Discovery and Integration, UDDI), binding (Web Services Description Language, WSDL), and messaging (Simple Object Access Protocol, SOAP) could use an ontology (Web Ontology Language, OWL) to provide automatic Semantic Web Services, thereby allowing fast interaction with Web business-rule engines.

Through the Semantic Web, users and software agents would be able to discover, invoke, compose, and monitor Web resources offering particular services with a high degree of automation. Recent industrial interest in such services and the availability of tools to enable service automation suggest that fast progress can be made. The Web Ontology Language for Services (OWL-S) may be the most viable application.

Web Service Architecture requires that discrete software agents work together to implement functionality. These agents must communicate by protocol stacks that are less reliable than direct code invocation. Therefore, developers must consider the unpredictable latency of remote access, and take into account issues of partial failure and concurrency.

To make use of a Web Service, a software agent needs a computer-interpretable description of the service and the means for access. An important goal for Semantic Web markup languages is to establish a framework for making and sharing these descriptions. Web sites should be able to employ a set of basic classes and properties for declaring and describing services, and the ontology-structuring mechanisms of OWL provide the appropriate framework to do this.

OWL-S is a high-level ontology, at the application level, that is meant to answer the what- and why-questions about a Web Service, while the how-questions are addressed by WSDL. An ontology is a taxonomy (classes and relationships) along with a set of inference rules.

As a result, an ontology for Web Services would make Web Services machine-understandable and support automated Web Service composition and interoperability, thereby providing automated functions for:

* service discovery,
* service execution,
* service composition,
* service monitoring.

Discovery: A program must first be able to automatically find, or discover, an appropriate Web service. Neither the Web Services Description Language (WSDL) nor Universal Description, Discovery and Integration (UDDI) allows software to determine what a Web service offers to the client. A Semantic Web service describes its properties and capabilities so that software can automatically determine its purpose.

Invocation: Software must be able to automatically determine how to invoke or execute the service. For example, if executing the service is a multi-step procedure, the software needs to know how to interact with the service to complete the necessary sequence. A Semantic Web service provides a descriptive list of what an agent needs to be able to do to execute and fulfill the service. This includes what the inputs and outputs of the service are.

Composition: Software must be able to select and combine a number of Web services to complete a certain objective. The services have to interoperate with each other seamlessly so that the combined results are a valid solution.

Monitoring: Agent software needs to be able to verify and monitor the service properties while in operation.

With these capabilities, we will be able to program agents to locate and utilize Web Services automatically.

REFERENCES

Alesso, H. P. and Smith, C. F., Developing Semantic Web Services, A.K. Peters, Ltd., Wellesley, MA, 2004.

Alesso, H. P. and Smith, C. F., Thinking on the Web: Berners-Lee, Turing and Gödel, John Wiley & Sons, Inc., 2006.

Web Site: Video Software Lab

High Definition Video on the Web Today

Today, video is everywhere. The journey from cell phone to YouTube.com is now a short hop, and more and more often businesses are producing their message visually over the Web.

Today, when you go on-line to make a purchase, you can see a video demonstration of the product in action. If it needs assembly, instead of piecing it together from a single sheet of indecipherable instructions, you can get step-by-step video instructions on-line; and in some cases you can ask questions and receive answers in real-time with instant messaging.

As if this wasn't enough, broadband technologies are converging at breakneck speed, and higher quality video is on the way. The latest development is the arrival of High Definition (HD) video over the Web which is making the entire Web video experience much more enjoyable and acceptable.

Apple TV and iTunes can stream 720p high-definition (HD) video, and HD content is now becoming available through the iTunes Store. Video podcasts are out there, and high-definition movies and TV shows will soon be available from many sources. One of the biggest complaints about the Apple TV was the dearth of HD video content, but HD content is finally available on iTunes for free.

Already, the Hulu.com video Web site has started to add high-definition videos using Adobe Flash Player 9.0 with H.264 encoding.

See examples of HD at

www.videosoftlab.com

For more information about technology innovations and Web video see the following references.

REFERENCES:

Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery John Wiley & Sons, Inc. 2008.

Alesso, H. P. and Smith, C. F., e-Video: Producing Internet Video as Broadband Technologies Converge Addison-Wesley, 2000.

Web Site: Video Software Lab

Friday, May 2, 2008

Long Term Information Technology Forecasting

Yogi Berra once said, “Predictions can be tricky, especially when you’re talking about the future.” And looking forward is certainly more perilous than using our 20-20 hindsight. However, the future of rapidly converging technologies is not so complex and uncertain that a few reasonable speculations can’t be discerned.

The truth about the biggest scientific breakthroughs is that they often come when a scientist takes a leap of imagination out of what is probable into what just might be possible. Scientists seek to understand their surroundings through three remarkable human characteristics: discovery, invention, and creativity.

Discovery is about finding something that is already there, like finding a gold deposit. Invention is the ingenious culmination of many contributing ideas, like the invention of the telephone. Creativity, on the other hand, is the product of a single mind, like a play by Shakespeare. Actually, there is a great deal more to the scientific process, but 'seeing the big picture' requires an ability to understand the relationship between relationships.

Forecasting scientific breakthroughs requires a look into the prospects of science principles, technologies, and the economic incentives to identify areas of strategic opportunity.

Lessons can be taken from past efforts. In a recent review of a 40-year-old forecasting study, Richard E. Albright commented on the one hundred technical innovations that had been identified as very likely to be developed in the last third of the twentieth century. While fewer than 50% of the predicted innovations turned out to be “good and timely,” Albright found that the accuracy rate for the areas of computers and communications rose to about 80%.

Further, Albright concluded that “we should look for sustained and continuing trends in underlying technologies, where increasing capabilities enable more complex applications and declining costs drive a positive innovation loop, lowering the cost of innovation and enabling wider learning and contributions from more people, thus sustaining the technology trends.”

The growth of a new technological capability typically follows an S-shaped curve with three stages. First, slow initial growth occurs while the new technology proves its superiority over the previous technology. Once this is demonstrated, rapid growth follows. Finally, growth is limited by technological or socioeconomic competition and levels off asymptotically.
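
As a rough illustration (a minimal sketch; the ceiling, growth rate, and midpoint below are arbitrary assumptions), the classic S-shaped curve can be modeled with a logistic function:

using System;

class SCurveSketch
{
    // Logistic growth: capability(t) = K / (1 + e^(-r * (t - t0))), where K is
    // the eventual ceiling, r the growth rate, and t0 the midpoint of the
    // rapid-growth stage.
    static double Logistic(double t, double k, double r, double t0) =>
        k / (1.0 + Math.Exp(-r * (t - t0)));

    static void Main()
    {
        // Slow start, rapid middle, asymptotic leveling off.
        for (int year = 0; year <= 50; year += 10)
            Console.WriteLine($"year {year}: {Logistic(year, 100, 0.25, 25):F1}");
    }
}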

The S-shaped curve illustrates the progress of many inventions, such as electrical appliances. Early analog signal-processing devices represented a paradigm shift that took nearly 50 years to come to practical fruition, as the adoption and utilization of independently powered analog machines followed an S-shaped curve. Today, the growth of their digital competitors is following a similar pattern.

In Connections: Patterns of Discovery the patterns of discovery are presented that produced Moore’s Law and the book explores the question, “What is the software equivalent of Moore’s Law?”

The patterns challenge the reader to think of the consequences of extrapolating trends, such as, how Moore's Law could reach machine intelligence, or retrench in the face of physical limitations.
From this perspective, the book draws the ‘big picture’ for the Information Revolution’s innovations in chips, devices, software, and networks. One goal of science is ubiquitous intelligence (UI) where everyone is connected to devices with access to Artificial Intelligence (AI) - offering what Google founder Larry Page calls ‘perfect search.’

REFERENCE:

H. Peter Alesso and Craig F. Smith, Connections: Patterns of Discovery, John Wiley & Sons Inc., 2008.

Thursday, May 1, 2008

Google Building the Ubiquitous Web

Today, while Globalization is unfolding, only 18% of the world’s population is connected to the Internet. What will happen as the rest of the world becomes connected?

By 2020, it may be possible for nearly every individual to have personally-tailored access to the whole of knowledge through ‘perfect search’ - where the ultimate search engine can understand exactly what you mean and give back exactly what you want; anytime, anywhere.

In its quest for ‘perfect search’ Google could become the critical gatekeeper for connecting us to all of human knowledge; making it the leader of the Information Revolution and creating a Brave New Vision for Globalization.

The globalization of information technology began with Mark Weiser, who first identified the concept of ubiquitous computing (UC) in 1988 while he was working at Xerox’s Palo Alto Research Center (PARC). Weiser described UC as the ‘third wave’ in computing for achieving the full potential of computing and networking.

The first wave of computing was the era of IBM, where complex mainframe computers supported large numbers of people through centralized organizations. The second wave was the era of Microsoft, where individuals and personal computing machines shared the same desktop in decentralized computing.

Today searching the Web has become an essential starting point for information access. As a result, the third wave of computing is becoming the era of Google, where the information environment of each person is connected through Google to many varied devices. The key enabling technologies include the Web, wireless networking, and countless small devices that extend our personal reach from the local to the global.

The process of information retrieval consists of searching within a document collection for a particular query. In 1989, document storage, access, and retrieval were revolutionized when Tim Berners-Lee invented the World Wide Web. Unfortunately, like the infamous Tower of Babel, much of the data on the Web remained inaccessible until 1998, when link analysis began to be used for information retrieval.

The word “Google” didn’t exist before Larry Page and Sergey Brin misspelled “googol” while naming their new link-analysis search engine, which used their PageRank algorithm.

Today, the word Google defines a 200-billion-dollar company with over 25,000 employees that connects people in over a hundred languages to relevant information for free. “Googling” produces over a billion searches a day, and the results include small targeted advertisements served through the AdWords system, a self-service advertisement platform that yields tremendous financial success for Google.

Soon, however, ubiquitous computing will empower the fourth wave in computing, called the Ubiquitous Web, where connected devices are automatically controlled over the Web to run financial transactions as well as pumps, lights, switches, sensors, monitors, and all manner of industrial machines. Google’s G-phone operating system, scheduled for release in late 2008, is its first step toward dominating the fourth wave.

While Google Earth provides satellite views for nearly every location on Earth, including weather patterns and information, Google’s next level of capabilities, such as G-Android and G-Phone, can extend to monitoring and sharing environmental data. The development of the Ubiquitous Web in the coming years will permit global sensing and monitoring of many environmental parameters, such as greenhouse gases, temperatures, water levels, ice flows, animal populations, vegetation, rainforests, and weather patterns. Eventually, the Ubiquitous Web, connecting billions of devices worldwide, will enable us to control many environmental devices such as heating, lighting, and pumps.

For more information about technology innovations and the Ubiquitous Web see the following references.

REFERENCES:

Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery John Wiley & Sons, Inc. 2008.

Alesso, H. P. and Smith, C. F., Developing Semantic Web Services A. K. Peters Inc., 2004.

Web Site:
Video Software Lab

Semantic Search Technology

Today, whether you are at your PC or wandering the corporate halls with your PDA, searching the Web has become an essential part of doing business. As a result, commercial search engines have become very lucrative and companies such as Google and Yahoo! have become household names.

In ranking Web pages, search engines follow a certain set of rules with the goal of returning the most relevant pages at the top of their lists. To do this, they look for the location and frequency of keywords and phrases. Keywords, however, are subject to two well-known linguistic phenomena that strongly degrade a query's precision and recall: polysemy (one word might have several meanings) and synonymy (several words or phrases might designate the same concept).

Three characteristics are required of search engine performance in order to separate useful searches from fruitless ones:

* Maximum relevant information,

* Minimum irrelevant information and,

* Meaningful ranking, with the most relevant results first.

In addition, some search engines use Google's approach to ranking, which assesses popularity by the number of links pointing to a given site. The heart of Google's search software is PageRank, a system that relies on the Web's vast link structure as an indicator of an individual page's value. It interprets a link from page A to page B as a vote by page A for page B. Important sites receive a higher PageRank, and votes cast by pages that are themselves ‘important’ weigh more heavily and help to make other pages ‘important.’
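
As a minimal sketch of the link-voting idea (the tiny graph, damping factor, and iteration count here are illustrative assumptions, not Google's actual implementation):

using System;
using System.Collections.Generic;
using System.Linq;

class PageRankSketch
{
    static void Main()
    {
        // Tiny link graph: page -> pages it links to.
        var links = new Dictionary<string, string[]>
        {
            ["A"] = new[] { "B", "C" },
            ["B"] = new[] { "C" },
            ["C"] = new[] { "A" },
        };

        const double damping = 0.85; // conventional damping factor
        var pages = links.Keys.ToList();
        var rank = pages.ToDictionary(p => p, p => 1.0 / pages.Count);

        // Each iteration redistributes every page's rank as "votes"
        // along its outgoing links.
        for (int i = 0; i < 50; i++)
        {
            var next = pages.ToDictionary(p => p, p => (1.0 - damping) / pages.Count);
            foreach (var page in pages)
                foreach (var target in links[page])
                    next[target] += damping * rank[page] / links[page].Length;
            rank = next;
        }

        foreach (var page in pages)
            Console.WriteLine($"{page}: {rank[page]:F3}");
    }
}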

Nevertheless, it is still common for searches to return too many unwanted results and often miss important information. Recently, Google and other innovators have been seeking to implement limited natural language (semantic) search. Semantic search methods could improve traditional results by using, not just words, but concepts and logical relationships. Two approaches to semantics are Semantic Web Documents and Latent Semantic Indexing (LSI).

LSI is an information retrieval method that organizes existing HTML information into a semantic structure that takes advantage of some of the implicit higher-order associations of words with text objects. The resulting structure reflects the major associative patterns in the data. This permits retrieval based on the ‘latent’ semantic content of the existing Web documents, rather than just on keyword matches. LSI offers an application method that can be implemented immediately with existing Web documentation. In a semantic network, the meaning of content is better represented and logical connections are formed.

However, most semantic-network-based search engines suffer performance problems because of the scale of the very large semantic network. In order for the semantic search to be effective in finding responsive results, the network must contain a great deal of relevant information. At the same time, a large network creates difficulties in processing the many possible paths to a relevant solution.

Most of the early efforts on semantic-based search engines were highly dependent on natural language processing techniques to parse and understand the query sentence. One of the first and most popular of these efforts is Cycorp (http://www.cyc.com). Cyc combines the world’s largest knowledge base with the Web. Cyc (which takes its name from en-cyc-lopedia) is an immense, multi-contextual knowledge base. With the Cyc Knowledge Server, Web sites can add common-sense intelligence and distinguish different meanings of ambiguous concepts.

For more information about technology innovations and semantic search see the following references.

REFERENCES:

Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery John Wiley & Sons, Inc. 2008.

Alesso, H. P. and Smith, C. F., Developing Semantic Web Services A. K. Peters Inc., 2004.

Web Site:
Video Software Lab

Tuesday, April 15, 2008

New Book - Connections: Patterns of Discovery

From the Foreword of James Burke:

In their fascinating analysis of the recent history of information technology, Peter Alesso and Craig Smith reveal the patterns in discovery and innovation that have brought us to the present tipping point...A generation from now every individual will have personally-tailored access to the whole of knowledge...The sooner we all begin to think about how we got here, and where we’re going, the better. This exciting book is an essential first step.

SUMMARY

Many people envision scientists as dispassionate characters who slavishly repeat experiments until ‘eureka’ - something unexpected happens. Actually, there is a great deal more to the story of scientific discovery, but 'seeing the big picture' is not easy. Connections: Patterns of Discovery uses the primary tools of forecasting and three archetypal patterns of discovery — the Serendipity, the Proof of Principle, and 1% Inspiration and 99% Perspiration — to discern relationships of past developments to synthesize a cohesive and compelling vision for the future. It challenges readers to think of the consequences of extrapolating trends, such as Moore’s Law, to either reach real machine intelligence or retrench in the face of physical limitations. From this perspective, the book draws 'the big picture' for the Information Revolution’s innovations in chips, devices, software, and networks. In a collection of compelling chapters, this book shows how the past flows into the future:

• Starting with the inspirational journey of two young inventors creating Google, the world’s best search engine, it identifies the stepping stones toward ‘perfect search.’
• The growth of the Information Superhighway links succeeding proof-of-principle inventions fueling Moore’s Law: the vacuum tube, the transistor, and the microprocessor, thereby illuminating the next generation of chips.
• The incubator research center of Xerox PARC led to breakthroughs in six technologies, only to fail commercially, in contrast to the success of Apple and the IBM PC, setting the stage for innovative breakthroughs in small devices.
• Ray Kurzweil’s vision of accelerating discovery forms the inspiration for the future shape of discovery patterns.
• And much more

With a foreword by James Burke and bursting with fascinating detail throughout, Connections: Patterns of Discovery is a must-read for computer scientists, technologists, programmers, hardware and software developers, students, and anyone with an interest in tech-savvy topics.

http://www.amazon.com/Connections-Patterns-Discovery-Peter-Alesso/dp/0470118814/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1203981810&sr=8-1