Revised on Oct. 4, 2002
Still images are often embedded in web pages published on the World Wide Web (WWW). These images are sometimes made wholly or partly interactive (clickable), so that users who click on an active region are redirected to related information at a predetermined URL. Similar interactive facilities have also been developed for moving images delivered via digital TV broadcasts or across the Internet, whereby users can click on specific regions or objects in the moving images to display information relating to the clicked object (normally specified by a URL) or to generate some kind of event (such as starting a certain computer program). Examples of such facilities include BML, SMIL, and Cmew.
For both types of image (still and moving), the active regions with which hyperlinks are associated are defined as virtual regions within the digital pictures (for example, in picture coordinates) rather than by recognizing the actual objects in the images. Since it is the objects themselves that convey meaning, rather than their pictorial representations, it would be desirable to establish links directly from real-world objects to their related information. However, it is currently impossible to link real-world objects directly to entities in cyberspace on the Internet. We therefore have to make do with pictorial representations of objects in photographs and images, which are treated as symbols for the actual objects.
Additionally, it is not easy to specify the regions of objects in an image. Computers cannot recognize objects in an image accurately, so a human has to identify the objects, specify the regions they occupy, and associate each region with a link such as a URL. In moving pictures, moreover, the objects may move from frame to frame, which makes this task harder still.
In this study, we propose a model, and describe its implementation, in which a digital (video) camera that captures objects also stores the regions in which the objects to be made clickable appear, together with collateral information related to those objects, such as link destination URLs and the time at which the images were produced. These object regions and collateral information are acquired, recorded and played back simultaneously with the original image, so when the captured images are displayed (in a web browser, for example), the display system can also obtain information such as the boundaries of the clickable objects and their link destination URLs.
Related work aimed at similar capabilities includes augmented reality (AR), in which digital data such as text and images are embedded in the real world. In AR, a computer is used to superimpose information and virtual object representations on the user's view of the real world in order to provide annotations or for entertainment. This is normally achieved by placing identification tags or devices at locations where annotations are required, and then working out where to display the annotations based on knowledge of the IDs and positions of these tags or devices.
In contrast, the technique proposed in this paper aims to produce images in which no tags or devices of this sort are visible (at least to human viewers). We also focus on identifying regions within a captured movie rather than specific locations in space. Of course, it would be perfectly feasible to use this technique to embed annotations into the generated video and collateral information, but we will not deal with such applications in this paper. Instead, we will concentrate on producing video content that appears to be free of artificial tags and the like, while extracting the regions of the objects to be made clickable. The proposed method allows content creators to capture moving images along with information relating to the objects that appear in these images and the clickable areas of these objects. The goal is to add point-and-click interactivity to video content that is received via the Internet and played back in a browser, or delivered by a cable network or satellite broadcast and viewed as digital TV.
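As an illustration of the playback side of this model, the following is a minimal sketch of how a display system might resolve a click to a link destination, given the object regions and collateral information recorded alongside the video. It is not the implementation used in this study. The names Region, ObjectRecord and find_link, the use of rectangular regions in picture coordinates, the time tolerance, and the example URL are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical clickable region: an axis-aligned rectangle in picture coordinates."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # True when the point (px, py) lies inside the rectangle.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass(frozen=True)
class ObjectRecord:
    """Collateral information stored with one frame (field names are assumptions)."""
    frame_time: float  # seconds since the start of capture
    region: Region     # where the object appears in that frame
    url: str           # link destination associated with the object


def find_link(records: List[ObjectRecord], t: float, px: int, py: int,
              tolerance: float = 0.02) -> Optional[str]:
    """Return the URL of the first object whose region, recorded at a time
    within `tolerance` seconds of t, contains the clicked point; else None."""
    for rec in records:
        if abs(rec.frame_time - t) <= tolerance and rec.region.contains(px, py):
            return rec.url
    return None


# The same object recorded in two frames; it has moved to the right by t = 1.0.
records = [
    ObjectRecord(0.0, Region(10, 10, 50, 50), "http://example.com/object-a"),
    ObjectRecord(1.0, Region(30, 10, 50, 50), "http://example.com/object-a"),
]
```

With these records, a click at (70, 20) resolves to the object's URL at time 1.0 but to nothing at time 0.0, because the recorded region follows the object from frame to frame.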
Example Picture (under construction)
© Toshiharu Sugawara