About

Wheelchair Basketball Coaching

PHYSNET project: Physical Interaction Using the Internet: The Virtual Coach Intervention

Tele-immersive technology could significantly improve access to knowledgeable coaches, and thereby the ability to acquire the knowledge and skills needed to engage in physical activity competently and without injury. Even with state-of-the-art sport wheelchairs (equipped with waist belts and anti-tipping features), stopping and turning can cause soft-tissue injuries to the hands (e.g., blisters), sprained or strained fingers, and other upper-extremity injuries.

In the full tele-immersive environment, the learning of shooting techniques and training drills (hook pass, figure-eight dribble, one-on-one defense) will involve very limited co-action with other players to minimize the likelihood of injury, and all such activities will be closely supervised remotely by the coach.

Tele-immersive technology can create digital clones of people and objects at multiple geographical locations and then place them into a shared virtual space in real time. This technology is particularly useful for citizens with limited proprioception (the sense of the relative position of neighbouring parts of the body and of locomotion). Tele-immersive environments can provide spatial cues that help users regain proprioception, support the training of children, athletes, and veterans with disabilities, and facilitate physical interaction and communication between persons with disabilities and their relatives and others in their homes and workplaces. The team at NCSA/UIUC works closely with the Disability Resources and Educational Services (DRES) at the University of Illinois and its wheelchair basketball players. They have researched and developed a prototype tele-immersive technology that addresses the problems of

(a) adaptive placement of stereo camera networks for optimal deployment;

(b) robust performance under illumination changes using thermal infrared and visible spectrum imaging; and

(c) quantitative understanding of the value of tele-immersive environments for citizens with limited proprioception.

The figures below illustrate the technical challenges of building and deploying robust, inexpensive and portable tele-immersive systems, as well as evaluating the value of such technologies for citizens with disabilities. The experience gained from the current work focused on wheelchair basketball coaching and proprioception applications has opened the door to studying physical interactions using the Internet.

We collaborate with Mike Frogley, head coach of the men's and women's wheelchair basketball teams at the University of Illinois.

Collaboration between people in a shared virtual environment

Experimental setup of the spaces. Left: physical space at NCSA with the portable camera clusters and an LCD. Right: action in the CS building (physical space) as seen remotely on a computer monitor. The two spaces were combined into a virtual space (middle).


How Does the System Work?

The required infrastructure for a tele-immersive system spans networking, stereo camera rigs, software controlling the operation of the cameras, calibration image acquisition, synchronization, 3D reconstruction, and foreground detection.

System Requirements

  • Key Functionality: Raw images with (x, y, z, t, spectral) information; Knowledge about interactions of cloned objects and virtual objects; Interaction and feedback in real time.
  • Portability: Re-configurable hardware, easy synchronization and calibration.
  • (Low) Cost: < $50K in comparison to the commercial solutions: > $0.5M.
  • Robustness: Invariance to environment variables, scalable with the number of hardware components and the computational resources, adaptable to LAN and web networking resources, real-time performance.

Approach

  • Key functionality: stereo vision, time-of-flight ranging, multi-spectral imaging, advanced analyses of scene measurements, analyses of cloned and virtual interactions, multi-sensory feedback to humans.
  • Portability: portable mounting of hardware (tripods and carts), automated parameter configuration.
  • Cost: COTS components, open-source software, Web 2.0.
  • Robustness: multi-spectral sensing, scalable algorithms for 3D reconstruction (stereo vision) and rendering (fusion of multiple streams) to accommodate a variable number of cores on PCs, distributed system operation using TCP/IP and UDP protocols over existing networks, data compression, object recognition.

Data volume and networking: The uncompressed data streams have huge bandwidth requirements.
One 3D stream requires approximately 23 MBytes/sec (640x480 [pixels/frame] x 5 [bytes/pixel] x 15 [frames/sec] = 23,040,000 [bytes/second]).
One TI session (10 3D streams) requires approximately 230 MBytes/sec. In addition, streams arriving with different time latencies must be fused.
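The bandwidth figures above follow directly from the stated frame geometry; a few lines of arithmetic reproduce them:

```python
# Back-of-the-envelope bandwidth estimate for uncompressed 3D streams,
# using the numbers quoted above (640x480 pixels/frame, 5 bytes/pixel,
# 15 frames/sec, and 10 3D streams per tele-immersive session).
WIDTH, HEIGHT = 640, 480      # pixels per frame
BYTES_PER_PIXEL = 5           # color + depth data per pixel
FPS = 15                      # frames per second
STREAMS = 10                  # 3D streams in one TI session

one_stream = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS   # bytes/second
session = one_stream * STREAMS

print(one_stream)   # 23040000 bytes/s, i.e. ~23 MBytes/sec
print(session)      # 230400000 bytes/s, i.e. ~230 MBytes/sec
```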

physical space

Figure 1: A sketch of our experimental setup (physical space). Approximate room size is 15ft x 15ft

System operation:

  1. Session controller registers gateways.
  2. Gateway machine manages its local cameras and displays, and relays the content to other gateways.
  3. Trigger server synchronizes cameras. It will send a message to its gateway with the number of cameras to create a new session.
  4. Camera machine performs all computation for its four video streams and sends the content to the gateway on the port assigned by the gateway. After the expected number of cameras has joined the session, the gateway makes the content available to local renderers and/or other gateways.
  5. Renderer (or Display) machine runs a small daemon code that maintains the connection with the gateway and performs the fusion of all local and remote content.
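The session-setup logic in steps 1–4 can be sketched as a small in-memory simulation; the class and method names here are illustrative, not the actual TEEVE API, and real components communicate over TCP/IP rather than direct calls:

```python
# Hypothetical sketch of session setup: a controller registers gateways,
# a gateway is told how many cameras to expect, and content becomes
# available only after all expected cameras have joined (steps 1-4 above).

class Gateway:
    def __init__(self, name):
        self.name = name
        self.expected_cameras = 0
        self.cameras = []
        self.available = False  # is content released to renderers/peers?

    def new_session(self, num_cameras):
        # Step 3: the trigger server tells the gateway how many cameras
        # will participate in the new session.
        self.expected_cameras = num_cameras

    def camera_joined(self, camera_id):
        # Step 4: a camera machine joins on its assigned port; once the
        # expected number has joined, content is made available.
        self.cameras.append(camera_id)
        if len(self.cameras) == self.expected_cameras:
            self.available = True

class SessionController:
    def __init__(self):
        self.gateways = {}

    def register(self, gateway):
        # Step 1: the session controller registers gateways.
        self.gateways[gateway.name] = gateway

controller = SessionController()
gw = Gateway("ncsa")
controller.register(gw)
gw.new_session(2)
gw.camera_joined("cam-0")
print(gw.available)   # False: only one of two cameras has joined
gw.camera_joined("cam-1")
print(gw.available)   # True: session is complete
```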

Vision algorithms that perform 3D reconstruction rely primarily on knowledge of the camera positions and orientations with respect to some reference frame. Camera calibration is the process of determining the geometric and optical parameters that describe the transformation from an object in the world to its image detected by the camera system. These parameters are usually grouped into intrinsic and extrinsic parameters. Intrinsic parameters describe quantities that are affected by the optical and electrical components of a camera: (a) focal length, (b) pixel aspect ratio, (c) principal point, (d) lens distortion, etc. Extrinsic parameters describe the geometric position and orientation of the camera with respect to the world.
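A toy pinhole-projection function illustrates the role of the intrinsic parameters (focal lengths and principal point); lens distortion is omitted, the point is assumed already expressed in camera coordinates (i.e., the extrinsic rotation/translation has been applied), and the numeric values are made up for illustration:

```python
# Simplified pinhole-camera model: intrinsic parameters map a 3D point
# in the camera frame to a 2D pixel location. No lens distortion here.

def project(point_3d, fx, fy, cx, cy):
    """Project a camera-frame point (X, Y, Z) to pixel coordinates.

    fx, fy: focal lengths in pixels (a pixel aspect ratio of 1 gives fx == fy)
    cx, cy: principal point (where the optical axis meets the image plane)
    """
    X, Y, Z = point_3d
    u = fx * X / Z + cx   # horizontal pixel coordinate
    v = fy * Y / Z + cy   # vertical pixel coordinate
    return (u, v)

# A point 2 m in front of the camera and 0.5 m to the right:
u, v = project((0.5, 0.0, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(u, v)   # 445.0 240.0
```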

Stereo camera clusters

Figure 2: Image depicts two trinocular stereo clusters and one thermal infrared camera. Each stereo cluster contains three grayscale and one color Dragonfly digital camera. The thermal camera is an uncooled microbolometer, which detects thermal energy at LWIR wavelengths (7.5 to 13.5 microns).

Our calibration method is based on the technique of Tomáš Svoboda and co-workers [pdf 1.7MB], with a calibration time of about 1-3 hours. Our goal is to automate the calibration of intrinsic parameters while enabling manual calibration of extrinsic parameters.

A high frame rate is required for motion capture (e.g., a bouncing basketball). Our frame rate is 20-24 fps on a quad-core machine, compared to the original 2-5 fps on a single-core and 12 fps on a dual-core machine.

Integration of thermal and visible imagery for robust foreground detection in tele-immersive spaces

The central objects of interest in a tele-immersive system are the people, the things they jointly manipulate, and the tools they need to perform this manipulation. Making certain assumptions about the image reduces the complexity of the detection problem and makes it easier to find objects of interest. Typical assumptions are: background materials have non-reflective surfaces, there is an intensity differential between background and foreground objects, scene illumination is constant, foreground objects are uniformly illuminated by diffuse lighting, and scene lighting exhibits a constant power spectrum.

Unfortunately, these assumptions often break down in practice. In particular, five characteristics of real scenes cause problems in the current TEEVE system: changing illumination, moving foreground objects (causing shadows), moving background objects, lack of contrast between foreground and background objects, and lack of contrast between different foreground objects.

TEEVE problems

Figure 3: Problems in object detection. In each column, the top images are the current frames, below them are static backgrounds taken before image acquisition, third from the top is the current frame minus the static background, and the bottom images show the foreground detection. (a) Changing illumination: in this sequence, a small lamp was turned on before the current acquisition, causing large portions of the background to be detected as foreground. (b) Moving foreground objects: moving foregrounds cast shadows and change the general scattering environment of the scene; in this sequence, background objects are mistaken for foreground. (c) Moving background objects: the computer display (a background object) has changed appearance between the acquisition of the background and the current frame, so the current system treats these changed pixels as foreground. (d) Low contrast between foreground and background: the visible modality has difficulty classifying foreground objects whose intensities or colors (the dark shirt here) closely match the background model. (e) Low contrast between different foreground objects: the current TEEVE system cannot distinguish between the object of interest (the ball) and a rectangular object, thus classifying both as foreground.

A method of fusing information from visible and thermal infrared cameras can solve all five of these problems.

Visual and thermal cameras provide fundamentally different information. Where visual cameras primarily measure how materials reflect light, thermal cameras primarily measure temperature. These differences of content mean that a combination of visual and infrared images can provide more information about a scene than either modality used alone.
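As a minimal sketch of why the two modalities complement each other, consider a per-pixel background-subtraction rule that marks a pixel as foreground if either the visible or the thermal channel differs sufficiently from its static background. The fixed thresholds and the simple OR rule are a deliberate simplification, not the project's actual fusion algorithm:

```python
# Illustrative visible+thermal fusion for foreground detection.
# Thresholds and the OR combination rule are simplifying assumptions.

VIS_THRESH = 30      # visible intensity difference threshold (0-255 scale)
THERM_THRESH = 2.0   # thermal difference threshold (arbitrary units)

def fuse_foreground(vis, vis_bg, therm, therm_bg):
    """Return a per-pixel foreground mask from two aligned modalities."""
    mask = []
    for v, vb, t, tb in zip(vis, vis_bg, therm, therm_bg):
        vis_fg = abs(v - vb) > VIS_THRESH      # visible-channel evidence
        therm_fg = abs(t - tb) > THERM_THRESH  # thermal-channel evidence
        mask.append(vis_fg or therm_fg)
    return mask

# A dark shirt against a dark background (pixel 0): the visible difference
# is tiny, but body heat makes the pixel obvious in the thermal channel.
vis      = [40, 200, 41]
vis_bg   = [42, 40, 40]
therm    = [30.0, 30.0, 21.0]
therm_bg = [20.0, 20.0, 20.5]
print(fuse_foreground(vis, vis_bg, therm, therm_bg))  # [True, True, False]
```

This captures the low-contrast case of figure 3d: the first pixel is invisible to background subtraction in the visible channel alone but is recovered by the thermal channel.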

TEEVE visible spectrum   TEEVE both spectra

Figure 4: Left. TEEVE system based on visible spectrum imaging only. Right. Tele-immersive system based on visible and thermal IR spectrum imaging and information fusion.

There are three types of benefits that IR imaging can provide tele-immersive systems: (1) IR can enhance image processing tasks at a low level (e.g. human foreground detection), (2) IR can allow tele-immersion users to perceive temperature in the virtual environment using visual or tactile feedback, and (3) IR can fundamentally enhance material and object classification.

Final Results

Visible and IR results 1   Visible and IR results 2   Visible and IR results 3

Figure 5: Left: Visible and infrared (marked IR) camera frames showing the changing-illumination result, similar to figure 3a. Middle: Moving-foreground scenario, similar to figure 3b. In each column, the top images are the current frames, below them are static backgrounds taken before image acquisition, third from the top is the current frame minus the static background, and the bottom images show the foreground detection. Right: Low-contrast foreground/background experimental results (figure 3d). Fusion with thermal information is able to fill in the missing information.

Finally, figure 6 shows the combined results. In the changing-illumination experimental results (figures 3a and 5, left), the thermal imagery is not sensitive to the lighting change and is able to detect the person in the scene. Also, because our inanimate object detection emphasizes higher-level features, in this case shape, the ball is correctly identified as an object of interest.
In the moving foreground object experimental results (figures 3b and 5, middle), shadows are cast on the background due to the foreground object's motion in the scene. Room lighting remained constant during this experiment, and the change of background pixel intensity is solely due to the human foreground blocking illumination. This experiment illustrates that visible and thermal fusion is robust to shadowing on textured backgrounds.
Low contrast between foreground/background experimental results: This experiment demonstrates our fusion algorithm's performance in the presence of problems that can occur when the foreground object has a similar color or brightness to the background (described in figures 3d and 5, right). In this case, the current system fails to recognize portions of the person as foreground because of their dark clothing. Fusion with thermal information is able to fill in the missing information.

Visible and IR results 3

Figure 6: Top: Changing-illumination results. The three top pictures show our results of visible and IR fusion (compared to the existing system's performance) in the presence of the changing-illumination problem described in figures 3a and 5 (left). Middle: Results in the presence of moving foreground objects (figures 3b and 5, middle). Bottom: Improved detection of the foreground object with thermal information (figures 3d and 5, right).

| Experiment                                       | Method | F Neg | F Pos | Total | % err |
|--------------------------------------------------|--------|-------|-------|-------|-------|
| Changing illumination                            | TEEVE  | 141   | 21939 | 22080 | 57    |
|                                                  | Fusion | 566   | 420   | 986   | 2     |
| Moving foregrounds casting shadows               | TEEVE  | 0     | 10674 | 10674 | 28    |
|                                                  | Fusion | 244   | 876   | 1120  | 3     |
| Moving background                                | TEEVE  | 122   | 2205  | 2327  | 6     |
|                                                  | Fusion | 117   | 424   | 541   | 1     |
| Low contrast between foreground and background   | TEEVE  | 6056  | 209   | 6265  | 16    |
|                                                  | Fusion | 741   | 647   | 1344  | 4     |
| Low contrast between different foreground objects| TEEVE  | 74    | 1127  | 1201  | 3     |
|                                                  | Fusion | 206   | 1010  | 1216  | 3     |

Table of quantitative results, comparing performance of current tele-immersive system and our proposed fusion algorithm. "F Neg" represents the number of pixels that were incorrectly classified as background (i.e. the false negative detections). "F Pos" represents the number of pixels that were incorrectly classified as foreground (i.e. the false positive detections). The "Total" is the sum of these two pixel counts, and the percent error represents the percentage of the image that was misclassified.

Summary

  1. Calibration of visible and infrared cameras: we extended a state of the art automatic multi-camera calibration technique to simultaneously calibrate grayscale, color, and thermal cameras.
  2. Development of methodology for fusing visible and infrared images based on tele-immersive system scene modeling and estimation of scene 3D structure.
  3. Building prototype hardware to acquire visible and thermal IR imagery, and the design of off-line processing and analysis algorithms.
  4. Quantitative analysis of the tele-immersive system with and without fusion of visible and infrared information.

The figure below presents an example of a foreground-detection problem approached by using the fusion of thermal infrared (IR) and visible images. By exploring the fusion of multiple sensor modalities in the context of tele-immersive systems, we can enhance computational efficiency, the user's immersive experience, and automatic scene understanding.

Challenges of the tele-immersive project

Figure 7: This set of images (from a single time step) demonstrates a particularly challenging scenario that involves a dynamic scene in both the visible and thermal wavelengths. Top row: visible-wavelength background image; current visible frame; difference between the current visible frame and the background. Bottom row: thermal background; current thermal frame; simple thresholding and connected components in the thermal frame. This scene contains a monitor, which is showing a dynamic video, and a warm cup of water, which is cooling over time. Note that the monitor can also change temperature over time if it is turned off or hibernating. In this challenging case, model-based classification will be able to tell the difference between the human subject and other objects.


People, Publications, Presentations

Team members

Former members

  • Miles Johnson
    ISDA, Aerospace Department, UIUC
  • Suk Kyu Lee
    ISDA, Computer Science Department, UIUC

Funding

Funding was provided by National Science Foundation grant IIS 07-03756 (490630) and an NCSA core grant.

Publications

  • R. Malik and P. Bajcsy, "Achieving Color Constancy Across Multiple Cameras.", ACM International Conference on Multimedia, Beijing, China, October 19 - 24, 2009 (~ 30% acceptance)
  • P. Bajcsy, K. McHenry, H.-J. Na, R. Malik, A. Spencer, S.-K. Lee, R. Kooper, and M. Frogley, "Immersive Environments For Rehabilitation Activities.", ACM International Conference on Multimedia, Beijing, China, October 19 - 24, 2009 (~ 27.5% acceptance)
  • K. McHenry and P. Bajcsy, "Key Aspects in 3D File Format Conversions.", Joint Annual Meeting of the Society of American Archivists and the Council of State Archivists, 2009 Research Forum 'Foundations and Innovations', August 11, Hilton Austin, Texas, USA, 2009 [proceedings]
  • S-K. Lee, K. McHenry, R. Kooper and P. Bajcsy, "Characterizing Human Subjects In Real-Time And Three-Dimensional Spaces By Integrating Thermal-Infrared And Visible Spectrum Cameras.", Workshop on Multimedia Aspects in Pervasive Healthcare., in conjunction with 2009 IEEE International Conference on Multimedia & Expo (ICME), July 3, 2009, New York, NY, USA
  • P. Bajcsy, M. Frogley, R. Kooper, S-K. Lee, R. Malik, K. McHenry, H-J. Na, and A. Spencer, "Design And Use Of Immersive Environments For Regaining Proprioceptive Abilities.", Workshop on Multimedia Aspects in Pervasive Healthcare., in conjunction with 2009 IEEE International Conference on Multimedia & Expo (ICME), July 3, 2009, New York, NY, USA
  • R. Malik and P. Bajcsy, "Optimal Stereo Camera Placement Under Spatially Varying Resolution Requirements.", 2nd International Conference on Immersive Telecommunications., University of California, Berkeley, CA, USA, May 27-29, 2009 (accepted ~50% acceptance rate)
  • A. Spencer, H Jung, K. McHenry, H-J. Na, R. Malik, S-K. Lee, R. Kooper, P. Bajcsy, "Tele-Immersive Environments For Everybody.", poster at PRAGMA 16, KISTI, Daejon Convention Center, Korea, March 23-24, 2009
  • Miles Johnson, "Integration of thermal and visible imagery for robust foreground detection in Tele-immersive spaces.", Thesis for the degree of Master in Science in Aerospace Engineering, Graduate College of the University of Illinois at Urbana-Champaign, 2007 [pdf 4.6MB]