About
Wheelchair Basketball Coaching
PHYSNET project: Physical Interaction Using the Internet: The Virtual Coach Intervention
Tele-immersive technology could significantly improve access to knowledgeable coaches, thereby improving the ability to acquire the
knowledge and skills necessary to competently engage in physical activity without injuries. Stopping and turning can result
in soft-tissue injuries to the hands (e.g., blisters), sprained or strained fingers, and other upper-extremity
injuries, even with state-of-the-art sport wheelchairs equipped with waist belts and anti-tipping features.
In a fully tele-immersive environment, the learning of shooting techniques and training drills (hook pass,
figure-eight dribble, one-on-one defense) will involve very limited co-action with other players to minimize the likelihood of injury,
and all such activities will be closely supervised remotely by the coach.
Tele-immersive technology can create digital clones of people and objects at multiple geographical locations and place them into a shared virtual space in real time. This technology is particularly useful for citizens with limited proprioception (the sense of the relative position of neighboring parts of the body and of locomotion). Tele-immersive environments can provide spatial cues that help users regain proprioception, support the training of children, athletes, and veterans with disabilities, and facilitate physical interaction and communication between persons with disabilities and their relatives and others in their homes and workplaces.

The team at NCSA/UIUC works closely with Disability Resources and Educational Services (DRES) at the University of Illinois and with the wheelchair basketball players. The team has researched and developed a prototype tele-immersive technology that addresses the problems of
(a) adaptive placement of stereo camera networks for optimal deployment;
(b) robust performance under illumination changes using thermal infrared and visible spectrum imaging; and
(c) quantitative understanding of the value of tele-immersive environments for citizens with limited proprioception.
The figures below illustrate the technical challenges of building and deploying robust, inexpensive, and portable tele-immersive systems, as well as of evaluating the value of such technologies for citizens with disabilities. The experience gained from the current work on wheelchair basketball coaching and proprioception applications has opened the door to studying physical interactions using the Internet.
We collaborate with
Mike Frogley, head coach of the
men's and women's wheelchair basketball teams at the University of Illinois.
How Does the System Work?
The required infrastructure for a tele-immersive system includes everything from networking
and stereo camera rigs to software controlling camera operation, calibration,
image acquisition, synchronization, 3D reconstruction, and foreground detection.
System Requirements
- Key functionality: raw images with (x, y, z, t, spectral) information;
knowledge about interactions of cloned objects and virtual objects;
interaction and feedback in real time.
- Portability: re-configurable hardware; easy synchronization and calibration.
- Low cost: under $50K, compared with commercial solutions at over $0.5M.
- Robustness: invariance to environment variables; scalability with the number of hardware
components and the computational resources; adaptability to LAN and web networking resources; real-time performance.
Approach
- Stereo vision, time-of-flight ranging, multi-spectral imaging, advanced analyses of scene measurements,
analyses of cloned and virtual interactions, and multi-sensory feedback to humans.
- Portable mounting of hardware (tripods and carts); automated parameter configuration.
- COTS components, open-source software, and Web 2.0.
- Multi-spectral sensing; scalable algorithms for 3D reconstruction (stereo vision) and rendering
(fusion of multiple streams) to accommodate a variable number of cores on PCs; distributed system operation
using TCP/IP and UDP protocols over existing networks; data compression; object recognition.
Data volume and networking: the uncompressed data streams impose huge bandwidth requirements.
One 3D stream requires approximately 23 MBytes/sec (640x480 [pixels/frame] x 5 [bytes/pixel] x 15 [frames/sec] = 23,040,000 [bytes/second]).
One TI session (10 3D streams) requires approximately 230 MBytes/sec. In addition, streams arriving with different time latencies have to be fused.
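As a sanity check, the arithmetic above can be scripted directly; a minimal sketch, with the frame parameters and stream count taken from the text:

```python
# Back-of-the-envelope bandwidth estimate for uncompressed 3D streams,
# using the frame parameters quoted above.
WIDTH, HEIGHT = 640, 480      # pixels per frame
BYTES_PER_PIXEL = 5           # payload per pixel (color + depth)
FPS = 15                      # frames per second
NUM_STREAMS = 10              # streams in one tele-immersive (TI) session

stream_rate = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS   # bytes/second
session_rate = stream_rate * NUM_STREAMS

print(f"one 3D stream : {stream_rate / 1e6:.1f} MBytes/sec")   # ~23.0
print(f"one TI session: {session_rate / 1e6:.1f} MBytes/sec")  # ~230.4
```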
Figure 1: A sketch of our experimental setup (physical space). The approximate room size
is 15 ft x 15 ft.
System operation (a minimal sketch of the camera-gateway handshake follows this list):
- The session controller registers gateways.
- The gateway machine manages its local cameras and displays, and relays the content to other
gateways.
- The trigger server synchronizes the cameras. It sends a message to its gateway with the number
of cameras in order to create a new session.
- The camera machine performs all computation for four video streams and sends the content to
the gateway on the port assigned by the gateway. After the exact number of cameras has joined
the session, the gateway makes the content available to local renderers and/or other
gateways.
- The renderer (or display) machine runs a small daemon that maintains the connection with
the gateway and performs the fusion of all local and remote content.
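The sketch below illustrates the camera machine's join-and-stream behavior in Python. The host name, ports, and length-prefixed message format are assumptions for illustration, not the actual TEEVE protocol:

```python
# Illustrative sketch of a camera machine joining a session and streaming
# frames to its gateway. Addresses, ports, and the message format are
# hypothetical, not the TEEVE wire protocol.
import socket
import struct

GATEWAY_HOST = "gateway.local"   # hypothetical gateway address
CONTROL_PORT = 9000              # hypothetical control port

def camera_machine(camera_id, frames):
    """Join the session, then stream frames on the gateway-assigned port."""
    # 1. Ask the gateway which data port this camera should use.
    with socket.create_connection((GATEWAY_HOST, CONTROL_PORT)) as ctrl:
        ctrl.sendall(struct.pack("!I", camera_id))
        (data_port,) = struct.unpack("!I", ctrl.recv(4))

    # 2. Stream length-prefixed frames; the gateway relays them to local
    #    renderers and/or other gateways once the session is complete.
    with socket.create_connection((GATEWAY_HOST, data_port)) as data:
        for frame in frames:             # each frame: bytes of one 3D frame
            data.sendall(struct.pack("!I", len(frame)) + frame)
```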
Vision algorithms that perform 3D reconstruction rely primarily on knowledge of the
camera positions and orientations with respect to some reference frame. Camera calibration is the process of
determining the geometric and optical parameters that describe the transformation from an object in
the world to its image detected by the camera system. These parameters are usually grouped into
intrinsic
and extrinsic parameters. Intrinsic parameters describe quantities that are affected by the optical and
electrical components of a camera: (a) focal length, (b) pixel aspect ratio, (c) principal
point, (d) lens distortion, etc. Extrinsic parameters describe the geometric position and
orientation of the camera with respect to the world.
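To make the two parameter groups concrete, the sketch below projects a 3D world point through an assumed extrinsic pose (R, t) and intrinsic matrix K using the standard pinhole model; lens distortion is omitted and all numeric values are placeholders:

```python
# Pinhole projection: world point -> pixel, illustrating how extrinsic
# (R, t) and intrinsic (K) parameters compose. Lens distortion omitted.
import numpy as np

# Intrinsics: focal lengths fx, fy and principal point (cx, cy), in pixels.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: camera pose relative to the world frame (identity rotation,
# camera 2 m in front of the world origin).
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

def project(X_world):
    """Map a 3D world point to 2D pixel coordinates."""
    X_cam = R @ X_world + t          # world -> camera coordinates
    u, v, w = K @ X_cam              # camera -> homogeneous image coords
    return np.array([u / w, v / w])  # perspective divide

print(project(np.array([0.1, -0.2, 1.0])))  # -> ~[346.7, 186.7]
```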
Figure 2: The image depicts two trinocular stereo clusters
and one thermal infrared camera. Each stereo cluster contains three grayscale and one
color Dragonfly digital cameras. The thermal camera is
an uncooled microbolometer, which detects thermal energy at
LWIR (7.5 to 13.5 micron) wavelengths.
Our calibration method is based on the technique of
Tomáš Svoboda and co-workers
[pdf 1.7MB], with a calibration time of about 1-3 hours. Our goal is to
automate the calibration of intrinsic parameters while enabling manual calibration of extrinsic parameters.
A high frame rate is required for motion capture (e.g., a bouncing basketball). Our frame rate is 20-24 fps on a quad-core machine, compared with
the original 2-5 fps on a single-core machine and 12 fps on a dual-core machine.
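The speedup comes from distributing per-frame reconstruction work across cores. A minimal sketch of that pattern using a process pool; the worker function is a hypothetical stand-in for the real stereo pipeline:

```python
# Sketch of scaling 3D reconstruction with core count: frames from the
# camera's four video streams are dispatched to a process pool, so more
# cores keep more frames in flight. reconstruct_3d() is a hypothetical
# placeholder, not the TEEVE implementation.
from multiprocessing import Pool

def reconstruct_3d(frame):
    # Placeholder for per-frame depth estimation (the expensive step).
    return frame

def process_batch(frames):
    with Pool() as pool:                 # one worker per available core
        return pool.map(reconstruct_3d, frames)

if __name__ == "__main__":
    results = process_batch([b"frame0", b"frame1", b"frame2", b"frame3"])
    print(len(results))                  # -> 4
```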
Integration of thermal and visible imagery for robust foreground detection in tele-immersive spaces
The central objects of interest in a tele-immersive system are the people, the things they jointly manipulate,
and the tools they need to perform this manipulation. To reduce the complexity of the
detection problem, certain assumptions are made about the image that make it easier to find objects of interest.
Typical assumptions are: background materials have non-reflective surfaces;
there is an intensity differential between background and foreground objects; scene illumination is constant;
the foreground object is uniformly illuminated by diffuse lighting; and scene lighting exhibits a constant power spectrum.
Unfortunately, these assumptions often break down in practice. In particular, five characteristics of real scenes
cause problems in the current TEEVE system: changing illumination, moving foreground objects (causing shadows),
moving background objects, lack of contrast between foreground and background objects, and lack of contrast between different
foreground objects.
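Under those assumptions, foreground detection reduces to simple background differencing. The sketch below shows such a baseline classifier (the threshold value is an illustrative choice), whose failure modes are exactly the five cases above:

```python
# Baseline foreground detection by background differencing, the kind of
# classifier the assumptions above support. The threshold is illustrative.
import numpy as np

def foreground_mask(frame, background, threshold=25.0):
    """Mark a pixel as foreground when it differs enough from the static model."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    return diff > threshold   # boolean mask per pixel

# Failure mode example: a lamp turning on (case a) raises `diff` over much
# of the static background, so large background regions are misclassified
# as foreground.
```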
Figure 3: Problems in object detection.
The top images are the current frames; below them are static backgrounds taken before image acquisition;
third from the top is the current frame minus the static background; the bottom images show the foreground detection.
(a) Changing illumination: in this sequence, a small lamp was turned on before
the current frame was acquired, causing large portions of the background to be detected as foreground.
(b) Moving foreground objects: moving foregrounds cast shadows
and change the general scattering environment of the scene. In this sequence, background objects are mistaken for foreground.
(c) Moving background objects: the computer display (a background object) changed
appearance between the acquisition of the background and the current frame, so the current system treats the changed
pixels as foreground.
(d) Low contrast between foreground and background: the visible modality has difficulty
classifying foreground objects whose intensities or colors (here, a dark shirt) closely match the background model.
(e) Low contrast between different foreground objects: the current TEEVE system
cannot distinguish between the object of interest (the ball) and a rectangular object, classifying both as foreground.
A method of fusing information from visible and thermal infrared cameras
can solve all five of these problems.
Visual and thermal cameras provide fundamentally different information. Where visual
cameras primarily measure how materials reflect light, thermal cameras primarily
measure temperature. These differences of content mean that a combination of visual
and infrared images can provide more information about a scene than either modality
used alone.
Figure 4: Left. TEEVE system based on visible spectrum imaging only. Right. Tele-immersive
system based on visible and thermal IR spectrum imaging and information fusion.
There are three types of benefits that IR imaging can provide to tele-immersive systems:
(1) IR can enhance low-level image processing tasks (e.g., human foreground
detection); (2) IR can allow tele-immersion users to perceive temperature in the virtual
environment through visual or tactile feedback; and (3) IR can fundamentally enhance
material and object classification. A sketch of benefit (1) follows.
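One simple way to realize benefit (1) is to require thermal confirmation of visible-spectrum detections when looking for human foreground. The combination rule and thresholds below are illustrative assumptions, not the published TEEVE fusion algorithm:

```python
# Sketch of low-level visible/thermal fusion for human foreground detection.
# The rule and thresholds are illustrative, not the TEEVE algorithm.
import numpy as np

def fuse_masks(visible_diff, thermal,
               vis_thresh=25.0, body_temp_thresh=30.0):
    """Keep visible-spectrum detections only where the scene is warm.

    visible_diff : |current - background| per pixel (visible spectrum)
    thermal      : per-pixel temperature estimate, degrees Celsius
    """
    visible_fg = visible_diff > vis_thresh    # sensitive to lighting change
    warm       = thermal > body_temp_thresh   # insensitive to lighting change
    return visible_fg & warm                  # suppresses lit-up background
```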
Final Results
Finally, figure 6 shows the combined results. In the changing-illumination experiment
(figures 3a and 5, left),
the thermal imagery is not sensitive to the lighting change and is able to detect the person in the scene.
Also, because our inanimate-object detection emphasizes higher-level features, in this case shape, the ball is
correctly identified as an object of interest.
In the moving-foreground-object experiment (figures 3b and 5, middle), shadows are cast on the
background due to the foreground object's motion in the scene. Room lighting remained constant
during this experiment, so the change in background pixel intensity is due solely to the human
foreground blocking illumination. This experiment illustrates that visible and thermal fusion is
robust to shadowing on textured backgrounds.
Low contrast between foreground and background: this experiment
demonstrates our fusion algorithm's performance when the foreground object
has a color or brightness similar to the background (figures 3d and 5, right).
In this case, the current system fails to recognize portions of the person as foreground
because of their dark clothing. Fusion with thermal information is able to fill in the missing
information.
Figure 6: Top: changing-illumination results. The three top pictures show
our results of visible and IR fusion (compared with the existing system's performance)
in the presence of the changing-illumination problem described in figures 3a and 5 (left).
Middle: results in the presence of moving foreground objects (figures 3b and 5, middle).
Bottom: improved detection of the foreground object with thermal information (figures 3d and 5, right).
Experiment | Method | F Neg | F Pos | Total | % err
Changing illumination | TEEVE | 141 | 21939 | 22080 | 57
Changing illumination | Fusion | 566 | 420 | 986 | 2
Moving foregrounds casting shadows | TEEVE | 0 | 10674 | 10674 | 28
Moving foregrounds casting shadows | Fusion | 244 | 876 | 1120 | 3
Moving background | TEEVE | 122 | 2205 | 2327 | 6
Moving background | Fusion | 117 | 424 | 541 | 1
Low contrast between foreground and background | TEEVE | 6056 | 209 | 6265 | 16
Low contrast between foreground and background | Fusion | 741 | 647 | 1344 | 4
Low contrast between different foreground objects | TEEVE | 74 | 1127 | 1201 | 3
Low contrast between different foreground objects | Fusion | 206 | 1010 | 1216 | 3
Table of quantitative results, comparing performance of current tele-immersive system
and our proposed fusion algorithm. "F Neg" represents the number of pixels that were
incorrectly classified as background (i.e. the false negative detections). "F Pos"
represents the number of pixels that were incorrectly classified as foreground
(i.e. the false positive detections). The "Total" is the sum of these two pixel counts,
and the percent error represents the percentage of the image that was misclassified.
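Given predicted and ground-truth foreground masks, the table's columns follow directly from pixel counts; a sketch of that bookkeeping, using the caption's definition of percent error (misclassified pixels over all image pixels):

```python
# Computes the table's columns from a predicted and a ground-truth
# foreground mask, following the definitions in the caption above.
import numpy as np

def score(pred, truth):
    f_neg = int(np.sum(~pred & truth))   # foreground missed
    f_pos = int(np.sum(pred & ~truth))   # background marked as foreground
    total = f_neg + f_pos
    return {"F Neg": f_neg, "F Pos": f_pos, "Total": total,
            "% err": 100.0 * total / pred.size}

# Tiny synthetic example: a 200x200 true region, detection misses 20 rows.
truth = np.zeros((480, 640), dtype=bool); truth[100:300, 200:400] = True
pred  = np.zeros_like(truth);             pred[120:300, 200:400] = True
print(score(pred, truth))   # F Neg=4000, F Pos=0, Total=4000, % err ~1.3
```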
Summary
- Calibration of visible and infrared cameras: we extended a state-of-the-art automatic
multi-camera calibration technique to simultaneously calibrate grayscale,
color, and thermal cameras.
- Development of methodology for fusing visible and infrared images based on
tele-immersive system scene modeling and estimation of scene 3D structure.
- Building prototype hardware to acquire visible and thermal IR imagery, and the
design of off-line processing and analysis algorithms.
- Quantitative analysis of the tele-immersive system with and without fusion of visible
and infrared information.
The figure below presents an example of a foreground detection problem
approached by fusing Thermal Infrared (IR)
and visible images. By exploring the fusion of multiple sensor
modalities in the context of tele-immersive systems, we can enhance
computational efficiency, the user's immersive experience, and automatic scene
understanding.
Figure 7: This set of images (from a single time step)
demonstrates a particularly challenging scenario involving a scene that is dynamic in
both the visible and thermal wavelengths. Top row: visible-wavelength
background image; current visible frame; difference between the current visible
frame and the background. Bottom row: thermal background; current
thermal frame; simple thresholding and connected components in the thermal
frame. The scene contains a monitor showing a dynamic
video and a warm cup of water that is cooling over time. Note that
the monitor can also change temperature over time if it is turned off or
hibernating. In this challenging case, model-based classification will be
able to tell the difference between the human subject and the other objects.
People, Publications, Presentations
Team members
- Professor Peter Bajcsy
Research group ISDA, National Center for Supercomputing
Applications, UIUC
- Professor Ruzena Bajcsy
Berkeley Center for Information Technology Research
in the Interest of Society (CITRIS)
- Yi Ma
Electrical and Computer Engineering, UIUC
- Mike Frogley
Head Coach, Men's and Women's Wheelchair Basketball, Disability Resources
and Educational Services (DRES), UIUC
- Brad Hedrick
DRES, UIUC
- Professor Kenneth Watkin
Department of Speech and Hearing Sciences, UIUC
- Professor Claire Tomlin
Electrical Engineering and Computer Sciences, UC Berkeley
- Professor Richard Ivry
Cognition and Action Lab, UC Berkeley
- Professor Robert Gotsch
- Professor Klara Nahrstedt
Research group MONET,
Computer Science Department, UIUC
- Rob Kooper
ISDA, National Center for Supercomputing
Applications, UIUC
- Gregorij Kurillo
CITRIS
tele-immersion, UC Berkeley
- Kenton McHenry
ISDA, NCSA, UIUC
- Rahul Malik
ISDA, Computer Science Department, UIUC
- Hye Jung Na
ISDA, Computer Science Department, UIUC
Former members
- Miles Johnson
ISDA, Aerospace Department, UIUC
- Suk Kyu Lee
ISDA, Computer Science Department, UIUC
Funding
Funding was provided by the National Science Foundation grant IIS 07-03756 (award 490630) and an NCSA core grant.
Publications
- R. Malik and P. Bajcsy,
"Achieving Color Constancy Across Multiple Cameras.",
ACM International Conference on Multimedia, Beijing, China, October 19 - 24, 2009 (~ 30% acceptance)
- P. Bajcsy, K. McHenry, H.-J. Na, R. Malik, A. Spencer, S.-K. Lee, R. Kooper, and M. Frogley,
"Immersive Environments For Rehabilitation Activities.",
ACM International Conference on Multimedia, Beijing, China, October 19 - 24, 2009 (~ 27.5% acceptance)
- K. McHenry and P. Bajcsy,
"Key Aspects in 3D File Format Conversions.",
Joint Annual Meeting of the Society of American Archivists and the Council of State Archivists, 2009 Research Forum 'Foundations and Innovations', August 11, Hilton Austin, Texas, USA, 2009
[proceedings]
- S-K. Lee, K. McHenry, R. Kooper and P. Bajcsy,
"Characterizing Human Subjects In Real-Time And Three-Dimensional Spaces By Integrating Thermal-Infrared And Visible Spectrum Cameras.",
Workshop on Multimedia Aspects in Pervasive Healthcare., in conjunction with 2009 IEEE International Conference on Multimedia & Expo (ICME), July 3, 2009, New York, NY, USA
- P. Bajcsy, M. Frogley, R. Kooper, S-K. Lee, R. Malik, K. McHenry, H-J. Na, and A. Spencer,
"Design And Use Of Immersive Environments For Regaining Proprioceptive Abilities.",
Workshop on Multimedia Aspects in Pervasive Healthcare., in conjunction with 2009 IEEE International Conference on Multimedia & Expo (ICME), July 3, 2009, New York, NY, USA
- R. Malik and P. Bajcsy,
"Optimal Stereo Camera Placement Under Spatially Varying Resolution Requirements.",
2nd International Conference on Immersive Telecommunications., University of California, Berkeley, CA, USA, May 27-29, 2009 (accepted ~50% acceptance rate)
- A. Spencer, H. Jung, K. McHenry, H-J. Na, R. Malik, S-K. Lee, R. Kooper, and P. Bajcsy,
"Tele-Immersive Environments For Everybody.",
poster at PRAGMA 16, KISTI, Daejeon Convention Center, Korea, March 23-24, 2009
- Miles Johnson,
"Integration of thermal and visible imagery for robust foreground detection in Tele-immersive spaces.",
Thesis for the degree of Master of Science in Aerospace Engineering, Graduate College of the University of Illinois at Urbana-Champaign, 2007
[pdf 4.6MB]