At the opposite end of the spectrum is a simple servo mechanism with a single goal and a single actuator, and no complex decision requirements.
Most practical AI problems lie in the middle, and the DARPA Grand Challenge is of this nature. The GC poses a fairly specific autonomy problem: a mobile land robot must accurately follow a planned route, representative of a military mission such as supply delivery or reconnaissance, at an average speed sufficient for typical logistics. That specificity makes the AI problem simpler.
This was the gist of the description of the race at the second DARPA kickoff in August 2004, setting the stage for the second GC competition.
Since the route is planned to achieve success, it follows roads to the extent possible and avoids unnecessarily difficult auto-navigation problems. The problem is specific enough that general AI is not needed. However, besides the possibility that the planned route is imperfect or insufficiently precise, the autonomous vehicle cannot rely on continuous, perfect, precision GPS. What is still needed is sufficient sensing of the obstacles and drop-offs that the vehicle must avoid, lest it fail its mission. This is accomplished with a straightforward local route planning mechanism (link to Overview of TerraHawk Route Planning) that uses cost functions produced from the mission route plus these obstacle detections.
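As one illustration of how the two cost sources might combine for the local planner, here is a minimal sketch; the function name and the weights are our own illustrative choices, not TerraHawk's actual planner code:

```cpp
// Combined cost for one map cell, as seen by the local route planner:
// deviation from the mission route plus accumulated obstacle cost.
// The weights here are illustrative, not tuned values.
double cellCost(double route_offset_m, double obstacle_cost,
                double w_route = 1.0, double w_obstacle = 10.0) {
    return w_route * route_offset_m + w_obstacle * obstacle_cost;
}
```

The planner then simply prefers paths through cells with low combined cost, so a strong obstacle detection dominates a modest route deviation.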
The remaining interesting problem is to select the sensor suite and to design sensor processing to provide sufficient detection and characterization of obstacles so that they can be mapped to the cost function used by the route planning. The system level issue is to accurately detect real obstacles and accurately reject false alarms.
The duration of the race (at least 10 hours of run time and probably more) determines the required fidelity of the sensor processing. Given the number of sensor cycles and the penalty for mis-detection being a potential failure to finish, one can loosely estimate the required correctness of detections needed. For example, in 10 hours run time duration, at 5 sensor evaluations per second, there are 10 x 3600 x 5 = 180,000 opportunities for sensor evaluations to be right or wrong.
This implies a required error rate on the order of 1/100,000 (give or take a factor of 2), or "five nines" (0.99999) correctness for the Probability of Detection, Pd. The corresponding Probability of False Alarm, Pfa, should be similarly low so that the vehicle is not diverted by non-obstacles such as shadows or inconsequential brush. Good behavior is needed on both metrics: an insensitive sensor processing suite can have a good false alarm rate but miss essential detections, while an overly sensitive suite will have a good probability of detection but a high false alarm rate.
One can turn this around and specify the desired probability that the vehicle autonomously navigates a given distance, for example 5 miles, correctly. Given the sensor processing success rate and the probability of mechanical breakdown per 5 miles, the cumulative probability of reaching any given distance follows by compounding. In the DARPA GC in March 2004, the composite probability of the whole field of competitors reached zero after 7 miles.
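The compounding of per-segment success can be sketched as follows, assuming independent 5-mile segments; the probabilities used in the test are illustrative, not measured values:

```cpp
#include <cmath>

// Probability of completing n consecutive 5-mile segments, assuming
// each segment succeeds independently with probability
// p_sensor (sensor processing) * p_mech (no mechanical breakdown).
double pReach(double p_sensor, double p_mech, int n_segments) {
    return std::pow(p_sensor * p_mech, n_segments);
}
```

Because the per-segment probability multiplies with distance, even modest per-segment failure rates drive the probability of finishing a long course toward zero, which matches the 2004 field's collective stop by mile 7.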
The DARPA GC 2005 should be much more successful, and even more interesting!
Because these sensors by design provide both a range and a bearing, each detection can be transformed, via the known location and heading of the vehicle, onto an XY map that is registered to the earth and thus to the planned route. This allows us to exploit persistent detections through several stages of cross-sensor "clustering" and fusion, building up obstacle costs at locations relative to the route so that the vehicle can replan to avoid the detected obstacles.
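The range-and-bearing transform described above can be sketched like this; sensor mounting offsets are omitted for brevity, and all names are illustrative rather than taken from our code:

```cpp
#include <cmath>

struct XY { double x, y; };

// Transform a detection given as (range, bearing relative to the
// vehicle heading) into an earth-registered XY position, using the
// vehicle's own position and heading. Angles are in radians from +X.
XY detectionToMap(double veh_x, double veh_y, double veh_heading,
                  double range, double bearing) {
    double a = veh_heading + bearing;  // absolute angle of the sensor ray
    return { veh_x + range * std::cos(a),
             veh_y + range * std::sin(a) };
}
```

Every raw ladar and radar hit passes through a transform of this shape before clustering, which is why errors in the vehicle position and heading show up directly as scatter on the map.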
The mapping of the detections to the XY map is shown in the figure below, to illustrate the nature of the raw sensor outputs from the two ladars, high (blue) and low (green), and the radar (red). These sensors have already been range-gated and angle-gated to avoid spurious hits that are irrelevant to the route. The GPS track (black) shows the path of the vehicle.
There is scatter in the detection locations from one sensor output to the next, so the detections do not build up right on top of each other as they ideally should. This could be due to several error sources. One is, of course, any bias or noise in the detectors’ angle and range readout. Another is uncertainty in the exact vehicle position and heading at the moment of the sensor reading. A third is inaccuracy in the calibration of the sensors’ location and bearing alignment relative to the vehicle reference frame that contains the GPS and INS (Inertial Navigation System).
For the ladar, the angle error is assumed to be negligible since each ray from the ladar is created by a precise electronic timing reference relative to a known starting angle, at a regulated rotational speed of the mirror mechanism in the ladar. In our extensive tests, the ladar range error has proven to be very small, on the order of centimeters, and so we are confident that ladar intrinsic errors are not a significant contributor to the XY detection scatter.
The radar scatter is presumably larger than the ladar’s, since the radar’s RF bandwidth and carrier frequency produce fuzzier positional echoes in both range and azimuth. We have not characterized the radar as completely as the ladar, but we expect that its intrinsic scatter contributes to the observed positional persistence error. The figure above shows the scatter of two metal obstacles that were detected by the radar (red).
We concluded in our early runs, before the GPS-INS Kalman Filter (shortened to GINS) system was included, that our scatter was largely due to inaccuracies in the GPS-only heading. Even though the GPS heading is very good for many purposes, when it is projected out to a long distance in order to transform a sensor detection to absolute XY, the errors can exceed one to two meters for the ladars, and even more for the radar, which projects further out. The figure below, a close-up of a detected obstacle, shows the total scatter from the two ladars, high (blue) and low (green). Besides the scatter, a bimodal distribution of detects can be seen, because the obstacle was detected during vehicle tracks in both directions. This indicates some residual sensor alignment calibration error as well.
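The sensitivity to heading error is simple geometry: a heading error of err radians displaces a detection projected out to range R by roughly R × err crossrange. A sketch, with illustrative numbers in the comments:

```cpp
// Small-angle approximation: a heading error of heading_err_rad radians
// moves a detection projected out to range_m meters by about
// range_m * heading_err_rad crossrange on the XY map.
double crossrangeError(double range_m, double heading_err_rad) {
    return range_m * heading_err_rad;
}
```

At a ladar range of 30 m, a 1-degree heading error already costs about half a meter of crossrange displacement, so heading errors of a couple of degrees readily explain meter-scale scatter, and more at the radar's longer projection range.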
After we observed (as expected) that the heading accuracy of the vehicle is key to accurate registration and fusion on the map, we began using the Kalman Filter heading vector. This was planned anyway; the GPS-only heading was tried as an early experiment. With the GINS, plus additional calibration of sensor alignment, we achieved sufficient XY translation accuracy and reduced the scatter enough to reliably detect obstacles from the persistent and cross-sensor fused data. The detections are shown as purple open squares in the figure below, which records one run of our DARPA 200 meter test course and its return route, all traversed autonomously.
In a close-up view below, the peak-to-peak scatter of raw sensor outputs for one of the obstacles we detected and avoided has been reduced to less than one meter. The repeated final fused detections are within 0.5 meters, which meets our criteria for success. The GPS track can be seen skirting around the obstacle, thanks to the path planner.
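One simple way to realize persistence-based detection of the kind described above is to accumulate hits in a coarse XY grid and declare an obstacle once a cell collects enough hits across sensor cycles. This sketch is our own illustration of that idea, not the actual TerraHawk fusion code; the cell size and threshold are illustrative:

```cpp
#include <cmath>
#include <map>
#include <utility>

// Accumulate raw detections into coarse XY grid cells; a cell that
// collects `threshold` hits across sensor cycles is declared an
// obstacle.
class PersistenceGrid {
public:
    PersistenceGrid(double cell_m, int threshold)
        : cell_(cell_m), threshold_(threshold) {}

    // Returns true exactly when this hit pushes its cell over the
    // declaration threshold.
    bool addHit(double x, double y) {
        std::pair<long, long> key(
            static_cast<long>(std::floor(x / cell_)),
            static_cast<long>(std::floor(y / cell_)));
        return ++hits_[key] == threshold_;
    }

private:
    double cell_;
    int threshold_;
    std::map<std::pair<long, long>, int> hits_;
};
```

A scheme like this rejects one-off spurious hits automatically, since isolated detections never reach the threshold, while real obstacles accumulate hits cycle after cycle.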
Stereo processing is used to detect depth discontinuities in the forward direction when the vehicle is moving at a reasonable speed. The cameras are at a hyperstereo separation of about 0.6 meters, and the lenses used for the stereo pair provide a moderate telephoto field of view, which should enable depth detection of obstacles at distances beyond our selected range gate for the ladar. The front-side wide angle cameras enable detection of the variance in texture between smooth road and rough off-road conditions and road edges.
The front-side cameras also enable detection of depth discontinuities relatively close in to the sides of the vehicle as it moves forward, so that cost functions due to obstacles such as telephone poles, ditches, and underpass walls in these front-side areas will be registered on the map. These detections should reinforce the ladar detections.
Vision processing consumes by far the most CPU time; two dual-Xeon servers are dedicated to the task. Video data from the cameras likewise consumes by far the most network bandwidth, so a dedicated GigE network moves data from the cameras to the CPUs. We use Pleora converters to translate from the Base Camera Link interface of the industrial cameras to GigE.
Terra Engineering has developed a vision processing environment, the Parametric Robotic Vision Processor (PRVP), that builds on what we have developed as a standard sequence of image processing steps, from first-level processing to final detection, while allowing a great deal of flexibility in the parameters and processing options of each stage. The PRVP is used to post-process video sequences captured from the cameras on the TerraHawk vehicle or on test vehicles, which allows us to optimize the various processing threads against the incoming image sequences.
The PRVP is written in C++ and uses some of the open source OpenCV library calls. We ported a previous version of the processor into the real-time Linux environment for the actual vision processing in 2004, but we have since added new functionality, so we will be porting it again soon for TerraHawk 2005. The functions in the OpenCV library are also compatible with, and call, the Intel Performance Primitives (IPP) for maximum computational efficiency.
We have elected, thus far, to do all vision processing in open source C++ rather than in FPGAs. With the exception of optical flow, all of our other vision processing can run real time over the full frame region of interest (ROI). For optical flow and for stereo processing using optical flow, we will select an ROI that permits real time processing depending on the total computational resources that we use in the vehicle when we freeze the integration. Presently, we capture a burst of consecutive frames a few times per second, and compute flow over each group of frames.
Within the PRVP, we have built processing paths for computing several useful vision outputs. They include:
- A uniform normalized texture extraction routine
- A standard deviation metric for texture
- Programmable blur functions to remove artifacts of sampling and processing before detection algorithms are applied
- Blob detection applied to the texture metric, which provides regions of apparent smooth and rough terrain to help distinguish road surfaces from road shoulders
- A modified OpenCV optical flow call that can operate on raw images or on preprocessed texture-extracted images
- Modifications to the OpenCV flow code to compute the confidence level of the ongoing flow calculations in the inner loops, and to break out of the process for invalid flow regions
An example of optical flow output is shown in the figure below.
The image above shows the optical flow due to forward motion of the vehicle. To interpret it, the color coding we chose for optical flow is shown in the figure below.
Optical flow is used to detect obstacles by finding "depth discontinuities": sharp transitions in the flow field in places that should have smooth flow. The real-time flow output is compared to a pre-computed baseline flow field that represents a flat-ground field of view with no obstacles. A depth discontinuity occurs in the side-looking cameras when approaching or passing an obstacle slightly ahead and off to the left or right of the vehicle’s present path. Since optical flow works very well, we use the same algorithm for the stereo camera pair: the pair represents a virtual movement from right to left, which produces the same kind of displacement-versus-distance relationship as a transverse flow field.
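The baseline-comparison step can be sketched in miniature: compare each measured flow magnitude against the precomputed flat-ground baseline and flag locations where the difference exceeds a threshold. The function name and threshold are illustrative, not our production code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Flag pixels whose measured flow magnitude departs from the
// precomputed flat-ground baseline by more than `threshold`;
// these are candidate depth discontinuities.
std::vector<bool> flagDiscontinuities(const std::vector<double>& flow,
                                      const std::vector<double>& baseline,
                                      double threshold) {
    std::vector<bool> flags(flow.size(), false);
    for (std::size_t i = 0; i < flow.size(); ++i)
        flags[i] = std::fabs(flow[i] - baseline[i]) > threshold;
    return flags;
}
```

In practice the comparison runs over the 2-D flow field, and the flagged regions are then filtered before an obstacle is declared.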
The figure below shows an example of an obstacle in the field of view. By filtering the raw outputs both horizontally and vertically, artifacts are reduced and the azimuth and approximate range of a single obstacle can be declared.
An early stage in the vision processing chain, which can be chosen for some processing goals, is texture normalization. This step turns most images into a uniform texture map that can be used for determining scene qualities such as road surfaces, as well as providing a basis for image correlation functions. An example of this processing is shown in the image below. Areas of complicated shadow detail or overexposed surfaces must be separately detected and excluded from the texture map, however, since these features do not represent real surface texture.
After further processing (texture standard deviation, parameterizable blurring and several stages of blob detection), the smoothness of the road surface can be output. Areas of smooth texture should be assigned a lower obstacle cost function for the path planner.
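The texture standard-deviation stage can be illustrated as a local window statistic, where low values mark smooth, road-like texture; this 1-D sketch is our own simplification of the 2-D windowed computation:

```cpp
#include <cmath>
#include <vector>

// Standard deviation of intensities inside a local window; low values
// indicate smooth, road-like texture, high values a rough shoulder.
double windowStdDev(const std::vector<double>& window) {
    double mean = 0.0;
    for (double v : window) mean += v;
    mean /= window.size();
    double var = 0.0;
    for (double v : window) var += (v - mean) * (v - mean);
    return std::sqrt(var / window.size());
}
```

Thresholding this statistic, then blurring and blob-detecting the result, yields the smooth regions that receive a lower obstacle cost in the planner.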
This suite of vision processing algorithms provides a basis for assigning cost functions to obstacles and textures in the scene. Separate AI processing to identify particular objects in the scene, such as street curbs, K-rails, or bushes, has not been included in this first level of vision processing.