2020-Dynamic ORB SLAM [Reading Seminar] [Open Source]
[toc]
# 2020-Dynamic ORB SLAM
**About**:
Paper: [Dynamic ORB SLAM](https://github.com/bijustin/YOLO-DynaSLAM/blob/master/dynamic-orb-slam.pdf)
W2020 EECS 568: Mobile Robotics
Demo video (YouTube): https://www.youtube.com/watch?v=sazkFVVjITQ
FAST Dynamic ORB SLAM: https://github.com/bijustin/Fast-Dynamic-ORB-SLAM
YOLO Dynamic ORB SLAM: https://github.com/bijustin/YOLO-DynaSLAM
**Problem definition:**
Simultaneous Localization and Mapping (SLAM) is often a necessity for mobile robots. Many state-of-the-art SLAM systems over the past decade have shown impressive performance, but they often make **the assumption that the environment is static**. This tends to be a naïve assumption, as it does not hold for robots that must operate around humans in their environment, such as autonomous vehicles.
**Abstract**:
In this work, we present **two** visual SLAM systems built on ORB SLAM2 that are more robust to dynamic environments while operating in real time:
- one using deep learning: YOLO Dynamic ORB SLAM, which uses YOLOv3 to segment out objects that may be dynamic;
- one without deep learning: FAST Dynamic ORB SLAM, which uses a combination of optical flow and epipolar constraints to prevent dynamic objects from impacting the SLAM system, without the need for special hardware such as a GPU.
**Idea**:
The idea is to filter out dynamic ORB features before they can be processed by the local map tracking module. This is done through a combination of estimating the fundamental matrix between frames, enforcing epipolar constraints, and outlier detection on optical flow magnitudes.
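A minimal sketch of this per-frame filtering idea (the function name and simplifications are mine, not the repo's API; the flow-magnitude test is omitted here, and the actual logic lives in `DynPointFinder::findOutliers` below):
```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Illustrative sketch: drop features that are inconsistent with the
// epipolar geometry between consecutive frames.
void filterDynamicFeatures(const cv::Mat &prevImg, const cv::Mat &currImg,
                           std::vector<cv::Point2f> &pts /* in/out: current features */)
{
    if (pts.size() < 8) return; // need at least 8 points for F
    std::vector<cv::Point2f> prevPts;
    std::vector<uchar> status;
    std::vector<float> err;
    // 1. Track the current features into the previous frame with LK optical flow.
    cv::calcOpticalFlowPyrLK(currImg, prevImg, pts, prevPts, status, err);
    // 2. Estimate the fundamental matrix between the two frames with RANSAC.
    std::vector<uchar> inliers;
    cv::Mat F = cv::findFundamentalMat(pts, prevPts, inliers, cv::FM_RANSAC);
    if (F.empty()) return;
    // 3. Keep only tracked features that satisfy the epipolar constraint;
    //    the rest are treated as potentially dynamic and removed.
    std::vector<cv::Point2f> kept;
    for (size_t i = 0; i < pts.size(); ++i)
        if (status[i] && inliers[i])
            kept.push_back(pts[i]);
    pts.swap(kept);
}
```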
**Disadvantage**:
Unfortunately, it is difficult to filter slowly moving features.
No quantitative experimental comparison (ATE/RPE) is given.
The discussion section of the paper covers these limitations in detail.
**YOLO Dynamic ORB SLAM**:
YOLO Dynamic ORB_SLAM is a visual SLAM system that is robust in dynamic scenarios, for the RGB-D configuration.
rgbd_tum_yolo.cc:
```cpp
yolov3::yolov3Segment* yolo;
if (argc==6 || argc==7)
{
    cout << "Loading Yolov3 net. This could take a while..." << endl;
    yolo = new yolov3::yolov3Segment();
    cout << "Yolov3 net loaded!" << endl;
}
...
// Per frame: build a mask that zeroes out (dilated) dynamic-object regions
if (argc == 6 || argc == 7)
    mask = yolo->Segmentation(imRGB);
```
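Here `mask` is a 640×480 `CV_8U` image in which 1 marks pixels that are safe to use and 0 marks (dilated) dynamic-object regions; it is then handed to the SLAM system so that ORB features falling inside dynamic boxes are discarded. A self-contained sketch of that filtering step (the helper name is mine, not the repo's):
```cpp
#include <opencv2/core.hpp>
#include <vector>

// Illustrative helper: keep only keypoints whose pixel is marked
// static (mask == 1) in the segmentation mask.
std::vector<cv::KeyPoint> filterByMask(const std::vector<cv::KeyPoint> &kps,
                                       const cv::Mat &mask /* CV_8U, 1 = static */)
{
    std::vector<cv::KeyPoint> kept;
    for (const cv::KeyPoint &kp : kps) {
        int x = cvRound(kp.pt.x), y = cvRound(kp.pt.y);
        if (x >= 0 && x < mask.cols && y >= 0 && y < mask.rows &&
            mask.at<uchar>(y, x) == 1)
            kept.push_back(kp);
    }
    return kept;
}
```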
yolo.h
```cpp
class yolov3Segment {
private:
    float confThreshold = 0.5; // Confidence threshold
    float nmsThreshold = 0.4;  // Non-maximum suppression threshold
    int inpWidth = 640;        // Width of network's input image
    int inpHeight = 480;       // Height of network's input image
    vector<string> classes;
    Net net;
public:
    yolov3Segment();
    ~yolov3Segment();
    cv::Mat Segmentation(cv::Mat &image);
    // Remove the bounding boxes with low confidence using non-maxima suppression
    cv::Mat postprocess(Mat& frame, const vector<Mat>& out);
    // Get the names of the output layers
    vector<String> getOutputsNames(const Net& net);
};
```
yolo.cc
```cpp
yolov3Segment::yolov3Segment() {
    // Load names of classes
    string classesFile = "src/yolo/coco.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);
    // Give the configuration and weight files for the model
    String modelConfiguration = "src/yolo/yolov3.cfg";
    String modelWeights = "src/yolo/yolov3.weights";
    // Load the network
    net = readNetFromDarknet(modelConfiguration, modelWeights);
    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);
}

cv::Mat yolov3Segment::Segmentation(cv::Mat &image) {
    cv::Mat blob;
    // Create a 4D blob from a frame.
    blobFromImage(image, blob, 1/255.0, cvSize(this->inpWidth, this->inpHeight), Scalar(0,0,0), true, false);
    // Sets the input to the network
    this->net.setInput(blob);
    // Runs the forward pass to get output of the output layers
    vector<Mat> outs;
    this->net.forward(outs, this->getOutputsNames(this->net));
    // Dilate the detection mask so pixels near box borders are also removed
    int dilation_size = 15;
    cv::Mat kernel = getStructuringElement(cv::MORPH_ELLIPSE,
                                           cv::Size(2*dilation_size + 1, 2*dilation_size + 1),
                                           cv::Point(dilation_size, dilation_size));
    cv::Mat mask = cv::Mat::ones(480, 640, CV_8U);
    cv::Mat maskyolo = this->postprocess(image, outs);
    cv::Mat maskyolodil = maskyolo.clone();
    cv::dilate(maskyolo, maskyolodil, kernel);
    mask = mask - maskyolodil; // 1 = static pixel, 0 = (dilated) dynamic region
    return mask;
}
}
// Remove the bounding boxes with low confidence using non-maxima suppression
cv::Mat yolov3Segment::postprocess(Mat& frame, const vector<Mat>& outs)
{
    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    for (size_t i = 0; i < outs.size(); ++i)
    {
        // Scan through all the bounding boxes output from the network and keep only the
        // ones with high confidence scores. Assign the box's class label as the class
        // with the highest score for the box.
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
        {
            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            Point classIdPoint;
            double confidence;
            // Get the value and location of the maximum score
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > this->confThreshold)
            {
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top = centerY - height / 2;
                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(Rect(left, top, width, height));
            }
        }
    }
    cv::Mat mask = cv::Mat::zeros(480, 640, CV_8U);
    // Perform non maximum suppression to eliminate redundant overlapping boxes with
    // lower confidences
    vector<int> indices;
    NMSBoxes(boxes, confidences, this->confThreshold, this->nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i)
    {
        int idx = indices[i];
        Rect box = boxes[idx];
        string c = this->classes[classIds[idx]];
        if (c == "person" || c == "car" || c == "bicycle" || c == "motorcycle" || c == "bus" || c == "truck") {
            // Only the central half of each box (horizontally) is marked dynamic
            for (int x = max(0, box.x + box.width / 4); x < box.x + 3*box.width/4 && x < 640; ++x)
                for (int y = max(0, box.y); y < box.y + box.height && y < 480; ++y)
                    mask.at<uchar>(y, x) = 1;
        }
    }
    return mask;
}
// Get the names of the output layers
vector<String> yolov3Segment::getOutputsNames(const Net& net)
{
    static vector<String> names;
    if (names.empty())
    {
        // Get the indices of the output layers, i.e. the layers with unconnected outputs
        vector<int> outLayers = this->net.getUnconnectedOutLayers();
        // Get the names of all the layers in the network
        vector<String> layersNames = this->net.getLayerNames();
        // Get the names of the output layers in names
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); ++i)
            names[i] = layersNames[outLayers[i] - 1];
    }
    return names;
}
```
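For reference, per-frame usage reduces to constructing the segmenter once (it loads `src/yolo/coco.names`, `yolov3.cfg`, and `yolov3.weights` from the hard-coded paths above) and calling `Segmentation` on each RGB frame; the image file name below is hypothetical:
```cpp
// Sketch mirroring rgbd_tum_yolo.cc above (fragment, not a full program)
yolov3::yolov3Segment *yolo = new yolov3::yolov3Segment();
cv::Mat imRGB = cv::imread("rgb/1305031102.175304.png"); // hypothetical TUM RGB-D frame
cv::Mat mask = yolo->Segmentation(imRGB); // CV_8U: 1 = static, 0 = dilated dynamic box
```
Note the two opposing design choices: the 15-pixel dilation radius (31×31 elliptical kernel) in `Segmentation` pads each detection so that features just outside a loose bounding box are also suppressed, while `postprocess` only fills the central half of each box horizontally, limiting over-masking of static background inside the box.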
**Fast-Dynamic-ORB-SLAM**:
FAST Dynamic ORB-SLAM2 is a near real-time SLAM library built on ORB-SLAM2 by Mur-Artal et al. It attempts to mitigate the error introduced by dynamic objects into the SLAM system for RGB-D cameras.
Tracking.cc
```cpp
// Build the frame (keypoint extraction only), filter dynamic keypoints,
// then finish descriptor extraction and the usual ORB-SLAM2 bookkeeping
mCurrentFrame = Frame(mImGray,imDepth,timestamp,mpORBextractorLeft,mpORBVocabulary,mK,mDistCoef,mbf,mThDepth);
dynpointfinder.findOutliers(&mCurrentFrame, mImGray, scaleFactors, nlevels);
mCurrentFrame.finishFrame(mImGray, imDepth, mK);
```
Frame.cc
```cpp
void Frame::finishFrame(const cv::Mat &imGray, const cv::Mat &imDepth, cv::Mat &K)
{
    // modify initmvKeys
    ExtractORBDesp(0,imGray);
    N = mvKeys.size();
    if(mvKeys.empty())
        return;
    UndistortKeyPoints();
    ComputeStereoFromRGBD(imDepth);
    mvpMapPoints = vector<MapPoint*>(N,static_cast<MapPoint*>(NULL));
    mvbOutlier = vector<bool>(N,false);
    // This is done only for the first Frame (or after a change in the calibration)
    if(mbInitialComputations)
    {
        ComputeImageBounds(imGray);
        mfGridElementWidthInv=static_cast<float>(FRAME_GRID_COLS)/static_cast<float>(mnMaxX-mnMinX);
        mfGridElementHeightInv=static_cast<float>(FRAME_GRID_ROWS)/static_cast<float>(mnMaxY-mnMinY);
        fx = K.at<float>(0,0);
        fy = K.at<float>(1,1);
        cx = K.at<float>(0,2);
        cy = K.at<float>(1,2);
        invfx = 1.0f/fx;
        invfy = 1.0f/fy;
        mbInitialComputations=false;
    }
    mb = mbf/fx;
    AssignFeaturesToGrid();
}
```
DynPointFinder.h
```cpp
class DynPointFinder{
    Frame prevFrame;
    cv::Mat prevImage;
    bool first = true;
    vector<cv::Point2f> prevMatches;
    std::vector<int> mvIniMatches;
    Initializer* poseEstimator;
    std::vector<cv::KeyPoint> prevKeypoints;
public:
    bool findOutliers(Frame *pFrame, cv::Mat &imGray, vector<float> &scaleFactors, int nlevels);
    bool findOutliers2(Frame *pFrame, cv::Mat &imGray, vector<float> &scaleFactors, int nlevels);
    cv::Mat match(vector<cv::KeyPoint> &Keypoints1, cv::Mat imgray1);
};
```
DynPointFinder.cc
Extracts ORB features, matches them against the previous frame, computes the fundamental matrix with RANSAC, and returns it.
```cpp
cv::Mat DynPointFinder::match(vector<cv::KeyPoint> &Keypoints1, cv::Mat imgray1)
{
    cv::Mat descriptors1, descriptors2;
    cv::Ptr<cv::ORB> detector = cv::ORB::create(1000);
    prevKeypoints.clear();
    detector->detect(prevImage, prevKeypoints);
    detector->compute(imgray1, Keypoints1, descriptors1);
    detector->compute(prevImage, prevKeypoints, descriptors2);
    cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create(cv::DescriptorMatcher::BRUTEFORCE_HAMMINGLUT);
    std::vector<cv::DMatch> matches1;
    matcher->match(descriptors1, descriptors2, matches1);
    std::vector<cv::Point2f> points1, points2;
    cv::Mat F;
    for (auto iter = matches1.begin(); iter != matches1.end(); iter++)
    {
        float x = Keypoints1[iter->queryIdx].pt.x;
        float y = Keypoints1[iter->queryIdx].pt.y;
        points1.push_back(cv::Point2f(x, y));
        x = prevKeypoints[iter->trainIdx].pt.x;
        y = prevKeypoints[iter->trainIdx].pt.y;
        points2.push_back(cv::Point2f(x, y));
    }
    // Compute F matrix using RANSAC
    std::vector<uchar> inliers(points1.size(), 0);
    if (points1.size() > 0 && points2.size() > 0)
    {
        F = cv::findFundamentalMat(cv::Mat(points1), cv::Mat(points2), inliers, cv::FM_RANSAC, 0.7, .90);
    }
    vector<cv::KeyPoint> inliermatches;
    for (int i = 0; i < inliers.size(); i++) {
        if (inliers[i]) { // it is a valid match
            inliermatches.push_back(Keypoints1[i]);
        }
    }
    prevKeypoints = Keypoints1;
    // return the found fundamental matrix
    return F;
}
```
Computes the epipolar constraint and classifies outliers by each feature point's distance to its epipolar line; 2.5 is the distance threshold. A flag of 0 marks an outlier, 1 marks a likely static point.
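In formulas: for a matched pair $(\mathbf{x}_1, \mathbf{x}_2)$ in homogeneous pixel coordinates, the code below evaluates the epipolar line and the point-to-line distance

$$
\mathbf{l} = F\,\mathbf{x}_1 = (a,\, b,\, c)^\top, \qquad
d = \frac{|\,a x_2 + b y_2 + c\,|}{\sqrt{a^2 + b^2}},
$$

and flags the point as an outlier when $d > 2.5$ pixels.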
```cpp
vector<int> cal_epipolar_constraints(vector<cv::Point2f> prepoints, vector<cv::Point2f> postpoints, cv::Mat F)
{
    vector<int> dis_idx;
    for (int i = 0; i < prepoints.size(); i++)
    {
        // Epipolar line l = F * x_pre = (a, b, c)^T
        double a = F.at<double>(0, 0)*prepoints[i].x + F.at<double>(0, 1)*prepoints[i].y + F.at<double>(0, 2);
        double b = F.at<double>(1, 0)*prepoints[i].x + F.at<double>(1, 1)*prepoints[i].y + F.at<double>(1, 2);
        double c = F.at<double>(2, 0)*prepoints[i].x + F.at<double>(2, 1)*prepoints[i].y + F.at<double>(2, 2);
        // Distance of the matched point to the epipolar line
        double distance = fabs(a*postpoints[i].x + b*postpoints[i].y + c) / sqrt(a*a + b*b);
        if (distance > 2.5)
        {
            dis_idx.push_back(0); // outlier (likely dynamic)
        }
        else {
            dis_idx.push_back(1); // likely static
        }
    }
    return dis_idx;
}
```
Removes the outliers according to the 0/1 status flags computed above.
```cpp
//removes keypoints based on a status vector (1 for keep, 0 for remove)
void removePoints(vector<vector<cv::KeyPoint>> &initmvKeys, vector<int> &status, int nlevels)
{
    int startindex = 0;
    int endindex = 0;
    int removedpoints = 0;
    for (int i = 0; i < status.size(); i++)
    {
        if (status[i] == 0)
        {
            removedpoints++;
        }
    }
    removedpoints = 0;
    int totalsize = 0;
    for (int level = 0; level < nlevels; ++level)
    {
        totalsize += initmvKeys[level].size();
    }
    for (int level = 0; level < nlevels; ++level)
    {
        vector<cv::KeyPoint> &mkeypoints = initmvKeys[level];
        int nkeypointsLevel = (int)mkeypoints.size();
        if (nkeypointsLevel == 0)
            continue;
        endindex = mkeypoints.size() - 1;
        // Erase back-to-front so indices of remaining points stay valid
        for (int i = endindex; i >= 0; i--)
        {
            if (status[i + startindex] == 0)
            {
                mkeypoints.erase(mkeypoints.begin() + i);
                removedpoints++;
            }
        }
        // Advance the flat-status offset by this level's original size
        startindex += endindex + 1;
    }
}
```
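The `status` vector indexes the concatenation of all pyramid levels, so the per-level offset must advance by each level's pre-erase size. A self-contained toy example of that bookkeeping (illustrative, not repo code):
```cpp
#include <cstdio>
#include <vector>
using std::vector;

int main() {
    // Two pyramid levels with 3 and 2 "keypoints" -> flat status of length 5.
    vector<vector<int>> levels = {{10, 11, 12}, {20, 21}};
    vector<int> status = {1, 0, 1, 1, 0}; // 0 = remove
    int startindex = 0;
    for (auto &lvl : levels) {
        int origSize = (int)lvl.size(); // offset must use the pre-erase size
        for (int i = origSize - 1; i >= 0; --i)
            if (status[i + startindex] == 0)
                lvl.erase(lvl.begin() + i);
        startindex += origSize;
    }
    for (auto &lvl : levels)
        for (int v : lvl) printf("%d ", v);
    printf("\n"); // prints: 10 12 20
}
```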
The main entry point of the DynPointFinder class does three things (the rejection criterion is spelled out below):
- Compute optical flow with calcOpticalFlowPyrLK. `status` is the output status vector (of unsigned chars); each element is set to 1 if the flow for the corresponding feature has been found, otherwise it is set to 0.
- Compute the epipolar constraints with cal_epipolar_constraints.
- Remove the outliers with removePoints.
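The flow-magnitude test in the code below: with $m_i$ the flow magnitude of feature $i$, and $\bar{m}$ and $\sigma$ the mean and standard deviation over all tracked features, a point is additionally rejected (on top of the epipolar test) when

$$
|m_i - \bar{m}| > 2\sigma
\quad\text{or}\quad
\big(\,|m_i - \bar{m}| > 1 \ \text{ and } \ \sigma < 3\,\big),
$$

in pixels, which catches features whose apparent motion deviates from the dominant camera-induced motion.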
```cpp
bool DynPointFinder::findOutliers(Frame *pFrame, cv::Mat &imGray, vector<float> &scaleFactors, int nlevels)
{
    // First frame: nothing to compare against yet, just cache it
    if(first){
        prevFrame = Frame(*pFrame);
        imGray.copyTo(prevImage);
        cv::Ptr<cv::ORB> detector = cv::ORB::create(1000);
        vector<cv::KeyPoint> kps;
        detector->detect(prevImage, kps);
        prevKeypoints = kps;
        first = false;
        return false;
    }
    // Get ORBSLAM's keypoints from all pyramid levels
    float scale;
    vector<cv::KeyPoint> keypoints;
    vector<cv::Point2f> points;
    for (int level = 0; level < nlevels; ++level)
    {
        vector<cv::KeyPoint> &mkeypoints = pFrame->initmvKeys[level];
        int nkeypointsLevel = (int)mkeypoints.size();
        if (level != 0)
            scale = scaleFactors[level];
        else
            scale = 1;
        vector<cv::KeyPoint>::iterator keypoint = mkeypoints.begin();
        while (keypoint != mkeypoints.end())
        {
            keypoints.push_back(*keypoint);
            points.push_back(keypoint->pt);
            keypoint++;
        }
    }
    cv::Ptr<cv::ORB> detector = cv::ORB::create(1000);
    vector<cv::KeyPoint> currKeypoints;
    detector->detect(imGray, currKeypoints);
    vector<int> state(keypoints.size(), 1);
    // Fundamental matrix between the current and previous frame
    cv::Mat F = match(currKeypoints, imGray);
    if(F.empty()){
        prevFrame = Frame(*pFrame);
        imGray.copyTo(prevImage);
        return false;
    }
    // Track the frame's keypoints back into the previous image
    cv::Size winSize(15,15);
    cv::TermCriteria termcrit(CV_TERMCRIT_ITER|CV_TERMCRIT_EPS, 10, 0.03);
    vector<uchar> status;
    vector<float> err;
    vector<cv::Point2f> outpts;
    cv::calcOpticalFlowPyrLK(imGray, prevImage, points, outpts, status, err, winSize, 2, termcrit, 0, 0.001);
    vector<cv::Point2f> p01op, p02op;
    int numFailed = 0;
    float erravg = 0;
    for(int i = 0; i < err.size(); i++){
        erravg += err[i];
    }
    vector<int> indices;
    for (int i = 0; i < status.size(); i++)
    {
        if (status[i]==1)
        {
            p01op.push_back(points[i]);
            p02op.push_back(outpts[i]);
            indices.push_back(i);
        }
        else{
            // Flow could not be found for this point: mark it for removal
            state[i] = 0;
            numFailed++;
        }
    }
    // Mean and standard deviation of the flow magnitudes
    float avgmag = 0;
    vector<float> mag; // flow magnitudes
    for(int i = 0; i < p01op.size(); i++){
        float diffx = p01op[i].x - p02op[i].x;
        float diffy = p01op[i].y - p02op[i].y;
        avgmag += sqrt(diffx*diffx+diffy*diffy);
        mag.push_back(sqrt(diffx*diffx+diffy*diffy));
    }
    avgmag = avgmag/p01op.size();
    float stdev = 0;
    for(int i = 0; i < mag.size(); i++){
        stdev += (mag[i] - avgmag)*(mag[i] - avgmag);
    }
    stdev = sqrt(stdev/mag.size());
    // Epipolar-constraint test (0 = outlier, 1 = likely static)
    vector<int> idx1 = cal_epipolar_constraints(p01op, p02op, F);
    cv::Mat show;
    imGray.copyTo(show);
    vector<cv::Point2f> final1, final2;
    for(int i = 0; i < p01op.size(); i++){
        // Reject a point if it violates the epipolar constraint or if its
        // flow magnitude is an outlier relative to the average motion
        if((idx1[i] == 0 || abs(mag[i]-avgmag) > 1 && stdev < 3 ) || (abs(mag[i]-avgmag) > 2*stdev)){
            state[indices[i]] = 0;
        }
        else{
            final1.push_back(p01op[i]);
            final2.push_back(p02op[i]);
        }
    }
    removePoints(pFrame->initmvKeys, state, nlevels);
    prevFrame = Frame(*pFrame);
    imGray.copyTo(prevImage);
    return true;
}
```
Initializer.cc
Computes the fundamental matrix with RANSAC; the minimum-set machinery follows ORB-SLAM2's Initializer.
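Recall that each correspondence $(\mathbf{x}_1, \mathbf{x}_2)$ imposes one linear constraint on the entries of $F$:

$$
\mathbf{x}_2^\top F\, \mathbf{x}_1 = 0 .
$$

Since $F$ is $3\times 3$ and defined only up to scale, at least 8 correspondences are needed, which is why the RANSAC loop below samples minimum sets of 8 matches.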
```cpp
void Initializer::getRelativePose(const Frame &CurrentFrame, const vector<int> &vMatches12, cv::Mat &F,
                                  vector<cv::Point3f> &vP3D, vector<bool> &vbTriangulated)
{
    // Fill structures with current keypoints and matches with reference frame
    // Reference Frame: 1, Current Frame: 2
    mvKeys2 = CurrentFrame.mvKeysUn;
    mvMatches12.clear();
    mvMatches12.reserve(mvKeys2.size());
    mvbMatched1.resize(mvKeys1.size());
    for(size_t i=0, iend=vMatches12.size(); i<iend; i++)
    {
        if(vMatches12[i]>=0)
        {
            mvMatches12.push_back(make_pair(i,vMatches12[i]));
            mvbMatched1[i]=true;
        }
        else
            mvbMatched1[i]=false;
    }
    const int N = mvMatches12.size();
    // Indices for minimum set selection
    vector<size_t> vAllIndices;
    vAllIndices.reserve(N);
    vector<size_t> vAvailableIndices;
    for(int i=0; i<N; i++)
        vAllIndices.push_back(i);
    // Generate sets of 8 points for each RANSAC iteration
    mvSets = vector< vector<size_t> >(mMaxIterations, vector<size_t>(8,0));
    DUtils::Random::SeedRandOnce(0);
    for(int it=0; it<mMaxIterations; it++)
    {
        vAvailableIndices = vAllIndices;
        // Select a minimum set of 8 correspondences
        for(size_t j=0; j<8; j++)
        {
            int randi = DUtils::Random::RandomInt(0, vAvailableIndices.size()-1);
            int idx = vAvailableIndices[randi];
            mvSets[it][j] = idx;
            vAvailableIndices[randi] = vAvailableIndices.back();
            vAvailableIndices.pop_back();
        }
    }
    // Estimate the fundamental matrix with RANSAC
    vector<bool> vbMatchesInliersH, vbMatchesInliersF;
    float SF;
    cv::Mat fund;
    FindFundamental(vbMatchesInliersF, SF, fund);
    fund.copyTo(F);
}
```