\$\begingroup\$

Previously I wrote a very generic HTTP handler.
But in real life the server needs to handle different functions on different paths. For example, /rest/addUser would add a user, while /rest/getUser/45 would get user 45.

So we need to attach lambdas to generic paths that can pick up variables from the path provided to the server. This class lets you register paths with embedded sections that are passed to the lambda as variables.

Example: /rest/getUser/{id} is a path that starts with /rest/getUser/ and has a suffix that is put in the variable id for the lambda to use. Note that the algorithm is generic, so you can register multiple variable names in different sections. A variable matches any sequence of characters that does not contain '/'.

PathMatcher.h

#ifndef THORSANVIL_NISSE_NISSEHTTP_PATH_MATCHER_H
#define THORSANVIL_NISSE_NISSEHTTP_PATH_MATCHER_H

#include <map>
#include <vector>
#include <string>
#include <functional>
#include <regex>

namespace ThorsAnvil::Nisse::NisseHTTP
{

class Request;
class Response;

using Match     = std::map<std::string, std::string>;
using Action    = std::function<void(Match const&, Request&, Response&)>;
using NameList  = std::vector<std::string>;

class PathMatcher
{
    struct MatchInfo
    {
        std::regex  test;
        NameList    names;
        Action      action;
    };

    std::vector<MatchInfo>  paths;

    public:
        void addPath(std::string pathMatch, Action&& action);

        bool findMatch(std::string const& path, Request& request, Response& response);
};

}

#endif

PathMatcher.cpp

#include "PathMatcher.h"

using namespace ThorsAnvil::Nisse::NisseHTTP;

void PathMatcher::addPath(std::string pathMatch, Action&& action)
{
    // Variables to be built.
    std::string     expr;       // Convert pathMatch into a regular expression.
    NameList        names;      // Extract list of names from pathMatch.

    // Search variables
    std::smatch     searchMatch;
    std::regex      pathNameExpr{"\\{[^}]*\\}"};

    while (std::regex_search(pathMatch, searchMatch, pathNameExpr))
    {
        expr += pathMatch.substr(0, searchMatch.position());
        expr += "([^/]*)";

        std::string match = searchMatch.str();
        names.emplace_back(match.substr(1, match.size() - 2));

        pathMatch = searchMatch.suffix().str();
    }
    expr += pathMatch;

    // Add the path information to the list.
    paths.emplace_back(std::regex{expr}, std::move(names), std::move(action));
}

bool PathMatcher::findMatch(std::string const& path, Request& request, Response& response)
{
    for (auto const& pathMatchInfo: paths)
    {
        std::smatch     match{};
        if (std::regex_match(path, match, pathMatchInfo.test))
        {
            Match   result;
            for (std::size_t loop = 0; loop < pathMatchInfo.names.size(); ++loop)
            {
                result.insert({pathMatchInfo.names[loop], match[loop+1].str()});
            }
            pathMatchInfo.action(result, request, response);
            return true;
        }
    }
    return false;
}

Test Case

TEST(PathMatcherTest, NameMatchMultiple)
{
    PathMatcher         pm;
    int                 count = 0;
    Match               hit;
    pm.addPath("/path1/{name}/{id}", [&count, &hit](Match const& match, Request&, Response&){++count;hit = match;});

    std::stringstream   ss{"GET /path1/path2/path3 HTTP/1.1\r\nhost: google.com\r\n\r\n"};
    Request     request("http", ss);
    Response    response(ss, Version::HTTP1_1);
    pm.findMatch("/path1/path2/path3", request, response);

    EXPECT_EQ(1, count);
    EXPECT_EQ(2, hit.size());
    EXPECT_EQ("name", (++hit.begin())->first);
    EXPECT_EQ("path2", (++hit.begin())->second);
    EXPECT_EQ("id", hit.begin()->first);
    EXPECT_EQ("path3", hit.begin()->second);
}
\$\endgroup\$

1 Answer

\$\begingroup\$

Limit the scope of identifiers

Match, Action and NameList are declared inside the NisseHTTP namespace, but they are specific to the PathMatcher, so instead declare them inside class PathMatcher.

Consider that a name like Action is very generic, and maybe you want to have some other thing in NisseHTTP that has the concept of actions, but takes different parameters.
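A minimal sketch of what that could look like, reusing the declarations from the question. Which aliases end up public is an assumption on my part: Match and Action appear in the public interface, so callers need to name them, while NameList looks like an implementation detail.

```cpp
#include <functional>
#include <map>
#include <regex>
#include <string>
#include <vector>

namespace ThorsAnvil::Nisse::NisseHTTP
{

class Request;
class Response;

class PathMatcher
{
    public:
        // Callers now spell these PathMatcher::Match and PathMatcher::Action,
        // which leaves the names free for other uses inside NisseHTTP.
        using Match  = std::map<std::string, std::string>;
        using Action = std::function<void(Match const&, Request&, Response&)>;

    private:
        // Assumed to be an implementation detail, so it is kept private.
        using NameList = std::vector<std::string>;

        struct MatchInfo
        {
            std::regex  test;
            NameList    names;
            Action      action;
        };

        std::vector<MatchInfo>  paths;

    public:
        void addPath(std::string pathMatch, Action&& action);

        bool findMatch(std::string const& path, Request& request, Response& response);
};

}
```

Existing call sites would only need to change Match to PathMatcher::Match (and likewise for Action).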

Do you really need regular expressions?

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. (Jamie Zawinski)

Regular expressions are a powerful tool, but they come at a price. They are not always efficient: C++'s std::regex implementations are not known for their speed, and a std::regex object might use a lot of memory. Also, the code you wrote to parse the pathMatch string could just as well be used to match the actual URL, and would probably be as efficient as a good regex implementation.
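To illustrate, here is a rough sketch of matching a URL directly against the template, without std::regex. The function name matchTemplate is hypothetical, and the sketch assumes a variable always runs to the end of its path segment, which is slightly stricter than the ([^/]*) group in your code (that one can backtrack when literal text follows the variable inside the same segment).

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

using Match = std::map<std::string, std::string>;

// Hypothetical helper: walks the template and the path side by side.
// Returns the captured variables on success, std::nullopt on a mismatch
// or when the template has an unterminated '{'.
std::optional<Match> matchTemplate(std::string_view tmpl, std::string_view path)
{
    Match       result;
    std::size_t t = 0;      // position in the template
    std::size_t p = 0;      // position in the path

    while (t < tmpl.size())
    {
        if (tmpl[t] == '{')
        {
            std::size_t const close = tmpl.find('}', t);
            if (close == std::string_view::npos)
            {
                return std::nullopt;
            }
            std::string name{tmpl.substr(t + 1, close - t - 1)};

            // Assumption: the variable consumes everything up to the next '/'.
            std::size_t end = path.find('/', p);
            if (end == std::string_view::npos)
            {
                end = path.size();
            }
            result[name] = std::string{path.substr(p, end - p)};
            p = end;
            t = close + 1;
        }
        else
        {
            // Literal character: must be present and identical in the path.
            if (p >= path.size() || path[p] != tmpl[t])
            {
                return std::nullopt;
            }
            ++t;
            ++p;
        }
    }
    // The whole path must be consumed, mirroring std::regex_match.
    if (p != path.size())
    {
        return std::nullopt;
    }
    return result;
}
```

Because literal characters are compared as-is, a template such as "/foo/bar.baz" only matches that exact path.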

But most importantly:

Are you handling all corner cases?

I can think of a few match strings that have some issues:

  • "/foo/bar/{}" (empty name)
  • "/foo/{bar}/{bar}" (same name used twice)
  • "/foo/{{bar}/baz" ({ appears in name, should you allow that?)
  • "/foo/{bar}}/baz" (} is not part of the name, will it match?)
  • "/foo/bar.baz" (this will match /foo/bar/baz!)
  • "/foo/\\{bar}" (this will cause an out-of-bounds read)
  • "/foo/(bar)/{baz}" (this will cause the wrong string to be added to result)

And if you didn't know you could actually add regular expressions to the match string, what if you wanted to match a literal {?
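If you do keep std::regex, one possible way to close most of these holes is to validate the template and escape the literal parts while building the expression. The sketch below is only one option: buildExpression is a hypothetical helper, and rejecting empty names, duplicate names and stray braces by throwing std::invalid_argument is my assumption about the policy you might want. It also offers no way to match a literal { or }, which would need an escape syntax of your choosing.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: turns a path template into regex source text and
// appends the variable names it finds to `names`.
std::string buildExpression(std::string_view tmpl, std::vector<std::string>& names)
{
    // Characters with a special meaning in the ECMAScript regex grammar.
    // Braces are handled separately below, so they are not listed here.
    static constexpr std::string_view special = R"(\^$.|?*+()[])";

    std::string expr;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        char const c = tmpl[i];
        if (c == '{')
        {
            std::size_t const close = tmpl.find('}', i);
            if (close == std::string_view::npos)
            {
                throw std::invalid_argument("unterminated '{' in path template");
            }
            std::string name{tmpl.substr(i + 1, close - i - 1)};
            if (name.empty() || name.find('{') != std::string::npos)
            {
                throw std::invalid_argument("invalid variable name in path template");
            }
            if (std::find(names.begin(), names.end(), name) != names.end())
            {
                throw std::invalid_argument("duplicate variable name in path template");
            }
            names.push_back(std::move(name));
            expr += "([^/]*)";
            i = close;              // continue after the closing brace
        }
        else if (c == '}')
        {
            throw std::invalid_argument("unmatched '}' in path template");
        }
        else
        {
            // Escape literals so that '.', '(' and friends match themselves
            // and cannot introduce extra capture groups.
            if (special.find(c) != std::string_view::npos)
            {
                expr += '\\';
            }
            expr += c;
        }
    }
    return expr;
}
```

With this, "/foo/bar.baz" becomes the expression /foo/bar\.baz, and "/foo/(bar)/{baz}" contains exactly one capture group, so the indices into the match results line up with the name list again.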

What if I added two matcher functions, with these two different match strings:

  • "/{foo}"
  • "/{bar}"
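Both strings turn into the same expression, so with your findMatch() the first one registered always wins and the second can never be reached. If you would rather catch that at registration time, something along these lines might do. The names templateShape and ShapeRegistry are made up for this sketch, and it only detects templates that are identical apart from their variable names, not templates that merely overlap.

```cpp
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: drops the variable names from a template,
// so "/{foo}" and "/{bar}" both become "/{}".
std::string templateShape(std::string_view tmpl)
{
    std::string shape;
    bool        inName = false;
    for (char const c : tmpl)
    {
        if (inName)
        {
            // Skip the name itself, keep only the closing brace.
            if (c == '}')
            {
                inName = false;
                shape += c;
            }
        }
        else
        {
            shape += c;
            if (c == '{')
            {
                inName = true;
            }
        }
    }
    return shape;
}

// Hypothetical registry that remembers the shapes seen so far.
class ShapeRegistry
{
    std::set<std::string>   seen;

    public:
        // Returns false when an equivalent template was already added.
        bool add(std::string_view tmpl)
        {
            return seen.insert(templateShape(tmpl)).second;
        }
};
```

addPath() could consult such a registry and report an error when add() returns false, instead of silently storing an unreachable action.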
\$\endgroup\$
Comments:

  • Yea second thoughts on the regular expression matching already. :-( (Commented Oct 22, 2024 at 16:50)
  • Will move the types inside PathMatcher. (Commented Oct 22, 2024 at 16:51)
