@dheiberg dheiberg commented Jul 5, 2018

Hey, so this is a prototype/proof of concept of how to integrate HTTP/2.0 into the current wptserve infrastructure.

I'm attempting to use as much of the existing infrastructure as possible, as it gives me functionality for free (such as handling TCP connections and threading client connections), and also means I don't have to re-invent the wheel and risk making mistakes in the various bits of infrastructure I might have to redo.

Currently, this prototype is able to serve directories and handle multiple h2 connections simultaneously, as well as h2 connections alongside h1 requests. Obviously it is going to have many bugs, but please point out anything glaring that doesn't have a note in the comments.

So the general idea here is to launch an H2 server on a port (currently 9000) using the existing WebTestServer in server.py, by wrapping the socket with an SSL context that sets up ALPN to negotiate H2. Because of this, the server has to be run on Python 2.7.15+, as it requires OpenSSL 1.0.2+. When the server receives the first request/data from the client, it enters the handle_one_request method in WebTestRequestHandler, and from there I simply move to an H2 handler: handle_h2.
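
For illustration, the ALPN setup described above can be sketched with the stdlib ssl module. `make_h2_ssl_context` is a hypothetical name, not code from this PR; on Python 2.7.15 the protocol constant would be `ssl.PROTOCOL_SSLv23` rather than `PROTOCOL_TLS_SERVER`:

```python
import ssl

def make_h2_ssl_context(certfile=None, keyfile=None):
    """Build an SSL context that advertises HTTP/2 ('h2') via ALPN.

    Requires OpenSSL 1.0.2+; certfile/keyfile stand in for the wpt
    certificates the real server loads.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile:
        context.load_cert_chain(certfile, keyfile)
    # Offer h2 so clients can negotiate HTTP/2 during the TLS handshake
    context.set_alpn_protocols(['h2'])
    return context

# The listening socket would then be wrapped along the lines of:
#   self.socket = make_h2_ssl_context(cert, key).wrap_socket(
#       self.socket, server_side=True)
```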

In handle_one_request, I have moved the logic that handles the parsed request to another method: finish_handling. This is because the H2 Handler will be reusing that logic, so by placing this in a separate method, both H1 and H2 handlers are able to use that.

The handle_h2 method is fairly straightforward: it initiates the HTTP/2.0 connection and then enters a loop that continuously reads directly from the socket and parses the H2 frames. This changes the way the original handler works, as it now handles the entire connection from start to finish, instead of a single request at a time in a one-and-done fashion.
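
The shape of that loop can be sketched as below. Everything here is an illustrative stand-in so the sketch stays self-contained: the real handler drives hyper-h2's H2Connection and its RequestReceived events, while StubH2Connection just mimics that interface.

```python
import collections

# Stand-in for h2's RequestReceived event
RequestReceived = collections.namedtuple('RequestReceived',
                                         ['stream_id', 'headers'])

class StubH2Connection(object):
    """Minimal stand-in for h2.connection.H2Connection."""
    def initiate_connection(self):
        pass  # the real object queues the HTTP/2 connection preface here

    def receive_data(self, data):
        # Pretend every non-empty chunk parses into exactly one request
        if not data:
            return []
        return [RequestReceived(stream_id=1, headers=[(':path', '/')])]

def handle_h2_connection(sock, conn, on_request):
    """Read raw bytes off the socket, let the connection state machine
    turn them into events, and dispatch each request event."""
    conn.initiate_connection()
    while True:
        data = sock.recv(65535)
        if not data:
            break  # client closed the connection
        for event in conn.receive_data(data):
            if isinstance(event, RequestReceived):
                on_request(event)
```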

The Request object did not need to change much, just gaining an h2_stream_id attribute. On the Response side, I created an H2Response object which mostly just changes which writer methods are called. The write_content method in here is a bit hacky: it uses a try/except to detect the last iteration of the generator, because you need to know the last item from the generator in order to flag the end of the stream. Please let me know if you know of a better way! The H2ResponseWriter handles the construction of the H2 frames and sending them via the socket.
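
One common alternative to the try/except-at-the-end approach is a one-item look-ahead wrapper that flags the final item; a sketch (`iter_with_last_flag` is a hypothetical helper, not part of this PR):

```python
def iter_with_last_flag(iterable):
    """Yield (item, is_last) pairs, so the caller knows when it is
    emitting the final chunk and can set END_STREAM on that frame."""
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty iterable: nothing to yield
    for item in iterator:
        yield previous, False
        previous = item
    yield previous, True
```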

Please let me know what you think. This currently feels like the right approach to implementing this solution, but would love some extra eyes!

P.S. The large additions and file changes are due to adding the dependencies (h2, hyperframe, hpack, enum and certifi) in the third_party folder. To view the actual changes that implement the H2 server, view the changes made by the first commit. Also sorry for mini WOT.

Edit: I have made it so the h2 server is no longer started by default, since it is not ready for use and should be explicitly started. It is now started by using ./wpt serve --h2



@wpt-pr-bot wpt-pr-bot requested a review from gsnedders July 5, 2018 13:16
@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch 2 times, most recently from 8b837e0 to 3e1525f Compare July 5, 2018 18:06
@jugglinmike

@sideshowbarker @bmac This may be relevant to your efforts on w3c-test.org

@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch 2 times, most recently from 8c3b2c7 to f2c9c4d Compare July 6, 2018 13:16
@jgraham jgraham left a comment

Reviewed 11 of 12 files at r1.
Reviewable status: 3 of 184 files reviewed, 10 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/serve/serve.py, line 499 at r1 (raw file):

        # Check that python 2.7.15 is being used for HTTP/2.0
        if scheme == 'http2':

This obviously isn't a good long term solution, but it's fine for now. But maybe add a TODO comment about removing it?


tools/serve/serve.py, line 738 at r1 (raw file):

            "ws": ["auto"],
            "wss": ["auto"],
            "http2":[9000],

I would like us to not support this by default and require a command line flag or something similar in order to get the server to launch during development.


tools/serve/serve.py, line 763 at r1 (raw file):

    def __init__(self, *args, **kwargs):
        # Skip if not version 2.7.15
        if not (sys.version_info[0] == 2 and sys.version_info[1] == 7 and sys.version_info[2] == 15):

Can we create a supports_http2 function to encapsulate this check since it's used in multiple places?
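
A minimal sketch of such a helper (the name comes from the review; the exact gating is an assumption — the prototype requires Python 2.7.15+ for OpenSSL 1.0.2/ALPN support, and any reasonably modern Python 3 is assumed to qualify):

```python
import sys

def supports_http2():
    """Hypothetical helper encapsulating the version check: HTTP/2
    needs ALPN, which requires OpenSSL 1.0.2+ (shipped with Python
    2.7.15+ on the 2.x line; assumed present on Python 3)."""
    if sys.version_info[0] == 2:
        return sys.version_info[:3] >= (2, 7, 15)
    return True
```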


tools/wptserve/wptserve/response.py, line 73 at r1 (raw file):

        self.logger = get_logger()
        if h2:

I am not sure, but I feel like having a base class and HTTP/1 and HTTP/2 variants would be better than baking the knowledge into the constructor.


tools/wptserve/wptserve/response.py, line 392 at r1 (raw file):

        self.logger = response.logger

    def write_headers(self, headers, status_code, status_message=None):

We probably need to make this more explicit, right? Like it seems useful to test how things work if we send headers in the "wrong" order &c.


tools/wptserve/wptserve/server.py, line 191 at r1 (raw file):

                ssl_context.set_alpn_protocols(['h2'])
                self.socket = ssl_context.wrap_socket(self.socket,
                                            server_side=True)

This indentation is wrong (also elsewhere)


tools/wptserve/wptserve/server.py, line 382 at r1 (raw file):

                data = self.request.recv(65535)

                # self.logger.debug('(%s) Data: ' % (str(uid)) + str(data.encode('hex')))

Remove this unless it can be enabled


tools/wptserve/wptserve/server.py, line 387 at r1 (raw file):

                for event in events:
                    if isinstance(event, RequestReceived):

So, this only supports a single simultaneous stream and a single set of headers (no body) at the moment, right? It seems like we are going to want to make the Request object take a queue of frames for the associated stream and/or have a handler type that can react directly to events rather than just being a single function call.


tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):

class H2Headers(dict):

Does this mean we don't have any way to access the raw headers? Seems like we should probably find a way to expose that.


tools/wptrunner/wptrunner/environment.py, line 140 at r1 (raw file):

            "https": [8443],
            "ws": [8888],
            "wss": [8889],

Seems like an unnecessary change.

@dheiberg dheiberg left a comment

Reviewable status: 3 of 184 files reviewed, 10 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/serve/serve.py, line 499 at r1 (raw file):

Previously, jgraham wrote…

This obviously isn't a good long term solution, but it's fine for now. But maybe add a TODO comment about removing it?

Good idea


tools/serve/serve.py, line 738 at r1 (raw file):

Previously, jgraham wrote…

I would like us to not support this by default and require a command line flag or something similar in order to get the server to launch during development.

This has now been changed: the HTTP/2.0 server is started using a command line flag.


tools/serve/serve.py, line 763 at r1 (raw file):

Previously, jgraham wrote…

Can we create a supports_http2 function to encapsulate this check since it's used in multiple places?

Yup, I'll find somewhere useful to put that and refactor this


tools/wptserve/wptserve/response.py, line 73 at r1 (raw file):

Previously, jgraham wrote…

I am not sure, but I feel like having a base class and HTTP/1 and HTTP/2 variants would be better than baking the knowledge into the constructor.

I agree, so something along the lines of making this class the base class and moving the handle functions into specific classes for each protocol version?


tools/wptserve/wptserve/response.py, line 392 at r1 (raw file):

Previously, jgraham wrote…

We probably need to make this more explicit, right? Like it seems useful to test how things work if we send headers in the "wrong" order &c.

So that is something that requires further testing. I believe headers can be split up and sent in different frames, using CONTINUATION frames. This is needed functionality, so I will look into how to introduce it.


tools/wptserve/wptserve/server.py, line 191 at r1 (raw file):

Previously, jgraham wrote…

This indentation is wrong (also elsewhere)

Noted, will fix on next run through


tools/wptserve/wptserve/server.py, line 382 at r1 (raw file):

Previously, jgraham wrote…

Remove this unless it can be enabled

Will do


tools/wptserve/wptserve/server.py, line 387 at r1 (raw file):

Previously, jgraham wrote…

So, this only supports a single simultaneous stream and a single set of headers (no body) at the moment, right? It seems like we are going to want to make the Request object take a queue of frames for the associated stream and/or have a handler type that can react directly to events rather than just being a single function call.

Yes, so I am currently working on multithreading the request handling, and need to do some more testing to see how POST requests will behave. I currently don't believe the Request object needs to be altered dramatically, as the server knows it has received a full request when the client signals the end of a stream, but this is something to consider as I continue expanding the functionality.


tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):

Previously, jgraham wrote…

Does this mean we don't have any way to access the raw headers? Seems like we should probably find a way to expose that.

They are exposed, but they are in HTTP/2.0 format. For example, 'path' in HTTP/1.1 becomes ':path' in HTTP/2.0. This class is used as a way to convert ':' prefixed headers into their HTTP/1.1 format, so that the rest of the system can utilize them without knowing the difference.
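
Roughly, the conversion described here could be sketched as follows. This is a simplified illustration, not the PR's code: it also folds in the raw_headers attribute and OrderedDict base agreed later in this review, and ignores details like repeated header names.

```python
import collections

class H2Headers(collections.OrderedDict):
    """Map HTTP/2 pseudo-headers (':path', ':method', ...) onto their
    un-prefixed HTTP/1.1-style names, so the rest of the system can use
    them without knowing which protocol was spoken. raw_headers keeps
    the keys exactly as the client sent them."""
    def __init__(self, headers):
        super(H2Headers, self).__init__()
        self.raw_headers = list(headers)
        for key, value in headers:
            # ':path' -> 'path', ':method' -> 'method', etc.
            self[key.lstrip(':')] = value
```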


tools/wptrunner/wptrunner/environment.py, line 140 at r1 (raw file):

Previously, jgraham wrote…

Seems like an unnecessary change.

I believe this was leftover from when I had the "http2" key entered; must've forgotten to remove this change. Will touch up in the next push.

@jgraham jgraham left a comment

Reviewed 1 of 12 files at r1, 1 of 1 files at r2.
Reviewable status: 5 of 184 files reviewed, 7 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/server.py, line 387 at r1 (raw file):

Previously, dhdavvie (David H) wrote…

Yes, so I am currently working on multithreading the request handling, and need to do some more testing to see how POST requests will behave. I currently don't believe the Request object needs to be altered dramatically, as the server knows it has received a full request when the client signals the end of a stream, but this is something to consider as I continue expanding the functionality.

OK, let's address that in a followup.


tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):

Previously, dhdavvie (David H) wrote…

They are exposed, but they are in HTTP/2.0 format. For example, 'path' in HTTP/1.1 becomes ':path' in HTTP/2.0. This class is used as a way to convert ':' prefixed headers into their HTTP/1.1 format, so that the rest of the system can utilize them without knowing the difference.

Right, the point is sometimes we might care about the difference. This seems like reasonable default behaviour, but we need a way to get the actual string that was sent over the wire.

@dheiberg dheiberg left a comment

Reviewable status: 5 of 184 files reviewed, 7 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):

Previously, jgraham wrote…

Right, the point is sometimes we might care about the difference. This seems like reasonable default behaviour, but we need a way to get the actual string that was sent over the wire.

I see your point. I can add some sort of raw_header attribute that contains the actual header keys as sent by the client?

@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch from f2c9c4d to 8ef6b48 Compare July 9, 2018 15:38
@jgraham jgraham left a comment

Reviewable status: 5 of 184 files reviewed, 7 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):

Previously, dhdavvie (David H) wrote…

I see your point. I can add some sort of raw_header attribute that contains the actual header keys as sent by the client?

Sounds good.

@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch 5 times, most recently from a34d495 to 985440f Compare July 13, 2018 15:27
@jgraham jgraham left a comment

Reviewed 22 of 188 files at r4, 2 of 2 files at r5, 184 of 184 files at r6.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @dhdavvie, @jgraham, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/response.py, line 11 at r6 (raw file):

try:
    from StringIO import StringIO

Do we actually just want from io import BytesIO? I feel like that should work everywhere?


tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):

        formatted_headers.extend(secondary_headers)

        self.h2_lock.acquire()

So, I assume the deal here is that h2conn is single threaded?

Instead of having a lock used directly like this, my favourite Python pattern is to have a guard object that acts as a context manager, acquiring the lock and giving back the lock-protected object, like

import threading

class ConnectionGuard(object):
    lock = threading.Lock()

    def __init__(self, obj):
        self.obj = obj

    def __enter__(self):
        self.lock.acquire()
        return self.obj

    def __exit__(self, *args):
        self.lock.release()

Then you do something like:

with self.connection as connection:
    connection.send_headers([…])
# Maybe allow a connection to be passed in to self.write if there's a performance issue
# having to re-acquire the lock
self.write()

tools/wptserve/wptserve/response.py, line 434 at r6 (raw file):

            self._write_content(data.read(payload_size), False)
            data_len -= payload_size
            payload_size = self.get_max_payload_size()

Why do we care about the maximum payload size changing, but not the data changing length?


tools/wptserve/wptserve/response.py, line 438 at r6 (raw file):

        self._write_content(data.read(), last)

    def _write_content(self, data, last):

Can we call this write_content_frame and make it public, so that a user could e.g. write a too-long frame or something as a test?


tools/wptserve/wptserve/server.py, line 212 at r6 (raw file):

class _WebTestRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):

Use Base rather than _


tools/wptserve/wptserve/server.py, line 307 at r6 (raw file):

class Http2WebTestRequestHandler(_WebTestRequestHandler):

Unnecessary empty line.


tools/wptserve/wptserve/server.py, line 344 at r6 (raw file):

                        self.logger.debug('(%s) Parsing RequestReceived' % (uid))
                        self._h2_parse_request(event)
                        t = threading.Thread(target=_WebTestRequestHandler.finish_handling, args=(self, True, H2Response))

I think we want one thread per stream_id and a queue to send the data to that thread, right? Is that coming in the next part? Maybe add a TODO for now.


tools/wptserve/wptserve/server.py, line 370 at r6 (raw file):

class H2Headers(dict):
    def __init__(self, headers):
        self.raw_headers = {}

OrderedDict, perhaps.

@dheiberg dheiberg left a comment

Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/response.py, line 11 at r6 (raw file):

Previously, jgraham wrote…

Do we actually just want from io import BytesIO? I feel like that should work everywhere?

Yup, that's a much better solution, thanks!


tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):

Previously, jgraham wrote…

So, I assume the deal here is that h2conn is single threaded?

Instead of having a lock used directly like this, my favourite Python pattern is to have a guard object that acts as a context manager, acquiring the lock and giving back the lock-protected object, like

import threading

class ConnectionGuard(object):
    lock = threading.Lock()

    def __init__(self, obj):
        self.obj = obj

    def __enter__(self):
        self.lock.acquire()
        return self.obj

    def __exit__(self, *args):
        self.lock.release()

Then you do something like:

with self.connection as connection:
    connection.send_headers([…])
# Maybe allow a connection to be passed in to self.write if there's a performance issue
# having to re-acquire the lock
self.write()

This looks very neat, I will look into implementing something like this. Should I push that to this PR or save it for the next?


tools/wptserve/wptserve/response.py, line 434 at r6 (raw file):

Previously, jgraham wrote…

Why do we care about the maximum payload size changing, but not the data changing length?

payload_size is the largest size the next payload can be, which can change depending on stream capacity and WindowUpdate events received from the client. So this checks how big the next chunk of data to be sent down the pipe will be, sends that chunk, and decreases the counter keeping track of how much data is left.
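
That chunking logic can be sketched as below. `write_in_chunks`, `write_frame`, and the `get_max_payload_size` callable are hypothetical stand-ins for `_write_content` and the writer's own `get_max_payload_size`, shown in simplified form:

```python
def write_in_chunks(write_frame, data, get_max_payload_size):
    """Send `data` as a series of frames, re-querying the maximum
    payload size before each one, since WINDOW_UPDATE frames from the
    client can change the flow-control window mid-response.
    write_frame(chunk, last) stands in for _write_content."""
    remaining = len(data)
    offset = 0
    while remaining > 0:
        # Largest chunk the next frame is allowed to carry right now
        size = min(get_max_payload_size(), remaining)
        chunk = data[offset:offset + size]
        offset += size
        remaining -= size
        # Flag END_STREAM on the final frame
        write_frame(chunk, remaining == 0)
```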


tools/wptserve/wptserve/response.py, line 438 at r6 (raw file):

Previously, jgraham wrote…

Can we call this write_content_frame and make it public, so that a user could e.g. write a too-long frame or something as a test?

I don't see why not


tools/wptserve/wptserve/server.py, line 382 at r1 (raw file):

Previously, dhdavvie (David H) wrote…

Will do

Done.


tools/wptserve/wptserve/server.py, line 344 at r6 (raw file):

Previously, jgraham wrote…

I think we want one thread per stream_id and a queue to send the data to that thread, right? Is that coming in the next part? Maybe add a TODO for now.

For the time being, it seems to make more sense to have a thread per entire request-response interaction and kill the thread after the interaction is done. One reason for this is that one interaction could happen over multiple streams, if PUSH frames are involved for example. It seems like it would add more complications to have threads per stream waiting for something to happen on that stream (which may never happen; Firefox often avoids reusing stream_ids within a certain timeframe, by the looks of it). This may be a thing to consider in the future, but currently keeping the threads to a request-response scenario seems to make the most sense.


tools/wptserve/wptserve/server.py, line 370 at r6 (raw file):

Previously, jgraham wrote…

OrderedDict, perhaps.

Good idea

@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch from 985440f to e9550b5 Compare July 16, 2018 15:33
@jgraham jgraham left a comment

Reviewed 3 of 186 files at r7, 184 of 184 files at r8.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jgraham, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):

Previously, dhdavvie (David H) wrote…

This looks very neat, I will look into implementing something like this. Should I push that to this PR or save it for the next?

I think it's a relatively small change to add it to this PR.

@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch 3 times, most recently from 1837f4e to 7b30d07 Compare July 19, 2018 13:35
@dheiberg dheiberg left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jgraham, @AutomatedTester, @annevk, @gsnedders, and @andreastt)


tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):

Previously, jgraham wrote…

I think it's a relatively small change to add it to this PR.

Done.

David Heiberg added 2 commits July 20, 2018 14:10
…onnections on port 9000, and serve directories.

* Added event handling for ConnectionTerminated

* Added backwards compatibility with HTTP/1.1 Python handler files.

* Made it so a command line flag is needed to start the H2 server, and it is no longer started by default.

* Refactored compatibility checking when launching HTTP/2.0 server

* Created H1 and H2 handler variants, keeping shared logic in a base class

* Added a `raw_headers` field to the H2Headers object, so that the non-converted keys are exposed

* Added basic multithreading support. Requests now get placed in a thread to finish handling, whilst server listens
for more requests/windows updates.

* Created a ConnectionGuard object for the H2 connection object for thread safety as per jgraham's suggestion
@dheiberg dheiberg force-pushed the dhdavvie/h2-server branch from 7b30d07 to 16f8389 Compare July 20, 2018 13:14
@jgraham jgraham left a comment

Reviewed 4 of 188 files at r9, 184 of 184 files at r10.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @AutomatedTester, @annevk, @gsnedders, and @andreastt)
