[WIP] Prototype HTTP/2.0 Server #11806
Conversation
Force-pushed 8b837e0 to 3e1525f
@sideshowbarker @bmac This may be relevant to your efforts on w3c-test.org
Force-pushed 8c3b2c7 to f2c9c4d
jgraham
left a comment
Reviewed 11 of 12 files at r1.
Reviewable status: 3 of 184 files reviewed, 10 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/serve/serve.py, line 499 at r1 (raw file):
# Check that python 2.7.15 is being used for HTTP/2.0
if scheme == 'http2':
This obviously isn't a good long term solution, but it's fine for now. But maybe add a TODO comment about removing it?
tools/serve/serve.py, line 738 at r1 (raw file):
"ws": ["auto"], "wss": ["auto"], "http2":[9000],
I would like us to not support this by default and require a command line flag or something similar in order to get the server to launch during development.
tools/serve/serve.py, line 763 at r1 (raw file):
def __init__(self, *args, **kwargs):
    # Skip if not version 2.7.15
    if not (sys.version_info[0] == 2 and sys.version_info[1] == 7 and sys.version_info[2] == 15):
Can we create a supports_http2 function to encapsulate this check since it's used in multiple places?
tools/wptserve/wptserve/response.py, line 73 at r1 (raw file):
self.logger = get_logger()
if h2:
I am not sure, but I feel like having a base class and HTTP/1 and HTTP/2 variants would be better than baking the knowledge into the constructor.
tools/wptserve/wptserve/response.py, line 392 at r1 (raw file):
self.logger = response.logger

def write_headers(self, headers, status_code, status_message=None):
We probably need to make this more explicit, right? It seems useful to test how things work if we send headers in the "wrong" order, etc.
tools/wptserve/wptserve/server.py, line 191 at r1 (raw file):
ssl_context.set_alpn_protocols(['h2'])
self.socket = ssl_context.wrap_socket(self.socket, server_side=True)
This indentation is wrong (also elsewhere)
tools/wptserve/wptserve/server.py, line 382 at r1 (raw file):
data = self.request.recv(65535)
# self.logger.debug('(%s) Data: ' % (str(uid)) + str(data.encode('hex')))
Remove this unless it can be enabled
tools/wptserve/wptserve/server.py, line 387 at r1 (raw file):
for event in events:
    if isinstance(event, RequestReceived):
So, this only supports a single simultaneous stream and a single set of headers (no body) at the moment, right? It seems like we are going to want to make the Request object take a queue of frames for the associated stream and/or have a handler type that can react directly to events rather than just being a single function call.
tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):
class H2Headers(dict):
Does this mean we don't have any way to access the raw headers? Seems like we should probably find a way to expose that.
tools/wptrunner/wptrunner/environment.py, line 140 at r1 (raw file):
"https": [8443], "ws": [8888], "wss": [8889],
Seems like an unnecessary change.
dheiberg
left a comment
Reviewable status: 3 of 184 files reviewed, 10 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/serve/serve.py, line 499 at r1 (raw file):
Previously, jgraham wrote…
This obviously isn't a good long term solution, but it's fine for now. But maybe add a TODO comment about removing it?
Good idea
tools/serve/serve.py, line 738 at r1 (raw file):
Previously, jgraham wrote…
I would like us to not support this by default and require a command line flag or something similar in order to get the server to launch during development.
This has now been changed; the HTTP/2.0 server is started using a command line flag.
tools/serve/serve.py, line 763 at r1 (raw file):
Previously, jgraham wrote…
Can we create a supports_http2 function to encapsulate this check since it's used in multiple places?
Yup, I'll find somewhere useful to put that and refactor this
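As a sketch of what such a helper might look like (the name supports_http2 comes from the review; where it lives and the Python 3 branch are assumptions, not part of the diff):

```python
import sys

def supports_http2():
    # The H2 server needs ssl.SSLContext.set_alpn_protocols, which on
    # Python 2 is only available in 2.7.15+ (built against OpenSSL 1.0.2+).
    if sys.version_info[0] == 2:
        return sys.version_info >= (2, 7, 15)
    # Assumed here: on Python 3, ALPN support is available from 3.5 onwards.
    return sys.version_info >= (3, 5)
```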
tools/wptserve/wptserve/response.py, line 73 at r1 (raw file):
Previously, jgraham wrote…
I am not sure, but I feel like having a base class and HTTP/1 and HTTP/2 variants would be better than baking the knowledge into the constructor.
I agree. So something along the lines of making this class the base class, and moving the handle functions into specific classes for each protocol version?
tools/wptserve/wptserve/response.py, line 392 at r1 (raw file):
Previously, jgraham wrote…
We probably need to make this more explicit, right? Like it seems useful to test how things work if we send headers in the "wrong" order &c.
That is something that requires further testing. I believe headers can be split up and sent in different frames, using CONTINUATION frames. This functionality is needed, so I will look into how to introduce it.
tools/wptserve/wptserve/server.py, line 191 at r1 (raw file):
Previously, jgraham wrote…
This indentation is wrong (also elsewhere)
Noted, will fix on next run through
tools/wptserve/wptserve/server.py, line 382 at r1 (raw file):
Previously, jgraham wrote…
Remove this unless it can be enabled
Will do
tools/wptserve/wptserve/server.py, line 387 at r1 (raw file):
Previously, jgraham wrote…
So, this only supports a single simultaneous stream and a single set of headers (no body) at the moment, right? It seems like we are going to want to make the Request object take a queue of frames for the associated stream and/or have a handler type that can react directly to events rather than just being a single function call.
Yes, so I am currently working on multithreading the request handling, and need to do some more testing to see how POST requests will behave. I currently don't believe that the Request object needs to be altered dramatically, as the server knows it has received a full request once the client signals the end of a stream, but this is something to consider as I continue expanding the functionality.
tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):
Previously, jgraham wrote…
Does this mean we don't have any way to access the raw headers? Seems like we should probably find a way to expose that.
They are exposed, but they are in HTTP/2.0 format. For example, 'path' in HTTP/1.1 becomes ':path' in HTTP/2.0. This class is used as a way to convert ':' prefixed headers into their HTTP/1.1 format, so that the rest of the system can utilize them without knowing the difference.
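To make the conversion concrete, here is a simplified sketch of the behaviour described (not the exact class from the diff; the raw_headers attribute reflects the suggestion made later in this review):

```python
class H2Headers(dict):
    """Expose HTTP/2 headers under their HTTP/1.1-style names.

    HTTP/2 sends request metadata as ':'-prefixed pseudo-headers
    (':path', ':method', ...); stripping the prefix lets the rest of
    the system use them without knowing the difference.
    """
    def __init__(self, headers):
        super(H2Headers, self).__init__()
        # Keep the keys exactly as sent by the client.
        self.raw_headers = dict(headers)
        for key, value in headers:
            self[key.lstrip(':')] = value
```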
tools/wptrunner/wptrunner/environment.py, line 140 at r1 (raw file):
Previously, jgraham wrote…
Seems like an unnecessary change.
I believe this was leftover from when I had the "http2" key entered; must've forgotten to remove this change. Will touch it up in the next push.
jgraham
left a comment
Reviewed 1 of 12 files at r1, 1 of 1 files at r2.
Reviewable status: 5 of 184 files reviewed, 7 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/server.py, line 387 at r1 (raw file):
Previously, dhdavvie (David H) wrote…
Yes, so I am currently working on multithreading the request handling, and need to do some more testing to see how POST requests will behave. I currently don't believe that the Request object needs to be altered dramatically, as the server knows when it has received a full request when the client signals the end of a stream, but this is something to consider as I continue expanding the functionality.
OK, let's address that in a followup.
tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):
Previously, dhdavvie (David H) wrote…
They are exposed, but they are in HTTP/2.0 format. For example, 'path' in HTTP/1.1 becomes ':path' in HTTP/2.0. This class is used as a way to convert ':' prefixed headers into their HTTP/1.1 format, so that the rest of the system can utilize them without knowing the difference.
Right, the point is sometimes we might care about the difference. This seems like reasonable default behaviour, but we need a way to get the actual string that was sent over the wire.
dheiberg
left a comment
Reviewable status: 5 of 184 files reviewed, 7 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):
Previously, jgraham wrote…
Right, the point is sometimes we might care about the difference. This seems like reasonable default behaviour, but we need a way to get the actual string that was sent over the wire.
I see your point. I can add some sort of raw_header attribute that contains the actual header keys as sent by the client?
Force-pushed f2c9c4d to 8ef6b48
jgraham
left a comment
Reviewable status: 5 of 184 files reviewed, 7 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/server.py, line 410 at r1 (raw file):
Previously, dhdavvie (David H) wrote…
I see your point. I can add some sort of raw_header attribute that contains the actual header keys as sent by the client?
Sounds good.
Force-pushed a34d495 to 985440f
jgraham
left a comment
Reviewed 22 of 188 files at r4, 2 of 2 files at r5, 184 of 184 files at r6.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @dhdavvie, @jgraham, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/response.py, line 11 at r6 (raw file):
try:
    from StringIO import StringIO
Do we actually just want from io import BytesIO? I feel like that should work everywhere?
tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):
formatted_headers.extend(secondary_headers)
self.h2_lock.acquire()
So, I assume the deal here is that h2conn is single threaded?
Instead of having a lock used directly like this, my favourite Python pattern is to have a guard object that acts as a context manager, acquiring the lock and giving back the lock-protected object, like
class ConnectionGuard(object):
    lock = threading.Lock()

    def __init__(self, obj):
        self.obj = obj

    def __enter__(self):
        self.lock.acquire()
        return self.obj

    def __exit__(self, *args):
        self.lock.release()
Then you do something like:
with self.connection as connection:
    connection.send_headers([…])
    # Maybe allow a connection to be passed in to self.write if there's
    # a performance issue having to re-acquire the lock
    self.write()
tools/wptserve/wptserve/response.py, line 434 at r6 (raw file):
self._write_content(data.read(payload_size), False)
data_len -= payload_size
payload_size = self.get_max_payload_size()
Why do we care about the maximum payload size changing, but not the data changing length?
tools/wptserve/wptserve/response.py, line 438 at r6 (raw file):
self._write_content(data.read(), last)

def _write_content(self, data, last):
Can we call this write_content_frame and make it public, so that a user could e.g. write a too-long frame or something as a test.
tools/wptserve/wptserve/server.py, line 212 at r6 (raw file):
class _WebTestRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
Use Base rather than _
tools/wptserve/wptserve/server.py, line 307 at r6 (raw file):
class Http2WebTestRequestHandler(_WebTestRequestHandler):
Unnecessary empty line.
tools/wptserve/wptserve/server.py, line 344 at r6 (raw file):
self.logger.debug('(%s) Parsing RequestReceived' % (uid))
self._h2_parse_request(event)
t = threading.Thread(target=_WebTestRequestHandler.finish_handling, args=(self, True, H2Response))
I think we want one thread per stream_id and a queue to send the data to that thread, right? Is that coming in the next part? Maybe add a TODO for now.
tools/wptserve/wptserve/server.py, line 370 at r6 (raw file):
class H2Headers(dict):
    def __init__(self, headers):
        self.raw_headers = {}
OrderedDict, perhaps.
dheiberg
left a comment
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @jgraham, @dhdavvie, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/response.py, line 11 at r6 (raw file):
Previously, jgraham wrote…
Do we actually just want from io import BytesIO? I feel like that should work everywhere?
Yup, that's a much better solution, thanks!
tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):
Previously, jgraham wrote…
So, I assume the deal here is that h2conn is single threaded? Instead of having a lock used directly like this, my favourite Python pattern is to have a guard object that acts as a context manager, acquiring the lock and giving back the lock-protected object (the ConnectionGuard example above), used as with self.connection as connection: ...
This looks very neat, I will look into implementing something like this. Should I push that to this PR or save it for the next?
tools/wptserve/wptserve/response.py, line 434 at r6 (raw file):
Previously, jgraham wrote…
Why do we care about the maximum payload size changing, but not the data changing length?
payload_size is the largest size the next payload can be, which can change depending on stream capacity and WindowUpdate events received from the client. So this checks how big the next chunk of data to be sent down the pipe will be, sends that chunk, and decreases the counter keeping track of how much data is left.
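The loop being discussed can be sketched as follows; this is a simplified stand-in where get_max_payload_size and the frame writer are passed in as callables so the flow-control re-check is easy to see (in the diff this is a method on the response writer):

```python
import io

def write_in_frames(data, data_len, get_max_payload_size, write_frame):
    """Send a file-like `data` object as a series of DATA frames.

    The permitted payload size is re-queried after every frame because
    WindowUpdate events from the client can change the stream's
    flow-control window while we are still writing.
    """
    payload_size = get_max_payload_size()
    while data_len > payload_size:
        write_frame(data.read(payload_size), False)
        data_len -= payload_size
        payload_size = get_max_payload_size()
    # The remainder fits in one frame; flag it as the end of the stream.
    write_frame(data.read(), True)
```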
tools/wptserve/wptserve/response.py, line 438 at r6 (raw file):
Previously, jgraham wrote…
Can we call this write_content_frame and make it public, so that a user could e.g. write a too-long frame or something as a test.
I don't see why not
tools/wptserve/wptserve/server.py, line 382 at r1 (raw file):
Previously, dhdavvie (David H) wrote…
Will do
Done.
tools/wptserve/wptserve/server.py, line 344 at r6 (raw file):
Previously, jgraham wrote…
I think we want one thread per stream_id and a queue to send the data to that thread, right? Is that coming in the next part? Maybe add a TODO for now.
For the time being, it seems to make more sense to have a thread per entire request-response interaction and kill the thread after the interaction is done. One reason for this is that a single interaction could happen over multiple streams, for example if PUSH frames are involved. It seems like it would add more complications to have threads per stream waiting for something to happen on that stream (which may never happen; Firefox often avoids reusing stream_ids within a certain timeframe, by the looks of it). This may be something to consider in the future, but it currently seems like keeping the threads scoped to a request-response scenario makes the most sense.
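The thread-per-interaction approach amounts to something like the following sketch (dispatch_request and its arguments are made-up names for illustration; the diff's actual call spawns a thread running finish_handling):

```python
import threading

def dispatch_request(handler_state, finish_handling):
    """Run one request-response interaction in its own thread.

    The connection loop stays free to keep reading frames (e.g.
    WindowUpdate) while the response is produced; the thread dies when
    the interaction completes, rather than living per stream_id.
    """
    t = threading.Thread(target=finish_handling, args=(handler_state,))
    t.daemon = True  # don't block server shutdown on in-flight handlers
    t.start()
    return t
```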
tools/wptserve/wptserve/server.py, line 370 at r6 (raw file):
Previously, jgraham wrote…
OrderedDict, perhaps.
Good idea
Force-pushed 985440f to e9550b5
jgraham
left a comment
Reviewed 3 of 186 files at r7, 184 of 184 files at r8.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jgraham, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):
Previously, dhdavvie (David H) wrote…
This looks very neat, I will look into implementing something like this. Should I push that to this PR or save it for the next?
I think it's a relatively small change to add it to this PR.
Force-pushed 1837f4e to 7b30d07
dheiberg
left a comment
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jgraham, @AutomatedTester, @annevk, @gsnedders, and @andreastt)
tools/wptserve/wptserve/response.py, line 409 at r6 (raw file):
Previously, jgraham wrote…
I think it's a relatively small change to add it to this PR.
Done.
…onnections on port 9000, and serve directories.
* Added event handling for ConnectionTerminated
* Added backwards compatibility with HTTP/1.1 Python handler files.
* Made it so a command line flag is needed to start the H2 server, and it is no longer started by default.
* Refactored compatibility checking when launching HTTP/2.0 server
* Created H1 and H2 handler variants, keeping shared logic in a base class
* Added a `raw_headers` field to the H2Headers object, so that the non-converted keys are exposed
* Added basic multithreading support. Requests now get placed in a thread to finish handling, whilst the server listens for more requests/window updates.
* Created a ConnectionGuard object for the H2 connection object for thread safety as per jgraham's suggestion
Force-pushed 7b30d07 to 16f8389
jgraham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 188 files at r9, 184 of 184 files at r10.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @AutomatedTester, @annevk, @gsnedders, and @andreastt)
Hey, so this is a prototype/proof of concept of how to integrate HTTP/2.0 into the current wptserve infrastructure.
I'm attempting to use as much of the existing infrastructure as possible, as it gives me functionality for free (such as handling TCP connections and threading client connections), and also means I don't have to re-invent the wheel and risk making mistakes in the various bits of infrastructure I might have to redo.
Currently, this prototype is able to serve directories, handle multiple h2 connections simultaneously, as well as h2 connections alongside h1 requests. Obviously it is going to have many bugs, but please point out anything glaring that doesn't have a note in the comments.
So the general idea here is to launch a H2 server on a port, 9000 currently, using the existing WebTestServer in server.py by wrapping the socket with an ssl context that sets up the ALPN protocol to negotiate H2. Because of this, the server has to be run on python 2.7.15+, as it requires OpenSSL 1.0.2+. When the server receives the first request/data from the client, it will enter the handle_one_request method in WebTestRequestHandler, and in here I simply move to a H2 handler: handle_h2.
In handle_one_request, I have moved the logic that handles the parsed request to another method: finish_handling. This is because the H2 handler will be reusing that logic, so by placing it in a separate method, both H1 and H2 handlers are able to use it.
The handle_h2 method is fairly straightforward: it initiates the HTTP/2.0 connection, and then enters a loop that continuously reads directly from the socket and parses the H2 frames. This changes the way the original handler works, as it now handles the entire connection from start to finish, instead of a single request at a time in a one-and-done fashion.
The Request object did not need to be changed much, just adding a h2_stream_id attribute. On the Response side, I created a H2Response object which mostly just changes which writer methods are called. The write_content method in here is a bit hacky: it uses a try/except to check for the last iteration of the generator, because you need to know which item from the generator is the last, as you need to flag the end of the stream. Please let me know if you know of a better way! The H2ResponseWriter handles the construction of the H2 frames and sending them via the socket.
Please let me know what you think. This currently feels like the right approach to implementing this solution, but I would love some extra eyes!
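For the "last iteration of the generator" problem mentioned above, one alternative to the try/except is a small peek-ahead wrapper. This is a sketch only, with a made-up name, not the code in the diff:

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs by peeking one item ahead, so the
    final chunk can carry the end-of-stream flag on its DATA frame."""
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return  # nothing to send
    for item in it:
        yield prev, False
        prev = item
    yield prev, True
```

The writer could then loop `for chunk, last in with_last_flag(content)` and pass `last` straight through to the frame-writing call.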
P.S. The large additions and file changes are due to adding the dependencies (h2, hyperframe, hpack, enum and certifi) in the third_party folder. To view the actual changes that implement the H2 server, view the changes made by the first commit. Also sorry for mini WOT.
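The ALPN setup described in the write-up boils down to roughly this (a sketch using modern Python 3 ssl names; the PR itself targets Python 2.7.15+, and the function name and certificate paths here are placeholders):

```python
import ssl

def wrap_socket_for_h2(sock, certfile, keyfile):
    # Advertise 'h2' via ALPN so clients can negotiate HTTP/2 during
    # the TLS handshake; this needs OpenSSL 1.0.2+, which is why the
    # server requires Python 2.7.15+ (or a modern Python 3).
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    ctx.set_alpn_protocols(['h2'])
    return ctx.wrap_socket(sock, server_side=True)
```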
Edit: I have made it so the h2 server is no longer started by default, since it is not ready for use and should be explicitly started. It is now started by using ./wpt serve --h2. This change is