author | Julien Cristau <julien.cristau@logilab.fr> |
date | Thu, 01 Oct 2015 11:42:29 +0200 |
changeset 10746 | 407385314c0d |
parent 10745 | 5318337e7128 |
child 10907 | 9ae707db5265 |
permissions | -rw-r--r-- |
# -*- coding: utf-8 -*-
'''
Parser for multipart/form-data
==============================

This module provides a parser for the multipart/form-data format. It can read
from a file, a socket or a WSGI environment. The parser can be used to replace
cgi.FieldStorage (without the bugs) and works with Python 2.5+ and 3.x (2to3).

Licence (MIT)
-------------

Copyright (c) 2010, Marcel Hellkamp.
Inspired by the Werkzeug library: http://werkzeug.pocoo.org/

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

'''

__author__ = 'Marcel Hellkamp'
__version__ = '0.1'
__license__ = 'MIT'

from tempfile import TemporaryFile
from wsgiref.headers import Headers
import re, sys
try:
    from io import BytesIO
except ImportError: # pragma: no cover (fallback for Python 2.5)
    from StringIO import StringIO as BytesIO

from six import PY3, text_type
from six.moves.urllib.parse import parse_qs

##############################################################################
################################ Helper & Misc ################################
##############################################################################
# Some of these were copied from bottle: http://bottle.paws.de/

try:
    from collections import MutableMapping as DictMixin
except ImportError: # pragma: no cover (fallback for Python 2.5)
    from UserDict import DictMixin

class MultiDict(DictMixin):
    """ A dict that remembers old values for each key """
    def __init__(self, *a, **k):
        self.dict = dict()
        for k, v in dict(*a, **k).items():
            self[k] = v

    def __len__(self): return len(self.dict)
    def __iter__(self): return iter(self.dict)
    def __contains__(self, key): return key in self.dict
    def __delitem__(self, key): del self.dict[key]
    def keys(self): return self.dict.keys()
    def __getitem__(self, key): return self.get(key, KeyError, -1)
    def __setitem__(self, key, value): self.append(key, value)

    def append(self, key, value): self.dict.setdefault(key, []).append(value)
    def replace(self, key, value): self.dict[key] = [value]
    def getall(self, key): return self.dict.get(key) or []

    def get(self, key, default=None, index=-1):
        if key not in self.dict and default != KeyError:
            return [default][index]
        return self.dict[key][index]

    def iterallitems(self):
        for key, values in self.dict.items():
            for value in values:
                yield key, value
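
# Illustration only (not part of the original module): a minimal sketch of the
# storage scheme that MultiDict is built on, written against a plain dict so
# that it runs on its own. The names ``store`` and ``remember`` are
# hypothetical and exist only for this example.

```python
# Every key maps to a list of values; the newest value sits at the end.
store = {}

def remember(key, value):
    # Same idea as MultiDict.append: create the list on first use, then append.
    store.setdefault(key, []).append(value)

remember('tag', 'a')
remember('tag', 'b')
remember('user', 'alice')

latest = store['tag'][-1]           # item access returns the newest value
all_tags = store.get('tag') or []   # getall-style lookup returns every value
missing = store.get('nope') or []   # unknown keys give an empty list
# iterallitems-style flattening into (key, value) pairs
pairs = sorted((k, v) for k, values in store.items() for v in values)
```

# Keeping a list per key is what lets repeated form fields (for example
# several inputs sharing one name) survive parsing instead of overwriting
# each other.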

def tob(data, enc='utf8'): # Convert strings to bytes (py2 and py3)
    return data.encode(enc) if isinstance(data, text_type) else data

def copy_file(stream, target, maxread=-1, buffer_size=2**16):
    ''' Read from :stream and write to :target until :maxread or EOF. '''
    size, read = 0, stream.read
    while 1:
        to_read = buffer_size if maxread < 0 else min(buffer_size, maxread-size)
        part = read(to_read)
        if not part: return size
        target.write(part)
        size += len(part)
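
# Illustration only (not part of the original module): a standalone sketch of
# the capped, chunked copy loop used by copy_file. ``copy_capped`` is a
# hypothetical name; the loop is restated here only so the snippet runs
# without importing this module.

```python
from io import BytesIO

def copy_capped(stream, target, maxread, buffer_size):
    # Copy in chunks of at most buffer_size bytes; a negative maxread
    # means "copy until EOF".
    size = 0
    while True:
        # Never request more than the remaining allowance.
        to_read = buffer_size if maxread < 0 else min(buffer_size, maxread - size)
        part = stream.read(to_read)
        if not part:
            return size
        target.write(part)
        size += len(part)

source = BytesIO(b'0123456789')
capped = BytesIO()
copied = copy_capped(source, capped, 4, 3)      # stop after 4 bytes
rest = source.read()                            # bytes left in the source

everything = BytesIO()
total = copy_capped(BytesIO(b'0123456789'), everything, -1, 4)  # no cap
```

# Once the cap is reached the next request is for zero bytes, which yields an
# empty read and ends the loop, so the source stream is never read past the
# allowance.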

##############################################################################
################################ Header Parser ################################
##############################################################################

_special = re.escape('()<>@,;:\\"/[]?={} \t')
_re_special = re.compile('[%s]' % _special)
_qstr = '"(?:\\\\.|[^"])*"' # Quoted string
_value = '(?:[^%s]+|%s)' % (_special, _qstr) # Safe or quoted string
_option = r'(?:;|^)\s*([^%s]+)\s*=\s*(%s)' % (_special, _value)
_re_option = re.compile(_option) # key=value part of a Content-Type-like header

def header_quote(val):
    if not _re_special.search(val):
        return val
    return '"' + val.replace('\\','\\\\').replace('"','\\"') + '"'

def header_unquote(val, filename=False):
    if val[0] == val[-1] == '"':
        val = val[1:-1]
        if val[1:3] == ':\\' or val[:2] == '\\\\':
            val = val.split('\\')[-1] # fix ie6 bug: full path --> filename
        return val.replace('\\\\','\\').replace('\\"','"')
    return val

def parse_options_header(header, options=None):
    if ';' not in header:
        return header.lower().strip(), {}
    ctype, tail = header.split(';', 1)
    options = options or {}
    for match in _re_option.finditer(tail):
        key = match.group(1).lower()
        value = header_unquote(match.group(2), key=='filename')
        options[key] = value
    return ctype, options
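
# Illustration only (not part of the original module): a condensed, standalone
# sketch of how a Content-Disposition-like header is split into a main value
# and an options dict. The names ``SPECIAL``, ``OPTION_RE``, ``unquote`` and
# ``parse_options`` are hypothetical stand-ins for the helpers above, restated
# so that the snippet runs without importing this module.

```python
import re

SPECIAL = re.escape('()<>@,;:\\"/[]?={} \t')
QSTR = '"(?:\\\\.|[^"])*"'                  # quoted string
VALUE = '(?:[^%s]+|%s)' % (SPECIAL, QSTR)   # bare token or quoted string
OPTION_RE = re.compile(r'(?:;|^)\s*([^%s]+)\s*=\s*(%s)' % (SPECIAL, VALUE))

def unquote(val):
    # Strip surrounding quotes, reduce a full Windows path to its last
    # component, then undo backslash escaping.
    if val[0] == val[-1] == '"':
        val = val[1:-1]
        if val[1:3] == ':\\' or val[:2] == '\\\\':
            val = val.split('\\')[-1]
        return val.replace('\\\\', '\\').replace('\\"', '"')
    return val

def parse_options(header):
    # No semicolon: the header is just a (normalised) main value.
    if ';' not in header:
        return header.lower().strip(), {}
    main, tail = header.split(';', 1)
    options = {}
    for match in OPTION_RE.finditer(tail):
        options[match.group(1).lower()] = unquote(match.group(2))
    return main, options

plain = parse_options('Text/Plain ')
upload = parse_options('form-data; name="upload"; filename="report.txt"')
ie6 = parse_options('form-data; name="f"; filename="C:\\docs\\report.txt"')
```

# Note that the main value is lower-cased and stripped only when the header
# carries no options; with options present it is returned as split.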
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
137 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
138 |
############################################################################## |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
139 |
################################## Multipart ################################## |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
140 |
############################################################################## |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
141 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
142 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
143 |
class MultipartError(ValueError): pass |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
144 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
145 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
146 |
class MultipartParser(object): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
147 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
148 |
def __init__(self, stream, boundary, content_length=-1, |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
149 |
disk_limit=2**30, mem_limit=2**20, memfile_limit=2**18, |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
150 |
buffer_size=2**16, charset='latin1'): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
151 |
''' Parse a multipart/form-data byte stream. This object is an iterator |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
152 |
over the parts of the message. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
153 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
154 |
:param stream: A file-like stream. Must implement ``.read(size)``. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
155 |
:param boundary: The multipart boundary as a byte string. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
156 |
:param content_length: The maximum number of bytes to read. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
157 |
''' |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
158 |
self.stream, self.boundary = stream, boundary |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
159 |
self.content_length = content_length |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
160 |
self.disk_limit = disk_limit |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
161 |
self.memfile_limit = memfile_limit |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
162 |
self.mem_limit = min(mem_limit, self.disk_limit) |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
163 |
self.buffer_size = min(buffer_size, self.mem_limit) |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
164 |
self.charset = charset |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
165 |
if self.buffer_size - 6 < len(boundary): # "--boundary--\r\n" |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
166 |
raise MultipartError('Boundary does not fit into buffer_size.') |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
167 |
self._done = [] |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
168 |
self._part_iter = None |

    def __iter__(self):
        ''' Iterate over the parts of the multipart message. '''
        if not self._part_iter:
            self._part_iter = self._iterparse()
        for part in self._done:
            yield part
        for part in self._part_iter:
            self._done.append(part)
            yield part

    def parts(self):
        ''' Returns a list with all parts of the multipart message. '''
        return list(iter(self))

    def get(self, name, default=None):
        ''' Return the first part with that name or a default value (None). '''
        for part in self:
            if name == part.name:
                return part
        return default

    def get_all(self, name):
        ''' Return a list of parts with that name. '''
        return [p for p in self if p.name == name]

    def _lineiter(self):
        ''' Iterate over a binary file-like object line by line. Each line is
            returned as a (line, line_ending) tuple. If the line does not fit
            into self.buffer_size, line_ending is empty and the rest of the line
            is returned with the next iteration.
        '''
        read = self.stream.read
        maxread, maxbuf = self.content_length, self.buffer_size
        _bcrnl = tob('\r\n')
        _bcr = _bcrnl[:1]
        _bnl = _bcrnl[1:]
        _bempty = _bcrnl[:0] # b'\r\n'[:0] -> b''
        buffer = _bempty # buffer for the last (partial) line
        while 1:
            data = read(maxbuf if maxread < 0 else min(maxbuf, maxread))
            maxread -= len(data)
            lines = (buffer+data).splitlines(True)
            # an empty stream yields no lines at all; avoid an IndexError
            len_first_line = len(lines[0]) if lines else 0
            # be sure that the first line does not become too big
            if len_first_line > self.buffer_size:
                # at the same time don't split a '\r\n' accidentally
                if (len_first_line == self.buffer_size+1 and
                    lines[0].endswith(_bcrnl)):
                    splitpos = self.buffer_size - 1
                else:
                    splitpos = self.buffer_size
                lines[:1] = [lines[0][:splitpos],
                             lines[0][splitpos:]]
            if data:
                buffer = lines[-1]
                lines = lines[:-1]
            for line in lines:
                if line.endswith(_bcrnl): yield line[:-2], _bcrnl
                elif line.endswith(_bnl): yield line[:-1], _bnl
                elif line.endswith(_bcr): yield line[:-1], _bcr
                else:                     yield line, _bempty
            if not data:
                break
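
The buffering idea behind `_lineiter` can be tried in isolation. The following is a minimal sketch, not the module's implementation: `iter_lines` is a hypothetical stand-alone helper that holds back the last (possibly incomplete) piece of every read, so that a '\r\n' split across two reads is still reported as one line ending. Unlike `_lineiter`, it does not cap the line length at buffer_size and it ignores content_length.

```python
from io import BytesIO


def iter_lines(stream, buffer_size=2**16):
    """Yield (line, line_ending) tuples from a binary file-like object."""
    buffer = b''  # holds the last, possibly incomplete, piece of a read
    while True:
        data = stream.read(buffer_size)
        lines = (buffer + data).splitlines(True)
        if data and lines:
            # More data may follow, so the last piece is not final yet.
            buffer = lines.pop()
        else:
            buffer = b''
        for line in lines:
            if line.endswith(b'\r\n'):
                yield line[:-2], b'\r\n'
            elif line.endswith(b'\n'):
                yield line[:-1], b'\n'
            elif line.endswith(b'\r'):
                yield line[:-1], b'\r'
            else:
                yield line, b''
        if not data:
            break


# Reading two bytes at a time splits the '\r\n' across two reads.
small_reads = list(iter_lines(BytesIO(b'a\r\nb\nc'), 2))
```

With a read size of 2 the carriage return and the line feed arrive in different reads, yet the result matches what a single large read produces.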

    def _iterparse(self):
        lines, line = self._lineiter(), ''
        separator = tob('--') + tob(self.boundary)
        terminator = tob('--') + tob(self.boundary) + tob('--')
        # Consume first boundary. Ignore leading blank lines
        for line, nl in lines:
            if line: break
        if line != separator:
            raise MultipartError("Stream does not start with boundary")
        # For each part in stream...
        mem_used, disk_used = 0, 0 # Track used resources to prevent DoS
        is_tail = False # True if the last line was incomplete (cut off)
        opts = {'buffer_size': self.buffer_size,
                'memfile_limit': self.memfile_limit,
                'charset': self.charset}
        part = MultipartPart(**opts)
        for line, nl in lines:
            if line == terminator and not is_tail:
                part.file.seek(0)
                yield part
                break
            elif line == separator and not is_tail:
                if part.is_buffered(): mem_used += part.size
                else: disk_used += part.size
                part.file.seek(0)
                yield part
                part = MultipartPart(**opts)
            else:
                is_tail = not nl # The next line continues this one
                part.feed(line, nl)
                if part.is_buffered():
                    if part.size + mem_used > self.mem_limit:
                        raise MultipartError("Memory limit reached.")
                elif part.size + disk_used > self.disk_limit:
                    raise MultipartError("Disk limit reached.")
        if line != terminator:
            raise MultipartError("Unexpected end of multipart stream.")
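
How the separator and terminator lines delimit parts can be shown on a small in-memory body. This is a simplified sketch under stated assumptions: `split_parts` is a hypothetical helper, the whole body is held in memory and split on '\r\n' only, text before the first separator is skipped rather than rejected, and no memory or disk limits are enforced. `_iterparse` handles all of these differently.

```python
def split_parts(body, boundary):
    """Split a complete multipart body into lists of raw lines, one per part."""
    separator = b'--' + boundary
    terminator = separator + b'--'
    parts, current = [], None
    for line in body.split(b'\r\n'):
        if line == terminator:
            # The terminator closes the last part and ends the message.
            if current is not None:
                parts.append(current)
            return parts
        if line == separator:
            # A separator closes the previous part (if any) and opens a new one.
            if current is not None:
                parts.append(current)
            current = []
        elif current is not None:
            current.append(line)
    raise ValueError('Unexpected end of multipart stream.')


body = (b'--XyZ\r\n'
        b'Content-Disposition: form-data; name="a"\r\n'
        b'\r\n'
        b'hello\r\n'
        b'--XyZ\r\n'
        b'Content-Disposition: form-data; name="b"\r\n'
        b'\r\n'
        b'world\r\n'
        b'--XyZ--\r\n')
parts = split_parts(body, b'XyZ')
```

Each entry of `parts` holds the header lines, an empty line, and then the body lines of one part. A body that lacks the terminator line raises an error, mirroring the final check in `_iterparse`.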


class MultipartPart(object):

    def __init__(self, buffer_size=2**16, memfile_limit=2**18, charset='latin1'):
        self.headerlist = []
        self.headers = None
        self.file = False
        self.size = 0
        self._buf = tob('')
        self.disposition, self.name, self.filename = None, None, None
        self.content_type, self.charset = None, charset
        self.memfile_limit = memfile_limit
        self.buffer_size = buffer_size

    def feed(self, line, nl=''):
        if self.file:
            return self.write_body(line, nl)
        return self.write_header(line, nl)

    def write_header(self, line, nl):
        line = line.decode(self.charset or 'latin1')
        if not nl: raise MultipartError('Unexpected end of line in header.')
        if not line.strip(): # blank line -> end of header segment
            self.finish_header()
        elif line[0] in ' \t' and self.headerlist:
            name, value = self.headerlist.pop()
            self.headerlist.append((name, value+line.strip()))
        else:
            if ':' not in line:
                raise MultipartError("Syntax error in header: No colon.")
            name, value = line.split(':', 1)
            self.headerlist.append((name.strip(), value.strip()))

    def write_body(self, line, nl):
        if not line and not nl: return # This does not even flush the buffer
        self.size += len(line) + len(self._buf)
        self.file.write(self._buf + line)
        self._buf = nl
        if self.content_length > 0 and self.size > self.content_length:
            raise MultipartError('Size of body exceeds Content-Length header.')
        if self.size > self.memfile_limit and isinstance(self.file, BytesIO):
            # TODO: What about non-file uploads that exceed the memfile_limit?
            self.file, old = TemporaryFile(mode='w+b'), self.file
            old.seek(0)
            copy_file(old, self.file, self.size, self.buffer_size)

    def finish_header(self):
        self.file = BytesIO()
        self.headers = Headers(self.headerlist)
        cdis = self.headers.get('Content-Disposition','')
        ctype = self.headers.get('Content-Type','')
        clen = self.headers.get('Content-Length','-1')
        if not cdis:
            raise MultipartError('Content-Disposition header is missing.')
        self.disposition, self.options = parse_options_header(cdis)
        self.name = self.options.get('name')
        self.filename = self.options.get('filename')
        self.content_type, options = parse_options_header(ctype)
        self.charset = options.get('charset') or self.charset
        self.content_length = int(clen)

    def is_buffered(self):
        ''' Return true if the data is fully buffered in memory.'''
        return isinstance(self.file, BytesIO)

    @property
    def value(self):
        ''' Data decoded with the specified charset '''
        pos = self.file.tell()
        self.file.seek(0)
        val = self.file.read()
        self.file.seek(pos)
        return val.decode(self.charset)

    def save_as(self, path):
        fp = open(path, 'wb')
        pos = self.file.tell()
        try:
            self.file.seek(0)
            size = copy_file(self.file, fp)
        finally:
            self.file.seek(pos)
            fp.close() # do not leak the file handle
        return size
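
The switch from an in-memory buffer to a temporary file, as done in `write_body` and reported by `is_buffered`, can be sketched on its own. This is an illustration under assumptions, not the class above: `SpillingBuffer` is a hypothetical name, it has no header handling and no deferred line ending, and it copies the old contents with `getvalue()` instead of the module's `copy_file` helper.

```python
from io import BytesIO
from tempfile import TemporaryFile


class SpillingBuffer(object):
    """Keep data in memory until memfile_limit is exceeded, then use disk."""

    def __init__(self, memfile_limit=2**18):
        self.file = BytesIO()
        self.size = 0
        self.memfile_limit = memfile_limit

    def write(self, chunk):
        self.file.write(chunk)
        self.size += len(chunk)
        if self.size > self.memfile_limit and isinstance(self.file, BytesIO):
            # Move everything written so far into a temporary file on disk.
            old, self.file = self.file, TemporaryFile(mode='w+b')
            self.file.write(old.getvalue())

    def is_buffered(self):
        """Return True while the data still lives in memory."""
        return isinstance(self.file, BytesIO)


buf = SpillingBuffer(memfile_limit=4)
buf.write(b'abc')
in_memory_at_first = buf.is_buffered()
buf.write(b'de')  # total size 5 exceeds the limit of 4
```

After the second write the data lives in a temporary file, but it is still read back through the same `file` attribute, which is why callers only need `is_buffered()` to tell the two cases apart.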
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
355 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
356 |
############################################################################## |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
357 |
#################################### WSGI #################################### |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
358 |
############################################################################## |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
359 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
360 |
def parse_form_data(environ, charset='utf8', strict=False, **kw): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
361 |
''' Parse form data from an environ dict and return a (forms, files) tuple. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
362 |
Both tuple values are dictionaries with the form-field name as a key |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
363 |
(unicode) and lists as values (multiple values per key are possible). |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
364 |
The forms-dictionary contains form-field values as unicode strings. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
365 |
The files-dictionary contains :class:`MultipartPart` instances, either |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
366 |
because the form-field was a file-upload or the value is too big to fit |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
367 |
into memory limits. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
368 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
369 |
:param environ: A WSGI environment dict. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
370 |
:param charset: The charset to use if unsure. (default: utf8) |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
371 |
:param strict: If True, raise :exc:`MultipartError` on any parsing |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
372 |
errors. These are silently ignored by default. |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
373 |
''' |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
374 |
|
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
375 |
forms, files = MultiDict(), MultiDict() |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
376 |
try: |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
377 |
if environ.get('REQUEST_METHOD', 'GET').upper() not in ('POST', 'PUT'): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
378 |
raise MultipartError("Request method other than POST or PUT.") |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
379 |
content_length = int(environ.get('CONTENT_LENGTH') or '-1') |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
380 |
content_type = environ.get('CONTENT_TYPE', '') |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
381 |
if not content_type: |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
382 |
raise MultipartError("Missing Content-Type header.") |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
383 |
content_type, options = parse_options_header(content_type) |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
384 |
stream = environ.get('wsgi.input') or BytesIO() |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
385 |
kw['charset'] = charset = options.get('charset', charset) |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
386 |
if content_type == 'multipart/form-data': |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
387 |
boundary = options.get('boundary', '') |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
388 |
if not boundary: |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
389 |
raise MultipartError("No boundary for multipart/form-data.") |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
390 |
for part in MultipartParser(stream, boundary, content_length, **kw): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
391 |
if part.filename or not part.is_buffered(): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
392 |
files[part.name] = part |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
393 |
else: # TODO: Big form-fields are in the files dict. really? |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
394 |
forms[part.name] = part.value |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
395 |
elif content_type in ('application/x-www-form-urlencoded', |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
396 |
'application/x-url-encoded'): |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
397 |
mem_limit = kw.get('mem_limit', 2**20) |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
398 |
if content_length > mem_limit: |
10746
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
399 |
raise MultipartError("Request too big. Increase MAXMEM.") |
9946
ec88c1a1904a
[wsgi] Fix unicode decoding in POST
Christophe de Vienne <christophe@unlish.com>
parents:
9735
diff
changeset
|
400 |
data = stream.read(mem_limit) |
9735
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
401 |
if stream.read(1): # There is more data than fits in mem_limit |
10746
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
402 |
raise MultipartError("Request too big. Increase MAXMEM.") |
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
403 |
if PY3: |
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
404 |
data = data.decode('ascii') |
9735
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
405 |
data = parse_qs(data, keep_blank_values=True) |
10662
10942ed172de
[py3k] dict.iteritems → dict.items
Rémi Cardona <remi.cardona@logilab.fr>
parents:
10603
diff
changeset
|
406 |
for key, values in data.items(): |
9735
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
407 |
for value in values: |
10746
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
408 |
if PY3: |
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
409 |
forms[key] = value |
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
410 |
else: |
407385314c0d
[multipart] decode form data before calling parse_qs in python 3
Julien Cristau <julien.cristau@logilab.fr>
parents:
10745
diff
changeset
|
411 |
forms[key.decode(charset)] = value.decode(charset) |
9735
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
412 |
else: |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
413 |
raise MultipartError("Unsupported content type.") |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
414 |
except MultipartError: |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
415 |
if strict: raise |
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
416 |
return forms, files |
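The urlencoded branch of parse_form_data can be tried in isolation with the standard library alone. The sketch below assumes Python 3 and is not the module's implementation: `parse_urlencoded` is a hypothetical name, `ValueError` stands in for `MultipartError`, and the plain dict of lists returned by `parse_qs` stands in for `MultiDict`.

```python
from io import BytesIO
from urllib.parse import parse_qs


def parse_urlencoded(environ, mem_limit=2**20):
    """Parse a urlencoded request body of at most `mem_limit` bytes."""
    # CONTENT_LENGTH may be absent or empty in a WSGI environ.
    content_length = int(environ.get('CONTENT_LENGTH') or -1)
    if content_length > mem_limit:
        raise ValueError('request body exceeds mem_limit')
    stream = environ.get('wsgi.input') or BytesIO()
    # Never read more than mem_limit bytes, whatever the header claims.
    data = stream.read(mem_limit)
    if stream.read(1):
        # The body is longer than the declared or allowed size.
        raise ValueError('request body exceeds mem_limit')
    # parse_qs expects text on Python 3; a urlencoded body is plain ASCII.
    return parse_qs(data.decode('ascii'), keep_blank_values=True)


if __name__ == '__main__':
    body = b'a=1&a=2&b='
    environ = {'CONTENT_LENGTH': str(len(body)), 'wsgi.input': BytesIO(body)}
    print(parse_urlencoded(environ))
```

The second `read(1)` is what guards against a client that sends more data than its CONTENT_LENGTH header announces, or that sends no header at all.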
b71158815bc8
[wsgi] avoid reading the entire request body in memory
Julien Cristau <julien.cristau@logilab.fr>
parents:
diff
changeset
|
417 |