Python's email module provides a set of parsers for email messages, among them:
bytes sequence and produces an EmailMessage, with all of its data and meta-data kept in memory;
The problem with the first approach is that if we choose to encrypt an EmailMessage, its contents are serialised by an appropriate email.generator implementation rather than reproduced in their original form. This is OK for payloads like pure ASCII text or Unicode text in Latin scripts, but things get tricky when we want to transfer non-Latin scripts (e.g. Japanese, Chinese, or any of the Cyrillic alphabets). (This is mostly caused by email.generator choosing a different Content-Transfer-Encoding than the one used in the original message.)
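The effect can be observed with a minimal sketch like the one below. It assumes the entity is rebuilt from decoded content with set_content(); the sample text and the choice of base64 for the original message are made up for illustration. Which encoding the library picks depends on the content and on the policy in use, so the script only prints both values instead of promising a particular result.

```python
import base64
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Illustrative sample: Cyrillic text that the sender chose to transfer as base64.
TEXT = "Привет, мир!\n"
original_bytes = (
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: base64\n"
    b"MIME-Version: 1.0\n"
    b"\n"
    + base64.encodebytes(TEXT.encode("utf-8"))
)

parsed = BytesParser(policy=policy.default).parsebytes(original_bytes)

# Rebuild the entity from the decoded text; the library picks the
# Content-Transfer-Encoding on its own here.
rebuilt = EmailMessage()
rebuilt.set_content(parsed.get_content())

original_cte = str(parsed["Content-Transfer-Encoding"])
rebuilt_cte = str(rebuilt["Content-Transfer-Encoding"])
print("original:", original_cte)
print("rebuilt: ", rebuilt_cte)
```

If the two printed values differ, the recipient gets the same text but a different byte stream than the one the sender produced.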
BytesHeaderParser and its string counterpart let us parse only the headers and keep the original message body in memory. The drawback is that we can't process multipart messages as sequences of MIME entities, which we need for one of Lacre's modes of operation. (There are two: PGP/MIME, with the whole body encrypted, and PGP/Inline, with each part of a multipart message encrypted separately.) So we could use BytesHeaderParser with PGP/MIME only, because a multipart message would be impossible to handle in PGP/Inline mode.
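The difference between the two parsers can be sketched as follows. The message, its boundary and its subject are invented for the example; only the standard library parsers are used.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

# Illustrative two-part message; boundary and contents are made up.
RAW = b"\n".join([
    b'Content-Type: multipart/mixed; boundary="XYZ"',
    b"MIME-Version: 1.0",
    b"Subject: test",
    b"",
    b"--XYZ",
    b'Content-Type: text/plain; charset="utf-8"',
    b"",
    b"first part",
    b"--XYZ",
    b'Content-Type: text/plain; charset="utf-8"',
    b"",
    b"second part",
    b"--XYZ--",
    b"",
])

# Header-only parsing: the body stays one opaque string, boundaries included.
headers_only = BytesHeaderParser(policy=policy.default).parsebytes(RAW)
body = headers_only.get_payload()
print(headers_only["Subject"], headers_only.is_multipart(), type(body).__name__)

# Full parsing: the body is split into separate MIME entities.
full = BytesParser(policy=policy.default).parsebytes(RAW)
parts = list(full.iter_parts())
print(full.is_multipart(), len(parts))
```

With the header-only parser there are no parts to iterate over, which is why per-part encryption (PGP/Inline) cannot be built on top of it.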
It turns out that we can avoid some of the transformations:
To achieve that, we'd need to adjust the flow and:
Envelope initially to identify identities and prepare for encryption.
lacre.core, function delivery_plan).
It turns out that when Thunderbird is expected to send a plain text message (something that would usually become a text/plain MIME entity accompanied by a bunch of headers), it does the following:
multipart/encrypted MIME entity (RFC 4880 and RFC 3156 compliant);
multipart/mixed MIME entity;
Populating the contents depends on the mode:
Subject: is encrypted too, original Subject: is replaced with the string ….
multipart/mixed entity.
multipart/mixed entity.
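For reference, the outer multipart/encrypted structure described in RFC 3156 can be sketched as below. This is neither Thunderbird's nor Lacre's code: wrap_pgp_mime is a hypothetical helper, no encryption is performed, and the ciphertext is a placeholder standing in for the armored, encrypted multipart/mixed entity.

```python
from email import encoders
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def wrap_pgp_mime(armored_ciphertext: bytes) -> MIMEMultipart:
    """Hypothetical helper: wrap already-encrypted data per RFC 3156."""
    outer = MIMEMultipart("encrypted", protocol="application/pgp-encrypted")
    # First part: control information, always "Version: 1".
    control = MIMEApplication(
        b"Version: 1\n", "pgp-encrypted", _encoder=encoders.encode_7or8bit
    )
    # Second part: the encrypted data itself.
    data = MIMEApplication(
        armored_ciphertext, "octet-stream", _encoder=encoders.encode_7or8bit
    )
    outer.attach(control)
    outer.attach(data)
    return outer


# Placeholder only; real input would be the output of the encryption step.
PLACEHOLDER = b"-----BEGIN PGP MESSAGE-----\n\n(placeholder)\n-----END PGP MESSAGE-----\n"
message = wrap_pgp_mime(PLACEHOLDER)
part_types = [part.get_content_type() for part in message.get_payload()]
print(message.get_content_type(), part_types)
```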
Unfortunately, we need to compose a new MIME entity for every PGP/MIME-encrypted message, so we need to load the contents of each part. To avoid the transformation, we could switch from get_content() to get_payload(decode=False), so we'd get strings with raw MIME entity payloads, giving us a chance to populate multipart/mixed in step 2 above with non-transformed payloads.
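A small sketch of the difference between the two calls, under the assumption that the message is parsed with policy.default. The message contents and boundary are invented for the example.

```python
import base64
from email import policy
from email.parser import BytesParser

# Illustrative sample: one base64-encoded Cyrillic part and one ASCII part.
TEXT = "Привет, мир!\n"
ENCODED = base64.encodebytes(TEXT.encode("utf-8")).decode("ascii").strip()

raw = "\n".join([
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "MIME-Version: 1.0",
    "",
    "--XYZ",
    'Content-Type: text/plain; charset="utf-8"',
    "Content-Transfer-Encoding: base64",
    "",
    ENCODED,
    "--XYZ",
    'Content-Type: text/plain; charset="us-ascii"',
    "",
    "plain ASCII part",
    "--XYZ--",
    "",
]).encode("ascii")

msg = BytesParser(policy=policy.default).parsebytes(raw)
parts = list(msg.iter_parts())

# get_content() undoes the transfer encoding and the charset.
decoded_first = parts[0].get_content()

# get_payload(decode=False) returns each payload as it was transferred.
raw_payloads = [part.get_payload(decode=False) for part in parts]

for part, payload in zip(parts, raw_payloads):
    print(part.get_content_type(), part["Content-Transfer-Encoding"], repr(payload))
```

Note that the raw payload is only meaningful together with the part's original Content-Type and Content-Transfer-Encoding headers, so those would have to be carried over along with it.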