Differences

This shows you the differences between two versions of the page.

--- lacre:ongoing [2023/12/03 10:59] – remove finished items pfm
+++ lacre:ongoing [2024/04/26 17:22] (current) – add: further transformation research pfm
@@ Line 2: / Line 2: @@
   - "Great renaming" --- our fork is ours, let's reflect it with unifying the names used all over the code (see [[lacreissue>81|issue 81]]).
+===== Reliability improvements =====
+  * Key lifecycle analysis and fixes. I've documented the lifecycle with Graphviz (see [[https://git.disroot.org/Disroot/gpg-lacre/src/commit/4c1853dd8a260d7358e164a16a1fdb7cfcd06323/doc/key-lifecycle.gv|doc/key-lifecycle.gv]]). {{ :lacre:key-lifecycle.png?200|}}
+  * Database transactions.
+  * Another attempt at fixing encoding issues: Disroot users have reported broken encodings.
+===== Research: avoiding message transformation =====
+Python's ''email'' module provides a set of parsers for email messages, among them:
+  * [[https://docs.python.org/3/library/email.parser.html#email.parser.BytesParser|BytesParser]], which just consumes a ''bytes'' sequence and produces an ''EmailMessage'', with all of its data and meta-data kept in memory;
+  * [[https://docs.python.org/3/library/email.parser.html#email.parser.BytesHeaderParser|BytesHeaderParser]], which only parses headers and doesn't provide features to access parts of a multi-part message.
+The problem with the first approach is that if we choose to encrypt an ''EmailMessage'', its contents are serialised by an appropriate [[https://docs.python.org/3/library/email.generator.html|email.generator]] implementation, so the output is a re-generated message rather than the original content. This is fine for payloads such as pure-ASCII text or Unicode text in Latin scripts, but things get tricky when we want to transfer non-Latin scripts (e.g. Japanese, Chinese, or any of the Cyrillic alphabets). This is mostly caused by ''email.generator'' choosing a different ''Content-Transfer-Encoding'' than the one used by the original message.
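A minimal sketch of one way such a transformation can arise, using only the standard library. The sample message, its subject and the choice of ''policy.SMTP'' are assumptions made for illustration; this is not Lacre code. It decodes a quoted-printable Cyrillic body and stores it again, which lets the library pick the transfer encoding on its own:

```python
from email import policy
from email.parser import BytesParser

# Hypothetical sample: UTF-8 Cyrillic text ("Привет") sent as quoted-printable.
original = (
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82\r\n"
)

msg = BytesParser(policy=policy.SMTP).parsebytes(original)

# Decode the body, then store it again. The second call drops the original
# Content-* headers and lets the library choose a transfer encoding.
text = msg.get_content()
msg.set_content(text)
rebuilt = msg.as_bytes()

print("Content-Transfer-Encoding now:", msg["Content-Transfer-Encoding"])
print("identical to original bytes:", rebuilt == original)
```

The text itself survives the round trip, but the bytes on the wire no longer match what the sender produced.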
+''BytesHeaderParser'' and its string counterpart let us parse only the headers and keep the original message body in memory. The drawback is that we can't process multipart messages as sequences of MIME entities, which we need for one of Lacre's modes of operation. (There are two: PGP/MIME, where the whole body is encrypted, and PGP/Inline, where each part of a multipart message is encrypted separately.) We could therefore use ''BytesHeaderParser'' only with PGP/MIME, because a multipart message would be impossible to handle in PGP/Inline mode.
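The trade-off can be illustrated with a short sketch. The sample message, the boundary ''XYZ'' and the addresses are made up for the example, and splitting the raw bytes at the first blank line is just one assumed way of keeping the body untouched:

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

# Hypothetical two-part message with an 8-bit Cyrillic first part.
raw = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: report\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "Привет\r\n".encode("utf-8")
    + b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"second part\r\n"
    b"--XYZ--\r\n"
)

# Header-only parsing: headers are available, the body is not split into parts.
headers_only = BytesHeaderParser(policy=policy.SMTP).parsebytes(raw)

# Keep the body exactly as received by cutting the raw bytes at the first
# blank line (assumes CRLF line endings, as on the SMTP wire).
body = raw.partition(b"\r\n\r\n")[2]

# Full parsing: the same message exposed as a sequence of MIME entities.
full = BytesParser(policy=policy.SMTP).parsebytes(raw)
parts = list(full.iter_parts())

print(headers_only["Subject"], headers_only.get_content_type())
print("header-only sees parts:", headers_only.is_multipart())
print("full parser sees parts:", len(parts))
```

With the header-only parser the declared content type is still ''multipart/mixed'', but the message object does not expose the individual entities.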
+===== Research: avoiding message transformation (part 2) =====
+It turns out that we can avoid //some// of the transformations:
+  * For cleartext messages, already-encrypted messages and messages processed in PGP/MIME mode, we could just take the original message content ([[https://aiosmtpd.aio-libs.org/en/latest/concepts.html#Envelope.original_content|Envelope.original_content]]) and process it. Thus we'd avoid using [[https://docs.python.org/3/library/email.contentmanager.html#module-email.contentmanager|email.contentmanager]], which is the component responsible for decoding MIME entities.
+  * Messages processed in PGP/Inline mode would still risk being transformed before encryption.
+To achieve that, we'd need to adjust the flow and:
+  - Work with a plain ''Envelope'' initially to determine the identities involved and prepare for encryption.
+  - Use the header-only parser mentioned above during delivery planning (''lacre.core'', function ''delivery_plan'').
+  - Finally, if the message is known to be processed in PGP/Inline mode, we could load each MIME entity's body and process it. (Without ContentManager //if possible//.)
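The adjusted flow could look roughly like the sketch below. Everything named here is hypothetical: ''EnvelopeStandIn'' only mimics the ''original_content'' attribute of the aiosmtpd ''Envelope'', ''select_input'' is not the real ''delivery_plan'', the mode strings are invented, and treating ''multipart/encrypted'' as "already encrypted" is a simplifying assumption. Note that ''get_payload(decode=True)'' bypasses the content manager but still undoes the transfer encoding:

```python
from dataclasses import dataclass
from email import policy
from email.parser import BytesHeaderParser, BytesParser


@dataclass
class EnvelopeStandIn:
    """Hypothetical stand-in exposing only Envelope.original_content."""
    original_content: bytes


def select_input(envelope, mode):
    """Return raw bytes when no parsing is needed, else a list of entity bodies.

    `mode` is an invented label: "cleartext", "pgp-mime" or "pgp-inline".
    """
    # Steps 1 and 2: look at headers only; the body stays untouched.
    headers = BytesHeaderParser(policy=policy.SMTP).parsebytes(
        envelope.original_content)
    # Assumption: only PGP/MIME-style encryption is detected here.
    already_encrypted = headers.get_content_type() == "multipart/encrypted"
    if already_encrypted or mode in ("cleartext", "pgp-mime"):
        return envelope.original_content

    # Step 3: PGP/Inline needs each MIME entity, so parse the whole message.
    full = BytesParser(policy=policy.SMTP).parsebytes(envelope.original_content)
    return [
        (part.get_content_type(), part.get_payload(decode=True))
        for part in full.walk()
        if not part.is_multipart()
    ]


sample = EnvelopeStandIn(
    b'Content-Type: multipart/alternative; boundary="B"\r\n'
    b"\r\n"
    b"--B\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"first part\r\n"
    b"--B\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>second part</p>\r\n"
    b"--B--\r\n"
)

untouched = select_input(sample, "pgp-mime")
entities = select_input(sample, "pgp-inline")
print(type(untouched).__name__, [kind for kind, _ in entities])
```

In this sketch only the PGP/Inline branch pays the cost of full parsing; every other branch hands the received bytes on unchanged.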