PadLock processing multiple blocks at a time

Michal Ludvig michal at logix.cz
Tue Jan 11 18:03:57 CET 2005


Hi all,

I have got some improvements for VIA PadLock crypto driver.

1. Generic extension to crypto/cipher.c that allows offloading the 
   encryption of the whole buffer in a given mode (CBC, ...) to the 
   algorithm provider (i.e. PadLock). Basically it extends 'struct 
   cipher_alg' by some new fields:

@@ -69,6 +73,18 @@ struct cipher_alg {
                          unsigned int keylen, u32 *flags);
        void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
        void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+       size_t cia_max_nbytes;
+       size_t cia_req_align;
+       void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
 };

  If cia_<mode> is non-NULL that function is used instead of the 
  software <mode>_process chaining function (e.g. cbc_process()). In the 
  case of PadLock it can significantly speed-up the {en,de}cryption.

2. On top of this I have an extension of the padlock module to support 
   this scheme.

I will send both patches in separate follow ups.

The speedup gained by this change is quite significant (measured with 
bonnie on ext2 over dm-crypt with aes128):

			No encryption	2.6.10-bk1	multiblock
Writing with putc()	10454 (100%)	7479  (72%)	9353  (89%)
Rewriting		16510 (100%)	7628  (46%)	10611 (64%)
Writing intelligently	61128 (100%)	21132 (35%)	48103 (79%)
Reading with getc()	9406  (100%)	6916  (74%)	8801  (94%)
Reading intelligently	35885 (100%)	15271 (43%)	23202 (65%)

Numbers are in kB/s, percents show the slowdown from plaintext run. 
As can be seen, the multiblock encryption is significantly faster 
in comparsion to the already comitted single-block-at-a-time 
processing.

More statistics (e.g. comparsion with aes.ko and aes-i586.ko) are 
available at http://www.logix.cz/michal/devel/padlock/bench.xp

Dave, if you're OK with these changes, please merge them.

Michal Ludvig
-- 
* A mouse is a device used to point at the xterm you want to type in.
* Personal homepage - http://www.logix.cz/michal


More information about the CryptoAPI mailing list