C# - StackExchange.Redis client very slow compared to benchmark tests

I'm implementing a Redis caching layer using the StackExchange.Redis client, and right now the performance is bordering on unusable.

I have a local environment where the web application and the Redis server are running on the same machine. I ran the redis-benchmark test against the Redis server, and these were the results (I'm including the SET and GET operations in the write-up):
C:\Program Files\Redis>redis-benchmark -n 100000

====== PING_INLINE ======
  100000 requests completed in 0.88 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

====== SET ======
  100000 requests completed in 0.89 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

99.70% <= 1 milliseconds
99.90% <= 2 milliseconds
100.00% <= 3 milliseconds
111982.08 requests per second

====== GET ======
  100000 requests completed in 0.81 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

99.87% <= 1 milliseconds
99.98% <= 2 milliseconds
100.00% <= 2 milliseconds
124069.48 requests per second
So according to these benchmarks, I should be looking at over 100,000 SETs and 100,000 GETs per second. I wrote a unit test to do 300,000 SET/GETs:
private string redisCacheConn = "localhost:6379,allowAdmin=true,abortConnect=false,ssl=false";

[Fact]
public void PerfTestWriteShortString()
{
    CacheManager cm = new CacheManager(redisCacheConn);

    string svalue = "t";
    string skey = "testtesttest";
    for (int i = 0; i < 300000; i++)
    {
        cm.SaveCache(skey + i, svalue);
        string valRead = cm.ObtainItemFromCacheString(skey + i);
    }
}
This uses the following class to perform the Redis operations via the StackExchange client:
using System;
using StackExchange.Redis;

namespace Caching
{
    public class CacheManager : ICacheManager, ICacheManagerReports
    {
        private static string cs;
        private static ConfigurationOptions options;
        private int pageSize = 5000;
        public ICacheSerializer Serializer { get; set; }

        public CacheManager(string connectionString)
        {
            Serializer = new SerializeJSON();
            cs = connectionString;
            options = ConfigurationOptions.Parse(connectionString);
            options.SyncTimeout = 60000;
        }

        private static readonly Lazy<ConnectionMultiplexer> lazyConnection =
            new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));

        private static ConnectionMultiplexer Connection => lazyConnection.Value;

        private static IDatabase cache => Connection.GetDatabase();

        public string ObtainItemFromCacheString(string cacheId)
        {
            return cache.StringGet(cacheId);
        }

        public void SaveCache<T>(string cacheId, T cacheEntry, TimeSpan? expiry = null)
        {
            if (IsValueType<T>())
            {
                cache.StringSet(cacheId, cacheEntry.ToString(), expiry);
            }
            else
            {
                cache.StringSet(cacheId, Serializer.SerializeObject(cacheEntry), expiry);
            }
        }

        public bool IsValueType<T>()
        {
            return typeof(T).IsValueType || typeof(T) == typeof(string);
        }
    }
}
My JSON serializer is using Newtonsoft.Json:
using System.Collections.Generic;
using Newtonsoft.Json;

namespace Caching
{
    public class SerializeJSON : ICacheSerializer
    {
        public string SerializeObject<T>(T cacheEntry)
        {
            return JsonConvert.SerializeObject(cacheEntry, Formatting.None,
                new JsonSerializerSettings()
                {
                    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
                });
        }

        public T DeserializeObject<T>(string data)
        {
            return JsonConvert.DeserializeObject<T>(data,
                new JsonSerializerSettings()
                {
                    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
                });
        }
    }
}
My test times come in at around 21 seconds (for 300,000 SETs and 300,000 GETs). That gives me around 28,500 operations per second (at least 3 times slower than I would expect given the benchmarks). The application I am converting to use Redis is pretty chatty, and certain heavy requests can approximate 200,000 total operations against Redis. Obviously I wasn't expecting the same times I was getting when using the System.Runtime cache, but the delays after this change are significant. Am I doing something wrong with my implementation, and does anyone know why my benchmarked figures are so much faster than my StackExchange test figures?

Thanks, Paul
My results from the code below:
Connecting to server...
Connected
PING (sync per op)
    1709ms for 1000000 ops on 50 threads took 1.709594 seconds
    585137 ops/s
SET (sync per op)
    759ms for 500000 ops on 50 threads took 0.7592914 seconds
    658761 ops/s
GET (sync per op)
    780ms for 500000 ops on 50 threads took 0.7806102 seconds
    641025 ops/s
PING (pipelined per thread)
    3751ms for 1000000 ops on 50 threads took 3.7510956 seconds
    266595 ops/s
SET (pipelined per thread)
    1781ms for 500000 ops on 50 threads took 1.7819831 seconds
    280741 ops/s
GET (pipelined per thread)
    1977ms for 500000 ops on 50 threads took 1.9772623 seconds
    252908 ops/s
===
Server configuration: make sure persistence is disabled, etc.
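As an illustrative sketch of that point (this command is not from the original answer, but the flags are standard redis-server options): a throwaway benchmarking instance can be launched with both RDB snapshots and the append-only file disabled, so disk I/O doesn't distort the numbers.

```shell
# Start a local Redis instance with persistence disabled:
#   --save ""        disables RDB snapshotting
#   --appendonly no  disables the append-only file (AOF)
redis-server --save "" --appendonly no
```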
The first thing you should do in a benchmark is: benchmark one thing. At the moment you're including a lot of serialization overhead, which won't help get a clear picture. Ideally, for a like-for-like benchmark, you should be using a 3-byte fixed payload, because:
3 bytes payload
Next, you'd need to look at the parallelism:
50 parallel clients
It isn't clear whether your test is parallel, but if it isn't we should absolutely expect to see less raw throughput. Conveniently, SE.Redis is designed to be easy to parallelize: you can just spin up multiple threads talking to the same connection (this actually also has the advantage of avoiding packet fragmentation, as you can end up with multiple messages per packet, where-as a single-thread sync approach is guaranteed to use at most one message per packet).
Finally, we need to understand what the listed benchmark is doing. Is it doing:
(send, receive) x n
or is it doing

send x n, then receive separately until all n are received
? Both options are possible. Your sync API usage is the first one, but the second test is equally well-defined, and for all I know: that's what it is measuring. There are two ways of simulating this second setup:
- send the first (n-1) messages with the "fire and forget" flag, so you only actually wait for the last one
- use the *Async API for all messages, and only Wait() or await the last Task
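As a minimal sketch of those two options (assuming a Redis server on 127.0.0.1:6379; the key name and iteration count are arbitrary illustrations, not from the original answer):

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    static class PipelineSketch
    {
        static void Main()
        {
            using (var muxer = ConnectionMultiplexer.Connect("127.0.0.1"))
            {
                var db = muxer.GetDatabase();
                RedisKey key = "pipeline:test";
                RedisValue value = "t";
                const int n = 10000;

                // Option 1: fire-and-forget the first n-1 ops; only the
                // final synchronous call actually waits for a reply.
                for (int i = 0; i < n - 1; i++)
                {
                    db.StringSet(key, value, flags: CommandFlags.FireAndForget);
                }
                db.StringSet(key, value); // completes once the pipeline drains

                // Option 2: issue everything via the *Async API and only
                // wait on the last Task; earlier ops are pipelined behind it.
                Task last = null;
                for (int i = 0; i < n; i++)
                {
                    last = db.StringSetAsync(key, value);
                }
                last.Wait();

                Console.WriteLine("done");
            }
        }
    }

Either way, the client keeps the outbound socket saturated instead of paying a full round-trip per operation.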
Here's the benchmark I used in the above, which shows both "sync per op" (via the sync API) and "pipelined per thread" (using the *Async API and just waiting for the last task per thread), both using 50 threads:
using StackExchange.Redis;
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class P
{
    static void Main()
    {
        Console.WriteLine("Connecting to server...");
        using (var muxer = ConnectionMultiplexer.Connect("127.0.0.1"))
        {
            Console.WriteLine("Connected");
            var db = muxer.GetDatabase();

            RedisKey key = "some key";
            byte[] payload = new byte[3];
            new Random(12345).NextBytes(payload);
            RedisValue value = payload;

            DoWork("PING (sync per op)", db, 1000000, 50, x => { x.Ping(); return null; });
            DoWork("SET (sync per op)", db, 500000, 50, x => { x.StringSet(key, value); return null; });
            DoWork("GET (sync per op)", db, 500000, 50, x => { x.StringGet(key); return null; });

            DoWork("PING (pipelined per thread)", db, 1000000, 50, x => x.PingAsync());
            DoWork("SET (pipelined per thread)", db, 500000, 50, x => x.StringSetAsync(key, value));
            DoWork("GET (pipelined per thread)", db, 500000, 50, x => x.StringGetAsync(key));
        }
    }

    static void DoWork(string action, IDatabase db, int count, int threads, Func<IDatabase, Task> op)
    {
        object startup = new object(), shutdown = new object();
        int activeThreads = 0, outstandingOps = count;
        Stopwatch sw = default(Stopwatch);
        var threadStart = new ThreadStart(() =>
        {
            lock (startup)
            {
                if (++activeThreads == threads)
                {
                    sw = Stopwatch.StartNew();
                    Monitor.PulseAll(startup);
                }
                else
                {
                    Monitor.Wait(startup);
                }
            }
            Task final = null;
            while (Interlocked.Decrement(ref outstandingOps) >= 0)
            {
                final = op(db);
            }
            if (final != null) final.Wait();
            lock (shutdown)
            {
                if (--activeThreads == 0)
                {
                    sw.Stop();
                    Monitor.PulseAll(shutdown);
                }
            }
        });
        lock (shutdown)
        {
            for (int i = 0; i < threads; i++)
            {
                new Thread(threadStart).Start();
            }
            Monitor.Wait(shutdown);
            Console.WriteLine($@"{action}
    {sw.ElapsedMilliseconds}ms for {count} ops on {threads} threads took {sw.Elapsed.TotalSeconds} seconds
    {(count * 1000) / sw.ElapsedMilliseconds} ops/s");
        }
    }
}