Boost.Corosio Performance Benchmarks

Table of Contents

Executive Summary
Detailed Results
Test Environment
Handler Dispatch Benchmarks
Socket Throughput Benchmarks
- Unidirectional Throughput
- Bidirectional Throughput
Socket Latency Benchmarks
- Ping-Pong Round-Trip Latency
  - Latency Distribution (64-byte messages)
- Concurrent Socket Pairs
HTTP Server Benchmarks
Analysis
- Performance Characteristics
- Scaling Behavior
Conclusions
Appendix: Raw Data
- Corosio Results
- Asio Results

Executive Summary

This report benchmarks Boost.Corosio against Boost.Asio on Windows using the IOCP (I/O Completion Ports) backend. The benchmarks cover handler dispatch, socket throughput, socket latency, and HTTP server workloads.

Bottom Line

Corosio’s single-threaded handler dispatch is about 2× faster than Asio’s, and its interleaved post/run throughput is 70% higher. Asio scales better across threads in both handler dispatch and HTTP server workloads. Socket throughput is within 2% unidirectionally, and Asio leads by 4–6% bidirectionally at buffer sizes of 4 KB and above.

Where Corosio Excels

Single-threaded handler post: 2× faster than Asio (1.59 Mops/s vs 802 Kops/s)
Interleaved post/run: 70% faster (2.90 Mops/s vs 1.71 Mops/s)
Concurrent post and run: 14% faster (1.68 Mops/s vs 1.48 Mops/s)
Unidirectional throughput: Within 2% of Asio at every buffer size, slightly ahead at 1 KB, 4 KB, and 64 KB

Where Corosio Needs Improvement

Multi-threaded handler scaling: Throughput regresses from 4→8 threads (2.58→2.09 Mops/s)
Multi-threaded HTTP: Asio is 56% faster at 8 threads (337.68 vs 215.94 Kops/s)
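Tail latency: p99 ping-pong latency is 49–72% higher than Asio’s, depending on message size (21.20 μs vs 13.30 μs at 64 bytes)
Concurrent connections: With 32 connections served by multiple threads, mean HTTP latency is 49% higher than Asio’s at 4 threads and 57% higher at 8 threads (147.86 μs vs 94.26 μs)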

Key Insights

Component	Assessment
Handler Dispatch	Corosio is significantly faster single-threaded, but Asio scales better with threads
Socket I/O	Throughput within 2% unidirectionally; Asio has ~0.4–0.5 μs lower mean round-trip latency
HTTP Server	Comparable at 1–2 threads; Asio leads by 49% at 4 threads and 56% at 8 threads
Scaling Behavior	Corosio shows thread contention issues at 8 threads

Next Steps

Profile multi-threaded contention: Investigate the 4→8 thread regression
Reduce per-operation latency: Target the ~0.5 μs gap in socket operations
Benchmark on Linux: Validate findings on the epoll backend
Test realistic workloads: Mixed payload sizes and real-world traffic patterns

Detailed Results

Handler Dispatch Summary

Scenario	Corosio	Asio	Winner
Single-threaded post	1.59 Mops/s	802 Kops/s	Corosio (+98%)
Multi-threaded (8 threads)	2.09 Mops/s	3.02 Mops/s	Asio (+44%)
Interleaved post/run	2.90 Mops/s	1.71 Mops/s	Corosio (+70%)
Concurrent post/run	1.68 Mops/s	1.48 Mops/s	Corosio (+14%)

Socket Throughput Summary

Scenario	Corosio	Asio	Winner
Unidirectional 1KB buffer	215 MB/s	213 MB/s	Tie
Unidirectional 64KB buffer	6.43 GB/s	6.40 GB/s	Tie
Bidirectional 64KB buffer	6.15 GB/s	6.50 GB/s	Asio (+6%)

Socket Latency Summary

Scenario	Corosio	Asio	Winner
Ping-pong mean (64B)	10.10 μs	9.61 μs	Asio (-5%)
Ping-pong p99 (64B)	21.20 μs	13.30 μs	Asio (-37%)
16 concurrent pairs	162.95 μs	160.49 μs	Tie

HTTP Server Summary

Scenario	Corosio	Asio	Winner
Single connection	96.31 Kops/s	95.96 Kops/s	Tie
32 connections, 8 threads	215.94 Kops/s	337.68 Kops/s	Asio (+56%)

Test Environment

Platform	Windows (IOCP backend)
Benchmarks	Handler dispatch, socket throughput, socket latency, HTTP server
Measurement	Client-side latency and throughput

Handler Dispatch Benchmarks

These benchmarks measure raw handler posting and execution throughput, isolating the scheduler from I/O completion overhead.

Single-Threaded Handler Post

Posting 5,000,000 handlers from a single thread.

Metric	Corosio	Asio	Difference
Handlers	5,000,000	5,000,000	—
Elapsed	3.143 s	6.233 s	-50%
Throughput	1.59 Mops/s	802 Kops/s	+98%

Key finding: Corosio’s single-threaded handler dispatch is nearly 2× faster than Asio.
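
The Throughput and Difference rows follow directly from the handler count and the elapsed times. The sketch below reproduces them; the function names are illustrative and are not part of either library or of the benchmark harness.

```python
# Inputs copied from the table above: handler count and elapsed seconds.
HANDLERS = 5_000_000
ELAPSED_S = {"Corosio": 3.143, "Asio": 6.233}


def throughput_ops_per_s(handlers: int, elapsed_s: float) -> float:
    """Handlers completed per second."""
    return handlers / elapsed_s


def percent_diff(value: float, reference: float) -> float:
    """Relative difference of value against reference, in percent."""
    return (value / reference - 1.0) * 100.0


corosio = throughput_ops_per_s(HANDLERS, ELAPSED_S["Corosio"])
asio = throughput_ops_per_s(HANDLERS, ELAPSED_S["Asio"])
elapsed_diff = percent_diff(ELAPSED_S["Corosio"], ELAPSED_S["Asio"])
throughput_diff = percent_diff(corosio, asio)

print(f"Corosio: {corosio / 1e6:.2f} Mops/s")           # 1.59 Mops/s
print(f"Asio:    {asio / 1e3:.0f} Kops/s")              # 802 Kops/s
print(f"Elapsed difference:    {elapsed_diff:+.0f}%")   # -50%
print(f"Throughput difference: {throughput_diff:+.0f}%")  # +98%
```

Halving the elapsed time doubles the throughput, which is why a 50% reduction in elapsed time appears as a gain of nearly 100% in the throughput row.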

Multi-Threaded Scaling

Multiple threads running handlers concurrently (5,000,000 handlers total).

Threads	Corosio	Asio	Corosio Speedup	Asio Speedup
1	2.46 Mops/s	1.51 Mops/s	(baseline)	(baseline)
2	2.24 Mops/s	2.16 Mops/s	0.91×	1.43×
4	2.58 Mops/s	2.97 Mops/s	1.05×	1.96×
8	2.09 Mops/s	3.02 Mops/s	0.85×	1.99×

Scaling Analysis

Throughput vs Thread Count:

Threads    Corosio    Asio       Winner
   1       2.46       1.51       Corosio +63%
   2       2.24       2.16       Corosio +4%
   4       2.58       2.97       Asio +15%
   8       2.09       3.02       Asio +44%
                ↑
           (regression)

Notable observations:

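Corosio is faster at 1 and 2 threads
Crossover occurs between 2 and 4 threads
Corosio regresses from 4→8 threads (2.58 → 2.09 Mops/s), falling below its own 1-thread throughput (2.46 Mops/s)
Asio gains little from 4→8 threads (2.97 → 3.02 Mops/s) but does not regress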
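
The speedup columns and the crossover point can be recomputed from the throughput figures. This is a minimal sketch using the rounded values from the table, so the last digit of a speedup may differ from the reported one (for example 1.97× here against the reported 1.96× for Asio at 4 threads). The helper names are illustrative.

```python
# Throughput in Mops/s by thread count, copied from the table above.
THREADS = [1, 2, 4, 8]
COROSIO = [2.46, 2.24, 2.58, 2.09]
ASIO = [1.51, 2.16, 2.97, 3.02]


def speedups(series):
    """Speedup of each entry relative to the 1-thread baseline."""
    base = series[0]
    return [value / base for value in series]


def leader(corosio, asio):
    """Name of the faster library and its margin over the slower one, in percent."""
    name = "Corosio" if corosio > asio else "Asio"
    margin = (max(corosio, asio) / min(corosio, asio) - 1.0) * 100.0
    return name, margin


# First thread count at which Asio's throughput exceeds Corosio's.
crossover = next(n for n, c, a in zip(THREADS, COROSIO, ASIO) if a > c)

for n, c, a, sc, sa in zip(THREADS, COROSIO, ASIO, speedups(COROSIO), speedups(ASIO)):
    name, margin = leader(c, a)
    print(f"{n} thread(s): Corosio {sc:.2f}x, Asio {sa:.2f}x, {name} +{margin:.0f}%")
print(f"Asio first leads at {crossover} threads")
```

A speedup below 1.00× means adding threads made that configuration slower than a single thread, which is the case for Corosio at 2 and 8 threads.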

Interleaved Post/Run

Alternating between posting batches and running them (50,000 iterations × 100 handlers).

Metric	Corosio	Asio	Difference
Total handlers	5,000,000	5,000,000	—
Elapsed	1.723 s	2.930 s	-41%
Throughput	2.90 Mops/s	1.71 Mops/s	+70%

Key finding: Corosio excels at interleaved post/run, a pattern common in real applications.
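
To make the shape of this benchmark concrete, here is a toy sketch of the interleaved pattern: post a batch of handlers, drain the queue, and repeat. `ToyScheduler` is a hypothetical stand-in written for illustration; it is not Corosio's or Asio's API, and its timings say nothing about either library.

```python
import time
from collections import deque


class ToyScheduler:
    """Minimal single-threaded handler queue (illustrative stand-in only)."""

    def __init__(self):
        self._queue = deque()

    def post(self, handler):
        """Queue a handler for later execution."""
        self._queue.append(handler)

    def run(self):
        """Run queued handlers until the queue is empty; return how many ran."""
        ran = 0
        while self._queue:
            self._queue.popleft()()
            ran += 1
        return ran


def interleaved_post_run(iterations, handlers_per_iter):
    """Alternate between posting one batch and draining it."""
    sched = ToyScheduler()
    counter = 0

    def handler():
        nonlocal counter
        counter += 1

    start = time.perf_counter()
    for _ in range(iterations):
        for _ in range(handlers_per_iter):
            sched.post(handler)
        sched.run()
    elapsed = time.perf_counter() - start
    return counter, elapsed


# Scaled down from the report's 50,000 iterations to keep the example quick.
executed, elapsed = interleaved_post_run(iterations=500, handlers_per_iter=100)
print(f"{executed} handlers in {elapsed:.4f} s")
```

Because the queue is drained after every batch, it never holds more than one batch of handlers, unlike the single-threaded post benchmark where all 5,000,000 handlers are posted.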

Concurrent Post and Run

Four threads simultaneously posting and running handlers.

Metric	Corosio	Asio	Difference
Threads	4	4	—
Total handlers	5,000,000	5,000,000	—
Elapsed	2.970 s	3.374 s	-12%
Throughput	1.68 Mops/s	1.48 Mops/s	+14%

Socket Throughput Benchmarks

Unidirectional Throughput

Single direction transfer of 4096 MB with varying buffer sizes.

Buffer Size	Corosio	Asio	Difference
1024 bytes	215.20 MB/s	213.17 MB/s	+1%
4096 bytes	757.98 MB/s	743.34 MB/s	+2%
16384 bytes	2.56 GB/s	2.58 GB/s	-1%
65536 bytes	6.43 GB/s	6.40 GB/s	+0.5%

Observation: Throughput is within 2% at every buffer size, and both implementations exceed 6 GB/s with 64 KB buffers.
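
Dividing throughput by buffer size gives a rough count of buffer-sized transfers per second, which helps explain why small buffers are so much slower. The sketch below uses Corosio's figures and rests on two assumptions the report does not state: that MB and GB are binary units (2^20 and 2^30 bytes), and that each read or write moves one full buffer.

```python
MIB = 1024 ** 2  # ASSUMPTION: the report's "MB" means 2**20 bytes
GIB = 1024 ** 3  # ASSUMPTION: the report's "GB" means 2**30 bytes

# (buffer size in bytes, Corosio throughput in bytes/s) from the table above.
COROSIO = [
    (1024, 215.20 * MIB),
    (4096, 757.98 * MIB),
    (16384, 2.56 * GIB),
    (65536, 6.43 * GIB),
]


def transfers_per_second(buffer_bytes, bytes_per_s):
    """Buffer-sized transfers per second, assuming every transfer fills the buffer."""
    return bytes_per_s / buffer_bytes


def microseconds_per_transfer(buffer_bytes, bytes_per_s):
    """Average wall-clock time spent per buffer-sized transfer."""
    return 1e6 / transfers_per_second(buffer_bytes, bytes_per_s)


for buffer_bytes, rate in COROSIO:
    ops = transfers_per_second(buffer_bytes, rate)
    us = microseconds_per_transfer(buffer_bytes, rate)
    print(f"{buffer_bytes:>6} bytes: {ops / 1e3:6.1f} K transfers/s, {us:.2f} us each")
```

Under these assumptions the time per transfer only roughly doubles (about 4.5 μs to 9.5 μs) while the buffer grows 64×, which suggests that fixed per-operation cost, rather than copying, dominates at small buffer sizes.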

Bidirectional Throughput

Simultaneous transfer of 2048 MB in each direction (4096 MB total).

Buffer Size	Corosio	Asio	Difference
1024 bytes	214.55 MB/s	212.18 MB/s	+1%
4096 bytes	707.35 MB/s	755.43 MB/s	-6%
16384 bytes	2.48 GB/s	2.59 GB/s	-4%
65536 bytes	6.15 GB/s	6.50 GB/s	-5%

Observation: Asio leads by 4–6% at buffer sizes of 4 KB and above; at 1 KB the two are within 1%.

Socket Latency Benchmarks

Ping-Pong Round-Trip Latency

Single socket pair exchanging messages (1,000,000 iterations each).

Message Size	Corosio Mean	Asio Mean	Difference	Corosio p99	Asio p99
1 byte	10.04 μs	9.66 μs	+4%	21.10 μs	14.20 μs
64 bytes	10.10 μs	9.61 μs	+5%	21.20 μs	13.30 μs
1024 bytes	10.03 μs	9.66 μs	+4%	21.10 μs	12.30 μs

Latency Distribution (64-byte messages)

Percentile	Corosio	Asio	Difference
p50	9.60 μs	9.20 μs	+4%
p90	9.80 μs	9.70 μs	+1%
p99	21.20 μs	13.30 μs	+59%
p99.9	115.70 μs	76.40 μs	+51%
min	8.30 μs	8.10 μs	+2%
max	3.15 ms	2.13 ms	+48%

Observation: Mean latencies differ by only ~0.5 μs, but Corosio’s tail latency is substantially higher: +59% at p99 and +51% at p99.9.
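
A small number of slow round trips can leave the mean and median almost untouched while moving p99 a long way, which is why the distribution is reported alongside the mean. The sketch below shows this with a nearest-rank percentile over synthetic samples. The data is invented for illustration, and the report does not say which percentile method the benchmark harness uses.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic round-trip times in microseconds (NOT measured data):
# 980 fast samples plus 20 slow outliers.
samples = [9.6] * 980 + [100.0] * 20

mean = sum(samples) / len(samples)
p50 = percentile(samples, 50)
p99 = percentile(samples, 99)

print(f"mean = {mean:.2f} us, p50 = {p50:.2f} us, p99 = {p99:.2f} us")
```

Here 2% of the samples are outliers: the median stays at 9.6 μs and the mean rises by less than 2 μs, yet p99 lands on an outlier value.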

Concurrent Socket Pairs

Multiple socket pairs operating concurrently (64-byte messages).

Pairs	Iterations	Corosio Mean	Asio Mean	Corosio p99	Asio p99
1	1,000,000	9.95 μs	9.55 μs	19.20 μs	13.10 μs
4	500,000	40.90 μs	39.54 μs	81.88 μs	69.60 μs
16	250,000	162.95 μs	160.49 μs	357.36 μs	344.09 μs

Observation: Mean latency grows roughly in proportion to the number of pairs for both implementations. Asio’s mean is 1.5–4% lower throughout.

HTTP Server Benchmarks

Single Connection (Sequential Requests)

Metric	Corosio	Asio	Difference
Requests	1,000,000	1,000,000	—
Elapsed	10.383 s	10.421 s	-0.4%
Throughput	96.31 Kops/s	95.96 Kops/s	+0.4%
Mean latency	10.36 μs	10.39 μs	-0.3%
p99 latency	14.70 μs	13.80 μs	+7%

Observation: Single-connection HTTP performance is essentially identical.

Concurrent Connections (Single Thread)

Connections	Corosio Throughput	Asio Throughput	Corosio Mean	Asio Mean	Gap
1	92.71 Kops/s	92.35 Kops/s	10.76 μs	10.80 μs	Tie
4	92.64 Kops/s	91.14 Kops/s	43.15 μs	43.86 μs	Tie
16	92.03 Kops/s	90.38 Kops/s	173.83 μs	177.00 μs	Tie
32	92.14 Kops/s	89.11 Kops/s	347.27 μs	359.06 μs	Corosio +3%

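Observation: With a single thread, throughput stays near 90 Kops/s for both implementations as connections increase, while mean latency grows in proportion to the connection count. Corosio holds a 3% edge at 32 connections.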
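
The throughput and mean-latency columns are linked by Little's Law (throughput = concurrency / latency), which gives a quick consistency check on the table. This sketch assumes each connection keeps exactly one request in flight; the report does not state this, but the numbers agree with it to within about 0.3%.

```python
# (connections, reported Corosio throughput in Kops/s, Corosio mean latency in us)
# copied from the table above.
ROWS = [
    (1, 92.71, 10.76),
    (4, 92.64, 43.15),
    (16, 92.03, 173.83),
    (32, 92.14, 347.27),
]


def predicted_kops(connections, mean_latency_us):
    """Little's Law: throughput = concurrency / latency, returned in Kops/s.

    ASSUMPTION: one outstanding request per connection (closed-loop clients).
    """
    return connections / (mean_latency_us * 1e-6) / 1e3


for connections, reported, latency_us in ROWS:
    predicted = predicted_kops(connections, latency_us)
    error = (predicted / reported - 1.0) * 100.0
    print(f"{connections:>2} connections: predicted {predicted:.2f} Kops/s, "
          f"reported {reported:.2f} Kops/s ({error:+.2f}%)")
```

Read this way, the growing mean latency reflects requests waiting their turn behind a single thread rather than each request becoming more expensive.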

Multi-Threaded HTTP (32 Connections)

Threads	Corosio Throughput	Asio Throughput	Gap (Corosio vs Asio)	Scaling Factor (Corosio / Asio)
1	89.72 Kops/s	88.25 Kops/s	+2%	(baseline)
2	127.27 Kops/s	127.48 Kops/s	0%	1.42× / 1.44×
4	141.15 Kops/s	210.64 Kops/s	-33%	1.57× / 2.39×
8	215.94 Kops/s	337.68 Kops/s	-36%	2.41× / 3.83×

Multi-Threaded Latency

Threads	Corosio Mean	Asio Mean	Corosio p99	Asio p99
1	356.63 μs	362.58 μs	748.50 μs	620.88 μs
2	251.37 μs	250.92 μs	384.09 μs	352.85 μs
4	226.46 μs	151.75 μs	447.79 μs	192.31 μs
8	147.86 μs	94.26 μs	188.26 μs	120.68 μs

Key finding: Asio scales significantly better in multi-threaded HTTP workloads, achieving 3.83× scaling from 1→8 threads compared to Corosio’s 2.41×.
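
Dividing each scaling factor by the thread count gives parallel efficiency, which shows how far both libraries are from linear scaling in this workload. This is a minimal sketch over the reported throughput figures; the helper names are illustrative.

```python
# Multi-threaded HTTP throughput in Kops/s (32 connections), from the table above.
THREADS = [1, 2, 4, 8]
KOPS = {
    "Corosio": [89.72, 127.27, 141.15, 215.94],
    "Asio": [88.25, 127.48, 210.64, 337.68],
}


def scaling_factors(series):
    """Throughput at each thread count relative to the 1-thread baseline."""
    return [value / series[0] for value in series]


def parallel_efficiency(threads, series):
    """Scaling factor divided by thread count; 1.0 would be perfectly linear scaling."""
    return [factor / n for factor, n in zip(scaling_factors(series), threads)]


for name, series in KOPS.items():
    factors = scaling_factors(series)
    efficiency = parallel_efficiency(THREADS, series)
    for n, factor, eff in zip(THREADS, factors, efficiency):
        print(f"{name:8} {n} thread(s): {factor:.2f}x, efficiency {eff:.0%}")
```

At 8 threads this works out to roughly 48% efficiency for Asio and 30% for Corosio, so neither scales linearly here, but Corosio loses efficiency faster.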

Analysis

Performance Characteristics

Handler Dispatch

Corosio shows dramatically better single-threaded performance but struggles with multi-threaded scaling:

Scenario	Corosio Advantage	Notes
Single-threaded	+98%	Nearly 2× faster
Interleaved post/run	+70%	Excellent batch handling
Concurrent 4 threads	+14%	Still competitive
8 threads	-31%	Scaling regression (Asio is 44% faster)
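
A relative difference depends on which library is taken as the reference, so the same 8-thread result yields two different percentages. The sketch below works through the handler dispatch and HTTP figures from this report; `relative_to` is an illustrative helper.

```python
def relative_to(value, reference):
    """Percent by which value is above (+) or below (-) reference."""
    return (value / reference - 1.0) * 100.0


# 8-thread handler dispatch throughput in Mops/s.
corosio, asio = 2.09, 3.02
asio_vs_corosio = relative_to(asio, corosio)
corosio_vs_asio = relative_to(corosio, asio)
print(f"Asio relative to Corosio: {asio_vs_corosio:+.0f}%")   # +44%
print(f"Corosio relative to Asio: {corosio_vs_asio:+.0f}%")   # -31%

# 8-thread HTTP throughput in Kops/s.
http_corosio, http_asio = 215.94, 337.68
http_asio_vs_corosio = relative_to(http_asio, http_corosio)
http_corosio_vs_asio = relative_to(http_corosio, http_asio)
print(f"HTTP, Asio relative to Corosio: {http_asio_vs_corosio:+.0f}%")   # +56%
print(f"HTTP, Corosio relative to Asio: {http_corosio_vs_asio:+.0f}%")   # -36%
```

When a column is labeled as Corosio's advantage, the 8-thread handler dispatch figure is -31%; the 44% figure describes Asio's lead over Corosio.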

Socket I/O

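Socket throughput is within 2% unidirectionally, with Asio 4–6% ahead bidirectionally at buffer sizes of 4 KB and above. Latency shows:

Mean latency: Corosio ~0.4–0.5 μs slower per round trip
Tail latency: Corosio 49–72% higher at p99, depending on message size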

HTTP Server

The HTTP benchmarks reveal a scaling disparity:

Multi-threaded HTTP Throughput:

Threads    Corosio      Asio        Winner
   1       89.7 K       88.3 K      Tie
   2       127.3 K      127.5 K     Tie
   4       141.2 K      210.6 K     Asio +49%
   8       215.9 K      337.7 K     Asio +56%

Scaling Behavior

The benchmarks reveal a consistent pattern:

Behavior	Evidence
Single-threaded excellence	2× faster handler dispatch, competitive HTTP
Multi-thread contention	Regression at 8 threads in handler dispatch
HTTP scaling gap	Asio achieves 3.83× scaling vs Corosio’s 2.41×

Conclusions

Strengths

Corosio:

Exceptional single-threaded handler dispatch (2× faster)
Superior interleaved post/run performance (70% faster)
Competitive socket I/O throughput
Identical single-connection HTTP performance

Asio:

Better multi-threaded scaling (no regression at 8 threads)
Superior multi-threaded HTTP throughput (+56% at 8 threads)
Lower tail latency in socket operations
More predictable performance under load

Recommendations

Workload	Recommendation
Single-threaded handler processing	Corosio is 2× faster
Interleaved post/run patterns	Corosio is 70% faster
Multi-threaded HTTP servers	Asio scales better (56% faster at 8 threads)
Bulk socket transfers	Either; throughput is within 6% in every configuration tested

Future Work

Profile the multi-threaded contention causing the 8-thread regression
Investigate HTTP scaling disparity
Benchmark on Linux (epoll backend)
Test with realistic HTTP payloads and traffic patterns

Appendix: Raw Data

Corosio Results

Backend: iocp

=== Single-threaded Handler Post ===
  Handlers:    5000000
  Elapsed:     3.143 s
  Throughput:  1.59 Mops/s

=== Multi-threaded Scaling ===
  Handlers per test: 5000000

  1 thread(s): 2.46 Mops/s
  2 thread(s): 2.24 Mops/s (speedup: 0.91x)
  4 thread(s): 2.58 Mops/s (speedup: 1.05x)
  8 thread(s): 2.09 Mops/s (speedup: 0.85x)

=== Interleaved Post/Run ===
  Iterations:        50000
  Handlers/iter:     100
  Total handlers:    5000000
  Elapsed:           1.723 s
  Throughput:        2.90 Mops/s

=== Concurrent Post and Run ===
  Threads:           4
  Handlers/thread:   1250000
  Total handlers:    5000000
  Elapsed:           2.970 s
  Throughput:        1.68 Mops/s

=== Unidirectional Throughput ===
  Buffer size: 1024 bytes, Transfer: 4096 MB
    Throughput: 215.20 MB/s

  Buffer size: 4096 bytes, Transfer: 4096 MB
    Throughput: 757.98 MB/s

  Buffer size: 16384 bytes, Transfer: 4096 MB
    Throughput: 2.56 GB/s

  Buffer size: 65536 bytes, Transfer: 4096 MB
    Throughput: 6.43 GB/s

=== Bidirectional Throughput ===
  Buffer size: 1024 bytes: 214.55 MB/s (combined)
  Buffer size: 4096 bytes: 707.35 MB/s (combined)
  Buffer size: 16384 bytes: 2.48 GB/s (combined)
  Buffer size: 65536 bytes: 6.15 GB/s (combined)

=== Ping-Pong Round-Trip Latency ===
  1 byte:    mean=10.04 us, p99=21.10 us
  64 bytes:  mean=10.10 us, p99=21.20 us
  1024 bytes: mean=10.03 us, p99=21.10 us

=== Concurrent Socket Pairs Latency ===
  1 pair:   mean=9.95 us, p99=19.20 us
  4 pairs:  mean=40.90 us, p99=81.88 us
  16 pairs: mean=162.95 us, p99=357.36 us

=== HTTP Single Connection ===
  Throughput: 96.31 Kops/s
  Latency: mean=10.36 us, p99=14.70 us

=== HTTP Multi-threaded (32 connections) ===
  1 thread:  89.72 Kops/s, mean=356.63 us
  2 threads: 127.27 Kops/s, mean=251.37 us
  4 threads: 141.15 Kops/s, mean=226.46 us
  8 threads: 215.94 Kops/s, mean=147.86 us

Asio Results

=== Single-threaded Handler Post ===
  Handlers:    5000000
  Elapsed:     6.233 s
  Throughput:  802.18 Kops/s

=== Multi-threaded Scaling ===
  Handlers per test: 5000000

  1 thread(s): 1.51 Mops/s
  2 thread(s): 2.16 Mops/s (speedup: 1.43x)
  4 thread(s): 2.97 Mops/s (speedup: 1.96x)
  8 thread(s): 3.02 Mops/s (speedup: 1.99x)

=== Interleaved Post/Run ===
  Iterations:        50000
  Handlers/iter:     100
  Total handlers:    5000000
  Elapsed:           2.930 s
  Throughput:        1.71 Mops/s

=== Concurrent Post and Run ===
  Threads:           4
  Handlers/thread:   1250000
  Total handlers:    5000000
  Elapsed:           3.374 s
  Throughput:        1.48 Mops/s

=== Unidirectional Throughput ===
  Buffer size: 1024 bytes: 213.17 MB/s
  Buffer size: 4096 bytes: 743.34 MB/s
  Buffer size: 16384 bytes: 2.58 GB/s
  Buffer size: 65536 bytes: 6.40 GB/s

=== Bidirectional Throughput ===
  Buffer size: 1024 bytes: 212.18 MB/s (combined)
  Buffer size: 4096 bytes: 755.43 MB/s (combined)
  Buffer size: 16384 bytes: 2.59 GB/s (combined)
  Buffer size: 65536 bytes: 6.50 GB/s (combined)

=== Ping-Pong Round-Trip Latency ===
  1 byte:    mean=9.66 us, p99=14.20 us
  64 bytes:  mean=9.61 us, p99=13.30 us
  1024 bytes: mean=9.66 us, p99=12.30 us

=== Concurrent Socket Pairs Latency ===
  1 pair:   mean=9.55 us, p99=13.10 us
  4 pairs:  mean=39.54 us, p99=69.60 us
  16 pairs: mean=160.49 us, p99=344.09 us

=== HTTP Single Connection ===
  Throughput: 95.96 Kops/s
  Latency: mean=10.39 us, p99=13.80 us

=== HTTP Multi-threaded (32 connections) ===
  1 thread:  88.25 Kops/s, mean=362.58 us
  2 threads: 127.48 Kops/s, mean=250.92 us
  4 threads: 210.64 Kops/s, mean=151.75 us
  8 threads: 337.68 Kops/s, mean=94.26 us
