Add From<&[f32]> conversions for bulk upload optimization by nazq · Pull Request #244 · qdrant/rust-client

nazq · 2025-11-05T17:18:42Z

Summary

Adds From<&[f32]> implementations for Vector and Vectors types to optimize bulk uploads from contiguous memory sources like Apache Arrow and NumPy.

Performance Impact

Benchmarks show 4-17% throughput improvement with significantly lower variance for bulk uploads:

Dataset Size	Vec (current)	Slice (optimized)	Improvement	Notes
10K points	93,321 pts/sec (±27K)	97,437 pts/sec (±10K)	+4.4%	More stable
50K points	75,906 pts/sec (±60K)	88,744 pts/sec (±4K)	+16.9%	Much more stable
100K points	73,579 pts/sec (±46K)	84,956 pts/sec (±16K)	+15.5%	Consistent gain
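As a sanity check on the table, the relative gains can be recomputed from the mean throughputs. This is a minimal sketch. `improvement_pct` is a hypothetical helper, and it uses only the mean columns and ignores the ± spreads.

```rust
/// Relative throughput change, in percent, from `baseline` to `candidate`.
fn improvement_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // (dataset, Vec pts/sec, slice pts/sec): mean values copied from the table
    let rows = [
        ("10K", 93_321.0, 97_437.0),
        ("50K", 75_906.0, 88_744.0),
        ("100K", 73_579.0, 84_956.0),
    ];
    for (name, vec_tp, slice_tp) in rows {
        // `{:+.1}` prints an explicit sign and one decimal place
        println!("{name}: {:+.1}%", improvement_pct(vec_tp, slice_tp));
    }
}
```

Running it reproduces the Improvement column (+4.4%, +16.9%, +15.5%).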

Key findings:

✅ Throughput gains are larger for larger datasets (+4.4% at 10K points vs. +15-17% at 50K-100K points)
✅ Significantly lower variance (more predictable performance)
✅ 99,999 fewer allocations per 100K points

Benchmark setup: 384-dim vectors, 5K batch size, 5 iterations, localhost Qdrant 1.12, HNSW indexing disabled (m=0)
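For scale, the setup above implies the following request count and raw vector volume. This is a rough sketch under stated assumptions. `num_batches` and `vector_bytes` are hypothetical helpers, and the byte count covers only the f32 data (4 bytes per dimension), not protobuf framing, IDs, or payloads.

```rust
const DIM: usize = 384; // embedding dimension from the benchmark setup
const BATCH_SIZE: usize = 5_000; // points per upsert request

/// Number of upsert batches needed for `points` points (the last one may be partial).
fn num_batches(points: usize) -> usize {
    points.div_ceil(BATCH_SIZE)
}

/// Raw f32 data size in bytes for `points` vectors of `DIM` dimensions.
fn vector_bytes(points: usize) -> usize {
    points * DIM * std::mem::size_of::<f32>()
}

fn main() {
    for points in [10_000, 50_000, 100_000] {
        println!(
            "{points} points: {} batches, {} bytes of raw f32 data",
            num_batches(points),
            vector_bytes(points)
        );
    }
}
```

At 100K points this works out to 20 requests and about 153.6 MB of vector data.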

Changes

impl From<&[f32]> for Vector - Create dense vector from borrowed slice
impl From<&[f32]> for Vectors - Create single vector from borrowed slice
impl From<HashMap<String, &[f32]>> for Vectors - Create named vectors from borrowed slices
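To make the three conversions concrete, here is a minimal, self-contained sketch against simplified stand-in types. `Vector` and `Vectors` below are hypothetical mirrors of the client's generated protobuf types (the real ones have more fields and variants), and the bodies are illustrative rather than the code merged in this PR. In this stand-in the copy simply happens inside `from` via `to_vec()`.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for qdrant-client's generated types.
#[derive(Debug, Clone, PartialEq)]
pub struct Vector {
    pub data: Vec<f32>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Vectors {
    Single(Vector),
    Named(HashMap<String, Vector>),
}

impl From<&[f32]> for Vector {
    fn from(slice: &[f32]) -> Self {
        // The owned message needs its own buffer; this stand-in copies here.
        Vector { data: slice.to_vec() }
    }
}

impl From<&[f32]> for Vectors {
    fn from(slice: &[f32]) -> Self {
        Vectors::Single(Vector::from(slice))
    }
}

impl From<HashMap<String, &[f32]>> for Vectors {
    fn from(map: HashMap<String, &[f32]>) -> Self {
        // Convert each named slice into an owned Vector, keeping the names.
        Vectors::Named(
            map.into_iter()
                .map(|(name, slice)| (name, Vector::from(slice)))
                .collect(),
        )
    }
}

fn main() {
    let buffer = [0.1_f32, 0.2, 0.3, 0.4];
    let single = Vectors::from(&buffer[..2]);

    let mut named = HashMap::new();
    named.insert("text".to_string(), &buffer[..2]);
    named.insert("image".to_string(), &buffer[2..]);
    let named = Vectors::from(named);

    println!("{single:?}\n{named:?}");
}
```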

Motivation

When bulk uploading vectors from contiguous memory (Arrow FixedSizeListArray, NumPy arrays), the current API requires per-vector allocations:

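// Current: N allocations (one to_vec() per vector)
for row_idx in 0..num_rows {
    let start = row_idx * dim; // row-major layout: row i spans [i*dim, (i+1)*dim)
    let end = start + dim;
    let vector = slice[start..end].to_vec(); // Allocation per vector
    points.push(PointStruct::new(row_idx as u64, vector, payload.clone()));
}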

With this change:

// Optimized: Copy deferred to serialization
for row_idx in 0..num_rows {
    let start = row_idx * dim;
    let end = start + dim;
    let vector_slice = &slice[start..end]; // Just a slice reference, no allocation here
    points.push(PointStruct::new(row_idx as u64, vector_slice, payload.clone()));
}

The copy still happens during protobuf serialization but with better cache locality and reduced allocator contention.
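The `start`/`end` arithmetic in the loops above can also be written with the standard library's `slice::chunks_exact`, which yields borrowed `&[f32]` row views over a flat, row-major buffer without copying. This is a minimal sketch that assumes the buffer length is an exact multiple of the dimension. `rows` is a hypothetical helper, not part of qdrant-client.

```rust
/// Splits a flat, row-major embedding buffer into per-row slices.
/// No data is copied: every item borrows from `flat`.
fn rows(flat: &[f32], dim: usize) -> std::slice::ChunksExact<'_, f32> {
    assert!(
        dim > 0 && flat.len() % dim == 0,
        "buffer length must be a non-zero multiple of dim"
    );
    flat.chunks_exact(dim)
}

fn main() {
    let dim = 3;
    let flat: Vec<f32> = (0..12).map(|i| i as f32).collect(); // 4 rows x 3 dims
    for (row_idx, row) in rows(&flat, dim).enumerate() {
        // `row` is a &[f32] view that could be handed to a From<&[f32]> conversion.
        println!("row {row_idx}: {row:?}");
    }
}
```

Compared with manual indexing, `chunks_exact` cannot produce an out-of-range slice, and the assertion surfaces a mismatched dimension up front.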

Use Cases

ETL pipelines processing Parquet files with embedding columns
Bulk uploads with multiple named vector fields (text + image embeddings)
High-throughput scenarios with large dimensions (768+)
Memory-constrained environments where allocation overhead matters
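For the named-vector use case, each point needs a `HashMap<String, &[f32]>` built from several column buffers. This is a minimal sketch that assumes two flat, row-major buffers with different dimensions. `named_rows` and the field names `"text"` and `"image"` are hypothetical. Each resulting map could then be passed to the `From<HashMap<String, &[f32]>> for Vectors` conversion.

```rust
use std::collections::HashMap;

/// Builds one name -> slice map per point from two row-major buffers.
/// Every map borrows from the input buffers; no vector data is copied.
fn named_rows<'a>(
    text: &'a [f32],
    text_dim: usize,
    image: &'a [f32],
    image_dim: usize,
) -> Vec<HashMap<String, &'a [f32]>> {
    assert!(text_dim > 0 && image_dim > 0, "dimensions must be non-zero");
    assert!(
        text.len() % text_dim == 0 && image.len() % image_dim == 0,
        "buffer length must be a multiple of its dimension"
    );
    assert_eq!(text.len() / text_dim, image.len() / image_dim, "row counts differ");

    text.chunks_exact(text_dim)
        .zip(image.chunks_exact(image_dim))
        .map(|(t, i)| HashMap::from([("text".to_string(), t), ("image".to_string(), i)]))
        .collect()
}

fn main() {
    let text = [0.1_f32, 0.2, 0.3, 0.4]; // 2 rows x 2 dims
    let image = [1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2 rows x 3 dims
    for (row_idx, named) in named_rows(&text, 2, &image, 3).iter().enumerate() {
        println!("point {row_idx}: text={:?} image={:?}", named["text"], named["image"]);
    }
}
```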

Benefits

Performance: 4-17% faster bulk uploads with lower variance
Memory efficiency: Eliminates per-vector allocations (N → ~1 during serialization)
API ergonomics: More idiomatic Rust - pass &[f32] slices directly
Cache locality: Sequential memory access during serialization
Reduced allocator pressure: Especially valuable under high concurrency

Testing

Added comprehensive unit tests for all three implementations
Backward compatibility verified - existing code unchanged
All existing tests pass
Performance validated with real-world Arrow → Qdrant benchmark

Backward Compatibility

✅ Fully backward compatible - no breaking changes. Existing From<Vec<f32>> implementations unchanged.
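The compatibility argument rests on Rust trait resolution: adding a second `From` impl leaves existing calls that pass `Vec<f32>` resolving to the original impl. This is a minimal sketch with a hypothetical `Vector` stand-in and a hypothetical `make_point` function that, like the client's point constructors, accepts `impl Into<...>`.

```rust
// Hypothetical stand-in type, not qdrant-client's generated `Vector`.
#[derive(Debug, PartialEq)]
struct Vector {
    data: Vec<f32>,
}

// Pre-existing conversion: unchanged.
impl From<Vec<f32>> for Vector {
    fn from(data: Vec<f32>) -> Self {
        Vector { data }
    }
}

// New conversion added alongside it.
impl From<&[f32]> for Vector {
    fn from(slice: &[f32]) -> Self {
        Vector { data: slice.to_vec() }
    }
}

// Hypothetical constructor that accepts anything convertible into a Vector.
fn make_point(vector: impl Into<Vector>) -> Vector {
    vector.into()
}

fn main() {
    let owned = make_point(vec![1.0_f32, 2.0]); // existing call sites keep compiling
    let buf = [1.0_f32, 2.0];
    let borrowed = make_point(&buf[..]); // new: borrow straight from a buffer
    assert_eq!(owned, borrowed);
    println!("{owned:?}");
}
```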

Benchmark Methodology

The performance numbers above were generated using a real-world ETL scenario:

Generate Parquet file with FixedSizeList embeddings (384 dims)
Read with Apache Arrow and extract contiguous buffer
Upload to Qdrant with indexing disabled (bulk load best practice)
Measure throughput across 5 iterations
Compare Vec (current) vs Slice (optimized) approaches
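The methodology does not say how the ± figures were derived. Assuming they are the sample standard deviation of per-iteration throughput across the 5 runs, the summary step might look like this sketch. `throughput` and `mean_and_stddev` are hypothetical helpers, and the timings in `main` are made-up placeholders, not the PR's measurements.

```rust
use std::time::Duration;

/// Throughput in points per second for one upload run.
fn throughput(points: usize, elapsed: Duration) -> f64 {
    points as f64 / elapsed.as_secs_f64()
}

/// Mean and sample standard deviation (n - 1 denominator) of the samples.
fn mean_and_stddev(samples: &[f64]) -> (f64, f64) {
    assert!(samples.len() >= 2, "need at least two iterations");
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    // Hypothetical per-iteration wall-clock timings for 100K points.
    let timings_ms = [1_000, 1_100, 900, 1_050, 950];
    let samples: Vec<f64> = timings_ms
        .iter()
        .map(|&ms| throughput(100_000, Duration::from_millis(ms)))
        .collect();
    let (mean, sd) = mean_and_stddev(&samples);
    println!("{mean:.0} pts/sec (±{sd:.0})");
}
```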

timvisee · 2025-11-10T16:57:30Z

Thank you for the detailed description. This makes a lot of sense.

I'll try to review this tomorrow. And I'll make sure to merge this as part of the upcoming release.

timvisee

Thanks! I'm happy to see the added tests as well.

Enables creating vectors from borrowed slices to optimize bulk uploads from contiguous memory sources (Arrow, NumPy). Reduces per-vector allocations by deferring copy to serialization time.

Adds From implementations for:
- Vector from &[f32]
- Vectors from &[f32]
- Vectors from HashMap<String, &[f32]>

Particularly useful for ETL pipelines processing embeddings from Arrow FixedSizeListArray or NumPy arrays with named vector fields.

nazq · 2025-11-12T12:56:18Z

@timvisee Seems you got to the rebase before I did. Many thanks


timvisee · 2025-11-17T15:35:01Z

@nazq Good news. We've just released a new version of this client, which means your changes are now available in the stable release:

https://github.com/qdrant/rust-client/releases/tag/v1.16.0

nazq mentioned this pull request Nov 5, 2025

In case you look into Arrow -> Qdrant GeorgeLeePatterson/qdrant-datafusion#16 (Closed)

timvisee requested review from timvisee and xzfc November 10, 2025 16:56

timvisee added the release:1.16.0 label Nov 10, 2025

timvisee requested changes Nov 12, 2025


timvisee force-pushed the feat/vector-slice-conversions branch from 4bac9a6 to 8627b2a November 12, 2025 11:31

timvisee changed the base branch from master to dev November 12, 2025 11:31

timvisee approved these changes Nov 12, 2025


timvisee merged commit cf041ef into qdrant:dev Nov 12, 2025

timvisee merged 1 commit into qdrant:dev from nazq:feat/vector-slice-conversions
