feat: Expose Sketch Memory Footprint#136
Conversation
|
@tisonkun does this approach look OK to you? If so I can update the remaining sketches. |
tisonkun
left a comment
There was a problem hiding this comment.
Generally LGTM.
I have two comments:
-
Naming is important. We can spend a few more time on considering names. So far, I may prefer
estimated_bytesbecause (1) it clearly describes the unit is bytes (2) we may not always have the accurate size so leaving it as a hint should be better; this is what this info is typically used IMO. (Disclose: the nameestimated_bytesis used by ScopeDB for similar cases.) -
I'd ping @leerho @AlexanderSaydakov @proost @freakyzoidberg for information because this API can be helpful for other (lang) datasketches impls.
There was a problem hiding this comment.
Pull request overview
Adds a public size(&self) -> usize method that reports the in-memory footprint (stack + heap) of HllSketch and CpcSketch, addressing issue #135. Internal heap_size() helpers are added on the backing storage types (HLL Container, AuxMap, Array4/6/8, and CPC PairTable) and combined with std::mem::size_of::<Self>() at the sketch level.
Changes:
- Add
HllSketch::size()that dispatches onModeand sums backing-storage heap withsize_of::<Self>(). - Add
CpcSketch::size()that sumssliding_window+ optionalPairTableheap withsize_of::<Self>(). - Add
heap_size()helpers onContainer,AuxMap,Array4,Array6,Array8, andPairTable.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| datasketches/src/hll/sketch.rs | Adds public HllSketch::size() that aggregates per-mode heap size with size_of::<Self>(). |
| datasketches/src/hll/container.rs | Adds Container::heap_size() counting backing coupon slice. |
| datasketches/src/hll/aux_map.rs | Adds AuxMap::heap_size() counting backing entries slice. |
| datasketches/src/hll/array4.rs | Adds Array4::heap_size() summing main bytes and optional aux map. |
| datasketches/src/hll/array6.rs | Adds Array6::heap_size() returning packed-bytes length. |
| datasketches/src/hll/array8.rs | Adds Array8::heap_size() returning per-slot bytes length. |
| datasketches/src/cpc/sketch.rs | Adds public CpcSketch::size() summing sliding window and surprising-value-table heap. |
| datasketches/src/cpc/pair_table.rs | Adds PairTable::heap_size() counting slots storage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let heap_size = self.sliding_window.len() | ||
| + self | ||
| .surprising_value_table | ||
| .as_ref() | ||
| .map(|t| t.heap_size()) | ||
| .unwrap_or(0); |
| /// Returns the size of the heap allocations in bytes | ||
| pub fn heap_size(&self) -> usize { | ||
| self.slots.len() * std::mem::size_of::<u32>() | ||
| } |
|
Call it
|
Makes sense to me, just to make a couple of naming suggestions: |
A bit repeated? |
OK, I can push a commit with the rename. Just to make sure: we want both |
That's correct. |
Introduce a
size()method that returns the memory footprint of a sketch. Add implementations for HLL, CPC.If the approach is what we're looking for I can add implementations for all sketches, either here or in a follow up PR.
Addresses #135.