Use nodetool cfhistograms to find the keyspace/table with the highest latency.
Use nodetool toppartitions to identify the specific partition(s) responsible.
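A minimal Python sketch of the two commands, assuming nodetool is on the PATH of the node being checked. The helper names and the 10-second default sampling window are illustrative choices, not part of nodetool. toppartitions takes the keyspace, the table, and a sampling duration in milliseconds.

```python
import subprocess
from typing import List


def cfhistograms_cmd(keyspace: str, table: str) -> List[str]:
    # Per-table latency and partition-size histograms.
    return ["nodetool", "cfhistograms", keyspace, table]


def toppartitions_cmd(keyspace: str, table: str, duration_ms: int = 10_000) -> List[str]:
    # Samples the table for duration_ms milliseconds and reports the
    # most frequently accessed partitions. The default is an illustrative choice.
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return ["nodetool", "toppartitions", keyspace, table, str(duration_ms)]


def run(cmd: List[str]) -> str:
    # Runs the command and returns its stdout; raises CalledProcessError on failure.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, print(run(toppartitions_cmd("my_ks", "my_table", 5000))) samples for five seconds; my_ks and my_table are placeholders.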
Large Partition/Row/Cell Check
Large partitions, rows, and cells are detected at compaction time.
Search the system logs for Scylla reporting large partitions, rows, or cells.
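A sketch of the log search, assuming the warnings contain a phrase of the form "Writing large partition <keyspace>/<table>:" (likewise for row and cell). That message format is an assumption; confirm it against the logs of your Scylla version and adjust LARGE_DATA_RE if it differs.

```python
import re
from collections import Counter

# ASSUMPTION: large-data warnings look roughly like
#   "... Writing large partition my_ks/my_table: <key> (123456789 bytes) to <sstable>"
# Adjust the pattern to match the messages in your own logs.
LARGE_DATA_RE = re.compile(r"Writing large (partition|row|cell) ([^/\s]+)/([^:\s]+):")


def count_large_data(lines):
    """Count large partition/row/cell warnings per (kind, keyspace, table)."""
    counts = Counter()
    for line in lines:
        m = LARGE_DATA_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2), m.group(3))] += 1
    return counts
```

The input can be any iterable of lines, such as an open log file or the captured output of journalctl -u scylla-server; count_large_data(lines).most_common(10) then lists the worst offenders.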
Check the local system tables (each node records only what it found in its own SSTables):
select * from system.large_partitions;
select * from system.large_rows;
select * from system.large_cells;
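Once the rows are fetched (from cqlsh or a driver), a short Python sketch can rank them. It assumes each row is a mapping with the keyspace_name, table_name, partition_key and partition_size columns of system.large_partitions; verify the column names with DESCRIBE TABLE system.large_partitions on your version.

```python
from typing import Iterable, List, Mapping, Tuple


def largest_partitions(rows: Iterable[Mapping], n: int = 10) -> List[Tuple[str, str, str, int]]:
    """Return the n largest entries as (keyspace, table, partition_key, size_bytes)."""
    entries = [
        (r["keyspace_name"], r["table_name"], r["partition_key"], int(r["partition_size"]))
        for r in rows
    ]
    # Largest partitions first.
    entries.sort(key=lambda e: e[3], reverse=True)
    return entries[:n]
```

With the DataStax Python driver, setting session.row_factory = dict_factory (from cassandra.query) makes session.execute(...) return rows in this shape. Because the tables are node-local, repeat the query against each node.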
Single Node Check
Monitoring CPU / OS / I/O
Check the system logs for Scylla reporting:
Errors
Stalls
Large allocations / bad_alloc
Check the system log for OS-level errors (OOM killer, disk errors).
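A rough triage sketch for the log checks above. The patterns are assumptions based on common Seastar and Linux kernel messages ("Reactor stalled for N ms", "oversized allocation", "Out of memory: Killed process", "I/O error"); exact wording varies by version, so treat them as starting points rather than a complete list.

```python
import re
from collections import Counter

# ASSUMPTION: simplified patterns; real messages vary by Scylla, Seastar
# and kernel version, so check them against your own logs.
PATTERNS = {
    "stall": re.compile(r"Reactor stalled for \d+ ms"),
    "large_allocation": re.compile(r"oversized allocation"),
    "bad_alloc": re.compile(r"bad_alloc"),
    "oom_killer": re.compile(r"Out of memory: Kill|invoked oom-killer"),
    "disk_error": re.compile(r"I/O error"),
    "error": re.compile(r"\bERROR\b"),
}


def triage(lines):
    """Count log lines per category; one line may match several categories."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```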
Memory Management
Healthy System
Usually most of the memory is LSA (log-structured allocator) memory: cache and memtables.
When LSA memory drops, it usually means LSA had to evict data to make room for other allocations.
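A minimal sketch of spotting such drops, assuming you already collect periodic samples of LSA memory and total memory per shard (for example from your monitoring system). The input shape, the function names and the 10-percentage-point threshold are hypothetical.

```python
def lsa_share(lsa_bytes: int, total_bytes: int) -> float:
    """Fraction of memory currently held by LSA (cache + memtables)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return lsa_bytes / total_bytes


def find_lsa_drops(samples, min_drop=0.10):
    """Return indexes where the LSA share fell by at least min_drop
    compared with the previous sample.

    samples: sequence of (lsa_bytes, total_bytes) pairs ordered in time.
    min_drop: HYPOTHETICAL threshold, as a fraction of total memory.
    """
    drops = []
    prev = None
    for i, (lsa, total) in enumerate(samples):
        share = lsa_share(lsa, total)
        if prev is not None and prev - share >= min_drop:
            drops.append(i)
        prev = share
    return drops
```

A flagged index is a point worth correlating with the log checks above, since a sharp drop usually means something else needed the memory.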
Large Allocations
Scylla tries to optimise memory usage, and large (contiguous) allocations are bad:
They are costly to allocate: often many items must be freed before a large enough contiguous region is available (in the worst case, all of LSA must be evicted).
Large allocations are reported to the journal (like stalls).
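A toy model of why large contiguous allocations are expensive. Memory is treated as a row of pages, each either free or holding evictable LSA data; satisfying a k-page contiguous request means evicting every occupied page in some k-page window. This is a deliberate simplification, not how Seastar's allocator or LSA actually reclaim memory.

```python
def min_evictions_for_contiguous(pages, k):
    """Toy model: pages[i] is True if page i holds evictable (LSA) data.

    Returns the minimum number of occupied pages that must be evicted to
    obtain k contiguous free pages (sliding window over all positions),
    or None if k exceeds the number of pages.
    """
    n = len(pages)
    if k <= 0:
        return 0
    if k > n:
        return None
    occupied = sum(pages[:k])  # occupied pages in the first window
    best = occupied
    for i in range(k, n):
        # Slide the window one page right: add page i, drop page i - k.
        occupied += pages[i] - pages[i - k]
        best = min(best, occupied)
    return best
```

When every page is occupied, the answer equals k: min_evictions_for_contiguous([True] * 4, 4) returns 4, the toy equivalent of evicting all of LSA.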
Bad Allocs (bad_alloc)
Sometimes Scylla is unable to allocate memory (especially for a large allocation).
In some cases this is a transient issue; in others it needs further analysis.
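One hypothetical way to separate the two cases is to check whether bad_alloc errors cluster in time: a single isolated occurrence is more likely transient, while repeated occurrences suggest something worth analysing. The window and threshold below are arbitrary illustrative values, not Scylla guidance.

```python
def is_recurring(timestamps, window_s=3600.0, threshold=3):
    """HYPOTHETICAL heuristic: True if at least `threshold` bad_alloc events
    fall within any `window_s`-second window.

    timestamps: event times in seconds (e.g. epoch), in any order.
    """
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most window_s.
        while ts[end] - ts[start] > window_s:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

The timestamps would come from the bad_alloc lines found in the journal during the log checks above.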