PySort: A Fast and Simple Python Sorting Library
Overview
PySort is a lightweight Python library that provides fast, easy-to-use sorting utilities focused on clarity, performance, and a small memory footprint. It offers several well-tested algorithms (stable and in-place), convenient APIs for common use cases, and sensible defaults that produce good performance without complex tuning.
Key Features
- Multiple algorithms: quicksort-like hybrid, stable merge-based sort, and an in-place introsort variant.
- Simple API: one-line calls for common tasks, customizable comparator or key functions.
- Stable defaults: chooses a stable algorithm when stability matters; opts for faster in-place sorts when memory is constrained.
- Optimized for Python: minimizes Python-level overhead, uses Timsort-inspired techniques for real-world lists.
- Small footprint: pure-Python core with optional C-extension for large-scale workloads.
Installation
Install with pip:
Code
pip install pysort
Quick Start
Basic usage — sorting a list of numbers:
python
from pysort import sort

data = [5, 2, 9, 1, 5, 6]
sorted_data = sort(data)
# sorted_data -> [1, 2, 5, 5, 6, 9]
Sort in-place:
python
from pysort import sort_inplace

arr = [3, 1, 4]
sort_inplace(arr)
# arr -> [1, 3, 4]
Using a key function:
python
from pysort import sort

words = ["apple", "Banana", "cherry"]
sorted_words = sort(words, key=str.lower)  # case-insensitive
Custom comparator (for complex ordering):
python
from pysort import sort

def cmp(a, b):
    # basic numeric comparator: negative, zero, or positive
    return (a > b) - (a < b)

data = [5, 2, 9, 1, 5, 6]
sorted_data = sort(data, cmp=cmp)
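The standard library shows how a comparator-style ordering relates to a key function. The sketch below does not call PySort. It assumes a `cmp` argument behaves like a comparator wrapped with `functools.cmp_to_key`, which is an assumption about PySort rather than documented behavior. `by_length_then_alpha` is a hypothetical comparator used only for illustration.

```python
from functools import cmp_to_key

def by_length_then_alpha(a, b):
    """Hypothetical comparator: shorter strings first, ties broken alphabetically."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

words = ["pear", "fig", "apple", "kiwi", "date"]

# Built-in sorted() stands in for a cmp-aware sort; cmp_to_key adapts the comparator.
ordered = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(ordered)  # ['fig', 'date', 'kiwi', 'pear', 'apple']
```

A key function is usually cheaper because it runs once per element, while a comparator runs once per comparison.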
API Reference (selected)
- sort(iterable, *, key=None, cmp=None, reverse=False, stable=True)
- Returns a new sorted list. If both key and cmp are provided, cmp takes precedence.
- sort_inplace(list_obj, *, key=None, cmp=None, reverse=False, stable=False)
- Sorts a list in place. Defaults to a faster in-place algorithm; set stable=True to use the stable algorithm.
- partial_sort(iterable, k, *, key=None, reverse=False)
- Returns the k smallest (or largest if reverse=True) elements efficiently.
- merge_sorted(*iterables, key=None)
- Merges multiple already-sorted iterables into a single sorted iterator.
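The semantics described for partial_sort and merge_sorted can be approximated with the standard library's `heapq` module. The following is a minimal sketch, not PySort's implementation: `partial_sort_sketch` and `merge_sorted_sketch` are hypothetical stand-ins that only mirror the signatures listed above.

```python
import heapq

def merge_sorted_sketch(*iterables, key=None):
    """Lazily merge already-sorted iterables (stand-in built on heapq.merge)."""
    return heapq.merge(*iterables, key=key)

def partial_sort_sketch(iterable, k, *, key=None, reverse=False):
    """Return the k smallest items (or k largest if reverse=True), in sorted order."""
    pick = heapq.nlargest if reverse else heapq.nsmallest
    return pick(k, iterable, key=key)

merged = list(merge_sorted_sketch([1, 4, 9], [2, 3, 10], [0, 8]))
smallest = partial_sort_sketch([5, 2, 9, 1, 5, 6], 3)
largest = partial_sort_sketch([5, 2, 9, 1, 5, 6], 2, reverse=True)

print(merged)    # [0, 1, 2, 3, 4, 8, 9, 10]
print(smallest)  # [1, 2, 5]
print(largest)   # [9, 6]
```

The merge result is an iterator, so nothing is materialized until it is consumed; `list(...)` is used here only to display it.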
Performance Notes
- Default behavior mirrors practical workloads: Timsort-inspired run detection and galloping for nearly-sorted inputs.
- For large random arrays, the C-extension (if installed) provides a 2–4x speedup over the pure-Python core.
- Use partial_sort for top-k problems to avoid full sorts when k << n.
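Run detection is the reason nearly-sorted inputs are cheap for Timsort-style algorithms: the input is first split into already-ordered runs, and only those runs need merging. The sketch below is a simplified illustration, not PySort's code. It finds only non-decreasing runs, whereas real Timsort also reverses strictly descending runs and enforces a minimum run length. `natural_runs` is a hypothetical helper name.

```python
def natural_runs(seq):
    """Split seq into maximal non-decreasing runs (simplified run detection)."""
    runs = []
    start = 0
    for i in range(1, len(seq)):
        if seq[i] < seq[i - 1]:  # order breaks here, so the current run ends
            runs.append(seq[start:i])
            start = i
    if seq:
        runs.append(seq[start:])
    return runs

nearly_sorted = [1, 2, 3, 7, 4, 5, 6, 8, 9]
shuffled = [5, 1, 4, 2, 3]

print(len(natural_runs(nearly_sorted)))  # 2
print(len(natural_runs(shuffled)))       # 3
```

Fewer runs means fewer merges, which is why the nearly-sorted list needs less work than the shuffled one even though it is longer.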
Memory Usage
- In-place sorts use O(log n) auxiliary space (stack for recursion or iteration).
- Stable merge-based sorts use O(n) extra memory; the library automatically selects algorithms based on the stable flag and available memory.
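To make the O(n) cost of a stable merge-based sort concrete, here is a minimal top-down merge sort. It is an illustrative sketch and not PySort's implementation: the `merged` buffer is the extra memory, and taking from the left half on ties is what keeps the sort stable.

```python
def merge_sort(items, key=lambda x: x):
    """Top-down merge sort; returns a new list and preserves the order of equal keys."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)

    merged = []  # O(n) auxiliary buffer
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which preserves input order
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

pairs = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]
result = merge_sort(pairs, key=lambda p: p[1])
print(result)  # [('a', 1), ('d', 1), ('b', 2), ('c', 2)]
```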
When to Use PySort
- You want a drop-in, readable sorting library with sensible defaults.
- You need top-k utilities or efficient merge of sorted streams.
- You require explicit control over stability or memory usage without rewriting algorithms.
Example: Sorting Complex Records
python
from pysort import sort

records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 30},
]

# sort by age, then name
sorted_records = sort(records, key=lambda r: (r["age"], r["name"]))
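A tuple key works when every field sorts in the same direction. For mixed directions (for example oldest first, then name ascending), a common technique is two passes with a stable sort: sort by the secondary key first, then by the primary key. The sketch below uses the built-in `sorted()`, which is guaranteed stable, as a stand-in; whether PySort's `sort` with `stable=True` can be used the same way is an assumption.

```python
records = [
    {"name": "Charlie", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Alice", "age": 30},
]

# Pass 1: secondary key (name, ascending).
by_name = sorted(records, key=lambda r: r["name"])

# Pass 2: primary key (age, descending). Stability keeps names ordered within each age.
oldest_first = sorted(by_name, key=lambda r: r["age"], reverse=True)

names = [r["name"] for r in oldest_first]
print(names)  # ['Alice', 'Charlie', 'Bob']
```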
Advanced Tips
- For very large datasets that don’t fit in memory, use merge_sorted with chunked, externally-sorted files.
- Benchmark with your actual data; PySort’s hybrid strategies favor practical input patterns (partially-sorted, repeated keys).
- If stability is required (equal keys must keep their input order), pass stable=True so that the merge-based algorithm is used. sort defaults to stable=True; sort_inplace defaults to stable=False.
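The external-sort tip can be sketched with the standard library alone: sort fixed-size chunks in memory, write each chunk to a file, then merge the files lazily. This is a sketch under stated assumptions, not PySort code. `heapq.merge` stands in for merge_sorted, the one-integer-per-line file format and the helper names `write_sorted_chunks` and `merge_chunks` are hypothetical, and the input is a small in-memory list purely for demonstration.

```python
import heapq
import os
import tempfile

def write_sorted_chunks(values, chunk_size, directory):
    """Sort fixed-size chunks in memory and write each to its own file, one integer per line."""
    paths = []
    for start in range(0, len(values), chunk_size):
        chunk = sorted(values[start:start + chunk_size])
        path = os.path.join(directory, f"chunk_{len(paths)}.txt")
        with open(path, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        paths.append(path)
    return paths

def merge_chunks(paths):
    """Lazily merge sorted chunk files, reading them line by line."""
    files = [open(p) for p in paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()

data = [42, 7, 19, 3, 88, 1, 56, 23, 5]
with tempfile.TemporaryDirectory() as tmp:
    chunk_paths = write_sorted_chunks(data, chunk_size=3, directory=tmp)
    merged = list(merge_chunks(chunk_paths))

print(merged)  # [1, 3, 5, 7, 19, 23, 42, 56, 88]
```

In a real pipeline the merged stream would be written straight to an output file rather than collected into a list.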
Contributing and Roadmap
- Open-source on GitHub: issues and pull requests welcome.
- Planned: improved parallel sort for multi-core CPUs, more C optimizations, and additional helper utilities for streaming data.
License
PySort is MIT-licensed.
Example Benchmarks (rough)
- Small lists (n < 1000): comparable to built-in sorted().
- Medium lists (1k–1M): 10–30% faster on real-world inputs due to run detection.
- Very large lists (>1M) with C-extension: 2–4x faster than the pure-Python core.
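Numbers like these depend heavily on the data, so it is worth measuring on your own inputs. The harness below is a minimal sketch using `timeit` from the standard library. It times only the built-in `sorted()`, and `best_time` is a hypothetical helper. To compare another sort function, pass it in place of `sorted`, assuming it accepts a list and returns the sorted result.

```python
import random
import timeit

def best_time(sort_fn, data, repeats=5):
    """Best wall-clock time in seconds for sort_fn on a fresh copy of data.

    The list copy is included in the measurement, so compare like with like.
    """
    return min(timeit.repeat(lambda: sort_fn(list(data)), number=1, repeat=repeats))

random.seed(0)
n = 50_000
base = [random.random() for _ in range(n)]

# Nearly-sorted workload: a sorted list with a small number of random swaps.
nearly = sorted(base)
for _ in range(100):
    i, j = random.randrange(n), random.randrange(n)
    nearly[i], nearly[j] = nearly[j], nearly[i]

workloads = {"random": base, "nearly_sorted": nearly}
results = {name: best_time(sorted, data) for name, data in workloads.items()}

for name, seconds in results.items():
    print(f"{name:>14}: {seconds * 1e3:.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes; absolute times will vary by machine.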