Build Custom Sorts with PySort: Use Cases and Examples

PySort: A Fast and Simple Python Sorting Library

Overview

PySort is a lightweight Python library that provides fast, easy-to-use sorting utilities focused on clarity, performance, and a small memory footprint. It offers several well-tested algorithms, convenient APIs for common use cases, and sensible defaults that perform well without complex tuning.

Key Features

  • Multiple algorithms: quicksort-like hybrid, stable merge-based sort, and an in-place introsort variant.
  • Simple API: one-line calls for common tasks, customizable comparator or key functions.
  • Stable defaults: chooses a stable algorithm when stability matters; opts for faster in-place sorts when memory is constrained.
  • Optimized for Python: minimizes Python-level overhead, uses Timsort-inspired techniques for real-world lists.
  • Small footprint: pure-Python core with optional C-extension for large-scale workloads.

Installation

Install with pip:

Code

pip install pysort

Quick Start

Basic usage — sorting a list of numbers:

python

from pysort import sort

data = [5, 2, 9, 1, 5, 6]
sorted_data = sort(data)
# sorted_data -> [1, 2, 5, 5, 6, 9]

Sort in-place:

python

from pysort import sort_inplace

arr = [3, 1, 4]
sort_inplace(arr)
# arr -> [1, 3, 4]

Using a key function:

python

from pysort import sort

words = ["apple", "Banana", "cherry"]
sorted_words = sort(words, key=str.lower)  # case-insensitive

Custom comparator (for complex ordering):

python

from pysort import sort

def cmp(a, b):
    # basic numeric comparator: negative, zero, or positive
    return (a > b) - (a < b)

data = [5, 2, 9, 1, 5, 6]
sorted_data = sort(data, cmp=cmp)
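
For comparison, the standard library handles comparator-style ordering by wrapping the comparator with functools.cmp_to_key and passing it as a key. The article does not say whether PySort does the same internally, so the sketch below uses only the built-in sorted(); the comparator name is made up for illustration.

```python
from functools import cmp_to_key

def compare_by_length_then_alpha(a: str, b: str) -> int:
    """Classic comparator: negative if a sorts first, positive if b does, 0 on a tie."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

words = ["pear", "fig", "apple", "kiwi", "date"]

# cmp_to_key turns the two-argument comparator into a key object
# that key-based sorts such as sorted() can use.
ordered = sorted(words, key=cmp_to_key(compare_by_length_then_alpha))
print(ordered)  # ['fig', 'date', 'kiwi', 'pear', 'apple']
```

A comparator is called many more times than a key function, so a plain key (here, lambda w: (len(w), w)) is usually the cheaper choice when the ordering can be expressed that way.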

API Reference (selected)

  • sort(iterable, *, key=None, cmp=None, reverse=False, stable=True)
    • Returns a new sorted list. If both key and cmp are provided, cmp takes precedence.
  • sort_inplace(list_obj, *, key=None, cmp=None, reverse=False, stable=False)
    • Sorts a list in place. Defaults to the faster in-place algorithm; set stable=True to use the stable algorithm.
  • partial_sort(iterable, k, *, key=None, reverse=False)
    • Returns the k smallest (or largest if reverse=True) elements efficiently.
  • merge_sorted(*iterables, key=None)
    • Merges multiple already-sorted iterables into a single sorted iterator.
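
To make the semantics of partial_sort and merge_sorted concrete, here is a sketch of the closest standard-library analogs, heapq.nsmallest/nlargest and heapq.merge. This shows the behaviour the descriptions above suggest, not PySort's own implementation.

```python
import heapq

scores = [42, 7, 19, 88, 3, 61]

# Top-k without sorting the whole list (analog of partial_sort).
smallest_three = heapq.nsmallest(3, scores)  # [3, 7, 19]
largest_two = heapq.nlargest(2, scores)      # [88, 61]

# Lazily merge inputs that are each already sorted (analog of merge_sorted).
merged = list(heapq.merge([1, 4, 9], [2, 3, 10], [0, 8]))

# A key function applies to the merge order as well.
merged_words = list(
    heapq.merge(["apple", "Cherry"], ["Banana", "date"], key=str.lower)
)

print(smallest_three, largest_two)
print(merged)
print(merged_words)
```

heapq.merge returns an iterator and never holds more than one pending item per input, which is what makes this style of merge suitable for streams.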

Performance Notes

  • Default behavior mirrors practical workloads: Timsort-inspired run detection and galloping for nearly-sorted inputs.
  • For large random arrays, the C-extension (if installed) provides a roughly 2–4x speedup over the pure-Python core.
  • Use partial_sort for top-k problems to avoid full sorts when k << n.
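
Run detection means scanning the input for stretches that are already ordered, so that only the boundaries between them need merging. The helper below is a hypothetical illustration (count_natural_runs is not a PySort function): it counts maximal non-decreasing or strictly decreasing runs, which gives a rough feel for how much work a run-aware sort can skip.

```python
def count_natural_runs(values):
    """Count maximal runs that are non-decreasing or strictly decreasing."""
    n = len(values)
    runs = 0
    i = 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n and values[j] < values[i]:
            # Strictly decreasing run; a run-aware sort would reverse it in place.
            while j < n and values[j] < values[j - 1]:
                j += 1
        else:
            # Non-decreasing run (equal neighbours are allowed).
            while j < n and values[j] >= values[j - 1]:
                j += 1
        i = j  # the next run starts where this one ended
    return runs

print(count_natural_runs([1, 2, 3, 4, 5]))              # 1
print(count_natural_runs([5, 4, 3, 2, 1]))              # 1
print(count_natural_runs([1, 2, 3, 9, 8, 7, 4, 5, 6]))  # 3
```

Fewer runs relative to the list length suggests the input is closer to the nearly-sorted case where run-based strategies pay off.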

Memory Usage

  • In-place sorts use O(log n) auxiliary space (stack for recursion or iteration).
  • Stable merge-based sorts use O(n) extra memory; the library automatically selects algorithms based on the stable flag and available memory.
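
The O(n) cost of a stable merge-based sort comes from the buffer that holds merged output. The following is a minimal textbook top-down merge sort, written for illustration only and not taken from PySort, that makes both the buffer and the tie-breaking rule visible.

```python
def merge_sort(items, key=lambda x: x):
    """Return a new, stably sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)

    merged = []  # auxiliary buffer: grows to len(items), hence O(n) extra memory
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes the left element on ties, so equal keys keep input order.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

pairs = [("b", 1), ("a", 1), ("c", 0)]
print(merge_sort(pairs, key=lambda p: p[1]))
# [('c', 0), ('b', 1), ('a', 1)]
```

Changing "<=" to "<" would still sort correctly but would no longer be stable, which is the kind of detail the stable flag is meant to hide.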

When to Use PySort

  • You want a drop-in, readable sorting library with sensible defaults.
  • You need top-k utilities or efficient merge of sorted streams.
  • You require explicit control over stability or memory usage without rewriting algorithms.

Example: Sorting Complex Records

python

from pysort import sort

records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 30},
]

# sort by age, then name
sorted_records = sort(records, key=lambda r: (r["age"], r["name"]))
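
A tuple key works when every field sorts in the same direction. When directions differ (say, oldest first but names ascending), a common approach is to sort in several passes and rely on stability. The sketch below uses the built-in sorted(), which is guaranteed stable; it is an assumption that pysort's sort with stable=True would give the same result.

```python
from operator import itemgetter

records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 30},
]

# Pass 1: order by the secondary key (name, ascending).
by_name = sorted(records, key=itemgetter("name"))

# Pass 2: order by the primary key (age, descending). Because the sort is
# stable, records with equal ages keep the name order from pass 1.
oldest_first = sorted(by_name, key=itemgetter("age"), reverse=True)

names = [r["name"] for r in oldest_first]
print(names)  # ['Alice', 'Charlie', 'Bob']
```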

Advanced Tips

  • For very large datasets that don’t fit in memory, use merge_sorted with chunked, externally-sorted files.
  • Benchmark with your actual data; PySort’s hybrid strategies favor practical input patterns (partially-sorted, repeated keys).
  • If elements with equal keys must keep their input order, pass stable=True to sort_inplace so that the merge-based algorithm is used; sort already defaults to stable=True.
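
The chunked external sort mentioned in the first tip can be sketched as follows. This version uses heapq.merge from the standard library as a stand-in for merge_sorted, and the function name, chunk size, and one-integer-per-line file format are all assumptions made for the example.

```python
import heapq
import tempfile
from itertools import islice

def external_sort(numbers, chunk_size):
    """Yield the integers in `numbers` in sorted order via sorted temp-file chunks."""
    chunk_files = []
    iterator = iter(numbers)
    try:
        while True:
            # Only one chunk is held in memory at a time.
            chunk = sorted(islice(iterator, chunk_size))
            if not chunk:
                break
            f = tempfile.TemporaryFile(mode="w+")
            f.writelines(f"{n}\n" for n in chunk)
            f.seek(0)
            chunk_files.append(f)
        # Each file is already sorted, so a k-way merge yields the global order.
        streams = [map(int, f) for f in chunk_files]
        yield from heapq.merge(*streams)
    finally:
        for f in chunk_files:
            f.close()

print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
# [1, 2, 3, 5, 7, 8, 9]
```

During the merge phase the memory use depends on the number of chunks rather than the total number of items, which is the property that matters for data that does not fit in RAM.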

Contributing and Roadmap

  • Open-source on GitHub: issues and pull requests welcome.
  • Planned: improved parallel sort for multi-core CPUs, more C optimizations, and additional helper utilities for streaming data.

License

PySort is MIT-licensed.

Example Benchmarks (rough)

  • Small lists (n < 1000): comparable to built-in sorted().
  • Medium lists (1k–1M): roughly 10–30% faster on real-world inputs due to run detection.
  • Very large lists (>1M) with C-extension: 2–4x faster vs pure-Python.
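
Figures like these depend heavily on the data, so it is worth measuring on your own inputs. Below is a minimal timing harness built on timeit; it times the built-in sorted() as a baseline, and any other callable with the same shape (for example pysort's sort, if installed) could be passed to best_time in its place. The helper name and data sizes are illustrative choices.

```python
import random
import timeit

def best_time(func, data, repeats=5):
    """Return the fastest of `repeats` single runs of func on a fresh copy of data."""
    # Copying inside the timed call keeps every run on identical, unsorted input;
    # the copy cost is included in the measurement for all candidates alike.
    timer = timeit.Timer(lambda: func(list(data)))
    return min(timer.repeat(repeat=repeats, number=1))

random.seed(0)
random_data = [random.random() for _ in range(10_000)]

# Nearly sorted input: a sorted list with two elements swapped.
nearly_sorted = sorted(random_data)
nearly_sorted[100], nearly_sorted[9_000] = nearly_sorted[9_000], nearly_sorted[100]

results = {
    "random": best_time(sorted, random_data),
    "nearly sorted": best_time(sorted, nearly_sorted),
}
for label, seconds in results.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers will still vary between machines.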
