Methodology¶
CreativeDynamics Library v0.9.8.1
The library applies techniques from rough path theory to time-series data. Core concept: compute the mathematical signature of a data path over sliding windows, then measure the distance between consecutive signatures; unusually large distances mark change points.
Rough Path Signatures¶
Rough path signature: a mathematical object that captures the geometric features of a path (time series) through a hierarchy of iterated integrals, which this library represents via Lie increments. It provides a rich, non-linear summary of how the path evolves, which makes it a useful feature-extraction tool.
Key Properties:
Invariant under re-parameterisation: Depends on the geometric shape of the path, not on traversal speed
Faithful representation: Under mild conditions, uniquely determines the path up to tree-like equivalences
Universal approximation: Linear functionals of the truncated signature can approximate any continuous function on a compact set of paths to arbitrary accuracy
Mathematical Foundation: The library computes signatures from Lie increments rather than full tensor products, for computational efficiency whilst maintaining mathematical rigour. For a d-dimensional path X:[0,T] → ℝ^d, the signature S(X) is computed as a sequence of Lie algebra elements encoding geometric features.
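For reference, the standard textbook definitions behind this paragraph can be written as follows. This is general rough path theory rather than anything taken from the library's source; the truncation depth N corresponds to the depth parameter described in this document.

```latex
% Signature of X on [0,T]: the sequence of iterated integrals
S(X)_{0,T} = \bigl(1,\ S^{(1)},\ S^{(2)},\ \dots\bigr),
\qquad
S^{(k)} = \int_{0 < t_1 < \cdots < t_k < T}
          dX_{t_1} \otimes \cdots \otimes dX_{t_k}
          \in (\mathbb{R}^d)^{\otimes k}.

% Depth-N truncation keeps levels 1..N only
S^{\le N}(X)_{0,T} = \bigl(1,\ S^{(1)},\ \dots,\ S^{(N)}\bigr).

% The log-signature lies in the free Lie algebra over R^d,
% which is why Lie elements give a more compact encoding
\log S(X)_{0,T} \in \mathcal{L}\bigl((\mathbb{R}^d)\bigr).
```

For a single straight-line segment with increment Δ, the log-signature is Δ itself (a level-1 Lie element), which is what makes increment-based representations compact.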
Implementation Details:
Uses the roughpy library with Lie increment computation
Paths normalised to [0,1] interval before signature computation
Signature depth controls geometric detail level (default depth=4)
Computational complexity: O(T·d²) for fixed window size w
Signature Calculation¶
Uses roughpy library for path signature calculation:
Accuracy: Well-tested library providing correct signature computations
Efficiency: Optimised C++ backend for performance
Standardisation: Standard, community-accepted tool
Primary module: creativedynamics.core.signature_calculator.
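To make the idea of a signature concrete without depending on roughpy, here is a minimal pure-Python sketch that computes the first two signature levels of a piecewise-linear path using Chen's identity. It is an illustration only: the function name `signature_depth2` is hypothetical, the depth is fixed at 2 rather than the library's default of 4, and it is not how `creativedynamics.core.signature_calculator` is implemented.

```python
def signature_depth2(path):
    """Return (level1, level2) of the signature of a piecewise-linear path.

    `path` is a sequence of d-dimensional points. `level1` is a list of
    length d (total increments); `level2` is a d x d nested list of
    second-order iterated integrals.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two points")
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity: append one linear segment to the running signature.
        # Level 2 must be updated before level 1, because it uses the
        # level-1 value accumulated up to the start of this segment.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# L-shaped path: one unit right, then one unit up.
level1, level2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
# The antisymmetric part of level 2 is the Levy area (signed area
# between the path and its chord); here it is 0.5.
levy_area = 0.5 * (level2[0][1] - level2[1][0])
```

The level-2 terms depend on the order in which the coordinates move, which is the kind of geometric information a plain mean or variance does not capture.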
Path Construction and Normalisation¶
A fixed normalisation procedure keeps signature computation numerically stable and consistent across metrics:
Two-dimensional Path Construction: For each metric, constructs 2D path X(t) = (t_norm, y_norm) where:
t_norm ∈ [0,1]: normalised time coordinate
y_norm ∈ [0,1]: normalised metric value
Normalisation Procedure:
t_norm = (t - t_min) / (t_max - t_min)
y_norm = (y - y_min) / (y_max - y_min + ε)
where ε = 10^-8 prevents division by zero for constant metrics.
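The normalisation formulas above can be sketched in plain Python as follows. The function name `normalise_path` and the decision to raise an error when all timestamps are identical are assumptions for illustration; only the two formulas and ε = 10^-8 come from this document.

```python
def normalise_path(times, values, eps=1e-8):
    """Build the 2D path (t_norm, y_norm) from raw times and metric values."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    t_min, t_max = min(times), max(times)
    y_min, y_max = min(values), max(values)
    if t_max == t_min:
        # The time formula has no epsilon, so a zero time span is undefined.
        raise ValueError("need at least two distinct time points")
    t_span = t_max - t_min
    y_span = y_max - y_min + eps  # eps guards against constant metrics
    return [
        ((t - t_min) / t_span, (y - y_min) / y_span)
        for t, y in zip(times, values)
    ]


path = normalise_path([0, 1, 2], [10.0, 20.0, 30.0])
flat = normalise_path([0, 1, 2], [5.0, 5.0, 5.0])  # constant metric maps to y_norm = 0
```

Because of ε, the largest y_norm is very slightly below 1 rather than exactly 1.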
Signature Parameters:
Depth: Truncation level of the signature, i.e. the highest level of Lie increments retained (default=4)
Window Size (w): Consecutive data points per window (default=7)
Sliding Step: Windows slide by one time point for detailed analysis
Normalisation ensures signatures from different time periods and metrics are comparable, essential for distance-based change point detection.
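The window size and sliding step described above amount to the following windowing scheme. This is a sketch with a hypothetical helper name, `sliding_windows`; the defaults mirror the values stated in this section (w=7, step of one point).

```python
def sliding_windows(points, window_size=7, step=1):
    """Return overlapping windows of `window_size` consecutive points."""
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive")
    last_start = len(points) - window_size
    # If the series is shorter than one window, the range is empty.
    return [points[i:i + window_size] for i in range(0, last_start + 1, step)]


windows = sliding_windows(list(range(10)))
# A series of length T yields T - w + 1 windows when step is 1.
```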
Signature Distance and Change Point Detection¶
Sliding window approach detects changes in time series patterns:
Window-based Signature Computation: For time series of length T, computes signatures for overlapping windows of size w.
Distance Calculation: Euclidean distance between consecutive window signatures:
d_t = ||S_t - S_{t-1}||_2
where S_t is the signature of window t.
Statistical Thresholding: Change points detected when distance exceeds:
threshold = μ_d + k·σ_d
where μ_d and σ_d are mean and standard deviation of all distances, k is threshold multiplier (default k=1.5).
Computational Efficiency: Overall complexity O(T·d²) for fixed window size w, efficient for real-time analysis.
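The distance and thresholding steps in this section can be sketched as below, taking the per-window signatures as already-computed flat vectors. The function name `detect_change_points`, the use of the population standard deviation, and the convention of reporting the index of the later window are assumptions; the document specifies only the Euclidean distance and the threshold μ_d + k·σ_d with k=1.5.

```python
import math
from statistics import fmean, pstdev


def detect_change_points(signatures, k=1.5):
    """Flag windows whose signature is far from the previous window's.

    `signatures` is a list of equal-length numeric vectors, one per window.
    Returns (change_points, distances, threshold).
    """
    if len(signatures) < 2:
        raise ValueError("need at least two window signatures")
    # d_t = ||S_t - S_{t-1}||_2 for consecutive windows
    distances = [math.dist(a, b) for a, b in zip(signatures, signatures[1:])]
    # Assumption: population standard deviation over all distances
    threshold = fmean(distances) + k * pstdev(distances)
    # distances[i] compares window i with window i + 1; report window i + 1
    change_points = [i + 1 for i, dist in enumerate(distances) if dist > threshold]
    return change_points, distances, threshold


# Toy 1-D "signatures" with a single jump between windows 3 and 4
sigs = [[0.0], [0.1], [0.2], [0.3], [5.3], [5.4], [5.5], [5.6]]
change_points, distances, threshold = detect_change_points(sigs)
```

Because the mean and standard deviation are computed over the whole series, the threshold is global: a single very large jump raises it and can mask smaller changes elsewhere.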
Applications of Signatures in the Library¶
Primary built-in application within creativedynamics.core.analyzer module: change-point detection.
Four-Phase Analysis Process¶
Detailed four-phase analysis pipeline:
Phase 1: Change Point Detection
Computes sliding window signatures across time series
Calculates signature distances between consecutive windows
Identifies statistically significant change points using adaptive thresholding
Output: List of change points segmenting time series
Phase 2: Segment Analysis
Divides time series into segments based on detected change points
Computes segment statistics (mean, variance, trend)
Classifies segment trends as ‘Stable’, ‘Improving’, or ‘Declining’
Output: Characterised segments with trend classifications
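A minimal sketch of Phase 2, assuming the trend is classified from the least-squares slope of each segment. The names `analyse_segments`, `slope_tol` and `higher_is_better`, and the tolerance value, are hypothetical; the document states only that segments receive mean, variance and trend statistics and one of the three labels.

```python
from statistics import fmean, pvariance


def ols_slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def analyse_segments(values, change_points, slope_tol=0.01, higher_is_better=True):
    """Split `values` at `change_points` and characterise each segment."""
    bounds = [0, *sorted(change_points), len(values)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        if end <= start:
            continue  # skip empty segments from duplicate or edge change points
        chunk = values[start:end]
        slope = ols_slope(chunk)
        if abs(slope) <= slope_tol:
            trend = "Stable"
        elif (slope > 0) == higher_is_better:
            trend = "Improving"
        else:
            trend = "Declining"
        segments.append({
            "start": start, "end": end,
            "mean": fmean(chunk), "variance": pvariance(chunk),
            "slope": slope, "trend": trend,
        })
    return segments


series = [1, 1, 1, 1, 2, 3, 4, 5, 5, 4, 3, 2]
segments = analyse_segments(series, change_points=[4, 8])
```

For a cost metric such as cpc, where lower is better, `higher_is_better=False` flips the meaning of a rising slope.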
Phase 3: Benchmark Calculation
Identifies longest stable or improving segment
Computes benchmark values from optimal performance periods
Validates benchmark reliability based on segment duration
Output: Benchmark values for impact calculation
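Phase 3 can be sketched as picking the longest segment labelled Stable or Improving and checking its duration. The function name `select_benchmark`, the use of the segment mean as the benchmark value, and the `min_length` reliability cut-off are all assumptions made for illustration; the document does not specify them.

```python
def select_benchmark(segments, min_length=7):
    """Pick a benchmark from the longest stable or improving segment.

    Each segment is a dict with "start", "end" (exclusive), "trend" and "mean".
    Returns None when no segment qualifies.
    """
    candidates = [s for s in segments if s["trend"] in ("Stable", "Improving")]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: s["end"] - s["start"])
    length = best["end"] - best["start"]
    return {
        "value": best["mean"],            # assumption: benchmark = segment mean
        "start": best["start"],
        "end": best["end"],
        "length": length,
        "reliable": length >= min_length,  # hypothetical duration check
    }


example_segments = [
    {"start": 0, "end": 4, "trend": "Stable", "mean": 1.0},
    {"start": 4, "end": 14, "trend": "Improving", "mean": 3.5},
    {"start": 14, "end": 30, "trend": "Declining", "mean": 2.0},
]
benchmark = select_benchmark(example_segments)
```

The declining segment is the longest in this example but is excluded, since only stable or improving periods qualify as a benchmark.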
Phase 4: Impact Quantification
Calculates impact during declining periods
Quantifies actual_overspend_gbp (financial inefficiency) and engagement_gap_clicks (operational impact)
Provides correlation risk context; metrics are reported separately and not combined
Output: Operational and financial impact of performance degradation (reported separately)
Implemented in creativedynamics.core.analyzer module with configurable parameters for each phase.
Visual Representation¶
Visual reports for change-point analysis include:
Upper Chart: Original time-series metric(s)
Lower Chart: Calculated signature distances over time with significance threshold line and vertical markers for detected change points
Theoretical Properties and Advantages¶
Signature-based approach provides theoretical guarantees and practical advantages:
Theoretical Properties:
Consistency: Change point detection is statistically consistent under mild conditions
Convergence: Signature distances converge to true pattern distance as window size increases
Invariance: Detection invariant to monotonic time transformations
Practical Advantages:
Early Detection: Captures subtle pattern changes before manifesting in aggregate metrics
Non-Linearity: Naturally handles non-linear dynamics and complex interactions
Robustness: Resistant to outliers due to integral-based computation
Interpretability: Signature distances have clear geometric interpretation
Performance Characteristics:
Precision-Recall Trade-off: Controlled by threshold multiplier k
Default Settings: k=1.5 provides balanced precision (~0.7) and recall (~0.6)
Computational Efficiency: Linear in time series length for fixed window size
Multi-Dimensionality¶
Multi-dimensionality enters the method in two places:
Path Dimensionality: Input data is often multi-dimensional (e.g., time, metric A, metric B)
Signature Dimensionality: Signature is a high-dimensional vector (or tensor), where each term captures different aspects of path geometry
The multi-dimensional approach allows a more detailed characterisation of a time series than methods that analyse each metric in isolation or consider only simple trends.
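To give a sense of how signature dimensionality grows, the sketch below counts the terms in a truncated signature (levels 1 to N of the tensor algebra) and in the corresponding Lie (log-signature) representation, using Witt's formula for the free Lie algebra. These are standard counting formulas; the function names are hypothetical and the code does not inspect roughpy or the library.

```python
def signature_dim(d, depth):
    """Number of tensor terms in levels 1..depth for a d-dimensional path."""
    return sum(d ** k for k in range(1, depth + 1))


def mobius(n):
    """Moebius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def lie_dim(d, depth):
    """Dimension of the free Lie algebra on d generators up to `depth`."""
    total = 0
    for n in range(1, depth + 1):
        # Witt's formula: (1/n) * sum over divisors k of n of mu(k) * d^(n/k)
        total += sum(
            mobius(k) * d ** (n // k) for k in range(1, n + 1) if n % k == 0
        ) // n
    return total


# 2D path (t_norm, y_norm) at the default depth of 4
tensor_terms = signature_dim(2, 4)
lie_terms = lie_dim(2, 4)
```

For the 2D paths and default depth used here, the tensor form has 30 terms and the Lie form 8, which illustrates why a Lie-based representation is more compact.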
Data Preparation and Column Naming Conventions¶
The library converts all column names to lowercase throughout the processing pipeline, which removes case-sensitivity issues and simplifies the codebase:
Column Transformation: Column names transformed to lowercase immediately after CSV ingestion
Consistent Usage: Only lowercase column names used throughout entire analysis pipeline
Key Column Names: Standard names include day, link_clicks, amount_spent_gbp, impressions, cpc, and ctr
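The lowercase convention can be illustrated with the standard library's csv module. This is a sketch of the idea only: the helper name `read_lowercase_rows` and the mixed-case headers in the sample data are hypothetical, and the library's own ingestion code may work differently.

```python
import csv
import io


def read_lowercase_rows(csv_text):
    """Parse CSV text and return rows keyed by lowercase column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reading `fieldnames` consumes the header row; reassigning it makes
    # every subsequent row use the lowercase keys.
    reader.fieldnames = [name.strip().lower() for name in (reader.fieldnames or [])]
    return list(reader)


sample = "Day,Link_Clicks,Amount_Spent_GBP\n2024-01-01,10,5.50\n"
rows = read_lowercase_rows(sample)
```

Values are left as strings here; type conversion would be a separate step.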
Library Entry Points¶
Two primary entry points for analysis:
CLI Entry Point (cli.py): Recommended for production use. Uses YAML configuration files with nested column mapping structure for flexible and maintainable configuration.
Script Entry Point (run_analysis.py): Alternative entry point using flat JSON mapping files. Maintained for backward compatibility but may be deprecated in future versions.
For new implementations, CLI entry point with YAML configuration is the standard approach.