Routinely collected data have been increasingly used to evaluate and monitor long-term opioid therapy (LTOT) patterns, yet there is very little guidance on how to measure LTOT from these data sources. We conducted a systematic review of studies published between January 2000 and July 2019 to catalogue LTOT definitions, the rationale for those definitions, and LTOT rates in observational research using routinely collected data in non-surgical settings. We screened 4,056 abstracts and 209 full-text manuscripts and included 126 studies, most of them from the US (82%) and published between 2015 and 2019 (68%). We identified 78 definitions of LTOT, commonly operationalised as 90 days of use within a year (21%). Studies often used multiple criteria to derive definitions (63%), mostly based on measures of duration such as supply days or days of use (67%), episode length (21%), or prescription fills within specified time periods (9%). Definitions were based on previous publications (63%), clinical judgment (17%), or empirical data (3%); 10% of studies applied more than one definition. Fourteen studies did not describe their LTOT definition in enough detail for replication, and 37 studies did not specify the opioids evaluated. Rates of LTOT ranged from 0.2% to 57%, depending on study design, population and definition used. We observed a substantial rise in studies evaluating LTOT, with large variability in the definitions used and poor reporting of the rationale for, and implementation of, those definitions. This variation affects research reproducibility, the comparability of findings, and the development of strategies aiming to curb therapy that is not guideline-recommended.
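
To illustrate how much a single definition leaves open, the following is a minimal sketch of one possible reading of "90 days of use within a year" applied to dispensing records. It is not taken from any reviewed study. The function name `meets_ltot`, the input layout (pairs of fill date and days' supply), the choice to start the one-year window at the first fill, and the decision to count each calendar day only once when fills overlap are all assumptions made for illustration; other studies could reasonably choose differently on each point.

```python
from datetime import date, timedelta


def meets_ltot(fills, min_days=90, window_days=365):
    """Return True if one patient's fills cover at least `min_days`
    distinct calendar days within `window_days` of the first fill.

    fills: list of (fill_date, days_supply) tuples (hypothetical layout).
    Assumptions: the window starts at the first fill, and overlapping
    supply is counted once per calendar day rather than summed.
    """
    if not fills:
        return False

    fills = sorted(fills)
    index_date = fills[0][0]  # assumed window start: first dispensing
    window_end = index_date + timedelta(days=window_days)

    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            # Ignore supply that extends past the end of the window.
            if day < window_end:
                covered.add(day)

    return len(covered) >= min_days


# Usage: three 30-day fills with short gaps give 90 covered days.
example = [
    (date(2018, 1, 1), 30),
    (date(2018, 2, 1), 30),
    (date(2018, 3, 5), 30),
]
print(meets_ltot(example))
```

Changing any one assumption (summing supply instead of counting distinct days, using a rolling window, or requiring continuous use) can reclassify the same patient, which is one way definitional variation can feed into the wide range of reported rates.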