Propensity score methods are widely used in comparative effectiveness research based on claims data. In this context, inaccurate procedural or billing codes frequently misclassify patients into treatment groups; that is, the treatment assignment ($T$) is often measured with error. In the context of validation data in which treatment assignment is accurately recorded, we show that misclassification of treatment assignment can impact three distinct stages of a propensity score analysis: (i) propensity score estimation; (ii) propensity score implementation; and (iii) outcome analysis conducted conditional on the estimated propensity score and its implementation. We examine how the error in $T$ impacts each stage in the context of three common propensity score implementations: subclassification, matching, and inverse probability of treatment weighting (IPTW). Using validation data, we propose a two-step likelihood-based approach that fully adjusts for treatment misclassification bias under subclassification. This approach relies on two common measurement error assumptions: nondifferential measurement error and transportability of the measurement error model. We use simulation studies to assess the performance of the adjustment under subclassification, and also investigate the method’s performance under matching and IPTW. We apply the methods to Medicare Part A hospital claims data to estimate the effect of resection versus biopsy on 1-year mortality among $10\,284$ Medicare beneficiaries diagnosed with brain tumors. The ICD-9 billing codes from Medicare Part A inaccurately reflect surgical treatment, but SEER-Medicare validation data are available with more accurate information.
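To give a sense of why error in $T$ matters at the outcome-analysis stage, the following minimal sketch computes how a difference in mean outcomes between treatment groups is attenuated when a binary treatment is misclassified nondifferentially. It is a deliberately simplified illustration, not the two-step likelihood-based adjustment proposed here: it assumes a single propensity score subclass with no residual confounding, and the function names and the sensitivity, specificity, and prevalence values are hypothetical.

```python
def attenuation_factor(prevalence, sensitivity, specificity):
    """Return PPV + NPV - 1 for a misclassified binary treatment.

    Under nondifferential misclassification (the error does not depend on
    the outcome), the naive difference in mean outcomes between observed
    treatment groups equals this factor times the true difference.
    Assumes a single stratum with no confounding (illustrative only).
    """
    # Probability that a patient is recorded as treated.
    p_recorded_treated = (sensitivity * prevalence
                          + (1.0 - specificity) * (1.0 - prevalence))
    # Positive predictive value: truly treated given recorded as treated.
    ppv = sensitivity * prevalence / p_recorded_treated
    # Negative predictive value: truly untreated given recorded as untreated.
    npv = specificity * (1.0 - prevalence) / (1.0 - p_recorded_treated)
    return ppv + npv - 1.0


def naive_difference(true_difference, prevalence, sensitivity, specificity):
    """Expected naive difference in means when recorded treatment is used."""
    return attenuation_factor(prevalence, sensitivity, specificity) * true_difference


if __name__ == "__main__":
    # Hypothetical values: half the patients treated, 90% sensitivity and
    # specificity, true risk difference of -0.10.
    factor = attenuation_factor(0.5, 0.9, 0.9)
    print(f"attenuation factor: {factor:.3f}")  # roughly 0.8 for these inputs
    print(f"naive difference:   {naive_difference(-0.10, 0.5, 0.9, 0.9):.3f}")
```

With perfect coding (sensitivity and specificity both equal to 1) the factor is 1 and the naive contrast is unbiased in this simplified setting; any misclassification shrinks it toward zero. In a full propensity score analysis the error in $T$ also affects the estimated propensity score and its implementation, which this sketch does not capture.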