A leading AI lab published a paper describing new interpretability techniques for large language models. The work maps features inside the model's internal representations to observable output behaviors.
Independent researchers are now replicating the results. The findings could inform auditing practices for production deployments.
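To make the idea of "mapping internal features to observable behaviors" concrete, here is a minimal sketch of one common interpretability approach: a linear probe trained on hidden-layer activations. The paper's actual method is not described here, so the model name, probe layer, toy texts, and behavior labels below are illustrative assumptions only, not the published technique.

```python
# Minimal linear-probing sketch: does a direction in a hidden layer predict a behavior?
# The model ("distilgpt2"), probe layer, and toy sentiment-like labels are assumptions
# for illustration; they are not taken from the paper discussed above.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilgpt2"   # assumed small model, chosen to keep the sketch runnable
LAYER = 3                   # assumed hidden layer to probe

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy dataset: input texts paired with a binary "behavior" label.
texts = [
    ("I love this, it works perfectly.", 1),
    ("Absolutely wonderful experience.", 1),
    ("This is terrible and broken.", 0),
    ("I hate how this behaves.", 0),
]

def hidden_features(text: str) -> np.ndarray:
    """Mean-pooled hidden state at LAYER for a single input text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of [1, seq_len, dim] tensors, one per layer plus embeddings.
    return outputs.hidden_states[LAYER][0].mean(dim=0).numpy()

X = np.stack([hidden_features(t) for t, _ in texts])
y = np.array([label for _, label in texts])

# The probe: if a simple linear classifier separates the labels from activations alone,
# some internal direction at this layer correlates with the observable behavior.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe training accuracy:", probe.score(X, y))
```

A toy set this small will fit trivially; real probing studies evaluate on held-out data and compare against control tasks before claiming that a feature direction genuinely tracks a behavior, which is the kind of evidence an auditing workflow would rely on.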