Datasets
Open datasets released by the UW Math AI Lab. All datasets are hosted on Hugging Face.
TheoremSearch
About →uw-math-ai/theorem-search-dataset
cc-by-sa-4.01.3M+ permissively licensed theorem statements extracted from arXiv and seven mathematical sources, with slogans generated by DeepSeek V3 and Qwen3-Embedding-8B vectors.
uw-math-ai/theorem-search-dataset-permissive
cc-by-4.01.2M+ subset of theorem-search-dataset, strictly covering sources with open or CC-compatible licensing. Suitable for commercial use.
TheoremGraph
About →uw-math-ai/math-graph
cc-by-4.0Unified statement-level dependency graph: 11.7M informal arXiv statements with 18.3M dependency edges, and 388k formal Lean declarations with 11.3M typed edges across six semantic categories.
uw-math-ai/theorem-matching
cc-by-sa-4.047,952 GPT-5.4-affirmed matches linking informal arXiv statements to their formal Lean counterparts, all above a 0.8 cosine similarity floor.