Research Papers & Books

Research Papers

Date	Keywords	Institute	Paper
1955-08	AI Proposal	Dartmouth College	A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
2017-06	Transformers	Google	Attention Is All You Need
2018-06	GPT 1.0	OpenAI	Improving Language Understanding by Generative Pre-Training
2018-10	BERT	Google	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019-02	GPT 2.0	OpenAI	Language Models are Unsupervised Multitask Learners
2019-09	Megatron-LM	NVIDIA	Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2019-10	T5	Google	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2019-10	ZeRO	Microsoft	ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2020-01	Scaling Law	OpenAI	Scaling Laws for Neural Language Models
2020-05	GPT 3.0	OpenAI	Language Models are Few-Shot Learners
2021-01	Switch Transformers	Google	Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021-08	Codex	OpenAI	Evaluating Large Language Models Trained on Code
2021-08	Foundation Models	Stanford	On the Opportunities and Risks of Foundation Models
2021-09	Zero-Shot Prompting	Google	Finetuned Language Models Are Zero-Shot Learners
2021-10	T0	HuggingFace et al.	Multitask Prompted Training Enables Zero-Shot Task Generalization
2021-12	GLaM	Google	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
2021-12	WebGPT	OpenAI	WebGPT: Browser-assisted question-answering with human feedback
2021-12	Retro	DeepMind	Improving language models by retrieving from trillions of tokens
2021-12	Gopher	DeepMind	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
2022-01	CoT	Google	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022-01	LaMDA	Google	LaMDA: Language Models for Dialog Applications
2022-01	Minerva	Google	Solving Quantitative Reasoning Problems with Language Models
2022-01	Megatron-Turing NLG	Microsoft&NVIDIA	Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
2022-03	InstructGPT	OpenAI	Training language models to follow instructions with human feedback
2022-04	PaLM	Google	PaLM: Scaling Language Modeling with Pathways
2022-04	Chinchilla	DeepMind	Training Compute-Optimal Large Language Models
2022-05	OPT	Meta	OPT: Open Pre-trained Transformer Language Models
2022-05	UL2	Google	Unifying Language Learning Paradigms
2022-06	Emergent Abilities	Google	Emergent Abilities of Large Language Models
2022-06	BIG-bench	Google	Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
2022-06	MetaLM	Microsoft	Language Models are General-Purpose Interfaces
2022-06	JEPA	Meta	A Path Towards Autonomous Machine Intelligence
2022-09	Sparrow	DeepMind	Improving alignment of dialogue agents via targeted human judgements
2022-10	Flan-T5/PaLM	Google	Scaling Instruction-Finetuned Language Models
2022-10	GLM-130B	Tsinghua	GLM-130B: An Open Bilingual Pre-trained Model
2022-11	HELM	Stanford	Holistic Evaluation of Language Models
2022-11	BLOOM	BigScience	BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
2022-11	Galactica	Meta	Galactica: A Large Language Model for Science
2022-12	OPT-IML	Meta	OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
2023-01	Flan 2022 Collection	Google	The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
2023-02	LLaMA	Meta	LLaMA: Open and Efficient Foundation Language Models
2023-02	Kosmos-1	Microsoft	Language Is Not All You Need: Aligning Perception with Language Models
2023-03	LRU	DeepMind	Resurrecting Recurrent Neural Networks for Long Sequences
2023-03	PaLM-E	Google	PaLM-E: An Embodied Multimodal Language Model
2023-03	GPT 4	OpenAI	GPT-4 Technical Report
2023-04	LLaVA	UW-Madison&Microsoft	Visual Instruction Tuning
2023-04	Pythia	EleutherAI et al.	Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
2023-05	Dromedary	CMU et al.	Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
2023-05	PaLM 2	Google	PaLM 2 Technical Report
2023-05	RWKV	Bo Peng	RWKV: Reinventing RNNs for the Transformer Era
2023-05	DPO	Stanford	Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023-05	ToT	Google&Princeton	Tree of Thoughts: Deliberate Problem Solving with Large Language Models
2023-07	LLaMA2	Meta	Llama 2: Open Foundation and Fine-Tuned Chat Models
2023-10	Mistral 7B	Mistral	Mistral 7B
2023-12	Mamba	CMU&Princeton	Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2024-02	OLMo	Ai2	OLMo: Accelerating the Science of Language Models
2024-05	DeepSeek-V2	DeepSeek	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
2024-05	Mamba2	CMU&Princeton	Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
2024-05	Llama3	Meta	The Llama 3 Herd of Models
2024-06	FineWeb	HuggingFace	The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
2024-09	OLMoE	Ai2	OLMoE: Open Mixture-of-Experts Language Models
2024-12	Qwen2.5	Alibaba	Qwen2.5 Technical Report
2024-12	DeepSeek-V3	DeepSeek	DeepSeek-V3 Technical Report
2025-01	DeepSeek-R1	DeepSeek	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2025-02	Research Platform	OpenReview	OpenReview - open peer-review platform used for paper submission and reviewing by major conferences (ICLR, NeurIPS, and others)
2025-03	Preprint Repository	Cornell University	arXiv (cs.LG/cs.CL) - the primary source of cutting-edge preprints in AI/LLMs
2025-04	Benchmarks & SOTA	Meta AI	Papers with Code - tracking papers, benchmarks, and reproducible state-of-the-art results
2025-05	NLP Proceedings	ACL	ACL Anthology - official repository of NLP/LLM papers from ACL, EMNLP, NAACL, and more
2025-06	Research Discovery	Allen Institute for AI	Semantic Scholar - indexing, discovery, and tracking of relevant AI research papers
2025-07	AI Conference	ICLR	ICLR (International Conference on Learning Representations) - one of the top annual conferences in ML/LLMs with high impact
2025-08	AI Conference	ICML	ICML (International Conference on Machine Learning) - a leading annual global conference in ML and foundation models
2025-08	Research post	Hugging Face	The Smol Training Playbook
2025-09	AI Conference	NeurIPS Foundation	NeurIPS (Conference on Neural Information Processing Systems) - a leading annual conference for ML and AI research
2025-10	AI Conference	AAAI	AAAI Conference on Artificial Intelligence - a leading annual international AI conference
2025-11	AI Conference	ACL	ACL (Annual Meeting of the Association for Computational Linguistics) - a flagship conference for NLP and LLM research
2025-12	AI Conference	EMNLP	EMNLP - a top annual conference in applied NLP and recent LLM advances
2026-01	AI Conference	NAACL	NAACL - a top-tier recurring conference in NLP and language models
2026-02	AI Conference	IJCAI	IJCAI (International Joint Conference on Artificial Intelligence) - a long-running, internationally recognized general AI conference
2026-02	Agent Skills	Anthropic	Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality
2026-03	LeWorldModel	Yann LeCun	LeWorldModel: Stable End-to-End JEPA from Pixels
2026-05	Agentic Search	Zhuofeng Li	Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
2026-06	Research post	OpenAI	Dreaming: Better memory for a more helpful ChatGPT
2026-06	Research post	Thariq Shihipar (Anthropic)	The Unreasonable Effectiveness of HTML

AI Books

Book	Description	Publication Date
LLM Engineer's Handbook (Paul Iusztin & Maxime Labonne)	Production-focused guide to RAG, evaluation, deployment, observability, and optimization for real-world AI systems.	2024
AI Engineering (Chip Huyen)	Practical guide to building and shipping reliable applications with foundation models.	2025
Designing Machine Learning Systems (Chip Huyen)	End-to-end treatment of the ML lifecycle, from data and modeling to deployment, monitoring, and scaling.	2022
Building LLMs for Production (Louis-François Bouchard & Louie Peters)	Focuses on architecture, evaluation, latency, reliability, and deployment for customer-facing LLM products.	2024
Build a Large Language Model (From Scratch) (Sebastian Raschka)	Hands-on walkthrough of tokenization, embeddings, transformers, training pipelines, and inference in PyTorch.	2024
Hands-On Large Language Models (Jay Alammar & Maarten Grootendorst)	Project-oriented coverage of embeddings, fine-tuning, retrieval, prompt design, evaluation, and deployment.	2024
Prompt Engineering for LLMs (John Berryman)	Advanced prompting methods including Chain-of-Thought, ReAct, few-shot prompting, and optimization patterns.	2024
Building Agentic AI Systems (Anjanava Anand)	Guide to agent architectures, tool use, memory, planning, orchestration, and multi-agent workflows.	2025
Prompt Engineering for Generative AI (James Phoenix & Mike Taylor)	Practical frameworks to improve reliability and quality of outputs across modern generative AI applications.	2024
The AI Engineering Bible Comprehensive AI Engineering Reference	Broad reference on AI engineering workflows, tools, deployment strategies, and production best practices.	2025