GitHub CodeQL is a semantic code analysis engine that treats code as queryable data.
The tool builds a database representation of your codebase, enabling sophisticated queries that track data flow across functions, files, and modules.
Natively integrated into GitHub Advanced Security, CodeQL powers code scanning for millions of repositories.
What is CodeQL?
CodeQL works differently from pattern-matching SAST tools.
Rather than searching for text patterns, CodeQL compiles source code into a relational database that captures the semantic structure: variables, functions, control flow, data flow, and type information.
Security researchers then write queries in the CodeQL query language to find vulnerabilities by describing the characteristics of insecure code patterns.
This approach enables detection of complex vulnerabilities that span multiple files and function calls.
For example, CodeQL can trace user input from an HTTP request through multiple transformation functions to a SQL query, identifying injection vulnerabilities that pattern-based tools miss.
Key Features
Semantic Code Analysis
CodeQL understands code structure rather than just text patterns.
The analysis engine builds a complete database including:
- Abstract syntax trees for every file
- Control flow graphs showing execution paths
- Data flow graphs tracking value propagation
- Type hierarchies and inheritance relationships
- Call graphs connecting function invocations
This semantic understanding enables queries that ask questions like “find all paths from user input to database queries” rather than simple pattern matches.
Data Flow and Taint Tracking
The taint tracking engine follows potentially dangerous data through your codebase.
Starting from sources (user input, file reads, network data) and ending at sinks (database queries, command execution, file writes), CodeQL identifies paths where untrusted data reaches sensitive operations without proper sanitization.
/**
 * @name SQL injection from user input
 * @kind path-problem
 * @problem.severity error
 * @id java/example/sql-injection
 */
import java
import semmle.code.java.security.SqlInjectionQuery
// Path queries need the PathGraph module so results can show the full source-to-sink path.
import QueryInjectionFlow::PathGraph

from QueryInjectionFlow::PathNode source, QueryInjectionFlow::PathNode sink
where QueryInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection vulnerability from $@.",
  source.getNode(), "user input"
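Conceptually, the source-to-sink search described above is a reachability problem over a data flow graph. The following Python sketch is only a toy model of that idea and not how CodeQL is implemented; the graph, the node names, and the `find_tainted_paths` helper are all hypothetical and exist purely for illustration.

```python
from collections import deque


def find_tainted_paths(edges, sources, sinks, sanitizers=frozenset()):
    """Return every cycle-free path from a source to a sink that avoids sanitizers.

    `edges` maps a node name to the list of nodes its value flows into.
    """
    paths = []
    for source in sources:
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for nxt in edges.get(node, ()):
                # Values that pass through a sanitizer are treated as clean.
                # Nodes already on the path are skipped to avoid looping forever.
                if nxt in sanitizers or nxt in path:
                    continue
                queue.append(path + [nxt])
    return paths


# Hypothetical data flow graph for a small web handler.
FLOW = {
    "request.getParameter": ["trim", "escapeSql"],
    "trim": ["buildQuery"],
    "escapeSql": ["buildSafeQuery"],
    "buildQuery": ["executeQuery"],
    "buildSafeQuery": ["executeQuery"],
}

if __name__ == "__main__":
    tainted = find_tainted_paths(
        FLOW,
        sources={"request.getParameter"},
        sinks={"executeQuery"},
        sanitizers={"escapeSql"},
    )
    for path in tainted:
        print(" -> ".join(path))
```

Only the path through `trim` is reported, because the branch through the assumed `escapeSql` sanitizer is cut off before it reaches the sink. A real analysis works on a graph derived from the program itself rather than a hand-written dictionary.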
Custom Query Development
Security teams can write custom CodeQL queries for organization-specific security requirements.
The query language, QL, is a declarative, object-oriented logic language with an SQL-like `from`/`where`/`select` syntax, which makes it approachable for developers familiar with database queries.
Common use cases for custom queries:
- Detecting use of banned functions or deprecated APIs
- Enforcing authentication checks on sensitive endpoints
- Finding missing input validation patterns
- Identifying violations of internal security standards
GitHub Native Integration
Once code scanning is enabled for a GitHub repository, CodeQL runs through GitHub Actions.
Results appear directly in pull requests as security alerts, allowing developers to fix issues before merging.
The integration includes:
- Automatic analysis on push and pull request events
- Inline annotations showing vulnerability locations
- Suggested fixes for common vulnerability patterns
- Security overview dashboards for organizations
Installation and Setup
GitHub Repository Setup
Enable CodeQL scanning through repository settings or by adding a workflow file.
# .github/workflows/codeql.yml
name: "CodeQL"
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1' # Weekly Monday 6 AM
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write
    strategy:
      fail-fast: false
      matrix:
        language: ['java', 'javascript', 'python']
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: security-and-quality
      - name: Autobuild
        uses: github/codeql-action/autobuild@v3
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{ matrix.language }}"
Local CLI Installation
For local development and custom query testing, install the CodeQL CLI.
# Download and extract CodeQL CLI
wget https://github.com/github/codeql-cli-binaries/releases/latest/download/codeql-linux64.zip
unzip codeql-linux64.zip
# Add to PATH
export PATH="$PATH:$(pwd)/codeql"
# Verify installation
codeql --version
# Clone standard query packs
git clone https://github.com/github/codeql.git codeql-queries
Creating a Database
# Create database for a Java project
codeql database create my-java-db \
  --language=java \
  --source-root=/path/to/project \
  --command="./gradlew build"
# Create database for Python (no build needed)
codeql database create my-python-db \
  --language=python \
  --source-root=/path/to/project
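When several databases have to be created from a script, it can help to build the command line programmatically. The sketch below assembles the same `codeql database create` invocations shown above; the `database_create_command` helper and the `PROJECTS` list are hypothetical, and only the `--language`, `--source-root`, and `--command` flags from the examples above are used.

```python
import shlex


def database_create_command(db_name, language, source_root, build_command=None):
    """Build the argument list for `codeql database create`."""
    args = [
        "codeql", "database", "create", db_name,
        f"--language={language}",
        f"--source-root={source_root}",
    ]
    # Only pass --command when a build command is supplied,
    # as in the Java example; the Python example needs none.
    if build_command:
        args.append(f"--command={build_command}")
    return args


# Hypothetical project list mirroring the two commands above.
PROJECTS = [
    ("my-java-db", "java", "/path/to/project", "./gradlew build"),
    ("my-python-db", "python", "/path/to/project", None),
]

if __name__ == "__main__":
    for project in PROJECTS:
        # Print rather than execute; pass the list to
        # subprocess.run(args, check=True) to actually run it.
        print(shlex.join(database_create_command(*project)))
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems when a build command contains spaces.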
Running Queries
# Run security queries against database
codeql database analyze my-java-db \
  codeql-queries/java/ql/src/Security \
  --format=sarif-latest \
  --output=results.sarif
# Run a specific query
codeql query run \
  codeql-queries/java/ql/src/Security/CWE/CWE-089/SqlInjection.ql \
  --database=my-java-db
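The `results.sarif` file written by `codeql database analyze` is plain JSON, so it can be post-processed with a short script. The sketch below flattens a SARIF 2.1.0 log into one record per finding and counts findings per rule. The helper names are hypothetical, and the script reads only the standard SARIF fields `runs`, `results`, `ruleId`, `message.text`, and `physicalLocation`, tolerating any that are missing.

```python
import json
import sys
from collections import Counter


def summarize_sarif(sarif):
    """Flatten a SARIF log (already parsed into a dict) into one record per result."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result may carry no location; fall back to an empty one.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append({
                "rule": result.get("ruleId", "<unknown>"),
                "file": physical.get("artifactLocation", {}).get("uri", "<unknown>"),
                "line": physical.get("region", {}).get("startLine"),
                "message": result.get("message", {}).get("text", ""),
            })
    return findings


def count_by_rule(findings):
    """Count how many findings each rule produced."""
    return Counter(f["rule"] for f in findings)


def main(argv):
    if len(argv) < 2:
        print("usage: summarize_sarif.py results.sarif")
        return
    with open(argv[1], encoding="utf-8") as fh:
        findings = summarize_sarif(json.load(fh))
    for f in findings:
        print(f'{f["file"]}:{f["line"]}: [{f["rule"]}] {f["message"]}')
    for rule, count in count_by_rule(findings).most_common():
        print(f"{count:4d}  {rule}")


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a name such as `summarize_sarif.py` (an arbitrary choice), it would be run as `python summarize_sarif.py results.sarif`.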
Integration
GitHub Actions (Advanced)
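name: CodeQL Advanced Analysis
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  analyze:
    runs-on: ubuntu-latest
    # Required so the analyze step can upload results to code scanning
    permissions:
      actions: read
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: java
          queries: +security-extended,security-and-quality
          config-file: .github/codeql/codeql-config.yml
      - name: Build with Maven
        run: mvn clean package -DskipTests
      # The analyze step uploads its results to GitHub code scanning itself;
      # `output` additionally keeps the SARIF files in a directory of the workspace.
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          output: sarif-results
      # upload-sarif sends SARIF *to* GitHub, so it is not the way to export results.
      # Publishing the SARIF directory as an artifact makes it available to third-party tools.
      - name: Save SARIF for third-party tools
        uses: actions/upload-artifact@v4
        with:
          name: codeql-sarif
          path: sarif-results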
GitLab CI
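codeql-analysis:
  stage: security
  # Assumption: $CODEQL_IMAGE is a CI variable you define, pointing at your own image
  # with the CodeQL CLI bundle on PATH. github/codeql-action is a GitHub Action,
  # not a container image.
  image: $CODEQL_IMAGE
  script:
    - codeql database create db --language=python --source-root=.
    - codeql database analyze db --format=sarif-latest --output=codeql-results.sarif
    # Optional upload to GitHub code scanning. Requires GITHUB_TOKEN in the environment
    # and a copy of the repository on GitHub; $GITHUB_REPOSITORY (owner/name) is an
    # assumed variable you set yourself.
    - codeql github upload-results --repository="$GITHUB_REPOSITORY" --ref="refs/heads/$CI_COMMIT_BRANCH" --commit="$CI_COMMIT_SHA" --sarif=codeql-results.sarif
  artifacts:
    # GitLab's reports:sast expects GitLab's own JSON report format, not SARIF,
    # so the file is kept as a plain artifact.
    paths:
      - codeql-results.sarif
  rules:
    - if: $CI_COMMIT_BRANCH == "main"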
When to Use CodeQL
CodeQL excels at finding complex vulnerabilities that require understanding program semantics.
The data flow analysis catches injection vulnerabilities, authentication bypasses, and security logic flaws that pattern-based tools miss.
Consider CodeQL when you need:
- Deep semantic analysis beyond simple pattern matching
- Custom security rules for organization-specific requirements
- Native GitHub integration with pull request annotations
- Taint tracking across function and file boundaries
Teams whose code is not hosted on GitHub should expect more setup work than GitHub-hosted projects, because they must run the CLI and handle the results in their own CI system.
The query language has a learning curve, though the standard query packs cover most common vulnerability types without custom development.
For organizations requiring commercial support or additional languages, alternatives like Semgrep or Checkmarx may be worth evaluating alongside CodeQL.
Note: CodeQL code scanning replaces LGTM.com, which was shut down in December 2022 after its functionality moved into GitHub code scanning.
