This tool rewrites a coding DNA sequence using synonymous codon substitutions to reduce
predicted cryptic splice sites while leaving the encoded amino acid sequence unchanged.
Two prediction back-ends are available, SpliceAI and ASSP; choose one via the toggle in
the input panel.
What is SpliceAI?
SpliceAI is a deep convolutional neural network trained on annotated human pre-mRNA
transcripts to predict, at single-nucleotide resolution, the probability that any given
position is a splice donor or splice acceptor. Unlike rule-based tools, it takes
long-range sequence context into account: the 10k model reads 10,000 nt of flanking
sequence around every query position (5,000 nt on each side), so splice signals in the
surrounding plasmid sequence also influence the prediction.
How the prediction pipeline works
- Plasmid ring. Your plasmid GenBank file is treated as a circular sequence. The ORF is
virtually inserted at the coordinate you specify, and the plasmid is "unrolled" around
that point so that the flanking plasmid context is always available, even when the
insertion sits near the start or the end of the sequence.
- 10 k window. A slice of exactly FLANK (10 000 nt) + ORF + FLANK (10 000 nt) is cut out
of the ring. This is the input to the CNN: the ORF surrounded by 10 kb of real plasmid
sequence on each side.
- One-hot encoding → CNN. Each nucleotide is converted to a 4-channel one-hot vector
(A/C/G/T). The resulting tensor is fed through the SpliceAI-10000 residual
dilated-convolution network, which outputs two per-position probability tracks:
P(donor) and P(acceptor).
- Splice risk score. Only the positions inside the ORF are inspected. The optimizer takes
risk = max( max P(donor), max P(acceptor) ) across those positions as a single scalar
representing the worst predicted cryptic site in the insert.
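The window, encoding and risk steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the CNN call itself is left out, non-ACGT symbols are assumed to encode as all-zero vectors, and a plasmid shorter than the flank is assumed to simply wrap around more than once.

```python
FLANK = 10_000  # flank length per side, as stated in the text


def circular_window(plasmid: str, orf: str, insert_pos: int, flank: int = FLANK) -> str:
    """Return flank + ORF + flank, wrapping around the circular plasmid.

    insert_pos is 1-based: the first ORF base is placed at that position, so
    everything before it is upstream and everything from it on is downstream.
    """
    n = len(plasmid)
    cut = (insert_pos - 1) % n
    # Modular indexing walks around the ring, so a cut near either end of the
    # linear GenBank record still gets full-length flanks.
    upstream = "".join(plasmid[(cut - flank + i) % n] for i in range(flank))
    downstream = "".join(plasmid[(cut + i) % n] for i in range(flank))
    return upstream + orf + downstream


def one_hot(seq: str) -> list[list[float]]:
    """Encode each base as a 4-channel vector in A/C/G/T order."""
    channel = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in channel:  # assumption: anything else stays all-zero
            row[channel[base]] = 1.0
        rows.append(row)
    return rows


def splice_risk(p_donor, p_acceptor, orf_start: int, orf_len: int) -> float:
    """risk = max(max P(donor), max P(acceptor)) over ORF positions only.

    Both tracks are assumed to be aligned to the same coordinate system,
    with the ORF occupying [orf_start, orf_start + orf_len).
    """
    end = orf_start + orf_len
    return max(max(p_donor[orf_start:end]), max(p_acceptor[orf_start:end]))
```

For example, `circular_window("ACGTACGTAC", "ATGAAA", insert_pos=1, flank=3)` takes its upstream flank from the end of the record, which is the wrap-around behaviour the "plasmid ring" step describes.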
What inputs to provide
- Amino acid sequence: single-letter codes (e.g. MKVQVE…). No stop codon needed. The tool
reverse-translates this to DNA using human-optimised codon frequencies before starting.
- Plasmid GenBank file (.gbk): the backbone into which the ORF will be cloned. Multiple
files can be stored in the browser for quick switching between experiments. The file is
cached locally and never sent to any server.
- Insertion coordinate (1-based): the nucleotide position on the plasmid where the first
base of the ORF will be inserted. This determines which flanking sequence the CNN sees
and is critical for accurate predictions.
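As a rough illustration of the reverse-translation step, the sketch below maps each residue to a single codon. The table is an assumption: it lists one frequently used human codon per amino acid, whereas the tool's actual frequency table is not given here, and the name `reverse_translate` is hypothetical.

```python
# Illustrative table only: one commonly preferred human codon per residue.
# The frequencies the tool really uses are not specified in this text.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(protein: str) -> str:
    """Turn single-letter amino acid codes into a starting DNA sequence."""
    protein = "".join(protein.split()).upper()  # drop whitespace and line breaks
    unknown = sorted(set(protein) - set(PREFERRED_CODON))
    if unknown:
        # Stop symbols and ambiguity codes are rejected rather than guessed.
        raise ValueError(f"unsupported residue(s): {unknown}")
    return "".join(PREFERRED_CODON[aa] for aa in protein)


print(reverse_translate("MKVQVE"))
```

The result is only the starting point for the optimizer, which may later swap any of these codons for a synonym.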
How the optimizer works
Starting from the reverse-translated sequence, the algorithm iterates:
- Locate the top splice-risk hotspot(s) inside the ORF.
- Tier 1: try all synonymous codons at the hotspot codon (C0); score each with SpliceAI.
- Tier 2: if Tier 1 finds no improvement, widen the search to a ±1 codon window
(C−1, C0, C+1) and evaluate synonymous combinations.
- Tier 3: if still stuck, extend to ±2 codons (C−2 … C+2).
- Accept the best-scoring change; repeat from the first step.
The optimizer stops when the risk drops below the threshold (0.05 by default), no
synonymous change in any tier can further reduce it, or the iteration limit is reached.
All evaluated sequences are cached to avoid revisiting the same codon pattern.
Codon choices are additionally weighted by the human Codon Adaptation Index (CAI) so
that high-expression codons are preferred when multiple synonyms give equal splice risk.
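The loop described above can be sketched as follows. This is a simplified illustration, not the tool's code: the `score` callback (returning a risk and the nucleotide index of the hotspot) is a hypothetical interface, only a single hotspot is handled per iteration, and the CAI tie-breaking is left out. `toy_score` is a stand-in for SpliceAI that merely flags one donor-like motif.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def optimize(seq, score, threshold=0.05, max_iter=50):
    """Greedy three-tier synonymous search; score(seq) -> (risk, hotspot_index)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    cache = {}  # every evaluated sequence is scored at most once

    def cached(cs):
        key = "".join(cs)
        if key not in cache:
            cache[key] = score(key)
        return cache[key]

    for _ in range(max_iter):
        risk, hotspot = cached(codons)
        if risk < threshold:
            break
        c0 = hotspot // 3  # codon containing the hotspot nucleotide
        best = None
        for radius in (0, 1, 2):  # Tier 1, Tier 2, Tier 3
            lo = max(1, c0 - radius)  # index 0 (start codon) is never touched
            hi = min(len(codons) - 1, c0 + radius)
            options = [SYNONYMS[CODON_TO_AA[c]] for c in codons[lo:hi + 1]]
            for combo in product(*options):
                trial = codons[:lo] + list(combo) + codons[hi + 1:]
                r, _ = cached(trial)
                if r < risk and (best is None or r < best[0]):
                    best = (r, trial)
            if best is not None:
                break  # a tier improved the risk; do not widen further
        if best is None:
            break  # no synonymous change in any tier helps
        codons = best[1]
    return "".join(codons), cached(codons)[0]


def toy_score(seq):
    """Stand-in scorer: high risk if a donor-like GGTAAG motif is present."""
    pos = seq.find("GGTAAG")
    return (0.9, pos + 2) if pos >= 0 else (0.0, 0)


print(optimize("ATGGGTAAGGCC", toy_score))
```

Tier 3 can enumerate several thousand combinations for residues with six codons, which is why caching matters once each evaluation is a real model call.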
What is ASSP?
ASSP (Alternative Splice Site Predictor) is a rule-based web service that scores
candidate splice sites using position weight matrices. It is used here as a
simpler alternative when running the local SpliceAI model is not practical.
Predictions are made by sending your sequence to an external server
(wangcomputing.com), so an internet connection is required.
What is being optimized?
ASSP returns a table of predicted donor/acceptor sites. The optimizer focuses on
undesired sites:
- Constitutive donors (always unwanted), or
- Sites with a score above the configured score threshold, or
- Intermediate-score sites (score between 5.0 and the score threshold) with confidence
above the confidence threshold.
The global cost is the pair (number of undesired sites, maximum score), compared in that
order: fewer sites wins first, and ties are broken by the lower maximum score.
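The site filter and the two-part cost can be illustrated with a short sketch. The `Site` fields, the function names and the threshold values in the example are hypothetical, and taking the maximum score over the undesired sites only (rather than over all predicted sites) is an assumption.

```python
from typing import NamedTuple


class Site(NamedTuple):
    kind: str            # "donor" or "acceptor"
    constitutive: bool   # hypothetical flag for a constitutive classification
    score: float
    confidence: float


INTERMEDIATE_MIN = 5.0  # lower bound of the intermediate band, from the text


def is_undesired(site, score_threshold, conf_threshold):
    """Apply the three rules listed above to one predicted site."""
    if site.kind == "donor" and site.constitutive:
        return True
    if site.score > score_threshold:
        return True
    return (INTERMEDIATE_MIN <= site.score <= score_threshold
            and site.confidence > conf_threshold)


def cost(sites, score_threshold, conf_threshold):
    """Return (number of undesired sites, maximum score among them)."""
    bad = [s for s in sites if is_undesired(s, score_threshold, conf_threshold)]
    return (len(bad), max((s.score for s in bad), default=0.0))


# Python compares tuples element by element, which gives the ordering
# "fewer sites first, then lower maximum score" without extra code.
a = cost([Site("donor", True, 3.0, 0.1), Site("acceptor", False, 6.0, 0.9)], 7.0, 0.8)
b = cost([Site("acceptor", False, 9.0, 0.2)], 7.0, 0.8)
print(a, b, b < a)
```

Here candidate `b` is preferred even though its single site scores higher, because the site count is compared before the score.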
Search strategy
The same three-tier codon window approach is used, but each candidate sequence requires
a round-trip to the ASSP web service, making each iteration significantly slower than
the local SpliceAI mode.
What is and is not changed (both modes)
- Only synonymous codon changes are made — the amino acid sequence is always preserved.
- The start codon and reading frame are never altered.
- Codon choices favour human-preferred codons when splice risk is equal.
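These guarantees are easy to check independently on any output sequence. The following is a minimal sketch of such a check, with hypothetical function names; it is not part of the tool itself.

```python
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_synonymous_rewrite(original: str, rewritten: str) -> bool:
    """True if length, first codon and encoded protein are all unchanged."""
    return (len(original) == len(rewritten)
            and original[:3].upper() == rewritten[:3].upper()
            and translate(original) == translate(rewritten))


print(is_synonymous_rewrite("ATGGGTAAGGCC", "ATGGGCAAGGCC"))  # GGT and GGC both encode G
print(is_synonymous_rewrite("ATGGGTAAGGCC", "ATGGCTAAGGCC"))  # GCT encodes A, not G
```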