Bulk Translate

Placeholder Preservation

How Bulk Translate keeps your variables and format specifiers intact across languages.

When translating strings that contain variables — UI copy, email templates, code strings, or internationalization keys — you need placeholders to survive translation completely unchanged. Bulk Translate provides two independent layers of protection to ensure this.

Supported Patterns

Three common placeholder formats are automatically detected and preserved:

Pattern        Example                Common In
{variable}     Hello {name}           React, Python, C#, ICU message format
{{variable}}   Welcome {{user}}       Mustache, Handlebars, Jinja, Angular
%s, %d, %@     Count: %d, Name: %s    C printf, Objective-C, Python (legacy)
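
As a rough illustration of how the three formats could be detected, here is a minimal sketch using a single regular expression. The pattern and the names PLACEHOLDER_RE and find_placeholders are assumptions made for this example; they are not taken from Bulk Translate's actual implementation.

```python
import re

# Hypothetical pattern. The double-brace alternative comes first so that
# {{user}} is matched whole instead of as {user} wrapped in stray braces.
PLACEHOLDER_RE = re.compile(
    r"\{\{[^{}]+\}\}"   # {{variable}}
    r"|\{[^{}]+\}"      # {variable}
    r"|%[sd@]"          # %s, %d, %@
)


def find_placeholders(text: str) -> list[str]:
    """Return every supported placeholder in order of appearance."""
    return PLACEHOLDER_RE.findall(text)
```

Calling find_placeholders on "Dear {customer}, your {{plan}} plan has %d days remaining" returns the three placeholders in the order they occur.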

How It Works

Bulk Translate uses a defense-in-depth approach with two independent mechanisms. If either one fails, the other catches it.

Layer 1 — System Prompt Instructions

The LLM receives explicit instructions in its system prompt:

"Preserve these placeholders exactly as they appear — do not translate or modify them: {placeholder}, {{double-brace}}, %s, %d, %@"

Most modern LLMs follow this instruction reliably. The prompt also tells the model to return only the translated text with no preamble or explanation, further reducing the chance of the model "helpfully" modifying placeholders.

Layer 2 — Client-Side Tokenization

This is the safety net. Before text is sent to the LLM:

  1. Detect all placeholders matching the three supported patterns
  2. Replace each placeholder with a unique, non-colliding token (e.g., \x00PH0\x00)
  3. Send the tokenized text to the LLM for translation
  4. Restore the original placeholders from the token map in the LLM's response

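The tokens use null-byte delimiters, which do not occur in natural-language text and give the model nothing to translate, so they are very unlikely to be altered. As long as every token comes back intact, each placeholder is restored exactly as it was sent. The rare case where a token is added or dropped is covered under Edge Cases.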
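
The four steps above can be sketched roughly as follows. This is a simplified illustration rather than Bulk Translate's own code: the regular expression and the function names tokenize and restore are assumptions, and the call to the LLM is left out.

```python
import re

# Hypothetical detection pattern for the three supported formats.
PLACEHOLDER_RE = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}|%[sd@]")


def tokenize(text: str) -> tuple[str, dict[str, str]]:
    """Swap each placeholder for a null-delimited token; return text and token map."""
    token_map: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        token = f"\x00PH{len(token_map)}\x00"  # PH0, PH1, ... in order of appearance
        token_map[token] = match.group(0)
        return token

    return PLACEHOLDER_RE.sub(swap, text), token_map


def restore(translated: str, token_map: dict[str, str]) -> str:
    """Put the original placeholders back wherever their tokens survived."""
    for token, original in token_map.items():
        translated = translated.replace(token, original)
    return translated
```

Because each token is delimited on both sides, the token for PH1 is never a substring of the token for PH10, so plain string replacement is enough for the restore step in this sketch.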

Examples

Single Curly Braces

Input:

Hello {name}, your order #{orderId} is ready

Tokenized (sent to LLM):

Hello \x00PH0\x00, your order #\x00PH1\x00 is ready

LLM output (Spanish):

Hola \x00PH0\x00, tu pedido #\x00PH1\x00 está listo

Final (after reinsertion):

Hola {name}, tu pedido #{orderId} está listo

Format Specifiers

Input:

Found %d results for "%s" in %d seconds

French output:

%d résultats trouvés pour "%s" en %d secondes

The %d and %s specifiers pass through unchanged. Only the surrounding English text is translated to French.

Mixed Placeholders

Input:

Dear {customer}, your {{plan}} plan has %d days remaining

Japanese output:

{customer}様、{{plan}}プランの残り日数は%d日です

All three placeholder types survive across languages — the system works regardless of the source or target language pair.

Edge Cases

Escaped braces: \{ and \} are treated as literal characters, not placeholder delimiters. Typical inputs are handled correctly, but unusual brace patterns can occasionally produce false positives in the tokenizer.
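
One way escaped braces could be skipped during detection is a negative lookbehind on the opening brace. The sketch below is an assumption about how such a rule might look, not the tokenizer's actual logic, and it deliberately ignores harder cases such as doubled backslashes.

```python
import re

# Hypothetical escape-aware pattern: an opening brace preceded by a
# backslash is not allowed to start a placeholder.
ESCAPE_AWARE_RE = re.compile(
    r"(?<!\\)\{\{[^{}\\]+\}\}"   # {{variable}}, not preceded by a backslash
    r"|(?<!\\)\{[^{}\\]+\}"      # {variable}, not preceded by a backslash
    r"|%[sd@]"                   # %s, %d, %@
)


def find_unescaped_placeholders(text: str) -> list[str]:
    """Return placeholders, leaving \\{ ... \\} sequences alone."""
    return ESCAPE_AWARE_RE.findall(text)
```

With the input r"Use \{literal\} and {name}", only {name} is reported as a placeholder.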

Nested patterns: {{outer {inner}}} is treated as a single {{...}} placeholder with inner content. The inner curly brace is not independently tokenized.
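
To make the nested case concrete, the following sketch shows a pattern that lets a double-brace placeholder contain one level of inner single braces, so the whole expression is captured as one unit. The pattern and the name find_with_nesting are illustrative assumptions only.

```python
import re

# Hypothetical pattern: the {{...}} alternative may contain plain text or
# one level of {inner} groups, and is tried before the single-brace form.
NESTED_AWARE_RE = re.compile(
    r"\{\{(?:[^{}]|\{[^{}]*\})*\}\}"   # {{outer {inner}}} as a single match
    r"|\{[^{}]+\}"                     # {variable}
    r"|%[sd@]"                         # %s, %d, %@
)


def find_with_nesting(text: str) -> list[str]:
    """Return placeholders, treating a nested {{...}} as one placeholder."""
    return NESTED_AWARE_RE.findall(text)
```

Because the outer match consumes the inner braces, {inner} never appears as a separate entry in the result.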

Placeholder count mismatch: If the LLM adds or removes tokens in its response, the reinsertion step fails gracefully. Tokens that remain in the response are replaced with their original placeholders, and any placeholder whose token was dropped is simply missing from the output. This is rare with modern LLMs, but it is the main reason to spot-check translated strings that contain placeholders.
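
If you want to catch a mismatch yourself after translation, a simple check is to compare the placeholders in the source string with those in the output. The sketch below is a hypothetical helper you could run on your own results; it is not a built-in Bulk Translate feature, and the pattern and the name placeholder_diff are assumptions.

```python
import re
from collections import Counter

# Hypothetical detection pattern for the three supported formats.
PLACEHOLDER_RE = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}|%[sd@]")


def placeholder_diff(source: str, translated: str) -> tuple[Counter, Counter]:
    """Return (missing, unexpected) placeholder counts for a translation."""
    src = Counter(PLACEHOLDER_RE.findall(source))
    out = Counter(PLACEHOLDER_RE.findall(translated))
    # Counter subtraction keeps only positive counts, so each side lists
    # just the placeholders that are short or surplus.
    return src - out, out - src
```

Two empty counters mean every placeholder appears the same number of times on both sides. The check ignores order on purpose, since word order legitimately changes between languages.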

Recommendations

  • Keep placeholders semantic: {userName} is clearer than {v1} for both human readers and LLMs
  • Test your templates — run a few sample strings through before committing to a large batch
  • Review outputs — spot-check translations that contain placeholders, especially for languages with different word order (Japanese, Korean, Arabic)