Regex Riddles: Puzzles and Perils of Patterns


Regular expressions, commonly known as regex or regexp, are powerful tools for pattern matching and text manipulation. They have a rich history and find applications across various fields. In this tutorial, we'll explore the history, grammar, common operations, and practical applications of regular expressions.

History of Regular Expressions

1950s: The concept of regular expressions was first formalized by mathematician Stephen Cole Kleene, who introduced the notation to describe what he called regular events, known today as regular languages.
1960s: Ken Thompson, co-creator of Unix, implemented regular expressions in the QED text editor. This marked the introduction of regex into the world of computing.
1980s: Henry Spencer developed one of the first regex libraries, widely used in Unix systems.
1990s: Perl, a popular programming language, integrated regex as a core feature, making it accessible to a broader audience.
2000s: Regex support became a standard feature of mainstream programming languages, including Python, JavaScript, and Ruby.

Grammar of Regular Expressions

Elements

Literal Characters: Any character not listed as a metacharacter matches itself (e.g., ‘a’ matches ‘a’).
Metacharacters: Special characters with reserved meanings (e.g., ‘.’ matches any single character, by default excluding a newline in most engines).
Character Classes: Define sets of characters (e.g., ‘[aeiou]’ matches any one lowercase vowel).
Quantifiers: Indicate the number of times the preceding character or group should be repeated (e.g., ‘*’ matches zero or more times).
Anchors: Specify positions within the text (e.g., ‘^’ matches the start of the string, or the start of each line in multiline mode).
Groups and Alternation: Use parentheses to group expressions and ‘|’ to denote alternatives (e.g., ‘(cat|dog)’ matches ‘cat’ or ‘dog’).
For a comprehensive understanding of regex grammar and operations, you can refer to the Wikipedia page on Regular Expressions.
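The elements above can be tried out in a few lines. This is a minimal sketch in Python's built-in re module, chosen here only for illustration (the sample strings are made up); other engines, including the .NET engine behind Trados Studio, behave similarly for these basic patterns but may differ in details.

```python
import re

# Literal characters match themselves.
literal = re.search(r"a", "cat")

# '.' is a metacharacter: any single character except a newline (by default).
dot = re.findall(r"c.t", "cat cot c\nt")

# A character class matches one character from the set.
vowels = re.findall(r"[aeiou]", "regex")

# '*' repeats the preceding element zero or more times.
star = re.fullmatch(r"ab*", "abbb")

# '^' anchors at the start of the string, or of each line with re.MULTILINE.
anchored = re.findall(r"^\w+", "first line\nsecond line", flags=re.MULTILINE)

# Parentheses group; '|' separates alternatives.
pets = re.findall(r"(cat|dog)", "cat, dog, bird")

print(dot, vowels, anchored, pets)
```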

Common Regex Operations

Alternation (|): Separates alternatives. For example, gray|grey matches “gray” or “grey.”
Grouping (()): Groups a series of pattern elements into a single element and captures the matched text, which can then be referenced by number (e.g., $1, $2 in many tools; some engines write \1, \2).
Quantification (?, *, +, {n}, {min,}, {min,max}): Specifies how many times the preceding element is allowed to repeat. Some engines also accept {,max}.
Wildcard (.): Matches any single character, by default excluding a newline in most engines.
Character Classes ([…]): Denotes a set of possible character matches.
Word Boundary (\b): Matches a zero-width boundary between a word-class character and either a non-word-class character or an edge.
Whitespace (\s) and Non-Whitespace (\S) Matches: For whitespace characters (spaces, tabs, line breaks) and everything else.
Digit (\d) and Non-Digit (\D) Matches: For digits and non-digits.
Line Begin (^) and Line End ($) Matches: For the beginning and end of a string, or of each line in multiline mode.
These operations provide powerful tools for constructing complex patterns.
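As a rough illustration of these operations, here is a short Python sketch with made-up sample strings. Note one assumption about dialects: Python writes backreferences in a replacement as \1 and \2, whereas many other tools use $1 and $2.

```python
import re

# Alternation: either spelling is accepted.
spellings = re.findall(r"gray|grey", "gray skies, grey clouds")

# Grouping with backreferences: turn "Last, First" into "First Last".
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Kleene, Stephen")

# Quantifiers: '?' makes the 'u' optional; '{2,4}' allows two to four digits.
optional = re.findall(r"colou?r", "color colour")
bounded = re.findall(r"\b\d{2,4}\b", "7 42 2024 123456")

# Word boundary: match 'cat' only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# Shorthand classes: split on runs of whitespace, collect runs of digits.
parts = re.split(r"\s+", "one  two\tthree")
digits = re.findall(r"\d+", "Order 66, aisle 7")

print(spellings, swapped, bounded, whole_words, parts, digits)
```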

Applications of Regular Expressions

Text Search and Manipulation: Searching for specific patterns or keywords in text documents; rewriting matched text into a desired format (e.g., reformatting dates).
Data Validation and Extraction: Validating user input (e.g., email validation, password strength checks); extracting data from structured text (e.g., email addresses, log files, CSV data).
Web Development: Form validation and input sanitization in web applications; URL routing and parameter extraction.
Programming and Scripting: Pattern matching and data extraction in programming languages; log file parsing and analysis.
Data Science and Natural Language Processing: Tokenization and text preprocessing in NLP tasks; data extraction and transformation in data pipelines.
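A few of these uses can be sketched in Python. The function names, the log line layout, and the deliberately loose email pattern below are all assumptions made for illustration; in particular, the email check only looks for a "something@something.something" shape and is not a full validator.

```python
import re

# Text manipulation: rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY.
ISO_DATE = re.compile(r"\b(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})\b")

def reformat_dates(text):
    return ISO_DATE.sub(r"\g<d>/\g<m>/\g<y>", text)

# Data extraction: a hypothetical log format "<date> <time> <LEVEL> <message>".
LOG_LINE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

def parse_log_line(line):
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

# Validation: a loose shape check, not an RFC-compliant email validator.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value):
    return EMAIL_SHAPE.fullmatch(value) is not None

print(reformat_dates("Due 2024-03-15."))
print(parse_log_line("2024-03-15 10:22:01 ERROR Disk full"))
print(looks_like_email("user@example.com"))
```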

Some Application Examples (in Trados Studio):

Identify HTML tag elements
Regex: <(\/?)([a-zA-Z]+)([^>]*?)>
Use Case: Ensure consistency in XML/HTML tags within the content, especially for website localization.
Scenario: Suppose linguists are presented with raw HTML elements in their translation interface, which is not uncommon on platforms like Crowdin. The primary objective is to ensure that these HTML tags remain unchanged during translation. In this example, three things go wrong: on line 1, the linguist has mistakenly translated the ‘id’ attribute of a ‘div’ tag; on line 3, a tag has been altered and is now corrupt; and on line 6, the closing part of a tag is missing. Selecting and highlighting all tag elements with the regex above makes such errors much easier for a Quality Assurance Manager to spot. Keeping HTML tags intact is crucial for the localized website to function properly and to keep it from breaking at launch.
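Outside Trados, the same pattern can drive a simple automated check. The sketch below is an assumption-laden illustration in Python, not a Trados or Crowdin feature: the function names and sample segments are hypothetical, and the check simply requires that the target contains exactly the same tags, in the same order, as the source.

```python
import re

# The tag pattern from above: optional slash, tag name, remaining attributes.
TAG = re.compile(r"<(\/?)([a-zA-Z]+)([^>]*?)>")

def extract_tags(segment):
    """Return every complete tag in the segment, in order of appearance."""
    return [match.group(0) for match in TAG.finditer(segment)]

def tags_match(source, target):
    """True when source and target contain identical tags in identical order."""
    return extract_tags(source) == extract_tags(target)

source = '<div id="intro">Hello <b>world</b></div>'
good_target = '<div id="intro">Bonjour <b>le monde</b></div>'
translated_id = '<div id="présentation">Bonjour <b>le monde</b></div>'
missing_close = '<div id="intro">Bonjour <b>le monde</b>'

print(tags_match(source, good_target))    # tags untouched
print(tags_match(source, translated_id))  # attribute was translated
print(tags_match(source, missing_close))  # closing tag lost
```

A strict comparison like this will also flag legitimate reordering of inline tags, so a real QA check may need to be more lenient.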

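Removing Extra Whitespace
In Trados:
Find: \s{2,}
Replace with: a single space character.
Note: \s is a token for matching and is generally not interpreted in the Replace field, so type a literal space there. Because \s also matches tabs and line breaks, a narrower pattern such as [ \t]{2,} is safer when line breaks must be preserved.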
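The same find-and-replace can be sketched in Python for use outside Trados. The function names are hypothetical; the second variant is an optional alternative that only collapses runs of spaces and tabs, leaving line breaks alone.

```python
import re

def collapse_whitespace(text):
    """Replace every run of two or more whitespace characters with one space."""
    return re.sub(r"\s{2,}", " ", text)

def collapse_spaces_keep_newlines(text):
    """Collapse runs of spaces and tabs only, so line breaks survive."""
    return re.sub(r"[ \t]{2,}", " ", text)

print(collapse_whitespace("a  b \n c"))
print(collapse_spaces_keep_newlines("a   b\n\nc"))
```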
