{"id":80,"date":"2023-12-11T19:11:36","date_gmt":"2023-12-11T19:11:36","guid":{"rendered":"https:\/\/mingmingzi.com\/?p=80"},"modified":"2023-12-15T23:34:20","modified_gmt":"2023-12-15T23:34:20","slug":"regex","status":"publish","type":"post","link":"https:\/\/mingmingzi.com\/index.php\/2023\/12\/11\/regex\/","title":{"rendered":"Regex Riddles: Puzzles and Perils of Patterns"},"content":{"rendered":"\n<pre class=\"wp-block-code\"><code>Regular expressions, commonly known as regex or regexp, are powerful tools for pattern matching and text manipulation. They have a rich history and are used across many fields. In this tutorial, we'll explore the history, grammar, and practical applications of regular expressions.<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading has-ast-global-color-1-color has-text-color has-link-color has-medium-font-size wp-elements-009d73ae2af70cbe1e3df1defb883540\"><strong>History of Regular Expressions<\/strong><\/h2>\n\n\n\n<p>1950s: The concept of regular expressions was first formalized by mathematician Stephen Cole Kleene. He used regular expressions to describe the syntax of formal languages.<br>1960s: Ken Thompson, a co-creator of Unix, implemented regular expressions in the QED text editor. 
This marked the introduction of regex into the world of computing.<br>1980s: Henry Spencer developed one of the first regex libraries, widely used in Unix systems.<br>1990s: Perl, a popular programming language, integrated regex as a core feature, making it accessible to a broader audience.<br>2000s: Regex support expanded to numerous programming languages, including Python, JavaScript, and Ruby.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-ast-global-color-1-color has-text-color has-link-color has-medium-font-size wp-elements-50054c48102864b55626939c82648b83\"><strong>Grammar of Regular Expressions<\/strong><\/h2>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Elements<\/strong><\/p>\n\n\n\n<p>Literal Characters: Any character that is not a metacharacter matches itself (e.g., &#8216;a&#8217; matches &#8216;a&#8217;).<br>Metacharacters: Special characters with reserved meanings (e.g., &#8216;.&#8217; matches any single character, usually excluding line breaks).<br>Character Classes: Define sets of characters (e.g., &#8216;[aeiou]&#8217; matches any one lowercase vowel).<br>Quantifiers: Indicate how many times the preceding character or group may repeat (e.g., &#8216;*&#8217; means zero or more times).<br>Anchors: Specify positions within the text (e.g., &#8216;^&#8217; matches the start of the string, or the start of each line in multiline mode).<br>Groups and Alternation: Use parentheses to group expressions and &#8216;|&#8217; to denote alternatives (e.g., &#8216;(cat|dog)&#8217; matches &#8216;cat&#8217; or &#8216;dog&#8217;).<br>For a fuller treatment of regex grammar and operations, see the Wikipedia page on Regular Expressions.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Common Regex Operations<\/strong><\/p>\n\n\n\n<p>Boolean &#8220;or&#8221; (|): Separates alternatives. For example, gray|grey matches either &#8220;gray&#8221; or &#8220;grey.&#8221;<br>Grouping (()): Groups a series of pattern elements into a single element. 
Allows referencing the matched text later by number (e.g., $1, $2 in many replacement syntaxes).<br>Quantification (?, *, +, {n}, {min,}, {,max}, {min,max}): Specifies how many times the preceding element may repeat. The {,max} form is not supported by every regex flavor.<br>Wildcard (.): Matches any single character, usually excluding line breaks.<br>Character Classes ([\u2026]): Match any one character from the listed set.<br>Word Boundary (\\b): Matches the zero-width position between a word character and either a non-word character or the start or end of the text.<br>Whitespace (\\s) and Non-Whitespace (\\S): Match whitespace characters (spaces, tabs, line breaks) and all other characters, respectively.<br>Digit (\\d) and Non-Digit (\\D): Match digits and non-digit characters, respectively.<br>Line Begin (^) and Line End ($): Match the beginning and end of a line or string.<br>Combined, these operations let you construct complex patterns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-ast-global-color-1-color has-text-color has-link-color has-medium-font-size wp-elements-4c542c2e60477d69b9c0d573b1966c3e\"><strong>Applications of Regular Expressions<\/strong><\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Text Search and Manipulation<br>Searching for specific patterns or keywords in text documents.<br>Rewriting matched text into a desired format (e.g., reformatting dates).<\/li>\n\n\n\n<li>Data Validation and Extraction<br>Validating user input (e.g., email validation, password strength checks).<br>Extracting data from text (e.g., email addresses, log files, CSV data).<\/li>\n\n\n\n<li>Web Development<br>Form validation and input sanitization in web applications.<br>URL routing and parameter extraction.<\/li>\n\n\n\n<li>Programming and Scripting<br>Pattern matching and data extraction in programming languages.<br>Log file parsing and analysis.<\/li>\n\n\n\n<li>Data Science and Natural Language Processing<br>Tokenization and text preprocessing in NLP tasks.<br>Data extraction and transformation in data pipelines.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading 
has-ast-global-color-1-color has-text-color has-link-color has-medium-font-size wp-elements-e96cd7efccbb3e9d13975fa782bdfa7f\"><strong>Some Application Examples (in Trados Studio)<\/strong><\/h2>\n\n\n\n<div class=\"wp-block-media-text has-media-on-the-right is-stacked-on-mobile\" style=\"grid-template-columns:auto 56%\"><div class=\"wp-block-media-text__content\">\n<p>Identify HTML tag elements<br>Regex: &lt;(\\\/?)([a-zA-Z]+)([^&gt;]*?)&gt;<br>The three groups capture the optional closing slash, the tag name, and any attributes.<br>Use Case: Ensure consistency in XML\/HTML tags within the content, especially for website localization.<br>Scenario: Suppose linguists are presented with raw HTML elements in their translation interface, which is not uncommon on platforms like Crowdin. These HTML tags must remain unchanged during translation, but the example shows several problems: on line 1, the linguist has mistakenly translated the &#8216;id&#8217; attribute of a &#8216;div&#8217; tag; on line 3, a tag has been altered and is now corrupted; on line 6, the closing part of a tag is missing. Selecting and highlighting every tag element with this pattern helps a Quality Assurance Manager spot such errors. 
Ensuring the integrity of HTML tags is crucial for keeping a localized website working correctly and for preventing it from breaking at launch.<\/p>\n<\/div><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"711\" src=\"https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/htmltags-1024x711.jpg\" alt=\"\" class=\"wp-image-85 size-full\" srcset=\"https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/htmltags-1024x711.jpg 1024w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/htmltags-300x208.jpg 300w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/htmltags-768x533.jpg 768w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/htmltags.jpg 1282w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<div class=\"wp-block-media-text is-stacked-on-mobile is-vertically-aligned-top\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"732\" src=\"https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/white-space-1-1-1024x732.jpg\" alt=\"\" class=\"wp-image-83 size-full\" srcset=\"https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/white-space-1-1-1024x732.jpg 1024w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/white-space-1-1-300x214.jpg 300w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/white-space-1-1-768x549.jpg 768w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/white-space-1-1-1536x1098.jpg 1536w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/white-space-1-1.jpg 1570w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p>Removing Extra Whitespace<br>In Trados:<br>Find: \\s{2,}<br>Replace with: a single space.<br>Note: \\s matches any whitespace character, not only the plain space.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"733\" 
src=\"https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/whitespace-2-1024x733.jpg\" alt=\"\" class=\"wp-image-84\" srcset=\"https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/whitespace-2-1024x733.jpg 1024w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/whitespace-2-300x215.jpg 300w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/whitespace-2-768x550.jpg 768w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/whitespace-2-1536x1099.jpg 1536w, https:\/\/mingmingzi.com\/wp-content\/uploads\/2023\/12\/whitespace-2.jpg 1660w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div><\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>History of Regular Expressions 1950s: The concept of regular expressions was first formalized by mathematician Stephen Cole Kleene. He used regular expressions to describe the syntax of formal languages. 1960s: Ken Thompson, a co-creator of Unix, implemented regular expressions in the QED text editor. 
This marked the introduction of regex into the world of computing.1980s: Henry &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/mingmingzi.com\/index.php\/2023\/12\/11\/regex\/\"> <span class=\"screen-reader-text\">Regex Riddles: Puzzles and Perils of Patterns<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"footnotes":""},"categories":[1],"tags":[],"class_list":["post-80","post","type-post","status-publish","format-standard","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/posts\/80","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/comments?post=80"}],"version-history":[{"count":3,"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/posts\/80\/revisions"}],"predecessor-version":[{"id":203,"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/posts\/80\/revisions\/203"}],"wp:attachment":[{"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/media?parent=80"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/categories?post=80"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mingmingzi.com\/index.php\/wp-json\/wp\/v2\/tags?post=80"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}