<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mandarin Zone</title>
    <description>The latest articles on DEV Community by Mandarin Zone (@margaret_liu_03e481497e9e).</description>
    <link>https://dev.to/margaret_liu_03e481497e9e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3243963%2F57483e45-4a08-4986-99a0-1e34a1fd4b74.jpg</url>
      <title>DEV Community: Mandarin Zone</title>
      <link>https://dev.to/margaret_liu_03e481497e9e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/margaret_liu_03e481497e9e"/>
    <language>en</language>
    <item>
      <title>How I Open-Sourced 1,000+ Chinese Exam Questions from WordPress to GitHub</title>
      <dc:creator>Mandarin Zone</dc:creator>
      <pubDate>Thu, 05 Mar 2026 07:57:39 +0000</pubDate>
      <link>https://dev.to/margaret_liu_03e481497e9e/how-i-open-sourced-1000-chinese-exam-questions-from-wordpress-to-github-4e3c</link>
      <guid>https://dev.to/margaret_liu_03e481497e9e/how-i-open-sourced-1000-chinese-exam-questions-from-wordpress-to-github-4e3c</guid>
      <description>&lt;p&gt;I run &lt;a href="https://mandarinzone.com" rel="noopener noreferrer"&gt;Mandarin Zone&lt;/a&gt;, a Chinese language school that has been teaching in Beijing since 2008. Over the years, I built 12 complete HSK 4 mock exams with the AYS Quiz Maker WordPress plugin so our students could practice online.&lt;/p&gt;

&lt;p&gt;Recently, I decided to open-source all of this content. Here's how I extracted 1,176 questions from a WordPress database and turned them into a clean, developer-friendly GitHub repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Our quiz data was locked inside WordPress. It was spread across several database tables (&lt;code&gt;aysquiz_questions&lt;/code&gt;, &lt;code&gt;aysquiz_answers&lt;/code&gt;, &lt;code&gt;aysquiz_quizzes&lt;/code&gt;) and mixed with embedded HTML, WordPress shortcodes for the audio files, and messy formatting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Extraction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1: SQL Export&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wrote targeted SQL queries to join the questions, answers, and quiz mapping tables. The core query pairs each question with its answers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;question_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;question_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;question_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;answer_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;correct&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;is_correct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordering&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;answer_order&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;aysquiz_questions&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;
&lt;span class="c1"&gt;-- LEFT JOIN keeps every question, even one with no answer rows (its answer columns come back NULL)&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;aysquiz_answers&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;question_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="c1"&gt;-- Keep each question's answers together, in their stored order&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordering&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first export came to 400 MB for just 8,566 rows, because some fields held massive embedded content. After I trimmed the columns I didn't need, it dropped to 1.4 MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Data Cleaning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The raw data had WordPress shortcodes like &lt;code&gt;[audio wav="..."][/audio]&lt;/code&gt; and HTML entities everywhere. I wrote a Python script to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract audio URLs from shortcodes&lt;/li&gt;
&lt;li&gt;Strip HTML tags while preserving Chinese text&lt;/li&gt;
&lt;li&gt;Map question types based on content patterns (for example listening true/false, reading comprehension, fill-in-the-blank, and sentence ordering)&lt;/li&gt;
&lt;li&gt;Group answers by question ID and sort by ordering&lt;/li&gt;
&lt;/ul&gt;
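&lt;p&gt;Here is a minimal sketch of the first two cleaning steps. It is not the original script. It assumes every raw field is a string and that audio shortcodes follow the exact &lt;code&gt;[audio wav="..."][/audio]&lt;/code&gt; shape shown above. The names &lt;code&gt;AUDIO_SHORTCODE&lt;/code&gt;, &lt;code&gt;TextCollector&lt;/code&gt;, and &lt;code&gt;clean_field&lt;/code&gt; are hypothetical. A regex pulls out the audio URLs. The standard library's &lt;code&gt;html.parser.HTMLParser&lt;/code&gt; then drops the tags and decodes entities, so Chinese characters come through unchanged.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
from html.parser import HTMLParser

# Assumed shortcode shape: [audio wav="URL"][/audio], optionally with whitespace between the two tags.
AUDIO_SHORTCODE = re.compile(r'\[audio\s+wav="([^"]+)"\]\s*\[/audio\]')


class TextCollector(HTMLParser):
    """Keeps text nodes and discards tags; convert_charrefs=True decodes HTML entities in the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_field(raw):
    """Return (plain_text, audio_urls) for one raw field from the export."""
    audio_urls = AUDIO_SHORTCODE.findall(raw)
    collector = TextCollector()
    collector.feed(AUDIO_SHORTCODE.sub("", raw))
    collector.close()  # flush any text still buffered at the end of the input
    text = "".join(collector.parts)
    # Collapse ASCII whitespace and non-breaking spaces; full-width spaces (U+3000) are left alone.
    text = re.sub(r"[ \t\r\n\xa0]+", " ", text).strip(" ")
    return text, audio_urls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The sketch uses a parser instead of a tag-stripping regex because the parser handles attributes, comments, and entities without extra rules. Full-width spaces are kept because they can be part of the original Chinese formatting.&lt;/p&gt;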

&lt;p&gt;&lt;strong&gt;Step 3: Structured JSON&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each test became a clean JSON file. Here is one, trimmed to its first question:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"quiz_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HSK 4 Sample Quiz"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_questions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"questions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"listening_true_false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"audio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://media.mandarinzone.com/.../hsk4-1-02.wav"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"对"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"错"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"correct_answer_index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
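&lt;p&gt;Getting from the flat query result to this nested shape is mostly sorting and grouping. The following is a minimal sketch. It assumes each row is a dict keyed by the column aliases in the SQL query (&lt;code&gt;question_id&lt;/code&gt;, &lt;code&gt;question_type&lt;/code&gt;, &lt;code&gt;answer_text&lt;/code&gt;, &lt;code&gt;is_correct&lt;/code&gt;, &lt;code&gt;answer_order&lt;/code&gt;). &lt;code&gt;build_quiz&lt;/code&gt; and &lt;code&gt;dump_quiz&lt;/code&gt; are hypothetical helpers. Audio extraction and the content-based type mapping are left out, so &lt;code&gt;type&lt;/code&gt; here is still the raw plugin value.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from itertools import groupby


def build_quiz(quiz_id, title, rows):
    """Group flat join rows (one per answer) into the per-test JSON structure."""
    # Sort so each question's answers are adjacent and in stored order;
    # a NULL answer_order (from the LEFT JOIN) is treated as 0.
    rows = sorted(rows, key=lambda r: (r["question_id"], r["answer_order"] or 0))
    questions = []
    grouped = groupby(rows, key=lambda r: r["question_id"])
    for number, (_, group) in enumerate(grouped, start=1):
        group = list(group)
        # A question with no answers arrives as one row whose answer columns are NULL.
        answers = [r for r in group if r["answer_text"] is not None]
        # is_correct may be 1/0 or "1"/"0" depending on how the rows were exported.
        flags = [int(r["is_correct"] or 0) == 1 for r in answers]
        questions.append({
            "number": number,
            "type": group[0]["question_type"],  # raw plugin type, before any remapping
            "options": [r["answer_text"] for r in answers],
            "correct_answer_index": flags.index(True) if True in flags else None,
        })
    return {
        "quiz_id": quiz_id,
        "title": title,
        "total_questions": len(questions),
        "questions": questions,
    }


def dump_quiz(quiz):
    """Serialize with ensure_ascii=False so Chinese text stays readable in the file."""
    return json.dumps(quiz, ensure_ascii=False, indent=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing &lt;code&gt;ensure_ascii=False&lt;/code&gt; to &lt;code&gt;json.dumps&lt;/code&gt; keeps 对 and 错 as readable characters in the output instead of &lt;code&gt;\u&lt;/code&gt; escape sequences.&lt;/p&gt;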



&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;12 complete HSK 4 mock exams&lt;/strong&gt; in JSON format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,176 questions&lt;/strong&gt; across 6 question types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Pages demo&lt;/strong&gt; where anyone can take the tests online&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CC BY-NC-SA 4.0&lt;/strong&gt; license: free for non-commercial use with attribution, and adaptations must be shared under the same license&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is HSK 4?
&lt;/h2&gt;

&lt;p&gt;HSK (汉语水平考试, Hanyu Shuiping Kaoshi) is China's official Chinese proficiency test, recognized worldwide. Level 4 is intermediate. It certifies that you can discuss a wide range of topics in Chinese and know about 1,200 vocabulary words. Each exam has 100 questions: 45 listening, 40 reading, and 15 writing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Build With This
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A mobile HSK practice app&lt;/li&gt;
&lt;li&gt;Anki flashcard decks&lt;/li&gt;
&lt;li&gt;NLP training data for Chinese language models&lt;/li&gt;
&lt;li&gt;Your own quiz platform&lt;/li&gt;
&lt;li&gt;Spaced repetition study tools&lt;/li&gt;
&lt;/ul&gt;
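&lt;p&gt;Whatever you build, the first step is loading the files and reading answers out of them. Below is a minimal sketch. It assumes each test file is a JSON object shaped like the Step 3 example and that all test files sit in one directory. That layout and the helper names &lt;code&gt;load_tests&lt;/code&gt;, &lt;code&gt;correct_option&lt;/code&gt;, and &lt;code&gt;summarize&lt;/code&gt; are hypothetical, so check the repo's actual structure first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from collections import Counter
from pathlib import Path


def load_tests(directory):
    """Load every *.json file in one directory (hypothetical layout), sorted by file name."""
    paths = sorted(Path(directory).glob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def correct_option(question):
    """Return the text of the correct option, using only fields from the sample JSON."""
    return question["options"][question["correct_answer_index"]]


def summarize(tests):
    """Count tests, questions, and questions per type across the loaded files."""
    by_type = Counter(q["type"] for test in tests for q in test["questions"])
    return {"tests": len(tests), "questions": sum(by_type.values()), "by_type": dict(by_type)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the files are laid out as assumed, &lt;code&gt;summarize(load_tests("path/to/tests"))&lt;/code&gt; gives a quick sanity check against the totals listed under The Result.&lt;/p&gt;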

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Take a test online:&lt;/strong&gt; &lt;a href="https://hsk4.mandarinzone.com" rel="noopener noreferrer"&gt;hsk4.mandarinzone.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub repo:&lt;/strong&gt; &lt;a href="https://github.com/Make-dream-clear/hsk4-mock-exam" rel="noopener noreferrer"&gt;github.com/Make-dream-clear/hsk4-mock-exam&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're learning Chinese or building language learning tools, I hope this helps. PRs and stars welcome!&lt;/p&gt;




</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>education</category>
      <category>hsk</category>
    </item>
  </channel>
</rss>
