DEV Community

Ken Deng

From Theory to Practice: Implementing AI Screening with Rayyan and ASReview

Researchers often drown in thousands of titles and abstracts when preparing a systematic review, wasting hours on irrelevant papers. AI‑assisted screening cuts this burden by learning what matters and presenting the most uncertain records for human judgment.

Active Learning as the Core Framework

Active learning treats the reviewer as a teacher: the model starts with a small labeled set, predicts relevance for the rest, and queries the reviewer on the records where its confidence is lowest, that is, where the predicted probability of relevance sits closest to 0.5. This uncertainty sampling strategy focuses effort on the borderline cases that most improve the model, while dynamic resampling rebalances the training data to counter the typical scarcity of relevant records. TF‑IDF converts titles and abstracts into numeric features, and a Naive Bayes classifier provides a fast, interpretable baseline that can be refit quickly as new labels arrive.
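As a minimal sketch of the query step, assuming the classifier has already produced a relevance probability for each unlabeled record, uncertainty sampling can be reduced to ranking records by their distance from 0.5. The function names and the sample probabilities below are hypothetical and are not taken from Rayyan or ASReview.

```python
from __future__ import annotations

from typing import Sequence


def uncertainty_scores(probabilities: Sequence[float]) -> list[float]:
    """Score each record by how close its predicted relevance is to 0.5.

    A score of 1.0 means the model is maximally unsure; 0.0 means fully confident.
    """
    return [1.0 - 2.0 * abs(p - 0.5) for p in probabilities]


def select_most_uncertain(probabilities: Sequence[float], batch_size: int) -> list[int]:
    """Return the indices of the `batch_size` records the model is least sure about."""
    scores = uncertainty_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical predicted relevance probabilities for six unlabeled records.
predicted = [0.97, 0.52, 0.08, 0.41, 0.66, 0.01]
queue = select_most_uncertain(predicted, batch_size=3)
print(queue)  # indices of the records to show the reviewer next
```

Records the model is already confident about (0.97, 0.01) fall to the bottom of the queue, so the reviewer's time goes to the borderline cases first.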

Tool Spotlight: Rayyan

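Rayyan is a web‑based platform, with a free tier, for collaborative title/abstract screening. It lets teams upload citations, detect duplicates, screen with reviewers blinded to one another’s decisions, and sort records by the relevance predictions it computes from the decisions made so far. The specific combination described above (TF‑IDF features, Naive Bayes, uncertainty sampling, dynamic resampling) corresponds to options documented in the open‑source ASReview, so check which of these settings your chosen tool actually exposes.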

Mini‑Scenario

Imagine a public health team screening 12 000 records for interventions on childhood obesity. After they label 50 papers as relevant and 150 as not, the Naive Bayes model flags the 200 records it is most unsure about; in this illustrative case, reviewing those resolves the bulk of the uncertainty and lifts the model’s precision from 0.62 to 0.89 over three iterations.
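The scenario quotes precision, while screening is usually judged on recall as well, so it helps to see how the two differ. The sketch below uses hypothetical confusion counts, chosen only so that the precision values match the scenario; they are not results from any real review.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of records flagged as relevant that really are relevant."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of all truly relevant records that the model flagged."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, picked only to reproduce the scenario's precision figures.
before = precision(true_pos=62, false_pos=38)
after = precision(true_pos=89, false_pos=11)
print(f"precision before: {before:.2f}, after: {after:.2f}")

# Recall also needs the relevant records the model missed (again hypothetical).
print(f"recall after: {recall(true_pos=89, false_neg=6):.2f}")
```

A model can score well on precision and still miss relevant papers, which is why a review team would normally track both numbers.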

Implementation Steps

  1. Prepare and upload your de‑duplicated citation set (titles and abstracts) into your screening tool (Rayyan or ASReview), then seed the model with a small, diverse batch of manually labeled records that contains both relevant and irrelevant examples (≈5 % relevant, 95 % not).
  2. Activate uncertainty sampling where the tool offers it: let the model compute relevance probabilities and present the top‑N uncertain records for review; label each as include or exclude so the decisions feed back into the model.
  3. Iterate with dynamic resampling: after each round, rebalance the training set if the tool provides a resampling option, refit the TF‑IDF + Naive Bayes classifier, and repeat until the number of uncertain records falls below a pre‑set threshold (e.g., <20) or new rounds stop surfacing relevant records.
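The three steps above can be sketched as one loop in plain Python. This is an illustration under stated assumptions, not how Rayyan or ASReview is implemented: it uses multinomial Naive Bayes on raw word counts in place of TF‑IDF features, oversampling by repetition in place of a tool's resampling option, and hypothetical names throughout (`rebalance`, `fit_naive_bayes`, `screen`, `oracle`, `band`, `stop_below`).

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def rebalance(docs: list[str], labels: list[int]) -> tuple[list[str], list[int]]:
    """Oversample the rarer class by repetition until both classes are equal in size."""
    pos = [d for d, y in zip(docs, labels) if y == 1]
    neg = [d for d, y in zip(docs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("the seed set needs at least one record of each class")
    target = max(len(pos), len(neg))
    pos = [pos[i % len(pos)] for i in range(target)]
    neg = [neg[i % len(neg)] for i in range(target)]
    return pos + neg, [1] * target + [0] * target


def fit_naive_bayes(docs: list[str], labels: list[int]) -> dict:
    """Fit a two-class multinomial Naive Bayes model on raw word counts."""
    counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    priors = {c: labels.count(c) / len(labels) for c in (0, 1)}
    return {"counts": counts, "vocab": vocab, "priors": priors}


def predict_relevance(model: dict, doc: str) -> float:
    """Return P(relevant | doc) with add-one smoothing; unseen words are skipped."""
    log_score = {}
    for c in (0, 1):
        total = sum(model["counts"][c].values()) + len(model["vocab"])
        log_score[c] = math.log(model["priors"][c])
        for tok in tokenize(doc):
            if tok in model["vocab"]:
                log_score[c] += math.log((model["counts"][c][tok] + 1) / total)
    top = max(log_score.values())
    odds = {c: math.exp(s - top) for c, s in log_score.items()}
    return odds[1] / (odds[0] + odds[1])


def screen(
    pool: dict[int, str],
    seed_labels: dict[int, int],
    oracle: Callable[[int], int],
    batch_size: int = 2,
    band: float = 0.2,
    stop_below: int = 1,
) -> dict[int, int]:
    """Query the reviewer (`oracle`) on uncertain records until few remain."""
    labeled = dict(seed_labels)  # step 1: start from the seed labels
    while True:
        docs = [pool[i] for i in labeled]
        labels = [labeled[i] for i in labeled]
        model = fit_naive_bayes(*rebalance(docs, labels))  # step 3: rebalance, refit
        probs = {i: predict_relevance(model, pool[i]) for i in pool if i not in labeled}
        # A record counts as uncertain when its probability is within `band` of 0.5.
        uncertain = [i for i, p in probs.items() if abs(p - 0.5) < band]
        if len(uncertain) < stop_below:
            return labeled
        uncertain.sort(key=lambda i: abs(probs[i] - 0.5))
        for i in uncertain[:batch_size]:  # step 2: review the most uncertain first
            labeled[i] = oracle(i)


# Tiny hypothetical pool; `truth` plays the part of the human reviewer.
pool = {
    0: "school obesity intervention children",
    1: "cardiac surgery outcomes adults",
    2: "childhood obesity diet program",
    3: "cardiac drug trial adults",
    4: "obesity surgery outcomes children",
    5: "diet drug trial",
}
truth = {0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0}
labeled = screen(pool, seed_labels={0: 1, 1: 0}, oracle=truth.__getitem__)
print(f"reviewer labeled {len(labeled)} of {len(pool)} records")
```

The loop always ends, because each round either labels at least one new record or finds too few uncertain records and stops. The `band` and `stop_below` values are arbitrary here and would need tuning against a real collection.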

Conclusion

Coupling an active learning loop (uncertainty sampling guided by a TF‑IDF Naive Bayes model) with dynamic resampling to handle class imbalance turns a tedious manual screen into a faster, better‑prioritized process. Tools such as Rayyan and ASReview can help researchers find most of the relevant records after reading only part of the set, freeing time for synthesis and interpretation.
