DEV Community



Easy concurrent scraping in Scala with Akka Streams

Step 1. Download Ammonite

# Download Ammonite (the release URL was omitted here; substitute the
# binary for your Scala version from the Ammonite releases page)
$ curl -L <ammonite-release-url> > amm212 && chmod +x amm212

Step 2. Start Ammonite

$ amm212 --class-based

Step 3. We need akka-streams

import $ivy.`com.typesafe.akka::akka-stream:2.6.3`
// standard imports
import akka.{ Done, NotUsed }
import akka.actor.ActorSystem
import akka.stream.IOResult
import akka.stream.scaladsl._
import akka.util.ByteString
import scala.concurrent._
import scala.concurrent.duration._
import scala.util.{ Try, Success, Failure }

// Json parsing
import ujson._

// Handle files
import java.nio.file.Paths

implicit val system = ActorSystem("QuickStart")

val hotelids = scala.io.Source.fromFile("source.txt").getLines().map(_.trim.toInt).toList // we need to scrape these urls or some ids for an endpoint

def parser(i: Int): (Int, String) = {
  val url = s"ENDPOINT/$i"
  val urlResponse = Try(scala.io.Source.fromURL(url).mkString) // the fetch call was elided; fromURL is one simple option
  val result = urlResponse match {
    case Success(res) => "PARSED_RESULT" // parse res here, e.g. with ujson
    case Failure(e) => "null"
  }
  (i, result)
}
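The `Try`/`Success`/`Failure` shape in `parser` can be exercised on its own, with no network involved. A minimal sketch, using a hypothetical `safeParse` helper and a pure computation standing in for the HTTP fetch:

```scala
import scala.util.{ Try, Success, Failure }

// Same pattern as parser: wrap a call that may throw in Try,
// then match on the outcome ("null" mirrors parser's failure branch).
def safeParse(s: String): String = Try(s.toInt) match {
  case Success(n) => s"parsed:$n"
  case Failure(_) => "null"
}
```

Any exception thrown inside `Try(...)` lands in the `Failure` branch, so a flaky endpoint never crashes the stream; it just emits a `"null"` row.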

Step 4. We create a file sink to store the data (you can also print to the console)

def lineSink(filename: String): Sink[(Int, String), Future[IOResult]] =
    Flow[(Int, String)].map(s => ByteString(s._1 + "," + s._2 + "\n")).toMat(FileIO.toPath(Paths.get(filename)))(Keep.right)
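The `.map` inside `lineSink` just renders each `(id, result)` pair as one CSV line before it reaches the file. Pulled out as a plain function (`toCsvLine` is a made-up name for illustration), the formatting step is easy to check in isolation:

```scala
// Render an (id, result) pair as the CSV line the sink writes out.
def toCsvLine(s: (Int, String)): String = s._1 + "," + s._2 + "\n"
```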

Step 5. Introduce throttling, because some websites disallow too many requests. Here we allow 25 requests per 10 seconds (the same average rate as 5 requests per 2 seconds)

val resultFile = Source(hotelids).throttle(25, 10.seconds).map(parser).runWith(lineSink("output.txt"))
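`runWith` only materializes a `Future[IOResult]`; in a script you should block on it before exiting, or the JVM may stop mid-stream. A self-contained sketch of that final step, with a plain `Future` standing in for the materialized value (in the real script, also call `system.terminate()` afterwards):

```scala
import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for the Future[IOResult] returned by runWith.
val resultFile: Future[String] = Future { "done" }

// Block until the stream has finished writing output.txt.
val res = Await.result(resultFile, 1.minute)
```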

All done from the console with the Ammonite shell!
