loading...

re: Haskell/Python/Go web scraper comparison completed VIEW POST

FULL DISCUSSION
 

What I suspect is that all three versions overlap on a section of the dom when traversing it from scratch for each of the two queries. I've write a Haskell version (using lens) that doesn't do the redundant traversal. I would like to see the corresponding python and go memorized traversal versions:

{-# LANGUAGE LambdaCase #-}

import Network.Wreq ( get, responseBody )
import Text.Taggy.Lens ( html, allAttributed, named, children, elements, contents, element )
import Data.Text.Encoding.Error ( lenientDecode )
import Data.Text.Lazy.Encoding ( decodeUtf8With )
import Data.Text ( unpack )
import Control.Lens ( (^..), (^.), ix, to, universe, only )
import Control.Exception ( catch )
import System.IO.Error ( isDoesNotExistError )
import System.Process ( readProcess )
import System.Exit ( exitSuccess )
import Control.Monad ( when )

main = do
    r <- get "https://fakenous.net"
    let markup = decodeUtf8With lenientDecode $ r ^. responseBody
        branch = markup ^.. html . allAttributed (ix "id" . only "widget-area") . children . traverse . element
        q1 = branch ^.. ix 2 . elements . named (only "ul") . children . ix 0 . to universe . traverse . contents
        q2 = branch ^.. ix 1 . elements . named (only "ul") . children . ix 0 . elements . contents
        newComment = unpack $ q1 !! 1 <> q1 !! 0 <> q1 !! 2
        newPost = unpack $ q2 !! 0 
    -- Read saved last stuff.
    file <- catch (readFile "recent-hs") (\case e | isDoesNotExistError e -> pure " \n ")
    let [oldPost, oldComment] = lines file
    when (oldPost == newPost && oldComment == newComment) exitSuccess
    -- Mail.
    let (subjectPiece, msg)
          | oldPost /= newPost && oldComment /= newComment = ("Post and Comment", "New Post: " ++ newPost ++ "\nNew Comment: " ++ newComment)
          | oldPost /= newPost = ("Post", "New Post: " ++ newPost)
          | otherwise = ("Comment", "New Comment: " ++ newComment)
    readProcess "mutt" ["user@example.com", "-s", "New " ++ subjectPiece ++ " on Fake Nous"] msg
    -- Write out the new file.
    writeFile "recent-hs" $ newPost ++ "\n" ++ newComment```

 

Interesting, but I don't think that would be a good change to make in general, since the cost of redundant traversal isn't relevant for the use case.

Also you didn't do triple backquotes so the code shows up unformatted...

 

Thanks for the tip. It's just an interesting case of composability. In Haskell this optimization comes cheap, it means adding a single let binding. I was curious what would surmount to in go and python

Code of Conduct Report abuse