DEV Community

Discussion on: Comparing the same web scraper in Haskell, Python, Go

Providence Salumu (smunix)

Apologies, I should have thought of this earlier. Anyway, adding more details:

-- file : Main.hs
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Lens (to, only, toListOf, folded)
import Data.Text.Encoding.Error (lenientDecode)
import Data.Text.Lazy.Encoding (decodeUtf8With)
import Network.Wreq (responseBody, get)
import Text.Taggy.Lens (html, children, allAttributed)

main = (toListOf $ responseBody . to (decodeUtf8With lenientDecode) . html . allAttributed (folded . only "recentcomments") . children) <$> (get "https://fakenous.net") >>= print 

The dependencies can be put in a dev-to.cabal file:

-- dev-to.cabal
cabal-version:       2.4
name:                dev-to
version:             0.1.0.0
license-file:        LICENSE
author:              Providence Salumu
maintainer:          Providence <dot> Salumu <at> smunix <dot> com
extra-source-files:  CHANGELOG.md

executable dev-to
  main-is:             Main.hs
  build-depends:       base ^>=4.13.0.0
                     , lens
                     , bytestring
                     , http-client
                     , text
                     , taggy
                     , taggy-lens
                     , wreq
  default-language:    Haskell2010

Adding the saving and emailing steps would be the simpler part.

You can clone my repo from github.com/smunix/dev-to

Ryan Westlund (yujiri8), Author

Ah. Still, that doesn't seem to be a complete solution. I ran it with cabal run and the output is the object:

[[NodeElement (Element {eltName = "li", eltAttrs = fromList [("class","recentcomments")], eltChildren = [NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeContent "Gerardo"]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1130#comment-1909")], eltChildren = [NodeContent "The Failings of Analytic Philosophy"]})]}),NodeElement (Element {eltName = "li", eltAttrs = fromList [("class","recentcomments")], eltChildren = [NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://yujiri.xyz"),("rel","external nofollow ugc"),("class","url")], eltChildren = [NodeContent "Yujiri"]})]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1704#comment-1908")], eltChildren = [NodeContent "How Can You Put a Price on Human Life?"]})]}),NodeElement (Element {eltName = "li", eltAttrs = fromList [("class","recentcomments")], eltChildren = [NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeContent "Paul Lake"]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=327#comment-1907")], eltChildren = [NodeContent "Studies in Irrationality: Marxism"]})]}),NodeElement (Element {eltName = "li", eltAttrs = fromList [("class","recentcomments")], eltChildren = [NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeContent "Dave"]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1704#comment-1905")], eltChildren = [NodeContent "How Can You Put a Price on Human Life?"]})]}),NodeElement (Element {eltName = "li", eltAttrs = fromList 
[("class","recentcomments")], eltChildren = [NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","http://www.daviddfriedman.com"),("rel","external nofollow ugc"),("class","url")], eltChildren = [NodeContent "David Friedman"]})]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1674#comment-1904")], eltChildren = [NodeContent "Do Religious People Believe Religion?"]})]})],[NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeContent "Gerardo"]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1130#comment-1909")], eltChildren = [NodeContent "The Failings of Analytic Philosophy"]})],[NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://yujiri.xyz"),("rel","external nofollow ugc"),("class","url")], eltChildren = [NodeContent "Yujiri"]})]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1704#comment-1908")], eltChildren = [NodeContent "How Can You Put a Price on Human Life?"]})],[NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeContent "Paul Lake"]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=327#comment-1907")], eltChildren = [NodeContent "Studies in Irrationality: Marxism"]})],[NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeContent "Dave"]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1704#comment-1905")], 
eltChildren = [NodeContent "How Can You Put a Price on Human Life?"]})],[NodeElement (Element {eltName = "span", eltAttrs = fromList [("class","comment-author-link")], eltChildren = [NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","http://www.daviddfriedman.com"),("rel","external nofollow ugc"),("class","url")], eltChildren = [NodeContent "David Friedman"]})]}),NodeContent " on ",NodeElement (Element {eltName = "a", eltAttrs = fromList [("href","https://fakenous.net/?p=1674#comment-1904")], eltChildren = [NodeContent "Do Religious People Believe Religion?"]})]]

Instead of the text.

I also wouldn't consider that one line. If I were to really use that code, I'd certainly break it into 2-4. Still, it is an impressive improvement! I'll have to look more into those libraries.
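
On the point about getting node objects rather than text: one way to close that gap is to fold each node down to the text it contains. The following is only a minimal sketch, not code from the repo. The Node and Element types here are hypothetical stand-ins that mirror the constructors and field names visible in the printed output (the real taggy types use Text and a HashMap for attributes), and nodeText and sample are names made up for illustration.

```haskell
module Main where

-- Hypothetical stand-ins for taggy's Node and Element, reduced to the
-- constructors and fields that appear in the printed output. The real
-- library uses Text and a HashMap rather than String and a pair list.
data Node
  = NodeElement Element
  | NodeContent String
  deriving Show

data Element = Element
  { eltName     :: String
  , eltAttrs    :: [(String, String)]
  , eltChildren :: [Node]
  } deriving Show

-- Concatenate every text fragment under a node, depth first.
-- (nodeText is an illustrative name, not a taggy function.)
nodeText :: Node -> String
nodeText (NodeContent t) = t
nodeText (NodeElement e) = concatMap nodeText (eltChildren e)

-- One "recentcomments" entry, copied from the output shown in the thread.
sample :: Node
sample =
  NodeElement (Element "li" [("class", "recentcomments")]
    [ NodeElement (Element "span" [("class", "comment-author-link")]
        [NodeContent "Gerardo"])
    , NodeContent " on "
    , NodeElement (Element "a" [("href", "https://fakenous.net/?p=1130#comment-1909")]
        [NodeContent "The Failings of Analytic Philosophy"])
    ])

main :: IO ()
main = putStrLn (nodeText sample)
-- prints: Gerardo on The Failings of Analytic Philosophy
```

Against the real library the same recursion would presumably be applied to each node returned by the lens query, with the String operations swapped for their Text equivalents.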