dominator 1.0.1
An HTML scraper command-line application
dominator
dominator is a forgiving HTML parser for the command line.
usage & examples
### Parameters
Parameter | Short | Description |
---|---|---|
--filter | -f | A dominator-specific filter expression |
--output-item | -o | Defines an item to output; can be given several times (the examples use tag, attrib-keys and attrib(href)) |
--output-item-terminator | -t | Character that terminates one item group in the output |
--output-item-serparator | -s | Character that separates the items in the output |
--input-file | -i | Read the input from a file instead of stdin |
--with-html-comments | | Include matches found inside commented-out HTML in the output |
This example queries for a-tags that are children of an li-tag and have a class attribute with the value "link". For each hit, the output should be "Tag"\t"Element attributes as CSV"\t"value of the element attribute href"\n:
$ cat ./dummy.html | ./dominator -f'li.a{class:link}' -o'tag' -o'attrib-keys' -o'attrib(href)'
a href,id,class #a-1-li-1-o2-1
a href,id,class #a-2-li-2-o2-1
a href,id,class #a-3-li-2-o2-1
This example queries for a-tags whose href begins with "http":
$ cat ./dummy.html | ./dominator -f'a{href:(regex)^http}' -o'tag' -o'attrib-keys' -o'attrib(href)'
a href,id,class https://github.com
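Because each hit is one line of separated items, the output can be passed on to standard text tools. The following is a minimal sketch, assuming the items are tab-separated and each hit ends with a newline, as described for the first example. `third_item` and `sample_output` are hypothetical helpers, not part of dominator; `sample_output` only replays the rows shown in the first example so the snippet runs without the binary.

```shell
#!/bin/sh
# Hypothetical helper (not part of dominator): print the third item of each hit.
# Assumes tab-separated items and newline-terminated hits.
third_item() {
    cut -f3
}

# Stand-in for dominator's output, copied from the first example above.
sample_output() {
    printf 'a\thref,id,class\t#a-1-li-1-o2-1\n'
    printf 'a\thref,id,class\t#a-2-li-2-o2-1\n'
    printf 'a\thref,id,class\t#a-3-li-2-o2-1\n'
}

# Prints only the href values, one per line.
sample_output | third_item
```

With the real binary, `sample_output` would be replaced by the dominator invocation from the first example.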
# Filter Syntax
Expression = TAG[PICK]{ATTRNAME:ATTRVALUE}

Multiple expressions can be concatenated with "." to find elements inside specific parent nodes.
Item | Description | Example |
---|---|---|
TAG | The Name of the node | a , p , div , * |
[PICK] | (can be omitted) Picks only the n-th match; n starts at 1. PICK can be a list or a range | [1] picks the first match , [1,3] picks the first and third , [1..3] picks the first three matches |
{ATTRNAME:ATTRVALUE} | The attribute selector | {id:myID} , {class:someClass} , {href:(regex)^http://} |
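To make the three parts of a filter step concrete, here is a rough sketch that splits a single step into TAG, PICK, ATTRNAME and ATTRVALUE using `sed` and shell parameter expansion. This is not how dominator parses filters. The helper names are hypothetical, and the sketch assumes one step only (it does not split a chain on ".") and that the attribute selector, if present, is the last part of the step.

```shell
#!/bin/sh
# Illustrative only: these helpers are hypothetical and are not dominator's parser.
# Each takes ONE filter step, e.g. 'a[1..3]{class:link}'.

# TAG: everything before the first '[' or '{'.
filter_tag() {
    printf '%s\n' "$1" | sed 's/[[{].*$//'
}

# PICK: the text between '[' and ']' directly after the tag (empty if omitted).
filter_pick() {
    printf '%s\n' "$1" | sed -n 's/^[^[{]*\[\([^]]*\)\].*$/\1/p'
}

# ATTRNAME: the part of the '{...}' body before its first ':'.
filter_attr_name() {
    body=$(printf '%s\n' "$1" | sed -n 's/^[^{]*{\(.*\)}$/\1/p')
    printf '%s\n' "${body%%:*}"
}

# ATTRVALUE: the part of the '{...}' body after its first ':'.
filter_attr_value() {
    body=$(printf '%s\n' "$1" | sed -n 's/^[^{]*{\(.*\)}$/\1/p')
    printf '%s\n' "${body#*:}"
}

step='a[1..3]{href:(regex)^http://}'
printf 'tag=%s pick=%s name=%s value=%s\n' \
    "$(filter_tag "$step")" "$(filter_pick "$step")" \
    "$(filter_attr_name "$step")" "$(filter_attr_value "$step")"
# prints: tag=a pick=1..3 name=href value=(regex)^http://
```

Splitting the attribute body at the first ":" keeps values such as `(regex)^http://` intact even though they contain further colons.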
Build & install
### Build
dub build dominator
Copy the resulting binary into one of the directories listed in your PATH.
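The copy step could look like the sketch below. Both helpers are hypothetical, and the paths in the example call are placeholders: the source does not say where the built binary ends up or which PATH directory to use, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper: copy a built binary into a directory and make it executable.
# Both arguments are assumptions; adjust them to where your build placed the binary.
install_binary() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Succeeds if the given directory is listed in PATH.
dir_in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call (placeholder paths):
# install_binary ./dominator "$HOME/bin" && dir_in_path "$HOME/bin"
```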
### Use an already built binary
Check out the bin/ directory. Occasionally I put Windows and Mac binaries in this directory. Please be aware that these binaries are usually not up to date.
- Registered by Martin Brzenska
- 1.0.1 released 8 years ago
- mab-on/dominator
- MIT
- Copyright © 2016, Martin Brzenska
- Dependencies:
- libdominator
- Versions:
  - 1.1.6 2017-Oct-31
  - 1.1.5 2017-Oct-07
  - 1.1.3 2017-Jan-08
  - 1.1.2 2016-Dec-06
  - 1.1.1 2016-Nov-22
- Download Stats:
  - 0 downloads today
  - 0 downloads this week
  - 0 downloads this month
  - 114 downloads total
- Score:
- 1.0
- Short URL:
- dominator.dub.pm