Howto: Write a website script
=============================

1. General tips
1.1. Read $prefix/doc/quvi/CodingStyle
1.2. Work with the development code
1.3. Use git
1.4. Amp up libquvi verbosity
1.5. Isolate the problem
1.6. LUA script search paths
--
2. Website script
2.1. Choose a website
2.1.1. Generating HTTP traffic logs
--
2.2. When you have chosen a website
2.2.1. Additional documentation
--
2.3. Writing a website script
2.3.1. Working with the development code
2.3.2. Working with precompiled binaries
2.3.3. Typical steps to write the script
2.3.3.1. Testing your script
--
2.4. Generate the patch
2.5. Before you submit your website script


1. General tips
---------------


1.1. Read $prefix/doc/quvi/CodingStyle
--------------------------------------

Please read the LUA guidelines in particular. You can find the same
file in the $top_srcdir/doc/ directory.


1.2. Work with the development code
-----------------------------------

The web interface can be found at:
    <http://repo.or.cz/w/quvi.git>
Or:
    % git clone git://repo.or.cz/quvi.git


1.3. Use git
------------

We use git in the examples of this documentation. Even if you are new to
git, you will see that generating patches with git is very easy. git also
preserves the patch contributor information along with the changes in the
repository log. In the long haul, this is beneficial to the project as
well as to the contributor.

Therefore, please make sure that you have set the following details in
the ~/.gitconfig file:

    [user]
    name  = your_name_here
    email = your_email_here

You may, of course, choose to use diff(1) or some other SCM instead of
git, if you like.


1.4. Amp up libquvi verbosity
-----------------------------

You can increase the library verbosity, which may help while you are
working on your scripts.

% env QUVI_SHOW_SCANDIR=1 quvi

    Makes libquvi dump the LUA script search dirs to stderr.

% env QUVI_SHOW_SCRIPT=1 quvi

    Similar, but also dumps the full paths to the LUA scripts.

% quvi --verbose-libcurl

    Flip this switch on if you want to see what libcurl does behind the
    scenes.


1.5. Isolate the problem
------------------------

While working on the LUA patterns, you may prefer to work with local
files instead of directly with (lib)quvi, e.g.:

    % wget PAGE_URL -O output.html
    (...)

    % cat >> parse.lua
    io.input("output.html")
    local page = io.read("*all")

    local _,_,s = page:find(...)
    (...)

    % lua parse.lua
    (...)

Once you have perfected the patterns, you can then go ahead and write
the actual website script.
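
As a concrete sketch of the steps above, the following stand-alone
parse.lua writes a small sample page to a temporary file, reads it back
and tries one pattern against it. The sample HTML and the pattern are
made up for illustration; with a real website you would read the
output.html saved with wget instead.

```lua
-- Hypothetical sample page; replace this with the file saved with wget.
local sample = [[
<html><head><title>Foo: Example video</title></head>
<body><embed vid_id="1234" vid_url="http://foo.bar/media/1234.flv"/></body>
</html>
]]

-- Write the sample to a temporary file so that the script runs as is.
local path = os.tmpname()
local f = assert(io.open(path, "w"))
f:write(sample)
f:close()

-- Read the whole file into a string.
f = assert(io.open(path, "r"))
local page = f:read("*a")
f:close()
os.remove(path)

-- Try a pattern; find returns the start, the end and then the captures.
local _,_,title = page:find('<title>(.-)</title>')
print(title or "no match: video title")
```

Keeping the pattern in a small script like this makes it quick to rerun
after every tweak, without fetching the page again.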


1.6. LUA script search paths
----------------------------

Please read:
    $prefix/share/quvi/lua/README
Or:
    $top_srcdir/share/lua/README


2. Website script
-----------------

You can choose to work with the development source code of quvi or
precompiled binaries that you have installed onto your system. We cover
these briefly in "2.3".

A typical quvi website script:
    * Identifies itself
    * Fetches video page
    * Parses video details (ID, title, media URL) from fetched page
    * Returns parsed details

Some scripts may be more complex:
    * Compare buzzhumor.lua to youtube.lua
    * Compare funnyhub.lua to dailymotion.lua

Be sure to also read "1. General tips" for the supported environment
settings.


2.1. Choose a website
---------------------

If you have none in mind, please visit our Trac at:
    <http://sourceforge.net/apps/trac/quvi/report/1>

And check for any unassigned tickets. We could always use help.


2.1.1. Generating HTTP traffic logs
-----------------------------------

Note that analysis of HTTP traffic is not always necessary, but it can
help. For example, if the media URL is not visible in the video page
HTML, digging deeper is often required.

Find yourself a system that can execute Adobe Flash object code and
capture the generated HTTP traffic. You may be able to figure out
how the media URLs are being constructed by analyzing this data.

Some contributors have reported that they have used Wireshark for this.
Others have reported that some of the Firefox add-ons can be used for
this, e.g. "Live HTTP Headers".

Even if you are not a programmer, you can always contribute log data.
We have seen this help the work before.


2.2. When you have chosen a website
-----------------------------------

How difficult this task turns out to be depends on how the website was
designed. Please compare some of the existing scripts to get a better
understanding of this.

Note also that some websites use additional protocols
(e.g. RTMP, RTSP, MMS). If you are working on a script for a website
that uses, say, RTMP, you need to define this in the protocol category
in the `ident' function in your script. See francetelevisions.lua
for an example of this.

quvi defaults to HTTP for historical reasons. This means that if you
expect quvi to support a non-HTTP website, make sure that you also
specify "--category-all, -a" or "--category-$scheme" when you run quvi, e.g.:
    % quvi --category-rtmp URL
    % quvi -a URL


2.2.1. Additional documentation
-------------------------------

Please read:
    $prefix/share/quvi/lua/website/README
Or:
    $top_srcdir/share/lua/website/README
    
This file details the functions expected to be found in each website script.


2.3. Writing a website script
-----------------------------

Let's assume that you have figured out how to parse the video details.


2.3.1. Working with the development code
----------------------------------------

The web interface to our git repository can be found at:
    <http://repo.or.cz/w/quvi.git>

You can grab the development code with:
    % git clone git://repo.or.cz/quvi.git

If you prefer to work with precompiled quvi binaries, please jump to 2.3.2.

We will use a VPATH ("tmp") in this example.
    % cd quvi ; mkdir tmp ; cd tmp
    % ../configure ; make

Use buzzhumor.lua as a template script.
    % cp ../share/lua/website/buzzhumor.lua ../share/lua/website/foo.lua
    (open foo.lua in an editor)

Jump to 2.3.3 to continue.


2.3.2. Working with precompiled binaries
----------------------------------------

Make sure that you have at least quvi 0.2.0 installed on your system:
    % quvi --version
    quvi version 0.2.14

Use buzzhumor.lua as a template script.
    % mkdir -p foo/lua/website/ ; cd foo
    % cp -r $prefix/share/quvi/lua/util/ lua/
    % cp -r $prefix/share/quvi/lua/website/quvi/ lua/website/
    % cp $prefix/share/quvi/lua/website/buzzhumor.lua lua/website/foo.lua

    So the foo/ dir should look like:
    % find .
    .
    ./lua
    ./lua/website
    ./lua/website/foo.lua
    ./lua/website/quvi
    ./lua/website/quvi/const.lua
    ./lua/website/quvi/url.lua
    ./lua/website/quvi/util.lua
    ./lua/website/quvi/bit.lua
    ./lua/util
    ./lua/util/content_type.lua
    ./lua/util/charset.lua
    ./lua/util/trim.lua

    (open foo.lua in an editor)

Jump to 2.3.3 to continue.


2.3.3. Typical steps to write the script
----------------------------------------

If you are familiar with regular expressions, you will find many
similarities in LUA patterns. You can find more about the LUA patterns
at: <http://www.lua.org/pil/> -- Programming in Lua
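
If you are coming from regular expressions, a few differences are worth
seeing side by side. The sketch below uses only the standard LUA string
library; the sample string is made up for illustration.

```lua
local s = 'vid_id="1234" vid_url="http://foo.bar/media/1234.flv"'

-- '-' is the lazy counterpart of '*': '.-' stops at the first closing quote.
local lazy = s:match('vid_id="(.-)"')

-- '*' is greedy: '.*' runs up to the last closing quote in the string.
local greedy = s:match('vid_id="(.*)"')

-- Character classes use '%' instead of '\': %d matches a digit.
local digits = s:match('vid_id="(%d+)"')

-- Magic characters such as '.' are escaped with '%' as well.
local host = s:match('http://(foo%.bar)/')

-- There is no alternation ('|'); try the patterns one after another.
local id = s:match('video_id="(.-)"') or s:match('vid_id="(.-)"')

print(lazy, digits, host, id)
print(greedy)
```

The lazy '.-' form is the one used throughout the examples in this
document, since the greedy form tends to capture far more than intended.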

1) Change the copyright line (add year, your name, your email).

Or you can use the project default:
"Copyright (C) year  quvi project <http://quvi.sourceforge.net/>"

2) Modify the `ident' function in your script.

-   r.domain  = 'buzzhumor.com'
+   r.domain  = 'foo.bar'

    A word about r.handles: when the `ident' function gets called, quvi
    checks whether the script can handle the user-defined page URL. To
    do this, we define at least one domain signature and one path
    signature to compare against the URL. For the sake of brevity, let's
    assume that the following is enough:

- r.handles    = U.handles(self.page_url, {r.domain}, {"/videos/"})
+ r.handles    = U.handles(self.page_url, {r.domain}, {"/watch/"})

    It may be easier to understand if we take a look at
    the example URLs:
        http://www.buzzhumor.com/videos/32561/Girl_Feels_Shotgun_Power
        http://foo.bar/watch/1234/

    The `handles' function (of quvi/util.lua) that we call confirms that:
        * the domain pattern (e.g. "buzzhumor.com") is found in the URL
        * the path pattern (e.g. "/videos/") is found in the URL

    You can leave r.formats untouched, unless you know that the website
    supports more than one ("default") video format and you know how to
    access them. If you are not sure, that's OK too; we can always add
    them later.

    See youtube.lua, dailymotion.lua and vimeo.lua for examples of scripts
    with additional formats.
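
To make the two checks done by `handles' concrete, here is a minimal
stand-alone sketch of what such a check could look like. It is not the
actual code in quvi/util.lua: the function name sketch_handles is
hypothetical, and the real `handles' function may differ in its details.

```lua
-- Hypothetical stand-in for the check described above; not quvi's own code.
-- Returns true if any domain pattern and any path pattern match the URL.
local function sketch_handles(url, domains, paths)
    local function any_match(patterns)
        for _,p in ipairs(patterns) do
            if url:find(p) then return true end
        end
        return false
    end
    return any_match(domains) and any_match(paths)
end

-- These are LUA patterns: a literal '.' is written as '%.', whereas an
-- unescaped '.' would match any character.
local ok       = sketch_handles("http://foo.bar/watch/1234/", {"foo%.bar"}, {"/watch/"})
local bad_path = sketch_handles("http://foo.bar/about/",      {"foo%.bar"}, {"/watch/"})
local bad_host = sketch_handles("http://example.com/watch/1", {"foo%.bar"}, {"/watch/"})
print(ok, bad_path, bad_host)
```

Only the first URL passes both checks; the other two fail on the path
and on the domain, respectively.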

3) Change r.categories only if you know that the website uses a non-HTTP
   protocol.

    See francetelevisions.lua for an example of this.

4) Move to the `parse' function. This is where most of the magic happens.

    To keep things simple, let's go ahead and assume that our 'foo'
    website is nearly identical to 'buzzhumor'.

  4.1) Update the host_id.

-   self.host_id = 'buzzhumor'
+   self.host_id = 'foo'

  4.2) In order to have something to work with, let's grab the video page
  from the user-specified URL.

    local page = quvi.fetch(self.page_url)

  4.3) Now that we have the page, it's time to parse the video details
  from it. Let's start from the video title:

    local _,_,s = page:find('<title>(.-)</title>')
    self.title  = s or error("no match: video title")

  4.4) Grab the video ID.

    local _,_,s = page:find('vid_id="(.-)"')
    self.id     = s or error("no match: video id")

  4.5) We're almost done. The only remaining detail is the media URL:

    local _,_,s = page:find('vid_url="(.-)"')
    self.url    = {s or error("no match: video url")}

    Note the {}: we place the URL in a table.
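
Put together, steps 4.1 to 4.5 could look like the sketch below. To make
it runnable outside of quvi, the `quvi' table is replaced with a mock
whose fetch function returns a canned page. The mock, the page content
and the final "return self" are assumptions made for illustration;
compare with buzzhumor.lua for the real thing.

```lua
-- Mock of the quvi table, for stand-alone testing only (an assumption).
-- When the script is run by libquvi, quvi.fetch is provided for you.
local quvi = {
    fetch = function (url)
        return '<title>Example video</title>'
            .. '<embed vid_id="1234" vid_url="http://foo.bar/media/1234.flv"/>'
    end
}

function parse (self)
    self.host_id = 'foo'

    -- Fetch the video page.
    local page = quvi.fetch(self.page_url)

    -- Video title.
    local _,_,s = page:find('<title>(.-)</title>')
    self.title  = s or error("no match: video title")

    -- Video ID.
    local _,_,s = page:find('vid_id="(.-)"')
    self.id     = s or error("no match: video id")

    -- Media URL, placed in a table.
    local _,_,s = page:find('vid_url="(.-)"')
    self.url    = {s or error("no match: video url")}

    return self -- assumed here; check the template script
end

local r = parse({page_url = "http://foo.bar/watch/1234/"})
print(r.host_id, r.id, r.title, r.url[1])
```

If any of the patterns fails to match, error() aborts the script with
the corresponding "no match" message, which tells you which pattern
needs tweaking.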


2.3.3.1. Testing your script
----------------------------

If you are working with the development code (see 2.3.1), run:
    (still in $top_srcdir/tmp)
    % env QUVI_BASEDIR=../share ./src/quvi TEST_URL

Or if you are working with precompiled quvi binaries (see 2.3.2), run:
    (still in foo/ dir)
    % quvi TEST_URL

You will most likely spend most of the time tweaking the patterns in
your script. It often helps to "isolate the problem", e.g. copy
page data to a local file and write an additional script to perfect
the LUA patterns. See "1.5. Isolate the problem".

Read also $top_srcdir/tests/README for tips on how you can use the
existing test suite files to test your script.


2.4. Generate the patch
-----------------------

If you are working with the quvi development code (see 2.3.1):
    (still in $top_srcdir/tmp)
    % git add ../share/lua/website/foo.lua

Or if you are working with precompiled quvi binaries (see 2.3.2):
    (still in foo/ dir)
    % git init ; git add lua/website/foo.lua

Finally, run:
    % git commit -am 'Add foo support'
    % git format-patch -M -1

See also:
    $prefix/doc/quvi/HowtoSubmitPatches
Or:
    $top_srcdir/doc/HowtoSubmitPatches


2.5. Before you submit your website script
------------------------------------------

  * Does your script set and parse everything as expected?
    - Protocol category
    - Host ID
    - Video ID
    - Video title
    - Media URL

  * Does the website support more than one video format?
    - If yes, see if you can add support for them
    - We can, of course, add the support later

  * Does the parsed video title contain extra characters?
    - We want the video title *only*
    - Anything else, e.g. domain name, should be left out

  * If you are unsure about something, don't hesitate to ask
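
As an illustration of the video title check above, suppose the page
title carries the site name as a suffix. The sketch below strips it with
string.gsub. The " - Foo.bar" suffix and the helper name clean_title are
hypothetical; the pattern has to be adapted to the website at hand.

```lua
-- Hypothetical helper: remove a trailing " - Foo.bar" and any leading or
-- trailing whitespace from a parsed title.
local function clean_title(title)
    title = title:gsub('%s+%-%s+Foo%.bar%s*$', '')   -- drop the site suffix
    title = title:gsub('^%s+', ''):gsub('%s+$', '')  -- trim whitespace
    return title
end

print(clean_title("Example video - Foo.bar"))
print(clean_title("  Hi-fi test - Foo.bar "))
```

Anchoring the pattern with '$' keeps it from touching a " - " that is
part of the title itself.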

See also:
    $prefix/doc/quvi/HowtoSubmitPatches
Or:
    $top_srcdir/doc/HowtoSubmitPatches
