- HACKING -
  ~~~~~~~

This file describes the internals of SpécialK: the program
architecture, its data structures and other technical information one
should read before starting to contribute to SpécialK.


Parser design
=============

This section is a technical description of the parser
implementation. It does not cover the intermediate structure (see next
section). Its purpose is to give an overview of how the parser is
implemented, to help people outside the SpécialK team.

The implementation was done in 2 steps:
* Design of the lexer and parser using lex&yacc-like tools.
* Design of the intermediate structure between parsing and translation.

The intermediate structure was made first so as to allow parallel
development of the parser and the translator.

Parser tasks
- - - - - - 

* Lexical and syntactic parsing
* Semantic analysis: expression type checking, among others
* Generation of the intermediate structure, or raising of exceptions
  if errors occur

Parsing this program:

	fac(0) = 1;
	n > 0 -> fac(n) = n * fac(n-1).
	<%% (define x 0) %%>
	sort(nil) = nil;
	sort(c:L) = insert(c,sort(L)).
	fac(10).

should produce this structure:

((10002
  fac
  1
  ((10003 #<struct:position> ((10004 (0 0))) #t (10004 (1 0)))
   (10003
    #<struct:position>
    ((10004 (n 4)))
    (10004 (> 2) (10004 (n 4)) (10004 (0 0)))
    (10004 (* 5) (10004 (n 4)) (10004 (call 4) fac ((10004 (n-1 4))))))))
  (10006 "<%% (define x 0) %%>")
  (10002
    sort
    1
    ((10003 #<struct:position> ((10004 (nil 3))) #t (10004 (nil 3)))
    (10003
     #<struct:position>
     ((10004 (: 3) (10004 (c 4)) (10004 (|L| 4))))
     #t
     (10004 (call 4) insert ((10004 (c 4)) (10004 (call 4) sort ((10004 (|L| 4))))))))
 (10005 (10004 (call 4) fac ((10004 (10 0))))))


Intermediate structure
- - - - - - - - - - - 

The intermediate structure is defined in the special-k-structs.ss
file.

# Description:
##

During implementation, this structure went through numerous
changes. To allow these changes without slowing down the
implementation of the translation, accessors and predicates were
defined for the whole structure. As a result, no change made to the
structure caused changes in the translator.

The intermediate structure, called the dictionary, is a list that may
contain _functions_, _expressions-t_ or _scheme-code_.

A _function_ is a list in the form:

(FONC_MAGIC name nb_args clauses_list)

* FONC_MAGIC is an integer that identifies the structure.
* name is the name of the function.
* nb_args is the number of function arguments.
* clauses_list is a list of _clauses_.

A _clause_ is a list in the form:

(CLAUSE_MAGIC start_pos end_pos args guard right_hand_side)

* CLAUSE_MAGIC is an integer that identifies the structure.
* start_pos is a position structure generated by yacc defining the
  start of the first token in the clause.
* end_pos is a position structure generated by yacc defining the end
  of the last token in the clause.
* args is a list of _expressions_ corresponding to the arguments of
  the clause left_hand_side.
* guard is an _expression_.
* right_hand_side is an _expression_.
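
The accessor-and-predicate approach described above can be sketched as
follows for functions and clauses. This is only an illustration, not
the contents of special-k-structs.ss: all procedure names
(make-function, function?, clause-guard, ...) are hypothetical, and
the magic values 10002 and 10003 are simply read off the example dump
in the "Parser tasks" section.

```scheme
;; Hypothetical sketch; names are illustrative, not those of
;; special-k-structs.ss. Magic values are read off the example dump.
(define FONC_MAGIC 10002)
(define CLAUSE_MAGIC 10003)

;; Generic tag test shared by all predicates.
(define (tagged? x magic)
  (and (pair? x) (eqv? (car x) magic)))

;; (FONC_MAGIC name nb_args clauses_list)
(define (make-function name nb-args clauses)
  (list FONC_MAGIC name nb-args clauses))
(define (function? x) (tagged? x FONC_MAGIC))
(define (function-name f) (list-ref f 1))
(define (function-nb-args f) (list-ref f 2))
(define (function-clauses f) (list-ref f 3))

;; (CLAUSE_MAGIC start_pos end_pos args guard right_hand_side)
(define (make-clause start-pos end-pos args guard rhs)
  (list CLAUSE_MAGIC start-pos end-pos args guard rhs))
(define (clause? x) (tagged? x CLAUSE_MAGIC))
(define (clause-args c) (list-ref c 3))
(define (clause-guard c) (list-ref c 4))
(define (clause-rhs c) (list-ref c 5))
```

Code that only goes through function-name, clause-guard and so on
keeps working when the list layout changes, since only the
definitions above need to be adapted.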

An _expression-t_ is a list in the form:

(EXPR_T_MAGIC start_pos end_pos expr)

* EXPR_T_MAGIC is an integer that identifies the structure.
* start_pos is a position structure generated by yacc defining the
  start of the first token in the expression.
* end_pos is a position structure generated by yacc defining the end
  of the last token in the expression.
* expr is an _expression_.

An _expression_ is a list (or tree) in the form:

(TREE_MAGIC (op TYPE) [lc] [rc] [rrc])

Trees can have one to three children. A tree with no children is a
leaf.

* TREE_MAGIC is an integer that identifies the structure.
* op is a symbol identifying an operator.
* TYPE is an integer identifying the expression _type_.
* lc, rc, rrc are _expressions_. They are optional and depend on the
  tree arity.
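
As a minimal sketch of the tree layout just described, the following
builds and inspects the guard "n > 0". The procedure names are
hypothetical; the magic value 10004 and the type codes 0, 2 and 4 are
copied from the example dump in the "Parser tasks" section, without
assuming more about them than what the dump shows.

```scheme
;; Hypothetical sketch; names are illustrative. TREE_MAGIC is read off
;; the example dump.
(define TREE_MAGIC 10004)

;; (TREE_MAGIC (op TYPE) [lc] [rc] [rrc])
(define (make-tree op type . children)
  (cons TREE_MAGIC (cons (list op type) children)))

(define (tree? x) (and (pair? x) (eqv? (car x) TREE_MAGIC)))
(define (tree-op t) (car (cadr t)))
(define (tree-type t) (cadr (cadr t)))
(define (tree-children t) (cddr t))
(define (leaf? t) (null? (tree-children t)))

;; n > 0, with the type codes seen in the dump (0, 2 and 4)
(define guard
  (make-tree '> 2 (make-tree 'n 4) (make-tree 0 0)))
```

Here guard has the same shape as the guard of the second clause of
fac in the dump: an operator node with two leaves as children.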

A _scheme-code_ is a list in the form:

(SCHEME_CODE_MAGIC code)

* SCHEME_CODE_MAGIC is an integer that identifies the structure.
* code is a list of Scheme code.

This facility was meant to allow Scheme code integration in a SpécialK
program, but in the end it was not implemented.

# Expressions construction
##

During the tree construction, we check that the expression types
match, i.e. we verify that the operator can be applied to the types
of its operands. Since the language is dynamically typed, this check
cannot always be done. Therefore, we try to verify as much as we can.

Examples:
(a & b) + c  (raises an exception)
foo(a) + b   (checking impossible, we do not know foo's return type)

The construction functions return a typed tree or signal an error
(return #f), in which case an error message is produced (error_str
contains the error message). There is also another possible return
value ('error_expr), used when a tree is built from erroneous
expressions.

The following table shows, for each operator and for each operand
type, the return type. Since most operators are commutative, we do not
include all permutations in the table, since they can be deduced.

* INT : expression whose result will be an integer or a rational.
* REAL : expression whose result will be a real.
* BOOL : expression whose result will be a boolean.
* LIST : expression whose result will be a list (or pair).
* NUMERIC : expression whose result will be a number of unknown type.
* VAR : expression whose result will be of unknown type.
* TAB : expression whose result will be an array.
* STRING : character string; no operator applies to this type.


Operator  | lc           | rc           | rrc       | Result
-------------------------------------------------------------
+,-,*,/   | INT          | INT          | N/A       | INT
          |              |              |           |
          | NUMERIC      | INT,REAL,    | N/A       | NUMERIC
          |              | NUMERIC,VAR  |           |
          |              |              |           |
          | REAL         | INT,REAL,    | N/A       | REAL
          |              | NUMERIC,VAR  |           |
          |              |              |           |
          | VAR          | INT,REAL,    | N/A       | VAR
          |              | NUMERIC,VAR  |           |
          |              |              |           |
          | BOOL,LIST,   | All types    | N/A       | Error
          | TAB          |              |           |
-------------------------------------------------------------
<,>,<=,   | INT,REAL,    | INT,REAL,    | N/A       | BOOL
>=,==,<>  | NUMERIC,VAR  | NUMERIC,VAR  |           |
          |              |              |           |
          | BOOL,VAR     | BOOL,VAR     | N/A       | BOOL
          |              |              |           |
          | LIST,VAR     | LIST,VAR     | N/A       | BOOL
          |              |              |           |
          | TAB,VAR      | TAB,VAR      | N/A       | BOOL
          |              |              |           |
          | INT,REAL,    | BOOL,LIST,   | N/A       | Error
          | NUMERIC,VAR  | TAB          |           |
          |              |              |           |
          | BOOL         | INT,REAL,    | N/A       | Error
          |              | NUMERIC,     |           |
          |              | LIST,TAB     |           |
          |              |              |           |
          | LIST         | INT,REAL,    | N/A       | Error
          |              | NUMERIC,     |           |
          |              | BOOL,TAB     |           |
          |              |              |           |
          | TAB          | INT,REAL,    | N/A       | Error
          |              | NUMERIC,     |           |
          |              | BOOL,LIST    |           |
-------------------------------------------------------------
&,|       | All types    | All types    | N/A       | BOOL
-------------------------------------------------------------
:         | All types    | All types    | N/A       | LIST
-------------------------------------------------------------
mod,div   | INT,NUMERIC, | INT,NUMERIC, | N/A       | INT
          | VAR          | VAR          |           |
          |              |              |           |
          | REAL,LIST,   | All types    | N/A       | Error
          | TAB,BOOL     |              |           |
-------------------------------------------------------------
[]        | TAB,VAR      | <Later>      | N/A       | VAR
          |              |              |           |
          | INT,NUMERIC, | <Later>      | N/A       | Error
          | REAL,BOOL,   |              |           |
          | LIST         |              |           |
-------------------------------------------------------------
<-        | TAB,VAR      | <Later>      | All types | TAB
          |              |              |           |
          | Other types  | <Later>      | All types | Error
-------------------------------------------------------------
<->       | TAB,VAR      | <Later>      | <Later>   | TAB
-------------------------------------------------------------
call      | No check     | No check     | N/A       | VAR


For the 3 operators marked <Later> ([], <- and <->), the right hand
side operands are a list of expressions that must be of integer type;
their type checking is done outside of the tree construction
functions.
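
As an illustration of how one row of the table could be turned into
code, here is a minimal sketch for the mod,div row. Types are written
as symbols for readability (the real structure uses integer type
codes), the procedure names are hypothetical, and the check is applied
to both operands by commutativity.

```scheme
;; Hypothetical sketch of the mod,div row. Types are symbols here for
;; readability; the real structure uses integer type codes.
(define (integer-like? type)
  (and (memq type '(INT NUMERIC VAR)) #t))

;; Returns the result type, or #f when mod/div cannot be applied to
;; the operand types (the construction functions signal errors
;; with #f).
(define (mod-div-result-type lc-type rc-type)
  (if (and (integer-like? lc-type) (integer-like? rc-type))
      'INT
      #f))
```

With this sketch, (mod-div-result-type 'VAR 'NUMERIC) yields the INT
type, while (mod-div-result-type 'REAL 'INT) yields #f.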


Parser
- - - 

The parser is defined in the special-k-grammar.ss file.

In addition to the lexical and syntactic analysis, the parser
performs the checks described below. If an error occurs, an
exception of type exn:read is raised. The exception carries the
start position of the first token and the end position of the last
token of the parsed string ($n-start-pos, $n-end-pos); these are used
to highlight the erroneous part of the code.

* Propagation of errors raised by tree construction function calls.
  If an error is detected, return 'err_expr. This value will be
  propagated to the top of the tree, and no other check will be done.
* Left hand side arguments check: the only operator allowed in the
  left hand side part of a clause is ':'.
* Name and argument number check in the clauses that define a
  function. They have to be the same, although a definition can
  overload another one without masking it if it has a different
  number of arguments.
* Guard type check: it has to be BOOL or VAR.
* Index type check when using operator []. It has to be of type INT,
  NUMERIC or VAR.
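
The left hand side check in the list above can be sketched as a
recursive walk over an expression tree. This is a minimal sketch, not
the code of special-k-grammar.ss: the procedure names are
hypothetical, and it only assumes trees of the form
(TREE_MAGIC (op TYPE) [lc] [rc] [rrc]) as described in the
"Intermediate structure" section.

```scheme
;; Hypothetical sketch, assuming trees shaped as
;; (TREE_MAGIC (op TYPE) [lc] [rc] [rrc]).
(define (tree-op t) (car (cadr t)))
(define (tree-children t) (cddr t))

;; #t when every element of lst satisfies pred.
(define (all? pred lst)
  (or (null? lst)
      (and (pred (car lst)) (all? pred (cdr lst)))))

;; A left hand side argument is valid if it is a leaf, or a ':' node
;; whose children are themselves valid.
(define (valid-lhs? t)
  (or (null? (tree-children t))
      (and (eq? (tree-op t) ':)
           (all? valid-lhs? (tree-children t)))))
```

Under these assumptions the tree for c:L is accepted, while a tree
whose operator is '+' is rejected.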


For more details, see the comments in special-k-grammar.ss.

The file is organized in 3 parts:
* Analysis: Lexer and parser
* Semantics: Functions called by the parser, checking the above list.
* Execution: Definitions of the functions called by the other parts of
  the project.


Translator and Integration Hack
===============================

"Be subtle to the point of formlessness."
    	      	  	   -- Neil Megson

Translator
- - - - - 

The translator is divided into 3 files: basic-translator.ss,
fast-k-translator.ss and special-k-translator.ss. basic-translator
contains various functions useful for both fast-k and special-k.

The fast-k language is SpécialK Light and the special-k language is
SpécialK Classic.

There is also a fourth file containing the guard factorisation
optimization. The file is called special-k-opt.ss.

In those files there are 3 main functions: translator, fonctionHandler
and rhsHandler. translator is the main entry point; it takes the
intermediate structure and passes it to functionHandler if it is a
function definition, or to rhsHandler if it is only a right hand
side.
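
The dispatch done by the entry point can be sketched as follows. This
is only an illustration of the control flow: the handlers here are
stubs that tag their input, whereas the real ones build syntax
objects; all names are hypothetical, and the magic value 10002 is
read off the example dump in the "Parser tasks" section.

```scheme
;; Hypothetical sketch of the dispatch only. The handlers are stubs
;; that tag their input; the real ones build syntax objects.
(define FONC_MAGIC 10002)   ; value read off the example dump

(define (function-handler fun) (list 'function-code fun))
(define (rhs-handler expr) (list 'rhs-code expr))

;; Function definitions go to one handler, everything else to the
;; other.
(define (translate-entry entry)
  (if (and (pair? entry) (eqv? (car entry) FONC_MAGIC))
      (function-handler entry)
      (rhs-handler entry)))

;; The intermediate structure is a list of entries.
(define (translate dictionary)
  (map translate-entry dictionary))
```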

For each part it returns a syntax object with a pointer to the
original code position. For more information on syntax objects, see
the MzScheme language manual, and also the following excerpt from the
mailing list:


---------------------------------
-- Beginning of the transcript --
---------------------------------

[plt-scheme] problem using drscheme tools
Paul Graunke  ptg@ccs.neu.edu
Sat, 15 Jun 2002 02:07:43 +0000 

  (display "hi dave")
is really something like
  (#%module-begin (#%app display (#%datum "hi dave")))
so you need to export some #% funny things if you want
function application and constants in your language.

For example, the beginner language provides a different #%app
to prevent calling functions with no arguments. (At least it did
some time ago.)

Paul

At Fri, 14 Jun 2002 17:48:39 -0400, "David B. Tucker" wrote:
> Hi,
> 
> I'm having problems using DrScheme Tools.  I'm trying to add a new
> language by creating an implementation of drscheme:language:language<%>.
> The docs claim that the FRONT-END method should return a syntax object
> or sexp, but when I return any simple sexp such as `(display "hi
> dave"), I get the following error:
> 
> compile: bad syntax; function application is not allowed, because no
> #%app syntax transformer is bound in: (display "hi dave")
> 
> Any clues?  Thanks,
> 
> Dave
> 
> ____________________________________________________
> PLT Scheme discussion list
> plt-scheme@list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/plt-scheme




[plt-scheme] problem using drscheme tools
Matthew Flatt  mflatt@cs.utah.edu
Sat, 15 Jun 2002 07:20:19 -0600 (MDT) 

At Sat, 15 Jun 2002 02:07:43 +0000, Paul Graunke wrote:
>   (display "hi dave")
> is really something like
>   (#%module-begin (#%app display (#%datum "hi dave")))

`#%module-begin' is only for moudle bodies, but the #%app part is
right.

> so you need to export some #% funny things if you want
> function application and constants in your language.

In particular, the `on-execute' method should initialize the namespace
with wahetever top-level bindings you need.


If your tool produced

 #'(display "hi dave")

instead of

'(display "hi dave")

then it would likely[*] work without any namespace initialization,
because the returned syntax wouldn't have the top-level context, and
then wouldn't need any top-level bindings.

Matthew

[*] Depends, in general, on the context of the #'(diplsy "hi dave"),
but it seems likely that it would be in the context of mzscheme
bindings, and therefore ok in the execution namespace.

---------------------------------
----- End of the transcript -----
---------------------------------

One of these problems appears when you try to execute the code inside
the language tool in file tool.ss.
The answer appears to be code like this:
(namespace-syntax-introduce tmp)
which introduces the namespace bindings into the syntax and avoids the
error "#%app #%datum #%top are undefined...".

In the translator files, most of the functions have names like
'get-what-code', with the word 'what' replaced by the appropriate
keyword (if, fun, match, etc.). In these files, quasiquote is often
used to generate the Scheme code.


Integrating things to DrScheme
- - - - - - - - - - - - - - - 

First, some documentation pointers:

* The tools manual:
  http://download.plt-scheme.org/doc/207/html/tools/index.htm
* The framework manual:
  http://download.plt-scheme.org/doc/207/html/framework/index.htm
  Beware: some of the documentation in this manual is incomplete (!)
  and is more complete when accessed through the tools manual (the
  color-text% manual).
* If you've never heard of mixins:
  http://www.cs.utah.edu/plt/publications/icfp98-ff/

If you happen to be as desperate as I once was, I suggest you do some
code-diving in the collects subdirectories...

The files for the integration are special-k-gui.ss, tool.ss and
info.ss (and also setting.ss).
In info.ss you've got some useful things, like what is needed to
include the SpécialK icon.
In tool.ss you've got the main inclusion part.
In special-k-gui.ss you've got the funny things, like the tabbing and
the syntax coloring mess.

With tool.ss the main idea is to make a mode for the language. There
are the phase1 and phase2 functions as usual, and there is an init
function, 'make-mode'.

The classes implementing the languages are special-k-classic-lang and
special-k-light-lang, which inherits from special-k-classic-lang.
The first puts the special-k-classic language into the DrScheme
language menu and the second puts the special-k-light language there.
To save and retrieve configuration data there are the 'k-settings',
which are provided in setting.ss.

In the classes, the methods are all implementations of the methods
required by the interface drscheme:language:language<%>.

special-k-gui.ss:
The syntax coloring is done with the help of a lexer hacked more or
less from ProfessorJ and/or Algol60.
The tabbing is not as precise as I'd like it to be, but it works
quite well. There are functions to tabify a region and to tabify a
line.

Because of the lack of facilities to extend the tabbing part, I had to
extend the color-text-mode% class. For now, it works. If DrScheme
evolves, though, I think compatibility will suffer badly.


Other small implementation tricks
- - - - - - - - - - - - - - - - -

The exhaustive list of tools provided by DrScheme is:
- Text editor
- List of current file functions
- Syntax coloring
- Indenting
- Check Syntax
- Execution control
- Hyperlinks in exceptions
- Test Coverage
- Test Suite
- Profiler
- Mr Flow
- Stepper
- Startup icon
- Adding the plug-in in the about box
- Adding the plug-in in the installed software list
- Restricted search in Help-Desk
- Adding doc.txt in the documentation list
- Adding the documentation in the installed manuals list


Here are descriptions of some of the integrations with DrScheme that
we did.

- Binding:

This is done using a unit that will register procedures on start-up.
A minimal unit will look like:

(module tool mzscheme
  (require (lib "tool.ss" "drscheme")
           (lib "unitsig.ss"))

  (provide tool@)

  (define tool@
    (unit/sig drscheme:tool-exports^
      (import drscheme:tool^)
      (define (phase1)
        (void))
      (define (phase2)
        (void)))))

- Declaring the tool:

Create a file named info.ss. The file contents are pretty explicit.


- Startup icon and mention in the About box
  Adding the plug-in in the installed software list

In info.ss:

(define tool-names '("Name"))
(define tool-icons '(("icon.png" "collection")))
(define tool-urls '("http://url"))

icon.png is displayed in DrScheme's startup dialog, and the About box
shows an entry for the tool, optionally creating a link with the
provided URL.

info.ss also defines the collection name and description that will be
included in the "Installed Components" page of the Help-Desk.


- Mr Flow

This tool is still described as "Coming soon". We cannot use it yet.


- Adding doc.txt in the documentation list

All doc.txt files at the root of each collection are listed in the
Help-Desk "Installed Manuals" page in the Doc.txt section.


- Adding the documentation in the installed manuals list

The "Others" section in the Help-Desk "Installed Manuals" page
automatically includes all documentation installed in
collects/doc. Including the documentation in "Languages" is
unfortunately hardcoded (collects/help/private/docpos.ss).





Data Visualization
==================

Internal visualization
- - - - - - - - - - - 

The MrEd data visualization is an extensive rework of some broken
code from http://elonen.iki.fi/code/misc-notes/scheme-draw-tree/
(which is public domain).

Exported procedures are new-tab-viewer, new-tree-viewer,
display-viewer and refresh-viewer (see user documentation).

It supports custom sizes and coloring, taking into account the
equality (equal?) of element values.


Supported external data visualization tools
- - - - - - - - - - - - - - - - - - - - - -

- springgraph: http://www.chaosreigns.com/code/springgraph/
http://packages.debian.org/unstable/graphics/springgraph

Supports a subset of the .dot format, which is enough for our
purposes. Files in .dot format are also used by some Viz that we do
not support nor mention here, since it is not free software, nor
OSI-compliant Open Source, despite the fact its authors call it "open
source".

More exactly we support the debian unstable package, which mainly adds
some command-line options and has slightly different default behavior.

Beware that springgraph displays graphs, not trees.


- vcg: http://packages.debian.org/unstable/graphics/vcg

Visualization of Compiler Graphs can, among other things, show a tree
in a conventional way.

We support the Debian unstable package, because it is the only legal
free version of VCG; previous versions of the packages included
obfuscated C code, which made the package illegal for redistribution
(violation of GPL sec.3 - "The source code for a work means the
preferred form of the work for making modifications to
it"). Fortunately the copyright holders were convinced to release the
original source code.

xvcg reads .vcg files and displays a corresponding graph in an X
window.

vcg reads .vcg files and converts them to PostScript or PNM.

The 'tree' mode shows trees just the way we need.


- Implementation: Exported procedures are spring, vcg, xvcg, tree2dot,
tree2vcg.

First, the tree is converted to the format used by the external tool.
.vcg and .dot formats are very similar, so both are produced using
(tree->anything): this procedure outputs a header, walks the tree
depth-first producing node and edge descriptions, and outputs a footer.
(tree->anything) is used by (tree2dot) and (tree2vcg). Those are named
using '2' instead of '->' so they can be used in a SpécialK program.
Then the external tool is launched using (process-apply), which takes
a command, its parameters, and two lambdas to apply on the standard
input and output. This makes it possible to feed the program with the
tree description and get the image from its output. vcg is a bit more
complicated to use because it produces the image in a non-existing
file, which needs to be converted using pnmtopng.

All tools have to be in the user's PATH, else the procedures will
display an error message and return false. They do not raise
exceptions because people using SpécialK would have trouble catching
them. The message is also a bit more explicit than only #f.


Localization
============

DrScheme uses a localization system where all texts are in one place,
which usually belongs to 'root'. It is not possible to use it for a
plug-in. See collection 'string-constants'.

Instead, we used SRFI 29 "Localization". SRFI 29 currently has some
flaws (that yours truly will fix in the near future), but we did not
need the affected functions.

Module 'l10n.ss' exports a procedure (localized-message 'symbol),
inspired by SRFI 29's Example, which is a convenient method to grab
any string for SpécialK only. All those strings are defined inside
l10n.ss in the form of pairs (symbol . "string"), grouped by
language. l10n also detects the preferred DrScheme language. Last,
(localized-message) can take optional arguments that will be used to
(format) the string before returning it. Beware that (format) is
always applied to the string, so it cannot contain special sequences
such as "~a" if no additional argument is provided.
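
The lookup part of this design can be sketched as follows. This is
not the code of l10n.ss: the table contents, the procedure name and
the fallback to English are assumptions made for the example, and the
(format) step and the detection of the DrScheme language are left
out.

```scheme
;; Hypothetical sketch: two-level association list, language first,
;; then message symbol. The entries below are made up.
(define messages
  '((en . ((greeting . "Hello")
           (farewell . "Goodbye")))
    (fr . ((greeting . "Bonjour")))))

;; Looks SYMBOL up in LANGUAGE, falling back to English (an
;; assumption of this sketch), then to #f.
(define (lookup-message language symbol)
  (let* ((table (assq language messages))
         (entry (and table (assq symbol (cdr table)))))
    (cond (entry (cdr entry))
          ((eq? language 'en) #f)
          (else (lookup-message 'en symbol)))))
```

With this table, looking up greeting in fr finds the French string,
while farewell in fr falls back to the English one.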

When DrScheme v300 is widespread, we should drop v200 support and
reimplement i18n using a gettext function that could read .po and .mo
files.

Encodings
- - - - -

As of v207, DrScheme interprets all files as Latin-1. In the
forthcoming v3xx series, DrScheme will assume files are encoded using
UTF-8. The only solution to make accents work in both v2xx and v3xx is
to use escape sequences such as \351 (é). To convert a string, you can
type it normally at the top-level:

> "Je m'écrie:".
"Je m'\351crie:"

Another solution would be to backport v3xx's support for multiple
encodings to v2xx and implement a gettext-like facility, but strangely
nobody wanted to do that.

When v3xx is released and widespread, we will be able to drop
support for v2xx. Then we will have two choices:

- Convert all files to UTF-8;

- Implement a gettext-like facility to read .po and .mo files, and
where you get the message from the English string instead of a symbol,
which is often considered easier for maintenance. v3xx new functions
should make it possible to implement that without much pain. There is
already an implementation under a mBSD license at
http://www.synthcode.com/scheme/, unfortunately for Gauche Scheme.

There was a discussion about adding a Jikes-like behavior where the
encoding is guessed before the file is actually parsed, but it seems
PLT did not like it. Check "Encodings":
http://list.cs.brown.edu/pipermail/plt-scheme/2004-May/006187.html


---

This is part of SpécialK.
Copyright (C) 2004  Sylvain Beucler
Copyright (C) 2004  Julien Charles
Copyright (C) 2004  Pierre Chtel
Copyright (C) 2004  Cyril Rodas
See the file specialk.tex for copying conditions.
