Re: [LUG] More scripting silliness...

To: list@xxxxxxxxxxxx
Subject: Re: [LUG] More scripting silliness...
From: Jonathan Melhuish <jon@xxxxxxxxxxxxxxxx>
Date: Fri, 4 Jul 2003 16:26:17 +0100
Reply-to: list@xxxxxxxxxxxx

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 02 July 2003 23:50, Neil Williams wrote:
> On Wednesday 02 Jul 2003 9:19 pm, Jonathan Melhuish wrote:
> > Sorry, that was probably a bit unclear in my original email.  Although it
> > is indeed a "query string", it isn't passed in the normal
> > "?variable=value" way, it's passed as a supposedly 'normal-looking' URL,
> > eg.
> >
> > http://www.smssat.biz/scan/fi=products/sp=results_big_thumb/st=db/co=yes/sf=category/se=OtherReceivers/va=banner_image=/va=banner_text=.html?id=f8YyQGtr
> >
> > I'm not sure why they decided to do it like that.  I dunno, I didn't
> > design it guv ;-)  But is there anything technically wrong with an URL
> > like that?
>
> Technically wrong? I'd say it was stretching the rules because:
> 1. It uses a non-existent filesystem: It's pretending (if you read the URL
> strictly) that there are 8 sub-directories below the .biz domain whereas
> none probably exist with the names specified (with or without the = ).

Yeah, if you wish to interpret the "/" delimiter as 'directories' then it
does, but I don't see why that should pose a problem.  The URL is
interpreted by the server and the correct page is served.  The structure of
my filesystem is my own business and needn't be visible to the outside world
at all.  In my mirrored version, it does indeed match the directory structure
on the disk, but for the dynamic version, the Interchange server does its
own translation.
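
To illustrate the sort of translation a server could do on a URL like this,
here is a minimal Python sketch.  It is only an assumption about the general
technique, not Interchange's actual code; the function name
parse_path_params and the id value EXAMPLE are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def parse_path_params(url):
    """Return (key, value) pairs found in the path, plus the real query string."""
    parts = urlsplit(url)
    path = parts.path
    # Drop the trailing ".html" suffix seen at the end of the example URL.
    if path.endswith(".html"):
        path = path[: -len(".html")]
    pairs = []
    for segment in path.split("/"):
        if "=" not in segment:
            continue  # e.g. "" or "scan": not a key=value segment
        # Split on the first "=" only, so a value may itself contain "=".
        key, _, value = segment.partition("=")
        pairs.append((key, value))
    return pairs, parse_qs(parts.query)

# The URL from the thread, with a placeholder session id (hypothetical).
url = ("http://www.smssat.biz/scan/fi=products/sp=results_big_thumb/st=db/"
       "co=yes/sf=category/se=OtherReceivers/va=banner_image=/"
       "va=banner_text=.html?id=EXAMPLE")
pairs, query = parse_path_params(url)
for key, value in pairs:
    print(key, "->", value)
print("query:", query)
```

A list of pairs is used rather than a dictionary because "va" appears twice
in the path and a dictionary would silently keep only one of them.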

> 2. It uses non-standard repetition: It's imitating a query string and then
> adding a real one (the xx=xx would appear to be some form of variable=value
> statement) - repetition that is likely to cause many a parser to barf.

Not really.  You interpreted the part with the slashes above as a file
location, and that seems like a fair enough conclusion.  Are you telling me
that "=" isn't a valid character for a filename?  I had suspected that
myself, but I can't find any evidence to support it.
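
A quick way to check the filename question is simply to try it.  The sketch
below assumes a typical Linux filesystem, where only "/" and the NUL byte
are forbidden in a filename; the directory and file names are borrowed from
the example URL purely for illustration.

```python
import os
import tempfile

# Build part of the URL's "directory" structure inside a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "scan", "fi=products", "sp=results_big_thumb")
    os.makedirs(nested)
    page = os.path.join(nested, "va=banner_text=.html")
    with open(page, "w") as handle:
        handle.write("<html></html>\n")
    created = os.path.exists(page)
    names = sorted(os.listdir(nested))

print(created, names)
```

If this runs without an error, "=" is acceptable in file and directory
names on that filesystem, which is consistent with the mirrored version
matching the directory structure on disk.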

> 3. Required filedata is absent: There's no 'real' file anywhere for
> processes like Google to grab onto - I'd presume there's some index.php,
> default.asp or similar behind it but it's not stated and therefore must be
> assumed, which is often a bad tactic.

No, there isn't an "index.php"; the page is dynamically generated by a Perl
server engine and passed via the "sms.ic" linker program.  No-one "assumes"
its presence, nor can I see its relevance.  You go to the URL, you get the
page.

> Stretching the letter of the 'rules' but breaking the spirit? Personally, I
> wouldn't like to use an engine that relied on this type of persistence.
>
> I'm not surprised that it doesn't parse well with processes like Google.
>
> Incidentally, the W3C validator site can parse the URL but the engine
> itself responds with some very bad HTML - it uses an HTML4 Transitional
> Doctype (which would usually mean that someone cares about producing valid
> code as a DocType isn't any use to a browser, only a validator engine like
> at W3C) but uses tag attributes removed from HTML4 (marginheight), omits
> required attributes (img alt=""), fails to properly nest tags, omits to
> properly escape entities (& should be replaced with &amp;) and puts settings
> in HTML that should be in CSS (img border=0). The validator URL is far too
> long to post here (as it includes the whole URL you quoted plus an extra
> query string for W3C settings). Incidentally, the validator turns the whole
> URL into the hexadecimal characters I mentioned last time. Here's the first
> bit:
> http://validator.w3.org/check?uri=http%3A%2F%2Fwww.smssat.biz%2Fscan%2Ffi%3Dproducts%2Fsp
>
> %2F  /
> %3D =
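
The hexadecimal form quoted above is ordinary percent-encoding, needed
because the whole URL is being passed as the value of the validator's "uri"
parameter.  A small Python sketch reproduces the quoted fragment; the
variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# The start of the store URL, as far as the quoted validator link shows it.
target = "http://www.smssat.biz/scan/fi=products/sp"

# safe="" forces ":", "/" and "=" to be escaped as well.
encoded = quote(target, safe="")
validator_url = "http://validator.w3.org/check?uri=" + encoded
print(validator_url)

# Decoding restores the original text.
decoded = unquote(encoded)
print(decoded)
```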

<groans>  I knew I shouldn't have told you the URL ;-)

I've tried to clean the code up a bit, and all of the bits I have changed or 
added should be in reasonably standards-compliant code, but the code that I 
have used from the Interchange "demo store", which is admittedly quite a bit, 
isn't exactly great.  I had mistakenly assumed that any code I used that a 
'pro' had written would be clean and standards compliant :-(

> It would take some time to bring that page to the intended HTML4
> Transitional standard proclaimed at the top of the page returned from that

You're damn right...

> It would take some time longer to get the engine itself to consistently
> produce valid HTML4 Transitional code - and a decent understanding of the
> engine itself too.

I don't really have time to delve into the internals and fix the code that
produces the HTML; it's bad enough trying to just build stores with the damn
thing. :-(

> Perhaps this URL format is what we have to put up with if cookies get such
> a bad press. Essentially, the URL appears to be trying to track the current
> transaction(s) and results - exactly what a cookie should do. If a cookie
> was properly designed and used, the entire construct could be replaced and
> you'd have a normal directory and filename after the .biz/ which Google
> would be only too happy to parse. Other engines like this use a server-side
> database to store all this info and a normal query string with the ID=
> setting to retrieve the rest of the data from the server database. (See the
> DCLUG Wiki as an example of database driven persistence). That requires an
> extra step in installation and an extra layer to debug - not always
> appealing but not actually that hard to implement because so many
> components fit neatly within the appropriate public standards.

The "?id=" bit does indeed store a unique customer number, the rest is stored 
in the database.  The rest of the URL is just the search string.  I don't see 
the problem with this approach.
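
As a rough illustration of that split (an id in the URL, everything else
held server-side), here is a Python sketch using an in-memory SQLite
database.  The table layout, the function names and the id value EXAMPLE
are all hypothetical; this is not Interchange's actual schema or code.

```python
import sqlite3
from urllib.parse import urlsplit, parse_qs

def session_id(url):
    """Return the value of the "id" query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("id", [])
    return values[0] if values else None

# Hypothetical server-side store: one row of session data per customer id.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, basket TEXT)")
db.execute("INSERT INTO sessions VALUES (?, ?)", ("EXAMPLE", "2 x receiver"))

def load_basket(url):
    """Look up the stored basket for the id carried in the URL."""
    sid = session_id(url)
    if sid is None:
        return None
    row = db.execute(
        "SELECT basket FROM sessions WHERE id = ?", (sid,)
    ).fetchone()
    return row[0] if row else None

print(load_basket("http://www.smssat.biz/scan/fi=products.html?id=EXAMPLE"))
print(load_basket("http://www.smssat.biz/scan/fi=products.html"))
```

Only the id travels in the URL, so the search part of the path can stay the
same for every visitor.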

> Is there a different engine available for the job?

It's actually something I've been considering quite carefully, especially 
after having such serious performance problems.  The Interchange user group 
generally maintain that the performance is "satisfactory", so long as your 
hardware is up to it.  Which perhaps it is, but frankly new hardware is not 
an option at the moment, so I'm stuck with a 300MHz Celeron that's just
recently been downgraded to 128MB.

Mind you, I bet you Apache/MySQL could serve a few pages per second off even 
that lowly spec, so I don't see why there should be any excuse for such lame 
performance (<1 request/sec).

OSCommerce in particular looks quite promising; I would be interested to hear
if anybody has any experience with it.  It will definitely be a serious
contender if I develop another online store, but I'm not sure if I can
justify the time and expense of completely ditching Interchange and the 
current SMS product database at this late stage.  But it's certainly 
tempting...

Jon
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/BZyZeTVvFHAhe5cRApy6AJ95pxFkVh1wN6t5nwbJsU3nFuX8MwCeIwOu
Pvx5DIk4aaXlFkny6ReI5Tw=
=4e14
-----END PGP SIGNATURE-----

--
The Mailing List for the Devon & Cornwall LUG
Mail majordomo@xxxxxxxxxxxx with "unsubscribe list" in the
message body to unsubscribe.
