
Re: [LUG] csv file editor

 

On 21/11/10 11:20, Philip Whateley wrote:
On Sun, 2010-11-21 at 08:07 +0000, tom wrote:
On 20/11/10 15:46, Philip Whateley wrote:
On Sat, 2010-11-20 at 11:15 +0000, tom wrote:

On 19/11/10 14:22, Philip Whateley wrote:

Anyone know of a good editor for delimited text data files (csv, tab
delimited etc.)?

I need it for directly creating data files for R. I can export from a
spreadsheet but that is cumbersome.

I am aware that CSVed - which would be exactly what I am looking for -
will run under Wine, and I think there is an emacs mode (although
only beta), but I'm looking for something native to Linux (unlike CSVed)
and easy to learn (unlike emacs).

I have looked at google-refine, but that won't create files, although it
looks very good for correcting data errors.

I am happy with either gui or command line.

Many thanks

Phil




Can I inquire as to how the data is generated?
Tom te tom te tom


The data is either marketing data from small surveys, or industrial
process control data. In either case the data is collected manually and
entered from paper reports.

There is no option to collect the data electronically at source.

The marketing data is a mixture of numeric, categorical and ordinal
categorical data. The process control data is mainly numeric. I also
analyse data from designed experiments, but that is usually small enough
and static enough to enter directly into a data frame in R. Categorical
data I usually analyse using a mixture of R and Mondrian.
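
(By "enter directly into a data frame" I just mean something along these
lines; the factor names and values below are made up purely for
illustration.)

  doe <- data.frame(
    temp     = factor(c("low", "high", "low", "high")),  # made-up factor
    pressure = factor(c("low", "low", "high", "high")),  # made-up factor
    yield    = c(72.1, 78.4, 69.8, 81.2)                 # made-up responses
  )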

Phil



I'll rephrase that: where does it come from? Can it not be processed
directly into a format suitable for R by scripts etc.?
If it comes from process control data then it may just be a case of
parsing the file carefully, though any post-19th-century control data
should be available in PC-readable form anyway.
99.99% of data need never go near any 'office' style program.
Tom te tom te tom

Ok. The process control data mainly comes from shop-floor control
charts: problem-solving tools which are maintained by operators on
paper.

There is also some process control data which is maintained
electronically by the company, but as a consultant I am not allowed
access to the company network, so I can only receive this data as an
Excel printout, or occasionally as .xlsx on a CD. In any case the
company-wide electronic process control data is usually useless for
problem solving because of the time lag between a change and its
identification.

The survey type data (from a different organisation) is also received on
paper and entered manually.

In order to get the data into R/Mondrian/GGobi etc. I need to enter it
manually. This could be scripted I guess, but the script would be complex,
as I need to enter some data by row (for example, data points at
particular factor levels) and some by column (for example, blocks
of factors). The main problems for me are a) needing to edit the data
afterwards (adding new columns, etc.) and b) ensuring that the data
entered by row lines up with the correct factors entered earlier by
column. Certainly using an office package would be easier and more
efficient than using a script, as far as I can see, although a dedicated
csv / tab-delimited data editor would be even better.
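
For what it's worth, R's own edit() gives a spreadsheet-like data editor
that covers part of the "edit afterwards" problem; a rough sketch (the
file and column names here are made up):

  d <- read.csv("survey.csv", stringsAsFactors = TRUE)  # existing data
  d <- edit(d)           # opens R's spreadsheet-like data editor
  d$new_block <- NA      # add a new column afterwards
  write.csv(d, "survey.csv", row.names = FALSE)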

Many thanks for all suggestions, though.

Phil


It's sad that you have to key from hard copy that was originally produced from electronic data; I really thought the days of re-keying data were gone, which shows how much I know. Could you try OCR scanning to at least get the data back into some form of electronic data as quickly as possible (although OCR is not "foolproof")?

Is the data accumulative or does it change radically? If it's accumulative I would go with storing it in a DB of some sort and mining the accumulative differences. Also, for accumulative data you could run a diff over the previous and current "batch" of data; this would highlight the changes so that you could concentrate on them.
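
Since you are in R anyway, the same idea can be done there rather than
with an external diff; a rough sketch, with made-up file names, assuming
both batches have the same columns:

  prev <- read.csv("batch_prev.csv", stringsAsFactors = FALSE)
  curr <- read.csv("batch_curr.csv", stringsAsFactors = FALSE)
  # rows in the current batch that were not in the previous one
  new_rows <- curr[!(do.call(paste, curr) %in% do.call(paste, prev)), ]
  print(new_rows)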

If you can build in the "intelligence", scripting would be your best option. Script once, use many times. From what you are describing perl would probably eat it, but perl is not the easiest tool to get on with (coming from someone who has used it on a less than frequent basis for the last 15 years).
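
If you would rather stay in R than learn perl, the "script once, use many
times" idea could look roughly like this - a sketch only, with made-up
file names, assuming each batch arrives as a tab-delimited export with
the same columns as the master file:

  master <- "process_data.csv"
  batch  <- read.delim("latest_export.txt", stringsAsFactors = FALSE)
  batch$entered_on <- as.character(Sys.Date())  # tag each batch as it arrives
  if (file.exists(master)) {
    batch <- rbind(read.csv(master, stringsAsFactors = FALSE), batch)
  }
  write.csv(batch, master, row.names = FALSE)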

Of course it all sounds easy as a strictly hands-off bystander, but if the processes you are talking about have a long time span (i.e. something you will be doing for the rest of your working life) then I would invest in some scripts.

Having said all that, it would be really nice to see a good csv / xml editor in open source. Ah, if only my programming skills were 100% better than they are now :-(

Tom.


--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq