Re: [LUG] csv file editor

 

On Sun, 2010-11-21 at 08:07 +0000, tom wrote: 
> On 20/11/10 15:46, Philip Whateley wrote:
> > On Sat, 2010-11-20 at 11:15 +0000, tom wrote:
> >    
> >> On 19/11/10 14:22, Philip Whateley wrote:
> >>      
> >>> Anyone know of a good editor for delimited text data files (csv, tab
> >>> delimited etc.)
> >>>
> >>> I need it for directly creating data files for R. I can export from a
> >>> spreadsheet but that is cumbersome.
> >>>
> >>> I am aware that CSVed - which would be exactly what I am looking for -
> >>> will run under Wine, and I think there is a mode for emacs (although
> >>> only beta), but I'm looking for something native Linux (unlike CSVed)
> >>> and easy to learn (unlike emacs)
> >>>
> >>> I have looked at google-refine, but that won't create files, although it
> >>> looks very good for correcting data errors.
> >>>
> >>> I am happy with either gui or command line.
> >>>
> >>> Many thanks
> >>>
> >>> Phil
> >>>
> >>>
> >>>
> >>>        
> >> Can I inquire as to how the data is generated?
> >> Tom te tom te tom
> >>
> >>      
> > The data is either marketing data from small surveys, or industrial
> > process control data. In either case the data is collected manually and
> > entered from paper reports.
> >
> > There is no option to collect the data electronically at source.
> >
> > The marketing data is a mixture of numeric, categorical and ordinal
> > categorical data. The process control data is mainly numeric. I also
> > analyse data from designed experiments, but that is usually small enough
> > and static enough to enter directly into a data frame in R. Categorical
> > data I usually analyse using a mixture of R and Mondrian.
> >
> > Phil
> >
> >
> >    
> I'll rephrase that: where does it come from? Can it not be processed 
> directly into a format suitable for R by scripts etc.?
> If it comes from process control data then it may be just a case of 
> parsing the file carefully, though any post 19th C control data should 
> be made available in PC readable form anyway.
> 99.99% of data need never go near any 'office' style program.
> Tom te tom te tom
> 

OK. The process control data comes mainly from shop-floor control
charts: problem-solving tools which are maintained by operators on
paper.

There is also some process control data which is maintained
electronically by the company, but as a consultant I am not allowed
access to the company network, so I can only receive this data as an
Excel printout, or occasionally as .xlsx on a CD. In any case the
company-wide electronic process control data is usually useless for
problem solving because of the time lag between a change and its
identification.

The survey-type data (from a different organisation) is also received on
paper and entered manually.

In order to get the data into R/Mondrian/GGobi etc. I need to enter it
manually. This could be scripted, I guess, but the script would be
complex, as I need to enter some data by row (for example data points at
particular factor levels) and some data by column (for example blocks
of factors). The main problems for me are a) needing to edit the data
afterwards (adding new columns, etc.) and b) ensuring that the data
entered by row lines up with the correct factors entered earlier by
column. As far as I can see, using an office package would certainly be
easier and more efficient than using a script, although a dedicated
csv / tab-delimited data editor would be even better.
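
To make the row/column problem concrete, here is a minimal Python sketch
of the kind of script discussed above. It is only an illustration under
stated assumptions, not a tool anyone in the thread has used: the
function names (factor_columns, add_row_values, to_csv) and the example
factors (machine, shift, diameter) are all hypothetical. It builds the
factor block by column, fills in measurements by row for a given factor
level, refuses to continue if the counts do not line up, and writes
missing cells as NA, which R reads as missing by default.

```python
import csv
import io
from itertools import product


def factor_columns(factors):
    """Build a full-factorial block of factor columns, one dict per row."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def add_row_values(rows, column, values, **levels):
    """Fill `column` for the rows matching the given factor levels, in order.

    Raises ValueError if the number of values does not match the number
    of matching rows, so misaligned entry is caught immediately.
    """
    targets = [r for r in rows if all(r[k] == v for k, v in levels.items())]
    if len(targets) != len(values):
        raise ValueError(
            f"expected {len(targets)} values for {levels}, got {len(values)}"
        )
    for row, value in zip(targets, values):
        row[column] = value


def to_csv(rows, delimiter=","):
    """Serialise the rows as delimited text; missing cells become NA."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        delimiter=delimiter,
        restval="NA",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example: two factors entered by column, readings by row.
rows = factor_columns({"machine": ["A", "B"], "shift": ["day", "night"]})
add_row_values(rows, "diameter", [10.1, 10.3], machine="A")
add_row_values(rows, "diameter", [9.8, 9.9], machine="B")
text = to_csv(rows)
print(text)
```

Passing delimiter="\t" to to_csv gives a tab-delimited file instead. A
new column can be added later by calling add_row_values again with a
different column name; rows not yet filled in are written as NA.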

Many thanks for all suggestions, though.

Phil


-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq