On Sun, 2010-11-21 at 08:07 +0000, tom wrote:
> On 20/11/10 15:46, Philip Whateley wrote:
> > On Sat, 2010-11-20 at 11:15 +0000, tom wrote:
> >
> >> On 19/11/10 14:22, Philip Whateley wrote:
> >>
> >>> Anyone know of a good editor for delimited text data files (csv, tab
> >>> delimited etc.)
> >>>
> >>> I need it for directly creating data files for R. I can export from a
> >>> spreadsheet but that is cumbersome.
> >>>
> >>> I am aware that CSVed - which would be exactly what I am looking for -
> >>> will run under Wine, and I think there is a mode for emacs (although
> >>> only beta), but I'm looking for something native Linux (unlike CSVed)
> >>> and easy to learn (unlike emacs)
> >>>
> >>> I have looked at google-refine, but that won't create files, although it
> >>> looks very good for correcting data errors.
> >>>
> >>> I am happy with either gui or command line.
> >>>
> >>> Many thanks
> >>>
> >>> Phil
> >>>
> >> Can I inquire as to how the data is generated?
> >> Tom te tom te tom
> >>
> > The data is either marketing data from small surveys, or industrial
> > process control data. In either case the data is collected manually and
> > entered from paper reports.
> >
> > There is no option to collect the data electronically at source.
> >
> > The marketing data is a mixture of numeric, categorical and ordinal
> > categorical data. The process control data is mainly numeric. I also
> > analyse data from designed experiments, but that is usually small enough
> > and static enough to enter directly into a data frame in R. Categorical
> > data I usually analyse using a mixture of R and Mondrian.
> >
> > Phil
> >
> I'll rephrase that: where does it come from? Can it not be processed
> directly into a format suitable for R by scripts etc.
> If it comes from process control data then it may be just a case of
> parsing the file carefully, though any post 19th C control data should
> be made available in PC readable form anyway.
> 99.99% of data need never go near any 'office' style program.
> Tom te tom te tom
>

Ok. The process control data mainly comes from shop floor control charts, which are problem-solving tools maintained by operators on paper. There is also some process control data which is maintained electronically by the company, but as a consultant I am not allowed to have access to the company network, so I can only receive this data as an Excel printout, or occasionally as .xlsx on a CD. In any case the company-wide electronic process control data is usually useless for problem solving because of the time lag between the change and its identification.

The survey type data (from a different organisation) is also received on paper and entered manually.

In order to get the data into R/Mondrian/GGobi etc. I need to enter it manually. This could be scripted, I guess, but the script would be complex, as I need to enter some data by row (for example data points at particular factor levels) and some data by column (for example blocks of factors). The main problems for me are a) needing to edit data afterwards (adding new columns, etc.) and b) ensuring that the data entered by row lines up with the correct factors entered earlier by column.

Certainly using an office package would be easier and more efficient than using a script as far as I can see, although a dedicated csv / tab delimited data editor would be even better.

Many thanks for all suggestions, though.

Phil

--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq
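
The alignment problem described in the message (factors entered by column, measurements entered later by row) could be handled by a short script along the following lines. This is only a minimal sketch in Python, not something proposed in the thread: the function names `build_table` and `to_csv`, the factor names, and all the values are made up for illustration. It refuses to build the table if a factor column and the measurement rows have different lengths, which is one way to catch rows that no longer line up with their factors.

```python
import csv
import io


def build_table(factor_columns, measurement_rows, measurement_names):
    """Combine factors entered by column with measurements entered by row.

    factor_columns: dict mapping factor name -> list of levels (one per row)
    measurement_rows: list of rows, each a list of measured values
    measurement_names: column names for the measured values
    """
    n_rows = len(measurement_rows)

    # Every factor column must have exactly one level per measurement row.
    for name, values in factor_columns.items():
        if len(values) != n_rows:
            raise ValueError(
                f"factor {name!r} has {len(values)} values, expected {n_rows}"
            )

    # Every measurement row must have one value per measurement column.
    for i, row in enumerate(measurement_rows):
        if len(row) != len(measurement_names):
            raise ValueError(
                f"row {i} has {len(row)} values, expected {len(measurement_names)}"
            )

    table = [list(factor_columns) + list(measurement_names)]
    for i, row in enumerate(measurement_rows):
        factors = [values[i] for values in factor_columns.values()]
        table.append(factors + list(row))
    return table


def to_csv(table):
    """Render the table as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


# Hypothetical example: two factors entered as blocks (by column),
# then four readings entered one row at a time.
factors = {
    "machine": ["A", "A", "B", "B"],
    "shift": ["day", "night", "day", "night"],
}
rows = [[10.1, 0.52], [10.4, 0.49], [9.8, 0.55], [10.0, 0.51]]

table = build_table(factors, rows, ["diameter", "roughness"])
csv_text = to_csv(table)
print(csv_text)
```

Adding a column afterwards amounts to adding one more entry to `factors` (or one more value to each row) and rebuilding the table. The resulting text, saved to a file, should be loadable in R with `read.csv`.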