r - Dealing with missing information while converting a list into data frame or data table -
related previous question there way convert list of named elements names duplicated data table na values show in data table in order appear in list?
for example: list
testlist <- list("blue", "405", "truck", "400", "car", "white", "500", "truck") testnames <- c("color", "hp", "type", "hp", "type", "color", "hp", "type") names(testlist) <- testnames $color [1] "blue" $hp [1] "405" $type [1] "truck" $hp [1] "400" $type [1] "car" $color [1] "white" $hp [1] "500" $type [1] "truck" can changed data table using:
dcast(setdt(melt(testlist))[, n:=1:.n, l1], n~l1, value.var='value') but output this:
n color hp type 1 1 blue 405 truck 2 2 white 400 car 3 3 <na> 500 truck when want:
n color hp type 1 1 blue 405 truck 2 2 <na> 400 car 3 3 white 500 truck does have suggestion how tackle problem? appreciate help.
one approach preallocate table correct number of rows , correct number, names, , types of columns, , fill in index-assigning cells covered original list.
cns <- c('color','hp','type'); lcis <- match(names(testlist),cns); lris <- c(1l,cumsum(diff(lcis)<=0l)+1l); df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsasfactors=f)[0,]; df[max(lris),] <- na; df; ## color hp type ## 1 <na> <na> <na> ## 2 <na> <na> <na> ## 3 <na> <na> <na> (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); }; df; ## color hp type ## 1 blue 405 truck ## 2 <na> 400 car ## 3 white 500 truck in solution careful handle each column separately, provides potential benefit if different columns in output table (corresponding different subsets of components in input list) have different data types, data types preserved in final table. why opted for loop index-assignment. of course not necessary exact input list, has character types, thought worthy goal anyway.
explanation of intermediate variables
cnscolumn names in output table.lciscolumn indexes each input list component have in output table. computed matching names of input list components againstcns.lrisrow indexes each input list component have in output table. computation of variable interesting , central solution. since column representation in input list incomplete (iow there can "missing columns" in input list), consider input list components ordered respect row-wise occurrence in output table, can't use regular indexing (such taking every 3 components row), , can't use single column name marker of each row, because column can missing in row. thinking, correct approach identify when lower-index (or equal-index, actually) column occurs after higher-index (or equal-index) column in input list, , take row breaks. hence, can takediff(lcis)<=0llogical vector representing row breaks, takecumsum(), add 1 row indexes, , must manually prepend 1 complete vector.cicolumn index in output table. used duringforloop iterate on each output column.mcomputed eachciwithinforloop. logical vector representing of input list components belong current columnci. used index bothlris(to pull out row indexes assign), , input list (to pull out actual values assign).
actual data
i grabbed real data dropbox , stored testlist. below results of investigations.
first, examined unique component names in order occur, taking them cns:
## first reasonable assumption cns cns <- unique(names(testlist)); cns; ## [1] "status" "make" "model" ## [4] "kilometres" "stock number" "engine" ## [7] "number of hours" "front axle" "rear axle" ## [10] "suspension" "wheelbase" "transmission" ## [13] "price" "style/trim" "brakes" ## [16] "mfg exterior colour" "tires" "engine (hp)" ## [19] "exterior colour" from can compute new tentative lcis:
## examine lcis ordering lcis <- match(names(testlist),cns); lcis; ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 ## [26] 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 ## [51] 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 ## [76] 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 ## [101] 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 ## [126] 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 ## [151] 8 9 10 11 12 13 1 2 3 4 14 13 1 2 3 4 5 6 7 8 9 10 11 12 13 ## [176] 1 2 3 4 5 15 16 6 8 9 10 17 11 18 12 19 13 1 2 3 4 5 15 16 6 ## [201] 8 9 10 17 11 18 12 19 13 looking @ above vector, can see begins many regular repetitions of 1:13. in fact, towards end of vector become irregular, see 14 followed 13, , 16 followed 6, 10-11-12 interleaved 17-18-19, etc.
but 1 important observation can make here vector seems consist of groups delineated 1 , 13. in other words, extents seem have regularity (even if there irregularity), seem begin 1 , end 13. observation agrees comment regarding disorder in middle of vehicle data. let's call 1/13 assumption.
we can clearer view of groups splitting on 1/13 boundary:
## recognizing 1/13 consistency, split on see how each (possible) row looks under assumption split(lcis,cumsum(lcis==1l)); ## $`1` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`2` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`3` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`4` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`5` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`6` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`7` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`8` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`9` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`10` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`11` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`12` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`13` ## [1] 1 2 3 4 14 13 ## ## $`14` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ## ## $`15` ## [1] 1 2 3 4 5 15 16 6 8 9 10 17 11 18 12 19 13 ## ## $`16` ## [1] 1 2 3 4 5 15 16 6 8 9 10 17 11 18 12 19 13 now, if very @ above groups, can figure out possible reorder cns in such way groups ordered ascending. not contiguous, contiguity not required solution devised original problem; necessary ascending order.
for instance, need order column 14 before 13, , need order columns 15 , 16 before 6, 8, 9, etc.:
## recognizing possibility of reordering achieve perfect within-row ascending order, reorder cns cns2 cns2 <- cns[c(1,2,3,4,14,5,15,16,6,7,8,9,10,17,11,18,12,19,13)]; cns2; ## [1] "status" "make" "model" ## [4] "kilometres" "style/trim" "stock number" ## [7] "brakes" "mfg exterior colour" "engine" ## [10] "number of hours" "front axle" "rear axle" ## [13] "suspension" "tires" "wheelbase" ## [16] "engine (hp)" "transmission" "exterior colour" ## [19] "price" now can recalculate lcis, call lcis2, , demonstrate new group orders:
## calculate lcis2 cns2, , prove we've ordered each individual row under 1/13 (now 1/19) break assumption lcis2 <- match(names(testlist),cns2); split(lcis2,cumsum(lcis2==1l)); ## $`1` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`2` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`3` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`4` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`5` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`6` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`7` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`8` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`9` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`10` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`11` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`12` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`13` ## [1] 1 2 3 4 5 19 ## ## $`14` ## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19 ## ## $`15` ## [1] 1 2 3 4 6 7 8 9 11 12 13 14 15 16 17 18 19 ## ## $`16` ## [1] 1 2 3 4 6 7 8 9 11 12 13 14 15 16 17 18 19 and finally, can run entire solution, being careful use 2-suffixed variable names now:
## can apply preallocate/fill-in solution using cns2 , lcis2 ## use lris2 , df2 consistent lris2 <- c(1l,cumsum(diff(lcis2)<=0l)+1l); df2 <- as.data.frame(testlist[match(1:length(cns2),lcis2)],stringsasfactors=f)[0,]; df2[max(lris2),] <- na; df2; ## status make model kilometres style.trim stock.number brakes mfg.exterior.colour engine number.of.hours front.axle rear.axle suspension tires wheelbase engine..hp. transmission exterior.colour price ## 1 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 2 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 3 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 4 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 5 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 6 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 7 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 8 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 9 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 10 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 11 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 12 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 13 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 14 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 15 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> ## 16 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> (ci in 1:length(cns2)) { m <- lcis2==ci; df2[lris2[m],ci] <- do.call(c,testlist[m]); }; df2; ## status make model kilometres style.trim stock.number brakes mfg.exterior.colour engine number.of.hours front.axle rear.axle suspension tires wheelbase engine..hp. transmission exterior.colour price ## 1 new peterbilt 367 tri-drive c/w 58'' sleeper 3,360 km <na> 12949 <na> <na> cummins isx15 (550 hp) 44 dana spicer d2000 (20,000lb) dana t69-170 (wide track) t peterbilt air-trak (66,000lb) <na> 267'' <na> rtlo18918b fuller (18 speed) <na> $217,770 ## 2 new kenworth t800 t/a tractor 82,230 km <na> 10720 <na> <na> cummins isx15 (550hp) 2,712 dana spicer d2000 (20,000 lb) dana d46-170hpw (46,000 lb) ta neway adz252 (52,000lb) air <na> 244'' <na> fuller 18 spd main at1202 2 sp <na> $199,500 ## 3 new kenworth t800 tandem tractor w/ 38'' sleeper 98,521 km <na> 10722 <na> <na> cummins isx15 (550hp) 2,790 dana spicer d2000 (20,000 lb) dana d46-170hpw (46,000 lb) ta neway adz252 (52,000lb) air <na> 244'' <na> fuller 18 spd main at1202 2 sp <na> $199,500 ## 4 used kenworth w900 tri-drive sleeper truck tractor 170,422 km <na> 13227 <na> <na> cummins isx15 (600 hp) 4,925 meritor fl941 (20,000 lb) meritor rz-166 (69,000 lb) kenworth ag690 (69,000lb) air <na> 259'' <na> 18 speed main & 4 speed au <na> $197,750 ## 5 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,367 km <na> 12180 <na> <na> cummins isx15 (550hp) 38 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $193,300 ## 6 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,421 km <na> 12179 <na> <na> cummins isx15 (550hp) 46 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $193,300 ## 7 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 2,157 km <na> 12181 <na> <na> cummins isx15 (550hp) 64 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $189,880 ## 8 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,444 km <na> 12954 <na> <na> cummins isx15 (550hp) 45 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $189,880 ## 9 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,427 km <na> 12955 <na> <na> cummins isx15 (550hp) 43 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $189,880 ## 10 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,982 km <na> 12182 <na> <na> cummins isx15 (550hp) 78 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $189,880 ## 11 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 23,293 km <na> 12953 <na> <na> cummins isx15 (550hp) 394 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $189,880 ## 12 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 27,215 km <na> 12509 <na> <na> cummins isx15 (550hp) 458 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $186,600 ## 13 used volvo vnl64t 780-730 72,000 km vnl64t780-730 <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> <na> $185,000 ## 14 new peterbilt 367 t/a wet kit tractor c/w 58'' sleeper 60,657 km <na> 10838 <na> <na> cummins isx15 (550hp) 1,822 dana spicer e14621 (14,600 lb dana d46-170hp (46,000lb) tand peterbilt air-trak (46,000lb) <na> 244'' <na> rtlo18918b fuller (18 speed) <na> $171,800 ## 15 used international prostar +122 36,236 km <na> 463555 air white cummins isx <na> arvin meritor 13200 lb arvin meritor 40000 lb int'l iros 11r22.5 228 in 450 eaton fuller d/o (18 spd) white $168,750 ## 16 used international prostar +122 33,000 km <na> 463543 air white cummins isx <na> arvin meritor 13200 lb arvin meritor 46000 lb int'l iros 11r/22.5 236 in 475 eaton fuller d/o (18 spd) white $165,900 now, realized might preferable move away entirely "ascending-order assumption" (let's call it) 1/13 assumption, can changing lris calculation. absolve of need reorder cns order receive unique() call.
below demonstrate this, reverting unsuffixed variable names, useful, seen in moment:
## change lris calculation depend directly on 1/13 assumption; don't bother reordering cns <- unique(names(testlist)); lcis <- match(names(testlist),cns); lris <- c(1l,cumsum(lcis[-1]==1l)+1l); df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsasfactors=f)[0,]; df[max(lris),] <- na; (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); }; df; ## status make model kilometres stock.number engine number.of.hours front.axle rear.axle suspension wheelbase transmission price style.trim brakes mfg.exterior.colour tires engine..hp. exterior.colour ## 1 new peterbilt 367 tri-drive c/w 58'' sleeper 3,360 km 12949 cummins isx15 (550 hp) 44 dana spicer d2000 (20,000lb) dana t69-170 (wide track) t peterbilt air-trak (66,000lb) 267'' rtlo18918b fuller (18 speed) $217,770 <na> <na> <na> <na> <na> <na> ## 2 new kenworth t800 t/a tractor 82,230 km 10720 cummins isx15 (550hp) 2,712 dana spicer d2000 (20,000 lb) dana d46-170hpw (46,000 lb) ta neway adz252 (52,000lb) air 244'' fuller 18 spd main at1202 2 sp $199,500 <na> <na> <na> <na> <na> <na> ## 3 new kenworth t800 tandem tractor w/ 38'' sleeper 98,521 km 10722 cummins isx15 (550hp) 2,790 dana spicer d2000 (20,000 lb) dana d46-170hpw (46,000 lb) ta neway adz252 (52,000lb) air 244'' fuller 18 spd main at1202 2 sp $199,500 <na> <na> <na> <na> <na> <na> ## 4 used kenworth w900 tri-drive sleeper truck tractor 170,422 km 13227 cummins isx15 (600 hp) 4,925 meritor fl941 (20,000 lb) meritor rz-166 (69,000 lb) kenworth ag690 (69,000lb) air 259'' 18 speed main & 4 speed au $197,750 <na> <na> <na> <na> <na> <na> ## 5 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,367 km 12180 cummins isx15 (550hp) 38 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $193,300 <na> <na> <na> <na> <na> <na> ## 6 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,421 km 12179 cummins isx15 (550hp) 46 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $193,300 <na> <na> <na> <na> <na> <na> ## 7 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 2,157 km 12181 cummins isx15 (550hp) 64 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $189,880 <na> <na> <na> <na> <na> <na> ## 8 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,444 km 12954 cummins isx15 (550hp) 45 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $189,880 <na> <na> <na> <na> <na> <na> ## 9 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,427 km 12955 cummins isx15 (550hp) 43 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $189,880 <na> <na> <na> <na> <na> <na> ## 10 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 3,982 km 12182 cummins isx15 (550hp) 78 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $189,880 <na> <na> <na> <na> <na> <na> ## 11 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 23,293 km 12953 cummins isx15 (550hp) 394 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $189,880 <na> <na> <na> <na> <na> <na> ## 12 new peterbilt 367 t/a wet-kit tractor c/w 58'' sleeper 27,215 km 12509 cummins isx15 (550hp) 458 dana spicer e14621 (14,600 lb dana d46-170 (46,000lb) ta peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $186,600 <na> <na> <na> <na> <na> <na> ## 13 used volvo vnl64t 780-730 72,000 km <na> <na> <na> <na> <na> <na> <na> <na> $185,000 vnl64t780-730 <na> <na> <na> <na> <na> ## 14 new peterbilt 367 t/a wet kit tractor c/w 58'' sleeper 60,657 km 10838 cummins isx15 (550hp) 1,822 dana spicer e14621 (14,600 lb dana d46-170hp (46,000lb) tand peterbilt air-trak (46,000lb) 244'' rtlo18918b fuller (18 speed) $171,800 <na> <na> <na> <na> <na> <na> ## 15 used international prostar +122 36,236 km 463555 cummins isx <na> arvin meritor 13200 lb arvin meritor 40000 lb int'l iros 228 in eaton fuller d/o (18 spd) $168,750 <na> air white 11r22.5 450 white ## 16 used international prostar +122 33,000 km 463543 cummins isx <na> arvin meritor 13200 lb arvin meritor 46000 lb int'l iros 236 in eaton fuller d/o (18 spd) $165,900 <na> air white 11r/22.5 475 white as can see, column order of df different df2, can prove data identical following:
## prove df2 , df identical, ignoring column order difference identical(df,df2[names(df)]); ## [1] true
Comments
Post a Comment