miyohide's Diary

Technical notes and other memos

May 28, 2023

Reading a CSV with a multi-line preamble before the header in Ruby

Suppose the CSV data starts with several metadata lines before the actual column header row, like this:

"FormatVersion","v1.0"
"Disclaimer","This pricing list is for informational purposes only. All prices are subject to the additional terms included in the pricing pages on http://aws.amazon.com. All Free Tier prices are also subject to the terms included at https://aws.amazon.com/free/"
"Publication Date","2023-05-27T14:12:36Z"
"Version","20230527141236"
"OfferCode","AmazonEC2"
"SKU","OfferTermCode","RateCode","TermType","PriceDescription","EffectiveDate","StartingRange","EndingRange","Unit","PricePerUnit","Currency","RelatedTo","LeaseContractLength","PurchaseOption","OfferingClass","Product Family","serviceCode","Location","Location Type","Instance Type","Current Generation","Instance Family","vCPU","Physical Processor","Clock Speed","Memory","Storage","Network Performance","Processor Architecture","Storage Media","Volume Type","Max Volume Size","Max IOPS/volume","Max IOPS Burst Performance","Max throughput/volume","Provisioned","Tenancy","EBS Optimized","Operating System","License Model","Group","Group Description","Transfer Type","From Location","From Location Type","To Location","To Location Type","usageType","operation","AvailabilityZone","CapacityStatus","ClassicNetworkingSupport","Dedicated EBS Throughput","ECU","Elastic Graphics Type","Enhanced Networking Supported","From Region Code","GPU","GPU Memory","Instance","Instance Capacity - 10xlarge","Instance Capacity - 12xlarge","Instance Capacity - 16xlarge","Instance Capacity - 18xlarge","Instance Capacity - 24xlarge","Instance Capacity - 2xlarge","Instance Capacity - 32xlarge","Instance Capacity - 4xlarge","Instance Capacity - 8xlarge","Instance Capacity - 9xlarge","Instance Capacity - large","Instance Capacity - medium","Instance Capacity - metal","Instance Capacity - xlarge","instanceSKU","Intel AVX2 Available","Intel AVX Available","Intel Turbo Available","MarketOption","Normalization Size Factor","Physical Cores","Pre Installed S/W","Processor Features","Product Type","Region Code","Resource Type","serviceName","SnapshotArchiveFeeType","To Region Code","Volume API Name","VPCNetworkingSupport"

I wanted to parse this with Ruby's CSV library, and I struggled quite a bit with how to get it to recognize the right line as the header.
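To make the problem concrete, here is a minimal sketch of what a naive parse does. The sample data is a shortened, made-up stand-in for the real file (the SKU and price values are invented for illustration). With `headers: true`, the first line is taken as the header, so the metadata ends up as column names.

```ruby
require 'csv'

# Made-up, shortened sample: two metadata lines, then the real header row.
data = <<~EOS
  "FormatVersion","v1.0"
  "OfferCode","AmazonEC2"
  "SKU","Unit","PricePerUnit"
  "ABC123","Hrs","0.0100"
EOS

table = CSV.parse(data, headers: true)

# The first metadata line is treated as the header row.
naive_headers = table.headers
# The next metadata line becomes the first "data" row.
first_row = table.first.to_h

p naive_headers
p first_row
```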

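For now, I settled on the following: read and discard the lines before the header, then hand the already-open file to CSV.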

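require 'csv'

File.open('filename') do |f|   # block form closes the file automatically
  5.times { f.readline }       # skip the 5 metadata lines before the header row
  CSV.new(f, headers: true).each do |row|
    # process row (a CSV::Row, accessible by header name, e.g. row['SKU'])
  end
end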
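The hard-coded 5 breaks if the number of metadata lines ever changes. As a possible variation (not what I actually used above), the sketch below skips lines until it finds the one that starts with the first header name. `skip_preamble` is a hypothetical helper of my own, the sample data is made up, and it assumes no metadata field contains an embedded line break.

```ruby
require 'csv'
require 'tempfile'

# Made-up sample shaped like the pricing file: metadata lines, then the header row.
sample = <<~EOS
  "FormatVersion","v1.0"
  "Publication Date","2023-05-27T14:12:36Z"
  "OfferCode","AmazonEC2"
  "SKU","Unit","PricePerUnit"
  "ABC123","Hrs","0.0100"
  "DEF456","Hrs","0.0200"
EOS

# Hypothetical helper: advance io so the next line read is the header row,
# identified by its first quoted field.
def skip_preamble(io, first_header)
  marker = %("#{first_header}")
  loop do
    pos = io.pos
    line = io.gets
    raise ArgumentError, 'header row not found' if line.nil?
    if line.start_with?(marker)
      io.pos = pos # rewind to the start of the header row
      return io
    end
  end
end

rows = nil
Tempfile.create('pricing') do |f|
  f.write(sample)
  f.rewind
  skip_preamble(f, 'SKU')
  # CSV starts reading at the current file position, so the header row comes first.
  rows = CSV.new(f, headers: true).map(&:to_h)
end

rows.each { |row| puts "#{row['SKU']}: #{row['PricePerUnit']} per #{row['Unit']}" }
```

The idea is the same as the fixed-count version: position the file at the header row first, then let CSV take over from there.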