Boto "Get Byte Range" returns more than expected bytes

2023-09-11 09:46:50 Author: 夜の未殇

This is my first question here as I'm fairly new to this world! I've spent a few days trying to figure this out for myself, but haven't so far been able to find any useful info.

I'm trying to retrieve a byte range from a file stored in S3, using something like:

S3Key.get_contents_to_file(tempfile, headers={'Range': 'bytes=0-100000'})

The file that I'm trying to retrieve from is a video file, specifically an MXF. When I request a byte range, I get back more data in the tempfile than requested. For example, using one file, I request 100,000 bytes and get back 100,451.

One thing to note about MXF files is that they legitimately contain 0x0A (ASCII line feed) and 0x0D (ASCII carriage return).

I had a dig around, and it appears that any time a 0A byte is present in the file, the retrieved data contains 0D 0A instead of just 0A, therefore appearing to retrieve more data than required.

As an example, the original file contains the hex string:

02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05

But the file downloaded from S3 has:

02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05

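For illustration, here is a minimal sketch (the filenames are hypothetical, and the rewrite only happens on Windows) showing that plain text-mode file I/O performs the same LF-to-CRLF conversion, with no boto involved:

import os

data = '\x02\x03\x3b\x0a\x06'  # five bytes, one of them an LF (0x0A)

f = open('text_mode.bin', 'w')  # text mode: Windows rewrites 0x0A as 0x0D 0x0A
f.write(data)
f.close()

f = open('binary_mode.bin', 'wb')  # binary mode: bytes are written verbatim
f.write(data)
f.close()

print 'Text mode size = ' + str(os.path.getsize('text_mode.bin'))  # 6 on Windows
print 'Binary mode size = ' + str(os.path.getsize('binary_mode.bin'))  # 5 everywhere
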
I've tried to debug the code and work my way through the Boto logic, but I'm relatively new at this, so I get lost very easily.

I created the following test, which shows the issue:

from boto.s3.connection import S3Connection
from boto.s3.key import Key
import os


## AWS credentials
AWS_ACCESS_KEY_ID = 'access key'
AWS_SECRET_ACCESS_KEY = 'secret key'

## Bucket name and path to file
bucketName = 'bucket name'
filePath = 'path/to/file.mxf'

#Local temp file to download to
tempFilePath = 'c:/tmp/tempfile'


## Setup the S3 connection and create a Key to access the file specified
## in filePath
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucketName)
S3Key = Key(bucket)
S3Key.key = filePath

def testRangeGet(bytesToRead=100000): # default read of 100K
    tempfile = open(tempFilePath, 'w') # note: opened in text mode
    rangeString = 'bytes=0-' + str(bytesToRead - 1) # create byte range as string (ranges are inclusive)
    rangeDict = {'Range': rangeString} # add this to the headers dictionary
    S3Key.get_contents_to_file(tempfile, headers=rangeDict) # using Boto
    tempfile.close()
    bytesRead = os.path.getsize(tempFilePath)
    print 'Bytes requested = ' + str(bytesToRead)
    print 'Bytes received = ' + str(bytesRead)
    print 'Additional bytes = ' + str(bytesRead - bytesToRead)

I guess there is something in the Boto code that is looking out for certain ASCII escape characters and modifying them, and I can't find any way to specify to just treat it as a binary file.

Has anyone had a similar problem and can share a way around it?

Thanks

Recommended Answer

Open your output file as a binary file. Otherwise, writing to that file will convert LF to CR/LF automatically.

tempfile = open(tempFilePath, 'wb')
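
Applied to the test function from the question, the mode string is the only change needed; a sketch with the fix in place:

def testRangeGet(bytesToRead=100000):
    tempfile = open(tempFilePath, 'wb')  # 'wb': binary mode, no newline translation
    rangeString = 'bytes=0-' + str(bytesToRead - 1)  # Range header is inclusive
    S3Key.get_contents_to_file(tempfile, headers={'Range': rangeString})
    tempfile.close()
    bytesRead = os.path.getsize(tempFilePath)
    print 'Bytes requested = ' + str(bytesToRead)
    print 'Bytes received = ' + str(bytesRead)  # now equals bytesToRead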

That, of course, is only necessary on Windows systems. Unix systems won't convert anything, regardless of whether the file has been opened as text or binary.

You should take care when uploading as well, so that you don't get corrupted data like this into S3 in the first place.
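
For example, a minimal upload sketch using the boto 2 Key API from the question (the local path is hypothetical); opening the source file in binary mode keeps the uploaded bytes identical to the bytes on disk:

source = open('c:/tmp/source.mxf', 'rb')  # 'rb': read raw bytes, no translation
try:
    S3Key.set_contents_from_file(source)  # upload the file contents to the key
finally:
    source.close()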